<a href="https://colab.research.google.com/github/Lighthouse-Reports/amsterdam_fairness/blob/main/model_analysis_pseudocode.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Title: Pseudocode for Model Analysis

### Authors: Tahmeed Shafiq (tahmeed@lighthousereports.com)

### Last updated: 02/10/24

Pseudocode and planning notes for a white-box analysis of the model.

**Setup**:<br>
Import modules as needed.<br>
Set `warnings.filterwarnings("ignore", category=FutureWarning)` to suppress `FutureWarning` messages from dependencies.

**Unpickle**:<br>
Unpickle model.<br>
Extract all keys and parameters in case we need them later.
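A sketch of the unpickle step. It assumes the pickled object is a `dict` (the `prep` key suggests this), and the function names are hypothetical helpers, not part of the original notebook.

```python
import pickle
from pathlib import Path
from typing import Any, Dict


def load_bundle(path: str) -> Any:
    """Unpickle the model file at `path`.

    Only unpickle files from a trusted source: loading a pickle can
    execute arbitrary code.
    """
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


def describe_keys(bundle: Any) -> Dict[str, str]:
    """Map each top-level key of the unpickled dict to its value's type name,
    as a quick inventory of what the file contains."""
    if not isinstance(bundle, dict):
        raise TypeError(f"expected a dict, got {type(bundle).__name__}")
    return {str(key): type(value).__name__ for key, value in bundle.items()}
```

Usage: `describe_keys(load_bundle("model.pkl"))`, where `model.pkl` is a placeholder for the actual file name. If the bundle contains scikit-learn objects, loading generally needs the same scikit-learn version that saved them.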

**Build pipeline**:<br>
Extract pipeline from `prep` key.<br>
Extract feature names and final class labels.<br>
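A duck-typed sketch of the pipeline step, assuming `prep` holds a fitted scikit-learn `Pipeline` or similar object. `feature_names_in_`, `get_feature_names_out` and `classes_` are scikit-learn conventions. If the classifier is the final pipeline step, `Pipeline.classes_` exposes its labels. Otherwise the classifier may sit under a different key, which is not confirmed here. The helper names are hypothetical, and the code avoids importing scikit-learn so it can be tried against stand-in objects.

```python
from typing import Any, List, Mapping, Optional


def extract_pipeline(bundle: Mapping[str, Any], key: str = "prep") -> Any:
    """Return the pipeline stored under `key` in the unpickled dict."""
    if key not in bundle:
        raise KeyError(f"{key!r} not in bundle; available keys: {sorted(map(str, bundle))}")
    return bundle[key]


def input_feature_names(estimator: Any) -> Optional[List[str]]:
    """Column names seen at fit time. scikit-learn sets `feature_names_in_`
    only when the estimator was fitted on data with string column names."""
    names = getattr(estimator, "feature_names_in_", None)
    return None if names is None else [str(n) for n in names]


def output_feature_names(estimator: Any) -> Optional[List[str]]:
    """Names of the transformed features, or None when the object (or one of
    its steps) does not support `get_feature_names_out`."""
    try:
        return [str(n) for n in estimator.get_feature_names_out()]
    except AttributeError:
        return None


def class_labels(estimator: Any) -> Optional[list]:
    """Class labels of a fitted classifier; `predict_proba` columns follow this order."""
    labels = getattr(estimator, "classes_", None)
    return None if labels is None else list(labels)
```

Input names are the ones to compare against the feature table. Output names show what the preprocessing turns them into (for example after encoding).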

**Features**:<br>
The feature descriptions are in [this spreadsheet](https://docs.google.com/spreadsheets/d/1llIYdRJwTqUrx4tISmDKI68q1Xb-DifF/edit?gid=91782168#gid=91782168).<br>
Here's a translated version. Where the spreadsheet gives no range, we estimate one.

| Feature                                | Description                                                                                                                                                                                                                  | Given range | Estimated range | Note                                                 |
| -------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------- | --------------- | ---------------------------------------------------- |
| deelnames_started_percentage_last_year | Of all the applicant's participations (such as pathways and other instruments) in the year prior to this application, what percentage did he/she start?                                                                      |             | 0-100           | Check whether values are on a 0-100 or 0-1.0 scale   |
| at_least_one_address_in_amsterdam      | Number of active addresses of the requester. Can be several, e.g. residential address and shipping address.                                                                                                                  |             | 0-3             | Is 0-3 reasonable?                                   |
| active_address_count                   | Number of active addresses of the requester. Can be several, e.g. residential address and shipping address                                                                                                                   |             | 0-3             | Is 0-3 reasonable?                                   |
| days_since_last_relocation             | Number of days since the applicant last moved house (based on BRP data provided). If there is no known address or only a mailing address, this feature will be populated as if the requester last moved a long time ago.     |             | 0-3650          | How long is "a long time ago"?                       |
| days_since_last_dienst_end             | Number of days since the applicant's last service (*dienst*) ended. The look-back period is at most 365 days.                                                                                                                | 0-365       |                 |                                                      |
| has_medebewoner                        | Does the applicant have at least 1 co-occupant? (yes/no)                                                                                                                                                                     | 0-1         |                 |                                                      |
| avg_percentage_maatregel               | Average reduction percentage of all measures (*maatregelen*) in the year prior to application. This gives an indication of how serious the offences were.                                                                    |             | 0-100           | Check percentage range                               |
| total_vermogen                         | Sum of the applicant's assets (from Socrates Assets). If unknown, an asset value of 0 is used.                                                                                                                               |             |                 | Use income threshold to estimate                     |
| afspraken_no_show_count_last_year      | Number of appointments with the applicant in the year preceding this application at which the applicant did not appear.                                                                                                      |             | 0-15            | Allows slightly more than one per month              |
| has_partner                            | Does the applicant have a partner? (yes/no)                                                                                                                                                                                  | 0-1         |                 |                                                      |
| sum_inkomen_bruto_was_mean_imputed     | Was the applicant's gross income unknown, so that the average gross income of all applications in the dataset was used instead? (yes/no)                                                                                     |             | 0-1             |                                                      |
| applied_for_same_product_last_year     | Has the applicant already applied for the same product in the year prior to the application that he/she is now applying for? (yes/no)                                                                                        | 0-1         |                 |                                                      |
| received_same_product_last_year        | Has the applicant already received the same product in the year prior to the application that he/she is currently applying for? (yes/no)                                                                                     | 0-1         |                 |                                                      |
| afspraken_no_contact_count_last_year   | Number of appointments with the applicant in the year prior to this application where no contact could be made or the applicant did not respond.                                                                             |             | 0-15            | Allows slightly more than one per month              |
| sum_inkomen_bruto_value                | Sum of the gross amounts of all the applicant's incomes (from Socrates Income)                                                                                                                                               |             |                 | Use income threshold to estimate                     |
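The ranges in the table can be encoded as a lookup for sanity-checking real or synthetic rows. This is a sketch: the values mirror the table (given range where present, otherwise the estimate), and features without a settled range map to `None` and are skipped. `percentage_scale` is a hypothetical heuristic for the open 0-100 versus 0-1.0 question. A 0-100 column whose observed values all happen to be at most 1 would be misread.

```python
from typing import Dict, Iterable, Mapping, Optional, Tuple

# Given range where available, otherwise the estimated range.
# None = no range yet (e.g. to be derived from an income threshold).
FEATURE_RANGES: Dict[str, Optional[Tuple[float, float]]] = {
    "deelnames_started_percentage_last_year": (0, 100),
    "at_least_one_address_in_amsterdam": None,  # range still open
    "active_address_count": (0, 3),
    "days_since_last_relocation": (0, 3650),
    "days_since_last_dienst_end": (0, 365),
    "has_medebewoner": (0, 1),
    "avg_percentage_maatregel": (0, 100),
    "total_vermogen": None,
    "afspraken_no_show_count_last_year": (0, 15),
    "has_partner": (0, 1),
    "sum_inkomen_bruto_was_mean_imputed": None,
    "applied_for_same_product_last_year": (0, 1),
    "received_same_product_last_year": (0, 1),
    "afspraken_no_contact_count_last_year": (0, 15),
    "sum_inkomen_bruto_value": None,
}


def out_of_range(row: Mapping[str, float]) -> Dict[str, float]:
    """Return the features in `row` whose value lies outside the known range."""
    bad = {}
    for name, value in row.items():
        bounds = FEATURE_RANGES.get(name)
        if bounds is None:
            continue  # unknown feature or no range yet: nothing to check
        low, high = bounds
        if not low <= value <= high:
            bad[name] = value
    return bad


def percentage_scale(values: Iterable[float]) -> str:
    """Heuristic for a non-empty column: no value above 1.0 suggests a 0-1.0 scale."""
    return "0-1.0" if max(values) <= 1.0 else "0-100"
```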

**Synthetic data**:<br>
Generate synthetic data using Justin's notebook from the Rotterdam repo. Open question: what adjustments are needed?
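Justin's notebook is not reproduced here. As a stand-in baseline, and explicitly not Justin's method, the sketch below samples each feature independently and uniformly within a range. It ignores the real distributions and any dependence between features (for instance `has_partner` and `has_medebewoner` are plausibly related). `sample_rows` is a hypothetical helper.

```python
import random
from typing import AbstractSet, Dict, List, Mapping, Optional, Tuple


def sample_rows(
    ranges: Mapping[str, Tuple[float, float]],
    n: int,
    integer_features: AbstractSet[str] = frozenset(),
    seed: Optional[int] = None,
) -> List[Dict[str, float]]:
    """Draw `n` synthetic rows, sampling each feature independently and
    uniformly within its (low, high) range. Features in `integer_features`
    (counts, day counts, yes/no flags) get whole numbers."""
    rng = random.Random(seed)  # seeded generator for reproducible rows
    rows = []
    for _ in range(n):
        row = {}
        for name, (low, high) in ranges.items():
            if name in integer_features:
                row[name] = rng.randint(int(low), int(high))
            else:
                row[name] = rng.uniform(low, high)
        rows.append(row)
    return rows


# Example with two features, using ranges from the feature table.
example = sample_rows(
    {"has_partner": (0, 1), "days_since_last_dienst_end": (0, 365)},
    n=3,
    integer_features={"has_partner", "days_since_last_dienst_end"},
    seed=0,
)
```

Comparing model output on these uniform rows with output on Justin's synthetic data could show how much the sampling assumptions matter.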

**Run model**:<br>
Run the model using `predict_proba` to get the predicted probability of each class.
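A sketch of the scoring step, assuming the model object exposes `predict_proba` and `classes_` as scikit-learn classifiers and pipelines do. Rows are plain dicts. `feature_order` should match the columns the model was fitted on, for example the fitted pipeline's `feature_names_in_`. `score_rows` is a hypothetical helper.

```python
from typing import Any, Dict, List, Mapping, Sequence


def score_rows(
    model: Any,
    rows: Sequence[Mapping[str, float]],
    feature_order: Sequence[str],
) -> List[Dict[Any, float]]:
    """Run `model.predict_proba` on `rows` and label each probability with its class.

    In scikit-learn, the columns of `predict_proba` follow `model.classes_`.
    """
    X = [[row[name] for name in feature_order] for row in rows]
    proba = model.predict_proba(X)
    return [dict(zip(model.classes_, map(float, p))) for p in proba]
```

If the pipeline selects columns by name (for example a `ColumnTransformer` configured with column names), it needs a pandas DataFrame built with `columns=feature_order` rather than nested lists.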