## Running FairNow's Synthetic Fairness Simulation

#### FairNow's Synthetic Fairness Simulation evaluates a model for bias without using real data. The simulation takes synthetically generated candidate resumes and creates variants of each resume that belong to different demographic groups. We can then evaluate bias by comparing the scores the model assigns to resumes from each demographic group.

In [None]:
import os
import requests
import json
import zipfile
from time import sleep

### Prerequisites:

#### To use this notebook, you'll need a `client_id` and `client_secret`. These will either have been provided to you, or you can generate them at https://app.fairnow.ai by going to the Admin menu. This notebook assumes you have these stored as environment variables:

* FAIRNOW_CLIENT_ID
* FAIRNOW_CLIENT_SECRET

#### To run this, you will need a `model_id` and `version` for the specific model you want to test. Details on how to create and look up models can be found here: https://github.com/FairNow/API-Guides/blob/main/notebooks/Models%20API.ipynb

#### Finally, you'll need a `bucket` value for generating the synthetic data, which will be provided to you or can be configured within the app.
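
#### Before going further, it can help to confirm that both environment variables are actually set, since `os.getenv` silently returns `None` for a missing variable. The check below is a minimal sketch; the helper name `missing_env_vars` is hypothetical and not part of any FairNow tooling.

```python
import os

# The two variables this notebook expects (names taken from the list above)
REQUIRED_ENV_VARS = ("FAIRNOW_CLIENT_ID", "FAIRNOW_CLIENT_SECRET")


def missing_env_vars(names=REQUIRED_ENV_VARS, environ=None):
    """Return the names from `names` that are unset or empty in `environ`."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


missing = missing_env_vars()
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
else:
    print("All required environment variables are set.")
```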

In [None]:
# Get the client ID and secret needed for OAuth 2.0:
client_id = os.getenv("FAIRNOW_CLIENT_ID")
client_secret = os.getenv("FAIRNOW_CLIENT_SECRET")

model_id = "{model_id}" # Replace with the correct modelId
version = "{version}" # Replace with the correct version

bucket = "{bucket}" # Replace with the correct bucket

#### First, let's get an access token:

In [None]:
access_token = None

# Call the Auth endpoint to request a token:
fairnow_token_endpoint = "https://auth.fairnow.ai/oauth2/token"
scope = "https://auth.fairnow.ai/FULL_ACCESS"

token_request_data = {
    'grant_type': 'client_credentials',
    'client_id': client_id,
    'client_secret': client_secret,
    'scope': scope
}

try:
    response = requests.post(fairnow_token_endpoint, data=token_request_data)
    if response.status_code == 200:
        access_token = response.json().get('access_token')
        print('Successfully created token')
    else:
        print(f'Error: {response.status_code} - {response.text}')
        print(response)


except Exception as e:
    print(f'Request failed: {e}')

#### Set up headers and endpoints that we will be using:

In [None]:
headers = {"Authorization": f"Bearer {access_token}", "Accept": "application/json"}

fairnow_api = "https://api.fairnow.ai/v1"
url = f"{fairnow_api}/syntheticData"
post_scores_url = f"{fairnow_api}/syntheticData/scores"

#### To start the process, call `POST /v1/syntheticData`. This returns the `taskId` used to track the job while the synthetic data is generated in the background.

#### The API has six arguments:
  - `dataType` refers to the type of analysis job, which will be 'resume' in your case.
  - `subType` refers to the specific bucket you want to test. Valid values for `subType` vary by customer and can be found in the app.
  - `nResumesPerTemplate` refers to the number of resumes to create from each template. This value must be passed as a string.
  - `modelId` and `version` refer to your model. These can be accessed through the GET models API in the Models API notebook in this repo.
  - `threshold` refers to the model score used to determine pass/fail.
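
#### The arguments above can be assembled with a small helper that handles the string conversion for `nResumesPerTemplate`. This is only a sketch: the function `build_synthetic_data_payload` is hypothetical, and the check that `threshold` lies between 0 and 1 is an assumption based on model scores being in that range, not a documented API rule.

```python
def build_synthetic_data_payload(bucket, model_id, version,
                                 n_resumes_per_template=5, threshold=0.5):
    """Build the request body for the synthetic data endpoint (hypothetical helper)."""
    # Assumption: the threshold is a model score, so it should fall in [0, 1]
    if not 0 <= threshold <= 1:
        raise ValueError(f"threshold must be between 0 and 1, got {threshold}")
    if int(n_resumes_per_template) < 1:
        raise ValueError("n_resumes_per_template must be at least 1")

    return {
        "dataType": "resume",
        "subType": bucket,
        # The API expects this count as a string, not an integer
        "nResumesPerTemplate": str(n_resumes_per_template),
        "modelId": model_id,
        "version": version,
        "threshold": threshold,
    }


example_payload = build_synthetic_data_payload("{bucket}", "{model_id}", "{version}")
print(example_payload)
```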

In [None]:
payload = {
    "dataType": "resume",
    "subType": bucket,
    "nResumesPerTemplate": "5",
    "modelId": model_id,
    "version": version,
    "threshold": 0.5
}

response = requests.post(url, headers=headers, json=payload)
print(json.dumps(response.json(), indent=4))

task_id = response.json()["taskId"]

#### The synthetic data generation task runs in the background and can take a few minutes. Use the code below to poll the API until the synthetic data is ready to download.

In [None]:
get_task_url = f"{fairnow_api}/syntheticData/tasks/{task_id}"

response = requests.get(get_task_url, headers=headers)
current_status = response.json()['task']['status']

while current_status == 'CREATING_DATA':
    print('Polling task table to learn status every 15 seconds')
    response = requests.get(get_task_url, headers=headers)
    current_status = response.json()['task']['status']
    print(f'Task status: `{current_status}`')
    print()
    sleep(15)

download_scores_presigned_url = response.json()["task"]["presignedUrlSyntheticData"]
    
print('Synthetic data has been created.')
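
#### A polling loop like the one above runs forever if the task never leaves its in-progress status. One possible variation, sketched below, adds a timeout. The helper `poll_until` and its parameters are hypothetical; it takes the status lookup as a callable so it makes no assumptions about the API beyond what this notebook already shows.

```python
import time


def poll_until(fetch_status, is_done, interval_seconds=15,
               timeout_seconds=1800, sleep_fn=time.sleep):
    """Call `fetch_status` until `is_done(status)` is true or the timeout passes."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = fetch_status()
        if is_done(status):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Gave up waiting; last status was {status!r}")
        sleep_fn(interval_seconds)


# Demonstration with canned statuses instead of real API calls
canned = iter(["CREATING_DATA", "CREATING_DATA", "DONE_FOR_DEMO"])
final_status = poll_until(
    fetch_status=lambda: next(canned),
    is_done=lambda status: status != "CREATING_DATA",
    sleep_fn=lambda seconds: None,  # skip real sleeping in the demo
)
print(final_status)
```

#### In this notebook, `fetch_status` could wrap the `requests.get(get_task_url, headers=headers)` call and return `response.json()['task']['status']`.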

#### Once the synthetic data is ready, the API will return a link for you to download the resumes as a zip file. The code below downloads and unzips the files.

In [None]:
zip_file_path = 'temp_resumes.zip'
extract_dir = 'temp_resumes'
response = requests.get(download_scores_presigned_url)
if response.status_code == 200:
    with open(zip_file_path, 'wb') as file:
        file.write(response.content)

    with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
        zip_ref.extractall(extract_dir)
    print(f"File extracted to {extract_dir}")
else:
    print("Failed to download the file.")

#### This is where you'll score each of the resumes with your model and collect the model scores. To run the analysis, you'll need to write the model scores to a csv file with one `{file_name},{model_score}` record per line.

#### Here's an example of the csv format:

```
resume_template_1_Female_Asian_1.txt,0.4651875127622357
resume_template_1_Female_Asian_2.txt,0.9039321758324333
...
resume_template_36_Male_White_5.txt,0.54982361689192743
```

**Note:** The csv file should have no header and must contain a record for every file found in the downloaded zip file. Each `model_score` must be a number between 0 and 1.
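
#### A file in this format could be produced along the following lines. This is a sketch under a few assumptions: the extracted resumes are plain-text files, `{file_name}` is the base file name rather than a path, and `score_resume` is a hypothetical placeholder that you would replace with a call to your own model.

```python
import csv
import os


def score_resume(text):
    """Hypothetical placeholder: replace with a call to your own model."""
    raise NotImplementedError("Plug in your model here")


def write_scores_csv(resume_dir, scores_path, score_fn):
    """Score every file under `resume_dir` and write headerless name,score rows."""
    rows = []
    for root, _dirs, files in os.walk(resume_dir):
        for name in sorted(files):
            with open(os.path.join(root, name), encoding="utf-8") as f:
                score = float(score_fn(f.read()))
            # The scores file only accepts values between 0 and 1
            if not 0 <= score <= 1:
                raise ValueError(f"Score for {name} is out of range: {score}")
            rows.append((name, score))

    with open(scores_path, "w", newline="") as f:
        csv.writer(f, lineterminator="\n").writerows(rows)
    return len(rows)


# Example call using the directory from the download step:
# write_scores_csv("temp_resumes", "scores.csv", score_resume)
```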

#### Once the file is ready, call the API below to receive a link that you'll use to upload the scores.

In [None]:
post_scores_url = f"{fairnow_api}/syntheticData/scores/{task_id}"

response = requests.post(post_scores_url, headers=headers)

presigned_url = response.json()["task"]["uploadURL"]
key = response.json()["task"]["key"]
fields = response.json()["task"]["fields"]

#### Use the link to upload the scores. Uploading the scores triggers the analysis job, which again runs in the background and can take a few minutes.

In [None]:
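scores_path = 'scores.csv'

# Open in binary mode and close the file handle once the upload finishes
with open(scores_path, 'rb') as scores_file:
    files = {'file': (key, scores_file)}
    response = requests.post(presigned_url, data=fields, files=files)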

In [None]:
print(response)

#### We'll query the API again to know when the analysis has finished.

In [None]:
get_task_url = f"{fairnow_api}/syntheticData/tasks/{task_id}"

response = requests.get(get_task_url, headers=headers)
current_status = response.json()['task']['status']

while current_status != 'READY':
    print('Polling task table to learn status every 15 seconds')
    response = requests.get(get_task_url, headers=headers)
    current_status = response.json()['task']['status']
    print(f'Task status: `{current_status}`')
    print()
    sleep(15)

print('Analysis results ready to download.')

analysis_download_presigned_url = response.json()["task"]["presignedUrlAnalysisResults"]

#### Once the analysis task is ready, it returns a presigned link you can use to download the analysis results. The output is a csv with the average model score by race and gender.

In [None]:
response = requests.get(analysis_download_presigned_url)
response.raise_for_status()  # Stop here if the download failed

with open('results.csv', 'wb') as file:
    file.write(response.content)

In [None]:
!cat results.csv
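
#### As a local sanity check, the average score by gender and race can also be computed directly from the rows of `scores.csv`. The sketch below assumes file names follow the pattern seen in the example above (`resume_template_{n}_{Gender}_{Race}_{i}.txt`); that pattern is inferred from the sample rows, not from a documented specification, and `mean_score_by_group` is a hypothetical helper. It does not reproduce FairNow's analysis.

```python
import re
from collections import defaultdict

# Assumed naming pattern, inferred from the example csv rows
FILE_NAME_PATTERN = re.compile(r"^resume_template_(\d+)_([^_]+)_(.+)_(\d+)\.txt$")


def mean_score_by_group(rows):
    """Average the scores in (file_name, score) rows per (gender, race) group."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [sum of scores, count]
    for file_name, score in rows:
        match = FILE_NAME_PATTERN.match(file_name)
        if match is None:
            raise ValueError(f"Unexpected file name: {file_name}")
        _template, gender, race, _index = match.groups()
        entry = totals[(gender, race)]
        entry[0] += float(score)
        entry[1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


# Example call on the scores file written earlier:
# import csv
# with open("scores.csv", newline="") as f:
#     print(mean_score_by_group(csv.reader(f)))
```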