# Generating Synthetic User Bias Test Data


#### This notebook demonstrates how to generate synthetic test data for simulating a bias evaluation across protected classes. This step is optional, because you may already have real scoring data to evaluate.

### Prerequisites:

#### To use this notebook, you'll need a `Client ID` and `Client Secret`. These will either have been provided to you, or you can generate them at https://app.fairnow.ai from the Admin menu. This notebook assumes you have these available to enter when prompted.

In [None]:
import json
import datetime
import random
import csv


from getpass import getpass
import httpx
from httpx_auth import OAuth2ClientCredentials

client_id = "{client_id}"  # Replace with your Client ID
client_secret = getpass("Client Secret")
fairnow_token_endpoint = "https://auth.fairnow.ai/oauth2/token"

auth = OAuth2ClientCredentials(
    token_url=fairnow_token_endpoint,
    client_id=client_id,
    client_secret=client_secret,
)

fairnow_base_url = "https://api.fairnow.ai/v2"

client = httpx.Client(base_url=fairnow_base_url, auth=auth)

### Data Requirements

#### The first row of each CSV data file must contain the column names.

#### The following columns are required:
* `Timestamp` (ISO 8601 timestamp, e.g. `2023-12-14T16:26:05.898156Z`)
* `Score` (a number between 0 and 1)
* Each of the protected class columns (see below)

#### Optionally, you can add up to 3 additional columns to use for grouping and filtering the bias results after testing.
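
#### The requirements above can be checked locally before a file is used. The following is a minimal sketch, not part of the Fairnow API: the function name `validate_scores_table` and its parameters are hypothetical, the score bounds are assumed to be inclusive, and the column names are passed in so they can be matched to your own file.

```python
from datetime import datetime


def validate_scores_table(header, rows, protected_class_columns,
                          timestamp_column="Timestamp", score_column="Score",
                          max_extra_columns=3):
    """Return a list of problems found; an empty list means no problems were detected."""
    required = [timestamp_column, score_column] + list(protected_class_columns)
    missing = [name for name in required if name not in header]
    if missing:
        return [f"missing required columns: {missing}"]

    problems = []
    extra = [name for name in header if name not in required]
    if len(extra) > max_extra_columns:
        problems.append(f"too many additional columns: {extra}")

    ts_index = header.index(timestamp_column)
    score_index = header.index(score_column)
    for line_number, row in enumerate(rows, start=2):  # line 1 is the header
        if len(row) != len(header):
            problems.append(f"line {line_number}: expected {len(header)} values, got {len(row)}")
            continue
        try:
            # Older Python versions reject a trailing "Z", so normalize it first.
            datetime.fromisoformat(str(row[ts_index]).replace("Z", "+00:00"))
        except ValueError:
            problems.append(f"line {line_number}: invalid timestamp {row[ts_index]!r}")
        try:
            score = float(row[score_index])
        except ValueError:
            problems.append(f"line {line_number}: score is not a number: {row[score_index]!r}")
        else:
            # Assumption: 0 and 1 themselves are accepted.
            if not 0.0 <= score <= 1.0:
                problems.append(f"line {line_number}: score {score} is outside 0..1")
    return problems
```

#### Call it with the header row, the data rows, and the protected class column names. Any returned messages point at the offending line of the file.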

### Look Up the Protected Class Column Names

#### The following code uses the Fairnow API to look up the column names that correspond to the protected classes used to evaluate bias. If you need a different set of column names for your test, please contact Fairnow.


In [None]:
reference_columns_route = "/reference/protected-class-columns"

response = client.get(reference_columns_route)

if response.status_code == 200:
    print(json.dumps(response.json(), indent=4))
else:
    print(f"Error: {response.status_code} - {response.text}")

#### Now pull the column names out of the response and generate some random data. In addition to the required `Timestamp` and `Score` columns and the protected class columns, the example contains two columns (`Color` and `Size`) that can be used to filter the results.

In [None]:
response.raise_for_status()  # stop here if the lookup request above failed
table_data = response.json()
protected_class_columns = [ column["id"] for column in table_data["table_data"]]
print(f"Protected class column names: {protected_class_columns}")

number_of_protected_class_values = 10
number_of_rows = 1000

required_columns = [ "Timestamp", "Score" ]
filter_columns = [ "Color", "Size"]
all_columns = required_columns + protected_class_columns + filter_columns

colors = [ "Blue", "Red", "Yellow", "Green", "Orange", "Purple" ]
sizes = [ "X-Small", "Small", "Medium", "Large", "X-Large" ]

data = [ all_columns ]  # the first row holds the column names
for _ in range(number_of_rows):
    row = [
        datetime.datetime.now(datetime.timezone.utc).isoformat().replace("+00:00", "Z"),  # ISO Timestamp
        random.random()                                                                   # Score
    ]
    for column in protected_class_columns:
        row.append(f"{column}-{random.randrange(number_of_protected_class_values)}")
    
    row.append(random.choice(colors))   
    row.append(random.choice(sizes))
    
    data.append(row)
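
#### The loop above stamps every row with the current time and produces different values on each run. If repeatable data or timestamps spread over a period would help, the same idea can be sketched with a seeded generator. The function `make_rows` and its parameters (`seed`, `days_back`, `values_per_class`) are hypothetical names chosen for illustration, and the filter columns are left out for brevity.

```python
import datetime
import random


def make_rows(protected_class_columns, number_of_rows, seed=0,
              days_back=30, values_per_class=10):
    """Build data rows (without a header) using a seeded random generator."""
    rng = random.Random(seed)  # private generator, so the global random state is untouched
    now = datetime.datetime.now(datetime.timezone.utc)
    rows = []
    for _ in range(number_of_rows):
        # Pick a moment somewhere in the last `days_back` days.
        offset = datetime.timedelta(seconds=rng.uniform(0, days_back * 86400))
        timestamp = (now - offset).isoformat().replace("+00:00", "Z")
        row = [timestamp, rng.random()]  # ISO timestamp, score in [0, 1)
        row.extend(f"{column}-{rng.randrange(values_per_class)}"
                   for column in protected_class_columns)
        rows.append(row)
    return rows
```

#### With the same seed, the scores and protected class values are identical between runs. The timestamps still depend on when the code is executed.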


#### Now create the CSV file in the local directory.

In [None]:
with open('scores.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerows(data)


#### Check that the file exists and contains the randomly generated data.


In [None]:
!cat scores.csv
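
#### The `!cat` command relies on a Unix-like shell and prints the whole file. As a hedged alternative, the sketch below reads back only the header and the first few rows with the standard `csv` module. The helper name `preview_csv` is hypothetical.

```python
import csv
from itertools import islice


def preview_csv(path, limit=5):
    """Return (header, first `limit` data rows) read from a CSV file."""
    with open(path, newline="") as csvfile:
        reader = csv.reader(csvfile)
        header = next(reader, None)  # None if the file is empty
        rows = list(islice(reader, limit))
    return header, rows


# Example usage once scores.csv has been written:
# header, rows = preview_csv("scores.csv")
# print(header)
# for row in rows:
#     print(row)
```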

#### You can now use this file for bias testing. See the `User Data Testing` notebook for instructions.