# Assignment: Bias and Fairness Assessment of Models 

## Anoosha Valliani

### Deadline: 03/31/2023 

### TOTAL POINTS : 10 

The goal of this assignment is to explore the concept of bias through querying an existing natural language processing model — specifically, the Perspective API released by Google Jigsaw. For this assignment, you will need to examine a dataset of internet comments and their scores, in addition to formulating your own queries and using the Perspective API client to score the toxicity of each comment.

Content warning: the Perspective API is designed for use in content moderation, and the dataset contains very “toxic” comments. Some of the comments in this dataset are racist, sexist, ableist, and offensive. Some comments contain profane language. If you are uncomfortable with exposure to these comments in the manner of this assignment, please let me know by 03/24/2023 and we will work on finding an alternative approach.

Documentation for the Perspective API is available online.

 

## Step 1: Set up a Perspective API key

In order to access the Perspective API, you will need to register. First, create a Google Cloud account at console.cloud.google.com (we do not need to use any billing services for this project). Then, follow the Perspective API setup instructions. Once you receive a confirmation email, you will be able to find a unique API key in the Credentials tab of your Google Cloud console.
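Once the key exists, one way to keep it out of the notebook and out of the GitHub repo is to read it from an environment variable. The following is a minimal sketch, not part of the assignment requirements: the variable name `PERSPECTIVE_API_KEY` and the helper `build_analyze_url` are assumptions made for illustration, and the endpoint is the one used in the code cells later in this notebook.

```python
import os

# Endpoint as used in the request cells later in this notebook.
ANALYZE_ENDPOINT = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"


def build_analyze_url(env_var="PERSPECTIVE_API_KEY"):
    """Return the analyze URL with the API key read from the environment.

    The environment variable name is a hypothetical choice; any name works.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return f"{ANALYZE_ENDPOINT}?key={api_key}"
```

With the variable set in the shell before starting Jupyter, `build_analyze_url()` returns the full request URL and the key never appears in a saved cell.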

 

## Step 2: Explore the sample dataset to form hypotheses

Download the CSV file from Canvas (under Canvas->Files->Data_Bias_Assignment->Sample_labaled_data.csv) and open it in your preferred app (e.g., Excel or Jupyter). Examine the labels assigned by manual reviewers. You may wish to do some data parsing to find out things like the most common words used in comments where toxicity was present. Visual inspection of the comments may spark some ideas for how to test the API for potential biases.

 

## Step 3: Form hypotheses, design and perform tests [4 MARKS]

Decide how you would like to test the Perspective model for bias. Document your methods, all queries that you make to the API, and all scores received in a Jupyter notebook. 

You could examine many things in the data exploration step, but you will most likely want to determine a threshold for the model: the point x above which scores are considered toxic or abusive. Take a few examples from the sample data provided and determine the threshold. Setting a good threshold is important for the later steps.

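One possible way to pick the threshold, sketched here under assumptions, is to score a handful of labeled comments and keep the candidate threshold that agrees most often with the human labels. The helper names and the example scores below are made up for illustration; they are not results from the Perspective API.

```python
def accuracy_at(threshold, scores, labels):
    """Fraction of comments where (score > threshold) matches the human label."""
    correct = sum((s > threshold) == is_toxic for s, is_toxic in zip(scores, labels))
    return correct / len(scores)


def best_threshold(scores, labels, candidates=None):
    """Return the candidate threshold with the highest agreement.

    Ties go to the lowest candidate, because max() keeps the first maximum.
    """
    if candidates is None:
        candidates = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
    return max(candidates, key=lambda t: accuracy_at(t, scores, labels))


# Made-up scores and labels, for illustration only.
example_scores = [0.92, 0.81, 0.66, 0.37, 0.12, 0.05]
example_labels = [True, True, True, False, False, False]

chosen = best_threshold(example_scores, example_labels)
print(chosen, accuracy_at(chosen, example_scores, example_labels))
```

Accuracy is only one criterion. If false positives matter more than false negatives for the hypothesis being tested, a different objective may be more appropriate.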
For the testing step, you should develop a hypothesis about Perspective’s performance based on your understanding of how the API is trained and used. Whether or not your hypothesis is correct will have no bearing on your grade. Example hypotheses could be that Perspective will make more mistakes on shorter or more informal pieces of content (like tweets), that Perspective will be less likely to mark anti-male content as toxic when compared to anti-female content, or that Perspective will fail if we replace the most common swear words with less common obscenities. You will then develop your own (small) test set of N example comments, document the model scores, and assess whether or not your hypothesis was correct based on your sample. These examples may be freshly written by you or downloaded from the internet or even copied from the sample data provided.

Your tests do not need to be extensive or exhaustive; an N of low double digits is fine. However, we know that the larger the number of examples, the more accurate we are in our estimates of Perspective’s performance. How does a low sample size impact your results, and the conclusions we can draw from them?
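To make the effect of sample size concrete, a confidence interval on an observed rate can help. The sketch below uses the Wilson score interval, which is one common choice for small samples; the assignment does not prescribe it, and the function name is an assumption for illustration.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (z=1.96)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Same observed rate (80%), two different sample sizes.
small = wilson_interval(8, 10)
large = wilson_interval(80, 100)
print("N=10: ", small)
print("N=100:", large)
```

With the same observed rate, the interval for N=10 is much wider than the one for N=100, which is why conclusions drawn from a low-double-digit test set should be stated cautiously.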
 

## Step 4: Write about your results [4 MARKS]

Write a few paragraphs, either in the README or in the notebook, reflecting on what you have learned, what you found, what (if anything) surprised you about your findings, and/or what theories you have about why any biases might exist, if you find they exist. You can also include any questions this assignment raised for you about bias or machine learning. Questions you may wish to answer include:

- What biases do you think might exist in the model based on intuitions or public documentation about how the model was created?
- What were your results?
- What theories do you have about why your results are what they are?

## Step 5: Publication and submission [2 MARKS] 

Create a GitHub repository and upload your code. Add a README and a LICENSE for the repo. The README should include all of your documentation about your analysis (what you are testing, your hypotheses, and your results). The LICENSE should be an MIT License for your code.

On Canvas, submit a link to your GitHub repo, which should contain your code, your test example comments submitted to the API along with their scores (preferably as a CSV file), a README and a LICENSE.
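The test comments and their scores can be written to CSV with Python's standard `csv` module. The following is a minimal sketch: the column names, the helper `write_results`, and the example rows are all assumptions chosen for illustration, and the scores shown are placeholders rather than real API output.

```python
import csv
import io

FIELDNAMES = ["comment_text", "expected_toxic", "toxicity_score"]


def write_results(rows, fileobj):
    """Write one row per tested comment to an open text file object."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


# Placeholder rows for illustration only; not real API scores.
rows = [
    {"comment_text": "first example comment", "expected_toxic": "no", "toxicity_score": 0.12},
    {"comment_text": "second example comment", "expected_toxic": "yes", "toxicity_score": 0.87},
]

buffer = io.StringIO()
write_results(rows, buffer)
print(buffer.getvalue())
```

To write a real file instead of an in-memory buffer, pass a file opened with `open(path, "w", newline="", encoding="utf-8")`.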



## Hypothesis

The Perspective API will produce false positives: it will score some comments that human reviewers labeled non-toxic as toxic.
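This hypothesis is about false positives, so a natural summary statistic is the false positive rate: among comments labeled non-toxic by reviewers, the share that the API scores above the threshold. The sketch below assumes the 0.5 threshold used later in this notebook; the function name and the example values are made up for illustration.

```python
def false_positive_rate(scores, human_toxic, threshold=0.5):
    """Share of human-labeled non-toxic comments scored above the threshold."""
    non_toxic_scores = [s for s, toxic in zip(scores, human_toxic) if not toxic]
    if not non_toxic_scores:
        raise ValueError("need at least one comment labeled non-toxic")
    flagged = sum(s > threshold for s in non_toxic_scores)
    return flagged / len(non_toxic_scores)


# Made-up scores and labels, for illustration only.
example_scores = [0.10, 0.60, 0.90, 0.20]
example_human_toxic = [False, False, True, False]
print(false_positive_rate(example_scores, example_human_toxic))
```

A rate clearly above zero on a reasonably sized set of non-toxic comments would support the hypothesis; a rate of zero on a handful of comments would say very little either way.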

In [23]:
import pandas as pd
import requests
from collections import Counter
import re

# Load the labeled sample data; read_csv already returns a DataFrame
data = pd.read_csv("Sample_labaled_data.csv")
print(data)

# Count how many comments carry each label
toxic_labels = data["toxic"].value_counts()
print(toxic_labels)

# Tally the most common words in comments labeled toxic
toxic_comments = data[data["toxic"] == "yes"]["comment_text"].tolist()
words = []
for comment in toxic_comments:
    words += re.findall(r'\b\w+\b', comment.lower())
word_counts = Counter(words)
print(word_counts.most_common(1000))

       Unnamed: 0                id  \
0               5  0001ea8717f6de06   
1               7  000247e83dcc1211   
2              11  0002f87b16116a7f   
3              13  0003e1cccfd5a40a   
4              14  00059ace3e3e9a53   
...           ...               ...   
55247      153147  fff83b80284d8440   
55248      153149  fff8f521a7dbcd47   
55249      153150  fff8f64043129fa2   
55250      153151  fff9d70fe0722906   
55251      153154  fffa8a11c4378854   

                                            comment_text toxic  
0      Thank you for understanding I think very highl...    no  
1                         Dear god this site is horrible    no  
2       Somebody will invariably try to add Religion ...    no  
3        It says it right there that it IS a type The...    no  
4         Before adding a new product to the list mak...    no  
...                                                  ...   ...  
55247  Consensus for ruining Wikipedia I think that c...    no  
55248  shut

I am going to test the API with comments that were deemed toxic.

In [24]:
import os

# Read the key from an environment variable instead of hard-coding it.
# PERSPECTIVE_API_KEY is a name chosen here; any variable name works.
API_KEY = os.environ['PERSPECTIVE_API_KEY']
url = 'https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze?key=' + API_KEY
headers = {'Content-Type': 'application/json'}

# Comments that were deemed toxic
comments = [
    'Hey shithead stop vandilizing articles',
    'Being drunk made Gibson say what he really believes The little freak is a Nazi like his father and I hope he dies of lung cancer Soon',
    'where I feel like a man instead of an idiot',
    'no hes not he have his own show whihc is a cheap copy of ragu romeo show and hes only popular because of stupid people who auditions and watch this gutter show'
]

threshold = 0.5  # scores above this value count as toxic

for comment in comments:
    # Named payload so it does not overwrite the DataFrame called data
    payload = {
        'comment': {'text': comment},
        'languages': ['en'],
        'requestedAttributes': {'TOXICITY': {}},
        'doNotStore': True
    }

    # Send the request inside the loop so that every comment is scored
    response = requests.post(url, headers=headers, json=payload)
    response.raise_for_status()
    response_dict = response.json()
    toxicity_score = response_dict['attributeScores']['TOXICITY']['summaryScore']['value']
    print('Toxicity score:', toxicity_score)

    if toxicity_score > threshold:
        print('The comment is toxic.')
    else:
        print('The comment is not toxic.')

Recorded output (this run scored only the last comment in the list):

Toxicity score: 0.7105462
The comment is toxic.


Next, I test comments that are non-toxic but that the API might still score as toxic.

In [25]:
import os

# Read the key from an environment variable instead of hard-coding it.
# PERSPECTIVE_API_KEY is a name chosen here; any variable name works.
API_KEY = os.environ['PERSPECTIVE_API_KEY']
url = 'https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze?key=' + API_KEY
headers = {'Content-Type': 'application/json'}

# Non-toxic comments that could be mistaken for toxic ones
comments = [
    '  Ive removed the frivolous criticism section  ',
    '  Think youre missing the point  David and Goliath as is commonly used in the English language refers to the Biblical legend  The clothing store should be a disambiguation link in the main article   ',
    '   Thank you   James thank you for reverting that horrible vandalism from my user page   ',
    '  I dont think he belongs in this category  of course he believed that the world was created by God   that much is obvious But so does an overwhelming majority of Orthodox Jews  that doesnt mean that every Orthodox Jew who has an article should be in this category I think that only people who are publicly and specifically outspoken about the issue of creation vs evolution I dont think the Rebbe was known for that  '
]

threshold = 0.5  # scores above this value count as toxic

for comment in comments:
    # Named payload so it does not overwrite the DataFrame called data
    payload = {
        'comment': {'text': comment},
        'languages': ['en'],
        'requestedAttributes': {'TOXICITY': {}},
        'doNotStore': True
    }

    # Send the request inside the loop so that every comment is scored
    response = requests.post(url, headers=headers, json=payload)
    response.raise_for_status()
    response_dict = response.json()
    toxicity_score = response_dict['attributeScores']['TOXICITY']['summaryScore']['value']
    print('Toxicity score:', toxicity_score)

    if toxicity_score > threshold:
        print('The comment is toxic.')
    else:
        print('The comment is not toxic.')

Recorded output (this run scored only the last comment in the list):

Toxicity score: 0.14950264
The comment is not toxic.
