**Data Bias Coding Project**

For this project, I analyzed how accurately Perspective API detects the toxicity of gender stereotypes. I compared the toxicity scores of stereotypes against males with those of stereotypes against females to see whether Perspective API treats both categories as equally toxic. My dataset contains 32 statements: 8 that are toxic towards males, 8 that are toxic towards females, and, for comparison, 8 that are non-toxic towards males and 8 that are non-toxic towards females.

Hypothesis: Perspective API would detect gender stereotypes against females as more toxic than stereotypes against males.

Below is the block of code that I used to get the toxicity score for my statements. I reused the same block for every statement, entering the statements one at a time to obtain each toxicity score.

In [72]:
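import os

from googleapiclient import discovery

# Read the key from the environment instead of hard-coding it in the notebook.
# PERSPECTIVE_API_KEY is an assumed variable name; any name works.
API_KEY = os.environ["PERSPECTIVE_API_KEY"]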

client = discovery.build(
  "commentanalyzer",
  "v1alpha1",
  developerKey=API_KEY,
  discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
  static_discovery=False,
)

analyze_request = {
  'comment': { 'text': 'The only job a woman should have is to have kids and take care of her house.' },
  'requestedAttributes': {'TOXICITY': {}}
}

response = client.comments().analyze(body=analyze_request).execute()
print(response)

{'attributeScores': {'TOXICITY': {'spanScores': [{'begin': 0, 'end': 76, 'score': {'value': 0.32128486, 'type': 'PROBABILITY'}}], 'summaryScore': {'value': 0.32128486, 'type': 'PROBABILITY'}}}, 'languages': ['en'], 'detectedLanguages': ['en']}
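
Entering the statements one at a time works, but the same request can also be wrapped in a loop. The following is only a minimal sketch, not the code used for this project: `extract_toxicity`, `score_statements`, and `pause_seconds` are hypothetical names, and `client` is assumed to be the object built by `discovery.build(...)` above.

```python
import time


def extract_toxicity(response):
    """Pull the overall TOXICITY score out of an analyze() response dict."""
    return response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]


def score_statements(client, statements, pause_seconds=1.0):
    """Score each statement in turn and return the scores in the same order.

    `client` is assumed to be the object returned by discovery.build(...).
    `pause_seconds` is a hypothetical delay between requests to stay under
    the API quota; adjust it to whatever the quota allows.
    """
    scores = []
    for text in statements:
        analyze_request = {
            "comment": {"text": text},
            "requestedAttributes": {"TOXICITY": {}},
        }
        response = client.comments().analyze(body=analyze_request).execute()
        scores.append(extract_toxicity(response))
        time.sleep(pause_seconds)
    return scores
```

Calling `score_statements(client, list_of_statements)` would return one score per statement, which could then be pasted or written into the score column of the table.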


After getting the toxicity scores, I created a table in Excel with the headings "Sensitive Attribute", "Text", "Expected Label", and "Perspective API Score". Here, "Expected Label" records whether I judged the statement to be toxic or not, and "Perspective API Score" is the score the API returned when I entered the statement into the code block above.

Below, I define a function called create_label, which fills the column "API Predicted Label" with "Toxic" or "Non-toxic" depending on whether the Perspective API Score is above 0.5. For the purpose of this project, scores above 0.5 are considered toxic and scores of 0.5 or below are considered non-toxic.

In [78]:
import numpy as np
import pandas as pd

test_df = pd.read_csv("Stereotypes3 - Sheet1 (2).csv")


def create_label(score):
  if score>0.5:
    return "Toxic"
  else:
    return "Non-toxic"


test_df["API Predicted Label"]= test_df["Perspective API Score"].apply(create_label)
test_df.head()


Unnamed: 0,Sensitive Attribute,Text,Expected Label,Perspective API Score,API Predicted Label
0,Male,\nI hate the term “toxic masculinity”. Men hav...,Toxic,0.34,Non-toxic
1,Male,Real men don't cry; they tough it out.,Toxic,0.08,Non-toxic
2,Male,All men are obsessed with sports and care abou...,Toxic,0.18,Non-toxic
3,Male,Men are solely driven by their sexual desires ...,Toxic,0.45,Non-toxic
4,Male,Men are aggressive and always ready to resort ...,Toxic,0.35,Non-toxic


In [81]:
# index=False keeps the row index from being written as an extra "Unnamed: 0" column
test_df.to_csv("Gender_Stereotypes.csv", index=False)

In [74]:
from sklearn.metrics import accuracy_score

# Encode the labels as 1 (toxic) and 0 (non-toxic)
y_actual = [1 if y == "Toxic" else 0 for y in test_df['Expected Label']]
y_predicted = [1 if y == "Toxic" else 0 for y in test_df['API Predicted Label']]

# accuracy_score takes the true labels first, then the predictions
accuracy = accuracy_score(y_actual, y_predicted)

print(f"Accuracy of the classifier = {accuracy}")

Accuracy of the classifier = 0.59375


In [75]:
print(test_df.columns)


Index(['Sensitive Attribute ', 'Text', 'Expected Label',
       'Perspective API Score', 'API Predicted Label'],
      dtype='object')


In the blocks of code below, I compare the expected labels with the API predicted labels. This allows me to see how accurate Perspective API was in detecting gender stereotypes for both males and females.

In [76]:


gender_column = test_df["Sensitive Attribute "]
male_indices = []
female_indices = []


for i in range(len(gender_column)):
    if gender_column[i] == "Male":
        male_indices.append(i)
    elif gender_column[i] == "Female":
        female_indices.append(i)


y_actual_male = [y_actual[i] for i in male_indices]
y_predicted_male = [y_predicted[i] for i in male_indices]

y_actual_female = [y_actual[i] for i in female_indices]
y_predicted_female = [y_predicted[i] for i in female_indices]

print (len(male_indices))
print (len(female_indices))

16
16


In [77]:
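def class_wise_acc(y_actual, y_predicted):
    """Return (accuracy on toxic statements, accuracy on non-toxic statements)."""
    total_p = 0  # number of statements that are actually toxic
    total_n = 0  # number of statements that are actually non-toxic
    TP = 0       # toxic statements that were predicted toxic
    TN = 0       # non-toxic statements that were predicted non-toxic
    for actual, predicted in zip(y_actual, y_predicted):
        if actual == 1:
            total_p += 1
            if predicted == 1:
                TP += 1
        elif actual == 0:
            total_n += 1
            if predicted == 0:
                TN += 1
    return TP / total_p, TN / total_n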

class_1_acc_male, class_0_acc_male = class_wise_acc(y_actual_male, y_predicted_male)
class_1_acc_female, class_0_acc_female = class_wise_acc(y_actual_female, y_predicted_female)

print (f"Stereotypes category 1 Toxic Comments (Male Class) = {class_1_acc_male}")
print (f"Stereotypes category 2 Non Toxic Comments (Male Class) = {class_0_acc_male}")
print (f"Stereotypes category 3 Toxic Comments (Female Class) = {class_1_acc_female}")
print (f"Stereotypes category 4 Non Toxic Comments (Female Class) = {class_0_acc_female}")

Stereotypes category 1 Toxic Comments (Male Class) = 0.0
Stereotypes category 2 Non Toxic Comments (Male Class) = 1.0
Stereotypes category 3 Toxic Comments (Female Class) = 0.375
Stereotypes category 4 Non Toxic Comments (Female Class) = 1.0
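
As a quick sanity check, the four per-category rates can be reconciled with the overall accuracy reported earlier. The sketch below is plain arithmetic on numbers already shown in this notebook: it assumes the 8 statements per category described in the introduction and converts each printed rate back into a count of correct predictions. The `results` dictionary is an illustrative name, not part of the original analysis.

```python
# (correct predictions, number of statements) per category, derived from the
# rates printed above and the 8 statements in each category
results = {
    ("Male", "Toxic"): (0, 8),        # 0.0 * 8 = 0
    ("Male", "Non-toxic"): (8, 8),    # 1.0 * 8 = 8
    ("Female", "Toxic"): (3, 8),      # 0.375 * 8 = 3
    ("Female", "Non-toxic"): (8, 8),  # 1.0 * 8 = 8
}

correct = sum(c for c, _ in results.values())
total = sum(n for _, n in results.values())
overall_accuracy = correct / total

print(f"{correct} of {total} correct, accuracy = {overall_accuracy}")
# prints: 19 of 32 correct, accuracy = 0.59375
```

The result matches the accuracy of 0.59375 computed with accuracy_score, so all of the API's errors come from the toxic statements.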


**Insights**

From the above results, we can see that the API did not detect any of the toxic comments against males. On the other hand, it detected 37.5% of the toxic comments directed towards females. For both males and females, it correctly labeled all of the non-toxic comments as non-toxic.

These results are consistent with my initial hypothesis: Perspective API labeled gender stereotypes against females as toxic more often than those against males (3 of 8 statements versus 0 of 8).

However, we should also consider that the API only detected 37.5% of toxic comments against females. When we look at the big picture, this is a very small percentage. The API still missed the remaining 62.5% of toxic comments against women. Overall, a more accurate conclusion to make is that Perspective API cannot accurately detect statements that contain gender stereotypes as toxic according to the parameters of toxicity that were used in this study.

From this project, I learned that Perspective API is not very accurate in detecting casual sexism in sentences, especially when it is directed towards males. This might be because, although these statements are problematic as they promote gender stereotypes, they are too casual to be detected as "toxic" by the API.

Future research could lower the cutoff above which a statement is considered toxic from 0.5 to a smaller value such as 0.3. This would give a better indication of how differently the API treats stereotypes against females and stereotypes against males. Another improvement would be to use a larger and more diverse dataset of statements that better represents gender stereotypes in everyday language. My dataset is limited because I wrote the statements from memory of common stereotypes instead of collecting real statements made by people.
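
The effect of a lower cutoff can be sketched with the few scores that are visible in this notebook. The example below is only an illustration: `flagged_fraction` is a hypothetical helper, and it uses just the five male-directed toxic statements shown in the test_df.head() output rather than the full dataset.

```python
def flagged_fraction(scores, threshold):
    """Fraction of scores strictly above the threshold (same rule as create_label)."""
    flagged = [s for s in scores if s > threshold]
    return len(flagged) / len(scores)


# Scores of the five toxic male-directed statements shown in test_df.head()
male_toxic_scores = [0.34, 0.08, 0.18, 0.45, 0.35]

for threshold in (0.5, 0.4, 0.3):
    rate = flagged_fraction(male_toxic_scores, threshold)
    print(f"threshold {threshold}: {rate:.0%} flagged")
# prints:
# threshold 0.5: 0% flagged
# threshold 0.4: 20% flagged
# threshold 0.3: 60% flagged
```

A fuller version would apply the same function to the non-toxic statements as well, since a lower cutoff might start labeling some of them as toxic.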

Overall, this project proved to be very insightful for understanding how Perspective API can be used. There is clearly an important use for this API in detecting toxicity in statements. The findings highlight the importance of leveraging advanced technologies, such as Perspective API, to enhance the assessment and management of potentially harmful content. Perspective API can help online platforms provide a safe and healthy environment for a diverse group of people.