# **Hypothesis**: Perspective will be less likely to identify harmful tweets about women as toxic than harmful tweets about men.

The cell below calls the Perspective API, which returns a toxicity score for a piece of text. In this analysis, a score above 0.5 is treated as toxic and a score of 0.5 or below as non-toxic.

In [1]:
from googleapiclient import discovery
import json

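import os

# Read the API key from the environment instead of hardcoding it in the notebook.
# PERSPECTIVE_API_KEY is an assumed variable name; set it before running this cell.
API_KEY = os.environ["PERSPECTIVE_API_KEY"]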

client = discovery.build(
  "commentanalyzer",
  "v1alpha1",
  developerKey=API_KEY,
  discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
  static_discovery=False,
)

analyze_request = {
  'comment': {'text': 'Those guys are very passionate sports fans'},
  'requestedAttributes': {'TOXICITY': {}}
}

response = client.comments().analyze(body=analyze_request).execute()
print(response)

{'attributeScores': {'TOXICITY': {'spanScores': [{'begin': 0, 'end': 42, 'score': {'value': 0.02968135, 'type': 'PROBABILITY'}}], 'summaryScore': {'value': 0.02968135, 'type': 'PROBABILITY'}}}, 'languages': ['en'], 'detectedLanguages': ['en']}
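
The 0.5 cutoff can be applied directly to a response of this shape. The following is a minimal sketch: the helper name `is_toxic` and its `threshold` parameter are illustrative and not part of the Perspective client. It uses a strict `>` comparison, matching the `y > 0.5` rule applied to the dataset later in the notebook.

```python
def is_toxic(response, threshold=0.5):
    """Return True if the summary TOXICITY score exceeds the threshold."""
    score = response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]
    return score > threshold

# Response copied from the output above (span scores omitted for brevity).
sample_response = {
    "attributeScores": {
        "TOXICITY": {"summaryScore": {"value": 0.02968135, "type": "PROBABILITY"}}
    },
    "languages": ["en"],
}

print(is_toxic(sample_response))  # False
```

A score of exactly 0.5 counts as non-toxic under this rule.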


Load the dataset of phrases that were submitted to the API. For each phrase, it records the toxicity score the API returned (`Level`) and the actual classification of the phrase (`Actual Toxic/Not`).

In [3]:
import numpy as np
import pandas as pd

test_df = pd.read_csv("Data Bias Project test - Sheet1.csv")
test_df.head()

Unnamed: 0,Sex,Phrase,Level,Actual Toxic/Not
0,female,"""There is a reason the majority of success in ...",0.2,TOXIC
1,female,"“She was asking for it, she's such a slut”",0.8,TOXIC
2,female,Feminism is just a bunch of angry women who ca...,0.51,TOXIC
3,female,“She was ugly and fat anyway”,0.7,TOXIC
4,female,"I told her to just go have kids, men are smarter",0.27,TOXIC


In [4]:
y_actual = [1 if y == 'TOXIC' else 0 for y in test_df['Actual Toxic/Not']]

In [5]:
y_predicted = [1 if y > 0.5 else 0 for y in test_df['Level']]

Compute the overall accuracy of the API classifier.

In [6]:
from sklearn.metrics import accuracy_score

# accuracy_score expects the true labels first, then the predictions.
accuracy = accuracy_score(y_actual, y_predicted)

print(f"Accuracy of the classifier = {accuracy}")

Accuracy of the classifier = 0.84375


The classifier's accuracy is about 84%, which is not terrible, but it means the model labels roughly 16% of the phrases incorrectly.
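
The error rate follows directly from the accuracy. As a quick check, the sketch below converts the printed accuracy back into counts, assuming the dataset holds 32 phrases in total (16 per sex, as counted later in the notebook). The variable names are illustrative.

```python
accuracy = 0.84375   # value printed above
n_phrases = 32       # assumed total: 16 male + 16 female phrases

n_correct = round(accuracy * n_phrases)  # phrases labeled correctly
n_wrong = n_phrases - n_correct          # phrases labeled incorrectly
error_rate = 1 - accuracy

print(n_correct, n_wrong, error_rate)  # 27 5 0.15625
```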

To assess the model's fairness, split the data by sex and check that each group has the same number of observations, so that the per-group accuracies are comparable.

In [7]:

gender_column = test_df["Sex"]

# Collect the row positions that belong to each sex.
male_indices = [i for i, sex in enumerate(gender_column) if sex == "male"]
female_indices = [i for i, sex in enumerate(gender_column) if sex == "female"]

# Split the true labels and the predictions by group.
y_actual_male = [y_actual[i] for i in male_indices]
y_predicted_male = [y_predicted[i] for i in male_indices]

y_actual_female = [y_actual[i] for i in female_indices]
y_predicted_female = [y_predicted[i] for i in female_indices]

print(len(male_indices))
print(len(female_indices))

16
16


In [8]:
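def class_wise_acc(y_actual, y_predicted):
    """Return (accuracy on TOXIC examples, accuracy on NON TOXIC examples).

    Assumes both classes appear at least once in y_actual.
    """
    total_p = 0  # number of actually toxic examples
    total_n = 0  # number of actually non-toxic examples
    TP = 0       # toxic examples predicted as toxic
    TN = 0       # non-toxic examples predicted as non-toxic
    for actual, predicted in zip(y_actual, y_predicted):
        if actual == 1:
            total_p += 1
            if predicted == 1:
                TP += 1
        elif actual == 0:
            total_n += 1
            if predicted == 0:
                TN += 1
    return TP / total_p, TN / total_n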

class_1_acc_male, class_0_acc_male = class_wise_acc(y_actual_male, y_predicted_male)
class_1_acc_female, class_0_acc_female = class_wise_acc(y_actual_female, y_predicted_female)

print(f"Class 1 (i.e., TOXIC) accuracy for Male = {class_1_acc_male}")
print(f"Class 0 (i.e., NON TOXIC) accuracy for Male = {class_0_acc_male}")
print(f"Class 1 (i.e., TOXIC) accuracy for Female = {class_1_acc_female}")
print(f"Class 0 (i.e., NON TOXIC) accuracy for Female = {class_0_acc_female}")

Class 1 (i.e., TOXIC) accuracy for Male = 0.625
Class 0 (i.e., NON TOXIC) accuracy for Male = 1.0
Class 1 (i.e., TOXIC) accuracy for Female = 0.75
Class 0 (i.e., NON TOXIC) accuracy for Female = 1.0
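
Class 1 accuracy as computed here is what is usually called recall (the true positive rate), and class 0 accuracy is specificity (the true negative rate). With small groups it can help to see the counts behind each rate. The sketch below is one way to do that; `rate_with_counts` is a hypothetical helper, and the toy labels are invented for illustration rather than taken from the project data.

```python
def rate_with_counts(y_actual, y_predicted, label=1):
    """Return (hits, total, rate) over the examples whose true class is `label`."""
    pairs = [(a, p) for a, p in zip(y_actual, y_predicted) if a == label]
    total = len(pairs)
    hits = sum(1 for a, p in pairs if a == p)
    rate = hits / total if total else float("nan")  # avoid dividing by zero
    return hits, total, rate

# Toy labels, invented for illustration only (not the project data):
# eight toxic phrases, five of which were flagged, plus eight non-toxic phrases.
toy_actual = [1] * 8 + [0] * 8
toy_predicted = [1] * 5 + [0] * 3 + [0] * 8

print(rate_with_counts(toy_actual, toy_predicted, label=1))  # (5, 8, 0.625)
print(rate_with_counts(toy_actual, toy_predicted, label=0))  # (8, 8, 1.0)
```

Reporting `hits` and `total` next to the rate makes it clear how many phrases a given percentage difference represents.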


**Conclusion:**

The results surprised me. My hypothesis, that Perspective would be less likely to mark harmful tweets about women as toxic than harmful tweets about men, does not appear to hold on this data. Class 1 accuracy measures how often truly toxic tweets were flagged as toxic: it was 62.5% for the male group and 75% for the female group (5 of 8 versus 6 of 8 toxic phrases). On this dataset, then, the API was better at identifying toxic language about women than about men. Class 0 accuracy was 100% for both groups because Perspective correctly identified all of the non-toxic tweets.


One possible explanation for the difference in accuracy is societal differences in how language about the two groups is perceived. A phrase that is recognized as toxic when directed at women may be read as a joke when directed at men. The idea of toxic masculinity may contribute to this bias: if men are assumed to be "less sensitive" to such phrases, harmful phrases about them are more likely to be identified as non-toxic.

The way the model was created could have introduced these biases. In training the model, three raters rated each phrase on a toxicity scale, and the model learned from those ratings what is toxic and what is not. Gender biases held by the raters could therefore carry over into the model's scores for phrases like the ones above. Even when raters try to remain unbiased, unconscious biases can still affect the training data. How can these types of models be fixed so that societal biases do not interfere with the correct identification of data points?