# Hypothesis

I am going to test how accurately the Perspective API assesses the toxicity of anti-LGBTQ comments versus non-anti-LGBTQ comments.

I hypothesize that the Perspective API will be biased in two ways: it will fail to flag anti-LGBTQ content as toxic more often than it misses other toxic content, and it will classify non-toxic LGBTQ comments as toxic more often than other non-toxic comments.

# Code

In [24]:
from googleapiclient import discovery
import json
import pandas as pd

In [25]:
comment_data = pd.read_csv("/content/toxic_comment_df.csv")


In [None]:
comment_data = comment_data.dropna()
comment_data

In [None]:
comment_data.iloc[:, 0].info()

In [28]:
comments = comment_data.iloc[:, 0].to_list()      # comment text
content_col = comment_data.iloc[:, 1].to_list()   # content group
class_col = comment_data.iloc[:, 2].to_list()     # human toxicity label


In [None]:

import os

# Read the key from the environment instead of hardcoding it in the notebook.
# PERSPECTIVE_API_KEY is an assumed variable name; set it before running this cell.
API_KEY = os.environ["PERSPECTIVE_API_KEY"]

# Build the client once rather than on every loop iteration.
client = discovery.build(
  "commentanalyzer",
  "v1alpha1",
  developerKey=API_KEY,
  discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
  static_discovery=False,
)

toxicity_df = []  # one toxicity score per comment (a plain list, despite the name)

for comment in comments:
  analyze_request = {
    'comment': {'text': comment},
    'requestedAttributes': {'TOXICITY': {}}
  }

  response = client.comments().analyze(body=analyze_request).execute()
  # Keep the score of the first span returned for the TOXICITY attribute.
  tox_score = response['attributeScores']['TOXICITY']['spanScores'][0]['score']['value']
  toxicity_df.append(tox_score)

toxicity_df
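
The loop above keeps the score of the first entry in `spanScores`. A Perspective API response also carries a comment-level `summaryScore` for each requested attribute, which may be the more natural value to threshold. The following is a minimal sketch of reading that field; the helper name `extract_toxicity` and the sample response are hypothetical, and the numbers in the sample are made up for illustration.

```python
def extract_toxicity(response, attribute="TOXICITY"):
    """Return the comment-level summary score from a Perspective API response dict."""
    return response["attributeScores"][attribute]["summaryScore"]["value"]


# Hypothetical response, trimmed to the fields used here (values are made up).
sample_response = {
    "attributeScores": {
        "TOXICITY": {
            "spanScores": [
                {"begin": 0, "end": 12, "score": {"value": 0.9, "type": "PROBABILITY"}}
            ],
            "summaryScore": {"value": 0.9, "type": "PROBABILITY"},
        }
    },
    "languages": ["en"],
}

print(extract_toxicity(sample_response))
```

Whether the two fields differ for a given comment depends on how the API splits it into spans, so switching fields could change some of the scores reported in this notebook.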

In [30]:
toxic_class_api = []

for i in toxicity_df:
  if i >= 0.5:
    toxic_class_api.append("toxic")
  else:
    toxic_class_api.append("non-toxic")


In [31]:
# Combine the comments, content group, human label, API score, and API label.
df = pd.DataFrame(
    list(zip(comments, content_col, class_col, toxicity_df, toxic_class_api)),
    columns=['comments', 'content', 'toxicity_class', 'score', 'api_class'],
)
df

Unnamed: 0,comments,content,toxicity_class,score,api_class
0,You are gay or antisemmitian? \n\nArchangel WH...,LGBTQ,toxic,0.767545,toxic
1,"""""== new ==\n\n{{userbox \n TABTAB| id = \n T...",LGBTQ,toxic,0.519808,toxic
2,Homosexuals are intent on legitimizing their b...,LGBTQ,toxic,0.338998,non-toxic
3,"Fuck you, block me, you faggot pussy!",LGBTQ,toxic,0.981763,toxic
4,u r a tw@ fuck off u gay boy.U r smelly.Fuck u...,LGBTQ,toxic,0.974994,toxic
5,the LGBTQ association is meeting on Monday. Pl...,LGBTQ,non-toxic,0.038991,non-toxic
6,pride parade on june 13th!!!! Marching through...,LGBTQ,non-toxic,0.028267,non-toxic
7,gay representation in the new movie!! Yay ! Th...,LGBTQ,non-toxic,0.519808,toxic
8,COCKSUCKER BEFORE YOU PISS AROUND ON MY WORK,neutral,toxic,0.916254,toxic
9,Stupid peace of shit stop deleting my stuff as...,neutral,toxic,0.981763,toxic
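
To make the disagreements in the rows displayed above easier to spot, here is a small sketch that lists the row indices where the human label and the thresholded API score differ. It uses only the index, content group, human label, and score copied from those ten rows. The helper names `label` and `mismatches` are hypothetical, and the alternative thresholds are illustrative choices, not values used elsewhere in this notebook.

```python
# (row index, content group, human label, API score) copied from the ten rows shown above.
rows = [
    (0, "LGBTQ", "toxic", 0.767545),
    (1, "LGBTQ", "toxic", 0.519808),
    (2, "LGBTQ", "toxic", 0.338998),
    (3, "LGBTQ", "toxic", 0.981763),
    (4, "LGBTQ", "toxic", 0.974994),
    (5, "LGBTQ", "non-toxic", 0.038991),
    (6, "LGBTQ", "non-toxic", 0.028267),
    (7, "LGBTQ", "non-toxic", 0.519808),
    (8, "neutral", "toxic", 0.916254),
    (9, "neutral", "toxic", 0.981763),
]


def label(score, threshold=0.5):
    """Apply the same rule as the notebook: scores at or above the threshold are toxic."""
    return "toxic" if score >= threshold else "non-toxic"


def mismatches(rows, threshold=0.5):
    """Return the indices of rows whose thresholded API label differs from the human label."""
    return [idx for idx, _, human, score in rows if label(score, threshold) != human]


print(mismatches(rows))        # threshold 0.5, as used in the notebook
print(mismatches(rows, 0.3))   # a lower, illustrative threshold
print(mismatches(rows, 0.6))   # a higher, illustrative threshold
```

In these ten rows, rows 1 and 7 have the same score but opposite human labels, so no single threshold classifies both of them correctly.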


In [None]:
from sklearn.metrics import accuracy_score

y_actual = [1 if y == 'toxic' else 0 for y in df['toxicity_class']]
y_predicted = [1 if y == 'toxic' else 0 for y in df['api_class']]
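# accuracy_score expects (y_true, y_pred); the value is the same either way,
# but the conventional order keeps the call consistent with other metrics.
accuracy = accuracy_score(y_actual, y_predicted)

print(f"Accuracy of the classifier = {accuracy}")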

In [33]:
lgbtq_indices = []
nonlgbtq_indices = []

for i in range(len(content_col)):
    # Strip whitespace so that both 'neutral' and 'neutral ' count as non-LGBTQ content.
    if content_col[i].strip() == 'neutral':
        nonlgbtq_indices.append(i)
    else:
        lgbtq_indices.append(i)

y_actual_lgbtq = [y_actual[i] for i in lgbtq_indices]
y_predicted_lgbtq = [y_predicted[i] for i in lgbtq_indices]

y_actual_nonlgbtq = [y_actual[i] for i in nonlgbtq_indices]
y_predicted_nonlgbtq = [y_predicted[i] for i in nonlgbtq_indices]

In [None]:
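def class_wise_acc(y_actual, y_predicted):
    """Return (accuracy on toxic comments, accuracy on non-toxic comments)."""
    total_p = total_n = 0  # number of comments labeled toxic / non-toxic
    TP = TN = 0            # correctly classified toxic / non-toxic comments
    for actual, predicted in zip(y_actual, y_predicted):
        if actual == 1:
            total_p += 1
            TP += int(predicted == 1)
        else:
            total_n += 1
            TN += int(predicted == 0)
    # Raises ZeroDivisionError if the input has no comments of one class.
    return TP / total_p, TN / total_n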

class_1_acc_lgbtq, class_0_acc_lgbtq = class_wise_acc(y_actual_lgbtq, y_predicted_lgbtq)
class_1_acc_nonlgbtq, class_0_acc_nonlgbtq = class_wise_acc(y_actual_nonlgbtq, y_predicted_nonlgbtq)

print(f"Class 1 (toxic) accuracy for LGBTQ content = {class_1_acc_lgbtq}")
print(f"Class 0 (non-toxic) accuracy for LGBTQ content = {class_0_acc_lgbtq}")
print(f"Class 1 (toxic) accuracy for non-LGBTQ content = {class_1_acc_nonlgbtq}")
print(f"Class 0 (non-toxic) accuracy for non-LGBTQ content = {class_0_acc_nonlgbtq}")
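
The hypothesis is stated in terms of two kinds of error: toxic comments that are missed (false negatives) and non-toxic comments that are flagged (false positives). The class-wise accuracies above are one minus those rates, so the same comparison can be expressed directly as error rates. The following is a minimal sketch; the function name `error_rates` is hypothetical, and the toy labels are made up for illustration and are not this notebook's data.

```python
def error_rates(y_actual, y_predicted):
    """Return (false_negative_rate, false_positive_rate); None when a class is absent."""
    pairs = list(zip(y_actual, y_predicted))
    positives = sum(1 for actual, _ in pairs if actual == 1)
    negatives = sum(1 for actual, _ in pairs if actual == 0)
    false_neg = sum(1 for actual, pred in pairs if actual == 1 and pred == 0)
    false_pos = sum(1 for actual, pred in pairs if actual == 0 and pred == 1)
    fnr = false_neg / positives if positives else None
    fpr = false_pos / negatives if negatives else None
    return fnr, fpr


# Toy labels for illustration only: 1 = toxic, 0 = non-toxic.
toy_actual = [1, 1, 1, 1, 0, 0]
toy_predicted = [1, 1, 1, 0, 0, 1]

fnr, fpr = error_rates(toy_actual, toy_predicted)
print(f"False negative rate = {fnr}, false positive rate = {fpr}")
```

With the notebook's variables, calling `error_rates(y_actual_lgbtq, y_predicted_lgbtq)` and `error_rates(y_actual_nonlgbtq, y_predicted_nonlgbtq)` would give the per-group rates. Returning `None` instead of dividing by zero is a design choice for groups that lack one of the two classes.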

# Results and Insights

As the output shows, accuracy is much lower for LGBTQ content than for non-LGBTQ content, both for toxic comments (class 1) and for non-toxic comments (class 0). The classifier's overall accuracy is 0.8823529411764706, which could lead users to believe that it is fairly accurate on average. Breaking the results down by content type tells a different story: accuracy is much lower for comments that mention the LGBTQ community, while comments that do not mention it are classified with perfect accuracy. I was not surprised by these findings, as machine learning algorithms often work less well for marginalized groups than they do for the general population.


Based on intuition, I feel that the Perspective API is likely biased in that it works better for majority groups (straight, cisgender, white, and male populations) than for marginalized groups. The Perspective API was trained on internet comments gathered from a wide variety of sources, and it relies on human raters to label those comments before they are fed to the algorithm. Because bias against marginalized communities is pervasive in American society, it is likely that bias is encoded into the labels given by human raters: they may be more likely to rate non-toxic content related to the LGBTQ community as toxic, and to overlook toxic comments directed towards the LGBTQ community. Crowdsourcing often works well for producing reliable factual information, but in opinion-based tasks such as rating the toxicity of a comment, it can reproduce the biases of the majority culture. The rating guidelines also instruct raters to err on the side of “Yes” or “I’m not sure” when rating. In situations concerning communities outside of the raters' knowledge, which are most likely marginalized communities because they are minorities of the population, there will therefore likely be more false positives, which could be contributing to the bias in the algorithm.



From this assignment, I have many questions about the standards and regulations that apply to machine learning methods. Currently, there does not seem to be any standard against which to measure bias or performance, because these algorithms are being used as products rather than in a research context where measurable standards must be met. Given this lack of standards, I am wondering how companies can be incentivized to decrease bias in their applications.



# References

Toxic comment classification challenge. Kaggle. (n.d.). https://www.kaggle.com/competitions/jigsaw-toxic-comment-classification-challenge/data