## Perspective API Exploration
First, we have a dataset of Wikipedia comments made available by Jigsaw, a subsidiary of Google that created the Perspective tool. The dataset includes a unique comment id, the text of the comment, and a series of binary labels applied by human raters: "toxic," "severe_toxic," "obscene," "threat," "insult," and "identity_hate." I have appended the "score" column, which represents the toxicity score assigned to the comment text by the live version of the Perspective API. The data is available under a CC0 license.

In [15]:
import pandas as pd
import time

df = pd.read_csv('labeled_and_scored_comments.csv')

In [16]:
df.sort_values(['score'])

Unnamed: 0,id,comment_text,score,toxic,severe_toxic,obscene,threat,insult,identity_hate
1158,61a85266de70cca0,Можешь говорить по русски.,0.000004,0,0,0,0,0,0
5246,030a9ba0b32b4456,Rodzina Krusenstern Potomkowie Admirała Adama ...,0.000009,0,0,0,0,0,0
11040,a6814193ac6b4705,The opening paragraph states that the game is ...,0.000091,0,0,0,0,0,0
24192,f0d6dee96e057a5b,The book is used as a source in a number of ar...,0.000116,0,0,0,0,0,0
2332,cbb9268d3c7429c1,Vals Vienés \n |Rumba \n |Cha Cha Chá(ciclo 7),0.000131,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...
19994,1429a2c930688317,you fucking piece of shit,0.997278,1,1,1,0,1,0
28728,d82e45eb103ad07b,fuck off you stupid aspy asshole,0.997736,1,1,1,0,1,0
25535,e9b92c6c6b01aeef,FUCK YOU YOU PIECE OF SHIT \n FUCK YOU YOU PIE...,0.997982,1,1,1,0,1,0
25945,e783fd267f3a9d3b,FUCK WIKIPEDIA ON WHEELS! \n\nFuck off wikiped...,0.998136,1,1,1,0,1,0


This is the function that will be used to make calls to the Perspective API for my own testing. I generated my own API key and inserted it into API_KEY.

In [45]:
from googleapiclient.discovery import build
import json

def get_toxicity_score(comment):
    """Return the Perspective API TOXICITY summary score (0 to 1) for a comment."""

    API_KEY = 'YOUR_API_KEY'  # Put your API key here; do not publish a real key

    # Build a client for the Comment Analyzer (Perspective) API.
    client = build(
        "commentanalyzer",
        "v1alpha1",
        developerKey=API_KEY,
        discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
    )

    # Request only the TOXICITY attribute for the given text.
    analyze_request = {
        'comment': {'text': comment},
        'requestedAttributes': {'TOXICITY': {}}
    }

    response = client.comments().analyze(body=analyze_request).execute()
    toxicity_score = response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

    return toxicity_score
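
The request body and the response parsing can be separated from the network call, which makes them easy to check without an API key. This is only a minimal sketch: the helper names `build_analyze_request` and `extract_summary_score` are hypothetical, and the example response is hand-written to mirror the fields that the function above reads, not captured from the live API.

```python
def build_analyze_request(comment, attributes=("TOXICITY",)):
    """Build a request body in the same shape used by get_toxicity_score."""
    return {
        "comment": {"text": comment},
        "requestedAttributes": {name: {} for name in attributes},
    }


def extract_summary_score(response, attribute="TOXICITY"):
    """Pull the summary score for one attribute out of an analyze response."""
    return response["attributeScores"][attribute]["summaryScore"]["value"]


# Hand-written response with an illustrative value (assumption, not live output).
example_response = {
    "attributeScores": {
        "TOXICITY": {"summaryScore": {"value": 0.27690583}}
    }
}

example_request = build_analyze_request("ur mom")
example_score = extract_summary_score(example_response)
print(example_request)
print(example_score)
```

Keeping these two steps as plain functions means the dictionary layout can be tested on its own, while the client call stays the only part that needs a key and a network connection.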

After looking through the data, I set the toxicity threshold at a score of 0.5, since the scores range from 0 (not toxic at all) to 1 (extremely toxic). I treat 0.5 as the point at which a comment stops being neutral and becomes offensive.
Additionally, I will only be viewing comments that are considered toxic, threats, or insults.

In [42]:
result = df[(df['score']>= 0.50)]
result

Unnamed: 0,id,comment_text,score,toxic,severe_toxic,obscene,threat,insult,identity_hate
2,0002f87b16116a7f,"""::: Somebody will invariably try to add Relig...",0.667964,0,0,0,0,0,0
8,00091c35fa9d0465,"== Arabs are committing genocide in Iraq, but ...",0.635929,1,0,0,0,0,0
9,000968ce11f5ee34,Please stop. If you continue to vandalize Wiki...,0.604778,0,0,0,0,0,0
12,000bafe2080bba82,. \n\n Jews are not a race because y...,0.628810,0,0,0,0,0,0
17,001068b809feee6b,""" \n\n ==balance== \n This page has one senten...",0.729625,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...
41317,9495e55ccf912a2a,"""\n\n """"Endocinal Jubachina System"""" \n\nThis ...",0.647981,0,0,0,0,0,0
41325,948cb4052152869b,WHAT? That's past stuff. Stop being an insti...,0.515460,1,0,0,0,0,0
41330,948628822e4dd6c7,"""\n\nThese sources don't exactly exude a sense...",0.559237,0,0,0,0,0,0
41332,9481cd7393b583c9,"RE: \n\nIt's a fucking album cover, how the fu...",0.932649,1,0,1,0,0,0


In [43]:
result.to_csv('perspective_results.csv')
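
One way to see how the 0.5 cutoff lines up with the human labels is to count where `score >= 0.5` agrees or disagrees with the "toxic" column. The sketch below is an illustration, not part of the original analysis: the helper name `confusion_counts` is hypothetical, and it runs on only eight rows copied from the tables displayed above, so the counts say nothing about the full dataset.

```python
def confusion_counts(scores, labels, threshold=0.5):
    """Count agreement between `score >= threshold` and a binary human label."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if flagged and label == 1:
            counts["tp"] += 1   # flagged by the score and labeled toxic
        elif flagged:
            counts["fp"] += 1   # flagged by the score, not labeled toxic
        elif label == 1:
            counts["fn"] += 1   # labeled toxic, but below the threshold
        else:
            counts["tn"] += 1   # neither flagged nor labeled toxic
    return counts


# Eight rows copied from the displayed tables: score and "toxic" label.
sample_scores = [0.000004, 0.000091, 0.667964, 0.635929,
                 0.604778, 0.515460, 0.932649, 0.997278]
sample_toxic = [0, 0, 0, 1, 0, 1, 1, 1]

sample_counts = confusion_counts(sample_scores, sample_toxic)
print(sample_counts)
```

On the full data the same helper could be called as `confusion_counts(df['score'], df['toxic'])`, and repeated with other thresholds to see how sensitive the split is to the choice of 0.5.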

Hypothesis:
I believe that comments insulting a person's mother will receive higher toxicity scores than otherwise identical comments insulting a person's father.

In [82]:
comment_list = ['ur mom', 'ur dad', 'ur mom\'s a hoe', 'ur dad\'s a hoe', 'your mom is ugly', 'your dad is ugly',
                'your mom has rabies', 'your dad has rabies', 'mom', 'dad', 'mommy', 'daddy',
                'that\'s why your mom left you', 'that\'s why your dad left you',
                'your mother', 'your father', 'motherless', 'fatherless', 'i love moms', 'i love dads']
score_list = []

for comment in comment_list:
    score = get_toxicity_score(comment)
    score_list.append(score)
    print(comment, score)
    time.sleep(1)

ur mom 0.27690583
ur dad 0.119228795
ur mom's a hoe 0.85173553
ur dad's a hoe 0.8457076
your mom is ugly 0.90742445
your dad is ugly 0.8795294
your mom has rabies 0.5035055
your dad has rabies 0.4497033
mom 0.12760067
dad 0.06372416
mommy 0.10699297
daddy 0.08501353
that's why your mom left you 0.34388447
that's why your dad left you 0.22987647
your mother 0.26589394
your father 0.06492286
motherless 0.41256893
fatherless 0.22987647
i love moms 0.06012806
i love dads 0.19789414


As seen from my testing above, my hypothesis was largely supported. The Perspective API scored otherwise identical comments differently when 'mom' or 'mommy' was switched to 'dad' or 'daddy'. In 9 of the 10 pairs, the comment referring to the female parent received a higher toxicity score than its male counterpart. The one exception was 'i love moms' (0.060), which scored lower than 'i love dads' (0.198).
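
The pairwise comparison can be tallied directly from the printed scores. This is a minimal sketch, assuming the scores are copied by hand from the output above in the same order as `comment_list`; the names `scored` and `compare_pairs` are hypothetical and not part of the original notebook.

```python
# Scores copied from the printed output above, in the same order as comment_list.
scored = [
    ("ur mom", 0.27690583), ("ur dad", 0.119228795),
    ("ur mom's a hoe", 0.85173553), ("ur dad's a hoe", 0.8457076),
    ("your mom is ugly", 0.90742445), ("your dad is ugly", 0.8795294),
    ("your mom has rabies", 0.5035055), ("your dad has rabies", 0.4497033),
    ("mom", 0.12760067), ("dad", 0.06372416),
    ("mommy", 0.10699297), ("daddy", 0.08501353),
    ("that's why your mom left you", 0.34388447),
    ("that's why your dad left you", 0.22987647),
    ("your mother", 0.26589394), ("your father", 0.06492286),
    ("motherless", 0.41256893), ("fatherless", 0.22987647),
    ("i love moms", 0.06012806), ("i love dads", 0.19789414),
]


def compare_pairs(scored, threshold=0.5):
    """Pair each mother variant with the father variant that follows it."""
    rows = []
    for (mom_text, mom_score), (dad_text, dad_score) in zip(scored[0::2], scored[1::2]):
        rows.append({
            "mom": mom_text,
            "dad": dad_text,
            "diff": mom_score - dad_score,  # positive: mother variant scored higher
            # True when only one of the two comments reaches the threshold.
            "crosses_threshold": (mom_score >= threshold) != (dad_score >= threshold),
        })
    return rows


rows = compare_pairs(scored)
mom_higher = sum(1 for r in rows if r["diff"] > 0)
print(f"{mom_higher} of {len(rows)} pairs scored the mother variant higher")
for r in rows:
    print(f"{r['mom']!r} vs {r['dad']!r}: {r['diff']:+.3f}")
```

With these scores the mother variant is higher in 9 of the 10 pairs, and only the 'your mom has rabies' / 'your dad has rabies' pair falls on opposite sides of the 0.5 cutoff used earlier.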