## Prerequisites

In [77]:
import pandas as pd
import numpy as np
from collections import Counter

## Loading Data Into Pandas
I use pandas to read the 'tab separated' data into a 'data frame' object for ease of interaction.

Loading aggression annotation data:

In [205]:
aggression_labels = pd.read_csv('aggression_annotations.tsv', sep='\t')

Loading aggression comment data:

In [206]:
aggression_comments = pd.read_csv('aggression_annotated_comments.tsv', sep='\t')

Loading aggression worker data:


In [207]:
aggression_workers = pd.read_csv('aggression_worker_demographics.tsv', sep='\t')

Loading toxicity annotation data:

In [208]:
toxicity_labels = pd.read_csv('toxicity_annotations.tsv', sep='\t')

Loading toxicity comment data:

In [209]:
toxicity_comments = pd.read_csv('toxicity_annotated_comments.tsv', sep='\t')

Loading toxicity worker data:

In [210]:
toxicity_workers = pd.read_csv('toxicity_worker_demographics.tsv', sep='\t')
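
The six reads above follow one naming pattern, so they could be collapsed into a single helper. This is only a sketch: `load_dataset` and its dictionary keys are hypothetical names, and it assumes the files sit in one directory and keep the `<prefix>_<suffix>.tsv` names used above.

```python
from pathlib import Path

import pandas as pd


def load_dataset(prefix, data_dir="."):
    """Read the annotation, comment, and worker tables for one dataset.

    Hypothetical helper: `prefix` is 'aggression' or 'toxicity' in this
    notebook, and the suffixes mirror the file names read above.
    """
    data_dir = Path(data_dir)
    suffixes = {
        "labels": "annotations",
        "comments": "annotated_comments",
        "workers": "worker_demographics",
    }
    # One tab-separated read per table, keyed by a short name
    return {
        name: pd.read_csv(data_dir / f"{prefix}_{suffix}.tsv", sep="\t")
        for name, suffix in suffixes.items()
    }
```

With the files in the working directory, `load_dataset('toxicity')['labels']` would hold the same frame as `toxicity_labels`.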

## Analysis
For both data sets loaded above, I investigate the question of **whether males and females find different words toxic or aggressive**. This would provide some insight into just how much perception of the same content can differ, and therefore what kinds of words (if any) bring some bias into the labelling. If there are particular words for which the labels differ significantly by some demographic breakdown, then any model trained on the data would probably perform poorly when dealing with such words and could apply bias.

The intuition motivating this analysis is the thought that there might be certain words or phrases that are more personally applicable to different demographics, so the perception of the negativity of these words will be biased.

*First, I must select the annotations that indicate toxicity or aggression.*

In [229]:
toxicity_labels_bad = toxicity_labels.loc[toxicity_labels['toxicity'] == 1]

aggression_labels_bad = aggression_labels.loc[aggression_labels['aggression'] == 1]

*Next, I isolate the demographic data I will be focusing on: gender.*

In [230]:
tox_worker_gender = toxicity_workers[['worker_id', 'gender']]

agg_worker_gender = aggression_workers[['worker_id', 'gender']]

*Now I combine the gender data with the annotation data.*

In [231]:
tox_labels_gender = toxicity_labels_bad.merge(tox_worker_gender, on='worker_id')

agg_labels_gender = aggression_labels_bad.merge(agg_worker_gender, on='worker_id')

*I also isolate the comment data and combine it with the above.*

In [232]:
tox_comments = toxicity_comments[['rev_id', 'comment']]

agg_comments = aggression_comments[['rev_id', 'comment']]

In [233]:
tox = tox_labels_gender.merge(tox_comments, on='rev_id', how='inner')

agg = agg_labels_gender.merge(agg_comments, on='rev_id', how='inner')
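
The filter and the two merges can be read as one pipeline. The sketch below runs it on toy frames rather than the real data. The function name `build_flagged` is hypothetical, and the `validate` argument is an addition that is not in the cells above. It illustrates one property worth knowing: an inner merge silently drops annotations from any worker without a demographics row.

```python
import pandas as pd

# Toy stand-ins for the label, worker, and comment tables (not real data)
labels = pd.DataFrame({
    "rev_id": [1, 1, 2, 2, 3],
    "worker_id": [10, 11, 10, 12, 11],
    "toxicity": [1, 0, 1, 1, 0],
})
workers = pd.DataFrame({"worker_id": [10, 11], "gender": ["female", "male"]})
comments = pd.DataFrame({
    "rev_id": [1, 2, 3],
    "comment": ["You fool", "Go away", "Nice edit"],
})


def build_flagged(labels, workers, comments, label_col):
    """Keep rows flagged with 1, then attach annotator gender and comment text."""
    flagged = labels.loc[labels[label_col] == 1]
    with_gender = flagged.merge(
        workers[["worker_id", "gender"]],
        on="worker_id",
        how="inner",
        validate="many_to_one",  # raises if a worker_id repeats in `workers`
    )
    return with_gender.merge(
        comments[["rev_id", "comment"]],
        on="rev_id",
        how="inner",
        validate="many_to_one",  # raises if a rev_id repeats in `comments`
    )


toy = build_flagged(labels, workers, comments, "toxicity")
print(len(toy))  # worker 12 has no demographics row, so that annotation is dropped
```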

*I create lists of all words in comments annotated as toxic or aggressive, separately for female and male annotators. I lower-case the strings before splitting them into words so that capitalization variants of the same word are counted together.*

*Female words for toxicity:*

In [197]:
fem_tox_words = []
# Chain reset_index instead of modifying the filtered slice in place
fem_tox = tox.loc[tox['gender'] == 'female'].reset_index(drop=True)

for i in range(len(fem_tox)):
    fem_tox_words.extend(fem_tox['comment'][i].lower().split())

*Female words for aggression:*

In [227]:
fem_agg_words = []
# Chain reset_index instead of modifying the filtered slice in place
fem_agg = agg.loc[agg['gender'] == 'female'].reset_index(drop=True)

for i in range(len(fem_agg)):
    fem_agg_words.extend(fem_agg['comment'][i].lower().split())

*Male words for toxicity:*

In [234]:
male_tox_words = []
# Chain reset_index instead of modifying the filtered slice in place
male_tox = tox.loc[tox['gender'] == 'male'].reset_index(drop=True)

for i in range(len(male_tox)):
    male_tox_words.extend(male_tox['comment'][i].lower().split())

*Male words for aggression:*

In [236]:
male_agg_words = []
# Chain reset_index instead of modifying the filtered slice in place
male_agg = agg.loc[agg['gender'] == 'male'].reset_index(drop=True)

for i in range(len(male_agg)):
    male_agg_words.extend(male_agg['comment'][i].lower().split())
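
The four loops above differ only in the frame and the gender value. As a minimal sketch, the same word lists could be built with pandas string methods. `words_for_gender` is a hypothetical name, and the toy frame stands in for `tox` or `agg`.

```python
import pandas as pd


def words_for_gender(frame, gender):
    """Lower-case and whitespace-split every comment labeled by one gender."""
    subset = frame.loc[frame["gender"] == gender, "comment"]
    # str.split() yields one list per comment; explode() flattens the lists.
    # dropna() discards the NaN that explode() emits for an empty comment.
    return subset.str.lower().str.split().explode().dropna().tolist()


# Toy frame with the two columns the function relies on (not real data)
toy = pd.DataFrame({
    "gender": ["female", "male", "female"],
    "comment": ["You FOOL", "Go away", "you again"],
})
female_words = words_for_gender(toy, "female")
```

As in the loops above, a comment flagged by several annotators of the same gender contributes its words once per annotation.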

*For reference, I calculate the share of flagged toxicity entries that were labeled by male and by female annotators:*

In [221]:
print("Male Percentage: ", (len(male_tox) / len(tox)) * 100)
print("Female Percentage: ", (len(fem_tox) / len(tox)) * 100)


Male Percentage:  63.36965547455667
Female Percentage:  36.60028833565123
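
The two percentages above sum to slightly less than 100, which suggests that a small share of rows carries some other `gender` value. A minimal sketch, on a toy frame rather than the real `tox` data, of how `value_counts(normalize=True)` reports every category at once:

```python
import pandas as pd

# Toy stand-in for the 'gender' column of the merged frame (not real data)
toy = pd.DataFrame({"gender": ["male", "male", "female", "other"]})

# normalize=True returns fractions that sum to 1; scale to percentages
shares = toy["gender"].value_counts(normalize=True) * 100
print(shares)
```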


*Using `Counter` from the `collections` module, I count the occurrences of each word. I then turn the counts into a DataFrame sorted by ascending frequency, so the most prominent words sit at the end.*

*Female word count for toxicity:*

In [198]:
fem_tox_counts = Counter(fem_tox_words)
fem_tox_counts_df = pd.DataFrame.from_dict(fem_tox_counts, orient='index')
fem_tox_counts_df.reset_index(inplace=True)
fem_tox_counts_df.columns = ['word', 'frequency']
fem_tox_counts_df = fem_tox_counts_df.sort_values('frequency', ignore_index=True)

*Female word count for aggression:*

In [237]:
fem_agg_counts = Counter(fem_agg_words)
fem_agg_counts_df = pd.DataFrame.from_dict(fem_agg_counts, orient='index')
fem_agg_counts_df.reset_index(inplace=True)
fem_agg_counts_df.columns = ['word', 'frequency']
fem_agg_counts_df = fem_agg_counts_df.sort_values('frequency', ignore_index=True)

*Male word count for toxicity:*

In [213]:
male_tox_counts = Counter(male_tox_words)
male_tox_counts_df = pd.DataFrame.from_dict(male_tox_counts, orient='index')
male_tox_counts_df.reset_index(inplace=True)
male_tox_counts_df.columns = ['word', 'frequency']
male_tox_counts_df = male_tox_counts_df.sort_values('frequency', ignore_index=True)

*Male word count for aggression:*

In [238]:
male_agg_counts = Counter(male_agg_words)
male_agg_counts_df = pd.DataFrame.from_dict(male_agg_counts, orient='index')
male_agg_counts_df.reset_index(inplace=True)
male_agg_counts_df.columns = ['word', 'frequency']
male_agg_counts_df = male_agg_counts_df.sort_values('frequency', ignore_index=True)
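
The four counting cells repeat the same steps, so they could share one function. A sketch, assuming a hypothetical helper named `word_frequencies` that keeps the same column names and ascending sort order as above:

```python
from collections import Counter

import pandas as pd


def word_frequencies(words):
    """Return a DataFrame of word counts, sorted by ascending frequency."""
    counts = Counter(words)
    table = pd.DataFrame(list(counts.items()), columns=["word", "frequency"])
    # Ascending order keeps the most frequent words in the last rows
    return table.sort_values("frequency", ignore_index=True)


toy_counts = word_frequencies(["you", "fool", "you", "again", "you"])
```

For example, `fem_tox_counts_df` would then be `word_frequencies(fem_tox_words)`.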

Now we can observe 40 of the top words featured in toxic comments as labeled by males and females. I skip the 20 most frequent words because in both lists these are just common function words ('you', 'of', 'is', etc.).

In [224]:
fem_tox_counts_df.iloc[len(fem_tox_counts_df) - 60: len(fem_tox_counts_df)-20]

| index | word | frequency |
|---|---|---|
| 203431 | page | 8811 |
| 203432 | would | 9039 |
| 203433 | fucking | 9127 |
| 203434 | get | 9166 |
| 203435 | one | 9406 |
| 203436 | his | 9490 |
| 203437 | know | 9697 |
| 203438 | article | 9739 |
| 203439 | nigger | 9928 |
| 203440 | ` | 10223 |


In [225]:
male_tox_counts_df.iloc[len(male_tox_counts_df) - 60: len(male_tox_counts_df)-20]

| index | word | frequency |
|---|---|---|
| 237608 | his | 15286 |
| 237609 | know | 15367 |
| 237610 | shit | 15377 |
| 237611 | nigger | 15407 |
| 237612 | ` | 15530 |
| 237613 | fucking | 15597 |
| 237614 | go | 15787 |
| 237615 | has | 16102 |
| 237616 | get | 16252 |
| 237617 | faggot | 17321 |


Similarly we can observe some of the top words featured in aggressive comments as labeled by males and females.

In [246]:
fem_agg_counts_df.iloc[len(fem_agg_counts_df) - 80: len(fem_agg_counts_df)-20]

| index | word | frequency |
|---|---|---|
| 177610 | which | 6674 |
| 177611 | i'm | 6714 |
| 177612 | up | 6735 |
| 177613 | huge | 6754 |
| 177614 | think | 6789 |
| 177615 | other | 6890 |
| 177616 | why | 6890 |
| 177617 | should | 7000 |
| 177618 | there | 7140 |
| 177619 | - | 7203 |


In [245]:
male_agg_counts_df.iloc[len(male_agg_counts_df) - 80: len(male_agg_counts_df)-20]

| index | word | frequency |
|---|---|---|
| 202346 | any | 10483 |
| 202347 | other | 10629 |
| 202348 | up | 10646 |
| 202349 | gay | 10673 |
| 202350 | why | 10812 |
| 202351 | some | 10843 |
| 202352 | there | 10879 |
| 202353 | should | 11118 |
| 202354 | been | 11198 |
| 202355 | would | 11591 |
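
Skipping a fixed number of top rows removes function words only by position, and words such as 'would' or 'there' still appear in the outputs above. One alternative, sketched here with a deliberately tiny and hypothetical `STOP_WORDS` set, is to filter by an explicit stop-word list before taking the top rows. The function name `top_content_words` is likewise illustrative.

```python
import pandas as pd

# Hypothetical, incomplete stop-word set used only for illustration
STOP_WORDS = {"you", "of", "is", "the", "a", "and", "to"}


def top_content_words(counts_df, n=40, stop_words=STOP_WORDS):
    """Return the n most frequent words that are not in `stop_words`."""
    keep = ~counts_df["word"].isin(stop_words)
    return counts_df.loc[keep].nlargest(n, "frequency").reset_index(drop=True)


# Toy counts table with the same columns as the frames above (not real data)
toy = pd.DataFrame({
    "word": ["you", "fool", "is", "again"],
    "frequency": [9, 4, 7, 2],
})
top = top_content_words(toy, n=2)
```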


### Toxicity: Word Frequency Insights
Two main things stand out from the results of word frequency in the toxicity dataset above:
1. Males and females reacted similarly to 'nigger' and 'fuck'.
2. Males responded to some words that females did not, specifically 'faggot' and 'suck'.

The first of these seems to make sense; the n-word is widely understood to be extremely negative, and though 'fuck' is very flexible in its use, it is commonly used to express things such as frustration and anger.

The second observation appears to be a solid example of bias in the data. It makes sense that males would respond more strongly to the words 'faggot' and 'suck', as these are both used in homophobic slurs, a staple of toxic masculinity directed from males to each other.

This discrepancy **indicates bias** by gender in the annotation of toxicity. Males and females appear to perceive the toxicity of certain words differently, plausibly because of the words' cultural significance. If annotators respond more strongly to specific words that could offend them personally, does that mean the dataset is biased unless the annotators are demographically representative of the Wiki editing platform?
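
The frequencies compared above are raw counts, and male annotators account for roughly 63% of the flagged toxicity entries, so a word can have a higher count in the male list simply because that list is longer. One way to check whether a gap survives the difference in group size is to compare occurrences per 10,000 tokens. This is only a sketch on toy word lists: the helper name `rate_table` and the per-10,000 scale are choices made for illustration, not part of the analysis above.

```python
from collections import Counter

import pandas as pd


def rate_table(groups):
    """Map each group's word list to occurrences per 10,000 tokens.

    `groups` is a dict of group name -> list of words. Words missing
    from a group get a rate of 0.
    """
    rates = {}
    for name, words in groups.items():
        counts = Counter(words)
        total = len(words)
        rates[name] = {w: c / total * 10_000 for w, c in counts.items()}
    return pd.DataFrame(rates).fillna(0.0)


# Toy word lists of different lengths (not real data)
toy_rates = rate_table({
    "female": ["a", "b", "a", "c"],
    "male": ["a", "b", "b", "b", "c", "c", "c", "c"],
})
print(toy_rates)
```

On the real data, the inputs would be the word lists built earlier, for example `{'female': fem_tox_words, 'male': male_tox_words}`.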

### Aggression: Word Frequency Insights
Similar to the toxicity results above, there are some words of agreement and some that are unique:
1. Males and females agree on 'nigger', 'fuck', 'faggot', 'suck', and 'die'.
2. There are differences in 'shit', 'ass', and 'fat'.

There is much more agreement here on some of the words that stood out in the toxicity ratings. The words associated with toxic masculinity and homophobia are considered aggressive by both males and females, as is 'die', with its implication of death.

Interestingly, the word 'shit' appears much more in comments labeled aggressive by females, while the words 'ass' and 'fat' appear more in those labeled by males. Perhaps because 'potty mouth' speech is culturally considered more acceptable or normal for males, the word 'shit' is perceived differently by gender. I have no intuition for why 'ass' and 'fat' would be noticed more by males than by females, but it is a difference.

As with the toxicity results, these observations **indicate bias** by gender in the annotation of aggression.

## Further Implications
Here I consider how the insights from my analysis relate to potential applications as shown in [Perspective Hacks](https://github.com/conversationai/perspectiveapi/wiki/perspective-hacks), what some limitations of using this data might be, and what unintended consequences could follow.

### Perspective Hacks Applications: Which Would a Model Based on This Data Perform Well in?
The application that seems most likely to be successful using this data is the 'Toxicity Timeline'. Even though I identified some potential sources of bias in the toxicity annotations, a general view of trends in negative activity can be useful to deal with problem users or group harassment. Since it is not about interpreting specific instances of toxicity, the bias for perception of specific words is less relevant; the words where genders agreed on toxicity were the most frequent.

### What Hostile Speech Would be Difficult to Detect?
The most difficult speech to deal with, given the Perspective API and this training data, would be sarcasm or otherwise 'ironic' language. The problem is that there is no good training data to learn from: we humans are ourselves very inconsistent in recognizing and interpreting these things, especially in writing. The context of a user's previous comments might indicate whether something is meant literally, but the data used here does not offer that.

### What are Unintended Consequences?
Two potentially negative consequences come to mind: playful language could be restricted, and some demographically specific toxicity could be treated differently, leading to a kind of indirect discrimination. Based on my analyses, there are some words that could easily be identified as toxic or aggressive but that can also be used in humorous and casual ways ('fuck' and 'shit', for example). I also observed that homophobic words appeared more in male-labeled comments; if there are other such demographic biases in labelling, some targeted toxic language may be recognized and dealt with less readily than others, which could leave groups of users in an unfair situation.