# Assignment 10: Data Bias (Coding) 
## 1. Explore the sample dataset to form hypotheses
### Initial Findings
   There are several examples where the labels in the dataset seem to miss the meaning of a comment. For example, "I WILL BURN YOU TO HELL IF YOU REVOKE MY TALK PAGE ACCESS" is labeled non-toxic. My theory is that more convoluted or indirect ways of expressing anger are not detected within the dataset. As such, I expect the API to rely on direct cues and to miss toxicity that is expressed indirectly. In this example, the label might also be defaulting to non-toxic because the comment contains a religious term. My thinking here is that those creating the dataset didn't want to blanket-censor religious speech by weighting relevant terms as toxic. "Christ" and "Hell" may be used in negative ways in casual or online speech, but terminology like this can be used in multiple ways, as can "BURN".
   
   Furthermore, I hypothesize that the API will ignore all toxic slang terms due to the lack of direct negative content. The original dataset seems not to notice when words are modified in order to avoid detection. For example, " hello   you are an T0TL D0UCH3 BG" is seen as non-toxic due to the self-censoring of the offending word. If such a blatant example of detection avoidance is not caught, then I am expecting almost any attempt to avoid detection to work.
    
## 2. Design and perform tests
### Queries made to the API
   All queries to the Perspective API were made using this code fragment. Between queries, only the comment text was changed, in order to test what is considered toxic.

```python
from googleapiclient import discovery
import json

API_KEY = 'This key is censored to avoid misuse of the API'

client = discovery.build(
  "commentanalyzer",
  "v1alpha1",
  developerKey=API_KEY,
  discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
  static_discovery=False,
)

analyze_request = {
  'comment': { 'text': ' COMMENT TEXT' },
  'requestedAttributes': {'TOXICITY': {}}
}

response = client.comments().analyze(body=analyze_request).execute()
print(json.dumps(response, indent=2))
```
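
Since only the comment text changes between queries, the fragment can be wrapped in two small helpers. The sketch below is a hypothetical refactoring, not the code actually used for the tests: `build_request` and `extract_toxicity` are names introduced here, and the response layout (`attributeScores`, then `TOXICITY`, then `summaryScore`, then `value`) is an assumption about what the API returns for a `TOXICITY` request.

```python
def build_request(text):
    """Build the request body for a single comment (hypothetical helper)."""
    return {
        'comment': {'text': text},
        'requestedAttributes': {'TOXICITY': {}},
    }


def extract_toxicity(response):
    """Pull the summary toxicity score out of an analyze() response.

    Assumes the layout attributeScores -> TOXICITY -> summaryScore -> value.
    """
    return response['attributeScores']['TOXICITY']['summaryScore']['value']


# Hand-written response of the assumed shape, so no network call is needed.
sample_response = {
    'attributeScores': {
        'TOXICITY': {'summaryScore': {'value': 0.7308154, 'type': 'PROBABILITY'}}
    }
}
print(extract_toxicity(sample_response))
```

With the `client` built in the fragment above, a single query would then read `extract_toxicity(client.comments().analyze(body=build_request(text)).execute())`.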

### Example Queries and Initial Conclusions
Each set of tests is followed by a small description of the conclusions drawn from that test set. Each set is performed upon an initial comment, followed by modifications of that initial comment.
###### 1.
    'text':  
' IS THAT BOLD ENUF U jerk BEEYATCH'

    'toxicity': 
    0.7308154

###### 2.
    'text': 
' IS THAT BOLD ENUF U jerk BITCH'

    'toxicity': 
    0.9391453

###### 3.
    'text':  
' IS THAT BOLD ENUF U jerk'

    'toxicity': 
    0.7510937

###### This set of tests showcases that misspelled terms seem to be weighted as non-toxic. The test lacking the offending word scores almost identically to the misspelled instance.

###### 4.
    'text': 
'2014 UTC  Um Grells Confessional came out LAST YEAR dude You know the one where she flat out SAID SHE WAS A WOMAN Heres a link with helpful highlights if you dont remember httpdweebzilladeviantartcomartGRELLSUTCLIFFISAWOMAN386666411 Shes a woman end of story Now fix the damn pronouns  2054 15 May'

    'toxicity': 
    0.50789946

###### 5.
    'text': 
'2014 UTC  Um Grells Confessional came out LAST YEAR dude You know the one where she flat out SAID SHE WAS A WOMAN Heres a link with helpful highlights if you dont remember httpdweebzilladeviantartcomartGRELLSUTCLIFFISAWOMAN386666411 Shes a woman end of story Now fix the pronouns  2054 15 May'

    'toxicity':  
    0.20839658

###### 6.
    'text':  
' 2014 UTC  Um Grells Confessional came out LAST YEAR dude You know the one where she flat out SAID SHE WAS A WOMAN Heres a link with helpful highlights if you dont remember httpdweebzilladeviantartcomartGRELLSUTCLIFFISAWOMAN386666411 Shes a woman end of story Now fix the darn pronouns  2054 15 May'

    'toxicity': 
    0.47119883
    
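###### This set of tests showcases how every word seems to have an impact on the toxicity score. Simply changing "damn" to "darn" lowers the score from about 0.51 to about 0.47, enough for the API to consider this comment non-toxic if 0.5 is taken as the cutoff. Removing the word entirely lowers the score to about 0.21.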
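
The comparison in tests 4 to 6 can be summarized as a change in score relative to the original comment. The snippet below is only a sketch: the scores are the ones recorded above, while `compare_to_baseline` and the 0.5 cutoff are assumptions introduced here for illustration, not values defined by the API.

```python
# Scores recorded in tests 4-6 above.
scores = {
    'damn': 0.50789946,     # test 4, original comment
    'removed': 0.20839658,  # test 5, word deleted
    'darn': 0.47119883,     # test 6, milder substitute
}

THRESHOLD = 0.5  # assumed cutoff for "toxic"; not a value taken from the API


def compare_to_baseline(scores, baseline_key, threshold=THRESHOLD):
    """Return (variant, change in score, still flagged?) for each variant."""
    baseline = scores[baseline_key]
    rows = []
    for name, value in scores.items():
        if name == baseline_key:
            continue
        rows.append((name, round(value - baseline, 4), value >= threshold))
    return rows


for name, delta, flagged in compare_to_baseline(scores, 'damn'):
    print(f"{name:8s} delta={delta:+.4f} flagged={flagged}")
```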
    
###### 7.
    'text': 
'The Boy Who Cried Wolf  logging onto enwiki to call Marine Le Pen an ugly bitch destroys all credibility'

    'toxicity': 
    0.7105462

###### 8.
    'text': 
The Boy Who Cried Wolf  logging onto enwiki to call Marine Le Pen an ugly bitch destroys all credibility. Really. Yes, really. Really. Yes, really. Really. Yes, really. Really. Yes, really. Really. Yes, really. Really. Yes, really. Really. Yes, really. Really. Yes, really. Really. Yes, really. Really. Yes, really. Really. Yes, really. Really. Yes, really. Really. Yes, really. Really. Yes, really. Really. Yes, really. Really. Yes, really. Really. Yes, really. 

    'toxicity':  
    0.64447093

###### 9.
    'text':  
'Calling Marine Le Pen an ugly bitch destroys credibility.'

    'toxicity': 
    0.47900817
    
###### 10.
    'text':
    
'The Boy Who Cried Wolf   Calling Marine Le Pen an ugly bitch destroys credibility.'

    'toxicity': 
    0.62702215
    
###### This set of tests showcases how keywords seem to largely dictate the toxicity score of a comment. Loading a comment with filler statements does impact the score, but to a degree so small that using it to bypass filters would be impractical. However, including negative words such as "cried" will influence the score by quite a bit despite not being directed at anyone.
    
###### 11.
    'text':  
' FReak    NO IM FINE TLKIN TO YOU THANXS MATE INIT1'

    'toxicity': 
    0.4269174
    
###### 12.
    'text':  
' FREAK    NO IM FINE TLKIN TO YOU THANXS MATE INIT1'

    'toxicity': 
    0.47473195
    
###### 13.
    'text':  
' freak    no im fine tlkin to you thanxs mate init1'

    'toxicity': 
    0.47119883
    
###### 14.
    'text':  
' FReak    no im fine tlkin to you thanxs mate init1'

    'toxicity': 
    0.40336934
    
###### This set of tests, despite being performed on an unassuming piece of data, was decently informative. The API seems to place a lot of weight on capitalization when detecting toxicity in comments. This is a relatively neutral comment without the addition of "freak". The API seems to weight a toxic word more when all of its letters share the same case. By mixing the case within an offending word, we can reduce its weight in detection. This method seems to work regardless of the case of the rest of the comment.
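
Case variants like these can be generated rather than typed by hand. The following is a minimal sketch with hypothetical helper names (`alternate_case`, `case_variants`). It alternates case letter by letter, which is slightly different from the hand-made `FReak` used in tests 11 and 14.

```python
import re


def alternate_case(word, start_upper=True):
    """Alternate the case of the letters in `word`, e.g. 'freak' -> 'FrEaK'."""
    out = []
    upper = start_upper
    for ch in word:
        if ch.isalpha():
            out.append(ch.upper() if upper else ch.lower())
            upper = not upper
        else:
            out.append(ch)  # leave digits and symbols untouched
    return ''.join(out)


def case_variants(comment, target):
    """Return the comment with `target` in lower, upper, and mixed case."""
    pattern = re.compile(re.escape(target), re.IGNORECASE)
    forms = {
        'lower': target.lower(),
        'upper': target.upper(),
        'mixed': alternate_case(target),
    }
    # A function replacement avoids any backslash handling in the new text.
    return {label: pattern.sub(lambda m, f=form: f, comment)
            for label, form in forms.items()}


comment = ' freak    no im fine tlkin to you thanxs mate init1'
for label, text in case_variants(comment, 'freak').items():
    print(label, repr(text))
```

Each variant would still need to be sent to the API to obtain a score.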
    
###### 15.
    'text':  
'You Suck Eggs'

    'toxicity': 
    0.8364697
    
###### 16.
    'text':  
'Suck Eggs'

    'toxicity': 
    0.71760553
    
###### 17.
    'text':  
'I Suck Eggs'

    'toxicity': 
    0.68408644
    
###### Interestingly enough, directed language seems to have quite an effect on how the API perceives a comment. Directing a statement at another person scores as more toxic than a general command, which in turn scores as more toxic than directing the statement at yourself.

###### 18.
    'text':  
'I doubt you are truly so ignorant as to think I misunderstood the KKK reference Cracker is a racial slur much like redneck and I would appreciate your avoiding the usage of such terms in an offensive manner particularly in regards to a president of our nation'

    'toxicity': 
    0.50789946
    
###### 19.
    'text':  
'I doubt you are truly so ignorant as to think I misunderstood the KKK reference Cr*cker is a racial slur much like redneck and I would appreciate your avoiding the usage of such terms in an offensive manner particularly in regards to a president of our nation'

    'toxicity': 
    0.5885171
    
###### Self-censoring seems to actually make a comment more toxic. Perhaps this applies to non-toxic comments as well? Could it be weighted so that censored words are all seen as more toxic due to the words they might stand for?

###### 20.
    'text':  
'I am a wordsmith not a peice of rotten meat'

    'toxicity': 
    0.2854971
    
###### 21.
    'text':  
'I am a wordsmith not a p**** of rotten ****'

    'toxicity': 
    0.52811706
    
###### This set of tests all but confirms that censoring your own comment makes it appear more toxic to the API. Presumably, this is due to the possibility of more directly toxic phrases standing in the place of benign terms. There may be situations where censoring makes a comment appear less toxic, but I wasn't comfortable typing those possibilities to test! From this set of tests, we can see that individual words and their possible meanings have an absolutely massive impact on toxicity ratings.
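
The censored variant in test 21 can be reproduced with a small masking helper. This is a sketch under the assumption that words are separated by single spaces; `mask_word` and `mask_in_comment` are hypothetical names, not part of the original test code.

```python
def mask_word(word, keep=0, mask_char='*'):
    """Replace all but the first `keep` characters of `word` with `mask_char`."""
    return word[:keep] + mask_char * (len(word) - keep)


def mask_in_comment(comment, targets):
    """Mask each word listed in `targets` (word -> number of leading characters to keep)."""
    words = comment.split(' ')
    return ' '.join(
        mask_word(w, targets[w]) if w in targets else w
        for w in words
    )


original = 'I am a wordsmith not a peice of rotten meat'
censored = mask_in_comment(original, {'peice': 1, 'meat': 0})
print(censored)  # I am a wordsmith not a p**** of rotten ****
```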

This is only a sampling of the tests performed. All original comment text was taken from the given Sample_labeled_data.csv file, which will be included in the GitHub repository containing this file.

## 3. Extended Conclusions on API Performance
##### I. Conclusions from example tests
The presented tests all focus on individual hypotheses. By testing for multiple different influencing factors we can get a better idea of what influences the toxicity score of comments. From these tests we can see that almost every piece of a comment influences scoring in some way. The proportion of influence is what seems most important.

My main conclusion from testing the API on modifications of original comments is that minor alterations to offending words can greatly reduce the toxicity scoring of a comment. For example, simply capitalizing part of a word that raised the toxicity score (even just adjacent letters) produced a lower toxicity score from Perspective API. Presumably, this is due to the method by which the API searches for toxicity. If it attempts to match known toxic words against the words in the text, it can only do so much without checking against every possible case permutation of toxic phrases. That would obviously be absurd, which makes the reality of Perspective API's detection quite impressive.
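
To give a sense of scale for the case-permutation idea, the sketch below enumerates every upper/lower combination of a single word. It only illustrates the counting argument (2 to the power of the number of letters); it makes no claim about how Perspective API works internally, and `case_permutations` is a name introduced here.

```python
from itertools import product


def case_permutations(word):
    """List every upper/lower-case combination of the letters in `word`."""
    choices = [
        (ch.lower(), ch.upper()) if ch.isalpha() else (ch,)
        for ch in word
    ]
    return [''.join(combo) for combo in product(*choices)]


perms = case_permutations('freak')
print(len(perms))  # 32, i.e. 2 ** 5 for a five-letter word
print(perms[:4])
```

A five-letter word already has 32 spellings, and the count doubles with every additional letter, so listing each variant of each toxic phrase quickly becomes impractical.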

Interestingly, the API's reliance on similarity to toxic phrases leads to some unorthodox results when we factor in ways that users may attempt to avoid automatic censors or bans. Mainly, I tested the use of " * " in place of characters in offending words. This (relatively mainstream) method of self-censorship appears to have been considered. Simply replacing pieces of non-toxic words radically altered the scoring of a comment. I theorize that the inclusion of indefinite characters (or even just symbols) creates more possible meanings for character groups, forcing the API to investigate and rank based on a massively greater number of possible meanings. As such, self-censoring seems to hurt respectful comments while helping toxic users' scores. Including possibly less toxic results means that particularly toxic comments could be seen as less harmful if other possibilities are to be considered.

This effect was not quite as exaggerated as that of replacing entire words. When we adjusted the wording of comments to be more or less harmful, toxicity scores were very accurate. I was extremely impressed by the overall reliability of the API. Even without directly harmful terminology, the API could accurately sense a change in tone. Furthermore, the API would notice the tense and subject of comments. I didn't expect that! By redirecting a rude phrase away from another user and towards the speaker, the toxicity score could reliably be lowered.

##### II. Conclusions from extended testing
After running more tests on portions of the dataset larger than single comments, I found some interesting trends. Firstly, longer comments seem to score lower in toxicity on average than shorter comments. I have two theories regarding why this is. My initial theory was that longer comments would be more developed and less likely to include gut reactions. Those gut reactions and short outbursts seemed to make up the largest share of the comments deemed toxic in the original dataset.
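
A length comparison of this kind can be sketched as follows. The records are four of the modified test comments and scores listed in section 2, used only to show the mechanics. They are not the sample behind the trend described above, and `mean_score_by_length` and the five-word cutoff are assumptions introduced for illustration.

```python
def _mean(values):
    """Arithmetic mean, or None for an empty list."""
    return sum(values) / len(values) if values else None


def mean_score_by_length(records, cutoff_words=5):
    """Average toxicity for comments of at most `cutoff_words` words vs. longer ones."""
    short, long_ = [], []
    for text, score in records:
        bucket = short if len(text.split()) <= cutoff_words else long_
        bucket.append(score)
    return {'short': _mean(short), 'long': _mean(long_)}


# (comment text, recorded toxicity) pairs taken from tests 15, 16, 20 and 11.
records = [
    ('You Suck Eggs', 0.8364697),
    ('Suck Eggs', 0.71760553),
    ('I am a wordsmith not a peice of rotten meat', 0.2854971),
    (' FReak    NO IM FINE TLKIN TO YOU THANXS MATE INIT1', 0.4269174),
]
print(mean_score_by_length(records))
```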

However, my perspective shifted as I examined the specific instances listed above. Longer comments were less toxic on average, but this may simply be due to the larger number of benign words required to form a complete thought at that length. I have highlighted a case (tests 7 and 8 above) specifically in order to test for this. The results of that test indicate that there is a relationship, small as it may be. The idea behind this theory holds water, but repeated instances of similar words and phrases do not seem to influence the toxicity ratings much.

This lower toxicity rating seems consistent. Extending the number of words will lower the toxicity value (assuming the added words and phrases are non-toxic). This may create a bias in favor of longer comments, or of comments which hide offensive material within a larger, more respectful comment. The rating for certain words is strong enough that no extension would allow the comment to pass inspection by the API. However, when we combine more indirect forms of toxicity (sarcasm, general negativity, pestering) with this concept, cracks begin to show. Once users become aware of the eccentricities of Perspective API, they could easily change their behavior in order to disguise toxic comments.
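
Padded variants like the one in test 8 can be generated programmatically, which makes it easier to check how the score changes as filler is added. This is a minimal sketch: `pad_comment` is a hypothetical helper, and the repeat counts are arbitrary choices. Each variant would still have to be sent to the API to obtain a score.

```python
def pad_comment(text, filler, repeats):
    """Append `filler` to `text` `repeats` times, separated by single spaces."""
    return ' '.join([text] + [filler] * repeats)


base = 'You Suck Eggs'          # comment from test 15
filler = 'Really. Yes, really.'  # filler phrase used in test 8

variants = {n: pad_comment(base, filler, n) for n in (0, 1, 5, 20)}
for n, text in variants.items():
    # Word count grows by three for every repeat of the filler.
    print(n, len(text.split()))
```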