## Wikipedia Personal Attacks

The data and most of the code in this notebook come from Ellery Wulczyn, Nithum Thain, and Lucas Dixon (paper: https://arxiv.org/abs/1610.08914; notebook: https://github.com/ewulczyn/wiki-detox/blob/master/src/figshare/Wikipedia%20Talk%20Data%20-%20Getting%20Started.ipynb).

These authors' data contain:

- a large historical corpus of discussion comments on Wikipedia talk pages
- a sample of over 100k comments with human labels for whether the comment contains a personal attack
- a sample of over 100k comments with human labels for whether the comment has an aggressive tone


Please note that some of these comments contain offensive language. 

## Building a classifier for personal attacks (code from Wulczyn et al.)

First we import some packages.

In [2]:
import pandas as pd
import urllib.request  # the request submodule must be imported explicitly for urlretrieve
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

### Question 1

What are these packages and why are we using them?  (Feel free to Google around.)  It is okay if you do not understand all of this, just do your best.

### Your answer to Question 1

#### Pandas
- pandas is a package for data manipulation and analysis. Here it reads the TSV files into DataFrames.
#### urllib
- urllib is a standard-library package for opening and reading URLs, parsing them, and handling the errors they raise. Here it downloads the data files.
#### sklearn.pipeline
- Pipeline chains a sequence of transformers with a final estimator. It is helpful when several steps need to be fitted, applied, and cross-validated together under different parameters.
#### CountVectorizer
- CountVectorizer converts a collection of text documents to a matrix of token counts.
#### TfidfTransformer
- Scikit-learn's TfidfTransformer converts a matrix of token counts (such as the output of CountVectorizer) into a normalized TF-IDF representation.
#### LogisticRegression
- This is a logistic regression classifier. It estimates the probability of each class for a given input; the predicted label is the most probable class, so it takes only discrete values.
#### roc_auc_score
- It "Compute(s) Area Under the Receiver Operating Characteristic Curve" (ROC AUC) from prediction scores.
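
As a rough illustration of how the two text-processing steps described above fit together, here is a minimal sketch on a toy corpus. The three sentences are invented for this example and are not taken from the Wikipedia data; default scikit-learn settings are assumed except where shown.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Toy corpus invented for illustration; not part of the Wikipedia data.
docs = [
    "thanks for the helpful edit",
    "thanks for the edit",
    "this edit is not helpful",
]

# Step 1: raw text -> sparse matrix of token counts (one row per document).
vect = CountVectorizer()
counts = vect.fit_transform(docs)

# Step 2: counts -> TF-IDF weights. Terms that occur in many documents are
# weighted down, and each row is scaled to unit L2 length.
tfidf = TfidfTransformer(norm="l2")
weights = tfidf.fit_transform(counts).toarray()

# Column order of both matrices, recovered from the fitted vocabulary.
terms = sorted(vect.vocabulary_, key=vect.vocabulary_.get)
print(terms)
print(counts.toarray())
print(weights.round(2))
```

In the third document, "edit" (which occurs in every document) should receive a smaller weight than "not" (which occurs only once), even though both appear once in that sentence.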

In [3]:
# download annotated comments and annotations

ANNOTATED_COMMENTS_URL = 'https://ndownloader.figshare.com/files/7554634' 
ANNOTATIONS_URL = 'https://ndownloader.figshare.com/files/7554637' 


def download_file(url, fname):
    urllib.request.urlretrieve(url, fname)

                
download_file(ANNOTATED_COMMENTS_URL, 'attack_annotated_comments.tsv')
download_file(ANNOTATIONS_URL, 'attack_annotations.tsv')

In [4]:
comments = pd.read_csv('attack_annotated_comments.tsv', sep = '\t', index_col = 0)
annotations = pd.read_csv('attack_annotations.tsv',  sep = '\t')

In [5]:
print(comments)


                                                     comment  year  logged_in  \
rev_id                                                                          
37675      `-NEWLINE_TOKENThis is not ``creative``.  Thos...  2002      False   
44816      `NEWLINE_TOKENNEWLINE_TOKEN:: the term ``stand...  2002      False   
49851      NEWLINE_TOKENNEWLINE_TOKENTrue or false, the s...  2002      False   
89320       Next, maybe you could work on being less cond...  2002       True   
93890                   This page will need disambiguation.   2002       True   
...                                                      ...   ...        ...   
699848324  `NEWLINE_TOKENNEWLINE_TOKENNEWLINE_TOKENThese ...  2016       True   
699851288  NEWLINE_TOKENNEWLINE_TOKENThe Institute for Hi...  2016       True   
699857133  NEWLINE_TOKEN:The way you're trying to describ...  2016       True   
699897151  Alternate option===NEWLINE_TOKENIs there perha...  2016       True   

                ns   sample

In [6]:
print(annotations)

            rev_id  worker_id  quoting_attack  recipient_attack  \
0            37675       1362             0.0               0.0   
1            37675       2408             0.0               0.0   
2            37675       1493             0.0               0.0   
3            37675       1439             0.0               0.0   
4            37675        170             0.0               0.0   
...            ...        ...             ...               ...   
1365212  699897151        628             0.0               0.0   
1365213  699897151         15             0.0               0.0   
1365214  699897151         57             0.0               0.0   
1365215  699897151       1815             0.0               0.0   
1365216  699897151        472             0.0               0.0   

         third_party_attack  other_attack  attack  
0                       0.0           0.0     0.0  
1                       0.0           0.0     0.0  
2                       0.0           0

### Question 2

We've now downloaded the data.  Please open it up and take a look.  How are the data formatted?  What's in there?  What do you notice?

### Your answer to Question 2

- The data are formatted as TSV files: each column is separated from the next by a tab.
- The first file contains a large number of Wikipedia talk page comments, each with some metadata: the year, whether the author was logged in, whether the comment comes from an article or a user page, and which split of the data (for example train or dev) it belongs to. The second file has one row per annotator per comment, recording whether that annotator judged the comment to be an attack.
- I noticed that many comments begin with "NEWLINE_TOKEN", often repeated. The data need to be cleaned by removing those placeholders. Also, very few entries are labelled as attacks.

In [7]:
len(annotations['rev_id'].unique())


115864

In [8]:
# label a comment as an attack if the majority of annotators did so
labels = annotations.groupby('rev_id')['attack'].mean() > 0.5

In [9]:
# join labels and comments
comments['attack'] = labels
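
To make the labelling and join steps above concrete, here is a small sketch on made-up data. The `toy_annotations` and `toy_comments` tables are hypothetical stand-ins for the real files, with only the columns needed for the illustration.

```python
import pandas as pd

# Hypothetical toy annotations: three comments, each rated by three workers.
toy_annotations = pd.DataFrame({
    "rev_id": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "attack": [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0],
})
toy_comments = pd.DataFrame(
    {"comment": ["fine", "rude", "very rude"]},
    index=pd.Index([1, 2, 3], name="rev_id"),
)

# The mean of 0/1 votes is the fraction of annotators who saw an attack;
# comparing it with 0.5 yields a strict-majority label.
toy_labels = toy_annotations.groupby("rev_id")["attack"].mean() > 0.5

# Assigning a Series to a column aligns on the index (rev_id), not row order.
toy_comments["attack"] = toy_labels
print(toy_comments)
```

Because the comparison is strict, a comment whose annotators split exactly evenly would be labelled as not an attack.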

In [10]:
# remove newline and tab tokens
comments['comment'] = comments['comment'].apply(lambda x: x.replace("NEWLINE_TOKEN", " "))
comments['comment'] = comments['comment'].apply(lambda x: x.replace("TAB_TOKEN", " "))

In [11]:
comments.query('attack')['comment'].head()


rev_id
801279             Iraq is not good  ===  ===  USA is bad   
2702703      ____ fuck off you little asshole. If you wan...
4632658         i have a dick, its bigger than yours! hahaha
6545332      == renault ==  you sad little bpy for drivin...
6545351      == renault ==  you sad little bo for driving...
Name: comment, dtype: object

In [12]:
comments.query('attack')['comment'].tail()


rev_id
699645524     Brandon Semenuk has won the event four times ...
699659494    im soory since when is google images not allow...
699660419    what ever you fuggin fag Question how did you ...
699661020      == Nice try but no cigar........idiot ==  Th...
699664687     shut up mind your own business and go fuck so...
Name: comment, dtype: object

In [13]:
# fit a simple text classifier

train_comments = comments.query("split=='train'")
test_comments = comments.query("split=='test'")

clf = Pipeline([
    ('vect', CountVectorizer(max_features = 10000, ngram_range = (1,2))),
    ('tfidf', TfidfTransformer(norm = 'l2')),
    ('clf', LogisticRegression()),
])
clf = clf.fit(train_comments['comment'], train_comments['attack'])
auc = roc_auc_score(test_comments['attack'], clf.predict_proba(test_comments['comment'])[:, 1])
print('Test ROC AUC: %.3f' %auc)

Test ROC AUC: 0.957


### Question 3

What has happened here?  Can you explain, in general terms, what this code is doing?  What does ROC AUC mean?

### Your answer to Question 3

- The code first labels a comment as an attack when more than half of its annotators judged it to be one, then adds that label to the comments table. For data cleaning, it uses a lambda expression to replace the "NEWLINE_TOKEN" and "TAB_TOKEN" placeholders in each comment with spaces.
- Then we print some of the comments labelled as attacks; ".head()" and ".tail()" show the first few and last few results.
- Lastly, we fit a simple text classifier on the training comments and use it to predict how likely each test comment is to be an attack.
- In general terms, the code labels comments as "attack" or not, prints some examples labelled "attack", trains a model on one part of the data, and evaluates its predictions on a held-out part.
- ROC AUC stands for the area under the receiver operating characteristic curve. The ROC curve plots the true positive rate against the false positive rate at every classification threshold, so the area under it summarizes how well the model ranks attacks above non-attacks across all thresholds: 0.5 corresponds to random guessing and 1.0 to a perfect ranking.
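
One way to build intuition for ROC AUC is its ranking interpretation: it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it that way in plain Python; `roc_auc` is a hypothetical helper written for this illustration, not the scikit-learn implementation, and the labels and scores are made up.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y]
    neg = [s for y, s in zip(y_true, scores) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up labels and classifier scores.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y_true, scores))  # 3 of the 4 positive/negative pairs are ordered correctly
```

This pairwise version is quadratic in the number of examples, so it is only meant for small illustrations.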

In [14]:
# now try to classify new comments
clf.predict(['Thanks for you contribution, you did a great job!'])


array([False])

In [15]:
clf.predict(['People as stupid as you should not edit Wikipedia!'])

array([ True])

In [17]:
clf.predict(['Your revision is so brillant that it adds nothing to the article.'])

array([False])

In [19]:
clf.predict(['Go home kid. No mess with this job'])

array([False])

In [20]:
clf.predict(['You f**** C******* a*****e return to your country!'])

array([False])

In [22]:
clf.predict(['Go home you Chinese. You should not edit wiki'])

array([False])

In [23]:
clf.predict(['You are such an idiot.'])

array([ True])

In [24]:
clf.predict(['Woman should go home and cook, not editing wiki.'])

array([False])

### Question 4

Edit the code above to try out some new nice and nasty comments of your own invention.  Can you "break" the classifier?  How/why or not?

### Your answer to Question 4

- I added a few nasty examples above that use sarcasm and racial or nationality bias. The classifier does not label "You should not edit wiki" and similar exclusionary statements as personal attacks; it seems to rely on overtly negative words such as "stupid" or "idiot", and it does not pick up on gendered or racial bias. My guess is that, because it is based on TF-IDF weights for individual words and word pairs, it doesn't capture the negative meaning of phrases that contain no curse words.
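
The guess above can be probed by looking at predicted probabilities and learned word weights instead of only the True/False output. The sketch below does this on a tiny invented training set; `toy_clf`, the training sentences, and the probe sentences are all assumptions made for illustration, and the real model trained on the Wikipedia data may behave differently.

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression

# Tiny invented training set; not the Wikipedia data.
train_texts = [
    "you are an idiot", "what a stupid idiot", "you stupid fool", "shut up idiot",
    "thanks for the edit", "great work on the article",
    "please add a source", "nice job thanks",
]
train_labels = [True, True, True, True, False, False, False, False]

toy_clf = Pipeline([
    ("vect", CountVectorizer(ngram_range=(1, 2))),
    ("tfidf", TfidfTransformer(norm="l2")),
    ("clf", LogisticRegression()),
]).fit(train_texts, train_labels)

# An insult with a known attack word versus a hostile sentence without one.
probe = ["you are such an idiot", "go home, you should not edit the article"]
proba = toy_clf.predict_proba(probe)[:, 1]  # column 1 is P(attack)
for text, p in zip(probe, proba):
    print(round(p, 3), text)

# Learned weight per term: positive pushes toward "attack", negative away.
vocab = toy_clf.named_steps["vect"].vocabulary_
coefs = toy_clf.named_steps["clf"].coef_[0]
for term in ["idiot", "stupid", "thanks", "edit"]:
    print(term, round(coefs[vocab[term]], 3))
```

Words the model never saw in training contribute nothing to the score, which is one plausible reason that hostile sentences built from otherwise neutral vocabulary slip through.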

### Question 5

Please summarize what has happened in this notebook as if you are explaining it to someone who has never heard of document classification or machine learning.

### Your answer to Question 5

This model helps to identify personal attacks in Wikipedia talk page comments. We first train the model on labelled data: comments that people have already marked as attacks or not. In effect, we are teaching the model what is a personal attack and what is not. Then we can ask the model to predict labels for new comments. It's like teaching a kid what a cat is by showing them some animals and telling them "this is a cat" or "this is not a cat".

### Question 6

Now please take a look at the authors' original paper ( https://arxiv.org/abs/1610.08914).  What did they do with these Wikipedia comments?  What was their larger goal?

### Your answer to Question 6

- After running the classifier, the researchers analyzed the data. They concluded that anonymous users are more likely to make personal attacks, that Wikipedia users with higher activity levels are less likely to make them, and that personal attacks are concentrated among a few toxic users.
- Their larger goal is to support effective policies for identifying and appropriately responding to harassment. To that end, they develop a quantitative method for analyzing personal attacks, using Wikipedia data as an example.

### Question 7

Please read the Document Classification chapter of our in-progress "textbook" and use bullet points to indicate 5 things you learned and/or constructive suggestions.

### Your answer to Question 7

#### What I learned 
- The perceptron is a network with only an input layer and an output layer. It sums the weighted scores on the edges between nodes and decides which category is more likely.
- Pre-trained word embeddings carry information about word meanings and co-occurrence patterns.
- Sentiment analysis is also a classification task. We can train a model on positive and negative reviews and then have it tell which reviews in the test data are positive.
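
The perceptron and sentiment points above can be sketched together in a few lines of plain Python. This is a simplified illustration under my own assumptions (bag-of-words counts as inputs, labels of +1 and -1, a fixed number of passes), not the textbook's code; the function names and the four training reviews are invented.

```python
def bag_of_words(text):
    """Count lowercase whitespace-separated tokens."""
    counts = {}
    for tok in text.lower().split():
        counts[tok] = counts.get(tok, 0) + 1
    return counts


def score(weights, bias, features):
    """Weighted sum of the input features plus a bias term."""
    return bias + sum(weights.get(f, 0.0) * v for f, v in features.items())


def predict(weights, bias, features):
    return 1 if score(weights, bias, features) > 0 else -1


def train_perceptron(examples, epochs=10):
    """examples: list of (feature_dict, label) pairs with label +1 or -1."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for features, label in examples:
            # Update only on mistakes: nudge weights toward the true label.
            if predict(weights, bias, features) != label:
                for f, v in features.items():
                    weights[f] = weights.get(f, 0.0) + label * v
                bias += label
    return weights, bias


# Invented toy reviews: +1 is positive, -1 is negative.
data = [
    (bag_of_words("great movie loved it"), 1),
    (bag_of_words("loved the acting"), 1),
    (bag_of_words("terrible movie hated it"), -1),
    (bag_of_words("hated the plot"), -1),
]
weights, bias = train_perceptron(data)
print(predict(weights, bias, bag_of_words("loved it")))
print(predict(weights, bias, bag_of_words("hated it")))
```

On this toy data the weights for "loved" and "hated" end up with opposite signs, while words shared by both classes such as "movie" cancel out.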

#### Suggestions 
- The section on BERT (p. 153) would be easier to read if it had its own subtitle instead of blending into the discussion of neural networks.
- It would be nice to include a diagram showing what a neural network is.