In the tutorial, you learned about six different types of bias.  In this exercise, you'll train a model with **real data** and get practice with identifying bias.  Don't worry if you're new to coding: you'll still be able to complete the exercise!

# Introduction

At the end of 2017, the [Civil Comments](https://medium.com/@aja_15265/saying-goodbye-to-civil-comments-41859d3a2b1d) platform shut down and released its ~2 million public comments in a lasting open archive. Jigsaw sponsored this effort and helped to comprehensively annotate the data. In 2019, Kaggle held the [Jigsaw Unintended Bias in Toxicity Classification](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification/overview) competition so that data scientists worldwide could work together to investigate ways to mitigate bias.

The code cell below loads some of the data from the competition.  We'll work with thousands of comments, where each comment is labeled as either "toxic" or "not toxic".

Begin by running the next code cell:
- Click inside the code cell.
- Click on the triangle (in the shape of a "Play button") that appears to the left of the code cell.

The code will run for approximately 30 seconds. When it finishes, the output should show a message saying that the data was successfully loaded, along with two example comments: one toxic and one not toxic.

> **Optional** note: The original competition data uses a toxicity score ranging from 0 to 1. We've simplified this score to either 0 or 1 by thresholding the value: scores > 0.7 are assigned "1", scores < 0.3 are assigned "0", and comments with scores from 0.3 to 0.7 are dropped from the dataset. Additionally, to reduce runtime, we subsample the non-toxic comments so that there are exactly as many of them as toxic comments, which also balances the two classes. To preprocess the comments, we use a "bag of words" approach with `CountVectorizer()`, which represents each comment by how many times each vocabulary word appears in it. Note that this is a simple approach, and in practice you'll want to spend time cleaning up the data. Here's a great example of how to do that: https://www.kaggle.com/christofhenkel/how-to-preprocessing-for-glove-part1-eda
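The labeling and balancing steps described in the note can be sketched in plain Python. The scores below are made-up placeholders, not competition data. `binarize_scores` and `balance` are hypothetical helpers; the notebook itself does the same work with pandas boolean filters and `DataFrame.sample`.

```python
import random

def binarize_scores(scores, high=0.7, low=0.3):
    """Map toxicity scores in [0, 1] to (score, label) pairs.

    Scores above `high` get label 1, scores below `low` get label 0, and
    scores from `low` to `high` (inclusive) are dropped as ambiguous.
    """
    labeled = []
    for score in scores:
        if score > high:
            labeled.append((score, 1))
        elif score < low:
            labeled.append((score, 0))
        # anything with low <= score <= high is skipped
    return labeled

def balance(labeled, seed=0):
    """Subsample the non-toxic examples so both classes have the same size."""
    toxic = [x for x in labeled if x[1] == 1]
    nontoxic = [x for x in labeled if x[1] == 0]
    rng = random.Random(seed)  # fixed seed for repeatable sampling
    return toxic + rng.sample(nontoxic, len(toxic))

scores = [0.95, 0.1, 0.5, 0.8, 0.0, 0.2, 0.65, 0.05]
labeled = binarize_scores(scores)
print(labeled)  # [(0.95, 1), (0.1, 0), (0.8, 1), (0.0, 0), (0.2, 0), (0.05, 0)]
balanced = balance(labeled)
print(len(balanced), sum(label for _, label in balanced))  # 4 2
```

Like the notebook's `.sample(len(full_toxic))`, this assumes there are at least as many non-toxic examples as toxic ones.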

```python
# Set up feedback system (disabled here: these lines are commented out)
# from learntools.core import binder
# binder.bind(globals())
# from learntools.ethics.ex3 import *

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer

# Get the same results each time
np.random.seed(0)

# Load the (full) training data
full_data = pd.read_csv("train.csv")

# Work with a small subset of the data: if target > 0.7, toxic.  If target < 0.3, non-toxic
full_toxic = full_data[full_data["target"] > 0.7]
full_nontoxic = full_data[full_data["target"] < 0.3].sample(len(full_toxic))
data = pd.concat([full_toxic, full_nontoxic], ignore_index=True)
comments = data["comment_text"]
target = (data["target"] > 0.7).astype(int)

# Break into training and test sets
comments_train, comments_test, y_train, y_test = train_test_split(comments, target, test_size=0.30, stratify=target)

# Get vocabulary from training data
vectorizer = CountVectorizer()
vectorizer.fit(comments_train)

# Get word counts for training and test sets
X_train = vectorizer.transform(comments_train)
X_test = vectorizer.transform(comments_test)

# Preview the dataset
print("Data successfully loaded!\n")
print("Sample toxic comment:", comments_train.iloc[18])
print("Sample not-toxic comment:", comments_train.iloc[3])
```

```
Data successfully loaded!

Sample toxic comment: Yeah, just like you, you moron.
Sample not-toxic comment: I'm lovin' it.
```


```python
print(full_data.shape, full_toxic.shape, full_nontoxic.shape, data.shape, comments_train.shape, comments_test.shape)
```

```
(1804874, 45) (45451, 45) (45451, 45) (90902, 45) (63631,) (27271,)
```
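The shapes above follow from the sampling code. The non-toxic set is sampled down to the size of the toxic set. For a float `test_size`, `train_test_split` (as we understand scikit-learn's behavior) rounds the test-set size up and gives the remainder to the training set. A quick arithmetic check in plain Python:

```python
import math

n_toxic = 45451                       # rows in full_toxic (from the output above)
n_total = 2 * n_toxic                 # non-toxic rows were sampled to match
n_test = math.ceil(0.30 * n_total)    # test_size=0.30, rounded up
n_train = n_total - n_test            # everything else goes to training

print(n_total, n_train, n_test)  # 90902 63631 27271
```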


```python
print(type(comments_train))
comments_train.head(10)
```

```
<class 'pandas.core.series.Series'>

43081    As the title of the editorial says “the reacti...
10787    They're all guilty, I wish we had the death pe...
65788    The USA has exactly the opposite problem that ...
66265                                       I'm lovin' it.
72267    I agree. We will never tax ourselves into pros...
13200    I wouldn't have known you are black unless you...
44885    A coup .... about bloody time!\n\nRelatives sh...
4514     I say we take the little bitch and his loyalis...
86531    We CANNOT "borrow our way to prosperity" any m...
41724            He is a delusional psychopath. Neohitler.
Name: comment_text, dtype: object
```

```python
print(type(X_train), X_train.shape)
print(len(X_train[0].toarray()[0]))
```

```
<class 'scipy.sparse.csr.csr_matrix'> (63631, 58041)
58041
```


```python
print(len(X_train[0,:].toarray()[0]))
X_train[0,:].toarray()[0]
```

```
58041

array([0, 0, 0, ..., 0, 0, 0], dtype=int64)
```

```python
print(len(X_train[:,0].toarray()))
X_train[:,0].toarray()
```

```
63631

array([[0],
       [0],
       [0],
       ...,
       [0],
       [0],
       [0]], dtype=int64)
```
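In `X_train`, each row is one comment and each column is one vocabulary word. So `X_train[0, :]` holds the word counts for the first comment, and `X_train[:, 0]` holds the counts of the first vocabulary word across all training comments. The toy sketch below rebuilds that layout in plain Python for three made-up comments. It assumes `CountVectorizer`'s default settings: lowercasing, the token pattern `(?u)\b\w\w+\b` (tokens of two or more word characters), and an alphabetically sorted vocabulary. `bag_of_words` is a hypothetical helper, not part of scikit-learn, and it returns a dense list of lists where scikit-learn returns a sparse matrix.

```python
import re

TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")  # CountVectorizer's default token pattern

def tokenize(text):
    """Lowercase, then keep tokens of two or more word characters."""
    return TOKEN_PATTERN.findall(text.lower())

def bag_of_words(docs):
    """Return (vocabulary, dense count matrix): one row per doc, one column per word."""
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    index = {word: i for i, word in enumerate(vocab)}
    matrix = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok in tokenize(doc):
            row[index[tok]] += 1
        matrix.append(row)
    return vocab, matrix

docs = ["Apples are stupid", "I love apples", "Apples, apples, apples!"]
vocab, X = bag_of_words(docs)
print(vocab)                   # ['apples', 'are', 'love', 'stupid']
print(X[0])                    # row 0, counts for the first comment: [1, 1, 0, 1]
print([row[0] for row in X])   # column 0, counts of 'apples' per comment: [1, 1, 3]
```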

Run the next code cell without changes to train a simple model on the training data. The output shows the model's accuracy on the held-out test data (30% of the comments).

```python
from sklearn.linear_model import LogisticRegression

# Train a model and evaluate performance on test dataset
classifier = LogisticRegression(max_iter=2000)
classifier.fit(X_train, y_train)
score = classifier.score(X_test, y_test)
print("Accuracy:", score)

# Function to classify any string
# (the `investigate` argument is accepted but not used in this version)
def classify_string(string, investigate=False):
    prediction = classifier.predict(vectorizer.transform([string]))[0]
    if prediction == 0:
        print("NOT TOXIC:", string)
    else:
        print("TOXIC:", string)
```

```
Accuracy: 0.9294488650947893
```


Roughly 93% of the comments in the test data are classified correctly!
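Accuracy is the fraction of test comments whose predicted label matches the true label. For scikit-learn classifiers, `classifier.score` reports this mean accuracy. The non-toxic comments were sampled to match the toxic ones, so the classes are balanced. A model that always answers "not toxic" would therefore score about 50%, and the 93% figure is best read against that baseline. A plain-Python sketch with made-up labels (`accuracy` is a hypothetical helper):

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

y_true = [1, 0, 1, 0, 1, 0, 1, 0]    # balanced: half toxic, half not
y_model = [1, 0, 1, 0, 1, 0, 0, 0]   # one toxic comment missed
y_always_zero = [0] * len(y_true)    # trivial "never toxic" baseline

print(accuracy(y_true, y_model))        # 0.875
print(accuracy(y_true, y_always_zero))  # 0.5
```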

# 1) Try out the model

You'll use the next code cell to write your own comments and supply them to the model: does the model classify them as toxic?  

1. Begin by running the code cell as-is to classify the comment `"I love apples"`. You should see that it was classified as "NOT TOXIC".

2. Then, try out another comment: `"Apples are stupid"`. To do this, change only `"I love apples"` and leave the rest of the code as-is. Make sure that your comment is enclosed in quotes, as below.
```python
my_comment = "Apples are stupid"
```
3. Try out several comments (not necessarily about apples!) to see how the model performs: does it perform as expected?

```python
# Comment to pass through the model
my_comment = "I love apples"
#my_comment = comments_train.iloc[np.random.randint(0,100,1)[0]]

# Do not change the code below
classify_string(my_comment)
#q_1.check()
```

```
NOT TOXIC: I love apples
```


Once you're done testing comments, we'll move on to understanding how the model makes decisions. Run the next two code cells without changes.

The model assigns a coefficient to each of the roughly 58,000 words in its vocabulary. Higher coefficients denote words that the model considers more toxic. The first code cell builds a table of every word and its coefficient. The second sorts that table and outputs the ten words with the highest coefficients, along with the coefficients themselves.

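```python
# vocabulary_ maps each word to its column index; sorting the words by that
# index lines each word up with its coefficient in classifier.coef_[0]
words = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
coefficients = pd.DataFrame({"word": words, "coeff": classifier.coef_[0]})
coefficients
```

| | word | coeff |
|---|---|---|
| 0 | 00 | 0.431559 |
| 1 | 000 | -0.155165 |
| 2 | 0000000000000000000 | -0.025017 |
| 3 | 00001 | -0.173488 |
| 4 | 0001 | -0.044941 |
| ... | ... | ... |
| 58036 | 𝒑𝒖𝒃𝒍𝒊𝒄𝒍𝒚 | 0.000301 |
| 58037 | 𝒑𝒖𝒓𝒄𝒉𝒂𝒔𝒆 | 0.000301 |
| 58038 | 𝒕𝒉𝒆 | 0.000301 |
| 58039 | 𝒕𝒐 | 0.000301 |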
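The coefficient table relies on how two objects are ordered. `vectorizer.vocabulary_` maps each word to its column index in `X_train`, and `classifier.coef_[0]` holds one weight per column in that same order. Ordering the words by column index therefore pairs each word with its own coefficient. Here is a toy illustration with a hypothetical three-word vocabulary and made-up weights standing in for the real objects:

```python
# Hypothetical stand-ins for vectorizer.vocabulary_ and classifier.coef_[0]
vocabulary = {"stupid": 2, "apples": 0, "love": 1}   # word -> column index
coef = [0.1, -0.8, 9.4]                              # one weight per column

# Order words by column index so position i holds the word for column i
words_by_column = sorted(vocabulary, key=vocabulary.get)
pairs = list(zip(words_by_column, coef))
print(pairs)  # [('apples', 0.1), ('love', -0.8), ('stupid', 9.4)]
```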


```python
coefficients.sort_values(by=["coeff"]).tail(10)
```

| | word | coeff |
|---|---|---|
| 25848 | hypocrite | 6.2186 |
| 16985 | dumb | 6.446921 |
| 12995 | crap | 6.519769 |
| 34285 | moron | 6.626779 |
| 38378 | pathetic | 6.643814 |
| 26015 | idiotic | 6.669001 |
| 49888 | stupidity | 7.503452 |
| 26021 | idiots | 8.549985 |
| 26013 | idiot | 8.637935 |
| 49876 | stupid | 9.369515 |
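These coefficients combine in a simple way, which is why one high-coefficient word can dominate a prediction. For a bag-of-words logistic regression, a comment's score is the intercept plus each vocabulary word's coefficient times its count in the comment. Words outside the vocabulary contribute nothing. The probability of "toxic" is the sigmoid of that score, and scikit-learn's binary `LogisticRegression` predicts class 1 when the score is positive, that is, when the probability exceeds 0.5. The sketch below takes the coefficient of "stupid" from the table above. The intercept (in the real model, `classifier.intercept_`) and the other coefficients are made-up placeholders. `decision_score` and `toxicity_probability` are hypothetical helpers, not scikit-learn's API.

```python
import math
import re

TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")  # same default tokenization as CountVectorizer

# "stupid" comes from the table above; everything else is a made-up placeholder.
COEFFICIENTS = {"stupid": 9.369515, "apples": -0.5, "are": 0.1, "love": -1.2}
INTERCEPT = -0.3

def decision_score(comment):
    """Intercept plus one coefficient per token occurrence (i.e. coefficient * count)."""
    score = INTERCEPT
    for token in TOKEN_PATTERN.findall(comment.lower()):
        score += COEFFICIENTS.get(token, 0.0)  # out-of-vocabulary words contribute 0
    return score

def toxicity_probability(comment):
    """Sigmoid of the decision score."""
    return 1.0 / (1.0 + math.exp(-decision_score(comment)))

for text in ["I love apples", "Apples are stupid"]:
    p = toxicity_probability(text)
    print(f"{text!r}: p(toxic) = {p:.3f} ->", "TOXIC" if p > 0.5 else "NOT TOXIC")
```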


# 2) Most toxic words

Take a look at the most toxic words from the code cell above.  Are you surprised to see any of them?  Are there any words that seem like they should not be in the list?

```python
# Check your answer (Run this code cell to get credit!)
#q_2.check()
```

# 3) A closer investigation

We'll take a closer look at how the model classifies comments.
1. Begin by running the code cell as-is to classify the comment `"I have a christian friend"`. You should see that it was classified as "NOT TOXIC". You can also see the coefficients assigned to the individual words that are in the model's vocabulary. Not every word in the comment will appear: single-character words such as "I" and "a" are ignored by the default tokenizer, so they have no coefficient.
2. Next, try out another comment: `"I have a muslim friend"`. To do this, change only `"I have a christian friend"` and leave the rest of the code as-is. Make sure that your comment is enclosed in quotes, as below.
```python
new_comment = "I have a muslim friend"
```
3. Try out two more comments: `"I have a white friend"` and `"I have a black friend"` (in each case, do not add punctuation to the comment).
4. Feel free to try out more comments to see how the model classifies them.

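```python
# Set the value of new_comment
new_comment = "I have a christian friend"
new_comment = "I have a good friend"  # overrides the line above; the output below is for this comment

# Do not change the code below
classify_string(new_comment)
# Tokenize with the vectorizer's own analyzer (lowercasing, punctuation removal)
# so the lookup matches the words the model actually sees
coefficients[coefficients.word.isin(vectorizer.build_analyzer()(new_comment))]
#q_3.check()
```

```
NOT TOXIC: I have a good friend
```

| | word | coeff |
|---|---|---|
| 21422 | friend | 0.055318 |
| 22755 | good | -0.277616 |
| 24208 | have | -0.066489 |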
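Only "friend", "good", and "have" show up in the table. `CountVectorizer`'s default token pattern, `(?u)\b\w\w+\b`, applied after lowercasing, keeps only tokens of two or more word characters, so "I" and "a" never enter the vocabulary. A plain-Python check of that pattern follows. It mirrors the default settings only; a differently configured vectorizer would tokenize differently, and `default_tokens` is a hypothetical helper.

```python
import re

# CountVectorizer's default token_pattern, applied after lowercasing
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")

def default_tokens(text):
    return TOKEN_PATTERN.findall(text.lower())

print(default_tokens("I have a good friend"))     # ['have', 'good', 'friend']
print(default_tokens("I have a Muslim friend!"))  # ['have', 'muslim', 'friend']
```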


# 4) Identify bias

Do you see any signs of potential bias in the model?  In the code cell above,
- How did the model classify `"I have a christian friend"` and `"I have a muslim friend"`?  
- How did it classify `"I have a white friend"` and `"I have a black friend"`?    

Once you have an answer, run the next code cell.


```python
classify_string("I have a christian friend")
classify_string("I have a muslim friend")
classify_string("I have a white friend")
classify_string("I have a black friend")
```

```
NOT TOXIC: I have a christian friend
TOXIC: I have a muslim friend
NOT TOXIC: I have a white friend
TOXIC: I have a black friend
```
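These four sentences differ only in the identity word, which makes this a simple counterfactual (template) test: hold the sentence fixed, swap one term, and compare the predictions. For a neutral template like this one, an unbiased model would give every term the same label. Below is a minimal sketch of automating the comparison for any classifier exposed as a callable that returns 0 or 1. `identity_swap_report` and the keyword-based `toy_classifier` are hypothetical stand-ins, not the notebook's model or a library API. With the real model, one could pass something like `lambda s: classifier.predict(vectorizer.transform([s]))[0]`.

```python
def identity_swap_report(classify, template, terms):
    """Classify template.format(term) for each term; return all labels and the flagged terms."""
    results = {term: classify(template.format(term)) for term in terms}
    flagged = sorted(term for term, label in results.items() if label == 1)
    return results, flagged

# Toy classifier for illustration only: flags any sentence containing a word
# from a made-up list, mimicking a model that learned spurious associations.
SPURIOUS_WORDS = {"muslim", "black"}

def toy_classifier(sentence):
    return int(any(word in SPURIOUS_WORDS for word in sentence.lower().split()))

results, flagged = identity_swap_report(
    toy_classifier, "I have a {} friend", ["christian", "muslim", "white", "black"]
)
print(results)  # {'christian': 0, 'muslim': 1, 'white': 0, 'black': 1}
print("Terms whose sentence was flagged as toxic:", flagged)
```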
