# Explaining and debugging your models

There are two ways to try to improve your model:

1. Change things randomly
2. Actually think about what you're doing

While the first one is probably faster, let's look a little deeper into how we can explain our models. By looking at incorrect predictions, or by asking the model to predict specific sentences, we can (potentially) see what needs tweaking.

## Read in some data

```python
import pandas as pd
pd.options.display.max_colwidth = 400

df = pd.read_csv("sentiment140-subset.csv", nrows=1000)
df.head(3)
```

| | polarity | text |
|---|---|---|
| 0 | 0 | @kconsidder You never tweet |
| 1 | 0 | Sick today coding from the couch. |
| 2 | 1 | @ChargerJenn Thx for answering so quick,I was afraid I was gonna crash twitter with all the spamming I did 2 RR..sorry bout that |


```python
df.polarity.value_counts()
```

```
1    517
0    483
Name: polarity, dtype: int64
```

## Vectorize our text

First we'll convert our text into word counts. In this case it's TF-IDF adjusted stemmed word counts, but you get the idea.

```python
# Uncomment if you need to install these
# !pip install pystemmer
# !pip install scikit-learn
```

```python
from sklearn.feature_extraction.text import TfidfVectorizer
import Stemmer

# Snowball English stemmer from PyStemmer
stemmer = Stemmer.Stemmer('en')

class StemmedTfidfVectorizer(TfidfVectorizer):
    def build_analyzer(self):
        # Reuse the normal lowercasing/tokenizing analyzer, then stem every token
        analyzer = super().build_analyzer()
        return lambda doc: stemmer.stemWords(analyzer(doc))

vectorizer = StemmedTfidfVectorizer(max_features=300)
matrix = vectorizer.fit_transform(df.text)
matrix
```

```
<1000x300 sparse matrix of type '<class 'numpy.float64'>'
	with 7623 stored elements in Compressed Sparse Row format>
```

## Using a train/test split to measure performance

```python
# X is what we're using to predict (word counts)
# y is what we're predicting (pos/neg)
X = matrix
y = df.polarity
```

When we build the confusion matrix, `normalize='true'` divides each row by the number of tweets that truly belong to that class, so we get proportions (each row sums to 1) instead of raw counts.

```python
from sklearn.svm import LinearSVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

# Split
X_train, X_test, y_train, y_test = train_test_split(X, y)

# Train using a LinearSVC on the training data
# We could also use RandomForestClassifier or anything else
clf = LinearSVC(class_weight='balanced')
clf.fit(X_train, y_train)
```

```python
# Test
y_true = y_test
y_pred = clf.predict(X_test)
# cm = confusion_matrix(y_true, y_pred)  # raw counts instead of proportions
# Named `cm` so we don't overwrite the TF-IDF `matrix` from earlier
cm = confusion_matrix(y_true, y_pred, normalize='true')

# How did it do?
label_names = pd.Series(['negative', 'positive'])
pd.DataFrame(cm,
     columns='Predicted ' + label_names,
     index='Is ' + label_names)
```

| | Predicted negative | Predicted positive |
|---|---|---|
| Is negative | 0.692913 | 0.307087 |
| Is positive | 0.268293 | 0.731707 |
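Here is a toy illustration of what `normalize='true'` changes, using made-up labels rather than the notebook's data. Read the notebook's table the same way: 0.692913 is the share of truly negative tweets that were predicted negative.

```python
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration: 0 = negative, 1 = positive
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 0, 0]

raw = confusion_matrix(y_true, y_pred)
norm = confusion_matrix(y_true, y_pred, normalize='true')

print(raw)   # counts: rows are the true class, columns the predicted class
print(norm)  # each row divided by its row total, so every row sums to 1
```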


## Understanding where it made mistakes

To understand where it made mistakes, we want to both look at the **model overall** as well as **specific instances of decisions it made**.

### Note: Raw prediction scores

Something changed versus what we did in class!

For most classifiers (random forest, naive Bayes, etc.), use what we did in class to get the raw scores:

```python
df['pred_score'] = clf.predict_proba(X)[:, 1]
```

If you're using LinearSVC, don't do the `CalibratedClassifierCV` step and use this code for the raw scores:

```python
df['pred_score'] = clf.decision_function(X)
```

Okay let's go!

> **Why?** In class we *always* got the raw score by using `predict_proba`. That doesn't work by default with LinearSVC, so I used a little trick: combining it with `CalibratedClassifierCV` to get `predict_proba`. That's why we didn't just call `.fit` on the LinearSVC itself.
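The class notebook isn't reproduced here, so this is only a sketch of the trick the note describes, on made-up data. A bare `LinearSVC` has `decision_function` but no `predict_proba`; wrapping it in `CalibratedClassifierCV` adds `predict_proba`. The `cv=2` setting is chosen only because the toy data is tiny.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Tiny made-up corpus, just enough rows for 2-fold calibration
texts = ["love it", "so happy", "great day", "awesome fun",
         "hate it", "so sad", "awful day", "terrible pain"]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
X = TfidfVectorizer().fit_transform(texts)

# Plain LinearSVC: decision_function gives raw, unbounded scores
svc = LinearSVC().fit(X, labels)
raw_scores = svc.decision_function(X)
print(hasattr(svc, "predict_proba"))  # a bare LinearSVC has no predict_proba

# Wrapped in CalibratedClassifierCV: predict_proba becomes available
calibrated = CalibratedClassifierCV(LinearSVC(), cv=2).fit(X, labels)
probs = calibrated.predict_proba(X)[:, 1]  # probability of class 1 (positive)
```

For ranking rows from most negative to most positive, the raw `decision_function` scores are enough, which is why the code below skips calibration.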

```python
# Filter for only the ones we tested on
# (.copy() so adding columns doesn't trigger a SettingWithCopyWarning)
test_subset = df.loc[y_test.index].copy()

# Store the prediction
test_subset['prediction'] = y_pred

# Store the raw prediction score
#     If using LinearSVC, use decision_function:
#         clf.decision_function(X_test)
#     If anything else, you need predict_proba + [:, 1]:
#         clf.predict_proba(X_test)[:, 1]
# We're using a LinearSVC, so we use the first one
# test_subset['pred_score'] = clf.predict_proba(X_test)[:, 1]
test_subset['pred_score'] = clf.decision_function(X_test)
```

### We can look at the rows it predicted incorrectly...

```python
incorrect = test_subset[test_subset.prediction != test_subset.polarity]
incorrect.head(5)
```

| | polarity | text | prediction | pred_score |
|---|---|---|---|---|
| 694 | 0 | @Heaatherrr noooo - if you were here you could borrow mine x | 1 | 0.005805 |
| 520 | 0 | @thrivingivory When are you going on? I def way to see y'all but i might be a little late | 1 | 0.594922 |
| 930 | 1 | @thomiduvigneau Thanks mister now I can also sing with my laptop | 0 | -0.176119 |
| 78 | 0 | @jonskeeetskeeet Dude Tylenol is ineffective with headaches like these I will need a new liver with as may I gota take. | 1 | 0.578899 |
| 13 | 1 | @i_am_girlfriday aw, i'm sure you were absolutely cute with those bangs | 0 | -0.295079 |


### We can look at the highest- and lowest-scoring rows...

You'll probably want more than `.head(3)` when you're looking through things! I just hate scrolling.

It might also be useful to look at the highest/lowest *incorrectly* predicted ones.

```python
# Highest score
test_subset.sort_values(by='pred_score', ascending=False).head(3)
```

| | polarity | text | prediction | pred_score |
|---|---|---|---|---|
| 753 | 1 | @renagades read your blog and posted. | 1 | 1.831807 |
| 124 | 1 | @clarescoffee So excited about your @envirosax bag!!! Can't wait to order one for myself! | 1 | 1.76388 |
| 87 | 1 | @captainnathanj: Hahah! That's awesome. | 1 | 1.567001 |


```python
# Lowest score
test_subset.sort_values(by='pred_score').head(3)
```

| | polarity | text | prediction | pred_score |
|---|---|---|---|---|
| 273 | 0 | got my iphone! only had to stand in line for like 10 minutes. but i still don't have any service | 0 | -2.094725 |
| 822 | 0 | @shadafuxupbitxh u know I would babes.but I have to be some where in like an hour. | 0 | -2.065745 |
| 188 | 0 | I hate packing | 0 | -1.926679 |
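To follow up on looking at the highest/lowest *incorrectly* predicted rows, here is a sketch that ranks mistakes by how far their score is from LinearSVC's decision boundary at 0. The rows are copied from the incorrect-predictions table above, with the `text` column left out to keep it short; the `confidence` column is a name made up for this sketch.

```python
import pandas as pd

# A few rows shaped like `test_subset`, copied from the incorrect predictions above
# (the real DataFrame also has a `text` column)
test_subset = pd.DataFrame(
    {
        "polarity": [0, 0, 1, 0, 1],
        "prediction": [1, 1, 0, 1, 0],
        "pred_score": [0.005805, 0.594922, -0.176119, 0.578899, -0.295079],
    },
    index=[694, 520, 930, 78, 13],
)

incorrect = test_subset[test_subset.prediction != test_subset.polarity]

# LinearSVC predicts positive when the score is above 0,
# so the distance from 0 is a rough measure of how sure the model was
incorrect = incorrect.assign(confidence=incorrect.pred_score.abs())

# Most confidently wrong rows first
most_confident_mistakes = incorrect.sort_values(by="confidence", ascending=False).head(3)
print(most_confident_mistakes)
```

To look at only one kind of mistake, filter first, for example `incorrect[incorrect.prediction == 1]` for negative tweets the model called positive.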


## Explaining the classifier in general

Instead of looking at results, we can use `eli5` to ask **which words the classifier considers important.**

When we pass `top=20` to `show_weights`, it will show us the top 20 most important features (some push in a positive direction, some push in a negative direction).

```python
# !pip install eli5
```

```python
import eli5

eli5.show_weights(clf, vec=vectorizer, top=20)
```



| Weight | Feature |
|---|---|
| +1.863 | awesom |
| +1.458 | pretti |
| +1.447 | haha |
| +1.373 | world |
| +1.329 | final |
| +1.295 | will |
| +1.239 | happi |
| … 133 more positive … | |
| … 147 more negative … | |
| -1.242 | aw |


## Explain individual items

We can also ask `eli5` to predict individual items and explain the decision the classifier made.

Note that when you ask to explain individual items, **green doesn't mean positive**. Green just means, "this contributed to the decision we made." So if it decided that it's a negative tweet, green will mean negative. If it decided it's a positive tweet, green will mean positive.

```python
text = "Sadly I have the worst sickness and haters are hating."
```

A few of these words will get shortened by the StemmedTfidfVectorizer into stemmed versions. This allows us to combine similar words.

|word|stemmed version|
|---|---|
|sadly|sad|
|sickness|sick|
|haters|hater|
|hating|hate|
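To see how overriding `build_analyzer` merges word forms, here is a sketch that follows the same pattern as `StemmedTfidfVectorizer` but swaps PyStemmer for a hypothetical, deliberately crude `toy_stem` function so it runs without extra installs. Real Snowball stemming has many more rules; `toy_stem` and `ToyStemmedTfidfVectorizer` exist only for this illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

def toy_stem(word):
    # Hypothetical, crude stand-in for PyStemmer: strip one of a few suffixes
    for suffix in ("ness", "ly", "ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

class ToyStemmedTfidfVectorizer(TfidfVectorizer):
    def build_analyzer(self):
        analyzer = super().build_analyzer()  # lowercases and tokenizes as usual
        return lambda doc: [toy_stem(w) for w in analyzer(doc)]

vec = ToyStemmedTfidfVectorizer()
vec.fit(["Sadly I have the worst sickness", "I am sad and sick"])

# "sadly"/"sad" and "sickness"/"sick" each collapse into one feature
vocab = sorted(vec.vocabulary_)
print(vocab)
```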

This causes trouble with eli5, though: **the stemmed words don't get highlighted when we ask for an explanation!** I swear this didn't use to be the case, but who knows.

```python
eli5.show_prediction(clf, text, vec=vectorizer)
```

| Contribution | Feature |
|---|---|
| 0.956 | sick |
| 0.944 | sad |
| 0.899 | hate |
| -0.054 | &lt;BIAS&gt; |
| -0.176 | Highlighted in text (sum) |


If we use `force_weights=True`, at least we can see everything in the table. Note that we wouldn't have this issue if we used a normal TfidfVectorizer!

```python
eli5.show_prediction(clf, text, vec=vectorizer, force_weights=True)
```



| Contribution | Feature |
|---|---|
| 0.956 | sick |
| 0.944 | sad |
| 0.899 | hate |
| 0.225 | have |
| -0.054 | &lt;BIAS&gt; |
| -0.091 | the |
| -0.134 | and |
| -0.176 | are |



## What do we do with this?

The cycle will generally look like this:

1. Look at the confusion matrix. Be dissatisfied.
2. Find some incorrect predictions you think it should have gotten correct.
3. Ask eli5 to explain the predictions. What words caused them?
4. Tweak the vectorizer by adding `stop_words`, adjusting `max_features`, or raising/lowering `max_df` or `min_df` (you could also try adding or removing stemming)
5. Re-train, generate another confusion matrix, go to Step 1 until satisfied
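Steps 4 and 5 can be scripted so you compare several vectorizer settings in one go. This sketch uses tiny made-up train/test lists so it runs on its own; the entries in `settings` are just examples of the options listed above, and with real data you would split `df.text`/`df.polarity` instead.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import confusion_matrix
from sklearn.svm import LinearSVC

# Made-up toy data; in the notebook you'd split df.text / df.polarity instead
train_texts = ["love this day", "so happy today", "happy day", "love today",
               "hate this day", "so sad today", "sad day", "hate today"]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]
test_texts = ["happy happy day", "sad sad day"]
test_labels = [1, 0]

# Example vectorizer settings to compare (step 4)
settings = [
    {},
    {"stop_words": "english"},
    {"max_features": 3},
    {"min_df": 2},
]

results = {}
for params in settings:
    vec = TfidfVectorizer(**params)
    clf = LinearSVC(class_weight="balanced")
    clf.fit(vec.fit_transform(train_texts), train_labels)  # re-train (step 5)
    y_pred = clf.predict(vec.transform(test_texts))
    cm = confusion_matrix(test_labels, y_pred, normalize="true")
    # Diagonal = share of each true class predicted correctly: [negative, positive]
    results[str(params)] = cm.diagonal()

for name, recall in results.items():
    print(name, recall)
```

The vectorizer is fit on the training text only, so words that appear only in the test set don't sneak into the vocabulary.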