In [None]:
# Run this every time you open the notebook
%load_ext autoreload
%autoreload 2
from collections import Counter
import lib
import random
import nltk

## Optional Exercise: Add bigram capabilities to the classifier!

So far our Naive Bayes classifier scores an Average F1 score of 66.9% on the test set.
Let's see if we can improve on that by incorporating bigrams!
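
As a reminder, a bigram is a pair of adjacent tokens. The snippet below is a minimal sketch of that idea on a plain Python list, not the reference solution: `tokens_to_bigrams` is a hypothetical helper name, and whether the reference list in `lib` adds start or end markers is not shown here, so treat it only as an illustration of the pairing.

```python
def tokens_to_bigrams(tokens):
    """Return the adjacent token pairs in `tokens` as a list of tuples."""
    # zip pairs each token with its successor; a list with fewer than
    # two tokens produces an empty list.
    return list(zip(tokens, tokens[1:]))


# Toy input, made up for illustration.
example = ["i", "need", "bottled", "water"]
print(tokens_to_bigrams(example))
```

In the exercise itself, the same pairing would be applied to `tweet.tokenList` and the result stored on the tweet as `tweet.bigramList`.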


In [None]:
def add_bigrams(tweet):
    # Currently, tweet has an attribute called tweet.tokenList which is a list of tokens.
    # You want to add a new attribute to tweet called tweet.bigramList which is a list of bigrams.
    # Each bigram should be a pair of strings. You can define the bigram like this: bigram = (token1, token2).
    # In Python, this is called a tuple. You can read more about tuples here: https://www.programiz.com/python-programming/tuple

    ##### YOUR CODE STARTS HERE #####
    
    
    
    ##### YOUR CODE ENDS HERE #####


tweets, test_tweets = lib.read_data()
for tweet in tweets+test_tweets:
    add_bigrams(tweet)
print("Checking if bigrams are correct...")
for tweet in tweets+test_tweets:
    assert tweet._bigramList==tweet.bigramList, "Error in your implementation of the bigram list!"
print("Bigrams are correct.\n")

prior_probs, token_probs = lib.learn_nb(tweets)
predictions = [(tweet, lib.classify_nb(tweet, prior_probs, token_probs)) for tweet in test_tweets]
lib.evaluate(predictions)

## Re-run the classifier and get evaluation score

This notebook uses our implementation of the Naive Bayes classifier, but it's very similar to what you implemented yesterday. If you're interested in the details, take a look at the `learn_nb` and `classify_nb` functions in `lib.py` in the `sailors2017` directory.

In [None]:
tweets, test_tweets = lib.read_data()
prior_probs, token_probs = lib.learn_nb(tweets)
predictions = [(tweet, lib.classify_nb(tweet, prior_probs, token_probs)) for tweet in test_tweets]
lib.evaluate(predictions)

## Inspecting the Classifier

After implementing and training a classifier, you often want to inspect what kind of things it has learned and how it is making predictions on individual examples. This can help you make sure that you implemented everything correctly and it might give you ideas on how to further improve the classifier.

### Most discriminative words

Let's first look again at the most discriminative words for each category, i.e., the words that maximize $P(\text{category} \mid \text{word})$ for each category.

In [None]:
lib.most_discriminative(tweets, token_probs, prior_probs)

These five lists show you which words are most predictive of each of the five categories. For example, the word _bottled_ is a very strong indicator that the tweet is about water, and the word _canned_ is a very strong indicator that the tweet is about food.

Many of you used several of these words in your rule-based classifiers in week 1. It's reassuring (and exciting!) to see that the Naive Bayes classifier learned that these words are good indicators of the categories as well.
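
If you are curious how such a ranking can be computed, here is a minimal sketch based on Bayes' rule: $P(\text{category} \mid \text{word})$ is proportional to $P(\text{word} \mid \text{category}) \cdot P(\text{category})$. It assumes that `prior_probs` maps each category to its prior and that `token_probs` maps each category to a dictionary of token probabilities. That layout, the helper names, and the toy numbers are assumptions for illustration; the actual code in `lib.py` may differ.

```python
def posterior_given_token(token, prior_probs, token_probs):
    """Return {category: P(category | token)} using Bayes' rule."""
    # Unnormalized scores: P(token | category) * P(category).
    joint = {
        c: token_probs[c].get(token, 0.0) * prior_probs[c]
        for c in prior_probs
    }
    total = sum(joint.values())
    if total == 0.0:
        # The token was never seen in any category.
        return {c: 0.0 for c in joint}
    return {c: p / total for c, p in joint.items()}


def most_discriminative_tokens(category, prior_probs, token_probs, k=3):
    """Return the k tokens with the highest P(category | token)."""
    vocab = set()
    for probs in token_probs.values():
        vocab.update(probs)
    ranked = sorted(
        vocab,
        key=lambda t: posterior_given_token(t, prior_probs, token_probs)[category],
        reverse=True,
    )
    return ranked[:k]


# Toy probabilities, made up for illustration only.
toy_priors = {"Water": 0.5, "Food": 0.5}
toy_token_probs = {
    "Water": {"bottled": 0.4, "need": 0.3, "canned": 0.05},
    "Food": {"bottled": 0.1, "need": 0.3, "canned": 0.45},
}
print(most_discriminative_tokens("Water", toy_priors, toy_token_probs, k=1))
```

Notice that a word like _need_, which is equally likely in both toy categories, ends up with a posterior of 0.5 and is therefore not discriminative at all.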


### Confusion matrix

Another useful type of visualization is a so-called confusion matrix. For each true category _c_, a confusion matrix shows you how many of the tweets in _c_ were classified into each of the five categories. (In this way, it tells you which categories the classifier "confuses" with others.)

In [None]:
lib.show_confusion_matrix(predictions)

In the matrix, the **rows** correspond to the **true category** and the **columns** correspond to the **predicted category**.

For example, this matrix shows you that of the 79 tweets in the category _None_, 13 were incorrectly classified as _Energy_, 3 as _Food_, and 1 as _Medical_. The remaining 62 were correctly classified as _None_.
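
The counting behind such a matrix is simple. The following is a minimal sketch, assuming you have a list of (true category, predicted category) pairs; the helper names and the toy labels are made up for illustration and are not how `lib.show_confusion_matrix` is necessarily implemented.

```python
from collections import Counter


def confusion_counts(label_pairs):
    """Count how often each (true, predicted) combination occurs."""
    return Counter(label_pairs)


def print_confusion_matrix(label_pairs, categories):
    """Print rows = true category, columns = predicted category."""
    counts = confusion_counts(label_pairs)
    print("true\\pred".ljust(10) + "".join(c.rjust(8) for c in categories))
    for true in categories:
        row = "".join(str(counts[(true, pred)]).rjust(8) for pred in categories)
        print(true.ljust(10) + row)


# Toy (true, predicted) pairs, made up for illustration only.
toy_pairs = [
    ("Food", "Food"), ("Food", "Water"), ("Water", "Water"),
    ("None", "Food"), ("None", "None"), ("None", "None"),
]
print_confusion_matrix(toy_pairs, ["Food", "Water", "None"])
```

The entries on the diagonal are the correct predictions; everything off the diagonal is an error.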

### Visualizing individual tweets

It can also be really useful to visualize the probabilities of each token in an individual tweet. This can help you understand why a classifier made a correct or wrong prediction. We've implemented a visualization that you can use to inspect how the classifier works on individual tweets.

In [None]:
# The following code visualizes a random tweet from the test data. 
# Hover your mouse over the words!

random_tweet = random.choice(test_tweets)
lib.visualize_tweet(random_tweet, prior_probs, token_probs)

The color of each word tells you for which category $P(\text{token} \mid \text{category})$ is the highest. When you move the mouse over a word, it shows you the actual values of $P(\text{token} \mid \text{category})$ for each category that the classifier uses to make its predictions.
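
To see how these per-token values turn into a prediction, recall that Naive Bayes picks the category with the highest $\log P(\text{category}) + \sum_{\text{token}} \log P(\text{token} \mid \text{category})$. The snippet below is a minimal sketch of that computation. The dictionary layout of `prior_probs` and `token_probs`, the `unseen_prob` fallback for unknown tokens, and the toy numbers are all assumptions for illustration; `lib.classify_nb` may handle these details differently.

```python
import math


def log_scores(tokens, prior_probs, token_probs, unseen_prob=1e-6):
    """Return {category: log P(category) + sum of log P(token | category)}."""
    scores = {}
    for c, prior in prior_probs.items():
        score = math.log(prior)
        for t in tokens:
            # Assumption: unseen tokens get a small fixed probability.
            score += math.log(token_probs[c].get(t, unseen_prob))
        scores[c] = score
    return scores


def predict(tokens, prior_probs, token_probs):
    """Return the category with the highest log score."""
    scores = log_scores(tokens, prior_probs, token_probs)
    return max(scores, key=scores.get)


# Toy probabilities, made up for illustration only.
toy_priors = {"Water": 0.5, "Food": 0.5}
toy_token_probs = {
    "Water": {"bottled": 0.4, "need": 0.3, "canned": 0.05},
    "Food": {"bottled": 0.1, "need": 0.3, "canned": 0.45},
}
print(predict(["need", "bottled"], toy_priors, toy_token_probs))
```

Working with sums of logarithms instead of products of probabilities avoids numerical underflow when a tweet contains many tokens.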

You can also have the classifier make a prediction on your own tweets. Change the text in `my_tweet` below and run the cell to see what the classifier would predict.

In [None]:
my_tweet = "I urgently need some bottled water."

lib.visualize_tweet(lib.Tweet(my_tweet, "?", ""), prior_probs, token_probs)

## Error analysis: Figuring out remaining errors

Often, one wants to know in which scenarios a classifier makes mistakes. This can be really useful when you want to improve your classifier.

In this exercise, try to break the Naive Bayes classifier. Use the cell above and try to come up with a tweet which should be classified as _Food_ but which is assigned a different category. Once you find such a tweet, use the visualization to figure out why the classifier gets this example wrong.

Repeat this exercise for all the other categories. Based on your observations, do you have any ideas on how to further improve the classifier?
