----
Exercise: Text Classification for Sentiment Analysis
===

![](http://choyeski.com/wp-content/uploads/2013/02/movie-review.jpg)


Let's pick up where we left off last session...

Let's refine the best model so far.

----

Calculate the precision on the test data

<br>
<details><summary>
Click here for the precision score...
</summary>
<br>
85.65%
</details>

In [2]:
from sklearn.metrics import precision_score

import os
from urllib.request import urlretrieve

url = "http://www.cs.cornell.edu/People/pabo/movie-review-data/"
filename = "review_polarity.tar.gz"

if not os.path.exists(filename):
    filename, _ = urlretrieve(url+filename, filename)
    
import tarfile

path = "./txt_sentoken/"

if not os.path.exists(path):
    with tarfile.open(filename, "r:gz") as tar:
        tar.extractall()
        
from sklearn.datasets import load_files

# Load data
sentiment = load_files(path, 
                       encoding='utf-8',
                       random_state=42)
sentiment.target_names

from sklearn.model_selection import train_test_split

# Create train/test split with labels
train_data, test_data, train_target, test_target = train_test_split(sentiment.data,
                                                                    sentiment.target,
                                                                    random_state=42)

<class 'sklearn.datasets.base.Bunch'>


In [29]:
from sklearn.feature_extraction.text import CountVectorizer

# Transform train data from a list of strings into a matrix of frequency counts
vectorizer_count = CountVectorizer()
vectorized_count_train_data = vectorizer_count.fit_transform(train_data)

print("There are {:,} words in the vocabulary.".format(len(vectorizer_count.vocabulary_)))
# vocabulary_ maps each word to its column index, not to its count
print("'{}' has column index {:,} in the vocabulary.".format('bacon', vectorizer_count.vocabulary_['bacon']))

from sklearn.naive_bayes import MultinomialNB

# Create an instance of the Naive Bayes class
clf = MultinomialNB()
# Call fit method
clf.fit(vectorized_count_train_data, train_target)

accuracy = clf.score(vectorizer_count.transform(test_data), test_target)

print("The accuracy on the test data is {:.2%}".format(accuracy))

There are 35,350 words in the vocabulary.
'bacon' has column index 2,600 in the vocabulary.
The accuracy on the test data is 82.00%


In [30]:
clf.predict?

In [31]:
X_test = vectorizer_count.transform(test_data)

In [32]:
# Calculate the precision on the test data
y_predict = clf.predict(X_test)

print('Precision: {:.2%}'.format(precision_score(test_target, y_predict)))

Precision: 85.65%


----

Calculate the recall on the test data

<br>
<details><summary>
Click here for the recall score...
</summary>
<br>
78.38%
</details>

In [33]:
from sklearn.metrics import recall_score

# Recall = TP / (TP + FN): the share of truly positive reviews the model catches
print('Recall: {:.2%}'.format(recall_score(test_target, y_predict)))

Recall: 78.38%


Why do you think one is so much higher than the other?

What does that tell us about our model?

In [34]:
recall_score?

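Precision is higher than recall because there are fewer false positives than false negatives. Both metrics share the same true positives (TP), but precision divides by TP + FP while recall divides by TP + FN. So the model makes more type II errors (false negatives) than type I errors (false positives).

In other words, the model too often labels positive reviews as negative: it is pessimistic.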
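To see why precision and recall diverge, it can help to count the four outcomes directly. The sketch below uses a small hypothetical set of labels (not the movie-review data) with 0 = neg and 1 = pos, which matches the alphabetical folder order `load_files` uses for label numbering. The helper `confusion_counts` is our own, written in plain Python for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1  # predicted positive, actually negative (type I error)
        elif truth == positive:
            fn += 1  # predicted negative, actually positive (type II error)
        else:
            tn += 1
    return tp, fp, fn, tn


# Hypothetical toy labels: 5 positive and 5 negative reviews
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision = tp / (tp + fp)  # 3 / (3 + 1) = 0.75
recall = tp / (tp + fn)     # 3 / (3 + 2) = 0.60
print("TP={} FP={} FN={} TN={}".format(tp, fp, fn, tn))
print("precision={:.2f} recall={:.2f}".format(precision, recall))
```

With one false positive but two false negatives, precision (0.75) beats recall (0.60), the same pattern as our Naive Bayes model. On the real data, `sklearn.metrics.confusion_matrix(test_target, y_predict)` returns the same four counts as a 2×2 array laid out `[[TN, FP], [FN, TP]]`.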

----

Calculate the $F_1$ score

<br>
<details><summary>
Click here for the $F_1$ score...
</summary>
<br>
$F_1$ = 81.85%
</details>


In [36]:
from sklearn.metrics import f1_score

# F1 = 2 * precision * recall / (precision + recall)
print('F1 score: {:.2%}'.format(f1_score(test_target, y_predict)))

F1 score: 81.85%


What is the best evaluation metric for your model? Why pick that one?

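The $F_1$ score is a sensible default here: it is the harmonic mean of precision and recall, so it is only high when both are high. Our precision is higher than our recall, so the model makes more false negatives than false positives; it is pessimistic.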
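As a sanity check, $F_1 = 2PR / (P + R)$ can be recomputed by hand from the precision (85.65%) and recall (78.38%) printed above. The helper name `f1_from` is our own.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


f1 = f1_from(0.8565, 0.7838)
print("F1 score: {:.2%}".format(f1))  # prints "F1 score: 81.85%"
```

The harmonic mean lands closer to the smaller of the two values than the arithmetic mean (about 82.0%) would, which is why a large gap between precision and recall pulls $F_1$ down.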

How could you improve your $F_1$ score?

Reduce the number of false positives and false negatives, i.e. raise precision and recall by getting more true positives.

Since most of our errors are false negatives, we could tune the model's hyperparameters or lower the decision threshold so that borderline (neutral) reviews lean towards positive.
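One way to make borderline reviews lean positive is to compare the predicted probability of the positive class against a threshold below 0.5 instead of calling `predict`. The sketch below assumes the probabilities are already available as a list; the helper name and the 0.4 threshold are hypothetical choices for illustration.

```python
def predict_with_threshold(pos_proba, threshold=0.5):
    """Return 1 (pos) where P(pos) >= threshold, else 0 (neg).

    Lowering the threshold below 0.5 turns borderline reviews positive,
    trading some false negatives for false positives.
    """
    return [1 if p >= threshold else 0 for p in pos_proba]


# Hypothetical probabilities for four reviews
print(predict_with_threshold([0.20, 0.45, 0.55, 0.90]))       # [0, 0, 1, 1]
print(predict_with_threshold([0.20, 0.45, 0.55, 0.90], 0.4))  # [0, 1, 1, 1]
```

In the notebook, `clf.predict_proba(X_test)[:, 1]` gives the probability of class 1 (`pos`), because the columns follow `clf.classes_`. The threshold is a hyperparameter, so it should be chosen on held-out training data rather than on the test set. Multinomial Naive Bayes tends to produce probabilities close to 0 or 1, so modest threshold changes may flip only a few predictions.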

----

Let's experiment with another sentiment tool: [TextBlob](https://textblob.readthedocs.org/en/dev/quickstart.html#sentiment-analysis)

It comes pretrained...

In [37]:
from textblob import TextBlob

In [38]:
# Roger Ebert hatin' http://www.rogerebert.com/reviews/north-1994

testimonial = TextBlob("""I have no idea why Rob Reiner, or anyone else, wanted to make this story into a movie, and close examination of the film itself is no help. 
"North" is one of the most unpleasant, contrived, artificial, cloying experiences I've had at the movies. 
To call it manipulative would be inaccurate; it has an ambition to manipulate, but fails""")

testimonial.sentiment

Sentiment(polarity=-0.35, subjectivity=0.7)

In [39]:
testimonial.sentiment.polarity

-0.35

Apply the pretrained TextBlob sentiment analyzer to the movie reviews in the test set.

In [47]:
# Score the raw review text; X_test is a sparse count matrix, not text
X_test_polarity = [TextBlob(item).sentiment.polarity for item in test_data]

In [50]:
import numpy as np

y_pred = np.array([0 if item < 0 else 1 for item in X_test_polarity])

How does it do?

In [53]:
from sklearn.metrics import accuracy_score

print('Accuracy: {:.2%}'.format(accuracy_score(test_target, y_pred)))
print('Precision: {:.2%}'.format(precision_score(test_target, y_pred)))
print('Recall: {:.2%}'.format(recall_score(test_target, y_pred)))
print('F1 score: {:.2%}'.format(f1_score(test_target, y_pred)))

We can also train TextBlob...

In [None]:
train = [
    ('I love this sandwich.', 'pos'),
    ('This is an amazing place!', 'pos'),
    ('I feel very good about these beers.', 'pos'),
    ('This is my best work.', 'pos'),
    ("What an awesome view", 'pos'),
    ('I do not like this restaurant', 'neg'),
    ('I am tired of this stuff.', 'neg'),
    ("I can't deal with this", 'neg'),
    ('He is my sworn enemy!', 'neg'),
    ('My boss is horrible.', 'neg')
]
test = [
    ('The beer was good.', 'pos'),
    ('I do not enjoy my job', 'neg'),
    ("I ain't feeling dandy today.", 'neg'),
    ("I feel amazing!", 'pos'),
    ('Gary is a friend of mine.', 'pos'),
    ("I can't believe I'm doing this.", 'neg')
]

In [None]:
from textblob.classifiers import NaiveBayesClassifier

In [None]:
cl = NaiveBayesClassifier(train)

In [None]:
for review, label in test:
    print(review)
    print("  gold: \t", label)
    print("  predicted: \t", cl.classify(review))
    print()

In [None]:
print("{:.2}".format(cl.accuracy(test)))

In [None]:
cl.classify("Their burgers are amazing")  # "pos"

In [None]:
cl.classify("I don't like their pizza.") # neg

In [None]:
cl.show_informative_features()

Train TextBlob on the movie reviews.

In [None]:
# Load data
sentiment = load_files(path, 
                       encoding='utf-8', # TextBlob needs decoded text (str), so specify the encoding; it is optional for scikit-learn
                       random_state=42)
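TextBlob's `NaiveBayesClassifier` expects a list of `(text, label)` tuples, like the toy `train` list above, whereas `load_files` returns the texts and integer targets separately. A small helper can bridge the two; the name `to_textblob_pairs` is our own and is not part of either library.

```python
def to_textblob_pairs(texts, targets, target_names):
    """Pair each raw text with its string label, e.g. ("a fine film", "pos")."""
    return [(text, target_names[label]) for text, label in zip(texts, targets)]


# Hypothetical example using the load_files label order: 0 = neg, 1 = pos
pairs = to_textblob_pairs(["a dull mess", "a fine film"], [0, 1], ["neg", "pos"])
print(pairs)  # [('a dull mess', 'neg'), ('a fine film', 'pos')]
```

Applied to the notebook's split, this could look like `cl_movies = NaiveBayesClassifier(to_textblob_pairs(train_data, train_target, sentiment.target_names))` followed by `cl_movies.accuracy(to_textblob_pairs(test_data, test_target, sentiment.target_names))`. Expect TextBlob's NLTK-based classifier to train much more slowly on full-length reviews than scikit-learn's `MultinomialNB`, which is worth weighing when choosing a model to deploy.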

How does it do?

Which model would you deploy for a movie review site? Think about performance, speed, and ease of use.

For example, [Rotten Tomatoes](https://www.rottentomatoes.com/m/battlefield_earth/) shows positive and negative reviews.

-----
Challenge exercises
------

Add features for negation

Let's model negation as shown in the videos by adding `NOT_` to every word between a negation word and the next punctuation mark:  
    $e.g.$ `"didn’t like this movie , but I"`  
    $\rightarrow$ `"didn’t NOT_like NOT_this NOT_movie but I"`  
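A minimal sketch of that rule in plain Python, assuming whitespace-tokenized text as in the example above. The negation cue list and the set of punctuation that ends a span are our own assumptions, and unlike the example above this version keeps the punctuation token itself. NLTK ships a similar helper, `nltk.sentiment.util.mark_negation`, which works on a token list and appends `_NEG` instead of prefixing `NOT_`.

```python
import re

# Assumed cue list: "not", "no", "never", "cannot", and any word ending in n't / n’t
NEGATION_RE = re.compile(r"^(?:not|no|never|cannot)$|n['’]t$", re.IGNORECASE)
# A token made only of punctuation ends the negated span
PUNCT_RE = re.compile(r"^[.,!?;:]+$")


def mark_negation(text):
    """Prefix NOT_ to every token between a negation cue and the next punctuation."""
    out = []
    negating = False
    for token in text.split():  # assumes punctuation is already space-separated
        if PUNCT_RE.match(token):
            negating = False
            out.append(token)
        elif negating:
            out.append("NOT_" + token)
        else:
            out.append(token)
            if NEGATION_RE.search(token):
                negating = True
    return " ".join(out)


print(mark_negation("didn’t like this movie , but I"))
# didn’t NOT_like NOT_this NOT_movie , but I
```

To feed this into the earlier bag-of-words model, one option is `CountVectorizer(preprocessor=lambda doc: mark_negation(doc.lower()))`. A custom `preprocessor` replaces the default lowercasing step, which is why the lambda lowercases first. The default `token_pattern` treats `_` as a word character, so `NOT_like` stays a single feature.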

See [Stack Overflow](http://stackoverflow.com/questions/23384351/how-to-add-tags-to-negated-words-in-strings-that-follow-not-no-and-never) for an example of how to do this.

Does adding negation improve your overall performance, as measured by the confusion matrix?

Try other models and hyperparameters.

Can you find a model with a higher F1 score?
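One way to search models and hyperparameters together is a scikit-learn `Pipeline` inside `GridSearchCV`, ranked by cross-validated F1. The sketch below is only a starting point: the choice of logistic regression as an alternative model, the n-gram ranges, and the `alpha`/`C` values are our own assumptions, and `build_search` is a hypothetical helper name.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline


def build_search():
    """Grid-search vectorizer and classifier settings, ranked by F1."""
    pipeline = Pipeline([
        ("vect", CountVectorizer()),
        ("clf", MultinomialNB()),
    ])
    param_grid = [
        # Naive Bayes with different smoothing strengths and n-gram ranges
        {"vect__ngram_range": [(1, 1), (1, 2)],
         "clf": [MultinomialNB()],
         "clf__alpha": [0.1, 0.5, 1.0]},
        # Logistic regression as an alternative model
        {"vect__ngram_range": [(1, 1), (1, 2)],
         "clf": [LogisticRegression(max_iter=1000)],
         "clf__C": [0.1, 1.0, 10.0]},
    ]
    return GridSearchCV(pipeline, param_grid, scoring="f1", cv=5)


search = build_search()
```

In the notebook you would call `search.fit(train_data, train_target)`, inspect `search.best_params_` and `search.best_score_` (the mean cross-validated F1), and only then score the refit best model once on the test set with `f1_score(test_target, search.predict(test_data))`.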

<br>
<br>
---