# Good Movie or Bad Movie?

## good reviews = good movie

If people like a movie, they'll say good things about it.

## bad reviews = bad movie

If people don't like a movie, they'll say bad things about it.

In [1]:
import pandas as pd

movie_reviews = pd.read_csv("movie_reviews.csv")

In [2]:
movie_reviews.sample(5)

| Unnamed: 0 | label | movie_review |
|---|---|---|
| 19949 | 0 | saw this piece of work at a film fest in ca my... |
| 5303 | 1 | a bizarre and brilliant combination of talents... |
| 31481 | 1 | i had never heard of this film when a good fri... |
| 42142 | 0 | coming from kiarostami this art house visual a... |
| 4731 | 1 | the unflappable william powell he is a joy to ... |


# 0 = bad movie
# 1 = good movie

## How can you tell what makes a review good and what makes one bad?

## What are some good review words?

## What are some bad review words?

# Now Teach the Computer!

We're going to make a model that learns what makes a review good and what makes a review bad by reading a lot of reviews that are already labeled good (1) or bad (0).

In [3]:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

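# Turn each review into a row of 0s and 1s: 1 if a word appears in the review, 0 if not
cv = CountVectorizer(binary=True)
X = cv.fit_transform(movie_reviews.movie_review)

# Learn one weight per word; a small C means stronger regularization, which keeps the weights small
final_model = LogisticRegression(C=0.05)
final_model.fit(X, movie_reviews.label)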

LogisticRegression(C=0.05, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
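The cell above imports `train_test_split` and `accuracy_score` but trains on every review. One way to check how well the model does on reviews it has never seen is sketched below. `held_out_accuracy` is a hypothetical helper name, and the settings (25% of reviews held out, `random_state=0`, `stratify` to keep the good/bad mix the same in both piles) are assumptions, not something the original notebook specifies.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def held_out_accuracy(texts, labels, C=0.05, test_size=0.25, random_state=0):
    """Hypothetical helper: train on some reviews, then grade the model on the rest."""
    # Split into a training pile and a testing pile with the same good/bad mix
    train_texts, test_texts, train_labels, test_labels = train_test_split(
        texts, labels, test_size=test_size, random_state=random_state, stratify=labels
    )

    # Learn the vocabulary from the training reviews only, so the test reviews stay unseen
    cv = CountVectorizer(binary=True)
    X_train = cv.fit_transform(train_texts)
    X_test = cv.transform(test_texts)  # words never seen in training are ignored

    model = LogisticRegression(C=C)
    model.fit(X_train, train_labels)

    # Fraction of test reviews the model labels correctly (1.0 = all correct)
    return accuracy_score(test_labels, model.predict(X_test))
```

With the real data this would be called as `held_out_accuracy(movie_reviews.movie_review, movie_reviews.label)`. A score on held-out reviews is a fairer grade than one on the reviews the model already studied.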

# Words the computer thinks are good:

In [4]:
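# Pair each word with the weight the model learned for it
feature_to_coef = {
    word: coef for word, coef in zip(
        cv.get_feature_names_out(), final_model.coef_[0]
    )
}

# The five words with the biggest positive weights
for best_positive in sorted(
    feature_to_coef.items(),
    key=lambda x: x[1],
    reverse=True)[:5]:
    print(best_positive)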

('excellent', 0.98252868392301984)
('perfect', 0.85705986731381534)
('amazing', 0.79860375312126897)
('refreshing', 0.7801779052137443)
('superb', 0.73730132207294796)


# Words the computer thinks are bad:

In [5]:
# The five words with the most negative weights
for best_negative in sorted(
    feature_to_coef.items(),
    key=lambda x: x[1])[:5]:
    print(best_negative)

('worst', -1.5068754883931079)
('waste', -1.3761623901081808)
('awful', -1.1717973879673844)
('disappointment', -0.95984204805465301)
('boring', -0.95111673415175479)


# Let's Write Our Own Review!

Let's write our own movie review and see if the computer knows whether we liked the movie or not.

__Our Review:__

In [6]:
our_reviews = ["infinity war was really really good but it made me cry",
               "i loved when nemo found dory",
               "the star wars prequels were so bad",
               "the percy jackson movies are the worst movies ever",
               "thor ragnarock is a hilarious movie",
               "i loved the first part of the movie but i hated the first part",
               "good bad"]
our_translated_review = cv.transform(our_reviews)

Remember...

# 0 = bad movie
# 1 = good movie

In [7]:
final_model.predict(our_translated_review)

array([1, 1, 0, 0, 1, 1, 0])
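The model only looks at which words are in a review, not the order they come in (this is often called a "bag of words"). That is one reason mixed reviews like "i loved ... but i hated ..." or "good bad" are tricky: the answer just depends on whose word weights add up to more. This small sketch uses two made-up reviews to show that swapping "loved" and "hated" gives the computer the exact same input.

```python
from sklearn.feature_extraction.text import CountVectorizer

# A made-up pair of reviews: same words, different order, opposite meaning
review_a = "i loved the first part but hated the second part"
review_b = "i hated the first part but loved the second part"

toy_cv = CountVectorizer(binary=True)
toy_cv.fit([review_a, review_b])

row_a = toy_cv.transform([review_a]).toarray()
row_b = toy_cv.transform([review_b]).toarray()

# Both reviews become the exact same row of 0s and 1s, so this prints True
print((row_a == row_b).all())
```

Since the rows are identical, any model trained on these 0s and 1s has to give both reviews the same prediction.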