### Bayesian Tomatoes

In this project I'll be analysing movie reviews from Rotten Tomatoes!

Things I'll learn:
* Working with web APIs
* Making and interpreting predictions from a Bayesian perspective
* Using the Naive Bayes algorithm to predict whether a movie review is positive or negative
* Using cross-validation to optimize models
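For reference, this is the standard formulation of the "Bayesian perspective" and the "naive" assumption mentioned above, written for a review made up of words $w_1, \dots, w_n$ and a class $c \in \{\text{Fresh}, \text{Rotten}\}$. Bayes' rule gives the posterior probability of each class:

$$
P(c \mid w_1, \dots, w_n) = \frac{P(w_1, \dots, w_n \mid c)\, P(c)}{P(w_1, \dots, w_n)}
$$

Naive Bayes assumes the words are conditionally independent given the class, so the likelihood factorizes; and because the denominator is the same for both classes, the prediction only needs the numerator:

$$
\hat{c} = \arg\max_{c \in \{\text{Fresh},\, \text{Rotten}\}} \; P(c) \prod_{i=1}^{n} P(w_i \mid c)
$$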

```python
%matplotlib inline
# json and requests: for loading the review data from the web API
import json
import requests
# the usual data-analysis stack
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import rcParams
from cycler import cycler  # installed as a matplotlib dependency

# widen pandas' printed output and show up to 30 columns
pd.set_option('display.width', 500)
pd.set_option('display.max_columns', 30)

# plotting defaults
# these colors come from colorbrewer2.org (the "Dark2" palette); each is an RGB triplet
dark2_colors = [(0.10588235294117647, 0.6196078431372549, 0.4666666666666667),
                (0.8509803921568627, 0.37254901960784315, 0.00784313725490196),
                (0.4588235294117647, 0.4392156862745098, 0.7019607843137254),
                (0.9058823529411765, 0.1607843137254902, 0.5411764705882353),
                (0.4, 0.6509803921568628, 0.11764705882352941),
                (0.9019607843137255, 0.6705882352941176, 0.00784313725490196),
                (0.6509803921568628, 0.4627450980392157, 0.11372549019607843),
                (0.4, 0.4, 0.4)]
rcParams['figure.figsize'] = (10, 6)
rcParams['figure.dpi'] = 150
# 'axes.color_cycle' was deprecated in matplotlib 1.5 and later removed;
# 'axes.prop_cycle' with a cycler is its replacement
rcParams['axes.prop_cycle'] = cycler(color=dark2_colors)
rcParams['lines.linewidth'] = 2
rcParams['axes.grid'] = False
rcParams['axes.facecolor'] = 'white'
rcParams['font.size'] = 14
rcParams['patch.edgecolor'] = 'none'
```

```python
def remove_border(axis=None, top=False, right=False, left=True, bottom=True):
    """
    Minimize chartjunk by stripping out unnecessary plot borders and axis ticks.

    The top/right/left/bottom keywords toggle whether the corresponding plot border is drawn.
    """
    ax = axis if axis is not None else plt.gca()
    ax.spines['top'].set_visible(top)
    ax.spines['right'].set_visible(right)
    ax.spines['left'].set_visible(left)
    ax.spines['bottom'].set_visible(bottom)

    # turn off all ticks
    ax.yaxis.set_ticks_position('none')
    ax.xaxis.set_ticks_position('none')

    # now re-enable ticks only on the borders that are still drawn
    if top:
        ax.xaxis.tick_top()
    if bottom:
        ax.xaxis.tick_bottom()
    if left:
        ax.yaxis.tick_left()
    if right:
        ax.yaxis.tick_right()
```

## Introduction

Rotten Tomatoes gathers movie reviews from critics. An entry on the website typically consists of a short quote, a link to the full review, and a Fresh/Rotten classification which summarizes whether the critic liked/disliked the movie.

When critics give quantitative ratings (say, 3/4 stars or a thumbs-up), determining the Fresh/Rotten classification is easy. However, publications like the New York Times don't assign numerical ratings to movies, so the Fresh/Rotten classification must be inferred from the text of the review itself.
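To see why a numeric rating makes the label easy, here is a minimal sketch that normalizes the score and applies a cutoff. The function `fresh_or_rotten` and the 60% threshold are hypothetical, chosen for illustration rather than taken from Rotten Tomatoes' actual rules.

```python
def fresh_or_rotten(score, max_score, cutoff=0.6):
    """Classify a numeric rating as 'Fresh' or 'Rotten'.

    The 0.6 cutoff is a hypothetical choice for illustration, not
    Rotten Tomatoes' actual rule.
    """
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    # Normalize to [0, 1] so 3/4 stars and 6/10 are on the same scale.
    return "Fresh" if score / max_score >= cutoff else "Rotten"


print(fresh_or_rotten(3, 4))  # 3/4 stars -> Fresh
print(fresh_or_rotten(2, 5))  # 2/5 stars -> Rotten
print(fresh_or_rotten(1, 1))  # a thumbs-up encoded as 1/1 -> Fresh
```

A review with no score at all gives a function like this nothing to work with, which is exactly the case that calls for classifying the text.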

This basic task of categorizing text has many applications. All of the following questions boil down to text classification:
* Is a movie review positive or negative?
* Is an email spam, or not?
* Is a comment on a blog discussion board appropriate, or not?
* Is a tweet about your company positive, or not?

Natural language processing (NLP) is an entire field dedicated to problems like these, but in this project I'll use a relatively simple and straightforward technique: the Naive Bayes classifier.
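As a rough picture of how a Naive Bayes text classifier works, here is a minimal from-scratch sketch of multinomial Naive Bayes. It assumes lowercase whitespace tokenization and add-one (Laplace) smoothing; the function names `train_naive_bayes` and `predict` and the toy reviews are hypothetical, and this is not necessarily how the classifier is built later in the project. Scores are summed in log space so that multiplying many small word probabilities doesn't underflow.

```python
import math
from collections import Counter


def train_naive_bayes(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model on lowercase, whitespace-split text.

    Returns the sorted class labels, log class priors log P(c), and
    add-alpha smoothed log likelihoods log P(w | c) over the training vocabulary.
    """
    classes = sorted(set(labels))
    vocab = {w for doc in docs for w in doc.lower().split()}
    log_prior, log_likelihood = {}, {}
    for c in classes:
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        # Count every word occurrence across all documents of this class.
        counts = Counter(w for d in class_docs for w in d.lower().split())
        total = sum(counts.values())
        # Laplace (add-alpha) smoothing: words unseen in this class still get
        # a small nonzero probability instead of zeroing out the product.
        log_likelihood[c] = {
            w: math.log((counts[w] + alpha) / (total + alpha * len(vocab)))
            for w in vocab
        }
    return classes, log_prior, log_likelihood


def predict(text, classes, log_prior, log_likelihood):
    """Return the class maximizing log P(c) + sum_i log P(w_i | c)."""
    scores = {}
    for c in classes:
        score = log_prior[c]
        for w in text.lower().split():
            # Words outside the training vocabulary carry no evidence; skip them.
            if w in log_likelihood[c]:
                score += log_likelihood[c][w]
        scores[c] = score
    return max(scores, key=scores.get)


# Hypothetical toy "reviews", not real Rotten Tomatoes data.
docs = [
    "a moving and brilliant film",
    "brilliant acting and a great script",
    "dull plot and terrible pacing",
    "a boring and terrible mess",
]
labels = ["fresh", "fresh", "rotten", "rotten"]

model = train_naive_bayes(docs, labels)
print(predict("brilliant and great", *model))  # fresh
print(predict("terrible and boring", *model))  # rotten
```

Because the evidence term $P(\text{review})$ is identical for every class, comparing the unnormalized log scores is enough to pick a label. A real pipeline would also strip punctuation and would more likely use a library implementation such as scikit-learn's `MultinomialNB`, which performs the same smoothed computation on a word-count matrix.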

### Getting the data from the API