# Sentiment Analysis Web App

_Deep Learning Nanodegree Program | Deployment_

---

In this notebook we will use Amazon's SageMaker service to construct a gradient-boosted tree model (XGBoost) to predict the sentiment of a movie review. In addition, we will deploy this model to an endpoint and construct a very simple web app which will interact with our model's deployed endpoint.

## General Outline

Typically, when using a notebook instance with SageMaker, you will proceed through the following steps. Of course, not every step will need to be done with each project. Also, there is quite a lot of room for variation in many of the steps, as you will see throughout these lessons.

1. Download or otherwise retrieve the data.
2. Process / Prepare the data.
3. Upload the processed data to S3.
4. Train a chosen model.
5. Test the trained model (typically using a batch transform job).
6. Deploy the trained model.
7. Use the deployed model.

In this notebook we will progress through each of the steps above. We will also see that the final step, using the deployed model, can be quite challenging.

## Step 1: Downloading the data

The dataset we are going to use is very popular among researchers in Natural Language Processing, usually referred to as the [IMDb dataset](http://ai.stanford.edu/~amaas/data/sentiment/). It consists of movie reviews from the website [imdb.com](http://www.imdb.com/), each labeled as either '**pos**itive', if the reviewer enjoyed the film, or '**neg**ative' otherwise.

> Maas, Andrew L., et al. [Learning Word Vectors for Sentiment Analysis](http://ai.stanford.edu/~amaas/data/sentiment/). In _Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies_. Association for Computational Linguistics, 2011.

We begin by using some Jupyter Notebook magic to download and extract the dataset.

In [1]:
%mkdir ../data
!wget -O ../data/aclImdb_v1.tar.gz http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz
!tar -zxf ../data/aclImdb_v1.tar.gz -C ../data

mkdir: cannot create directory ‘../data’: File exists
--2020-05-08 19:13:05--  http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz
Resolving ai.stanford.edu (ai.stanford.edu)... 171.64.68.10
Connecting to ai.stanford.edu (ai.stanford.edu)|171.64.68.10|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 84125825 (80M) [application/x-gzip]
Saving to: ‘../data/aclImdb_v1.tar.gz’


2020-05-08 19:13:09 (20.2 MB/s) - ‘../data/aclImdb_v1.tar.gz’ saved [84125825/84125825]



## Step 2: Preparing and Processing the data

The data we have downloaded is split into various files, each of which contains a single review. It will be much easier going forward if we combine these individual files into two large files, one for training and one for testing.

In [2]:
import os
import glob

def read_imdb_data(data_dir='../data/aclImdb'):
    data = {}
    labels = {}
    
    for data_type in ['train', 'test']:
        data[data_type] = {}
        labels[data_type] = {}
        
        for sentiment in ['pos', 'neg']:
            data[data_type][sentiment] = []
            labels[data_type][sentiment] = []
            
            path = os.path.join(data_dir, data_type, sentiment, '*.txt')
            files = glob.glob(path)
            
            for f in files:
                with open(f) as review:
                    data[data_type][sentiment].append(review.read())
                    # Here we represent a positive review by '1' and a negative review by '0'
                    labels[data_type][sentiment].append(1 if sentiment == 'pos' else 0)
                    
            assert len(data[data_type][sentiment]) == len(labels[data_type][sentiment]), \
                    "{}/{} data size does not match labels size".format(data_type, sentiment)
                
    return data, labels

In [3]:
data, labels = read_imdb_data()
print("IMDB reviews: train = {} pos / {} neg, test = {} pos / {} neg".format(
            len(data['train']['pos']), len(data['train']['neg']),
            len(data['test']['pos']), len(data['test']['neg'])))

IMDB reviews: train = 12500 pos / 12500 neg, test = 12500 pos / 12500 neg


In [4]:
from sklearn.utils import shuffle

def prepare_imdb_data(data, labels):
    """Prepare training and test sets from IMDb movie reviews."""
    
    #Combine positive and negative reviews and labels
    data_train = data['train']['pos'] + data['train']['neg']
    data_test = data['test']['pos'] + data['test']['neg']
    labels_train = labels['train']['pos'] + labels['train']['neg']
    labels_test = labels['test']['pos'] + labels['test']['neg']
    
    #Shuffle reviews and corresponding labels within training and test sets
    data_train, labels_train = shuffle(data_train, labels_train)
    data_test, labels_test = shuffle(data_test, labels_test)
    
    # Return unified training data, test data, training labels, test labels
    return data_train, data_test, labels_train, labels_test

In [5]:
train_X, test_X, train_y, test_y = prepare_imdb_data(data, labels)
print("IMDb reviews (combined): train = {}, test = {}".format(len(train_X), len(test_X)))

IMDb reviews (combined): train = 25000, test = 25000


In [6]:
train_X[100]

'Loyalty to Peter Falk is all that kept me from giving this awful picture the (1) it deserved. (For that matter, loyalty to Mr. Falk was what kept me watching this film all the way from heads to tails.) Even if you forgive all the glaring errors, this was just plain the poorest excuse for a made-for-TV "Columbo" film ever. I\'m glad I watched it on TV for free; would have hated to have coughed up the bucks for a print.'

## Processing the data

Now that we have our training and testing datasets merged and ready to use, we need to start processing the raw data into something that will be usable by our machine learning algorithm. To begin with, we remove any HTML formatting and any non-alphanumeric characters that may appear in the reviews. We will do this in a very simplistic way using Python's regular expression module. We will discuss the reason for this rather simplistic pre-processing later on.

In [7]:
import re

# Raw strings keep the backslashes intact and avoid invalid-escape warnings
REPLACE_NO_SPACE = re.compile(r"(\.)|(\;)|(\:)|(\!)|(\')|(\?)|(\,)|(\")|(\()|(\))|(\[)|(\])")
REPLACE_WITH_SPACE = re.compile(r"(<br\s*/><br\s*/>)|(\-)|(\/)")

def review_to_words(review):
    words = REPLACE_NO_SPACE.sub("", review.lower())
    words = REPLACE_WITH_SPACE.sub(" ", words)
    return words

In [8]:
review_to_words(train_X[100])

'loyalty to peter falk is all that kept me from giving this awful picture the 1 it deserved for that matter loyalty to mr falk was what kept me watching this film all the way from heads to tails even if you forgive all the glaring errors this was just plain the poorest excuse for a made for tv columbo film ever im glad i watched it on tv for free would have hated to have coughed up the bucks for a print'

In [9]:
import pickle

cache_dir = os.path.join("../cache", "sentiment_web_app")  # where to store cache files
os.makedirs(cache_dir, exist_ok=True)  # ensure cache directory exists

def preprocess_data(data_train, data_test, labels_train, labels_test,
                    cache_dir=cache_dir, cache_file="preprocessed_data.pkl"):
    """Convert each review to words; read from cache if available."""

    # If cache_file is not None, try to read from it first
    cache_data = None
    if cache_file is not None:
        try:
            with open(os.path.join(cache_dir, cache_file), "rb") as f:
                cache_data = pickle.load(f)
            print("Read preprocessed data from cache file:", cache_file)
        except Exception:
            pass  # unable to read from cache, but that's okay
    
    # If cache is missing, then do the heavy lifting
    if cache_data is None:
        # Preprocess training and test data to obtain words for each review
        #words_train = list(map(review_to_words, data_train))
        #words_test = list(map(review_to_words, data_test))
        words_train = [review_to_words(review) for review in data_train]
        words_test = [review_to_words(review) for review in data_test]
        
        # Write to cache file for future runs
        if cache_file is not None:
            cache_data = dict(words_train=words_train, words_test=words_test,
                              labels_train=labels_train, labels_test=labels_test)
            with open(os.path.join(cache_dir, cache_file), "wb") as f:
                pickle.dump(cache_data, f)
            print("Wrote preprocessed data to cache file:", cache_file)
    else:
        # Unpack data loaded from cache file
        words_train, words_test, labels_train, labels_test = (cache_data['words_train'],
                cache_data['words_test'], cache_data['labels_train'], cache_data['labels_test'])
    
    return words_train, words_test, labels_train, labels_test

In [10]:
# Preprocess data
train_X, test_X, train_y, test_y = preprocess_data(train_X, test_X, train_y, test_y)

Wrote preprocessed data to cache file: preprocessed_data.pkl


### Extract Bag-of-Words features

For the model we will be implementing, rather than using the reviews directly, we are going to transform each review into a Bag-of-Words feature representation. Keep in mind that 'in the wild' we will only have access to the training set, so our transformer must construct its vocabulary from the training set alone and then apply it unchanged to the test set.
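To make the "training set only" constraint concrete, here is a minimal pure-Python sketch. It is not how `CountVectorizer` is implemented (its tokenization differs), and `fit_vocabulary` and the toy documents are hypothetical names introduced only for illustration. The vocabulary is built from the training documents and capped at a fixed size, so any word that appears only in the test documents, or that falls below the cap, has no column.

```python
from collections import Counter

def fit_vocabulary(train_docs, vocabulary_size):
    """Map the most frequent training words to column indices."""
    counts = Counter(word for doc in train_docs for word in doc.split())
    kept = [word for word, _ in counts.most_common(vocabulary_size)]
    # Sort alphabetically so that the column indices are deterministic
    return {word: index for index, word in enumerate(sorted(kept))}

# Toy data (hypothetical): only the training documents shape the vocabulary
toy_train = ["great film great cast", "awful film great"]
toy_test = ["great plot awful pacing"]

toy_vocabulary = fit_vocabulary(toy_train, vocabulary_size=2)

# Test words without a column: never seen in training, or cut by the size cap
unseen = [word for word in toy_test[0].split() if word not in toy_vocabulary]

print(toy_vocabulary)
print(unseen)
```

Here "plot" and "pacing" never occur in the training documents, while "awful" does occur but is dropped by the size cap; all three would be ignored when encoding the test document.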

In [11]:
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
import joblib  # sklearn.externals.joblib has been removed from recent scikit-learn releases
# joblib is an enhanced version of pickle that is more efficient for storing NumPy arrays

def extract_BoW_features(words_train, words_test, vocabulary_size=5000,
                         cache_dir=cache_dir, cache_file="bow_features.pkl"):
    """Extract Bag-of-Words for a given set of documents, already preprocessed into words."""
    
    # If cache_file is not None, try to read from it first
    cache_data = None
    if cache_file is not None:
        try:
            with open(os.path.join(cache_dir, cache_file), "rb") as f:
                cache_data = joblib.load(f)
            print("Read features from cache file:", cache_file)
        except Exception:
            pass  # unable to read from cache, but that's okay
    
    # If cache is missing, then do the heavy lifting
    if cache_data is None:
        # Fit a vectorizer to training documents and use it to transform them
        # NOTE: Each training document is already a cleaned, lower-cased string;
        #       CountVectorizer's default tokenizer splits it into words.
        vectorizer = CountVectorizer(max_features=vocabulary_size)
        features_train = vectorizer.fit_transform(words_train).toarray()

        # Apply the same vectorizer to transform the test documents (ignore unknown words)
        features_test = vectorizer.transform(words_test).toarray()
        
        # NOTE: .toarray() converts the sparse matrices returned by the vectorizer into dense NumPy arrays
        
        # Write to cache file for future runs (store vocabulary as well)
        if cache_file is not None:
            vocabulary = vectorizer.vocabulary_
            cache_data = dict(features_train=features_train, features_test=features_test,
                             vocabulary=vocabulary)
            with open(os.path.join(cache_dir, cache_file), "wb") as f:
                joblib.dump(cache_data, f)
            print("Wrote features to cache file:", cache_file)
    else:
        # Unpack data loaded from cache file
        features_train, features_test, vocabulary = (cache_data['features_train'],
                cache_data['features_test'], cache_data['vocabulary'])
    
    # Return both the extracted features as well as the vocabulary
    return features_train, features_test, vocabulary

In [12]:
# Extract Bag of Words features for both training and test datasets
train_X, test_X, vocabulary = extract_BoW_features(train_X, test_X)

Wrote features to cache file: bow_features.pkl


In [13]:
len(train_X[100])

5000

## Step 3: Upload data to S3

Now that we have created the feature representation of our training (and testing) data, it is time to start setting up and using the XGBoost classifier provided by SageMaker.

### Writing the datasets

The XGBoost classifier that we will be using requires the dataset to be written to a file and stored using Amazon S3. To do this, we will start by splitting the training dataset into two parts: the data we will train the model with and a validation set. Then, we will write those datasets to files locally and upload the files to S3. In addition, we will write the test set to a file and upload that file to S3. This is so that we can use SageMaker's Batch Transform functionality to test our model once we've fit it.

In [14]:
import pandas as pd

# Earlier we shuffled the training dataset, so to keep things simple we can just assign
# the first 10,000 reviews to the validation set and use the remaining reviews for training.
val_X = pd.DataFrame(train_X[:10000])
train_X = pd.DataFrame(train_X[10000:])

val_y = pd.DataFrame(train_y[:10000])
train_y = pd.DataFrame(train_y[10000:])

According to the documentation for the XGBoost algorithm in SageMaker, the training and validation datasets must contain no headers or index, and the label must come first in each sample.

For more information about this and other algorithms, the SageMaker developer documentation can be found on __[Amazon's website.](https://docs.aws.amazon.com/sagemaker/latest/dg/)__
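As a small illustration of that layout, the sketch below writes two toy samples in the "label first, no header, no index" form using only the standard library. The helper `to_label_first_csv` and the toy values are hypothetical; the notebook itself produces the same layout with `pd.concat` and `to_csv`.

```python
import csv
import io

def to_label_first_csv(labels, features):
    """Serialize samples as CSV rows of the form: label, feature_0, feature_1, ..."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for label, row in zip(labels, features):
        writer.writerow([label] + list(row))  # the label goes in the first column
    return buffer.getvalue()

# Toy data (hypothetical): two samples with three word counts each
toy_labels = [1, 0]
toy_features = [[0, 2, 1], [3, 0, 0]]

csv_text = to_label_first_csv(toy_labels, toy_features)
print(csv_text)
```

Each output line starts with the label and is followed by the word counts, with no header row and no index column.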

In [15]:
# First we make sure that the local directory in which we'd like to store the training and validation csv files exists.
data_dir = '../data/sentiment_web_app'
if not os.path.exists(data_dir):
    os.makedirs(data_dir)

In [16]:
pd.DataFrame(test_X).to_csv(os.path.join(data_dir, 'test.csv'), header=False, index=False)

pd.concat([val_y, val_X], axis=1).to_csv(os.path.join(data_dir, 'validation.csv'), header=False, index=False)
pd.concat([train_y, train_X], axis=1).to_csv(os.path.join(data_dir, 'train.csv'), header=False, index=False)

In [17]:
# To save a bit of memory we can set test_X, train_X, val_X, train_y and val_y to None.

test_X = train_X = val_X = train_y = val_y = None

### Uploading Training / Validation files to S3

Amazon's S3 service allows us to store files that can be accessed by both the built-in training models such as the XGBoost model we will be using as well as custom models such as the one we will see a little later.

For this and most other tasks we will be doing using SageMaker, there are two methods we could use. The first is to use the low level functionality of SageMaker which requires knowing each of the objects involved in the SageMaker environment. The second is to use the high level functionality in which certain choices have been made on the user's behalf. The low level approach benefits from allowing the user a great deal of flexibility while the high level approach makes development much quicker. For our purposes we will opt to use the high level approach although using the low-level approach is certainly an option.

Recall the method `upload_data()` which is a member of the object representing our current SageMaker session. What this method does is upload the data to the default bucket (which is created if it does not exist) into the path described by the key_prefix variable. To see this for yourself, once you have uploaded the data files, go to the S3 console and look to see where the files have been uploaded.
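To get a feel for where a file should end up, the sketch below builds the S3 location we would expect for a single uploaded file. It assumes the returned location has the form `s3://{bucket}/{key_prefix}/{file name}`; the helper `expected_s3_location` and the bucket name `example-bucket` are hypothetical, and nothing here talks to AWS.

```python
import os

def expected_s3_location(bucket, key_prefix, local_path):
    """Build the S3 URI we would expect for one uploaded file (assumed format)."""
    file_name = os.path.basename(local_path)
    return "s3://{}/{}/{}".format(bucket, key_prefix, file_name)

# Hypothetical bucket name; the real default bucket depends on region and account
toy_location = expected_s3_location(
    "example-bucket", "sentiment-web-app", "../data/sentiment_web_app/train.csv")
print(toy_location)
```

Comparing a value like this with what the S3 console shows is a quick way to confirm that the prefix is being applied as intended.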

For additional resources, see the __[SageMaker API documentation](http://sagemaker.readthedocs.io/en/latest/)__ and in addition the __[SageMaker Developer Guide.](https://docs.aws.amazon.com/sagemaker/latest/dg/)__

In [18]:
import sagemaker

session = sagemaker.Session() # Store the current SageMaker session

# S3 prefix (which folder will we use)
prefix = 'sentiment-web-app'

test_location = session.upload_data(os.path.join(data_dir, 'test.csv'), key_prefix=prefix)
val_location = session.upload_data(os.path.join(data_dir, 'validation.csv'), key_prefix=prefix)
train_location = session.upload_data(os.path.join(data_dir, 'train.csv'), key_prefix=prefix)

## Step 4: Creating the XGBoost model

Now that the data has been uploaded it is time to create the XGBoost model. To begin with, we need to do some setup. At this point it is worth discussing what a model is in SageMaker. It is easiest to think of a model as comprising three different objects in the SageMaker ecosystem, which interact with one another.

- Model Artifacts
- Training Code (Container)
- Inference Code (Container)

The Model Artifacts are what you might think of as the actual model itself. For example, if you were building a neural network, the model artifacts would be the weights of the various layers. In our case, for an XGBoost model, the artifacts are the actual trees that are created during training.

The other two objects, the training code and the inference code, are then used to create and use the model artifacts. More precisely, the training code uses the training data that is provided and creates the model artifacts, while the inference code uses the model artifacts to make predictions on new data.

The way that SageMaker runs the training and inference code is by making use of Docker containers. For now, think of a container as being a way of packaging code up so that dependencies aren't an issue.
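As a toy illustration of this three-way split (and not a description of how SageMaker or XGBoost works internally), the sketch below uses a one-feature threshold "model". The names `toy_train` and `toy_infer` are hypothetical: the first plays the role of the training code and returns a serialized artifact, the second plays the role of the inference code and only ever sees that artifact.

```python
import json

def toy_train(samples):
    """'Training code': learn a threshold on one feature and return it as an artifact."""
    positives = [x for x, y in samples if y == 1]
    negatives = [x for x, y in samples if y == 0]
    threshold = (min(positives) + max(negatives)) / 2
    return json.dumps({"threshold": threshold})  # the serialized 'model artifact'

def toy_infer(artifact, x):
    """'Inference code': load the artifact and use it to label a new value."""
    model = json.loads(artifact)
    return 1 if x > model["threshold"] else 0

# Toy data (hypothetical): (feature value, label) pairs
toy_samples = [(1.0, 0), (2.0, 0), (4.0, 1), (5.0, 1)]
toy_artifact = toy_train(toy_samples)

print(toy_artifact)
print(toy_infer(toy_artifact, 4.5), toy_infer(toy_artifact, 0.5))
```

Because the inference function depends only on the artifact, it can run somewhere other than where training happened, which is the property that makes separate training and inference containers possible.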

In [19]:
from sagemaker import get_execution_role

# Our current execution role is required when creating the model as the training
# and inference code will need to access the model artifacts.
role = get_execution_role()

In [21]:
# We need to retrieve the location of the container which is provided by Amazon for using XGBoost.
# As a matter of convenience, the training and inference code both use the same container.
from sagemaker.amazon.amazon_estimator import get_image_uri

container = get_image_uri(session.boto_region_name, 'xgboost', repo_version = '0.90-2')

In [22]:
# First we create a SageMaker estimator object for our model.
xgb = sagemaker.estimator.Estimator(container, # The location of the container we wish to use
                                    role,                                    # What is our current IAM Role
                                    train_instance_count=1,                  # How many compute instances
                                    train_instance_type='ml.m4.xlarge',      # What kind of compute instances
                                    output_path='s3://{}/{}/output'.format(session.default_bucket(), prefix),
                                    sagemaker_session=session)

# And then set the algorithm specific parameters.
xgb.set_hyperparameters(max_depth=5,
                        eta=0.2,
                        gamma=4,
                        min_child_weight=6,
                        subsample=0.8,
                        silent=0,
                        objective='binary:logistic',
                        early_stopping_rounds=10,
                        num_round=500)

### Fit the XGBoost model

Now that our model has been set up we simply need to attach the training and validation datasets and then ask SageMaker to set up the computation.

In [23]:
s3_input_train = sagemaker.s3_input(s3_data=train_location, content_type='csv')
s3_input_validation = sagemaker.s3_input(s3_data=val_location, content_type='csv')

In [24]:
xgb.fit({'train': s3_input_train, 'validation': s3_input_validation})

2020-05-08 19:29:55 Starting - Starting the training job...
2020-05-08 19:29:57 Starting - Launching requested ML instances............
2020-05-08 19:31:59 Starting - Preparing the instances for training...
2020-05-08 19:32:51 Downloading - Downloading input data...
2020-05-08 19:33:11 Training - Downloading the training image..
2020-05-08 19:33:39 Training - Training image download completed. Training in progress.
INFO:sagemaker-containers:Imported framework sagemaker_xgboost_container.training
INFO:sagemaker-containers:Failed to parse hyperparameter objective value binary:logistic to Json.
Returning the value itself
INFO:sagemaker-containers:No GPUs detected (normal if no gpus installed)
INFO:sagemaker_xgboost_container.training:Running XGBoost Sagemaker in algorithm mode
INFO:root:Determined delimiter of CSV input is ','
INFO:root:Determined delimiter of CSV input is ','
INFO:root:Determined delimiter of CSV input is

[112]  train-error:0.094467  validation-error:0.1503
[113]  train-error:0.0938  validation-error:0.1504
[114]  train-error:0.0934  validation-error:0.1517
[115]  train-error:0.0928  validation-error:0.1513
[116]  train-error:0.0924  validation-error:0.1509
[117]  train-error:0.092333  validation-error:0.1504
[118]  train-error:0.0922  validation-error:0.1498
[119]  train-error:0.092133  validation-error:0.1494
[120]  train-error:0.092067  validation-error:0.1494
[121]  train-error:0.090867  validation-error:0.1486
[122]  train-error:0.090667  validation-error:0.1489
[123]  train-error:0.090067  validation-error:0.1489
[124]  train-error:0.089733  validation-error:0.1482
[125]  train-error:0.0896  validation-error:0.1486
[126]  train-error:0.089067  validation-error:0.1481
[127]  train-er

## Step 5: Testing the model

Now that we've fit our XGBoost model, it's time to see how well it performs. To do this we will use SageMaker's Batch Transform functionality. Batch Transform is a convenient way to perform inference on a large dataset when the results are not needed in real time. That is, we don't necessarily need to use our model's results immediately and instead we can perform inference on a large number of samples. An example of this in industry might be producing an end-of-month report. This method of inference is also useful to us because it means that we can perform inference on our entire test set.

To perform a Batch Transformation we first need to create a transformer object from our trained estimator object.

In [25]:
xgb_transformer = xgb.transformer(instance_count = 1, instance_type = 'ml.m4.xlarge')

Next we actually perform the transform job. When doing so we need to make sure to specify the type of data we are sending so that it is serialized correctly in the background. In our case we are providing our model with csv data so we specify `text/csv`. Also, if the test data that we have provided is too large to process all at once then we need to specify how the data file should be split up. Since each line is a single entry in our data set we tell SageMaker that it can split the input on each line.
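The idea of splitting on lines can be sketched in plain Python. This is only an illustration of the concept, not SageMaker's actual batching logic; `split_into_payloads` and the byte limit used here are hypothetical. Whole lines are grouped into payloads that stay under a size limit, so no sample is ever cut in half.

```python
def split_into_payloads(csv_text, max_payload_bytes):
    """Group whole CSV lines into payloads that stay within a byte limit."""
    payloads, current, current_size = [], [], 0
    for line in csv_text.splitlines():
        line_size = len(line.encode("utf-8")) + 1  # +1 for the trailing newline
        if current and current_size + line_size > max_payload_bytes:
            payloads.append("\n".join(current) + "\n")
            current, current_size = [], 0
        current.append(line)
        current_size += line_size
    if current:
        payloads.append("\n".join(current) + "\n")
    return payloads

# Toy data (hypothetical): three samples of 6 bytes each, limit of 12 bytes
toy_csv = "1,0,2\n0,3,0\n4,1,1\n"
toy_payloads = split_into_payloads(toy_csv, max_payload_bytes=12)
print(toy_payloads)
```

Concatenating the payloads gives back the original text, which is why splitting on line boundaries is safe when each line is one complete sample.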

In [26]:
xgb_transformer.transform(test_location, content_type='text/csv', split_type='Line')

Currently the transform job is running but it is doing so in the background. Since we wish to wait until the transform job is done and we would like a bit of feedback we can run the `wait()` method.

In [27]:
xgb_transformer.wait()

....................
[2020-05-08 19:41:21 +0000] [16] [INFO] Starting gunicorn 19.10.0
[2020-05-08 19:41:21 +0000] [16] [INFO] Listening at: unix:/tmp/gunicorn.sock (16)
[2020-05-08 19:41:21 +0000] [16] [INFO] Using worker: gevent
[2020-05-08 19:41:21 +0000] [23] [INFO] Booting worker with pid: 23
[2020-05-08 19:41:21 +0000] [24] [INFO] Booting worker with pid: 24
[2020-05-08 19:41:21 +0000] [25] [INFO] Booting worker with pid: 25
[2020-05-08 19:41:21 +0000] [26] [INFO] Booting worker with pid: 26
[2020-05-08:19:41:43:INFO] No GPUs detected (normal if no gpus installed)
169.254.255.130 - - [08/May/2020:19:41:43 +0000] "GET /ping HTTP/1.1" 200 0 "-" "Go-http-client/1.1"
[2020-05-08:19:41:43:INFO] No GPUs detected (normal if no gpus installed)
169.254.255.130 - - [08/May/2020:19:41:43 +0000] "GET /ping HTTP/1.1" 200 0 "-" "Go-http-client/1.1"
[2020-05-08:19:41:43:INFO] No GPUs detected

169.254.255.130 - - [08/May/2020:19:42:05 +0000] "POST /invocations HTTP/1.1" 200 12232 "-" "Go-http-client/1.1"
169.254.255.130 - - [08/May/2020:19:42:05 +0000] "POST /invocations HTTP/1.1" 200 12247 "-" "Go-http-client/1.1"
[2020-05-08:19:42:05:INFO] Determined delimiter of CSV input is ','
169.254.255.130 - - [08/May/2020:19:42:05 +0000] "POST /invocations HTTP/1.1" 200 12232 "-" "Go-http-client/1.1"
169.254.255.130 - - [08/May/2020:19:42:05 +0000] "POST /invocations HTTP/1.1" 200 12247 "-" "Go-http-client/1.1"
[2020-05-08:19:42:05:INFO] Determined delimiter of CSV input is ','
[2020-05-08:19:42:05:INFO] Determined delimiter of CSV input is ','
[2020-05-08:19:42:05:INFO] Determined delimiter of CSV input is ','
169.254.255.130 - - [08/May/2020:19:42:07 +0000] "POST /invocations HTTP/1.1" 200 12224 "-" "Go-http-client/1.1"
169.254.255.130 - - [08/May/2020:19:42:07 +0000] "POST /invocations HTTP/1.1"

Now the transform job has executed and the result, the estimated sentiment of each review, has been saved on S3. Since we would rather work on this file locally we can perform a bit of notebook magic to copy the file to the `data_dir`.

In [28]:
!aws s3 cp --recursive $xgb_transformer.output_path $data_dir

download: s3://sagemaker-us-east-2-053596391548/sagemaker-xgboost-2020-05-08-19-38-07-572/test.csv.out to ../data/sentiment_web_app/test.csv.out


The last step is to read in the output from our model, convert it to something a little more usable, and then compare it to the ground truth labels. In this case we want the sentiment to be either `1` (positive) or `0` (negative).
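The conversion and comparison can be sketched without any libraries. The helpers `to_labels` and `accuracy` and the toy scores below are hypothetical; the sketch assumes the model outputs one score between 0 and 1 per review and uses an explicit 0.5 threshold. For scores in that range this gives the same labels as Python's built-in `round`, which rounds exactly 0.5 down to 0.

```python
def to_labels(scores, threshold=0.5):
    """Turn scores in [0, 1] into hard 0/1 sentiment labels."""
    return [1 if score > threshold else 0 for score in scores]

def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the ground truth."""
    matches = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return matches / len(true_labels)

# Toy data (hypothetical): four model scores and their true labels
toy_scores = [0.91, 0.12, 0.55, 0.40]
toy_truth = [1, 0, 0, 0]

toy_predictions = to_labels(toy_scores)
toy_accuracy = accuracy(toy_truth, toy_predictions)
print(toy_predictions, toy_accuracy)
```

Three of the four toy predictions match the ground truth, so the accuracy is 0.75.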

In [29]:
predictions = pd.read_csv(os.path.join(data_dir, 'test.csv.out'), header=None)
predictions = [round(num) for num in predictions.squeeze().values]

In [30]:
from sklearn.metrics import accuracy_score
accuracy_score(test_y, predictions)

0.8604

## Step 6: Deploying the model

Once we construct and fit our model, SageMaker stores the resulting model artifacts and we can use those to deploy an endpoint (inference code). To see this, look in the SageMaker console and you should see that a model has been created along with a link to the S3 location where the model artifacts have been stored.

Deploying an endpoint is a lot like training the model, with a few important differences. The first is that a deployed model doesn't change the model artifacts, so as you send it various testing instances the model won't change. Another difference is that since we aren't performing a fixed computation, as we were in the training step or while performing a batch transform, the compute instance that gets started stays running until we tell it to stop. This is important to note: if we forget and leave it running, we will be charged for the entire time.

In other words **If you are no longer using a deployed endpoint, shut it down!**

In [31]:
xgb_predictor = xgb.deploy(initial_instance_count = 1, instance_type = 'ml.m4.xlarge')



---------------!

### Testing the model (again)

Now that we have deployed our endpoint, we can send the testing data to it and get back the inference results. We already did this earlier using the batch transform functionality of SageMaker, however, we will test our model again using the newly deployed endpoint so that we can make sure that it works properly and to get a bit of a feel for how the endpoint works.

When using the created endpoint it is important to know that we are limited in the amount of information we can send in each call so we need to break the testing data up into chunks and then send each chunk. Also, we need to serialize our data before we send it to the endpoint to ensure that our data is transmitted properly. Fortunately, SageMaker can do the serialization part for us provided we tell it the format of our data.
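The chunk, serialize, send and accumulate pattern can be sketched with a stand-in for the endpoint. Everything here is hypothetical: `fake_endpoint` simply returns one comma-separated score per row, which is the response shape this notebook assumes, and no request is sent anywhere.

```python
def rows_to_csv(rows):
    """Serialize rows as CSV text, one sample per line."""
    return "\n".join(",".join(str(value) for value in row) for row in rows)

def fake_endpoint(csv_payload):
    """Stand-in for a deployed endpoint: one comma-separated score per row."""
    scores = []
    for line in csv_payload.split("\n"):
        total = sum(float(value) for value in line.split(","))
        scores.append("1.0" if total > 1 else "0.0")
    return ",".join(scores)

def predict_in_chunks(rows, chunk_size, send=fake_endpoint):
    """Send the rows in fixed-size chunks and collect the scores in order."""
    scores = []
    for start in range(0, len(rows), chunk_size):
        chunk = rows[start:start + chunk_size]
        response = send(rows_to_csv(chunk))
        scores.extend(float(score) for score in response.split(","))
    return scores

# Toy data (hypothetical): five samples sent two at a time
toy_rows = [[0, 1], [2, 0], [0, 0], [1, 1], [3, 0]]
toy_scores = predict_in_chunks(toy_rows, chunk_size=2)
print(toy_scores)
```

The last chunk holds a single row, and the scores still come back in the same order as the input rows, which is what lets us compare them directly with the labels.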

In [32]:
from sagemaker.predictor import csv_serializer

# We need to tell the endpoint what format the data we are sending is in so that SageMaker can perform the serialization.
xgb_predictor.content_type = 'text/csv'
xgb_predictor.serializer = csv_serializer

In [33]:
# We split the data into chunks and send each chunk separately, accumulating the results.

def predict(data, rows=512):
    split_array = np.array_split(data, int(data.shape[0] / float(rows) + 1))
    predictions = ''
    for array in split_array:
        predictions = ','.join([predictions, xgb_predictor.predict(array).decode('utf-8')])
    
    return np.fromstring(predictions[1:], sep=',')

In [34]:
test_X = pd.read_csv(os.path.join(data_dir, 'test.csv'), header=None).values

predictions = predict(test_X)
predictions = [round(num) for num in predictions]

Lastly, we check to see what the accuracy of our model is.

In [35]:
from sklearn.metrics import accuracy_score
accuracy_score(test_y, predictions)

0.8604

And the results here should agree with the model testing that we did earlier using the batch transform job.

### Cleaning up

Now that we've determined that deploying our model works as expected, we are going to shut it down. Remember that the longer the endpoint is left running, the greater the cost and since we have a bit more work to do before we are able to use our endpoint with our simple web app, we should shut everything down.

In [None]:
xgb_predictor.delete_endpoint()

## Step 7: Putting our model to work

As we've mentioned a few times now, our goal is to have our model deployed and then access it using a very simple web app. The intent is for this web app to take some user submitted data (a review), send it off to our endpoint (the model) and then display the result.

However, there is a small catch. Currently the only way we can access the endpoint to send it data is using the SageMaker API. We can, if we wish, expose the actual URL that our model's endpoint is receiving data from. However, if we just send it data ourselves we will not get anything in return. This is because the endpoint created by SageMaker requires that the entity accessing it have the correct permissions. So, we would need to somehow authenticate our web app with AWS.

Having a website that authenticates to AWS seems a bit beyond the scope of this lesson so we will opt for an alternative approach. Namely, we will create a new endpoint which does not require authentication and which acts as a proxy for the SageMaker endpoint.

As an additional constraint, we will try to avoid doing any data processing in the web app itself. Remember that when we constructed and tested our model we started with a movie review, then we simplified it by removing any html formatting and punctuation, then we constructed a bag of words embedding and the resulting vector is what we sent to our model. All of this needs to be done to our user input as well.

Fortunately we can do all of this data processing in the backend, using Amazon's Lambda service.

<img src="Web App Diagram.svg">

The diagram above gives an overview of how the various services will work together. On the far right is the model which we trained above and which will be deployed using SageMaker. On the far left is our web app that collects a user's movie review, sends it off and expects a positive or negative sentiment in return.

In the middle is where some of the magic happens. We will construct a Lambda function, which you can think of as a straightforward Python function that can be executed whenever a specified event occurs. This Python function will do the data processing we need to perform on a user submitted review. In addition, we will give this function permission to send and receive data from a SageMaker endpoint.

Lastly, the method we will use to execute the Lambda function is a new endpoint that we will create using API Gateway. This endpoint will be a URL that listens for data to be sent to it. Once it gets some data it will pass that data on to the Lambda function and then return whatever the Lambda function returns. Essentially it will act as an interface that lets our web app communicate with the Lambda function.
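
To make the web app side of this picture a little more concrete, here is a minimal sketch of how a client might package a review for the public endpoint. The URL is a hypothetical placeholder (the real one only exists once the API Gateway endpoint has been created), and sending the review as plain text in the request body is an assumption made for illustration. The request is only built here, not sent.

```python
import urllib.request

# Hypothetical placeholder: the real URL is only known after the
# API Gateway endpoint has been created and deployed.
API_URL = "https://example.com/prod"

def build_review_request(review, url=API_URL):
    """Build (but do not send) a POST request carrying the raw review text."""
    return urllib.request.Request(
        url,
        data=review.encode("utf-8"),             # request bodies must be bytes
        headers={"Content-Type": "text/plain"},  # assumption: plain text body
        method="POST",
    )

request = build_review_request("A wonderful, heartfelt film.")
print(request.get_method(), request.full_url)

# With a real URL substituted, sending the request would look like this:
# with urllib.request.urlopen(request) as reply:
#     print(reply.read().decode("utf-8"))
```

Note that no processing of the review happens on the client side; the text is forwarded as-is and all of the work is left to the backend.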

### Processing a single review

For now, suppose we are given a movie review by our user in the form of a string, like so:

In [36]:
test_review = "Nothing but a disgusting materialistic pageant of glistening abed remote control greed zombies, totally devoid of any heart or heat. A romantic comedy that has zero romantic chemestry and zero laughs!"

How do we go from this string to the bag of words feature vector that is expected by our model?

Recall from the beginning of this notebook that the first step is to remove any unnecessary characters using the `review_to_words` method. Remember that we intentionally did this in a very simplistic way. This is because we are going to have to copy this method to our (eventual) Lambda function (we will go into more detail later), which means it needs to be rather simplistic.

In [37]:
test_words = review_to_words(test_review)
print(test_words)

nothing but a disgusting materialistic pageant of glistening abed remote control greed zombies totally devoid of any heart or heat a romantic comedy that has zero romantic chemestry and zero laughs


In [38]:
vocabulary['nothing']

3047

Next, we need to construct a bag of words embedding of the `test_words` string. To do this, remember that a bag of words embedding uses a `vocabulary` consisting of the most frequently appearing words in a set of documents. Then, for each word in the vocabulary we record the number of times that word appears in `test_words`. We constructed the `vocabulary` earlier using the training set for our problem so encoding `test_words` is relatively straightforward.

In [39]:
def bow_encoding(words, vocabulary):
    bow = [0] * len(vocabulary) # Start by setting the count for each word in the vocabulary to zero.
    for word in words.split():  # For each word in the string
        if word in vocabulary:  # If the word is one that occurs in the vocabulary, increase its count.
            bow[vocabulary[word]] += 1
    return bow

In [40]:
test_bow = bow_encoding(test_words, vocabulary)
print(test_bow)

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 

In [41]:
len(test_bow)

5000
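
Since a vector of 5000 entries is hard to read, the same encoding can be tried on a much smaller scale. The sketch below repeats `bow_encoding` so that it runs on its own and uses a made-up four-word vocabulary (`toy_vocabulary` is purely illustrative and is not the vocabulary built from the training set).

```python
def bow_encoding(words, vocabulary):
    bow = [0] * len(vocabulary)   # one counter per vocabulary word
    for word in words.split():
        if word in vocabulary:    # words outside the vocabulary are ignored
            bow[vocabulary[word]] += 1
    return bow

# Illustrative vocabulary only: each word maps to its position in the vector.
toy_vocabulary = {"zero": 0, "romantic": 1, "laughs": 2, "heart": 3}
toy_words = "a romantic comedy that has zero romantic chemestry and zero laughs"

toy_bow = bow_encoding(toy_words, toy_vocabulary)
print(toy_bow)  # [2, 2, 1, 0]
```

Here `zero` and `romantic` each appear twice, `laughs` appears once, and `heart` does not appear at all, while every word that is not in the vocabulary is simply dropped.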

So now that we know how to construct a bag of words encoding of a user-provided review, how do we send it to our endpoint? First, we need to start the endpoint back up.

In [None]:
xgb_predictor = xgb.deploy(initial_instance_count = 1, instance_type = 'ml.m4.xlarge')

At this point we could just do the same thing that we did earlier when we tested our deployed model and send `test_bow` to our endpoint using the `xgb_predictor` object. However, when we eventually construct our Lambda function we won't have access to this object, so how do we call a SageMaker endpoint?

It turns out that Python functions that are used in Lambda have access to another Amazon library called `boto3`. This library provides an API for working with Amazon services, including SageMaker. To start with, we need to get a handle to the SageMaker runtime.

In [42]:
import boto3

runtime = boto3.Session().client('sagemaker-runtime')

And now that we have access to the SageMaker runtime, we can ask it to make use of (invoke) an endpoint that has already been created. However, we need to provide SageMaker with the name of the deployed endpoint. To find this out we can print it out using the `xgb_predictor` object.

In [43]:
xgb_predictor.endpoint

'sagemaker-xgboost-2020-05-08-19-29-55-770'

Using the SageMaker runtime and the name of our endpoint, we can invoke the endpoint and send it the `test_bow` data.

In [44]:
response = runtime.invoke_endpoint(EndpointName = xgb_predictor.endpoint, # The name of the endpoint we created
                                       ContentType = 'text/csv',                     # The data format that is expected
                                       Body = test_bow)

ParamValidationError: Parameter validation failed:
Invalid type for parameter Body, value: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 1, 0], type: <class 'list'>, valid types: <class 'bytes'>, <class 'bytearray'>, file-like object

So why did we get an error?

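Because `invoke_endpoint` only accepts a `Body` of type `bytes`, `bytearray` or a file-like object, and we passed it a Python list of integers. The endpoint expects data in `text/csv` format, so we need to convert the list into a comma-separated string and encode it as bytes.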

In [45]:
response = runtime.invoke_endpoint(EndpointName = xgb_predictor.endpoint, # The name of the endpoint we created
                                       ContentType = 'text/csv',                     # The data format that is expected
                                       Body = ','.join([str(val) for val in test_bow]).encode('utf-8'))
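
The conversion used for the `Body` argument can also be looked at in isolation, without calling the endpoint. The helper name `bow_to_csv_bytes` below is introduced only for this sketch.

```python
def bow_to_csv_bytes(bow):
    """Serialize a list of counts as a single CSV row, encoded as UTF-8 bytes."""
    return ",".join(str(val) for val in bow).encode("utf-8")

payload = bow_to_csv_bytes([0, 2, 0, 1])
print(payload)        # b'0,2,0,1'
print(type(payload))  # <class 'bytes'>
```

The result is a `bytes` object, which is one of the valid types listed in the error message above.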

In [46]:
print(response)

{'ResponseMetadata': {'RequestId': 'f09da68b-5ab8-4d56-90a8-9534058c438e', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': 'f09da68b-5ab8-4d56-90a8-9534058c438e', 'x-amzn-invoked-production-variant': 'AllTraffic', 'date': 'Fri, 8 May 2020 19:59:11 GMT', 'content-type': 'text/csv; charset=utf-8', 'content-length': '18'}, 'RetryAttempts': 0}, 'ContentType': 'text/csv; charset=utf-8', 'InvokedProductionVariant': 'AllTraffic', 'Body': <botocore.response.StreamingBody object at 0x7f35f091a048>}


As we can see, the response from our model is a somewhat complicated looking dict that contains a bunch of information. The bit that we are most interested in is the `'Body'` object, which is a streaming object that we need to `read` in order to make use of.

In [47]:
response = response['Body'].read().decode('utf-8')
print(response)

0.4011608958244324
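
The value we get back is still a string. A rough sketch of turning it into a label follows, using `io.BytesIO` as a stand-in for the streaming body so that it runs without an endpoint. It assumes, as an illustration, that scores closer to 1 indicate a positive review and scores closer to 0 a negative one, with 0.5 as the cut-off.

```python
import io

# Stand-in for response['Body']; a real StreamingBody is read the same way.
fake_body = io.BytesIO(b"0.4011608958244324")

raw = fake_body.read().decode("utf-8")  # bytes -> str
score = float(raw)                      # str -> float
label = round(score)                    # assumption: 1 = positive, 0 = negative

print(score, label)
print(fake_body.read())  # b'' because the stream has already been consumed
```

The second `read` returns nothing, which is why the decoded result should be stored in a variable rather than read from the stream again.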


Now that we know how to process the incoming user data we can start setting up the infrastructure to make our simple web app work. To do this we will make use of two different services: Amazon's Lambda and API Gateway.

Lambda is a service which allows someone to write some relatively simple code and have it executed whenever a chosen trigger occurs. For example, you may want to update a database whenever new data is uploaded to a folder stored on S3.

API Gateway is a service that allows you to create HTTP endpoints (url addresses) which are connected to other AWS services. One of the benefits to this is that you get to decide what credentials, if any, are required to access these endpoints.

In our case we are going to set up an HTTP endpoint through API Gateway which is open to the public. Then, whenever anyone sends data to our public endpoint we will trigger a Lambda function which will send the input (in our case a review) to our model's endpoint and then return the result.

### Setting up a Lambda function

The first thing we are going to do is set up a Lambda function. This Lambda function will be executed whenever our public API has data sent to it. When it is executed it will receive the data, perform any sort of processing that is required, send the data (the review) to the SageMaker endpoint we've created and then return the result.

#### Part A: Create an IAM Role for the Lambda function

Since we want the Lambda function to call a SageMaker endpoint, we need to make sure that it has permission to do so. To do this, we will construct a role that we can later give the Lambda function.

Using the AWS Console, navigate to the **IAM** page and click on **Roles**. Then, click on **Create role**. Make sure that the **AWS service** is the type of trusted entity selected and choose **Lambda** as the service that will use this role, then click **Next: Permissions**.

In the search box type `sagemaker` and select the check box next to the **AmazonSageMakerFullAccess** policy. Then, click on **Next: Review**.

Lastly, give this role a name. Make sure you use a name that you will remember later on, for example `LambdaSageMakerRole`. Then, click on **Create role**.
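
For reference, the same role could in principle be created from code rather than through the console. The sketch below is an assumption about how that might look with `boto3`, not part of the original workflow. It defines the trust policy that corresponds to choosing **Lambda** as the trusted service and wraps the two IAM calls in a function, which is not called here because it requires AWS credentials with IAM permissions.

```python
import json

ROLE_NAME = "LambdaSageMakerRole"  # the example name suggested above
SAGEMAKER_POLICY_ARN = "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess"

# Trust policy: allows the Lambda service to assume this role.
LAMBDA_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

def create_lambda_role(role_name=ROLE_NAME):
    """Create the role and attach the SageMaker policy (needs AWS credentials)."""
    import boto3  # imported here so the definitions above work without AWS access

    iam = boto3.client("iam")
    iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(LAMBDA_TRUST_POLICY),
    )
    iam.attach_role_policy(RoleName=role_name, PolicyArn=SAGEMAKER_POLICY_ARN)

print(json.dumps(LAMBDA_TRUST_POLICY, indent=2))
```

Whichever route is taken, the outcome is the same: a role that Lambda is allowed to assume and that carries the **AmazonSageMakerFullAccess** policy.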

#### Part B: Create a Lambda function

Now it is time to actually create the Lambda function. Remember from earlier that in order to process the user provided input and send it to our endpoint we need to gather two pieces of information:

 - The name of the endpoint, and
 - the vocabulary object.

We will copy these pieces of information to our Lambda function after we create it.
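
One possible way to get the vocabulary into a form that can be pasted into the Lambda editor is to print it as a literal from the notebook. The sketch below uses a three-word sample in place of the full `vocabulary` object, and the `int(...)` conversion is a precaution in case the stored indices are not plain Python integers.

```python
import ast
import json

# Small stand-in for the notebook's `vocabulary` dict (word -> index).
vocabulary_sample = {"nothing": 3047, "movie": 2926, "bad": 382}

# Convert indices to plain ints, then serialize. A JSON object with string
# keys and integer values is also a valid Python dict literal.
vocab_literal = json.dumps({word: int(idx) for word, idx in vocabulary_sample.items()})
print("vocab = " + vocab_literal)

# Check that the printed text evaluates back to the same dict.
restored = ast.literal_eval(vocab_literal)
print(restored == vocabulary_sample)  # True
```

In the notebook, the same approach would be applied to the full `vocabulary`, and the endpoint name can be printed with `xgb_predictor.endpoint` as shown earlier.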

To start, using the AWS Console, navigate to the AWS Lambda page and click on **Create a function**. When you get to the next page, make sure that **Author from scratch** is selected. Now, name your Lambda function, using a name that you will remember later on, for example `sentiment_analysis_xgboost_func`. Make sure that the **Python 3.6** runtime is selected and then choose the role that you created in the previous part. Then, click on **Create Function**.

On the next page you will see some information about the Lambda function you've just created. If you scroll down you should see an editor in which you can write the code that will be executed when your Lambda function is triggered. Collecting the code we wrote above to process a single review and adding it to the provided example `lambda_handler` we arrive at the following.

```python
# We need to use the low-level library to interact with SageMaker since the SageMaker API
# is not available natively through Lambda.
import boto3

# And we need the regular expression library to do some of the data processing
import re

REPLACE_NO_SPACE = re.compile(r"(\.)|(\;)|(\:)|(\!)|(\')|(\?)|(\,)|(\")|(\()|(\))|(\[)|(\])")
REPLACE_WITH_SPACE = re.compile(r"(<br\s*/><br\s*/>)|(\-)|(\/)")

def review_to_words(review):
    words = REPLACE_NO_SPACE.sub("", review.lower())
    words = REPLACE_WITH_SPACE.sub(" ", words)
    return words
    
def bow_encoding(words, vocabulary):
    bow = [0] * len(vocabulary) # Start by setting the count for each word in the vocabulary to zero.
    for word in words.split():  # For each word in the string
        if word in vocabulary:  # If the word is one that occurs in the vocabulary, increase its count.
            bow[vocabulary[word]] += 1
    return bow


def lambda_handler(event, context):
    
    vocab = {'havent': 2053, 'been': 438, 'able': 66, 'to': 4507, 'decide': 1148, 'if': 2222, 'this': 4457, 'movie': 2926, 'is': 2360, 'so': 4046, 'bad': 382, 'its': 2370, 'good': 1937, 'or': 3124, 'quote': 3496, 'gone': 1935, 'past': 3207, 'and': 226, 'back': 378, 'again': 146, 'no': 3020, 'matter': 2769, 'it': 2366, 'forced': 1787, 'me': 2778, 'look': 2646, 'much': 2934, 'the': 4426, 'same': 3778, 'way': 4820, 'pile': 3275, 'of': 3083, 'weird': 4841, 'might': 2835, 'offers': 3090, 'up': 4683, 'number': 3060, 'scenes': 3811, 'that': 4424, 'you': 4986, 'wont': 4932, 'forget': 1793, 'even': 1533, 'want': 4788, 'theres': 4440, 'young': 4989, 'ray': 3533, 'telling': 4396, 'her': 2085, 'creative': 1041, 'writing': 4971, 'looks': 2649, 'like': 2599, 'bit': 488, 'later': 2525, 'not': 3040, 'with': 4911, 'but': 636, 'garden': 1872, 'mom': 2885, 'she': 3923, 'wants': 4791, 'go': 1924, 'bed': 436, 'father': 1666, 'walter': 4784, 'actress': 105, 'in': 2257, 'scene': 3809, 'best': 469, 'line': 2610, 'ever': 1538, 'written': 4972, 'by': 642, 'anyone': 259, 'else': 1448, 'as': 311, 'waves': 4819, 'face': 1616, 'more': 2904, 'your': 4991, 'liking': 2604, 'date': 1118, 'turned': 4606, 'on': 3105, 'each': 1395, 'other': 3135, 'they': 4442, 'start': 4156, 'others': 3136, 'clothes': 860, 'off': 3084, 'dress': 1357, 'perhaps': 3235, 'all': 187, 'instead': 2307, 'there': 4438, 'are': 285, 'talking': 4360, 'heads': 2059, 'those': 4460, 'who': 4871, 'have': 2052, 'most': 2909, 'afterwards': 145, 'was': 4805, 'afraid': 140, 'open': 3112, 'my': 2951, 'finally': 1724, 'at': 331, 'awards': 367, 'hollywood': 2137, 'for': 1784, 'out': 3141, 'second': 3846, 'guess': 1989, 'im': 2232, 'only': 3110, 'one': 3107, 'has': 2043, 'had': 2003, 'top': 4525, 'see': 3855, 'already': 197, 'spent': 4108, 'time': 4495, 'lonely': 2643, 'lady': 2507, 'than': 4420, 'far': 1653, 'better': 472, 'pictures': 3270, 'ill': 2228, 'quit': 3494, 'be': 418, 'though': 4461, 'once': 3106, 'watching': 4815, 'probably': 
3407, 'take': 4349, 'eyes': 1613, 'screen': 3829, 'until': 4680, 'two': 4620, 'hours': 2177, 'life': 2592, 'forever': 1792, 'read': 3543, 'book': 535, 'couple': 1016, 'times': 4497, 'doesnt': 1309, 'follow': 1772, 'exactly': 1551, 'should': 3951, 'could': 1007, 'let': 2579, 'after': 143, 'however': 2183, 'serious': 3891, 'issues': 2365, 'setting': 3900, 'nobody': 3022, 'seemed': 3862, 'mention': 2810, 'based': 404, 'actual': 108, 'events': 1536, 'happened': 2027, 'live': 2625, 'grew': 1972, 'town': 4541, 'supposed': 4306, 'first': 1736, 'small': 4033, 'talked': 4359, 'about': 67, 'third': 4455, 'city': 825, 'state': 4160, 'population': 3335, 'around': 297, 'grand': 1953, 'island': 2361, 'between': 475, 'lincoln': 2608, 'scenery': 3810, 'wrong': 4973, 'river': 3707, 'valley': 4707, 'which': 4867, 'very': 4723, 'flat': 1748, 'few': 1700, 'trees': 4577, 'tried': 4584, 'made': 2697, 'mad': 2696, 'being': 448, 'treated': 4573, 'real': 3547, 'event': 1535, 'large': 2517, 'were': 4848, 'talk': 4358, 'riding': 3695, 'from': 1840, 'park': 3190, 'dont': 1324, 'mind': 2849, '15': 7, 'mile': 2842, 'ride': 3692, 'know': 2489, 'what': 4858, 'really': 3556, 'here': 2086, 'apparently': 270, 'viewer': 4738, 'knows': 2493, 'nothing': 3047, 'history': 2121, 'europe': 1529, 'including': 2264, 'germany': 1898, 'whole': 4873, 'central': 735, 'eastern': 1408, 'well': 4843, 'hitler': 2124, 'era': 1513, 'lot': 2661, 'forgotten': 1797, 'over': 3148, 'revenge': 3675, 'do': 1302, 'think': 4452, 'why': 4878, 'any': 256, 'american': 213, 'british': 589, 'french': 1830, 'soviet': 4089, 'wwii': 4975, 'etc': 1527, 'war': 4792, 'crimes': 1056, 'berlin': 467, 'germans': 1897, 'too': 4522, 'justice': 2438, 'main': 2707, 'point': 3316, 'must': 2950, 'also': 199, 'title': 4504, 'america': 212, 'film': 1716, 'an': 223, 'awful': 372, 'full': 1846, 'lie': 2590, 'propaganda': 3435, 'surprisingly': 4316, 'wasnt': 4807, 'enough': 1491, 'process': 3412, 'itself': 2371, 'nightmare': 3014, 'total': 4531, 
'darkness': 1117, '60': 53, 'years': 4979, 'hate': 2046, 'lack': 2502, 'self': 3869, 'criticism': 1063, 'cause': 723, 'vietnam': 4735, 'iraq': 2351, 'forth': 1804, 'criminals': 1058, 'clever': 841, 'would': 4957, 'become': 433, 'fox': 1814, 'seen': 3865, 'fantastic': 1651, 'actor': 103, 'starred': 4153, 'walk': 4775, 'water': 4816, 'minor': 2857, 'role': 3726, 'stars': 4155, 'together': 4511, 'tiny': 4500, 'apartment': 265, 'them': 4432, 'joe': 2407, 'sort': 4077, 'friends': 1837, 'sex': 3907, 'thing': 4450, 'going': 1931, 'mostly': 2910, 'gay': 1876, 'just': 2437, 'end': 1468, 'can': 665, 'their': 4431, 'lives': 2628, 'peace': 3219, 'thats': 4425, 'how': 2181, 'many': 2732, 'put': 3480, 'bring': 585, 'sides': 3967, 'table': 4346, 'keep': 2449, 'whatever': 4859, 'reason': 3557, 'side': 3965, 'both': 547, 'us': 4690, 'will': 4886, 'consider': 949, 'job': 2405, 'come': 886, 'ones': 3108, 'interest': 2323, 'children': 792, 'suffer': 4275, 'always': 203, 'watch': 4811, 'funny': 1852, 'engaging': 1481, 'away': 369, 'something': 4066, 'tragic': 4552, 'happen': 2026, 'course': 1019, 'does': 1308, 'feelings': 1686, 'running': 3756, 'strong': 4229, 'tragedy': 4551, 'happens': 2029, 'makes': 2716, 'adult': 125, 'version': 4720, 'beautiful': 428, 'completely': 922, 'need': 2987, 'madonna': 2699, 'act': 97, 'rent': 3626, 'body': 527, 'evidence': 1546, 'least': 2554, 'while': 4868, 'sets': 3899, 'may': 2775, 'dialog': 1232, 'miss': 2866, 'anything': 260, 'bruce': 602, 'wasted': 4809, 'becomes': 434, 'amusing': 221, 'original': 3129, 'into': 2332, 'mess': 2818, 'never': 3001, 'thought': 4462, 'certainly': 738, 'enjoy': 1484, 'actors': 104, 'actually': 109, '10': 0, 'watched': 4813, 'show': 3953, 'simply': 3986, 'didnt': 1242, 'find': 1726, 'episode': 1507, 'lately': 2524, 'realize': 3552, 'abc': 63, 'playing': 3300, 'stupid': 4247, 'shows': 3960, 'nowadays': 3055, 'down': 1335, 'station': 4164, 'characters': 760, 'pretty': 3384, 'jokes': 2415, 'script': 3834, 'horrible': 2163, 
'still': 4188, 'say': 3799, 'believe': 454, 'seeing': 3857, 'doing': 1312, 'quality': 3484, 'because': 432, 'average': 362, 'compared': 912, 'buffs': 612, 'love': 2669, 'loud': 2664, 'wicked': 4879, 'witch': 4909, 'cartoon': 704, 'sequel': 3885, 'wizard': 4919, 'oz': 3160, 'imagination': 2236, 'fantasy': 1652, 'excitement': 1562, 'action': 100, 'footage': 1782, 'music': 2946, 'video': 4733, 'little': 2624, 'dorothy': 1330, 'ruin': 3750, 'meets': 2795, 'old': 3100, 'new': 3003, 'animation': 242, 'stuck': 4237, 'somewhere': 4069, 'weakest': 4825, 'disney': 1289, 'less': 2575, 'inspired': 2301, 'songs': 4072, 'particularly': 3195, 'sweet': 4336, 'land': 2511, 'superbly': 4297, 'performed': 3230, 'count': 1010, 'energy': 1478, 'whenever': 4863, 'sure': 4308, 'baby': 376, 'get': 1899, 'charge': 761, 'since': 3990, 'long': 2644, 'curiosity': 1082, 'fair': 1630, 'henry': 2084, 'weekend': 4838, 'rotten': 3740, 'piece': 3272, 'low': 2676, 'budget': 610, 'horror': 2168, 'some': 4061, 'teenage': 4389, 'girls': 1911, 'spending': 4106, 'evil': 1548, 'assistant': 326, 'bizarre': 492, 'scheme': 3812, 'perform': 3227, 'hideous': 2098, 'brain': 562, 'victims': 4730, 'garbage': 1870, 'features': 1680, 'lots': 2662, 'nudity': 3059, 'cheesy': 781, 'laughable': 2530, 'musical': 2947, 'acting': 99, 'horrendous': 2162, 'utterly': 4702, 'such': 4268, 'crap': 1031, 'widely': 4881, 'beyond': 476, 'blatant': 502, 'exploitation': 1592, 'chinese': 796, 'worker': 4942, 'generally': 1881, 'female': 1693, 'business': 633, 'owner': 3159, 'his': 2118, 'workers': 4943, 'given': 1913, 'voice': 4761, 'drunken': 1376, 'americans': 214, 'wear': 4830, 'hope': 2155, 'comes': 891, 'when': 4862, 'people': 3221, 'making': 2718, 'getting': 1901, 'paid': 3169, 'per': 3223, 'hour': 2176, 'feeling': 1685, 'chance': 747, 'escape': 1517, 'working': 4944, 'factory': 1623, 'fed': 1682, 'punishment': 3467, 'treatment': 4574, 'popular': 3333, 'work': 4940, '20': 29, 'day': 1129, 'hard': 2033, 'wondered': 4927, 
'where': 4864, 'came': 656, 'realizing': 3555, 'make': 2713, 'truly': 4597, 'appreciated': 280, 'beautifully': 429, 'portrays': 3344, 'impact': 2244, 'we': 4823, 'relatively': 3600, 'our': 3139, 'society': 4050, 'peoples': 3222, 'world': 4946, 'goes': 1930, 'amazing': 209, 'clearly': 840, 'style': 4249, 'filming': 1718, 'considered': 951, 'inducing': 2279, 'interesting': 2325, 'plot': 3309, 'couldnt': 1008, 'missed': 2867, 'stop': 4201, 'motion': 2913, 'stopped': 4202, 'mid': 2832, 'through': 4477, 'try': 4600, 'suppose': 4305, 'home': 2142, 'alone': 194, 'own': 3158, 'dark': 1115, 'evening': 1534, 'ticket': 4487, 'ps': 3453, 'spoiler': 4123, 'review': 3676, 'hello': 2078, 'comment': 898, 'wonderful': 4928, 'exciting': 1563, 'believable': 453, 'tale': 4353, 'romance': 3732, 'intrigue': 2333, 'memorable': 2802, 'colorful': 878, 'another': 250, 'liked': 2600, 'high': 2100, 'thanks': 4423, 'listening': 2620, 'classic': 835, 'films': 1721, 'late': 2523, '60s': 54, 'ability': 65, 'audience': 352, 'cold': 872, 'blood': 512, 'hasnt': 2044, 'lost': 2660, 'power': 3359, 'exceptionally': 1559, 'yet': 4984, 'forces': 1788, 'source': 4086, 'non': 3027, 'fiction': 1702, 'novel': 3052, 'message': 2819, 'true': 4596, 'definitely': 1162, 'case': 707, 'refreshing': 3586, 'especially': 1520, 'considering': 952, 'todays': 4509, 'simplistic': 3985, 'manipulative': 2727, 'moral': 2902, 'dramas': 1347, 'convinced': 984, 'political': 3324, 'force': 1786, 'honest': 2146, 'agree': 155, 'admire': 121, 'nonetheless': 3029, 'disagree': 1269, 'anti': 254, 'capital': 673, 'plenty': 3307, 'leads': 2548, 'terrific': 4409, 'scott': 3824, 'wilson': 4891, 'underrated': 4642, 'chilling': 794, 'leader': 2546, 'uses': 4697, 'charisma': 762, 'hide': 2097, 'robert': 3713, 'blake': 498, 'character': 758, 'obviously': 3072, 'against': 147, 'amount': 219, 'part': 3193, 'cinematography': 820, 'gritty': 1977, 'giving': 1915, 'impression': 2252, 'documentary': 1307, 'add': 115, 'score': 3822, 'jones': 2418, 
'masterpiece': 2759, 'jazz': 2387, 'treasure': 4571, 'short': 3946, 'showing': 3958, 'men': 2805, 'today': 4508, 'give': 1912, 'treat': 4572, 'person': 3238, 'marie': 2737, 'sings': 3997, 'street': 4216, 'heard': 2062, 'sing': 3992, 'before': 440, 'sounds': 4084, 'remarkably': 3614, 'folks': 1771, 'black': 493, 'cinema': 816, 'white': 4870, 'stereotypes': 4179, 'great': 1964, 'production': 3419, 'values': 4710, 'things': 4451, 'right': 3696, 'bat': 410, 'having': 2054, 'lead': 2545, 'admittedly': 123, 'soundtrack': 4085, 'independent': 2272, 'creepy': 1052, 'factor': 1622, 'gorgeous': 1942, 'actresses': 106, 'special': 4097, 'effects': 1431, 'werent': 4849, 'either': 1438, 'ways': 4822, 'died': 1244, 'now': 3054, 'throughout': 4478, 'thus': 4486, 'development': 1225, 'fear': 1676, 'bomb': 531, 'random': 3514, 'places': 3284, 'yes': 4982, 'understand': 4643, 'swear': 4334, 'seems': 3864, 'boy': 557, 'scripted': 3835, 'himself': 2111, 'cool': 989, 'language': 2516, 'storyline': 4208, 'although': 201, 'parts': 3199, 'left': 2561, 'conversation': 980, 'leaves': 2556, 'often': 3095, 'wondering': 4930, 'whats': 4860, 'happening': 2028, 'porno': 3337, 'violence': 4747, 'trying': 4601, 'pull': 3461, 'youre': 4992, 'looking': 2648, 'flesh': 1753, 'banned': 395, 'every': 1539, 'foreign': 1790, 'country': 1014, 'japanese': 2382, 'star': 4150, 'desperate': 1208, 'heres': 2087, 'three': 4469, 'thugs': 4484, 'torture': 4529, 'hell': 2077, 'woman': 4921, 'use': 4692, 'kinds': 2478, 'eventually': 1537, 'kill': 2468, 'burn': 625, 'kick': 2460, 'spin': 4112, 'chair': 742, 'sound': 4081, 'listen': 2619, 'nuts': 3064, 'throw': 4479, 'guts': 1999, 'animal': 239, 'shes': 3930, 'finale': 1723, 'greatest': 1966, 'these': 4441, 'without': 4913, 'knowing': 2490, 'ask': 316, 'charlie': 765, 'freak': 1824, 'sick': 3963, 'said': 3774, 'pure': 3471, 'underground': 4640, 'check': 777, 'fan': 1648, 'gore': 1941, 'highly': 2105, 'recommend': 3571, 'series': 3890, 'shock': 3938, 'creativity': 1042, 
'aspect': 321, 'gets': 1900, 'stuff': 4243, 'final': 1722, 'rating': 3529, 'isnt': 2362, 'saying': 3800, 'sell': 3871, 'story': 4207, 'china': 795, 'early': 1400, 'sequences': 3888, 'map': 2733, 'focusing': 1769, 'current': 1084, 'claim': 828, 'areas': 287, 'granted': 1957, 'distinct': 1296, 'nation': 2969, 'did': 1241, 'studios': 4241, 'fail': 1625, 'grade': 1951, 'relationship': 3597, 'wendy': 4846, 'superficial': 4298, 'somehow': 4063, 'feels': 1687, 'connected': 945, 'him': 2110, 'training': 4558, 'lets': 2580, 'cut': 1089, 'chase': 769, 'everything': 1544, 'laugh': 2529, 'cry': 1073, 'played': 3297, 'unrealistic': 4677, 'ending': 1471, 'descent': 1193, 'worth': 4953, 'dog': 1310, 'brenda': 575, 'song': 4071, 'studio': 4240, 'tell': 4395, 'compare': 911, 'potential': 3355, 'performances': 3229, 'everyone': 1542, 'suspense': 4330, 'worthwhile': 4955, 'night': 3013, 'remembered': 3617, 'olivier': 3103, 'supporting': 4304, 'roles': 3727, 'vincent': 4746, 'behind': 447, 'next': 3007, 'days': 1130, 'different': 1248, 'done': 1322, 'difficult': 1249, 'elizabeth': 1445, 'brilliant': 583, 'fire': 1734, 'ships': 3935, 'sent': 3880, 'worked': 4941, 'empire': 1464, 'accurate': 90, 'historically': 2120, 'bette': 471, 'davis': 1126, 'scripts': 3836, 'halfway': 2008, 'season': 3843, 'got': 1944, 'caught': 722, 'school': 3813, 'activities': 102, 'write': 4967, 'finish': 1732, 'released': 3603, 'dvd': 1391, 'maybe': 2776, 'then': 4436, 'theyll': 4444, 'disappointed': 1273, 'survive': 4321, 'loved': 2670, 'looked': 2647, 'forward': 1808, 'imagine': 2238, 'disappointment': 1275, 'discover': 1279, 'disappeared': 1271, 'needless': 2989, 'happy': 2032, 'filmed': 1717, 'mean': 2779, 'project': 3427, 'half': 2007, 'car': 680, 'useless': 4695, 'damn': 1100, 'fans': 1650, 'suck': 4269, 'gratuitous': 1961, 'references': 3583, 'intended': 2315, 'elvira': 1450, 'shoot': 3942, 'impress': 2250, 'foster': 1809, 'suffice': 4279, 'rocks': 3722, '50s': 52, 'fun': 1848, 'reference': 3582, 
'shines': 3932, 'absolute': 71, 'charm': 767, 'saw': 3798, 'tender': 4401, 'age': 148, 'buying': 641, 'copy': 994, 'worse': 4951, 'mail': 2706, 'feel': 1684, 'free': 1827, 'performance': 3228, 'edie': 1419, 'attention': 345, 'deserves': 1201, 'terrible': 4407, 'remake': 3612, '70s': 56, 'minute': 2858, 'major': 2711, 'government': 1948, 'involvement': 2347, 'fetched': 1698, 'flow': 1762, 'beer': 439, 'hit': 2122, 'bathroom': 412, 'took': 4523, 'change': 749, 'oil': 3097, 'pan': 3180, 'takes': 4351, 'longer': 2645, 'guys': 2001, 'notice': 3048, 'fool': 1779, 'chose': 804, 'trash': 4565, '1968': 16, 'abuse': 74, 'shot': 3949, 'dull': 1384, 'bullet': 620, 'fact': 1621, 'arent': 288, 'painfully': 3172, 'obvious': 3071, 'passing': 3204, 'lacked': 2503, 'emotion': 1458, 'step': 4175, 'above': 68, 'last': 2521, 'surely': 4309, 'radio': 3504, 'statement': 4162, 'joke': 2414, 'waste': 4808, 'man': 2721, 'cast': 710, 'includes': 2263, 'william': 4887, 'hurt': 2202, 'peter': 3247, 'michael': 2829, 'arthur': 305, 'christopher': 811, 'excellent': 1555, 'thrills': 4475, 'surprising': 4315, 'quite': 3495, 'keeps': 2451, 'thrilling': 4474, 'rest': 3657, 'thrillers': 4473, 'strongly': 4231, 'steve': 4181, 'starring': 4154, '40': 48, 'year': 4978, 'virgin': 4749, 'daily': 1097, 'gotten': 1947, 'taste': 4369, 'comedy': 890, 'influenced': 2287, 'comedic': 888, 'air': 164, 'level': 2584, 'he': 2056, 'innocent': 2294, 'lovable': 2668, 'hilarious': 2106, 'laughs': 2534, 'twice': 4613, 'perfect': 3224, 'entire': 1502, 'lines': 2612, 'remember': 3616, 'fit': 1740, 'tone': 4517, 'director': 1264, 'knew': 2486, 'filled': 1715, 'wedding': 4836, 'along': 195, 'absolutely': 72, 'someone': 4064, 'mentioned': 2811, 'begins': 444, 'blond': 510, 'girl': 1909, 'killing': 2472, 'cat': 713, 'photographer': 3259, 'writer': 4968, 'meet': 2793, 'trip': 4588, 'mountains': 2918, 'spend': 4105, 'slightly': 4029, 'deaf': 1134, 'seek': 3858, 'overly': 3155, 'strangers': 4214, 'leave': 2555, 'bold': 528, 
'attempt': 340, 'logical': 2639, 'backdrop': 379, 'intriguing': 2335, 'location': 2635, 'mysterious': 2955, 'easily': 1406, 'obscure': 3068, 'supernatural': 4302, 'command': 897, 'photography': 3260, 'nice': 3008, 'wish': 4904, 'sense': 3877, 'view': 4736, 'viewed': 4737, 'unless': 4669, 'yourself': 4993, 'severe': 3906, 'damage': 1099, 'hero': 2088, 'kevin': 2457, 'amy': 222, 'conservative': 948, 'girlfriend': 1910, 'red': 3576, 'shorts': 3948, 'enjoys': 1489, 'phone': 3255, 'boyfriend': 558, 'nick': 3012, 'army': 295, 'rip': 3700, 'youll': 4988, 'anywhere': 263, 'sorry': 4076, 'someones': 4065, 'dreams': 1355, 'guessed': 1990, 'wind': 4893, 'strip': 4228, 'club': 863, 'dream': 1354, 'revealed': 3671, 'flick': 1754, 'via': 4726, 'mst3k': 2932, 'mike': 2838, 'nelson': 2997, 'tom': 4514, 'robot': 3717, 'laughing': 2533, 'stock': 4193, 'cinematic': 818, 'anymore': 258, 'gives': 1914, 'urge': 4689, 'drive': 1363, 'besides': 468, 'mentioning': 2812, 'general': 1880, 'warning': 4800, 'family': 1645, 'west': 4852, 'strange': 4211, 'ghost': 1902, 'middle': 2833, 'desert': 1198, 'test': 4416, 'site': 4005, 'nuclear': 3057, 'deadly': 1133, 'presence': 3373, 'mystery': 2956, 'cube': 1076, 'birds': 485, 'examples': 1554, 'movies': 2927, 'answers': 252, 'intrigued': 2334, 'wanting': 4790, 'kind': 2476, 'whether': 4866, 'succeeded': 4263, 'personally': 3243, 'ho': 2127, 'react': 3540, 'situations': 4009, 'walking': 4779, 'investigate': 2342, 'tv': 4610, 'environment': 1505, 'offered': 3088, 'striking': 4226, 'ground': 1979, 'glass': 1917, 'eye': 1611, 'wonder': 4926, 'speak': 4094, 'creatures': 1046, 'shown': 3959, 'clue': 864, 'aliens': 184, 'ghosts': 1903, 'ancient': 225, 'indian': 2274, 'spirits': 4115, 'oh': 3096, 'clichés': 846, 'folk': 1770, 'dumb': 1385, 'guy': 2000, 'thinks': 4454, 'crazy': 1035, 'turns': 4609, 'correct': 1000, 'scenario': 3808, 'tries': 4585, 'intelligent': 2314, 'ultimately': 4628, 'fails': 1628, 'familiar': 1643, 'direct': 1258, 'hey': 2095, 'ive': 
2372, 'aint': 163, 'described': 1195, 'words': 4938, 'unbelievable': 4634, 'cant': 671, 'anybody': 257, 'anyways': 262, 'buy': 640, 'overall': 3150, 'cliché': 844, 'andy': 230, 'shots': 3950, 'beauty': 430, 'nature': 2975, 'confused': 941, 'bits': 490, 'pieces': 3273, 'hes': 2093, 'afford': 138, 'art': 304, 'alive': 186, 'dead': 1132, 'ends': 1474, 'remarkable': 3613, 'writers': 4969, 'directors': 1266, 'continue': 969, 'bite': 489, 'basically': 407, 'simple': 3983, 'lone': 2642, 'chasing': 772, 'criminal': 1057, 'extremely': 1610, 'violent': 4748, 'atmosphere': 333, 'constant': 957, 'seem': 3861, 'straight': 4210, 'chan': 746, 'trilogy': 4586, 'vengeance': 4719, 'exception': 1557, 'infamous': 2284, 'iii': 2227, 'sadistic': 3768, 'brutal': 603, 'depicted': 1183, 'ruthless': 3764, 'streets': 4217, 'hong': 2149, 'kong': 2494, 'wife': 4884, 'judge': 2424, 'restaurant': 3658, 'police': 3322, 'arrives': 302, 'place': 3282, 'officer': 3092, 'sees': 3866, 'crime': 1055, 'dogs': 1311, 'results': 3662, 'bath': 411, 'kills': 2474, 'several': 3905, 'friend': 1835, 'packed': 3166, 'mouse': 2919, 'game': 1864, 'frustrated': 1842, 'cop': 991, 'professional': 3421, 'killer': 2470, 'latter': 2528, 'saves': 3796, 'sexual': 3908, 'stays': 4169, 'shed': 3924, 'local': 2634, 'routine': 3743, 'thriller': 4472, 'fascinating': 1658, 'apart': 264, 'explicit': 1591, 'backgrounds': 381, 'trained': 4557, 'fight': 1707, 'money': 2888, 'child': 789, 'barely': 399, 'speaks': 4096, 'word': 4937, 'became': 431, 'model': 2880, 'lies': 2591, 'drug': 1373, 'related': 3594, 'incident': 2260, 'question': 3487, 'suspects': 4328, 'witnesses': 4916, 'prepared': 3371, 'sacrifice': 3766, 'order': 3126, 'brand': 565, 'weak': 4824, 'stomach': 4196, 'stay': 4166, 'featuring': 1681, 'type': 4621, 'fly': 1763, 'intense': 2316, 'disturbing': 1300, 'crush': 1072, 'mentally': 2809, 'physically': 3263, 'locations': 2636, 'effectively': 1430, 'eerie': 1427, 'wouldnt': 4958, 'surprised': 4313, 'sam': 3777, 'lee': 
2559, 'enemies': 1476, 'set': 3898, 'hatred': 2049, 'powerful': 3360, 'unforgettable': 4652, 'recommended': 3572, 'wait': 4770, 'inevitable': 2282, 'premise': 3370, 'success': 4265, 'batman': 413, 'animated': 241, 'warner': 4799, 'brothers': 599, 'team': 4378, 'responsible': 3656, 'producing': 3417, 'feature': 1678, 'length': 2569, 'originally': 3131, 'theatrical': 4430, 'status': 4165, 'known': 2492, 'mask': 2752, 'ten': 4399, 'boys': 559, 'sub': 4252, 'zero': 4996, 'return': 3665, 'basic': 406, 'similar': 3978, 'herself': 2092, 'run': 3755, 'mob': 2879, 'boss': 546, 'aka': 168, 'futuristic': 1857, 'seasons': 3844, 'adventures': 131, 'nicely': 3009, 'cgi': 740, 'alright': 198, 'exotic': 1574, 'league': 2549, 'points': 3319, 'theme': 4433, 'send': 3875, 'nostalgia': 3038, 'older': 3101, 'hearts': 2066, 'wayne': 4821, 'bob': 524, 'gordon': 1940, 'detective': 1220, 'barbara': 397, 'cameo': 657, 'hints': 2113, 'romantic': 3733, 'villains': 4745, 'appears': 278, 'fourth': 1813, 'reaching': 3539, 'standard': 4144, 'ago': 154, 'carries': 699, 'hardcore': 2034, 'five': 1743, 'der': 1191, 'notorious': 3051, 'continuity': 972, 'destruction': 1216, 'complete': 921, 'sometimes': 4067, 'important': 2247, 'works': 4945, 'unpleasant': 4674, 'images': 2235, 'corpse': 999, 'various': 4715, 'acts': 107, 'depressing': 1188, 'pain': 3170, 'post': 3352, 'form': 1798, 'pig': 3274, 'japan': 2381, 'opinion': 3118, 'bunch': 623, 'realistic': 3550, 'possible': 3350, 'cases': 708, 'example': 1553, 'floor': 1759, 'near': 2979, 'include': 2261, 'thrown': 4481, 'exposed': 1598, 'placed': 3283, 'horrific': 2166, 'detail': 1217, 'spirited': 4114, 'valuable': 4708, 'significant': 3972, 'entry': 1504, 'amateurish': 207, 'bloody': 513, 'shocking': 3940, 'devils': 1229, 'experiment': 1583, 'closest': 857, 'difference': 1246, 'genuine': 1891, 'dies': 1245, 'hurts': 2203, 'cannot': 670, 'meaning': 2780, 'achievement': 94, 'disgusting': 1286, 'extreme': 1609, 'cute': 1090, 'pass': 3201, 'proved': 3446, 
'enormous': 1490, 'mistake': 2871, 'single': 3996, 'tired': 4501, 'pathetic': 3210, 'call': 651, 'almost': 193, 'indie': 2276, 'pamela': 3179, 'anderson': 227, 'richards': 3687, 'theyve': 4446, 'agreed': 156, 'recent': 3566, 'appear': 273, 'despite': 1211, 'across': 96, 'sexy': 3911, 'category': 718, 'everybody': 1540, 'involved': 2346, 'avoid': 363, 'concerns': 934, 'student': 4238, 'falls': 1640, 'waitress': 4773, 'called': 652, 'affair': 135, 'numerous': 3062, 'versions': 4721, 'cruel': 1070, 'winning': 4898, 'magnificent': 2705, 'interpretation': 2328, 'account': 89, 'occurs': 3078, 'beginning': 443, 'rose': 3739, 'tough': 4537, 'crude': 1069, 'repeated': 3631, 'leslie': 2574, 'howard': 2182, 'essentially': 1523, 'decent': 1147, 'destroyed': 1214, 'awesome': 371, 'producer': 3415, 'plane': 3288, 'crash': 1033, 'during': 1387, 'play': 3296, 'forest': 1791, 'picture': 3269, 'entirely': 1503, 'convincing': 985, 'parker': 3191, 'paul': 3214, 'ken': 2453, 'kim': 2475, 'harvey': 2042, 'dated': 1119, 'winds': 4895, 'whose': 4877, 'affected': 137, 'release': 3602, 'confusion': 943, 'wide': 4880, 'audiences': 353, 'debut': 1143, 'thick': 4447, 'jumped': 2430, 'thousand': 4465, 'stories': 4205, 'carrying': 701, 'arms': 294, 'feet': 1688, 'started': 4157, 'flying': 1764, 'bugs': 613, 'technology': 4385, 'unlike': 4670, 'princess': 3397, 'directly': 1263, 'importance': 2246, 'living': 2629, 'naturally': 2974, 'castle': 712, 'shortly': 3947, 'followed': 1773, 'regard': 3588, 'corny': 997, 'author': 359, 'psycho': 3457, 'paying': 3217, 'terror': 4412, 'kid': 2464, 'miles': 2843, 'usually': 4700, 'theater': 4427, 'executed': 1565, 'opportunity': 3121, 'resort': 3650, 'held': 2074, 'scary': 3807, 'eva': 1531, 'ideal': 2214, 'talent': 4354, 'taking': 4352, 'editing': 1421, 'gun': 1996, 'moving': 2928, 'share': 3920, 'problems': 3409, 'learn': 2550, 'commentary': 899, 'details': 1219, 'virtually': 4751, 'positive': 3346, 'badly': 383, 'seriously': 3892, 'poorly': 3329, 'ralph': 
3510, 'faced': 1617, 'foot': 1781, 'elements': 1443, 'drawn': 1351, 'situation': 4008, 'immediately': 2242, 'climax': 849, 'element': 1442, 'contrived': 976, 'nude': 3058, 'semi': 3874, 'filmmakers': 1720, 'ridiculous': 3693, 'dialogue': 1234, 'bare': 398, 'producers': 3416, 'credit': 1049, 'subject': 4253, '1984': 23, '90s': 61, 'execution': 1566, 'dreadful': 1353, 'campy': 664, 'network': 3000, 'none': 3028, 'teachers': 4376, '2005': 35, 'international': 2326, 'festival': 1697, '14': 6, 'rank': 3518, 'received': 3564, 'standing': 4146, 'falk': 1636, 'displayed': 1292, 'chemistry': 782, 'odd': 3080, 'sounding': 4083, 'am': 204, 'majority': 2712, 'junk': 2436, 'relate': 3593, 'sorts': 4078, 'gem': 1877, 'rate': 3526, '80': 58, 'scale': 3802, 'expect': 1575, 'miracle': 2860, 'cure': 1081, 'faith': 1633, 'promises': 3432, 'christ': 807, 'worst': 4952, 'die': 1243, 'holding': 2130, 'onto': 3111, 'integrity': 2311, 'god': 1926, 'trust': 4598, 'directions': 1262, 'term': 4405, 'outcome': 3142, 'heart': 2064, 'attack': 337, 'letters': 2582, 'position': 3345, 'save': 3794, 'anger': 235, 'typical': 4623, 'jerry': 2396, 'designed': 1203, 'conflict': 939, 'ensues': 1493, 'behave': 445, 'appropriate': 283, 'natural': 2973, 'chaos': 755, 'funniest': 1851, 'respected': 3653, 'wonders': 4931, 'allow': 189, 'help': 2079, 'morning': 2906, 'religion': 3607, '100': 1, 'therefore': 4439, 'rather': 3528, 'pick': 3265, 'mini': 2854, 'childhood': 790, 'went': 4847, 'fast': 1661, 'problem': 3408, 'told': 4513, 'sum': 4289, 'doubts': 1333, 'taylor': 4372, 'max': 2774, 'nevertheless': 3002, 'glad': 1916, '75': 57, 'impressed': 2251, 'jane': 2380, 'emma': 1457, 'rachel': 3500, 'store': 4204, 'discovered': 1280, 're': 3535, 'united': 4664, 'nicholas': 3010, 'rented': 3628, 'critical': 1062, 'adaptation': 113, 'perfectly': 3226, 'capture': 676, 'spirit': 4113, 'accent': 78, 'concerned': 932, 'initially': 2291, 'english': 1483, 'taken': 4350, 'period': 3236, 'countryside': 1015, 'appeal': 271, 
'idea': 2213, 'fare': 1655, 'death': 1141, 'settings': 3901, 'attractive': 351, 'charming': 768, 'combine': 884, 'entertainment': 1500, 'repeat': 3630, 'viewings': 4741, 'subtle': 4260, 'expressions': 1603, 'essential': 1522, 'complicated': 925, 'enjoyable': 1485, 'endearing': 1469, 'touching': 4536, 'competent': 917, 'favorite': 1671, 'delightful': 1168, 'lively': 2627, 'within': 4912, 'brought': 600, 'unique': 4662, 'personality': 3242, 'circle': 821, 'close': 853, 'drama': 1346, 'laughter': 2535, 'tears': 4380, 'develops': 1226, 'capturing': 679, 'common': 906, 'emotions': 1461, 'actions': 101, 'reactions': 3542, 'epic': 1506, 'head': 2057, 'moments': 2887, 'won': 4925, 'academy': 77, 'award': 366, 'glenn': 1918, 'win': 4892, 'fatal': 1664, 'attraction': 350, 'anyway': 261, 'oscar': 3133, 'golden': 1934, 'nominated': 3025, 'plays': 3301, 'pay': 3216, 'asked': 317, 'mr': 2929, 'johnny': 2410, 'danny': 1112, 'marry': 2745, 'month': 2896, 'mother': 2911, 'needs': 2990, 'attend': 344, 'finds': 1728, 'brother': 598, 'cage': 645, 'moon': 2900, 'john': 2409, 'perry': 3237, 'screenplay': 3831, 'norman': 3035, 'heat': 2067, 'dick': 1239, 'quotes': 3497, '17': 9, 'program': 3424, 'episodes': 1508, 'richard': 3686, 'dean': 1139, 'addition': 118, 'ben': 463, 'strength': 4219, 'sadly': 3769, 'sci': 3815, 'fi': 1701, 'channel': 753, 'rid': 3690, '11': 2, 'hopefully': 2157, 'atlantis': 332, 'stargate': 4151, 'franchise': 1817, 'nowhere': 3056, 'genres': 1889, 'range': 3516, 'ages': 152, 'types': 4622, 'twin': 4615, 'bobby': 525, 'says': 3801, 'used': 4693, 'dozens': 1339, 'sheer': 3925, 'steven': 4182, 'oliver': 3102, 'beach': 419, 'nearly': 2981, 'calls': 654, 'price': 3389, 'roots': 3738, 'heaven': 2068, 'weve': 4856, 'seat': 3845, 'joy': 2422, 'inner': 2292, 'fame': 1642, 'youth': 4994, 'empty': 1465, 'ready': 3546, 'adapted': 114, 'stage': 4139, 'daniel': 1111, 'warm': 4794, 'rough': 3741, 'north': 3036, 'wins': 4899, 'youve': 4995, 'contains': 964, 'extras': 1608, 
'beats': 426, 'comparison': 914, 'created': 1037, 'existence': 1571, 'humans': 2190, 'earth': 1403, 'ourselves': 3140, 'progress': 3425, 'species': 4098, 'universe': 4666, 'ashamed': 312, 'global': 1920, 'responsibility': 3655, 'creating': 1039, 'prevent': 3385, 'creation': 1040, 'gross': 1978, 'reality': 3551, 'embarrassment': 1455, 'loaded': 2632, 'monsters': 2893, 'shark': 3921, 'awkward': 374, 'sequence': 3887, 'minutes': 2859, 'effect': 1428, 'camera': 659, 'rush': 3759, 'jump': 2429, 'cuts': 1091, 'fancy': 1649, 'boring': 543, 'repetitive': 3633, 'specific': 4099, 'somewhat': 4068, 'grab': 1949, 'shake': 3914, 'needed': 2988, 'prisoners': 3403, 'cell': 730, 'suicide': 4283, 'methods': 2825, 'sitting': 4007, 'please': 3304, 'raising': 3509, 'victor': 4731, 'terribly': 4408, 'unfortunately': 4654, 'artificial': 306, 'fake': 1635, 'directing': 1260, 'lighting': 2596, 'authentic': 358, 'similarly': 3980, 'wrote': 4974, 'goal': 1925, 'hip': 2114, 'buddies': 608, 'filmmaker': 1719, 'classes': 834, 'remotely': 3623, 'larry': 2520, 'excuse': 1564, 'stan': 4142, 'reach': 3536, 'contest': 967, 'spot': 4131, 'surrounding': 4319, 'warming': 4795, 'grinch': 1975, 'stole': 4194, 'christmas': 810, 'plain': 3286, 'paulie': 3215, 'except': 1556, 'ratings': 3530, 'animals': 240, 'fx': 1858, 'big': 478, 'flaws': 1752, 'unwatchable': 4682, 'ugly': 4625, 'massive': 2756, 'mansion': 2731, 'rural': 3758, 'irish': 2353, 'rated': 3527, 'name': 2960, 'donna': 1323, 'clear': 839, 'ms': 2931, 'reed': 3580, 'mgm': 2828, 'lived': 2626, 'talents': 4356, 'ball': 388, 'depth': 1190, 'produced': 3414, 'universal': 4665, 'neither': 2996, 'touch': 4533, 'nor': 3032, 'wit': 4908, 'conventional': 979, 'pre': 3365, 'effort': 1432, 'light': 2595, 'louis': 2665, 'safe': 3771, 'zone': 4999, 'challenging': 744, 'stand': 4143, 'decision': 1151, 'loss': 2659, 'television': 4394, 'ahead': 159, 'jackie': 2374, 'master': 2757, 'super': 4295, 'lord': 2652, 'gang': 1867, 'assigned': 325, 'protect': 3443, 
'mans': 2730, 'turn': 4605, 'states': 4163, 'gangster': 1868, 'sends': 3876, 'matters': 2770, 'keeping': 2450, 'maggie': 2702, 'bay': 416, 'stunts': 4246, 'martial': 2747, 'arts': 310, 'choreography': 802, 'famous': 1647, 'battle': 414, 'royal': 3746, 'mall': 2720, 'charismatic': 763, 'slapstick': 4019, 'humor': 2192, 'explore': 1594, 'explain': 1586, 'ok': 3098, 'exploration': 1593, 'round': 3742, 'families': 1644, 'bear': 420, 'subplot': 4255, 'daring': 1114, 'hunt': 2198, 'food': 1778, 'isolated': 2363, 'path': 3209, 'whale': 4857, 'tour': 4538, 'elephant': 1444, 'africa': 141, 'mark': 2739, 'survival': 4320, 'skills': 4014, 'protagonists': 3442, 'brooks': 596, 'burton': 630, 'adam': 111, 'andrew': 228, 'shooting': 3943, 'lesson': 2577, 'doc': 1303, 'poster': 3353, 'fellow': 1691, 'visit': 4755, '12': 3, '2006': 36, '2nd': 41, 'screening': 3830, 'theatre': 4429, 'room': 3735, 'involves': 2348, 'morgan': 2905, 'freeman': 1829, 'career': 686, 'beneath': 464, 'attitude': 346, 'previous': 3387, 'flicks': 1755, 'bargain': 400, 'drops': 1372, 'community': 908, 'market': 2740, 'research': 3644, 'manager': 2724, 'soon': 4074, 'discovers': 1281, 'iron': 2354, 'returns': 3668, 'interview': 2329, 'circumstances': 822, 'join': 2412, 'felt': 1692, 'extended': 1604, 'teaching': 4377, 'childrens': 793, 'literally': 2622, 'simon': 3982, 'delight': 1167, 'exact': 1550, 'opposite': 3123, 'credits': 1050, 'target': 4366, 'books': 536, 'tape': 4365, 'reading': 3544, 'brad': 560, 'brief': 579, 'specifically': 4100, 'changed': 750, 'born': 544, 'inside': 2297, 'wise': 4903, 'context': 968, 'tells': 4397, 'solve': 4060, 'technical': 4381, 'aspects': 322, 'easy': 1410, 'asian': 314, 'lovers': 2673, 'spanish': 4092, 'lol': 2640, '13': 4, 'hated': 2047, 'totally': 4532, 'cover': 1022, 'confusing': 942, 'mix': 2876, 'murder': 2939, 'sloppy': 4030, 'resolution': 3649, 'strangely': 4212, 'stiff': 4187, 'meant': 2784, 'offensive': 3086, 'successful': 4266, 'recorded': 3574, '25': 40, 
'promising': 3433, 'downhill': 1336, 'overacting': 3149, '16': 8, 'italian': 2367, 'native': 2972, 'accents': 79, 'male': 2719, 'fat': 1663, 'van': 4713, 'doctor': 1304, 'sign': 3970, 'couples': 1017, 'kids': 2467, 'four': 1812, 'stooges': 4200, 'remote': 3622, 'makeup': 2717, 'clothing': 861, 'twisted': 4618, 'combination': 883, 'cutting': 1092, 'favor': 1670, 'proves': 3447, 'indeed': 2271, 'unnecessary': 4672, 'unknown': 4668, 'mine': 2853, 'vhs': 4725, 'included': 2262, 'translation': 4562, 'began': 441, 'tunes': 4603, 'opening': 3114, 'minimal': 2855, 'absurd': 73, 'thoughtful': 4463, 'paris': 3189, 'texas': 4418, 'et': 1526, 'lifetime': 2594, 'experience': 1580, 'rarely': 3524, 'members': 2801, 'pointed': 3317, 'bought': 552, '1st': 28, 'sat': 3786, 'found': 1811, 'myself': 2953, 'bbc': 417, 'finest': 1730, 'ranks': 3519, 'hooked': 2153, 'superb': 4296, 'unusual': 4681, 'edge': 1418, 'bleak': 503, 'continues': 971, 'originality': 3130, 'upon': 4684, 'uk': 4626, 'smart': 4035, 'tongue': 4518, 'cheek': 779, 'blown': 516, 'adds': 119, 'physical': 3262, 'shame': 3918, 'purpose': 3474, 'intention': 2319, 'design': 1202, 'african': 142, 'casting': 711, 'planned': 3290, 'defined': 1160, 'greater': 1965, 'support': 4303, 'board': 522, 'flash': 1745, 'present': 3374, 'portrayal': 3341, 'blind': 506, 'convincingly': 986, 'toward': 4539, 'ocean': 3079, 'cliff': 847, 'subtitles': 4259, 'closely': 855, 'note': 3044, 'sight': 3969, 'names': 2963, 'york': 4985, 'reviews': 3679, 'faults': 1669, 'tribute': 4581, 'overwhelming': 3157, 'due': 1382, 'properly': 3437, 'titled': 4505, 'reviewers': 3678, 'annoying': 249, 'otherwise': 3137, 'touched': 4534, 'robin': 3715, 'chris': 806, 'cooper': 990, 'acted': 98, 'reveal': 3670, 'crying': 1074, 'segment': 3867, 'precious': 3366, 'initial': 2290, 'seconds': 3848, 'outside': 3146, 'convey': 982, 'involve': 2345, 'meeting': 2794, 'met': 2822, 'recently': 3567, 'stranger': 4213, 'appreciate': 279, 'julie': 2428, 'ethan': 1528, 'theyre': 
4445, 'expectations': 1576, 'doubt': 1332, 'landscape': 2513, 'cusack': 1088, 'hotel': 2175, 'engaged': 1480, 'answer': 251, 'mental': 2808, 'everyday': 1541, 'developed': 1223, 'themes': 4434, 'choice': 797, 'psychological': 3458, 'thrill': 4471, 'fully': 1847, 'styles': 4250, 'hammer': 2014, 'bore': 540, 'steal': 4170, 'ideas': 2215, 'shining': 3933, 'stephen': 4176, 'king': 2479, 'reduced': 3579, 'stealing': 4171, 'mixed': 2877, 'waiting': 4772, 'gave': 1875, 'portion': 3338, 'riveting': 3708, 'amazed': 208, 'enjoyed': 1486, 'kurt': 2499, 'thomas': 4458, 'bar': 396, 'stone': 4197, 'horse': 2170, 'village': 4743, 'slow': 4031, 'commit': 904, 'encounters': 1467, 'seed': 3856, 'weeks': 4839, 'moment': 2886, 'ultimate': 4627, 'summer': 4291, 'effective': 1429, 'growing': 1983, 'lengthy': 2570, 'lessons': 2578, 'carry': 700, 'suffering': 4277, 'largely': 2518, 'triumph': 4590, 'guarantee': 1987, 'causes': 725, 'poverty': 3357, 'coherent': 871, 'loose': 2650, 'underlying': 4641, 'newly': 3004, 'disc': 1278, 'faster': 1662, 'background': 380, 'grim': 1974, 'hop': 2154, 'gruesome': 1986, 'dubbing': 1380, 'notable': 3041, 'wanted': 4789, '18': 10, 'comical': 894, 'hidden': 2096, 'matrix': 2767, 'fighting': 1709, 'among': 217, 'blend': 504, 'pop': 3330, 'culture': 1079, 'costumes': 1006, 'superhero': 4299, 'mill': 2845, 'soldiers': 4054, 'punk': 3468, 'heavy': 2070, 'metal': 2823, 'rock': 3720, 'retarded': 3663, 'impressive': 2253, 'gonna': 1936, 'nightmares': 3015, '2002': 32, 'pg': 3250, 'jet': 2400, 'li': 2587, 'core': 995, 'personalities': 3241, 'friendship': 1838, 'villain': 4744, 'smoking': 4040, 'hot': 2174, 'suffered': 4276, 'mediocre': 2791, 'means': 2783, 'deserved': 1200, 'comic': 893, 'wet': 4855, 'inspiring': 2302, 'schools': 3814, 'slasher': 4020, 'seventies': 3904, 'halloween': 2010, 'genre': 1888, 'cult': 1077, 'titles': 4506, 'urban': 4688, 'legend': 2564, 'picked': 3266, 'splatter': 4118, 'university': 4667, 'heavily': 2069, 'eighties': 1437, 'boom': 
537, 'approach': 282, 'certain': 737, 'traditional': 4550, 'fashion': 1659, 'insane': 2296, 'decided': 1149, 'unhappy': 4656, 'service': 3896, 'elsewhere': 1449, 'unseen': 4678, 'break': 569, 'unfortunate': 4653, 'sun': 4292, 'shine': 3931, 'murdered': 2940, 'transfer': 4559, 'st': 4137, 'college': 875, 'educational': 1425, 'catholic': 720, 'teacher': 4375, 'busy': 635, 'students': 4239, 'sudden': 4272, 'knock': 2488, 'door': 1328, 'chest': 784, 'kitchen': 2485, 'knife': 2487, 'introduced': 2337, 'arrival': 299, 'resident': 3647, 'motivation': 2914, 'holds': 2131, 'slaughter': 4021, 'alike': 185, 'dropping': 1371, 'flies': 1756, 'shy': 3962, 'menace': 2806, 'armed': 293, 'blade': 495, 'suspicious': 4332, 'professor': 3422, 'murderer': 2941, 'available': 361, 'under': 4639, 'killings': 2473, 'edition': 1422, 'print': 3399, 'floating': 1758, 'id': 2212, 'fairly': 1631, 'likes': 2602, 'rage': 3505, 'instance': 2304, 'critics': 1064, 'failed': 1626, 'direction': 1261, 'accomplished': 87, 'possibility': 3349, 'bottom': 551, 'ring': 3698, 'improved': 2255, 'par': 3184, 'wooden': 4934, 'teenagers': 4391, 'helped': 2080, 'build': 614, 'confidence': 938, 'signs': 3973, 'clumsy': 866, 'brave': 567, 'conclusion': 936, 'expecting': 1578, 'witness': 4914, 'teens': 4392, 'pace': 3161, 'scream': 3826, 'queen': 3485, 'undoubtedly': 4648, 'un': 4630, 'paced': 3162, 'youd': 4987, 'distant': 1295, 'future': 1856, 'parents': 3188, 'curious': 1083, 'truth': 4599, 'asks': 319, 'sons': 4073, 'national': 2970, 'private': 3404, 'connection': 946, 'pages': 3168, 'entertaining': 1499, 'follows': 1775, 'modern': 2882, 'glover': 1923, 'david': 1124, 'jobs': 2406, 'highlight': 2103, 'shakespeare': 3915, 'deal': 1135, 'focused': 1767, 'poor': 3328, 'blows': 517, '80s': 59, 'department': 1182, 'crack': 1027, 'bigger': 479, 'care': 684, 'drunk': 1375, 'degree': 1164, 'snl': 4044, 'costs': 1004, 'waited': 4771, 'sold': 4052, 'house': 2178, 'thinking': 4453, 'ended': 1470, 'lasted': 2522, 
'theaters': 4428, 'interviews': 2330, 'closing': 859, 'contact': 961, 'ignored': 2225, 'tortured': 4530, 'starts': 4159, 'affect': 136, 'stands': 4147, 'atrocious': 335, 'streep': 4215, 'trouble': 4592, 'card': 681, 'learning': 2552, 'pacing': 3163, 'intentions': 2321, 'indians': 2275, 'burt': 629, 'reynolds': 3684, 'page': 3167, 'week': 4837, 'starting': 4158, 'influence': 2286, 'particular': 3194, 'experienced': 1581, 'till': 4493, 'incredible': 2269, 'rescue': 3643, 'noise': 3024, 'supposedly': 4307, 'minds': 2852, 'class': 833, 'industry': 2280, 'honestly': 2147, 'becoming': 435, 'freddy': 1826, 'jason': 2383, 'myers': 2952, 'jumps': 2432, 'endings': 1472, 'dr': 1340, 'themselves': 4435, 'trapped': 4564, 'ice': 2210, 'kirk': 2482, 'former': 1800, 'pair': 3176, 'spends': 4107, 'slowly': 4032, 'throwing': 4480, 'leaving': 2557, 'hold': 2129, 'grasp': 1960, 'coming': 896, 'kelly': 2452, 'installment': 2303, 'captain': 674, 'allowed': 190, 'plus': 3311, 'ian': 2209, 'trio': 4587, 'luckily': 2682, 'spoilers': 4124, 'walked': 4776, 'forty': 1807, 'laughed': 2532, 'fell': 1690, 'realized': 3553, 'product': 3418, 'somebody': 4062, 'reminiscent': 3621, 'inspector': 2299, 'gadget': 1860, 'slick': 4026, 'excited': 1561, 'gags': 1862, 'relies': 3606, 'push': 3477, 'pointless': 3318, 'silly': 3976, '45': 50, 'preview': 3386, 'intelligence': 2313, 'giant': 1905, 'sad': 3767, 'travesty': 4570, 'pretentious': 3383, 'popularity': 3334, 'itll': 2369, 'catch': 714, 'beat': 423, 'nomination': 3026, 'immensely': 2243, 'bored': 541, 'outstanding': 3147, 'george': 1893, 'oddly': 3081, 'eric': 1514, 'roberts': 3714, 'talks': 4361, 'reasons': 3560, 'understood': 4647, 'noble': 3021, 'shadow': 3912, 'sandler': 3781, 'wannabe': 4787, 'comedian': 887, 'cruise': 1071, 'wacky': 4769, 'naked': 2959, 'shoes': 3941, 'appearance': 274, 'humour': 2194, 'australia': 356, 'australian': 357, 'definition': 1163, 'comfortable': 892, 'arm': 292, 'bag': 384, 'popcorn': 3331, 'splendid': 4119, 'carol': 
693, 'western': 4853, 'attempting': 342, 'drag': 1342, 'swedish': 4335, 'pays': 3218, 'russian': 3762, 'scientist': 3818, 'plans': 3292, 'ruined': 3751, 'military': 2844, 'arrived': 301, 'planet': 3289, 'cia': 814, 'ninja': 3018, 'practice': 3363, 'speaking': 4095, 'cheap': 774, 'creates': 1038, 'machine': 2693, 'guns': 1998, 'huge': 2185, 'smoke': 4039, 'thousands': 4466, 'dying': 1393, 'suits': 4287, 'snow': 4045, 'choreographed': 801, 'combat': 882, 'describe': 1194, 'moved': 2922, 'bang': 393, 'instant': 2305, 'amounts': 220, 'alcohol': 176, 'party': 3200, 'comedies': 889, 'failure': 1629, 'charles': 764, 'miscast': 2862, 'favorites': 1672, 'identify': 2216, 'notch': 3043, 'spoke': 4125, 'stick': 4185, 'mistress': 2874, 'added': 116, 'greatly': 1967, 'value': 4709, 'struggling': 4236, 'respect': 3652, 'ran': 3513, '1973': 19, 'survived': 4322, 'changes': 751, 'productions': 3420, 'exceptional': 1558, 'green': 1971, 'generations': 1884, 'morality': 2903, 'result': 3660, 'prequel': 3372, 'unlikely': 4671, 'list': 2617, 'pilot': 3276, 'erotic': 1515, 'checking': 778, 'porn': 3336, 'celluloid': 731, 'solid': 4057, 'recall': 3562, 'monster': 2892, 'enters': 1496, 'field': 1704, 'ship': 3934, 'tiger': 4491, 'visible': 4753, 'likewise': 2603, 'creature': 1045, 'doors': 1329, 'wonderfully': 4929, 'airplane': 166, 'mature': 2773, '30': 42, 'favourite': 1673, 'timeless': 4496, 'tomatoes': 4515, 'nations': 2971, 'move': 2921, 'website': 4834, 'viewers': 4739, 'forms': 1801, 'grant': 1956, 'fashioned': 1660, 'predictable': 3367, 'cuba': 1075, 'jr': 2423, 'portraying': 3343, 'navy': 2976, 'carl': 692, 'determined': 1221, 'de': 1131, 'niro': 3019, 'billy': 482, 'sunday': 4293, 'higher': 2101, 'key': 2458, 'describes': 1196, 'rich': 3685, 'willing': 4889, 'careful': 688, 'front': 1841, 'revealing': 3672, 'biggest': 480, 'company': 910, 'president': 3378, 'makers': 2715, 'media': 2789, 'wishes': 4906, 'interested': 2324, 'son': 4070, 'sharp': 3922, 'dressed': 1358, 'named': 
2961, 'familys': 1646, 'pet': 3246, 'lane': 2515, 'accept': 80, 'boat': 523, 'race': 3499, 'julia': 2427, 'voices': 4763, 'jennifer': 2392, 'directed': 1259, 'terms': 4406, 'whatsoever': 4861, 'logic': 2638, 'illogical': 2231, 'decisions': 1152, 'wow': 4962, 'beast': 422, 'graphic': 1958, 'france': 1816, 'married': 2744, 'horses': 2171, 'area': 286, 'ladies': 2506, 'bride': 577, 'hands': 2020, 'sleep': 4024, 'member': 2800, 'woods': 4935, 'desired': 1206, 'featured': 1679, 'baseball': 403, 'twist': 4617, 'clues': 865, 'fitting': 1742, 'expected': 1577, 'expert': 1585, 'grow': 1982, 'hugh': 2186, 'figured': 1712, 'complex': 923, 'driven': 1365, 'adventure': 130, 'competition': 918, 'begin': 442, 'following': 1774, 'uwe': 4703, 'boll': 529, 'concept': 930, 'dozen': 1338, 'downright': 1337, 'chain': 741, 'generic': 1885, 'forgive': 1795, 'holes': 2133, 'flaw': 1749, 'dramatic': 1348, 'civil': 826, 'ii': 2226, 'yeah': 4977, 'secret': 3849, 'six': 4010, 'suddenly': 4273, 'shut': 3961, 'decides': 1150, 'covered': 1023, 'memory': 2804, 'heroes': 2089, 'suit': 4284, 'cash': 709, 'huh': 2187, 'shirt': 3937, 'kept': 2456, 'errors': 1516, 'columbo': 881, 'bucks': 606, 'public': 3460, 'necessary': 2984, 'plan': 3287, 'orson': 3132, 'welles': 4844, 'manages': 2725, 'ass': 323, 'masterpieces': 2760, 'trial': 4580, 'midnight': 2834, 'tend': 4400, 'edited': 1420, 'manage': 2722, 'sit': 4003, 'speech': 4102, 'proceedings': 3410, 'constantly': 958, 'skill': 4013, 'poetry': 3314, 'soul': 4079, 'glorious': 1921, 'visual': 4757, 'frightening': 1839, 'monk': 2889, 'formula': 1802, 'reminds': 3620, 'marriage': 2743, 'reaction': 3541, 'journey': 2421, 'advantage': 129, 'painful': 3171, 'linda': 2609, 'offering': 3089, 'abysmal': 76, 'capable': 672, 'drop': 1369, 'appalling': 268, 'dropped': 1370, 'hoping': 2160, 'tim': 4494, 'offer': 3087, 'genuinely': 1892, 'captivating': 675, 'wears': 4832, 'obsessed': 3069, 'seemingly': 3863, 'hunter': 2199, 'resulting': 3661, 'emotional': 1459, 
'provide': 3448, 'display': 1291, 'tony': 4521, 'focus': 1766, 'obsession': 3070, 'raw': 3532, 'material': 2766, 'alas': 172, 'glimpse': 1919, 'hadnt': 2004, 'wave': 4818, 'pleasant': 3302, 'unpredictable': 4675, 'surprise': 4312, 'lawyer': 2541, 'defend': 1158, 'guilty': 1995, 'serve': 3893, 'quick': 3490, 'sentence': 3881, 'believes': 456, 'investigation': 2343, 'proceeds': 3411, 'learns': 2553, 'spoil': 4121, 'introduces': 2338, 'mysteries': 2954, 'whos': 4876, 'spiritual': 4116, 'guide': 1993, 'relationships': 3598, 'suggests': 4282, 'warn': 4797, 'falling': 1639, 'rain': 3506, 'create': 1036, 'horrors': 2169, 'russell': 3761, '2004': 34, 'depicts': 1186, 'centered': 733, 'sappy': 3784, 'dances': 1106, 'samurai': 3779, 'upset': 4687, 'possibly': 3351, 'crocodile': 1065, 'escapes': 1519, 'dinosaur': 1255, 'legs': 2566, 'unoriginal': 4673, 'cliche': 843, 'hunters': 2200, 'separate': 3883, 'lame': 2510, 'threatening': 4468, 'wasting': 4810, 'minimum': 2856, 'method': 2824, 'matthau': 2771, 'lemmon': 2567, 'witty': 4917, 'attempts': 343, 'rolling': 3730, 'jack': 2373, 'string': 4227, 'fine': 1729, 'patrick': 3213, 'ludicrous': 2685, 'intensity': 2317, 'bland': 500, 'captures': 678, 'welcome': 4842, 'joseph': 2419, 'anthony': 253, 'andrews': 229, 'belongs': 460, 'alan': 171, 'develop': 1222, 'deliver': 1169, 'prince': 3396, 'appearances': 275, 'lawrence': 2539, 'authority': 360, 'dignity': 1252, 'comments': 901, 'sensitive': 3879, 'overcome': 3151, 'eddie': 1416, 'murphy': 2944, 'loves': 2674, 'hundreds': 2196, 'library': 2589, 'hear': 2061, 'rental': 3627, 'sutherland': 4333, '24': 39, 'agents': 151, 'protagonist': 3441, 'douglas': 1334, 'veteran': 4724, 'agent': 150, 'clint': 850, 'aging': 153, 'historical': 2119, 'duty': 1390, 'amongst': 218, 'forgettable': 1794, 'ex': 1549, 'law': 2538, 'hunting': 2201, 'individual': 2277, 'unable': 4631, 'tight': 4492, 'pants': 3182, 'fest': 1696, 'tense': 4403, 'threat': 4467, 'subplots': 4256, 'entertained': 1498, 
'recognized': 3570, 'hundred': 2195, 'rats': 3531, 'rights': 3697, 'serial': 3889, 'figure': 1711, 'subjects': 4254, 'scientific': 3817, 'study': 4242, 'creep': 1051, 'remain': 3609, 'guard': 1988, 'claims': 830, '1970s': 17, 'size': 4012, 'basis': 408, 'alternate': 200, 'suggested': 4281, 'group': 1980, 'stress': 4220, 'weapon': 4828, 'james': 2378, 'pretend': 3381, 'concert': 935, 'antics': 255, 'tremendous': 4579, 'moves': 2925, 'loses': 2657, 'madness': 2698, 'thankfully': 4422, 'usual': 4699, 'tricks': 4583, 'european': 1530, 'tension': 4404, 'watchable': 4812, 'entertain': 1497, '90': 60, 'aside': 315, 'existent': 1572, 'struggle': 4234, 'balance': 387, 'buff': 611, 'object': 3066, 'teen': 4388, 'lust': 2689, 'breasts': 572, 'pursuit': 3476, 'brings': 587, 'offended': 3085, 'poem': 3312, 'reminded': 3619, 'showed': 3956, 'exercise': 1568, 'personal': 3240, 'asking': 318, 'understanding': 4645, 'facing': 1620, 'audio': 354, 'track': 4544, 'impossible': 2249, 'purchased': 3470, 'finished': 1733, 'imdb': 2240, 'eight': 1436, 'co': 867, 'exist': 1569, 'bush': 632, 'decade': 1144, 'record': 3573, 'questionable': 3488, 'suspend': 4329, 'disbelief': 1277, 'bill': 481, 'eating': 1413, 'rat': 3525, 'crew': 1053, 'suggest': 4280, 'bus': 631, 'homeless': 2143, 'frankenstein': 1822, 'breath': 573, 'survivor': 4324, 'kicks': 2463, 'sucked': 4270, 'hole': 2132, 'computer': 927, 'werewolf': 4850, 'clips': 852, 'remind': 3618, 'women': 4923, 'wanna': 4786, 'dvds': 1392, 'numbers': 3061, 'road': 3709, 'tad': 4347, 'soccer': 4048, 'remaining': 3610, 'restored': 3659, 'choose': 799, 'memories': 2803, 'woody': 4936, 'allen': 188, 'match': 2762, 'reporter': 3636, 'deceased': 1146, 'london': 2641, 'sid': 3964, 'handsome': 2021, 'talented': 4355, 'bollywood': 530, 'akshay': 169, 'abraham': 69, 'thank': 4421, 'hed': 2072, 'india': 2273, 'overdone': 3152, 'south': 4087, 'amazingly': 210, 'crappy': 1032, 'century': 736, 'inept': 2281, 'equal': 1509, 'expression': 1602, 'generated': 
1882, 'adequate': 120, 'finding': 1727, 'trailers': 4555, 'presented': 3376, 'earlier': 1399, 'candy': 668, 'regular': 3592, 'task': 4368, 'crafted': 1029, 'mexican': 2826, 'sounded': 4082, 'social': 4049, 'bright': 581, 'crowd': 1067, 'deals': 1137, 'condition': 937, 'screaming': 3827, 'compelling': 916, 'according': 88, 'shouldnt': 3952, 'stood': 4199, 'drew': 1360, 'dennis': 1179, 'uncomfortable': 4637, 'pleasure': 3306, 'messages': 2820, 'diamond': 1236, 'toilet': 4512, 'kiss': 2483, 'mouth': 2920, 'foul': 1810, 'finger': 1731, 'bo': 521, 'luke': 2687, 'duke': 1383, 'uncle': 4636, 'jesse': 2397, 'jail': 2376, 'disappointing': 1274, 'insult': 2309, 'horribly': 2164, 'worry': 4950, 'seven': 3903, 'racist': 3503, 'rebel': 3561, 'honor': 2150, 'belief': 451, 'prove': 3445, 'fill': 1714, 'destroying': 1215, 'mainly': 2708, 'nights': 3016, 'haunted': 2050, 'childish': 791, 'gory': 1943, 'mtv': 2933, 'unintentionally': 4659, 'distracting': 1297, 'angles': 237, 'blowing': 515, 'deeply': 1156, 'selfish': 3870, 'hbo': 2055, 'behavior': 446, 'breaking': 570, 'houses': 2180, 'pulling': 3463, 'ridden': 3691, 'construction': 960, 'pseudo': 3454, 'intellectual': 2312, 'turkey': 4604, 'movements': 2924, 'building': 615, 'ned': 2986, 'angry': 238, 'husband': 2204, 'box': 555, 'office': 3091, 'plastic': 3294, 'science': 3816, '3000': 43, 'hysterical': 2208, 'nervous': 2999, 'screenwriter': 3833, 'cares': 690, 'fresh': 1833, 'glory': 1922, 'cats': 721, 'cost': 1003, 'similarities': 3979, 'ie': 2221, 'stolen': 4195, 'couldve': 1009, 'fall': 1637, 'dragged': 1343, 'disneys': 1290, 'jungle': 2434, 'resembles': 3646, 'cartoons': 705, 'occasionally': 3075, 'captured': 677, 'sneak': 4043, 'base': 402, 'sinister': 3998, 'voiced': 4762, 'wore': 4939, 'hand': 2015, 'everywhere': 1545, 'likely': 2601, 'below': 462, 'killers': 2471, 'aged': 149, 'chuck': 812, 'intent': 2318, 'hitting': 2126, 'reaches': 3538, 'nonsense': 3030, 'implausible': 2245, 'manner': 2729, 'daughters': 1122, 
'missing': 2869, 'panic': 3181, 'sucks': 4271, 'kingdom': 2480, 'search': 3841, 'shop': 3945, 'magic': 2703, 'rex': 3683, 'hilariously': 2107, 'acceptable': 81, 'wolf': 4920, 'cannibal': 669, 'listed': 2618, 'hearted': 2065, 'ed': 1415, 'wood': 4933, 'anna': 245, 'fever': 1699, 'birthday': 487, 'draws': 1352, 'paper': 3183, 'subsequent': 4257, 'stronger': 4230, 'darker': 1116, 'charlotte': 766, 'disappear': 1270, 'broken': 594, 'cd': 728, 'built': 618, 'devil': 1228, 'idiotic': 2219, 'dialogues': 1235, 'unbelievably': 4635, 'pregnant': 3369, 'portrayed': 3342, 'control': 977, '2000': 30, 'narration': 2965, 'million': 2847, 'losers': 2656, 'gene': 1879, 'leading': 2547, 'dancers': 1105, '40s': 49, 'superior': 4300, 'rising': 3703, 'hits': 2125, 'artistic': 308, 'steps': 4177, 'limits': 2607, 'naive': 2958, 'cabin': 643, 'sky': 4017, 'wearing': 4831, 'thief': 4448, 'human': 2188, 'johnson': 2411, 'accidentally': 85, 'shoots': 3944, 'players': 3299, 'sidekick': 3966, 'provides': 3450, 'mario': 2738, 'lisa': 2616, 'explains': 1589, 'ridiculously': 3694, 'gothic': 1945, 'commercial': 902, 'mass': 2753, 'adding': 117, 'technically': 4382, 'jumping': 2431, 'miserably': 2864, 'destroy': 1213, 'artist': 307, 'sake': 3775, 'disaster': 1276, 'plague': 3285, 'slap': 4018, 'hanks': 2025, 'persona': 3239, 'months': 2897, 'feed': 1683, 'piano': 3264, 'teeth': 4393, 'gas': 1874, 'web': 4833, 'advance': 127, 'parent': 3187, 'soap': 4047, 'opera': 3116, 'stereotypical': 4180, 'babe': 375, 'blonde': 511, 'plots': 3310, 'disturbed': 1299, 'handful': 2017, 'thirty': 4456, 'replaced': 3634, 'relief': 3605, 'gotta': 1946, 'admit': 122, 'sheriff': 3929, 'eaten': 1412, 'german': 1896, 'prisoner': 3402, 'camp': 662, 'nazis': 2978, 'information': 2288, 'letter': 2581, 'murderous': 2942, 'quirky': 3493, 'christian': 808, 'delivered': 1170, 'format': 1799, 'attached': 336, 'yesterday': 4983, 'internet': 2327, 'sin': 3988, 'la': 2500, 'latin': 2527, 'gold': 1932, 'silver': 3977, 'content': 966, 
'successfully': 4267, 'artists': 309, 'humble': 2191, 'surprises': 4314, 'mainstream': 2709, 'space': 4090, 'simplicity': 3984, 'frank': 1821, 'interaction': 2322, 'alexander': 180, 'blob': 507, 'alien': 183, 'invasion': 2340, '1950s': 14, 'elderly': 1440, 'substance': 4258, 'arrive': 300, 'doctors': 1305, 'nurse': 3063, 'runs': 3757, 'load': 2631, 'roll': 3728, 'atmospheric': 334, '1972': 18, 'fights': 1710, 'surrounded': 4318, 'lights': 2597, 'questions': 3489, 'gentle': 1890, 'farce': 1654, 'bank': 394, 'disjointed': 1287, 'amanda': 205, 'root': 3737, 'sally': 3776, 'anne': 246, 'mrs': 2930, 'sister': 4001, 'mary': 2751, 'established': 1524, 'suited': 4286, 'eyre': 1614, 'required': 3641, 'accident': 84, 'blue': 518, 'visits': 4756, 'smith': 4038, 'demented': 1176, 'jackson': 2375, 'nostalgic': 3039, 'bakshi': 386, 'primarily': 3393, 'technique': 4383, 'rings': 3699, 'notes': 3046, 'warmth': 4796, 'lacking': 2504, 'aired': 165, 'wrap': 4963, 'grandmother': 1955, 'scores': 3823, 'summary': 4290, 'requires': 3642, 'frankly': 1823, 'kinda': 2477, 'regret': 3591, 'relations': 3596, 'usa': 4691, 'stupidity': 4248, 'sympathetic': 4342, 'signed': 3971, 'reached': 3537, 'eccentric': 1414, 'vulnerable': 4768, 'occasional': 3074, 'thoughts': 4464, 'guilt': 1994, 'beings': 449, 'reputation': 3640, 'betty': 474, 'florida': 1761, 'color': 877, 'harris': 2038, 'catching': 716, 'represents': 3639, 'richardson': 3688, 'divorce': 1301, 'allows': 192, 'costume': 1005, 'mere': 2814, 'believing': 457, 'heights': 2073, 'jean': 2389, 'cake': 648, 'hair': 2005, 'thumbs': 4485, 'tear': 4379, 'innovative': 2295, 'brilliantly': 584, 'tune': 4602, 'chooses': 800, 'dollars': 1315, 'prison': 3401, 'executive': 1567, 'storytelling': 4209, 'normal': 3033, 'testament': 4417, 'consists': 955, '1980s': 21, 'sequels': 3886, 'advertising': 132, 'shape': 3919, 'maker': 2714, 'exposure': 1599, 'quickly': 3491, 'fault': 1668, 'happiness': 2031, 'vague': 4705, 'eyed': 1612, 'females': 1694, 'scared': 
3805, 'possibilities': 3348, 'viewing': 4740, 'assumed': 329, 'cynical': 1093, 'warned': 4798, 'christians': 809, 'blame': 499, 'complain': 919, 'grown': 1984, 'adults': 126, 'accepted': 82, 'attracted': 349, 'matt': 2768, 'generation': 1883, 'twists': 4619, 'hired': 2117, 'vs': 4767, 'dear': 1140, 'spell': 4104, 'doom': 1326, 'solo': 4058, 'trap': 4563, 'laid': 2508, 'battles': 415, 'williams': 4888, 'farm': 1656, 'calm': 655, 'earl': 1398, 'cinemas': 817, 'brutally': 604, 'sword': 4340, 'levels': 2585, 'stated': 4161, 'provoking': 3452, 'angle': 236, 'staring': 4152, 'polanski': 3321, 'aware': 368, 'pulled': 3462, 'highlights': 2104, 'dub': 1378, 'ears': 1402, 'pitch': 3279, 'realise': 3548, 'regardless': 3590, 'corner': 996, 'insight': 2298, 'paranoia': 3186, 'kings': 2481, 'liners': 2611, 'mood': 2898, 'walken': 4777, 'dracula': 1341, 'vampires': 4712, 'deaths': 1142, 'row': 3744, 'fay': 1674, '30s': 44, 'formulaic': 1803, 'aunt': 355, 'von': 4765, 'daughter': 1121, 'winner': 4897, 'vampire': 4711, 'museum': 2945, 'holmes': 2139, 'steals': 4172, 'kissing': 2484, 'dangerous': 1110, '1933': 12, 'showcase': 3954, 'cross': 1066, 'dressing': 1359, 'inevitably': 2283, 'stylish': 4251, 'tied': 4489, 'tedious': 4387, 'engage': 1479, 'melodramatic': 2798, 'caring': 691, 'bible': 477, 'news': 3005, 'sea': 3838, 'changing': 752, 'structure': 4233, 'killed': 2469, 'vice': 4727, 'code': 869, 'enjoying': 1487, 'torn': 4528, 'reflect': 3584, 'unexpected': 4650, 'revelation': 3674, 'monkey': 2890, 'bond': 532, 'throws': 4482, 'shell': 3927, '1996': 26, 'rochester': 3719, '70': 55, 'noticed': 3049, 'franco': 1820, 'winter': 4900, 'window': 4894, 'bringing': 586, 'merely': 2815, 'surface': 4310, 'complexity': 924, 'humorous': 2193, 'conversations': 981, 'staged': 4140, 'putting': 3482, 'wounded': 4961, 'crucial': 1068, 'remains': 3611, 'joan': 2404, 'likable': 2598, 'grace': 1950, 'heroine': 2091, 'thin': 4449, 'vision': 4754, 'passionate': 3206, 'further': 1853, 'necessarily': 
2983, 'pity': 3281, 'soft': 4051, 'spoken': 4126, 'quiet': 3492, 'mild': 2839, 'moody': 2899, 'facial': 1619, 'attacks': 339, 'sleeping': 4025, 'patient': 3211, 'hardly': 2036, 'whereas': 4865, 'buddy': 609, 'walks': 4780, 'prom': 3429, 'props': 3439, 'embarrassed': 1453, 'produce': 3413, 'monkeys': 2891, 'cameras': 660, '50': 51, 'inspiration': 2300, 'dare': 1113, 'countries': 1013, 'returned': 3666, 'brown': 601, 'annie': 247, 'sullivan': 4288, 'helen': 2075, 'achieve': 92, 'learned': 2551, 'paint': 3173, 'lewis': 2586, 'retired': 3664, 'overlook': 3153, 'eg': 1434, 'uneven': 4649, 'picking': 3267, 'presents': 3377, 'painting': 3175, 'bears': 421, 'bell': 458, 'covers': 1024, 'pit': 3278, 'gods': 1928, 'garbo': 1871, 'spain': 4091, 'proper': 3436, 'concern': 931, 'novels': 3053, 'previously': 3388, 'causing': 726, 'discussion': 1284, '20th': 38, 'exaggerated': 1552, 'silent': 3975, 'fabulous': 1615, 'household': 2179, 'spectacular': 4101, 'stunt': 4245, 'imaginative': 2237, 'gary': 1873, 'stretch': 4221, 'bet': 470, 'davies': 1125, 'fu': 1844, 'brilliance': 582, 'silence': 3974, 'fictional': 1703, 'mentions': 2813, 'threw': 4470, 'consistent': 953, 'frame': 1815, 'incredibly': 2270, 'vote': 4766, 'relevant': 3604, 'suffers': 4278, 'explained': 1587, 'nyc': 3065, 'superman': 4301, 'powers': 3361, 'fallen': 1638, 'don': 1320, 'conceived': 929, 'worthless': 4954, 'introduction': 2339, 'train': 4556, 'accompanied': 86, 'dig': 1250, 'strictly': 4223, 'ordinary': 3128, 'june': 2433, 'expensive': 1579, 'bothered': 549, 'dating': 1120, 'continued': 970, 'health': 2060, 'church': 813, 'saved': 3795, 'felix': 1689, 'unintentional': 4658, 'ironic': 2355, 'friendly': 1836, 'east': 1407, 'regarding': 3589, 'mate': 2765, 'grey': 1973, 'hyde': 2206, 'occasion': 3073, 'ego': 1435, 'spy': 4135, 'simmons': 3981, 'raymond': 3534, 'illness': 2230, 'watches': 4814, 'dealing': 1136, 'meaningful': 2781, 'satisfied': 3789, 'block': 508, 'molly': 2884, 'using': 4698, 'troubles': 4594, 
'towards': 4540, 'menacing': 2807, 'jessica': 2398, 'pulls': 3464, 'cardboard': 682, 'lousy': 2667, 'aforementioned': 139, 'pie': 3271, 'stale': 4141, 'greek': 1970, 'double': 1331, 'inferior': 2285, 'horrid': 2165, 'dawn': 1127, 'ted': 4386, 'dude': 1381, 'travel': 4567, 'humanity': 2189, 'friday': 1834, 'flashbacks': 1747, 'medium': 2792, 'powell': 3358, 'desperately': 1209, 'player': 3298, 'musicals': 2948, 'broadway': 592, 'hart': 2041, 'differences': 1247, 'bands': 392, 'spike': 4111, 'lou': 2663, 'courage': 1018, 'fond': 1776, 'lucy': 2684, 'haunting': 2051, 'redemption': 3578, 'unfolds': 4651, 'psychotic': 3459, 'gangsters': 1869, 'whilst': 4869, 'twenty': 4612, 'caused': 724, 'sentimental': 3882, 'spielberg': 4110, 'revolves': 3682, 'twelve': 4611, 'flawless': 1751, 'false': 1641, 'craig': 1030, 'alice': 182, 'airport': 167, 'drugs': 1374, 'hank': 2024, 'claire': 831, 'danes': 1108, 'kate': 2446, 'walker': 4778, 'rushed': 3760, 'hudson': 2184, 'mighty': 2836, 'achieved': 93, 'companion': 909, 'wing': 4896, 'racism': 3502, 'bacall': 377, 'secretary': 3850, 'millions': 2848, 'wild': 4885, 'landing': 2512, 'unfunny': 4655, 'timing': 4498, 'psychic': 3456, 'lake': 2509, 'abilities': 64, 'leo': 2571, 'titanic': 4503, 'utter': 4701, 'projects': 3428, 'vast': 4716, 'mildly': 2840, 'teach': 4374, 'zombie': 4997, 'dan': 1102, 'intentionally': 2320, 'witnessed': 4915, 'opens': 3115, 'neck': 2985, 'goofy': 1939, 'larger': 2519, 'deep': 1154, 'toys': 4543, 'coffee': 870, 'planning': 3291, 'returning': 3667, 'digital': 1251, 'diana': 1237, 'surviving': 4323, 'boxing': 556, 'progresses': 3426, 'alongside': 196, 'michelle': 2830, 'dance': 1103, 'drives': 1367, 'experiences': 1582, 'argument': 291, 'luck': 2681, 'philosophical': 3253, 'driving': 1368, 'argue': 290, 'shallow': 3917, 'pride': 3391, 'deeper': 1155, 'landscapes': 2514, 'chicago': 785, 'understandable': 4644, 'lynch': 2691, 'aimed': 162, 'lover': 2672, 'mistakes': 2873, 'fix': 1744, 'associated': 327, 
'narrator': 2967, 'ryan': 3765, 'nine': 3017, 'brazil': 568, 'odds': 3082, 'handed': 2016, 'dickens': 1240, 'appeared': 276, 'edward': 1426, 'canada': 666, 'speed': 4103, 'provided': 3449, 'keaton': 2448, 'travels': 4569, 'cried': 1054, 'denzel': 1181, 'mickey': 2831, 'worn': 4948, 'rise': 3702, 'modesty': 2883, 'innocence': 2293, 'marks': 2742, 'earned': 1401, 'passes': 3203, 'centers': 734, 'football': 1783, 'blair': 497, 'massacre': 2754, 'succeeds': 4264, 'incoherent': 2265, 'realism': 3549, 'stops': 4203, 'aid': 160, 'magazine': 2701, 'compelled': 915, 'section': 3853, 'fate': 1665, 'grave': 1962, 'demon': 1177, 'zombies': 4998, 'doll': 1313, 'jesus': 2399, 'text': 4419, 'screens': 3832, 'explaining': 1588, 'ought': 3138, 'increasingly': 2268, 'lose': 2654, 'tends': 4402, 'tons': 4520, 'elm': 1447, 'parody': 3192, 'territory': 4411, 'hospital': 2172, 'promise': 3430, 'host': 2173, 'nail': 2957, 'arnold': 296, 'da': 1094, 'turner': 4607, 'soldier': 4053, 'hamilton': 2012, '1960s': 15, 'hippie': 2115, 'religious': 3608, 'helping': 2081, 'witches': 4910, 'manhattan': 2726, 'puts': 3481, 'neighbor': 2992, 'jimmy': 2403, 'stewart': 4184, 'boredom': 542, 'losing': 2658, 'chief': 788, 'sidney': 3968, 'mistaken': 2872, 'smooth': 4041, 'polished': 3323, 'drink': 1361, 'tie': 4488, 'dust': 1388, 'hang': 2022, 'clean': 838, 'walls': 4783, 'san': 3780, 'lesbian': 2573, 'standards': 4145, 'victim': 4729, '1930s': 11, 'resist': 3648, 'danger': 1109, 'directs': 1267, 'skip': 4016, 'careers': 687, 'band': 391, 'staff': 4138, 'dimensional': 1253, 'worthy': 4956, 'split': 4120, 'dollar': 1314, 'revolution': 3680, 'fighter': 1708, 'heroic': 2090, 'motivations': 2915, 'characterization': 759, 'theory': 4437, 'movement': 2923, 'identity': 2217, 'politics': 3326, 'negative': 2991, '1990s': 25, 'gender': 1878, 'normally': 3034, 'assume': 328, 'carried': 698, 'occurred': 3077, 'represented': 3638, 'presentation': 3375, 'singer': 3993, 'jaws': 2385, 'cousin': 1021, 'cars': 702, 
'skin': 4015, 'wishing': 4907, 'lab': 2501, 'locked': 2637, 'virus': 4752, 'figures': 1713, 'headed': 2058, 'dancer': 1104, 'medical': 2790, 'furthermore': 1854, 'throat': 4476, 'paltrow': 3178, 'topic': 4526, 'partner': 3198, 'wake': 4774, 'jim': 2402, 'vivid': 4760, 'image': 2233, 'consequences': 947, 'unit': 4663, 'mission': 2870, 'excessive': 1560, 'stanley': 4148, 'trick': 4582, 'groups': 1981, 'apparent': 269, 'borrowed': 545, 'sixties': 4011, 'gray': 1963, 'marty': 2749, 'evident': 1547, 'pearl': 3220, 'extra': 1606, 'nearby': 2980, 'parties': 3196, 'happily': 2030, 'daddy': 1096, 'younger': 4990, 'diane': 1238, 'stayed': 4167, 'risk': 3704, 'southern': 4088, 'insulting': 2310, 'rape': 3521, 'basketball': 409, 'angela': 232, 'moore': 2901, 'flawed': 1750, 'recognize': 3569, 'prior': 3400, 'symbolism': 4341, 'canadian': 667, 'purposes': 3475, 'wallace': 4782, 'bound': 553, 'grows': 1985, 'collection': 874, 'shall': 3916, 'dancing': 1107, 'poignant': 3315, 'stereotype': 4178, 'justin': 2440, 'corrupt': 1001, 'system': 4345, 'secretly': 3851, 'beating': 425, 'cops': 993, 'greedy': 1969, 'journalist': 2420, 'served': 3894, 'altogether': 202, 'reasonably': 3559, 'convince': 983, 'uninteresting': 4660, 'sitcom': 4004, 'neat': 2982, 'currently': 1085, 'martin': 2748, 'noir': 3023, 'buck': 605, 'widmark': 4882, 'peters': 3248, 'ritter': 3705, 'secrets': 3852, 'womans': 4922, 'involving': 2349, 'chick': 786, 'melodrama': 2797, 'ann': 244, 'mothers': 2912, 'exists': 1573, 'importantly': 2248, 'endure': 1475, 'wrestling': 4966, 'deliberately': 1166, 'appearing': 277, 'tall': 4362, 'climactic': 848, 'singing': 3995, 'spoiled': 4122, 'orders': 3127, 'trade': 4547, 'rambo': 3511, 'amitabh': 216, 'barry': 401, 'philip': 3252, 'photos': 3261, 'rabbit': 3498, 'pack': 3165, 'encounter': 1466, 'con': 928, 'phony': 3256, 'revolutionary': 3681, 'elvis': 1451, 'mafia': 2700, 'ruby': 3749, 'led': 2558, 'randomly': 3515, 'controversial': 978, 'hopper': 2161, 'roy': 3745, 'okay': 
3099, 'flight': 1757, 'murders': 2943, 'heck': 2071, 'apes': 267, 'clark': 832, 'drivel': 1364, 'shelf': 3926, 'blockbuster': 509, 'description': 1197, 'advice': 133, 'facts': 1624, 'buried': 624, 'topless': 4527, 'trite': 4589, 'constructed': 959, 'failing': 1627, 'delivers': 1172, 'gripping': 1976, 'pleasantly': 3303, 'slight': 4027, 'contained': 963, 'depiction': 1185, 'recognition': 3568, 'perspective': 3245, 'sadness': 3770, 'rare': 3523, 'extraordinary': 1607, 'extent': 1605, 'icon': 2211, 'tame': 4363, 'instantly': 2306, 'nose': 3037, 'stunning': 4244, 'hoffman': 2128, 'albert': 174, 'rob': 3710, 'suspenseful': 4331, 'duo': 1386, 'lumet': 2688, 'contrary': 974, 'afternoon': 144, 'managed': 2723, 'classics': 837, 'sexuality': 3909, 'multiple': 2936, 'classical': 836, 'amateur': 206, 'ensemble': 1492, 'emotionally': 1460, 'tap': 4364, 'carter': 703, 'turning': 4608, 'cave': 727, 'fits': 1741, 'ease': 1404, 'transition': 4561, 'reveals': 3673, 'center': 732, 'techniques': 4384, 'contemporary': 965, 'assault': 324, '1983': 22, 'matthew': 2772, 'rubbish': 3748, 'variety': 4714, 'wisdom': 4902, 'chess': 783, 'boot': 538, 'juvenile': 2441, 'delivering': 1171, 'albeit': 173, 'slightest': 4028, 'robbins': 3712, 'hearing': 2063, 'burned': 626, 'helps': 2082, 'introduce': 2336, 'harsh': 2040, 'loosely': 2651, 'caine': 647, 'composed': 926, 'heston': 2094, 'dentist': 1180, 'september': 3884, 'discuss': 1283, 'court': 1020, 'irrelevant': 2358, 'holiday': 2134, 'vacation': 4704, 'disappoint': 1272, 'versus': 4722, 'civilization': 827, 'machines': 2694, 'painted': 3174, '3rd': 47, 'deserve': 1199, 'pat': 3208, 'invisible': 2344, 'dealt': 1138, 'wouldve': 4959, 'knowledge': 2491, 'namely': 2962, 'dutch': 1389, 'nazi': 2977, 'masterful': 2758, 'distribution': 1298, 'frequently': 1832, 'pops': 3332, 'burns': 628, 'sinatra': 3989, 'cared': 685, 'hood': 2151, 'hat': 2045, 'goldberg': 1933, 'idiots': 2220, 'belong': 459, 'persons': 3244, 'picks': 3268, 'screams': 3828, 'raise': 
3507, 'border': 539, 'mexico': 2827, 'ultra': 4629, 'genius': 1887, 'upper': 4685, 'newspaper': 3006, 'carradine': 695, 'cheated': 775, 'ya': 4976, 'nancy': 2964, 'faces': 1618, 'vicious': 4728, 'pushed': 3478, 'refuses': 3587, 'receive': 3563, 'terry': 4415, 'curtis': 1087, 'considerable': 950, 'junior': 2435, 'burning': 627, 'bodies': 526, 'survivors': 4325, 'breaks': 571, 'fathers': 1667, 'sisters': 4002, 'troops': 4591, 'broadcast': 591, 'primary': 3394, 'smile': 4036, 'broke': 593, 'baker': 385, 'satisfying': 3791, 'ambitious': 211, 'conspiracy': 956, 'hall': 2009, 'cleverly': 842, 'al': 170, 'pacino': 3164, 'mayor': 2777, 'overlooked': 3154, 'helicopter': 2076, 'maintain': 2710, 'hook': 2152, 'bridge': 578, 'tree': 4576, 'lifestyle': 2593, 'providing': 3451, 'celebrity': 729, 'rooms': 3736, 'editor': 1423, 'sleazy': 4023, 'homosexual': 2145, 'desire': 1205, 'explored': 1595, 'ingredients': 2289, 'mirror': 2861, 'possessed': 3347, 'lacks': 2505, 'credibility': 1047, 'chicks': 787, 'cameos': 658, 'cheese': 780, 'dry': 1377, 'sport': 4129, 'shower': 3957, 'dirty': 1268, 'harry': 2039, 'eastwood': 1409, 'vehicle': 4718, 'chosen': 805, 'qualities': 3483, 'ron': 3734, 'punch': 3466, 'carrey': 696, 'transformation': 4560, 'bitter': 491, 'visuals': 4759, 'succeed': 4262, 'quest': 3486, 'jerk': 2395, 'savage': 3793, 'scare': 3803, 'margaret': 2735, 'performer': 3231, 'breathtaking': 574, 'blacks': 494, 'sophisticated': 4075, 'slave': 4022, 'jewish': 2401, 'americas': 215, 'hollywoods': 2138, 'march': 2734, 'loser': 2655, 'petty': 3249, 'appealing': 272, 'portrait': 3339, 'enterprise': 1495, 'partly': 3197, 'pulp': 3465, 'hanging': 2023, 'expressed': 1601, 'individuals': 2278, 'benefit': 465, 'kung': 2498, 'awe': 370, 'enter': 1494, 'dragon': 1344, 'lion': 2614, 'kay': 2447, 'flashback': 1746, 'imitation': 2241, 'reviewer': 3677, 'draw': 1349, 'consistently': 954, 'measure': 2786, 'uninspired': 4657, 'washington': 4806, 'brady': 561, 'priceless': 3390, 'voight': 4764, 
'purchase': 3469, 'womens': 4924, 'spider': 4109, 'jonathan': 2417, 'winters': 4901, 'storm': 4206, 'fred': 1825, 'delivery': 1173, 'francis': 1818, 'ford': 1789, 'godfather': 1927, 'taught': 4370, 'creator': 1043, 'tiresome': 4502, 'balls': 390, 'rocky': 3723, 'susan': 4326, 'jamie': 2379, 'abandoned': 62, 'motives': 2916, 'existed': 1570, 'surreal': 4317, 'reasonable': 3558, 'hills': 2109, 'passion': 3205, 'thoroughly': 4459, 'narrative': 2966, 'fortunately': 1805, 'colour': 880, 'blues': 519, 'stretched': 4222, 'wall': 4781, 'attempted': 341, 'psychiatrist': 3455, 'meaningless': 2782, 'briefly': 580, 'judy': 2426, 'displays': 1293, 'wells': 4845, 'claimed': 829, 'matched': 2763, 'prefer': 3368, 'useful': 4694, 'handled': 2019, 'scares': 3806, 'sticks': 4186, 'ignore': 2224, 'husbands': 2205, 'arrested': 298, 'irony': 2357, 'beliefs': 452, 'segments': 3868, 'practically': 3362, 'brian': 576, 'godzilla': 1929, 'buildings': 616, 'plant': 3293, 'adorable': 124, 'idiot': 2218, 'bergman': 466, 'jon': 2416, 'del': 1165, 'che': 773, 'bone': 533, 'square': 4136, 'letting': 2583, 'developing': 1224, 'focuses': 1768, 'link': 2613, 'damon': 1101, 'notably': 3042, 'financial': 1725, 'neighbors': 2994, 'hopes': 2159, 'cary': 706, 'dad': 1095, 'stilted': 4190, 'drawing': 1350, 'giallo': 1904, 'choices': 798, 'contrast': 975, 'greed': 1968, 'lower': 2677, 'handle': 2018, 'scope': 3821, 'weapons': 4829, 'fired': 1735, 'attitudes': 347, 'cowboy': 1025, 'tradition': 4549, 'suspect': 4327, 'arrogant': 303, 'closer': 856, 'credible': 1048, 'hollow': 2135, 'mindless': 2851, 'widow': 4883, 'performing': 3233, 'struggles': 4235, 'equivalent': 1512, 'hopeless': 2158, 'doomed': 1327, 'enemy': 1477, 'graphics': 1959, 'sports': 4130, 'user': 4696, 'catherine': 719, 'macy': 2695, 'grandfather': 1954, 'discovery': 1282, 'broad': 590, 'california': 650, 'spoof': 4127, 'carefully': 689, 'occur': 3076, 'potentially': 3356, 'commercials': 903, 'honesty': 2148, 'concerning': 933, 'meat': 2787, 
'saving': 3797, 'seeking': 3859, 'alex': 179, 'rocket': 3721, 'brooklyn': 595, 'games': 1865, 'brains': 563, 'asleep': 320, 'challenge': 743, 'troubled': 4593, 'attorney': 348, 'swim': 4337, 'fairy': 1632, 'obnoxious': 3067, 'funnier': 1850, 'taxi': 4371, 'driver': 1366, 'prime': 3395, 'rule': 3753, 'spots': 4132, 'awfully': 373, 'pitt': 3280, 'touches': 4535, 'limited': 2606, 'multi': 2935, 'patients': 3212, 'receives': 3565, 'loyal': 2679, 'loving': 2675, 'neil': 2995, 'wars': 4804, 'performers': 3232, 'ups': 4686, 'appreciation': 281, 'cultural': 1078, 'jake': 2377, 'pleased': 3305, 'nonsensical': 3031, 'security': 3854, 'los': 2653, 'angeles': 233, 'videos': 4734, 'robbery': 3711, 'illegal': 2229, 'calling': 653, 'photo': 3257, 'aids': 161, 'online': 3109, 'sunshine': 4294, 'understated': 4646, 'advise': 134, 'hal': 2006, 'misses': 2868, 'basement': 405, 'lazy': 2543, 'blah': 496, 'cable': 644, 'angels': 234, 'defense': 1159, 'traveling': 4568, 'connect': 944, 'represent': 3637, 'fears': 1677, '13th': 5, 'angel': 231, 'racial': 3501, 'rival': 3706, 'leg': 2562, 'lovely': 2671, 'prostitute': 3440, 'drags': 1345, 'lyrics': 2692, 'teenager': 4390, 'response': 3654, 'lucky': 2683, 'truck': 4595, 'birth': 486, 'lena': 2568, 'anime': 243, 'fulci': 1845, 'noted': 3045, 'kudos': 2497, 'mummy': 2937, 'dire': 1257, 'priest': 3392, 'property': 3438, 'hire': 2116, 'contract': 973, 'lips': 2615, 'device': 1227, 'bedroom': 437, 'opened': 3113, 'disease': 1285, 'lay': 2542, 'demands': 1175, 'minded': 2850, 'closed': 854, 'wound': 4960, 'astaire': 330, 'ginger': 1908, 'francisco': 1819, 'outfit': 3144, 'wealthy': 4827, 'rogers': 3725, 'marvelous': 2750, 'sir': 4000, 'believed': 455, 'reel': 3581, 'shocked': 3939, 'craft': 1028, 'stones': 4198, 'scientists': 3819, 'devoted': 1231, 'escaped': 1518, 'cards': 683, 'explanation': 1590, 'alcoholic': 177, 'dreary': 1356, 'omen': 3104, 'italy': 2368, 'warrior': 4802, 'souls': 4080, 'nicholson': 3011, 'lesser': 2576, 'neo': 2998, 
'efforts': 1433, 'desperation': 1210, 'shadows': 3913, 'portray': 3340, 'stinks': 4192, 'misery': 2865, 'beloved': 461, 'improve': 2254, 'snake': 4042, 'accused': 91, 'chases': 771, 'analysis': 224, 'tag': 4348, 'westerns': 4854, 'destiny': 1212, 'seeks': 3860, 'freedom': 1828, 'tales': 4357, 'latest': 2526, 'frustration': 1843, 'alert': 178, 'ripped': 3701, 'mixture': 2878, 'tracy': 4546, 'everyones': 1543, 'warren': 4801, 'beatty': 427, 'colors': 879, 'smiling': 4037, 'hitchcock': 2123, 'jeremy': 2394, 'recording': 3575, 'safety': 3772, 'builds': 617, 'subtlety': 4261, 'dynamic': 1394, 'convoluted': 987, 'profound': 3423, 'strikes': 4225, 'lucas': 2680, 'judging': 2425, 'secondly': 3847, 'saturday': 3792, 'struck': 4232, 'phantom': 3251, 'jeff': 2390, 'nasty': 2968, 'inventive': 2341, 'merits': 2817, 'carpenter': 694, 'tommy': 4516, 'feminist': 1695, 'mars': 2746, 'equally': 1510, 'matches': 2764, 'championship': 745, 'passed': 3202, 'kane': 2442, 'defeat': 1157, 'guest': 1992, 'ape': 266, 'satire': 3788, 'ireland': 2352, 'streisand': 4218, 'raped': 3522, 'generous': 1886, 'empathy': 1462, 'purely': 3472, 'clichéd': 845, 'rick': 3689, 'proud': 3444, 'lowest': 2678, 'dislike': 1288, '1990': 24, 'serves': 3895, 'chances': 748, 'jay': 2386, 'cox': 1026, 'operation': 3117, 'rules': 3754, 'elaborate': 1439, 'ha': 2002, 'loads': 2633, 'remarks': 3615, 'inane': 2258, 'purple': 3473, 'repeatedly': 3632, 'cook': 988, 'kubrick': 2496, 'closet': 858, 'bumbling': 622, 'raised': 3508, 'solely': 4056, 'domestic': 1318, 'beaten': 424, 'inappropriate': 2259, 'countless': 1012, 'whom': 4875, 'committed': 905, 'carrie': 697, 'fisher': 1739, 'guessing': 1991, 'terrorists': 4414, 'bin': 483, 'sean': 3840, 'copies': 992, 'decades': 1145, 'legendary': 2565, 'wes': 4851, 'craven': 1034, 'unreal': 4676, 'dinner': 1254, 'express': 1600, 'hint': 2112, 'issue': 2364, 'liberal': 2588, 'pot': 3354, 'dinosaurs': 1256, 'domino': 1319, 'edgar': 1417, 'acid': 95, 'hung': 2197, 'tea': 4373, 
'easier': 1405, 'derek': 1192, 'unconvincing': 4638, 'funeral': 1849, 'theyd': 4443, 'irritating': 2359, 'sympathy': 4343, 'sarah': 3785, 'tonight': 4519, 'removed': 3624, 'gift': 1906, 'thru': 4483, 'trailer': 4554, 'cole': 873, 'smaller': 4034, 'pretending': 3382, 'enjoyment': 1488, 'blow': 514, 'messed': 2821, 'mitchell': 2875, 'sits': 4006, 'meanwhile': 2785, 'vaguely': 4706, 'louise': 2666, 'montage': 2894, 'kicked': 2461, '35': 45, 'laura': 2536, 'drinking': 1362, 'principal': 3398, 'crisis': 1060, 'dialogs': 1233, 'reunion': 3669, 'views': 4742, 'scottish': 3825, 'imagery': 2234, 'blew': 505, 'mel': 2796, 'warriors': 4803, 'combined': 885, 'sandra': 3782, 'todd': 4510, 'bela': 450, 'lugosi': 2686, 'visually': 4758, 'relatives': 3601, 'estate': 1525, 'ruins': 3752, 'blunt': 520, 'leonard': 2572, 'comparing': 913, 'unaware': 4632, 'definite': 1161, 'donald': 1321, 'abusive': 75, 'plausible': 3295, 'hamlet': 2013, 'ruth': 3763, 'communist': 907, 'relative': 3599, 'sexually': 3910, 'curse': 1086, 'faithful': 1634, 'england': 1482, 'hoped': 2156, 'singers': 3994, 'demons': 1178, 'ad': 110, 'switch': 4339, 'philosophy': 3254, 'moronic': 2907, 'whoever': 4872, 'maria': 2736, 'trail': 4553, 'redeeming': 3577, 'corporate': 998, 'bother': 548, 'adams': 112, 'rangers': 3517, 'justify': 2439, 'lying': 2690, 'comics': 895, 'neighborhood': 2993, 'terrifying': 4410, 'bird': 484, 'realizes': 3554, 'essence': 1521, 'showdown': 3955, 'photographed': 3258, 'suitable': 4285, 'tooth': 4524, 'officers': 3093, '2003': 33, 'roman': 3731, 'opinions': 3119, 'oscars': 3134, 'bull': 619, 'harder': 2035, 'masters': 2761, 'pool': 3327, 'swimming': 4338, 'press': 3379, 'miller': 2846, 'bettie': 473, 'sink': 3999, 'devoid': 1230, 'wrapped': 4964, 'marketing': 2741, 'chorus': 803, 'proof': 3434, 'depressed': 1187, 'clown': 862, 'hill': 2108, 'parallel': 3185, 'commented': 900, 'album': 175, '1999': 27, 'magical': 2704, 'forbidden': 1785, 'spare': 4093, 'sue': 4274, 'forgot': 1796, 'muslim': 
2949, 'terrorist': 4413, 'despair': 1207, 'counter': 1011, 'stellar': 4174, 'spooky': 4128, 'ward': 4793, 'absence': 70, 'wives': 4918, 'butt': 638, 'unbearable': 4633, 'spite': 4117, 'unsettling': 4679, 'promised': 3431, 'dawson': 1128, 'tarzan': 4367, 'virginia': 4750, 'ellen': 1446, 'searching': 3842, 'sellers': 3872, 'laurel': 2537, 'weight': 4840, 'fortune': 1806, 'worlds': 4947, 'steel': 4173, 'cinderella': 815, 'businessman': 634, 'resemblance': 3645, 'merit': 2816, 'korean': 2495, 'spring': 4134, 'cities': 823, 'annoyed': 248, 'presumably': 3380, 'fifteen': 1706, 'praise': 3364, 'kidding': 2465, 'wholly': 4874, 'hiding': 2099, 'attacked': 338, 'temple': 4398, 'serving': 3897, 'highest': 2102, 'contain': 962, 'models': 2881, 'robots': 3718, 'morris': 2908, 'fbi': 1675, 'blank': 501, 'explosion': 1596, 'instinct': 2308, 'gain': 1863, 'access': 83, 'stevens': 4183, 'bullets': 621, 'corruption': 1002, 'stinker': 4191, 'critic': 1061, 'catchy': 717, 'surfing': 4311, 'pushing': 3479, 'renting': 3629, 'eat': 1411, 'depression': 1189, 'chased': 770, 'mountain': 2917, 'dolph': 1317, 'wreck': 4965, '2001': 31, 'gag': 1861, 'roger': 3724, 'rolled': 3729, 'experiments': 1584, 'karen': 2444, 'ham': 2011, 'ear': 1397, 'treats': 4575, 'yellow': 4981, 'laws': 2540, 'hence': 2083, 'embarrassing': 1454, 'reflection': 3585, 'worried': 4949, 'campbell': 663, 'iran': 2350, 'horrifying': 2167, 'fascinated': 1657, 'frequent': 1831, 'fooled': 1780, 'butler': 637, 'clip': 851, 'april': 284, 'kidnapped': 2466, 'rendition': 3625, 'emphasis': 1463, 'yelling': 4980, 'writes': 4970, 'vegas': 4717, 'le': 2544, 'spread': 4133, 'avoided': 364, 'hype': 2207, 'staying': 4168, 'overrated': 3156, 'ironically': 2356, 'colonel': 876, 'synopsis': 4344, 'saga': 3773, 'eve': 1532, 'poetic': 3313, 'flynn': 1765, 'typically': 4624, 'trashy': 4566, 'dave': 1123, 'gradually': 1952, 'citizen': 824, 'solution': 4059, 'perfection': 3225, 'em': 1452, 'documentaries': 1306, 'twins': 4616, 'imagined': 2239, 
'plight': 3308, 'bonus': 534, 'scarecrow': 3804, 'joined': 2413, 'button': 639, 'arguably': 289, 'dubbed': 1379, 'cagney': 646, 'robinson': 3716, 'dolls': 1316, 'relation': 3595, 'orange': 3125, 'reads': 3545, 'settle': 3902, 'wed': 4835, 'bourne': 554, '2007': 37, 'prize': 3405, 'literature': 2623, 'wealth': 4826, 'legal': 2563, 'conflicts': 940, 'lloyd': 2630, 'trek': 4578, 'cringe': 1059, 'trademark': 4548, 'masses': 2755, 'official': 3094, 'kenneth': 2455, '3d': 46, 'report': 3635, 'ballet': 389, 'opportunities': 3120, 'elegant': 1441, 'mechanical': 2788, 'stanwyck': 4149, 'kennedy': 2454, 'improvement': 2256, 'ashley': 313, 'selling': 3873, 'pro': 3406, 'wandering': 4785, 'endless': 1473, 'enthusiasm': 1501, 'stiller': 4189, 'melting': 2799, 'complaint': 920, 'ignorant': 2223, 'miserable': 2863, 'twilight': 4614, 'cinematographer': 819, 'lit': 2621, 'outer': 3143, 'chapter': 757, 'mundane': 2938, 'notion': 3050, 'channels': 754, 'senseless': 3878, 'incompetent': 2266, 'bottle': 550, 'fields': 1705, 'ah': 158, 'outrageous': 3145, 'designs': 1204, 'depicting': 1184, 'gifted': 1907, 'karloff': 2445, 'demand': 1174, 'firstly': 1737, 'lees': 2560, 'agrees': 157, 'jaw': 2384, 'pink': 3277, 'alfred': 181, 'flop': 1760, 'caliber': 649, 'dedicated': 1153, 'homage': 2141, 'emily': 1456, 'resources': 3651, 'opposed': 3122, 'hates': 2048, 'cg': 739, 'kicking': 2462, 'shelley': 3928, 'detailed': 1218, 'ramones': 3512, 'khan': 2459, 'directorial': 1265, 'allowing': 191, 'victoria': 4732, 'mildred': 2841, 'goodness': 1938, 'wished': 4905, 'gundam': 1997, 'satisfy': 3790, '1940s': 13, 'awake': 365, 'timothy': 4499, 'mann': 2728, 'distance': 1294, 'seagal': 3839, 'fonda': 1777, 'coach': 868, 'rap': 3520, 'intimate': 2331, 'joey': 2408, 'toy': 4542, 'sole': 4055, 'incomprehensible': 2267, 'tracks': 4545, 'strike': 4224, 'cameron': 661, 'holy': 2140, 'ties': 4490, 'cheating': 776, 'advanced': 128, 'equipment': 1511, 'dalton': 1098, 'willis': 4890, 'cup': 1080, 'chaplin': 756, 
'fish': 1738, 'union': 4661, 'lily': 2605, 'eager': 1396, 'rubber': 3747, 'catches': 715, '1980': 20, 'montana': 2895, 'simpson': 3987, 'education': 1424, 'shirley': 3936, 'creators': 1044, 'bud': 607, 'santa': 3783, 'waters': 4817, 'fury': 1855, 'laughably': 2531, 'jealous': 2388, 'politically': 3325, 'explosions': 1597, 'britain': 588, 'performs': 3234, 'sincere': 3991, 'gabriel': 1859, 'hardy': 2037, 'branagh': 564, 'jeffrey': 2391, 'miike': 2837, 'holly': 2136, 'georges': 1894, 'pal': 3177, 'gerard': 1895, 'gandhi': 1866, 'satan': 3787, 'brando': 566, 'jenny': 2393, 'scooby': 3820, 'doo': 1325, 'scrooge': 3837, 'kapoor': 2443, 'brosnan': 597, 'homer': 2144, 'pokemon': 3320}
    
    words = review_to_words(event['body'])
    bow = bow_encoding(words, vocab)

    # The SageMaker runtime is what allows us to invoke the endpoint that we've created.
    runtime = boto3.Session().client('sagemaker-runtime')

    # Now we use the SageMaker runtime to invoke our endpoint, sending the review we were given
    response = runtime.invoke_endpoint(EndpointName = 'sagemaker-xgboost-2020-05-08-19-29-55-770', # The name of the endpoint we created
                                       ContentType = 'text/csv',                 # The data format that is expected
                                       Body = ','.join([str(val) for val in bow]).encode('utf-8')) # The actual review

    # The response is an HTTP response whose body contains the result of our inference
    result = response['Body'].read().decode('utf-8')
    
    # Round the result so that our web app only gets '1' or '0' as a response.
    result = round(float(result))

    return {
        'statusCode' : 200,
        'headers' : { 'Content-Type' : 'text/plain', 'Access-Control-Allow-Origin' : '*' },
        'body' : str(result)
    }
```
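
The request and response handling in the handler can be checked without a live endpoint. The following is a minimal sketch, not the notebook's own code: `FakeRuntime`, `build_payload`, `parse_prediction` and `predict` are hypothetical names introduced here, and `FakeRuntime` only imitates the shape of the `sagemaker-runtime` client's `invoke_endpoint` response (a dict whose `'Body'` can be read as bytes).

```python
import io


def build_payload(bow):
    """Serialize a bag-of-words vector as one CSV row, as the handler does."""
    return ','.join(str(val) for val in bow).encode('utf-8')


def parse_prediction(body_bytes):
    """Decode the endpoint's reply and round the score to 0 or 1."""
    return round(float(body_bytes.decode('utf-8')))


class FakeRuntime:
    """Hypothetical stand-in for the boto3 'sagemaker-runtime' client."""

    def __init__(self, score):
        self.score = score
        self.last_call = None

    def invoke_endpoint(self, EndpointName, ContentType, Body):
        # Record the call so the request can be inspected afterwards.
        self.last_call = {'EndpointName': EndpointName,
                          'ContentType': ContentType,
                          'Body': Body}
        return {'Body': io.BytesIO(str(self.score).encode('utf-8'))}


def predict(runtime, endpoint_name, bow):
    response = runtime.invoke_endpoint(EndpointName=endpoint_name,
                                       ContentType='text/csv',
                                       Body=build_payload(bow))
    return parse_prediction(response['Body'].read())


fake = FakeRuntime(0.83)
print(predict(fake, 'my-endpoint', [0, 1, 2]))  # 'my-endpoint' is a placeholder name
```

Note that Python 3's `round` rounds halves to the nearest even integer, so a score of exactly `0.5` becomes `0`.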

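Once you have copied and pasted the code above into the Lambda code editor, replace the value passed as `EndpointName` (the `**ENDPOINT NAME HERE**` placeholder in the template) with the name of the endpoint that we deployed earlier. The listing above shows the name from this run; yours will differ. You can determine the name of the endpoint using the code cell below.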

In [49]:
xgb_predictor.endpoint

'sagemaker-xgboost-2020-05-08-19-29-55-770'

In addition, you will need to copy the vocabulary dict to the appropriate place in the code at the beginning of the `lambda_handler` method. The cell below prints out the vocabulary dict in a way that is easy to copy and paste.

In [62]:
del(test_words)
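
The cell shown above deletes a variable rather than printing anything. As a minimal sketch of how a dict can be printed in a form that pastes straight back into Python source, assuming the vocabulary is held in a variable (called `vocabulary` here, a hypothetical name with a tiny stand-in value):

```python
import ast

# Tiny stand-in for the real word -> index mapping (hypothetical values).
vocabulary = {'movie': 0, 'film': 1, 'great': 2}

# repr() of a dict of strings and ints is a valid Python literal.
literal = repr(vocabulary)
print(literal)

# Round-trip check: parsing the printed text gives back the same dict.
assert ast.literal_eval(literal) == vocabulary
```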

Once you have added the endpoint name to the Lambda function, click on **Save**. Your Lambda function is now up and running. Next we need to create a way for our web app to execute the Lambda function.

### Setting up API Gateway

Now that our Lambda function is set up, it is time to create a new API using API Gateway that will trigger the Lambda function we have just created.

Using the AWS Console, navigate to **Amazon API Gateway** and then click on **Get started**.

On the next page, make sure that **New API** is selected and give the new API a name, for example, `sentiment_analysis_web_app`. Then, click on **Create API**.

We have now created an API, but it doesn't do anything yet. What we want it to do is trigger the Lambda function that we created earlier.

Select the **Actions** dropdown menu and click **Create Method**. A new blank method will be created. Select its dropdown menu, choose **POST**, and then click on the check mark beside it.

For the integration point, make sure that **Lambda Function** is selected and check the **Use Lambda Proxy integration** option. This option ensures that the data sent to the API is passed directly to the Lambda function with no processing. It also means that the return value must be a proper response object, since API Gateway will not process it either.
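
To make the proxy contract concrete, here is a minimal sketch of a handler that reads the request body from the event and returns a response object with a status code, headers, and a string body. `word_count_handler` and `sample_event` are hypothetical names used only for illustration, and the event shows just a few of the fields API Gateway sends.

```python
def word_count_handler(event, context):
    """Illustrative handler: replies with the number of words in the request body."""
    # With proxy integration the raw request body arrives as a string in event['body'].
    review = event.get('body') or ''
    return {
        'statusCode': 200,
        'headers': {'Content-Type': 'text/plain',
                    'Access-Control-Allow-Origin': '*'},
        # The body of a proxy response must be a string.
        'body': str(len(review.split())),
    }


# A trimmed-down, hypothetical proxy event for local experimentation.
sample_event = {'httpMethod': 'POST',
                'body': 'This movie was great',
                'isBase64Encoded': False}

result = word_count_handler(sample_event, None)
print(result['statusCode'], result['body'])
```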

Type the name of the Lambda function you created earlier into the **Lambda Function** text entry box and then click on **Save**. Click on **OK** in the pop-up box that then appears, giving permission to API Gateway to invoke the Lambda function you created.

The last step in setting up the API is to select the **Actions** dropdown and click on **Deploy API**. You will need to create a new deployment stage and name it anything you like, for example `prod`.

You have now successfully set up a public API to access your SageMaker model. Make sure to copy or write down the URL provided to invoke your newly created public API as this will be needed in the next step. This URL can be found at the top of the page, highlighted in blue next to the text **Invoke URL**.
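
Before wiring the URL into a web page, the API can be exercised from Python. The sketch below only builds the POST request; the URL is a made-up placeholder, `build_request` is a hypothetical helper, and the `text/plain` content type is an assumption about what the client sends. The call that actually sends the request is left commented out because it needs a running endpoint.

```python
import urllib.request


def build_request(invoke_url, review):
    """Build a POST request carrying the review text as the raw body."""
    return urllib.request.Request(
        invoke_url,
        data=review.encode('utf-8'),
        headers={'Content-Type': 'text/plain'},
        method='POST',
    )


# Placeholder URL: substitute the Invoke URL shown in the API Gateway console.
req = build_request('https://example.execute-api.us-east-1.amazonaws.com/prod',
                    'I loved this film')
print(req.get_method(), req.full_url)

# To send it for real (requires the endpoint to be deployed and running):
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode('utf-8'))  # expected to be '1' or '0'
```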

## Step 7: Deploying our web app

Now that we have a publicly available API, we can start using it in a web app. For our purposes, we have provided a simple static HTML file which can make use of the public API you created earlier.

In the `website` folder there should be a file called `index.html`. Download the file to your computer and open it in a text editor of your choice. There should be a line which contains **\*\*REPLACE WITH PUBLIC API URL\*\***. Replace this string with the URL that you wrote down in the last step and then save the file.

Now, if you open `index.html` on your local computer, your browser will load the page directly from disk, and you can use the provided site to interact with your SageMaker model.

If you'd like to go further, you can host this HTML file anywhere you'd like, for example on GitHub or as a static site on Amazon S3. Once you have done this, you can share the link with anyone you'd like and have them play with it too!

> **Important Note** In order for the web app to communicate with the SageMaker endpoint, the endpoint has to actually be deployed and running. This means that you are paying for it. Make sure that the endpoint is running when you want to use the web app but that you shut it down when you don't need it, otherwise you will end up with a surprisingly large AWS bill.

### Delete the endpoint

Remember to always shut down your endpoint if you are no longer using it. You are charged for the length of time that the endpoint is running, so if you forget and leave it on, you could end up with an unexpectedly large bill.

In [58]:
del(response)
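
Note that `del(response)` only removes a Python variable; it does not shut down the endpoint. As a hedged sketch of one way to delete an endpoint by name, the helper below calls `delete_endpoint` on a boto3 SageMaker client passed in by the caller. `delete_endpoint_by_name` is a hypothetical helper name, and the usage lines are commented out because they need AWS credentials. If `xgb_predictor` is still in scope, the SageMaker Python SDK's `xgb_predictor.delete_endpoint()` should achieve the same result.

```python
def delete_endpoint_by_name(sagemaker_client, endpoint_name):
    """Ask SageMaker to delete the named endpoint so it stops accruing charges."""
    sagemaker_client.delete_endpoint(EndpointName=endpoint_name)
    return endpoint_name


# Real usage (requires AWS credentials and permissions):
# import boto3
# delete_endpoint_by_name(boto3.client('sagemaker'),
#                         'sagemaker-xgboost-2020-05-08-19-29-55-770')
```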

## Optional: Clean up

The default notebook instance on SageMaker doesn't have a lot of excess disk space available. As you continue to complete and execute notebooks you will eventually fill up this disk space, leading to errors which can be difficult to diagnose. Once you are completely finished using a notebook it is a good idea to remove the files that you created along the way. Of course, you can do this from the terminal or from the notebook hub if you would like. The cell below contains some commands to clean up the created files from within the notebook.

In [63]:
# First we will remove all of the files contained in the data_dir directory
!rm $data_dir/*

# And then we delete the directory itself
!rmdir $data_dir

# Similarly we remove the files in the cache_dir directory and the directory itself
!rm $cache_dir/*
!rmdir $cache_dir

OSError: [Errno 12] Cannot allocate memory
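
The `OSError` above is probably raised because each `!` command has to spawn a shell subprocess, which can fail when the notebook kernel is already short on memory. A minimal sketch of an alternative that stays inside the Python process is shown below. It assumes `data_dir` and `cache_dir` are path strings defined earlier in the notebook; `remove_dir` is a hypothetical helper, and the demonstration runs against a temporary directory rather than the real ones.

```python
import os
import shutil
import tempfile


def remove_dir(path):
    """Delete a directory and everything in it without spawning a shell."""
    shutil.rmtree(path, ignore_errors=True)
    return not os.path.exists(path)


# Demonstration on a throwaway directory containing one file.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, 'train.csv'), 'w') as f:
    f.write('1,0,0\n')

removed = remove_dir(demo_dir)
print(removed)

# In the notebook this would be:
# remove_dir(data_dir)
# remove_dir(cache_dir)
```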