# Creating a Sentiment Analysis Web App
## Using PyTorch and SageMaker

_Deep Learning Nanodegree Program | Deployment_

---

The goal will be to have a simple web page which a user can use to enter a movie review. The web page will then send the review off to our deployed model which will predict the sentiment of the entered review.


## General Outline

The general outline for SageMaker projects using a notebook instance is as follows:

1. Download or otherwise retrieve the data.
2. Process / Prepare the data.
3. Upload the processed data to S3.
4. Train a chosen model.
5. Test the trained model (typically using a batch transform job).
6. Deploy the trained model.
7. Use the deployed model.

This project follows each of the steps above.

## Step 1: Downloading the data

As in the XGBoost in SageMaker notebook, the [IMDb dataset](http://ai.stanford.edu/~amaas/data/sentiment/) will be used.

> Maas, Andrew L., et al. [Learning Word Vectors for Sentiment Analysis](http://ai.stanford.edu/~amaas/data/sentiment/). In _Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies_. Association for Computational Linguistics, 2011.

In [1]:
%mkdir ../data
!wget -O ../data/aclImdb_v1.tar.gz http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz
!tar -zxf ../data/aclImdb_v1.tar.gz -C ../data

mkdir: cannot create directory ‘../data’: File exists
--2019-05-16 11:37:56--  http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz
Resolving ai.stanford.edu (ai.stanford.edu)... 171.64.68.10
Connecting to ai.stanford.edu (ai.stanford.edu)|171.64.68.10|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 84125825 (80M) [application/x-gzip]
Saving to: ‘../data/aclImdb_v1.tar.gz’


2019-05-16 11:38:05 (8.46 MB/s) - ‘../data/aclImdb_v1.tar.gz’ saved [84125825/84125825]



## Step 2: Preparing and Processing the data

Also, as in the XGBoost notebook, some initial data processing will be done. To begin with, each of the reviews will be read and combined into a single input structure. Then, the dataset will be split into a training set and a testing set.

In [2]:
import os
import glob

def read_imdb_data(data_dir='../data/aclImdb'):
    data = {}
    labels = {}
    
    for data_type in ['train', 'test']:
        data[data_type] = {}
        labels[data_type] = {}
        
        for sentiment in ['pos', 'neg']:
            data[data_type][sentiment] = []
            labels[data_type][sentiment] = []
            
            path = os.path.join(data_dir, data_type, sentiment, '*.txt')
            files = glob.glob(path)
            
            for f in files:
                with open(f) as review:
                    data[data_type][sentiment].append(review.read())
                    # Positive review represented by '1' and a negative review by '0'
                    labels[data_type][sentiment].append(1 if sentiment == 'pos' else 0)
                    
            assert len(data[data_type][sentiment]) == len(labels[data_type][sentiment]), \
                    "{}/{} data size does not match labels size".format(data_type, sentiment)
                
    return data, labels

In [3]:
data, labels = read_imdb_data()
print("IMDB reviews: train = {} pos / {} neg, test = {} pos / {} neg".format(
            len(data['train']['pos']), len(data['train']['neg']),
            len(data['test']['pos']), len(data['test']['neg'])))

IMDB reviews: train = 12500 pos / 12500 neg, test = 12500 pos / 12500 neg


Now that the raw training and testing data has been read from the downloaded dataset, the positive and negative reviews will be combined and the resulting records will be shuffled.

In [4]:
from sklearn.utils import shuffle

def prepare_imdb_data(data, labels):
    """Prepare training and test sets from IMDb movie reviews."""
    
    # Combine positive and negative reviews and labels
    data_train = data['train']['pos'] + data['train']['neg']
    data_test = data['test']['pos'] + data['test']['neg']
    labels_train = labels['train']['pos'] + labels['train']['neg']
    labels_test = labels['test']['pos'] + labels['test']['neg']
    
    # Shuffle reviews and corresponding labels within training and test sets
    data_train, labels_train = shuffle(data_train, labels_train)
    data_test, labels_test = shuffle(data_test, labels_test)
    
    # Return unified training data, test data, training labels, test labels
    return data_train, data_test, labels_train, labels_test

In [5]:
train_X, test_X, train_y, test_y = prepare_imdb_data(data, labels)
print("IMDb reviews (combined): train = {}, test = {}".format(len(train_X), len(test_X)))

IMDb reviews (combined): train = 25000, test = 25000


Now that the training and testing sets are unified and prepared, a quick check shows an example of the data the model will be trained on. This is generally a good idea because it makes it possible to see how each of the further processing steps affects the reviews, and it also ensures that the data has been loaded correctly.

In [6]:
print(train_X[100])
print(train_y[100])

I've just returned from a showing of "My Left Foot" at our public library. What an emotional experience -- I feel drained and uplifted.<br /><br />It's the story of Christy Brown, Irish writer and painter, and based on the author's autobiographical "My Left Foot." Christy was born with a form of cerebral palsy such that the only limb he had good control of was his left foot. Doctors advised his parents he was hopelessly mentally retarded but his mother didn't give up on him and, somewhat as Annie Sullivan had done with Helen Keller, helped him achieve a breakthrough in which he learned the alphabet and then to read, write, and paint.<br /><br />This film won Academy Awards for Daniel Day-Lewis (best actor) as well as best supporting actress for the actress playing his mother; it also received Oscar nominations for best picture, best director, and best adapted screenplay.<br /><br />As a retired clinical psychologist and family therapist, while many films may entertain me, many also oft

The first step in processing the reviews is to remove any HTML tags that appear. In addition, the input is tokenized and stemmed so that words such as *entertained* and *entertaining* are considered the same with regard to sentiment analysis.

In [7]:
import nltk
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer

import re
from bs4 import BeautifulSoup

def review_to_words(review):
    nltk.download("stopwords", quiet=True)
    stemmer = PorterStemmer()
    stop_words = set(stopwords.words("english")) # Build the stopword set once per call
    
    text = BeautifulSoup(review, "html.parser").get_text() # Remove HTML tags
    text = re.sub(r"[^a-zA-Z0-9]", " ", text.lower()) # Convert to lower case, replace punctuation with spaces
    words = text.split() # Split string into words
    words = [w for w in words if w not in stop_words] # Remove stopwords
    words = [stemmer.stem(w) for w in words] # Stem with the shared stemmer
    
    return words

The `review_to_words` method defined above uses `BeautifulSoup` to remove any html tags that appear and uses the `nltk` package to tokenize the reviews. As a check to ensure everything is working, `review_to_words` is applied to one of the reviews in the training set.

In [8]:
# Apply review_to_words to a review (train_X[100] or any other review)
review_to_words(train_X[0])

['know',
 'purist',
 'poo',
 'poo',
 'anyth',
 'exactli',
 'like',
 'origin',
 'howev',
 'sometim',
 'spin',
 'off',
 'stand',
 'merit',
 'like',
 'new',
 'iron',
 'chef',
 'similar',
 'enough',
 'japanes',
 'version',
 'time',
 'cater',
 'american',
 'spirit',
 'love',
 'alton',
 'brown',
 'comment',
 'explain',
 'thing',
 'flair',
 'iron',
 'chef',
 'interest',
 'know',
 'origin',
 'probabl',
 'best',
 'chef',
 'planet',
 'time',
 'bobbi',
 'flay',
 'american',
 'iron',
 'chef',
 'beat',
 'mario',
 'batali',
 'seem',
 'fun',
 'cook',
 'make',
 'comment',
 'flashi',
 'creat',
 'watch',
 'seri',
 'find',
 'player',
 'work',
 'togeth',
 'well',
 'judg',
 'alway',
 'best',
 'choic',
 'howev',
 'except',
 'like',
 'lawyer',
 'turn',
 'foodi',
 'judg',
 'question',
 'abl',
 'handl',
 'serv',
 'enjoy',
 'watch',
 'chef',
 'hustl',
 'challeng',
 'surpris',
 'food',
 'end',
 'alway',
 'look',
 'amaz',
 'sometim',
 'inspir',
 'kitchen',
 'perhap',
 'anyon',
 'ask',
 'want',
 'realli',
 'eat',


**Question/Analysis:** As mentioned above, the `review_to_words` method removes HTML formatting and tokenizes the words found in a review, for example, converting *entertained* and *entertaining* into *entertain* so that they are treated as though they are the same word. What else, if anything, does this method do to the input?

**Answer:** In addition, the method removes stopwords and punctuation marks, and converts all words to lowercase.
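
To make those extra steps concrete, here is a minimal standard-library sketch. It is not the notebook's `review_to_words`: the regular-expression tag stripping is a crude stand-in for `BeautifulSoup`, `TOY_STOPWORDS` is a hypothetical, deliberately tiny list standing in for NLTK's English stopwords, and stemming is left out entirely.

```python
import re

# Hypothetical, deliberately tiny stopword list; the notebook uses NLTK's English list.
TOY_STOPWORDS = {"the", "a", "was", "and", "it", "i"}

def clean_review(review):
    """Strip tags, lowercase, drop punctuation and stopwords (no stemming)."""
    text = re.sub(r"<[^>]+>", " ", review)             # crude stand-in for BeautifulSoup
    text = re.sub(r"[^a-zA-Z0-9]", " ", text.lower())  # lowercase, punctuation -> spaces
    return [w for w in text.split() if w not in TOY_STOPWORDS]

print(clean_review("The plot was GREAT!<br /><br />I loved it."))
# ['plot', 'great', 'loved']
```

The `<br />` tags, the capital letters, the punctuation and the stopwords all disappear, leaving only the words that are likely to carry sentiment.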

The method below applies the `review_to_words` method to each of the reviews in the training and testing datasets. In addition, it caches the results, because this processing step can take a long time. This way, if the notebook cannot be completed in the current session, it can be resumed from here without needing to process the data a second time.

In [9]:
import pickle

cache_dir = os.path.join("../cache", "sentiment_analysis")  # Where to store cache files
os.makedirs(cache_dir, exist_ok=True)  # Ensure cache directory exists

def preprocess_data(data_train, data_test, labels_train, labels_test,
                    cache_dir=cache_dir, cache_file="preprocessed_data.pkl"):
    """Convert each review to words; read from cache if available."""

    # If cache_file is not None, try to read from it first
    cache_data = None
    if cache_file is not None:
        try:
            with open(os.path.join(cache_dir, cache_file), "rb") as f:
                cache_data = pickle.load(f)
            print("Read preprocessed data from cache file:", cache_file)
        except Exception:
            pass  # Unable to read from cache, but that's okay
    
    # If cache is missing, then do the heavy lifting
    if cache_data is None:
        # Preprocess training and test data to obtain words for each review
        #words_train = list(map(review_to_words, data_train))
        #words_test = list(map(review_to_words, data_test))
        words_train = [review_to_words(review) for review in data_train]
        words_test = [review_to_words(review) for review in data_test]
        
        # Write to cache file for future runs
        if cache_file is not None:
            cache_data = dict(words_train=words_train, words_test=words_test,
                              labels_train=labels_train, labels_test=labels_test)
            with open(os.path.join(cache_dir, cache_file), "wb") as f:
                pickle.dump(cache_data, f)
            print("Wrote preprocessed data to cache file:", cache_file)
    else:
        # Unpack data loaded from cache file
        words_train, words_test, labels_train, labels_test = (cache_data['words_train'],
                cache_data['words_test'], cache_data['labels_train'], cache_data['labels_test'])
    
    return words_train, words_test, labels_train, labels_test

In [10]:
# Preprocess data
train_X, test_X, train_y, test_y = preprocess_data(train_X, test_X, train_y, test_y)

Read preprocessed data from cache file: preprocessed_data.pkl


## Transform the data

To start, each word will be represented as an integer. Of course, some of the words that appear in the reviews occur very infrequently and so likely don't contain much information for the purposes of sentiment analysis. To address this, the size of the working vocabulary is fixed so that it only includes the words that appear most frequently. All of the infrequent words are then combined into a single category which, in this case, is labelled `1`.

Since a recurrent neural network will be used, it will be convenient if the length of each review is the same. To do this, a fixed review length is chosen; short reviews are padded with the category 'no word' (which will be labelled `0`) and long reviews are truncated.

### A word dictionary

To begin with, a way to map words that appear in the reviews to integers needs to be constructed. Here the size of the vocabulary (including the 'no word' and 'infrequent' categories) is fixed at `5000`, but this can be changed to see how it affects the model.

> The `build_dict()` method is implemented below. Note that even though `vocab_size` is set to `5000`, a mapping is only constructed for the `4998` most frequently appearing words. This is because the labels `0` and `1` are reserved for 'no word' and 'infrequent word', respectively.

In [11]:
import numpy as np

def build_dict(data, vocab_size = 5000):
    """Construct and return a dictionary mapping each of the most frequently appearing words to a unique integer."""
    
    # Determine how often each word appears in `data`. Note that 
    # `data` is a list of sentences and that a sentence is a list of words.
    
    word_count = {} # A dict storing the words that appear in the reviews along with how often they occur
    
    for review in data:
        for word in review:
            if word in word_count:
                word_count[word] += 1
            else:
                word_count[word] = 1
                
    # Sort the words found in `data` 
    # sorted_words[0] is the most frequently appearing word
    # sorted_words[-1] is the least frequently appearing word.
    
    sorted_word_list = sorted(word_count.items(), key=(lambda word: word[1]), reverse=True)
    sorted_words = [word[0] for word in sorted_word_list]
    
    word_dict = {} # A dictionary that translates words into integers
    for idx, word in enumerate(sorted_words[:vocab_size - 2]): # The -2 leaves room for the
        word_dict[word] = idx + 2                              # 'no word' and 'infrequent' labels
        
    return word_dict

In [12]:
word_dict = build_dict(train_X)
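
As a toy illustration of the two reserved labels, the sketch below rebuilds the same kind of mapping on a made-up corpus. The helper `build_toy_dict` and the three tiny "reviews" are hypothetical, and `collections.Counter` is used here instead of the hand-written counting loop in `build_dict`.

```python
from collections import Counter

def build_toy_dict(reviews, vocab_size):
    """Map the (vocab_size - 2) most frequent words to integers starting at 2."""
    counts = Counter(word for review in reviews for word in review)
    most_common = [word for word, _ in counts.most_common(vocab_size - 2)]
    return {word: idx + 2 for idx, word in enumerate(most_common)}

# Made-up corpus: 'good' appears 4 times, 'movi' 3 times, 'bad' and 'plot' once each.
toy_reviews = [["good", "movi", "good"], ["bad", "movi"], ["good", "plot", "movi", "good"]]
toy_dict = build_toy_dict(toy_reviews, vocab_size=4)
print(toy_dict)  # {'good': 2, 'movi': 3}
```

With `vocab_size=4`, only two words receive their own integer; `bad` and `plot` would later be encoded as the 'infrequent' label `1`, and `0` stays free for padding.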

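**Question/Analysis:** What are the five most frequently appearing (tokenized) words in the training set? Does it make sense that these words appear frequently in the training set?

**Answer:** The five most frequent words are shown in the output of the code cell below. Their frequency makes sense for a corpus of movie reviews: *movi* and *film* name the subject of the reviews, while *one*, *like* and *time* are common English words that survive stopword removal.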

In [13]:
list(word_dict.keys())[:5]

['movi', 'film', 'one', 'like', 'time']

### Save `word_dict`

Later on, when an endpoint is constructed to process a submitted review, `word_dict` will be needed. As such, it is saved to a file now for future use.

In [14]:
data_dir = '../data/pytorch' # The folder for storing data
if not os.path.exists(data_dir): # Make sure that the folder exists
    os.makedirs(data_dir)

In [15]:
with open(os.path.join(data_dir, 'word_dict.pkl'), "wb") as f:
    pickle.dump(word_dict, f)

### Transform the reviews

Now that the word dictionary makes it possible to transform the words appearing in the reviews into integers, it is time to use it to convert the reviews to their integer sequence representation, making sure to pad or truncate to a fixed length, which in this case is `500`.

In [16]:
def convert_and_pad(word_dict, sentence, pad=500):
    NOWORD = 0 # 0 represents the 'no word' category
    INFREQ = 1 # 1 represents the infrequent words, i.e., words not appearing in word_dict
    
    working_sentence = [NOWORD] * pad
    
    for word_index, word in enumerate(sentence[:pad]):
        if word in word_dict:
            working_sentence[word_index] = word_dict[word]
        else:
            working_sentence[word_index] = INFREQ
            
    return working_sentence, min(len(sentence), pad)

def convert_and_pad_data(word_dict, data, pad=500):
    result = []
    lengths = []
    
    for sentence in data:
        converted, leng = convert_and_pad(word_dict, sentence, pad)
        result.append(converted)
        lengths.append(leng)
        
    return np.array(result), np.array(lengths)

In [17]:
train_X, train_X_len = convert_and_pad_data(word_dict, train_X)
test_X, test_X_len = convert_and_pad_data(word_dict, test_X)
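
The padding and truncation behaviour is easier to see on a tiny example. The sketch below mirrors the logic of `convert_and_pad` with plain lists; the helper name `encode`, the two-word vocabulary and the pad length of `4` are all hypothetical choices made for illustration.

```python
NOWORD, INFREQ = 0, 1  # same reserved labels as in the notebook

def encode(word_dict, words, pad):
    """Return (fixed-length integer sequence, number of real tokens kept)."""
    ids = [word_dict.get(w, INFREQ) for w in words[:pad]]  # truncate long inputs
    ids += [NOWORD] * (pad - len(ids))                     # pad short inputs
    return ids, min(len(words), pad)

toy_dict = {"good": 2, "movi": 3}  # hypothetical two-word vocabulary
short = encode(toy_dict, ["good", "plot"], pad=4)
long = encode(toy_dict, ["movi", "good", "movi", "bad", "good"], pad=4)
print(short)  # ([2, 1, 0, 0], 2)
print(long)   # ([3, 2, 3, 1], 4)
```

The short review is padded with `0`, the unknown words `plot` and `bad` become `1`, and the fifth word of the long review is dropped.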

As a quick check to make sure that things are working as intended, look at what one of the reviews in the training set looks like after having been processed.

In [18]:
# Examine one of the processed reviews to make sure everything is working as intended.
print("Length of a random review: ", len(train_X[100]))
print(train_X[100])

Length of a random review:  500
[1102  137  549    2   76   74  134 1562 1082  503  211  177   14 1082
   13    1    1  753 1689   66  909  197  994   20  172   56  109 1234
  205  956  111   98  808 1278  538  971    1 1738   17  796 1703 2229
    4    1 3224   49  578 1867  590  168   67   38    7  783 1562  106
  770  411  284 1867 4409  753 1162 4216    1  408    1   61 3915  103
  132 1524 3564  669  370    4    1  649    1  541    1  181  254    1
 1244   28  480 2128 4325  630    4    1   28  480   56  809   71  391
  163  389  111    1   71    1 1689  564    1  632    5 1236 1047   53
  132 1280 1162 1950    1 2593  139   59  146 2801 4184  757    1 1689
 1162 1350    1    1  581  178  552    1 2871    1    1 1745    9  928
  695  572  529 1521  238    1  974 1066    1  523 2349  117   53   31
  773   20   33  129   57  573  197  994   39 1531   69    2   74    3
  164  119  190  261   69    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0

**Question/Analysis:** In the cells above, the `preprocess_data` and `convert_and_pad_data` methods are used to process both the training and testing sets. Why might this be, or not be, a problem?

**Answer:** The preprocessing steps are kept the same for both the training and testing datasets so that the model sees test inputs in the same form as the inputs it was trained on. Otherwise, the model might do quite well during training but fail on the raw testing data. Note also that `word_dict` was built from the training set only, so no vocabulary information from the test set leaks into training.

## Step 3: Upload the data to S3

The training dataset needs to be uploaded to S3 in order for the training code to access it. For now it will be saved locally and uploaded to S3 later on.

### Save the processed training dataset locally

It is important to note the format of the saved data, since the training code will need to read it. In this case, each row of the dataset has the form `label`, `length`, `review[500]`, where `review[500]` is a sequence of `500` integers representing the words in the review.

In [19]:
import pandas as pd
    
pd.concat([pd.DataFrame(train_y), pd.DataFrame(train_X_len), pd.DataFrame(train_X)], axis=1) \
        .to_csv(os.path.join(data_dir, 'train.csv'), header=False, index=False)
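
The row layout can be checked on a toy example. This is a sketch using only the standard library `csv` module rather than pandas; `write_rows` and `parse_row` are hypothetical helpers, and the reviews are shortened to four integers instead of `500`.

```python
import csv
import io

def write_rows(labels, lengths, reviews):
    """Serialize rows as label, length, review[...] with no header, as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for label, length, review in zip(labels, lengths, reviews):
        writer.writerow([label, length, *review])
    return buffer.getvalue()

def parse_row(line):
    """Split one CSV line back into (label, length, review)."""
    values = [int(v) for v in next(csv.reader([line]))]
    return values[0], values[1], values[2:]

csv_text = write_rows([1, 0], [2, 4], [[2, 1, 0, 0], [3, 2, 3, 1]])
first_line = csv_text.splitlines()[0]
print(first_line)             # 1,2,2,1,0,0
print(parse_row(first_line))  # (1, 2, [2, 1, 0, 0])
```

Any code that reads `train.csv` has to make the same split: column `0` is the label, column `1` the length, and everything after that the encoded review.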

### Uploading the training data


Next, the training data is uploaded to the SageMaker default S3 bucket so that the training job can access it.

In [20]:
import sagemaker

sagemaker_session = sagemaker.Session()

bucket = sagemaker_session.default_bucket()
prefix = 'sagemaker/sentiment_rnn'

role = sagemaker.get_execution_role()

In [21]:
input_data = sagemaker_session.upload_data(path=data_dir, bucket=bucket, key_prefix=prefix)

**NOTE:** The cell above uploads the entire contents of the data directory. This includes the `word_dict.pkl` file. This is fortunate, as it will be needed later on when an endpoint is created that accepts an arbitrary review. For now, just take note of the fact that it resides in the data directory (and so also in the S3 training bucket) and that it will need to be saved in the model directory.
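
One way the training script could carry the vocabulary over is sketched below. The helper `copy_word_dict` is hypothetical and is not taken from `train/train.py`; temporary directories stand in for the training data channel and the model directory so that the example runs locally.

```python
import os
import pickle
import tempfile

def copy_word_dict(data_dir, model_dir, filename="word_dict.pkl"):
    """Load the vocabulary from data_dir and save a copy next to the model artifacts."""
    with open(os.path.join(data_dir, filename), "rb") as f:
        word_dict = pickle.load(f)
    os.makedirs(model_dir, exist_ok=True)
    with open(os.path.join(model_dir, filename), "wb") as f:
        pickle.dump(word_dict, f)
    return word_dict

# Local stand-ins for the training channel and the model directory.
with tempfile.TemporaryDirectory() as tmp:
    data_dir_demo = os.path.join(tmp, "data")
    model_dir_demo = os.path.join(tmp, "model")
    os.makedirs(data_dir_demo)
    with open(os.path.join(data_dir_demo, "word_dict.pkl"), "wb") as f:
        pickle.dump({"movi": 2, "film": 3}, f)
    copied = copy_word_dict(data_dir_demo, model_dir_demo)
    saved_files = os.listdir(model_dir_demo)

print(copied, saved_files)
```

Keeping the vocabulary alongside the model artifacts means the inference code can encode a new review with exactly the mapping that was used during training.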

## Step 4: Build and Train the PyTorch Model

A model comprises three objects:

 - Model Artifacts
 - Training Code
 - Inference Code
 
Each of these interacts with the others. Here, containers provided by Amazon are used, with the added benefit of being able to include custom code.

To start, a neural network in PyTorch is implemented along with a training script. For the purposes of this project, the necessary model object is in the `model.py` file, inside of the `train` folder. The implementation is shown by running the cell below.

In [22]:
!pygmentize train/model.py

import torch.nn as nn

class LSTMClassifier(nn.Module):
    """
    This is the simple RNN model used to perform Sentiment Analysis.
    """

    def __init__(self, embedding_dim, hidden_dim, vocab_size):
        """
        Initialize the model by setting up the various layers.
        """
        super(LSTMClassifier, self).__init__()

        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=0)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim)
        self.dense = nn.Linear(in_features=hidden_dim, out_features=1)
        self.sig = nn.Sigmoid()
      

The important takeaway from the implementation is that there are three parameters that can be tweaked to improve the performance of the model. These are the embedding dimension, the hidden dimension and the size of the vocabulary. These parameters will be made configurable in the training script later on. 
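
To get a feel for how these three parameters affect the size of the model, the sketch below counts trainable parameters for the layers shown in the listing. It assumes the default PyTorch layout for `nn.LSTM` (a single unidirectional layer with four gates, each with input weights, hidden weights and two bias vectors); `count_parameters` is a hypothetical helper, not part of the project code.

```python
def count_parameters(embedding_dim, hidden_dim, vocab_size):
    """Trainable parameter count for Embedding -> single-layer LSTM -> Linear(hidden_dim, 1)."""
    embedding = vocab_size * embedding_dim
    # Four gates, each with input weights, hidden weights and two bias vectors.
    lstm = 4 * (embedding_dim * hidden_dim + hidden_dim * hidden_dim + 2 * hidden_dim)
    dense = hidden_dim + 1  # weights plus one bias
    return {"embedding": embedding, "lstm": lstm, "dense": dense,
            "total": embedding + lstm + dense}

small = count_parameters(32, 100, 5000)
large = count_parameters(32, 200, 5000)
print(small["total"])  # 213701
print(large["total"])  # 347401
```

Under these assumptions, doubling the hidden dimension from `100` to `200` more than triples the LSTM's share of the parameters, while the embedding table grows only with the vocabulary size and the embedding dimension.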

First, a small portion of the training data set is loaded to use as a sample. It would be very time consuming to try to train the model completely in the notebook, as the GPU and compute instance available are not particularly powerful. However, a small portion of the data can be used to get a feel for how the training script is behaving.

In [23]:
import torch
import torch.utils.data

# Read in only the first 250 rows
train_sample = pd.read_csv(os.path.join(data_dir, 'train.csv'), header=None, names=None, nrows=250)

# Turn the input pandas dataframe into tensors
train_sample_y = torch.from_numpy(train_sample[[0]].values).float().squeeze()
train_sample_X = torch.from_numpy(train_sample.drop([0], axis=1).values).long()

# Build the dataset
train_sample_ds = torch.utils.data.TensorDataset(train_sample_X, train_sample_y)
# Build the dataloader
train_sample_dl = torch.utils.data.DataLoader(train_sample_ds, batch_size=50)

### The training method

The training code itself will now be written. 

In [24]:
def train(model, train_loader, epochs, optimizer, loss_fn, device):
    for epoch in range(1, epochs + 1):
        model.train()
        total_loss = 0
        for batch in train_loader:         
            batch_X, batch_y = batch
            
            batch_X = batch_X.to(device)
            batch_y = batch_y.to(device)
            
            optimizer.zero_grad()
            output = model(batch_X)  # Call the module directly rather than .forward()
            loss = loss_fn(output, batch_y)
            loss.backward()
            optimizer.step()
            
            total_loss += loss.item()
        print("Epoch: {}, BCELoss: {}".format(epoch, total_loss / len(train_loader)))

Using the small sample training set that was loaded earlier, the training method will be tested. The reason for doing this in the notebook is to fix any errors that arise early when they are easier to diagnose.

In [25]:
import torch.optim as optim
from train.model import LSTMClassifier

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = LSTMClassifier(32, 100, 5000).to(device)
optimizer = optim.Adam(model.parameters())
loss_fn = torch.nn.BCELoss()

train(model, train_sample_dl, 5, optimizer, loss_fn, device)

Epoch: 1, BCELoss: 0.6933461785316467
Epoch: 2, BCELoss: 0.683461606502533
Epoch: 3, BCELoss: 0.6755330562591553
Epoch: 4, BCELoss: 0.6672791242599487
Epoch: 5, BCELoss: 0.6579331278800964


In order to construct a PyTorch model using SageMaker, a training script needs to be provided to SageMaker. Optionally, a directory can be included which will be copied to the container and from which the training code will run. When the training container is executed, it will check the uploaded directory (if there is one) for a `requirements.txt` file and install any required Python libraries, after which the training script will be run.

### Training the model

When a PyTorch model is constructed in SageMaker, an entry point must be specified. This is the Python file which will be executed when the model is trained. Inside of the `train` directory is a file called `train.py` which contains most of the necessary code to train the model. 

The way that SageMaker passes hyperparameters to the training script is by way of arguments. These arguments can then be parsed and used in the training script. To see how this is done take a look at the provided `train/train.py` file.
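
A minimal sketch of such argument parsing is shown below. It is not the contents of `train/train.py`: the argument names follow the hyperparameters and the training log in this notebook, the default values are assumptions, and `SM_MODEL_DIR` and `SM_CHANNEL_TRAINING` are the environment variables that SageMaker training containers set for the model directory and the `training` data channel.

```python
import argparse
import os

def parse_args(argv=None):
    """Parse hyperparameters the way a SageMaker entry point would receive them."""
    parser = argparse.ArgumentParser()
    # Hyperparameters; the defaults here are assumptions for illustration.
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--embedding_dim", type=int, default=32)
    parser.add_argument("--hidden_dim", type=int, default=100)
    parser.add_argument("--vocab_size", type=int, default=5000)
    # Locations that SageMaker exposes through environment variables.
    parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR", "model"))
    parser.add_argument("--data-dir", default=os.environ.get("SM_CHANNEL_TRAINING", "data"))
    return parser.parse_args(argv)

# Equivalent to passing hyperparameters={'epochs': 10, 'hidden_dim': 200}.
args = parse_args(["--epochs", "10", "--hidden_dim", "200"])
print(args.epochs, args.hidden_dim, args.embedding_dim, args.vocab_size)  # 10 200 32 5000
```

Hyperparameters that are not passed explicitly fall back to the script's defaults, which is consistent with the training log reporting an `embedding_dim` and `vocab_size` that were never set on the estimator.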

In [26]:
from sagemaker.pytorch import PyTorch

estimator = PyTorch(entry_point="train.py",
                    source_dir="train",
                    role=role,
                    framework_version='0.4.0',
                    train_instance_count=1,
                    train_instance_type='ml.p2.xlarge',
                    hyperparameters={
                        'epochs': 10,
                        'hidden_dim': 200,
                    })

In [27]:
estimator.fit({'training': input_data})

2019-05-16 11:38:55 Starting - Starting the training job...
2019-05-16 11:39:00 Starting - Launching requested ML instances......
2019-05-16 11:40:05 Starting - Preparing the instances for training......
2019-05-16 11:41:25 Downloading - Downloading input data...
2019-05-16 11:41:36 Training - Downloading the training image.....
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2019-05-16 11:42:33,949 sagemaker-containers INFO     Imported framework sagemaker_pytorch_container.training
2019-05-16 11:42:33,987 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2019-05-16 11:42:35,395 sagemaker_pytorch_container.training INFO     Invoking user training script.
2019-05-16 11:42:35,683 sagemaker-containers INFO     Module train does not provide a setup.py. 
Generating setup.py
2019-05-16 11:42:35,683 sagemaker-containers INFO

Model loaded with embedding_dim 32, hidden_dim 200, vocab_size 5000.
Epoch: 1, BCELoss: 0.6773987461109551
Epoch: 2, BCELoss: 0.6161920124170731
Epoch: 3, BCELoss: 0.5196619258851422
Epoch: 4, BCELoss: 0.4491598429728527
Epoch: 5, BCELoss: 0.3895218329770224
Epoch: 6, BCELoss: 0.35611109587610984
Epoch: 7, BCELoss: 0.3337481824719176
Epoch: 8, BCELoss: 0.31364739914329687
Epoch: 9, BCELoss: 0.28942192239420755

2019-05-16 11:45:56 Uploading - Uploading generated training model
2019-05-16 11:45:56 Completed - Training job completed
Epoch: 10, BCELoss: 0.27181245386600494
2019-05-16 11:45:48,506 sagemaker-containers INFO     Reporting training SUCCESS
Billable seconds: 272


## Step 5: Testing the model

This model will be tested by first deploying it and then sending the testing data to the deployed endpoint. This is to ensure that the deployed model is working correctly.

## Step 6: Deploy the model for testing

Now that the model is trained, it can be tested to see how it performs. Currently the model takes input of the form `review_length, review[500]`, where `review[500]` is a sequence of `500` integers which describe the words present in the review, encoded using `word_dict`. Fortunately, SageMaker provides built-in inference code for models with simple inputs such as this.

There is one thing that needs to be provided, however, and that is a function which loads the saved model. This function must be called `model_fn()` and takes as its only parameter a path to the directory where the model artifacts are stored. This function must also be present in the python file which will be specified as the entry point. 

**NOTE**: When the built-in inference code is run, it must import the `model_fn()` method from the `train.py` file. This is why the training code is wrapped in a main guard (i.e., `if __name__ == '__main__':`).

Since no changes have been made to the code that was uploaded during training, the current model can be deployed as-is.

**NOTE:** When deploying a model, SageMaker is asked to launch a compute instance that will wait for data to be sent to it. As a result, this compute instance will continue to run until it is shut down. This is important to know since the cost of a deployed endpoint depends on how long it has been running.

In other words, **if a deployed endpoint is no longer in use, shut it down!**

In [28]:
# Deploy the trained model
predictor = estimator.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')

---------------------------------------------------------------------------------------------------!

## Step 7: Use the model for testing

Once deployed, the test data can be read in and sent off to the deployed model to get some results. Once all of the results are collected, the accuracy of the model can be determined.

In [29]:
test_X = pd.concat([pd.DataFrame(test_X_len), pd.DataFrame(test_X)], axis=1)

In [30]:
# Split the data into chunks and send each chunk separately, accumulating the results.

def predict(data, rows=512):
    split_array = np.array_split(data, int(data.shape[0] / float(rows) + 1))
    predictions = np.array([])
    for array in split_array:
        predictions = np.append(predictions, predictor.predict(array))
    
    return predictions
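
To see how many requests this produces, the sketch below reproduces the chunk arithmetic with plain integers. `chunk_sizes` is a hypothetical helper; it relies on the documented behaviour of `np.array_split`, which gives the first `n % k` chunks one extra row when `n` rows are split into `k` parts.

```python
def chunk_sizes(n_rows, rows=512):
    """Sizes of the chunks produced by splitting n_rows into int(n_rows / rows + 1) parts."""
    n_chunks = int(n_rows / float(rows) + 1)
    base, extra = divmod(n_rows, n_chunks)
    # np.array_split gives the first `extra` chunks one additional row.
    return [base + 1] * extra + [base] * (n_chunks - extra)

sizes = chunk_sizes(25000)
print(len(sizes), max(sizes), min(sizes), sum(sizes))  # 49 511 510 25000
```

For the `25000` test reviews this gives `49` requests of at most `511` rows each, so `rows` acts as an upper bound on the chunk size rather than the exact size.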

In [31]:
predictions = predict(test_X.values)
predictions = [round(num) for num in predictions]

In [32]:
from sklearn.metrics import accuracy_score
accuracy_score(test_y, predictions)

0.8568

**Question/Analysis:** How does this model compare to the XGBoost model created earlier? Why might these two models perform differently on this dataset?

**Answer:** The XGBoost model performs better than this model. This is because the XGBoost model was already optimized and used early stopping with a validation dataset. Tuning this model should reach the same or even higher performance. RNNs work well for sentiment analysis because they can capture word order and other subtle context in the text that standard ML approaches may miss.

### More testing

There is now a trained, deployed model to which processed reviews can be sent and which returns the predicted sentiment. Ultimately, however, it should be possible to send an unprocessed review to the model, that is, the review itself as a string. For example, something like the string in the cell below.

In [33]:
test_review = 'The simplest pleasures in life are the best, and this film is one of them. Combining a rather basic storyline of love and adventure this movie transcends the usual weekend fair with wit and unmitigated charm.'

The question to answer is, how can this review be sent to the model?

A fair amount of data processing was applied to the IMDb dataset in the first section of this notebook. In particular, two specific things were done to the provided reviews:
 - Removed any html tags and stemmed the input
 - Encoded the review as a sequence of integers using `word_dict`
 
In order to process the review, these same two steps need to be repeated.

In [34]:
# Convert test_review into a form usable by the model and save the results in test_data
test_data_words = review_to_words(test_review)
test_data = [np.array(convert_and_pad(word_dict, test_data_words)[0])]

Now that the review is processed, the resulting array can be sent to the model to predict the sentiment of the review.

In [35]:
predictor.predict(test_data)

array(0.48943165, dtype=float32)

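A return value closer to `1` indicates a positive review and a value closer to `0` a negative one. Here the model returns about `0.49`, just below the `0.5` threshold, so it is essentially undecided and leans marginally negative even though the review reads as clearly positive. Note that `test_data` contains only the `500` encoded words, without the leading `review_length` value that the model expects as input, which may explain this result.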
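
Step 6 describes the model input as `review_length, review[500]`, whereas the `test_data` built above holds only the encoded words. A minimal sketch of assembling a row with the length in front is shown below; `build_input_row` is a hypothetical helper, and the vocabulary and the pad length of `6` are toy values.

```python
NOWORD, INFREQ = 0, 1  # same reserved labels as in the notebook

def build_input_row(word_dict, words, pad=500):
    """Return one row laid out as [review_length, review[0], ..., review[pad - 1]]."""
    ids = [word_dict.get(w, INFREQ) for w in words[:pad]]
    length = len(ids)
    ids += [NOWORD] * (pad - length)
    return [length] + ids

toy_dict = {"movi": 2, "film": 3}  # hypothetical vocabulary
row = build_input_row(toy_dict, ["simpl", "film", "movi"], pad=6)
print(row)  # [3, 1, 3, 2, 0, 0, 0]
```

With the notebook's real `word_dict` and the default `pad=500`, the resulting row has `501` values, matching the layout of the rows sent to the endpoint in Step 7.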

### Delete the endpoint

Once an endpoint is deployed, it continues to run until it is shut down. Since the endpoint is no longer needed for now, it can be deleted.

In [36]:
estimator.delete_endpoint()

## Step 6 (again) - Deploy the model for the web app

Now that the model has been validated, it is time to write some custom inference code so that an unprocessed review can be sent to the model and the model can determine its sentiment.

By default, a deployed estimator uses the entry script and directory that were provided when the model was created. However, since the endpoint should now accept a string as input while the model expects a processed review, some custom inference code needs to be written.

The code will be stored in the `serve` directory. This directory contains the `model.py` file that is used to construct the model, a `utils.py` file which contains the `review_to_words` and `convert_and_pad` pre-processing functions that were used during the initial data processing, and `predict.py`, the file which will contain the custom inference code. Note also that a `requirements.txt` file is present, which tells SageMaker what Python libraries are required by the custom inference code.

When deploying a PyTorch model in SageMaker, four functions are expected to be provided which the SageMaker inference container will use.
 - `model_fn`: This function is the same function that is used in the training script and it tells SageMaker how to load the model.
 - `input_fn`: This function receives the raw serialized input that has been sent to the model's endpoint and its job is to de-serialize and make the input available for the inference code.
 - `output_fn`: This function takes the output of the inference code and its job is to serialize this output and return it to the caller of the model's endpoint.
 - `predict_fn`: The heart of the inference script, this is where the actual prediction is done.

For the simple website constructed in this project, the `input_fn` and `output_fn` methods are relatively straightforward. The only requirement is to accept a string as input and to return a single value as output.

In [49]:
!pygmentize serve/predict.py

[34mimport[39;49;00m [04m[36margparse[39;49;00m
[34mimport[39;49;00m [04m[36mjson[39;49;00m
[34mimport[39;49;00m [04m[36mos[39;49;00m
[34mimport[39;49;00m [04m[36mpickle[39;49;00m
[34mimport[39;49;00m [04m[36msys[39;49;00m
[34mimport[39;49;00m [04m[36msagemaker_containers[39;49;00m
[34mimport[39;49;00m [04m[36mpandas[39;49;00m [34mas[39;49;00m [04m[36mpd[39;49;00m
[34mimport[39;49;00m [04m[36mnumpy[39;49;00m [34mas[39;49;00m [04m[36mnp[39;49;00m
[34mimport[39;49;00m [04m[36mtorch[39;49;00m
[34mimport[39;49;00m [04m[36mtorch.nn[39;49;00m [34mas[39;49;00m [04m[36mnn[39;49;00m
[34mimport[39;49;00m [04m[36mtorch.optim[39;49;00m [34mas[39;49;00m [04m[36moptim[39;49;00m
[34mimport[39;49;00m [04m[36mtorch.utils.data[39;49;00m

[34mfrom[39;49;00m [04m[36mmodel[39;49;00m [34mimport[39;49;00m LSTMClassifier

[34mfrom[39;49;00m [04m[36mutils[39;49;00m [34mimport[39;49;00m review_to_words, 

The `model_fn` method is the same as the one in the training code and the `input_fn` and `output_fn` methods are very simple.
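
As a rough illustration of how small these two functions can be, the sketch below accepts a UTF-8 encoded `text/plain` body and returns the prediction as a string. It is an assumption about what `serve/predict.py` might contain, not a copy of it; the two-argument signatures follow the usual SageMaker PyTorch serving convention.

```python
def input_fn(serialized_input_data, content_type):
    """De-serialize the raw request body into a Python string."""
    if content_type == 'text/plain':
        return serialized_input_data.decode('utf-8')
    raise ValueError('Unsupported content type: ' + content_type)

def output_fn(prediction_output, accept):
    """Serialize the prediction so it can be returned to the caller."""
    return str(prediction_output)

# Local round trip with a stand-in prediction value (no model involved)
review = input_fn(b'A charming film.', 'text/plain')
print(review, output_fn(1, 'text/plain'))
```

The pre-processing and the forward pass through the network would then live in `predict_fn`, which receives the string produced by `input_fn`.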

### Deploying the model

Now that the custom inference code has been written, the model will be created and deployed. To begin with, a new PyTorchModel object is constructed which points to the model artifacts created during training and also points to the inference code. Then the deploy method is called to launch the deployment container.

**NOTE**: The default behaviour for a deployed PyTorch model is to assume that any input passed to the predictor is a `numpy` array. In this case, the input is a string, so a simple wrapper needs to be constructed around the `RealTimePredictor` class to accommodate simple strings.

In [50]:
from sagemaker.predictor import RealTimePredictor
from sagemaker.pytorch import PyTorchModel

class StringPredictor(RealTimePredictor):
    def __init__(self, endpoint_name, sagemaker_session):
        super(StringPredictor, self).__init__(endpoint_name, sagemaker_session, content_type='text/plain')

model = PyTorchModel(model_data=estimator.model_data,
                     role = role,
                     framework_version='0.4.0',
                     entry_point='predict.py',
                     source_dir='serve',
                     predictor_cls=StringPredictor)
predictor = model.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')

-------------------------------------------------------------------------------!

### Testing the model

Now that the model is deployed with the custom inference code, it should be tested to see if everything is working. Here, it is tested by loading the first `250` positive and negative reviews, sending them to the endpoint, and collecting the results. Only some of the data is sent because the model takes quite a long time to process the input and perform inference, so testing the entire data set would be prohibitive.

In [51]:
import glob
import os

def test_reviews(data_dir='../data/aclImdb', stop=250):
    
    results = []
    ground = []
    
    # Test both positive and negative reviews    
    for sentiment in ['pos', 'neg']:
        
        path = os.path.join(data_dir, 'test', sentiment, '*.txt')
        files = glob.glob(path)
        
        files_read = 0
        
        print('Starting ', sentiment, ' files')
        
        # Iterate through the files and send them to the predictor
        for f in files:
            with open(f) as review:
                # First, store the ground truth (was the review positive or negative)
                if sentiment == 'pos':
                    ground.append(1)
                else:
                    ground.append(0)
                # Read in the review and convert to 'utf-8' for transmission via HTTP
                review_input = review.read().encode('utf-8')
                # Send the review to the predictor and store the results
                results.append(int(predictor.predict(review_input)))
                
            # Sending reviews to the endpoint one at a time takes a while
            # So, only send a small number of reviews
            files_read += 1
            if files_read == stop:
                break
            
    return ground, results

In [52]:
ground, results = test_reviews()

Starting  pos  files
Starting  neg  files


In [53]:
from sklearn.metrics import accuracy_score
accuracy_score(ground, results)

0.858

As an additional test, the `test_review` can be passed in.

In [54]:
predictor.predict(test_review)

b'1'

Now that the endpoint is known to be working as expected, the web page that will interact with it can be set up.

## Step 7 (again): Use the model for the web app

> These steps are to be done mostly in the AWS console.

So far, the model endpoint has been accessed by constructing a predictor object which uses the endpoint and then using that predictor object to perform inference. What if a web app were created which accessed the model? The way things are currently set up makes that impossible, since in order to access a SageMaker endpoint the app would first have to authenticate with AWS using an IAM role which included access to SageMaker endpoints. However, there is an easier way! It just requires some additional AWS services.

<img src="Web App Diagram.svg">

The diagram above gives an overview of how the various services will work together. On the far right is the model which was trained above and deployed using SageMaker. On the far left is the web app that collects a user's movie review, sends it off and expects a positive or negative sentiment in return.

In the middle is where some of the magic happens. A Lambda function will be constructed, which can be thought of as a straightforward Python function that is executed whenever a specified event occurs. This function will be given permission to send data to and receive data from a SageMaker endpoint.

Lastly, the method used to execute the Lambda function is a new endpoint that is created using API Gateway. This endpoint will be a URL that listens for data to be sent to it. Once it gets some data, it will pass that data on to the Lambda function and then return whatever the Lambda function returns. Essentially, it will act as an interface that lets the web app communicate with the Lambda function.

### Setting up a Lambda function

The first thing to do is set up a Lambda function. This Lambda function will be executed whenever the public API has data sent to it. When it is executed, it will receive the data, perform any processing that is required, send the data (the review) to the SageMaker endpoint that was created and then return the result.

#### Part A: Create an IAM Role for the Lambda function

Since the Lambda function will call a SageMaker endpoint, it needs permission to do so. To arrange this, a role is constructed that can later be given to the Lambda function.

Using the AWS Console, navigate to the **IAM** page and click on **Roles**. Then, click on **Create role**. Make sure that the **AWS service** is the type of trusted entity selected and choose **Lambda** as the service that will use this role, then click **Next: Permissions**.

In the search box type `sagemaker` and select the check box next to the **AmazonSageMakerFullAccess** policy. Then, click on **Next: Review**.

Lastly, give this role a name. Make sure to use a name that will be remembered later on, for example `LambdaSageMakerRole`. Then, click on **Create role**.
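
For reference, choosing **Lambda** as the trusted service corresponds to a trust policy like the one sketched below, which lets the Lambda service assume the role. The snippet only builds and prints the JSON document; it does not call AWS, and the variable name `lambda_trust_policy` is illustrative.

```python
import json

# Trust policy: who is allowed to assume the role (here, the Lambda service)
lambda_trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

print(json.dumps(lambda_trust_policy, indent=2))
```

The permissions themselves come from the attached **AmazonSageMakerFullAccess** policy; the trust policy only controls which service may use the role.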

#### Part B: Create a Lambda function

Now it is time to actually create the Lambda function.

Using the AWS Console, navigate to the AWS Lambda page and click on **Create a function**. On the next page, make sure that **Author from scratch** is selected. Now, name the Lambda function, using a name that will be remembered later on, for example `sentiment_analysis_func`. Make sure that the **Python 3.6** runtime is selected and then choose the role that was created in the previous part. Then, click on **Create Function**.

On the next page, some information about the newly created Lambda function is shown. Scrolling down reveals an editor in which to write the code that will be executed when the Lambda function is triggered. For this case, the code below is used.

```python
# Use the low-level boto3 library to interact with SageMaker,
# since the SageMaker API is not available natively through Lambda.
import boto3

def lambda_handler(event, context):

    # The SageMaker runtime client is what allows the deployed endpoint to be invoked.
    runtime = boto3.Session().client('sagemaker-runtime')

    # The SageMaker runtime is used to invoke the endpoint, sending the review 
    response = runtime.invoke_endpoint(EndpointName = '**ENDPOINT NAME HERE**',    # The name of the endpoint created
                                       ContentType = 'text/plain',                 # The data format that is expected
                                       Body = event['body'])                       # The actual review

    # The response is an HTTP response whose body contains the result of the inference
    result = response['Body'].read().decode('utf-8')

    return {
        'statusCode' : 200,
        'headers' : { 'Content-Type' : 'text/plain', 'Access-Control-Allow-Origin' : '*' },
        'body' : result
    }
```

Once the code above has been copied and pasted into the Lambda code editor, replace the `**ENDPOINT NAME HERE**` portion with the name of the endpoint that was deployed earlier. The name of the endpoint can be determined using the code cell below.

In [55]:
predictor.endpoint

'sagemaker-pytorch-2019-05-16-12-37-10-049'

Once the endpoint name has been added to the Lambda function, click on **Save**. The Lambda function is now up and running. Next, a way for the web app to execute the Lambda function needs to be created.

### Setting up API Gateway

Now that the Lambda function is set up, it is time to create a new API using API Gateway that will trigger the Lambda function that has been created.

Using the AWS Console, navigate to **Amazon API Gateway** and then click on **Get started**.

On the next page, make sure that **New API** is selected and give the new API a name, for example, `sentiment_analysis_api`. Then, click on **Create API**.

An API has now been created; however, it doesn't currently do anything. The goal is for it to trigger the Lambda function that was created earlier.

Select the **Actions** dropdown menu and click **Create Method**. A new blank method will be created; select its dropdown menu, select **POST**, and then click on the check mark beside it.

For the integration point, make sure that **Lambda Function** is selected and click on the **Use Lambda Proxy integration** option. This option makes sure that the data that is sent to the API is then sent directly to the Lambda function with no processing. It also means that the return value must be a proper response object, as it will also not be processed by API Gateway.
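
To illustrate what proxy integration implies for the handler, the sketch below mimics the exchange locally. It assumes the review arrives as a string under the event's `body` key, as in the Lambda code above, and replaces the SageMaker call with a hypothetical `fake_invoke` stand-in so that it runs without AWS access.

```python
def fake_invoke(review):
    """Hypothetical stand-in for the SageMaker endpoint call."""
    return '1' if 'good' in review.lower() else '0'

def handler_sketch(event, context=None):
    # With proxy integration the request body is passed through untouched
    result = fake_invoke(event['body'])
    # The return value must already be a complete HTTP-style response
    return {
        'statusCode': 200,
        'headers': {'Content-Type': 'text/plain',
                    'Access-Control-Allow-Origin': '*'},
        'body': result,
    }

response = handler_sketch({'body': 'A good film.'})
print(response['statusCode'], response['body'])
```

Because API Gateway forwards the returned dictionary as-is, the status code, the headers and the body all have to be supplied by the function itself.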

Type the name of the Lambda function created earlier into the **Lambda Function** text entry box and then click on **Save**. Click on **OK** in the pop-up box that then appears, giving permission to API Gateway to invoke the Lambda function you created.

The last step in creating the API Gateway is to select the **Actions** dropdown and click on **Deploy API**. A new deployment stage will need to be created; it can be named anything, for example `prod`.

A public API for accessing the SageMaker model has now been set up. Make sure to copy or write down the URL provided to invoke the newly created public API, as this will be needed in the next step. This URL can be found at the top of the page, highlighted in blue next to the text **Invoke URL**.
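
Before wiring up the web page, the API can be exercised from Python. The sketch below only builds the POST request; the URL is a placeholder to be replaced with the real **Invoke URL**, and `build_review_request` is a hypothetical helper name.

```python
import urllib.request

def build_review_request(invoke_url, review):
    """Build a plain-text POST request carrying the review."""
    return urllib.request.Request(
        invoke_url,
        data=review.encode('utf-8'),
        headers={'Content-Type': 'text/plain'},
        method='POST',
    )

# Placeholder URL: substitute the Invoke URL noted above
request = build_review_request('https://example.com/prod', 'The movie was amazing!')
print(request.get_method(), request.data)

# To actually call the API (requires the endpoint to be running):
# with urllib.request.urlopen(request) as reply:
#     print(reply.read().decode('utf-8'))
```

Sending the request is left commented out because it only succeeds while the SageMaker endpoint is deployed.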

## Step 4: Deploying the web app

Now that a publicly available API is present, it can be used in the web app.

In the `website` folder there should be a file called `index.html`. Download the file to a local computer and open it in a text editor. In the form's `action` attribute, the URL needs to be replaced. Replace this string with the URL that was noted in the last step and then save the file.

Now, if `index.html` is opened on the local computer, the browser will load the page directly from disk and the site can be used to interact with the SageMaker model.

This HTML file can be hosted anywhere, for example using GitHub or as a static site on Amazon S3.

> **Important Note** In order for the web app to communicate with the SageMaker endpoint, the endpoint has to actually be deployed and running. This means that it is incurring charges. Make sure that the endpoint is running while the web app is in use, but shut it down when it is no longer needed; otherwise a large AWS bill could result.

Now that the web app is working, it's time to try it out.

**Question/Analysis**: Give an example of a review that was entered into the web app. What was the predicted sentiment of the example review?

**Answer:** 
* "The movie was amazing!" Prediction: Your review was POSITIVE!
* "Absolutely awful." Prediction: Your review was NEGATIVE!

### Delete the endpoint

Remember to always shut down the endpoint when it is no longer in use. AWS charges for the length of time that the endpoint is running, so if it is left on, it could result in an unexpectedly large bill.

In [56]:
predictor.delete_endpoint()