# Creating a Sentiment Analysis Web App
## Using PyTorch and SageMaker


## General Outline

1. Download or otherwise retrieve the data.
2. Process / Prepare the data.
3. Upload the processed data to S3.
4. Train the chosen model.
5. Test the trained model (by deploying it to an endpoint and sending it the test data).
6. Deploy the trained model for the web app.
7. Use the deployed model in the web app.

## Step 1: Downloading the data

I will be using the [IMDb dataset](http://ai.stanford.edu/~amaas/data/sentiment/) of movie reviews.

In [2]:
%mkdir ../data
!wget -O ../data/aclImdb_v1.tar.gz http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz
!tar -zxf ../data/aclImdb_v1.tar.gz -C ../data

mkdir: cannot create directory ‘../data’: File exists
--2020-04-20 19:01:42--  http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz
Resolving ai.stanford.edu (ai.stanford.edu)... 171.64.68.10
Connecting to ai.stanford.edu (ai.stanford.edu)|171.64.68.10|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 84125825 (80M) [application/x-gzip]
Saving to: ‘../data/aclImdb_v1.tar.gz’


2020-04-20 19:01:44 (54.6 MB/s) - ‘../data/aclImdb_v1.tar.gz’ saved [84125825/84125825]



## Step 2: Preparing and Processing the data

In [3]:
import os
import glob

def read_imdb_data(data_dir='../data/aclImdb'):
    data = {}
    labels = {}
    
    for data_type in ['train', 'test']:
        data[data_type] = {}
        labels[data_type] = {}
        
        for sentiment in ['pos', 'neg']:
            data[data_type][sentiment] = []
            labels[data_type][sentiment] = []
            
            path = os.path.join(data_dir, data_type, sentiment, '*.txt')
            files = glob.glob(path)
            
            for f in files:
                with open(f) as review:
                    data[data_type][sentiment].append(review.read())
                    # Here I represent a positive review by '1' and a negative review by '0'
                    labels[data_type][sentiment].append(1 if sentiment == 'pos' else 0)
                    
            assert len(data[data_type][sentiment]) == len(labels[data_type][sentiment]), \
                    "{}/{} data size does not match labels size".format(data_type, sentiment)
                
    return data, labels

In [4]:
data, labels = read_imdb_data()
print("IMDB reviews: train = {} pos / {} neg, test = {} pos / {} neg".format(
            len(data['train']['pos']), len(data['train']['neg']),
            len(data['test']['pos']), len(data['test']['neg'])))

IMDB reviews: train = 12500 pos / 12500 neg, test = 12500 pos / 12500 neg


Now that I've read the raw training and testing data from the downloaded dataset, I will combine the positive and negative reviews and shuffle the resulting records.

In [5]:
from sklearn.utils import shuffle

def prepare_imdb_data(data, labels):
    """Prepare training and test sets from IMDb movie reviews."""
    
    #Combine positive and negative reviews and labels
    data_train = data['train']['pos'] + data['train']['neg']
    data_test = data['test']['pos'] + data['test']['neg']
    labels_train = labels['train']['pos'] + labels['train']['neg']
    labels_test = labels['test']['pos'] + labels['test']['neg']
    
    #Shuffle reviews and corresponding labels within training and test sets
    data_train, labels_train = shuffle(data_train, labels_train)
    data_test, labels_test = shuffle(data_test, labels_test)
    
    # Return the combined training data, test data, training labels and test labels
    return data_train, data_test, labels_train, labels_test
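`sklearn.utils.shuffle` applies the same random permutation to every array it is given, and that is what keeps each review paired with its label. The following is a minimal standard-library sketch of the same idea, shuffling the (review, label) pairs as units. The helper `shuffle_together` and the toy data are hypothetical and not part of the notebook:

```python
import random

def shuffle_together(xs, ys, seed=None):
    """Shuffle two equal-length lists with one shared permutation."""
    # Shuffle (x, y) pairs as units so every review stays with its own label
    pairs = list(zip(xs, ys))
    random.Random(seed).shuffle(pairs)
    shuffled_x, shuffled_y = zip(*pairs) if pairs else ((), ())
    return list(shuffled_x), list(shuffled_y)

reviews = ["great fun", "dull plot", "loved it", "awful acting"]
labels = [1, 0, 1, 0]
sx, sy = shuffle_together(reviews, labels, seed=0)
print(list(zip(sx, sy)))  # the same four (review, label) pairs in a new order
```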

In [6]:
train_X, test_X, train_y, test_y = prepare_imdb_data(data, labels)
print("IMDb reviews (combined): train = {}, test = {}".format(len(train_X), len(test_X)))

IMDb reviews (combined): train = 25000, test = 25000


In [7]:
print(train_X[100])
print(train_y[100])

(SPOILERS IN THIS)<br /><br />"Rosenstraße" is a movie about heroic women in German Nazi time. But it is way too long, it is not touching and sometimes even boring! There are too many clichés and not enough good acting.<br /><br />The storytelling (storyline) is bad. Like in James Cameron´s Titanic an old woman remembers events of her live. Good, now we´ve got a point of view. Than there is another woman introduced who does the same. Confusing is that they both are recalling events of lifes of other people! Come on! This is a lack of knowledge of basic story telling...How can Riemann know about the fate of the little girl´s mother and her interrogation for example?<br /><br />The scenes are shown in the wrong order and you rarely know when it took place. For example the scene when Riemann is proposing to Fabian. When did that happen? The scene looks like it is set in the Twenties...<br /><br />Riemann´s character is of course a talented pianist, well, she is even a Baroness! Wow. Her b

The first step in processing the reviews is to remove any HTML tags that appear in them. In addition, I tokenize the input and stem each word, so that words such as *entertained* and *entertaining* are treated as the same word for the purposes of sentiment analysis.

In [8]:
import nltk
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer

import re
from bs4 import BeautifulSoup

nltk.download("stopwords", quiet=True)
STOPWORDS = set(stopwords.words("english"))  # Build the stopword set once instead of once per word
stemmer = PorterStemmer()

def review_to_words(review):
    text = BeautifulSoup(review, "html.parser").get_text()  # Remove HTML tags
    text = re.sub(r"[^a-zA-Z0-9]", " ", text.lower())  # Lower-case and replace non-alphanumeric characters with spaces
    words = text.split()  # Split string into words
    words = [w for w in words if w not in STOPWORDS]  # Remove stopwords
    words = [stemmer.stem(w) for w in words]  # Stem each word with the Porter stemmer
    
    return words

The `review_to_words` function defined above uses `BeautifulSoup` to remove any HTML tags and the `nltk` package to remove stopwords and stem the remaining words.

In [9]:
review_to_words(train_X[100])

['spoiler',
 'rosenstra',
 'e',
 'movi',
 'heroic',
 'women',
 'german',
 'nazi',
 'time',
 'way',
 'long',
 'touch',
 'sometim',
 'even',
 'bore',
 'mani',
 'clich',
 'enough',
 'good',
 'act',
 'storytel',
 'storylin',
 'bad',
 'like',
 'jame',
 'cameron',
 'titan',
 'old',
 'woman',
 'rememb',
 'event',
 'live',
 'good',
 'got',
 'point',
 'view',
 'anoth',
 'woman',
 'introduc',
 'confus',
 'recal',
 'event',
 'life',
 'peopl',
 'come',
 'lack',
 'knowledg',
 'basic',
 'stori',
 'tell',
 'riemann',
 'know',
 'fate',
 'littl',
 'girl',
 'mother',
 'interrog',
 'exampl',
 'scene',
 'shown',
 'wrong',
 'order',
 'rare',
 'know',
 'took',
 'place',
 'exampl',
 'scene',
 'riemann',
 'propos',
 'fabian',
 'happen',
 'scene',
 'look',
 'like',
 'set',
 'twenti',
 'riemann',
 'charact',
 'cours',
 'talent',
 'pianist',
 'well',
 'even',
 'baro',
 'wow',
 'brother',
 'come',
 'back',
 'eastern',
 'front',
 'receiv',
 'ritterkreuz',
 'show',
 'scene',
 'war',
 'hero',
 'still',
 'fine',
 'ma

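In summary, `review_to_words`:
- removes HTML tags
- converts all characters to lower case and replaces anything that is not a letter or digit with a space
- removes stopwords using the list of English stopwords provided by the nltk library
- stems each remaining word using the Porter stemmer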
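One side effect of the character filter shows up in the example output above: the pattern `[^a-zA-Z0-9]` keeps only ASCII letters and digits, so the `ß` in *Rosenstraße* is replaced by a space and the title is split into `'rosenstra'` and `'e'`. The sketch below reproduces just the lower-casing and filtering steps with the standard library. The name `normalize` is hypothetical, and HTML removal, stopword removal and stemming are left out:

```python
import re

def normalize(text):
    # Lower-case, replace every non-ASCII-alphanumeric character with a space, then split
    return re.sub(r"[^a-zA-Z0-9]", " ", text.lower()).split()

print(normalize('"Rosenstraße" is a movie'))
# ['rosenstra', 'e', 'is', 'a', 'movie']
```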

The function below applies `review_to_words` to every review in the training and test sets and caches the results, since this processing step can take a long time.

In [10]:
import pickle

cache_dir = os.path.join("../cache", "sentiment_analysis")  # where to store cache files
os.makedirs(cache_dir, exist_ok=True)  # ensure cache directory exists

def preprocess_data(data_train, data_test, labels_train, labels_test,
                    cache_dir=cache_dir, cache_file="preprocessed_data.pkl"):
    """Convert each review to words; read from cache if available."""

    # If cache_file is not None, try to read from it first
    cache_data = None
    if cache_file is not None:
        try:
            with open(os.path.join(cache_dir, cache_file), "rb") as f:
                cache_data = pickle.load(f)
            print("Read preprocessed data from cache file:", cache_file)
        except Exception:
            pass  # unable to read from cache (missing or unreadable file), but that's okay
    
    # If cache is missing, then do the heavy lifting
    if cache_data is None:
        # Preprocess training and test data to obtain words for each review
        #words_train = list(map(review_to_words, data_train))
        #words_test = list(map(review_to_words, data_test))
        words_train = [review_to_words(review) for review in data_train]
        words_test = [review_to_words(review) for review in data_test]
        
        # Write to cache file for future runs
        if cache_file is not None:
            cache_data = dict(words_train=words_train, words_test=words_test,
                              labels_train=labels_train, labels_test=labels_test)
            with open(os.path.join(cache_dir, cache_file), "wb") as f:
                pickle.dump(cache_data, f)
            print("Wrote preprocessed data to cache file:", cache_file)
    else:
        # Unpack data loaded from cache file
        words_train, words_test, labels_train, labels_test = (cache_data['words_train'],
                cache_data['words_test'], cache_data['labels_train'], cache_data['labels_test'])
    
    return words_train, words_test, labels_train, labels_test

In [11]:
# Preprocess data
train_X, test_X, train_y, test_y = preprocess_data(train_X, test_X, train_y, test_y)

Read preprocessed data from cache file: preprocessed_data.pkl


## Transforming the data

Since I will be using a recurrent neural network, it will be convenient if the length of each review is the same. To do this, I will fix a size for the reviews and then pad short reviews with the category 'no word' (which I will label `0`) and truncate long reviews.
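To make the scheme concrete: index 0 marks padding ('no word'), index 1 marks words outside the vocabulary ('infrequent'), and every dictionary word gets an index of 2 or more. The toy sketch below shows one review being padded and another being truncated, and it also returns the original length capped at `pad`. The function `encode`, the two-word vocabulary and `pad=6` are illustrative assumptions; the notebook uses a 5000-word dictionary and `pad=500`:

```python
NOWORD, INFREQ = 0, 1  # reserved indices: padding and out-of-vocabulary words

def encode(tokens, vocab, pad):
    """Map tokens to indices, truncating or padding to exactly `pad` entries."""
    ids = [vocab.get(t, INFREQ) for t in tokens[:pad]]  # truncate long reviews
    ids += [NOWORD] * (pad - len(ids))                   # pad short reviews
    return ids, min(len(tokens), pad)

toy_vocab = {"movi": 2, "good": 3}  # hypothetical two-word dictionary
print(encode(["good", "movi", "ritterkreuz"], toy_vocab, pad=6))
# ([3, 2, 1, 0, 0, 0], 3)
print(encode(["good"] * 8, toy_vocab, pad=6))
# ([3, 3, 3, 3, 3, 3], 6)
```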

### Creating a word dictionary

In [12]:
import numpy as np

def build_dict(data, vocab_size = 5000):
    """Construct and return a dictionary mapping each of the most frequently appearing words to a unique integer."""
    
    # Determining how often each word appears in `data`. Note that `data` is a list of sentences and that a
    # sentence is a list of words.
    
    word_count = {} # A dict storing the words that appear in the reviews along with how often they occur
    
    for a_review in data:
        for a_word in a_review:
            if a_word in word_count:
                word_count[a_word] += 1
            else: 
                word_count[a_word] = 1
                
    # Sorting the words found in `data` so that sorted_words[0] is the most frequently appearing word and
    # sorted_words[-1] is the least frequently appearing word.
    
    sorted_words = sorted(word_count, key = word_count.get, reverse=True)
    
    word_dict = {} # This is what I am building, a dictionary that translates words into integers
    # Start the indices at 2 so that 0 ('no word') and 1 ('infrequent') stay reserved;
    # keeping vocab_size - 2 words makes the total vocabulary exactly vocab_size.
    for idx, word in enumerate(sorted_words[:vocab_size - 2]):
        word_dict[word] = idx + 2
        
    return word_dict
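The counting and sorting in `build_dict` can also be expressed with `collections.Counter`. The sketch below (the name `build_dict_counter` is hypothetical) should produce the same mapping, including the order of ties: `Counter.most_common` orders equal counts by first occurrence, just as the stable `sorted(..., reverse=True)` call does.

```python
from collections import Counter

def build_dict_counter(data, vocab_size=5000):
    """Map the vocab_size - 2 most frequent words to indices starting at 2."""
    # Count every word across all reviews (each review is a list of words)
    counts = Counter(word for review in data for word in review)
    # Indices start at 2 so that 0 ('no word') and 1 ('infrequent') stay reserved
    return {word: idx + 2
            for idx, (word, _) in enumerate(counts.most_common(vocab_size - 2))}

reviews = [["movi", "good"], ["movi", "bore"], ["film", "movi", "good"]]
print(build_dict_counter(reviews, vocab_size=4))
# {'movi': 2, 'good': 3}
```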

In [13]:
word_dict = build_dict(train_X)

In [14]:
# five most frequently appearing words in the training set.
list(word_dict.keys())[0:5]

['movi', 'film', 'one', 'like', 'time']

These words make sense, as they are common in movie reviews. Note that they appear in stemmed form (for example `movi` for *movie*).

### Saving `word_dict`

In [15]:
data_dir = '../data/pytorch' # The folder I will use for storing data
if not os.path.exists(data_dir):
    os.makedirs(data_dir)

In [16]:
with open(os.path.join(data_dir, 'word_dict.pkl'), "wb") as f:
    pickle.dump(word_dict, f)

### Transforming the reviews

In [17]:
def convert_and_pad(word_dict, sentence, pad=500):
    NOWORD = 0 # I will use 0 to represent the 'no word' category
    INFREQ = 1 # and I use 1 to represent the infrequent words, i.e., words not appearing in word_dict
    
    working_sentence = [NOWORD] * pad
    
    for word_index, word in enumerate(sentence[:pad]):
        if word in word_dict:
            working_sentence[word_index] = word_dict[word]
        else:
            working_sentence[word_index] = INFREQ
            
    return working_sentence, min(len(sentence), pad)

def convert_and_pad_data(word_dict, data, pad=500):
    result = []
    lengths = []
    
    for sentence in data:
        converted, leng = convert_and_pad(word_dict, sentence, pad)
        result.append(converted)
        lengths.append(leng)
        
    return np.array(result), np.array(lengths)

In [18]:
train_X, train_X_len = convert_and_pad_data(word_dict, train_X)
test_X, test_X_len = convert_and_pad_data(word_dict, test_X)

## Step 3: Uploading the data to S3

### Saving the processed training dataset locally

In [20]:
import pandas as pd
    
pd.concat([pd.DataFrame(train_y), pd.DataFrame(train_X_len), pd.DataFrame(train_X)], axis=1) \
        .to_csv(os.path.join(data_dir, 'train.csv'), header=False, index=False)
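Each row of `train.csv` therefore holds the label, then the review length (capped at 500), then the 500 word indices. When the data is read back later, only column 0 (the label) is dropped, so the model input keeps the length as its first element, matching the test input built later with `pd.concat([pd.DataFrame(test_X_len), pd.DataFrame(test_X)], axis=1)`. A standard-library sketch of this layout for one hypothetical toy row, using `pad=5` instead of 500:

```python
import csv
import io

# Hypothetical toy row (pad=5 instead of 500): label, length, then word indices
label, length, ids = 1, 3, [3, 2, 1, 0, 0]

buf = io.StringIO()
csv.writer(buf).writerow([label, length] + ids)
line = buf.getvalue()
print(line.strip())  # 1,3,3,2,1,0,0

# Reading it back: column 0 is the label, the remaining columns are the model input
row = [int(v) for v in next(csv.reader(io.StringIO(line)))]
y, X = row[0], row[1:]
print(y, X)  # 1 [3, 3, 2, 1, 0, 0]
```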

### Uploading the training data


Next, I need to upload the training data to the SageMaker default S3 bucket so that I can provide access to it while training the model.

In [21]:
import sagemaker

sagemaker_session = sagemaker.Session()

bucket = sagemaker_session.default_bucket()
prefix = 'sagemaker/sentiment_rnn'

role = sagemaker.get_execution_role()

In [22]:
input_data = sagemaker_session.upload_data(path=data_dir, bucket=bucket, key_prefix=prefix)

## Step 4: Build and Train the PyTorch Model

I will start by implementing my own neural network in PyTorch along with a training script.

In [23]:
!pygmentize train/model.py

[34mimport[39;49;00m [04m[36mtorch.nn[39;49;00m [34mas[39;49;00m [04m[36mnn[39;49;00m

[34mclass[39;49;00m [04m[32mLSTMClassifier[39;49;00m(nn.Module):
    [33m"""[39;49;00m
[33m    This is the simple RNN model we will be using to perform Sentiment Analysis.[39;49;00m
[33m    """[39;49;00m

    [34mdef[39;49;00m [32m__init__[39;49;00m([36mself[39;49;00m, embedding_dim, hidden_dim, vocab_size):
        [33m"""[39;49;00m
[33m        Initialize the model by settingg up the various layers.[39;49;00m
[33m        """[39;49;00m
        [36msuper[39;49;00m(LSTMClassifier, [36mself[39;49;00m).[32m__init__[39;49;00m()

        [36mself[39;49;00m.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=[34m0[39;49;00m)
        [36mself[39;49;00m.lstm = nn.LSTM(embedding_dim, hidden_dim)
        [36mself[39;49;00m.dense = nn.Linear(in_features=hidden_dim, out_features=[34m1[39;49;00m)
        [36mself[39;49;00m.sig = nn.Sigm

In [24]:
import torch
import torch.utils.data

# Read in only the first 250 rows
train_sample = pd.read_csv(os.path.join(data_dir, 'train.csv'), header=None, names=None, nrows=250)

# Turn the input pandas dataframe into tensors
train_sample_y = torch.from_numpy(train_sample[[0]].values).float().squeeze()
train_sample_X = torch.from_numpy(train_sample.drop([0], axis=1).values).long()

# Build the dataset
train_sample_ds = torch.utils.data.TensorDataset(train_sample_X, train_sample_y)
# Build the dataloader
train_sample_dl = torch.utils.data.DataLoader(train_sample_ds, batch_size=50)

### Writing the training method

In [25]:
def train(model, train_loader, epochs, optimizer, loss_fn, device):
    for epoch in range(1, epochs + 1):
        model.train()
        total_loss = 0
        for batch in train_loader:         
            batch_X, batch_y = batch
            
            batch_X = batch_X.to(device)
            batch_y = batch_y.to(device)
            
            
            optimizer.zero_grad()
            out = model.forward(batch_X)
            loss = loss_fn(out, batch_y)
            loss.backward()
            optimizer.step()
            
            total_loss += loss.item()
        print("Epoch: {}, BCELoss: {}".format(epoch, total_loss / len(train_loader)))

In [26]:
import torch.optim as optim
from train.model import LSTMClassifier

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = LSTMClassifier(32, 100, 5000).to(device)
optimizer = optim.Adam(model.parameters())
loss_fn = torch.nn.BCELoss()

train(model, train_sample_dl, 5, optimizer, loss_fn, device)

Epoch: 1, BCELoss: 0.6947798490524292
Epoch: 2, BCELoss: 0.6867001295089722
Epoch: 3, BCELoss: 0.6798485159873963
Epoch: 4, BCELoss: 0.6720484375953675
Epoch: 5, BCELoss: 0.6621503233909607
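The first-epoch loss of roughly 0.69 is what an untrained classifier should produce. Binary cross-entropy, `-mean(y*log(p) + (1-y)*log(1-p))`, equals ln 2 ≈ 0.693 when the model outputs `p = 0.5` for every review. A small pure-Python check follows; the helper `bce` is illustrative, since in the notebook this quantity is computed by `torch.nn.BCELoss`:

```python
import math

def bce(probs, targets):
    """Mean binary cross-entropy over a batch of predicted probabilities."""
    total = sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for p, y in zip(probs, targets))
    return -total / len(probs)

print(round(bce([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0]), 4))  # 0.6931, i.e. ln 2
print(round(bce([0.9, 0.1], [1, 0]), 4))                  # 0.1054
```

The loss falling from about 0.69 toward 0.28 during the full SageMaker run below shows the model moving away from this uninformed baseline.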


To train a PyTorch model with SageMaker, I must provide a training script. I can optionally provide a source directory, which is copied into the training container and from which the training code is run. When the training container starts, it checks the uploaded directory (if there is one) for a `requirements.txt` file and installs any Python libraries listed there; only then is the training script run.
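The contents of `train/train.py` are not shown here. In SageMaker's script mode, the `hyperparameters` given to the estimator reach the script as command-line flags, and the model output directory and each input channel are exposed through environment variables (`SM_MODEL_DIR`, and `SM_CHANNEL_TRAINING` for the `'training'` channel used below). The following is a minimal sketch of how such a script might read them. The defaults for `--hidden_dim`, `--embedding_dim` and `--vocab_size` are assumptions taken from the local test run above, and the local fallback paths are hypothetical:

```python
import argparse
import os

def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Hyperparameters passed to the estimator arrive as command-line flags
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--hidden_dim", type=int, default=100)     # assumed default
    parser.add_argument("--embedding_dim", type=int, default=32)   # assumed default
    parser.add_argument("--vocab_size", type=int, default=5000)    # assumed default
    # SageMaker sets these environment variables; the fallbacks allow local runs
    parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR", "./model"))
    parser.add_argument("--data-dir", default=os.environ.get("SM_CHANNEL_TRAINING", "../data/pytorch"))
    return parser.parse_args(argv)

args = parse_args(["--epochs", "10", "--hidden_dim", "200"])
print(args.epochs, args.hidden_dim, args.embedding_dim)  # 10 200 32
print(args.data_dir)  # value of SM_CHANNEL_TRAINING if set, else ../data/pytorch
```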

### Training the model

In [27]:
from sagemaker.pytorch import PyTorch

estimator = PyTorch(entry_point="train.py",
                    source_dir="train",
                    role=role,
                    framework_version='0.4.0',
                    train_instance_count=1,
                    train_instance_type='ml.m4.xlarge',
                    hyperparameters={
                        'epochs': 10,
                        'hidden_dim': 200,
                    })

In [29]:
estimator.fit({'training': input_data})

2020-04-20 19:10:01 Starting - Starting the training job...
2020-04-20 19:10:04 Starting - Launching requested ML instances......
2020-04-20 19:11:19 Starting - Preparing the instances for training......
2020-04-20 19:12:20 Downloading - Downloading input data...
2020-04-20 19:12:44 Training - Downloading the training image.
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2020-04-20 19:13:05,283 sagemaker-containers INFO     Imported framework sagemaker_pytorch_container.training
2020-04-20 19:13:05,285 sagemaker-containers INFO     No GPUs detected (normal if no gpus installed)
2020-04-20 19:13:05,298 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2020-04-20 19:13:06,716 sagemaker_pytorch_container.training INFO     Invoking user training script.
2020-04-20 19:13:07,019 sagemaker-containers INFO     Module train does not pro

Epoch: 1, BCELoss: 0.6757696648033298
Epoch: 2, BCELoss: 0.5979432086555325
Epoch: 3, BCELoss: 0.49232446235053395
Epoch: 4, BCELoss: 0.43451686720458826
Epoch: 5, BCELoss: 0.3734972927035118
Epoch: 6, BCELoss: 0.35580140656354475
Epoch: 7, BCELoss: 0.3233942736168297
Epoch: 8, BCELoss: 0.3150736677403353
Epoch: 9, BCELoss: 0.29499997776381826

2020-04-20 20:54:46 Uploading - Uploading generated training model
2020-04-20 20:54:46 Completed - Training job completed
Epoch: 10, BCELoss: 0.2783846113146568
2020-04-20 20:54:38,701 sagemaker-containers INFO     Reporting training SUCCESS
Training seconds: 6146
Billable seconds: 6146


## Step 5: Testing the model

I will test this model by deploying it to an endpoint and then sending the test data to that endpoint. Testing this way also confirms that the deployed model works correctly.

### Deploying the model for testing

In [30]:
# Deploying the trained model
predictor = estimator.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')

-------------!

### Using the model for testing

In [31]:
test_X = pd.concat([pd.DataFrame(test_X_len), pd.DataFrame(test_X)], axis=1)

In [32]:
# I split the data into chunks and send each chunk separately, accumulating the results.

def predict(data, rows=512):
    split_array = np.array_split(data, int(data.shape[0] / float(rows) + 1))
    predictions = np.array([])
    for array in split_array:
        predictions = np.append(predictions, predictor.predict(array))
    
    return predictions
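Chunking is needed because a single request to a real-time endpoint has a limited payload size, far smaller than the full 25,000-row test set. The notebook's `predict` uses `np.array_split`, which produces `int(n / rows + 1)` roughly equal chunks (49 chunks of about 510 rows for 25,000 reviews). The standard-library sketch below uses fixed-size slices instead, plus a stand-in function in place of the endpoint so it runs without SageMaker; `chunked_predict` and `fake_endpoint` are hypothetical names:

```python
import math

def chunked_predict(rows, predict_fn, max_rows=512):
    """Call predict_fn on at most max_rows rows at a time and concatenate the results."""
    n_chunks = math.ceil(len(rows) / max_rows)
    results = []
    for i in range(n_chunks):
        results.extend(predict_fn(rows[i * max_rows:(i + 1) * max_rows]))
    return results

def fake_endpoint(chunk):
    # Stand-in for predictor.predict: returns one score per row
    return [1.0 if row[0] % 2 == 0 else 0.0 for row in chunk]

data = [[i] for i in range(1200)]
preds = chunked_predict(data, fake_endpoint)
print(len(preds), preds[:4])  # 1200 [1.0, 0.0, 1.0, 0.0]
```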

In [33]:
predictions = predict(test_X.values)
predictions = [round(num) for num in predictions]

In [34]:
from sklearn.metrics import accuracy_score
accuracy_score(test_y, predictions)

0.85056

### More testing

In [35]:
test_review = 'The simplest pleasures in life are the best, and this film is one of them. Combining a rather basic storyline of love and adventure this movie transcends the usual weekend fair with wit and unmitigated charm.'

In [36]:
# Convert test_review into a form usable by the model and save the results in test_data
test_data = review_to_words(test_review)
test_data = [np.array(convert_and_pad(word_dict, test_data)[0])]

Now that I have processed the review, I can send the resulting array to the model to predict the sentiment of the review.

In [37]:
predictor.predict(test_data)

array(0.72589403, dtype=float32)

Since the returned value (about 0.73) is above the 0.5 decision threshold, the model classifies this review as positive.

### Delete the endpoint

In [38]:
estimator.delete_endpoint()

## Step 6: Deploying the model for the web app

For the simple website that I am constructing during this project, the `input_fn` and `output_fn` methods in the inference code are relatively straightforward. I only need to accept a string as input and return a single value as output.
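SageMaker's PyTorch serving container looks for hook functions named `model_fn`, `input_fn`, `predict_fn` and `output_fn` in the entry-point script. Only the start of `serve/predict.py` is shown below, so the following is a minimal sketch, under that assumption, of what the two text-handling hooks could look like: `input_fn` decodes the UTF-8 request body into a string, and `output_fn` turns the prediction into a string such as `'1.0'`, consistent with the `b'1.0'` response seen later. The review cleaning and the model call would live in `predict_fn`, which is omitted here:

```python
def input_fn(serialized_input_data, content_type):
    """Deserialize the request body: the web app sends the raw review text."""
    if content_type == "text/plain":
        return serialized_input_data.decode("utf-8")
    raise ValueError("Requested unsupported ContentType in content_type: " + content_type)

def output_fn(prediction_output, accept):
    """Serialize the prediction (a single number) as a plain string."""
    return str(prediction_output)

review = input_fn("Breathtaking. Unique.".encode("utf-8"), "text/plain")
print(review)                        # Breathtaking. Unique.
print(output_fn(1.0, "text/plain"))  # 1.0
```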

### Writing inference code

In [39]:
!pygmentize serve/predict.py

[34mimport[39;49;00m [04m[36margparse[39;49;00m
[34mimport[39;49;00m [04m[36mjson[39;49;00m
[34mimport[39;49;00m [04m[36mos[39;49;00m
[34mimport[39;49;00m [04m[36mpickle[39;49;00m
[34mimport[39;49;00m [04m[36msys[39;49;00m
[34mimport[39;49;00m [04m[36msagemaker_containers[39;49;00m
[34mimport[39;49;00m [04m[36mpandas[39;49;00m [34mas[39;49;00m [04m[36mpd[39;49;00m
[34mimport[39;49;00m [04m[36mnumpy[39;49;00m [34mas[39;49;00m [04m[36mnp[39;49;00m
[34mimport[39;49;00m [04m[36mtorch[39;49;00m
[34mimport[39;49;00m [04m[36mtorch.nn[39;49;00m [34mas[39;49;00m [04m[36mnn[39;49;00m
[34mimport[39;49;00m [04m[36mtorch.optim[39;49;00m [34mas[39;49;00m [04m[36moptim[39;49;00m
[34mimport[39;49;00m [04m[36mtorch.utils.data[39;49;00m

[34mfrom[39;49;00m [04m[36mmodel[39;49;00m [34mimport[39;49;00m LSTMClassifier

[34mfrom[39;49;00m [04m[36mutils[39;49;00m [34mimport[39;49;00m review_to_words, 

### Deploying the model

Now that the custom inference code has been written, I will create and deploy the model. To begin with, I need to construct a new PyTorchModel object which points to the model artifacts created during training and also points to the inference code that I wish to use. Then I can call the deploy method to launch the deployment container.

In [40]:
from sagemaker.predictor import RealTimePredictor
from sagemaker.pytorch import PyTorchModel

class StringPredictor(RealTimePredictor):
    def __init__(self, endpoint_name, sagemaker_session):
        super(StringPredictor, self).__init__(endpoint_name, sagemaker_session, content_type='text/plain')

model = PyTorchModel(model_data=estimator.model_data,
                     role = role,
                     framework_version='0.4.0',
                     entry_point='predict.py',
                     source_dir='serve',
                     predictor_cls=StringPredictor)
predictor = model.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')

-------------!

### Testing the model

In [43]:
import glob

def test_reviews(data_dir='../data/aclImdb', stop=250):
    
    results = []
    ground = []
    
    # I make sure to test both positive and negative reviews    
    for sentiment in ['pos', 'neg']:
        
        path = os.path.join(data_dir, 'test', sentiment, '*.txt')
        files = glob.glob(path)
        
        files_read = 0
        
        print('Starting ', sentiment, ' files')
        
        # Iterate through the files and send them to the predictor
        for f in files:
            with open(f) as review:
                # First, I store the ground truth (was the review positive or negative)
                if sentiment == 'pos':
                    ground.append(1)
                else:
                    ground.append(0)
                # Read in the review and convert to 'utf-8' for transmission via HTTP
                review_input = review.read().encode('utf-8')
                # Send the review to the predictor and store the results
                results.append(float(predictor.predict(review_input)))
                
            # Sending reviews to our endpoint one at a time takes a while so we
            # only send a small number of reviews
            files_read += 1
            if files_read == stop:
                break
            
    return ground, results

In [44]:
ground, results = test_reviews()

Starting  pos  files
Starting  neg  files


In [45]:
from sklearn.metrics import accuracy_score
accuracy_score(ground, results)

0.85

In [46]:
predictor.predict(test_review)

b'1.0'

## Step 7: Using the model for the web app

### Setting up a Lambda function

The first thing I am going to do is set up a Lambda function. This Lambda function will be executed whenever the public API has data sent to it. When it is executed it will receive the data, perform any sort of processing that is required, send the data (the review) to the SageMaker endpoint I've created and then return the result.

#### Part A: Create an IAM Role for the Lambda function

#### Part B: Create a Lambda function

```python
# I need to use the low-level library to interact with SageMaker since the SageMaker API
# is not available natively through Lambda.
import boto3

def lambda_handler(event, context):

    # The SageMaker runtime is what allows me to invoke the endpoint that I've created.
    runtime = boto3.Session().client('sagemaker-runtime')

    # Now I use the SageMaker runtime to invoke the endpoint, sending the review I was given
    response = runtime.invoke_endpoint(EndpointName = '**ENDPOINT NAME HERE**',    # The name of the endpoint I created
                                       ContentType = 'text/plain',                 # The data format that is expected
                                       Body = event['body'])                       # The actual review

    # The response is an HTTP response whose body contains the result of the inference
    result = response['Body'].read().decode('utf-8')

    return {
        'statusCode' : 200,
        'headers' : { 'Content-Type' : 'text/plain', 'Access-Control-Allow-Origin' : '*' },
        'body' : result
    }
```

In [47]:
predictor.endpoint

'sagemaker-pytorch-2020-04-20-21-32-04-442'

### Setting up API Gateway

Now that the Lambda function is set up, I create a new API using API Gateway that will trigger the Lambda function I have just created.
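The web page that calls this API is not shown in this notebook. As a hypothetical Python equivalent, assuming a Lambda proxy integration (as the handler's use of `event['body']` suggests), a client sends the review as a `text/plain` POST body and maps the returned score to a label. The URL is a placeholder, and the `/prod` stage name and the helper names are assumptions:

```python
import urllib.request

# Hypothetical placeholder: replace with the invoke URL shown in the API Gateway console
API_URL = "https://YOUR_API_ID.execute-api.YOUR_REGION.amazonaws.com/prod"

def build_request(review_text, url=API_URL):
    """Build a POST request that sends the raw review as text/plain."""
    return urllib.request.Request(
        url,
        data=review_text.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )

def to_label(body):
    """Map the endpoint's score (e.g. "1.0") to a human-readable label."""
    return "Positive" if float(body) >= 0.5 else "Negative"

req = build_request("Breathtaking. Unique. Captivating. Enchanting.")
print(req.get_method(), to_label("1.0"))  # POST Positive
# Sending it requires a deployed API:
# with urllib.request.urlopen(req) as resp:
#     print(to_label(resp.read().decode("utf-8")))
```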

### Deploying the web app

#### Results from the web app

The Lord of the Rings: The Fellowship of the Ring

"Breathtaking. Unique. Captivating. Enchanting.

Within minutes of the start of this first chapter of an undeniably epic trilogy, the audience was left gasping at the intensity of the images on the screen. And we had nearly three hours to go."

Predicted Sentiment: Positive

### Delete the endpoint

In [48]:
predictor.delete_endpoint()