# Word-level language modeling using PyTorch

[Reference Source: PyTorch Example from SageMaker](https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker-python-sdk/pytorch_lstm_word_language_model)

## Contents

1. [Setup](#Setup)
1. [Data](#Data)
1. [Train](#Train)
1. [Host](#Host)

---

## Setup

_This notebook was created and tested on an ml.p2.xlarge notebook instance._

Let's start by creating a SageMaker session and specifying:

- The S3 bucket and prefix that you want to use for training and model data. This should be in the same region as the notebook instance, training, and hosting.
- The IAM role ARN used to give training and hosting access to your data. See [the documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html) for how to create these. Note: if more than one role is required for notebook instances, training, and/or hosting, replace `sagemaker.get_execution_role()` with the appropriate full IAM role ARN string(s).


In [None]:
import sagemaker

sagemaker_session = sagemaker.Session()

'''
A session stores configuration state and allows you to create service clients and resources.
sagemaker.session.Session - AWS service calls are delegated to an underlying Boto3 session, 
which by default is initialized using the AWS configuration chain. 
When you make an Amazon SageMaker API call that accesses an S3 bucket location and one is not specified, 
the Session creates a default bucket based on a naming convention which includes the current AWS account ID.
'''

bucket = sagemaker_session.default_bucket()

'''
Returns the name of the default bucket to use in relevant Amazon SageMaker interactions.
The name has the form sagemaker-{region}-{AWS account ID}.

'''

prefix = 'sagemaker/DEMO-pytorch-rnn-lstm'

'''
Used later
'''

role = sagemaker.get_execution_role()
'''
Get the execution role for the notebook instance. This is the IAM role that you created when you created your notebook instance. You pass the role to the training job and to the hosted model.
'''

## Data
### Getting the data
We are going to use [the wikitext-2 raw data](https://www.salesforce.com/products/einstein/ai-research/the-wikitext-dependency-language-modeling-dataset/). This data is from Wikipedia and is licensed CC-BY-SA-3.0. Before you use this data for any purpose other than this example, you should understand the data license, described at https://creativecommons.org/licenses/by-sa/3.0/

This dataset is provided by Salesforce. The WikiText language modeling dataset is a collection of over 100 million tokens extracted from the set of verified Good and Featured articles on Wikipedia. It is a good example of how the English language flows.

### Examples
= Gold dollar =

 The gold dollar or gold one @-@ dollar piece was a coin struck as a regular issue by the United States Bureau of the Mint from 1849 to 1889 . The coin had three types over its lifetime , all designed by Mint Chief Engraver James B. Longacre . The Type 1 issue had the smallest diameter of any United States coin ever minted .
 A gold dollar had been proposed several times in the 1830s and 1840s , but was not initially adopted . Congress was finally galvanized into action by the increased supply of bullion caused by the California gold rush , and in 1849 authorized a gold dollar . In its early years , silver coins were being hoarded or exported , and the gold dollar found a ready place in commerce . Silver again circulated after Congress in 1853 required that new coins of that metal be made lighter , and the gold dollar became a rarity in commerce even before federal coins vanished from circulation because of the economic disruption caused by the American Civil War .
 
 = Super Mario Land =

 Super Mario Land is a 1989 side @-@ scrolling platform video game , the first in the Super Mario Land series , developed and published by Nintendo as a launch title for their Game Boy handheld game console . In gameplay similar to that of the 1985 Super Mario Bros. , but resized for the smaller device 's screen , the player advances Mario to the end of 12 levels by moving to the right and jumping across platforms to avoid enemies and pitfalls . Unlike other Mario games , Super Mario Land is set in Sarasaland , a new environment depicted in line art , and Mario pursues Princess Daisy . The game introduces two Gradius @-@ style shooter levels .
 At Nintendo CEO Hiroshi Yamauchi 's request , Game Boy creator Gunpei Yokoi 's Nintendo R & D1 developed a Mario game to sell the new console . It was the first portable version of Mario and the first to be made without Mario creator and Yokoi protégé Shigeru Miyamoto . Accordingly , the development team shrunk Mario gameplay elements for the device and used some elements inconsistently from the series . Super Mario Land was expected to showcase the console until Nintendo of America bundled Tetris with new Game Boys . The game launched alongside the Game Boy first in Japan ( April 1989 ) and later worldwide . Super Mario Land was later rereleased for the Nintendo 3DS via Virtual Console in 2011 again as a launch title , which featured some tweaks to the game 's presentation .
 Initial reviews were laudatory . Reviewers were satisfied with the smaller Super Mario Bros. , but noted its short length . They considered it among the best of the Game Boy launch titles . The handheld console became an immediate success and Super Mario Land ultimately sold over 18 million copies , more than that of Super Mario Bros. 3 . Both contemporaneous and retrospective reviewers praised the game 's soundtrack . Later reviews were critical of the compromises made in development and noted Super Mario Land 's deviance from series norms . The game begot a series of sequels , including the 1992 Super Mario Land 2 : 6 Golden Coins , 1994 Wario Land : Super Mario Land 3 , and 2011 Super Mario 3D Land , though many of the original 's mechanics were not revisited . The game was included in several top Game Boy game lists and debuted Princess Daisy as a recurring Mario series character .
 
= = = Sinclair Scientific Programmable = = =

 The Sinclair Scientific Programmable was introduced in 1975 , with the same case as the Sinclair Oxford . It was larger than the Scientific , at 73 by 155 by 34 millimetres ( 2 @.@ 9 in × 6 @.@ 1 in × 1 @.@ 3 in ) , and used a larger  battery , but could also be powered by mains electricity .
 It had 24 @-@ step programming abilities , which meant it was highly limited for many purposes . It also lacked functions for the natural logarithm and exponential function . Constants used in programs were required to be integers , and the programming was wasteful , with start and end quotes needed to use a constant in a program .
 However , included with the calculator was a library of over 120 programs that that performed common operations in mathematics , geometry , statistics , finance , physics , electronics , engineering , as well as fluid mechanics and materials science . The full library of standard programs contained over 400 programs in the Sinclair Program Library .

### Dataset statistics
In comparison to the Mikolov processed version of the Penn Treebank (PTB), the WikiText datasets are larger. WikiText-2 aims to be of a similar size to the PTB while WikiText-103 contains all articles extracted from Wikipedia. The WikiText datasets also retain numbers (as opposed to replacing them with N), case (as opposed to all text being lowercased), and punctuation (as opposed to stripping them out).

![Dataset statistics](../img/dataset-statistics.png)

In [None]:
%%bash
wget http://research.metamind.io.s3.amazonaws.com/wikitext/wikitext-2-raw-v1.zip
unzip -n wikitext-2-raw-v1.zip
cd wikitext-2-raw
mv wiki.test.raw test && mv wiki.train.raw train && mv wiki.valid.raw valid
# Rename the pre-divided splits to test, train, and valid.

Let's preview what the data looks like.

In [None]:
!head -5 wikitext-2-raw/train
# Let's see what the train dataset looks like

### Uploading the data to S3
We are going to use the `sagemaker.Session.upload_data` function to upload our datasets to an S3 location. The return value, `inputs`, identifies that location; we will use it later when we start the training job.



In [None]:
inputs = sagemaker_session.upload_data(path='wikitext-2-raw', bucket=bucket, key_prefix=prefix)

'''
S3 object key name prefix (default: "data"). S3 uses the prefix to create a directory structure for the bucket content that it displays in the S3 console.

Tree of the datasets - 

├── wikitext-2-raw
│   ├── test
│   ├── train
│   └── valid
'''

print('input spec (in this case, just an S3 path): {}'.format(inputs))

## Train
### Training script
We need to provide a training script that can run on the SageMaker platform. The training script is very similar to a training script you might run outside of SageMaker, but you can access useful properties about the training environment through various environment variables, such as:

* `SM_MODEL_DIR`: A string representing the path to the directory to write model artifacts to.
  These artifacts are uploaded to S3 for model hosting.
* `SM_OUTPUT_DATA_DIR`: A string representing the filesystem path to write output artifacts to. Output artifacts may
  include checkpoints, graphs, and other files to save, not including model artifacts. These artifacts are compressed
  and uploaded to S3 to the same S3 prefix as the model artifacts.

Supposing one input channel, 'training', was used in the call to the PyTorch estimator's `fit()` method,
the following will be set, following the format `SM_CHANNEL_[channel_name]`:

* `SM_CHANNEL_TRAINING`: A string representing the path to the directory containing data in the 'training' channel.
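
As a minimal sketch of how a script might read these variables, the snippet below uses a plain dictionary lookup with fallbacks so that it also runs outside a SageMaker container. The helper name `resolve_sagemaker_dirs` is hypothetical, and the fallback paths are an assumption based on the `/opt/ml/...` locations that show up in the job logs further down.

```python
import os


def resolve_sagemaker_dirs(env=None):
    """Return the model, output, and training-data directories.

    Falls back to the conventional /opt/ml paths when a variable is unset
    (an assumption that is convenient for local testing).
    """
    env = os.environ if env is None else env
    return {
        'model_dir': env.get('SM_MODEL_DIR', '/opt/ml/model'),
        'output_data_dir': env.get('SM_OUTPUT_DATA_DIR', '/opt/ml/output/data'),
        'data_dir': env.get('SM_CHANNEL_TRAINING', '/opt/ml/input/data/training'),
    }


# Simulate an environment where only the training channel is set.
dirs = resolve_sagemaker_dirs({'SM_CHANNEL_TRAINING': '/tmp/wikitext-2-raw'})
print(dirs['data_dir'])
print(dirs['model_dir'])
```

Passing the environment in as an argument keeps the helper easy to test without touching the real process environment.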

The script that we will use in this example is stored in the GitHub repo 
[https://github.com/awslabs/amazon-sagemaker-examples/tree/training-scripts](https://github.com/awslabs/amazon-sagemaker-examples/tree/training-scripts), 
under the branch `training-scripts`. It is a public repo, so no authentication is needed to access it, and it can be referenced through the estimator's `git_config` argument. 
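
A `git_config` value for this repo could look roughly like the dictionary below. This is a sketch that assumes the SageMaker Python SDK's `repo` and `branch` keys; the `.git` clone URL is inferred from the repository link above, so check it against the SDK documentation for your version.

```python
# Assumed shape of the git_config dictionary accepted by SageMaker estimators.
git_config = {
    'repo': 'https://github.com/awslabs/amazon-sagemaker-examples.git',
    'branch': 'training-scripts',
}

print(git_config)
```

It would be passed as `git_config=git_config` when constructing the estimator, with `source_dir` and `entry_point` then interpreted relative to the repository root.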


A typical training script loads data from the input channels, configures training with hyperparameters, trains a model, and saves a model to `model_dir` so that it can be hosted later. Hyperparameters are passed to your script as arguments and can be retrieved with an `argparse.ArgumentParser` instance. 

For example, the script run by this notebook: 
[https://github.com/awslabs/amazon-sagemaker-examples/blob/training-scripts/pytorch-rnn-scripts/train.py](https://github.com/awslabs/amazon-sagemaker-examples/blob/training-scripts/pytorch-rnn-scripts/train.py). 

For more information about training environment variables, please visit [SageMaker Containers](https://github.com/aws/sagemaker-containers).

In the current example we also need to provide a source directory, because the training script imports data and model classes from other modules. The source directory is 
[https://github.com/awslabs/amazon-sagemaker-examples/blob/training-scripts/pytorch-rnn-scripts/](https://github.com/awslabs/amazon-sagemaker-examples/blob/training-scripts/pytorch-rnn-scripts/). We should provide 'pytorch-rnn-scripts' for `source_dir` when creating the Estimator object, which is a relative path inside the Git repository. 


Let's look at the training script in detail. It is located here:

```bash
├── pytorch-rnn-scripts
│   ├── data.py
│   ├── generate.py
│   ├── __init__.py
│   ├── rnn.py
│   └── train.py
```

```python
import data
```

Here we import `data.py`, which has functions for tokenizing the text and building a corpus for the model to consume. A few relevant details on tokens are [here](https://github.com/nicolas-ivanov/tf_seq2seq_chatbot/issues/15#issuecomment-246106807).
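
To make the idea of tokenizing concrete, here is a small pure-Python sketch of a word dictionary and a tokenizer. It is not the contents of `data.py`: the class and function names, and the `<eos>` end-of-line marker, are assumptions made for illustration.

```python
class Dictionary:
    """Maps each distinct word to an integer id (hypothetical sketch)."""

    def __init__(self):
        self.word2idx = {}
        self.idx2word = []

    def add_word(self, word):
        # Assign the next free id the first time a word is seen.
        if word not in self.word2idx:
            self.word2idx[word] = len(self.idx2word)
            self.idx2word.append(word)
        return self.word2idx[word]

    def __len__(self):
        return len(self.idx2word)


def tokenize(lines, dictionary):
    """Split lines on whitespace, append '<eos>' per line, return token ids."""
    ids = []
    for line in lines:
        for word in line.split() + ['<eos>']:
            ids.append(dictionary.add_word(word))
    return ids


dictionary = Dictionary()
ids = tokenize(['the gold dollar', 'the coin'], dictionary)
print(ids)              # [0, 1, 2, 3, 0, 4, 3]
print(len(dictionary))  # 5
```

The length of the dictionary is what later becomes the vocabulary size of the model.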

Then we have the hyperparameters passed to this script; you can see them in the training script:


```python
# Hyperparameters sent by the client are passed as command-line arguments to the script.
parser.add_argument('--emsize', type=int, default=200,
                    help='size of word embeddings')
parser.add_argument('--nhid', type=int, default=200,
                    help='number of hidden units per layer')
parser.add_argument('--nlayers', type=int, default=2,
                    help='number of layers')
parser.add_argument('--lr', type=float, default=20,
                    help='initial learning rate')
parser.add_argument('--clip', type=float, default=0.25,
                    help='gradient clipping')
parser.add_argument('--epochs', type=int, default=40,
                    help='upper epoch limit')
parser.add_argument('--batch_size', type=int, default=20, metavar='N',
                    help='batch size')
parser.add_argument('--bptt', type=int, default=35,
                    help='sequence length')
parser.add_argument('--dropout', type=float, default=0.2,
                    help='dropout applied to layers (0 = no dropout)')
parser.add_argument('--tied', type=bool, default=False,
                    help='tie the word embedding and softmax weights')
parser.add_argument('--seed', type=int, default=1111,
                    help='random seed')
parser.add_argument('--log-interval', type=int, default=200, metavar='N',
                    help='report interval')
```
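
One detail worth knowing when passing hyperparameters this way: `argparse` applies `type` to the raw string, and `bool()` of any non-empty string is `True`. The sketch below reproduces only two of the arguments above to show the effect; it is an illustration, not part of `train.py`.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--epochs', type=int, default=40)
# type=bool calls bool() on the raw string, so any non-empty string is True.
parser.add_argument('--tied', type=bool, default=False)

# Mirrors the arguments SageMaker passes for {'epochs': 6, 'tied': True}.
args = parser.parse_args(['--epochs', '6', '--tied', 'True'])
print(args.epochs, args.tied)

# Pitfall: the string 'False' is non-empty, so it also parses as True.
surprising = parser.parse_args(['--tied', 'False'])
print(surprising.tied)
```

With an argument declared like this, the reliable way to get `False` is to leave the hyperparameter out so that the default applies.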
Then we have details of file paths - 

```python
# Data and model checkpoints/output directories from the container environment
parser.add_argument('--model-dir', type=str, default=os.environ['SM_MODEL_DIR'])
parser.add_argument('--output-data-dir', type=str, default=os.environ['SM_OUTPUT_DATA_DIR'])
parser.add_argument('--data-dir', type=str, default=os.environ['SM_CHANNEL_TRAINING'])
```

Here are some logs from when a job like this was run - 

```json
SM_TRAINING_ENV=
{
    "additional_framework_parameters": {},
    "channel_input_dirs": {
        "training": "/opt/ml/input/data/training"
    },
    "current_host": "algo-1",
    "framework_module": "sagemaker_pytorch_container.training:main",
    "hosts": [
        "algo-1"
    ],
    "hyperparameters": {
        "epochs": 6,
        "tied": true
    },
    "input_config_dir": "/opt/ml/input/config",
    "input_data_config": {
        "training": {
            "RecordWrapperType": "None",
            "S3DistributionType": "FullyReplicated",
            "TrainingInputMode": "File"
        }
    },
    "input_dir": "/opt/ml/input",
    "is_master": true,
    "job_name": "sagemaker-pytorch-2019-11-26-19-32-08-962",
    "log_level": 20,
    "master_hostname": "algo-1",
    "model_dir": "/opt/ml/model",
    "module_dir": "s3://sagemaker-us-west-2-111652037296/sagemaker-pytorch-2019-11-26-19-32-08-962/source/sourcedir.tar.gz",
    "module_name": "train",
    "network_interface_name": "eth0",
    "num_cpus": 4,
    "num_gpus": 1,
    "output_data_dir": "/opt/ml/output/data",
    "output_dir": "/opt/ml/output",
    "output_intermediate_dir": "/opt/ml/output/intermediate",
    "resource_config": {
        "current_host": "algo-1",
        "hosts": [
            "algo-1"
        ],
        "network_interface_name": "eth0"
    },
    "user_entry_point": "train.py"
}
```

Here are some environment variables - 

```json
SM_USER_ARGS=["--epochs","6","--tied","True"]
SM_OUTPUT_INTERMEDIATE_DIR=/opt/ml/output/intermediate
SM_CHANNEL_TRAINING=/opt/ml/input/data/training
SM_HP_TIED=true
SM_HP_EPOCHS=6
PYTHONPATH=/usr/local/bin:/usr/lib/python36.zip:/usr/lib/python3.6:/usr/lib/python3.6/lib-dynload:/usr/local/lib/python3.6/dist-packages:/usr/lib/python3/dist-packages
Invoking script with the following command:
/usr/bin/python -m train --epochs 6 --tied True

Namespace(batch_size=20, bptt=35, clip=0.25, data_dir='/opt/ml/input/data/training', dropout=0.2, emsize=200, epochs=6, log_interval=200, lr=20, model_dir='/opt/ml/model', nhid=200, nlayers=2, output_data_dir='/opt/ml/output/data', seed=1111, tied=True)
```

You can find these logs by going to the training jobs in the Amazon SageMaker Dashboard.

```python
# Set the random seed manually for reproducibility.
torch.manual_seed(args.seed)
```

The seed was 1111 in the example above. You can use `torch.manual_seed()` to seed the RNG for all devices (both CPU and CUDA). Completely reproducible results are not guaranteed across PyTorch releases, individual commits, or different platforms. Furthermore, results need not be reproducible between CPU and GPU executions, even when using identical seeds. However, to make computations deterministic on your specific problem on one specific platform and PyTorch release, there are a couple of steps to take, and setting the seed is one of them. More details are [here](https://pytorch.org/docs/stable/notes/randomness.html).
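
The effect of a fixed seed can be illustrated without a GPU or PyTorch at all. The sketch below uses Python's standard `random` module as a stand-in for `torch.manual_seed`; the helper `sample_sequence` is hypothetical and only demonstrates that the same seed yields the same sequence on one platform and interpreter.

```python
import random


def sample_sequence(seed, n=5):
    """Draw n pseudo-random integers from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.randint(0, 99) for _ in range(n)]


first = sample_sequence(1111)
second = sample_sequence(1111)
print(first == second)  # True: same seed, same sequence
```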


Now we load the corpus created by data.py

```python
print('Load data')
corpus = data.Corpus(args.data_dir)
```


### Model in PyTorch

At this stage, after a bit more setup, we load the model:

```python
ntokens = len(corpus.dictionary)
rnn_type = 'LSTM'
model = RNNModel(rnn_type, ntokens, args.emsize, args.nhid, args.nlayers, args.dropout, args.tied).to(device)
```

The model is defined in `rnn.py`.

We are using an LSTM, which is a variant of an RNN.

![The LSTM Cell](../img/The_LSTM_cell-600.400.png)

Source of image - [Guillaume Chevalier from Wikipedia](https://en.wikipedia.org/wiki/Long_short-term_memory#/media/File:The_LSTM_cell.png)

The cell above is applied once per token in the sequence; the following image shows how the cells are chained together.

![The LSTM Cell - in series](../img/nct-seq2seq.png)

Source of image - [Deep Learning for Chatbots, Part 1 – Introduction](http://www.wildml.com/2016/04/deep-learning-for-chatbots-part-1-introduction/)

```python
self.drop = nn.Dropout(dropout)
self.encoder = nn.Embedding(ntoken, ninp)
self.rnn = getattr(nn, 'LSTM')(ninp, nhid, nlayers, dropout=dropout)

# nn      - torch.nn; nn.Module is the base class for all neural network modules
# ninp    - size of word embeddings - emsize=200
# nhid    - number of hidden units per layer - nhid=200
# nlayers - number of layers - nlayers=2
# dropout - dropout=0.2
# tied    - tie the word embedding and softmax weights - tied=True
```

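The class then defines the methods an RNN needs, such as the forward pass and hidden-state initialization. 

```python
def init_weights(self): ...             # initialize the layer weights
def forward(self, input, hidden): ...   # run one forward pass
def init_hidden(self, bsz): ...         # create the initial hidden state for batch size bsz
```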

### Back to the training script 

We have a function to batch up the corpus:

```python
def get_batch(source, i):
    # get_batch subdivides the source data into chunks of length args.bptt.
    ...
```
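
A minimal sketch of this chunking on a plain Python list is shown below. It assumes the common language-modeling setup in which the targets are the inputs shifted by one token; `bptt` is passed explicitly here instead of being read from `args`, and the list of integers is a stand-in for real token ids.

```python
def get_batch(source, i, bptt):
    """Return (inputs, targets) for the chunk starting at position i.

    Targets are the inputs shifted one position to the right, so the model
    learns to predict the next token at every step.
    """
    seq_len = min(bptt, len(source) - 1 - i)
    data = source[i:i + seq_len]
    target = source[i + 1:i + 1 + seq_len]
    return data, target


tokens = list(range(10))  # stand-in for a sequence of token ids
for i in range(0, len(tokens) - 1, 4):
    print(get_batch(tokens, i, bptt=4))
```

The last chunk is shorter than `bptt` because one token must always be left over to serve as the final target.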

We have functions for training and validation:

```python
def train(): ...                  # one pass over the training batches
def evaluate(data_source): ...    # compute the loss on a held-out split
```

We have the training loop:

```python
for epoch in range(1, args.epochs+1):
    ...  # train, evaluate, and checkpoint (body omitted in this excerpt)
```
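
The control flow of such a loop can be sketched without any model. The function below walks through a list of made-up validation losses and remembers the best epoch. The rule of dividing the learning rate by 4 when validation loss does not improve is an assumption borrowed from common word-language-model scripts, not something confirmed from this `train.py`.

```python
def run_epochs(val_losses, lr=20.0):
    """Walk through per-epoch validation losses (hypothetical values).

    Records the epoch with the lowest loss as the checkpoint to keep, and
    divides the learning rate by 4 whenever the loss fails to improve.
    """
    best_val_loss = None
    best_epoch = None
    for epoch, val_loss in enumerate(val_losses, start=1):
        if best_val_loss is None or val_loss < best_val_loss:
            # A real script would save the model state here.
            best_val_loss, best_epoch = val_loss, epoch
        else:
            lr /= 4.0  # assumed annealing rule
    return best_epoch, best_val_loss, lr


print(run_epochs([5.9, 5.5, 5.6, 5.4]))  # (4, 5.4, 5.0)
```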

We checkpoint the model:

```python
print('Saving the best model: {}'.format(best_state))
with open(checkpoint_path, 'wb') as f:
    torch.save(model.state_dict(), f)
with open(checkpoint_state_path, 'w') as f:
    ...  # body omitted in this excerpt
```
    
We then reload the best saved model:

```python
# Load the best saved model.
with open(checkpoint_path, 'rb') as f:
    model.load_state_dict(torch.load(f))
    # after load the rnn params are not a continuous chunk of memory
    # this makes them a continuous chunk, and will speed up forward pass
    model.rnn.flatten_parameters()
```

Finally, we save the model:

```python
# Move the best model to cpu and resave it
with open(model_path, 'wb') as f:
    torch.save(model.cpu().state_dict(), f)
```


### Run training in SageMaker
The `PyTorch` class allows us to run our training function as a training job on SageMaker infrastructure. We need to configure it with our training script and source directory, an IAM role, the number of training instances, and the training instance type. In this case we will run our training job on an `ml.p2.xlarge` instance. As this example shows, you can also specify hyperparameters. The following training takes about 11 minutes. 

Here we are using a prebuilt container to run our training script. If you want to build your own, see https://github.com/aws/sagemaker-pytorch-container


In [None]:
from sagemaker.pytorch import PyTorch

estimator = PyTorch(entry_point='train.py',
                    role=role,
                    framework_version='1.2.0',
                    train_instance_count=1,
                    train_instance_type='ml.p2.xlarge',
                    source_dir='pytorch-rnn-scripts',
                    # available hyperparameters: emsize, nhid, nlayers, lr, clip, epochs, batch_size,
                    #                            bptt, dropout, tied, seed, log_interval
                    hyperparameters={
                        'epochs': 1,
                        'tied': True
                    })

After we've constructed our PyTorch object, we can fit it using the data we uploaded to S3. SageMaker makes sure our data is available in the local filesystem, so our training script can simply read the data from disk.

In [None]:
estimator.fit({'training': inputs})

### How we monitor training 

```python
| epoch   3 |  2200/ 2983 batches | lr 20.00 | ms/batch 89.25 | loss  5.16 | ppl   173.74
```

epoch - the current pass of the training loop; in each epoch we go through the entire training dataset.

batches - as we saw before, training is split into batches; see also [here](https://datascience.stackexchange.com/questions/16807/why-mini-batch-size-is-better-than-one-single-batch-with-all-training-data).

From Yann LeCun's Facebook:

>Training with large minibatches is bad for your health. More importantly, it's bad for your test error. Friends dont let friends use minibatches larger than 32. Let's face it: the only people have switched to minibatch sizes larger than one since 2012 is because GPUs are inefficient for batch sizes smaller than 32. That's a terrible reason. It just means our hardware sucks.

lr - learning rate; more [here](https://blog.floydhub.com/a-beginners-guide-on-recurrent-neural-networks-with-pytorch/) and [here](https://blog.floydhub.com/guide-to-hyperparameters-search-for-deep-learning-models/).
The learning rate controls how large a step the model takes when it updates its weights each time back-propagation is done. 

ms/batch - time taken per batch, in milliseconds.

loss - current training loss; at the end of each epoch we also calculate the validation loss.

ppl - per-word perplexity (PPL) of the model, computed as `math.exp(loss)`. It varies with the word model used; https://arxiv.org/pdf/1703.08864.pdf explains how it is calculated. Perplexity per word is also explained at https://en.wikipedia.org/wiki/Perplexity.
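
As a quick check of that relationship, the snippet below exponentiates the loss printed in the sample log line. It is only an arithmetic illustration.

```python
import math

loss = 5.16            # loss value printed in the sample log line
ppl = math.exp(loss)   # per-word perplexity
print(round(ppl, 2))
```

The result is about 174, slightly above the 173.74 in the log, because the log prints the loss rounded to two decimals while the perplexity is computed from the unrounded value.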

In [None]:
%%html
<iframe src="https://instacalc.com/53287/embed" width="450" height="350" frameborder="0"></iframe>

## Host
### Hosting script
We are going to provide custom implementations of the `model_fn`, `input_fn`, `output_fn` and `predict_fn` hosting functions in a separate file, which is in the same Git repo as the training script: 
[https://github.com/awslabs/amazon-sagemaker-examples/blob/training-scripts/pytorch-rnn-scripts/generate.py](https://github.com/awslabs/amazon-sagemaker-examples/blob/training-scripts/pytorch-rnn-scripts/generate.py). 
We will use Git integration for hosting too since the hosting code is also in the Git repo. 


You can also put your training and hosting code in the same file but you would need to add a main guard (`if __name__=='__main__':`) for the training code, so that the container does not inadvertently run it at the wrong point in execution during hosting.

### Let's dissect generate.py's hosting methods

Please note: the following explanations have been adapted from the Chainer container examples. For a more detailed and authoritative description, please go [here](https://github.com/aws/sagemaker-python-sdk/blob/fa14b32e63087d9f9a0bdbf63e9e39d151975dec/doc/using_pytorch.rst).

```python
from rnn import RNNModel
```

We start by importing the basic RNN model from `rnn.py`.

```python
def model_fn(model_dir):
    """
    Before a model can be served, it must be loaded. The SageMaker PyTorch model server loads your model by invoking a model_fn
    function that you must provide in your script. The model_fn should have the following signature:

    def model_fn(model_dir)

    SageMaker will inject the directory where your model files and sub-directories, saved by save, have been mounted.
    Your model function should return a model object that can be used for model serving.

    This function is called by the PyTorch container during hosting when running on SageMaker with
    values populated by the hosting environment.

    This function loads models written during training into `model_dir`.
    Args:
        model_dir (str): path to the directory containing the saved model artifacts
    Returns:
        a loaded PyTorch model
    For more on `model_fn`, please visit the sagemaker-python-sdk repository:
    https://github.com/aws/sagemaker-python-sdk/blob/fa14b32e63087d9f9a0bdbf63e9e39d151975dec/doc/using_pytorch.rst
    For more on the PyTorch container, please visit the sagemaker-pytorch-containers repository:
    https://github.com/aws/sagemaker-pytorch-container
    """
    # Abridged excerpt: the code that opens `f` and constructs `model` is omitted.
    model_info = torch.load(f)
    model.load_state_dict(torch.load(f))
    model.to(device).eval()
    corpus = data.Corpus(model_dir)
```

It deals with loading the model and the corpus. [A common PyTorch convention](https://pytorch.org/tutorials/beginner/saving_loading_models.html) is to save models using either a .pt or .pth file extension. 

```python
def input_fn(serialized_input_data, content_type=JSON_CONTENT_TYPE):
    """This function is called on the byte stream sent by the client, and is used to deserialize the
    bytes into a Python object suitable for inference by predict_fn -- in this case, a NumPy array.

    This implementation is effectively identical to the default implementation used in the PyTorch
    container, for NPY formatted data. This function is included in this script to demonstrate
    how one might implement `input_fn`.
    Args:
        input_bytes (numpy array): a numpy array containing the data serialized by the PyTorch predictor
        content_type: the MIME type of the data in input_bytes
    Returns:
        a NumPy array represented by input_bytes.
    """
```

Deals with deserializing the input data. 


```python
def predict_fn(input_data, model):
    """
    This function receives a NumPy array and makes a prediction on it using the model returned
    by `model_fn`.

    The default predictor used by `PyTorch` serializes input data to the 'npy' format:
    https://docs.scipy.org/doc/numpy-1.14.0/neps/npy-format.html
    The PyTorch container provides an overridable pre-processing function `input_fn`
    that accepts the serialized input data and deserializes it into a NumPy array.
    `input_fn` is invoked before `predict_fn` and passes its return value to this function
    (as `input_data`)

    The PyTorch container provides an overridable post-processing function `output_fn`
    that accepts this function's return value and serializes it back into `npy` format, which
    the PyTorch predictor can deserialize back into a NumPy array on the client.
    Args:
        input_data: a numpy array containing the data serialized by the PyTorch predictor
        model: the return value of `model_fn`
    Returns:
        a NumPy array containing predictions which will be returned to the client
    """
```

And finally:

```python
def output_fn(prediction_output, accept=JSON_CONTENT_TYPE):
    """This function is called on the return value of predict_fn, and is used to serialize the
    predictions back to the client.

    This implementation is effectively identical to the default implementation used in the PyTorch
    container, for NPY formatted data. This function is included in this script to demonstrate
    how one might implement `output_fn`.
    Args:
        prediction_output (numpy array): a numpy array containing the data serialized by the PyTorch predictor
        accept: the MIME type of the data expected by the client.
    Returns:
        a tuple containing a serialized NumPy array and the MIME type of the serialized data.
    """
```
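
To see how the three request-time functions chain together, here is a self-contained sketch that uses JSON, as this notebook does for its requests. It is not the code in `generate.py`: the function bodies and the `fake_model` callable are stand-ins written for illustration, and only the function names and default arguments follow the signatures above.

```python
import json

JSON_CONTENT_TYPE = 'application/json'


def input_fn(serialized_input_data, content_type=JSON_CONTENT_TYPE):
    """Deserialize the request body into a Python dict."""
    if content_type != JSON_CONTENT_TYPE:
        raise ValueError('Unsupported content type: {}'.format(content_type))
    return json.loads(serialized_input_data)


def predict_fn(input_data, model):
    """Call the model; here `model` is any callable (a stand-in)."""
    return model(input_data)


def output_fn(prediction_output, accept=JSON_CONTENT_TYPE):
    """Serialize the prediction and report its MIME type."""
    if accept != JSON_CONTENT_TYPE:
        raise ValueError('Unsupported accept type: {}'.format(accept))
    return json.dumps(prediction_output), accept


def fake_model(request):
    """Stand-in model: repeats a placeholder token `words` times."""
    return ' '.join(['<unk>'] * request['words'])


body = json.dumps({'seed': 111, 'temperature': 2.0, 'words': 3})
response, mime = output_fn(predict_fn(input_fn(body), fake_model))
print(response, mime)
```

The container calls these in the same order for every request: `input_fn`, then `predict_fn` with the object returned by `model_fn`, then `output_fn`.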

### Import model into SageMaker
The PyTorch model uses an npy serializer and deserializer by default. For this example, since we have a custom implementation of all the hosting functions and plan on using JSON instead, we need a predictor that can serialize and deserialize JSON.

In [None]:
from sagemaker.predictor import RealTimePredictor, json_serializer, json_deserializer

class JSONPredictor(RealTimePredictor):
    def __init__(self, endpoint_name, sagemaker_session):
        super(JSONPredictor, self).__init__(endpoint_name, sagemaker_session, json_serializer, json_deserializer)

Since the hosting functions are implemented outside of the training script, we can't just use the estimator object to deploy the model. Instead we need to create a `PyTorchModel` object, using the latest training job to get the S3 location of the trained model data. Besides the model data location in S3, we also need to configure `PyTorchModel` with the script and source directory (because our `generate` script requires model and data classes from the source directory) and an IAM role.

In [None]:
from sagemaker.pytorch import PyTorchModel

training_job_name = estimator.latest_training_job.name
desc = sagemaker_session.sagemaker_client.describe_training_job(TrainingJobName=training_job_name)
trained_model_location = desc['ModelArtifacts']['S3ModelArtifacts']
model = PyTorchModel(model_data=trained_model_location,
                     role=role,
                     framework_version='1.0.0',
                     entry_point='generate.py',
                     source_dir='pytorch-rnn-scripts',
                     predictor_cls=JSONPredictor)

The PyTorchModel constructor takes the following arguments:

* model_data: An S3 location of a SageMaker model data .tar.gz file
* role: An IAM role name or Arn for SageMaker to access AWS resources on your behalf.
* predictor_cls: A function to call to create a predictor. If not None, deploy will return the result of invoking this function on the created endpoint name
* entry_point: Path (absolute or relative) to the Python file which should be executed as the entry point to model hosting.
* source_dir: Optional. Path (absolute or relative) to a directory with any other source code dependencies, including the entry point file. Structure within this directory will be preserved when running on SageMaker.
* sagemaker_session: The SageMaker Session object, used for SageMaker interaction

Your model data must be a .tar.gz file in S3. SageMaker Training Job model data is saved to .tar.gz files in S3, however if you have local data you want to deploy, you can prepare the data yourself.
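
If you do need to prepare the archive yourself, Python's standard `tarfile` module is enough. The sketch below packs the files of a local directory at the root of a `.tar.gz`; the helper name `package_model` and the file name `model.pth` are hypothetical, and a placeholder file stands in for real weights.

```python
import os
import tarfile
import tempfile


def package_model(model_dir, archive_path):
    """Pack every file in model_dir at the root of a .tar.gz archive."""
    with tarfile.open(archive_path, 'w:gz') as tar:
        for name in sorted(os.listdir(model_dir)):
            tar.add(os.path.join(model_dir, name), arcname=name)
    return archive_path


# Demonstration with a placeholder file instead of real weights.
workdir = tempfile.mkdtemp()
model_dir = os.path.join(workdir, 'model')
os.makedirs(model_dir)
with open(os.path.join(model_dir, 'model.pth'), 'wb') as f:
    f.write(b'placeholder')

archive = package_model(model_dir, os.path.join(workdir, 'model.tar.gz'))
with tarfile.open(archive) as tar:
    print(tar.getnames())
```

The resulting file could then be uploaded to S3, for example with `sagemaker_session.upload_data`, and its S3 location passed as `model_data`.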

### Create endpoint

Now the model is ready to be deployed at a SageMaker endpoint, and we are going to use the `sagemaker.pytorch.model.PyTorchModel.deploy` method to do this. We can use a CPU-based instance for inference (in this case an ml.m4.xlarge), even though we trained on GPU instances, because at the end of training we moved the model to the CPU before saving it. This way we can load the trained model on any device and then move it to the GPU if CUDA is available. 


In [None]:
predictor = model.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')

### Evaluate
We are going to use our deployed model to generate text by providing a random seed, a temperature (higher values increase diversity), and the number of words we would like to get.

[Temperature](https://cs.stackexchange.com/questions/79241/what-is-temperature-in-lstm-and-neural-networks-generally) -

Temperature is a hyperparameter of LSTMs (and neural networks generally) used to control the randomness of predictions by scaling the logits before applying softmax.
It is used in generate.py here - 

```python
word_weights = output.squeeze().div(input_data['temperature']).exp().cpu()
```
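
The effect of dividing by the temperature can be seen on a toy example. The sketch below applies the same divide-then-exponentiate step to a short list of made-up scores in pure Python. The excerpt above stops after exponentiation; the normalization here is added only so that the values read as probabilities.

```python
import math


def temperature_weights(logits, temperature):
    """Scale logits by 1/temperature, exponentiate, and normalize."""
    scaled = [math.exp(x / temperature) for x in logits]
    total = sum(scaled)
    return [s / total for s in scaled]


logits = [2.0, 1.0, 0.0]  # hypothetical scores for three words
sharp = temperature_weights(logits, 0.5)
flat = temperature_weights(logits, 2.0)
print([round(p, 3) for p in sharp])
print([round(p, 3) for p in flat])
```

A low temperature concentrates probability on the top-scoring word, while a high temperature flattens the distribution and makes less likely words more probable.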

Seed - the same idea as the training seed we saw before (1111 there); the request below uses 111.

```python
torch.manual_seed(input_data['seed'])
```
Words - used to loop and generate words

```python
for i in range(input_data['words']):
```




In [None]:
# Send a generation request to the endpoint.
payload = {
    'seed': 111,
    'temperature': 2.0,
    'words': 100
}
response = predictor.predict(payload)
print(response)

### Lookup logs 

You can check what your container is up to using the logs. In CloudWatch they will look something like this:

```bash
2019-11-15 18:50:48,321 generate     INFO     Deserializing the input data.
2019-11-15 18:50:48,321 generate     INFO     Generating text based on input parameters.
2019-11-15 18:50:48,321 generate     INFO     Current device: cpu
2019-11-15 18:50:48,343 generate     INFO     Generating 100 words.
10.32.0.2 - - [15/Nov/2019:18:50:49 +0000] "GET /ping HTTP/1.1" 200 0 "-" "AHC/2.0"
2019-11-15 18:50:49,664 generate     INFO     Serializing the generated output.
```


### Cleanup

After you have finished with this example, remember to delete the prediction endpoint to release the instance(s) associated with it.


In [None]:
sagemaker_session.delete_endpoint(predictor.endpoint)

### Suggested exercise

Train GPT-2 in a similar setting and generate text, starting from https://github.com/graykode/gpt-2-Pytorch