# Creating a Richard Feynman bot with GPT-2
In this notebook, we're going to fine-tune [OpenAI's GPT-2 text generation model](https://github.com/openai/gpt-2) on a custom text, and then export the trained model. You can use any text you'd like. We chose Richard Feynman's lectures because we're fans, and the lectures are [freely available](http://www.feynmanlectures.caltech.edu/I_toc.html).

At the end of this tutorial, we're going to deploy our trained model as a web API on AWS using Cortex.

Cortex is 100% free and open source. All we ask is that, if you like this tutorial, you leave a star on the [Cortex repository](https://github.com/cortexlabs/cortex/).

Also, a quick thank-you: this tutorial uses Max Woolf's incredible [gpt-2-simple](https://github.com/minimaxir/gpt-2-simple) to train GPT-2.

# Initial Setup

To begin, we'll install the required packages:

In [0]:
!pip install tensorflow==1.14.* numpy==1.* boto3==1.* gpt-2-simple

In [0]:
import sys
import os
import time
import json
import numpy as np
import tensorflow as tf
from tensorflow.python.saved_model.signature_def_utils_impl import predict_signature_def
import gpt_2_simple as gpt2
from datetime import datetime
from google.colab import files

Thanks to gpt-2-simple, we can download the GPT-2 model with one command. GPT-2 comes in several sizes; we're going to use the smallest one (`124M`) here for simplicity and speed.

In [0]:
gpt2.download_gpt2(model_name="124M")

Next, we are going to mount our Google Drive in Colab so that we can easily upload our text file and export our model.

In [0]:
gpt2.mount_gdrive()

Finally, we will upload our text file. You can upload a file by clicking into the expanded pane on the left side of your notebook in Colab, selecting the *Files* tab, and uploading. Once your file is uploaded, input its name into the following cell and click run:

In [0]:
file_name = "feynmann.txt"
gpt2.copy_file_from_gdrive(file_name)

# Training GPT-2

Training GPT-2 is incredibly easy thanks to gpt-2-simple. We're not going to worry about the parameters here, but if you want to dig deeper, check out the gpt-2-simple documentation. Simply run the following code to fine-tune your model:

In [0]:
sess = gpt2.start_tf_sess()

gpt2.finetune(sess,
              dataset=file_name,
              model_name='124M',
              steps=200,
              restore_from='fresh',
              run_name='run1',
              print_every=10,
              sample_every=200,
              save_every=100,
              )

Once your model is trained, you can save it to your Google Drive with the following cell:

In [0]:
gpt2.copy_checkpoint_to_gdrive(run_name='run1')

Done! Now you can export your model.


# Exporting your GPT-2 model

The following code will export your model for TensorFlow Serving, which will allow Cortex to deploy it as a web API on AWS.

In [0]:
from gpt_2_simple.src import model, sample, encoder, memory_saving_gradients

def export_for_serving(
    model_name='run1',
    seed=None,
    batch_size=1,
    length=None,
    temperature=1,
    top_k=0,
    models_dir='/content/drive/My Drive/checkpoint'
):
    """
    Export the model for TF Serving
    :model_name='run1' : String, name of the checkpoint folder (fine-tuning run) to export
    :seed=None : Integer seed for random number generators, fix seed to reproduce
     results
    :length=None : Number of tokens in generated text, if None (default), is
     determined by model hyperparameters
    :temperature=1 : Float value controlling randomness in boltzmann
     distribution. Lower temperature results in less random completions. As the
     temperature approaches zero, the model will become deterministic and
     repetitive. Higher temperature results in more random completions.
    :top_k=0 : Integer value controlling diversity. 1 means only 1 word is
     considered for each step (token), resulting in deterministic completions,
     while 40 means 40 words are considered at each step. 0 (default) is a
     special setting meaning no restrictions. 40 generally is a good value.
    :models_dir : path to parent folder containing model subfolders
     (i.e. contains the <model_name> folder)
    """
    hparams = model.default_hparams()
    with open(os.path.join(models_dir, model_name, 'hparams.json')) as f:
        hparams.override_from_dict(json.load(f))

    if length is None:
        length = hparams.n_ctx
    elif length > hparams.n_ctx:
        raise ValueError("Can't get samples longer than window size: %s" % hparams.n_ctx)

    with tf.Session(graph=tf.Graph()) as sess:
        context = tf.placeholder(tf.int32, [batch_size, None])
        np.random.seed(seed)
        tf.set_random_seed(seed)

        output = sample.sample_sequence(
            hparams=hparams, length=length,
            context=context,
            batch_size=batch_size,
            temperature=temperature, top_k=top_k
        )

        saver = tf.train.Saver()
        ckpt = tf.train.latest_checkpoint(os.path.join(models_dir, model_name))
        saver.restore(sess, ckpt)

        export_dir=os.path.join(models_dir, model_name, "export", str(time.time()).split('.')[0])
        if not os.path.isdir(export_dir):
            os.makedirs(export_dir)

        builder = tf.saved_model.builder.SavedModelBuilder(export_dir)
        signature = predict_signature_def(inputs={'context': context},
                                          outputs={'sample': output})

        builder.add_meta_graph_and_variables(sess,
                                     [tf.saved_model.SERVING],
                                     signature_def_map={"predict": signature},
                                     strip_default_attrs=True)
        builder.save()


export_for_serving(top_k=40, length=256, model_name='run1')
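
To make the `temperature` and `top_k` settings described in the docstring more concrete, here is a minimal pure-Python sketch of how they reshape a next-token distribution. It is an illustration only, not the code gpt-2-simple runs: the function name `next_token_probs` and the toy scores are assumptions made for this example.

```python
import math


def next_token_probs(logits, temperature=1.0, top_k=0):
    """Turn raw token scores into sampling probabilities (illustrative only)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Dividing by the temperature sharpens (<1) or flattens (>1) the distribution.
    scaled = [x / temperature for x in logits]
    if top_k > 0:
        # Keep only the k highest scores; every other token gets zero probability.
        cutoff = sorted(scaled, reverse=True)[min(top_k, len(scaled)) - 1]
        scaled = [x if x >= cutoff else float("-inf") for x in scaled]
    # Numerically stable softmax over the remaining scores.
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    return [w / total for w in weights]


toy_scores = [2.0, 1.0, 0.0]  # made-up scores for a three-token vocabulary
print(next_token_probs(toy_scores))                   # unrestricted sampling
print(next_token_probs(toy_scores, temperature=0.5))  # sharper, less random
print(next_token_probs(toy_scores, top_k=1))          # only the best token survives
```

With `top_k=1` the highest-scoring token is always chosen, which is why the docstring calls that setting deterministic.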

## Upload the model to AWS

Cortex loads models from Amazon S3, so we need to upload the exported model there.

Set these variables to configure your AWS credentials and model upload path:

In [0]:
AWS_ACCESS_KEY_ID = "" #@param {type:"string"}
AWS_SECRET_ACCESS_KEY = "" #@param {type:"string"}
S3_UPLOAD_PATH = "" #@param {type:"string"}

import sys
import re

if AWS_ACCESS_KEY_ID == "":
    print("\033[91m {}\033[00m".format("ERROR: Please set AWS_ACCESS_KEY_ID"), file=sys.stderr)

elif AWS_SECRET_ACCESS_KEY == "":
    print("\033[91m {}\033[00m".format("ERROR: Please set AWS_SECRET_ACCESS_KEY"), file=sys.stderr)

else:
    try:
        bucket = re.search("s3://(.+?)/", S3_UPLOAD_PATH).group(1)
        key = re.search("s3://.+?/(.+)", S3_UPLOAD_PATH).group(1)
    except AttributeError:  # re.search returned None, so the path did not match
        print("\033[91m {}\033[00m".format("ERROR: Invalid s3 path (should be of the form s3://my-bucket/path/to/file)"), file=sys.stderr)
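
If you prefer not to rely on regular expressions, the same bucket/key split can be sketched with the standard library's URL parser. The helper name `split_s3_path` is hypothetical, and unlike the cell above it strips leading and trailing slashes from the key.

```python
from urllib.parse import urlparse


def split_s3_path(s3_path):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    parsed = urlparse(s3_path)
    bucket = parsed.netloc
    key = parsed.path.strip("/")
    # Reject anything that is not an s3:// URL with both a bucket and a key.
    if parsed.scheme != "s3" or not bucket or not key:
        raise ValueError("expected s3://my-bucket/path/to/file, got %r" % s3_path)
    return bucket, key


print(split_s3_path("s3://my-bucket/path/to/file"))
```

Raising a `ValueError` instead of printing means an invalid path stops the notebook early, before any upload is attempted.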

Upload the model to S3:

In [0]:
import os
import boto3

s3 = boto3.client("s3", aws_access_key_id=AWS_ACCESS_KEY_ID, aws_secret_access_key=AWS_SECRET_ACCESS_KEY)

MODEL_NAME = 'gpt-2'

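# Local export directory written by export_for_serving (note the leading slash)
export_root = "/content/drive/My Drive/checkpoint/run1/export"

for dirpath, _, filenames in os.walk(export_root):
    for filename in filenames:
        filepath = os.path.join(dirpath, filename)
        # Keep the path relative to export_root so the SavedModel layout
        # (<timestamp>/saved_model.pb, <timestamp>/variables/...) survives on S3
        filekey = os.path.join(key, MODEL_NAME, os.path.relpath(filepath, export_root))
        print("Uploading s3://{}/{} ...".format(bucket, filekey), end = '')
        s3.upload_file(filepath, bucket, filekey)
        print(" ✓")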

print("\nUploaded model export directory to {}/{}".format(S3_UPLOAD_PATH, MODEL_NAME))
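
A SavedModel export is a small directory tree (a `saved_model.pb` file plus a `variables/` folder), so it can help to preview which S3 keys an upload would create before sending anything. The sketch below only walks a local directory and computes keys; it never contacts AWS. The helper name `plan_uploads` and the example prefix are assumptions for illustration.

```python
import os
import posixpath


def plan_uploads(local_root, key_prefix):
    """Return sorted (local_path, s3_key) pairs for every file under local_root."""
    plan = []
    for dirpath, _, filenames in os.walk(local_root):
        for filename in filenames:
            local_path = os.path.join(dirpath, filename)
            # Path relative to the export root, e.g. "<timestamp>/variables/variables.index".
            relative = os.path.relpath(local_path, local_root)
            # S3 keys always use forward slashes, whatever the local OS uses.
            s3_key = posixpath.join(key_prefix, *relative.split(os.sep))
            plan.append((local_path, s3_key))
    return sorted(plan)


# Example: preview the keys for the current directory under a made-up prefix.
for local_path, s3_key in plan_uploads(".", "models/gpt-2"):
    print(local_path, "->", s3_key)
```

Each pair can then be passed to `s3.upload_file(local_path, bucket, s3_key)` once the plan looks right.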

We also need to upload `vocab.bpe` and `encoder.json`, so that the [encoder](https://github.com/cortexlabs/cortex/blob/master/examples/text-generator/encoder.py) in the [pre-inference request handler](https://github.com/cortexlabs/cortex/blob/master/examples/text-generator/handler.py) can encode the input text before making a request to the model.

In [0]:
print("Uploading s3://{}/{}/vocab.bpe ...".format(bucket, key), end = '')
s3.upload_file("/content/drive/My Drive/checkpoint/run1/vocab.bpe", bucket, os.path.join(key, "vocab.bpe"))
print(" ✓")

print("Uploading s3://{}/{}/encoder.json ...".format(bucket, key), end = '')
s3.upload_file("/content/drive/My Drive/checkpoint/run1/encoder.json", bucket, os.path.join(key, "encoder.json"))
print(" ✓")

And that's it! Your model is uploaded to S3, and you're ready to spin up a Cortex deployment.

# Deploying with Cortex

Deploying a model with Cortex is fairly straightforward. To deploy GPT-2, we need to do the following:

1. Install Cortex, if you haven't already.
2. Create a handler script. This will take data sent to our API, wrangle it to be processable by our model, and then format the model's output before returning it to the user.
3. Configure our Cortex deployment.

Installing Cortex is a quick process; just follow the [installation guide](https://www.cortex.dev/install).

Once Cortex is installed, we need to create our handler script. For our purposes, this handler script will be very simple. The code below is not meant to be run in this notebook, but you can copy and paste it into your own `handler.py` file, or download it from [this gist](https://gist.github.com/caleb-kaiser/e8d954987abdd96ca6db163c38f302f2).

```
from encoder import get_encoder
encoder = get_encoder()


def pre_inference(sample, metadata):
    context = encoder.encode(sample["text"])
    return {"context": [context]}


def post_inference(prediction, metadata):
    response = prediction["sample"]
    return encoder.decode(response)
```
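
To see the data flow through the two handler functions without deploying anything, here is a small self-contained sketch. `ToyEncoder` is a hypothetical stand-in that assigns one ID per whitespace-separated word; the real `encoder.py` uses GPT-2's byte-pair encoding. The fake prediction also assumes the model returns a flat list of token IDs under `"sample"`, which may differ from what your deployment returns.

```python
class ToyEncoder:
    """Hypothetical stand-in for the real BPE encoder: one ID per word."""

    def __init__(self):
        self.word_to_id = {}
        self.id_to_word = {}

    def encode(self, text):
        ids = []
        for word in text.split():
            if word not in self.word_to_id:
                new_id = len(self.word_to_id)
                self.word_to_id[word] = new_id
                self.id_to_word[new_id] = word
            ids.append(self.word_to_id[word])
        return ids

    def decode(self, ids):
        return " ".join(self.id_to_word[i] for i in ids)


encoder = ToyEncoder()


def pre_inference(sample, metadata):
    # Text in, token IDs out, wrapped in a list to form a batch of one.
    context = encoder.encode(sample["text"])
    return {"context": [context]}


def post_inference(prediction, metadata):
    # Token IDs in, text out.
    return encoder.decode(prediction["sample"])


request = {"text": "machine learning"}
model_input = pre_inference(request, metadata=None)
# Pretend the model simply echoed the prompt back (assumption for this demo).
fake_prediction = {"sample": model_input["context"][0]}
print(model_input)
print(post_inference(fake_prediction, metadata=None))
```

The `"context"` and `"sample"` keys match the input and output names declared in the `predict_signature_def` call during export.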



With our handler script set up, we can now configure our Cortex deployment. Cortex reads a configuration file named `cortex.yaml` and uses it as a blueprint for spinning up your API on AWS. For this API, our `cortex.yaml` will look like this (as before, you can download the file directly from [this gist](https://gist.github.com/caleb-kaiser/39ba06b8b9ff642c37e29d302ecb34cf)):

```
- kind: deployment
  name: text

- kind: api
  name: generator
  model: s3://your/bucket/model
  request_handler: handler.py
```

Once your `cortex.yaml` file is created, you can launch your deployment by running `cortex deploy` from your command line.

Once this command is run, Cortex containerizes the model, makes it servable using TensorFlow Serving, exposes the endpoint with a load balancer, and orchestrates the workload on Kubernetes.

You can track the status of a deployment using `cortex get`:

```
$ cortex get generator --watch

status   up-to-date   available   requested   last update   avg latency
live     1            1           1           8s            —
```

The output above indicates that one replica of the API was requested and one replica is available to serve predictions. Cortex will automatically launch more replicas if the load increases and spin down replicas if there is unused capacity.

Congratulations! Your GPT-2 model is trained and deployed as an API.

# Testing your GPT-2 API

Now that your API is live, you can test it by running these commands:
```
$ cortex get generator

url: http://***.amazonaws.com/text/generator

$ curl http://***.amazonaws.com/text/generator \
    -X POST -H "Content-Type: application/json" \
    -d '{"text": "machine learning"}'

Machine learning, with more than one thousand researchers around the world today, are looking to create computer-driven machine learning algorithms that can also be applied to human and social problems, such as education, health care, employment, medicine, politics, or the environment...
```
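
If you would rather call the API from Python than from `curl`, the same request can be sketched with the standard library. The helper name `build_request` and the `example.com` URL are placeholders; substitute the URL printed by `cortex get generator`. The sketch builds the request without sending it, so it is safe to run before your API is live.

```python
import json
import urllib.request


def build_request(url, text):
    """Build the same POST request as the curl command, without sending it."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URL: replace it with the one shown by `cortex get generator`.
request = build_request("http://example.com/text/generator", "machine learning")
print(request.get_method(), request.full_url)

# To actually call the API, uncomment the following lines:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```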

That's it. You now have a functional API that can receive text, run it through GPT-2, and return a response.

If you have any questions, feel free to [reach out to us directly](https://gitter.im/cortexlabs/cortex). If you enjoyed this tutorial at all, don't forget to leave a star on the [Cortex repository](https://github.com/cortexlabs/cortex).



