# Exporting GPT-2
In this notebook, we'll show how to export [OpenAI's GPT-2 text generation model](https://github.com/openai/gpt-2) for serving.

First, we'll download the GPT-2 code repository:

In [None]:
!apt update
!apt install -y git
!git clone --no-checkout https://github.com/openai/gpt-2.git
!cd gpt-2 && git reset --hard ac5d52295f8a1c3856ea24fb239087cc1a3d1131

Then we install the Python dependencies:

In [None]:
!pip3 install requests tqdm tensorflow==1.14.* numpy==1.* boto3==1.*

Next, we specify the model size (one of 124M, 355M, or 774M). The 124M model fits comfortably on an 8 GB GPU such as a GeForce GTX 1070 Ti for serving, so we pick it here:

In [None]:
import sys

MODEL_SIZE = "124M" 

if MODEL_SIZE not in {"124M", "355M", "774M"}:
    print("\033[91m{}\033[00m".format('ERROR: MODEL_SIZE must be "124M", "355M", or "774M"'), file=sys.stderr)

We can use `download_model.py` to download the model:

In [None]:
!python3 ./gpt-2/download_model.py $MODEL_SIZE
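
Before exporting, it can help to confirm that the download finished. The following is a minimal sketch, not part of the GPT-2 repository: `missing_model_files` and `EXPECTED_FILES` are hypothetical names, and the file list assumes that `download_model.py` at the pinned commit writes these seven files into `models/<size>` under the current directory.

```python
import os

# Files that download_model.py is assumed to fetch for each model size.
EXPECTED_FILES = (
    "checkpoint",
    "encoder.json",
    "hparams.json",
    "model.ckpt.data-00000-of-00001",
    "model.ckpt.index",
    "model.ckpt.meta",
    "vocab.bpe",
)


def missing_model_files(model_size, models_dir="models"):
    """Return the expected files that are absent from models_dir/model_size."""
    model_dir = os.path.join(models_dir, model_size)
    return [
        name
        for name in EXPECTED_FILES
        if not os.path.isfile(os.path.join(model_dir, name))
    ]
```

Calling `missing_model_files(MODEL_SIZE)` in the notebook should return an empty list once the download has completed; any names it returns point to files worth re-downloading.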

In [None]:
import sys
import os
import time
import json
import numpy as np
import tensorflow as tf
from tensorflow.python.saved_model.signature_def_utils_impl import predict_signature_def

Now we can export the model for serving:

In [None]:
sys.path.append(os.path.join(os.getcwd(), 'gpt-2/src'))
import model, sample

def export_for_serving(
    model_name='124M',
    seed=None,
    batch_size=1,
    length=None,
    temperature=1,
    top_k=0,
    models_dir='models'
):
    """
    Export the model for TF Serving
    :model_name=124M : String, which model to use
    :seed=None : Integer seed for random number generators, fix seed to reproduce
     results
    :batch_size=1 : Number of sequences per request; fixes the first dimension
     of the `context` input
    :length=None : Number of tokens in generated text, if None (default), is
     determined by model hyperparameters
    :temperature=1 : Float value controlling randomness in Boltzmann
     distribution. Lower temperature results in less random completions. As the
     temperature approaches zero, the model will become deterministic and
     repetitive. Higher temperature results in more random completions.
    :top_k=0 : Integer value controlling diversity. 1 means only 1 word is
     considered for each step (token), resulting in deterministic completions,
     while 40 means 40 words are considered at each step. 0 (default) is a
     special setting meaning no restrictions. 40 generally is a good value.
    :models_dir='models' : Path to parent folder containing model subfolders
     (i.e. contains the <model_name> folder)
    """
    models_dir = os.path.expanduser(os.path.expandvars(models_dir))
    export_dir = '/models'

    hparams = model.default_hparams()
    with open(os.path.join(models_dir, model_name, 'hparams.json')) as f:
        hparams.override_from_dict(json.load(f))

    if length is None:
        length = hparams.n_ctx
    elif length > hparams.n_ctx:
        raise ValueError("Can't get samples longer than window size: %s" % hparams.n_ctx)

    with tf.Session(graph=tf.Graph()) as sess:
        context = tf.placeholder(tf.int32, [batch_size, None])
        np.random.seed(seed)
        tf.set_random_seed(seed)

        output = sample.sample_sequence(
            hparams=hparams, length=length,
            context=context,
            batch_size=batch_size,
            temperature=temperature, top_k=top_k
        )

        saver = tf.train.Saver()
        ckpt = tf.train.latest_checkpoint(os.path.join(models_dir, model_name))
        saver.restore(sess, ckpt)

        # Use the current Unix timestamp as the version directory name.
        export_dir = os.path.join(export_dir, model_name, str(int(time.time())))
        if not os.path.isdir(export_dir):
            os.makedirs(export_dir)

        builder = tf.saved_model.builder.SavedModelBuilder(export_dir)
        signature = predict_signature_def(inputs={'context': context},
                                          outputs={'sample': output})

        builder.add_meta_graph_and_variables(sess,
                                             [tf.saved_model.SERVING],
                                             signature_def_map={"predict": signature},
                                             strip_default_attrs=True)
        builder.save()


export_for_serving(top_k=40, length=20, model_name=MODEL_SIZE)
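
The `temperature` and `top_k` arguments are baked into the exported graph, so it is worth seeing what they do to the next-token distribution. The sketch below is a plain-Python illustration, not the code from `sample.py`: `next_token_probs` is a hypothetical helper, and the assumed order (divide logits by the temperature, then mask everything below the k-th largest logit, then apply softmax) mirrors the upstream sampler as we understand it.

```python
import math


def next_token_probs(logits, temperature=1.0, top_k=0):
    """Turn raw logits into sampling probabilities (illustrative only)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Lower temperatures stretch the gaps between logits, sharpening the result.
    scaled = [x / temperature for x in logits]
    if top_k > 0:
        # Keep the k largest logits (ties included); mask the rest.
        kth = sorted(scaled, reverse=True)[min(top_k, len(scaled)) - 1]
        scaled = [x if x >= kth else -math.inf for x in scaled]
    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

With `top_k=1` all probability mass lands on the single best token, which is why that setting gives deterministic completions, while `top_k=0` leaves the distribution unrestricted.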

Verify that the model was written to the shared filesystem at `/models`:

In [15]:
! ls -lR /models/


/models/:
total 1
drwxr-xr-x 4 root root 499340732 Feb 10 04:23 124M

/models/124M:
total 1
drwxr-xr-x 3 root root 499340732 Feb 10 04:20 1581308418
drwxr-xr-x 3 root root 499340732 Feb 10 04:23 1581308605

/models/124M/1581308418:
total 1540
-rw-r--r-- 1 root root   1576285 Feb 10 04:20 saved_model.pb
drwxr-xr-x 2 root root 497764447 Feb 10 04:20 variables

/models/124M/1581308418/variables:
total 486099
-rw-r--r-- 1 root root 497759232 Feb 10 04:20 variables.data-00000-of-00001
-rw-r--r-- 1 root root      5215 Feb 10 04:20 variables.index

/models/124M/1581308605:
total 1540
-rw-r--r-- 1 root root   1576285 Feb 10 04:23 saved_model.pb
drwxr-xr-x 2 root root 497764447 Feb 10 04:23 variables

/models/124M/1581308605/variables:
total 486099
-rw-r--r-- 1 root root 497759232 Feb 10 04:23 variables.data-00000-of-00001
-rw-r--r-- 1 root root      5215 Feb 10 04:23 variables.index
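
The same check can be done programmatically. This is a rough sketch with hypothetical helper names (`latest_export`, `is_complete_export`); it assumes the layout shown in the listing above, where each run writes a numeric timestamp directory containing `saved_model.pb` and a `variables` folder. TensorFlow Serving loads the highest-numbered version by default, so that is the one worth inspecting.

```python
import os


def latest_export(model_dir):
    """Return the path of the highest-numbered version directory, or None."""
    versions = [
        name
        for name in os.listdir(model_dir)
        if name.isdigit() and os.path.isdir(os.path.join(model_dir, name))
    ]
    if not versions:
        return None
    return os.path.join(model_dir, max(versions, key=int))


def is_complete_export(version_dir):
    """Check that the graph file and the variables index are both present."""
    has_graph = os.path.isfile(os.path.join(version_dir, "saved_model.pb"))
    has_index = os.path.isfile(
        os.path.join(version_dir, "variables", "variables.index")
    )
    return has_graph and has_index
```

For the listing above, `latest_export("/models/124M")` would point at the `1581308605` directory, and `is_complete_export` on that path should return `True`.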


That's it! We can now continue to deploy the Inference Service.
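
For orientation, here is a hedged sketch of how a client might address the exported `predict` signature once the service is running. It assumes the service speaks TensorFlow Serving's REST predict protocol in row format; `build_predict_request` and `strip_prompt` are hypothetical helpers, and the token IDs must come from the GPT-2 encoder, which is not shown here. Because the graph was exported with `batch_size=1`, the request carries exactly one instance.

```python
import json


def build_predict_request(token_ids, signature_name="predict"):
    """Build a JSON body with one instance for the 'context' input."""
    if not token_ids:
        raise ValueError("token_ids must contain at least one token")
    body = {
        "signature_name": signature_name,
        "instances": [{"context": list(token_ids)}],
    }
    return json.dumps(body)


def strip_prompt(sample_row, prompt_length):
    """Drop the echoed prompt; 'sample' holds the context plus new tokens."""
    return sample_row[prompt_length:]
```

The `sample` output returned by the model starts with the prompt tokens, so `strip_prompt` keeps only the newly generated ones before they are decoded back to text.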

*This notebook is derived from original work by [Cortex](https://www.cortex.dev)*