Let’s write some Python code that uses all four of the released models to generate text. That will let us see how changes in model capacity relate to the quality of the text produced.

We download the GPT-2 library from OpenAI.

The OpenAI codebase lists the libraries it depends on in requirements.txt, so we install everything named in that file.

Then we download the four pre-trained models OpenAI made available, each roughly two to three times the size of the previous one.

In [2]:
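# Fetch the GPT-2 code and work from inside the repository.
!git clone https://github.com/openai/gpt-2.git
import os
os.chdir("gpt-2")

# Silence TensorFlow log messages and Python warnings.
import warnings
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
warnings.filterwarnings('ignore')

# Colab magic: select TensorFlow 1.x before installing the requirements.
%tensorflow_version 1.x
!pip3 install -r requirements.txt

# Download each of the four pre-trained models.
!python3 download_model.py 124M
!python3 download_model.py 345M
!python3 download_model.py 774M
!python3 download_model.py 1558M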

fatal: destination path 'gpt-2' already exists and is not an empty directory.
TensorFlow 1.x selected.
Fetching checkpoint: 1.00kit [00:00, 824kit/s]                                                      
Fetching encoder.json: 1.04Mit [00:00, 2.90Mit/s]                                                   
Fetching hparams.json: 1.00kit [00:00, 838kit/s]                                                    
Fetching model.ckpt.data-00000-of-00001: 498Mit [00:17, 27.7Mit/s]                                  
Fetching model.ckpt.index: 6.00kit [00:00, 4.54Mit/s]                                               
Fetching model.ckpt.meta: 472kit [00:00, 1.65Mit/s]                                                 
Fetching vocab.bpe: 457kit [00:00, 1.76Mit/s]                                                       
Fetching checkpoint: 1.00kit [00:00, 797kit/s]                                                      
Fetching encoder.json: 1.04Mit [00:00, 3.18Mit/s]                                        
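
To get a feel for how the four models step up in size, the growth between consecutive models can be checked with a little arithmetic. This is a minimal sketch that treats the model names as nominal parameter counts in millions, which is an assumption: the names are approximate labels, not exact counts.

```python
# Model names as passed to download_model.py above.
model_names = ["124M", "345M", "774M", "1558M"]

# Assumption: each name is a nominal parameter count in millions.
sizes_m = {name: int(name.rstrip("M")) for name in model_names}

# Ratio of each model's nominal size to the one before it.
ratios = {
    later: sizes_m[later] / sizes_m[earlier]
    for earlier, later in zip(model_names, model_names[1:])
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x the previous model")
```

Under that assumption, each step multiplies the nominal size by a factor between about 2 and 2.8.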

Next we import some additional libraries we'll be using in this notebook.

In [13]:
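import json
import numpy as np
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

# The code below loads models from '../models', so we work from gpt-2/src,
# where the repository's model, sample, and encoder modules live.
os.chdir("src")
import model, sample, encoder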


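We define an `autocomplete` function that takes a `model_name` and a `raw_text` prompt and returns the next `length` tokens of generated text. A token is a byte-pair-encoded piece of text, often a word or part of a word, so `length` is not exactly a word count.

We set up a session for talking to the TensorFlow backend and create a placeholder for the encoded prompt, along with the sampling operation that produces the model's output. We then restore the pre-trained weights from the model's checkpoint, which links the graph we built to the downloaded parameters. Once all of that is set up, we send our text prompt to the model for processing, pull out the generated tokens, and return them decoded as a string.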

In [14]:
# Return-a-string version

def autocomplete(model_name, raw_text, length):
    """Return `length` tokens of generated text that continue `raw_text`."""
    batch_size = 1
    temperature = 1  # sample from the model's unmodified distribution
    top_k = 0        # 0 disables top-k filtering
    seed = None      # no fixed seed, so output differs between runs
    models_dir = os.path.expanduser(os.path.expandvars('../models'))

    # Load the byte-pair encoder and the hyperparameters for this model size.
    enc = encoder.get_encoder(model_name, models_dir)
    hparams = model.default_hparams()
    with open(os.path.join(models_dir, model_name, 'hparams.json')) as f:
        hparams.override_from_dict(json.load(f))

    if length > hparams.n_ctx:
        raise ValueError("Can't get samples longer than window size: %s" % hparams.n_ctx)

    with tf.Session(graph=tf.Graph()) as sess:
        # Placeholder that will receive the encoded prompt.
        context = tf.placeholder(tf.int32, [batch_size, None])
        np.random.seed(seed)
        tf.set_random_seed(seed)
        output = sample.sample_sequence(
            hparams=hparams, length=length,
            context=context,
            batch_size=batch_size,
            temperature=temperature, top_k=top_k
        )

        # Restore the pre-trained weights from the latest checkpoint.
        saver = tf.train.Saver()
        ckpt = tf.train.latest_checkpoint(os.path.join(models_dir, model_name))
        saver.restore(sess, ckpt)

        # Encode the prompt, run the model, and drop the prompt tokens
        # from the front of the output.
        context_tokens = enc.encode(raw_text)
        out = sess.run(output, feed_dict={
            context: [context_tokens]
        })[:, len(context_tokens):]
        text = enc.decode(out[0])
    return text
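
The function fixes `temperature` at 1 and `top_k` at 0, which means sampling from the model's distribution as-is. The sketch below illustrates what those two knobs do in general, using a made-up set of logits. It is a simplified pure-Python illustration written for this notebook, not the GPT-2 repository's implementation, and the names `softmax`, `keep_top_k`, and `toy_logits` are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def keep_top_k(logits, k):
    """Mask all but the k largest logits; k == 0 leaves them unchanged."""
    if k == 0:
        return list(logits)
    cutoff = sorted(logits, reverse=True)[k - 1]
    return [x if x >= cutoff else float("-inf") for x in logits]

# Hypothetical logits for four candidate next tokens.
toy_logits = [2.0, 1.0, 0.5, -1.0]

plain = softmax(toy_logits, temperature=1.0)    # the setting used above
sharper = softmax(toy_logits, temperature=0.5)  # lower temperature
top2 = softmax(keep_top_k(toy_logits, 2))       # only two candidates survive

print("temperature=1.0:", [round(p, 3) for p in plain])
print("temperature=0.5:", [round(p, 3) for p in sharper])
print("top_k=2:        ", [round(p, 3) for p in top2])
```

In this sketch, lowering the temperature concentrates probability on the most likely token, and keeping only the top 2 logits gives every other candidate zero probability.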

Below is an example of our `autocomplete` function, printing out the next 10 predicted tokens.

In [15]:
print(autocomplete('124M', "Learning about machine learning is kind of like", 10))





Instructions for updating:
Use `tf.cast` instead.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Instructions for updating:
Use `tf.random.categorical` instead.
INFO:tensorflow:Restoring parameters from ../models/124M/model.ckpt
 getting out of bed and coding professionally. You just


Here we show how the predictions for a given phrase change with the number of parameters in the model.

In [16]:
for gpt2model in ['124M', '345M', '774M', '1558M']:
  print(gpt2model, autocomplete(gpt2model, "My first time visiting the ocean, I marveled at", 20))

INFO:tensorflow:Restoring parameters from ../models/124M/model.ckpt
124M  lounging in the land of Lake Chad (a portage that left little steam), which was
INFO:tensorflow:Restoring parameters from ../models/345M/model.ckpt
345M  what felt like all the riches I had missed out on in outdoor recreation. The Gulf of Aden seemed
INFO:tensorflow:Restoring parameters from ../models/774M/model.ckpt
774M  the hot view and expected that the tsunami, which was fledged from the sea about several days earlier
INFO:tensorflow:Restoring parameters from ../models/1558M/model.ckpt
1558M  the beauty and scope of the world, tried to conjure it with words from the books of my


In [17]:
for gpt2model in ['124M', '345M', '774M', '1558M']:
  print(gpt2model, autocomplete(gpt2model, "When I first arrived at college, I could not believe", 20))

INFO:tensorflow:Restoring parameters from ../models/124M/model.ckpt
124M  that the word "pedophile" would come around these days, but, at the time, I
INFO:tensorflow:Restoring parameters from ../models/345M/model.ckpt
345M  all this awesome students went to college, but looked upon their degree as not relevant anymore. This realization
INFO:tensorflow:Restoring parameters from ../models/774M/model.ckpt
774M  no one read Sorcery! It was so strange. It actually was quite top secret.


INFO:tensorflow:Restoring parameters from ../models/1558M/model.ckpt
1558M  how many boys wanted to kiss me. When I saw it on TV or in books, I laughed


In [19]:
for gpt2model in ['124M', '345M', '774M', '1558M']:
  print(gpt2model, autocomplete(gpt2model, "The color of broccoli is a deep", 1))

INFO:tensorflow:Restoring parameters from ../models/124M/model.ckpt
124M  red
INFO:tensorflow:Restoring parameters from ../models/345M/model.ckpt
345M ,
INFO:tensorflow:Restoring parameters from ../models/774M/model.ckpt
774M  shade
INFO:tensorflow:Restoring parameters from ../models/1558M/model.ckpt
1558M ,
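
Because `autocomplete` samples at random with no fixed seed, a single one-token completion per model is a noisy comparison. One possible way to get a steadier picture is to draw several samples and count how often each completion appears. The sketch below shows the idea with a stand-in generator so it runs without a model; `tally_completions` and `fake_generate` are hypothetical names, and the stand-in's token list is made up rather than taken from GPT-2.

```python
from collections import Counter
import random

def tally_completions(generate, prompt, n_samples):
    """Call generate(prompt) n_samples times and count each distinct result."""
    return Counter(generate(prompt) for _ in range(n_samples))

# Stand-in generator so the sketch runs without downloading a model.
# It is NOT GPT-2: it picks from a fixed, made-up list of tokens.
rng = random.Random(0)

def fake_generate(prompt):
    return rng.choice([" green", " green", " shade", ","])

counts = tally_completions(fake_generate, "The color of broccoli is a deep", 50)
print(counts.most_common())
```

With the real function, `generate` could be something like `lambda p: autocomplete('124M', p, 1)`. Each call builds a new graph and restores the checkpoint, so many samples from the larger models would be slow.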
