# This is the Jupyter Notebook run in Google Colab during the GPT-2 Workshop at the Data Science Minneapolis event on January 25th, 2020.

First, we install the gpt-2-simple Python library with pip. It is a wrapper around the GPT-2 source code that OpenAI published on GitHub.

In [0]:
!pip install -q gpt-2-simple

With the library installed, we can import it into the notebook.

In [0]:
import gpt_2_simple as gpt2

The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.



Here we choose which model size to download and use. 1558M is the largest model and 774M is the second largest. The larger the model, the longer it takes to download, load, and run inference.

In [0]:
model_size = "1558M"
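
A typo in the model name only surfaces once the download starts, so it can help to check the name up front. The sketch below is a minimal, hypothetical helper and not part of gpt-2-simple. It assumes the four checkpoint names listed in the gpt-2-simple README (this notebook itself only mentions 774M and 1558M), and the names `KNOWN_MODEL_SIZES`, `validate_model_size`, and `next_smaller` are made up for illustration.

```python
# Checkpoint names ordered from smallest to largest.
# Assumption: taken from the gpt-2-simple README; this notebook only
# mentions "774M" and "1558M".
KNOWN_MODEL_SIZES = ("124M", "355M", "774M", "1558M")


def validate_model_size(name):
    """Return `name` unchanged if it is a known checkpoint, else raise ValueError."""
    if name not in KNOWN_MODEL_SIZES:
        raise ValueError(
            f"Unknown model size {name!r}; expected one of {KNOWN_MODEL_SIZES}"
        )
    return name


def next_smaller(name):
    """Return the next smaller checkpoint name, or None if `name` is the smallest."""
    idx = KNOWN_MODEL_SIZES.index(validate_model_size(name))
    return KNOWN_MODEL_SIZES[idx - 1] if idx > 0 else None
```

In the notebook this could be used as `model_size = validate_model_size("1558M")`, with `next_smaller(model_size)` as a fallback if the largest model turns out to be too slow or too large for the Colab runtime.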

Next, we download the model files to local file storage.

In [0]:
gpt2.download_gpt2(model_name=model_size)

Fetching checkpoint: 1.05Mit [00:00, 533Mit/s]                                                      
Fetching encoder.json: 1.05Mit [00:00, 130Mit/s]                                                    
Fetching hparams.json: 1.05Mit [00:00, 768Mit/s]                                                    
Fetching model.ckpt.data-00000-of-00001: 6.23Git [02:05, 49.8Mit/s]                                 
Fetching model.ckpt.index: 1.05Mit [00:00, 399Mit/s]                                                
Fetching model.ckpt.meta: 2.10Mit [00:00, 194Mit/s]                                                 
Fetching vocab.bpe: 1.05Mit [00:00, 241Mit/s]                                                       
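
The largest checkpoint took about two minutes to fetch in the log above, so it may be worth skipping the download when the files are already present, for example after re-running the notebook in the same Colab session. The following is a rough sketch rather than library functionality: `model_is_downloaded` and `EXPECTED_FILES` are hypothetical names, the file list is copied from the download log, and the `models` directory is assumed from the path that appears in the load log (`models/1558M/model.ckpt`).

```python
import os

# File names copied from the download log in this notebook.
EXPECTED_FILES = (
    "checkpoint",
    "encoder.json",
    "hparams.json",
    "model.ckpt.data-00000-of-00001",
    "model.ckpt.index",
    "model.ckpt.meta",
    "vocab.bpe",
)


def model_is_downloaded(model_name, models_dir="models"):
    """Return True only if every expected file exists under models_dir/model_name."""
    model_path = os.path.join(models_dir, model_name)
    return all(
        os.path.isfile(os.path.join(model_path, file_name))
        for file_name in EXPECTED_FILES
    )


# Possible guard around the download cell (not executed here):
# if not model_is_downloaded(model_size):
#     gpt2.download_gpt2(model_name=model_size)
```

This check only confirms that the files exist. It does not detect a partially downloaded or corrupted checkpoint.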


Now we need to start a TensorFlow session so that we can run the TensorFlow model.

In [0]:
sess = gpt2.start_tf_sess()

With the model downloaded to local file storage, we can load it into memory.

In [0]:
gpt2.load_gpt2(sess, model_name=model_size)

Loading pretrained model models/1558M/model.ckpt
INFO:tensorflow:Restoring parameters from models/1558M/model.ckpt


Here we define the text we want the model to continue. It will be passed to the `prefix` parameter of the `generate()` function.

In [0]:
text = "Data Science Minneapolis is ecstatic from the generous contributions from Google Colab's department"

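`generate()` is the function that generates the text. We pass it the TensorFlow session, the model we selected (`model_name`), and the text we want it to continue (`prefix`). The remaining arguments shape the output: `length` is the number of tokens to generate, `temperature` and `top_p` control how random the sampling is, and `nsamples` and `batch_size` set how many samples are produced and how many are generated at a time.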

In [0]:
gpt2.generate(
    sess,
    model_name=model_size,
    prefix=text,
    length=100,
    temperature=0.7,
    top_p=0.9,
    nsamples=1,
    batch_size=1,
)

Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Data Science Minneapolis is ecstatic from the generous contributions from Google Colab's department of Data Science and Analytics.

This is just the latest in a series of initiatives for the city, including the Google Fiber, the first city in the United States to have gigabit internet speeds.

The Google Fiber project has resulted in a number of innovative uses for the fiber, including a data center in the heart of downtown Minneapolis, which is also home to the University of Minnesota.

The latest announcement is a new partnership with the Minneapolis Institute of Arts and the University of Minnesota
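
The `generate()` call above sets `temperature = 0.7` and `top_p = 0.9`. To give a feel for what these two settings do, here is a simplified, pure-Python sketch of temperature scaling followed by top-p (nucleus) filtering on a toy list of logits. It illustrates the general technique only. It is not gpt-2-simple's or OpenAI's actual implementation, and the function names `apply_temperature`, `top_p_filter`, and `sample_token` are hypothetical.

```python
import math
import random


def apply_temperature(logits, temperature):
    """Divide logits by the temperature, then apply a softmax.

    Temperatures below 1 sharpen the distribution; above 1 they flatten it.
    """
    scaled = [x / temperature for x in logits]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - largest) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative probability
    reaches top_p, zero out the rest, and renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered


def sample_token(logits, temperature=0.7, top_p=0.9, rng=random):
    """Sample one token index from logits after temperature and top-p filtering."""
    probs = top_p_filter(apply_temperature(logits, temperature), top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

With toy logits such as `[2.0, 1.0, 0.0, -1.0]`, the two least likely tokens fall outside the 0.9 nucleus at temperature 0.7, so `sample_token` only ever returns index 0 or 1. Raising `temperature` or `top_p` lets less likely tokens through, which tends to make generated text more varied and less predictable.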
