## Introduction

The procedure for generating text with the GPT-2 model is the same for every celebrity.

Steps:

1. Select all the texts belonging to the celebrity in question and combine them into one corpus.

2. Use that corpus to fine-tune the GPT-2 model.

3. After training the model, generate some sample text for the celebrity.

4. Export the generated text using pickle.
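
The four steps can be sketched as a single helper. This is a minimal sketch, not the notebook's own code: `run_for_celebrity` and its `train_and_sample` argument are hypothetical names, and steps 2 and 3 are passed in as a callable so that the example runs without GPT-2 installed.

```python
import pickle
from pathlib import Path


def run_for_celebrity(name, texts, train_and_sample, out_dir="."):
    """Run steps 1-4 for one celebrity and return the path of the pickle file.

    `train_and_sample` is a hypothetical callable standing in for steps 2 and 3:
    it receives the corpus file path and returns a list of generated texts.
    """
    out_dir = Path(out_dir)

    # Step 1: merge all texts of the celebrity into one corpus file.
    corpus_path = out_dir / f"{name}.txt"
    corpus_path.write_text("".join(str(t) for t in texts), encoding="utf-8")

    # Steps 2 and 3: fine-tune on the corpus and generate sample texts.
    samples = train_and_sample(str(corpus_path))

    # Step 4: export the generated texts with pickle.
    pickle_path = out_dir / f"{name}_text_generation.pickle"
    with open(pickle_path, "wb") as pickle_out:
        pickle.dump(samples, pickle_out)
    return pickle_path
```

In this notebook, `train_and_sample` would wrap the `gpt2.finetune` and `gpt2.generate` calls used in the per-celebrity sections.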

In [None]:
try:
  from google.colab import drive
  drive.mount('/content/drive')
except ImportError:
  # google.colab is only importable inside a Colab runtime
  print('Not running in Google Colab; Drive not mounted')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
# import libraries
import pickle
import time
import os

In [None]:
# import preprocessed dataset
with open('df.pickle', 'rb') as pickle_in:
  df = pickle.load(pickle_in)
df.head()


FileNotFoundError: ignored

In [None]:
df[df['Username']=='daniel tosh']

## Installing GPT-2

In [None]:
!pip install -q gpt_2_simple

  Building wheel for gpt-2-simple (setup.py) ... done


In [None]:
!pip install tensorflow==1.14


Collecting tensorflow==1.14
  Downloading https://files.pythonhosted.org/packages/f4/28/96efba1a516cdacc2e2d6d081f699c001d414cc8ca3250e6d59ae657eb2b/tensorflow-1.14.0-cp37-cp37m-manylinux1_x86_64.whl (109.3MB)
     |████████████████████████████████| 109.3MB 50kB/s 
Collecting tensorflow-estimator<1.15.0rc0,>=1.14.0rc0
  Downloading https://files.pythonhosted.org/packages/3c/d5/21860a5b11caf0678fbc8319341b0ae21a07156911132e0e71bffed0510d/tensorflow_estimator-1.14.0-py2.py3-none-any.whl (488kB)
     |████████████████████████████████| 491kB 42.1MB/s 
Collecting keras-applications>=1.0.6
  Downloading https://files.pythonhosted.org/packages/71/e3/19762fdfc62877ae9102edf6342d71b28fbfd9dea3d2f96a882ce099b03f/Keras_Applications-1.0.8-py3-none-any.whl (50kB)
     |████████████████████████████████| 51kB 6.3MB/s 
Collecting tensorboard<1.15.0,>=1.14.0
  Downloading https://files.pythonhosted.org/packages/91/2d/2ed263449a078cd9c8a9ba50ebd50123adf1f8cfbea1492f90841

In [None]:
# import tensorflow and gpt-2 libraries
import tensorflow as tf
import gpt_2_simple as gpt2
from datetime import datetime
from google.colab import files


In [None]:
# show names of celebrities
names = sorted(list(df['Username'].value_counts().index))
print(names)

In [None]:
# return text corpus for a given data frame
def corpus(df):
  # concatenate every entry of the 'Text (Model)' column into one string
  return "".join(str(text) for text in df['Text (Model)'])
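
Because `corpus` joins the entries without a separator, the last word of one text runs directly into the first word of the next. A possible variant, shown only as a sketch, inserts a delimiter between entries. `build_corpus` and its `sep` parameter are hypothetical names, and the function takes any iterable of texts rather than a data frame so that it runs without pandas.

```python
def build_corpus(texts, sep="\n"):
    """Join texts into one corpus string, separated by `sep`."""
    # str() mirrors the notebook's handling of non-string entries.
    return sep.join(str(text) for text in texts)


# With an empty separator this reproduces the behaviour of `corpus` above.
print(build_corpus(["first post", "second post"], sep=""))
# With the default newline each text stays on its own line.
print(build_corpus(["first post", "second post"]))
```

With the data frame from this notebook, the call would be `build_corpus(df['Text (Model)'])`.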


In [None]:
# select the small version of the GPT-2 model and download it
model_name = '124M'
gpt2.download_gpt2(model_name=model_name) 

## Alicia Keys Text Generation

### Training the Model

In [None]:
# select the needed corpus and train the model with it
keys = df[df['Username']== "Alicia Keys"]
keys_corpus = corpus(keys)

text_file = open("Alicia Keys.txt", "w")
n = text_file.write(keys_corpus)
text_file.close()

sess = gpt2.start_tf_sess()
gpt2.finetune(sess, 
              'Alicia Keys.txt',
              restore_from='fresh',
              print_every=10,
              steps=100)

Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Loading checkpoint models/124M/model.ckpt
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from models/124M/model.ckpt


  0%|          | 0/1 [00:00<?, ?it/s]

Loading dataset...


100%|██████████| 1/1 [00:00<00:00,  4.78it/s]


dataset has 20436 tokens
Training...
[1 | 118.60] loss=4.41 avg=4.41
[2 | 234.91] loss=4.22 avg=4.32
[3 | 349.94] loss=4.30 avg=4.31
[4 | 460.78] loss=3.90 avg=4.21
[5 | 571.35] loss=4.02 avg=4.17
[6 | 681.51] loss=3.96 avg=4.13
[7 | 792.64] loss=3.67 avg=4.06
[8 | 912.72] loss=3.81 avg=4.03
[9 | 1024.17] loss=3.88 avg=4.01
[10 | 1135.84] loss=3.87 avg=4.00
[11 | 1248.15] loss=3.82 avg=3.98
[12 | 1361.11] loss=3.70 avg=3.96
[13 | 1480.05] loss=3.73 avg=3.94
[14 | 1592.57] loss=3.58 avg=3.91
[15 | 1704.56] loss=3.39 avg=3.87
[16 | 1818.22] loss=3.38 avg=3.84
[17 | 1930.64] loss=3.03 avg=3.79
[18 | 2043.35] loss=3.37 avg=3.76
[19 | 2159.25] loss=3.37 avg=3.74
[20 | 2271.51] loss=3.04 avg=3.70
[21 | 2383.04] loss=2.99 avg=3.66
[22 | 2495.09] loss=3.21 avg=3.64
[23 | 2607.00] loss=2.96 avg=3.61
[24 | 2723.77] loss=3.15 avg=3.59
[25 | 2836.27] loss=2.79 avg=3.55
[26 | 2948.26] loss=2.73 avg=3.52
[27 | 3060.76] loss=2.81 avg=3.49
[28 | 3173.32] loss=2.72 avg=3.46
[29 | 3292.07] loss=2.68 avg
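
The `avg` column in the log above is not a plain mean. The logged values are consistent with a bias-corrected exponential moving average of the loss with decay 0.99. That reading is an inference from the numbers, not something the notebook states. A short check, where `running_average` is a hypothetical helper and the inputs are the first five logged losses:

```python
def running_average(losses, decay=0.99):
    """Return the bias-corrected exponential moving average after each loss."""
    weighted_sum = 0.0  # decayed sum of the losses seen so far
    weight = 0.0        # decayed count, corrects the bias of the early steps
    averages = []
    for loss in losses:
        weighted_sum = weighted_sum * decay + loss
        weight = weight * decay + 1.0
        averages.append(weighted_sum / weight)
    return averages


# First five losses from the Alicia Keys log (already rounded to two decimals).
logged_losses = [4.41, 4.22, 4.30, 3.90, 4.02]
for step, avg in enumerate(running_average(logged_losses), start=1):
    print(step, round(avg, 2))
```

Because the inputs are themselves rounded, the recomputed values can differ from the logged `avg` by up to 0.01.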

### Generating Text


In [None]:
# generate text with the model
keys_text_generation = gpt2.generate(sess, run_name='run1', nsamples=5, return_as_list=True, temperature=0.7)
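
The `temperature=0.7` argument controls how sharply the model's next-token distribution is peaked before sampling. The toy sketch below illustrates the general idea of temperature scaling on made-up scores. `softmax_with_temperature` is a hypothetical helper and is not gpt-2-simple's implementation.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities after dividing by the temperature."""
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
print(softmax_with_temperature(toy_logits, temperature=1.0))
print(softmax_with_temperature(toy_logits, temperature=0.7))
```

A temperature below 1 moves probability mass toward the highest-scoring token, so the samples become less random.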

In [None]:
# export generated text
keys_pickle_out = open('keys_text_generation.pickle', 'wb')
pickle.dump(keys_text_generation, keys_pickle_out)
keys_pickle_out.close()
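
To reuse the exported samples later, the pickle file can be read back into a list. The following is a minimal round-trip sketch. `save_samples` and `load_samples` are hypothetical helpers, and the `with` blocks close the files even if an error occurs.

```python
import pickle


def save_samples(samples, path):
    """Write a list of generated texts to `path` with pickle."""
    with open(path, "wb") as pickle_out:
        pickle.dump(samples, pickle_out)


def load_samples(path):
    """Read a list of generated texts back from `path`."""
    with open(path, "rb") as pickle_in:
        return pickle.load(pickle_in)
```

For the file written above, the call would be `load_samples('keys_text_generation.pickle')`.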

## Anthony Joshua Text Generation

In [None]:
# select text corpus
joshua = df[df['Username']== 'Anthony Joshua']
joshua_corpus = corpus(joshua)

text_file = open("Anthony Joshua.txt", "w")
n = text_file.write(joshua_corpus)
text_file.close()


### Training the Model

In [None]:
# train model
tf.reset_default_graph()
sess = gpt2.start_tf_sess()

gpt2.finetune(sess,
              'Anthony Joshua.txt',
              model_name='124M',
              restore_from='fresh',
              print_every=10,
              steps=100)


Loading checkpoint models/124M/model.ckpt
INFO:tensorflow:Restoring parameters from models/124M/model.ckpt


100%|██████████| 1/1 [00:00<00:00, 207.81it/s]

Loading dataset...
dataset has 11552 tokens
Training...





[10 | 1161.76] loss=3.49 avg=3.49
[20 | 2318.51] loss=3.04 avg=3.26
[30 | 3481.07] loss=1.86 avg=2.79
[40 | 4626.67] loss=1.22 avg=2.39
[50 | 5782.56] loss=0.72 avg=2.05
[60 | 6947.20] loss=0.25 avg=1.74
[70 | 8104.79] loss=0.10 avg=1.50
[80 | 9263.24] loss=0.23 avg=1.34
[90 | 10432.25] loss=0.10 avg=1.19
[100 | 11607.40] loss=0.04 avg=1.07
Saving checkpoint/run1/model-100
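
The training log lines share a fixed layout, `[step | elapsed] loss=... avg=...`, so they can be parsed to compare runs across celebrities. A small sketch follows. `parse_log_line` is a hypothetical helper, and reading the second field as elapsed seconds is an assumption.

```python
import re

# Matches lines such as "[100 | 11607.40] loss=0.04 avg=1.07".
LOG_PATTERN = re.compile(
    r"\[(?P<step>\d+) \| (?P<elapsed>[\d.]+)\] "
    r"loss=(?P<loss>[\d.]+) avg=(?P<avg>[\d.]+)"
)


def parse_log_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    return {
        "step": int(match.group("step")),
        "elapsed": float(match.group("elapsed")),
        "loss": float(match.group("loss")),
        "avg": float(match.group("avg")),
    }


print(parse_log_line("[100 | 11607.40] loss=0.04 avg=1.07"))
```

Lines that are not training records, such as `Training...` or a truncated record, return `None` and can be skipped.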


### Generating Text

In [None]:
# generate text
joshua_text_generation = gpt2.generate(sess, run_name='run1', nsamples=5, return_as_list=True, temperature=0.7)

In [None]:
# export generated text
joshua_pickle_out = open('joshua_text_generation.pickle', 'wb')
pickle.dump(joshua_text_generation, joshua_pickle_out)
joshua_pickle_out.close()

## Barack Obama Text Generation

In [None]:
# select text corpus
obama = df[df['Username']== 'Barack Obama']
obama_corpus = corpus(obama)

text_file = open("Barack Obama.txt", "w")
n = text_file.write(obama_corpus)
text_file.close()



In [None]:
# train model
tf.reset_default_graph()
sess = gpt2.start_tf_sess()

gpt2.finetune(sess,
              'Barack Obama.txt',
              model_name='124M',
              restore_from='fresh',
              print_every=10,
              steps=100)


Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Loading checkpoint models/124M/model.ckpt
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from models/124M/model.ckpt


  0%|          | 0/1 [00:00<?, ?it/s]

Loading dataset...


100%|██████████| 1/1 [00:00<00:00,  2.92it/s]


dataset has 33389 tokens
Training...
[10 | 1234.54] loss=3.06 avg=3.06
[20 | 2481.22] loss=2.12 avg=2.59
[30 | 3746.23] loss=2.45 avg=2.54
[40 | 5002.97] loss=2.03 avg=2.41
[50 | 6245.99] loss=1.49 avg=2.23
[60 | 7493.48] loss=1.07 avg=2.03
[70 | 8736.15] loss=1.17 avg=1.90
[80 | 9977.23] loss=0.63 avg=1.74
[90 | 11241.61] loss=0.69 avg=1.62
[100 | 12509.88] loss=0.25 avg=1.47
Saving checkpoint/run1/model-100


In [None]:
# generate text
obama_text_generation = gpt2.generate(sess, run_name='run1', nsamples=5, return_as_list=True, temperature=0.7)

In [None]:
# export generated text
obama_pickle_out = open('obama_text_generation.pickle', 'wb')
pickle.dump(obama_text_generation, obama_pickle_out)
obama_pickle_out.close()

## Bill Gates Text Generation

In [None]:
# select text corpus
gates = df[df['Username']== 'Bill Gates']
gates_corpus = corpus(gates)

text_file = open("Bill Gates.txt", "w")
n = text_file.write(gates_corpus)
text_file.close()



In [None]:
# train model
tf.reset_default_graph()
sess = gpt2.start_tf_sess()

gpt2.finetune(sess,
              'Bill Gates.txt',
              model_name='124M',
              restore_from='fresh',
              print_every=10,
              steps=100)


Loading checkpoint models/124M/model.ckpt
INFO:tensorflow:Restoring parameters from models/124M/model.ckpt


  0%|          | 0/1 [00:00<?, ?it/s]

Loading dataset...


100%|██████████| 1/1 [00:00<00:00,  2.69it/s]


dataset has 33143 tokens
Training...
[10 | 1263.74] loss=3.08 avg=3.08
[20 | 2525.53] loss=2.76 avg=2.92
[30 | 3796.02] loss=2.26 avg=2.69
[40 | 5056.51] loss=2.04 avg=2.53
[50 | 6319.67] loss=1.68 avg=2.36
[60 | 7578.39] loss=1.01 avg=2.13
[70 | 8831.70] loss=0.82 avg=1.93
[80 | 10079.23] loss=0.85 avg=1.79
[90 | 11321.60] loss=0.27 avg=1.62
[100 | 12568.83] loss=0.38 avg=1.49
Saving checkpoint/run1/model-100


In [None]:
# generate text
gates_text_generation = gpt2.generate(sess, run_name='run1', nsamples=5, return_as_list=True, temperature=0.7)

In [None]:
# export generated text
gates_pickle_out = open('gates_text_generation.pickle', 'wb')
pickle.dump(gates_text_generation, gates_pickle_out)
gates_pickle_out.close()

## Conan O'Brien Text Generation

In [None]:
# select text corpus
brien = df[df['Username']== "Conan O'Brien"]
brien_corpus = corpus(brien)

text_file = open("Conan O' Brien.txt", "w")
n = text_file.write(brien_corpus)
text_file.close()


In [None]:
# train model
tf.reset_default_graph()
sess = gpt2.start_tf_sess()

gpt2.finetune(sess,
              "Conan O' Brien.txt",
              model_name='124M',
              restore_from='fresh',
              print_every=10,
              steps=100)


Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Loading checkpoint models/124M/model.ckpt
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from models/124M/model.ckpt


  0%|          | 0/1 [00:00<?, ?it/s]

Loading dataset...


100%|██████████| 1/1 [00:00<00:00,  3.15it/s]


dataset has 23444 tokens
Training...
[10 | 1295.37] loss=3.91 avg=3.91
[20 | 2577.66] loss=3.20 avg=3.55
[30 | 3856.86] loss=2.45 avg=3.18
[40 | 5126.71] loss=2.08 avg=2.90
[50 | 6385.44] loss=1.41 avg=2.60
[60 | 7638.18] loss=0.69 avg=2.27
[70 | 8891.76] loss=0.47 avg=2.01
[80 | 10148.73] loss=0.31 avg=1.79
[90 | 11396.56] loss=0.12 avg=1.59
[100 | 12644.63] loss=0.13 avg=1.44
Saving checkpoint/run1/model-100


In [None]:
# generate text
brien_text_generation = gpt2.generate(sess, run_name='run1', nsamples=5, return_as_list=True, temperature=0.7)

In [None]:
# show the first generated sample
brien_text_generation[0]

'"We\'re all in this together," rapper Future taunted in a series of nasally crooned rants. Now, that\'s just one person can change the world.take our poll - simple: the person wearing the red and white is giving everyone else a bit of a blow-out.\n\nollie cristo is the reason i have a Facebook page. every Saturday at 10pm, i send out a tweet which is instantly thought to be from space. anyway, now someone has to pay for my flight to hell.is anyone else still alive who can take selfies with the apocalypse on video?i\'m going to be checking out @theggregary this week and i promise to be funny, polite, and just generally nice.  if only there was a better name for the phrase "death to dinosaurs"--moonshine.the last door into hell is the last door into an old age temple.yogurt is the healthiest food ever packaged in a container that will last 100 years.yogurt is an animal protein isolate, which means it\'s 100% natural. my friend @kumailn stopped by to talk about a new book and i-have-no-c

In [None]:
# export generated text
brien_pickle_out = open('brien_text_generation.pickle', 'wb')
pickle.dump(brien_text_generation, brien_pickle_out)
brien_pickle_out.close()

## Donald Trump Text Generation

In [None]:
# select text corpus
trump = df[df['Username']== "Donald Trump"]
trump_corpus = corpus(trump)

text_file = open("Donald Trump.txt", "w")
n = text_file.write(trump_corpus)
text_file.close()


In [None]:
# train model
tf.reset_default_graph()
sess = gpt2.start_tf_sess()

gpt2.finetune(sess,
              "Donald Trump.txt",
              model_name='124M',
              restore_from='fresh',
              print_every=10,
              steps=100)


Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Loading checkpoint models/124M/model.ckpt
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from models/124M/model.ckpt


  0%|          | 0/1 [00:00<?, ?it/s]

Loading dataset...


100%|██████████| 1/1 [00:01<00:00,  1.40s/it]


dataset has 187558 tokens
Training...
[10 | 1310.95] loss=3.62 avg=3.62
[20 | 2604.33] loss=3.53 avg=3.57
[30 | 3917.52] loss=3.47 avg=3.54
[40 | 5202.83] loss=3.28 avg=3.47
[50 | 6483.74] loss=3.10 avg=3.40
[60 | 7749.14] loss=3.08 avg=3.34
[70 | 9055.48] loss=3.16 avg=3.32
[80 | 10359.27] loss=2.81 avg=3.25
[90 | 11665.62] loss=2.72 avg=3.19
[100 | 12971.22] loss=3.00 avg=3.17
Saving checkpoint/run1/model-100


In [None]:
# generate text
trump_text_generation = gpt2.generate(sess, run_name='run1', nsamples=5, batch_size=5, return_as_list=True, temperature=0.7)

In [None]:
# export generated text
trump_pickle_out = open('trump_text_generation.pickle', 'wb')
pickle.dump(trump_text_generation, trump_pickle_out)
trump_pickle_out.close()

## Dwayne Johnson Text Generation

In [None]:
# select text corpus
johnson = df[df['Username']== "Dwayne Johnson"]
johnson_corpus = corpus(johnson)

text_file = open("Dwayne Johnson.txt", "w")
n = text_file.write(johnson_corpus)
text_file.close()


In [None]:
# train model
tf.reset_default_graph()
sess = gpt2.start_tf_sess()

gpt2.finetune(sess,
              "Dwayne Johnson.txt",
              model_name='124M',
              restore_from='fresh',
              print_every=10,
              steps=100)


Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Loading checkpoint models/124M/model.ckpt
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from models/124M/model.ckpt


  0%|          | 0/1 [00:00<?, ?it/s]

Loading dataset...


100%|██████████| 1/1 [00:00<00:00,  2.58it/s]


dataset has 39828 tokens
Training...
[10 | 1265.50] loss=3.76 avg=3.76
[20 | 2536.73] loss=3.73 avg=3.74
[30 | 3817.89] loss=3.64 avg=3.71
[40 | 5098.19] loss=2.95 avg=3.52
[50 | 6386.95] loss=2.37 avg=3.28
[60 | 7667.96] loss=1.96 avg=3.06
[70 | 8944.58] loss=1.59 avg=2.84
[80 | 10230.40] loss=1.40 avg=2.66
[90 | 11516.97] loss=1.11 avg=2.48
[100 | 12827.13] loss=1.15 avg=2.34
Saving checkpoint/run1/model-100


In [None]:
# generate text
dwayne_text_generation = gpt2.generate(sess, run_name='run1', nsamples=5, batch_size=5, return_as_list=True, temperature=0.7)

In [None]:
# export generated text
johnson_pickle_out = open('dwayne_text_generation.pickle', 'wb')
pickle.dump(dwayne_text_generation, johnson_pickle_out)
johnson_pickle_out.close()

## Elizabeth Warren Text Generation

In [None]:
# select text corpus
warren = df[df['Username']== "Elizabeth Warren"]
warren_corpus = corpus(warren)

text_file = open("Elizabeth Warren.txt", "w")
n = text_file.write(warren_corpus)
text_file.close()


In [None]:
# train model
tf.reset_default_graph()
sess = gpt2.start_tf_sess()

gpt2.finetune(sess,
              "Elizabeth Warren.txt",
              model_name='124M',
              restore_from='fresh',
              print_every=10,
              steps=100)


Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Loading checkpoint models/124M/model.ckpt
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from models/124M/model.ckpt


  0%|          | 0/1 [00:00<?, ?it/s]

Loading dataset...


100%|██████████| 1/1 [00:00<00:00,  2.08it/s]


dataset has 50443 tokens
Training...
[10 | 1407.96] loss=3.31 avg=3.31
[20 | 2697.27] loss=2.96 avg=3.14
[30 | 3934.31] loss=2.60 avg=2.96
[40 | 5162.04] loss=2.21 avg=2.77
[50 | 6400.23] loss=2.28 avg=2.67
[60 | 7638.20] loss=1.78 avg=2.51
[70 | 8873.93] loss=1.23 avg=2.33
[80 | 10113.63] loss=1.71 avg=2.25
[90 | 11359.39] loss=1.13 avg=2.12
[100 | 12597.61] loss=0.98 avg=2.00
Saving checkpoint/run1/model-100


In [None]:
# generate text
warren_text_generation = gpt2.generate(sess, run_name='run1', nsamples=5, batch_size=5, return_as_list=True, temperature=0.7)

In [None]:
# export generated text
warren_pickle_out = open('warren_text_generation.pickle', 'wb')
pickle.dump(warren_text_generation, warren_pickle_out)
warren_pickle_out.close()

## Ellen DeGeneres Text Generation

In [None]:
# select text corpus
DeGeneres = df[df['Username']== "Ellen DeGeneres"]
DeGeneres_corpus = corpus(DeGeneres)

text_file = open("Ellen DeGeneres.txt", "w")
n = text_file.write(DeGeneres_corpus)
text_file.close()


In [None]:
# train model
tf.reset_default_graph()
sess = gpt2.start_tf_sess()

gpt2.finetune(sess,
              "Ellen DeGeneres.txt",
              model_name='124M',
              restore_from='fresh',
              print_every=10,
              steps=100)


Loading checkpoint models/124M/model.ckpt
INFO:tensorflow:Restoring parameters from models/124M/model.ckpt


  0%|          | 0/1 [00:00<?, ?it/s]

Loading dataset...


100%|██████████| 1/1 [00:00<00:00,  3.63it/s]


dataset has 23490 tokens
Training...
[10 | 1270.43] loss=3.72 avg=3.72
[20 | 2541.75] loss=2.94 avg=3.33
[30 | 3818.17] loss=2.38 avg=3.01
[40 | 5094.53] loss=2.00 avg=2.75
[50 | 6366.42] loss=1.51 avg=2.50
[60 | 7633.23] loss=1.17 avg=2.27
[70 | 8898.70] loss=0.77 avg=2.05
[80 | 10177.28] loss=0.48 avg=1.85
[90 | 11462.99] loss=0.25 avg=1.66
[100 | 12752.55] loss=0.14 avg=1.50
Saving checkpoint/run1/model-100


In [None]:
# generate text
DeGeneres_text_generation = gpt2.generate(sess, run_name='run1', nsamples=5, batch_size=5, return_as_list=True, temperature=0.7)

In [None]:
# export generated text
DeGeneres_pickle_out = open('Degeneres_text_generation.pickle', 'wb')
pickle.dump(DeGeneres_text_generation, DeGeneres_pickle_out)
DeGeneres_pickle_out.close()

NameError: ignored

## Elon Musk Text Generation

In [None]:
# select text corpus
musk = df[df['Username']== "Elon Musk"]
musk_corpus = corpus(musk)

text_file = open("Elon Musk.txt", "w")
n = text_file.write(musk_corpus)
text_file.close()


In [None]:
# train model
tf.reset_default_graph()
sess = gpt2.start_tf_sess()

gpt2.finetune(sess,
              "Elon Musk.txt",
              model_name='124M',
              restore_from='fresh',
              print_every=10,
              steps=100)


Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Loading checkpoint models/124M/model.ckpt
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from models/124M/model.ckpt


100%|██████████| 1/1 [00:00<00:00, 1672.37it/s]

Loading dataset...
dataset has 4104 tokens
Training...





[10 | 1323.48] loss=2.95 avg=2.95
[20 | 2696.15] loss=1.57 avg=2.26
[30 | 4066.95] loss=0.16 avg=1.55
[40 | 5426.61] loss=0.04 avg=1.17
[50 | 6774.23] loss=0.03 avg=0.94
[60 | 8127.05] loss=0.04 avg=0.78
[70 | 9516.00] loss=0.02 avg=0.67
[80 | 10918.60] loss=0.03 avg=0.59
[90 | 12347.46] loss=0.02 avg=0.52
[100 | 13760.76] loss=0.02 avg=0.47
Saving checkpoint/run1/model-100


In [None]:
# generate text
musk_text_generation = gpt2.generate(sess, run_name='run1', nsamples=5, batch_size=5, return_as_list=True, temperature=0.7)

In [None]:
# export generated text
musk_pickle_out = open('musk_text_generation.pickle', 'wb')
pickle.dump(musk_text_generation, musk_pickle_out)
musk_pickle_out.close()

## Emma Watson Text Generation


In [None]:
# select text corpus
watson = df[df['Username']== "Emma Watson"]
watson_corpus = corpus(watson)

text_file = open("Emma Watson.txt", "w")
n = text_file.write(watson_corpus)
text_file.close()


In [None]:
# train model
tf.reset_default_graph()
sess = gpt2.start_tf_sess()

gpt2.finetune(sess,
              "Emma Watson.txt",
              model_name='124M',
              restore_from='fresh',
              print_every=10,
              steps=100)


Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Loading checkpoint models/124M/model.ckpt
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from models/124M/model.ckpt


  0%|          | 0/1 [00:00<?, ?it/s]

Loading dataset...


100%|██████████| 1/1 [00:00<00:00,  4.32it/s]


dataset has 18237 tokens
Training...
[10 | 960.24] loss=4.04 avg=4.04
[20 | 1942.12] loss=2.99 avg=3.51
[30 | 2940.59] loss=2.54 avg=3.19
[40 | 3966.86] loss=2.00 avg=2.89
[50 | 4986.35] loss=1.02 avg=2.50
[60 | 6006.91] loss=0.67 avg=2.19
[70 | 7015.56] loss=0.22 avg=1.90
[80 | 8024.18] loss=0.11 avg=1.67
[90 | 9028.24] loss=0.12 avg=1.49
[100 | 10039.10] loss=0.07 avg=1.34
Saving checkpoint/run1/model-100


In [None]:
# generate text
watson_text_generation = gpt2.generate(sess, run_name='run1', nsamples=5, batch_size=5, return_as_list=True, temperature=0.7)

In [None]:
# export generated text
watson_pickle_out = open('watson_text_generation.pickle', 'wb')
pickle.dump(watson_text_generation, watson_pickle_out)
watson_pickle_out.close()

## Gordon Ramsay Text Generation

In [None]:
# select text corpus
ramsay = df[df['Username']== "Gordon Ramsay"]
ramsay_corpus = corpus(ramsay)

text_file = open("Gordon Ramsay.txt", "w")
n = text_file.write(ramsay_corpus)
text_file.close()


In [None]:
# train model
tf.reset_default_graph()
sess = gpt2.start_tf_sess()

gpt2.finetune(sess,
              "Gordon Ramsay.txt",
              model_name='124M',
              restore_from='fresh',
              print_every=10,
              steps=100)


Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Loading checkpoint models/124M/model.ckpt
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from models/124M/model.ckpt


  0%|          | 0/1 [00:00<?, ?it/s]

Loading dataset...


100%|██████████| 1/1 [00:00<00:00,  4.74it/s]


dataset has 23556 tokens
Training...
[10 | 1207.78] loss=3.22 avg=3.22
[20 | 2434.26] loss=2.93 avg=3.08
[30 | 3678.17] loss=2.51 avg=2.89
[40 | 4928.24] loss=2.03 avg=2.67
[50 | 6172.69] loss=1.38 avg=2.41
[60 | 7404.34] loss=1.00 avg=2.17
[70 | 8626.17] loss=0.54 avg=1.93
[80 | 9853.56] loss=0.28 avg=1.71
[90 | 11080.59] loss=0.17 avg=1.54
[100 | 12310.50] loss=0.11 avg=1.39
Saving checkpoint/run1/model-100


In [None]:
# generate text
ramsay_text_generation = gpt2.generate(sess, run_name='run1', nsamples=5, batch_size=5, return_as_list=True, temperature=0.7)

In [None]:
# export generated text
ramsay_pickle_out = open('ramsay_text_generation.pickle', 'wb')
pickle.dump(ramsay_text_generation, ramsay_pickle_out)
ramsay_pickle_out.close()

## Jeff Weiner Text Generation

In [None]:
# select text corpus
weiner = df[df['Username']== "Jeff Weiner"]
weiner_corpus = corpus(weiner)

text_file = open("Jeff Weiner.txt", "w")
n = text_file.write(weiner_corpus)
text_file.close()


NameError: ignored

In [None]:
# train model
tf.reset_default_graph()
sess = gpt2.start_tf_sess()

gpt2.finetune(sess,
              "Jeff Weiner.txt",
              model_name='124M',
              restore_from='fresh',
              print_every=10,
              steps=100)


In [None]:
# generate text
weiner_text_generation = gpt2.generate(sess, run_name='run1', nsamples=5, batch_size=5, return_as_list=True, temperature=0.7)

In [None]:
# export generated text
weiner_pickle_out = open('weiner_text_generation.pickle', 'wb')
pickle.dump(weiner_text_generation, weiner_pickle_out)
weiner_pickle_out.close()

## Joe Biden Text Generation

In [None]:
# select text corpus
biden = df[df['Username']== "Joe Biden"]
biden_corpus = corpus(biden)

text_file = open("Joe Biden.txt", "w")
n = text_file.write(biden_corpus)
text_file.close()


In [None]:
# train model
tf.reset_default_graph()
sess = gpt2.start_tf_sess()

gpt2.finetune(sess,
              "Joe Biden.txt",
              model_name='124M',
              restore_from='fresh',
              print_every=10,
              steps=100)


Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Loading checkpoint models/124M/model.ckpt
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from models/124M/model.ckpt


  0%|          | 0/1 [00:00<?, ?it/s]

Loading dataset...


100%|██████████| 1/1 [00:01<00:00,  1.55s/it]


dataset has 223773 tokens
Training...
[10 | 1215.61] loss=3.09 avg=3.09
[20 | 2438.95] loss=3.00 avg=3.04
[30 | 3670.56] loss=2.83 avg=2.97
[40 | 4902.98] loss=3.01 avg=2.98
[50 | 6134.48] loss=2.71 avg=2.93
[60 | 7363.84] loss=2.58 avg=2.87
[70 | 8590.78] loss=2.65 avg=2.83
[80 | 9815.64] loss=2.67 avg=2.81
[90 | 11041.41] loss=2.50 avg=2.78
[100 | 12272.50] loss=2.41 avg=2.74
Saving checkpoint/run1/model-100


In [None]:
# generate text
biden_text_generation = gpt2.generate(sess, run_name='run1', nsamples=5, batch_size=5, return_as_list=True, temperature=0.7)

In [None]:
# export generated text
biden_pickle_out = open('biden_text_generation.pickle', 'wb')
pickle.dump(biden_text_generation, biden_pickle_out)
biden_pickle_out.close()

## John Cena Text Generation

In [None]:
# select text corpus
cena = df[df['Username']== "John Cena"]
cena_corpus = corpus(cena)

text_file = open("John Cena.txt", "w")
n = text_file.write(cena_corpus)
text_file.close()


In [None]:
# train model
tf.reset_default_graph()
sess = gpt2.start_tf_sess()

gpt2.finetune(sess,
              "John Cena.txt",
              model_name='124M',
              restore_from='fresh',
              print_every=10,
              steps=100)


Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Loading checkpoint models/124M/model.ckpt
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from models/124M/model.ckpt


  0%|          | 0/1 [00:00<?, ?it/s]

Loading dataset...


100%|██████████| 1/1 [00:00<00:00,  2.94it/s]


dataset has 34840 tokens
Training...
[10 | 1260.03] loss=3.31 avg=3.31
[20 | 2504.50] loss=2.98 avg=3.15
[30 | 3747.92] loss=2.80 avg=3.03
[40 | 4988.71] loss=2.34 avg=2.85
[50 | 6234.30] loss=2.07 avg=2.69
[60 | 7474.64] loss=1.48 avg=2.49
[70 | 8711.62] loss=1.26 avg=2.31
[80 | 9950.28] loss=0.66 avg=2.09
[90 | 11188.35] loss=0.78 avg=1.94
[100 | 12427.21] loss=0.38 avg=1.78
Saving checkpoint/run1/model-100


In [None]:
# generate text
cena_text_generation = gpt2.generate(sess, run_name='run1', nsamples=5, batch_size=5, return_as_list=True, temperature=0.7)

In [None]:
# export generated text
cena_pickle_out = open('cena_text_generation.pickle', 'wb')
pickle.dump(cena_text_generation, cena_pickle_out)
cena_pickle_out.close()

## Kevin Durant Text Generation

In [None]:
# select text corpus
durant = df[df['Username']== "Kevin Durant"]
durant_corpus = corpus(durant)

text_file = open("Kevin Durant.txt", "w")
n = text_file.write(durant_corpus)
text_file.close()


In [None]:
# train model
tf.reset_default_graph()
sess = gpt2.start_tf_sess()

gpt2.finetune(sess,
              "Kevin Durant.txt",
              model_name='124M',
              restore_from='fresh',
              print_every=10,
              steps=100)


Loading checkpoint models/124M/model.ckpt
INFO:tensorflow:Restoring parameters from models/124M/model.ckpt


  0%|          | 0/1 [00:00<?, ?it/s]

Loading dataset...


100%|██████████| 1/1 [00:00<00:00,  4.14it/s]


dataset has 16311 tokens
Training...
[10 | 1232.48] loss=3.95 avg=3.95
[20 | 2499.29] loss=3.26 avg=3.60
[30 | 3762.00] loss=2.62 avg=3.27
[40 | 5022.03] loss=1.68 avg=2.87
[50 | 6283.38] loss=1.09 avg=2.51
[60 | 7542.83] loss=0.53 avg=2.17
[70 | 8804.59] loss=0.21 avg=1.88
[80 | 10069.12] loss=0.34 avg=1.68
[90 | 11332.57] loss=0.08 avg=1.49
[100 | 12591.74] loss=0.06 avg=1.34
Saving checkpoint/run1/model-100


In [None]:
# generate text
durant_text_generation = gpt2.generate(sess, run_name='run1', nsamples=5, batch_size=5, return_as_list=True, temperature=0.7)

In [None]:
# export generated text
durant_pickle_out = open('durant_text_generation.pickle', 'wb')
pickle.dump(durant_text_generation, durant_pickle_out)
durant_pickle_out.close()

## Kevin Hart Text Generation

In [None]:
# select text corpus
hart = df[df['Username']== "Kevin Hart"]
hart_corpus = corpus(hart)

text_file = open("Kevin Hart.txt", "w")
n = text_file.write(hart_corpus)
text_file.close()


In [None]:
# train model
tf.reset_default_graph()
sess = gpt2.start_tf_sess()

gpt2.finetune(sess,
              "Kevin Hart.txt",
              model_name='124M',
              restore_from='fresh',
              print_every=10,
              steps=100)


Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Loading checkpoint models/124M/model.ckpt
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from models/124M/model.ckpt


100%|██████████| 1/1 [00:00<00:00,  7.36it/s]

Loading dataset...
dataset has 12853 tokens
Training...





[10 | 1261.27] loss=3.44 avg=3.44
[20 | 2520.90] loss=2.41 avg=2.92
[30 | 3785.53] loss=1.53 avg=2.45
[40 | 5052.17] loss=0.91 avg=2.06
[50 | 6312.74] loss=0.68 avg=1.78
[60 | 7562.33] loss=0.18 avg=1.51
[70 | 8798.52] loss=0.10 avg=1.30
[80 | 10040.63] loss=0.06 avg=1.14
[90 | 11285.56] loss=0.11 avg=1.02
[100 | 12532.56] loss=0.05 avg=0.92
Saving checkpoint/run1/model-100


In [None]:
# generate text
hart_text_generation = gpt2.generate(sess, run_name='run1', nsamples=5, batch_size=5, return_as_list=True, temperature=0.7)

In [None]:
# export generated text
hart_pickle_out = open('hart_text_generation.pickle', 'wb')
pickle.dump(hart_text_generation, hart_pickle_out)
hart_pickle_out.close()

## Kylie Jenner Text Generation

In [None]:
# select text corpus
jenner = df[df['Username']== "Kylie Jenner"]
jenner_corpus = corpus(jenner)

text_file = open("Kylie Jenner.txt", "w")
n = text_file.write(jenner_corpus)
text_file.close()


In [None]:
# train model
tf.reset_default_graph()
sess = gpt2.start_tf_sess()

gpt2.finetune(sess,
              "Kylie Jenner.txt",
              model_name='124M',
              restore_from='fresh',
              print_every=10,
              steps=100)


Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Loading checkpoint models/124M/model.ckpt
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from models/124M/model.ckpt


100%|██████████| 1/1 [00:00<00:00,  6.39it/s]

Loading dataset...
dataset has 19060 tokens
Training...





[10 | 1187.44] loss=2.96 avg=2.96
[20 | 2402.22] loss=2.51 avg=2.73
[30 | 3626.58] loss=1.97 avg=2.48
[40 | 4852.02] loss=1.71 avg=2.28
[50 | 6083.39] loss=1.44 avg=2.11
[60 | 7294.64] loss=0.78 avg=1.88
[70 | 8501.90] loss=0.60 avg=1.69
[80 | 9710.36] loss=0.33 avg=1.52
[90 | 10922.15] loss=0.14 avg=1.36
[100 | 12131.60] loss=0.11 avg=1.23
Saving checkpoint/run1/model-100


In [None]:
# generate text
jenner_text_generation = gpt2.generate(sess, run_name='run1', nsamples=5, batch_size=5, return_as_list=True, temperature=0.7)

In [None]:
# export generated text
with open('jenner_text_generation.pickle', 'wb') as jenner_pickle_out:
    pickle.dump(jenner_text_generation, jenner_pickle_out)

## Lady Gaga Text Generation

In [None]:
# select text corpus
gaga = df[df['Username']== "Lady Gaga"]
gaga_corpus = corpus(gaga)

with open("Lady Gaga.txt", "w") as text_file:
    text_file.write(gaga_corpus)


In [None]:
# train model
tf.reset_default_graph()
sess = gpt2.start_tf_sess()

gpt2.finetune(sess,
              "Lady Gaga.txt",
              model_name='124M',
              restore_from='fresh',
              print_every=10,
              steps=100)


Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Loading checkpoint models/124M/model.ckpt
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from models/124M/model.ckpt


  0%|          | 0/1 [00:00<?, ?it/s]

Loading dataset...


100%|██████████| 1/1 [00:00<00:00,  3.12it/s]


dataset has 24459 tokens
Training...
[10 | 1197.54] loss=3.56 avg=3.56
[20 | 2410.74] loss=3.00 avg=3.28
[30 | 3635.18] loss=2.65 avg=3.07
[40 | 4865.82] loss=2.33 avg=2.88
[50 | 6084.61] loss=1.57 avg=2.61
[60 | 7315.81] loss=1.10 avg=2.35
[70 | 8529.42] loss=0.86 avg=2.13
[80 | 9744.70] loss=0.43 avg=1.91
[90 | 10956.46] loss=0.27 avg=1.72
[100 | 12168.79] loss=0.16 avg=1.56
Saving checkpoint/run1/model-100


In [None]:
# generate text
gaga_text_generation = gpt2.generate(sess, run_name='run1', nsamples=5, batch_size=5, return_as_list=True, temperature=0.7)

In [None]:
# export generated text
with open('gaga_text_generation.pickle', 'wb') as gaga_pickle_out:
    pickle.dump(gaga_text_generation, gaga_pickle_out)

## LeBron James Text Generation

In [None]:
# select text corpus
james = df[df['Username']== "LeBron James"]
james_corpus = corpus(james)

with open("Lebron James.txt", "w") as text_file:
    text_file.write(james_corpus)


In [None]:
# train model
tf.reset_default_graph()
sess = gpt2.start_tf_sess()

gpt2.finetune(sess,
              "Lebron James.txt",
              model_name='124M',
              restore_from='fresh',
              print_every=10,
              steps=100)


Loading checkpoint models/124M/model.ckpt
INFO:tensorflow:Restoring parameters from models/124M/model.ckpt


  0%|          | 0/1 [00:00<?, ?it/s]

Loading dataset...


100%|██████████| 1/1 [00:00<00:00,  4.61it/s]


dataset has 18536 tokens
Training...
[10 | 1323.39] loss=4.13 avg=4.13
[20 | 2644.73] loss=3.33 avg=3.73
[30 | 3972.56] loss=2.96 avg=3.47
[40 | 5284.33] loss=2.49 avg=3.22
[50 | 6582.50] loss=1.45 avg=2.86
[60 | 7879.47] loss=0.91 avg=2.53
[70 | 9178.72] loss=0.52 avg=2.23
[80 | 10478.45] loss=0.36 avg=1.99
[90 | 11779.20] loss=0.15 avg=1.78
[100 | 13074.58] loss=0.07 avg=1.60
Saving checkpoint/run1/model-100


In [None]:
# generate text
james_text_generation = gpt2.generate(sess, run_name='run1', nsamples=5, batch_size=5, return_as_list=True, temperature=0.7)

In [None]:
# export generated text
with open('james_text_generation.pickle', 'wb') as james_pickle_out:
    pickle.dump(james_text_generation, james_pickle_out)

## Louis Tomlinson Text Generation

In [None]:
# select text corpus
tomlinson = df[df['Username']== "Louis Tomlinson"]
tomlinson_corpus = corpus(tomlinson)

with open("Louis Tomlinson.txt", "w") as text_file:
    text_file.write(tomlinson_corpus)


In [None]:
# train model
tf.reset_default_graph()
sess = gpt2.start_tf_sess()

gpt2.finetune(sess,
              "Louis Tomlinson.txt",
              model_name='124M',
              restore_from='fresh',
              print_every=10,
              steps=100)


Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Loading checkpoint models/124M/model.ckpt
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from models/124M/model.ckpt


100%|██████████| 1/1 [00:00<00:00,  5.46it/s]

Loading dataset...
dataset has 16523 tokens
Training...





[10 | 1214.89] loss=3.16 avg=3.16
[20 | 2417.43] loss=2.67 avg=2.91
[30 | 3630.82] loss=2.31 avg=2.71
[40 | 4834.11] loss=1.61 avg=2.43
[50 | 6030.75] loss=1.01 avg=2.14
[60 | 7231.46] loss=0.46 avg=1.85
[70 | 8436.67] loss=0.34 avg=1.63
[80 | 9632.26] loss=0.22 avg=1.45
[100 | 12019.49] loss=0.06 avg=1.16
Saving checkpoint/run1/model-100


In [None]:
# generate text
tomlinson_text_generation = gpt2.generate(sess, run_name='run1', nsamples=5, batch_size=5, return_as_list=True, temperature=0.7)

In [None]:
# export generated text
with open('tomlinson_text_generation.pickle', 'wb') as tomlinson_pickle_out:
    pickle.dump(tomlinson_text_generation, tomlinson_pickle_out)

## Mariah Carey Text Generation

In [None]:
# select text corpus
carey = df[df['Username']== "Mariah Carey"]
carey_corpus = corpus(carey)

with open("Mariah Carey.txt", "w") as text_file:
    text_file.write(carey_corpus)


In [None]:
# train model
tf.reset_default_graph()
sess = gpt2.start_tf_sess()

gpt2.finetune(sess,
              "Mariah Carey.txt",
              model_name='124M',
              restore_from='fresh',
              print_every=10,
              steps=100)


Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Loading checkpoint models/124M/model.ckpt
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from models/124M/model.ckpt


  0%|          | 0/1 [00:00<?, ?it/s]

Loading dataset...


100%|██████████| 1/1 [00:00<00:00,  4.48it/s]


dataset has 20601 tokens
Training...
[10 | 1217.20] loss=3.48 avg=3.48
[20 | 2439.39] loss=2.87 avg=3.17
[30 | 3672.39] loss=2.53 avg=2.96
[40 | 4917.34] loss=1.97 avg=2.71
[50 | 6175.03] loss=1.03 avg=2.36
[60 | 7451.86] loss=0.63 avg=2.07
[70 | 8695.96] loss=0.35 avg=1.82
[80 | 9938.77] loss=0.29 avg=1.62
[90 | 11180.71] loss=0.17 avg=1.45
[100 | 12417.32] loss=0.17 avg=1.32
Saving checkpoint/run1/model-100


In [None]:
# generate text
carey_text_generation = gpt2.generate(sess, run_name='run1', nsamples=5, batch_size=5, return_as_list=True, temperature=0.7)

In [None]:
# export generated text
with open('carey_text_generation.pickle', 'wb') as carey_pickle_out:
    pickle.dump(carey_text_generation, carey_pickle_out)

## Neil Patrick Harris Text Generation

In [None]:
# select text corpus
harris = df[df['Username']== "Neil Patrick Harris"]
harris_corpus = corpus(harris)

with open("Neil Patrick Harris.txt", "w") as text_file:
    text_file.write(harris_corpus)


In [None]:
# train model
tf.reset_default_graph()
sess = gpt2.start_tf_sess()

gpt2.finetune(sess,
              "Neil Patrick Harris.txt",
              model_name='124M',
              restore_from='fresh',
              print_every=10,
              steps=100)


Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Loading checkpoint models/124M/model.ckpt
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from models/124M/model.ckpt


  0%|          | 0/1 [00:00<?, ?it/s]

Loading dataset...


100%|██████████| 1/1 [00:00<00:00,  2.70it/s]


dataset has 31639 tokens
Training...
[10 | 1246.38] loss=4.21 avg=4.21
[20 | 2483.92] loss=3.62 avg=3.91
[30 | 3749.34] loss=3.25 avg=3.69
[40 | 5019.89] loss=2.66 avg=3.43
[50 | 6336.95] loss=2.20 avg=3.18
[60 | 7677.10] loss=1.54 avg=2.90
[70 | 8967.19] loss=1.08 avg=2.63
[80 | 10225.98] loss=0.81 avg=2.39
[90 | 11580.15] loss=0.33 avg=2.16
[100 | 12868.92] loss=0.40 avg=1.97
Saving checkpoint/run1/model-100


In [None]:
# generate text
harris_text_generation = gpt2.generate(sess, run_name='run1', nsamples=5, batch_size=5, return_as_list=True, temperature=0.7)

In [None]:
# export generated text
with open('harris_text_generation.pickle', 'wb') as harris_pickle_out:
    pickle.dump(harris_text_generation, harris_pickle_out)

## Oprah Winfrey Text Generation

In [None]:
# select text corpus
winfrey = df[df['Username']== "Oprah Winfrey"]
winfrey_corpus = corpus(winfrey)

with open("Oprah Winfrey.txt", "w") as text_file:
    text_file.write(winfrey_corpus)


In [None]:
# train model
tf.reset_default_graph()
sess = gpt2.start_tf_sess()

gpt2.finetune(sess,
              "Oprah Winfrey.txt",
              model_name='124M',
              restore_from='fresh',
              print_every=10,
              steps=100)


Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Loading checkpoint models/124M/model.ckpt
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from models/124M/model.ckpt


  0%|          | 0/1 [00:00<?, ?it/s]

Loading dataset...


100%|██████████| 1/1 [00:00<00:00,  2.98it/s]


dataset has 31433 tokens
Training...
[10 | 1236.68] loss=3.36 avg=3.36
[20 | 2520.23] loss=3.12 avg=3.24
[30 | 3815.31] loss=2.57 avg=3.01
[40 | 5090.11] loss=2.42 avg=2.86
[50 | 6343.26] loss=1.84 avg=2.65
[60 | 7560.92] loss=1.70 avg=2.49
[70 | 8789.54] loss=1.09 avg=2.28
[80 | 10018.36] loss=0.91 avg=2.11
[90 | 11245.67] loss=0.49 avg=1.92
[100 | 12471.75] loss=0.42 avg=1.76
Saving checkpoint/run1/model-100


In [None]:
# generate text
winfrey_text_generation = gpt2.generate(sess, run_name='run1', nsamples=5, batch_size=5, return_as_list=True, temperature=0.7)

In [None]:
# export generated text
with open('winfrey_text_generation.pickle', 'wb') as winfrey_pickle_out:
    pickle.dump(winfrey_text_generation, winfrey_pickle_out)

## Pope Francis Text Generation

In [None]:
# select text corpus
francis = df[df['Username']== "Pope Francis"]
francis_corpus = corpus(francis)

with open("Pope Francis.txt", "w") as text_file:
    text_file.write(francis_corpus)


In [None]:
# train model
tf.reset_default_graph()
sess = gpt2.start_tf_sess()

gpt2.finetune(sess,
              "Pope Francis.txt",
              model_name='124M',
              restore_from='fresh',
              print_every=10,
              steps=100)


Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Loading checkpoint models/124M/model.ckpt
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from models/124M/model.ckpt


  0%|          | 0/1 [00:00<?, ?it/s]

Loading dataset...


100%|██████████| 1/1 [00:00<00:00,  2.09it/s]


dataset has 45034 tokens
Training...
[10 | 1274.50] loss=3.17 avg=3.17
[20 | 2538.53] loss=2.75 avg=2.96
[30 | 3798.52] loss=2.47 avg=2.79
[40 | 5052.36] loss=2.51 avg=2.72
[50 | 6307.87] loss=2.25 avg=2.63
[60 | 7563.52] loss=1.71 avg=2.47
[70 | 8829.09] loss=1.80 avg=2.37
[80 | 10089.90] loss=1.42 avg=2.25
[90 | 11349.44] loss=1.23 avg=2.13
[100 | 12610.64] loss=0.68 avg=1.98
Saving checkpoint/run1/model-100


In [None]:
# generate text
francis_text_generation = gpt2.generate(sess, run_name='run1', nsamples=5, batch_size=5, return_as_list=True, temperature=0.7)

In [None]:
# export generated text
with open('francis_text_generation.pickle', 'wb') as francis_pickle_out:
    pickle.dump(francis_text_generation, francis_pickle_out)

## Ronda Rousey Text Generation

In [None]:
# select text corpus
rousey = df[df['Username']== "Ronda Rousey"]
rousey_corpus = corpus(rousey)

with open("Ronda Rousey.txt", "w") as text_file:
    text_file.write(rousey_corpus)


In [None]:
# train the model
tf.reset_default_graph()
sess = gpt2.start_tf_sess()

gpt2.finetune(sess,
              "Ronda Rousey.txt",
              model_name='124M',
              restore_from='fresh',
              print_every=10,
              steps=100)


Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Loading checkpoint models/124M/model.ckpt
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from models/124M/model.ckpt


  0%|          | 0/1 [00:00<?, ?it/s]

Loading dataset...


100%|██████████| 1/1 [00:00<00:00,  3.23it/s]


dataset has 28572 tokens
Training...
[10 | 1229.50] loss=3.45 avg=3.45
[20 | 2452.71] loss=3.04 avg=3.24
[30 | 3682.89] loss=2.81 avg=3.10
[40 | 4899.26] loss=2.12 avg=2.85
[50 | 6113.34] loss=1.71 avg=2.62
[60 | 7331.07] loss=1.13 avg=2.36
[70 | 8551.20] loss=1.06 avg=2.17
[80 | 9765.89] loss=0.48 avg=1.95
[90 | 10986.82] loss=0.44 avg=1.78
[100 | 12202.29] loss=0.29 avg=1.62
Saving checkpoint/run1/model-100


In [None]:
# generate text
rousey_text_generation = gpt2.generate(sess, run_name='run1', nsamples=5, batch_size=5, return_as_list=True, temperature=0.7)

In [None]:
# export generated text
with open('rousey_text_generation.pickle', 'wb') as rousey_pickle_out:
    pickle.dump(rousey_text_generation, rousey_pickle_out)

## Tim Cook Text Generation

In [None]:
# select the text corpus
cook = df[df['Username']== "Tim Cook"]
cook_corpus = corpus(cook)

with open("Tim Cook.txt", "w") as text_file:
    text_file.write(cook_corpus)


In [None]:
# train the model
tf.reset_default_graph()
sess = gpt2.start_tf_sess()

gpt2.finetune(sess,
              "Tim Cook.txt",
              model_name='124M',
              restore_from='fresh',
              print_every=10,
              steps=100)


Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Loading checkpoint models/124M/model.ckpt
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from models/124M/model.ckpt


  0%|          | 0/1 [00:00<?, ?it/s]

Loading dataset...


100%|██████████| 1/1 [00:00<00:00,  2.67it/s]


dataset has 34709 tokens
Training...
[10 | 930.18] loss=3.43 avg=3.43
[20 | 1848.70] loss=3.14 avg=3.29
[30 | 2766.73] loss=2.96 avg=3.18
[40 | 3686.54] loss=2.25 avg=2.94
[50 | 4621.43] loss=1.96 avg=2.74
[60 | 5567.46] loss=1.66 avg=2.56
[70 | 6513.79] loss=1.15 avg=2.35
[80 | 7462.26] loss=0.79 avg=2.15
[90 | 8409.89] loss=0.72 avg=1.98
[100 | 9353.87] loss=0.47 avg=1.82
Saving checkpoint/run1/model-100


In [None]:
# generate text
cook_text_generation = gpt2.generate(sess, run_name='run1', nsamples=5, batch_size=5, return_as_list=True, temperature=0.7)

In [None]:
# export the generated text
with open('cook_text_generation.pickle', 'wb') as cook_pickle_out:
    pickle.dump(cook_text_generation, cook_pickle_out)

## Wiz Khalifa Text Generation

In [None]:
# select the text corpus
khalifa = df[df['Username']== "Wiz Khalifa"]
khalifa_corpus = corpus(khalifa)

with open("Wiz Khalifa.txt", "w") as text_file:
    text_file.write(khalifa_corpus)


In [None]:
# train the model
tf.reset_default_graph()
sess = gpt2.start_tf_sess()

gpt2.finetune(sess,
              "Wiz Khalifa.txt",
              model_name='124M',
              restore_from='fresh',
              print_every=10,
              steps=100)


Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Loading checkpoint models/124M/model.ckpt
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from models/124M/model.ckpt


100%|██████████| 1/1 [00:00<00:00, 722.28it/s]

Loading dataset...
dataset has 11544 tokens
Training...





[10 | 878.06] loss=4.03 avg=4.03
[20 | 1741.04] loss=2.97 avg=3.50
[30 | 2604.42] loss=2.06 avg=3.02
[40 | 3462.18] loss=1.16 avg=2.55
[50 | 4333.96] loss=0.34 avg=2.10
[60 | 5216.81] loss=0.10 avg=1.76
[70 | 6100.20] loss=0.17 avg=1.52
[80 | 6983.11] loss=0.04 avg=1.33
[90 | 7856.18] loss=0.03 avg=1.18
[100 | 8711.44] loss=0.04 avg=1.06
Saving checkpoint/run1/model-100


In [None]:
# generate text
khalifa_text_generation = gpt2.generate(sess, run_name='run1', nsamples=5, batch_size=5, return_as_list=True, temperature=0.7)

In [None]:
# export the generated text
with open('khalifa_text_generation.pickle', 'wb') as khalifa_pickle_out:
    pickle.dump(khalifa_text_generation, khalifa_pickle_out)

## Jimmy Fallon Text Generation

In [None]:
# select the text corpus
fallon = df[df['Username']== "jimmy fallon"]
fallon_corpus = corpus(fallon)

with open("jimmy fallon.txt", "w") as text_file:
    text_file.write(fallon_corpus)


In [None]:
# train the model
tf.reset_default_graph()
sess = gpt2.start_tf_sess()

gpt2.finetune(sess,
              "jimmy fallon.txt",
              model_name='124M',
              restore_from='fresh',
              print_every=10,
              steps=100)


Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Loading checkpoint models/124M/model.ckpt
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from models/124M/model.ckpt


  0%|          | 0/1 [00:00<?, ?it/s]

Loading dataset...


100%|██████████| 1/1 [00:00<00:00,  3.62it/s]


dataset has 25204 tokens
Training...
[10 | 1266.20] loss=3.60 avg=3.60
[20 | 2529.38] loss=2.70 avg=3.15
[30 | 3803.74] loss=2.38 avg=2.89
[40 | 5076.73] loss=1.87 avg=2.63
[50 | 6338.29] loss=1.12 avg=2.33
[60 | 7614.74] loss=0.85 avg=2.07
[70 | 8864.07] loss=0.68 avg=1.87
[80 | 10120.50] loss=0.53 avg=1.70
[90 | 11378.56] loss=0.36 avg=1.54
[100 | 12630.32] loss=0.20 avg=1.40
Saving checkpoint/run1/model-100


In [None]:
# generate text with the model
fallon_text_generation = gpt2.generate(sess, run_name='run1', nsamples=5, batch_size=5, return_as_list=True, temperature=0.7)

In [None]:
# export the data
with open('fallon_text_generation.pickle', 'wb') as fallon_pickle_out:
    pickle.dump(fallon_text_generation, fallon_pickle_out)

## Harry Styles Text Generation

In [None]:
# select the text corpus
styles = df[df['Username']== "Harry Styles."]
styles_corpus = corpus(styles)

with open("Harry Styles.txt", "w") as text_file:
    text_file.write(styles_corpus)


In [None]:
# train the model
tf.reset_default_graph()
sess = gpt2.start_tf_sess()

gpt2.finetune(sess,
              "Harry Styles.txt",
              model_name='124M',
              restore_from='fresh',
              print_every=10,
              steps=100)


Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Loading checkpoint models/124M/model.ckpt
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from models/124M/model.ckpt


100%|██████████| 1/1 [00:00<00:00,  6.90it/s]

Loading dataset...
dataset has 14557 tokens
Training...





[10 | 1357.40] loss=2.96 avg=2.96
[20 | 2722.76] loss=2.21 avg=2.59
[30 | 4060.14] loss=1.48 avg=2.21
[40 | 5405.45] loss=1.12 avg=1.94
[50 | 6749.38] loss=0.69 avg=1.68
[60 | 8091.33] loss=0.41 avg=1.47
[70 | 9433.55] loss=0.16 avg=1.27
[80 | 10787.51] loss=0.07 avg=1.12
[90 | 12138.07] loss=0.07 avg=1.00
[100 | 13486.10] loss=0.05 avg=0.90
Saving checkpoint/run1/model-100


In [None]:
# generate text with model
styles_text_generation = gpt2.generate(sess, run_name='run1', nsamples=5, batch_size=5, return_as_list=True, temperature=0.7)

In [None]:
# export data
with open('styles_text_generation.pickle', 'wb') as styles_pickle_out:
    pickle.dump(styles_text_generation, styles_pickle_out)
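
Every celebrity section repeats the same four steps, so the cells could in principle be folded into one loop. The following is a rough sketch, not the notebook's actual code: `short_name`, `run_for_all`, `build_corpus`, and `train_and_generate` are hypothetical names. The last two are callables that would wrap the DataFrame filter plus `corpus()` and the `gpt2.finetune` plus `gpt2.generate` calls, respectively. Stub functions stand in for them here so the block runs without TensorFlow.

```python
import pickle
import re
import tempfile
from pathlib import Path


def short_name(username):
    """Last word of the username, lowercased, letters only."""
    last_word = username.split()[-1]
    return re.sub(r"[^a-z]", "", last_word.lower())


def run_for_all(usernames, build_corpus, train_and_generate, out_dir="."):
    """Write corpus, train, generate, and pickle the samples for each username.

    build_corpus(username) -> str         (hypothetical wrapper around df + corpus())
    train_and_generate(txt_path) -> list  (hypothetical wrapper around gpt2 calls)
    """
    out_dir = Path(out_dir)
    written = {}
    for username in usernames:
        # Strip stray dots/spaces so the corpus file name stays clean.
        txt_path = out_dir / f"{username.strip('. ')}.txt"
        txt_path.write_text(build_corpus(username))
        samples = train_and_generate(str(txt_path))
        pickle_path = out_dir / f"{short_name(username)}_text_generation.pickle"
        with open(pickle_path, "wb") as f:
            pickle.dump(samples, f)
        written[username] = pickle_path
    return written


# Stubs: a real run would filter df and call gpt2.finetune / gpt2.generate.
def stub_corpus(username):
    return f"placeholder corpus for {username}"


def stub_train_and_generate(txt_path):
    return [f"placeholder sample from {Path(txt_path).name}"]


with tempfile.TemporaryDirectory() as tmp_dir:
    paths = run_for_all(["Tim Cook", "Harry Styles."], stub_corpus,
                        stub_train_and_generate, out_dir=tmp_dir)
    pickle_names = sorted(p.name for p in paths.values())

print(pickle_names)
```

Passing the training step in as a callable keeps the loop independent of the TensorFlow session handling, which would still need `tf.reset_default_graph()` before each fine-tuning run as in the cells above.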