Ash Rai <br>
CSC 675, Spring 2022

# Project 4: Text Generation with LSTM and Transformer Networks

## Initial Experimentation (Section 8.1 Implementation)

Reweighting a probability distribution to a different temperature

In [107]:
import numpy as np

def reweight_distribution(original_distribution, temperature=0.5):
    distribution = np.log(original_distribution) / temperature
    distribution = np.exp(distribution)
    return distribution / np.sum(distribution)
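
To make the effect of `temperature` concrete, the sketch below applies the same reweighting to a small three-outcome distribution. The probabilities are made up for illustration and are not taken from the model. Lower temperatures sharpen the distribution towards its most likely outcome, while higher temperatures flatten it towards uniform.

```python
import numpy as np

def reweight_distribution(original_distribution, temperature=0.5):
    # Divide log-probabilities by the temperature, exponentiate,
    # and renormalize so the result sums to 1
    distribution = np.log(original_distribution) / temperature
    distribution = np.exp(distribution)
    return distribution / np.sum(distribution)

# Hypothetical distribution, chosen only for illustration
probs = np.array([0.5, 0.3, 0.2])

sharpened = reweight_distribution(probs, temperature=0.2)
unchanged = reweight_distribution(probs, temperature=1.0)
flattened = reweight_distribution(probs, temperature=2.0)

for name, dist in [("T=0.2", sharpened), ("T=1.0", unchanged), ("T=2.0", flattened)]:
    print(name, np.round(dist, 3))
```

A temperature of 1.0 leaves the distribution unchanged, which is why it serves as the neutral setting in the sampling experiments.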

### Implementing character-level LSTM text generation

#### Preparing the Data

Downloading and parsing the initial text file

In [109]:
from tensorflow import keras
import numpy as np

path = keras.utils.get_file(
    'nietzsche.txt',
    origin='https://s3.amazonaws.com/text-datasets/nietzsche.txt')
text = open(path).read().lower()
print('Corpus length:', len(text))

Corpus length: 600893


Vectorizing sequences of characters

In [110]:
# Length of extracted character sequences
maxlen = 60

# We sample a new sequence every `step` characters
step = 3

# This holds our extracted sequences
sentences = []

# This holds the targets (the follow-up characters)
next_chars = []

for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
print('Number of sequences:', len(sentences))

# List of unique characters in the corpus
chars = sorted(list(set(text)))
print('Unique characters:', len(chars))
# Dictionary mapping unique characters to their index in `chars`
char_indices = dict((char, chars.index(char)) for char in chars)

# Next, one-hot encode the characters into binary arrays.
print('Vectorization...')
# The builtin `bool` is used because the `np.bool` alias was removed in newer NumPy releases
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1

Number of sequences: 200278
Unique characters: 57
Vectorization...
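
The windowing and one-hot steps are easier to follow on a toy string. The sketch below uses a hypothetical short text with a window length of 4 and a step of 3 (values chosen for illustration only) and plain Python lists instead of NumPy arrays.

```python
# Toy values, chosen only for illustration
toy_text = "hello world"
toy_maxlen = 4
toy_step = 3

windows = []
targets = []
for i in range(0, len(toy_text) - toy_maxlen, toy_step):
    windows.append(toy_text[i: i + toy_maxlen])   # input sequence
    targets.append(toy_text[i + toy_maxlen])      # character that follows it

toy_chars = sorted(set(toy_text))
toy_char_indices = {char: index for index, char in enumerate(toy_chars)}

def one_hot(char):
    # A vector with a single 1 at the character's index
    vector = [0] * len(toy_chars)
    vector[toy_char_indices[char]] = 1
    return vector

encoded_first_window = [one_hot(char) for char in windows[0]]

print(windows)
print(targets)
print(encoded_first_window)
```

Each window becomes a `(maxlen, number_of_characters)` block of zeros and ones, which is the per-sample shape of `x` in the real vectorization.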


#### Building the Network

Single-layer LSTM model for next-character prediction

In [112]:
from tensorflow.keras import layers

model = keras.models.Sequential()
model.add(layers.LSTM(128, input_shape=(maxlen, len(chars))))
model.add(layers.Dense(len(chars), activation='softmax'))

Model compilation configuration

In [113]:
optimizer = keras.optimizers.RMSprop(learning_rate=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)

#### Training and Sampling

Function to sample the next character given the model’s predictions

In [114]:
def sample(preds, temperature=1.0):
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)
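
If the softmax output contains exact zeros, `np.log` returns `-inf` and emits a runtime warning. One common workaround, which is not part of the original code, is to clip the probabilities to a small positive floor before taking the logarithm. The name `sample_clipped`, the `epsilon` value, and the use of NumPy's `Generator.choice` in place of `np.random.multinomial` are all assumptions made for this sketch.

```python
import numpy as np

def sample_clipped(preds, temperature=1.0, epsilon=1e-10, rng=None):
    # `epsilon` is an assumed floor; it keeps log() away from zero
    rng = np.random.default_rng() if rng is None else rng
    preds = np.asarray(preds).astype('float64')
    preds = np.clip(preds, epsilon, None)
    logits = np.log(preds) / temperature
    logits -= np.max(logits)            # shift for numerical stability
    exp_preds = np.exp(logits)
    probs = exp_preds / np.sum(exp_preds)
    # Draw one index according to the reweighted probabilities
    return int(rng.choice(len(probs), p=probs))

# Hypothetical prediction vector containing an exact zero
example_preds = [0.7, 0.3, 0.0]
rng = np.random.default_rng(0)
draws = [sample_clipped(example_preds, temperature=0.5, rng=rng) for _ in range(200)]
print(sorted(set(draws)))
```

With such a small floor the zero-probability character remains practically impossible to draw, so the sampling behaviour is essentially unchanged.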

Text-generation loop

In [115]:
import random
import sys

def run_text_generation(epoch_size, generation_size):
    # Train one epoch at a time for `epoch_size` epochs, generating text after each
    for epoch in range(1, epoch_size + 1):
        print()
        print('------------- epoch', epoch, '-------------')
        # Fit the model for 1 epoch on the available training data
        history = model.fit(x, y,
                  batch_size=128,
                  epochs=1)

        # Select a text seed at random
        start_index = random.randint(0, len(text) - maxlen - 1)
        generated_text = text[start_index: start_index + maxlen]
        print('--- Generating with seed: "' + generated_text + '"')

        for temperature in [0.2, 0.5, 1.0, 1.2]:
            print('------ temperature:', temperature)
            sys.stdout.write(generated_text)

            # Generate `generation_size` characters
            for i in range(generation_size):
                sampled = np.zeros((1, maxlen, len(chars)))
                for t, char in enumerate(generated_text):
                    sampled[0, t, char_indices[char]] = 1.

                preds = model.predict(sampled, verbose=0)[0]
                next_index = sample(preds, temperature)
                next_char = chars[next_index]

                generated_text += next_char
                generated_text = generated_text[1:]

                sys.stdout.write(next_char)
                sys.stdout.flush()
            print()

In [116]:
run_text_generation(60, 400)


------------- epoch 1 -------------
--- Generating with seed: "indly patronage and defense of whatever
is misunderstood and"
------ temperature: 0.2
indly patronage and defense of whatever
is misunderstood and the present and the men of the one indection of the men in the desire and who have have and the super and which he sensed the must attering the problem and such and the one as the and with the sensibe the supers and which is all the super the sense the precience of the mand the super the one as the all the perhaps and the restrestion of the and such and the were of the reselves the and such and t
------ temperature: 0.5
the and such and the were of the reselves the and such and the man eselfing the subject of the liutlity, and the restiny it is the indeces has decited and chat all the a who have in the langer of the god the decention of a thing as the mands of the present and have at last extrement that the great of the all the propersion of the something is all more the world h

ls in the s

------------- epoch 15 -------------
--- Generating with seed: "out
among the instincts, and that the foundation of the emot"
------ temperature: 0.2
out
among the instincts, and that the foundation of the emotion of the peris and surplies of the same the most philosophy is a surply and self-destence, and the surplies of the surplined by the self-power of the subject of the spiritual of the sense of the same and the self-point of the sense of the surplies of the subject of the same and the self-desired of the soul and the self-point and consideration of the soul and standard of the sense of the spiritua
------ temperature: 0.5
ration of the soul and standard of the sense of the spiritual of comparison" and the soul higher and accompanimion of the desire to acts of the pobiseable of a healthing in the philosophy of the doubtht
the friends and discourse, and consideration to a far to be mankind of promises when he was self-destruction of a belief in all the strong to power, 

## Testing a Different Text Dataset

Since Nietzsche was a philosopher and critic from the 1800s, I would expect his text to have a particularly stern and serious tone. Because our initial model was trained on this dataset, the generated text also has a serious tone.

For this reason, I chose a second dataset that is quite different. I will be using a jokes dataset with a contemporary tone.

Reading dataset:

In [129]:
import csv

text = ''
jokes_sample_size = 7000
jokes_count = 0

# Concatenate the joke column (index 1) of the first `jokes_sample_size` rows
with open('shortjokes.csv') as file:
    csvreader = csv.reader(file)
    for row in csvreader:
        jokes_count += 1
        if jokes_count > jokes_sample_size:
            break
        text += row[1] + ' '

print('Corpus length:', len(text))

Corpus length: 656510


Vectorizing sequences of characters for jokes text

In [130]:
# Length of extracted character sequences
maxlen = 60

# We sample a new sequence every `step` characters
step = 3

# This holds our extracted sequences
sentences = []

# This holds the targets (the follow-up characters)
next_chars = []

for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
print('Number of sequences:', len(sentences))

# List of unique characters in the corpus
chars = sorted(list(set(text)))
print('Unique characters:', len(chars))
# Dictionary mapping unique characters to their index in `chars`
char_indices = dict((char, chars.index(char)) for char in chars)

# Next, one-hot encode the characters into binary arrays.
print('Vectorization...')
# The builtin `bool` is used because the `np.bool` alias was removed in newer NumPy releases
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1

Number of sequences: 218817
Unique characters: 93
Vectorization...


### Using the same initial model architecture

Using the same architecture and the same hyperparameters, I run the model again. I clear the Keras session and rebuild the model so that training starts from freshly initialized weights.

In [13]:
from tensorflow.keras import layers

keras.backend.clear_session()

model = keras.models.Sequential()
model.add(layers.LSTM(128, input_shape=(maxlen, len(chars))))
model.add(layers.Dense(len(chars), activation='softmax'))

optimizer = keras.optimizers.RMSprop(learning_rate=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)

In [14]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm (LSTM)                  (None, 128)               113664    
_________________________________________________________________
dense (Dense)                (None, 93)                11997     
Total params: 125,661
Trainable params: 125,661
Non-trainable params: 0
_________________________________________________________________
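
The parameter counts in this summary can be checked by hand. An LSTM layer has four gates, each with an input kernel, a recurrent kernel, and a bias, giving `4 * (units * (input_dim + units) + units)` parameters. A Dense layer has one weight per input-output pair plus one bias per output. The helper names below are hypothetical.

```python
def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights, and a bias
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # One weight per input-output pair plus one bias per output
    return input_dim * units + units

vocab_size = 93   # unique characters in the jokes corpus
hidden = 128      # LSTM units

lstm_count = lstm_params(vocab_size, hidden)
dense_count = dense_params(hidden, vocab_size)
print(lstm_count, dense_count, lstm_count + dense_count)
```

The totals match the 113,664 and 11,997 parameters reported above, and they show that the larger vocabulary of the jokes corpus directly increases the size of both layers.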


In [29]:
run_text_generation(60, 75)


------------- epoch 1 -------------
--- Generating with seed: "y when we visited the Alpaca Farm, next time Alpaca lunch. T"
------ temperature: 0.2
y when we visited the Alpaca Farm, next time Alpaca lunch. The rack so say to the bar baby and a beath and a bill of the girlfriend and
------ temperature: 0.5
to the bar baby and a beath and a bill of the girlfriend and the is at the bar baby to it the shame both to the right and a dead but th
------ temperature: 1.0
bar baby to it the shame both to the right and a dead but theiljower when you're my indice realiymeoon't I desent mean, it fuscting som
------ temperature: 1.2
ou're my indice realiymeoon't I desent mean, it fuscting something: Mp? Humoptence hang" ain" Jeofes over. Cray is juicouforeal* Whol, 

------------- epoch 2 -------------
--- Generating with seed: "s so fat her butt is the butt of every joke. What do you cal"
------ temperature: 0.2
s so fat her butt is the butt of every joke. What do you call a baby. What do you ca

 news! What do
------ temperature: 1.0
o get to she wonder the difference between the news! What do you do this great in the profisharnedary. He periows like. though to the I
------ temperature: 1.2
eat in the profisharnedary. He periows like. though to the If a leithely. Conkeaname. I play  SHATI ARvars Bed. watching If he doah Lei

------------- epoch 52 -------------
--- Generating with seed: "gg adopted a child... they could call it Slush Puppy :) I've"
------ temperature: 0.2
gg adopted a child... they could call it Slush Puppy :) I've like to see the same and a sex in the street and a street in the friend to
------ temperature: 0.5
e same and a sex in the street and a street in the friend to get the movie girl off a lot because they are to realized I are a progreja
------ temperature: 1.0
girl off a lot because they are to realized I are a progrejabch ups sttrain for "Me: You movie and Hosed a lot over kitch badther's sou
------ temperature: 1.2
 for "Me: You movie and Hosed a lo

Saving this initial model

In [30]:
model.save('initial_jokes_model.h5')

Writing a function to complete a joke that is passed into the model

In [31]:
def complete_joke(seed_joke, generation_size):
    original_seed_joke = seed_joke
    for temperature in [0.2, 0.5, 1.0, 1.2]:
        print('------ temperature:', temperature)
        seed_joke = original_seed_joke
        sys.stdout.write(seed_joke)

        for i in range(generation_size):
            sampled = np.zeros((1, len(seed_joke), len(chars)))
            for t, char in enumerate(seed_joke):
                sampled[0, t, char_indices[char]] = 1.

            preds = model.predict(sampled, verbose=0)[0]
            next_index = sample(preds, temperature)
            next_char = chars[next_index]

            seed_joke += next_char
            seed_joke = seed_joke[1:]

            sys.stdout.write(next_char)
            sys.stdout.flush()
        print()
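
The seeds used in this notebook are each exactly 60 characters long, matching the `maxlen` the model was trained with, and contain only characters from the corpus. For other seeds, some normalization would be needed first. The helper below is hypothetical and not part of the original notebook: it drops characters missing from the vocabulary, keeps the last `maxlen` characters, and left-pads short seeds with spaces (assuming the space character is in the vocabulary).

```python
def prepare_seed(seed, maxlen, char_indices, pad_char=' '):
    # Hypothetical helper: drop characters the model has never seen
    known = ''.join(char for char in seed if char in char_indices)
    # Keep only the most recent `maxlen` characters
    known = known[-maxlen:]
    # Left-pad short seeds so the window is always `maxlen` long
    # (assumes `pad_char` itself is in the vocabulary)
    return known.rjust(maxlen, pad_char)

# Toy vocabulary, for illustration only
toy_char_indices = {char: index for index, char in enumerate(' abcdefghijklmnopqrstuvwxyz')}

short_seed = prepare_seed('Knock knock', 20, toy_char_indices)
long_seed = prepare_seed('why did the chicken cross the road', 20, toy_char_indices)
print(repr(short_seed))
print(repr(long_seed))
```

Padding with spaces is only one option. A padded seed gives the model less real context, so the first few generated characters may be of lower quality.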

In [35]:
complete_joke('There are three types of people you always meet in the world', 100)

------ temperature: 0.2
There are three types of people you always meet in the world and a bad for a street and a bar than the difference between a dog beer the short to my face of the
------ temperature: 0.5
There are three types of people you always meet in the world.. It was free says to the second beat to make a fisher and says, "Where do you know what do you cal
------ temperature: 1.0
There are three types of people you always meet in the world racist That's next to Stew BIBAARY Lal Sowang joke! Toast the haleriour, the cheer guess 'ex who ne
------ temperature: 1.2
There are three types of people you always meet in the world? Burs joke's favorite paints or asks my one Two nerdanc!? bittlechate.lig.och says houating? You aw


In [36]:
complete_joke('A dog, a cat, and a giraffe walk into an empty bar and order', 100)

------ temperature: 0.2
A dog, a cat, and a giraffe walk into an empty bar and order and a bar before I was all the best been a bathroom and a bar than a telley and a bad drink, and my
------ temperature: 0.5
A dog, a cat, and a giraffe walk into an empty bar and order backwards up

 to a said of other be going to be a bears of a sell are been an aircer. How many and a
------ temperature: 1.0
A dog, a cat, and a giraffe walk into an empty bar and order... Oy road oed it's mase beliens like tminhs ate the sucked for a right? Coket!!!." Two moutle... f
------ temperature: 1.2
A dog, a cat, and a giraffe walk into an empty bar and order? San excited* What please can say *ivagitists? Dete. "Hhis car necusinee I got too so-under. 2proci


Comparing with the seed text from the last epoch of the Nietzsche dataset model

In [37]:
complete_joke('stray even in its own path and is sitting intoxicated in som', 100)

------ temperature: 0.2
stray even in its own path and is sitting intoxicated in someone is a bird and a street and a stand to see his dog of the boobs and a baby. I have a band the ba
------ temperature: 0.5
stray even in its own path and is sitting intoxicated in someone was the other day of the stop say to the chick. The footboon things and leave the extre the pig
------ temperature: 1.0
stray even in its own path and is sitting intoxicated in something. As santainds twit rape to find a band knowers-warm wearing? I'm in my che*..... I don't don'
------ temperature: 1.2
stray even in its own path and is sitting intoxicated in some and boy chaTish nexscale recens have in out "recally with a like these rux on stuokan they made ou


### Adding more epochs

I decided to train the initial model, which had already been trained for 60 epochs, for a further 40 epochs. This brings the total number of epochs to 100. I want to evaluate whether that improves the text generation.

In [38]:
run_text_generation(40, 75)


------------- epoch 1 -------------
--- Generating with seed: " there? * It's the police. We have received complaints about"
------ temperature: 0.2
 there? * It's the police. We have received complaints about the first because they gotten to get a comedy when I was a bad man with a 
------ temperature: 0.5
use they gotten to get a comedy when I was a bad man with a house will say for a having trees The friend are for your bad baby of it to
------ temperature: 1.0
for a having trees The friend are for your bad baby of it to there and right out rullow friend What's nodisto why mom Jesush. You look 
------ temperature: 1.2
t out rullow friend What's nodisto why mom Jesush. You look so a whore long says, "Many in an Acaxion 12 tir things plan's 2 op blondes

------------- epoch 2 -------------
--- Generating with seed: "p the room now. This bank pen tastes like it's been in a lot"
------ temperature: 0.2
p the room now. This bank pen tastes like it's been in a lot of people and a polar f

 when by magratosy 

------------- epoch 30 -------------
--- Generating with seed: "The Cat in the Hat" is a lesson to your kids on how to throw"
------ temperature: 0.2
The Cat in the Hat" is a lesson to your kids on how to throw a sex in the other day in the party with a pise of the bartender to see if
------ temperature: 0.5
ther day in the party with a pise of the bartender to see if you could had a pencic? and play they could have been back and traveler of
------ temperature: 1.0
a pencic? and play they could have been back and traveler of by turns to me to super. I. Sch. k!!! What does the piggy up and play and 
------ temperature: 1.2
 to super. I. Sch. k!!! What does the piggy up and play and your coila's everyone's merrist? damn goes ate the robits and wasn't let de

------------- epoch 31 -------------
--- Generating with seed: " Burton have split up! It's a bit of a Nightmare before Chri"
------ temperature: 0.2
 Burton have split up! It's a bit of a Nightmare before Christ

Saving model

In [40]:
model.save('initial_model_jokes_100_epochs.h5')

In [41]:
complete_joke('There are three types of people you always meet in the world', 100)

------ temperature: 0.2
There are three types of people you always meet in the world of the police because they can't call the student the police because it is the difference between a
------ temperature: 0.5
There are three types of people you always meet in the world in the seth child to be the most with my corn the cold and her *driving on a clowners and to a dead
------ temperature: 1.0
There are three types of people you always meet in the world lookine I was gur to drink his argually? They were femining. What do you call in what doesn't kitti
------ temperature: 1.2
There are three types of people you always meet in the world that she work kn? dralgfre." !quecky just because they were uviaun' for back....... Day Hallowy let


In [42]:
complete_joke('A dog, a cat, and a giraffe walk into an empty bar and order', 100)

------ temperature: 0.2
A dog, a cat, and a giraffe walk into an empty bar and orders? What do you call a street who want to the first was a scread and says "I have a bar the student t
------ temperature: 0.5
A dog, a cat, and a giraffe walk into an empty bar and order. If you guys for a blind door and five in the police since the police country in the life when they
------ temperature: 1.0
A dog, a cat, and a giraffe walk into an empty bar and orders, but didn't only means whet milk. What's Bra misting. Why did the reducely help'olbox. Why is it. 
------ temperature: 1.2
A dog, a cat, and a giraffe walk into an empty bar and order... bust It was seid. What do I Druna eecneves, "Bouthes Bak

inh call? A Halloween I  Hij:m"" help! A


In [43]:
complete_joke('stray even in its own path and is sitting intoxicated in som', 100)

------ temperature: 0.2
stray even in its own path and is sitting intoxicated in someone is a shower and a baby is a banang to the chicken says "that the police for a strangers and a b
------ temperature: 0.5
stray even in its own path and is sitting intoxicated in some marrian a short does it take to screw in a light bulb? On the first there is a girl when they can 
------ temperature: 1.0
stray even in its own path and is sitting intoxicated in someone just so gay cra-trum. Hatesiden procre crout 

store this is 20 the then When everyone fat about 
------ temperature: 1.2
stray even in its own path and is sitting intoxicated in someone say of the busisides, Y"! ""That for cat aser's yestroons... Man's guy! They fine men it's son 


### Building a more complex LSTM model 
We add further representational power to our model by stacking a second LSTM layer. Dropout layers with a dropout rate of 0.1 were also added.

The architecture was inspired by: https://machinelearningmastery.com/text-generation-lstm-recurrent-neural-networks-python-keras/. However, I got terrible results with the suggested initial hyperparameters of 256-unit LSTM layers and two Dropout layers with a rate of 0.2. Hence, I scaled my model down to 128-unit LSTM layers and a dropout rate of 0.1.

In [135]:
from tensorflow.keras import layers

keras.backend.clear_session()

model = keras.models.Sequential()
model.add(layers.LSTM(128, input_shape=(maxlen, len(chars)), return_sequences=True))
model.add(layers.Dropout(0.1))
model.add(layers.LSTM(128))
model.add(layers.Dropout(0.1))
model.add(layers.Dense(len(chars), activation='softmax'))

optimizer = keras.optimizers.RMSprop(learning_rate=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)

In [136]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm (LSTM)                  (None, 60, 128)           113664    
_________________________________________________________________
dropout (Dropout)            (None, 60, 128)           0         
_________________________________________________________________
lstm_1 (LSTM)                (None, 128)               131584    
_________________________________________________________________
dropout_1 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense (Dense)                (None, 93)                11997     
Total params: 257,245
Trainable params: 257,245
Non-trainable params: 0
_________________________________________________________________


In [49]:
run_text_generation(30, 75)


------------- epoch 1 -------------
--- Generating with seed: "y to Trump? show me your schlong form birth certificate I th"
------ temperature: 0.2
y to Trump? show me your schlong form birth certificate I the for the boor and the boor of the good man and a bar and to be to start to
------ temperature: 0.5
and the boor of the good man and a bar and to be to start to know to be a but to hind a bart and the man a has to like a say and a bars
------ temperature: 1.0
ut to hind a bart and the man a has to like a say and a barstal and is now. Poto good about why baby like now wonder worght to my froit
------ temperature: 1.2
 Poto good about why baby like now wonder worght to my froit arn taby with onite dechmpanthargeding. Yo mame and maclies, and aspirked 

------------- epoch 2 -------------
--- Generating with seed: "ving butt sex with a guy? It feels really good until you loo"
------ temperature: 0.2
ving butt sex with a guy? It feels really good until you looks to the probably sex a

nd not good When it have "the go There
------ temperature: 1.2
at is the differents and not good When it have "the go There blediding flese worda zience What if lint womon pboctury. What out. [bo"in

------------- epoch 3 -------------
--- Generating with seed: "uy: 34C. If you don't like oral sex You should keep your mou"
------ temperature: 0.2
uy: 34C. If you don't like oral sex You should keep your mouth to the support an a bar... I was a bar that when you the part to the par
------ temperature: 0.5
rt an a bar... I was a bar that when you the part to the part in the other an any potter I don't don't leannact of the bottle dods in p
------ temperature: 1.0
an any potter I don't don't leannact of the bottle dods in plean: donbably fieure 3thrictbreset. Sure anyther bid the sheep? Who the li
------ temperature: 1.2
fieure 3thrictbreset. Sure anyther bid the sheep? Who the lie *Wait look CorracocI natgerar dean! On block ano cBut Fitter puame, smept

------------- epoch 4 ------------

KeyboardInterrupt: 

In [47]:
model.save('double_lstm_double_dropout_30_epochs.h5')

Initially, I ran this model for 50 epochs. However, I noticed that the model's loss started diverging quite drastically after about 18 epochs. Until about 20 epochs, the quality of the text was also getting progressively better. After that, the text started getting worse, and by the end of the run it was mostly gibberish.
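
Rather than choosing the epoch count by inspection, the point where the loss starts rising could be detected automatically. The sketch below is a minimal patience-based check over a list of per-epoch losses. The function name, the `patience` value, and the loss values are all made up for illustration and are not measurements from this project. In the training loop, the per-epoch loss would be available as `history.history['loss'][0]` after each `model.fit` call.

```python
def should_stop(losses, patience=3):
    # Stop when the best loss so far is at least `patience` epochs old
    if len(losses) <= patience:
        return False
    best_epoch = losses.index(min(losses))
    return (len(losses) - 1 - best_epoch) >= patience

# Made-up loss curve: improves, then diverges
fake_losses = [2.9, 2.4, 2.1, 1.9, 1.8, 1.85, 2.0, 2.6, 3.5]

seen = []
stopped_at = None
for epoch, loss in enumerate(fake_losses, start=1):
    seen.append(loss)
    if should_stop(seen, patience=3):
        stopped_at = epoch
        break

print('Stopped after epoch', stopped_at)
```

Saving the model whenever a new best loss is reached would additionally make it possible to roll back to the best epoch instead of rerunning with a smaller epoch count.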

In [54]:
run_text_generation(18, 75)


------------- epoch 1 -------------
--- Generating with seed: "! Which mafia boss came with all the dlc? John Goty Yo mama "
------ temperature: 0.2
! Which mafia boss came with all the dlc? John Goty Yo mama of the come the said a light the man a light to some to did the said and th
------ temperature: 0.5
 said a light the man a light to some to did the said and the lead. This mear? The light with the bord his the way priend e: What do yo
------ temperature: 1.0
ar? The light with the bord his the way priend e: What do you was dack you are mandite care, we does it some to storches mapontar. Mzmo
------ temperature: 1.2
are mandite care, we does it some to storches mapontar. Mzmonictiwion, week and and crewm, seved Rayn-als-MmmEnan all it has a gumm-dad

------------- epoch 2 -------------
--- Generating with seed: "itude". You know you are applying to be a corrections office"
------ temperature: 0.2
itude". You know you are applying to be a corrections office and a proble the work t

o you bould be an at face the b
------ temperature: 1.0
ing smars with my people to do you bould be an at face the bean about polsors. Lake. Cike:." *I took reale. Any nargey the professas te
------ temperature: 1.2
rs. Lake. Cike:." *I took reale. Any nargey the professas techopar for it women? A: "6lmw Dodynednaught oria? Asscrop joke." Mop: Howed

------------- epoch 5 -------------
--- Generating with seed: "hen I let u crash at my place and u said u owed me one G: ye"
------ temperature: 0.2
hen I let u crash at my place and u said u owed me one G: year a light the bard and a the start the computer and a bather the bear the 
------ temperature: 0.5
bard and a the start the computer and a bather the bear the retire and a flied for going to too the men the bear and not to seen me to 
------ temperature: 1.0
ied for going to too the men the bear and not to seen me to burlonate other whodelbulb you sue. Who's a food laft you tree. It would le
------ temperature: 1.2
 whodelbulb you su

In [55]:
model.save('double_lstm_double_dropout_20_epochs.h5')

In [56]:
complete_joke('There are three types of people you always meet in the world', 100)

------ temperature: 0.2
There are three types of people you always meet in the world was the tars and the work to the word to the work of the stand to the best the bears with a bartend
------ temperature: 0.5
There are three types of people you always meet in the world and and a dog baby. What do you call a b

eaurd weird to not a night of the time suppore actually was
------ temperature: 1.0
There are three types of people you always meet in the world the ravered can't end it treatchoued. Me: I like the Sanssertacuse Ehliwill ysarut meaps Hightenred
------ temperature: 1.2
There are three types of people you always meet in the world. me: yous a tulled Nowing that than a lage .Bepiie.? frinaurders the moine, but no doctors move?" I


In [57]:
complete_joke('A dog, a cat, and a giraffe walk into an empty bar and order', 100)

------ temperature: 0.2
A dog, a cat, and a giraffe walk into an empty bar and order what do you have a chicken wa

s the work of the bartender say a watch a bar the work in the bartende
------ temperature: 0.5
A dog, a cat, and a giraffe walk into an empty bar and ordert to the man is the pilate he want to subles in the chicken worm for a pat of a bad can a with the r
------ temperature: 1.0
A dog, a cat, and a giraffe walk into an empty bar and orders emgmeaning. What did the gain here? if you screpted up...   gually? They didn't make my of donil? 
------ temperature: 1.2
A dog, a cat, and a giraffe walk into an empty bar and order: 2 coolr sBocky Tet candyd. u look untellame, tracking ea.....," tyl I chase every? Because I Heopl


In [58]:
complete_joke('stray even in its own path and is sitting intoxicated in som', 100)

------ temperature: 0.2
stray even in its own path and is sitting intoxicated in someone of the stand to the world and the girlfrien

d the part with a bears in the antister the was the 
------ temperature: 0.5
stray even in its own path and is sitting intoxicated in some of he was a drange telling so good to be your ass did you have a bar belong and a busine should sh
------ temperature: 1.0
stray even in its own path and is sitting intoxicated in some Oben feel and was 2nlaeeingrating the wife looking in a come with irenting for better jokes apable
------ temperature: 1.2
stray even in its own path and is sitting intoxicated in somfunking like being dom MeFsy" pow hall a Grarged I've have only a has tarsing on her... hide fread! 


### Text Generation through Bidirectional LSTM model

Since a bidirectional LSTM reads each input sequence both forwards and backwards, it could provide a different context for the characters. I built a model of similar complexity to the previous one, but with the first LSTM layer made bidirectional.

Reference: https://towardsdatascience.com/nlp-text-generation-through-bidirectional-lstm-model-9af29da4e520

In [131]:
from tensorflow.keras import layers

keras.backend.clear_session()

model = keras.models.Sequential()
model.add(layers.Bidirectional(layers.LSTM(128, input_shape=(maxlen, len(chars)), return_sequences=True)))
model.add(layers.Dropout(0.1))
model.add(layers.LSTM(128))
model.add(layers.Dense(len(chars), activation='softmax'))

optimizer = keras.optimizers.RMSprop(learning_rate=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)

In [134]:
model.build(input_shape=(None, maxlen, len(chars)))
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
bidirectional (Bidirectional multiple                  227328    
_________________________________________________________________
dropout (Dropout)            multiple                  0         
_________________________________________________________________
lstm_1 (LSTM)                multiple                  197120    
_________________________________________________________________
dense (Dense)                multiple                  11997     
Total params: 436,445
Trainable params: 436,445
Non-trainable params: 0
_________________________________________________________________
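
The parameter counts in this summary follow from the LSTM formula `4 * (units * (input_dim + units) + units)`. The `Bidirectional` wrapper holds two copies of the LSTM, one per direction, and with its default concatenation merge mode it outputs 256 features per timestep. That wider input is why the second LSTM here has more parameters than a 128-unit LSTM fed by a 128-wide layer. The helper names below are hypothetical.

```python
def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights, and a bias
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # One weight per input-output pair plus one bias per output
    return input_dim * units + units

vocab_size = 93
hidden = 128

bidirectional_count = 2 * lstm_params(vocab_size, hidden)   # forward + backward copies
second_lstm_count = lstm_params(2 * hidden, hidden)         # input is the 256-wide concatenation
dense_count = dense_params(hidden, vocab_size)
total = bidirectional_count + second_lstm_count + dense_count
print(bidirectional_count, second_lstm_count, dense_count, total)
```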


In [61]:
import tensorflow

x = tensorflow.cast(x, tensorflow.float32)
y = tensorflow.cast(y, tensorflow.float32)
run_text_generation(50, 75)


------------- epoch 1 -------------
--- Generating with seed: " mourning wood. What do gnomes fear most about Christmas? Th"
------ temperature: 0.2
 mourning wood. What do gnomes fear most about Christmas? The bark I have a pirst the bart the bear a poople the bark the so the common
------ temperature: 0.5
 pirst the bart the bear a poople the bark the so the commone who like the intime the birst who betting to got the time a pant the so a
------ temperature: 1.0
intime the birst who betting to got the time a pant the so after is the hice shamosies date I netting deas lastung you need Ddnastalis 
------ temperature: 1.2
e shamosies date I netting deas lastung you need Ddnastalis is frew chaiouse flonie TE1Otie: SORUHATME I 1: now i down beader" "Jundly 

------------- epoch 2 -------------
--- Generating with seed: "e and E.T? E.T learned English and wanted to go home. What d"
------ temperature: 0.2
e and E.T? E.T learned English and wanted to go home. What do you call a but the sto

can then words the paent. Lessung: ["N
------ temperature: 1.2
all dukling I'll sa bacan then words the paent. Lessung: ["Naad a" *will mo? Markd? Hid guye the yeal idiiniby, unth GOTSeDTCS UtININB:

------------- epoch 6 -------------
--- Generating with seed: "a rooster with erectile dysfunction? Boneless chicken. Seize"
------ temperature: 0.2
a rooster with erectile dysfunction? Boneless chicken. Seize when you can think the son a perican and say to the common to the best con
------ temperature: 0.5
hink the son a perican and say to the common to the best con to the bardams to a coach on the start in a room today to get to have a ro
------ temperature: 1.0
 to a coach on the start in a room today to get to have a roas by docogy. What's eleving jes day it losters out exoof neal or star up f
------ temperature: 1.2
hat's eleving jes day it losters out exoof neal or star up for recuring to love. Fandfret. Nell" Khave onets eegiony go-? He met keirib

------------- epoch 7 ------------

KeyboardInterrupt: 

I interrupted the 50-epoch run during epoch 7 because the loss had started to converge and the quality of the generated text was dropping rapidly. I then reran the model for only 10 epochs.
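
The manual interruption described above could in principle be automated. The following is a minimal sketch of a patience-based stopping rule, not what was used in this run. The names `should_stop`, `patience`, and `min_delta` are hypothetical, and the rule watches the training loss only, so it would not by itself detect a drop in sample quality.

```python
def should_stop(losses, patience=3, min_delta=1e-3):
    """Return True once none of the last `patience` epochs improved on the
    best earlier loss by at least `min_delta`."""
    if len(losses) <= patience:
        return False
    best_before = min(losses[:-patience])
    recent = losses[-patience:]
    return all(loss > best_before - min_delta for loss in recent)

# Illustrative values only, not losses recorded in this notebook
plateaued = [3.0, 2.0, 1.5, 1.49995, 1.4999, 1.49985]
improving = [3.0, 2.0, 1.5, 1.2, 1.0, 0.8]
print(should_stop(plateaued), should_stop(improving))
```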

In [63]:
run_text_generation(10, 75)


------------- epoch 1 -------------
--- Generating with seed: "ng stupid Many women say a guy who makes them laugh is all t"
------ temperature: 0.2
ng stupid Many women say a guy who makes them laugh is all the bar and a bar to the ban and a pering and the man and a boople to a bar 
------ temperature: 0.5
r to the ban and a pering and the man and a boople to a bar so was a caller to was into the was to up the off to makic a spile off a mo
------ temperature: 1.0
 to was into the was to up the off to makic a spile off a moytlican are adds? Oh Bruemman licalicon. I off thas's the ? My Pild Fiodds 
------ temperature: 1.2
s? Oh Bruemman licalicon. I off thas's the ? My Pild Fiodds me Oh you to your intingite Ttad Jamer whick his forridd reals inpart What 

------------- epoch 2 -------------
--- Generating with seed: "ne causes a lot of pain and makes a constant high pitched wh"
------ temperature: 0.2
ne causes a lot of pain and makes a constant high pitched when you the most seen to 

he sone thing the money? One this is the lot of and such there is the 
------ temperature: 1.0
ing the money? One this is the lot of and such there is the tally through. If my blo thin the molding goat peoole there apped. Talkle h
------ temperature: 1.2
If my blo thin the molding goat peoole there apped. Talkle hik, stitara guys excupter blisonh latel Loly. No. then staght. You've lot i

------------- epoch 5 -------------
--- Generating with seed: " to hell, there will be a 6-year-old pushing a shopping cart"
------ temperature: 0.2
 to hell, there will be a 6-year-old pushing a shopping cart the burger and a black the said the park to be the people on the short to 
------ temperature: 0.5
 a black the said the park to be the people on the short to be been and a cub today the start of a sangerary most who was for a polerac
------ temperature: 1.0
ub today the start of a sangerary most who was for a poleracer "cent what are in the ig year. West I his guy export. Walt not with a si
---

In [64]:
model.save('bidirectional_lstm.h5')

In [68]:
complete_joke('There are three types of people you always meet in the world', 100)

------ temperature: 0.2
There are three types of people you always meet in the world to start the store to the best the pickles of the bar and a bar the stroke and a store the back of 
------ temperature: 0.5
There are three types of people you always meet in the world to store to make for a most of a long sintic the burce of the only the always was i

n a dick and a d
------ temperature: 1.0
There are three types of people you always meet in the world of the incopa gard, surpers papers hear I got man any her ups. Retouling to pet droming as not work
------ temperature: 1.2
There are three types of people you always meet in the world's Skebuls-fam long into Yo Idooms that a new has mow soo. What do you need from homes. lucker pakes


In [69]:
complete_joke('A dog, a cat, and a giraffe walk into an empty bar and order', 100)

------ temperature: 0.2
A dog, a cat, and a giraffe walk into an empty bar and order the stand of the store to start the not to the bartender of the life was a strand of a store that t
------ temperature: 0.5
A dog, a cat, and a giraffe walk into an empty bar and orders and his racist is complay in the time of it have says to the looge to the stroke? A not joke of th
------ temperature: 1.0
A dog, a cat, and a giraffe walk into an empty bar and ordering a mass? A: N stroke to do I cell, as the srightbulb.

 Americans shakes about without her plays of
------ temperature: 1.2
A dog, a cat, and a giraffe walk into an empty bar and order: knew bulb? Wank. A Snanmoto. Make. ? MY Md, but line Bwenal go about Phobed keep hole cur man Mar 


In [70]:
complete_joke('stray even in its own path and is sitting intoxicated in som', 100)

------ temperature: 0.2
stray even in its own path and is sitting intoxicated in some convent of the store the time of a computer of the police of the bartender of the bartender of the
------ temperature: 0.5
stray even in its own path and is sitting intoxicated in some because if they stop a stroke to stop stand the poother that should be the first to t

he computer a
------ temperature: 1.0
stray even in its own path and is sitting intoxicated in some naite." "You have to so serviewe hat you can't will an urd.... So if you u tuads home would so wet
------ temperature: 1.2
stray even in its own path and is sitting intoxicated in someooo greep. Just and a why would seel siaver... Because happy to start asulamesly, I'm jose. Why did


### Implementing a Transformer-based miniature GPT model for text generation

For comparison, I also implemented a Transformer-based model. The model I chose was a miniature GPT model, based on the Keras example referenced below.

Reference: https://keras.io/examples/generative/text_generation_with_miniature_gpt/


Setup

In [117]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.layers.experimental.preprocessing import TextVectorization
import numpy as np
import os
import re
import string
import random

Implement a Transformer block as a layer


In [118]:
def causal_attention_mask(batch_size, n_dest, n_src, dtype):
    """
    Mask the upper half of the dot product matrix in self attention.
    This prevents flow of information from future tokens to current token.
    1's in the lower triangle, counting from the lower right corner.
    """
    i = tf.range(n_dest)[:, None]
    j = tf.range(n_src)
    m = i >= j - n_src + n_dest
    mask = tf.cast(m, dtype)
    mask = tf.reshape(mask, [1, n_dest, n_src])
    mult = tf.concat(
        [tf.expand_dims(batch_size, -1), tf.constant([1, 1], dtype=tf.int32)], 0
    )
    return tf.tile(mask, mult)


class TransformerBlock(layers.Layer):
    def __init__(self, embed_dim, num_heads, ff_dim, rate=0.1):
        super(TransformerBlock, self).__init__()
        self.att = layers.MultiHeadAttention(num_heads, embed_dim)
        self.ffn = keras.Sequential(
            [layers.Dense(ff_dim, activation="relu"), layers.Dense(embed_dim),]
        )
        self.layernorm1 = layers.LayerNormalization(epsilon=1e-6)
        self.layernorm2 = layers.LayerNormalization(epsilon=1e-6)
        self.dropout1 = layers.Dropout(rate)
        self.dropout2 = layers.Dropout(rate)

    def call(self, inputs):
        input_shape = tf.shape(inputs)
        batch_size = input_shape[0]
        seq_len = input_shape[1]
        causal_mask = causal_attention_mask(batch_size, seq_len, seq_len, tf.bool)
        attention_output = self.att(inputs, inputs, attention_mask=causal_mask)
        attention_output = self.dropout1(attention_output)
        out1 = self.layernorm1(inputs + attention_output)
        ffn_output = self.ffn(out1)
        ffn_output = self.dropout2(ffn_output)
        return self.layernorm2(out1 + ffn_output)
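
To make the masking rule concrete, here is a NumPy sketch of the same comparison used in `causal_attention_mask`, without the batch tiling. The function name `causal_mask_np` is an assumption for illustration. For equal source and destination lengths the result is a lower-triangular matrix, so position i can attend only to positions j <= i.

```python
import numpy as np

def causal_mask_np(n_dest, n_src):
    """Boolean mask: True where destination position i may attend to source position j."""
    i = np.arange(n_dest)[:, None]
    j = np.arange(n_src)
    return i >= j - n_src + n_dest

mask = causal_mask_np(4, 4)
print(mask.astype(int))
# [[1 0 0 0]
#  [1 1 0 0]
#  [1 1 1 0]
#  [1 1 1 1]]
```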

Implement an embedding layer


In [119]:
class TokenAndPositionEmbedding(layers.Layer):
    def __init__(self, maxlen, vocab_size, embed_dim):
        super(TokenAndPositionEmbedding, self).__init__()
        self.token_emb = layers.Embedding(input_dim=vocab_size, output_dim=embed_dim)
        self.pos_emb = layers.Embedding(input_dim=maxlen, output_dim=embed_dim)

    def call(self, x):
        maxlen = tf.shape(x)[-1]
        positions = tf.range(start=0, limit=maxlen, delta=1)
        positions = self.pos_emb(positions)
        x = self.token_emb(x)
        return x + positions
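
The lookup-and-add performed by this layer can be sketched in NumPy as follows. The table sizes and token ids are toy values chosen for illustration, not those used in this project, and random tables stand in for the learned embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, maxlen, embed_dim = 10, 4, 3  # toy sizes (assumption)

token_table = rng.normal(size=(vocab_size, embed_dim))  # one row per token id
pos_table = rng.normal(size=(maxlen, embed_dim))        # one row per position

tokens = np.array([[5, 2, 7, 1],
                   [3, 3, 0, 9]])  # shape (batch, seq_len)
seq_len = tokens.shape[-1]

# Look up token vectors, then add the position vectors (broadcast over the batch)
embedded = token_table[tokens] + pos_table[np.arange(seq_len)]
print(embedded.shape)  # (2, 4, 3)
```

Because the position vectors are added to every sequence in the batch, the same token at two different positions (token 3 in the second row) ends up with two different embeddings.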

Implement the miniature GPT model


In [123]:
vocab_size = 30000  # Only consider the top 30k words
maxlen = 80  # Max sequence size
embed_dim = 256  # Embedding size for each token
num_heads = 2  # Number of attention heads
feed_forward_dim = 256  # Hidden layer size in feed forward network inside transformer


def create_model():
    inputs = layers.Input(shape=(maxlen,), dtype=tf.int32)
    embedding_layer = TokenAndPositionEmbedding(maxlen, vocab_size, embed_dim)
    x = embedding_layer(inputs)
    transformer_block = TransformerBlock(embed_dim, num_heads, feed_forward_dim)
    x = transformer_block(x)
    outputs = layers.Dense(vocab_size)(x)
    model = keras.Model(inputs=inputs, outputs=[outputs, x])
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    model.compile(
        "adam", loss=[loss_fn, None],
    )  # No loss and optimization based on word embeddings from transformer block
    return model


In [124]:
model = create_model()
model.summary()

Model: "model_9"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_10 (InputLayer)        [(None, 80)]              0         
_________________________________________________________________
token_and_position_embedding (None, 80, 256)           7700480   
_________________________________________________________________
transformer_block_9 (Transfo (None, 80, 256)           658688    
_________________________________________________________________
dense_31 (Dense)             (None, 80, 30000)         7710000   
Total params: 16,069,168
Trainable params: 16,069,168
Non-trainable params: 0
_________________________________________________________________
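
The parameter counts in this summary can be checked with a little arithmetic. The sketch below assumes the default `MultiHeadAttention` layout (biased query, key, value, and output projections, with the value dimension equal to `key_dim`); the variable names are my own.

```python
vocab_size, maxlen, embed_dim = 30000, 80, 256
num_heads, ff_dim = 2, 256
key_dim = embed_dim  # second positional argument passed to MultiHeadAttention above

embedding = (vocab_size + maxlen) * embed_dim  # token table plus position table

# Multi-head attention: query/key/value projections plus the output projection
qkv = 3 * (embed_dim * num_heads * key_dim + num_heads * key_dim)
attn_out = num_heads * key_dim * embed_dim + embed_dim
# Feed-forward network: Dense(ff_dim) followed by Dense(embed_dim)
ffn = (embed_dim * ff_dim + ff_dim) + (ff_dim * embed_dim + embed_dim)
# Two LayerNormalization layers, each with a scale and an offset vector
layernorm = 2 * 2 * embed_dim
transformer_block = qkv + attn_out + ffn + layernorm

dense = embed_dim * vocab_size + vocab_size  # output projection to the vocabulary

print(embedding, transformer_block, dense)    # 7700480 658688 7710000
print(embedding + transformer_block + dense)  # 16069168
```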


Prepare the dataset for word-level language modeling

In [82]:
import csv

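# Read the first row plus the first 10,000 jokes from the CSV file
jokes_text = []
jokes_sample_size = 10001

with open('shortjokes.csv', newline='') as file:
    csvreader = csv.reader(file)
    for jokes_count, row in enumerate(csvreader, start=1):
        if jokes_count > jokes_sample_size:
            break
        jokes_text.append(row[1])  # the joke text is in the second column

# Drop the first row, which holds the column header
jokes_text = jokes_text[1:]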

In [84]:
batch_size = 128

# Create a dataset
text_ds = tf.data.Dataset.from_tensor_slices(jokes_text)
text_ds = text_ds.shuffle(buffer_size=256)
text_ds = text_ds.batch(batch_size)


def custom_standardization(input_string):
    """ Remove html line-break tags and handle punctuation """
    lowercased = tf.strings.lower(input_string)
    stripped_html = tf.strings.regex_replace(lowercased, "<br />", " ")
    return tf.strings.regex_replace(stripped_html, f"([{string.punctuation}])", r" \1")


# Create a vectorization layer and adapt it to the text
vectorize_layer = TextVectorization(
    standardize=custom_standardization,
    max_tokens=vocab_size - 1,
    output_mode="int",
    output_sequence_length=maxlen + 1,
)
vectorize_layer.adapt(text_ds)
vocab = vectorize_layer.get_vocabulary()  # To get words back from token indices


def prepare_lm_inputs_labels(text):
    """
    Shift word sequences by 1 position so that the target for position (i) is
    word at position (i+1). The model will use all words up till position (i)
    to predict the next word.
    """
    text = tf.expand_dims(text, -1)
    tokenized_sentences = vectorize_layer(text)
    x = tokenized_sentences[:, :-1]
    y = tokenized_sentences[:, 1:]
    return x, y


text_ds = text_ds.map(prepare_lm_inputs_labels)
text_ds = text_ds.prefetch(tf.data.AUTOTUNE)
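
The one-position shift in `prepare_lm_inputs_labels` is easier to see on a short sequence. This is a plain-Python sketch with a made-up token list; the real pipeline operates on integer token ids produced by `vectorize_layer`.

```python
tokens = ["why", "did", "the", "chicken", "cross"]  # made-up example sequence

x = tokens[:-1]  # model inputs
y = tokens[1:]   # targets: the token that follows each input position

# With causal masking, the prediction at position i sees only x[0..i]
for context_end, target in enumerate(y, start=1):
    print(x[:context_end], "->", target)
# first line printed: ['why'] -> did
```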

Implement a Keras callback for generating text


In [104]:
class TextGenerator(keras.callbacks.Callback):
    """A callback to generate text from a trained model.
    1. Feed some starting prompt to the model
    2. Predict probabilities for the next token
    3. Sample the next token and add it to the next input

    Arguments:
        max_tokens: Integer, the number of tokens to be generated after prompt.
        start_tokens: List of integers, the token indices for the starting prompt.
        index_to_word: List of strings, obtained from the TextVectorization layer.
        top_k: Integer, sample from the `top_k` token predictions.
        print_every: Integer, print after this many epochs.
    """

    def __init__(
        self, max_tokens, start_tokens, index_to_word, top_k=10, print_every=1
    ):
        self.max_tokens = max_tokens
        self.start_tokens = start_tokens
        self.index_to_word = index_to_word
        self.print_every = print_every
        self.k = top_k

    def sample_from(self, logits):
        logits, indices = tf.math.top_k(logits, k=self.k, sorted=True)
        indices = np.asarray(indices).astype("int32")
        preds = keras.activations.softmax(tf.expand_dims(logits, 0))[0]
        preds = np.asarray(preds).astype("float32")
        return np.random.choice(indices, p=preds)

    def detokenize(self, number):
        return self.index_to_word[number]

    def on_epoch_end(self, epoch, logs=None):
        start_tokens = [_ for _ in self.start_tokens]
        if (epoch + 1) % self.print_every != 0:
            return
        num_tokens_generated = 0
        tokens_generated = []
        while num_tokens_generated <= self.max_tokens:
            pad_len = maxlen - len(start_tokens)
            sample_index = len(start_tokens) - 1
            if pad_len < 0:
                x = start_tokens[:maxlen]
                sample_index = maxlen - 1
            elif pad_len > 0:
                x = start_tokens + [0] * pad_len
            else:
                x = start_tokens
            x = np.array([x])
            y, _ = self.model.predict(x)
            sample_token = self.sample_from(y[0][sample_index])
            tokens_generated.append(sample_token)
            start_tokens.append(sample_token)
            num_tokens_generated = len(tokens_generated)
        txt = " ".join(
            [self.detokenize(_) for _ in self.start_tokens + tokens_generated]
        )
        print(f"generated text:\n{txt}\n")


# Tokenize starting prompt
word_to_index = {}
for index, word in enumerate(vocab):
    word_to_index[word] = index

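start_prompt = "So there are three types of people you always meet in the world"
# Note: the vocabulary is lowercased, so the capitalized "So" is not found and
# falls back to index 1 ([UNK]), which is why the generated text starts with [UNK]
start_tokens = [word_to_index.get(_, 1) for _ in start_prompt.split()]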
num_tokens_generated = 100
text_gen_callback = TextGenerator(num_tokens_generated, start_tokens, vocab)
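
The top-k sampling step in `sample_from` can be sketched in plain NumPy as below. This is an illustrative equivalent rather than the callback's exact code: the function name `sample_top_k` and the `rng` parameter are assumptions, and the logits are made-up values.

```python
import numpy as np

def sample_top_k(logits, k=10, rng=None):
    """Sample a token index from the k highest-scoring logits."""
    if rng is None:
        rng = np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64)
    top_indices = np.argsort(logits)[-k:]          # indices of the k largest logits
    top_logits = logits[top_indices]
    probs = np.exp(top_logits - top_logits.max())  # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(top_indices, p=probs))

# Made-up logits over a 5-token vocabulary; with k=2 only tokens 1 and 3 can be drawn
example_logits = [0.1, 5.0, 0.2, 4.0, -1.0]
print(sample_top_k(example_logits, k=2, rng=np.random.default_rng(0)))
```

Restricting the draw to the top k tokens removes the long tail of unlikely words, while still leaving some randomness among the plausible candidates.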


Train the model

In [105]:
model = create_model()

model.fit(text_ds, verbose=2, epochs=25, callbacks=[text_gen_callback])

Epoch 1/25
79/79 - 20s - loss: 3.0822 - dense_27_loss: 3.0822
generated text:
[UNK] there are three types of people you always meet in the world a a a i the , and . .                                                                                            

Epoch 2/25
79/79 - 19s - loss: 1.5742 - dense_27_loss: 1.5742
generated text:
[UNK] there are three types of people you always meet in the world , but i 'm the other . .                                                                                             

Epoch 3/25
79/79 - 19s - loss: 1.3578 - dense_27_loss: 1.3578
generated text:
[UNK] there are three types of people you always meet in the world 's .                                                                                                   

Epoch 4/25
79/79 - 19s - loss: 1.2276 - dense_27_loss: 1.2276
generated text:
[UNK] there are three types of people you always meet in the world . .                                                                              

<tensorflow.python.keras.callbacks.History at 0x7fbddd691d30>

Running for a further 25 epochs

In [106]:
model.fit(text_ds, verbose=2, epochs=25, callbacks=[text_gen_callback])

Epoch 1/25
79/79 - 19s - loss: 0.2251 - dense_27_loss: 0.2251
generated text:
[UNK] there are three types of people you always meet in the world revolves around him .                                                                                                 

Epoch 2/25
79/79 - 19s - loss: 0.2136 - dense_27_loss: 0.2136
generated text:
[UNK] there are three types of people you always meet in the world cup of people and now .                                                                                               

Epoch 3/25
79/79 - 19s - loss: 0.2048 - dense_27_loss: 0.2048
generated text:
[UNK] there are three types of people you always meet in the world revolves around him him .                                                                                                

Epoch 4/25
79/79 - 19s - loss: 0.1972 - dense_27_loss: 0.1972
generated text:
[UNK] there are three types of people you always meet in the world revolves around him him him .                            

<tensorflow.python.keras.callbacks.History at 0x7fbdd9888e80>