# ![](https://ga-dash.s3.amazonaws.com/production/assets/logo-9f88ae6c9c3871690e33280fcf557f33.png) Seinfeld Script Generator

Notebook 4: Word Level RNN/LSTM Model

As discussed in notebook 3, this notebook is dedicated to building an RNN-LSTM model based on words rather than characters. In addition, it builds a generator function that produces the training data on the fly instead of processing the entire dataset at once, which I hope will solve the issue caused by hardware limitations. Furthermore, the model adds a diversity function that lets me specify the level of creativity used when the text is generated.

In [1]:
import pandas as pd
import numpy as np
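# import everything from tensorflow.keras so that layers, optimizers and callbacks come from the same Keras implementation
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras import utils as ku
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Embedding, LSTM, Dropout
from tensorflow.keras.optimizers import RMSprop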
from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau, LambdaCallback

import json
import pickle
import sys

For the character-based model, there were only 59 tokens in total. The word-based model I'm building has far more, since every unique word is a token. I therefore use the Keras `Tokenizer` to streamline this process. To use the Keras Tokenizer, I first have to split the text into a list of strings; otherwise it would be tokenized at the character level. To make things easier, I'm using the data in ```.csv``` format.

In [2]:
pd.set_option('display.max_colwidth', 0)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

In [4]:
df = pd.read_csv('../data/for_train.csv')
df.head()

Unnamed: 0,char_line
0,"jerry: you know, why we're here? to be out, this is out...and out is one of the single most enjoyable experiences of life. people...did you ever hear people talking about ""we should go out""? this is what they're talking about...this whole thing, we're all out now, no one is home. not one person here is home, we're all out! there are people tryin' to find us, they don't know where we are. (imitates one of these people ""tryin' to find us""; pretends his hand is a phone) ""did you ring?, i can't find him."" (imitates other person on phone) ""where did he go?"" (the first person again) ""he didn't tell me where he was going"". he must have gone out. you wanna go out: you get ready, you pick out the clothes, right? you take the shower, you get all ready, get the cash, get your friends, the car, the spot, the reservation...there you're staring around, whatta you do? you go: ""we gotta be getting back"". once you're out, you wanna get back! you wanna go to sleep, you wanna get up, you wanna go out again tomorrow, right? where ever you are in life, it's my feeling, you've gotta go. (pete's luncheonette. jerry and george are sitting at a table.)"
1,"jerry: seems to me, that button is in the worst possible spot. (talking about george's shirt) the second button literally makes or breaks the shirt, look at it: it's too high! it's in no-man's-land, you look like you live with your mother."
2,george: are you through? (kind of irritated)
3,"jerry: you do of course try on, when you buy?"
4,"george: yes, it was purple, i liked it, i don't actually recall considering the buttons."


### Preprocessing

In [6]:
# add a newline mark at the end of each line so that the line break is also tokenized
df = df['char_line'].map(lambda x: x+'\n')
df.head()

0    jerry: you know, why we're here? to be out, this is out...and out is one of the single most enjoyable experiences of life. people...did you ever hear people talking about "we should go out"? this is what they're talking about...this whole thing, we're all out now, no one is home. not one person here is home, we're all out! there are people tryin' to find us, they don't know where we are. (imitates one of these people "tryin' to find us"; pretends his hand is a phone) "did you ring?, i can't find him." (imitates other person on phone) "where did he go?" (the first person again) "he didn't tell me where he was going". he must have gone out. you wanna go out: you get ready, you pick out the clothes, right? you take the shower, you get all ready, get the cash, get your friends, the car, the spot, the reservation...there you're staring around, whatta you do? you go: "we gotta be getting back". once you're out, you wanna get back! you wanna go to sleep, you wanna get up, you wanna go ou

In [7]:
# create a function to separate all the special characters so that each is treated as a unique token
# inspired by Shiva Verma's blog post
def separate_punc(text):
    
    # note: '...' is padded first, but the later '.' replacement splits it again into three separate '.' tokens
    punc = ['...', '.', '[', ']', '(', ')', ';', ':', "'", '/', '"', ',', '?', '*', '!', '-', '$', '%', '&', '\n']
    for i in punc:
        text = text.replace(i, ' ' + i + ' ')
    # the line break was padded with spaces above, so <NEWLINE> becomes a standalone token
    text = text.replace('\n', '<NEWLINE>')
    return text

df = df.map(separate_punc)
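
To see what the padding does, here is a minimal sketch on a toy line. It uses a shortened punctuation list and a hypothetical helper name, `pad_punctuation`, both chosen for illustration only. Because `'...'` is handled before `'.'`, the later `'.'` replacement splits an ellipsis into three separate tokens, which is consistent with the `.  .  .` that shows up in the preprocessed lines.

```python
def pad_punctuation(text, punc=('...', '.', '(', ')', '\n')):
    # surround each punctuation mark with spaces so it becomes its own token
    for mark in punc:
        text = text.replace(mark, ' ' + mark + ' ')
    # turn the (now space-padded) line break into an explicit token
    return text.replace('\n', '<NEWLINE>')

# splitting on whitespace shows the tokens a word-level tokenizer would see
tokens = pad_punctuation("out...and (hi)\n").split()
print(tokens)
```

If the ellipsis should stay a single token, one option would be to replace it with a placeholder before padding `'.'`; the model in this notebook was trained with the three-dot version.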

In [8]:
# check the status
df[:5]

0    jerry :  you know ,  why we ' re here ?  to be out ,  this is out  .  .  .  and out is one of the single most enjoyable experiences of life .  people  .  .  .  did you ever hear people talking about  " we should go out "  ?  this is what they ' re talking about  .  .  .  this whole thing ,  we ' re all out now ,  no one is home .  not one person here is home ,  we ' re all out !  there are people tryin '  to find us ,  they don ' t know where we are .   ( imitates one of these people  " tryin '  to find us "  ;  pretends his hand is a phone )   " did you ring ?  ,  i can ' t find him .  "   ( imitates other person on phone )   " where did he go ?  "   ( the first person again )   " he didn ' t tell me where he was going "  .  he must have gone out .  you wanna go out :  you get ready ,  you pick out the clothes ,  right ?  you take the shower ,  you get all ready ,  get the cash ,  get your friends ,  the car ,  the spot ,  the reservation  .  .  .  there you ' re staring around ,

In [9]:
# collect the preprocessed lines in a list so that the tokenizer works on words rather than characters
corpus = df.tolist()
corpus

['jerry :  you know ,  why we \' re here ?  to be out ,  this is out  .  .  .  and out is one of the single most enjoyable experiences of life .  people  .  .  .  did you ever hear people talking about  " we should go out "  ?  this is what they \' re talking about  .  .  .  this whole thing ,  we \' re all out now ,  no one is home .  not one person here is home ,  we \' re all out !  there are people tryin \'  to find us ,  they don \' t know where we are .   ( imitates one of these people  " tryin \'  to find us "  ;  pretends his hand is a phone )   " did you ring ?  ,  i can \' t find him .  "   ( imitates other person on phone )   " where did he go ?  "   ( the first person again )   " he didn \' t tell me where he was going "  .  he must have gone out .  you wanna go out :  you get ready ,  you pick out the clothes ,  right ?  you take the shower ,  you get all ready ,  get the cash ,  get your friends ,  the car ,  the spot ,  the reservation  .  .  .  there you \' re staring a

In [10]:
# instantiate the Tokenizer
# limit the sequences to roughly the 10,000 most frequent tokens; filters='' keeps punctuation as tokens
tokenizer = Tokenizer(filters='', num_words=10_000, char_level=False)

In [11]:
def preprocessing(text):
    # fit the tokenizer on the corpus
    tokenizer.fit_on_texts(text)
    # turn each line into sequence with all tokens mapped to index
    token_list = tokenizer.texts_to_sequences(text)
    # index 0 is never assigned to a word, so the vocabulary size is len(word_index) + 1;
    # word_index keeps every word seen, even though num_words limits the indices used in the sequences
    total_words = len(tokenizer.word_index) + 1
    
    # flatten the per-line sequences into one long list of token indices
    input_sequences = []
    for tokens in token_list:
        input_sequences += tokens
    return input_sequences, total_words

inp_sequences, total_words = preprocessing(corpus)
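
The index mapping can be illustrated without Keras. The sketch below is a simplified pure-Python stand-in, not the Keras `Tokenizer` itself; `build_word_index` and `to_sequences` are hypothetical names. It ranks words by frequency starting at index 1, which is why the vocabulary size is `len(word_index) + 1`, and it applies a `num_words` cut-off only when building the sequences. As far as I understand the Keras behaviour, this is also why the model summary reports 18,725 output classes while the sequences only contain indices below 10,000.

```python
from collections import Counter

def build_word_index(lines):
    # rank tokens by frequency; index 0 is left unused
    counts = Counter(tok for line in lines for tok in line.split())
    return {word: rank for rank, (word, _) in enumerate(counts.most_common(), start=1)}

def to_sequences(lines, word_index, num_words=None):
    # map tokens to indices, optionally dropping indices >= num_words
    sequences = []
    for line in lines:
        seq = [word_index[tok] for tok in line.split() if tok in word_index]
        if num_words is not None:
            seq = [idx for idx in seq if idx < num_words]
        sequences.append(seq)
    return sequences

toy_corpus = ["jerry : hello <newline>", "george : hello jerry <newline>"]
word_index = build_word_index(toy_corpus)
total = len(word_index) + 1

# flatten the per-line sequences into one long list, as preprocessing() does
flat = [idx for seq in to_sequences(toy_corpus, word_index) for idx in seq]
limited = to_sequences(toy_corpus, word_index, num_words=4)
print(total, flat, limited)
```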

### Building Generator

In [12]:
# build the generator function to yield the results and save RAM space

def generator(input_sequences, max_len, total_words, batch_size=1024):

    while True:
        # randomly select a starting position in the input sequence, leaving room for a full batch of windows
        index = np.random.randint(0,len(input_sequences) - max_len - batch_size - 1)
        
        # establish feature and label
        X = np.zeros((batch_size, max_len), dtype=int)
        y = []
        for i, num in enumerate(range(index, index + batch_size)):
            X[i] = input_sequences[num: num + max_len]
            y.append(input_sequences[num + max_len])
        # one-hot encode the label
        y = ku.to_categorical(y, num_classes=total_words) 
        
        yield X, y

Apart from the preprocessing, the logic behind the word-based RNN-LSTM is the same as for the character-based one. I set the number of words that the model learns from and then train it to predict the very next word. This time I chose 40 as the sequence length in the hope of generating results that are more contextual.
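
The sliding-window idea can be sketched on a toy sequence. This is a pure-Python illustration with hypothetical helper names (`make_batch`, `one_hot`) and tiny sizes, not the NumPy generator itself: each row holds `max_len` consecutive tokens, and the label is the token that follows the window.

```python
def make_batch(sequence, start, batch_size, max_len):
    # one window per row; consecutive rows are shifted by one token
    X = [sequence[i:i + max_len] for i in range(start, start + batch_size)]
    # the label for each window is the token right after it
    y = [sequence[i + max_len] for i in range(start, start + batch_size)]
    return X, y

def one_hot(index, num_classes):
    # plain-list equivalent of a one-hot encoded label
    return [1 if k == index else 0 for k in range(num_classes)]

toy_sequence = [3, 1, 4, 1, 5, 9, 2, 6]
X, y = make_batch(toy_sequence, start=1, batch_size=2, max_len=3)
y_onehot = [one_hot(label, 10) for label in y]
print(X, y)
```

With a sequence length of 40 and a batch size of 1024, each batch therefore covers 1024 overlapping windows that begin at one random position.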

In [13]:
max_len = 40
train = generator(inp_sequences, max_len, total_words)

### Modeling

In [14]:
def create_model(max_len, total_words):
    input_len = max_len
    model = Sequential()
    
    # Add Input Embedding Layer
    model.add(Embedding(10000, output_dim=40, input_length=input_len))
    
    # Add Hidden Layer - 2 LSTM Layers
    model.add(LSTM(512, dropout=0.1, recurrent_dropout=0.1, return_sequences=True))
    model.add(LSTM(512, dropout=0.1, recurrent_dropout=0.1))

    # Add Output Layer
    model.add(Dense(total_words, activation='softmax'))
    
    # using RMSprop per the suggestion of François Chollet, creator of Keras, in his textbook Deep Learning with Python
    model.compile(loss='categorical_crossentropy', optimizer=RMSprop(learning_rate=0.01))
    
    return model

model = create_model(max_len, total_words)
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (None, 40, 40)            400000    
_________________________________________________________________
lstm (LSTM)                  (None, 40, 512)           1132544   
_________________________________________________________________
lstm_1 (LSTM)                (None, 512)               2099200   
_________________________________________________________________
dense (Dense)                (None, 18725)             9605925   
Total params: 13,237,669
Trainable params: 13,237,669
Non-trainable params: 0
_________________________________________________________________
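
The parameter counts in the summary can be reproduced by hand. The sketch below uses the standard formulas for embedding, LSTM and dense layers; the function names are hypothetical and only serve this check.

```python
def embedding_params(vocab_size, output_dim):
    # one vector of size output_dim per vocabulary entry
    return vocab_size * output_dim

def lstm_params(input_dim, units):
    # four gates, each with input weights, recurrent weights and a bias
    return 4 * ((input_dim + units) * units + units)

def dense_params(input_dim, units):
    # weight matrix plus one bias per output unit
    return input_dim * units + units

counts = [
    embedding_params(10000, 40),   # embedding
    lstm_params(40, 512),          # first LSTM, fed by the 40-dim embeddings
    lstm_params(512, 512),         # second LSTM, fed by the first LSTM
    dense_params(512, 18725),      # softmax output over total_words
]
total = sum(counts)
print(counts, total)
```

More than 70% of the parameters sit in the output layer, because it has one unit for each of the 18,725 entries in `total_words`.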


In [15]:
checkpoint = ModelCheckpoint('weights.hdf5', monitor='loss',
                             verbose=1, save_best_only=True,
                             mode='min')

# reduce learning rate when loss doesn't drop anymore.
reduce_lr = ReduceLROnPlateau(monitor='loss', factor=0.2,
                              patience=1, min_lr=0.001)

callbacks = [checkpoint, reduce_lr]
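
For intuition, this is how the learning rate would evolve if every reduction were triggered. It is a minimal sketch that assumes each step multiplies the rate by `factor` and clamps it at `min_lr`; `plateau_schedule` is a hypothetical helper, and in practice the callback only reduces the rate after the loss has stopped improving for `patience` epochs.

```python
def plateau_schedule(initial_lr, factor, min_lr, reductions):
    # apply the reduction repeatedly, never going below min_lr
    rates = [initial_lr]
    for _ in range(reductions):
        rates.append(max(rates[-1] * factor, min_lr))
    return rates

rates = plateau_schedule(initial_lr=0.01, factor=0.2, min_lr=0.001, reductions=3)
print(rates)
```

Starting from 0.01 with these settings, the rate hits the floor of 0.001 on the second reduction, so the callback has little room to help after that.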

In [16]:
model.fit(train, 
          steps_per_epoch=2000, 
          epochs=20, 
          verbose=1, 
          callbacks=callbacks)

Epoch 1/20
Epoch 00001: loss improved from inf to 4.62882, saving model to weights.hdf5
Epoch 2/20
Epoch 00002: loss improved from 4.62882 to 3.77096, saving model to weights.hdf5
Epoch 3/20
Epoch 00003: loss improved from 3.77096 to 3.72055, saving model to weights.hdf5
Epoch 4/20
Epoch 00004: loss improved from 3.72055 to 3.62527, saving model to weights.hdf5
Epoch 5/20
Epoch 00005: loss improved from 3.62527 to 3.56212, saving model to weights.hdf5
Epoch 6/20
Epoch 00006: loss improved from 3.56212 to 3.50924, saving model to weights.hdf5
Epoch 7/20
Epoch 0

<tensorflow.python.keras.callbacks.History at 0x7fa626d4bd10>

On a Google Cloud GPU, it took around 1,015 s to train an epoch, for a total of about 5.6 h to train this model. My lowest loss was 3.13. After the 13th epoch, the loss had trouble dropping any further. I tested other architectures with more complex layers, but 3.13 was the lowest I could get. I am still trying to figure out why this is the case since, according to multiple researchers, the model should work well once the loss drops below 1.
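
A few back-of-the-envelope numbers help put the loss in context. The sketch below assumes the reported loss is the average categorical cross-entropy in natural-log units (the Keras default), so its exponential can be read as a perplexity; the variable names are only for illustration.

```python
import math

best_loss = 3.13        # lowest training loss reported above
vocab_size = 18725      # output classes in the model summary

# perplexity: the effective number of equally likely next words
perplexity = math.exp(best_loss)

# loss of a model that guesses uniformly over the whole vocabulary
uniform_loss = math.log(vocab_size)

# total training time for 20 epochs at about 1015 s each
hours = 20 * 1015 / 3600

print(round(perplexity, 1), round(uniform_loss, 2), round(hours, 2))
```

A loss of 3.13 corresponds to a perplexity of roughly 23, compared with a loss of about 9.8 for uniform guessing, so the model has learned a lot even though it is still far from a loss of 1.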

### Scripts Generation

In [17]:
# build the function to enable prediction diversity
# credit to keras helper function https://github.com/fchollet/keras/blob/master/examples/lstm_text_generation.py

def sample(preds, temperature=1.0):
    # helper function to sample an index from a probability array
    preds = np.asarray(preds).astype('float64')
    # divide the log-probabilities by the temperature before they are exponentiated again;
    # a higher temperature flattens the distribution (more diverse samples), a lower one sharpens it
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)   
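
The effect of the temperature can be seen on a toy distribution. This is a pure-Python restatement of the rescaling step only (no sampling), with a hypothetical helper name `apply_temperature` and made-up probabilities.

```python
import math

def apply_temperature(probs, temperature):
    # rescale log-probabilities, then renormalise so they sum to 1 again
    scaled = [math.log(p) / temperature for p in probs]
    exps = [math.exp(s) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

toy_probs = [0.7, 0.2, 0.1]
low = apply_temperature(toy_probs, 0.33)   # sharper: the top choice dominates
high = apply_temperature(toy_probs, 1.2)   # flatter: rarer words get more weight

print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

At a low temperature nearly all of the probability mass moves to the most likely word, which would be consistent with the repetitive scripts generated at 0.33; above 1.0 the less likely words are sampled more often.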

In [18]:
# formatting function for generating scripts

def uppercase_char(generated):
    formatted = ''
    lines = generated.split('\n')
    # uppercase the name of the speaking character to be consistent with the original script
    for line in lines:
        char_line = line.split(':')
        # handle lines where the model predicted nothing, or predicted a line without a speaking character
        if len(char_line) == 2:
            formatted_line = char_line[0].upper()+': '+char_line[1].strip().capitalize()+'\n'
        else:
            formatted_line = char_line[0].capitalize()+'\n'
        formatted += formatted_line
    return formatted

In [19]:
# build the generation function that takes a seed text (which character starts talking), the number of words to generate and the temperature (how creative the script should be)

def generate_text(seed_text, next_words, model, max_len, temperature):
    
    generated = ''
    if seed_text:
        generated += seed_text.lower() + ' :'
    
    # if no seed text is provided, randomly select one of the 20 characters with the most lines
    else:
        characters = ['JERRY', 'GEORGE', 'ELAINE', 'KRAMER', 'NEWMAN', 'MORTY', 'HELEN',
       'FRANK', 'SUSAN', 'ESTELLE', 'PETERMAN', 'WOMAN', 'PUDDY', 'MAN',
       'JACK', 'UNCLE LEO', 'MICKEY', 'STEINBRENNER', 'DOCTOR', 'CLERK']
        seed_text = np.random.choice(characters)
        generated += seed_text.lower() + ' :'
    
    for i in range(next_words):
        token_list = tokenizer.texts_to_sequences([generated])[0]
        token_list = pad_sequences([token_list], maxlen=max_len)
        predicted = model.predict(token_list, verbose=0)[0]

        next_index = sample(predicted, temperature)
        # map the index back to the token using the tokenizer function
        next_word = tokenizer.index_word[next_index]

        generated += " " + next_word
        
        
    # format the generated texts
    # replace <newline> back to new line sign to achieve line breaking between generated scripts 
    generated = generated.replace('<newline>', '\n')
    
    # putting the rest of the punctuation back per grammar rules
    punc1 = ['.', ':', '!', ';', ')', ']', '?', ',', '%']
    for i in punc1:
        generated = generated.replace(' '+i, i)
    punc2 = ['[', '(', '$']    
    for i in punc2:
        generated = generated.replace(i+' ', i)
    punc3 = ["'", '-']    
    for i in punc3:
        generated = generated.replace(' '+i+' ', i)
    
    generated = uppercase_char(generated)
    
    return generated
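
The punctuation clean-up at the end of `generate_text` can be tried on its own. The sketch below copies the three replacement passes into a hypothetical standalone helper, `rejoin_punctuation`, and runs it on a made-up token string.

```python
def rejoin_punctuation(text):
    # closing marks attach to the word before them
    for mark in ['.', ':', '!', ';', ')', ']', '?', ',', '%']:
        text = text.replace(' ' + mark, mark)
    # opening marks attach to the word after them
    for mark in ['[', '(', '$']:
        text = text.replace(mark + ' ', mark)
    # apostrophes and hyphens join the words on both sides
    for mark in ["'", '-']:
        text = text.replace(' ' + mark + ' ', mark)
    return text

raw = "jerry : i don ' t know . ( leaving ) what ?"
clean = rejoin_punctuation(raw)
print(clean)
```

These rules are applied purely by position, so they cannot tell an opening quotation mark from a closing one, and a stray token such as `*` keeps its spaces.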

### Evaluation

To test the diversity function, I used temperatures as low as 0.33 and as high as 1.2 to compare the difference. I tried five characters from the show as the starting speaker and generated ten 400-word scripts for each.

From a format perspective, the generated scripts overall follow the pattern of ```speaking character```, followed by ```: ``` and the corresponding line. Because punctuation was separated from the text and added back afterwards, the model didn't learn how punctuation spacing works; the punctuation formatting was coded after the model was trained. In this regard, the character-level model has the advantage, as special characters are trained along with regular ones.

Grammar-wise, no typos were found, which was expected: the generated text is built from real words, so the model didn't need to learn spelling. There are still occasional minor errors, such as a special character in front of a word, e.g. ```*mr*!```. The scripts make grammatical sense and, much of the time, semantic sense as well, which is a big improvement over the character-level model. However, in many scripts the characters still did not finish a complete sentence before moving on to the next line, e.g. ```i don't even know where the hell is--```. There were also cases, although rare, where a script like the following was generated: ```S-e-r-t-n-la-la-la-la-la-la-hoo-t-t-t-t-t-t-t-t-t-t-t-t-t-t-t-t-t-t-t-t-t-t-t-t-t```

Content-wise, thanks to the diversity function I was able to specify how creative I would like the script to be. Based on the generated scripts, the lower the temperature, the more likely the model is to repeat a single prediction, as seen in the character-level model. Those scripts had fewer grammar errors and each line was complete and meaningful, although there was not much plot going on. With a diversity value around 1.2, many interesting scripts are generated, at the cost of grammatical accuracy. The model learned to put most scene descriptions inside parentheses, yet you can sometimes still find words that should definitely be inside parentheses but are not, which is a result of the data not being completely cleaned (as mentioned in the EDA section). What I found very inspiring, however, is that in some scripts the links between lines extended beyond not one, not two, but several lines.
> JERRY: The guy's giving your keys!<br>UNCLE LEO: I want it.. you have your hand with him?<br>JERRY: Yeah! come on. <br>ELAINE: Where? <br>JERRY: Pull? up down three cups.<br>HELEN: So how come you have your own keys?

Helen was able to use the word key again 5 lines after Jerry mentioned it. That's kind of amazing.


Below are excerpts from some other conversations that I found interesting.
> GEORGE: You have a present? <br>ESTELLE: It's a hundred dollars.<br>ESTELLE: The Cadillac with the Seinfelds.<br>GEORGE: A little!<br>ESTELLE: (in front of the the-the white) Oh, the that's the!<br>ESTELLE: What is that?<br>FRANK: That's with the it! The key comes from the living room. It just happens to it! It's just for me!<br>ESTELLE: The Seinfelds couldn't hide it! This was the elevator!<br>ESTELLE: All right, I'll call you.<br>FRANK: (shouting) Elaine, where the hell is your son?

It doesn't make total sense, but this conversation just feels so Costanza! What's also amazing is that the model was able to learn that Frank, Estelle and George usually show up together!

> GEORGE: Man they ordered the Mets! who's your offence?<br>KRAMER: Well he was it. You know he has the yankees!<br>JERRY: Really? Most bucks?<br>KRAMER: (impressed) Oh, sure! They put some nuts. I told you. They're out..

They also (tried to) talk about sports...

> KRAMER: Or one day Jerry has dropping. They're you ladies, as they are still eighty. With one of your friends at Jerry's couch, Look at him. Kramer is down the road with the racquet. He gives out the lot of choice.

And this one sounds totally Kramer.

In conclusion, the model might need some more tuning to get an even better result, but from what I have seen so far, I think it creates a great source of inspiration. The most fascinating part is that the generated texts sound like those characters; in other words, they are consistent with the original.

In [20]:
for i in range(10):
    print(generate_text('jerry', 400, model, 40, 0.33))
    print('======================')

JERRY: S-- motion.
JERRY: I'm not in the middle of the car.
GEORGE: I'm a.
JERRY: You know, i'm not a man.
GEORGE: I'm a.
JERRY: I'm not.
GEORGE: I'm not.
JERRY: I'm not.
GEORGE: I'm not.
JERRY: I'm not.
GEORGE: I'm a.
JERRY: I'm not the guy.
GEORGE: I'm not going to believe this, i'm going to the party.
JERRY: I'm not a.
GEORGE: You're not a.
JERRY: I'm not.
GEORGE: I'm sorry.
JERRY: I'm not.
GEORGE: You're not gonna believe.
JERRY: I'm not.
GEORGE: I'm not.
JERRY: You're not going to believe what's going on.
GEORGE: I'm not going to do anything.
JERRY: Well,

In [21]:
for i in range(10):
    print(generate_text('george', 400, model, 40, 1.2))
    print('======================')

GEORGE: In his clinic, i'm not giving him anything made away if a)
KRAMER: She has something more than that in lady-me in sleeves.
JERRY: I wish you're quite.
KRAMER: There's no guy. and you see? that's a shame in your apartment. okay, i'm sorry.
JERRY: I went down to the now's there,
NEWMAN: Look at that, ten minutes.
JERRY: Gimme this * my * cab this strange! i could give it my life for you. i will see to go out the...
GEORGE: Heh.. wait. i can't wait! who's your head for a vacuum cleaner? we like you! the store's huge!
KRAMER: (awful laughter) that is her.
ELAINE: All right, go ahead.
KRAMER: Well, come on. look, you want a very strange sweet. kramer gets me anything over here. (leaving) george and the driver in there is an attractive woman in conference he sees a car.
KRAMER: I don't know what you said. but everybody.
ELAINE: Well, you took yourself out tonight?
KRAMER: What did you say?
ELAINE: Just down with her again.
KRAMER: Well anybody know me. why wouldn't they talk us out f

In [22]:
for i in range(10):
    print(generate_text('elaine', 400, model, 40, 0.77))
    print('======================')

ELAINE: Come on out of the way.
KRAMER: What?
ELAINE: What are you talking about?
KRAMER: Oh, i felt so.
ELAINE: What are you talking about?
KRAMER: Well nothing.
ELAINE: I talked to a woman, but i feel like you.
ELAINE: How's the world?
KRAMER: Yeah.
JERRY: You was a--
ELAINE: So, uh, what was she left?
ELAINE: I was thinking.
JERRY: I thought you were saying.
ELAINE: I thought she was trying to act the woman.
JERRY: Maybe you could?
ELAINE: I mean, i was just trying to do it...
JERRY: Hey, elaine.
ELAINE: Hey.
JERRY: Hey.
ELAINE: Hey.
JERRY: Hey.
ELAINE: Hey
JERRY: What happened?
ELAINE: What?
JERRY: You don't see you?
ELAINE: What?
JERRY: I'm a.
ELAINE: You're a doctor?
JERRY: I have to go to the.
ELAINE: No.
JERRY: Just!
ELAINE: Really?
JERRY: Yeah.
ELAINE: You're saying so?!
JERRY: Oh, see.
ELAINE: ... but i'm just being ridiculous.
JERRY: What?
ELAINE: I said something.
JERRY: What does he mean?
ELAINE: He's a woman.
JERRY: Oh, no.
ELAINE: So, i guess you're being getting
JERRY: 

In [23]:
for i in range(10):
    print(generate_text('kramer', 400, model, 40, 1.1))
    print('======================')

KRAMER: Or one day jerry has dropping. they're you ladies, as they are still eighty. with one of your friends at jerry's couch, look at him. kramer is down the road with the racquet. he gives out the lot of choice.
KRAMER: I'll pick up your your back. no one-in a president.
JERRY: I've taken a cold? i get a pair of those situations.
KRAMER: Oh, my... yeah. you see two guys. anyway, that's too bad, really? (everybody course) the club nods it up into your head, when i mean anything any more your it, right with the?
GEORGE: Dropped it. so what, this is me and everything? this is very nice
GEORGE: Huh and all right. there's a new plates.
GEORGE: George. arnie's off! they've got!! they had all these paramount for the cars.
GEORGE: What? why does this woman have to use the table?
JERRY: I talked to her in that area.
GEORGE: Where's no party in your head?
JERRY: I'm down. george comes in
KRAMER: Alright. c'mon. but george it is is good.
JERRY: Yeah. george puts the paper on.
KRAMER: Now, you 

In [24]:
for i in range(10):
    print(generate_text('frank', 400, model, 40, 0.8))
    print('======================')

FRANK: S-they-- he's not running anything.
KRAMER: All right!!
JERRY: You're not gonna have anything i think about i am.
ELAINE: What about it?
JERRY: I don't know, elaine.
ELAINE: You can't believe it.
JERRY: You're not a stupid fan. you could close down the day from the coffee shop.
ELAINE: I never'm the same!
JERRY: Well, what have you got?
ELAINE: You know?
JERRY: Look at this.
ELAINE: Is it possible?
JERRY: Yeah, it's a great.
ELAINE: Well, you know... you know, i'm gonna ask you something.
JERRY: So, i feel you and it done.
ELAINE: It was me!
JERRY: What is it?
ELAINE: My friend, jerry.
JERRY: Kramer, i'm sorry about it.
ELAINE: Mine's already in the mood.
JERRY: Elaine. you know what? i've never seen it.
ELAINE: No, but i've always got the to this woman for this.
JERRY: Could you be able to do it?
ELAINE: Yeah, i'm gonna keep it. i'm letting him get outta there with him.
JERRY: Look, you're not gonna do it.
ELAINE: I'm sure he's a.
JERRY: So, you're a good town?
ELAINE: No. not 

In [25]:
for i in range(10):
    print(generate_text('', 400, model, 40, 0.6))
    print('======================')

SUSAN: S-e-boom-t-hoo.
JERRY: Well, this is, i'm not a--
GEORGE: That's a.
JERRY: I think we're going to the party.
GEORGE: Yeah, yeah. but they're going to be good, and all the way back.
JERRY: So, you're going to wait to get out of this?
GEORGE: I got a problem with me.
JERRY: You ever notice.
GEORGE: I'm sorry.
JERRY: I'm not.
GEORGE: Well, you know, jerry, i think i'm going to town.
JERRY: No, i'm not.
GEORGE: I'm not.
JERRY: Well, actually, i'm not gonna have to do anything. i don't even know where the hell is--
GEORGE: You mean a lot?
JERRY: A couple of days ago, kramer.
KRAMER: Well, i'm going to go to the bathroom.
JERRY: What the hell is on?
KRAMER: I'm putting the big heads.
JERRY: What is this?
KRAMER: It's been on the street.
JERRY: What?
KRAMER: It's a.
JERRY: You're going to be in the car?
KRAMER: Yeah, yeah.
JERRY: I'm sorry. i'm sorry.
KRAMER: It's all over.
JERRY: Yeah, we're going to be here.
KRAMER: Oh, you know what i'm doing here?
JERRY: No, no.
KRAMER: What?
JERRY

### Save Model

In [None]:
# Save the model in both the full and the lightweight (.h5) version
model.save('../assets/lstm2_best/inputdim')
model.save('../assets/lstm2_best/inputdim.h5')

# Save the model architecture
model_json = model.to_json()
with open("../assets/lstm2_best/inputdim_config.json", "w") as config:
    config.write(model_json)

# Save the tokenizer to json
import io

tokenizer_json = tokenizer.to_json()
with io.open("../assets/lstm2_best/inputdim_tokenizer.json", "w") as token:
    token.write(json.dumps(tokenizer_json, ensure_ascii=False))

print("Saved model to disk")

Note that Python 3.7+ will cause a load error for ```.h5``` files.

In [None]:
# from tensorflow.keras.models import load_model, model_from_json
# from tensorflow.keras.preprocessing import text
# import json

# model = load_model('../assets/lstm2_best/inputdim')

# with open('../assets/lstm2_best/inputdim_tokenizer.json') as f:
#     data = json.load(f)
#     tokenizer = text.tokenizer_from_json(data)