# **Uploading Files onto Google Colab**

The word embeddings have been uploaded to Google Drive in a folder called **Word2Vec**.

They were trained with the Gensim module in both 200- and 300-dimensional versions, with a window size of 5 and 500 iterations.

Here, we will access those files.

Note: The URL of the Word2Vec folder in my Google Drive is https://drive.google.com/drive/folders/1sdDeXX3XTdJg5tEnQNhmjNmol7DPzOZH

That is why the folder ID in the q parameter is set to **1sdDeXX3XTdJg5tEnQNhmjNmol7DPzOZH**.

To run this yourself, make a copy of the Word2Vec folder, put it in your Google Drive's 'Colab Notebooks' folder, and change the ID in the q parameter to the last segment of your copy's URL.
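As a small sketch of that last step, the folder ID can be pulled out of the URL and turned into the q string programmatically. The helper names `drive_folder_id` and `children_query` are my own and are not part of PyDrive; the query format is the one used in the cell below.

```python
from urllib.parse import urlparse

def drive_folder_id(url):
    """Return the last path segment of a Drive folder URL (the folder ID)."""
    path = urlparse(url).path.rstrip("/")
    return path.split("/")[-1]

def children_query(folder_id):
    """Build the q string that lists the files directly inside a folder."""
    return "'{}' in parents".format(folder_id)

url = "https://drive.google.com/drive/folders/1sdDeXX3XTdJg5tEnQNhmjNmol7DPzOZH"
folder_id = drive_folder_id(url)
print(folder_id)
print(children_query(folder_id))
```

Because only the URL path is inspected, a trailing query string such as `?usp=sharing` does not affect the result.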

In [1]:
!pip install -U -q PyDrive
import os
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

local_download_path = os.path.expanduser('~/Word2Vec')
os.makedirs(local_download_path, exist_ok=True)  # No error if the folder already exists.

file_list = drive.ListFile(
    {'q': "'1sdDeXX3XTdJg5tEnQNhmjNmol7DPzOZH' in parents"}).GetList()

for f in file_list:
  # Create a file handle by ID, then download its contents.
  print('title: %s, id: %s' % (f['title'], f['id']))
  fname = os.path.join(local_download_path, f['title'])
  print('downloading to {}'.format(fname))
  f_ = drive.CreateFile({'id': f['id']})
  f_.GetContentFile(fname)

title: Bad Sentences.csv, id: 1zcZm3griSFElsa2te1Hhd5lpAl7tl5vI
downloading to /root/Word2Vec/Bad Sentences.csv
title: Good Sentences.csv, id: 1dSk6Zq20YnP51fQqqjR65eERDaFRSrdb
downloading to /root/Word2Vec/Good Sentences.csv
title: Tokenized Bad Sentences.csv, id: 1n5XEE1aHdoAfBxrqXWn-yEeEjp2WUj1B
downloading to /root/Word2Vec/Tokenized Bad Sentences.csv
title: word2vec_Good_300, id: 1zWEoKRHFRToBzqkLLvA00N2HBcDvG_4F
downloading to /root/Word2Vec/word2vec_Good_300
title: word2vec_Good_200, id: 1NT7o_yI1IAHfGFcw9Wrq3CKNeC9md4T2
downloading to /root/Word2Vec/word2vec_Good_200
title: word2vec_Bad_300, id: 1zv6AxhipBjhD1rwID1HTE5zncVJPVGms
downloading to /root/Word2Vec/word2vec_Bad_300
title: word2vec_Bad_200, id: 1Giw1gcBYoncfYMGi6LKtEJIeYZ_a2PcC
downloading to /root/Word2Vec/word2vec_Bad_200
title: Tokenized Good Sentences.csv, id: 1YPAgpNciFdTAKQUsrcRs0RLaivevlNoT
downloading to /root/Word2Vec/Tokenized Good Sentences.csv
title: Tokenized Bad Sentences.csv, id: 1S8P8EC47oT46b7ql-xht2QT

# Importing Tokenized Sentences

I have also uploaded two CSV files, **Tokenized Bad Sentences.csv** and **Tokenized Good Sentences.csv**, which can be found in the Word2Vec folder.

Below is the code to read in those files.

In [2]:
import csv
with open('/root/Word2Vec/Bad Sentences.csv', 'r', encoding="utf-8") as f:
    reader = csv.reader(f)
    badSentences = list(reader)
    
with open('/root/Word2Vec/Good Sentences.csv', 'r', encoding="utf-8") as f:
    reader = csv.reader(f)
    goodSentences = list(reader)
    
print(badSentences[0:10])   
print(goodSentences[0:10])  

[['oh', ',', 'shit', '.'], ['you', 'just', 'got', 'wolfed', '.'], ['what', '?'], ['that', 'is', 'an', 'official', 'trademark', 'that', 'i', 'am', 'getting', 'registered', '.'], ['it', "'s", 'a', 'lot', 'of', 'stuff', 'you', 'got', 'ta', 'do', ',', 'hoops', 'you', 'got', 'ta', 'jump', 'through', '.'], ['got', 'ta', 'get', 'on', 'the', 'internet', '.'], ['got', 'ta', 'go', 'to', 'some', 'stupidass', 'website', 'where', 'you', 'register', 'a', 'catch', 'phrase', '.'], ['i', 'wanted', '``', 'bam', ',', "''", 'but', 'emeril', 'had', 'taken', 'it', '.'], ['i', "'m", 'rambling', ',', 'man', '.'], ['get', 'up', ',', 'man', '.']]
[['mr.', 'dufresne', ',', 'describe', 'the', 'confrontation', 'you', 'had', 'with', 'your', 'wife', 'the', 'night', 'she', 'was', 'murdered', '.'], ['it', 'was', 'very', 'bitter', '.'], ['she', 'said', 'she', 'was', 'glad', 'i', 'knew', ',', 'that', 'she', 'hated', 'all', 'the', 'sneaking', 'around', '.'], ['and', 'she', 'said', 'that', 'she', 'wanted', 'a', 'divorce',
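
To make the file layout concrete, here is a minimal sketch that assumes each CSV row holds one sentence with one token per cell, which is what the output above suggests. The two rows are copied from that output and written to an in-memory buffer instead of a real file.

```python
import csv
import io

# Two tokenized sentences taken from the output above.
rows = [["what", "?"], ["get", "up", ",", "man", "."]]

# Write them the way a tokenized CSV would be stored, then read them back.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)
buffer.seek(0)

sentences = list(csv.reader(buffer))
print(sentences)
```

The comma token survives the round trip because the csv module quotes it on writing, so `list(reader)` gives back one list of tokens per sentence.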

# **Installing gensim**

Gensim is the Python module used to train the word2vec embeddings. Here is how to load the trained models.

In [3]:
!pip install -q gensim
from gensim.models import Word2Vec
model_Bad = Word2Vec.load("/root/Word2Vec/word2vec_Bad_300")
model_Good = Word2Vec.load("/root/Word2Vec/word2vec_Good_300")
print(model_Bad)
print(model_Good)

Word2Vec(vocab=29504, size=300, alpha=0.025)
Word2Vec(vocab=30426, size=300, alpha=0.025)


# **Similar Vectors**

Once the word2vec embeddings are loaded, you can view the words whose vectors are most similar to that of a given word.

In [4]:
for i in model_Bad.wv.most_similar(positive=['good']):
  print(i)

print()

for i in model_Good.wv.most_similar(positive=['good']):
  print(i)

('bad', 0.42860960960388184)
('nice', 0.40062108635902405)
('like', 0.3491782248020172)
('tough', 0.3429213762283325)
('great', 0.34254634380340576)
('better', 0.28865498304367065)
('big', 0.28429123759269714)
('weird', 0.2701437175273895)
('happy', 0.26822713017463684)
('hard', 0.26718002557754517)

('bad', 0.5157124996185303)
('nice', 0.42234158515930176)
('great', 0.3885391056537628)
('smart', 0.3636835813522339)
('fine', 0.3554360568523407)
('tough', 0.34922707080841064)
('big', 0.3384079933166504)
('funny', 0.33056941628456116)
('hard', 0.3305012583732605)
('tempting', 0.31914812326431274)
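
The scores above are cosine similarities between word vectors. As a rough sketch of the ranking idea (not Gensim's implementation), the same lookup can be written in plain Python. The two-dimensional `toy_vectors` are made up for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(word, vectors, topn=2):
    """Rank every other word by cosine similarity to `word`."""
    query = vectors[word]
    scores = [(other, cosine_similarity(query, vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]

# Hypothetical 2-dimensional vectors, not taken from the trained models.
toy_vectors = {
    "good": [1.0, 0.0],
    "nice": [0.8, 0.6],
    "bad": [0.6, 0.8],
    "door": [0.0, 1.0],
}

ranked = most_similar("good", toy_vectors)
for pair in ranked:
    print(pair)
```

Cosine similarity measures how often words appear in similar contexts, not whether they mean the same thing, which is one plausible reason 'bad' ranks so close to 'good' in both models.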




# **Word2Vec Weights onto Keras**

Because we are going to use Keras to train an RNN, here is how to extract the pretrained weights of the word embeddings so they can be used in the neural network.

In [5]:
from keras.layers import Embedding

pretrained_weights_Bad = model_Bad.wv.vectors 
pretrained_weights_Good = model_Good.wv.vectors

embeddingBad = model_Bad.wv.get_keras_embedding()

embeddingGood = model_Good.wv.get_keras_embedding()

print(pretrained_weights_Bad)

Using TensorFlow backend.


[[ 1.25372    -1.2052641  -0.25990063 ... -0.976092    1.8551047
   1.3307298 ]
 [ 1.4823085  -1.2304381  -0.9231364  ... -1.6590116   1.3164423
   0.8863562 ]
 [ 2.0804706  -0.7694774   0.17020172 ... -0.5701194   1.6237695
   1.2792978 ]
 ...
 [ 0.6441616   0.12467387  0.17375445 ...  0.27845207  0.394679
  -0.18595086]
 [ 0.70862037  0.14693536  0.24079292 ...  0.2610408   0.41543
  -0.17618927]
 [-0.08563908  0.16787037 -0.30501494 ...  0.32231143  0.14400984
   0.1407564 ]]
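
Conceptually, an embedding layer built from these weights acts as a lookup table: row i of the matrix is the vector for the word with index i. The sketch below shows that idea in plain Python with a made-up 4-word, 3-dimensional matrix. It is an illustration of the lookup, not the Keras layer itself.

```python
# Hypothetical weight matrix: 4 words, 3 dimensions each.
weights = [
    [0.0, 0.0, 0.0],  # index 0
    [0.1, 0.2, 0.3],  # index 1
    [0.4, 0.5, 0.6],  # index 2
    [0.7, 0.8, 0.9],  # index 3
]

def embed(indices, weights):
    """Replace each word index with its row of the weight matrix."""
    return [weights[i] for i in indices]

embedded = embed([2, 0, 3], weights)
print(embedded)
```

A sequence of word indices therefore becomes a sequence of vectors with one row per word, which is the input the LSTM consumes.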


# LSTM Neural Network

Here, we design the architecture for the neural network. You will want to tinker with this to get something that trains in a reasonable amount of time but still performs well.

In [7]:
from keras.models import Sequential
from keras.layers import Activation, Dense, Bidirectional, Dropout
from keras.layers import LSTM

vocab_size, embedding_size = pretrained_weights_Bad.shape
vocab_size_g, embedding_size_g = pretrained_weights_Good.shape

# This is where you define the models for the bad movie neural net and the good movie neural net.
# It is important that the models are separate so you don't fit one model to both datasets.
# For consistency's sake, make sure both models have the same parameters.
def get_bad_movie_model():
  model = Sequential()
  model.add(embeddingBad)
  model.add(Bidirectional(LSTM(units=128))) # If you want a non-bidirectional LSTM, just remove the Bidirectional()
                                            # Bidirectional significantly increases the training time, especially if 
                                            # you increase layers/units
  model.add(Dropout(rate=0.5)) # Kind of high, but important to avoid overfitting.
  model.add(Dense(units=vocab_size))
  model.add(Activation('softmax'))
  print(model.summary())
  return model


def get_good_movie_model():
  model = Sequential()
  model.add(embeddingGood)
  model.add(Bidirectional(LSTM(units=128)))
  model.add(Dropout(rate=0.5))
  model.add(Dense(units=vocab_size_g))
  model.add(Activation('softmax'))
  print(model.summary())
  return model

bad_model = get_bad_movie_model()
good_model = get_good_movie_model()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, None, 300)         8851200   
_________________________________________________________________
bidirectional_1 (Bidirection (None, 256)               439296    
_________________________________________________________________
dropout_1 (Dropout)          (None, 256)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 29504)             7582528   
_________________________________________________________________
activation_1 (Activation)    (None, 29504)             0         
Total params: 16,873,024
Trainable params: 8,021,824
Non-trainable params: 8,851,200
_________________________________________________________________
None
_________________________________________________________________
Layer (type)                 Output Shape           
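
The parameter counts in the first summary can be checked by hand. The sketch below uses the standard formulas for embedding, LSTM, and dense layers with the sizes from the bad movie model (vocabulary 29504, embedding size 300, 128 LSTM units). The function names are my own.

```python
def embedding_params(vocab_size, embedding_size):
    # One vector per vocabulary entry.
    return vocab_size * embedding_size

def lstm_params(input_size, units):
    # Four gates, each with input weights, recurrent weights, and a bias.
    return 4 * (units * (input_size + units) + units)

def dense_params(input_size, units):
    # Weight matrix plus one bias per output unit.
    return input_size * units + units

vocab_size, embedding_size, units = 29504, 300, 128

emb = embedding_params(vocab_size, embedding_size)
bi = 2 * lstm_params(embedding_size, units)   # forward and backward LSTM
dense = dense_params(2 * units, vocab_size)   # bidirectional output is 256 wide

print("embedding:", emb)
print("bidirectional:", bi)
print("dense:", dense)
print("total:", emb + bi + dense)
print("trainable:", bi + dense)
```

The embedding weights match the non-trainable count in the summary, so only the LSTM and dense layers are updated during training. The dense softmax layer is nearly as large as the embedding because it has one output per vocabulary word.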

## Splitting the Training and Test Sets

In [0]:
import numpy as np
import keras

MAX_SEQ_LEN = 200

np.random.seed(42) # Seed needs to be set for consistent results, as otherwise the training/test set change every run.
rand_b = np.random.permutation(len(badSentences))
rand_g = np.random.permutation(len(goodSentences))

bad_sentence_array = np.zeros((len(badSentences), MAX_SEQ_LEN), dtype=np.int32)
good_sentence_array = np.zeros((len(goodSentences), MAX_SEQ_LEN), dtype=np.int32)

next_word_bad = np.zeros((len(bad_sentence_array)), dtype=np.int32)
next_word_good = np.zeros((len(good_sentence_array)), dtype=np.int32)
# 0 index in wv.index2word is '.', indicating end of sentence.

for s, sentence in enumerate(badSentences):
    for w, word in enumerate(sentence):
        if w >= MAX_SEQ_LEN:
            break
        if w < len(sentence)-2:
            bad_sentence_array[s][w] = model_Bad.wv.vocab[word].index
        else:
            next_word_bad[s] = model_Bad.wv.vocab[word].index  
            break

for s, sentence in enumerate(goodSentences):
    for w, word in enumerate(sentence):
        if w >= MAX_SEQ_LEN:
            break
        if w < len(sentence)-2:
            good_sentence_array[s][w] = model_Good.wv.vocab[word].index
        else:
            next_word_good[s] = model_Good.wv.vocab[word].index
            break

bad_sentence_array = bad_sentence_array[rand_b]
next_word_bad = next_word_bad[rand_b]
good_sentence_array = good_sentence_array[rand_g]
next_word_good = next_word_good[rand_g]

train_x = bad_sentence_array[:int(0.8 * len(bad_sentence_array))]
train_y = next_word_bad[:int(0.8 * len(next_word_bad))]
train_x_good = good_sentence_array[:int(0.8 * len(good_sentence_array))]
train_y_good = next_word_good[:int(0.8 * len(next_word_good))]

test_x = bad_sentence_array[int(0.8 * len(bad_sentence_array)):]
test_y = next_word_bad[int(0.8 * len(next_word_bad)):]
test_x_good = good_sentence_array[int(0.8 * len(good_sentence_array)):]
test_y_good = next_word_good[int(0.8 * len(next_word_good)):]

bad_sentence_array = None
good_sentence_array = None
next_word_bad = None
next_word_good = None
# Memory is limited, and all of these are quite large.
# Need them to be GCed.
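
To make the encoding above easier to follow, here is a small sketch of the same idea on one sentence: every token before the second-to-last becomes input, the second-to-last token (the last word before the final punctuation) becomes the target, and the rest of the row stays zero-padded. The tiny `word_to_index` vocabulary and the shortened `MAX_SEQ_LEN` are made up for illustration, and the 80/20 split uses the standard library instead of NumPy.

```python
import random

MAX_SEQ_LEN = 6  # Small value for illustration; the notebook uses 200.

def encode(sentence, word_to_index):
    """Return (zero-padded prefix, target index) for one tokenized sentence."""
    prefix = [0] * MAX_SEQ_LEN
    target = 0
    for w, word in enumerate(sentence):
        if w >= MAX_SEQ_LEN:
            break
        if w < len(sentence) - 2:
            prefix[w] = word_to_index[word]
        else:
            target = word_to_index[word]
            break
    return prefix, target

def split_80_20(items, seed=42):
    """Shuffle with a fixed seed, then cut at 80%."""
    order = list(range(len(items)))
    random.Random(seed).shuffle(order)
    shuffled = [items[i] for i in order]
    cut = int(0.8 * len(items))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical vocabulary; index 0 is '.', as in the trained models.
word_to_index = {".": 0, "i": 1, "got": 2, "it": 3}
prefix, target = encode(["i", "got", "it", "."], word_to_index)
print(prefix, target)

train, test = split_80_20(list(range(10)))
print(len(train), len(test))
```

Note that index 0 doubles as both '.' and padding, which is why the later cells stop decoding a row at the first zero.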


## Callbacks

This cell sets up the callbacks for early stopping and for saving checkpoints to Google Drive. It uses the callback implemented in [this](https://github.com/Zahlii/colab-tf-utils) repository because, due to the way Google Colab works, there is no native way to save model checkpoints during training and retrieve them later.

In [7]:
!wget https://raw.githubusercontent.com/Zahlii/colab-tf-utils/master/utils.py
import utils
import os
from keras.callbacks import EarlyStopping, ModelCheckpoint

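def compare(best, new):
  # Returns True when the new checkpoint has a higher validation accuracy than the best so far.
  if not best.losses['val_acc']:
    print("Not best")
  if not new.losses['val_acc']:
    print("Not new")
  return best.losses['val_acc'] < new.losses['val_acc']

# Each path function returns a file name only when validation accuracy exceeds 0.20;
# otherwise it implicitly returns None.
def path_b(new):
  if new.losses['val_acc'] > 0.20:
    return 'bad_movie_model_%s.h5' % new.losses['val_acc']

def path_g(new):
  if new.losses['val_acc'] > 0.20:
    return 'good_movie_model_%s.h5' % new.losses['val_acc']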

early_stop_b = EarlyStopping(monitor='val_acc', patience=5, verbose=1)
early_stop_g = EarlyStopping(monitor='val_acc', patience=5, verbose=1)

cb_b = [
    utils.GDriveCheckpointer(compare,path_b),
    keras.callbacks.TensorBoard(log_dir=os.path.join(utils.LOG_DIR,'bad_movie_model')),
    early_stop_b
]

cb_g = [
    utils.GDriveCheckpointer(compare,path_g),
    keras.callbacks.TensorBoard(log_dir=os.path.join(utils.LOG_DIR,'good_movie_model')),
    early_stop_g
]

--2018-12-19 00:28:58--  https://raw.githubusercontent.com/Zahlii/colab-tf-utils/master/utils.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6935 (6.8K) [text/plain]
Saving to: ‘utils.py.5’


2018-12-19 00:28:58 (71.4 MB/s) - ‘utils.py.5’ saved [6935/6935]

--2018-12-19 00:29:04--  https://raw.githubusercontent.com/mixuala/colab_utils/master/tboard.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5214 (5.1K) [text/plain]
Saving to: ‘tboard.py’


2018-12-19 00:29:04 (43.8 MB/s) - ‘tboard.py’ saved [5214/5214]

ngrok installed
status: t
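
The early stopping callbacks above use a patience of 5 on validation accuracy. As a simplified sketch of that rule (not the Keras implementation), training stops once the monitored value has failed to improve for `patience` consecutive epochs. The accuracy history below is made up.

```python
def early_stopping_epoch(val_acc_history, patience=5):
    """Return the 0-based epoch at which training would stop, or None."""
    best = float("-inf")
    wait = 0
    for epoch, val_acc in enumerate(val_acc_history):
        if val_acc > best:
            best = val_acc   # Improvement: remember it and reset the counter.
            wait = 0
        else:
            wait += 1        # No improvement this epoch.
            if wait >= patience:
                return epoch
    return None

# Hypothetical validation accuracies, one per epoch.
history = [0.10, 0.15, 0.20, 0.19, 0.20, 0.18, 0.19, 0.20, 0.21]
stop_epoch = early_stopping_epoch(history)
print(stop_epoch)
```

In this made-up history the best value is reached at epoch 2 and only matched afterwards, so training stops at epoch 7 and never sees the improvement at epoch 8. A larger patience trades longer training for a lower chance of stopping on a plateau.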

## Training the Neural Network


### Note
To load a checkpoint, you need the file ID of the file you want to download. For example, if my checkpoint's URL is https://drive.google.com/file/d/abcd1234, the ID is **abcd1234**.

To get the URL for a specific file, right-click it and select 'Get Shareable Link'.

In [0]:
# This is needed to make sure we are still authenticated and don't throw an Exception when we try to upload/download to Drive
from google.colab import files
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

# CHANGE THESE IF YOU ARE RESUMING TRAINING FROM A CERTAIN EPOCH.
BAD_MOVIE_INIT_EPOCH = 0
GOOD_MOVIE_INIT_EPOCH = 0

# CODE BLOCK FOR LOADING THE MODEL FROM AN EXISTING SAVED MODEL
# UNCOMMENT THIS IF YOU ARE LOADING A MODEL FROM A CHECKPOINT TO CONTINUE TRAINING
###################################################################
# from keras.models import load_model
# drive_chk_bad = drive.CreateFile({'id': 'ID_GOES_HERE'})
# drive_chk_bad.GetContentFile('chkpt_bad.h5')
# drive_chk_good = drive.CreateFile({'id': 'ID_GOES_HERE'})
# drive_chk_good.GetContentFile('chkpt_good.h5')

# bad_model = load_model('chkpt_bad.h5')
# good_model = load_model('chkpt_good.h5')
###################################################################
# Since these models are checkpoints, they still need to be compiled. So keep the lines directly below.

# compile model
good_model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
bad_model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])


# fit model
bad_model.fit(train_x, train_y, validation_split = 0.1, batch_size=128, epochs=50, callbacks=cb_b,
             initial_epoch = BAD_MOVIE_INIT_EPOCH) 
# save the model to file
bad_model.save('bad_movie_model.h5')

# Re-authenticate, because there is a good chance you will no longer be authenticated after training the model
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

# files.download('bad_movie_model.h5')
# Uncomment if you want it to download the model to your local machine after training

# Save file to Drive
b_file = drive.CreateFile()
b_file.SetContentFile('bad_movie_model.h5')
b_file.Upload()


#repeat but for good movies
good_model.fit(train_x_good, train_y_good, validation_split = 0.1, batch_size=128, epochs=50, callbacks=cb_g,
              initial_epoch = GOOD_MOVIE_INIT_EPOCH)
good_model.save('good_movie_model.h5')

auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

# files.download('good_movie_model.h5')
# Uncomment if you want to download the model to your local machine after training

g_file = drive.CreateFile()
g_file.SetContentFile('good_movie_model.h5')
g_file.Upload()


# Error Analysis Section

This section includes the perplexity calculation, predicted words for a subset of the test set, and sentence generation.

## Loading Model
Use this cell if you already have a trained model and just want to load it instead of running the training cell above.

In [0]:
# The IDs left there are for the copy of the final model saved to Leo's Google Drive.
from keras.models import load_model
drive_file_bad = drive.CreateFile({'id': '1eZat2inYSL2nSOOYW1dkU3tSK0S1Ex5o'})
drive_file_bad.GetContentFile('mdl_bad.h5')
drive_file_good = drive.CreateFile({'id': '1yyCarcA3OFSy9vNB2ofon5T2jqmA7LvE'})
drive_file_good.GetContentFile('mdl_good.h5')

bad_model = load_model('mdl_bad.h5')
good_model = load_model('mdl_good.h5')

## Cross-Entropy and Perplexity

In [9]:
from keras.losses import sparse_categorical_crossentropy
from keras.backend import pow, constant, eval, mean

bad_movie_samples = test_x.shape[0]
good_movie_samples = test_x_good.shape[0]
SCALE_FACTOR = 5 # Proportion of test set used for perplexity calculation = 1 / SCALE_FACTOR
print("Amount of bad movie lines tested: ", bad_movie_samples // SCALE_FACTOR)
print("Amount of good movie lines tested: ", good_movie_samples // SCALE_FACTOR)


# This block can be a bit slow and very memory-hungry.
#############################################
# Predict for approximately 20% of the test set. Using too much tends to result in the Colab VM running out of memory and crashing.
# Although it's a bit inconsistent, since Colab VMs do not always have the same amount of memory available.
# Anything smaller than 4 for the scale factor usually crashes, 4 will sometimes crash as well. 5 is generally safe.
preds_bad = constant(bad_model.predict(test_x[:bad_movie_samples // SCALE_FACTOR]))
preds_good = constant(good_model.predict(test_x_good[:good_movie_samples // SCALE_FACTOR]))

#Cross-entropy on movie lines
cx_b = mean(sparse_categorical_crossentropy(test_y[:bad_movie_samples // SCALE_FACTOR], preds_bad))
cx_g = mean(sparse_categorical_crossentropy(test_y_good[:good_movie_samples // SCALE_FACTOR], preds_good))
##############################################


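# Note: Keras computes cross-entropy with the natural logarithm (nats), so the
# standard perplexity would be exp(cx). The values printed below use base 2.
perplexity_b = pow(2.0, cx_b)
perplexity_g = pow(2.0, cx_g)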


print("Perplexity of bad movie model on bad movie lines: ", eval(perplexity_b))
print("Perplexity of good movie model on good movie lines: ", eval(perplexity_g))


Amount of bad movie lines tested:  3684
Amount of good movie lines tested:  5009
Perplexity of bad movie model on bad movie lines:  111.54793
Perplexity of good movie model on good movie lines:  106.30434
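
Perplexity is the exponential of the average cross-entropy, and the base of the exponent has to match the base of the logarithm used for the cross-entropy. The sketch below works through a toy case in plain Python. It assumes, as I understand the Keras loss, that cross-entropy is reported in nats. The probabilities are made up.

```python
import math

def cross_entropy_nats(true_word_probs):
    """Average negative natural-log probability assigned to the true words."""
    return -sum(math.log(p) for p in true_word_probs) / len(true_word_probs)

def perplexity_from_nats(h):
    return math.exp(h)

def perplexity_from_bits(h):
    return 2.0 ** h

# Hypothetical model that always gives the true word probability 1/4.
probs = [0.25, 0.25, 0.25, 0.25]
h_nats = cross_entropy_nats(probs)   # ln(4)
h_bits = h_nats / math.log(2)        # 2 bits

print(perplexity_from_nats(h_nats))
print(perplexity_from_bits(h_bits))
print(2.0 ** h_nats)                 # Mixing bases gives a smaller number.
```

Both matched-base versions recover a perplexity of 4, the intuitive answer for a model choosing uniformly among four words. Raising 2 to a cross-entropy measured in nats gives a lower value, so the two figures are not directly comparable.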


## Predicting Next Word of Sentence
In this section, for the first 100 sentences in both the good movie and bad movie test sets, we have the respective model predict what the next word should be, and compare it to the actual word that occurs.

After doing so, we then show what sentence was input to the model, and what word it produced as well as what the ground truth was.

In [64]:
from keras.backend import argmax
sentence_arr = test_x[0]
# print(model_Bad.wv.index2word[eval(argmax(preds_bad[0]))])

print("------Bad Movie Model------")
for i, sentence in enumerate(test_x[0:100]):
    str_build = ""
    for word in sentence:
        if word == 0:
            break
        str_build += model_Bad.wv.index2word[word]
        str_build += " "
    print("Starting string: '" + str_build + "'")
    prediction = model_Bad.wv.index2word[eval(argmax(preds_bad[i]))]
    print("Predicted end word: '" + prediction + "'")
    print("Actual end word: '" + model_Bad.wv.index2word[test_y[i]] + "'\n")

print("\n\n------Good Movie Model------")
for i, sentence in enumerate(test_x_good[0:100]):
    str_build = ""
    for word in sentence:
        if word == 0:
            break
        str_build += model_Good.wv.index2word[word]
        str_build += " "
    print("Starting string: '" + str_build + "'")
    prediction = model_Good.wv.index2word[eval(argmax(preds_good[i]))]
    print("Predicted end word: '" + prediction + "'")
    print("Actual end word: '" + model_Good.wv.index2word[test_y_good[i]] + "'\n")

------Bad Movie Model------
Starting string: 'maintain '
Predicted end word: 'it'
Actual end word: 'perimeter'

Starting string: ''
Predicted end word: '.'
Actual end word: '.'

Starting string: 'i got '
Predicted end word: 'it'
Actual end word: 'it'

Starting string: 'yeah , '
Predicted end word: 'yeah'
Actual end word: 'mike'

Starting string: 'why did n't you tell '
Predicted end word: 'me'
Actual end word: 'me'

Starting string: 'you 'll have to climb the ice pillar and get '
Predicted end word: 'it'
Actual end word: 'it'

Starting string: 'jane , there is no '
Predicted end word: 'choice'
Actual end word: 'curse'

Starting string: 'but for me it was '
Predicted end word: 'it'
Actual end word: 'tuesday'

Starting string: 'why are you '
Predicted end word: 'stopa'
Actual end word: 'stalling'

Starting string: 'you are on the verge of destroying the entire '
Predicted end word: 'world'
Actual end word: 'universe'

Starting string: 'whichever crew wins today , you 're gon na wan na re
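
One simple way to summarize output like this is exact-match accuracy: the fraction of sentences where the predicted end word equals the actual one. The sketch below applies it to the ten complete bad movie examples shown above. The `pairs` list is copied from that output, and the function name is my own.

```python
# (predicted, actual) pairs from the ten complete examples above.
pairs = [
    ("it", "perimeter"),
    (".", "."),
    ("it", "it"),
    ("yeah", "mike"),
    ("me", "me"),
    ("it", "it"),
    ("choice", "curse"),
    ("it", "tuesday"),
    ("stopa", "stalling"),
    ("world", "universe"),
]

def exact_match_accuracy(pairs):
    """Fraction of pairs where the prediction equals the ground truth."""
    return sum(predicted == actual for predicted, actual in pairs) / len(pairs)

accuracy = exact_match_accuracy(pairs)
print(accuracy)
```

Ten sentences are far too few to say anything about the model as a whole, and exact match gives no credit to plausible misses such as 'world' for 'universe'.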

## Sentence Generation Using Argmax()
This section generates sentences given a starting sentence and a sentence length (i.e. maximum number of additional words to generate).

It does this by taking the argmax of the prediction given the sentence at each step, and then adding that word to the sentence. Note that a higher max sentence length usually means a longer sentence, as the models don't tend to terminate the sentence early all that often.

In [122]:

def word_list_to_array(words, model):
    arr = np.zeros((1, MAX_SEQ_LEN), dtype=np.int32)
    for i, word in enumerate(words):
        arr[0][i] = model.wv.vocab[word].index
    return arr

def array_to_sentence(arr, model):
    sentence = ""
    for i, idx in enumerate(arr[0]):
        if i >= MAX_SEQ_LEN - 1 or idx_is_eos(idx, model):
            sentence += model.wv.index2word[idx]
            break
        else:
            sentence += model.wv.index2word[idx] + " "
    return sentence
            
def idx_is_eos(idx, model):
    return idx == model.wv.vocab['.'].index or idx == model.wv.vocab['!'].index or idx == model.wv.vocab['?'].index

seed_sentence = ["words", "go", "here"]
sentence_len = 8
sent = word_list_to_array(seed_sentence, model_Bad)
for i in range(sentence_len):
    next_word = bad_model.predict(sent)[0]
    new_word = eval(argmax(next_word))
#     print(new_word)
    sent[0][i + len(seed_sentence)] = new_word
    if idx_is_eos(new_word, model_Bad):
        break
        
print("Bad Movie Model: [maxlen {}] [seed={}] ".format(sentence_len, seed_sentence) 
      + array_to_sentence(sent, model_Bad))


seed_sentence = ["words", "go", "here"]
sentence_len = 8
sent = word_list_to_array(seed_sentence, model_Good)
for i in range(sentence_len):
    next_word = good_model.predict(sent)[0]
    new_word = eval(argmax(next_word))
    #     print(new_word)
    sent[0][i + len(seed_sentence)] = new_word
    if idx_is_eos(new_word, model_Good):
        break
        
print("Good Movie Model: [maxlen {}] [seed={}] ".format(sentence_len, seed_sentence)
      + array_to_sentence(sent, model_Good))

Bad Movie Model: [maxlen 16] [seed=['so', 'if', 'something']] so if something else here goes there now right now there now going back there now now going back .
Good Movie Model: [maxlen 16] [seed=['so', 'if', 'something']] so if something wrong is it here now now here now now here today now now now now now .
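
The generation loop above is a form of greedy decoding: at every step it appends the single most probable next word. Here is a minimal sketch of that loop with a made-up lookup table standing in for `model.predict`. `TOY_TABLE` and `toy_model` are hypothetical and only look at the last word.

```python
def greedy_generate(seed, next_word_probs, max_new_words, eos=(".", "!", "?")):
    """Append the most probable next word until an end mark or the length limit."""
    sentence = list(seed)
    for _ in range(max_new_words):
        probs = next_word_probs(sentence)
        word = max(probs, key=probs.get)  # argmax over the vocabulary
        sentence.append(word)
        if word in eos:
            break
    return sentence

# Hypothetical next-word probabilities keyed by the last word.
TOY_TABLE = {
    "here": {"now": 0.6, ".": 0.4},
    "now": {"now": 0.5, ".": 0.3, "here": 0.2},
}

def toy_model(sentence):
    return TOY_TABLE[sentence[-1]]

generated = greedy_generate(["go", "here"], toy_model, max_new_words=4)
print(" ".join(generated))
```

In this toy table 'now' is always the top choice after 'now', so the loop repeats it until the length limit. Something similar may be behind the repeated 'now' in the output above. Sampling from the predicted distribution instead of taking the argmax is a common way to reduce this kind of repetition.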
