In [13]:
# Recurrent Neural Networks (RNNs)

# Recurrent neural networks are specifically designed to work with sequence data.
# Examples of sequences:
# 1-) Time series data (E.g. Sales)
# 2-) Audio
# 3-) Sentences
# 4-) Car Trajectories (Sequence of instructions: left, right, forward, and back)
# 5-) Music

# Normal neuron in a feed-forward network:

# It takes one or more inputs, aggregates them (a weighted sum), passes the result
# through an activation function, and produces an output.

# Recurrent Neuron:

# * It sends its output back to itself: the output at one time step is fed back in as
# an input at the next time step.
# * A cell whose output is a function of inputs from previous time steps is known as a
# 'memory cell', because it preserves some sort of state across time steps.
# * It is very easy to create an entire layer of recurrent neurons.
# * Recurrent Neural Networks (RNNs) are very flexible in their inputs and outputs:
# each can be a sequence or a single vector.
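
The feedback loop described above can be sketched with a single recurrent neuron in NumPy. This is only an illustration: the function name `rnn_forward`, the tanh activation, and the weight values are assumptions chosen for the example, not something taken from this notebook.

```python
import numpy as np

def rnn_forward(inputs, w_x, w_h, b):
    """Run one recurrent neuron over a 1-D input sequence (illustrative sketch)."""
    h = 0.0  # initial state: nothing has been seen yet
    states = []
    for x_t in inputs:
        # The new state mixes the current input with the previous state
        h = np.tanh(w_x * x_t + w_h * h + b)
        states.append(h)
    return np.array(states)

# Only the first input is non-zero, yet later states are non-zero too:
# the neuron "remembers" the first input, and the memory fades step by step
sequence = np.array([1.0, 0.0, 0.0, 0.0])
states = rnn_forward(sequence, w_x=1.0, w_h=0.5, b=0.0)
print(states)
```

The shrinking values in `states` also hint at why a plain RNN gradually loses track of early inputs.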

# * Sequence to Sequence (Example: passing in a set of time series information, such as a year's worth of monthly
# sales data, and getting back a sequence of that same sales data shifted over a certain time period into the future.)
# * Sequence to Vector (Example: sentiment scores)
# - We can feed in a sequence of words and request back a vector which indicates whether the sentiment was
# positive or negative.
# * Vector to Sequence (Example: providing a single seed word and getting back an entire sequence of
# high-probability words.)
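
As a rough sketch of the first two configurations, the arrays below show how inputs and targets could be laid out for the monthly sales example. The sales values and the one-month shift are hypothetical and are used purely for illustration.

```python
import numpy as np

monthly_sales = np.arange(1, 13)  # hypothetical values for 12 months
shift = 1                         # assumed horizon: one month into the future

# Sequence to sequence: the target is the input shifted into the future
seq_input = monthly_sales[:-shift]
seq_target = monthly_sales[shift:]

# Sequence to vector: the whole sequence maps to a single value
vec_input = monthly_sales[:-1]
vec_target = monthly_sales[-1]

print(seq_input, seq_target)
print(vec_input, vec_target)
```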


In [None]:
# Long Short-Term Memory (LSTM) and GRU
# An issue RNNs face is that after a while the network begins to 'forget' the first inputs, as information is lost
# at each step going through the RNN (especially when we train the network on a really long sequence). Therefore, we
# need some sort of 'long-term memory' for our networks. We need to balance the short-term memory of the network (the
# data it has seen most recently) against its long-term memory (the data from the very first input up to the most
# recent one).

# The LSTM (Long Short-Term Memory) cell was created to help address these RNN issues.

# In a typical RNN cell, the output at time t-1 is fed back in along with the input at time t.

# In an LSTM, the very first step is called the 'forget gate layer'. In this step, we decide what information we are
# going to forget from the cell state.

# An LSTM Cell

# f(t) = sigmoid(W_f · [h(t-1), x(t)] + b_f)

# We concatenate h(t-1) and x(t), apply a linear transformation (weights W_f and bias b_f), and pass the result
# through a sigmoid, so every entry of f(t) lies between 0 and 1.


# 1 represents 'keeping the information'.
# 0 represents 'forgetting about / getting rid of the information'.

# If we think of a language model that tries to predict the next word based on the previous ones, the cell state
# might contain the gender of the present subject, so that the correct pronoun can be picked.
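
A minimal NumPy sketch of the forget gate follows. The sizes, the random weights, and the previous cell state `c_prev` are assumptions made up for the example; a trained LSTM would learn `W_f` and `b_f`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forget_gate(h_prev, x_t, W_f, b_f):
    # Concatenate the previous hidden state and the current input,
    # apply the linear transformation, then squash into (0, 1)
    concat = np.concatenate([h_prev, x_t])
    return sigmoid(W_f @ concat + b_f)

rng = np.random.default_rng(0)
hidden_size, input_size = 3, 2  # hypothetical sizes
h_prev = rng.standard_normal(hidden_size)
x_t = rng.standard_normal(input_size)
W_f = rng.standard_normal((hidden_size, hidden_size + input_size))
b_f = np.zeros(hidden_size)

f_t = forget_gate(h_prev, x_t, W_f, b_f)

# Entries of f_t near 1 keep the matching cell-state entry,
# entries near 0 erase it
c_prev = np.array([2.0, -1.0, 0.5])  # hypothetical previous cell state
c_scaled = f_t * c_prev
print(f_t, c_scaled)
```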

#------------------------------------------------------------------------------------------------------------------#

# In the second step, we decide what new information to store in the cell state.

In [None]:
# Another variation of the LSTM cell is the 'gated recurrent unit' (GRU), which was introduced around 2014. It
# combines the forget and input gates into a single gate called the 'update gate'. It also merges the cell
# state and hidden state.
# As a result, the GRU is simpler than the LSTM.
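
The sketch below shows one common formulation of a GRU step in NumPy. It is an illustration under stated assumptions: biases are omitted, the weights are random, and the sizes are made up. Note that there is a single state vector `h` (no separate cell state) and that the update gate `z` decides both how much of the old state to keep and how much of the new candidate to let in.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(h_prev, x_t, W_z, W_r, W_h):
    """One GRU time step (biases omitted to keep the sketch short)."""
    hx = np.concatenate([h_prev, x_t])
    z = sigmoid(W_z @ hx)  # update gate
    r = sigmoid(W_r @ hx)  # reset gate
    # Candidate state, computed from the reset-scaled previous state
    h_candidate = np.tanh(W_h @ np.concatenate([r * h_prev, x_t]))
    # Blend old state and candidate with the same gate
    return (1.0 - z) * h_prev + z * h_candidate

rng = np.random.default_rng(0)
hidden_size, input_size = 3, 2  # hypothetical sizes
shape = (hidden_size, hidden_size + input_size)
W_z, W_r, W_h = (rng.standard_normal(shape) for _ in range(3))

h = np.zeros(hidden_size)
for x_t in rng.standard_normal((4, input_size)):
    h = gru_step(h, x_t, W_z, W_r, W_h)
print(h)
```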

# The depth-gated recurrent neural network was introduced in 2015.


In [14]:
from google.colab import drive
drive.mount('/content/drive/')


Drive already mounted at /content/drive/; to attempt to forcibly remount, call drive.mount("/content/drive/", force_remount=True).


In [15]:
cd 'drive/MyDrive'

[Errno 2] No such file or directory: 'drive/MyDrive'
/content/drive/MyDrive


In [16]:
# Text generation with LSTMs using Keras and Python

def read_file(filepath):
  # Reject a missing or empty path before trying to open it
  if not filepath:
    raise ValueError('Invalid filepath: expected a non-empty path.')
  with open(filepath) as file:
    text = file.read()

  return text



read_file('moby_dick_four_chapters.txt')




'Call me Ishmael.  Some years ago--never mind how long\nprecisely--having little or no money in my purse, and nothing\nparticular to interest me on shore, I thought I would sail about a\nlittle and see the watery part of the world.  It is a way I have of\ndriving off the spleen and regulating the circulation.  Whenever I\nfind myself growing grim about the mouth; whenever it is a damp,\ndrizzly November in my soul; whenever I find myself involuntarily\npausing before coffin warehouses, and bringing up the rear of every\nfuneral I meet; and especially whenever my hypos get such an upper\nhand of me, that it requires a strong moral principle to prevent me\nfrom deliberately stepping into the street, and methodically knocking\npeople\'s hats off--then, I account it high time to get to sea as soon\nas I can.  This is my substitute for pistol and ball.  With a\nphilosophical flourish Cato throws himself upon his sword; I quietly\ntake to the ship.  There is nothing surprising in this.  If t

In [17]:
moby_dick_text = read_file('moby_dick_four_chapters.txt')
print(moby_dick_text)

Call me Ishmael.  Some years ago--never mind how long
precisely--having little or no money in my purse, and nothing
particular to interest me on shore, I thought I would sail about a
little and see the watery part of the world.  It is a way I have of
driving off the spleen and regulating the circulation.  Whenever I
find myself growing grim about the mouth; whenever it is a damp,
drizzly November in my soul; whenever I find myself involuntarily
pausing before coffin warehouses, and bringing up the rear of every
funeral I meet; and especially whenever my hypos get such an upper
hand of me, that it requires a strong moral principle to prevent me
from deliberately stepping into the street, and methodically knocking
people's hats off--then, I account it high time to get to sea as soon
as I can.  This is my substitute for pistol and ball.  With a
philosophical flourish Cato throws himself upon his sword; I quietly
take to the ship.  There is nothing surprising in this.  If they but
knew it,

In [18]:
# clean and tokenize the text
import spacy

# Load the small English core language model; the parser, NER, and tagger components are
# disabled because only tokenization is needed here
nlp = spacy.load('en_core_web_sm', disable=['parser', 'ner', 'tagger'])
print(nlp)
print(type(nlp))

<spacy.lang.en.English object at 0x79c699a46290>
<class 'spacy.lang.en.English'>


In [19]:
nlp.max_length = 1200000

In [20]:
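def seperate_punctuation(doc_text):
  # A token is dropped when its text is a substring of this string, so single characters
  # such as '-', '.', ',' and '\n' are removed, while tokens such as ' ', ';' and '\n\n'
  # are kept (they do not occur in the string)
  punctuation = '-/&()%+^!\n?\t.-,--'
  without_punc = []
  for token in nlp(doc_text):
    if token.text not in punctuation:
      without_punc.append(token.text.lower())

  return without_punc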


file = read_file('moby_dick_four_chapters.txt')
tokens = seperate_punctuation(file)
print(tokens)
print(len(tokens))
print('There are '+str(len(tokens))+' tokens in the moby_dick_four_chapters.txt file.')



['call', 'me', 'ishmael', ' ', 'some', 'years', 'ago', 'never', 'mind', 'how', 'long', 'precisely', 'having', 'little', 'or', 'no', 'money', 'in', 'my', 'purse', 'and', 'nothing', 'particular', 'to', 'interest', 'me', 'on', 'shore', 'i', 'thought', 'i', 'would', 'sail', 'about', 'a', 'little', 'and', 'see', 'the', 'watery', 'part', 'of', 'the', 'world', ' ', 'it', 'is', 'a', 'way', 'i', 'have', 'of', 'driving', 'off', 'the', 'spleen', 'and', 'regulating', 'the', 'circulation', ' ', 'whenever', 'i', 'find', 'myself', 'growing', 'grim', 'about', 'the', 'mouth', ';', 'whenever', 'it', 'is', 'a', 'damp', 'drizzly', 'november', 'in', 'my', 'soul', ';', 'whenever', 'i', 'find', 'myself', 'involuntarily', 'pausing', 'before', 'coffin', 'warehouses', 'and', 'bringing', 'up', 'the', 'rear', 'of', 'every', 'funeral', 'i', 'meet', ';', 'and', 'especially', 'whenever', 'my', 'hypos', 'get', 'such', 'an', 'upper', 'hand', 'of', 'me', 'that', 'it', 'requires', 'a', 'strong', 'moral', 'principle', 't



In [21]:
# Each training sequence holds 30 tokens: the first 29 are the input and the 30th is the word to predict

train_len = 30
text_sequences = []
for j in range(train_len, len(tokens)):
  sequence = tokens[j-train_len:j]
  text_sequences.append(sequence)

print(text_sequences)

[['call', 'me', 'ishmael', ' ', 'some', 'years', 'ago', 'never', 'mind', 'how', 'long', 'precisely', 'having', 'little', 'or', 'no', 'money', 'in', 'my', 'purse', 'and', 'nothing', 'particular', 'to', 'interest', 'me', 'on', 'shore', 'i', 'thought'], ['me', 'ishmael', ' ', 'some', 'years', 'ago', 'never', 'mind', 'how', 'long', 'precisely', 'having', 'little', 'or', 'no', 'money', 'in', 'my', 'purse', 'and', 'nothing', 'particular', 'to', 'interest', 'me', 'on', 'shore', 'i', 'thought', 'i'], ['ishmael', ' ', 'some', 'years', 'ago', 'never', 'mind', 'how', 'long', 'precisely', 'having', 'little', 'or', 'no', 'money', 'in', 'my', 'purse', 'and', 'nothing', 'particular', 'to', 'interest', 'me', 'on', 'shore', 'i', 'thought', 'i', 'would'], [' ', 'some', 'years', 'ago', 'never', 'mind', 'how', 'long', 'precisely', 'having', 'little', 'or', 'no', 'money', 'in', 'my', 'purse', 'and', 'nothing', 'particular', 'to', 'interest', 'me', 'on', 'shore', 'i', 'thought', 'i', 'would', 'sail'], ['som

In [22]:
print(text_sequences[0])

['call', 'me', 'ishmael', ' ', 'some', 'years', 'ago', 'never', 'mind', 'how', 'long', 'precisely', 'having', 'little', 'or', 'no', 'money', 'in', 'my', 'purse', 'and', 'nothing', 'particular', 'to', 'interest', 'me', 'on', 'shore', 'i', 'thought']


In [23]:
print(text_sequences[1])

['me', 'ishmael', ' ', 'some', 'years', 'ago', 'never', 'mind', 'how', 'long', 'precisely', 'having', 'little', 'or', 'no', 'money', 'in', 'my', 'purse', 'and', 'nothing', 'particular', 'to', 'interest', 'me', 'on', 'shore', 'i', 'thought', 'i']


In [24]:
print(len(text_sequences))

11993
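
The sliding-window construction can be checked on a tiny example. The helper name `make_windows` and the toy token list are hypothetical; the loop bounds mirror the ones used in this notebook.

```python
def make_windows(tokens, window_len):
    # Same bounds as the loop above: one window ending just before each index j
    return [tokens[j - window_len:j] for j in range(window_len, len(tokens))]

toy_tokens = ['call', 'me', 'ishmael', 'some', 'years', 'ago']
windows = make_windows(toy_tokens, window_len=3)

for window in windows:
    print(window)
print(len(windows))  # len(toy_tokens) - window_len
```

With these bounds the number of windows is `len(tokens) - window_len`, and the very last token never appears in any window.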


In [25]:
' '.join(text_sequences[0])

'call me ishmael   some years ago never mind how long precisely having little or no money in my purse and nothing particular to interest me on shore i thought'

In [26]:
' '.join(text_sequences[1])

'me ishmael   some years ago never mind how long precisely having little or no money in my purse and nothing particular to interest me on shore i thought i'

In [27]:
from keras.preprocessing.text import Tokenizer

tokenizer = Tokenizer()
tokenizer.fit_on_texts(text_sequences)

In [28]:
sequences = tokenizer.texts_to_sequences(text_sequences)
print(sequences)

[[962, 18, 267, 4, 55, 265, 412, 91, 223, 133, 115, 960, 264, 54, 47, 42, 318, 8, 27, 551, 3, 154, 263, 7, 2718, 18, 29, 2717, 6, 64], [18, 267, 4, 55, 265, 412, 91, 223, 133, 115, 960, 264, 54, 47, 42, 318, 8, 27, 551, 3, 154, 263, 7, 2718, 18, 29, 2717, 6, 64, 6], [267, 4, 55, 265, 412, 91, 223, 133, 115, 960, 264, 54, 47, 42, 318, 8, 27, 551, 3, 154, 263, 7, 2718, 18, 29, 2717, 6, 64, 6, 60], [4, 55, 265, 412, 91, 223, 133, 115, 960, 264, 54, 47, 42, 318, 8, 27, 551, 3, 154, 263, 7, 2718, 18, 29, 2717, 6, 64, 6, 60, 319], [55, 265, 412, 91, 223, 133, 115, 960, 264, 54, 47, 42, 318, 8, 27, 551, 3, 154, 263, 7, 2718, 18, 29, 2717, 6, 64, 6, 60, 319, 41], [265, 412, 91, 223, 133, 115, 960, 264, 54, 47, 42, 318, 8, 27, 551, 3, 154, 263, 7, 2718, 18, 29, 2717, 6, 64, 6, 60, 319, 41, 2], [412, 91, 223, 133, 115, 960, 264, 54, 47, 42, 318, 8, 27, 551, 3, 154, 263, 7, 2718, 18, 29, 2717, 6, 64, 6, 60, 319, 41, 2, 54], [91, 223, 133, 115, 960, 264, 54, 47, 42, 318, 8, 27, 551, 3, 154, 263, 7

In [29]:
# Each of the numbers is an ID particular to the word.

print(sequences[0])
print()
print()
print(sequences[1])



print()
print(len(sequences))
print(len(sequences[0]))

[962, 18, 267, 4, 55, 265, 412, 91, 223, 133, 115, 960, 264, 54, 47, 42, 318, 8, 27, 551, 3, 154, 263, 7, 2718, 18, 29, 2717, 6, 64]


[18, 267, 4, 55, 265, 412, 91, 223, 133, 115, 960, 264, 54, 47, 42, 318, 8, 27, 551, 3, 154, 263, 7, 2718, 18, 29, 2717, 6, 64, 6]

11993
30
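
To make the word-to-ID mapping less opaque, here is a simplified stand-in written with the standard library. It is not the Keras implementation; it only imitates the visible behaviour that more frequent words receive smaller IDs and that ID 0 is left unused. The function name `build_word_index` and the toy data are assumptions.

```python
from collections import Counter

def build_word_index(token_lists):
    counts = Counter(tok for toks in token_lists for tok in toks)
    # most_common() sorts by descending count; numbering starts at 1
    return {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}

toy = [['the', 'sea', 'the'], ['the', 'ship', 'sea']]
word_index = build_word_index(toy)
encoded = [[word_index[w] for w in toks] for toks in toy]

print(word_index)
print(encoded)
```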


In [None]:
# index_word is a dictionary that maps each unique ID (the key) to its word (the value).
tokenizer.index_word

{1: 'the',
 2: 'a',
 3: 'and',
 4: ' ',
 5: 'of',
 6: 'i',
 7: 'to',
 8: 'in',
 9: 'it',
 10: 'that',
 11: '"',
 12: 'he',
 13: 'his',
 14: 'was',
 15: ';',
 16: '\n\n',
 17: 'but',
 18: 'me',
 19: 'with',
 20: 'as',
 21: 'at',
 22: 'this',
 23: 'you',
 24: 'is',
 25: 'all',
 26: 'for',
 27: 'my',
 28: 'be',
 29: 'on',
 30: "'s",
 31: 'not',
 32: 'from',
 33: 'there',
 34: 'one',
 35: 'up',
 36: 'what',
 37: 'him',
 38: 'so',
 39: 'bed',
 40: 'now',
 41: 'about',
 42: 'no',
 43: 'into',
 44: 'by',
 45: 'were',
 46: 'out',
 47: 'or',
 48: 'harpooneer',
 49: 'had',
 50: 'then',
 51: 'have',
 52: 'an',
 53: 'upon',
 54: 'little',
 55: 'some',
 56: 'old',
 57: 'like',
 58: 'if',
 59: 'they',
 60: 'would',
 61: 'do',
 62: 'over',
 63: 'landlord',
 64: 'thought',
 65: 'room',
 66: 'when',
 67: 'could',
 68: "n't",
 69: 'night',
 70: 'here',
 71: 'head',
 72: 'such',
 73: 'which',
 74: 'man',
 75: 'did',
 76: 'sea',
 77: 'time',
 78: 'other',
 79: 'very',
 80: 'go',
 81: 'these',
 82: 'more',

In [30]:
print(tokenizer.index_word)
print()
print()
for i in sequences[0]:
  print(f'{i}: {tokenizer.index_word[i]}')

{1: 'the', 2: 'a', 3: 'and', 4: ' ', 5: 'of', 6: 'i', 7: 'to', 8: 'in', 9: 'it', 10: 'that', 11: '"', 12: 'he', 13: 'his', 14: 'was', 15: ';', 16: '\n\n', 17: 'but', 18: 'me', 19: 'with', 20: 'as', 21: 'at', 22: 'this', 23: 'you', 24: 'is', 25: 'all', 26: 'for', 27: 'my', 28: 'be', 29: 'on', 30: "'s", 31: 'not', 32: 'from', 33: 'there', 34: 'one', 35: 'up', 36: 'what', 37: 'him', 38: 'so', 39: 'bed', 40: 'now', 41: 'about', 42: 'no', 43: 'into', 44: 'by', 45: 'were', 46: 'out', 47: 'or', 48: 'harpooneer', 49: 'had', 50: 'then', 51: 'have', 52: 'an', 53: 'upon', 54: 'little', 55: 'some', 56: 'old', 57: 'like', 58: 'if', 59: 'they', 60: 'would', 61: 'do', 62: 'over', 63: 'landlord', 64: 'thought', 65: 'room', 66: 'when', 67: 'could', 68: "n't", 69: 'night', 70: 'here', 71: 'head', 72: 'such', 73: 'which', 74: 'man', 75: 'did', 76: 'sea', 77: 'time', 78: 'other', 79: 'very', 80: 'go', 81: 'these', 82: 'more', 83: 'though', 84: 'first', 85: 'sort', 86: 'said', 87: 'last', 88: 'down', 89: '

In [31]:
tokenizer.word_counts

OrderedDict([('call', 31),
             ('me', 2848),
             ('ishmael', 153),
             (' ', 9784),
             ('some', 875),
             ('years', 156),
             ('ago', 97),
             ('never', 518),
             ('mind', 189),
             ('how', 370),
             ('long', 431),
             ('precisely', 42),
             ('having', 163),
             ('little', 884),
             ('or', 1095),
             ('no', 1156),
             ('money', 137),
             ('in', 6512),
             ('my', 2059),
             ('purse', 80),
             ('and', 11123),
             ('nothing', 322),
             ('particular', 173),
             ('to', 7494),
             ('interest', 25),
             ('on', 1977),
             ('shore', 28),
             ('i', 8249),
             ('thought', 780),
             ('would', 810),
             ('sail', 120),
             ('about', 1170),
             ('a', 11973),
             ('see', 480),
             ('the', 17928),
   

In [32]:
vocabulary_size = len(tokenizer.word_counts)
print('The vocabulary size: '+str(vocabulary_size)+'')
print(type(vocabulary_size))

The vocabulary size: 2724
<class 'int'>


In [33]:
import numpy as np

sequences = np.array(sequences)

In [34]:
print(sequences)

[[ 962   18  267 ... 2717    6   64]
 [  18  267    4 ...    6   64    6]
 [ 267    4   55 ...   64    6   60]
 ...
 [   1  377    5 ...  266   57    2]
 [ 377    5   13 ...   57    2 2724]
 [   5   13  958 ...    2 2724   30]]


In [35]:
sequences

array([[ 962,   18,  267, ..., 2717,    6,   64],
       [  18,  267,    4, ...,    6,   64,    6],
       [ 267,    4,   55, ...,   64,    6,   60],
       ...,
       [   1,  377,    5, ...,  266,   57,    2],
       [ 377,    5,   13, ...,   57,    2, 2724],
       [   5,   13,  958, ...,    2, 2724,   30]])

In [36]:
from keras.utils import to_categorical

In [37]:
print(sequences[:,:-1])

[[ 962   18  267 ...   29 2717    6]
 [  18  267    4 ... 2717    6   64]
 [ 267    4   55 ...    6   64    6]
 ...
 [   1  377    5 ...   13  266   57]
 [ 377    5   13 ...  266   57    2]
 [   5   13  958 ...   57    2 2724]]


In [69]:
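# Split each sequence into features (the first 29 word IDs) and a label (the 30th word ID)
X = sequences[:,:-1]
print(len(X))
y = sequences[:, -1]
print(len(y))
# Word IDs start at 1, so one extra class is needed for the unused index 0
y = to_categorical(y, num_classes = vocabulary_size + 1)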

print(X.shape, y.shape)

seq_len = X.shape[1]
print(seq_len)


11993
11993
(11993, 29) (11993, 2725)
29
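
The one-hot step, and the reason for `vocabulary_size + 1`, can be sketched in plain NumPy. The helper `one_hot` and the toy vocabulary are hypothetical and stand in for what `to_categorical` does with integer labels.

```python
import numpy as np

def one_hot(indices, num_classes):
    out = np.zeros((len(indices), num_classes), dtype=np.float32)
    out[np.arange(len(indices)), indices] = 1.0  # one 1 per row, at the label's column
    return out

toy_vocabulary_size = 4            # word IDs run from 1 to 4
toy_targets = np.array([1, 4, 2])

# Column 0 stays empty because no word has ID 0, so we need 4 + 1 columns
encoded = one_hot(toy_targets, num_classes=toy_vocabulary_size + 1)
print(encoded)
```

Using only `toy_vocabulary_size` columns would fail for the word with the largest ID, since its column index would be out of range.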


In [65]:
print(X.shape)

(11993, 29)


In [74]:
from keras.layers import LSTM, Dense, Embedding
from keras.models import Sequential

# Build the model: an embedding layer, two stacked LSTM layers, and two dense layers.
# The final softmax layer outputs one probability per word ID.
def create_model(vocabulary_size, seq_len):
    model = Sequential()
    print('The vocabulary size: '+str(vocabulary_size)+'')
    print('The sequence length: '+str(seq_len)+'')
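    # The second argument (29) is the size of each word vector; it only coincidentally equals seq_len here
    model.add(Embedding(vocabulary_size, 29, input_length=seq_len))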
    model.add(LSTM(150, return_sequences=True))
    model.add(LSTM(150))
    model.add(Dense(150, activation='relu'))

    model.add(Dense(vocabulary_size, activation='softmax'))

    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

    model.summary()

    return model


In [75]:
model = create_model(vocabulary_size + 1, seq_len)
print(seq_len)


The vocabulary size: 2725
The sequence length: 29
Model: "sequential_10"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_9 (Embedding)     (None, 29, 29)            79025     
                                                                 
 lstm_18 (LSTM)              (None, 29, 150)           108000    
                                                                 
 lstm_19 (LSTM)              (None, 150)               180600    
                                                                 
 dense_18 (Dense)            (None, 150)               22650     
                                                                 
 dense_19 (Dense)            (None, 2725)              411475    
                                                                 
Total params: 801750 (3.06 MB)
Trainable params: 801750 (3.06 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________
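
The parameter counts in the summary can be reproduced by hand. The helper names below are hypothetical; the formulas are the standard ones for an embedding table, an LSTM layer (four gates, each with input weights, recurrent weights, and a bias), and a dense layer.

```python
def embedding_params(vocab_size, dim):
    return vocab_size * dim

def lstm_params(input_dim, units):
    # 4 gates, each with (input_dim + units) * units weights plus units biases
    return 4 * ((input_dim + units) * units + units)

def dense_params(input_dim, units):
    return input_dim * units + units

counts = [
    embedding_params(2725, 29),  # embedding
    lstm_params(29, 150),        # first LSTM
    lstm_params(150, 150),       # second LSTM
    dense_params(150, 150),      # hidden dense layer
    dense_params(150, 2725),     # softmax output layer
]
total = sum(counts)
print(counts, total)
```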

In [76]:
from pickle import load, dump
# Train the neural network model.
# batch_size is how many sequences are passed through the network at a time.
model.fit(X, y, batch_size = 128, epochs = 20, verbose = 1)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.src.callbacks.History at 0x79c694579de0>

In [79]:
model.save('my_moby_dick_model.h5') # save the model to the file 'my_moby_dick_model.h5'
# save the tokenizer to the file 'my_simple_tokenizer', opened in 'write binary' (wb) mode
with open('my_simple_tokenizer', 'wb') as tokenizer_file:
  dump(tokenizer, tokenizer_file)
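
The pickle round trip used for the tokenizer can be tried on a small object first. The dictionary and the temporary file name below are hypothetical stand-ins for the real tokenizer and its file.

```python
import os
import pickle
import tempfile

toy_word_index = {'the': 1, 'a': 2, 'and': 3}  # stand-in for the tokenizer
path = os.path.join(tempfile.mkdtemp(), 'toy_tokenizer.pkl')

# 'wb' writes bytes; the with-block closes the file even if dump() fails
with open(path, 'wb') as f:
    pickle.dump(toy_word_index, f)

# 'rb' reads the bytes back into an equal Python object
with open(path, 'rb') as f:
    restored = pickle.load(f)

print(restored == toy_word_index)
```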

In [71]:
# The loss and the accuracy move in opposite directions: as the number of epochs increases,
# the training loss decreases while the training accuracy increases.

# Here, the number of epochs is 200 instead of 20.
model.fit(X, y, batch_size = 128, epochs = 200, verbose = 1)

Epoch 1/200
Epoch 2/200
Epoch 3/200
Epoch 4/200
Epoch 5/200
Epoch 6/200
Epoch 7/200
Epoch 8/200
Epoch 9/200
Epoch 10/200
Epoch 11/200
Epoch 12/200
Epoch 13/200
Epoch 14/200
Epoch 15/200
Epoch 16/200
Epoch 17/200
Epoch 18/200
Epoch 19/200
Epoch 20/200
Epoch 21/200
Epoch 22/200
Epoch 23/200
Epoch 24/200
Epoch 25/200
Epoch 26/200
Epoch 27/200
Epoch 28/200
Epoch 29/200
Epoch 30/200
Epoch 31/200
Epoch 32/200
Epoch 33/200
Epoch 34/200
Epoch 35/200
Epoch 36/200
Epoch 37/200
Epoch 38/200
Epoch 39/200
Epoch 40/200
Epoch 41/200
Epoch 42/200
Epoch 43/200
Epoch 44/200
Epoch 45/200
Epoch 46/200
Epoch 47/200
Epoch 48/200
Epoch 49/200
Epoch 50/200
Epoch 51/200
Epoch 52/200
Epoch 53/200
Epoch 54/200
Epoch 55/200
Epoch 56/200
Epoch 57/200
Epoch 58/200
Epoch 59/200
Epoch 60/200
Epoch 61/200
Epoch 62/200
Epoch 63/200
Epoch 64/200
Epoch 65/200
Epoch 66/200
Epoch 67/200
Epoch 68/200
Epoch 69/200
Epoch 70/200
Epoch 71/200
Epoch 72/200
Epoch 73/200
Epoch 74/200
Epoch 75/200
Epoch 76/200
Epoch 77/200
Epoch 78

<keras.src.callbacks.History at 0x79c6895d2d40>

In [72]:
# save the model trained with the larger number of epochs to a file
model.save('epoch_big.h5')

# save the tokenizer
with open('epoch_big', 'wb') as epoch_big:
  dump(tokenizer, epoch_big)



In [73]:
import os

# Get the current working directory
current_directory = os.getcwd()

# Print the current directory
print("Current working directory:", current_directory)


Current working directory: /content/drive/MyDrive


In [90]:
from pickle import load
from random import randint
from keras.preprocessing.sequence import pad_sequences
from keras.models import load_model


# Generating new text
def generate_text(model, tokenizer, seq_len, seed_text, num_gen_words):
    '''
    INPUTS:
    model : model that was trained on text data
    tokenizer : tokenizer that was fit on text data
    seq_len : length of training sequence
    seed_text : raw string text to serve as the seed
    num_gen_words : number of words to be generated by model
    '''

    # Final Output
    output_text = []

    # Initial Seed Sequence
    input_text = seed_text

    # Create num_gen_words
    for i in range(num_gen_words):

        # Take the input text string and encode it to a sequence
        encoded_text = tokenizer.texts_to_sequences([input_text])[0]

        # Pad or truncate the encoded text to seq_len, the sequence length the model was trained on
        pad_encoded = pad_sequences([encoded_text], maxlen=seq_len, truncating='pre')

        # Predict class probabilities for each word
        pred_probabilities = model.predict(pad_encoded, verbose=0)[0]

        print('-------------------------------------')
        print(model.predict(pad_encoded, verbose = 0))
        print(pred_probabilities)

        # Find the index of the word with the highest probability
        pred_word_ind = np.argmax(pred_probabilities)
        print(pred_word_ind)
        print('-------------------------------------')

        #predict_x=model.predict(X_test)
        #classes_x=np.argmax(predict_x,axis=1)

        # Extract word
        pred_word = tokenizer.index_word[pred_word_ind]

        # Update the sequence of input text (shifting one over with the new word)
        input_text += ' ' + pred_word

        output_text.append(pred_word)

    # Make it look like a sentence
    return ' '.join(output_text)
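
The padding step inside the loop is what keeps the model input at a fixed length while the seed text keeps growing. The pure-Python helper below (`pad_pre`, a hypothetical name) imitates that behaviour for a single sequence: with `truncating='pre'` the oldest IDs are dropped from the front, and by default short sequences are padded with zeros at the front as well. It assumes `maxlen` is at least 1.

```python
def pad_pre(sequence, maxlen, value=0):
    trimmed = sequence[-maxlen:]  # keep only the most recent maxlen IDs
    return [value] * (maxlen - len(trimmed)) + trimmed

too_long = pad_pre([5, 6, 7, 8, 9], maxlen=3)
too_short = pad_pre([5, 6], maxlen=3)

print(too_long)
print(too_short)
```

Each generated word is appended to the input text, so on the next iteration the window slides forward by one word.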




In [81]:
text_sequences[0]

['call',
 'me',
 'ishmael',
 ' ',
 'some',
 'years',
 'ago',
 'never',
 'mind',
 'how',
 'long',
 'precisely',
 'having',
 'little',
 'or',
 'no',
 'money',
 'in',
 'my',
 'purse',
 'and',
 'nothing',
 'particular',
 'to',
 'interest',
 'me',
 'on',
 'shore',
 'i',
 'thought']

In [96]:
import random
random.seed(123) # fix the seed so that the same random seed text is picked on every run
random_pick = random.randint(0, len(text_sequences) - 1) # randint includes both endpoints

print('The list of text sequences is: ')
print()
print()
print(text_sequences)
print(type(text_sequences))
print('The length of the text_sequences list is: '+str(len(text_sequences))+'')
print('The random pick is: '+str(random_pick)+'')

<class 'list'>
The length of the text_sequences list is: 11993
The random pick is: 857


In [83]:
random_seed_text = text_sequences[random_pick]

In [84]:
random_seed_text

['suddenly',
 'receiving',
 'two',
 'handfuls',
 'of',
 'silver',
 'deliberate',
 'whether',
 'to',
 'buy',
 'him',
 'a',
 'coat',
 'which',
 'he',
 'sadly',
 'needed',
 'or',
 'invest',
 'his',
 'money',
 'in',
 'a',
 'pedestrian',
 'trip',
 'to',
 'rockaway',
 'beach',
 ' ',
 'why']

In [95]:
print('The random seed text is as below: ')
print()
print(random_seed_text)

The random seed text is as below: 

['suddenly', 'receiving', 'two', 'handfuls', 'of', 'silver', 'deliberate', 'whether', 'to', 'buy', 'him', 'a', 'coat', 'which', 'he', 'sadly', 'needed', 'or', 'invest', 'his', 'money', 'in', 'a', 'pedestrian', 'trip', 'to', 'rockaway', 'beach', ' ', 'why']


In [94]:
seed_text = ' '.join(random_seed_text)

In [86]:
seed_text

'suddenly receiving two handfuls of silver deliberate whether to buy him a coat which he sadly needed or invest his money in a pedestrian trip to rockaway beach   why'

In [93]:
print('The joined seed text is as below: ')
print()
print(seed_text)

The joined seed text is as below: 

suddenly receiving two handfuls of silver deliberate whether to buy him a coat which he sadly needed or invest his money in a pedestrian trip to rockaway beach   why


In [91]:
generate_text(model,tokenizer,seq_len,seed_text=seed_text,num_gen_words=100)

-------------------------------------
[[1.9293600e-08 1.1982133e-01 7.3622189e-02 ... 1.2039391e-07
  2.0520312e-03 1.0161609e-07]]
[1.9293600e-08 1.1982133e-01 7.3622189e-02 ... 1.2039391e-07 2.0520312e-03
 1.0161609e-07]
1
-------------------------------------
-------------------------------------
[[1.1494975e-07 2.0506693e-04 1.5489482e-04 ... 7.5815457e-05
  3.1517101e-07 1.2286140e-03]]
[1.1494975e-07 2.0506693e-04 1.5489482e-04 ... 7.5815457e-05 3.1517101e-07
 1.2286140e-03]
65
-------------------------------------
-------------------------------------
[[4.1787794e-07 1.0566573e-02 9.4378032e-03 ... 2.0339433e-08
  2.2244343e-08 3.0111596e-07]]
[4.1787794e-07 1.0566573e-02 9.4378032e-03 ... 2.0339433e-08 2.2244343e-08
 3.0111596e-07]
3
-------------------------------------
-------------------------------------
[[1.2799741e-06 6.0632184e-02 4.3852676e-02 ... 1.3795069e-06
  2.7664846e-05 3.5151434e-06]]
[1.2799741e-06 6.0632184e-02 4.3852676e-02 ... 1.3795069e-06 2.7664846e-05
 3.

'the room and the door                                                                                                                                                                                              '

In [None]:
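full_text = read_file('moby_dick_four_chapters.txt')
words = full_text.split() # split once instead of on every loop iteration
for i, word in enumerate(words):
    if word == 'inkling':
        # print up to 20 words on either side; max() keeps the start index from going negative
        print(' '.join(words[max(0, i-20):i+20]))
        print('\n')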

In [97]:
from keras.models import load_model

model = load_model('epoch_big.h5')
# load the tokenizer that was saved next to the model
with open('epoch_big', 'rb') as tokenizer_file:
  tokenizer = load(tokenizer_file)
generate_text(model, tokenizer, seq_len, seed_text=seed_text, num_gen_words = 29)

-------------------------------------
[[1.7530108e-18 6.7600602e-05 3.2201656e-03 ... 0.0000000e+00
  1.1716737e-27 2.1632451e-31]]
[1.7530108e-18 6.7600602e-05 3.2201656e-03 ... 0.0000000e+00 1.1716737e-27
 2.1632451e-31]
570
-------------------------------------
-------------------------------------
[[1.3790001e-29 3.0298934e-06 3.9694994e-05 ... 0.0000000e+00
  0.0000000e+00 0.0000000e+00]]
[1.3790001e-29 3.0298934e-06 3.9694994e-05 ... 0.0000000e+00 0.0000000e+00
 0.0000000e+00]
3
-------------------------------------
-------------------------------------
[[5.4978741e-18 2.4944036e-05 2.9346280e-04 ... 3.6193259e-37
  7.2255480e-34 1.4758923e-35]]
[5.4978741e-18 2.4944036e-05 2.9346280e-04 ... 3.6193259e-37 7.2255480e-34
 1.4758923e-35]
464
-------------------------------------
-------------------------------------
[[1.6363887e-26 2.2738404e-04 3.7228316e-04 ... 0.0000000e+00
  0.0000000e+00 0.0000000e+00]]
[1.6363887e-26 2.2738404e-04 3.7228316e-04 ... 0.0000000e+00 0.0000000e+00


'thousands and easy stuck and being completely standing and potatoes the trap gabriel \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n'