# **Importing Libraries**

In [27]:
import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import LSTM
from keras.utils import np_utils

# **Mount Drive**

In [28]:
from google.colab import drive
drive.mount('/content/drive')
%cd drive/My Drive

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
[Errno 2] No such file or directory: 'drive/My Drive'
/content/drive/My Drive


# **Loading data**
### Here we are using python to read our dataset which is in .txt format. I just give the path of my data which is placed in folder named automated_poet,inside another folder named kaggle in my google drive and the dataset name is sonnet.txt

In [34]:
text=open("kaggle/automated_poet/sonnet.txt").read()
text=text.lower()

print('Dataset\n',text[0:1000])


Dataset
 i

 from fairest creatures we desire increase,
 that thereby beauty's rose might never die,
 but as the riper should by time decease,
 his tender heir might bear his memory:
 but thou, contracted to thine own bright eyes,
 feed'st thy light's flame with self-substantial fuel,
 making a famine where abundance lies,
 thy self thy foe, to thy sweet self too cruel:
 thou that art now the world's fresh ornament,
 and only herald to the gaudy spring,
 within thine own bud buriest thy content,
 and tender churl mak'st waste in niggarding:
   pity the world, or else this glutton be,
   to eat the world's due, by the grave and thee.

 ii

 when forty winters shall besiege thy brow,
 and dig deep trenches in thy beauty's field,
 thy youth's proud livery so gazed on now,
 will be a tatter'd weed of small worth held:
 then being asked, where all thy beauty lies,
 where all the treasure of thy lusty days;
 to say, within thine own deep sunken eyes,
 were an all-eating shame, and thriftless

# **Creating character/ word mapping**

### Here we are making the dataset a set and then making it a list and sorting it alphabetically.
First you need to know the difference between set and list.
Set is a unordered and unindexed collection and it contains no duplicates.
List is ordered , indexed collection and allows duplicate members.

So here, first we are removing all the duplicate member by pushing them in a set then we are giving them index by pushing them in a list. and finally we are sorting them alphabetically.

**character variable contains a sorted list of letters. Like: [ '  ', ' a ', ' b ', ' c ', ' d ' ].** 

As we give them index so **n variable contains the index numbers**.  
**n_to_char** is dictionary where key in the index numbers and value is the single. characters.
**char_to_n** is vice versa.

In [35]:
characters = sorted(list(set(text)))
n_to_char = {n:char for n, char in enumerate(characters)}
char_to_n = {char:n for n, char in enumerate(characters)}

In [36]:
print(characters)

['\n', ' ', '!', "'", '(', ')', ',', '-', '.', ':', ';', '?', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']


# **Data Preprocessing**

### As our model will be generating words. so first we have to train the model which letter can be followed by which one to make a meaningful word. So in this case we are declaring a sequence of letters of length 100. and the next word will be predicted upon this sequence.
Let me clear it!



1.First 2 empty list X & Y are declared. 

2.find the length of the dataset.

3.There is a for loop which is started from 0 and the condition to iterate through the loop is (length- seq-length) .
Say, the length is  1000, seq-length is 100 and as initially i=0.

so, **i < (length-sequence)** in 1st iteration and the value of the sequence is getting updated under the loop so next time the value of the sequence will be 200 . and i increases by 1 value.

**UNDER THE HOOD ( for loop)**

Sequence is a list of alphabets. We are setting the limit by $[i:i+seqlength]$ means [0:100] in first iteration. First 100 letteres will be in the sequence variable. And the 101th one will be in the label .

X is containing the features (first 100 words) and Y is containing the label.

**EXAMPLE OF X, Y**

                                       
X=[h, e, l, l]    Y=[o]

X=[e, l, l, o]	  Y=[ ]

X=[l, l, o,  ]	  Y=[i]

X=[l, o,  , i]	  Y=[n] 


In [37]:
X = []
Y = []
length = len(text)
seq_length = 100
for i in range(0, length-seq_length, 1):
     sequence = text[i:i + seq_length]
     label =text[i + seq_length]
     X.append([char_to_n[char] for char in sequence])
     Y.append(char_to_n[label])
     

In [38]:
len(X)

97819

# **Reshaping data**
X is reshaped in expected dimension. and in the second line we are scaling the X. so that it can be easy for neural network to be trained. And then we are using one hot encoding to encode Y.

In [39]:
X_modified = np.reshape(X, (len(X), seq_length, 1))
print('X is modified_1',X_modified)
X_modified = X_modified / float(len(characters))
print('X is modified_2',X_modified)
Y_modified = np_utils.to_categorical(Y)
print('Y is modified',Y_modified)

X is modified_1 [[[20]
  [ 0]
  [ 0]
  ...
  [12]
  [30]
  [ 1]]

 [[ 0]
  [ 0]
  [ 1]
  ...
  [30]
  [ 1]
  [31]]

 [[ 0]
  [ 1]
  [17]
  ...
  [ 1]
  [31]
  [19]]

 ...

 [[12]
  [23]
  [23]
  ...
  [ 1]
  [23]
  [26]]

 [[23]
  [23]
  [ 6]
  ...
  [23]
  [26]
  [33]]

 [[23]
  [ 6]
  [ 0]
  ...
  [26]
  [33]
  [16]]]
X is modified_2 [[[0.52631579]
  [0.        ]
  [0.        ]
  ...
  [0.31578947]
  [0.78947368]
  [0.02631579]]

 [[0.        ]
  [0.        ]
  [0.02631579]
  ...
  [0.78947368]
  [0.02631579]
  [0.81578947]]

 [[0.        ]
  [0.02631579]
  [0.44736842]
  ...
  [0.02631579]
  [0.81578947]
  [0.5       ]]

 ...

 [[0.31578947]
  [0.60526316]
  [0.60526316]
  ...
  [0.02631579]
  [0.60526316]
  [0.68421053]]

 [[0.60526316]
  [0.60526316]
  [0.15789474]
  ...
  [0.60526316]
  [0.68421053]
  [0.86842105]]

 [[0.60526316]
  [0.15789474]
  [0.        ]
  ...
  [0.68421053]
  [0.86842105]
  [0.42105263]]]
Y is modified [[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0.

In [21]:
X_modified.shape

(97819, 100, 1)

# **Model**
Our Lstm model has 700 unit in input layer and input shape (100, 1).

In [22]:
model = Sequential()
model.add(LSTM(700, input_shape=(X_modified.shape[1], X_modified.shape[2]), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(700))
model.add(Dropout(0.2))
model.add(Dense(Y_modified.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')

In [23]:
model.fit(X_modified, Y_modified, epochs=1, batch_size=100)

Epoch 1/1


<keras.callbacks.callbacks.History at 0x7fe54a6d46d8>

# **generating text**

Here we are Predicting next 100 words. First 10 words are given. This process is similar like the data preprocessing part and reshaping part of the training phase.

In [24]:

string_mapped = X[10]
full_string = [n_to_char[value] for value in string_mapped]
print(full_string)
# generating characters
for i in range(seq_length):
    x = np.reshape(string_mapped,(1,len(string_mapped), 1))
    x = x / float(len(characters))
    #print(x.shape)
    pred_index = np.argmax(model.predict(x, verbose=0))
    seq = [n_to_char[value] for value in string_mapped]
    full_string.append(n_to_char[pred_index])
    string_mapped.append(pred_index)
    string_mapped = string_mapped[1:len(string_mapped)]

['a', 'i', 'r', 'e', 's', 't', ' ', 'c', 'r', 'e', 'a', 't', 'u', 'r', 'e', 's', ' ', 'w', 'e', ' ', 'd', 'e', 's', 'i', 'r', 'e', ' ', 'i', 'n', 'c', 'r', 'e', 'a', 's', 'e', ',', '\n', ' ', 't', 'h', 'a', 't', ' ', 't', 'h', 'e', 'r', 'e', 'b', 'y', ' ', 'b', 'e', 'a', 'u', 't', 'y', "'", 's', ' ', 'r', 'o', 's', 'e', ' ', 'm', 'i', 'g', 'h', 't', ' ', 'n', 'e', 'v', 'e', 'r', ' ', 'd', 'i', 'e', ',', '\n', ' ', 'b', 'u', 't', ' ', 'a', 's', ' ', 't', 'h', 'e', ' ', 'r', 'i', 'p', 'e', 'r', ' ']


In [25]:
txt=''
for char in full_string:
   txt = txt+char
txt

"airest creatures we desire increase,\n that thereby beauty's rose might never die,\n but as the riper the the the thee                   th the the the the the the the the the the the the the the the th"