<a href="https://colab.research.google.com/github/Aashish1106/Text-Generation/blob/main/Text_Generation_using_RNN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Text generation** using a **recurrent neural network (RNN)**, specifically a **Long Short-Term Memory (LSTM) network**: we implement the network in Python with Keras and use it to generate text character by character.

In [None]:
#importing dependencies 
import numpy
import sys
import nltk 
nltk.download('stopwords')
from nltk.tokenize import RegexpTokenizer
from nltk.corpus import stopwords
from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM
from keras.utils import np_utils
from keras.callbacks import ModelCheckpoint

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


In [None]:
# load the dataset (assumes New.txt is UTF-8 text in the working directory)
with open("New.txt", encoding="utf-8") as f:
    file = f.read()

In [None]:
# tokenize the text into words and drop English stop words
def tokenize_words(text):
    # lowercase everything to standardize it
    text = text.lower()

    # instantiate the tokenizer: r'\w+' keeps runs of word characters and drops punctuation
    tokenizer = RegexpTokenizer(r'\w+')
    tokens = tokenizer.tokenize(text)

    # keep only tokens that are not stop words (build the set once, not once per token)
    stop_words = set(stopwords.words('english'))
    filtered = [token for token in tokens if token not in stop_words]
    return " ".join(filtered)
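For readers without NLTK at hand, here is a dependency-free sketch of what `tokenize_words` does. It assumes that `RegexpTokenizer(r'\w+')` behaves like `re.findall(r'\w+', text)` (its default, non-gap mode), and it uses a tiny hypothetical `STOP_WORDS` set in place of NLTK's much longer English list.

```python
import re

# Hypothetical stand-in for stopwords.words('english'); NLTK's real list is far longer
STOP_WORDS = {"the", "a", "of", "is", "and", "to", "in"}

def tokenize_words_sketch(text):
    """Lowercase, split into word-character runs, drop stop words, rejoin with spaces."""
    tokens = re.findall(r"\w+", text.lower())
    return " ".join(token for token in tokens if token not in STOP_WORDS)

print(tokenize_words_sketch("The history of Cryptography is long, and varied!"))
# prints: history cryptography long varied
```

Because punctuation and stop words are removed, the character-level model trains on a telegraphic version of the text, which is why the generated sample at the end reads like strings of keywords rather than sentences.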

In [None]:
# preprocess the input data: lowercase, tokenize, and drop stop words
processed_inputs = tokenize_words(file)

In [None]:
# convert characters to numbers, since the network operates on numbers: map each unique character to an integer index
chars = sorted(list(set(processed_inputs)))
char_to_num = dict((c, i) for i, c in enumerate(chars))
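The mapping is easiest to see on a toy string. This sketch builds the same kind of `char_to_num` dictionary for `"hello world"` (an illustrative input, not the notebook's `New.txt`) together with the inverse `num_to_char` mapping used later for decoding, and checks that encoding then decoding round-trips.

```python
text = "hello world"

# Sorted unique characters give a stable, reproducible index for each symbol
chars = sorted(set(text))
char_to_num = {c: i for i, c in enumerate(chars)}
num_to_char = {i: c for i, c in enumerate(chars)}

encoded = [char_to_num[c] for c in text]
decoded = "".join(num_to_char[i] for i in encoded)

print(chars)    # [' ', 'd', 'e', 'h', 'l', 'o', 'r', 'w']
print(encoded)  # [3, 2, 4, 4, 5, 0, 7, 5, 6, 4, 1]
print(decoded)  # hello world
```

Sorting matters only for reproducibility: `set` ordering can vary between runs, and a saved model is only usable with the exact index assignment it was trained on.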

In [None]:
# check the total number of characters and the vocabulary size (number of unique characters)
input_len = len(processed_inputs)
vocab_len = len(chars)
print("Total number of characters:", input_len)
print("Total vocab:", vocab_len)

Total number of characters: 2376
Total vocab: 53


In [None]:
seq_length = 100
x_data = []
y_data = []

In [None]:
# slide a window over the text: each window of seq_length characters is an input,
# and the character right after it is the target to predict
for i in range(input_len - seq_length):
    # input: the seq_length characters starting at position i
    in_seq = processed_inputs[i:i + seq_length]

    # output: the single character that immediately follows the input window
    out_seq = processed_inputs[i + seq_length]

    # convert the characters to integers with char_to_num and add them to our lists
    x_data.append([char_to_num[char] for char in in_seq])
    y_data.append(char_to_num[out_seq])
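The loop produces one training pair per starting position, so the number of patterns is `input_len - seq_length`; with the notebook's 2376 characters and `seq_length = 100` that is 2276, matching the printed total. A toy run with `seq_length = 3` (illustrative values only) makes the windows visible:

```python
text = "abcdef"
seq_length = 3

pairs = []
for i in range(len(text) - seq_length):
    in_seq = text[i:i + seq_length]   # seq_length characters starting at i
    out_seq = text[i + seq_length]    # the single character that follows
    pairs.append((in_seq, out_seq))

print(pairs)                                 # [('abc', 'd'), ('bcd', 'e'), ('cde', 'f')]
print(len(pairs) == len(text) - seq_length)  # True
print(2376 - 100)                            # 2276, the notebook's "Total Patterns"
```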

In [None]:
n_patterns = len(x_data)
print("Total Patterns:", n_patterns)

Total Patterns: 2276


In [None]:
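# reshape to (samples, time steps, features), the 3-D layout LSTM layers expect,
# then scale the integer codes into [0, 1)
X = numpy.reshape(x_data, (n_patterns, seq_length, 1))
X = X / float(vocab_len)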
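Keras LSTM layers take a 3-D input of shape `(samples, time steps, features)`. Here each time step carries a single feature: the character index divided by the vocabulary size. A small NumPy sketch with made-up encoded windows and an assumed vocabulary of 8 shows the resulting shape and value range:

```python
import numpy as np

x_data = [[3, 2, 4], [2, 4, 4], [4, 4, 5]]  # hypothetical encoded windows
vocab_len = 8                                # assumed vocabulary size
n_patterns, seq_length = len(x_data), len(x_data[0])

# (samples, time steps, features) with one feature per step, scaled into [0, 1)
X = np.reshape(x_data, (n_patterns, seq_length, 1)) / float(vocab_len)

print(X.shape)             # (3, 3, 1)
print(X[0, :, 0].tolist()) # [0.375, 0.25, 0.5]
```

Scaling keeps the inputs small, but it also imposes an artificial ordering on characters (index 4 is "between" 3 and 5); a one-hot or embedding input avoids that, though this notebook does not use one.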

In [None]:
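# one-hot encode the targets; fixing num_classes keeps one output column per vocabulary character
y = np_utils.to_categorical(y_data, num_classes=vocab_len)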
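`to_categorical` turns each integer target into a one-hot row. The NumPy sketch below reproduces that with `np.eye` (an equivalent construction, not Keras's own code; `one_hot` is a hypothetical helper name) and shows why the number of classes matters: when it is not given, the width is inferred as `max(labels) + 1`, which is narrower than the vocabulary if the highest-index character never appears as a target.

```python
import numpy as np

def one_hot(labels, num_classes=None):
    labels = np.asarray(labels)
    if num_classes is None:
        num_classes = int(labels.max()) + 1  # width inferred from the data
    # row i of the identity matrix is the one-hot vector for class i
    return np.eye(num_classes, dtype="float32")[labels]

y_data = [0, 2, 1]  # hypothetical targets; class 3 never occurs
print(one_hot(y_data).shape)                 # (3, 3)
print(one_hot(y_data, num_classes=4).shape)  # (3, 4)
print(one_hot(y_data, num_classes=4)[1])     # [0. 0. 1. 0.]
```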

In [None]:
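# three stacked LSTM layers; return_sequences=True passes the full sequence on to the next LSTM
model = Sequential()
model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2]), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(256, return_sequences=True))
model.add(Dropout(0.2))
# the last LSTM returns only its final output, a summary of the whole window
model.add(LSTM(128))
model.add(Dropout(0.2))
# softmax over the vocabulary: one probability per possible next character
model.add(Dense(y.shape[1], activation='softmax'))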
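Each LSTM layer has four gates (input, forget, cell candidate, output), and each gate has an input weight matrix, a recurrent weight matrix and a bias, giving 4 × (units × (input_dim + units) + units) parameters per layer. The sketch below applies that formula to this stack, assuming the output layer has 53 units (the printed `vocab_len`). It is a back-of-the-envelope check that should agree with `model.summary()` for a standard LSTM; the helper names are hypothetical.

```python
def lstm_params(units, input_dim):
    # 4 gates x (input weights + recurrent weights + bias)
    return 4 * (units * (input_dim + units) + units)

def dense_params(units, input_dim):
    return units * input_dim + units  # weights + bias

layers = {
    "lstm_1 (256)": lstm_params(256, 1),    # one feature per time step
    "lstm_2 (256)": lstm_params(256, 256),
    "lstm_3 (128)": lstm_params(128, 256),
    "dense (53)": dense_params(53, 128),    # assumes y.shape[1] == 53
}
for name, count in layers.items():
    print(f"{name}: {count:,}")
print(f"total: {sum(layers.values()):,}")   # total: 993,461 (Dropout adds none)
```

Roughly a million parameters trained on about 2,300 patterns is a lot of capacity for very little data, which makes memorising fragments of the source text likely.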

In [None]:
model.compile(loss='categorical_crossentropy', optimizer='adam')
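With one-hot targets, categorical cross-entropy for one sample is the negative natural log of the probability the softmax assigns to the true next character; the reported loss is the average over samples. A NumPy sketch with made-up probabilities (the clipping constant is an assumption to avoid `log(0)`):

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # clip to avoid log(0), then average -sum(y * log(p)) over samples
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

y_true = np.array([[0, 1, 0], [1, 0, 0]], dtype=float)  # true classes: 1 and 0
y_pred = np.array([[0.2, 0.7, 0.1], [0.5, 0.25, 0.25]])
loss = categorical_crossentropy(y_true, y_pred)
print(round(loss, 4))  # 0.5249, i.e. (-ln 0.7 - ln 0.5) / 2
```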

In [None]:
filepath = "model_weights_saved.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
desired_callbacks = [checkpoint]
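With `save_best_only=True` and `mode='min'`, a file is written only when the monitored `loss` drops below the best value seen so far. This pure-Python sketch of that rule (the function name and loss values are hypothetical, not the notebook's per-epoch numbers) mirrors the "improved / did not improve" lines in the training log below.

```python
def checkpoint_epochs(losses, best=float("inf")):
    """Return the 1-based epochs at which a min-mode, save-best-only checkpoint would save."""
    saved = []
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:
            best = loss
            saved.append(epoch)  # "loss improved ... saving model"
        # otherwise: "loss did not improve from <best>"
    return saved

print(checkpoint_epochs([2.70, 2.71, 2.68, 2.66, 2.67, 2.60]))  # [1, 3, 4, 6]
print(checkpoint_epochs([2.70, 2.69], best=2.69724))            # [2]
```

The log's first epoch reporting "did not improve from 2.69724" is consistent with a best value carried over from an earlier run; the optional `best` argument lets the sketch mimic that situation.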

In [None]:
model.fit(X, y, epochs=160, batch_size=256, callbacks=desired_callbacks)

Epoch 1/160

Epoch 00001: loss did not improve from 2.69724
Epoch 2/160

Epoch 00002: loss did not improve from 2.69724
Epoch 3/160

Epoch 00003: loss improved from 2.69724 to 2.67551, saving model to model_weights_saved.hdf5
Epoch 4/160

Epoch 00004: loss improved from 2.67551 to 2.66131, saving model to model_weights_saved.hdf5
Epoch 5/160

Epoch 00005: loss improved from 2.66131 to 2.64632, saving model to model_weights_saved.hdf5
Epoch 6/160

Epoch 00006: loss improved from 2.64632 to 2.63626, saving model to model_weights_saved.hdf5
Epoch 7/160

Epoch 00007: loss improved from 2.63626 to 2.62916, saving model to model_weights_saved.hdf5
Epoch 8/160

Epoch 00008: loss improved from 2.62916 to 2.62669, saving model to model_weights_saved.hdf5
Epoch 9/160

Epoch 00009: loss improved from 2.62669 to 2.59826, saving model to model_weights_saved.hdf5
Epoch 10/160

Epoch 00010: loss did not improve from 2.59826
Epoch 11/160

Epoch 00011: loss improved from 2.59826 to 2.59357, saving mode

<keras.callbacks.History at 0x7f3a676f30d0>
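To put the logged loss in context: a model that guessed uniformly over the 53-character vocabulary would score ln(53) ≈ 3.97 nats, and exponentiating a cross-entropy gives the perplexity, roughly the number of equally likely choices the model is still hesitating between. A quick calculation using the last complete value in the log (2.59357 at epoch 11):

```python
import math

vocab_len = 53          # printed "Total vocab"
logged_loss = 2.59357   # last complete "loss improved" value in the log

uniform_loss = math.log(vocab_len)  # cross-entropy of a uniform guess, in nats
print(f"uniform baseline: {uniform_loss:.3f} nats, perplexity {vocab_len}")
print(f"logged loss:      {logged_loss:.3f} nats, perplexity {math.exp(logged_loss):.1f}")
```

This is training loss only: `model.fit` is called without validation data, so it says nothing about how well the model generalises beyond `New.txt`.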

In [None]:
filename = "model_weights_saved.hdf5"
model.load_weights(filename)
model.compile(loss='categorical_crossentropy', optimizer='adam')

In [None]:
num_to_char = dict((i, c) for i, c in enumerate(chars))

In [None]:
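# pick a random training window as the seed; randint's upper bound is exclusive
start = numpy.random.randint(0, len(x_data))
# copy the list so the generation loop below does not modify x_data in place
pattern = list(x_data[start])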
print("Random Seed:")
print("\"", ''.join([num_to_char[value] for value in pattern]), "\"")

Random Seed:
" on led many governments classify weapon limit even prohibit use export 6 jurisdictions use cryptogra "


In [None]:
# generate the text
for i in range(1000):
  x = numpy.reshape(pattern, (1,len(pattern), 1))
  x = x/float(vocab_len)
  prediction = model.predict(x, verbose=0)
  index = numpy.argmax(prediction)
  result = num_to_char[index]
  seq_in = [num_to_char[value] for value in pattern]
  sys.stdout.write(result)
  pattern.append(index)
  pattern = pattern[ 1:len(pattern)]

phy letal palm designed therefore termed computationally secure schemes provably cannot broken even unlimited computing power one time pad schemes much difficult use practice best theoretically breakable computationally secure schemes provably cannot broken even unlimited computing power one time pad schemes much difficult use practice best theoretically breakable computationally secure schemes provably cannot broken even unlimited computing power one time pad schemes much difficult use practice best theoretically breakable computationally secure schemes provably cannot broken even unlimited computing power one time pad schemes much difficult use practice best theoretically breakable computationally secure s

KeyboardInterrupt: ignored
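The sample above falls into a loop because `numpy.argmax` always picks the single most likely character: once the same 100-character window recurs, the deterministic prediction reproduces the same continuation forever. A common remedy, not used in the original notebook, is to sample from the softmax output with a temperature. A minimal NumPy sketch, where `sample_index` and `temperature` are hypothetical names:

```python
import numpy as np

def sample_index(prediction, temperature=1.0, rng=None):
    """Draw a character index from a softmax output, reshaped by temperature."""
    rng = np.random.default_rng() if rng is None else rng
    probs = np.asarray(prediction, dtype="float64").ravel()
    # rescale log-probabilities: T < 1 sharpens, T > 1 flattens the distribution
    logits = np.log(np.clip(probs, 1e-12, 1.0)) / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

# Example with a made-up 4-way prediction shaped like model.predict output (1, vocab)
prediction = np.array([[0.1, 0.6, 0.2, 0.1]])
rng = np.random.default_rng(0)
draws = [sample_index(prediction, temperature=0.8, rng=rng) for _ in range(1000)]
print(np.bincount(draws, minlength=4) / len(draws))
```

To try it in the generation loop, replace `index = numpy.argmax(prediction)` with `index = sample_index(prediction, temperature=0.8)`. Lower temperatures approach the greedy behaviour; higher ones produce more varied but noisier text.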