# Natural Language Processing 
Natural Language Processing (NLP for short) is a discipline in computing that deals with the interaction between natural (human) languages and computers. Common examples of NLP are spellcheck and autocomplete. Essentially, NLP is the field that focuses on how computers can understand and process human languages.

### Recurrent Neural Networks

In this tutorial we will introduce a new kind of neural network called a **recurrent neural network** (RNN for short), which is much more capable of processing sequential data such as text or characters.

We will learn how to use a recurrent neural network to do the following:
- Sentiment Analysis
- Character Generation 

RNNs are complex and come in many different forms, so in this tutorial we will focus on how they work and the kinds of problems they are best suited for.



## Sequence Data
In the previous tutorials we focused on data that we could represent as one static data point, where the notion of time or step was irrelevant. Take our image data, for example: it was simply a tensor of shape (width, height, channels). That data doesn't change or care about the notion of time.

In this tutorial we will look at sequences of text and learn how we can encode them in a meaningful way. Unlike images, sequence data such as long chains of text, weather patterns, videos and really anything where the notion of a step or time is relevant needs to be processed and handled in a special way. 

But what do I mean by sequences, and why is text data a sequence? Textual data contains many words that follow one another in a very specific and meaningful order, so we need to keep track of each word and where it occurs in the data. Simply encoding an entire paragraph of text as one data point wouldn't give us a very meaningful picture of the data and would be very difficult to do anything with. This is why we treat text as a sequence and process one word at a time. We will keep track of where each of these words appears and use that information to try to understand the meaning of pieces of text.



## Encoding Text
As we know machine learning models and neural networks don't take raw text data as an input. This means we must somehow encode our textual data to numeric values that our models can understand. There are many different ways of doing this and we will look at a few examples below. 

Before we get into the different encoding/preprocessing methods let's understand the information we can get from textual data by looking at the following two movie reviews.

```I thought the movie was going to be bad, but it was actually amazing!```

```I thought the movie was going to be amazing, but it was actually bad!```

Although these two sentences are very similar, we know that they have very different meanings. This is because of the **ordering** of words, a very important property of textual data.

Now keep that in mind while we consider some different ways of encoding our textual data.

### Bag of Words
The first and simplest way to encode our data is to use something called **bag of words**. This is a pretty easy technique where each word in a sentence is encoded with an integer and thrown into a collection that does not maintain the order of the words but does keep track of their frequency. Have a look at the Python function below that encodes a string of text into a bag of words.

In [1]:
vocab = {}  # maps word to integer representing it
word_encoding = 1

def bag_of_words(text):
    global word_encoding

    words = text.lower().split(" ")  # create a list of all of the words in the text; we'll assume there is no punctuation in our text for this example
    bag = {}  # stores all of the encodings and their frequency

    for word in words:
        if word in vocab:
            encoding = vocab[word]  # get encoding from vocab
        else:
            vocab[word] = word_encoding
            encoding = word_encoding
            word_encoding += 1
    
        if encoding in bag:
            bag[encoding] += 1
        else:
            bag[encoding] = 1
  
    return bag

text = "this is a test to see if this test will work is is test a a"
bag = bag_of_words(text)
print(bag)
print(vocab)

{1: 2, 2: 3, 3: 3, 4: 3, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1}
{'this': 1, 'is': 2, 'a': 3, 'test': 4, 'to': 5, 'see': 6, 'if': 7, 'will': 8, 'work': 9}


This isn't really the way we would do this in practice, but I hope it gives you an idea of how bag of words works. Notice that we've lost the order in which words appear. In fact, let's look at how this encoding works for the two sentences we showed above.



In [2]:
positive_review = "I thought the movie was going to be bad but it was actually amazing"
negative_review = "I thought the movie was going to be amazing but it was actually bad"

pos_bag = bag_of_words(positive_review)
neg_bag = bag_of_words(negative_review)

print("Positive:", pos_bag)
print("Negative:", neg_bag)

Positive: {10: 1, 11: 1, 12: 1, 13: 1, 14: 2, 15: 1, 5: 1, 16: 1, 17: 1, 18: 1, 19: 1, 20: 1, 21: 1}
Negative: {10: 1, 11: 1, 12: 1, 13: 1, 14: 2, 15: 1, 5: 1, 16: 1, 21: 1, 18: 1, 19: 1, 20: 1, 17: 1}


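We can see that even though these sentences have very different meanings, they are encoded exactly the same way: the two dictionaries contain the same keys and counts, and only the order in which they are printed differs. Obviously, this isn't going to fly. Let's look at some other methods.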
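
The same loss of order can be checked directly. The following is a minimal sketch using Python's `collections.Counter` rather than the function above; the helper name `word_counts` is made up for this illustration.

```python
from collections import Counter

def word_counts(text):
    # Count how often each lowercase word occurs; word order is discarded
    return Counter(text.lower().split())

positive = "I thought the movie was going to be bad but it was actually amazing"
negative = "I thought the movie was going to be amazing but it was actually bad"

# Both reviews contain exactly the same words with the same frequencies,
# so their bag-of-words representations compare equal
same_bag = word_counts(positive) == word_counts(negative)
print(same_bag)
```

Because the two bags compare equal, no model that only sees the bag can tell these two reviews apart.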



### Integer Encoding
The next technique we will look at is called **integer encoding**. This involves representing each word or character in a sentence as a unique integer and maintaining the order of these words. This should fix the problem we saw before, where we lost the order of words.


In [3]:
vocab = {}  
word_encoding = 1
# Note: despite its name, this function performs integer encoding, not one-hot encoding
def one_hot_encoding(text):
    global word_encoding

    words = text.lower().split(" ") 
    encoding = []  

    for word in words:
        if word in vocab:
            code = vocab[word]  
            encoding.append(code) 
        else:
            vocab[word] = word_encoding
            encoding.append(word_encoding)
            word_encoding += 1
  
    return encoding

text = "this is a test to see if this test will work is is test a a"
encoding = one_hot_encoding(text)
print(encoding)
print(vocab)

[1, 2, 3, 4, 5, 6, 7, 1, 4, 8, 9, 2, 2, 4, 3, 3]
{'this': 1, 'is': 2, 'a': 3, 'test': 4, 'to': 5, 'see': 6, 'if': 7, 'will': 8, 'work': 9}


And now let's have a look at integer encoding on our movie reviews (using the function above, which is named `one_hot_encoding`).

In [4]:
positive_review = "I thought the movie was going to be bad but it was actually amazing"
negative_review = "I thought the movie was going to be amazing but it was actually bad"

pos_encode = one_hot_encoding(positive_review)
neg_encode = one_hot_encoding(negative_review)

print("Positive:", pos_encode)
print("Negative:", neg_encode)

Positive: [10, 11, 12, 13, 14, 15, 5, 16, 17, 18, 19, 14, 20, 21]
Negative: [10, 11, 12, 13, 14, 15, 5, 16, 21, 18, 19, 14, 20, 17]


Much better: now we are keeping track of the order of words and we can tell where each occurs. But this still has a few issues. Ideally, when we encode words, we would like similar words to have similar labels and different words to have very different labels. For example, the words happy and joyful should probably have very similar labels so we can determine that they are similar, while words like horrible and amazing should probably have very different labels. The method we looked at above can't do this for us, because the integers simply reflect the order in which words were first seen. This could mean that the model will have a very difficult time determining whether two words are similar, which could hurt performance significantly.
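
Integer encoding is sometimes confused with one-hot encoding. As a rough sketch of the difference, a one-hot encoding turns each integer code into a vector containing a single 1. The helper `to_one_hot` and the tiny vocabulary below are invented for illustration only.

```python
def to_one_hot(codes, vocab_size):
    # Each code becomes a vector of zeros with a 1 at position code - 1
    # (codes start at 1, as in the examples above)
    vectors = []
    for code in codes:
        vector = [0] * vocab_size
        vector[code - 1] = 1
        vectors.append(vector)
    return vectors

toy_vocab = {"happy": 1, "joyful": 2, "horrible": 3}  # hypothetical vocabulary
codes = [toy_vocab[w] for w in ["happy", "horrible", "joyful"]]
one_hot = to_one_hot(codes, len(toy_vocab))

print(codes)    # [1, 3, 2]
print(one_hot)  # [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
```

Neither form says anything about meaning: the integer labels depend only on the order in which words were added, and every pair of distinct one-hot vectors is equally far apart.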



### Word Embeddings
Luckily there is a third method that is far superior: **word embeddings**. This method keeps the order of words intact and also encodes similar words with similar vectors. It attempts to encode not only the frequency and order of words but also their meaning. Each word is encoded as a dense vector, and words that are used in similar contexts end up with vectors that are close together.

Unlike the previous techniques, word embeddings are learned by looking at many different training examples. You can add what's called an *embedding layer* to the beginning of your model, and while your model trains, the embedding layer will learn useful embeddings for words. You can also use pretrained embedding layers.

This is the technique we will use for our examples, and its implementation will be shown later on.
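
To make "similar words get similar vectors" concrete, here is a small sketch that compares vectors with cosine similarity. The three-dimensional vectors are invented by hand for this example; real embeddings are learned during training and usually have many more dimensions.

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: values near 1.0 mean the
    # vectors point in almost the same direction
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional embeddings, made up for this example
toy_embeddings = {
    "happy":    [0.9, 0.8, 0.1],
    "joyful":   [0.8, 0.9, 0.2],
    "horrible": [-0.7, -0.9, 0.1],
}

sim_close = cosine_similarity(toy_embeddings["happy"], toy_embeddings["joyful"])
sim_far = cosine_similarity(toy_embeddings["happy"], toy_embeddings["horrible"])
print(round(sim_close, 3), round(sim_far, 3))
```

With these made-up values, happy and joyful score close to 1 while happy and horrible score below 0, which is the kind of structure an embedding layer is meant to learn.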



## Recurrent Neural Networks (RNNs)
Now that we've learned a little bit about how we can encode text, it's time to dive into recurrent neural networks. Up until this point we have been using something called **feed-forward** neural networks. This simply means that all our data is fed forward (all at once) from left to right through the network. This was fine for the problems we considered before but won't work very well for processing text. After all, even we (humans) don't process text all at once. We read word by word from left to right and keep track of the current meaning of the sentence so we can understand the meaning of the next word. This is exactly what a recurrent neural network is designed to do. When we say recurrent neural network, all we really mean is a network that contains a loop. An RNN will process one word at a time while maintaining an internal memory of what it has already seen. This allows it to treat words differently based on their order in a sentence and to slowly build an understanding of the entire input, one word at a time.

This is why we are treating our text data as a sequence! So that we can pass one word at a time to the RNN.

Let's have a look at what a recurrent layer might look like.

![An unrolled recurrent neural network](https://colah.github.io/posts/2015-08-Understanding-LSTMs/img/RNN-unrolled.png)
*Source: https://colah.github.io/posts/2015-08-Understanding-LSTMs/*

Let's define what all these variables stand for before we get into the explanation.

**h<sub>t</sub>** output at time t

**x<sub>t</sub>** input at time t

**A** Recurrent Layer (loop)

What this diagram is trying to illustrate is that a recurrent layer processes words or inputs one at a time, in combination with the output from the previous iteration. So, as we progress further in the input sequence, we build a more complex understanding of the text as a whole.

What we've just looked at is called a **simple RNN layer**. It can be effective at processing shorter sequences of text for simple problems but has many drawbacks. One of them is that as text sequences get longer, it gets increasingly difficult for the network to retain information from early in the sequence and understand the text properly.
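
The loop can be sketched in a few lines. This is a toy illustration with a single scalar state and hand-picked weights (`w_x`, `w_h`, and `b` are assumptions for the example, not learned values), not how a Keras layer is implemented internally.

```python
import math

def simple_rnn(inputs, w_x=0.5, w_h=0.8, b=0.0):
    # h_t = tanh(w_x * x_t + w_h * h_{t-1} + b), starting from h_0 = 0
    h = 0.0
    outputs = []
    for x in inputs:
        # the new state mixes the current input with the previous state
        h = math.tanh(w_x * x + w_h * h + b)
        outputs.append(h)
    return outputs

forward = simple_rnn([1.0, 2.0, 3.0])
backward = simple_rnn([3.0, 2.0, 1.0])

# The final state depends on the order of the inputs, not just their values
print(forward[-1], backward[-1])
```

Feeding the same values in a different order yields a different final state, which is exactly the order sensitivity that bag of words lacked.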



## LSTM
The layer we discussed in depth above is called a *SimpleRNN*. However, there are other recurrent layers (layers that contain a loop) that work much better than a simple RNN layer. The one we will talk about here is called LSTM (Long Short-Term Memory). This layer works very similarly to the simple RNN layer but adds a way to carry information forward from much earlier timesteps. In our simple RNN layer, input from previous timesteps gradually disappeared as we got further through the input. An LSTM also maintains a long-term memory, called the cell state, along with gates that learn what to keep, what to add, and what to forget at each step. This lets information from early in a sequence persist for as long as it is useful, which adds to the complexity of our network and allows it to discover more useful relationships between inputs and when they appear.

For the purpose of this course we will refrain from going any further into the math or details behind how these layers work.
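
For readers who want a glimpse anyway, here is a toy sketch of one step of the standard LSTM formulation using scalars instead of matrices. The weights are hypothetical and chosen only so the example runs; a real layer learns them and works on vectors.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, w):
    # w maps each gate name to an (input weight, recurrent weight, bias) triple
    def gate(name, activation):
        w_x, w_h, b = w[name]
        return activation(w_x * x + w_h * h_prev + b)

    f = gate("forget", sigmoid)        # how much of the old cell state to keep
    i = gate("input", sigmoid)         # how much new information to write
    g = gate("candidate", math.tanh)   # the new information itself
    o = gate("output", sigmoid)        # how much of the cell state to expose

    c = f * c_prev + i * g             # updated long-term cell state
    h = o * math.tanh(c)               # updated output (short-term state)
    return h, c

# Hypothetical weights, identical for every gate to keep the example small
weights = {name: (0.5, 0.5, 0.0)
           for name in ("forget", "input", "candidate", "output")}

h, c = 0.0, 0.0
for x in [1.0, -1.0, 0.5]:
    h, c = lstm_step(x, h, c, weights)
print(h, c)
```

The key line is the cell-state update: the forget gate scales the old memory and the input gate scales what is added, so the network can learn to hold on to information across many steps.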



## Sentiment Analysis
And now time to see a recurrent neural network in action. For this example, we are going to do something called sentiment analysis.

The formal definition of this term from Wikipedia is as follows:

*the process of computationally identifying and categorizing opinions expressed in a piece of text, especially in order to determine whether the writer's attitude towards a particular topic, product, etc. is positive, negative, or neutral.*

The example we'll use here is classifying movie reviews as either positive or negative.

*This guide is based on the following tensorflow tutorial: https://www.tensorflow.org/tutorials/text/text_classification_rnn*



### Movie Review Dataset
We'll start by loading in the IMDB movie review dataset from Keras. This dataset contains 25,000 training reviews and 25,000 test reviews from IMDB, where each one is already preprocessed and labeled as either positive or negative. Each review is encoded as a list of integers, where each integer represents how common a word is in the entire dataset. For example, a word encoded by the integer 3 is the 3rd most common word in the dataset.

In [5]:
# %tensorflow_version 2.x  # this line is not required unless you are in a notebook
from keras.datasets import imdb
from keras.preprocessing import sequence
import keras
import tensorflow as tf
import os
import numpy as np

VOCAB_SIZE = 88584

MAXLEN = 250
BATCH_SIZE = 64

(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words = VOCAB_SIZE)

In [6]:
# Let's look at one review
train_data[1]

[1,
 194,
 1153,
 194,
 8255,
 78,
 228,
 5,
 6,
 1463,
 4369,
 5012,
 134,
 26,
 4,
 715,
 8,
 118,
 1634,
 14,
 394,
 20,
 13,
 119,
 954,
 189,
 102,
 5,
 207,
 110,
 3103,
 21,
 14,
 69,
 188,
 8,
 30,
 23,
 7,
 4,
 249,
 126,
 93,
 4,
 114,
 9,
 2300,
 1523,
 5,
 647,
 4,
 116,
 9,
 35,
 8163,
 4,
 229,
 9,
 340,
 1322,
 4,
 118,
 9,
 4,
 130,
 4901,
 19,
 4,
 1002,
 5,
 89,
 29,
 952,
 46,
 37,
 4,
 455,
 9,
 45,
 43,
 38,
 1543,
 1905,
 398,
 4,
 1649,
 26,
 6853,
 5,
 163,
 11,
 3215,
 10156,
 4,
 1153,
 9,
 194,
 775,
 7,
 8255,
 11596,
 349,
 2637,
 148,
 605,
 15358,
 8003,
 15,
 123,
 125,
 68,
 23141,
 6853,
 15,
 349,
 165,
 4362,
 98,
 5,
 4,
 228,
 9,
 43,
 36893,
 1157,
 15,
 299,
 120,
 5,
 120,
 174,
 11,
 220,
 175,
 136,
 50,
 9,
 4373,
 228,
 8255,
 5,
 25249,
 656,
 245,
 2350,
 5,
 4,
 9837,
 131,
 152,
 491,
 18,
 46151,
 32,
 7464,
 1212,
 14,
 9,
 6,
 371,
 78,
 22,
 625,
 64,
 1382,
 9,
 8,
 168,
 145,
 23,
 4,
 1690,
 15,
 16,
 4,
 1355,
 5,
 28,
 6,
 52,
 

### More Preprocessing
If we have a look at some of our loaded reviews, we'll notice that they have different lengths. This is an issue: we cannot pass data of different lengths into our neural network. Therefore, we must make each review the same length. To do this we will follow the procedure below:
- if the review is longer than 250 words, trim off the extra words
- if the review is shorter than 250 words, add the necessary number of 0's to make it 250 words long

Luckily for us, Keras has a function that can do this for us. By default it pads and trims at the start of each sequence:




In [7]:
train_data = sequence.pad_sequences(train_data, MAXLEN)
test_data = sequence.pad_sequences(test_data, MAXLEN)
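
To see what this does to a single review, here is a plain-Python sketch that mimics the default behaviour as I understand it (padding and trimming at the front, which matches the leading zeros in the encoded output later on). The helper name `pad_or_trim` is made up and is not part of Keras.

```python
def pad_or_trim(seq, maxlen, pad_value=0):
    # Keep only the last `maxlen` items, then left-pad with `pad_value`
    trimmed = seq[-maxlen:]
    return [pad_value] * (maxlen - len(trimmed)) + trimmed

short_result = pad_or_trim([5, 6, 7], 5)
long_result = pad_or_trim([1, 2, 3, 4, 5, 6, 7], 5)

print(short_result)  # [0, 0, 5, 6, 7]
print(long_result)   # [3, 4, 5, 6, 7]
```

Padding at the front keeps the real words at the end of the sequence, closest to the step where the recurrent layer produces its final output.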

### Creating the Model
Now it's time to create the model. We'll use a word embedding layer as the first layer in our model and add an LSTM layer afterwards that feeds into a single dense node to get our predicted sentiment.

32 stands for the output dimension of the vectors generated by the embedding layer. We can change this value if we'd like!

In [8]:
model = tf.keras.Sequential([
    # map each word index to a 32-dimensional vector
    tf.keras.layers.Embedding(VOCAB_SIZE, 32),
    # the LSTM receives one 32-dimensional vector per word
    tf.keras.layers.LSTM(32),
    # a single sigmoid output between 0 and 1 for the predicted sentiment;
    # values above 0.5 can be read as positive
    tf.keras.layers.Dense(1, activation="sigmoid")
])

In [9]:
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, None, 32)          2834688   
                                                                 
 lstm (LSTM)                 (None, 32)                8320      
                                                                 
 dense (Dense)               (None, 1)                 33        
                                                                 
Total params: 2,843,041
Trainable params: 2,843,041
Non-trainable params: 0
_________________________________________________________________
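
The parameter counts in the summary can be reproduced by hand. The arithmetic below is a sketch based on the usual layout of these layers (four gates in an LSTM, each with input weights, recurrent weights, and a bias); the variable names are chosen for this example.

```python
vocab_size = 88584   # VOCAB_SIZE from above
embedding_dim = 32   # output dimension of the embedding layer
lstm_units = 32

# Embedding: one vector of length embedding_dim per vocabulary entry
embedding_params = vocab_size * embedding_dim

# LSTM: 4 gates, each with input weights, recurrent weights, and a bias
lstm_params = 4 * (lstm_units * (embedding_dim + lstm_units) + lstm_units)

# Dense: one weight per LSTM unit plus one bias
dense_params = lstm_units * 1 + 1

total = embedding_params + lstm_params + dense_params
print(embedding_params, lstm_params, dense_params, total)
# 2834688 8320 33 2843041
```

Almost all of the parameters sit in the embedding layer, which is why the vocabulary size has such a large effect on model size.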


### Training
Now it's time to compile and train the model. 

In [10]:
# binary cross-entropy loss suits the two-class (positive/negative) labels;
# RMSprop is the optimizer
model.compile(loss="binary_crossentropy", optimizer="rmsprop", metrics=['acc'])

# validation_split=0.2 holds out 20% of the training data for validation
history = model.fit(train_data, train_labels, epochs=10, validation_split=0.2)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


And we'll evaluate the model on our test data to see how well it performs.

In [11]:
results = model.evaluate(test_data, test_labels)
print(results)

[0.4499751627445221, 0.8532400131225586]


So we're scoring around 85% accuracy on the test data. Not bad for a simple recurrent network.

### Making Predictions
Now let’s use our network to make predictions on our own reviews. 

Since our reviews are encoded, we'll need to convert any review that we write into that form so the network can understand it. To do that, we'll load the encodings from the dataset and use them to encode our own data.

In [12]:
word_index = imdb.get_word_index()
word_index

{'fawn': 34701,
 'tsukino': 52006,
 'nunnery': 52007,
 'sonja': 16816,
 'vani': 63951,
 'woods': 1408,
 'spiders': 16115,
 'hanging': 2345,
 'woody': 2289,
 'trawling': 52008,
 "hold's": 52009,
 'comically': 11307,
 'localized': 40830,
 'disobeying': 30568,
 "'royale": 52010,
 "harpo's": 40831,
 'canet': 52011,
 'aileen': 19313,
 'acurately': 52012,
 "diplomat's": 52013,
 'rickman': 25242,
 'arranged': 6746,
 'rumbustious': 52014,
 'familiarness': 52015,
 "spider'": 52016,
 'hahahah': 68804,
 "wood'": 52017,
 'transvestism': 40833,
 "hangin'": 34702,
 'bringing': 2338,
 'seamier': 40834,
 'wooded': 34703,
 'bravora': 52018,
 'grueling': 16817,
 'wooden': 1636,
 'wednesday': 16818,
 "'prix": 52019,
 'altagracia': 34704,
 'circuitry': 52020,
 'crotch': 11585,
 'busybody': 57766,
 "tart'n'tangy": 52021,
 'burgade': 14129,
 'thrace': 52023,
 "tom's": 11038,
 'snuggles': 52025,
 'francesco': 29114,
 'complainers': 52027,
 'templarios': 52125,
 '272': 40835,
 '273': 52028,
 'zaniacs': 52130,

In [13]:
def encode_text(text):
    # split the text into lowercase word tokens, stripping punctuation
    tokens = keras.preprocessing.text.text_to_word_sequence(text)

    # replace each token with the integer that represents it in the
    # dataset's vocabulary; words that are not in the vocabulary become 0
    # Caveat: imdb.load_data() shifts word indices by 3 by default
    # (index_from=3), while get_word_index() returns unshifted indices;
    # like the original tutorial, this function does not apply that shift
    tokens = [word_index[word] if word in word_index else 0 for word in tokens]

    # pad_sequences works on a list of sequences, so we wrap our tokens in a
    # list and take the first (and only) padded sequence from the result
    return sequence.pad_sequences([tokens], MAXLEN)[0]

text = "that movie was just amazing, so amazing"
encoded = encode_text(text)
encoded

array([  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   

In [14]:
# while we're at it, let's make a decode function

reverse_word_index = {value: key for (key, value) in word_index.items()}
reverse_word_index

{34701: 'fawn',
 52006: 'tsukino',
 52007: 'nunnery',
 16816: 'sonja',
 63951: 'vani',
 1408: 'woods',
 16115: 'spiders',
 2345: 'hanging',
 2289: 'woody',
 52008: 'trawling',
 52009: "hold's",
 11307: 'comically',
 40830: 'localized',
 30568: 'disobeying',
 52010: "'royale",
 40831: "harpo's",
 52011: 'canet',
 19313: 'aileen',
 52012: 'acurately',
 52013: "diplomat's",
 25242: 'rickman',
 6746: 'arranged',
 52014: 'rumbustious',
 52015: 'familiarness',
 52016: "spider'",
 68804: 'hahahah',
 52017: "wood'",
 40833: 'transvestism',
 34702: "hangin'",
 2338: 'bringing',
 40834: 'seamier',
 34703: 'wooded',
 52018: 'bravora',
 16817: 'grueling',
 1636: 'wooden',
 16818: 'wednesday',
 52019: "'prix",
 34704: 'altagracia',
 52020: 'circuitry',
 11585: 'crotch',
 57766: 'busybody',
 52021: "tart'n'tangy",
 14129: 'burgade',
 52023: 'thrace',
 11038: "tom's",
 52025: 'snuggles',
 29114: 'francesco',
 52027: 'complainers',
 52125: 'templarios',
 40835: '272',
 52028: '273',
 52130: 'zaniacs',

In [15]:
def decode_integers(integers):
    PAD = 0
    text = ""
    for num in integers:
        if num != PAD:
            text += reverse_word_index[num] + " "

    return text[:-1]
  
decode_integers(encoded)

'that movie was just amazing so amazing'

In [16]:
# now time to make a prediction

def predict(text):
    # encode_text returns a single padded sequence of length MAXLEN
    encoded_text = encode_text(text)
    # the model expects a batch, so create an array of shape (1, MAXLEN)
    pred = np.zeros((1, MAXLEN))
    # place our encoded review in the only row of the batch
    pred[0] = encoded_text
    result = model.predict(pred)
    # result has one row per input; print the score for our single review
    print(result[0])

positive_review = "That movie was! really loved it and would great watch it again because it was amazingly great"
predict(positive_review)

negative_review = "that movie really sucked. I hated it and wouldn't watch it again. Was one of the worst things I've ever watched"
predict(negative_review)


[0.94644177]
[0.49331102]
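
Each printed value is the sigmoid output of the model, a score between 0 and 1. Following the 0.5 cut-off mentioned in the model's comments, a small helper (the name `label_from_score` is made up for this sketch) can turn a score into a label:

```python
def label_from_score(score, threshold=0.5):
    # Scores above the threshold are read as positive, the rest as negative
    return "positive" if score > threshold else "negative"

labels = [label_from_score(s) for s in [0.94644177, 0.49331102]]
print(labels)  # ['positive', 'negative']
```

Note that the second score sits only just below 0.5, so the model is far from confident about the negative review even though the label comes out as negative.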


## RNN Play Generator

Now time for one of the coolest examples we've seen so far. We are going to use an RNN to generate a play. We will simply show the RNN an example of something we want it to recreate, and it will learn how to write a version of it on its own. We'll do this using a character-level predictive model that takes a variable-length sequence as input and predicts the next character. We can use the model many times in a row, with the output from the last prediction as the input for the next call, to generate a sequence.


*This guide is based on the following: https://www.tensorflow.org/tutorials/text/text_generation*
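
The idea of feeding each prediction back in as input can be sketched without any neural network. In the example below, `toy_predict_next` is a hypothetical stand-in for a trained model; it only looks at the last character and follows a fixed lookup table.

```python
def generate(predict_next, seed, num_chars):
    # Repeatedly predict one character and append it to the running text,
    # so each prediction becomes part of the input for the next call
    text = seed
    for _ in range(num_chars):
        text += predict_next(text)
    return text

# Hypothetical stand-in for a trained model
toy_table = {"a": "b", "b": "c", "c": "a"}

def toy_predict_next(text):
    return toy_table[text[-1]]

generated = generate(toy_predict_next, "a", 5)
print(generated)  # abcabc
```

A trained RNN plays the role of `predict_next`, except that it bases its prediction on everything it has seen so far rather than only the last character.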

In [17]:
# %tensorflow_version 2.x  # this line is not required unless you are in a notebook
from keras.preprocessing import sequence
import keras
import tensorflow as tf
import os
import numpy as np

### Dataset
For this example, we only need one piece of training data. In fact, we can write our own poem or play and pass that to the network for training if we'd like. However, to make things easy we'll use an extract from a Shakespeare play.




In [18]:
path_to_file = tf.keras.utils.get_file('shakespeare.txt', 'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')
path_to_file

'C:\\Users\\abhay\\.keras\\datasets\\shakespeare.txt'

### Loading Your Own Data
To load your own data, you'll need to upload a file from the dialog below. Then you'll need to follow the steps from above but load in this new file instead.



In [19]:
# The two lines below only work inside Google Colab; elsewhere they raise
# ModuleNotFoundError: No module named 'google.colab'
# from google.colab import files
# path_to_file = list(files.upload().keys())[0]

### Read Contents of File
Let's look at the contents of the file.

In [20]:
# path_to_file was set by tf.keras.utils.get_file above
# Read, then decode for py2 compat.
with open(path_to_file, 'rb') as f:
    text = f.read().decode(encoding='utf-8')
# length of text is the number of characters in it
print('Length of text: {} characters'.format(len(text)))

Length of text: 1115394 characters


In [21]:
# Take a look at the first 250 characters in text
print(text[:250])

First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You are all resolved rather to die than to famish?

All:
Resolved. resolved.

First Citizen:
First, you know Caius Marcius is chief enemy to the people.



### Encoding
Since this text isn't encoded yet, we'll need to do that ourselves. We are going to encode each unique character as a different integer.

In [22]:
vocab = sorted(set(text))
print("vocab")
print(vocab)

# Creating a mapping from unique characters to indices
char2idx = {u:i for i,u in enumerate(vocab)}
print("char2idx")
print(char2idx)

idx2char = np.array(vocab)
print("idx2char")
print(idx2char)

def text_to_int(text):
    return np.array([char2idx[c] for c in text])

text_as_int = text_to_int(text)
print(text_as_int)

vocab
['\n', ' ', '!', '$', '&', "'", ',', '-', '.', '3', ':', ';', '?', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
char2idx
{'\n': 0, ' ': 1, '!': 2, '$': 3, '&': 4, "'": 5, ',': 6, '-': 7, '.': 8, '3': 9, ':': 10, ';': 11, '?': 12, 'A': 13, 'B': 14, 'C': 15, 'D': 16, 'E': 17, 'F': 18, 'G': 19, 'H': 20, 'I': 21, 'J': 22, 'K': 23, 'L': 24, 'M': 25, 'N': 26, 'O': 27, 'P': 28, 'Q': 29, 'R': 30, 'S': 31, 'T': 32, 'U': 33, 'V': 34, 'W': 35, 'X': 36, 'Y': 37, 'Z': 38, 'a': 39, 'b': 40, 'c': 41, 'd': 42, 'e': 43, 'f': 44, 'g': 45, 'h': 46, 'i': 47, 'j': 48, 'k': 49, 'l': 50, 'm': 51, 'n': 52, 'o': 53, 'p': 54, 'q': 55, 'r': 56, 's': 57, 't': 58, 'u': 59, 'v': 60, 'w': 61, 'x': 62, 'y': 63, 'z': 64}
idx2char
['\n' ' ' '!' '$' '&' "'" ',' '-' '.' '3' ':' ';' '?' 'A' 'B' 'C' 'D' 'E

In [23]:
# let's look at how part of our text is encoded
print("Text:", text[:13])
print("Encoded:", text_to_int(text[:13]))

Text: First Citizen
Encoded: [18 47 56 57 58  1 15 47 58 47 64 43 52]


And here we will make a function that can convert our numeric values to text.


In [24]:
def int_to_text(ints):
    # tensors expose .numpy(); plain NumPy arrays are used as-is
    if hasattr(ints, "numpy"):
        ints = ints.numpy()
    return ''.join(idx2char[ints])

int_to_text(text_as_int[:13])

'First Citizen'

### Creating Training Examples
Remember our task is to feed the model a sequence and have it return to us the next character. This means we need to split our text data from above into many shorter sequences that we can pass to the model as training examples. 

The training examples we will prepare will use a *seq_length* sequence as input and a *seq_length* sequence as the output, where the output is the input sequence shifted one character to the right. For example:

```input: Hell | output: ello```
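
Spelled out in plain Python (a small sketch using the word "Hello" in place of a real training chunk), the shift pairs every input character with the character that follows it:

```python
chunk = "Hello"          # seq_length + 1 characters, with seq_length = 4
input_text = chunk[:-1]  # everything except the last character
target_text = chunk[1:]  # everything except the first character

# At each position the target is the character that follows the input
pairs = list(zip(input_text, target_text))
print(input_text, target_text)  # Hell ello
print(pairs)
```

So a single chunk of 5 characters yields 4 separate "given this character, predict the next one" training signals.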

Our first step will be to create a stream of characters from our text data.

In [25]:
seq_length = 100  # length of sequence for a training example

# every training example needs a 100-character input sequence and a
# 100-character output sequence shifted by one character, so each
# example is built from 101 consecutive characters
examples_per_epoch = len(text)//(seq_length+1)

# Create training examples / targets
# convert the encoded text into a stream of individual character indices
char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)

Next we can use the batch method to turn this stream of characters into batches of desired length.

In [26]:
# drop_remainder=True discards the final batch if it has fewer than seq_length + 1 characters
sequences = char_dataset.batch(seq_length+1, drop_remainder=True)
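
As a plain-Python illustration of what batching with `drop_remainder=True` does to a stream (a sketch that does not use `tf.data`; the helper name is invented for this example):

```python
def chunk_with_drop_remainder(items, size):
    # Split into consecutive chunks of exactly `size` items;
    # a shorter final chunk is dropped
    full_chunks = len(items) // size
    return [items[i * size:(i + 1) * size] for i in range(full_chunks)]

chunks = chunk_with_drop_remainder("First Citizen", 5)
print(chunks)  # ['First', ' Citi']

# Number of full 101-character chunks in the 1,115,394-character text
num_sequences = 1115394 // 101
print(num_sequences)  # 11043
```

The trailing "zen" is discarded because it cannot fill a whole chunk, which keeps every training sequence the same length.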

Now we need to use these sequences of length 101 and split them into input and output.

In [27]:
# function for creating the training examples we need
def split_input_target(chunk):  # for the example: hello
    input_text = chunk[:-1]  # hell
    target_text = chunk[1:]  # ello
    return input_text, target_text  # hell, ello

dataset = sequences.map(split_input_target)  # we use map to apply the above function to every entry

In [28]:
for x, y in dataset.take(2):
    print("\n\nEXAMPLE\n")
    print("INPUT")
    print(int_to_text(x))
    print("\nOUTPUT")
    print(int_to_text(y))



EXAMPLE

INPUT
First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You

OUTPUT
irst Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You 


EXAMPLE

INPUT
are all resolved rather to die than to famish?

All:
Resolved. resolved.

First Citizen:
First, you 

OUTPUT
re all resolved rather to die than to famish?

All:
Resolved. resolved.

First Citizen:
First, you k


Finally we need to make training batches.

In [29]:
BATCH_SIZE = 64
VOCAB_SIZE = len(vocab)  # vocab is number of unique characters
EMBEDDING_DIM = 256
RNN_UNITS = 1024

# Buffer size to shuffle the dataset
# (TF data is designed to work with possibly infinite sequences,
# so it doesn't attempt to shuffle the entire sequence in memory. Instead,
# it maintains a buffer in which it shuffles elements).
BUFFER_SIZE = 10000

data = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)

### Building the Model
Now it is time to build the model. We will use an embedding layer, an LSTM, and one dense layer that contains a node for each unique character in our training data. The dense layer outputs one score (logit) per character, which can be turned into a probability distribution over the next character.
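
A dense layer without an activation produces raw scores, often called logits, rather than probabilities. The usual way to turn them into a probability distribution is the softmax function, sketched here in plain Python with made-up scores for a three-character vocabulary:

```python
import math

def softmax(logits):
    # Subtract the maximum for numerical stability, then normalise
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

toy_logits = [2.0, 1.0, 0.1]  # hypothetical scores, one per character
probs = softmax(toy_logits)
print(probs, sum(probs))
```

The resulting values are all positive and sum to 1, with the largest logit receiving the largest probability.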

In [30]:
# we're going to pass the model batches of size 64, for training, right. 
# But what we're going to do later is save this model. And then we're going to patch pass it batches of one pieces of you know, training whatever data so that you can actually make a prediction on just one piece of data.

# Because for right now, what it's going to do is takes a batch size of 64, it's
# going to take 64 training examples, and returned to a 64 outputs. 

# That's what this
# model is going to be built to do the way we build it now to start. But later on, we're
# going to rebuild the model using the same parameters that we've saved and trained for
# the model. But change it just be a batch size of one. So that that way, we can get one
# prediction for one input sequence, right. 

# the first parameter vocab_size
# the embedding dimension, which remember was 256 as a second argument

def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
    model = tf.keras.Sequential([
        
        # None for the second dimension means the sequence length is not fixed:
        # we don't know how long the sequences in each batch will be, or how long
        # the input will be when we use the model to make predictions.
        tf.keras.layers.Embedding(vocab_size, embedding_dim, 
                                  batch_input_shape=[batch_size, None]),
        
        # Next comes an LSTM (long short-term memory) layer with rnn_units (1024) units.
        # return_sequences=True makes the layer return its output at every timestep
        # instead of only the last one, because we want a prediction for the next
        # character at each position in the sequence.
        #
        # With return_sequences=False the LSTM would return a single output
        # describing what the model found at the very last timestep.
        tf.keras.layers.LSTM(rnn_units, 
                             return_sequences=True, 
                             stateful=True, 
                             recurrent_initializer='glorot_uniform'),
        
        # Finally, a dense layer with vocab_size nodes, one for each character in
        # the vocabulary. Each node outputs a score (logit) for its character coming
        # next. Applying softmax to these scores gives probabilities that sum to 1,
        # which lets us treat this last layer as the predictive layer.
        tf.keras.layers.Dense(vocab_size)
    ])
    return model

model = build_model(VOCAB_SIZE,EMBEDDING_DIM, RNN_UNITS, BATCH_SIZE)
model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_1 (Embedding)     (64, None, 256)           16640     
                                                                 
 lstm_1 (LSTM)               (64, None, 1024)          5246976   
                                                                 
 dense_1 (Dense)             (64, None, 65)            66625     
                                                                 
Total params: 5,330,241
Trainable params: 5,330,241
Non-trainable params: 0
_________________________________________________________________
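
The parameter counts in the summary can be checked by hand. The sketch below uses the standard formulas for embedding, LSTM, and dense layers (an LSTM has four gates, each with input weights, recurrent weights, and a bias); the variable names are chosen for illustration.

```python
vocab_size = 65      # unique characters, from the summary above
embedding_dim = 256
rnn_units = 1024

# Embedding: one vector of size embedding_dim per character
embedding_params = vocab_size * embedding_dim

# LSTM: 4 gates, each with input weights, recurrent weights, and a bias
lstm_params = 4 * ((embedding_dim + rnn_units) * rnn_units + rnn_units)

# Dense: weight matrix plus one bias per output node
dense_params = rnn_units * vocab_size + vocab_size

total = embedding_params + lstm_params + dense_params
print(embedding_params, lstm_params, dense_params, total)
# 16640 5246976 66625 5330241
```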


### Creating a Loss Function
Now we are going to create our own loss function for this problem. This is because our model will output a (64, sequence_length, 65) shaped tensor that holds a score (logit) for each character at each timestep for every sequence in the batch, and the loss needs to be told that these are logits rather than probabilities.

However, before we do that let's have a look at a sample input and the output from our untrained model. This is so we can understand what the model is giving us.

In [31]:
# Because the last layer is a dense layer with 65 nodes, every prediction
# contains 65 numbers: one score for each character in the vocabulary.
# So the last dimension of the output is always 65.

for input_example_batch, target_example_batch in data.take(1):
    example_batch_predictions = model(input_example_batch)  # ask our model for a prediction on our first batch of training data (64 entries)
    print(example_batch_predictions.shape, "# (batch_size, sequence_length, vocab_size)")  # print out the output shape
    
    
    
# We can call the model before it is trained: it simply uses its randomly
# initialized weights and biases.
#
# Here we pass it the first batch, of shape (64, 100), and it returns a
# prediction for every training example in that batch.

(64, 100, 65) # (batch_size, sequence_length, vocab_size)


In [32]:
# The result has length 64 along its first dimension: the prediction is an
# array of 64 arrays, one for each entry in the batch.
print(len(example_batch_predictions))
# print(example_batch_predictions)
print(example_batch_predictions[0][0])
# example_batch_predictions.shape

64
tf.Tensor(
[ 4.0927017e-04 -2.4461775e-04  1.6009584e-03  8.0196955e-04
  2.0547262e-03  1.2459840e-03  2.0946953e-03  4.4540651e-03
  3.0043223e-03 -4.2792456e-04  1.7639406e-03  5.4706994e-04
  2.0709401e-03 -3.4895905e-03  6.0010869e-03  4.0110066e-03
  6.4030761e-04  2.8252690e-03 -4.2313468e-03 -3.6730856e-04
 -3.7246570e-03  3.1223055e-03 -1.8434611e-03  7.9889186e-03
 -3.5206166e-03 -2.9026887e-03  4.2089615e-03 -7.8818528e-04
 -5.7942630e-04 -1.7068239e-03  1.0365057e-03 -3.9404808e-03
 -3.9238501e-03  2.2020601e-03  5.5153808e-04 -3.9338912e-03
  1.5294791e-03 -5.6762607e-03  1.6811718e-03  1.9168577e-03
 -1.4376533e-03  2.6192833e-03 -3.2460545e-03  5.7271076e-03
  1.1034205e-03  4.5864144e-03 -2.1001701e-03 -6.8966905e-04
  2.7670534e-03 -1.2423365e-03 -4.9804421e-03 -3.9277337e-03
  3.3922880e-03  2.5752252e-03 -3.7201911e-03 -1.2721810e-03
 -4.9683517e-03 -3.6356994e-04  2.5169943e-03  2.8769271e-03
 -1.4622696e-05 -3.3910805e-04 -3.5390656e-03 -1.8255172e-03
  5.462681

In [33]:
# A single prediction has length 100, and there is still another nested
# array inside it.
#
# This is because a recurrent neural network is fed one character of the
# sequence at a time, and our sequences have length 100. The output at every
# timestep is kept as a prediction, so one training example gives us
# 100 outputs.

# let's examine one prediction
pred = example_batch_predictions[0]
print(len(pred))
print(pred)
# notice this is a 2d array of length 100, where each interior array is the prediction for the next character at each time step

100
tf.Tensor(
[[ 0.00040927 -0.00024462  0.00160096 ... -0.00353907 -0.00182552
   0.00054627]
 [ 0.000307   -0.00076781  0.00222701 ... -0.00686455  0.00113389
   0.00512418]
 [-0.00129873 -0.0004065   0.00471196 ... -0.00555601  0.00644692
   0.00484218]
 ...
 [-0.00096942  0.00200139  0.01692113 ... -0.00321132  0.0049075
   0.01291327]
 [-0.00607152 -0.00267706  0.01429912 ... -0.00385788  0.005547
   0.01445976]
 [-0.00316924 -0.00026644  0.01392164 ... -0.00846199  0.00287417
   0.01682824]], shape=(100, 65), dtype=float32)


In [34]:
# The prediction above covers 100 timesteps, so let's look at just the first
# one. It is a tensor of length 65 holding a score for every character
# occurring next at that timestep.
#
# To measure how well the model is performing we need a loss function that can
# compare this (batch, timestep, vocabulary) output of raw scores with the
# expected characters.

# and finally we'll look at a prediction at the first timestep
time_pred = pred[0]
print(len(time_pred))
print(time_pred)
# and of course it's 65 values, one score for each character occurring next

65
tf.Tensor(
[ 4.0927017e-04 -2.4461775e-04  1.6009584e-03  8.0196955e-04
  2.0547262e-03  1.2459840e-03  2.0946953e-03  4.4540651e-03
  3.0043223e-03 -4.2792456e-04  1.7639406e-03  5.4706994e-04
  2.0709401e-03 -3.4895905e-03  6.0010869e-03  4.0110066e-03
  6.4030761e-04  2.8252690e-03 -4.2313468e-03 -3.6730856e-04
 -3.7246570e-03  3.1223055e-03 -1.8434611e-03  7.9889186e-03
 -3.5206166e-03 -2.9026887e-03  4.2089615e-03 -7.8818528e-04
 -5.7942630e-04 -1.7068239e-03  1.0365057e-03 -3.9404808e-03
 -3.9238501e-03  2.2020601e-03  5.5153808e-04 -3.9338912e-03
  1.5294791e-03 -5.6762607e-03  1.6811718e-03  1.9168577e-03
 -1.4376533e-03  2.6192833e-03 -3.2460545e-03  5.7271076e-03
  1.1034205e-03  4.5864144e-03 -2.1001701e-03 -6.8966905e-04
  2.7670534e-03 -1.2423365e-03 -4.9804421e-03 -3.9277337e-03
  3.3922880e-03  2.5752252e-03 -3.7201911e-03 -1.2721810e-03
 -4.9683517e-03 -3.6356994e-04  2.5169943e-03  2.8769271e-03
 -1.4622696e-05 -3.3910805e-04 -3.5390656e-03 -1.8255172e-03
  5.462681

In [35]:
# We can sample the categorical distribution defined by these scores to get
# the predicted character at every timestep.
#
# Since the model has not been trained yet and still has random weights and
# biases, the predicted characters will look random.

# To determine the predicted character we need to sample the output distribution (pick a value based on probability)
sampled_indices = tf.random.categorical(pred, num_samples=1)



# Why sample instead of taking the argmax of each array? You might expect us to
# take the index with the highest probability as the next predicted character.
# The problem is that always picking the most likely character tends to get the
# model stuck in a loop, repeating the same characters over and over.
#
# Sampling picks a character according to the probability distribution. It
# does not guarantee that the character with the highest probability is chosen;
# it only uses the probabilities to weight the choice.

# now we can reshape that array and convert all the integers to characters
sampled_indices = np.reshape(sampled_indices, (1, -1))[0]
predicted_chars = int_to_text(sampled_indices)

predicted_chars  # and this is what the model predicted for training sequence 1

"MKGxF;tQAQbTrd\nisRueVu:luWslNBy3a,y;dGOvvcpHsl$h?aji .EymKcq;!sZgDYM3Lp..T'rFhJX;Zv i.?OUc!&wLBofppR"

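As a rough illustration of the difference between sampling and always taking the most likely character, here is a small pure-Python sketch. The logits and the three-character vocabulary are invented for the example, and `softmax` and `sample_index` are hypothetical helpers, not TensorFlow functions; `tf.random.categorical` performs the equivalent draw directly on logits.

```python
import math
import random

def softmax(logits):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_index(logits, rng):
    """Pick an index at random, weighted by the softmax probabilities."""
    probs = softmax(logits)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

toy_vocab = ['a', 'b', 'c']     # toy vocabulary (assumption)
toy_logits = [2.0, 1.0, 0.5]    # toy scores for the next character (assumption)

rng = random.Random(0)
greedy = [toy_vocab[toy_logits.index(max(toy_logits))] for _ in range(10)]
sampled = [toy_vocab[sample_index(toy_logits, rng)] for _ in range(10)]

print(''.join(greedy))   # always the same character: aaaaaaaaaa
print(''.join(sampled))  # a mix, with 'a' the most frequent on average
```

The greedy choice never varies for the same scores, while sampling occasionally picks the less likely characters.
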
So now we need to create a loss function that can compare the model's output to the expected output and give us a numeric value representing how close the two are.

In [36]:
# logits: the raw, unnormalized scores output by the model (from_logits=True applies softmax inside the loss)
def loss(labels, logits):
    return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)
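
To see what this loss computes for a single timestep, here is a hand-rolled sketch in plain Python. It assumes the usual definition of sparse categorical cross-entropy on logits: apply softmax, then take the negative log of the probability assigned to the correct character. The helper name `sparse_ce_from_logits` is made up for this example.

```python
import math

def sparse_ce_from_logits(label, logits):
    """Negative log-probability of the true class, computed from raw logits."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[label]

# An untrained model gives near-identical scores to all 65 characters,
# so the loss starts out close to ln(65).
uniform_logits = [0.0] * 65
print(round(sparse_ce_from_logits(0, uniform_logits), 4))  # 4.1744

# A confident, correct prediction gives a loss close to zero
confident_logits = [10.0] + [0.0] * 64
print(sparse_ce_from_logits(0, confident_logits) < 0.01)   # True
```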

### Compiling the Model
At this point we can think of our problem as a classification problem where the model predicts the probability of each unique letter coming next. 


In [37]:
model.compile(optimizer='adam', loss=loss)

### Creating Checkpoints
Now we are going to set up and configure our model to save checkpoints as it trains. This will allow us to load our model from a checkpoint and continue training it.

In [39]:
# Directory where the checkpoints will be saved
checkpoint_dir = './training_checkpoints'
# Name of the checkpoint files
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")

checkpoint_callback=tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True)
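
The `{epoch}` placeholder in the prefix is filled in with the epoch number each time a checkpoint is written. The snippet below just applies Python's `str.format` to the same template to show the resulting names; it illustrates the naming pattern only and does not call into Keras.

```python
import os

checkpoint_dir = './training_checkpoints'
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")

# Show the checkpoint names the template expands to for a few epochs
names = [checkpoint_prefix.format(epoch=e) for e in (1, 2, 100)]
for name in names:
    print(name)
```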

### Training
Finally, we will start training the model. 

**If this is taking a while go to Runtime > Change Runtime Type and choose "GPU" under hardware accelerator.**



In [40]:
# here we can set epoch to 100 
history = model.fit(data, epochs=100, callbacks=[checkpoint_callback])

Epoch 1/100
...
Epoch 100/100


### Loading the Model
We'll rebuild the model from a checkpoint using a batch_size of 1 so that we can feed one piece of text to the model and have it make a prediction.

In [41]:
# rebuild the model with a batch size of 1
# so that we can pass it a single sequence of any length
model = build_model(VOCAB_SIZE, EMBEDDING_DIM, RNN_UNITS, batch_size=1)

Once the model is finished training, we can find the **latest checkpoint** that stores the model's weights using the following line.



In [42]:
# load the weights from the latest checkpoint
model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))
# none means we don't know the dimension length
model.build(tf.TensorShape([1, None]))

We can load **any checkpoint** we want by specifying the exact file to load.

In [43]:
# we can also load an intermediate checkpoint

# checkpoint_num = 10
# model.load_weights("./training_checkpoints/ckpt_" + str(checkpoint_num))
# model.build(tf.TensorShape([1, None]))

In [55]:
# import h5py

# model.save('saved_model/my_model.h5')

In [56]:
# from tensorflow.keras.models import load_model
# new_model = tf.keras.models.load_model('saved_model/my_model.h5')

# # Check its architecture
# new_model.summary()

In [57]:
# # Evaluate the restored model
# loss, acc = new_model.evaluate(test_images, test_labels, verbose=2)
# print('Restored model, accuracy: {:5.2f}%'.format(100 * acc))

# print(new_model.predict(test_images).shape)

In [58]:
# save the model architecture (not the weights) to a JSON file
from tensorflow.keras.models import model_from_json
model_in_json = model.to_json()
with open('model.json','w') as json_file:
    json_file.write(model_in_json)

In [59]:
# Load the model architecture from the JSON file
# (the weights are not stored in the JSON, so model2 starts untrained)
with open('model.json', 'r') as model_file:
    json_model = model_file.read()
model2 = model_from_json(json_model)
model2.summary()

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_2 (Embedding)     (1, None, 256)            16640     
                                                                 
 lstm_2 (LSTM)               (1, None, 1024)           5246976   
                                                                 
 dense_2 (Dense)             (1, None, 65)             66625     
                                                                 
Total params: 5,330,241
Trainable params: 5,330,241
Non-trainable params: 0
_________________________________________________________________


In [61]:
json_model

{"class_name": "Sequential", "config": {"name": "sequential_2", "layers": [{"class_name": "InputLayer", "config": {"batch_input_shape": [1, null], "dtype": "float32", "sparse": false, "ragged": false, "name": "embedding_2_input"}}, {"class_name": "Embedding", "config": {"name": "embedding_2", "trainable": true, "batch_input_shape": [1, null], "dtype": "float32", "input_dim": 65, "output_dim": 256, "embeddings_initializer": {"class_name": "RandomUniform", "config": {"minval": -0.05, "maxval": 0.05, "seed": null}}, "embeddings_regularizer": null, "activity_regularizer": null, "embeddings_constraint": null, "mask_zero": false, "input_length": null}}, {"class_name": "LSTM", "config": {"name": "lstm_2", "trainable": true, "dtype": "float32", "return_sequences": true, "return_state": false, "go_backwards": false, "stateful": true, "unroll": false, "time_major": false, "units": 1024, "activation": "tanh", "recurrent_activation": "sigmoid", "use_bias": true, "kernel_initializer": {"class_name"

### Generating Text
Now we can use the lovely function provided by TensorFlow to generate some text using any starting string we'd like.

In [44]:
def generate_text(model, start_string):
    # Evaluation step (generating text using the learned model)

    # Number of characters to generate
    num_generate = 800

    # Converting our start string to numbers (vectorizing)
    input_eval = [char2idx[s] for s in start_string]
    print("input_eval")
    print(input_eval)
    
    # this will convert [39, 40, 46, 39, 63] ->  [[39 40 46 39 63]]
    input_eval = tf.expand_dims(input_eval, 0)
    print("input_eval after expanding")
    print(input_eval)

    # Empty list to store our results
    text_generated = []

    # Low temperatures result in more predictable text.
    # Higher temperatures result in more surprising text.
    # Experiment to find the best setting.
    temperature = 1.0

    # Reset the states of the model. The LSTM is stateful, so it carries its
    # hidden state over from one call to the next. We clear any leftover state
    # before passing new input text to it.
    
    # Here batch size == 1
    model.reset_states()
    
    for i in range(num_generate):
        predictions = model(input_eval)
        # remove the batch dimension

        # this converts shape (1, sequence_length, vocab_size) -> (sequence_length, vocab_size)
        predictions = tf.squeeze(predictions, 0)

        # using a categorical distribution to predict the character returned by the model
        predictions = predictions / temperature
        predicted_id = tf.random.categorical(predictions, num_samples=1)[-1,0].numpy()

        # We pass the predicted character as the next input to the model
        # along with the previous hidden state
        input_eval = tf.expand_dims([predicted_id], 0)

        text_generated.append(idx2char[predicted_id])

    return (start_string + ''.join(text_generated))
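
To get a feel for the `temperature` setting, the sketch below divides a made-up set of logits by different temperatures and prints the resulting softmax probabilities. The logits are invented and `softmax_with_temperature` is a hypothetical helper written for this example, not a TensorFlow function.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then convert to probabilities."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

toy_logits = [2.0, 1.0, 0.0]  # made-up scores for three characters

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top-scoring character, while higher temperatures flatten the distribution, so sampling picks unlikely characters more often.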

In [52]:
inp = input("Type a starting string: ")
print(generate_text(model, inp))

Type a starting string: my name is abhay
input_eval
[51, 63, 1, 52, 39, 51, 43, 1, 47, 57, 1, 39, 40, 46, 39, 63]
input_eval after expanding
tf.Tensor([[51 63  1 52 39 51 43  1 47 57  1 39 40 46 39 63]], shape=(1, 16), dtype=int32)
my name is abhay too much.

MENENIUS:
Not I, uncle's head? when do you woild!
If any think brawling d spoil their father?

PETRUCHIO:
And I usent.

KING EDWARD IV:
Seith his most stay be sworn: or both his pawn is rise and fall,
That er bands for honour; I
She is sentence, is noble lord?
Then let me pretty Brack Bostham.

VOLUMNIA:
I'll hear again, my fortunes here:
My comfort in that three, Kinger last of like request
Unto instruction makes on one part of youis suit,
Suffer: yet good my lord,
Sux of dany, and heralime lives which in a nightion of her facen places; one but, very welcome, Vironia, pray,
The noble knots and sue up the dangerness,
Of graceful foot of water, each partness, and save your age,
And royal-husband. Mistress of many
Hot offend you wha

*And* that's pretty much it for this module! I highly recommend messing with the model we just created and seeing what you can get it to do!

## Sources

1. Chollet, François. Deep Learning with Python. Manning Publications Co., 2018.
2. “Text Classification with an RNN: TensorFlow Core.” TensorFlow, www.tensorflow.org/tutorials/text/text_classification_rnn.
3. “Text Generation with an RNN: TensorFlow Core.” TensorFlow, www.tensorflow.org/tutorials/text/text_generation.
4. “Understanding LSTM Networks.” Understanding LSTM Networks -- Colah's Blog, https://colah.github.io/posts/2015-08-Understanding-LSTMs/.