# Recurrent Neural Networks and Keras

In this chapter, you will learn the foundations of Recurrent Neural Networks (RNNs). You will start with some prerequisites, continue by understanding how information flows through the network, and finally see how to implement such models with Keras for the sentiment classification task.

# (1) Introduction to the course

## Text data is available online

<img src="image/Screenshot 2021-02-03 135329.png">

## Applications of machine learning to text data
Four applications:

- Sentiment analysis
- Multi-class classification
- Text generation
- Neural machine translation

## Sentiment analysis

<img src="image/Screenshot 2021-02-03 135621.png">

## Multi-class classification

<img src="image/Screenshot 2021-02-03 135718.png">

## Text generation

<img src="image/Screenshot 2021-02-03 135759.png">

## Neural machine translation

<img src="image/Screenshot 2021-02-03 135844.png">

## Recurrent Neural Networks

<img src="image/Screenshot 2021-02-03 135932.png">

## Sequence to sequence models
**Many to one: classification**

<img src="image/Screenshot 2021-02-03 140041.png">

**Many to many: text generation**

<img src="image/Screenshot 2021-02-03 140152.png">

**Many to many: neural machine translation**

<img src="image/Screenshot 2021-02-03 140252.png">

**Many to many: language model**

<img src="image/Screenshot 2021-02-03 140416.png">

# Exercise I: Comparing the number of parameters of RNN and ANN

In this exercise, you will compare the number of parameters of an artificial neural network (ANN) with that of a recurrent neural network (RNN). Here, the vocabulary size is equal to `10,000` for both models.

The models have been defined for you with similar architectures of only one layer with `256` units (Dense or RNN) plus the output layer. They are stored in the variables `ann_model` and `rnn_model`.

Use the method `.summary()` to print the models' architecture and number of parameters and select the correct statement.

### Possible Answers

- The ANN model has more parameters on the second `Dense` layer than the RNN model.

- The RNN model has fewer parameters than the ANN model. (T)

- The RNN model needs to train approximately the same number of parameters as the ANN model.

- The one-hot encoding allows the RNN model to have fewer parameters.
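
The difference can also be reasoned about by hand. The sketch below rests on assumptions that the exercise does not state: that the ANN receives a one-hot vector with one entry per vocabulary word, and that the RNN uses a vanilla recurrent layer reading a single feature per time step. The helper names `dense_params` and `simple_rnn_params` are hypothetical.

```python
def dense_params(input_dim, units):
    """Weights plus biases of a fully connected layer."""
    return input_dim * units + units


def simple_rnn_params(input_dim, units):
    """Input weights, recurrent weights and biases of a vanilla RNN layer."""
    return input_dim * units + units * units + units


vocabulary_size = 10_000
units = 256

# Assumption: the ANN sees a one-hot vector with one entry per vocabulary word
ann_hidden = dense_params(vocabulary_size, units)

# Assumption: the RNN sees a single feature per time step
rnn_hidden = simple_rnn_params(1, units)

print(f"Dense layer:     {ann_hidden:,} parameters")
print(f"SimpleRNN layer: {rnn_hidden:,} parameters")
```

Under these assumptions the recurrent layer is much smaller because its weights are shared across time steps instead of being tied to the vocabulary size.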

# Exercise II: Sentiment analysis

In the video exercise, you were exposed to the various applications of sequence to sequence models. In this exercise you will see how to use a pre-trained model for sentiment analysis.

The model is pre-loaded in the environment in the variable `model`. The tokenized test set variables `X_test` and `y_test` and the pre-processed original text data `sentences` from IMDb are also available. You will learn how to pre-process the text data and how to create and train the model using Keras later in the course.

You will use the pre-trained model to obtain predictions of sentiment. The model returns a number between zero and one representing the probability that the sentence has a positive sentiment. So, you will create a decision rule to set the prediction to positive or negative.

### Instructions

- Use the `.predict()` method to make predictions on the test data.
- Make the prediction equal to `"positive"` if its value is greater than 0.5 and `"negative"` otherwise and store the result in the `pred_sentiment` variable.
- Create a `pd.DataFrame` containing the pre-processed text, the prediction obtained in the previous step and their true values contained in the `y_test` variable.
- Print the first rows using the `.head()` method.

In [None]:
# Inspect the first sentence on `X_test`
print(X_test[0])

# Get the prediction for all the sentences
pred = model.predict(X_test)

# Transform the prediction into positive (> 0.5) or negative (<= 0.5)
pred_sentiment = ["positive" if x > 0.5 else "negative" for x in pred]

# Create a data frame with sentences, predictions and true values
result = pd.DataFrame({'sentence': sentences, 'y_pred': pred_sentiment, 'y_true': y_test})

# Print the first lines of the data frame
print(result.head())
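
For a model with a single sigmoid output unit, `.predict()` returns one row per sentence with a single column. The sketch below mimics that layout with hypothetical probabilities (plain Python lists instead of the real model output) to show the decision rule in isolation.

```python
# Hypothetical probabilities laid out like the output of `.predict()`:
# one row per sentence, one column holding P(positive)
pred = [[0.91], [0.12], [0.50], [0.73]]

# Flatten the single-column rows, then apply the 0.5 threshold
probabilities = [row[0] for row in pred]
pred_sentiment = ["positive" if p > 0.5 else "negative" for p in probabilities]

print(pred_sentiment)
```

A probability of exactly 0.5 falls on the negative side because the rule uses a strict `>` comparison.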

# Exercise III: Sequence to sequence models

In the video exercise, you learned about four types of sequence to sequence models: many-to-one (classification) and many-to-many (text generation, neural machine translation and language models). In this exercise, you have to choose the correct type of model given the following problem description:

You are helping your friend who is a specialist in speech recognition. Your friend built a model that can recognize different accents of English, but the model is failing to distinguish homophones: words that have the same pronunciation but different meanings, such as "sea" vs "see" or "write" vs "right".

You propose to use a model that will use the context around the words to identify the semantic meaning of the words. By learning the meaning of the words, the new model would avoid outputs like "Did you sea that car?" - it would identify that in this case, the correct word would be "see".

What type of sequence-to-sequence model is appropriate?

### Possible Answers

- Many-to-many, because it is a classification model.

- Many-to-one, because it is a classification model.

- Many-to-many, this problem can be solved with a language model. (T)

- Many-to-one, because it is a prediction problem.


# (2) Introduction to language models

## Sentence probability
Many available models

- Probability of "I loved this movie"
- Unigram
    - $$P(sentence) = P('I') P('loved') P('this') P('movie')$$
- N-gram
    - N = 2 (bigram): $$P(sentence) = P('I') P('loved'|'I') P('this'|'loved') P('movie'|'this')$$
    - N = 3 (trigram): $$P(sentence) = P('I') P('loved'|'I') P('this'|'I loved') P('movie'|'loved this')$$

- Skip gram
    - $$P(sentence) = P('context of I'|'I') P('context of loved'|'loved') P('context of this'|'this') P('context of movie'|'movie')$$

- Neural Networks
    - The probability of the sentence is given by a `softmax` function on the output layer of the network
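
To make the bigram formula concrete, here is a minimal sketch that estimates the probabilities from raw counts. The three-sentence corpus and the function name `bigram_probability` are hypothetical, and no smoothing is applied, so any unseen word pair drives the sentence probability to zero.

```python
from collections import Counter

# Hypothetical toy corpus, used only to estimate the probabilities
corpus = ["i loved this movie", "i loved this book", "i hated this movie"]

tokenized = [sentence.split() for sentence in corpus]
unigram_counts = Counter(word for words in tokenized for word in words)
bigram_counts = Counter(pair for words in tokenized for pair in zip(words, words[1:]))
total_words = sum(unigram_counts.values())


def bigram_probability(sentence):
    """P(w1) * P(w2|w1) * ... estimated from raw counts (no smoothing)."""
    words = sentence.split()
    probability = unigram_counts[words[0]] / total_words
    for previous, current in zip(words, words[1:]):
        if unigram_counts[previous] == 0:
            return 0.0  # unseen context: the estimate collapses to zero
        probability *= bigram_counts[(previous, current)] / unigram_counts[previous]
    return probability


seen = bigram_probability("i loved this movie")
unseen = bigram_probability("i loved this film")
print(seen, unseen)
```

The second sentence gets probability zero only because "this film" never occurs in the toy corpus, which is one motivation for the neural approach listed above.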

## Link to RNNs
Language models are everywhere in RNNs!

- The network itself

<p align='center'>
    <img src='image/Screenshot 2021-02-11 111619.png'>
</p>

- Embedding layer

<p align='center'>
    <img src='image/Screenshot 2021-02-11 111759.png'>
</p>

## Building vocabulary dictionaries

In [None]:
# Get unique words
unique_words = list(set(text.split(' ')))

In [None]:
# Create dictionary: word is key, index is value
word_to_index = {k:v for (v,k) in enumerate(unique_words)}

In [None]:
# Create dictionary: index is key, word is value
index_to_word = {k:v for (k,v) in enumerate(unique_words)}
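
One practical detail: the iteration order of a `set` of strings can change between Python runs, so the indexes assigned above are not guaranteed to be reproducible. The following sketch, using a hypothetical sample `text`, sorts the unique words first and then checks that encoding and decoding round-trip.

```python
text = "i loved this movie and i loved this book"  # hypothetical sample text

# Sorting the unique words makes the index assignment reproducible
unique_words = sorted(set(text.split(' ')))

word_to_index = {word: index for index, word in enumerate(unique_words)}
index_to_word = {index: word for index, word in enumerate(unique_words)}

# Round trip: encode the text and decode it again
encoded = [word_to_index[word] for word in text.split(' ')]
decoded = ' '.join(index_to_word[index] for index in encoded)

print(encoded)
print(decoded)
```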

## Preprocessing input

In [None]:
# Initialize variables X and y
X = []
y = []
# Loop over the text: length 'sentence_size' per time with step equal to 'step'
for i in range(0, len(text) - sentence_size, step):
    X.append(text[i:i + sentence_size])
    y.append(text[i + sentence_size])

## Transforming new texts

In [None]:
# Create list to keep the sentences of indexes 
new_text_split = []
# Loop and get the indexes from dictionary
for sentence in new_text:
    sent_split = []
    for wd in sentence.split(' '):
        ix = word_to_index[wd]
        sent_split.append(ix)
    new_text_split.append(sent_split)

# Exercise IV: Getting used to text data

In this exercise, you will play with text data by analyzing quotes from Sheldon Cooper in The Big Bang Theory TV show. This will give you a chance to analyze sentences to obtain insights on what it's like to deal with real-world text data.

You will use dictionary comprehensions to create dictionaries that map words to indexes and vice versa. Dictionaries are used here instead of, for example, a `pandas.DataFrame` because they are more intuitive and avoid unnecessary complexity.

The data is available in `sheldon_quotes` with the first two sentences already printed for you.

### Instructions

- `join` the sentences into one variable and then extract all the words and store this list in `all_words`.
- Remove the duplicated words by applying `list(set())` on the list of words and store them in `unique_words`.
- Create a dictionary with indexes as keys and words as values using dictionary comprehensions.
- Create a dictionary with words as keys and indexes as values using dictionary comprehensions.

In [None]:
# Transform the list of sentences into a list of words
all_words = ' '.join(sheldon_quotes).split(' ')

# Get the unique words
unique_words = list(set(all_words))

# Dictionary of indexes as keys and words as values
index_to_word = {i:wd for i, wd in enumerate(sorted(unique_words))}

print(index_to_word)

# Dictionary of words as keys and indexes as values
word_to_index = {wd:i for i,wd in enumerate(sorted(unique_words))}

print(word_to_index)

# Exercise V: Preparing text data for model input
Previously, you learned how to create dictionaries of indexes to words and vice versa. In this exercise, you will split the text by characters and continue to prepare the data for supervised learning.

Splitting the texts into characters may seem strange, but it is often done for text generation. Also, the process to prepare the data is the same, the only change is how to split the texts.

You will create the training data containing a list of fixed-length texts and their labels, which are the corresponding next characters.

You will continue to use the dataset containing quotes from Sheldon (The Big Bang Theory), available in the `sheldon_quotes` variable.

The `print_examples()` function prints the pairs so you can see how the data was transformed. Use `help()` for details.

### Instructions

- Define `step` equal to `2` and `chars_window` equal to `10`.
- Append the next sentence to the variable `sentences`.
- Append the correct position of the text `sheldon_quotes` to the variable `next_chars`.
- Use the `print_examples()` function to print `10` sentences and next characters.

In [None]:
# Create lists to keep the sentences and the next character
sentences = []   # ~ Training data
next_chars = []  # ~ Training labels

# Define hyperparameters
step = 2          # ~ Step to take when reading the texts in characters
chars_window = 10 # ~ Number of characters to use to predict the next one  

# Loop over the text: length `chars_window` per time with step equal to `step`
for i in range(0, len(sheldon_quotes) - chars_window, step):
    sentences.append(sheldon_quotes[i:i + chars_window])
    next_chars.append(sheldon_quotes[i + chars_window])

# Print 10 pairs
print_examples(sentences, next_chars, 10)

# Exercise VI: Transforming new text
In this exercise, you will transform a new text into sequences of numerical indexes using the dictionaries created before.

This is useful when you already have a trained model and want to apply it on a new dataset. The preprocessing steps done on the training data should also be applied to the new text, so the model can make predictions/classifications.

Here, you will also use a special token `'<UKN/>'` to represent words that are not in the vocabulary. Typically, these special tokens are the first indexes of the dictionaries, the position `0`.

The variables `word_to_index`, `index_to_word` and `vocabulary` are already loaded in the environment. The new text is loaded as `new_text` and has been printed for you to have a look.

### Instructions

- Loop through the list `new_text` containing the sentences.
- Set to `0` the index in case the word is not found in the dictionary.
- Append the sentence with indexes to the variable `new_text_split`.
- Convert the indexes back to text using the dictionary `index_to_word`.

In [None]:
# Loop through the sentences and get indexes
new_text_split = []
for sentence in new_text:
    sent_split = []
    for wd in sentence.split(' '):
        index = word_to_index.get(wd, 0)
        sent_split.append(index)
    new_text_split.append(sent_split)

# Print the first sentence's indexes
print(new_text_split[0])

# Print the sentence converted using the dictionary
print(' '.join([index_to_word[index] for index in new_text_split[0]]))
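
The fallback to index `0` only decodes correctly if the unknown token really sits at position `0` of the vocabulary. The sketch below builds a small hypothetical vocabulary that way (the words are illustrative, not taken from the exercise data) and shows an out-of-vocabulary word being mapped to `'<UKN/>'`.

```python
# Hypothetical training vocabulary; '<UKN/>' is placed first so it gets index 0
vocabulary = ['<UKN/>'] + sorted({'i', 'like', 'physics', 'trains'})
word_to_index = {word: index for index, word in enumerate(vocabulary)}
index_to_word = {index: word for index, word in enumerate(vocabulary)}

new_sentence = 'i like chemistry'   # 'chemistry' is not in the vocabulary

# Unknown words fall back to index 0, the position of '<UKN/>'
indexes = [word_to_index.get(word, 0) for word in new_sentence.split(' ')]
decoded = ' '.join(index_to_word[index] for index in indexes)

print(indexes)
print(decoded)
```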

# (3) Introduction to RNN inside Keras

## What is Keras?
- High-level API
- Can use TensorFlow, CNTK or Theano frameworks
- Easy to install and use

```
$ pip install keras
```

Fast experimentation:

In [None]:
from keras.models import Sequential
from keras.layers import LSTM, Dense

## keras.models

<p align='center'>
    <img src='image/Screenshot 2021-02-11 120856.png'>
</p>

## keras.layers
1. `LSTM`
2. `GRU`
3. `Dense`

## keras.preprocessing

`keras.preprocessing.sequence.pad_sequences(texts, maxlen=3)`

<p align='center'>
    <img src='image/Screenshot 2021-02-11 121339.png'>
</p>
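
To illustrate the default behaviour of `pad_sequences`, the following is a simplified pure-Python re-implementation, not the Keras code itself. It assumes the default settings (zeros are added at the start of short sequences and long sequences lose their first elements) and `maxlen >= 1`; the function name `pad_sequences_sketch` and the sample data are hypothetical.

```python
def pad_sequences_sketch(sequences, maxlen, value=0):
    """Simplified stand-in for the default 'pre' padding and truncating."""
    padded = []
    for sequence in sequences:
        kept = list(sequence)[-maxlen:]                 # keep the last `maxlen` items
        padding = [value] * (maxlen - len(kept))        # zeros go in front
        padded.append(padding + kept)
    return padded


texts = [[1, 2, 3, 4, 5], [1, 2], [7]]   # hypothetical sequences of word indexes
texts_padded = pad_sequences_sketch(texts, maxlen=3)
print(texts_padded)
```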

## keras.datasets
**Many useful datasets**
- IMDB Movie reviews
- Reuters newswire

And more!

## Creating a model

In [None]:
# Import required modules
from keras.models import Sequential
from keras.layers import Dense

In [None]:
# Instantiate the model class
model = Sequential()

In [None]:
# Add the layers
model.add(Dense(64, activation='relu', input_dim=100))
model.add(Dense(1, activation='sigmoid'))

In [None]:
# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error', metrics=['accuracy'])

## Training the model

The method `.fit()` trains the model on the training set

In [None]:
model.fit(X_train, y_train, epochs=10, batch_size=32)

1. `epochs` determines how many complete passes over the training set are made
2. `batch_size` is the number of samples used for each weight update
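
The two arguments together fix how many weight updates happen during training: one update per batch, repeated for every epoch. A minimal sketch of that arithmetic follows, where the training set size of 25,000 samples and the helper name `count_weight_updates` are assumptions made for illustration.

```python
import math


def count_weight_updates(n_samples, epochs, batch_size):
    """Return (updates per epoch, total updates) for mini-batch training."""
    steps_per_epoch = math.ceil(n_samples / batch_size)  # last batch may be smaller
    return steps_per_epoch, steps_per_epoch * epochs


# Assumption: a training set with 25,000 samples
steps, total = count_weight_updates(n_samples=25_000, epochs=10, batch_size=32)
print(f"{steps} updates per epoch, {total} updates in total")
```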

## Model evaluation and usage
**Evaluate the model on the test set:**

In [None]:
model.evaluate(X_test, y_test)

**Make predictions on new data:**

In [None]:
model.predict(new_data)

## Full example: IMDB Sentiment Classification

In [None]:
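# Import the classes used below (Embedding was not imported earlier)
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

# Build and compile the model
model = Sequential()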
model.add(Embedding(1000, 120))
model.add(LSTM(128, dropout=0.2))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

In [None]:
# Training
model.fit(x_train, y_train, epochs=5)

In [None]:
# Evaluation
score, acc = model.evaluate(x_test, y_test)

# Exercise VII: Keras models

In this exercise you'll practice using two classes from the `keras.models` module. You will create the same model twice, once with the `Sequential` class and once with the `Model` class.

The `Sequential` class is easier since the layers are assumed to be in order, while the `Model` class is more flexible and allows multiple inputs, multiple outputs and shared layers (shared weights).

The `Model` class needs to explicitly declare the input layer, while in the `Sequential` class, this is done with the `input_shape` parameter.

The objects and modules `Sequential`, `Model`, `Dense`, `Input`, `LSTM` and `np` (`numpy`) are already loaded on the environment.

### Instructions 1/2

- Instantiate the `Sequential` model with name `sequential_model`.
- Add an `LSTM` layer and a `Dense` layer, and print the summary.

In [None]:
# Instantiate the class
model = Sequential(name="sequential_model")

# One LSTM layer (defining the input shape because it is the 
# initial layer)
model.add(LSTM(128, input_shape=(None, 10), name="LSTM"))

# Add a dense layer with one unit
model.add(Dense(1, activation="sigmoid", name="output"))

# The summary shows the layers and the number of parameters 
# that will be trained
model.summary()

### Instructions 2/2

- Create an `Input` layer, add `LSTM` and `Dense` layers and store in `main_output`.
- Instantiate the model and print its summary.

In [None]:
# Define the input layer
main_input = Input(shape=(None, 10), name="input")

# One LSTM layer (input shape is already defined)
lstm_layer = LSTM(128, name="LSTM")(main_input)

# Add a dense layer with one unit
main_output = Dense(1, activation="sigmoid", name="output")(lstm_layer)

# Instantiate the class at the end
model = Model(inputs=main_input, outputs=main_output, name="modelclass_model")

# Same amount of parameters to train as before (71,297)
model.summary()
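
The figure of 71,297 parameters can be checked by hand. An LSTM layer has four sets of input weights, recurrent weights and biases (one per gate and one for the cell candidate), and the output layer is a regular dense layer. The helper names below are hypothetical.

```python
def lstm_params(input_dim, units):
    """Four blocks, each with input weights, recurrent weights and biases."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    """Weights plus biases of a fully connected layer."""
    return input_dim * units + units


lstm_count = lstm_params(input_dim=10, units=128)    # the layer named "LSTM"
dense_count = dense_params(input_dim=128, units=1)   # the layer named "output"
total_count = lstm_count + dense_count

print(lstm_count, dense_count, total_count)
```

The sequence length does not appear in the formula, which is why the input shape can be declared as `(None, 10)`.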

# Exercise VIII: Keras preprocessing

The second most important module of Keras is `keras.preprocessing`. You will see how to use the most important modules and functions to prepare raw data to the correct input shape. Keras provides functionalities that substitute the dictionary approach you learned before.

You will use the class `Tokenizer` from `keras.preprocessing.text` to create a dictionary of words using the method `.fit_on_texts()` and change the texts into numerical ids representing the index of each word in the dictionary using the method `.texts_to_sequences()`.

Then, use the function `pad_sequences()` from `keras.preprocessing.sequence` to make all the sequences have the same size (necessary for the model) by padding the short texts with zeros and truncating the long ones.

### Instructions

- Import `Tokenizer` and `pad_sequences` from relevant modules.
- Fit the `tokenizer` object on the sample data stored in `texts`.
- Transform the texts into sequences of numerical indexes using the method `.texts_to_sequences()`.
- Fix the size of the texts by padding them.

In [None]:
# Import relevant classes/functions
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

# Build the dictionary of indexes
tokenizer = Tokenizer()
tokenizer.fit_on_texts(texts)

# Change texts into sequence of indexes
texts_numeric = tokenizer.texts_to_sequences(texts)
print("Number of words in the sample texts: ({0}, {1})".format(len(texts_numeric[0]), len(texts_numeric[1])))

# Pad the sequences
texts_pad = pad_sequences(texts_numeric, maxlen=60)
print("Now the texts have fixed length: 60. Let's see the first one: \n{0}".format(texts_pad[0]))
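
To connect this with the dictionary approach from before, here is a rough pure-Python approximation of what the tokenizer does. It is a sketch under simplifying assumptions, not the Keras implementation: it only lowercases and splits on whitespace (the real `Tokenizer` also filters punctuation by default), it assigns indexes starting at `1` with the most frequent word first, and it skips unknown words. The function names and sample texts are hypothetical.

```python
from collections import Counter


def fit_word_index(texts):
    """Map each word to an index starting at 1, most frequent word first."""
    counts = Counter(word for text in texts for word in text.lower().split())
    return {word: index
            for index, (word, _) in enumerate(counts.most_common(), start=1)}


def texts_to_sequences(texts, word_index):
    """Replace every known word with its index; unknown words are skipped."""
    return [[word_index[word] for word in text.lower().split() if word in word_index]
            for text in texts]


texts = ["The movie was good", "The plot was bad the end"]  # hypothetical sample data
word_index = fit_word_index(texts)
sequences = texts_to_sequences(texts, word_index)

print(word_index)
print(sequences)
```

Starting the indexes at `1` leaves `0` free to be used as the padding value.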

# Exercise IX: Your first RNN model

In this exercise you will put into practice the Keras modules to build your first `RNN` model and use it to classify sentiment on movie reviews.

This first model has one recurrent layer with the vanilla `RNN` cell: `SimpleRNN`, and the output layer with two possible values: `0` representing negative sentiment and `1` representing positive sentiment.

You will use the `IMDB` dataset contained in `keras.datasets`. A model was already trained and its weights stored in the file `model_weights.h5`. You will build the model's architecture and use the pre-loaded variables `x_test` and `y_test` to check its performance.

### Instructions

- Add the `SimpleRNN` cell with `128` units.
- Add a `Dense` layer with one unit for sentiment classification.
- Use the proper loss function for binary classification.
- Evaluate the model on the pre-loaded test set: `(x_test, y_test)`.

In [None]:
# Build model
model = Sequential()
model.add(SimpleRNN(units=128, input_shape=(None, 1)))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', 
              optimizer='adam',
              metrics=['accuracy'])

# Load pre-trained weights
model.load_weights('model_weights.h5')

# Method '.evaluate()' shows the loss and accuracy
loss, acc = model.evaluate(x_test, y_test, verbose=0)
print("Loss: {0} \nAccuracy: {1}".format(loss, acc))