# Recurrent Neural Networks and Keras

In this chapter, you will learn the foundations of Recurrent Neural Networks (RNNs). You will start with some prerequisites, continue with how information flows through the network, and finally see how to implement such models with Keras for a sentiment classification task.

# (1) Introduction to the course

## Text data is available online

<img src="image/Screenshot 2021-02-03 135329.png">

## Applications of machine learning to text data
Four applications:

- Sentiment analysis
- Multi-class classification
- Text generation
- Neural machine translation

## Sentiment analysis

<img src="image/Screenshot 2021-02-03 135621.png">

## Multi-class classification

<img src="image/Screenshot 2021-02-03 135718.png">

## Text generation

<img src="image/Screenshot 2021-02-03 135759.png">

## Neural machine translation

<img src="image/Screenshot 2021-02-03 135844.png">

## Recurrent Neural Networks

<img src="image/Screenshot 2021-02-03 135932.png">

## Sequence to sequence models
**Many to one: classification**

<img src="image/Screenshot 2021-02-03 140041.png">

**Many to many: text generation**

<img src="image/Screenshot 2021-02-03 140152.png">

**Many to many: neural machine translation**

<img src="image/Screenshot 2021-02-03 140252.png">

**Many to many: language model**

<img src="image/Screenshot 2021-02-03 140416.png">
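The difference between many-to-one and many-to-many can be sketched with a toy recurrent cell. This is only an illustration, not the course's Keras model: the function `run_rnn`, the scalar weights `w_x`, `w_h`, `b`, and the input values are all hypothetical and chosen arbitrarily.

```python
import math


def run_rnn(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Run a scalar RNN cell over a sequence and return every hidden state."""
    h = 0.0  # initial hidden state
    states = []
    for x in inputs:
        # The same weights are reused at every time step.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


sequence = [1.0, -2.0, 0.5, 3.0]  # stand-in for an encoded sentence
states = run_rnn(sequence)

many_to_one = states[-1]  # a single output for the whole sequence (classification)
many_to_many = states     # one output per input step (as in a language model)

print(len(many_to_many), many_to_one)
```

In this sketch the many-to-one case keeps only the last hidden state, while the many-to-many case keeps one state per input step.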

# Exercise I: Comparing the number of parameters of RNN and ANN

In this exercise, you will compare the number of parameters of an artificial neural network (ANN) with that of a recurrent neural network (RNN). Here, the vocabulary size is equal to `10,000` for both models.

The models have been defined for you with similar architectures of only one layer with `256` units (Dense or RNN) plus the output layer. They are stored in the variables `ann_model` and `rnn_model`.

Use the method `.summary()` to print the models' architecture and number of parameters and select the correct statement.

### Possible Answers

- The ANN model has more parameters on the second `Dense` layer than the RNN model.

- The RNN model has fewer parameters than the ANN model. (T)

- The RNN model needs to train approximately the same number of parameters as the ANN model.

- The one-hot encoding allows the RNN model to have fewer parameters.
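The gap in parameter counts can be estimated by hand. The sketch below is not the exercise's `ann_model` or `rnn_model`, whose exact definitions are not shown here. It assumes the ANN receives a 10,000-dimensional input vector, the RNN reads one feature per time step, and both end in a single-unit output layer. The helpers `dense_params` and `simple_rnn_params` are hypothetical names that apply the usual counting formulas for a dense layer and a simple recurrent layer.

```python
def dense_params(input_dim, units):
    # Weight matrix (input_dim x units) plus one bias per unit.
    return input_dim * units + units


def simple_rnn_params(input_dim, units):
    # Input weights + recurrent weights + one bias per unit.
    return units * (input_dim + units + 1)


VOCAB_SIZE = 10_000  # from the exercise text
UNITS = 256          # from the exercise text

# Assumption: the ANN sees a vocabulary-sized vector, the RNN one feature per step.
ann_total = dense_params(VOCAB_SIZE, UNITS) + dense_params(UNITS, 1)
rnn_total = simple_rnn_params(1, UNITS) + dense_params(UNITS, 1)

print(f"ANN parameters: {ann_total:,}")
print(f"RNN parameters: {rnn_total:,}")
```

Under these assumptions the ANN has about 2.56 million parameters and the RNN about 66 thousand, because the recurrent layer reuses the same weights at every time step instead of learning one weight per vocabulary entry.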

# Exercise II: Sentiment analysis

In the video exercise, you were exposed to the various applications of sequence to sequence models. In this exercise you will see how to use a pre-trained model for sentiment analysis.

The model is pre-loaded in the environment in the variable `model`. The tokenized test set variables `X_test` and `y_test` and the pre-processed original text data `sentences` from IMDb are also available. You will learn how to pre-process the text data and how to create and train the model using Keras later in the course.

You will use the pre-trained model to obtain predictions of sentiment. The model returns a number between zero and one representing the probability that the sentence has a positive sentiment. You will therefore create a decision rule that sets the prediction to positive or negative.

### Instructions

- Use the `.predict()` method to make predictions on the test data.
- Make the prediction equal to `"positive"` if its value is greater than 0.5 and `"negative"` otherwise and store the result in the `pred_sentiment` variable.
- Create a `pd.DataFrame` containing the pre-processed text, the prediction obtained in the previous step and their true values contained in the `y_test` variable.
- Print the first rows using the `.head()` method.

```python
import pandas as pd

# Inspect the first sentence in `X_test`
print(X_test[0])

# Get the predictions for all the sentences
pred = model.predict(X_test)

# Transform each prediction into "positive" (> 0.5) or "negative" (<= 0.5);
# ravel() flattens the output in case it has shape (n, 1)
pred_sentiment = ["positive" if x > 0.5 else "negative" for x in pred.ravel()]

# Create a data frame with sentences, predictions and true values
result = pd.DataFrame({'sentence': sentences, 'y_pred': pred_sentiment, 'y_true': y_test})

# Print the first rows of the data frame
print(result.head())
```

# Exercise III: Sequence to sequence models

In the video exercise, you learned about four types of sequence to sequence models: many-to-one (classification) and many-to-many (text generation, neural machine translation and language models). In this exercise, you have to choose the correct type of model given the following problem description:

You are helping your friend who is a specialist in speech recognition. Your friend built a model that can recognize different accents of English, but the model fails to distinguish homophones: words that have the same pronunciation but different meanings, such as "sea" vs "see" or "write" vs "right".

You propose a model that uses the context around each word to identify its meaning. By learning the meaning of the words, the new model would avoid outputs like "Did you sea that car?" because it would identify that the correct word in this case is "see".

What type of sequence-to-sequence model is appropriate?

### Possible Answers

- Many-to-many, because it is a classification model.

- Many-to-one, because it is a classification model.

- Many-to-many, this problem can be solved with a language model. (T)

- Many-to-one, because it is a prediction problem.


# (2) Introduction to language models

## Sentence probability
Many available models

- Probability of "I loved this movie"
- Unigram
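    - $$P(\text{sentence}) = P(\text{I})\,P(\text{loved})\,P(\text{this})\,P(\text{movie})$$
- N-gram
    - N = 2 (bigram): $$P(\text{sentence}) = P(\text{I})\,P(\text{loved} \mid \text{I})\,P(\text{this} \mid \text{loved})\,P(\text{movie} \mid \text{this})$$
    - N = 3 (trigram): $$P(\text{sentence}) = P(\text{I})\,P(\text{loved} \mid \text{I})\,P(\text{this} \mid \text{I}, \text{loved})\,P(\text{movie} \mid \text{loved}, \text{this})$$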
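
The unigram and bigram factorizations can be tried out on a tiny corpus. This is a minimal sketch under stated assumptions: the three-sentence `corpus` is made up for illustration, text is lowercased and split on spaces, probabilities are plain count ratios with no smoothing, and `unigram_prob` and `bigram_prob` are hypothetical helper names.

```python
from collections import Counter

# Made-up toy corpus, lowercased and whitespace-tokenized.
corpus = [
    "i loved this movie",
    "i loved this book",
    "i hated this movie",
]
tokenized = [sentence.split() for sentence in corpus]

unigrams = Counter(word for sent in tokenized for word in sent)
bigrams = Counter(pair for sent in tokenized for pair in zip(sent, sent[1:]))
total_tokens = sum(unigrams.values())


def unigram_prob(sentence):
    """P(sentence) as a product of independent word probabilities."""
    p = 1.0
    for word in sentence.split():
        p *= unigrams[word] / total_tokens
    return p


def bigram_prob(sentence):
    """P(sentence) where each word is conditioned on the previous word."""
    words = sentence.split()
    p = unigrams[words[0]] / total_tokens  # P(first word)
    for prev, cur in zip(words, words[1:]):
        if unigrams[prev] == 0:
            return 0.0  # unseen context: no estimate without smoothing
        p *= bigrams[(prev, cur)] / unigrams[prev]  # P(cur | prev)
    return p


print(unigram_prob("i loved this movie"))
print(bigram_prob("i loved this movie"))
```

On this toy corpus the bigram model assigns the sentence a much higher probability than the unigram model, because it uses the preceding word as context.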