# Sentiment Analysis with RNN

Now that we have learned how to build and train recurrent neural networks, we will use them for sentiment analysis, combining everything we have learned thus far.

## Data Preprocessing

This part is the same as in previous exercises, but try to do it without the help of previous solutions. This way you will understand the different steps of the process much better.

### Import Data 

1. Import the following libraries:

* tensorflow 
* pathlib
* pandas 
* os
* io

In [None]:
# Import TensorFlow, pathlib and the other libraries
import tensorflow as tf 
import pathlib 
import pandas as pd 
import os
import io
import warnings
warnings.filterwarnings('ignore')

2. Copy the link below and read the file it contains with `pandas`.

* https://go.aws/314bBDq

In [None]:
# Import dataset with Pandas 
# on_bad_lines="skip" replaces error_bad_lines=False, which was removed in pandas 2.0
dataset = pd.read_csv("https://go.aws/314bBDq", on_bad_lines="skip", encoding="utf-8")
dataset.head()

Unnamed: 0,user_id,review,stars,date_format,time_of_day,hour_of_day,day_of_week,review_format,review_lang,month_year,review_len,review_nb_words
0,efb62a167fee5cf3678b24427de8e31f,"Génial, fabuleux, exceptionnel ! J'aimerais qu...",5,2017-09-29 18:17:00,18:17,18,Ven,génial fabuleux exceptionnel j aimerais qu...,french,2017-09,115,19
1,e3be4f9c9e0b9572bfb2a5f88497bb14,,2,2017-09-29 17:29:00,17:29,17,Ven,,,2017-09,0,0
2,1b8e5760162d867e9b9ca80f645bdc60,"Toujours aussi magic, féerique !",5,2017-09-29 16:46:00,16:46,16,Ven,toujours aussi magic féerique,french,2017-09,32,4
3,fa330e5891a1bb486c3e9bf95c098726,,5,2017-09-29 15:52:00,15:52,15,Ven,,,2017-09,0,0
4,c1a693206aee1a2412d4bd9e45b80ec5,,3,2017-09-29 15:29:00,15:29,15,Ven,,,2017-09,0,0


3. We will need the reviews in French. Filter the reviews so that they are in the right language. For this you need to find a column that gives you that information.

In [None]:
# Keep only French reviews
french_reviews = dataset[dataset.review_lang == "french"]
french_reviews.head()

Unnamed: 0,user_id,review,stars,date_format,time_of_day,hour_of_day,day_of_week,review_format,review_lang,month_year,review_len,review_nb_words
0,efb62a167fee5cf3678b24427de8e31f,"Génial, fabuleux, exceptionnel ! J'aimerais qu...",5,2017-09-29 18:17:00,18:17,18,Ven,génial fabuleux exceptionnel j aimerais qu...,french,2017-09,115,19
2,1b8e5760162d867e9b9ca80f645bdc60,"Toujours aussi magic, féerique !",5,2017-09-29 16:46:00,16:46,16,Ven,toujours aussi magic féerique,french,2017-09,32,4
11,726b1a3e2664e8b075129bcd643dbf56,En vacances en région parisienne nous nous som...,2,2017-09-29 00:37:00,00:37,0,Ven,en vacances en région parisienne nous nous som...,french,2017-09,172,25
12,8a71763fbb3da7436b957681b24cc404,Tropbeaufinalpleinlesyeuxoreil,5,2017-09-29 00:16:00,00:16,0,Ven,tropbeaufinalpleinlesyeuxoreil,french,2017-09,30,1
23,ce7abd7798ee036d667c0ad84b85daa7,L'univers Disney reste merveilleux. Toutefois ...,4,2017-09-28 20:24:00,20:24,20,Jeu,l univers disney reste merveilleux toutefois ...,french,2017-09,148,23


4. Keep only the `review_format` & `stars` columns.

In [None]:
# Let's take the columns we're interested in 
french_reviews = french_reviews[["review_format", "stars"]]
french_reviews.head()

Unnamed: 0,review_format,stars
0,génial fabuleux exceptionnel j aimerais qu...,5
2,toujours aussi magic féerique,5
11,en vacances en région parisienne nous nous som...,2
12,tropbeaufinalpleinlesyeuxoreil,5
23,l univers disney reste merveilleux toutefois ...,4


### Preprocessing

We will now go through a preprocessing phase. The goal is to clean up the character strings and encode the words so they are represented as integers.

1. Use the command `!python -m spacy download fr_core_news_sm` to download the spaCy language model for French.

In [None]:
!python -m spacy download fr_core_news_sm -q

✔ Download and installation successful
You can now load the model via spacy.load('fr_core_news_sm')


2. Now load `fr_core_news_sm`.

In [None]:
# Import the French spaCy model and initialise the pipeline
import fr_core_news_sm
nlp = fr_core_news_sm.load()

3. Import the French `STOP_WORDS`.

In [None]:
# Import Stop words 
from spacy.lang.fr.stop_words import STOP_WORDS

4. You will now have to clean the texts in order to prepare them for training.
Let's do this in three steps:
 * using `str.isalnum`, remove all characters from your strings that are not alphanumeric, except for whitespaces.
 * using `re.sub`, `str.lower` and `str.strip`, replace repeated whitespaces with a single whitespace, convert all characters to lowercase and trim leading and trailing whitespaces (`str.replace` only matches literal text, so it cannot collapse runs of spaces by itself).
 * using spaCy, replace all tokens in your texts with their `lemma_` and remove all the stop words.

In [None]:
### 

# DO NOT RUN THIS COMMAND (TAKES TIME)
# rather explain it and import the cleaned dataset in the next cell

###
import re

# 1. Keep only alphanumeric characters and spaces
french_reviews["review_format_clean"] = french_reviews["review_format"].apply(lambda x: "".join(ch for ch in x if ch.isalnum() or ch == " "))
# 2. Collapse repeated spaces (regex, not a literal match), lowercase, trim
french_reviews["review_format_clean"] = french_reviews["review_format_clean"].apply(lambda x: re.sub(" +", " ", x).lower().strip())
# 3. Lemmatize and drop stop words
french_reviews["review_format_clean"] = french_reviews["review_format_clean"].apply(lambda x: " ".join([token.lemma_ for token in nlp(x) if (token.lemma_ not in STOP_WORDS) and (token.text not in STOP_WORDS)]))

french_reviews

Unnamed: 0,review_format,stars,review_format_clean
0,génial fabuleux exceptionnel j aimerais qu...,5,génial fabuleu exceptionnel j aimerai w...
2,toujours aussi magic féerique,5,magic féerique
11,en vacances en région parisienne nous nous som...,2,vacance région parisien décider visiter parc r...
12,tropbeaufinalpleinlesyeuxoreil,5,tropbeaufinalpleinlesyeuxoreil
23,l univers disney reste merveilleux toutefois ...,4,l univers disney merveilleux regrette qu fal...
...,...,...,...
295057,toujours aussi magique même si à la fin du séj...,5,magique fin séjour rotule lol
295549,séjour au top mes enfants les plus heureux ...,5,séjour top enfant heureux vouloir voir per...
298475,magnifique un monde parfait lt,5,magnifique monde parfaire lt
298832,oui j ai aimé car j adore disney et tout ce qu...,4,oui j aimer j adore disney touche univers ...
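The first two cleaning steps can be tried on a single string without spaCy. The sketch below is only an illustration: the helper name `basic_clean` and the sample sentence are made up for the example. It collapses runs of spaces with `re.sub`, because `str.replace` matches its argument literally and does not treat `" +"` as a pattern.

```python
import re

def basic_clean(text):
    """Keep alphanumeric characters and spaces, collapse spaces, lowercase, trim."""
    # Step 1: drop every character that is neither alphanumeric nor a space
    kept = "".join(ch for ch in text if ch.isalnum() or ch == " ")
    # Step 2: collapse runs of spaces, lowercase, and strip both ends
    return re.sub(r" +", " ", kept).lower().strip()

sample = "  Génial,  fabuleux !! "   # made-up review text
print(basic_clean(sample))           # génial fabuleux
```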


5. Using `tf.keras.preprocessing.text.Tokenizer` ([Tokenizer](https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/text/Tokenizer)), encode all the reviews. Be careful: some reviews might have been entirely erased. Try to understand why, and remove those reviews.

When instantiating the tokenizer, make sure you set it up to keep only the 1000 most common words.

In [None]:
#french_reviews.to_csv("french_review_clean.csv", index=False)

In [None]:
french_reviews = pd.read_csv("https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/M08-DeepLearning/NLP/french_review_clean.csv")

In [None]:
# Drop reviews that are missing (NaN) after cleaning
mask = french_reviews["review_format_clean"].notna()
french_reviews = french_reviews[mask].copy()

In [None]:
import numpy as np
tokenizer = tf.keras.preprocessing.text.Tokenizer(num_words=1000) # instantiate the tokenizer
tokenizer.fit_on_texts(french_reviews["review_format_clean"])
french_reviews["review_encoded"] = tokenizer.texts_to_sequences(french_reviews.review_format_clean)
french_reviews["len_review"] = french_reviews["review_encoded"].apply(lambda x: len(x))
french_reviews = french_reviews[french_reviews["len_review"]!=0]
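To see why some reviews end up empty after encoding, here is a plain-Python sketch of frequency-based word indexing. It is not the Keras implementation: the function names `fit_word_index` and `encode` and the toy corpus are assumptions made for illustration. Words outside the kept vocabulary are simply dropped, so a review made only of rare words encodes to an empty list.

```python
from collections import Counter

def fit_word_index(texts):
    """Map each word to a rank (1 = most frequent); 0 is left free for padding."""
    counts = Counter(word for text in texts for word in text.split())
    return {word: rank for rank, (word, _) in enumerate(counts.most_common(), start=1)}

def encode(text, word_index, max_rank):
    """Keep only words whose rank is at most max_rank; drop all the others."""
    return [word_index[w] for w in text.split()
            if w in word_index and word_index[w] <= max_rank]

# Toy corpus: the last "review" contains a single rare word
corpus = ["magique magique parc", "parc magique", "tropbeaufinalpleinlesyeuxoreil"]
word_index = fit_word_index(corpus)
encoded = [encode(text, word_index, max_rank=2) for text in corpus]
print(encoded)  # [[1, 1, 2], [2, 1], []]
```

The last review becomes an empty list, which is the situation the `len_review != 0` filter removes.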

6. TensorFlow cannot create a tensor dataset directly from lists of different lengths, so we have to store all of our encoded texts in a single NumPy array before creating the TensorFlow dataset.
Not all of our sequences have the same length. This is where `tf.keras.preprocessing.sequence.pad_sequences` comes in handy: it adds zero padding at the beginning (`padding="pre"`) or at the end (`padding="post"`) of your sequences so that they all have equal length.
Pad the sequences.

In [None]:
reviews_pad = tf.keras.preprocessing.sequence.pad_sequences(french_reviews.review_encoded, padding="post")
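What post-padding does can be sketched in a few lines of plain Python. This is an illustrative helper (`pad_post` is a hypothetical name), not the Keras function itself, which also supports truncation and a `maxlen` argument.

```python
def pad_post(sequences, value=0):
    """Right-pad every sequence with `value` up to the length of the longest one."""
    max_len = max(len(seq) for seq in sequences)
    return [list(seq) + [value] * (max_len - len(seq)) for seq in sequences]

toy = [[37, 24, 487], [162], [8, 278]]   # made-up encoded reviews
padded = pad_post(toy)
print(padded)  # [[37, 24, 487], [162, 0, 0], [8, 278, 0]]
```

The Keras tokenizer never assigns index 0 to a word, which is why 0 is safe to use as the padding value.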

7. Now that your sequences are padded, create the tensor dataset. We are going to start by treating the `stars` variable as categorical, so we need to represent it on a scale of 0 to 4 (`SparseCategoricalCrossentropy` expects integer class labels starting at 0).
Form the full tensor dataset with these constraints in mind.

In [None]:
full_ds = tf.data.Dataset.from_tensor_slices((reviews_pad, french_reviews.stars.values-1))

8. Do a train/test split of your data (keep about 70% in the train set). For this you may use the `.take` and `.skip` methods of the TensorFlow dataset.
Once you have done this, you may use `.shuffle` on the train set and `.batch` on both sets to organise them in batches of 64 observations.

* [take documentation](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#take)

* [skip documentation](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#skip)

* [shuffle documentation](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#shuffle)

* [batch documentation](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#batch)



In [None]:
# Train Test Split
TAKE_SIZE = int(0.7*french_reviews.shape[0])

train_data = full_ds.take(TAKE_SIZE).shuffle(TAKE_SIZE)
train_data = train_data.batch(64)

test_data = full_ds.skip(TAKE_SIZE)
test_data = test_data.batch(64)
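`take` and `skip` split a dataset by position, not at random. The sketch below mimics that behaviour on a plain list (the helper names are hypothetical) to make the consequence visible: the test set is whatever comes last in the original order, so if the rows are sorted in some way (by date, for instance) the two sets may not follow the same distribution.

```python
def take_skip_split(items, train_fraction=0.7):
    """Positional split: the first rows go to train, the remaining rows to test."""
    take_size = int(train_fraction * len(items))
    return items[:take_size], items[take_size:]

def batch(items, batch_size):
    """Group items into consecutive batches; the last batch may be smaller."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

rows = list(range(100))                      # stand-in for 100 reviews
train, test = take_skip_split(rows)
print(len(train), len(test))                 # 70 30
print([len(b) for b in batch(train, 32)])    # [32, 32, 6]
print(test[0])                               # 70
```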

9. Look at a batch of data 

In [None]:
# Let's look at a batch
for review, star in train_data.take(1):
  print(review, star)

tf.Tensor(
[[ 37  24 487 ...   0   0   0]
 [162 169   3 ...   0   0   0]
 [  8 278  72 ...   0   0   0]
 ...
 [ 42  37  25 ...   0   0   0]
 [612  16   8 ...   0   0   0]
 [  1  10   1 ...   0   0   0]], shape=(64, 179), dtype=int32) tf.Tensor(
[4 4 3 3 3 4 2 3 4 2 1 4 0 3 4 2 4 3 4 3 4 4 2 3 4 4 4 2 4 0 4 4 3 4 1 4 4
 2 4 3 3 2 1 3 0 4 2 4 4 0 4 0 4 0 4 4 4 3 4 2 4 4 4 1], shape=(64,), dtype=int64)


## Classification Modeling

We'll start by treating the sentiment analysis as a classification problem (this will affect the last layer and the choice of loss function and metric).

### SimpleRNN

1. Follow a similar architecture to the one we used in the code embedding demonstration.

In [None]:
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense, GRU, LSTM

vocab_size = 1000
model = tf.keras.Sequential([
                  # Word Embedding layer           
                  Embedding(vocab_size+1, 64, input_shape=[review.shape[1],],name="embedding"),
                  # Recurrent layers
                  SimpleRNN(units=64, return_sequences=True), # maintains the sequential nature
                  SimpleRNN(units=32, return_sequences=False), # returns the last output
                  # Dense layers once the data is flat
                  Dense(16, activation='relu'),
                  Dense(8, activation='relu'),

                  # output layer with as many neurons as the number of classes
                  # for the target variable and softmax activation
                  Dense(5, activation="softmax")
])

In [None]:
model.summary()

Model: "sequential_7"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (None, 179, 64)           657600    
_________________________________________________________________
simple_rnn_6 (SimpleRNN)     (None, 179, 64)           8256      
_________________________________________________________________
simple_rnn_7 (SimpleRNN)     (None, 32)                3104      
_________________________________________________________________
dense_15 (Dense)             (None, 16)                528       
_________________________________________________________________
dense_16 (Dense)             (None, 8)                 136       
_________________________________________________________________
dense_17 (Dense)             (None, 5)                 45        
Total params: 669,669
Trainable params: 669,669
Non-trainable params: 0
________________________________________________
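The parameter counts in the summary can be checked by hand. The sketch below uses the usual formulas for embedding, simple recurrent and dense layers (the helper names are hypothetical). Note that the embedding count of 657,600 corresponds to 10,275 rows of 64 values, so the summary shown here appears to come from a run with a larger vocabulary than the `vocab_size = 1000` set in the cell, which would give 1,001 × 64 = 64,064 parameters.

```python
def embedding_params(input_dim, output_dim):
    # one vector of size output_dim per vocabulary index
    return input_dim * output_dim

def simple_rnn_params(input_dim, units):
    # input kernel + recurrent kernel + bias
    return units * (input_dim + units + 1)

def dense_params(input_dim, units):
    # weight matrix + bias
    return units * (input_dim + 1)

print(simple_rnn_params(64, 64))   # 8256
print(simple_rnn_params(64, 32))   # 3104
print(dense_params(32, 16), dense_params(16, 8), dense_params(8, 5))  # 528 136 45
print(embedding_params(1001, 64))  # 64064
print(657600 // 64)                # 10275
```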

In [None]:
optimizer= tf.keras.optimizers.Adam()

model.compile(optimizer=optimizer,
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])

3. Using the pandas function `.value_counts`, create a dictionary that assigns to each value of the target variable a weight that is inversely proportional to its frequency in the dataset.

In [None]:
(french_reviews["stars"]-1).value_counts()

4    4845
3    1535
2    1008
0     558
1     486
Name: stars, dtype: int64

In [None]:
weights = 1/(french_reviews["stars"]-1).value_counts()
weights = weights * len(french_reviews)/5
weights = {index : values for index , values in zip(weights.index,weights.values)}
weights

{0: 3.022222222222222,
 1: 3.4699588477366254,
 2: 1.6730158730158728,
 3: 1.0986319218241043,
 4: 0.3480701754385965}
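The values above follow from the formula weight = N / (K × count), where N is the number of reviews and K the number of classes. Here is a plain-Python check using the counts printed by `value_counts` (the variable names are chosen for the example):

```python
counts = {4: 4845, 3: 1535, 2: 1008, 0: 558, 1: 486}   # from value_counts above
n_samples = sum(counts.values())                        # 8432
n_classes = len(counts)                                 # 5

# Weight inversely proportional to the class frequency
weights = {label: n_samples / (n_classes * count) for label, count in counts.items()}

for label in sorted(weights):
    print(label, round(weights[label], 4))
```

With this scaling, every class contributes the same total weight, N / K, to the loss.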

4. Fit your model for 20 epochs, using the class weights to penalize the most frequent ratings.

In [None]:
# Model training 
model.fit(train_data,
          epochs=20, 
          validation_data=test_data,
          class_weight=weights)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.callbacks.History at 0x7f8f6cbab150>

5. Save your model, download it and upload it to your S3 bucket so that you can retrieve it later without having to train it again.

In [None]:
model.save("model_simpleRNN.h5")

6. Import the `json` library and save the model history dictionary (make sure it's a dictionary). Download it and save it to your S3 bucket.

In [None]:
import json
json.dump(model.history.history, open("/content/simpleRNN_history.json", 'w'))
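A quick way to check that the history survives the round trip is to write it and read it back. The sketch below uses a made-up history dictionary and a temporary directory, and opens the file in a `with` block so that it is closed even if serialization fails. If the values in your history are NumPy floats rather than Python floats, converting them with `float()` first may be necessary.

```python
import json
import os
import tempfile

history = {"loss": [1.61, 1.52], "val_loss": [1.60, 1.58]}  # made-up values

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "history.json")
    # Write the dictionary to disk
    with open(path, "w") as f:
        json.dump(history, f)
    # Read it back
    with open(path) as f:
        restored = json.load(f)

print(restored == history)  # True
```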

### GRU

1. Create an object named `model_gru` by replacing the `SimpleRNN` layers with `GRU` layers and replicate the same steps.

In [None]:
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense, GRU, LSTM

vocab_size = 1000
model_gru = tf.keras.Sequential([
                  Embedding(vocab_size+1, 64, input_shape=[review.shape[1],],name="embedding"),
                  GRU(units=64, return_sequences=True), # maintains the sequential nature
                  GRU(units=32, return_sequences=False), # returns the last output
                  Dense(16, activation='relu'),
                  Dense(8, activation='relu'),

                  Dense(5, activation="softmax")
])

In [None]:
model_gru.summary()

Model: "sequential_8"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (None, 179, 64)           657600    
_________________________________________________________________
gru_4 (GRU)                  (None, 179, 64)           24960     
_________________________________________________________________
gru_5 (GRU)                  (None, 32)                9408      
_________________________________________________________________
dense_18 (Dense)             (None, 16)                528       
_________________________________________________________________
dense_19 (Dense)             (None, 8)                 136       
_________________________________________________________________
dense_20 (Dense)             (None, 5)                 45        
Total params: 692,677
Trainable params: 692,677
Non-trainable params: 0
________________________________________________

In [None]:
optimizer= tf.keras.optimizers.Adam()

model_gru.compile(optimizer=optimizer,
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])

In [None]:
model_gru.fit(train_data,
              epochs=20, 
              validation_data=test_data,
              class_weight=weights)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.callbacks.History at 0x7f8fa037cc50>

In [None]:
model_gru.save("model_gru.h5")

In [None]:
import json
json.dump(model_gru.history.history, open("/content/GRU_history.json", 'w'))

### LSTM

1. Create an object named `model_lstm` by replacing the `SimpleRNN` layers with `LSTM` layers and replicate the same steps.

In [None]:
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense, GRU, LSTM

vocab_size = 1000
model_lstm = tf.keras.Sequential([
                  Embedding(vocab_size+1, 64, input_shape=[review.shape[1],],name="embedding"),
                  LSTM(units=64, return_sequences=True), # maintains the sequential nature
                  LSTM(units=32, return_sequences=False), # returns the last output
                  Dense(16, activation='relu'),
                  Dense(8, activation='relu'),

                  Dense(5, activation="softmax", name="last")
])

In [None]:
model_lstm.summary()

Model: "sequential_9"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (None, 179, 64)           657600    
_________________________________________________________________
lstm_4 (LSTM)                (None, 179, 64)           33024     
_________________________________________________________________
lstm_5 (LSTM)                (None, 32)                12416     
_________________________________________________________________
dense_21 (Dense)             (None, 16)                528       
_________________________________________________________________
dense_22 (Dense)             (None, 8)                 136       
_________________________________________________________________
last (Dense)                 (None, 5)                 45        
Total params: 703,749
Trainable params: 703,749
Non-trainable params: 0
________________________________________________
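Comparing the three summaries shows how the gates inflate the parameter count: an LSTM layer holds four sets of weights of the size used by a `SimpleRNN` layer, and a GRU layer three. The sketch below reproduces the counts reported in the summaries. The GRU formula assumes the variant with two bias vectors per weight set (`reset_after=True`), which is consistent with the numbers shown. Function names are hypothetical.

```python
def simple_rnn_params(input_dim, units):
    # input kernel + recurrent kernel + bias
    return units * (input_dim + units + 1)

def lstm_params(input_dim, units):
    # four weight sets: input, forget and output gates plus the cell candidate
    return 4 * simple_rnn_params(input_dim, units)

def gru_params(input_dim, units):
    # three weight sets (update gate, reset gate, candidate), each with two biases
    return 3 * units * (input_dim + units + 2)

for name, fn in [("SimpleRNN", simple_rnn_params), ("GRU", gru_params), ("LSTM", lstm_params)]:
    # first layer: 64 inputs -> 64 units; second layer: 64 inputs -> 32 units
    print(name, fn(64, 64), fn(64, 32))
```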

In [None]:
optimizer= tf.keras.optimizers.Adam()

model_lstm.compile(optimizer=optimizer,
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])

In [None]:
model_lstm.fit(train_data,
              epochs=20, 
              validation_data=test_data,
               class_weight=weights)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.callbacks.History at 0x7f8f9d0ca450>

In [None]:
model_lstm.save("model_lstm.h5")

In [None]:
import json
json.dump(model_lstm.history.history, open("/content/LSTM_history.json", 'w'))

## Classification Evaluation

This part will focus on visualizing the training process and interpreting the results for our predictive models.

### SimpleRNN

1. Create a graph showing your loss and validation loss in relation to the number of epochs for the simpleRNN model.

In [None]:
tf.keras.utils.get_file("/content/model_simpleRNN.h5",
                        origin="https://full-stack-assets.s3.eu-west-3.amazonaws.com/models/M08_Deep_learning/Text_classification/model_simpleRNN.h5")


'/content/model_simpleRNN.h5'

In [None]:
tf.keras.utils.get_file("/content/simpleRNN_history.json",
                        origin="https://full-stack-assets.s3.eu-west-3.amazonaws.com/models/M08_Deep_learning/Text_classification/simpleRNN_history.json")

'/content/simpleRNN_history.json'

In [None]:
simpleRNN_history = json.load(open("/content/simpleRNN_history.json", 'r'))

In [None]:
model_simpleRNN = tf.keras.models.load_model("/content/model_simpleRNN.h5")

In [None]:
import plotly.graph_objects as go
fig = go.Figure()
fig.add_trace(go.Scatter(y=simpleRNN_history["loss"],
                    mode='lines',
                    name='loss'))
fig.add_trace(go.Scatter(y=simpleRNN_history["val_loss"],
                    mode='lines',
                    name='val_loss'))
fig.show()


It seems that the model is learning a little on the training set, but its predictions do not generalize at all to the validation set: the model is failing.

### GRU

In [None]:
tf.keras.utils.get_file("/content/model_gru.h5",
                        origin="https://full-stack-assets.s3.eu-west-3.amazonaws.com/models/M08_Deep_learning/Text_classification/model_gru.h5")
tf.keras.utils.get_file("/content/GRU_history.json",
                        origin="https://full-stack-assets.s3.eu-west-3.amazonaws.com/models/M08_Deep_learning/Text_classification/GRU_history.json")
GRU_history = json.load(open("/content/GRU_history.json", 'r'))
model_gru = tf.keras.models.load_model("/content/model_gru.h5")


In [None]:
import plotly.graph_objects as go
fig = go.Figure()
fig.add_trace(go.Scatter(y=GRU_history["loss"],
                    mode='lines',
                    name='loss'))
fig.add_trace(go.Scatter(y=GRU_history["val_loss"],
                    mode='lines',
                    name='val_loss'))
fig.show()

The exact same conclusion can be drawn for the GRU model, the only difference being that each epoch is 5 times quicker to compute.

### LSTM

In [None]:
tf.keras.utils.get_file("/content/model_lstm.h5",
                        origin="https://full-stack-assets.s3.eu-west-3.amazonaws.com/models/M08_Deep_learning/Text_classification/model_lstm.h5")
tf.keras.utils.get_file("/content/LSTM_history.json",
                        origin="https://full-stack-assets.s3.eu-west-3.amazonaws.com/models/M08_Deep_learning/Text_classification/LSTM_history.json")
LSTM_history = json.load(open("/content/LSTM_history.json", 'r'))
model_lstm = tf.keras.models.load_model("/content/model_lstm.h5")


In [None]:
import plotly.graph_objects as go
fig = go.Figure()
fig.add_trace(go.Scatter(y=LSTM_history["loss"],
                    mode='lines',
                    name='loss'))
fig.add_trace(go.Scatter(y=LSTM_history["val_loss"],
                    mode='lines',
                    name='val_loss'))
fig.show()


We observe the exact same behaviour that we got from the GRU model.

## Regression Modeling

Since the classification modeling was not a great success, due mostly to the data being highly imbalanced, we'll attempt to treat this problem as a regression. This makes sense because the target variable is ordinal (this will affect the last layer and the choice of loss function and metric).
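To see why the ordering matters, compare how the two losses treat a near miss and a far miss on a 5-star review (class 4). The numbers are made up for illustration: with cross-entropy only the probability given to the true class counts, whereas the squared error grows with the distance between prediction and target.

```python
import math

true_class = 4  # a 5-star review on the 0-4 scale

# Two hypothetical classifiers, both giving probability 0.1 to the true class
near_miss = [0.0, 0.0, 0.0, 0.9, 0.1]   # most of the mass on class 3
far_miss = [0.9, 0.0, 0.0, 0.0, 0.1]    # most of the mass on class 0

def cross_entropy(probs, label):
    """Sparse categorical cross-entropy for a single example."""
    return -math.log(probs[label])

# Same loss for both mistakes: the class order is ignored
print(cross_entropy(near_miss, true_class) == cross_entropy(far_miss, true_class))  # True

# Two hypothetical regression outputs: the far miss is penalized much more
print((3.0 - true_class) ** 2)  # 1.0
print((0.0 - true_class) ** 2)  # 16.0
```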

### SimpleRNN

1. Reproduce the same steps that we applied for the classification approach, except this time you will need to change the last layer in order to make predictions that fit a regression approach.

In [None]:
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense, GRU, LSTM

vocab_size = 1000
model_reg = tf.keras.Sequential([
                  Embedding(vocab_size+1, 64, input_shape=[review.shape[1],],name="embedding"),
                  SimpleRNN(units=64, return_sequences=True), # maintains the sequential nature
                  SimpleRNN(units=32, return_sequences=False), # returns the last output
                  Dense(16, activation='relu'),
                  Dense(8, activation='relu'),

                  Dense(1, activation="linear")
])

2. When compiling the model you will have to choose loss and metric functions that are adapted to your regression approach.

In [None]:
optimizer= tf.keras.optimizers.Adam()

model_reg.compile(optimizer=optimizer,
              loss=tf.keras.losses.MeanSquaredError(),
              metrics=[tf.keras.metrics.MeanAbsoluteError()])

3. Fit the model for 20 epochs.

In [None]:
# Model training
model_reg.fit(train_data,
              epochs=20, 
              validation_data=test_data)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.callbacks.History at 0x7f8f63c92ad0>

4. Save the model and the history dictionary and upload the files to your S3.

In [None]:
model_reg.save("model_simpleRNN_reg.h5")

In [None]:
import json
json.dump(model_reg.history.history, open("/content/simpleRNN_history_reg.json", 'w'))

### GRU

1. Apply the same steps, but replace the `SimpleRNN` layers with `GRU` layers.

In [None]:
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense, GRU, LSTM

vocab_size = 1000
model_gru_reg = tf.keras.Sequential([
                  Embedding(vocab_size+1, 64, input_shape=[review.shape[1],],name="embedding"),
                  GRU(units=64, return_sequences=True), # maintains the sequential nature
                  GRU(units=32, return_sequences=False), # returns the last output
                  Dense(16, activation='relu'),
                  Dense(8, activation='relu'),

                  Dense(1, activation="linear")
])

In [None]:
optimizer= tf.keras.optimizers.Adam()

model_gru_reg.compile(optimizer=optimizer,
              loss=tf.keras.losses.MeanSquaredError(),
              metrics=[tf.keras.metrics.MeanAbsoluteError()])

In [None]:
model_gru_reg.fit(train_data,
              epochs=20, 
              validation_data=test_data)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.callbacks.History at 0x7f8f622aca90>

In [None]:
model_gru_reg.save("model_gru_reg.h5")

In [None]:
import json
json.dump(model_gru_reg.history.history, open("/content/GRU_history_reg.json", 'w'))

### LSTM

1. Reproduce the same steps, and use `LSTM` layers instead of `GRU` layers.

In [None]:
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense, GRU, LSTM

vocab_size = 1000
model_lstm_reg = tf.keras.Sequential([
                  Embedding(vocab_size+1, 64, input_shape=[review.shape[1],],name="embedding"),
                  LSTM(units=64, return_sequences=True), # maintains the sequential nature
                  LSTM(units=32, return_sequences=False), # returns the last output
                  Dense(16, activation='relu'),
                  Dense(8, activation='relu'),

                  Dense(1, activation="linear", name="last")
])

In [None]:
optimizer= tf.keras.optimizers.Adam()

model_lstm_reg.compile(optimizer=optimizer,
              loss=tf.keras.losses.MeanSquaredError(),
              metrics=[tf.keras.metrics.MeanAbsoluteError()])

In [None]:
model_lstm_reg.fit(train_data,
              epochs=20, 
              validation_data=test_data)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.callbacks.History at 0x7f8f5f7d66d0>

In [None]:
model_lstm_reg.save("model_lstm_reg.h5")

In [None]:
import json
json.dump(model_lstm_reg.history.history, open("/content/LSTM_history_reg.json", 'w'))

## Regression Evaluation

Now it's time to visualize the results we obtained from the training jobs!


### SimpleRNN

1. Visualize the results from the `SimpleRNN` model, what can you conclude?

In [None]:
import json

tf.keras.utils.get_file("/content/model_simpleRNN_reg.h5",
                        origin="https://full-stack-assets.s3.eu-west-3.amazonaws.com/models/M08_Deep_learning/Text_classification/model_simpleRNN_reg.h5")
tf.keras.utils.get_file("/content/simpleRNN_history_reg.json",
                        origin="https://full-stack-assets.s3.eu-west-3.amazonaws.com/models/M08_Deep_learning/Text_classification/simpleRNN_history_reg.json")
simpleRNN_history_reg = json.load(open("/content/simpleRNN_history_reg.json", 'r'))
model_reg = tf.keras.models.load_model("/content/model_simpleRNN_reg.h5")


In [None]:
import plotly.graph_objects as go
fig = go.Figure()
fig.add_trace(go.Scatter(y=simpleRNN_history_reg["loss"],
                    mode='lines',
                    name='loss'))
fig.add_trace(go.Scatter(y=simpleRNN_history_reg["val_loss"],
                    mode='lines',
                    name='val_loss'))
fig.show()


The SimpleRNN model seems to be learning something on the training data over the course of the first epoch, but then immediately gets stuck. This may be due to a vanishing gradient problem (which is quite common when using `SimpleRNN` layers).

### GRU
1. Visualize the training results for the GRU model, what can you conclude?

In [None]:
tf.keras.utils.get_file("/content/model_gru_reg.h5",
                        origin="https://full-stack-assets.s3.eu-west-3.amazonaws.com/models/M08_Deep_learning/Text_classification/model_gru_reg.h5")
tf.keras.utils.get_file("/content/GRU_history_reg.json",
                        origin="https://full-stack-assets.s3.eu-west-3.amazonaws.com/models/M08_Deep_learning/Text_classification/GRU_history_reg.json")
GRU_history_reg = json.load(open("/content/GRU_history_reg.json", 'r'))
model_gru_reg = tf.keras.models.load_model("/content/model_gru_reg.h5")


Downloading data from https://full-stack-assets.s3.eu-west-3.amazonaws.com/models/M08_Deep_learning/Text_classification/model_gru_reg.h5
Downloading data from https://full-stack-assets.s3.eu-west-3.amazonaws.com/models/M08_Deep_learning/Text_classification/GRU_history_reg.json


In [None]:
import plotly.graph_objects as go
fig = go.Figure()
fig.add_trace(go.Scatter(y=GRU_history_reg["loss"],
                    mode='lines',
                    name='loss'))
fig.add_trace(go.Scatter(y=GRU_history_reg["val_loss"],
                    mode='lines',
                    name='val_loss'))
fig.show()


The results from the GRU model are actually quite good! The model continuously learns from the training examples, and starts overfitting after epoch number 7. This is a textbook example of model training right here!
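The epoch at which overfitting starts can also be read from the history dictionary rather than from the plot. A minimal sketch, with a hypothetical helper `best_epoch` and made-up loss values (not the ones from this training run):

```python
def best_epoch(history):
    """Return the 1-based epoch with the lowest validation loss."""
    val_loss = history["val_loss"]
    return min(range(len(val_loss)), key=val_loss.__getitem__) + 1

# Made-up curves: training loss keeps falling, validation loss turns back up
toy_history = {
    "loss":     [1.90, 1.20, 0.95, 0.80, 0.70, 0.62],
    "val_loss": [1.70, 1.10, 0.90, 0.85, 0.88, 0.93],
}
print(best_epoch(toy_history))  # 4
```

Keras also provides the `tf.keras.callbacks.EarlyStopping` callback, which can stop training once `val_loss` stops improving.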

In addition to this, the MSE on the validation data is around 0.8, which means that the typical prediction error (the square root of the MSE) is below 1 point, which is encouraging!
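Since the MSE is expressed in squared stars, taking its square root gives an error on the same scale as the ratings. The value 0.8 used here is the approximate validation MSE quoted above:

```python
import math

val_mse = 0.8                  # approximate validation MSE quoted in the text
val_rmse = math.sqrt(val_mse)  # back to the scale of the star ratings
print(round(val_rmse, 3))      # 0.894
```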

### LSTM

1. Now it's time to visualize the results for the LSTM model.

In [None]:
tf.keras.utils.get_file("/content/model_lstm_reg.h5",
                        origin="https://full-stack-assets.s3.eu-west-3.amazonaws.com/models/M08_Deep_learning/Text_classification/model_lstm_reg.h5")
tf.keras.utils.get_file("/content/LSTM_history_reg.json",
                        origin="https://full-stack-assets.s3.eu-west-3.amazonaws.com/models/M08_Deep_learning/Text_classification/LSTM_history_reg.json")
LSTM_history_reg = json.load(open("/content/LSTM_history_reg.json", 'r'))
model_lstm_reg = tf.keras.models.load_model("/content/model_lstm_reg.h5")


Downloading data from https://full-stack-assets.s3.eu-west-3.amazonaws.com/models/M08_Deep_learning/Text_classification/model_lstm_reg.h5
Downloading data from https://full-stack-assets.s3.eu-west-3.amazonaws.com/models/M08_Deep_learning/Text_classification/LSTM_history_reg.json


In [None]:
import plotly.graph_objects as go
fig = go.Figure()
fig.add_trace(go.Scatter(y=LSTM_history_reg["loss"],
                    mode='lines',
                    name='loss'))
fig.add_trace(go.Scatter(y=LSTM_history_reg["val_loss"],
                    mode='lines',
                    name='val_loss'))
fig.show()


The results for the LSTM model are disappointing: the model gets stuck immediately. Although the performances of GRU and LSTM layers are comparable most of the time, in this specific case it seems that GRU layers handle the gradients better and have an easier time training.