In [26]:
# run this to shorten the data import from the files
import os
cwd = os.path.dirname(os.getcwd())+'/'
path_data = os.path.join(os.path.dirname(os.getcwd()), 'datasets/')


In [28]:
# load data
import numpy as np

X_train = np.genfromtxt(path_data+'X_train.csv', delimiter=',', skip_header=1)
y_train = np.genfromtxt(path_data+'y_train.csv', delimiter=',', skip_header=1)
X_test = np.genfromtxt(path_data+'X_test.csv', delimiter=',', skip_header=1)
y_test = np.genfromtxt(path_data+'y_test.csv', delimiter=',', skip_header=1)

In [29]:
# exercise 01

"""
Exploding gradient problem

In the video exercise, you learned about two problems that may arise when working with RNN models: the vanishing and exploding gradient problems.

This exercise explores the exploding gradient problem, showing that the derivative of a function can increase exponentially, and how to solve it with a simple technique.

The data is already loaded in the environment as X_train, X_test, y_train and y_test.

You will use a Stochastic Gradient Descent (SGD) optimizer and Mean Squared Error (MSE) as the loss function.

In the first step, you will observe the gradient exploding by computing the MSE on the train and test sets. In step 2, you will change the optimizer using the clipvalue parameter to solve the problem.

The Stochastic Gradient Descent in Keras is loaded as SGD.
"""

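# Sketch: what clipvalue does

"""
A minimal sketch of the idea behind clipvalue, assuming a plain SGD update without momentum. The helpers clip_by_value and sgd_step and the gradient values are hypothetical and only illustrate the technique; they are not the Keras implementation. Each gradient component is limited to the range [-clipvalue, clipvalue] before the weights are updated, so a single huge gradient can no longer push the weights arbitrarily far.
"""

```python
import numpy as np


def clip_by_value(gradient, clipvalue):
    """Clip every gradient component to the range [-clipvalue, clipvalue]."""
    return np.clip(gradient, -clipvalue, clipvalue)


def sgd_step(weights, gradient, learning_rate=0.01, clipvalue=None):
    """One plain SGD update (no momentum), optionally with value clipping."""
    if clipvalue is not None:
        gradient = clip_by_value(gradient, clipvalue)
    return weights - learning_rate * gradient


weights = np.array([0.5, -0.2, 0.1])
huge_gradient = np.array([1e6, -2.5, 40.0])  # made-up values for illustration

unclipped = sgd_step(weights, huge_gradient)
clipped = sgd_step(weights, huge_gradient, clipvalue=3.0)

print("Without clipping:", unclipped)
print("With clipvalue=3.0:", clipped)
```

"""
Without clipping, the first weight moves by 10000 in a single step, while with clipping no weight moves by more than learning_rate * clipvalue.
"""
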
# Instructions

"""

    Use SGD() as optimizer and (X_test, y_test) as validation data.
    Evaluate train performance and print all the MSE values.
---

    Set the SGD() parameter clipvalue equal to 3.0.
    Compute the MSE values and store them on train_mse and test_mse variables.

"""

# solution

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.initializers import he_uniform
from tensorflow.keras.optimizers import SGD

# Create a Keras model with one hidden Dense layer
model = Sequential()
model.add(Dense(25, input_dim=20, activation='relu', kernel_initializer=he_uniform(seed=42)))
model.add(Dense(1, activation='linear'))

# Compile and fit the model
model.compile(loss='mean_squared_error', optimizer=SGD(learning_rate=0.01, momentum=0.9))
history = model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=100, verbose=0)

# Compute the Mean Squared Error for train and test data
train_mse = model.evaluate(X_train, y_train, verbose=0)
test_mse = model.evaluate(X_test, y_test, verbose=0)

# Print the values of MSE
print('Train: %.3f, Test: %.3f' % (train_mse, test_mse))

#----------------------------------#

# Create a Keras model with one hidden Dense layer
model = Sequential()
model.add(Dense(25, input_dim=20, activation='relu', kernel_initializer=he_uniform(seed=42)))
model.add(Dense(1, activation='linear'))

# Compile and fit the model
model.compile(loss='mean_squared_error', optimizer=SGD(learning_rate=0.01, momentum=0.9, clipvalue=3.0))
history = model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=100, verbose=0)

# Compute the Mean Squared Error for train and test data
train_mse = model.evaluate(X_train, y_train, verbose=0)
test_mse = model.evaluate(X_test, y_test, verbose=0)

# Print the values of MSE
print('Train: %.3f, Test: %.3f' % (train_mse, test_mse))

#----------------------------------#

# Conclusion

"""
The Exploding gradient problem can happen when using RNN models. Luckily, this can be addressed with simple techniques such as gradient clipping. Notice how after applying this technique, the outputs are no longer NaN, meaning that the gradients didn't 'explode' during Step 2.
"""

Train: nan, Test: nan
Train: 142.074, Test: 137.123


In [None]:
# exercise 02

"""
Vanishing gradient problem

The other possible gradient problem is when the gradients vanish, or go to zero. This is a much harder problem to solve because it is not as easy to detect. If the loss function does not improve on every step, is it because the gradients went to zero and thus didn't update the weights? Or is it because the model is not able to learn?

This problem occurs more often in RNN models when long-term memory is required, for example when the sentences are long.

In this exercise you will observe the problem on the IMDB data, with longer sentences selected. The data is loaded in the variables X and y, and the classes Sequential, SimpleRNN and Dense are available, as well as matplotlib.pyplot as plt. The model was pre-trained with 100 epochs and its weights are stored in the file model_weights.h5.
"""

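# Sketch: why gradients vanish or explode over many time steps

"""
A minimal sketch, assuming the simplest possible recurrence: a linear scalar RNN h_t = w * h_(t-1) + x_t. The function gradient_factor is hypothetical and not part of the exercise. In this toy model the derivative of the last state with respect to the first one is w multiplied by itself once per time step, so it shrinks toward zero when |w| < 1 and grows exponentially when |w| > 1.
"""

```python
def gradient_factor(recurrent_weight, timesteps):
    """Derivative of h_T with respect to h_0 for h_t = w * h_(t-1) + x_t."""
    return recurrent_weight ** timesteps


for w in (0.5, 1.0, 1.5):
    factor = gradient_factor(w, timesteps=100)
    print("w = %.1f: gradient factor after 100 steps = %.3e" % (w, factor))
```

"""
Real SimpleRNN cells also multiply by the derivative of the activation function at each step, but the same repeated-product structure is what makes long sentences difficult.
"""
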
# Instructions

"""

    Add a SimpleRNN layer to the model.
    Load the pre-trained weights on the model using the method .load_weights().
    Add the accuracy of the training data available on the attribute 'acc' to the plot.
    Display the plot using the method .show().

"""

# solution

# Create the model
model = Sequential()
model.add(SimpleRNN(units=600, input_shape=(None, 1)))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='sgd', metrics=['accuracy'])

# Load pre-trained weights
model.load_weights('model_weights.h5')

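# Plot the accuracy per epoch (history is assumed to be pre-loaded with the training history;
# newer Keras versions store these values under 'accuracy' and 'val_accuracy')
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])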
plt.legend(['train', 'val'], loc='upper left')
plt.show()

#----------------------------------#

# Conclusion

"""
You can observe that at some point the accuracy stopped improving, which can happen because of the vanishing gradient problem. This kind of problem is harder to detect than the exploding gradient problem and demands deeper analysis by the data scientist. Researchers found a way to address this problem through the model architecture, which you will study later in this course: instead of using SimpleRNN cells, you can use more complex ones such as Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM) cells.
"""


In [None]:
# exercise 03

"""
GRU cells are better than simpleRNN

In this exercise you will re-run the same model as in the first chapter of the course to compare the accuracy of the model by simply changing the SimpleRNN cell to a GRU cell.

The model was already trained with 10 epochs, as in the previous model with a SimpleRNN cell. In order to compare the models, a test set (X_test, y_test) is already loaded in the environment, as well as the old model SimpleRNN_model.
"""

# Instructions

"""

    Import the GRU cell.
    Print the models' summaries.
    Print the accuracy of each model.

"""

# solution

# Import the modules
from tensorflow.keras.layers import GRU, Dense

# Print the old and new model summaries
SimpleRNN_model.summary()
gru_model.summary()

# Evaluate the models' performance (ignore the loss value)
_, acc_simpleRNN = SimpleRNN_model.evaluate(X_test, y_test, verbose=0)
_, acc_GRU = gru_model.evaluate(X_test, y_test, verbose=0)

# Print the results
print("SimpleRNN model's accuracy:\t{0}".format(acc_simpleRNN))
print("GRU model's accuracy:\t{0}".format(acc_GRU))

#----------------------------------#

# Conclusion

"""
Cool! Just by changing the layer, you already improved the model!
"""


In [None]:
# exercise 04

"""
Stacking RNN layers

Deep RNN models can have tens to hundreds of layers in order to achieve state-of-the-art results.

In this exercise, you will get a glimpse of how to create deep RNN models by stacking layers of LSTM cells one after the other.

To do this, you will set the return_sequences argument to True on the first two LSTM layers and to False on the last LSTM layer.

To create models with even more layers, you can keep adding them one after the other or create a function that uses the .add() method inside a loop to add many layers with few lines of code.
"""

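# Sketch: adding many recurrent layers inside a loop

"""
A minimal sketch of the loop idea described above. The helpers return_sequences_flags and add_stacked_layers are hypothetical names, not part of Keras. The sketch assumes model exposes an .add() method (as Sequential does) and layer_cls accepts a number of units and a return_sequences argument (as LSTM does). Every layer except the last returns the full sequence, so the next recurrent layer receives one vector per time step.
"""

```python
def return_sequences_flags(n_layers):
    """True for every recurrent layer except the last one."""
    return [i < n_layers - 1 for i in range(n_layers)]


def add_stacked_layers(model, layer_cls, n_layers, units):
    """Add n_layers recurrent layers to model using .add() inside a loop.

    model is expected to expose .add(), like a Keras Sequential model, and
    layer_cls to accept (units, return_sequences=...), like the LSTM layer.
    """
    for flag in return_sequences_flags(n_layers):
        model.add(layer_cls(units, return_sequences=flag))
    return model


print(return_sequences_flags(3))  # [True, True, False]
```

"""
With Keras this could be called as add_stacked_layers(Sequential(), LSTM, n_layers=3, units=128). The input shape of the first layer would still have to be supplied separately, which this sketch leaves out.
"""
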
# Instructions

"""

    Import the LSTM layer.
    Return the sequences in the first two layers and don't return the sequences in the last LSTM layer.
    Load the pre-trained weights.
    Print the loss and accuracy obtained.

"""

# solution

# Import the LSTM layer
from tensorflow.keras.layers import LSTM

# Build model
model = Sequential()
model.add(LSTM(units=128, input_shape=(None, 1), return_sequences=True))
model.add(LSTM(units=128, return_sequences=True))
model.add(LSTM(units=128, return_sequences=False))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Load pre-trained weights
model.load_weights('lstm_stack_model_weights.h5')

print("Loss: %0.04f\nAccuracy: %0.04f" % tuple(model.evaluate(X_test, y_test, verbose=0)))

#----------------------------------#

# Conclusion

"""
Awesome! Stacking more layers also improves the accuracy of the model compared to the baseline 'simple_RNN' model! In the next lesson you will learn what else you can do to improve the model.
"""


In [None]:
# exercise 05

"""
Number of parameters comparison

You saw that the one-hot representation is not a good representation of words because it is very sparse. Using the Embedding layer creates a dense representation of the vectors, but also demands a lot of parameters to be learned.

In this exercise you will compare the number of parameters of two models using embeddings and one-hot encoding to see the difference.

The model model_onehot is already loaded in the environment, as well as Sequential, Dense and GRU from Keras. The parameters vocabulary_size=80000 and sentence_len=200 are also loaded.
"""

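# Sketch: counting the parameters by hand

"""
A minimal sketch of where the parameter counts in the summary come from. The function names are hypothetical, and wordvec_dim=300 is an assumed value chosen only for illustration (the exercise does not state it). The GRU formula assumes the TensorFlow 2 Keras default reset_after=True, which uses two bias vectors per gate.
"""

```python
def embedding_params(vocabulary_size, wordvec_dim):
    """One vector of size wordvec_dim per token index 0..vocabulary_size."""
    return (vocabulary_size + 1) * wordvec_dim


def gru_params(input_dim, units):
    """Three gates, each with input weights, recurrent weights and two biases."""
    return 3 * (units * (input_dim + units) + 2 * units)


def dense_params(input_dim, units):
    """Weight matrix plus one bias per unit."""
    return input_dim * units + units


vocabulary_size = 80000
wordvec_dim = 300  # assumed value, not given in the exercise

counts = {
    "embedding": embedding_params(vocabulary_size, wordvec_dim),
    "gru": gru_params(wordvec_dim, 128),
    "dense": dense_params(128, 1),
}
for name, count in counts.items():
    print("%-10s %12d" % (name, count))
print("%-10s %12d" % ("total", sum(counts.values())))
```

"""
Under these assumptions the embedding layer alone accounts for about 24 million of the parameters, far more than the recurrent and output layers combined.
"""
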
# Instructions

"""

    Import the Embedding layer from keras.layers.
    On the embedding layer, use vocabulary size plus one as input dimension and sentence size as input length.
    Compile the model.
    Print the summary of the model with embedding.

"""

# solution

# Import the embedding layer
from tensorflow.keras.layers import Embedding

# Create a model with embeddings
model = Sequential(name="emb_model")
model.add(Embedding(input_dim=vocabulary_size + 1, output_dim=wordvec_dim, input_length=sentence_len, trainable=True))
model.add(GRU(128))
# Sigmoid output so that binary_crossentropy receives probabilities
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Print the summary of the one-hot model
model_onehot.summary()

# Print the summary of the model with embeddings
model.summary()

#----------------------------------#

# Conclusion

"""
You can see the immense difference in the number of parameters when using the embedding layer! Don't worry, in the next exercise you will learn how to use transfer learning to avoid having to train this layer.
"""


In [None]:
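from zipfile import ZipFile
import pickle


def load_glove(filename):
    # Read the pickled GloVe matrix stored inside the zip archive
    with ZipFile(filename) as myzip:
        with myzip.open('tl_glove_200k.pickle') as myfile:
            glove = pickle.load(myfile)

    return glove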

In [None]:
# exercise 06

"""
Transfer learning

You saw that when training an embedding layer, you need to learn a lot of parameters.

In this exercise, you will see that with transfer learning it is possible to use the pre-trained weights without updating them, meaning that all the parameters of the embedding layer will be fixed, and the model will only need to learn the parameters of the other layers.

The function load_glove is already loaded in the environment and retrieves the glove matrix as a numpy.ndarray. It uses the function covered in the lesson's slides to retrieve the glove vectors with 200 embedding dimensions for the vocabulary present in this exercise.
"""

# Instructions

"""

    Use the pre-defined function to load the glove vectors.
    Use the initializer Constant on the pre-trained vectors.
    Add the output layer as a Dense with one unit.
    Print the summary and check the trainable parameters.

"""

# solution

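# Import the Constant initializer used for the pre-trained vectors
from tensorflow.keras.initializers import Constant

# Load the glove pre-trained vectors
glove_matrix = load_glove('glove_200d.zip')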

# Create a model with embeddings
model = Sequential(name="emb_model")
model.add(Embedding(input_dim=vocabulary_size + 1, output_dim=wordvec_dim, 
                    embeddings_initializer=Constant(glove_matrix), 
                    input_length=sentence_len, trainable=False))
model.add(GRU(128))
# Sigmoid output so that binary_crossentropy receives probabilities
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Print the summaries of the model with embeddings
model.summary()

#----------------------------------#

# Conclusion

"""
As you can see, the total number of parameters is very large, but the number of parameters that will be trained is much smaller. The pre-trained vectors already have values for known words, but words not present in the pre-trained vectors are represented by a vector of zeros. This can lead to problems if the task at hand is very specific.
"""

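# Sketch: how missing words end up as zero vectors

"""
A minimal sketch of how an embedding matrix can be assembled from pre-trained vectors. The function build_embedding_matrix and the toy vocabulary are hypothetical; this is not the implementation behind load_glove. The sketch assumes a word_index that maps words to indices starting at 1, with index 0 reserved for padding. Words without a pre-trained vector keep their initial row of zeros.
"""

```python
import numpy as np


def build_embedding_matrix(word_index, pretrained_vectors, wordvec_dim):
    """Return a (len(word_index) + 1, wordvec_dim) matrix of word vectors."""
    # Start from zeros so that unknown words (and padding) stay all-zero
    matrix = np.zeros((len(word_index) + 1, wordvec_dim))
    for word, index in word_index.items():
        vector = pretrained_vectors.get(word)
        if vector is not None:
            matrix[index] = vector
    return matrix


# Toy data: "zorblax" has no pre-trained vector
word_index = {"good": 1, "bad": 2, "zorblax": 3}
pretrained_vectors = {
    "good": np.array([0.1, 0.2]),
    "bad": np.array([-0.3, 0.4]),
}

embedding_matrix = build_embedding_matrix(word_index, pretrained_vectors, wordvec_dim=2)
print(embedding_matrix)
```

"""
A matrix built this way could then be passed to Constant() as in the solution above. Because the layer is not trainable, the zero rows never change, which is why domain-specific words can hurt performance.
"""
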

In [None]:
# exercise 07

"""
Embeddings improve performance

Does the embedding layer improve the accuracy of the model? Let's check it out on the same IMDB data.

The model was already trained with 10 epochs, as in the previous model with a SimpleRNN cell. In order to compare the models, a test set (X_test, y_test) is available in the environment, as well as the old model simpleRNN_model. The old model's accuracy is loaded in the variable acc_simpleRNN.

All required modules and functions are loaded in the environment: Sequential() from keras.models, and Embedding, Dense and SimpleRNN from keras.layers.
"""

# Instructions

"""

    Add the embedding layer to the model.
    Compute the model's accuracy and store on the variable acc_embeddings.
    Print the accuracy of the old and new models.

"""

# solution

# Create the model with embedding
model = Sequential(name="emb_model")
model.add(Embedding(input_dim=max_vocabulary, output_dim=wordvec_dim, input_length=max_len))
model.add(SimpleRNN(units=128))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Load pre-trained weights
model.load_weights('embedding_model_weights.h5')

# Evaluate the models' performance (ignore the loss value)
_, acc_embeddings = model.evaluate(X_test, y_test, verbose=0)

# Print the results
print("SimpleRNN model's accuracy:\t{0}\nEmbeddings model's accuracy:\t{1}".format(acc_simpleRNN, acc_embeddings))

#----------------------------------#

# Conclusion

"""
You can see that the embedding layer greatly improves the accuracy of the model!
"""


In [None]:
# exercise 08

"""
Better sentiment classification

In this exercise, you go back to the sentiment classification problem seen in Chapter 1.

You are going to add more complexity to the model and improve its accuracy. You will use an Embedding layer to train word vectors on the training set and two LSTM layers to keep track of longer texts. Also, you will add an extra Dense layer before the output.

This is no longer a simple model, and the training can take some time. For this reason, a pre-trained model is available by loading its weights with the method .load_weights() from the keras.models.Sequential class. The model was trained with 10 epochs and its weights are available in the file model_weights.h5.

The following modules are loaded in the environment: Sequential, Embedding, LSTM, Dropout, Dense.
"""

# Instructions

"""

    Add an Embedding layer as the first layer of the model.
    Add a second LSTM layer with 64 units returning the sequences.
    Add an extra Dense layer with 16 units.
    Evaluate the model to print the accuracy on the test set.

"""

# solution

# Build and compile the model
model = Sequential()
model.add(Embedding(vocabulary_size, wordvec_dim, trainable=True, input_length=max_text_len))
model.add(LSTM(64, return_sequences=True, dropout=0.2, recurrent_dropout=0.15))
model.add(LSTM(64, return_sequences=False, dropout=0.2, recurrent_dropout=0.15))
model.add(Dense(16))
model.add(Dropout(rate=0.25))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Load pre-trained weights
model.load_weights('model_weights.h5')

# Print the obtained loss and accuracy
print("Loss: {0}\nAccuracy: {1}".format(*model.evaluate(X_test, y_test, verbose=0)))

#----------------------------------#

# Conclusion

"""
Superb! You just increased the accuracy of your sentiment classification task from a poor 50% to more than 80%. Fantastic!
"""


In [None]:
# exercise 09

"""
Using the CNN layer

In this exercise, you will use a pre-trained model that makes use of the Conv1D and MaxPooling1D layers from the keras.layers.convolutional module, and achieves even better accuracy on the classification task.

This architecture has achieved good results in language tasks such as text classification, and is included here as an extra exercise so you can see it in action and build some intuition.

Because this layer is not in the scope of the course, you will focus on how to use the layers together with the RNN layers you already learned.

Please follow the instructions to see the results.
"""

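# Sketch: how Conv1D and MaxPooling1D shorten the sequence seen by the RNN

"""
A minimal sketch of how the convolutional layers interact with the recurrent layers that follow them. The architecture of model_cnn is not shown in this exercise, so the kernel size, pool size and sentence length below are assumed values, and the function names are hypothetical. The sketch assumes 'valid' padding, a convolution stride of 1 and a pooling stride equal to the pool size.
"""

```python
def conv1d_output_length(input_length, kernel_size):
    """Sequence length after a Conv1D with 'valid' padding and stride 1."""
    return input_length - kernel_size + 1


def maxpool1d_output_length(input_length, pool_size):
    """Sequence length after a MaxPooling1D with stride equal to pool_size."""
    return input_length // pool_size


sentence_len = 200  # assumed value for illustration
after_conv = conv1d_output_length(sentence_len, kernel_size=3)
after_pool = maxpool1d_output_length(after_conv, pool_size=2)

print("Input length: %d" % sentence_len)
print("After Conv1D: %d" % after_conv)
print("After MaxPooling1D: %d" % after_pool)
```

"""
Under these assumptions the recurrent layer processes about half as many time steps as the original sentence, which shortens the path that gradients have to travel through time.
"""
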
# Instructions

"""

    Print the model's architecture.
    Load the pre-trained weights.
    Evaluate the model on the test data.

"""

# solution

# Print the model summary
model_cnn.summary()

# Load pre-trained weights
model_cnn.load_weights('model_weights.h5')

# Evaluate the model to get the loss and accuracy values
loss, acc = model_cnn.evaluate(x_test, y_test, verbose=0)

# Print the loss and accuracy obtained
print("Loss: {0}\nAccuracy: {1}".format(loss, acc))

#----------------------------------#

# Conclusion

"""
Congratulations, you achieved very high accuracy on the sentiment classification task! Note that on the training data the model achieved more than 98% accuracy, and because the accuracy was not at the same level on the test data, you can guess that there was some overfitting. This may be because the dataset was not big enough to train the model and some patterns present in the test data weren't present in the training set. Finally, the model can be further extended with additional layers to achieve even better results, but this will also demand more data and computing power.
"""
