## PART A: Recurrent Neural Network & Classification: 

1. Data Processing: This data set is a bit messy, so the preprocessing portion is largely a tutorial to make sure students have data ready for keras. 

a) Import the following libraries: 

In [None]:
import sys
import os
import json
import pandas
import numpy
import optparse

from keras.callbacks import TensorBoard
from keras.models import Sequential, load_model
from keras.layers import LSTM, Dense, Dropout
from keras.layers import Embedding
from keras.preprocessing import sequence

#%pip install keras

Note: you may need to restart the kernel to use updated packages.


In [7]:
from tensorflow.keras.preprocessing.text import Tokenizer
from collections import OrderedDict

b) We will read the code in slightly differently than before: 

In [8]:
dataframe = pandas.read_csv("dev-access.csv", engine='python', quotechar='|', header=None)

c) We then need to convert to a numpy.ndarray type: 

In [9]:
dataset = dataframe.values

d) Check the shape of the data set - it should be (26773, 2). Spend some time looking at the data. 

In [12]:
print("Shape of the dataset: ", dataset.shape)
print("Number of rows: ", dataset.shape[0])
print("Number of columns: ", dataset.shape[1])

Shape of the dataset:  (26773, 2)
Number of rows:  26773
Number of columns:  2


e) Store all rows and the 0th index as the feature data: 

In [13]:
X = dataset[:,0]

f) Store all rows and index 1 as the target variable: 

In [14]:
Y = dataset[:,1]

g) In the next step, we will clean up the predictors. This includes removing features that are not valuable, such as timestamp and source. 

In [15]:
for index, item in enumerate(X):
    # Quick hack to space out json elements
    reqJson = json.loads(item, object_pairs_hook=OrderedDict)
    del reqJson['timestamp']
    del reqJson['headers']
    del reqJson['source']
    del reqJson['route']
    del reqJson['responsePayload']
    X[index] = json.dumps(reqJson, separators=(',', ':'))

h) We next will tokenize our data, which just means vectorizing our text. Given the data we will tokenize every character (thus char_level = True).

In [16]:
tokenizer = Tokenizer(filters='\t\n', char_level=True)
tokenizer.fit_on_texts(X)

# we will need this later
num_words = len(tokenizer.word_index)+1
X = tokenizer.texts_to_sequences(X)

i) Need to pad our data as each observation has a different length.

In [17]:
max_log_length = 1024
X_processed = sequence.pad_sequences(X, maxlen=max_log_length)

j) Create your train set to be 75% of the data and your test set to be 25%.

In [21]:
train_size = int(len(X_processed) * 0.75)
X_train = X_processed[:train_size]
X_test = X_processed[train_size:]
Y_train = Y[:train_size]
Y_test = Y[train_size:]

print("Shape of the training set: ", X_train.shape)
print("Shape of the test set: ", X_test.shape)

Shape of the training set:  (20079, 1024)
Shape of the test set:  (6694, 1024)


## 2. Model 1 - RNN: The first model will be a pretty minimal RNN with only an embedding layer, simple RNN and Dense layer. The next model we will add a few more layers.

a) Start by creating an instance of a Sequential model: https://keras.io/getting-started/sequential-model-guide/Links to an external site.

In [22]:
model1 = Sequential()

b) From there, add an Embedding layer: https://keras.io/layers/embeddings/Links to an external site.

Params:
- input_dim = num_words (the variable we created above)
- output_dim = 32
- input_length = max_log_length (we also created this above)
- Keep all other variables as the defaults (shown below)

In [23]:
model1.add(Embedding(input_dim=num_words, output_dim=32, input_length=max_log_length))



c) Add a SimpleRNN layer: https://keras.io/layers/recurrent/Links to an external site.

Params:
- units = 32
- activation = 'relu'

In [24]:
model1.add(LSTM(units=32, activation='relu'))

d) Finally, we will add a Dense layer: https://keras.io/layers/core/#denseLinks to an external site.

Params:
- units = 1 (this will be our output)
- activation --> you can choose to use either relu or sigmoid. 

In [25]:
model1.add(Dense(units=1, activation='sigmoid'))

e) Compile model using the .compile() method: https://keras.io/models/model/Links to an external site.

Params:
- loss = binary_crossentropy
- optimizer = adam
- metrics = accuracy

In [26]:
model1.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

f) Print the model summary.

In [27]:
model1.summary()

g) Use the .fit() method to fit the model on the train data. Use a validation split of 0.25, epochs=3 and batch size = 128.

In [29]:
Y_train = Y_train.astype('int')
Y_test = Y_test.astype('int')

history1 = model1.fit(
    X_train, Y_train,
    validation_split=0.25,
    epochs=3,
    batch_size=128
)

Epoch 1/3
[1m118/118[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m23s[0m 194ms/step - accuracy: 0.6016 - loss: nan - val_accuracy: 0.3400 - val_loss: nan
Epoch 2/3
[1m118/118[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m24s[0m 199ms/step - accuracy: 0.5876 - loss: nan - val_accuracy: 0.3400 - val_loss: nan
Epoch 3/3
[1m118/118[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m23s[0m 198ms/step - accuracy: 0.5855 - loss: nan - val_accuracy: 0.3400 - val_loss: nan


h) Use the .evaluate() method to get the loss value & the accuracy value on the test data. Use a batch size of 128 again.

In [30]:
loss1, accuracy1 = model1.evaluate(X_test, Y_test, batch_size=128)
print(f'Test Loss: {loss1:.4f}')
print(f'Test Accuracy: {accuracy1:.4f}')

[1m53/53[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 51ms/step - accuracy: 0.6029 - loss: nan
Test Loss: nan
Test Accuracy: 0.4390


## 3) Model 2 - LSTM + Dropout Layers:

Now we will add a few new layers to our RNN and incorporate the more powerful LSTM. You will be creating a new model here, so make sure to call it something different than the model from Part 2.

a) This RNN needs to have the following layers (add in this order):

- Embedding Layer (use same params as before)
- LSTM Layer (units = 64, recurrent_dropout = 0.5)
- Dropout Layer - use a value of 0.5
- Dense Layer - (use same params as before)

In [31]:
model2 = Sequential()

In [32]:
model2.add(Embedding(input_dim=num_words, output_dim=32, input_length=max_log_length))



In [33]:
model2.add(LSTM(units=64, recurrent_dropout=0.5))

In [34]:
model2.add(Dropout(0.5))

In [35]:
model2.add(Dense(units=1, activation='sigmoid'))

b) Compile model using the .compile() method:

Params:
- loss = binary_crossentropy
- optimizer = adam
- metrics = accuracy

In [36]:
model2.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

c) Print the model summary.

In [37]:
model2.summary()

d) Use the .fit() method to fit the model on the train data. Use a validation split of 0.25, epochs=3 and batch size = 128.

In [38]:
history2 = model2.fit(
    X_train, Y_train,
    validation_split=0.25,
    epochs=3,
    batch_size=128
)

Epoch 1/3
[1m118/118[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m64s[0m 532ms/step - accuracy: 0.6424 - loss: 0.6374 - val_accuracy: 0.8713 - val_loss: 0.3173
Epoch 2/3
[1m118/118[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m64s[0m 547ms/step - accuracy: 0.9039 - loss: 0.3064 - val_accuracy: 0.9556 - val_loss: 0.1492
Epoch 3/3
[1m118/118[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m64s[0m 543ms/step - accuracy: 0.9436 - loss: 0.2058 - val_accuracy: 0.9612 - val_loss: 0.1377


e) Use the .evaluate() method to get the loss value & the accuracy value on the test data. Use a batch size of 128 again.

In [39]:
loss2, accuracy2 = model2.evaluate(X_test, Y_test, batch_size=128)
print(f'Test Loss: {loss2:.4f}')
print(f'Test Accuracy: {accuracy2:.4f}')

[1m53/53[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m6s[0m 118ms/step - accuracy: 0.9561 - loss: 0.1700
Test Loss: 0.1642
Test Accuracy: 0.9517


## 4) Recurrent Neural Net Model 3: Build Your Own

You will now create your RNN based on what you have learned from Model 1 & Model 2:

a) RNN Requirements:
- Use 5 or more layers
- Add a layer that was not utilized in Model 1 or Model 2 (Note: This could be a new Dense layer or an additional LSTM).

In [40]:
model3 = Sequential()

In [41]:
# Layer 1: Embedding layer
model3.add(Embedding(input_dim=num_words, output_dim=64, input_length=max_log_length))

# Layer 2: LSTM layer with return sequences for stacking another LSTM
model3.add(LSTM(units=128, return_sequences=True, activation='tanh'))

# Layer 3: Dropout layer
model3.add(Dropout(0.3))

# Layer 4: Second LSTM layer
model3.add(LSTM(units=64))

# Layer 5: Dropout layer
model3.add(Dropout(0.3))

# Layer 6: Dense layer with relu activation
model3.add(Dense(units=32, activation='relu'))

# Layer 7: Dropout layer
model3.add(Dropout(0.2))

# Layer 8: Output layer
model3.add(Dense(units=1, activation='sigmoid'))


b) Compiler Requirements:
- Try a new optimizer for the compile step
- Keep accuracy as a metric (feel free to add more metrics if desired)

In [42]:
model3.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

c) Print the model summary.

In [43]:
model3.summary()

d) Use the .fit() method to fit the model on the train data. Use a validation split of 0.25, epochs=3 and batch size = 128.

In [45]:
history3 = model3.fit(
    X_train, Y_train,
    validation_split=0.25,
    epochs=3,
    batch_size=128
)

Epoch 1/3
[1m118/118[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m323s[0m 3s/step - accuracy: 0.7523 - loss: 0.5271 - val_accuracy: 0.9404 - val_loss: 0.1819
Epoch 2/3
[1m118/118[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m319s[0m 3s/step - accuracy: 0.9198 - loss: 0.2612 - val_accuracy: 0.9793 - val_loss: 0.0837
Epoch 3/3
[1m118/118[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m324s[0m 3s/step - accuracy: 0.9527 - loss: 0.1640 - val_accuracy: 0.9018 - val_loss: 0.2969


e) Use the .evaluate() method to get the loss value & the accuracy value on the test data. Use a batch size of 128 again.

In [46]:
loss3, accuracy3 = model3.evaluate(X_test, Y_test, batch_size=128)
print(f'Test Loss: {loss3:.4f}')
print(f'Test Accuracy: {accuracy3:.4f}')

[1m53/53[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m37s[0m 698ms/step - accuracy: 0.8301 - loss: 0.5099
Test Loss: 0.3258
Test Accuracy: 0.8905


In [47]:
print("\nModel Comparison:")
print(f"Model 1 - Simple RNN: Loss = {loss1:.4f}, Accuracy = {accuracy1:.4f}")
print(f"Model 2 - LSTM+Dropout: Loss = {loss2:.4f}, Accuracy = {accuracy2:.4f}")
print(f"Model 3 - Custom RNN: Loss = {loss3:.4f}, Accuracy = {accuracy3:.4f}")


Model Comparison:
Model 1 - Simple RNN: Loss = nan, Accuracy = 0.4390
Model 2 - LSTM+Dropout: Loss = 0.1642, Accuracy = 0.9517
Model 3 - Custom RNN: Loss = 0.3258, Accuracy = 0.8905


## Conceptual Questions: 

5) Explain the difference between the relu activation function and the sigmoid activation function.

_The ReLU activation function just outputs the input if it’s positive, and 0 if it’s negative. So it’s like:_
_ReLU(x) = max(0, x)_

_The sigmoid function squishes the input into a range between 0 and 1 using a curve that looks like an “S”:_
_Sigmoid(x) = 1 / (1 + e^(-x))_

_ The main differences are:_
_1. ReLU is super simple and fast, and it doesn’t squish big values down._
_2. Sigmoid is good when we need outputs between 0 and 1 like probabilities, but it can cause the vanishing gradient problem in deep networks, making learning slower._

6) Describe what one epoch actually is (epoch was a parameter used in the .fit() method).

_An epoch is one complete pass through the entire training dataset._

_For example, if I have 1,000 examples, and I run fit with 5 epochs, it means the model will see all 1.000 examples 5 times during training just in different shuffled orders each time._

7) Explain how dropout works (you can look at the keras code and/or documentation) for (a) training, and (b) test data sets.

_Dropout is a way to prevent overfitting by randomly turning off some neurons during training._

_During training dropout randomly sets some neurons outputs to 0. This forces the model to not depend too much on any one neuron and helps it generalize better._

_During testing the dropout is turned off and all neurons are used. But the output is scaled to make up for the fact that no neurons are dropped._

8) Explain why problems such as this homework assignment are better modeled with RNNs than CNNs. What type of problem will CNNs outperform RNNs on?

###......

_CNNs are better for images or spatial data, where local patterns like edges or shapes are more important than the order of the input. So CNNs will usually outperform RNNs on problems like image classification or object detection._

9) Explain what RNN problem is solved using LSTM and briefly describe how.

_RNNs can have trouble remembering things from far back in the sequence. This is called the vanishing gradient problem, it means the memory kind of fades as we go deeper into the sequence like in very large sentences._

_LSTMs fix this by adding a memory cell that can store information for a long time. It uses gates to decide:_

_- What to keep_

_- What to throw away_

_- And what new stuff to add_

_This helps the model remember important info from earlier in the sequence, which regular RNNs often forgets._