# Assignment 3

In this assignment, we will focus on healthcare. The data set is made available by MIT and contains 9,026 heartbeat measurements. Each row represents a single measurement (captured on a timeline), and each measurement has 80 data points (columns). This is a multiclass classification task: predict whether a measurement represents a normal heartbeat or one of four types of anomaly.

## Description of Variables

You will use the **heartbeat_cleaned.csv** data set for this assignment. Each row represents a single measurement. Columns labeled T1 through T80 are the time steps on the timeline (there are 80 time steps, and each time step has only one measurement).

The last column is the target variable. It shows the label (category) of the measurement as follows:<br>
0 = Normal<br>
1 = Supraventricular premature beat<br>
2 = Premature ventricular contraction<br>
3 = Fusion of ventricular and normal beat<br>
4 = Unclassifiable beat

## Goal

Use the data set **heartbeat_cleaned.csv** to predict the column called **Target**. The input variables are the columns labeled **T1 to T80**.

## Submission:

Please save and submit this Jupyter notebook file. The correctness of the code matters for your grade. **Readability and organization of your code are also important.** You may lose points for submitting unreadable or undecipherable code. Therefore, use markdown cells to create sections, and use comments where necessary.


# Note:

The data is cleaned up. There are no unequal-length sequences, and there is no zero padding. So you shouldn't use any `Masking` layer (as I mentioned in the lecture).

# Read and Prepare the Data (1 point)

In [1]:
# Standard library
import itertools
import warnings
from collections import Counter

# Third-party libraries (only the names this notebook actually uses)
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.layers import Dense
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE

# Suppress library warnings to keep the notebook output readable
warnings.filterwarnings('ignore')

In [3]:
df = pd.read_csv("./heartbeat_cleaned.csv")

In [4]:
df.Target.value_counts()

0    4633
4    1584
2    1237
1     445
3      61
Name: Target, dtype: int64

# Find the baseline (0.5 point)

In [5]:
#baseline
df['Target'].value_counts()/len(df)

0    0.582035
4    0.198995
2    0.155402
1    0.055905
3    0.007663
Name: Target, dtype: float64
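
The baseline is the accuracy of a model that always predicts the most frequent class. As a minimal sketch, the calculation below uses the class counts printed above; the variable names are illustrative and not part of the assignment.

```python
# Class counts copied from the value_counts() output above
class_counts = {0: 4633, 4: 1584, 2: 1237, 1: 445, 3: 61}

total = sum(class_counts.values())
majority_class = max(class_counts, key=class_counts.get)

# Accuracy of a model that always predicts the majority class
baseline_accuracy = class_counts[majority_class] / total
print(f"majority class: {majority_class}, baseline accuracy: {baseline_accuracy:.4f}")
```

Any model built later should beat this number to be considered useful.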

In [6]:
train = df.drop(columns = ['Target'])
target = df['Target']

In [7]:
X_train, X_test, y_train, y_test = train_test_split(train,target, test_size=0.3, random_state=42)

In [8]:

# define oversampling strategy
# (use a lowercase name so the SMOTE class itself is not overwritten)
smote = SMOTE()

# fit and apply the transform
X_train_SMOTE, y_train_SMOTE = smote.fit_resample(X_train, y_train)

In [9]:
# summarize class distribution
print("After oversampling: ",Counter(y_train_SMOTE))

After oversampling:  Counter({1: 3217, 0: 3217, 2: 3217, 4: 3217, 3: 3217})
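
SMOTE balances the classes by creating synthetic minority samples rather than duplicating existing ones. The sketch below illustrates only the core interpolation idea on hypothetical 2-D points. It is a simplification and not the `imblearn` implementation: the real algorithm picks a random neighbour among the k nearest, while this sketch always uses the single nearest neighbour. The function name `synthesize` is made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy minority-class samples (hypothetical 2-D points, not the heartbeat data)
minority = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.5]])

def synthesize(samples, rng):
    """Create one synthetic point between a random sample and its nearest neighbour."""
    i = rng.integers(len(samples))
    base = samples[i]
    # Distances from the chosen sample to every sample; exclude the sample itself
    dists = np.linalg.norm(samples - base, axis=1)
    dists[i] = np.inf
    neighbour = samples[np.argmin(dists)]
    # Move a random fraction of the way from the sample toward its neighbour
    lam = rng.random()
    return base + lam * (neighbour - base), base, neighbour

new_point, base, neighbour = synthesize(minority, rng)
print(new_point)
```

In the output above, every class ends up with 3217 training samples, which gives 5 × 3217 = 16085 rows in total.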


In [10]:
# evaluate the model
def model_eval_results(model, test_x, y_test):
    scores = model.evaluate(test_x, y_test, verbose=0)
    # In results, first is loss, second is accuracy
    # extract the accuracy from model.evaluate

    print("%s: %.2f" % (model.metrics_names[0], scores[0]))
    print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
    return model, scores

### Data Prep for Keras

In [11]:
#Keras
#Convert the inputs to NumPy arrays, then add a trailing axis of size 1
#so that each sample has shape (80, 1) and the full array has 3 dimensions
X_train_SMOTE= np.array(X_train_SMOTE)
X_test= np.array(X_test)
train_x = np.reshape(X_train_SMOTE, (X_train_SMOTE.shape[0], X_train_SMOTE.shape[1], 1))
test_x = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))
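
As a small illustration of what the reshape above does, the sketch below applies the same operation to a hypothetical all-zeros array (4 samples instead of the real data) and checks that two common alternatives give the same result.

```python
import numpy as np

# Hypothetical stand-in for the feature matrix: 4 samples, 80 time steps
X = np.zeros((4, 80), dtype="float32")

# Same operation as the np.reshape call above: append a feature axis of size 1
X_3d = X.reshape(X.shape[0], X.shape[1], 1)
print(X.shape, "->", X_3d.shape)

# np.expand_dims and np.newaxis produce the same array
same_as_expand_dims = np.array_equal(X_3d, np.expand_dims(X, axis=-1))
same_as_newaxis = np.array_equal(X_3d, X[:, :, np.newaxis])
print(same_as_expand_dims, same_as_newaxis)
```

Note that the dense models in the next section declare an input of 80 features per sample, so the original 2-D array would match that declaration directly.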

# Build a cross-sectional shallow model using Keras (with only one hidden layer) (2 points)

### Approach 1

In [14]:
# fix random seed for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

visible = keras.layers.Input(shape=(80,))
hidden1 = keras.layers.Dense(65, activation='relu')(visible)
output = keras.layers.Dense(5, activation='softmax')(hidden1)
model_keras_simple = keras.models.Model(inputs=visible, outputs=output)
#Optimizer:
adam = keras.optimizers.Adam(learning_rate=0.01)

model_keras_simple.compile(loss='sparse_categorical_crossentropy', optimizer=adam, metrics=['accuracy'])
# Fit the model

history = model_keras_simple.fit(train_x, y_train_SMOTE, epochs=50,
                    validation_data=(test_x, y_test))
model_eval_results(model_keras_simple, test_x, y_test)

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50
loss: 0.41
accuracy: 90.54%


(<keras.engine.functional.Functional at 0x7fbe2f618e90>,
 [0.4093019366264343, 0.9053601622581482])
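
The models here use `sparse_categorical_crossentropy`, which takes integer class labels directly instead of one-hot vectors. The NumPy sketch below shows the underlying calculation on hypothetical probabilities. It is a simplified illustration, not the Keras implementation (which, among other things, clips probabilities for numerical stability).

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, probs):
    """Mean negative log-probability assigned to the true class."""
    y_true = np.asarray(y_true)
    probs = np.asarray(probs, dtype=float)
    # Pick, for each row, the probability predicted for its true class
    picked = probs[np.arange(len(y_true)), y_true]
    return float(np.mean(-np.log(picked)))

# Toy softmax outputs for 2 samples and 3 classes (hypothetical values)
probs = [[0.7, 0.2, 0.1],
         [0.1, 0.8, 0.1]]
labels = [0, 1]  # integer class indices, no one-hot encoding needed

loss = sparse_categorical_crossentropy(labels, probs)
print(round(loss, 4))
```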

### Approach 2


In [15]:
# fix random seed for reproducibility
np.random.seed(42)
tf.random.set_seed(42)
model_keras_simple = keras.models.Sequential()

model_keras_simple.add(keras.layers.Input(shape=(80,)))
model_keras_simple.add(keras.layers.Dense(65, activation='relu'))
model_keras_simple.add(keras.layers.Dense(5, activation='softmax'))
# Compile model

#Optimizer:
adam = keras.optimizers.Adam(learning_rate=0.01)

model_keras_simple.compile(loss='sparse_categorical_crossentropy', optimizer=adam, metrics=['accuracy'])
# Fit the model

history = model_keras_simple.fit(train_x, y_train_SMOTE, epochs=50,
                    validation_data=(test_x, y_test))
model_eval_results(model_keras_simple, test_x, y_test)

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50
loss: 0.38
accuracy: 91.33%


(<keras.engine.sequential.Sequential at 0x7fbeb9161790>,
 [0.3778935372829437, 0.9133166074752808])

# Build a cross-sectional deep model using Keras (with two or more hidden layers) (2 points)

### Approach 1

In [18]:
# fix random seed for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

visible = keras.layers.Input(shape=(80,))
hidden1 = keras.layers.Dense(65, activation='relu')(visible)
hidden2 = keras.layers.Dense(65, activation='relu')(hidden1)
output = keras.layers.Dense(5, activation='softmax')(hidden2)
model_keras_deep = keras.models.Model(inputs=visible, outputs=output)
#Optimizer:
adam = keras.optimizers.Adam(learning_rate=0.01)

model_keras_deep.compile(loss='sparse_categorical_crossentropy', optimizer=adam, metrics=['accuracy'])
earlystop = EarlyStopping(monitor='val_loss', patience=5, verbose=1, mode='auto')

callback = [earlystop]
# Fit the model
history = model_keras_deep.fit(train_x, y_train_SMOTE, epochs=50,
                    validation_data=(test_x, y_test), callbacks = callback)
model_eval_results(model_keras_deep, test_x, y_test)

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 12: early stopping
loss: 0.29
accuracy: 92.21%


(<keras.engine.functional.Functional at 0x7fbe2aad4ad0>,
 [0.2901427447795868, 0.9221105575561523])
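
The `EarlyStopping` callback with `patience=5` stops training once the monitored value has failed to improve for 5 consecutive epochs. The pure-Python sketch below mimics that rule on hypothetical validation losses. It is a simplified model of the behaviour (no `min_delta`, no weight restoring), and the function name `early_stop_epoch` is made up for illustration.

```python
import math

def early_stop_epoch(val_losses, patience=5):
    """Return the 1-based epoch at which training would stop, or None if it never does."""
    best = math.inf
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Hypothetical validation losses: the best value is reached at epoch 3
losses = [1.00, 0.80, 0.70, 0.75, 0.72, 0.71, 0.74, 0.73, 0.90]
stop = early_stop_epoch(losses, patience=5)
print(stop)
```

Under this rule, the "Epoch 12: early stopping" message above would be consistent with the lowest validation loss having occurred at epoch 7. Unless `restore_best_weights=True` is passed, the model keeps the weights from the last epoch rather than the best one.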

### Approach 2


In [17]:
#Define the model: for multi-class
# fix random seed for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

#Set the learning rate:
lr=0.001


#Available optimizers:
adagrad = keras.optimizers.Adagrad(learning_rate=lr, epsilon=None, decay=0.0)
sgd = keras.optimizers.SGD(learning_rate=lr, momentum=0.0, decay=0.0, nesterov=False)
rmsprop = keras.optimizers.RMSprop(learning_rate=lr, rho=0.9, epsilon=None, decay=0.0)
adam = keras.optimizers.Adam(learning_rate=lr, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)
nesterov_adam = keras.optimizers.Nadam(learning_rate=lr, beta_1=0.9, beta_2=0.999, epsilon=None, schedule_decay=0.004)

#Initializations:
xavier = keras.initializers.glorot_normal(seed=None)
he = keras.initializers.he_normal(seed=None)


# Activation functions. Uncomment only one
# activation = 'elu' 
activation = 'relu'
#activation = 'tanh'
#activation = 'sigmoid'



#See the dropout layers below:
input1 = keras.layers.Input(shape=(80,))

hidden1 = keras.layers.Dense(70, activation=activation, kernel_initializer=xavier)(input1)
drop1   = keras.layers.Dropout(0.2)(hidden1)
hidden2 = keras.layers.Dense(65, activation=activation, kernel_initializer=xavier)(drop1)
drop2   = keras.layers.Dropout(0.2)(hidden2)
hidden3 = keras.layers.Dense(65, activation=activation, kernel_initializer=xavier)(drop2)

#final layer: there has to be 5 nodes with softmax (because we have 5 categories)
output = keras.layers.Dense(5, activation='softmax')(hidden3)

#Compile
model_keras_deep = keras.Model(inputs = input1, outputs = output)
model_keras_deep.compile(loss='sparse_categorical_crossentropy', 
              optimizer=nesterov_adam, metrics=['accuracy'])
earlystop = EarlyStopping(monitor='val_loss', patience=5, verbose=1, mode='auto')

callback = [earlystop]
# Fit the model

history = model_keras_deep.fit(train_x, y_train_SMOTE, 
                    validation_data=(test_x, y_test), 
          epochs=50, callbacks=callback)
model_eval_results(model_keras_deep, test_x, y_test)

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 19: early stopping
loss: 0.25
accuracy: 92.34%


(<keras.engine.functional.Functional at 0x7fbe29068e50>,
 [0.24958212673664093, 0.9233668446540833])

### Data Prep for LSTM and GRU

In [19]:
#Data Prep for LSTM and GRU
# Add a time axis of length 1: each sample becomes one time step with 80 features
X_train_lstm = X_train_SMOTE[:, np.newaxis, :]
Y_train_lstm = np.array(y_train_SMOTE)[:, np.newaxis]
print(X_train_lstm.shape, Y_train_lstm.shape)
# Same layout for the test set
X_test_lstm = X_test[:, np.newaxis, :]
y_test = np.array(y_test)
Y_test_lstm = y_test[:, np.newaxis]
print(X_test_lstm.shape, Y_test_lstm.shape)

(16085, 1, 80) (16085, 1)
(2388, 1, 80) (2388, 1)
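
Keras recurrent layers read their input as `(samples, timesteps, features)`. The shapes printed above therefore mean one time step carrying 80 features per heartbeat. The sketch below contrasts this with the alternative layout of 80 time steps carrying one feature each, using a hypothetical stand-in array. Which layout works better is a modelling choice that this notebook does not test.

```python
import numpy as np

# Hypothetical stand-in: 3 heartbeats, 80 measurements each
X = np.arange(3 * 80, dtype="float32").reshape(3, 80)

# Layout produced above: 1 time step carrying 80 features
one_step = X[:, np.newaxis, :]
# Alternative layout: 80 time steps carrying 1 feature each
many_steps = X[:, :, np.newaxis]

print(one_step.shape, many_steps.shape)
```

With the first layout, an LSTM or GRU layer processes a single step per sample, so its recurrence over time is not exercised.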


# Build a sequential shallow LSTM Model (with only one LSTM layer) (2 points)

In [20]:
# fix random seed for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

model_lstm_simple = keras.models.Sequential()

#model.add(keras.layers.Input(shape=80))
model_lstm_simple.add(keras.layers.LSTM(units=80, return_sequences=True, input_shape=(X_train_lstm.shape[1],X_train_lstm.shape[2])))
model_lstm_simple.add(keras.layers.Dropout(0.2))
model_lstm_simple.add(Dense(5, activation='softmax'))
#model_lstm_simple.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# Compile model

#Optimizer:
adam = keras.optimizers.Adam(learning_rate=0.01)

model_lstm_simple.compile(loss='sparse_categorical_crossentropy', optimizer=adam, metrics=['accuracy'])
earlystop = EarlyStopping(monitor='val_loss', patience=5, verbose=1, mode='auto')

callback = [earlystop]
# Fit the model

history = model_lstm_simple.fit(X_train_lstm, Y_train_lstm, epochs=50,
                    validation_data=(X_test_lstm, Y_test_lstm), callbacks=callback)

model_eval_results(model_lstm_simple, X_test_lstm, Y_test_lstm)

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 13: early stopping
loss: 0.26
accuracy: 91.83%


(<keras.engine.sequential.Sequential at 0x7fbe2b232f90>,
 [0.26092979311943054, 0.9183416962623596])

# Build a sequential deep LSTM Model (with only two LSTM layers) (2 points)

In [28]:
# fix random seed for reproducibility
np.random.seed(42)
tf.random.set_seed(42)
model_lstm_deep = keras.models.Sequential()

model_lstm_deep.add(keras.layers.LSTM(units=100,return_sequences=True,input_shape=(X_train_lstm.shape[1],X_train_lstm.shape[2])))
model_lstm_deep.add(keras.layers.Dropout(0.2))
model_lstm_deep.add(keras.layers.LSTM(units=75,return_sequences=True))
model_lstm_deep.add(keras.layers.Dropout(0.1))
model_lstm_deep.add(Dense(5, activation='softmax'))
#model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# Compile model

#Optimizer:
adam = keras.optimizers.Adam(learning_rate=0.01)

model_lstm_deep.compile(loss='sparse_categorical_crossentropy', optimizer=adam, metrics=['accuracy'])
earlystop = EarlyStopping(monitor='val_loss', patience=5, verbose=1, mode='auto')

callback = [earlystop]
# Fit the model

history = model_lstm_deep.fit(X_train_lstm, Y_train_lstm, epochs=50,
                    validation_data=(X_test_lstm, Y_test_lstm), callbacks=callback)

model_eval_results(model_lstm_deep, X_test_lstm, Y_test_lstm)

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 20: early stopping
loss: 0.24
accuracy: 93.05%


(<keras.engine.sequential.Sequential at 0x7fbe25bb8910>,
 [0.2370869219303131, 0.9304857850074768])

# Build a sequential shallow GRU Model (with only one GRU layer) (2 points)

In [22]:
# fix random seed for reproducibility
np.random.seed(42)
tf.random.set_seed(42)
model_gru_simple = keras.models.Sequential()
  
# Defining the recurrent layer (GRU cell)
model_gru_simple.add(keras.layers.GRU(80,return_sequences=True, input_shape =(X_train_lstm.shape[1],X_train_lstm.shape[2])))
model_gru_simple.add(keras.layers.Dropout(0.2))
# Defining the densely connected output layer:
# 5 nodes with softmax activation, one per class
model_gru_simple.add(Dense(5, activation='softmax'))
  
# Defining the optimizing function
optimizer = keras.optimizers.Adam(learning_rate = 0.01)
  
# Configuring the model for training
model_gru_simple.compile(loss ='sparse_categorical_crossentropy', optimizer = optimizer, metrics=['accuracy'])
earlystop = EarlyStopping(monitor='val_loss', patience=5, verbose=1, mode='auto')

callback = [earlystop]
# Fit the model

history = model_gru_simple.fit(X_train_lstm, Y_train_lstm, epochs=50,
                    validation_data=(X_test_lstm, Y_test_lstm), callbacks = callback)

model_eval_results(model_gru_simple, X_test_lstm, Y_test_lstm)

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 13: early stopping
loss: 0.33
accuracy: 88.36%


(<keras.engine.sequential.Sequential at 0x7fbe29e77090>,
 [0.3337429463863373, 0.8835846185684204])

# Build a sequential deep GRU Model (with only two GRU layers) (2 points)

In [29]:
# fix random seed for reproducibility
np.random.seed(42)
tf.random.set_seed(42)
model_gru_deep = keras.models.Sequential()
  
# Defining the first recurrent layer (GRU cell)
model_gru_deep.add(keras.layers.GRU(80,return_sequences=True, input_shape =(X_train_lstm.shape[1],X_train_lstm.shape[2])))
model_gru_deep.add(keras.layers.Dropout(0.2))
# Defining the second recurrent layer
model_gru_deep.add(keras.layers.GRU(units=75,return_sequences=True))
model_gru_deep.add(keras.layers.Dropout(0.1))
# Defining the densely connected output layer (softmax over the 5 classes)
model_gru_deep.add(Dense(5, activation='softmax'))
  
# Defining the optimizing function
optimizer = keras.optimizers.Adam(learning_rate = 0.01)
  
# Configuring the model for training
model_gru_deep.compile(loss ='sparse_categorical_crossentropy', optimizer = optimizer, metrics=['accuracy'])
earlystop = EarlyStopping(monitor='val_loss', patience=5, verbose=1, mode='auto')

callback = [earlystop]
# Fit the model

history = model_gru_deep.fit(X_train_lstm, Y_train_lstm, epochs=50,
                    validation_data=(X_test_lstm, Y_test_lstm), callbacks = callback)

model_eval_results(model_gru_deep, X_test_lstm, Y_test_lstm)

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 19: early stopping
loss: 0.27
accuracy: 91.92%


(<keras.engine.sequential.Sequential at 0x7fbe25b0ad50>,
 [0.26514485478401184, 0.9191792011260986])

# Discussion

## List the test values of each model you built (0.5 points)

In [30]:
predictions_keras_simple = model_keras_simple.predict(X_test)
predictions_keras_simple = np.argmax(predictions_keras_simple, axis=1)
predictions_keras_deep = model_keras_deep.predict(X_test)
predictions_keras_deep = np.argmax(predictions_keras_deep, axis=1)
predictions_lstm_simple = model_lstm_simple.predict(X_test_lstm)
predictions_lstm_simple = list(itertools.chain(*np.argmax(predictions_lstm_simple, axis=2)))
predictions_lstm_deep = model_lstm_deep.predict(X_test_lstm)
predictions_lstm_deep = list(itertools.chain(*np.argmax(predictions_lstm_deep, axis=2)))
predictions_gru_simple = model_gru_simple.predict(X_test_lstm)
predictions_gru_simple = list(itertools.chain(*np.argmax(predictions_gru_simple, axis=2)))
predictions_gru_deep = model_gru_deep.predict(X_test_lstm)
predictions_gru_deep = list(itertools.chain(*np.argmax(predictions_gru_deep, axis=2)))
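
The cell above turns softmax probabilities into class labels with `np.argmax`. As a small illustration on hypothetical probabilities, the sketch below shows why the dense models use `axis=1` while the recurrent models, whose outputs carry an extra time axis, use `axis=2`. It uses `.ravel()` as one possible alternative to `itertools.chain` for flattening.

```python
import numpy as np

# Toy softmax outputs (hypothetical values): 2 samples, 5 classes
dense_probs = np.array([[0.1, 0.6, 0.1, 0.1, 0.1],
                        [0.7, 0.1, 0.1, 0.05, 0.05]])
dense_labels = np.argmax(dense_probs, axis=1)

# Recurrent models with return_sequences=True add a time axis: (samples, 1, classes)
rnn_probs = dense_probs[:, np.newaxis, :]
# argmax over the class axis gives shape (samples, 1); ravel flattens it to (samples,)
rnn_labels = np.argmax(rnn_probs, axis=2).ravel()

print(dense_labels, rnn_labels)
```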

In [31]:
class_map = {0 : 'Normal',
            1 : 'Supraventricular premature beat',
            2 : 'Premature ventricular contraction',
            3 : 'Fusion of ventricular and normal beat',
            4 : 'Unclassifiable beat'}

### Listed test values for each model

In [32]:
print(f'keras simple preds:{list(map(lambda x: class_map[x],predictions_keras_simple))}\n'\
      f'keras deep preds:{list(map(lambda x: class_map[x],predictions_keras_deep))}\n'\
      f'lstm simple preds:{list(map(lambda x: class_map[x],predictions_lstm_simple))}\n'\
      f'lstm deep preds:{list(map(lambda x: class_map[x],predictions_lstm_deep))}\n'\
      f'gru simple preds:{list(map(lambda x: class_map[x],predictions_gru_simple))}\n'\
      f'gru deep preds:{list(map(lambda x: class_map[x],predictions_gru_deep))}\n'\
      )

keras simple preds:['Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Unclassifiable beat', 'Unclassifiable beat', 'Premature ventricular contraction', 'Unclassifiable beat', 'Normal', 'Normal', 'Normal', 'Normal', 'Supraventricular premature beat', 'Premature ventricular contraction', 'Normal', 'Premature ventricular contraction', 'Normal', 'Normal', 'Unclassifiable beat', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Premature ventricular contraction', 'Normal', 'Premature ventricular contraction', 'Unclassifiable beat', 'Normal', 'Premature ventricular contraction', 'Supraventricular premature beat', 'Normal', 'Unclassifiable beat', 'Supraventricular premature beat', 'Normal', 'Premature ventricular contraction', 'Unclassifiable beat', 'Normal', 'Normal', 'Normal', 'Unclassifiable beat', 'Supraventricular premature beat', 'Normal', 'Supraventricular premature beat', 'Premature ventricular contraction', 'Normal', 'Normal', 'Normal', 'Normal', 'Norm
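
A long list of predicted labels is hard to read. One possible way to condense it is a confusion matrix with overall accuracy and per-class recall. The sketch below uses NumPy only and hypothetical toy labels, not the notebook's predictions. The helper name `confusion_counts` is made up for illustration.

```python
import numpy as np

def confusion_counts(y_true, y_pred, n_classes=5):
    """Rows are true classes, columns are predicted classes."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        matrix[t, p] += 1
    return matrix

# Toy labels (hypothetical, not taken from the notebook's predictions)
y_true = np.array([0, 0, 0, 1, 2, 2, 3, 4, 4, 4])
y_pred = np.array([0, 0, 4, 1, 2, 0, 0, 4, 4, 2])

cm = confusion_counts(y_true, y_pred)
accuracy = np.trace(cm) / cm.sum()
# Recall per class: correct predictions divided by the number of true samples
recall = np.diag(cm) / cm.sum(axis=1)

print(cm)
print("accuracy:", accuracy)
print("recall per class:", recall)
```

Per-class recall is worth checking here because the test set is imbalanced, so a high overall accuracy can hide poor results on rare classes such as class 3.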

## Which model performs the best and why? (0.5 points) 
## How does it compare to baseline? (0.5 points)

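In [None]:
"""All of the models score far above the majority-class baseline of 58.2% test accuracy.
  The two-layer dense, LSTM, and GRU networks each performed better than their one-layer counterparts.
  The deep LSTM had the highest test accuracy (93.05%), but the deep dense models (92.21% and 92.34%)
  and the deep GRU (91.92%) are close behind.
  GRU is generally preferred for smaller datasets and LSTM for larger ones.
  GRU has no output gate, so it exposes its full hidden state at every step,
  whereas LSTM keeps a separate cell state behind an output gate.
  GRU is also reported to be about 29.29% faster than LSTM on the same dataset (a figure not measured in this notebook).
  Since the accuracies are so close, the GRU or the simpler dense network is the better choice here:
  both are faster, and the data we trained on is simple."""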
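
To make the comparison with the baseline concrete, the sketch below ranks the test accuracies reported in the evaluation outputs of this notebook against the majority-class share. The dictionary keys are descriptive labels chosen for illustration.

```python
# Test accuracies copied from the evaluation outputs in this notebook
test_accuracy = {
    "dense shallow (approach 1)": 0.9054,
    "dense shallow (approach 2)": 0.9133,
    "dense deep (approach 1)": 0.9221,
    "dense deep (approach 2)": 0.9234,
    "LSTM shallow": 0.9183,
    "LSTM deep": 0.9305,
    "GRU shallow": 0.8836,
    "GRU deep": 0.9192,
}
baseline = 0.5820  # share of the majority class (Normal)

# Print the models from best to worst with their margin over the baseline
for name, acc in sorted(test_accuracy.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:28s} {acc:.4f}  (+{acc - baseline:.4f} over baseline)")

best_model = max(test_accuracy, key=test_accuracy.get)
print("best:", best_model)
```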