# Design Criteria for Sequence Modeling

To model sequences, we need to:
- Handle variable-length sequences
- Track long-term dependencies
- Maintain information about order
- Share parameters across the sequence

**Recurrent Neural Networks (RNNs) meet these sequence modeling design criteria.**

# Applications of RNNs

- Many-to-One - Sentiment Classification
- One-to-Many - Text Generation, Image Captioning
- Many-to-Many - Machine Translation, Forecasting, Music Generation
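The three patterns differ only in which hidden states are used. Below is a minimal NumPy sketch of a vanilla (Elman) RNN forward pass, assuming a tanh activation and small random weights; the names `rnn_forward`, `W_xh`, `W_hh` and `b_h` are illustrative, not from any library. The same weights are reused at every time step, the loop accepts any sequence length, and a many-to-one model keeps only the last hidden state while a many-to-many model keeps all of them.

```python
import numpy as np

def rnn_forward(x, W_xh, W_hh, b_h):
    """Run a vanilla RNN over x of shape (timesteps, input_dim).

    h_t = tanh(x_t @ W_xh + h_{t-1} @ W_hh + b_h)
    Returns all hidden states, shape (timesteps, hidden_dim).
    """
    h = np.zeros(W_hh.shape[0])          # h_0 = 0
    states = []
    for x_t in x:                        # sequential: h_t needs h_{t-1}
        h = np.tanh(x_t @ W_xh + h @ W_hh + b_h)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
input_dim, hidden_dim = 8, 4
W_xh = rng.normal(scale=0.1, size=(input_dim, hidden_dim))
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

# The same (shared) weights handle sequences of different lengths
for T in (3, 7):
    x = rng.normal(size=(T, input_dim))
    states = rnn_forward(x, W_xh, W_hh, b_h)
    many_to_many = states        # one output per time step: (T, hidden_dim)
    many_to_one = states[-1]     # final state only: (hidden_dim,)
    print(T, many_to_many.shape, many_to_one.shape)
```

In Keras the same switch is the `return_sequences` argument of `SimpleRNN` and `LSTM`: `False` (the default, used in the models below) returns only the last state, `True` returns the state at every step.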

# RNN Issues

- Exploding Gradients
  - Use gradient clipping to rescale large gradients before the weight update
- Long-term dependencies are hard to learn because of vanishing gradients
  - Weight Initialization - Initialize the recurrent weights to the identity matrix
  - Network Architecture - Use gated cells like LSTMs or GRUs. Their gates control what information is kept, updated, or discarded, so it can be carried across many time steps
  - Activation Function - ReLU has a derivative of 1 for x > 0, so it does not shrink the gradient at every step the way tanh (derivative at most 1) and sigmoid (derivative at most 0.25) do
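Both effects are easy to see numerically. The NumPy sketch below uses a hypothetical helper `clip_by_norm` that rescales a gradient whose L2 norm exceeds a threshold, then compares how a gradient scales after 50 backpropagation steps with a per-step derivative factor of 0.5 (tanh-like, below 1) versus exactly 1 (ReLU for x > 0). It deliberately ignores the recurrent weight matrix, which also multiplies in at every step, so it illustrates the trend rather than a real RNN gradient.

```python
import numpy as np

def clip_by_norm(grad, max_norm):
    """Rescale grad so its L2 norm is at most max_norm; the direction is unchanged."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        return grad * (max_norm / norm)
    return grad

# Exploding gradient: a gradient of norm 50 is scaled down to norm 1
g = np.array([30.0, 40.0])
print(clip_by_norm(g, 1.0))        # [0.6 0.8]

# Vanishing gradient: the chain rule multiplies one derivative per time step
steps = 50
tanh_like = 0.5 ** steps           # factor < 1 at every step
relu_like = 1.0 ** steps           # ReLU derivative is 1 when x > 0
print(tanh_like, relu_like)        # 8.881784197001252e-16 1.0
```

In Keras, clipping is usually configured on the optimizer through its `clipnorm` (or `clipvalue`) argument, e.g. `keras.optimizers.Adam(clipnorm=1.0)`, which clips each gradient tensor by its own norm.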

# Limitations of Recurrent Models

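- Encoding bottleneck - the whole input sequence must be compressed into one fixed-size hidden state
- Slow - No parallelization across time steps, because each hidden state depends on the previous one
- Limited long-term memory - information from early time steps fades over long sequences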



```python
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns
```

## Read the Data

```python
# Mount Google Drive so the CSV can be read from Colab
from google.colab import drive
drive.mount('/content/drive')
```

```
Mounted at /content/drive
```


```python
df = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/data/text/spam_ham/email_data.csv')

df.head()
```

| | Unnamed: 0 | label | text | label_num |
|---|---|---|---|---|
| 0 | 605 | ham | Subject: enron methanol ; meter # : 988291\r\n... | 0 |
| 1 | 2349 | ham | Subject: hpl nom for january 9 , 2001\r\n( see... | 0 |
| 2 | 3624 | ham | Subject: neon retreat\r\nho ho ho , we ' re ar... | 0 |
| 3 | 4685 | spam | Subject: photoshop , windows , office . cheap ... | 1 |
| 4 | 2030 | ham | Subject: re : indian springs\r\nthis deal is t... | 0 |


```python
# 'Unnamed: 0' is a row-id column stored in the CSV, not a feature
df = df.drop('Unnamed: 0', axis=1)

df.head()
```

| | label | text | label_num |
|---|---|---|---|
| 0 | ham | Subject: enron methanol ; meter # : 988291\r\n... | 0 |
| 1 | ham | Subject: hpl nom for january 9 , 2001\r\n( see... | 0 |
| 2 | ham | Subject: neon retreat\r\nho ho ho , we ' re ar... | 0 |
| 3 | spam | Subject: photoshop , windows , office . cheap ... | 1 |
| 4 | ham | Subject: re : indian springs\r\nthis deal is t... | 0 |


```python
y = df['label_num']   # 0 = ham, 1 = spam
X = df[['text']]      # double brackets keep X a one-column DataFrame
```

```python
# Splitting into train and test (80/20, fixed seed for reproducibility)

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```

```python
X_train.head()
```

| | text |
|---|---|
| 5132 | Subject: april activity surveys\r\nwe are star... |
| 2067 | Subject: message subject\r\nhey i ' am julie ^... |
| 4716 | Subject: txu fuels / sds nomination for may 20... |
| 4710 | Subject: re : richardson volumes nov 99 and de... |
| 2268 | Subject: a new era of online medical care .\r\... |


```python
print(X_train.shape, X_test.shape)
```

```
(4136, 1) (1035, 1)
```


```python
import tensorflow as tf
from tensorflow import keras

print(tf.__version__)

print(keras.__version__)
```

```
2.8.2
2.8.0
```


```python
from keras.utils import np_utils
from keras.models import Sequential
from keras.layers import Dense, Dropout, BatchNormalization, Flatten
```

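```python
from keras.layers import Conv2D, MaxPooling2D
# Recurrent layers are exported from keras.layers directly;
# the keras.layers.recurrent module path is internal and not stable across versions
from keras.layers import SimpleRNN, LSTM, GRU
from keras.layers import Bidirectional, Embedding
from keras.preprocessing import sequence, text
from keras.callbacks import EarlyStopping
```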

Credits -  
https://dzlab.github.io/dltips/en/keras/keras-text-preprocessing/

## Preprocessing

1. Tokenization

- Use `fit_on_texts` to update the tokenizer's internal vocabulary from a list of texts. It builds the vocabulary index from word frequency: given "The cat sat on the mat.", it creates a word -> index dictionary such that `word_index["the"] = 1; word_index["cat"] = 2`, so every word gets a unique integer. Index 0 is reserved for padding, and a lower integer means a more frequent word (the first few are often stop words because they appear so often).
- Use `fit_on_sequences` to update the tokenizer's internal state from a list of integer sequences instead of texts; it is required before `sequences_to_matrix` if `fit_on_texts` was never called.
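To make the frequency ranking concrete, here is a simplified pure-Python re-implementation of what `fit_on_texts` builds. It is a sketch under stated assumptions, not the Keras source: lowercase the text, replace every character of Keras' default `filters` string with a space, split on single spaces, count, and number words from 1 in descending frequency with ties kept in first-seen order. `split_words` and `build_word_index` are hypothetical names. Note that the default filters contain neither `\r` nor `'`, which is why those two show up as tokens in this dataset's vocabulary.

```python
from collections import Counter

# Same characters as the default `filters` argument of Keras' Tokenizer
DEFAULT_FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'

def split_words(text):
    """Lowercase, turn filtered characters into spaces, split on ' ', drop empties."""
    table = str.maketrans(DEFAULT_FILTERS, ' ' * len(DEFAULT_FILTERS))
    return [w for w in text.lower().translate(table).split(' ') if w]

def build_word_index(texts):
    """Map each word to a 1-based rank by frequency (0 stays free for padding)."""
    counts = Counter()
    for t in texts:
        counts.update(split_words(t))
    # Counter keeps first-seen order and sorted() is stable, so ties keep that order
    ranked = sorted(counts, key=counts.get, reverse=True)
    return {word: i for i, word in enumerate(ranked, start=1)}

print(build_word_index(["The cat sat on the mat."]))
# {'the': 1, 'cat': 2, 'sat': 3, 'on': 4, 'mat': 5}
```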

2. Numericalization

- Use `texts_to_sequences` to transform each string in a list of strings into a sequence of integers: each word is replaced by its value in the `word_index` dictionary. Words that are not in the vocabulary are skipped (unless an `oov_token` was set when creating the tokenizer). Nothing more, nothing less, certainly no magic involved.
- Use `sequences_to_matrix` to convert a list of sequences into a NumPy matrix with one row per sequence; its `mode` can be `'binary'`, `'count'`, `'tfidf'` or `'freq'`.
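Continuing with a simplified sketch (the helpers `to_sequences` and `to_binary_matrix` are hypothetical, the word index is written out by hand, and inputs are assumed to be already split into words), numericalization is a dictionary lookup. Unknown words are skipped, matching `texts_to_sequences` when no `oov_token` is set. The matrix step shown is the `mode='binary'` idea: one row per sequence, with a 1 in column j if index j occurs, and column 0 unused because 0 is the padding index.

```python
import numpy as np

# Word index as fit_on_texts would build it for "The cat sat on the mat."
word_index = {'the': 1, 'cat': 2, 'sat': 3, 'on': 4, 'mat': 5}

def to_sequences(tokenized_texts, word_index):
    """Map each word to its index; words not in the vocabulary are dropped."""
    return [[word_index[w] for w in words if w in word_index]
            for words in tokenized_texts]

def to_binary_matrix(sequences, num_columns):
    """One row per sequence; column j is 1 if index j occurs in that sequence."""
    matrix = np.zeros((len(sequences), num_columns))
    for row, seq in enumerate(sequences):
        for j in seq:
            if j < num_columns:
                matrix[row, j] = 1.0
    return matrix

seqs = to_sequences([['the', 'cat', 'ran'], ['on', 'the', 'mat']], word_index)
print(seqs)  # [[1, 2], [4, 1, 5]]  ('ran' is unknown, so it is dropped)
print(to_binary_matrix(seqs, len(word_index) + 1))
```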


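3. Sequence Padding
- Use `pad_sequences` to pad (and, with `maxlen`, truncate) sequences so that they all have the same length. By default it pads and truncates at the start of each sequence (`padding='pre'`, `truncating='pre'`) and pads with the value 0.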


# If RNNs can work with variable-length sequences, then why `PADDING`?
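An RNN can step through a single sequence of any length, but training runs on mini-batches, and a batch is one rectangular tensor of shape (batch, timesteps), so every sequence in it must have the same length. Padding fills short sequences with the reserved index 0, and truncation cuts long ones. The NumPy sketch below mirrors the defaults of `pad_sequences` (`padding='pre'`, `truncating='pre'`, value 0); `pad_pre` is a hypothetical name. Pre-padding keeps the real tokens at the end, next to the final hidden state that a many-to-one RNN uses for its prediction.

```python
import numpy as np

def pad_pre(sequences, maxlen, value=0):
    """Left-pad with `value`; long sequences keep only their last `maxlen` items."""
    out = np.full((len(sequences), maxlen), value, dtype=np.int32)
    for i, seq in enumerate(sequences):
        tail = seq[-maxlen:]              # 'pre' truncation drops tokens from the front
        if len(tail):
            out[i, -len(tail):] = tail    # 'pre' padding: zeros go in front
    return out

batch = [[5, 8], [3, 1, 4, 1, 5, 9], [7]]
print(pad_pre(batch, maxlen=4))
# [[0 0 5 8]
#  [4 1 5 9]
#  [0 0 0 7]]
```

Keras' `Embedding(mask_zero=True)` can additionally tell downstream recurrent layers to skip the padded time steps, so the zeros do not influence the hidden state.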

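```python
# Default Tokenizer: lowercases, replaces the characters in `filters` with spaces, splits on ' '
token = text.Tokenizer()

X_train_values = X_train['text'].tolist()

# Build the word -> index vocabulary from the training texts only
token.fit_on_texts(X_train_values)

# Replace every word with its integer index
X_train_seq = token.texts_to_sequences(X_train_values)
```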

```python
# Length of the longest tokenized training email
max_seq_len = max(len(seq) for seq in X_train_seq)

print(max_seq_len)
```

```
5916
```
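The cap used next (`max_len = 600`) is a judgment call: it is far below the longest email (5916 tokens), so long emails get truncated. One way to sanity-check such a cap is to look at length percentiles and at the share of sequences that fit without truncation. In this sketch `coverage_report` is a hypothetical helper and `toy_lengths` are made-up illustrative values, not statistics of this dataset; in the notebook you would pass `[len(s) for s in X_train_seq]`.

```python
import numpy as np

def coverage_report(lengths, max_len):
    """Return (median length, 95th-percentile length, fraction of sequences <= max_len)."""
    lengths = np.asarray(lengths)
    return (float(np.median(lengths)),
            float(np.percentile(lengths, 95)),
            float(np.mean(lengths <= max_len)))

toy_lengths = [120, 300, 450, 580, 610, 900, 5916]  # illustrative values only
print(coverage_report(toy_lengths, max_len=600))
```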


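```python
max_len = 600  # cap on sequence length; the longest training email has 5916 tokens

# Defaults are padding='pre' and truncating='pre': short emails get zeros
# prepended, long emails keep only their last 600 tokens
X_train_pad = sequence.pad_sequences(X_train_seq, maxlen=max_len)

print(X_train_pad.shape)

print(X_train_pad[1].shape)

# Vocabulary size: number of distinct tokens seen in the training texts
print(len(token.word_index))
```

```
(4136, 600)
(600,)
51762
```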


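```python
# The 10 most frequent tokens; '\r' and "'" survive because they are
# not in the Tokenizer's default `filters`
list(token.word_index.items())[:10]
```

```
[('\r', 1),
 ('the', 2),
 ('to', 3),
 ('and', 4),
 ('ect', 5),
 ('for', 6),
 ('of', 7),
 ('a', 8),
 ("'", 9),
 ('subject', 10)]
```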

```python
# Vocabulary size; +1 below because index 0 is reserved for padding
number_of_top_words = len(token.word_index)
embedding_vector_length = 32

# STEP-1 Define the Model
model = Sequential()

# Embedding Layer: maps each token index to a 32-dimensional vector
model.add(Embedding(input_dim=number_of_top_words + 1,
                    output_dim=embedding_vector_length,
                    input_length=max_len))

# SimpleRNN-100: returns only the final hidden state (many-to-one)
model.add(SimpleRNN(100))

# FC-1: sigmoid outputs P(spam) in [0, 1], as binary_crossentropy expects
model.add(Dense(1, activation='sigmoid'))


# STEP-2 Compile the Model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

model.summary()  # prints the table itself and returns None

# STEP-3 Fit the Model
model.fit(X_train_pad, y_train, batch_size=64, epochs=5)
```

```
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
 embedding (Embedding)       (None, 600, 32)           1656416

 simple_rnn (SimpleRNN)      (None, 100)               13300

 dense (Dense)               (None, 1)                 101

Total params: 1,669,817
Trainable params: 1,669,817
Non-trainable params: 0
_________________________________________________________________
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
```
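The final `Dense(1)` layer has to match the loss. Binary cross-entropy, -(y log p + (1 - y) log(1 - p)), treats p as a probability, so p must lie strictly between 0 and 1, which is exactly the range of a sigmoid. A ReLU output is unbounded above, so it can leave that range. Keras clips predictions into a small epsilon range, which keeps the loss finite but does not turn a ReLU output into a probability. A NumPy sketch, with hypothetical helper names and made-up pre-activation values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(y_true, p, eps=1e-7):
    """Mean BCE; p is clipped into [eps, 1 - eps] to avoid log(0)."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

y = np.array([1.0, 0.0, 1.0])
logits = np.array([2.0, -1.0, 0.5])    # made-up pre-activation outputs of Dense(1)

p_sigmoid = sigmoid(logits)            # always in (0, 1): valid probabilities
p_relu = np.maximum(logits, 0.0)       # [2.0, 0.0, 0.5]: 2.0 is not a probability
print(p_sigmoid.round(3), binary_cross_entropy(y, p_sigmoid))
print(p_relu)
```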

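```python
X_test_values = X_test['text'].tolist()

# Reuse the tokenizer fitted on the training set; words unseen in training are dropped
X_test_seq = token.texts_to_sequences(X_test_values)

# Same max_len as training so the model sees the same input shape
X_test_pad = sequence.pad_sequences(X_test_seq, maxlen=max_len)

print(X_test_pad.shape)

print(X_test_pad[1].shape)
```

```
(1035, 600)
(600,)
```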


```python
# evaluate() returns [loss, accuracy], following the metrics given to compile()
prediction_score = model.evaluate(X_test_pad, y_test, verbose=0)

print('Test Loss and Test Accuracy', prediction_score)
```

```
Test Loss and Test Accuracy [0.30551135540008545, 0.8599033951759338]
```


# LSTM
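An LSTM adds a cell state c_t and three sigmoid gates (input, forget, output) plus a tanh candidate update, which together decide what to write, erase, and expose at each step. Because c_t is updated additively, gradients can flow across many steps more easily than through a SimpleRNN. Below is a minimal NumPy sketch of one step; the names `lstm_step`, `W`, `U`, `b` and the gate ordering inside the stacked weights are illustrative assumptions, not Keras internals. It also includes parameter-count formulas that reproduce the model summaries in this notebook: a SimpleRNN has units × (input_dim + units + 1) weights, and an LSTM has four times that, one set for each gate plus the candidate.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (input_dim, 4*units), U: (units, 4*units), b: (4*units,)."""
    z = x_t @ W + h_prev @ U + b
    i, f, g, o = np.split(z, 4)       # input gate, forget gate, candidate, output gate
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)
    c_t = f * c_prev + i * g          # additive cell-state update
    h_t = o * np.tanh(c_t)
    return h_t, c_t

def simple_rnn_params(input_dim, units):
    return units * (input_dim + units + 1)          # input weights, recurrent weights, bias

def lstm_params(input_dim, units):
    return 4 * simple_rnn_params(input_dim, units)  # 3 gates + candidate

rng = np.random.default_rng(0)
input_dim, units = 32, 100
W = rng.normal(scale=0.1, size=(input_dim, 4 * units))
U = rng.normal(scale=0.1, size=(units, 4 * units))
b = np.zeros(4 * units)
h, c = np.zeros(units), np.zeros(units)
for x_t in rng.normal(size=(5, input_dim)):         # 5 time steps
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape, c.shape)                              # (100,) (100,)

# Parameter counts for the models in this notebook (vocabulary of 51762 tokens)
embedding_params = (51762 + 1) * 32                  # 1656416
dense_params = 100 * 1 + 1                           # 101
print(simple_rnn_params(32, 100), lstm_params(32, 100))   # 13300 53200
print(embedding_params + simple_rnn_params(32, 100) + dense_params)  # 1669817
print(embedding_params + lstm_params(32, 100) + dense_params)        # 1709717
```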

```python
# Vocabulary size; +1 below because index 0 is reserved for padding
number_of_top_words = len(token.word_index)
embedding_vector_length = 32

# STEP-1 Define the Model
model = Sequential()

# Embedding Layer
model.add(Embedding(input_dim=number_of_top_words + 1,
                    output_dim=embedding_vector_length,
                    input_length=max_len))

# LSTM-100: dropout acts on the inputs, recurrent_dropout on the hidden-state
# connections (a non-zero recurrent_dropout disables the cuDNN kernel on GPU)
model.add(LSTM(100, dropout=0.3, recurrent_dropout=0.3))

# FC-1: sigmoid outputs P(spam) in [0, 1], as binary_crossentropy expects
model.add(Dense(1, activation='sigmoid'))


# STEP-2 Compile the Model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

model.summary()  # prints the table itself and returns None

# STEP-3 Fit the Model
model.fit(X_train_pad, y_train, batch_size=64, epochs=5)
```

```
Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
 embedding_1 (Embedding)     (None, 600, 32)           1656416

 lstm (LSTM)                 (None, 100)               53200

 dense_1 (Dense)             (None, 1)                 101

Total params: 1,709,717
Trainable params: 1,709,717
Non-trainable params: 0
_________________________________________________________________
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
```

```python
# evaluate() returns [loss, accuracy], following the metrics given to compile()
prediction_score = model.evaluate(X_test_pad, y_test, verbose=0)

print('Test Loss and Test Accuracy', prediction_score)
```

```
Test Loss and Test Accuracy [0.11802364140748978, 0.9681159257888794]
```
