<a href="https://colab.research.google.com/github/ShreyMhatre/nlp-learning-journey/blob/main/NLP_RNN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Please restart the Colab runtime** by going to "Runtime" -> "Restart runtime" in the menu. After the runtime restarts, run the following cell to install the necessary libraries.

In [2]:
!pip install --quiet tensorflow_datasets==4.9.2 tensorflow==2.18.0 protobuf==3.20.3

In [3]:
import tensorflow as tf
from tensorflow import keras
import tensorflow_datasets as tfds
import numpy as np

# We are going to train fairly large models. To avoid out-of-memory errors,
# tell TensorFlow to grow its GPU memory allocation only as needed.
physical_devices = tf.config.list_physical_devices('GPU')
if len(physical_devices)>0:
    tf.config.experimental.set_memory_growth(physical_devices[0], True)

ds_train, ds_test = tfds.load('ag_news_subset').values()

In [4]:
batch_size = 16
embed_size = 64

## Simple RNN classifier

While we can pass one-hot encoded tokens to the RNN layer directly, this is not a good idea because of their high dimensionality: with the 20,000-token vocabulary used below, every token would be a 20,000-dimensional vector. Therefore, we will use an embedding layer to lower the dimensionality of word vectors (to `embed_size = 64` here), followed by an RNN layer, and finally a Dense classifier.


> Note: In cases where the dimensionality isn't so high, for example when using character-level tokenization, it might make sense to pass one-hot encoded tokens directly into the RNN cell.
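To make the dimensionality argument concrete, here is a minimal NumPy sketch (not part of the original notebook). It assumes a randomly initialized matrix as a stand-in for a trained embedding and compares the shape of one-hot vectors with the shape of embedded vectors for a toy three-token sequence.

```python
import numpy as np

vocab_size = 20000   # vocabulary size used in this notebook
embed_size = 64      # embedding size used in this notebook

rng = np.random.default_rng(0)
# Stand-in for an embedding matrix (random here, learned in a real model):
# one row of embed_size values per token id
embedding_matrix = rng.normal(size=(vocab_size, embed_size)).astype(np.float32)

token_ids = np.array([5, 42, 7])  # toy 3-token sequence (arbitrary ids)

# One-hot encoding: each token becomes a vocab_size-dimensional vector
one_hot = np.zeros((len(token_ids), vocab_size), dtype=np.float32)
one_hot[np.arange(len(token_ids)), token_ids] = 1.0

# Embedding lookup: each token becomes an embed_size-dimensional vector
embedded = embedding_matrix[token_ids]

print(one_hot.shape)   # (3, 20000)
print(embedded.shape)  # (3, 64)

# The lookup gives the same result as multiplying one-hot rows by the matrix
assert np.allclose(one_hot @ embedding_matrix, embedded)
```

The final assertion shows why an embedding layer loses nothing compared with one-hot input followed by a linear layer: it computes the same product, only without materializing the large sparse vectors.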





In [5]:
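vocab_size = 20000

# Maps raw strings to integer token ids (id 0 is reserved for padding).
# No input_shape argument is needed; passing one only triggers a Keras warning.
vectorizer = keras.layers.TextVectorization(max_tokens=vocab_size)

model = keras.models.Sequential([
    keras.layers.Embedding(vocab_size, embed_size),  # token ids -> dense vectors
    keras.layers.SimpleRNN(16),                      # returns the final hidden state
    keras.layers.Dense(4, activation='softmax')      # one output per AG News class
])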


> Note: We use an untrained embedding layer here for simplicity, but for better results we can initialize the embedding layer with pretrained vectors such as Word2Vec, as described in the previous unit. It would be a good exercise for you to adapt this code to work with pretrained embeddings.
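As a starting point for that exercise, the sketch below shows one possible way to build an embedding matrix whose rows line up with a vocabulary. Everything in it is an assumption for illustration: the `pretrained` dictionary holds made-up vectors rather than real Word2Vec output, the vocabulary is a toy list, and `build_embedding_matrix` is a hypothetical helper.

```python
import numpy as np

embed_size = 4  # toy size for illustration only

# Hypothetical pretrained vectors (made-up values, not real Word2Vec output)
pretrained = {
    "stocks": np.array([0.1, 0.2, 0.3, 0.4], dtype=np.float32),
    "game":   np.array([0.5, 0.6, 0.7, 0.8], dtype=np.float32),
}

# Toy vocabulary in the layout TextVectorization uses by default:
# index 0 is padding, index 1 is the out-of-vocabulary token
vocabulary = ["", "[UNK]", "stocks", "game", "zebra"]

def build_embedding_matrix(vocabulary, pretrained, embed_size):
    """Return a (len(vocabulary), embed_size) matrix; unknown words stay zero."""
    matrix = np.zeros((len(vocabulary), embed_size), dtype=np.float32)
    for index, word in enumerate(vocabulary):
        vector = pretrained.get(word)
        if vector is not None:
            matrix[index] = vector
    return matrix

embedding_matrix = build_embedding_matrix(vocabulary, pretrained, embed_size)
print(embedding_matrix.shape)  # (5, 4)
```

In the notebook, the vocabulary could come from `vectorizer.get_vocabulary()`, and the resulting matrix could be supplied to the embedding layer through `embeddings_initializer=keras.initializers.Constant(embedding_matrix)`, with the layer's input dimension matching the number of rows. Whether to freeze the layer afterwards (`trainable=False`) is a design choice worth experimenting with.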

In [6]:
def extract_title(x):
    return x['title']

def tupelize_title(x):
    return (extract_title(x),x['label'])

print('Training vectorizer')
vectorizer.adapt(ds_train.take(2000).map(extract_title))

Training vectorizer


In [7]:
def vectorize_text(text, label):
    text = vectorizer(text)
    return text, label

ds_train_vec = ds_train.map(tupelize_title).map(vectorize_text)
ds_test_vec = ds_test.map(tupelize_title).map(vectorize_text)

In [8]:
model.compile(loss='sparse_categorical_crossentropy',metrics=['acc'], optimizer='adam')
model.fit(ds_train_vec.padded_batch(batch_size, padded_shapes=([None], [])),
          validation_data=ds_test_vec.padded_batch(batch_size, padded_shapes=([None], [])))

7500/7500 ━━━━━━━━━━━━━━━━━━━━ 51s 6ms/step - acc: 0.6926 - loss: 0.8247 - val_acc: 0.7817 - val_loss: 0.6170


<keras.src.callbacks.history.History at 0x7dd380110350>

In [9]:
model.summary()

### Revisiting variable sequences

In [11]:
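# Helpers that join title and description into a single text field.
# Note: they are not used in this cell; training below still runs on the
# title-only ds_train_vec / ds_test_vec datasets built earlier.
def extract_text(x):
    return x['title'] + ' ' + x['description']

def tupelize(x):
    return (extract_text(x), x['label'])

model = keras.models.Sequential([
    # mask_zero=True marks token id 0 (padding) so the RNN skips those steps
    keras.layers.Embedding(vocab_size, embed_size, mask_zero=True),
    keras.layers.SimpleRNN(16),
    keras.layers.Dense(4, activation='softmax')
])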

model.compile(loss='sparse_categorical_crossentropy',metrics=['acc'], optimizer='adam')
model.fit(ds_train_vec.padded_batch(batch_size, padded_shapes=([None], [])),
          validation_data=ds_test_vec.padded_batch(batch_size, padded_shapes=([None], [])))

7500/7500 ━━━━━━━━━━━━━━━━━━━━ 49s 6ms/step - acc: 0.7157 - loss: 0.7314 - val_acc: 0.8132 - val_loss: 0.5139


<keras.src.callbacks.history.History at 0x7dd308ea8390>
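The `padded_batch` call pads every sequence in a minibatch with zeros up to the length of the longest one, and `mask_zero=True` tells the layers after the embedding to ignore those padded positions. The NumPy sketch below imitates both steps outside of TensorFlow; `pad_batch` is a hypothetical helper and the token ids are toy values.

```python
import numpy as np

def pad_batch(sequences, pad_value=0):
    """Right-pad integer sequences to the length of the longest one."""
    max_len = max(len(seq) for seq in sequences)
    batch = np.full((len(sequences), max_len), pad_value, dtype=np.int64)
    for row, seq in enumerate(sequences):
        batch[row, :len(seq)] = seq
    return batch

# Toy token-id sequences of different lengths (0 is reserved for padding)
sequences = [[12, 7, 3], [5], [9, 4]]

batch = pad_batch(sequences)
mask = batch != 0  # True for real tokens, False for padding

print(batch.tolist())             # [[12, 7, 3], [5, 0, 0], [9, 4, 0]]
print(mask.sum(axis=1).tolist())  # [3, 1, 2]
```

Without a mask, the recurrent layer would keep updating its state on the trailing zeros, so short sequences in a batch would be affected by how long their neighbors happen to be.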

## LSTM: Long short-term memory

In [16]:
model = keras.models.Sequential([
    keras.layers.Embedding(vocab_size, embed_size),
    keras.layers.LSTM(8),
    keras.layers.Dense(4,activation='softmax')
])

model.compile(loss='sparse_categorical_crossentropy',metrics=['acc'], optimizer='adam')
model.fit(ds_train_vec.padded_batch(batch_size, padded_shapes=([None], [])),
          validation_data=ds_test_vec.padded_batch(batch_size, padded_shapes=([None], [])))

7500/7500 ━━━━━━━━━━━━━━━━━━━━ 55s 7ms/step - acc: 0.7011 - loss: 0.7560 - val_acc: 0.8182 - val_loss: 0.4995


<keras.src.callbacks.history.History at 0x7dd305fac550>

## Bidirectional and multilayer RNNs

Keras makes constructing multilayer and bidirectional networks easy: you just add more recurrent layers to the model, and wrap a layer in `keras.layers.Bidirectional` to process the sequence in both directions. For all recurrent layers except the last one, we need to specify `return_sequences=True`, because the next recurrent layer takes the hidden state at every time step as its input, not just the final state of the recurrent computation.
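To illustrate what `return_sequences` changes, here is a minimal NumPy sketch of a tanh RNN over a single sequence. It is a simplified stand-in, not the Keras implementation: `simple_rnn` is a hypothetical helper, the weights are random, and the bidirectional part reuses the same weights in both directions, whereas a real bidirectional layer learns separate ones.

```python
import numpy as np

def simple_rnn(inputs, w_x, w_h, b, return_sequences=False):
    """Minimal tanh RNN over one sequence of shape (time_steps, input_dim)."""
    hidden = np.zeros(w_h.shape[0])
    states = []
    for x_t in inputs:
        hidden = np.tanh(x_t @ w_x + hidden @ w_h + b)
        states.append(hidden)
    states = np.stack(states)
    # Either every intermediate state or only the final one
    return states if return_sequences else states[-1]

rng = np.random.default_rng(0)
time_steps, input_dim, units = 5, 3, 4
inputs = rng.normal(size=(time_steps, input_dim))
w_x = rng.normal(size=(input_dim, units))
w_h = rng.normal(size=(units, units))
b = np.zeros(units)

all_states = simple_rnn(inputs, w_x, w_h, b, return_sequences=True)
last_state = simple_rnn(inputs, w_x, w_h, b)
print(all_states.shape)  # (5, 4)
print(last_state.shape)  # (4,)

# A second recurrent layer needs one input vector per time step,
# so it consumes all_states rather than last_state
w_x2 = rng.normal(size=(units, units))
second_layer_out = simple_rnn(all_states, w_x2, w_h, b)
print(second_layer_out.shape)  # (4,)

# Bidirectional idea: also run over the reversed sequence and concatenate
backward_last = simple_rnn(inputs[::-1], w_x, w_h, b)
bidirectional = np.concatenate([last_state, backward_last])
print(bidirectional.shape)  # (8,)
```

The concatenation step is why a bidirectional layer built from 64-unit cells hands 128 features per example to the next layer.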

In [17]:
model = keras.models.Sequential([
    keras.layers.Embedding(vocab_size, 128, mask_zero=True),
    keras.layers.Bidirectional(keras.layers.LSTM(64,return_sequences=True)),
    keras.layers.Bidirectional(keras.layers.LSTM(64)),
    keras.layers.Dense(4,activation='softmax')
])

model.compile(loss='sparse_categorical_crossentropy',metrics=['acc'], optimizer='adam')
model.fit(ds_train_vec.padded_batch(batch_size, padded_shapes=([None], [])),
          validation_data=ds_test_vec.padded_batch(batch_size, padded_shapes=([None], [])))

7500/7500 ━━━━━━━━━━━━━━━━━━━━ 101s 12ms/step - acc: 0.7486 - loss: 0.6510 - val_acc: 0.8236 - val_loss: 0.4940


<keras.src.callbacks.history.History at 0x7dd3089dbd10>