# Introduction to NLP

<!--<badge>--><a href="https://colab.research.google.com/github/TheAIDojo/AI_4_Climate_Bootcamp/blob/main/Week 04 - Introduction to Sequence Modelling/1. Introduction to NLP.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a><!--</badge>-->

Natural Language Processing (NLP) is a field that focuses on the interaction between computers and human language. It deals with the analysis, understanding and generation of human language in a meaningful and useful manner.

One of the applications of NLP is sentiment analysis, which is the task of determining whether a given piece of text has a positive, negative or neutral sentiment. Another application is machine translation, which is the task of translating text from one language into another.

TensorFlow and its high-level API Keras are popular, flexible open-source libraries for deep learning, including NLP. They let you build and train deep learning models for NLP tasks such as sentiment analysis, text classification and text generation.

Pandas, NumPy and Matplotlib are also commonly used in NLP projects: pandas and NumPy for loading and manipulating data, and Matplotlib for visualizing it.

For further reading on NLP and its applications, check out the following resources:
- [TensorFlow NLP Tutorials](https://www.tensorflow.org/tutorials/text)
- [Keras NLP Guide](https://keras.io/guides/keras_nlp/getting_started/)

Let's get started!

## Table of Contents <a name="toc"></a>
- [Text Preprocessing](#text-preprocessing)
  - [Text Cleanup](#text-cleanup)
  - [Text Tokenization](#tokenization)
  - [Text Padding](#text-padding)
- [Dense Model Training](#dense-model)
- [Word Embeddings](#word-embeddings)
  - [Embedding Model](#embedding-model)
- [Recurrent Neural Networks (RNNs)](#rnn)
  - [Simple RNN Model](#simple-rnn-model)
  - [LSTM Model](#lstm-model)
  - [GRU Model](#gru-model)
  - [Bidirectional RNN Model](#bidirectional-rnn-model)


In [45]:
# import libraries
import re  # built-in regular expression module
import string  # built-in string module

import matplotlib.pyplot as plt
import nltk  # Natural Language Toolkit, needed for nltk.download below
import numpy as np
import pandas as pd
import tensorflow as tf
from nltk.corpus import stopwords  # stopwords module from NLTK
from sklearn import model_selection

In [None]:
# download stopwords
nltk.download("stopwords")

## Download the Dataset from Kaggle

Before running the cell below, make sure you upload your Kaggle API token to the notebook. In Colab, you can do this by clicking on the "Files" tab on the left, then clicking on the "Upload" button. To get your Kaggle API token, go to your Kaggle account, click on "My Account", then click on "Create New API Token"; this downloads a "kaggle.json" file, which you can then upload to the notebook.

In [None]:
# upload kaggle.json into this folder before running this command
!mkdir -p /root/.kaggle
!cp kaggle.json /root/.kaggle/kaggle.json
!chmod 600 /root/.kaggle/kaggle.json
!kaggle datasets download -d lakshmi25npathi/imdb-dataset-of-50k-movie-reviews
!unzip imdb-dataset-of-50k-movie-reviews.zip

In [27]:
# read dataset from csv file
df = pd.read_csv("IMDB Dataset.csv")
df.head()

|   | review | sentiment |
|---|--------|-----------|
| 0 | One of the other reviewers has mentioned that ... | positive |
| 1 | A wonderful little production. \<br />\<br />The... | positive |
| 2 | I thought this was a wonderful way to spend ti... | positive |
| 3 | Basically there's a family where a little boy ... | negative |
| 4 | Petter Mattei's "Love in the Time of Money" is... | positive |


## Text Preprocessing <a name="text-preprocessing"></a>
[Back to Top](#toc)

Before training our NLP models, it is necessary to preprocess the text data so that it is in a format that can be easily understood by the models. There are several steps involved in text preprocessing, including text cleanup, tokenization, and padding.

- Text cleanup is the process of removing any unwanted elements from the text, such as punctuation, stop words, numbers, and special characters. This step is important because these elements can often introduce noise into the data and negatively impact the performance of the model. Some common text cleanup tasks include:
  - Removing punctuation
  - Removing stop words
  - Removing numbers
  - Removing special characters
  - Lowercasing the text

- Tokenization is the process of splitting a piece of text into individual tokens, typically words or subwords, and mapping each token to an integer index. This step is important because models operate on numbers, and it lets them work with individual words rather than with the entire text as one string.

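- Padding is the process of adding padding tokens (usually zeros) to the sequences so that they all have the same length; sequences that are too long are truncated. This step is important because models are trained on batches of equal-length sequences, and many models require a fixed input length.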

Let's go through them step by step.

In [47]:
# first, we will set some parameters
vocab_size = 8000  # number of words in the vocabulary, we will use the top 8000 most common words
max_length = 120  # maximum length of a review, we will truncate reviews longer than 120 words and pad reviews shorter than 120 words
embedding_dim = 50  # dimension of the embedding vector, we will use 50-dimensional embedding vectors
batch_size = 32  # number of reviews in each batch
seed = 42  # random seed

### Text Cleanup <a name="text-cleanup"></a>
[Back to Top](#toc)

Text cleanup is an important step in text preprocessing that removes any unwanted elements from the text, such as punctuation, stop words, numbers, and special characters. The motivation behind text cleanup is to reduce noise in the data and improve the performance of the model.

There are several tasks involved in text cleanup, including:
- Removing punctuation
- Removing stop words
- Removing numbers
- Removing special characters
- Lowercasing the text
  

It is important to consider which text cleanup tasks are necessary for your specific use case. For example, in sentiment analysis, numbers can carry sentiment (a review might say "10/10" or "1/10"), so removing them may discard useful information. In machine translation, punctuation and numbers usually need to be preserved, because they must appear in the translated output.

In the next cell, we will show how to perform text cleanup in Python using the built-in `re` and `string` modules, with optional stopword removal using NLTK.

For comprehensive text cleanup functions, check out the following code:
- [English Text Cleanup](https://github.com/jfilter/clean-text/blob/main/cleantext/clean.py)
- [Arabic Text Cleanup](https://github.com/ARBML/tnkeeh/blob/master/tnkeeh/tnkeeh.py)


In [28]:
# build the stopword set once, instead of on every call
stop_words = set(stopwords.words("english"))


def text_cleanup(text, remove_stopwords=False):
    # change all text to lowercase
    text = text.lower()

    # remove HTML tags (e.g. <br />); replace them with a space so neighboring words don't merge
    text = re.sub(r"<.*?>", " ", text)

    # remove URLs and emails before punctuation is stripped, while they are still recognizable
    text = re.sub(r"http\S+", " ", text)
    text = re.sub(r"\S*@\S*\s?", " ", text)

    # remove mentions (@username) and hashtags (#)
    text = re.sub(r"@\S+", " ", text)
    text = re.sub(r"#\S+", " ", text)

    # remove words containing numbers (e.g. "80s", "mp3") as well as standalone numbers
    text = re.sub(r"\w*\d\w*", " ", text)

    # replace punctuation with spaces
    text = re.sub(f"[{re.escape(string.punctuation)}]", " ", text)

    # collapse repeated whitespace and trim leading/trailing spaces
    text = re.sub(r"\s+", " ", text).strip()

    # note that removing stopwords is optional and depends on the task: in some tasks it helps, in others it hurts.
    # as a rule of thumb, try both options and see which one works better for your task.
    # for sentiment analysis, be careful: NLTK's English stopword list includes negations such as "not" and "no".
    if remove_stopwords:
        # the text is already lowercase and punctuation-free, so splitting on whitespace yields the words
        words = [word for word in text.split() if word not in stop_words]
        text = " ".join(words)

    return text


# apply text_cleanup function to the reviews
df["review"] = df["review"].map(text_cleanup)
df.head()

|   | review | sentiment |
|---|--------|-----------|
| 0 | one of the other reviewers has mentioned that ... | positive |
| 1 | a wonderful little production the filming tech... | positive |
| 2 | i thought this was a wonderful way to spend ti... | positive |
| 3 | basically there s a family where a little boy ... | negative |
| 4 | petter mattei s love in the time of money is a... | positive |


### Text Tokenization <a name="tokenization"></a>
[Back to Top](#toc)

Text tokenization is the process of splitting a sentence or document into tokens (here, individual words) and mapping each token to an integer index. This is an important step in preparing text data for training machine learning models, which operate on numbers rather than strings.

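In Keras and TensorFlow, text tokenization can be performed using the `Tokenizer` class (`tf.keras.preprocessing.text.Tokenizer`). Recent TensorFlow releases mark this class as deprecated and recommend the `tf.keras.layers.TextVectorization` layer for new code, but `Tokenizer` remains convenient for illustration. This class has several important parameters: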

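- `num_words`: The maximum number of words to keep, based on word frequency. Index 0 is reserved and only indices below `num_words` are used, so the `num_words - 1` most frequent entries (including the OOV token) are kept; all other words are mapped to the Out-of-Vocabulary (OOV) token when texts are converted to sequences. Note that `word_index` still contains every word seen during fitting.
- `oov_token`: The string that will be used to represent OOV words. If it is not set, OOV words are dropped from the sequences.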
- `filters`: A string of characters to filter out, for example punctuation.
- `lower`: A flag to convert all text to lowercase before tokenization.
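To make the `num_words` and `oov_token` behavior concrete, here is a simplified pure-Python sketch of roughly what `Tokenizer` does when fitting and converting texts. It is an illustration, not the Keras implementation: `build_word_index` and `texts_to_ids` are hypothetical helpers, and they skip the built-in `filters` and `lower` steps by assuming the text has already been cleaned.

```python
from collections import Counter


def build_word_index(texts, oov_token="<OOV>"):
    """Hypothetical helper: assign indices by descending word frequency.

    Index 1 goes to the OOV token and index 0 stays unused (reserved for padding).
    Ties keep first-seen order, because Counter.most_common uses a stable sort.
    """
    counts = Counter(word for text in texts for word in text.split())
    vocab = [oov_token] + [word for word, _ in counts.most_common()]
    return {word: idx for idx, word in enumerate(vocab, start=1)}


def texts_to_ids(texts, word_index, num_words, oov_token="<OOV>"):
    """Hypothetical helper: map words to indices; unseen words and indices >= num_words become OOV."""
    oov_id = word_index[oov_token]
    sequences = []
    for text in texts:
        ids = [word_index.get(word, oov_id) for word in text.split()]
        sequences.append([i if i < num_words else oov_id for i in ids])
    return sequences


corpus = ["the movie was good", "the movie was bad", "good acting"]
word_index = build_word_index(corpus)
print(word_index)
# {'<OOV>': 1, 'the': 2, 'movie': 3, 'was': 4, 'good': 5, 'bad': 6, 'acting': 7}

# "acting" (index 7 >= 5) and the unseen word "terrible" both become the OOV index 1
print(texts_to_ids(["the acting was terrible"], word_index, num_words=5))
# [[2, 1, 4, 1]]
```

With `num_words=5`, only the four entries with indices 1 to 4 (`<OOV>`, `the`, `movie`, `was`) are used, which mirrors the `num_words - 1` rule.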

Let's see how to use the Tokenizer class to tokenize text in Python.

In [29]:
# create tokenizer object
tokenizer = tf.keras.preprocessing.text.Tokenizer(
    num_words=vocab_size, oov_token="<OOV>"
)

# fit tokenizer on the reviews
tokenizer.fit_on_texts(df["review"])

# preview the word index, notice that the word index is sorted by frequency
word_index = tokenizer.word_index
print({k: word_index[k] for k in list(word_index)[:10]})

{'<OOV>': 1, 'the': 2, 'and': 3, 'a': 4, 'of': 5, 'to': 6, 'is': 7, 'it': 8, 'in': 9, 'i': 10}


In [37]:
# let's see how the tokenizer works
text = "This is a sample text, it contains some words that are repeated, and some words that are not repeated."
sequence = tokenizer.texts_to_sequences([text])
print(sequence)

[[11, 7, 4, 1, 3024, 8, 1372, 49, 663, 12, 26, 2596, 3, 49, 663, 12, 26, 24, 2596]]


In [39]:
# now let's convert the reviews to sequences
sequences = tokenizer.texts_to_sequences(df["review"])

# preview some sequences, notice that the sequences have different lengths
for i in range(5):
    print(sequences[i])

[29, 5, 2, 78, 2040, 47, 1051, 12, 101, 150, 42, 3068, 395, 21, 231, 30, 3173, 33, 26, 204, 15, 11, 7, 615, 48, 591, 18, 69, 2, 88, 149, 12, 3217, 69, 45, 3068, 14, 92, 5323, 3, 1, 136, 5, 561, 62, 266, 9, 204, 38, 2, 649, 142, 1720, 69, 11, 7, 24, 4, 117, 17, 2, 7805, 2309, 41, 1, 11, 117, 2572, 57, 5843, 18, 5465, 6, 1454, 372, 41, 561, 92, 7, 3784, 9, 2, 356, 357, 5, 2, 649, 8, 7, 433, 3068, 15, 12, 7, 2, 1, 358, 6, 2, 1, 6785, 2543, 1032, 1, 8, 2683, 1398, 23, 1, 520, 35, 4638, 2437, 5, 2, 1182, 116, 31, 2, 6924, 28, 2884, 1, 3, 386, 1, 37, 1, 7, 24, 298, 23, 2, 4848, 2914, 520, 7, 342, 6, 108, 1, 1, 1, 1, 4989, 7684, 2425, 3, 53, 37, 1, 325, 1, 7229, 1, 3, 1, 1, 26, 112, 224, 241, 10, 61, 133, 2, 281, 1312, 5, 2, 117, 7, 682, 6, 2, 193, 12, 8, 267, 116, 78, 274, 574, 22, 2993, 819, 183, 1289, 4125, 17, 2473, 1211, 819, 1419, 819, 865, 3068, 153, 22, 940, 185, 2, 88, 395, 10, 124, 210, 3217, 69, 15, 37, 1604, 8, 14, 2220, 10, 412, 22, 133, 10, 14, 1567, 17, 8, 19, 15, 10, 291, 53, 

### Text Padding <a name="text-padding"></a>
[Back to Top](#toc)

In NLP, it is common to have sequences of different lengths; for example, in a sentiment analysis task, reviews vary in length. However, neural networks are trained on batches stored as rectangular tensors, so all sequences in a batch must have the same length, and models with a fixed-size input, such as the dense models in this notebook, need every input to have the same length. Padding (together with truncation) is therefore used to make all sequences the same length.

In TensorFlow and Keras, padding can be done using the `pad_sequences` function from the `tf.keras.preprocessing.sequence` module. The important parameters for this function are:

- `sequences`: A list of sequences, where each sequence is a list of integers.
- `maxlen`: The maximum length of the sequences. Sequences longer than `maxlen` are truncated, and shorter ones are padded (with zeros by default; the fill value is set by the `value` parameter).
- `padding`: The type of padding to use, either `'pre'` or `'post'`. Default is `'pre'`.
- `truncating`: The type of truncating to use, either `'pre'` or `'post'`. Default is `'pre'`.

Here's an example of how to use the `pad_sequences` function:

```python
import tensorflow as tf

sequences = [[1, 2, 3, 4, 5, 6], [4, 5], [6]]
padded_sequences = tf.keras.preprocessing.sequence.pad_sequences(sequences, maxlen=5, padding="post")
print(padded_sequences)
```

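The output will be:

```
[[2 3 4 5 6]
 [4 5 0 0 0]
 [6 0 0 0 0]]
```

In this example, all sequences now have a length of 5. The two shorter sequences were padded with zeros on the right (`padding='post'`), while the first sequence, which was longer than `maxlen`, lost its first element because `truncating` defaults to `'pre'`.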

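The interaction between `padding` and `truncating` is easy to miss. The following pure-Python sketch mimics these rules for lists of integers; `pad` is a hypothetical helper written for illustration, not part of Keras, and it ignores details such as the `dtype` argument.

```python
def pad(sequences, maxlen, padding="pre", truncating="pre", value=0):
    """Hypothetical pure-Python re-creation of pad_sequences' padding/truncating rules."""
    result = []
    for seq in sequences:
        seq = list(seq)
        if len(seq) > maxlen:
            # 'pre' drops tokens from the start (keeps the end); 'post' drops them from the end
            seq = seq[len(seq) - maxlen:] if truncating == "pre" else seq[:maxlen]
        fill = [value] * (maxlen - len(seq))
        # 'pre' puts the fill values before the tokens; 'post' puts them after
        result.append(fill + seq if padding == "pre" else seq + fill)
    return result


sequences = [[1, 2, 3, 4, 5, 6], [4, 5], [6]]
print(pad(sequences, maxlen=5, padding="post"))
# [[2, 3, 4, 5, 6], [4, 5, 0, 0, 0], [6, 0, 0, 0, 0]]
print(pad(sequences, maxlen=5, padding="post", truncating="post"))
# [[1, 2, 3, 4, 5], [4, 5, 0, 0, 0], [6, 0, 0, 0, 0]]
```

For the IMDB reviews, `padding="post"` combined with the default `truncating="pre"` means that a long review keeps its last 120 tokens, not its first 120.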






In [43]:
# pad the sequences
padded_sequences = tf.keras.preprocessing.sequence.pad_sequences(
    sequences, maxlen=max_length, padding="post"
)

# print padded_sequences shape, notice that the shape is uniform: (number of reviews, max_length)
print("Padded Sequences Shape: ", padded_sequences.shape)

Padded Sequences Shape:  (50000, 120)


### Standard Preprocessing Steps

The following preprocessing steps are common to other classification tasks: we'll encode the labels and split the data into training and test sets.

In [48]:
# store padded sequences to `x`
x = padded_sequences

# encode labels to y
y = df["sentiment"].map({"positive": 1, "negative": 0})


# split the data into training and testing sets
x_train, x_test, y_train, y_test = model_selection.train_test_split(
    x, y, test_size=0.1, random_state=seed, stratify=y
)

print("x_train shape: ", x_train.shape)
print("x_test shape: ", x_test.shape)
print("y_train shape: ", y_train.shape)
print("y_test shape: ", y_test.shape)

x_train shape:  (45000, 120)
x_test shape:  (5000, 120)
y_train shape:  (45000,)
y_test shape:  (5000,)


## Dense Model Training <a name="dense-model"></a>
[Back to Top](#toc)

We will first train a simple feed-forward (Dense) neural network directly on the padded token indices. This model has an input of 120 token indices, two hidden layers, and an output layer.

In [49]:
dense_model = tf.keras.Sequential(
    [
        tf.keras.layers.Dense(64, activation="relu", input_shape=(max_length,)),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ]
)

dense_model.summary()


Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 64)                7744      
                                                                 
 dense_1 (Dense)             (None, 32)                2080      
                                                                 
 dense_2 (Dense)             (None, 1)                 33        
                                                                 
Total params: 9,857
Trainable params: 9,857
Non-trainable params: 0
_________________________________________________________________


In [50]:
# compile the model
dense_model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# train the model
dense_model.fit(
    x_train, y_train, epochs=10, batch_size=batch_size, validation_data=(x_test, y_test)
)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x331f435e0>

When we feed the padded sequences directly into a dense-only network, each word is represented only by its integer index from the tokenizer, and the network treats these indices as ordinary numerical values.

The problem is that these indices are arbitrary labels, not measurements. A word's index reflects only its frequency rank, so the numeric distance between, for example, the indices of "great" and "awful" says nothing about their meaning, yet the network multiplies and adds the indices as if it did. In addition, each input neuron corresponds to a position in the review rather than to a word, so the same word contributes differently depending on where it appears.

As a result, the network has no reliable way to tell sentiment-bearing words such as "great" or "awful" apart from words such as "the" or "and".

In practice, a dense-only network trained on raw token indices typically performs close to random chance, which is about 50% accuracy on this balanced dataset (25,000 positive and 25,000 negative reviews), because it cannot learn meaningful relationships between words and sentiments.

To overcome this limitation, we need to use a more advanced approach such as word embeddings, which will allow the network to learn meaningful relationships between words and sentiments.
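A toy illustration of the problem with raw indices, using made-up weights and token indices rather than anything from the trained model: the two vocabularies below are equally valid, since they differ only in which word received which index, yet a single first-layer neuron (a weighted sum of the raw indices, bias omitted) produces very different outputs for the same review.

```python
# made-up weights for one first-layer neuron, one weight per input position
weights = [0.5, -0.2, 0.1, 0.3]
review = ["great", "movie", "the", "end"]

# two arbitrary but equally valid vocabularies: only "great" and "end" swap indices
ids_a = {"great": 17, "movie": 3, "the": 2, "end": 90}
ids_b = {"great": 90, "movie": 3, "the": 2, "end": 17}


def neuron_output(word_ids):
    """Weighted sum of raw token indices, as a Dense layer computes (bias omitted)."""
    return sum(w * word_ids[word] for w, word in zip(weights, review))


print(round(neuron_output(ids_a), 2))  # 35.1
print(round(neuron_output(ids_b), 2))  # 49.7
```

Relabeling the vocabulary should not change what a review means, but here it changes what the network computes; an embedding layer avoids this by giving each index its own learned vector instead of treating the index as a number.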

## Word Embeddings <a name="word-embeddings"></a>
[Back to Top](#toc)

Word embeddings are a way to represent words in a dense vector representation, where semantically similar words have similar vector representations. The motivation behind using word embeddings is that one-hot encoded representations of words can be very sparse and high-dimensional, making it difficult for a neural network to learn from them.

In the models below, word embeddings are learned from the training data together with the rest of the network (they can also be initialized from pretrained embeddings such as word2vec or GloVe), and they can capture semantic relationships between words. For example, in a word embedding, vectors for words like "king", "queen", "prince", and "princess" might be close together, while vectors for words like "computer" and "book" might be far apart.

In Keras, word embeddings can be learned using the `Embedding` layer, which maps each integer word index to a dense vector. The length of these vectors (the embedding dimension) is a hyperparameter that can be tuned. The layer's weight matrix has shape `(vocabulary size, embedding dimension)`, and for an input batch of shape `(batch size, sequence length)` it outputs a tensor of shape `(batch size, sequence length, embedding dimension)`.

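The following NumPy sketch illustrates what an embedding layer computes, assuming a tiny vocabulary and random (untrained) weights: it looks up one row of the weight matrix per token index, which is mathematically the same as multiplying a one-hot encoding by that matrix, only far cheaper. The sizes are illustrative, not the notebook's.

```python
import numpy as np

vocab_size, embedding_dim = 6, 3  # tiny illustrative sizes
rng = np.random.default_rng(0)

# the embedding layer's trainable weights: one row (vector) per word index
embedding_matrix = rng.normal(size=(vocab_size, embedding_dim))

token_ids = np.array([[2, 5, 0], [4, 4, 1]])  # shape (batch size, sequence length)

# an embedding lookup is just row indexing into the weight matrix
looked_up = embedding_matrix[token_ids]  # shape (batch size, sequence length, embedding dim)

# the same result via one-hot encoding followed by a matrix multiplication
one_hot = np.eye(vocab_size)[token_ids]  # shape (batch size, sequence length, vocab size)
via_matmul = one_hot @ embedding_matrix

print(looked_up.shape)                     # (2, 3, 3)
print(np.allclose(looked_up, via_matmul))  # True
```

Note that index 4 appears twice in the second sequence and receives the same vector both times; during training, the gradient updates that shared row.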


### Embedding Model <a name="embedding-model"></a>
[Back to Top](#toc)

We will now train a neural network model that uses word embeddings. This model has an embedding layer, a `Flatten` layer that concatenates the 120 word vectors of each review into a single vector of 6,000 values, two hidden Dense layers, and an output layer.

In [54]:
embedding_model = tf.keras.Sequential(
    [
        tf.keras.layers.Embedding(vocab_size, embedding_dim, input_length=max_length),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ]
)

embedding_model.summary()

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_1 (Embedding)     (None, 120, 50)           400000    
                                                                 
 flatten (Flatten)           (None, 6000)              0         
                                                                 
 dense_6 (Dense)             (None, 64)                384064    
                                                                 
 dense_7 (Dense)             (None, 32)                2080      
                                                                 
 dense_8 (Dense)             (None, 1)                 33        
                                                                 
Total params: 786,177
Trainable params: 786,177
Non-trainable params: 0
_________________________________________________________________
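The parameter counts reported by `summary()` follow from simple arithmetic. This sketch reuses the notebook's hyperparameters (`vocab_size=8000`, `embedding_dim=50`, `max_length=120`); `dense_params` is a hypothetical helper. It shows that the first Dense layer after `Flatten` holds almost as many weights as the embedding itself.

```python
vocab_size, embedding_dim, max_length = 8000, 50, 120  # the notebook's settings


def dense_params(inputs, units):
    """A Dense layer has one weight per (input, unit) pair plus one bias per unit."""
    return inputs * units + units


embedding = vocab_size * embedding_dim  # one 50-dimensional vector per vocabulary entry
flattened = max_length * embedding_dim  # Flatten turns (120, 50) into 6,000 features

layers = [
    embedding,
    dense_params(flattened, 64),
    dense_params(64, 32),
    dense_params(32, 1),
]
print(layers, sum(layers))  # [400000, 384064, 2080, 33] 786177

# for comparison, the first layer of the dense-only model sees just 120 raw indices
print(dense_params(max_length, 64))  # 7744
```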


In [55]:
# compile the model
embedding_model.compile(
    optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"]
)

# train the model
embedding_model.fit(
    x_train, y_train, epochs=10, batch_size=batch_size, validation_data=(x_test, y_test)
)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x2ee0891c0>


Despite the improved performance that using word embeddings and dense layers in a neural network model can bring to sentiment analysis tasks, there are still limitations that need to be addressed.

A major limitation is that dense layers and embeddings only capture shallow semantic information, meaning that they are not capable of capturing the deeper relationships between words and the context in which they are used. This can be addressed by using other types of layers such as recurrent neural networks (RNNs) or convolutional neural networks (CNNs).

Notice these two sentences:
```
The movie was good, not bad at all.
The movie was bad, not good at all.
```
The first sentence is positive, while the second sentence is negative. However, both sentences contain exactly the same words; only their order differs. A model that looks at each word individually, without considering the context in which it is used, has no way of distinguishing between them.

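A quick check makes the point concrete: once the two sentences are reduced to word counts (a bag-of-words view that ignores order), they are identical, and only the order of the words tells them apart. The helpers `words_in_order` and `bag_of_words` are illustrative, not part of the notebook's pipeline.

```python
import re
from collections import Counter


def words_in_order(sentence):
    """Lowercase the sentence and return its words, dropping punctuation."""
    return re.findall(r"[a-z]+", sentence.lower())


def bag_of_words(sentence):
    """Count how often each word occurs, ignoring word order."""
    return Counter(words_in_order(sentence))


positive = "The movie was good, not bad at all."
negative = "The movie was bad, not good at all."

print(bag_of_words(positive) == bag_of_words(negative))      # True: same words, same counts
print(words_in_order(positive) == words_in_order(negative))  # False: only the order differs
```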
## Recurrent Neural Networks (RNNs) <a name="rnn"></a>
[Back to Top](#toc)

Recurrent Neural Networks (RNNs) are a type of neural network designed to handle sequential data. They are commonly used in Natural Language Processing (NLP) tasks such as sentiment analysis, machine translation, and text classification.

RNNs are motivated by the fact that traditional neural networks are not well suited for tasks that require processing sequences of data. Unlike feedforward neural networks, RNNs have the ability to retain information from previous time steps and use that information to inform future predictions. This allows them to model the relationships between elements in a sequence.

In an RNN, the same cell is applied at every time step: it combines the input at the current time step with the hidden state from the previous time step. The hidden state therefore summarizes the information the network has processed so far; it is updated at each time step and passed on to the next one.

To use RNNs in Keras, you can use the `SimpleRNN`, `LSTM`, or `GRU` layers. These layers are similar in that they all process sequences of data, but they differ in their internal mechanisms for retaining information from previous time steps.

Note that RNN layers expect their input to have a specific shape: a 3-dimensional array of shape `(batch size, sequence length, number of features)`. The batch size is the number of samples in the current batch, the sequence length is the number of time steps in each sample (e.g., the number of words in a sentence), and the number of features is the size of the vector at each time step. In the models below, the `Embedding` layer produces exactly this shape, `(batch size, 120, 50)`.

You can also stack multiple RNN layers on top of each other to create a deep RNN. To do so, set `return_sequences=True` on every RNN layer except the last one. This makes the layer return its output at every time step, with shape `(batch size, sequence length, units)`, instead of only the last output, with shape `(batch size, units)`. This is necessary because the next RNN layer always expects a sequence as input.

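A NumPy sketch of the recurrence described above, assuming the standard SimpleRNN update `h_t = tanh(x_t W + h_{t-1} U + b)` with a zero initial state. The weights are random, not trained, and `simple_rnn` is a hypothetical helper; the shapes match this notebook (`embedding_dim=50`, 64 units, 120 time steps), so the weight count equals the 7,360 parameters reported for the `SimpleRNN` layer below.

```python
import numpy as np


def simple_rnn(x, W, U, b, return_sequences=False):
    """Minimal SimpleRNN forward pass: h_t = tanh(x_t @ W + h_{t-1} @ U + b).

    x has shape (batch, timesteps, features); the hidden state starts at zeros.
    """
    batch, timesteps, _ = x.shape
    h = np.zeros((batch, U.shape[0]))
    outputs = []
    for t in range(timesteps):
        # the same weights are reused at every time step
        h = np.tanh(x[:, t, :] @ W + h @ U + b)
        outputs.append(h)
    # every hidden state (needed for stacking) or only the last one (for classification)
    return np.stack(outputs, axis=1) if return_sequences else h


rng = np.random.default_rng(0)
batch, timesteps, features, units = 2, 120, 50, 64
x = rng.normal(size=(batch, timesteps, features))
W = rng.normal(scale=0.1, size=(features, units))  # input-to-hidden weights
U = rng.normal(scale=0.1, size=(units, units))     # hidden-to-hidden weights
b = np.zeros(units)

print(simple_rnn(x, W, U, b).shape)                         # (2, 64)
print(simple_rnn(x, W, U, b, return_sequences=True).shape)  # (2, 120, 64)
print(W.size + U.size + b.size)                             # 7360
```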
### Simple RNN Model <a name="simple-rnn-model"></a>
[Back to Top](#toc)

We will now train a neural network model that uses a simple RNN layer.

In [56]:
simple_rnn_model = tf.keras.Sequential(
    [
        tf.keras.layers.Embedding(vocab_size, embedding_dim, input_length=max_length),
        tf.keras.layers.SimpleRNN(64, activation="tanh"),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ]
)

simple_rnn_model.summary()

Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_2 (Embedding)     (None, 120, 50)           400000    
                                                                 
 simple_rnn (SimpleRNN)      (None, 64)                7360      
                                                                 
 dense_9 (Dense)             (None, 64)                4160      
                                                                 
 dense_10 (Dense)            (None, 32)                2080      
                                                                 
 dense_11 (Dense)            (None, 1)                 33        
                                                                 
Total params: 413,633
Trainable params: 413,633
Non-trainable params: 0
_________________________________________________________________


In [None]:
# compile the model
simple_rnn_model.compile(
    optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"]
)

# train the model
simple_rnn_model.fit(
    x_train, y_train, epochs=10, batch_size=batch_size, validation_data=(x_test, y_test)
)

### LSTM Model <a name="lstm-model"></a>
[Back to Top](#toc)

Recurrent Neural Networks (RNNs) have shown great success in various Natural Language Processing (NLP) tasks such as sentiment analysis, language translation, and text generation. However, simple RNNs struggle to capture long-term dependencies in sequences because of the vanishing gradient problem: as the error signal is propagated back through many time steps, it shrinks until early inputs barely affect learning. Long Short-Term Memory (LSTM) networks were introduced to overcome this issue.

LSTMs tackle the vanishing gradient problem by adding a cell state, which acts as the network's long-term memory, and three gates (input, forget, and output) that control what is written to, kept in, and read from that memory. This allows LSTMs to capture long-term dependencies and make predictions based on both recent and more distant context.

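A NumPy sketch of a single LSTM step using the common textbook formulation; Keras's internals (for example its weight layout and initializers) may differ in detail. The weights are random and `lstm_step` is a hypothetical helper. Because there are four sets of weights (three gates plus the candidate values), the weight count is four times the SimpleRNN's: 4 × 7,360 = 29,440, matching the LSTM summary below.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step. W: (features, 4*units), U: (units, 4*units), b: (4*units,)."""
    z = x_t @ W + h_prev @ U + b
    i, f, g, o = np.split(z, 4, axis=-1)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # input, forget, output gates in (0, 1)
    g = np.tanh(g)                                 # candidate values
    c = f * c_prev + i * g  # keep part of the old cell state, add part of the candidate
    h = o * np.tanh(c)      # expose a gated view of the cell state
    return h, c


rng = np.random.default_rng(0)
features, units = 50, 64
W = rng.normal(scale=0.1, size=(features, 4 * units))
U = rng.normal(scale=0.1, size=(units, 4 * units))
b = np.zeros(4 * units)

h = c = np.zeros((1, units))
for x_t in rng.normal(size=(10, 1, features)):  # 10 time steps, batch of 1
    h, c = lstm_step(x_t, h, c, W, U, b)

print(h.shape, c.shape)          # (1, 64) (1, 64)
print(W.size + U.size + b.size)  # 29440
```

The additive update `c = f * c_prev + i * g` is the key design choice: when the forget gate stays close to 1, information (and gradient) can pass through many time steps without being repeatedly squashed.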
In [58]:
lstm_model = tf.keras.Sequential(
    [
        tf.keras.layers.Embedding(vocab_size, embedding_dim, input_length=max_length),
        tf.keras.layers.LSTM(64, activation="tanh"),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ]
)

lstm_model.summary()

Model: "sequential_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_3 (Embedding)     (None, 120, 50)           400000    
                                                                 
 lstm (LSTM)                 (None, 64)                29440     
                                                                 
 dense_12 (Dense)            (None, 64)                4160      
                                                                 
 dense_13 (Dense)            (None, 32)                2080      
                                                                 
 dense_14 (Dense)            (None, 1)                 33        
                                                                 
Total params: 435,713
Trainable params: 435,713
Non-trainable params: 0
_________________________________________________________________


In [59]:
# compile the model
lstm_model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# train the model
lstm_model.fit(
    x_train, y_train, epochs=10, batch_size=batch_size, validation_data=(x_test, y_test)
)

Epoch 10/10


<keras.callbacks.History at 0x331aa2910>

### GRU Model <a name="gru-model"></a>
[Back to Top](#toc)

Gated Recurrent Units (GRUs) are a type of RNN introduced in 2014 by Kyunghyun Cho et al. GRUs are similar to LSTMs, but they have a simpler structure: there is no separate cell state, and they use two gates instead of three, which makes them computationally less expensive. The update gate determines how much of the previous hidden state is kept and how much is replaced by the new candidate state computed from the current input, while the reset gate determines how much of the previous hidden state is used when computing that candidate.

Because they have fewer parameters per unit, GRUs are faster to train and run than LSTMs of the same size, which can matter for large-scale NLP tasks. They are also simpler in structure, which makes them easier to understand and implement. LSTMs, with their separate cell state and extra gate, have somewhat more capacity, which may help on tasks that depend on long-range context; in practice, empirical comparisons often find the two perform similarly. Therefore, the choice between GRUs and LSTMs depends on the task and the computational resources available, and it is often worth trying both.

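The cost difference can be made concrete by counting parameters. These formulas assume the standard Keras layer definitions, including `reset_after=True`, the GRU default in TensorFlow 2.x, which gives each GRU weight set a second bias vector; the function names are hypothetical. With 50-dimensional inputs and 64 units they reproduce the numbers in this notebook's `summary()` outputs.

```python
def simple_rnn_params(input_dim, units):
    # kernel (input_dim x units) + recurrent kernel (units x units) + bias (units)
    return units * (input_dim + units + 1)


def lstm_params(input_dim, units):
    # four copies of the SimpleRNN weights: input, forget and output gates plus the candidate
    return 4 * simple_rnn_params(input_dim, units)


def gru_params(input_dim, units, reset_after=True):
    # three copies (update gate, reset gate, candidate); reset_after=True adds a second bias per copy
    extra_bias = units if reset_after else 0
    return 3 * (simple_rnn_params(input_dim, units) + extra_bias)


for name, count_params in [("SimpleRNN", simple_rnn_params), ("LSTM", lstm_params), ("GRU", gru_params)]:
    print(name, count_params(50, 64))
# SimpleRNN 7360
# LSTM 29440
# GRU 22272
```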
In [60]:
gru_model = tf.keras.Sequential(
    [
        tf.keras.layers.Embedding(vocab_size, embedding_dim, input_length=max_length),
        tf.keras.layers.GRU(64, activation="tanh"),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ]
)

gru_model.summary()

Model: "sequential_5"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_4 (Embedding)     (None, 120, 50)           400000    
                                                                 
 gru (GRU)                   (None, 64)                22272     
                                                                 
 dense_15 (Dense)            (None, 64)                4160      
                                                                 
 dense_16 (Dense)            (None, 32)                2080      
                                                                 
 dense_17 (Dense)            (None, 1)                 33        
                                                                 
Total params: 428,545
Trainable params: 428,545
Non-trainable params: 0
_________________________________________________________________


In [61]:
# compile the model
gru_model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# train the model
gru_model.fit(
    x_train, y_train, epochs=10, batch_size=batch_size, validation_data=(x_test, y_test)
)



<keras.callbacks.History at 0x2f7edcac0>

### Bidirectional RNN Model <a name="bidirectional-rnn-model"></a>
[Back to Top](#toc)

Bidirectional Recurrent Neural Networks (RNNs) are a variant of RNNs that have the ability to process inputs in both forward and backward directions, allowing the model to effectively capture context from both past and future words in a text sequence.

The motivation behind Bidirectional RNNs is that, in natural language, the meaning of a word can depend on both the words before it and the words after it. For example, in "I went to the bank to deposit money" and "We sat on the bank of the river", the meaning of "bank" only becomes clear from the words that follow it. A traditional (unidirectional) RNN processes the text from left to right, so when it reaches a word it has seen only the preceding context; a Bidirectional RNN also runs a second RNN from right to left, so each position has access to both the preceding and the succeeding words.

In Keras, a Bidirectional RNN can be implemented by wrapping an RNN layer in a Bidirectional layer. For example, if you have a simple LSTM layer defined as `LSTM(64)`, to make it bidirectional, you can wrap it like this: `Bidirectional(LSTM(64))`.
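A NumPy sketch of what the `Bidirectional` wrapper does for a layer that returns only its last output, assuming the default concatenation of the two directions (`merge_mode="concat"` in Keras). A SimpleRNN-style recurrence stands in for the GRU to keep the sketch short; the weights are random and the helper names are hypothetical. The two directions have separate weights, which is why the bidirectional GRU below has twice the parameters of the single GRU (2 × 22,272 = 44,544) and a 128-dimensional output.

```python
import numpy as np


def rnn_last_state(x, W, U, b):
    """Final hidden state of a SimpleRNN-style recurrence over x of shape (timesteps, features)."""
    h = np.zeros(U.shape[0])
    for x_t in x:
        h = np.tanh(x_t @ W + h @ U + b)
    return h


def bidirectional_last_states(x, forward_weights, backward_weights):
    """Concatenate the last state of a left-to-right RNN and of a separately weighted right-to-left RNN."""
    h_forward = rnn_last_state(x, *forward_weights)
    h_backward = rnn_last_state(x[::-1], *backward_weights)  # same input, reversed in time
    return np.concatenate([h_forward, h_backward])


rng = np.random.default_rng(0)
timesteps, features, units = 120, 50, 64


def make_weights():
    """Random (untrained) input, recurrent and bias weights for one direction."""
    return (
        rng.normal(scale=0.1, size=(features, units)),
        rng.normal(scale=0.1, size=(units, units)),
        np.zeros(units),
    )


forward_w, backward_w = make_weights(), make_weights()
x = rng.normal(size=(timesteps, features))
out = bidirectional_last_states(x, forward_w, backward_w)
print(out.shape)  # (128,): 64 forward units + 64 backward units
```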

Bidirectional RNNs are often used in NLP tasks such as sentiment analysis, named entity recognition, and machine translation, where capturing context from both past and future words is important. However, they are computationally more expensive compared to traditional RNNs and should be used judiciously based on the size and complexity of the task at hand.

In [62]:
bi_gru_model = tf.keras.Sequential(
    [
        tf.keras.layers.Embedding(vocab_size, embedding_dim, input_length=max_length),
        tf.keras.layers.Bidirectional(tf.keras.layers.GRU(64, activation="tanh")),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ]
)

bi_gru_model.summary()

Model: "sequential_6"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_5 (Embedding)     (None, 120, 50)           400000    
                                                                 
 bidirectional (Bidirectiona  (None, 128)              44544     
 l)                                                              
                                                                 
 dense_18 (Dense)            (None, 64)                8256      
                                                                 
 dense_19 (Dense)            (None, 32)                2080      
                                                                 
 dense_20 (Dense)            (None, 1)                 33        
                                                                 
Total params: 454,913
Trainable params: 454,913
Non-trainable params: 0
________________________________________________

In [63]:
# compile the model
bi_gru_model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# train the model
bi_gru_model.fit(
    x_train, y_train, epochs=10, batch_size=batch_size, validation_data=(x_test, y_test)
)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x2ce62de80>