## Loading and Preprocessing Data

First, we need to load and preprocess the data. For this example, we'll use the IMDB dataset, which consists of movie reviews labeled as positive or negative.

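The Keras copy of the dataset is already tokenized: each review is stored as a list of integer word indices. Preprocessing therefore comes down to padding (or truncating) every sequence so that they all have the same length.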
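To make the padding step concrete, here is a minimal pure-Python sketch of what `pad_sequences` does to a single sequence under its default settings (`padding='pre'`, `truncating='pre'`). The helper name `pad_or_truncate` and the toy sequences are assumptions made for illustration; they are not part of Keras or of this notebook.

```python
def pad_or_truncate(sequence, maxlen, value=0):
    """Mimic the default behaviour of Keras pad_sequences for one sequence.

    Assumes maxlen >= 1. Hypothetical helper for illustration only.
    """
    trimmed = sequence[-maxlen:]                  # 'pre' truncation: drop items from the front
    padding = [value] * (maxlen - len(trimmed))   # how many fill values are still needed
    return padding + trimmed                      # 'pre' padding: fill values go at the front


short_review = [14, 22, 16]            # shorter than maxlen, so it gets padded
long_review = [1, 2, 3, 4, 5, 6, 7]    # longer than maxlen, so it gets truncated

padded_short = pad_or_truncate(short_review, maxlen=5)
padded_long = pad_or_truncate(long_review, maxlen=5)

print(padded_short)  # [0, 0, 14, 22, 16]
print(padded_long)   # [3, 4, 5, 6, 7]
```

Padding at the front keeps the actual words closest to the end of the sequence, which is the part a recurrent layer reads last.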


In [1]:
# Loading and Preprocessing Data

from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Load the IMDB dataset, keeping only the top 10,000 most frequently occurring words
vocab_size = 10000
max_length = 200
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=vocab_size)

# Pad (or truncate) every review to exactly max_length tokens
x_train = pad_sequences(x_train, maxlen=max_length)
x_test = pad_sequences(x_test, maxlen=max_length)






## Building the RNN Model

Next, we'll build a simple RNN model using TensorFlow and Keras. The model consists of:
1. **Embedding Layer:** Converts the input sequences into dense vectors of fixed size.
2. **Recurrent Layer (SimpleRNN):** Processes the embedded sequence one step at a time, carrying a hidden state forward. An LSTM layer can be swapped in to better capture long-term dependencies.
3. **Dense Layer:** Fully connected layer with 1 neuron and sigmoid activation for binary classification.
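The recurrence applied by a plain recurrent layer such as Keras's `SimpleRNN` can be sketched in a few lines of pure Python: at each time step the new hidden state is `tanh(x_t · W_x + h_prev · W_h + b)`. The function name, the toy weights, and the two-dimensional inputs below are all assumptions for illustration; the real layer learns its weights and works on batches of tensors.

```python
import math


def simple_rnn_forward(inputs, w_x, w_h, bias):
    """Run a minimal tanh recurrence over a sequence and return the final hidden state.

    inputs: list of input vectors, one per time step
    w_x:    input-to-hidden weights, indexed [input_dim][units]
    w_h:    hidden-to-hidden weights, indexed [units][units]
    bias:   list with one entry per unit
    """
    units = len(bias)
    hidden = [0.0] * units  # the state starts at zero
    for x_t in inputs:
        new_hidden = []
        for j in range(units):
            total = bias[j]
            total += sum(x_t[i] * w_x[i][j] for i in range(len(x_t)))    # contribution of the input
            total += sum(hidden[k] * w_h[k][j] for k in range(units))    # contribution of the previous state
            new_hidden.append(math.tanh(total))
        hidden = new_hidden
    return hidden


# Toy sequence of three 2-dimensional "embeddings" and hand-picked weights (illustrative only)
sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_x = [[0.5, -0.3], [0.1, 0.8]]
w_h = [[0.2, 0.0], [0.0, 0.2]]
bias = [0.0, 0.0]

final_state = simple_rnn_forward(sequence, w_x, w_h, bias)
print(final_state)
```

Only the final hidden state is returned, which mirrors how the recurrent layer in this model hands a single vector per review to the dense layer.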


In [5]:
# Building the RNN Model

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense, SimpleRNN

# Initialize a Sequential model
model = Sequential()

# Add an embedding layer to convert input sequences into dense vectors of fixed size
model.add(Embedding(input_dim=vocab_size, output_dim=128))

# Add a SimpleRNN layer with 128 units (swap in LSTM(128) to better capture long-term dependencies)
model.add(SimpleRNN(128))

# Add a fully connected layer with 1 neuron and sigmoid activation for binary classification
model.add(Dense(1, activation='sigmoid'))
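As a sanity check on the architecture, the number of trainable parameters can be worked out by hand. The sketch below assumes the layer sizes used in the cell above (vocabulary of 10,000, 128-dimensional embeddings, 128 recurrent units) and the standard parameter layout of each layer type; the totals should agree with what `model.summary()` reports once the model has been built.

```python
vocab_size = 10000      # rows in the embedding table
embedding_dim = 128     # size of each word vector
rnn_units = 128         # hidden units in the SimpleRNN layer

# Embedding: one vector of length embedding_dim per vocabulary entry
embedding_params = vocab_size * embedding_dim

# SimpleRNN: input weights + recurrent weights + one bias per unit
rnn_params = rnn_units * (embedding_dim + rnn_units + 1)

# Dense(1): one weight per incoming unit plus a single bias
dense_params = rnn_units * 1 + 1

total_params = embedding_params + rnn_params + dense_params

print(embedding_params)  # 1280000
print(rnn_params)        # 32896
print(dense_params)      # 129
print(total_params)      # 1313025
```

Almost all of the parameters sit in the embedding table, so shrinking `vocab_size` or the embedding dimension is the quickest way to make the model smaller.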


## Compiling the Model

We need to compile the model by specifying the optimizer, loss function, and metrics. We'll use the Adam optimizer, binary crossentropy loss function, and accuracy as the evaluation metric.

- **Optimizer (Adam):** A gradient-descent variant that adapts the step size for each parameter and usually works well with its default settings.
- **Loss Function (Binary Crossentropy):** Suitable for binary classification tasks.
- **Metrics (Accuracy):** Evaluates the model's performance by calculating the percentage of correctly predicted instances.


In [6]:
# Compiling the Model

# Compile the model by specifying the optimizer, loss function, and metrics
model.compile(optimizer='adam',                        # Adam optimizer
              loss='binary_crossentropy',              # Binary crossentropy loss function for binary classification
              metrics=['accuracy'])                    # Evaluation metric: accuracy


## Training the Model

Now, we'll train the model using the training data. We'll set the number of epochs to 3 and use 20% of the training data for validation.

An epoch is one complete pass over the training data. The validation data is held out from training (Keras takes it from the end of the training arrays, before any shuffling) and is used to evaluate the model on reviews it has not seen, which helps to detect overfitting.
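The effect of the validation split on the amount of work per epoch can be checked with a little arithmetic. The sketch below assumes the 25,000-review IMDB training set and the Keras default batch size of 32, which applies when `batch_size` is not passed to `fit`; the variable names are chosen for illustration.

```python
import math

num_train_reviews = 25000   # size of the IMDB training set
validation_split = 0.2      # fraction held out for validation
batch_size = 32             # Keras default when batch_size is not specified

# The first 80% of the samples are used for training, the last 20% for validation
num_fit_samples = int(num_train_reviews * (1 - validation_split))
num_val_samples = num_train_reviews - num_fit_samples

# One step processes one batch; a final partial batch still counts as a step
steps_per_epoch = math.ceil(num_fit_samples / batch_size)
validation_steps = math.ceil(num_val_samples / batch_size)

print(num_fit_samples)    # 20000
print(num_val_samples)    # 5000
print(steps_per_epoch)    # 625
print(validation_steps)   # 157
```

The 625 steps per epoch are what the training progress bar counts up to.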


In [None]:
# Training the Model

# Train the model with the training data
history = model.fit(x_train,                           # Training data
                    y_train,                           # Training labels
                    epochs=3,                          # Number of epochs
                    validation_split=0.2)              # Use 20% of training data for validation


Epoch 1/3
413/625 - 9s 46ms/step - accuracy: 0.8082 - loss: 0.4251

## Evaluating the Model

After training, we can evaluate the model's performance using the test data. We'll measure the test accuracy to see how well the model generalizes to new data.


In [None]:
# Evaluating the Model

# Evaluate the model's performance using the test data
test_loss, test_acc = model.evaluate(x_test,           # Test data
                                     y_test,           # Test labels
                                     verbose=2)        # verbose=2 prints a single summary line without a progress bar
print('\nTest accuracy:', test_acc)                    # Print the test accuracy


## Visualizing Training Results

Let's plot the training and validation accuracy and loss over the epochs to see how the model's performance improved during training.

These plots help in understanding the model's learning process and identifying potential issues like overfitting.
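Overfitting can also be spotted numerically from the same per-epoch values that the plots use. The following sketch finds the epoch with the lowest validation loss in a dictionary shaped like `history.history`. The helper name `best_epoch` and the numbers in `example_history` are hypothetical and are not results from this model.

```python
def best_epoch(history_dict):
    """Return the 1-based epoch number with the lowest validation loss."""
    val_losses = history_dict['val_loss']
    best_index = min(range(len(val_losses)), key=val_losses.__getitem__)
    return best_index + 1


# Hypothetical values for illustration only (not measured results)
example_history = {
    'loss': [0.60, 0.40, 0.25],       # training loss keeps falling
    'val_loss': [0.55, 0.45, 0.50],   # validation loss rises again after epoch 2
}

epoch = best_epoch(example_history)
print(epoch)  # 2
```

When training loss keeps falling after the best validation epoch, as in this made-up example, the model has started to memorize the training data instead of generalizing.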


In [None]:
# Visualizing Training Results

import matplotlib.pyplot as plt

# Plot training & validation accuracy values
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)                                   # Create subplot for accuracy
plt.plot(history.history['accuracy'])                  # Plot training accuracy
plt.plot(history.history['val_accuracy'])              # Plot validation accuracy
plt.title('Model Accuracy')                            # Title of the plot
plt.ylabel('Accuracy')                                 # Y-axis label
plt.xlabel('Epoch')                                    # X-axis label
plt.legend(['Train', 'Validation'], loc='upper left')  # Legend

# Plot training & validation loss values
plt.subplot(1, 2, 2)                                   # Create subplot for loss
plt.plot(history.history['loss'])                      # Plot training loss
plt.plot(history.history['val_loss'])                  # Plot validation loss
plt.title('Model Loss')                                # Title of the plot
plt.ylabel('Loss')                                     # Y-axis label
plt.xlabel('Epoch')                                    # X-axis label
plt.legend(['Train', 'Validation'], loc='upper left')  # Legend

plt.show()                                             # Display the plots
