

# Sentiment Analysis with a Simple 2-Layer Transformer Encoder

In this exercise, you'll learn how to create and train a simple 2-layer transformer encoder to perform sentiment analysis.
We'll be using the IMDB dataset, which contains movie reviews labeled as positive or negative.

This exercise should take approximately 30 minutes to complete.

## Objectives
- Prepare the IMDB dataset for sentiment analysis using the `TextVectorization` layer.
- Build a simple 2-layer transformer encoder model.
- Train the model on the IMDB dataset.
- Evaluate the model's performance.


In [None]:

import tensorflow as tf
from tensorflow.keras.layers import TextVectorization, Embedding, Dense, LayerNormalization, MultiHeadAttention, Dropout, Layer
from tensorflow.keras.models import Sequential
from tensorflow.keras import Model, Input
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.losses import BinaryCrossentropy
from tensorflow.keras.metrics import BinaryAccuracy
import tensorflow_datasets as tfds

# Set random seed for reproducibility
tf.random.set_seed(42)

# Load the IMDB dataset
(train_data, test_data), info = tfds.load('imdb_reviews', split=['train', 'test'], with_info=True, as_supervised=True)



## Data Preparation

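We'll use the `TextVectorization` layer to preprocess the text. It standardizes and tokenizes each review, then maps the tokens to integer IDs. The vocabulary is capped at `max_features` tokens, and every review is padded or truncated to `sequence_length` IDs so that it matches the model's input shape.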
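
To make the preprocessing step concrete, here is a minimal pure-Python sketch of what an integer-output text vectorizer does conceptually. It is not the Keras implementation: the helper names (`standardize`, `build_vocab`, `vectorize`) are hypothetical, and the alphabetical tie-breaking for equally frequent tokens is an assumption of this sketch. It does follow the layer's documented conventions of reserving ID 0 for padding and ID 1 for out-of-vocabulary tokens.

```python
import re
import string
from collections import Counter

PAD_ID, OOV_ID = 0, 1  # 0 is reserved for padding, 1 for out-of-vocabulary tokens


def standardize(text):
    """Lowercase the text and strip ASCII punctuation."""
    return re.sub(f"[{re.escape(string.punctuation)}]", "", text.lower())


def build_vocab(corpus, max_tokens):
    """Map the most frequent tokens to IDs 2, 3, ... (max_tokens counts the two reserved IDs)."""
    counts = Counter(tok for doc in corpus for tok in standardize(doc).split())
    # Descending frequency; ties broken alphabetically (an assumption of this sketch).
    ranked = sorted(counts, key=lambda tok: (-counts[tok], tok))
    return {tok: i + 2 for i, tok in enumerate(ranked[: max_tokens - 2])}


def vectorize(text, vocab, seq_len):
    """Convert text to exactly seq_len integer IDs, truncating or right-padding with PAD_ID."""
    ids = [vocab.get(tok, OOV_ID) for tok in standardize(text).split()][:seq_len]
    return ids + [PAD_ID] * (seq_len - len(ids))


corpus = ["A great movie, great cast!", "A terrible movie."]
vocab = build_vocab(corpus, max_tokens=10)
print(vocab)
print(vectorize("Great movie, boring plot", vocab, seq_len=6))  # [3, 4, 1, 1, 0, 0]
```

In the exercise, `max_features` plays the role of `max_tokens` and `sequence_length` plays the role of `seq_len`; unseen words such as "boring" map to the out-of-vocabulary ID.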


In [None]:

# Define constants
max_features = 10000
sequence_length = 250

# Create the TextVectorization layer
vectorize_layer = None  # TODO: Create the TextVectorization layer

# Adapt the layer to the training data
train_text = train_data.map(lambda x, y: x)
# TODO: Adapt the vectorize_layer to train_text so that it learns the vocabulary
None

# Prepare the datasets
def vectorize_text(text, label):
    return vectorize_layer(text), label

# tf.data.AUTOTUNE replaces the deprecated tf.data.experimental.AUTOTUNE alias (TF 2.4+)
train_data = train_data.map(vectorize_text).cache().shuffle(10000).batch(32).prefetch(tf.data.AUTOTUNE)
test_data = test_data.map(vectorize_text).batch(32).prefetch(tf.data.AUTOTUNE)


In [None]:
for text_batch, label_batch in train_data.take(5):
    print(text_batch.shape, label_batch.shape)


## Model Construction

We'll build a simple 2-layer transformer encoder. The model will consist of an embedding layer, two transformer encoder layers, and a final dense layer for classification.
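
As a rough illustration of what each encoder layer computes, the NumPy sketch below runs one sequence through two stacked blocks: self-attention, a residual connection with layer normalization, a feed-forward network, and a second residual connection with layer normalization. It is a simplified stand-in, not the Keras layer you will build: it uses a single attention head instead of two, omits dropout, batching, and the learnable scale and offset of `LayerNormalization`, and uses random weights. The names `init_params` and `encoder_block` are hypothetical.

```python
import numpy as np


def layer_norm(x, eps=1e-6):
    """Normalize each token vector to zero mean and unit variance (no learnable scale/offset)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def init_params(rng, embed_dim, ff_dim, scale=0.1):
    """Random weights for one encoder block (single attention head)."""
    def w(rows, cols):
        return rng.normal(scale=scale, size=(rows, cols))
    return {
        "wq": w(embed_dim, embed_dim), "wk": w(embed_dim, embed_dim),
        "wv": w(embed_dim, embed_dim), "wo": w(embed_dim, embed_dim),
        "w1": w(embed_dim, ff_dim), "b1": np.zeros(ff_dim),
        "w2": w(ff_dim, embed_dim), "b2": np.zeros(embed_dim),
    }


def encoder_block(x, p):
    """Apply one encoder block to x of shape (seq_len, embed_dim)."""
    q, k, v = x @ p["wq"], x @ p["wk"], x @ p["wv"]
    scores = q @ k.T / np.sqrt(q.shape[-1])            # scaled dot-product scores
    attn_output = softmax(scores) @ v @ p["wo"]        # weighted sum of values, then projection
    out1 = layer_norm(x + attn_output)                 # residual connection + layer norm
    hidden = np.maximum(0.0, out1 @ p["w1"] + p["b1"])  # ReLU hidden layer of size ff_dim
    ffn_output = hidden @ p["w2"] + p["b2"]            # project back to embed_dim
    return layer_norm(out1 + ffn_output)               # second residual + layer norm


rng = np.random.default_rng(0)
seq_len, embed_dim, ff_dim = 5, 32, 32
x = rng.normal(size=(seq_len, embed_dim))              # stand-in for embedded tokens
y = encoder_block(x, init_params(rng, embed_dim, ff_dim))
y = encoder_block(y, init_params(rng, embed_dim, ff_dim))
pooled = y.mean(axis=0)                                # analogous to GlobalAveragePooling1D
print(y.shape, pooled.shape)                           # (5, 32) (32,)
```

Because each block maps a `(seq_len, embed_dim)` input to an output of the same shape, blocks can be stacked freely, and averaging over the sequence axis yields one vector per review for the final classifier.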


In [None]:

# Transformer Encoder Layer
class TransformerEncoderLayer(Layer):
    def __init__(self, embed_dim, num_heads, ff_dim, rate=0.1):
        super().__init__()
        self.att = None  # TODO: Create the MultiHeadAttention layer
        self.ffn = None  # TODO: Create the FFN: a Dense layer with ff_dim units, then a Dense layer with embed_dim units
        self.layernorm1 = LayerNormalization(epsilon=1e-6)
        self.layernorm2 = LayerNormalization(epsilon=1e-6)
        self.dropout1 = Dropout(rate)
        self.dropout2 = Dropout(rate)

    def call(self, inputs, training=None):
        attn_output = None  # TODO: Pass the inputs through the att layer
        attn_output = self.dropout1(attn_output, training=training)
        out1 = self.layernorm1(inputs + attn_output)
        ffn_output = None  # TODO: Pass the out1 through the ffn layer
        ffn_output = self.dropout2(ffn_output, training=training)
        return self.layernorm2(out1 + ffn_output)

# Build the model
embed_dim = 32  # Embedding size for each token
num_heads = 2  # Number of attention heads
ff_dim = 32  # Hidden layer size in feed forward network inside transformer

inputs = Input(shape=(sequence_length,))
x = Embedding(max_features, embed_dim)(inputs)
# Do not hard-code training=True here: Keras sets the flag itself
# (True during fit, False during evaluate/predict), so dropout is disabled at inference.
x = TransformerEncoderLayer(embed_dim, num_heads, ff_dim)(x)
x = TransformerEncoderLayer(embed_dim, num_heads, ff_dim)(x)
x = tf.keras.layers.GlobalAveragePooling1D()(x)
x = Dropout(0.1)(x)
outputs = Dense(1, activation="sigmoid")(x)

model = Model(inputs=inputs, outputs=outputs)

# Compile the model
model.compile(optimizer=Adam(), loss=BinaryCrossentropy(), metrics=[BinaryAccuracy()])

model.summary()



## Model Training

We'll train the model on the vectorized, batched training split (`train_data`).


In [None]:

# Train the model
history = None  # TODO: Train the model on the train_data



## Model Evaluation

Finally, we'll evaluate the model's performance on the test set to see how well it has learned to classify sentiment.
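
For intuition about the two numbers that `model.evaluate` reports with this compile configuration, the sketch below computes binary cross-entropy and binary accuracy by hand on a few made-up predictions. The function names and sample values are hypothetical; the 0.5 decision threshold and the small clipping constant mirror the Keras defaults as far as I know, so treat them as assumptions.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean negative log-likelihood of the true labels under the predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    correct = sum(int(p > threshold) == y for y, p in zip(y_true, y_prob))
    return correct / len(y_true)


labels = [1, 0, 1, 0]          # made-up ground truth (1 = positive review)
probs = [0.9, 0.2, 0.4, 0.6]   # made-up sigmoid outputs

print(f"loss: {binary_cross_entropy(labels, probs):.4f}")
print(f"accuracy: {binary_accuracy(labels, probs):.2f}")  # accuracy: 0.50
```

The loss penalizes confident mistakes more heavily than hesitant ones, while accuracy only counts which side of the threshold each prediction falls on, so the two metrics can move somewhat independently.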


In [None]:

# Evaluate the model
loss, accuracy = model.evaluate(test_data)
print(f"Test Accuracy: {accuracy:.2f}")
