# Sentiment Analysis Using Transformers

This notebook demonstrates how to perform classification of user-generated content (e.g. product reviews) using transformers. 

### Use Case
We have a lage amount of user-generated content such as product reviews, call transcripts, or social media posts. We want to create a model that assigns labels to individual content items. For example, movie reviews can be assigned with `positive` and `negative` sentiment labels. We assume that we have a significant amount of labeled training data.

### Prototype: Approach and Data
We implement the sentiment classification model using a transformer that is trained from scratch on a labeled dataset. The implementation is based on [1]. 

We use the IMDB dataset that contain 25,000 movies reviews from IMDB, labeled by sentiment (positive/negative). The dataset is automatically downloaded using Keras repository.  

### Usage and Productization
This prototype can be used to evaluate how well a model trained from scratch can perform a given text classification task. The default dataset can be easily replaced with custom labeled dataset that contains enough training samples.

In practice, one would typically use a pretrained model or LLM service to perform sentiment analysis and other text classification tasks.

### References
1. https://keras.io/examples/nlp/text_classification_with_transformer/

In [3]:
#
# Imports and settings
#
import tensorflow as tf
from tensorflow.keras.models import Sequential, Model
from tensorflow import keras
from tensorflow.keras import layers

import numpy as np

from matplotlib import pyplot as plt
plt.rcParams.update({'pdf.fonttype': 'truetype'})

In [4]:
#
# Load the dataset
#
vocab_size = 20000  # Only consider the top 20k words
maxlen = 200        # Only consider the first 200 words of each movie review
(x_train, y_train), (x_val, y_val) = keras.datasets.imdb.load_data(num_words=vocab_size)
print(len(x_train), "Training sequences")
print(len(x_val), "Validation sequences")
x_train = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=maxlen)
x_val = keras.preprocessing.sequence.pad_sequences(x_val, maxlen=maxlen)

#
# Preview the dataset
#
movie_index = 0 
# Retrieve the word index file mapping words to indices
word_index = keras.datasets.imdb.get_word_index()
# Reverse the word index to obtain a dict mapping indices to words
inverted_word_index = dict((i, word) for (word, i) in word_index.items())
# Decode the first sequence in the dataset
decoded_sequence = " ".join(inverted_word_index[i] for i in x_train[movie_index])
sentiment = 'Positive' if y_train[movie_index] == 1 else 'Negative'
print(f'Review [{decoded_sequence}] -> {sentiment}')

#
# Model components
#
class TransformerBlock(layers.Layer):
    def __init__(self, embed_dim, num_heads, ff_dim, rate=0.1):
        super(TransformerBlock, self).__init__()
        self.att = layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)
        self.ffn = Sequential(
            [layers.Dense(ff_dim, activation="relu"), layers.Dense(embed_dim),]
        )
        self.layernorm1 = layers.LayerNormalization(epsilon=1e-6)
        self.layernorm2 = layers.LayerNormalization(epsilon=1e-6)
        self.dropout1 = layers.Dropout(rate)
        self.dropout2 = layers.Dropout(rate)

    def call(self, inputs, training):
        attn_output = self.att(inputs, inputs)
        attn_output = self.dropout1(attn_output, training=training)
        out1 = self.layernorm1(inputs + attn_output)
        ffn_output = self.ffn(out1)
        ffn_output = self.dropout2(ffn_output, training=training)
        return self.layernorm2(out1 + ffn_output)
    
class TokenAndPositionEmbedding(layers.Layer):
    def __init__(self, maxlen, vocab_size, embed_dim):
        super(TokenAndPositionEmbedding, self).__init__()
        self.token_emb = layers.Embedding(input_dim=vocab_size, output_dim=embed_dim)
        self.pos_emb = layers.Embedding(input_dim=maxlen, output_dim=embed_dim)

    def call(self, x):
        maxlen = tf.shape(x)[-1]
        positions = tf.range(start=0, limit=maxlen, delta=1)
        positions = self.pos_emb(positions)
        x = self.token_emb(x)
        return x + positions
    
#
# Model specification
#
embed_dim = 32  # Embedding size for each token
num_heads = 2   # Number of attention heads
ff_dim = 32     # Hidden layer size in feed forward network inside transformer

inputs = layers.Input(shape=(maxlen,))
embedding_layer = TokenAndPositionEmbedding(maxlen, vocab_size, embed_dim)
x = embedding_layer(inputs)
transformer_block = TransformerBlock(embed_dim, num_heads, ff_dim)
x = transformer_block(x)
x = layers.GlobalAveragePooling1D()(x)
x = layers.Dropout(0.1)(x)
x = layers.Dense(20, activation="relu")(x)
x = layers.Dropout(0.1)(x)
outputs = layers.Dense(2, activation="softmax")(x)

model = keras.Model(inputs=inputs, outputs=outputs)


#
# Model training
#
model.compile("adam", "sparse_categorical_crossentropy", metrics=["accuracy"])
history = model.fit(
    x_train, y_train, batch_size=32, epochs=2, validation_data=(x_val, y_val)
)

25000 Training sequences
25000 Validation sequences
Review [to have after out atmosphere never more room and it so heart shows to years of every never going and help moments or of every chest visual movie except her was several of enough more with is now current film as you of mine potentially unfortunately of you than him that with out themselves her get for was camp of you movie sometimes movie that with scary but pratfalls to story wonderful that in seeing in character to of 70s musicians with heart had shadows they of here that with her serious to have does when from why what have critics they is you that isn't one will very to as itself with other tricky in of seen over landed for anyone of and br show's to whether from than out themselves history he name half some br of 'n odd was two most of mean for 1 any an boat she he should is thought frog but of script you not while history he heart to real at barrel but when from one bit then have two of script their with her nobody most t

2022-05-27 05:53:54.519789: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.


Epoch 1/2
Epoch 2/2


In [27]:
#
# Test scoring
#
x = x_train[movie_index]
y = y_train[movie_index]
class_p = model.predict(np.atleast_2d(x))
decoded_sequence = " ".join(inverted_word_index[i] for i in x)
true_sentiment = 'Positive' if y == 1 else 'Negative'
print(f'Review [{decoded_sequence}] -> {class_p} ({true_sentiment})')

Review [to have after out atmosphere never more room and it so heart shows to years of every never going and help moments or of every chest visual movie except her was several of enough more with is now current film as you of mine potentially unfortunately of you than him that with out themselves her get for was camp of you movie sometimes movie that with scary but pratfalls to story wonderful that in seeing in character to of 70s musicians with heart had shadows they of here that with her serious to have does when from why what have critics they is you that isn't one will very to as itself with other tricky in of seen over landed for anyone of and br show's to whether from than out themselves history he name half some br of 'n odd was two most of mean for 1 any an boat she he should is thought frog but of script you not while history he heart to real at barrel but when from one bit then have two of script their with her nobody most that with wasn't to with armed acting watch an for wi