<a href="https://colab.research.google.com/github/Arun-mac/text-classification-using-transformers/blob/main/implementation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Importing the necessary libraries

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import MultiHeadAttention, LayerNormalization, Dropout, Layer
from tensorflow.keras.layers import Embedding, Input, GlobalAveragePooling1D, Dense
from tensorflow.keras.datasets import imdb
from tensorflow.keras.models import Sequential, Model
import numpy as np

## Creating Transformer blocks and positional embedding

This section defines two custom layers for building a transformer-based neural network:

**Transformer Block:**


1. This layer implements a single encoder block of the transformer architecture.
2. It consists of a multi-head self-attention sub-layer (MultiHeadAttention) followed by a feedforward network (a Sequential of two dense layers, the first with ReLU activation).
3. Dropout is applied to the output of each sub-layer, which is then added back to that sub-layer's input (a residual connection) and layer-normalized.
4. The block's output is the layer-normalized sum of the feedforward network's input and its output after dropout, so it has the same shape as the block's input.
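
The order of operations in the list above can be sketched in plain NumPy. This is a simplified illustration, not the Keras implementation: it assumes a single attention head, omits dropout (as at inference time) and the learned scale and offset of layer normalization, and every name in it (`toy_transformer_block`, `wq`, `w1`, and so on) is hypothetical.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize each token vector to zero mean and unit variance over the feature axis.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    # Single-head scaled dot-product attention (the real layer uses several heads).
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def toy_transformer_block(x, wq, wk, wv, w1, b1, w2, b2):
    # Sub-layer 1: attention, residual connection, layer normalization.
    out1 = layer_norm(x + self_attention(x, wq, wk, wv))
    # Sub-layer 2: ReLU feedforward network, residual connection, layer normalization.
    ffn = np.maximum(out1 @ w1 + b1, 0.0) @ w2 + b2
    return layer_norm(out1 + ffn)

rng = np.random.default_rng(0)
embed_dim, ff_dim = 8, 16  # toy sizes, not the notebook's
x = rng.normal(size=(2, 5, embed_dim))  # (batch, sequence, features)
wq, wk, wv = (rng.normal(size=(embed_dim, embed_dim)) for _ in range(3))
w1, b1 = rng.normal(size=(embed_dim, ff_dim)), np.zeros(ff_dim)
w2, b2 = rng.normal(size=(ff_dim, embed_dim)), np.zeros(embed_dim)

y = toy_transformer_block(x, wq, wk, wv, w1, b1, w2, b2)
print(y.shape)  # (2, 5, 8)
```

Because both residual sums are normalized afterwards, the block returns a tensor of the same shape as its input, which is what would let several such blocks be stacked.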

**Token and Position Embedding:**


1. This layer combines token embeddings and positional embeddings.
2. It uses two separate embedding layers (Embedding): one maps token ids, bounded by the vocabulary size, to vectors, and the other maps positions, bounded by the maximum sequence length, to vectors of the same size.
3. The positional embeddings are learned rather than fixed, and they are added element-wise to the token embeddings.
4. The output has shape (batch size, sequence length, embedding dimension).
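
As a rough illustration of the lookup-and-add step described above, the sketch below uses NumPy arrays in place of the two Embedding layers. The table sizes and token ids are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, maxlen, embed_dim = 50, 6, 4  # toy sizes, not the notebook's

# Stand-ins for the weights of the two Embedding layers.
token_table = rng.normal(size=(vocab_size, embed_dim))
pos_table = rng.normal(size=(maxlen, embed_dim))

token_ids = np.array([[3, 7, 7, 0, 0, 0],
                      [9, 1, 4, 2, 8, 5]])  # (batch, maxlen)

tok = token_table[token_ids]                     # (batch, maxlen, embed_dim)
pos = pos_table[np.arange(token_ids.shape[-1])]  # (maxlen, embed_dim)
out = tok + pos                                  # position vectors broadcast over the batch
print(out.shape)  # (2, 6, 4)
```

Note that token id 7 appears at positions 1 and 2 of the first sequence and receives a different vector at each, which is the point of adding position information.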


In [None]:
class TransformerBlock(Layer):
    """A single transformer encoder block with post-sub-layer normalization."""

    def __init__(self, embed_dim, num_heads, ff_dim, rate=0.1):
        super().__init__()
        # key_dim is the size of each attention head, not the total width.
        self.att = MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)
        # Feedforward network: project to ff_dim, then back to embed_dim.
        self.ffn = Sequential(
            [Dense(ff_dim, activation="relu"),
             Dense(embed_dim),]
        )
        self.layernorm1 = LayerNormalization(epsilon=1e-6)
        self.layernorm2 = LayerNormalization(epsilon=1e-6)
        self.dropout1 = Dropout(rate)
        self.dropout2 = Dropout(rate)

    def call(self, inputs, training=None):
        # Self-attention: the same sequence is used as query and value.
        attn_output = self.att(inputs, inputs)
        attn_output = self.dropout1(attn_output, training=training)
        # Residual connection, then layer normalization.
        out1 = self.layernorm1(inputs + attn_output)
        ffn_output = self.ffn(out1)
        ffn_output = self.dropout2(ffn_output, training=training)
        return self.layernorm2(out1 + ffn_output)

In [None]:
class TokenAndPositionEmbedding(Layer):
    """Sum of a token embedding and a learned position embedding."""

    def __init__(self, maxlen, vocab_size, embed_dim):
        super().__init__()
        self.token_emb = Embedding(input_dim=vocab_size, output_dim=embed_dim)
        self.pos_emb = Embedding(input_dim=maxlen, output_dim=embed_dim)

    def call(self, x):
        # x holds token ids with shape (batch, seq_len).
        seq_len = tf.shape(x)[-1]
        positions = tf.range(start=0, limit=seq_len, delta=1)
        positions = self.pos_emb(positions)  # (seq_len, embed_dim)
        x = self.token_emb(x)                # (batch, seq_len, embed_dim)
        # Broadcasting adds the same position vectors to every sequence in the batch.
        return x + positions

## Preparing the data

Here we load the IMDB movie reviews dataset, keeping only the 20,000 most frequent words. The dataset ships as two splits of 25,000 reviews each; the second (test) split serves as the validation set here. After printing the number of sequences in each split, every review is padded or truncated to exactly 400 tokens.

In [None]:
vocab_size = 20000
maxlen = 400

(x_train, y_train), (x_val, y_val) = imdb.load_data(num_words=vocab_size)
print(len(x_train), "Training sequences")
print(len(x_val), "Validation sequences")

25000 Training sequences
25000 Validation sequences
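
Each loaded review is a list of integer word ranks, not text. The sketch below shows one way such a list could be mapped back to words, assuming the default `imdb.load_data` settings (id 0 for padding, 1 for start of sequence, 2 for out-of-vocabulary words, real words shifted by 3). The helper `decode_review` and the tiny `toy_word_index` are hypothetical; in the notebook the real dictionary would come from `imdb.get_word_index()`.

```python
def decode_review(ids, word_index, index_from=3):
    # word_index maps word -> frequency rank (1 = most frequent).
    # Ranks are shifted by index_from to make room for the reserved ids.
    reverse = {rank + index_from: word for word, rank in word_index.items()}
    special = {0: "<pad>", 1: "<start>", 2: "<oov>"}
    words = [special[i] if i in special else reverse.get(i, "<oov>") for i in ids]
    return " ".join(words)

toy_word_index = {"the": 1, "movie": 2, "was": 3, "great": 4}  # hypothetical
print(decode_review([1, 4, 5, 6, 2, 7], toy_word_index))
# <start> the movie was <oov> great
```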


In [None]:
y_val[:5]

array([0, 1, 1, 0, 1])

In [None]:
x_train = tf.keras.preprocessing.sequence.pad_sequences(x_train, maxlen=maxlen)
x_val = tf.keras.preprocessing.sequence.pad_sequences(x_val, maxlen=maxlen)
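
With its default arguments, `pad_sequences` pads at the front and also truncates from the front, so the end of a long review is what survives. A plain-Python sketch of that behavior for a single sequence is shown below; `pad_like_keras` is a hypothetical helper that assumes `maxlen` is positive, and it returns a list, not a NumPy array.

```python
def pad_like_keras(seq, maxlen, value=0):
    # Keep the last `maxlen` items (truncating="pre"), then left-pad (padding="pre").
    seq = list(seq)[-maxlen:]
    return [value] * (maxlen - len(seq)) + seq

print(pad_like_keras([5, 6, 7], 5))           # [0, 0, 5, 6, 7]
print(pad_like_keras([1, 2, 3, 4, 5, 6], 4))  # [3, 4, 5, 6]
```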

## Model definition and training

This section defines a transformer-based neural network for sequence classification using Keras. The model stacks the token-and-position embedding layer, one transformer block, global average pooling over the sequence axis, and a 20-unit dense layer with ReLU activation, with dropout before and after that dense layer for regularization. The output layer has two units with softmax activation, one per sentiment class. The model expects sequences of 400 token ids drawn from a vocabulary of 20,000 words.

In [None]:
embed_dim = 32
num_heads = 2
ff_dim = 32

inputs = Input(shape=(maxlen,))
embedding_layer = TokenAndPositionEmbedding(maxlen, vocab_size, embed_dim)
x = embedding_layer(inputs)
transformer_block = TransformerBlock(embed_dim, num_heads, ff_dim)
x = transformer_block(x)
x = GlobalAveragePooling1D()(x)
x = Dropout(0.1)(x)
x = Dense(20, activation="relu")(x)
x = Dropout(0.1)(x)
outputs = Dense(2, activation="softmax")(x)

model = Model(inputs=inputs, outputs=outputs)
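
To get a feel for where the model's weights sit, the arithmetic below estimates the number of trainable parameters per component. It is a back-of-the-envelope sketch that assumes the Keras defaults (bias terms on every dense and attention projection, value size equal to `key_dim`); the helper names are hypothetical, and the total is best checked against `model.summary()`.

```python
def dense_params(n_in, n_out):
    # Weight matrix plus bias vector.
    return n_in * n_out + n_out

def count_params(vocab_size=20000, maxlen=400, embed_dim=32, num_heads=2, ff_dim=32):
    embedding = vocab_size * embed_dim + maxlen * embed_dim
    # key_dim=embed_dim means every head projects to embed_dim features.
    proj = dense_params(embed_dim, num_heads * embed_dim)      # one of query, key, value
    attn_out = dense_params(num_heads * embed_dim, embed_dim)  # output projection
    attention = 3 * proj + attn_out
    ffn = dense_params(embed_dim, ff_dim) + dense_params(ff_dim, embed_dim)
    layer_norms = 2 * (2 * embed_dim)  # scale and offset for each of the two norms
    head = dense_params(embed_dim, 20) + dense_params(20, 2)
    return embedding + attention + ffn + layer_norms + head

print(count_params())  # 664158
```

Under these assumptions the two embedding tables account for 652,800 of the parameters, so the vocabulary size dominates the model size far more than the transformer block does.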

In [None]:
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

history = model.fit(x_train, y_train,
                    batch_size=64, epochs=2,
                    validation_data=(x_val, y_val)
                   )

Epoch 1/2
Epoch 2/2
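
The loss passed to `compile` compares integer labels against the two softmax probabilities. A minimal NumPy sketch of that computation follows; the probabilities are made up, and the small clipping that Keras applies for numerical safety is omitted.

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, probs):
    # Pick the predicted probability of the true class for each example,
    # then average the negative log-probabilities over the batch.
    picked = probs[np.arange(len(y_true)), y_true]
    return float(-np.log(picked).mean())

probs = np.array([[0.9, 0.1],
                  [0.2, 0.8],
                  [0.5, 0.5]])  # hypothetical softmax outputs
labels = np.array([0, 1, 1])

loss = sparse_categorical_crossentropy(labels, probs)
print(round(loss, 3))  # 0.341
```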


In [None]:
model.save_weights("predict_class.h5")

## Model Evaluation


This code evaluates the trained model on the validation data and prints the resulting loss and accuracy.

In [None]:
results = model.evaluate(x_val, y_val, verbose=2)

for name, value in zip(model.metrics_names, results):
    print("%s: %.3f" % (name, value))

782/782 - 5s - loss: 0.2697 - accuracy: 0.8901 - 5s/epoch - 6ms/step
loss: 0.270
accuracy: 0.890
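
The reported accuracy is the fraction of reviews whose highest-probability class matches the label. The sketch below reproduces that calculation on made-up numbers; in the notebook, `probs` would presumably come from `model.predict(x_val)`.

```python
import numpy as np

probs = np.array([[0.7, 0.3],
                  [0.4, 0.6],
                  [0.1, 0.9],
                  [0.8, 0.2]])  # hypothetical softmax outputs
y_true = np.array([0, 1, 0, 0])

y_pred = probs.argmax(axis=-1)  # index of the most probable class
accuracy = float((y_pred == y_true).mean())
print(y_pred.tolist(), accuracy)  # [0, 1, 1, 0] 0.75
```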
