<a href="https://colab.research.google.com/github/ashaduzzaman-sarker/Text-classification-Sentiment-Analysis/blob/main/Text_Sentiment_Classification_on_the_IMDb_Dataset_using_FNet_Encoder.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Text Classification on the IMDb Dataset using FNet Encoder

## [FNet: Mixing Tokens with Fourier Transforms](https://doi.org/10.48550/arXiv.2105.03824)

The FNet encoder, introduced by Google Research, is a Transformer-style encoder that replaces the self-attention sublayer with an unparameterized Fourier Transform. This substantially reduces computational cost while remaining competitive on a range of natural language processing tasks. The sections below summarize how it works.

### Key Concepts and Components

![](https://miro.medium.com/v2/resize:fit:1010/1*7ZfynrPBS6jNIu4U49TMCA.png)

1. **Fourier Transform**:
   - The core idea of the FNet model is to use the Fourier Transform to capture interactions between different parts of the input sequence. The Fourier Transform helps convert the sequence data from the time domain to the frequency domain, where the global structure of the data can be analyzed more efficiently.
   - The Fourier Transform in FNet is applied to the input embeddings or the outputs of the previous layer, effectively replacing the self-attention mechanism used in traditional transformers.

2. **Encoder Structure**:
   - Similar to the traditional transformer encoder, the FNet encoder consists of multiple layers. Each layer has two main components:
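     1. **Fourier Transform Layer**: This replaces the self-attention mechanism. The layer applies a 2D discrete Fourier Transform to its input (one 1D transform along the sequence dimension and one along the hidden dimension) and keeps only the real part of the result.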
     2. **Feed-Forward Neural Network (FFN)**: After the Fourier Transform, the output is passed through a position-wise feed-forward neural network, which is the same as in traditional transformers.
   - Each layer also includes layer normalization and residual connections to stabilize training and improve performance.

3. **Advantages**:
   - **Efficiency**: By replacing the self-attention mechanism with a Fourier Transform, the FNet encoder reduces the computational complexity from \(O(n^2)\) (where \(n\) is the sequence length) to \(O(n \log n)\). This makes it much more efficient, especially for long sequences.
   - **Simplicity**: The Fourier Transform sublayer has no learnable parameters, so the FNet encoder has fewer weights to train than an attention-based encoder of the same size and is straightforward to implement.

4. **Performance**:
   - Despite its simplicity, the FNet encoder performs competitively on standard NLP benchmarks; the paper reports roughly 92-97% of the accuracy of comparable BERT models on GLUE. This suggests that Fourier mixing captures much of the token interaction that attention provides.
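
To make the token-mixing idea concrete, here is a minimal pure-Python sketch of the Fourier mixing step described above. It uses a naive \(O(n^2)\) DFT for readability; a real implementation would call an FFT routine to get the \(O(n \log n)\) cost. The function names `dft` and `fourier_mix` and the toy input are illustrative assumptions, not part of the notebook or of KerasNLP.

```python
import cmath

def dft(values):
    """Naive 1D discrete Fourier Transform (O(n^2), for illustration only)."""
    n = len(values)
    return [
        sum(values[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def fourier_mix(x):
    """Mix a (seq_len x hidden_dim) matrix with a 2D DFT and keep the real part."""
    # Transform each token's hidden vector (along the hidden dimension).
    rows = [dft(row) for row in x]
    # Transform each hidden feature across tokens (along the sequence dimension).
    cols = [dft([rows[t][h] for t in range(len(rows))]) for h in range(len(rows[0]))]
    # Transpose back to (seq_len x hidden_dim) and drop the imaginary part.
    return [[cols[h][t].real for h in range(len(cols))] for t in range(len(rows))]

# Toy input: 4 tokens with 2 hidden features each (values are made up).
x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
mixed = fourier_mix(x)
print([[round(v, 6) for v in row] for row in mixed])
```

The first output entry equals the sum of all input values, which shows that a single mixing step already lets every position see every other position, with no learned weights involved.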

### Detailed Workflow

1. **Input Embedding**:
   - The input tokens are first embedded into a continuous vector space, similar to other transformer models.

2. **Fourier Transform Layer**:
   - The embedded sequence is mixed with a 2D Fourier Transform, of which only the real part is kept. Every output position then depends on every input position, without any learned attention weights.

3. **Feed-Forward Layer**:
   - The transformed sequence is passed through a feed-forward neural network, which applies a series of linear transformations and non-linear activations.

4. **Layer Normalization and Residual Connections**:
   - Each layer includes layer normalization and residual connections to maintain stability and enhance gradient flow during training.

5. **Output**:
   - The output of the last encoder layer is used for downstream tasks, such as classification, sequence labeling, or language modeling.
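
The workflow above can be sketched as a single encoder layer in plain Python. This is a simplified illustration under stated assumptions: layer normalization is applied after each residual connection, the feed-forward network uses ReLU, layer norm has no learned scale or shift, and all sizes and weights are toy values. The helper names (`fnet_layer`, `dense`, `layer_norm`) are hypothetical and do not mirror the KerasNLP implementation.

```python
import cmath
import math
import random

def dft(v):
    """Naive 1D discrete Fourier Transform (illustrative, O(n^2))."""
    n = len(v)
    return [sum(v[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def fourier_mix(x):
    """2D DFT over a (seq_len x hidden) matrix, keeping the real part."""
    rows = [dft(r) for r in x]
    cols = [dft([r[h] for r in rows]) for h in range(len(rows[0]))]
    return [[cols[h][t].real for h in range(len(cols))] for t in range(len(rows))]

def layer_norm(row, eps=1e-5):
    """Normalize one token vector to zero mean and unit variance."""
    mean = sum(row) / len(row)
    var = sum((v - mean) ** 2 for v in row) / len(row)
    return [(v - mean) / math.sqrt(var + eps) for v in row]

def dense(row, weights, bias):
    """Affine map of one token vector; weights has shape (in_dim x out_dim)."""
    return [sum(row[i] * weights[i][j] for i in range(len(row))) + bias[j]
            for j in range(len(bias))]

def fnet_layer(x, w1, b1, w2, b2):
    # Sublayer 1: Fourier mixing, residual connection, layer normalization.
    mixed = fourier_mix(x)
    x = [layer_norm([a + m for a, m in zip(row, mrow)]) for row, mrow in zip(x, mixed)]
    # Sublayer 2: position-wise feed-forward network, residual, layer normalization.
    out = []
    for row in x:
        hidden = [max(0.0, v) for v in dense(row, w1, b1)]  # ReLU (assumed)
        out.append(layer_norm([a + f for a, f in zip(row, dense(hidden, w2, b2))]))
    return out

# Toy sizes and random weights, purely illustrative.
rng = random.Random(0)
seq_len, hidden_dim, inter_dim = 4, 4, 8
x = [[rng.uniform(-1, 1) for _ in range(hidden_dim)] for _ in range(seq_len)]
w1 = [[rng.uniform(-0.5, 0.5) for _ in range(inter_dim)] for _ in range(hidden_dim)]
w2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)] for _ in range(inter_dim)]
y = fnet_layer(x, w1, [0.0] * inter_dim, w2, [0.0] * hidden_dim)
print(len(y), len(y[0]))  # the output has the same shape as the input
```

Stacking several such layers and pooling the final output gives the kind of classifier built later in this notebook.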

### Applications

The FNet encoder can be used in various NLP tasks, including:
- Text classification
- Named entity recognition
- Machine translation
- Text generation

### Conclusion

The FNet encoder offers a more efficient alternative to the standard Transformer encoder by replacing self-attention with a Fourier Transform. It gives up a small amount of accuracy in exchange for lower computational cost, which makes it attractive for applications that process long sequences.


## Imports

In [1]:
!pip install -q --upgrade keras-nlp
!pip install -q --upgrade keras


In [2]:
import keras_nlp
import tensorflow as tf
import keras
import os

keras.utils.set_random_seed(42)

## Define Hyperparameters

In [3]:
BATCH_SIZE = 64
EPOCHS = 5  # the training runs below use 5 epochs
MAX_SEQUENCE_LENGTH = 512
VOCAB_SIZE = 15000

EMBED_DIM = 128
INTERMEDIATE_DIM = 512

## Loading the IMDB dataset

In [4]:
!wget https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz
!tar -xf aclImdb_v1.tar.gz

--2024-07-31 03:52:33--  https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz
Resolving ai.stanford.edu (ai.stanford.edu)... 171.64.68.10
Connecting to ai.stanford.edu (ai.stanford.edu)|171.64.68.10|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 84125825 (80M) [application/x-gzip]
Saving to: ‘aclImdb_v1.tar.gz’


2024-07-31 03:52:36 (29.5 MB/s) - ‘aclImdb_v1.tar.gz’ saved [84125825/84125825]



In [5]:
# Let's inspect the structure of the directory
print(os.listdir('./aclImdb'))
print(os.listdir('./aclImdb/train'))
print(os.listdir('./aclImdb/test'))

['train', 'README', 'imdb.vocab', 'test', 'imdbEr.txt']
['neg', 'urls_neg.txt', 'urls_pos.txt', 'pos', 'labeledBow.feat', 'urls_unsup.txt', 'unsup', 'unsupBow.feat']
['neg', 'urls_neg.txt', 'urls_pos.txt', 'pos', 'labeledBow.feat']


In [6]:
!cat aclImdb/train/pos/11558_10.txt

"The Odd Couple" is one of those movies that far surpasses its reputation. People all know it, they hum the theme song, they complain of living with a sloppy "Oscar" or a fastidious "Felix"...but they're under-selling the film without knowing it. This isn't just about a neat guy living with a sloppy guy; it's a portrait of two friends helping each other through the agony of divorce. It's also damn funny from start to finish, but it's the kind of comedy that arises from realistic, stressful, and just plain awful situations. So, some viewers have actually found the film to be a bit uncomfortable, but I think its verisimilitude is its strength. Besides, Matthau's bulldog face just cracks me up! My favorite comedy, by a country mile.

In [7]:
!cat aclImdb/train/neg/11008_1.txt

I bought this at tower records after seeing the info-mercial about fifteen hundred times on comedy central. I was actually really looking forward to watching this. My god where did i go wrong? Now before i give my review let me just say that i am a person who can pretty much find the good in all movies, hell i own over 1,500 dvd's! With that said, the underground comedy movie ranks up there with the worst film i have EVER seen. I tried to give it a chance, but not only was it not funny. It had no point, did not offend what-so-ever and was all around stupid. God who in their right mind thought these pieces of crap were funny? this is going right to the bottom of the bin...

In [8]:
# Remove the `unsup` folder as it has unlabelled samples
!rm -rf aclImdb/train/unsup

## Split the dataset

In [9]:
# Generate our labelled tf.data.Dataset dataset from text files
train_ds = keras.utils.text_dataset_from_directory(
    'aclImdb/train',
    batch_size=BATCH_SIZE,
    validation_split=0.2,
    subset='training',
    seed=42
)
val_ds = keras.utils.text_dataset_from_directory(
    'aclImdb/train',
    batch_size=BATCH_SIZE,
    validation_split=0.2,
    subset='validation',
    seed=42
)
test_ds = keras.utils.text_dataset_from_directory(
    'aclImdb/test',
    batch_size=BATCH_SIZE
)

print(f'Number of batches in train_ds: {train_ds.cardinality()}')
print(f'Number of batches in val_ds: {val_ds.cardinality()}')
print(f'Number of batches in test_ds: {test_ds.cardinality()}')

Found 25000 files belonging to 2 classes.
Using 20000 files for training.
Found 25000 files belonging to 2 classes.
Using 5000 files for validation.
Found 25000 files belonging to 2 classes.
Number of batches in train_ds: 313
Number of batches in val_ds: 79
Number of batches in test_ds: 391
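
The batch counts printed above follow directly from the file counts and the batch size, because the last partial batch is kept. A quick check (the helper name `num_batches` is just for illustration):

```python
import math

BATCH_SIZE = 64

def num_batches(n_files, batch_size=BATCH_SIZE):
    """Number of batches when the final, smaller batch is not dropped."""
    return math.ceil(n_files / batch_size)

for name, n_files in [("train_ds", 20000), ("val_ds", 5000), ("test_ds", 25000)]:
    print(f"{name}: {num_batches(n_files)} batches")
```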


In [10]:
# Preview some samples
for text_batch, label_batch in train_ds.take(1):
    for i in range(3):
        print(f'Review {i + 1}: {text_batch.numpy()[i]}')
        print(f'Label {i + 1}: {label_batch.numpy()[i]}')

Review 1: b'An illegal immigrant resists the social support system causing dire consequences for many. Well filmed and acted even though the story is a bit forced, yet the slow pacing really sets off the conclusion. The feeling of being lost in the big city is effectively conveyed. The little person lost in the big society is something to which we can all relate, but I cannot endorse going out of your way to see this movie.'
Label 1: 0
Review 2: b"To get in touch with the beauty of this film pay close attention to the sound track, not only the music, but the way all sounds help to weave the imagery. How beautifully the opening scene leading to the expulsion of Gino establishes the theme of moral ambiguity! Note the way music introduces the characters as we are led inside Giovanna's marriage. Don't expect to find much here of the political life of Italy in 1943. That's not what this is about. On the other hand, if you are susceptible to the music of images and sounds, you will be led in

## Data Preparation

### Standardizing the data

In [11]:
# Convert text to lowercase
train_ds = train_ds.map(lambda x, y: (tf.strings.lower(x), y))
val_ds = val_ds.map(lambda x, y: (tf.strings.lower(x), y))
test_ds = test_ds.map(lambda x, y: (tf.strings.lower(x), y))

In [12]:
# Visualize some samples
for text_batch, label_batch in train_ds.take(1):
    for i in range(3):
        print(f'Review : {text_batch.numpy()[i]}')
        print(f'Label : {label_batch.numpy()[i]}')

Label : 1
Review : b"a different look at horror. the styling differences between american and russian films is interesting. however from my american perspective this movie just wasn't that good. the protagonist, marie played by anastasia hille wasn't a pleasant character and i had a hard time identifying with her. she was disagreeable most of the time and confused for much of what little time was left. also too much time was spent in bringing her to the main location of the film. then a long time passed before any real suspense built up. once that happened it seemed volume was used as the main effect which was more annoying than anything else. the concept was more original than most direct-to-video movies and they didn't use sex to make up for a thin plot. all in all i'd recommend it for renting, but not for theater goers."
Label : 0
Review : b'the movie "atlantis: the lost empire" is a shining gem in the rubble of films produced by the disney studios recently. parents who have had to 

### Tokenizing the data

In [13]:
def train_word_piece(ds, vocab_size, reserved_tokens):
    word_piece_ds = ds.unbatch().map(lambda x, y: x)
    vocab = keras_nlp.tokenizers.compute_word_piece_vocabulary(
        word_piece_ds.batch(1000).prefetch(2),
        vocabulary_size=vocab_size,
        reserved_tokens=reserved_tokens,
    )
    return vocab

In [14]:
# Every vocabulary has a few special, reserved tokens : "[PAD]" - Padding token, "[UNK]" - Unknown token
reserved_tokens = ["[PAD]", "[UNK]"]

# Train the WordPiece vocabulary on the training split
vocab = train_word_piece(train_ds, VOCAB_SIZE, reserved_tokens)

# Print some tokens
print(f'Tokens: {vocab[100:110]}')

Tokens: ['à', 'á', 'â', 'ã', 'ä', 'å', 'æ', 'ç', 'è', 'é']


In [15]:
# Define the tokenizer
tokenizer = keras_nlp.tokenizers.WordPieceTokenizer(
    vocabulary=vocab,
    sequence_length=MAX_SEQUENCE_LENGTH,
    lowercase=False,
)
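
For intuition about what a WordPiece tokenizer does with a single word, here is a toy sketch of greedy longest-match-first splitting. It is a simplified illustration, not the KerasNLP implementation; the function name `wordpiece_tokenize` and the tiny vocabulary are made up, and `##` is assumed as the marker for word-internal pieces.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]", prefix="##"):
    """Split one word into the longest vocabulary pieces, scanning left to right."""
    tokens, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = prefix + candidate  # mark word-internal pieces
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk_token]  # no piece matches: the whole word is unknown
        tokens.append(piece)
        start = end
    return tokens

# Hypothetical toy vocabulary.
toy_vocab = {"[PAD]", "[UNK]", "un", "##watch", "##able", "movie", "##s"}
print(wordpiece_tokenize("unwatchable", toy_vocab))  # ['un', '##watch', '##able']
print(wordpiece_tokenize("movies", toy_vocab))       # ['movie', '##s']
print(wordpiece_tokenize("xyz", toy_vocab))          # ['[UNK]']
```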

In [16]:
# Let's try and tokenize a sample from our dataset
input_sentence_ex = train_ds.take(1).get_single_element()[0][0]
input_tokens_ex = tokenizer(input_sentence_ex)

print(f'Input sentence: {input_sentence_ex}')
print(f'Tokens: {input_tokens_ex}')
print(f'Recovered text after detokenizing: {tokenizer.detokenize(input_tokens_ex)}')

Input sentence: b"prot\xc3\xa9g\xc3\xa9 runs in a linear fashion; expect no fast-paced action, and neither will you find yourself with baited breath because there are simply no seating-on-the-edge moments.<br /><br />there is not much of a crux, so don't expect one either. i would not fault the acting - the show would have been much worst if not for wu's acting which was the film's only saving grace. and, oh that cute little girl too.<br /><br />the humour is at best, weak, and the show must as well pass off as an anti-drug campaign which employs the usual shock-tactic (esp in the scenes with zhang) to tell us stuff that we already know - i.e. drugs break up families, heroin drives you crazy, it is not so easy to wean off, you will fall into a vicious cycle.<br /><br />i know it may seem all a little harsh, but i feel that the show is far from seamless and somewhat patchy (*spoiler alert*: take for example when andy lau got brought to the police station: what? we were just told 'oh we 

### Formatting the dataset

In [17]:
# Tokenize the text
def format_dataset(sentence, label):
    sentence = tokenizer(sentence)
    return ({'input_ids': sentence}, label)

def make_dataset_(dataset):
    dataset = dataset.map(format_dataset, num_parallel_calls=tf.data.AUTOTUNE)
    # Cache the tokenized batches first, then reshuffle each epoch and prefetch
    return dataset.cache().shuffle(512).prefetch(tf.data.AUTOTUNE)

train_ds = make_dataset_(train_ds)
val_ds = make_dataset_(val_ds)
test_ds = make_dataset_(test_ds)
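
Because the tokenizer was created with a fixed `sequence_length`, every review ends up as exactly `MAX_SEQUENCE_LENGTH` token ids. The sketch below shows the idea on plain Python lists; it assumes the padding id is 0 (the index of `[PAD]`, the first reserved token), and `pad_or_truncate` is a hypothetical helper, not a KerasNLP function.

```python
def pad_or_truncate(token_ids, sequence_length, pad_id=0):
    """Return exactly `sequence_length` ids: cut long inputs, pad short ones."""
    clipped = list(token_ids[:sequence_length])
    return clipped + [pad_id] * (sequence_length - len(clipped))

print(pad_or_truncate([17, 42, 8], 5))             # [17, 42, 8, 0, 0]
print(pad_or_truncate([17, 42, 8, 99, 3, 61], 5))  # [17, 42, 8, 99, 3]
```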

## Building the FNet model
![](https://blog-assets.freshworks.com/freshworks/wp-content/uploads/2023/10/25070756/i-attention-is-not_inline-1_812x400.png)

In [18]:
input_ids = keras.Input(shape=(None,), dtype='int64', name='input_ids')

x = keras_nlp.layers.TokenAndPositionEmbedding(
    vocabulary_size=VOCAB_SIZE,
    sequence_length=MAX_SEQUENCE_LENGTH,
    embedding_dim=EMBED_DIM,
    mask_zero=True,
)(input_ids)

x = keras_nlp.layers.FNetEncoder(intermediate_dim=INTERMEDIATE_DIM)(x)
x = keras_nlp.layers.FNetEncoder(intermediate_dim=INTERMEDIATE_DIM)(x)
x = keras_nlp.layers.FNetEncoder(intermediate_dim=INTERMEDIATE_DIM)(x)

x = keras.layers.GlobalAveragePooling1D()(x)
x = keras.layers.Dropout(0.1)(x)

outputs = keras.layers.Dense(1, activation='sigmoid')(x)

fnet_classifier = keras.Model(inputs=input_ids, outputs=outputs, name='fnet_classifier')




## Training our model

In [25]:
fnet_classifier.summary()

fnet_classifier.compile(
    loss='binary_crossentropy',
    optimizer=keras.optimizers.Adam(learning_rate=0.001),
    metrics=['accuracy']
)

history = fnet_classifier.fit(
    train_ds,
    validation_data=val_ds,
    epochs=5,
)

Epoch 1/5
313/313 ━━━━━━━━━━━━━━━━━━━━ 29s 66ms/step - accuracy: 0.9800 - loss: 0.0564 - val_accuracy: 0.8500 - val_loss: 0.5483
Epoch 2/5
313/313 ━━━━━━━━━━━━━━━━━━━━ 14s 45ms/step - accuracy: 0.9839 - loss: 0.0449 - val_accuracy: 0.8604 - val_loss: 0.5723
Epoch 3/5
313/313 ━━━━━━━━━━━━━━━━━━━━ 14s 44ms/step - accuracy: 0.9938 - loss: 0.0196 - val_accuracy: 0.8622 - val_loss: 0.5803
Epoch 4/5
313/313 ━━━━━━━━━━━━━━━━━━━━ 20s 44ms/step - accuracy: 0.9976 - loss: 0.0103 - val_accuracy: 0.8510 - val_loss: 0.6902
Epoch 5/5
313/313 ━━━━━━━━━━━━━━━━━━━━ 21s 44ms/step - accuracy: 0.9984 - loss: 0.0079 - val_accuracy: 0.8438 - val_loss: 0.7566


In [26]:
# Calculate the test accuracy.
# test_ds is already batched, so batch_size is not passed here
test_loss, test_acc = fnet_classifier.evaluate(test_ds)

print(f'Test accuracy: {test_acc}')

391/391 ━━━━━━━━━━━━━━━━━━━━ 7s 18ms/step - accuracy: 0.8309 - loss: 0.8576
Test accuracy: 0.8307600021362305
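
In the FNet training log, validation loss keeps rising while validation accuracy stays in the mid-80s. The sketch below shows how that can happen, assuming the usual definitions: binary cross-entropy averages the negative log-probability of the true label, and accuracy thresholds the sigmoid output at 0.5. The labels and probabilities are made-up toy values.

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean negative log-likelihood of the true labels."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of predictions on the correct side of the threshold."""
    hits = sum(int(p > threshold) == y for y, p in zip(y_true, y_prob))
    return hits / len(y_true)

labels = [1, 0, 1, 0]
mild = [0.9, 0.2, 0.4, 0.6]             # two mistakes, made with low confidence
overconfident = [0.99, 0.01, 0.02, 0.98]  # the same two mistakes, made confidently

for name, probs in [("mild", mild), ("overconfident", overconfident)]:
    print(name, binary_accuracy(labels, probs), round(binary_cross_entropy(labels, probs), 4))
```

Both sets of predictions have the same accuracy, but the overconfident one has a much higher loss, which is the typical signature of a model that has started to overfit.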


## Comparison with Transformer model

In [None]:
# We set the number of heads to 2 for Transformer classifier model
NUM_HEADS = 2

input_ids = keras.Input(shape=(None,), dtype='int64', name='input_ids')

x = keras_nlp.layers.TokenAndPositionEmbedding(
    vocabulary_size=VOCAB_SIZE,
    sequence_length=MAX_SEQUENCE_LENGTH,
    embedding_dim=EMBED_DIM,
    mask_zero=True,
)(input_ids)

x = keras_nlp.layers.TransformerEncoder(intermediate_dim=INTERMEDIATE_DIM, num_heads=NUM_HEADS)(x)
x = keras_nlp.layers.TransformerEncoder(intermediate_dim=INTERMEDIATE_DIM, num_heads=NUM_HEADS)(x)
x = keras_nlp.layers.TransformerEncoder(intermediate_dim=INTERMEDIATE_DIM, num_heads=NUM_HEADS)(x)

x = keras.layers.GlobalAveragePooling1D()(x)
x = keras.layers.Dropout(0.1)(x)

outputs = keras.layers.Dense(1, activation='sigmoid')(x)

transformer_classifier = keras.Model(inputs=input_ids, outputs=outputs, name='transformer_classifier')
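
The two classifiers differ only in the mixing sublayer, so the difference in parameter count should come entirely from the attention projections. The following back-of-the-envelope count is a sketch under assumptions: dense layers carry biases, each layer normalization has a scale and a shift vector, the attention block consists of four `EMBED_DIM x EMBED_DIM` projections (query, key, value, output), and the Fourier sublayer has no weights. The authoritative numbers are whatever `summary()` reports for each model.

```python
VOCAB_SIZE, MAX_SEQUENCE_LENGTH = 15000, 512
EMBED_DIM, INTERMEDIATE_DIM = 128, 512
NUM_LAYERS = 3

def dense_params(in_dim, out_dim):
    """Weights plus biases of a fully connected layer."""
    return in_dim * out_dim + out_dim

# Token embeddings plus learned position embeddings.
embedding = (VOCAB_SIZE + MAX_SEQUENCE_LENGTH) * EMBED_DIM
# Position-wise feed-forward network shared by both encoder types.
feed_forward = dense_params(EMBED_DIM, INTERMEDIATE_DIM) + dense_params(INTERMEDIATE_DIM, EMBED_DIM)
# Two layer norms per encoder layer, each with scale and shift vectors.
layer_norms = 2 * (2 * EMBED_DIM)
# Final sigmoid classification head.
head = dense_params(EMBED_DIM, 1)
# Query, key, value and output projections of multi-head attention.
attention = 4 * dense_params(EMBED_DIM, EMBED_DIM)

fnet_total = embedding + NUM_LAYERS * (feed_forward + layer_norms) + head
transformer_total = embedding + NUM_LAYERS * (attention + feed_forward + layer_norms) + head

print(f"FNet classifier:        {fnet_total:,}")
print(f"Transformer classifier: {transformer_total:,}")
print(f"Attention overhead:     {transformer_total - fnet_total:,}")
```

Under these assumptions the Transformer total comes to 2,580,481, and the gap between the two models is exactly three layers of attention projections.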


In [22]:
transformer_classifier.summary()

transformer_classifier.compile(
    loss='binary_crossentropy',
    optimizer=keras.optimizers.Adam(learning_rate=0.001),
    metrics=['accuracy']
)

history = transformer_classifier.fit(
    train_ds,
    validation_data=val_ds,
    epochs=5,
)

Epoch 1/5
313/313 ━━━━━━━━━━━━━━━━━━━━ 59s 140ms/step - accuracy: 0.6644 - loss: 0.6374 - val_accuracy: 0.8788 - val_loss: 0.2999
Epoch 2/5
313/313 ━━━━━━━━━━━━━━━━━━━━ 32s 102ms/step - accuracy: 0.9099 - loss: 0.2343 - val_accuracy: 0.8856 - val_loss: 0.3019
Epoch 3/5
313/313 ━━━━━━━━━━━━━━━━━━━━ 32s 101ms/step - accuracy: 0.9423 - loss: 0.1549 - val_accuracy: 0.8832 - val_loss: 0.3421
Epoch 4/5
313/313 ━━━━━━━━━━━━━━━━━━━━ 34s 110ms/step - accuracy: 0.9599 - loss: 0.1131 - val_accuracy: 0.8556 - val_loss: 0.4909
Epoch 5/5
313/313 ━━━━━━━━━━━━━━━━━━━━ 39s 103ms/step - accuracy: 0.9654 - loss: 0.0918 - val_accuracy: 0.8750 - val_loss: 0.4979


In [23]:
# Calculate the test accuracy.
# test_ds is already batched, so batch_size is not passed here
test_loss, test_acc = transformer_classifier.evaluate(test_ds)

print(f'Test accuracy: {test_acc}')

391/391 ━━━━━━━━━━━━━━━━━━━━ 16s 40ms/step - accuracy: 0.8478 - loss: 0.6000
Test accuracy: 0.8485599756240845


## Let's make a table and compare the two models



|                         | **FNet Classifier** | **Transformer Classifier** |
|:-----------------------:|:-------------------:|:--------------------------:|
|    **Training Time**    |      98 seconds     |         196 seconds        |
|    **Train Accuracy**   |        99.84%       |           96.54%           |
| **Validation Accuracy** |        84.38%       |           87.50%           |
|    **Test Accuracy**    |        83.07%       |           84.85%           |
|       **#Params**       |      2,382,337      |          2,580,481         |
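
The training times in the table are the sums of the per-epoch wall-clock times printed in the two training logs. A short check of that arithmetic and of the resulting speed-up:

```python
# Per-epoch times in seconds, copied from the training logs above.
fnet_epoch_seconds = [29, 14, 14, 20, 21]
transformer_epoch_seconds = [59, 32, 32, 34, 39]

fnet_total = sum(fnet_epoch_seconds)
transformer_total = sum(transformer_epoch_seconds)
speedup = transformer_total / fnet_total

print(f"FNet: {fnet_total} s, Transformer: {transformer_total} s, speed-up: {speedup:.1f}x")
```

On this run the FNet classifier trained about twice as fast; the exact ratio will vary with hardware and sequence length.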