# **FINE-TUNING BERT FOR SENTIMENT ANALYSIS ON IMDB MOVIE REVIEWS: A METHODOLOGICAL APPROACH**

**1. Introduction**
Large language models (LLMs), including BERT, continue to reshape natural language processing (NLP) and, more broadly, natural language understanding. BERT, developed by Google, uses a transformer architecture to pre-train deep bidirectional representations (Roccabruna et al., 2022). This means that it conditions on the context to both the left and the right of a word, which improves its ability to capture complex contextual meaning. The focus of this task is sentiment analysis on the IMDb movie reviews dataset: a BERT-based model is fine-tuned to read each review and classify its sentiment as positive or negative.



**2. Literature Review**
Prottasha et al. (2022) observe that the amount of information users share through a variety of sources has grown with Internet access (Nguyen et al., 2020). It is therefore unsurprising that the resulting plurality of life experiences and worldviews is valuable for sentiment analysis. However, the primary problem facing the Bangla natural language processing (NLP) community, including sentiment analysis, is the absence of labeled data (Geetha and Renuka, 2021). Much of the Bangla literature relies on word-embedding methods such as Word2Vec, GloVe, and fastText, which produce context-free word vectors: each word receives a single vector regardless of the sentence in which it appears. NLP has improved dramatically in recent years, largely thanks to pre-trained language models such as BERT.
Nugroho et al. (2021) note that user feedback is one of the key measures of a mobile app's effectiveness (Pota et al., 2022). Textual user reviews are unstructured and highly complex, which makes them challenging for sentiment analysis (Li et al., 2021). Earlier methods do not account for the contextual information found in reviews. Furthermore, with only a small amount of data, models tend to overfit (Pota et al., 2020). BERT is a transfer learning approach that addresses this lack of contextual representation (Qasim et al., 2022). Their work fine-tunes two different pre-trained models to investigate the efficacy of fine-tuning BERT for sentiment analysis.


**3. Methodology**
This assignment provides an opportunity to use and fine-tune a BERT model for sentiment classification. The process consists of the following key steps:
1.	Dataset Preparation: The IMDb dataset is used, which contains movie reviews labeled as positive or negative. This makes it well suited to binary sentiment classification.
2.	Tokenization: Tokenization transforms raw text into a form that the model accepts (Nguyen et al., 2020). The tokenizer splits each review into tokens and truncates it so that it fits the input specification of the model.
3.	Model Configuration: A pre-trained BERT transformer is loaded with a sequence classification head. The model fits the sentiment analysis problem because the head is configured with two output labels: positive and negative.
4.	Training: The model is trained on a sample of the IMDb training split and validated on a sample of the test split (100 reviews each in the notebook shown here). Hyperparameters such as the learning rate, the batch size, and the number of epochs are set before training with the aim of obtaining the best model.
5.	Evaluation: The final step is to analyze the trained model's performance on data that it has not previously encountered. Measures such as accuracy and loss indicate how well the model generalizes to unseen data.
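
The tokenization step above can be illustrated without downloading any model. The sketch below is a simplified, pure-Python version of the greedy longest-match-first splitting that WordPiece tokenizers such as BERT's apply to each word. The vocabulary `TOY_VOCAB` and the helper names are hypothetical and chosen only for illustration; the real bert-base-uncased vocabulary has roughly 30,000 entries and the real tokenizer also handles punctuation and normalization.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Split one lowercase word into sub-word pieces, longest match first."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is found in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a ## prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk_token]  # no piece matched: the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary, for illustration only.
TOY_VOCAB = {"un", "##watch", "##able", "movie", "great", "[UNK]"}


def encode(text, vocab=TOY_VOCAB):
    """Lowercase, split on whitespace, and wrap the pieces in [CLS] ... [SEP]."""
    tokens = ["[CLS]"]
    for word in text.lower().split():
        tokens.extend(wordpiece_tokenize(word, vocab))
    tokens.append("[SEP]")
    return tokens


print(encode("Unwatchable movie"))
# ['[CLS]', 'un', '##watch', '##able', 'movie', '[SEP]']
```

Splitting rare words into known pieces is what lets a fixed vocabulary cover text it has never seen in full-word form.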


In [None]:
!pip install torch transformers datasets



In [None]:
# Import necessary libraries
import torch
from datasets import load_dataset
from transformers import BertTokenizer, BertForSequenceClassification, Trainer, TrainingArguments
from transformers import DataCollatorWithPadding, pipeline
# Import accuracy_score from scikit-learn for metrics calculation
from sklearn.metrics import accuracy_score
import numpy as np

In [None]:
from datasets import load_dataset

# Load the IMDb dataset from the Hugging Face Hub
ds = load_dataset("stanfordnlp/imdb")

**4. Results and Discussion**
**4.1 BERT Tokenization and Model Preparation**
The BERT tokenizer is applied to the dataset through tokenize_function, which also truncates each review to a maximum length (Geetha and Renuka, 2021). DataCollatorWithPadding is then used to pad each batch dynamically. The BERT model for sequence classification is loaded from the pre-trained bert-base-uncased checkpoint with a newly initialized classification head for two classes: positive and negative. Subsets of the training and test splits are used to train and assess the model, respectively.

In [None]:
# Sample only 100 rows from the shuffled training and test sets
train_subset = ds["train"].shuffle(seed=42).select(range(100))
test_subset = ds["test"].shuffle(seed=42).select(range(100))

# Load the BERT tokenizer
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Tokenize a batch of reviews, truncating each one to at most 256 tokens.
# Padding is left to the data collator so that every batch is padded only
# to the length of its longest sequence (dynamic padding).
def tokenize_function(example):
    return tokenizer(example['text'], truncation=True, max_length=256)

# Apply the tokenizer to both subsets
tokenized_train = train_subset.map(tokenize_function, batched=True)
tokenized_test = test_subset.map(tokenize_function, batched=True)

# Data Collator for dynamic padding
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

# Load the BERT model for sequence classification (2 classes: positive, negative)
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
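
The difference between padding every review to a fixed maximum length and the dynamic padding described in section 4.1 can be sketched in plain Python. The token ids in `toy_batch` are made-up placeholders and the helper functions are hypothetical; the sketch also ignores special tokens, which the real tokenizer keeps in place when truncating. Only the padding id 0 reflects the bert-base-uncased vocabulary.

```python
PAD_ID = 0  # id of the [PAD] token in the bert-base-uncased vocabulary


def truncate(batch, max_length):
    """Cut every sequence down to at most max_length token ids."""
    return [seq[:max_length] for seq in batch]


def pad_to_max_length(batch, max_length):
    """Fixed padding: every sequence is extended to exactly max_length."""
    return [seq + [PAD_ID] * (max_length - len(seq))
            for seq in truncate(batch, max_length)]


def pad_to_longest(batch, max_length):
    """Dynamic padding: extend only to the longest sequence in this batch."""
    truncated = truncate(batch, max_length)
    longest = max(len(seq) for seq in truncated)
    return [seq + [PAD_ID] * (longest - len(seq)) for seq in truncated]


# Placeholder token ids, not real vocabulary entries.
toy_batch = [[11, 12, 13], [21, 22, 23, 24, 25]]

fixed = pad_to_max_length(toy_batch, max_length=8)
dynamic = pad_to_longest(toy_batch, max_length=8)

print(fixed)    # [[11, 12, 13, 0, 0, 0, 0, 0], [21, 22, 23, 24, 25, 0, 0, 0]]
print(dynamic)  # [[11, 12, 13, 0, 0], [21, 22, 23, 24, 25]]
```

With fixed padding this toy batch carries eight padding positions, whereas dynamic padding needs only two, which is why a data collator that pads per batch can reduce wasted computation when most reviews are shorter than the maximum length.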


In [None]:
# Define a compute_metrics function
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    accuracy = accuracy_score(labels, predictions)
    return {"accuracy": accuracy}
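
As a quick check of what compute_metrics does, the following pure-Python sketch reproduces the same argmax-then-accuracy logic on a handful of made-up logits. The function name `argmax_accuracy` and the toy values are illustrative assumptions and are not part of the notebook.

```python
def argmax_accuracy(logits, labels):
    """Pick the highest-scoring class per row and compare with the labels."""
    predictions = [max(range(len(row)), key=row.__getitem__) for row in logits]
    correct = sum(int(pred == label) for pred, label in zip(predictions, labels))
    return predictions, correct / len(labels)


# Made-up logits for four reviews; column 0 = negative, column 1 = positive.
toy_logits = [[1.2, -0.3], [0.1, 0.4], [-0.5, 0.9], [0.8, 0.2]]
toy_labels = [0, 1, 0, 0]

predictions, accuracy = argmax_accuracy(toy_logits, toy_labels)
print(predictions)  # [0, 1, 1, 0]
print(accuracy)     # 0.75
```

Three of the four toy predictions match their labels, so the accuracy is 0.75. The Trainer reports the same quantity under the key eval_accuracy.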


**4.2 Configuration of Training Parameters**
The TrainingArguments class holds the training configuration, such as the output directory for results and the directory for logs. The hyperparameters are as follows: the model is evaluated at the end of each epoch, the learning rate is set to 2e-5, and batches of 2 samples are used for both training and evaluation (Azhar and Khodra, 2020). The model is trained for 1 epoch with a weight decay of 0.01 to reduce overfitting, and the training loss is logged every 5 steps.


In [None]:
# Define training arguments (compute_metrics is passed to the Trainer, not here)
training_args = TrainingArguments(
    output_dir="./results",          # output directory
    evaluation_strategy="epoch",     # evaluate every epoch
    learning_rate=2e-5,              # learning rate
    per_device_train_batch_size=2,   # batch size for training
    per_device_eval_batch_size=2,    # batch size for evaluation
    num_train_epochs=1,              # number of training epochs
    weight_decay=0.01,               # strength of weight decay
    logging_dir="./logs",            # directory for storing logs
    logging_steps=5,                 # log the training loss every 5 steps
)

# Initialize the Trainer with the model, datasets, and metric function
trainer = Trainer(
    model=model,                         # the instantiated 🤗 Transformers model to be trained
    args=training_args,                  # training arguments, defined above
    train_dataset=tokenized_train,       # training dataset
    eval_dataset=tokenized_test,         # evaluation dataset
    tokenizer=tokenizer,                 # tokenizer for data preprocessing
    data_collator=data_collator,         # data collator for dynamic padding
    compute_metrics=compute_metrics     # Pass compute_metrics to the Trainer
)
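
To make the effect of these settings concrete, the arithmetic below works out how many optimizer steps the values in the TrainingArguments cell imply for a 100-review training subset. It is a sketch under stated assumptions: a single device, no gradient accumulation, and a learning rate that decays linearly to zero with no warmup. The helper `linear_lr` is hypothetical and only mimics that schedule.

```python
import math

# Values taken from the TrainingArguments cell and the 100-row training subset.
num_train_examples = 100
per_device_train_batch_size = 2
num_train_epochs = 1
logging_steps = 5
base_learning_rate = 2e-5

# Assumption: one device and no gradient accumulation.
steps_per_epoch = math.ceil(num_train_examples / per_device_train_batch_size)
total_steps = steps_per_epoch * num_train_epochs
num_log_entries = total_steps // logging_steps


def linear_lr(step, total, base_lr):
    """Learning rate after `step` updates under linear decay to zero, no warmup."""
    return base_lr * max(0.0, (total - step) / total)


print(steps_per_epoch, total_steps, num_log_entries)  # 50 50 10
print(linear_lr(0, total_steps, base_learning_rate))   # 2e-05
print(linear_lr(25, total_steps, base_learning_rate))  # 1e-05
print(linear_lr(50, total_steps, base_learning_rate))  # 0.0
```

With only 50 updates in total, the model sees each training review once, which is a very small budget for adapting a newly initialized classification head.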



**4.3 Trainer Initialization and Model Training**
The Trainer class takes as parameters the pre-trained BERT model, the training arguments, and the training and evaluation datasets (Pota et al., 2021). The tokenizer is passed to help with preprocessing, and the data collator handles dynamic padding. The model is then trained by calling the trainer.train() method, which runs training according to the training configuration.


In [None]:
# Train the model
trainer.train()

# Evaluate the model
results = trainer.evaluate()

print(f"Test Accuracy: {results['eval_accuracy']:.4f}")
print(f"Test Loss: {results['eval_loss']:.4f}")

# Save the fine-tuned model
model.save_pretrained("./fine-tuned-bert-imdb")
tokenizer.save_pretrained("./fine-tuned-bert-imdb")

| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | 0.8371 | 0.730052 | 0.4 |


('./fine-tuned-bert-imdb/tokenizer_config.json',
 './fine-tuned-bert-imdb/special_tokens_map.json',
 './fine-tuned-bert-imdb/vocab.txt',
 './fine-tuned-bert-imdb/added_tokens.json')
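
The reported validation loss and accuracy are easier to interpret next to two reference points: the cross-entropy of a classifier that always predicts 50/50 on a two-class problem, and the sampling uncertainty of an accuracy measured on only 100 reviews. The sketch below computes both. It assumes roughly balanced classes in the evaluation subset and uses a normal-approximation interval, so the figures are indicative rather than exact.

```python
import math

# Values copied from the training output table.
validation_loss = 0.730052
accuracy = 0.4
num_eval_examples = 100

# Cross-entropy of a two-class model that always outputs probability 0.5.
chance_loss = math.log(2)

# Normal-approximation standard error and 95% interval for the accuracy.
standard_error = math.sqrt(accuracy * (1 - accuracy) / num_eval_examples)
ci_low = accuracy - 1.96 * standard_error
ci_high = accuracy + 1.96 * standard_error

print(f"chance-level loss: {chance_loss:.3f}")        # chance-level loss: 0.693
print(f"standard error:    {standard_error:.3f}")     # standard error:    0.049
print(f"95% interval:      {ci_low:.3f} to {ci_high:.3f}")  # 95% interval:      0.304 to 0.496
print(validation_loss > chance_loss)                   # True
```

Under these assumptions the validation loss sits slightly above the chance-level value of about 0.693, which suggests that one epoch on 100 reviews has not yet taught the classification head to separate the two classes.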

**4.4 Model Evaluation, Saving, and Inference**
The model is first assessed using trainer.evaluate(), and the test accuracy and loss are printed. The fine-tuned model and tokenizer are then saved in the directory ./fine-tuned-bert-imdb. For inference, the sentiment-analysis pipeline function is used to load the saved model. The model is tested with example reviews, and each result is reported as a sentiment label together with a confidence score. Because no label names were configured on the model, the pipeline reports the generic names LABEL_0 and LABEL_1, which correspond to the negative and positive classes of the IMDb dataset, respectively.


In [None]:
# Load the model for inference
sentiment_analysis = pipeline("sentiment-analysis", model="./fine-tuned-bert-imdb", tokenizer=tokenizer)

# Test the model with some example reviews
examples = [
    "I absolutely loved this movie. The acting was great and the storyline was touching.",
    "This was the worst movie I've ever seen. It was a complete waste of time."
]

results = sentiment_analysis(examples)

for i, result in enumerate(results):
    print(f"Review: {examples[i]}")
    print(f"Sentiment: {result['label']}, Confidence: {result['score']:.4f}")
    print()

Review: I absolutely loved this movie. The acting was great and the storyline was touching.
Sentiment: LABEL_0, Confidence: 0.5890

Review: This was the worst movie I've ever seen. It was a complete waste of time.
Sentiment: LABEL_0, Confidence: 0.5755
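
The confidence printed by the pipeline is the softmax probability of the predicted class. The sketch below shows that conversion in plain Python and maps the class index to a readable name. The logits are made-up values chosen so that the result lands near the 0.59 seen above, and `ID2LABEL` is a hypothetical mapping that follows the IMDb convention of 0 for negative and 1 for positive.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical readable names; IMDb uses 0 = negative, 1 = positive.
ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}


def classify(logits):
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Made-up logits for one review, not taken from the trained model.
label, score = classify([0.18, -0.18])
print(label, round(score, 2))  # NEGATIVE 0.59
```

A score this close to 0.5 means the two logits are nearly equal, so a positive review labeled LABEL_0 with confidence 0.589 reflects an almost undecided model rather than a confident misclassification.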



**5. Conclusion**
Taken as a whole, this work illustrates how to fine-tune a BERT model for sentiment analysis (Qasim et al., 2022). The classifier builds on BERT's pre-trained representations to determine whether a text expresses positive or negative sentiment, and it is adjusted to the requirements of the task. This experiment demonstrates the applicability of LLMs in various NLP scenarios and emphasizes the significance of fine-tuning to customize models for particular tasks and datasets.


**References**
Azhar, A.N. and Khodra, M.L., 2020, September. Fine-tuning pretrained multilingual BERT model for Indonesian aspect-based sentiment analysis. In 2020 7th International Conference on Advance Informatics: Concepts, Theory and Applications (ICAICTA) (pp. 1-6). IEEE.
Geetha, M.P. and Renuka, D.K., 2021. Improving the performance of aspect based sentiment analysis using fine-tuned BERT Base Uncased model. International Journal of Intelligent Networks, 2, pp.64-69.
Li, X., Wang, X. and Liu, H., 2021, May. Research on fine-tuning strategy of sentiment analysis model based on BERT. In 2021 International Conference on Communications, Information System and Computer Engineering (CISCE) (pp. 798-802). IEEE.
Nguyen, Q.T., Nguyen, T.L., Luong, N.H. and Ngo, Q.H., 2020, November. Fine-tuning BERT for sentiment analysis of Vietnamese reviews. In 2020 7th NAFOSTED Conference on Information and Computer Science (NICS) (pp. 302-307). IEEE.
Nugroho, K.S., Sukmadewa, A.Y., Wuswilahaken DW, H., Bachtiar, F.A. and Yudistira, N., 2021, September. BERT fine-tuning for sentiment analysis on Indonesian mobile apps reviews. In Proceedings of the 6th International Conference on Sustainable Information Engineering and Technology (pp. 258-264).
Pota, M., Ventura, M., Catelli, R. and Esposito, M., 2020. An effective BERT-based pipeline for Twitter sentiment analysis: A case study in Italian. Sensors, 21(1), p.133.
Pota, M., Ventura, M., Fujita, H. and Esposito, M., 2021. Multilingual evaluation of pre-processing for BERT-based sentiment analysis of tweets. Expert Systems with Applications, 181, p.115119.
Prottasha, N.J., Sami, A.A., Kowsher, M., Murad, S.A., Bairagi, A.K., Masud, M. and Baz, M., 2022. Transfer learning for sentiment analysis using BERT based supervised fine-tuning. Sensors, 22(11), p.4157.
Qasim, R., Bangyal, W.H., Alqarni, M.A. and Ali Almazroi, A., 2022. A Fine‐Tuned BERT‐Based Transfer Learning Approach for Text Classification. Journal of healthcare engineering, 2022(1), p.3498123.
Roccabruna, G., Azzolin, S. and Riccardi, G., 2022. Multi-source multi-domain sentiment analysis with BERT-based models. In European Language Resources Association (pp. 581-589). European Language Resources Association.
