# IMDB movie review sentiment classification using Hugging Face models

In this notebook, we'll test pre-trained sentiment analysis models and then fine-tune a DistilBERT model to perform IMDB movie review sentiment classification. This notebook is adapted from [Getting Started with Sentiment Analysis using Python](https://huggingface.co/blog/sentiment-analysis-python).

First, we need to install several packages. 

In [1]:
!pip install datasets transformers huggingface_hub

Defaulting to user installation because normal site-packages is not writeable


## Import the libraries

In [2]:
import numpy as np
import torch
from datasets import load_dataset, load_metric
from huggingface_hub import notebook_login
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
    pipeline,
)

## Check if torch is using the GPU

In [4]:
print('Using PyTorch version:', torch.__version__)
if torch.cuda.is_available():
    print('Using GPU, device name:', torch.cuda.get_device_name(0))
    device = torch.device('cuda')
else:
    print('No GPU found, using CPU instead.') 
    device = torch.device('cpu')

Using PyTorch version: 2.4.1+rocm6.1
Using GPU, device name: AMD Instinct MI250X


## Use Pre-trained Sentiment Analysis Models

In [5]:
sentiment_pipeline = pipeline("sentiment-analysis", device=device)
data = ["I love you", "I hate you"]
sentiment_pipeline(data)

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'POSITIVE', 'score': 0.9998656511306763},
 {'label': 'NEGATIVE', 'score': 0.9991129040718079}]

- The code snippet above uses the **[pipeline](https://huggingface.co/docs/transformers/main_classes/pipelines)** class to generate predictions with models from the Hub. Because no model is named, it applies the [default sentiment analysis model](https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english?text=I+like+you.+I+love+you) to the provided list of texts.
- The results are **POSITIVE** for the first entry and **NEGATIVE** for the second entry, each with a score above 0.99.
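
To make the `label` and `score` fields less of a black box, here is a minimal sketch of how a two-class prediction can be derived from raw model outputs (logits): apply a softmax and report the most probable class. The logits and the `id2label` mapping below are made-up assumptions for illustration, not values read from the model.

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def to_prediction(logits, id2label):
    """Pick the most probable class and report its probability as the score."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}

# Hypothetical logits for one input; real values come from the model.
example_logits = [-4.2, 4.7]
# Assumed index-to-label mapping for a binary sentiment model.
id2label = {0: "NEGATIVE", 1: "POSITIVE"}

prediction = to_prediction(example_logits, id2label)
print(prediction)
```

A large gap between the two logits pushes the score close to 1, which is why clear-cut sentences such as the ones above receive scores above 0.99.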

### Use a specific sentiment analysis model

You can also use a specific sentiment analysis model by providing its model id. For example, if you want a sentiment analysis model for tweets, you can specify one like this:

In [6]:
specific_model = pipeline(model="finiteautomata/bertweet-base-sentiment-analysis", device=device)
specific_model(data)

[{'label': 'POS', 'score': 0.9916695356369019},
 {'label': 'NEG', 'score': 0.9806600213050842}]

## Fine-tuning DistilBERT model using IMDB dataset 

- The [IMDB](https://huggingface.co/datasets/stanfordnlp/imdb) dataset contains 50,000 movie reviews from the Internet Movie Database, split into 25,000 reviews for training and 25,000 reviews for testing. Half of the reviews are positive and half are negative.

- The IMDB dataset is relatively large, so to keep training fast, let's use 5,000 samples for training and 500 for testing:

In [7]:
imdb = load_dataset("imdb")
small_train_dataset = imdb["train"].shuffle(seed=0).select(range(5000))
small_test_dataset = imdb["test"].shuffle(seed=0).select(range(500))
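
The cell above shuffles before selecting, and the order matters. The sketch below uses stand-in labels rather than the real dataset, and assumes the split is stored sorted by label (an assumption worth checking on the real data). Under that assumption, taking the first 5,000 rows without shuffling would yield a single class, while shuffling first gives a roughly balanced subset. Python's `random` module is used here only for illustration; `Dataset.shuffle` uses its own random generator, so the selected rows will differ.

```python
import random
from collections import Counter

# Hypothetical stand-in for the 25,000 training labels, sorted by class.
labels = [0] * 12500 + [1] * 12500

# Without shuffling, the first 5,000 rows contain only one class.
unshuffled_counts = Counter(labels[:5000])

rng = random.Random(0)                 # fixed seed so the subset is reproducible
indices = list(range(len(labels)))
rng.shuffle(indices)                   # shuffle first ...
subset = [labels[i] for i in indices[:5000]]  # ... then keep the first 5,000

shuffled_counts = Counter(subset)
print("unshuffled:", dict(unshuffled_counts))
print("shuffled:  ", dict(shuffled_counts))
```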

To preprocess our data, we will use the DistilBERT tokenizer:

In [8]:
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

- Next, we tokenize both splits of our dataset (training and test) with the `map` method. Setting `truncation=True` cuts reviews that are longer than the model's maximum input length:

In [9]:
def preprocess_function(examples):
    return tokenizer(examples["text"], truncation=True)

tokenized_train = small_train_dataset.map(preprocess_function, batched=True)
tokenized_test = small_test_dataset.map(preprocess_function, batched=True)
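
To illustrate what truncation does, here is a simplified sketch that works on plain lists of token ids rather than calling the real tokenizer. It assumes a maximum length of 512 tokens and the ids 101 and 102 for the `[CLS]` and `[SEP]` special tokens; the function name and the token ids in the example are made up for illustration.

```python
def truncate_with_special_tokens(token_ids, max_length=512, cls_id=101, sep_id=102):
    """Keep at most max_length ids in total, including [CLS] and [SEP]."""
    body = token_ids[: max_length - 2]   # reserve two slots for the special tokens
    return [cls_id] + body + [sep_id]

long_review = list(range(1000, 1600))    # 600 made-up token ids
short_review = [2023, 3185, 2003, 2307]  # 4 made-up token ids

truncated_long = truncate_with_special_tokens(long_review)
truncated_short = truncate_with_special_tokens(short_review)
print(len(truncated_long), len(truncated_short))
```

Long reviews lose their tail, so any sentiment expressed only at the end of a very long review is invisible to the model. Short reviews are left untouched apart from the two special tokens.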

- To speed up training, let's use a data collator that converts the training samples to PyTorch tensors and pads each batch only to the length of its longest sequence, rather than padding every sample to the maximum length:

In [10]:
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
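
The idea behind padding per batch can be sketched in plain Python. This is not the collator's actual implementation: it returns lists instead of tensors, the function name is hypothetical, and the padding id of 0 is an assumption.

```python
def pad_batch(sequences, pad_id=0):
    """Pad token-id lists to the longest sequence in this batch."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        # 1 marks real tokens, 0 marks padding the model should ignore.
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}

# Two made-up tokenized reviews of different lengths.
batch = pad_batch([[101, 2023, 102], [101, 2023, 3185, 2003, 2307, 102]])
print(batch["input_ids"])
print(batch["attention_mask"])
```

Because the padded length depends only on the longest review in each batch, batches of short reviews stay small, which saves computation compared with padding everything to the model's maximum length.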

### Training the model
- We discard the pretraining head of the DistilBERT model and replace it with a newly initialized classification head, which we then fine-tune for sentiment analysis. This lets us transfer the knowledge stored in DistilBERT's pretrained weights to our custom model.

In [11]:
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
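
The warning above names the four newly initialized tensors: the weights and biases of `pre_classifier` and `classifier`. The NumPy sketch below shows roughly what such a head computes at inference time, assuming a hidden size of 768 and a linear layer, a ReLU, and a second linear layer applied to the first token's vector. The random inputs stand in for real DistilBERT outputs, and dropout is omitted, so treat this as an illustration rather than the library's code.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_dim, num_labels, seq_len = 768, 2, 12

# Stand-in for DistilBERT's output for one review: one vector per token.
hidden_states = rng.normal(size=(seq_len, hidden_dim))

# Newly initialized head parameters (random, hence the need for training).
pre_classifier_w = rng.normal(scale=0.02, size=(hidden_dim, hidden_dim))
pre_classifier_b = np.zeros(hidden_dim)
classifier_w = rng.normal(scale=0.02, size=(hidden_dim, num_labels))
classifier_b = np.zeros(num_labels)

cls_vector = hidden_states[0]  # the first token's vector represents the whole review
pooled = np.maximum(cls_vector @ pre_classifier_w + pre_classifier_b, 0.0)  # linear + ReLU
logits = pooled @ classifier_w + classifier_b  # one raw score per class
print(logits.shape, logits)
```

With random head weights the two logits carry no information about sentiment, which is why the model has to be trained before it is used for predictions.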


- Then, let's define the metrics we will use to evaluate how good our fine-tuned model is (accuracy and F1 score):

In [12]:
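# Load the metrics once instead of on every evaluation call.
# Note: load_metric is deprecated in recent versions of datasets;
# the separate evaluate library provides evaluate.load as its replacement.
accuracy_metric = load_metric("accuracy")
f1_metric = load_metric("f1")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    # The predicted class is the index of the largest logit.
    predictions = np.argmax(logits, axis=-1)
    accuracy = accuracy_metric.compute(predictions=predictions, references=labels)["accuracy"]
    f1 = f1_metric.compute(predictions=predictions, references=labels)["f1"]
    return {"accuracy": accuracy, "f1": f1}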
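
To see what the two metrics measure, the following sketch computes them by hand with NumPy on a toy example. It assumes binary labels with class 1 as the positive class; the function name, logits, and labels are made up for illustration.

```python
import numpy as np

def accuracy_and_f1(logits, labels):
    """Compute accuracy and binary F1 (positive class = 1) from logits."""
    predictions = np.argmax(logits, axis=-1)
    labels = np.asarray(labels)
    tp = int(np.sum((predictions == 1) & (labels == 1)))  # true positives
    fp = int(np.sum((predictions == 1) & (labels == 0)))  # false positives
    fn = int(np.sum((predictions == 0) & (labels == 1)))  # false negatives
    accuracy = float(np.mean(predictions == labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "f1": f1}

# Four toy examples: predicted classes are [0, 1, 0, 1].
toy_logits = np.array([[2.0, -1.0], [0.1, 0.9], [1.5, 0.2], [-0.3, 0.4]])
toy_labels = [0, 1, 1, 1]

metrics = accuracy_and_f1(toy_logits, toy_labels)
print(metrics)
```

Here three of four predictions are correct (accuracy 0.75), precision is 1 and recall is 2/3, giving an F1 score of 0.8. On a balanced dataset such as IMDB, accuracy and F1 usually end up close to each other.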

- Define the training arguments and create the `Trainer`

In [16]:
repo_name = "finetuning-sentiment-model-5000-samples"

training_args = TrainingArguments(
    output_dir=repo_name,            # directory where checkpoints are written
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=2,
    weight_decay=0.01,
    save_strategy="epoch",           # save a checkpoint after every epoch
    push_to_hub=False,               # keep the model local
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_test,
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

- Start training

In [17]:
trainer.train()

| Step | Training Loss |
|------|---------------|
| 500  | 0.1244        |


TrainOutput(global_step=626, training_loss=0.11648416519165039, metrics={'train_runtime': 90.4291, 'train_samples_per_second': 110.584, 'train_steps_per_second': 6.923, 'total_flos': 1313372861612160.0, 'train_loss': 0.11648416519165039, 'epoch': 2.0})
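
The `global_step=626` in the output can be reproduced with a little arithmetic. The sketch below assumes a single device, no gradient accumulation, and that the last, smaller batch of each epoch is kept; the function name is hypothetical.

```python
import math

def total_training_steps(num_samples, batch_size, num_epochs):
    """Number of optimizer steps when the final partial batch is kept."""
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return steps_per_epoch * num_epochs

steps_5k = total_training_steps(num_samples=5000, batch_size=16, num_epochs=2)
steps_10k = total_training_steps(num_samples=10000, batch_size=16, num_epochs=2)
print(steps_5k, steps_10k)
```

With 5,000 samples and a batch size of 16, each epoch takes 313 steps, so two epochs give 626 steps. The loss table has a single row, probably because the loss is logged every 500 steps by default and only step 500 falls within the 626 steps of this run. Doubling the training set to 10,000 samples would give 1,250 steps under the same assumptions.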

- Evaluate the model

In [18]:
trainer.evaluate()


{'eval_loss': 0.3359145224094391,
 'eval_accuracy': 0.918,
 'eval_f1': 0.9168356997971603,
 'eval_runtime': 3.2344,
 'eval_samples_per_second': 154.589,
 'eval_steps_per_second': 9.894,
 'epoch': 2.0}
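
Since the test subset has only 500 reviews, it helps to know how much the accuracy could vary by chance before comparing runs. The sketch below converts the reported `eval_accuracy` into a count of correct predictions and a rough standard error, using a simple binomial approximation (an assumption, not something the `Trainer` reports). The function name is hypothetical.

```python
import math

def accuracy_summary(accuracy, num_samples):
    """Return the number of correct predictions and a binomial standard error."""
    correct = round(accuracy * num_samples)
    std_err = math.sqrt(accuracy * (1 - accuracy) / num_samples)
    return correct, std_err

correct, std_err = accuracy_summary(accuracy=0.918, num_samples=500)
print(f"{correct} of 500 correct, standard error about {std_err:.4f}")
```

An accuracy of 0.918 corresponds to 459 correct predictions out of 500, with a standard error of roughly 0.012. Differences of about one percentage point between two runs are therefore within the noise of this small test subset.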

In [19]:
pipe = pipeline("text-classification", model=model, tokenizer=tokenizer, device=device)
pipe(["I love this move", "This movie sucks!"])

[{'label': 'LABEL_1', 'score': 0.9975378513336182},
 {'label': 'LABEL_0', 'score': 0.9927016496658325}]
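
The generic names `LABEL_0` and `LABEL_1` appear because the fine-tuned model was not given readable class names. In the IMDB dataset, label 0 is negative and label 1 is positive, so the output can be made readable with a small post-processing step. The helper below is a hypothetical sketch that operates on the pipeline output shown above.

```python
# Mapping from the generic pipeline labels to readable names (IMDB: 0 = negative, 1 = positive).
LABEL_NAMES = {"LABEL_0": "NEGATIVE", "LABEL_1": "POSITIVE"}

def rename_labels(predictions, names=LABEL_NAMES):
    """Return a copy of the predictions with readable label names."""
    return [{**p, "label": names.get(p["label"], p["label"])} for p in predictions]

# The pipeline output from the cell above.
raw = [
    {"label": "LABEL_1", "score": 0.9975378513336182},
    {"label": "LABEL_0", "score": 0.9927016496658325},
]

readable = rename_labels(raw)
print(readable)
```

Another option is to pass `id2label` and `label2id` dictionaries to `from_pretrained` when loading the model, so that the pipeline reports readable names directly.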

## Task 1: Increase the training set to 10,000 samples and see how it affects the eval_accuracy

In [None]:
imdb = load_dataset("imdb")
small_train_dataset = imdb["train"].shuffle(seed=0).select(range(10000))
small_test_dataset = imdb["test"].shuffle(seed=0).select(range(500))

## Task 2: Tune the hyperparameters, e.g., the learning rate, and see how this affects the eval_accuracy
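
One possible way to organize this task is to list the combinations to try before running anything. The sketch below only builds the list of configurations. The candidate values are arbitrary examples, not recommendations, and the variable names are hypothetical.

```python
from itertools import product

# Example candidate values; adjust to your compute budget.
learning_rates = [1e-5, 2e-5, 5e-5]
epoch_counts = [2, 3]

# One dictionary per combination, with keys named like the TrainingArguments parameters.
configs = [
    {"learning_rate": lr, "num_train_epochs": epochs}
    for lr, epochs in product(learning_rates, epoch_counts)
]

for config in configs:
    print(config)
```

Each dictionary could then be unpacked into `TrainingArguments` alongside the other arguments used earlier. Reload the pretrained model before every run so that each configuration starts from the same weights, and record the `eval_accuracy` of each run for comparison.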