# IMDB movie review sentiment classification using Hugging Face models

In this notebook, we first try out pre-trained sentiment analysis models and then fine-tune a DistilBERT model to classify the sentiment of IMDB movie reviews. This notebook is adapted from [Getting Started with Sentiment Analysis using Python](https://huggingface.co/blog/sentiment-analysis-python).

Import the libraries

In [None]:
from transformers import pipeline
import torch
from datasets import load_dataset
from transformers import AutoTokenizer
from transformers import DataCollatorWithPadding
from transformers import AutoModelForSequenceClassification
import numpy as np
# datasets.load_metric is deprecated and removed in recent `datasets` releases;
# evaluate.load (pip install evaluate) loads the same metrics.
from evaluate import load as load_metric
from huggingface_hub import notebook_login
from transformers import TrainingArguments, Trainer

Check if PyTorch is using the GPU

In [None]:
print('Using PyTorch version:', torch.__version__)
if torch.cuda.is_available():
    print('Using GPU, device name:', torch.cuda.get_device_name(0))
    device = torch.device('cuda')
else:
    print('No GPU found, using CPU instead.') 
    device = torch.device('cpu')

## Use Pre-trained Sentiment Analysis Models

In [None]:
sentiment_pipeline = pipeline("sentiment-analysis",device=device)
data = ["I love you", "I hate you"]
sentiment_pipeline(data)

- The code above uses the **[pipeline](https://huggingface.co/docs/transformers/main_classes/pipelines)** class to run a model from the Hub. Because no model is specified, it uses the [default sentiment analysis model](https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english) to classify each text in the list.
- The first entry is classified as **POSITIVE** and the second as **NEGATIVE**.
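The pipeline returns one dictionary per input, containing a `label` and a `score` (the model's confidence in that label). As a minimal sketch, the hypothetical helper below turns such output into a single signed score in [-1, 1], assuming the only labels are `POSITIVE` and `NEGATIVE`. The example scores are made up for illustration and are not real model outputs.

```python
def signed_sentiment(predictions):
    """Map pipeline-style predictions to signed scores.

    Hypothetical helper: assumes each prediction is a dict with a
    'label' of 'POSITIVE' or 'NEGATIVE' and a 'score' in [0, 1].
    """
    signed = []
    for pred in predictions:
        if pred["label"] == "POSITIVE":
            sign = 1.0
        elif pred["label"] == "NEGATIVE":
            sign = -1.0
        else:
            raise ValueError(f"unexpected label: {pred['label']}")
        signed.append(sign * pred["score"])
    return signed


# Made-up output in the same shape as sentiment_pipeline(data)
example_output = [
    {"label": "POSITIVE", "score": 0.99},
    {"label": "NEGATIVE", "score": 0.98},
]
print(signed_sentiment(example_output))  # [0.99, -0.98]
```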

You can also use a specific model by passing its model ID from the Hub. For example, to analyze tweets you can pick a sentiment model that was fine-tuned on tweets:

In [None]:
specific_model = pipeline(model="finiteautomata/bertweet-base-sentiment-analysis", device=device)
specific_model(data)

## Fine-tuning a DistilBERT model on the IMDB dataset

- The [IMDB](https://huggingface.co/datasets/stanfordnlp/imdb) dataset contains 50,000 movie reviews from the Internet Movie Database, split into 25,000 reviews for training and 25,000 reviews for testing. Each split is balanced: half of the reviews are positive and half are negative.

- Training on all 25,000 training reviews takes a while, so for this exercise we use a random subset of 5,000 training reviews. The full 25,000-review test split is still used for evaluation.

In [None]:
imdb = load_dataset("imdb")
# Shuffle with a fixed seed, then keep the first 5,000 rows
small_train_dataset = imdb["train"].shuffle(seed=0).select(range(5000))
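`shuffle(seed=0).select(...)` draws a reproducible random subset: it shuffles the row order with a fixed seed and then keeps the first 5,000 rows. The plain-Python sketch below mimics that idea on a hypothetical stand-in for the training split. It uses Python's `random` module, so it does not pick the same rows as 🤗 Datasets; it only illustrates the technique, plus a quick check of how balanced the labels in the subset are, since a random subset is only approximately 50/50.

```python
import random
from collections import Counter


def random_subset(examples, n, seed=0):
    """Return n examples drawn without replacement, reproducibly."""
    indices = list(range(len(examples)))
    random.Random(seed).shuffle(indices)  # seeded RNG, independent of global state
    return [examples[i] for i in indices[:n]]


# Hypothetical stand-in for the IMDB train split: 25,000 rows,
# first half negative (label 0), second half positive (label 1).
fake_train = [{"text": f"review {i}", "label": 0 if i < 12500 else 1}
              for i in range(25000)]

subset = random_subset(fake_train, 5000, seed=0)
print(len(subset))                            # 5000
print(Counter(ex["label"] for ex in subset))  # label counts in the subset
```

Shuffling before selecting matters: taking the first 5,000 rows of a split that happens to be sorted by label would give a subset containing only one class.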
test_dataset = imdb["test"]

To preprocess our data, we use the DistilBERT tokenizer:

In [None]:
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

- Next, we tokenize both the training subset and the test split with the `map` method. `truncation=True` cuts off reviews that exceed the model's maximum input length (512 tokens for DistilBERT):

In [None]:
def preprocess_function(examples):
    # Tokenize a batch of reviews, truncating those longer than the model's max length
    return tokenizer(examples["text"], truncation=True)

tokenized_train = small_train_dataset.map(preprocess_function, batched=True)
tokenized_test = test_dataset.map(preprocess_function, batched=True)
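To show what truncation does to a single review, here is a simplified sketch that works on token IDs that are already computed. It assumes BERT-style special tokens, with `[CLS]` = 101 and `[SEP]` = 102 as in the `bert-base-uncased` vocabulary that `distilbert-base-uncased` reuses, and it reserves two of the 512 positions for them. The function name is hypothetical; the real tokenizer also handles word-piece splitting and returns the same `input_ids` / `attention_mask` fields.

```python
def encode_with_truncation(token_ids, max_length=512, cls_id=101, sep_id=102):
    """Hypothetical sketch: add [CLS]/[SEP] and truncate to max_length tokens."""
    body = token_ids[: max_length - 2]  # leave room for the two special tokens
    input_ids = [cls_id] + body + [sep_id]
    attention_mask = [1] * len(input_ids)  # every real token is attended to
    return {"input_ids": input_ids, "attention_mask": attention_mask}


long_review = list(range(1000, 1600))  # 600 made-up token IDs
short_review = [2000, 2001]

long_enc = encode_with_truncation(long_review)
print(len(long_enc["input_ids"]))                        # 512
print(encode_with_truncation(short_review)["input_ids"])  # [101, 2000, 2001, 102]
```

Note that truncation keeps only the beginning of a long review; anything after the first ~510 word pieces is ignored by the model.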

- To speed up training, we use `DataCollatorWithPadding`. Instead of padding every review to one fixed length, it pads each batch only to the length of the longest sequence in that batch (dynamic padding) and converts the samples to PyTorch tensors:

In [None]:
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
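Dynamic padding can be sketched in plain Python as below. This is a simplified illustration, not the collator's actual code: it pads on the right with a pad ID of 0 (an assumption matching the `[PAD]` token of the uncased BERT vocabulary), sets the attention mask to 0 on padded positions so the model ignores them, and returns Python lists instead of tensors.

```python
def pad_batch(batch, pad_id=0):
    """Pad a list of encoded examples to the longest one in this batch."""
    max_len = max(len(ex["input_ids"]) for ex in batch)
    padded = {"input_ids": [], "attention_mask": []}
    for ex in batch:
        n_pad = max_len - len(ex["input_ids"])
        padded["input_ids"].append(ex["input_ids"] + [pad_id] * n_pad)
        # 0 marks padding positions the model should not attend to
        padded["attention_mask"].append(ex["attention_mask"] + [0] * n_pad)
    return padded


batch = [
    {"input_ids": [101, 7, 8, 102], "attention_mask": [1, 1, 1, 1]},
    {"input_ids": [101, 9, 102], "attention_mask": [1, 1, 1]},
]
out = pad_batch(batch)
print(out["input_ids"])       # [[101, 7, 8, 102], [101, 9, 102, 0]]
print(out["attention_mask"])  # [[1, 1, 1, 1], [1, 1, 1, 0]]
```

Because reviews vary a lot in length, padding per batch rather than to 512 tokens for every example avoids computing over many padding tokens.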

### Training the model
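- We discard DistilBERT's pretraining head and replace it with a new classification head with two outputs (negative and positive). The new head starts from random weights and is trained during fine-tuning, so the language knowledge in the pre-trained DistilBERT body is transferred to our classification task. This is why 🤗 Transformers warns that some weights are newly initialized when the model is loaded.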

In [None]:
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)
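To make the role of a classification head concrete, the toy sketch below maps a small vector (standing in for DistilBERT's 768-dimensional representation of the first token) to two logits with a single linear layer, then turns them into probabilities with a softmax. All numbers are made up. As far as we know, the actual DistilBERT head in 🤗 Transformers also has an extra hidden layer with ReLU and dropout before the final projection; the toy keeps only the final linear layer.

```python
import math


def linear(x, weights, bias):
    """y = W x + b for a single vector x (weights is a list of rows)."""
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy 4-dimensional "sentence representation" (made-up values)
hidden = [0.5, -1.0, 0.25, 2.0]
# Toy head: one row per label (0 = negative, 1 = positive)
W = [[0.1, 0.2, 0.0, -0.3],
     [-0.1, 0.0, 0.4, 0.5]]
b = [0.0, 0.1]

logits = linear(hidden, W, b)
probs = softmax(logits)
print(logits)
print(probs)
```

During fine-tuning, the entries of `W` and `b` (and the weights of the DistilBERT body) are updated so that the higher logit matches the review's label.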

- Next, we define the metrics used to evaluate the fine-tuned model: accuracy and F1 score.

In [None]:
# Load the metrics once instead of on every evaluation call
accuracy_metric = load_metric("accuracy")
f1_metric = load_metric("f1")


def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)  # predicted class per review
    accuracy = accuracy_metric.compute(predictions=predictions, references=labels)["accuracy"]
    f1 = f1_metric.compute(predictions=predictions, references=labels)["f1"]
    return {"accuracy": accuracy, "f1": f1}
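For binary labels, the `accuracy` and `f1` metrics compute the quantities below, with F1 measured for the positive class (label 1) by default. This plain-Python sketch, with hypothetical helper names and made-up logits, spells out the arithmetic: the argmax turns each row of logits into a predicted class, accuracy is the fraction of correct predictions, and F1 = 2TP / (2TP + FP + FN), which equals the harmonic mean of precision and recall.

```python
def argmax(row):
    """Index of the largest value (first one on ties, like np.argmax)."""
    return max(range(len(row)), key=lambda i: row[i])


def accuracy_and_f1(logits, labels, positive=1):
    """Accuracy and binary F1 for the `positive` class, from raw logits."""
    preds = [argmax(row) for row in logits]
    correct = sum(p == y for p, y in zip(preds, labels))
    tp = sum(p == positive and y == positive for p, y in zip(preds, labels))
    fp = sum(p == positive and y != positive for p, y in zip(preds, labels))
    fn = sum(p != positive and y == positive for p, y in zip(preds, labels))
    accuracy = correct / len(labels)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0  # 0.0 when no positives anywhere
    return {"accuracy": accuracy, "f1": f1}


logits = [[2.0, -1.0], [0.1, 0.3], [-0.5, 1.5], [1.2, 0.4]]  # made-up logits
labels = [0, 1, 0, 1]
print(accuracy_and_f1(logits, labels))  # {'accuracy': 0.5, 'f1': 0.5}
```

Here the predictions are `[0, 1, 1, 0]`: two are correct, with one true positive, one false positive and one false negative.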

- Define the training arguments and create the `Trainer`

In [None]:
repo_name = "finetuning-sentiment-model-5000-samples"

training_args = TrainingArguments(
    output_dir=repo_name,
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=2,
    weight_decay=0.01,
    save_strategy="epoch",  # save a checkpoint at the end of each epoch
    push_to_hub=False,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_test,
    tokenizer=tokenizer,  # recent transformers releases prefer processing_class=tokenizer
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)
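With these arguments (5,000 training examples, batch size 16, 2 epochs), the number of optimizer steps can be worked out in advance. The sketch below assumes a single device and no gradient accumulation. It models the learning rate as decaying linearly from `learning_rate` to zero with no warmup, which, as far as we know, is the `Trainer` default (`lr_scheduler_type="linear"`, `warmup_steps=0`). The helper names are hypothetical.

```python
import math


def total_training_steps(num_examples, batch_size, num_epochs):
    """Optimizer steps on one device without gradient accumulation.

    The last, smaller batch of each epoch still counts as a step.
    """
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * num_epochs


def linear_lr(step, total_steps, base_lr=2e-5, warmup_steps=0):
    """Linear warmup (if any), then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, remaining)


steps = total_training_steps(5000, 16, 2)
print(steps)                         # 626
print(linear_lr(0, steps))           # 2e-05
print(linear_lr(steps // 2, steps))  # 1e-05
```

So each epoch takes 313 steps (5000 / 16 = 312.5, rounded up), and halfway through training the learning rate has dropped to half its initial value.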

- Start training

In [None]:
trainer.train()

- Evaluate the model

In [None]:
trainer.evaluate()

- Model inference

In [None]:
pipe = pipeline("text-classification", model=model, tokenizer=tokenizer, device=device)
pipe(["I love this movie", "This movie sucks!"])
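Because the model was created with `num_labels=2` but without label names, the pipeline reports generic labels `LABEL_0` and `LABEL_1`. In the IMDB dataset, label 0 means negative and label 1 means positive, so a small post-processing step makes the output readable (another option is to pass `id2label` and `label2id` to `from_pretrained` when creating the model). The helper below is hypothetical and the scores are made up to show the output shape.

```python
ID2NAME = {"LABEL_0": "NEGATIVE", "LABEL_1": "POSITIVE"}


def readable(predictions, mapping=ID2NAME):
    """Replace generic LABEL_i names with human-readable ones."""
    # Unknown labels are passed through unchanged
    return [{**p, "label": mapping.get(p["label"], p["label"])} for p in predictions]


# Made-up output in the shape returned by pipe([...])
raw = [{"label": "LABEL_1", "score": 0.97}, {"label": "LABEL_0", "score": 0.95}]
print(readable(raw))
# [{'label': 'POSITIVE', 'score': 0.97}, {'label': 'NEGATIVE', 'score': 0.95}]
```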

## Task 1: Run this notebook on a GPU

## Task 2: Compare the test accuracy of the fine-tuned DistilBERT model with that of the previous RNN model. What do you notice?