<a href="https://colab.research.google.com/github/SaibalPatraDS/Hands-on-LLM/blob/main/Movie_Review_Sentiment_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## `BERT` + `Fine-Tuning` - Movie Review Sentiment Analysis

In [None]:
## installing necessary packages
!pip install datasets transformers

In [2]:
## Loading the data from HuggingFace
from datasets import load_dataset
review_data = load_dataset(
    "cornell-movie-review-data/rotten_tomatoes"
)
review_data

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 8530
    })
    validation: Dataset({
        features: ['text', 'label'],
        num_rows: 1066
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 1066
    })
})

In [3]:
review_data["train"][0,-1]

{'text': ['the rock is destined to be the 21st century\'s new " conan " and that he\'s going to make a splash even greater than arnold schwarzenegger , jean-claud van damme or steven segal .',
  'things really get weird , though not particularly scary : the movie is all portent and no content .'],
 'label': [1, 0]}

## Text Classification with Representation Model

### Using `Task` Specific Model

-- Pretrained Models

In [4]:
from transformers import pipeline
## defining model path
model_path = "cardiffnlp/twitter-roberta-base-sentiment-latest"

## defining pipeline
pipe = pipeline(
    model=model_path,
    tokenizer=model_path,
    return_all_scores=True,  # return scores for every label, not just the top one
    device="cuda"  # requires a GPU runtime
)

Flow of the solution:

1. Load the pre-trained model.
2. Pass the movie review data to the model.
3. Run inference with the Transformers pipeline and the pre-trained model.
4. Collect the classification output from the model.
5. Evaluate the results.

In [7]:
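## Running inference on the test split
import numpy as np
from tqdm import tqdm
from transformers.pipelines.pt_utils import KeyDataset

## Collecting binary predictions (0 = negative, 1 = positive)
y_pred = []
for output in tqdm(pipe(KeyDataset(review_data["test"], "text")), total=len(review_data["test"])):
  # the model returns three scores: negative (index 0), neutral (1), positive (2)
  negative_score = output[0]["score"]
  positive_score = output[2]["score"]
  # the neutral score is ignored; keep whichever of the two remaining scores is larger
  label = np.argmax([negative_score, positive_score])
  y_pred.append(label)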

Disabling tokenizer parallelism, we're using DataLoader multithreading already
100%|██████████| 1066/1066 [00:18<00:00, 59.12it/s]
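
The loop above reduces the model's three-way output (negative, neutral, positive) to a binary label by comparing only the negative and positive scores. Below is a minimal sketch of that step on hand-written scores. It assumes the pipeline returns, for each review, a list of `{"label", "score"}` dicts ordered negative, neutral, positive; the helper name `to_binary_label` and the sample scores are made up for illustration and are not real model output.

```python
def to_binary_label(scores):
    """Map three-class sentiment scores to 0 (negative) or 1 (positive).

    `scores` is assumed to be ordered negative, neutral, positive.
    The neutral score is ignored; a tie goes to the negative class,
    which matches np.argmax picking the first maximum.
    """
    negative_score = scores[0]["score"]
    positive_score = scores[2]["score"]
    return 1 if positive_score > negative_score else 0


# Hypothetical scores for two reviews (illustration only).
sample_outputs = [
    [
        {"label": "negative", "score": 0.10},
        {"label": "neutral", "score": 0.60},
        {"label": "positive", "score": 0.30},
    ],
    [
        {"label": "negative", "score": 0.55},
        {"label": "neutral", "score": 0.35},
        {"label": "positive", "score": 0.10},
    ],
]

binary_labels = [to_binary_label(scores) for scores in sample_outputs]
print(binary_labels)  # [1, 0]
```

In the first sample the neutral score is the highest, yet the review is still assigned to the positive class, because the neutral score never enters the comparison.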


In [12]:
## checking a single predicted label
# sum(y_pred)
y_pred[1]

1

### Evaluation of the Results

In [15]:
## Evaluation of Reviews
from sklearn.metrics import confusion_matrix, classification_report

## function for Classification Report
def evaluate_performance(y_true, y_pred):
  """
  Create and print the classification report.
  """
  performance = classification_report(
      y_true,y_pred,
      target_names = ["Negative Reviews", "Positive Reviews"]
  )
  print(performance)

In [16]:
## Printing the Classification Report
evaluate_performance(review_data["test"]["label"], y_pred)

                  precision    recall  f1-score   support

Negative Reviews       0.76      0.88      0.81       533
Positive Reviews       0.86      0.72      0.78       533

        accuracy                           0.80      1066
       macro avg       0.81      0.80      0.80      1066
    weighted avg       0.81      0.80      0.80      1066



### Conclusion:
The model performed reasonably well, reaching 80% accuracy on the test split.

1. The data is balanced (533 reviews per class), and `accuracy` reached 80% even though the model was trained on Twitter data rather than movie reviews.
2. The `f1-score` gap between the two classes is small (0.81 vs. 0.78), and an `f1-score` of about 0.8 is good enough for a model that was not fine-tuned on this data.
3. `precision` = `TP`/(`TP`+`FP`)
Precision indicates, out of all reviews classified as positive, how many were actually positive.
4. `recall` = `TP`/(`TP`+`FN`)
Recall indicates, out of all actually positive reviews, how many were classified as positive.
5. Scope for improvement -
    * The model struggles somewhat to detect positive reviews (recall of 0.72 for `Positive Reviews`).
    * Correspondingly, it classifies some positive reviews as negative (precision of 0.76 for `Negative Reviews`).
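
Precision (`TP`/(`TP`+`FP`)) and recall (`TP`/(`TP`+`FN`)) can be checked by hand on a small example. The sketch below uses plain Python and made-up labels, not the notebook's predictions; `precision_recall` is a hypothetical helper written for illustration, and `classification_report` reports the same quantities per class.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall), treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy labels (illustration only): 4 positive and 4 negative reviews.
y_true_toy = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred_toy = [1, 1, 0, 0, 0, 0, 0, 1]

# Positive class: TP = 2, FP = 1, FN = 2
pos_precision, pos_recall = precision_recall(y_true_toy, y_pred_toy, positive=1)

# Negative class: TP = 3, FP = 2, FN = 1
neg_precision, neg_recall = precision_recall(y_true_toy, y_pred_toy, positive=0)

print(f"positive: precision={pos_precision:.2f}, recall={pos_recall:.2f}")
print(f"negative: precision={neg_precision:.2f}, recall={neg_recall:.2f}")
```

The toy data shows the same pattern as the report: every positive review that is missed lowers recall for the positive class and, because it is counted as a negative prediction, also lowers precision for the negative class.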