## Testing the sentiment-roberta-large-english model on IMDB movie reviews

In [1]:
# data processing tools
import os
import csv
import urllib.request
import pandas as pd
from tqdm import tqdm

# Huggingface tools
from transformers import AutoTokenizer
from transformers import TFAutoModelForSequenceClassification #sequence classification
from transformers import pipeline, set_seed

#classification report
from sklearn.metrics import (confusion_matrix, 
                            classification_report)

2022-05-20 21:25:43.531618: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-05-20 21:25:43.531666: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.


In [2]:
#which model do we want to use?
MODEL = "siebert/sentiment-roberta-large-english" #model ID on the Hugging Face Hub

In [3]:
#initialize a pretrained tokenizer (to tokenize the texts)
tokenizer = AutoTokenizer.from_pretrained(MODEL)

In [4]:
#Initialize the model
model = TFAutoModelForSequenceClassification.from_pretrained(MODEL) 
#we use the TensorFlow version of the model rather than PyTorch, hence the TF prefix on the class (see the imports cell)

2022-05-20 21:25:51.618057: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2022-05-20 21:25:51.618106: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (j-72560-job-0): /proc/driver/nvidia/version does not exist
2022-05-20 21:25:51.618608: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
All model checkpoint layers were used when initializing TFRobertaForSequenceClassification.

All the layers of TFRobertaForSequenceClassification were initialized from the model checkpoint at siebert/sentiment-roberta-large-english.
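If your task is similar to the task the model of the checkpoint was trained on, you can already use TFRobertaForSequenceClassification for predictions without further training.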

In [5]:
#load the data from https://www.kaggle.com/datasets/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews
filename = os.path.join("..","..","cds-lang", "project", "IMDB.csv")
data = pd.read_csv(filename)

In [6]:
print(data)

                                                  review sentiment
0      One of the other reviewers has mentioned that ...  positive
1      A wonderful little production. <br /><br />The...  positive
2      I thought this was a wonderful way to spend ti...  positive
3      Basically there's a family where a little boy ...  negative
4      Petter Mattei's "Love in the Time of Money" is...  positive
...                                                  ...       ...
49995  I thought this movie did a down right good job...  positive
49996  Bad plot, bad dialogue, bad acting, idiotic di...  negative
49997  I am a Catholic taught in parochial elementary...  negative
49998  I'm going to have to disagree with the previou...  negative
49999  No one expects the Star Trek movies to be high...  negative

[50000 rows x 2 columns]


In [7]:
#convert the sentiment labels to integers: negative -> 0, positive -> 1
data['sentiment'] = data['sentiment'].map({'negative': 0, 'positive': 1})

In [8]:
print(data)

                                                  review  sentiment
0      One of the other reviewers has mentioned that ...          1
1      A wonderful little production. <br /><br />The...          1
2      I thought this was a wonderful way to spend ti...          1
3      Basically there's a family where a little boy ...          0
4      Petter Mattei's "Love in the Time of Money" is...          1
...                                                  ...        ...
49995  I thought this movie did a down right good job...          1
49996  Bad plot, bad dialogue, bad acting, idiotic di...          0
49997  I am a Catholic taught in parochial elementary...          0
49998  I'm going to have to disagree with the previou...          0
49999  No one expects the Star Trek movies to be high...          0

[50000 rows x 2 columns]


In [10]:
#the full dataset is very large, so keep only the first 500 reviews
data = data.head(500)

In [11]:
print(data)

                                                review  sentiment
0    One of the other reviewers has mentioned that ...          1
1    A wonderful little production. <br /><br />The...          1
2    I thought this was a wonderful way to spend ti...          1
3    Basically there's a family where a little boy ...          0
4    Petter Mattei's "Love in the Time of Money" is...          1
..                                                 ...        ...
495  "American Nightmare" is officially tied, in my...          0
496  First off, I have to say that I loved the book...          0
497  This movie was extremely boring. I only laughe...          0
498  I was disgusted by this movie. No it wasn't be...          0
499  Such a joyous world has been created for us in...          1

[500 rows x 2 columns]


In [12]:
#making a pipeline to quickly generate results with default parameters
sentiment_analysis = pipeline("sentiment-analysis",model="siebert/sentiment-roberta-large-english")

All model checkpoint layers were used when initializing TFRobertaForSequenceClassification.

All the layers of TFRobertaForSequenceClassification were initialized from the model checkpoint at siebert/sentiment-roberta-large-english.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFRobertaForSequenceClassification for predictions without further training.


In [13]:
#some of the reviews are too long for the model (more than 512 tokens),
#so we keep one list of sentiment scores and another list of the reviews that could not be scored

sentiment_scores = [] #predictions for the reviews the model could score
bad_ids = [] #indices of the reviews that are too long to score
#iterate over every row in the dataframe
for idx, row in tqdm(data.iterrows()):
    try:
        #run sentiment analysis on the review column
        predictions = sentiment_analysis(row["review"])
        #append the prediction to the sentiment_scores list
        sentiment_scores.append(predictions)
    except Exception:
        #any failure is assumed to come from a review that is too long: report it and move on
        print(f"Oops! Review {idx} is too long for the model!")
        #record the index so the review can be excluded later
        bad_ids.append(idx)

12it [00:13,  1.05it/s]Token indices sequence length is longer than the specified maximum sequence length for this model (525 > 512). Running this sequence through the model will result in indexing errors


Oops! Review 12 is too long for the model!


26it [00:26,  1.02s/it]

Oops! Review 26 is too long for the model!


29it [00:28,  1.30it/s]

Oops! Review 29 is too long for the model!
Oops! Review 30 is too long for the model!


33it [00:31,  1.23it/s]

Oops! Review 33 is too long for the model!


48it [00:47,  1.04it/s]

Oops! Review 48 is too long for the model!


50it [00:48,  1.41it/s]

Oops! Review 50 is too long for the model!
Oops! Review 51 is too long for the model!


58it [00:54,  1.12it/s]

Oops! Review 58 is too long for the model!
Oops! Review 59 is too long for the model!


77it [01:14,  1.30s/it]

Oops! Review 77 is too long for the model!


92it [01:30,  1.18s/it]

Oops! Review 92 is too long for the model!


99it [01:36,  1.01it/s]

Oops! Review 99 is too long for the model!


101it [01:37,  1.35it/s]

Oops! Review 101 is too long for the model!


126it [02:02,  1.07it/s]

Oops! Review 126 is too long for the model!


131it [02:05,  1.17it/s]

Oops! Review 131 is too long for the model!


140it [02:15,  1.23s/it]

Oops! Review 140 is too long for the model!


142it [02:16,  1.01it/s]

Oops! Review 142 is too long for the model!


156it [02:31,  1.00s/it]

Oops! Review 156 is too long for the model!


163it [02:36,  1.25it/s]

Oops! Review 163 is too long for the model!


172it [02:45,  1.11s/it]

Oops! Review 172 is too long for the model!


177it [02:51,  1.24s/it]

Oops! Review 177 is too long for the model!


182it [02:55,  1.07it/s]

Oops! Review 182 is too long for the model!


186it [02:59,  1.06it/s]

Oops! Review 186 is too long for the model!


189it [03:01,  1.26it/s]

Oops! Review 189 is too long for the model!


191it [03:02,  1.22it/s]

Oops! Review 191 is too long for the model!


198it [03:08,  1.08it/s]

Oops! Review 198 is too long for the model!


210it [03:18,  1.24it/s]

Oops! Review 210 is too long for the model!


218it [03:25,  1.04it/s]

Oops! Review 218 is too long for the model!


228it [03:35,  1.14s/it]

Oops! Review 228 is too long for the model!


254it [03:59,  1.31s/it]

Oops! Review 254 is too long for the model!


257it [04:01,  1.06s/it]

Oops! Review 257 is too long for the model!
Oops! Review 258 is too long for the model!


260it [04:02,  1.39it/s]

Oops! Review 260 is too long for the model!


263it [04:05,  1.32it/s]

Oops! Review 263 is too long for the model!


267it [04:07,  1.31it/s]

Oops! Review 267 is too long for the model!


276it [04:16,  1.09s/it]

Oops! Review 276 is too long for the model!


295it [04:35,  1.09it/s]

Oops! Review 295 is too long for the model!


297it [04:36,  1.46it/s]

Oops! Review 297 is too long for the model!


310it [04:48,  1.07it/s]

Oops! Review 310 is too long for the model!


314it [04:50,  1.40it/s]

Oops! Review 314 is too long for the model!


320it [04:56,  1.11s/it]

Oops! Review 320 is too long for the model!


322it [04:57,  1.30it/s]

Oops! Review 322 is too long for the model!


332it [05:04,  1.28it/s]

Oops! Review 332 is too long for the model!


338it [05:09,  1.29it/s]

Oops! Review 338 is too long for the model!


353it [05:25,  1.30s/it]

Oops! Review 353 is too long for the model!


363it [05:36,  1.26s/it]

Oops! Review 363 is too long for the model!
Oops! Review 364 is too long for the model!
Oops! Review 365 is too long for the model!


373it [05:44,  1.19s/it]

Oops! Review 373 is too long for the model!
Oops! Review 374 is too long for the model!


378it [05:47,  1.23it/s]

Oops! Review 378 is too long for the model!


400it [06:10,  1.01s/it]

Oops! Review 400 is too long for the model!


402it [06:11,  1.23it/s]

Oops! Review 402 is too long for the model!


407it [06:15,  1.11it/s]

Oops! Review 407 is too long for the model!


410it [06:17,  1.35it/s]

Oops! Review 410 is too long for the model!


418it [06:24,  1.14it/s]

Oops! Review 418 is too long for the model!


422it [06:28,  1.07s/it]

Oops! Review 422 is too long for the model!


424it [06:29,  1.18it/s]

Oops! Review 424 is too long for the model!


433it [06:39,  1.15s/it]

Oops! Review 433 is too long for the model!


435it [06:39,  1.22it/s]

Oops! Review 435 is too long for the model!


442it [06:46,  1.04it/s]

Oops! Review 442 is too long for the model!


454it [06:57,  1.07s/it]

Oops! Review 454 is too long for the model!


456it [06:58,  1.23it/s]

Oops! Review 456 is too long for the model!


463it [07:04,  1.11it/s]

Oops! Review 463 is too long for the model!


473it [07:14,  1.11s/it]

Oops! Review 473 is too long for the model!


479it [07:21,  1.24s/it]

Oops! Review 479 is too long for the model!


492it [07:34,  1.03s/it]

Oops! Review 492 is too long for the model!


500it [07:40,  1.09it/s]

Oops! Review 499 is too long for the model!





In [46]:
#look at the scored reviews
print(sentiment_scores)

[[{'label': 'POSITIVE', 'score': 0.9988124370574951}], [{'label': 'POSITIVE', 'score': 0.9989343285560608}], [{'label': 'POSITIVE', 'score': 0.998934805393219}], [{'label': 'NEGATIVE', 'score': 0.9995043277740479}], [{'label': 'POSITIVE', 'score': 0.9988835453987122}], [{'label': 'POSITIVE', 'score': 0.9988895058631897}], [{'label': 'POSITIVE', 'score': 0.9941018223762512}], [{'label': 'NEGATIVE', 'score': 0.9995146989822388}], [{'label': 'NEGATIVE', 'score': 0.9995176792144775}], [{'label': 'POSITIVE', 'score': 0.9989001750946045}], [{'label': 'NEGATIVE', 'score': 0.9994913339614868}], [{'label': 'POSITIVE', 'score': 0.9932581186294556}], [{'label': 'NEGATIVE', 'score': 0.9995067119598389}], [{'label': 'POSITIVE', 'score': 0.9989084005355835}], [{'label': 'NEGATIVE', 'score': 0.9995158910751343}], [{'label': 'NEGATIVE', 'score': 0.9983599781990051}], [{'label': 'NEGATIVE', 'score': 0.9995115995407104}], [{'label': 'POSITIVE', 'score': 0.9960166811943054}], [{'label': 'NEGATIVE', 'scor

In [47]:
#look at the non-scored reviews (those that are too long)
print(bad_ids)

[12, 26, 29, 30, 33, 48, 50, 51, 58, 59, 77, 92, 99, 101, 126, 131, 140, 142, 156, 163, 172, 177, 182, 186, 189, 191, 198, 210, 218, 228, 254, 257, 258, 260, 263, 267, 276, 295, 297, 310, 314, 320, 322, 332, 338, 353, 363, 364, 365, 373, 374, 378, 400, 402, 407, 410, 418, 422, 424, 433, 435, 442, 454, 456, 463, 473, 479, 492, 499]
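
The warning in the loop output reports a review of 525 tokens against the model's 512-token limit, and 69 of the 500 reviews end up in `bad_ids`. An alternative to skipping them is to truncate long inputs. The following is a minimal sketch, assuming the Hugging Face text-classification pipeline forwards `truncation` and `max_length` to its tokenizer when called. `score_reviews` and `fake_classifier` are hypothetical names; the latter only stands in for the real pipeline so the example runs without downloading the model.

```python
def score_reviews(classifier, reviews, max_length=512):
    """Score every review, cutting inputs down to at most max_length tokens."""
    results = []
    for text in reviews:
        # truncation and max_length are tokenizer arguments;
        # assumption: the pipeline passes them through to the tokenizer
        prediction = classifier(text, truncation=True, max_length=max_length)
        results.append(prediction[0])
    return results


def fake_classifier(text, truncation=False, max_length=None):
    """Hypothetical stand-in that returns output shaped like the pipeline's."""
    label = "NEGATIVE" if "bad" in text.lower() else "POSITIVE"
    return [{"label": label, "score": 1.0}]


scored = score_reviews(
    fake_classifier,
    ["A wonderful little production.", "Bad plot, bad dialogue."],
)
print([item["label"] for item in scored])
```

With the real pipeline the call would look like `score_reviews(sentiment_analysis, data["review"])`. Truncation discards the end of a long review, so predictions for those reviews may differ from what the full text would give.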


In [14]:
#make a list keeping only the predicted label (not the confidence score)
predicted_labels = []
for score in sentiment_scores:
    label = score[0]["label"]
    predicted_labels.append(label.lower()) #lowercase the predictions to match the IMDB labels
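
The list built above is linked to the original rows only by position, so it has to be compared against the data with the skipped rows removed in the same order. A possible way to make that link explicit is to store each prediction under its row index. This is only a sketch on toy data; `labels_by_index` and the `toy_` variables are hypothetical and not part of the notebook.

```python
def labels_by_index(indices, scores):
    """Map each scored row index to its lowercased predicted label."""
    return {idx: score[0]["label"].lower() for idx, score in zip(indices, scores)}


# toy data: three rows, of which row 1 was skipped as too long
toy_indices = [0, 1, 2]
toy_bad_ids = [1]
toy_scores = [
    [{"label": "POSITIVE", "score": 0.99}],
    [{"label": "NEGATIVE", "score": 0.98}],
]

# indices of the rows that were actually scored, in scoring order
scored_indices = [idx for idx in toy_indices if idx not in toy_bad_ids]
predicted = labels_by_index(scored_indices, toy_scores)
print(predicted)
```

Looking up a prediction by index then does not depend on the two lists staying in the same order.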

In [80]:
#we are left with the prediction in lower case
print(predicted_labels)

['positive', 'positive', 'positive', 'negative', 'positive', 'positive', 'positive', 'negative', 'negative', 'positive', 'negative', 'positive', 'negative', 'positive', 'negative', 'negative', 'negative', 'positive', 'negative', 'positive', 'negative', 'positive', 'negative', 'negative', 'negative', 'negative', 'negative', 'positive', 'negative', 'negative', 'negative', 'negative', 'negative', 'positive', 'negative', 'negative', 'positive', 'negative', 'negative', 'positive', 'positive', 'negative', 'negative', 'negative', 'positive', 'positive', 'negative', 'negative', 'negative', 'negative', 'negative', 'negative', 'positive', 'negative', 'negative', 'positive', 'negative', 'negative', 'negative', 'negative', 'positive', 'negative', 'positive', 'positive', 'negative', 'positive', 'positive', 'negative', 'positive', 'positive', 'negative', 'negative', 'negative', 'negative', 'negative', 'negative', 'negative', 'negative', 'negative', 'positive', 'negative', 'positive', 'negative', 'po

In [15]:
# keep only the rows of our data that the model was able to score
bad_id_set = set(bad_ids) # set lookup is faster than scanning the list for every row
# skip rows whose index is in bad_ids (too long), keep all the others
scores_keep = [row for idx, row in data.iterrows() if idx not in bad_id_set]

In [16]:
# build a dataframe from the rows we kept
keep_data = pd.DataFrame(scores_keep)

In [17]:
print(keep_data)

                                                review  sentiment
0    One of the other reviewers has mentioned that ...          1
1    A wonderful little production. <br /><br />The...          1
2    I thought this was a wonderful way to spend ti...          1
3    Basically there's a family where a little boy ...          0
4    Petter Mattei's "Love in the Time of Money" is...          1
..                                                 ...        ...
494  Despite some reviews being distinctly Luke-war...          1
495  "American Nightmare" is officially tied, in my...          0
496  First off, I have to say that I loved the book...          0
497  This movie was extremely boring. I only laughe...          0
498  I was disgusted by this movie. No it wasn't be...          0

[431 rows x 2 columns]


In [18]:
#map the numeric labels back to strings so they match the predicted labels in the classification report
scores_keep = ["positive" if item == 1 else "negative" for item in keep_data["sentiment"]]

In [19]:
labels = ["negative", "positive"]
print(classification_report(scores_keep, predicted_labels))

              precision    recall  f1-score   support

    negative       0.98      0.97      0.97       233
    positive       0.96      0.97      0.97       198

    accuracy                           0.97       431
   macro avg       0.97      0.97      0.97       431
weighted avg       0.97      0.97      0.97       431



In [86]:
#save the classification report in the output folder

report = classification_report(scores_keep, predicted_labels, target_names = labels)

#write the report to project_classification.txt; the with-block closes the file afterwards
with open("../../cds-lang/project/output/project_classification.txt", 'w') as f:
    print(report, file=f)

print("Done! Report has been generated and saved in the output folder as project_classification.txt")

Done! Report has been generated and saved in the output folder as project_classification.txt


In [20]:
# Is the model better at predicting negative or positive sentiments?
cm = pd.DataFrame(confusion_matrix(scores_keep, predicted_labels), 
             index=labels, columns=labels)

In [21]:
print(cm)

          negative  positive
negative       226         7
positive         5       193
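
The figures in the classification report can be recomputed from this matrix by hand. The sketch below assumes the scikit-learn layout, where rows are the true classes and columns are the predicted classes; `per_class_metrics`, `matrix` and `class_names` are illustrative names.

```python
def per_class_metrics(matrix, class_names):
    """Compute precision, recall and F1 per class from a confusion matrix.

    matrix[i][j] is the number of samples of true class i predicted as class j.
    """
    metrics = {}
    for k, name in enumerate(class_names):
        true_positive = matrix[k][k]
        predicted_total = sum(row[k] for row in matrix)  # column sum
        actual_total = sum(matrix[k])                    # row sum (support)
        precision = true_positive / predicted_total
        recall = true_positive / actual_total
        f1 = 2 * precision * recall / (precision + recall)
        metrics[name] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": actual_total,
        }
    return metrics


# values copied from the confusion matrix printed above
matrix = [[226, 7], [5, 193]]
class_names = ["negative", "positive"]

metrics = per_class_metrics(matrix, class_names)
total = sum(sum(row) for row in matrix)
accuracy = sum(matrix[k][k] for k in range(len(matrix))) / total

for name, values in metrics.items():
    print(name, values)
print("accuracy", accuracy)
```

Read this way, 7 of the 233 negative reviews were predicted as positive and 5 of the 198 positive reviews as negative, so the error rates for the two classes are close to each other on this sample.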


In [22]:
#save the matrix to output folder
cm.to_csv("../../cds-lang/project/output/confusion_matrix.csv")

print("Done! Confusion matrix has been generated and saved in the output folder as confusion_matrix.csv")

Done! Confusion matrix has been generated and saved in the output folder as confusion_matrix.csv
