Install the following libraries: <br>
-transformers <br>
-torch (PyTorch) <br>
-tqdm <br>
-pandas, numpy and scipy <br>
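To check which of these packages are already installed before running the notebook, you can use the standard-library `importlib.metadata` module. The sketch below is a convenience check that is not part of the original workflow. The package list assumes the PyPI distribution names, where PyTorch is published as `torch`.

```python
from importlib.metadata import PackageNotFoundError, version

# PyPI distribution names used by this notebook (PyTorch is published as "torch")
REQUIRED = ["transformers", "torch", "tqdm", "pandas", "numpy", "scipy"]


def installed_versions(packages):
    """Return {package: installed version string, or None if it is missing}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(REQUIRED).items():
        print(f"{name:<14}{ver or 'MISSING'}")
```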


SENTIMENT ANALYSIS - classification with a pre-trained RoBERTa model <br>

This project uses 'cardiffnlp/twitter-roberta-base-sentiment', a RoBERTa-base model trained on about 58M tweets and fine-tuned for sentiment analysis on the TweetEval benchmark. It scores each text against three classes in this order: negative (0), neutral (1) and positive (2).

In [14]:
!pip install transformers torch tqdm pandas numpy scipy




In [35]:
# Import libraries and dependencies
import pandas as pd
import numpy as np

from tqdm import tqdm

from transformers import AutoTokenizer, AutoModelForSequenceClassification
from scipy.special import softmax

In [36]:
# Load the reviews from your dataset
df = pd.read_csv("cleaned_reviews2.csv")

df.head()


# Create an Id column (0, 1, 2, ...) to identify each review
df['Id'] = np.arange(len(df))
df.head()


# Reorder the columns so that Id comes first
df = df[['Id', 'restaurant_ids', 'id_review', 'caption', 'relative_date', 'username', 'name', 'cuisine']]
df.head()

,Id,restaurant_ids,id_review,caption,relative_date,username,name,cuisine
0,0,1,ChdDSUhNMG9nS0VJQ0FnSURSOTRpbmxnRRAB,Great experience,6 days ago,Sandeep Kaur,Domino's Pizza Flinders St,Fast Food
1,1,1,ChdDSUhNMG9nS0VJQ0FnSURScktPaDV3RRAB,Must be the worst dominos in the world. Pizz...,3 weeks ago,Robert McFarland,Domino's Pizza Flinders St,Fast Food
2,2,1,ChZDSUhNMG9nS0VJQ0FnSURSNUotOUt3EAE,They are so nice and the Food is Perfect !!,3 weeks ago,Patricia Tchialeu,Domino's Pizza Flinders St,Fast Food
3,3,1,ChdDSUhNMG9nS0VJQ0FnSURSNklYTG93RRAB,Staff was friendly and helpful. The deals make...,3 weeks ago,Brent Folan,Domino's Pizza Flinders St,Fast Food
4,4,1,ChZDSUhNMG9nS0VJQ0FnSUNPNXJhd0VREAE,Been coming here for awhile usually happy,a month ago,Johnathan Vanderwerf,Domino's Pizza Flinders St,Fast Food


In [37]:
df.tail()

,Id,restaurant_ids,id_review,caption,relative_date,username,name,cuisine
14309,14309,652,ChZDSUhNMG9nS0VJQ0FnSURocThfa0xREAE,Went there with some friends. The pasta was de...,a month ago,Maria Varvara,Da Guido La Pasta,Italian
14310,14310,652,ChdDSUhNMG9nS0VJQ0FnSUNSdEtyQm13RRAB,We had a lovely dinner here tonight for my hus...,a month ago,Emily Dale,Da Guido La Pasta,Italian
14311,14311,652,ChdDSUhNMG9nS0VJQ0FnSURoOTd5RmtnRRAB,We rocked in late and were greeted with wonder...,a month ago,Kat Moore,Da Guido La Pasta,Italian
14312,14312,652,ChdDSUhNMG9nS0VJQ0FnSURoeC1ua2dRRRAB,Super relaxed and authentic atmosphere with th...,a month ago,Leslie Schmidt,Da Guido La Pasta,Italian
14313,14313,652,ChZDSUhNMG9nS0VJQ0FnSURobTV6UUhBEAE,"Mircos service tonight was amazing, we loved e...",a month ago,Shaun Marolia,Da Guido La Pasta,Italian


In [38]:
# Count the missing (NaN/None) values in each column
null_values = df.isnull()

null_values.sum()


Id                0
restaurant_ids    0
id_review         0
caption           0
relative_date     0
username          0
name              0
cuisine           0
dtype: int64

In [39]:
# isna() is an alias of isnull(), so this repeats the previous check
nan_values = df.isna()

nan_values.sum()

Id                0
restaurant_ids    0
id_review         0
caption           0
relative_date     0
username          0
name              0
cuisine           0
dtype: int64

In [40]:
# Load the pre-trained tokenizer and model from the Hugging Face Hub

model_name = "cardiffnlp/twitter-roberta-base-sentiment"

tokenizer = AutoTokenizer.from_pretrained(model_name)

model = AutoModelForSequenceClassification.from_pretrained(model_name)

In [41]:
# Take one review as an example (the caption at index 1000)

example = df['caption'][1000]
print(example)

everything is good except no coffee or tea available


In [42]:
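# Define a function that runs a review through the pre-trained model
# and returns the probability of each sentiment class
import torch

def roberta_polarity_scores(example):
    # Tokenise the text into PyTorch tensors
    encoded_text = tokenizer(example, return_tensors='pt')
    # Inference only, so gradient tracking is not needed
    with torch.no_grad():
        output = model(**encoded_text)
    # Logits for the single input, converted to probabilities that sum to 1
    scores = output.logits[0].numpy()
    scores = softmax(scores)

    # Class order for this model: 0 = negative, 1 = neutral, 2 = positive
    scores_dict = {
        'roberta_neg': scores[0],
        'roberta_neu': scores[1],
        'roberta_pos': scores[2]
    }

    return scores_dict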
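The model returns one raw score (logit) per class, and `scipy.special.softmax` turns those scores into probabilities that sum to 1. The minimal pure-Python sketch below performs the same computation. It subtracts the largest logit first for numerical stability, which leaves the result unchanged. The input logits are made-up values for illustration.

```python
import math


def softmax(logits):
    # Shift by the max so exp() cannot overflow; the probabilities are unchanged
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits in the model's order: negative, neutral, positive
probs = softmax([-1.2, 0.3, 2.5])
print([round(p, 3) for p in probs])
```

The largest logit always receives the largest probability, so the predicted class is the same before and after softmax. Softmax only rescales the scores so that they can be read as probabilities.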

In [44]:
# Apply the function to every review in the dataset.
# Reviews that raise an error are skipped and their Id is printed
# (a likely cause is text longer than the model's 512-token limit).

res = {}

for i, row in tqdm(df.iterrows(), total=len(df)):
    try:
        text = row['caption']
        myid = row['Id']
        roberta_result = roberta_polarity_scores(text)
        res[myid] = roberta_result
        
    except (RuntimeError, IndexError):
        print(f'Broke for id {myid}')

  7%|▋         | 1029/14314 [03:59<47:06,  4.70it/s]  

Broke for id 1027


 10%|▉         | 1384/14314 [05:31<45:25,  4.74it/s]  

Broke for id 1382


 12%|█▏        | 1735/14314 [06:58<44:50,  4.68it/s]

Broke for id 1735


 19%|█▉        | 2711/14314 [10:52<49:32,  3.90it/s]

Broke for id 2711


 24%|██▍       | 3466/14314 [13:45<36:41,  4.93it/s]

Broke for id 3466


 36%|███▌      | 5085/14314 [20:12<33:43,  4.56it/s]

Broke for id 5083


 43%|████▎     | 6090/14314 [24:04<30:43,  4.46it/s]

Broke for id 6090


 48%|████▊     | 6814/14314 [26:57<27:47,  4.50it/s]

Broke for id 6812


 49%|████▉     | 7060/14314 [28:16<17:56,  6.74it/s]

Broke for id 7058


 57%|█████▋    | 8225/14314 [32:42<18:01,  5.63it/s]

Broke for id 8223


 66%|██████▌   | 9450/14314 [37:44<23:20,  3.47it/s]

Broke for id 9448


 68%|██████▊   | 9684/14314 [38:50<23:27,  3.29it/s]

Broke for id 9682


 97%|█████████▋| 13814/14314 [54:29<01:44,  4.80it/s]

Broke for id 13814


 98%|█████████▊| 14037/14314 [55:21<00:48,  5.67it/s]

Broke for id 14035


 99%|█████████▉| 14145/14314 [55:48<01:24,  2.00it/s]

Broke for id 14145


100%|██████████| 14314/14314 [56:31<00:00,  4.22it/s]


In [46]:
# Create a DataFrame from the results dictionary.
# pd.DataFrame(res) puts each Id in its own column, so transpose (.T) to get one row per review.
results_df = pd.DataFrame(res).T

results_df

,roberta_neg,roberta_neu,roberta_pos
0,0.005796,0.054793,0.939411
1,0.984013,0.013361,0.002626
2,0.002001,0.007393,0.990606
3,0.003910,0.041748,0.954342
4,0.003776,0.144302,0.851922
...,...,...,...
14309,0.001061,0.009207,0.989732
14310,0.001514,0.005663,0.992823
14311,0.001741,0.010284,0.987975
14312,0.001966,0.005208,0.992825
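To reduce each row to a single sentiment label, one option is to take the class with the highest probability. The minimal sketch below works on one row's score dict, using the same keys that `roberta_polarity_scores` returns. The readable label names are an assumption. On the whole DataFrame, pandas can do the same thing with `results_df[['roberta_neg', 'roberta_neu', 'roberta_pos']].idxmax(axis=1)`.

```python
# Map the score keys produced by roberta_polarity_scores to readable labels
LABELS = {"roberta_neg": "negative", "roberta_neu": "neutral", "roberta_pos": "positive"}


def top_label(scores):
    """Return the label whose probability is highest in a score dict."""
    best_key = max(scores, key=scores.get)
    return LABELS[best_key]


# Scores for Id 1 taken from the table above
print(top_label({"roberta_neg": 0.984013, "roberta_neu": 0.013361, "roberta_pos": 0.002626}))  # negative
```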


In [47]:
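# Turn the Id index back into a column, then attach the review details from df.
# The left merge keeps only the reviews that were scored successfully.
results_df = results_df.reset_index().rename(columns={'index': 'Id'})

results_df = results_df.merge(df, on='Id', how='left')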

results_df

,Id,roberta_neg,roberta_neu,roberta_pos,restaurant_ids,id_review,caption,relative_date,username,name,cuisine
0,0,0.005796,0.054793,0.939411,1,ChdDSUhNMG9nS0VJQ0FnSURSOTRpbmxnRRAB,Great experience,6 days ago,Sandeep Kaur,Domino's Pizza Flinders St,Fast Food
1,1,0.984013,0.013361,0.002626,1,ChdDSUhNMG9nS0VJQ0FnSURScktPaDV3RRAB,Must be the worst dominos in the world. Pizz...,3 weeks ago,Robert McFarland,Domino's Pizza Flinders St,Fast Food
2,2,0.002001,0.007393,0.990606,1,ChZDSUhNMG9nS0VJQ0FnSURSNUotOUt3EAE,They are so nice and the Food is Perfect !!,3 weeks ago,Patricia Tchialeu,Domino's Pizza Flinders St,Fast Food
3,3,0.003910,0.041748,0.954342,1,ChdDSUhNMG9nS0VJQ0FnSURSNklYTG93RRAB,Staff was friendly and helpful. The deals make...,3 weeks ago,Brent Folan,Domino's Pizza Flinders St,Fast Food
4,4,0.003776,0.144302,0.851922,1,ChZDSUhNMG9nS0VJQ0FnSUNPNXJhd0VREAE,Been coming here for awhile usually happy,a month ago,Johnathan Vanderwerf,Domino's Pizza Flinders St,Fast Food
...,...,...,...,...,...,...,...,...,...,...,...
14294,14309,0.001061,0.009207,0.989732,652,ChZDSUhNMG9nS0VJQ0FnSURocThfa0xREAE,Went there with some friends. The pasta was de...,a month ago,Maria Varvara,Da Guido La Pasta,Italian
14295,14310,0.001514,0.005663,0.992823,652,ChdDSUhNMG9nS0VJQ0FnSUNSdEtyQm13RRAB,We had a lovely dinner here tonight for my hus...,a month ago,Emily Dale,Da Guido La Pasta,Italian
14296,14311,0.001741,0.010284,0.987975,652,ChdDSUhNMG9nS0VJQ0FnSURoOTd5RmtnRRAB,We rocked in late and were greeted with wonder...,a month ago,Kat Moore,Da Guido La Pasta,Italian
14297,14312,0.001966,0.005208,0.992825,652,ChdDSUhNMG9nS0VJQ0FnSURoeC1ua2dRRRAB,Super relaxed and authentic atmosphere with th...,a month ago,Leslie Schmidt,Da Guido La Pasta,Italian
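Because `res` only contains reviews that were scored, any review that raised an error is missing from `results_df`. Listing those Ids makes it easy to rescore them later. The sketch below uses a plain set difference on toy stand-ins for `df['Id']` and `res`. In pandas, the equivalent is `df.loc[~df['Id'].isin(results_df['Id'])]`.

```python
def missing_ids(all_ids, scored):
    """Return the ids present in the dataset but absent from the results, in order."""
    scored_ids = set(scored)
    return [i for i in all_ids if i not in scored_ids]


# Toy stand-ins: in the notebook these would be df['Id'] and res (whose keys are Ids)
all_ids = range(6)
res = {0: {}, 1: {}, 3: {}, 5: {}}
print(missing_ids(all_ids, res))  # [2, 4]
```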


SAVE CSV FILE, THEN ADD TO DATABASE

In [48]:
results_df.to_csv('output/reviews_scores.csv', index=False)  # returns None; the 'output/' folder must already exist

In [49]:
import os
print(os.listdir())

['.config', 'cleaned_reviews2.csv', '.ipynb_checkpoints', 'reviews_scores.csv', 'sample_data']
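The notebook does not say which database the scores go into. As one possibility, the sketch below uses Python's built-in `sqlite3` and `csv` modules to load the saved CSV into a table. The table name `review_scores`, the reduced column set and the SQLite choice itself are assumptions for illustration.

```python
import csv
import sqlite3


def load_scores(csv_path, conn, table="review_scores"):
    """Create `table` if needed, insert every row of the CSV and return the row count."""
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} ("
        "Id INTEGER PRIMARY KEY, roberta_neg REAL, roberta_neu REAL, "
        "roberta_pos REAL, restaurant_ids INTEGER, caption TEXT)"
    )
    with open(csv_path, newline="", encoding="utf-8") as f:
        # DictReader matches columns by header name and ignores the extra ones
        rows = [
            (int(r["Id"]), float(r["roberta_neg"]), float(r["roberta_neu"]),
             float(r["roberta_pos"]), int(r["restaurant_ids"]), r["caption"])
            for r in csv.DictReader(f)
        ]
    # Parameterised placeholders let sqlite3 handle quoting inside captions
    conn.executemany(f"INSERT OR REPLACE INTO {table} VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

For example, `load_scores('output/reviews_scores.csv', sqlite3.connect('reviews.db'))` would create a hypothetical `reviews.db` file in the working directory. A server database such as PostgreSQL or MySQL would need its own driver, but the same read-then-`executemany` pattern applies.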
