## LABELLING THE DATASET - PART 2

In [None]:
%pip install transformers vaderSentiment

In [1]:
import pandas as pd
from transformers import pipeline
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

### 1. Labelling with a RoBERTa-based sentiment analysis model

In [2]:
# Read the selected data
selected_data = pd.read_csv('../Data/selected_data.csv')

In [3]:
# Initialize the sentiment analysis pipeline
# device=0 selects the first CUDA GPU; use device=-1 to run on CPU instead
sentiment_pipeline = pipeline("text-classification",
                              model="cardiffnlp/twitter-roberta-base-sentiment-latest",
                              device=0)




Some weights of the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cuda:0


In [4]:
# Extract the text column of selected_data as a list
reviews = selected_data["text"].tolist()

In [5]:
# Calculate the sentiment of each review
# Inputs longer than 512 tokens are truncated; shorter ones are padded
kwargs = {'padding': True, 'truncation': True, 'max_length': 512}
results = sentiment_pipeline(reviews, **kwargs)

In [6]:
# Add the RoBERTa sentiment label and score to the selected_data DataFrame
selected_data["roberta_label"] = [res["label"] for res in results]
selected_data["roberta_score"] = [res["score"] for res in results]
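
The unpacking above relies on the pipeline returning one dictionary per input text, each with a `label` and a `score` key. A minimal sketch of that shape follows; the `example_results` values are invented for illustration and are not taken from the dataset.

```python
# Hypothetical stand-in for the pipeline output: one dict per input text.
# The label/score values below are made up for illustration only.
example_results = [
    {"label": "positive", "score": 0.93},
    {"label": "neutral", "score": 0.67},
    {"label": "negative", "score": 0.81},
]

# Same unpacking pattern as in the cell above.
example_labels = [res["label"] for res in example_results]
example_scores = [res["score"] for res in example_results]

print(example_labels)
print(example_scores)
```

Note that `score` here is the model's probability for the predicted label, so it always falls between 0 and 1 and does not carry a sign.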

### 2. Labelling with VADER (lexicon and rule-based model)

VADER was selected because its lexicon and rules are tuned for sentiment in social media text, which matches the Reddit posts and comments in this dataset.

In [7]:
sentimentAnalyzer = SentimentIntensityAnalyzer()

In [8]:
vader_label, vader_score = [], []

for review in reviews:
    # Calculate the sentiment of the review using VADER
    sentiment = sentimentAnalyzer.polarity_scores(review)
    vader_score.append(sentiment["compound"])

    # The compound score lies in [-1, 1]; values strictly between -0.07 and 0.07 are labelled neutral
    if sentiment["compound"] >= 0.07:
        vader_label.append("positive")
    elif sentiment["compound"] <= -0.07:
        vader_label.append("negative")
    else:
        vader_label.append("neutral")
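
The thresholding in the loop above can also be written as a small helper, which makes the cutoff easy to change and to test on its own. This is only a sketch: `label_from_compound` is a hypothetical name, and the default of 0.07 simply mirrors the value used in this notebook (the VADER README suggests 0.05 as a typical cutoff).

```python
def label_from_compound(compound: float, threshold: float = 0.07) -> str:
    """Map a VADER compound score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Try a few scores, including the two boundary values.
for score in (0.8225, 0.0, -0.5, 0.07, -0.07):
    print(score, label_from_compound(score))
```

Both boundary values are inclusive, so a score of exactly 0.07 is labelled positive and exactly -0.07 is labelled negative, matching the loop.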


In [9]:
# Add the VADER sentiment label and score to the selected_data DataFrame
selected_data["vader_label"] = vader_label
selected_data["vader_score"] = vader_score

In [10]:
# Display the first few rows of the selected_data DataFrame
selected_data.head()

Unnamed: 0,post_id,subreddit,post_title,post_body,number_of_comments,readable_datetime,post_author,number_of_upvotes,query,text,comment_id,comment_body,comment_author,cosine_similarity,roberta_label,roberta_score,vader_label,vader_score
0,1d31lxf,technology,Former OpenAI board member explains why they f...,,97,2024-05-29 06:31:18,Maxie445,84,OpenAI,Good luck to the consumers/customers who are t...,l64i9ts,Good luck to the consumers/customers who are t...,imaketrollfaces,0.717946,positive,0.931254,positive,0.8225
1,1dn7dwq,OpenAI,I’m sick of waiting for chatGPT 4o Voice and I...,I’ve been religiously checking for the voice u...,368,2024-06-24 11:02:41,surfer808,45,ChatGPT vs Claude,OpenAI did a great job of showing the public t...,la0rsb1,OpenAI did a great job of showing the public t...,q_freak,0.710471,positive,0.950637,positive,0.9408
2,1hiru1c,ChatGPT,OpenAI's new model is equivalent to the 175th ...,,114,2024-12-20 23:38:56,MetaKnowing,236,o3,OpenAI's new model is equivalent to the 175th ...,,,,0.708699,positive,0.947382,positive,0.7096
3,1cr4jbo,technology,OpenAI launches new AI model and desktop versi...,,140,2024-05-13 21:08:09,Puginator,163,ChatGPT,OpenAI launches new AI model and desktop versi...,,,,0.700197,neutral,0.666814,neutral,0.0
4,1ggjfwi,OpenAI,OpenAI brings a new web search tool to ChatGPT,,94,2024-10-31 21:02:22,techreview,106,ChatGPT search,OpenAI brings a new web search tool to ChatGPT,,,,0.698247,neutral,0.518475,neutral,0.0


In [11]:
# Save the selected_data DataFrame to a new CSV file
selected_data.to_csv('../Data/labelled_data.csv', index=False)