In [3]:
!jupyter nbextension enable --py widgetsnbextension

Enabling notebook extension jupyter-js-widgets/extension...
      - Validating: ok


## Imports

In [1]:
import pandas as pd
import numpy as np
from transformers import pipeline
from transformers.pipelines.pt_utils import Dataset, KeyDataset
import time
from tqdm.auto import tqdm

In [2]:
class ListDataset(Dataset):
    def __init__(self, original_list):
        self.original_list = original_list

    def __len__(self):
        return len(self.original_list)
    
    def __getitem__(self, i):
        return self.original_list[i]

In [28]:
def run_sentiment_analysis_and_save(path_to_tweets):
    input_csv = pd.read_csv(path_to_tweets) 
    input_csv.head()
    tweet_text = input_csv['text'].to_list()
    filtered_tweets = [text for text in tweet_text if type(text) == str] # If some tweets have no text for whatever reason, we remove them
    print('Removed ', len(tweet_text) - len(filtered_tweets), 'invalid tweets')
    # print(tweet_text[:10])
    # tweets_dataset = ListDataset(tweet_text)
    print('Loaded tweets at ' + path_to_tweets)
    
    
    model_path = f"cardiffnlp/twitter-roberta-base-sentiment-latest"

    sentiment_pipeline = pipeline("sentiment-analysis", model=model_path, tokenizer=model_path)
    
    print('Running Sentiment Analysis...')
    start_time = time.time()
    result = sentiment_pipeline(filtered_tweets)
    end_time = time.time()
    print('Time elapsed: ', end_time - start_time, ' seconds')
    
    result_df = pd.DataFrame(result)
    result_df = input_csv.join(result_df)
    result_df.to_csv(path_to_tweets.split('.csv')[0] + '_with_sentiment.csv')
    
    

## Q1

In [17]:
q1_path = 'data/q1/all_tweets.csv'
run_sentiment_analysis_and_save(q1_path)

Loaded tweets at data/q1/all_tweets.csv


Some weights of the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Running Sentiment Analysis...
Time elapsed:  2448.1815102100372  seconds


## Q2

In [19]:
q2_path_1 = 'data/q2/nato_english.csv'
run_sentiment_analysis_and_save(q2_path_1)

Loaded tweets at data/q2/nato_english.csv


Some weights of the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Running Sentiment Analysis...
Time elapsed:  1248.4962439537048  seconds


In [20]:
q2_path_2 = 'data/q2/putin_english.csv'
run_sentiment_analysis_and_save(q2_path_2)

Loaded tweets at data/q2/putin_english.csv


Some weights of the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Running Sentiment Analysis...
Time elapsed:  1128.437345981598  seconds


In [26]:
q2_path_3 = 'data/q2/zelensky_english.csv'
run_sentiment_analysis_and_save(q2_path_3)

27507 27505
Loaded tweets at data/q2/zelensky_english.csv


Some weights of the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Running Sentiment Analysis...
Time elapsed:  1356.382895231247  seconds


## Q3

In [27]:
q3_path_1 = 'data/q3/fox_news.csv'
run_sentiment_analysis_and_save(q3_path_1)

16636 16636
Loaded tweets at data/q3/fox_news.csv


Some weights of the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Running Sentiment Analysis...


RuntimeError: The expanded size of the tensor (515) must match the existing size (514) at non-singleton dimension 1.  Target sizes: [1, 515].  Tensor sizes: [1, 514]

In [None]:
q3_path_2 = 'data/q3/new_york_times.csv'
run_sentiment_analysis_and_save(q3_path_2)