## Step 2: Performing sentiment analysis on news headlines

**Objectives**
- 2.1. Importing a sentiment model from Hugging Face
- 2.2. Writing functions to calculate average sentiment for each day
- 2.3. Getting the news headlines and outputting the sentiment scores in JSON format

In [1]:
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

import json

### 2.1. Importing a sentiment model from Hugging Face

In [2]:
modelName = "nlptown/bert-base-multilingual-uncased-sentiment"
tokenizer = AutoTokenizer.from_pretrained(modelName)
model = AutoModelForSequenceClassification.from_pretrained(modelName)

### 2.2. Writing functions to calculate average sentiment for each day

In [5]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
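def calculateDailySentiment(headlines):
    # Collect the headline text for one day
    texts = [headline['heading'] for headline in headlines]
    # Tokenize the whole day as a single padded batch
    inputs = tokenizer(texts, return_tensors="pt", truncation=True, padding=True, max_length=512, return_attention_mask=True)
    # Inference only, so skip gradient tracking to save memory
    with torch.no_grad():
        outputs = model(**inputs)
    #endwith
    logits = outputs.logits
    # Softmax turns each headline's five logits into class probabilities
    scores = logits.softmax(dim=1)
    # Average the class probabilities over all headlines of the day
    averageScore = scores.mean(dim=0).tolist()
    return averageScore
#enddef

def analyzeAndSaveSentiment(inputFile, outputFile):
    # Input JSON maps each date to a list of objects with a 'heading' field
    with open(inputFile, 'r') as file:
        data = json.load(file)
    #endwith

    result = {}

    for date, headlines in data.items():
        averageScore = calculateDailySentiment(headlines)
        print(f"{date} > {averageScore}")
        result[date] = averageScore
    #endfor

    # Use a separate handle name so the outputFile path is not shadowed
    with open(outputFile, 'w') as file:
        json.dump(result, file, indent=2)
    #endwith
#enddef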
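
`calculateDailySentiment` sends every headline for a day through the model as one batch, which may use a lot of memory on days with many headlines. The sketch below shows one possible way to average the scores in smaller batches instead. The names `iterBatches`, `averageInBatches`, `scoreBatch`, and `fakeScoreBatch` are hypothetical and not part of the notebook; `scoreBatch` stands in for any function that returns one probability list per headline, and the stub used here only imitates a model so the example runs without downloading anything.

```python
def iterBatches(items, batchSize):
    """Yield consecutive slices of `items` with at most `batchSize` elements."""
    for start in range(0, len(items), batchSize):
        yield items[start:start + batchSize]


def averageInBatches(texts, scoreBatch, batchSize=32):
    """Average per-text probability lists while scoring one batch at a time.

    `scoreBatch` is a hypothetical callable: it takes a list of texts and
    returns one list of class probabilities per text.
    """
    totals = None
    count = 0
    for batch in iterBatches(texts, batchSize):
        for probs in scoreBatch(batch):
            if totals is None:
                totals = [0.0] * len(probs)
            totals = [t + p for t, p in zip(totals, probs)]
            count += 1
    if count == 0:
        return None  # no headlines, so there is nothing to average
    return [t / count for t in totals]


def fakeScoreBatch(batch):
    """Stand-in for the real model: puts all probability on one class."""
    return [
        [0.0, 0.0, 0.0, 0.0, 1.0] if "good" in text else [1.0, 0.0, 0.0, 0.0, 0.0]
        for text in batch
    ]


demo = averageInBatches(
    ["good news", "bad news", "good day", "good deal"],
    fakeScoreBatch,
    batchSize=3,
)
print(demo)
```

Summing the per-headline probabilities and dividing once at the end gives the same mean as a single large batch, regardless of how the headlines are split.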


### 2.3. Getting the news headlines and outputting the sentiment scores in JSON format
- The scores for each day are saved to [daily_scores.json](./data/news2023/daily_scores.json) as a list of five class probabilities per date.
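
Each saved entry is a list of five probabilities rather than a single number. As a rough illustration, the list can be collapsed into one expected star rating, assuming the five positions follow the model's label order from 1 star to 5 stars. The helper name `expectedStars` is hypothetical, and the sample vector is the one this notebook's run prints for 2025-01-01.

```python
def expectedStars(probabilities):
    """Probability-weighted average star rating.

    Assumes position 0 is the 1-star class and position 4 is the 5-star class.
    """
    return sum((index + 1) * p for index, p in enumerate(probabilities))


# Averaged class probabilities printed for 2025-01-01 in this notebook's run
sample = [
    0.024691052734851837,
    0.03412671014666557,
    0.15708516538143158,
    0.42625612020492554,
    0.35784101486206055,
]

print(round(expectedStars(sample), 2))
```

A value near 3 would indicate neutral coverage on this scale, while values toward 1 or 5 indicate mostly negative or mostly positive headlines.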

In [6]:
inputJsonFile = '/content/drive/MyDrive/headlines.json'
outputJsonFile = '/content/drive/MyDrive/daily_scores.json'

analyzeAndSaveSentiment(inputJsonFile, outputJsonFile)

2025-01-01 > [0.024691052734851837, 0.03412671014666557, 0.15708516538143158, 0.42625612020492554, 0.35784101486206055]
2025-01-02 > [0.024691052734851837, 0.03412671014666557, 0.15708516538143158, 0.42625612020492554, 0.35784101486206055]
2025-01-03 > [0.024691052734851837, 0.03412671014666557, 0.15708516538143158, 0.42625612020492554, 0.35784101486206055]
2025-01-04 > [0.024691052734851837, 0.03412671014666557, 0.15708516538143158, 0.42625612020492554, 0.35784101486206055]
2025-01-05 > [0.024691052734851837, 0.03412671014666557, 0.15708516538143158, 0.42625612020492554, 0.35784101486206055]
2025-01-06 > [0.024691052734851837, 0.03412671014666557, 0.15708516538143158, 0.42625612020492554, 0.35784101486206055]
2025-01-07 > [0.024691052734851837, 0.03412671014666557, 0.15708516538143158, 0.42625612020492554, 0.35784101486206055]
2025-01-08 > [0.024691052734851837, 0.03412671014666557, 0.15708516538143158, 0.42625612020492554, 0.35784101486206055]
2025-01-09 > [0.024691052734851837, 0.03