# Subjectivity in News Articles

## Group:
- Luca Babboni - luca.babboni2@studio.unibo.it
- Matteo Fasulo - matteo.fasulo@studio.unibo.it
- Luca Tedeschini - luca.tedeschini3@studio.unibo.it

## Description

This notebook addresses Task 1 proposed in [CheckThat Lab](https://checkthat.gitlab.io/clef2025/) of CLEF 2025. In this task, systems are challenged to distinguish whether a sentence from a news article expresses the subjective view of the author behind it or presents an objective view on the covered topic instead.

This is a binary classification task in which systems have to identify whether a text sequence (a sentence or a paragraph) is subjective (SUBJ) or objective (OBJ).

The task comprises three settings:

* Monolingual: train and test on data in a given language
* Multilingual: train and test on data comprising several languages
* Zero-shot: train on several languages and test on unseen languages

Training data is provided in five languages:
* Arabic
* Bulgarian
* English
* German
* Italian

The official evaluation metric is the macro-averaged F1 over the two classes.
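As a minimal sketch of what macro-averaging means here, the F1 of each class can be computed separately and the two values averaged with equal weight, regardless of how many sentences each class has. The helper names and the toy labels below are illustrative and not part of the task's official scorer; `sklearn.metrics.f1_score(..., average="macro")` should give the same value.

```python
def f1_for_class(gold, pred, cls):
    """F1 of one class from its true positives, false positives and false negatives."""
    tp = sum(1 for g, p in zip(gold, pred) if g == cls and p == cls)
    fp = sum(1 for g, p in zip(gold, pred) if g != cls and p == cls)
    fn = sum(1 for g, p in zip(gold, pred) if g == cls and p != cls)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(gold, pred, classes=("OBJ", "SUBJ")):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1_for_class(gold, pred, c) for c in classes) / len(classes)


# Toy example (made-up labels): OBJ F1 = 0.75, SUBJ F1 = 0.5
gold = ["OBJ", "OBJ", "OBJ", "OBJ", "SUBJ", "SUBJ"]
pred = ["OBJ", "OBJ", "OBJ", "SUBJ", "SUBJ", "OBJ"]
score = macro_f1(gold, pred)
print(score)  # 0.625
```

Because both classes count equally, a system that ignores the minority SUBJ class is penalised heavily even if its accuracy looks good.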

In [None]:
import os

import csv

import numpy as np
import pandas as pd

from tqdm import tqdm

from joblib import delayed, Parallel

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support, roc_auc_score, f1_score

from sklearn.utils.class_weight import compute_class_weight

import torch
import torch.nn as nn

from sentence_transformers import SentenceTransformer
from datasets import Dataset
from huggingface_hub import notebook_login
from transformers import AutoTokenizer, AutoModel, AutoModelForSequenceClassification, Trainer, TrainingArguments, DataCollatorWithPadding, RobertaTokenizerFast, RobertaForSequenceClassification, pipeline


In [2]:
SEED = 42
device = 'cuda' if torch.cuda.is_available() else 'cpu'

In [None]:
train_filepath = '/kaggle/input/clef2025-checkthat/data/english/train_en.tsv'
test_filepath = '/kaggle/input/clef2025-checkthat/data/english/dev_test_en.tsv'

In [4]:
train_data = pd.read_csv(train_filepath, sep='\t', quoting=csv.QUOTE_NONE)
test_data = pd.read_csv(test_filepath, sep='\t', quoting=csv.QUOTE_NONE)
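The `quoting=csv.QUOTE_NONE` argument tells pandas to treat quote characters as ordinary text, which matters when sentences contain unbalanced quotation marks. The sketch below uses a small made-up TSV string (not the real dataset) to show that a sentence starting with a stray `"` is still read as a single field.

```python
import csv
import io

import pandas as pd

# Made-up TSV content: the first sentence opens a quote that is never closed.
raw = (
    "sentence_id\tsentence\tlabel\n"
    "a1\t\"Lobster!\tOBJ\n"
    "a2\tA plain sentence.\tSUBJ\n"
)

toy = pd.read_csv(io.StringIO(raw), sep="\t", quoting=csv.QUOTE_NONE)
print(toy.shape)               # (2, 3)
print(toy.loc[0, "sentence"])  # "Lobster!
```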

In [5]:
train_data.label.value_counts(), test_data.label.value_counts()

(label
 OBJ     532
 SUBJ    298
 Name: count, dtype: int64,
 label
 OBJ     362
 SUBJ    122
 Name: count, dtype: int64)

Legend:
* Objective -> 0
* Subjective -> 1

In [6]:
train_data['label'] = train_data['label'].apply(lambda x: 1 if x == 'SUBJ' else 0)
test_data['label'] = test_data['label'].apply(lambda x: 1 if x == 'SUBJ' else 0)

In [7]:
notebook_login()


# Baseline Model

In [8]:
vect = SentenceTransformer("all-mpnet-base-v2")

In [9]:
model = LogisticRegression(class_weight="balanced", random_state=SEED)
model.fit(X=vect.encode(train_data['sentence'].values), y=train_data['label'].values)



In [10]:
predictions = model.predict(X=vect.encode(test_data['sentence'].values)).tolist()

In [11]:
pred_df = pd.DataFrame()
pred_df['sentence_id'] = test_data['sentence_id']
pred_df['label'] = predictions

In [12]:
def evaluate_model(gold_values, predicted_values):
    acc = accuracy_score(gold_values, predicted_values)
    m_prec, m_rec, m_f1, m_s = precision_recall_fscore_support(gold_values, predicted_values, average="macro",
                                                               zero_division=0)
    p_prec, p_rec, p_f1, p_s = precision_recall_fscore_support(gold_values, predicted_values, labels=[1],
                                                               zero_division=0)
    #roc_auc = roc_auc_score(gold_values, predicted_probabilities)

    return {
        'macro_F1': m_f1,
        'macro_P': m_prec,
        'macro_R': m_rec,
        'SUBJ_F1': p_f1[0],
        'SUBJ_P': p_prec[0],
        'SUBJ_R': p_rec[0],
        'accuracy': acc,
        #'roc_auc': roc_auc
    }

In [13]:
evaluate_model(gold_values=test_data.label.values, predicted_values=predictions)

{'macro_F1': 0.6408223299348154,
 'macro_P': 0.6906694403710398,
 'macro_R': 0.6265963227968481,
 'SUBJ_F1': np.float64(0.42487046632124353),
 'SUBJ_P': np.float64(0.5774647887323944),
 'SUBJ_R': np.float64(0.3360655737704918),
 'accuracy': 0.7706611570247934}

# Twitter RoBERTa-base 2022 154M

In [14]:
model_card = "cardiffnlp/twitter-roberta-base-2022-154m"
tokenizer = AutoTokenizer.from_pretrained(model_card, use_fast=False)
model = AutoModelForSequenceClassification.from_pretrained(model_card, num_labels=2, id2label={0: 'OBJ', 1: 'SUBJ'}, label2id={'OBJ': 0, 'SUBJ': 1})

Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at cardiffnlp/twitter-roberta-base-2022-154m and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [15]:
def preprocess_text(texts):
    return tokenizer(texts['sentence'])

In [16]:
train_dl = Dataset.from_pandas(train_data)
test_dl = Dataset.from_pandas(test_data)

In [17]:
train_dl = train_dl.map(preprocess_text, batched=True)
test_dl = test_dl.map(preprocess_text, batched=True)


In [18]:
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

In [19]:
training_args = TrainingArguments(
    output_dir='model',
    learning_rate=5e-6,
    per_device_train_batch_size=16,         
    per_device_eval_batch_size=16,
    num_train_epochs=10,
    weight_decay=1e-4,
    eval_strategy="epoch",       
    save_strategy="no",           
    #save_safetensors=True,
    #load_best_model_at_end=True,
    report_to='none',
    seed=SEED,
    data_seed=SEED
)
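The number of optimizer steps implied by these arguments can be worked out by hand. The sketch below assumes a single device, no gradient accumulation, and the library default `logging_steps=500` (none of which are set explicitly in this notebook). Under those assumptions it reproduces the `global_step=520` reported by `trainer.train()` and is consistent with the training loss only being logged in the final epoch.

```python
import math

n_train = 830        # training rows (532 OBJ + 298 SUBJ)
batch_size = 16      # per_device_train_batch_size
epochs = 10          # num_train_epochs
logging_steps = 500  # assumed default, not set in the arguments above

steps_per_epoch = math.ceil(n_train / batch_size)
total_steps = steps_per_epoch * epochs
first_logged_epoch = math.ceil(logging_steps / steps_per_epoch)

print(steps_per_epoch, total_steps, first_logged_epoch)  # 52 520 10
```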

In [20]:
# Taken from https://github.com/huggingface/transformers/blob/main/src/transformers/trainer.py#L3700 (with some minor changes removing useless parts)
class CustomTrainer(Trainer):
    def __init__(self, class_weights, device, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # You pass the class weights when instantiating the Trainer
        self.class_weights = class_weights
        self.device = device

    def compute_loss(self, model, inputs, return_outputs=False, num_items_in_batch=None):
        if self.label_smoother is not None and "labels" in inputs:
            labels = inputs.pop("labels")
        else:
            labels = None
        outputs = model(**inputs)
        if self.args.past_index >= 0:
            self._past = outputs[self.args.past_index]

        if labels is not None:
            loss = self.label_smoother(outputs, labels)
        else:
            # We extract the logits from the model outputs
            logits = outputs.get('logits')
            # We compute the loss manually passing the class weights to the loss function
            criterion = torch.nn.CrossEntropyLoss(weight=self.class_weights.to(self.device)) # Modified to use the class weights
            # We compute the loss using the modified criterion
            loss = criterion(logits, inputs['labels'])

        return (loss, outputs) if return_outputs else loss
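To make the effect of the class weights concrete, here is a NumPy sketch of the reduction that PyTorch documents for `CrossEntropyLoss` with a `weight` argument and the default mean reduction: each example's negative log-likelihood is scaled by the weight of its target class, and the sum is divided by the sum of those weights. The function name and the toy logits are illustrative; this is not the code the trainer runs.

```python
import numpy as np


def weighted_cross_entropy(logits, labels, class_weights):
    """Class-weighted cross-entropy, normalised by the summed target weights."""
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels)
    # Numerically stable log-softmax
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(labels)), labels]
    w = np.asarray(class_weights, dtype=np.float64)[labels]
    return float((w * nll).sum() / w.sum())


# Toy batch: the third example is a SUBJ sentence (label 1) scored as OBJ.
logits = np.array([[2.0, 0.0], [0.0, 2.0], [2.0, 0.0]])
labels = np.array([0, 1, 1])

unweighted = weighted_cross_entropy(logits, labels, [1.0, 1.0])
weighted = weighted_cross_entropy(logits, labels, [0.7801, 1.3926])
print(unweighted < weighted)  # True
```

With the SUBJ class up-weighted, the misclassified SUBJ example contributes more to the loss than it would under uniform weights.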

In [21]:
class_weights = compute_class_weight(class_weight="balanced", classes=np.unique(train_data['label']), y=train_data['label'])
class_weights = torch.tensor(class_weights, dtype=torch.float32)
class_weights

tensor([0.7801, 1.3926])
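The two values above follow directly from the training label counts. With `class_weight="balanced"`, scikit-learn uses `n_samples / (n_classes * count_per_class)`, which can be checked by hand:

```python
import numpy as np

counts = np.array([532, 298])  # OBJ and SUBJ counts in the training split
weights = counts.sum() / (len(counts) * counts)
print(np.round(weights, 4))  # [0.7801 1.3926]
```

The rarer SUBJ class receives the larger weight, so each SUBJ example counts for roughly 1.8 times as much as an OBJ example in the loss.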

In [None]:
def compute_metrics(output_info):
    """
    Compute evaluation metrics from the model outputs.
    
    Args:
        output_info (tuple): A tuple containing the model outputs and the true labels.
            - predictions (np.ndarray): The logits from the model, one row per example.
            - labels (np.ndarray): The true labels.
    
    Returns:
        dict: A dictionary containing the computed metrics:
            - 'f1-score': The F1 score (macro average).
            - 'Accuracy': The accuracy score.
    """
    predictions, labels = output_info
    predictions = np.array(predictions)
    labels = np.array(labels)
    # Convert logits to predicted class ids
    predictions = np.argmax(predictions, axis=-1)
    
    f1 = f1_score(labels, predictions, average="macro", zero_division=0)
    acc = accuracy_score(labels, predictions)
    
    return {"f1-score" : f1, "Accuracy" : acc}
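The function receives raw logits rather than class ids, so the `argmax` step is what turns each row into a prediction. A small sketch with made-up logits for four sentences:

```python
import numpy as np

# Hypothetical logits: column 0 = OBJ, column 1 = SUBJ
logits = np.array([[1.2, -0.3], [0.1, 0.9], [-0.5, 0.4], [2.0, 1.0]])
labels = np.array([0, 1, 0, 0])

predicted = np.argmax(logits, axis=-1)
accuracy = float((predicted == labels).mean())
print(predicted.tolist(), accuracy)  # [0, 1, 1, 0] 0.75
```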

In [24]:
trainer = CustomTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dl,
    eval_dataset=test_dl,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
    class_weights=class_weights,
    device=device,
)

In [25]:
trainer.train()

Epoch,Training Loss,Validation Loss,F1-score,Accuracy
1,No log,0.672813,0.427896,0.747934
2,No log,0.610597,0.56887,0.764463
3,No log,0.58817,0.640164,0.783058
4,No log,0.544097,0.693212,0.799587
5,No log,0.656458,0.660726,0.795455
6,No log,0.647451,0.686666,0.807851
7,No log,0.655587,0.698769,0.809917
8,No log,0.722645,0.690804,0.811983
9,No log,0.788891,0.661697,0.799587
10,0.364500,0.772912,0.681288,0.807851


TrainOutput(global_step=520, training_loss=0.3574243848140423, metrics={'train_runtime': 75.7716, 'train_samples_per_second': 109.54, 'train_steps_per_second': 6.863, 'total_flos': 295059520812600.0, 'train_loss': 0.3574243848140423, 'epoch': 10.0})
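A useful reference point for reading the table is the score of a constant predictor that labels every sentence OBJ. The sketch below derives it from the evaluation-split class counts shown earlier (362 OBJ, 122 SUBJ). The resulting numbers coincide with the epoch 1 row, which suggests (but does not prove) that after one epoch the model was still predicting OBJ for every sentence.

```python
n_obj, n_subj = 362, 122  # class counts of the evaluation split

accuracy = n_obj / (n_obj + n_subj)
f1_obj = 2 * n_obj / (2 * n_obj + n_subj)  # TP = 362, FP = 122, FN = 0
f1_subj = 0.0                              # SUBJ is never predicted
macro_f1 = (f1_obj + f1_subj) / 2

print(round(accuracy, 6), round(macro_f1, 6))  # 0.747934 0.427896
```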

# Emotions

In [29]:
model_card = "arpanghoshal/EmoRoBERTa"
tokenizer = RobertaTokenizerFast.from_pretrained(model_card)
model = RobertaForSequenceClassification.from_pretrained(model_card, from_tf=True)

2025-02-27 20:28:44.102974: E external/local_xla/xla/stream_executor/cuda/cuda_driver.cc:152] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
All TF 2.0 model weights were used when initializing RobertaForSequenceClassification.

All the weights of RobertaForSequenceClassification were initialized from the TF 2.0 model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use RobertaForSequenceClassification for predictions without further training.


In [30]:
# Note: recent transformers releases deprecate return_all_scores in favour of top_k=None
emotion = pipeline('sentiment-analysis', model='arpanghoshal/EmoRoBERTa', return_all_scores=True)

All model checkpoint layers were used when initializing TFRobertaForSequenceClassification.

All the layers of TFRobertaForSequenceClassification were initialized from the model checkpoint at arpanghoshal/EmoRoBERTa.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFRobertaForSequenceClassification for predictions without further training.
Device set to use 0


In [31]:
# Example
print(train_data.iloc[0]['sentence'], train_data.iloc[0]['label'])
emotion_labels = emotion(train_data.iloc[0]['sentence'])
pd.DataFrame(emotion_labels[0]).sort_values(by='score', ascending=False)

Gone are the days when they led the world in recession-busting 1


Unnamed: 0,label,score
22,realization,0.370919
27,neutral,0.308371
9,disappointment,0.295116
3,annoyance,0.006366
4,approval,0.003771
20,optimism,0.002501
25,sadness,0.002472
10,disapproval,0.001753
0,admiration,0.001569
11,disgust,0.001283


In [32]:
emotion_array = np.zeros((train_data.shape[0], 28))

for i, sentence in enumerate(tqdm(train_data['sentence'])):
    result = emotion(sentence)[0]
    # One score per emotion label, in the order returned by the pipeline
    emotion_array[i] = np.array([d['score'] for d in result])

100%|██████████| 830/830 [01:10<00:00, 11.80it/s]


In [33]:
emotion_df_train = pd.DataFrame(emotion_array, columns=[d['label'] for d in result])
emotion_df_train.head()

Unnamed: 0,admiration,amusement,anger,annoyance,approval,caring,confusion,curiosity,desire,disappointment,...,love,nervousness,optimism,pride,realization,relief,remorse,sadness,surprise,neutral
0,0.001569,9.5e-05,0.000157,0.006366,0.003771,5.9e-05,0.001009,0.000406,0.00084,0.295116,...,0.000332,8.1e-05,0.002501,0.000134,0.370919,0.000118,0.000479,0.002472,0.000803,0.308371
1,2.1e-05,0.000125,0.000232,0.000431,0.000237,0.001657,0.000134,6.6e-05,0.000486,0.000117,...,8e-06,0.000179,0.990538,8.1e-05,0.000629,0.00021,0.000638,0.000158,0.000158,0.000849
2,4.9e-05,0.001679,0.003128,0.946737,0.001021,2.4e-05,0.002792,0.000393,0.000149,0.000748,...,1.2e-05,0.000314,0.000164,0.000124,0.001899,3.5e-05,2.4e-05,0.000103,6e-05,0.026365
3,5.2e-05,0.00026,4.4e-05,0.000441,0.001115,7e-05,0.000278,0.000303,9.9e-05,0.005117,...,5.4e-05,0.000472,0.001911,4e-05,0.045726,0.000123,7.3e-05,0.00046,0.861293,0.077763
4,0.000269,0.000101,3.4e-05,0.000148,0.000979,3.8e-05,2.9e-05,3.6e-05,0.000171,6.7e-05,...,1.2e-05,1.6e-05,0.000761,1.4e-05,0.000136,1.4e-05,2.2e-05,1.9e-05,6.1e-05,0.996816
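The loop above relies on the shape of the pipeline output: for one sentence it returns a list containing one list of `{label, score}` dictionaries. The sketch below uses a hypothetical mock of that structure with three labels instead of 28 to show how the column names and the score vector are derived from it; the values are invented.

```python
import numpy as np

# Hypothetical pipeline output for a single sentence (3 labels instead of 28)
mock_output = [[
    {"label": "admiration", "score": 0.02},
    {"label": "realization", "score": 0.37},
    {"label": "neutral", "score": 0.61},
]]

result = mock_output[0]
columns = [d["label"] for d in result]            # DataFrame column names
scores = np.array([d["score"] for d in result])   # one row of emotion_array

print(columns[int(scores.argmax())])  # neutral
```

Accessing the dictionaries by key avoids depending on the order in which `label` and `score` appear inside each dictionary.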


In [34]:
train_data_augmented = pd.concat([train_data, emotion_df_train], axis=1)
train_data_augmented.head()

Unnamed: 0,sentence_id,sentence,label,solved_conflict,admiration,amusement,anger,annoyance,approval,caring,...,love,nervousness,optimism,pride,realization,relief,remorse,sadness,surprise,neutral
0,b9e1635a-72aa-467f-86d6-f56ef09f62c3,Gone are the days when they led the world in r...,1,True,0.001569,9.5e-05,0.000157,0.006366,0.003771,5.9e-05,...,0.000332,8.1e-05,0.002501,0.000134,0.370919,0.000118,0.000479,0.002472,0.000803,0.308371
1,f99b5143-70d2-494a-a2f5-c68f10d09d0a,The trend is expected to reverse as soon as ne...,0,False,2.1e-05,0.000125,0.000232,0.000431,0.000237,0.001657,...,8e-06,0.000179,0.990538,8.1e-05,0.000629,0.00021,0.000638,0.000158,0.000158,0.000849
2,4076639c-aa56-4202-ae0f-9d9217f8da68,But there is the specious point again.,0,False,4.9e-05,0.001679,0.003128,0.946737,0.001021,2.4e-05,...,1.2e-05,0.000314,0.000164,0.000124,0.001899,3.5e-05,2.4e-05,0.000103,6e-05,0.026365
3,b057c366-698e-419d-a284-9b16d835c64e,He added he wouldn’t be surprised to see a new...,0,False,5.2e-05,0.00026,4.4e-05,0.000441,0.001115,7e-05,...,5.4e-05,0.000472,0.001911,4e-05,0.045726,0.000123,7.3e-05,0.00046,0.861293,0.077763
4,a5a9645e-7850-41ba-90a2-5def725cd5b8,"Not less government, you see; the same amount ...",1,False,0.000269,0.000101,3.4e-05,0.000148,0.000979,3.8e-05,...,1.2e-05,1.6e-05,0.000761,1.4e-05,0.000136,1.4e-05,2.2e-05,1.9e-05,6.1e-05,0.996816


In [35]:
emotion_array = np.zeros((test_data.shape[0], 28))

for i, sentence in enumerate(tqdm(test_data['sentence'])):
    result = emotion(sentence)[0]
    # One score per emotion label, in the order returned by the pipeline
    emotion_array[i] = np.array([d['score'] for d in result])

100%|██████████| 484/484 [00:41<00:00, 11.80it/s]


In [36]:
emotion_df_test = pd.DataFrame(emotion_array, columns=[d['label'] for d in result])
emotion_df_test.head()

Unnamed: 0,admiration,amusement,anger,annoyance,approval,caring,confusion,curiosity,desire,disappointment,...,love,nervousness,optimism,pride,realization,relief,remorse,sadness,surprise,neutral
0,0.000211,0.000236,9e-06,0.000151,0.077252,0.000118,0.000153,2.8e-05,0.000184,0.000205,...,3.9e-05,3.8e-05,0.001291,4.8e-05,0.02791,7.2e-05,5.6e-05,5e-05,3.4e-05,0.891072
1,0.000428,0.00047,0.000124,0.000181,0.002491,0.000174,0.000955,9.9e-05,0.00065,0.000106,...,0.000465,0.00049,0.001429,6.3e-05,0.000649,2.5e-05,0.000232,0.000235,0.00027,0.981867
2,0.000551,0.000124,2.1e-05,0.000141,0.020175,2.9e-05,4.2e-05,1.6e-05,3.3e-05,0.000214,...,2e-05,1.1e-05,0.000158,1.1e-05,0.00211,2.1e-05,2.5e-05,5.6e-05,2e-05,0.974265
3,0.00042,0.000101,1.7e-05,0.00012,0.982816,8.5e-05,0.000133,0.000135,0.000268,4.9e-05,...,0.000642,5.7e-05,0.00074,4.8e-05,0.000641,0.00056,1e-05,1.6e-05,1.7e-05,0.010893
4,0.000398,0.000266,2.7e-05,0.000141,0.000731,6.3e-05,1.4e-05,3.2e-05,9.6e-05,6.6e-05,...,2.7e-05,2.1e-05,0.000153,1.9e-05,9.6e-05,1.3e-05,4e-05,5.1e-05,4e-05,0.996801


In [37]:
test_data_augmented = pd.concat([test_data, emotion_df_test], axis=1)
test_data_augmented.head()

Unnamed: 0,sentence_id,sentence,label,admiration,amusement,anger,annoyance,approval,caring,confusion,...,love,nervousness,optimism,pride,realization,relief,remorse,sadness,surprise,neutral
0,44f33601-157a-42ce-aa9f-0f7d305501f2,Blanco established himself earlier in his care...,0,0.000211,0.000236,9e-06,0.000151,0.077252,0.000118,0.000153,...,3.9e-05,3.8e-05,0.001291,4.8e-05,0.02791,7.2e-05,5.6e-05,5e-05,3.4e-05,0.891072
1,6f9e0f53-f76c-432f-bbea-b78400d600b8,RULE 13: ARTIFICIAL INTELLIGENCE Not only thi...,0,0.000428,0.00047,0.000124,0.000181,0.002491,0.000174,0.000955,...,0.000465,0.00049,0.001429,6.3e-05,0.000649,2.5e-05,0.000232,0.000235,0.00027,0.981867
2,61f93bdc-4c3e-4963-926c-0bbf139b44b9,The valuation is required by law and the figur...,0,0.000551,0.000124,2.1e-05,0.000141,0.020175,2.9e-05,4.2e-05,...,2e-05,1.1e-05,0.000158,1.1e-05,0.00211,2.1e-05,2.5e-05,5.6e-05,2e-05,0.974265
3,902148ec-dda3-4736-b318-0f20c63a1cf3,A sip can really hit the spot after a long bik...,1,0.00042,0.000101,1.7e-05,0.00012,0.982816,8.5e-05,0.000133,...,0.000642,5.7e-05,0.00074,4.8e-05,0.000641,0.00056,1e-05,1.6e-05,1.7e-05,0.010893
4,065b1996-4b40-4c74-9f62-afb44f69834e,"""Lobster!""""""",0,0.000398,0.000266,2.7e-05,0.000141,0.000731,6.3e-05,1.4e-05,...,2.7e-05,2.1e-05,0.000153,1.9e-05,9.6e-05,1.3e-05,4e-05,5.1e-05,4e-05,0.996801


In [38]:
if not os.path.exists('train_en_aug.csv'):
    train_data_augmented.to_csv('train_en_aug.csv', encoding='UTF-8')
    test_data_augmented.to_csv('dev_test_en_aug.csv', encoding='UTF-8')
else:
    train_data_augmented = pd.read_csv('train_en_aug.csv', encoding='UTF-8', index_col=0)
    test_data_augmented = pd.read_csv('dev_test_en_aug.csv', encoding='UTF-8', index_col=0)

In [112]:
def preprocess(text):
    preprocessed_text = []
    for t in text.split():
        if len(t) > 1:
            t = '@user' if t[0] == '@' and t.count('@') == 1 else t
            t = 'http' if t.startswith('http') else t
        preprocessed_text.append(t)
    return ' '.join(preprocessed_text)
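A quick usage sketch of the masking logic, with the function repeated so the snippet runs on its own. The example sentence is invented. In the cells shown here the function is defined but the tokenization steps pass `texts['sentence']` directly, so applying it before tokenization would be an additional step.

```python
def preprocess(text):
    """Mask user handles and links token by token (same logic as the cell above)."""
    out = []
    for t in text.split():
        if len(t) > 1:
            t = '@user' if t[0] == '@' and t.count('@') == 1 else t
            t = 'http' if t.startswith('http') else t
        out.append(t)
    return ' '.join(out)


example = "@reporter42 says rates will fall, see https://example.com/story now"
masked = preprocess(example)
print(masked)  # @user says rates will fall, see http now
```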

In [113]:
# Taken from https://github.com/huggingface/transformers/blob/main/src/transformers/trainer.py#L3700 (with some minor changes removing useless parts)
class CustomTrainer(Trainer):
    def __init__(self, class_weights, device, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # You pass the class weights when instantiating the Trainer
        self.class_weights = class_weights
        self.device = device

    def compute_loss(self, model, inputs, return_outputs=False, num_items_in_batch=None):
        if self.label_smoother is not None and "labels" in inputs:
            labels = inputs.pop("labels")
        else:
            labels = None
        outputs = model(**inputs)
        if self.args.past_index >= 0:
            self._past = outputs[self.args.past_index]

        if labels is not None:
            loss = self.label_smoother(outputs, labels)
        else:
            # We extract the logits from the model outputs
            logits = outputs.logits
            # We compute the loss manually passing the class weights to the loss function
            criterion = torch.nn.CrossEntropyLoss(weight=self.class_weights.to(self.device)) # Modified to use the class weights
            # We compute the loss using the modified criterion
            loss = criterion(logits, inputs['labels'])

        return (loss, outputs) if return_outputs else loss

In [None]:
from transformers.modeling_outputs import SequenceClassifierOutput

class CustomEmotionModel(nn.Module):
    def __init__(self, model_card: str, num_labels: int = 2, num_emotions: int = 28):
        super(CustomEmotionModel, self).__init__()
        self.base_model = AutoModel.from_pretrained(model_card)
        self.emotion_branch = nn.Linear(num_emotions, 128)  # Example: 128 hidden units
        self.dropout = nn.Dropout(0.1)
        self.classifier = nn.Linear(self.base_model.config.hidden_size + 128, num_labels)

    def forward(self, input_ids, attention_mask, emotion_features, labels=None):
        outputs = self.base_model(input_ids=input_ids, attention_mask=attention_mask)
        pooled_output = outputs.last_hidden_state[:, 0, :]  # CLS token representation
        
        # Process emotion features
        emotion_output = torch.relu(self.emotion_branch(emotion_features))
        
        # Concatenate base model output with emotion features
        combined_output = torch.cat((pooled_output, emotion_output), dim=1)
        
        # Apply dropout and classification layer
        combined_output = self.dropout(combined_output)
        logits = self.classifier(combined_output)

        loss = None
        if labels is not None:
            criterion = torch.nn.CrossEntropyLoss()
            loss = criterion(logits, labels)

        return SequenceClassifierOutput(loss=loss, logits=logits)


# Note: the loss computed inside forward() above is unweighted. A class-weighted
# variant would only change the loss computation:
#
#     if labels is not None:
#         if self.class_weights is not None:
#             criterion = torch.nn.CrossEntropyLoss(weight=self.class_weights)
#         else:
#             criterion = torch.nn.CrossEntropyLoss()
#         loss = criterion(logits, labels)
#
# However, the model does not receive class_weights as an input, so
# self.class_weights is undefined and this version yields errors.
# Class weighting is therefore left to CustomTrainer.compute_loss.


In [140]:
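# Select the emotion columns by name: columns[-28:] also picks up 'all_emotions' on a re-run, a likely cause of "ValueError: setting an array element with a sequence."
emotion_cols = [c for c in train_data_augmented.columns if c not in ('sentence_id', 'sentence', 'label', 'solved_conflict', 'all_emotions')]
train_data_augmented['all_emotions'] = train_data_augmented[emotion_cols].apply(lambda x: np.array(x.values, dtype=np.float32), axis=1)
test_data_augmented['all_emotions'] = test_data_augmented[emotion_cols].apply(lambda x: np.array(x.values, dtype=np.float32), axis=1)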

In [141]:
train_dl = Dataset.from_pandas(train_data_augmented)
test_dl = Dataset.from_pandas(test_data_augmented)

In [142]:
def tokenize_and_prepare(texts):
    tokenized = tokenizer(texts['sentence'])
    return {**tokenized, 'emotion_features': texts['all_emotions']}

In [143]:
train_dl = train_dl.map(tokenize_and_prepare, batched=True)
test_dl = test_dl.map(tokenize_and_prepare, batched=True)


In [144]:
data_collator = DataCollatorWithPadding(tokenizer=tokenizer, return_tensors="pt")


In [145]:
training_args = TrainingArguments(
    output_dir='model',
    learning_rate=5e-6,
    per_device_train_batch_size=16,         
    per_device_eval_batch_size=16,
    num_train_epochs=10,
    weight_decay=1e-4,
    eval_strategy="epoch",       
    save_strategy="no",           
    #save_safetensors=True,
    #load_best_model_at_end=True,
    report_to='none',
    seed=SEED,
    data_seed=SEED
)

In [146]:
class_weights = compute_class_weight(class_weight="balanced", classes=np.unique(train_data['label']), y=train_data['label'])
class_weights = torch.tensor(class_weights, dtype=torch.float32)
class_weights

tensor([0.7801, 1.3926])

In [147]:
model_card = "cardiffnlp/twitter-roberta-base-2022-154m"
tokenizer = AutoTokenizer.from_pretrained(model_card, use_fast=False)
model = CustomEmotionModel(model_card, num_labels = 2, num_emotions=28)

Some weights of RobertaModel were not initialized from the model checkpoint at cardiffnlp/twitter-roberta-base-2022-154m and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
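The `CustomEmotionModel` instantiated above fuses two inputs by concatenation: the CLS embedding from the encoder and a 128-unit projection of the 28 emotion scores. The NumPy sketch below traces only the tensor shapes through that fusion, using random stand-ins for the encoder output and the weights. It assumes a hidden size of 768 (the usual size for RoBERTa-base) and omits biases and dropout, so it illustrates the data flow rather than reimplementing the model.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, hidden, n_emotions, branch, n_labels = 4, 768, 28, 128, 2

cls_vec = rng.normal(size=(batch, hidden))       # stand-in for the CLS embedding
emotions = rng.random(size=(batch, n_emotions))  # stand-in for the emotion scores

# Random stand-ins for the emotion_branch and classifier weights
w_branch = rng.normal(size=(n_emotions, branch)) * 0.01
w_clf = rng.normal(size=(hidden + branch, n_labels)) * 0.01

emotion_hidden = np.maximum(emotions @ w_branch, 0.0)         # linear + ReLU
combined = np.concatenate([cls_vec, emotion_hidden], axis=1)  # text + emotion features
logits = combined @ w_clf                                     # one logit per class

print(emotion_hidden.shape, combined.shape, logits.shape)  # (4, 128) (4, 896) (4, 2)
```

The classifier therefore sees 896 features per sentence, of which 128 come from the emotion branch.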


In [148]:
def compute_metrics(output_info):
    """
    Compute evaluation metrics from the model outputs.
    
    Args:
        output_info (tuple): A tuple containing the model outputs and the true labels.
            - predictions (np.ndarray): The logits from the model, one row per example.
            - labels (np.ndarray): The true labels.
    
    Returns:
        dict: A dictionary containing the computed metrics:
            - 'f1-score': The F1 score (macro average).
            - 'Accuracy': The accuracy score.
    """
    predictions, labels = output_info
    predictions = np.array(predictions)
    labels = np.array(labels)
    # Convert logits to predicted class ids
    predictions = np.argmax(predictions, axis=-1)
    
    f1 = f1_score(labels, predictions, average="macro", zero_division=0)
    acc = accuracy_score(labels, predictions)
    
    return {"f1-score" : f1, "Accuracy" : acc}

In [149]:
trainer = CustomTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dl,
    eval_dataset=test_dl,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
    class_weights=class_weights,
    device=device,
)

In [150]:
trainer.train()

Epoch,Training Loss,Validation Loss,F1-score,Accuracy
1,No log,0.668794,0.48573,0.762397
2,No log,0.586765,0.567568,0.768595
3,No log,0.552118,0.637979,0.791322
4,No log,0.502421,0.718515,0.816116
5,No log,0.660964,0.638356,0.795455
6,No log,0.691435,0.647083,0.797521
7,No log,0.668955,0.665685,0.803719
8,No log,0.702885,0.677698,0.809917
9,No log,0.723253,0.677698,0.809917
10,0.342700,0.770981,0.662624,0.803719


TrainOutput(global_step=520, training_loss=0.33451887644254247, metrics={'train_runtime': 75.4638, 'train_samples_per_second': 109.986, 'train_steps_per_second': 6.891, 'total_flos': 0.0, 'train_loss': 0.33451887644254247, 'epoch': 10.0})