# Subjectivity in News Articles

## Group:
- Luca Babboni - luca.babboni2@studio.unibo.it
- Matteo Fasulo - matteo.fasulo@studio.unibo.it
- Luca Tedeschini - luca.tedeschini3@studio.unibo.it

## Description

This notebook addresses Task 1 proposed in [CheckThat Lab](https://checkthat.gitlab.io/clef2025/) of CLEF 2025. In this task, systems are challenged to distinguish whether a sentence from a news article expresses the subjective view of the author behind it or presents an objective view on the covered topic instead.

This is a binary classification task in which systems have to identify whether a text sequence (a sentence or a paragraph) is subjective (SUBJ) or objective (OBJ).

The task comprises three settings:

* Monolingual: train and test on data in a given language
* Multilingual: train and test on data comprising several languages
* Zero-shot: train on several languages and test on unseen languages

Training data is provided in five languages:
* Arabic
* Bulgarian
* English
* German
* Italian

The official evaluation metric is the macro-averaged F1 over the two classes.
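
To make the metric concrete, here is a minimal pure-Python sketch of macro-averaged F1 for two classes. The helper names `f1_for_class` and `macro_f1` and the toy labels are illustrative assumptions; the notebook itself relies on scikit-learn's `precision_recall_fscore_support` for the actual evaluation.

```python
def f1_for_class(gold, pred, cls):
    """F1 of one class, treating `cls` as the positive label."""
    tp = sum(1 for g, p in zip(gold, pred) if g == cls and p == cls)
    fp = sum(1 for g, p in zip(gold, pred) if g != cls and p == cls)
    fn = sum(1 for g, p in zip(gold, pred) if g == cls and p != cls)
    if tp == 0:
        return 0.0  # same convention as zero_division=0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, pred, classes=(0, 1)):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1_for_class(gold, pred, c) for c in classes) / len(classes)


# Toy labels (0 = OBJ, 1 = SUBJ); illustrative values, not task data
gold = [0, 0, 0, 1, 1]
pred = [0, 0, 1, 1, 0]
print(round(macro_f1(gold, pred), 4))  # 0.5833
```

Because both classes contribute equally to the average, a model that ignores the minority class is penalised even when its accuracy looks acceptable.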

# Installing dependencies

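This notebook uses quantized models, and some additional libraries are required. If you are running this notebook on either Colab or Kaggle, please run the two cells below once: the first installs the packages, and the second kills the kernel process to force a runtime restart so that the new packages are picked up. After the restart, run the rest of the notebook normally.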



In [1]:
%%capture
%pip install -U transformers[torch] bitsandbytes trl peft sacremoses ctranslate2 accelerate

In [None]:
import os
# Kill the current kernel process so the runtime restarts with the newly installed packages
os.kill(os.getpid(), 9)

# Importing libraries

In [None]:
import os
import itertools
import gc
from pathlib import Path

import csv

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

from tqdm import tqdm

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, ConfusionMatrixDisplay, precision_recall_fscore_support
from sklearn.utils.class_weight import compute_class_weight

import torch
import torch.nn as nn
import torch.nn.functional as F

from sentence_transformers import SentenceTransformer
from datasets import Dataset
from huggingface_hub import notebook_login
from transformers import (
    AutoTokenizer, 
    AutoModelForSequenceClassification, 
    Trainer, 
    TrainingArguments, 
    DataCollatorWithPadding, 
    PreTrainedModel,
    DebertaV2Model, 
    DebertaV2Config, 
    pipeline
)
from transformers.trainer_utils import PredictionOutput
from transformers.models.deberta.modeling_deberta import ContextPooler

## Setting the device

In [2]:
SEED = 42
device = 'cuda' if torch.cuda.is_available() else 'cpu'
tqdm.pandas() # display tqdm on pandas apply functions
print(f"Using device: {device}")

Using device: cuda


## Setting Library Seeds

This step is necessary to guarantee reproducibility.



In [3]:
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed_all(SEED)

# Subjectivity Class  
This class is used throughout the whole notebook as a utility toolbox to avoid code redundancy. When a method of this class is called for the first time, its behavior will be explained.  

In [4]:
class Subjectivity:
    def __init__(self, data_folder: str = 'data', seed: int = 42, device: str = 'cuda'):
        self.seed = seed
        self.device = device
        self.languages = [language for language in os.listdir(data_folder)]

        dataset = self.create_dataset(data_folder=data_folder)
        self.dataset = dataset
        
        train, dev, test = self.get_splits(dataset, print_shapes=True)
        self.train = train
        self.dev = dev
        self.test = test

        self.all_data = self.get_per_lang_dataset()
        

    def create_dataset(self, data_folder: str = 'data'):
        dataset = pd.DataFrame(columns=['sentence_id','sentence','label','lang','split'])
        for language in os.listdir(data_folder):
            for filename in os.listdir(f"{data_folder}{os.sep}{language}"):
                if '.tsv' in filename:
                    abs_path = f"{data_folder}{os.sep}{language}{os.sep}{filename}"
                    df = pd.read_csv(abs_path, sep='\t', quoting=csv.QUOTE_NONE)
                    if 'solved_conflict' in df.columns:
                        df.drop(columns=['solved_conflict'], inplace=True)
                    df['lang'] = language
                    df['split'] = Path(filename).stem
                    dataset = pd.concat([dataset, df], axis=0)
        return dataset

    def get_splits(self, dataset: pd.DataFrame, print_shapes: bool = True):
        train = dataset[dataset['split'].str.contains('train')].copy()
        dev = dataset[(dataset['split'].str.contains('dev')) & ~(dataset['split'].str.contains('dev_test'))].copy()
        test = dataset[dataset['split'].str.contains('dev_test')].copy()

        # encode the target variable to int (0: obj; 1: subj)
        train.loc[:, 'label'] = train['label'].apply(lambda x: 0 if x == 'OBJ' else 1)
        dev.loc[:, 'label'] = dev['label'].apply(lambda x: 0 if x == 'OBJ' else 1)
        test.loc[:, 'label'] = test['label'].apply(lambda x: 0 if x == 'OBJ' else 1)

        # cast to int
        train['label'] = train['label'].astype(int)
        dev['label'] = dev['label'].astype(int)
        test['label'] = test['label'].astype(int)

        if print_shapes:
            print(f"Train: {train.shape}")
            print(f"Dev: {dev.shape}")
            print(f"Test: {test.shape}")
            
        return train, dev, test

    def get_per_lang_dataset(self):
        """
        dataset_dict = {
            'english': {
                'train': ...
                'dev': ...
                'test': ...
            },
        }
        """
        dataset_dict = {}
        for language in self.languages:
            dataset_dict[language] = {}
            # get the train data
            dataset_dict[language]['train'] = self.train[self.train['lang']==language].copy()
            # get the dev data
            dataset_dict[language]['dev'] = self.dev[self.dev['lang']==language].copy()
            # get the test data
            dataset_dict[language]['test'] = self.test[self.test['lang']==language].copy()
        return dataset_dict

    def print_label_distrib(self, dataset: pd.DataFrame):
        print(dataset['label'].value_counts(normalize=True))

    def get_baseline_model(self, model_name: str = "paraphrase-multilingual-MiniLM-L12-v2"):
        vect = SentenceTransformer(model_name)
        self.vect = vect
        return vect

    def train_baseline_model(self, vect, train_data: pd.DataFrame, test_data: pd.DataFrame, solver: str = 'saga'):
        model = LogisticRegression(class_weight="balanced", solver=solver, random_state=self.seed)
        model.fit(X=vect.encode(train_data['sentence'].values), y=train_data['label'].values)
        predictions = model.predict(X=vect.encode(test_data['sentence'].values)).tolist()

        # eval performances
        perfs = self.evaluate_model(gold_values=test_data['label'].values, predicted_values=predictions)

        return perfs

    def get_tokenizer(self, model_card: str = "microsoft/mdeberta-v3-base"):
        tokenizer = AutoTokenizer.from_pretrained(model_card)
        self.tokenizer = tokenizer
        return tokenizer

    def get_model(self, model_card: str = "microsoft/mdeberta-v3-base", *args, **kwargs):
        model = AutoModelForSequenceClassification.from_pretrained(model_card, *args, **kwargs)
        self.model = model
        return model

    def get_class_weights(self, dataset: pd.DataFrame):
        class_weights = compute_class_weight('balanced', classes=np.unique(dataset['label']), y=dataset['label'])
        return class_weights

    def evaluate_model(self, gold_values, predicted_values):
        acc = accuracy_score(gold_values, predicted_values)
        m_prec, m_rec, m_f1, m_s = precision_recall_fscore_support(gold_values, predicted_values, average="macro",
                                                                   zero_division=0)
        p_prec, p_rec, p_f1, p_s = precision_recall_fscore_support(gold_values, predicted_values, labels=[1],
                                                                   zero_division=0)
    
        return {
            'macro_F1': m_f1,
            'macro_P': m_prec,
            'macro_R': m_rec,
            'SUBJ_F1': p_f1[0],
            'SUBJ_P': p_prec[0],
            'SUBJ_R': p_rec[0],
            'accuracy': acc
        }
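
The `get_class_weights` method above delegates to scikit-learn's `'balanced'` heuristic, which assigns each class the weight `n_samples / (n_classes * count_of_class)`. The following sketch reproduces that formula in plain Python; the function name `balanced_class_weights` and the toy label list are assumptions made for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: n_samples / (n_classes * count_of_class)}."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * counts[cls]) for cls in sorted(counts)}


# Toy label list with a 60/40 split (illustrative, not the task data)
labels = [0] * 6 + [1] * 4
weights = balanced_class_weights(labels)
print(weights)
```

The rarer class receives the larger weight, so its errors count more in a weighted loss.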

# Setting the Data Folder  

Please modify this path with your dataset's local path. The `data` folder should follow the official challenge structure:
```
data/
|---- arabic/
|--------- xxxx.tsv
|---- bulgarian/
|--------- xxxx.tsv
|---- english/
|--------- xxxx.tsv
|---- german/
|--------- xxxx.tsv
|---- italian/
|--------- xxxx.tsv
```

In [5]:
data_folder = '/kaggle/input/clef2025-checkthat/data' # data

# Creating Our Detector Object  

The `__init__()` method of the `Subjectivity` class loads the dataset, stores the device and the seed, and automatically creates the `train`, `dev`, and `test` splits. It also converts the `OBJ` and `SUBJ` labels to the integers 0 and 1, so they are ready to be fed into a model.  
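
The split assignment and the label encoding can be summarised with the small sketch below. It mirrors the substring rules used in `get_splits`; the helper names `split_of` and `encode_label` and the file stems are hypothetical and serve only to exercise the rules.

```python
def split_of(stem):
    """Map a file stem to 'train', 'dev', 'test' or None."""
    if 'train' in stem:
        return 'train'
    if 'dev_test' in stem:  # must be checked before the plain 'dev' rule
        return 'test'
    if 'dev' in stem:
        return 'dev'
    return None


def encode_label(label):
    """OBJ -> 0, anything else -> 1, as in get_splits."""
    return 0 if label == 'OBJ' else 1


# Hypothetical file stems, used only to exercise the rules
for stem in ['train_en', 'dev_en', 'dev_test_en']:
    print(stem, '->', split_of(stem))
print([encode_label(x) for x in ['OBJ', 'SUBJ']])  # [0, 1]
```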


In [6]:
detector = Subjectivity(data_folder=data_folder, seed=SEED, device=device)

Train: (6418, 5)
Dev: (2401, 5)
Test: (2332, 5)


In [7]:
detector.print_label_distrib(detector.train)
detector.print_label_distrib(detector.dev)
detector.print_label_distrib(detector.test)

label
0    0.631349
1    0.368651
Name: proportion, dtype: float64
label
0    0.612245
1    0.387755
Name: proportion, dtype: float64
label
0    0.657376
1    0.342624
Name: proportion, dtype: float64


In [8]:
detector.all_data['german']['train']['sentence'].str.len().describe()

count    800.000000
mean     126.296250
std       67.334117
min       31.000000
25%       80.000000
50%      112.500000
75%      161.000000
max      625.000000
Name: sentence, dtype: float64

# Hugging Face notebook login

To download and use the Hugging Face models, an access token must be provided. Please refer to this [page](https://huggingface.co/docs/hub/security-tokens) for instructions on creating one.

In [9]:
notebook_login()


Here we create the `results` and `predictions_dict` dictionaries. They will be used to store the outputs of all the models.

In [11]:
results = {}
predictions_dict = {}

# Custom Trainer  

This class extends the `Trainer` provided by Hugging Face. Since we needed to tweak some details in the training process, we opted to override some `Trainer` functions with custom ones.  
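
One of the tweaks in the class below is a class-weighted cross-entropy loss. As a rough illustration of what that loss computes, here is a pure-Python sketch under the assumption that the weighted mean is normalised by the sum of the weights of the target classes, which is our understanding of PyTorch's default `'mean'` reduction. The function name and the toy logits are illustrative only.

```python
import math


def weighted_cross_entropy(logits, labels, class_weights):
    """Weighted mean of per-example negative log-likelihoods.

    Computes sum_i w[y_i] * nll_i / sum_i w[y_i].
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for row, y in zip(logits, labels):
        m = max(row)  # subtract the max for numerical stability
        log_norm = m + math.log(sum(math.exp(z - m) for z in row))
        nll = log_norm - row[y]  # equals -log softmax(row)[y]
        weighted_sum += class_weights[y] * nll
        weight_total += class_weights[y]
    return weighted_sum / weight_total


# Toy logits for three sentences, columns = (OBJ, SUBJ); illustrative values
logits = [[2.0, 0.0], [0.0, 1.0], [1.5, 0.5]]
labels = [0, 1, 1]
plain = weighted_cross_entropy(logits, labels, [1.0, 1.0])
weighted = weighted_cross_entropy(logits, labels, [0.8, 1.25])
print(round(plain, 4), round(weighted, 4))
```

In this toy batch the misclassified sentence belongs to the up-weighted class, so the weighted loss is larger than the unweighted one.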


In [None]:
# Taken from https://github.com/huggingface/transformers/blob/main/src/transformers/trainer.py#L3700 (with some minor changes removing unused parts)
class CustomTrainer(Trainer):
    def __init__(self, *args, class_weights=None, weights_dtype=torch.float32, **kwargs):
        super().__init__(*args, **kwargs)
        # Store the class weights as a tensor on the training device
        if class_weights is not None:
            self.class_weights = torch.tensor(class_weights, dtype=weights_dtype).to(self.args.device)
        else:
            self.class_weights = None

    def compute_loss(self, model, inputs, return_outputs=False, num_items_in_batch=None):
        # Extract labels
        labels = inputs.get("labels")

        # Forward pass
        outputs = model(**inputs)

        # Extract logits 
        logits = outputs.get('logits')

        # Compute loss with class weights for imbalanced data handling
        if self.class_weights is not None:
            loss = F.cross_entropy(logits, labels, weight=self.class_weights)
        else:
            loss = F.cross_entropy(logits, labels)

        return (loss, outputs) if return_outputs else loss

    def compute_best_threshold(self, dataset, ignore_keys=None, metric_key_prefix="test"):
        # Get raw predictions from parent class
        output = super().predict(dataset, ignore_keys, metric_key_prefix)

        # Convert logits to probabilities using softmax (for binary classification)
        logits = output.predictions
        logits_tensor = torch.tensor(logits)
        probabilities = torch.softmax(logits_tensor, dim=-1).numpy()

        # Calculate optimal threshold
        labels = output.label_ids
        thresholds = np.linspace(0.1, 0.9, 100) 

        best_threshold = 0.5  # Default threshold
        best_f1 = 0

        for threshold in thresholds:
            predictions = (probabilities[:, 1] >= threshold).astype(int)
            _, _, f1, _ = precision_recall_fscore_support(labels, predictions, average="macro", zero_division=0)
            
            if f1 > best_f1:
                best_f1 = f1
                best_threshold = threshold

        # Return the best threshold found
        return best_threshold
        

    def predict(self, dataset, threshold: float = 0.5, ignore_keys=None, metric_key_prefix="test"):
        # Get raw predictions from parent class
        output = super().predict(dataset, ignore_keys, metric_key_prefix)
        
        # Convert logits to probabilities using softmax (for binary classification)
        logits = output.predictions
        logits_tensor = torch.tensor(logits)
        probabilities = torch.softmax(logits_tensor, dim=-1).numpy()
        
        final_predictions = (probabilities[:, 1] >= threshold).astype(int)

        # Update predictions in the output object
        return PredictionOutput(
            predictions=final_predictions,
            label_ids=output.label_ids,
            metrics=output.metrics
        )
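
The threshold calibration in `compute_best_threshold` is a plain grid search: it scans 100 evenly spaced thresholds between 0.1 and 0.9 and keeps the first one that achieves the highest macro F1. The sketch below reproduces that logic without NumPy or scikit-learn; the function names and the toy probabilities are assumptions for illustration.

```python
def macro_f1(gold, pred, classes=(0, 1)):
    """Macro F1 with the zero_division=0 convention."""
    scores = []
    for c in classes:
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        scores.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(scores) / len(classes)


def best_threshold(prob_subj, gold, low=0.1, high=0.9, steps=100):
    """Grid search over thresholds; ties keep the lowest threshold."""
    best_thr, best_f1 = 0.5, 0.0
    for k in range(steps):
        thr = low + k * (high - low) / (steps - 1)
        pred = [int(p >= thr) for p in prob_subj]
        score = macro_f1(gold, pred)
        if score > best_f1:
            best_thr, best_f1 = thr, score
    return best_thr, best_f1


# Toy SUBJ probabilities; the default 0.5 cut-off would miss two SUBJ items
prob_subj = [0.2, 0.3, 0.35, 0.8]
gold = [0, 1, 1, 1]
thr, score = best_threshold(prob_subj, gold)
print(round(thr, 3), score)
```

Since the threshold is tuned on the same data it is scored on, it should be selected on the dev set and then applied unchanged to the test set, as the notebook does.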

In [13]:
def tokenize_text(texts):
    return tokenizer(texts['sentence'], padding=True, truncation=True, max_length=256, return_tensors='pt')

In [14]:
def evaluate_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    acc = accuracy_score(labels, predictions)
    m_prec, m_rec, m_f1, m_s = precision_recall_fscore_support(labels, predictions, average="macro",
                                                                zero_division=0)
    p_prec, p_rec, p_f1, p_s = precision_recall_fscore_support(labels, predictions, labels=[1],
                                                                zero_division=0)

    return {
        'macro_F1': m_f1,
        'macro_P': m_prec,
        'macro_R': m_rec,
        'SUBJ_F1': p_f1[0],
        'SUBJ_P': p_prec[0],
        'SUBJ_R': p_rec[0],
        'accuracy': acc
    }

In [15]:
def save_predictions(test_data, predictions, filename: str, save_dir: str = 'results'):
    os.makedirs(save_dir, exist_ok=True)
    pred_df = pd.DataFrame()
    pred_df['sentence_id'] = test_data['sentence_id']
    pred_df['label'] = predictions
    pred_df['label'] = pred_df['label'].apply(lambda x: 'OBJ' if x == 0 else 'SUBJ')

    predictions_filepath = os.path.join(save_dir, filename)
    pred_df.to_csv(predictions_filepath, index=False, sep='\t')

    print("Saved predictions into file:", predictions_filepath)
    return predictions_filepath
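
For reference, the file written by `save_predictions` is a two-column, tab-separated table with a header row. The sketch below renders the same layout with the standard `csv` module; the function name `predictions_to_tsv` and the sentence ids are hypothetical.

```python
import csv
import io


def predictions_to_tsv(sentence_ids, predictions):
    """Render (id, label) rows as tab-separated text with a header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter='\t', lineterminator='\n')
    writer.writerow(['sentence_id', 'label'])
    for sid, pred in zip(sentence_ids, predictions):
        # 0 is mapped back to OBJ, anything else to SUBJ
        writer.writerow([sid, 'OBJ' if pred == 0 else 'SUBJ'])
    return buffer.getvalue()


# Hypothetical ids and predictions, only to show the layout
print(predictions_to_tsv(['s1', 's2'], [0, 1]))
```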

In [None]:
class CustomModel(PreTrainedModel):
    config_class = DebertaV2Config

    def __init__(self, config, sentiment_dim=3, num_labels=2, *args, **kwargs):
        super().__init__(config, *args, **kwargs)
        self.deberta = DebertaV2Model(config)
        self.pooler = ContextPooler(config)
        output_dim = self.pooler.output_dim
        self.dropout = nn.Dropout(0.1)

        self.classifier = nn.Linear(output_dim + sentiment_dim, num_labels)

    def forward(self, input_ids, positive, neutral, negative, attention_mask=None, labels=None):
        outputs = self.deberta(input_ids=input_ids, attention_mask=attention_mask)

        encoder_layer = outputs[0]
        pooled_output = self.pooler(encoder_layer)
        
        # Sentiment features as a single tensor
        sentiment_features = torch.stack((positive, neutral, negative), dim=1)  # Shape: (batch_size, 3)
        
        # Combine CLS embedding with sentiment features
        combined_features = torch.cat((pooled_output, sentiment_features), dim=1)
        
        # Classification head
        logits = self.classifier(self.dropout(combined_features))
        
        return {'logits': logits}
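
The forward pass above is a simple late-fusion scheme: the pooled sentence representation is concatenated with the three sentiment scores, and a single linear layer maps the result to two logits. The sketch below shows only that arithmetic, with plain Python lists standing in for tensors, a toy 4-dimensional pooled vector instead of the real hidden size, and dropout omitted. All names and values are illustrative assumptions.

```python
def fuse_and_classify(pooled, sentiment, weight, bias):
    """Concatenate the two feature vectors and apply one linear layer."""
    features = list(pooled) + list(sentiment)  # length: hidden + 3
    return [
        sum(w * x for w, x in zip(row, features)) + b  # one logit per class
        for row, b in zip(weight, bias)
    ]


# Toy sizes: a 4-dimensional "pooled" vector instead of the real hidden size
pooled = [0.5, -1.0, 0.25, 0.0]
sentiment = [0.7, 0.2, 0.1]       # positive, neutral, negative scores
weight = [[0.1] * 7, [-0.1] * 7]  # shape (num_labels, hidden + 3)
bias = [0.0, 0.0]
logits = fuse_and_classify(pooled, sentiment, weight, bias)
print(len(logits))  # 2
```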

# Zero shot inference

## Utility function

This function can later be integrated into the `Subjectivity` class.

In [11]:
def zero_shot_prepare_data(train_languages : list,
                           train : pd.DataFrame,
                           dev : pd.DataFrame,
                           test : pd.DataFrame):

    
    train_set = train[train["lang"].isin(train_languages)].copy()
    dev_set = dev[~dev["lang"].isin(train_languages)].copy()
    test_set = test[~test["lang"].isin(train_languages)].copy()

    return train_set, dev_set, test_set
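
The idea behind the function is that training only sees the selected languages, while dev and test only contain the remaining ones. Here is a minimal sketch of the same filter on plain dictionaries, so it runs without pandas; the function name `zero_shot_split` and the toy rows are assumptions for illustration.

```python
def zero_shot_split(rows, train_languages):
    """Split rows (dicts with a 'lang' key) into seen and unseen languages."""
    seen = [r for r in rows if r['lang'] in train_languages]
    unseen = [r for r in rows if r['lang'] not in train_languages]
    return seen, unseen


# Toy rows; the sentences are placeholders
rows = [
    {'lang': 'english', 'sentence': 'a'},
    {'lang': 'italian', 'sentence': 'b'},
    {'lang': 'german', 'sentence': 'c'},
    {'lang': 'arabic', 'sentence': 'd'},
]
seen, unseen = zero_shot_split(rows, train_languages=('english', 'italian'))
print([r['lang'] for r in seen])    # ['english', 'italian']
print([r['lang'] for r in unseen])  # ['german', 'arabic']
```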

In [12]:
def generate_name(names : tuple):
    generated_name = ""
    for x in names:
        generated_name = generated_name + x.capitalize()[:2]
    return generated_name
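
As a quick check of the naming scheme, the sketch below uses a compact equivalent of `generate_name` (written here with `str.join`, which is our rephrasing and not the cell above) and shows the short identifiers it produces for a pair and a triplet of languages.

```python
def generate_name(names):
    """Join the capitalised first two letters of each name."""
    return ''.join(name.capitalize()[:2] for name in names)


print(generate_name(('english', 'italian')))             # EnIt
print(generate_name(('arabic', 'bulgarian', 'german')))  # ArBuGe
```

These identifiers are the prefixes of the keys stored in `results`, such as `EnIt-NoSentiment`.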

## Custom loop to test triplets
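
With five languages there are ten possible training triplets, and each one leaves two languages unseen for evaluation. The sketch below enumerates them with `itertools.combinations`; the language list is written out by hand and sorted here, which is an assumption, since `os.listdir` does not guarantee any particular order.

```python
import itertools

# Hand-written, sorted language list (os.listdir order may differ)
languages = ['arabic', 'bulgarian', 'english', 'german', 'italian']

groups = list(itertools.combinations(languages, 3))
for train_langs in groups[:2]:
    held_out = [lang for lang in languages if lang not in train_langs]
    print(train_langs, '->', held_out)
print(len(groups))  # 10
```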

In [None]:
pipe = pipeline("sentiment-analysis", model="cardiffnlp/twitter-xlm-roberta-base-sentiment", tokenizer="cardiffnlp/twitter-xlm-roberta-base-sentiment", top_k=None)

In [None]:
def extract_sentiment(text):
    sentiments = pipe(text)[0]
    # Map each label name to its score
    return {sentiment['label']: sentiment['score'] for sentiment in sentiments}
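
The dictionary returned above is later expanded into the `positive`, `neutral`, and `negative` columns. If the pipeline returns the labels sorted by score, which appears to be the case when all scores are requested, the key order can change from sentence to sentence, so looking scores up by name is the safer option. The sketch below shows one way to do that; the helper name, the lower-casing of labels, and the scores are assumptions, and the input only imitates the shape of a text-classification pipeline output.

```python
LABELS = ('positive', 'neutral', 'negative')


def scores_in_fixed_order(pipeline_output, labels=LABELS):
    """Turn [{'label': ..., 'score': ...}, ...] into a tuple ordered by `labels`."""
    by_label = {item['label'].lower(): item['score'] for item in pipeline_output}
    return tuple(by_label[label] for label in labels)


# Made-up scores in the shape a text-classification pipeline returns
fake_output = [
    {'label': 'neutral', 'score': 0.6},
    {'label': 'positive', 'score': 0.3},
    {'label': 'negative', 'score': 0.1},
]
print(scores_in_fixed_order(fake_output))  # (0.3, 0.6, 0.1)
```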

In [None]:
model_card = "microsoft/mdeberta-v3-base"
tokenizer = detector.get_tokenizer(model_card=model_card)

epochs = 6
batch_size = 16
lr = 1e-5
weight_decay = 0.0
label_smoothing = 0.0

training_args = TrainingArguments(
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    learning_rate=lr,
    num_train_epochs=epochs,
    weight_decay=weight_decay,
    eval_strategy="epoch",
    save_strategy="epoch",
    save_total_limit=1,
    #warmup_ratio=0.5,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    report_to="none"
)

for i, group in enumerate(itertools.combinations(detector.languages, 3)):
    print(f"TESTING {i} - {group}")
    with torch.no_grad():
        torch.cuda.empty_cache()
    if 'model' in locals() or 'model' in globals():
        del model
        print("Model deleted!")
    gc.collect()
    
    group_name = generate_name(group)

    zs_train, zs_dev, zs_test = zero_shot_prepare_data(group, detector.train, detector.dev, detector.test)

    language = group_name+'-NoSentiment'


    
    train_data = Dataset.from_pandas(zs_train)
    dev_data = Dataset.from_pandas(zs_dev)
    test_data = Dataset.from_pandas(zs_test)
    
    train_data = train_data.map(tokenize_text, batched=True)
    dev_data = dev_data.map(tokenize_text, batched=True)
    test_data = test_data.map(tokenize_text, batched=True)
    
    class_weights = detector.get_class_weights(zs_train)

    model = detector.get_model(
        model_card=model_card, 
        num_labels=2, 
        id2label={0: 'OBJ', 1: 'SUBJ'}, 
        label2id={'OBJ': 0, 'SUBJ': 1},
        output_attentions = False,
        output_hidden_states = False
    )
    
    collator_fn = DataCollatorWithPadding(tokenizer=tokenizer)

    trainer = CustomTrainer(
        model=model,
        args=training_args,
        train_dataset=train_data,
        eval_dataset=dev_data,
        data_collator=collator_fn,
        compute_metrics=evaluate_metrics,
        class_weights=class_weights,
    )

    trainer.train()

    best_thr = trainer.compute_best_threshold(dataset=dev_data)
    pred_info = trainer.predict(dataset=test_data, threshold=best_thr)
    predictions, labels = pred_info.predictions, pred_info.label_ids
    
    acc = accuracy_score(labels, predictions)
    m_prec, m_rec, m_f1, m_s = precision_recall_fscore_support(labels, predictions, average="macro",
                                                                zero_division=0)
    p_prec, p_rec, p_f1, p_s = precision_recall_fscore_support(labels, predictions, labels=[1],
                                                                zero_division=0)
    stats = {
            'macro_F1': m_f1,
            'macro_P': m_prec,
            'macro_R': m_rec,
            'SUBJ_F1': p_f1[0],
            'SUBJ_P': p_prec[0],
            'SUBJ_R': p_rec[0],
            'accuracy': acc
        }
    
    results[language] = stats

    
    with torch.no_grad():
        torch.cuda.empty_cache()
    if 'model' in locals() or 'model' in globals():
        del model
        print("Model deleted!")
    gc.collect()

    language = group_name+'-Sentiment'
    
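    # CustomModel expects a config object, not a model name
    config = DebertaV2Config.from_pretrained(
        model_card,
        num_labels=2,
        id2label={0: 'OBJ', 1: 'SUBJ'},
        label2id={'OBJ': 0, 'SUBJ': 1}
    )
    model = CustomModel(config=config, sentiment_dim=3, num_labels=2)
    # Load the pretrained encoder weights into the custom model
    model.deberta = DebertaV2Model.from_pretrained(model_card, config=config)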



    zs_train[['positive', 'neutral', 'negative']] = zs_train.progress_apply(lambda x: extract_sentiment(x['sentence']), axis=1, result_type='expand')
    zs_dev[['positive', 'neutral', 'negative']] = zs_dev.progress_apply(lambda x: extract_sentiment(x['sentence']), axis=1, result_type='expand')
    zs_test[['positive', 'neutral', 'negative']] = zs_test.progress_apply(lambda x: extract_sentiment(x['sentence']), axis=1, result_type='expand')

    train_data = Dataset.from_pandas(zs_train)
    dev_data = Dataset.from_pandas(zs_dev)
    test_data = Dataset.from_pandas(zs_test)

    train_data = train_data.map(tokenize_text, batched=True)
    dev_data = dev_data.map(tokenize_text, batched=True)
    test_data = test_data.map(tokenize_text, batched=True)

    collator_fn = DataCollatorWithPadding(tokenizer=tokenizer)

    trainer = CustomTrainer(
        model = model,
        args = training_args,
        train_dataset = train_data,
        eval_dataset = dev_data,
        data_collator = collator_fn,
        compute_metrics = evaluate_metrics,
        class_weights=class_weights,
    )

    trainer.train()

    best_thr = trainer.compute_best_threshold(dataset=dev_data)
    pred_info = trainer.predict(dataset=test_data, threshold=best_thr)
    predictions, labels = pred_info.predictions, pred_info.label_ids
    
    acc = accuracy_score(labels, predictions)
    m_prec, m_rec, m_f1, m_s = precision_recall_fscore_support(labels, predictions, average="macro",
                                                                zero_division=0)
    p_prec, p_rec, p_f1, p_s = precision_recall_fscore_support(labels, predictions, labels=[1],
                                                                zero_division=0)
    stats = {
            'macro_F1': m_f1,
            'macro_P': m_prec,
            'macro_R': m_rec,
            'SUBJ_F1': p_f1[0],
            'SUBJ_P': p_prec[0],
            'SUBJ_R': p_rec[0],
            'accuracy': acc
        }
    
    results[language] = stats
    
    

Device set to use cuda:0


TESTING 0 - ('english', 'italian')
Model deleted!


Some weights of DebertaV2ForSequenceClassification were not initialized from the model checkpoint at microsoft/mdeberta-v3-base and are newly initialized: ['classifier.bias', 'classifier.weight', 'pooler.dense.bias', 'pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


| Epoch | Training Loss | Validation Loss | Macro F1 | Macro P | Macro R | Subj F1 | Subj P | Subj R | Accuracy |
|---|---|---|---|---|---|---|---|---|---|
| 1 | No log | 0.657371 | 0.668724 | 0.668303 | 0.669217 | 0.607522 | 0.602294 | 0.61284 | 0.680031 |
| 2 | No log | 0.828612 | 0.645487 | 0.692242 | 0.645853 | 0.519512 | 0.696078 | 0.414397 | 0.690252 |
| 3 | No log | 1.014649 | 0.661566 | 0.69591 | 0.659338 | 0.550351 | 0.691176 | 0.457198 | 0.698113 |
| 4 | 0.446900 | 1.211402 | 0.649742 | 0.693915 | 0.649397 | 0.527207 | 0.696486 | 0.424125 | 0.69261 |
| 5 | 0.446900 | 1.423838 | 0.649075 | 0.700674 | 0.649497 | 0.522167 | 0.711409 | 0.412451 | 0.694969 |
| 6 | 0.446900 | 1.465843 | 0.647287 | 0.694394 | 0.647485 | 0.521951 | 0.699346 | 0.416342 | 0.691824 |


Model deleted!


100%|██████████| 2443/2443 [00:20<00:00, 121.21it/s]
100%|██████████| 1272/1272 [00:10<00:00, 120.28it/s]
100%|██████████| 1335/1335 [00:11<00:00, 119.28it/s]


| Epoch | Training Loss | Validation Loss | Macro F1 | Macro P | Macro R | Subj F1 | Subj P | Subj R | Accuracy |
|---|---|---|---|---|---|---|---|---|---|
| 1 | No log | 0.725917 | 0.61222 | 0.629209 | 0.631277 | 0.603379 | 0.514403 | 0.729572 | 0.612421 |
| 2 | No log | 0.972189 | 0.634223 | 0.718251 | 0.64007 | 0.486129 | 0.757202 | 0.357977 | 0.694182 |
| 3 | No log | 0.9552 | 0.685101 | 0.702052 | 0.681131 | 0.596721 | 0.680798 | 0.531128 | 0.709906 |
| 4 | 0.504100 | 1.252779 | 0.65337 | 0.706423 | 0.653422 | 0.527744 | 0.720539 | 0.416342 | 0.698899 |
| 5 | 0.504100 | 1.394758 | 0.648759 | 0.707636 | 0.649911 | 0.518148 | 0.726316 | 0.402724 | 0.697327 |
| 6 | 0.504100 | 1.417408 | 0.653806 | 0.701681 | 0.653355 | 0.53106 | 0.710098 | 0.424125 | 0.697327 |


In [27]:
pd.DataFrame(results).T.sort_values(by='macro_F1', ascending=False).round(4)

| Run | macro_F1 | macro_P | macro_R | SUBJ_F1 | SUBJ_P | SUBJ_R | accuracy |
|---|---|---|---|---|---|---|---|
| EnIt-NoSentiment | 0.6147 | 0.6219 | 0.6135 | 0.5166 | 0.5661 | 0.475 | 0.6397 |
| EnIt-Sentiment | 0.6022 | 0.6074 | 0.6012 | 0.5045 | 0.5451 | 0.4695 | 0.6262 |


In [None]:
dataframe = pd.DataFrame(results)

In [None]:
dataframe.to_csv("results.csv")

# Zero shot without Arabic language

In [129]:
with torch.no_grad():
    torch.cuda.empty_cache()

if 'model' in locals() or 'model' in globals():
    del model
    print("Model deleted!")

gc.collect()

Model deleted!


10705

In [131]:
language = 'english'

epochs = 6
batch_size = 16
lr = 1e-5
weight_decay = 0.0

In [132]:
model_card = "microsoft/mdeberta-v3-base"

tokenizer = detector.get_tokenizer(model_card=model_card)

# Load the config
config = DebertaV2Config.from_pretrained(
    model_card,
    num_labels=2,
    id2label={0: 'OBJ', 1: 'SUBJ'},
    label2id={'OBJ': 0, 'SUBJ': 1},
    output_attentions=False,
    output_hidden_states=False
)

# Initialize the custom model
model = CustomModel(config=config, sentiment_dim=3, num_labels=2)

# Load pretrained weights from the original DeBERTa model
model.deberta = DebertaV2Model.from_pretrained(model_card, config=config)



In [None]:
detector.all_data[language]['train'][['positive', 'neutral', 'negative']] = detector.all_data[language]['train'].progress_apply(lambda x: extract_sentiment(x['sentence']), axis=1, result_type='expand')
detector.all_data[language]['dev'][['positive', 'neutral', 'negative']] = detector.all_data[language]['dev'].progress_apply(lambda x: extract_sentiment(x['sentence']), axis=1, result_type='expand')
detector.all_data[language]['test'][['positive', 'neutral', 'negative']] = detector.all_data[language]['test'].progress_apply(lambda x: extract_sentiment(x['sentence']), axis=1, result_type='expand')

Device set to use cuda:0
100%|██████████| 830/830 [00:06<00:00, 123.08it/s]
100%|██████████| 462/462 [00:03<00:00, 121.79it/s]
100%|██████████| 484/484 [00:04<00:00, 120.10it/s]


In [134]:
train_data = Dataset.from_pandas(detector.all_data[language]['train'])
dev_data = Dataset.from_pandas(detector.all_data[language]['dev'])
test_data = Dataset.from_pandas(detector.all_data[language]['test'])

train_data = train_data.map(tokenize_text, batched=True)
dev_data = dev_data.map(tokenize_text, batched=True)
test_data = test_data.map(tokenize_text, batched=True)

collator_fn = DataCollatorWithPadding(tokenizer=tokenizer)

class_weights = detector.get_class_weights(detector.all_data[language]['train'])

In [135]:
# Define training args
training_args = TrainingArguments(
    output_dir=f"mdeberta-v3-base-subjectivity-sentiment-{language}",
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    learning_rate=lr,
    num_train_epochs=epochs,
    weight_decay=weight_decay,
    eval_strategy="epoch",
    save_strategy="epoch",
    save_total_limit=1,
    #warmup_ratio=0.5,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    report_to="none"
)

In [136]:
trainer = CustomTrainer(
    model = model,
    args = training_args,
    train_dataset = train_data,
    eval_dataset = dev_data,
    data_collator = collator_fn,
    compute_metrics = evaluate_metrics,
    class_weights=class_weights,
)

In [None]:
trainer.train()

| Epoch | Training Loss | Validation Loss | Macro F1 | Macro P | Macro R | Subj F1 | Subj P | Subj R | Accuracy |
|---|---|---|---|---|---|---|---|---|---|
| 1 | No log | 0.676271 | 0.638498 | 0.650187 | 0.644707 | 0.610329 | 0.698925 | 0.541667 | 0.640693 |
| 2 | No log | 0.492784 | 0.766216 | 0.768179 | 0.767736 | 0.764192 | 0.802752 | 0.729167 | 0.766234 |
| 3 | No log | 0.512452 | 0.767482 | 0.779157 | 0.772016 | 0.752887 | 0.84456 | 0.679167 | 0.768398 |
| 4 | No log | 0.506866 | 0.792145 | 0.79524 | 0.794088 | 0.788546 | 0.836449 | 0.745833 | 0.792208 |

In [None]:
# Decision threshold calibration on dev set
best_thr = trainer.compute_best_threshold(dataset=dev_data)
# Predictions on dev set (with best threshold on dev set)
pred_info = trainer.predict(dataset=dev_data, threshold=best_thr)

predictions, labels = pred_info.predictions, pred_info.label_ids

# Save dev set predictions
save_predictions(dev_data, predictions, filename = f"dev_{language}_sentiment_predicted.tsv")

# Predictions on test set (with best threshold on dev set)
pred_info = trainer.predict(dataset=test_data, threshold=best_thr)

predictions, labels = pred_info.predictions, pred_info.label_ids

# Save test set predictions
save_predictions(test_data, predictions, filename = f"test_{language}_sentiment_predicted.tsv")

acc = accuracy_score(labels, predictions)
m_prec, m_rec, m_f1, m_s = precision_recall_fscore_support(labels, predictions, average="macro",
                                                            zero_division=0)
p_prec, p_rec, p_f1, p_s = precision_recall_fscore_support(labels, predictions, labels=[1],
                                                            zero_division=0)
stats = {
        'macro_F1': m_f1,
        'macro_P': m_prec,
        'macro_R': m_rec,
        'SUBJ_F1': p_f1[0],
        'SUBJ_P': p_prec[0],
        'SUBJ_R': p_rec[0],
        'accuracy': acc
    }

print(stats)
results[f"{language}-sentiment-thr"] = stats

cm = confusion_matrix(labels, predictions, normalize='all')
ConfusionMatrixDisplay(cm, display_labels=['OBJ', 'SUBJ']).plot()
plt.title(f"Confusion Matrix ({language})")
plt.show()
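
With `normalize='all'`, every cell of the confusion matrix is divided by the total number of samples, so the four entries sum to 1. A minimal pure-Python sketch of that normalisation follows; the function name and the toy labels are assumptions for illustration, with rows indexing the gold class and columns the predicted class.

```python
def normalized_confusion_matrix(gold, pred, n_classes=2):
    """Rows = gold class, columns = predicted class, entries sum to 1."""
    counts = [[0] * n_classes for _ in range(n_classes)]
    for g, p in zip(gold, pred):
        counts[g][p] += 1
    total = len(gold)
    return [[c / total for c in row] for row in counts]


# Toy labels (0 = OBJ, 1 = SUBJ); illustrative values, not task data
gold = [0, 0, 0, 1, 1]
pred = [0, 0, 1, 1, 0]
cm = normalized_confusion_matrix(gold, pred)
print(cm)  # [[0.4, 0.2], [0.2, 0.2]]
```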