<figure>
  <img src="https://raw.githubusercontent.com/shadowkshs/DimABSA2026/refs/heads/main/banner.png" width="100%">
</figure>

# SemEval-2026 Task 3 (Track A: DimABSA, Track B: DimStance)
# Subtask 1: Dimensional Aspect Sentiment Regression (DimASR)

-----

## Starter Notebook
Leveraging Pretrained Language Models for Dimensional Sentiment Regression


## Introduction:

You are welcome to participate in our SemEval Shared Task!

In this starter notebook, we will take you through the process of fine-tuning a pre-trained language model on a sample data to build a sentiment regressor. The notebook was adapted from a Hugginface implementation for such tasks.

### Outline:

- Installation and importation of necessary libraries
Setting up the project parameters.
Running training and evaluation
Before you start:

- It is strongly advised that you use a GPU to speed up training. To do this, go to the "Runtime" menu in Colab, select "Change runtime type" and then in the popup menu, choose "GPU" in the "Hardware accelerator" box.

### NB:

The codes in this notebook are provided to familiarize yourselves with fine-tuning language models for sentiment regression. You may extend and (or) modify as appropriate to obtain competitive performances.

### Languages and Domains:
#### Track A: Subtask 1
- eng_restaurant
- eng_laptop
- jpn_hotel
- jpn_finance
- rus_restaurant
- tat_restaurant
- ukr_restaurant
- zho_restaurant
- zho_laptop
#### Track B: Subtask 1
- deu-stance
- eng-stance
- hau-stance
- kin-stance
- swa-stance
- twi-stance


### Model:
This Starter Notebook uses the bert-base-multilingual-cased pretrained model, developed by Google. The model was trained with a masked language modeling (MLM) objective on the top 104 languages with the largest Wikipedia presence. You can find the model here: https://huggingface.co/google-bert/bert-base-multilingual-cased

If your target language is not included in the common set supported by this model, you can search for a more suitable model on Hugging Face: https://huggingface.co/models



In [1]:
import json
from typing import List, Dict
from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from transformers import AutoTokenizer, AutoModel

from scipy.stats import pearsonr
from tqdm import tqdm
import math
import re
import requests


def load_jsonl(filepath: str) -> List[Dict]:
    with open(filepath, "r", encoding="utf-8") as f:
        return [json.loads(line) for line in f]

def load_jsonl_url(url: str) -> List[Dict]:
    resp = requests.get(url)
    resp.raise_for_status()
    return [json.loads(line) for line in resp.text.splitlines()]

  from .autonotebook import tqdm as notebook_tqdm


### First, visit the [DimABSA2006](https://github.com/DimABSA/DimABSA2026) repository, check the task-dataset.

### Step 1: Load the competition data

- Read JSONL files (train/dev/predict) into Colab.  
- Train files contain Valence–Arousal (VA) labels.  
- Predict files have no VA labels.  
- This script:
  1. Loads the JSONL data.
  2. Splits 10% of train data as dev set.
  3. Converts JSONL into DataFrames (ID, Text, Aspect, Valence, Arousal).
  4. Prints the first few rows for checking.


In [2]:
#task config
subtask = "subtask_1"#don't change
task = "task1"#don't change
lang = "eng" #chang the language you want to test
domain = "laptop" #change what domain you want to test

train_url = f"https://raw.githubusercontent.com/DimABSA/DimABSA2026/refs/heads/main/task-dataset/track_a/{subtask}/{lang}/{lang}_{domain}_train_alltasks.jsonl"
predict_url = f"https://raw.githubusercontent.com/DimABSA/DimABSA2026/refs/heads/main/task-dataset/track_a/{subtask}/{lang}/{lang}_{domain}_dev_{task}.jsonl"

#model config
model_name = "bert-base-multilingual-cased" # chage your transformer model
lr = 1e-5 #learning rate
epochs = 5

train_raw = load_jsonl_url(train_url)
predict_raw = load_jsonl_url(predict_url)

another transformer models you can try:
1. roberta-large
2. roberta-base
3. bert-base-uncased

more models please visit [huggingface](https://huggingface.co/models)

In [3]:
#==== step 1 load the data ====
# you can change the env for your task.
# train data should have the VA labels, predit data without VA labels

def jsonl_to_df(data):
    if 'Quadruplet' in data[0]:
        df = pd.json_normalize(data, 'Quadruplet', ['ID', 'Text'])
        df[['Valence', 'Arousal']] = df['VA'].str.split('#', expand=True).astype(float)
        df = df.drop(columns=['VA', 'Category', 'Opinion'])  # drop unnecessary columns
        df = df.drop_duplicates(subset=['ID', 'Aspect'], keep='first')  # remove duplicate ID+Aspect

    elif 'Triplet' in data[0]:
        df = pd.json_normalize(data, 'Triplet', ['ID', 'Text'])
        df[['Valence', 'Arousal']] = df['VA'].str.split('#', expand=True).astype(float)
        df = df.drop(columns=['VA', 'Opinion'])  # drop unnecessary columns
        df = df.drop_duplicates(subset=['ID', 'Aspect'], keep='first')  # remove duplicate ID+Aspect

    elif 'Aspect_VA' in data[0]:
        df = pd.json_normalize(data, 'Aspect_VA', ['ID', 'Text'])
        df = df.rename(columns={df.columns[0]: "Aspect"})  # rename to Aspect
        df[['Valence', 'Arousal']] = df['VA'].str.split('#', expand=True).astype(float)
        df = df.drop_duplicates(subset=['ID', 'Aspect'], keep='first')  # remove duplicate ID+Aspect

    elif 'Aspect' in data[0]:
        df = pd.json_normalize(data, 'Aspect', ['ID', 'Text'])
        df = df.rename(columns={df.columns[0]: "Aspect"})  # rename to Aspect
        df['Valence'] = 0  # default value
        df['Arousal'] = 0  # default value

    else:
        raise ValueError("Invalid format: must include 'Quadruplet' or 'Triplet' or 'Aspect'")

    return df

train_df = jsonl_to_df(train_raw)
predict_df = jsonl_to_df(predict_raw)

# split 10% for dev
train_df, dev_df = train_test_split(train_df, test_size=0.1, random_state=42)

### Display the dataframe

In [4]:
from IPython.display import display, Markdown

display(Markdown(f"### {subtask}_{lang}_{domain} train_df"))
display(train_df.head())

display(Markdown(f"### {subtask}_{lang}_{domain} dev_df"))
display(dev_df.head())

display(Markdown(f"### {subtask}_{lang}_{domain} predict_df"))
display(predict_df.head())

### subtask_1_eng_laptop train_df

Unnamed: 0,Aspect,ID,Text,Valence,Arousal
251,computer,laptop_quad_dev_190,"if i had it to do over , i would not purchase ...",3.1,6.3
4516,unit,laptop_quad_train_2141,after charging the unit for 2 hours i discover...,4.75,5.25
335,,laptop_quad_dev_253,"freezes with red lines across it , froze five ...",2.0,7.67
3286,device,laptop_quad_train_1230,a wonderful device with extremely clear display .,8.0,7.83
753,screen,laptop_quad_test_236,the screen does look good .,6.62,6.62


### subtask_1_eng_laptop dev_df

Unnamed: 0,Aspect,ID,Text,Valence,Arousal
3628,,laptop_quad_train_1485,but it lost the coil whine roulette - - badly .,3.12,6.12
3096,key board,laptop_quad_train_1095,the key board is one of the best i ' ve ever t...,7.67,7.5
4814,sleep time,laptop_quad_train_2357,"- boot time , sleep time and wake time are cra...",7.5,7.5
5443,track pad,laptop_quad_train_2729,please note that the track pad is way better t...,7.12,7.0
197,retina screen,laptop_quad_dev_147,the retina screen is amazing .,8.12,8.25


### subtask_1_eng_laptop predict_df

Unnamed: 0,Aspect,ID,Text,Valence,Arousal
0,touchscreen,lap26_aspect_va_dev_1,The touchscreen works very well,0,0
1,HP,lap26_aspect_va_dev_2,I am so disappointed in HP,0,0
2,keyboard,lap26_aspect_va_dev_3,The keyboard is big enough to use for real typing,0,0
3,screen size,lap26_aspect_va_dev_4,I like the screen size,0,0
4,Lenovo,lap26_aspect_va_dev_5,Lenovo is my favorite brand of computer,0,0


### Step 2: Build Dataset and DataLoader

- Define a custom `VADataset` class for PyTorch:
  - Joins Aspect + Text into a single input string.
  - Uses BERT tokenizer to create `input_ids` and `attention_mask`.
  - Returns `[Valence, Arousal]` labels as float tensor.
- Convert the processed DataFrames into PyTorch `Dataset` objects.
- Wrap them with `DataLoader` for mini-batch training and evaluation.

In [5]:
#==== Dataset ====
class VADataset(Dataset):
    '''
    A PyTorch Dataset for Valence–Arousal regression.

    - Combines aspect and text into a single input (e.g., "keyboard: The keyboard is good").
    - Tokenizes the input using a HuggingFace tokenizer.
    - Returns:
        * input_ids: token IDs, shape [max_len]
        * attention_mask: mask, shape [max_len]
        * labels: [Valence, Arousal], shape [2], float tensor

    Args:
        dataframe (pd.DataFrame): must contain "Text", "Aspect", "Valence", "Arousal".
        tokenizer: HuggingFace tokenizer.
        max_len (int): max sequence length.
    '''
    def __init__(self, dataframe, tokenizer, max_len=128):
        self.sentences = dataframe["Text"].tolist()
        self.aspects = dataframe["Aspect"].tolist()
        self.labels = dataframe[["Valence", "Arousal"]].values.astype(float)
        self.tokenizer = tokenizer
        self.max_len = max_len

    def __len__(self):
        return len(self.sentences)

    def __getitem__(self, idx):
        text = f"{self.aspects[idx]}: {self.sentences[idx]}"
        encoded = self.tokenizer(
            text,
            truncation=True,
            padding="max_length",
            max_length=self.max_len,
            return_tensors="pt"
        )
        return {
            "input_ids": encoded["input_ids"].squeeze(0),
            "attention_mask": encoded["attention_mask"].squeeze(0),
            "labels": torch.tensor(self.labels[idx], dtype=torch.float)
        }


# convert to Dataset and Dataloader
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

train_dataset = VADataset(train_df, tokenizer)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)

dev_dataset = VADataset(dev_df, tokenizer)
dev_loader = DataLoader(dev_dataset, batch_size=64, shuffle=True)


To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development


### Step 3: Build and Train TransformerVARegressor

- Define **`TransformerVARegressor`**:  
  - Uses pretrained Transformer (e.g. BERT) as backbone.  
  - Adds dropout and linear layer to predict **Valence** and **Arousal**.  

- Implement helper methods:  
  - `train_epoch`: one training pass with optimizer and loss.  
  - `eval_epoch`: validation pass without gradient updates.  

- Set training parameters:  
  - `lr = 1e-5`, `epochs = 5`, `loss_fn = MSELoss`.  

- Run training loop:  
  - For each epoch, print training and validation loss to monitor progress.


In [6]:
#====step 3 build your model ====
class TransformerVARegressor(nn.Module):
    '''
    A BERT-based regressor for predicting Valence and Arousal scores.

    - Uses a pretrained BERT backbone to encode text.
    - Takes the [CLS] token representation as sentence-level embedding.
    - Adds a dropout layer and a linear head to output 2 values: [Valence, Arousal].
    - Includes helper methods for one training epoch and one evaluation epoch.

    Args:
        model_name (str): HuggingFace model name, default "bert-base-multilingual-cased".
        dropout (float): Dropout rate before the regression head.

    Methods:
        train_epoch(dataloader, optimizer, loss_fn, device):
            Train the model for one epoch.
            Returns average training loss.

        eval_epoch(dataloader, loss_fn, device):
            Evaluate the model for one epoch (no gradient).
            Returns average validation loss.
    '''
    def __init__(self, model_name=model_name, dropout=0.1):
        super().__init__()
        self.backbone = AutoModel.from_pretrained(model_name)
        self.dropout = nn.Dropout(dropout)
        self.reg_head = nn.Linear(self.backbone.config.hidden_size, 2)  # Valence + Arousal

    def forward(self, input_ids, attention_mask):
        outputs = self.backbone(input_ids=input_ids, attention_mask=attention_mask)
        cls_output = outputs.last_hidden_state[:, 0]  # [CLS] token
        x = self.dropout(cls_output)
        return self.reg_head(x)


    def train_epoch(self, dataloader, optimizer, loss_fn, device):
        self.train()
        total_loss = 0
        for batch in tqdm(dataloader):
            input_ids = batch["input_ids"].to(device)
            attention_mask = batch["attention_mask"].to(device)
            labels = batch["labels"].to(device)

            optimizer.zero_grad()
            outputs = self(input_ids, attention_mask)
            loss = loss_fn(outputs, labels)
            loss.backward()
            optimizer.step()

            total_loss += loss.item()
        return total_loss / len(dataloader)

    def eval_epoch(self, dataloader, loss_fn, device):
        self.eval()
        total_loss = 0
        with torch.no_grad():
            for batch in dataloader:
                input_ids = batch["input_ids"].to(device)
                attention_mask = batch["attention_mask"].to(device)
                labels = batch["labels"].to(device)

                outputs = self(input_ids, attention_mask)
                loss = loss_fn(outputs, labels)
                total_loss += loss.item()
        return total_loss / len(dataloader)

# Training bert on your data
model = TransformerVARegressor().to(device)
lr = locals().get("lr", 1e-5)
epochs = locals().get("epochs", 5)

optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
loss_fn = nn.MSELoss()

for epoch in range(epochs):
    train_loss = model.train_epoch(train_loader, optimizer, loss_fn, device)
    val_loss = model.eval_epoch(dev_loader, loss_fn, device)
    print(f"model:{model_name} Epoch:{epoch+1}: train={train_loss:.4f}, val={val_loss:.4f}")

W1210 17:27:29.762000 4784 Lib\site-packages\torch\distributed\elastic\multiprocessing\redirects.py:29] NOTE: Redirects are currently not supported in Windows or MacOs.
Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`
100%|██████████| 70/70 [04:11<00:00,  3.59s/it]


model:bert-base-multilingual-cased Epoch:1: train=6.7597, val=2.1829


100%|██████████| 70/70 [04:13<00:00,  3.62s/it]


model:bert-base-multilingual-cased Epoch:2: train=1.7991, val=1.2844


100%|██████████| 70/70 [04:12<00:00,  3.61s/it]


model:bert-base-multilingual-cased Epoch:3: train=1.1469, val=1.1054


100%|██████████| 70/70 [04:18<00:00,  3.69s/it]


model:bert-base-multilingual-cased Epoch:4: train=0.8859, val=1.0767


 64%|██████▍   | 45/70 [02:49<01:34,  3.77s/it]


KeyboardInterrupt: 

### Step 4: Evaluate model performance on dev set

- Define helper function `get_prd`:
  - For **dev**: get both predictions and gold labels.
  - For **pred**: only get predictions (no gold labels).
- Define `evaluate_predictions_task1`:
  - Compute Pearson Correlation Coefficient (PCC) for Valence (V) and Arousal (A).
  - Compute normalized RMSE for combined VA score.
- Run evaluation on laptop and restaurant dev sets.
- Print metrics to check how well the models perform.


In [8]:
#==== step 4 use dev data to check your model's performance ====
def get_prd(model,dataloder, type ="dev"):
    if type == "dev":
        all_preds, all_labels = [], []
        with torch.no_grad():
            for batch in dataloder:
                input_ids = batch["input_ids"].to(device)
                attention_mask = batch["attention_mask"].to(device)
                labels = batch["labels"].cpu().numpy()
                outputs = model(input_ids, attention_mask).cpu().numpy()
                all_preds.append(outputs)
                all_labels.append(labels)
        preds = np.vstack(all_preds)
        lables = np.vstack(all_labels)

        pred_v = preds[:,0]
        pred_a = preds[:,1]

        gold_v = lables[:,0]
        gold_a = lables[:,1]

        return pred_v, pred_a, gold_v, gold_a

    elif type == "pred":
        all_preds = []
        with torch.no_grad():
            for batch in dataloder:
                input_ids = batch["input_ids"].to(device)
                attention_mask = batch["attention_mask"].to(device)
                outputs = model(input_ids, attention_mask).cpu().numpy()
                all_preds.append(outputs)
        preds = np.vstack(all_preds)

        pred_v = preds[:, 0]
        pred_a = preds[:, 1]

        return pred_v, pred_a

# def rmse_pairwise(pred_a, pred_v, gold_a, gold_v):
#     pcc_v = pearsonr(pred_v,gold_v)[0]
#     pcc_a = pearsonr(pred_a,gold_a)[0]
#     total_sq_error = sum((pv - gv)**2 + (pa - ga)**2 for gv,pv,ga,pa in zip(gold_v, pred_v, gold_a, pred_a))
#     rmse_va = math.sqrt(total_sq_error / len(gold_v))

#     return {
#         'PCC_V': pcc_v,
#         'PCC_A': pcc_a,
#         'RMSE_VA': rmse_va,
#     }

# def rmse_concat(pred_a, pred_v, gold_a, gold_v):
#     pcc_v = pearsonr(pred_v,gold_v)[0]
#     pcc_a = pearsonr(pred_a,gold_a)[0]

#     gold_va = list(gold_v) + list(gold_a)
#     pred_va = list(pred_v) + list(pred_a)
#     total_sq_error = [(a - b)**2 for a,b in zip(gold_va, pred_va)]
#     rmse_va = math.sqrt(sum(total_sq_error) / len(gold_v))
#     return {
#         'PCC_V': pcc_v,
#         'PCC_A': pcc_a,
#         'RMSE_VA': rmse_va,
#     }

def evaluate_predictions_task1(pred_a, pred_v, gold_a, gold_v, is_norm = False):
    if not (all(1 <= x <= 9 for x in pred_v) and all(1 <= x <= 9 for x in pred_a)):
        print(f"Warning: Some predicted values are out of the numerical range.")
    pcc_v = pearsonr(pred_v,gold_v)[0]
    pcc_a = pearsonr(pred_a,gold_a)[0]

    gold_va = list(gold_v) + list(gold_a)
    pred_va = list(pred_v) + list(pred_a)
    def rmse_norm(gold_va, pred_va, is_normalization = True):
        result = [(a - b)**2 for a, b in zip(gold_va, pred_va)]
        if is_normalization:
            return math.sqrt(sum(result)/len(gold_v))/math.sqrt(128)
        return math.sqrt(sum(result)/len(gold_v))
    rmse_va = rmse_norm(gold_va, pred_va, is_norm)
    return {
        'PCC_V': pcc_v,
        'PCC_A': pcc_a,
        'RMSE_VA': rmse_va,
    }


pred_v, pred_a, gold_v, gold_a = get_prd(model, dev_loader,type="dev")
eval_score = evaluate_predictions_task1(pred_a, pred_v, gold_a, gold_v)
print(f"{model_name} dev_eval: {eval_score}")

bert-base-multilingual-cased dev_eval: {'PCC_V': np.float32(0.80650467), 'PCC_A': np.float32(0.64724743), 'RMSE_VA': 1.4552186219660175}


### Step 5: Save and submit prediction results

- Define helper `df_to_jsonl`:
  - Sort by ID number.
  - Group rows by ID.
  - Save predictions in JSONL format (`ID`, `Aspect_VA`).
- Run the model on the predict sets (laptop & restaurant).
- Fill in predicted Valence/Arousal values.
- Export three JSONL files:




  - `pred_eng_laptop.jsonl`
  - `pred_eng_restaurant.jsonl`
  - `pred_zho_laptop.jsonl`
- These files can be uploaded as the final submission.


### File Naming Guidelines
When submitting your predictions on the Codabench task page:

Decide the target language(s) and domain(s). Each submission file corresponds to one language-domain combination.
For each language-domain combination, name the file pred_[lang_code]_[domain].jsonl, where
- [lang_code] represents a 3-letter language code, and
- [domain] represents a domain.
For example, Hausa predictions for the movie domain should be named pred_hau_movie.jsonl.
If submitting for multiple languages or domains, submit one prediction file per language-domain combination. For example, submitting for multiple languages or domains would look like this:
```plaintext
subtask_1
├── pred_eng_restaurant.jsonl
├── pred_eng_laptop.jsonl
└── pred_zho_laptop.jsonl

In [9]:
#==== step 5 save & submit your predict results ====
def extract_num(s):
    m = re.search(r"(\d+)$", str(s))
    return int(m.group(1)) if m else -1

def df_to_jsonl(df, out_path):
    df_sorted = df.sort_values(by="ID", key=lambda x: x.map(extract_num))
    grouped = df_sorted.groupby("ID", sort=False)

    with open(out_path, "w", encoding="utf-8") as f:
        for gid, gdf in grouped:
            record = {
                "ID": gid,
                "Aspect_VA": []
            }
            for _, row in gdf.iterrows():
                record["Aspect_VA"].append({
                    "Aspect": row["Aspect"],
                    "VA": f"{row['Valence']:.2f}#{row['Arousal']:.2f}"
                })
            f.write(json.dumps(record, ensure_ascii=False) + "\n")

pred_dataset = VADataset(predict_df, tokenizer)
pred_loader = DataLoader(pred_dataset, batch_size=64, shuffle=True)
pred_v, pred_a, = get_prd(model, pred_loader,type="pred")

predict_df["Valence"] = pred_v
predict_df["Arousal"] = pred_a

df_to_jsonl(predict_df, f"pred_{lang}_{domain}.jsonl")

### Download the submit files

In [10]:
import os
import shutil
import zipfile
from google.colab import files

# Create the folder subtask if it does not exist
os.makedirs(subtask, exist_ok=True)

# Move the three files into the subtask folder
for fname in [f"pred_{lang}_{domain}.jsonl"]:
    if os.path.exists(fname):
        shutil.move(fname, os.path.join(subtask, fname))

# Create a zip file named "submit.zip" containing the folder subtask
with zipfile.ZipFile(f"{subtask}.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    for root, _, files_in_dir in os.walk(subtask):
        for file in files_in_dir:
            path = os.path.join(root, file)
            # Keep folder structure inside the zip
            zf.write(path, os.path.relpath(path, "."))

# Download the created zip file to local machine
files.download(f"{subtask}.zip")

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

### Conclusion

In this notebook, we walked through the full pipeline for **Dimensional Aspect Sentiment Regression (DimASR)**:

1. **Load data**: Import the competition JSONL files, split train/dev sets, and convert to DataFrames.  
2. **Build dataset & dataloaders**: Define a custom `VADataset` to tokenize text and prepare `[Valence, Arousal]` labels.  
3. **Train & evaluate**: Train BERT-based regressors and check model performance on the dev sets using PCC and RMSE metrics.  
4. **Predict & submit**: Run the trained models on the prediction sets, generate VA scores, and save results as JSONL for submission.  

This pipeline ensures that your model is trained, validated, and ready for competition submission. You can further improve results by tuning hyperparameters, trying different pretrained models, or applying data augmentation strategies.
