**Installing Required Libraries and Import them**

In [1]:
%%capture
!pip install torch transformers datasets pandas scikit-learn sentencepiece

In [2]:
import torch
import numpy as np
import pandas as pd
from transformers import RobertaTokenizer, RobertaForSequenceClassification, DataCollatorWithPadding, AdamW, get_scheduler
from sklearn.preprocessing import LabelEncoder
from torch.utils.data import DataLoader, WeightedRandomSampler
from datasets import Dataset
from tqdm.auto import tqdm
from sklearn.metrics import classification_report, f1_score
from torch.cuda.amp import autocast, GradScaler

**Setting Global Random Seeds for Reproducibility**

To ensure reproducibility in our experiments, we set a fixed random seed for both PyTorch and NumPy. This helps in obtaining consistent results across multiple runs of the notebook.

In [3]:
SEED = 2025

# Set global seeds
torch.manual_seed(SEED)
np.random.seed(SEED)

## Data Loading  

We load all the available data (**train, validation, and test**).  
- The `train` and `validation` datasets are combined to form the **final training set** for model training.  
- The `test` dataset is kept separate and will be used solely for **model evaluation**.  


In [4]:
# Train data
train = pd.read_csv('incidents_train.csv', index_col=0)
valid = pd.read_csv('incidents_valid.csv', index_col=0)
# Test data
dev_df = pd.read_csv('incidents_test.csv', index_col=0)

# Concatenate both dataframes and remove duplicates
train_df = pd.concat([train, valid]).drop_duplicates().reset_index(drop=True)

data = pd.concat([train_df, dev_df]).drop_duplicates().reset_index(drop=True)

# Train and Evaluate Models

**Initializing Tokenizer**

For each label we use the **`RoBERTa-large`** model from Hugging Face.
As input we take column **text**.

The `tokenize_function`:
- Applies **tokenization with padding and truncation**, ensuring all sequences are of consistent length.  
- The **maximum sequence length is set to 256 tokens** to prevent excessively long inputs while retaining important information.  


In [5]:
model_name = 'roberta-large'
tokenizer = RobertaTokenizer.from_pretrained(model_name)

def tokenize_function(examples):
    return tokenizer(examples['text'], padding=True, truncation=True, max_length=256)

### Data Preparation with Optional Oversampling  

**`prepare_data`** processes the dataset for training and evaluation, handling **label encoding, tokenization, and optional oversampling**.

#### **Key Steps:**
1. **Label Encoding**: Converts categorical labels to numerical values using `LabelEncoder`.  
2. **Dataset Conversion**: Transforms DataFrames into **Hugging Face `Dataset` objects**.  
3. **Tokenization**: Applies **`tokenize_function`** with padding and truncation.  
4. **Data Formatting**: Converts datasets into **PyTorch tensors** (`input_ids`, `attention_mask`, `label`).  
5. **Oversampling (Optional)**:  
   - Adjusts **class weights** to balance data.  
   - Uses **WeightedRandomSampler** with smoothing factor **`alpha`** (lower = stronger oversampling).  
6. **DataLoader Creation**: Returns **PyTorch `DataLoader`** objects with or without oversampling.  

#### **Outputs:**
- `train_loader`: Training DataLoader.  
- `dev_loader`: Validation DataLoader.  
- `label_encoder`: Converts predictions back to original labels.  


In [6]:
def prepare_data(label, oversampling=True, alpha=0.5):
    # encode labels:
    label_encoder = LabelEncoder()
    label_encoder.fit(data[label])

    train_df['label'] = label_encoder.transform(train_df[label])
    dev_df['label'] = label_encoder.transform(dev_df[label])

    # Convert DataFrame to Hugging Face Dataset
    train_dataset = Dataset.from_pandas(train_df)
    dev_dataset = Dataset.from_pandas(dev_df)
    
    # Apply the tokenizer to the dataset
    train_dataset = train_dataset.map(tokenize_function, batched=True)
    dev_dataset = dev_dataset.map(tokenize_function, batched=True)
    
    # Create DataCollator to handle padding
    data_collator = DataCollatorWithPadding(tokenizer=tokenizer, padding=True)

    # Convert dataset to PyTorch format
    train_dataset.set_format(type='torch', columns=['input_ids', 'attention_mask', 'label'])
    dev_dataset.set_format(type='torch', columns=['input_ids', 'attention_mask', 'label'])

    # Apply oversampling to handle class imbalance if enabled
    if oversampling == True:
        # Compute class weights for oversampling
        labels = [example['label'] for example in train_dataset]  # Extract labels
        label_counts = np.bincount(labels)  # Count occurrences per class
    
        # Controlled smoothing function (Adjust alpha for fine-tuning)
        sample_weights = 1.0 / ((label_counts[train_df['label'].values] + 1e-6) ** alpha)
        
        # Normalize weights (prevents extreme differences)
        sample_weights = sample_weights / sample_weights.sum()
        
        # Define sampler with a smaller oversampling effect
        sampler = WeightedRandomSampler(sample_weights.tolist(), num_samples=int(len(train_dataset) * 1.2), replacement=True)

        # Create DataLoader objects
        return (
            DataLoader(train_dataset, batch_size=16, collate_fn=data_collator, sampler=sampler),
            DataLoader(dev_dataset, batch_size=16, collate_fn=data_collator),
            label_encoder
        )

    else:
        return (
            DataLoader(train_dataset, batch_size=16, collate_fn=data_collator),
            DataLoader(dev_dataset, batch_size=16, collate_fn=data_collator),
            label_encoder
         )

**Evaluation function:**

We compute the performance for ST1 and ST2 by calculating the macro-F1-score on the predicted labels (hazards_pred & products_pred) using the annotated labels (hazards_true & products_true) as ground truth, as provided by the challenge.

In [7]:
def compute_score(hazards_true, products_true, hazards_pred, products_pred):
  # compute f1 for hazards:
  f1_hazards = f1_score(
    hazards_true,
    hazards_pred,
    average='macro'
  )

  # compute f1 for products:
  f1_products = f1_score(
    products_true[hazards_pred == hazards_true],
    products_pred[hazards_pred == hazards_true],
    average='macro'
  )

  return (f1_hazards + f1_products) / 2.

## Sub-Task 1:

### Label:`Hazard Category`

In [8]:
label = 'hazard-category'

# Create DataLoader objects, DOESN'T USE OVERSAMPLING
train_dataloader, dev_dataloader, le_hazard_category = prepare_data(label, False)

Map:   0%|          | 0/5623 [00:00<?, ? examples/s]

Map:   0%|          | 0/997 [00:00<?, ? examples/s]

* choose model

In [9]:
model_hazard_category = RobertaForSequenceClassification.from_pretrained(model_name, num_labels=len(data[label].unique()))
model_hazard_category = model_hazard_category.to('cuda')

Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-large and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


**Train** with:
  - *Epochs* = 2
  - *Learning rate* = 2e-5
  - *Weight decay* = 0.01
  - *Warmup steps* = 10%
  - NO Oversampling

In [10]:
# Define optimizer and learning rate scheduler
optimizer = AdamW(model_hazard_category.parameters(), lr=2e-5, weight_decay=0.01)

num_epochs = 2
num_training_steps = num_epochs * len(train_dataloader)
num_warmup_steps = int(0.1 * num_training_steps)  # 10% warmup
lr_scheduler = get_scheduler(
    "linear",
    optimizer=optimizer,
    num_warmup_steps=num_warmup_steps,
    num_training_steps=num_training_steps,
)

model_hazard_category.train()
progress_bar = tqdm(range(num_training_steps))

log_steps = 50  # Log every 50 steps
total_loss = 0

for epoch in range(num_epochs):
    for step, batch in enumerate(train_dataloader):
        batch = {k: v.to('cuda') for k, v in batch.items()}
        
        outputs = model_hazard_category(**batch)
        loss = outputs.loss
        
        loss.backward()
        optimizer.step()  
        lr_scheduler.step()
        optimizer.zero_grad()
        
        total_loss += loss.item()
        
        if (step + 1) % log_steps == 0:
            print(f"Epoch {epoch+1}, Step {step+1}, Avg Loss: {total_loss / log_steps:.4f}")
            total_loss = 0  # Reset loss counter
        
        progress_bar.update(1)

    # Validation Loop
    model_hazard_category.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for batch in dev_dataloader:
            batch = {k: v.to('cuda') for k, v in batch.items()}  
            outputs = model_hazard_category(**batch)
            predictions = torch.argmax(outputs.logits, dim=-1)
            correct += (predictions == batch["labels"]).sum().item()
            total += batch["labels"].size(0)
    
    accuracy = correct / total
    print(f"Validation Accuracy after Epoch {epoch+1}: {accuracy:.4f}")
    model_hazard_category.train()  # Switch back to training mode



  0%|          | 0/704 [00:00<?, ?it/s]

Epoch 1, Step 50, Avg Loss: 1.8729
Epoch 1, Step 100, Avg Loss: 0.8662
Epoch 1, Step 150, Avg Loss: 0.3259
Epoch 1, Step 200, Avg Loss: 0.2912
Epoch 1, Step 250, Avg Loss: 0.2811
Epoch 1, Step 300, Avg Loss: 0.3690
Epoch 1, Step 350, Avg Loss: 0.2660
Validation Accuracy after Epoch 1: 0.9388
Epoch 2, Step 50, Avg Loss: 0.3498
Epoch 2, Step 100, Avg Loss: 0.2724
Epoch 2, Step 150, Avg Loss: 0.1549
Epoch 2, Step 200, Avg Loss: 0.1596
Epoch 2, Step 250, Avg Loss: 0.1682
Epoch 2, Step 300, Avg Loss: 0.1945
Epoch 2, Step 350, Avg Loss: 0.1708
Validation Accuracy after Epoch 2: 0.9488


* Evaluate the model

In [11]:
model_hazard_category.eval()
total_predictions = []
with torch.no_grad():
    for batch in dev_dataloader:
        batch = {k: v.to('cuda') for k, v in batch.items()}
        outputs = model_hazard_category(**batch)
        predictions = torch.argmax(outputs.logits, dim=-1)
        total_predictions.extend([p.item() for p in predictions])

predicted_labels = le_hazard_category.inverse_transform(total_predictions)
gold_labels = le_hazard_category.inverse_transform(dev_df.label.values)
print(classification_report(gold_labels, predicted_labels, zero_division=0))

dev_df['predictions-hazard-category'] = predicted_labels

                                precision    recall  f1-score   support

                     allergens       0.96      0.97      0.97       365
                    biological       0.98      0.97      0.98       343
                      chemical       0.91      0.94      0.92        52
food additives and flavourings       1.00      0.50      0.67         4
                foreign bodies       0.97      0.97      0.97       111
                         fraud       0.82      0.80      0.81        75
                     migration       0.00      0.00      0.00         1
          organoleptic aspects       0.91      1.00      0.95        10
                  other hazard       0.74      0.77      0.75        26
              packaging defect       0.75      0.90      0.82        10

                      accuracy                           0.95       997
                     macro avg       0.80      0.78      0.78       997
                  weighted avg       0.95      0.95      0.95 

### Label:`Product Category`

In [12]:
label = 'product-category'

# Create DataLoader objects
train_dataloader, dev_dataloader, le_product_category = prepare_data(label)

Map:   0%|          | 0/5623 [00:00<?, ? examples/s]

Map:   0%|          | 0/997 [00:00<?, ? examples/s]

* choose model

In [13]:
model_product_category = RobertaForSequenceClassification.from_pretrained(model_name, num_labels=len(data[label].unique()))
model_product_category = model_product_category.to('cuda')

Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-large and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


**Train** with:
  - *Epochs* = 3
  - *Learning rate* = 2e-5
  - *Weight decay* = 0.01
  - *Warmup steps* = 10%
  - *FP16 Mixed Precision*
  - *Oversampling* (alpha = 0.5)

In [None]:
# Define optimizer and learning rate scheduler
optimizer = AdamW(model_product_category.parameters(), lr=2e-5, weight_decay=0.01)

num_epochs = 3
num_training_steps = num_epochs * len(train_dataloader)
num_warmup_steps = int(0.1 * num_training_steps)  # 10% warmup
lr_scheduler = get_scheduler(
    "linear",
    optimizer=optimizer,
    num_warmup_steps=num_warmup_steps,
    num_training_steps=num_training_steps,
)

# Initialize Gradient Scaler for FP16
scaler = GradScaler()

model_product_category.train()
progress_bar = tqdm(range(num_training_steps))

log_steps = 50  # Log every 50 steps
total_loss = 0

for epoch in range(num_epochs):
    for step, batch in enumerate(train_dataloader):
        batch = {k: v.to('cuda', non_blocking=True) for k, v in batch.items()}
        
        # Enable FP16 with autocast
        with autocast():
            outputs = model_product_category(**batch)
            loss = outputs.loss

        # Scale loss and backpropagate
        scaler.scale(loss).backward()

        # Scale optimizer step
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()
        lr_scheduler.step()

        total_loss += loss.item()
        
        if (step + 1) % log_steps == 0:
            print(f"Epoch {epoch+1}, Step {step+1}, Avg Loss: {total_loss / log_steps:.4f}")
            total_loss = 0  # Reset loss counter
        
        progress_bar.update(1)

    #  Validation Loop
    model_product_category.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for batch in dev_dataloader:
            batch = {k: v.to('cuda', non_blocking=True) for k, v in batch.items()}  
            outputs = model_product_category(**batch)
            predictions = torch.argmax(outputs.logits, dim=-1)
            correct += (predictions == batch["labels"]).sum().item()
            total += batch["labels"].size(0)
    
    accuracy = correct / total
    print(f"Validation Accuracy after Epoch {epoch+1}: {accuracy:.4f}")
    model_product_category.train()  # Switch back to training mode


  scaler = GradScaler()


  0%|          | 0/1266 [00:00<?, ?it/s]

  with autocast():


Epoch 1, Step 50, Avg Loss: 3.0061
Epoch 1, Step 100, Avg Loss: 2.8339
Epoch 1, Step 150, Avg Loss: 2.0823
Epoch 1, Step 200, Avg Loss: 1.4000
Epoch 1, Step 250, Avg Loss: 1.1606
Epoch 1, Step 300, Avg Loss: 0.9090
Epoch 1, Step 350, Avg Loss: 0.8594
Epoch 1, Step 400, Avg Loss: 0.8640
Validation Accuracy after Epoch 1: 0.7673
Epoch 2, Step 50, Avg Loss: 0.9079
Epoch 2, Step 100, Avg Loss: 0.6319
Epoch 2, Step 150, Avg Loss: 0.6325
Epoch 2, Step 200, Avg Loss: 0.5543
Epoch 2, Step 250, Avg Loss: 0.5031
Epoch 2, Step 300, Avg Loss: 0.4553
Epoch 2, Step 350, Avg Loss: 0.4256
Epoch 2, Step 400, Avg Loss: 0.3719
Validation Accuracy after Epoch 2: 0.8054
Epoch 3, Step 50, Avg Loss: 0.5754
Epoch 3, Step 100, Avg Loss: 0.3459
Epoch 3, Step 150, Avg Loss: 0.3093
Epoch 3, Step 200, Avg Loss: 0.3108
Epoch 3, Step 250, Avg Loss: 0.3486
Epoch 3, Step 300, Avg Loss: 0.2888
Epoch 3, Step 350, Avg Loss: 0.2923
Epoch 3, Step 400, Avg Loss: 0.2878
Validation Accuracy after Epoch 3: 0.8104


* Evaluate the model

In [15]:
model_product_category.eval()
total_predictions = []
with torch.no_grad():
    for batch in dev_dataloader:
        batch = {k: v.to('cuda') for k, v in batch.items()}  # Move batch to GPU if available
        outputs = model_product_category(**batch)
        predictions = torch.argmax(outputs.logits, dim=-1)
        total_predictions.extend([p.item() for p in predictions])

predicted_labels = le_product_category.inverse_transform(total_predictions)
gold_labels = le_product_category.inverse_transform(dev_df.label.values)
print(classification_report(gold_labels, predicted_labels, zero_division=0))

dev_df['predictions-product-category'] = predicted_labels

                                                   precision    recall  f1-score   support

                              alcoholic beverages       0.94      0.94      0.94        16
                      cereals and bakery products       0.79      0.79      0.79       121
     cocoa and cocoa preparations, coffee and tea       0.75      0.90      0.82        42
                                    confectionery       0.70      0.70      0.70        33
dietetic foods, food supplements, fortified foods       0.57      0.77      0.66        26
                                    fats and oils       1.00      0.83      0.91         6
                   food additives and flavourings       1.00      0.25      0.40         4
                            fruits and vegetables       0.86      0.76      0.80       103
                                 herbs and spices       0.61      0.70      0.65        20
                            honey and royal jelly       0.50      0.50      0.50         

### Evaluate Sub-Task 1

In [16]:
score = compute_score(
    dev_df['hazard-category'], dev_df['product-category'],
    dev_df['predictions-hazard-category'], dev_df['predictions-product-category']
)
print(f"Score Sub-Task 1: {score:.3f}")

Score Sub-Task 1: 0.779


## Sub-Task 2

### Label: `Hazard`

In [17]:
label = 'hazard'

# Create DataLoader objects
train_dataloader, dev_dataloader, le_hazard = prepare_data(label)

Map:   0%|          | 0/5623 [00:00<?, ? examples/s]

Map:   0%|          | 0/997 [00:00<?, ? examples/s]

* choose model

In [18]:
model_hazard = RobertaForSequenceClassification.from_pretrained(model_name, num_labels=len(data[label].unique()))
model_hazard = model_hazard.to('cuda')

Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-large and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


**Train** with:
  - *Epochs* = 5
  - *Learning rate* = 2e-5
  - *Weight decay* = 0.001
  - *Warmup steps* = 10%
  - *FP16 Mixed Precision*
  - *Oversampling* (alpha = 0.5)

In [19]:
# Define optimizer and learning rate scheduler
optimizer = AdamW(model_hazard.parameters(), lr=2e-5, weight_decay=0.001)

num_epochs = 5
num_training_steps = num_epochs * len(train_dataloader)
num_warmup_steps = int(0.1 * num_training_steps)  # 10% warmup
lr_scheduler = get_scheduler(
    "linear",
    optimizer=optimizer,
    num_warmup_steps=num_warmup_steps,
    num_training_steps=num_training_steps,
)

# Initialize Gradient Scaler for FP16
scaler = GradScaler()

model_hazard.train()
progress_bar = tqdm(range(num_training_steps))

log_steps = 50  # Log every 50 steps
total_loss = 0

for epoch in range(num_epochs):
    for step, batch in enumerate(train_dataloader):
        batch = {k: v.to('cuda', non_blocking=True) for k, v in batch.items()}
        
        # Enable FP16 with autocast
        with autocast():
            outputs = model_hazard(**batch)
            loss = outputs.loss  

        # Scale loss and backpropagate
        scaler.scale(loss).backward()

        #  Scale optimizer step
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()
        lr_scheduler.step()

        total_loss += loss.item()
        
        if (step + 1) % log_steps == 0:
            print(f"Epoch {epoch+1}, Step {step+1}, Avg Loss: {total_loss / log_steps:.4f}")
            total_loss = 0  # Reset loss counter
        
        progress_bar.update(1)

    # Validation Loop
    model_hazard.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for batch in dev_dataloader:
            batch = {k: v.to('cuda', non_blocking=True) for k, v in batch.items()}  
            outputs = model_hazard(**batch)
            predictions = torch.argmax(outputs.logits, dim=-1)
            correct += (predictions == batch["labels"]).sum().item()
            total += batch["labels"].size(0)
    
    accuracy = correct / total
    print(f"Validation Accuracy after Epoch {epoch+1}: {accuracy:.4f}")
    model_hazard.train()  # Switch back to training mode

  scaler = GradScaler()


  0%|          | 0/2110 [00:00<?, ?it/s]

  with autocast():


Epoch 1, Step 50, Avg Loss: 4.8244
Epoch 1, Step 100, Avg Loss: 4.6256
Epoch 1, Step 150, Avg Loss: 4.5908
Epoch 1, Step 200, Avg Loss: 3.8827
Epoch 1, Step 250, Avg Loss: 3.0045
Epoch 1, Step 300, Avg Loss: 2.3308
Epoch 1, Step 350, Avg Loss: 2.0527
Epoch 1, Step 400, Avg Loss: 1.9020
Validation Accuracy after Epoch 1: 0.8305
Epoch 2, Step 50, Avg Loss: 2.1894
Epoch 2, Step 100, Avg Loss: 1.2733
Epoch 2, Step 150, Avg Loss: 1.2524
Epoch 2, Step 200, Avg Loss: 1.1719
Epoch 2, Step 250, Avg Loss: 1.0366
Epoch 2, Step 300, Avg Loss: 0.7948
Epoch 2, Step 350, Avg Loss: 0.8457
Epoch 2, Step 400, Avg Loss: 0.8207
Validation Accuracy after Epoch 2: 0.8445
Epoch 3, Step 50, Avg Loss: 1.1041
Epoch 3, Step 100, Avg Loss: 0.6683
Epoch 3, Step 150, Avg Loss: 0.6452
Epoch 3, Step 200, Avg Loss: 0.6049
Epoch 3, Step 250, Avg Loss: 0.5795
Epoch 3, Step 300, Avg Loss: 0.5328
Epoch 3, Step 350, Avg Loss: 0.4992
Epoch 3, Step 400, Avg Loss: 0.4686
Validation Accuracy after Epoch 3: 0.8425
Epoch 4, Step

* Evaluate the model

In [20]:
model_hazard.eval()
total_predictions = []
with torch.no_grad():
    for batch in dev_dataloader:
        batch = {k: v.to('cuda') for k, v in batch.items()}  # Move batch to GPU if available
        outputs = model_hazard(**batch)
        predictions = torch.argmax(outputs.logits, dim=-1)
        total_predictions.extend([p.item() for p in predictions])

predicted_labels = le_hazard.inverse_transform(total_predictions)
gold_labels = le_hazard.inverse_transform(dev_df.label.values)
print(classification_report(gold_labels, predicted_labels, zero_division=0))

dev_df['predictions-hazard'] = predicted_labels

                                                 precision    recall  f1-score   support

                                      Aflatoxin       1.00      0.50      0.67         2
                                 abnormal smell       1.00      1.00      1.00         1
                                alcohol content       0.00      0.00      0.00         1
                                      alkaloids       1.00      1.00      1.00         1
                                      allergens       0.00      0.00      0.00         3
                                         almond       1.00      0.85      0.92        13
           altered organoleptic characteristics       0.00      0.00      0.00         0
                                      amygdalin       0.00      0.00      0.00         1
                         antibiotics, vet drugs       1.00      1.00      1.00         1
                                  bacillus spp.       1.00      1.00      1.00         3
                    

### Label: `Product`

In [21]:
label = 'product'

# Create DataLoader objects
train_dataloader, dev_dataloader, le_product = prepare_data(label, True, 0.3)

Map:   0%|          | 0/5623 [00:00<?, ? examples/s]

Map:   0%|          | 0/997 [00:00<?, ? examples/s]

* choose model

In [22]:
model_product = RobertaForSequenceClassification.from_pretrained(model_name, num_labels=len(data[label].unique()))
model_product = model_product.to('cuda')

Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-large and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


**Train** with:
  - *Epochs* = 9
  - *Learning rate* = 2e-5
  - *Weight decay* = 0.01
  - *Warmup steps* = 10%
  - *FP16 Mixed Precision*
  - *Oversampling* (alpha = 0.3)

In [32]:
# Define optimizer and learning rate scheduler
optimizer = AdamW(model_product.parameters(), lr=2e-5, weight_decay=0.01)

num_epochs = 9
num_training_steps = num_epochs * len(train_dataloader)
num_warmup_steps = int(0.1 * num_training_steps)  # 10% warmup
lr_scheduler = get_scheduler(
    "linear",
    optimizer=optimizer,
    num_warmup_steps=num_warmup_steps,
    num_training_steps=num_training_steps,
)

# Initialize Gradient Scaler for FP16
scaler = GradScaler()

model_product.train()
progress_bar = tqdm(range(num_training_steps))

log_steps = 50  # Log every 50 steps
total_loss = 0

for epoch in range(num_epochs):
    for step, batch in enumerate(train_dataloader):
        batch = {k: v.to('cuda', non_blocking=True) for k, v in batch.items()}
        
        # Enable FP16 with autocast
        with autocast():
            outputs = model_product(**batch)
            loss = outputs.loss  # No accumulation

        # Scale loss and backpropagate
        scaler.scale(loss).backward()

        # Scale optimizer step
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()
        lr_scheduler.step()

        total_loss += loss.item()

        if (step + 1) % log_steps == 0:
            print(f"Epoch {epoch+1}, Step {step+1}, Avg Loss: {total_loss / log_steps:.4f}")
            total_loss = 0  # Reset loss counter

        progress_bar.update(1)

    # Validation Loop
    model_product.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for batch in dev_dataloader:
            batch = {k: v.to('cuda', non_blocking=True) for k, v in batch.items()}
            outputs = model_product(**batch)
            predictions = torch.argmax(outputs.logits, dim=-1)
            correct += (predictions == batch["labels"]).sum().item()
            total += batch["labels"].size(0)

    accuracy = correct / total
    print(f"Validation Accuracy after Epoch {epoch+1}: {accuracy:.4f}")
    model_product.train()  # Switch back to training mode


  scaler = GradScaler()


  0%|          | 0/3798 [00:00<?, ?it/s]

  with autocast():


Epoch 1, Step 50, Avg Loss: 4.2157
Epoch 1, Step 100, Avg Loss: 4.0655
Epoch 1, Step 150, Avg Loss: 4.0751
Epoch 1, Step 200, Avg Loss: 4.1036
Epoch 1, Step 250, Avg Loss: 4.2108
Epoch 1, Step 300, Avg Loss: 4.0501
Epoch 1, Step 350, Avg Loss: 3.9858
Epoch 1, Step 400, Avg Loss: 3.9612
Validation Accuracy after Epoch 1: 0.4443
Epoch 2, Step 50, Avg Loss: 5.7416
Epoch 2, Step 100, Avg Loss: 3.7920
Epoch 2, Step 150, Avg Loss: 3.7401
Epoch 2, Step 200, Avg Loss: 3.6968
Epoch 2, Step 250, Avg Loss: 3.6686
Epoch 2, Step 300, Avg Loss: 3.6377
Epoch 2, Step 350, Avg Loss: 3.3159
Epoch 2, Step 400, Avg Loss: 3.3660
Validation Accuracy after Epoch 2: 0.4674
Epoch 3, Step 50, Avg Loss: 4.6437
Epoch 3, Step 100, Avg Loss: 3.1227
Epoch 3, Step 150, Avg Loss: 3.1785
Epoch 3, Step 200, Avg Loss: 3.0950
Epoch 3, Step 250, Avg Loss: 3.0594
Epoch 3, Step 300, Avg Loss: 2.9642
Epoch 3, Step 350, Avg Loss: 2.8260
Epoch 3, Step 400, Avg Loss: 2.6940
Validation Accuracy after Epoch 3: 0.4885
Epoch 4, Step

* Evaluate the model

In [33]:
model_product.eval()
total_predictions = []
with torch.no_grad():
    for batch in dev_dataloader:
        batch = {k: v.to('cuda') for k, v in batch.items()}
        outputs = model_product(**batch)
        predictions = torch.argmax(outputs.logits, dim=-1)
        total_predictions.extend([p.item() for p in predictions])

predicted_labels = le_product.inverse_transform(total_predictions)
gold_labels = le_product.inverse_transform(dev_df.label.values)
print(classification_report(gold_labels, predicted_labels, zero_division=0))

dev_df['predictions-product'] = predicted_labels

                                                               precision    recall  f1-score   support

                                       Catfishes (freshwater)       0.50      0.33      0.40         3
                                              Dried pork meat       0.00      0.00      0.00         1
                                        Fishes not identified       0.64      0.88      0.74         8
                                     Not classified pork meat       0.50      0.33      0.40         3
                          Precooked cooked pork meat products       0.00      0.00      0.00         1
                                            Saurida (generic)       0.00      0.00      0.00         1
                                                Veggie Burger       0.25      1.00      0.40         1
                                                        algae       0.50      1.00      0.67         1
                                               almond kernels       0.00

### Evaluate Sub-Task 2

In [34]:
score = compute_score(
    dev_df['hazard'], dev_df['product'],
    dev_df['predictions-hazard'], dev_df['predictions-product']
)
print(f"Score Sub-Task 2: {score:.3f}")

Score Sub-Task 2: 0.499


## Create submission file

In [35]:
dev_df.to_csv("submission.csv", index=True)

# Summary

- Used all availiable datasets (train and valid)
- Used **roberta-large** for all 4 labels
- Tokenized with padding, truncation and max_length 256
- Used *oversampling* technique for all labels, except for the hazard-category, because the classes were well distributed
- Fined-tuned all models by altering:
  * Epochs
  * Learning Rate
  * Weight Decay
  * FP16 Precision
  * Warmup Steps
  * Oversampling effect (alpha variable in my code)
- Calculated evaluation scores using the test data
- Created submission.csv with all model outputs for the test data

ST1 score: **``0.779``**

ST2 score: **``0.499``**