# Transformers: Fine-tuning for multi-class classification

## Report


### Experimental Setup (No Preprocessing Applied)

Experiments were conducted using the following Transformer models:
- **English**: RoBERTa-base
- **Spanish**: `PlanTL-GOB-ES/roberta-base-bne`

All models were trained using the same base set of hyperparameters. Learning rate and batch size were varied over the listed values; a learning rate of 2e-5 was also tried in some cases:

```json
{
  "learning_rate": [5e-5, 3e-5],
  "num_train_epochs": 4,
  "per_device_train_batch_size": [16, 32],
  "warmup_steps": 100,
  "weight_decay": 0.01,
  "early_stopping_patience": 2
}
```

Each of the 4 learning-rate and batch-size combinations was trained separately, and for each run the best-performing model based on validation F1 score was kept.
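As a minimal sketch, the grid of runs described above can be enumerated as follows. The names `base_params` and `grid` are illustrative and do not appear in the notebook's pipelines.

```python
from itertools import product

# Values taken from the settings listed above
learning_rates = [5e-5, 3e-5]
batch_sizes = [16, 32]

# Settings shared by every run
base_params = {
    "num_train_epochs": 4,
    "warmup_steps": 100,
    "weight_decay": 0.01,
    "early_stopping_patience": 2,
}

# One parameter dict per (learning rate, batch size) combination
grid = [
    {**base_params, "learning_rate": lr, "per_device_train_batch_size": bs}
    for lr, bs in product(learning_rates, batch_sizes)
]

for params in grid:
    print(params["learning_rate"], params["per_device_train_batch_size"])
```

Each dict can then be passed as keyword arguments to one training run, which is the pattern the experiment cells further down follow with nested loops.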

---

### **Results Overview**

#### **English – Subtask 1 (Binary Classification)**

- **Fine-Tuning (FT)** achieved an average F1 of **0.84**
- **LoRA** achieved an average F1 of **0.79**
- However, LoRA completed training in **12 minutes** vs **30 minutes** for FT  
- LoRA therefore reached about **94% of the FT F1 score in only 40% of the training time**
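The two percentages follow directly from the averages reported in this list, as this short calculation shows (variable names are illustrative):

```python
# Average F1 and training time reported above for English, Subtask 1
f1_ft, f1_lora = 0.84, 0.79
minutes_ft, minutes_lora = 30, 12

# Ratios of LoRA to full fine-tuning
relative_f1 = f1_lora / f1_ft
relative_time = minutes_lora / minutes_ft

print(f"LoRA F1 relative to FT: {relative_f1:.0%}")
print(f"LoRA time relative to FT: {relative_time:.0%}")
```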

#### **English – Subtask 2 (Multi-Class Classification)**

- **FT** slightly outperformed LoRA (avg F1: **0.250** vs **0.245**)  
- Training times: FT took **30 minutes**, while LoRA took just **4 minutes**
- FT performance fluctuated between **0.245–0.265**, while LoRA remained stable
- Accuracy was similar for both methods (~**0.58**)

#### **Spanish – Subtask 1 (Binary Classification)**

- **Best LoRA model** outperformed the best FT model (F1: **0.86** vs **0.85**)
- Average F1 scores were nearly identical:  
  - LoRA: **0.8535**  
  - FT: **0.8527**
- However, **worst LoRA model** (F1: **0.839**) underperformed compared to **worst FT** (F1: **0.859**)  
- Accuracy for FT was consistent (~**0.84**), while LoRA varied from **0.83–0.85**

#### **Spanish – Subtask 2 (Multi-Class Classification)**

- **FT had a higher average F1** (0.29 vs 0.25), largely due to one exceptional model (F1: **0.438**)  
- Excluding that outlier, performance was comparable between FT and LoRA
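The effect of the outlier can be checked from the reported numbers. This sketch assumes the FT average was taken over 4 models (one per grid combination) and works with the rounded values from the report, so the result is approximate.

```python
# Reported for Spanish, Subtask 2 (values are rounded in the report)
n_models = 4          # assumption: one model per grid combination
avg_ft = 0.29
outlier_f1 = 0.438
avg_lora = 0.25

# Mean of the remaining FT models once the outlier is removed
avg_ft_without_outlier = (avg_ft * n_models - outlier_f1) / (n_models - 1)

print(f"FT average without the outlier: {avg_ft_without_outlier:.3f}")
print(f"LoRA average: {avg_lora:.3f}")
```

Under that assumption the remaining FT models average roughly 0.24, which is in line with the LoRA average.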

---

### **Conclusion**

While full fine-tuning occasionally yields slightly higher F1 scores, **LoRA offers a far more efficient training process**, both in time and resource consumption, which makes it significantly more practical, especially at scale. The **performance trade-off is small**, while the **time savings are substantial**. In nearly all cases, LoRA models delivered results comparable to FT in a fraction of the time.

## Install and import libraries

In [None]:
!pip install transformers --upgrade
!pip install datasets accelerate --upgrade
!pip install peft --upgrade
!pip install jupyter --upgrade
!pip install ipywidgets --upgrade

In [None]:
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
    EarlyStoppingCallback,
)
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from peft import LoraConfig, get_peft_model, TaskType
import random
import os
import pandas as pd
import sys
import tempfile
import time

In [None]:
# IF YOU USE GOOGLE COLAB -> COLAB=True
COLAB = True

In [None]:
# Only mount Google Drive when running in Colab
if COLAB:
    from google.colab import drive
    drive.mount('/content/drive')

In [None]:
# if COLAB is True:
#   from google.colab import drive
#   drive.mount('/content/drive')
#   base_path = "/content/drive/MyDrive/docencia/LNR/LNR_2024-2025/Lab2"
# else:
#   base_path = ".."
# base_path

# from google.colab import userdata
# userdata.get('HuggingFaceToken')

if COLAB is True:
  from google.colab import drive
  drive.mount('/content/drive')
  base_path = "/content/drive/MyDrive/NLP"
else:
  base_path = ".."
base_path

## Import readerEXIST2025 library

In [None]:
import sys

# Reuse the base_path set above so the import works both in Colab and locally
sys.path.append(base_path)
from readerEXIST2025 import EXISTReader

## Read dataset

In [None]:
file_train = os.path.join(base_path, "EXIST2025_training.json")
file_dev = os.path.join(base_path, "EXIST2025_dev.json")

reader_train = EXISTReader(file_train)
reader_dev = EXISTReader(file_dev)

EnTrainTask1, EnDevTask1 = reader_train.get(lang="EN", subtask="1"), reader_dev.get(lang="EN", subtask="1")
EnTrainTask2, EnDevTask2 = reader_train.get(lang="EN", subtask="2"), reader_dev.get(lang="EN", subtask="2")

SpTrainTask1, SpDevTask1 = reader_train.get(lang="ES", subtask="1"), reader_dev.get(lang="ES", subtask="1")
SpTrainTask2, SpDevTask2 = reader_train.get(lang="ES", subtask="2"), reader_dev.get(lang="ES", subtask="2")

## Set the seed

In [None]:
def set_seed(seed=1234):
    """
    Sets the seed to make everything deterministic, for reproducibility of experiments
    Parameters:
    seed: the number to set the seed to
    Return: None
    """
    # Random seed
    random.seed(seed)
    # Numpy seed
    np.random.seed(seed)
    # Torch seed
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    # Benchmark mode lets cuDNN choose algorithms at run time, which can break determinism
    torch.backends.cudnn.benchmark = False
    # os seed
    os.environ['PYTHONHASHSEED'] = str(seed)
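The effect of seeding can be illustrated with the standard library alone. The same principle applies to the NumPy and PyTorch generators seeded in `set_seed`; the helper `draw` is made up for this illustration.

```python
import random

def draw(seed, n=3):
    # Re-seeding restarts the generator from the same internal state
    random.seed(seed)
    return [random.random() for _ in range(n)]

first = draw(1234)
second = draw(1234)
other = draw(4321)

print(first == second)  # same seed, same sequence
print(first == other)   # different seed, different sequence
```

Note that seeding once at the top of a cell fixes the sequence for the whole cell, so individual runs inside a loop still see different random states from one another.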

## Dataset class

In [None]:
class SexismDataset(Dataset):
    def __init__(self, texts, labels, ids, tokenizer, max_len=128, pad="max_length", trunc=True, rt='pt'):
        # list() accepts pandas Series, NumPy arrays and plain Python lists alike
        self.texts = list(texts)
        self.labels = labels
        self.ids = ids
        self.tokenizer = tokenizer
        self.max_len = max_len
        self.pad = pad
        self.trunc = trunc
        self.rt = rt

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        text = str(self.texts[idx])
        # Calling the tokenizer directly replaces the older encode_plus method
        inputs = self.tokenizer(
            text,
            add_special_tokens=True,
            max_length=self.max_len,
            padding=self.pad,
            truncation=self.trunc,
            return_tensors=self.rt
        )

        return {
            'input_ids': inputs['input_ids'].flatten(),
            'attention_mask': inputs['attention_mask'].flatten(),
            'labels': torch.tensor(self.labels[idx], dtype=torch.long),
            'id': torch.tensor(self.ids[idx], dtype=torch.long)
        }
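To make the `max_length`, `padding` and `truncation` arguments more concrete, here is a simplified pure-Python sketch of what fixed-length encoding produces. The helper `pad_or_truncate`, the token ids and the `pad_id` value are placeholders; a real tokenizer takes the padding id from its vocabulary and also preserves the closing special token when truncating.

```python
def pad_or_truncate(token_ids, max_len=128, pad_id=1):
    """Return (input_ids, attention_mask), both exactly max_len long."""
    # Truncation: keep only the first max_len tokens
    ids = list(token_ids)[:max_len]
    # Attention mask: 1 for real tokens, 0 for padding
    mask = [1] * len(ids)
    # Padding: fill the remainder with the padding id
    n_pad = max_len - len(ids)
    return ids + [pad_id] * n_pad, mask + [0] * n_pad

# A short sequence is padded, a long one is cut (ids are made up)
short_ids, short_mask = pad_or_truncate([0, 713, 16, 2], max_len=8)
long_ids, long_mask = pad_or_truncate(range(20), max_len=8)

print(short_ids, short_mask)
print(long_ids, long_mask)
```

Because every item has the same length, the default collator can stack the tensors into a batch without any extra padding logic.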

## Metrics

In [None]:
def compute_metrics_1(pred):
    labels = pred.label_ids
    preds = pred.predictions.argmax(-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average='binary', zero_division=0
    )
    acc = accuracy_score(labels, preds)
    return {
        'accuracy': acc,
        'f1': f1,
        'precision': precision,
        'recall': recall
    }

def compute_metrics_2(pred):
    labels = pred.label_ids
    preds = pred.predictions.argmax(-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average='macro', zero_division=0
    )
    acc = accuracy_score(labels, preds)
    return {
        'accuracy': acc,
        'f1': f1,
        'precision': precision,
        'recall': recall
    }
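The two functions differ only in the averaging mode: `binary` scores the positive class, while `macro` computes F1 per class and averages the classes with equal weight. The pure-Python sketch below, on made-up labels, shows how macro F1 can sit far below accuracy when classes are imbalanced. It is an illustration of the metric only and makes no claim about what the models in this notebook actually predicted.

```python
def f1_for_class(y_true, y_pred, cls):
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    # With no true positives F1 is 0 (mirrors zero_division=0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def macro_f1(y_true, y_pred):
    # Every class counts equally, however rare it is
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)

# Made-up, imbalanced labels: class 0 dominates
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
# A degenerate model that always predicts the majority class
y_pred = [0] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)
print(f"accuracy: {accuracy:.2f}, macro F1: {score:.2f}")
```

Here accuracy is 0.60 while macro F1 is 0.25, because two of the three classes contribute an F1 of zero.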

## Pipelines

In [None]:
def sexism_classification_pipeline_task1(trainInfo, devInfo, testInfo=None, model_name='roberta-base', nlabels=2, ptype="single_label_classification", **args):
    # Model and Tokenizer
    labelEnc= LabelEncoder()
    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name,
        num_labels=nlabels,
        problem_type=ptype,
    )

    # Prepare datasets
    train_dataset = SexismDataset(trainInfo[1], labelEnc.fit_transform(trainInfo[2]),[int(x) for x in trainInfo[0]], tokenizer )
    val_dataset = SexismDataset(devInfo[1], labelEnc.transform(devInfo[2]), [int(x) for x in devInfo[0]], tokenizer)

    # Training Arguments
    training_args = TrainingArguments(
        report_to="none", # alt: "wandb", "tensorboard" "comet_ml" "mlflow" "clearml"
        output_dir= args.get('output_dir', './results'),
        num_train_epochs= args.get('num_train_epochs', 5),
        learning_rate=args.get('learning_rate', 5e-5),
        per_device_train_batch_size=args.get('per_device_train_batch_size', 16),
        per_device_eval_batch_size=args.get('per_device_eval_batch_size', 64),
        warmup_steps=args.get('warmup_steps', 500),
        weight_decay=args.get('weight_decay',0.01),
        logging_dir=args.get('logging_dir', './logs'),
        logging_steps=args.get('logging_steps', 10),
        eval_strategy=args.get('eval_strategy','epoch'),
        save_strategy=args.get('save_strategy', "epoch"),
        load_best_model_at_end=args.get('load_best_model_at_end', True),
        metric_for_best_model=args.get('metric_for_best_model',"f1")
    )

    # Initialize Trainer
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=val_dataset,
        compute_metrics=compute_metrics_1,
        callbacks=[EarlyStoppingCallback(early_stopping_patience=args.get("early_stopping_patience",3))]
    )

    # Fine-tune the model
    trainer.train()

    # Evaluate on validation set
    eval_results = trainer.evaluate()
    print("Validation Results:", eval_results)

    # If there is a test dataset
    if testInfo is not None:
        # Prepare test dataset for prediction
        test_dataset = SexismDataset(testInfo[1], [0] * len(testInfo[1]),  [int(x) for x in testInfo[0]],   tokenizer)

        # Predict test set labels
        predictions = trainer.predict(test_dataset)
        predicted_labels = np.argmax(predictions.predictions, axis=1)

        # Create submission DataFrame
        submission_df = pd.DataFrame({
            'id': testInfo[0],
            'label': labelEnc.inverse_transform(predicted_labels),
            "test_case": ["EXIST2025"]*len(predicted_labels)
        })
        submission_df.to_csv('sexism_predictions_task1.csv', index=False)
        print("Prediction for TASK 1 completed. Results saved to sexism_predictions_task1.csv")
        return model, submission_df
    return model, eval_results
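The `early_stopping_patience` value passed to `EarlyStoppingCallback` can be pictured with a simplified simulation. This is a sketch of the idea only: the real callback also supports an improvement threshold and reads the metric named in `metric_for_best_model`. The function `epochs_run` and the F1 values are made up for illustration.

```python
def epochs_run(f1_per_epoch, patience):
    """Return (last epoch trained, best F1 seen) under a simple patience rule."""
    best = float("-inf")
    epochs_without_improvement = 0
    for epoch, f1 in enumerate(f1_per_epoch, start=1):
        if f1 > best:
            # Improvement: remember it and reset the counter
            best = f1
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                # Stop: no improvement for `patience` evaluations in a row
                return epoch, best
    return len(f1_per_epoch), best

# Hypothetical validation F1 per epoch for a 4-epoch run
stopped_early = epochs_run([0.83, 0.82, 0.81, 0.85], patience=2)
ran_to_end = epochs_run([0.80, 0.83, 0.82, 0.84], patience=2)

print(stopped_early)
print(ran_to_end)
```

In the first case training stops after epoch 3 and never reaches the better fourth epoch, which is the trade-off a small patience makes when the budget is only 4 epochs.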




### Task 2

In [None]:
def sexism_classification_pipeline_task2(trainInfo, devInfo, testInfo=None, model_name='bert-base-uncased', nlabels=3, ptype="single_label_classification", **args):
    # Model and Tokenizer
    labelEnc= LabelEncoder()
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name,
        num_labels=nlabels,
        problem_type=ptype
    )

    # Prepare datasets
    train_dataset = SexismDataset(trainInfo[1], labelEnc.fit_transform(trainInfo[2]),[int(x) for x in trainInfo[0]], tokenizer )
    # val_dataset = SexismDataset(devInfo[1], labelEnc.transform(devInfo[2]), [int(x) for x in devInfo[0]], tokenizer)

    # Encode validation labels safely
    val_labels_raw = devInfo[2]
    val_labels_safe = []
    val_texts_safe = []
    val_ids_safe = []

    for text, label, id_ in zip(devInfo[1], val_labels_raw, devInfo[0]):
        if label in labelEnc.classes_:
            val_labels_safe.append(labelEnc.transform([label])[0])
            val_texts_safe.append(text)
            val_ids_safe.append(int(id_))
        else:
            print(f"[Warning] Unknown label in dev set: {label} — skipping")

    val_dataset = SexismDataset(val_texts_safe, val_labels_safe, val_ids_safe, tokenizer)

    # Training Arguments
    training_args = TrainingArguments(
        report_to="none", # alt: "wandb", "tensorboard" "comet_ml" "mlflow" "clearml"
        output_dir= args.get('output_dir', './results'),
        num_train_epochs= args.get('num_train_epochs', 5),
        learning_rate=args.get('learning_rate', 5e-5),
        per_device_train_batch_size=args.get('per_device_train_batch_size', 16),
        per_device_eval_batch_size=args.get('per_device_eval_batch_size', 64),
        warmup_steps=args.get('warmup_steps', 500),
        weight_decay=args.get('weight_decay',0.01),
        logging_dir=args.get('logging_dir', './logs'),
        logging_steps=args.get('logging_steps', 10),
        eval_strategy=args.get('eval_strategy','epoch'),
        save_strategy=args.get('save_strategy', "epoch"),
        save_total_limit=args.get('save_total_limit', 1),
        load_best_model_at_end=args.get('load_best_model_at_end', True),
        #metric_for_best_model=args.get('metric_for_best_model',"ICM")
        metric_for_best_model=args.get('metric_for_best_model',"f1")
    )

    # Initialize Trainer
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=val_dataset,
        compute_metrics=compute_metrics_2,
        callbacks=[EarlyStoppingCallback(early_stopping_patience=args.get("early_stopping_patience",3))]
    )

    # Fine-tune the model
    trainer.train()

    # Evaluate on validation set
    eval_results = trainer.evaluate()
    print("Validation Results:", eval_results)

    # If there is a test dataset
    if testInfo is not None:
        # Prepare test dataset for prediction
        test_dataset = SexismDataset(testInfo[1], [0] * len(testInfo[1]),  [int(x) for x in testInfo[0]],   tokenizer)

        # Predict test set labels
        predictions = trainer.predict(test_dataset)
        predicted_labels = np.argmax(predictions.predictions, axis=1)

        # Create submission DataFrame
        submission_df = pd.DataFrame({
            'id': testInfo[0],
            'label': labelEnc.inverse_transform(predicted_labels),
            "test_case": ["EXIST2025"]*len(predicted_labels)

        })
        submission_df.to_csv('sexism_predictions_task2.csv', index=False)
        print("Prediction TASK2 completed. Results saved to sexism_predictions_task2.csv")
        return model, submission_df
    return model, eval_results
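Both pipelines pass `warmup_steps` to `TrainingArguments`, and the `Trainer` uses a linear schedule by default. The sketch below shows the shape of that schedule: a linear ramp up during warmup followed by a linear decay to zero. The function name `linear_schedule_lr` and the total of 800 steps are assumptions chosen for illustration; the real total depends on dataset size, batch size and number of epochs.

```python
def linear_schedule_lr(step, base_lr=5e-5, warmup_steps=100, total_steps=800):
    """Learning rate at a given optimizer step for linear warmup + linear decay."""
    if step < warmup_steps:
        # Warmup: ramp up linearly from 0 to base_lr
        return base_lr * step / max(1, warmup_steps)
    # Decay: go down linearly from base_lr to 0 at total_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

for step in (0, 50, 100, 450, 800):
    print(step, linear_schedule_lr(step))
```

With a larger batch size the run has fewer optimizer steps, so the same 100 warmup steps take up a larger share of training.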


# LoRA

## LoRA pipeline subtask1

In [None]:
######################################CHANGE###############################################
from peft import LoraConfig, get_peft_model, TaskType
###########################################################################################
def sexism_classification_pipeline_task1_LoRA(trainInfo, devInfo, testInfo=None, model_name='roberta-base', nlabels=2, ptype="single_label_classification", **args):
    # Model and Tokenizer
    labelEnc = LabelEncoder()
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name,
        num_labels=nlabels,
        problem_type=ptype
    )

    ######################################CHANGE###############################################
    # Configure LoRA
    lora_config = LoraConfig(
        task_type=args.get("task_type", TaskType.SEQ_CLS),
        target_modules=args.get("target_modules", ["query", "value"]),
        r=args.get("rank", 64),  # Rank of LoRA adaptation
        lora_alpha=args.get("lora_alpha", 32),  # Scaling factor
        lora_dropout=args.get("lora_dropout", 0.1),
        bias=args.get("bias", "none")
    )
    ###########################################################################################

    ######################################CHANGE###############################################
    # Prepare LoRA model
    peft_model = get_peft_model(model, lora_config)

    ###########################################################################################
    # Prepare datasets
    train_dataset = SexismDataset(trainInfo[1], labelEnc.fit_transform(trainInfo[2]),[int(x) for x in trainInfo[0]], tokenizer )
    val_dataset = SexismDataset(devInfo[1], labelEnc.transform(devInfo[2]), [int(x) for x in devInfo[0]], tokenizer)

    # Training Arguments
    training_args = TrainingArguments(
        report_to="none", # alt: "wandb", "tensorboard" "comet_ml" "mlflow" "clearml"
        output_dir= args.get('output_dir', './results_task1_LoRA0'),
        num_train_epochs= args.get('num_train_epochs', 5),
        learning_rate=args.get('learning_rate', 5e-5),
        per_device_train_batch_size=args.get('per_device_train_batch_size', 16),
        per_device_eval_batch_size=args.get('per_device_eval_batch_size', 64),
        warmup_steps=args.get('warmup_steps', 500),
        weight_decay=args.get('weight_decay',0.01),
        logging_dir=args.get('logging_dir', './logs'),
        logging_steps=args.get('logging_steps', 10),
        eval_strategy=args.get('eval_strategy','epoch'),
        save_strategy=args.get('save_strategy', "epoch"),
        save_total_limit=args.get('save_total_limit', 1),
        load_best_model_at_end=args.get('load_best_model_at_end', True),
        metric_for_best_model=args.get('metric_for_best_model',"f1")
    )

    # Initialize Trainer
    trainer = Trainer(
        ######################################CHANGE###############################################
        # Prepare LoRA model
        model=peft_model,
        ###########################################################################################
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=val_dataset,
        compute_metrics=compute_metrics_1,
        callbacks=[EarlyStoppingCallback(early_stopping_patience=args.get("early_stopping_patience",3))]
    )

    # Fine-tune the model
    trainer.train()

    # Evaluate on validation set
    eval_results = trainer.evaluate()
    print("Validation Results:", eval_results)

    ######################################CHANGE###############################################
    # Save the adapter weights of the LoRA model
    trainer.save_model('./final_best_model_LoRA')
    # Only the adapter is saved here, not the frozen base model: the LoRA matrices plus
    # any modules PEFT keeps trainable for the task type (with TaskType.SEQ_CLS this
    # normally includes the classification head).
    ###########################################################################################

    ######################################CHANGE###############################################
    # Merge the LoRA matrices into the weights of the base model
    mixModel = peft_model.merge_and_unload()
    mixModel.save_pretrained("./final_best_model_mixpeft")
    # In this case the full model is saved.
    ###########################################################################################

    if testInfo is not None:
        # Prepare test dataset for prediction
        test_dataset = SexismDataset(testInfo[1], [0] * len(testInfo[1]),  [int(x) for x in testInfo[0]],   tokenizer)

        # Predict test set labels
        predictions = trainer.predict(test_dataset)
        predicted_labels = np.argmax(predictions.predictions, axis=1)

        # Create submission DataFrame
        submission_df = pd.DataFrame({
            'id': testInfo[0],
            'label': labelEnc.inverse_transform(predicted_labels),
            "test_case": ["EXIST2025"]*len(predicted_labels)
        })
        submission_df.to_csv('sexism_predictions_task1.csv', index=False)
        print("Prediction for TASK 1 completed. Results saved to sexism_predictions_task1.csv")
        return model, submission_df
    return model, eval_results
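A back-of-the-envelope count shows how few parameters the default LoRA configuration above adds. The sketch assumes the `roberta-base` architecture (12 layers, hidden size 768) and counts only the LoRA matrices, not the classification head.

```python
# Architecture of roberta-base
hidden_size = 768
num_layers = 12

# Defaults used by the LoRA pipeline above
rank = 64
lora_alpha = 32
target_modules = ["query", "value"]

# Each adapted linear layer gets A (rank x in) and B (out x rank)
params_per_module = rank * hidden_size + hidden_size * rank
lora_params = params_per_module * len(target_modules) * num_layers

# The low-rank update is multiplied by lora_alpha / rank before being added
scaling = lora_alpha / rank

print(f"LoRA parameters: {lora_params:,}")
print(f"Scaling factor: {scaling}")
```

That is about 2.4 million trainable LoRA parameters against roughly 125 million in the base model, which is one plausible reason for the shorter training times reported above.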

## LoRA pipeline subtask2

In [None]:
######################################CHANGE###############################################
from peft import LoraConfig, get_peft_model, TaskType
###########################################################################################
def sexism_classification_pipeline_task2_LoRA(trainInfo, devInfo, testInfo=None, model_name='roberta-base', nlabels=3, ptype="single_label_classification", **args):
    # Model and Tokenizer
    labelEnc = LabelEncoder()
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name,
        num_labels=nlabels,
        problem_type=ptype
    )
    ######################################CHANGE###############################################
    # Configure LoRA
    lora_config = LoraConfig(
        task_type=args.get("task_type", TaskType.SEQ_CLS),
        target_modules=args.get("target_modules", ["query", "value"]),
        r=args.get("rank", 64),  # Rank of LoRA adaptation
        lora_alpha=args.get("lora_alpha", 32),  # Scaling factor
        lora_dropout=args.get("lora_dropout", 0.1),
        bias=args.get("bias", "none"),
    )
    ###########################################################################################

    ######################################CHANGE###############################################
    # Prepare LoRA model
    peft_model = get_peft_model(model, lora_config)

    ###########################################################################################

    # Prepare datasets
    train_dataset = SexismDataset(trainInfo[1], labelEnc.fit_transform(trainInfo[2]),[int(x) for x in trainInfo[0]], tokenizer )
    val_dataset = SexismDataset(devInfo[1], labelEnc.transform(devInfo[2]), [int(x) for x in devInfo[0]], tokenizer)

    # Training Arguments
    training_args = TrainingArguments(
        report_to="none", # alt: "wandb", "tensorboard" "comet_ml" "mlflow" "clearml"
        output_dir= args.get('output_dir', './results_task2_LoRA0'),
        num_train_epochs= args.get('num_train_epochs', 5),
        learning_rate=args.get('learning_rate', 5e-5),
        per_device_train_batch_size=args.get('per_device_train_batch_size', 16),
        per_device_eval_batch_size=args.get('per_device_eval_batch_size', 64),
        warmup_steps=args.get('warmup_steps', 500),
        weight_decay=args.get('weight_decay',0.01),
        logging_dir=args.get('logging_dir', './logs'),
        logging_steps=args.get('logging_steps', 10),
        eval_strategy=args.get('eval_strategy','epoch'),
        save_strategy=args.get('save_strategy', "epoch"),
        save_total_limit=args.get('save_total_limit', 1),
        load_best_model_at_end=args.get('load_best_model_at_end', True),
        metric_for_best_model=args.get('metric_for_best_model',"f1")
    )

    # Initialize Trainer
    trainer = Trainer(
        ######################################CHANGE###############################################
        # Prepare LoRA model
        model=peft_model,
        ###########################################################################################
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=val_dataset,
        compute_metrics=compute_metrics_2,
        callbacks=[EarlyStoppingCallback(early_stopping_patience=args.get("early_stopping_patience",3))]
    )

    # Fine-tune the model
    trainer.train()

    # Evaluate on validation set
    eval_results = trainer.evaluate()
    print("Validation Results:", eval_results)

    ######################################CHANGE###############################################
    # Save the adapter weights of the LoRA model
    trainer.save_model('./final_best_model_LoRA')
    # Only the adapter is saved here, not the frozen base model: the LoRA matrices plus
    # any modules PEFT keeps trainable for the task type (with TaskType.SEQ_CLS this
    # normally includes the classification head).
    ###########################################################################################

    ######################################CHANGE###############################################
    # Merge the LoRA matrices into the weights of the base model
    mixModel = peft_model.merge_and_unload()
    mixModel.save_pretrained("./final_best_model_mixpeft")
    # In this case the full model is saved.
    ###########################################################################################

    if testInfo is not None:
        # Prepare test dataset for prediction
        test_dataset = SexismDataset(testInfo[1], [0] * len(testInfo[1]),  [int(x) for x in testInfo[0]],   tokenizer)

        # Predict test set labels
        predictions = trainer.predict(test_dataset)
        predicted_labels = np.argmax(predictions.predictions, axis=1)

        # Create submission DataFrame
        submission_df = pd.DataFrame({
            'id': testInfo[0],
            'label': labelEnc.inverse_transform(predicted_labels),
            "test_case": ["EXIST2025"]*len(predicted_labels)
        })
        submission_df.to_csv('sexism_predictions_task2.csv', index=False)
        print("Prediction for TASK 2 completed. Results saved to sexism_predictions_task2.csv")
        return model, submission_df
    return model, eval_results
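Conceptually, merging a LoRA adapter folds the scaled low-rank product into the base weight, W' = W + (alpha / r) · B · A, so the merged layer gives the same output as the base layer plus the adapter. The tiny pure-Python sketch below checks this on made-up 2x2 numbers; `matmul` and `merge_lora` are illustrative helpers, not PEFT functions.

```python
def matmul(a, b):
    # Plain nested-list matrix product
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_lora(W, A, B, alpha, r):
    """Return W + (alpha / r) * B @ A."""
    scale = alpha / r
    delta = matmul(B, A)  # (out x r) @ (r x in) -> (out x in)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

# Made-up weights with rank r = 1
W = [[1, 0], [0, 1]]      # frozen base weight (out x in)
A = [[1, 2]]              # LoRA A (r x in)
B = [[1], [3]]            # LoRA B (out x r)
alpha, r = 2, 1
x = [[1], [1]]            # input column vector

merged = merge_lora(W, A, B, alpha, r)

# Output of the merged layer
y_merged = matmul(merged, x)
# Output of base layer plus scaled adapter path
adapter = matmul(B, matmul(A, x))
y_separate = [[b[0] + (alpha / r) * a[0]] for b, a in zip(matmul(W, x), adapter)]

print(merged)
print(y_merged == y_separate)
```

After merging, the adapter matrices are no longer needed at inference time, which is why the merged checkpoint can be loaded as an ordinary model.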

# Experimental work

## English, subtask1

In [None]:
set_seed()

learning_rates = [5e-5, 3e-5]
batch_sizes = [16, 32]

results_en_task1 = []

for lr in learning_rates:
    for bs in batch_sizes:
        params = {
            "learning_rate": lr,
            "num_train_epochs": 4,
            "per_device_train_batch_size": bs,
            "warmup_steps": 100,
            "weight_decay": 0.01,
            "early_stopping_patience": 2,
        }
        modelname = "roberta-base"
        m, res = sexism_classification_pipeline_task1(EnTrainTask1, EnDevTask1, None, modelname, 2, "single_label_classification", **params )
        results_en_task1.append({"lr": lr, "batch_size": bs, "result": res})
        print(f"lr: {lr}, bs: {bs}, res: {res}")


## English, subtask2

In [None]:
set_seed()

learning_rates = [5e-5, 3e-5]
batch_sizes = [16, 32]

results_en_task2 = []

for lr in learning_rates:
    for bs in batch_sizes:
        params = {
            "learning_rate": lr,
            "num_train_epochs": 4,
            "per_device_train_batch_size": bs,
            "warmup_steps": 100,
            "weight_decay": 0.01,
            "early_stopping_patience": 2,
        }
        modelname = "roberta-base"
        m, res = sexism_classification_pipeline_task2(EnTrainTask2, EnDevTask2, None, modelname, 3, "single_label_classification", **params)
        results_en_task2.append({"lr": lr, "batch_size": bs, "result": res})

## English with LoRA, subtask1

In [None]:
set_seed()

learning_rates = [5e-5, 3e-5]
batch_sizes = [16, 32]

results_en_task1_lora = []

for lr in learning_rates:
    for bs in batch_sizes:
        params = {
            "learning_rate": lr,
            "num_train_epochs": 4,
            "per_device_train_batch_size": bs,
            "warmup_steps": 100,
            "weight_decay": 0.01,
            "early_stopping_patience": 2,
            "use_lora": True,
        }
        modelname = "roberta-base"
        m, res = sexism_classification_pipeline_task1_LoRA(EnTrainTask1, EnDevTask1, None, modelname, 2, "single_label_classification", **params)
        results_en_task1_lora.append({"lr": lr, "batch_size": bs, "result": res})


## English with LoRA, subtask2

In [None]:
set_seed()

results_en_task2_lora = []

learning_rates = [5e-5, 3e-5]
batch_sizes = [16, 32]

for lr in learning_rates:
    for bs in batch_sizes:
        params = {
            "learning_rate": lr,
            "num_train_epochs": 4,
            "per_device_train_batch_size": bs,
            "warmup_steps": 100,
            "weight_decay": 0.01,
            "early_stopping_patience": 2,
            "use_lora": True,
        }
        modelname = "roberta-base"
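        # Note: num_labels is 5 here, while the full fine-tuning run for this subtask uses 3;
        # the value should match the number of classes in the training labels.
        m, res = sexism_classification_pipeline_task2_LoRA(EnTrainTask2, EnDevTask2, None, modelname, 5, "single_label_classification", **params)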
        results_en_task2_lora.append({"lr": lr, "batch_size": bs, "result": res})


In [None]:
from peft import PeftModel  # only needed to load an adapter-only checkpoint on top of a base model
# The merged model can be loaded like any regular Transformers checkpoint.
model = AutoModelForSequenceClassification.from_pretrained("./final_best_model_mixpeft")

## Spanish, subtask1

In [None]:
set_seed()

learning_rates = [5e-5, 3e-5]
batch_sizes = [16, 32]

results_es_task1 = []

for lr in learning_rates:
    for bs in batch_sizes:
        params = {
            "learning_rate": lr,
            "num_train_epochs": 4,
            "per_device_train_batch_size": bs,
            "warmup_steps": 100,
            "weight_decay": 0.01,
            "early_stopping_patience": 2,
        }
        modelname = "PlanTL-GOB-ES/roberta-base-bne"
        m, res = sexism_classification_pipeline_task1(SpTrainTask1, SpDevTask1, None, modelname, 2, "single_label_classification", **params )
        results_es_task1.append({"lr": lr, "batch_size": bs, "result": res})

## Spanish, subtask2

In [None]:
set_seed()

learning_rates = [5e-5, 3e-5]
batch_sizes = [16, 32]

results_es_task2 = []

for lr in learning_rates:
    for bs in batch_sizes:
        params = {
            "learning_rate": lr,
            "num_train_epochs": 4,
            "per_device_train_batch_size": bs,
            "warmup_steps": 100,
            "weight_decay": 0.01,
            "early_stopping_patience": 2,
        }
        modelname = "PlanTL-GOB-ES/roberta-base-bne"
        m, res = sexism_classification_pipeline_task2(SpTrainTask2, SpDevTask2, None, modelname, 3, "single_label_classification", **params)
        # Store results as a dict, the format the evaluate function expects
        results_es_task2.append({"lr": lr, "batch_size": bs, "result": res})

## Spanish with LoRA, subtask1

In [None]:
set_seed()

learning_rates = [5e-5, 3e-5, 2e-5]
batch_sizes = [16, 32]

results_es_task1_lora = []

for lr in learning_rates:
    for bs in batch_sizes:
        params = {
            "learning_rate": lr,
            "num_train_epochs": 4,
            "per_device_train_batch_size": bs,
            "warmup_steps": 100,
            "weight_decay": 0.01,
            "early_stopping_patience": 2,
            "use_lora": True,
        }
        modelname = "PlanTL-GOB-ES/roberta-base-bne"
        m, res = sexism_classification_pipeline_task1_LoRA(SpTrainTask1, SpDevTask1, None, modelname, 2, "single_label_classification", **params)
        results_es_task1_lora.append({"lr": lr, "batch_size": bs, "result": res})
        print(f"ES Task1 | LR: {lr}, BS: {bs}, Result: {res}")

## Spanish with LoRA, subtask2

In [None]:
set_seed()

results_es_task2_lora = []
learning_rates = [5e-5, 3e-5]
batch_sizes = [16, 32]

for lr in learning_rates:
    for bs in batch_sizes:
        params = {
            "learning_rate": lr,
            "num_train_epochs": 4,
            "per_device_train_batch_size": bs,
            "warmup_steps": 100,
            "weight_decay": 0.01,
            "early_stopping_patience": 2,
            "use_lora": True,
        }
        modelname = "PlanTL-GOB-ES/roberta-base-bne"
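        # Note: num_labels is 5 here, while the full fine-tuning run for this subtask uses 3;
        # the value should match the number of classes in the training labels.
        m, res = sexism_classification_pipeline_task2_LoRA(SpTrainTask2, SpDevTask2, None, modelname, 5, "single_label_classification", **params)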
        results_es_task2_lora.append({"lr": lr, "batch_size": bs, "result": res})

# Show results

## Evaluate - function

In [None]:
def evaluate(results):
    """
    Summarises the validation F1 of a list of runs
    Parameters:
    results: list of dicts with keys "lr", "batch_size" and "result" (the Trainer evaluation output)
    Return: average F1, best F1, best (batch size, lr), worst F1, worst (batch size, lr)
    """
    best, best_params = float("-inf"), (None, None)
    worst, worst_params = float("inf"), (None, None)
    total = 0.0

    for mod_res in results:
        val = mod_res["result"]["eval_f1"]
        if val > best:
            best = val
            best_params = mod_res["batch_size"], mod_res["lr"]
        if val < worst:
            worst = val
            worst_params = mod_res["batch_size"], mod_res["lr"]
        print(f"lr: {mod_res['lr']}, batch size: {mod_res['batch_size']}, "
              f"f1: {val}, accuracy: {mod_res['result']['eval_accuracy']}")
        total += val

    return total / len(results), best, best_params, worst, worst_params


## Results - Subtask 1

In [None]:
print("English - Fine-tuning - Subtask 1")
avg, best, best_params, worst, worst_params = evaluate(results_en_task1)
print(f"Average F1: {avg}")
print(f"Best: {best} with these params (bs, lr): {best_params}")
print(f"Worst: {worst} with these params (bs, lr): {worst_params}")

In [None]:
print("English - LoRA - Subtask 1")
avg, best, best_params, worst, worst_params = evaluate(results_en_task1_lora)
print(f"Average F1: {avg}")
print(f"Best: {best} with these params (bs, lr): {best_params}")
print(f"Worst: {worst} with these params (bs, lr): {worst_params}")

In [None]:
print("Spanish - Fine-tuning - Subtask 1")
avg, best, best_params, worst, worst_params = evaluate(results_es_task1)
print(f"Average F1: {avg}")
print(f"Best: {best} with these params (bs, lr): {best_params}")
print(f"Worst: {worst} with these params (bs, lr): {worst_params}")

In [None]:
print("Spanish - LoRA - Subtask 1")
avg, best, best_params, worst, worst_params = evaluate(results_es_task1_lora)
print(f"Average F1: {avg}")
print(f"Best: {best} with these params (bs, lr): {best_params}")
print(f"Worst: {worst} with these params (bs, lr): {worst_params}")


## Results - Subtask 2

In [None]:
print("English - Fine-tuning - Subtask 2")
avg, best, _, worst, _ = evaluate(results_en_task2)
print(f"Average F1: {avg}")
print(f"Best: {best}")
print(f"Worst: {worst}")

In [None]:
print("English - LoRA - Subtask 2")
avg, best, _, worst, _ = evaluate(results_en_task2_lora)
print(f"Average F1: {avg}")
print(f"Best: {best}")
print(f"Worst: {worst}")

In [None]:
print("Spanish - Fine-tuning - Subtask 2")
avg, best, _, worst, _ = evaluate(results_es_task2)
print(f"Average F1: {avg}")
print(f"Best: {best}")
print(f"Worst: {worst}")

In [None]:
print("Spanish - LoRA - Subtask 2")
avg, best, _, worst, _ = evaluate(results_es_task2_lora)
print(f"Average F1: {avg}")
print(f"Best: {best}")
print(f"Worst: {worst}")