### Домашнее задание 5 - 10 баллов

В этом задании вам предстоит дообучить трансформерную модель для задачи классификации с помощью различных техник и сравнить их между собой.

Датасет: [dair-ai/emotion](https://huggingface.co/datasets/dair-ai/emotion)

Модель: [google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased) (если хочется, можно заменить на что-то более интересное)

1. Скачайте датасет и модель. Измерьте базовые метрики классификации перед началом экспериментов.

**NB!** Для всех типов дообучения замерьте :
- качество классификации на выходе
- время дообучения
- количество параметров для обучения
- потребление ресурсов (не нужно заморачиваться с профайлингом - можно просто посмотреть в `nvidia-smi` или `torch.cuda.memory_allocated`)

2. Обучите модель в режиме full finetuning - **1 балл**
3. Обучите модель в режиме linear probing - реализуйте кастомную классификационную голову и обучайте только ее. Не забудьте описать, чем обусловлено устройство головы, как вы пришли к такой архитектуре - **2 балла**
4. Обучите модель в режиме PEFT с использованием [prompt tuning или prefix tuning](https://ericwiener.github.io/ai-notes/AI-Notes/Large-Language-Models/Prompt-Tuning-and-Prefix-Tuning). При выборе метода напишите пару слов, почему решили остановиться именно на этом методе - **2 балла**
4. Обучите модель в режиме PEFT с использованием LoRA. Попробуйте подобрать оптимальный ранг - `r`, при желании поэкспериментируйте с остальными гиперпараметрами. Опишите, чем обусловлена ваша финальная конфигурация - **2 балла**

5. Соберите все результаты отдельных замеров в таблицу и сделайте выводы о вычислительной сложности методов, итоговом качестве и прочих наблюдаемых свойствах моделей - **1 балл**

**Задание выполнено в Google Colab**

In [None]:
!pip install evaluate

In [3]:
import torch
import numpy as np
import random
import evaluate
import time
import pandas as pd

from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TrainingArguments, Trainer
from peft import get_peft_model, LoraConfig, TaskType, PromptTuningConfig
from transformers import DataCollatorWithPadding

In [4]:
# Обеспечим воспроизводимость
SEED = 42
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed_all(SEED)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

**1. Загрузка данных и модели**

In [None]:
dataset = load_dataset("dair-ai/emotion")
label_list = dataset["train"].features["label"].names
num_labels = len(label_list)

model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)

def tokenize(example):
    return tokenizer(example["text"], truncation=True)

dataset = dataset.map(tokenize, batched=True)
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

In [6]:
pd.Series(dataset["train"]["label"]).value_counts()

Unnamed: 0,count
1,5362
0,4666
3,2159
4,1937
2,1304
5,572


В коллабе есть ограничение на время использование GPU, поэтому не будем трогать количество эпох и learning rate. Будем смотреть сходимость за фиксированное количество эпох. А для оценки метрик в конце используется лучшая модель среди всех эпох по минимальному валидационному лоссу

In [7]:
args = TrainingArguments(
    output_dir="/content/hw_5",
    eval_strategy="epoch",
    save_strategy="epoch",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=64,
    num_train_epochs=10,
    learning_rate=1e-4,
    seed=SEED,
    load_best_model_at_end=True,
)

In [None]:
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=num_labels).to(device)

accuracy_metric = evaluate.load("accuracy")
recall_metric = evaluate.load("recall")
precision_metric = evaluate.load("precision")
f1_metric = evaluate.load("f1")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {
        "Accuracy": accuracy_metric.compute(predictions=predictions, references=labels)["accuracy"],
        "Recall": recall_metric.compute(predictions=predictions, references=labels, average="macro", zero_division=0)[
            "recall"
        ],
        "Precision": precision_metric.compute(
            predictions=predictions, references=labels, average="macro", zero_division=0
        )["precision"],
        "F1": f1_metric.compute(predictions=predictions, references=labels, average="macro")["f1"],
    }

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    processing_class=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

In [10]:
eval_results = trainer.evaluate(dataset["test"])
for metric, value in eval_results.items():
    print(f"{metric}: {value:.4f}")
print(f"Trainable parameters: {(sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6):.2f}, M")
print(f"CUDA memory used: {(torch.cuda.memory_allocated() / 1e6):.2f}MB")

eval_loss: 1.6840
eval_model_preparation_time: 0.0030
eval_Accuracy: 0.1510
eval_Recall: 0.1619
eval_Precision: 0.1449
eval_F1: 0.0729
eval_runtime: 5.6283
eval_samples_per_second: 355.3460
eval_steps_per_second: 5.6860
Trainable parameters: 109.49, M
CUDA memory used: 447.61MB


In [13]:
def train_and_show_results(trainer):
    start = time.time()
    trainer.train()
    end = time.time()

    eval_results = trainer.evaluate(dataset["test"])
    for metric, value in eval_results.items():
        print(f"{metric}: {value:.4f}")
    print("Training time:", round(end - start, 2), "sec")
    print(f"Trainable parameters: {(sum(p.numel() for p in trainer.model.parameters() if p.requires_grad) / 1e6):.2f}M")
    print(f"CUDA memory used: {(torch.cuda.memory_allocated() / 1e6):.2f}MB")

**2. Full finetuning**

In [14]:
train_and_show_results(trainer)

Epoch,Training Loss,Validation Loss,Model Preparation Time,Accuracy,Recall,Precision,F1
1,0.3035,0.241592,0.003,0.92,0.913736,0.880857,0.894471
2,0.2169,0.210723,0.003,0.9335,0.892222,0.93113,0.90777
3,0.1681,0.233989,0.003,0.9265,0.904639,0.905898,0.901467
4,0.1414,0.267452,0.003,0.9305,0.903833,0.903611,0.903173
5,0.1053,0.370138,0.003,0.924,0.910869,0.889622,0.898964
6,0.0739,0.38133,0.003,0.931,0.89935,0.908406,0.90376
7,0.039,0.425138,0.003,0.9305,0.897094,0.913173,0.904842
8,0.0273,0.39316,0.003,0.9335,0.907258,0.904395,0.9057
9,0.0166,0.468375,0.003,0.9335,0.911238,0.905388,0.907803
10,0.0104,0.482386,0.003,0.932,0.907905,0.903346,0.9054


eval_loss: 0.2020
eval_model_preparation_time: 0.0030
eval_Accuracy: 0.9265
eval_Recall: 0.8639
eval_Precision: 0.9136
eval_F1: 0.8830
eval_runtime: 6.5028
eval_samples_per_second: 307.5590
eval_steps_per_second: 4.9210
epoch: 10.0000
Training time: 1783.31 sec
Trainable parameters: 109.49M
CUDA memory used: 1333.99MB


**3. Linear probing**

Загружаем модель и замораживаем Encoder, чтобы обучалась только голова. Здесь используется:
- CLS токен → Dropout → Dense → Softmax

Это простой и эффективный классификатор, не требовательный по ресурсам + класс уже реализован в transformers


In [15]:
lp_model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=num_labels)

for param in lp_model.bert.parameters():
    param.requires_grad = False

lp_model = lp_model.to(device)

trainer = Trainer(
    model=lp_model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    processing_class=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

train_and_show_results(trainer)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Recall,Precision,F1
1,1.56,1.546992,0.4275,0.22303,0.14063,0.172138
2,1.5322,1.519435,0.4275,0.218589,0.143581,0.168333
3,1.5164,1.507072,0.4505,0.231203,0.151246,0.178556
4,1.5,1.500926,0.454,0.242405,0.153779,0.18617
5,1.4926,1.478775,0.4685,0.242642,0.155058,0.187503
6,1.4831,1.481191,0.4655,0.248182,0.156873,0.190619
7,1.4805,1.468469,0.4745,0.246809,0.156574,0.190684
8,1.473,1.465585,0.477,0.249848,0.156922,0.192717
9,1.4768,1.464318,0.4785,0.250227,0.157362,0.193068
10,1.4748,1.46206,0.4805,0.251108,0.158095,0.193804


eval_loss: 1.4335
eval_Accuracy: 0.4640
eval_Recall: 0.2389
eval_Precision: 0.1531
eval_F1: 0.1860
eval_runtime: 6.5645
eval_samples_per_second: 304.6670
eval_steps_per_second: 4.8750
epoch: 10.0000
Training time: 600.46 sec
Trainable parameters: 0.00M
CUDA memory used: 895.94MB


**4. Prompt/Prefix tuning**

Метод **Prefix tuning** звучит интереснее, так как в нем добавляются обучаемые векторы прямо в attention, а не просто в начало текста. А значит влияние этих токенов по логике должно быть сильнее, что может быть довольно полезно в задачах, подобных этой. Такой подход немного тяжелее по памяти + лучше работает на моделях типа BERT.

Однако я буду использовать **Prompt Tuning**, так как он позволяет дообучать небольшое количество виртуальных токенов, которые добавляются в начало входа, а это крутой способ, особенно если нет возможности гонять всю модель. Дополнительной фичей является то, что его легко внедрить через готовые инструменты и он не требует изменений внутри самой модели.

In [17]:
pt_config = PromptTuningConfig(
    task_type=TaskType.SEQ_CLS,
    prompt_tuning_init="TEXT",
    prompt_tuning_init_text="Classify the emotion in this sentence.",
    num_virtual_tokens=20,
    tokenizer_name_or_path=model_name,
)

pt_model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=num_labels)
pt_model = get_peft_model(pt_model, pt_config)
pt_model.print_trainable_parameters()
pt_model.to(device)

trainer = Trainer(
    model=pt_model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    processing_class=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

train_and_show_results(trainer)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


trainable params: 15,360 || all params: 109,502,214 || trainable%: 0.0140


No label_names provided for model class `PeftModelForSequenceClassification`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


Epoch,Training Loss,Validation Loss,Accuracy,Recall,Precision,F1
1,1.6117,1.591554,0.3515,0.166496,0.100284,0.087268
2,1.5972,1.577233,0.3585,0.172263,0.119499,0.106427
3,1.5892,1.569896,0.3845,0.188286,0.147094,0.131974
4,1.5781,1.563704,0.3905,0.202263,0.128662,0.156108
5,1.5749,1.564147,0.378,0.182689,0.163678,0.118418
6,1.5726,1.553932,0.422,0.209953,0.165296,0.157326
7,1.5733,1.553407,0.416,0.205985,0.171795,0.15259
8,1.5619,1.547219,0.4255,0.212074,0.168752,0.160088
9,1.5684,1.543463,0.4335,0.217055,0.168218,0.165598
10,1.5659,1.545428,0.427,0.212917,0.170859,0.161135


eval_loss: 1.5239
eval_Accuracy: 0.4235
eval_Recall: 0.2109
eval_Precision: 0.1702
eval_F1: 0.1589
eval_runtime: 9.2131
eval_samples_per_second: 217.0830
eval_steps_per_second: 3.4730
epoch: 10.0000
Training time: 1438.34 sec
Trainable parameters: 0.02M
CUDA memory used: 1334.92MB


**5. LoRA**

In [18]:
for r in [2, 4, 8, 16, 32]:
    print("R = ", r)
    lora_config = LoraConfig(task_type=TaskType.SEQ_CLS, r=r)

    lora_model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=num_labels)
    lora_model = get_peft_model(lora_model, lora_config)
    lora_model.print_trainable_parameters()

    lora_model.to(device)

    trainer = Trainer(
        model=lora_model,
        args=args,
        train_dataset=dataset["train"],
        eval_dataset=dataset["validation"],
        processing_class=tokenizer,
        data_collator=data_collator,
        compute_metrics=compute_metrics,
    )

    train_and_show_results(trainer)
    print()

R =  2


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
No label_names provided for model class `PeftModelForSequenceClassification`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


trainable params: 78,342 || all params: 109,565,196 || trainable%: 0.0715


Epoch,Training Loss,Validation Loss,Accuracy,Recall,Precision,F1
1,1.0229,0.83125,0.6925,0.486546,0.564912,0.463246
2,0.6709,0.568593,0.8,0.676687,0.813248,0.706475
3,0.501,0.406667,0.8645,0.796574,0.860621,0.817945
4,0.4143,0.36312,0.887,0.840503,0.872995,0.853895
5,0.3714,0.283621,0.9075,0.867005,0.893234,0.878671
6,0.3344,0.255554,0.915,0.875825,0.896806,0.885486
7,0.2921,0.245937,0.922,0.889205,0.900985,0.89424
8,0.2867,0.23709,0.9215,0.901226,0.893671,0.896931
9,0.2724,0.234125,0.9205,0.888167,0.894853,0.891086
10,0.2789,0.230692,0.9235,0.894152,0.899698,0.896359


eval_loss: 0.2305
eval_Accuracy: 0.9160
eval_Recall: 0.8553
eval_Precision: 0.8786
eval_F1: 0.8642
eval_runtime: 6.9214
eval_samples_per_second: 288.9590
eval_steps_per_second: 4.6230
epoch: 10.0000
Training time: 1039.32 sec
Trainable parameters: 0.08M
CUDA memory used: 1786.06MB

R =  4


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
No label_names provided for model class `PeftModelForSequenceClassification`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


trainable params: 152,070 || all params: 109,638,924 || trainable%: 0.1387


Epoch,Training Loss,Validation Loss,Accuracy,Recall,Precision,F1
1,1.0232,0.817579,0.7045,0.531508,0.577121,0.517158
2,0.6305,0.539259,0.81,0.692367,0.809127,0.715676
3,0.4716,0.382202,0.8755,0.817688,0.85773,0.830674
4,0.391,0.319898,0.8945,0.858076,0.869006,0.862093
5,0.342,0.267661,0.907,0.872903,0.877982,0.874606
6,0.3037,0.241609,0.916,0.880849,0.895269,0.887608
7,0.2685,0.240214,0.9205,0.885444,0.895499,0.889294
8,0.2574,0.235974,0.9225,0.904006,0.888561,0.895535
9,0.255,0.231833,0.9245,0.899768,0.897019,0.898009
10,0.2438,0.229395,0.927,0.904767,0.899592,0.901898


eval_loss: 0.2376
eval_Accuracy: 0.9220
eval_Recall: 0.8743
eval_Precision: 0.8899
eval_F1: 0.8796
eval_runtime: 6.9077
eval_samples_per_second: 289.5310
eval_steps_per_second: 4.6330
epoch: 10.0000
Training time: 1044.36 sec
Trainable parameters: 0.15M
CUDA memory used: 1784.36MB

R =  8


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
No label_names provided for model class `PeftModelForSequenceClassification`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


trainable params: 299,526 || all params: 109,786,380 || trainable%: 0.2728


Epoch,Training Loss,Validation Loss,Accuracy,Recall,Precision,F1
1,1.0028,0.793938,0.705,0.520763,0.593392,0.509184
2,0.6212,0.522613,0.8175,0.699954,0.821436,0.726111
3,0.4618,0.366882,0.875,0.801693,0.868239,0.823991
4,0.3763,0.308729,0.897,0.849479,0.888764,0.864891
5,0.3328,0.254524,0.915,0.879919,0.892776,0.885645
6,0.2915,0.228422,0.9225,0.888838,0.899297,0.893751
7,0.2605,0.225807,0.925,0.889869,0.911254,0.898734
8,0.2529,0.225534,0.9255,0.901018,0.898106,0.899133
9,0.247,0.21848,0.9265,0.895619,0.903457,0.898723
10,0.237,0.216913,0.9265,0.89685,0.899321,0.897504


eval_loss: 0.2158
eval_Accuracy: 0.9235
eval_Recall: 0.8700
eval_Precision: 0.8945
eval_F1: 0.8797
eval_runtime: 6.9332
eval_samples_per_second: 288.4680
eval_steps_per_second: 4.6150
epoch: 10.0000
Training time: 1044.96 sec
Trainable parameters: 0.30M
CUDA memory used: 1785.83MB

R =  16


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
No label_names provided for model class `PeftModelForSequenceClassification`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


trainable params: 594,438 || all params: 110,081,292 || trainable%: 0.5400


Epoch,Training Loss,Validation Loss,Accuracy,Recall,Precision,F1
1,1.0128,0.802148,0.702,0.517468,0.568944,0.507106
2,0.6078,0.503655,0.8205,0.711339,0.817975,0.737621
3,0.4533,0.364199,0.883,0.830135,0.865911,0.841417
4,0.37,0.293962,0.901,0.866735,0.877146,0.87093
5,0.3242,0.252691,0.9155,0.882842,0.891816,0.886851
6,0.2901,0.223058,0.9245,0.88544,0.906051,0.89456
7,0.2582,0.22611,0.923,0.884074,0.905121,0.892405
8,0.2479,0.22068,0.927,0.903428,0.895543,0.899193
9,0.2437,0.21618,0.928,0.899295,0.904457,0.900963
10,0.2301,0.213702,0.929,0.902075,0.901856,0.901521


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


eval_loss: 0.2205
eval_Accuracy: 0.9260
eval_Recall: 0.8797
eval_Precision: 0.8982
eval_F1: 0.8868
eval_runtime: 6.8845
eval_samples_per_second: 290.5080
eval_steps_per_second: 4.6480
epoch: 10.0000
Training time: 1050.12 sec
Trainable parameters: 0.59M
CUDA memory used: 1791.24MB

R =  32


No label_names provided for model class `PeftModelForSequenceClassification`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


trainable params: 1,184,262 || all params: 110,671,116 || trainable%: 1.0701


Epoch,Training Loss,Validation Loss,Accuracy,Recall,Precision,F1
1,0.9931,0.781931,0.7115,0.530398,0.592225,0.522438
2,0.6038,0.50697,0.823,0.715156,0.824947,0.73958
3,0.4471,0.352985,0.88,0.812128,0.870329,0.831242
4,0.3645,0.302454,0.899,0.850782,0.889358,0.865788
5,0.3201,0.247632,0.9155,0.879062,0.890407,0.884107
6,0.2834,0.22087,0.925,0.889054,0.904471,0.896156
7,0.2503,0.219385,0.928,0.889245,0.914645,0.899807
8,0.245,0.219204,0.93,0.903059,0.903367,0.902914
9,0.2398,0.215959,0.928,0.895047,0.904061,0.898474
10,0.228,0.213497,0.93,0.898098,0.907337,0.901774


eval_loss: 0.2120
eval_Accuracy: 0.9270
eval_Recall: 0.8742
eval_Precision: 0.9093
eval_F1: 0.8877
eval_runtime: 6.8926
eval_samples_per_second: 290.1640
eval_steps_per_second: 4.6430
epoch: 10.0000
Training time: 1056.81 sec
Trainable parameters: 1.18M
CUDA memory used: 1797.50MB



In [19]:
data = [
    {
        "R": 2,
        "Trainable Params (M)": 0.08,
        "Trainable %": 0.0715,
        "Accuracy": 0.9160,
        "Recall": 0.8553,
        "Precision": 0.8786,
        "F1": 0.8642,
        "Training Time (s)": 1039.32,
        "Eval Time (s)": 6.9214,
        "CUDA Memory (MB)": 1786.06,
    },
    {
        "R": 4,
        "Trainable Params (M)": 0.15,
        "Trainable %": 0.1387,
        "Accuracy": 0.9220,
        "Recall": 0.8743,
        "Precision": 0.8899,
        "F1": 0.8796,
        "Training Time (s)": 1044.36,
        "Eval Time (s)": 6.9077,
        "CUDA Memory (MB)": 1784.36,
    },
    {
        "R": 8,
        "Trainable Params (M)": 0.30,
        "Trainable %": 0.2728,
        "Accuracy": 0.9235,
        "Recall": 0.8700,
        "Precision": 0.8945,
        "F1": 0.8797,
        "Training Time (s)": 1044.96,
        "Eval Time (s)": 6.9332,
        "CUDA Memory (MB)": 1785.83,
    },
    {
        "R": 16,
        "Trainable Params (M)": 0.59,
        "Trainable %": 0.5400,
        "Accuracy": 0.9260,
        "Recall": 0.8797,
        "Precision": 0.8982,
        "F1":0.8868,
        "Training Time (s)": 1050.12,
        "Eval Time (s)": 6.8845,
        "CUDA Memory (MB)": 1791.24,
    },
    {
        "R": 32,
        "Trainable Params (M)": 1.18,
        "Trainable %": 1.0701,
        "Accuracy": 0.9270,
        "Recall": 0.8742,
        "Precision": 0.9093,
        "F1": 0.8877,
        "Training Time (s)": 1056.81,
        "Eval Time (s)": 6.8926,
        "CUDA Memory (MB)": 1797.50,
    },
]

df_lora = pd.DataFrame(data)
df_lora = df_lora.style.highlight_max(subset=["Accuracy", "Recall", "Precision", "F1"], color="green")
df_lora

Unnamed: 0,R,Trainable Params (M),Trainable %,Accuracy,Recall,Precision,F1,Training Time (s),Eval Time (s),CUDA Memory (MB)
0,2,0.08,0.0715,0.916,0.8553,0.8786,0.8642,1039.32,6.9214,1786.06
1,4,0.15,0.1387,0.922,0.8743,0.8899,0.8796,1044.36,6.9077,1784.36
2,8,0.3,0.2728,0.9235,0.87,0.8945,0.8797,1044.96,6.9332,1785.83
3,16,0.59,0.54,0.926,0.8797,0.8982,0.8868,1050.12,6.8845,1791.24
4,32,1.18,1.0701,0.927,0.8742,0.9093,0.8877,1056.81,6.8926,1797.5


Мы видим, что качество постепенно растёт с увеличением r, но незначительно. При R = 32 достигается лучший F1-score = 0.8877, но прирост незначителен по сравнению с 16. То есть в целом, можно было его и не проверять. Использование GPU памяти почти не растёт, что делает LoRA очень экономичным. Даже при R = 32 обучается только 1.07% параметров модели (1.18M из 110M)

Вывод такой:
- если намважна эффективность - оптимальным можно считать R = 8 или R = 16
- если хотим легкий бейзлайн - смело берем R = 2



**6. Итоговое сравнение**

In [22]:
data = {
    "method": ["no finetuning", "finetuning", "linear probing", "prompt tuning", "LoRA_R32"],
    "Accuracy": [0.1510, 0.9265, 0.4640, 0.4235, 0.9270],
    "Recall": [0.1619, 0.8639, 0.2389, 0.2109, 0.8742],
    "Precision": [0.1449, 0.9136, 0.1531, 0.1702, 0.9093],
    "F1-score": [0.0729, 0.8830, 0.1860, 0.1589, 0.8877],
    "trainable_params_M": [109.49, 109.49, 0.00, 0.02, 1.18],
    "cuda_memory_MB": [447.61, 1333.99, 895.94, 1334.92, 1797.50],
    "training_time_sec": [None, 1783.31, 600.46, 1438.34, 1056.81],
}

result_df = pd.DataFrame(data)
result_df = (
    result_df.style.format(precision=4)
    .highlight_max(subset=["Accuracy", "Recall", "Precision", "F1-score"], color="green")
    .highlight_min(subset=["trainable_params_M", "cuda_memory_MB", "training_time_sec"], color="green")
)
result_df

Unnamed: 0,method,Accuracy,Recall,Precision,F1-score,trainable_params_M,cuda_memory_MB,training_time_sec
0,no finetuning,0.151,0.1619,0.1449,0.0729,109.49,447.61,
1,finetuning,0.9265,0.8639,0.9136,0.883,109.49,1333.99,1783.31
2,linear probing,0.464,0.2389,0.1531,0.186,0.0,895.94,600.46
3,prompt tuning,0.4235,0.2109,0.1702,0.1589,0.02,1334.92,1438.34
4,LoRA_R32,0.927,0.8742,0.9093,0.8877,1.18,1797.5,1056.81


**7. Выводы**

- Лучшие показатели у нас вышли на LoRA_R32 - выигрываем по Accuracy, Recall и F1 мере. Немного проигрываем по Precision файнтюнингу, а по времени обучения на втором месте, если не считать no finetuning метод. Единственное, где мы сильно проигрываем - это по используемой памяти.

- Fine-tuning тоже неплохой, поставил бы его на второе место, но он ну очень долго обучался, почти в 2 раза дольше Лоры.

- Prompt tuning и Linear probing - плохенькие по всем показателям, только промпт тюнинг ещё и обучался долго почему-то.

- Ну и конечно, no finetuning дал наихудший результат, разве что по памяти всех выиграл, но он и так понятно почему :)
