### Homework 5 - 10 points

In this assignment you will fine-tune a transformer model for a classification task using several techniques and compare them with each other.

Dataset: [dair-ai/emotion](https://huggingface.co/datasets/dair-ai/emotion)

Model: [google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased) (feel free to swap in something more interesting)

1. Download the dataset and the model. Measure baseline classification metrics before starting the experiments.

**NB!** For every fine-tuning approach, measure:
- the resulting classification quality
- fine-tuning time
- the number of trainable parameters
- resource consumption (no need to bother with profiling; checking `nvidia-smi` or `torch.cuda.memory_allocated` is enough)

2. Train the model with full fine-tuning - **1 point**
3. Train the model with linear probing: implement a custom classification head and train only that head. Remember to explain what motivates the head's design and how you arrived at this architecture - **2 points**
4. Train the model with PEFT using [prompt tuning or prefix tuning](https://ericwiener.github.io/ai-notes/AI-Notes/Large-Language-Models/Prompt-Tuning-and-Prefix-Tuning). When choosing the method, briefly explain why you settled on it - **2 points**
5. Train the model with PEFT using LoRA. Try to find the optimal rank `r` and, if you like, experiment with the other hyperparameters. Explain what motivates your final configuration - **2 points**

6. Collect the results of all individual measurements in a table and draw conclusions about the computational cost of the methods, the final quality, and other observed properties of the models - **1 point**

**General**

- Decisions are justified (why a particular architecture/hyperparameter/optimizer/transformation, etc. was chosen) - **1 point**
- The solution is reproducible: random_state is fixed, and the notebook runs from start to finish without errors - **1 point**

In [None]:
import torch
import numpy as np
import random
import evaluate
import time
import pandas as pd

from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TrainingArguments, Trainer
from peft import get_peft_model, LoraConfig, PrefixTuningConfig, TaskType, PromptTuningConfig
from transformers import DataCollatorWithPadding

SEED = 42
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed_all(SEED)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# 1. Loading the dataset and model

In [2]:
dataset = load_dataset("dair-ai/emotion")
label_list = dataset["train"].features["label"].names
num_labels = len(label_list)

model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)


def tokenize(example):
    return tokenizer(example["text"], truncation=True)


dataset = dataset.map(tokenize, batched=True)
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

In [3]:
pd.Series(dataset["train"]["label"]).value_counts()

1    5362
0    4666
3    2159
4    1937
2    1304
5     572
Name: count, dtype: int64
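
The counts above show a clear class imbalance, which is worth quantifying before looking at accuracy numbers. A minimal sketch in plain Python, with the counts copied by hand from the output above (the variable names are illustrative, not part of the notebook):

```python
# Label counts copied from the value_counts() output above (train split).
label_counts = {1: 5362, 0: 4666, 3: 2159, 4: 1937, 2: 1304, 5: 572}

total = sum(label_counts.values())
majority_label = max(label_counts, key=label_counts.get)
# Share of the most frequent class: the accuracy a constant predictor
# would reach on this split.
majority_share = label_counts[majority_label] / total
# Ratio between the largest and the smallest class.
imbalance_ratio = max(label_counts.values()) / min(label_counts.values())

print(f"train size: {total}")
print(f"majority label: {majority_label} ({majority_share:.4f} of the data)")
print(f"largest/smallest class ratio: {imbalance_ratio:.2f}")
```

With an imbalance of this size, accuracy alone can look reasonable even when the rare classes are never predicted, which is one reason to report macro-averaged metrics alongside it.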

In [None]:
# DISCLAIMER: to keep the comparison fair and because of time constraints, training parameters such as
# lr, the number of epochs, etc. are not tuned, so that convergence can be compared over a fixed number of epochs
# + the final metrics are computed with the best checkpoint across all epochs, chosen by minimum validation loss
args = TrainingArguments(
    output_dir="./hw_5",
    eval_strategy="epoch",
    save_strategy="epoch",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=64,
    num_train_epochs=10,
    learning_rate=1e-4,
    seed=SEED,
    load_best_model_at_end=True,
)
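
To get a feel for the training budget these arguments imply, the number of optimizer steps can be worked out by hand. This is a rough sketch that assumes a single device and no gradient accumulation; the evaluation split size of 2,000 is an assumption and is not printed anywhere above.

```python
import math

train_size = 16_000      # sum of the label counts printed earlier
eval_size = 2_000        # assumption: size of the validation/test split
train_batch_size = 16    # per_device_train_batch_size
eval_batch_size = 64     # per_device_eval_batch_size
num_epochs = 10          # num_train_epochs

steps_per_epoch = math.ceil(train_size / train_batch_size)
total_steps = steps_per_epoch * num_epochs
eval_steps = math.ceil(eval_size / eval_batch_size)

print(f"optimizer steps per epoch: {steps_per_epoch}")
print(f"optimizer steps in total:  {total_steps}")
print(f"batches per evaluation:    {eval_steps}")
```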

In [5]:
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=num_labels).to(device)

accuracy_metric = evaluate.load("accuracy")
recall_metric = evaluate.load("recall")
precision_metric = evaluate.load("precision")
f1_metric = evaluate.load("f1")


def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {
        "Accuracy": accuracy_metric.compute(predictions=predictions, references=labels)["accuracy"],
        "Recall": recall_metric.compute(predictions=predictions, references=labels, average="macro", zero_division=0)[
            "recall"
        ],
        "Precision": precision_metric.compute(
            predictions=predictions, references=labels, average="macro", zero_division=0
        )["precision"],
        "F1": f1_metric.compute(predictions=predictions, references=labels, average="macro")["f1"],
    }


trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    processing_class=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
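
`compute_metrics` reports macro-averaged recall, precision and F1 next to accuracy. The toy sketch below (plain Python, with hypothetical helper names, not the `evaluate` implementation) illustrates why the two can diverge on imbalanced data: a model that always predicts the majority class keeps a high accuracy but a low macro F1.

```python
def accuracy(references, predictions):
    """Fraction of positions where prediction equals reference."""
    return sum(r == p for r, p in zip(references, predictions)) / len(references)


def macro_f1(references, predictions):
    """Unweighted mean of per-class F1; a class with no hits scores 0."""
    labels = sorted(set(references) | set(predictions))
    scores = []
    for label in labels:
        pairs = list(zip(references, predictions))
        tp = sum(1 for r, p in pairs if r == label and p == label)
        fp = sum(1 for r, p in pairs if r != label and p == label)
        fn = sum(1 for r, p in pairs if r == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data: 8 examples of class 0, 2 of class 1; the model always predicts 0.
refs = [0] * 8 + [1] * 2
preds = [0] * 10

print(f"accuracy: {accuracy(refs, preds):.4f}")
print(f"macro F1: {macro_f1(refs, preds):.4f}")
```

The same pattern, a gap between accuracy and macro F1, shows up in several of the runs in this notebook.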


In [9]:
eval_results = trainer.evaluate(dataset["test"])
for metric, value in eval_results.items():
    print(f"{metric}: {value:.4f}")
print(f"Trainable parameters: {(sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6):.2f}M")
print(f"CUDA memory used: {(torch.cuda.memory_allocated() / 1e6):.2f}MB")

eval_loss: 1.1865
eval_model_preparation_time: 0.0018
eval_Accuracy: 0.5775
eval_Recall: 0.3039
eval_Precision: 0.2751
eval_F1: 0.2425
Trainable parameters: 109.49M
CUDA memory used: 1334.25MB


In [11]:
def train_and_pretty_print_results(trainer):
    start = time.time()
    trainer.train()
    end = time.time()

    eval_results = trainer.evaluate(dataset["test"])
    for metric, value in eval_results.items():
        print(f"{metric}: {value:.4f}")
    print("Training time:", round(end - start, 2), "sec")
    print(f"Trainable parameters: {(sum(p.numel() for p in trainer.model.parameters() if p.requires_grad) / 1e6):.2f}M")
    print(f"CUDA memory used: {(torch.cuda.memory_allocated() / 1e6):.2f}MB")
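
One caveat about the memory figure: `torch.cuda.memory_allocated()` returns what is allocated at the moment of the call, so after training it mostly reflects the weights still resident on the GPU (including models left over from earlier cells) rather than the peak reached during training. A possible alternative, sketched here as a hypothetical helper that is not part of the notebook, resets the peak counter before the run and reads `torch.cuda.max_memory_allocated()` afterwards:

```python
import time

import torch


def measure_time_and_peak_memory(fn):
    """Run fn() and return (result, seconds, peak_allocated_MB).

    Hypothetical helper for illustration. Peak memory is reported
    as 0.0 when CUDA is not available.
    """
    use_cuda = torch.cuda.is_available()
    if use_cuda:
        torch.cuda.synchronize()
        torch.cuda.reset_peak_memory_stats()
    start = time.perf_counter()
    result = fn()
    if use_cuda:
        torch.cuda.synchronize()  # wait for queued GPU work before stopping the clock
    seconds = time.perf_counter() - start
    peak_mb = torch.cuda.max_memory_allocated() / 1e6 if use_cuda else 0.0
    return result, seconds, peak_mb


# Stand-in workload; in the notebook one would pass trainer.train instead.
device = "cuda" if torch.cuda.is_available() else "cpu"
result, seconds, peak_mb = measure_time_and_peak_memory(
    lambda: (torch.ones(256, 256, device=device) @ torch.ones(256, 256, device=device)).sum().item()
)
print(f"result={result:.0f} time={seconds:.4f}s peak={peak_mb:.2f}MB")
```

Peak values would still include tensors from previous experiments unless those models are deleted and `torch.cuda.empty_cache()` is called between runs.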

# 2. Full finetuning

In [12]:
train_and_pretty_print_results(trainer)

Epoch,Training Loss,Validation Loss,Model Preparation Time,Accuracy,Recall,Precision,F1
1,0.3333,0.248381,0.0018,0.916,0.906065,0.880939,0.890773
2,0.21,0.233923,0.0018,0.921,0.886878,0.904915,0.893152
3,0.159,0.173627,0.0018,0.9325,0.907931,0.914385,0.905983
4,0.1448,0.302654,0.0018,0.927,0.910205,0.902231,0.904963
5,0.1294,0.291294,0.0018,0.9295,0.911359,0.89978,0.904006
6,0.0778,0.299224,0.0018,0.9295,0.901708,0.906698,0.903801
7,0.0502,0.445278,0.0018,0.936,0.898278,0.924899,0.910674
8,0.0314,0.389718,0.0018,0.934,0.901167,0.921105,0.910464
9,0.0189,0.391706,0.0018,0.9355,0.916138,0.901923,0.908733
10,0.0115,0.390262,0.0018,0.9415,0.917145,0.916783,0.916942


eval_loss: 0.1965
eval_model_preparation_time: 0.0018
eval_Accuracy: 0.9235
eval_Recall: 0.8891
eval_Precision: 0.8951
eval_F1: 0.8823
eval_runtime: 3.2845
eval_samples_per_second: 608.9210
eval_steps_per_second: 9.7430
epoch: 10.0000
Training time: 1127.64 sec
Trainable parameters: 109.49M
CUDA memory used: 1334.25MB


# 3. Linear probing

In [13]:
# load the model and freeze the encoder; only the head is trained
lp_model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=num_labels)
for param in lp_model.bert.parameters():
    param.requires_grad = False
lp_model = lp_model.to(device)

# the head is simply: pooled [CLS] representation → Dropout → Linear layer producing logits (softmax is applied inside the loss)
# a simple and effective classifier with minimal overhead, already implemented as a class in transformers
# the rest of the code matches full fine-tuning, apart from `requires_grad = False`

trainer = Trainer(
    model=lp_model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    processing_class=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)
train_and_pretty_print_results(trainer)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Recall,Precision,F1
1,1.5597,1.549771,0.4295,0.221127,0.142387,0.170567
2,1.5328,1.521343,0.4155,0.211117,0.139228,0.161438
3,1.5182,1.510248,0.456,0.235398,0.150733,0.181655
4,1.5012,1.502037,0.453,0.242263,0.153756,0.185876
5,1.4922,1.479818,0.4685,0.242708,0.154883,0.187507
6,1.4847,1.482899,0.4665,0.248059,0.156442,0.190704
7,1.4824,1.469791,0.47,0.244148,0.154711,0.188472
8,1.4769,1.467436,0.4745,0.248731,0.156104,0.191786
9,1.4804,1.465191,0.479,0.25053,0.157562,0.193284
10,1.4735,1.46335,0.48,0.250672,0.157891,0.193446


eval_loss: 1.4338
eval_Accuracy: 0.4645
eval_Recall: 0.2388
eval_Precision: 0.1532
eval_F1: 0.1858
eval_runtime: 3.3332
eval_samples_per_second: 600.0290
eval_steps_per_second: 9.6000
epoch: 10.0000
Training time: 315.08 sec
Trainable parameters: 0.00M
CUDA memory used: 1774.06MB
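
The `0.00M` reported above is a rounding effect rather than a sign that nothing was trained. Only `classifier.weight` and `classifier.bias` remain trainable, and their size follows directly from the BERT-base hidden dimension (768) and the six emotion classes:

```python
hidden_size = 768  # hidden dimension of bert-base-uncased
num_labels = 6     # classes in dair-ai/emotion

# Linear(hidden_size, num_labels): weight matrix plus bias vector.
head_params = hidden_size * num_labels + num_labels

print(f"head parameters: {head_params}")
print(f"in millions:     {head_params / 1e6:.2f}M")
print(f"share of 109.49M total: {head_params / 109.49e6:.4%}")
```

Printing the raw count, or using more decimal places, would make this row of the comparison easier to read.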


# 4. Prompt/Prefix tuning

In [None]:
# prefix tuning seemed more interesting to me than prompt tuning because it adds trainable vectors directly
# to the attention inside the model rather than just to the start of the input, so these tokens should in principle
# have a deeper and stronger effect, which may help in tasks with subtle distinctions such as emotions. it is somewhat
# heavier on memory, but not by much. I also read that it works better on smaller models (such as bert)

peft_config = PrefixTuningConfig(
    task_type=TaskType.SEQ_CLS,
    num_virtual_tokens=20,  # a moderate starting value for most tasks
)

peft_model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=num_labels)
peft_model = get_peft_model(peft_model, peft_config)
peft_model.print_trainable_parameters()
peft_model.to(device)

trainer = Trainer(
    model=peft_model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    processing_class=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)
train_and_pretty_print_results(trainer)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


trainable params: 368,640 || all params: 109,855,494 || trainable%: 0.3356


No label_names provided for model class `PeftModelForSequenceClassification`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


Epoch,Training Loss,Validation Loss,Accuracy,Recall,Precision,F1
1,1.7147,1.690119,0.3515,0.167829,0.106241,0.090243
2,1.6952,1.669006,0.3515,0.167829,0.12523,0.090218
3,1.6712,1.647909,0.3515,0.167829,0.12523,0.090218
4,1.6472,1.627662,0.3545,0.168447,0.142286,0.092249
5,1.6278,1.612693,0.3595,0.172405,0.130946,0.104655
6,1.6193,1.60298,0.3695,0.180256,0.123616,0.123184
7,1.6157,1.595963,0.3785,0.186572,0.12815,0.133435
8,1.6051,1.591501,0.382,0.189223,0.128174,0.137469
9,1.6069,1.589112,0.387,0.192519,0.129324,0.141518
10,1.6054,1.588218,0.389,0.193598,0.130272,0.142478


eval_loss: 1.5779
eval_Accuracy: 0.3820
eval_Recall: 0.1901
eval_Precision: 0.1284
eval_F1: 0.1400
eval_runtime: 3.3695
eval_samples_per_second: 593.5530
eval_steps_per_second: 9.4970
epoch: 10.0000
Training time: 581.81 sec
Trainable parameters: 0.37M
CUDA memory used: 2256.85MB


In [None]:
# however, since prefix tuning produced such poor metrics (no matter how I changed the parameters), I decided to try prompt tuning as well
# in the end I settled on it because of its better metrics. it trains only a small number of virtual tokens
# that are prepended to the input. this is a very parameter-efficient approach, especially when training the whole model is not an option.
# another advantage: it is easy to adopt through off-the-shelf tools and requires no changes inside the model itself.

peft_config = PromptTuningConfig(
    task_type=TaskType.SEQ_CLS,
    prompt_tuning_init="TEXT",
    prompt_tuning_init_text="Classify the emotion in this sentence.",
    num_virtual_tokens=20,
    tokenizer_name_or_path=model_name,
)

peft_model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=num_labels)
peft_model = get_peft_model(peft_model, peft_config)
peft_model.print_trainable_parameters()
peft_model.to(device)

trainer = Trainer(
    model=peft_model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    processing_class=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)
train_and_pretty_print_results(trainer)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


trainable params: 15,360 || all params: 109,502,214 || trainable%: 0.0140


No label_names provided for model class `PeftModelForSequenceClassification`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


Epoch,Training Loss,Validation Loss,Accuracy,Recall,Precision,F1
1,1.6859,1.625574,0.3525,0.167036,0.125397,0.088023
2,1.635,1.585426,0.407,0.212614,0.151385,0.16804
3,1.6071,1.544583,0.4395,0.227292,0.186456,0.179919
4,1.5807,1.604632,0.3855,0.248977,0.202094,0.213545
5,1.5721,1.54159,0.4595,0.282803,0.218746,0.245001
6,1.5551,1.515897,0.461,0.287623,0.216768,0.245038
7,1.5537,1.514201,0.438,0.282538,0.208013,0.23223
8,1.5334,1.514675,0.425,0.279261,0.203578,0.224996
9,1.5358,1.507338,0.433,0.282102,0.205779,0.228656
10,1.5332,1.495184,0.4515,0.289934,0.212549,0.238143


eval_loss: 1.4769
eval_Accuracy: 0.4750
eval_Recall: 0.2989
eval_Precision: 0.2231
eval_F1: 0.2508
eval_runtime: 4.5304
eval_samples_per_second: 441.4670
eval_steps_per_second: 7.0630
epoch: 10.0000
Training time: 780.66 sec
Trainable parameters: 0.02M
CUDA memory used: 1804.14MB
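
The trainable-parameter counts printed for prefix tuning and prompt tuning can be cross-checked by hand. The sketch below assumes the standard BERT-base shape (12 layers, hidden size 768) and prefix tuning without a projection layer, which is the default when `prefix_projection` is not set:

```python
hidden_size = 768        # bert-base-uncased
num_layers = 12          # bert-base-uncased encoder layers
num_virtual_tokens = 20  # value used in both configs above
head_params = hidden_size * 6 + 6  # classifier Linear(768, 6): 4,614

# Prefix tuning (no projection): one key and one value vector
# per layer for every virtual token.
prefix_params = num_virtual_tokens * num_layers * 2 * hidden_size
# Prompt tuning: one input embedding vector per virtual token.
prompt_params = num_virtual_tokens * hidden_size

reported = {"prefix tuning": 368_640, "prompt tuning": 15_360}
computed = {"prefix tuning": prefix_params, "prompt tuning": prompt_params}

for name in reported:
    gap = reported[name] - computed[name]
    print(f"{name}: reported={reported[name]:,} computed={computed[name]:,} unexplained={gap:,}")
print(f"classifier head would add: {head_params:,}")
```

Both reported counts are fully explained by the virtual tokens, leaving no room for the 4,614 parameters of the classification head. If that reading is right, the randomly initialized head may not have been updated in these two runs, which would be one possible explanation for the weak metrics. This is an inference from the printed numbers only and has not been verified against the PEFT source; checking `requires_grad` on the classifier parameters would settle it.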


# 5. LoRA

In [None]:
for r in [2, 4, 8, 16, 32]:
    print("R = ", r)
    lora_config = LoraConfig(task_type=TaskType.SEQ_CLS, r=r)

    lora_model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=num_labels)
    lora_model = get_peft_model(lora_model, lora_config)
    lora_model.print_trainable_parameters()

    lora_model.to(device)

    trainer = Trainer(
        model=lora_model,
        args=args,
        train_dataset=dataset["train"],
        eval_dataset=dataset["validation"],
        processing_class=tokenizer,
        data_collator=data_collator,
        compute_metrics=compute_metrics,
    )
    train_and_pretty_print_results(trainer)
    print()

R =  2


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


trainable params: 78,342 || all params: 109,565,196 || trainable%: 0.0715


No label_names provided for model class `PeftModelForSequenceClassification`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


Epoch,Training Loss,Validation Loss,Accuracy,Recall,Precision,F1
1,1.0427,0.864485,0.68,0.466521,0.623927,0.426045
2,0.6789,0.593508,0.7985,0.684998,0.778109,0.707712
3,0.5108,0.435054,0.8605,0.795782,0.837918,0.809244
4,0.4201,0.36636,0.891,0.8428,0.863781,0.849898
5,0.361,0.311671,0.901,0.859371,0.87936,0.867804
6,0.3304,0.286943,0.913,0.869974,0.899727,0.882965
7,0.2874,0.271764,0.9125,0.876701,0.885065,0.880121
8,0.2775,0.258155,0.9165,0.895832,0.886111,0.890709
9,0.2685,0.26074,0.9185,0.888034,0.892492,0.88946
10,0.264,0.250059,0.915,0.887352,0.885295,0.886033


eval_loss: 0.2273
eval_Accuracy: 0.9180
eval_Recall: 0.8657
eval_Precision: 0.8792
eval_F1: 0.8717
eval_runtime: 3.3366
eval_samples_per_second: 599.4070
eval_steps_per_second: 9.5910
epoch: 10.0000
Training time: 595.34 sec
Trainable parameters: 0.08M
CUDA memory used: 2260.31MB

R =  4


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


trainable params: 152,070 || all params: 109,638,924 || trainable%: 0.1387


No label_names provided for model class `PeftModelForSequenceClassification`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


Epoch,Training Loss,Validation Loss,Accuracy,Recall,Precision,F1
1,1.0116,0.817671,0.7015,0.510448,0.560067,0.494984
2,0.631,0.533662,0.819,0.700518,0.806409,0.725835
3,0.4687,0.385553,0.8685,0.79817,0.849376,0.814545
4,0.3852,0.324968,0.892,0.839343,0.866545,0.847885
5,0.326,0.272506,0.9105,0.879915,0.884522,0.881816
6,0.3009,0.253953,0.916,0.883399,0.898054,0.8899
7,0.2705,0.239597,0.919,0.884791,0.896743,0.889786
8,0.253,0.236536,0.923,0.906596,0.893005,0.899394
9,0.2483,0.235098,0.9235,0.898434,0.897025,0.897375
10,0.2398,0.229153,0.9255,0.900072,0.900297,0.899784


eval_loss: 0.2281
eval_Accuracy: 0.9220
eval_Recall: 0.8755
eval_Precision: 0.8801
eval_F1: 0.8764
eval_runtime: 3.3300
eval_samples_per_second: 600.5920
eval_steps_per_second: 9.6090
epoch: 10.0000
Training time: 596.71 sec
Trainable parameters: 0.15M
CUDA memory used: 2254.64MB

R =  8


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


trainable params: 299,526 || all params: 109,786,380 || trainable%: 0.2728


No label_names provided for model class `PeftModelForSequenceClassification`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


Epoch,Training Loss,Validation Loss,Accuracy,Recall,Precision,F1
1,1.0183,0.837264,0.696,0.501918,0.59498,0.47923
2,0.6164,0.506274,0.8315,0.72628,0.819525,0.752434
3,0.4505,0.35762,0.876,0.807298,0.858432,0.824504
4,0.3646,0.310631,0.8945,0.837525,0.870705,0.848313
5,0.3085,0.260805,0.918,0.889466,0.893021,0.890998
6,0.2909,0.235847,0.9195,0.888774,0.900566,0.893979
7,0.2584,0.232797,0.922,0.885427,0.901691,0.892255
8,0.2448,0.232567,0.92,0.895609,0.890515,0.892878
9,0.2368,0.226265,0.924,0.892683,0.903225,0.897165
10,0.2337,0.221789,0.9275,0.903928,0.901996,0.902514


eval_loss: 0.2178
eval_Accuracy: 0.9215
eval_Recall: 0.8729
eval_Precision: 0.8871
eval_F1: 0.8780
eval_runtime: 3.3137
eval_samples_per_second: 603.5640
eval_steps_per_second: 9.6570
epoch: 10.0000
Training time: 599.53 sec
Trainable parameters: 0.30M
CUDA memory used: 2254.84MB

R =  16


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


trainable params: 594,438 || all params: 110,081,292 || trainable%: 0.5400


No label_names provided for model class `PeftModelForSequenceClassification`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


Epoch,Training Loss,Validation Loss,Accuracy,Recall,Precision,F1
1,1.0056,0.801349,0.7065,0.508285,0.56778,0.491888
2,0.5963,0.480032,0.832,0.726947,0.812526,0.752939
3,0.4387,0.352922,0.8785,0.811976,0.855346,0.826306
4,0.3519,0.300128,0.9025,0.851319,0.886119,0.861156
5,0.2997,0.257592,0.913,0.881176,0.886431,0.883591
6,0.2763,0.231707,0.9225,0.883698,0.90753,0.893142
7,0.2503,0.223209,0.923,0.885602,0.902146,0.892371
8,0.2382,0.222584,0.925,0.898228,0.901006,0.899307
9,0.2284,0.214144,0.926,0.889906,0.90671,0.896719
10,0.2256,0.211572,0.927,0.895065,0.906138,0.899162


eval_loss: 0.2199
eval_Accuracy: 0.9220
eval_Recall: 0.8733
eval_Precision: 0.8934
eval_F1: 0.8793
eval_runtime: 3.3092
eval_samples_per_second: 604.3830
eval_steps_per_second: 9.6700
epoch: 10.0000
Training time: 600.56 sec
Trainable parameters: 0.59M
CUDA memory used: 2261.26MB

R =  32


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


trainable params: 1,184,262 || all params: 110,671,116 || trainable%: 1.0701


No label_names provided for model class `PeftModelForSequenceClassification`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


Epoch,Training Loss,Validation Loss,Accuracy,Recall,Precision,F1
1,0.9928,0.792026,0.7155,0.519773,0.599064,0.509023
2,0.5893,0.479882,0.8395,0.742592,0.828113,0.768598
3,0.4343,0.349261,0.8815,0.819278,0.86133,0.833802
4,0.3499,0.302012,0.9035,0.852311,0.884547,0.863085
5,0.2975,0.255239,0.916,0.882375,0.892404,0.886842
6,0.2763,0.228059,0.9245,0.890157,0.903834,0.896372
7,0.252,0.226438,0.921,0.884114,0.899102,0.890812
8,0.236,0.218678,0.9265,0.90224,0.901802,0.901742
9,0.2234,0.214966,0.927,0.894088,0.902111,0.897574
10,0.2244,0.212875,0.9295,0.903179,0.90283,0.902605


eval_loss: 0.2169
eval_Accuracy: 0.9245
eval_Recall: 0.8780
eval_Precision: 0.8919
eval_F1: 0.8833
eval_runtime: 3.3303
eval_samples_per_second: 600.5430
eval_steps_per_second: 9.6090
epoch: 10.0000
Training time: 597.98 sec
Trainable parameters: 1.18M
CUDA memory used: 2265.72MB



In [None]:
data = [
    {
        "R": 2,
        "Trainable Params (M)": 0.08,
        "Trainable %": 0.0715,
        "Accuracy": 0.9180,
        "Recall": 0.8657,
        "Precision": 0.8792,
        "F1": 0.8717,
        "Training Time (s)": 595.34,
        "Eval Time (s)": 3.3366,
        "CUDA Memory (MB)": 2260.31,
    },
    {
        "R": 4,
        "Trainable Params (M)": 0.15,
        "Trainable %": 0.1387,
        "Accuracy": 0.9220,
        "Recall": 0.8755,
        "Precision": 0.8801,
        "F1": 0.8764,
        "Training Time (s)": 596.71,
        "Eval Time (s)": 3.3300,
        "CUDA Memory (MB)": 2254.64,
    },
    {
        "R": 8,
        "Trainable Params (M)": 0.30,
        "Trainable %": 0.2728,
        "Accuracy": 0.9215,
        "Recall": 0.8729,
        "Precision": 0.8871,
        "F1": 0.8780,
        "Training Time (s)": 599.53,
        "Eval Time (s)": 3.3137,
        "CUDA Memory (MB)": 2254.84,
    },
    {
        "R": 16,
        "Trainable Params (M)": 0.59,
        "Trainable %": 0.5400,
        "Accuracy": 0.9220,
        "Recall": 0.8733,
        "Precision": 0.8934,
        "F1": 0.8793,
        "Training Time (s)": 600.56,
        "Eval Time (s)": 3.3092,
        "CUDA Memory (MB)": 2261.26,
    },
    {
        "R": 32,
        "Trainable Params (M)": 1.18,
        "Trainable %": 1.0701,
        "Accuracy": 0.9245,
        "Recall": 0.8780,
        "Precision": 0.8919,
        "F1": 0.8833,
        "Training Time (s)": 597.98,
        "Eval Time (s)": 3.3303,
        "CUDA Memory (MB)": 2265.72,
    },
]

df_lora = pd.DataFrame(data)
df_lora = df_lora.style.highlight_max(subset=["Accuracy", "Recall", "Precision", "F1"], color="green")
df_lora

Unnamed: 0,R,Trainable Params (M),Trainable %,Accuracy,Recall,Precision,F1,Training Time (s),Eval Time (s),CUDA Memory (MB)
0,2,0.08,0.0715,0.918,0.8657,0.8792,0.8717,595.34,3.3366,2260.31
1,4,0.15,0.1387,0.922,0.8755,0.8801,0.8764,596.71,3.33,2254.64
2,8,0.3,0.2728,0.9215,0.8729,0.8871,0.878,599.53,3.3137,2254.84
3,16,0.59,0.54,0.922,0.8733,0.8934,0.8793,600.56,3.3092,2261.26
4,32,1.18,1.0701,0.9245,0.878,0.8919,0.8833,597.98,3.3303,2265.72


- quality improves gradually as r grows, but the gains are small: test F1 goes from 0.8717 at r=2 to 0.8793 at r=16 and 0.8833 at r=32

- r=32 gives the best F1 = 0.8833, but the gain over r=16 is marginal

- GPU memory usage barely grows (about 2255 to 2266 MB across all ranks), which makes LoRA very economical

- even at r=32 only 1.07% of the model's parameters are trained (~1.18M out of 110M)

- if efficiency matters, r=8 or r=16 can be considered optimal

- r=2 is the lightest option, but loses about 1.2 percentage points of F1 compared with r=32

- in the end I would still pick r=32 as the final option, since its metrics are better, even if only slightly
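
The trainable-parameter counts printed for each rank grow linearly with r, and they can be reproduced by hand. The sketch assumes that LoRA is attached to the query and value projections of every layer (an assumption about the default target modules, chosen because it matches the printed counts) and that the classifier head is trained alongside the adapters:

```python
hidden_size = 768
num_layers = 12
adapted_per_layer = 2              # assumption: query and value projections
head_params = hidden_size * 6 + 6  # classifier Linear(768, 6)


def lora_trainable_params(r):
    """Each adapted 768x768 weight gets two factors: A (r x 768) and B (768 x r)."""
    per_module = 2 * hidden_size * r
    return num_layers * adapted_per_layer * per_module + head_params


# Counts copied from the print_trainable_parameters() output above.
reported = {2: 78_342, 4: 152_070, 8: 299_526, 16: 594_438, 32: 1_184_262}
for r, value in reported.items():
    print(f"r={r:>2}: computed={lora_trainable_params(r):>9,} reported={value:>9,}")
```

Doubling r roughly doubles the adapter size, while the measured training time and memory stay nearly flat, which is consistent with the table above.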

# 6. Final comparison

In [None]:
data = {
    "method": ["no finetuning", "finetuning", "linear probing", "prefix tuning", "prompt tuning", "LoRA_r32"],
    "accuracy": [0.5775, 0.9235, 0.4645, 0.3820, 0.4750, 0.9245],
    "recall": [0.3039, 0.8891, 0.2388, 0.1901, 0.2989, 0.8780],
    "precision": [0.2751, 0.8951, 0.1532, 0.1284, 0.2231, 0.8919],
    "f1": [0.2425, 0.8823, 0.1858, 0.1400, 0.2508, 0.8833],
    "trainable_params_M": [109.49, 109.49, 0.00, 0.37, 0.02, 1.18],
    "cuda_mem_MB": [1334.25, 1334.25, 1774.06, 2256.85, 1804.14, 2265.72],
    "training_time_sec": [None, 1127.64, 315.08, 581.81, 780.66, 597.98],
}

result_df = pd.DataFrame(data)
result_df = (
    result_df.style.format(precision=4)
    .highlight_max(subset=["accuracy", "recall", "precision", "f1"], color="green")
    .highlight_min(subset=["trainable_params_M", "cuda_mem_MB", "training_time_sec"], color="green")
)
result_df

Unnamed: 0,method,accuracy,recall,precision,f1,trainable_params_M,cuda_mem_MB,training_time_sec
0,no finetuning,0.5775,0.3039,0.2751,0.2425,109.49,1334.25,
1,finetuning,0.9235,0.8891,0.8951,0.8823,109.49,1334.25,1127.64
2,linear probing,0.4645,0.2388,0.1532,0.1858,0.0,1774.06,315.08
3,prefix tuning,0.382,0.1901,0.1284,0.14,0.37,2256.85,581.81
4,prompt tuning,0.475,0.2989,0.2231,0.2508,0.02,1804.14,780.66
5,LoRA_r32,0.9245,0.878,0.8919,0.8833,1.18,2265.72,597.98


### Results in brief:
- LoRA_r32 gave the best accuracy and F1, marginally above full fine-tuning (accuracy 0.9245 vs 0.9235, F1 0.8833 vs 0.8823), while full fine-tuning kept slightly higher recall (0.8891 vs 0.8780) and precision (0.8951 vs 0.8919). LoRA also trains almost twice as fast (598 s vs 1128 s) and needs roughly 93 times fewer trainable parameters (1.18M vs 109.49M).

- full fine-tuning is also strong, but expensive and heavy: many parameters and a long training time.

- prompt tuning turned out better than linear probing and prefix tuning, but its F1 is still only 0.25, which is weak.

- prefix tuning and linear probing are the worst on all metrics, precision in particular: 0.128 and 0.153 respectively.

- surprisingly, no fine-tuning gave a higher F1 than prefix tuning and linear probing. Taken at face value, leaving the model untouched beats these two methods; note, though, that the baseline's classification head is randomly initialized, so its score is not a stable reference. The alternative is to tune the hyperparameters of these methods carefully, since I used the same settings for all of them.

- overall, LoRA performed best: almost on par with full fine-tuning, but much lighter on resources.
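
The headline ratios between full fine-tuning and LoRA can be recomputed from the comparison table. A short sketch with the values copied by hand from the table above:

```python
# Values copied from the final comparison table.
full_ft = {"f1": 0.8823, "params_M": 109.49, "time_s": 1127.64}
lora_r32 = {"f1": 0.8833, "params_M": 1.18, "time_s": 597.98}

speedup = full_ft["time_s"] / lora_r32["time_s"]
param_ratio = full_ft["params_M"] / lora_r32["params_M"]
f1_gap_pp = (lora_r32["f1"] - full_ft["f1"]) * 100  # percentage points

print(f"training speedup:          {speedup:.2f}x")
print(f"trainable parameter ratio: {param_ratio:.1f}x")
print(f"F1 gap (LoRA - full FT):   {f1_gap_pp:.2f} pp")
```

A gap of about 0.1 percentage points from a single seed is small enough that the two methods are best treated as equal in quality here; the difference in cost is the more robust result.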