# Final Submission Notebook

> Authors: Yukang Luo, Zhilin Zhang, Yumeng Qian\
NetID: yl13427, zz10068, yq2480\
Team Name: LoRA Is All You Need

In [1]:
# Imports and Setup
import os
import sys
from argparse import Namespace
import torch

import train
import utils
import config

print(f"Torch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")
    print(f"Device name: {torch.cuda.get_device_name(0)}")
print(f"Using device: {config.DEVICE}")

Torch version: 2.5.1+cu121
CUDA available: True
CUDA version: 12.1
Device name: NVIDIA H200
Using device: cuda


[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


## 1. Comparison of different adaptors

We compare four different adaptors under the settings below. The table summarizes the best model for each method on the AG News evaluation set:

|    **Method**     |    **Param Size**    |          **Target Modules**          |  **r**  |  **alpha**  |  **lr**  | **Train Epochs** | **Eval Loss** | **Accuracy (%)** |
|-------------------|----------------------|--------------------------------------|---------|-------------|----------|------------------|---------------|------------------|
|       LoRA        |        888,580       |             query, value             |    8    |      16     |   2e-4   |        3         |   **0.2184**  |     **92.19**    |
|      AdaLoRA      |        925,660       | query, value, attention.output.dense |   6→4   |      2      |   2e-4   |        3         |     0.2669    |       90.78      |
|       LoHa        |        888,580       |             query, value             |    4    |      8      |   2e-4   |        3         |     0.2536    |       91.09      |
|       LoKr        |        632,836       |           query, key, value          |    8    |      24     |   2e-4   |        3         |     0.2536    |       91.41      |
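
The parameter sizes in the table can be reproduced with a little arithmetic. The sketch below is a minimal reconstruction, not the code used by `train.py`: the per-module shapes (two factors for LoRA, four for LoHa, a 24×24 full factor plus a rank-`r` pair on a 32×32 factor for LoKr, and an SVD-style triplet at the initial rank for AdaLoRA) are our assumptions about how PEFT decomposes a 768×768 projection, and they are checked only against the trainable-parameter counts printed in the logs. All function names are hypothetical.

```python
# Parameter-count sketch for roberta-base adapters.
# The decomposition shapes are assumptions, validated only against the logged totals.
HIDDEN = 768      # roberta-base hidden size
LAYERS = 12       # roberta-base encoder layers
NUM_LABELS = 4    # AG News classes


def classifier_head_params() -> int:
    """Trainable classification head: dense (768x768 + bias) and out_proj (768x4 + bias)."""
    dense = HIDDEN * HIDDEN + HIDDEN
    out_proj = HIDDEN * NUM_LABELS + NUM_LABELS
    return dense + out_proj


def lora_params(r: int, modules_per_layer: int) -> int:
    """LoRA: one (r x d) and one (d x r) matrix per adapted projection."""
    return LAYERS * modules_per_layer * (2 * HIDDEN * r)


def loha_params(r: int, modules_per_layer: int) -> int:
    """LoHa: two low-rank pairs per projection, i.e. four (d x r) factors."""
    return LAYERS * modules_per_layer * (4 * HIDDEN * r)


def lokr_params(r: int, modules_per_layer: int, factors=(24, 32)) -> int:
    """LoKr (assumed 768 = 24 * 32): a full f1 x f1 factor plus a rank-r pair for f2 x f2."""
    f1, f2 = factors
    return LAYERS * modules_per_layer * (f1 * f1 + 2 * f2 * r)


def adalora_params(init_r: int, modules_per_layer: int) -> int:
    """AdaLoRA: P (d x r), Q (r x d) and r singular values, at the initial rank."""
    return LAYERS * modules_per_layer * (2 * HIDDEN * init_r + init_r)


head = classifier_head_params()
counts = {
    "LoRA": lora_params(r=8, modules_per_layer=2) + head,
    "LoHa": loha_params(r=4, modules_per_layer=2) + head,
    "LoKr": lokr_params(r=8, modules_per_layer=3) + head,
    "AdaLoRA": adalora_params(init_r=6, modules_per_layer=3) + head,
}

for name, n in counts.items():
    print(f"{name:8s} {n:,}")
```

Under these assumptions the classification head accounts for 593,668 of the trainable parameters in every run, which is why LoRA with `r=8` and LoHa with `r=4` end up with the same total.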

### 1.1. LoRA

In [2]:
from argparse import Namespace
import os

args_lora = Namespace(
    output_dir="results_lora_qv_r8_a16_lr2e-4",
    seed=42,
    peft_method="lora",
    target_modules=["query", "value"],
    lora_r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    learning_rate=2e-4,
    num_train_epochs=3,
    train_batch_size=128,
    eval_batch_size=128,
    optimizer="adamw_torch"
)

os.makedirs(args_lora.output_dir, exist_ok=True)

print("Parameters:")
for k, v in vars(args_lora).items():
    print(f"  {k}: {v}")

print("=== START TRAINING ===")
final_accuracy = train.main_train(args_lora)
print("=== TRAINING FINISHED ===")


Parameters:
  output_dir: results_lora_qv_r8_a16_lr2e-4
  seed: 42
  peft_method: lora
  target_modules: ['query', 'value']
  lora_r: 8
  lora_alpha: 16
  lora_dropout: 0.1
  learning_rate: 0.0002
  num_train_epochs: 3
  train_batch_size: 128
  eval_batch_size: 128
  optimizer: adamw_torch
=== START TRAINING ===
Starting training process with PEFT method: lora
Arguments:
{'output_dir': 'results_lora_qv_r8_a16_lr2e-4', 'seed': 42, 'peft_method': 'lora', 'target_modules': ['query', 'value'], 'lora_r': 8, 'lora_alpha': 16, 'lora_dropout': 0.1, 'learning_rate': 0.0002, 'num_train_epochs': 3, 'train_batch_size': 128, 'eval_batch_size': 128, 'optimizer': 'adamw_torch'}
Using device: cuda
Set seed to 42
Loading tokenizer for model: roberta-base



Loading dataset: ag_news, split: train



Text cleaning completed for text column.
Number of labels: 4
Label names: ['World', 'Sports', 'Business', 'Sci/Tech']



Train dataset size: 119360
Eval dataset size: 640
Calculated total training steps: 2799
Loading base model: roberta-base for 4 labels.


Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`



Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
  trainer = Trainer(


Creating PEFT model using method: lora
  Configuring LoRA with: r=8, alpha=16, dropout=0.1
PEFT model created with LORA config.
  Target modules: ['query', 'value']
Trainable params: 888580 || All params: 125537288 || Trainable %: 0.71

Trainable parameters (888580) are within the limit of 1000000.


No label_names provided for model class `PeftModelForSequenceClassification`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


Starting PEFT model training using lora...


| Step | Training Loss | Validation Loss | Accuracy |
|------|---------------|-----------------|----------|
| 233  | 0.4268 | 0.276669 | 0.9      |
| 466  | 0.2711 | 0.267466 | 0.90625  |
| 699  | 0.2492 | 0.240532 | 0.910937 |
| 932  | 0.2493 | 0.231006 | 0.915625 |
| 1165 | 0.2333 | 0.249506 | 0.914062 |
| 1398 | 0.2256 | 0.231883 | 0.910937 |
| 1631 | 0.2264 | 0.220692 | 0.920312 |
| 1864 | 0.2237 | 0.22906  | 0.914062 |
| 2097 | 0.2091 | 0.223503 | 0.920312 |
| 2330 | 0.2136 | 0.218859 | 0.91875  |


Evaluation Accuracy: 0.9000
Evaluation Accuracy: 0.9062
Evaluation Accuracy: 0.9109
Evaluation Accuracy: 0.9156
Evaluation Accuracy: 0.9141
Evaluation Accuracy: 0.9109
Evaluation Accuracy: 0.9203
Evaluation Accuracy: 0.9141
Evaluation Accuracy: 0.9203
Evaluation Accuracy: 0.9187
Evaluation Accuracy: 0.9219
Evaluation Accuracy: 0.9219

Callback: Saving final model checkpoint (end of training) to results_lora_qv_r8_a16_lr2e-4/last_checkpoint
Callback: Final model checkpoint saved successfully to results_lora_qv_r8_a16_lr2e-4/last_checkpoint
***** train metrics *****
  epoch                    =        3.0
  total_flos               = 88656280GF
  train_loss               =      0.246
  train_runtime            = 0:08:39.93
  train_samples_per_second =    688.707
  train_steps_per_second   =      5.383

Saving BEST model checkpoint identified by Trainer to results_lora_qv_r8_a16_lr2e-4/best_checkpoint

Evaluating the final best model on the evaluation set...



Starting Evaluating on 640 samples...


Evaluating: 100%|██████████| 5/5 [00:00<00:00,  9.32it/s]


Evaluation Metric (accuracy): {'accuracy': 0.921875}
Final Evaluation Metrics (Best Model): {'accuracy': 0.921875}
***** eval_final_best metrics *****
  accuracy = 0.9219

Generating curves plot from callback logs...
Plotting metrics from 13 points in metrics_log.jsonl...
Dual-axis plot saved to results_lora_qv_r8_a16_lr2e-4/training_curves_dual_axis.png
Plot generation complete.

TRAINING COMPLETED
Best model validation accuracy (evaluated at end): 92.19%
Best model checkpoint saved to: results_lora_qv_r8_a16_lr2e-4/best_checkpoint
Fractional epoch metrics logged to: results_lora_qv_r8_a16_lr2e-4/metrics_log.jsonl
Training curves plot saved to: results_lora_qv_r8_a16_lr2e-4/training_curves.png

=== TRAINING FINISHED ===
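
The step counts in the log above follow from the dataset split and batch size. The sketch below is a rough check rather than the logic inside `train.py`: it assumes a single device, no gradient accumulation, and that the last partial batch of each epoch is kept. The evaluation interval of 233 steps is read off the log, not derived.

```python
import math

TRAIN_EXAMPLES = 119_360   # "Train dataset size" from the log
EVAL_EXAMPLES = 640        # "Eval dataset size" from the log
BATCH_SIZE = 128
EPOCHS = 3
EVAL_INTERVAL = 233        # spacing between rows of the step table in the log


def total_training_steps(num_examples: int, batch_size: int, epochs: int) -> int:
    """Optimizer steps when the final partial batch of each epoch is kept."""
    return math.ceil(num_examples / batch_size) * epochs


steps_per_epoch = math.ceil(TRAIN_EXAMPLES / BATCH_SIZE)
total_steps = total_training_steps(TRAIN_EXAMPLES, BATCH_SIZE, EPOCHS)
num_periodic_evals = total_steps // EVAL_INTERVAL

print(f"train + eval examples: {TRAIN_EXAMPLES + EVAL_EXAMPLES}")
print(f"steps per epoch:       {steps_per_epoch}")
print(f"total steps:           {total_steps}")
print(f"periodic evaluations:  {num_periodic_evals}")
```

The interval of 233 steps is roughly a quarter of an epoch, and twelve periodic evaluations agree with the twelve `Evaluation Accuracy` lines printed during training.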


### 1.2. LoHa

In [3]:
args_loha = Namespace(
    output_dir="results_loha_qv_r4_a8_lr2e-4",
    seed=42,
    peft_method="loha",
    target_modules=["query", "value"],
    lora_r=4,
    lora_alpha=8,
    rank_dropout=0.1,
    module_dropout=0.1,
    learning_rate=2e-4,
    num_train_epochs=3,
    train_batch_size=128,
    eval_batch_size=128,
    optimizer="adamw_torch"
)

os.makedirs(args_loha.output_dir, exist_ok=True)

print("Parameters:")
for k, v in vars(args_loha).items():
    print(f"  {k}: {v}")

print("=== START TRAINING ===")
final_accuracy = train.main_train(args_loha)
print("=== TRAINING FINISHED ===")


Parameters:
  output_dir: results_loha_qv_r4_a8_lr2e-4
  seed: 42
  peft_method: loha
  target_modules: ['query', 'value']
  lora_r: 4
  lora_alpha: 8
  rank_dropout: 0.1
  module_dropout: 0.1
  learning_rate: 0.0002
  num_train_epochs: 3
  train_batch_size: 128
  eval_batch_size: 128
  optimizer: adamw_torch
=== START TRAINING ===
Starting training process with PEFT method: loha
Arguments:
{'output_dir': 'results_loha_qv_r4_a8_lr2e-4', 'seed': 42, 'peft_method': 'loha', 'target_modules': ['query', 'value'], 'lora_r': 4, 'lora_alpha': 8, 'rank_dropout': 0.1, 'module_dropout': 0.1, 'learning_rate': 0.0002, 'num_train_epochs': 3, 'train_batch_size': 128, 'eval_batch_size': 128, 'optimizer': 'adamw_torch'}
Using device: cuda
Set seed to 42
Loading tokenizer for model: roberta-base
Loading dataset: ag_news, split: train
Text cleaning completed for text column.
Number of labels: 4
Label names: ['World', 'Sports', 'Business', 'Sci/Tech']


Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Train dataset size: 119360
Eval dataset size: 640
Calculated total training steps: 2799
Loading base model: roberta-base for 4 labels.
Creating PEFT model using method: loha
  Configuring LoHa with: r=4, alpha=8, rank_dropout=0.1, module_dropout=0.1
PEFT model created with LOHA config.
  Target modules: ['query', 'value']
Trainable params: 888580 || All params: 125537288 || Trainable %: 0.71

Trainable parameters (888580) are within the limit of 1000000.


  trainer = Trainer(
No label_names provided for model class `PeftModelForSequenceClassification`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


Starting PEFT model training using loha...


| Step | Training Loss | Validation Loss | Accuracy |
|------|---------------|-----------------|----------|
| 233  | 0.7519 | 0.315542 | 0.895312 |
| 466  | 0.3187 | 0.287417 | 0.901563 |
| 699  | 0.2951 | 0.274312 | 0.904687 |
| 932  | 0.297  | 0.265927 | 0.904687 |
| 1165 | 0.286  | 0.270026 | 0.90625  |
| 1398 | 0.274  | 0.265025 | 0.90625  |
| 1631 | 0.2723 | 0.26596  | 0.903125 |
| 1864 | 0.2739 | 0.259061 | 0.907813 |
| 2097 | 0.2625 | 0.256121 | 0.909375 |
| 2330 | 0.266  | 0.256532 | 0.904687 |


Evaluation Accuracy: 0.8953
Evaluation Accuracy: 0.9016
Evaluation Accuracy: 0.9047
Evaluation Accuracy: 0.9047
Evaluation Accuracy: 0.9062
Evaluation Accuracy: 0.9062
Evaluation Accuracy: 0.9031
Evaluation Accuracy: 0.9078
Evaluation Accuracy: 0.9094
Evaluation Accuracy: 0.9047
Evaluation Accuracy: 0.9094
Evaluation Accuracy: 0.9109

Callback: Saving final model checkpoint (end of training) to results_loha_qv_r4_a8_lr2e-4/last_checkpoint
Callback: Final model checkpoint saved successfully to results_loha_qv_r4_a8_lr2e-4/last_checkpoint
***** train metrics *****
  epoch                    =        3.0
  total_flos               = 88656280GF
  train_loss               =     0.3196
  train_runtime            = 0:09:03.97
  train_samples_per_second =    658.268
  train_steps_per_second   =      5.145

Saving BEST model checkpoint identified by Trainer to results_loha_qv_r4_a8_lr2e-4/best_checkpoint

Evaluating the final best model on the evaluation set...
Starting Evaluating on 640 sample

Evaluating: 100%|██████████| 5/5 [00:00<00:00,  8.77it/s]


Evaluation Metric (accuracy): {'accuracy': 0.9109375}
Final Evaluation Metrics (Best Model): {'accuracy': 0.9109375}
***** eval_final_best metrics *****
  accuracy = 0.9109

Generating curves plot from callback logs...
Plotting metrics from 13 points in metrics_log.jsonl...
Dual-axis plot saved to results_loha_qv_r4_a8_lr2e-4/training_curves_dual_axis.png
Plot generation complete.

TRAINING COMPLETED
Best model validation accuracy (evaluated at end): 91.09%
Best model checkpoint saved to: results_loha_qv_r4_a8_lr2e-4/best_checkpoint
Fractional epoch metrics logged to: results_loha_qv_r4_a8_lr2e-4/metrics_log.jsonl
Training curves plot saved to: results_loha_qv_r4_a8_lr2e-4/training_curves.png

=== TRAINING FINISHED ===


### 1.3. LoKr

In [4]:
args_lokr = Namespace(
    output_dir="results_lokr_qkv_r8_a24_lr2e-4",
    seed=42,
    peft_method="lokr",
    target_modules=["query", "key", "value"],
    lora_r=8,
    lora_alpha=24,
    rank_dropout=0.1,
    module_dropout=0.1,
    learning_rate=2e-4,
    num_train_epochs=3,
    train_batch_size=128,
    eval_batch_size=128,
    optimizer="adamw_torch"
)

os.makedirs(args_lokr.output_dir, exist_ok=True)

print("Parameters:")
for k, v in vars(args_lokr).items():
    print(f"  {k}: {v}")

print("=== START TRAINING ===")
final_accuracy = train.main_train(args_lokr)
print("=== TRAINING FINISHED ===")


Parameters:
  output_dir: results_lokr_qkv_r8_a24_lr2e-4
  seed: 42
  peft_method: lokr
  target_modules: ['query', 'key', 'value']
  lora_r: 8
  lora_alpha: 24
  rank_dropout: 0.1
  module_dropout: 0.1
  learning_rate: 0.0002
  num_train_epochs: 3
  train_batch_size: 128
  eval_batch_size: 128
  optimizer: adamw_torch
=== START TRAINING ===
Starting training process with PEFT method: lokr
Arguments:
{'output_dir': 'results_lokr_qkv_r8_a24_lr2e-4', 'seed': 42, 'peft_method': 'lokr', 'target_modules': ['query', 'key', 'value'], 'lora_r': 8, 'lora_alpha': 24, 'rank_dropout': 0.1, 'module_dropout': 0.1, 'learning_rate': 0.0002, 'num_train_epochs': 3, 'train_batch_size': 128, 'eval_batch_size': 128, 'optimizer': 'adamw_torch'}
Using device: cuda
Set seed to 42
Loading tokenizer for model: roberta-base
Loading dataset: ag_news, split: train



Text cleaning completed for text column.
Number of labels: 4
Label names: ['World', 'Sports', 'Business', 'Sci/Tech']



Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Train dataset size: 119360
Eval dataset size: 640
Calculated total training steps: 2799
Loading base model: roberta-base for 4 labels.
Creating PEFT model using method: lokr
  Configuring LoKr with: r=8, alpha=24, rank_dropout=0.1, module_dropout=0.1
PEFT model created with LOKR config.
  Target modules: ['query', 'key', 'value']
Trainable params: 632836 || All params: 125281544 || Trainable %: 0.51

Trainable parameters (632836) are within the limit of 1000000.


  trainer = Trainer(
No label_names provided for model class `PeftModelForSequenceClassification`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


Starting PEFT model training using lokr...


| Step | Training Loss | Validation Loss | Accuracy |
|------|---------------|-----------------|----------|
| 233  | 0.6094 | 0.320631 | 0.892188 |
| 466  | 0.306  | 0.293053 | 0.898438 |
| 699  | 0.2852 | 0.285068 | 0.898438 |
| 932  | 0.2884 | 0.269037 | 0.901563 |
| 1165 | 0.2723 | 0.276444 | 0.904687 |
| 1398 | 0.2616 | 0.264061 | 0.907813 |
| 1631 | 0.2637 | 0.260772 | 0.90625  |
| 1864 | 0.2627 | 0.261415 | 0.90625  |
| 2097 | 0.2504 | 0.255838 | 0.907813 |
| 2330 | 0.2545 | 0.257528 | 0.910937 |


Evaluation Accuracy: 0.8922
Evaluation Accuracy: 0.8984
Evaluation Accuracy: 0.8984
Evaluation Accuracy: 0.9016
Evaluation Accuracy: 0.9047
Evaluation Accuracy: 0.9078
Evaluation Accuracy: 0.9062
Evaluation Accuracy: 0.9062
Evaluation Accuracy: 0.9078
Evaluation Accuracy: 0.9109
Evaluation Accuracy: 0.9141
Evaluation Accuracy: 0.9109

Callback: Saving final model checkpoint (end of training) to results_lokr_qkv_r8_a24_lr2e-4/last_checkpoint
Callback: Final model checkpoint saved successfully to results_lokr_qkv_r8_a24_lr2e-4/last_checkpoint
***** train metrics *****
  epoch                    =        3.0
  total_flos               = 88394276GF
  train_loss               =     0.2971
  train_runtime            = 0:09:53.02
  train_samples_per_second =    603.815
  train_steps_per_second   =       4.72

Saving BEST model checkpoint identified by Trainer to results_lokr_qkv_r8_a24_lr2e-4/best_checkpoint

Evaluating the final best model on the evaluation set...



Starting Evaluating on 640 samples...


Evaluating: 100%|██████████| 5/5 [00:00<00:00,  8.16it/s]


Evaluation Metric (accuracy): {'accuracy': 0.9140625}
Final Evaluation Metrics (Best Model): {'accuracy': 0.9140625}
***** eval_final_best metrics *****
  accuracy = 0.9141

Generating curves plot from callback logs...
Plotting metrics from 13 points in metrics_log.jsonl...
Dual-axis plot saved to results_lokr_qkv_r8_a24_lr2e-4/training_curves_dual_axis.png
Plot generation complete.

TRAINING COMPLETED
Best model validation accuracy (evaluated at end): 91.41%
Best model checkpoint saved to: results_lokr_qkv_r8_a24_lr2e-4/best_checkpoint
Fractional epoch metrics logged to: results_lokr_qkv_r8_a24_lr2e-4/metrics_log.jsonl
Training curves plot saved to: results_lokr_qkv_r8_a24_lr2e-4/training_curves.png

=== TRAINING FINISHED ===


### 1.4. AdaLoRA

In [5]:
args_adalora = Namespace(
    output_dir="results_adalora_qvd_r4-6_a2_lr2e-4",
    seed=42,
    peft_method="adalora",
    target_modules=["query", "value", "attention.output.dense"],
    lora_r=4,
    lora_alpha=2,
    lora_dropout=0.1,
    adalora_init_r=6,
    adalora_tinit=0,
    adalora_tfinal=0,
    adalora_deltaT=1,
    adalora_beta1=0.85,
    adalora_beta2=0.85,
    learning_rate=2e-4,
    num_train_epochs=3,
    train_batch_size=128,
    eval_batch_size=128,
    optimizer="adamw_torch"
)

os.makedirs(args_adalora.output_dir, exist_ok=True)

print("Parameters:")
for k, v in vars(args_adalora).items():
    print(f"  {k}: {v}")

# launch training
print("=== START TRAINING ===")
final_accuracy = train.main_train(args_adalora)
print("=== TRAINING FINISHED ===")

Parameters:
  output_dir: results_adalora_qvd_r4-6_a2_lr2e-4
  seed: 42
  peft_method: adalora
  target_modules: ['query', 'value', 'attention.output.dense']
  lora_r: 4
  lora_alpha: 2
  lora_dropout: 0.1
  adalora_init_r: 6
  adalora_tinit: 0
  adalora_tfinal: 0
  adalora_deltaT: 1
  adalora_beta1: 0.85
  adalora_beta2: 0.85
  learning_rate: 0.0002
  num_train_epochs: 3
  train_batch_size: 128
  eval_batch_size: 128
  optimizer: adamw_torch
=== START TRAINING ===
Starting training process with PEFT method: adalora
Arguments:
{'output_dir': 'results_adalora_qvd_r4-6_a2_lr2e-4', 'seed': 42, 'peft_method': 'adalora', 'target_modules': ['query', 'value', 'attention.output.dense'], 'lora_r': 4, 'lora_alpha': 2, 'lora_dropout': 0.1, 'adalora_init_r': 6, 'adalora_tinit': 0, 'adalora_tfinal': 0, 'adalora_deltaT': 1, 'adalora_beta1': 0.85, 'adalora_beta2': 0.85, 'learning_rate': 0.0002, 'num_train_epochs': 3, 'train_batch_size': 128, 'eval_batch_size': 128, 'optimizer': 'adamw_torch'}
Using d

Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Train dataset size: 119360
Eval dataset size: 640
Calculated total training steps: 2799
Loading base model: roberta-base for 4 labels.
Creating PEFT model using method: adalora
  Configuring AdaLoRA with: target_r=4, init_r=6, dropout=0.1, tinit=0, tfinal=0, deltaT=1, beta1=0.85, beta2=0.85
PEFT model created with ADALORA config.
  Target modules: ['query', 'value', 'attention.output.dense']
Trainable params: 925660 || All params: 125574404 || Trainable %: 0.74

Trainable parameters (925660) are within the limit of 1000000.


  trainer = Trainer(
No label_names provided for model class `PeftModelForSequenceClassification`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


Starting PEFT model training using adalora...


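| Step | Training Loss | Validation Loss | Accuracy |
|------|---------------|-----------------|----------|
| 233  | 1.1501 | 0.447555 | 0.8625   |
| 466  | 0.3692 | 0.299145 | 0.892188 |
| 699  | 0.3174 | 0.280307 | 0.901563 |
| 932  | 0.3173 | 0.27347  | 0.898438 |
| 1165 | 0.3041 | 0.271067 | 0.904687 |
| 1398 | 0.286  | 0.266874 | 0.907813 |
| 1631 | 0.287  | 0.266712 | 0.901563 |
| 1864 | 0.2859 | 0.261858 | 0.9      |
| 2097 | 0.2718 | 0.259384 | 0.9      |
| 2330 | 0.2761 | 0.260991 | 0.907813 |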


Evaluation Accuracy: 0.8625
Evaluation Accuracy: 0.8922
Evaluation Accuracy: 0.9016
Evaluation Accuracy: 0.8984
Evaluation Accuracy: 0.9047
Evaluation Accuracy: 0.9078
Evaluation Accuracy: 0.9016
Evaluation Accuracy: 0.9000
Evaluation Accuracy: 0.9000
Evaluation Accuracy: 0.9078
Evaluation Accuracy: 0.9047
Evaluation Accuracy: 0.9031

Callback: Saving final model checkpoint (end of training) to results_adalora_qvd_r4-6_a2_lr2e-4/last_checkpoint
Callback: Final model checkpoint saved successfully to results_adalora_qvd_r4-6_a2_lr2e-4/last_checkpoint
***** train metrics *****
  epoch                    =        3.0
  total_flos               = 88694304GF
  train_loss               =     0.3687
  train_runtime            = 0:11:40.33
  train_samples_per_second =    511.295
  train_steps_per_second   =      3.997

Saving BEST model checkpoint identified by Trainer to results_adalora_qvd_r4-6_a2_lr2e-4/best_checkpoint

Evaluating the final best model on the evaluation set...
Starting Evalua

Evaluating: 100%|██████████| 5/5 [00:00<00:00,  7.79it/s]


Evaluation Metric (accuracy): {'accuracy': 0.9078125}
Final Evaluation Metrics (Best Model): {'accuracy': 0.9078125}
***** eval_final_best metrics *****
  accuracy = 0.9078

Generating curves plot from callback logs...
Plotting metrics from 13 points in metrics_log.jsonl...
Dual-axis plot saved to results_adalora_qvd_r4-6_a2_lr2e-4/training_curves_dual_axis.png
Plot generation complete.

TRAINING COMPLETED
Best model validation accuracy (evaluated at end): 90.78%
Best model checkpoint saved to: results_adalora_qvd_r4-6_a2_lr2e-4/best_checkpoint
Fractional epoch metrics logged to: results_adalora_qvd_r4-6_a2_lr2e-4/metrics_log.jsonl
Training curves plot saved to: results_adalora_qvd_r4-6_a2_lr2e-4/training_curves.png

=== TRAINING FINISHED ===
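
AdaLoRA starts every adapted matrix at `init_r=6` and prunes toward the target rank of 4 during training. The sketch below illustrates the cubic budget schedule described in the AdaLoRA paper for the settings used here (`tinit=0`, `tfinal=0`, 2799 total steps). It is an illustration under assumptions, not PEFT's exact code: the function name is hypothetical, the budget is assumed to be measured as the total rank over the 36 adapted matrices (12 layers × 3 target modules), and the rounding is our choice.

```python
def budget_schedule(step: int, total_steps: int, init_budget: int,
                    target_budget: int, tinit: int = 0, tfinal: int = 0) -> int:
    """Cubic decay of the total rank budget from init_budget to target_budget."""
    if step <= tinit:
        return init_budget            # warm-up: keep the full initial rank
    if step > total_steps - tfinal:
        return target_budget          # final phase: budget is frozen at the target
    progress = (step - tinit) / (total_steps - tfinal - tinit)
    return int((init_budget - target_budget) * (1 - progress) ** 3 + target_budget)


NUM_MATRICES = 12 * 3                 # 12 layers x (query, value, attention.output.dense)
INIT_BUDGET = 6 * NUM_MATRICES        # init_r = 6
TARGET_BUDGET = 4 * NUM_MATRICES      # target r = 4
TOTAL_STEPS = 2799

schedule = [budget_schedule(s, TOTAL_STEPS, INIT_BUDGET, TARGET_BUDGET)
            for s in range(TOTAL_STEPS + 1)]

for s in (0, 233, 933, 1866, 2799):
    print(f"step {s:4d}: total rank budget {schedule[s]}")
```

Because the decay is cubic, most of the pruning happens early in training, so under this schedule the model spends the majority of the three epochs close to the target rank.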


## 2. Ablation Study of $r$ and $\alpha$

### 2.1. Fix $r=8$, $\alpha \in \{4, 8, 16, 32\}$
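
With the rank fixed, varying $\alpha$ only changes the factor that multiplies the low-rank update. The sketch below assumes the standard LoRA scaling $\alpha / r$ (not the rank-stabilized variant) and lists the effective scale for each value in this ablation. The helper name is hypothetical.

```python
def lora_scaling(alpha: float, r: int) -> float:
    """Standard LoRA scale applied to the low-rank update: delta_W = (alpha / r) * B @ A."""
    return alpha / r


R = 8
ALPHAS = [4, 8, 16, 32]
scalings = [lora_scaling(alpha, R) for alpha in ALPHAS]

for alpha, scale in zip(ALPHAS, scalings):
    print(f"r={R}, alpha={alpha:2d} -> update scale {scale}")
```

Under this assumption the ablation covers update scales from 0.5 to 4.0, so doubling $\alpha$ acts much like doubling the effective step size of the adapter weights.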

In [6]:
for alpha in [4, 8, 16, 32]:
    args = Namespace(
        output_dir=f"ablation_lora_qv_r8_a{alpha}_lr2e-4",
        seed=42,
        peft_method="lora",
        target_modules=["query", "value"],
        lora_r=8,
        lora_alpha=alpha,
        lora_dropout=0.1,
        learning_rate=2e-4,
        num_train_epochs=3,
        train_batch_size=128,
        eval_batch_size=128,
        optimizer="adamw_torch"
    )
    os.makedirs(args.output_dir, exist_ok=True)
    print("Parameters:", vars(args))
    print(f"=== START TRAINING r=8 alpha={alpha} ===")
    final_accuracy = train.main_train(args)
    print(f"=== TRAINING FINISHED r=8 alpha={alpha} ===")

Parameters: {'output_dir': 'ablation_lora_qv_r8_a4_lr2e-4', 'seed': 42, 'peft_method': 'lora', 'target_modules': ['query', 'value'], 'lora_r': 8, 'lora_alpha': 4, 'lora_dropout': 0.1, 'learning_rate': 0.0002, 'num_train_epochs': 3, 'train_batch_size': 128, 'eval_batch_size': 128, 'optimizer': 'adamw_torch'}
=== START TRAINING r=8 alpha=4 ===
Starting training process with PEFT method: lora
Arguments:
{'output_dir': 'ablation_lora_qv_r8_a4_lr2e-4', 'seed': 42, 'peft_method': 'lora', 'target_modules': ['query', 'value'], 'lora_r': 8, 'lora_alpha': 4, 'lora_dropout': 0.1, 'learning_rate': 0.0002, 'num_train_epochs': 3, 'train_batch_size': 128, 'eval_batch_size': 128, 'optimizer': 'adamw_torch'}
Using device: cuda
Set seed to 42
Loading tokenizer for model: roberta-base
Loading dataset: ag_news, split: train
Text cleaning completed for text column.
Number of labels: 4
Label names: ['World', 'Sports', 'Business', 'Sci/Tech']


Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Train dataset size: 119360
Eval dataset size: 640
Calculated total training steps: 2799
Loading base model: roberta-base for 4 labels.
Creating PEFT model using method: lora
  Configuring LoRA with: r=8, alpha=4, dropout=0.1
PEFT model created with LORA config.
  Target modules: ['query', 'value']
Trainable params: 888580 || All params: 125537288 || Trainable %: 0.71

Trainable parameters (888580) are within the limit of 1000000.


  trainer = Trainer(
No label_names provided for model class `PeftModelForSequenceClassification`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


Starting PEFT model training using lora...


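| Step | Training Loss | Validation Loss | Accuracy |
|------|---------------|-----------------|----------|
| 233  | 0.492  | 0.289497 | 0.9      |
| 466  | 0.2822 | 0.274486 | 0.895312 |
| 699  | 0.2602 | 0.251587 | 0.910937 |
| 932  | 0.2609 | 0.238627 | 0.917188 |
| 1165 | 0.2477 | 0.252253 | 0.9125   |
| 1398 | 0.2365 | 0.235606 | 0.9125   |
| 1631 | 0.2397 | 0.230315 | 0.91875  |
| 1864 | 0.2384 | 0.23698  | 0.91875  |
| 2097 | 0.2261 | 0.231416 | 0.921875 |
| 2330 | 0.2292 | 0.226843 | 0.915625 |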


Evaluation Accuracy: 0.9000
Evaluation Accuracy: 0.8953
Evaluation Accuracy: 0.9109
Evaluation Accuracy: 0.9172
Evaluation Accuracy: 0.9125
Evaluation Accuracy: 0.9125
Evaluation Accuracy: 0.9187
Evaluation Accuracy: 0.9187
Evaluation Accuracy: 0.9219
Evaluation Accuracy: 0.9156
Evaluation Accuracy: 0.9172
Evaluation Accuracy: 0.9172

Callback: Saving final model checkpoint (end of training) to ablation_lora_qv_r8_a4_lr2e-4/last_checkpoint
Callback: Final model checkpoint saved successfully to ablation_lora_qv_r8_a4_lr2e-4/last_checkpoint
***** train metrics *****
  epoch                    =        3.0
  total_flos               = 88656280GF
  train_loss               =     0.2644
  train_runtime            = 0:08:39.78
  train_samples_per_second =    688.901
  train_steps_per_second   =      5.385

Saving BEST model checkpoint identified by Trainer to ablation_lora_qv_r8_a4_lr2e-4/best_checkpoint

Evaluating the final best model on the evaluation set...
Starting Evaluating on 640 sam

Evaluating: 100%|██████████| 5/5 [00:00<00:00,  9.17it/s]


Evaluation Metric (accuracy): {'accuracy': 0.921875}
Final Evaluation Metrics (Best Model): {'accuracy': 0.921875}
***** eval_final_best metrics *****
  accuracy = 0.9219

Generating curves plot from callback logs...
Plotting metrics from 13 points in metrics_log.jsonl...
Dual-axis plot saved to ablation_lora_qv_r8_a4_lr2e-4/training_curves_dual_axis.png
Plot generation complete.

TRAINING COMPLETED
Best model validation accuracy (evaluated at end): 92.19%
Best model checkpoint saved to: ablation_lora_qv_r8_a4_lr2e-4/best_checkpoint
Fractional epoch metrics logged to: ablation_lora_qv_r8_a4_lr2e-4/metrics_log.jsonl
Training curves plot saved to: ablation_lora_qv_r8_a4_lr2e-4/training_curves.png

=== TRAINING FINISHED r=8 alpha=4 ===
Parameters: {'output_dir': 'ablation_lora_qv_r8_a8_lr2e-4', 'seed': 42, 'peft_method': 'lora', 'target_modules': ['query', 'value'], 'lora_r': 8, 'lora_alpha': 8, 'lora_dropout': 0.1, 'learning_rate': 0.0002, 'num_train_epochs': 3, 'train_batch_size': 128, 

Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Train dataset size: 119360
Eval dataset size: 640
Calculated total training steps: 2799
Loading base model: roberta-base for 4 labels.
Creating PEFT model using method: lora
  Configuring LoRA with: r=8, alpha=8, dropout=0.1
PEFT model created with LORA config.
  Target modules: ['query', 'value']
Trainable params: 888580 || All params: 125537288 || Trainable %: 0.71

Trainable parameters (888580) are within the limit of 1000000.


  trainer = Trainer(
No label_names provided for model class `PeftModelForSequenceClassification`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


Starting PEFT model training using lora...


| Step | Training Loss | Validation Loss | Accuracy |
|------|---------------|-----------------|----------|
| 233  | 0.4558 | 0.286039 | 0.896875 |
| 466  | 0.2757 | 0.270226 | 0.903125 |
| 699  | 0.2534 | 0.246663 | 0.909375 |
| 932  | 0.2547 | 0.234369 | 0.915625 |
| 1165 | 0.2401 | 0.248195 | 0.914062 |
| 1398 | 0.2311 | 0.23093  | 0.910937 |
| 1631 | 0.2331 | 0.224824 | 0.921875 |
| 1864 | 0.2308 | 0.232998 | 0.915625 |
| 2097 | 0.2175 | 0.227985 | 0.917188 |
| 2330 | 0.2219 | 0.221494 | 0.915625 |


Evaluation Accuracy: 0.8969
Evaluation Accuracy: 0.9031
Evaluation Accuracy: 0.9094
Evaluation Accuracy: 0.9156
Evaluation Accuracy: 0.9141
Evaluation Accuracy: 0.9109
Evaluation Accuracy: 0.9219
Evaluation Accuracy: 0.9156
Evaluation Accuracy: 0.9172
Evaluation Accuracy: 0.9156
Evaluation Accuracy: 0.9156
Evaluation Accuracy: 0.9156

Callback: Saving final model checkpoint (end of training) to ablation_lora_qv_r8_a8_lr2e-4/last_checkpoint
Callback: Final model checkpoint saved successfully to ablation_lora_qv_r8_a8_lr2e-4/last_checkpoint
***** train metrics *****
  epoch                    =        3.0
  total_flos               = 88656280GF
  train_loss               =     0.2547
  train_runtime            = 0:08:40.32
  train_samples_per_second =    688.186
  train_steps_per_second   =      5.379

Saving BEST model checkpoint identified by Trainer to ablation_lora_qv_r8_a8_lr2e-4/best_checkpoint

Evaluating the final best model on the evaluation set...
Starting Evaluating on 640 sam

Evaluating: 100%|██████████| 5/5 [00:00<00:00,  9.29it/s]


Evaluation Metric (accuracy): {'accuracy': 0.921875}
Final Evaluation Metrics (Best Model): {'accuracy': 0.921875}
***** eval_final_best metrics *****
  accuracy = 0.9219

Generating curves plot from callback logs...
Plotting metrics from 13 points in metrics_log.jsonl...
Dual-axis plot saved to ablation_lora_qv_r8_a8_lr2e-4/training_curves_dual_axis.png
Plot generation complete.

TRAINING COMPLETED
Best model validation accuracy (evaluated at end): 92.19%
Best model checkpoint saved to: ablation_lora_qv_r8_a8_lr2e-4/best_checkpoint
Fractional epoch metrics logged to: ablation_lora_qv_r8_a8_lr2e-4/metrics_log.jsonl
Training curves plot saved to: ablation_lora_qv_r8_a8_lr2e-4/training_curves.png

=== TRAINING FINISHED r=8 alpha=8 ===
Parameters: {'output_dir': 'ablation_lora_qv_r8_a16_lr2e-4', 'seed': 42, 'peft_method': 'lora', 'target_modules': ['query', 'value'], 'lora_r': 8, 'lora_alpha': 16, 'lora_dropout': 0.1, 'learning_rate': 0.0002, 'num_train_epochs': 3, 'train_batch_size': 128

Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Train dataset size: 119360
Eval dataset size: 640
Calculated total training steps: 2799
Loading base model: roberta-base for 4 labels.
Creating PEFT model using method: lora
  Configuring LoRA with: r=8, alpha=16, dropout=0.1
PEFT model created with LORA config.
  Target modules: ['query', 'value']
Trainable params: 888580 || All params: 125537288 || Trainable %: 0.71

Trainable parameters (888580) are within the limit of 1000000.


  trainer = Trainer(
No label_names provided for model class `PeftModelForSequenceClassification`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


Starting PEFT model training using lora...


Step,Training Loss,Validation Loss,Accuracy
233,0.4268,0.276669,0.9
466,0.2711,0.267466,0.90625
699,0.2492,0.240532,0.910937
932,0.2493,0.231006,0.915625
1165,0.2333,0.249506,0.914062
1398,0.2256,0.231883,0.910937
1631,0.2264,0.220692,0.920312
1864,0.2237,0.22906,0.914062
2097,0.2091,0.223503,0.920312
2330,0.2136,0.218859,0.91875


Evaluation Accuracy: 0.9000
Evaluation Accuracy: 0.9062
Evaluation Accuracy: 0.9109
Evaluation Accuracy: 0.9156
Evaluation Accuracy: 0.9141
Evaluation Accuracy: 0.9109
Evaluation Accuracy: 0.9203
Evaluation Accuracy: 0.9141
Evaluation Accuracy: 0.9203
Evaluation Accuracy: 0.9187
Evaluation Accuracy: 0.9219
Evaluation Accuracy: 0.9219

Callback: Saving final model checkpoint (end of training) to ablation_lora_qv_r8_a16_lr2e-4/last_checkpoint
Callback: Final model checkpoint saved successfully to ablation_lora_qv_r8_a16_lr2e-4/last_checkpoint
***** train metrics *****
  epoch                    =        3.0
  total_flos               = 88656280GF
  train_loss               =      0.246
  train_runtime            = 0:08:41.19
  train_samples_per_second =    687.035
  train_steps_per_second   =       5.37

Saving BEST model checkpoint identified by Trainer to ablation_lora_qv_r8_a16_lr2e-4/best_checkpoint

Evaluating the final best model on the evaluation set...
Starting Evaluating on 640 

Evaluating: 100%|██████████| 5/5 [00:00<00:00,  9.25it/s]


Evaluation Metric (accuracy): {'accuracy': 0.921875}
Final Evaluation Metrics (Best Model): {'accuracy': 0.921875}
***** eval_final_best metrics *****
  accuracy = 0.9219

Generating curves plot from callback logs...
Plotting metrics from 13 points in metrics_log.jsonl...
Dual-axis plot saved to ablation_lora_qv_r8_a16_lr2e-4/training_curves_dual_axis.png
Plot generation complete.

TRAINING COMPLETED
Best model validation accuracy (evaluated at end): 92.19%
Best model checkpoint saved to: ablation_lora_qv_r8_a16_lr2e-4/best_checkpoint
Fractional epoch metrics logged to: ablation_lora_qv_r8_a16_lr2e-4/metrics_log.jsonl
Training curves plot saved to: ablation_lora_qv_r8_a16_lr2e-4/training_curves.png

=== TRAINING FINISHED r=8 alpha=16 ===
Parameters: {'output_dir': 'ablation_lora_qv_r8_a32_lr2e-4', 'seed': 42, 'peft_method': 'lora', 'target_modules': ['query', 'value'], 'lora_r': 8, 'lora_alpha': 32, 'lora_dropout': 0.1, 'learning_rate': 0.0002, 'num_train_epochs': 3, 'train_batch_size'

Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Train dataset size: 119360
Eval dataset size: 640
Calculated total training steps: 2799
Loading base model: roberta-base for 4 labels.
Creating PEFT model using method: lora
  Configuring LoRA with: r=8, alpha=32, dropout=0.1
PEFT model created with LORA config.
  Target modules: ['query', 'value']
Trainable params: 888580 || All params: 125537288 || Trainable %: 0.71

Trainable parameters (888580) are within the limit of 1000000.


  trainer = Trainer(
No label_names provided for model class `PeftModelForSequenceClassification`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


Starting PEFT model training using lora...


Step,Training Loss,Validation Loss,Accuracy
233,0.4026,0.265914,0.901563
466,0.2668,0.26687,0.907813
699,0.2451,0.238232,0.909375
932,0.2447,0.230542,0.910937
1165,0.2257,0.247841,0.910937
1398,0.2186,0.230885,0.909375
1631,0.2187,0.218924,0.91875
1864,0.2151,0.223378,0.9125
2097,0.1978,0.217808,0.925
2330,0.2032,0.216078,0.91875


Evaluation Accuracy: 0.9016
Evaluation Accuracy: 0.9078
Evaluation Accuracy: 0.9094
Evaluation Accuracy: 0.9109
Evaluation Accuracy: 0.9109
Evaluation Accuracy: 0.9094
Evaluation Accuracy: 0.9187
Evaluation Accuracy: 0.9125
Evaluation Accuracy: 0.9250
Evaluation Accuracy: 0.9187
Evaluation Accuracy: 0.9234
Evaluation Accuracy: 0.9219

Callback: Saving final model checkpoint (end of training) to ablation_lora_qv_r8_a32_lr2e-4/last_checkpoint
Callback: Final model checkpoint saved successfully to ablation_lora_qv_r8_a32_lr2e-4/last_checkpoint
***** train metrics *****
  epoch                    =        3.0
  total_flos               = 88656280GF
  train_loss               =     0.2366
  train_runtime            = 0:08:40.24
  train_samples_per_second =    688.297
  train_steps_per_second   =       5.38

Saving BEST model checkpoint identified by Trainer to ablation_lora_qv_r8_a32_lr2e-4/best_checkpoint

Evaluating the final best model on the evaluation set...
Starting Evaluating on 640 

Evaluating: 100%|██████████| 5/5 [00:00<00:00,  9.33it/s]


Evaluation Metric (accuracy): {'accuracy': 0.925}
Final Evaluation Metrics (Best Model): {'accuracy': 0.925}
***** eval_final_best metrics *****
  accuracy = 0.925

Generating curves plot from callback logs...
Plotting metrics from 13 points in metrics_log.jsonl...
Dual-axis plot saved to ablation_lora_qv_r8_a32_lr2e-4/training_curves_dual_axis.png
Plot generation complete.

TRAINING COMPLETED
Best model validation accuracy (evaluated at end): 92.50%
Best model checkpoint saved to: ablation_lora_qv_r8_a32_lr2e-4/best_checkpoint
Fractional epoch metrics logged to: ablation_lora_qv_r8_a32_lr2e-4/metrics_log.jsonl
Training curves plot saved to: ablation_lora_qv_r8_a32_lr2e-4/training_curves.png

=== TRAINING FINISHED r=8 alpha=32 ===
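
To read the three runs above side by side, note that standard LoRA scales its low-rank update by $\alpha / r$, so with $r = 8$ the sweep corresponds to scaling factors of 1, 2 and 4. The sketch below is only a summary aid: it assumes the default scaling rule (rank-stabilized LoRA is not enabled, which the logs do not state explicitly), and `best_accuracy`, `lora_scaling` and `rows` are illustrative names that do not exist in `train.py`. The accuracies are copied from the "Final Evaluation Metrics (Best Model)" lines above.

```python
# Best-checkpoint accuracies copied from the logs above (r fixed at 8).
EVAL_SIZE = 640
R = 8
best_accuracy = {8: 0.921875, 16: 0.921875, 32: 0.925}

def lora_scaling(alpha, r):
    """Standard LoRA scaling factor alpha / r (assumes rsLoRA is not enabled)."""
    return alpha / r

rows = []
for alpha, acc in best_accuracy.items():
    # Convert the accuracy back into a count of correctly classified samples.
    correct = round(acc * EVAL_SIZE)
    rows.append((alpha, lora_scaling(alpha, R), correct, acc))

for alpha, scale, correct, acc in rows:
    print(f"alpha={alpha:>2}  scale={scale:.1f}  correct={correct}/{EVAL_SIZE}  acc={acc:.2%}")
```

On this 640-sample evaluation set the gap between $\alpha = 32$ and the other two settings is two samples, so a single seed is probably not enough to rank the three values with confidence.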


### 2.2. Fix $\alpha=16$, $r \in [2, 4, 8]$
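
As a sanity check on the 1,000,000-parameter budget, the trainable parameter count for each rank can be estimated in closed form. The sketch below is a hedged reconstruction, not code from `train.py`: it assumes roberta-base dimensions (hidden size 768, 12 encoder layers), that LoRA adds an $r \times 768$ and a $768 \times r$ matrix to each query and value projection, and that the classification head (`classifier.dense` and `classifier.out_proj`) is fully trainable. `lora_trainable_params` is a hypothetical helper name. Under these assumptions the formula reproduces the "Trainable params" values that the training logs report for this sweep.

```python
HIDDEN = 768      # roberta-base hidden size
NUM_LAYERS = 12   # roberta-base encoder layers
NUM_LABELS = 4    # AG News classes
LIMIT = 1_000_000

def lora_trainable_params(r, targets_per_layer=2):
    # Each adapted 768x768 projection gets A (r x 768) and B (768 x r).
    adapter = NUM_LAYERS * targets_per_layer * r * (HIDDEN + HIDDEN)
    # Classification head: dense (768x768 + bias) and out_proj (768x4 + bias).
    head = (HIDDEN * HIDDEN + HIDDEN) + (HIDDEN * NUM_LABELS + NUM_LABELS)
    return adapter + head

counts = {r: lora_trainable_params(r) for r in (2, 4, 8)}
for r, n in counts.items():
    print(f"r={r}: {n:,} trainable parameters (within limit: {n <= LIMIT})")
```

Under the same formula each extra unit of rank costs 36,864 parameters, so a query/value-only configuration would stay under the limit up to $r = 11$.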

In [7]:
# Sweep the LoRA rank r with alpha fixed at 16; every other hyperparameter is held constant.
for r in [2, 4, 8]:
    args = Namespace(
        output_dir=f"results_lora_qv_r{r}_a16_lr2e-4",
        seed=42,
        peft_method="lora",
        target_modules=["query", "value"],
        lora_r=r,
        lora_alpha=16,
        lora_dropout=0.1,
        learning_rate=2e-4,
        num_train_epochs=3,
        train_batch_size=128,
        eval_batch_size=128,
        optimizer="adamw_torch"
    )
    os.makedirs(args.output_dir, exist_ok=True)
    print("Parameters:", vars(args))
    print(f"=== START TRAINING r={r} alpha=16 ===")
    final_accuracy = train.main_train(args)
    print(f"=== TRAINING FINISHED r={r} alpha=16 ===")

Parameters: {'output_dir': 'results_lora_qv_r2_a16_lr2e-4', 'seed': 42, 'peft_method': 'lora', 'target_modules': ['query', 'value'], 'lora_r': 2, 'lora_alpha': 16, 'lora_dropout': 0.1, 'learning_rate': 0.0002, 'num_train_epochs': 3, 'train_batch_size': 128, 'eval_batch_size': 128, 'optimizer': 'adamw_torch'}
=== START TRAINING r=2 alpha=16 ===
Starting training process with PEFT method: lora
Arguments:
{'output_dir': 'results_lora_qv_r2_a16_lr2e-4', 'seed': 42, 'peft_method': 'lora', 'target_modules': ['query', 'value'], 'lora_r': 2, 'lora_alpha': 16, 'lora_dropout': 0.1, 'learning_rate': 0.0002, 'num_train_epochs': 3, 'train_batch_size': 128, 'eval_batch_size': 128, 'optimizer': 'adamw_torch'}
Using device: cuda
Set seed to 42
Loading tokenizer for model: roberta-base
Loading dataset: ag_news, split: train
Text cleaning completed for text column.
Number of labels: 4
Label names: ['World', 'Sports', 'Business', 'Sci/Tech']


Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Train dataset size: 119360
Eval dataset size: 640
Calculated total training steps: 2799
Loading base model: roberta-base for 4 labels.
Creating PEFT model using method: lora
  Configuring LoRA with: r=2, alpha=16, dropout=0.1
PEFT model created with LORA config.
  Target modules: ['query', 'value']
Trainable params: 667396 || All params: 125316104 || Trainable %: 0.53

Trainable parameters (667396) are within the limit of 1000000.


  trainer = Trainer(
No label_names provided for model class `PeftModelForSequenceClassification`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


Starting PEFT model training using lora...


Step,Training Loss,Validation Loss,Accuracy
233,0.4263,0.277631,0.901563
466,0.2742,0.271691,0.901563
699,0.251,0.234512,0.910937
932,0.2514,0.227359,0.91875
1165,0.2359,0.245221,0.917188
1398,0.2271,0.226983,0.9125
1631,0.2275,0.216865,0.921875
1864,0.2245,0.224777,0.917188
2097,0.2103,0.222672,0.920312
2330,0.2142,0.215895,0.923438


Evaluation Accuracy: 0.9016
Evaluation Accuracy: 0.9016
Evaluation Accuracy: 0.9109
Evaluation Accuracy: 0.9187
Evaluation Accuracy: 0.9172
Evaluation Accuracy: 0.9125
Evaluation Accuracy: 0.9219
Evaluation Accuracy: 0.9172
Evaluation Accuracy: 0.9203
Evaluation Accuracy: 0.9234
Evaluation Accuracy: 0.9187
Evaluation Accuracy: 0.9203

Callback: Saving final model checkpoint (end of training) to results_lora_qv_r2_a16_lr2e-4/last_checkpoint
Callback: Final model checkpoint saved successfully to results_lora_qv_r2_a16_lr2e-4/last_checkpoint
***** train metrics *****
  epoch                    =        3.0
  total_flos               = 88429682GF
  train_loss               =     0.2473
  train_runtime            = 0:08:45.85
  train_samples_per_second =    680.944
  train_steps_per_second   =      5.323

Saving BEST model checkpoint identified by Trainer to results_lora_qv_r2_a16_lr2e-4/best_checkpoint

Evaluating the final best model on the evaluation set...
Starting Evaluating on 640 sam

Evaluating: 100%|██████████| 5/5 [00:00<00:00,  9.13it/s]


Evaluation Metric (accuracy): {'accuracy': 0.9234375}
Final Evaluation Metrics (Best Model): {'accuracy': 0.9234375}
***** eval_final_best metrics *****
  accuracy = 0.9234

Generating curves plot from callback logs...
Plotting metrics from 13 points in metrics_log.jsonl...
Dual-axis plot saved to results_lora_qv_r2_a16_lr2e-4/training_curves_dual_axis.png
Plot generation complete.

TRAINING COMPLETED
Best model validation accuracy (evaluated at end): 92.34%
Best model checkpoint saved to: results_lora_qv_r2_a16_lr2e-4/best_checkpoint
Fractional epoch metrics logged to: results_lora_qv_r2_a16_lr2e-4/metrics_log.jsonl
Training curves plot saved to: results_lora_qv_r2_a16_lr2e-4/training_curves.png

=== TRAINING FINISHED r=2 alpha=16 ===
Parameters: {'output_dir': 'results_lora_qv_r4_a16_lr2e-4', 'seed': 42, 'peft_method': 'lora', 'target_modules': ['query', 'value'], 'lora_r': 4, 'lora_alpha': 16, 'lora_dropout': 0.1, 'learning_rate': 0.0002, 'num_train_epochs': 3, 'train_batch_size': 1

Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
  trainer = Trainer(
No label_names provided for model class `PeftModelForSequenceClassification`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


Creating PEFT model using method: lora
  Configuring LoRA with: r=4, alpha=16, dropout=0.1
PEFT model created with LORA config.
  Target modules: ['query', 'value']
Trainable params: 741124 || All params: 125389832 || Trainable %: 0.59

Trainable parameters (741124) are within the limit of 1000000.
Starting PEFT model training using lora...


Step,Training Loss,Validation Loss,Accuracy
233,0.4213,0.276722,0.9
466,0.2722,0.271163,0.904687
699,0.2504,0.240185,0.90625
932,0.2508,0.232284,0.914062
1165,0.2354,0.249775,0.914062
1398,0.2265,0.232588,0.910937
1631,0.2273,0.221427,0.915625
1864,0.2247,0.229906,0.909375
2097,0.2098,0.225032,0.91875
2330,0.2156,0.219554,0.9125


Evaluation Accuracy: 0.9000
Evaluation Accuracy: 0.9047
Evaluation Accuracy: 0.9062
Evaluation Accuracy: 0.9141
Evaluation Accuracy: 0.9141
Evaluation Accuracy: 0.9109
Evaluation Accuracy: 0.9156
Evaluation Accuracy: 0.9094
Evaluation Accuracy: 0.9187
Evaluation Accuracy: 0.9125
Evaluation Accuracy: 0.9094
Evaluation Accuracy: 0.9125

Callback: Saving final model checkpoint (end of training) to results_lora_qv_r4_a16_lr2e-4/last_checkpoint
Callback: Final model checkpoint saved successfully to results_lora_qv_r4_a16_lr2e-4/last_checkpoint
***** train metrics *****
  epoch                    =        3.0
  total_flos               = 88505215GF
  train_loss               =     0.2467
  train_runtime            = 0:08:46.41
  train_samples_per_second =    680.218
  train_steps_per_second   =      5.317

Saving BEST model checkpoint identified by Trainer to results_lora_qv_r4_a16_lr2e-4/best_checkpoint

Evaluating the final best model on the evaluation set...
Starting Evaluating on 640 sam

Evaluating: 100%|██████████| 5/5 [00:00<00:00,  9.05it/s]


Evaluation Metric (accuracy): {'accuracy': 0.91875}
Final Evaluation Metrics (Best Model): {'accuracy': 0.91875}
***** eval_final_best metrics *****
  accuracy = 0.9187

Generating curves plot from callback logs...
Plotting metrics from 13 points in metrics_log.jsonl...
Dual-axis plot saved to results_lora_qv_r4_a16_lr2e-4/training_curves_dual_axis.png
Plot generation complete.

TRAINING COMPLETED
Best model validation accuracy (evaluated at end): 91.88%
Best model checkpoint saved to: results_lora_qv_r4_a16_lr2e-4/best_checkpoint
Fractional epoch metrics logged to: results_lora_qv_r4_a16_lr2e-4/metrics_log.jsonl
Training curves plot saved to: results_lora_qv_r4_a16_lr2e-4/training_curves.png

=== TRAINING FINISHED r=4 alpha=16 ===
Parameters: {'output_dir': 'results_lora_qv_r8_a16_lr2e-4', 'seed': 42, 'peft_method': 'lora', 'target_modules': ['query', 'value'], 'lora_r': 8, 'lora_alpha': 16, 'lora_dropout': 0.1, 'learning_rate': 0.0002, 'num_train_epochs': 3, 'train_batch_size': 128, 

Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Train dataset size: 119360
Eval dataset size: 640
Calculated total training steps: 2799
Loading base model: roberta-base for 4 labels.
Creating PEFT model using method: lora
  Configuring LoRA with: r=8, alpha=16, dropout=0.1
PEFT model created with LORA config.
  Target modules: ['query', 'value']
Trainable params: 888580 || All params: 125537288 || Trainable %: 0.71

Trainable parameters (888580) are within the limit of 1000000.


  trainer = Trainer(
No label_names provided for model class `PeftModelForSequenceClassification`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


Starting PEFT model training using lora...


Step,Training Loss,Validation Loss,Accuracy
233,0.4268,0.276669,0.9
466,0.2711,0.267466,0.90625
699,0.2492,0.240532,0.910937
932,0.2493,0.231006,0.915625
1165,0.2333,0.249506,0.914062
1398,0.2256,0.231883,0.910937
1631,0.2264,0.220692,0.920312
1864,0.2237,0.22906,0.914062
2097,0.2091,0.223503,0.920312
2330,0.2136,0.218859,0.91875


Evaluation Accuracy: 0.9000
Evaluation Accuracy: 0.9062
Evaluation Accuracy: 0.9109
Evaluation Accuracy: 0.9156
Evaluation Accuracy: 0.9141
Evaluation Accuracy: 0.9109
Evaluation Accuracy: 0.9203
Evaluation Accuracy: 0.9141
Evaluation Accuracy: 0.9203
Evaluation Accuracy: 0.9187
Evaluation Accuracy: 0.9219
Evaluation Accuracy: 0.9219

Callback: Saving final model checkpoint (end of training) to results_lora_qv_r8_a16_lr2e-4/last_checkpoint
Callback: Final model checkpoint saved successfully to results_lora_qv_r8_a16_lr2e-4/last_checkpoint
***** train metrics *****
  epoch                    =        3.0
  total_flos               = 88656280GF
  train_loss               =      0.246
  train_runtime            = 0:08:40.86
  train_samples_per_second =    687.467
  train_steps_per_second   =      5.374

Saving BEST model checkpoint identified by Trainer to results_lora_qv_r8_a16_lr2e-4/best_checkpoint

Evaluating the final best model on the evaluation set...
Starting Evaluating on 640 sam

Evaluating: 100%|██████████| 5/5 [00:00<00:00,  9.33it/s]


Evaluation Metric (accuracy): {'accuracy': 0.921875}
Final Evaluation Metrics (Best Model): {'accuracy': 0.921875}
***** eval_final_best metrics *****
  accuracy = 0.9219

Generating curves plot from callback logs...
Plotting metrics from 13 points in metrics_log.jsonl...
Dual-axis plot saved to results_lora_qv_r8_a16_lr2e-4/training_curves_dual_axis.png
Plot generation complete.

TRAINING COMPLETED
Best model validation accuracy (evaluated at end): 92.19%
Best model checkpoint saved to: results_lora_qv_r8_a16_lr2e-4/best_checkpoint
Fractional epoch metrics logged to: results_lora_qv_r8_a16_lr2e-4/metrics_log.jsonl
Training curves plot saved to: results_lora_qv_r8_a16_lr2e-4/training_curves.png

=== TRAINING FINISHED r=8 alpha=16 ===
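
The best-checkpoint accuracies of the three ranks are close, so it helps to express them as counts of correct samples together with a rough uncertainty. The sketch below copies the accuracies from the logs above and attaches a binomial standard error; treating the 640 evaluation samples as independent draws is an assumption made here, and `best_accuracy`, `summary` and `spread` are illustrative names rather than anything defined in the notebook.

```python
import math

EVAL_SIZE = 640
# Best-checkpoint accuracies copied from the logs above (alpha fixed at 16).
best_accuracy = {2: 0.9234375, 4: 0.91875, 8: 0.921875}

summary = {}
for r, acc in best_accuracy.items():
    correct = round(acc * EVAL_SIZE)
    # Binomial standard error of an accuracy estimated from EVAL_SIZE samples.
    std_err = math.sqrt(acc * (1 - acc) / EVAL_SIZE)
    summary[r] = (correct, std_err)
    print(f"r={r}: {correct}/{EVAL_SIZE} correct, acc={acc:.2%}, std. error ~ {std_err:.2%}")

# Difference in correct predictions between the best and worst rank.
spread = max(c for c, _ in summary.values()) - min(c for c, _ in summary.values())
print(f"Spread between best and worst rank: {spread} samples")
```

The three ranks differ by only a few samples, which is well inside one standard error of roughly one percentage point, so this sweep on its own does not appear to separate $r = 2$, $r = 4$ and $r = 8$.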
