### LoRA
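During training, only a small set of added parameters is tuned. Because those parameters are applied on top of the original weights, inference uses roughly the same resources as the original model (and exactly the same once the adapter is merged into the base weights).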

**`r`**  
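- The rank of the low-rank matrices used to represent the weight update. The rank of a matrix is its number of linearly independent rows or columns.
- A larger `r` increases the number of trainable parameters and the training time; a smaller `r` reduces computation but may lower performance.
- A larger `r` does not always guarantee better performance, however.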
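
To make the effect of `r` concrete, the sketch below counts the parameters LoRA adds to a single weight matrix. It assumes the standard LoRA factorization, where the update to a `d_out x d_in` matrix is the product of a `d_out x r` matrix and an `r x d_in` matrix. The helper name `lora_param_count` and the 1024 x 1024 shape are illustrative choices, not part of the original notebook.

```python
def lora_param_count(d_out: int, d_in: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    # B has shape (d_out, r) and A has shape (r, d_in).
    return r * (d_out + d_in)


d = 1024  # illustrative square projection (assumption)
full = d * d  # parameters in the full matrix

for r in (4, 8, 16, 64):
    added = lora_param_count(d, d, r)
    print(f"r={r:>2}: {added:>7,} LoRA params ({added / full:.2%} of the full matrix)")
```

The count grows linearly with `r`, which is why a larger rank raises training cost.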

**`lora_alpha`**  
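- A scaling factor used together with the rank.
- Controls how strongly the learned low-rank update is weighted when it is added to the model, that is, how much influence the fine-tuned parameters have. By default, PEFT scales the update by `lora_alpha / r`.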
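
A minimal NumPy sketch of how the two hyperparameters interact, assuming PEFT's default scaling of `lora_alpha / r` (rank-stabilized scaling is not considered). The shapes and random values are made up for illustration. It also shows why a merged adapter adds no inference cost: the update folds into a matrix with the same shape as the original weight.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, lora_alpha = 6, 4, 2, 4  # toy sizes (assumption)

W = rng.normal(size=(d_out, d_in))  # frozen pretrained weight
A = rng.normal(size=(r, d_in))      # trainable down-projection
# LoRA initializes B to zeros; random values are used here so the update is visible.
B = rng.normal(size=(d_out, r))     # trainable up-projection
x = rng.normal(size=(d_in,))        # one input vector

scaling = lora_alpha / r

# Unmerged: frozen path plus the scaled low-rank path.
h_unmerged = W @ x + scaling * (B @ (A @ x))

# Merged: fold the update into the weight once, then do a single matmul.
W_merged = W + scaling * (B @ A)
h_merged = W_merged @ x

print("same shape as W:", W_merged.shape == W.shape)
print("same output:", np.allclose(h_unmerged, h_merged))
```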

In [1]:
from peft import LoraConfig, TaskType

peft_config = LoraConfig(task_type=TaskType.SEQ_CLS,
                         inference_mode=False,
                         r=8,
                         lora_alpha=16,
                         lora_dropout=0.1)


In [2]:
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("klue/roberta-large", num_labels=1)

Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at klue/roberta-large and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [3]:
from peft import get_peft_model

model = get_peft_model(model, peft_config)
model.print_trainable_parameters()

trainable params: 1,837,057 || all params: 338,494,466 || trainable%: 0.5427
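
The trainable count reported above can be reproduced by hand. The sketch below assumes that klue/roberta-large has 24 layers with hidden size 1024, that PEFT's default LoRA targets for RoBERTa are the `query` and `value` projections, and that the `SEQ_CLS` task type keeps the classification head trainable. None of these are stated in the notebook, but under them the arithmetic matches the reported figure.

```python
hidden, layers, r = 1024, 24, 8  # assumed model dimensions; r comes from LoraConfig
targets_per_layer = 2            # assumed default targets: query and value

# Each adapted projection adds A (r x hidden) and B (hidden x r).
lora_params = layers * targets_per_layer * r * (hidden + hidden)

# Classification head for num_labels=1:
# dense (hidden x hidden + bias) and out_proj (hidden x 1 + bias).
head_params = (hidden * hidden + hidden) + (hidden * 1 + 1)

total = lora_params + head_params
print(f"{lora_params:,} + {head_params:,} = {total:,}")
# prints: 786,432 + 1,050,625 = 1,837,057
```

Under these assumptions, more than half of the trainable parameters belong to the newly initialized classification head rather than to the LoRA matrices.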


In [4]:
from dataset import preprocess

train_data = preprocess(task="train", data_path='../../train.csv', model_name='klue/roberta-large')
valid_data = preprocess(task="train", data_path='../../dev.csv', model_name='klue/roberta-large')

tokenization: 100%|██████████| 9324/9324 [00:03<00:00, 2713.26it/s]
tokenization: 100%|██████████| 550/550 [00:00<00:00, 2634.20it/s]


In [5]:
import numpy as np
from scipy.stats import pearsonr
from transformers import TrainingArguments, Trainer

def compute_metrics(eval_pred):
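    logits, labels = eval_pred

    # Regression head (num_labels=1): drop the trailing dimension so predictions are 1-D.
    predictions = np.asarray(logits).squeeze(-1)
    labels = np.asarray(labels)

    # Pearson correlation between predicted and gold scores.
    pearson_corr, _ = pearsonr(predictions, labels)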

    return {"pearson_corr": pearson_corr}

training_arguments = TrainingArguments(
    output_dir='./peft/',
    overwrite_output_dir=True,
    num_train_epochs=10,
    learning_rate=1e-3,
    weight_decay=0.01,
    eval_strategy="epoch",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    lr_scheduler_type='cosine', # 'linear', 'cosine', 'cosine_with_restarts', 'polynomial', 'constant', 'constant_with_warmup', 'inverse_sqrt'
)

trainer = Trainer(
    model=model,
    args=training_arguments,
    train_dataset=train_data,
    eval_dataset=valid_data,
    compute_metrics=compute_metrics,
)
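
For a rough picture of what `lr_scheduler_type='cosine'` does with these settings, the sketch below computes the step count and learning rate by hand. It assumes a single device, no gradient accumulation, and zero warmup steps, and it uses the 9,324 training examples shown in the tokenization progress bar. `cosine_lr` is a hypothetical helper that mirrors the usual half-cosine decay rather than calling the library.

```python
import math

num_examples, batch_size, epochs, base_lr = 9324, 16, 10, 1e-3

# The last partial batch is kept, so round up.
steps_per_epoch = math.ceil(num_examples / batch_size)
total_steps = steps_per_epoch * epochs


def cosine_lr(step: int) -> float:
    """Half-cosine decay from base_lr to 0 with no warmup (assumption)."""
    progress = step / total_steps
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print("total optimizer steps:", total_steps)
for step in (0, total_steps // 2, total_steps):
    print(f"step {step:>4}: lr = {cosine_lr(step):.2e}")
```

Under these assumptions the run ends at step 5,830, which is worth keeping in mind when choosing a checkpoint directory to load.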


In [6]:
trainer.train()

[34m[1mwandb[0m: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information.
[34m[1mwandb[0m: Currently logged in as: [33moceann0315[0m ([33moceann010315[0m). Use [1m`wandb login --relogin`[0m to force relogin


Epoch,Training Loss,Validation Loss


In [None]:
import json
import os

import torch
from pandas import read_csv
from peft import AutoPeftModelForSequenceClassification
from torch.utils.data import DataLoader

from dataset import preprocess  # local helper module, as in In [4]

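# Checkpoint directory written by the Trainer; adjust the step number to a checkpoint that exists.
model_path = './peft/checkpoint-10000'

# adapter_config.json records which base model the adapter was trained on.
with open(os.path.join(model_path, 'adapter_config.json')) as f:
    model_config = json.load(f)

model = AutoPeftModelForSequenceClassification.from_pretrained(model_path, num_labels=1)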

test_data = preprocess(task="test", data_path='../../../data/test.csv', model_name=model_config['base_model_name_or_path'])
test_loader = DataLoader(test_data, shuffle=False)

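# is_available must be called; the bare function object is always truthy.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model.to(device)
model.eval()  # disable dropout for inference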

outputs = []
for batch in test_loader:
    batch = {k: v.to(device) for k, v in batch.items()}
    with torch.no_grad():
        predictions = model(**batch)
        outputs.append(predictions.logits)

# Concatenate the per-batch logits and round each prediction to one decimal place.
outputs = [round(float(i), 1) for i in torch.cat(outputs)]

sample_csv = read_csv('../../sample_submission.csv')
sample_csv['target'] = outputs

sample_csv.to_csv(os.path.join(model_path, 'output.csv'), index=False)