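#### Llama2 4-bit Quantized Fine-Tuning
1. ##### 4-bit quantized QLoRA fine-tuning (efficiency)
   * The base model parameters are frozen and only the added LoRA adapter weights are tuned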

2. Install the Python modules
   * Latest versions: `%pip install torch fairscale fire sentencepiece transformers protobuf accelerate peft bitsandbytes trl`
   * Pinned versions: `%pip install accelerate==0.26.1 peft==0.8.2 bitsandbytes==0.42.0 transformers==4.37.2 trl==0.7.10`

In [None]:
%pip install torch fairscale fire sentencepiece transformers protobuf accelerate peft bitsandbytes trl

In [1]:
import os
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    pipeline,
    logging,
)
from peft import LoraConfig
from trl import SFTTrainer

import json
import jsonlines

In [2]:
from datasets import Dataset
from datasets import load_from_disk

#### Configure 4-bit quantization

In [3]:
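# Matrix multiplications run in float16 after the 4-bit weights are dequantized
compute_dtype = torch.float16

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # store linear-layer weights in 4 bits
    bnb_4bit_quant_type="nf4",             # NormalFloat4 data type used by QLoRA
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=False,       # do not quantize the quantization constants
)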
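
As a rough sanity check on what the 4-bit setting buys, the weight memory can be estimated by hand. The sketch below is back-of-the-envelope arithmetic, not a measurement: it assumes a 7-billion-parameter model, a quantization block size of 64 with one fp32 scaling constant per block, and it ignores layers kept in higher precision, activations, and optimizer state. The helper names are hypothetical.

```python
def nf4_bits_per_param(block_size: int = 64, double_quant: bool = False) -> float:
    """Approximate storage cost per weight for 4-bit blockwise quantization."""
    if double_quant:
        # Scaling constants stored in 8 bits, plus one fp32 second-level
        # constant shared by 256 blocks.
        overhead = 8 / block_size + 32 / (block_size * 256)
    else:
        # One fp32 scaling constant per block of weights.
        overhead = 32 / block_size
    return 4 + overhead


def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Convert a parameter count and a per-parameter bit cost into gigabytes."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 7e9  # assumption: a 7B-parameter model

fp16_gb = weight_memory_gb(N_PARAMS, 16)
nf4_gb = weight_memory_gb(N_PARAMS, nf4_bits_per_param(double_quant=False))
nf4_dq_gb = weight_memory_gb(N_PARAMS, nf4_bits_per_param(double_quant=True))

print(f"fp16 weights:               {fp16_gb:.2f} GB")
print(f"nf4 weights:                {nf4_gb:.2f} GB")
print(f"nf4 + double quantization:  {nf4_dq_gb:.2f} GB")
```

Under these assumptions, enabling `bnb_4bit_use_double_quant` would save a few hundred megabytes on top of the 4-bit setting used here.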

##### Load the Llama Hugging Face model

In [4]:
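model_id = "/data/bwllm/models/casllm-base-7b-hf"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,  # apply the 4-bit settings defined above
    device_map={"": 0}                 # place the whole model on GPU 0
)
model.config.use_cache = False     # the KV cache is only useful at inference time
model.config.pretraining_tp = 1    # use the standard (non-sliced) linear layer computation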


##### Load the tokenizer

In [5]:
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
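
To make the two tokenizer settings above concrete, the sketch below shows in plain Python what right-side padding with the EOS token does to a batch. It does not call the tokenizer; `pad_right` and the token IDs are hypothetical and only illustrate the resulting layout.

```python
def pad_right(batch, pad_id):
    """Pad every sequence on the right to the longest length in the batch."""
    max_len = max(len(seq) for seq in batch)
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in batch]
    # 1 marks a real token, 0 marks padding that attention should ignore
    attention_mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in batch]
    return input_ids, attention_mask


EOS_ID = 2  # illustrative value; the real ID comes from tokenizer.eos_token_id

ids, mask = pad_right([[5, 6, 7], [8]], pad_id=EOS_ID)
print(ids)
print(mask)
```

Because the pad token reuses the EOS ID, the attention mask is what distinguishes padding from a genuine end-of-sequence token.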

You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers


##### PEFT parameter settings
> Parameter-Efficient Fine-Tuning (PEFT) updates only a small subset of the model's parameters.

In [6]:
peft_params = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=64,
    bias="none",
    task_type="CAUSAL_LM",
)
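
The effect of `r=64` and `lora_alpha=16` can be estimated with simple arithmetic. The sketch below assumes Llama-2-7B dimensions (hidden size 4096, 32 layers) and that LoRA is attached only to the `q_proj` and `v_proj` matrices of each layer; the checkpoint loaded here may differ, so treat the totals as illustrative. The helper name is hypothetical.

```python
def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """A LoRA adapter adds two matrices: A (r x d_in) and B (d_out x r)."""
    return r * (d_in + d_out)


# Assumptions (not read from the actual checkpoint):
HIDDEN_SIZE = 4096      # Llama-2-7B hidden dimension
N_LAYERS = 32           # Llama-2-7B decoder layers
TARGETS_PER_LAYER = 2   # q_proj and v_proj

R = 64
LORA_ALPHA = 16

per_matrix = lora_trainable_params(HIDDEN_SIZE, HIDDEN_SIZE, R)
total = per_matrix * TARGETS_PER_LAYER * N_LAYERS
scaling = LORA_ALPHA / R  # the adapter output is multiplied by alpha / r

print(f"trainable parameters per adapted matrix: {per_matrix:,}")
print(f"total trainable parameters:              {total:,}")
print(f"LoRA scaling factor (alpha / r):         {scaling}")
```

Under these assumptions the adapters hold roughly 34 million trainable parameters, well under 1% of a 7B-parameter base model.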

##### Training parameter settings

In [None]:
output_peft_model = "/data/bwllm/models/audit_sllm7b_peft"
training_params = TrainingArguments(
    output_dir=output_peft_model,
    num_train_epochs=5,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=1,
    optim="paged_adamw_32bit",
    save_steps=25,
    logging_steps=25,
    learning_rate=2e-4,
    weight_decay=0.001,
    fp16=False,
    bf16=False,
    max_grad_norm=0.3,
    max_steps=-1,
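    warmup_ratio=0.03,             # has no effect with the "constant" scheduler below
    group_by_length=True,          # batch samples of similar length to reduce padding
    lr_scheduler_type="constant",  # use "constant_with_warmup" if warmup is intended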
    report_to="tensorboard"
)
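
The batch-size and checkpoint settings above determine how many optimizer steps and saved checkpoints a run produces. The sketch below works this out for a hypothetical dataset of 1,000 examples on a single GPU; the dataset size and the helper `training_schedule` are assumptions for illustration only.

```python
import math


def training_schedule(n_examples, per_device_batch_size, grad_accum_steps,
                      n_devices, epochs, save_steps):
    """Estimate optimizer steps and checkpoint count for a training run."""
    effective_batch = per_device_batch_size * grad_accum_steps * n_devices
    # Number of batches each device sees per epoch (the last batch may be partial)
    batches_per_epoch = math.ceil(n_examples / (per_device_batch_size * n_devices))
    # One optimizer update happens every grad_accum_steps batches
    updates_per_epoch = max(batches_per_epoch // grad_accum_steps, 1)
    total_updates = updates_per_epoch * epochs
    return {
        "effective_batch": effective_batch,
        "updates_per_epoch": updates_per_epoch,
        "total_updates": total_updates,
        "checkpoints": total_updates // save_steps,
    }


schedule = training_schedule(
    n_examples=1000,            # hypothetical dataset size
    per_device_batch_size=4,
    grad_accum_steps=1,
    n_devices=1,
    epochs=5,
    save_steps=25,
)
print(schedule)
```

With `save_steps=25` the run writes a checkpoint often, so disk usage under `output_dir` is worth watching on longer runs.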

##### Training setup

In [None]:
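train_dataset_folder = "/data/bwllm/dataset/alpaca_en_dataset.jsonl"
# load_from_disk expects a directory written by Dataset.save_to_disk();
# if this path is a plain JSONL file, use
# load_dataset("json", data_files=train_dataset_folder, split="train") instead.
dataset = load_from_disk(train_dataset_folder)

# Inspect the first few records
dataset[:5]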

In [None]:
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_params,
    dataset_text_field="text",
    max_seq_length=None,
    tokenizer=tokenizer,
    args=training_params,
    packing=False,
)
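
`dataset_text_field="text"` means each record must carry a single `text` column holding the full prompt and response. If the dataset instead stores Alpaca-style `instruction`, `input`, and `output` fields (an assumption; the notebook does not show the schema), a formatter along these lines could build that column. The function name and sample record are hypothetical, and the template is the commonly used Alpaca prompt.

```python
def format_alpaca(record: dict) -> str:
    """Join an Alpaca-style record into one training string."""
    instruction = record["instruction"].strip()
    context = record.get("input", "").strip()
    response = record["output"].strip()
    if context:
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{context}\n\n"
            f"### Response:\n{response}"
        )
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        f"### Response:\n{response}"
    )


sample = {  # hypothetical record
    "instruction": "Translate to French.",
    "input": "Good morning",
    "output": "Bonjour",
}
text = format_alpaca(sample)
text_no_input = format_alpaca({"instruction": "Say hi.", "input": "", "output": "Hi!"})
print(text)
```

With the `datasets` library, the column could then be added via `dataset.map(lambda r: {"text": format_alpaca(r)})` before the dataset is passed to the trainer.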

In [None]:
trainer.train()