<h1>Chapter 12: Fine-Tuning Generation Models</h1>
<i>Exploring a two-step approach for fine-tuning generative LLMs</i>

<a href="https://github.com/rickiepark/handson-llm"><img src="https://img.shields.io/badge/GitHub%20Repository-black?logo=github"></a>
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/rickiepark/handson-llm/blob/main/chapter12.ipynb)

---

This notebook contains the code for Chapter 12 of the book [Hands-On Large Language Models](https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961).

---

<a href="https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961">
<img src="https://raw.githubusercontent.com/HandsOnLLM/Hands-On-Large-Language-Models/main/images/book_cover.png" width="350"/></a>

### [Optional] - Installing packages on <img src="https://colab.google/static/images/icons/colab.png" width=100>

If you are running this notebook on Google Colab, run the following code cell to install the packages it needs.

---

💡 **NOTE**: A GPU is recommended for running the code in this notebook. In Google Colab, select **Runtime > Change runtime type > Hardware accelerator > T4 GPU**.

---

In [1]:
%%capture
!pip install datasets bitsandbytes trl

## Supervised Fine-Tuning (SFT)

### Data Preprocessing

In [2]:
from transformers import AutoTokenizer
from datasets import load_dataset


# Load a tokenizer so that its chat template can be used.
template_tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

def format_prompt(example):
    """Format the prompt using TinyLlama's <|user|> template"""

    # Apply the chat template
    chat = example["messages"]
    prompt = template_tokenizer.apply_chat_template(chat, tokenize=False)

    return {"text": prompt}

# Load the data and apply the TinyLlama template.
dataset = (
    load_dataset("HuggingFaceH4/ultrachat_200k", split="test_sft")
      .shuffle(seed=42)
      .select(range(3_000))
)
dataset = dataset.map(format_prompt).remove_columns(['messages'])

In [3]:
# Example prompt
print(dataset["text"][2576])

<|user|>
Given the text: Knock, knock. Who’s there? Hike.
Can you continue the joke based on the given text material "Knock, knock. Who’s there? Hike"?</s>
<|assistant|>
Sure! Knock, knock. Who's there? Hike. Hike who? Hike up your pants, it's cold outside!</s>
<|user|>
Can you tell me another knock-knock joke based on the same text material "Knock, knock. Who's there? Hike"?</s>
<|assistant|>
Of course! Knock, knock. Who's there? Hike. Hike who? Hike your way over here and let's go for a walk!</s>
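
The printed example shows the layout the chat template produces: each turn starts with a role tag on its own line and ends with `</s>`. As a rough illustration, the sketch below rebuilds that layout by hand. The helper `render_chat` is hypothetical and only imitates what is visible in the output above. The real template is stored in the tokenizer's configuration, so `apply_chat_template` remains the reliable way to apply it.

```python
def render_chat(messages):
    """Hypothetical helper: imitate the <|role|> ... </s> layout seen above."""
    parts = []
    for message in messages:
        # One block per turn: role tag, newline, content, end-of-sequence token
        parts.append(f"<|{message['role']}|>\n{message['content']}</s>\n")
    return "".join(parts)


# Made-up conversation, not taken from the dataset
chat = [
    {"role": "user", "content": "Knock, knock."},
    {"role": "assistant", "content": "Who's there?"},
]
print(render_chat(chat))
```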



### Model Quantization

In [4]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"

# 4-bit quantization configuration - the Q step of QLoRA
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,  # Load the model in 4-bit precision
    bnb_4bit_quant_type="nf4",  # Quantization type
    bnb_4bit_compute_dtype="float16",  # Compute dtype
    bnb_4bit_use_double_quant=True,  # Apply double (nested) quantization
)

# Load the model onto the GPU.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",

    # Remove the following line for regular (non-quantized) SFT.
    quantization_config=bnb_config,
)
model.config.use_cache = False
model.config.pretraining_tp = 1

# Load the LLaMA tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = "<PAD>"
tokenizer.padding_side = "left"
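
As a back-of-the-envelope check on why 4-bit loading helps on a small GPU, the following sketch estimates the memory needed for the weights alone at different precisions. The parameter count is an assumption rounded from the "1.1B" in the model name. The estimate ignores quantization constants, layers that stay in higher precision, activations, and optimizer state, so actual usage will be higher.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Approximate memory needed to hold the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


num_params = 1.1e9  # assumption: rounded from the "1.1B" in the model name

for label, bits in [("float32", 32), ("float16", 16), ("4-bit", 4)]:
    print(f"{label:>8}: {weight_memory_gib(num_params, bits):.2f} GiB")
```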

### Configuration

#### LoRA Configuration

In [5]:
from peft import LoraConfig, prepare_model_for_kbit_training, get_peft_model

# Prepare the LoRA configuration
peft_config = LoraConfig(
    lora_alpha=32,  # LoRA scaling
    lora_dropout=0.1,  # Dropout for the LoRA layers
    r=64,  # Rank
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=  # Layers to target
     ['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']
)

# Prepare the model for training
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, peft_config)
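
To make the effect of the rank concrete, the sketch below counts the trainable parameters that LoRA adds for a single weight matrix and compares them with the full matrix. The layer size is an illustrative assumption and is not read from the model. `rank` and `alpha` are the values from the `LoraConfig` above. With PEFT's default scaling, the adapter update is multiplied by `lora_alpha / r`.

```python
def lora_param_count(d_in, d_out, rank):
    """Parameters in the two low-rank factors A (rank x d_in) and B (d_out x rank)."""
    return rank * d_in + d_out * rank


d_in, d_out = 2048, 2048   # assumption: illustrative layer size, not read from the model
rank, alpha = 64, 32       # values from the LoraConfig above

full = d_in * d_out
lora = lora_param_count(d_in, d_out, rank)
print(f"full matrix : {full:,} parameters")
print(f"LoRA factors: {lora:,} parameters ({lora / full:.1%} of the full matrix)")
print(f"scaling alpha / r = {alpha / rank}")
```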

#### Training Configuration

In [6]:
from trl import SFTConfig

output_dir = "./results"

# Training arguments
training_arguments = SFTConfig(
    output_dir=output_dir,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    optim="paged_adamw_32bit",
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    num_train_epochs=1,
    logging_steps=10,
    fp16=True,
    gradient_checkpointing=True,
    dataset_text_field="text",
    max_length=512
)
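
The batch size and gradient accumulation settings combine into an effective batch size, which in turn determines roughly how many optimizer steps one epoch takes. The arithmetic below is a sketch that assumes a single GPU. The exact step count reported by the trainer may differ slightly depending on how it handles the last partial batch.

```python
import math

num_examples = 3_000                # size of the subset selected earlier
per_device_train_batch_size = 2     # from SFTConfig
gradient_accumulation_steps = 4     # from SFTConfig
num_devices = 1                     # assumption: a single GPU, such as a Colab T4

effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)
steps_per_epoch = math.ceil(num_examples / effective_batch_size)

print("effective batch size:", effective_batch_size)
print("optimizer steps per epoch (approx.):", steps_per_epoch)
```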

### Training

In [7]:
from trl import SFTTrainer

# Set the supervised fine-tuning parameters
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    processing_class=tokenizer,
    args=training_arguments,

    # Remove the following line for regular SFT.
    peft_config=peft_config,
)

# Train the model
trainer.train()

# Save the QLoRA weights
trainer.model.save_pretrained("TinyLlama-1.1B-qlora")

Step,Training Loss
10,1.6682
20,1.4746
30,1.4489
40,1.4849
50,1.4741
60,1.3889
70,1.4928
80,1.4445
90,1.4274
100,1.4022


### Merging the Adapter

In [8]:
from peft import AutoPeftModelForCausalLM

model = AutoPeftModelForCausalLM.from_pretrained(
    "TinyLlama-1.1B-qlora",
    low_cpu_mem_usage=True,
    device_map="auto",
)

# Merge the LoRA adapter into the base model.
merged_model = model.merge_and_unload()
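
Conceptually, merging folds the low-rank update into the frozen weight, so that the merged weight equals `W + (alpha / r) * B @ A` and no separate adapter is needed at inference time. The toy sketch below shows only that arithmetic on plain Python lists. The helper names and matrix values are made up for illustration, and the library's actual merge operates on tensors.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy values (assumptions): a 2x2 frozen weight with a rank-1 adapter
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 2.0]]        # shape: rank x d_in
lora_b = [[0.5], [1.0]]      # shape: d_out x rank

merged = merge_lora(weight, lora_a, lora_b, alpha=2, rank=1)
print(merged)
```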

### Inference

In [9]:
from transformers import pipeline

# Use the predefined prompt template.
prompt = """<|user|>
Tell me something about Large Language Models.</s>
<|assistant|>
"""

# Run the instruction-tuned model.
pipe = pipeline(task="text-generation", model=merged_model, tokenizer=tokenizer)
print(pipe(prompt)[0]["generated_text"])

Device set to use cuda:0


<|user|>
Tell me something about Large Language Models.</s>
<|assistant|>
Large Language Models (LLMs) are a type of artificial intelligence (AI) that can generate human-like language. They are trained on large amounts of data, including text, audio, and video, and are capable of generating complex and nuanced language.

LLMs are used in a variety of applications, including natural language processing (NLP), machine translation, and chatbots. They can be used to generate text, speech, and images, and can be trained to understand different languages and dialects.

One of the most significant applications of LLMs is in the field of natural language generation (NLG). LLMs can be used to generate text in a variety of languages, including English, French, and German. They can also be used to generate speech, such as in conversational chatbots.

LLMs have the potential to revolutionize the way we communicate and interact with each other. They can help us create more engaging and personalized

## Preference Tuning (PPO/DPO)

### Data Preprocessing

In [10]:
from datasets import load_dataset

def format_prompt(example):
    """Build the prompt using TinyLlama's <|user|> template"""

    # Format with the template
    system = "<|system|>\n" + example['system'] + "</s>\n"
    prompt = "<|user|>\n" + example['input'] + "</s>\n<|assistant|>\n"
    chosen = example['chosen'] + "</s>\n"
    rejected = example['rejected'] + "</s>\n"

    return {
        "prompt": system + prompt,
        "chosen": chosen,
        "rejected": rejected,
    }

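# Filter out ties, low-scored chosen answers, and examples that appear in the
# GSM8K training set, then apply the template to the dataset.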
dpo_dataset = load_dataset("argilla/distilabel-intel-orca-dpo-pairs", split="train")
dpo_dataset = dpo_dataset.filter(
    lambda r:
        r["status"] != "tie" and
        r["chosen_score"] >= 8 and
        not r["in_gsm8k_train"]
)
dpo_dataset = dpo_dataset.map(format_prompt, remove_columns=dpo_dataset.column_names)
dpo_dataset

Dataset({
    features: ['chosen', 'rejected', 'prompt'],
    num_rows: 5922
})
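
To see what a single formatted preference example looks like, the sketch below runs the same formatting logic on one record. The record is made up and only mirrors the column names used above. The function is repeated so that the snippet runs on its own.

```python
def format_prompt(example):
    # Same logic as the cell above, repeated so this snippet runs on its own
    system = "<|system|>\n" + example["system"] + "</s>\n"
    prompt = "<|user|>\n" + example["input"] + "</s>\n<|assistant|>\n"
    return {
        "prompt": system + prompt,
        "chosen": example["chosen"] + "</s>\n",
        "rejected": example["rejected"] + "</s>\n",
    }


# Made-up record that only mirrors the column names used above
record = {
    "system": "You are a helpful assistant.",
    "input": "What is 2 + 2?",
    "chosen": "2 + 2 equals 4.",
    "rejected": "5",
}

formatted = format_prompt(record)
print(formatted["prompt"] + formatted["chosen"])
```

The prompt ends with the `<|assistant|>` tag, so the chosen and rejected answers are two alternative continuations of the same prefix.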

### Model Quantization

In [11]:
from peft import AutoPeftModelForCausalLM
from transformers import BitsAndBytesConfig, AutoTokenizer

# 4-bit quantization configuration - the Q step of QLoRA
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,  # Load the model in 4-bit precision
    bnb_4bit_quant_type="nf4",  # Quantization type
    bnb_4bit_compute_dtype="float16",  # Compute dtype
    bnb_4bit_use_double_quant=True,  # Apply double (nested) quantization
)

# Load the SFT LoRA adapter with the quantized base model, then merge them.
model = AutoPeftModelForCausalLM.from_pretrained(
    "TinyLlama-1.1B-qlora",
    low_cpu_mem_usage=True,
    device_map="auto",
    quantization_config=bnb_config,
)
merged_model = model.merge_and_unload()

# Load the LLaMA tokenizer.
model_name = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = "<PAD>"
tokenizer.padding_side = "left"



### Configuration

In [12]:
from peft import LoraConfig, prepare_model_for_kbit_training, get_peft_model

# Prepare the LoRA configuration.
peft_config = LoraConfig(
    lora_alpha=32,  # LoRA scaling
    lora_dropout=0.1,  # Dropout for the LoRA layers
    r=64,  # Rank
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=  # Layers to target
     ['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']
)

# Prepare the model for training.
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, peft_config)

In [20]:
from trl import DPOConfig

output_dir = "./results"

# Training arguments
training_arguments = DPOConfig(
    output_dir=output_dir,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    optim="paged_adamw_32bit",
    learning_rate=1e-5,
    lr_scheduler_type="cosine",
    max_steps=200,
    logging_steps=10,
    fp16=True,
    gradient_checkpointing=True,
    warmup_ratio=0.1,
    beta=0.1,
    max_prompt_length=512,
    max_length=512
)
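
The combination of `warmup_ratio=0.1`, `lr_scheduler_type="cosine"`, and `max_steps=200` implies a learning rate that ramps up over roughly the first 20 steps and then decays along a cosine curve. The function below is a sketch of that commonly used shape, written from scratch for illustration. The values actually used during training come from the scheduler inside the trainer.

```python
import math


def cosine_lr_with_warmup(step, base_lr, total_steps, warmup_steps):
    """Linear warmup to base_lr, then cosine decay towards zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


base_lr, total_steps = 1e-5, 200               # from DPOConfig
warmup_steps = math.ceil(0.1 * total_steps)    # warmup_ratio=0.1

for step in (0, 10, 20, 110, 200):
    lr = cosine_lr_with_warmup(step, base_lr, total_steps, warmup_steps)
    print(f"step {step:>3}: lr = {lr:.2e}")
```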

In [21]:
from trl import DPOTrainer

# Create the DPOTrainer object.
dpo_trainer = DPOTrainer(
    model,
    args=training_arguments,
    train_dataset=dpo_dataset,
    processing_class=tokenizer,
    peft_config=peft_config
)

# Fine-tune the model with DPO.
dpo_trainer.train()

# Save the adapter.
dpo_trainer.model.save_pretrained("TinyLlama-1.1B-dpo-qlora")



Step,Training Loss
10,0.6921
20,0.6818
30,0.6375
40,0.6285
50,0.6017
60,0.6056
70,0.568
80,0.5644
90,0.5279
100,0.5482
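
The first logged loss of 0.6921 is close to ln 2 ≈ 0.693, which is what the DPO objective gives when the policy and the reference model assign the same log-probabilities to both answers. The sketch below implements the per-pair loss, `-log(sigmoid(beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))))`, with `beta=0.1` as in the configuration. The log-probabilities are made-up numbers for illustration only.

```python
import math


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair, given summed log-probabilities."""
    chosen_reward = beta * (policy_chosen - ref_chosen)
    rejected_reward = beta * (policy_rejected - ref_rejected)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)), written as log(1 + exp(-margin))
    return math.log1p(math.exp(-margin))


# Made-up log-probabilities for illustration only
start = dpo_loss(-50.0, -60.0, -50.0, -60.0)   # policy identical to the reference
later = dpo_loss(-48.0, -63.0, -50.0, -60.0)   # policy now favours the chosen answer

print(f"loss at start: {start:.4f}")
print(f"loss later   : {later:.4f}")
```

The loss falls as the policy raises the chosen answer's log-probability relative to the reference and lowers the rejected one, which matches the downward trend in the table above.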


In [22]:
from peft import AutoPeftModelForCausalLM, PeftModel

# Merge the SFT LoRA adapter into the base model.
model = AutoPeftModelForCausalLM.from_pretrained(
    "TinyLlama-1.1B-qlora",
    low_cpu_mem_usage=True,
    device_map="auto",
)
sft_model = model.merge_and_unload()

# Load the DPO LoRA adapter on top of the SFT model and merge it.
dpo_model = PeftModel.from_pretrained(
    sft_model,
    "TinyLlama-1.1B-dpo-qlora",
    device_map="auto",
)
dpo_model = dpo_model.merge_and_unload()



In [23]:
from transformers import pipeline

# Use the predefined prompt template.
prompt = """<|user|>
Tell me something about Large Language Models.</s>
<|assistant|>
"""

# Run the preference-tuned (DPO) model.
pipe = pipeline(task="text-generation", model=dpo_model, tokenizer=tokenizer)
print(pipe(prompt)[0]["generated_text"])

Device set to use cuda:0


<|user|>
Tell me something about Large Language Models.</s>
<|assistant|>
Large Language Models (LLMs) are a type of artificial intelligence (AI) that can generate human-like language. They are trained on large amounts of data, including text, audio, and video, and are capable of generating complex and nuanced language.

LLMs are used in a variety of applications, including natural language processing (NLP), machine translation, and chatbots. They can be used to generate text, speech, and images, and can be trained to understand different languages and dialects.

One of the most significant applications of LLMs is in the field of natural language generation (NLG). LLMs can be used to generate text in a variety of languages, including English, French, and German. They can also be used to generate speech, such as in conversational chatbots.

LLMs have the potential to revolutionize the way we communicate and interact with each other. They can help us create more engaging and personalized