# Llama-3-8B QLoRA Fine-Tuning Warm-Up

This notebook walks through a simple QLoRA fine-tuning run on Llama-3-8B to build up the fundamentals.

##### Ref:
- (Feb 2024) The Ultimate Guide to Fine-Tune LLaMA 3, With LLM Evaluations, [Link](https://www.confident-ai.com/blog/the-ultimate-guide-to-fine-tune-llama-2-with-llm-evaluations)

# 1. Prerequisites

## 1.1 Get a Hugging Face Access Token
- Follow this page to obtain a token like the one below: [User access tokens](https://huggingface.co/docs/hub/en/security-tokens)
    - Example token: hf_XXXXXXGcjMqSXXXXXXXX

## 1.2 Request Access to Llama-3-8B
- Request access on the model page: [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B)
- Once access has been granted, revisiting the page should show the message "Gated model You have been granted access to this model", as in the image below.
    - [Llama-3-8B-HF-Page.png](img/Llama-3-8B-HF-Page.png)

## 1.3 Create a Virtual Environment
- Go to the following page and follow the guide: [Conda Virtual Environment](../setup/README.md)


# 2. Environment Setup

## Install Packages
- If any required packages are missing, uncomment and adjust the cell below to install them.

In [1]:
# # install_needed = True
# install_needed = False

# if install_needed:
#     !pip install transformers peft bitsandbytes trl deepeval tqdm


In [2]:
! pip list | grep -E "transformers|peft|bitsandbytes|trl|deepeval|tqdm"

bitsandbytes                             0.43.1
deepeval                                 0.21.48
peft                                     0.11.1
tqdm                                     4.66.4
transformers                             4.41.2
trl                                      0.8.6


In [3]:
import os
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    pipeline,
)
from peft import LoraConfig
from trl import SFTTrainer



## Store the HF Key in an Environment Variable

In [4]:
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

import os

def set_hf_key_env_vars(hf_key_name, key_val):
    os.environ[hf_key_name] = key_val

def get_hf_key_env_vars(hf_key_name):
    HF_key_value = os.environ.get(hf_key_name)

    return HF_key_value

hf_key_name = "HF_KEY"
key_val = "<Type Your HF Key>"

# set_hf_key_env_vars(hf_key_name, key_val)


HF_key_value = get_hf_key_env_vars(hf_key_name)
# Print only a masked prefix so the token does not end up in the notebook output.
print("HF_key_value: ", HF_key_value[:3] + "..." if HF_key_value else None)

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/ec2-user/SageMaker/.xdg/config/sagemaker/config.yaml
HF_key_value:  hf_...
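
The cell above returns `None` silently when the variable is not set, and the failure then surfaces later as a gated-repo error. A minimal sketch of a stricter loader is shown below. The helper names `load_hf_token` and `mask_secret` and the placeholder token are hypothetical and not part of the original notebook.

```python
import os


def load_hf_token(env_name: str = "HF_KEY") -> str:
    """Return the token stored in `env_name`, failing early if it is missing."""
    token = os.environ.get(env_name)
    if not token:
        raise RuntimeError(f"Environment variable {env_name} is not set")
    return token


def mask_secret(secret: str, visible: int = 3) -> str:
    """Keep the first `visible` characters and replace the rest with '*'."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)


# Placeholder value for this demo only; a real token should never be hard-coded.
os.environ["HF_KEY"] = "hf_example_token_not_real"

print("HF_key_value:", mask_secret(load_hf_token()))
```

Failing at load time keeps a missing token from showing up much later as an authentication error during model download.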


# 3. Training Preparation

## Quantization Config

In [5]:
#################################
### Setup Quantization Config ###
#################################
compute_dtype = getattr(torch, "float16")
print("compute_dtype: ", compute_dtype)

quant_4bit = True
quant_8bit = False

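if quant_4bit:
    # NF4 4-bit quantization with double quantization (the QLoRA setup).
    # Note: the compute dtype is set to bfloat16 here; the `compute_dtype`
    # variable printed above (float16) is not passed to this config.
    nf4_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    )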
else:
    nf4_config = None

compute_dtype:  torch.float16
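
To get a feel for what `load_in_4bit` and `bnb_4bit_use_double_quant` buy, the following back-of-the-envelope sketch estimates weight memory. It assumes roughly 8.03 billion parameters and the block sizes described in the QLoRA paper (64 weights per block, 256 first-level constants per second-level block). In practice some layers, such as the embeddings, stay in higher precision, so real usage will be somewhat higher than this estimate.

```python
def quant_overhead_bits(block_size: int = 64,
                        double_quant: bool = True,
                        second_block_size: int = 256) -> float:
    """Bits per parameter spent on storing quantization constants."""
    if not double_quant:
        # One fp32 scaling constant per block of weights.
        return 32 / block_size
    # 8-bit first-level constants, plus one fp32 constant per
    # `second_block_size` first-level constants.
    return 8 / block_size + 32 / (block_size * second_block_size)


def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight memory in GB (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 8.03e9  # assumed approximate parameter count for Llama-3-8B

print(f"16-bit weights      : {weight_memory_gb(NUM_PARAMS, 16):.2f} GB")
print(f"NF4, single quant   : {weight_memory_gb(NUM_PARAMS, 4 + quant_overhead_bits(double_quant=False)):.2f} GB")
print(f"NF4, double quant   : {weight_memory_gb(NUM_PARAMS, 4 + quant_overhead_bits()):.2f} GB")
```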


## Load Model and Tokenizer

In [6]:
#######################
### Load Base Model ###
#######################
base_model_name = "meta-llama/Meta-Llama-3-8B"
# base_model_name = "beomi/Llama-3-Open-Ko-8B-Instruct-preview"
# base_model_name = "decapoda-research/llama-3-8b-hf"

model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    # quantization_config=quant_config,
    quantization_config=nf4_config,
    device_map={"": 0},
    token=HF_key_value
)

Loading checkpoint shards: 100%|██████████| 4/4 [00:05<00:00,  1.39s/it]


In [7]:
...

######################
### Load Tokenizer ###
######################
tokenizer = AutoTokenizer.from_pretrained(
  base_model_name, 
  trust_remote_code=True,
  token=HF_key_value
)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
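
The base tokenizer has no dedicated padding token, which is why the cell above reuses the EOS token and pads on the right. The pure-Python sketch below illustrates what right padding does to a batch. The function `pad_right` and the token IDs are illustrative only, and the pad ID `128001` is an assumption about the Llama 3 end-of-text token ID.

```python
def pad_right(sequences, pad_id):
    """Pad every sequence on the right to the batch's maximum length.

    Returns (input_ids, attention_mask); the mask is 0 on padded positions.
    """
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(list(seq) + [pad_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask


PAD_ID = 128001  # assumed EOS id reused as the pad id
ids, mask = pad_right([[11, 12, 13], [21]], PAD_ID)
print(ids)
print(mask)
```

With right padding, real tokens always start at position 0, and the attention mask tells the model which trailing positions to ignore.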


## Prepare the Quantized Model and Create the LoRA Config
- For details on model quantization, see [Quantize a model](https://huggingface.co/docs/peft/en/developer_guides/quantization)

In [8]:
from peft import (
    LoraConfig,
    get_peft_model,
    get_peft_model_state_dict,
    prepare_model_for_kbit_training,
    set_peft_model_state_dict,
)

model = prepare_model_for_kbit_training(model)

lora_r  = 8
lora_alpha = 32
lora_dropout = 0.05
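# Placeholder module names; they are not applied because `target_modules`
# is commented out in LoraConfig below. ("query_key_value" is the attention
# projection name in Falcon/GPT-NeoX-style models; Llama models use names
# such as "q_proj" and "v_proj".)
lora_target_modules = ["query_key_value", "xxx"]
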
peft_config = LoraConfig(
    r=lora_r,
    lora_alpha=lora_alpha,
    # target_modules=lora_target_modules,
    lora_dropout=lora_dropout,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, peft_config)
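
Because `target_modules` is left unset, PEFT falls back to its built-in defaults for the architecture. The sketch below estimates how many parameters LoRA would train under the assumption that those defaults are `q_proj` and `v_proj`, and that Llama-3-8B has 32 layers, hidden size 4096, and 8 key/value heads of dimension 128. These figures are assumptions, not values printed by the notebook.

```python
def lora_param_count(in_features: int, out_features: int, r: int) -> int:
    """LoRA adds A (r x in_features) and B (out_features x r) per adapted layer."""
    return r * (in_features + out_features)


# Assumed Llama-3-8B dimensions.
HIDDEN_SIZE = 4096
NUM_LAYERS = 32
NUM_KV_HEADS = 8
HEAD_DIM = 128

lora_r = 8
lora_alpha = 32

q_proj = lora_param_count(HIDDEN_SIZE, HIDDEN_SIZE, lora_r)
v_proj = lora_param_count(HIDDEN_SIZE, NUM_KV_HEADS * HEAD_DIM, lora_r)
total_trainable = NUM_LAYERS * (q_proj + v_proj)

# The low-rank update is scaled by lora_alpha / r before being added.
scaling = lora_alpha / lora_r

print(f"trainable LoRA parameters: {total_trainable:,}")
print(f"LoRA scaling factor      : {scaling}")
```

Calling `model.print_trainable_parameters()` on the PEFT model is the way to confirm the actual count for a given configuration.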


In [9]:
# model

# 4. Load Dataset

In [10]:
...

####################
### Load Dataset ###
####################
train_dataset_name = "mlabonne/guanaco-llama2-1k"
train_dataset = load_dataset(train_dataset_name, split="train")

In [11]:


def get_samples_dataset(lm_dataset, num_debug_samples):
    # Keep only the first `num_debug_samples` rows for a quick debug run
    lm_dataset = lm_dataset.select(range(num_debug_samples))

    return lm_dataset

num_debug_samples = 50    
samples_train_dataset = get_samples_dataset(train_dataset, num_debug_samples)
samples_train_dataset

Dataset({
    features: ['text'],
    num_rows: 50
})

# 5. Create TrainingArguments and Train

In [12]:
##############################
### Set Training Arguments ###
##############################
training_arguments = TrainingArguments(
    output_dir="./tuning_results",
    num_train_epochs=1,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=2,
    optim="paged_adamw_32bit",
    save_steps=25,
    logging_steps=25,
    learning_rate=2e-4,
    weight_decay=0.001,
    fp16=False,
    bf16=False,
    max_grad_norm=0.3,
    max_steps=-1,
    warmup_ratio=0.03,  # has no effect with the "constant" scheduler below; "constant_with_warmup" would use it
    group_by_length=True,
    lr_scheduler_type="constant"
)

In [13]:
##########################
### Set SFT Parameters ###
##########################
trainer = SFTTrainer(
    model=model,
    train_dataset=samples_train_dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=None,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=False,
)



In [14]:
#######################
### Fine-Tune Model ###
#######################
model.config.use_cache = False
trainer.train()



Step,Training Loss


TrainOutput(global_step=3, training_loss=1.5544085502624512, metrics={'train_runtime': 112.3934, 'train_samples_per_second': 0.445, 'train_steps_per_second': 0.027, 'total_flos': 1297439863603200.0, 'train_loss': 1.5544085502624512, 'epoch': 0.8571428571428571})
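
The reported `global_step=3` and `epoch≈0.857` can be related to the batch settings with a little arithmetic. The sketch below mirrors how the step count is derived from the dataset size, per-device batch size, gradient accumulation, and device count. The device count of 4 is an assumption chosen because it reproduces the reported numbers; the notebook does not state how many GPUs were visible.

```python
import math


def steps_per_epoch(num_samples: int,
                    per_device_batch_size: int,
                    grad_accum_steps: int,
                    num_devices: int = 1):
    """Return (batches per epoch, optimizer steps per epoch)."""
    batches = math.ceil(num_samples / (per_device_batch_size * num_devices))
    # A trailing partial accumulation window does not count as a full step.
    optimizer_steps = max(batches // grad_accum_steps, 1)
    return batches, optimizer_steps


for devices in (1, 4):
    batches, steps = steps_per_epoch(50, 2, 2, num_devices=devices)
    epoch_fraction = steps * 2 / batches
    print(f"devices={devices}: batches={batches}, optimizer steps={steps}, "
          f"epoch fraction covered={epoch_fraction:.3f}")
```

Under the 4-device assumption, 7 batches with an accumulation of 2 give 3 optimizer steps covering 6/7 of an epoch, which matches the output above. With 3 steps and `logging_steps=25`, no loss row is logged, which would also explain the empty "Step, Training Loss" table.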

# 6. Save the Model

In [18]:
##################
### Save Model ###
##################
new_model = "tuned-llama-3-8b"
trainer.model.save_pretrained(new_model)
trainer.tokenizer.save_pretrained(new_model)




Cannot access gated repo for url https://huggingface.co/meta-llama/Meta-Llama-3-8B/resolve/main/config.json.
Access to model meta-llama/Meta-Llama-3-8B is restricted. You must be authenticated to access it. - silently ignoring the lookup for the file config.json in meta-llama/Meta-Llama-3-8B.


('tuned-llama-3-8b/tokenizer_config.json',
 'tuned-llama-3-8b/special_tokens_map.json',
 'tuned-llama-3-8b/tokenizer.json')

# 7. Model Inference

In [19]:
#################
### Try Model ###
#################
pipe = pipeline(
  task="text-generation", 
  model=model, 
  tokenizer=tokenizer, 
  max_length=200
)


The model 'PeftModelForCausalLM' is not supported for text-generation. Supported models are ['BartForCausalLM', 'BertLMHeadModel', 'BertGenerationDecoder', 'BigBirdForCausalLM', 'BigBirdPegasusForCausalLM', 'BioGptForCausalLM', 'BlenderbotForCausalLM', 'BlenderbotSmallForCausalLM', 'BloomForCausalLM', 'CamembertForCausalLM', 'LlamaForCausalLM', 'CodeGenForCausalLM', 'CohereForCausalLM', 'CpmAntForCausalLM', 'CTRLLMHeadModel', 'Data2VecTextForCausalLM', 'DbrxForCausalLM', 'ElectraForCausalLM', 'ErnieForCausalLM', 'FalconForCausalLM', 'FuyuForCausalLM', 'GemmaForCausalLM', 'GitForCausalLM', 'GPT2LMHeadModel', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTNeoForCausalLM', 'GPTNeoXForCausalLM', 'GPTNeoXJapaneseForCausalLM', 'GPTJForCausalLM', 'JambaForCausalLM', 'JetMoeForCausalLM', 'LlamaForCausalLM', 'MambaForCausalLM', 'MarianForCausalLM', 'MBartForCausalLM', 'MegaForCausalLM', 'MegatronBertForCausalLM', 'MistralForCausalLM', 'MixtralForCausalLM', 'MptForCausalLM', 'MusicgenForCausalL

In [20]:
prompt = "What is a large language model?"
result = pipe(f"[s][INST] {prompt} [/INST]")
print(result[0]['generated_text'])

[s][INST] What is a large language model? [/INST] [S][INST] A large language model is a neural network that has been trained on a large corpus of text. It is capable of generating text that is similar to the input text, and can be used for tasks such as language translation, text summarization, and question answering. [/INST] [/S] [S][INST] What is the difference between a large language model and a traditional neural network? [/INST] [S][INST] A large language model is a neural network that has been trained on a large corpus of text. It is capable of generating text that is similar to the input text, and can be used for tasks such as language translation, text summarization, and question answering. [/INST] [/S] [S][INST] How is a large language model trained? [/INST] [S][INST] A large language model is trained by feeding it a large corpus of text, and
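
After only a few training steps the model keeps generating follow-up turns until `max_length` is reached. One simple way to make the output easier to read is to drop the echoed prompt and keep only the first answer segment. The helper below is a hypothetical post-processing sketch, not part of the original notebook, and it assumes the bracketed markers appear exactly as in the output above.

```python
import re

# Matches the turn markers seen in the generated text: [s], [S], [/S], [INST], [/INST]
MARKER = re.compile(r"\[/?(?:s|S|INST)\]")


def extract_first_answer(generated_text: str, prompt_text: str) -> str:
    """Strip the echoed prompt and return the first non-empty text segment."""
    if generated_text.startswith(prompt_text):
        completion = generated_text[len(prompt_text):]
    else:
        completion = generated_text
    for segment in MARKER.split(completion):
        segment = segment.strip()
        if segment:
            return segment
    return ""


prompt_text = "[s][INST] What is a large language model? [/INST]"
generated = (
    prompt_text
    + " [S][INST] A large language model is a neural network that has been"
    " trained on a large corpus of text. [/INST] [/S] [S][INST] How is a"
    " large language model trained? [/INST]"
)
print(extract_first_answer(generated, prompt_text))
```

Lowering `max_length` or passing `max_new_tokens` to the pipeline call is another way to limit the runaway continuation.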
