This notebook was written with reference to the GitHub repositories below.

* https://github.com/facebookresearch/llama
* https://github.com/facebookresearch/llama-recipes
* https://github.com/facebookresearch/llama-recipes/blob/main/quickstart.ipynb

The Llama weights downloaded with download.sh must be converted to the Hugging Face (hf) format before they can be loaded. <br>
To convert them, run the code below.

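* In your virtual environment, create a folder named models under the path below and put the files downloaded with download.sh into it. <br>
../site-packages/transformers/models/llama
* That is, inside ../site-packages/transformers/models/llama there should be models/7B/ holding the downloaded files (params.json, ...pth, ...chk),  
and tokenizer.model must be at the same path as models.
* Once that is done, run the commented-out code below in a bash shell.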
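
Before running the conversion, it can help to confirm that the downloaded files are where you expect them. The sketch below uses a hypothetical helper, `missing_files`, which is not part of transformers; the base directory and the list of required paths are assumptions to adjust to your own download.

```python
from pathlib import Path


def missing_files(base_dir, required):
    """Return the entries of `required` that do not exist under `base_dir`."""
    base = Path(base_dir)
    return [rel for rel in required if not (base / rel).exists()]


# Hypothetical values: point base_dir at your
# .../site-packages/transformers/models/llama folder and list the files you downloaded.
base_dir = "."
required = ["models/7B/params.json"]
print("missing:", missing_files(base_dir, required))
```

An empty list means every listed file was found; anything printed is a path to fix before running the conversion script.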

In [None]:
# %%bash
# pip install transformers datasets accelerate sentencepiece protobuf==3.20 py7zr scipy peft bitsandbytes fire torch_tb_profiler ipywidgets
# TRANSFORM=`python -c "import transformers;print('/'.join(transformers.__file__.split('/')[:-1])+'/models/llama/convert_llama_weights_to_hf.py')"`
# python ${TRANSFORM} --input_dir models --model_size 7B --output_dir models_hf/7B-chat

In [None]:
from peft import PeftModel, get_peft_model, LoraConfig, TaskType, prepare_model_for_int8_training
import transformers
from transformers import LlamaForCausalLM, LlamaTokenizer
import torch
from transformers import TrainerCallback
from contextlib import nullcontext
import pandas as pd
from datasets import Dataset
from transformers import default_data_collator, Trainer, TrainingArguments
from trl import SFTTrainer

### Step 1: Load the model

Point `model_id` to the folder that holds the converted model weights.

In [None]:
PATH = '/'.join(transformers.__file__.split('/')[:-1])+'/models/llama/models_hf/7B-chat'
PATH

This cell raises an error if CUDA is not installed.

In [None]:
model_id=PATH

tokenizer = LlamaTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

model =LlamaForCausalLM.from_pretrained(model_id, load_in_8bit=True, device_map='auto', torch_dtype=torch.float16)

We can see that the base model only repeats the conversation.

### Step 4: Prepare model for PEFT

Let's prepare the model for Parameter Efficient Fine Tuning (PEFT):

PEFT: parameter-efficient fine-tuning. It reduces training time while giving performance similar to full-parameter fine-tuning.
* https://github.com/facebookresearch/llama-recipes/blob/main/docs/LLM_finetuning.md

All hyperparameters were taken from the llama-recipes GitHub repository.

In [None]:
model.train()

def create_peft_config(model):

    peft_config = LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        inference_mode=False,
        r=8,
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules = ["q_proj", "v_proj"]
    )

    # prepare int-8 model for training
    model = prepare_model_for_int8_training(model)
    model = get_peft_model(model, peft_config)
    model.print_trainable_parameters()
    return model, peft_config

# create peft config
model, lora_config = create_peft_config(model)
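
To get a feel for how small the trainable part is, the arithmetic behind `print_trainable_parameters()` can be sketched by hand. The hidden size and layer count below are assumptions about the Llama 2 7B architecture (they are not stated in this notebook); `r` and `target_modules` are the values from the LoraConfig above.

```python
# Assumed Llama 2 7B dimensions (not stated in this notebook).
hidden_size = 4096
num_layers = 32

r = 8                                  # LoRA rank from the LoraConfig
target_modules = ["q_proj", "v_proj"]  # adapted projections in every layer

# Each adapted (d_in x d_out) linear layer gains two small matrices:
# A with shape (r, d_in) and B with shape (d_out, r).
params_per_module = r * (hidden_size + hidden_size)
trainable = params_per_module * len(target_modules) * num_layers

print(f"LoRA parameters per adapted module: {params_per_module:,}")
print(f"total trainable LoRA parameters:    {trainable:,}")
```

Under these assumptions the adapters add about 4.2 million parameters, a tiny fraction of the roughly 7 billion base weights, which is why only the LoRA weights need gradients.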



### Step 5: Define an optional profiler

In [None]:

enable_profiler = False
output_dir = "tmp/llama-output"

config = {
    'lora_config': lora_config,
    'learning_rate': 1e-4,
    'num_train_epochs': 1,
    'gradient_accumulation_steps': 2,
    'per_device_train_batch_size': 2,
    'gradient_checkpointing': False,
}

# Set up profiler
if enable_profiler:
    wait, warmup, active, repeat = 1, 1, 2, 1
    total_steps = (wait + warmup + active) * (1 + repeat)
    schedule =  torch.profiler.schedule(wait=wait, warmup=warmup, active=active, repeat=repeat)
    profiler = torch.profiler.profile(
        schedule=schedule,
        on_trace_ready=torch.profiler.tensorboard_trace_handler(f"{output_dir}/logs/tensorboard"),
        record_shapes=True,
        profile_memory=True,
        with_stack=True)
    
    class ProfilerCallback(TrainerCallback):
        def __init__(self, profiler):
            self.profiler = profiler
            
        def on_step_end(self, *args, **kwargs):
            self.profiler.step()

    profiler_callback = ProfilerCallback(profiler)
else:
    profiler = nullcontext()
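
The `wait`, `warmup`, `active`, `repeat` values decide at which training steps the profiler actually records. The function below is a plain-Python imitation of how I understand `torch.profiler.schedule` to cycle through its phases; `profiler_phase` is a hypothetical name and this is not the library's own code.

```python
def profiler_phase(step, wait, warmup, active, repeat=0):
    """Return 'none', 'warmup' or 'record' for a zero-based training step."""
    cycle = wait + warmup + active
    # Once `repeat` full cycles are done, the profiler stays idle.
    if repeat > 0 and step // cycle >= repeat:
        return "none"
    position = step % cycle
    if position < wait:
        return "none"
    if position < wait + warmup:
        return "warmup"
    return "record"


wait, warmup, active, repeat = 1, 1, 2, 1
total_steps = (wait + warmup + active) * (1 + repeat)
for step in range(total_steps):
    print(step, profiler_phase(step, wait, warmup, active, repeat))
```

With the values from the cell above, only two of the `total_steps` steps are recorded; the rest are idle or warm-up, which keeps the trace small.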

### Step 6: Fine tune the model

Here, we fine-tune the model for a single epoch, which takes a bit more than an hour on an A100.

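Assign the path of your CSV file to the `csv_path` variable. The column to train on must be named `text`.

To train on question-answer pairs instead, add the code below and then uncomment `formatting_func` in the SFTTrainer arguments.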
```python
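def formatting_func(example):
    # The tokenizer adds the <s> start token automatically, so the leading <s> is omitted.
    # The question goes inside [INST] ... [/INST] and the answer follows it, as in the Llama 2 prompt template.
    text = f"[INST] {example['question']} [/INST] {example['answer']}"
    return text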
```

* In this case, the question column in the CSV file must be named "question" and the answer column must be named "answer".
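
A quick check of the CSV header can catch a misnamed column before training starts. This is a minimal sketch using only the standard library; `missing_columns` is a hypothetical helper and the sample data is made up.

```python
import csv
import io


def missing_columns(csv_text, required=("question", "answer")):
    """Return the required column names that are absent from the CSV header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    return [name for name in required if name not in header]


# Made-up sample data for illustration only.
sample = "question,answer\nWhat is LoRA?,A low-rank adapter method.\n"
print(missing_columns(sample))             # question/answer training
print(missing_columns(sample, ("text",)))  # plain-text training
```

For a real file, read its contents first (for example with `open(csv_path, encoding="utf-8").read()`) and pass the text in.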

### prompt template llama-2

- The prompt format for Llama 2 models is shown below.
- Source: https://huggingface.co/blog/llama2

In [None]:
"""
prompt template llama-2


<s>[INST] <<SYS>>
{{ system_prompt }}
<</SYS>>

{{ user_message }} [/INST]
"""


"""
example

<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>

There's a llama in my garden 😱 What should I do? [/INST]

"""



"""
For multi-turn question-answer exchanges between the chatbot and the user


<s>[INST] <<SYS>>
{{ system_prompt }}
<</SYS>>

{{ user_msg_1 }} [/INST] {{ model_answer_1 }} </s><s>[INST] {{ user_msg_2 }} [/INST]


"""
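
The templates above can be assembled programmatically. The following is a minimal sketch that follows the format shown in this cell; `build_llama2_prompt` is a hypothetical helper, not a function from transformers or llama-recipes.

```python
def build_llama2_prompt(system_prompt, turns):
    """Build a Llama 2 chat prompt.

    `turns` is a list of (user_message, model_answer) pairs. Use None as the
    answer of the last pair to leave the prompt open for generation.
    """
    parts = []
    for i, (user_msg, answer) in enumerate(turns):
        if i == 0:
            # The system prompt is wrapped in <<SYS>> tags inside the first turn only.
            user_msg = f"<<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_msg}"
        segment = f"<s>[INST] {user_msg} [/INST]"
        if answer is not None:
            segment += f" {answer} </s>"
        parts.append(segment)
    return "".join(parts)


prompt = build_llama2_prompt(
    "You are a helpful assistant.",
    [("Hello", "Hi there!"), ("What is LoRA?", None)],
)
print(prompt)
```

Because the tokenizer normally prepends `<s>` on its own, you may want to drop the very first `<s>` from the returned string before tokenizing it.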

In [None]:
csv_path = '...'
df = pd.read_csv(csv_path)
df

In [None]:
train_dataset = Dataset.from_pandas(df)
train_dataset

Try changing the following two settings while checking GPU memory usage: <br>
per_device_train_batch_size, max_seq_length 
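
As a rough guide, activation memory grows with the number of tokens in one forward pass, while gradient accumulation raises the effective batch size without adding to that. The sketch below works through the numbers used in this notebook; the single-GPU setting is an assumption, and with `packing=True` each sequence is assumed to be filled up to `max_seq_length`.

```python
per_device_train_batch_size = 2  # value used in this notebook's config
gradient_accumulation_steps = 2  # value used in this notebook's config
max_seq_length = 1500            # value passed to SFTTrainer in this notebook
num_gpus = 1                     # assumption: single-GPU run

# Tokens processed at once on one GPU: the main driver of activation memory.
tokens_per_forward = per_device_train_batch_size * max_seq_length

# Sequences and tokens that contribute to one optimizer update.
sequences_per_update = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
tokens_per_update = sequences_per_update * max_seq_length

print("tokens per forward pass:", tokens_per_forward)
print("sequences per optimizer update:", sequences_per_update)
print("tokens per optimizer update:", tokens_per_update)
```

If you run out of memory, halving `per_device_train_batch_size` and doubling `gradient_accumulation_steps` keeps the tokens per update the same while halving the tokens per forward pass.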


In [None]:
# Define training args
training_args = TrainingArguments(
    output_dir=output_dir,
    overwrite_output_dir=True,      # overwrite existing contents of output_dir
    bf16=True,  # Use BF16 if available
    # logging strategies
    logging_dir=f"{output_dir}/logs",
    logging_strategy="steps",
    logging_steps=10,       # log every 10 steps
    save_strategy="no",
    optim="adamw_torch_fused",
    max_steps=total_steps if enable_profiler else -1,
    **{k:v for k,v in config.items() if k != 'lora_config'}
)

with profiler:
    trainer = SFTTrainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        dataset_text_field="text",
        data_collator=default_data_collator,
        max_seq_length = 1500,          # longer sequences use more GPU memory
        callbacks=[profiler_callback] if enable_profiler else [],
        packing=True,    # per ChatGPT: packing=True is recommended for training efficiently on sequences of different lengths
        tokenizer = tokenizer,
        # formatting_func=formatting_func       # uncomment when training on question-answer pairs
    )
    # Start training
    trainer.train()

### Step 7:
Save the model checkpoint. <br>
Only the LoRA weights are saved.

In [None]:
model.save_pretrained(output_dir)   # output_dir : tmp/llama-output

### Step 8:
Try the fine-tuned model on the same example again to see the learning progress:

In [None]:
eval_prompt = "A question from your CSV file."

model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")

model.eval()
with torch.no_grad():
    print(tokenizer.decode(model.generate(**model_input, max_new_tokens=100)[0], skip_special_tokens=True))


### Step 9:

Load your saved model.

For `model_id`, you can use either the weights downloaded with download.sh or a model ID from Hugging Face.

In [None]:
def load_peft_model(model, peft_model):
    peft_model = PeftModel.from_pretrained(model, peft_model)
    merged_model = peft_model.merge_and_unload()
    return merged_model 

In [None]:
model = LlamaForCausalLM.from_pretrained(model_id, device_map='auto', torch_dtype=torch.float16)     # load pretrained model.
peft_model = 'tmp/llama-output/'
merged_model = load_peft_model(model, peft_model)      # merged lora weight and pretrained model.

tokenizer = LlamaTokenizer.from_pretrained(model_id)
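
Conceptually, merging folds each LoRA adapter back into the base weight as W + (lora_alpha / r) · B · A, so the merged model needs no extra adapter layers at inference time. The toy example below illustrates that formula with small list-of-lists matrices; it is a sketch of the idea, not PEFT's implementation, and all numbers are made up.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, lora_alpha, r):
    """Return W + (lora_alpha / r) * (B @ A) for small list-of-lists matrices."""
    scale = lora_alpha / r
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy 2x2 weight with a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]     # shape (r, d_in)
B = [[0.5], [0.25]]  # shape (d_out, r)
merged = merge_lora(W, A, B, lora_alpha=4, r=1)
print(merged)
```

The toy scale factor of 4 matches the one implied by this notebook's settings (`lora_alpha=32`, `r=8`).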

In [None]:
eval_prompt = "A question from your CSV file."
max_new_tokens = 100            # maximum number of tokens to generate

model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")

merged_model.eval()
with torch.no_grad():
    print(tokenizer.decode(merged_model.generate(**model_input, max_new_tokens=max_new_tokens)[0], skip_special_tokens=True))
