# Loading the Required Libraries

#### Step 1: Load the libraries for fine-tuning (FT)


`bitsandbytes`: a lightweight wrapper around CUDA custom functions, in particular 8-bit optimizers and quantization functions. Used here for the quantization step of QLoRA.

`peft`: 🤗's parameter-efficient fine-tuning library.

`transformers`: 🤗's library of pre-trained language models (PLMs) and training utilities.

`accelerate`: 🤗's library that abstracts away the boilerplate code for multi-GPU, TPU, and fp16 training.

`loralib`: a PyTorch implementation of LoRA.

`einops`: a library that simplifies tensor operations.

`xformers`: a library of composable transformer building blocks.

In [None]:
!pip install -U bitsandbytes transformers datasets accelerate loralib einops xformers trl python-dotenv
!pip install -U git+https://github.com/huggingface/peft.git

import os
import bitsandbytes as bnb
import pandas as pd
import torch
import torch.nn as nn 
import transformers
from datasets import load_dataset


from peft import (
    LoraConfig,
    PeftConfig,
    PeftModel, 
    get_peft_model,
    prepare_model_for_kbit_training,
)
from transformers import (
    AutoConfig,
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    set_seed,
    pipeline,
    TrainingArguments,
)
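
Because the imports above fail with a bare `ModuleNotFoundError` when a package is missing, a quick availability check can save a restart. The following is a minimal sketch using only the standard library; the `REQUIRED` list and the helper name `missing_packages` are illustrative, not part of the original notebook.

```python
import importlib.util

# Top-level package names taken from the install cell (illustrative list).
REQUIRED = [
    "bitsandbytes", "peft", "transformers", "datasets",
    "accelerate", "loralib", "einops", "xformers",
]

def missing_packages(names):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_packages(REQUIRED)
print("missing packages:", missing if missing else "none")
```

If the list is non-empty, re-running the install cell and restarting the kernel is usually enough.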

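# Loading the PLM (🌟🌟🌟)

The model is loaded from the 🤗 transformers library with `AutoModelForCausalLM.from_pretrained()`.

Here the model is loaded in 4-bit via `BitsAndBytesConfig`. This is `part of the QLoRA procedure`, which quantizes the pre-trained weights to 4 bits and keeps them frozen during fine-tuning!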
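
To get a feel for why 4-bit loading matters, the weight memory can be estimated with simple arithmetic. This is a rough sketch under stated assumptions: the parameter count is a round 7 billion, and the estimate covers the weights only (no quantization constants, activations, LoRA weights, or optimizer state). The helper name `weight_memory_gib` is illustrative.

```python
def weight_memory_gib(n_params: float, bits_per_param: float) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30

N_PARAMS = 7e9  # assumed round figure for a 7B-parameter model

for label, bits in [("fp32", 32), ("bf16", 16), ("4-bit", 4)]:
    print(f"{label:>5}: {weight_memory_gib(N_PARAMS, bits):6.2f} GiB")
```

Under these assumptions the weights shrink from roughly 13 GiB in bf16 to a little over 3 GiB in 4-bit, which is what makes fine-tuning on a single GPU feasible.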

In [None]:
model_name = "tiiuae/falcon-7b"

# Setting Global Parameters

# 'new_model' is the name that you want to give to the fine-tuned model.
new_model = "new-model-name"
username = "chan4im"
# 'hf_model_repo' is the identifier for the Hugging Face repository where you want to save the fine-tuned model.
hf_model_repo=f"{username}/"+new_model



# Load Model on GPU
# 'device_map' maps model parts to devices. {"": 0} places the entire model on GPU 0.
device_map = {"": 0}

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    quantization_config=bnb_config,
    device_map=device_map,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

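# Preparing the Model for QLoRA

`prepare_model_for_kbit_training()` prepares the model for QLoRA:

It applies the configuration needed to train on top of a quantized model. In current PEFT releases this includes freezing the base model's parameters, upcasting the remaining half-precision parameters (such as layer norms) to fp32 for stability, and enabling gradient checkpointing.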

In [None]:
model = prepare_model_for_kbit_training(model)

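# Configuring LoRA

LoRA is configured through the `LoraConfig` class, whose main parameters are:

`r`: the rank of the update matrices (a lower rank means fewer trainable parameters).

`lora_alpha`: the LoRA scaling factor (the LoRA update is multiplied by `lora_alpha / r`).

`target_modules`: the modules to which the LoRA update matrices are applied, e.g. only specific parts of the model such as the attention blocks.

`lora_dropout`: the dropout probability of the LoRA layers.

`bias`: whether to train the bias parameters (can be 'none', 'all' or 'lora_only').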
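
The effect of `r` and `lora_alpha` can be made concrete with a little arithmetic. The sketch below assumes a single square linear layer of hypothetical size 4096 x 4096 and the standard LoRA scaling `lora_alpha / r`; the helper names are illustrative and not part of PEFT.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters of one adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)

def lora_scaling(lora_alpha: float, r: int) -> float:
    """Factor applied to the low-rank update B @ A."""
    return lora_alpha / r

d_in = d_out = 4096          # hypothetical layer size, for illustration only
r, lora_alpha = 16, 16       # values used in the configuration cell

dense = d_in * d_out
added = lora_param_count(d_in, d_out, r)
print(f"dense weights : {dense:,}")
print(f"LoRA weights  : {added:,} ({added / dense:.2%} of the dense layer)")
print(f"scaling factor: {lora_scaling(lora_alpha, r)}")
```

Halving `r` halves the number of trainable parameters, while keeping `lora_alpha` fixed doubles the scaling factor, so the two are usually tuned together.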

##### The model is then wrapped with `get_peft_model()` using this LoRA configuration!

In [None]:
config = LoraConfig(
    r=16,
    lora_alpha=16,
    # Falcon names its linear layers differently from LLaMA-style models (no q_proj/k_proj/v_proj).
    target_modules=["query_key_value", "dense", "dense_h_to_4h", "dense_4h_to_h"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, config)

# Loading & Preparing Data

The data is loaded from the 🤗 datasets library with `load_dataset()`.

The dataset is shuffled and then mapped through `generate_and_tokenize_prompt()`.

`generate_and_tokenize_prompt()` builds a prompt from each data point in the dataset and tokenizes it.

In [None]:
from datasets import load_dataset

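def generate_prompt(data_point):
    # The second line starts at column 0 so that source indentation does not leak into the prompt.
    return f"""<Human>: {data_point["Context"]}
<AI>: {data_point["Response"]}""".strip()

def generate_and_tokenize_prompt(data_point):
    full_prompt = generate_prompt(data_point)
    # Each example is tokenized on its own; padding to a common length is done per batch by the data collator.
    tokenized_full_prompt = tokenizer(full_prompt, truncation=True)
    return tokenized_full_prompt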


dataset_name = 'Amod/mental_health_counseling_conversations'
dataset = load_dataset(dataset_name, split="train")
dataset = dataset.shuffle().map(generate_and_tokenize_prompt)
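
To check what one training example looks like before tokenization, the `<Human>`/`<AI>` template can be tried on a made-up record. This is a standalone sketch: `format_prompt` is an illustrative helper and the sample text is invented, not taken from the dataset.

```python
def format_prompt(context: str, response: str) -> str:
    """Join a user message and a reply using the <Human>/<AI> template."""
    return f"<Human>: {context}\n<AI>: {response}".strip()

# Invented record with the same column names as the dataset ("Context", "Response").
sample = {
    "Context": "I have trouble sleeping before exams.",
    "Response": "That sounds stressful. Let's look at your evening routine.",
}

print(format_prompt(sample["Context"], sample["Response"]))
```

Printing a few formatted prompts like this is a cheap way to catch stray whitespace or missing fields before spending GPU time on training.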

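# Training Argument Setting

The training arguments are set with the `TrainingArguments` class from the transformers library:

- `auto_find_batch_size`: if True, the trainer automatically finds a batch size that fits in memory, reducing it whenever it hits an out-of-memory error.
- `num_train_epochs`: the number of training epochs.
- `learning_rate`: the optimizer's learning rate.
- `bf16`: if True, the trainer uses bf16 (Brain Floating Point) precision for training, which can save memory and speed up training considerably.
- `save_total_limit`: the maximum number of checkpoints to keep.
- `logging_steps`: the number of steps between two log entries.
- `output_dir`: the directory where model checkpoints are saved.
- `save_strategy`: the checkpoint-saving strategy. In the example below, a checkpoint is saved at the end of each epoch.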
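
How these arguments interact can be sketched with a small calculation. The example assumes a single device, no gradient accumulation, a final partial batch that is kept, and `save_strategy='epoch'`; the dataset size and batch size are hypothetical, and `training_schedule` is an illustrative helper rather than a transformers function.

```python
import math

def training_schedule(n_examples, batch_size, num_train_epochs,
                      logging_steps, save_total_limit):
    """Estimate step, logging, and checkpoint counts for an epoch-based run."""
    steps_per_epoch = math.ceil(n_examples / batch_size)
    total_steps = steps_per_epoch * num_train_epochs
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        # One log entry every `logging_steps` optimizer steps.
        "log_events": total_steps // logging_steps,
        # One checkpoint per epoch, older ones removed beyond the limit.
        "checkpoints_kept": min(num_train_epochs, save_total_limit),
    }

# Hypothetical dataset and batch size; the remaining values mirror the cell below.
print(training_schedule(n_examples=3000, batch_size=8, num_train_epochs=4,
                        logging_steps=10, save_total_limit=4))
```

With `save_total_limit` equal to `num_train_epochs`, every epoch checkpoint is retained; a smaller limit would keep only the most recent ones.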

In [None]:
OUTPUT_DIR = "outputs"  # placeholder checkpoint directory; change as needed

training_args = transformers.TrainingArguments(
    auto_find_batch_size=True,
    num_train_epochs=4,
    learning_rate=2e-4,
    bf16=True,
    save_total_limit=4,
    logging_steps=10,
    output_dir=OUTPUT_DIR,
    save_strategy='epoch',
)

# Training!

Training uses the `Trainer` class from 🤗's transformers library.

It takes the `model`, the `train_dataset`, the `training_args`, and a `data_collator for language modeling`.

Training then starts with `train()`!
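
What the language-modeling collator does to a batch when `mlm=False` can be illustrated in plain Python. This is a simplified sketch of the behaviour, not the library's implementation: it assumes right padding, uses made-up token ids, and `collate_causal_lm` is an illustrative name.

```python
IGNORE_INDEX = -100  # label value that the cross-entropy loss ignores

def collate_causal_lm(sequences, pad_token_id):
    """Pad token-id lists to equal length and build labels for causal LM training."""
    max_len = max(len(seq) for seq in sequences)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids = seq + [pad_token_id] * n_pad
        batch["input_ids"].append(input_ids)
        batch["attention_mask"].append([1] * len(seq) + [0] * n_pad)
        # Labels copy the inputs; every position holding the pad id is masked out.
        batch["labels"].append(
            [IGNORE_INDEX if tok == pad_token_id else tok for tok in input_ids]
        )
    return batch

batch = collate_causal_lm([[5, 6, 7], [8, 9]], pad_token_id=0)
for key, value in batch.items():
    print(key, value)
```

Since masking is done by token id, reusing the EOS token as the pad token (as in the loading cell) means genuine EOS positions are excluded from the loss as well. The shift between inputs and labels is applied inside the model, so the collator leaves them aligned.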

In [None]:
trainer = transformers.Trainer(
    model=model,
    train_dataset=dataset,
    args=training_args,
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
model.config.use_cache = False
trainer.train()

In [None]:
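# Alternative to the Trainer cell above: TRL's SFTTrainer (requires `pip install trl`).
# The keyword arguments below follow older TRL releases; newer releases move
# dataset_text_field and max_seq_length into SFTConfig.
from trl import SFTTrainer

# SFTTrainer tokenizes raw text itself, so build a "text" column from the prompts.
text_dataset = load_dataset(dataset_name, split="train").map(
    lambda data_point: {"text": generate_prompt(data_point)}
)

trainer = SFTTrainer(
        model=model,  # already wrapped by get_peft_model(), so no peft_config is passed
        train_dataset=text_dataset,
        dataset_text_field="text",
        max_seq_length=512,
        tokenizer=tokenizer,
        args=training_args,
)
trainer.train()
# save model in local
trainer.save_model()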

# After Training?

Once training is complete,

1. the adapter can be saved locally or uploaded to the 🤗 Hub and used with 🤗 PEFT.

2. the LoRA weights can be merged into the foundation LLM with `model.merge_and_unload()` from the 🤗 PEFT library.
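
Merging amounts to adding the scaled low-rank product to the frozen weight, `W' = W + (lora_alpha / r) * B @ A`. The toy sketch below shows this with plain Python lists; the matrices are made-up values and the helpers `matmul` and `merge_lora` are illustrative, not PEFT functions.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_lora(w, lora_a, lora_b, lora_alpha, r):
    """Return W + (lora_alpha / r) * (B @ A)."""
    scale = lora_alpha / r
    delta = matmul(lora_b, lora_a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

# Toy shapes: d_out=2, d_in=3, r=1 (made-up values).
w = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]
lora_a = [[1.0, 2.0, 3.0]]   # r x d_in
lora_b = [[0.5], [-1.0]]     # d_out x r
merged = merge_lora(w, lora_a, lora_b, lora_alpha=2, r=1)

# The merged weight gives the same output as base path plus adapter path.
x = [[1.0], [1.0], [1.0]]
base_plus_adapter = [
    [b[0] + 2 * a[0]]
    for b, a in zip(matmul(w, x), matmul(lora_b, matmul(lora_a, x)))
]
print(merged)
print(matmul(merged, x) == base_plus_adapter)
```

After merging, the adapter matrices are no longer needed at inference time, which is why the merged model can be pushed and loaded as an ordinary transformers checkpoint.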

### Method 1.

In [None]:
output_dir = "path_to_save_model"
model.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)

# Log in to the Hub first: either run `huggingface-cli login` in a terminal or use notebook_login() below.

from huggingface_hub import notebook_login

notebook_login()

model.push_to_hub("your_model_name")
tokenizer.push_to_hub("your_model_name")

### Method 2.

In [None]:
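from huggingface_hub import notebook_login, login
from dotenv import load_dotenv
import os

# Authenticate with the Hub. Either call is sufficient:
# notebook_login() prompts interactively, login() reads HF_HUB_TOKEN from a .env file.
notebook_login()
load_dotenv()
login(token=os.getenv("HF_HUB_TOKEN"))

# Repository for the LoRA adapter. Only the last part of model_name is used,
# because a Hub repo id may contain a single "/".
hf_adapter_repo = f"{username}/{model_name.split('/')[-1]}"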

# Save the adapter and the tokenizer to the Hub.
# push_to_hub() on a PEFT model uploads only the adapter weights and config.
# (Trainer.push_to_hub() would treat its first argument as the commit message
# and upload to the repo configured in TrainingArguments.)
model.push_to_hub(hf_adapter_repo)
tokenizer.push_to_hub(hf_adapter_repo)

# Empty VRAM: drop the references to the model and trainer,
# then trigger garbage collection so the objects are actually released.
del model
del trainer
import gc
gc.collect()
gc.collect()

# Release the cached, unoccupied GPU memory held by PyTorch's caching allocator so that
# other GPU applications can use it and nvidia-smi reflects it.
# Memory still occupied by live tensors is not freed.
torch.cuda.empty_cache()




# Reload the base model in higher precision, attach the trained adapter, and merge the two.
# merge_and_unload() folds the LoRA weights into the base weights and returns the plain transformers model.
peft_model_id = hf_adapter_repo
tr_model_id = model_name
compute_dtype = torch.bfloat16  # same dtype as bnb_4bit_compute_dtype in the loading cell

model = AutoModelForCausalLM.from_pretrained(tr_model_id, trust_remote_code=True, torch_dtype=compute_dtype)
model = PeftModel.from_pretrained(model, peft_model_id)
model = model.merge_and_unload()

# Load the tokenizer from the base model repository.
tokenizer = AutoTokenizer.from_pretrained(tr_model_id)

# Display the target repository name for the merged model.
hf_model_repo

# Save the merged model and its tokenizer to the Hugging Face Hub.
merged_model_id = hf_model_repo
model.push_to_hub(merged_model_id)
tokenizer.push_to_hub(merged_model_id)