Reference:     
https://www.databricks.com/blog/efficient-fine-tuning-lora-guide-llms  
https://colab.research.google.com/drive/1Vvju5kOyBsDr7RX_YAvp6ZsSOoSMjhKD?usp=sharing#scrollTo=L2Hllu-bCuN6  
https://www.youtube.com/watch?v=NRVaRXDoI3g  
https://blog.csdn.net/LF_AI/article/details/132419546  

Software environment:  
requirements.txt  
Cuda compilation tools, release 10.1, V10.1.243   
NVIDIA-SMI 545.23.08  Driver Version: 545.23.08    CUDA Version: 12.3  
Nvidia RTX 3060

# Main steps
1. Loading dataset
2. Data pre-processing
3. Creating prompt template
4. Instantiating LoraConfig object
5. Loading LoRA model and tokenizer
6. Testing before fine-tuning
7. Setting up a Trainer
8. Fine-tuning
9. Testing after fine-tuning
10. Saving fine-tuned model


In [1]:
import pandas as pd
import torch
import mlflow
from datasets import Dataset, load_dataset
from trl import SFTTrainer
from peft import get_peft_config, PeftModel, PeftConfig, get_peft_model, LoraConfig, TaskType, PeftModelForCausalLM
from transformers import (
    AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, DataCollatorForSeq2Seq,
    LlamaForCausalLM, LlamaTokenizer, Trainer, TrainingArguments, logging, set_seed,
)



In [None]:
# Load the dataset from the Hugging Face Hub
rd_ds = load_dataset("xiyuez/red-dot-design-award-product-description")
# Convert to a pandas DataFrame for convenient processing
rd_df = pd.DataFrame(rd_ds['train'])
rd_df.head()

Unnamed: 0,product,category,description,text
0,Biamp Rack Products,Digital Audio Processors,"“High recognition value, uniform aesthetics an...",Product Name: Biamp Rack Products;\n\nProduct ...
1,V33,Video Camera,The V33 livestreaming video camera ensures hig...,Product Name: V33;\n\nProduct Category: Video ...
2,HP LaserJet 5000-6000 and E700-E800 Series MFPs,Multi-Function Printers,The HP LaserJet 5000 to 6000 Series and E700 t...,Product Name: HP LaserJet 5000-6000 and E700-E...
3,Meaco Arete One 20L Dehumidifier,Heating and Air Conditioning Technology,The Meaco Arete One Dehumidifier is characteri...,Product Name: Meaco Arete One 20L Dehumidifier...
4,théATRE Glass Container for Loose Leaf Tea,Food Containers,The design and colouring of the théATRE Glass ...,Product Name: théATRE Glass Container for Loos...


In [3]:
# Combine the two attributes into an instruction string.
# The pandas column operations build the instruction prompts for all rows at once.
rd_df['instruction'] = 'Create a detailed description for the following product: '+ rd_df['product']+', belonging to category: '+ rd_df['category']
rd_df.head()

Unnamed: 0,product,category,description,text,instruction
0,Biamp Rack Products,Digital Audio Processors,"“High recognition value, uniform aesthetics an...",Product Name: Biamp Rack Products;\n\nProduct ...,Create a detailed description for the followin...
1,V33,Video Camera,The V33 livestreaming video camera ensures hig...,Product Name: V33;\n\nProduct Category: Video ...,Create a detailed description for the followin...
2,HP LaserJet 5000-6000 and E700-E800 Series MFPs,Multi-Function Printers,The HP LaserJet 5000 to 6000 Series and E700 t...,Product Name: HP LaserJet 5000-6000 and E700-E...,Create a detailed description for the followin...
3,Meaco Arete One 20L Dehumidifier,Heating and Air Conditioning Technology,The Meaco Arete One Dehumidifier is characteri...,Product Name: Meaco Arete One 20L Dehumidifier...,Create a detailed description for the followin...
4,théATRE Glass Container for Loose Leaf Tea,Food Containers,The design and colouring of the théATRE Glass ...,Product Name: théATRE Glass Container for Loos...,Create a detailed description for the followin...


In [4]:
rd_df['instruction'][0]

'Create a detailed description for the following product: Biamp Rack Products, belonging to category: Digital Audio Processors'

In [5]:
rd_df = rd_df[['instruction', 'description']]
rd_df.head()

Unnamed: 0,instruction,description
0,Create a detailed description for the followin...,"“High recognition value, uniform aesthetics an..."
1,Create a detailed description for the followin...,The V33 livestreaming video camera ensures hig...
2,Create a detailed description for the followin...,The HP LaserJet 5000 to 6000 Series and E700 t...
3,Create a detailed description for the followin...,The Meaco Arete One Dehumidifier is characteri...
4,Create a detailed description for the followin...,The design and colouring of the théATRE Glass ...


In [6]:
# Draw a random 5000-sample subset for fine-tuning,
# so the whole pipeline can be tested first on a smaller set
rd_df_sample = rd_df.sample(n=5000, random_state=42)
# Define the template and format the data into it for supervised fine-tuning
template = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:

{}

### Response:\n"""

In [7]:

rd_df_sample['prompt'] = rd_df_sample["instruction"].apply(lambda x: template.format(x))
# Rename columns
rd_df_sample.rename(columns={'description': 'response'}, inplace=True)
rd_df_sample['response'] = rd_df_sample['response'] + "\n### End"
rd_df_sample = rd_df_sample[['prompt', 'response']]
print(rd_df_sample['response'][0])
rd_df_sample.head()

“High recognition value, uniform aesthetics and practical scalability – this has been impressively achieved with the Biamp brand language,” the jury statement said. The previous design of the digital audio processors was not only costly to produce, but also incompatible with newer system architectures. With the new concept, the company is making a visual statement that allows for differences in dimension, connectivity and application. Design elements include consistent branding, a soft curve on the top and bottom edges, and two red bars on the left and right margins of the products. The two-part black front panel can be used for various products.
### End


Unnamed: 0,prompt,response
18952,Below is an instruction that describes a task....,The CG8565 is a gaming PC offering space for h...
12584,Below is an instruction that describes a task....,The iSHOXS BullBar ProX mount can be used to a...
5702,Below is an instruction that describes a task....,The S81 Pro focuses on two things: outstanding...
20503,Below is an instruction that describes a task....,The CenFlex superfinish machine is designed fo...
2480,Below is an instruction that describes a task....,The THALION S gas absorption heat pump uses na...


In [8]:
rd_df_sample['text'] = rd_df_sample["prompt"] + rd_df_sample["response"]
# Drop columns
rd_df_sample.drop(columns=['prompt', 'response'], inplace=True)
rd_df_sample.head()

Unnamed: 0,text
18952,Below is an instruction that describes a task....
12584,Below is an instruction that describes a task....
5702,Below is an instruction that describes a task....
20503,Below is an instruction that describes a task....
2480,Below is an instruction that describes a task....


Below is an instruction that describes a task. Write a response that appropriately completes the request.

###Instruction:  
Create a detailed description for the following product: Corelogic Smooth Mouse, belonging to category: Optical Mouse

###Response:  
Corelogic Smooth Mouse is a mouse that is designed to be used by people who have a hard time using a mouse. The mouse is designed to be used by people who have a hard time using a mouse. The mouse is designed to be used by people who have a hard time using a mouse. The mouse is designed to be used by people who have a hard time using a mouse. The mouse is designed to be used by people who have a hard time using a mouse. The mouse is designed to be used by people who have a hard time using a mouse. The mouse is designed to be used by people who have a hard

## The Turnable Knobs
Two of the LoRA hyperparameters, r and target_modules, have been shown empirically to affect adaptation quality significantly and are the focus of the tests that follow. The other hyperparameters are kept constant, at the values shown in the LoraConfig cell below, for simplicity.

It is common practice to target only the attention blocks of the transformer. However, the QLoRA paper by Dettmers et al. suggests that targeting all linear layers results in better adaptation quality. This will be explored here as well.  
Dettmers et al., "QLoRA: Efficient Finetuning of Quantized LLMs": https://arxiv.org/abs/2305.14314
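
To get a feel for how r and target_modules change the adapter size, the number of trainable LoRA parameters can be estimated by hand: each adapted linear layer with shape (d_in, d_out) adds r * (d_in + d_out) parameters. The sketch below assumes the layer dimensions of open_llama_3b_v2 (hidden size 3200, MLP size 8640, 26 layers, vocabulary 32000); these numbers are assumptions and should be checked against the model config. `count_lora_params` is a hypothetical helper.

```python
# Assumed model dimensions (verify against the model's config.json).
HIDDEN, INTERMEDIATE, LAYERS, VOCAB = 3200, 8640, 26, 32000

# (d_in, d_out) of each linear layer inside one transformer block.
PER_LAYER_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def count_lora_params(target_modules, r):
    """Estimate trainable LoRA parameters: r * (d_in + d_out) per adapted layer."""
    total = 0
    for name in target_modules:
        if name == "lm_head":
            # The output head exists once, not once per block.
            total += r * (HIDDEN + VOCAB)
        else:
            d_in, d_out = PER_LAYER_SHAPES[name]
            total += LAYERS * r * (d_in + d_out)
    return total


attention_only = ["q_proj", "v_proj"]
all_linear = ["q_proj", "k_proj", "v_proj", "o_proj",
              "gate_proj", "down_proj", "up_proj", "lm_head"]

if __name__ == "__main__":
    print(count_lora_params(attention_only, r=16))  # 5324800 under the assumed dims
    print(count_lora_params(all_linear, r=16))      # 25989120 under the assumed dims
```

Under these assumptions, targeting all linear layers trains roughly five times as many adapter parameters as targeting q_proj and v_proj alone, and the count grows linearly with r.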

In [9]:
from peft import LoraConfig

# If only targeting the attention blocks of the model.
# Set the modules to fine-tune with LoRA; print the model to see its exact module names.
target_modules = ["q_proj", "v_proj"]

# If targeting all linear layers
# target_modules = ['q_proj','k_proj','v_proj','o_proj','gate_proj','down_proj','up_proj','lm_head']

lora_config = LoraConfig(
    r=16,
    target_modules = target_modules,
    lora_alpha=8,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    # full_determinism=False,
)
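
What r and lora_alpha do inside an adapted layer can be shown with a toy forward pass. In LoRA the layer output is W x + (lora_alpha / r) * B (A x), where A has r rows and B has r columns, so the config above scales the update by 8 / 16 = 0.5. The sketch below uses tiny hand-written matrices and plain Python lists; `lora_forward` is a hypothetical helper, not the peft implementation.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, r, lora_alpha):
    """Toy LoRA layer: frozen W plus a low-rank update scaled by lora_alpha / r."""
    scale = lora_alpha / r
    base = matvec(W, x)               # frozen pretrained path
    update = matvec(B, matvec(A, x))  # low-rank path: project down to r, then up
    return [b + scale * u for b, u in zip(base, update)]


if __name__ == "__main__":
    W = [[1, 0, 2], [0, 1, 0]]   # frozen weight, shape (2, 3)
    A = [[1, 0, 0], [0, 1, 0]]   # down-projection, shape (r, 3) with r = 2
    x = [1, 2, 3]

    B_zero = [[0, 0], [0, 0]]    # B starts at zero, so the adapter is a no-op
    print(lora_forward(W, A, B_zero, x, r=2, lora_alpha=1))     # [7.0, 2.0]

    B_trained = [[1, 0], [0, 1]]  # made-up "trained" values
    print(lora_forward(W, A, B_trained, x, r=2, lora_alpha=1))  # [7.5, 3.0]
```

With B initialized to zero, which is the default initialization in peft, the adapted layer reproduces the base layer exactly. This is why the model behaves the same with or without the adapters until fine-tuning has updated them.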


## Tuning the finetuning with LoRA

In [10]:
# Quantization config (4-bit NF4 with double quantization)
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
# Load the quantized base model
model_path = 'openlm-research/open_llama_3b_v2'
raw_model = AutoModelForCausalLM.from_pretrained(
    model_path, device_map='auto', quantization_config=nf4_config,
)
# Unquantized alternative:
# model = AutoModelForCausalLM.from_pretrained(
#     model_path, device_map='auto',
# )

# get_peft_model only attaches freshly initialized LoRA adapters; it does not load
# fine-tuned LoRA weights. Before fine-tuning, the outputs with and without the
# adapters are therefore almost the same.
model = get_peft_model(raw_model, lora_config)
tokenizer = LlamaTokenizer.from_pretrained(model_path)

You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. If you see this, DO NOT PANIC! This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
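
The reason for loading the base model in 4-bit is memory. A rough estimate of the weight footprint is parameter count times bits per parameter. The sketch below is simple arithmetic under loud assumptions: the parameter count of about 3.4 billion is an assumed round figure, and the estimate ignores quantization constants, layers that stay unquantized, activations, optimizer state and the CUDA context. `weight_memory_gib` is a hypothetical helper.

```python
def weight_memory_gib(n_params, bits_per_param):
    """Approximate memory needed for the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


if __name__ == "__main__":
    n_params = 3_400_000_000  # assumed parameter count for a "3B" model
    for label, bits in [("fp16/bf16", 16), ("int8", 8), ("nf4", 4)]:
        print(f"{label:>9}: {weight_memory_gib(n_params, bits):.2f} GiB")
```

The point of the comparison is the ratio: 4-bit weights take about a quarter of the memory of 16-bit weights, which leaves more room on a consumer GPU for activations and the LoRA optimizer state.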


In [11]:
# Pass in a prompt and infer with the model
prompt = 'Q: Create a detailed description for the following product: Corelogic Smooth Mouse, belonging to category: Optical Mouse\nA:'
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

generation_output = model.generate(
    input_ids=input_ids, max_new_tokens=250
)

print(tokenizer.decode(generation_output[0]))



<s>Q: Create a detailed description for the following product: Corelogic Smooth Mouse, belonging to category: Optical Mouse
A: Corelogic Smooth Mouse is a mouse that is designed to be used with a computer. It is a wireless mouse that has a 2.4 GHz wireless connection. It has a 2.4 GHz wireless connection and a 2.4 GHz wireless connection. It has a 2.4 GHz wireless connection and a 2.4 GHz wireless connection. It has a 2.4 GHz wireless connection and a 2.4 GHz wireless connection. It has a 2.4 GHz wireless connection and a 2.4 GHz wireless connection. It has a 2.4 GHz wireless connection and a 2.4 GHz wireless connection. It has a 2.4 GHz wireless connection and a 2.4 GHz wireless connection. It has a 2.4 GHz wireless connection and a 2.4 GHz wireless connection. It has a 2.4 GHz wireless connection and a 2.4 GHz wireless connection. It has a 2.4 GHz wireless connection and a 2.4 GHz wireless connection. It has a 2.4 GHz wireless connection and a 2.4 G


Q: Create a detailed description for the following product: Corelogic Smooth Mouse, belonging to category: Optical Mouse

A: The product is a mouse that has a smooth surface. It is a mouse that is used for computer use. It is a mouse that is used for computer use. It is a mouse that is used for computer use. It is a mouse that is used for computer use. It is a mouse that is used for computer use. It is a mouse that is used for computer use. It is a mouse that is used for computer use. It is a mouse that is used for computer use. It is a mouse that is used for computer use. It is a mouse that is used for computer use. It is a mouse that is used

In [12]:
base_dir = "./"
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
optim = 'adamw_hf'
learning_rate = 1e-5
max_grad_norm = 0.3
warmup_ratio = 0.03
lr_scheduler_type = "linear"

from transformers import TrainingArguments
training_args = TrainingArguments(
    output_dir=base_dir,
    save_strategy="epoch",
    evaluation_strategy="epoch",
    num_train_epochs = 3.0,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    learning_rate=learning_rate,
    fp16=True,
    max_grad_norm=max_grad_norm,
    warmup_ratio=warmup_ratio,
    group_by_length=True,
    lr_scheduler_type=lr_scheduler_type,
)

In [13]:
dataset = Dataset.from_pandas(rd_df_sample).train_test_split(test_size=0.05, seed=42)
trainer = SFTTrainer( # Supervised Finetuning Trainer
    # trainer = Trainer( 
    model,
    train_dataset=dataset['train'],
    eval_dataset =dataset['test'],
    dataset_text_field="text",
    max_seq_length=256,
    args=training_args,
)

# Initiate the training process
with mlflow.start_run(run_name = 'test'):
    trainer.train()

Using pad_token, but it is not set yet.
Map: 100%|██████████| 4750/4750 [00:00<00:00, 14222.46 examples/s]
Map: 100%|██████████| 250/250 [00:00<00:00, 14290.25 examples/s]
  0%|          | 0/891 [00:00<?, ?it/s]You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
                                                 
 33%|███▎      | 297/891 [16:09<26:25,  2.67s/it]

{'eval_loss': 2.0675911903381348, 'eval_runtime': 25.9704, 'eval_samples_per_second': 9.626, 'eval_steps_per_second': 1.232, 'epoch': 1.0}


 56%|█████▌    | 500/891 [27:02<25:12,  3.87s/it]  

{'loss': 2.2971, 'learning_rate': 4.525462962962963e-06, 'epoch': 1.68}


                                                 
 67%|██████▋   | 594/891 [32:26<13:44,  2.78s/it]

{'eval_loss': 1.949227213859558, 'eval_runtime': 26.0417, 'eval_samples_per_second': 9.6, 'eval_steps_per_second': 1.229, 'epoch': 2.0}


                                                 
100%|██████████| 891/891 [48:46<00:00,  3.28s/it]

{'eval_loss': 1.9263529777526855, 'eval_runtime': 25.9703, 'eval_samples_per_second': 9.626, 'eval_steps_per_second': 1.232, 'epoch': 3.0}
{'train_runtime': 2926.5679, 'train_samples_per_second': 4.869, 'train_steps_per_second': 0.304, 'train_loss': 2.1083144542210297, 'epoch': 3.0}
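
The step counts and the learning rate in the log above can be reproduced from the training arguments. The sketch below assumes the usual Trainer arithmetic (batches per epoch rounded up, integer-divided by gradient_accumulation_steps, warmup steps rounded up) and a linear schedule with warmup; the helper names are hypothetical, and the 4750 training examples come from the 95/5 split of the 5000 samples.

```python
import math


def schedule_steps(n_train, per_device_bs, grad_accum, epochs, warmup_ratio):
    """Return (optimizer steps per epoch, total steps, warmup steps)."""
    batches_per_epoch = math.ceil(n_train / per_device_bs)
    steps_per_epoch = max(batches_per_epoch // grad_accum, 1)
    total_steps = math.ceil(epochs * steps_per_epoch)
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return steps_per_epoch, total_steps, warmup_steps


def linear_lr(step, base_lr, total_steps, warmup_steps):
    """Linear warmup from 0 to base_lr, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


if __name__ == "__main__":
    per_epoch, total, warmup = schedule_steps(
        n_train=4750, per_device_bs=4, grad_accum=4, epochs=3.0, warmup_ratio=0.03
    )
    print(per_epoch, total, warmup)            # 297 891 27
    print(4 * 4)                               # effective batch size: 16
    print(linear_lr(500, 1e-5, total, warmup)) # about 4.5255e-06, as logged at step 500
```

The result agrees with the log: 297 steps per epoch, 891 steps in total, and a learning rate of about 4.525e-06 at step 500.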





In [14]:

# Pass in a prompt and infer with the model
prompt = 'Q: Create a detailed description for the following product: Corelogic Smooth Mouse, belonging to category: Optical Mouse\nA:'
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

generation_output = model.generate(
    input_ids=input_ids, max_new_tokens=250
)

print(tokenizer.decode(generation_output[0]))



<s> Q: Create a detailed description for the following product: Corelogic Smooth Mouse, belonging to category: Optical Mouse
A: The Corelogic Smooth Mouse is a high-end optical mouse with a 1000 DPI sensor. The mouse is equipped with a 3-button mouse wheel and a scroll wheel. The mouse is available in black and white.
</s>


In [18]:
# merge_and_unload() is not supported for the quantized model in this peft version:
# ValueError: Cannot merge LORA layers when the model is loaded in 8-bit mode.
# merged_model = model.merge_and_unload()
model.save_pretrained("adapter_model")  # Save only the adapter (the fine-tuned LoRA parameters)

## Load saved model and test it

In [2]:
# Load the pretrained base model and the saved LoRA adapter
# lora_path = 'fine-tuned-llama-3B-LoRA'
lora_path = './adapter_model'
model_path = 'openlm-research/open_llama_3b_v2'

model = LlamaForCausalLM.from_pretrained(
    model_path, load_in_8bit=True, device_map='auto',
)

model = PeftModelForCausalLM.from_pretrained(model, lora_path)
tokenizer = LlamaTokenizer.from_pretrained(model_path)

prompt = 'Q: Create a detailed description for the following product: Corelogic Smooth Mouse, belonging to category: Optical Mouse\nA:'
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
# generation_output = peft_model.generate(input_ids=input_ids, max_new_tokens=250)
generation_output = model.generate(input_ids=input_ids.cuda(), max_new_tokens=250)
print(tokenizer.decode(generation_output[0]))

You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. If you see this, DO NOT PANIC! This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565


<s>Q: Create a detailed description for the following product: Corelogic Smooth Mouse, belonging to category: Optical Mouse
A: The Corelogic Smooth Mouse is a high-quality optical mouse with a smooth surface. The mouse is equipped with a 1000 dpi sensor and a 1000 Hz polling rate. The mouse is compatible with Windows 7, 8 and 10.
Q: Create a detailed description for the following product: Corelogic Smooth Mouse, belonging to category: Optical Mouse
A: The Corelogic Smooth Mouse is a high-quality optical mouse with a smooth surface. The mouse is equipped with a 1000 dpi sensor and a 1000 Hz polling rate. The mouse is compatible with Windows 7, 8 and 10.
Q: Create a detailed description for the following product: Corelogic Smooth Mouse, belonging to category: Optical Mouse
A: The Corelogic Smooth Mouse is a high-quality optical mouse with a smooth surface. The mouse is equipped with a 1000 dpi sensor and a 1000 Hz polling rate. The mouse is compatible with Windows 7, 8 and 10.
Q: Create 
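
In the output above the model keeps repeating the question and answer until max_new_tokens is reached. One simple way to handle this is to post-process the decoded string and keep only the first answer. The sketch below is a plain-Python illustration; `extract_answer` and its default stop markers are hypothetical choices, not part of transformers or peft.

```python
def extract_answer(decoded, prompt, stop_markers=("\nQ:", "### End", "</s>")):
    """Strip the prompt and cut the generation at the earliest stop marker."""
    # Drop the leading BOS token text and any whitespace after it.
    text = decoded.replace("<s>", "", 1).lstrip()
    if text.startswith(prompt):
        text = text[len(prompt):]
    # Cut at whichever stop marker appears first.
    cut = len(text)
    for marker in stop_markers:
        idx = text.find(marker)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut].strip()


if __name__ == "__main__":
    # Shortened, made-up stand-in for the decoded output.
    prompt = "Q: Describe X\nA:"
    decoded = "<s>Q: Describe X\nA: It is good.\nQ: Describe X\nA: It is good.\nQ: Create "
    print(extract_answer(decoded, prompt))  # It is good.
```

Trimming after generation does not save compute. It only cleans up the text, so lowering max_new_tokens or stopping on an end-of-sequence token during generation remains worth considering.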