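## Notes:
1. Fine-tune from Meta-Llama-3-8B-Instruct. Either the original release or a Chinese fine-tuned variant works. Avoid the pretrained base model Meta-Llama-3-8B: its tokenizer.apply_chat_template does not produce the Llama 3 chat format but a ChatML-style string such as '<|im_start|>user\n你是谁？<|im_end|>\n<|im_start|>assistant\n' (the user text means "Who are you?"), so you would have to override apply_chat_template yourself.
2. Either LlamaForCausalLM or AutoModelForCausalLM can load the model for fine-tuning, but torch_dtype must be set to torch.bfloat16. With float16 the training loss is very large, about two orders of magnitude higher than with bfloat16, and the resulting model is broken. The cause is not yet known.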
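
If the tokenizer's built-in template cannot be trusted (note 1), the prompt string can be assembled by hand. The sketch below is a minimal stand-in, not the official template: `format_llama3_chat` is a hypothetical helper name, and the layout is copied from the prompt string used in the preprocessing cell of this notebook.

```python
def format_llama3_chat(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts in the Llama 3 chat layout."""
    parts = ["<|begin_of_text|>"]
    for message in messages:
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content']}<|eot_id|>"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


prompt = format_llama3_chat([{"role": "user", "content": "你是谁？"}])
print(repr(prompt))
```

Because the string already starts with `<|begin_of_text|>`, it should be tokenized with `add_special_tokens=False`, as the preprocessing cell does, so the BOS token is not added twice.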

In [1]:
from transformers import LlamaForCausalLM, AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "D:/PycharmProjects/LLM_Project/huanhuan_chat/models/LLM-Research/Meta-Llama-3-8B-Instruct"
# model_name = "../models/Meta-Llama-3-8B-Instruct"

model = AutoModelForCausalLM.from_pretrained(model_name, device_map="cuda:0", torch_dtype=torch.bfloat16, use_cache=False)  # device_map="auto" would place the layers automatically
# model = LlamaForCausalLM.from_pretrained(model_name, device_map="cuda:0",torch_dtype=torch.bfloat16, use_cache=False)
model.enable_input_require_grads()
# model = prepare_model_for_int8_training(model)

tokenizer = AutoTokenizer.from_pretrained(model_name)  # this Llama 3 copy, downloaded from ModelScope, differs slightly from the HF one
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
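
The notes say the float16 failure is unexplained, and the sketch below does not diagnose it. It only illustrates one well-known difference between the two dtypes chosen via `torch_dtype`: float16 has a much narrower range than bfloat16. It uses only the standard library; `fits_float16` and `to_bfloat16` are hypothetical helper names, and bfloat16 is emulated by truncating a float32 (real hardware usually rounds instead).

```python
import struct


def fits_float16(value):
    """Return True if value can be packed as a finite IEEE half-precision float."""
    try:
        struct.pack("<e", value)
        return True
    except OverflowError:
        return False


def to_bfloat16(value):
    """Emulate bfloat16 by keeping the top 16 bits of the float32 encoding."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


big = 70000.0
print(fits_float16(65504.0), fits_float16(big))  # 65504 is the largest finite float16
print(to_bfloat16(big))                          # still finite, but coarser
```

Whether overflow of this kind is what actually drives the loss up in this run is an open question; checking for inf or nan values in the float16 run would be one way to find out.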


In [2]:
from peft import LoraConfig, TaskType, get_peft_model

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    inference_mode=False,  # training mode
    r=8,  # LoRA rank
    lora_alpha=32,  # LoRA alpha; the low-rank update is scaled by lora_alpha / r
    lora_dropout=0.05,  # dropout probability on the LoRA branch
    bias="none",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

trainable params: 20,971,520 || all params: 8,051,232,768 || trainable%: 0.2605
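
The trainable-parameter count above can be reproduced by hand. This sketch assumes the published Llama 3 8B shapes (hidden size 4096, 8 key/value heads of dimension 128, MLP width 14336, 32 layers); each adapted linear layer adds two matrices, A of shape r x in and B of shape out x r.

```python
# Llama 3 8B shapes (assumed from the public model config).
HIDDEN = 4096
KV_DIM = 1024          # 8 key/value heads x 128 head dim
INTERMEDIATE = 14336
NUM_LAYERS = 32
RANK = 8

# (in_features, out_features) of every targeted linear layer in one block
target_shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, rank):
    # A is rank x in_features, B is out_features x rank
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in target_shapes.values())
trainable = per_layer * NUM_LAYERS
total = 8_051_232_768  # "all params" from the output above
print(f"{trainable:,} trainable, {100 * trainable / total:.4f}%")
```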


In [3]:
def preprocess(sample):
    # Build the Llama 3 chat prompt by hand. The BOS token is already in the string,
    # so encode with add_special_tokens=False to avoid adding it twice.
    prompt = f"<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n{sample['instruction']}{sample['input']}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    prompt = tokenizer.encode(prompt, add_special_tokens=False)  # optional: +tokenizer.eos_token
    output = tokenizer.encode(f"{sample['output']}<|eot_id|>", add_special_tokens=False)  # optional: +tokenizer.eos_token

    sample = {
        "input_ids": prompt + output,
        "attention_mask": [1] * (len(prompt) + len(output)),
        "labels": [-100] * len(prompt) + output,  # -100 keeps the prompt out of the loss
    }
    return sample
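
The masking logic in `preprocess` can be checked without a tokenizer. The sketch below works on made-up token id lists; `build_example` is a hypothetical helper, and its `max_length` cap is an addition for illustration that the original cell does not have.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def build_example(prompt_ids, output_ids, max_length=None):
    """Concatenate prompt and response ids, supervising only the response."""
    input_ids = prompt_ids + output_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + output_ids
    if max_length is not None:  # hypothetical cap, not in the original cell
        input_ids = input_ids[:max_length]
        labels = labels[:max_length]
    return {
        "input_ids": input_ids,
        "attention_mask": [1] * len(input_ids),
        "labels": labels,
    }


example = build_example([128000, 128006, 882], [103001, 246, 128009], max_length=5)
supervised = [t for t in example["labels"] if t != IGNORE_INDEX]
print(example)
print(supervised)
```

Note that truncation can cut off the closing `<|eot_id|>`, as it does here, so a real pipeline might prefer to drop over-long samples rather than clip them.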

In [4]:
import datasets

ds = datasets.load_dataset('json', data_files={'train': '../dataset/huanhuan.json'})
train_dataset = ds["train"]
# train_dataset = ds["train"].map(process_func, remove_columns=train_dataset.column_names)
train_dataset = ds["train"].map(preprocess, remove_columns=train_dataset.column_names)

Generating train split: 0 examples [00:00, ? examples/s]

Map:   0%|          | 0/3729 [00:00<?, ? examples/s]

In [5]:
print(tokenizer.eos_token, tokenizer.eos_token_id)
print(tokenizer.encode('<|end_of_text|>', add_special_tokens=False))
print(tokenizer.encode('<|eot_id|>', add_special_tokens=False))
print(train_dataset[0])

<|end_of_text|> 128001
[128001]
[128009]
{'input_ids': [128000, 128006, 882, 128007, 271, 111319, 3922, 64022, 9554, 106241, 58850, 72368, 19000, 32018, 16325, 31867, 3922, 113723, 19361, 100389, 109, 80578, 111319, 101067, 101307, 58843, 224, 104241, 45829, 3922, 86894, 102, 113715, 111419, 41914, 50928, 89151, 89151, 103203, 9554, 8713, 128009, 128006, 78191, 128007, 271, 103001, 246, 8713, 72368, 37687, 104894, 110767, 37687, 105150, 126957, 108298, 9554, 1811, 128009], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'labels': [-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 103001, 246, 8713, 72368, 37687, 10

In [6]:
from transformers import TrainingArguments, Trainer, DataCollatorForSeq2Seq

args = TrainingArguments(
    output_dir="output/llama3",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    logging_steps=10,
    num_train_epochs=1,
    save_steps=100,
    # learning_rate=1e-4,
    save_on_each_node=True,
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs = {"use_reentrant": False},
    report_to="none"
)

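# Custom optimizer and scheduler. They only take effect if passed to Trainer via optimizers=(optimizer, scheduler).
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-4,
    weight_decay=0.0,
)
# scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.85)  # decay the learning rate whenever the epoch is a multiple of step_size
scheduler = torch.optim.lr_scheduler.LinearLR(optimizer, total_iters=100)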

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer, padding=True),
    # optimizers=(optimizer, scheduler)
)
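
`DataCollatorForSeq2Seq` is used here because the samples have different lengths and already carry their own `labels`. As a rough picture of what it does with `padding=True`, the sketch below pads a batch in plain Python: `input_ids` get the pad token, `attention_mask` gets 0, and `labels` get -100 so padding never contributes to the loss. `pad_batch` is a hypothetical helper, not the library's implementation, and it only covers right padding.

```python
def pad_batch(features, pad_token_id, label_pad_id=-100):
    """Right-pad every feature to the longest sequence in the batch."""
    longest = max(len(f["input_ids"]) for f in features)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for f in features:
        gap = longest - len(f["input_ids"])
        batch["input_ids"].append(f["input_ids"] + [pad_token_id] * gap)
        batch["attention_mask"].append(f["attention_mask"] + [0] * gap)
        batch["labels"].append(f["labels"] + [label_pad_id] * gap)
    return batch


features = [
    {"input_ids": [1, 2, 3, 4], "attention_mask": [1, 1, 1, 1], "labels": [-100, -100, 3, 4]},
    {"input_ids": [5, 6], "attention_mask": [1, 1], "labels": [-100, 6]},
]
batch = pad_batch(features, pad_token_id=128001)  # 128001 is the eos/pad id printed earlier
print(batch)
```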

In [7]:
trainer.train()

Step,Training Loss
10,3.9494
20,3.2911
30,3.0479
40,2.9438
50,2.9229
60,2.9187
70,2.8501
80,2.9035
90,2.8964
100,2.775


TrainOutput(global_step=233, training_loss=2.9030128896492235, metrics={'train_runtime': 259.066, 'train_samples_per_second': 14.394, 'train_steps_per_second': 0.899, 'total_flos': 1.6972641920876544e+16, 'train_loss': 2.9030128896492235, 'epoch': 0.9989281886387996})
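
The `global_step=233` and `epoch` of about 0.9989 in this result follow from the dataset size and batch settings. The sketch below assumes a single GPU and assumes that leftover micro-batches at the end of the epoch do not count as an optimizer step, which is inferred from the reported numbers rather than taken from the Trainer source.

```python
import math

num_samples = 3729      # from the "Map" progress line
per_device_batch = 4    # per_device_train_batch_size
grad_accum = 4          # gradient_accumulation_steps
num_epochs = 1

micro_batches = math.ceil(num_samples / per_device_batch)  # dataloader length
steps_per_epoch = micro_batches // grad_accum              # leftover micro-batches are not counted
global_step = steps_per_epoch * num_epochs
epoch_fraction = global_step * grad_accum / micro_batches

print(global_step, epoch_fraction)
```

With an effective batch of 16 samples per optimizer step, `logging_steps=10` means each row of the loss table averages roughly 160 samples.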

In [27]:
dialogs = [[
    # {"role": "system", "content": "现在你要扮演皇帝身边的女人--甄嬛"},  # "You are now playing Zhen Huan, the woman at the emperor's side"
    {"role": "user", "content": "你叫什么？"}  # "What is your name?"
]]
chats = tokenizer.apply_chat_template(dialogs, add_generation_prompt=True, tokenize=True)

print(chats)

[[128000, 128006, 882, 128007, 271, 57668, 105424, 101879, 11571, 128009, 128006, 78191, 128007, 271]]


In [29]:
model.eval()

with torch.no_grad():
    for idx, chat in enumerate(chats):
        tokens = torch.tensor(chat).long()
        tokens = tokens.unsqueeze(0)
        tokens = tokens.to("cuda:0")
        outputs = model.generate(
            input_ids=tokens,
            max_new_tokens=50,
            do_sample=True,
            # top_p=0.9,
            temperature=1,
            use_cache=True,
            top_k=50,
            # repetition_penalty=1.0,
            # length_penalty=1,
            eos_token_id=tokenizer.encode('<|eot_id|>', add_special_tokens=False)[0]
        )
        print(outputs)
        output_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
        print(f"Model output:\n{output_text}")

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.


tensor([[128000, 128006,    882, 128007,    271,  57668, 105424, 101879,  11571,
         128009, 128006,  78191, 128007,    271, 113524, 100525,    122,  21043,
           7518,    226, 115003,    249,   3922,  46729, 104503,  21043,  27384,
          22649, 108553,  83747,  11239,    123,   7518,    226, 106297,  45893,
           1811, 128009]], device='cuda:0')
Model output:
user

你叫什么？assistant

臣妾是甄嬛，家父是大理寺少卿甄远道。
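
The decoded text above still contains the prompt ("user ... assistant") because `generate` returns the prompt tokens followed by the new ones. A minimal sketch of trimming the sequence before decoding, using the token ids printed above (`extract_reply_ids` is a hypothetical helper):

```python
# Token ids copied from the outputs above.
prompt_ids = [128000, 128006, 882, 128007, 271, 57668, 105424, 101879, 11571,
              128009, 128006, 78191, 128007, 271]
generated = prompt_ids + [113524, 100525, 122, 21043, 7518, 226, 115003, 249, 3922,
                          46729, 104503, 21043, 27384, 22649, 108553, 83747, 11239,
                          123, 7518, 226, 106297, 45893, 1811, 128009]
EOT_ID = 128009  # <|eot_id|>


def extract_reply_ids(sequence, prompt_length, stop_id=EOT_ID):
    """Drop the prompt prefix and everything from the first stop token onward."""
    reply = sequence[prompt_length:]
    if stop_id in reply:
        reply = reply[:reply.index(stop_id)]
    return reply


reply_ids = extract_reply_ids(generated, len(prompt_ids))
print(len(reply_ids), reply_ids[:3])
```

In the notebook itself the same effect should come from decoding `outputs[0][tokens.shape[1]:]` instead of `outputs[0]`. The attention-mask warning can likely be silenced by passing `attention_mask=torch.ones_like(tokens)` and an explicit `pad_token_id` to `model.generate`.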
