## Inference with the Fine-Tuned LLaMA2-7B

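### 1. The Nature and Principles of the Alpaca Format
- Core definition
The Alpaca format is the instruction-tuning data format introduced by Stanford University in "Alpaca: A Strong, Replicable Instruction-Following Model". In essence, it is a set of structured instruction-output pairs used to train a model to understand and follow human instructions.
- Data structure
A typical Alpaca-format sample has three core fields: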
```json
{
  "instruction": "Translate the following Chinese text into English",
  "input": "今天的天气非常好",
  "output": "The weather is very nice today."
}
```
  - `instruction`: the explicit task instruction (required)
  - `input`: the input content for the task (may be empty)
  - `output`: the expected output (required)
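- Design principles

Task generalization: diverse instructions teach the model task patterns rather than answers to specific questions.

Zero-shot prompting: the format mimics real usage, where the user supplies no examples.

Input flexibility: the `input` field may be empty, which supports instruction-only tasks (for example, "write a poem").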

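- Effects

Goes beyond the question-answer limits of traditional datasets and covers complex tasks (translation, reasoning, creative writing).

Improves the model's ability to generalize across instructions, rather than memorizing specific QA pairs.

Has become a de facto standard format for instruction fine-tuning in the open-source community (used with LLaMA, ChatGLM, and others).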
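To make the three fields concrete, here is a minimal sketch of how one record could be rendered into the `### Instruction` / `### Input` / `### Response` layout used by the prompts in this notebook. The helper name `build_prompt` is hypothetical, and the exact whitespace is an assumption based on those cells, not a confirmed copy of the training-time template.

```python
def build_prompt(record: dict) -> str:
    """Render an Alpaca-style record as a prompt that ends at the response header.

    Assumption: this layout mirrors the prompts typed by hand in this notebook.
    """
    lines = ["### Instruction:", record["instruction"], "", "### Input:"]
    input_text = record.get("input") or ""
    if input_text:
        # Only add the input line when the field is non-empty
        lines.append(input_text)
    # Leave the response section empty so the model fills it in
    lines += ["", "### Response:", ""]
    return "\n".join(lines)


with_input = build_prompt(
    {
        "instruction": "Translate the following Chinese text into English",
        "input": "今天的天气非常好",
        "output": "The weather is very nice today.",
    }
)
no_input = build_prompt({"instruction": "Write a poem.", "input": ""})
print(with_input)
print(no_input)
```

The `output` field is deliberately left out of the prompt: at inference time it is what the model is expected to generate.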

In [5]:
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

#llama-7-int4-dolly-20250805_184243
model_dir = "models/llama-7-int4-dolly-20250817_095723"
 
# Load the base LLM with its PEFT adapter, plus the tokenizer
model = AutoPeftModelForCausalLM.from_pretrained(
    model_dir,
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
    load_in_4bit=True,
) 
tokenizer = AutoTokenizer.from_pretrained(model_dir)




In [6]:
model

PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): LlamaForCausalLM(
      (model): LlamaModel(
        (embed_tokens): Embedding(32000, 4096, padding_idx=0)
        (layers): ModuleList(
          (0-31): 32 x LlamaDecoderLayer(
            (self_attn): LlamaSdpaAttention(
              (q_proj): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=16, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=16, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (k_proj): Linear4bit(in_features=4096, out_feature
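
The printout shows `q_proj` wrapped by a rank-16 adapter: `lora_A` maps 4096 to 16 features and `lora_B` maps 16 back to 4096, both without bias. As a rough worked example, the sketch below counts the adapter parameters this implies for `q_proj` alone. The printout is truncated, so which other projections carry adapters is not visible here and is not assumed; `lora_param_count` is a hypothetical helper.

```python
def lora_param_count(in_features: int, out_features: int, rank: int) -> int:
    """Parameters in one LoRA pair: lora_A (in x rank) plus lora_B (rank x out), no bias."""
    return in_features * rank + rank * out_features


# Shapes read from the printout above
per_module = lora_param_count(in_features=4096, out_features=4096, rank=16)
num_layers = 32  # "(0-31): 32 x LlamaDecoderLayer"
q_proj_total = per_module * num_layers

# Size of the frozen 4096 x 4096 base weight the adapter sits next to
base_weight = 4096 * 4096

print(per_module)                # adapter parameters per q_proj
print(q_proj_total)              # adapter parameters across all 32 q_proj modules
print(per_module / base_weight)  # adapter size relative to one base weight matrix
```

Under these shapes each adapter pair is 1/128 the size of the weight matrix it modifies, which is why the fine-tuned checkpoint can stay small.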

In [7]:
from datasets import load_dataset 
from random import randrange
 
 
# Load the dataset from the Hub and pick a random sample
dataset = load_dataset("databricks/databricks-dolly-15k", split="train")
sample = dataset[randrange(len(dataset))]
 
prompt = f"""### Instruction:
Use the Input below to create an instruction, which could have been used to generate the input using an LLM. 
 
### Input:
{sample['response']}
 
### Response:
"""
 
input_ids = tokenizer(prompt, return_tensors="pt", truncation=True).input_ids.cuda()

outputs = model.generate(input_ids=input_ids, max_new_tokens=100, do_sample=True, top_p=0.9,temperature=0.9)

print(f"Prompt:\n{sample['response']}\n")
print(f"Generated instruction:\n{tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0][len(prompt):]}")
print(f"Ground truth:\n{sample['instruction']}")

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


Prompt:
Some examples of woodwind instruments include flute, clarinet, and oboe.

Generated instruction:
What are some examples of woodwind instruments?

Ground truth:
What are some examples of a woodwind instrument?
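
The cell above removes the prompt by slicing the decoded string with `[len(prompt):]`. That can be off by a few characters if decoding does not reproduce the prompt exactly (for instance around whitespace). A common alternative is to drop the prompt at the token level before decoding. The sketch below shows the idea on plain lists; `new_token_ids` is a hypothetical helper and the ids are illustrative values, not real tokenizer output.

```python
from typing import List, Sequence


def new_token_ids(output_ids: Sequence[int], prompt_len: int) -> List[int]:
    """Return only the ids generated after the first `prompt_len` prompt tokens."""
    if prompt_len < 0 or prompt_len > len(output_ids):
        raise ValueError("prompt_len must be between 0 and len(output_ids)")
    return list(output_ids[prompt_len:])


# Toy ids standing in for a tokenized prompt and a generated continuation
prompt_ids = [1, 835, 2799]
generated = prompt_ids + [1724, 526, 2]

continuation = new_token_ids(generated, len(prompt_ids))
print(continuation)
```

With the variables from the cell above, the equivalent would be roughly `tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True)`, since a decoder-only model's `generate` output begins with the prompt tokens.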


In [8]:
prompt

'### Instruction:\nUse the Input below to create an instruction, which could have been used to generate the input using an LLM. \n\n### Input:\nSome examples of woodwind instruments include flute, clarinet, and oboe.\n\n### Response:\n'

In [9]:
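# The instruction below is Chinese for "Write a poem."; it is left untranslated
# because it is the exact text sent to the model.
prompt = f"""### Instruction: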
写一首诗. 
 
### Input:

### Response:
"""
 
input_ids = tokenizer(prompt, return_tensors="pt", truncation=True).input_ids.cuda()

outputs = model.generate(input_ids=input_ids, max_new_tokens=100, do_sample=True, top_p=0.9,temperature=0.9)

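# Note: these print statements are reused from cell 7, so "Prompt" and "Ground truth"
# still show the earlier Dolly sample, and "Generated instruction" is really the
# model's response to the prompt above.
print(f"Prompt:\n{sample['response']}\n")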
print(f"Generated instruction:\n{tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0][len(prompt):]}")
print(f"Ground truth:\n{sample['instruction']}")

Prompt:
Some examples of woodwind instruments include flute, clarinet, and oboe.

Generated instruction:

### Response:

### Response:

### Response:

### Response:

### Response:

### Response:

### Response:

### Response:

### Response:

### Response:

### Response:

### Response:

### Response:

### Response:

### Response:

### Response:

### Response
Ground truth:
What are some examples of a woodwind instrument?
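
In this run the model produced no poem and only repeated the `### Response:` header until it hit `max_new_tokens`. One simple mitigation on the output side is to cut the decoded text at the first section marker. The sketch below assumes `"### "` is a suitable marker and uses a hypothetical helper, `truncate_at_marker`; it only trims the text and does not make the model answer the instruction.

```python
def truncate_at_marker(text: str, marker: str = "### ") -> str:
    """Keep the text before the first section marker and strip surrounding whitespace."""
    cut = text.find(marker)
    if cut != -1:
        text = text[:cut]
    return text.strip()


# Shaped like the degenerate output above: only repeated headers
degenerate = "\n### Response:\n\n### Response:\n\n### Response"
# A hypothetical well-formed answer followed by a spurious extra section
mixed = "Roses are red.\n\n### Response:\nRoses are red."

print(repr(truncate_at_marker(degenerate)))
print(repr(truncate_at_marker(mixed)))
```

An empty result is then an easy signal that the generation failed. On the generation side, `model.generate` also accepts options such as `repetition_penalty` and `no_repeat_ngram_size`; whether they help with this particular checkpoint was not tested here.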


In [10]:
# Use an example from the training data
prompt = f"""### Instruction:
What is salvation in christianity? 
 
### Input:

### Response:
"""
 
input_ids = tokenizer(prompt, return_tensors="pt", truncation=True).input_ids.cuda()

outputs = model.generate(input_ids=input_ids, max_new_tokens=100, do_sample=True, top_p=0.9,temperature=0.9)

# Note: as in cell 9, "Prompt" and "Ground truth" below still refer to the cell 7 sample.
print(f"Prompt:\n{sample['response']}\n")
print(f"Generated instruction:\n{tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0][len(prompt):]}")
print(f"Ground truth:\n{sample['instruction']}")

Prompt:
Some examples of woodwind instruments include flute, clarinet, and oboe.

Generated instruction:

What is salvation in Christianity?

Ground truth:
What are some examples of a woodwind instrument?
