<a href="https://colab.research.google.com/github/119020/NLP_2025_Spring_Materials/blob/main/Tutorial_5_TrainLLM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Tutorial 5: Train your own LLMs
### **Course Name:** CSC6052/5051/4100/DDA6307/MDS5110 Natural Language Processing




This notebook shows how to fine-tune a chat LLM efficiently with the `transformers` and `peft` Python packages. It covers the following steps:

1. Load Model, Tokenizer and Template for Chat Model.
2. Process Data for Training.
3. Train Model with QLoRA.
4. Evaluate Model's performance.
5. Save and Deploy Trained Model.

## Preliminary Preparation

Before proceeding with model training, ensure your environment is properly configured by following these steps:

1. Install the necessary Python packages.
2. Import the required libraries.

In [None]:
!pip install -q h5py typing-extensions wheel
!pip install -q -U bitsandbytes
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git
!pip install -q datasets


In [None]:
!nvidia-smi

Fri Mar 21 03:38:09 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   39C    P8              9W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                


## Load Pre-trained model and tokenizer

In [None]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
model_id = "Qwen/Qwen2.5-7B-Instruct"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,  # nested (double) quantization: the quantization constants are quantized as well
    bnb_4bit_quant_type="nf4",  # "fp4" or "nf4"; the QLoRA paper recommends nf4 when training on a 4-bit base model (e.g. with LoRA adapters)
    bnb_4bit_compute_dtype=torch.bfloat16
)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map={"":0})

tokenizer = AutoTokenizer.from_pretrained(model_id)


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.
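
As a rough back-of-the-envelope check of why 4-bit loading matters on the 15 GiB T4 reported by `nvidia-smi`, the sketch below estimates the memory needed for the weights alone at different precisions. It assumes roughly 7.6 billion parameters for this model and ignores quantization constants, activations, and the layers that stay in 16-bit, so the figures are lower bounds rather than measurements.

```python
# Rough weight-memory estimate. Assumptions: ~7.6e9 parameters; quantization
# constants, activations, and layers kept in 16-bit are ignored.
N_PARAMS = 7.6e9

def weight_gib(n_params: float, bits_per_param: float) -> float:
    """Return the storage needed for the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30

for name, bits in [("fp16/bf16", 16), ("int8", 8), ("nf4", 4)]:
    print(f"{name:>9}: {weight_gib(N_PARAMS, bits):5.1f} GiB")
```

Under these assumptions the 16-bit weights alone nearly fill the card, while the 4-bit weights leave room for LoRA adapters, optimizer state, and activations.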



## Preprocess the quantized model for training

In [None]:
from peft import prepare_model_for_kbit_training

model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)

In [None]:
from peft import LoraConfig, get_peft_model

# You can try different parameter-efficient strategies for model training; for more info, see https://github.com/huggingface/peft
config = LoraConfig(
    r=8,
    lora_alpha=8,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, config)

## Chat Template Usage

In [None]:
from jinja2 import Template
template = Template(tokenizer.chat_template)
message = "Please introduce yourself"
print(f"message:\n{message}\n")
message_send_to_model=template.render(messages=[{"role": "user", "content": message}],bos_token=tokenizer.bos_token,add_generation_prompt=True)
print(f"message_send_to_model:\n{message_send_to_model}")

message:
Please introduce yourself

message_send_to_model:
<|im_start|>system
You are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>
<|im_start|>user
Please introduce yourself<|im_end|>
<|im_start|>assistant
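
The rendered string above follows the ChatML layout: each turn is wrapped in `<|im_start|>role ... <|im_end|>`, and an open assistant turn is appended so the model continues from there. A minimal pure-Python sketch of that layout follows. `render_chatml` is a hypothetical helper written for illustration, not the tokenizer's actual template, which handles more cases such as tool calls. In practice, `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` renders the real template directly.

```python
# Hypothetical helper that mimics the ChatML layout printed above.
# The default system prompt is copied from that output.
DEFAULT_SYSTEM = "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."

def render_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML string."""
    if not messages or messages[0]["role"] != "system":
        messages = [{"role": "system", "content": DEFAULT_SYSTEM}] + list(messages)
    text = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        text += "<|im_start|>assistant\n"
    return text

print(render_chatml([{"role": "user", "content": "Please introduce yourself"}]))
```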



In [None]:
template = Template(tokenizer.chat_template)
@torch.no_grad()
def generate(prompt):
    modelInput=template.render(messages=[{"role": "user", "content": prompt}],bos_token= tokenizer.bos_token,add_generation_prompt=True)
    print("-"*80)
    print(f"model_input_string:\n{modelInput}")
    input_ids = tokenizer.encode(modelInput, add_special_tokens=False, return_tensors='pt').to("cuda:0")
    outputs = model.generate(input_ids, do_sample=False)
    model_return_string = tokenizer.decode(*outputs, skip_special_tokens=False)
    print("-"*80)
    print(f"model_return_string:\n{model_return_string}")
    generated_ids = outputs[:, input_ids.shape[1]:]
    generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=False)
    return generated_text

query = "Please introduce yourself"
print("-"*80)
print(f"query:\n{query}")
response = generate(query)
print("-"*80)
print(f"response:\n{response}")

The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


--------------------------------------------------------------------------------
query:
Please introduce yourself
--------------------------------------------------------------------------------
model_input_string:
<|im_start|>system
You are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>
<|im_start|>user
Please introduce yourself<|im_end|>
<|im_start|>assistant

--------------------------------------------------------------------------------
model_return_string:
<|im_start|>system
You are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>
<|im_start|>user
Please introduce yourself<|im_end|>
<|im_start|>assistant
Hello! I'm Qwen, an AI assistant created by Alibaba Cloud. I'm here to help
--------------------------------------------------------------------------------
response:
Hello! I'm Qwen, an AI assistant created by Alibaba Cloud. I'm here to help


## Data Preparation

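Let's load Huatuo26M-Lite, a dataset of Chinese medical question-answer pairs, and convert each sample into a two-turn conversation (a human question followed by the answer) for fine-tuning.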

In [None]:
from datasets import load_dataset

# data = load_dataset("Abirate/english_quotes")
dataset = load_dataset("FreedomIntelligence/Huatuo26M-Lite")
dataset = dataset['train'].map(lambda sample: {"conversations": [{"from": "human", "value": sample['question']}, {"from": "gpt", "value": sample['answer']}]}, batched=False)

README.md:   0%|          | 0.00/4.24k [00:00<?, ?B/s]

format_data.jsonl:   0%|          | 0.00/138M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/177703 [00:00<?, ? examples/s]

Map:   0%|          | 0/177703 [00:00<?, ? examples/s]

In [None]:
from torch.utils.data import random_split
train_dataset_size, val_dataset_size = 40, 8
train_dataset, val_dataset, _ = random_split(dataset, [train_dataset_size, val_dataset_size, len(dataset)-train_dataset_size-val_dataset_size])
print(train_dataset[0]['conversations'])

[{'from': 'human', 'value': '我家宝宝三个月了、身上起了很多红疙瘩、让医生看、医生说是湿疹、给开了些抹的药、但是最近又严重了、头皮上都是、然后都化脓结痂了、宝宝很难受、总是闹破、很心疼婴儿湿疹怎么办'}, {'from': 'gpt', 'value': '对于宝宝的湿疹，可以从饮食、洗澡、衣物等方面入手，减轻症状。建议母乳宝宝的母亲不要喝牛奶、不吃鸡蛋，宝宝的湿疹症状就可能会减轻。给宝宝洗澡时仅使用清水，贴身衣物要采用棉质物，要在日光下照射消毒。平时宝宝的内衣应穿松软宽大的棉织品或细软布料，不要穿化纤织物内、外衣均忌羊毛织物，以及绒线衣衫。最好穿棉花料的夹袄、棉袄、绒布衫等。'}]


### Customized Dataset
Create a dataset class named `InstructDataset` that tokenizes each conversation with the chat template and masks the prompt tokens in the labels, plus a data collator that pads each batch.

In [None]:
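import torch
import transformers
from typing import Dict, Sequence, List
from torch.utils.data import Dataset
from dataclasses import dataclass
from jinja2 import Template

# Label value ignored by the loss; defined here because the data collator below uses it.
IGNORE_INDEX = -100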

def preprocess(
    sources,
    tokenizer: transformers.PreTrainedTokenizer,
) -> Dict:
    template = Template(tokenizer.chat_template)
    max_seq_len = tokenizer.model_max_length
    messages = []
    for i, source in enumerate(sources):
        if source[0]["from"] != "human":
            # Skip the first one if it is not from human
            source = source[1:]

        for j in range(0, len(source), 2):
            if j+1 >= len(source): continue
            q = source[j]["value"]
            a = source[j+1]["value"]
            assert q is not None and a is not None, f'q:{q} a:{a}'
            # Full conversation (prompt + answer); named full_text to avoid shadowing the built-in `input`.
            full_text = template.render(messages=[{"role": "user", "content": q},{"role": "assistant", "content": a}],bos_token=tokenizer.bos_token,add_generation_prompt=False)
            input_ids = tokenizer.encode(full_text, add_special_tokens=False)

            query = template.render(messages=[{"role": "user", "content": q}],bos_token=tokenizer.bos_token,add_generation_prompt=True)
            query_ids = tokenizer.encode(query, add_special_tokens= False)

            labels = [-100]*len(query_ids) + input_ids[len(query_ids):]
            assert len(labels) == len(input_ids)
            if len(input_ids) == 0: continue
            messages.append({"input_ids": input_ids[-max_seq_len:], "labels": labels[-max_seq_len:]})

    input_ids = [item["input_ids"] for item in messages]
    labels = [item["labels"] for item in messages]

    max_len = max(len(x) for x in input_ids)

    max_len = min(max_len, max_seq_len)
    input_ids = [ item[:max_len] + [tokenizer.eos_token_id]*(max_len-len(item)) for item in input_ids]
    labels = [ item[:max_len] + [-100]*(max_len-len(item)) for item in labels]

    input_ids = torch.LongTensor(input_ids)
    labels = torch.LongTensor(labels)
    return {
        "input_ids": input_ids,
        "labels": labels
    }


class InstructDataset(Dataset):
    def __init__(self, data: Sequence, tokenizer: transformers.PreTrainedTokenizer) -> None:
        super().__init__()
        self.tokenizer = tokenizer
        self.data = data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index) -> Dict[str, torch.Tensor]:
        sources = self.data[index]
        if isinstance(index, int):
            sources = [sources]
        data_dict = preprocess([e['conversations'] for e in sources], self.tokenizer)
        if isinstance(index, int):
            data_dict = dict(input_ids=data_dict["input_ids"][0], labels=data_dict["labels"][0])
        return data_dict


@dataclass
class DataCollatorForSupervisedDataset(object):
    tokenizer: transformers.PreTrainedTokenizer
    def __call__(self, instances: Sequence[Dict]) -> Dict[str, torch.Tensor]:
        input_ids, labels = tuple([instance[key] for instance in instances] for key in ("input_ids", "labels"))
        input_ids = torch.nn.utils.rnn.pad_sequence(
            input_ids,
            batch_first=True,
            padding_value=self.tokenizer.pad_token_id)
        labels = torch.nn.utils.rnn.pad_sequence(labels, batch_first=True, padding_value=IGNORE_INDEX)
        return dict(
            input_ids=input_ids,
            labels=labels,
            attention_mask=input_ids.ne(self.tokenizer.pad_token_id),
        )
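
The label construction in `preprocess` can be illustrated on a toy example. The sketch below uses made-up token ids and a hypothetical helper `mask_prompt`; it shows how positions belonging to the prompt are set to `-100` so that only the answer tokens contribute to the loss (`-100` is the default `ignore_index` of PyTorch's cross-entropy loss).

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss

def mask_prompt(input_ids, prompt_len):
    """Copy input_ids into labels, hiding the first prompt_len positions."""
    return [IGNORE_INDEX] * prompt_len + list(input_ids[prompt_len:])

# Toy token ids (made up): four prompt tokens followed by three answer tokens.
input_ids = [11, 12, 13, 14, 21, 22, 23]
labels = mask_prompt(input_ids, prompt_len=4)
print(labels)  # [-100, -100, -100, -100, 21, 22, 23]
```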

In [None]:
train_dataset = InstructDataset(train_dataset, tokenizer)
val_dataset = InstructDataset(val_dataset, tokenizer)
data_collator = DataCollatorForSupervisedDataset(tokenizer=tokenizer)
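
To make the collator's job concrete, here is a minimal pure-Python sketch of right-padding a batch. `collate` and `PAD_ID` are hypothetical names chosen for illustration; the real collator above works on tensors through `pad_sequence`.

```python
IGNORE_INDEX = -100
PAD_ID = 0  # hypothetical pad token id, for illustration only

def collate(batch):
    """Right-pad a list of {"input_ids", "labels"} dicts to a common length."""
    max_len = max(len(ex["input_ids"]) for ex in batch)
    out = {"input_ids": [], "labels": [], "attention_mask": []}
    for ex in batch:
        n_pad = max_len - len(ex["input_ids"])
        out["input_ids"].append(ex["input_ids"] + [PAD_ID] * n_pad)
        out["labels"].append(ex["labels"] + [IGNORE_INDEX] * n_pad)
        out["attention_mask"].append([1] * len(ex["input_ids"]) + [0] * n_pad)
    return out

batch = collate([
    {"input_ids": [5, 6, 7], "labels": [-100, 6, 7]},
    {"input_ids": [8, 9], "labels": [-100, 9]},
])
print(batch["attention_mask"])  # [[1, 1, 1], [1, 1, 0]]
```

One design difference: this sketch builds the attention mask from the true sequence lengths, whereas the collator above derives it by comparing `input_ids` against `pad_token_id`, which would also mask any real token that happens to share that id.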

In [None]:
sample_data = train_dataset[9]
IGNORE_INDEX=-100

print("=" * 80)
print("Debugging: ")
print(f"Input_ids\n{sample_data['input_ids']}")
print(f"Label_ids\n{sample_data['labels']}")
print("-" * 80)
print(f"Input:\n{tokenizer.decode(sample_data['input_ids'])}")
print("-" * 80)
N_id = tokenizer.encode("N", add_special_tokens= False)[0]
print(f"Label:\n{tokenizer.decode([N_id if x == -100 else x for x in sample_data['labels']])}")
print("=" * 80)


Debugging: 
Input_ids
tensor([151644,   8948,    198,   2610,    525,   1207,  16948,     11,   3465,
           553,  54364,  14817,     13,   1446,    525,    264,  10950,  17847,
            13, 151645,    198, 151644,    872,    198, 104139, 101899, 100664,
         99888, 119523,  99660, 103572,  99252,   9370, 104027,  64205, 101037,
            30,  97611, 100664,  99888, 119523, 108489, 109745,  34187,   3837,
         99725,  18830, 105006, 107253, 102072,   3837, 102200,  37029, 101883,
        102870,  33108,  60686,  99471, 100359,   1773,  99601,  97611, 100664,
         99888, 119523,   9370, 111574, 106641, 104292,  99573,  34187,   3837,
        106922,  99172, 110050, 100004, 100002, 100664,  99888, 119523, 105647,
        102442,   1773, 101899, 100664,  99888, 119523, 104139, 104027,  39907,
        101037,  11319, 151645,    198, 151644,  77091,    198, 100664,  99888,
        119523, 101158, 108044, 116771,   3837, 100004, 104485,  73670,  99408,
         69905, 104

## Training

### General Training Hyperparameters

In [None]:
# Set training parameters
training_arguments = transformers.TrainingArguments(
    output_dir="./checkpoints",
    num_train_epochs=1,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=2,
    optim='paged_adamw_32bit',
    save_steps=0,
    logging_steps=1,
    learning_rate=2e-7,  # very small; LoRA fine-tuning commonly uses values around 1e-4 to 2e-4
    weight_decay=0.001,
    max_steps=-1,
    warmup_ratio=0.03,
    group_by_length=True,
    lr_scheduler_type="cosine",
    gradient_checkpointing=True,
    report_to="none"
)
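
The combination of `warmup_ratio` and `lr_scheduler_type="cosine"` means the learning rate ramps up linearly and then decays along a half cosine. The sketch below is a standalone approximation of that shape, not the `Trainer` internals. `cosine_lr` is a hypothetical helper, and `total_steps=10` is an assumption chosen for illustration.

```python
import math

def cosine_lr(step, total_steps, base_lr, warmup_ratio):
    """Learning rate at a given optimizer step: linear warmup, then cosine decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# total_steps=10 is an assumption; base_lr and warmup_ratio mirror the arguments above.
for step in range(0, 11, 2):
    print(step, f"{cosine_lr(step, 10, 2e-7, 0.03):.2e}")
```

With only ten steps, a 3% warmup rounds up to a single step, so the schedule is almost entirely cosine decay.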

In [None]:
model.train()
trainer = transformers.Trainer(
    model=model,
    processing_class=tokenizer,  # the `tokenizer=` argument is deprecated in recent transformers releases
    args=training_arguments,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    data_collator=data_collator
)
trainer.train()

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.


Step,Training Loss
1,1.9923
2,2.8363
3,2.2489
4,2.4767
5,2.278
6,2.7786
7,2.1487
8,2.9151
9,2.6364
10,3.3792


TrainOutput(global_step=10, training_loss=2.5690260529518127, metrics={'train_runtime': 121.4035, 'train_samples_per_second': 0.329, 'train_steps_per_second': 0.082, 'total_flos': 316424092680192.0, 'train_loss': 2.5690260529518127, 'epoch': 1.0})
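
The `global_step=10` reported above can be checked by hand from the batch settings. This is a minimal arithmetic sketch that assumes a single GPU; the variable names are illustrative.

```python
import math

train_samples = 40   # train_dataset_size from the split above
per_device_bs = 2    # per_device_train_batch_size
grad_accum = 2       # gradient_accumulation_steps
n_devices = 1        # assumption: a single GPU

# One optimizer step consumes per_device_bs * grad_accum * n_devices samples.
effective_batch = per_device_bs * grad_accum * n_devices
steps_per_epoch = math.ceil(train_samples / effective_batch)
print(effective_batch, steps_per_epoch)  # 4 10
```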

In [None]:
def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )

# PEFT models provide a built-in method for this, used here (its output is shown below).
model.print_trainable_parameters()

trainable params: 2,523,136 || all params: 7,618,139,648 || trainable%: 0.0331
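
The trainable-parameter count can be reproduced by hand. The sketch below assumes the Qwen2.5-7B dimensions (hidden size 3584, 28 layers, 4 key/value heads of dimension 128) and assumes that PEFT's default LoRA targets for this architecture are the `q_proj` and `v_proj` layers. Neither assumption is stated in this notebook, but together they match the reported number.

```python
# Assumed Qwen2.5-7B dimensions and assumed default LoRA targets (q_proj, v_proj).
hidden = 3584        # hidden size
n_layers = 28        # transformer blocks
kv_dim = 4 * 128     # 4 key/value heads x head dimension 128
r = 8                # LoRA rank from LoraConfig above

def lora_params(d_in, d_out, rank):
    """LoRA adds A (rank x d_in) and B (d_out x rank) next to a d_out x d_in weight."""
    return rank * (d_in + d_out)

per_layer = lora_params(hidden, hidden, r) + lora_params(hidden, kv_dim, r)
print(f"{per_layer * n_layers:,}")  # 2,523,136
```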


Once the training is completed, we can evaluate our model and get its perplexity on the validation set like this:

In [None]:
import math
eval_results = trainer.evaluate()
print(f"Perplexity: {math.exp(eval_results['eval_loss']):.2f}")


Perplexity: 9.43
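
Perplexity is the exponential of the mean negative log-likelihood per target token, which is why the cell above applies `math.exp` to `eval_loss`. A minimal sketch with made-up token probabilities follows; `perplexity` is an illustrative helper, not part of `transformers`.

```python
import math

def perplexity(token_log_probs):
    """exp of the mean negative log-likelihood over the target tokens."""
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# Made-up example: the model assigns probability 0.5 to each of four tokens,
# so it is as uncertain as a fair coin flip per token (perplexity of about 2).
print(perplexity([math.log(0.5)] * 4))

# Inverting the formula recovers the evaluation loss behind a reported perplexity.
print(round(math.log(9.43), 3))
```

Read this way, a perplexity of 9.43 corresponds to an evaluation loss of roughly 2.24.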


## Save Trained LoRA

In [None]:
!pwd
output_path = "ilora"
trainer.save_model(output_path)

/content


### Test the trained model

In [None]:
template = Template(tokenizer.chat_template)
@torch.no_grad()
def generate(prompt):
    modelInput = template.render(messages=[{"role": "user", "content": prompt}],bos_token= tokenizer.bos_token,add_generation_prompt=True)
    input_ids = tokenizer.encode(modelInput, add_special_tokens=False, return_tensors='pt').to("cuda:0")
    outputs = model.generate(input_ids, temperature=1.0)
    model_return_string = tokenizer.decode(*outputs, skip_special_tokens=False)
    print("-"*80)
    print(f"model_return_string:\n{model_return_string}")
    generated_ids = outputs[:, input_ids.shape[1]:]
    generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=False)
    return generated_text

query = "I get hit"
print(f"query:\n{query}")
response = generate(query)
print("-"*80)
print(f"response:\n{response}")

query:
I get hit
--------------------------------------------------------------------------------
model_return_string:
<|im_start|>system
You are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>
<|im_start|>user
I get hit<|im_end|>
<|im_start|>assistant
I'm sorry to hear that! Are you okay? If this is a physical situation, please ensure
--------------------------------------------------------------------------------
response:
I'm sorry to hear that! Are you okay? If this is a physical situation, please ensure


## Clean GPU Memory

In [None]:
# Free VRAM: drop the Python references first, then collect garbage,
# and only then ask PyTorch to release its cached blocks.
import gc
import torch
del model
del trainer
gc.collect()
torch.cuda.empty_cache()

In [None]:
!nvidia-smi

Fri Mar 21 04:12:52 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   72C    P0             32W /   70W |     320MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

## Load the base model and merge the trained LoRA adapter

In [None]:
from peft import PeftModel

# Pass a BitsAndBytesConfig instead of the deprecated `load_in_8bit=True` argument.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map={"": 0},
)
model = PeftModel.from_pretrained(model, output_path)
model = model.merge_and_unload()
model.config.max_length = 512
model.eval()

tokenizer = transformers.AutoTokenizer.from_pretrained(model_id, padding_side="left")
# tokenizer.pad_token = tokenizer.unk_token





## Answer generation

In [None]:
@torch.no_grad()
def generate(prompts):
    model_inputs = [template.render(messages=[{"role": "user", "content": prompt}], bos_token=tokenizer.bos_token, add_generation_prompt=True) for prompt in prompts]
    input_ids = tokenizer(model_inputs, add_special_tokens=False, return_tensors='pt', padding=True).to("cuda:0")

    outputs = model.generate(input_ids.input_ids, attention_mask=input_ids.attention_mask, max_new_tokens=100)

    generated_texts = []
    for i in range(len(prompts)):
        generated_ids = outputs[i, input_ids.input_ids.shape[1]:]
        generated_text = tokenizer.decode(generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=True)
        generated_texts.append(generated_text)

    return generated_texts

# test
print("\n\n".join(generate(["I get hit", "Who are you?"])))


I'm sorry to hear that you're feeling hurt. Can you provide more context about what happened? Is this related to a physical injury or emotional distress? It's important to take care of yourself. If it's a physical injury, please seek medical attention if necessary. For emotional support, talking about your feelings can be very helpful.

I am Qwen, a large language model created by Alibaba Cloud. I'm here to assist with a wide variety of tasks and answer any questions you might have! How can I help you today?


## Evaluate a trained model on a given test dataset

In [None]:
!wget https://NLP-course-cuhksz.github.io/Assignments/Assignment1/task1/data/1.exam.json

--2025-03-21 04:15:24--  https://nlp-course-cuhksz.github.io/Assignments/Assignment1/task1/data/1.exam.json
Resolving nlp-course-cuhksz.github.io (nlp-course-cuhksz.github.io)... 185.199.108.153, 185.199.109.153, 185.199.110.153, ...
Connecting to nlp-course-cuhksz.github.io (nlp-course-cuhksz.github.io)|185.199.108.153|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 86227 (84K) [application/json]
Saving to: ‘1.exam.json’


2025-03-21 04:15:24 (3.60 MB/s) - ‘1.exam.json’ saved [86227/86227]



In [None]:
import json

with open('1.exam.json') as f:
  data = json.load(f)
  data = data[:20] # just for demo

print(data[0])

{'question': '27. 根据国家药品监督管理局，公安部，国家卫⽣健康委员会的有关规定，⼜服固体制剂每剂量单位含羟考酮碱不超过5毫克，且不含其他⿇醉药品，精神药品或者药品类易制毒化学品的复⽅制剂列⼊（）。', 'option': {'A': '含⿇醉药品复⽅制剂的管理', 'B': '第⼆类精神药品管理', 'C': '第⼀类精神药品管理', 'D': '医疗⽤毒性药品管理', 'E': ''}, 'analysis': '⼜服固体制剂每剂量单位含羟考酮碱不超过5毫克，且不含其他⿇醉药品、精神药品或药品类易制毒化学品的复⽅制剂列⼊第⼆类精神药品管理。', 'answer': 'B', 'question_type': '最佳选择题', 'source': '2021年执业药师职业资格考试《药事管理与法规》'}


In [None]:
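# Prompt (in Chinese, matching the exam data): "Please answer the following multiple-choice
# question. Output only the correct option(s) and nothing else."
your_prompt = """请回答下面的多选题，请直接正确答案选项，不要输出其他内容。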
{question}
{options}"""

def get_query(da):
  da['options'] = '\n'.join([f"{k}:{v}" for k, v in da['option'].items() if v])
  return your_prompt.format_map(da)

for item in data:
  item['query'] = get_query(item)


print(data[0]['query'])

请回答下面的多选题，请直接正确答案选项，不要输出其他内容。
27. 根据国家药品监督管理局，公安部，国家卫⽣健康委员会的有关规定，⼜服固体制剂每剂量单位含羟考酮碱不超过5毫克，且不含其他⿇醉药品，精神药品或者药品类易制毒化学品的复⽅制剂列⼊（）。
A:含⿇醉药品复⽅制剂的管理
B:第⼆类精神药品管理
C:第⼀类精神药品管理
D:医疗⽤毒性药品管理


In [None]:
model_answers = generate([item['query'] for item in data])
print(f'\n{model_answers[0]}')


B


In [None]:
import re
from tqdm import tqdm

def get_ans(ans):
    match = re.findall(r'.*?([A-E]+(?:[、, ]+[A-E]+)*)', ans)
    if match:
        last_match = match[-1]
        return ''.join(re.split(r'[、, ，]+', last_match))
    return ''

correct_num = 0
total_num = 0
for model_answer, item in tqdm(zip(model_answers, data)):
  if get_ans(model_answer) == item['answer']:
    correct_num += 1
  total_num += 1
  item['model_answer'] = model_answer

print(f"ACC: {correct_num/total_num:.2%}")

result_path = "/content/result.json"
with open(result_path, "w", encoding="utf-8") as file:
    json.dump(data, file, ensure_ascii=False, indent=4)
    print(f"Results are saved in {result_path}")

20it [00:00, 6116.37it/s]

ACC: 50.00%
Results are saved in /content/result.json
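
The accuracy above depends on how `get_ans` parses free-form model output, so it helps to see the extraction on a few strings. The sketch below repeats the same regular expressions so that it runs standalone; the sample responses are made up for illustration.

```python
import re

def get_ans(ans):
    # Same extraction logic as in the cell above, repeated so this sketch runs standalone.
    match = re.findall(r'.*?([A-E]+(?:[、, ]+[A-E]+)*)', ans)
    if match:
        # Keep the last run of option letters and drop the separators.
        return ''.join(re.split(r'[、, ，]+', match[-1]))
    return ''

print(get_ans("B"))                 # B
print(get_ans("答案是A、C"))          # AC
print(get_ans("The answer is B"))   # B
print(repr(get_ans("不确定")))        # ''
print(get_ans("B, because DNA"))    # A (false positive from the capitals in "DNA")
```

The last case shows a limitation: any capital letter from A to E in the explanation is treated as an option, which is one reason the prompt asks the model to output only the answer.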



