In [1]:
from transformers import AutoTokenizer, AutoModel
from datasets import load_dataset

device = "cuda"
path_model = "/home/chwenjun225/Projects/Foxer/models/DeepSeek-R1-Distill-Qwen-1.5B"
path_dataset = "/home/chwenjun225/Projects/Foxer/datasets/cot-2"

In [2]:
from unsloth import FastLanguageModel

max_seq_length = 4096 
dtype = None 
load_in_4bit = False  # Train with False later

model, tokenizer = FastLanguageModel.from_pretrained(
	model_name = path_model,
	max_seq_length = max_seq_length,
	dtype = dtype,
	load_in_4bit = load_in_4bit,
	# token = hf_token, 
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
INFO 02-04 15:50:07 __init__.py:183] Automatically detected platform cuda.
==((====))==  Unsloth 2025.1.8: Fast Qwen2 patching. Transformers: 4.48.2.
   \\   /|    GPU: NVIDIA GeForce RTX 4090. Max memory: 23.988 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu124. CUDA: 8.9. CUDA Toolkit: 12.4. Triton: 3.1.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.28.post3. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
/home/chwenjun225/Projects/Foxer/models/DeepSeek-R1-Distill-Qwen-1.5B does not have a padding token! Will use pad_token = <|vision_pad|>.


In [3]:
prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are an expert across multiple domains. Please apply a step-by-step Chain-of-Thought (CoT) approach to carefully analyze and solve the problem in a logical and accurate manner.

Based on the given question, apply the most relevant expertise and provide a structured response.

### Prompt:
{}

### Response:
<thinking>{}"""

In [4]:
prompt = "A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?"

FastLanguageModel.for_inference(model)
inputs = tokenizer([prompt_style.format(prompt, "")], return_tensors="pt").to(device)

outputs = model.generate(
	input_ids=inputs.input_ids,
	attention_mask=inputs.attention_mask,
	max_new_tokens=1200,
	use_cache=True,
)

response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])


<thinking>Okay, so I'm trying to figure out what the cystometry would reveal about her residual volume and detrusor contractions after this gynecological exam. Let me start by recalling some concepts. Cystometry is a procedure where a catheter is inserted into the bladder, usually in the lower abdomen, to measure the amount of fluid trapped inside the bladder. It can be used to assess residual volume, which is the amount of fluid left in the bladder after a urination. 

The patient here is a 61-year-old woman who has a history of involuntary urine loss during activities like coughing or sneezing but no leakage at night. That suggests that she has a tendency to urinate during certain times but not at night. This could indicate a leaky bladder or some other condition that causes her to retain fluid during those activities but not at night. 

When she undergoes a gynecological exam, which I assume is an exam to check for other conditions in the pelvic area, and a Q-tip test, which is a m
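
The cell above isolates the completion with `split("### Response:")[1]`, which raises an `IndexError` if the marker is missing and leaves any trailing EOS token in place. A minimal sketch of a more defensive helper follows; `extract_response` is a hypothetical name, not part of the notebook or of any library, and the `eos_token` argument is assumed to be whatever `tokenizer.eos_token` returns.

```python
def extract_response(decoded, marker="### Response:", eos_token=None):
	"""Return the text after the first marker, without a trailing EOS token."""
	_, found, tail = decoded.partition(marker)
	if not found:
		# Marker absent: fall back to the whole decoded string.
		return decoded.strip()
	tail = tail.strip()
	if eos_token and tail.endswith(eos_token):
		tail = tail[: -len(eos_token)].rstrip()
	return tail


demo = "### Prompt:\nQ\n\n### Response:\n<thinking>ok<eos>"
print(extract_response(demo, eos_token="<eos>"))
```

In the notebook this would be called as `extract_response(response[0], eos_token=tokenizer.eos_token)`.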

In [5]:
train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are an expert across multiple domains. Please apply a step-by-step Chain-of-Thought (CoT) approach to carefully analyze and solve the problem in a logical and accurate manner.

Based on the given question, apply the most relevant expertise and provide a structured response.

### Prompt:
{}

### Response:
{}
"""

In [6]:
from datasets import load_dataset
EOS_TOKEN = tokenizer.eos_token  # Appended to every example so the model learns where to stop generating

def formatting_prompts_func(examples):
	prompts = examples["prompt"]
	responses = examples["response"]
	texts = []
	for prompt, response in zip(prompts, responses):
		text = train_prompt_style.format(prompt, response) + EOS_TOKEN
		texts.append(text)
	return {"texts": texts}

dataset = load_dataset(path_dataset, split="train[:]", trust_remote_code=True)
dataset = dataset.map(formatting_prompts_func, batched=True)
dataset["texts"][0]

"Below is an instruction that describes a task, paired with an input that provides further context.\nWrite a response that appropriately completes the request.\nBefore answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.\n\n### Instruction:\nYou are an expert across multiple domains. Please apply a step-by-step Chain-of-Thought (CoT) approach to carefully analyze and solve the problem in a logical and accurate manner.\n\nBased on the given question, apply the most relevant expertise and provide a structured response.\n\n### Prompt:\nExplain the difference between truth and justification in epistemology. How can these concepts lead to different combinations of true/false beliefs and justified/unjustified beliefs? Provide examples for each possible combination.\n\n### Response:\n<thinking>\n<initial_analysis>\nTask: Explain the difference between truth and justification in epistemology and provide examples of

In [7]:
# How to choose the lora_alpha value
# lora_alpha: effect
# Small (~8, 16): LoRA has little effect; fine-tuning barely changes the base model.
# Medium (~32, 64): moderate effect; helps the model learn effectively without distorting it too much.
# Large (>128): LoRA has a strong effect; may cause overfitting if not kept in check.

model = FastLanguageModel.get_peft_model(
	model,
	r=16,  
	target_modules=[
		"q_proj",
		"k_proj",
		"v_proj",
		"o_proj",
		"gate_proj",
		"up_proj",
		"down_proj",
	],
	lora_alpha=32,
	lora_dropout=0,  
	bias="none",  
	use_gradient_checkpointing="unsloth",  # True or "unsloth" for very long context
	random_state=3407,
	use_rslora=False,  
	loftq_config=None,
)

Unsloth 2025.1.8 patched 28 layers with 28 QKV layers, 28 O layers and 28 MLP layers.
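
The `r` and `lora_alpha` settings above can be related to two numbers: the LoRA scaling factor and the count of trainable parameters. The sketch below is a back-of-the-envelope check, not Unsloth's own accounting. The layer shapes (hidden size 1536, key/value projection width 256, MLP width 8960) are assumptions taken from the Qwen2 1.5B architecture and do not appear in the notebook; `lora_param_count` and `lora_scale` are hypothetical helper names.

```python
# Assumed (in_features, out_features) per target module; not shown in the notebook.
HIDDEN, KV_DIM, MLP_DIM, NUM_LAYERS = 1536, 256, 8960, 28

TARGET_SHAPES = {
	"q_proj": (HIDDEN, HIDDEN),
	"k_proj": (HIDDEN, KV_DIM),
	"v_proj": (HIDDEN, KV_DIM),
	"o_proj": (HIDDEN, HIDDEN),
	"gate_proj": (HIDDEN, MLP_DIM),
	"up_proj": (HIDDEN, MLP_DIM),
	"down_proj": (MLP_DIM, HIDDEN),
}


def lora_param_count(shapes, r, num_layers):
	"""Each adapted layer adds A (r x d_in) and B (d_out x r)."""
	per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
	return per_layer * num_layers


def lora_scale(alpha, r, use_rslora=False):
	"""Standard LoRA scales the update by alpha / r; rsLoRA uses alpha / sqrt(r)."""
	return alpha / (r ** 0.5) if use_rslora else alpha / r


total = lora_param_count(TARGET_SHAPES, r=16, num_layers=NUM_LAYERS)
print(f"{total:,}")                 # 18,464,768
print(lora_scale(alpha=32, r=16))   # 2.0
```

Under these assumed shapes the count agrees with the "Number of trainable parameters" line that the trainer prints, and `lora_alpha=32` with `r=16` gives an update scale of 2.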


In [8]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
	model=model,
	tokenizer=tokenizer,
	train_dataset=dataset,
	dataset_text_field="texts",
	max_seq_length=max_seq_length,
	dataset_num_proc=2, # history: 2
	packing=False, # Can make training 5x faster for short sequences.
	args=TrainingArguments(
		per_device_train_batch_size=2,
		gradient_accumulation_steps=4,
		# Use num_train_epochs = 1, warmup_ratio for full training runs!
		num_train_epochs=5, 
		warmup_steps=5,
		max_grad_norm=0.3,
		# max_steps=60,
		learning_rate=2e-4,
		fp16=not is_bfloat16_supported(),
		bf16=is_bfloat16_supported(),
		logging_steps=10,
		optim="adamw_8bit",
		weight_decay=0.01,
		lr_scheduler_type="linear",
		seed=3407,
		output_dir="./results",
		report_to="tensorboard",
	),
)
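
With `lr_scheduler_type="linear"`, `warmup_steps=5` and `learning_rate=2e-4`, the learning rate ramps up for five steps and then decays linearly to zero. The sketch below mirrors the usual linear-warmup, linear-decay rule; `linear_lr` is a hypothetical helper, and the default `total_steps=4465` is taken from the training banner in this notebook rather than computed here.

```python
def linear_lr(step, base_lr=2e-4, warmup_steps=5, total_steps=4465):
	"""Learning rate at a given optimizer step under linear warmup + linear decay."""
	if step < warmup_steps:
		return base_lr * step / max(1, warmup_steps)
	remaining = max(0, total_steps - step)
	return base_lr * remaining / max(1, total_steps - warmup_steps)


# Inspect the start, the end of warmup, roughly the midpoint, and the final step.
for step in (0, 5, 2235, 4465):
	print(step, linear_lr(step))
```

Because warmup is only 5 of 4,465 steps, the run spends almost all of its time on the decay branch.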

In [9]:
# To resume training from a checkpoint, use --> trainer_stats = trainer.train("checkpoint-9500")
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 7,143 | Num Epochs = 5
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 4,465
 "-____-"     Number of trainable parameters = 18,464,768


Step,Training Loss
10,1.9664
20,1.3363
30,1.152
40,1.0799
50,1.046
60,1.0626
70,0.9982
80,1.0183
90,0.974
100,1.0072
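
The "Total steps = 4,465" figure in the banner follows from the dataset size and the batch settings. A minimal sketch of that arithmetic, assuming a single GPU and that a trailing partial accumulation group is dropped each epoch (this matches the banner here, but it is an assumption about the trainer's rounding); `total_optimizer_steps` is a hypothetical helper name.

```python
import math


def total_optimizer_steps(num_examples, per_device_batch, grad_accum, epochs, num_gpus=1):
	"""Optimizer updates over the whole run."""
	batches_per_epoch = math.ceil(num_examples / (per_device_batch * num_gpus))
	updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
	return updates_per_epoch * epochs


steps = total_optimizer_steps(num_examples=7143, per_device_batch=2, grad_accum=4, epochs=5)
print(steps)  # 4465
```

The loss table above therefore covers only the first 100 of 4,465 steps.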


In [10]:
question = "A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?"

FastLanguageModel.for_inference(model)  # Unsloth has 2x faster inference!
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to(device)

outputs = model.generate(
	input_ids=inputs.input_ids,
	attention_mask=inputs.attention_mask,
	max_new_tokens=1200,
	use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])


<thinking> 
<initial_analysis>
Task: Analyze the potential cystometry findings based on the patient's history and diagnostic tests
Approach: Connect the patient's experience to common findings in cystomy studies
Complexity: Medium
</initial_analysis>

<conscious_thought>
I need to carefully consider the patient's history, the results of her gynecological exam and Q-tip test, and how these relate to cystometry findings.
</conscious_thought>

<step_by_step>
1. Analyze the patient's history:
   - Long history of involuntary urine loss
   - No leak at night
   - Potential for detrusor contractions

2. Consider common findings in cystometry:
   - Low residual volume
   - Existence of detrusor contractions

3. Connect to the patient's experience:
   - Long history of involuntary leaks suggests a history of detrusor contractions
   - No leak at night suggests no leak during the night, which aligns with no leak during the day

4. Evaluate the results of the gynecological exam and Q-tip test:


In [11]:
# import torch 
# Convert the model entirely to FP32
# model = model.to(torch.float32)

In [4]:
# from peft import PeftModel
# from unsloth import FastLanguageModel

new_model_local = "1_finetuned_DeepSeek-R1-Distill-Qwen-1.5B_finetune_CoT_ReAct"

# model, tokenizer = FastLanguageModel.from_pretrained(
# 	model_name = "/home/chwenjun225/Projects/Foxer/notebooks/DeepSeek-R1-Distill-Qwen-1.5B_finetune_CoT_ReAct/1_finetuned_DeepSeek-R1-Distill-Qwen-1.5B_finetune_CoT_ReAct",
# 	max_seq_length = 4096,
# 	dtype = None,
# 	load_in_4bit = False,
# 	# token = hf_token, 
# )

# if isinstance(model, PeftModel):
# 	model = model.merge_and_unload()

model.save_pretrained(new_model_local) 
tokenizer.save_pretrained(new_model_local)

model.save_pretrained_merged(new_model_local, tokenizer, save_method = "merged_16bit")

==((====))==  Unsloth 2025.1.8: Fast Qwen2 patching. Transformers: 4.48.2.
   \\   /|    GPU: NVIDIA GeForce RTX 4090. Max memory: 23.988 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu124. CUDA: 8.9. CUDA Toolkit: 12.4. Triton: 3.1.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.28.post3. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 16.83 out of 31.25 RAM for saving.
Unsloth: Saving model... This might take 5 minutes ...


100%|██████████| 28/28 [00:00<00:00, 2581.79it/s]

Unsloth: Saving tokenizer... Done.





Done.



In [13]:
!lm_eval --model hf \
--model_args pretrained="/home/chwenjun225/Projects/Foxer/notebooks/DeepSeek-R1-Distill-Qwen-1.5B_finetune_CoT_ReAct/1_finetuned_DeepSeek-R1-Distill-Qwen-1.5B_finetune_CoT_ReAct" \
--tasks lambada_openai,hellaswag,piqa,arc_easy,arc_challenge,winogrande,openbookqa \
--device cuda \
--batch_size auto 

INFO 02-04 17:00:36 __init__.py:183] Automatically detected platform cuda.
2025-02-04:17:00:36,599 INFO     [__main__.py:284] Verbosity set to INFO
2025-02-04:17:00:41,391 INFO     [__init__.py:459] The tag 'kobest' is already registered as a group, this tag will not be registered. This may affect tasks you want to call.
2025-02-04:17:00:41,392 INFO     [__init__.py:459] The tag 'kobest' is already registered as a group, this tag will not be registered. This may affect tasks you want to call.
2025-02-04:17:00:41,393 INFO     [__init__.py:459] The tag 'kobest' is already registered as a group, this tag will not be registered. This may affect tasks you want to call.
2025-02-04:17:00:41,394 INFO     [__init__.py:459] The tag 'kobest' is already registered as a group, this tag will not be registered. This may affect tasks you want to call.
2025-02-04:17:00:42,186 INFO     [__main__.py:381] Selected Tasks: ['arc_challenge', 'arc_easy', 'hellaswag', 'lambada_openai', 'openbookqa', 'piqa', 'w
