<a href="https://colab.research.google.com/github/BoJavs-svg/LLM_Lora_FineTunning/blob/main/nb/Qwen3_(14B)-Reasoning-Conversational.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Installation

In [1]:
%%capture
import os
if "COLAB_" not in "".join(os.environ.keys()):
    !pip install unsloth
else:
    # Do this only in Colab notebooks! Otherwise use pip install unsloth
    !pip install --no-deps bitsandbytes accelerate xformers==0.0.29.post3 peft trl==0.15.2 triton cut_cross_entropy unsloth_zoo
    !pip install sentencepiece protobuf "datasets>=3.4.1" huggingface_hub hf_transfer
    !pip install --no-deps unsloth
from huggingface_hub import login



### Unsloth

In [2]:
from unsloth import FastLanguageModel
from transformers import AutoTokenizer

# 1. Load the base model
base_model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Qwen2.5-14B",
    max_seq_length = 2048,
    load_in_4bit = True,
    load_in_8bit = False,
    full_finetuning = False,
)

# 2. Reapply LoRA the same way as originally done
model = FastLanguageModel.get_peft_model(
    base_model,
    r = 32,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 32,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
)

# 3. Load adapter weights directly
model.load_adapter("BoJavs/TrainedQwen2.5", adapter_name="default")
model.set_adapter("default")
tokenizer.add_special_tokens({
    "additional_special_tokens": ["<|im_start|>", "<|im_end|>"]
})

# Set the chat template manually (Jinja2-style template)
tokenizer.chat_template = """{% for message in messages %}
{{ '<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>\n' }}
{% endfor %}
{% if add_generation_prompt %}
<|im_start|>assistant
{% endif %}"""
# 4. Continue with training or inference
print(sum(p.requires_grad for p in model.parameters()))  # Should be > 0


🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.5.9: Fast Qwen2 patching. Transformers: 4.52.2.
   \\   /|    NVIDIA A100-SXM4-40GB. Num GPUs = 1. Max memory: 39.557 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 8.0. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!



Unsloth 2025.5.9 patched 48 layers with 48 QKV layers, 48 O layers and 48 MLP layers.


adapter_model.safetensors:   0%|          | 0.00/551M [00:00<?, ?B/s]

672


LoRA adapters let us update only 1 to 10% of all parameters. They were already attached when the model was loaded in the previous cell, so Unsloth skips the call below.

In [3]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 32,           # Choose any number > 0! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 32,  # Best to choose alpha = rank or rank*2
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
)

Unsloth: Already have LoRA adapters! We shall skip this step.
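
As a rough sanity check on the adapter size, the sketch below counts the LoRA parameters implied by the settings above (`r = 32`, seven target modules, 48 patched layers). The layer dimensions (hidden size 5120, MLP size 13824, key/value projection width 1024) and the approximate 14.7B base parameter count are assumptions taken from the published Qwen2.5-14B configuration, not values printed by this notebook.

```python
# Assumed Qwen2.5-14B dimensions (not printed anywhere in this notebook).
HIDDEN = 5120      # hidden_size
KV_WIDTH = 1024    # num_key_value_heads * head_dim = 8 * 128
MLP = 13824        # intermediate_size
NUM_LAYERS = 48    # matches "patched 48 layers" in the log above
RANK = 32          # r passed to get_peft_model

# (in_features, out_features) of each targeted linear layer.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_WIDTH),
    "v_proj": (HIDDEN, KV_WIDTH),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}


def lora_param_count(in_features, out_features, rank):
    # LoRA adds A (rank x in_features) and B (out_features x rank).
    return rank * (in_features + out_features)


per_layer = sum(lora_param_count(i, o, RANK) for i, o in TARGET_SHAPES.values())
total_lora_params = per_layer * NUM_LAYERS
# Two trainable tensors (A and B) per targeted module per layer.
trainable_tensors = 2 * len(TARGET_SHAPES) * NUM_LAYERS

ASSUMED_BASE_PARAMS = 14.7e9  # approximate size of the base model
print(f"LoRA parameters: {total_lora_params:,}")
print(f"Trainable tensors: {trainable_tensors}")
print(f"Share of base model: {total_lora_params / ASSUMED_BASE_PARAMS:.2%}")
```

Under these assumptions the tensor count comes out to 672, the same value the loading cell printed, and at 4 bytes per parameter the adapter would weigh roughly 550 MB, close to the 551M `adapter_model.safetensors` download in the log above.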


<a name="Data"></a>
### Data Prep
Qwen3 has both a reasoning and a non-reasoning mode, so we should use 2 datasets:

1. We use the Open Math Reasoning dataset, which was used to win the [AIMO](https://www.kaggle.com/competitions/ai-mathematical-olympiad-progress-prize-2/leaderboard) (AI Mathematical Olympiad - Progress Prize 2) challenge! We sample 10% of the verifiable reasoning traces that used DeepSeek R1 and that got > 95% accuracy.

2. We also leverage [Maxime Labonne's FineTome-100k](https://huggingface.co/datasets/mlabonne/FineTome-100k) dataset in ShareGPT style, which needs to be converted to Hugging Face's normal multi-turn format as well.

In [4]:
from datasets import load_dataset
swe_bench_lite = load_dataset('BoJavs/Clean_SweBench', split='train')

DatasetNotFoundError: Dataset 'BoJavs/TrainedQwen2.5' doesn't exist on the Hub or cannot be accessed.

Let's see the structure of the dataset:

In [None]:
swe_bench_lite

Next we convert the dataset to conversational format.


We have to use Unsloth's `standardize_sharegpt` function to fix up the format of the dataset first.

In [None]:
def generate_conversation(batch):
    """Build one user/assistant conversation per row of a batched `map` call."""
    conversations = []

    for problem, patch, repo, base_commit in zip(
        batch["problem_statement"],
        batch["patch"],
        batch["repo"],
        batch["base_commit"]
    ):
        # Fetch commit content (disabled; repo and base_commit are unused for now)
        # try:
        #     commit_context = get_commit(repo, base_commit)
        # except Exception as e:
        #     commit_context = "Error fetching commit data."

        # User turn: the issue text plus the required output format.
        user_prompt = f"""\
You are an autonomous programmer, and you're working directly in the command line with a special interface.
We're currently solving the following issue within our repository. Here's the issue text:
ISSUE:
{problem}
Now, you're going to solve this issue on your own.
You need to format your output using one field: command.
Your output should always include _one_ command field EXACTLY as in the following example:
<command>
ls -a
</command>
Generate a patch.
"""
        # Assistant turn: the reference patch wrapped in <command> tags.
        # A separate name avoids shadowing the loop variable `patch`, and the
        # single-line f-string avoids the stray indentation of an indented block.
        assistant_reply = f"<command>\n{patch}\n</command>\n"
        conversations.append([
            {"role": "user", "content": user_prompt},
            {"role": "assistant", "content": assistant_reply},
        ])

    return {"conversations": conversations}
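
`Dataset.map(..., batched = True)` hands the function a dict of column lists and expects a dict of equally long lists back. The sketch below exercises that contract on a hand-made batch without touching the Hub. The two rows and the shortened prompt are made up for illustration, and `build_conversation` and `generate_conversation_sketch` are hypothetical names, not part of the notebook.

```python
def build_conversation(problem, patch):
    # Hypothetical, shortened stand-in for the full prompt used in the notebook.
    user_prompt = f"ISSUE:\n{problem}\nGenerate a patch.\n"
    assistant_reply = f"<command>\n{patch}\n</command>\n"
    return [
        {"role": "user", "content": user_prompt},
        {"role": "assistant", "content": assistant_reply},
    ]


def generate_conversation_sketch(batch):
    # `batch` maps column name -> list of values, one entry per row.
    pairs = zip(batch["problem_statement"], batch["patch"])
    return {"conversations": [build_conversation(p, d) for p, d in pairs]}


# Made-up rows that mimic the columns the notebook reads.
toy_batch = {
    "problem_statement": ["Crash on empty input", "Wrong default timeout"],
    "patch": ["--- a/app.py\n+++ b/app.py", "--- a/cfg.py\n+++ b/cfg.py"],
}

result = generate_conversation_sketch(toy_batch)
print(len(result["conversations"]))
print(result["conversations"][0][1]["content"])
```

Running a function like this on a small dict first makes it easy to inspect the exact strings before mapping over the full dataset.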


In [None]:
from unsloth.chat_templates import standardize_sharegpt

# Standardize the dataset first
dataset = standardize_sharegpt(swe_bench_lite)

# Render each conversation to a single training string with the chat template
swe_bench_lite_conversations = tokenizer.apply_chat_template(
    list(dataset.map(generate_conversation, batched = True)["conversations"]),
    tokenize = False,
    add_generation_prompt = False,  # Training text already ends with the assistant turn
    enable_thinking = False,        # Qwen3-style flag; the custom template set earlier never reads it
)

Let's look at one formatted row:

In [None]:
swe_bench_lite_conversations[1]
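
Each entry is one string in ChatML layout, with every turn wrapped in `<|im_start|>` and `<|im_end|>` markers. The plain-Python sketch below approximates what the Jinja template set earlier produces. `to_chatml` is a hypothetical helper written for illustration, and the exact blank lines may differ from the tokenizer's rendering.

```python
def to_chatml(messages, add_generation_prompt=False):
    # Wrap each turn as <|im_start|>{role}\n{content}<|im_end|>\n.
    text = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        # Open an assistant turn for the model to complete at inference time.
        text += "<|im_start|>assistant\n"
    return text


example = [
    {"role": "user", "content": "Fix the failing test."},
    {"role": "assistant", "content": "<command>\nls -a\n</command>"},
]
chatml_text = to_chatml(example)
print(chatml_text)
```

The generation prompt is mainly useful for inference prompts, where the assistant turn is still missing; training strings that already contain the assistant reply typically leave it off.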


Now let's see how long the dataset is:

In [None]:
print(len(swe_bench_lite_conversations))

Finally, wrap the formatted conversations in a Hugging Face `Dataset` and shuffle them:

In [None]:
import pandas as pd
data = pd.Series(swe_bench_lite_conversations)
data.name = "text"

from datasets import Dataset
dataset = Dataset.from_pandas(pd.DataFrame(data))
dataset = dataset.shuffle(seed = 3407)

<a name="Train"></a>
### Train the model
Now let's use Hugging Face TRL's `SFTTrainer`! More docs here: [TRL SFT docs](https://huggingface.co/docs/trl/sft_trainer). The configuration below trains for one full epoch (`num_train_epochs = 1`); for a quick trial run, set `max_steps` to a small number such as 60 instead.

In [None]:
from trl import SFTTrainer, SFTConfig
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    eval_dataset = None, # Can set up evaluation!
    args = SFTConfig(
        dataset_text_field = "text",
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4, # Use GA to mimic batch size!
        # warmup_steps = 5,
        num_train_epochs = 1, # Set this for 1 full training run.
        # max_steps = 15,
        learning_rate = 2e-4, # Reduce to 2e-5 for long training runs
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        report_to = "none", # Use this for WandB etc
    ),
)
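
Gradient accumulation trades memory for batch size: gradients from several small batches are summed before each optimizer update. The arithmetic below shows the effective batch size for the settings above and the resulting number of updates per epoch. The dataset size of 1000 is a hypothetical figure, since the real length is not shown in this notebook, and the trainer's own step count may differ slightly because of rounding.

```python
import math

per_device_train_batch_size = 2
gradient_accumulation_steps = 4
num_devices = 1        # the log above reports Num GPUs = 1
num_examples = 1000    # hypothetical dataset size, for illustration only

# Number of examples that contribute to each optimizer update.
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)
# Approximate number of optimizer updates in one pass over the data.
updates_per_epoch = math.ceil(num_examples / effective_batch_size)

print(effective_batch_size)
print(updates_per_epoch)
```

With `logging_steps = 1`, the trainer logs once per optimizer update, so this figure also approximates the number of log lines to expect per epoch.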

Let's train the model! To resume a training run, set `trainer.train(resume_from_checkpoint = True)`


In [None]:
trainer_stats = trainer.train()

<a name="Save"></a>
### Saving, loading finetuned models
To save the final model as LoRA adapters, use either Hugging Face's `push_to_hub` for an online save or `save_pretrained` for a local save.

**[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down!

In [None]:
model.save_pretrained("lora_model")  # Local saving
tokenizer.save_pretrained("lora_model")
model.push_to_hub_merged(
    "BoJavs/TrainedQwen2.5",
    tokenizer,
    save_method="lora",
)
# model.push_to_hub("your_name/lora_model", token = "...") # Online saving
# tokenizer.push_to_hub("your_name/lora_model", token = "...") # Online saving