**Supervised fine-tuning of Qwen1.5-0.5B Base using QLoRA**

What we need:
1. transformers – Hugging Face models.
2. datasets – the training data (e.g. alpaca, dolly, oasst1, etc.).
3. peft – tools for parameter-efficient fine-tuning (QLoRA).
4. bitsandbytes – low-bit loading and training (8-bit, 4-bit).
5. accelerate – device placement and mixed-precision handling on the GPU.
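To get a feel for why low-bit loading matters, here is a rough back-of-the-envelope estimate of weight memory at different precisions. It is an idealized sketch: `weight_bytes` is a hypothetical helper, the parameter count is the "all params" figure printed later in this notebook, and the estimate ignores quantization constants, activations, and optimizer state. In practice bitsandbytes quantizes only linear layers, so embeddings stay in higher precision and the real saving is smaller.

```python
def weight_bytes(num_params: int, bits_per_param: int) -> int:
    """Idealized storage for the weights alone, in bytes."""
    return num_params * bits_per_param // 8


# Parameter count taken from the notebook's print_trainable_parameters() output
num_params = 464_774_144

fp16_bytes = weight_bytes(num_params, 16)
int4_bytes = weight_bytes(num_params, 4)

gib = 1024 ** 3
print(f"fp16 weights:  {fp16_bytes / gib:.2f} GiB")
print(f"4-bit weights: {int4_bytes / gib:.2f} GiB (idealized lower bound)")
```

For a model this small the difference is modest, but the same arithmetic scales linearly with parameter count.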

In [2]:
!pip install transformers peft bitsandbytes accelerate datasets



**Model and tokenizer**

In [3]:
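import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Qwen/Qwen1.5-0.5B"

# Load the base weights in 4-bit NF4 via bitsandbytes (the "Q" in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    quantization_config=bnb_config,
    device_map="auto"
)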




**QLoRA configuration**

What we do:
1. r=8 – the rank of the LoRA matrices.
2. target_modules=["q_proj", "v_proj"] – we adapt only the attention query and value projections, which keeps the number of trainable parameters small.
3. lora_dropout=0.05 – regularization.
4. get_peft_model – wraps the model in a version with LoRA adapters.


In [6]:
from peft import get_peft_model, LoraConfig, TaskType

lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.CAUSAL_LM
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()




trainable params: 786,432 || all params: 464,774,144 || trainable%: 0.1692
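The trainable-parameter count printed above can be reproduced by hand. The sketch below assumes a hidden size of 1024, 24 transformer layers, and square 1024-to-1024 `q_proj` and `v_proj` matrices. These architecture values are not stated in the notebook; they are assumptions that happen to be consistent with the printed count. `lora_param_count` is a hypothetical helper.

```python
def lora_param_count(in_features: int, out_features: int, rank: int) -> int:
    """One LoRA adapter adds A (rank x in_features) and B (out_features x rank)."""
    return rank * in_features + out_features * rank


# Assumed architecture values (not stated in the notebook)
hidden_size = 1024
num_layers = 24

# Values taken from the LoraConfig in this notebook
rank = 8
lora_alpha = 32
targets_per_layer = 2  # q_proj and v_proj

per_module = lora_param_count(hidden_size, hidden_size, rank)
trainable = per_module * targets_per_layer * num_layers

total = 464_774_144  # "all params" from the printed output
percent = round(100 * trainable / total, 4)
scaling = lora_alpha / rank  # factor applied to the LoRA update

print(f"trainable params: {trainable:,}")  # 786,432
print(f"trainable%: {percent}")            # 0.1692
print(f"LoRA scaling (alpha / r): {scaling}")
```

With `lora_alpha=32` and `r=8` the adapter output is scaled by 4, so changing `r` without adjusting `lora_alpha` also changes the effective strength of the update.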


**Prepare the dataset (e.g. Alpaca-style)**

In [5]:
!pip install --upgrade datasets



In [7]:
import os

# Set a local cache directory (before importing datasets, so the library picks it up)
os.environ["HF_DATASETS_CACHE"] = "./hf_cache"

from datasets import load_dataset

# Load the dataset
dataset = load_dataset("tatsu-lab/alpaca")

def format_alpaca(example):
    return {
        "text": f"### Instruction:\n{example['instruction']}\n\n### Input:\n{example['input']}\n\n### Response:\n{example['output']}"
    }

dataset = dataset.map(format_alpaca)
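Alpaca-style records can have an empty `input` field, in which case the template above produces an empty "### Input:" section. One possible variant, shown here as a sketch on hand-written records rather than the real dataset, leaves the section out when there is nothing to put in it. The function name `format_alpaca_example` and the sample records are made up for illustration.

```python
def format_alpaca_example(example: dict) -> dict:
    """Build the prompt text, skipping the Input section when it is empty."""
    parts = [f"### Instruction:\n{example['instruction']}"]
    if example.get("input", "").strip():
        parts.append(f"### Input:\n{example['input']}")
    parts.append(f"### Response:\n{example['output']}")
    return {"text": "\n\n".join(parts)}


# Hand-written sample records (not taken from the dataset)
with_input = format_alpaca_example(
    {"instruction": "Add the numbers.", "input": "2 and 3", "output": "5"}
)
without_input = format_alpaca_example(
    {"instruction": "Name a primary color.", "input": "", "output": "Red."}
)

print(with_input["text"])
print("---")
print(without_input["text"])
```

If this variant is used for training, the inference prompt should follow the same rule, so that the model sees the format it was trained on.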


**Data tokenization**

In [8]:
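def tokenize_function(example):
    result = tokenizer(
        example["text"],
        truncation=True,
        max_length=512,
        padding="max_length"
    )
    # Labels mirror input_ids; padding positions get -100 so the loss ignores them
    result["labels"] = [
        [tok if mask == 1 else -100 for tok, mask in zip(ids, attn)]
        for ids, attn in zip(result["input_ids"], result["attention_mask"])
    ]
    return result

tokenized_dataset = dataset["train"].map(tokenize_function, batched=True)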


**Train the model**

In [None]:
from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    num_train_epochs=3,
    learning_rate=2e-4,
    fp16=True,
    logging_steps=10,
    output_dir="./qwen-lora",
    save_total_limit=1,
    save_strategy="epoch",
    report_to="none"
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
    tokenizer=tokenizer
)

trainer.train()


No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
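The batch settings in `TrainingArguments` combine into an effective batch size, which in turn determines roughly how many optimizer steps the run takes. The sketch below assumes a single GPU and a training set of 52,002 examples; the dataset size is an assumption, not something printed in this notebook, and the exact step count reported by `Trainer` can differ slightly depending on how the last partial batch is rounded.

```python
import math


def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Number of examples contributing to each optimizer step."""
    return per_device * grad_accum * num_devices


# Values from the TrainingArguments in this notebook
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
num_train_epochs = 3

# Assumed values (not stated in the notebook)
num_devices = 1
num_examples = 52_002

batch = effective_batch_size(
    per_device_train_batch_size, gradient_accumulation_steps, num_devices
)
steps_per_epoch = math.ceil(num_examples / batch)

print(f"effective batch size: {batch}")
print(f"approx. optimizer steps per epoch: {steps_per_epoch}")
print(f"approx. optimizer steps in total: {steps_per_epoch * num_train_epochs}")
```

With `logging_steps=10`, a run of this length logs the loss several hundred times per epoch.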


**Generating responses after fine-tuning**

In [4]:
model.eval()
prompt = "### Instruction:\nTranslate to Polish:\n\n### Input:\nWhere is the nearest train station?\n\n### Response:\n"

In [10]:
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)  # keep inputs on the same device as the model
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))


Setting `pad_token_id` to `eos_token_id`:151643 for open-end generation.


### Instruction:
Translate to Polish:

### Input:
Where is the nearest train station?

### Response:
Ponieważ w pobliżu miastek znajduje się tylko jedna kolej, która jest w pobliżu miasta.
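The decoded output repeats the whole prompt before the answer. A small post-processing step can keep only the part after the response marker. This is a sketch using plain string handling; `extract_response` is a hypothetical helper, and the sample string is hand-written rather than real model output.

```python
def extract_response(decoded: str, marker: str = "### Response:") -> str:
    """Return the text after the last marker, or the whole text if no marker is found."""
    _, found, tail = decoded.rpartition(marker)
    return tail.strip() if found else decoded.strip()


# Hand-written sample in the same prompt format (not real model output)
sample = "### Instruction:\nSay hi.\n\n### Response:\nHi there.\n"
print(extract_response(sample))
```

In the notebook this would be applied to the string returned by `tokenizer.decode(...)`.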


**Summary - what we did:**
1. Loaded the Qwen1.5-0.5B model in 4-bit.
2. Configured QLoRA, training only small adapter matrices on selected layers.
3. Used the Alpaca dataset of instructions and responses.
4. Fine-tuned the model for three epochs.
5. Generated a response to a new instruction.