# Hands-on Module 2.3: Fine-Tuning Llama-3 Fast with Unsloth

In this module, we learned that *full fine-tuning* is expensive.
Today, we will show that an industry-grade model (**Llama-3 8B**) can be trained on a **free GPU (Tesla T4)** in Google Colab.

The secret? Combining **QLoRA (4-bit)** with the **Unsloth** library (kernel optimization).
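
A rough back-of-the-envelope check shows why 4-bit weights matter on a T4. The sketch below assumes about 8.03 billion base parameters and about 15 GB of usable T4 memory. Both figures are approximations, and the estimate counts raw weights only (no activations, optimizer state, or quantization overhead).

```python
# Weights-only memory estimate for an ~8B-parameter model.
# ASSUMPTIONS: ~8.03e9 parameters and ~15 GB of T4 VRAM are approximate figures.
N_PARAMS = 8.03e9
T4_VRAM_GB = 15.0

BYTES_PER_PARAM = {"float32": 4.0, "float16": 2.0, "4-bit": 0.5}


def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Return the size of the raw weights in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


estimates = {
    name: weight_memory_gb(N_PARAMS, size) for name, size in BYTES_PER_PARAM.items()
}

for name, gb in estimates.items():
    verdict = "fits" if gb < T4_VRAM_GB else "does not fit"
    print(f"{name:>8}: {gb:5.1f} GB of weights -> {verdict} in {T4_VRAM_GB:.0f} GB")
```

Real usage is higher than these numbers, since some tensors stay in 16-bit and training needs room for activations and optimizer state, but the gap between 16-bit and 4-bit weights is what makes the T4 viable.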

In [1]:
# 1. Install Unsloth (Colab-specific)
# Unsloth advertises roughly 2x-5x faster training and about 60% lower memory use
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"

# 2. Install the standard dependencies
!pip install --no-deps "xformers<0.0.27" "trl<0.9.0" peft accelerate bitsandbytes


In [2]:
import torch
from unsloth import FastLanguageModel

max_seq_length = 2048
dtype = None # Auto-detect (float16 on a T4, which has no bfloat16 support)
load_in_4bit = True # Load 4-bit weights for QLoRA (Theory 2.3.2)

print("Loading Llama-3 8B (4-bit)...")
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit", # Pre-quantized model
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
print("Model loaded successfully!")

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
Loading Llama-3 8B (4-bit)...
==((====))==  Unsloth 2025.11.3: Fast Llama patching. Transformers: 4.57.1.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.8.0+cu126. CUDA: 7.5. CUDA Toolkit: 12.6. Triton: 3.4.0
\        /    Bfloat16 = FALSE. FA [Xformers = None. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!

Model loaded successfully!


In [3]:
# Attach LoRA adapters (Theory 2.3.2 - Update Decomposition)
# Only a small fraction (< 1%) of the parameters is trained!
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # LoRA rank
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,
    lora_dropout = 0, # 0 is the fastest setting
    bias = "none",
    use_gradient_checkpointing = "unsloth", # Saves a large amount of VRAM
    random_state = 3407,
)

# Check what percentage of the parameters is trainable
model.print_trainable_parameters()

Unsloth 2025.11.3 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.


trainable params: 41,943,040 || all params: 8,072,204,288 || trainable%: 0.5196
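
The trainable-parameter count reported above can be reproduced by hand. For each wrapped linear layer, LoRA adds a matrix A (r x d_in) and a matrix B (d_out x r), so r * (d_in + d_out) extra parameters. The sketch below assumes the layer dimensions of the public Llama-3 8B configuration; those constants are not stated in this notebook.

```python
# Count LoRA parameters for the seven target modules used above.
# ASSUMPTION: dimensions are taken from the public Llama-3 8B configuration.
HIDDEN = 4096         # hidden_size
INTERMEDIATE = 14336  # intermediate_size of the MLP
KV_DIM = 1024         # 8 key/value heads * head_dim 128 (grouped-query attention)
NUM_LAYERS = 32
RANK = 16
LORA_ALPHA = 16

# (d_in, d_out) of every target module inside one decoder layer
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters in A (r x d_in) plus B (d_out x r)."""
    return r * (d_in + d_out)


per_layer = sum(lora_params(d_in, d_out, RANK) for d_in, d_out in TARGET_SHAPES.values())
trainable = per_layer * NUM_LAYERS

all_params = 8_072_204_288  # "all params" as printed in the output above
trainable_pct = 100 * trainable / all_params
scaling = LORA_ALPHA / RANK  # factor applied to the low-rank update B @ A

print(f"trainable params: {trainable:,}")
print(f"trainable%: {trainable_pct:.4f}")
print(f"LoRA scaling (alpha / r): {scaling}")
```

With `r = 16` and `lora_alpha = 16` the scaling factor is 1.0, so the low-rank update is added to the frozen weights unscaled.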


In [4]:
from datasets import load_dataset

# We use the Alpaca dataset (general-purpose instructions)
dataset = load_dataset("yahma/alpaca-cleaned", split = "train[:500]") # Only 500 samples to keep it fast

# Prompt format (see Theory 2.3.5 - Chat Template)
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    inputs       = examples["input"]
    outputs      = examples["output"]
    texts = []
    for instruction, input_text, output in zip(instructions, inputs, outputs):
        # Fill in the template and append EOS so the model learns where to stop
        text = alpaca_prompt.format(instruction, input_text, output) + tokenizer.eos_token
        texts.append(text)
    return { "text" : texts, }

dataset = dataset.map(formatting_prompts_func, batched = True,)
print(f"First example:\n{dataset[0]['text']}")

First example:
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Give three tips for staying healthy.

### Input:


### Response:
1. Eat a balanced and nutritious diet: Make sure your meals are inclusive of a variety of fruits and vegetables, lean protein, whole grains, and healthy fats. This helps to provide your body with the essential nutrients to function at its best and can help prevent chronic diseases.

2. Engage in regular physical activity: Exercise is crucial for maintaining strong bones, muscles, and cardiovascular health. Aim for at least 150 minutes of moderate aerobic exercise or 75 minutes of vigorous exercise each week.

3. Get enough sleep: Getting enough quality sleep is crucial for physical and mental well-being. It helps to regulate mood, improve cognitive function, and supports healthy growth and immune function. Aim for 7-9 hours of s
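
The batched formatting step can be tried without downloading a model or dataset. This sketch applies the same template to a two-row toy batch. `EOS_TOKEN` is a stand-in for `tokenizer.eos_token` (assumed here to be `<|end_of_text|>`), and the sample rows are made up for illustration.

```python
# Stand-in for tokenizer.eos_token (ASSUMPTION: the Llama-3 base tokenizer
# uses <|end_of_text|> as its end-of-sequence token).
EOS_TOKEN = "<|end_of_text|>"

ALPACA_PROMPT = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""


def format_batch(examples: dict) -> dict:
    """Batched formatter: takes a dict of columns, returns a dict of columns."""
    texts = [
        ALPACA_PROMPT.format(instruction, input_text, output) + EOS_TOKEN
        for instruction, input_text, output in zip(
            examples["instruction"], examples["input"], examples["output"]
        )
    ]
    return {"text": texts}


# Hypothetical toy batch (not taken from the real dataset).
toy_batch = {
    "instruction": ["Add the two numbers.", "Name a primary color."],
    "input": ["2 and 3", ""],
    "output": ["5", "Red"],
}

formatted = format_batch(toy_batch)
print(formatted["text"][0])
```

With `batched = True`, `dataset.map` passes each column as a list, which is why the function zips the columns and returns a list under the new `text` key.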

In [5]:
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        max_steps = 60, # Very short training run (demo)
        learning_rate = 2e-4,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit", # Memory-efficient optimizer
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none", # Skip the Weights & Biases login prompt
    ),
)

print("Starting training...")
trainer_stats = trainer.train()

The model is already on multiple devices. Skipping the move to device specified in `args`.


Starting training...


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 500 | Num Epochs = 1 | Total steps = 60
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 41,943,040 of 8,072,204,288 (0.52% trained)

Unsloth: Will smartly offload gradients to save VRAM!


Step,Training Loss
1,1.5921
2,1.8582
3,2.3166
4,1.9341
5,1.7047
6,1.4015
7,1.1951
8,1.0152
9,0.9986
10,0.8231
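
The training budget implied by these settings is easy to work out. The sketch below computes the effective batch size and how much of the 500-example subset the 60 steps cover, then evaluates a linear warmup and decay curve. The `linear_lr` helper is a hypothetical illustration of the usual shape of a "linear" schedule, not the trainer's own code.

```python
# Values copied from the TrainingArguments and dataset size used above.
PER_DEVICE_BATCH = 2
GRAD_ACCUM_STEPS = 4
NUM_GPUS = 1
MAX_STEPS = 60
NUM_EXAMPLES = 500
WARMUP_STEPS = 5
BASE_LR = 2e-4

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_GPUS
examples_seen = effective_batch * MAX_STEPS
epochs_covered = examples_seen / NUM_EXAMPLES


def linear_lr(step: int) -> float:
    """Linear warmup from 0 to BASE_LR, then linear decay to 0 at MAX_STEPS.

    ASSUMPTION: `step` counts optimizer updates already completed.
    """
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    return BASE_LR * max(0.0, (MAX_STEPS - step) / (MAX_STEPS - WARMUP_STEPS))


print(f"effective batch size: {effective_batch}")
print(f"examples seen: {examples_seen} ({epochs_covered:.2f} epochs)")
for step in (0, 5, 30, 60):
    print(f"step {step:>2}: lr = {linear_lr(step):.2e}")
```

The 60 steps process 480 examples, just under one full pass over the subset, which fits the demo's goal of a quick run rather than a converged model.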


In [6]:
# Inference test
print("\n--- Testing the Model After Fine-Tuning ---")

# Indonesian prompt: "Briefly explain what Artificial Intelligence is."
instruction = "Jelaskan secara singkat apa itu Artificial Intelligence."
input_text = "" # Empty

# Format the prompt the same way as during training
prompt_test = alpaca_prompt.format(instruction, input_text, "")

inputs = tokenizer([prompt_test], return_tensors = "pt").to("cuda")

# Generate
outputs = model.generate(**inputs, max_new_tokens = 128, use_cache = True)
response = tokenizer.batch_decode(outputs)

print(response[0])


--- Testing the Model After Fine-Tuning ---
<|begin_of_text|>Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Jelaskan secara singkat apa itu Artificial Intelligence.

### Input:


### Response:
Artificial Intelligence (AI) adalah istilah yang digunakan untuk menggambarkan teknologi yang dibuat untuk melakukan tugas-tugas yang biasanya dilakukan oleh manusia, seperti menyelesaikan masalah, mengolah data, atau membuat keputusan. AI menggunakan teknologi seperti pengolahan bahasa algoritma, pemrograman mesin, dan pemodelan statistik untuk memecahkan masalah dan membuat keputusan yang cerdas.<|end_of_text|>
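
The decoded string contains the whole prompt as well as the answer. A small post-processing step can isolate just the generated text. The `extract_response` helper below is a hypothetical sketch using plain string operations, and the sample string is a shortened stand-in with the same shape as the output above.

```python
RESPONSE_MARKER = "### Response:"
# ASSUMPTION: special tokens as they appear in the decoded output above.
SPECIAL_TOKENS = ("<|begin_of_text|>", "<|end_of_text|>")


def extract_response(decoded: str) -> str:
    """Return only the text generated after the last response marker."""
    _, _, answer = decoded.rpartition(RESPONSE_MARKER)
    for token in SPECIAL_TOKENS:
        answer = answer.replace(token, "")
    return answer.strip()


# Shortened, hypothetical decoded string (not real model output).
decoded_example = (
    "<|begin_of_text|>Below is an instruction...\n\n"
    "### Instruction:\nSay hello.\n\n"
    "### Input:\n\n\n"
    "### Response:\nHello!<|end_of_text|>"
)

print(extract_response(decoded_example))
```

In the notebook this would be applied to `response[0]`. Passing `skip_special_tokens=True` to `tokenizer.batch_decode` is another way to drop the special tokens at decode time.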


### Conclusion
If everything ran successfully, you have just fine-tuned **Llama-3** in a free environment!
Note that the model now follows the `### Response:` format well.
You can save just the adapter, which is tiny compared with the original model (about 16 GB in 16-bit precision): its 41.9M parameters take roughly 84 MB in 16-bit precision, or about twice that in 32-bit.
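
The adapter-versus-base size comparison can be checked with simple arithmetic. This sketch uses the trainable-parameter count from the training banner and assumes roughly 8.03 billion base parameters. The actual file size depends on the precision the adapter is saved in.

```python
TRAINABLE_PARAMS = 41_943_040  # trainable parameters reported during training
BASE_PARAMS = 8.03e9           # ASSUMPTION: approximate Llama-3 8B parameter count


def size_mb(n_params: float, bytes_per_param: int) -> float:
    """Raw tensor size in megabytes (1 MB = 1e6 bytes), ignoring file metadata."""
    return n_params * bytes_per_param / 1e6


adapter_fp16_mb = size_mb(TRAINABLE_PARAMS, 2)
adapter_fp32_mb = size_mb(TRAINABLE_PARAMS, 4)
base_fp16_mb = size_mb(BASE_PARAMS, 2)
ratio = base_fp16_mb / adapter_fp16_mb

print(f"adapter (16-bit): {adapter_fp16_mb:.1f} MB")
print(f"adapter (32-bit): {adapter_fp32_mb:.1f} MB")
print(f"base model (16-bit): {base_fp16_mb / 1000:.1f} GB")
print(f"base / adapter ratio: about {ratio:.0f}x")
```

To write the adapter to disk, PEFT-wrapped models expose `save_pretrained`. A call such as `model.save_pretrained("lora_model")` (the directory name is a placeholder) stores the adapter weights and config rather than the full base model.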