## Installing Libraries
Install Unsloth and its supporting libraries.

In [None]:
%%capture
import torch
# Read the GPU's CUDA compute capability (not used further in this notebook)
major_version, minor_version = torch.cuda.get_device_capability()

# Install Unsloth (Colab-specific build for a faster install)
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"

# Install supporting libraries
!pip install --no-deps "xformers<0.0.27" "trl<0.9.0" peft accelerate bitsandbytes

In [None]:
# Log in to Hugging Face (needed later to push the model).
# This runs in its own cell because %%capture above would hide the login prompt.
# Prepare an HF token with WRITE access from https://huggingface.co/settings/tokens
from huggingface_hub import login
login()
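The install cell reads the GPU's compute capability into `major_version` and `minor_version` but never uses it. As a hedged sketch, that value could drive the fp16/bf16 choice explicitly, assuming the common rule that native bfloat16 needs compute capability 8.0 or newer (Ampere and later). `pick_precision` is a hypothetical helper, not part of Unsloth or Transformers. On the Tesla T4 used here (capability 7.5, and Unsloth's banner reports `Bfloat16 = FALSE`) it selects fp16.

```python
def pick_precision(capability: tuple[int, int]) -> dict[str, bool]:
    """Hypothetical helper: choose mixed-precision flags from a CUDA compute capability.

    Assumes native bfloat16 requires compute capability 8.0 or newer (Ampere+).
    """
    major, _minor = capability
    use_bf16 = major >= 8
    return {"bf16": use_bf16, "fp16": not use_bf16}


# The Tesla T4 in this notebook reports compute capability 7.5
print(pick_precision((7, 5)))
# An Ampere GPU such as the A100 reports 8.0
print(pick_precision((8, 0)))
```

In the notebook this could be called as `pick_precision(torch.cuda.get_device_capability())` and unpacked into `TrainingArguments(**flags, ...)`, since `bf16` and `fp16` are ordinary keyword arguments there.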

## Loading the Llama-3 Model
Uses the 4-bit quantized version of Llama-3 8B Instruct.

In [None]:
from unsloth import FastLanguageModel
import torch

max_seq_length = 2048 # Maximum tokens per training example (meant to fit long legal articles)
dtype = None # None = auto-detect (float16 on a Tesla T4, bfloat16 on Ampere or newer)
load_in_4bit = True # Required on a Colab T4 (about 15 GB VRAM)

print("Downloading the Llama-3 model...")
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-Instruct-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
print("Model loaded successfully!")

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
Downloading the Llama-3 model...
==((====))==  Unsloth 2026.1.3: Fast Llama patching. Transformers: 4.57.3.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.9.0+cu126. CUDA: 7.5. CUDA Toolkit: 12.6. Triton: 3.5.0
\        /    Bfloat16 = FALSE. FA [Xformers = None. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!

Model loaded successfully!
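Why is `load_in_4bit = True` needed on the T4? A rough, weights-only estimate (a sketch, not Unsloth's own memory accounting) uses the base parameter count implied by the training log further down: 8,072,204,288 total minus 41,943,040 LoRA parameters. At 16 bits per weight the weights alone already exceed the 14.741 GB that Unsloth reports (assumed here to be binary GiB), before activations or optimizer state; at 4 bits they leave plenty of room. The real 4-bit footprint is larger than this estimate: the 4-bit checkpoint downloaded above is 5.70 GB, presumably because quantization constants are stored and some tensors stay at higher precision.

```python
# Base-model parameter count derived from the training log:
# 8,072,204,288 total parameters minus 41,943,040 trainable LoRA parameters.
BASE_PARAMS = 8_072_204_288 - 41_943_040
GIB = 1024 ** 3
# "Max memory" reported by Unsloth for the Tesla T4 (assumed to be in GiB)
T4_MAX_MEMORY_GIB = 14.741


def weight_memory_gib(num_params: int, bits_per_param: float) -> float:
    """Memory for the raw weights only: no activations, KV cache, or optimizer state."""
    return num_params * bits_per_param / 8 / GIB


fp16_gib = weight_memory_gib(BASE_PARAMS, 16)
int4_gib = weight_memory_gib(BASE_PARAMS, 4)
print(f"Base parameters: {BASE_PARAMS:,}")
print(f"16-bit weights: {fp16_gib:.2f} GiB (fits on T4: {fp16_gib < T4_MAX_MEMORY_GIB})")
print(f"4-bit weights:  {int4_gib:.2f} GiB (fits on T4: {int4_gib < T4_MAX_MEMORY_GIB})")
```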


## Preparing the Dataset & LoRA Adapter
Load the JSON file created in the previous step.
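The dataset cell below expects every record in the JSON file to carry `instruction`, `input`, and `output` fields, because those are the keys `formatting_prompts_func` reads; `load_dataset("json", ...)` should accept either a JSON array of such objects or JSON Lines. The sketch below uses placeholder records (the real contents of `dataset_hukum_siap_training.json` are not shown in this notebook) and a placeholder EOS string, and mimics the column-oriented batch that `Dataset.map(..., batched = True)` passes in, so the Alpaca formatting can be checked without a GPU. `to_columns` is a hypothetical helper.

```python
import json

# Hypothetical records with placeholder text; the keys match what
# formatting_prompts_func reads in the notebook.
records = [
    {"instruction": "Example instruction 1", "input": "Example input 1", "output": "Example output 1"},
    {"instruction": "Example instruction 2", "input": "", "output": "Example output 2"},
]
raw_json = json.dumps(records, ensure_ascii=False, indent=2)  # a JSON array, one object per example

ALPACA_PROMPT = (
    "Below is an instruction that describes a task, paired with an input that provides "
    "further context. Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{}\n\n### Input:\n{}\n\n### Response:\n{}"
)
EOS_TOKEN = "<EOS>"  # placeholder; the notebook uses tokenizer.eos_token


def to_columns(rows: list[dict]) -> dict[str, list[str]]:
    """Mimic the column-oriented batch that Dataset.map(..., batched=True) passes in."""
    return {key: [row[key] for row in rows] for key in ("instruction", "input", "output")}


def formatting_prompts_func(examples: dict[str, list[str]]) -> dict[str, list[str]]:
    """Fill the Alpaca template for each example and append the EOS marker."""
    texts = [
        ALPACA_PROMPT.format(instruction, input_text, output) + EOS_TOKEN
        for instruction, input_text, output in zip(
            examples["instruction"], examples["input"], examples["output"]
        )
    ]
    return {"text": texts}


batch = to_columns(json.loads(raw_json))
formatted = formatting_prompts_func(batch)
print(formatted["text"][0])
```

An empty `input` simply leaves a blank `### Input:` section, which is also how the evaluation cell later builds its prompt.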

In [None]:
from datasets import load_dataset

# 1. LOAD THE DATASET WE CREATED
# Make sure the file name matches the output of the previous step
dataset = load_dataset("json", data_files="dataset_hukum_siap_training.json", split="train")

# 2. FORMAT THE DATASET
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token # End-of-sequence token, appended so the model learns where an answer ends

def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    inputs       = examples["input"]
    outputs      = examples["output"]
    texts = []
    for instruction, input_text, output in zip(instructions, inputs, outputs):
        # Fill in the Alpaca template
        text = alpaca_prompt.format(instruction, input_text, output) + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }

dataset = dataset.map(formatting_prompts_func, batched = True)

# 3. LoRA CONFIGURATION (trains small adapter matrices instead of all weights, to save memory)
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # LoRA rank
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
)

print(f"Ready to train on {len(dataset)} legal examples.")

Unsloth 2026.1.3 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.


Ready to train on 114988 legal examples.
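The training log further down reports `Trainable parameters = 41,943,040`. That number can be reproduced by hand: LoRA adds two matrices to each targeted linear layer, A (r × in) and B (out × r), i.e. r·(in + out) parameters, and the update is scaled by `lora_alpha / r`. The layer sizes below are Llama-3-8B's published config values (hidden size 4096, MLP size 14336, 8 key/value heads of dimension 128); they are assumptions, not values printed in this notebook.

```python
# Llama-3-8B dimensions (assumed from the model's public config; not printed here)
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 8 * 128   # 8 key/value heads x head_dim 128 (grouped-query attention)
NUM_LAYERS = 32    # matches "patched 32 layers" in the log above
RANK = 16          # r in get_peft_model
ALPHA = 16         # lora_alpha in get_peft_model

# (in_features, out_features) of every module listed in target_modules
MODULE_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(in_features: int, out_features: int, rank: int) -> int:
    """A is (rank x in_features), B is (out_features x rank)."""
    return rank * (in_features + out_features)


per_layer = sum(lora_param_count(i, o, RANK) for i, o in MODULE_SHAPES.values())
trainable = per_layer * NUM_LAYERS
total = 8_072_204_288  # total parameter count from the training log
print(f"trainable LoRA parameters: {trainable:,}")        # 41,943,040
print(f"share of all parameters: {trainable / total:.2%}")  # 0.52%
print(f"LoRA scaling alpha / r: {ALPHA / RANK}")            # 1.0
```

With `lora_alpha` equal to `r`, the adapter's contribution is added unscaled, which is a common default choice.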


## Running the Training (Fine-tuning)
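The 60-step test run below trained for about 5 minutes (`train_runtime` of 316 s), not counting tokenization; training on all 114,988 examples takes far longer.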
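The cell below combines `learning_rate = 2e-4`, `warmup_steps = 5`, and `lr_scheduler_type = "linear"` with `max_steps = 60`. A minimal sketch of that schedule, assuming it follows the linear-warmup-then-linear-decay formula of Transformers' `get_linear_schedule_with_warmup`: the rate rises to its peak at step 5 and falls to zero at step 60, which is consistent with the final logged `train/learning_rate` of 0.0. `linear_schedule_lr` is a hypothetical name used only for illustration.

```python
def linear_schedule_lr(step: int, base_lr: float = 2e-4,
                       warmup_steps: int = 5, total_steps: int = 60) -> float:
    """Learning rate at a given optimizer step: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 1, 5, 30, 60):
    print(f"step {step:2d}: lr = {linear_schedule_lr(step):.2e}")
```

Switching to `num_train_epochs = 1` stretches the same shape over the full epoch, so the decay becomes much slower.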

In [None]:
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False,
    args = TrainingArguments(
        per_device_train_batch_size = 2, # Small batch to avoid running out of memory (OOM)
        gradient_accumulation_steps = 4, # Gradient accumulation (effective batch = 2 x 4 = 8)
        warmup_steps = 5,
        max_steps = 60, # FOR A QUICK TEST: only 60 steps.
        # To train on ALL the data, REMOVE the 'max_steps' line above
        # and replace it with: num_train_epochs = 1,
        learning_rate = 2e-4,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        # report_to = "none", # Uncomment to skip the interactive W&B login prompt
    ),
)

# START TRAINING
print("Starting training...")
trainer_stats = trainer.train()
print("Training finished!")

The model is already on multiple devices. Skipping the move to device specified in `args`.


Starting training...


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 114,988 | Num Epochs = 1 | Total steps = 60
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 41,943,040 of 8,072,204,288 (0.52% trained)


| Step | Training Loss |
|---:|---:|
| 1 | 2.3477 |
| 2 | 2.2434 |
| 3 | 2.2239 |
| 4 | 2.0754 |
| 5 | 1.776 |
| 6 | 1.7779 |
| 7 | 1.3492 |
| 8 | 1.1953 |
| 9 | 1.0164 |
| 10 | 0.9071 |




W&B run history (sparkline charts, not reproduced): train/epoch, train/global_step, train/grad_norm, train/learning_rate, train/loss

W&B run summary:

| Metric | Value |
|---|---:|
| total_flos | 3255899855781888.0 |
| train/epoch | 0.00417 |
| train/global_step | 60.0 |
| train/grad_norm | 0.52806 |
| train/learning_rate | 0.0 |
| train/loss | 0.3833 |
| train_loss | 0.74728 |
| train_runtime (s) | 316.0313 |
| train_samples_per_second | 1.519 |
| train_steps_per_second | 0.19 |


Training finished!
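The run summary can be cross-checked against the training settings, and the same numbers give a rough projection for switching to `num_train_epochs = 1`. This sketch assumes the time per step stays constant across the epoch, which is only approximate because example lengths vary, and it ignores tokenization time. The projection of roughly 21 hours is likely longer than a free Colab session lasts.

```python
import math

# Values taken from the training cell and its run summary
NUM_EXAMPLES = 114_988
PER_DEVICE_BATCH = 2
GRAD_ACCUM_STEPS = 4
NUM_GPUS = 1
MAX_STEPS = 60
TRAIN_RUNTIME_S = 316.0313

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_GPUS
samples_seen = MAX_STEPS * effective_batch
epoch_fraction = samples_seen / NUM_EXAMPLES      # compare with train/epoch
samples_per_second = samples_seen / TRAIN_RUNTIME_S  # compare with train_samples_per_second

# Rough projection for num_train_epochs = 1 (constant time per step assumed)
steps_per_epoch = math.ceil(NUM_EXAMPLES / effective_batch)
seconds_per_step = TRAIN_RUNTIME_S / MAX_STEPS
full_epoch_hours = steps_per_epoch * seconds_per_step / 3600

print(f"effective batch size: {effective_batch}")
print(f"epoch fraction after {MAX_STEPS} steps: {epoch_fraction:.5f}")
print(f"samples per second: {samples_per_second:.3f}")
print(f"steps for one full epoch: ~{steps_per_epoch:,}")
print(f"projected full-epoch time: ~{full_epoch_hours:.1f} h")
```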


In [None]:
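# --- SAVE THE LoRA ADAPTER LOCALLY ---
# On this PEFT model, save_pretrained writes only the LoRA adapter weights (not a merged full model).
# Files under /content are deleted when the Colab runtime ends, so copy or download them to keep them.
new_model_name = "llama3-hukum-indo-finetuned"

model.save_pretrained(new_model_name)
tokenizer.save_pretrained(new_model_name)

print(f"✅ Adapter and tokenizer saved to folder '{new_model_name}'")

✅ Adapter and tokenizer saved to folder 'llama3-hukum-indo-finetuned'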
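Because local Colab storage is wiped when the runtime ends, the saved folder has to be copied elsewhere to survive. A minimal standard-library sketch zips the folder so it can be downloaded; `archive_adapter` is a hypothetical helper, and it assumes PEFT wrote an `adapter_config.json` into the folder (the exact weight file name may vary between PEFT versions, so only the config is checked).

```python
import os
import shutil


def archive_adapter(folder: str) -> str:
    """Hypothetical helper: zip a saved adapter folder next to itself and return the zip path.

    Assumes the folder came from model.save_pretrained(...), which for a PEFT
    model writes an adapter_config.json alongside the adapter weights.
    """
    folder = os.path.normpath(folder)  # drop any trailing slash so the zip lands beside the folder
    if not os.path.isfile(os.path.join(folder, "adapter_config.json")):
        raise FileNotFoundError(f"no adapter_config.json found in {folder!r}")
    # make_archive appends ".zip" to the base name and returns the archive's path
    return shutil.make_archive(folder, "zip", root_dir=folder)
```

In Colab, the archive could then be downloaded with `from google.colab import files` followed by `files.download(archive_adapter(new_model_name))`.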


## Evaluation (Ask the Model a Question)
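Test prompting. The question and answer are in Indonesian, the language of the training data; the model's answer translates to "Under the KUHP, perpetrators of theft face imprisonment of 4-10 years and/or a fine."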

In [None]:
# Enable inference mode (faster generation)
FastLanguageModel.for_inference(model)

# Ask a legal question (similar to the training data, but slightly different)
# English: "What is the punishment for perpetrators of theft under the KUHP (Indonesian Criminal Code)?"
pertanyaan = "Apa hukuman bagi pelaku pencurian menurut KUHP?"

inputs = tokenizer(
[
    alpaca_prompt.format(
        pertanyaan, # Instruction
        "", # Input
        "", # Output (leave empty so the model writes it)
    )
], return_tensors = "pt").to("cuda")

# Generate the answer
outputs = model.generate(**inputs, max_new_tokens = 128, use_cache = True)
jawaban = tokenizer.batch_decode(outputs)

print("\n--- MODEL ANSWER ---")
print(jawaban[0].split("### Response:")[-1].replace(EOS_TOKEN, "").strip())


--- MODEL ANSWER ---
Berdasarkan KUHP, pelaku pencurian diancam dengan hukuman penjara selama 4-10 tahun dan/atau dengan denda.
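The last line of the evaluation cell pulls the answer out of the decoded text by splitting on `### Response:` and removing the EOS token. The same logic as a small, testable function is sketched below; `extract_response` is a hypothetical name, and the `<|begin_of_text|>` and `<|eot_id|>` strings are assumed Llama-3 special tokens used only as sample input. Removing the EOS token before stripping means whitespace in front of it is trimmed too.

```python
def extract_response(decoded: str, eos_token: str, marker: str = "### Response:") -> str:
    """Return the text after the last response marker, with the EOS token removed."""
    answer = decoded.split(marker)[-1]
    return answer.replace(eos_token, "").strip()


sample = (
    "<|begin_of_text|>Below is an instruction that describes a task...\n\n"
    "### Instruction:\nApa hukuman bagi pelaku pencurian menurut KUHP?\n\n"
    "### Input:\n\n\n### Response:\nSample answer text. <|eot_id|>"
)
print(extract_response(sample, "<|eot_id|>"))
```

Two alternatives avoid string splitting altogether: `tokenizer.batch_decode(outputs, skip_special_tokens = True)` drops special tokens during decoding, and slicing `outputs[:, inputs["input_ids"].shape[1]:]` keeps only the newly generated tokens.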


In [None]:
# Enter your new WRITE token here (do not share a notebook that contains a real token)
token_baru = "YOURHUGGINGFACETOKEN"

# Define repo_name here (the GGUF cell below uses the same value)
repo_name = "dikcej/llama3-hukum-indo-forrag-v1"

# Merge the LoRA adapter into the base weights and upload the result as a 16-bit model
model.push_to_hub_merged(
    repo_name,
    tokenizer,
    save_method = "merged_16bit",
    token = token_baru, # Use the new token variable
)




Found HuggingFace hub cache directory: /root/.cache/huggingface/hub





Checking cache directory for required files...
Cache check failed: model-00001-of-00004.safetensors not found in local cache.
Not all required files found in cache. Will proceed with downloading.
Checking cache directory for required files...
Cache check failed: tokenizer.model not found in local cache.
Not all required files found in cache. Will proceed with downloading.


Unsloth: Preparing safetensor model files: 100%|██████████| 4/4 [09:28<00:00, 142.14s/it]


Note: tokenizer.model not found (this is OK for non-SentencePiece models)


Unsloth: Merging weights into 16bit: 100%|██████████| 4/4 [13:41<00:00, 205.47s/it]


Unsloth: Merge process complete. Saved to `/content/dikcej/llama3-hukum-indo-forrag-v1`
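Merging folds each LoRA update back into its base weight matrix (W + (alpha / r)·B·A), so the merged model has the same parameter count as the base model, stored in 16 bits. A quick arithmetic check, as a sketch: the four safetensors shards reported by the download progress bars during the merge were 4.98, 5.00, 4.92 and 1.17 GB (assumed to be decimal gigabytes), and the base parameter count is the one implied by the training log.

```python
# Shard sizes reported during the merge, assumed to be decimal GB (1e9 bytes)
SHARD_SIZES_GB = [4.98, 5.00, 4.92, 1.17]
# 8,072,204,288 total minus 41,943,040 LoRA parameters, both from the training log
BASE_PARAMS = 8_072_204_288 - 41_943_040
BYTES_PER_PARAM = 2  # merged_16bit stores every weight in 16 bits

expected_gb = BASE_PARAMS * BYTES_PER_PARAM / 1e9
actual_gb = sum(SHARD_SIZES_GB)
print(f"expected from parameter count: {expected_gb:.2f} GB")
print(f"sum of shard sizes:            {actual_gb:.2f} GB")
```

The two figures agree to within rounding of the displayed shard sizes, which suggests no LoRA parameters remain as separate tensors after merging.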


## Converting to GGUF

In [None]:
# 1. Make sure the libraries are installed (usually already present with Unsloth)
# If you get an import error, uncomment the lines below:
# !pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
# !pip install --no-deps "xformers<0.0.27" "trl<0.9.0" peft accelerate bitsandbytes

# This cell reuses `model` and `tokenizer` from the training cells above,
# so run it in the same session, before the runtime restarts.
from unsloth import FastLanguageModel

# 2. Log in to Hugging Face (IMPORTANT: use a WRITE token)
from huggingface_hub import login
# Replace with your new WRITE token
login("YOURHUGGINGFACETOKEN")

# 3. Set the repo name (same as in the previous cell)
repo_name = "dikcej/llama3-hukum-indo-forrag-v1"

# 4. CONVERT & UPLOAD GGUF
# This creates a .gguf file in your repo
print("🚀 Converting to GGUF (lightweight format)...")
print("⏳ This may take well over 15 minutes...")

# Set the token explicitly (this overwrites any token_baru already defined in the kernel)
token_baru = "YOURHUGGINGFACETOKEN"

model.push_to_hub_gguf(
    repo_name,
    tokenizer,
    quantization_method = "q4_k_m", # Standard quantization method (good size/quality balance)
    token = token_baru, # Pass token_baru explicitly
)

print(f"✅ SUCCESS! GGUF file uploaded to: https://huggingface.co/{repo_name}")

🚀 Converting to GGUF (lightweight format)...
⏳ This may take well over 15 minutes...
Unsloth: Converting model to GGUF format...
Unsloth: Merging model weights to 16-bit format...
Found HuggingFace hub cache directory: /root/.cache/huggingface/hub



Checking cache directory for required files...
Cache check failed: model-00001-of-00004.safetensors not found in local cache.
Not all required files found in cache. Will proceed with downloading.
Checking cache directory for required files...
Cache check failed: tokenizer.model not found in local cache.
Not all required files found in cache. Will proceed with downloading.


Unsloth: Preparing safetensor model files: 100%|██████████| 4/4 [10:00<00:00, 150.02s/it]


Note: tokenizer.model not found (this is OK for non-SentencePiece models)


Unsloth: Merging weights into 16bit: 100%|██████████| 4/4 [06:51<00:00, 102.95s/it]


Unsloth: Merge process complete. Saved to `/tmp/unsloth_gguf_ma8akqcg`
Unsloth: Converting to GGUF format...
==((====))==  Unsloth: Conversion from HF to GGUF information
   \\   /|    [0] Installing llama.cpp might take 3 minutes.
O^O/ \_/ \    [1] Converting HF to GGUF f16 might take 3 minutes.
\        /    [2] Converting GGUF f16 to ['q4_k_m'] might take 10 minutes each.
 "-____-"     In total, you will have to wait at least 16 minutes.

Unsloth: llama.cpp found in the system. Skipping installation.
Unsloth: Preparing converter script...
Unsloth: [1] Converting model into f16 GGUF format.
This might take 3 minutes...
Unsloth: Initial conversion completed! Files: ['llama-3-8b-instruct.F16.gguf']
Unsloth: [2] Converting GGUF f16 into q4_k_m. This might take 10 minutes...
Unsloth: Model files cleanup...
Unsloth: All GGUF conversions completed successfully!
Generated files: ['llama-3-8b-instruct.Q4_K_M.gguf']
Unsloth: example usage for text only LLMs: llama-cli --model llama-3-8b-instr




Uploading config.json...
Uploading Ollama Modelfile...
Unsloth: Successfully uploaded GGUF to https://huggingface.co/dikcej/llama3-hukum-indo-forrag-v1
Unsloth: Cleaning up temporary files...
✅ SUCCESS! GGUF file uploaded to: https://huggingface.co/dikcej/llama3-hukum-indo-forrag-v1
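The upload progress bar reported `llama-3-8b-instruct.Q4_K_M.gguf` at 4.92 GB. Dividing that by the parameter count gives the average number of bits stored per weight; this rough sketch assumes decimal gigabytes and ignores GGUF metadata. The result lands a little above 4 bits, plausibly because Q4_K_M stores per-block scales and keeps some tensors in higher-precision block formats.

```python
GGUF_SIZE_GB = 4.92        # Q4_K_M file size from the upload progress bar (assumed decimal GB)
MERGED_16BIT_GB = 16.07    # sum of the four merged_16bit shards (4.98 + 5.00 + 4.92 + 1.17)
BASE_PARAMS = 8_072_204_288 - 41_943_040  # base parameters implied by the training log

bits_per_weight = GGUF_SIZE_GB * 1e9 * 8 / BASE_PARAMS
compression = MERGED_16BIT_GB / GGUF_SIZE_GB
print(f"average bits per weight: {bits_per_weight:.2f}")
print(f"size reduction vs. 16-bit: {compression:.1f}x")
```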
