# Korean Legal Data QLoRA Fine-tuning (Kanana Nano 2.1B)

This notebook fine-tunes the Kanana Nano 2.1B Instruct model on Korean legal data using the QLoRA method.
It covers the full workflow: data loading, training, saving the adapter, and a simple generation test.


In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
# ============================================================
# 0. Install Dependencies (Colab default Python3 environment)
# ============================================================

!pip -q install transformers datasets peft trl bitsandbytes accelerate
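
The install above is unpinned, so the library versions can differ between Colab sessions. A minimal sketch for recording what was actually installed, using only the standard library; the helper name `installed_versions` is an assumption for illustration and not part of the notebook:

```python
from importlib.metadata import PackageNotFoundError, version

# Packages installed by the cell above.
PACKAGES = ["transformers", "datasets", "peft", "trl", "bitsandbytes", "accelerate"]

def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

for name, ver in installed_versions(PACKAGES).items():
    print(f"{name:>14}: {ver or 'not installed'}")
```

Keeping this output next to the training log makes it easier to reproduce a run later, since argument names in these libraries have changed between releases.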


In [3]:
# ============================================================
# Korean Legal Data Fine-tuning with QLoRA
# Base Model: Kanana Nano 2.1B Instruct (Kakao)
# ============================================================

"""
This script uses QLoRA on a Google Colab GPU to
fine-tune the Kanana Nano 2.1B model on Korean legal data.
The bf16 settings below need a GPU with bfloat16 support (this run used an A100).

Model: kakaocorp/kanana-nano-2.1b-instruct
Dataset: Legal terminology + case law + custom data
"""

import os
import json
import torch
from pathlib import Path
from datetime import datetime

from datasets import Dataset, load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    BitsAndBytesConfig,
    TrainingArguments,
    pipeline
)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from trl import SFTTrainer


In [4]:
DRIVE_BASE_DIR = "/content/drive/MyDrive/fine_tuned_models"
Path(DRIVE_BASE_DIR).mkdir(parents=True, exist_ok=True)
print(f"Drive save base path: {DRIVE_BASE_DIR}")
print(f"Exists: {Path(DRIVE_BASE_DIR).exists()}")

Drive save base path: /content/drive/MyDrive/fine_tuned_models
Exists: True


In [5]:
# ============================================================
# 0-1. GPU Check
# ============================================================

!nvidia-smi


Thu Jan 15 20:25:56 2026       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA A100-SXM4-80GB          Off |   00000000:00:05.0 Off |                    0 |
| N/A   31C    P0             54W /  400W |       5MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
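
The `nvidia-smi` output above shows an A100, which supports bfloat16. On older GPUs such as a T4, the `bf16=True` and `torch.bfloat16` settings used later may not be usable. A hedged sketch of choosing the precision flags from the hardware; `precision_flags` is a hypothetical helper, and `torch.cuda.is_bf16_supported()` should be treated as a hint because some PyTorch versions also report emulated support:

```python
def precision_flags(cuda_available: bool, bf16_supported: bool) -> dict:
    """Choose mixed-precision settings from two hardware facts."""
    if cuda_available and bf16_supported:
        return {"bf16": True, "fp16": False, "compute_dtype": "bfloat16"}
    if cuda_available:
        return {"bf16": False, "fp16": True, "compute_dtype": "float16"}
    return {"bf16": False, "fp16": False, "compute_dtype": "float32"}

try:
    import torch
    has_cuda = torch.cuda.is_available()
    flags = precision_flags(has_cuda, has_cuda and torch.cuda.is_bf16_supported())
except ImportError:
    # torch is not installed here; fall back to the CPU answer
    flags = precision_flags(False, False)

print(flags)
```

The `bf16` and `fp16` values could feed the matching `TrainingArguments` fields, and `getattr(torch, flags["compute_dtype"])` could serve as `bnb_4bit_compute_dtype`.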
                                                

In [6]:
# ============================================================
# 1. Configuration
# ============================================================

# Model Configuration
MODEL_NAME = "kakaocorp/kanana-nano-2.1b-instruct"

# Data path
# DATA_PATH is no longer needed (loaded directly from Hugging Face)

# Output directory (Colab Google Drive path)
OUTPUT_DIR = f"{DRIVE_BASE_DIR}/kanana-legal-finetuned"
LOGS_DIR = "./logs"

# ============================================================
# 2. QLoRA Configuration
# ============================================================

# 4-bit quantization Configuration
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# LoRA Configuration
lora_config = LoraConfig(
    r=16,                        # LoRA rank
    lora_alpha=32,               # LoRA alpha
    target_modules=[             # Attention and MLP projection layers
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# ============================================================
# 3. Training Hyperparameters
# ============================================================

training_args = TrainingArguments(
    output_dir=OUTPUT_DIR,
    logging_dir=LOGS_DIR,

    # Training parameters
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,

    # Optimizer
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    weight_decay=0.01,
    optim="paged_adamw_8bit",

    # Precision
    fp16=False,
    bf16=True,

    # Logging and saving
    logging_steps=10,
    save_strategy="steps",
    save_steps=100,
    save_total_limit=3,

    # Evaluation
    eval_strategy="steps",
    eval_steps=100,

    # Other
    max_grad_norm=0.3,
    group_by_length=True,
    report_to="none",  # W&B disabled
    run_name=f"kanana-legal-{datetime.now().strftime('%Y%m%d-%H%M%S')}",
)

MAX_SEQ_LENGTH = 2048

print("✓ Configuration complete")
print(f"  Model: {MODEL_NAME}")
print(f"  LoRA rank: {lora_config.r}")
print(f"  Batch size: {training_args.per_device_train_batch_size}")
print(f"  Effective batch size: {training_args.per_device_train_batch_size * training_args.gradient_accumulation_steps}")


✓ Configuration complete
  Model: kakaocorp/kanana-nano-2.1b-instruct
  LoRA rank: 16
  Batch size: 4
  Effective batch size: 16
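
To make the LoRA settings above more concrete, here is a small sketch of how adapter size follows from the rank. It assumes the standard LoRA formulation, in which a linear layer of shape `in x out` gains two matrices `A` (`r x in`) and `B` (`out x r`), and the update is scaled by `lora_alpha / r`. The layer size in the example is hypothetical and is not taken from the Kanana configuration:

```python
def lora_params(in_features: int, out_features: int, r: int) -> int:
    """Parameters added by one LoRA adapter: A is (r x in), B is (out x r)."""
    return r * (in_features + out_features)

def lora_scaling(lora_alpha: int, r: int) -> float:
    """Factor applied to the low-rank update B @ A in standard LoRA."""
    return lora_alpha / r

# Hypothetical square projection, NOT the real Kanana dimensions.
hidden = 1024
print(lora_params(hidden, hidden, r=16))   # 32768
print(lora_scaling(lora_alpha=32, r=16))   # 2.0
```

Doubling `r` doubles the adapter size, while keeping `lora_alpha / r` fixed keeps the update scale unchanged.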


In [7]:
# ============================================================
# 4. Data Loading
# ============================================================

print("\nLoading data...")
print("Downloading dataset from Hugging Face...")

# Load dataset from Hugging Face (train split)
full_data = load_dataset(
    "flyingcarycoder/korean-legal-terminology",
    split="train",
)

print(f"Total samples: {len(full_data):,}")

# Train/Validation split (95% train, 5% validation)
split_dataset = full_data.train_test_split(test_size=0.05, seed=42)
train_dataset = split_dataset['train']
val_dataset = split_dataset['test']

print(f"Train: {len(train_dataset):,} samples")
print(f"Validation: {len(val_dataset):,} samples")



Loading data...
Downloading dataset from Hugging Face...


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Total samples: 17,484
Train: 16,609 samples
Validation: 875 samples
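
The split sizes above, together with the batch settings from the configuration cell, give a rough estimate of the training schedule. The sketch below assumes the validation size is rounded up (which is consistent with the 875 samples reported) and a single GPU. The step count is only an approximation, because the Trainer's own rounding can differ slightly:

```python
import math

def split_sizes(n_samples: int, test_fraction: float) -> tuple:
    """Validation size is rounded up; the remainder is used for training."""
    n_val = math.ceil(n_samples * test_fraction)
    return n_samples - n_val, n_val

def approx_optimizer_steps(n_train: int, per_device_batch: int,
                           grad_accum: int, epochs: int) -> int:
    """Approximate number of optimizer updates on a single GPU."""
    effective_batch = per_device_batch * grad_accum
    return epochs * math.ceil(n_train / effective_batch)

n_train, n_val = split_sizes(17_484, 0.05)
print(n_train, n_val)                            # 16609 875
print(approx_optimizer_steps(n_train, 4, 4, 3))  # 3117
```

With `eval_steps=100` and `save_steps=100`, a run of roughly this length would evaluate and checkpoint about 31 times.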


In [8]:
# ============================================================
# 5. Model and Tokenizer Loading
# ============================================================

print("\nLoading tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(
    MODEL_NAME,
    trust_remote_code=True
)

# Padding Configuration
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

print("✓ Tokenizer loaded successfully")
print(f"  Vocab size: {len(tokenizer)}")

print("\nLoading model (4-bit quantization)...")
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

# Prepare for k-bit training
model = prepare_model_for_kbit_training(model)

# Apply LoRA
model = get_peft_model(model, lora_config)

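# Check trainable parameters.
# Note: p.numel() counts the packed storage of 4-bit weights, so this manual
# total undercounts the base model; print_trainable_parameters() below
# reports the logical parameter count.
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
all_params = sum(p.numel() for p in model.parameters())

print("\n✓ Model loaded successfully")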
print(f"  Total parameters: {all_params:,}")
print(f"  Trainable parameters: {trainable_params:,}")
print(f"  Trainable ratio: {100 * trainable_params / all_params:.2f}%")

model.print_trainable_parameters()



Loading tokenizer...


✓ Tokenizer loaded successfully
  Vocab size: 128256

Loading model (4-bit quantization)...


✓ Model loaded successfully
  Total parameters: 1,181,468,416
  Trainable parameters: 23,003,136
  Trainable ratio: 1.95%
trainable params: 23,003,136 || all params: 2,109,982,464 || trainable%: 1.0902
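
The two totals above disagree (1.18B from the manual count, 2.11B from PEFT). One explanation that fits the numbers exactly is that each stored element of a 4-bit weight holds two parameters, so `p.numel()` sees only half of the quantized weights. The sketch below reconciles the figures under that assumption, using only the values printed above:

```python
manual_total = 1_181_468_416   # sum of p.numel() over model.parameters()
peft_total = 2_109_982_464     # reported by print_trainable_parameters()
trainable = 23_003_136         # LoRA parameters (identical in both counts)

# Assumption: each stored element of a 4-bit weight holds two parameters,
# so the manual count misses exactly half of the quantized weights.
hidden_by_packing = peft_total - manual_total
quantized_params = 2 * hidden_by_packing
unquantized_params = peft_total - quantized_params  # embeddings, norms, LoRA

print(f"quantized (logical):   {quantized_params:,}")
print(f"unquantized + LoRA:    {unquantized_params:,}")
print(f"ratio vs manual total: {100 * trainable / manual_total:.2f}%")
print(f"ratio vs PEFT total:   {100 * trainable / peft_total:.4f}%")
```

Under this reading, the 1.09% figure is the one that reflects the share of the 2.1B-parameter model being trained.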


In [9]:
# ============================================================
# 6. Prompt Format
# ============================================================

def format_prompt(sample):
    """Convert input data to training prompt format"""
    return f"""### Question:
{sample['input']}

### Answer:
{sample['output']}"""
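
To show what the template produces, the sketch below applies it to a made-up record and also derives the matching inference prompt, which is the same template with the answer left empty. The sample text and the helper `inference_prompt` are hypothetical. `format_prompt` is repeated only so the example runs on its own:

```python
def format_prompt(sample):
    """Same template as the cell above, repeated so this sketch runs alone."""
    return f"""### Question:
{sample['input']}

### Answer:
{sample['output']}"""

def inference_prompt(question):
    """Training template with the answer left empty for the model to complete."""
    return format_prompt({"input": question, "output": ""})

# Hypothetical record; the field names follow the template above.
example = {"input": "What is a contract?", "output": "An agreement between parties."}
print(format_prompt(example))
print(repr(inference_prompt("What is a contract?")))
```

Building the inference prompt from the training template keeps the two from drifting apart if the template is edited later.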


In [10]:
# ============================================================
# 7. Trainer Configuration and Training
# ============================================================

trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    peft_config=lora_config,
    formatting_func=format_prompt,
    args=training_args
)

print("\n" + "="*60)
print("Fine-tuning started!")
print("="*60)

trainer.train()

print("\n" + "="*60)
print("✅ Fine-tuning complete!")
print("="*60)



The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'eos_token_id': 128009, 'pad_token_id': 128001}.



Fine-tuning started!



Step,Training Loss,Validation Loss,Entropy,Num Tokens,Mean Token Accuracy
100,1.3474,1.348448,1.27847,963863.0,0.67008
200,1.3109,1.282038,1.214781,1925369.0,0.683275
300,1.2826,1.244442,1.246762,2874206.0,0.688552
400,1.2932,1.211675,1.186833,3799669.0,0.692966
500,1.2889,1.202,1.189427,4732602.0,0.695191
600,1.2863,1.198944,1.224393,5695277.0,0.696299
700,1.2429,1.180754,1.214276,6629521.0,0.699385
800,1.2347,1.167861,1.171825,7585008.0,0.701624
900,1.2499,1.157248,1.168674,8552583.0,0.703447
1000,1.2534,1.146654,1.145106,9502313.0,0.704891
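
One way to read the log above is to convert the validation loss into perplexity and a relative improvement. This sketch assumes the reported loss is the mean token-level cross-entropy in nats, so that perplexity is `exp(loss)`. Only the first and last logged rows are used:

```python
import csv
import io
import math

# First and last rows of the log above (step, training loss, validation loss).
LOG = """step,train_loss,val_loss
100,1.3474,1.348448
1000,1.2534,1.146654
"""

rows = list(csv.DictReader(io.StringIO(LOG)))
first = float(rows[0]["val_loss"])
last = float(rows[-1]["val_loss"])

relative_drop = (first - last) / first
perplexity_first = math.exp(first)   # assumes loss is mean cross-entropy in nats
perplexity_last = math.exp(last)

print(f"validation loss: {first:.4f} -> {last:.4f} ({relative_drop:.1%} lower)")
print(f"validation perplexity: {perplexity_first:.2f} -> {perplexity_last:.2f}")
```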




✅ Fine-tuning complete!


In [11]:
# ============================================================
# 8. Save Model
# ============================================================

output_dir = f"{OUTPUT_DIR}/final"
trainer.model.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)

print(f"\n✓ Model saved: {output_dir}")
print("Saved files:")
for saved_path in sorted(Path(output_dir).glob("**/*")):
    if saved_path.is_file():
        print(f"- {saved_path}")



✓ Model saved: /content/drive/MyDrive/fine_tuned_models/kanana-legal-finetuned/final
Saved files:
- /content/drive/MyDrive/fine_tuned_models/kanana-legal-finetuned/final/README.md
- /content/drive/MyDrive/fine_tuned_models/kanana-legal-finetuned/final/adapter_config.json
- /content/drive/MyDrive/fine_tuned_models/kanana-legal-finetuned/final/adapter_model.safetensors
- /content/drive/MyDrive/fine_tuned_models/kanana-legal-finetuned/final/chat_template.jinja
- /content/drive/MyDrive/fine_tuned_models/kanana-legal-finetuned/final/special_tokens_map.json
- /content/drive/MyDrive/fine_tuned_models/kanana-legal-finetuned/final/tokenizer.json
- /content/drive/MyDrive/fine_tuned_models/kanana-legal-finetuned/final/tokenizer_config.json
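
The saved directory holds only the LoRA adapter and tokenizer files, not the base weights. To reuse it in a new session, the adapter has to be attached to a freshly loaded base model. A minimal sketch of that step, assuming the standard `PeftModel.from_pretrained` workflow; the helper names are hypothetical, and quantization options are left out for brevity:

```python
from pathlib import Path

REQUIRED_ADAPTER_FILES = ("adapter_config.json", "adapter_model.safetensors")

def missing_adapter_files(adapter_dir):
    """Return the required adapter files that are absent from adapter_dir."""
    root = Path(adapter_dir)
    return [name for name in REQUIRED_ADAPTER_FILES if not (root / name).is_file()]

def load_finetuned(base_model_name, adapter_dir):
    """Attach the saved LoRA adapter to a freshly loaded base model.

    Imports are local so the file check above works without these libraries.
    """
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    missing = missing_adapter_files(adapter_dir)
    if missing:
        raise FileNotFoundError(f"Adapter files missing from {adapter_dir}: {missing}")

    tokenizer = AutoTokenizer.from_pretrained(adapter_dir)
    base = AutoModelForCausalLM.from_pretrained(base_model_name, device_map="auto")
    model = PeftModel.from_pretrained(base, adapter_dir)
    return model.eval(), tokenizer
```

In this notebook the call would look like `load_finetuned(MODEL_NAME, f"{OUTPUT_DIR}/final")`.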


In [12]:
# ============================================================
# 9. Test
# ============================================================

print("\n" + "="*60)
print("Model Test")
print("="*60)

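# Switch from training to inference mode (disables dropout)
model.eval()

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,       # make sampling explicit so temperature/top_p are used
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.1,
)

test_questions = [
    # "Explain the difference between rescission and termination of a contract."
    "계약 해제와 계약 해지의 차이를 설명해주세요.",
    # "What is extinctive prescription under the Civil Act?"
    "민법상 소멸시효는 무엇인가요?",
    # "What are the requirements for self-defense under the Criminal Act?"
    "형법상 정당방위의 요건은 무엇인가요?",
]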

for i, question in enumerate(test_questions, 1):
    prompt = f"""### Question:
{question}

### Answer:
"""

    print(f"\n[Test {i}]")
    print(f"Question: {question}")
    print("\nAnswer:")

    result = pipe(prompt)[0]['generated_text']
    answer = result.split("### Answer:")[-1].strip()
    print(answer)
    print("\n" + "-"*60)

print("\n🎉 All tasks complete!")


Device set to use cuda:0
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
Caching is incompatible with gradient checkpointing in LlamaDecoderLayer. Setting `past_key_values=None`.



Model Test

[Test 1]
Question: 계약 해제와 계약 해지의 차이를 설명해주세요.

Answer:



계################ [the '#' character repeats for the remainder of the output]
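
The answer above collapses into a single repeated character. When testing many prompts, a cheap automatic check can flag this kind of output before anyone reads it. A hedged sketch follows; the helper names and the 0.5 threshold are assumptions chosen for illustration:

```python
from collections import Counter

def dominant_char_ratio(text: str) -> float:
    """Share of non-whitespace characters taken by the most common one."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return Counter(chars).most_common(1)[0][1] / len(chars)

def looks_degenerate(text: str, threshold: float = 0.5) -> bool:
    """Flag empty output or output dominated by one repeated character."""
    return not text.strip() or dominant_char_ratio(text) >= threshold

print(looks_degenerate("계" + "#" * 200))                                        # True
print(looks_degenerate("A contract is an agreement between two or more parties."))  # False
```

A check like this detects the symptom only. It does not identify the cause, which could lie in the generation settings, the prompt format, or the training itself.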