# 🇪🇹 Afaan Oromoo Passport Chatbot - Fine-tuning

This notebook fine-tunes a small language model to answer questions about Ethiopian passport services in Afaan Oromoo.

**Dataset:** 392 Q&A pairs about passport services

**Model:** Qwen2.5-3B-Instruct with QLoRA

---

## ⚙️ Setup Instructions

1. Go to **Runtime → Change runtime type**
2. Select **T4 GPU** (free tier) or **A100 GPU** (Colab Pro)
3. Run all cells in order

## 1️⃣ Check GPU Availability

In [1]:
import torch

print("=" * 50)
print("GPU Check")
print("=" * 50)

if torch.cuda.is_available():
    print(f"✅ GPU Available: {torch.cuda.get_device_name(0)}")
    print(f"   VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
else:
    print("❌ No GPU detected! Go to Runtime → Change runtime type → T4 GPU")
    raise RuntimeError("GPU not available. Please enable GPU runtime.")

GPU Check
✅ GPU Available: Tesla T4
   VRAM: 15.6 GB
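
The GPU type also decides which mixed-precision mode is worth trying first. The sketch below maps a CUDA compute capability to `fp16`/`bf16` flags, on the assumption that native bfloat16 support starts with Ampere (compute capability 8.0, e.g. A100), while the Tesla T4 reports 7.5. The helper name `pick_mixed_precision` is hypothetical. In a live session the tuple would come from `torch.cuda.get_device_capability(0)`.

```python
def pick_mixed_precision(capability):
    """Map a CUDA compute capability (major, minor) to mixed-precision flags.

    Assumption: bf16 is only worth enabling on Ampere or newer (major >= 8);
    older GPUs such as the T4 (7.5) fall back to fp16.
    """
    major, _minor = capability
    if major >= 8:
        return {"fp16": False, "bf16": True}
    return {"fp16": True, "bf16": False}


print(pick_mixed_precision((7, 5)))  # Tesla T4
print(pick_mixed_precision((8, 0)))  # A100
```

These flags are only a starting point: the training cell in this notebook ended up completing with both `fp16` and `bf16` disabled.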


In [2]:
!pip uninstall -y trl transformers peft accelerate bitsandbytes
!pip install --no-cache-dir torch transformers datasets accelerate peft bitsandbytes trl

Found existing installation: trl 0.28.0
Uninstalling trl-0.28.0:
  Successfully uninstalled trl-0.28.0
Found existing installation: transformers 5.2.0
Uninstalling transformers-5.2.0:
  Successfully uninstalled transformers-5.2.0
Found existing installation: peft 0.18.1
Uninstalling peft-0.18.1:
  Successfully uninstalled peft-0.18.1
Found existing installation: accelerate 1.12.0
Uninstalling accelerate-1.12.0:
  Successfully uninstalled accelerate-1.12.0
Found existing installation: bitsandbytes 0.49.2
Uninstalling bitsandbytes-0.49.2:
  Successfully uninstalled bitsandbytes-0.49.2
Collecting transformers
  Downloading transformers-5.2.0-py3-none-any.whl.metadata (32 kB)
Collecting accelerate
  Downloading accelerate-1.12.0-py3-none-any.whl.metadata (19 kB)
Collecting peft
  Downloading peft-0.18.1-py3-none-any.whl.metadata (14 kB)
Collecting bitsandbytes
  Downloading bitsandbytes-0.49.2-py3-none-manylinux_2_24_x86_64.whl.metadata (10 kB)
Collecting trl
  Downloading trl-0.28.0-py3-n

## 2️⃣ Install Dependencies

In [3]:
%%capture
# Install required packages (this takes ~2-3 minutes)
!pip install -q torch
#transformers datasets accelerate peft bitsandbytes trl

In [4]:
import inspect
from trl import SFTTrainer
print(inspect.signature(SFTTrainer.__init__))

(self, model: 'str | PreTrainedModel | PeftModel', args: trl.trainer.sft_config.SFTConfig | transformers.training_args.TrainingArguments | None = None, data_collator: collections.abc.Callable[[list[typing.Any]], dict[str, typing.Any]] | None = None, train_dataset: datasets.arrow_dataset.Dataset | datasets.iterable_dataset.IterableDataset | None = None, eval_dataset: datasets.arrow_dataset.Dataset | datasets.iterable_dataset.IterableDataset | dict[str, datasets.arrow_dataset.Dataset | datasets.iterable_dataset.IterableDataset] | None = None, processing_class: transformers.tokenization_utils_base.PreTrainedTokenizerBase | transformers.processing_utils.ProcessorMixin | None = None, compute_loss_func: collections.abc.Callable | None = None, compute_metrics: collections.abc.Callable[[transformers.trainer_utils.EvalPrediction], dict] | None = None, callbacks: list[transformers.trainer_callback.TrainerCallback] | None = None, optimizers: tuple[torch.optim.optimizer.Optimizer | None, torch.opt

In [5]:
# Verify installations
import transformers, peft, trl, bitsandbytes
print(f"✅ transformers: {transformers.__version__}")
print(f"✅ peft: {peft.__version__}")
print(f"✅ trl: {trl.__version__}")

✅ transformers: 5.2.0
✅ peft: 0.18.1
✅ trl: 0.28.0


## 3️⃣ Upload Your Dataset

Run the cell below and upload your `afaan_oromoo_jsonl_20260225_bfc421.txt` file.

In [6]:
from google.colab import files
import os

# Create data directory
os.makedirs('/content/data', exist_ok=True)

print("📤 Please upload your dataset file (afaan_oromoo_jsonl_20260225_bfc421.txt)")
print("   Click 'Choose Files' below...\n")

uploaded = files.upload()

for filename in uploaded.keys():
    print(f"\n✅ Uploaded: {filename} ({len(uploaded[filename])} bytes)")
    # Move to data directory
    os.rename(filename, f'/content/data/{filename}')

📤 Please upload your dataset file (afaan_oromoo_jsonl_20260225_bfc421.txt)
   Click 'Choose Files' below...

Saving afaan_oromoo_jsonl_20260225_bfc421.txt to afaan_oromoo_jsonl_20260225_bfc421.txt

✅ Uploaded: afaan_oromoo_jsonl_20260225_bfc421.txt (77372 bytes)
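
Before formatting the data, it can help to confirm that every line of the uploaded file is a JSON object with the two fields the preparation step reads (`instruction` and `output`). The following is a minimal sketch of such a check; `validate_jsonl` and `REQUIRED_KEYS` are hypothetical names, not part of the original notebook.

```python
import json

# Fields that format_sample reads from each record
REQUIRED_KEYS = ("instruction", "output")


def validate_jsonl(filepath):
    """Return (valid_count, problems), where problems is a list of (line_number, reason)."""
    valid, problems = 0, []
    with open(filepath, "r", encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # blank lines are ignored, as in load_data
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((lineno, f"invalid JSON: {exc.msg}"))
                continue
            if not isinstance(record, dict):
                problems.append((lineno, "not a JSON object"))
                continue
            missing = [k for k in REQUIRED_KEYS if not record.get(k)]
            if missing:
                problems.append((lineno, "missing or empty: " + ", ".join(missing)))
            else:
                valid += 1
    return valid, problems


# Example call (path as used in this notebook):
# valid, problems = validate_jsonl("/content/data/afaan_oromoo_jsonl_20260225_bfc421.txt")
```

If `problems` comes back non-empty, the listed line numbers point at the records that would otherwise fail or be dropped during preparation.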


## 4️⃣ Prepare Dataset

In [7]:
import json
import random
from pathlib import Path

# System prompt in Afaan Oromoo
SYSTEM_PROMPT = """Ati gargaarsa tajaajila paaspoortii Itoophiyaa. Gaaffiiwwan waa'ee paaspoortii, viizaa, fi imala biyya alaa kanneen Afaan Oromootin gaafataman deebii kenna. Odeeffannoo sirrii, gargaarsa faallaa, fi akkataa jaalalaa qaba. Namoota paaspoortii barbaadan gargaaruuf qophiidha."""

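def load_data(filepath):
    """Load JSONL data, skipping blank lines and reporting malformed ones."""
    data = []
    skipped = 0
    with open(filepath, 'r', encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                data.append(json.loads(line))
            except json.JSONDecodeError:
                # Count malformed lines instead of silently dropping them
                skipped += 1
    if skipped:
        print(f"   Skipped {skipped} malformed line(s)")
    return data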

def format_sample(sample):
    """Format sample for training."""
    return {
        "text": f"""### System:
{SYSTEM_PROMPT}

### Instruction:
{sample['instruction']}

### Response:
{sample['output']}"""
    }

# Load and prepare data
print("📂 Loading dataset...")
raw_data = load_data('/content/data/afaan_oromoo_jsonl_20260225_bfc421.txt')
print(f"   Loaded {len(raw_data)} samples")

# Shuffle and split (90% train, 10% validation)
random.seed(42)
random.shuffle(raw_data)
split_idx = int(len(raw_data) * 0.9)
train_data = [format_sample(s) for s in raw_data[:split_idx]]
val_data = [format_sample(s) for s in raw_data[split_idx:]]

# Save formatted data (explicit UTF-8, because ensure_ascii=False keeps non-ASCII characters as-is)
Path('/content/data/train.jsonl').write_text('\n'.join(json.dumps(d, ensure_ascii=False) for d in train_data), encoding='utf-8')
Path('/content/data/val.jsonl').write_text('\n'.join(json.dumps(d, ensure_ascii=False) for d in val_data), encoding='utf-8')

print(f"\n✅ Dataset prepared:")
print(f"   Training: {len(train_data)} samples")
print(f"   Validation: {len(val_data)} samples")

# Show sample
print(f"\n📝 Sample training data:")
print("-" * 40)
print(train_data[0]['text'][:400] + "...")

📂 Loading dataset...
   Loaded 392 samples

✅ Dataset prepared:
   Training: 352 samples
   Validation: 40 samples

📝 Sample training data:
----------------------------------------
### System:
Ati gargaarsa tajaajila paaspoortii Itoophiyaa. Gaaffiiwwan waa'ee paaspoortii, viizaa, fi imala biyya alaa kanneen Afaan Oromootin gaafataman deebii kenna. Odeeffannoo sirrii, gargaarsa faallaa, fi akkataa jaalalaa qaba. Namoota paaspoortii barbaadan gargaaruuf qophiidha.

### Instruction:
Paaspoortii koo yeroo imala keessa jiru, ani imala koo dhaabuu hin danda'u? Maal godhu?

### Res...
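
The notebook trains on a plain `### System / ### Instruction / ### Response` layout. An alternative that is sometimes used with instruct models is to express each sample as a list of chat messages and let the tokenizer's own chat template render it (via `tokenizer.apply_chat_template`). The sketch below only builds the message list; it is not what this notebook does, and the helper name `to_chat_messages` and the placeholder sample are hypothetical.

```python
def to_chat_messages(sample, system_prompt):
    """Convert one {"instruction", "output"} record into chat-style messages."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": sample["instruction"]},
        {"role": "assistant", "content": sample["output"]},
    ]


# Placeholder record for illustration only (not taken from the dataset)
example = {"instruction": "<question text>", "output": "<answer text>"}
messages = to_chat_messages(example, "<system prompt>")
print([m["role"] for m in messages])

# With a loaded tokenizer, the messages could then be rendered to text:
# text = tokenizer.apply_chat_template(messages, tokenize=False)
```

Whichever layout is chosen, the same layout has to be used again at inference time.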


## 5️⃣ Configure Training

In [8]:
# @title Training Configuration { display-mode: "form" }

# Model selection
MODEL_NAME = "Qwen/Qwen2.5-3B-Instruct"  # @param ["Qwen/Qwen2.5-3B-Instruct", "Qwen/Qwen2.5-1.5B-Instruct", "meta-llama/Llama-3.2-3B-Instruct", "google/gemma-2-2b-it"]

# Training parameters
NUM_EPOCHS = 3  # @param {type:"slider", min:1, max:10, step:1}
BATCH_SIZE = 4  # @param {type:"slider", min:1, max:8, step:1}
LEARNING_RATE = 2e-4  # @param {type:"number"}
MAX_SEQ_LENGTH = 512  # @param {type:"slider", min:256, max:1024, step:128}

# LoRA parameters
LORA_R = 16  # @param {type:"slider", min:4, max:64, step:4}
LORA_ALPHA = 32  # @param {type:"slider", min:8, max:128, step:8}

print("=" * 50)
print("Training Configuration")
print("=" * 50)
print(f"📦 Model: {MODEL_NAME}")
print(f"⚙️  Epochs: {NUM_EPOCHS}")
print(f"📊 Batch Size: {BATCH_SIZE}")
print(f"📈 Learning Rate: {LEARNING_RATE}")
print(f"📏 Max Sequence Length: {MAX_SEQ_LENGTH}")
print(f"🔧 LoRA Rank: {LORA_R}, Alpha: {LORA_ALPHA}")

Training Configuration
📦 Model: Qwen/Qwen2.5-3B-Instruct
⚙️  Epochs: 3
📊 Batch Size: 4
📈 Learning Rate: 0.0002
📏 Max Sequence Length: 512
🔧 LoRA Rank: 16, Alpha: 32


## 6️⃣ Load Model & Tokenizer

In [9]:
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training, TaskType

print("📦 Loading model with 4-bit quantization...")

# 4-bit quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

# Load model
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

# Prepare for training
model = prepare_model_for_kbit_training(model)

# LoRA config
lora_config = LoraConfig(
    r=LORA_R,
    lora_alpha=LORA_ALPHA,
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.CAUSAL_LM,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
)

model = get_peft_model(model, lora_config)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

# Print trainable parameters
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"\n✅ Model loaded!")
print(f"   Trainable: {trainable:,} / {total:,} ({100*trainable/total:.2f}%)")

📦 Loading model with 4-bit quantization...

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.

✅ Model loaded!
   Trainable: 29,933,568 / 1,728,606,208 (1.73%)
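
The trainable-parameter count can be cross-checked by hand. A LoRA adapter of rank `r` on a linear layer with `in_features` inputs and `out_features` outputs adds `r * (in_features + out_features)` parameters. The sketch below applies that to the seven target modules, assuming the layer dimensions published in the Qwen2.5-3B-Instruct model config (hidden size 2048, intermediate size 11008, 36 layers, 16 attention heads, 2 key/value heads). Those dimensions are an assumption here and are not printed anywhere in this notebook.

```python
# Assumed Qwen2.5-3B-Instruct dimensions (taken from the published model config)
HIDDEN = 2048
INTERMEDIATE = 11008
NUM_LAYERS = 36
NUM_HEADS = 16
NUM_KV_HEADS = 2
HEAD_DIM = HIDDEN // NUM_HEADS        # 128
KV_DIM = NUM_KV_HEADS * HEAD_DIM      # 256
LORA_R = 16                           # same rank as in the configuration cell

# (in_features, out_features) for each LoRA target module in one decoder layer
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, r):
    """Parameters added by one LoRA adapter: A is (r x in), B is (out x r)."""
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, LORA_R) for i, o in shapes.values())
total_trainable = per_layer * NUM_LAYERS
adapter_tensors = len(shapes) * 2 * NUM_LAYERS  # one A and one B matrix per module

print(per_layer)        # 831488
print(total_trainable)  # 29933568
print(adapter_tensors)  # 504
```

Under these assumptions the result matches the 29,933,568 trainable parameters reported above, and the 504 adapter tensors are consistent with the 504 bfloat16 parameters that the training cell reports later.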


## 7️⃣ Start Training

In [18]:
# Debug + auto-fix training block for the bfloat16 "_amp_foreach_non_finite_check_and_unscale_cuda" error
import os
import torch
import traceback
from transformers import TrainingArguments
from trl import SFTTrainer

print("="*60)
print("DEBUG: torch / cuda / env info")
print("="*60)
print("torch:", torch.__version__)
print("torch.cuda.is_available():", torch.cuda.is_available())
if torch.cuda.is_available():
    try:
        print("CUDA version:", torch.version.cuda)
        print("Device name:", torch.cuda.get_device_name(0))
        props = torch.cuda.get_device_properties(0)
        print(f"Total memory (GB): {props.total_memory / 1e9:.2f}")
    except Exception as e:
        print("Could not query CUDA device properties:", e)

# Check environment variables that may enable bf16
def _is_bf16_var(name):
    upper = name.upper()
    # "BF16" also matches TORCH_BF16, ACCELERATE_BF16 and XLA_USE_BF16
    return "BF16" in upper or "BFLOAT" in upper

bf16_env_vars = {k: v for k, v in os.environ.items() if _is_bf16_var(k)}
print("\nPotential BF16-related env vars (will unset them for safety):")
for k, v in bf16_env_vars.items():
    print(f"  {k}={v}")

# Unset the matching env vars defensively
for k in bf16_env_vars:
    os.environ.pop(k, None)

print("\nUnset BF16 related env vars. Continuing...\n")

# Summarize parameter dtypes
dtype_counts = {}
bfloat16_params = []
for n, p in model.named_parameters():
    d = p.dtype
    dtype_counts[d] = dtype_counts.get(d, 0) + 1
    if d == torch.bfloat16:
        bfloat16_params.append((n, p))

print("Parameter dtype counts:", dtype_counts)
print(f"Found {len(bfloat16_params)} parameters with dtype bfloat16.")

# Try to convert bfloat16 params to float16 (safe attempt)
converted = 0
failed_conversions = []
if bfloat16_params:
    print("\nAttempting to convert bfloat16 parameters -> float16 (in-place) ...")
    for n, p in bfloat16_params:
        try:
            # do this in a try/except because quantized / 4-bit params may not support .to()
            with torch.no_grad():
                p.data = p.data.to(torch.float16)
            converted += 1
        except Exception as e:
            failed_conversions.append((n, repr(e)))
    print(f"Converted {converted}/{len(bfloat16_params)} bfloat16 params to float16.")
    if failed_conversions:
        print("Failed conversions (name, error):")
        for n, err in failed_conversions[:5]:
            print(" ", n, err)
else:
    print("No bfloat16 parameters detected; good.")

# As an extra safety, set default dtype to float32 (avoid accidental bfloat promotions)
torch.set_default_dtype(torch.float32)

# Force environment flags that disable bf16 usage
os.environ.pop("TORCH_BF16", None)
os.environ.pop("ACCELERATE_BF16", None)
os.environ.pop("XLA_USE_BF16", None)

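print("\nBuilding TrainingArguments with bf16=False and fp16=True (if CUDA available)...")
fp16_flag = torch.cuda.is_available()
# Note: MAX_SEQ_LENGTH from the configuration cell is not passed anywhere below.
# TrainingArguments has no sequence-length field, so the trainer's default truncation applies.
training_args_kwargs = dict(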
    output_dir="/content/results",
    num_train_epochs=NUM_EPOCHS,
    per_device_train_batch_size=BATCH_SIZE,
    per_device_eval_batch_size=BATCH_SIZE,
    gradient_accumulation_steps=2,
    warmup_steps=10,
    logging_steps=5,
    eval_steps=20,
    save_strategy="epoch",
    learning_rate=LEARNING_RATE,
    fp16=fp16_flag,
    bf16=False,   # explicit
    push_to_hub=False,
    report_to="none",
    save_total_limit=3,
)

# compatibility for eval_strategy vs evaluation_strategy
try:
    training_args = TrainingArguments(**{**training_args_kwargs, "eval_strategy": "steps"})
except TypeError:
    training_args = TrainingArguments(**{**training_args_kwargs, "evaluation_strategy": "steps"})

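# Build Hugging Face datasets from the formatted samples prepared in the dataset cell.
# (train_dataset / val_dataset are not defined in any of the cells shown in this notebook.)
from datasets import Dataset

train_dataset = Dataset.from_list(train_data)
val_dataset = Dataset.from_list(val_data)

# Formatting function: each record already carries the full prompt in "text"
def formatting_func(example: dict) -> str:
    return example["text"]

trainer_kwargs = dict(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    processing_class=tokenizer,
    formatting_func=formatting_func,
)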

print("\nTrainer kwargs prepared. Attempting to instantiate SFTTrainer...")
trainer = SFTTrainer(**trainer_kwargs)
print("SFTTrainer instantiated successfully.")

# Attempt training; if the same NotImplementedError occurs, retry once with fp16=False
try:
    print("\nStarting trainer.train() (attempt 1) ...")
    trainer.train()
    print("\nTraining completed (attempt 1).")
except Exception as e:
    tb = traceback.format_exc()
    print("\nTrainer.train() raised an exception:\n", tb)
    # Detect the specific amp foreach NotImplementedError pattern:
    if isinstance(e, NotImplementedError) or "amp_foreach_non_finite_check_and_unscale" in tb or "BFloat16" in tb or "bfloat16" in tb:
        print("\nDetected NotImplementedError related to bfloat16 AMP unscale. Retrying with mixed precision disabled (fp16=False)...")
        # rebuild args with fp16 disabled
        training_args_kwargs["fp16"] = False
        training_args_kwargs["bf16"] = False
        try:
            try:
                new_training_args = TrainingArguments(**{**training_args_kwargs, "eval_strategy": "steps"})
            except TypeError:
                new_training_args = TrainingArguments(**{**training_args_kwargs, "evaluation_strategy": "steps"})
            trainer = SFTTrainer(
                model=model,
                args=new_training_args,
                train_dataset=train_dataset,
                eval_dataset=val_dataset,
                processing_class=tokenizer,
                formatting_func=formatting_func,
            )
            print("Rebuilt SFTTrainer with fp16=False. Starting trainer.train() (attempt 2)...")
            trainer.train()
            print("\nTraining completed (attempt 2, fp16 disabled).")
        except Exception as e2:
            print("\nRetry also failed. Exception:\n", traceback.format_exc())
            print("\nAt this point options are:\n"
                  "  - restart the Colab runtime and run from the top (clears accel / env state),\n"
                  "  - upgrade/downgrade PyTorch to a compatible cuda build, or\n"
                  "  - try training with a different GPU (A100) if available.\n")
    else:
        print("\nTrainer failed with an error not recognized as the bf16 AMP unscale issue. Full traceback above.")

DEBUG: torch / cuda / env info
torch: 2.10.0+cu128
torch.cuda.is_available(): True
CUDA version: 12.8
Device name: Tesla T4
Total memory (GB): 15.64

Potential BF16-related env vars (will unset them for safety):

Unset BF16 related env vars. Continuing...

Parameter dtype counts: {torch.float32: 182, torch.uint8: 252, torch.bfloat16: 504}
Found 504 parameters with dtype bfloat16.

Attempting to convert bfloat16 parameters -> float16 (in-place) ...
Converted 504/504 bfloat16 params to float16.

Building TrainingArguments with bf16=False and fp16=True (if CUDA available)...

Trainer kwargs prepared. Attempting to instantiate SFTTrainer...


SFTTrainer instantiated successfully.

Starting trainer.train() (attempt 1) ...





Trainer.train() raised an exception:
 Traceback (most recent call last):
  File "/tmp/ipython-input-1740439160.py", line 122, in <cell line: 0>
    trainer.train()
  File "/usr/local/lib/python3.12/dist-packages/transformers/trainer.py", line 1412, in train
    return inner_training_loop(
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/trainer.py", line 1775, in _inner_training_loop
    _grad_norm = self.accelerator.clip_grad_norm_(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/accelerate/accelerator.py", line 3008, in clip_grad_norm_
    self.unscale_gradients()
  File "/usr/local/lib/python3.12/dist-packages/accelerate/accelerator.py", line 2946, in unscale_gradients
    self.scaler.unscale_(opt)
  File "/usr/local/lib/python3.12/dist-packages/torch/amp/grad_scaler.py", line 343, in unscale_
    optimizer_state["found_inf_per_device"] = self._unscale_grads_(
                            



| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 20   | 1.267862      | 1.095082        |
| 40   | 0.666058      | 0.653177        |
| 60   | 0.532921      | 0.546581        |
| 80   | 0.360809      | 0.50496         |
| 100  | 0.326267      | 0.486491        |
| 120  | 0.41312       | 0.468881        |





Training completed (attempt 2, fp16 disabled).
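
The step numbers in the loss table follow from the dataset size and batch settings. As a rough sketch, assuming a single GPU and that the last partial batch is kept, the number of optimizer steps can be worked out from the 352 training samples, `BATCH_SIZE = 4`, `gradient_accumulation_steps=2`, and `NUM_EPOCHS = 3`. The helper name `optimizer_steps` is hypothetical.

```python
import math


def optimizer_steps(num_samples, batch_size, grad_accum, epochs):
    """Return (steps_per_epoch, total_steps) for a single-device run."""
    batches_per_epoch = math.ceil(num_samples / batch_size)
    steps_per_epoch = math.ceil(batches_per_epoch / grad_accum)
    return steps_per_epoch, steps_per_epoch * epochs


per_epoch, total = optimizer_steps(num_samples=352, batch_size=4, grad_accum=2, epochs=3)
print(per_epoch, total)  # 44 132
```

With `eval_steps=20`, evaluations at steps 20 through 120 fit inside 132 total steps, which is consistent with the six rows logged above. Each optimizer step sees an effective batch of 4 × 2 = 8 samples.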


## 8️⃣ Save Model

In [19]:
# Save the fine-tuned model
OUTPUT_DIR = "/content/oromo_chatbot_final"

print(f"💾 Saving model to {OUTPUT_DIR}...")

trainer.model.save_pretrained(OUTPUT_DIR)
tokenizer.save_pretrained(OUTPUT_DIR)

# Save config
config = {
    "model_name": MODEL_NAME,
    "epochs": NUM_EPOCHS,
    "batch_size": BATCH_SIZE,
    "learning_rate": LEARNING_RATE,
    "lora_r": LORA_R,
    "lora_alpha": LORA_ALPHA,
    "system_prompt": SYSTEM_PROMPT,
    "language": "Afaan Oromoo",
    "domain": "Ethiopian Passport Services"
}

with open(f"{OUTPUT_DIR}/training_config.json", "w") as f:
    json.dump(config, f, indent=2)

print("✅ Model saved successfully!")

💾 Saving model to /content/oromo_chatbot_final...
✅ Model saved successfully!
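
Because `training_config.json` stores the system prompt, a deployment script can rebuild the exact prompt layout used in training without copying the prompt text by hand. The sketch below reads the saved file and returns a prompt builder; `load_prompt_builder` is a hypothetical helper, and it assumes the same `### System / ### Instruction / ### Response` layout as the rest of this notebook.

```python
import json
from pathlib import Path


def load_prompt_builder(model_dir):
    """Read training_config.json from model_dir and return a prompt-building function."""
    config_path = Path(model_dir) / "training_config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    system_prompt = config["system_prompt"]

    def build_prompt(instruction):
        # Same layout as the training samples, ending right after "### Response:"
        return (
            f"### System:\n{system_prompt}\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Response:\n"
        )

    return build_prompt


# Example call (directory as used in this notebook):
# build_prompt = load_prompt_builder("/content/oromo_chatbot_final")
# prompt = build_prompt("Paaspoortii maali?")
```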


## 9️⃣ Test the Model

In [22]:
def generate_response(instruction, max_new_tokens=250, temperature=0.4,
                      do_sample=True, repetition_penalty=1.1):
    """
    Generate a response from the fine-tuned model.
    - max_new_tokens: upper bound on the length of the generated answer
    - temperature: lower values (0.4) keep sampling focused
    - do_sample: True to enable sampling, False for greedy decoding
    - repetition_penalty: 1.1 to gently discourage repeats
    """
    prompt = f"""### System:
{SYSTEM_PROMPT}

### Instruction:
{instruction}

### Response:
"""

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    # Pass sampling parameters only when sampling is enabled
    gen_kwargs = dict(
        max_new_tokens=max_new_tokens,
        do_sample=do_sample,
        repetition_penalty=repetition_penalty,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
    )
    if do_sample:
        gen_kwargs.update(temperature=temperature, top_p=0.9, top_k=50)

    # Switch off dropout (including LoRA dropout) before generating
    model.eval()
    with torch.no_grad():
        outputs = model.generate(**inputs, **gen_kwargs)

    # Decode only the newly generated tokens, so the prompt does not need to be split off
    prompt_length = inputs["input_ids"].shape[1]
    response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
    return response.strip()

In [23]:
test_questions = [
    "Paaspoortii maali?",
    "Paaspoortii argachuuf maal barbaachisa?",
    "Finfinne keessatti ofiisii paaspoortii eessa jira?",
    "Baasii paaspoortii meeqa?",
    "Paaspoortii daa'imaaf akkamitti argadha?",
]

for q in test_questions:
    print(f"❓ {q}")
    print(f"💬 {generate_response(q)}\n")

❓ Paaspoortii maali?
💬 Eeyyeen, ofiisii paaspoortiitti dhaqxee ragaa guutuu (ragaa 10-20) fiduun baay'ee paasportii maali.

❓ Paaspoortii argachuuf maal barbaachisa?
💬 Eeyyeen, ofiisii paaspoortiitti dhaqxee ragaa maaluun bilbiluun (birth certificate) fiduu qabda.

❓ Finfinne keessatti ofiisii paaspoortii eessa jira?
💬 Eeyyeen, finfinne keessatti ofiisii paaspoortii irratti eessa jiru. Ofiisii paaspoortii waraqaa ragaa guutuu (ragaa 5) hin dandeessuun.

❓ Baasii paaspoortii meeqa?
💬 Eeyyeen, baasii paasportii kee yeroo hin jiru, ofiisii paaspoortiitti dhaqxee ragaa lubbuun bilbiluun bilbilla.

❓ Paaspoortii daa'imaaf akkamitti argadha?
💬 Eeyyeen, daa'imi ykn haaraa (12-35) ofiisii paaspoortiitti dhaqxee isaan siif kennu.



In [33]:
# List of instructions to test
test_instructions = [
    "Paaspoortii koo argadhe, garuu irraa fuulli koo hin mul'anne. Maal godhu?",
    "Paaspoortii koo yeroo imala keessa jiru, ani imala koo dhaabuu hin danda'u? Maal godhu?"
]

# Run inference
for inst in test_instructions:
    print("❓", inst)
    response = generate_response(inst, max_new_tokens=200, temperature=0.4, do_sample=True)
    print("💬", response)
    print("-" * 50)

❓ Paaspoortii koo argadhe, garuu irraa fuulli koo hin mul'anne. Maal godhu?
💬 Eeyyeen, ofiisii paaspoortiitti dhaqxee guuti kee irratti jijjiirsiisi.
--------------------------------------------------
❓ Paaspoortii koo yeroo imala keessa jiru, ani imala koo dhaabuu hin danda'u? Maal godhu?
💬 Imbassii dhaqxee gargaarsa gaafadhu. Isaan si gargaaru.
--------------------------------------------------
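
One quick way to spot repetitive output is to count how often responses open with the same word. The sketch below does this for the seven responses printed in the two test cells above. `opening_word_share` is a hypothetical helper, and a high share is only a hint that the answers may be collapsing onto one template, not a quality metric.

```python
from collections import Counter


def opening_word_share(responses):
    """Return (most_common_first_word, count, share) over non-empty responses."""
    first_words = [r.split()[0].strip(",.!?") for r in responses if r.strip()]
    word, count = Counter(first_words).most_common(1)[0]
    return word, count, count / len(first_words)


# Responses copied from the test outputs above
responses = [
    "Eeyyeen, ofiisii paaspoortiitti dhaqxee ragaa guutuu (ragaa 10-20) fiduun baay'ee paasportii maali.",
    "Eeyyeen, ofiisii paaspoortiitti dhaqxee ragaa maaluun bilbiluun (birth certificate) fiduu qabda.",
    "Eeyyeen, finfinne keessatti ofiisii paaspoortii irratti eessa jiru. Ofiisii paaspoortii waraqaa ragaa guutuu (ragaa 5) hin dandeessuun.",
    "Eeyyeen, baasii paasportii kee yeroo hin jiru, ofiisii paaspoortiitti dhaqxee ragaa lubbuun bilbiluun bilbilla.",
    "Eeyyeen, daa'imi ykn haaraa (12-35) ofiisii paaspoortiitti dhaqxee isaan siif kennu.",
    "Eeyyeen, ofiisii paaspoortiitti dhaqxee guuti kee irratti jijjiirsiisi.",
    "Imbassii dhaqxee gargaarsa gaafadhu. Isaan si gargaaru.",
]

word, count, share = opening_word_share(responses)
print(f"{word!r} opens {count} of {len(responses)} responses ({share:.0%})")
# 'Eeyyeen' opens 6 of 7 responses (86%)
```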


## 🔟 Interactive Chat

In [41]:
# @title Chat with the Bot { display-mode: "form" }
your_question = "Paaspoortii koo yeroo imala keessa jiru, ani imala koo deebi'uu hin danda'u? Maal godhu?"  # @param {type:"string"}

if your_question:
    print(f"\n👤 You: {your_question}")
    response = generate_response(your_question, max_new_tokens=200)
    print(f"\n🤖 Bot: {response}")


👤 You: Paaspoortii koo yeroo imala keessa jiru, ani imala koo deebi'uu hin danda'u? Maal godhu?

🤖 Bot: Imbassii dhaqxee gargaarsa gaafadhu. Isaan si gargaaru.


## 📥 Download Model

In [None]:
import shutil
from google.colab import files

# Create a zip file
print("📦 Creating zip file...")
shutil.make_archive('/content/oromo_chatbot_model', 'zip', OUTPUT_DIR)

# Download
print("📥 Downloading model...")
files.download('/content/oromo_chatbot_model.zip')

print("\n✅ Model downloaded! You can use this for deployment.")

## 📋 (Optional) Save to Google Drive

In [None]:
# @title Save to Google Drive { display-mode: "form" }
save_to_drive = False  # @param {type:"boolean"}

if save_to_drive:
    # Imported here so this cell also works when the download cell was skipped
    import os
    import shutil
    from google.colab import drive

    print("📂 Mounting Google Drive...")
    drive.mount('/content/drive')

    # Create directory
    save_path = '/content/drive/MyDrive/oromo_chatbot_model'
    os.makedirs(save_path, exist_ok=True)

    # Copy model
    shutil.copytree(OUTPUT_DIR, save_path, dirs_exist_ok=True)

    print(f"✅ Model saved to Google Drive: {save_path}")

---

## 🎉 Training Complete!

Your Afaan Oromoo passport chatbot model is now fine-tuned and saved.

### Next Steps:
1. **Test more questions** using the interactive chat cell
2. **Download the model** for local deployment
3. **Save to Google Drive** for persistent storage

### Deployment Options:
- **Ollama**: Convert to GGUF format for local use
- **vLLM**: High-performance inference server
- **Hugging Face**: Upload to the model hub
- **FastAPI**: Create a REST API endpoint