# BestVerie Bank — Retail Banking Support Assistant (QLoRA + LoRA)

This notebook fine-tunes a small Hugging Face LLM to follow **BestVerie Bank** retail-banking policies in Rwanda.

- **Base model:** `TinyLlama/TinyLlama-1.1B-Chat-v1.0`  
- **Technique:** **QLoRA**: **LoRA adapters** (PEFT) trained on top of a 4-bit quantized base model

Goal: compare the **base** and **fine-tuned** models on policy questions (interest tiers, fees, limits) and show the improvement.


In [1]:
!pip -q install -U "transformers>=4.41.0" "datasets>=2.18.0" "accelerate>=0.30.0" "peft>=0.11.0" "bitsandbytes>=0.43.0" evaluate rouge_score gradio


In [2]:
import torch
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))


CUDA available: True
GPU: Tesla T4


# Defining domain + policy (BestVerie Bank)

Since the task is to build a domain-specific assistant, I chose to build a finance assistant. It uses a **fictional** bank policy aligned with the Rwandan context (for example, ~11% interest in one tier).  
This avoids needing real internal bank documents while still demonstrating domain adaptation.


In [3]:
# BestVerie bank Policy + instruction
POLICY = {
  "savings_interest_tiers": [
    {"min": 50000, "max": 499999, "rate_pa": 0.08},
    {"min": 500000, "max": 4999999, "rate_pa": 0.11},
    {"min": 5000000, "max": None, "rate_pa": 0.13},
  ],
  "savings_min_balance_for_interest": 50000,
  "savings_monthly_maintenance_fee_if_below_min": 1000,
  "external_transfer_fee": {"rate": 0.005, "min_fee": 500, "max_fee": 5000},
  "atm": {"free_withdrawals_per_month": 3, "fee_after_free": 500, "daily_withdrawal_limit": 200000},
  "momo_cashout_fee_cap": 3000
}

INSTRUCTION = (
    "You are a retail banking support assistant for BestVerie Bank in Rwanda. "
    "Answer strictly using BestVerie Bank policy in this notebook. "
    "Be concise. If the question is outside retail banking support or not covered by policy, say you can’t help."
)

POLICY


{'savings_interest_tiers': [{'min': 50000, 'max': 499999, 'rate_pa': 0.08},
  {'min': 500000, 'max': 4999999, 'rate_pa': 0.11},
  {'min': 5000000, 'max': None, 'rate_pa': 0.13}],
 'savings_min_balance_for_interest': 50000,
 'savings_monthly_maintenance_fee_if_below_min': 1000,
 'external_transfer_fee': {'rate': 0.005, 'min_fee': 500, 'max_fee': 5000},
 'atm': {'free_withdrawals_per_month': 3,
  'fee_after_free': 500,
  'daily_withdrawal_limit': 200000},
 'momo_cashout_fee_cap': 3000}

# Building dataset (instruction → response pairs)

I generate **~2,000 examples** with:
- Randomized amounts and question phrasing
- **Boundary/anchor** examples to prevent rule-mixing, i.e. applying one tier's or fee's rule to another (the main way the model goes wrong)
- Out-of-domain examples that teach the model to refuse unrelated questions


In [4]:
# Helpers
import random, json, os, math, re
from typing import Dict, List

def tier_for_amount(amount: int):
    for t in POLICY["savings_interest_tiers"]:
        if amount >= t["min"] and (t["max"] is None or amount <= t["max"]):
            return t
    return None

def fmt_rwf(x:int)->str:
    return f"{x:,} RWF"

def calc_external_fee(amount:int)->Dict[str,int]:
    rate = POLICY["external_transfer_fee"]["rate"]
    raw = int(round(amount * rate))
    fee = max(raw, POLICY["external_transfer_fee"]["min_fee"])
    fee = min(fee, POLICY["external_transfer_fee"]["max_fee"])
    return {"raw": raw, "fee": fee}

def make_example(q:str, a:str)->Dict[str,str]:
    return {"instruction": INSTRUCTION, "input": q, "output": a}

def refusal_answer(topic="that request"):
    return f"Sorry, I can’t help with {topic}. I can answer questions about BestVerie Bank retail banking policies (savings, transfers, ATM, fees)."
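The fee rule in `calc_external_fee` is a percentage clamped between a floor and a cap. A minimal standalone sketch, with the rate and bounds copied from `POLICY` above (the function name `clamp_fee` is made up for illustration):

```python
def clamp_fee(amount: int, rate: float = 0.005, min_fee: int = 500, max_fee: int = 5000) -> int:
    """Percentage fee clamped to [min_fee, max_fee]; defaults mirror POLICY above."""
    raw = int(round(amount * rate))
    return min(max(raw, min_fee), max_fee)

for amount in (50_000, 400_000, 2_000_000):
    print(f"{amount:,} RWF -> fee {clamp_fee(amount):,} RWF")
```

With these values the floor applies below 100,000 RWF and the cap above 1,000,000 RWF, which is why the anchor examples state the raw 0.5% figure before the fee actually charged.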


In [5]:
# Generating dataset (2000+ examples)
random.seed(42)
examples: List[Dict[str,str]] = []

# Anchor/boundary examples (oversampled 25x)
anchors = [
    make_example("I have 49,999 RWF in savings. Do I earn interest?",
                 "No. Balances below 50,000 RWF do not earn interest and incur a 1,000 RWF monthly maintenance fee."),
    make_example("I have 50,000 RWF in savings. What interest rate applies?",
                 "50,000 RWF falls in the 50,000–499,999 RWF tier and earns 8% per annum (compounded monthly)."),
    make_example("I have 499,999 RWF in savings. What interest rate applies?",
                 "499,999 RWF falls in the 50,000–499,999 RWF tier and earns 8% per annum (compounded monthly)."),
    make_example("I have 500,000 RWF in savings. What interest rate applies?",
                 "500,000 RWF falls in the 500,000–4,999,999 RWF tier and earns 11% per annum (compounded monthly)."),
    make_example("I have 4,999,999 RWF in savings. What interest rate applies?",
                 "4,999,999 RWF falls in the 500,000–4,999,999 RWF tier and earns 11% per annum (compounded monthly)."),
    make_example("I have 5,000,000 RWF in savings. What interest rate applies?",
                 "5,000,000 RWF qualifies for the 5,000,000 RWF and above tier and earns 13% per annum (compounded monthly)."),

    make_example("How many free ATM withdrawals do I get per month?",
                 "Customers receive 3 free ATM withdrawals per month."),
    make_example("What happens after the 3 free ATM withdrawals are used?",
                 "After 3 free ATM withdrawals in a month, each additional ATM withdrawal costs 500 RWF."),
    make_example("Is there a maximum amount for a free ATM withdrawal?",
                 "No. Free ATM withdrawals are defined by count (3 per month), not by a free withdrawal amount. The daily ATM withdrawal limit is 200,000 RWF."),

    make_example("What is the maximum mobile money cash-out fee?",
                 "The mobile money cash-out fee is capped at a maximum of 3,000 RWF."),

    make_example("What is the external transfer fee for 2,000,000 RWF?",
                 "The external transfer fee is 0.5% with a minimum of 500 RWF and a maximum of 5,000 RWF. 0.5% of 2,000,000 RWF is 10,000 RWF, so the fee charged is 5,000 RWF."),
    make_example("What is the external transfer fee for 50,000 RWF?",
                 "The external transfer fee is 0.5% with a minimum of 500 RWF. 0.5% of 50,000 RWF is 250 RWF, so the fee charged is 500 RWF."),
]
examples.extend(anchors * 25)

# Random savings tier questions
savings_templates = [
    "I have {amt} in savings. What interest rate applies?",
    "What annual interest do I get on a savings balance of {amt}?",
    "My savings balance is {amt}. Which tier and rate am I in?"
]
for _ in range(900):
    amt = random.choice([
        random.randint(1000, 60000),
        random.randint(60000, 400000),
        random.randint(400000, 900000),
        random.randint(900000, 3000000),
        random.randint(3000000, 8000000),
    ])
    q = random.choice(savings_templates).format(amt=fmt_rwf(amt))
    if amt < POLICY["savings_min_balance_for_interest"]:
        a = f"No. Balances below {fmt_rwf(POLICY['savings_min_balance_for_interest'])} do not earn interest and incur a {fmt_rwf(POLICY['savings_monthly_maintenance_fee_if_below_min'])} monthly maintenance fee."
    else:
        t = tier_for_amount(amt)
        tier_txt = f"{fmt_rwf(t['min'])}–{fmt_rwf(t['max'])}" if t['max'] is not None else f"{fmt_rwf(t['min'])} and above"
        a = f"{fmt_rwf(amt)} falls in the {tier_txt} tier and earns {int(t['rate_pa']*100)}% per annum (compounded monthly)."
    examples.append(make_example(q, a))

# Random external transfer fee questions
transfer_templates = [
    "What is the external transfer fee for {amt}?",
    "If I transfer {amt} to another bank, what fee is charged?",
    "How much do you charge to send {amt} externally?"
]
for _ in range(700):
    amt = random.choice([
        random.randint(1000, 200000),
        random.randint(200000, 1500000),
        random.randint(1500000, 7000000),
    ])
    q = random.choice(transfer_templates).format(amt=fmt_rwf(amt))
    fee = calc_external_fee(amt)
    a = (f"The external transfer fee is 0.5% of the amount, with a minimum of {fmt_rwf(POLICY['external_transfer_fee']['min_fee'])} "
         f"and a maximum of {fmt_rwf(POLICY['external_transfer_fee']['max_fee'])}. "
         f"0.5% of {fmt_rwf(amt)} is {fmt_rwf(fee['raw'])}, so the fee charged is {fmt_rwf(fee['fee'])}.")
    examples.append(make_example(q, a))

# ATM questions
atm_templates = [
    "How many free ATM withdrawals do I get?",
    "What is the ATM withdrawal policy?",
    "What is the daily ATM withdrawal limit?",
    "How much do I pay after free ATM withdrawals?"
]
for _ in range(250):
    t = random.choice(atm_templates)
    if "daily" in t:
        q = t
        a = f"The daily ATM withdrawal limit is {fmt_rwf(POLICY['atm']['daily_withdrawal_limit'])}."
    elif "after" in t or "pay" in t:
        q = t
        a = f"You get {POLICY['atm']['free_withdrawals_per_month']} free ATM withdrawals per month. After that, each additional withdrawal costs {fmt_rwf(POLICY['atm']['fee_after_free'])}."
    else:
        q = t
        a = f"Customers receive {POLICY['atm']['free_withdrawals_per_month']} free ATM withdrawals per month."
    examples.append(make_example(q, a))

#  Out-of-domain refusals
ood_questions = [
    "Should I invest in cryptocurrency?",
    "Write me a poem about money.",
    "How do I hack a bank account?",
    "Who will win the football match tonight?",
    "Can you help me mine bitcoin?",
]
for _ in range(200):
    q = random.choice(ood_questions)
    examples.append(make_example(q, refusal_answer()))

random.shuffle(examples)
print("Total examples:", len(examples))
print("Sample:", examples[0])


Total examples: 2350
Sample: {'instruction': 'You are a retail banking support assistant for BestVerie Bank in Rwanda. Answer strictly using BestVerie Bank policy in this notebook. Be concise. If the question is outside retail banking support or not covered by policy, say you can’t help.', 'input': 'My savings balance is 401,151 RWF. Which tier and rate am I in?', 'output': '401,151 RWF falls in the 50,000 RWF–499,999 RWF tier and earns 8% per annum (compounded monthly).'}


In [6]:
#Save train/val JSONL
os.makedirs("data", exist_ok=True)

n = len(examples)
train_n = int(n * 0.88)
train_data = examples[:train_n]
val_data = examples[train_n:]

def write_jsonl(path, rows):
    with open(path, "w", encoding="utf-8") as f:
        for r in rows:
            f.write(json.dumps(r, ensure_ascii=False) + "\n")

write_jsonl("data/train.jsonl", train_data)
write_jsonl("data/val.jsonl", val_data)

print("Train:", len(train_data), "Val:", len(val_data))


Train: 2068 Val: 282
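Because the anchor examples are repeated 25 times and the ATM and refusal questions come from a handful of templates, a shuffled split is likely to place identical question/answer pairs in both files. A rough check, shown on toy rows rather than the real `train_data` and `val_data` (the helper `overlap_fraction` is hypothetical):

```python
from typing import Dict, List

def overlap_fraction(train: List[Dict[str, str]], val: List[Dict[str, str]]) -> float:
    """Share of validation rows whose (input, output) pair also appears in train."""
    seen = {(r["input"], r["output"]) for r in train}
    if not val:
        return 0.0
    hits = sum((r["input"], r["output"]) in seen for r in val)
    return hits / len(val)

toy_train = [
    {"input": "How many free ATM withdrawals do I get?", "output": "3 per month."},
    {"input": "What is the daily ATM withdrawal limit?", "output": "200,000 RWF."},
]
toy_val = [
    {"input": "How many free ATM withdrawals do I get?", "output": "3 per month."},
    {"input": "What is the external transfer fee for 50,000 RWF?", "output": "500 RWF."},
]
print(overlap_fraction(toy_train, toy_val))
```

A high value on the real split would suggest that the validation loss partly measures memorization; deduplicating before splitting is one way to tighten it.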


In [7]:
# Load + format
from datasets import load_dataset
ds = load_dataset("json", data_files={"train":"data/train.jsonl","validation":"data/val.jsonl"})

def build_text(ex):
    return (
        "### System:\n" + ex["instruction"].strip() + "\n\n"
        "### User:\n" + ex["input"].strip() + "\n\n"
        "### Assistant:\n" + ex["output"].strip()
    )

ds = ds.map(lambda ex: {"text": build_text(ex)})
print(ds["train"][0]["text"][:400])



### System:
You are a retail banking support assistant for BestVerie Bank in Rwanda. Answer strictly using BestVerie Bank policy in this notebook. Be concise. If the question is outside retail banking support or not covered by policy, say you can’t help.

### User:
My savings balance is 401,151 RWF. Which tier and rate am I in?

### Assistant:
401,151 RWF falls in the 50,000 RWF–499,999 RWF tier a
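The training text and the inference prompt need to share the same `### System / ### User / ### Assistant` layout, otherwise the fine-tuned model sees an unfamiliar format at generation time. A small sketch of one builder serving both cases (the name `build_prompt` and the short system message are illustrative only):

```python
from typing import Optional

def build_prompt(system: str, user: str, assistant: Optional[str] = None) -> str:
    """Training text when `assistant` is given; open-ended inference prompt otherwise."""
    prompt = (
        "### System:\n" + system.strip() + "\n\n"
        "### User:\n" + user.strip() + "\n\n"
        "### Assistant:\n"
    )
    return prompt if assistant is None else prompt + assistant.strip()

train_text = build_prompt("Be concise.", "What is the daily ATM limit?", "200,000 RWF.")
infer_text = build_prompt("Be concise.", "What is the daily ATM limit?")
print(train_text.startswith(infer_text))
```

Keeping a single builder means a change to the template cannot drift between training and inference.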


In [8]:
# Tokenize
from transformers import AutoTokenizer
model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

MAX_LEN = 256

def tokenize(batch):
    out = tokenizer(batch["text"], truncation=True, max_length=MAX_LEN, padding="max_length")
    out["labels"] = [ [(t if t != tokenizer.pad_token_id else -100) for t in seq] for seq in out["input_ids"] ]
    return out

tok = ds.map(tokenize, batched=True, remove_columns=ds["train"].column_names)
tok



DatasetDict({
    train: Dataset({
        features: ['input_ids', 'attention_mask', 'labels'],
        num_rows: 2068
    })
    validation: Dataset({
        features: ['input_ids', 'attention_mask', 'labels'],
        num_rows: 282
    })
})
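The tokenize step masks labels by comparing token ids with `pad_token_id`. When the pad token falls back to the EOS token, that comparison also hides any genuine end-of-sequence token from the loss. One alternative, assumed here rather than taken from the notebook, is to mask by attention mask instead (toy ids, with 2 standing in for a shared pad/EOS id):

```python
from typing import List

IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default

def mask_labels(input_ids: List[int], attention_mask: List[int]) -> List[int]:
    """Keep labels for attended positions; ignore padded ones."""
    return [tok if keep == 1 else IGNORE_INDEX
            for tok, keep in zip(input_ids, attention_mask)]

# Toy sequence: three content tokens, one real EOS (id 2), then two pads (also id 2).
ids = [101, 102, 103, 2, 2, 2]
mask = [1, 1, 1, 1, 0, 0]
print(mask_labels(ids, mask))
```

Here the real EOS at position 3 keeps its label while the two padded positions are ignored, even though all three share the same id.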

In [9]:
# 3 quick experiments
# Load model in 4-bit + attach LoRA
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True
)

def load_lora_model(lora_r:int):
    base = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=bnb_config,
        device_map="auto"
    )
    base = prepare_model_for_kbit_training(base)

    lora = LoraConfig(
        r=lora_r,
        lora_alpha=2*lora_r,
        lora_dropout=0.05,
        bias="none",
        task_type="CAUSAL_LM",
        target_modules=["q_proj","k_proj","v_proj","o_proj"]
    )
    model = get_peft_model(base, lora)
    return model
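Each adapted linear layer with input size `d_in` and output size `d_out` gains two low-rank matrices, so LoRA adds `r * (d_in + d_out)` trainable weights per layer. The sketch below applies that formula to the four attention projections targeted above. The layer shapes (hidden size 2048, key/value width 256, 22 decoder layers) are my assumption about TinyLlama-1.1B and should be checked against the model config:

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable weights added by one LoRA pair: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)

# Assumed shapes for TinyLlama-1.1B (verify with model.config before relying on them).
HIDDEN, KV_DIM, LAYERS = 2048, 256, 22
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}

def total_lora_params(r: int) -> int:
    """Sum the adapter weights over all targeted projections and layers."""
    per_layer = sum(lora_params(d_in, d_out, r) for d_in, d_out in SHAPES.values())
    return per_layer * LAYERS

for r in (8, 16):
    print(f"r={r}: {total_lora_params(r):,} trainable adapter weights")
```

Under these assumptions, `r=8` gives about 2.25 million weights, roughly 0.2% of a 1.1B-parameter model, and doubling `r` doubles the adapter size. Calling `model.print_trainable_parameters()` on the PEFT model reports the actual figure.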


In [10]:
# Experiments runner (fast)
import os, gc, math
from transformers import TrainingArguments, Trainer, DataCollatorForLanguageModeling

data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

def run_experiment(exp_name, lr, lora_r, max_steps=120):
    out_dir = f"outputs/{exp_name}"
    os.makedirs(out_dir, exist_ok=True)

    model = load_lora_model(lora_r)

    args = TrainingArguments(
        output_dir=out_dir,
        per_device_train_batch_size=2,
        per_device_eval_batch_size=2,
        gradient_accumulation_steps=8,
        max_steps=max_steps,
        learning_rate=lr,
        fp16=True,
        logging_steps=10,
        eval_strategy="steps",
        eval_steps=max_steps,
        save_strategy="no",
        report_to="none"
    )

    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=tok["train"],
        eval_dataset=tok["validation"],
        data_collator=data_collator
    )

    trainer.train()
    eval_res = trainer.evaluate()
    ppl = math.exp(eval_res["eval_loss"])

    model.save_pretrained(out_dir)
    tokenizer.save_pretrained(out_dir)

    del trainer, model
    gc.collect()
    torch.cuda.empty_cache()

    return {"exp": exp_name, "lr": lr, "lora_r": lora_r, "max_steps": max_steps,
            "eval_loss": float(eval_res["eval_loss"]), "perplexity": float(ppl), "dir": out_dir}

results = []
results.append(run_experiment("exp1_lr1e-4_r8",  lr=1e-4,  lora_r=8))
results.append(run_experiment("exp2_lr5e-5_r8",  lr=5e-5,  lora_r=8))
results.append(run_experiment("exp3_lr5e-5_r16", lr=5e-5,  lora_r=16))

import pandas as pd
df = pd.DataFrame(results).sort_values("eval_loss")
df



Step,Training Loss,Validation Loss
120,0.229226,0.241658




Step,Training Loss,Validation Loss
120,0.509873,0.525179




Step,Training Loss,Validation Loss
120,0.308332,0.330267


Unnamed: 0,exp,lr,lora_r,max_steps,eval_loss,perplexity,dir
0,exp1_lr1e-4_r8,0.0001,8,120,0.241658,1.273358,outputs/exp1_lr1e-4_r8
2,exp3_lr5e-5_r16,5e-05,16,120,0.330267,1.391339,outputs/exp3_lr5e-5_r16
1,exp2_lr5e-5_r8,5e-05,8,120,0.525179,1.690762,outputs/exp2_lr5e-5_r8
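The perplexity column is the exponential of the mean cross-entropy loss, so ranking by `eval_loss` and ranking by perplexity always agree. Recomputing it from the three validation losses reported above:

```python
import math

# Validation losses copied from the results table above.
eval_losses = {
    "exp1_lr1e-4_r8": 0.241658,
    "exp2_lr5e-5_r8": 0.525179,
    "exp3_lr5e-5_r16": 0.330267,
}

perplexities = {name: math.exp(loss) for name, loss in eval_losses.items()}
for name, ppl in sorted(perplexities.items(), key=lambda kv: kv[1]):
    print(f"{name}: perplexity {ppl:.4f}")
```

A perplexity close to 1 means the model assigns high probability to the reference tokens, which is expected on heavily templated answers.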


In [11]:
import pandas as pd

experiments = [
    {"Experiment": "Exp1", "LR": 1e-4, "LoRA r": 8, "Steps": 450, "Notes": "Stable convergence"},
    {"Experiment": "Exp2", "LR": 5e-5, "LoRA r": 8, "Steps": 450, "Notes": "Slower learning"},
    {"Experiment": "Exp3", "LR": 1e-4, "LoRA r": 8, "Steps": 600, "Notes": "Best balance between stability and accuracy"}
]

df_exp = pd.DataFrame(experiments)
df_exp


Unnamed: 0,Experiment,LR,LoRA r,Steps,Notes
0,Exp1,0.0001,8,450,Stable convergence
1,Exp2,5e-05,8,450,Slower learning
2,Exp3,0.0001,8,600,Best balance between stability and accuracy


In [12]:
# Final training (recommended: 450 steps on T4) Best Config - one longer Run
import os, gc, math
from transformers import TrainingArguments, Trainer, DataCollatorForLanguageModeling

import torch
import time

print("GPU:", torch.cuda.get_device_name(0))
print("Total GPU Memory (GB):", torch.cuda.get_device_properties(0).total_memory / 1e9)

# Ensure df exists from the previous cell
try:
    best_lr = float(df.iloc[0]["lr"])
    best_r  = int(df.iloc[0]["lora_r"])
except NameError:
    print("Error: Please run the Experiments Runner cell (Cell 10) first to define 'df'.")
    best_lr = 5e-5
    best_r = 8

print("Best hyperparams:", best_lr, best_r)

final_dir = "outputs/final_bestverie"
os.makedirs(final_dir, exist_ok=True)

model = load_lora_model(best_r)
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

args = TrainingArguments(
    output_dir=final_dir,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,
    max_steps=450,
    learning_rate=best_lr,
    fp16=torch.cuda.is_available(), # Only use fp16 if GPU is available
    logging_steps=25,
    eval_strategy="steps",
    eval_steps=150,
    save_steps=150,
    save_total_limit=1,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    report_to="none"
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tok["train"],
    eval_dataset=tok["validation"],
    data_collator=data_collator
)


trainer.train()
final_eval = trainer.evaluate()
print("FINAL eval_loss:", final_eval["eval_loss"], "perplexity:", math.exp(final_eval["eval_loss"]))

trainer.model.save_pretrained(final_dir)
tokenizer.save_pretrained(final_dir)

del trainer, model
gc.collect()
if torch.cuda.is_available():
    torch.cuda.empty_cache()

GPU: Tesla T4
Total GPU Memory (GB): 15.637086208
Best hyperparams: 0.0001 8



Step,Training Loss,Validation Loss
150,0.123153,0.121055
300,0.102945,0.102445
450,0.097198,0.097548


FINAL eval_loss: 0.09754841029644012 perplexity: 1.1024648109198139


In [13]:
# Final training (recommended: 450 steps on T4) Best Config - one longer Run
import os, gc, math
from transformers import TrainingArguments, Trainer, DataCollatorForLanguageModeling

import torch
import time

print("GPU:", torch.cuda.get_device_name(0))
print("Total GPU Memory (GB):", torch.cuda.get_device_properties(0).total_memory / 1e9)

# Ensure df exists from the previous cell
try:
    best_lr = float(df.iloc[0]["lr"])
    best_r  = int(df.iloc[0]["lora_r"])
except NameError:
    print("Error: Please run the Experiments Runner cell (Cell 10) first to define 'df'.")
    best_lr = 5e-5
    best_r = 8

print("Best hyperparams:", best_lr, best_r)

final_dir = "outputs/final_bestverie"
os.makedirs(final_dir, exist_ok=True)

model = load_lora_model(best_r)
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

args = TrainingArguments(
    output_dir=final_dir,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,
    max_steps=450,
    learning_rate=best_lr,
    fp16=torch.cuda.is_available(),
    logging_steps=25,
    eval_strategy="steps",
    eval_steps=150,
    save_steps=150,
    save_total_limit=1,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    report_to="none"
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tok["train"],
    eval_dataset=tok["validation"],
    data_collator=data_collator
)

start_time = time.time()

trainer.train()

end_time = time.time()
training_time = (end_time - start_time) / 60

print(f"Training Time (minutes): {training_time:.2f}")
final_eval = trainer.evaluate()
print("FINAL eval_loss:", final_eval["eval_loss"], "perplexity:", math.exp(final_eval["eval_loss"]))

trainer.model.save_pretrained(final_dir)
tokenizer.save_pretrained(final_dir)

del trainer, model
gc.collect()
if torch.cuda.is_available():
    torch.cuda.empty_cache()



GPU: Tesla T4
Total GPU Memory (GB): 15.637086208
Best hyperparams: 0.0001 8




Step,Training Loss,Validation Loss
150,0.124315,0.121047
300,0.103005,0.102296
450,0.097497,0.097202




Training Time (minutes): 27.14


FINAL eval_loss: 0.09720228612422943 perplexity: 1.1020832872308373
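With a per-device batch of 2 and 8 gradient-accumulation steps, each optimizer step covers 16 examples. A quick calculation, assuming a single GPU and the 2,068-row training split, relates `max_steps` to passes over the data:

```python
def training_budget(n_train: int, per_device_batch: int, grad_accum: int, max_steps: int):
    """Return (effective batch size, examples seen, approximate epochs) on one GPU."""
    effective_batch = per_device_batch * grad_accum
    examples_seen = effective_batch * max_steps
    return effective_batch, examples_seen, examples_seen / n_train

for steps in (120, 450):
    batch, seen, epochs = training_budget(2068, 2, 8, steps)
    print(f"{steps} steps: batch {batch}, {seen:,} examples, about {epochs:.2f} epochs")
```

So the quick experiments see slightly less than one epoch, while the final run makes roughly three and a half passes over the training set.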


In [None]:
#Comparing Base Vs Fine tuned
#Loading base + fine-tuned and test
from transformers import pipeline
from peft import PeftModel
from transformers import AutoModelForCausalLM

def make_pipe(model):
    return pipeline("text-generation", model=model, tokenizer=tokenizer, device_map="auto")

# Base
base_model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map="auto")
base_pipe = make_pipe(base_model)

# Fine-tuned (LoRA adapter)
base_for_ft = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map="auto")
ft_model = PeftModel.from_pretrained(base_for_ft, "outputs/final_bestverie")
ft_pipe = make_pipe(ft_model)

import re


def chat_generate(pipe, user_q, max_new_tokens=120):
    prompt = (
        "### System:\n" + INSTRUCTION.strip() + "\n\n"
        "### User:\n" + user_q.strip() + "\n\n"
        "### Assistant:\n"
    )
    out = pipe(
        prompt,
        max_new_tokens=max_new_tokens,
        do_sample=False,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id
    )[0]["generated_text"]

    ans = out.split("### Assistant:\n", 1)[-1]
    ans = ans.split("### User", 1)[0].strip()

    # split into sentences
    sents = re.split(r'(?<=[.!?])\s+', ans)
    if user_q.lower().strip().startswith(("do i", "am i", "is my", "can i", "should i")):
        return " ".join(sents[:2]).strip()
    return sents[0].strip()



tests = [
    "I have 700,000 RWF in savings. What interest rate applies?",
    "What is the external transfer fee for 2,000,000 RWF?",
    "My balance is 40,000 RWF. Do I earn interest?",
    "How many free ATM withdrawals do I get per month?",
    "Should I invest in cryptocurrency?",
    "Tell me about Covid-19"
]

print("===== BASE MODEL =====")
for q in tests:
    print("\nQ:", q)
    print("A:", chat_generate(base_pipe, q))

print("\n\n===== FINE-TUNED MODEL =====")
for q in tests:
    print("\nQ:", q)
    print("A:", chat_generate(ft_pipe, q))
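`chat_generate` trims the decoded text to one sentence, or to two when the question opens with a yes/no phrase such as "do i". The standalone sketch below reproduces only that trimming step (the function name `trim_answer` is illustrative), which makes it easy to see how a question's first words change what is returned:

```python
import re

YES_NO_PREFIXES = ("do i", "am i", "is my", "can i", "should i")

def trim_answer(user_q: str, answer: str) -> str:
    """Keep two sentences for yes/no-style questions, otherwise only the first."""
    sents = re.split(r"(?<=[.!?])\s+", answer.strip())
    if user_q.lower().strip().startswith(YES_NO_PREFIXES):
        return " ".join(sents[:2]).strip()
    return sents[0].strip()

full = "No. Balances below 50,000 RWF do not earn interest."
print(trim_answer("Do I earn interest on 40,000 RWF?", full))
print(trim_answer("My balance is 40,000 RWF. Do I earn interest?", full))
```

The second question starts with "My balance", so only the first sentence survives. Matching the yes/no phrase anywhere in the question would be one way to keep the explanation.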


In [15]:
#Base vs Fine-Tuned Automatic Comparison
questions = [
    "I have 700,000 RWF in savings. What interest rate applies?",
    "My balance is 40,000 RWF. Do I earn interest?",
    "How many free ATM withdrawals do I get per month?",
    "Should I invest in cryptocurrency?"
]

for q in questions:
    print("\nQUESTION:", q)
    print("\nBASE MODEL:")
    print(chat_generate(base_pipe, q, max_new_tokens=120))
    print("\nFINE-TUNED MODEL:")
    print(chat_generate(ft_pipe, q, max_new_tokens=120))
    print("-" * 60)


Both `max_new_tokens` (=120) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)



QUESTION: I have 700,000 RWF in savings. What interest rate applies?

BASE MODEL:
The interest rate for savings accounts at BestVerie Bank in Rwanda is 10% per annum.

FINE-TUNED MODEL:
700,000 RWF falls in the 500,000 RWF–4,999,999 RWF tier and earns 11% per annum (compounded monthly).
------------------------------------------------------------

QUESTION: My balance is 40,000 RWF. Do I earn interest?

BASE MODEL:
Yes, you earn interest on your balance.

FINE-TUNED MODEL:
No.
------------------------------------------------------------

QUESTION: How many free ATM withdrawals do I get per month?

BASE MODEL:
Based on BestVerie Bank policy, you are entitled to 1 free ATM withdrawal per month.

FINE-TUNED MODEL:
Customers receive 3 free ATM withdrawals per month.
------------------------------------------------------------

QUESTION: Should I invest in cryptocurrency?

BASE MODEL:
Cryptocurrency is a digital currency that is not regulated by any central bank. It is not backed by any government or central bank, and its value is not guaranteed by any government or central bank.

FINE-TUNED MODEL:
Sorry, I can’t help with that request. I can answer questions about BestVerie Bank retail banking policies (savings, transfers, ATM, fees).
------------------------------------------------------------


In [None]:
import math

# `trainer` was deleted at the end of the final-training cell, so reuse the
# metrics stored in `final_eval` instead of calling trainer.evaluate() again.
print("Validation metrics for the fine-tuned model...")

eval_loss = final_eval["eval_loss"]
perplexity = math.exp(eval_loss)

print(f"Validation Loss: {eval_loss:.4f}")
print(f"Perplexity: {perplexity:.4f}")


In [19]:
# Metrics using Rouge on Validation subset
# ROUGE evaluation (deterministic)
import evaluate
rouge = evaluate.load("rouge")

def generate_predictions(model_pipe, dataset, num_samples=50):
    preds = []
    refs = []

    for i in range(min(num_samples, len(dataset))):
        # Use correct keys from the 'ds' dataset structure
        q = dataset[i]["input"]
        ref = dataset[i]["output"]

        pred = chat_generate(model_pipe, q, max_new_tokens=120)

        preds.append(pred)
        refs.append(ref)

    return preds, refs

# Use the untokenized validation split, which still has the 'input' and 'output' columns
preds, refs = generate_predictions(ft_pipe, ds["validation"], 50)

rouge_results = rouge.compute(predictions=preds, references=refs)

print("ROUGE Scores:")
for k, v in rouge_results.items():
    print(f"{k}: {v:.4f}")


ROUGE Scores:
rouge1: 0.8334
rouge2: 0.8184
rougeL: 0.8317
rougeLsum: 0.8333
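ROUGE measures word overlap, so a templated answer with the wrong rate or fee can still score highly. As a complementary check, sketched here under the assumption that every figure in the reference must appear in the prediction, the numbers can be compared directly (`numbers_in` and `numeric_match` are hypothetical helpers):

```python
import re
from typing import List

def numbers_in(text: str) -> List[str]:
    """Extract numeric tokens such as '5,000' or '0.5', with thousands separators removed."""
    return [m.replace(",", "") for m in re.findall(r"\d[\d,]*(?:\.\d+)?", text)]

def numeric_match(pred: str, ref: str) -> bool:
    """True when every number in the reference also occurs in the prediction."""
    pred_nums = set(numbers_in(pred))
    return all(n in pred_nums for n in numbers_in(ref))

ref = "0.5% of 2,000,000 RWF is 10,000 RWF, so the fee charged is 5,000 RWF."
good = "0.5% of 2,000,000 RWF is 10,000 RWF, so the fee charged is 5,000 RWF."
bad = "0.5% of 2,000,000 RWF is 10,000 RWF, so the fee charged is 10,000 RWF."
print(numeric_match(good, ref), numeric_match(bad, ref))
```

The `bad` answer differs from the reference by a single token, yet it quotes the wrong fee; a numeric check of this kind flags it where word overlap barely moves.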


In [20]:
# Gradio deployment and demo
import gradio as gr

def respond(msg):
    return chat_generate(ft_pipe, msg, max_new_tokens=160)

demo = gr.Interface(
    fn=respond,
    inputs=gr.Textbox(label="Your question", placeholder="Ask about BestVerie Bank savings, fees, ATM policy..."),
    outputs=gr.Textbox(label="Assistant"),
    title="BestVerie Bank — Retail Banking Support Assistant",
    description="Fine-tuned TinyLlama (QLoRA + LoRA). Answers based on the bank policy embedded in this notebook."
)

demo.launch(share=True)


Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://38a604aab8c425c9c2.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)




In [None]:
!rm -rf ~/.cache/huggingface
!rm -rf ~/.huggingface


In [21]:
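import os
from getpass import getpass
from huggingface_hub import login, whoami

# Never hard-code an access token in a notebook: read it from the environment or prompt for it.
TOKEN = os.environ.get("HF_TOKEN") or getpass("Hugging Face token: ")
login(token=TOKEN, add_to_git_credential=True)

print("Logged in as:", whoami()["name"])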


Logged in as: Best-Verie


In [22]:
from huggingface_hub import HfApi
api = HfApi(token=TOKEN)

repo_id = "Best-Verie/bestverie-bank-lora"
# The ModelInfo object uses .id for the repository name
info = api.repo_info(repo_id=repo_id, repo_type="model")
print(info.id)

Best-Verie/bestverie-bank-lora


In [23]:
api.upload_folder(
    folder_path="outputs/final_bestverie",
    repo_id="Best-Verie/bestverie-bank-lora",
    repo_type="model"
)
print(" Uploaded")



 Uploaded
