# **⚠️ Run Notebook on Colab Using Colab's Free T4-GPU ⚠️**

## **🔧 1. Install Required Libraries**

In [1]:
!pip install -q bitsandbytes accelerate transformers

[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m72.9/72.9 MB[0m [31m9.4 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m363.4/363.4 MB[0m [31m3.6 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m13.8/13.8 MB[0m [31m52.5 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m24.6/24.6 MB[0m [31m30.6 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m883.7/883.7 kB[0m [31m30.6 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m664.8/664.8 MB[0m [31m1.3 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m211.5/211.5 MB[0m [31m6.6 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m56.3/56.3 MB[0m [31m13.4 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

## **🔐 2. Import Libraries and Login to Hugging Face**

In [24]:
import os
import torch
import pandas as pd
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline, BitsAndBytesConfig
from huggingface_hub import login
from tabulate import tabulate

login()  # Paste your Hugging Face token when prompted

VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

## **⚙️ 3. Configure 4-bit Quantization**

In [5]:
# Use BitsAndBytes for quantized loading to save memory
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=False,
)

## **🧠 4. Load Tokenizer & Fine-Tuned LLMs (4-bit)**

In [7]:
# Global cache to store models and related components
model_cache = {}

In [8]:
def load_model(model_type="llama"):
    if model_type in model_cache:
        return model_cache[model_type]

    if model_type == "llama":
        model_id = "Muhammad-Umer-Khan/Llama-3-8B-FAQs-Finetuned"
        prompt_wrapper = lambda q: f"<|begin_of_text|><|user|>\n{q}\n<|assistant|>"
        extract_response = lambda out, prompt: out[out.index("<|assistant|>") + len("<|assistant|>\n"):].strip()
    elif model_type == "mistral":
        model_id = "Muhammad-Umer-Khan/Mistral-7b-v03-FAQs-Finetuned"
        prompt_wrapper = lambda q: f"<s>[INST] {q} [/INST]"
        extract_response = lambda out, prompt: out.replace(prompt, '').strip()
    else:
        raise ValueError("Invalid model_type. Choose either 'llama' or 'mistral'.")

    # Load tokenizer and model once
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=bnb_config,
        device_map="auto",
        trust_remote_code=True
    )

    # Create inference pipeline
    pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_length=200, device_map="auto")

    # Cache all components
    model_cache[model_type] = (pipe, prompt_wrapper, extract_response)
    return model_cache[model_type]

## **🧪 5. Create Inference Pipeline**

In [9]:
def get_chatbot_response(user_input, model_type="llama"):
    try:
        pipe, wrap_prompt, extract_response = load_model(model_type)
        prompt = wrap_prompt(user_input)
        result = pipe(prompt)[0]['generated_text']
        return extract_response(result, prompt)
    except Exception as e:
        return f"Error: {str(e)}"

## **📥 6. Test the Chatbot**

In [10]:
query = "How can I reset my password?"

In [11]:
print("🔷 Llama 3 Response:")
print(get_chatbot_response(query, model_type="llama"))

🔷 Llama 3 Response:


tokenizer_config.json:   0%|          | 0.00/50.6k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.2M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/325 [00:00<?, ?B/s]

chat_template.jinja:   0%|          | 0.00/389 [00:00<?, ?B/s]

adapter_config.json:   0%|          | 0.00/840 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/654 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/23.9k [00:00<?, ?B/s]

Fetching 4 files:   0%|          | 0/4 [00:00<?, ?it/s]

model-00003-of-00004.safetensors:   0%|          | 0.00/4.92G [00:00<?, ?B/s]

model-00002-of-00004.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]

model-00004-of-00004.safetensors:   0%|          | 0.00/1.17G [00:00<?, ?B/s]

model-00001-of-00004.safetensors:   0%|          | 0.00/4.98G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/187 [00:00<?, ?B/s]

adapter_model.safetensors:   0%|          | 0.00/40.0 [00:00<?, ?B/s]

Loading adapter weights from Muhammad-Umer-Khan/Llama-3-8B-FAQs-Finetuned led to missing keys in the model: model.layers.0.self_attn.q_proj.lora_A.default.weight, model.layers.0.self_attn.q_proj.lora_B.default.weight, model.layers.0.self_attn.v_proj.lora_A.default.weight, model.layers.0.self_attn.v_proj.lora_B.default.weight, model.layers.1.self_attn.q_proj.lora_A.default.weight, model.layers.1.self_attn.q_proj.lora_B.default.weight, model.layers.1.self_attn.v_proj.lora_A.default.weight, model.layers.1.self_attn.v_proj.lora_B.default.weight, model.layers.2.self_attn.q_proj.lora_A.default.weight, model.layers.2.self_attn.q_proj.lora_B.default.weight, model.layers.2.self_attn.v_proj.lora_A.default.weight, model.layers.2.self_attn.v_proj.lora_B.default.weight, model.layers.3.self_attn.q_proj.lora_A.default.weight, model.layers.3.self_attn.q_proj.lora_B.default.weight, model.layers.3.self_attn.v_proj.lora_A.default.weight, model.layers.3.self_attn.v_proj.lora_B.default.weight, model.layers

Hi there! I'm happy to help you reset your password. To do so, I'll need some more information from you. Can you please provide me with the following details:

1. Your email address or username associated with your account.
2. A valid recovery email address or phone number (if you have set one up).
3. A few security questions to verify your identity (if you have set them up).

Once I have this information, I can guide you through the password reset process.


In [12]:
print("\n🔶 Mistral 7B Response:")
print(get_chatbot_response(query, model_type="mistral"))


🔶 Mistral 7B Response:


tokenizer_config.json:   0%|          | 0.00/137k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/587k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/3.67M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/437 [00:00<?, ?B/s]

chat_template.jinja:   0%|          | 0.00/3.96k [00:00<?, ?B/s]

config.json:   0%|          | 0.00/616 [00:00<?, ?B/s]

adapter_config.json:   0%|          | 0.00/789 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/601 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/23.9k [00:00<?, ?B/s]

Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]

model-00003-of-00003.safetensors:   0%|          | 0.00/4.55G [00:00<?, ?B/s]

model-00002-of-00003.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]

model-00001-of-00003.safetensors:   0%|          | 0.00/4.95G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

adapter_model.safetensors:   0%|          | 0.00/40.0 [00:00<?, ?B/s]

Loading adapter weights from Muhammad-Umer-Khan/Mistral-7b-v03-FAQs-Finetuned led to missing keys in the model: model.layers.0.self_attn.q_proj.lora_A.default.weight, model.layers.0.self_attn.q_proj.lora_B.default.weight, model.layers.0.self_attn.v_proj.lora_A.default.weight, model.layers.0.self_attn.v_proj.lora_B.default.weight, model.layers.1.self_attn.q_proj.lora_A.default.weight, model.layers.1.self_attn.q_proj.lora_B.default.weight, model.layers.1.self_attn.v_proj.lora_A.default.weight, model.layers.1.self_attn.v_proj.lora_B.default.weight, model.layers.2.self_attn.q_proj.lora_A.default.weight, model.layers.2.self_attn.q_proj.lora_B.default.weight, model.layers.2.self_attn.v_proj.lora_A.default.weight, model.layers.2.self_attn.v_proj.lora_B.default.weight, model.layers.3.self_attn.q_proj.lora_A.default.weight, model.layers.3.self_attn.q_proj.lora_B.default.weight, model.layers.3.self_attn.v_proj.lora_A.default.weight, model.layers.3.self_attn.v_proj.lora_B.default.weight, model.la

To reset your password, you usually need to follow these steps:

1. Go to the login page of the service you're using (e.g., email, social media, etc.).

2. Click on the "Forgot password" or similar link.

3. You will be asked to enter the email address or username associated with your account.

4. A password reset link will be sent to that email address.

5. Click on the link in the email and follow the instructions to create a new password.

6. Once you've created a new password, log in with your email and the new password.

If you're still having trouble, check your spam folder in case the password reset email ended up there, or contact the service's support for further assistance.


## **⚖️ Let's do some comparasion**

#### **⌛ Load FAQs Dataset**

In [27]:
dataset = pd.read_csv("hf://datasets/Muhammad-Umer-Khan/FAQ_Dataset/BankFAQs.csv")

#### **🆚 Comparator Function**

In [28]:
def compare_llm_responses_from_dataset(index, model_type1="llama", model_type2="mistral"):
    # Extract the question and actual answer from the dataset
    question = dataset.loc[index, "Question"]
    actual_answer = dataset.loc[index, "Answer"]

    # Get responses from both LLMs
    llama_response = get_chatbot_response(question, model_type=model_type1)
    mistral_response = get_chatbot_response(question, model_type=model_type2)

    # Format the results as a comparison table
    comparison_table = [
        ["📌 Question", question],
        ["✅ Actual Answer", actual_answer],
        [f"🦙 {model_type1.capitalize()} Response", llama_response],
        [f"🧠 {model_type2.capitalize()} Response", mistral_response],
    ]

    print(tabulate(comparison_table, headers=["Type", "Content"], tablefmt="fancy_grid"))

### **Question ❓:**
- How do I apply for a Loan Against Property (LAP).
### **Actual Answer 💡:**
- You can apply for a loan in the following ways: Fill in the online application form and our representative will get in touch with you Call one of our PhoneBanking numbers provided on the website Visit your nearest branch Our existing liability customers may also get in touch with their Relationship Managers/ Personal Bankers to know more and apply for LAP.

In [33]:
compare_llm_responses_from_dataset(index=263)

╒═════════════════════╤═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╕
│ Type                │ Content                                                                                                                                                                                                                                                                                                                                                               │
╞═════════════════════╪═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════

### **Question ❓:**
- What is Amortization.
### **Actual Answer 💡:**
- Amortization is paying off debts in regular instalments over a period of time.

In [35]:
compare_llm_responses_from_dataset(index=276)

╒═════════════════════╤══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╕
│ Type                │ Content                                                                                                                                                                                                                                                                                                                  │
╞═════════════════════╪═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════

### **Question ❓:**
- Do I need to provide collateral to avail of an education loan (Education Loans For Indian Education)
### **Actual Answer 💡:**
- For loans up to Rs. 4 lakh - No collateral or Third Party Guarantee is required For loans from Rs. 4 lakh to Rs. 7.5 lakh – No collateral required, however Third Party Guarantee is required For loans above Rs. 7.5 lakh – collateral is required. You can choose from any of the following acceptable collaterals: Residential Property HDFC Bank Fixed Deposit LIC/NSC/KVP View more

In [37]:
compare_llm_responses_from_dataset(index=182)

╒═════════════════════╤════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╕
│ Type                │ Content                                                                                                                                                                                                                                                                                                                                                                                                                                                                        │
╞═════════════════════

Based on your **fine-tuned LLaMA 3 and Mistral 7B outputs**, training logs, and response comparisons — here's a detailed, professional recommendation:

---

### ✅ **Overall Recommendation: Which Model Performed Better?**

**🏆 Winner: Mistral 7B** — for FAQ-style banking chatbot applications.

---

### 🔍 **Summary of Evaluation**

| Factor                       | Verdict                                                                                                                                                   |
| ---------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Training Loss**            | **LLaMA had lower final training loss (\~1.49)** vs Mistral (\~1.78), indicating slightly better convergence.                                             |
| **Response Quality**         | **Mistral’s responses were more concise, focused, and directly aligned** with the actual FAQ answers. LLaMA often added extra (and sometimes vague) info. |
| **Factual Accuracy**         | **Mistral stuck closer to ground truth**, avoiding hallucinations. LLaMA included broad financial explanations, not always needed.                        |
| **Answer Structure**         | Mistral gave clean step-wise or bullet answers. LLaMA responses were long-winded, occasionally rambling.                                                  |
| **General Language Fluency** | **LLaMA is slightly more fluent**, friendly, and human-like. Great for assistant-style or general-purpose bots.                                           |
| **Domain-Specific Use Case** | Mistral clearly performed better in **narrow, factual domains like banking FAQs**.                                                                        |
| **Colab Compatibility**      | Both models trained fine on Colab, but **Mistral was more memory-efficient** and inference was faster.                                                    |

---

### 💡 **Final Use-Case Based Verdict**

| Use Case                           | Best Model     | Why                                                                                           |
| ---------------------------------- | -------------- | --------------------------------------------------------------------------------------------- |
| **Banking FAQ / Customer Support** | ✅ Mistral 7B   | Aligned better with structured answers, factual accuracy, and response conciseness.           |
| **General Chat / Help Assistant**  | ✅ LLaMA 3 (8B) | More verbose, polite, and expressive – good for open-ended queries and humanlike interaction. |

---

### 📌 What You Should Do

* ✅ **Use Mistral 7B** as your primary model for SupportGenie (FAQ Chatbot).
* ✅ **Keep LLaMA 3 as a fallback** for general or unexpected queries to add flexibility.
* 🧠 You can even **hybrid both**: Use Mistral for strict FAQ match, and route to LLaMA when response confidence is low or FAQ not found.