# Medical Chatbot Inference & Interface (using checkpoint-2000)

This notebook loads the current best checkpoint (`checkpoint-2000`) as a LoRA adapter on top of a 4-bit quantized Mistral-7B-Instruct base model, then lets you query the resulting domain-specific chatbot. After retraining, update the checkpoint path to point at your best model.

In [6]:
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "Featherless-Chat-Models/Mistral-7B-Instruct-v0.2"

bnb_config = BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_name)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="cpu"
)

print("4-bit model downloaded.")


Loading weights: 100%|██████████| 291/291 [05:05<00:00,  1.05s/it, Materializing param=model.norm.weight]                              


4-bit model downloaded.
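
To make the motivation for 4-bit loading concrete, here is a rough back-of-the-envelope sketch of weight memory at different precisions. The parameter count (`N_PARAMS`) is an assumed approximation for a 7B-class model, and the estimate covers weights only. It ignores quantization metadata, activations, and the KV cache.

```python
def weight_memory_gib(n_params: float, bits_per_param: float) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


# Assumption: roughly 7.2 billion parameters for a 7B-class model.
N_PARAMS = 7.2e9

for label, bits in [("fp16", 16), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gib(N_PARAMS, bits):.1f} GiB")
```

Under these assumptions, 4-bit weights take about a quarter of the memory of fp16 weights, which is the main reason to quantize when memory is tight.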



In [7]:
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import os
import torch
from proccess import formatted_ds

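# Use the correct local checkpoint directory for tokenizer and adapter
ckpt_dir = os.path.abspath("./checkpoint-2000")

# Fail early with a clear message: a path that does not exist locally is
# typically treated as a Hub repo id, which produces a confusing "Repo id" error
if not os.path.isdir(ckpt_dir):
    raise FileNotFoundError(f"Checkpoint directory not found: {ckpt_dir}")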

# Load tokenizer from local checkpoint directory
tokenizer = AutoTokenizer.from_pretrained(
    ckpt_dir,
    local_files_only=True
)

# Load base model (quantized, force CPU to avoid OOM)
bnb_config = BitsAndBytesConfig(load_in_4bit=True, llm_int8_enable_fp32_cpu_offload=True)
base_model = AutoModelForCausalLM.from_pretrained(
    "Featherless-Chat-Models/Mistral-7B-Instruct-v0.2",
    device_map="cpu",  # Force CPU to avoid GPU OOM
    quantization_config=bnb_config,
    local_files_only=True
)

# Load LoRA adapter from local checkpoint directory
model = PeftModel.from_pretrained(
    base_model,
    ckpt_dir,
    local_files_only=True
)

# # Move model to GPU if available
# device = "cuda" if torch.cuda.is_available() else "cpu"
# model = model.to(device)

OSError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/home/umugabekazi/Desktop/health-care_chatbot/model/checkpoint-2000'. Use `repo_type` argument if needed.
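
The `OSError` above usually means the path does not exist on disk, so the loader falls back to treating it as a Hub repo id. A minimal sketch of a pre-flight check follows. The helper name `check_checkpoint_dir` is hypothetical, and the required file list assumes the checkpoint was saved by PEFT, which writes `adapter_config.json` next to the adapter weights.

```python
import os

# Assumption: a PEFT/LoRA checkpoint directory contains "adapter_config.json".
REQUIRED_FILES = ("adapter_config.json",)


def check_checkpoint_dir(path: str, required=REQUIRED_FILES) -> str:
    """Return the absolute path if it is a directory holding the required files."""
    abs_path = os.path.abspath(path)
    if not os.path.isdir(abs_path):
        raise FileNotFoundError(f"Checkpoint directory not found: {abs_path}")
    missing = [
        name for name in required
        if not os.path.isfile(os.path.join(abs_path, name))
    ]
    if missing:
        raise FileNotFoundError(f"{abs_path} is missing: {', '.join(missing)}")
    return abs_path
```

Calling `ckpt_dir = check_checkpoint_dir("./checkpoint-2000")` before any `from_pretrained` call would surface a wrong working directory immediately, instead of failing inside the loader.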

## Try the chatbot: Single prompt example

In [None]:
# Set device for inference
device = "cuda" if torch.cuda.is_available() else "cpu"

prompt = "Instruction: What are the symptoms of diabetes? Question:  Response:"
inputs = tokenizer(prompt, return_tensors="pt")
inputs = {k: v.to(device) for k, v in inputs.items()}  # Move all tensors to the correct device
model = model.to(device)  # Ensure model is on the same device

outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Instruction: What are the symptoms of diabetes? Question:  Response:  Diabetes is a chronic condition that affects the way your body processes blood sugar. Symptoms of diabetes can include frequent urination, excessive thirst, blurred vision, and increased hunger. These symptoms occur when the body is unable to produce or use insulin effectively, leading to high blood sugar levels. If left untreated, diabetes can cause a range of complications, including nerve damage, kidney damage, and cardiovascular disease. It is important to seek medical attention if you experience any of these symptoms, as early diagnosis and treatment can help prevent or manage the condition.
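
Note that the decoded output above starts with the prompt itself, because `generate` returns the prompt tokens followed by the new tokens. One way to keep only the answer is to split on the `Response:` marker. This is a small sketch, and `extract_response` is a hypothetical helper name that assumes the marker appears exactly as in the prompt.

```python
def extract_response(decoded: str, marker: str = "Response:") -> str:
    """Return the text after the first `marker`, or the whole text if it is absent."""
    _, found, tail = decoded.partition(marker)
    return tail.strip() if found else decoded.strip()


# Example with a shortened version of the output format shown above.
decoded = "Instruction: What are the symptoms of diabetes? Question:  Response:  Diabetes is a chronic condition."
print(extract_response(decoded))
```

Splitting on the first marker is simple but would cut at the wrong place if the user's question itself contained `Response:`. Slicing the generated token ids at the prompt length avoids that edge case.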


## Interactive Chatbot Interface (Gradio)
You can use this cell to launch a simple web interface for your chatbot. When you retrain and have a better checkpoint, just update `ckpt_dir` above.

In [None]:
import gradio as gr

def chat(query):
    # Wrap user input in the expected prompt format
    # (the single-prompt cell also includes an empty "Question:" field;
    # use whichever format matches your training data)
    prompt = f"Instruction: {query} Response:"
    inputs = tokenizer(prompt, return_tensors="pt")
    inputs = {k: v.to(device) for k, v in inputs.items()}
    outputs = model.generate(**inputs, max_new_tokens=128)
    # generate() returns the prompt tokens followed by the new tokens,
    # so decode only the newly generated part to avoid echoing the prompt
    prompt_len = inputs["input_ids"].shape[1]
    response = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)
    return response.strip()

gr.Interface(
    fn=chat,
    inputs=gr.Textbox(lines=2, label="Your medical question"),
    outputs=gr.Textbox(label="Chatbot Response"),
    title="Medical Chatbot",
    description="Ask a medical question and get a response from your fine-tuned model (checkpoint-2000)."
).launch()

* Running on local URL:  http://127.0.0.1:7861
* To create a public link, set `share=True` in `launch()`.




Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


---
**When you finish retraining:**
- Change `ckpt_dir` to your best checkpoint directory (e.g., `./checkpoint-8000` or `./best_model`)
- Re-run the notebook to use the improved model in the interface.
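
If checkpoints are saved as `checkpoint-<step>` directories, picking the most recent one can be automated. The sketch below uses a hypothetical helper, `latest_checkpoint`, and assumes that naming scheme. Keep in mind that the highest step number is the latest checkpoint, which is not necessarily the best one by evaluation metrics.

```python
import os
import re
from typing import Optional

# Assumption: checkpoint directories are named "checkpoint-<step>".
_CKPT_RE = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(root: str) -> Optional[str]:
    """Return the path of the highest-numbered checkpoint-N directory, or None."""
    best_step, best_path = -1, None
    for name in os.listdir(root):
        match = _CKPT_RE.match(name)
        path = os.path.join(root, name)
        if match and os.path.isdir(path) and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), path
    return best_path
```

For example, `ckpt_dir = latest_checkpoint(".") or os.path.abspath("./checkpoint-2000")` would fall back to the current checkpoint when no other checkpoint directory is found.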