# Simple Chatbot with Prompt Engineering & Streamlit Deployment
This notebook shows how to build a chatbot using a pretrained model, customize it with prompt engineering, add basic prompt tuning, and deploy it with Streamlit in Colab.

[**https://9b1afddba529.ngrok-free.app/**](https://9b1afddba529.ngrok-free.app/)





## Set Up Kaggle API Credentials

In [None]:
from google.colab import files
files.upload()  # upload kaggle.json

In [15]:
!mkdir -p ~/.kaggle
!mv kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json

## Download and Extract Medical Q&A Dataset

In [16]:
## dataset
!kaggle datasets download -d thedevastator/comprehensive-medical-q-a-dataset

Dataset URL: https://www.kaggle.com/datasets/thedevastator/comprehensive-medical-q-a-dataset
License(s): CC0-1.0
Downloading comprehensive-medical-q-a-dataset.zip to /content
  0% 0.00/4.89M [00:00<?, ?B/s]
100% 4.89M/4.89M [00:00<00:00, 765MB/s]


In [17]:
!unzip comprehensive-medical-q-a-dataset.zip -d ./medical_dataset

Archive:  comprehensive-medical-q-a-dataset.zip
  inflating: ./medical_dataset/train.csv  


## Load and Inspect the Dataset

In [18]:
import pandas as pd

# Load the dataset
df = pd.read_csv('/content/medical_dataset/train.csv')
print(f"Dataset shape: {df.shape}")
print(df.head())

# Check basic info about the dataset
print("\nDataset info:")
print(df.info())
print("\nMissing values:")
print(df.isnull().sum())

Dataset shape: (16407, 3)
             qtype                                           Question  \
0   susceptibility  Who is at risk for Lymphocytic Choriomeningiti...   
1         symptoms  What are the symptoms of Lymphocytic Choriomen...   
2   susceptibility  Who is at risk for Lymphocytic Choriomeningiti...   
3  exams and tests  How to diagnose Lymphocytic Choriomeningitis (...   
4        treatment  What are the treatments for Lymphocytic Chorio...   

                                              Answer  
0  LCMV infections can occur after exposure to fr...  
1  LCMV is most commonly recognized as causing ne...  
2  Individuals of all ages who come into contact ...  
3  During the first phase of the disease, the mos...  
4  Aseptic meningitis, encephalitis, or meningoen...  

Dataset info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16407 entries, 0 to 16406
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   q

In [19]:
## Take Samples From Data
df = df[['Question', 'Answer']]
df = df[:500]

## Preview Sample Questions and Answers

In [20]:
print("\nSample questions and answers:")
for i in range(5):
    print(f"Q: {df['Question'].iloc[i]}")
    print(f"A: {df['Answer'].iloc[i]}")
    print()


Sample questions and answers:
Q: Who is at risk for Lymphocytic Choriomeningitis (LCM)? ?
A: LCMV infections can occur after exposure to fresh urine, droppings, saliva, or nesting materials from infected rodents.  Transmission may also occur when these materials are directly introduced into broken skin, the nose, the eyes, or the mouth, or presumably, via the bite of an infected rodent. Person-to-person transmission has not been reported, with the exception of vertical transmission from infected mother to fetus, and rarely, through organ transplantation.

Q: What are the symptoms of Lymphocytic Choriomeningitis (LCM) ?
A: LCMV is most commonly recognized as causing neurological disease, as its name implies, though infection without symptoms or mild febrile illnesses are more common clinical manifestations. 
                
For infected persons who do become ill, onset of symptoms usually occurs 8-13 days after exposure to the virus as part of a biphasic febrile illness. This initial 

## Define System Prompt for Medical Assistant

In [21]:
system_prompt = """
You are a knowledgeable and careful medical assistant.
Provide clear, accurate, and general medical information.
Do not give personal medical advice.
Explain medical terms in simple language for patients.
When appropriate, use bullet points or structured formatting for long answers.
Always include a caution that this is general information, not a substitute for professional care.
"""

In [22]:
df['text'] = df.apply(lambda row: f"""<|begin_of_text|><|start_header_id|>system<|end_header_id|>{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>{row['Question']}<|eot_id|><|start_header_id|>assistant<|end_header_id|>{row['Answer']}<|eot_id|>""", axis=1)

In [23]:
df

Unnamed: 0,Question,Answer,text
0,Who is at risk for Lymphocytic Choriomeningiti...,LCMV infections can occur after exposure to fr...,<|begin_of_text|><|start_header_id|>system<|en...
1,What are the symptoms of Lymphocytic Choriomen...,LCMV is most commonly recognized as causing ne...,<|begin_of_text|><|start_header_id|>system<|en...
2,Who is at risk for Lymphocytic Choriomeningiti...,Individuals of all ages who come into contact ...,<|begin_of_text|><|start_header_id|>system<|en...
3,How to diagnose Lymphocytic Choriomeningitis (...,"During the first phase of the disease, the mos...",<|begin_of_text|><|start_header_id|>system<|en...
4,What are the treatments for Lymphocytic Chorio...,"Aseptic meningitis, encephalitis, or meningoen...",<|begin_of_text|><|start_header_id|>system<|en...
...,...,...,...
495,What are the treatments for Absence of the Sep...,Absence of the SP alone is not a disorder but ...,<|begin_of_text|><|start_header_id|>system<|en...
496,What is the outlook for Absence of the Septum ...,When the absence of the septum pellucidum is p...,<|begin_of_text|><|start_header_id|>system<|en...
497,what research (or clinical trials) is being do...,The mission of the National Institute of Neuro...,<|begin_of_text|><|start_header_id|>system<|en...
498,What is (are) Peripheral Neuropathy ?,Peripheral neuropathy describes damage to the ...,<|begin_of_text|><|start_header_id|>system<|en...


In [24]:
df['text'].iloc[0]

'<|begin_of_text|><|start_header_id|>system<|end_header_id|>\nYou are a knowledgeable and careful medical assistant.\nProvide clear, accurate, and general medical information.\nDo not give personal medical advice.\nExplain medical terms in simple language for patients.\nWhen appropriate, use bullet points or structured formatting for long answers.\nAlways include a caution that this is general information, not a substitute for professional care.\n<|eot_id|><|start_header_id|>user<|end_header_id|>Who is at risk for Lymphocytic Choriomeningitis (LCM)? ?<|eot_id|><|start_header_id|>assistant<|end_header_id|>LCMV infections can occur after exposure to fresh urine, droppings, saliva, or nesting materials from infected rodents.  Transmission may also occur when these materials are directly introduced into broken skin, the nose, the eyes, or the mouth, or presumably, via the bite of an infected rodent. Person-to-person transmission has not been reported, with the exception of vertical transmi

## Prepare Training and Evaluation Datasets

In [25]:
df=df.drop(columns=['Question','Answer'])

In [26]:
eval_df = df.sample(frac=0.1, random_state=42)

In [27]:
remaining_df = df.drop(eval_df.index)

In [28]:
from datasets import Dataset
train_dataset = Dataset.from_pandas(remaining_df)

In [29]:
eval_dataset=Dataset.from_pandas(eval_df)

In [30]:
train_dataset

Dataset({
    features: ['text', '__index_level_0__'],
    num_rows: 450
})

In [31]:
eval_dataset

Dataset({
    features: ['text', '__index_level_0__'],
    num_rows: 50
})

## Install Required Libraries

In [1]:
!pip install bitsandbytes transformers streamlit pyngrok peft  trl

Collecting bitsandbytes
  Downloading bitsandbytes-0.47.0-py3-none-manylinux_2_24_x86_64.whl.metadata (11 kB)
Collecting streamlit
  Downloading streamlit-1.49.1-py3-none-any.whl.metadata (9.5 kB)
Collecting pyngrok
  Downloading pyngrok-7.3.0-py3-none-any.whl.metadata (8.1 kB)
Collecting trl
  Downloading trl-0.22.2-py3-none-any.whl.metadata (11 kB)
Collecting pydeck<1,>=0.8.0b4 (from streamlit)
  Downloading pydeck-0.9.1-py2.py3-none-any.whl.metadata (4.1 kB)
Downloading bitsandbytes-0.47.0-py3-none-manylinux_2_24_x86_64.whl (61.3 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m61.3/61.3 MB[0m [31m11.8 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading streamlit-1.49.1-py3-none-any.whl (10.0 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m10.0/10.0 MB[0m [31m66.3 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading pyngrok-7.3.0-py3-none-any.whl (25 kB)
Downloading trl-0.22.2-py3-none-any.whl (544 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

## Load Pretrained Model
I will use `BioMistral/BioMistral-7B` for simplicity.


In [32]:
import os
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    pipeline,
    logging,
)
from peft import LoraConfig, PeftModel
from trl import SFTTrainer

In [None]:
# Instruction-tuned model
# model_name = "unsloth/llama-3.2-3b-instruct-bnb-4bit"
# model_name = "moonshotai/Kimi-K2-Instruct-0905"
# model_name = "BioMistral/BioMistral-7B"

In [None]:
# from huggingface_hub import login
# from google.colab import userdata

# login(token="<add_key_here>")

## Create and Load Model with 4-Bit Quantization

In [33]:
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

model_name = "BioMistral/BioMistral-7B"

def create_model_and_tokenizer():
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
        bnb_4bit_compute_dtype=torch.float16,
    )

    # Load model
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        quantization_config=bnb_config,
        device_map="auto",
        trust_remote_code=True
    )

    # Load tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.padding_side = "right"

    return model, tokenizer

# Initialize
model, tokenizer = create_model_and_tokenizer()
model.config.use_cache = False

pytorch_model.bin:   0%|          | 0.00/14.5G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/111 [00:00<?, ?B/s]

tokenizer_config.json: 0.00B [00:00, ?B/s]

tokenizer.model:   0%|          | 0.00/493k [00:00<?, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

special_tokens_map.json:   0%|          | 0.00/72.0 [00:00<?, ?B/s]

## Inspect Model Architecture

In [34]:
for name, module in model.named_modules():
    print(name, module.__class__.__name__)

 MistralForCausalLM
model MistralModel
model.embed_tokens Embedding
model.layers ModuleList
model.layers.0 MistralDecoderLayer
model.layers.0.self_attn MistralAttention
model.layers.0.self_attn.q_proj Linear4bit
model.layers.0.self_attn.k_proj Linear4bit
model.layers.0.self_attn.v_proj Linear4bit
model.layers.0.self_attn.o_proj Linear4bit
model.layers.0.mlp MistralMLP
model.layers.0.mlp.gate_proj Linear4bit
model.layers.0.mlp.up_proj Linear4bit
model.layers.0.mlp.down_proj Linear4bit
model.layers.0.mlp.act_fn SiLU
model.layers.0.input_layernorm MistralRMSNorm
model.layers.0.post_attention_layernorm MistralRMSNorm
model.layers.1 MistralDecoderLayer
model.layers.1.self_attn MistralAttention
model.layers.1.self_attn.q_proj Linear4bit
model.layers.1.self_attn.k_proj Linear4bit
model.layers.1.self_attn.v_proj Linear4bit
model.layers.1.self_attn.o_proj Linear4bit
model.layers.1.mlp MistralMLP
model.layers.1.mlp.gate_proj Linear4bit
model.layers.1.mlp.up_proj Linear4bit
model.layers.1.mlp.dow

## Configure LoRA for Parameter-Efficient Fine-Tuning

In [35]:
lora_r = 8
lora_alpha = 32
lora_dropout = 0.1

#["q_proj", "k_proj", "v_proj", "o_proj"]

peft_config = LoraConfig(
    r=lora_r,
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)

## Set Training Arguments & Initialize SFT Trainer

In [36]:
training_arguments = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=1e-4,
    num_train_epochs=1,
    fp16=True,
    gradient_checkpointing=True,
    logging_steps=1,
    save_strategy="epoch"
)


trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    peft_config=peft_config,
    args=training_arguments,
)


# To clear out cache for unsuccessful run
torch.cuda.empty_cache()

Adding EOS to train dataset:   0%|          | 0/450 [00:00<?, ? examples/s]

Tokenizing train dataset:   0%|          | 0/450 [00:00<?, ? examples/s]

Truncating train dataset:   0%|          | 0/450 [00:00<?, ? examples/s]

Adding EOS to eval dataset:   0%|          | 0/50 [00:00<?, ? examples/s]

Tokenizing eval dataset:   0%|          | 0/50 [00:00<?, ? examples/s]

Truncating eval dataset:   0%|          | 0/50 [00:00<?, ? examples/s]

## Start Fine-Tuning the Model

In [37]:
train_result = trainer.train()

  | |_| | '_ \/ _` / _` |  _/ -_)


<IPython.core.display.Javascript object>

[34m[1mwandb[0m: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
[34m[1mwandb[0m: You can find your API key in your browser here: https://wandb.ai/authorize?ref=models
wandb: Paste an API key from your profile and hit enter:

 ··········


[34m[1mwandb[0m: No netrc file found, creating one.
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc
[34m[1mwandb[0m: Currently logged in as: [33mahmdeltoky4[0m ([33mahmdeltoky4-faculty-of-engineering-tanta-university[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin


Step,Training Loss
1,1.8839
2,1.8054
3,1.7375
4,1.4891
5,1.3122
6,1.3174
7,1.1975
8,0.9427
9,1.0308
10,0.9951


In [40]:
trainer.model.save_pretrained("./results")
tokenizer.save_pretrained("./results")

('./results/tokenizer_config.json',
 './results/special_tokens_map.json',
 './results/chat_template.jinja',
 './results/tokenizer.model',
 './results/added_tokens.json',
 './results/tokenizer.json')

## Prompt Engineering Example
We customize the chatbot to act as a **knowledgeable medical assistant** by prepending instructions and few-shot examples to every user query.


## Define Few-Shot Prompting for Medical Chatbot

In [49]:
def generate_medical_fewshot(user_input):
    # Few-shot examples tailored for medical advice
    examples = (
        "Task: Provide accurate medical information.\n"
        "User: Who is at risk for Lymphocytic Choriomeningitis (LCM)?\n"
        "Bot: LCMV infections can occur after exposure to fresh urine, droppings, saliva, or nesting materials from infected rodents. Transmission may also occur when these materials are directly introduced into broken skin, the nose, the eyes, the mouth, or via bites. Person-to-person transmission is rare.\n"
        "User: What are the symptoms of LCM?\n"
        "Bot: LCMV infections may be asymptomatic or cause mild febrile illness. Some patients develop neurological symptoms such as meningitis, encephalitis, or meningoencephalitis. Pregnant women may pass the infection to the fetus, potentially causing birth defects.\n"
        "User: Who is at risk for LCM?\n"
        "Bot: Individuals of all ages exposed to urine, feces, saliva, or blood of wild mice, or pet rodents from contaminated colonies, are at risk. Laboratory workers handling infected animals are also at risk.\n"
        "User: How to diagnose LCM?\n"
        "Bot: Laboratory diagnosis is made by detecting IgM/IgG antibodies in CSF and serum, PCR testing, or virus isolation in the CSF during acute infection.\n"
        "User: What are the treatments for LCM?\n"
        "Bot: Treatment is supportive. Severe cases may require hospitalization. Corticosteroids may be considered, but there is no established antiviral treatment for humans.\n"
    )



    prompt = examples + f"User: {user_input}\nBot:"

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_length=512,
        do_sample=True,
        top_k=50,
        num_return_sequences=1,
        repetition_penalty=1.2
    )

    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response.split("Bot:")[-1].strip()

## Test the Medical Chatbot in Notebook
Here we test the few-shot medical chatbot directly in the notebook using a sample query.

For a full interactive experience, use the Streamlit app launched above to chat with the model in real time.


In [42]:
## testing
print(generate_medical_fewshot("What are the treatments for Lymphocytic Choriomeningitis (LCM) ?"))

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Task: Provide accurate medical information.
User: Who is at risk for Lymphocytic Choriomeningitis (LCM)?
Bot: LCMV infections can occur after exposure to fresh urine, droppings, saliva, or nesting materials from infected rodents. Transmission may also occur when these materials are directly introduced into broken skin, the nose, the eyes, the mouth, or via bites. Person-to-person transmission is rare.
User: What are the symptoms of LCM?
Bot: LCMV infections may be asymptomatic or cause mild febrile illness. Some patients develop neurological symptoms such as meningitis, encephalitis, or meningoencephalitis. Pregnant women may pass the infection to the fetus, potentially causing birth defects.
User: Who is at risk for LCM?
Bot: Individuals of all ages exposed to urine, feces, saliva, or blood of wild mice, or pet rodents from contaminated colonies, are at risk. Laboratory workers handling infected animals are also at risk.
User: How to diagnose LCM?
Bot: Laboratory diagnosis is made by 

## Deploy with Streamlit in Colab
We will create a `chatbot_app.py` file and run it with Streamlit + ngrok.

In [5]:
# import zipfile
# import os

# # Path to your zip file
# zip_path = "/content/drive/MyDrive/ahmed_eltokhy_session_9_med_chatboy/results-20250909T131111Z-1-001.zip"

# # Destination folder where you want to extract
# extract_dir = "/content/"

# # Make sure the destination folder exists
# os.makedirs(extract_dir, exist_ok=True)

# # Unzip the file
# with zipfile.ZipFile(zip_path, 'r') as zip_ref:
#     zip_ref.extractall(extract_dir)

# print(f"Unzipped files to {extract_dir}")


Unzipped files to /content/


In [50]:
%%writefile chatbot_app.py

import streamlit as st
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch
from peft import PeftModel

model_name = "BioMistral/BioMistral-7B"

# ---- Load model and tokenizer ----
@st.cache_resource(show_spinner=True)
def create_model_and_tokenizer():
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
        bnb_4bit_compute_dtype=torch.float16,
    )

    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        quantization_config=bnb_config,
        device_map="auto",
        trust_remote_code=True
    )
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.padding_side = "right"

    return model, tokenizer

model, tokenizer = create_model_and_tokenizer()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Load LoRA adapters
fine_tuned_model = PeftModel.from_pretrained(model, "./results")
fine_tuned_model.eval()

# ---- Chatbot function ----
def generate_medical_fewshot(user_input):
    examples = (
        "Task: Provide accurate medical information.\n"
        "User: Who is at risk for Lymphocytic Choriomeningitis (LCM)?\n"
        "Bot: LCMV infections can occur after exposure to fresh urine, droppings, saliva, or nesting materials from infected rodents.\n"
        "User: What are the symptoms of LCM?\n"
        "Bot: LCMV infections may be asymptomatic or cause mild febrile illness.\n"
    )
    prompt = examples + f"User: {user_input}\nBot:"
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=150,
        do_sample=True,
        top_k=50,
        num_return_sequences=1,
        repetition_penalty=1.2,
        eos_token_id=tokenizer.eos_token_id
    )
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response.split("Bot:")[-1].strip()

# ---- Streamlit UI ----
st.title("💊 Medical Chatbot")
st.markdown("Ask your medical questions below. This chatbot provides general medical information only.")

# Store only last response
if "last_response" not in st.session_state:
    st.session_state.last_response = ""

user_input = st.text_input("Your Question:")

if st.button("Send") and user_input:
    st.session_state.last_response = generate_medical_fewshot(user_input)

# ---- Display last bot response only ----
if st.session_state.last_response:
    st.markdown(f"**Bot:** {st.session_state.last_response}")

Overwriting chatbot_app.py


## Launch Medical Chatbot via Streamlit + Ngrok
This cell configures Ngrok with your auth token and runs the Streamlit app in Colab, exposing a public URL for accessing the medical chatbot.


In [None]:
from pyngrok import ngrok, conf

# Replace with your token
# NGROK_AUTH_TOKEN = "<add_key_here>"

!ngrok config add-authtoken $NGROK_AUTH_TOKEN


Authtoken saved to configuration file: /root/.config/ngrok/ngrok.yml


In [52]:
from pyngrok import ngrok
!streamlit run chatbot_app.py &>/dev/null &
url = ngrok.connect(8501)
print('Chatbot running at:', url)


Chatbot running at: NgrokTunnel: "https://9b1afddba529.ngrok-free.app" -> "http://localhost:8501"
