# Simple Chatbot with Prompt Engineering & Streamlit Deployment
This notebook shows how to build a medical Q&A chatbot from a pretrained model, fine-tune it with LoRA, customize it with few-shot prompt engineering, and deploy it with Streamlit in Colab.

[**https://289061def254.ngrok-free.app/**](https://289061def254.ngrok-free.app/)

## Set Up Kaggle API Credentials

In [2]:
from google.colab import files
files.upload()  # upload kaggle.json

Saving kaggle.json to kaggle.json


{'kaggle.json': b'{"username":"ahmdeltoky","key":"<redacted>"}'}

In [3]:
!mkdir -p ~/.kaggle
!mv kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
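The three shell commands above can also be expressed in plain Python, which may be convenient outside Colab. This is a minimal sketch, not part of the original notebook; the function name `install_kaggle_credentials` and its `home` parameter are hypothetical and exist only for illustration.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def install_kaggle_credentials(src: str, home: Optional[str] = None) -> Path:
    """Move kaggle.json into <home>/.kaggle and restrict it to the owner.

    `home` is a hypothetical override for testing; by default the
    current user's home directory is used.
    """
    home_dir = Path(home) if home else Path.home()
    target_dir = home_dir / ".kaggle"
    target_dir.mkdir(parents=True, exist_ok=True)  # equivalent of `mkdir -p`
    target = target_dir / "kaggle.json"
    shutil.move(src, str(target))                  # equivalent of `mv`
    os.chmod(target, 0o600)                        # equivalent of `chmod 600`
    return target
```

Restricting the file to mode 600 matters because the Kaggle CLI warns when the credentials file is readable by other users.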

## Download and Extract Medical Q&A Dataset

In [4]:
## dataset
!kaggle datasets download -d thedevastator/comprehensive-medical-q-a-dataset

Dataset URL: https://www.kaggle.com/datasets/thedevastator/comprehensive-medical-q-a-dataset
License(s): CC0-1.0
Downloading comprehensive-medical-q-a-dataset.zip to /content
  0% 0.00/4.89M [00:00<?, ?B/s]
100% 4.89M/4.89M [00:00<00:00, 599MB/s]


In [5]:
!unzip comprehensive-medical-q-a-dataset.zip -d ./medical_dataset

Archive:  comprehensive-medical-q-a-dataset.zip
  inflating: ./medical_dataset/train.csv  


## Load and Inspect the Dataset

In [46]:
import pandas as pd

# Load the dataset
df = pd.read_csv('/content/medical_dataset/train.csv')
print(f"Dataset shape: {df.shape}")
print(df.head())

# Check basic info about the dataset
print("\nDataset info:")
print(df.info())
print("\nMissing values:")
print(df.isnull().sum())

Dataset shape: (16407, 3)
             qtype                                           Question  \
0   susceptibility  Who is at risk for Lymphocytic Choriomeningiti...   
1         symptoms  What are the symptoms of Lymphocytic Choriomen...   
2   susceptibility  Who is at risk for Lymphocytic Choriomeningiti...   
3  exams and tests  How to diagnose Lymphocytic Choriomeningitis (...   
4        treatment  What are the treatments for Lymphocytic Chorio...   

                                              Answer  
0  LCMV infections can occur after exposure to fr...  
1  LCMV is most commonly recognized as causing ne...  
2  Individuals of all ages who come into contact ...  
3  During the first phase of the disease, the mos...  
4  Aseptic meningitis, encephalitis, or meningoen...  

Dataset info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16407 entries, 0 to 16406
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   q

In [47]:
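# Keep only the question/answer columns and the first 1,250 rows
df = df[['Question', 'Answer']]
df = df[:1250].copy()  # copy so that adding a column later does not raise SettingWithCopyWarning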

## Preview Sample Questions and Answers

In [48]:
print("\nSample questions and answers:")
for i in range(5):
    print(f"Q: {df['Question'].iloc[i]}")
    print(f"A: {df['Answer'].iloc[i]}")
    print()


Sample questions and answers:
Q: Who is at risk for Lymphocytic Choriomeningitis (LCM)? ?
A: LCMV infections can occur after exposure to fresh urine, droppings, saliva, or nesting materials from infected rodents.  Transmission may also occur when these materials are directly introduced into broken skin, the nose, the eyes, or the mouth, or presumably, via the bite of an infected rodent. Person-to-person transmission has not been reported, with the exception of vertical transmission from infected mother to fetus, and rarely, through organ transplantation.

Q: What are the symptoms of Lymphocytic Choriomeningitis (LCM) ?
A: LCMV is most commonly recognized as causing neurological disease, as its name implies, though infection without symptoms or mild febrile illnesses are more common clinical manifestations. 
                
For infected persons who do become ill, onset of symptoms usually occurs 8-13 days after exposure to the virus as part of a biphasic febrile illness. This initial 

## Define System Prompt for Medical Assistant

In [49]:
system_prompt = """
You are a knowledgeable and careful medical assistant.
Provide clear, accurate, and general medical information.
Do not give personal medical advice.
Explain medical terms in simple language for patients.
When appropriate, use bullet points or structured formatting for long answers.
Always include a caution that this is general information, not a substitute for professional care.
"""

In [50]:
df['text'] = df.apply(lambda row: f"""<|begin_of_text|><|start_header_id|>system<|end_header_id|>{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>{row['Question']}<|eot_id|><|start_header_id|>assistant<|end_header_id|>{row['Answer']}<|eot_id|>""", axis=1)
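The template above concatenates each header directly with its content. As far as I know, the published Llama 3 chat format places a blank line (`\n\n`) after every `<|end_header_id|>`, so a variant that follows that layout is sketched below. The helper name `build_llama3_prompt` is hypothetical, and `tokenizer.apply_chat_template` remains the authoritative source for the exact format of a given checkpoint.

```python
from typing import Optional


def build_llama3_prompt(system: str, user: str, assistant: Optional[str] = None) -> str:
    """Build a Llama-3-style chat string (assumed layout, see lead-in).

    If `assistant` is None, the string ends with an open assistant header,
    which is the form used for inference rather than training.
    """
    def turn(role: str, content: str) -> str:
        # Each turn: role header, blank line, stripped content, end-of-turn token
        return f"<|start_header_id|>{role}<|end_header_id|>\n\n{content.strip()}<|eot_id|>"

    text = "<|begin_of_text|>" + turn("system", system) + turn("user", user)
    if assistant is None:
        return text + "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return text + turn("assistant", assistant)
```

A training row could then be built with `build_llama3_prompt(system_prompt, row['Question'], row['Answer'])`. Note that many tokenizers prepend their own beginning-of-text token, so it may be worth checking the tokenized output for a duplicated `<|begin_of_text|>`.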

In [51]:
df

Unnamed: 0,Question,Answer,text
0,Who is at risk for Lymphocytic Choriomeningiti...,LCMV infections can occur after exposure to fr...,<|begin_of_text|><|start_header_id|>system<|en...
1,What are the symptoms of Lymphocytic Choriomen...,LCMV is most commonly recognized as causing ne...,<|begin_of_text|><|start_header_id|>system<|en...
2,Who is at risk for Lymphocytic Choriomeningiti...,Individuals of all ages who come into contact ...,<|begin_of_text|><|start_header_id|>system<|en...
3,How to diagnose Lymphocytic Choriomeningitis (...,"During the first phase of the disease, the mos...",<|begin_of_text|><|start_header_id|>system<|en...
4,What are the treatments for Lymphocytic Chorio...,"Aseptic meningitis, encephalitis, or meningoen...",<|begin_of_text|><|start_header_id|>system<|en...
...,...,...,...
1245,what research (or clinical trials) is being do...,The NINDS conducts and supports research on TS...,<|begin_of_text|><|start_header_id|>system<|en...
1246,What is (are) Neurosarcoidosis ?,Neurosarcoidosis is a manifestation of sarcoid...,<|begin_of_text|><|start_header_id|>system<|en...
1247,What are the treatments for Neurosarcoidosis ?,There is no agreed upon standard of treatment ...,<|begin_of_text|><|start_header_id|>system<|en...
1248,What is the outlook for Neurosarcoidosis ?,The prognosis for patients with neurosarcoidos...,<|begin_of_text|><|start_header_id|>system<|en...


In [52]:
df['text'].iloc[0]

'<|begin_of_text|><|start_header_id|>system<|end_header_id|>\nYou are a knowledgeable and careful medical assistant.\nProvide clear, accurate, and general medical information.\nDo not give personal medical advice.\nExplain medical terms in simple language for patients.\nWhen appropriate, use bullet points or structured formatting for long answers.\nAlways include a caution that this is general information, not a substitute for professional care.\n<|eot_id|><|start_header_id|>user<|end_header_id|>Who is at risk for Lymphocytic Choriomeningitis (LCM)? ?<|eot_id|><|start_header_id|>assistant<|end_header_id|>LCMV infections can occur after exposure to fresh urine, droppings, saliva, or nesting materials from infected rodents.  Transmission may also occur when these materials are directly introduced into broken skin, the nose, the eyes, or the mouth, or presumably, via the bite of an infected rodent. Person-to-person transmission has not been reported, with the exception of vertical transmi

## Prepare Training and Evaluation Datasets

In [53]:
df=df.drop(columns=['Question','Answer'])

In [54]:
eval_df = df.sample(frac=0.1, random_state=42)

In [55]:
remaining_df = df.drop(eval_df.index)

In [56]:
from datasets import Dataset
train_dataset = Dataset.from_pandas(remaining_df)

In [57]:
eval_dataset=Dataset.from_pandas(eval_df)

In [58]:
train_dataset

Dataset({
    features: ['text', '__index_level_0__'],
    num_rows: 1125
})

In [59]:
eval_dataset

Dataset({
    features: ['text', '__index_level_0__'],
    num_rows: 125
})

## Install Required Libraries

In [20]:
!pip install bitsandbytes transformers streamlit pyngrok peft  trl

Collecting bitsandbytes
  Downloading bitsandbytes-0.47.0-py3-none-manylinux_2_24_x86_64.whl.metadata (11 kB)
Collecting streamlit
  Downloading streamlit-1.49.1-py3-none-any.whl.metadata (9.5 kB)
Collecting pyngrok
  Downloading pyngrok-7.3.0-py3-none-any.whl.metadata (8.1 kB)
Collecting trl
  Downloading trl-0.22.2-py3-none-any.whl.metadata (11 kB)
Collecting pydeck<1,>=0.8.0b4 (from streamlit)
  Downloading pydeck-0.9.1-py2.py3-none-any.whl.metadata (4.1 kB)
Downloading bitsandbytes-0.47.0-py3-none-manylinux_2_24_x86_64.whl (61.3 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 61.3/61.3 MB 7.7 MB/s eta 0:00:00
Downloading streamlit-1.49.1-py3-none-any.whl (10.0 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 10.0/10.0 MB 32.2 MB/s eta 0:00:00
Downloading pyngrok-7.3.0-py3-none-any.whl (25 kB)
Downloading trl-0.22.2-py3-none-any.whl (544 kB)

## Load Pretrained Model
This notebook uses `unsloth/llama-3.2-3b-instruct-bnb-4bit`, a 4-bit bitsandbytes build of Llama 3.2 3B Instruct, for simplicity.


In [21]:
import os
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    pipeline,
    logging,
)
from peft import LoraConfig, PeftModel
from trl import SFTTrainer

In [22]:
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers import BitsAndBytesConfig

# Instruction-tuned model
model_name = "unsloth/llama-3.2-3b-instruct-bnb-4bit"
# model_name = "moonshotai/Kimi-K2-Instruct-0905"

In [None]:
# from huggingface_hub import login
# from google.colab import userdata
# login(token=userdata.get('hugging_face_key'))

## Create and Load Model with 4-Bit Quantization

In [23]:
def create_model_and_tokenizer():
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
        bnb_4bit_compute_dtype=torch.float16,
    )

    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        use_safetensors=True,
        quantization_config=bnb_config,
        trust_remote_code=True,
        device_map="auto",
    )

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.padding_side = "right"

    return model, tokenizer

model, tokenizer = create_model_and_tokenizer()
model.config.use_cache = False

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


config.json: 0.00B [00:00, ?B/s]



model.safetensors:   0%|          | 0.00/2.24G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/234 [00:00<?, ?B/s]

tokenizer_config.json: 0.00B [00:00, ?B/s]

tokenizer.json:   0%|          | 0.00/17.2M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/454 [00:00<?, ?B/s]

chat_template.jinja: 0.00B [00:00, ?B/s]
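A back-of-the-envelope estimate helps explain why a 3B-parameter checkpoint downloads as roughly 2.24 GB. The sketch below assumes that only the linear layers are stored at 4 bits while the embedding stays at 16 bits; the parameter counts passed in are rough, assumed figures rather than values read from the model, and the estimate ignores the small overhead of quantization constants.

```python
def estimate_weight_gb(quantized_params: float,
                       full_precision_params: float = 0.0,
                       bits: int = 4,
                       full_precision_bytes: int = 2) -> float:
    """Estimate checkpoint size in GB (10**9 bytes) for a partly quantized model."""
    quantized_bytes = quantized_params * bits / 8          # 4 bits = half a byte per weight
    other_bytes = full_precision_params * full_precision_bytes
    return (quantized_bytes + other_bytes) / 1e9


# ASSUMED counts: about 2.8e9 weights in linear layers, about 0.39e9 in the embedding
print(round(estimate_weight_gb(2.8e9, 0.39e9), 2))  # roughly 2.18
```

The exact counts can be checked by summing `p.numel()` over `model.parameters()`, keeping in mind that packed 4-bit tensors may report fewer elements than the logical weight matrix.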

## Inspect Model Architecture

In [25]:
for name, module in model.named_modules():
    print(name, module.__class__.__name__)

 LlamaForCausalLM
model LlamaModel
model.embed_tokens Embedding
model.layers ModuleList
model.layers.0 LlamaDecoderLayer
model.layers.0.self_attn LlamaAttention
model.layers.0.self_attn.q_proj Linear4bit
model.layers.0.self_attn.k_proj Linear4bit
model.layers.0.self_attn.v_proj Linear4bit
model.layers.0.self_attn.o_proj Linear4bit
model.layers.0.mlp LlamaMLP
model.layers.0.mlp.gate_proj Linear4bit
model.layers.0.mlp.up_proj Linear4bit
model.layers.0.mlp.down_proj Linear4bit
model.layers.0.mlp.act_fn SiLU
model.layers.0.input_layernorm LlamaRMSNorm
model.layers.0.post_attention_layernorm LlamaRMSNorm
model.layers.1 LlamaDecoderLayer
model.layers.1.self_attn LlamaAttention
model.layers.1.self_attn.q_proj Linear4bit
model.layers.1.self_attn.k_proj Linear4bit
model.layers.1.self_attn.v_proj Linear4bit
model.layers.1.self_attn.o_proj Linear4bit
model.layers.1.mlp LlamaMLP
model.layers.1.mlp.gate_proj Linear4bit
model.layers.1.mlp.up_proj Linear4bit
model.layers.1.mlp.down_proj Linear4bit
mo

## Configure LoRA for Parameter-Efficient Fine-Tuning

In [60]:
lora_r = 16
lora_alpha = 32
lora_dropout = 0.1

#["q_proj", "k_proj", "v_proj", "o_proj"]

peft_config = LoraConfig(
    r=lora_r,
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    target_modules=["q_proj", "k_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)
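To get a feel for how small the adapter is, the number of trainable LoRA parameters can be estimated by hand: each adapted linear layer of shape `in_features × out_features` gains two low-rank matrices with `r * (in_features + out_features)` weights in total. The sketch below assumes a hidden size of 3072, a key projection width of 1024, and 28 decoder layers for this checkpoint; these dimensions are assumptions and should be confirmed against `model.config`.

```python
def lora_param_count(in_features: int, out_features: int, r: int) -> int:
    """Trainable weights LoRA adds to one linear layer: A is (r x in), B is (out x r)."""
    return r * (in_features + out_features)


# ASSUMED model dimensions; verify with model.config before relying on them
hidden_size = 3072   # input width of q_proj and k_proj
kv_dim = 1024        # output width of k_proj under grouped-query attention
num_layers = 28

r, alpha = 16, 32    # values taken from the LoraConfig above

per_layer = (lora_param_count(hidden_size, hidden_size, r)   # q_proj
             + lora_param_count(hidden_size, kv_dim, r))     # k_proj
total = per_layer * num_layers
scaling = alpha / r  # factor applied to the low-rank update

print(f"trainable LoRA parameters: {total:,}")
print(f"LoRA scaling factor: {scaling}")
```

Under these assumptions the adapter has a few million trainable weights, a small fraction of the base model. After training starts, `trainer.model.print_trainable_parameters()` should report the actual figure.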

## Set Training Arguments & Initialize SFT Trainer

In [61]:
training_arguments = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=1e-4,
    num_train_epochs=1,
    fp16=True,
    gradient_checkpointing=True,
)

trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    peft_config=peft_config,
    args=training_arguments,
)


# Free cached GPU memory left over from an earlier failed run
torch.cuda.empty_cache()



Adding EOS to train dataset:   0%|          | 0/1125 [00:00<?, ? examples/s]

Tokenizing train dataset:   0%|          | 0/1125 [00:00<?, ? examples/s]

Truncating train dataset:   0%|          | 0/1125 [00:00<?, ? examples/s]

Adding EOS to eval dataset:   0%|          | 0/125 [00:00<?, ? examples/s]

Tokenizing eval dataset:   0%|          | 0/125 [00:00<?, ? examples/s]

Truncating eval dataset:   0%|          | 0/125 [00:00<?, ? examples/s]
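With `per_device_train_batch_size=1` and `gradient_accumulation_steps=8`, each optimizer update sees 8 examples. The sketch below estimates the effective batch size and the number of optimizer steps for the 1,125 training rows. The function name is hypothetical, and the trainer's own rounding of the final partial batch may differ by one step depending on the library version.

```python
import math


def training_schedule(num_examples: int,
                      per_device_batch_size: int,
                      grad_accum_steps: int,
                      epochs: int = 1,
                      num_devices: int = 1) -> tuple:
    """Return (effective batch size, estimated optimizer steps)."""
    effective = per_device_batch_size * grad_accum_steps * num_devices
    steps_per_epoch = math.ceil(num_examples / effective)  # rounding is an estimate
    return effective, steps_per_epoch * epochs


effective_batch, total_steps = training_schedule(1125, 1, 8, epochs=1)
print(effective_batch, total_steps)
```

Because the default logging interval in `TrainingArguments` is several hundred steps, a run this short may finish without printing any loss values; setting `logging_steps` to a smaller number would make the loss curve visible.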

## Start Fine-Tuning the Model

In [62]:
train_result = trainer.train()

Step,Training Loss


## Prompt Engineering Example
We customize the chatbot to act as a **knowledgeable medical assistant** by prepending instructions and few-shot examples to every user query.


## Define Few-Shot Prompting for Medical Chatbot

In [63]:
def generate_medical_fewshot(user_input):
    # Few-shot examples tailored for medical advice
    examples = (
        "Task: Provide accurate medical information.\n"
        "User: Who is at risk for Lymphocytic Choriomeningitis (LCM)?\n"
        "Bot: LCMV infections can occur after exposure to fresh urine, droppings, saliva, or nesting materials from infected rodents. Transmission may also occur when these materials are directly introduced into broken skin, the nose, the eyes, the mouth, or via bites. Person-to-person transmission is rare.\n"
        "User: What are the symptoms of LCM?\n"
        "Bot: LCMV infections may be asymptomatic or cause mild febrile illness. Some patients develop neurological symptoms such as meningitis, encephalitis, or meningoencephalitis. Pregnant women may pass the infection to the fetus, potentially causing birth defects.\n"
        "User: Who is at risk for LCM?\n"
        "Bot: Individuals of all ages exposed to urine, feces, saliva, or blood of wild mice, or pet rodents from contaminated colonies, are at risk. Laboratory workers handling infected animals are also at risk.\n"
        "User: How to diagnose LCM?\n"
        "Bot: Laboratory diagnosis is made by detecting IgM/IgG antibodies in CSF and serum, PCR testing, or virus isolation in the CSF during acute infection.\n"
        "User: What are the treatments for LCM?\n"
        "Bot: Treatment is supportive. Severe cases may require hospitalization. Corticosteroids may be considered, but there is no established antiviral treatment for humans.\n"
    )



    prompt = examples + f"User: {user_input}\nBot:"

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_length=512,
        do_sample=True,
        top_k=50,
        num_return_sequences=1,
        repetition_penalty=1.2
    )

    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response.strip()

## Test the Medical Chatbot in Notebook
Here we test the few-shot medical chatbot directly in the notebook using a sample query.

For a full interactive experience, use the Streamlit app launched later in this notebook to chat with the model in real time.


In [75]:
## testing
print(generate_medical_fewshot("What are the treatments for Lymphocytic Choriomeningitis (LCM) ?"))



Task: Provide accurate medical information.
User: Who is at risk for Lymphocytic Choriomeningitis (LCM)?
Bot: LCMV infections can occur after exposure to fresh urine, droppings, saliva, or nesting materials from infected rodents. Transmission may also occur when these materials are directly introduced into broken skin, the nose, the eyes, the mouth, or via bites. Person-to-person transmission is rare.
User: What are the symptoms of LCM?
Bot: LCMV infections may be asymptomatic or cause mild febrile illness. Some patients develop neurological symptoms such as meningitis, encephalitis, or meningoencephalitis. Pregnant women may pass the infection to the fetus, potentially causing birth defects.
User: Who is at risk for LCM?
Bot: Individuals of all ages exposed to urine, feces, saliva, or blood of wild mice, or pet rodents from contaminated colonies, are at risk. Laboratory workers handling infected animals are also at risk.
User: How to diagnose LCM?
Bot: Laboratory diagnosis is made by 
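The recorded output begins with the few-shot examples because, for a causal language model, `tokenizer.decode(outputs[0])` returns the prompt together with the continuation. A minimal post-processing sketch, assuming the `User:` / `Bot:` layout used in `generate_medical_fewshot`, is shown below; the helper name `extract_reply` is hypothetical.

```python
def extract_reply(decoded: str, user_input: str) -> str:
    """Return only the bot's answer to the final user turn.

    Falls back to the full text if the expected marker is missing.
    """
    marker = f"User: {user_input}\nBot:"
    _, found, tail = decoded.rpartition(marker)
    if not found:
        return decoded.strip()
    # The model may keep going and invent another "User:" turn; cut it off
    reply = tail.split("\nUser:", 1)[0]
    return reply.strip()
```

An alternative that avoids string matching is to decode only the new tokens, for example `outputs[0][inputs["input_ids"].shape[1]:]`. Passing `max_new_tokens` instead of `max_length` to `generate` also keeps the long few-shot prompt from consuming the generation budget.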

## Using Another Model

In [74]:
# from transformers import AutoTokenizer, AutoModelForCausalLM
# from transformers import BitsAndBytesConfig
# import torch

# model_name_2 = "BioMistral/BioMistral-7B"

# bnb_config = BitsAndBytesConfig(
#     load_in_8bit=True,
#     bnb_8bit_use_double_quant=True,
#     bnb_8bit_quant_type="nf4",
#     bnb_8bit_compute_dtype=torch.float16,
# )

# model_1 = AutoModelForCausalLM.from_pretrained(
#     model_name_2,
#     quantization_config=bnb_config,
#     trust_remote_code=True,
#     device_map="auto",
#     use_safetensors=True,
# )

# tokenizer = AutoTokenizer.from_pretrained(model_name_2)
# tokenizer.pad_token = tokenizer.eos_token
# tokenizer.padding_side = "right"

In [None]:
# from peft import get_peft_model, LoraConfig
# from peft import TaskType

# lora_config = LoraConfig(
#     r=16,
#     lora_alpha=32,
#     target_modules=["q_proj", "v_proj"],
#     lora_dropout=0.1,
#     bias="none",
#     task_type=TaskType.CAUSAL_LM,
# )

# model_1 = get_peft_model(model_1, lora_config)


In [None]:
# from transformers import Trainer, TrainingArguments

# training_args = TrainingArguments(
#     output_dir="./output",
#     evaluation_strategy="epoch",
#     learning_rate=2e-5,
#     per_device_train_batch_size=1,
#     gradient_accumulation_steps=16,
#     num_train_epochs=3,
#     logging_dir="./logs",
#     logging_steps=10,
#     save_strategy="epoch",
#     save_total_limit=3,
# )


In [None]:
# from transformers import DataCollatorForSeq2Seq

# data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)

# trainer = Trainer(
#     model=model,
#     args=training_args,
#     train_dataset=dataset["train"],
#     eval_dataset=dataset["validation"],
#     tokenizer=tokenizer,
#     data_collator=data_collator,
# )

# trainer.train()


In [None]:
# trainer.save_model("./fine_tuned_biomistral_7b")

## Deploy with Streamlit in Colab
We will create a `chatbot_app.py` file and run it with Streamlit + ngrok. Note that the app loads `google/flan-t5-small` rather than the fine-tuned Llama model.

In [65]:
%%writefile chatbot_app.py
import streamlit as st
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load instruction-tuned model
model_name = "google/flan-t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Few-shot chatbot function
def generate_medical_fewshot(user_input):
    # Few-shot examples tailored for medical advice
    examples = (
        "Task: Provide accurate medical information.\n"
        "User: Who is at risk for Lymphocytic Choriomeningitis (LCM)?\n"
        "Bot: LCMV infections can occur after exposure to fresh urine, droppings, saliva, or nesting materials from infected rodents. Transmission may also occur when these materials are directly introduced into broken skin, the nose, the eyes, the mouth, or via bites. Person-to-person transmission is rare.\n"
        "User: What are the symptoms of LCM?\n"
        "Bot: LCMV infections may be asymptomatic or cause mild febrile illness. Some patients develop neurological symptoms such as meningitis, encephalitis, or meningoencephalitis. Pregnant women may pass the infection to the fetus, potentially causing birth defects.\n"
        "User: Who is at risk for LCM?\n"
        "Bot: Individuals of all ages exposed to urine, feces, saliva, or blood of wild mice, or pet rodents from contaminated colonies, are at risk. Laboratory workers handling infected animals are also at risk.\n"
        "User: How to diagnose LCM?\n"
        "Bot: Laboratory diagnosis is made by detecting IgM/IgG antibodies in CSF and serum, PCR testing, or virus isolation in the CSF during acute infection.\n"
        "User: What are the treatments for LCM?\n"
        "Bot: Treatment is supportive. Severe cases may require hospitalization. Corticosteroids may be considered, but there is no established antiviral treatment for humans.\n"
    )


    prompt = examples + f"User: {user_input}\nBot:"

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_length=256,
        do_sample=True,
        top_k=50,
        num_return_sequences=1,
        repetition_penalty=1.2
    )

    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response.strip()

# Streamlit UI
st.title("💬 Few-Shot Medical Chatbot")
st.write("This chatbot answers general medical questions using few-shot prompting.")

user_input = st.text_input("You:")
if st.button("Send") and user_input:
    st.write("Bot:", generate_medical_fewshot(user_input))


Writing chatbot_app.py


## Launch Medical Chatbot via Streamlit + Ngrok
This cell configures Ngrok with your auth token and runs the Streamlit app in Colab, exposing a public URL for accessing the medical chatbot.


In [70]:
from pyngrok import ngrok, conf

# Replace with your own token; never commit a real token to a shared notebook
NGROK_AUTH_TOKEN = "YOUR_NGROK_AUTH_TOKEN"

!ngrok config add-authtoken $NGROK_AUTH_TOKEN

Authtoken saved to configuration file: /root/.config/ngrok/ngrok.yml
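To keep the token out of the notebook source entirely, it can be read from an environment variable at run time. This is a minimal sketch; the helper name `read_secret` is hypothetical, and in Colab the `google.colab.userdata.get` call shown (commented out) earlier in this notebook serves the same purpose.

```python
import os


def read_secret(name: str) -> str:
    """Fetch a secret from the environment, failing loudly if it is missing."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before running this cell.")
    return value
```

The token would then be loaded with `NGROK_AUTH_TOKEN = read_secret("NGROK_AUTH_TOKEN")`, so a shared or published copy of the notebook never contains the value itself.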


In [71]:
from pyngrok import ngrok
!streamlit run chatbot_app.py &>/dev/null &
url = ngrok.connect(8501)
print('Chatbot running at:', url)

Chatbot running at: NgrokTunnel: "https://289061def254.ngrok-free.app" -> "http://localhost:8501"


In [None]:
!pip install -U bitsandbytes