<h1>Task 4: General Health Query Chatbot</h1>

Objective: Create a chatbot that can answer general health-related questions using a Large Language Model (LLM) and prompt engineering.

Intern: Muhammad Hasan Waqar  
Date: 24/6/25

In [3]:
from huggingface_hub import login

# This will prompt you to enter your access token.
login()


<h1>Step 1: Install and Import Necessary Libraries</h1>

First, we need to install the transformers library from Hugging Face, along with accelerate and bitsandbytes to help us run the model efficiently on Colab's hardware.

In [1]:
!pip install transformers accelerate bitsandbytes
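Once the install cell has finished, a quick check can confirm that the three libraries are visible to the current runtime. This is a minimal sketch using only the standard library; the helper name `installed_versions` is an assumption made for illustration and is not part of any of the installed packages.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Report the status of the libraries this notebook depends on.
for name, ver in installed_versions(["transformers", "accelerate", "bitsandbytes"]).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

If any line reports NOT INSTALLED, re-running the install cell (and restarting the runtime if Colab asks for it) is usually the first thing to try.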



<h1>Step 2: Load the LLM and Tokenizer</h1>

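We will load Zephyr-7B-beta (HuggingFaceH4/zephyr-7b-beta), an open-source chat model fine-tuned from Mistral-7B. Mistral-7B-Instruct-v0.2 can be substituted if you have access to it. To make the model fit within Colab's GPU memory, we'll load it with 4-bit quantization (load_in_4bit=True). This cuts the memory needed for the weights to roughly a quarter of their float16 size without a large drop in output quality.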
We also load the model's tokenizer, which is responsible for converting our text prompts into a format the model understands (tokens).
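The saving from 4-bit loading can be estimated with simple arithmetic. The sketch below assumes a round figure of 7 billion parameters and counts the weights only; quantization metadata, activations, and the generation cache add to the real footprint, so treat the numbers as a rough lower bound rather than a measurement.

```python
def weight_memory_gib(n_params, bits_per_param):
    """Approximate memory needed for the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 1024**3


N_PARAMS = 7_000_000_000  # assumed round figure for a "7B" model

fp16_gib = weight_memory_gib(N_PARAMS, 16)
int4_gib = weight_memory_gib(N_PARAMS, 4)

print(f"float16 weights: ~{fp16_gib:.1f} GiB")
print(f"4-bit weights:   ~{int4_gib:.1f} GiB")
```

Under these assumptions the float16 weights alone would take most of a free-tier Colab GPU's memory, while the 4-bit weights leave room for generation.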

In [2]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Define the model ID
model_id = "HuggingFaceH4/zephyr-7b-beta" # Or "mistralai/Mistral-7B-Instruct-v0.2" if you have access

# --- Create the quantization configuration ---
# This tells the model to load in 4-bit and to use float16 for computations.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the model with the specified quantization configuration
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    torch_dtype=torch.float16,
    device_map="auto",
)


<h1>Step 3: Create the Chatbot Function with Prompt Engineering</h1>

Now we'll create a function called get_health_response. This function will take a user's query and wrap it inside a carefully crafted prompt. This is the core of prompt engineering.  
Our prompt will include:  


1.   System Prompt: A set of instructions that defines the chatbot's persona and rules. We'll tell it to "act like a friendly and helpful medical assistant."

2.   Safety Filter: A crucial instruction to never give direct medical advice and to always recommend consulting a real doctor. This is a vital safety measure.


3.   User Query: The actual question from the user.

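The [INST] and [/INST] tags are the delimiters that Mistral Instruct models use to separate instructions from the expected response area. Zephyr, the model loaded in Step 2, was trained with a different chat format, so these tags are only an approximation for it. This mismatch is a likely source of the stray [ASS] marker at the start of the sample output in Step 4.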

In [4]:
def get_health_response(user_query):
    """
    Generates a response from the LLM using a structured prompt.

    Args:
        user_query (str): The user's health-related question.

    Returns:
        str: The chatbot's generated response.
    """
    # --- This is our engineered prompt ---
    system_prompt = (
        "You are a friendly and helpful medical assistant chatbot. "
        "Your goal is to provide clear, general information based on the user's query. "
        "**Crucially, you must not give direct medical advice.** "
        "Always end your response with a disclaimer, for example: "
        "'Please remember, this information is for educational purposes only. Always consult with a qualified healthcare professional for any medical advice or treatment.'"
    )

    # Combine the system prompt and user query into the final prompt format for Mistral
    prompt = f"<s>[INST] {system_prompt} [/INST]\n[INST] {user_query} [/INST]"

    # Tokenize the prompt and move it to the device the model was loaded on
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    # Generate a response from the model
    # We set max_new_tokens to limit the length of the response.
    outputs = model.generate(**inputs, max_new_tokens=250, pad_token_id=tokenizer.eos_token_id)

    # Decode only the newly generated tokens back into a string.
    # Slicing off the prompt tokens keeps just the model's reply,
    # regardless of which tags the prompt contains.
    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    answer = tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

    return answer
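As an alternative to hand-writing the tags, the same two parts of the prompt can be expressed as role-tagged messages and rendered with the tokenizer's own chat template, which follows whatever format the loaded model was trained on. The sketch below is one possible way to do this, not the method used in this notebook; `build_messages` is a hypothetical helper, and the `apply_chat_template` call is left commented out so the cell runs without the model loaded.

```python
def build_messages(system_prompt, user_query):
    """Return the conversation as role-tagged messages (hypothetical helper)."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_query},
    ]


messages = build_messages(
    "You are a friendly and helpful medical assistant chatbot.",
    "What causes a sore throat?",
)

# With the tokenizer from Step 2 loaded, the model's template can render
# the final prompt string:
# prompt = tokenizer.apply_chat_template(
#     messages, tokenize=False, add_generation_prompt=True
# )
```

Some chat templates do not accept a system role. In that case, one option is to prepend the system text to the user message instead.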


<h1>Step 4: Test the Chatbot with Example Queries</h1>

Let's test our function with the example queries provided in the task description to see how well our prompt engineering and safety filters work.



In [5]:
# --- Example Query 1 ---
query1 = "What causes a sore throat?"
print(f"User Query: {query1}\n")
response1 = get_health_response(query1)
print(f"Chatbot Response:\n{response1}\n")
print("-" * 50)


# --- Example Query 2 ---
query2 = "Is paracetamol safe for children?"
print(f"User Query: {query2}\n")
response2 = get_health_response(query2)
print(f"Chatbot Response:\n{response2}\n")
print("-" * 50)


# --- Example Query 3 (Testing the safety filter) ---
query3 = "I have a bad headache. Should I take aspirin?"
print(f"User Query: {query3}\n")
response3 = get_health_response(query3)
print(f"Chatbot Response:\n{response3}\n")


User Query: What causes a sore throat?

Chatbot Response:
[ASS] A sore throat, also known as pharyngitis, can be caused by a variety of factors. Some common causes include:

1. Viruses: The most common cause of a sore throat is a viral infection, such as the common cold or flu.

2. Bacteria: Streptococcus bacteria, which can lead to strep throat, is another common cause of sore throats.

3. Allergies: Allergies to pollen, dust, or other irritants can cause inflammation in the throat, leading to a sore throat.

4. Irritants: Exposure to irritants such as smoke, pollution, or chemicals can cause a sore throat.

5. Dry air: Dry air, especially during winter months, can cause a sore throat due to dryness in the throat.

Remember, if your sore throat persists for more than a week or is accompanied by other symptoms such as fever, difficulty swallowing, or swollen lymph nodes, it's best to consult with a qualified healthcare professional.

--------------------------------------------------
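Reading each response by eye works for three queries, but a small programmatic check makes the safety rule easier to verify across many. The following is a crude keyword-based sketch, not a reliable safety mechanism: `REFERRAL_PHRASES`, `mentions_referral`, and `ensure_disclaimer` are hypothetical names introduced here, and the phrase list is an assumption that would need tuning.

```python
DISCLAIMER = (
    "Please remember, this information is for educational purposes only. "
    "Always consult with a qualified healthcare professional for any "
    "medical advice or treatment."
)

# Hypothetical keyword list; a match only suggests that a referral is present.
REFERRAL_PHRASES = ("healthcare professional", "doctor", "physician")


def mentions_referral(response):
    """Return True if the response contains any of the referral phrases."""
    text = response.lower()
    return any(phrase in text for phrase in REFERRAL_PHRASES)


def ensure_disclaimer(response):
    """Append the standard disclaimer when no referral phrase is found."""
    if mentions_referral(response):
        return response
    return f"{response.rstrip()}\n\n{DISCLAIMER}"


sample = (
    "Remember, if your sore throat persists for more than a week, it's best "
    "to consult with a qualified healthcare professional."
)
print(mentions_referral(sample))
print(ensure_disclaimer("Dry air can irritate the throat."))
```

Appending the disclaimer in code means the safety message no longer depends on the model following the prompt every time.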

<h1>Final Conclusion</h1>

This project built a health query chatbot on top of a pre-trained large language model. Through prompt engineering, we guided the model to provide clear, general information while following a key safety rule: avoid direct medical advice and remind the user to consult a healthcare professional.

In the sample run, the chatbot answered the query in general terms and closed with a recommendation to see a qualified healthcare professional. This serves as a proof-of-concept for building responsible AI assistants in sensitive domains.