<a href="https://colab.research.google.com/github/Zeeshan506/developerhub-task-4-general-health-query-chatbot/blob/main/Task_4_General_Health_Query_Chatbot_%7C_DeveloperHub_Internship.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#  **Task 4: General Health Query Chatbot**
---

In this notebook, we'll build a General Health Query Chatbot using Mistral-7B-Instruct, an open-source large language model. The goal is to let users ask common health-related questions and get informative, conversational responses, all directly in the notebook through an easy-to-use interface.

---

To achieve this, we will:

- Load the Mistral-7B-Instruct model via Hugging Face Transformers.
- Use Gradio to build an interactive chatbot UI.
- Implement a function (`ask_bot`) to generate model responses to user questions.
- Run inference efficiently using GPU acceleration in Colab (with `torch_dtype=torch.float16` and `device_map="auto"`).

By the end of this notebook, you'll have a fully functional chatbot interface capable of answering general health queries interactively.

## Installing the Hugging Face Libraries

In [1]:
!pip install transformers accelerate huggingface_hub


## Logging in to the Hugging Face Hub

In [10]:
from huggingface_hub import login
from google.colab import userdata

login(userdata.get('HF_TOKEN'))
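
Outside Colab, `google.colab.userdata` is not available, so the login cell above would fail on import. A minimal sketch of a fallback, assuming the token is also stored in an environment variable named `HF_TOKEN`; `resolve_hf_token` is a hypothetical helper name, not part of any library:

```python
import os

def resolve_hf_token(env=None):
    """Return a Hugging Face token from Colab secrets or the environment.

    Hypothetical helper: tries Colab's secret store first, then falls
    back to the HF_TOKEN environment variable (an assumed name).
    """
    env = os.environ if env is None else env
    try:
        # This import only succeeds inside Colab.
        from google.colab import userdata
        token = userdata.get("HF_TOKEN")
    except Exception:
        # Not in Colab, or the secret is missing or inaccessible.
        token = env.get("HF_TOKEN")
    if not token:
        raise RuntimeError("Set HF_TOKEN as a Colab secret or an environment variable.")
    return token
```

With this in place, the login call would become `login(resolve_hf_token())`.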

## Fetching and Loading the Model

In Colab, make sure the runtime type is set to a T4 GPU.
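
Before downloading several gigabytes of weights, it can be worth confirming that the runtime really has a GPU attached. The following is a sketch that assumes PyTorch is installed (as it is on Colab); `describe_gpu` is a hypothetical helper name:

```python
def describe_gpu():
    """Return a short description of the first CUDA device, if any."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    if not torch.cuda.is_available():
        return "No CUDA GPU detected; in Colab, change the runtime type to T4 GPU"
    # Read the name and total memory of device 0.
    props = torch.cuda.get_device_properties(0)
    total_gib = props.total_memory / 1024**3
    return f"{props.name} with {total_gib:.1f} GiB of memory"

print(describe_gpu())
```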

In [32]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "mistralai/Mistral-7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16
)


Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/4.54G [00:00<?, ?B/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/9.94G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/116 [00:00<?, ?B/s]
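
The two shard sizes in the download output (4.54 GB and 9.94 GB) add up to roughly what half-precision weights should occupy, which is why `torch_dtype=torch.float16` matters on a T4. A back-of-the-envelope sketch, assuming about 7.24 billion parameters for Mistral-7B and 16 GB of memory on a T4 (both figures are assumptions, not stated in this notebook):

```python
def weight_memory_gb(num_params, bytes_per_param):
    """Approximate memory for the model weights alone, in GB (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9

NUM_PARAMS = 7.24e9   # assumed parameter count for Mistral-7B
T4_MEMORY_GB = 16     # assumed total memory of a T4

fp16_gb = weight_memory_gb(NUM_PARAMS, 2)  # float16 uses 2 bytes per parameter
fp32_gb = weight_memory_gb(NUM_PARAMS, 4)  # float32 uses 4 bytes per parameter

print(f"float16 weights: {fp16_gb:.2f} GB, below T4 memory: {fp16_gb < T4_MEMORY_GB}")
print(f"float32 weights: {fp32_gb:.2f} GB, below T4 memory: {fp32_gb < T4_MEMORY_GB}")
```

This counts weights only. Activations and the attention cache need additional memory, so the float16 fit is tight, and `device_map="auto"` may place some layers on the CPU if the GPU runs out of room.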



In [34]:
!pip install gradio --quiet  # Gradio adds a UI to the notebook and can provide a shareable link

In [35]:
import gradio as gr

## Defining the Pipeline
We define a function, `ask_bot`, that sends a question to the loaded model and returns its answer. Generation is fast on a dedicated GPU; on a Colab T4, a response can take about 2-3 minutes.

In [36]:
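def ask_bot(prompt, max_new_tokens=256):
    # Wrap the question in Mistral's [INST] ... [/INST] instruction format.
    instruction_prompt = f"[INST] You are a helpful AI health assistant. Answer clearly and kindly.\n\n{prompt} [/INST]"

    # Tokenize and move the tensors to the device the model was loaded on.
    inputs = tokenizer(instruction_prompt, return_tensors="pt").to(model.device)

    # Sample a response without tracking gradients.
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.7,
            top_p=0.9,
            pad_token_id=tokenizer.eos_token_id
        )

    # Decode only the newly generated tokens. Slicing off the prompt tokens is
    # more reliable than string replacement, because the decoded text may not
    # match the original prompt character for character.
    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()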


with gr.Blocks() as demo:
    gr.Markdown("## 🤖 General Health Assistant (Mistral 7B-Instruct)")
    with gr.Row():
        query = gr.Textbox(label="Ask your health-related question", placeholder="e.g. What are the symptoms of diabetes?")
    with gr.Row():
        response = gr.Textbox(label="Bot's Response")
    with gr.Row():
        submit = gr.Button("Submit")

    submit.click(fn=ask_bot, inputs=query, outputs=response)


demo.launch(share=True)


Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://8b19865e60ea0e477c.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
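
Because a response can take minutes on a T4, it may help to reject empty questions before they reach the model and to report how long generation took. A minimal sketch: `guarded_ask` is a hypothetical wrapper, `generate_fn` stands in for `ask_bot`, and the `max_chars` limit is an arbitrary illustrative value.

```python
import time

def guarded_ask(question, generate_fn, max_chars=1000):
    """Validate the question, call generate_fn, and append the elapsed time.

    generate_fn is any callable mapping a prompt string to a response
    string (for example, the notebook's ask_bot).
    """
    question = (question or "").strip()
    if not question:
        return "Please enter a question."
    if len(question) > max_chars:
        return f"Please keep the question under {max_chars} characters."

    # Time only the model call, not the validation above.
    start = time.perf_counter()
    answer = generate_fn(question)
    elapsed = time.perf_counter() - start
    return f"{answer}\n\n(Generated in {elapsed:.1f} s)"
```

To use it in the interface, the click handler could be wired as `submit.click(fn=lambda q: guarded_ask(q, ask_bot), inputs=query, outputs=response)`.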


