# Medical Information Assistant (Colab Edition)

This notebook runs a self-contained medical information assistant that answers general health questions from inside the Colab runtime.

**Features:**
- Runs entirely on Google Colab (Free Tier T4 GPU supported)
- Uses an open-source model (Llama-3-8B-Instruct, 4-bit quantized, or a similar model)
- Instructs the model through a system prompt to follow safety rules: no diagnosis, no prescriptions.
- No external API keys required.

**Instructions:**
1.  Connect to a GPU runtime (Runtime > Change runtime type > T4 GPU).
2.  Run all cells in order.
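
Before running the remaining cells, it can help to confirm that a GPU is actually attached. The following is a minimal sketch, assuming the `nvidia-smi` command-line tool is on the PATH (as it typically is on a Colab GPU runtime). `list_gpus` is a hypothetical helper name used only for illustration; it is not part of this notebook.

```python
import shutil
import subprocess


def list_gpus():
    """Return one 'name, total memory' string per visible NVIDIA GPU.

    Returns an empty list when nvidia-smi is missing or reports an error,
    which usually means the runtime is CPU-only.
    """
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
            capture_output=True,
            text=True,
        )
    except OSError:
        return []
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = list_gpus()
if gpus:
    print("GPU detected:", "; ".join(gpus))
else:
    print("No GPU detected. Select Runtime > Change runtime type > T4 GPU.")
```

If the second message appears, switch the runtime type and rerun the notebook from the top.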

## 1. Install Dependencies
We install `unsloth` for faster 4-bit inference, along with the other libraries it needs.

In [None]:
%%capture
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps "xformers<0.0.27" "trl<0.9.0" peft accelerate bitsandbytes
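
Because `%%capture` hides the pip output, a failed install can go unnoticed. As a rough sketch, the check below could be run in a separate cell to report which packages are present. `installed_version` is a hypothetical helper, and the package list is simply copied from the pip commands above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Package names taken from the pip commands in the install cell.
for name in ["unsloth", "xformers", "trl", "peft", "accelerate", "bitsandbytes"]:
    print(f"{name}: {installed_version(name) or 'NOT INSTALLED'}")
```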

## 2. Load the Model
We use `unsloth/llama-3-8b-Instruct-bnb-4bit`, a 4-bit (bitsandbytes) quantization of Llama-3-8B-Instruct. It is small enough to fit on Colab's T4 GPU and is tuned to follow instructions.

In [None]:
from unsloth import FastLanguageModel
import torch

max_seq_length = 2048 # Maximum context length in tokens. Unsloth handles RoPE scaling internally.
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-Instruct-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
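
To see why 4-bit loading matters on a T4 (which has 16 GB of GPU memory), here is a back-of-the-envelope estimate of the memory taken by the weights alone. It assumes roughly 8 billion parameters and ignores quantization overhead, layers kept at higher precision, activations, and the KV cache, so real usage will be higher. `weight_memory_gib` is a hypothetical helper for illustration.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Approximate memory needed for model weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


NUM_PARAMS = 8e9  # assumption: roughly 8 billion parameters

fp16_gib = weight_memory_gib(NUM_PARAMS, 16)
int4_gib = weight_memory_gib(NUM_PARAMS, 4)

print(f"16-bit weights: ~{fp16_gib:.1f} GiB")
print(f" 4-bit weights: ~{int4_gib:.1f} GiB")
```

Under these assumptions the 16-bit weights come to about 14.9 GiB, leaving almost no headroom on a T4, while the 4-bit weights come to about 3.7 GiB.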

## 3. Configure the Assistant
Here we define the system prompt that instructs the model to behave safely and professionally.

In [None]:
# System prompt: sets the assistant's safety rules and answer style
medical_system_prompt = """
You are a medical information assistant. Your purpose is to provide general, educational health information.

STRICT RULES:
- Do NOT diagnose diseases.
- Do NOT prescribe medicines or treatments.
- Do NOT recommend specific drugs (no antibiotics, steroids, or prescriptions).
- Do NOT ask follow-up questions.
- Do NOT role-play as a doctor.
- Do NOT include legal disclaimers.

Answer style:
- Use simple, clear language.
- Respond in 3–5 short bullet points.
- Focus on general care, lifestyle habits, hygiene, rest, hydration, and symptom monitoring.
- If medical certainty is not possible, begin with: "General health advice includes:"

Safety behavior:
- If input is vague, politely request a clearer question.
- If question exceeds general education, provide high-level guidance only.
- Never hallucinate medical facts.
"""

def generate_response(user_input):
    messages = [
        {"role": "system", "content": medical_system_prompt},
        {"role": "user", "content": user_input},
    ]
    
    inputs = tokenizer.apply_chat_template(
        messages,
        tokenize = True,
        add_generation_prompt = True, # Must add for generation
        return_tensors = "pt",
    ).to("cuda")

    outputs = model.generate(inputs, max_new_tokens = 512, use_cache = True)

    # outputs holds the prompt tokens followed by the newly generated tokens.
    # Slice off the prompt so that only the assistant's reply is decoded.
    return tokenizer.batch_decode(outputs[:, inputs.shape[1]:], skip_special_tokens=True)[0]
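
A system prompt only requests a format; the model may not always comply. As a minimal sketch, a reply could be checked against the "3–5 short bullet points" rule before it is shown. `count_bullets` and `follows_bullet_rule` are hypothetical helpers, and the regular expression only recognizes `-`, `*`, and `•` markers (numbered lists are not counted).

```python
import re

# Matches lines that start with a common bullet marker: -, *, or •
# (optionally indented), followed by whitespace and some text.
BULLET_RE = re.compile(r"^\s*[-*•]\s+\S")


def count_bullets(text):
    """Count lines in text that look like bullet points."""
    return sum(1 for line in text.splitlines() if BULLET_RE.match(line))


def follows_bullet_rule(text, low=3, high=5):
    """Return True if the reply contains between low and high bullets."""
    return low <= count_bullets(text) <= high


sample = "General health advice includes:\n- Rest well.\n- Drink water.\n- Monitor symptoms."
print(follows_bullet_rule(sample))
```

One possible use is to call `follows_bullet_rule(generate_response(question))` and regenerate or flag the reply when it returns `False`.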

## 4. Run Medical Assistant
Run the cell below to start chatting.

In [None]:
print("medi-bot: Hello! I am your AI medical assistant. I can help with general health questions.")
print("          (Type 'exit' to stop)")
print("-" * 60)

while True:
    try:
        # Strip whitespace once so that " exit " and blank lines are handled consistently
        user_input = input("You: ").strip()
        if user_input.lower() in ["exit", "quit"]:
            print("medi-bot: Stay healthy! Goodbye.")
            break

        if not user_input:
            continue

        print("medi-bot:", end=" ")
        response = generate_response(user_input)
        print(response)
        print("-" * 60)
    except KeyboardInterrupt:
        print("\nExiting...")
        break
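
The loop above mixes input handling with model calls, which makes it hard to test without a GPU. One possible refactoring, sketched here under the assumption that the per-turn logic is moved into a function, takes the generator as a parameter so a stub can stand in for the model. `handle_turn` and `fake_generate` are hypothetical names; in the notebook, `generate_response` would be passed instead of the stub.

```python
EXIT_COMMANDS = {"exit", "quit"}


def handle_turn(user_input, generate_fn):
    """Process one line of user input.

    Returns (reply, keep_going):
      - reply is the text to print, or None when there is nothing to print
      - keep_going is False once the user asks to exit
    """
    text = user_input.strip()
    if text.lower() in EXIT_COMMANDS:
        return "Stay healthy! Goodbye.", False
    if not text:
        return None, True
    return generate_fn(text), True


# Stub generator standing in for the model, so the logic runs without a GPU.
def fake_generate(prompt):
    return f"(model reply to: {prompt})"


for line in ["", "How much water should I drink?", "quit"]:
    reply, keep_going = handle_turn(line, fake_generate)
    if reply is not None:
        print("medi-bot:", reply)
    if not keep_going:
        break
```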
