<a href="https://colab.research.google.com/github/GVarshitha2110/Cosmetic/blob/main/main.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install pandas gradio torch datasets transformers




In [2]:
# ===========================================================
# RUN ADVANCED MEDICAL SYMPTOM CHECKER
# ===========================================================

import os
os.environ["PJRT_DEVICE"] = "CPU"
os.environ["COLAB_TPU_ADDR"] = ""

# Import libraries
import pandas as pd
import gradio as gr
import torch
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    pipeline,
)

# -----------------------
# Create small dataset
# -----------------------
data = {
    "text": [
        "fever cough tiredness",
        "headache dizziness nausea",
        "chest pain shortness of breath",
        "skin rash itching",
        "joint pain swelling",
        "abdominal pain vomiting",
        "sore throat fever",
        "runny nose sneezing",
        "back pain stiffness",
        "eye redness irritation",
    ],
    "labels": [
        "Viral Infection",
        "Migraine",
        "Heart Problem",
        "Skin Allergy",
        "Arthritis",
        "Food Poisoning",
        "Throat Infection",
        "Common Cold",
        "Back Strain",
        "Eye Allergy",
    ]
}
df = pd.DataFrame(data)
df.to_csv("medical_data.csv", index=False)
dataset = load_dataset("csv", data_files="medical_data.csv")

# Label encoding
unique_labels = list(dict.fromkeys(df["labels"]))
label_to_id = {label: i for i, label in enumerate(unique_labels)}
id_to_label = {i: label for label, i in label_to_id.items()}
dataset = dataset.map(lambda x: {"labels": label_to_id[x["labels"]]})

# -----------------------
# Tokenize
# -----------------------
tokenizer_default = AutoTokenizer.from_pretrained("distilbert-base-uncased")
def tokenize(batch):
    return tokenizer_default(batch["text"], truncation=True, padding="max_length")
dataset_tokenized = dataset.map(tokenize, batched=True)
dataset_tokenized.set_format("torch", columns=["input_ids", "attention_mask", "labels"])
train_data = dataset_tokenized["train"]

device = "cuda" if torch.cuda.is_available() else "cpu"

# -----------------------
# Build & train lightweight model
# -----------------------
def build_and_train(model_name="distilbert-base-uncased", epochs=2):
    global model, tokenizer, classifier_pipeline, current_model_name
    current_model_name = model_name
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name,
        num_labels=len(unique_labels)
    ).to(device)

    optimizer = torch.optim.Adam(model.parameters(), lr=3e-5)
    model.train()
    for epoch in range(epochs):
        total_loss = 0.0
        for i in range(0, len(train_data), 2):
            batch = train_data[i:i+2]
            input_ids = batch["input_ids"].to(device)
            attention_mask = batch["attention_mask"].to(device)
            labels = batch["labels"].to(device)

            optimizer.zero_grad()
            outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
            loss = outputs.loss
            loss.backward()
            optimizer.step()
            total_loss += loss.item()
        print(f"Epoch {epoch+1}/{epochs} — loss: {total_loss:.4f}")

    # Leave training mode so dropout is disabled at inference time.
    model.eval()
    classifier_pipeline = pipeline("text-classification", model=model, tokenizer=tokenizer,
                                   device=0 if torch.cuda.is_available() else -1)
    print("Training complete and classifier pipeline ready.")

build_and_train()

# -----------------------
# Gradio interface
# -----------------------
def ui_predict(symptoms):
    if not symptoms or symptoms.strip() == "":
        return "Please enter your symptoms."
    out = classifier_pipeline(symptoms)[0]
    label_token = out["label"]
    idx = int(label_token.split("_")[-1])
    disease = id_to_label[idx]
    confidence = round(out["score"]*100,2)
    return f"Predicted Disease: {disease}\nConfidence: {confidence}%"

with gr.Blocks() as demo:
    gr.Markdown("## Advanced Medical Symptom Checker")
    inp = gr.Textbox(lines=3, placeholder="Enter symptoms like 'fever, cough'")
    outp = gr.Textbox()
    btn = gr.Button("Analyze")
    btn.click(fn=ui_predict, inputs=inp, outputs=outp)

demo.launch()


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch 1/2 — loss: 11.7702


Device set to use cpu


Epoch 2/2 — loss: 11.3243
Training complete and classifier pipeline ready.
It looks like you are running Gradio on a hosted Jupyter notebook, which requires `share=True`. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://230ceb22d09e42df63.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
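
The figure printed per epoch is the sum of the batch losses, not their mean, so it grows with the number of batches. A rough yardstick, assuming a freshly initialized classification head that predicts close to uniformly over K labels, is a cross-entropy of ln K per batch. The helper below is a hypothetical sketch (`chance_level_total` is not part of the notebook) that turns this into an expected epoch total for the batch size of 2 used in the loop.

```python
import math

def chance_level_total(num_examples: int, batch_size: int, num_labels: int) -> float:
    """Summed cross-entropy over one epoch if every prediction were uniform.

    Assumes one mean-reduced loss is added per batch, as the training
    loop does with outputs.loss.
    """
    num_batches = math.ceil(num_examples / batch_size)
    return num_batches * math.log(num_labels)

# 10 examples, batch size 2, 10 labels (the In [2] setup): 5 batches
baseline = chance_level_total(10, 2, 10)
print(f"chance-level epoch total: {baseline:.4f}")
print(f"mean per batch: {baseline / 5:.4f}")
```

The logged totals of 11.7702 and 11.3243 sit close to this baseline of about 11.51, which suggests that two epochs on ten examples leave the model near chance.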




In [3]:
import pandas as pd

# Load existing dataset
df = pd.read_csv("medical_data.csv")

# New examples to add
new_data = {
    "text": [
        "high fever headache body pain",
        "stomach pain diarrhea dehydration",
        "red itchy eyes watery",
        "persistent cough sore throat",
        "severe back pain stiffness",
        "nausea vomiting headache",
        "joint pain swelling redness",
        "chest pain shortness of breath",
        "runny nose sneezing mild cough",
        "skin rash itching redness"
    ],
    "labels": [
        "Viral Infection",
        "Food Poisoning",
        "Eye Allergy",
        "Throat Infection",
        "Back Strain",
        "Migraine",
        "Arthritis",
        "Heart Problem",
        "Common Cold",
        "Skin Allergy"
    ]
}

# Convert to DataFrame
new_df = pd.DataFrame(new_data)

# Append to existing dataset
df = pd.concat([df, new_df], ignore_index=True)

# Save back to CSV
df.to_csv("medical_data.csv", index=False)

# Display updated dataset
df


index,text,labels
0,fever cough tiredness,Viral Infection
1,headache dizziness nausea,Migraine
2,chest pain shortness of breath,Heart Problem
3,skin rash itching,Skin Allergy
4,joint pain swelling,Arthritis
5,abdominal pain vomiting,Food Poisoning
6,sore throat fever,Throat Infection
7,runny nose sneezing,Common Cold
8,back pain stiffness,Back Strain
9,eye redness irritation,Eye Allergy
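
Re-running the append cell adds the same rows to `medical_data.csv` again, and "chest pain shortness of breath" already appears in both the original and the new examples. One way to make the step repeatable is to drop exact repeats before saving. The sketch below uses plain tuples and a hypothetical helper, `append_unique`, to show the idea; with pandas, calling `drop_duplicates()` on the concatenated frame would play the same role.

```python
def append_unique(rows, new_rows):
    """Append new_rows to rows, dropping exact (text, label) repeats.

    dict.fromkeys keeps first-seen order, the same trick the notebook
    uses to build unique_labels.
    """
    return list(dict.fromkeys([*rows, *new_rows]))

existing = [
    ("fever cough tiredness", "Viral Infection"),
    ("chest pain shortness of breath", "Heart Problem"),
]
extra = [
    ("chest pain shortness of breath", "Heart Problem"),  # already present
    ("red itchy eyes watery", "Eye Allergy"),
]

once = append_unique(existing, extra)
twice = append_unique(once, extra)  # simulates re-running the cell
print(len(once), len(twice))
```

Both lengths come out equal, so repeated runs no longer inflate the training set.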


In [4]:
# Re-import dataset and label mappings
from datasets import load_dataset
dataset = load_dataset("csv", data_files="medical_data.csv")

unique_labels = list(dict.fromkeys(df["labels"]))
label_to_id = {label: i for i, label in enumerate(unique_labels)}
id_to_label = {i: label for label, i in label_to_id.items()}

dataset = dataset.map(lambda x: {"labels": label_to_id[x["labels"]]})

tokenizer_default = AutoTokenizer.from_pretrained("distilbert-base-uncased")
def tokenize(batch):
    return tokenizer_default(batch["text"], truncation=True, padding="max_length")
dataset_tokenized = dataset.map(tokenize, batched=True)
dataset_tokenized.set_format("torch", columns=["input_ids", "attention_mask", "labels"])
train_data = dataset_tokenized["train"]

# Train model again. build_and_train() as defined in In [2] returns None and
# publishes model, tokenizer and classifier_pipeline as globals, so the result
# is not unpacked here.
build_and_train("distilbert-base-uncased", epochs=3)


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch 1/3 — loss: 23.0928
Epoch 2/3 — loss: 22.9579


Device set to use cpu


Epoch 3/3 — loss: 22.3098
Training complete and classifier pipeline ready.
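
The label mappings rebuilt in this cell are what turn the classifier output back into a condition name. Assuming the model config carries no custom label names, the text-classification pipeline reports labels as `LABEL_<index>`. The sketch below restates the encoding and adds a strict decoder; `build_label_maps` and `decode_label` are hypothetical names, not functions from the notebook.

```python
def build_label_maps(labels):
    """Order-preserving label <-> id maps, mirroring the notebook's encoding."""
    unique = list(dict.fromkeys(labels))
    label_to_id = {label: i for i, label in enumerate(unique)}
    id_to_label = {i: label for label, i in label_to_id.items()}
    return label_to_id, id_to_label

def decode_label(token, id_to_label):
    """Map a pipeline label such as 'LABEL_3' to its condition name.

    Assumes the default 'LABEL_<index>' naming; anything else raises ValueError.
    """
    prefix, _, index = token.rpartition("_")
    if prefix != "LABEL" or not index.isdigit():
        raise ValueError(f"unexpected label token: {token!r}")
    return id_to_label[int(index)]

label_to_id, id_to_label = build_label_maps(
    ["Viral Infection", "Migraine", "Heart Problem", "Migraine"]
)
print(decode_label("LABEL_2", id_to_label))
```

Raising on an unexpected token, rather than silently falling back to index 0, makes a mismatch between model and mappings visible.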

In [5]:
# ===========================================================
# ADVANCED MEDICAL SYMPTOM CHECKER — CLEAN VERSION FOR COLAB
# ===========================================================

import pandas as pd
import torch
from datasets import load_dataset
import gradio as gr
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    pipeline,
)

# -----------------------
# Device
# -----------------------
device = "cuda" if torch.cuda.is_available() else "cpu"
print("Device set to use:", device)

# -----------------------
# Create/Update dataset
# -----------------------
# Original dataset
data = {
    "text": [
        "fever cough tiredness",
        "headache dizziness nausea",
        "chest pain shortness of breath",
        "skin rash itching",
        "joint pain swelling",
        "abdominal pain vomiting",
        "sore throat fever",
        "runny nose sneezing",
        "back pain stiffness",
        "eye redness irritation"
    ],
    "labels": [
        "Viral Infection",
        "Migraine",
        "Heart Problem",
        "Skin Allergy",
        "Arthritis",
        "Food Poisoning",
        "Throat Infection",
        "Common Cold",
        "Back Strain",
        "Eye Allergy"
    ]
}

# Load the existing dataset if present, otherwise create it
try:
    df = pd.read_csv("medical_data.csv")
    print("Loaded existing dataset.")
except FileNotFoundError:
    df = pd.DataFrame(data)
    df.to_csv("medical_data.csv", index=False)
    print("Created new dataset.")

# Add extra examples
extra_data = {
    "text": [
        "high fever headache body pain",
        "stomach pain diarrhea dehydration",
        "red itchy eyes watery",
        "persistent cough sore throat",
        "severe back pain stiffness",
        "nausea vomiting headache",
        "joint pain swelling redness",
        "chest pain shortness of breath",
        "runny nose sneezing mild cough",
        "skin rash itching redness"
    ],
    "labels": [
        "Viral Infection",
        "Food Poisoning",
        "Eye Allergy",
        "Throat Infection",
        "Back Strain",
        "Migraine",
        "Arthritis",
        "Heart Problem",
        "Common Cold",
        "Skin Allergy"
    ]
}

df_extra = pd.DataFrame(extra_data)
df = pd.concat([df, df_extra], ignore_index=True)
df.to_csv("medical_data.csv", index=False)
print("Dataset updated. Total examples:", len(df))

# -----------------------
# Label encoding
# -----------------------
unique_labels = list(dict.fromkeys(df["labels"]))
label_to_id = {label: i for i, label in enumerate(unique_labels)}
id_to_label = {i: label for label, i in label_to_id.items()}

# -----------------------
# Load dataset for training
# -----------------------
dataset = load_dataset("csv", data_files="medical_data.csv")
dataset = dataset.map(lambda x: {"labels": label_to_id[x["labels"]]})

# -----------------------
# Tokenizer
# -----------------------
tokenizer_default = AutoTokenizer.from_pretrained("distilbert-base-uncased")
def tokenize(batch):
    return tokenizer_default(batch["text"], truncation=True, padding="max_length")
dataset_tokenized = dataset.map(tokenize, batched=True)
dataset_tokenized.set_format("torch", columns=["input_ids", "attention_mask", "labels"])
train_data = dataset_tokenized["train"]

# -----------------------
# Base templates
# -----------------------
base_info = {
    "Viral Infection": {"desc_short": "Viral infection causes fever, cough, body pain.", "suggest_short": "Rest, fluids, paracetamol", "severity": 40, "specialist":"General Physician", "recovery_days":"3–7 days", "timing":"Paracetamol 500mg every 4–6 hours."},
    "Migraine": {"desc_short":"Severe headache with nausea.", "suggest_short":"Ibuprofen/Paracetamol, rest", "severity":50, "specialist":"Neurologist", "recovery_days":"hours to 2 days", "timing":"Ibuprofen 200–400mg every 6–8 hours."},
    "Heart Problem": {"desc_short":"Chest pain and breathlessness may indicate a cardiac issue.", "suggest_short":"Seek immediate medical attention", "severity":95, "specialist":"Cardiologist", "recovery_days":"Variable; emergency", "timing":"Follow hospital prescription"},
    "Skin Allergy": {"desc_short":"Itching, redness or rash.", "suggest_short":"Antihistamines, calamine", "severity":20, "specialist":"Dermatologist", "recovery_days":"1–7 days", "timing":"Cetirizine 10mg once daily"},
    "Arthritis": {"desc_short":"Joint pain, stiffness and swelling.", "suggest_short":"Warm compress, physiotherapy", "severity":60, "specialist":"Orthopedician", "recovery_days":"days to weeks", "timing":"Diclofenac gel 2–3 times daily"},
    "Food Poisoning": {"desc_short":"Vomiting and diarrhea after bad food.", "suggest_short":"ORS, rest", "severity":70, "specialist":"Gastroenterologist", "recovery_days":"1–5 days", "timing":"ORS frequently"},
    "Throat Infection": {"desc_short":"Sore throat with pain.", "suggest_short":"Salt-water gargles, paracetamol", "severity":30, "specialist":"ENT", "recovery_days":"3–7 days", "timing":"Paracetamol 500mg every 4–6 hours"},
    "Common Cold": {"desc_short":"Sneezing, runny nose, mild cough.", "suggest_short":"Steam, fluids, rest", "severity":15, "specialist":"General Physician", "recovery_days":"3–10 days", "timing":"Cetirizine once daily"},
    "Back Strain": {"desc_short":"Pain from muscle strain or posture.", "suggest_short":"Heat pack, gentle stretching", "severity":35, "specialist":"Physiotherapist", "recovery_days":"days to weeks", "timing":"Topical analgesic as needed"},
    "Eye Allergy": {"desc_short":"Red, itchy, watery eyes.", "suggest_short":"Cold compress, lubricating drops", "severity":25, "specialist":"Ophthalmologist", "recovery_days":"1–7 days", "timing":"Lubricating drops as needed"}
}

# -----------------------
# Build & train model safely
# -----------------------
def build_and_train(model_name="distilbert-base-uncased", epochs=3):
    global model, tokenizer, classifier_pipeline, current_model_name
    current_model_name = model_name
    try:
        print(f"Loading tokenizer & model: {model_name} ...")
        tokenizer = AutoTokenizer.from_pretrained(model_name)
        model = AutoModelForSequenceClassification.from_pretrained(
            model_name, num_labels=len(unique_labels)
        ).to(device)
        print("Model loaded. Starting training...")
    except Exception as e:
        print(f"Could not load {model_name}: {e}")
        tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
        model = AutoModelForSequenceClassification.from_pretrained(
            "distilbert-base-uncased", num_labels=len(unique_labels)
        ).to(device)
        print("Fallback model loaded.")

    optimizer = torch.optim.Adam(model.parameters(), lr=3e-5)
    model.train()
    for epoch in range(epochs):
        total_loss = 0.0
        for i in range(0, len(train_data), 2):
            batch = train_data[i:i+2]
            input_ids = batch["input_ids"].to(device)
            attention_mask = batch["attention_mask"].to(device)
            labels = batch["labels"].to(device)
            optimizer.zero_grad()
            outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
            loss = outputs.loss
            loss.backward()
            optimizer.step()
            total_loss += loss.item()
        print(f"Epoch {epoch+1}/{epochs} — loss: {total_loss:.4f}")

    # Leave training mode so dropout is disabled at inference time.
    model.eval()
    classifier_pipeline = pipeline("text-classification", model=model, tokenizer=tokenizer,
                                   device=0 if device=="cuda" else -1)
    print("Training complete.")
    return model, tokenizer, classifier_pipeline

# Train model
model, tokenizer, classifier_pipeline = build_and_train(epochs=3)

# -----------------------
# Gradio UI
# -----------------------
query_history = []

def expand_long_text(short_desc, short_suggest, disease_name):
    long = (
        f"{short_desc} Steps: \n"
        f"1) Symptom relief: {short_suggest}\n"
        f"2) Home care: rest, hydrate.\n"
        f"3) Medicine timing: {base_info[disease_name]['timing']}\n"
        f"4) Recovery: {base_info[disease_name]['recovery_days']}\n"
        f"5) Doctor visit: if symptoms worsen, seek medical attention."
    )
    return long

def ui_predict(symptoms):
    # The click handler has a single output component, so return one value.
    if not symptoms or not symptoms.strip():
        return "Enter your symptoms."
    out = classifier_pipeline(symptoms)[0]
    label_token = out["label"]
    try:
        # Default pipeline labels look like "LABEL_3"
        idx = int(label_token.split("_")[-1])
    except ValueError:
        idx = label_to_id.get(label_token, 0)
    disease = id_to_label[idx]
    confidence = round(out["score"] * 100, 2)
    long_desc = expand_long_text(base_info[disease]["desc_short"], base_info[disease]["suggest_short"], disease)
    query_history.insert(0, {"symptoms": symptoms, "disease": disease, "confidence": confidence})
    if len(query_history) > 10:
        query_history.pop()
    md = f"### Predicted: {disease}\nConfidence: {confidence}%\n\n{long_desc}"
    return md

with gr.Blocks() as demo:
    gr.Markdown("## Advanced Medical Symptom Checker")
    inp = gr.Textbox(lines=2, placeholder="Enter symptoms like 'fever, cough'", label="Symptoms")
    out_md = gr.Markdown()
    run_btn = gr.Button("Analyze")
    run_btn.click(fn=ui_predict, inputs=inp, outputs=out_md)

demo.launch()


Device set to use: cpu
Loaded existing dataset.
Dataset updated. Total examples: 30


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Loading tokenizer & model: distilbert-base-uncased ...
Model loaded. Starting training...
Epoch 1/3 — loss: 34.4260
Epoch 2/3 — loss: 30.6433


Device set to use cpu


Epoch 3/3 — loss: 25.0145
Training complete.
It looks like you are running Gradio on a hosted Jupyter notebook, which requires `share=True`. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://808cc149c0c4e4e56c.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
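
`ui_predict` keeps the ten most recent queries by inserting at the front of a list and popping the tail. A `collections.deque` with `maxlen` does the same bookkeeping in one call. The sketch below is one possible alternative, not the notebook's code; `record_query` is a hypothetical helper name.

```python
from collections import deque

# Newest entry first; entries beyond the tenth fall off the right end.
query_history = deque(maxlen=10)

def record_query(symptoms: str, disease: str, confidence: float) -> None:
    """Store one prediction at the front of the bounded history."""
    query_history.appendleft(
        {"symptoms": symptoms, "disease": disease, "confidence": confidence}
    )

for n in range(12):
    record_query(f"symptom set {n}", "Common Cold", 50.0)

print(len(query_history))
print(query_history[0]["symptoms"])
print(query_history[-1]["symptoms"])
```

After twelve inserts the history holds ten entries, with the two oldest discarded automatically.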




In [6]:
# ===========================================================
# 🩺 ADVANCED MEDICAL SYMPTOM CHECKER — FULL
# Features:
# - 20 symptom examples covering 18 conditions
# - Hugging Face DistilBERT classifier
# - Long/guided descriptions
# - Suggested specialist, severity, recovery time, medicine timing
# ===========================================================

import pandas as pd
import gradio as gr
import torch
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

# -----------------------
# Set device
# -----------------------
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"✅ Device set to use: {device}")

# -----------------------
# 1️⃣ Dataset
# -----------------------
data = {
    "text": [
        "fever cough tiredness",
        "headache dizziness nausea",
        "chest pain shortness of breath",
        "skin rash itching",
        "joint pain swelling",
        "abdominal pain vomiting",
        "sore throat fever",
        "runny nose sneezing",
        "back pain stiffness",
        "eye redness irritation",
        # New diseases
        "high fever body ache chills",
        "persistent cough night sweats weight loss",
        "vomiting diarrhea dehydration",
        "nose bleed headache fatigue",
        "chest tightness wheezing cough",
        "frequent urination thirst fatigue",
        "severe abdominal cramps diarrhea",
        "stiff neck headache sensitivity to light",
        "yellow eyes fatigue nausea",
        "hair loss brittle nails fatigue"
    ],
    "labels": [
        "Viral Infection",
        "Migraine",
        "Heart Problem",
        "Skin Allergy",
        "Arthritis",
        "Food Poisoning",
        "Throat Infection",
        "Common Cold",
        "Back Strain",
        "Eye Allergy",
        # New diseases
        "Dengue",
        "Tuberculosis",
        "Food Poisoning",
        "Blood Disorder",
        "Asthma",
        "Diabetes",
        "Food Poisoning",
        "Meningitis",
        "Hepatitis",
        "Thyroid Disorder"
    ]
}
df = pd.DataFrame(data)
df.to_csv("medical_data.csv", index=False)
dataset = load_dataset("csv", data_files="medical_data.csv")["train"]
print(f"✅ Loaded dataset. Total examples: {len(dataset)}")

# Label encoding
unique_labels = list(dict.fromkeys(df["labels"]))
label_to_id = {label: i for i, label in enumerate(unique_labels)}
id_to_label = {i: label for label, i in label_to_id.items()}

dataset = dataset.map(lambda x: {"labels": label_to_id[x["labels"]]})
dataset.set_format("torch", columns=["text","labels"])

# -----------------------
# 2️⃣ Base info / templates
# -----------------------
base_info = {
    "Viral Infection": {
        "desc_short": "Fever, cough, body pain and fatigue.",
        "suggest_short": "Paracetamol, rest, fluids.",
        "severity": 40,
        "specialist": "General Physician",
        "recovery_days": "3–7 days",
        "timing": "Paracetamol 500mg every 4–6 hours as needed (max 3g/day)."
    },
    "Migraine": {
        "desc_short": "Severe headache often with nausea, light/sound sensitivity.",
        "suggest_short": "Ibuprofen/Paracetamol, dark quiet room, hydration.",
        "severity": 50,
        "specialist": "Neurologist",
        "recovery_days": "hours to 2 days",
        "timing": "Ibuprofen 200–400mg every 6–8 hours with food."
    },
    "Heart Problem": {
        "desc_short": "Chest pain and breathlessness may indicate a cardiac emergency.",
        "suggest_short": "Seek immediate medical attention; aspirin if not allergic.",
        "severity": 95,
        "specialist": "Cardiologist",
        "recovery_days": "Emergency evaluation required",
        "timing": "Follow hospital prescription."
    },
    "Skin Allergy": {
        "desc_short": "Itching, redness or rash due to allergens.",
        "suggest_short": "Antihistamines, calamine/aloe vera.",
        "severity": 20,
        "specialist": "Dermatologist",
        "recovery_days": "1–7 days",
        "timing": "Cetirizine 10mg once daily."
    },
    "Arthritis": {
        "desc_short": "Joint pain, stiffness and swelling.",
        "suggest_short": "Warm compress, topical gel, physiotherapy.",
        "severity": 60,
        "specialist": "Orthopedician / Rheumatologist",
        "recovery_days": "days-weeks",
        "timing": "Diclofenac gel 2–3 times daily; oral NSAIDs as prescribed."
    },
    "Food Poisoning": {
        "desc_short": "Vomiting and diarrhea after contaminated food.",
        "suggest_short": "ORS, rest, antiemetic if severe.",
        "severity": 70,
        "specialist": "General Physician / Gastroenterologist",
        "recovery_days": "1–5 days",
        "timing": "ORS frequently; antiemetics as prescribed."
    },
    "Throat Infection": {
        "desc_short": "Sore throat with pain on swallowing.",
        "suggest_short": "Salt-water gargles, warm fluids, paracetamol.",
        "severity": 30,
        "specialist": "ENT / General Physician",
        "recovery_days": "3–7 days",
        "timing": "Paracetamol 500mg every 4–6 hours as needed."
    },
    "Common Cold": {
        "desc_short": "Sneezing, runny nose and mild cough.",
        "suggest_short": "Steam, fluids, vitamin C, rest.",
        "severity": 15,
        "specialist": "General Physician",
        "recovery_days": "3–10 days",
        "timing": "Cetirizine once daily."
    },
    "Back Strain": {
        "desc_short": "Pain from muscle strain or poor posture.",
        "suggest_short": "Heat pack, NSAID gel, gentle stretching.",
        "severity": 35,
        "specialist": "Orthopedician / Physiotherapist",
        "recovery_days": "days to weeks",
        "timing": "Topical analgesic as needed; oral NSAIDs if prescribed."
    },
    "Eye Allergy": {
        "desc_short": "Red, itchy, watery eyes from allergens.",
        "suggest_short": "Cold compress, lubricating drops, antihistamines.",
        "severity": 25,
        "specialist": "Ophthalmologist",
        "recovery_days": "1–7 days",
        "timing": "Lubricating drops as needed; oral antihistamine once daily."
    },
    "Dengue": {
        "desc_short": "High fever, severe body pain, chills and fatigue.",
        "suggest_short": "Paracetamol, hydration, rest, avoid NSAIDs.",
        "severity": 70,
        "specialist": "General Physician / Infectious Disease Specialist",
        "recovery_days": "7–14 days",
        "timing": "Paracetamol 500mg every 6 hours as needed, max 3g/day."
    },
    "Tuberculosis": {
        "desc_short": "Persistent cough, night sweats, weight loss.",
        "suggest_short": "Medical evaluation; follow prescribed antibiotics.",
        "severity": 90,
        "specialist": "Pulmonologist",
        "recovery_days": "Several months",
        "timing": "As prescribed by physician."
    },
    "Blood Disorder": {
        "desc_short": "Frequent nose bleeds, fatigue, headache.",
        "suggest_short": "Consult hematologist; avoid injury, follow blood tests.",
        "severity": 80,
        "specialist": "Hematologist",
        "recovery_days": "Varies with condition",
        "timing": "As prescribed by doctor."
    },
    "Asthma": {
        "desc_short": "Chest tightness, wheezing, persistent cough.",
        "suggest_short": "Use inhalers as prescribed, avoid triggers.",
        "severity": 60,
        "specialist": "Pulmonologist",
        "recovery_days": "Chronic management",
        "timing": "Inhalers as prescribed."
    },
    "Diabetes": {
        "desc_short": "Frequent urination, thirst, fatigue.",
        "suggest_short": "Monitor blood sugar, diet control, medication.",
        "severity": 70,
        "specialist": "Endocrinologist",
        "recovery_days": "Chronic",
        "timing": "As prescribed by doctor."
    },
    "Meningitis": {
        "desc_short": "Stiff neck, headache, sensitivity to light.",
        "suggest_short": "Seek emergency medical attention immediately.",
        "severity": 95,
        "specialist": "Neurologist / Infectious Disease Specialist",
        "recovery_days": "Depends on treatment",
        "timing": "Hospital management required."
    },
    "Hepatitis": {
        "desc_short": "Yellow eyes, fatigue, nausea.",
        "suggest_short": "Medical evaluation, avoid alcohol, maintain hydration.",
        "severity": 80,
        "specialist": "Gastroenterologist / Hepatologist",
        "recovery_days": "Weeks to months",
        "timing": "Follow doctor prescribed meds."
    },
    "Thyroid Disorder": {
        "desc_short": "Hair loss, brittle nails, fatigue.",
        "suggest_short": "Check TSH, follow endocrinologist advice.",
        "severity": 50,
        "specialist": "Endocrinologist",
        "recovery_days": "Chronic; controlled with meds",
        "timing": "As prescribed by doctor."
    }
}

# -----------------------
# 3️⃣ Tokenizer
# -----------------------
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")
dataset_tokenized = dataset.map(tokenize, batched=True)
dataset_tokenized.set_format("torch", columns=["input_ids", "attention_mask", "labels"])
train_data = dataset_tokenized

# -----------------------
# 4️⃣ Build & train model
# -----------------------
device = "cuda" if torch.cuda.is_available() else "cpu"
def build_and_train(model_name="distilbert-base-uncased", epochs=3):
    global model, tokenizer, classifier_pipeline, current_model_name
    current_model_name = model_name
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=len(unique_labels)
    ).to(device)

    optimizer = torch.optim.Adam(model.parameters(), lr=3e-5)
    model.train()

    for epoch in range(epochs):
        total_loss = 0.0
        for i in range(0, len(train_data), 2):
            batch = train_data[i:i+2]
            input_ids = batch["input_ids"].to(device)
            attention_mask = batch["attention_mask"].to(device)
            labels = batch["labels"].to(device)

            optimizer.zero_grad()
            outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
            loss = outputs.loss
            loss.backward()
            optimizer.step()
            total_loss += loss.item()
        print(f"✅ Epoch {epoch+1}/{epochs} — loss: {total_loss:.4f}")

    # Leave training mode so dropout is disabled at inference time.
    model.eval()
    classifier_pipeline = pipeline("text-classification", model=model, tokenizer=tokenizer, device=0 if device=="cuda" else -1)
    return model, tokenizer, classifier_pipeline

model, tokenizer, classifier_pipeline = build_and_train("distilbert-base-uncased", epochs=2)

# -----------------------
# 5️⃣ Gradio UI
# -----------------------
query_history = []

def expand_long_text(short_desc, short_suggest, disease_name):
    long = (
        f"{short_desc} ⚕️ Practical advice: \n"
        f"1️⃣ Symptom relief: {short_suggest} Use OTC medicines responsibly.\n"
        f"2️⃣ Home remedies: Hydrate, rest, and apply local measures.\n"
        f"3️⃣ Timing: Follow medicine timing guidance.\n"
        f"4️⃣ Recovery: typical recovery is {base_info[disease_name]['recovery_days']}.\n"
        f"5️⃣ Doctor visit: If symptoms worsen, high fever, breathing difficulty, seek medical attention.\n"
        "💡 Always disclose allergies and current medicines to your clinician."
    )
    return long

def ui_predict(symptoms):
    # One output component is wired to this handler, so return a single string.
    if not symptoms or not symptoms.strip():
        return "❌ Please enter your symptoms."

    out = classifier_pipeline(symptoms)[0]
    try:
        idx = int(out["label"].split("_")[-1])
    except ValueError:
        idx = label_to_id.get(out["label"], 0)
    disease = id_to_label[idx]
    confidence = round(out["score"]*100,2)
    base = base_info[disease]
    long_desc = expand_long_text(base["desc_short"], base["suggest_short"], disease)

    # Severity
    severity_score = min(100,int(base["severity"]*(confidence/100.0)+0.5))

    result_md = f"""
# 🩺 Predicted: *{disease}*
*Confidence:* {confidence}%
*Severity (est.):* {severity_score}/100
*Specialist:* {base['specialist']}
*Recovery:* {base['recovery_days']}

## 📘 Advice & Details
{long_desc}

## 💊 Medication / Timing
{base['timing']}
    """
    return result_md

with gr.Blocks(title="Advanced Medical Symptom Checker") as demo:
    gr.Markdown("## 🏥 Enter your symptoms below:")
    inp = gr.Textbox(lines=3, placeholder="e.g., fever, cough, sore throat")
    run_btn = gr.Button("Analyze")
    output_md = gr.Markdown()

    run_btn.click(fn=ui_predict, inputs=inp, outputs=output_md)

demo.launch()


✅ Device set to use: cpu


✅ Loaded dataset. Total examples: 20


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


✅ Epoch 1/2 — loss: 28.9531


Device set to use cpu


✅ Epoch 2/2 — loss: 28.4532
It looks like you are running Gradio on a hosted Jupyter notebook, which requires `share=True`. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://548230c1a63b192cdb.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
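
The severity estimate shown in the UI is the condition's base severity scaled by the classifier confidence, rounded half up and capped at 100. The standalone function below restates that arithmetic so it can be checked by hand; `estimate_severity` is a hypothetical name introduced for illustration.

```python
def estimate_severity(base_severity: int, confidence_pct: float) -> int:
    """Scale a 0-100 base severity by a 0-100 confidence percentage.

    Same expression as in ui_predict: adding 0.5 before int() rounds
    non-negative values half up.
    """
    return min(100, int(base_severity * (confidence_pct / 100.0) + 0.5))

print(estimate_severity(40, 50.0))   # Viral Infection at 50% confidence
print(estimate_severity(95, 100.0))  # Heart Problem at full confidence
print(estimate_severity(95, 8.0))    # Heart Problem at low confidence
```

Because the score is multiplied by confidence, a high-severity condition predicted with low confidence is reported with a small number, so the value reflects model certainty as much as the base severity in `base_info`.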


