Disclaimer: This project is for educational and academic purposes only. It is NOT a medical diagnosis tool and must never be used as a substitute for professional medical advice.
MediCheck is an AI-powered Medical Symptom Checker that uses a Bayesian Network to predict the most likely disease(s) based on user-entered symptoms. The system employs a Naive Bayes graphical model trained on the Kaggle Disease Prediction dataset (10,000+ rows, 41 diseases, 132 symptoms) and performs inference via Variable Elimination.
┌────────────────┐     POST /predict      ┌────────────────┐    pgmpy VE     ┌──────────────┐
│   Streamlit    │ ──────────────────────▶│    FastAPI     │ ───────────────▶│   Bayesian   │
│   Frontend     │◀─────── JSON ──────────│    Gateway     │◀────────────────│   Network    │
└────────────────┘                        └────────────────┘                 └──────────────┘
| Layer | Technology |
|---|---|
| Frontend | Streamlit 1.32+ |
| API | FastAPI 0.110+ / Uvicorn |
| ML Model | pgmpy (Bayesian Network) |
| Data | pandas, numpy |
| Viz | Plotly, Altair |
| Testing | pytest |
medicheck/
├── data/
│   ├── download_kaggle.py      # Dataset downloader & augmenter
│   ├── disease_symptom.csv     # Training data (10k+ rows)
│   ├── disease_info.json       # Disease descriptions & actions (41 diseases)
│   └── kaggle_raw/             # Raw Kaggle CSV files
├── model/
│   ├── train.py                # BN training pipeline
│   ├── inference.py            # Variable Elimination inference
│   └── bayesian_model.pkl      # Serialised model (generated)
├── api/
│   ├── main.py                 # FastAPI app & endpoints
│   ├── schemas.py              # Pydantic models
│   └── utils.py                # Helper utilities
├── frontend/
│   ├── app.py                  # Streamlit entry point
│   └── pages/
│       ├── symptom_input.py    # Symptom selection UI (132 symptoms)
│       └── results.py          # Results visualisation
├── tests/
│   ├── test_inference.py       # Model & inference tests
│   └── test_api.py             # API endpoint tests
├── requirements.txt
└── README.md
cd medicheck
pip install -r requirements.txt
python -m data.download_kaggle

This will:
- Download the Kaggle Disease Prediction dataset
- Augment to 10,000+ rows using noise injection
- Save `data/disease_symptom.csv`
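
The README does not say how noise is injected, so the following is only a rough sketch of one common approach: copy existing rows and flip each binary symptom flag with a small probability while leaving the disease label unchanged. The function name `augment_with_noise`, the `flip_prob` value, and the toy rows are all hypothetical and are not taken from `download_kaggle.py`.

```python
import random


def augment_with_noise(rows, target_size, flip_prob=0.05, seed=0):
    """Grow `rows` to `target_size` by copying rows and flipping symptom bits.

    Each row is (disease_label, [0/1 symptom flags]).
    The disease label is never modified; only symptom flags are perturbed.
    """
    rng = random.Random(seed)  # seeded so the augmentation is reproducible
    augmented = list(rows)     # keep every original row as-is
    while len(augmented) < target_size:
        label, symptoms = rng.choice(rows)
        # Flip each flag independently with probability `flip_prob`.
        noisy = [bit ^ 1 if rng.random() < flip_prob else bit for bit in symptoms]
        augmented.append((label, noisy))
    return augmented


# Toy data (hypothetical): two diseases, four symptom columns.
original = [
    ("Fungal infection", [1, 1, 0, 0]),
    ("Allergy", [0, 0, 1, 1]),
]
augmented = augment_with_noise(original, target_size=10)
print(len(augmented))
```

Keeping the original rows at the front and seeding the generator makes it easy to compare the augmented file against the raw data.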

python -m model.train

This will:
- Load the dataset (10,000+ records, 41 diseases, 132 symptoms)
- Train the Bayesian Network with BDeu priors
- Save `model/bayesian_model.pkl`
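
To make the BDeu prior concrete, here is a minimal pure-Python sketch of how a single symptom's conditional probability table could be estimated. It assumes binary symptoms and a single `Disease` parent, so each table cell receives a pseudo-count of `ess / (r * q)`, where `r` is the number of symptom states and `q` the number of diseases. The helper `bdeu_cpt` and the toy records are illustrative only; the project itself delegates this step to pgmpy.

```python
from collections import Counter


def bdeu_cpt(records, symptom, diseases, ess=5.0):
    """Estimate P(symptom = 1 | disease) with a BDeu prior.

    records: list of (disease, {symptom_name: 0 or 1}) pairs.
    ess: equivalent sample size spread uniformly over all CPT cells.
    """
    r = 2                  # symptom states: absent / present
    q = len(diseases)      # configurations of the single parent (Disease)
    alpha = ess / (r * q)  # BDeu pseudo-count per CPT cell

    totals = Counter(d for d, _ in records)
    present = Counter(d for d, s in records if s.get(symptom, 0) == 1)

    # Posterior mean: (count + alpha) / (parent count + r * alpha).
    return {d: (present[d] + alpha) / (totals[d] + r * alpha) for d in diseases}


# Toy records (hypothetical), not rows from the Kaggle dataset.
records = [
    ("Allergy", {"itching": 0}),
    ("Allergy", {"itching": 1}),
    ("Fungal infection", {"itching": 1}),
    ("Fungal infection", {"itching": 1}),
]
cpt = bdeu_cpt(records, "itching", ["Allergy", "Fungal infection"])
print(cpt)
```

Because every cell gets a positive pseudo-count, no symptom ever has probability exactly 0 or 1 for a disease, which keeps a single unexpected symptom from ruling a disease out entirely.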

uvicorn api.main:app --reload --host 0.0.0.0 --port 8000

API docs available at: http://localhost:8000/docs

streamlit run frontend/app.py

Open http://localhost:8501 in your browser.

pytest tests/ -v

| Method | Path | Description |
|---|---|---|
| GET | /health | Health check & model status |
| GET | /symptoms | List all recognised symptoms |
| POST | /predict | Predict diseases from symptom list |
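
The `POST /predict` endpoint can also be called from Python. The sketch below uses only the standard library and assumes the API is running locally on port 8000, as in the steps above. The helper name `build_predict_request` is hypothetical, and the shape of the response body is not documented here, so the commented-out call simply prints whatever JSON comes back.

```python
import json
import urllib.request


def build_predict_request(symptoms, base_url="http://localhost:8000"):
    """Build (but do not send) a POST /predict request with a JSON body."""
    body = json.dumps({"symptoms": symptoms}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_predict_request(["itching", "skin_rash", "high_fever"])

# To send it against a running API server:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```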

curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"symptoms": ["itching", "skin_rash", "high_fever"]}'

- Structure: Naive Bayes Bayesian Network (Disease -> Symptom_1, Disease -> Symptom_2, ...)
- Estimation: Bayesian Estimation with BDeu prior (equivalent sample size = 5)
- Inference: Variable Elimination for exact posterior computation
- Dataset: Kaggle Disease Prediction (10,000+ rows, augmented from ~5,000 original)
- Diseases: 41 conditions across respiratory, dermatological, hepatic, metabolic, and infectious categories
- Symptoms: 132 clinical features
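
For the naive Bayes structure described above, exact inference has a simple closed form: the posterior over diseases is proportional to the prior times the product of the per-symptom likelihoods, and unobserved symptoms sum out to 1. The sketch below illustrates that computation in plain Python. It is not the project's `inference.py` (which uses pgmpy's Variable Elimination), and the probabilities and the name `naive_bayes_posterior` are made up for illustration.

```python
def naive_bayes_posterior(prior, likelihood, observed):
    """Return P(disease | observed symptoms) for a naive Bayes network.

    prior: {disease: P(disease)}
    likelihood: {disease: {symptom: P(symptom = 1 | disease)}}
    observed: {symptom: 0 or 1}; symptoms not listed are treated as
              unobserved and drop out of the product.
    """
    scores = {}
    for disease, p in prior.items():
        for symptom, value in observed.items():
            p_present = likelihood[disease][symptom]
            p *= p_present if value == 1 else 1.0 - p_present
        scores[disease] = p
    # Normalise; assumes at least one disease has non-zero score.
    total = sum(scores.values())
    return {disease: score / total for disease, score in scores.items()}


# Hypothetical numbers, not values learned from the dataset.
prior = {"Allergy": 0.5, "Fungal infection": 0.5}
likelihood = {
    "Allergy": {"itching": 0.2, "skin_rash": 0.1},
    "Fungal infection": {"itching": 0.8, "skin_rash": 0.9},
}
posterior = naive_bayes_posterior(prior, likelihood, {"itching": 1, "skin_rash": 1})
print(max(posterior, key=posterior.get))
```

With 41 diseases and a handful of observed symptoms this is a small amount of arithmetic, which is why exact inference is practical for this model.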

cp .env.example .env
# Edit .env with your values (optional; defaults work out of the box)

Note: The `.env` file is git-ignored. Never commit real credentials.

- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request

This project is intended for academic use. No medical claims are made.
