This repository contains a machine learning project that trains and serves an XGBoost classifier capable of predicting whether a breast tumor is malignant or benign. It includes training workflows, a FastAPI inference service, and full Docker deployment support.
Early breast cancer detection greatly improves patient outcomes, but diagnosing malignancy from diagnostic imaging features can be slow and subjective.
This project provides a machine-learning–based solution that:
- Accepts numerical diagnostic features from breast tissue measurements
- Predicts whether a tumor is malignant (1) or benign (0)
- Outputs a probability score
- Provides a human-readable explanation
- Can be deployed locally or via Docker
Midterm_Project/
│
├── train.py # Trains model and saves artifacts
├── script.py # FastAPI inference service
├── Dockerfile # Docker container configuration
├── dv.bin # Trained DictVectorizer
├── xgb_model.bin # Trained XGBoost model
├── notebook.ipynb # Experiments and model development
└── README.md # Documentation
`train.py` performs:
- Dataset loading
- Feature engineering using `DictVectorizer`
- Binary label encoding (M=1, B=0)
- XGBoost model training
- Saving model/artifacts (`dv.bin`, `xgb_model.bin`)
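
The first training steps (dataset loading and label encoding) can be sketched with the standard library alone. This is a minimal illustration, not the code in `train.py`: the helper name `load_records`, the `diagnosis` and `id` column names, and the second sample row are assumptions. The vectorizing and XGBoost steps are left out because they depend on third-party packages.

```python
import csv
import io


def load_records(csv_file):
    """Return (records, labels) from an open CSV file object.

    Assumes a `diagnosis` column holding "M" or "B" and an optional `id`
    column; both column names are assumptions about the dataset layout.
    """
    records, labels = [], []
    for row in csv.DictReader(csv_file):
        # Binary label encoding: malignant (M) -> 1, benign (B) -> 0.
        labels.append(1 if row.pop("diagnosis") == "M" else 0)
        # Drop the identifier column if present; it is not a measurement.
        row.pop("id", None)
        # The remaining columns become numeric features keyed by name,
        # i.e. the list-of-dicts shape that DictVectorizer accepts.
        records.append({name: float(value) for name, value in row.items()})
    return records, labels


# Tiny in-memory sample; the second row holds illustrative values only.
sample = io.StringIO(
    "id,diagnosis,radius_mean,texture_mean\n"
    "1,M,17.99,10.38\n"
    "2,B,13.54,14.36\n"
)
records, labels = load_records(sample)
print(labels)      # [1, 0]
print(records[0])  # {'radius_mean': 17.99, 'texture_mean': 10.38}
```

The resulting `records` list is the kind of input a `DictVectorizer` would be fitted on, with `labels` as the training target.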
Run training locally:
python train.py
`script.py`:
- Loads `dv.bin` and `xgb_model.bin`
- Provides:
  - `GET /` → health check
  - `POST /predict` → prediction endpoint
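
A prediction endpoint has to turn the model's probability into a class label and a readable message. The sketch below shows one way that step might look. The helper name `build_response`, the 0.5 threshold, the rounding to four decimals, and the wording of the benign message are assumptions, not taken from `script.py`.

```python
def build_response(probability, threshold=0.5):
    """Map a malignancy probability to a response dictionary.

    The threshold of 0.5 is an assumed default, not a documented value.
    """
    prediction = int(probability >= threshold)
    if prediction == 1:
        message = "Prediction suggests a malignant tumor (positive)."
    else:
        # Assumed wording for the negative case.
        message = "Prediction suggests a benign tumor (negative)."
    return {
        "probability": round(float(probability), 4),
        "prediction": prediction,
        "message": message,
    }


print(build_response(0.9973))
```

Keeping this mapping in a small pure function makes the threshold easy to test and adjust without touching the web layer.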
Example response includes:
{
"probability": 0.9973,
"prediction": 1,
"message": "Prediction suggests a malignant tumor (positive)."
}
Run locally:
uvicorn script:app --host 0.0.0.0 --port 8000 --reload
Visit:
- Home: http://127.0.0.1:8000/
- API Docs (Swagger UI): http://127.0.0.1:8000/docs
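
Besides the Swagger UI, the prediction endpoint can be called from Python. The following is a minimal standard-library sketch, assuming the service accepts a flat JSON object of feature names and values as the request body. The helper names `build_predict_request` and `predict` are hypothetical.

```python
import json
import urllib.request


def build_predict_request(features, base_url="http://127.0.0.1:8000"):
    """Build (but do not send) a POST /predict request for a feature dict."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(features, base_url="http://127.0.0.1:8000", timeout=10):
    """Send the request and return the decoded JSON response."""
    request = build_predict_request(features, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the service running locally, `predict(features)` would be called with a dictionary containing all the diagnostic measurements the model expects.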
Example request body for `POST /predict`:
{
"radius_mean": 17.99,
"texture_mean": 10.38,
"perimeter_mean": 122.8,
"area_mean": 1001.0,
"smoothness_mean": 0.1184,
"compactness_mean": 0.2776,
"concavity_mean": 0.3001,
"concave_points_mean": 0.1471,
"symmetry_mean": 0.2419,
"fractal_dimension_mean": 0.07871,
"radius_se": 1.095,
"texture_se": 0.9053,
"perimeter_se": 8.589,
"area_se": 153.4,
"smoothness_se": 0.006399,
"compactness_se": 0.04904,
"concavity_se": 0.05373,
"concave_points_se": 0.01587,
"symmetry_se": 0.03003,
"fractal_dimension_se": 0.006193,
"radius_worst": 25.38,
"texture_worst": 17.33,
"perimeter_worst": 184.6,
"area_worst": 2019.0,
"smoothness_worst": 0.1622,
"compactness_worst": 0.6656,
"concavity_worst": 0.7119,
"concave_points_worst": 0.2654,
"symmetry_worst": 0.4601,
"fractal_dimension_worst": 0.1189
}
FROM python:3.13.5-slim-bookworm
WORKDIR /app
RUN apt-get update && apt-get install -y \
build-essential \
libgomp1 \
&& rm -rf /var/lib/apt/lists/*
COPY script.py .
COPY dv.bin .
COPY xgb_model.bin .
RUN pip install --no-cache-dir fastapi uvicorn xgboost scikit-learn pydantic
EXPOSE 8000
# Command to run the FastAPI app with Uvicorn
CMD ["uvicorn", "script:app", "--host", "0.0.0.0", "--port", "8000"]
docker build -t midterm_project .
docker run -p 8000:8000 midterm_project
Access the running service:
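
One way to confirm that the container is up is to poll the health check endpoint until it answers. This is a small sketch, assuming the `-p 8000:8000` mapping above makes the service reachable at `http://127.0.0.1:8000/`. The helper name `wait_for_service` and its timing defaults are hypothetical.

```python
import time
import urllib.error
import urllib.request


def wait_for_service(url="http://127.0.0.1:8000/", timeout=30.0, interval=1.0):
    """Poll `url` until it returns HTTP 200 or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # The service is not accepting connections yet; retry.
            pass
        time.sleep(interval)
    return False
```

Calling `wait_for_service()` after `docker run` returns `True` once the health check responds, or `False` if it never does within the timeout.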
Stop the current container:
CTRL + C
Or:
docker stop container_id
Stop all running containers:
docker kill $(docker ps -q)
Shut down Docker's backend completely:
wsl --shutdown

| Issue | Fix |
|---|---|
| Docker can't connect | Start Docker Desktop |
| Port already in use | Map a different host port: `docker run -p 9000:8000 midterm_project` |
| Model files missing | Ensure `dv.bin` and `xgb_model.bin` are in the build context |
| scikit-learn version mismatch | Retrain and resave the model with a matching scikit-learn version |
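
Version mismatches are easier to diagnose if the package versions used for training are recorded and compared with those in the serving environment. The sketch below uses only `importlib.metadata`. The helper names `installed_versions` and `version_mismatches` are hypothetical, and the version strings in the example are illustrative only.

```python
from importlib import metadata

# Distribution names as published on PyPI.
PACKAGES = ("scikit-learn", "xgboost")


def installed_versions(packages=PACKAGES):
    """Return {package: version or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def version_mismatches(recorded, current):
    """Return {package: (recorded, current)} for every differing version."""
    return {
        name: (recorded[name], current.get(name))
        for name in recorded
        if recorded[name] != current.get(name)
    }


# Illustrative comparison with made-up version strings.
print(version_mismatches({"scikit-learn": "1.5.0"}, {"scikit-learn": "1.6.1"}))
# {'scikit-learn': ('1.5.0', '1.6.1')}
```

The output of `installed_versions()` could be saved alongside the model artifacts at training time and checked when the service starts.
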
- Frontend UI for users to upload tumor metrics
- Deployment to cloud (AWS ECS, Azure Apps, Google Cloud Run)
- Logging and analytics of predictions
- Authentication and security layer
- Batch prediction endpoint