MarshallW11/Midterm_Project

Breast Cancer XGBoost Classification API

This repository contains a machine learning project that trains and serves an XGBoost classifier capable of predicting whether a breast tumor is malignant or benign. It includes training workflows, a FastAPI inference service, and full Docker deployment support.


📌 Problem Description

Early breast cancer detection greatly improves patient outcomes, but diagnosing malignancy from diagnostic imaging features can be slow and subjective.
This project provides a machine-learning–based solution that:

  • Accepts numerical diagnostic features from breast tissue measurements
  • Predicts whether the tumor is malignant (1) or benign (0)
  • Outputs a probability score
  • Returns a human-readable message summarizing the prediction
  • Can be deployed locally or via Docker

📂 Project Structure

Midterm_Project/
│
├── train.py              # Trains model and saves artifacts
├── script.py             # FastAPI inference service
├── Dockerfile            # Docker container configuration
├── dv.bin                # Trained DictVectorizer
├── xgb_model.bin         # Trained XGBoost model
├── notebook.ipynb        # Experiments and model development
└── README.md             # Documentation

🚀 1. Model Training

train.py performs:

  • Dataset loading
  • Feature engineering using DictVectorizer
  • Binary label encoding (M=1, B=0)
  • XGBoost model training
  • Saving model/artifacts (dv.bin, xgb_model.bin)

Run training locally:

python train.py
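
The label-encoding step and the record format that DictVectorizer consumes can be illustrated with the standard library alone. This is a minimal sketch, not the repository's train.py: it assumes the raw CSV has a `diagnosis` column holding M/B and an `id` column to drop (both column names are assumptions), and it stops before vectorization and XGBoost training. `load_records` is a hypothetical helper.

```python
import csv
import io

# Assumption: the raw CSV has an "id" column and a "diagnosis" column with "M"/"B".
LABEL_COLUMN = "diagnosis"
DROP_COLUMNS = {"id"}
LABEL_MAP = {"M": 1, "B": 0}  # malignant -> 1, benign -> 0, as listed above


def load_records(csv_file):
    """Split CSV rows into feature dicts and binary labels.

    Each feature dict maps feature name to float, one dict per sample,
    which is the input shape scikit-learn's DictVectorizer accepts.
    """
    records, labels = [], []
    for row in csv.DictReader(csv_file):
        labels.append(LABEL_MAP[row[LABEL_COLUMN].strip()])
        records.append({
            name: float(value)
            for name, value in row.items()
            if name != LABEL_COLUMN and name not in DROP_COLUMNS
        })
    return records, labels


# Illustrative rows with only two features; not real dataset records.
sample_csv = io.StringIO(
    "id,diagnosis,radius_mean,texture_mean\n"
    "1,M,17.99,10.38\n"
    "2,B,12.05,14.63\n"
)
records, labels = load_records(sample_csv)
print(labels)      # [1, 0]
print(records[0])  # {'radius_mean': 17.99, 'texture_mean': 10.38}
```

The resulting `records` list is what would be passed to `DictVectorizer.fit_transform`, and `labels` is the training target.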

⚡ 2. FastAPI Inference Service

script.py:

  • Loads dv.bin and xgb_model.bin
  • Provides:
    • GET / → health check
    • POST /predict → prediction endpoint

Example response from POST /predict:

{
  "probability": 0.9973,
  "prediction": 1,
  "message": "Prediction suggests a malignant tumor (positive)."
}
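
How a probability becomes the `prediction` and `message` fields is not spelled out in this README. The following is a minimal sketch under three assumptions: a 0.5 decision threshold, rounding to four decimals, and a benign message worded to mirror the malignant one. `build_response` is a hypothetical helper, not a function taken from script.py.

```python
def build_response(probability, threshold=0.5):
    """Map a malignancy probability to the response shape shown above."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    # Assumed rule: probabilities at or above the threshold count as malignant.
    prediction = int(probability >= threshold)
    if prediction == 1:
        message = "Prediction suggests a malignant tumor (positive)."
    else:
        # Assumed wording; the README only shows the malignant message.
        message = "Prediction suggests a benign tumor (negative)."
    return {
        "probability": round(probability, 4),
        "prediction": prediction,
        "message": message,
    }


print(build_response(0.9973))
```

In a screening context a threshold below 0.5 may be preferable, since a missed malignancy usually costs more than a false alarm; the `threshold` parameter makes that choice explicit.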

Run locally:

uvicorn script:app --host 0.0.0.0 --port 8000 --reload
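
Once the server is running, the GET / health check can also be called from Python. The sketch below uses only the standard library and assumes the service is reachable at http://localhost:8000, which follows from the port in the command above. The shape of the health-check body is not documented here, so the hypothetical `check_health` helper returns whatever the service sends.

```python
import json
import urllib.request


def check_health(base_url="http://localhost:8000", timeout=2.0):
    """Return the decoded body of GET /, or None if the request fails."""
    url = base_url.rstrip("/") + "/"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8")
    except OSError:
        # URLError, HTTPError, timeouts and refused connections all derive from OSError.
        return None
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        # Fall back to the raw text if the body is not JSON.
        return body


if __name__ == "__main__":
    print(check_health())
```

A return value of `None` means the service did not answer successfully, for example because uvicorn has not finished starting.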

Visit: http://localhost:8000 (FastAPI serves interactive API docs at /docs by default)


🧪 3. Example JSON Input (All 30 Features)

{
  "radius_mean": 17.99,
  "texture_mean": 10.38,
  "perimeter_mean": 122.8,
  "area_mean": 1001.0,
  "smoothness_mean": 0.1184,
  "compactness_mean": 0.2776,
  "concavity_mean": 0.3001,
  "concave_points_mean": 0.1471,
  "symmetry_mean": 0.2419,
  "fractal_dimension_mean": 0.07871,
  "radius_se": 1.095,
  "texture_se": 0.9053,
  "perimeter_se": 8.589,
  "area_se": 153.4,
  "smoothness_se": 0.006399,
  "compactness_se": 0.04904,
  "concavity_se": 0.05373,
  "concave_points_se": 0.01587,
  "symmetry_se": 0.03003,
  "fractal_dimension_se": 0.006193,
  "radius_worst": 25.38,
  "texture_worst": 17.33,
  "perimeter_worst": 184.6,
  "area_worst": 2019.0,
  "smoothness_worst": 0.1622,
  "compactness_worst": 0.6656,
  "concavity_worst": 0.7119,
  "concave_points_worst": 0.2654,
  "symmetry_worst": 0.4601,
  "fractal_dimension_worst": 0.1189
}
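
The 30 feature names follow a regular pattern (10 measurements, each with a mean, se, and worst value), so a client can check a payload before sending it. This is a minimal standard-library sketch: the base URL http://localhost:8000 is an assumption, the helper names are hypothetical, and rejecting unexpected keys is a client-side choice that may be stricter than what the service itself enforces.

```python
import json
import urllib.request

# 10 measurements x 3 statistics = 30 features, in the order shown above.
MEASUREMENTS = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave_points", "symmetry",
    "fractal_dimension",
]
STATISTICS = ["mean", "se", "worst"]
EXPECTED_FEATURES = [f"{m}_{s}" for s in STATISTICS for m in MEASUREMENTS]


def check_payload(payload):
    """Return (missing, unexpected) feature names for a candidate payload."""
    missing = [name for name in EXPECTED_FEATURES if name not in payload]
    unexpected = sorted(set(payload) - set(EXPECTED_FEATURES))
    return missing, unexpected


def build_predict_request(payload, base_url="http://localhost:8000"):
    """Build (but do not send) a JSON POST request for the /predict endpoint."""
    missing, unexpected = check_payload(payload)
    if missing or unexpected:
        raise ValueError(f"missing={missing}, unexpected={unexpected}")
    return urllib.request.Request(
        base_url.rstrip("/") + "/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(payload, base_url="http://localhost:8000", timeout=5.0):
    """Send the payload to a running service and return the decoded JSON reply."""
    request = build_predict_request(payload, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the service running, `predict(payload)` on the JSON above should return a dict with the `probability`, `prediction`, and `message` keys.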

🐳 4. Docker Deployment

Dockerfile

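FROM python:3.13.5-slim-bookworm

WORKDIR /app

# System packages: libgomp1 provides the OpenMP runtime that XGBoost links against
RUN apt-get update && apt-get install -y \
    build-essential \
    libgomp1 \
    && rm -rf /var/lib/apt/lists/*

# Install Python dependencies before copying the app so this layer stays cached
RUN pip install --no-cache-dir fastapi uvicorn xgboost scikit-learn pydantic

# Application code and trained artifacts
COPY script.py .
COPY dv.bin .
COPY xgb_model.bin .

EXPOSE 8000

# Command to run the FastAPI app with Uvicorn
CMD ["uvicorn", "script:app", "--host", "0.0.0.0", "--port", "8000"]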

Build the Docker Image

docker build -t midterm_project .

Run the Container

docker run -p 8000:8000 midterm_project

Access the running service at http://localhost:8000 (the host port mapped by -p 8000:8000).


🛑 Stopping Docker Quickly

Stop the current container:

CTRL + C

Or, from another terminal (find the container ID with docker ps):

docker stop <container_id>

Force-stop all running containers (docker kill sends SIGKILL, so nothing shuts down gracefully):

docker kill $(docker ps -q)

Shut down Docker's backend completely (Windows with the WSL 2 backend only; this also stops every running WSL distribution):

wsl --shutdown

🧩 Troubleshooting

Issue                     | Fix
Docker can't connect      | Start Docker Desktop
Port already in use       | Map a different host port: docker run -p 9000:8000 midterm_project
Model files missing       | Ensure dv.bin and xgb_model.bin are in the build context
Sklearn version mismatch  | Retrain and resave the model with the same scikit-learn version the container installs
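
For the "port already in use" case, a quick check from Python can show which host port is free before choosing the -p mapping. This is a minimal standard-library sketch; `port_in_use` and `first_free_port` are hypothetical helpers, and the candidate ports are only examples. It detects a listener by attempting a TCP connection on the loopback interface.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


def first_free_port(candidates=(8000, 9000, 9001)):
    """Return the first candidate port with no listener, or None."""
    for port in candidates:
        if not port_in_use(port):
            return port
    return None


if __name__ == "__main__":
    print(first_free_port())
```

The returned port can then be used as the host side of the mapping, for example docker run -p 9000:8000 midterm_project when 9000 is reported free.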

🌟 Possible Extensions

  • Frontend UI for users to upload tumor metrics
  • Deployment to cloud (AWS ECS, Azure Apps, Google Cloud Run)
  • Logging and analytics of predictions
  • Authentication and security layer
  • Batch prediction endpoint
