A production-ready FastAPI service for deploying machine learning models. This project demonstrates best practices for ML model serving with structured logging, batch inference, Prometheus metrics, rate limiting, and comprehensive testing.
Tech Stack: FastAPI • scikit-learn • Docker • Prometheus • Pydantic v2
✨ Single & Batch Predictions — Serve one prediction or hundreds at once
📊 Prometheus Metrics — Monitor API usage and performance
🔐 Rate Limiting — Protect your API (10 req/min per endpoint)
📝 Structured Logging — JSON logs with request ID tracking for debugging
✅ Comprehensive Tests — 20 tests covering all endpoints and edge cases
🚀 CI/CD Pipeline — Automated testing and Docker image publishing
⚡ Load Testing — Locust scaffold for performance validation
- WSL 2 with Python 3.11+ (or any Linux environment)
- Docker Desktop
- Git
# Clone and enter the project
git clone https://github.com/LedioZefi/ml-api.git
cd ml-api
# Install dependencies
make dev-install
# Run tests (should see 20/20 passing)
make test
# Build Docker image
make build
# Start the API
make run
# In another terminal, make a prediction
curl -X POST http://localhost:8000/predict \
-H "Content-Type: application/json" \
-d '{"sepal_length":6.1,"sepal_width":2.8,"petal_length":4.7,"petal_width":1.2}'
# Check metrics
curl http://localhost:8000/metrics
# Stop the container
make stop
Quick health check. Tells you if the model is loaded and ready.

curl http://localhost:8000/health

Make a single prediction. Send iris measurements, get back the predicted class and confidence.
curl -X POST http://localhost:8000/predict \
-H "Content-Type: application/json" \
-d '{"sepal_length":6.1,"sepal_width":2.8,"petal_length":4.7,"petal_width":1.2}'Response:
{
"predicted_class": "versicolor",
"class_index": 1,
"confidence": 0.92,
"probabilities": {"setosa": 0.01, "versicolor": 0.92, "virginica": 0.07}
}
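The same call from Python, assuming the API is running on localhost:8000 and the requests package is installed (it is not a project dependency):

```python
# Minimal Python client for the /predict endpoint.
import requests

features = {"sepal_length": 6.1, "sepal_width": 2.8, "petal_length": 4.7, "petal_width": 1.2}
resp = requests.post("http://localhost:8000/predict", json=features, timeout=5)
resp.raise_for_status()
result = resp.json()
print(result["predicted_class"], result["confidence"])
```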
Make multiple predictions in one request. Great for batch processing.

curl -X POST http://localhost:8000/predict-batch \
-H "Content-Type: application/json" \
-d '{"items": [
{"sepal_length":5.1,"sepal_width":3.5,"petal_length":1.4,"petal_width":0.2},
{"sepal_length":7.0,"sepal_width":3.2,"petal_length":4.7,"petal_width":1.4}
]}'

Response:
{
"items": [
{"predicted_class": "setosa", "class_index": 0, "confidence": 0.98, ...},
{"predicted_class": "versicolor", "class_index": 1, "confidence": 0.85, ...}
],
"count": 2
}
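A sketch of the same batch call from Python, again assuming requests is installed and the API is up:

```python
# Send many rows in one request instead of one call per row.
import requests

rows = [
    {"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2},
    {"sepal_length": 7.0, "sepal_width": 3.2, "petal_length": 4.7, "petal_width": 1.4},
]
resp = requests.post("http://localhost:8000/predict-batch", json={"items": rows}, timeout=10)
resp.raise_for_status()
for item in resp.json()["items"]:
    print(item["predicted_class"], item["confidence"])
```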
Prometheus metrics for monitoring. Scrape this endpoint with Prometheus.

curl http://localhost:8000/metrics
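Prometheus normally scrapes this endpoint directly, but for a quick look you can also parse the exposition text in Python (assumes the requests and prometheus_client packages are installed):

```python
# Fetch and parse the Prometheus exposition format served at /metrics.
import requests
from prometheus_client.parser import text_string_to_metric_families

text = requests.get("http://localhost:8000/metrics", timeout=5).text
for family in text_string_to_metric_families(text):
    for sample in family.samples:
        print(sample.name, sample.labels, sample.value)
```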
Project structure:

ml-api/
├── .github/workflows/ci.yml # Automated testing & Docker publishing
├── app/
│ ├── main.py # FastAPI app (logging, metrics, rate limiting)
│ ├── logging_config.py # JSON logging & request ID tracking
│ ├── requirements.txt # Dependencies
│ ├── Dockerfile # Multi-stage build with model training
│ ├── model/ # Trained model (auto-generated)
│ └── schemas/predict_schema.py # Pydantic v2 validation
├── tests/
│ ├── test_health.py # Health endpoint tests
│ ├── test_predict.py # Prediction endpoint tests (20 total)
│ └── conftest.py # Pytest fixtures
├── load_test/
│ ├── locustfile.py # Load testing scenarios
│ └── README.md # Load testing guide
├── Makefile # Quick commands
├── pyproject.toml # Project config & ruff linting
├── predict_demo.py # Demo script
└── README.md # This file
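For orientation, the request schemas in app/schemas/predict_schema.py might look roughly like the sketch below; the field names match the API examples above, but the class names and constraints here are assumptions:

```python
# Hypothetical sketch of the Pydantic v2 request schemas; see
# app/schemas/predict_schema.py for the real definitions.
from pydantic import BaseModel, Field

class IrisFeatures(BaseModel):
    sepal_length: float = Field(gt=0, description="Sepal length in cm")
    sepal_width: float = Field(gt=0, description="Sepal width in cm")
    petal_length: float = Field(gt=0, description="Petal length in cm")
    petal_width: float = Field(gt=0, description="Petal width in cm")

class BatchPredictRequest(BaseModel):
    items: list[IrisFeatures]
```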
Common Makefile commands:

make help # Show all commands
make install # Install dependencies
make dev-install # Install + dev tools (pytest, ruff)
make build # Build Docker image
make run # Run container on port 8000
make stop # Stop and remove container
make lint # Run ruff linter
make fmt # Format code with ruff
make test # Run pytest tests
make clean # Remove cache files

Run the tests locally:

source .venv/bin/activate
pytest tests/ -v
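The tests use pytest with FastAPI's TestClient; as a flavour of what they cover, a health-check test might look like the sketch below (the import path is an assumption; the real tests live in tests/):

```python
# Illustrative test sketch; the actual tests live in tests/test_health.py
# and tests/test_predict.py. Assumes the FastAPI app object is app.main:app.
from fastapi.testclient import TestClient

from app.main import app

client = TestClient(app)

def test_health_returns_ok():
    resp = client.get("/health")
    assert resp.status_code == 200
```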
Lint and format the code:

ruff check app/ tests/ predict_demo.py
ruff format app/ tests/ predict_demo.py

Build and run the Docker image directly:

docker build -t ml-api:latest ./app -f app/Dockerfile
docker run -d -p 8000:8000 --name ml-api ml-api:latest
docker logs -f ml-api

Tag a release and push to automatically build and publish:
git tag v1.0.0
git push origin v1.0.0

GitHub Actions will automatically:
- Run tests
- Build Docker image
- Push to ghcr.io/LedioZefi/ml-api:v1.0.0 and the latest tag
Pull the image:
docker pull ghcr.io/LedioZefi/ml-api:v1.0.0

Use Locust to simulate realistic load:
pip install locust
locust -f load_test/locustfile.py --host=http://localhost:8000
# Open http://localhost:8089 in your browser
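The repository ships its scenarios in load_test/locustfile.py; as a minimal illustration, a Locust user class for this API could look like:

```python
# Illustrative only; the real load-test scenarios live in load_test/locustfile.py.
from locust import HttpUser, between, task

class PredictUser(HttpUser):
    wait_time = between(0.5, 2)  # pause between simulated user actions, in seconds

    @task
    def predict(self):
        self.client.post(
            "/predict",
            json={"sepal_length": 6.1, "sepal_width": 2.8,
                  "petal_length": 4.7, "petal_width": 1.2},
        )
```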
Rate limiting protects the prediction endpoints (see the sketch after this list):

- Limit: 10 requests per minute per client IP
- Applies to: /predict and /predict-batch
- Exceeding: Returns 429 Too Many Requests
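For illustration, here is one common way to enforce a 10 requests/minute limit in FastAPI using the slowapi library; the project's actual wiring lives in app/main.py and may differ:

```python
# Hypothetical sketch using slowapi; the real configuration is in app/main.py.
from fastapi import FastAPI, Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)  # keyed on the client IP
app = FastAPI()
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

@app.post("/predict")
@limiter.limit("10/minute")  # going over this returns 429 Too Many Requests
async def predict(request: Request):
    ...
```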
All requests get a unique ID for tracing:
curl -X POST http://localhost:8000/predict \
-H "X-Request-ID: my-custom-id" \
-H "Content-Type: application/json" \
-d '{"sepal_length":6.1,"sepal_width":2.8,"petal_length":4.7,"petal_width":1.2}'Response includes the ID:
X-Request-ID: my-custom-id
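The same round trip can be checked end to end from Python (requests assumed installed):

```python
# Send a custom request ID and confirm the API echoes it back for tracing.
import uuid

import requests

request_id = str(uuid.uuid4())
resp = requests.post(
    "http://localhost:8000/predict",
    headers={"X-Request-ID": request_id},
    json={"sepal_length": 6.1, "sepal_width": 2.8, "petal_length": 4.7, "petal_width": 1.2},
)
assert resp.headers["X-Request-ID"] == request_id
```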
Logs are JSON-formatted for easy parsing:
{"timestamp": "2024-01-15 10:30:45,123", "level": "INFO", "logger": "app.request", "message": "POST /predict 200"}Found a bug or have an idea? Feel free to open an issue or submit a PR!
Found a bug or have an idea? Feel free to open an issue or submit a PR!

MIT License — see LICENSE file for details
Check out the PRODUCTION_FEATURES_SUMMARY.md for detailed implementation notes.