This module demonstrates how to build production-ready API servers for AI models using FastAPI.
While Modules 1-2 focused on consuming APIs, Module 3 teaches you to produce them: serving your own AI models through professional-grade endpoints, with:
- FastAPI framework for high-performance async APIs
- Authentication & Authorization using Bearer tokens
- Rate limiting to prevent abuse
- Database persistence for user management and analytics
- API versioning for backward compatibility
- Async model loading for efficient resource usage
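The last item, async model loading, is what keeps the model in memory across requests instead of reloading it per call; FastAPI's lifespan hook handles this. A minimal sketch of the pattern, assuming the ResNet-18 checkpoint is `microsoft/resnet-18` (server.py is the authoritative implementation and may differ):

```python
# Sketch: load the model once at startup via FastAPI's lifespan hook.
from contextlib import asynccontextmanager

from fastapi import FastAPI
from transformers import pipeline

resources = {}  # shared objects created at startup, reused by every request

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Startup: download/load the classifier once, not per request.
    resources["classifier"] = pipeline("image-classification", model="microsoft/resnet-18")
    yield
    # Shutdown: release resources.
    resources.clear()

app = FastAPI(lifespan=lifespan)
```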
The overall architecture:

```
Client (client.py) → API Server (server.py) → AI Model (ResNet-18)
                            ↓
                     SQLite Database
                     (users, requests)
```
Installation with uv (recommended):

```bash
uv sync
```

Installation with pip:

```bash
pip install fastapi uvicorn sqlalchemy pillow torch transformers python-dotenv
```

The SQLite database (ai_api.db) is created automatically on first run. For your convenience, I included the database file with one user entry (with API key `your-secret-api-key`).
If the database was newly created, you can add a test user manually:

```
sqlite3 ai_api.db
INSERT INTO users (api_key) VALUES ('your-secret-api-key');
```
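If you prefer doing the same from Python, the standard library's sqlite3 module works too (this assumes the `users` table already exists, i.e. the server has run at least once):

```python
# Seed a test user from Python instead of the sqlite3 CLI.
# Assumes ai_api.db and its users table already exist.
import sqlite3

conn = sqlite3.connect("ai_api.db")
conn.execute("INSERT INTO users (api_key) VALUES (?)", ("your-secret-api-key",))
conn.commit()
conn.close()
```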
Create a `.env` file for the client:

```
API_KEY=your-secret-api-key
```

Start the server:

```bash
python server.py
```

All endpoints are versioned under `/v1`:
Check model status without authentication:

```bash
curl http://localhost:8000/v1/model/info
```
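The same check from Python, if you have the requests package installed:

```python
# Unauthenticated status check, equivalent to the curl call above.
import requests

print(requests.get("http://localhost:8000/v1/model/info").json())
```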
Classify an image (requires authentication):

```bash
curl -X POST http://localhost:8000/v1/classify \
  -H "Authorization: Bearer your-secret-api-key" \
  -H "Content-Type: application/json" \
  -d '{"image": "base64_encoded_image_here"}'
```
Get usage statistics (requires authentication):

```bash
curl http://localhost:8000/v1/usage \
  -H "Authorization: Bearer your-secret-api-key"
```

The included client.py is a modified version of Module 1's image_analyzer.py, adapted to work with our server:
```bash
python client.py meal.png
```

Key concepts demonstrated in this module:

- Dependency Injection: `Depends()` for auth and database sessions (see the sketch after this list)
- Pydantic Models: Automatic request/response validation
- Async/Await: Non-blocking I/O for better performance
- Lifespan Events: Startup/shutdown resource management
- Authentication: Bearer tokens (same pattern as OpenAI/Anthropic)
- Rate Limiting: Prevent abuse with per-minute request limits
- Error Handling: Proper HTTP status codes and error messages
- Usage Tracking: Database logging for analytics and billing
- Versioning: `/v1` prefix allows future updates without breaking clients
- RESTful Routes: Logical endpoint naming and HTTP methods
- Response Models: Consistent, documented response structures
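To make the auth, versioning, and response-model patterns concrete, here is a small self-contained sketch. The helper names (`get_current_user`, `UsageResponse`) are illustrative, not the ones in server.py, and the hardcoded key check stands in for the real database lookup:

```python
# Sketch: bearer auth via Depends(), a /v1-prefixed router, and a Pydantic response model.
from fastapi import APIRouter, Depends, FastAPI, HTTPException
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer
from pydantic import BaseModel

bearer = HTTPBearer()

def get_current_user(creds: HTTPAuthorizationCredentials = Depends(bearer)) -> str:
    # Real code would look the key up in the users table instead.
    if creds.credentials != "your-secret-api-key":
        raise HTTPException(status_code=401, detail="Invalid API key")
    return creds.credentials

class UsageResponse(BaseModel):
    total_requests: int  # response fields are validated and documented automatically

router = APIRouter(prefix="/v1")  # versioning: one prefix for every route

@router.get("/usage", response_model=UsageResponse)
async def usage(api_key: str = Depends(get_current_user)) -> UsageResponse:
    return UsageResponse(total_requests=0)  # real code queries the database

app = FastAPI()
app.include_router(router)
```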
Troubleshooting common errors:

- Cause: Server not running
- Solution: Start the server with `python server.py`
- Cause: Invalid or missing API key
- Solution: Check the API key in the database and in the `.env` file
- Cause: Too many requests (more than 5 per minute)
- Solution: Wait 60 seconds or increase the limit in `check_rate_limit()` (see the sketch at the end of this section)
- Cause: First-time model download from Hugging Face
- Solution: Wait for the download to finish; the model is cached after the first run
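For reference, a per-key, per-minute limiter like the one `check_rate_limit()` enforces can be sketched as below. This is an assumption-laden illustration (single process, in-memory state), not server.py's actual code:

```python
# Sketch: sliding-window rate limiter, keyed by API key.
import time
from collections import defaultdict, deque

from fastapi import HTTPException

RATE_LIMIT = 5        # requests allowed...
WINDOW_SECONDS = 60   # ...per 60-second window

_history: dict[str, deque] = defaultdict(deque)

def check_rate_limit(api_key: str) -> None:
    now = time.monotonic()
    window = _history[api_key]
    # Drop timestamps that have aged out of the window.
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    if len(window) >= RATE_LIMIT:
        raise HTTPException(status_code=429, detail="Rate limit exceeded")
    window.append(now)
```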
