ML API - AI-Powered Text Generation API

A production-ready FastAPI application that provides AI text generation with response caching and comprehensive monitoring.

Features

  • 🤖 AI Text Generation - Uses Google's Flan-T5-Small model
  • ⚡ Response Caching - up to ~1000x faster for repeat queries
  • 📊 Metrics Tracking - Real-time performance monitoring
  • 🔍 Comprehensive Logging - Detailed request/response tracking
  • ✅ Input Validation - rejects malformed requests before they reach the model
  • 💚 Health Checks - Production monitoring ready
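
The response cache behaves roughly like the sketch below: an in-memory dictionary keyed by the prompt and generation parameters, with the model call stubbed out (the dictionary approach and the `generate` helper are assumptions for illustration, not the repo's actual code):

```python
import time

# In-memory cache keyed by (prompt, max_length)
cache: dict = {}

def generate(prompt: str, max_length: int = 100):
    """Hypothetical stand-in for the model call; returns (text, cached)."""
    key = (prompt, max_length)
    if key in cache:
        return cache[key], True        # cache hit: served near-instantly
    time.sleep(0.01)                   # placeholder for slow model inference
    text = f"[generated text for: {prompt}]"
    cache[key] = text
    return text, False
```

A simple dict works for a single process; the Future Improvements section notes Redis as the path to persistent, shared caching.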

Tech Stack

  • FastAPI - Modern Python web framework
  • Transformers (Hugging Face) - AI model integration
  • Flan-T5-Small - 80M parameter text generation model
  • Python 3.12

Quick Start

Prerequisites

  • Python 3.8+
  • 8GB+ RAM (for model)

Installation

  1. Clone the repository:
     git clone <your-repo-url>
     cd ml-api-project
  2. Create a virtual environment:
     python3 -m venv venv
     source venv/bin/activate  # On macOS/Linux; use venv\Scripts\activate on Windows
  3. Install dependencies:
     pip install fastapi uvicorn transformers torch
  4. Run the API:
     uvicorn main:app --reload
  5. Open the interactive docs in your browser:
     http://127.0.0.1:8000/docs

API Endpoints

GET /

Root endpoint - confirms API is running

GET /health

Health check endpoint for monitoring

GET /metrics

Returns API usage statistics:

  • Total requests
  • Cache hit rate
  • Average inference time
  • And more...
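
These statistics could be tracked with a small in-memory counter like the one below (the field names and formulas are assumptions about how the API computes them, not taken from main.py):

```python
class Metrics:
    """Tracks request counts, cache hits, and inference time."""

    def __init__(self):
        self.total_requests = 0
        self.cache_hits = 0
        self.total_inference_seconds = 0.0

    def record(self, cached: bool, inference_seconds: float = 0.0):
        self.total_requests += 1
        if cached:
            self.cache_hits += 1
        else:
            self.total_inference_seconds += inference_seconds

    def snapshot(self) -> dict:
        # Average inference time is computed over cache misses only,
        # since cached responses never run the model.
        misses = self.total_requests - self.cache_hits
        return {
            "total_requests": self.total_requests,
            "cache_hit_rate": (
                self.cache_hits / self.total_requests if self.total_requests else 0.0
            ),
            "avg_inference_time_seconds": (
                self.total_inference_seconds / misses if misses else 0.0
            ),
        }
```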

POST /generate

Main text generation endpoint

Request Body:

{
  "prompt": "What is machine learning?",
  "max_length": 100
}

Response:

{
  "prompt": "What is machine learning?",
  "response": "Machine learning is a method of data analysis...",
  "model": "flan-t5-small",
  "inference_time_seconds": 5.59,
  "cached": false
}
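
The input validation mentioned under Features could be enforced on this request body with Pydantic field constraints, along these lines (a sketch; the exact limits are assumptions, not values from the repo):

```python
from pydantic import BaseModel, Field

class GenerateRequest(BaseModel):
    # Reject empty or excessively long prompts (limits are illustrative)
    prompt: str = Field(min_length=1, max_length=1000)
    # Keep max_length within a sane range for a small model
    max_length: int = Field(default=100, ge=10, le=500)
```

With a model like this, FastAPI returns a 422 error automatically for malformed requests before any model code runs.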

Performance

  • First request for a prompt: ~5-10 seconds (runs the AI model)
  • Cached requests: ~0.01 seconds (near-instant)
  • Cache hit rate: typically 60%+ in production workloads with repeated queries

Example Usage

Using curl:

curl -X POST "http://127.0.0.1:8000/generate" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "What is Python?", "max_length": 100}'

Using Python:

import requests

response = requests.post(
    "http://127.0.0.1:8000/generate",
    json={"prompt": "What is Python?", "max_length": 100}
)

print(response.json())

Logging

The API uses comprehensive logging with emoji markers:

  • 🔵 New request received
  • 📝 Prompt details
  • 💾 Cache hit/miss
  • 🤖 AI model running
  • ✅ Response generated
  • ⚠️ Warnings/errors

Future Improvements

  • Deploy to AWS ECS/Lambda
  • Add Redis for persistent caching
  • Implement rate limiting
  • Add authentication
  • Support for multiple models
  • Streaming responses
  • OpenAPI/Swagger customization

Project Structure

ml-api-project/
├── main.py              # Main application code
├── README.md            # This file
├── requirements.txt     # Python dependencies
└── venv/                # Virtual environment

Author

Built as part of an AI engineering portfolio project.

License

MIT License
