# 🚀 Week 11-12 · Notebook 02 · API Development with FastAPI

This notebook covers the creation of a production-grade REST API for our **Manufacturing Copilot** using FastAPI. This API will serve as the main entrypoint for all agent interactions, providing a stable, fast, and reliable service for factory floor applications.


## 🎯 Learning Objectives

- **Build a Robust FastAPI Application:** Structure a multi-endpoint FastAPI service that exposes the functionality of our RAG, Vision, and Reporting agents.
- **Implement Data Validation with Pydantic:** Define strict request and response models (schemas) to ensure data integrity and provide clear API contracts.
- **Add Production-Grade Middleware:** Implement custom middleware for logging, timing, and adding observability hooks (like trace IDs) to every request.
- **Prepare for Containerization:** Organize the application code into a logical structure that is easy to package into a Docker container.


## 🧩 Scenario: Exposing the Copilot as a Service

A technician on the factory floor uses a tablet application to interact with the Manufacturing Copilot. When a machine malfunctions, they take a picture of the faulty part and ask a question. The tablet app sends this data to our FastAPI backend. The backend must:

1.  Route the image to the **Vision Agent** for defect detection.
2.  Route the question to the **RAG Agent** for troubleshooting steps from SOPs.
3.  Combine the results and generate a structured report.
4.  Return the complete response to the tablet app—all within a strict latency budget (e.g., <500ms).

The service must be secure, requiring authentication, and highly observable for IT and operations teams.


## 🧱 API Service Blueprint

Our FastAPI application will act as the central gateway, orchestrating the calls to our various agents.

**Workflow:**
`Tablet App -> FastAPI Gateway -> [Vision Agent | RAG Agent | Report Agent] -> Formatted Response`

| Component         | Responsibility                                      | Key Implementation Details                               |
| ----------------- | --------------------------------------------------- | -------------------------------------------------------- |
| **FastAPI App**   | Main application object, middleware registration.   | `FastAPI()`, `@app.middleware("http")`                   |
| **Pydantic Models** | Define data shapes for requests and responses.    | `BaseModel`, `Field` for validation rules.               |
| **API Endpoints** | Handle incoming HTTP requests (`/v1/diagnose`, `/health`). | `@app.post`, `@app.get`, `async def`, dependency injection. |
| **Agent Logic**   | Placeholder functions for agent interactions.       | `run_copilot_inference()` simulates calling the agent graph. |
| **Security**      | Authenticate requests.                              | `Header` dependency that checks the `X-Auth-Token` header. |


### Project Structure for a Production API

For a real-world application, you wouldn't put all your code in a single file. A well-organized project is easier to maintain, test, and scale. We will simulate this structure by writing our code into separate files from this notebook.

Our target structure will be:
```
/
|-- app/
|   |-- __init__.py
|   |-- main.py             # FastAPI app object, endpoints, middleware
|   |-- models.py           # Pydantic request/response schemas
|   |-- security.py         # Authentication and security dependencies
|   |-- agents.py           # Placeholder logic for our ML agents
|   `-- config.py           # Application configuration
|
|-- tests/
|   |-- __init__.py
|   `-- test_main.py        # API contract tests
|
`-- requirements.txt        # Project dependencies
```

Let's create these files step-by-step.

### 1. Create the Application Files

We will use the `%%writefile` magic command to write the content of the following cells into their respective Python files. This keeps our notebook clean and allows us to build the application step-by-step.

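`%%writefile` does not create missing directories, so the `app/` and `tests/` packages need to exist before the cells below run. A minimal sketch, assuming the notebook's working directory is the project root; the helper name `scaffold_project` is ours and not part of any library:

```python
from pathlib import Path
from typing import List


def scaffold_project(root: Path = Path(".")) -> List[Path]:
    """Create the app/ and tests/ packages with empty __init__.py files."""
    created = []
    for package in ("app", "tests"):
        package_dir = root / package
        package_dir.mkdir(parents=True, exist_ok=True)
        init_file = package_dir / "__init__.py"
        init_file.touch(exist_ok=True)  # existing files are left untouched
        created.append(init_file)
    return created
```

Calling `scaffold_project()` once in a cell before the first `%%writefile` is enough; the function is safe to re-run because it never overwrites existing files.
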
#### `app/config.py`
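This file uses Pydantic's `BaseSettings` to manage configuration. Settings can be loaded from environment variables or a `.env` file, which keeps deployment-specific values out of the code. Note that the import below is the Pydantic v1 form; in Pydantic v2, `BaseSettings` lives in the separate `pydantic-settings` package.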

In [None]:
%%writefile app/config.py

from pydantic import BaseSettings

class Settings(BaseSettings):
    """Application configuration settings."""
    APP_TITLE: str = "Manufacturing Copilot API"
    APP_VERSION: str = "1.0.0"
    # Overridden by the LOG_LEVEL environment variable or .env entry, if set.
    LOG_LEVEL: str = "INFO"
    
    # In a real app, this would be a more robust secret management strategy
    VALID_AUTH_TOKEN_PREFIX: str = "Bearer technician-"

    class Config:
        env_file = ".env"
        env_file_encoding = "utf-8"

settings = Settings()

#### `app/models.py`
This file defines the data contracts for our API using Pydantic. This ensures that all incoming requests and outgoing responses conform to a strict, validated schema. It's one of FastAPI's best features.

In [None]:
%%writefile app/models.py

from typing import List, Optional
from uuid import UUID
from pydantic import BaseModel, Field

class DiagnosisRequest(BaseModel):
    """Request model for the main diagnosis endpoint."""
    plant_id: str = Field(
        ...,
        regex=r"^[A-Z]{3,4}-\w{2,3}$",
        description="Unique plant identifier, e.g., 'PUNE-IN' or 'MEX-GTO'.",
        example="PUNE-IN",
    )
    equipment_id: str = Field(
        ..., min_length=4, description="Tag or ID of the equipment.", example="CNC-A-102"
    )
    problem_description: str = Field(
        ..., description="Technician's description of the issue."
    )
    image_id: Optional[str] = Field(
        None, description="ID of the uploaded image for visual inspection."
    )

class DiagnosisResponse(BaseModel):
    """Response model containing the combined output from all agents."""
    request_id: UUID
    vision_analysis: dict
    rag_guidance: dict
    generated_report: str
    confidence_score: float = Field(
        ..., ge=0.0, le=1.0, description="Overall confidence in the recommendation."
    )
    safety_disclaimer: str = "Always follow standard safety procedures and consult a supervisor if unsure."

class HealthStatus(BaseModel):
    """Response model for the health check endpoint."""
    status: str = "ok"

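To see which identifiers the `plant_id` pattern above accepts, the same regular expression can be exercised with the standard library alone. This only illustrates the pattern; in the API itself the check is performed by Pydantic, and the helper name `is_valid_plant_id` is hypothetical:

```python
import re

# Same pattern as the `plant_id` field in DiagnosisRequest.
PLANT_ID_PATTERN = re.compile(r"^[A-Z]{3,4}-\w{2,3}$")


def is_valid_plant_id(value: str) -> bool:
    """True for 3-4 uppercase letters, a hyphen, then 2-3 word characters."""
    return PLANT_ID_PATTERN.match(value) is not None


for candidate in ["PUNE-IN", "MEX-GTO", "pune", "PUNE-INDIA"]:
    print(candidate, is_valid_plant_id(candidate))
```

Note that `\w` also admits lowercase letters, digits, and underscores, so a value such as `PUNE-in` passes. If that is not intended, the suffix could be tightened to `[A-Z]{2,3}`.
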
#### `app/security.py`
This file contains our authentication logic. Using FastAPI's dependency injection system (`Depends`), we can protect endpoints by simply adding `Depends(authorize_request)` to the function signature.

In [None]:
%%writefile app/security.py

from fastapi import Header, HTTPException
from .config import settings

async def authorize_request(x_auth_token: str = Header(..., alias="X-Auth-Token")):
    """
    Dependency to check for a valid authentication token in the request header.
    
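    Raises:
        HTTPException: 401 if the token does not start with the expected prefix
            or carries no user ID. A missing header never reaches this function:
            FastAPI rejects it with a 422 validation error first.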
    
    Returns:
        str: The user ID extracted from the token.
    """
    if not x_auth_token.startswith(settings.VALID_AUTH_TOKEN_PREFIX):
        raise HTTPException(
            status_code=401, 
            detail="Invalid or missing authentication token."
        )
    # Extract user ID from "Bearer technician-<user_id>"
    user_id = x_auth_token.removeprefix(settings.VALID_AUTH_TOKEN_PREFIX)
    if not user_id:
        raise HTTPException(status_code=401, detail="User ID missing in token.")
    return user_id

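The prefix check and user-ID extraction can be tried in isolation, without FastAPI. The sketch below mirrors the logic of `authorize_request`; the function name `extract_user_id` and the use of `ValueError` in place of `HTTPException` are assumptions made for illustration:

```python
VALID_AUTH_TOKEN_PREFIX = "Bearer technician-"  # same value as in app/config.py


def extract_user_id(token: str) -> str:
    """Return the user ID from 'Bearer technician-<user_id>' or raise ValueError."""
    if not token.startswith(VALID_AUTH_TOKEN_PREFIX):
        raise ValueError("Invalid or missing authentication token.")
    user_id = token.removeprefix(VALID_AUTH_TOKEN_PREFIX)  # requires Python 3.9+
    if not user_id:
        raise ValueError("User ID missing in token.")
    return user_id


print(extract_user_id("Bearer technician-abhay"))
```

Because any string after the prefix is accepted, this scheme labels a caller rather than proving who they are, which is why the comment in `config.py` points to a more robust strategy for a real deployment.
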
#### `app/agents.py`
This file isolates the core machine learning logic. For now, it contains placeholder functions that simulate the behavior of our RAG and Vision agents. In the final capstone, this is where we would import and invoke our LangGraph-based agent orchestrator.

In [None]:
%%writefile app/agents.py

from uuid import uuid4
import asyncio
from .models import DiagnosisRequest, DiagnosisResponse

async def run_copilot_inference(payload: DiagnosisRequest) -> DiagnosisResponse:
    """
    This function simulates the full agentic workflow, including I/O-bound operations.
    In the real capstone, this will invoke the LangGraph orchestrator.
    """
    # Simulate network latency for calling different microservices or models
    await asyncio.sleep(0.1) # Simulate call to Vision Agent
    
    # 1. Vision Agent (Simulated)
    vision_output = {
        "defects_found": ["micro-fracture", "surface-discoloration"],
        "confidence": 0.85,
    }

    await asyncio.sleep(0.15) # Simulate call to RAG Agent (retrieval + generation)

    # 2. RAG Agent (Simulated)
    rag_output = {
        "recommended_steps": [
            f"1. For equipment {payload.equipment_id}, inspect the primary coolant line for leaks.",
            "2. Verify torque settings on mounting bolts (Ref: SOP-123, Sec 4.2).",
            "3. Escalate to Level-2 maintenance if vibration exceeds 5mm/s.",
        ],
        "cited_documents": ["SOP-123", "MAINT-GUIDE-V2"],
    }

    # 3. Report Generation (Simulated)
    report = f"Incident Report for {payload.equipment_id} at {payload.plant_id}: Visual inspection found {', '.join(vision_output['defects_found'])}. Recommended action: Follow RAG guidance."

    return DiagnosisResponse(
        request_id=uuid4(),
        vision_analysis=vision_output,
        rag_guidance=rag_output,
        generated_report=report,
        confidence_score=0.91,
    )

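The simulated calls above run one after the other, so their delays add up to roughly 250 ms. If the Vision and RAG agents do not depend on each other's output (an assumption; the real orchestrator may differ), they could be awaited concurrently with `asyncio.gather`. The helper names `call_vision_agent`, `call_rag_agent`, and `run_agents_concurrently` are hypothetical stand-ins:

```python
import asyncio


async def call_vision_agent() -> dict:
    await asyncio.sleep(0.1)  # simulated Vision Agent latency
    return {"defects_found": ["micro-fracture"], "confidence": 0.85}


async def call_rag_agent(equipment_id: str) -> dict:
    await asyncio.sleep(0.15)  # simulated RAG Agent latency
    return {"recommended_steps": [f"1. Inspect {equipment_id}."]}


async def run_agents_concurrently(equipment_id: str) -> tuple:
    # gather() runs both coroutines concurrently and returns results in call
    # order, so the total wait is close to the slower call, not the sum.
    vision_output, rag_output = await asyncio.gather(
        call_vision_agent(), call_rag_agent(equipment_id)
    )
    return vision_output, rag_output


# In a notebook cell, use `await run_agents_concurrently(...)` instead,
# because Jupyter already runs an event loop.
vision, rag = asyncio.run(run_agents_concurrently("CNC-A-102"))
print(vision["defects_found"], rag["recommended_steps"])
```

With the delays used here, the concurrent version should take about as long as the slower RAG call, which leaves more headroom inside the 500 ms budget from the scenario.
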
#### `app/main.py`
This is the heart of our application. It ties everything together: it creates the `FastAPI` app instance, registers the middleware, and defines the API endpoints that use our models, security dependencies, and agent logic.

In [None]:
%%writefile app/main.py

import time
import logging
from uuid import uuid4

from fastapi import FastAPI, Request, Depends
from .config import settings
from .models import DiagnosisRequest, DiagnosisResponse, HealthStatus
from .security import authorize_request
from .agents import run_copilot_inference

# --- Application Setup ---
app = FastAPI(
    title=settings.APP_TITLE,
    version=settings.APP_VERSION,
    description="API for interacting with the Manufacturing Copilot agents.",
)

# --- Logging Configuration ---
# In a real app, you might use a more advanced logging setup (e.g., Loguru)
# and configure it to output structured JSON logs.
logging.basicConfig(
    level=settings.LOG_LEVEL.upper(),
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger("manufacturing_copilot_api")


# --- Middleware for Observability ---
@app.middleware("http")
async def add_observability_headers(request: Request, call_next):
    """
    Middleware to add custom headers for timing and tracing.
    This is crucial for monitoring and debugging in a production environment.
    """
    trace_id = str(uuid4())
    # Make trace_id accessible in endpoint logging
    request.state.trace_id = trace_id
    
    start_time = time.perf_counter()
    
    # Process the request
    response = await call_next(request)
    
    # Calculate duration
    duration_ms = (time.perf_counter() - start_time) * 1000
    
    # Add custom headers to the response
    response.headers["X-Request-Trace-ID"] = trace_id
    response.headers["X-Response-Time-ms"] = f"{duration_ms:.2f}"
    
    logger.info(
        f"Request {request.method} {request.url.path} - Status {response.status_code} - Completed in {duration_ms:.2f}ms",
        extra={"trace_id": trace_id, "duration_ms": duration_ms, "path": request.url.path}
    )
    
    # Warn if latency exceeds our SLA (Service Level Agreement)
    if duration_ms > 500:
        logger.warning(
            f"High latency detected: {duration_ms:.2f}ms on path {request.url.path}",
            extra={"trace_id": trace_id}
        )
        
    return response


# --- API Endpoints ---

@app.get("/health", response_model=HealthStatus, tags=["Monitoring"])
async def health_check():
    """
    Health check endpoint to verify that the service is running.
    This is used by load balancers and monitoring systems to ensure the API is alive.
    """
    return HealthStatus(status="ok")


@app.post("/v1/diagnose", response_model=DiagnosisResponse, tags=["Copilot"])
async def diagnose_problem(
    request: Request,
    payload: DiagnosisRequest,
    user_id: str = Depends(authorize_request)
):
    """
    Main endpoint to diagnose a manufacturing problem.
    
    This endpoint orchestrates the Vision and RAG agents to provide a comprehensive
    analysis and recommendation. It requires a valid technician auth token.
    """
    trace_id = request.state.trace_id
    logger.info(
        f"Received diagnosis request from user '{user_id}' for plant '{payload.plant_id}'.",
        extra={"trace_id": trace_id}
    )
    
    # In the real project, this is where you'd call your LangGraph `app.ainvoke()`
    response = await run_copilot_inference(payload)
    
    return response

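The `extra={...}` fields passed to `logger.info` above are attached to each log record, but the plain-text format string never prints them. One way to surface them is a small JSON formatter built on the standard `logging` module. This is a sketch rather than the notebook's actual setup; the class name `JsonFormatter`, the logger name `json_demo`, and the chosen field list are assumptions:

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    # Fields set via `extra=` that are copied into the output when present.
    EXTRA_FIELDS = ("trace_id", "duration_ms", "path")

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        for field in self.EXTRA_FIELDS:
            if hasattr(record, field):
                payload[field] = getattr(record, field)
        return json.dumps(payload)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
demo_logger = logging.getLogger("json_demo")
demo_logger.addHandler(handler)
demo_logger.setLevel(logging.INFO)
demo_logger.propagate = False  # avoid duplicate output via the root logger

demo_logger.info("Request completed", extra={"trace_id": "abc-123", "duration_ms": 42.5})
```

Each line of output is then a self-contained JSON object that a log aggregator can index by `trace_id`.
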
### 2. Running the API Locally

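Now that we have created the application files, we can run the API server using `uvicorn`, an ASGI server. The `uvicorn[standard]` extra installs `uvloop` and `httptools`, which Uvicorn uses for better performance when they are available.

Run these commands in a terminal (for example, the VS Code terminal) from the project root, the directory that contains `app/`.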

```bash
# Install dependencies first
pip install "fastapi[all]" "uvicorn[standard]" "python-dotenv"

# Run the server
# --reload will automatically restart the server when you change code
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

After running the command, you can access the interactive API documentation (provided by Swagger UI) at [http://127.0.0.1:8000/docs](http://127.0.0.1:8000/docs). This UI allows you to explore and test your API endpoints directly from the browser.

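Besides the Swagger UI, the endpoint can be called from any HTTP client. The sketch below builds a request with the standard library's `urllib`; it assumes the server from the command above is listening on `127.0.0.1:8000`, and the helper name `build_diagnose_request` is hypothetical:

```python
import json
import urllib.request


def build_diagnose_request(
    base_url: str, token: str, payload: dict
) -> urllib.request.Request:
    """Prepare a POST to /v1/diagnose with the JSON body and auth header."""
    return urllib.request.Request(
        url=f"{base_url}/v1/diagnose",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
        method="POST",
    )


request = build_diagnose_request(
    "http://127.0.0.1:8000",
    "Bearer technician-abhay",
    {
        "plant_id": "PUNE-IN",
        "equipment_id": "CNC-B-205",
        "problem_description": "Loud grinding noise and excess vibration.",
    },
)

# Sending the request requires the server to be running:
# with urllib.request.urlopen(request) as response:
#     print(response.headers["X-Request-Trace-ID"], json.load(response))
```

The commented-out lines show where the trace ID added by the middleware would appear on the client side.
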
### 3. API Contract Testing

Before deploying, we must verify that our API behaves as expected. FastAPI's `TestClient` sends requests to the app in-process, so no running server is needed, and lets us assert that the responses are correct. This is a form of **contract testing**: ensuring the API adheres to its defined schema and behavior.

First, let's create the test file.

#### `tests/test_main.py`

In [None]:
%%writefile tests/test_main.py

from fastapi.testclient import TestClient
from app.main import app

client = TestClient(app)

def test_health_check():
    """Tests the /health endpoint."""
    response = client.get("/health")
    assert response.status_code == 200
    assert response.json() == {"status": "ok"}

def test_diagnose_endpoint_success():
    """Tests a successful call to the /v1/diagnose endpoint."""
    sample_payload = {
        "plant_id": "PUNE-IN",
        "equipment_id": "CNC-B-205",
        "problem_description": "The machine is making a loud grinding noise and producing excess vibration.",
    }
    
    headers = {"X-Auth-Token": "Bearer technician-abhay"}

    response = client.post("/v1/diagnose", json=sample_payload, headers=headers)

    # 1. Check for success status code
    assert response.status_code == 200

    # 2. Validate response headers
    assert "X-Request-Trace-ID" in response.headers
    assert "X-Response-Time-ms" in response.headers

    # 3. Validate response body structure
    data = response.json()
    assert "request_id" in data
    assert "vision_analysis" in data
    assert "rag_guidance" in data
    assert 0.0 <= data["confidence_score"] <= 1.0
    assert data["rag_guidance"]["recommended_steps"][0].startswith("1. For equipment CNC-B-205")

def test_diagnose_endpoint_auth_error_invalid_token():
    """Tests that the endpoint correctly rejects a request with a bad token."""
    sample_payload = {
        "plant_id": "PUNE-IN",
        "equipment_id": "CNC-B-205",
        "problem_description": "The machine is making a loud grinding noise.",
    }
    
    headers = {"X-Auth-Token": "invalid-token"}
    response = client.post("/v1/diagnose", json=sample_payload, headers=headers)
    assert response.status_code == 401
    assert response.json()["detail"] == "Invalid or missing authentication token."

def test_diagnose_endpoint_auth_error_no_token():
    """Tests that the endpoint correctly rejects a request with a missing token."""
    sample_payload = {
        "plant_id": "PUNE-IN",
        "equipment_id": "CNC-B-205",
        "problem_description": "The machine is making a loud grinding noise.",
    }
    
    response = client.post("/v1/diagnose", json=sample_payload)
    # A missing required header fails FastAPI's request validation, so the response is 422, not 401
    assert response.status_code == 422 
    assert "field required" in response.text
    assert "X-Auth-Token" in response.text


def test_diagnose_endpoint_validation_error():
    """Tests that Pydantic validation catches bad input."""
    # Invalid plant_id format
    invalid_payload = {
        "plant_id": "pune", # Does not match regex
        "equipment_id": "CNC-B-205",
        "problem_description": "The machine is making a loud grinding noise.",
    }
    
    headers = {"X-Auth-Token": "Bearer technician-test"}
    
    response = client.post("/v1/diagnose", json=invalid_payload, headers=headers)
    
    # FastAPI returns a 422 Unprocessable Entity for validation errors
    assert response.status_code == 422
    assert "plant_id" in response.text
    assert "string does not match regex" in response.text

Now, we can run our tests using `pytest`.

```bash
# Install pytest
pip install pytest

# Run the tests
pytest
```

Executing these tests validates our API's core functionality, error handling, and data schemas, giving us confidence to proceed toward deployment.

## 🧾 Operational Readiness Checklist

Before this API can go into production, the following items must be addressed. This is a crucial step in the MLOps lifecycle.

-   [ ] **Dependency Management:** All Python dependencies are pinned in a `requirements.txt` file for reproducible builds.
-   [ ] **Configuration Management:** All hardcoded values (like model names or thresholds) are moved to a configuration file or environment variables.
-   [ ] **Security:** HTTPS is enabled at the load balancer level (e.g., using GCP Cloud Run's managed certificates).
-   [ ] **IAM & Permissions:** The service runs under a dedicated service account with the minimum required permissions (e.g., read-only access to model buckets).
-   [ ] **Logging & Auditing:** Structured logs (JSON format) are enabled and configured to be stored for a required retention period (e.g., 7 years for compliance).
-   [ ] **Secrets Management:** All sensitive values (API keys, database URLs) are loaded from a secure secret manager (like GCP Secret Manager) and not stored in code.


## 🧪 Lab Assignment: Enhance the API

1.  **Verify the Health Check Endpoint:**
    -   Confirm that the `/health` endpoint defined in `app/main.py` returns a simple JSON response like `{"status": "ok"}` with a 200 status code. This is essential for production monitoring systems to know if your service is alive.

2.  **Implement Streaming Response:**
    -   The RAG agent might take a few seconds to generate a detailed response. To improve user experience, modify the `/v1/diagnose` endpoint to stream the response.
    -   Use `fastapi.responses.StreamingResponse` with an async generator (an `async def` function that uses `yield`) to send parts of the response as they become available (e.g., yield the vision analysis first, then stream the RAG steps one by one).

3.  **Perform Load Testing:**
    -   Install a load testing tool like `locust` (`pip install locust`).
    -   Write a simple `locustfile.py` to send requests to your `/v1/diagnose` endpoint.
    -   Run a test with 50 concurrent users and analyze the results. Does the response time stay within your 500ms SLA?

4.  **Draft an API SLA Document:**
    -   Create a markdown file (`API_SLA.md`) that defines the service level agreement for your API. Include:
        -   **Uptime Commitment:** e.g., 99.9%
        -   **Latency Commitment:** e.g., P95 response time < 500ms
        -   **Support:** How to report issues.
        -   **Exclusions:** What conditions are not covered (e.g., downstream system failures).

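For the latency commitment in the SLA item above, a percentile is more informative than an average because it exposes the slow tail. A minimal nearest-rank sketch, assuming you have collected response times in milliseconds (for example from the `X-Response-Time-ms` header or a load-test export); the function name `percentile_nearest_rank` and the sample values are illustrative only:

```python
import math
from typing import Sequence


def percentile_nearest_rank(values: Sequence[float], percent: int) -> float:
    """Smallest sample such that at least `percent`% of samples are <= it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(percent * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Made-up sample durations in milliseconds, not measured results.
durations_ms = [210.0, 255.3, 248.9, 262.1, 251.0, 540.2, 249.7, 258.4, 253.3, 250.8]
p95 = percentile_nearest_rank(durations_ms, 95)
print(f"P95 = {p95} ms, within 500 ms SLA: {p95 < 500}")
```

With only ten samples, a single slow request dominates the P95, so a load test should collect far more samples before the number is compared against the SLA.
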

## ✅ Checklist for this Notebook

- [X] FastAPI application structured with Pydantic models for clear data contracts.
- [X] Authentication dependency implemented to secure the endpoint.
- [X] Observability middleware added to provide timing and tracing for every request.
- [X] Unit and integration tests written using `TestClient` to ensure correctness and validate error handling.
- [ ] **TODO:** Complete the Lab Assignment (health check, streaming, load testing, and the SLA draft).


## 📚 References and Further Reading

-   [FastAPI Official Documentation](https://fastapi.tiangolo.com/) - The best place to learn everything about FastAPI.
-   [Pydantic Documentation](https://pydantic-docs.helpmanual.io/) - Essential for understanding data validation and schema definition.
-   [Test-Driven Development with FastAPI and Docker](https://testdriven.io/blog/fastapi-docker-tdd/) - A great tutorial on building and testing FastAPI apps.
-   [Locust Documentation](https://docs.locust.io/en/stable/) - For learning how to load test your new API.
