
# Serving an Agent with FastAPI

## What is FastAPI and Why Use It for Agents?

FastAPI is a modern, high-performance web framework for building APIs with Python. Released in 2018, it has quickly gained popularity due to its combination of speed, ease of use, and developer-friendly features.

At its core, FastAPI is designed to create REST APIs that can serve requests efficiently while providing robust validation and documentation. For AI agent deployment, FastAPI offers several critical advantages:

- **Asynchronous Support**: AI agents spend much of their time waiting on I/O, such as calls to an LLM provider or a vector store. FastAPI's native async/await support lets a single worker keep serving other requests while one request is waiting, so many agent requests can be in flight at once without blocking each other.

- **Streaming Responses**: Agents frequently generate content incrementally (token by token). FastAPI's streaming response capabilities allow for real-time transmission of agent outputs as they're generated, creating a more responsive user experience.

- **Type Validation**: When working with agents, ensuring proper input formats is crucial. FastAPI uses Pydantic for automatic request validation, catching malformed inputs before they reach your agent and providing clear error messages.

- **Performance**: FastAPI is built on Starlette (the web layer) and Pydantic (data validation), and is typically served by Uvicorn, an ASGI server. It is among the fastest Python web frameworks. For compute-intensive agent applications, this keeps the API layer's overhead small and leaves more resources for the agent itself.

- **Automatic Documentation**: When exposing an agent API to multiple users or teams, documentation becomes essential. FastAPI automatically generates interactive API documentation via Swagger UI and ReDoc, making it easy for others to understand and use your agent.

- **Schema Enforcement**: Pydantic models ensure that both requests to your agent and responses from it conform to predefined schemas, making agent behavior more predictable and easier to integrate with other systems.
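The asynchronous-support point can be made concrete with a minimal, stdlib-only sketch. This is not FastAPI code. It shows the mechanism FastAPI relies on for `async def` endpoints: while one coroutine awaits I/O, the event loop runs the others. `fake_agent_call` is a hypothetical stand-in for an agent that waits on an LLM API. The gain applies only to this kind of waiting, because CPU-bound work inside a coroutine blocks the loop. Run it as a script. In a notebook cell, where an event loop is already running, replace `asyncio.run(...)` with `await ...`.

```python
import asyncio
import time


async def fake_agent_call(query: str, latency: float = 0.2) -> str:
    """Hypothetical agent call that spends its time waiting on I/O."""
    await asyncio.sleep(latency)  # stands in for awaiting an LLM API response
    return f"answer to {query!r}"


async def handle_concurrently(queries: list) -> list:
    # gather() schedules all calls at once; their waits overlap on one event loop
    return await asyncio.gather(*(fake_agent_call(q) for q in queries))


start = time.perf_counter()
answers = asyncio.run(handle_concurrently(["a", "b", "c"]))
elapsed = time.perf_counter() - start

print(answers)
# Total time is close to one latency (about 0.2 s), not the 0.6 s a sequential loop would take
print(f"{len(answers)} requests in {elapsed:.2f} s")
```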

In this tutorial, we'll build a complete API that serves an AI agent with both synchronous and streaming endpoints, demonstrating how FastAPI's features address the specific challenges of deploying agents in production.

## Prerequisites

Before we begin, let's install the necessary packages:

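```python
# Core packages, plus the libraries used by the client and tests later in this tutorial
# (FastAPI's TestClient requires httpx)
!pip install fastapi uvicorn pydantic requests httpx pytest
```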

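The basic streaming endpoint uses FastAPI's built-in `StreamingResponse`. The alternative SSE endpoint shown later uses the `sse-starlette` package, so install it as well if you want to try that version: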

```python
!pip install sse-starlette
```

## Agent Quick Recap

Let's start by defining a simple agent that we'll expose via our API. This could be any agent implementation, but for this tutorial, we'll create a basic example that simulates an AI agent responding to user queries:

```python
import asyncio


class SimpleAgent:
    def __init__(self, name="FastAPI Agent"):
        self.name = name

    def generate_response(self, query):
        """Generate a synchronous response to a user query"""
        return f"Agent {self.name} received: '{query}'\nResponse: This is a simulated agent response."

    async def generate_response_stream(self, query):
        """Generate a streaming response to a user query"""
        prefix = f"Agent {self.name} thinking about: '{query}'\n"
        response = "This is a simulated agent response that streams token by token."

        # Yield the prefix as a single chunk
        yield prefix

        # Stream the response word by word, with a short delay per chunk
        for token in response.split():
            await asyncio.sleep(0.1)  # Simulate thinking time
            yield token + " "


# Test our agent directly
agent = SimpleAgent()
test_query = "Hello, what can you do?"
print(agent.generate_response(test_query))
```

```text
Agent FastAPI Agent received: 'Hello, what can you do?'
Response: This is a simulated agent response.
```


This simple agent can generate both a complete response and a streamed one. In practice, you might replace it with a more sophisticated agent, such as a fine-tuned LLM, a RAG system, or any other AI agent. If the replacement keeps the same two methods, the API code in the rest of this tutorial works unchanged.
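Your real agent may be built on a synchronous (blocking) client. One way to keep the `SimpleAgent` interface in that case is to run the blocking call in a worker thread with `asyncio.to_thread` (Python 3.9+). The sketch below uses a hypothetical `call_blocking_model` function in place of your actual client. It re-chunks the finished text word by word; a client with native streaming would yield tokens as they arrive. In a notebook, use `await collect(...)` instead of `asyncio.run(...)`.

```python
import asyncio
import time
from typing import AsyncIterator


def call_blocking_model(query: str) -> str:
    """Hypothetical stand-in for a synchronous LLM or agent client call."""
    time.sleep(0.05)  # simulate network latency
    return f"Answer to: {query}"


class WrappedAgent:
    """Same two methods as SimpleAgent, backed by a blocking client."""

    def __init__(self, name: str = "Wrapped Agent"):
        self.name = name

    def generate_response(self, query: str) -> str:
        return call_blocking_model(query)

    async def generate_response_stream(self, query: str) -> AsyncIterator[str]:
        # Run the blocking call in a worker thread so the event loop stays free
        text = await asyncio.to_thread(call_blocking_model, query)
        # Re-chunk the finished text; a streaming client would yield tokens directly
        for word in text.split():
            yield word + " "


async def collect(agent: WrappedAgent, query: str) -> str:
    """Consume the async generator and join the chunks."""
    return "".join([chunk async for chunk in agent.generate_response_stream(query)])


print(asyncio.run(collect(WrappedAgent(), "What is FastAPI?")))
```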

## Minimal FastAPI App

Now, let's create a minimal FastAPI application with a health check endpoint:

```python
from fastapi import FastAPI

# Initialize FastAPI app
app = FastAPI(
    title="Agent API",
    description="A simple API that serves an AI agent",
    version="0.1.0"
)

# Create an instance of our agent
agent = SimpleAgent()

# Health check endpoint
@app.get("/health")
def health_check():
    """Check if the API is running"""
    return {"status": "ok", "message": "API is operational"}
```

This creates a basic FastAPI application with metadata and a health check endpoint. The health check is a simple way to verify that your API is running correctly.

## POST /agent - Synchronous Endpoint

Now, let's create a synchronous endpoint for our agent. We'll use Pydantic models to define the request and response structures:

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict


# Define request and response models.
# Pydantic v2 syntax; Pydantic v1 used `class Config: schema_extra = {...}` instead.
class QueryRequest(BaseModel):
    query: str
    context: Optional[str] = None

    # The example appears in the generated OpenAPI docs
    model_config = ConfigDict(
        json_schema_extra={
            "example": {
                "query": "What is FastAPI?",
                "context": "I'm a beginner programmer."
            }
        }
    )


class QueryResponse(BaseModel):
    response: str

    model_config = ConfigDict(
        json_schema_extra={
            "example": {
                "response": "FastAPI is a modern, high-performance web framework for building APIs with Python."
            }
        }
    )


# Create a synchronous endpoint for the agent
@app.post("/agent", response_model=QueryResponse)
def query_agent(request: QueryRequest):
    """Get a response from the agent"""
    # SimpleAgent only uses the query; request.context is accepted but not passed on
    response = agent.generate_response(request.query)
    return QueryResponse(response=response)
```

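This endpoint accepts POST requests with a JSON body containing a required `query` field and an optional `context` field. It returns a JSON object with the agent's answer in `response`. `SimpleAgent` only uses the query, so `context` is validated but not yet passed to the agent. Because `query_agent` is a plain `def` function, FastAPI runs it in a thread pool, so a blocking agent call does not stall the server's event loop.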
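You can run the same Pydantic validation by hand to see what FastAPI does with a malformed body. This sketch assumes Pydantic v2 (`model_validate`) and redeclares a trimmed `QueryRequest` so it runs on its own. `validation_errors` is a hypothetical helper. When validation fails inside an endpoint, FastAPI returns a 422 response whose `detail` list contains equivalent entries, with each location prefixed by `body`.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class QueryRequest(BaseModel):
    query: str
    context: Optional[str] = None


def validation_errors(payload: dict) -> list:
    """Return (location, error type) pairs, or [] if the payload is valid."""
    try:
        QueryRequest.model_validate(payload)
    except ValidationError as exc:
        return [(err["loc"], err["type"]) for err in exc.errors()]
    return []


print(validation_errors({"query": "What is FastAPI?"}))
print(validation_errors({"context": "I'm a beginner programmer."}))
```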

## POST /agent/stream - Token Streaming

For many AI applications, token streaming provides a better user experience. Let's implement a streaming endpoint:

```python
from fastapi.responses import StreamingResponse
import json

@app.post("/agent/stream")
async def stream_agent(request: QueryRequest):
    """Stream a response from the agent token by token"""

    async def event_generator():
        async for token in agent.generate_response_stream(request.query):
            # Format as a JSON object
            data = json.dumps({"token": token})
            yield f"data: {data}\n\n"

    return StreamingResponse(
        event_generator(),
        media_type="text/event-stream"
    )
```

This endpoint streams the agent's response token by token using Server-Sent Events (SSE). The client can process these tokens incrementally as they arrive, enabling a more interactive experience.
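Each SSE frame is a set of `field: value` lines ended by a blank line, which is why the generator yields `data: ...\n\n`. Wrapping each token in JSON matters too. The agent's first chunk ends with a newline, and a raw newline inside a `data:` line would break the frame, because the text after it would no longer belong to the `data` field. The helper below is a small sketch of that framing. `format_sse` is a hypothetical name, and the optional `event` field is an addition the endpoint above does not use. One possible use is a `done` event that signals the end of the stream.

```python
import json
from typing import Any, Optional


def format_sse(payload: Any, event: Optional[str] = None) -> str:
    """Serialize one Server-Sent Events frame: field lines plus a terminating blank line."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # json.dumps escapes newlines inside strings, so the data stays on a single line
    lines.append(f"data: {json.dumps(payload)}")
    return "\n".join(lines) + "\n\n"


print(repr(format_sse({"token": "Agent thinking...\n"})))
print(repr(format_sse({}, event="done")))
```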

For a more sophisticated implementation, you might want to use the `sse-starlette` package:

```python
from sse_starlette.sse import EventSourceResponse

@app.post("/agent/stream-sse")
async def stream_agent_sse(request: QueryRequest):
    """Stream a response using SSE with the sse-starlette package"""

    async def event_generator():
        async for token in agent.generate_response_stream(request.query):
            yield {"data": json.dumps({"token": token})}

    return EventSourceResponse(event_generator())
```

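`EventSourceResponse` handles more of the SSE protocol for you. It turns each yielded dictionary into an SSE event (keys such as `data`, `event`, and `id` map to SSE fields), sends periodic keep-alive pings, and stops the generator when the client disconnects.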

## Creating the Full Application

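Now, let's put everything together into a complete FastAPI application. Save the following code as `fastapi_agent.py` inside a `scripts` directory in your project root. The commands for running the server and the tests below assume this location.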

```python
import asyncio
import json
import os  # used by the optional API-key check added later
from typing import Optional

# Depends, Header and HTTPException are used by the optional API-key check
from fastapi import Depends, FastAPI, Header, HTTPException
from fastapi.responses import StreamingResponse
from pydantic import BaseModel, ConfigDict


# Define our simple agent class
class SimpleAgent:
    def __init__(self, name="FastAPI Agent"):
        self.name = name

    def generate_response(self, query):
        """Generate a synchronous response to a user query"""
        return f"Agent {self.name} received: '{query}'\nResponse: This is a simulated agent response."

    async def generate_response_stream(self, query):
        """Generate a streaming response to a user query"""
        prefix = f"Agent {self.name} thinking about: '{query}'\n"
        response = "This is a simulated agent response that streams token by token."

        # Yield the prefix as a single chunk
        yield prefix

        # Stream the response word by word, with a short delay per chunk
        for token in response.split():
            await asyncio.sleep(0.1)  # Simulate thinking time
            yield token + " "


# Define request and response models (Pydantic v2 syntax)
class QueryRequest(BaseModel):
    query: str
    context: Optional[str] = None

    model_config = ConfigDict(
        json_schema_extra={
            "example": {
                "query": "What is FastAPI?",
                "context": "I'm a beginner programmer."
            }
        }
    )


class QueryResponse(BaseModel):
    response: str

    model_config = ConfigDict(
        json_schema_extra={
            "example": {
                "response": "FastAPI is a modern, high-performance web framework for building APIs with Python."
            }
        }
    )


# Initialize FastAPI app
app = FastAPI(
    title="Agent API",
    description="A simple API that serves an AI agent",
    version="0.1.0"
)

# Create an instance of our agent
agent = SimpleAgent()


# Health check endpoint
@app.get("/health")
def health_check():
    """Check if the API is running"""
    return {"status": "ok", "message": "API is operational"}


# Create a synchronous endpoint for the agent
@app.post("/agent", response_model=QueryResponse)
def query_agent(request: QueryRequest):
    """Get a response from the agent"""
    response = agent.generate_response(request.query)
    return QueryResponse(response=response)


# Create a streaming endpoint for the agent
@app.post("/agent/stream")
async def stream_agent(request: QueryRequest):
    """Stream a response from the agent token by token"""

    async def event_generator():
        async for token in agent.generate_response_stream(request.query):
            # Format as a JSON object
            data = json.dumps({"token": token})
            yield f"data: {data}\n\n"

    return StreamingResponse(
        event_generator(),
        media_type="text/event-stream"
    )
```

## Running the Server

Now that we have our FastAPI application, let's run it with uvicorn:

```bash
# Run in a terminal (not a notebook cell) from the project root, i.e. the folder containing scripts/
uvicorn scripts.fastapi_agent:app --reload
```


The `--reload` flag enables hot reloading, which automatically restarts the server when you make changes to the code. This is helpful during development.

Once running, you can access:
- API documentation at http://localhost:8000/docs
- Alternative documentation at http://localhost:8000/redoc
- Health check endpoint at http://localhost:8000/health
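When scripting against the server, for example in CI before running the client test, it helps to wait until the health check answers. This is a minimal stdlib-only sketch. `wait_for_health` and its default timeout and polling interval are illustrative choices, not part of FastAPI or Uvicorn.

```python
import json
import time
import urllib.request


def wait_for_health(url: str = "http://localhost:8000/health",
                    timeout: float = 10.0, interval: float = 0.5) -> bool:
    """Poll the health endpoint until it reports status "ok" or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                body = json.loads(resp.read().decode("utf-8"))
                if body.get("status") == "ok":
                    return True
        except (OSError, ValueError):
            # OSError covers refused connections, HTTP errors and socket timeouts;
            # ValueError covers a non-JSON body. Either way, retry.
            pass
        time.sleep(interval)
    return False


if __name__ == "__main__":
    print("Server ready" if wait_for_health() else "Server did not become healthy in time")
```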

## Simple Client Test

Let's test our API with a simple Python client:

```python
import requests
import json

# Test the synchronous endpoint
response = requests.post(
    "http://localhost:8000/agent",
    json={"query": "What is FastAPI?"}
)
print("Synchronous Response:")
print(response.json())
print("\n" + "-" * 40 + "\n")

# Test the streaming endpoint
response = requests.post(
    "http://localhost:8000/agent/stream",
    json={"query": "Tell me about streaming"},
    stream=True
)

print("Streaming Response:")
for line in response.iter_lines():
    if line:
        # Parse the SSE format
        line = line.decode('utf-8')
        if line.startswith('data: '):
            data = json.loads(line[6:])
            print(data["token"], end="")
```

```text
Synchronous Response:
{'response': "Agent FastAPI Agent received: 'What is FastAPI?'\nResponse: This is a simulated agent response."}

----------------------------------------

Streaming Response:
Agent FastAPI Agent thinking about: 'Tell me about streaming'
This is a simulated agent response that streams token by token.
```

This script exercises both the synchronous and streaming endpoints of our API. The server from the previous section must be running for it to connect.
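The client above treats every `data:` line as a complete JSON payload, which works for this server. A more defensive parser, sketched below with the standard library only, follows the SSE rules more closely. It joins multiple `data:` lines belonging to one event, waits for the blank line that ends the event, and ignores comment lines such as the keep-alive pings `sse-starlette` sends. `iter_sse_tokens` is a hypothetical helper. You could feed it `response.iter_lines(decode_unicode=True)` from the `requests` call above, assuming the response declares a charset (Starlette adds one for `text/event-stream`).

```python
import json
from typing import Iterable, Iterator


def iter_sse_tokens(lines: Iterable[str]) -> Iterator[str]:
    """Yield the "token" field of each complete SSE event in a stream of text lines."""
    data_lines = []
    for line in lines:
        if line == "":
            # A blank line ends the current event
            if data_lines:
                payload = json.loads("\n".join(data_lines))
                data_lines = []
                if "token" in payload:
                    yield payload["token"]
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips one optional space after the colon
            data_lines.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) and ":" comment lines are ignored
    # An event without a terminating blank line is incomplete and is dropped


sample = ['data: {"token": "Hello "}', "", ": ping", "", 'data: {"token": "world"}', ""]
print("".join(iter_sse_tokens(sample)))
```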

## Adding Basic Auth Key (Optional)

For production use, you might want to add simple API key authentication. Let's extend our FastAPI application to check for an API key:

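```python
import json
import os
import secrets
from typing import Optional

from fastapi import Depends, HTTPException, Header
from fastapi.responses import StreamingResponse


# Function to validate the API key
async def verify_api_key(x_api_key: Optional[str] = Header(None)):
    """Verify the API key provided in the X-API-Key header"""
    # Get the expected API key from an environment variable
    api_key = os.environ.get("API_KEY")

    # If no API key is set in the environment, skip validation
    if not api_key:
        return True

    # If an API key is set but none was sent, return 401
    if not x_api_key:
        raise HTTPException(status_code=401, detail="API Key is missing")

    # If the API key doesn't match, return 403 (constant-time comparison)
    if not secrets.compare_digest(x_api_key.encode(), api_key.encode()):
        raise HTTPException(status_code=403, detail="Invalid API Key")

    return True


# These definitions REPLACE the /agent and /agent/stream endpoints in scripts/fastapi_agent.py.
# Registering the same path again on an existing app adds a second route,
# and the first (unprotected) one would keep matching.
@app.post("/agent", response_model=QueryResponse)
def query_agent(request: QueryRequest, authorized: bool = Depends(verify_api_key)):
    """Get a response from the agent"""
    response = agent.generate_response(request.query)
    return QueryResponse(response=response)


@app.post("/agent/stream")
async def stream_agent(request: QueryRequest, authorized: bool = Depends(verify_api_key)):
    """Stream a response from the agent token by token"""

    async def event_generator():
        async for token in agent.generate_response_stream(request.query):
            data = json.dumps({"token": token})
            yield f"data: {data}\n\n"

    return StreamingResponse(event_generator(), media_type="text/event-stream")
```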

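With this update, if you set the `API_KEY` environment variable before starting the server, the agent endpoints require a matching key in the `X-API-Key` header. FastAPI maps the `x_api_key` parameter to that header. The `/health` endpoint stays open, and if `API_KEY` is unset, no key is required.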

## Unit Tests

Let's create simple unit tests for our FastAPI application using pytest and the FastAPI test client:

```python
from fastapi.testclient import TestClient

from scripts.fastapi_agent import app

client = TestClient(app)


def test_health_check():
    """Test the health check endpoint"""
    response = client.get("/health")
    assert response.status_code == 200
    assert response.json()["status"] == "ok"


def test_agent_endpoint():
    """Test the synchronous agent endpoint"""
    response = client.post(
        "/agent",
        json={"query": "Test query"}
    )
    assert response.status_code == 200
    assert "response" in response.json()
    assert "Agent" in response.json()["response"]


def test_stream_endpoint():
    """Test the streaming agent endpoint"""
    with client.stream("POST", "/agent/stream", json={"query": "Test query"}) as response:
        assert response.status_code == 200
        # Starlette appends a charset to text/* media types ("text/event-stream; charset=utf-8")
        assert response.headers["content-type"].startswith("text/event-stream")
        # Read the full streamed body and check that it contains at least one SSE frame
        content = response.read()
        assert content.startswith(b"data: ")
```

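Save these tests as `test_fastapi_agent.py` in a `tests` directory next to `scripts`, and run them with pytest from the project root. Invoking pytest as `python -m pytest` adds the current directory to `sys.path`, which lets the test module import `scripts.fastapi_agent`: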

```bash
# Run in a terminal from the project root (the folder containing scripts/ and tests/)
python -m pytest -xvs tests/test_fastapi_agent.py
```


## Next Steps

Now that you have a basic FastAPI agent service running, here are some ideas for next steps:

- **Add more advanced agents**: Replace the simple agent with your production-ready agent
- **Implement authentication and rate limiting**: Add more sophisticated authentication and rate limiting for production use
- **Add middleware for logging and monitoring**: Implement middleware for request logging and performance monitoring
- **Set up deployment**: Deploy your FastAPI application to a production environment using Docker, Kubernetes, or a cloud service
- **Implement async database connections**: Add database integrations for storing conversation history or other data
- **Add background tasks**: Use FastAPI's background tasks for long-running operations

## Conclusion

In this tutorial, we've built a FastAPI application that serves a simple AI agent with both synchronous and streaming endpoints. We've covered the basics of setting up FastAPI, defining Pydantic models for request/response validation, implementing both synchronous and streaming endpoints, and adding simple authentication.

FastAPI's combination of performance, automatic documentation, and developer-friendly features makes it an excellent choice for serving AI agents in production. By following the patterns in this tutorial, you can create robust, production-ready APIs for your own AI agents.