# Deploy the LangChain Agent

Here, you’ll deploy your chatbot as a FastAPI endpoint and create a Streamlit UI to interact with that endpoint. This notebook is not meant to be executed; it only shows the code.

## Serve the Agent With FastAPI
FastAPI is a modern, high-performance web framework for building APIs with Python based on standard type hints. It offers fast development, fast runtime performance, and strong community support, which makes it a good choice for serving your chatbot agent.

### Pydantic models
You’ll serve your agent through a POST request, so the first step is to define what data you expect in the request body and what data the response returns. FastAPI does this with Pydantic __models__:
- The API receives a query (str) as input.
- The API returns:
    - the input query (str)
    - the answer (str)
    - the intermediate steps (a list, as we saw in the previous lesson)

In [None]:
from pydantic import BaseModel

class HospitalQueryInput(BaseModel):
    text: str

class HospitalQueryOutput(BaseModel):
    input: str
    output: str
    intermediate_steps: list[str]
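
To illustrate what the request model buys you, here is a minimal sketch of Pydantic validation on its own, outside FastAPI. The example question is made up for illustration. When FastAPI receives a body that fails this kind of validation, it rejects the request with a 422 response before your endpoint code runs.

```python
from pydantic import BaseModel, ValidationError


class HospitalQueryInput(BaseModel):
    text: str


# A well-formed body is parsed into a typed object
query = HospitalQueryInput(text="Which hospital has the shortest wait time?")
print(query.text)

# A body without the required "text" field is rejected
try:
    HospitalQueryInput()
    error_fields = []
except ValidationError as exc:
    # Each error records the location of the offending field
    error_fields = [err["loc"][0] for err in exc.errors()]

print(error_fields)
```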


### Async retry decorator
One great feature of FastAPI is its asynchronous serving capabilities. Because your agent calls OpenAI models hosted on an external server, there will always be latency while your agent waits for a response. This is a perfect opportunity for you to use asynchronous programming.

Instead of waiting for OpenAI to respond to each of your agent’s requests, you can have your agent make multiple requests in a row and store the responses as they’re received. This will save you a lot of time if you have multiple queries you need your agent to respond to.
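
To make the benefit concrete, here is a minimal sketch that simulates slow model calls with `asyncio.sleep`. The `fake_llm_call` coroutine and its 0.2-second latency are stand-ins chosen for illustration; they are not part of the chatbot code.

```python
import asyncio
import time


async def fake_llm_call(query: str, latency: float = 0.2) -> str:
    """Hypothetical stand-in for a request to a remote model."""
    await asyncio.sleep(latency)  # simulates network wait without blocking
    return f"answer to: {query}"


async def answer_all(queries: list[str]) -> list[str]:
    # gather() schedules every call at once and returns results in input order
    return await asyncio.gather(*(fake_llm_call(q) for q in queries))


queries = ["query 1", "query 2", "query 3"]
start = time.perf_counter()
answers = asyncio.run(answer_all(queries))
elapsed = time.perf_counter() - start

# The three waits overlap, so the total is close to one latency, not three
print(answers, f"{elapsed:.2f}s")
```

Run sequentially, the same three calls would take at least 0.6 seconds; run concurrently, they take roughly as long as the slowest single call.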

As discussed previously, there can sometimes be intermittent connection issues with Neo4j that are usually resolved by establishing a new connection. Because of this, you’ll want to implement retry logic that works for asynchronous functions. This decorator retries any function it decorates up to three times, with a delay of 1 second between attempts (unless specified otherwise).

In [None]:
import asyncio
from functools import wraps


def async_retry(max_retries: int = 3, delay: int = 1):
    """Retry an async function up to max_retries times, sleeping delay seconds after each failure."""
    def decorator(func):
        @wraps(func)  # keep the decorated function's name and docstring
        async def wrapper(*args, **kwargs):
            for attempt in range(1, max_retries + 1):
                try:
                    return await func(*args, **kwargs)
                except Exception as e:
                    print(f"Attempt {attempt} failed: {str(e)}")
                    await asyncio.sleep(delay)

            # Every attempt raised, so give up
            raise ValueError(f"Failed after {max_retries} attempts")

        return wrapper

    return decorator
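
As a quick check of how the decorator behaves, the sketch below applies it to a function that fails twice before succeeding. The `flaky_query` function and the `calls` counter are hypothetical and exist only for this demonstration; the decorator is repeated in compact form so the example runs on its own.

```python
import asyncio


def async_retry(max_retries: int = 3, delay: float = 1):
    def decorator(func):
        async def wrapper(*args, **kwargs):
            for attempt in range(1, max_retries + 1):
                try:
                    return await func(*args, **kwargs)
                except Exception as e:
                    print(f"Attempt {attempt} failed: {e}")
                    await asyncio.sleep(delay)
            raise ValueError(f"Failed after {max_retries} attempts")
        return wrapper
    return decorator


calls = {"count": 0}


@async_retry(max_retries=3, delay=0)  # delay=0 keeps the demo fast
async def flaky_query() -> str:
    """Hypothetical function that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("connection dropped")
    return "connected"


result = asyncio.run(flaky_query())
print(result, calls["count"])
```

The first two attempts raise and are logged; the third returns normally, so the caller never sees the transient errors.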

### FastAPI application and view setup

For a single endpoint plus a health check, with no authentication, it makes little sense to split the code into app.py, crud.py, and views.py and use routers, as is usually done in a FastAPI project. The comments in the code mark where each part would live in that layout.

The code below sets up the FastAPI web service. It imports several key components:

- FastAPI: The web framework for creating the API endpoints
- hospital_rag_agent_executor: The core agent that handles hospital-related queries using RAG
- Data Models: HospitalQueryInput and HospitalQueryOutput for request/response validation
- Retry Utility: the async_retry decorator for handling intermittent failures we just explained

In [None]:
from fastapi import FastAPI
from agents.hospital_rag_agent import hospital_rag_agent_executor
from models.hospital_rag_query import HospitalQueryInput, HospitalQueryOutput
from utils.async_utils import async_retry

# app.py
app = FastAPI(
    title="Hospital Chatbot",
    description="Endpoints for a hospital system graph RAG chatbot",
)

# crud.py
@async_retry(max_retries=10, delay=1)
async def invoke_agent_with_retry(query: str):
    """Retry the agent if a tool fails to run.

    This can help when there are intermittent connection issues
    to external APIs.
    """
    return await hospital_rag_agent_executor.ainvoke({"input": query})

# views.py
@app.get("/")
async def get_status():
    return {"status": "running"}

@app.post("/hospital-rag-agent")
async def query_hospital_agent(query: HospitalQueryInput) -> HospitalQueryOutput:
    query_response = await invoke_agent_with_retry(query.text)
    query_response["intermediate_steps"] = [
        str(s) for s in query_response["intermediate_steps"]
    ]

    return query_response
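
The endpoint converts every intermediate step to a string because the response model declares `intermediate_steps: list[str]`. The sketch below shows that conversion in isolation, assuming each step is an (action, observation) pair. `FakeAction`, the tool name, and the sample values are hypothetical stand-ins, not the agent's real objects.

```python
from collections import namedtuple

# Hypothetical stand-in for the action objects an agent returns
FakeAction = namedtuple("FakeAction", ["tool", "tool_input"])

query_response = {
    "input": "What is the wait time at hospital A?",
    "output": "About 20 minutes.",
    "intermediate_steps": [
        (FakeAction(tool="Waits", tool_input="hospital A"), 20),
    ],
}

# Same conversion as the endpoint: every step becomes a plain string
query_response["intermediate_steps"] = [
    str(s) for s in query_response["intermediate_steps"]
]

print(query_response["intermediate_steps"])
```

After the conversion the dictionary holds only strings and a list of strings, which matches the fields of `HospitalQueryOutput`.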

## Testing FastAPI

### With curl
We need to launch the app from the folder where it's stored (./source_code_step_5/chatbot_api/src) with `uvicorn main:app --host 0.0.0.0 --port 8000 --env-file .env`.

We need to pass the .env file so that uvicorn loads the environment variables. Then we can test the API with curl:

In [None]:
> source_code_step_5/chatbot_api/src
> curl http://localhost:8000/docs#/
> curl -X 'POST' \
  'http://localhost:8000/hospital-rag-agent' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "text": "how are you"
}'
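
The same POST request can be built from Python with only the standard library. This is a sketch rather than part of the project: `build_agent_request` is a hypothetical helper, and actually sending the request only works while the API is running on localhost:8000.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/hospital-rag-agent"


def build_agent_request(text: str, url: str = API_URL) -> urllib.request.Request:
    """Build the same POST request as the curl command (hypothetical helper)."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


request = build_agent_request("how are you")

# Sending only works while the API is running, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```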

### Debugging in VS Code

In VS Code we create a FastAPI launch configuration (in .vscode/launch.json) for the main.py in /source_code_step_5/chatbot_api/src.

Once it is launched, you can test the API through the Swagger UI at http://localhost:8000/docs#/

In [None]:
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python Debugger: FastAPI on src",
            "type": "debugpy",
            "request": "launch",
            "module": "uvicorn",
            "args": [
                "main:app",
                "--reload",
                "--host", "0.0.0.0",
                "--port", "8000"
            ],
            "cwd": "${workspaceFolder}/source_code_step_5/chatbot_api/src",
            "env": {
                "PYTHONPATH": "${workspaceFolder}/source_code_step_5/chatbot_api/src"
            },
            "console": "integratedTerminal",
            "justMyCode": false
        }
    ]
}

## Docker deployment
### Entrypoint for the chatbot
You’ll serve this API with Docker, so you’ll want to define the following entrypoint file to run inside the container. The command

```shell
uvicorn main:app --host 0.0.0.0 --port 8000 
```
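runs the FastAPI application on port 8000 inside the container, listening on all network interfaces. Docker Compose later publishes that port to your machine.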

In [None]:
#!/bin/bash

# Run any setup steps or pre-processing tasks here
echo "Starting hospital RAG FastAPI service..."

# Start the main application
uvicorn main:app --host 0.0.0.0 --port 8000

### Dockerfile for the chatbot
The Dockerfile specifies how the Docker image is built and what the container runs:
- use python:3.11-slim as the base image
- copy the contents of chatbot_api/src/ into the /app directory within the container
- install the dependencies from pyproject.toml
- run entrypoint.sh

In [None]:
# chatbot_api/Dockerfile

FROM python:3.11-slim

WORKDIR /app
COPY ./src/ /app

COPY ./pyproject.toml /code/pyproject.toml
RUN pip install /code/.

EXPOSE 8000
CMD ["sh", "entrypoint.sh"]

### Entrypoint for the ETL operations
The entrypoint simply runs the Python ETL script.

In [None]:
#!/bin/bash

# Run any setup steps or pre-processing tasks here
echo "Running ETL to move hospital data from csvs to Neo4j..."

# Run the ETL script
python hospital_bulk_csv_write.py

### Dockerfile for the ETL operations
This image runs the data pipeline service for the Extract, Transform, Load (ETL) operations:
- use python:3.11-slim as the base image
- copy the contents of hospital_neo4j_etl/src/ into the /app directory within the container
- install the dependencies from pyproject.toml
- run entrypoint.sh

In [None]:
FROM python:3.11-slim

WORKDIR /app

COPY ./src/ /app

COPY ./pyproject.toml /code/pyproject.toml
RUN pip install /code/.

CMD ["sh", "entrypoint.sh"]

### docker-compose

In Docker Compose, services are the individual application components, or *containers*, that make up your multi-container application. Each service represents a separate containerized process that performs a specific function, much like the containers in a Kubernetes pod.

In this case we define two services, one for the ETL operations and one for the API, and establish a start order between them. We also pass the .env file to each one.
- The first one, hospital_neo4j_etl, runs the Neo4j loading operations we saw in lesson 3.
- The second one, chatbot_api, is the FastAPI service. It publishes container port 8000 as port 8000 on the host. Because of `depends_on`, Compose starts it only after the hospital_neo4j_etl container has been started. Note that the short `depends_on` form used here only orders startup; it does not wait for the ETL to finish. To wait for completion, Compose offers the long form with `condition: service_completed_successfully`.

Because each service has a build clause, Docker Compose builds an image for each one from the Dockerfile in its context.

In [None]:

services:
  hospital_neo4j_etl:
    build:
      context: ./hospital_neo4j_etl
    env_file:
      - .env

  chatbot_api:
    build:
      context: ./chatbot_api
    env_file:
      - .env
    depends_on:
      - hospital_neo4j_etl
    ports:
      - "8000:8000"

To run the API, along with the ETL you built earlier, open a terminal in the fifth lesson folder, /source_code_step_5, and run:
```bash
$  docker-compose up --build
```