# NOTE

This notebook, and all others involving Docker, cannot be run on Colab, and should be run either on your local machine or on your Vertex AI Workbench instance.

### API Model Integration

In this notebook we integrate the RL models we trained previously into a real-world application using FastAPI. FastAPI is a convenient way to deploy a trained policy and use it in a production environment. The setup involves creating an API that receives one or more environment observations as input and returns the action predicted by the policy for each of them. We will also create a Docker container for the FastAPI app.

In [1]:
!pip install torchrl==0.7.0 gymnasium==0.29 tqdm matplotlib av tensordict==0.7.2 uvicorn fastapi

Collecting torchrl==0.7.0
  Downloading torchrl-0.7.0-cp311-cp311-manylinux1_x86_64.whl.metadata (39 kB)
Collecting gymnasium==0.29
  Downloading gymnasium-0.29.0-py3-none-any.whl.metadata (10 kB)
Collecting av
  Downloading av-14.3.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.7 kB)
Collecting tensordict==0.7.2
  Downloading tensordict-0.7.2-cp311-cp311-manylinux1_x86_64.whl.metadata (9.1 kB)
Collecting uvicorn
  Downloading uvicorn-0.34.2-py3-none-any.whl.metadata (6.5 kB)
Collecting fastapi
  Downloading fastapi-0.115.12-py3-none-any.whl.metadata (27 kB)
Collecting starlette<0.47.0,>=0.40.0 (from fastapi)
  Downloading starlette-0.46.2-py3-none-any.whl.metadata (6.2 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch>=2.6.0->torchrl==0.7.0)
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch>=2.6.0->torchrl==0.7.0)
  Downloading nvidia_cuda_runti

### Imports

In [None]:
from torchrl.envs import (
    Compose, DoubleToFloat,
    StepCounter,
    TransformedEnv, set_exploration_type,
)
from torchrl.modules import ProbabilisticActor, ValueOperator
from torchrl.objectives import ClipPPOLoss
from torchrl.objectives.value import GAE
from torch.distributions import Categorical
from tensordict.nn import TensorDictModule, TensorDictSequential
from torch import nn
from torchrl.envs import GymWrapper
import gymnasium as gym
import torch
# Create the Gymnasium environment once and wrap it for TorchRL
base_env = gym.make("MountainCar-v0", render_mode="rgb_array")
env = GymWrapper(
    base_env, categorical_action_encoding=True, device="cpu"
)

env = TransformedEnv(env, Compose(
    DoubleToFloat(),
    StepCounter(),
))

print(env.action_spec)


### Sample Policy

In [None]:
num_cells = 64

# Simple Actor-Critic Setup

# You can skip these if you want, these are the underlying neural networks.
# Since we are using a Discrete policy, we need to use a Softmax to transform the outputs into action probabilities.
actor_net = nn.Sequential(
    nn.LazyLinear(num_cells),
    nn.Tanh(),
    nn.LazyLinear(num_cells),
    nn.Tanh(),
    nn.LazyLinear(num_cells),
    nn.Tanh(),
    nn.LazyLinear(3),
    nn.Softmax(dim = -1)
)


value_net = nn.Sequential(
    nn.LazyLinear(num_cells),
    nn.Tanh(),
    nn.LazyLinear(num_cells),
    nn.Tanh(),
    nn.LazyLinear(num_cells),
    nn.Tanh(),
    nn.LazyLinear(1),
)


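# Actor Module
# The network ends in a Softmax, so its output is a vector of probabilities.
# We therefore name the key "probs" so that Categorical receives it as `probs`
# rather than interpreting it as unnormalized logits.
policy_module = ProbabilisticActor(
    module = TensorDictModule(
        actor_net, in_keys=["observation"], out_keys=["probs"]
    ),
    spec=env.action_spec,
    in_keys=["probs"],
    distribution_class=Categorical,
    return_log_prob=True,
    # we'll need the log-prob for the numerator of the importance weights
)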

# Critic Module
value_module = ValueOperator(
    module=value_net,
    in_keys=["observation"],
)


### Saving Policy Module

In [None]:
# Saving the full modules with PyTorch (this pickles the whole object, not just the weights)
policy_path = "policy_module.pth"
value_path = "value_module.pth"


torch.save(policy_module, policy_path)
torch.save(value_module, value_path)


### Integrating the Saved Model into FastAPI

Now that your RL models are saved, you can load them from the saved directory in your FastAPI application. This allows your API to use the trained policy to choose an action for each observation it receives. The example code below is stored in `app.py`.

In [None]:
from fastapi import FastAPI, HTTPException, Request
import torch
import os
import gymnasium as gym
from torchrl.envs import GymWrapper, TransformedEnv, Compose, set_exploration_type, DoubleToFloat, StepCounter
import numpy as np
import uvicorn
app = FastAPI()

model_directory = 'src/models'
policy_name = 'policy_module.pth'  # Specify your model filename here
# Full path to model file
model_path = os.path.join(model_directory, policy_name)

# Load the policy module
policy_module = torch.load(model_path, weights_only=False)



# Export the policy so that it takes a raw observation tensor and returns only the action.
# We highly recommend using torch.export for your model, but you can explore
# other alternatives.

base_env = gym.make("MountainCar-v0", render_mode="rgb_array")
env = GymWrapper(
    base_env, categorical_action_encoding=True, device="cpu"
)

env = TransformedEnv(env, Compose(
    DoubleToFloat(),
    StepCounter(),
))

# Build a dummy observation with the right shape and dtype for the export
fake_td = env.base_env.fake_tensordict()
obs = fake_td['observation']

# Warm up the policy module (this also materializes any lazy layers)
policy_module(obs)

with set_exploration_type("DETERMINISTIC"):
    exported = torch.export.export(
        policy_module.select_out_keys("action"),
        args=(),
        kwargs={'observation': obs},
        strict=False,
    )

#### End of exporting

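# Build the callable module once instead of on every request
exported_policy = exported.module()


@app.get("/health")
def health():
    return {"message": "health ok"}


@app.post("/rl")
async def rl(request: Request):
    """
    Feed each observation into the RL model.
    Returns the action (int) chosen for each observation.
    """
    input_json = await request.json()
    if not isinstance(input_json, dict) or "instances" not in input_json:
        raise HTTPException(status_code=400, detail="Missing 'instances' field")

    predictions = []
    for instance in input_json["instances"]:
        # Each instance is one observation, e.g. [position, velocity]
        observation = torch.tensor(instance, dtype=torch.float32)
        output = exported_policy(observation=observation)
        predictions.append({"action": output.detach().numpy().tolist()})
    return {"predictions": predictions}


if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)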
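
The `/rl` endpoint above trusts the shape and range of whatever the client sends. As a minimal sketch of stricter input checking, the helper below validates a payload before it reaches the model. The function name `validate_instances` and the small tolerance `EPS` are hypothetical and not part of the original app; the bounds are the documented MountainCar-v0 observation limits (position in [-1.2, 0.6], velocity in [-0.07, 0.07]).

```python
import math

# Documented MountainCar-v0 observation bounds: [position, velocity]
OBS_LOW = (-1.2, -0.07)
OBS_HIGH = (0.6, 0.07)
EPS = 1e-6  # hypothetical tolerance for float32 rounding at the bounds


def validate_instances(payload):
    """Return the list of observations in `payload`, or raise ValueError."""
    if not isinstance(payload, dict) or "instances" not in payload:
        raise ValueError("payload must be a JSON object with an 'instances' field")
    instances = payload["instances"]
    if not isinstance(instances, list) or not instances:
        raise ValueError("'instances' must be a non-empty list")
    for i, obs in enumerate(instances):
        if not isinstance(obs, list) or len(obs) != len(OBS_LOW):
            raise ValueError(f"instance {i} must be a list of {len(OBS_LOW)} numbers")
        for value, low, high in zip(obs, OBS_LOW, OBS_HIGH):
            # bool is a subclass of int, so reject it explicitly
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise ValueError(f"instance {i} contains a non-numeric value")
            if not math.isfinite(value) or not (low - EPS <= value <= high + EPS):
                raise ValueError(f"instance {i} has a value outside [{low}, {high}]")
    return instances
```

Inside the endpoint, a `ValueError` from this helper could be converted into an `HTTPException` with status code 400 so that clients receive a clear error instead of a server-side stack trace.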


### Create a Dockerfile

Create a Dockerfile in the same directory as your FastAPI app (`app.py`). This file defines the Docker image that includes your app and all its dependencies.

```
# using the base Python image because RL agents often don't need a GPU for inference
FROM --platform=linux/amd64 python:3.11-slim

# Keeps Python from generating .pyc files in the container
ENV PYTHONDONTWRITEBYTECODE=1

# Turns off buffering for easier container logging
ENV PYTHONUNBUFFERED=1

# pip gives a warning if you install packages as root
# set this flag to just ignore the warning
ENV PIP_ROOT_USER_ACTION=ignore

RUN pip install -U pip
WORKDIR /workspace

# install other requirements
COPY docker-requirements.txt .
RUN pip install -r docker-requirements.txt

# copy the rest of the files into the container
COPY src .

# start model service (module name matches app.py; port matches the 8000:8000 mapping used with docker run)
CMD uvicorn app:app --port 8000 --host 0.0.0.0
```
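
One thing to watch for: `COPY src .` copies the contents of `src` directly into `/workspace`, so a relative path such as `src/models` in `app.py` may not exist inside the container. A minimal sketch of one way to handle this is to make the model directory configurable. The environment variable `MODEL_DIR` and the helper `resolve_model_path` are hypothetical names introduced here for illustration; they are not part of the original app.

```python
import os


def resolve_model_path(filename="policy_module.pth", default_dir="src/models"):
    """Build the model path, letting an environment variable override the directory."""
    # MODEL_DIR is a hypothetical variable; fall back to the path used in app.py
    model_dir = os.environ.get("MODEL_DIR", default_dir)
    return os.path.join(model_dir, filename)


print(resolve_model_path())
```

Under this assumption, the container could be started with `-e MODEL_DIR=models` (or the Dockerfile could set `ENV MODEL_DIR=models`), while local runs keep using the default.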

### Build the Docker Image

From your project directory (where your `Dockerfile` and `app.py` are located), run the following command to build the Docker image:

```sudo docker build -t rl_app:1.0.0 .```

### Run the Docker Container

```sudo docker run -p 8000:8000 rl_app:1.0.0```

Docker runs the container and maps port 8000 of the container to port 8000 on your host, allowing us to access the FastAPI application using a browser, the requests library, or Postman.
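
Loading and exporting the model can take a few seconds after the container starts, so requests sent too early may be refused. As a rough sketch, the helper below polls the `/health` endpoint defined in `app.py` until it answers. The function name `wait_for_health` and its default timeout and interval are assumptions made for illustration; only the standard library is used.

```python
import time
import urllib.request


def wait_for_health(url, timeout=30.0, interval=1.0):
    """Poll `url` until it returns HTTP 200 or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except OSError:
            # Covers refused connections, timeouts, and HTTP error responses
            pass
        time.sleep(interval)
    return False


# Example usage (assumes the container is mapped to port 8000 on localhost):
# ready = wait_for_health("http://localhost:8000/health")
```

Calling this before running the test script avoids confusing a slow startup with a broken deployment.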

### Testing `app` using `requests`
This will be a separate file called `test.py`

In [None]:
import requests
import gymnasium as gym

# Only the observation space is needed here, so plain Gymnasium is enough
env = gym.make("MountainCar-v0")

# The endpoint URL
url = 'http://localhost:8000/rl'

# Example observations: sample 3 random observations to send to the app
data = {
    "instances": [
        env.observation_space.sample().tolist() for _ in range(3)
    ]
}

print(data)

# Sending a POST request
response = requests.post(url, json=data)

# Print the response from the server
print("Status Code:", response.status_code)
print("Response:", response.json())


{'instances': [[0.20252582430839539, 0.03165983408689499], [0.5146100521087646, -0.013421635143458843], [0.005742982961237431, -0.05793124437332153]]}


ConnectionError: HTTPConnectionPool(host='localhost', port=8000): Max retries exceeded with url: /predict (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x79dc61908cd0>: Failed to establish a new connection: [Errno 111] Connection refused'))

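The output recorded above is a `ConnectionError`, which means no server was listening on `localhost:8000` when the cell was run; the message also refers to a `/predict` path rather than the `/rl` endpoint used in the code, so it appears to come from a different run. With the container running, the response contains one predicted action per instance. While this notebook provides a basic foundation for serving an RL policy via FastAPI, there is room for optimization, such as using ONNX or another optimized runtime for inference, or building a more detailed inference pipeline in your FastAPI server.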
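
To make the response easier to read, the integer actions can be mapped to the three discrete actions of MountainCar-v0 (0: accelerate left, 1: don't accelerate, 2: accelerate right). The sketch below assumes the response shape produced by the `/rl` endpoint, `{"predictions": [{"action": ...}, ...]}`. The helper name `describe_predictions` is hypothetical, and the example response is made up for illustration; it is not real model output.

```python
# Discrete actions of MountainCar-v0
ACTION_NAMES = {0: "accelerate left", 1: "don't accelerate", 2: "accelerate right"}


def describe_predictions(response_json):
    """Translate each predicted action index into a human-readable name."""
    names = []
    for prediction in response_json["predictions"]:
        action = prediction["action"]
        if isinstance(action, list):
            # Tolerate an action serialized as a one-element list
            action = action[0]
        names.append(ACTION_NAMES[int(action)])
    return names


# Hypothetical response body, for illustration only (not real model output)
example = {"predictions": [{"action": 2}, {"action": 0}, {"action": 1}]}
print(describe_predictions(example))
```

In `test.py`, the same helper could be applied to `response.json()` once the request succeeds.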