# API Model Integration

In this notebook we integrate the fine-tuned VLM for object detection into a real-world application using FastAPI. The API receives a base64-encoded image and a caption as input and returns the predicted bounding boxes. As before, we package the FastAPI app in a Docker image for deployment.

### Saving the Model and Processor
After training the model in 6.3.2, save it together with its processor using the `save_pretrained()` method provided by the Hugging Face Transformers library. Note that `save_pretrained()` writes a directory of files, so `vlm_model.pth` in the code that follows is a directory name despite its `.pth` suffix.

```python
from transformers import AutoProcessor, AutoModelForZeroShotObjectDetection

# Replace this with your actual model training setup
checkpoint = "google/owlvit-base-patch32"
model = AutoModelForZeroShotObjectDetection.from_pretrained(checkpoint)
processor = AutoProcessor.from_pretrained(checkpoint)

model_path = "vlm_model.pth"
# After training:
model.save_pretrained(model_path)
processor.save_pretrained(model_path)
```
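Before copying the saved directory into the app, it can help to confirm that both calls wrote their files. The following is a minimal sketch using only the standard library. The helper `missing_files` and the tuple `EXPECTED_FILES` are hypothetical names, and the two file names are an assumption about what `save_pretrained()` typically writes for a model and its processor; adjust them to match what you actually see on disk.

```python
from pathlib import Path

# Assumed output of save_pretrained(): a model config and an image-processor config.
EXPECTED_FILES = ("config.json", "preprocessor_config.json")


def missing_files(model_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from `model_dir`."""
    root = Path(model_dir)
    if not root.is_dir():
        # save_pretrained() creates a directory, so anything else means nothing was saved
        return list(expected)
    return [name for name in expected if not (root / name).is_file()]
```

Calling `missing_files("vlm_model.pth")` should return an empty list once both the model and the processor have been saved.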

### Integrating the Saved Model into FastAPI
Now that the model and processor are saved, you can load them from the saved directory in your FastAPI application so that the API runs object detection with the fine-tuned model. The example code below is stored in `app.py` in the `vlm_app` folder.

```python
import base64
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoProcessor, AutoModelForZeroShotObjectDetection
import io
from PIL import Image
import torch
import os

app = FastAPI()

# Fetch the model directory from the environment variable
model_directory = os.getenv("MODEL_PATH", "/usr/src/app/models")
model_filename = "vlm_model.pth"  # Directory written by save_pretrained()

# Full path to the saved model directory
model_path = os.path.join(model_directory, model_filename)

# Load the model and processor
device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForZeroShotObjectDetection.from_pretrained(
    model_path, device_map=device
)
# The processor only preprocesses inputs, so it takes no device argument
processor = AutoProcessor.from_pretrained(model_path)


class VLMInput(BaseModel):
    image: str
    caption: str


@app.post("/predict")
async def predict(data: VLMInput):
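    # Decode the base64 payload and normalise the image to RGB
    image_bytes = base64.b64decode(data.image)
    im = Image.open(io.BytesIO(image_bytes)).convert("RGB")

    # Preprocess the caption and image, then move the tensors to the model's device
    inputs = processor(text=[data.caption], images=im, return_tensors="pt").to(device)

    with torch.no_grad():
        outputs = model(**inputs)
        # PIL reports (width, height); post-processing expects (height, width)
        target_sizes = torch.tensor([im.size[::-1]])
        results = processor.post_process_object_detection(
            outputs, threshold=0.1, target_sizes=target_sizes
        )[0]

    # Each box is [x_min, y_min, x_max, y_max] in pixels of the original image
    bbox = results["boxes"].tolist()
    return bbox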


if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=8000)
```
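
The `predict` endpoint decodes `data.image` without validating it, so a malformed payload surfaces as an unhandled server error. One possible way to guard against this is sketched here with the standard library only. The helper `decode_image_payload` and the limit `MAX_IMAGE_BYTES` are hypothetical and not part of `app.py`; inside the endpoint, the `ValueError` could be converted into a FastAPI `HTTPException` with status code 400.

```python
import base64
import binascii

MAX_IMAGE_BYTES = 10 * 1024 * 1024  # hypothetical 10 MiB cap; tune for your use case


def decode_image_payload(encoded, max_bytes=MAX_IMAGE_BYTES):
    """Decode a base64 string to raw bytes, raising ValueError on bad input."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        raw = base64.b64decode(encoded, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError("image is not valid base64") from exc
    if not raw:
        raise ValueError("image is empty")
    if len(raw) > max_bytes:
        raise ValueError("image exceeds the size limit")
    return raw
```

Keeping the check in a plain function makes it easy to unit test without starting the server.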


### Create a Dockerfile
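Create a `Dockerfile` in the same directory as your FastAPI app (`app.py`). This file defines the Docker image that contains your app and all its dependencies. Because the whole build context is copied to `/usr/src/app` and `MODEL_PATH` is set to `/usr/src/app/models`, the saved `vlm_model.pth` directory must sit in a `models/` subfolder next to `app.py`.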

```docker
FROM us-docker.pkg.dev/deeplearning-platform-release/gcr.io/pytorch-gpu.2-2.py310

# Set the working directory in the container
WORKDIR /usr/src/app

COPY . /usr/src/app

# Install any needed packages specified in requirements.txt
RUN pip install --no-cache-dir -r requirements.txt

# Make port 8000 available to the world outside this container
EXPOSE 8000

# Define environment variable
ENV MODEL_PATH=/usr/src/app/models

# Run app.py when the container launches
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
```


### Create a Requirements File
Create a `requirements.txt` file that lists the packages your app depends on. Make sure to include fastapi, uvicorn, transformers, and any other required libraries. Torch is omitted from this `requirements.txt` because it is already provided by the base Docker image (i.e. the image named in the `FROM` line of the `Dockerfile`).

```txt
fastapi
uvicorn[standard]
pydantic
transformers==4.37.0
accelerate
```
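
To check that an environment actually satisfies this list, the installed versions can be read with `importlib.metadata`. This is a rough sketch that only handles simple requirement lines like the ones above; the helper names `requirement_name` and `installed_versions` are hypothetical, and a full requirements parser would cover more syntax.

```python
import re
from importlib import metadata


def requirement_name(line):
    """Extract the bare package name from a simple requirements.txt line."""
    line = line.split("#", 1)[0].strip()  # drop comments and whitespace
    if not line:
        return None
    # Stop at extras ([...]), version specifiers, markers, or whitespace
    return re.split(r"[\[<>=!~;\s]", line, maxsplit=1)[0]


def installed_versions(lines):
    """Map each required package to its installed version, or None if missing."""
    versions = {}
    for line in lines:
        name = requirement_name(line)
        if name is None:
            continue
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions
```

For example, `installed_versions(open("requirements.txt"))` run inside the container would show whether `transformers` resolved to the pinned `4.37.0`.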


### Build the Docker Image
From your project directory (where your `Dockerfile` and `app.py` are located), run the following command to build the Docker image:
```bash
docker build -t vlm_app .
```

### Run the Docker Container
```bash
docker run -p 8000:8000 vlm_app
```

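Docker runs the container and maps port 8000 of the container to port 8000 on your host, so you can reach the FastAPI application from a browser, the `requests` library, or Postman. As written, the command does not give the container access to a GPU, so the app falls back to the CPU; adding `--gpus all` (which requires the NVIDIA Container Toolkit on the host) makes the GPU visible to it.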
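
The model is loaded when the app starts, so the container may take a while before it accepts requests. A small polling helper can wait for the server before any test is sent. This is a sketch using only the standard library; `wait_for_server` is a hypothetical name, and the `/docs` path assumes FastAPI's default interactive documentation route has not been disabled.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url, timeout=60.0, interval=1.0):
    """Poll `url` until the server answers or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # The server answered, even if with an error status
            return True
        except (urllib.error.URLError, OSError):
            # Not accepting connections yet; retry after a short pause
            time.sleep(interval)
    return False
```

A typical call would be `wait_for_server("http://localhost:8000/docs")`, which returns `False` if the app never comes up within the timeout.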

### Testing `vlm_app` using `requests`

```python
import requests
from base64 import b64encode

# The endpoint URL
url = 'http://localhost:8000/predict'

# Base64-encode the image so it can be passed in JSON
with open("../assets/dog1.jpg", "rb") as f:
    image = b64encode(f.read()).decode("utf-8")

# Example image and caption
data = {
    "image": image,
    "caption": "dog",
}

# Send a POST request
response = requests.post(url, json=data)

# Print the response from the server
print("Status Code:", response.status_code)
print("Response:", response.json())
```

```txt
Status Code: 200
Response: [[32.91341018676758, 2088.8251953125, 3719.854736328125, 5634.173828125]]
```


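The 200 status code and the returned bounding box show that the API is serving predictions from the model. The box is given as `[x_min, y_min, x_max, y_max]` in pixel coordinates of the original image.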
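
To make the returned coordinates easier to read, each box can be turned into corner points and a size. The sketch below applies this to the response shown above; `describe_box` is a hypothetical helper, and it assumes the `[x_min, y_min, x_max, y_max]` pixel layout.

```python
def describe_box(box):
    """Summarise an [x_min, y_min, x_max, y_max] box in pixel units."""
    x_min, y_min, x_max, y_max = box
    return {
        "top_left": (round(x_min), round(y_min)),
        "bottom_right": (round(x_max), round(y_max)),
        "width": x_max - x_min,
        "height": y_max - y_min,
    }


# The response returned by the API in the test above
boxes = [[32.91341018676758, 2088.8251953125, 3719.854736328125, 5634.173828125]]
for box in boxes:
    print(describe_box(box))
```

The same corner points can be passed to a drawing routine to overlay the detection on the image.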