# Deploying pretrained VLM to FastAPI

### Integrating the pretrained VLM (OWL-ViT) into FastAPI

To aid deployment, we define the following functions, endpoints, and input class:

**Functions**
- `loading_image`: Downloads an image from a URL and converts it to RGB format.
- `detect_objects`: Uses the Hugging Face Transformers library to perform zero-shot object detection on an image.
- `parsing_results`: Filters object detection results by a confidence threshold and extracts the bounding box coordinates.

**FastAPI Application**
- `test` is a simple GET endpoint that returns a greeting.
- `predict` is a POST endpoint that takes a `VLMInput` object as input, performs object detection, and returns the bounding box coordinates of the detected objects.

**VLMInput (predict POST payload)**

`VLMInput` is a Pydantic model that defines the input data structure for the `predict` endpoint. It has three attributes:
- `url`: The URL of the input image.
- `labels`: The object labels to detect, given as a single comma-separated string.
- `threshold`: The confidence threshold for object detection (default: 0.01 in the code below).

```python
import base64
from fastapi import FastAPI
from pydantic import BaseModel
import numpy as np
import io
from PIL import Image
import torch
import os
from transformers import pipeline
import urllib.request 

app = FastAPI()

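def loading_image(url):
    """Download the image at `url` and return it as an RGB PIL image."""
    # The download is saved to a fixed local file, which is overwritten on each call
    urllib.request.urlretrieve(url, "tmpt.png")
    img = Image.open("tmpt.png").convert("RGB")
    return img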

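def detect_objects(img, labels):
    """Run zero-shot object detection for a comma-separated string of labels."""
    checkpoint = "google/owlv2-base-patch16-ensemble"
    # Note: the pipeline is rebuilt on every call, which reloads the model each time
    detector = pipeline(model=checkpoint, task="zero-shot-object-detection")
    # Labels are split on commas only; surrounding whitespace is kept
    candidate_labels = labels.split(",")
    print(candidate_labels)
    predictions = detector(
        img,
        candidate_labels=candidate_labels,
    )
    return predictions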

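def parsing_results(predictions, label, threshold):
    """Keep predictions scoring above `threshold`; map each label to its box."""
    # `label` (the requested labels string) is accepted but not used here.
    predict_dict = {}
    for prediction in predictions:
        if prediction["score"] > threshold:
            # Use a separate name so the `label` argument is not shadowed
            name = prediction["label"]
            box = prediction["box"]
            # Box order is (xmin, xmax, ymin, ymax); a later detection of the
            # same label overwrites an earlier one
            predict_dict[name] = [(box["xmin"], box["xmax"], box["ymin"], box["ymax"])]
    return predict_dict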

@app.get("/{item_id}")
def test(item_id: str):
    # The path parameter must be declared as an argument to be available here
    return {"Hello": f"World_{item_id}"}

class VLMInput(BaseModel):
    url: str
    labels: str
    threshold: float = 0.01

@app.post("/predict")
async def predict(data: VLMInput):
    img = loading_image(data.url)

    # Run zero-shot detection for the requested labels
    predictions = detect_objects(img, data.labels)

    # Keep only predictions above the confidence threshold
    predict_dict = parsing_results(predictions, data.labels, data.threshold)

    # Return the bounding boxes keyed by label
    return predict_dict


if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=8000)
```
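To see what `parsing_results` does without loading the model, it can be called on hand-written predictions. The snippet below is a minimal sketch: the function body repeats the logic from `app.py` so the snippet runs on its own, and `fake_predictions` is hypothetical data written in the list-of-dicts format (`score`, `label`, `box`) that the detection pipeline returns.

```python
def parsing_results(predictions, label, threshold):
    """Same filtering logic as in app.py, repeated so this snippet is self-contained."""
    predict_dict = {}
    for prediction in predictions:
        if prediction["score"] > threshold:
            name = prediction["label"]
            box = prediction["box"]
            predict_dict[name] = [(box["xmin"], box["xmax"], box["ymin"], box["ymax"])]
    return predict_dict


# Hypothetical predictions (not real model output), used only for illustration.
fake_predictions = [
    {"score": 0.62, "label": "helmet", "box": {"xmin": 10, "ymin": 20, "xmax": 110, "ymax": 140}},
    {"score": 0.30, "label": "helmet", "box": {"xmin": 200, "ymin": 30, "xmax": 260, "ymax": 90}},
    {"score": 0.05, "label": "badge", "box": {"xmin": 50, "ymin": 60, "xmax": 70, "ymax": 80}},
]

# With a low threshold all three predictions pass, and the second "helmet"
# box replaces the first because the dictionary is keyed by label.
low = parsing_results(fake_predictions, "helmet,badge", threshold=0.01)
print(low)   # {'helmet': [(200, 260, 30, 90)], 'badge': [(50, 70, 60, 80)]}

# With a higher threshold only the 0.62 "helmet" prediction remains.
high = parsing_results(fake_predictions, "helmet,badge", threshold=0.5)
print(high)  # {'helmet': [(10, 110, 20, 140)]}
```

The low-threshold case shows that the function keeps one box per label, and that the box kept is the last one that passes the threshold, not necessarily the highest-scoring one.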

### Create a Dockerfile
Create a `Dockerfile` in the same directory as your FastAPI app (`app.py`). This file will define the Docker image that includes your app and all its dependencies.

```docker
FROM us-docker.pkg.dev/deeplearning-platform-release/gcr.io/pytorch-gpu.2-2.py310

# Set the working directory in the container
WORKDIR /usr/src/app

COPY . /usr/src/app

# Install any needed packages specified in requirements.txt
RUN pip install --no-cache-dir -r requirements.txt

# Make port 8000 available to the world outside this container
EXPOSE 8000

# Define environment variable
ENV MODEL_PATH=/usr/src/app/models

# Run app.py when the container launches
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
```

### Create a Requirements File
Create a `requirements.txt` file that lists the packages your app depends on. Make sure to include fastapi, uvicorn, transformers, and any other required libraries. Torch is left out of this `requirements.txt` because it is already part of the base Docker image (i.e. the image named in the `FROM` line at the top of the `Dockerfile`).

```txt
fastapi
uvicorn[standard]
pydantic
timm
transformers==4.37.0
accelerate
```


### Build the Docker Image
From your project directory (where your `Dockerfile` and `app.py` are located), run the following command to build the Docker image:
```bash
docker build -t vlm_app .
```

### Run the Docker Container
```bash
docker run -p 8000:8000 vlm_app

# To give the container access to the host GPUs, add the --gpus all flag:
# docker run --gpus all -p 8000:8000 vlm_app
```

Docker runs the container and maps port 8000 of the container to port 8000 on your host, so we can reach the FastAPI application from a browser, the `requests` library, or Postman. Adding `--gpus all` gives the container access to all the GPUs on the system, which allows the model to run on GPU using CUDA rather than on the CPU.

### Testing `vlm_app` using `requests`

```python
import requests

# The endpoint URL
url = 'http://localhost:8000/predict'

# Example image URL and labels
data = {
    "url": "https://th.bing.com/th/id/OIP.WhJW62tRiVMktCDMwRb52gHaJQ?rs=1&pid=ImgDetMain",
    "labels": "helmet, nasa badge",
}

# Sending a POST request
response = requests.post(url, json=data)

# Print the response from the server
print("Status Code:", response.status_code)
print("Response:", response.json())
```

```txt
Status Code: 200
Response: {'helmet': [[57, 176, 336, 493]], ' nasa badge': [[158, 250, 123, 168]]}
```


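The 200 status code and the returned boxes show that the service handled the request and found a match for both labels. Each box is listed as `[xmin, xmax, ymin, ymax]`. Note that the second key, `' nasa badge'`, keeps a leading space because the `labels` string is split on commas without stripping whitespace.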
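On the client side, the response can be tidied before further use. The sketch below assumes the response layout shown above (label keys mapped to lists of `[xmin, xmax, ymin, ymax]` boxes); `tidy_response` is a hypothetical helper, not part of the app. It strips stray whitespace from the label keys and reorders each box to `(xmin, ymin, xmax, ymax)`, the left, upper, right, lower order that PIL's `Image.crop` expects.

```python
def tidy_response(result):
    """Strip whitespace from label keys and reorder boxes to (xmin, ymin, xmax, ymax)."""
    tidy = {}
    for label, boxes in result.items():
        tidy[label.strip()] = [
            (xmin, ymin, xmax, ymax) for xmin, xmax, ymin, ymax in boxes
        ]
    return tidy


# The response body from the request above.
result = {'helmet': [[57, 176, 336, 493]], ' nasa badge': [[158, 250, 123, 168]]}

print(tidy_response(result))
# {'helmet': [(57, 336, 176, 493)], 'nasa badge': [(158, 123, 250, 168)]}
```

Doing this on the client leaves the server code unchanged; stripping the labels inside `detect_objects` would be an alternative.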

# Exercise (20 mins)

**1. Refining Object Detection Results: Applying Confidence Thresholds and Bounding Box Centers**

In object detection tasks, it's essential to refine the results to ensure accuracy and relevance. 

Let's explore how to modify the request so that only confident predictions are returned, and how to adjust the function's return value so that it gives the center of each object's bounding box.