# Machine Learning Deployment using FastAPI

In the previous section, w've learned how to quickly deploy machine learning models with an interactive interface using Gradio. It's fast and easy to build and deploy machine learning prototype for other poeple to try. But most of the time in production, we need different kind of user interface that has more flexibility and functionality or maybe we want to integrate the model to our existin interface. In this situation, we may have to decouple our machine learning model with the interface itself. We can focus more on the deploying machine learning models and find a way to let other services communicate to our model.

A web framework is a software package or library that provides a structured and standardized way to build web applications. It typically offers a collection of tools, components, and abstractions that simplify common web development tasks, such as handling HTTP requests, managing routing, interacting with databases, and generating HTML or other types of responses.

![web-framework-scheme](https://ubiops.com/wp-content/uploads/2023/04/Basic-elements-of-a-data-science-web-API.png)

How do to services communicate with each other? They're using an Application Programming Interface. Application Programming Interfaces (or APIs for short) have been around for a long time and basically provide a standardized way of communication between two software applications that are not necessarily of the same type.

Applied to our context of data science, an API allows for the communication between a web page or app and your AI application. The API opens up certain user-defined URL endpoints, which can be used to send or receive requests with data. These endpoints are not dependent on the application: if you update your algorithm, the interface will stay the same. This minimizes the work required to update the running application.

You can read more details about how API works between two applications from [here](https://hygraph.com/blog/how-do-apis-work) 

## FastAPI Web Framework

FastAPI is a modern, high-performance web framework for building APIs (Application Programming Interfaces) with Python. It is **designed to be fast, easy to use, and highly efficient**, making it a popular choice for developing web applications and microservices.

One of the main features of FastAPI is its ability to **generate highly efficient code** by leveraging the type annotations introduced in Python 3.6 and above. By using type hints, **FastAPI can automatically validate data, handle request and response serialization**, and generate interactive API documentation.

While FastAPI is a popular choice for deploying machine learning models, it's important to note that **there are several other frameworks and tools available** for deploying ML models, and the choice ultimately **depends on your specific requirements and preferences**. However, FastAPI does offer several advantages that make it a recommended option in many cases. Here are some reasons why professionals often suggest using FastAPI for deploying machine learning models:

- **Performance**: FastAPI is known for its high performance and low latency. It leverages asynchronous programming techniques and supports asynchronous libraries like TensorFlow and PyTorch, making it suitable for handling computationally intensive ML tasks efficiently. This can be crucial, especially when dealing with real-time or high-throughput applications.

 - **Integration with ML libraries**: FastAPI integrates seamlessly with popular Python ML libraries and frameworks such as TensorFlow, PyTorch, scikit-learn, and more. This makes it easy to incorporate your trained ML models into the API and leverage the rich functionality provided by these libraries.

- **Type annotations and validation**: FastAPI utilizes Python's type annotations to automatically serialize and validate request and response data. This ensures that the input data provided to your ML models is in the expected format, reducing the chances of errors and improving the overall reliability of your API.

- **Interactive documentation**: FastAPI automatically generates detailed API documentation, including information about endpoints, input/output data types, and request/response examples. This documentation is interactive and helps developers and users understand and interact with your ML API more effectively.

- **Scalability and concurrency**: FastAPI is designed to handle high levels of concurrency and can efficiently serve multiple API requests simultaneously. This is beneficial when deploying ML models that require handling multiple requests concurrently, especially in scenarios with high traffic or real-time predictions.

- **Deployment options**: FastAPI can be easily deployed on various platforms, including cloud services and containerization technologies like Docker. It provides flexibility in choosing the deployment option that best suits your needs, whether it's deploying on-premises or in the cloud.


### Sending Data over API

<img src="https://uploads.sitepoint.com/wp-content/uploads/2022/08/1661749125REST-API-Request.png" width=600>

When sending data over HTTP, there are several ways to include the data in the request:

- **Query Parameters**: Data can be sent as part of the URL query parameters. This method is commonly used in GET requests, where the data is appended to the URL in a key-value format. Query parameters are limited in size and may not be suitable for sending large amounts of data.

- **Request Body**: Data can be sent in the body of the HTTP request. This is typically done using methods like POST, PUT, and PATCH. The request body can contain various data formats, such as JSON, XML, or form data. This method allows for sending larger amounts of data and is more flexible in terms of the data structure.

- **Headers**: Data can also be included in the request headers. Headers are used to provide additional information about the request, and certain headers can be used to send data, such as authentication tokens or metadata. However, headers have limitations on the amount of data they can carry, and they may not be suitable for sending large payloads.

When choosing how to send data over HTTP, there are tradeoffs to consider:

- **Data Size and Performance**: Sending data in query parameters may be suitable for small amounts of data but can become impractical for larger payloads. Sending large amounts of data in the request body allows for more flexibility, but it may impact performance due to increased payload size and longer transfer times.

- **Security**: Depending on the nature of the data, its sensitivity, and the communication requirements, different methods may have varying security implications. For example, sending sensitive data as part of the URL query parameters may expose it in server logs or browser history, while sending it in the request body allows for better confidentiality.

- **Encoding and Parsing**: Different data formats require proper encoding and parsing on both the client and server sides. Ensure compatibility between the client and server in terms of data format and handling to avoid issues with data integrity or interpretation.

- **Server Constraints and API Design**: Consider the capabilities and constraints of the server and any limitations imposed by the API design. For example, some servers may have restrictions on the size of the request body or the maximum length of query parameters.

I'm not going to explain everything in here, but you can read the details from [here](https://www.sitepoint.com/rest-api/)

## Before Running

- Because we can't run FastAPI inside a Notebook, we have to use linux terminal to run the python script.
- Instead of using `torch` and `torchvision`, I'll introduce you with `onnxruntime` to execute onnx model and `albumentations` to process the augmentation pipeline
- We'll use the Notebook to explain the process and send a request to our app

In [6]:
# Install requirements
! pip -q install onnxruntime albumentations scipy

You should consider upgrading via the '/usr/bin/python -m pip install --upgrade pip' command.[0m


## Launching Simple App

Similar to the previous section, we're going to deploy a simple image classfier. Before we go building a more complex system, we're going to start by creating an endpoint with these speifications:
- expects an image encoded as a base64 string in the request body
- return a dictionary of scores as JSON response

### Dummy App Code Sample

```
from fastapi import FastAPI
from pydantic import BaseModel
import base64

# create an fastapi instance
app = FastAPI()

# create input data model/definition
# useful to specify and validate incoming input
class ImageInput(BaseModel):
    image_base64: str

@app.post("/classify")
def classify_image(image: ImageInput):
    # Decode the base64 image string
    image_data = base64.b64decode(image.image_base64)

    # Process the image (dummy code)
    # Replace this with your actual machine learning model prediction code
    # Here, we assume the image is classified as 80% cat and 20% dog
    classification_results = {"cat": 0.8, "dog": 0.2}

    return classification_results
```

- **Importing Dependencies**: The necessary dependencies are imported, including `FastAPI` for creating the API, `BaseModel` from `pydantic` for defining the input data model, and `base64` for decoding the base64 image string.

- **Creating the FastAPI Application**: An instance of `FastAPI` is created, named app, which represents the web application.

- **Defining the Input Data Model**: The ImageInput class is defined as a `BaseModel` subclass. It contains a single attribute image_base64 of type str. This class is used to specify the structure and validation rules for the input data expected by the `classify_image` endpoint.

- **Defining the Classification Endpoint**: The `@app.post("/classify")` decorator is used to define a POST endpoint at the `/classify` URL path. When a POST request is made to this endpoint, the classify_image function is invoked.

- **Processing the Image**: Within the `classify_image` function, the base64-encoded image string is decoded using `base64.b64decode`. This converts the string back into its binary image data representation.

- **Machine Learning Model (Dummy Code)**: Next, the code includes a placeholder for the machine learning model prediction code. In the provided dummy code, a dictionary named `classification_results` is created, assuming the image is classified as 80% cat and 20% dog. This section should be replaced with the actual machine learning model prediction code.

- **Returning Classification Results**: The `classification_results` dictionary is returned as the response from the API endpoint. This data will be serialized into JSON format automatically by `FastAPI`.

### Execute app
Open a new terminal session in the notebook and run the following command to start the server:
```
uvicorn dummy_app:app
```

After running the command, don't close the terminal session and return back to this notebook.

### Test Endpoint

Since the `app` is running we can try to communicate with it using `POST` method. This script reads an image file, encodes it into base64, creates a payload with the base64-encoded image, sends a POST request to the `/classify` endpoint, and prints the classification results if the response is successful (status code 200) or prints the error message if an error occurs.

In [1]:
import requests
import base64

# Encode the image file to base64
def encode_image_to_base64(image_path):
    with open(image_path, "rb") as image_file:
        encoded_string = base64.b64encode(image_file.read()).decode("utf-8")
    return encoded_string

# Image file path
image_path = "samples/cannoli.jpg"

# Encode the image file to base64
image_base64 = encode_image_to_base64(image_path)

# API endpoint URL
url = "http://localhost:8000/classify"  # Update with the correct URL

# Payload data
payload = {"image_base64": image_base64}

# Send POST request
response = requests.post(url, json=payload)

# Check response status code
if response.status_code == 200:
    # Successful response
    classification_results = response.json()
    print("Classification Results: ", classification_results)
else:
    # Error occurred
    print("Error:", response.text)


Classification Results:  {'cat': 0.8, 'dog': 0.2}


## Launching Basic App

Now that we're able to communicate with our application, let's integrate our machine learning model into the application and build a new `basic_app.py` In this step, we're want to remove `torch` dependency by using `ONNX Runtime` to process the model and use `albumentations` for performing augmentation instead of `torchvision`. This could be beneficial if we want to reduce the application size by using `numpy` library that is a lot smaller than `torch` library size.

### Convert ONNX Model

We need to convert the pretrained torch model into onnx model format. This is important to reduce the latency by optimizing the model.

In [2]:
import os

ROOT_DIR = os.path.dirname(os.path.abspath(''))

import sys
sys.path.append(ROOT_DIR)

In [3]:
import os
import torch
from src.models import ResNet18, BasicBlock

PRETRAINED_MODEL = os.path.join(ROOT_DIR, 'pretrained/simple-lightning-epoch100/resnet18_epoch99.ckpt')

model = ResNet18(3, 10)

checkpoint = torch.load(PRETRAINED_MODEL)

# The state dict will contains net.layer_name
# Our model doesn't contains `net.` so we have to rename it
state_dict = checkpoint['state_dict']
for key in list(state_dict.keys()):
    if 'net.' in key:
        state_dict[key.replace('net.', '')] = state_dict[key]
        del state_dict[key]

model.load_state_dict(state_dict)
model.eval()

input = torch.rand(1, 3, 384, 384)
_ = model(input)

# Export the model
torch.onnx.export(model,                     # model being run
                  input,                     # model input (or a tuple for multiple inputs)
                  "food101_resenet18.onnx",  # where to save the model (can be a file or file-like object)
                  export_params=True,        # store the trained parameter weights inside the model file
                  do_constant_folding=True,  # whether to execute constant folding for optimization
                  input_names = ['input'],   # the model's input names
                  output_names = ['output'], # the model's output names
                 )

verbose: False, log level: Level.ERROR



### Basic App Code Sample

Here is the sample code to our basic application. It is similar from the previous dummy application, but we use a model processing pipeline to produce real result instead of a dummy process and dummy result.

```
import base64

import albumentations as A
import cv2
import numpy as np
import onnxruntime as ort
from fastapi import FastAPI
from pydantic import BaseModel
from scipy.special import softmax

# Dataset Metadata
RGB_MEAN = [0.51442681, 0.43435301, 0.33421855]
RGB_STD = [0.24099932, 0.246478, 0.23652802]

# Transformation pipeline using Albumentations
transformation_pipeline = A.Compose([
    A.CenterCrop(width=384, height=384),
    A.Normalize(mean=RGB_MEAN, std=RGB_STD)
])

# Load the ONNX model to onnxruntime
onnx_model_path = 'food101_resenet18.onnx'
model = ort.InferenceSession(onnx_model_path)  # Update with the correct model path

# Get model input/output names
input_name = model.get_inputs()[0].name
output_name = model.get_outputs()[0].name

class_names = ['apple_pie', 'bibimbap', 'cannoli', 'edamame', 'falafel', 'french_toast', 'ice_cream', 'ramen', 'sushi', 'tiramisu']
class_names.sort()

app = FastAPI()

class ImageInput(BaseModel):
    image_base64: str

def preprocess_image(image: np.ndarray):
    """Preprocess the input image.

    Note that the input image is in RGB mode.

    Parameters
    ----------
    image: np.ndarray
        Input image from callback.
    """

    image = transformation_pipeline(image=image)['image']
    image = np.transpose(image, (2, 1, 0))
    image = np.expand_dims(image, axis=0)

    return image

@app.post("/classify")
def classify_image(image_input: ImageInput):
    # Decode the base64 image string
    image_data =  np.fromstring(base64.b64decode(image_input.image_base64), np.uint8)
    image = cv2.imdecode(image_data, cv2.IMREAD_COLOR)

    # If input not valid, return dummy data or raise error
    if image is None:
        return {"cat": 0.8, "dog": 0.2}

    # Preprocess image
    processed_image = preprocess_image(image)

    # Run inference using the ONNX model
    prediction = model.run([output_name], {input_name: processed_image})[0] # takes the first output

    # Postprocess result
    prediction = softmax(prediction, axis=1)[0] # Apply softmax to normalize the output
    labeled_result = {name:score for name, score in zip(class_names, prediction.tolist())}

    return labeled_result

```

- **Importing Dependencies**: The necessary dependencies are imported, including `base64` for base64 decoding, `albumentations` for image transformations, `cv2` for image decoding, `numpy` for array operations, `onnxruntime` for running the ONNX model, and `FastAPI` for creating the API.

- **Dataset Metadata and Transformation Pipeline**: The code defines RGB mean and standard deviation values for dataset normalization. It also creates a transformation pipeline using `Albumentations`, which performs a center crop and normalizes the image using the defined mean and standard deviation.

- **Loading the ONNX Model**: The ONNX model is loaded using ONNX Runtime (`ort.InferenceSession`). The path to the model file is provided as a parameter.

- **Preprocessing Function**: The `preprocess_image` function takes an input image and applies the defined transformation pipeline. The image is transposed, expanded, and returned as a preprocessed image.

- **/classify Endpoint**: The `/classify endpoint` is defined using the `@app.post decorator`. When a POST request is made to this endpoint, the `classify_image` function is invoked.

- **Image Decoding and Preprocessing**: The `classify_image` function decodes the base64 image string using `base64.b64decode` and `cv2.imdecode`. It checks if the image is valid and if not, returns dummy data. If the image is valid, it preprocesses the image using the preprocess_image function.

- **Running Inference**: The preprocessed image is passed to the ONNX model using `model.run`. The output predictions are retrieved and stored in the prediction variable.

- **Postprocessing**: The prediction scores are postprocessed by applying the softmax function to normalize the output probabilities. The class names and corresponding scores are combined into a dictionary, labeled_result, where the class name is the key and the score is the value.

- **Returning the Result**: The labeled_result dictionary is returned as the response from the `/classify endpoint`.



### Execute app
Close the previous terminal seesion by pressing `CTRL+C` or close the terminal window. Atter that, run the following command to start the server:

```
uvicorn basic_app:app
```

After running the command, don't close the terminal session and return back to this notebook.

### Test Endpoint

Since the `app` is running we can try to communicate with it using `POST` method. This script reads an image file, encodes it into base64, creates a payload with the base64-encoded image, sends a POST request to the `/classify` endpoint, and prints the classification results if the response is successful (status code 200) or prints the error message if an error occurs.

In [4]:
# Image file path
image_path = "samples/cannoli.jpg"

# Encode the image file to base64
image_base64 = encode_image_to_base64(image_path)

# API endpoint URL
url = "http://localhost:8000/classify"  # Update with the correct URL

# Payload data
payload = {"image_base64": image_base64}

# Send POST request
response = requests.post(url, json=payload)

# Check response status code
if response.status_code == 200:
    # Successful response
    classification_results = response.json()
    print("Classification Results: ", classification_results)
else:
    # Error occurred
    print("Error:", response.text)

Classification Results:  {'apple_pie': 0.007718801964074373, 'bibimbap': 2.0401820677307114e-07, 'cannoli': 0.9777591228485107, 'edamame': 1.0776185263239313e-05, 'falafel': 0.0001419971522409469, 'french_toast': 0.014265178702771664, 'ice_cream': 0.00010071938595501706, 'ramen': 1.887471157147047e-08, 'sushi': 2.448658960929606e-06, 'tiramisu': 6.506219847324246e-07}


This example should be enough to give you on how to deploy the application using a web framework. Once the application is ready, you can deploy the application on a cloud or vm according to your needs. The best way to deploy your model is to use a Docker, you can read more about it [here](https://docker-curriculum.com/).