# Using ONNX Runtime with Konduit Serving to serve a PyTorch model

The Open Neural Network Exchange (ONNX) format is supported by a number of deep learning frameworks, including PyTorch, CNTK and MXNet. 

This notebook provides an example of serving a model built in PyTorch with [ONNX Runtime](https://github.com/microsoft/onnxruntime), a cross-platform, high-performance scoring engine for machine learning models.

In [1]:
import os 
from urllib.request import urlretrieve 
import sys 
import numpy as np 
import time 
from PIL import Image 

import onnx
from onnx import optimizer

from konduit import PythonConfig, ServingConfig, InferenceConfiguration, PythonStep
from konduit.server import Server
from konduit.client import Client 
from konduit.utils import default_python_path


## Download file 

For the purposes of this example, we use ONNX model files from [Ultra-Light-Fast-Generic-Face-Detector-1MB](https://github.com/Linzaer/Ultra-Light-Fast-Generic-Face-Detector-1MB) by Linzaer, a lightweight face detection model designed for edge computing devices.

In [None]:
dl_path = os.path.abspath("../data/facedetector/facedetector.onnx")
DOWNLOAD_URL = "https://raw.githubusercontent.com/Linzaer/Ultra-Light-Fast-Generic-Face-Detector-1MB/master/models/onnx/version-RFB-320.onnx"
# urlretrieve does not create missing directories, so make sure the target folder exists
os.makedirs(os.path.dirname(dl_path), exist_ok=True)
if not os.path.isfile(dl_path):
    urlretrieve(DOWNLOAD_URL, filename=dl_path)
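
One caveat of the `os.path.isfile` check is that an interrupted download leaves a partial file behind, which later runs would treat as a complete model. A minimal sketch of one way around this, using only the standard library: download to a temporary name and rename the file once it is complete. The helper name `download_atomically` and the `.part` suffix are illustrative assumptions, not part of Konduit Serving or ONNX.

```python
import os
from urllib.request import urlretrieve


def download_atomically(url, path):
    """Download url to path via a temporary '.part' file; return path."""
    if os.path.isfile(path):
        return path  # already downloaded
    # urlretrieve does not create missing parent directories.
    os.makedirs(os.path.dirname(os.path.abspath(path)), exist_ok=True)
    part_path = path + ".part"
    urlretrieve(url, filename=part_path)
    # os.replace renames the finished file in a single step, so `path`
    # only exists once the download has completed.
    os.replace(part_path, path)
    return path
```

With this helper, the cell above could be reduced to `download_atomically(DOWNLOAD_URL, dl_path)`.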

The following content is based on the PyTorch tutorial [Exporting a Model from PyTorch to ONNX and Running it using ONNX Runtime](https://pytorch.org/tutorials/advanced/super_resolution_with_onnxruntime.html), with modifications.

We start by loading the model and running `onnx.checker.check_model` to check whether the model has a valid schema. 

In [None]:
# Load the ONNX model
model = onnx.load(dl_path)
# model is an onnx.ModelProto object

onnx.checker.check_model(model)

## Optimize 

When loading some models, ONNX may return warnings that the model can be further optimized by removing some unused nodes. 

Use ONNX's optimizer to optimize your ONNX file. The code below is adapted from this [GitHub comment](https://github.com/microsoft/onnxruntime/issues/1899#issuecomment-534806537). 

Note that ONNX's optimizer API is experimental and [may change](https://github.com/onnx/onnx/blob/c08a7b76cf7c1555ae37186f12be4d62b2c39b3b/onnx/optimizer/optimize.h#L1-L2).

In [None]:
onnx_model = onnx.load(dl_path)
passes = ["extract_constant_to_initializer", "eliminate_unused_initializer"]
optimized_model = optimizer.optimize(onnx_model, passes)
onnx.save(optimized_model, dl_path)
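
Because the optimizer API is experimental, its location depends on the installed ONNX version: from ONNX 1.9 onwards the `onnx.optimizer` module is no longer shipped, and the passes live in the separate `onnxoptimizer` package. The sketch below is one way to tolerate both layouts. The helper name `load_optimizer` is an illustrative assumption, and the sketch assumes both modules expose the same `optimize(model, passes)` function.

```python
import importlib


def load_optimizer():
    """Return a module exposing optimize(model, passes), or None if unavailable."""
    # Try the legacy location first, then the standalone package.
    for name in ("onnx.optimizer", "onnxoptimizer"):
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


optimizer = load_optimizer()
if optimizer is None:
    print("No ONNX optimizer found; skipping the optimization step.")
```

If `optimizer` is not `None`, the cell above can call `optimizer.optimize(onnx_model, passes)` unchanged.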

## Python script with PyTorch and ONNX Runtime 

Now that we have an optimized ONNX file, we can serve our model. 

The following code:
- resizes a [PIL](https://python-pillow.org/) image to 240 x 320 pixels (height x width), 
- converts it into a PyTorch Tensor, 
- adds a batch dimension with [`unsqueeze`](https://pytorch.org/docs/stable/torch.html#torch.unsqueeze), 
- converts the Tensor into a NumPy array, then 
- returns the model's output with ONNX Runtime. 
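
The array layout these steps produce can be checked without PyTorch. The sketch below mimics the tensor conversion and the extra dimension with NumPy alone, assuming the image has already been resized to 240 x 320. The function name `to_model_input` and the dummy all-white image are illustrative, and the script in the next cell uses torchvision rather than this helper.

```python
import numpy as np


def to_model_input(rgb):
    """Convert an H x W x 3 uint8 array to a 1 x 3 x H x W float32 array in [0, 1]."""
    # Like torchvision's ToTensor: move channels first and scale to [0, 1].
    chw = rgb.astype(np.float32).transpose(2, 0, 1) / 255.0
    # Like unsqueeze(0): add a leading batch dimension.
    return chw[np.newaxis, ...]


dummy = np.full((240, 320, 3), 255, dtype=np.uint8)  # stand-in for a resized image
batch = to_model_input(dummy)
print(batch.shape, batch.dtype)
```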

In [None]:
python_code = """

from PIL import Image 
import torchvision.transforms as transforms
import onnxruntime
import os 

dl_path = os.path.abspath("../data/facedetector/facedetector.onnx")

image = Image.fromarray(image.astype('uint8')[0], 'RGB')
resize = transforms.Resize([240, 320])
img_y = resize(image)
to_tensor = transforms.ToTensor()
img_y = to_tensor(img_y)
img_y.unsqueeze_(0)

def to_numpy(tensor):
    return tensor.detach().cpu().numpy() if tensor.requires_grad else tensor.cpu().numpy()

ort_session = onnxruntime.InferenceSession(dl_path)
ort_inputs = {ort_session.get_inputs()[0].name: to_numpy(img_y)}
ort_outs = ort_session.run(None, ort_inputs)
img_out_y = ort_outs[0]

"""

## Configuring the server

### Defining a `PythonConfig` 
- Here we use the `python_code` argument instead of `python_code_path`, since the code is defined as a string. 
- Define the inputs and outputs as dictionaries, where the keys represent objects in the server's Python environment, and the values represent data types (Python data structures), defined as strings. See https://serving.oss.konduit.ai/python for supported data types. 

In [None]:
work_dir = os.path.abspath('.')

python_config = PythonConfig(
    python_code=python_code,
    python_inputs={"image": "NDARRAY"}, 
    python_outputs={"img_out_y": "NDARRAY"}, 
    python_path=default_python_path(work_dir)
)
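
To make the role of the input and output dictionaries concrete, here is a conceptual sketch in plain Python. It is not how Konduit Serving is implemented; it only illustrates the idea that input keys become variables visible to the script, and output keys name the variables read back afterwards. The function `run_python_step` and the toy script are hypothetical.

```python
def run_python_step(code, inputs, output_names):
    """Run code with inputs as variables and return the named outputs."""
    namespace = dict(inputs)   # input keys become variable names
    exec(code, namespace)      # the script reads and writes these names
    return {name: namespace[name] for name in output_names}


# Toy script: reads `image`, writes `img_out_y`, like the serving script above.
toy_code = "img_out_y = [2 * x for x in image]"
result = run_python_step(toy_code, {"image": [1, 2, 3]}, ["img_out_y"])
print(result)
```

This is why the serving script can use `image` without defining it, and why it must assign `img_out_y` before it ends.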

### Defining a pipeline step with the `PythonStep` class

In the `.step()` method, define a name for this step (`input1`) and the respective configuration (`python_config`). 

In [None]:
onnx_step = (PythonStep()
             .step(input_name="input1", 
                   python_config=python_config))

### Defining the server configuration with the `Server` class

In [None]:
# Ports below 1024 are usually reserved for privileged processes.
port = np.random.randint(1024, 65535)

server = Server(
    steps=onnx_step, 
    serving_config=ServingConfig(http_port=port)
)
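
A randomly chosen port may already be in use by another process. As an alternative, the operating system can be asked for a free port by binding to port 0. This is a minimal standard-library sketch; the helper name `find_free_port` is illustrative. Another process could still claim the port between this check and the server start, so it reduces collisions rather than ruling them out.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the operating system for a TCP port that is currently unused."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 lets the OS pick a free port
        return sock.getsockname()[1]


free_port = find_free_port()
print(free_port)
```

The returned value could then be passed as `http_port` in place of the random port.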

## Serving the model 

Start the server and load a sample image using PIL/Pillow. The image is then sent to the server for prediction using the `predict()` method of the `Client` class. 

In [None]:
server.start()

im = Image.open("../data/facedetector/1.jpg")
im = np.array(im).astype("int")
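
If the first request is sent while the server is still starting, it may fail. One way to guard against this is to poll the port until it accepts TCP connections. The sketch below uses only the standard library; the helper name `wait_for_port` and its default timings are assumptions. An open port only shows that something is listening, not that the pipeline has finished loading.

```python
import socket
import time


def wait_for_port(port, host="127.0.0.1", timeout=30.0, interval=0.5):
    """Return True once host:port accepts connections, False after timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)  # not listening yet; retry shortly
    return False
```

For example, `wait_for_port(port)` could be called right after `server.start()` and before creating the client.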

### Configure the Client
Since the image is passed to the Server as a NumPy array, specify the input and output data format as `NUMPY`. 

In [None]:
client = Client(port=port)

output = client.predict(
    {"input1": im}
)
print(output)

Finally, we stop the server:

In [None]:
server.stop()
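
If a cell between `server.start()` and `server.stop()` raises an exception, the server keeps running and holds on to its port. A small context manager can guarantee the stop call. The sketch assumes only that the object has `start()` and `stop()` methods, as the `Server` used above does; the name `running` is illustrative.

```python
from contextlib import contextmanager


@contextmanager
def running(server):
    """Start the server and make sure it is stopped afterwards, even on errors."""
    server.start()
    try:
        yield server
    finally:
        server.stop()
```

With this helper, the prediction could be wrapped as `with running(server): output = client.predict({"input1": im})`.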