<img src="images/onnx.png" width=700 align='left'>

# ONNX (Open Neural Network eXchange)

Originally created by Facebook and Microsoft as an industry collaboration for importing and exporting neural networks.
* ONNX has grown to include support for "traditional" ML models
* Interoperates with many software libraries
* Has both software (CPU, optionally GPU-accelerated) and hardware (Intel, Qualcomm, etc.) runtimes

https://onnx.ai/

* DAG-based model
* Built-in operators, data types
* Extensible -- e.g., ONNX-ML
* Goal is to allow tools to share a single model format

*Of the "standard/open" formats, ONNX clearly has the most momentum in the past year or two.*

## Viewing a Model

ONNX models are not human-readable in their raw (binary) form, but because they represent a graph, they can easily be converted into textual or graphical representations.

Here is a snippet of the [SqueezeNet](https://arxiv.org/abs/1602.07360) image-recognition model, as rendered in the ONNX visualization tutorial at https://github.com/onnx/tutorials/blob/master/tutorials/VisualizingAModel.md. 

> The ONNX codebase comes with the visualization converter used in this example -- it's a simple script currently located at https://github.com/onnx/onnx/blob/master/onnx/tools/net_drawer.py

<img src='images/squeezenet.png'>

### Let's Build a Model and Convert it to ONNX

In [None]:
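import pandas as pd
from sklearn.linear_model import LinearRegression

# Predict diamond price from carat weight (one feature -> one target)
data = pd.read_csv('data/diamonds.csv')
X = data.carat
y = data.price
# scikit-learn expects a 2-D feature matrix, hence reshape(-1, 1)
model = LinearRegression().fit(X.values.reshape(-1,1), y)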

ONNX can be generated from many modeling tools. A partial list is here: https://github.com/onnx/tutorials#converting-to-onnx-format

Microsoft has contributed a lot of resources toward open-source ONNX capabilities, including, in early 2019, support for Apache Spark ML Pipelines: https://github.com/onnx/onnxmltools/blob/master/onnxmltools/convert/sparkml/README.md

__Convert to ONNX__

Note that we can print a string representation of the converted graph.

In [None]:
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

# None = variable batch size, so the model accepts any number of rows with 1 feature each
initial_type = [('float_input', FloatTensorType([None, 1]))]
onx = convert_sklearn(model, initial_types=initial_type)
print(onx)

Let's save it as a file:

In [None]:
with open("diamonds.onnx", "wb") as f:
    f.write(onx.SerializeToString())

The file itself is binary:

In [None]:
! head diamonds.onnx

## How Do We Consume ONNX and Make Predictions?

One of the things that makes ONNX a compelling solution in 2019 is the wide industry support not just for model creation, but also for performant model inference.

Here is a partial list of tools that consume ONNX: https://github.com/onnx/tutorials#scoring-onnx-models

Of particular interest for productionizing models are:
* Apple CoreML
* Microsoft `onnxruntime` and `onnxruntime-gpu` for CPU & GPU-accelerated inference
* TensorRT for NVIDIA GPUs
* Conversion for Qualcomm Snapdragon hardware: https://developer.qualcomm.com/docs/snpe/model_conv_onnx.html

Today, we'll look at "regular" server-based inference with a sample REST server, using `onnxruntime`.

#### We'll start by loading the `onnxruntime` library, and seeing how we make predictions

In [None]:
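import onnxruntime as rt

# Load the serialized model; GPU builds of onnxruntime (1.9+) require an explicit providers list
sess = rt.InferenceSession("diamonds.onnx", providers=["CPUExecutionProvider"])

print("In", [(i.name, i.type, i.shape) for i in sess.get_inputs()])
print("Out", [(i.name, i.type, i.shape) for i in sess.get_outputs()])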

We've skipped naming and annotating the model for this quick example. That's why the input is called "float_input" (the name we passed in `initial_types`) and the output is called "variable" (the default name the converter gave it).

In [None]:
import numpy as np

sample_to_score = np.array([[1.0]], dtype=np.float32)

In [None]:
output = sess.run(['variable'], {'float_input': sample_to_score})

output

In [None]:
output[0][0][0]

#### At this point, we can build our service...

Now we are free to choose whatever service infrastructure we prefer.

Moreover, we can containerize that service, so we get back all of the benefits of Docker, Kubernetes, etc.

But this time, we have a minimal serving infrastructure that knows only about the model itself, and loads models in a single, open, industry-standard format.

#### Pros and Cons: ONNX
Pros:
* Most major deep learning tools have ONNX support
* MIT license makes it both OSS and business friendly
* Seems to achieve its first-order goal of tool interoperability for neural nets
* As of 2019, is the closest thing we have to an open, versatile, next-gen format *with wide support*
* Protobuf format is compact and typesafe
* The biggest weakness was support for "classical" ML and feature engineering -- this has now been fixed
* Microsoft open-sourced (Dec 2018) a high-performance runtime (GPU, CPU, language bindings, etc.) https://azure.microsoft.com/en-us/blog/onnx-runtime-is-now-open-source/
  * Being used as part of Windows ML / Azure ML
  * https://github.com/Microsoft/onnxruntime
* In Q1-Q2 of 2019, Microsoft added a Spark ML Pipeline exporter to the `onnxmltools` project
  * https://github.com/onnx/onnxmltools

Cons:
* Focused on DAG-friendly neural networks (i.e., not intended to be fully general)
* Wasn't originally intended as a deployment format *per se*
  * Doesn't have a standard or reference runtime
  * Doesn't provide certification or standard around correctness
  * No opinion on security, etc.
* The Protobuf format is not human-readable or manageable with text-oriented tooling
  * Though the graph itself can be rendered as text (e.g., the output of PyTorch's ONNX export)