**Install ONNX and ONNX Simplifier**

In [None]:
!sudo apt-get install -y protobuf-compiler libprotoc-dev
!pip install onnx
!pip install onnx-simplifier
!pip install onnxruntime
!pip install torch-summary

**Load And Convert The Model to ONNX Format**

Copy the model checkpoint weights (**model_296.pth**) and the model class file (**SINet.py**) into the current directory, then export the PyTorch model to ONNX format.

In [None]:
import torch
from SINet import *

config = [[[3, 1], [5, 1]], [[3, 1], [3, 1]],
              [[3, 1], [5, 1]], [[3, 1], [3, 1]], [[5, 1], [3, 2]], [[5, 2], [3, 4]],
              [[3, 1], [3, 1]], [[5, 1], [5, 1]], [[3, 2], [3, 4]], [[3, 1], [5, 2]]]
model = SINet(classes=2, p=2, q=8, config=config,
                  chnn=1)

# map_location='cpu' lets the checkpoint load on machines without a GPU
model.load_state_dict(torch.load('/content/model_296.pth', map_location='cpu'))
model.eval()

dummy_input = torch.randn(1, 3, 320, 320)
input_names = [ "data" ]
output_names = [ "classifier/1" ]

torch.onnx.export(model, dummy_input, "SINet_320.onnx", verbose=True, input_names=input_names, output_names=output_names, opset_version=11, export_params=True, do_constant_folding=True)

**Note:** The original SINet class file was modified at line 116, replacing **x.size()** with **[int(s) for s in x.size()]**. This makes the size values static and prevents an error during ONNX conversion. Also, some layers such as **ReduceMax** may not be fully supported when the ONNX model is loaded with **OpenCV**, so replace this operation with appropriate reshape and **maxpool** layers.
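As a rough illustration of the reshape and maxpool idea in the note, the sketch below replaces a channel-wise maximum with a reshape followed by `max_pool2d`. It assumes the reduction in question is equivalent to `torch.max(x, dim=1, keepdim=True)`; the helper name `channel_max_via_maxpool` is hypothetical and is not part of SINet.py, so treat this as one possible rewrite rather than the exact change made to the model.

```python
import torch
import torch.nn.functional as F


def channel_max_via_maxpool(x):
    """Channel-wise max of an (N, C, H, W) tensor without a ReduceMax op.

    Hypothetical helper: assumes the original code computed
    torch.max(x, dim=1, keepdim=True)[0].
    """
    # Static Python ints, mirroring the x.size() fix described in the note
    n, c, h, w = [int(s) for s in x.size()]
    # Move the channel axis into a spatial position: (N, 1, C, H*W)
    flat = x.reshape(n, 1, c, h * w)
    # Pool over the full channel extent: (N, 1, 1, H*W)
    pooled = F.max_pool2d(flat, kernel_size=(c, 1))
    # Restore the spatial layout: (N, 1, H, W)
    return pooled.reshape(n, 1, h, w)


# Quick check against the direct reduction
x = torch.randn(2, 5, 4, 6)
reference = torch.max(x, dim=1, keepdim=True)[0]
print(torch.equal(channel_max_via_maxpool(x), reference))
```

Because the pooling kernel spans all channels, the exported graph should contain only Reshape and MaxPool nodes for this step.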

**Add Softmax Layer And Save Model**

Save another version of the model with **softmax output**.

In [None]:
import torch.nn as nn  # explicit import instead of relying on `from SINet import *`
from torchsummary import summary

soft_model = nn.Sequential(
    model,
    nn.Softmax(1)
)

# Export softmax model
dummy_input = torch.randn(1, 3, 320, 320)

soft_model.eval()
input_names = [ "data" ]
output_names = [ "Softmax/1" ]
torch.onnx.export(soft_model, dummy_input, "SINet_320_Softmax.onnx", verbose=True, input_names=input_names, output_names=output_names, opset_version=11, export_params=True, do_constant_folding=True)
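For reference, here is a minimal NumPy sketch of what the appended `nn.Softmax(1)` layer computes. It assumes the model output has shape (1, 2, H, W), with one logit map per class; the function name `softmax_channels` and the small random input are illustrative only.

```python
import numpy as np


def softmax_channels(logits):
    """Softmax over axis 1 (the class axis) of an (N, C, H, W) array."""
    # Subtract the per-pixel maximum for numerical stability
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=1, keepdims=True)


# Stand-in logits with the assumed layout (1, 2, H, W)
logits = np.random.default_rng(0).normal(size=(1, 2, 4, 4)).astype(np.float32)
probs = softmax_channels(logits)

# Per-pixel class probabilities sum to one
print(np.allclose(probs.sum(axis=1), 1.0))
# The predicted mask (argmax over classes) is unchanged by softmax
print(np.array_equal(np.argmax(probs, axis=1), np.argmax(logits, axis=1)))
```

Since softmax preserves the per-pixel ordering of the classes, both exported models should yield the same hard mask; the softmax variant additionally exposes per-pixel probabilities.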

**Optimize The Models With ONNX Simplifier**

In [None]:
!python3 -m onnxsim SINet_320_Softmax.onnx SINet_320_optim_Softmax.onnx
!python3 -m onnxsim SINet_320.onnx SINet_320_optim.onnx

**Run Inference Using ONNX-Runtime**

In [5]:
import numpy as np
import cv2
import onnxruntime as rt

img = cv2.imread('obama.jpg')
img = cv2.resize(img, (320,320))
img = img.astype(np.float32)

# Preprocess images based on the original training/inference code
mean = [102.890434, 111.25247,  126.91212 ]
std = [62.93292,  62.82138,  66.355705]

img=(img-mean)/std

img /= 255
img = img.transpose((2, 0, 1))
img = img[np.newaxis,...]

# Perform inference using the ONNX runtime
sess = rt.InferenceSession("/content/SINet_320_optim.onnx")
input_name = sess.get_inputs()[0].name
pred_onx = sess.run(None, {input_name: img.astype(np.float32)})[0]
res=np.argmax(pred_onx[0], axis=0)[...,np.newaxis]

Run inference with the PyTorch model on the **GPU** (if available) and **compare** the results with the **ONNX** model output.

In [None]:
# Enable gpu mode, if cuda available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Load the model and inputs into GPU
model.to(device)
inputs=torch.from_numpy(img).float().to(device)

# Perform prediction and extract the per-pixel class mask
with torch.no_grad():
    torch_res = model(inputs)
    _, mask = torch.max(torch_res, 1)

torch_res = torch_res.cpu().numpy()

# Compare the outputs of onnx and pytorch models
np.allclose(pred_onx,torch_res, rtol=1e-03, atol=1e-05)
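When `np.allclose` returns `False`, it helps to see how far apart the two outputs actually are. The sketch below reports the largest absolute difference and the fraction of pixels where both outputs agree on the predicted class. The helper name `compare_outputs` is hypothetical, and it assumes both arrays have shape (N, C, H, W) with classes on axis 1.

```python
import numpy as np


def compare_outputs(a, b, rtol=1e-3, atol=1e-5):
    """Summarize the difference between two (N, C, H, W) score arrays."""
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    return {
        # Same tolerance check as in the cell above
        "allclose": bool(np.allclose(a, b, rtol=rtol, atol=atol)),
        # Largest element-wise deviation between the two outputs
        "max_abs_diff": float(np.max(np.abs(a - b))),
        # Fraction of pixels with the same argmax class
        "mask_agreement": float(
            np.mean(np.argmax(a, axis=1) == np.argmax(b, axis=1))
        ),
    }


# Example with stand-in data; in the notebook this would be
# compare_outputs(pred_onx, torch_res)
a = np.arange(32, dtype=np.float32).reshape(1, 2, 4, 4)
print(compare_outputs(a, a.copy()))
```

For segmentation, a high mask agreement with a small maximum difference usually matters more than strict element-wise equality.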

**Note:**
* On a dual-core 2.2 GHz CPU, the inference time of the ONNX model is about **0.064 seconds** (i.e. about **15 fps**) without any additional optimizations.

* On a **Tesla T4 GPU**, the average execution time of the **PyTorch model** was around 0.010 s (**100 fps**), whereas on CPU it was around 0.144 s.
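Timings like these can be reproduced with a small helper such as the one sketched below. The function name `benchmark` and the warm-up and run counts are assumptions, not the procedure used for the numbers above; it averages wall-clock time over repeated calls of any callable.

```python
import time


def benchmark(fn, warmup=3, runs=20):
    """Return (average seconds per call, calls per second) for fn()."""
    # Warm-up calls are excluded from the timing
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    avg = (time.perf_counter() - start) / runs
    fps = 1.0 / avg if avg > 0 else float("inf")
    return avg, fps


# Stand-in workload; in the notebook the callable could be, for example,
# lambda: sess.run(None, {input_name: img.astype(np.float32)})
avg, fps = benchmark(lambda: sum(range(10000)))
print(f"avg: {avg:.6f} s, fps: {fps:.1f}")
```

When timing the PyTorch model on a GPU, the callable should also call `torch.cuda.synchronize()` after the forward pass, because CUDA kernels run asynchronously and the timer would otherwise stop too early.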

**Plot The Results Using Matplotlib**

In [None]:
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import ImageGrid
import numpy as np

# Read input and background images
image = cv2.imread('obama.jpg')
image = cv2.resize(image, (320,320))
background =  cv2.imread('whitehouse.jpeg')
background = cv2.resize(background, (320,320))

# Crop image using mask and blend with background 
output = res*image + (1-res)*background
output = cv2.cvtColor(output.astype(np.uint8), cv2.COLOR_BGR2RGB)

# Plot the results using matplotlib
im1 = cv2.cvtColor(image.astype(np.uint8), cv2.COLOR_BGR2RGB)
im2 = cv2.cvtColor(background.astype(np.uint8), cv2.COLOR_BGR2RGB)

im3 = res.squeeze()*255
im4 = output

fig = plt.figure(figsize=(10., 10.))
grid = ImageGrid(fig, 111,  # similar to subplot(111)
                 nrows_ncols=(2, 2),  # creates 2x2 grid of axes
                 axes_pad=0.2,  # pad between axes in inch.
                 )

for ax, im in zip(grid, [im1, im2, im3, im4]):
    # Iterating over the grid returns the Axes.
    ax.imshow(im)

plt.show()

**Pytorch To CoreML (Experimental)**

Install the latest version of coremltools.

In [None]:
!pip install --upgrade coremltools

Load the saved PyTorch model and convert it directly to **CoreML** format.

In [None]:
from SINet import *
import coremltools as ct
import torch
import torchvision

# Load pytorch model
config = [[[3, 1], [5, 1]], [[3, 1], [3, 1]],
              [[3, 1], [5, 1]], [[3, 1], [3, 1]], [[5, 1], [3, 2]], [[5, 2], [3, 4]],
              [[3, 1], [3, 1]], [[5, 1], [5, 1]], [[3, 2], [3, 4]], [[3, 1], [5, 2]]]
model = SINet(classes=2, p=2, q=8, config=config,
                  chnn=1)

# map_location='cpu' lets the checkpoint load on machines without a GPU
model.load_state_dict(torch.load('/content/model_296.pth', map_location='cpu'))
model.eval()

# Get a pytorch model and save it as a *.pt file
pytorch_model = model
pytorch_model.eval()
example_input = torch.rand(1, 3, 320, 320)
traced_model = torch.jit.trace(pytorch_model, example_input)
traced_model.save("sinet.pt")

# Convert the saved PyTorch model to Core ML
mlmodel = ct.convert("sinet.pt",
                    inputs=[ct.TensorType(shape=(1, 3, 320, 320))])

# Save the coreml model
mlmodel.save("SINet.mlmodel")

**Note:** The converter seems to throw an error, **raise ValueError("x should be at least rank 3")**, due to a PReLU activation dimension mismatch. However, if we comment out lines 270-278 (function **type_inference(self)**) in the file /usr/local/lib/python3.6/dist-packages/coremltools/converters/mil/mil/ops/defs/activation.py, the conversion succeeds and the **mlmodel** is saved. Unfortunately, the model's predictions cannot be tested on a Linux system; that requires macOS.

**References:**

1. https://github.com/onnx/onnx/blob/master/docs/PythonAPIOverview.md
2. https://github.com/daquexian/onnx-simplifier
3. https://github.com/microsoft/onnxruntime/blob/master/docs/python/tutorial.rst#step-3-load-and-run-the-model-using-onnx-runtime
4. https://github.com/clovaai/ext_portrait_segmentation
5. https://coremltools.readme.io/docs/introductory-quickstart