CPP Machine Learning Web Server

This is an example of deploying an image classification model with libasyik, a web service framework built on C++ and the Boost libraries. Flask or Ray are commonly used to deploy a machine learning model as a web service endpoint; this repository shows how to deploy one in C++ instead and expose an endpoint for inference.

This example also shows how to integrate Triton Inference Server using its C++ client API. The web service could be coupled with onnxruntime to run inference inside the same process, but this project uses the decoupled approach instead: the web service accepts incoming data over a REST API, while model inference runs in a separate Triton process.

Build the Docker image

```shell
# Build from the Dockerfile
docker build -t cpp-ml-server:1.2.0-tris . --build-arg ENGINE_TYPE=triton
docker build -t cpp-ml-server:1.2.0-ort . --build-arg ENGINE_TYPE=onnxrt
docker build -t cpp-ml-server:1.2.0-all . --build-arg ENGINE_TYPE=all

# Or pull from the Docker registry
docker pull haritsahm/cpp-ml-server:1.2.0-all
docker pull haritsahm/cpp-ml-server:1.2.0-tris
docker pull haritsahm/cpp-ml-server:1.2.0-ort
```

Run the application

1. Sync the submodule that holds the model configurations:
```shell
git submodule update --init --recursive
```
2. Pull the Git LFS objects in the submodule (in case the model files weren't downloaded):
```shell
cd triton-ml-server && git lfs pull && cd ..
```
3. Start the services with Docker Compose:
```shell
# Choose either the onnxrt engine or the triton engine from the override commands
docker-compose up -d
```
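Engine selection happens through Compose overrides. A hypothetical `docker-compose.override.yml` sketch is below; the service name `cpp-ml-server` is an assumption, so check the repository's `docker-compose.yml` for the actual service and command names:

```yaml
# Hypothetical override sketch -- the service name is an assumption;
# see docker-compose.yml for the real one.
services:
  cpp-ml-server:
    # Pick the engine by selecting the matching image tag
    image: haritsahm/cpp-ml-server:1.2.0-ort
```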

Examples

```python
import base64
import json

import cv2
import numpy as np
import requests
from PIL import Image

# Tree frog sample from the ImageNet sample images repository
url = "https://github.com/EliSchwartz/imagenet-sample-images/blob/master/n01644373_tree_frog.JPEG?raw=true"

# PIL decodes to RGB; convert to BGR so cv2.imencode writes the PNG
# channels in the correct order
image = np.array(Image.open(requests.get(url, stream=True).raw))
image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
image_string = base64.b64encode(cv2.imencode(".png", image)[1]).decode("utf-8")

response = requests.post(
    "http://127.0.0.1:8080/classification/image",
    headers={"Content-Type": "application/json"},
    data=json.dumps({"image": image_string}),
)
print(json.loads(response.text))
```
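The request body is plain JSON carrying a base64-encoded PNG. A minimal sketch of the encode/decode round trip, using only the standard library and synthetic bytes so it runs without the server or a real image:

```python
import base64
import json

# Synthetic stand-in for encoded PNG bytes (just the PNG magic plus padding)
png_bytes = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16

# Client side: wrap the base64 string in a JSON body
payload = json.dumps({"image": base64.b64encode(png_bytes).decode("utf-8")})

# Server side: recover the exact original bytes from the payload
decoded = base64.b64decode(json.loads(payload)["image"])
assert decoded == png_bytes
```

This round trip is lossless, which is why the server can hand the decoded bytes straight to its image decoder.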

TODO

  • Add detailed data validation steps
  • Optimize variables and parameters using pointers
  • Support batched inputs
  • Support coupled inference process using onnxruntime