CPP Machine Learning Web Server

This is an example of deploying an image classification model with libasyik, a web service framework built on C++ and the Boost libraries. Flask or Ray are commonly used to deploy a machine learning model as a web service endpoint; this repository shows how to deploy one in C++ instead and expose an endpoint for inference.

This example also shows how to integrate Triton Inference Server using its C++ client API. The web service could be coupled with onnxruntime to run inference inside the same process, but this project uses the decoupled approach instead: the web service accepts incoming data over a REST API, while model inference runs in a separate Triton process.

Build the Docker image

```shell
# Build from the Dockerfile
docker build -t cpp-ml-server:1.2.0-tris . --build-arg ENGINE_TYPE=triton
docker build -t cpp-ml-server:1.2.0-ort . --build-arg ENGINE_TYPE=onnxrt
docker build -t cpp-ml-server:1.2.0-all . --build-arg ENGINE_TYPE=all

# Or pull from the Docker registry
docker pull haritsahm/cpp-ml-server:1.2.0-all
docker pull haritsahm/cpp-ml-server:1.2.0-tris
docker pull haritsahm/cpp-ml-server:1.2.0-ort
```

Run the application

1. Sync the submodule that holds the model configurations:
```shell
git submodule update --init --recursive
```
2. Pull the Git LFS objects in the submodule (in case the model files weren't downloaded):
```shell
cd triton-ml-server && git lfs pull && cd ..
```
3. Start the services with Docker Compose:
```shell
# Choose either the onnxrt engine or the triton engine from the override commands
docker-compose up -d
```
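Engine selection happens through Compose overrides. A hypothetical `docker-compose.override.yml` sketch is below; the service name `cpp-ml-server` is an assumption, so check the repository's `docker-compose.yml` for the actual service and command names:

```yaml
# Hypothetical override sketch -- the service name is an assumption;
# see docker-compose.yml for the real one.
services:
  cpp-ml-server:
    # Pick the engine by selecting the matching image tag
    image: haritsahm/cpp-ml-server:1.2.0-ort
```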

Examples

```python
import base64
import json

import cv2
import numpy as np
import requests
from PIL import Image

# Tree frog sample from the ImageNet sample images repository
url = "https://github.com/EliSchwartz/imagenet-sample-images/blob/master/n01644373_tree_frog.JPEG?raw=true"

# PIL decodes to RGB; convert to BGR so cv2.imencode writes the PNG
# channels in the correct order
image = np.array(Image.open(requests.get(url, stream=True).raw))
image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
image_string = base64.b64encode(cv2.imencode(".png", image)[1]).decode("utf-8")

response = requests.post(
    "http://127.0.0.1:8080/classification/image",
    headers={"Content-Type": "application/json"},
    data=json.dumps({"image": image_string}),
)
print(json.loads(response.text))
```
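The request body is plain JSON carrying a base64-encoded PNG. A minimal sketch of the encode/decode round trip, using only the standard library and synthetic bytes so it runs without the server or a real image:

```python
import base64
import json

# Synthetic stand-in for encoded PNG bytes (just the PNG magic plus padding)
png_bytes = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16

# Client side: wrap the base64 string in a JSON body
payload = json.dumps({"image": base64.b64encode(png_bytes).decode("utf-8")})

# Server side: recover the exact original bytes from the payload
decoded = base64.b64decode(json.loads(payload)["image"])
assert decoded == png_bytes
```

This round trip is lossless, which is why the server can hand the decoded bytes straight to its image decoder.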

TODO

  • Add detailed data validation steps
  • Optimize variables and parameters using pointers
  • Support batched inputs
  • Support coupled inference process using onnxruntime