Konduit: Enterprise Runtime for Machine Learning Models


Overview

Konduit is a serving system and framework focused on deploying machine learning pipelines to production. The core abstraction is the "pipeline step": an individual step performs one task involved in using a machine learning model in a deployment. These steps generally include:

  1. Pre- or post-processing steps
  2. One or more machine learning models
  3. Transforming the output into a form humans can understand, such as mapping class indices to labels in a classification task.

For instance, if you want to run arbitrary Python code for pre-processing purposes, you can use a PythonPipelineStep. To perform inference on a (mix of) TensorFlow, Keras, DL4J or PMML models, use a ModelPipelineStep. Konduit Serving also contains functionality for other preprocessing tasks, such as DataVec transform processes or image transforms.
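The step-composition idea can be sketched in plain Python. This is an illustration of the concept only, not Konduit's actual API; the step functions below are made up:

```python
# A minimal sketch of the "pipeline step" idea -- NOT Konduit's actual API.
# Each step transforms its input and hands the result to the next step.

def normalize(x):
    # pre-processing step: scale values into [0, 1]
    lo, hi = min(x), max(x)
    return [(v - lo) / (hi - lo) for v in x]

def model(x):
    # stand-in for a model step: here, just a thresholding "classifier"
    return [1 if v > 0.5 else 0 for v in x]

def to_labels(x):
    # post-processing step: map class indices to human-readable labels
    labels = {0: "negative", 1: "positive"}
    return [labels[v] for v in x]

def run_pipeline(steps, data):
    for step in steps:
        data = step(data)
    return data

result = run_pipeline([normalize, model, to_labels], [2.0, 8.0, 5.0])
print(result)  # ['negative', 'positive', 'negative']
```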

Why Konduit

Konduit was built with the goal of providing proper low-level interop with native math libraries such as TensorFlow and DL4J's core math library, libnd4j.

At the core of Konduit's pipelines are the JavaCPP Presets, vertx, and Deeplearning4j for running Keras models in Java.

Konduit Serving (like some of the other libraries of a similar concept such as Seldon or MLflow) provides building blocks for developers to write their own production ML pipelines from pre-processing to model serving, exposable as a simple REST API.

Combining JavaCPP's low-level access to C-like APIs from Java with Java's robust server-side application development (vertx on top of netty) gives fast native math code in production while minimizing the attack surface that native code adds (a concern mainly in server-side networked applications).

This allows things like zero-copy memory access of NumPy arrays or Arrow records, consumed straight from the server without copy or serialization overhead.
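The zero-copy idea itself can be demonstrated with Python's buffer protocol (standard library only; this is an analogy for what Konduit does with NumPy and Arrow buffers, not Konduit code):

```python
# Illustration of zero-copy buffer access using Python's buffer protocol --
# the same idea Konduit applies to NumPy arrays and Arrow records.
import array

numbers = array.array("d", [1.0, 2.0, 3.0, 4.0])

# A memoryview exposes the underlying buffer without copying it ...
view = memoryview(numbers)

# ... so writes through the view are visible in the original storage.
view[0] = 99.0
print(numbers[0])  # 99.0
```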

When dealing with deep learning, we can handle proper inference on the GPU, batching large workloads.

Extending that to the Python SDK, the server knows when to return a raw Arrow record, which can be consumed directly as a pandas DataFrame.

We also strive to provide a Python-first SDK that makes it easy to integrate pipelines into a Python-first workflow.

Optionally, for the Java community, a vertx-based model server and pipeline development framework provide a thin abstraction that is embeddable in a Java microservice.

We also want to expose modern standards for monitoring everything from your GPU to your inference time.

Visualization can happen with applications such as Grafana or anything that integrates with the Prometheus standard for visualizing data.

Finally, we aim to provide integrations with more enterprise platforms typically seen outside the big data space.

Usage

Python SDK

See the python subdirectory for our Python SDK.

Configuring server on startup

Upon startup, the server loads a config.json or config.yaml file specified by the user. If the user specifies a YAML file, it is converted to a config.json that is then loaded by vertx.

This gets loaded into an InferenceConfiguration, which just contains a list of pipeline steps. How each step is configured depends on its implementation.
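As a rough sketch, such a configuration might look like the following. The field names here are illustrative assumptions, not Konduit's exact schema:

```python
# Hypothetical sketch of an InferenceConfiguration as JSON -- the field
# names are illustrative assumptions, not Konduit's exact schema.
import json

config = {
    "servingConfig": {
        "httpPort": 8080,
        # use 0.0.0.0 here to accept connections from the public internet
        "listenHost": "localhost",
    },
    "steps": [
        {"type": "PYTHON", "pythonCode": "y = x * 2"},
        {"type": "MODEL", "modelLoadingPath": "model.h5"},
    ],
}

# The server would load the equivalent config.json on startup.
print(json.dumps(config, indent=2))
```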

A small list (but not all!) of possible implementations can be found here.

An individual agent is a Java process that gets managed by a KonduitServingMain.

Outside of the pipeline components themselves, the main configuration is a ServingConfig which contains information such as the expected port to start the server on, and the host to listen on (default localhost).

If you want your model server to listen on the public internet, use 0.0.0.0 as the host instead.

Port configuration varies with how the server is packaged; in Docker, for example, the configured port may matter less because Docker maps the port itself.

From there, your pipeline may run into issues such as memory limits or warm-up time.

When dealing with either, there are generally a few considerations:

  1. Off heap memory in JavaCPP/DL4J

  2. Warm-up time for Python scripts: sometimes running your Python script requires warming up the interpreter. In summary, depending on what your Python script does, you may want to send a warm-up request to your application before expecting normal usage.

  3. Python path: When using the Python step runner, an additional Anaconda distribution may be required for custom Python script execution. An end-to-end example can be found in the docker directory.

  4. For monitoring, your server has a built-in /metrics endpoint, pollable by Prometheus or anything that can parse the Prometheus format.

  5. A PID file is automatically written on startup. Override the location with --pidFile=.....

  6. Logging is done via logback. Depending on your application, you may want to override how logging works; this can be done by overriding the default logback.xml file.

  7. Configurations can be downloaded from the internet! Vertx supports several configuration providers; HTTP (without auth) and file are supported by default. For more on this, please see the official vertx docs, and bundle your custom configuration provider within the built uber jar. If your favorite configuration provider isn't supported, please file an issue.

  8. Timeouts: Sometimes work execution may take longer than expected. If this is the case, consider the --eventLoopTimeout and --eventLoopExecutionTimeout arguments.

  9. Other vertx arguments: Because this is a vertx application at its core, other vertx JVM arguments will also work. We specify a few that are important for our specific application (such as file upload directories for binary files) in KonduitServingMain, but allow other vertx arguments at startup as well.

For your specific application, consider using the built-in monitoring capabilities for both CPU and GPU memory to identify what your ideal pipeline configuration should look like under load.
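As an illustration of what polling the /metrics endpoint involves, here is a minimal parser for the Prometheus text format; the sample metric names below are made up:

```python
# Minimal parser for the Prometheus text format that a /metrics endpoint
# emits; the sample input is made up for illustration.
def parse_metrics(text):
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip HELP/TYPE comment lines and blanks
        name, _, value = line.rpartition(" ")
        metrics[name] = float(value)
    return metrics

sample = """\
# HELP jvm_memory_used_bytes Used bytes of a given JVM memory area.
# TYPE jvm_memory_used_bytes gauge
jvm_memory_used_bytes{area="heap"} 5.2428e+07
inference_time_seconds_count 42
"""

parsed = parse_metrics(sample)
print(parsed["inference_time_seconds_count"])  # 42.0
```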

Core workflow

The core intended workflow is a few simple steps:

  1. Configure a server, setting up:
  • the InputTypes of your pipeline
  • the OutputTypes of your pipeline
  • a ServingConfiguration containing things like host and port information
  • a series of PipelineSteps that represent what steps a deployed pipeline should perform
  2. Configure a client to connect to the server.
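To make the two steps concrete, here is a self-contained sketch with a stub echo server standing in for the real vertx application; the /predict path and JSON payload are assumptions for illustration only:

```python
# Sketch of the two-part workflow: a stub "server" and a client that
# connects to it. The real server is the Konduit vertx application; this
# stand-in just echoes JSON so the client side is concrete.
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

class EchoHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)  # echo the payload back

    def log_message(self, *args):
        pass  # keep the demo quiet

server = HTTPServer(("localhost", 0), EchoHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: POST a JSON payload to a (hypothetical) /predict endpoint.
payload = json.dumps({"values": [1.0, 2.0]}).encode()
req = Request(f"http://localhost:{server.server_port}/predict", data=payload,
              headers={"Content-Type": "application/json"})
with urlopen(req) as resp:
    result = json.loads(resp.read())
print(result)  # {'values': [1.0, 2.0]}

server.shutdown()
```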

Building/Installation

Dependencies:

  1. JDK 8 is preferred.
  2. mvnw will download and set up Maven automatically.

In order to build pipelines, you need to configure:

  1. Chip (-Dchip=YOURCHIP)
  2. OS (-Djavacpp.platform=YOUR PLATFORM)
  3. Way of packaging (-P<YOUR-PACKAGING>)
  4. Modules to include for your pipeline steps (-P<MODULE-TO-INCLUDE>)

-D is a JVM argument and -P is a Maven profile. Below we specify the requirements for each configuration.
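The split between -D properties and -P profiles can be captured in a small helper that assembles the full build command. This helper is a hypothetical convenience, mirroring the example invocation later in this README:

```python
# Hypothetical helper that assembles the Maven build command described
# above: -D flags are JVM/system properties, -P flags are Maven profiles.
def build_command(chip, platform, packaging, modules):
    parts = ["./mvnw", "clean", "install", "-Dmaven.test.skip=true"]
    parts.append(f"-Dchip={chip}")                  # chip: cpu, gpu or arm
    parts.append(f"-Djavacpp.platform={platform}")  # target OS/architecture
    parts.append(f"-P{packaging}")                  # packaging profile
    parts.extend(f"-P{m}" for m in modules)         # module profiles
    return " ".join(parts)

print(build_command("cpu", "linux-x86_64", "uberjar", ["python", "pmml"]))
```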

Chips

Konduit Serving can run on a wide variety of chips including:

  • ARM (experimental): -Dchip=arm
  • Intel/X86: -Dchip=cpu
  • CUDA: -Dchip=gpu

Operating systems

Supported operating systems include:

  • Linux
  • Mac
  • Windows

Other platforms are untested but should work; please let us know if you would like to try setting one up!

Packaging pipelines for a particular operating system typically will depend on the target system's supported chips. For example, we can target Linux with ARM or Intel architecture.

JavaCPP's platform classifier will also work, depending only on the targeted chip. To handle this, we introduced the -Dchip=gpu/cpu/arm argument to the build: a thin abstraction over JavaCPP's packaging that targets the right platform automatically.

To further thin out other binaries that may be included (such as OpenCV), you can use -Djavacpp.platform directly. This approach is mainly tested with Intel chips right now; for other chips, please file an issue.

These arguments are as follows:

  1. -Djavacpp.platform=windows-x86_64
  2. -Djavacpp.platform=linux-x86_64
  3. -Djavacpp.platform=macosx-x86_64

Specifying this can reduce the jar size quite a bit; otherwise you end up with extra operating-system-specific binaries in the jar. Initial feedback via GitHub Issues is much appreciated!

Packaging options

Konduit Serving packaging works by including all of the needed dependencies relative to the profiles/modules selected for inclusion in the package. The output size of the binary depends on a few core variables:

  1. The javacpp.platform JVM argument
  2. The modules included, selected via Maven profiles (described below)

Many of the packaging options depend on the konduit-serving-distro-bom (pipelines bill of materials) module. This module contains all of the module inclusion behavior and all of the various dependencies that end up in the output.

All of the packaging options build an uber jar first, then package it in the appropriate platform-specific way:

  • Standard Uberjar: -Puberjar
  • Debian/Ubuntu: -Pdeb
  • RPM(Centos, RHEL, OpenSuse,..): -Prpm
  • Docker: -Pdocker
  • WAR file(Java Servlet Application Servers): -Pwar
  • TAR file: -Ptar
  • Kubernetes: See the helm charts directory for sample charts on building a pipelines module for Kubernetes.

There are no hosted packages yet beyond what is available via pip. Hosted repositories for the packaging formats listed above will be published later.

Modules to include

  • Python support: -Ppython
  • PMML support: -Ppmml

In order to configure pipelines for your platform, you use a Maven-based build profile. An example targeting CPU on Windows:

./mvnw -Ppython -Ppmml -Dchip=cpu -Djavacpp.platform=windows-x86_64 -Puberjar clean install -Dmaven.test.skip=true

This will automatically download dependencies and build a pipelines uber jar file (see the uberjar subdirectory) containing everything needed to run the platform.

The output will be in the target directory of whichever packaging mechanism you specify (docker, tar, ...).

For example, if you build an uber jar, you need the -Puberjar profile, and the output will be found in model-server-uber-jar/target.

Custom Pipeline steps

Konduit Serving supports customization in two ways: Python code, or implementing your own PipelineStep via the CustomPipelineStep and associated PipelineStepRunner in Java.

Custom (Java) pipeline steps are generally recommended for performance reasons, but there is nothing wrong with starting with a Python step; depending on your scale, the difference may not matter.

Orchestration

Running multiple versions of a pipeline server under an orchestration system with load balancing and similar features will rely heavily on vertx functionality; Konduit Serving itself is fairly small in scope right now.

Vertx has support for many typical clustering patterns, such as an API gateway and circuit breaker.

Depending on what the user is looking to do, we could support some built in patterns in the future (for example basic load balanced pipelines).

Vertx itself allows for different patterns that could be implemented in either vertx itself or in Kubernetes.

Cluster management is also possible using one of several cluster node managers allowing a concept of node membership. Communication with multiple nodes or processes happens over the vertx event bus. Examples can be found here for how to send messages between instances.

A recommended architecture for fault tolerance is an API gateway plus load balancer with multiple versions of the same pipeline behind a named endpoint. That named endpoint represents a load-balanced pipeline instance where one of many pipelines may be served.

In a proper cluster, you would address each instance (an InferenceVerticle in this case representing a worker) as: /pipeline1/some/inference/endpoint

For configuration, we recommend versioning all of your assets that are needed alongside the config.json in something like a bundle where you can download each versioned asset with its associated configuration and model and start the associated instances from that.

Reference KonduitServingMain for an example of the single node use case.

We will add clustering support based on these ideas at a later date. Please file an issue if you have specific questions in trying to get a cluster setup.

License

Every module in this repo is licensed under the terms of the Apache License 2.0, save for konduit-serving-pmml, which is AGPL-licensed to comply with the JPMML license.
