Konduit: Enterprise Runtime for Machine Learning Models
Konduit is a serving system and framework focused on deploying machine learning pipelines to production. The core abstraction is an idea called a "pipeline step". An individual step is meant to perform a task as part of using a machine learning model in a deployment. These steps generally include:
- Pre- or post-processing steps
- One or more machine learning models
- Transforming the output in a way that can be understood by humans, such as labels in a classification example.
For instance, if you want to run arbitrary Python code for pre-processing purposes, you can use a PythonPipelineStep. To perform inference on TensorFlow, Keras, DL4J or PMML models (or a mix of them), use a ModelPipelineStep. Konduit Serving also contains functionality for other preprocessing tasks, such as DataVec transform processes or image transforms.
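To make the step idea concrete, here is a framework-free sketch of a pre-processing, model and post-processing chain. PipelineStep, FunctionStep, LinearModelStep and Pipeline are hypothetical names chosen for illustration, not Konduit Serving's API, and the "model" is a stand-in linear scorer rather than a real TensorFlow or DL4J model.

```python
from typing import Any, Callable, List, Sequence


class PipelineStep:
    """Hypothetical base class: a step transforms the previous step's output."""

    def run(self, data: Any) -> Any:
        raise NotImplementedError


class FunctionStep(PipelineStep):
    """Wraps an arbitrary Python callable, similar in spirit to a Python step."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def run(self, data: Any) -> Any:
        return self.fn(data)


class LinearModelStep(PipelineStep):
    """Stand-in for a model: one score per class from a weight matrix."""

    def __init__(self, weights: Sequence[Sequence[float]]):
        self.weights = weights  # one row of weights per class

    def run(self, features: Sequence[float]) -> List[float]:
        return [sum(w * x for w, x in zip(row, features)) for row in self.weights]


class Pipeline:
    """Runs each step in order, feeding each output into the next step."""

    def __init__(self, steps: Sequence[PipelineStep]):
        self.steps = list(steps)

    def run(self, data: Any) -> Any:
        for step in self.steps:
            data = step.run(data)
        return data


LABELS = ["cat", "dog"]

pipeline = Pipeline([
    # Pre-processing: scale raw pixel-like values into [0, 1].
    FunctionStep(lambda raw: [v / 255.0 for v in raw]),
    # Model: two classes, two features.
    LinearModelStep([[1.0, -1.0], [-1.0, 1.0]]),
    # Post-processing: turn scores into a human-readable label.
    FunctionStep(lambda scores: LABELS[scores.index(max(scores))]),
])

print(pipeline.run([255, 0]))  # prints cat (scores are [1.0, -1.0])
```

Each step only sees the previous step's output, which is what allows pre-processing, models and post-processing to be swapped independently.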
Konduit was built with the goal of providing proper low-level interoperability with native math libraries such as TensorFlow and libnd4j, the core math library of our own DL4J.
Like other libraries built around a similar concept, such as Seldon or MLflow, Konduit Serving provides building blocks for developers to write their own production ML pipelines, from pre-processing to model serving, exposed as a simple REST API.
Combining JavaCPP's low-level access to native C/C++ APIs from Java with Java's robust server-side application development (vertx on top of Netty) gives production code access to faster native math while minimizing the surface area of native code, which tends to be a source of security flaws in networked server-side applications.
This enables zero-copy access to NumPy arrays or Arrow records, consumed straight from the server without copy or serialization overhead.
For deep learning, it also lets us run inference properly on the GPU, batching large workloads.
Extending this to the Python SDK, we can tell when the server returns a raw Arrow record and hand it back as a pandas DataFrame.
We also strive to provide a Python-first SDK that makes it easy to integrate pipelines into a Python-first workflow.
For the Java community, a vertx-based model server and pipeline development framework provide a thin abstraction that can be embedded in a Java microservice.
We also want to expose modern monitoring standards for everything from your GPU to your inference time.
Finally, we aim to provide integrations with more enterprise platforms typically seen outside the big data space.
See the python subdirectory for our Python SDK.
Configuring the server on startup
Upon startup, the server loads a configuration file, such as config.yaml, specified by the user. If the user specifies a YAML file, it is converted to JSON that is then loaded by vertx.
This configuration is loaded into an InferenceConfiguration, which contains a list of pipeline steps. How each step is configured depends on its implementation.
A small list (but not all!) of possible implementations can be found here.
An individual agent is a Java process managed by KonduitServingMain.
Apart from the pipeline components themselves, the main configuration is a ServingConfig, which contains information such as the port to start the server on and the host to listen on (default: localhost).
If you want your model server to listen on the public internet, please use
Port configuration depends on how you package the server: in Docker, for example, it may not matter, because Docker maps the port instead.
From there, your pipeline may run into issues such as memory or warmup problems.
When dealing with either, there are generally a few considerations:
Warmup time for Python scripts: sometimes your Python script needs to warm up the interpreter. Depending on what your script does when the Python server runs, consider sending a warmup request to your application before it handles normal traffic.
Python path: when using the Python step runner, an additional Anaconda distribution may be required for custom Python script execution. An end-to-end example can be found in the docker directory.
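A warmup request can be as simple as one throwaway inference call made right after startup. The sketch below retries until the server answers with a 2xx status, using only the Python standard library; the function name, the URL and the JSON payload are hypothetical and must match your pipeline's actual endpoint and input type.

```python
import json
import time
import urllib.error
import urllib.request


def warm_up(url: str, payload: dict, attempts: int = 10,
            delay_s: float = 0.5, timeout_s: float = 30.0) -> bool:
    """POST a throwaway request until the server answers with HTTP 2xx.

    Returns True once a request succeeds, or False after `attempts` failures.
    The first successful call pays the interpreter/model warmup cost, so later
    real requests see normal latency.
    """
    body = json.dumps(payload).encode("utf-8")
    for _ in range(attempts):
        request = urllib.request.Request(
            url, data=body, headers={"Content-Type": "application/json"}, method="POST"
        )
        try:
            with urllib.request.urlopen(request, timeout=timeout_s) as response:
                if 200 <= response.status < 300:
                    return True
        except (urllib.error.URLError, ConnectionError, TimeoutError):
            pass  # server not up yet, or the request failed: retry after a pause
        time.sleep(delay_s)
    return False
```

A deployment script could, for example, call warm_up("http://localhost:8080/your-endpoint", {...}) after launching the server and only route traffic to it once the call returns True (the URL is a placeholder).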
For monitoring, your server has a built-in /metrics endpoint that can be polled by Prometheus or by any other tool that parses the Prometheus format.
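As a rough illustration of what a consumer of the /metrics endpoint does, the parser below handles simple `name{labels} value` sample lines of the Prometheus text format and skips `# HELP`/`# TYPE` comments. It deliberately ignores timestamps and escaped characters inside label values, so for real use a maintained parser such as the one in the prometheus_client package is a better choice. The metric names in the sample scrape are made up.

```python
import re
from typing import Dict, Tuple

# One sample line: metric name, optional {label="value",...} block, then the value.
SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)')
LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text: str) -> Dict[Tuple[str, Tuple[Tuple[str, str], ...]], float]:
    """Map (metric name, sorted label pairs) -> sample value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or HELP/TYPE metadata
        match = SAMPLE.match(line)
        if match is None:
            continue  # not a sample line this simplified parser understands
        name, label_block, value = match.groups()
        labels = tuple(sorted(LABEL.findall(label_block or "")))
        samples[(name, labels)] = float(value)  # float() also accepts "NaN", "+Inf"
    return samples


scrape = """
# HELP inference_seconds_count Number of inference calls (made-up metric)
# TYPE inference_seconds_count counter
inference_seconds_count{pipeline="demo"} 42
process_cpu_usage 0.25
"""
metrics = parse_metrics(scrape)
print(metrics[("inference_seconds_count", (("pipeline", "demo"),))])  # prints 42.0
```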
A PID file is automatically written on startup. Override its location with
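A PID file is handy for scripts that need to know whether the server is still running. The sketch below reads a PID file and probes the process with signal 0, which on POSIX systems checks that the process exists without sending it anything. It is POSIX-only (on Windows, os.kill with signal 0 does not perform this check), and the file path is a hypothetical placeholder for wherever your server writes its PID.

```python
import os
from pathlib import Path


def read_pid(pid_file: Path) -> int:
    """Return the PID stored in the file (surrounding whitespace is ignored)."""
    return int(pid_file.read_text().strip())


def is_running(pid: int) -> bool:
    """POSIX-only liveness check: signal 0 does error checking but sends nothing."""
    if pid <= 0:
        return False  # 0 and negative values address process groups, not one process
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


# Hypothetical location: point this at the PID file your server actually writes.
PID_FILE = Path("/tmp/konduit-serving.pid")
```

A health-check script would then call is_running(read_pid(PID_FILE)).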
Logging is done via logback. Depending on your application, you may want to override how logging works. This can be done by overriding the default
Configurations can also be downloaded from the internet. vertx supports several configuration providers; HTTP (without authentication) and file are supported by default. For other providers, see the official vertx docs and bundle your custom configuration provider within the built uber jar. If your favorite configuration provider isn't supported, please file an issue.
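To illustrate what combining a file-based and an HTTP-based configuration source involves, here is a standard-library sketch that loads a local JSON file, fetches another JSON document over plain HTTP, and shallow-merges them so that later sources override earlier ones. This is not how vertx implements its configuration providers; the function names and the merge policy are assumptions for illustration.

```python
import json
import urllib.request
from pathlib import Path
from typing import Any, Dict, Iterable


def load_json_file(path: Path) -> Dict[str, Any]:
    """Read a JSON configuration from the local file system."""
    return json.loads(path.read_text(encoding="utf-8"))


def load_json_url(url: str, timeout_s: float = 10.0) -> Dict[str, Any]:
    """Fetch a JSON configuration over plain HTTP (no authentication)."""
    with urllib.request.urlopen(url, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))


def merge_configs(configs: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Shallow merge: top-level keys from later sources override earlier ones."""
    merged: Dict[str, Any] = {}
    for config in configs:
        if not isinstance(config, dict):
            raise ValueError("each configuration source must be a JSON object")
        merged.update(config)
    return merged
```

For example, merge_configs([load_json_file(Path("config.json")), load_json_url("http://config.example.com/pipeline.json")]) lets a remote document override selected keys of a local default (both locations are placeholders).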
Timeouts: Sometimes work execution may take longer. If this is the case, please consider looking at the
Other vertx arguments: because this is a vertx application at its core, other vertx JVM arguments also work. KonduitServingMain specifies a few that are important for our application (such as file upload directories for binary files), but it also accepts vertx arguments at startup.
For your specific application, consider using the built-in monitoring capabilities for both CPU and GPU memory to identify what your ideal pipeline configuration should look like under load.
The core intended workflow consists of a few simple steps:
- Configure a server, setting up:
  - InputTypes of your pipeline
  - OutputTypes of your pipeline
  - A ServingConfiguration containing things like host and port information
  - A series of PipelineSteps that represent what steps a deployed pipeline should perform
- Configure a client to connect to the server.
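The server-side half of this workflow amounts to building one configuration object: input and output types, serving settings, and an ordered list of steps. The sketch below models that shape with plain dataclasses and serializes it to JSON. Every class name, field name and value here (InferenceConfig, the type strings, the step dictionaries, the port) is a hypothetical stand-in; consult the Python SDK for the real classes and JSON schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class ServingConfig:
    host: str = "localhost"  # mirrors the default host described above
    port: int = 8080         # hypothetical value; choose any free port


@dataclass
class InferenceConfig:
    input_types: List[str]
    output_types: List[str]
    serving: ServingConfig = field(default_factory=ServingConfig)
    steps: List[Dict[str, Any]] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses, so ServingConfig becomes a dict.
        return json.dumps(asdict(self), indent=2)


config = InferenceConfig(
    input_types=["NUMPY"],
    output_types=["JSON"],
    steps=[
        {"type": "PYTHON", "script": "preprocess.py"},  # hypothetical step spec
        {"type": "MODEL", "path": "model.pb"},           # hypothetical step spec
    ],
)
print(config.to_json())
```

Keeping the whole server description in one serializable object is what makes it possible to version it alongside models and start identical instances from it.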
In order to build pipelines, you need to configure:
- Chip (-Dchip)
- OS (-Djavacpp.platform)
- Way of packaging (a Maven profile, such as -Puberjar)
- Modules to include for your pipeline steps (Maven profiles, such as -Ppython or -Ppmml)
-D defines a system property and -P activates a Maven profile. Below we specify the requirements for each configuration.
Konduit Serving can run on a wide variety of chips including:
- ARM (experimental): -Dchip=arm
Supported operating systems include:
Untested but should work (please let us know if you would like to try setting this up!):
- iOS via Gluon
Packaging pipelines for a particular operating system typically depends on the target system's supported chips. For example, we can target Linux on ARM or Intel architectures.
JavaCPP's platform classifier likewise depends on the targeted chip. To handle this, we introduced the -Dchip=gpu/cpu/arm build argument, a thin abstraction over JavaCPP's packaging that targets the right platform automatically.
To further thin out other binaries that may be included (such as OpenCV), you can set -Djavacpp.platform directly. This approach is mainly tested with Intel chips right now. For other chips, please file an issue.
These arguments are as follows:
Specifying this can reduce the jar size considerably; otherwise, you end up with extra operating-system-specific binaries in the jar. Initial feedback via GitHub Issues is much appreciated!
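JavaCPP platform values are strings of the form os-arch, such as the windows-x86_64 used in the build example in this README. The helper below guesses a value for the current machine from Python's platform module. The mapping table is an assumption covering only a few common combinations, and the function name is hypothetical; check the JavaCPP documentation for the full list of supported platforms.

```python
import platform
from typing import Optional

# platform.system() value -> JavaCPP OS prefix.
_OS = {"Linux": "linux", "Darwin": "macosx", "Windows": "windows"}

# Lower-cased platform.machine() value -> JavaCPP architecture suffix.
_ARCH = {
    "x86_64": "x86_64",
    "amd64": "x86_64",    # Windows reports AMD64
    "aarch64": "arm64",   # 64-bit ARM on Linux
    "arm64": "arm64",     # Apple silicon on macOS
    "armv7l": "armhf",    # 32-bit ARM with hardware floating point
    "ppc64le": "ppc64le",
}


def javacpp_platform(system: Optional[str] = None, machine: Optional[str] = None) -> str:
    """Return a -Djavacpp.platform value such as 'linux-x86_64'."""
    system = system if system is not None else platform.system()
    machine = (machine if machine is not None else platform.machine()).lower()
    try:
        return f"{_OS[system]}-{_ARCH[machine]}"
    except KeyError:
        raise ValueError(f"no JavaCPP platform mapping for {system}/{machine}") from None
```

Calling javacpp_platform("Windows", "AMD64") returns "windows-x86_64"; with no arguments it inspects the current machine and raises ValueError for combinations outside the table, rather than guessing.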
Konduit Serving packaging works by including all of the dependencies needed by the profiles/modules selected for inclusion in the package. The output binary size depends on a few core variables:
- The javacpp.platform argument
- The modules included, which are selected via Maven profiles (modules are described below)
Many of the packaging options depend on the konduit-serving-distro-bom, or pipelines bill of materials, module. This module contains all of the module inclusion behavior and all of the dependencies that end up in the output.
All of the modules rely on building an uber jar and then packaging it in the appropriate platform-specific way.
- Standard uber jar
- RPM (CentOS, RHEL, openSUSE, ...)
- WAR file (Java Servlet application servers)
- TAR file
- Kubernetes: See the helm charts directory for sample charts on building a pipelines module for Kubernetes.
For now, there are no hosted packages beyond what is available via pip. Hosted repositories for the packaging formats listed above will be published later.
Modules to include
- Python support: -Ppython
- PMML support: -Ppmml
To configure pipelines for your platform, you use Maven build profiles. An example targeting CPU on Windows:
./mvnw -Ppython -Ppmml -Dchip=cpu -Djavacpp.platform=windows-x86_64 -Puberjar clean install -Dmaven.test.skip=true
This automatically downloads the dependencies and sets up a pipelines uber jar file (see the uber jar subdirectory) containing everything needed to run the platform.
The output will be in the target directory of whichever packaging mechanism you specify (docker, tar, ...).
For example, if you build an uber jar, you need to use the -Puberjar profile, and the output will be found in model-server-uber-jar/target.
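Because a build is just a combination of profiles and properties, a small script can assemble the command line. The helper below (a hypothetical name) reproduces the example command above from its inputs: chip, JavaCPP platform, packaging profile and module profiles. It only builds the argument list and runs nothing.

```python
import shlex
from typing import List, Sequence


def build_command(chip: str, target_platform: str, packaging: str,
                  modules: Sequence[str], skip_tests: bool = True) -> List[str]:
    """Assemble a ./mvnw invocation from the build choices described above."""
    if chip not in {"cpu", "gpu", "arm"}:
        raise ValueError(f"unknown chip: {chip}")
    args = ["./mvnw"]
    args += [f"-P{module}" for module in modules]          # module profiles, e.g. python, pmml
    args += [f"-Dchip={chip}", f"-Djavacpp.platform={target_platform}"]
    args += [f"-P{packaging}", "clean", "install"]         # packaging profile, e.g. uberjar
    if skip_tests:
        args.append("-Dmaven.test.skip=true")
    return args


command = build_command("cpu", "windows-x86_64", "uberjar", ["python", "pmml"])
print(shlex.join(command))
```

The printed line matches the example command; passing the list to subprocess.run(command, check=True) from the repository root would execute it.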
Custom Pipeline steps
Custom pipeline steps are generally recommended for performance reasons. That said, there is nothing wrong with starting with just a Python step; depending on scale, the difference may not matter.
Running multiple versions of a pipelines server with an orchestration system, load balancing, and so on relies heavily on vertx functionality. Konduit Serving is fairly small in scope right now.
Depending on what users are looking to do, we could support some built-in patterns in the future (for example, basic load-balanced pipelines).
Vertx itself allows for different patterns that could be implemented in either vertx itself or in Kubernetes.
Cluster management is also possible using one of several cluster node managers allowing a concept of node membership. Communication with multiple nodes or processes happens over the vertx event bus. Examples can be found here for how to send messages between instances.
A recommended architecture for fault tolerance is an API gateway plus load balancer setup with multiple versions of the same pipeline behind a named endpoint. That named endpoint represents a load-balanced pipeline instance, where one of many pipelines may be served.
In a proper cluster, you would address each instance individually (here, an InferenceVerticle representing a worker).
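The named-endpoint idea boils down to a gateway picking one of several equivalent pipeline instances for each request. A minimal round-robin sketch, assuming a hypothetical registry that maps an endpoint name to instance URLs, looks like this; a production gateway or load balancer would add health checks and retries.

```python
import itertools
import threading
from typing import Dict, Iterator, List


class RoundRobinRouter:
    """Maps a named endpoint to several pipeline instances and rotates between them."""

    def __init__(self, endpoints: Dict[str, List[str]]):
        self._cycles: Dict[str, Iterator[str]] = {}
        for name, instances in endpoints.items():
            if not instances:
                raise ValueError(f"endpoint {name!r} has no instances")
            self._cycles[name] = itertools.cycle(list(instances))
        self._lock = threading.Lock()  # request threads share the same cycles

    def pick(self, endpoint: str) -> str:
        """Return the instance that should serve the next request for `endpoint`."""
        with self._lock:
            return next(self._cycles[endpoint])


router = RoundRobinRouter({
    # Hypothetical: three copies of the same pipeline behind one name.
    "sentiment-v2": ["http://10.0.0.1:8080", "http://10.0.0.2:8080", "http://10.0.0.3:8080"],
})
print([router.pick("sentiment-v2") for _ in range(4)])
# ['http://10.0.0.1:8080', 'http://10.0.0.2:8080', 'http://10.0.0.3:8080', 'http://10.0.0.1:8080']
```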
For configuration, we recommend versioning all of the assets needed alongside config.json in something like a bundle, from which you can download each versioned asset with its associated configuration and model and start the corresponding instances.
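One way to build such a bundle is a manifest that records the bundle version and a SHA-256 checksum for each asset (configuration, model, scripts), so an instance can verify what it downloaded before starting. The manifest layout and function names below are hypothetical; only the standard library's hashlib and json are used.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large model files do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(bundle_dir: Path, version: str, assets: List[str]) -> Path:
    """Record the bundle version and one checksum per asset in manifest.json."""
    manifest = {
        "version": version,
        "assets": {name: sha256_of(bundle_dir / name) for name in assets},
    }
    path = bundle_dir / "manifest.json"
    path.write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return path


def verify_bundle(bundle_dir: Path) -> List[str]:
    """Return the assets that are missing or whose checksum changed (empty = OK)."""
    manifest = json.loads((bundle_dir / "manifest.json").read_text())
    expected: Dict[str, str] = manifest["assets"]
    return [name for name, checksum in expected.items()
            if not (bundle_dir / name).is_file()
            or sha256_of(bundle_dir / name) != checksum]
```

An instance would refuse to start when verify_bundle returns a non-empty list, which guarantees that the configuration and model it loads are the versioned pair that was published together.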
Reference KonduitServingMain for an example of the single node use case.
We will add clustering support based on these ideas at a later date. Please file an issue if you have specific questions in trying to get a cluster setup.
Every module in this repo is licensed under the terms of the Apache License 2.0, except for
konduit-serving-pmml, which is licensed under the AGPL to comply with the JPMML license.