# Open-Source Amalgamation and Single-Product Formats

## Amalgamation

The simplest form of export is "amalgamation," in which the model and all the code needed to run it are emitted as one big chunk.

In some cases, it's a single source code file that can be compiled on nearly any platform as a standalone program.
  * Classic amalgamation: MXNet + model code https://mxnet.incubator.apache.org/faq/smart_device.html#amalgamation-making-the-whole-system-a-single-file

In other cases, it's a chunk of code or IR that runs on a common runtime (e.g., the JVM or a Python interpreter):
  * H2O POJO export https://github.com/h2oai/h2o-3/blob/master/h2o-docs/src/product/productionizing.rst#pojo-quick-start
  * "Python Lambda Format" -- i.e., any single-function-call wrapper around a heap of Python + Python-compatible dependencies
    * *I'm not making this up* (not to say there's no role for this approach, but it's unlikely to fit the enterprise ML inference setting)
    * Examples
      * AWS Lambda, Google Cloud Functions, Azure Functions
      * https://fission.io/
      * http://fnproject.io/
      * https://github.com/kubeless/kubeless
      * http://openwhisk.incubator.apache.org/
      * etc.
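To make the "Python Lambda Format" concrete, here is a minimal sketch of a single-function wrapper in the AWS Lambda Python style, scoring a one-feature linear model (price from carat). It assumes the usual `lambda_handler(event, context)` entry point and an API-Gateway-like event whose `"body"` is a JSON string such as `{"carat": 0.5}`. The `COEF` and `INTERCEPT` values are hypothetical placeholders, not fitted params.

```python
import json

# Hypothetical placeholder params; in practice you would paste in
# model.coef_[0] and model.intercept_ from a trained model.
COEF = 7500.0
INTERCEPT = -2200.0


def lambda_handler(event, context):
    """Single-function entry point in the AWS Lambda Python style.

    Assumes event["body"] is a JSON string like '{"carat": 0.5}'.
    """
    payload = json.loads(event["body"])
    carat = float(payload["carat"])
    price = INTERCEPT + COEF * carat  # the whole "model" is this one line
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"price": price}),
    }


if __name__ == "__main__":
    # Local smoke test; the context argument is unused here, so None is fine.
    print(lambda_handler({"body": json.dumps({"carat": 1.0})}, None))
```

Everything the function needs (params and scoring logic) ships in one file, which is exactly what makes this format easy to deploy and hard to manage at scale.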
    
And sometimes ... it's a coder implementing a model by hand and compiling it! (For simple, popular models, like linear/logistic regression, it's pretty easy once you have the model params.)
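As an illustration of the by-hand route, here is a minimal sketch of logistic regression scoring written from scratch in plain Python. It assumes you already have the fitted weights and intercept (e.g., copied out of a trained model); the parameter values in the usage line are hypothetical.

```python
import math
from typing import Sequence


def predict_proba(x: Sequence[float], coef: Sequence[float], intercept: float) -> float:
    """Probability of the positive class for one example.

    Computes sigmoid(intercept + sum_i coef[i] * x[i]).
    """
    if len(x) != len(coef):
        raise ValueError("feature vector and coefficient vector differ in length")
    # The linear part is identical to linear regression scoring.
    z = intercept + sum(w * xi for w, xi in zip(coef, x))
    # Numerically stable logistic function: avoid exp() overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict(x: Sequence[float], coef: Sequence[float], intercept: float,
            threshold: float = 0.5) -> int:
    """Hard 0/1 label using a probability threshold."""
    return int(predict_proba(x, coef, intercept) >= threshold)


# Hypothetical params: z = 0.0 + 0.5*1.0 - 0.25*2.0 = 0, so p = 0.5
print(predict_proba([1.0, 2.0], coef=[0.5, -0.25], intercept=0.0))  # 0.5
```

Dropping the sigmoid leaves plain linear regression scoring, and the same few lines port almost mechanically to C or Java.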

### What Does This Look Like? Let's Try It...

First, we need a model. So we'll train a quick linear regression on R's Diamonds dataset:

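```python
import pandas as pd

# Load the diamonds data from a local CSV copy
data = pd.read_csv('data/diamonds.csv')

X = data.carat   # feature: weight in carats
y = data.price   # target: price in dollars
```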

```python
from sklearn.linear_model import LinearRegression

# scikit-learn expects a 2-D feature matrix, so reshape the single column to (n_samples, 1)
model = LinearRegression().fit(X.values.reshape(-1, 1), y)
```

This is a basic model for predicting price (in dollars) from weight (in carats).

The params are:

```python
model.coef_, model.intercept_
```
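For a single feature, these two params have a closed form: ordinary least squares (what `LinearRegression` fits) sets the slope to cov(x, y) / var(x) and the intercept to mean(y) - slope * mean(x). Here is a minimal pure-Python sketch, checked on a toy line rather than the diamonds file:

```python
from typing import Sequence, Tuple


def fit_simple_ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (slope, intercept) minimizing squared error for y ~ intercept + slope * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    # Sums of centered cross-products and squares (the n's cancel in the ratio).
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has zero variance; slope is undefined")
    slope = sxy / sxx
    return slope, my - slope * mx


# Toy data lying exactly on y = 2x + 1
print(fit_simple_ols([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]))  # (2.0, 1.0)
```

Applied to `X` and `y` above, this should agree with `model.coef_[0]` and `model.intercept_` up to floating-point rounding, which is why hand-coding a linear model is easy once the params are known.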

We can create a standalone ("amalgamated") version of this model using [SKompiler](https://pypi.org/project/SKompiler/):

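```python
from skompiler import skompile

# Translate the fitted model's predict method into a portable expression,
# then render that expression as SQL for PostgreSQL
expr = skompile(model.predict)
expr.to('sqlalchemy/postgresql')
```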

```python
# Render the same expression as C code
expr.to('sympy/c')
```

Here's a quick example with a decision tree. First, we'll create the model.

```python
from sklearn.tree import DecisionTreeRegressor

model_tree = DecisionTreeRegressor(max_depth=4).fit(X.values.reshape(-1, 1), y)
```

```python
for line in skompile(model_tree.predict).to('sympy/c').split('\n'):
    print(line)
```
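A fitted regression tree amounts to nested threshold comparisons, so "amalgamating" one means baking those comparisons and leaf values into code. To show the idea (this is not SKompiler's implementation), here is a minimal sketch that turns a tiny tree, stored in a hypothetical nested-dict format with `feature`, `threshold`, `left`, `right` and leaf `value` keys, into standalone Python source and then executes it. The thresholds and leaf values are made up for illustration.

```python
from typing import Any, Dict, List

# Hypothetical tree format: internal nodes have "feature", "threshold",
# "left", "right"; leaves have "value". Following scikit-learn's convention,
# samples with x[feature] <= threshold go left.
Tree = Dict[str, Any]


def _emit(node: Tree, depth: int, lines: List[str]) -> None:
    """Append the if/else lines for `node` at the given indentation depth."""
    pad = "    " * depth
    if "value" in node:
        lines.append(f"{pad}return {node['value']!r}")
        return
    lines.append(f"{pad}if x[{node['feature']}] <= {node['threshold']!r}:")
    _emit(node["left"], depth + 1, lines)
    lines.append(f"{pad}else:")
    _emit(node["right"], depth + 1, lines)


def tree_to_source(tree: Tree, func_name: str = "predict_toy") -> str:
    """Generate a standalone Python function with the tree baked in as if/else."""
    lines = [f"def {func_name}(x):"]
    _emit(tree, 1, lines)
    return "\n".join(lines) + "\n"


# Made-up two-level tree on a single feature (e.g., carat).
toy_tree: Tree = {
    "feature": 0, "threshold": 1.0,
    "left": {"feature": 0, "threshold": 0.5,
             "left": {"value": 1000.0}, "right": {"value": 2500.0}},
    "right": {"value": 9000.0},
}

source = tree_to_source(toy_tree)
print(source)

# Turn the generated text into a callable, as a consumer of the emitted file would.
namespace: Dict[str, Any] = {}
exec(source, namespace)
predict_toy = namespace["predict_toy"]
print(predict_toy([0.3]), predict_toy([0.8]), predict_toy([1.5]))
```

The generated function has no dependency on the tree object or any library, which is the appeal of amalgamation and also the reason the params can no longer be updated without regenerating code.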

#### Pros and Cons: Amalgamation

Pros:
* Easy-to-understand concept
* Fairly portable
* Can be compact and performant
  * May be a good choice for extremely constrained embedded environments

Cons:
* Not interoperable with other high-level environments
* Violates separation of code (logic) from data (parameters)
* May not fit in well with enterprise manageability and operations needs

## Open-Source Single-Product Format(s)

That is, a format that serves a specific product ecosystem but is not intended to interoperate with other systems or to serve as a "standard."

*Examples:*

__SparkML + MLeap__
  * MLeap supports Spark, some scikit-learn models and some TensorFlow models
  * Represents models in an "MLeap Bundle"
  * MLeap runtime is a JAR that can run in any Java application (or on its own, with a lightweight scoring wrapper provided by MLeap)

__TensorFlow + TensorFlow Serving__
  * TensorFlow models (created directly with TensorFlow or with Keras) serialize to a TF-specific protocol buffer representation
  * TensorFlow Serving loads the latest version of a model
    * TF Serving exposes a gRPC service and a REST endpoint
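For a sense of the REST side, here is a minimal client sketch using only the Python standard library. It assumes TF Serving's REST predict endpoint (`POST /v1/models/<name>:predict`, optionally `/versions/<n>`, default REST port 8501) with a row-oriented `{"instances": [...]}` JSON body; the host, the model name `my_model`, and the input shape are hypothetical. The sketch only builds the request, so it can be inspected without a running server.

```python
import json
import urllib.request
from typing import Any, List, Optional


def build_predict_request(
    model_name: str,
    instances: List[Any],
    host: str = "localhost",
    port: int = 8501,
    version: Optional[int] = None,
) -> urllib.request.Request:
    """Build (but do not send) a TF Serving REST predict request."""
    path = f"/v1/models/{model_name}"
    if version is not None:
        # Pin a specific model version rather than letting the server choose.
        path += f"/versions/{version}"
    url = f"http://{host}:{port}{path}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


req = build_predict_request("my_model", [[0.5], [1.2]])
print(req.get_method(), req.full_url)
# Against a live server, send it and read the "predictions" field:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["predictions"])
```

Any HTTP client works here, but the request format is specific to TF Serving, which is the single-product trade-off in miniature.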
    
Within the intended ecosystem, we can easily export and use a model, but we don't get portability across tools/ecosystems.