# Deployment (High-Level) Solution

## What? Break the Dependency

__How? Export the Model and Move It__

This brings us, for better or worse, to a choice of which format to use.

As always, it's a compromise to accommodate ease of use, performance, portability, "openness," etc.

## 0. Amalgamation

The simplest form of export is "amalgamation" ... in which case the model and all necessary code to run are emitted as one big chunk.

In some cases, it's a single source code file that can be compiled on nearly any platform as a standalone program.
  * Classic amalgamation: MXNet + model code https://mxnet.apache.org/faq/smart_device.html#amalgamation-making-the-whole-system-a-single-file

In other cases, it's a chunk of IR code that can be consumed by a common runtime:
  * H2O POJO export https://github.com/h2oai/h2o-3/blob/master/h2o-docs/src/product/productionizing.rst#pojo-quick-start
  * TVM IR https://docs.tvm.ai/tutorials/cross_compilation_and_rpc.html
    * If we haven't already looked at it, make a mental note to explore TVM at some point: https://tvm.ai/about

And sometimes ... it's a coder implementing a model by hand and compiling it! (For simple, popular models, like linear/logistic regression, it's pretty easy once you have the model params.)
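
To illustrate how little code a hand-written model can need, here is a minimal sketch of a binary logistic regression scorer. The weights and intercept are made-up placeholders standing in for parameters exported from a real training run, and the function names are chosen just for this example.

```python
import math

# Hypothetical parameters, standing in for values exported from a trained model.
WEIGHTS = [0.8, -1.2, 0.5]
INTERCEPT = -0.3


def predict_proba(features):
    """Return P(y = 1 | features) for a binary logistic regression."""
    if len(features) != len(WEIGHTS):
        raise ValueError("expected %d features, got %d" % (len(WEIGHTS), len(features)))
    # Linear score: intercept plus the dot product of weights and features.
    z = INTERCEPT + sum(w * x for w, x in zip(WEIGHTS, features))
    # The logistic (sigmoid) function maps the score to a probability.
    return 1.0 / (1.0 + math.exp(-z))


def predict(features, threshold=0.5):
    """Return the predicted class label (0 or 1)."""
    return 1 if predict_proba(features) >= threshold else 0
```

Note that the parameters live inside the code here, which is exactly the "violates separation of code from data" trade-off listed in the cons below.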

__Pros and Cons__

Pros:
* Easy-to-understand concept
* Fairly portable
* Can be compact and performant
  * May be a good choice for extremely constrained embedded environments

Cons:
* Not interoperable with other high-level environments
* Not easily human readable, diffable, manageable in a CMS or version control
* Violates separation of code from data
* May not fit in well with enterprise manageability and operations needs

## 1. Single-Product Format

I.e., a format that serves a specific product ecosystem but is not intended to interoperate with other systems or to serve as a "standard."

*Examples:*

__SparkML + MLeap__
* MLeap supports Spark, some scikit-learn models and some TensorFlow models
* Represents models in an "MLeap Bundle"
* MLeap runtime is a JAR that can run in any Java application (or with a lightweight scoring wrapper provided by MLeap)
  
__TensorFlow + TensorFlow Serving__
* TensorFlow models (created directly with TensorFlow or with Keras) serialize to a TF-specific protocol buffer representation
* TensorFlow Serving loads the latest version of a model
    * TF Serving exposes a gRPC service and, in the latest version, a REST endpoint
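
As a rough sketch of what calling the REST endpoint might look like from Python, the snippet below builds (but does not send) a predict request. The host, the model name `my_model`, and the feature row are hypothetical; port 8501 is assumed here because it is the port commonly used in TF Serving examples, and your deployment may differ.

```python
import json
import urllib.request


def build_predict_request(host, model_name, instances, port=8501):
    """Build (but do not send) a POST request for TF Serving's REST predict API."""
    url = "http://%s:%d/v1/models/%s:predict" % (host, port, model_name)
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host, model name, and feature row.
request = build_predict_request("localhost", "my_model", [[1.0, 2.0, 5.0]])

# Sending requires a running server; the response JSON carries a "predictions" key:
# with urllib.request.urlopen(request) as response:
#     predictions = json.load(response)["predictions"]
```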

__TensorFlow + FlatBuffers + TFLite__
* FlatBuffers is an "open" format with multiple collaborators
* Targets iOS and Android

## 2. Existing Standard Format: PMML

<img src="https://materials.s3.amazonaws.com/i/PMML_Logo.png">

### PMML has existed for over 20 years, and is used widely throughout the world. 

It has many advantages, but is not perfect.

Pros:
* In wide use / well-accepted / large community
* Core XML dialect can be human readable
* Models can be processed/managed by text-based tools (VCS/CMS/etc.)
* Covers the majority of modeling cases companies use today
* *Formally* interoperable (reading/writing the container file format)

Cons:
* Support for production of models in the open-source world is spotty
* Support for consuming models in the OSS world is sparse/minimal
* The growing importance of modern open-source tooling, where PMML support is weakest, has been dragging PMML down
* Some modern model types and pipelines are not supported, or not supported efficiently/compactly
* *Semantic* interop is only marginally existent

In practice, PMML -- even with commercial/enterprise, supported products -- is more like USB C than USB 3. 

I.e., like USB C, it's very versatile in theory, and the plug always fits, but that tells you little or nothing about whether the two devices connected can have any conversation, let alone the specific conversation you need them to have.

Despite its imperfections, it has many advantages over single-product formats, so we often use it even if it cannot fulfil a promise of being the "universal" tool.

__Example__

Here is an example of a logistic regression classifier trained using R on the Iris dataset:

(http://dmg.org/pmml/pmml_examples/rattle_pmml_examples/IrisMultinomReg.xml)

<img src="https://materials.s3.amazonaws.com/i/UFJlBqq.png" width=1000>
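
Because PMML is plain XML, ordinary tooling can inspect it. The sketch below uses only the Python standard library to list the fields declared in a `DataDictionary`. The embedded fragment is hand-written and heavily trimmed for illustration (it is not the DMG example file and is not schema-valid on its own), and the field names are hypothetical.

```python
import xml.etree.ElementTree as ET

# A hand-written, trimmed fragment for illustration only.
PMML_TEXT = """<PMML version="4.4" xmlns="http://www.dmg.org/PMML-4_4">
  <DataDictionary numberOfFields="3">
    <DataField name="sepal_length" optype="continuous" dataType="double"/>
    <DataField name="petal_width" optype="continuous" dataType="double"/>
    <DataField name="species" optype="categorical" dataType="string"/>
  </DataDictionary>
</PMML>
"""


def list_data_fields(pmml_text):
    """Return (name, optype, dataType) for each DataField in a PMML document."""
    root = ET.fromstring(pmml_text)
    # PMML elements are namespaced; reuse whatever namespace the root declares.
    namespace = root.tag[: root.tag.index("}") + 1] if root.tag.startswith("{") else ""
    return [
        (field.get("name"), field.get("optype"), field.get("dataType"))
        for field in root.iter(namespace + "DataField")
    ]


fields = list_data_fields(PMML_TEXT)
```

Reading the container this way is the "formal" interoperability mentioned above; it says nothing about whether a given consumer will score the model the same way the producer did.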

### Where do we get a PMML model?

A partial list of products supporting PMML is at http://dmg.org/pmml/products.html

Focusing on the *producing PMML* side, we can see there are a lot of products that can create PMML, even if most of them are commercial or have effectively commercial licensing schemes (e.g. JPMML).

In the open-source world (again, excluding AGPL code like JPMML), we have
* R -- strongest open-source export support
* Spark -- very limited support: the listed models are only supported under the *old/deprecated* RDD MLlib API
  * There is work in progress to add PMML export to the new API, but it has only just begun and may stall
* Python -- aside from the wrapper around the above-mentioned JPMML, the best option today is
  * https://nyoka-pmml.github.io/nyoka/index.html
  
It is important to note that
* although there are plenty of commercial products with at least some PMML support
* and although large enterprises can (and for support/legal reasons prefer to) pay for a product
* the lack of openness and community is leaving commercial-only ML tooling far behind
  * e.g., all of the top deep learning tools are FOSS
  * this means most of the performance-focused work is tied to the FOSS tools
  * scaling is owned by FOSS (kubeflow, Horovod, etc.)

### How do we run a PMML model?

Permissive OSS support for running PMML models is effectively nonexistent, so we need to architect in tandem with business decisions around a vendor's analytics server product. These business decisions will go beyond licensing and support, because they will affect all of our enterprise architectures: hardware, network, software, management/monitoring/operations, reliability/continuity, compliance, etc.

However, we can make use of the AGPL code in JPMML for demonstration purposes.

#### JPMML

JPMML (https://github.com/jpmml) is a set of AGPL OSS projects that 
* form the de facto Java implementation of PMML
* offer interop with key FOSS tools like Apache Spark, R, Scikit-learn, XGBoost, TensorFlow, etc.
* provide easy scoring in your own apps, or using a "scoring wrapper" or hosted in the cloud
* are maintained and licensed in connection with https://openscoring.io/
* *note: there is an older, abandoned, version of JPMML under a more friendly Apache 2.0 license*
  * this older version has many features and might be suitable for some organizations with a higher risk/ownership appetite
  * https://github.com/jpmml/jpmml

## 3. Next-Gen Standard Format: PFA

<img src="https://materials.s3.amazonaws.com/i/PFA_Logo-200x200.png">

#### PFA (Portable Format for Analytics) is a Modern Replacement for PMML

##### "As data analyses mature, they must be hardened — they must have fewer dependencies, a more maintainable structure, and they must be robust against errors." - DMG

PFA, created in 2015, is intended to improve upon PMML.

From http://dmg.org/pfa/docs/motivation/:

*Tools such as Hadoop and Storm provide automated data pipelines, separating the data flow from the functions that are performed on data (mappers and reducers in Hadoop, spouts and bolts in Storm). Ordinarily, these functions are written in code that has access to the pipeline internals, the host operating system, the remote filesystem, the network, etc. However, all they should do is math.*

*PFA completes the abstraction by encapsulating these functions as PFA documents. From the point of view of the pipeline system, the documents are configuration files that may be loaded or replaced independently of the pipeline code.*

*This separation of concerns allows the data analysis to evolve independently of the pipeline. Since scoring engines written in PFA are not capable of accessing or manipulating their environment, they cannot jeopardize the production system. Data analysts can focus on the mathematical correctness of their algorithms and security reviews are only needed when the pipeline itself changes.*

*This decoupling is important because statistical models usually change more quickly than pipeline frameworks. Model details are often tweaked in response to discoveries about the data and models frequently need to be refreshed with new training samples.*

<img src="https://materials.s3.amazonaws.com/i/KuQPUbx.png" width=800>

(summarized from DMG)

#### Overview of PFA capabilities

PFA flexibility:
* Control structures, such as conditionals, loops, and user-defined functions
* Entirely expressed within JSON, and can therefore be easily generated and manipulated by other programs
* Fine-grained function library supporting extensibility callbacks
* Scoring engines can share data or update external variables, such as entries in a database.

The following contribute to PFA’s safety:

* Strict numerical compatibility: the same PFA document and the same input results in the same output, regardless of platform.
* Spec only defines functions that transform data. I/O is all controlled by the host system.
* Type system that can be statically checked. ... This system has a type-safe null and PFA only performs type-safe casting, which ensures that missing data never cause run-time errors.
* The callbacks that generalize PFA’s statistical models are not first-class functions
  * The set of functions that a PFA document might call can be predicted before it runs
  * A PFA host may choose to only allow certain functions.
* The semantics of shared data guarantee that data are never corrupted by concurrent access and scoring engines do not enter deadlock. 
  * The host can also statically determine which shared variables may be modified by a scoring engine, rather than at run-time.
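
The idea that a host can predict, and restrict, the functions a document calls before running it can be sketched roughly as follows. This is a simplified illustration, not a PFA implementation: it treats every JSON object key under `action` as a function name, which ignores PFA's special forms, and both the allowlist and the small PFA-style document are made up for the example.

```python
import json

# Hypothetical allowlist chosen by the host.
ALLOWED_FUNCTIONS = {"+", "**", "m.sqrt"}

# A small PFA-style document written for this sketch.
DOCUMENT = json.loads("""
{
  "input": "double",
  "output": "double",
  "action": [{"m.sqrt": [{"+": [{"**": ["input", 2]}, 1]}]}]
}
""")


def called_functions(expression):
    """Collect every object key in an expression tree, treating each as a function name."""
    names = set()
    if isinstance(expression, dict):
        for name, arguments in expression.items():
            names.add(name)
            names |= called_functions(arguments)
    elif isinstance(expression, list):
        for item in expression:
            names |= called_functions(item)
    return names


def disallowed_functions(document, allowed=ALLOWED_FUNCTIONS):
    """Return functions used in the document's action that the host has not allowed."""
    return called_functions(document["action"]) - set(allowed)
```

A host following this pattern would refuse to load any document for which `disallowed_functions` returns a non-empty set, without ever executing it.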

__Example__

Here are some data records:

<img src="https://materials.s3.amazonaws.com/i/vsvToXy.png" width=600>

And a PFA document which returns the square-root of the sum of the squares of a record's x, y, and z values:

<img src="https://materials.s3.amazonaws.com/i/tIlag9o.png" width=600>

The above example -- along with numerous other tutorials -- can be viewed, *modified*, and run live online at http://dmg.org/pfa/docs/tutorial2/ and other dmg.org pages.

Although it may not be obvious from this small example, PFA is effectively a programming language, albeit a restricted one, and as such can express complex transformations and aggregations of data. The PFA document is a serialized representation or description of a scoring engine, of which one or more instances can be created by a runtime.

That said, it is still intended to be a machine-generated and machine-consumed document.
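
To make the "document describes a scoring engine" idea concrete, here is a toy evaluator for a PFA-style document that computes the square root of the sum of squares of `x`, `y`, and `z`. The JSON is a reconstruction written for this sketch rather than a transcription of the image above, and the evaluator understands only the three functions it lists; a real PFA runtime also type-checks the document and supports a far larger library.

```python
import json
import math

# A PFA-style document reconstructed for this sketch.
DOCUMENT = json.loads("""
{
  "input": {"type": "record", "name": "Datum", "fields": [
    {"name": "x", "type": "double"},
    {"name": "y", "type": "double"},
    {"name": "z", "type": "double"}
  ]},
  "output": "double",
  "action": [
    {"m.sqrt": [{"+": [{"+": [{"**": ["input.x", 2]}, {"**": ["input.y", 2]}]},
                       {"**": ["input.z", 2]}]}]}
  ]
}
""")

# The only functions this toy evaluator understands.
FUNCTIONS = {
    "+": lambda a, b: a + b,
    "**": lambda a, b: a ** b,
    "m.sqrt": math.sqrt,
}


def evaluate(expression, record):
    """Evaluate one expression against one input record."""
    if isinstance(expression, (int, float)):
        return float(expression)
    if isinstance(expression, str) and expression.startswith("input."):
        return float(record[expression[len("input."):]])
    if isinstance(expression, dict) and len(expression) == 1:
        (name, arguments), = expression.items()
        if name not in FUNCTIONS:
            raise ValueError("unsupported function: %s" % name)
        return FUNCTIONS[name](*[evaluate(arg, record) for arg in arguments])
    raise ValueError("unsupported expression: %r" % (expression,))


def make_engine(document):
    """Create a scoring function; the last action expression yields the output."""
    def score(record):
        result = None
        for expression in document["action"]:
            result = evaluate(expression, record)
        return result
    return score


engine = make_engine(DOCUMENT)
```

Calling `make_engine(DOCUMENT)` more than once yields independent scoring functions from the same document, loosely mirroring how a runtime can create several engine instances from one PFA file.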