# Deploy Machine Learning Projects in Production with Open Standard Models

## Setting the Scene: Core Problem

In the context of machine learning...

* __Training__ is the process of obtaining a (hopefully useful) machine learning model from data
* __Inference__ or __Prediction__ or __Scoring__ is the process of using a model to obtain a __score__ for some new data encountered in your business or research
* Sometimes (e.g., in __online learning__ use cases) the same system is performing training and inference at approximately the same time

In this session, we will __not__ focus on training (or on data preparation, cleaning, model tuning, selection, etc.).

There are lots of great resources focusing on that phase of work, and we're going to assume that you already have a model you're happy with (or at least a process for creating those models).

So you've trained and tuned a model, done some validation tests to ensure it generates appropriate predictions on new data, and you want to deploy this model into an enterprise-scale inference service, where it can deliver predictions for the business. 

> For example, you've trained a great recommender system, and *now it needs to be exposed as a scalable service* consumed by your company's online-store app, which will send carts of products to that recommender and receive back recommended products to offer customers.

## Open Source Pre-History: Proprietary Inference Servers

Why is this a challenge today?

For a long time, businesses using machine learning employed proprietary tools like SAS, SPSS, and FICO to perform modeling.

Many of these products and vendors licensed proprietary "model servers" or "inference servers" which were created specifically to take models and expose them elsewhere in the IT infrastructure as a service.

If your company was a customer of these products, the enterprise "solution" included both the data mining tools (modeling) and the serving tools (inference).

## The Rise of Open Source: Stone Age

As open-source data science tools rose in prominence over the last decade, more data scientists, statisticians, researchers, and analysts began relying on:
* Python
    * SciPy stack
    * scikit-learn
    * TensorFlow
    * etc.
* R
    * dplyr
    * ggplot2
    * etc.
* Spark, H2O, others...

As we've all seen, the cycle of research, development, publication, and open-source tooling has led to a huge explosion of data-driven uses throughout the world.

__But__ none of those tools had a clear, complete story for how to deploy a model once it was trained.

So engineers carved out the *Stone-Age Solution* ... namely, wrapping the data science stack in a lightweight web service framework and putting it into production.

The classic example is a Python [Flask](https://en.wikipedia.org/wiki/Flask_(web_framework)) web endpoint that wraps a call to scikit-learn's `model.predict(...)`.
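To make the Stone Age pattern concrete, here is a minimal sketch (not production code), assuming Flask and scikit-learn are installed. The `/predict` route and the `{"instances": ...}` JSON payload shape are arbitrary choices for illustration. A toy model is trained at import time only so the snippet is self-contained; a real service would typically unpickle a model trained elsewhere, which is exactly the coupling this session argues against.

```python
# Minimal sketch of the "Stone Age" pattern: a Flask endpoint wrapping
# scikit-learn's model.predict(...). Route name and payload shape are assumptions.
from flask import Flask, jsonify, request
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# A toy model trained at import time keeps the example self-contained;
# a real service would usually unpickle a model trained elsewhere.
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

app = Flask(__name__)


@app.route("/predict", methods=["POST"])
def predict():
    # Expects JSON like {"instances": [[5.1, 3.5, 1.4, 0.2], ...]}
    rows = request.get_json(force=True)["instances"]
    predictions = model.predict(rows)
    # NumPy arrays are not JSON-serializable, so convert to a plain list.
    return jsonify({"predictions": predictions.tolist()})


# To serve it locally (development server only): app.run(port=5000)
```

A client would then `POST` a JSON body to `/predict`. Note that every replica of this service has to ship Python, Flask, scikit-learn, and NumPy just to evaluate one trained model.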

Before discussing the many drawbacks of this approach, let's quickly review...

## Open Source: Bronze Age

Since model inference is typically lightweight, stateless, and idempotent, it is an ideal candidate for a scale-out containerized service using a container scaling framework like Kubernetes.

The "Bronze Age" of open-source model deployment containerized the Stone Age approach, making it easier to scale and more robust, among other benefits.

Containerization was definitely an improvement ...

## Open Source: Platform Gold Rush

<img src='images/gold.jpg'>

Businesses realized that they wanted enterprise manageability over these ML inference services ... 

And a lot of entrepreneurs realized that making money from novel ML training was hard (after all, thousands of Ph.D. researchers were working on the same problems, and giving the results away for free) ... but that building a "platform" that did the following was both easy and lucrative:
* Dockerized open-source ML stacks
* Deployed them on-prem or in the cloud via Kubernetes
* Provided some manageability ("ML Ops")

### 2018-2019 will go down as the ML Ops Gold Rush

And, as in the California Gold Rush, it has been easier selling tools and services than finding actual gold.

__ML deployment platforms *do* have value to offer__, and we'll come back to that part. But first we need to focus on their Achilles' heel: Dockerized data science stacks.

## You Wouldn't Deploy an Enterprise Service by Putting Your Dev Machine in the Datacenter

<img src='images/dock.jpg'>

## So Why Would You Deploy an ML Service by Putting the ML Stack in a Container?

"It works (for now)" is about the best thing you can say about such a deployment.

Meanwhile, how do we address...

* model inspection
* versioning
* diffing model versions
* porting to other environments (e.g., ARM vs. Intel or mobile vs. client vs. server)
* using the model in an alternate runtime (e.g., a scikit-learn model in a Spark job)
* using models *from* an alternate runtime (e.g., Spark cannot natively export an ML pipeline to a containerizable service)
* updating dependencies (e.g., patching a security vulnerability in an underlying component, such as [CVE-2019-6446](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-6446))
* not to mention lots of design issues like...
    * Why should an ML model (which is typically a limited set of math operations) be deployed as a full computing stack and environment?
    * Why should we use an enormous container and billions of compute cycles to perform arithmetic and a bit of trigonometry?
    
### Fundamentally, Containerizing an ML Stack (with Model) Violates Separation of Concerns

Consider: we send each other plain-text emails, which can be written and read according to standard text encodings, rather than, say, the absurd alternative of sending executable VM images containing an OS and a word processor, along with a document in that word processor's proprietary format.

The ML model, once trained, can be viewed as data. 

It should be possible to:
* manage this data using standard, well known data-management tools and practices
* create this data using any compliant tool
* consume this data using any compliant tool
* validate that this data has a single universal interpretation
    * Why is this important? Consider the impact of a tiny difference in the implementation of, say, *ln(x)* on inference at scale.
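One simple way to see why a single universal interpretation matters: the sketch below computes *ln(x)* for the same input in two hypothetical runtimes, one working in double precision and one rounding to single precision. Precision is used here only as a stand-in for any implementation difference; the functions `score_double` and `score_single` and the cutoff rule are illustrative assumptions, not part of any real system.

```python
import math
import struct


def as_float32(x: float) -> float:
    """Round a Python float (IEEE-754 double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def score_double(x: float) -> float:
    # Hypothetical runtime A: computes ln(x) entirely in double precision.
    return math.log(x)


def score_single(x: float) -> float:
    # Hypothetical runtime B: stores both input and result in single precision.
    return as_float32(math.log(as_float32(x)))


x = 0.1
a, b = score_double(x), score_single(x)
print(f"runtime A: {a!r}")
print(f"runtime B: {b!r}")
print(f"absolute difference: {abs(a - b):.3e}")

# Hypothetical decision rule whose cutoff happens to fall between the two results:
# the "same" model now makes different decisions on the same input.
cutoff = (a + b) / 2
print("runtime A flags:", a > cutoff, "| runtime B flags:", b > cutoff)
```

A difference this small looks harmless, but across millions of scores, some can land close enough to a decision boundary to flip.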

## How Can We Address this Separation-of-Concerns Problem?

__Get the model *out* of the model-creation environment (both logically and physically)__

Physically: create a separate entity, such as a file.

Logically: ensure that entity is independent. Saving a scikit-learn model as a pickle file does not count as a solution, because unpickling it later still requires scikit-learn.
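To see why a pickle is not independent of its creation environment, the sketch below (assuming scikit-learn is installed; the toy model exists only for self-containment) uses the standard-library `pickletools` module to list the `sklearn` and `numpy` module paths recorded inside a pickled model. The unpickler must import each of those modules, typically in a compatible version, wherever the file is loaded.

```python
import pickle
import pickletools

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Train a toy model so the example is self-contained.
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

blob = pickle.dumps(model)

# Walk the pickle's opcodes and keep string arguments that refer to the
# sklearn or numpy packages. Class references are stored as module paths,
# which the unpickler imports before it can rebuild the object.
refs = sorted(
    {
        arg
        for _opcode, arg, _pos in pickletools.genops(blob)
        if isinstance(arg, str) and arg.split(".")[0] in {"sklearn", "numpy"}
    }
)
for ref in refs:
    print(ref)
```

The exact paths printed vary with library versions, which is part of the problem: the "model file" silently depends on the internal module layout of the tools that produced it.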


## What Kind of Separate File Do We Want?

Ideally, we'd like a format that is ...
* an open, cross-industry standard
* not owned or controlled by any one organization
* not encumbered by intellectual property restrictions (licensing rules)
* compatible with many ML tools, on many platforms
* time/space efficient
* robust (can support many kinds of ML models, including future types)
* consistent (produces the same output for the same model, no matter the deployment OS, architecture, etc.)
* simple (does not support unnecessary operations)
* secure (minimizes attack surface by design, offers verifiability, etc.)
* human readable (or can be made human readable)
* manageable in any database, content-management system, source control system, etc.
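As a deliberately tiny illustration of treating a trained model as data, the sketch below stores a logistic-regression model in a hypothetical JSON layout (invented here, not any real standard's schema; the feature names are made up), scores a row with nothing but arithmetic, and diffs two model versions with the standard-library `difflib`. It covers only a few of the bullets above (human readable, diffable, manageable in source control) and none of the harder ones, such as robustness to future model types.

```python
import difflib
import json
import math

# Hypothetical "model as data" layout for a logistic-regression scorer.
# NOT any real standard's schema; it only illustrates the idea.
model_v1 = {
    "type": "logistic_regression",
    "features": ["cart_total", "items_in_cart"],
    "coefficients": [0.8, -0.3],
    "intercept": -1.0,
}


def score(model: dict, row: dict) -> float:
    # Any tool that can read JSON and do arithmetic can consume this model.
    z = model["intercept"] + sum(
        c * row[f] for f, c in zip(model["features"], model["coefficients"])
    )
    return 1.0 / (1.0 + math.exp(-z))


# A new version in which only one coefficient changed.
model_v2 = dict(model_v1, coefficients=[0.8, -0.25])

# Because the model is plain text, ordinary tools can store, review, and diff it.
v1_lines = json.dumps(model_v1, indent=2).splitlines()
v2_lines = json.dumps(model_v2, indent=2).splitlines()
diff = list(
    difflib.unified_diff(v1_lines, v2_lines, "model_v1.json", "model_v2.json", lineterm="")
)
print("\n".join(diff))

print(score(model_v1, {"cart_total": 2.0, "items_in_cart": 1.0}))
```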

*As in most engineering scenarios, there is no single, magical solution that hits every bullet-point*

But there are a number of approaches that offer many of these attributes and are worthy of consideration.

This session looks at several of these tools.