# Model Management

An MLflow model is a standard format for packaging models so that they can be used by a variety of downstream tools.  This lesson presents a generalizable way of handling machine learning models that are created in and deployed to a variety of environments.


 - Introduce model management best practices
 - Store and use different flavors of models for different deployment environments
 - Apply models combined with arbitrary pre- and post-processing code using Python models


### Managing Machine Learning Models

Once a model has been trained and bundled with the environment it was trained in...<br><br>

* The next step is to package the model so that it can be used by a variety of serving tools
* Current deployment options include:
   - Container-based REST servers
   - Continuous deployment using Spark streaming
   - Batch
   - Managed cloud platforms such as Azure ML and AWS SageMaker
   
Packaging the final model in a platform-agnostic way offers the most flexibility in deployment options and allows for model reuse across a number of platforms.

**MLflow Models is a convention for packaging machine learning models that offers self-contained code, environments, and models.**<br><br>

* The main abstraction in this package is the concept of **flavors** 
  - A flavor is one of the different ways the model can be used
  - For instance, a TensorFlow model can be loaded as a TensorFlow DAG or as a Python function
  - Using an MLflow model convention allows for both of these flavors
* The difference between projects and models is that models are for inference and serving
* The `python_function` flavor of models gives a generic way of bundling models 
* We can thereby deploy a Python function without worrying about the underlying format of the model

**MLflow therefore maps any training framework to any deployment**, massively reducing the complexity of inference.

Arbitrary pre- and post-processing steps, such as data loading, cleansing, and featurization, can be included in the pipeline.  This means that the full pipeline, not just the model, can be preserved.
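
One way to read the claim that a shared model format maps any training framework to any deployment is as a counting argument. The sketch below is a back-of-envelope illustration only: the framework and target names are placeholders, and the assumption is that each framework needs one exporter and each deployment target needs one loader once a common format exists.

```python
# Illustrative placeholders, not an official list of frameworks or targets.
training_frameworks = ["h2o", "keras", "pytorch", "sklearn", "spark", "tensorflow"]
deployment_targets = ["rest_container", "spark_streaming", "batch", "managed_cloud"]


def integrations_without_shared_format(n_frameworks, n_targets):
    """Every framework is wired to every deployment target separately."""
    return n_frameworks * n_targets


def integrations_with_shared_format(n_frameworks, n_targets):
    """Each framework writes the shared format once; each target reads it once."""
    return n_frameworks + n_targets


pairwise = integrations_without_shared_format(
    len(training_frameworks), len(deployment_targets)
)
shared = integrations_with_shared_format(
    len(training_frameworks), len(deployment_targets)
)
print("pairwise integrations:", pairwise)
print("integrations with a shared format:", shared)
```

With six frameworks and four targets, the pairwise count is the product of the two numbers while the shared-format count is their sum, and the gap widens as either list grows.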

<div><img src="https://files.training.databricks.com/images/eLearning/ML-Part-4/mlflow-models-enviornments.png" style="height: 400px; margin: 20px"/></div>



### Model Flavors

Flavors offer a way of saving models that is agnostic to the training environment, making the models significantly easier to use across deployment options.  The current built-in flavors include the following:<br><br>

* <a href="https://mlflow.org/docs/latest/python_api/mlflow.pyfunc.html#module-mlflow.pyfunc" target="_blank">mlflow.pyfunc</a>
* <a href="https://mlflow.org/docs/latest/python_api/mlflow.h2o.html#module-mlflow.h2o" target="_blank">mlflow.h2o</a>
* <a href="https://mlflow.org/docs/latest/python_api/mlflow.keras.html#module-mlflow.keras" target="_blank">mlflow.keras</a>
* <a href="https://mlflow.org/docs/latest/python_api/mlflow.pytorch.html#module-mlflow.pytorch" target="_blank">mlflow.pytorch</a>
* <a href="https://mlflow.org/docs/latest/python_api/mlflow.sklearn.html#module-mlflow.sklearn" target="_blank">mlflow.sklearn</a>
* <a href="https://mlflow.org/docs/latest/python_api/mlflow.spark.html#module-mlflow.spark" target="_blank">mlflow.spark</a>
* <a href="https://mlflow.org/docs/latest/python_api/mlflow.tensorflow.html#module-mlflow.tensorflow" target="_blank">mlflow.tensorflow</a>

Models also offer reproducibility since the run ID and the timestamp of the run are preserved as well.

<div><img src="https://files.training.databricks.com/images/eLearning/ML-Part-4/mlflow-models.png" style="height: 400px; margin: 20px"/></div>

Import the data.

In [87]:
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("airbnb-cleaned-mlflow.csv")
X_train, X_test, y_train, y_test = train_test_split(df.drop(["price"], axis=1), df[["price"]].values.ravel(), random_state=42)

Train a random forest model.

In [88]:
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

rf = RandomForestRegressor(n_estimators=100, max_depth=5)
rf.fit(X_train, y_train)

rf_mse = mean_squared_error(y_test, rf.predict(X_test))

rf_mse

7074.520689859381

Train a neural network.

In [89]:
import tensorflow as tf
tf.set_random_seed(42) # For reproducibility

from keras.models import Sequential
from keras.layers import Dense

nn = Sequential([
  Dense(40, input_dim=21, activation='relu'),
  Dense(20, activation='relu'),
  Dense(1, activation='linear')
])

nn.compile(optimizer="adam", loss="mse")
nn.fit(X_train, y_train, validation_split=.2, epochs=40, verbose=2)

# nn.evaluate(X_test, y_test)
nn_mse = mean_squared_error(y_test, nn.predict(X_test))

nn_mse

Train on 2824 samples, validate on 706 samples
Epoch 1/40
 - 0s - loss: 26583.2805 - val_loss: 21337.8679
Epoch 2/40
 - 0s - loss: 20568.8117 - val_loss: 20016.8772
Epoch 3/40
 - 0s - loss: 19596.8769 - val_loss: 19175.5001
Epoch 4/40
 - 0s - loss: 18805.9316 - val_loss: 18120.7715
Epoch 5/40
 - 0s - loss: 17887.9526 - val_loss: 17522.7208
Epoch 6/40
 - 0s - loss: 16781.1336 - val_loss: 15535.7193
Epoch 7/40
 - 0s - loss: 15431.9440 - val_loss: 13907.2625
Epoch 8/40
 - 0s - loss: 13956.1992 - val_loss: 12105.9886
Epoch 9/40
 - 0s - loss: 12532.7612 - val_loss: 10987.7080
Epoch 10/40
 - 0s - loss: 11715.1897 - val_loss: 11692.6057
Epoch 11/40
 - 0s - loss: 11230.3535 - val_loss: 9974.7573
Epoch 12/40
 - 0s - loss: 10685.3257 - val_loss: 9638.7825
Epoch 13/40
 - 0s - loss: 10560.2003 - val_loss: 9571.0626
Epoch 14/40
 - 0s - loss: 10393.8605 - val_loss: 9513.5465
Epoch 15/40
 - 0s - loss: 10302.3213 - val_loss: 9391.6727
Epoch 16/40
 - 0s - loss: 10206.0836 - val_loss: 9311.3108
Epoch 17

8309.21803916579
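
The two `mse` values are in squared price units, which makes them hard to interpret directly. The sketch below, written with plain Python lists and a hypothetical helper name, shows how the metric is computed on toy numbers (not the Airbnb data) and then takes the square root of the two values reported above to express the error in the same units as `price`.

```python
import math


def mean_squared_error_plain(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy targets and predictions, chosen only to make the arithmetic easy to follow.
toy_mse = mean_squared_error_plain([100.0, 150.0, 200.0], [110.0, 140.0, 230.0])

# Root mean squared error for the two values printed in this notebook.
rf_rmse = math.sqrt(7074.520689859381)
nn_rmse = math.sqrt(8309.21803916579)

print("toy MSE:", toy_mse)
print("random forest RMSE:", round(rf_rmse, 2))
print("neural network RMSE:", round(nn_rmse, 2))
```

Read this way, the random forest is off by roughly 84 price units on average and the neural network by roughly 91.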

Now log the two models.

In [90]:
import mlflow.sklearn

with mlflow.start_run(run_name="RF Model") as run:
  mlflow.sklearn.log_model(rf, "model")
  mlflow.log_metric("mse", rf_mse)

  sklearnRunID = run.info.run_uuid
  sklearnURI = run.info.artifact_uri
  
  experimentID = run.info.experiment_id

In [91]:
import mlflow.keras

with mlflow.start_run(run_name="NN Model") as run:
  mlflow.keras.log_model(nn, "model")
  mlflow.log_metric("mse", nn_mse)

  kerasRunID = run.info.run_uuid
  kerasURI = run.info.artifact_uri

In [92]:
print(sklearnURI)
print(kerasURI)

/Users/azeltov/git/mlflowdemo/notebooks/mlruns/0/e5d6f0fcccb54aa986d1e1bd4c1581aa/artifacts
/Users/azeltov/git/mlflowdemo/notebooks/mlruns/0/1657ee6477ff495299536516490d1957/artifacts
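
The printed paths suggest that, for this local file store, a run's artifacts live under `mlruns/<experiment ID>/<run ID>/artifacts`. The sketch below relies on that observed layout only; `split_artifact_uri` is a hypothetical helper, and remote artifact stores may use a different structure.

```python
from pathlib import PurePosixPath


def split_artifact_uri(artifact_uri):
    """Extract experiment and run IDs from a local artifact path.

    Assumes the path ends in <experiment_id>/<run_id>/artifacts, as in the
    output printed above.
    """
    parts = PurePosixPath(artifact_uri).parts
    if len(parts) < 3 or parts[-1] != "artifacts":
        raise ValueError("expected a path ending in <experiment>/<run>/artifacts")
    return {"experiment_id": parts[-3], "run_id": parts[-2]}


info = split_artifact_uri(
    "/Users/azeltov/git/mlflowdemo/notebooks/mlruns/0/"
    "e5d6f0fcccb54aa986d1e1bd4c1581aa/artifacts"
)
print(info)
```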


In [93]:
%%sh
ls -al /Users/azeltov/git/mlflowdemo/notebooks/mlruns/0/e5d6f0fcccb54aa986d1e1bd4c1581aa/

total 8
drwxr-xr-x  7 azeltov  staff  224 May 20 14:30 .
drwxr-xr-x  7 azeltov  staff  224 May 20 14:30 ..
drwxr-xr-x  3 azeltov  staff   96 May 20 14:30 artifacts
-rw-r--r--  1 azeltov  staff  449 May 20 14:30 meta.yaml
drwxr-xr-x  3 azeltov  staff   96 May 20 14:30 metrics
drwxr-xr-x  2 azeltov  staff   64 May 20 14:30 params
drwxr-xr-x  5 azeltov  staff  160 May 20 14:30 tags


Look at the model flavors.  Both have their respective `keras` or `sklearn` flavors as well as a `python_function` flavor.

In [109]:
%%sh 
ls -al /Users/azeltov/git/mlflowdemo/notebooks/mlruns/0/e5d6f0fcccb54aa986d1e1bd4c1581aa/artifacts/model/
echo "-------------------------"
ls -al /Users/azeltov/git/mlflowdemo/notebooks/mlruns/0/1657ee6477ff495299536516490d1957/artifacts/model/


total 800
drwxr-xr-x  5 azeltov  staff     160 May 20 14:30 .
drwxr-xr-x  3 azeltov  staff      96 May 20 14:30 ..
-rw-r--r--  1 azeltov  staff     343 May 20 14:30 MLmodel
-rw-r--r--  1 azeltov  staff     119 May 20 14:30 conda.yaml
-rw-r--r--  1 azeltov  staff  397445 May 20 14:30 model.pkl
-------------------------
total 112
drwxr-xr-x  5 azeltov  staff    160 May 20 14:30 .
drwxr-xr-x  3 azeltov  staff     96 May 20 14:30 ..
-rw-r--r--  1 azeltov  staff    287 May 20 14:30 MLmodel
-rw-r--r--  1 azeltov  staff    102 May 20 14:30 conda.yaml
-rw-r--r--  1 azeltov  staff  48408 May 20 14:30 model.h5


In [110]:
%%sh 
cat /Users/azeltov/git/mlflowdemo/notebooks/mlruns/0/e5d6f0fcccb54aa986d1e1bd4c1581aa/artifacts/model/MLmodel

artifact_path: model
flavors:
  python_function:
    data: model.pkl
    env: conda.yaml
    loader_module: mlflow.sklearn
    python_version: 3.6.6
  sklearn:
    pickled_model: model.pkl
    serialization_format: cloudpickle
    sklearn_version: 0.19.1
run_id: e5d6f0fcccb54aa986d1e1bd4c1581aa
utc_time_created: '2019-05-16 15:22:57.380021'
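
The `MLmodel` file above holds one entry per flavor under `flavors:`. As a rough illustration of how a tool could discover which flavors a model supports, the sketch below pulls the flavor names out of that text using indentation alone. `list_flavors` is a hypothetical helper and deliberately naive; a real tool would more likely use a YAML parser.

```python
# Contents of the MLmodel file printed above.
MLMODEL_TEXT = """\
artifact_path: model
flavors:
  python_function:
    data: model.pkl
    env: conda.yaml
    loader_module: mlflow.sklearn
    python_version: 3.6.6
  sklearn:
    pickled_model: model.pkl
    serialization_format: cloudpickle
    sklearn_version: 0.19.1
run_id: e5d6f0fcccb54aa986d1e1bd4c1581aa
utc_time_created: '2019-05-16 15:22:57.380021'
"""


def list_flavors(mlmodel_text):
    """Return the keys nested directly under the top-level `flavors:` key."""
    flavors = []
    in_flavors = False
    for line in mlmodel_text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        if indent == 0:
            # A new top-level key starts; track whether it is the flavors block.
            in_flavors = line.strip() == "flavors:"
        elif in_flavors and indent == 2 and line.rstrip().endswith(":"):
            flavors.append(line.strip()[:-1])
    return flavors


flavor_names = list_flavors(MLMODEL_TEXT)
print(flavor_names)
```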


In [None]:
%%sh 
cat /Users/azeltov/git/mlflowdemo/notebooks/mlruns/0/7044111e06624a19b48d1e6b96744ba4/artifacts/model/MLmodel


Now we can use both of these models in the same way, even though they were trained by different packages.

In [96]:
import mlflow.pyfunc

rf_pyfunc_model = mlflow.pyfunc.load_pyfunc(path=(sklearnURI+"/model"))
type(rf_pyfunc_model)

sklearn.ensemble.forest.RandomForestRegressor

In [97]:
import mlflow.pyfunc

nn_pyfunc_model = mlflow.pyfunc.load_pyfunc(path=(kerasURI+"/model"))
type(nn_pyfunc_model)

mlflow.keras._KerasModelWrapper

Both objects implement a `predict` method.  The `sklearn` model is returned as its native `sklearn` type because scikit-learn estimators already implement this method, while the Keras model is returned inside a wrapper.
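
The pattern seen in the two `type(...)` outputs can be sketched in plain Python: an object that already exposes `predict` is handed back unchanged, and anything else is wrapped so callers see the same interface. Every class and function name below is hypothetical and chosen for illustration; this is not MLflow's loading code.

```python
class NativeModel:
    """Stands in for a model that already offers predict()."""

    def predict(self, rows):
        return [sum(row) for row in rows]


class ForeignModel:
    """Stands in for a model whose inference method has a different name."""

    def run_inference(self, rows):
        return [max(row) for row in rows]


class PredictWrapper:
    """Adapts another inference method to a predict() interface."""

    def __init__(self, model, method_name):
        self._call = getattr(model, method_name)

    def predict(self, rows):
        return self._call(rows)


def load_as_predictor(model):
    """Return the model itself if it has predict(), else a wrapper around it."""
    if callable(getattr(model, "predict", None)):
        return model
    return PredictWrapper(model, "run_inference")


rows = [[1, 2], [3, 4]]
native = load_as_predictor(NativeModel())
wrapped = load_as_predictor(ForeignModel())
native_output = native.predict(rows)
wrapped_output = wrapped.predict(rows)
print(type(native).__name__, native_output)
print(type(wrapped).__name__, wrapped_output)
```

Downstream code only ever calls `predict`, so it does not need to know which of the two cases it received.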

In [98]:
rf_pyfunc_model.predict(X_test)

array([149.01402207, 119.33512745, 119.59790147, ..., 172.23161991,
       119.33512745, 119.33512745])

In [101]:
nn_pyfunc_model.predict(X_test)

Unnamed: 0,0
1210,144.847275
1729,129.515747
4428,140.550583
3720,407.037903
2970,112.609253
291,88.946892
4222,77.743027
4622,122.463432
4477,505.032166
960,119.904266


In [75]:
t = X_test.iloc[0]
t

host_total_listings_count        1.000000
neighbourhood_cleansed          29.000000
zipcode                         21.000000
latitude                        37.750854
longitude                     -122.478961
property_type                    0.000000
room_type                        0.000000
accommodates                     4.000000
bathrooms                        1.000000
bedrooms                         0.000000
beds                             4.000000
bed_type                         0.000000
minimum_nights                   2.000000
number_of_reviews              194.000000
review_scores_rating            96.000000
review_scores_accuracy          10.000000
review_scores_cleanliness       10.000000
review_scores_checkin           10.000000
review_scores_communication     10.000000
review_scores_location           9.000000
review_scores_value              9.000000
Name: 1210, dtype: float64

From a terminal, serve the random forest model as a local REST endpoint on port 1234:

(azure_automl) alexs-mbp-2:mlflowdemo azeltov$ mlflow pyfunc serve -m /Users/azeltov/git/mlflowdemo/notebooks/mlruns/0/e5d6f0fcccb54aa986d1e1bd4c1581aa/artifacts/model/  -p 1234

In [113]:
sample = X_test.iloc[[0]]
query_input = list(sample.values.flatten())  # .values replaces the deprecated .as_matrix()
sample_json = sample.to_json(orient="split")
print(sample_json)

{"columns":["host_total_listings_count","neighbourhood_cleansed","zipcode","latitude","longitude","property_type","room_type","accommodates","bathrooms","bedrooms","beds","bed_type","minimum_nights","number_of_reviews","review_scores_rating","review_scores_accuracy","review_scores_cleanliness","review_scores_checkin","review_scores_communication","review_scores_location","review_scores_value"],"index":[1210],"data":[[1.0,29,21,37.750853666,-122.4789613464,0,0,4.0,1.0,0.0,4.0,0,2.0,194.0,96.0,10.0,10.0,10.0,10.0,9.0,9.0]]}
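
The payload above uses the pandas `split` orientation: column names are stored once under `columns` and each row is a list under `data`. The sketch below, using only the standard library and a hypothetical helper name, rebuilds one dictionary per row from such a payload. The two-column example is a shortened stand-in for the full record.

```python
import json


def split_json_to_records(payload):
    """Turn a split-oriented JSON string into a list of {column: value} dicts."""
    doc = json.loads(payload)
    return [dict(zip(doc["columns"], row)) for row in doc["data"]]


# Shortened stand-in for the payload printed above.
example_payload = json.dumps(
    {"columns": ["bedrooms", "beds"], "index": [1210], "data": [[0.0, 4.0]]}
)
records = split_json_to_records(example_payload)
print(records)
```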


In [104]:
import requests
import json

def query_endpoint_example(scoring_uri, inputs, service_key=None):
  headers = {
    "Content-Type": "application/json",
  }
  if service_key is not None:
    headers["Authorization"] = "Bearer {service_key}".format(service_key=service_key)
    
  print("Sending batch prediction request with inputs: {}".format(inputs))
  response = requests.post(scoring_uri, data=inputs, headers=headers)
  print("Response: {}".format(response.text))
  preds = json.loads(response.text)
  print("Received response: {}".format(preds))
  return preds
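
The header logic in `query_endpoint_example` can be checked without a running server. The sketch below builds an equivalent request with the standard library's `urllib.request.Request` and inspects it instead of sending it. `build_scoring_request` and the key value are hypothetical names used for illustration.

```python
import urllib.request


def build_scoring_request(scoring_uri, inputs, service_key=None):
    """Build, but do not send, a JSON POST request for a scoring endpoint."""
    headers = {"Content-Type": "application/json"}
    if service_key is not None:
        headers["Authorization"] = "Bearer {}".format(service_key)
    return urllib.request.Request(
        scoring_uri,
        data=inputs.encode("utf-8"),
        headers=headers,
        method="POST",
    )


payload = '{"columns": [], "data": []}'
with_key = build_scoring_request(
    "http://127.0.0.1:1234/invocations", payload, service_key="hypothetical-key"
)
without_key = build_scoring_request("http://127.0.0.1:1234/invocations", payload)

# urllib stores header names capitalized, hence "Content-type" in the lookup.
print(with_key.get_method(), with_key.get_header("Content-type"))
print(with_key.get_header("Authorization"))
print(without_key.get_header("Authorization"))
```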

In [118]:
dev_prediction = query_endpoint_example(scoring_uri='http://127.0.0.1:1234/invocations', inputs=sample_json)

Sending batch prediction request with inputs: {"columns":["host_total_listings_count","neighbourhood_cleansed","zipcode","latitude","longitude","property_type","room_type","accommodates","bathrooms","bedrooms","beds","bed_type","minimum_nights","number_of_reviews","review_scores_rating","review_scores_accuracy","review_scores_cleanliness","review_scores_checkin","review_scores_communication","review_scores_location","review_scores_value"],"index":[1210],"data":[[1.0,29,21,37.750853666,-122.4789613464,0,0,4.0,1.0,0.0,4.0,0,2.0,194.0,96.0,10.0,10.0,10.0,10.0,9.0,9.0]]}
Response: [149.01402206843878]
Received response: [149.01402206843878]


In [119]:
%%sh 
curl -X POST -H "Content-Type:application/json; format=pandas-split" --data '{"columns":["host_total_listings_count","neighbourhood_cleansed","zipcode","latitude","longitude","property_type","room_type","accommodates","bathrooms","bedrooms","beds","bed_type","minimum_nights","number_of_reviews","review_scores_rating","review_scores_accuracy","review_scores_cleanliness","review_scores_checkin","review_scores_communication","review_scores_location","review_scores_value"],"data":[[1.0,29.0,21.0,37.750853666,-122.4789613464,0.0,0.0,4.0,1.0,0.0,4.0,0.0,2.0,194.0,96.0,10.0,10.0,10.0,10.0,9.0,9.0]]}' http://127.0.0.1:1234/invocations


[149.01402206843878]



## ----------------------------------------------------------------------------------------------

In [120]:
import requests
server_port_number=1234
headers = {'Content-type': 'application/json'}
url = "http://localhost:{port_number}/invocations".format(port_number=server_port_number)

input_json='{"columns":["host_total_listings_count","neighbourhood_cleansed","zipcode","latitude","longitude","property_type","room_type","accommodates","bathrooms","bedrooms","beds","bed_type","minimum_nights","number_of_reviews","review_scores_rating","review_scores_accuracy","review_scores_cleanliness","review_scores_checkin","review_scores_communication","review_scores_location","review_scores_value"],"data":[[1.0,29.0,21.0,37.750853666,-122.4789613464,0.0,0.0,4.0,1.0,0.0,4.0,0.0,2.0,194.0,96.0,10.0,10.0,10.0,10.0,9.0,9.0]]}'

response = requests.post(url=url, headers=headers, data=input_json)

print(response)
print(response.text)

<Response [200]>
[149.01402206843878]



### Pre and Post Processing Code using `pyfunc`

A `pyfunc` is a generic Python model that can wrap any model, regardless of the libraries used to train it.  It is defined as a directory structure containing all of its dependencies, and once loaded it is "just an object" with a `predict` method.  Since it makes very few assumptions, it can be deployed using MLflow, SageMaker, a Spark UDF, or any other environment.

Check out <a href="https://mlflow.org/docs/latest/python_api/mlflow.pyfunc.html#pyfunc-create-custom" target="_blank">the `pyfunc` documentation for details</a><br>

Check out <a href="https://github.com/mlflow/mlflow/blob/master/docs/source/models.rst#example-saving-an-xgboost-model-in-mlflow-format" target="_blank">this README for generic example code and integration with `XGBoost`</a>

Define a model class.

In [None]:
import mlflow.pyfunc

class AddN(mlflow.pyfunc.PythonModel):

    def __init__(self, n):
        self.n = n

    def predict(self, context, model_input):
        return model_input.apply(lambda column: column + self.n)

Construct and save the model.

In [None]:
model_path = "add_n_model2"
add5_model = AddN(n=5)

mlflow.pyfunc.save_model(dst_path=model_path, python_model=add5_model)

Load the model in `python_function` format.

In [30]:
loaded_model = mlflow.pyfunc.load_pyfunc(model_path)
type(loaded_model)

mlflow.pyfunc.model._PythonModelPyfuncWrapper

Evaluate the model.

In [31]:
import pandas as pd

model_input = pd.DataFrame([range(10)])
model_output = loaded_model.predict(model_input)

assert model_output.equals(pd.DataFrame([range(5, 15)]))

model_output

Unnamed: 0,0,1,2,3,4,5,6,7,8,9
0,5,6,7,8,9,10,11,12,13,14
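
`AddN` shows the mechanics of a custom model but has no pre- or post-processing. The sketch below illustrates where such steps could sit inside `predict`. It is a plain Python class so that it runs on its own; with MLflow it would extend `mlflow.pyfunc.PythonModel` and typically receive a pandas DataFrame rather than a list of dictionaries. The class name, the pricing rule, and the parameters are all made up for illustration.

```python
class ClippedPriceModel:
    """Toy model with explicit pre-processing and post-processing steps."""

    def __init__(self, base_price, price_per_guest, floor=0.0):
        self.base_price = base_price
        self.price_per_guest = price_per_guest
        self.floor = floor

    def _preprocess(self, model_input):
        # Pre-processing: treat a missing "accommodates" value as one guest.
        cleaned = []
        for row in model_input:
            guests = row.get("accommodates")
            cleaned.append(1.0 if guests is None else float(guests))
        return cleaned

    def _postprocess(self, raw_predictions):
        # Post-processing: never return a price below the floor, round to cents.
        return [round(max(self.floor, p), 2) for p in raw_predictions]

    def predict(self, context, model_input):
        # Same signature as the predict method of AddN above.
        guests = self._preprocess(model_input)
        raw = [self.base_price + self.price_per_guest * g for g in guests]
        return self._postprocess(raw)


model = ClippedPriceModel(base_price=-50.0, price_per_guest=40.0)
predictions = model.predict(None, [{"accommodates": 4}, {"accommodates": None}, {}])
print(predictions)
```

Because both helper steps live on the model object, they would be saved and loaded together with it rather than re-implemented in each deployment environment.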


## Review
**Question:** How do MLflow projects differ from models?  
**Answer:** The focus of MLflow Projects is reproducibility of runs and packaging of code.  MLflow Models focuses on packaging models for use in various deployment environments.

**Question:** What is an ML model flavor?  
**Answer:** Flavors are a convention that deployment tools can use to understand the model, which makes it possible to write tools that work with models from any ML library without having to integrate each tool with each library.  Instead of having to map each training environment to a deployment environment, ML model flavors manage this mapping for you.

**Question:** How do I add pre- and post-processing logic to my models?  
**Answer:** A model class that extends `mlflow.pyfunc.PythonModel` lets you define loading, pre-processing, and post-processing logic.

## Next Steps

Start the next lesson, [Production Issues]($./05-Production-Issues ).

## Additional Topics & Resources

**Q:** Where can I find out more information on MLflow Models?  
**A:** Check out <a href="https://www.mlflow.org/docs/latest/models.html" target="_blank">the MLflow documentation</a>.

&copy; 2019 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the <a href="http://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/>
<a href="https://databricks.com/privacy-policy">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use">Terms of Use</a> | <a href="http://help.databricks.com/">Support</a>