# Step-by-step process of deploying a model to production

The deployment process should reduce the risk of introducing incorrect changes to the service.

The main goal of this lab is to get hands-on practice with validating deployment candidates at each stage of the process.

Continuous integration can also be automated to eliminate human error when testing model versions.
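One way to automate such a check is a small gate function that a CI job calls before promoting a candidate. The sketch below is illustrative only: the function name `passes_gate`, the metric key `rmse`, the tolerance, and the metric values are assumptions, not part of this lab's code.

```python
def passes_gate(candidate, baseline, metric="rmse", tolerance=0.0):
    """Return True if the candidate's error is no worse than the baseline's.

    `candidate` and `baseline` are plain dicts of metric name -> value.
    Lower is assumed to be better (true for RMSE and MAE, not for R2).
    """
    if metric not in candidate or metric not in baseline:
        # A missing metric is treated as a failed check rather than a pass.
        return False
    return candidate[metric] <= baseline[metric] + tolerance


# Hypothetical metric values, for illustration only.
production_metrics = {"rmse": 0.25}
candidate_metrics = {"rmse": 0.21}
print(passes_gate(candidate_metrics, production_metrics))
```

Treating a missing metric as a failure keeps an incomplete run from being promoted by accident.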

All changes are visible in the [MLflow interface](/app/).

## Preparing Experiment Data

Let's import the necessary modules and define the variables.

The code is similar to the first lab: we populate the experiment registry with runs that we will work with later.

In [46]:
import os
import sys
import warnings
import pprint

import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet
from mlflow.tracking import MlflowClient


import mlflow
import mlflow.sklearn

MLFLOW_SERVER_URL = 'http://127.0.0.1:5000/'
experiment_name = 'Final_Project1'

warnings.filterwarnings("ignore")
np.random.seed(40)
data = pd.read_csv("Dataset/iris.csv")

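from sklearn.preprocessing import OneHotEncoder

# One-hot encode the class label, then keep only the first indicator column
# as a single 0/1 regression target (this matches the `variety` values
# printed at the end of this notebook).
encoder = OneHotEncoder(handle_unknown='ignore')
y = np.array(data["variety"]).reshape(-1, 1)
encoder.fit(y)
data["variety"] = encoder.transform(y).toarray()[:, 0]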

train, test = train_test_split(data)

train_x = train.drop(["variety"], axis=1)
test_x = test.drop(["variety"], axis=1)
train_y = train[["variety"]]
test_y = test[["variety"]]

test_later_x, test_x = test_x[:10], test_x[10:]
test_later_y, test_y = test_y[:10], test_y[10:]

client = mlflow.tracking.MlflowClient(MLFLOW_SERVER_URL)

mlflow.set_tracking_uri(MLFLOW_SERVER_URL)

mlflow.set_experiment(experiment_name)


for alpha, l1_ratio in ((0.3, 0.5), (0.3, 0.3), (0.8, 0.5), (0.45, 0.3), (0.2, 0.3), (0.9, 0.9)):
    with mlflow.start_run():
        
        lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
        lr.fit(train_x, train_y)

        predicted_qualities = lr.predict(test_x)
        rmse = np.sqrt(mean_squared_error(test_y, predicted_qualities))
        mae = mean_absolute_error(test_y, predicted_qualities)
        r2 = r2_score(test_y, predicted_qualities)

        print("Elasticnet model (alpha=%f, l1_ratio=%f):" % (alpha, l1_ratio))
        print("  RMSE: %s" % rmse)
        print("  MAE: %s" % mae)
        print("  R2: %s" % r2)

        mlflow.log_param("alpha", alpha)
        mlflow.log_param("l1_ratio", l1_ratio)
        mlflow.log_metric("rmse", rmse)
        mlflow.log_metric("r2", r2)
        mlflow.log_metric("mae", mae)

        mlflow.sklearn.log_model(lr, "model")

Elasticnet model (alpha=0.300000, l1_ratio=0.500000):
  RMSE: 0.22744552699068046
  MAE: 0.2016874375196894
  R2: 0.7887631733620541
Elasticnet model (alpha=0.300000, l1_ratio=0.300000):
  RMSE: 0.21430541884512883
  MAE: 0.19068179306295002
  R2: 0.8124655154355901
Elasticnet model (alpha=0.800000, l1_ratio=0.500000):
  RMSE: 0.3388654849846033
  MAE: 0.3143659462145404
  R2: 0.5311115809351108
Elasticnet model (alpha=0.450000, l1_ratio=0.300000):
  RMSE: 0.23220806152760973
  MAE: 0.20570007884891398
  R2: 0.7798242826598251
Elasticnet model (alpha=0.200000, l1_ratio=0.300000):
  RMSE: 0.20467188959740729
  MAE: 0.18074904196419297
  R2: 0.8289467885685591
Elasticnet model (alpha=0.900000, l1_ratio=0.900000):
  RMSE: 0.5063373885270652
  MAE: 0.47448979591836743
  R2: -0.046875
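To pick a deployment candidate from these runs, the results can be ranked by a single metric. The sketch below works on values copied from the output above (RMSE rounded to four decimal places) and assumes that lower RMSE is the selection criterion. Against a live tracking server, a similar ordering could be requested with `mlflow.search_runs(..., order_by=["metrics.rmse ASC"])`.

```python
# (alpha, l1_ratio, rmse) copied from the output above, RMSE rounded to 4 places.
results = [
    (0.30, 0.5, 0.2274),
    (0.30, 0.3, 0.2143),
    (0.80, 0.5, 0.3389),
    (0.45, 0.3, 0.2322),
    (0.20, 0.3, 0.2047),
    (0.90, 0.9, 0.5063),
]

# Lower RMSE is better, so pick the minimum by the third field.
best_alpha, best_l1_ratio, best_rmse = min(results, key=lambda r: r[2])
print(f"best: alpha={best_alpha}, l1_ratio={best_l1_ratio}, rmse={best_rmse}")
```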


## Overview of the existing architecture and model deployment process

MLflow has registered several experiment runs with different metrics.

List of runs in the experiment:

In [47]:
client = mlflow.tracking.MlflowClient(MLFLOW_SERVER_URL)
experiment = client.get_experiment_by_name(experiment_name)
client.list_run_infos(experiment.experiment_id)

[<RunInfo: artifact_uri='./mlruns/16/52acbc02e2c14d92927dcf1552dafc07/artifacts', end_time=1652034380092, experiment_id='16', lifecycle_stage='active', run_id='52acbc02e2c14d92927dcf1552dafc07', run_uuid='52acbc02e2c14d92927dcf1552dafc07', start_time=1652034377518, status='FINISHED', user_id='kanishkkumar'>,
 <RunInfo: artifact_uri='./mlruns/16/9c3f3de071e94ddbb18b9fd0f74b197b/artifacts', end_time=1652034377508, experiment_id='16', lifecycle_stage='active', run_id='9c3f3de071e94ddbb18b9fd0f74b197b', run_uuid='9c3f3de071e94ddbb18b9fd0f74b197b', start_time=1652034374501, status='FINISHED', user_id='kanishkkumar'>,
 <RunInfo: artifact_uri='./mlruns/16/775fa278145244e4afc7ba1e7dedb99b/artifacts', end_time=1652034374485, experiment_id='16', lifecycle_stage='active', run_id='775fa278145244e4afc7ba1e7dedb99b', run_uuid='775fa278145244e4afc7ba1e7dedb99b', start_time=1652034371791, status='FINISHED', user_id='kanishkkumar'>,
 <RunInfo: artifact_uri='./mlruns/16/daa675626b7546f5b20d311e214be029/
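The `start_time` and `end_time` fields in the `RunInfo` output are Unix timestamps in milliseconds. As a small illustration, the sketch below converts them to readable UTC times and computes run durations for the first three entries above. The shortened run ID labels and the helper name `to_utc` are choices made for this example.

```python
from datetime import datetime, timezone

# (run_id prefix, start_time, end_time) taken from the first three RunInfo entries above.
runs = [
    ("52acbc02", 1652034377518, 1652034380092),
    ("9c3f3de0", 1652034374501, 1652034377508),
    ("775fa278", 1652034371791, 1652034374485),
]


def to_utc(ms):
    """Convert a millisecond Unix timestamp to an aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


# Duration of each run in milliseconds.
durations_ms = {run_id: end - start for run_id, start, end in runs}
for run_id, start, end in runs:
    print(run_id, to_utc(start).isoformat(timespec="seconds"), f"{durations_ms[run_id]} ms")
```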

## Training a Keras model and tracking loss per epoch

In [54]:
def train_keras_model(X, y):
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense

    # One hidden layer and a single linear output, trained with MSE loss
    model = Sequential()
    model.add(Dense(100, input_shape=(X.shape[-1],), activation="relu", name="hidden_layer"))
    model.add(Dense(1))
    model.compile(loss="mse", optimizer="adam")

    # Use the function arguments rather than the global train_x / train_y
    model.fit(X, y, epochs=100, batch_size=64, validation_split=.2)
    return model

import mlflow
import mlflow.tensorflow

with mlflow.start_run():
    # Automatically capture the model's parameters, metrics, artifacts,
    # and source code with the `autolog()` function
    mlflow.tensorflow.autolog()

    train_keras_model(train_x, train_y)
    run_id = mlflow.active_run().info.run_id

Epoch 1/100
...
Epoch 100/100
INFO:tensorflow:Assets written to: /var/folders/4v/ly_0zrn96tq82mx7z3363f_c0000gn/T/tmp53e8k_8x/model/data/model/assets


In [84]:
run_id

'c05b251159b541e6ab859976d3b0cdb6'
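With autologging enabled, per-epoch training metrics are expected to be recorded against this run. A minimal sketch for reading them back follows, assuming the metric key is `loss`. The helpers `loss_by_epoch` and `first_epoch_below` are hypothetical and accept any client that exposes `get_metric_history(run_id, key)`, as `MlflowClient` does.

```python
def loss_by_epoch(client, run_id, key="loss"):
    """Return [(step, value), ...] for one metric of a run, ordered by step.

    `client` is expected to behave like mlflow.tracking.MlflowClient, whose
    get_metric_history(run_id, key) returns objects with `.step` and `.value`.
    """
    history = client.get_metric_history(run_id, key)
    return sorted((m.step, m.value) for m in history)


def first_epoch_below(points, threshold):
    """Return the first step whose value is below `threshold`, or None."""
    for step, value in points:
        if value < threshold:
            return step
    return None


# Usage against a live tracking server (not executed here):
#   from mlflow.tracking import MlflowClient
#   points = loss_by_epoch(MlflowClient(MLFLOW_SERVER_URL), run_id)
#   print(first_epoch_below(points, 0.05))  # 0.05 is an arbitrary threshold
```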

### 3. Roll back the version in the test environment and mark the model

If testing of the model fails, the test environment must be rolled back to the stable version, and the model that failed the test must be marked so that it is not deployed in the future.
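Marking a failed version could look roughly like the sketch below. It assumes a client that behaves like `MlflowClient`. The helper name `reject_version` and the tag keys `validation_status` and `rejection_reason` are hypothetical naming choices, not something this lab defines.

```python
def reject_version(client, name, version, reason):
    """Tag a model version as rejected and move it out of the test stage.

    `client` is expected to behave like mlflow.tracking.MlflowClient.
    The tag keys below are hypothetical naming choices.
    """
    client.set_model_version_tag(name, version, "validation_status", "rejected")
    client.set_model_version_tag(name, version, "rejection_reason", reason)
    # "Archived" is one of the built-in registry stages.
    client.transition_model_version_stage(name=name, version=version, stage="Archived")


# Usage against a live registry (not executed here; the reason text is made up):
#   reject_version(client, "sk-learn-model-ci", "2", "failed staging test")
```

Keeping the reason in a tag leaves a searchable record next to the version itself.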

#### Roll back the test environment to the production version

A working version is currently in production. Let's put the same version in the test environment, since we need a working version there for the subsequent selection of deployment candidates.

Deploying the stable version from the production environment (`Production`) to the test environment (`Staging`):

In [6]:
# Create a new version from the one that is currently in production
result = client.create_model_version(
    name=current_prod.name,
    source=current_prod.source,
    run_id=current_prod.run_id
)
# Deploy the new version to test
client.transition_model_version_stage(
    name=current_prod.name,
    version=result.version,
    stage="Staging"
)

2022/05/05 21:27:53 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation.                     Model name: sk-learn-model-ci, version 3


<ModelVersion: creation_timestamp=1651786073652, current_stage='Staging', description='', last_updated_timestamp=1651786073662, name='sk-learn-model-ci', run_id='74f035d8947a486a9e5e933222c77ca1', run_link='', source='./mlruns/1/74f035d8947a486a9e5e933222c77ca1/artifacts/model', status='READY', status_message='', tags={}, user_id='', version='3'>

This operation also brings the test environment to the same state as production, so testing is performed under conditions close to the production environment.
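Whether the two environments really point at the same artifacts can be checked programmatically. The sketch below is one possible check, assuming a client that behaves like `MlflowClient`. The helper name `same_artifacts` is hypothetical.

```python
def same_artifacts(client, name):
    """Return True if the latest Staging and Production versions share a source.

    `client` is expected to behave like mlflow.tracking.MlflowClient, whose
    get_latest_versions(name, stages=[...]) returns objects with `.source`.
    """
    staging = client.get_latest_versions(name, stages=["Staging"])
    production = client.get_latest_versions(name, stages=["Production"])
    if not staging or not production:
        # One of the environments has no version deployed at all.
        return False
    return staging[0].source == production[0].source


# Usage against a live registry (not executed here):
#   print(same_artifacts(client, "sk-learn-model-ci"))
```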

## Correcting a rejected version

In this case, the error occurred during model registration: the path to the model file was specified incorrectly.

The path can be updated to the correct one:

In [7]:
new_staging = client.create_model_version(
    name=current_prod.name,
    source=current_staging.source.replace('mdel', 'model'),  # correct the misspelled path segment
    run_id=current_prod.run_id
)
client.transition_model_version_stage(
    name=current_prod.name,
    version=new_staging.version,
    stage="Staging"
)

2022/05/05 21:27:57 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation.                     Model name: sk-learn-model-ci, version 4


<ModelVersion: creation_timestamp=1651786077888, current_stage='Staging', description='', last_updated_timestamp=1651786077898, name='sk-learn-model-ci', run_id='74f035d8947a486a9e5e933222c77ca1', run_link='', source='./mlruns/1/cbc98fe93da64a158cefa505f63f367a/artifacts/model', status='READY', status_message='', tags={}, user_id='', version='4'>
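A registration error like the misspelled path could be caught before a version is created. The sketch below is a hypothetical pre-check that relies on the fact that a saved MLflow model directory contains an `MLmodel` file. It only applies when the source is a path on the local file system, which is an assumption here. The helper name `looks_like_mlflow_model` is also made up for this example.

```python
import os


def looks_like_mlflow_model(path):
    """Return True if `path` is a local directory containing an `MLmodel` file."""
    return os.path.isfile(os.path.join(path, "MLmodel"))


# Usage before registering a version (not executed here):
#   if not looks_like_mlflow_model(current_staging.source):
#       raise ValueError(f"No MLmodel file under {current_staging.source}")
```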

Let's check that the model now runs correctly on the test server:

In [71]:
os.system('MLFLOW_TRACKING_URI=http://0.0.0.0:5000 mlflow models serve -m "models:/sk-learn-model-ci/Staging" -p 5005 --no-conda &')

0

In [73]:
import requests

url = f'http://127.0.0.1:5000'

http_data = test_later_x[:10].to_json(orient='split')
response = requests.post(url=url, headers={'Content-Type': 'application/json'}, data=http_data)

print(f'Predictions: {response.text}')
print(test_later_y)

Predictions: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>405 Method Not Allowed</title>
<h1>Method Not Allowed</h1>
<p>The method is not allowed for the requested URL.</p>

     variety
38       1.0
66       0.0
115      0.0
117      0.0
89       0.0
136      0.0
68       0.0
94       0.0
84       0.0
28       1.0
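The `405 Method Not Allowed` response above is consistent with the request having been sent to port 5000, where the tracking server listens, rather than to the model server that was started on port 5005. A minimal sketch of building the request for the model server follows, assuming it accepts POST requests at `/invocations`. The helper names `scoring_url` and `build_request` are hypothetical, and the JSON layout the server expects depends on the MLflow version, so the payload format should be treated as an assumption.

```python
import urllib.request


def scoring_url(host="127.0.0.1", port=5005):
    """Build the URL of the scoring endpoint for a served model."""
    return f"http://{host}:{port}/invocations"


def build_request(json_body, host="127.0.0.1", port=5005):
    """Prepare (but do not send) a POST request with a JSON payload."""
    return urllib.request.Request(
        scoring_url(host, port),
        data=json_body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Usage (requires the model server to be running; not executed here):
#   req = build_request(test_later_x.to_json(orient="split"))
#   with urllib.request.urlopen(req) as resp:
#       print(resp.read().decode())
```

Keeping the port in one helper makes it harder to mix up the tracking server and the model server addresses.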
