## MLflow architecture

### MLflow Components and Their Tasks

 - MLflow Projects - creating an environment for experiments and grouping experiments.
 - MLflow Tracking - recording the parameters and quality metrics of experiments.
 - MLflow Models - packaging a version of the model for distribution.
 - MLflow Registry - a centralized model repository and the rollout of models to production.

#### MLflow Projects

- Setting up the environment:
  - programming languages
  - package manager (e.g. conda)
  - dependencies (libraries such as xgboost, scikit-learn, ...)

- Description of the environment (infrastructure as code):
  - various operating systems
  - local environment
  - cloud services

#### MLflow Tracking

Records everything related to a model run:
 - Datasets (for training and testing)
 - Sets of parameters (e.g. number of trees, layers, L1 / L2)
 - Values of quality metrics
 - Speed of work and other technical metrics.

#### MLflow Models and Client

- Serializes model artifacts, with additional development where needed
- Tracks which model is used in which environment

## 1. Working with a project and running an experiment

A project is a folder containing the files that belong to it:
  - An `MLproject` metadata file (YAML format) and the files it includes (for example, `conda.yaml`)
  - Files with the code for running experiments (for example, in Python)

To create a project, it is enough to write a correct `MLproject` file.

### Example of creating a project

The `MLproject` file specifies that the project runs in a `conda` environment and uses `scikit-learn` as the ML library (declared in `conda.yaml`). Its `main` entry point runs `app.py` and passes it four flower measurements as parameters, each with a default value. Model training and the data preparation it requires are shown in the cells that follow.

In [30]:
%%writefile MLproject
name: Final_Project

conda_env: conda.yaml

entry_points:
      main:
        parameters:
            SepalLengthCm: {type: float, default: 4.6}
            SepalWidthCm: {type: float, default: 3.6}
            PetalLengthCm: {type: float, default: 1.0}
            PetalWidthCm: {type: float, default: 0.2}
        command: "python app.py {SepalLengthCm} {SepalWidthCm} {PetalLengthCm} {PetalWidthCm}"

Overwriting MLproject
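The `command` line in the `MLproject` file is a template: each `{name}` placeholder is replaced by the value of the parameter with that name, and the declared default is used when no value is supplied (with the MLflow CLI, values are supplied as `-P name=value`). The following is a minimal sketch of that substitution using plain `str.format`. It is an illustration only, not MLflow's actual implementation, and `build_command` is a hypothetical helper.

```python
# Simplified illustration of entry-point parameter substitution.
# NOT MLflow's implementation; build_command is a hypothetical helper.

# Parameter declarations copied from the MLproject file.
PARAMETERS = {
    "SepalLengthCm": {"type": "float", "default": 4.6},
    "SepalWidthCm": {"type": "float", "default": 3.6},
    "PetalLengthCm": {"type": "float", "default": 1.0},
    "PetalWidthCm": {"type": "float", "default": 0.2},
}
COMMAND = "python app.py {SepalLengthCm} {SepalWidthCm} {PetalLengthCm} {PetalWidthCm}"


def build_command(overrides=None):
    """Merge user-supplied values over the defaults and fill in the template."""
    overrides = overrides or {}
    unknown = set(overrides) - set(PARAMETERS)
    if unknown:
        raise ValueError(f"Unknown parameters: {sorted(unknown)}")
    values = {}
    for name, spec in PARAMETERS.items():
        raw = overrides.get(name, spec["default"])
        values[name] = float(raw)  # every parameter here is declared as float
    return COMMAND.format(**values)


print(build_command())                        # all defaults
print(build_command({"PetalLengthCm": 5.1}))  # one value overridden
```

Rejecting unknown names and coercing values to the declared type early means a typo in a parameter name surfaces before the training script starts.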


In [31]:
%%writefile conda.yaml
name: Final_Project
channels:
  - defaults
dependencies:
  - numpy>=1.14.3
  - pandas>=1.0.0
  - scikit-learn=0.19.1
  - pip
  - pip:
    - mlflow

Overwriting conda.yaml


#### Preparing data for the experiment

The following code imports the required modules, loads the data from the `Dataset/iris.csv` file, encodes the `variety` label, and splits the data into training and test sets.

In [32]:
import warnings

import numpy as np
import pandas as pd
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

import mlflow
import mlflow.sklearn

# Address of the running MLflow tracking server
MLFLOW_SERVER_URL = 'http://127.0.0.1:5000'

warnings.filterwarnings("ignore")
np.random.seed(40)
data = pd.read_csv("Dataset/iris.csv")

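# One-hot encode the class label
encoder = OneHotEncoder(handle_unknown='ignore')
y = np.array(data["variety"]).reshape(-1, 1)
encoder.fit(y)
# The encoder returns one column per class, but a single DataFrame column can
# hold only one of them. Assumption: keep the first column (1.0 for the first
# class, 0.0 otherwise) as the numeric target for the regression below.
data["variety"] = encoder.transform(y).toarray()[:, 0]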

train, test = train_test_split(data)

train_x = train.drop(["variety"], axis=1)
test_x = test.drop(["variety"], axis=1)
train_y = train[["variety"]]
test_y = test[["variety"]]
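One-hot encoding turns a single label column into one indicator column per class, so three iris varieties produce three columns. The sketch below reproduces that idea in plain Python so the shape of the result is easy to see. The `one_hot` helper and the sample labels are illustrative assumptions, not part of the notebook.

```python
# Plain-Python illustration of one-hot encoding (hypothetical helper,
# not scikit-learn's OneHotEncoder).

def one_hot(labels):
    """Return the sorted categories and one indicator row per label."""
    categories = sorted(set(labels))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for label in labels:
        row = [0.0] * len(categories)
        row[index[label]] = 1.0
        rows.append(row)
    return categories, rows


# Illustrative labels only; the real values come from the "variety" column.
labels = ["Setosa", "Versicolor", "Virginica", "Setosa"]
categories, encoded = one_hot(labels)

print(categories)
for label, row in zip(labels, encoded):
    print(label, row)

# A regressor with a single target needs one column, for example the
# indicator of the first category.
first_class_target = [row[0] for row in encoded]
print(first_class_target)
```

Because each row has as many entries as there are classes, the encoded result cannot be stored in one column without first choosing which indicator to keep.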

### Create and run an experiment

The experiment code itself does not depend on MLflow, so existing code can be reused as is.

To record the run parameters and model metrics, training must be started within an experiment and a project.

`MLFLOW_SERVER_URL` is the address of the running `mlflow` server that stores the experiments. The web interface for viewing run results is available at the same address.

#### Running the experiment

To run the experiment, execute the model creation code inside an MLflow run context and log the parameters and resulting metrics in that context.
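The run context pattern can be sketched without MLflow: a context manager opens a run, the body records values into it, and the run is closed with a status that depends on whether the body raised an exception. The `start_run` function and the `RUNS` list below are hypothetical stand-ins that mimic the pattern, not MLflow's API or storage.

```python
# Toy illustration of a "run context" (hypothetical, not MLflow's API).
import time
from contextlib import contextmanager

RUNS = []  # stands in for the tracking server's storage


@contextmanager
def start_run():
    """Open a run, hand it to the body, and record how the body ended."""
    run = {"params": {}, "metrics": {}, "status": "RUNNING",
           "start_time": time.time()}
    try:
        yield run
        run["status"] = "FINISHED"
    except Exception:
        run["status"] = "FAILED"  # the body raised before completing
        raise
    finally:
        run["end_time"] = time.time()
        RUNS.append(run)


with start_run() as run:
    run["params"]["alpha"] = 0.5
    run["metrics"]["rmse"] = 0.25

print(RUNS[-1]["status"], RUNS[-1]["params"], RUNS[-1]["metrics"])
```

Keeping the logging calls inside the `with` block ties every parameter and metric to exactly one run, and an exception inside the block leaves a run marked as failed rather than silently incomplete.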

In [33]:
# connect to the server
import os
os.environ['MLFLOW_TRACKING_USERNAME'] = 'name'
os.environ['MLFLOW_TRACKING_PASSWORD'] = 'pass'

mlflow.set_tracking_uri(MLFLOW_SERVER_URL)

experiment_name = 'Final_Project'
mlflow.set_experiment(experiment_name)

# run the experiment
with mlflow.start_run():
    alpha = 0.5
    l1_ratio = 0.5

    # model
    lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
    lr.fit(train_x, train_y)

    # metrics
    predicted_qualities = lr.predict(test_x)
    rmse = np.sqrt(mean_squared_error(test_y, predicted_qualities))
    mae = mean_absolute_error(test_y, predicted_qualities)
    r2 = r2_score(test_y, predicted_qualities)

    print("Elasticnet model (alpha=%f, l1_ratio=%f):" % (alpha, l1_ratio))
    print("  RMSE: %s" % rmse)
    print("  MAE: %s" % mae)
    print("  R2: %s" % r2)

    # save the metric values for the experiment
    mlflow.log_param("alpha", alpha)
    mlflow.log_param("l1_ratio", l1_ratio)
    mlflow.log_metric("rmse", rmse)
    mlflow.log_metric("r2", r2)
    mlflow.log_metric("mae", mae)

    mlflow.sklearn.log_model(lr, "model")

Elasticnet model (alpha=0.500000, l1_ratio=0.500000):
  RMSE: 0.25753089436149107
  MAE: 0.23424782687663875
  R2: 0.7149726152407471
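For reference, the three logged metrics can be computed by hand. The sketch below implements the standard definitions of RMSE, MAE, and R2 in plain Python; `regression_metrics` is a hypothetical helper and the sample values are made up for illustration, not taken from the run above.

```python
# Standard regression metrics in plain Python (illustrative helper).
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE and R2 for two equally long sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)               # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }


# Made-up values for illustration only.
metrics = regression_metrics([1.0, 0.0, 0.0, 1.0], [0.8, 0.1, 0.3, 0.6])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

RMSE and MAE are in the units of the target, while R2 compares the model's squared error with that of always predicting the mean.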


## 2. Preparing the model for distribution

A successful experiment run can be prepared for deployment.

### Review the conducted experiments and select the candidate for deployment

To get information about completed runs, create an `mlflow.tracking.MlflowClient`, then select the experiment you want and the desired run of that experiment.

The code below takes the last entry from the list of all runs of the experiment. `list_run_infos` is available in MLflow 1.x; it was removed in MLflow 2.0, where `search_runs` serves the same purpose.

In [34]:
client = mlflow.tracking.MlflowClient(MLFLOW_SERVER_URL)
experiment = client.get_experiment_by_name(experiment_name)
run_info = client.list_run_infos(experiment.experiment_id)[-1]

print(experiment)
print(run_info)

<Experiment: artifact_location='./mlruns/3', experiment_id='3', lifecycle_stage='active', name='Final_Project', tags={}>
<RunInfo: artifact_uri='./mlruns/3/b5735d1ced244550a62255ff03b52961/artifacts', end_time=1652018151889, experiment_id='3', lifecycle_stage='active', run_id='b5735d1ced244550a62255ff03b52961', run_uuid='b5735d1ced244550a62255ff03b52961', start_time=1652018151852, status='FAILED', user_id='kanishkkumar'>
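Picking a run by list position depends on the order in which the runs are returned, and the selected run may have a status other than finished, as in the output above. A more explicit approach is to choose by `start_time` and, if needed, filter on `status`. The sketch below shows that selection on plain records; `RunRecord`, `latest_run`, and the sample values are hypothetical stand-ins for the client's run information.

```python
# Selecting a deployment candidate explicitly (hypothetical stand-ins,
# not MLflow classes).
from dataclasses import dataclass


@dataclass
class RunRecord:
    run_id: str
    start_time: int  # milliseconds since the epoch
    status: str


def latest_run(runs, status=None):
    """Return the most recently started run, optionally with a given status."""
    candidates = [r for r in runs if status is None or r.status == status]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r.start_time)


# Made-up records for illustration only.
runs = [
    RunRecord("run-a", 1000, "FINISHED"),
    RunRecord("run-c", 3000, "FAILED"),
    RunRecord("run-b", 2000, "FINISHED"),
]

print(latest_run(runs))                     # newest run, regardless of status
print(latest_run(runs, status="FINISHED"))  # newest run that completed
```

Filtering on status before choosing avoids promoting a run whose training code did not complete.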


###  A Model in the MLflow Model Client

The Model Client is also available in the web interface: select a model on the experiments page.

The code below performs similar actions through `MlflowClient`: it creates an experiment, tags it, and reads back its metadata.

In [9]:
from mlflow.tracking import MlflowClient

# Create an experiment with a name that is unique and case sensitive.
client = MlflowClient()
experiment_id = client.create_experiment("Social NLP Experiments")
client.set_experiment_tag(experiment_id, "nlp.framework", "Spark NLP")

# Fetch experiment metadata information
experiment = client.get_experiment(experiment_id)
print("Name: {}".format(experiment.name))
print("Experiment_id: {}".format(experiment.experiment_id))
print("Artifact Location: {}".format(experiment.artifact_location))
print("Tags: {}".format(experiment.tags))
print("Lifecycle_stage: {}".format(experiment.lifecycle_stage))

Name: Social NLP Experiments
Experiment_id: 1
Artifact Location: ./mlruns/1
Tags: {'nlp.framework': 'Spark NLP'}
Lifecycle_stage: active


## 3. Deployment of the model and testing the server

To use the model in a specific environment, you need to transfer the client model to that environment. This operation only places the client model in the desired environment; it does not start the web server.

### Deployment of the model in test and production environments

Terminology used in MLflow Model Client:

  - Experiment - test experiment
  - Production - operational experiment
  - Location - experiment location

In [37]:
from mlflow.tracking import MlflowClient

# Create an experiment with a name that is unique and case sensitive.
client = MlflowClient()
experiment_id = client.create_experiment("SocialMediaTextAnalyzer")
client.set_experiment_tag(experiment_id, "nlp.framework", "Spark NLP")

# Fetch experiment metadata information
experiment = client.get_experiment(experiment_id)
print("Name: {}".format(experiment.name))
print("Experiment_id: {}".format(experiment.experiment_id))
print("Artifact Location: {}".format(experiment.artifact_location))
print("Tags: {}".format(experiment.tags))
print("Lifecycle_stage: {}".format(experiment.lifecycle_stage))

Name: SocialMediaTextAnalyzer
Experiment_id: 5
Artifact Location: ./mlruns/5
Tags: {'nlp.framework': 'Spark NLP'}
Lifecycle_stage: active


In [48]:
import mlflow
 
# The artifact path that was passed to mlflow.sklearn.log_model above
artifact_path = "model"
# run_info is the run selected earlier with MlflowClient
model_uri = "runs:/{run_id}/{artifact_path}".format(run_id=run_info.run_id, artifact_path=artifact_path)

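To use the model, you need to start a web server and pass it the model name and stage, as well as the port, as parameters. In the command below the MLflow model server listens on port 5005; the test request further down is sent to port 8080, and the response shows that a Flask application answers there.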

In [62]:
os.system('MLFLOW_TRACKING_URI=http://127.0.0.1:5000/ mlflow models serve -m "models:/sk-learn-new-model/Staging" -p 5005 --no-conda &')

0
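A server started with `mlflow models serve` scores requests sent to its `/invocations` endpoint rather than to the root URL. The sketch below only builds a request body and does not send it. It assumes an MLflow 2.x scoring server, which accepts a JSON object with a `dataframe_split` key; older releases expected a different layout, so check the version in use. `build_split_payload`, the column names (taken from the `MLproject` parameters), and the sample row are assumptions for illustration.

```python
# Building a scoring request body (sketch; assumes the MLflow 2.x
# "dataframe_split" input format, and does not send any request).
import json

# Port taken from the serve command; /invocations is the scoring endpoint.
SCORING_URL = "http://127.0.0.1:5005/invocations"


def build_split_payload(columns, rows):
    """Serialize rows in split orientation under the dataframe_split key."""
    columns = list(columns)
    for row in rows:
        if len(row) != len(columns):
            raise ValueError("each row needs one value per column")
    body = {"dataframe_split": {"columns": columns,
                                "data": [list(row) for row in rows]}}
    return json.dumps(body)


# Hypothetical column names and values, for illustration only.
columns = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]
payload = build_split_payload(columns, [[4.6, 3.6, 1.0, 0.2]])

print(SCORING_URL)
print(payload)
```

The resulting string can be posted with `requests.post(SCORING_URL, headers={'Content-Type': 'application/json'}, data=payload)` once the server is up.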

In [67]:
import requests

url = f'http://127.0.0.1:8080'

http_data = test_x[:10].to_json(orient='split')
response = requests.post(url=url, headers={'Content-Type': 'application/json'}, data=http_data)

print(f'Predictions: {response.text}')

Predictions: <!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Prediction</title>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
    <link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.3.1/css/bootstrap.min.css" integrity="sha384-ggOyR0iXCbMQv3Xipma34MD+dH/1fQ784/j6cY/iJTQUOhcWr7x9JvoRxT2MZw1T" crossorigin="anonymous">
   
</head>
<body>
<div class="container">
    <div class="row">
        <div class="span4"></div>
        <div class="span4">

<h1>FLASK APP RUNNING</h1>
<h2>Please enter your flower measurements below:</h2>
<form method="POST">
    
    <input id="csrf_token" name="csrf_token" type="hidden" value="ImU0ZjE0YjY4NTNjZWI2MDdjZmJhNjFlNDliZmE4ZjAwODdjOGY1YmYi.Ynfryg.dIV-J3kedB-kYBAp1Jz7bV77MTU">
    <table class="table">
        <tr>
        <th>Attribute</th>
        <th>Value</th>
    </tr>
        <tr>
            <td> <label for="SepalLengthCm">Sepal

In [68]:
## HTML Source Code
