# MLOps using MLflow
MLflow is an MLOps tool that enables data scientists to productionize their machine learning projects quickly. To achieve this, MLflow has four major components: Tracking, Projects, Models, and Model Registry. MLflow lets you train, reuse, and deploy models with any library and package them into reproducible steps. It is designed to work with any machine learning library and requires minimal changes to integrate into an existing codebase. In this session, we will cover common pain points of machine learning developers, such as experiment tracking, reproducibility, deployment tooling, and model versioning. Get ready to get your hands dirty: we will build a quick ML project with MLflow and release it to production to understand the MLOps lifecycle.

##### DEMO details

###### In this demo notebook, we take a look at:
 * How to set up an ElasticNet model with scikit-learn.
 * How to create an MLflow experiment.
 * How to track model parameters and metrics with MLflow.
 * How to deploy a model to different environments using the MLflow Model Registry.

<img src='/files/MLFlow_Img.png' height="200" style="margin: 20px">

### Training a model and adding it to the MLflow Model Registry

In [0]:
# Notebook parameters: registered-model name, whether to trigger the pipeline,
# and the Model Registry stage to promote the model to
dbutils.widgets.text(name = "model_name", defaultValue = "mlops-demo-wine-model", label = "Model Name")
dbutils.widgets.combobox(name = "trigger_pipeline", defaultValue = "True", choices = ["True", "False"], label = "Trigger Pipeline")
dbutils.widgets.text(name = "stage", defaultValue = "staging", label = "Stage")

In [0]:
# Read the current widget values (widgets always return strings)
model_name = dbutils.widgets.get("model_name")
stage = dbutils.widgets.get("stage")
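
`dbutils` only exists inside Databricks, and widget values are always strings: `dbutils.widgets.get("trigger_pipeline")` yields the text `"True"` or `"False"`, and `bool("False")` is `True`. A minimal sketch of reading these parameters so the same code also runs outside Databricks; `get_param`, `parse_bool`, and the upper-cased environment-variable fallback are hypothetical helpers, not part of `dbutils`.

```python
import os


def get_param(name, default, widgets=None):
    """Read a notebook parameter.

    Hypothetical helper: uses Databricks widgets when an object such as
    dbutils.widgets is passed in, otherwise falls back to an environment
    variable named after the parameter (upper-cased), then to `default`.
    """
    if widgets is not None:
        try:
            return widgets.get(name)
        except Exception:
            pass  # widget not defined in this notebook; fall through
    return os.environ.get(name.upper(), default)


def parse_bool(value):
    """Widget values are strings, and bool("False") is True, so parse explicitly."""
    return str(value).strip().lower() in {"true", "1", "yes"}


# Outside Databricks there is no dbutils, so widgets=None here;
# inside a notebook you would pass widgets=dbutils.widgets.
params = {
    "model_name": get_param("model_name", "mlops-demo-wine-model"),
    "trigger_pipeline": parse_bool(get_param("trigger_pipeline", "True")),
    "stage": get_param("stage", "staging"),
}
print(params)
```

Parsing the flag in one place keeps the string-to-boolean pitfall out of the rest of the notebook.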

### Connect to an MLflow tracking server

MLflow can collect data about a model training session, such as validation accuracy. It can also save artifacts produced during the training session, such as a trained scikit-learn model.

By default, these data and artifacts are stored on the cluster's local filesystem. However, they can also be stored remotely using an [MLflow Tracking Server](https://mlflow.org/docs/latest/tracking.html).

In [0]:
import mlflow
mlflow.__version__

# Using the hosted MLflow tracking server

## Training a model

In [0]:
wine_data_path = "/dbfs/FileStore/tables/winequality_red.csv"

### In an MLflow run, train and save an ElasticNet model for rating wines

Using scikit-learn's ElasticNet regression model, we will train a model on the wine quality dataset. We will use the MLflow tracking server to save performance metrics, hyperparameters, and model artifacts for future reference. The tracking server persists these metrics and artifacts, allowing other users to view and download them. For more information about model tracking in MLflow, see the [MLflow tracking reference](https://www.mlflow.org/docs/latest/tracking.html).

Later, we will use the saved MLflow model artifacts to deploy the trained model to Azure ML for serving.

In [0]:
import os
import warnings
import sys

import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet

import mlflow
import mlflow.sklearn


def eval_metrics(actual, pred):
    rmse = np.sqrt(mean_squared_error(actual, pred))
    mae = mean_absolute_error(actual, pred)
    r2 = r2_score(actual, pred)
    return rmse, mae, r2


def train_model(wine_data_path, model_path, alpha, l1_ratio):
    warnings.filterwarnings("ignore")
    np.random.seed(40)

    # Read the wine-quality CSV file. sep=None lets pandas sniff the delimiter
    # (the file may be comma- or semicolon-separated); sniffing needs the python engine.
    data = pd.read_csv(wine_data_path, sep=None, engine="python")

    # Split the data into training and test sets. (0.75, 0.25) split.
    train, test = train_test_split(data)

    # The predicted column is "quality" which is a scalar from [3, 9]
    train_x = train.drop(["quality"], axis=1)
    test_x = test.drop(["quality"], axis=1)
    train_y = train[["quality"]]
    test_y = test[["quality"]]

    
    # Start a new MLflow training run 
    with mlflow.start_run():
        # Fit the Scikit-learn ElasticNet model
        lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
        lr.fit(train_x, train_y)

        predicted_qualities = lr.predict(test_x)

        # Evaluate the performance of the model using several accuracy metrics
        (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

        print("Elasticnet model (alpha=%f, l1_ratio=%f):" % (alpha, l1_ratio))
        print("  RMSE: %s" % rmse)
        print("  MAE: %s" % mae)
        print("  R2: %s" % r2)

        # Log model hyperparameters and performance metrics to the MLflow tracking server
        # (or to the local filesystem if no tracking server is configured)
        mlflow.log_param("alpha", alpha)
        mlflow.log_param("l1_ratio", l1_ratio)
        mlflow.log_metric("rmse", rmse)
        mlflow.log_metric("r2", r2)
        mlflow.log_metric("mae", mae)

        mlflow.sklearn.log_model(lr, model_path)
        
        # run_id is the current attribute name; run_uuid is deprecated
        return mlflow.active_run().info.run_id
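
`eval_metrics` delegates to scikit-learn. A NumPy sketch of the same three metrics, useful for sanity-checking logged values by hand; the function names and toy numbers are illustrative, not taken from the dataset. Inputs are flattened with `ravel()` because `test_y` is a one-column DataFrame while `predict` returns a 1-D array; without flattening, NumPy broadcasting would silently produce an n×n difference matrix.

```python
import numpy as np


def _as_1d(x):
    # Flatten (n, 1) DataFrames/arrays so they align element-wise with 1-D predictions
    return np.asarray(x, dtype=float).ravel()


def rmse(actual, pred):
    # Root mean squared error: sqrt(mean((y - y_hat)^2))
    a, p = _as_1d(actual), _as_1d(pred)
    return float(np.sqrt(np.mean((a - p) ** 2)))


def mae(actual, pred):
    # Mean absolute error: mean(|y - y_hat|)
    a, p = _as_1d(actual), _as_1d(pred)
    return float(np.mean(np.abs(a - p)))


def r2(actual, pred):
    # Coefficient of determination: 1 - SS_res / SS_tot
    a, p = _as_1d(actual), _as_1d(pred)
    ss_res = np.sum((a - p) ** 2)
    ss_tot = np.sum((a - a.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


y_true = [[3], [5], [7]]   # column-shaped, like test_y
y_pred = [2.5, 5.0, 8.0]   # 1-D, like lr.predict(test_x)
print(rmse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred))
# RMSE ≈ 0.6455, MAE = 0.5, R2 = 0.84375
```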

In [0]:
alpha_1 = 0.60
l1_ratio_1 = 0.7
model_path = 'model'
run_id1 = train_model(wine_data_path=wine_data_path, model_path=model_path, alpha=alpha_1, l1_ratio=l1_ratio_1)
# Build the URI from model_path so it always matches the artifact path used in log_model
model_uri = f"runs:/{run_id1}/{model_path}"

In [0]:
print(model_uri)

## Register the Model in the Model Registry

In [0]:
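import time

# Register the logged model under `model_name`. The first call creates the
# registered model; subsequent calls add a new version to it.
result = mlflow.register_model(
    model_uri,
    model_name
)
# Registration may still be completing on some backends; pause briefly before using it
time.sleep(10)
version = result.version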
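
A fixed `time.sleep(10)` is either longer than needed or too short on a slow backend. Polling the version status until it reads `READY` is more robust. The sketch takes a status-reading callable so it can be exercised without a registry; in the notebook that callable would wrap `client.get_model_version(model_name, version).status`. `wait_until_ready` is a hypothetical helper, and the status strings are assumed to match MLflow's model-version status names. Recent MLflow releases also let `mlflow.register_model` wait on its own through an `await_registration_for` argument; check the signature in your installed version.

```python
import time


def wait_until_ready(get_status, timeout_s=300.0, poll_s=2.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns "READY".

    Hypothetical helper. Raises RuntimeError on "FAILED_REGISTRATION" and
    TimeoutError if the version is still not ready after timeout_s seconds.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status == "READY":
            return status
        if status == "FAILED_REGISTRATION":
            raise RuntimeError("model version registration failed")
        if clock() >= deadline:
            raise TimeoutError(
                f"model version not ready after {timeout_s}s (last status: {status})")
        sleep(poll_s)


# Demo with a fake status sequence instead of a real registry; sleep is stubbed out
fake_statuses = iter(["PENDING_REGISTRATION", "PENDING_REGISTRATION", "READY"])
print(wait_until_ready(fake_statuses.__next__, sleep=lambda _: None))  # READY

# In the notebook (not executed here):
# client = mlflow.tracking.MlflowClient()
# wait_until_ready(lambda: client.get_model_version(model_name, version).status)
```

Injecting `sleep` and `clock` keeps the helper testable without real waiting.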

### Transitioning the model to 'Staging'

In [0]:
import mlflow
client = mlflow.tracking.MlflowClient()

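# Promote the new version to the stage selected in the "stage" widget (default: staging),
# so the lookup below queries the same stage
client.transition_model_version_stage(
    name=model_name,
    version=version,
    stage=stage)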

### Get the latest version of the model that was put into the current stage

In [0]:
import mlflow
import mlflow.sklearn

client = mlflow.tracking.MlflowClient()
latest_model = client.get_latest_versions(name = model_name, stages=[stage])
print(latest_model[0])
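
`get_latest_versions` returns a list with at most one entry per requested stage, and an empty list when no version is in that stage yet, in which case `latest_model[0]` raises `IndexError`. A small guard makes that failure explicit. It also compares version numbers numerically, which matters for longer lists such as the one returned by `client.search_model_versions(f"name='{model_name}'")`, since versions may come back as strings and `"10"` sorts before `"2"` as text. `newest_version` is a hypothetical helper, and the `SimpleNamespace` objects stand in for MLflow `ModelVersion` entities.

```python
from types import SimpleNamespace


def newest_version(versions):
    """Return the entry with the highest numeric version.

    Hypothetical helper; raises LookupError instead of IndexError when the
    list is empty (e.g. nothing has been moved into the requested stage).
    """
    if not versions:
        raise LookupError("no model version found for the requested stage")
    # int() so that version "10" ranks above version "2"
    return max(versions, key=lambda mv: int(mv.version))


# Stand-ins for ModelVersion objects returned by the MLflow client
demo_versions = [SimpleNamespace(version="2"), SimpleNamespace(version="10")]
print(newest_version(demo_versions).version)  # 10
```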

In [0]:
model_uri="runs:/{}/model".format(latest_model[0].run_id)
latest_sk_model = mlflow.sklearn.load_model(model_uri)
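
The cell above looks up the run ID and loads the model through a `runs:/` URI. MLflow also accepts registry URIs of the form `models:/<name>/<version>` and `models:/<name>/<stage>`, and newer releases, which favor aliases over stages, accept `models:/<name>@<alias>`. Loading by stage or alias lets a consumer pick up whichever version is currently promoted without knowing its run ID. A minimal sketch of building these strings; `runs_uri` and `registry_uri` are hypothetical helpers, and the `@alias` form assumes an MLflow version with alias support.

```python
def runs_uri(run_id, artifact_path="model"):
    # An artifact logged under a specific run, e.g. by mlflow.sklearn.log_model
    return f"runs:/{run_id}/{artifact_path}"


def registry_uri(name, *, version=None, stage=None, alias=None):
    # A registered model, selected by exactly one of version, stage or alias
    selectors = [s for s in (version, stage, alias) if s is not None]
    if len(selectors) != 1:
        raise ValueError("pass exactly one of version, stage or alias")
    if alias is not None:
        return f"models:/{name}@{alias}"
    return f"models:/{name}/{selectors[0]}"


print(runs_uri("0123abcd"))                                    # runs:/0123abcd/model
print(registry_uri("mlops-demo-wine-model", stage="Staging"))  # models:/mlops-demo-wine-model/Staging
print(registry_uri("mlops-demo-wine-model", version=3))        # models:/mlops-demo-wine-model/3

# In the notebook (not executed here):
# latest_sk_model = mlflow.sklearn.load_model(registry_uri(model_name, stage=stage))
```

Loading by stage ties the consumer to the registry rather than to a specific training run, so promoting a new version changes what gets loaded without any code change.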