
### Author: Satwik Ram K

### Setup

In [None]:
!pip install mlflow

### Importing Dependencies

In [None]:
import os
import warnings
import sys

import pandas as pd
import numpy as np

from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet

import mlflow
import mlflow.sklearn

### Loading the dataset

In [None]:
dpath = "/content/sample_data/california_housing_train.csv"
df = pd.read_csv(dpath)
df.head()

|   | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value |
|---|---|---|---|---|---|---|---|---|---|
| 0 | -114.31 | 34.19 | 15.0 | 5612.0 | 1283.0 | 1015.0 | 472.0 | 1.4936 | 66900.0 |
| 1 | -114.47 | 34.4 | 19.0 | 7650.0 | 1901.0 | 1129.0 | 463.0 | 1.82 | 80100.0 |
| 2 | -114.56 | 33.69 | 17.0 | 720.0 | 174.0 | 333.0 | 117.0 | 1.6509 | 85700.0 |
| 3 | -114.57 | 33.64 | 14.0 | 1501.0 | 337.0 | 515.0 | 226.0 | 3.1917 | 73400.0 |
| 4 | -114.57 | 33.57 | 20.0 | 1454.0 | 326.0 | 624.0 | 262.0 | 1.925 | 65500.0 |


### Starting the MLflow tracking server

In [None]:
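# Start the server in a terminal; it keeps running until stopped, so launching it from a cell would block the notebook:
# mlflow server --backend-store-uri mlruns/ --default-artifact-root mlruns/ --host localhost --port 5000

remote_server_uri = "http://localhost:5000"  # set to your server URI
mlflow.set_tracking_uri(remote_server_uri)  # alternatively, set the MLFLOW_TRACKING_URI environment variable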

In [None]:
exp_name = "Housing"
mlflow.set_experiment(exp_name)

What does MLflow record for each run?

- Code Version: the Git commit hash used for the run (if it was run from an MLflow Project).
- Start & End Time: the start and end time of the run.
- Source: the name of the file that launched the run, or the project name and entry point if the run was launched from an MLflow Project.
- Parameters: key-value input parameters.
- Metrics: key-value metrics with numeric values; a metric can be updated throughout the run.
- Artifacts: output files in any format.

### Load Data Pipeline

In [None]:
def load_data(train_path, test_path):
  
    train = pd.read_csv(train_path)
    test = pd.read_csv(test_path)

    X_train = train.drop(["median_house_value"], axis=1)
    X_test = test.drop(["median_house_value"], axis=1)
    y_train = train[["median_house_value"]]
    y_test = test[["median_house_value"]]
    
    return X_train, y_train, X_test, y_test
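load_data relies on the train/test CSVs that Colab ships already split, so train_test_split, imported above, is never used. As a sketch of how it could produce the same four objects from a single CSV, the hypothetical helper split_data below splits one DataFrame; the tiny synthetic frame and the 80/20 split are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def split_data(df, target="median_house_value", test_size=0.2, seed=42):
    """Hypothetical helper: split one DataFrame into train/test features and target."""
    X = df.drop(columns=[target])
    y = df[[target]]  # one-column DataFrame, as in load_data
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    return X_train, y_train, X_test, y_test  # same order as load_data


# Synthetic stand-in for the housing CSV (illustrative values only)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "median_income": rng.uniform(1, 10, 10),
    "housing_median_age": rng.integers(1, 50, 10),
    "median_house_value": rng.uniform(5e4, 5e5, 10),
})

X_train, y_train, X_test, y_test = split_data(demo)
print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)
```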

### Evaluation Metrics

In [None]:
def eval_metrics(actual, pred):
  
    # compute relevant metrics
    rmse = np.sqrt(mean_squared_error(actual, pred))
    mae = mean_absolute_error(actual, pred)
    r2 = r2_score(actual, pred)
    
    return rmse, mae, r2
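eval_metrics delegates to scikit-learn, but the three metrics are short formulas over the residuals: RMSE is the square root of the mean squared residual, MAE the mean absolute residual, and R² is one minus the ratio of residual to total sum of squares. A small NumPy check on four made-up values:

```python
import numpy as np

# Made-up targets and predictions, small enough to verify by hand
actual = np.array([3.0, -0.5, 2.0, 7.0])
pred = np.array([2.5, 0.0, 2.0, 8.0])

residuals = actual - pred                        # [0.5, -0.5, 0.0, -1.0]
rmse = np.sqrt(np.mean(residuals ** 2))          # sqrt(1.5 / 4), about 0.612
mae = np.mean(np.abs(residuals))                 # 2.0 / 4 = 0.5
ss_res = np.sum(residuals ** 2)                  # residual sum of squares: 1.5
ss_tot = np.sum((actual - actual.mean()) ** 2)   # total sum of squares: 29.1875
r2 = 1 - ss_res / ss_tot                         # about 0.9486

print(rmse, mae, r2)
```

RMSE and MAE are in the target's units (dollars for median_house_value), while R² is unitless, which is why all three are logged.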

### Training the model

In [None]:
def train(alpha=0.5, l1_ratio=0.5):

    # train a model with given parameters
    warnings.filterwarnings("ignore")
    np.random.seed(40)

    train_path = "/content/sample_data/california_housing_train.csv"
    test_path = "/content/sample_data/california_housing_test.csv"

    X_train, y_train, X_test, y_test = load_data(train_path, test_path)

    with mlflow.start_run():
        # Execute ElasticNet
        lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
        lr.fit(X_train, y_train)

        # Evaluate Metrics
        predictions = lr.predict(X_test)
        (rmse, mae, r2) = eval_metrics(y_test, predictions)

        # Print out metrics
        print("ElasticNet model (alpha=%f, l1_ratio=%f):" % (alpha, l1_ratio))
        print("  RMSE: %s" % rmse)
        print("  MAE: %s" % mae)
        print("  R2: %s" % r2)

        # Log parameters, metrics, the training data and the model to MLflow
        mlflow.log_param(key="alpha", value=alpha)
        mlflow.log_param(key="l1_ratio", value=l1_ratio)
        mlflow.log_metric(key="rmse", value=rmse)
        mlflow.log_metrics({"mae": mae, "r2": r2})
        mlflow.log_artifact(train_path)  # copies the training CSV into the run's artifacts
        print("Saved to: {}".format(mlflow.get_artifact_uri()))
        
        mlflow.sklearn.log_model(lr, "model")

In [None]:
train(0.5, 0.5)

### Comparing runs

Run mlflow ui in a terminal, or open http://your-tracking-server-host:5000 if you started the tracking server, to view the experiment log and to visualize and compare runs and experiments. The logs and model artifacts are saved in the mlruns directory (or wherever --backend-store-uri and --default-artifact-root point).

### Packaging the experiment as an MLflow Project with a conda environment

Specify the entry point for this project by creating an MLproject file, and describe its conda environment in a conda.yaml file. You can copy the conda.yaml that MLflow saved alongside the logged model in the run's artifacts.

To run this project, invoke mlflow run . -P alpha=0.42 from the project directory. MLflow then runs your training code in a new conda environment with the dependencies specified in conda.yaml.
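A sketch of the two project files, written from Python so it can run in a notebook. Several details are assumptions rather than the author's setup: the project name, the dependency list, and a hypothetical train.py that is the notebook's train() exported as a script reading alpha and l1_ratio from the command line (which is what the unused sys import suggests). The hypothetical parse_hyperparameters shows that argument handling.

```python
import tempfile
from pathlib import Path

# Hypothetical project directory; in practice, the folder passed to `mlflow run`.
project_dir = Path(tempfile.mkdtemp())

# Entry point "main" forwards both hyperparameters to the (assumed) train.py.
MLPROJECT = """\
name: housing_elasticnet

conda_env: conda.yaml

entry_points:
  main:
    parameters:
      alpha: {type: float, default: 0.5}
      l1_ratio: {type: float, default: 0.5}
    command: "python train.py {alpha} {l1_ratio}"
"""

# Assumed dependencies; the conda.yaml logged with the model pins exact versions.
CONDA_YAML = """\
name: housing-env
channels:
  - conda-forge
dependencies:
  - python=3.10
  - pip
  - pip:
      - mlflow
      - scikit-learn
      - pandas
"""

(project_dir / "MLproject").write_text(MLPROJECT)
(project_dir / "conda.yaml").write_text(CONDA_YAML)
print(sorted(p.name for p in project_dir.iterdir()))  # ['MLproject', 'conda.yaml']


def parse_hyperparameters(argv):
    """Hypothetical helper for train.py: read alpha and l1_ratio, defaulting to 0.5."""
    alpha = float(argv[1]) if len(argv) > 1 else 0.5
    l1_ratio = float(argv[2]) if len(argv) > 2 else 0.5
    return alpha, l1_ratio


# In train.py this would be: train(*parse_hyperparameters(sys.argv))
```

With these files, mlflow run . -P alpha=0.42 would execute python train.py 0.42 0.5, since l1_ratio falls back to its declared default.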

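### Deploying the model

Deploy the model locally by running the following command, replacing the experiment and run IDs in the path with those of your own run:

mlflow models serve -m mlruns/0/f5f7c052ddc5469a852aa52c14cabdf1/artifacts/model/ -h 0.0.0.0 -p 1234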

Test the endpoint with one row of the housing features (the first row of the training data). MLflow 2.x scoring servers expect the dataframe_split JSON format shown here; MLflow 1.x instead used a "format=pandas-split" content type.

curl -X POST -H "Content-Type: application/json" --data '{"dataframe_split": {"columns": ["longitude", "latitude", "housing_median_age", "total_rooms", "total_bedrooms", "population", "households", "median_income"], "data": [[-114.31, 34.19, 15.0, 5612.0, 1283.0, 1015.0, 472.0, 1.4936]]}}' http://127.0.0.1:1234/invocations
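A minimal Python sketch of a request to the same endpoint, using only the standard library. It assumes an MLflow 2.x scoring server on 127.0.0.1:1234 (as started by mlflow models serve); build_payload and score are hypothetical helpers, and score is not called here because it needs the server to be running.

```python
import json
import urllib.request

# Feature columns the model was trained on (every column except median_house_value)
FEATURES = ["longitude", "latitude", "housing_median_age", "total_rooms",
            "total_bedrooms", "population", "households", "median_income"]


def build_payload(rows):
    """Wrap feature rows in the MLflow 2.x 'dataframe_split' request format."""
    return {"dataframe_split": {"columns": FEATURES, "data": rows}}


def score(payload, url="http://127.0.0.1:1234/invocations"):
    """POST the payload to a running scoring server (assumed URL) and decode the JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


payload = build_payload([[-114.31, 34.19, 15.0, 5612.0, 1283.0, 1015.0, 472.0, 1.4936]])
print(json.dumps(payload))
# With the server running: print(score(payload))
# In MLflow 2.x the reply is a JSON object with a "predictions" key.
```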

You can also build a Docker image from the model:

mlflow models build-docker -m mlruns/1/d671f37a9c7f478989e67eb4ff4d1dac/artifacts/model/ -n elastic_net_housing

and run the container, which serves the model on port 8080:

docker run -p 8080:8080 elastic_net_housing

You can also deploy the model directly to AWS SageMaker or Azure Machine Learning.

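### Tagging Runs

Tags are key-value labels attached to a run. The cells below use MlflowClient to list the experiments, fetch a run by its ID, and tag it with the time it was deployed.
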
In [None]:
from datetime import datetime
from mlflow.tracking import MlflowClient

client = MlflowClient()
# list_experiments() was removed in MLflow 2.0; search_experiments() replaces it
experiments = client.search_experiments()  # returns a list of mlflow.entities.Experiment
print(experiments)

In [None]:
# get the run (replace the ID with one from your own experiment)
_run = client.get_run(run_id="3627a8dd69d14bee919205e5e69c8bca")
print(_run)

In [None]:
# add a tag to the run
dt = datetime.now().strftime("%d-%m-%Y (%H:%M:%S.%f)")
client.set_tag(_run.info.run_id, "deployed", dt)