# MLflow Training Tutorial

This `train.ipynb` Jupyter notebook predicts the quality of red wine using [sklearn.linear_model.ElasticNet](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.ElasticNet.html).

> This is the Jupyter notebook version of the `train.py` example.
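For reference, the scikit-learn documentation gives the objective that `ElasticNet` minimizes. Here `alpha` is written as $\alpha$, `l1_ratio` as $\rho$, $n$ is the number of training samples and $w$ is the coefficient vector:

$$
\min_{w}\; \frac{1}{2n}\,\lVert y - Xw \rVert_2^2 \;+\; \alpha\rho\,\lVert w \rVert_1 \;+\; \frac{\alpha(1-\rho)}{2}\,\lVert w \rVert_2^2
$$

So `l1_ratio=1` is a pure L1 (lasso) penalty, a smaller `l1_ratio` shifts weight toward the L2 term, and a smaller `alpha` weakens both penalties.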

Attribution:
* The dataset used in this example is from http://archive.ics.uci.edu/ml/datasets/Wine+Quality
* P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.


```python
import logging
import warnings

import numpy as np
import pandas as pd
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
import mlflow
import mlflow.pyfunc
import mlflow.sklearn
from mlflow.models import infer_signature

# Silence library warnings and fix the global NumPy seed so that
# the train/test split below is reproducible
warnings.filterwarnings("ignore")
np.random.seed(40)

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger(__name__)
```

When prompted by `mlflow.login()`, enter your Databricks Community Edition (CE) credentials:

* Databricks Host: https://community.cloud.databricks.com/
* Username: your Databricks CE email address.
* Password: your Databricks CE password.

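```python
# Prompts for Databricks credentials if no valid ones are stored
mlflow.login()
# Experiment path in the Databricks workspace; replace it with your own user folder
mlflow.set_experiment("/Users/hericson@lia.ufc.br/Tutorial")
```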

If the Databricks service is temporarily unavailable, `mlflow.login()` fails with an `MlflowException` such as `Failed to validate databricks credentials: The service at /api/2.1/clusters/list-zones is temporarily unavailable. Please try again later.` In that case, rerun the cell later.

```python
def eval_metrics(actual, pred):
    """Return RMSE, MAE and R^2 for the given targets and predictions."""
    rmse = np.sqrt(mean_squared_error(actual, pred))
    mae = mean_absolute_error(actual, pred)
    r2 = r2_score(actual, pred)
    return rmse, mae, r2
```
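To make the three metrics concrete, here is a minimal NumPy-only sketch of the same quantities computed from their definitions. The helper name `eval_metrics_np` and the toy labels are hypothetical, used only for illustration; the `ravel()` calls are there because `test_y` in this notebook is a one-column DataFrame.

```python
import numpy as np

def eval_metrics_np(actual, pred):
    """Hypothetical helper: RMSE, MAE and R^2 from their textbook definitions."""
    # Flatten so a one-column DataFrame and a 1-D array line up element-wise
    actual = np.asarray(actual, dtype=float).ravel()
    pred = np.asarray(pred, dtype=float).ravel()
    residuals = actual - pred

    rmse = np.sqrt(np.mean(residuals ** 2))      # root of the mean squared error
    mae = np.mean(np.abs(residuals))             # mean absolute error
    ss_res = np.sum(residuals ** 2)              # residual sum of squares
    ss_tot = np.sum((actual - actual.mean()) ** 2)  # total sum of squares (assumes actual is not constant)
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2

# Toy example with hypothetical quality labels and predictions
rmse, mae, r2 = eval_metrics_np([5, 6, 7], [5, 7, 7])
print(f"RMSE={rmse:.4f} MAE={mae:.4f} R2={r2:.4f}")
```

With one prediction off by a single point out of three, the MSE is 1/3, so RMSE is about 0.577, MAE is 1/3 and R² is 0.5. An R² near 0, as in the first run below, means the model explains little of the variance beyond predicting the mean.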

```python
# Wine Quality Sample
def train(in_alpha, in_l1_ratio, train_x, test_x, train_y, test_y):

    # Fall back to 0.5 when no alpha is provided (test for None before converting)
    alpha = 0.5 if in_alpha is None else float(in_alpha)

    # Fall back to 0.5 when no l1_ratio is provided
    l1_ratio = 0.5 if in_l1_ratio is None else float(in_l1_ratio)

    # Each call to train() logs a separate MLflow run
    with mlflow.start_run():
        # Fit an ElasticNet model
        lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
        lr.fit(train_x, train_y)

        # Evaluate on the held-out test set
        predicted_qualities = lr.predict(test_x)
        (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

        # Print out metrics
        print(f"Elasticnet model (alpha={alpha:f}, l1_ratio={l1_ratio:f}):")
        print(f"  RMSE: {rmse}")
        print(f"  MAE: {mae}")
        print(f"  R2: {r2}")

        # Infer the model signature (input/output schema) from the training data
        predictions = lr.predict(train_x)
        signature = infer_signature(train_x, predictions)

        # Log parameters, metrics, and model to MLflow
        mlflow.log_param("alpha", alpha)
        mlflow.log_param("l1_ratio", l1_ratio)
        mlflow.log_metric("rmse", rmse)
        mlflow.log_metric("r2", r2)
        mlflow.log_metric("mae", mae)

        mlflow.sklearn.log_model(lr, "model", signature=signature)
```
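The defaulting logic in `train` has to test for `None` before converting. `float(None)` raises `TypeError`, so a check written as `float(x) is None` can never fall back to the default. A minimal sketch of the pattern, using a hypothetical helper `resolve_param` that is not part of the notebook or of MLflow:

```python
def resolve_param(value, default=0.5):
    """Hypothetical helper: return float(value), or `default` when value is None."""
    # Check for None first; converting None would raise TypeError
    return default if value is None else float(value)

# Show that converting None fails instead of producing None
try:
    float(None)
    float_none_ok = True
except TypeError:
    float_none_ok = False

print(resolve_param(None), resolve_param("0.2"), float_none_ok)
```

Accepting strings such as `"0.2"` mirrors the `float(...)` conversion in `train`, which is handy when hyperparameters arrive from command-line arguments as in `train.py`.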

```python
data = pd.read_csv("data/winequality-red.csv", sep=";")

# Split the data into training and test sets (default 75% / 25% split).
# Use names other than `train` so the train() function is not overwritten.
train_df, test_df = train_test_split(data)

# The predicted column is "quality", an integer sensory score (0-10 scale);
# all other columns are features
train_x = train_df.drop(["quality"], axis=1)
test_x = test_df.drop(["quality"], axis=1)
train_y = train_df[["quality"]]
test_y = test_df[["quality"]]
```
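With its default `test_size=0.25`, `train_test_split` shuffles the rows and holds out a quarter of them. The following minimal NumPy sketch shows the same idea with a hypothetical `split_indices` helper. It rounds the test size up, as scikit-learn does for a fractional `test_size`, but it uses a different random generator, so it will not select the same rows as the notebook.

```python
import math

import numpy as np

def split_indices(n_samples, test_size=0.25, seed=40):
    """Hypothetical helper: shuffle row indices and split them into train/test."""
    rng = np.random.default_rng(seed)
    n_test = math.ceil(test_size * n_samples)  # round the test share up
    perm = rng.permutation(n_samples)          # random order of 0..n_samples-1
    return perm[n_test:], perm[:n_test]        # (train indices, test indices)

# 10 hypothetical rows -> ceil(2.5) = 3 test rows, 7 training rows
train_idx, test_idx = split_indices(10)
print(len(train_idx), len(test_idx))
```

With a DataFrame, the index arrays would be applied as `data.iloc[train_idx]` and `data.iloc[test_idx]`.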

```python
train(0.5, 0.5, train_x, test_x, train_y, test_y)
```

```text
Elasticnet model (alpha=0.500000, l1_ratio=0.500000):
  RMSE: 0.793164022927685
  MAE: 0.6271946374319586
  R2: 0.10862644997792636
```



```text
2024/09/04 14:46:16 INFO mlflow.tracking._tracking_service.client: 🏃 View run omniscient-swan-287 at: https://community.cloud.databricks.com/ml/experiments/1795471961155523/runs/133839031151452cac7d6ca39371706a.
2024/09/04 14:46:16 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: https://community.cloud.databricks.com/ml/experiments/1795471961155523.
```


```python
train(0.2, 0.2, train_x, test_x, train_y, test_y)
```

```text
Elasticnet model (alpha=0.200000, l1_ratio=0.200000):
  RMSE: 0.7336400911821402
  MAE: 0.5643841279275427
  R2: 0.2373946606358417
```



```text
2024/09/04 14:46:30 INFO mlflow.tracking._tracking_service.client: 🏃 View run masked-stag-378 at: https://community.cloud.databricks.com/ml/experiments/1795471961155523/runs/f96c13029e7f40f6a7545aab57b3a017.
2024/09/04 14:46:30 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: https://community.cloud.databricks.com/ml/experiments/1795471961155523.
```


```python
train(0.1, 0.1, train_x, test_x, train_y, test_y)
```

```text
Elasticnet model (alpha=0.100000, l1_ratio=0.100000):
  RMSE: 0.7128829045893679
  MAE: 0.5462202174984664
  R2: 0.2799376066653345
```



```text
2024/09/04 14:46:45 INFO mlflow.tracking._tracking_service.client: 🏃 View run inquisitive-newt-151 at: https://community.cloud.databricks.com/ml/experiments/1795471961155523/runs/79b67fa7c56046d58e798aef8421aff8.
2024/09/04 14:46:45 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: https://community.cloud.databricks.com/ml/experiments/1795471961155523.
```
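The three runs can be compared by their test-set metrics. Below is a small pure-Python sketch that uses the values printed by the three `train` calls. Against the tracking server itself, recent MLflow versions should return the same information as a pandas DataFrame via `mlflow.search_runs(experiment_names=[...], order_by=["metrics.rmse ASC"])`. Treat that call as a pointer to check against your MLflow version, not something this notebook runs.

```python
# Test-set metrics copied from the three printed runs
runs = [
    {"alpha": 0.5, "l1_ratio": 0.5, "rmse": 0.793164022927685, "r2": 0.10862644997792636},
    {"alpha": 0.2, "l1_ratio": 0.2, "rmse": 0.7336400911821402, "r2": 0.2373946606358417},
    {"alpha": 0.1, "l1_ratio": 0.1, "rmse": 0.7128829045893679, "r2": 0.2799376066653345},
]

# Lower RMSE is better, so keep the run that minimizes it
best = min(runs, key=lambda run: run["rmse"])
print(f"Best: alpha={best['alpha']}, l1_ratio={best['l1_ratio']}, RMSE={best['rmse']:.3f}")
```

Among these three settings, the most weakly regularized one (`alpha=0.1`) has both the lowest RMSE and the highest R². Because `alpha` and `l1_ratio` change together here, the runs alone do not show which parameter is responsible.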


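### Logged Model

Load a logged model back as a generic Python function (pyfunc) model and evaluate it on the test set. The run ID below does not match any of the runs shown above; replace it with the ID of one of your own runs.
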
```python
# URI format: runs:/<run_id>/<artifact_path>; "model" is the path used in log_model
logged_model = 'runs:/edc207f740bb471195c595e6988f5d7f/model'

# Load model as a PyFuncModel.
loaded_model = mlflow.pyfunc.load_model(logged_model)
y_predict = loaded_model.predict(test_x)
(rmse, mae, r2) = eval_metrics(test_y, y_predict)
print(f'RMSE: {rmse:.3f} | MAE: {mae:.3f} | R2: {r2:.3f}')
```


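This cell uses `test_x` and `test_y` from the data-loading cell. If the kernel was restarted, rerun that cell first; otherwise this cell fails with `NameError: name 'test_x' is not defined`.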
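`mlflow.pyfunc.load_model` accepts a `runs:/<run_id>/<artifact_path>` URI, where `<artifact_path>` is the name passed to `log_model` (`"model"` in this notebook). The run ID can be copied from a run page linked in the log output, or read as `run.info.run_id` inside `with mlflow.start_run() as run:`. The two helpers below, `make_runs_uri` and `parse_runs_uri`, are hypothetical and not part of MLflow. They only illustrate how the URI is put together.

```python
def make_runs_uri(run_id, artifact_path="model"):
    """Hypothetical helper: build a runs:/ URI for a logged artifact."""
    return f"runs:/{run_id}/{artifact_path}"

def parse_runs_uri(uri):
    """Hypothetical helper: split 'runs:/<run_id>/<artifact_path>' into its parts."""
    prefix = "runs:/"
    if not uri.startswith(prefix):
        raise ValueError(f"not a runs:/ URI: {uri!r}")
    # Everything up to the first "/" is the run ID; the rest is the artifact path
    run_id, _, artifact_path = uri[len(prefix):].partition("/")
    return run_id, artifact_path

print(parse_runs_uri("runs:/edc207f740bb471195c595e6988f5d7f/model"))
```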