## Storage format

- Specifies how the model is packaged and saved
- Includes the model, its metadata, its hyperparameters, and the model's version
- Supports multiple storage formats: a directory of files, a single file, Python functions, or container images

## Model signature

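- Specifies the data types and shapes of the inputs the model expects and the outputs it returns
- Used by MLflow when the model is served through a REST API, for example to check incoming data against the expected schema
- Can be declared explicitly with `ModelSignature` and `Schema`, inferred from sample data with `infer_signature`, or expressed through Python function annotations
- Stored as part of the MLflow model, so other MLflow components can access it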
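
To make the idea of a signature concrete, here is a minimal plain-Python sketch of what checking an input against a declared schema means. It is not MLflow's implementation: `INPUT_SCHEMA` and `check_input` are hypothetical names, and the two column names are assumed to match the wine-quality data. MLflow's own `ModelSignature`, `Schema`, and `ColSpec` objects are richer than this.

```python
# Hypothetical, simplified stand-in for a model signature:
# column name -> expected Python type. Not MLflow's real data structure.
INPUT_SCHEMA = {"fixed acidity": float, "alcohol": float}


def check_input(row, schema):
    """Return a list of problems found in one input row (empty list = row conforms)."""
    problems = []
    for name, expected in schema.items():
        if name not in row:
            problems.append(f"missing column: {name}")
        elif not isinstance(row[name], expected):
            problems.append(
                f"column {name!r} expected {expected.__name__}, "
                f"got {type(row[name]).__name__}"
            )
    return problems


good_row = {"fixed acidity": 7.4, "alcohol": 9.4}
bad_row = {"fixed acidity": "7.4"}  # wrong type, and "alcohol" is missing

good_problems = check_input(good_row, INPUT_SCHEMA)
bad_problems = check_input(bad_row, INPUT_SCHEMA)
print(good_problems)
print(bad_problems)
```

A signature lets this kind of check happen before the model is called, so malformed requests fail early with a clear message.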

## Model API

- A REST API that provides a standardized interface for interacting with the model
- Supports both synchronous and asynchronous requests
- Can be used for real-time inference or batch processing
- Can be deployed to various environments: cloud platforms, edge devices, or on-premises servers
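
As a rough illustration of a request to such an API, the sketch below builds a JSON body in the `dataframe_split` layout. It assumes an MLflow 2.x scoring server (for example one started with `mlflow models serve`) that accepts this body on `POST /invocations` with `Content-Type: application/json`. The helper name `build_invocation_payload` and the column names are illustrative, and nothing is sent over the network here.

```python
import json


def build_invocation_payload(columns, rows):
    """Serialize tabular input in the 'dataframe_split' layout (columns + row data)."""
    return json.dumps({"dataframe_split": {"columns": columns, "data": rows}})


# Column names are assumptions for illustration; use the model's real input columns.
payload = build_invocation_payload(
    ["fixed acidity", "alcohol"],
    [[7.4, 9.4], [7.8, 9.8]],
)
print(payload)
```

The resulting string can be posted with any HTTP client; the server address and port depend on how the model was deployed.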

## Flavor

- Refers to a specific way of serializing and storing a machine learning model
- Each supported framework or library has an associated flavor in MLflow
- Community-driven flavors and custom flavors are also available

## Miscellaneous

- Provides functionality for evaluating the model using metrics such as accuracy, precision, recall, and F1 score
- Provides tools to deploy MLflow models to various platforms
- Supports custom deployment targets: specify the target along with the code it needs



---
## Model API

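- `save_model`: write a model to a local path
- `log_model`: log a model as an artifact of the current run
- `load_model`: load a model from a local path or a model URI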


In [9]:
import warnings
import argparse
import logging
import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet
import mlflow
import mlflow.sklearn
from pathlib import Path
import os
from mlflow.models.signature import ModelSignature, infer_signature
from mlflow.types.schema import Schema, ColSpec
import sklearn
import joblib
import cloudpickle
from mlflow.models import make_metric
import matplotlib.pyplot as plt

## logging recorder

In [10]:
# Configure process logging to a file
logging.basicConfig(level=logging.DEBUG,
                    filename='./logfile.log',
                    filemode='w',  # 'w' = write mode: an existing file is emptied first; use 'a' (append mode) to keep existing log entries
                    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')

logger = logging.getLogger(__name__)
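
The difference between the `'w'` and `'a'` file modes can be checked with a small standard-library sketch. It writes to a temporary file rather than `./logfile.log`, and the helper name `write_log` is made up for this illustration.

```python
import logging
import os
import tempfile


def write_log(path, mode, message):
    """Open the log file in the given mode, write one INFO record, then close it."""
    demo_logger = logging.getLogger(f"demo.{mode}.{message}")
    demo_logger.setLevel(logging.INFO)
    demo_logger.propagate = False  # keep the demo records out of the root logger
    handler = logging.FileHandler(path, mode=mode, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(levelname)s - %(message)s"))
    demo_logger.addHandler(handler)
    demo_logger.info(message)
    demo_logger.removeHandler(handler)
    handler.close()


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "logfile.log")

    write_log(log_path, "w", "first run")
    write_log(log_path, "w", "second run")  # 'w' truncates: "first run" is lost
    with open(log_path, encoding="utf-8") as fh:
        after_write = fh.read().splitlines()

    write_log(log_path, "a", "third run")  # 'a' appends to what is already there
    with open(log_path, encoding="utf-8") as fh:
        after_append = fh.read().splitlines()

print(after_write)
print(after_append)
```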



## evaluation metrics

In [11]:
# Evaluation function: returns RMSE, MAE and R2 for the given targets and predictions
def eval_metrics(actual, pred):
    rmse = np.sqrt(mean_squared_error(actual, pred))
    mae = mean_absolute_error(actual, pred)
    r2 = r2_score(actual, pred)
    return rmse, mae, r2
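
For reference, the three metrics can be worked out by hand on a tiny example. The sketch below re-implements them in plain Python (no scikit-learn) purely to show the arithmetic. `eval_metrics_plain` is an illustrative name and the numbers are made up.

```python
import math


def eval_metrics_plain(actual, pred):
    """Compute RMSE, MAE and R2 with plain Python, mirroring eval_metrics."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, pred)]
    ss_res = sum(e ** 2 for e in errors)            # residual sum of squares
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    r2 = 1 - ss_res / ss_tot
    return rmse, mae, r2


# Errors are -1, 0 and 2, so ss_res = 5; the targets have mean 5, so ss_tot = 8.
demo_rmse, demo_mae, demo_r2 = eval_metrics_plain([3, 5, 7], [2, 5, 9])
print(demo_rmse, demo_mae, demo_r2)
```

Here RMSE is sqrt(5/3), MAE is 1.0, and R2 is 1 - 5/8 = 0.375.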


## raw data ingestion

In [12]:
warnings.filterwarnings("ignore")
np.random.seed(40)

# Read the wine-quality CSV file from the local data/ folder
data = pd.read_csv("data/red-wine-quality.csv")
# os.mkdir("data/")
data.to_csv("data/red-wine-quality.csv", index=False)
# Split the data into training and test sets. (0.75, 0.25) split.
train, test = train_test_split(data)

data_dir = 'red-wine-data'

if not os.path.exists(data_dir):
    os.makedirs(data_dir)

data.to_csv(data_dir + '/data.csv')
train.to_csv(data_dir + '/train.csv')
test.to_csv(data_dir + '/test.csv')

# The predicted column is "quality" which is a scalar from [3, 9]
train_x = train.drop(["quality"], axis=1)
test_x = test.drop(["quality"], axis=1)
train_y = train[["quality"]]
test_y = test[["quality"]]



## parameter setting

In [13]:
alpha = 0.9
l1_ratio = 0.9  

## tracking uri

In [14]:
# Set the tracking folder. An empty URI leaves MLflow on its default local ./mlruns folder.
mlflow.set_tracking_uri(uri="")

# Full-path form: file:xxxx
# mlflow.set_tracking_uri(uri=r"file:C:\Users\xdxd2\Sunny_VS_worksapce\Sunny_python\ML\mytracks")

print("The set tracking uri is ", mlflow.get_tracking_uri())


The set tracking uri is  
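
When a full-path tracking folder is wanted, one way to avoid hand-writing the `file:` URI is to let `pathlib` build it. This is a minimal sketch: the folder name `mytracks` is taken from the commented-out line above, and the `set_tracking_uri` call is left commented so the snippet runs without MLflow installed.

```python
from pathlib import Path

# Build an absolute path first; as_uri() only works on absolute paths.
tracking_dir = Path.cwd() / "mytracks"
tracking_uri = tracking_dir.as_uri()  # a file:// URI for the folder
print(tracking_uri)

# mlflow.set_tracking_uri(tracking_uri)  # uncomment in the notebook
```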


## experiment id

In [15]:
exp = mlflow.set_experiment(experiment_name="experiment_custom_metrics")

print("Name: {}".format(exp.name))
print("Experiment_id: {}".format(exp.experiment_id))
print("Artifact Location: {}".format(exp.artifact_location))
print("Tags: {}".format(exp.tags))
print("Lifecycle_stage: {}".format(exp.lifecycle_stage))
print("Creation timestamp: {}".format(exp.creation_time))


2024/01/31 12:28:20 INFO mlflow.tracking.fluent: Experiment with name 'experiment_custom_metrics' does not exist. Creating a new experiment.


Name: experiment_custom_metrics
Experiment_id: 370280449364329886
Artifact Location: file:///C:/Users/xdxd2/Sunny_VS_worksapce/Sunny_python/ML/MLOps_fundamentals/MLflow/basic/mlruns/370280449364329886
Tags: {}
Lifecycle_stage: active
Creation timestamp: 1706675300388


## model evaluation with custom metrics

In [16]:
mlflow.start_run(experiment_id=exp.experiment_id, run_name="evaluate_model")

tags = {
    "engineering": "ML platform",
    "release.candidate": "RC1",
    "release.version": "2.0"
}

mlflow.set_tags(tags)
mlflow.sklearn.autolog(
    log_input_examples=False,
    log_model_signatures=False,
    log_models=False
)
lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
lr.fit(train_x, train_y)

predicted_qualities = lr.predict(test_x)

(rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

print("Elasticnet model (alpha={:f}, l1_ratio={:f}):".format(
    alpha, l1_ratio))
print("  RMSE: %s" % rmse)
print("  MAE: %s" % mae)
print("  R2: %s" % r2)

mlflow.log_params({
    "alpha": alpha,
    "l1_ratio": l1_ratio
})

mlflow.log_metrics({
    "rmse": rmse,
    "r2": r2,
    "mae": mae
})

sklearn_model_path = "sklearn_model.pkl"
joblib.dump(lr, sklearn_model_path)
artifacts = {
    "sklearn_model": sklearn_model_path,
    "data": data_dir
}

class SklearnWrapper(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        self.sklearn_model = joblib.load(
            context.artifacts["sklearn_model"])

    def predict(self, context, model_input):
        return self.sklearn_model.predict(model_input.values)

# Create a Conda environment for the new MLflow Model that contains all necessary dependencies.
conda_env = {
    "channels": ["defaults"],
    "dependencies": [
        "python=3.10",  # a string literal: formatting the float 3.10 would yield "3.1"
        "pip",
        {
            "pip": [
                "mlflow=={}".format(mlflow.__version__),
                "scikit-learn=={}".format(sklearn.__version__),
                "cloudpickle=={}".format(cloudpickle.__version__),
            ],
        },
    ],
    "name": "sklearn_env",
}

mlflow.pyfunc.log_model(
    artifact_path="sklear_mlflow_pyfunc",
    python_model=SklearnWrapper(),
    artifacts=artifacts,
    code_path=["12_models_load_model.ipynb"],
    conda_env=conda_env
)

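# builtin_metrics holds the built-in metrics already computed by the default evaluator.
# A leading underscore (_builtin_metrics, _eval_df) marks an argument the function does not use.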

def squared_diff_plus_one(eval_df, _builtin_metrics):
    return np.sum(np.abs(eval_df["prediction"] - eval_df["target"] + 1) ** 2)

def sum_on_target_divided_by_two(_eval_df, builtin_metrics):
    return builtin_metrics["sum_on_target"] / 2

squared_diff_plus_one_metric = make_metric(
    eval_fn=squared_diff_plus_one,
    greater_is_better=False,
    name="squared diff plus one"
)

sum_on_target_divided_by_two_metric = make_metric(
    eval_fn=sum_on_target_divided_by_two,
    greater_is_better=True,
    name="sum on target divided by two"
)


def prediction_target_scatter(eval_df, _builtin_metrics, artifacts_dir):
    # Targets on the x-axis and predictions on the y-axis, matching the axis labels
    plt.scatter(eval_df["target"], eval_df["prediction"])
    plt.xlabel("Targets")
    plt.ylabel("Predictions")
    plt.title("Targets vs. Predictions")
    plot_path = os.path.join(artifacts_dir, "example_scatter_plot.png")
    plt.savefig(plot_path)
    return {"example_scatter_plot_artifact": plot_path}


artifacts_uri = mlflow.get_artifact_uri("sklear_mlflow_pyfunc")
mlflow.evaluate(
    artifacts_uri,
    test,
    targets="quality",
    model_type="regressor",
    evaluators=["default"],
    custom_metrics=[
        squared_diff_plus_one_metric,
        sum_on_target_divided_by_two_metric
    ],
    custom_artifacts=[prediction_target_scatter]
)


artifacts_uri = mlflow.get_artifact_uri()
print("The artifact path is", artifacts_uri)
mlflow.end_run()

run = mlflow.last_active_run()
print("Active run id is {}".format(run.info.run_id))
print("Active run name is {}".format(run.info.run_name))



Elasticnet model (alpha=0.900000, l1_ratio=0.900000):
  RMSE: 0.8312296853893981
  MAE: 0.6673520215793272
  R2: 0.02101549378688994


2024/01/31 12:28:24 INFO mlflow.models.evaluation.base: Evaluating the model with the default evaluator.
2024/01/31 12:28:24 INFO mlflow.models.evaluation.default_evaluator: Computing model predictions.
2024/01/31 12:28:24 INFO mlflow.models.evaluation.default_evaluator: Testing metrics on first row...
2024/01/31 12:28:24 INFO mlflow.models.evaluation.default_evaluator: Evaluating metrics: squared diff plus one
2024/01/31 12:28:24 INFO mlflow.models.evaluation.default_evaluator: Evaluating metrics: sum on target divided by two
2024/01/31 12:28:27 INFO mlflow.models.evaluation.default_evaluator: Shap explainer Permutation is used.
Permutation explainer: 401it [00:20, 12.08it/s]                         


The artifact path is file:///C:/Users/xdxd2/Sunny_VS_worksapce/Sunny_python/ML/MLOps_fundamentals/MLflow/basic/mlruns/370280449364329886/5ae123d3e78b4fa8a3155c7c594a22ad/artifacts
Active run id is 5ae123d3e78b4fa8a3155c7c594a22ad
Active run name is evaluate_model
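
The arithmetic behind the two custom metrics in the cell above can be checked on a few made-up rows. The sketch below uses plain lists as a stand-in for the `prediction` and `target` columns of `eval_df`, and passes the target sum directly instead of reading `builtin_metrics["sum_on_target"]`. It does not call `make_metric` or `mlflow.evaluate`.

```python
def squared_diff_plus_one(predictions, targets):
    """Sum of |prediction - target + 1| ** 2 over all rows."""
    return sum(abs(p - t + 1) ** 2 for p, t in zip(predictions, targets))


def sum_on_target_divided_by_two(sum_on_target):
    """Half of the sum of the targets."""
    return sum_on_target / 2


# Made-up values, for illustration only.
predictions = [5.0, 6.0, 4.0]
targets = [5, 7, 6]

# Per-row terms: (5-5+1)^2 = 1, (6-7+1)^2 = 0, (4-6+1)^2 = 1
metric_a = squared_diff_plus_one(predictions, targets)
metric_b = sum_on_target_divided_by_two(sum(targets))
print(metric_a, metric_b)
```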


## inference

In [17]:
mlflow.start_run()

# Use a descriptive name rather than `id`, which shadows a Python built-in
loaded_model = mlflow.pyfunc.load_model(model_uri="runs:/cb2d193e28e54b5c845a56792941583c/sklear_mlflow_pyfunc")
predicted_qualities = loaded_model.predict(test_x)
(rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

print("  RMSE for test data: %s" % rmse)
print("  MAE for test data: %s" % mae)
print("  R2 for test data: %s" % r2)

mlflow.end_run()



  RMSE for test data: 0.7442929001520973
  MAE for test data: 0.5763000946156918
  R2 for test data: 0.21508707276848893
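
The `model_uri` in the inference cell hard-codes a run ID, and that ID is not the one printed for the `evaluate_model` run above. A small helper can build the `runs:/<run_id>/<artifact_path>` URI from whichever run is intended. This is a sketch: `build_runs_uri` is a hypothetical helper, not an MLflow function.

```python
def build_runs_uri(run_id, artifact_path):
    """Compose a 'runs:/<run_id>/<artifact_path>' model URI."""
    return f"runs:/{run_id}/{artifact_path}"


# Run ID and artifact path as printed and logged earlier in this notebook.
model_uri = build_runs_uri("5ae123d3e78b4fa8a3155c7c594a22ad", "sklear_mlflow_pyfunc")
print(model_uri)

# In the notebook the run ID could instead come from the last run, e.g.:
# run = mlflow.last_active_run()
# model_uri = build_runs_uri(run.info.run_id, "sklear_mlflow_pyfunc")
```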
