## Storage format 

- specific how the model is packaged and saved
- include the model, metadata of the model, hypreparameters and the model's version
- supports multiple storage formats - a directory of files, a single file format, python functions or container images

## Model signature

- specific the input, output data types and shapes that the model expects and returns.
- used by MLflow to generate REST API for the model
- defined using the python function annotations syntax
- store as part of the MLflow model and can be accessed by other MLflow components.

## Model API

- A REST API providing a standardized interface for interacting with model
- API supports both synchronous and asynchronous requests
- Can be used for real time inference or batch processing
- can be deployed to various environments cloud platforms, edge devices or on-premises servers

## Flavor

- refer to a specific way of serializing and storing a machine learning model
- each of the supported frameworks and libraries has an associated flavor in MLflow
- additional community-driven flavors and custom flavors

## Miscalleneous

- Provides functionality for evaluatingthe model using metrics such as accuracy, precision, recall, F1 score
- Provides tools to deploy MLflow models to various platforms
- Can set a custom target by specifying a custom deployment target along with the necessary code



---
## Model API

- save model
- log model
- load model


In [1]:
import warnings
import argparse
import logging
import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet
import mlflow
import mlflow.sklearn
from pathlib import Path
import os
from mlflow.models.signature import ModelSignature, infer_signature
from mlflow.types.schema import Schema,ColSpec
import sklearn
import joblib
import cloudpickle


## logging recorder

In [2]:
#process log tracking
logging.basicConfig(level=logging.DEBUG,
                    filename='./logfile.log',
                    filemode='w', # 'w' 表示寫模式, 'a' 表示追加模式, 'w' 表示如果文件已存在，先将其清空。如果你想在不清空现有日志的情况下向文件追加日志，可以使用 'a' 模式。
                    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')

logger = logging.getLogger(__name__)




## evaluation metrics

In [3]:
#evaluation function
def eval_metrics(actual, pred):
    rmse = np.sqrt(mean_squared_error(actual, pred))
    mae = mean_absolute_error(actual, pred)
    r2 = r2_score(actual, pred)
    return rmse, mae, r2


## raw data ingestion

In [4]:
warnings.filterwarnings("ignore")
np.random.seed(40)

# Read the wine-quality csv file from the URL
data = pd.read_csv("data/red-wine-quality.csv")
#os.mkdir("data/")
data.to_csv("data/red-wine-quality.csv", index=False)
# Split the data into training and test sets. (0.75, 0.25) split.
train, test = train_test_split(data)

data_dir = 'red-wine-data'

if not os.path.exists(data_dir):
    os.makedirs(data_dir)

data.to_csv(data_dir + '/data.csv')
train.to_csv(data_dir + '/train.csv')
test.to_csv(data_dir + '/test.csv')

# The predicted column is "quality" which is a scalar from [3, 9]
train_x = train.drop(["quality"], axis=1)
test_x = test.drop(["quality"], axis=1)
train_y = train[["quality"]]
test_y = test[["quality"]]

## parameter setting

In [5]:
alpha = 0.3
l1_ratio = 0.3  

## tracking uri

In [6]:
# set tracking folder
mlflow.set_tracking_uri(uri="")

# 全路徑寫法 file:xxxx
# mlflow.set_tracking_uri(uri=r"file:C:\Users\xdxd2\Sunny_VS_worksapce\Sunny_python\ML\mytracks")

print("The set tracking uri is ", mlflow.get_tracking_uri())


The set tracking uri is  


## experiment id

In [7]:
exp = mlflow.set_experiment(experiment_name="experiment_custom_sklearn")

print("Name: {}".format(exp.name))
print("Experiment_id: {}".format(exp.experiment_id))
print("Artifact Location: {}".format(exp.artifact_location))
print("Tags: {}".format(exp.tags))
print("Lifecycle_stage: {}".format(exp.lifecycle_stage))
print("Creation timestamp: {}".format(exp.creation_time))


Name: experiment_custom_sklearn
Experiment_id: 381824655908351219
Artifact Location: file:///C:/Users/xdxd2/Sunny_VS_worksapce/Sunny_python/ML/MLOps_fundamentals/MLflow/basic/mlruns/381824655908351219
Tags: {}
Lifecycle_stage: active
Creation timestamp: 1706626463653


## Model_Signature

In [8]:
mlflow.start_run(experiment_id=exp.experiment_id, run_name="run_1")

# hyper parameter tuning
alpha = 0.3
l1_ratio = 0.3


# add exp tags
mlflow.set_tag("release.version", "0.1")

tags = {
    "engineering": "ML platform",
    "release.candidate": "RC1",
    "release.version":"2.0"
}
mlflow.set_tags(tags)

lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=62)
lr.fit(train_x, train_y)

predicted_qualities = lr.predict(test_x)

(rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

print("Elasticnet model (alpha={:f}, l1_ratio={:f}):".format(alpha, l1_ratio))
print("  RMSE: %s" % rmse)
print("  MAE: %s" % mae)
print("  R2: %s" % r2)


#log parameters
params = {
    "alpha" : alpha,
    "l1_ratio": l1_ratio
}
mlflow.log_params(params)

#log metrics
metrics = {
    "rmse":rmse,
    "r2":r2,
    "mae":mae
}
mlflow.log_metrics(metrics)


sklearn_model_path = "sklearn_model.pkl"
joblib.dump(lr, sklearn_model_path)
artifacts = {
    "sklearn_model" : sklearn_model_path,
    "data" : data_dir
}

class SklearnWrapper(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        self.sklearn_model = joblib.load(context.artifacts["sklearn_model"])

    def predict(self, context, model_input):
        return self.sklearn_model.predict(model_input.values)


# Create a Conda environment for the new MLflow Model that contains all necessary dependencies.
conda_env = {
    "channels": ["defaults"],
    "dependencies": [
        "python={}".format(3.10),
        "pip",
        {
            "pip": [
                "mlflow=={}".format(mlflow.__version__),
                "scikit-learn=={}".format(sklearn.__version__),
                "cloudpickle=={}".format(cloudpickle.__version__),
            ],
        },
    ],
    "name": "sklearn_env",
}

mlflow.pyfunc.log_model(
    artifact_path="sklearn_mlflow_pyfunc",
    python_model=SklearnWrapper(),
    artifacts=artifacts,
    code_path=["main.py"],
    conda_env=conda_env
)


artifacts_uri=mlflow.get_artifact_uri()
print("The artifact path is",artifacts_uri )

mlflow.end_run()

run = mlflow.active_run()
print(f"active run id is {run.info.run_id}")
print(f"active run name is {run.info.run_name}")



Elasticnet model (alpha=0.300000, l1_ratio=0.300000):
  RMSE: 0.7442929001520973
  MAE: 0.5763000946156918
  R2: 0.21508707276848893


Downloading artifacts:   0%|          | 0/1 [00:00<?, ?it/s]

Downloading artifacts:   0%|          | 0/3 [00:00<?, ?it/s]

The artifact path is file:///C:/Users/xdxd2/Sunny_VS_worksapce/Sunny_python/ML/MLOps_fundamentals/MLflow/basic/mlruns/381824655908351219/ffbdd3190ee946ccb84972fc50752866/artifacts


AttributeError: 'NoneType' object has no attribute 'info'

In [None]:
mlflow.end_run()

