## Storage format 

- specifies how the model is packaged and saved
- includes the model, its metadata, its hyperparameters, and the model's version
- supports multiple storage formats: a directory of files, a single file, Python functions, or container images

## Model signature

- specifies the data types and shapes of the inputs the model expects and the outputs it returns
- used by MLflow to generate the REST API for the model
- defined using the Python function annotations syntax
- stored as part of the MLflow model, where other MLflow components can access it

## Model API

- A REST API that provides a standardized interface for interacting with the model
- The API supports both synchronous and asynchronous requests
- Can be used for real-time inference or batch processing
- Can be deployed to various environments: cloud platforms, edge devices, or on-premises servers
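
A rough sketch of what a request body for the REST interface can look like is shown here. It assumes an MLflow 2.x scoring server, which accepts a `dataframe_split` key on its `/invocations` endpoint; the URL is hypothetical and nothing is actually sent.

```python
import json

import pandas as pd

# Two illustrative rows to score; the values are invented for this sketch.
rows = pd.DataFrame({"alcohol": [9.4, 9.8], "pH": [3.51, 3.20]})

# Assumed MLflow 2.x payload layout: column names plus row-wise data.
payload = json.dumps(
    {
        "dataframe_split": {
            "columns": list(rows.columns),
            "data": rows.values.tolist(),
        }
    }
)

# Hypothetical endpoint of a locally served model; adjust host and port as needed.
scoring_url = "http://127.0.0.1:5000/invocations"
print(scoring_url, payload)
```

The payload would be posted with the `Content-Type: application/json` header by whichever HTTP client is at hand.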

## Flavor

- refers to a specific way of serializing and storing a machine learning model
- each of the supported frameworks and libraries has an associated flavor in MLflow
- additional community-driven flavors and custom flavors are also available

## Miscellaneous

- Provides functionality for evaluating the model using metrics such as accuracy, precision, recall, and F1 score
- Provides tools to deploy MLflow models to various platforms
- Can deploy to a custom target by specifying it along with the necessary code



---
## Model API

- save model
- log model
- load model


In [2]:
import warnings
import argparse
import logging
import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet
import mlflow
import mlflow.sklearn
from pathlib import Path
import os
from mlflow.models.signature import ModelSignature, infer_signature
from mlflow.types.schema import Schema,ColSpec



## logging recorder

In [3]:
#process log tracking
logging.basicConfig(level=logging.DEBUG,
                    filename='./logfile.log',
                    filemode='w', # 'w' = write mode: the file is cleared first if it already exists; use 'a' (append mode) to add to the existing log without clearing it
                    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')

logger = logging.getLogger(__name__)
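
The difference between the `'w'` and `'a'` file modes can be checked with a short standalone sketch. The logger name, helper function, and temporary log path below are hypothetical and exist only for this demonstration.

```python
import logging
import tempfile
from pathlib import Path

log_path = Path(tempfile.mkdtemp()) / "logfile.log"  # hypothetical location


def write_once(mode, message):
    """Attach a file handler in the given mode, log one message, then detach."""
    handler = logging.FileHandler(log_path, mode=mode)
    demo_logger = logging.getLogger("filemode_demo")
    demo_logger.setLevel(logging.INFO)
    demo_logger.propagate = False  # keep the demo output out of the root logger
    demo_logger.addHandler(handler)
    demo_logger.info(message)
    demo_logger.removeHandler(handler)
    handler.close()


write_once("w", "first")
write_once("a", "second")  # append mode keeps the earlier line
lines_after_append = log_path.read_text().splitlines()

write_once("w", "third")  # write mode truncates the file first
lines_after_truncate = log_path.read_text().splitlines()

print(lines_after_append, lines_after_truncate)
```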



## evaluation metrics

In [4]:
#evaluation function
def eval_metrics(actual, pred):
    rmse = np.sqrt(mean_squared_error(actual, pred))
    mae = mean_absolute_error(actual, pred)
    r2 = r2_score(actual, pred)
    return rmse, mae, r2
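
To show what the three metrics measure, the following sketch computes them by hand with NumPy on three made-up values. It is a worked example of the standard definitions, not a replacement for the scikit-learn functions used above.

```python
import numpy as np

# Illustrative values chosen so the arithmetic is easy to follow.
actual = np.array([3.0, 5.0, 7.0])
pred = np.array([2.0, 5.0, 9.0])

residuals = actual - pred  # [1, 0, -2]

# Root mean squared error: square root of the mean squared residual.
rmse = np.sqrt(np.mean(residuals ** 2))
# Mean absolute error: mean of the absolute residuals.
mae = np.mean(np.abs(residuals))
# R2: one minus residual sum of squares over total sum of squares.
r2 = 1.0 - np.sum(residuals ** 2) / np.sum((actual - actual.mean()) ** 2)

print(rmse, mae, r2)
```

Here the squared residuals sum to 5 and the total sum of squares is 8, so R2 is 1 - 5/8 = 0.375.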


## raw data ingestion

In [5]:
warnings.filterwarnings("ignore")
np.random.seed(40)

# Read the wine-quality CSV file from the local data/ folder
data = pd.read_csv("data/red-wine-quality.csv")
#os.mkdir("data/")
data.to_csv("data/red-wine-quality.csv", index=False)
# Split the data into training and test sets. (0.75, 0.25) split.
train, test = train_test_split(data)
train.to_csv("data/train.csv")
test.to_csv("data/test.csv")
# The predicted column is "quality" which is a scalar from [3, 9]
train_x = train.drop(["quality"], axis=1)
test_x = test.drop(["quality"], axis=1)
train_y = train[["quality"]]
test_y = test[["quality"]]
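
The split step can be tried on synthetic data to confirm the 75/25 proportions and the separation of features from the target. The DataFrame below is invented for the sketch, and passing `random_state` explicitly is an alternative to seeding NumPy globally.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the wine data: one feature and a "quality" target.
data = pd.DataFrame(
    {
        "alcohol": np.linspace(8.0, 14.0, 100),
        "quality": np.arange(100) % 7 + 3,
    }
)

# test_size=0.25 matches the default used above; random_state fixes the shuffle.
train, test = train_test_split(data, test_size=0.25, random_state=40)

# Features exclude the target column; the target is kept separately.
train_x = train.drop(columns=["quality"])
train_y = train["quality"]

print(len(train), len(test), list(train_x.columns))
```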

## parameter setting

In [6]:
alpha = 0.3
l1_ratio = 0.3  

## tracking uri

In [7]:
# set tracking folder
mlflow.set_tracking_uri(uri="")

# Full-path form: file:xxxx
# mlflow.set_tracking_uri(uri=r"file:C:\Users\xdxd2\Sunny_VS_worksapce\Sunny_python\ML\mytracks")

print("The set tracking uri is ", mlflow.get_tracking_uri())


The set tracking uri is  


## experiment id

In [8]:
exp = mlflow.set_experiment(experiment_name="experiment_signature")

print("Name: {}".format(exp.name))
print("Experiment_id: {}".format(exp.experiment_id))
print("Artifact Location: {}".format(exp.artifact_location))
print("Tags: {}".format(exp.tags))
print("Lifecycle_stage: {}".format(exp.lifecycle_stage))
print("Creation timestamp: {}".format(exp.creation_time))


Name: experiment_signature
Experiment_id: 513148386009526498
Artifact Location: file:///C:/Users/xdxd2/Sunny_VS_worksapce/Sunny_python/ML/MLOps_fundamentals/MLflow/basic/mlruns/513148386009526498
Tags: {}
Lifecycle_stage: active
Creation timestamp: 1706604750158


## Model_Signature

In [9]:
mlflow.start_run()
tags = {
    "engineering": "ML platform",
    "release.candidate":"RC1",
    "release.version": "2.0"
}

mlflow.set_tags(tags)
mlflow.sklearn.autolog(
    log_input_examples=False,
    log_model_signatures=False,
    log_models=False
)

lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
lr.fit(train_x, train_y)

predicted_qualities = lr.predict(test_x)

(rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

print("Elasticnet model (alpha={:f}, l1_ratio={:f}):".format(alpha, l1_ratio))
print("  RMSE: %s" % rmse)
print("  MAE: %s" % mae)
print("  R2: %s" % r2)


# Input schema: one entry per feature column the model was fitted on.
# "quality" is the prediction target, not a model input, so it is not listed here.
input_data = [
    {"name": "fixed acidity", "type": "double"},
    {"name": "volatile acidity", "type": "double"},
    {"name": "citric acid", "type": "double"},
    {"name": "residual sugar", "type": "double"},
    {"name": "chlorides", "type": "double"},
    {"name": "free sulfur dioxide", "type": "double"},
    {"name": "total sulfur dioxide", "type": "double"},
    {"name": "density", "type": "double"},
    {"name": "pH", "type": "double"},
    {"name": "sulphates", "type": "double"},
    {"name": "alcohol", "type": "double"}
]

# ElasticNet.predict returns floats, so the output column is typed as double
output_data = [{'type': 'double'}]

input_schema = Schema([ColSpec(col["type"], col['name']) for col in input_data])
output_schema = Schema([ColSpec(col['type']) for col in output_data])
signature = ModelSignature(inputs=input_schema, outputs=output_schema)


# Example inputs: feature columns only, since "quality" is the target rather than an input
input_example = {
    "fixed acidity": np.array([7.2, 7.5, 7.0, 6.8, 6.9]),
    "volatile acidity": np.array([0.35, 0.3, 0.28, 0.38, 0.25]),
    "citric acid": np.array([0.45, 0.5, 0.55, 0.4, 0.42]),
    "residual sugar": np.array([8.5, 9.0, 8.2, 7.8, 8.1]),
    "chlorides": np.array([0.045, 0.04, 0.035, 0.05, 0.042]),
    "free sulfur dioxide": np.array([30, 35, 40, 28, 32]),
    "total sulfur dioxide": np.array([120, 125, 130, 115, 110]),
    "density": np.array([0.997, 0.996, 0.995, 0.998, 0.994]),
    "pH": np.array([3.2, 3.1, 3.0, 3.3, 3.2]),
    "sulphates": np.array([0.65, 0.7, 0.68, 0.72, 0.62]),
    "alcohol": np.array([9.2, 9.5, 9.0, 9.8, 9.4])
}

# signature = infer_signature(test_x, predicted_qualities)
# input_example = {
#     "columns":np.array(test_x.columns),
#     "data": np.array(test_x.values)
# }


# log the dataset as a run artifact, then save the model to the local "model" directory
mlflow.log_artifact("./data/red-wine-quality.csv")
mlflow.sklearn.save_model(lr, "model", signature=signature, input_example=input_example)
artifacts_uri=mlflow.get_artifact_uri()
print("The artifact path is",artifacts_uri )

mlflow.end_run()

run = mlflow.last_active_run()
print("Active run id is {}".format(run.info.run_id))
print("Active run name is {}".format(run.info.run_name))

Elasticnet model (alpha=0.300000, l1_ratio=0.300000):
  RMSE: 0.7442929001520973
  MAE: 0.5763000946156918
  R2: 0.21508707276848893
The artifact path is file:///C:/Users/xdxd2/Sunny_VS_worksapce/Sunny_python/ML/MLOps_fundamentals/MLflow/basic/mlruns/513148386009526498/f815f9e11082448e9afd932afa9653fd/artifacts
Active run id is f815f9e11082448e9afd932afa9653fd
Active run name is bemused-cat-546
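
A hand-written schema can drift from the columns the model was actually fitted on, for example by listing the target column as an input. The helper below is a hypothetical sanity check, not part of MLflow; in this notebook its arguments would come from `signature.inputs.input_names()` and `list(train_x.columns)`.

```python
def compare_columns(schema_columns, feature_columns):
    """Report columns that appear on only one side (hypothetical helper)."""
    schema_set, feature_set = set(schema_columns), set(feature_columns)
    return {
        "missing_from_schema": sorted(feature_set - schema_set),
        "not_a_feature": sorted(schema_set - feature_set),
    }


# Illustrative column lists; a real check would use the notebook's own objects.
feature_columns = ["alcohol", "pH"]
schema_columns = ["alcohol", "pH", "quality"]

report = compare_columns(schema_columns, feature_columns)
print(report)
```

An empty report on both sides means the declared inputs and the training features agree.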


## infer_signature

In [10]:
mlflow.start_run()
tags = {
    "engineering": "ML platform",
    "release.candidate":"RC1",
    "release.version": "2.0"
}

mlflow.set_tags(tags)
mlflow.sklearn.autolog(
    log_input_examples=False,
    log_model_signatures=False,
    log_models=False
)

lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
lr.fit(train_x, train_y)

predicted_qualities = lr.predict(test_x)

(rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

print("Elasticnet model (alpha={:f}, l1_ratio={:f}):".format(alpha, l1_ratio))
print("  RMSE: %s" % rmse)
print("  MAE: %s" % mae)
print("  R2: %s" % r2)


signature = infer_signature(test_x, predicted_qualities)
input_example = {
    "columns":np.array(test_x.columns),
    "data": np.array(test_x.values)
}


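# log the dataset, then save the model; save_model raises if the target path already exists and is not empty (see the error below)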
mlflow.log_artifact("./data/red-wine-quality.csv")
mlflow.sklearn.save_model(lr, "model", signature=signature, input_example=input_example)
artifacts_uri=mlflow.get_artifact_uri()
print("The artifact path is",artifacts_uri )

mlflow.end_run()

run = mlflow.last_active_run()
print("Active run id is {}".format(run.info.run_id))
print("Active run name is {}".format(run.info.run_name))

Elasticnet model (alpha=0.300000, l1_ratio=0.300000):
  RMSE: 0.7442929001520973
  MAE: 0.5763000946156918
  R2: 0.21508707276848893


MlflowException: Path 'model' already exists and is not empty
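
The exception above comes from `save_model` refusing to write into a non-empty directory, and the earlier run already created `model`. One possible workaround, sketched here with a hypothetical helper and a temporary base folder, is to give every run its own target path and clear any stale copy first. Another option is `mlflow.sklearn.log_model`, which stores the model under the run's own artifact directory.

```python
import shutil
import tempfile
from pathlib import Path


def fresh_model_dir(base_dir, run_id):
    """Return a path for one run's model, removing any stale copy (hypothetical helper)."""
    path = Path(base_dir) / f"model_{run_id}"
    if path.exists():
        shutil.rmtree(path)  # save_model needs a missing or empty target
    return path


base_dir = Path(tempfile.mkdtemp())  # hypothetical base folder

# Simulate a leftover directory from an earlier run with the same id.
stale = base_dir / "model_run1"
stale.mkdir()
(stale / "MLmodel").write_text("old")

first = fresh_model_dir(base_dir, "run1")   # stale copy is removed
second = fresh_model_dir(base_dir, "run2")  # different run, different path
print(first.exists(), second.exists(), first != second)
```

In the cell above, the returned path would be passed to `mlflow.sklearn.save_model` in place of the fixed `"model"` string, with `run_id` taken from the active run.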