# MLflow fundamentals

## Autologging

**Author:** Alekzandev

Autologging automatically logs your model's parameters, metrics, and (by default) the fitted model, without explicit `log_param` or `log_metric` calls. It is available for the following libraries: `scikit-learn`, `XGBoost`, `LightGBM`, `PyTorch`, `Keras`, `TensorFlow`, `fast.ai`, `statsmodels`, and `gluon`.


### `mlflow.autolog()`

In [1]:
import mlflow

In [23]:
mlflow.set_tracking_uri("../experiments")
experiment_name = 'core_logging_functions'

experiment_id = mlflow.create_experiment(
    name=experiment_name,
    tags={
        "mlflow.source.name": "autolog_functions",
        "mlflow.source.type": "NOTEBOOK",
        "mlflow.user": "alekzandev",
        "team.name": "MLOps",
        "team.department": "EAeIA",
        "team.project": "MLFlow-101",
        "team.organization": "Nequi",
        "team.country": "CO"
    },
    artifact_location="file:/Users/javallejos/projects/mlflow-101/models/artifacts/mlflow"
)

In [4]:
# Train a linear regression model using data from the diabetes dataset
from sklearn import datasets
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
import numpy
import pandas
import mlflow.sklearn

In [7]:
# Load the diabetes dataset
diabetes = datasets.load_diabetes()
X = diabetes.data
y = diabetes.target


In [24]:
with mlflow.start_run(
    experiment_id=experiment_id,
    tags={
        "mlflow.runName": "autolog_runner",
        "mlflow.source.name": "4-Autologging.ipynb",
        "mlflow.source.type": "NOTEBOOK",
        "mlflow.user": "alekzandev",
        "mlflow.source.git.branch": "develop"
    }
    ):
    mlflow.autolog(
        log_input_examples=True
    )
    random_state = 42
    # Split the data into training and test sets. (0.75, 0.25) split.
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=random_state)

    # Train a linear regression model
    model = LinearRegression()
    model.fit(X_train, y_train)

    # Make predictions
    predicted_qualities = model.predict(X_test)

    # Evaluate the model
    mse = mean_squared_error(y_test, predicted_qualities)
    mae = numpy.mean(numpy.abs(y_test - predicted_qualities))
    r2 = model.score(X_test, y_test)


In [None]:
! mlflow ui --backend-store-uri "file:/Users/javallejos/projects/mlflow-101/experiments"

[2024-11-04 00:54:27 -0500] [16888] [INFO] Starting gunicorn 23.0.0
[2024-11-04 00:54:27 -0500] [16888] [INFO] Listening at: http://127.0.0.1:5000 (16888)
[2024-11-04 00:54:27 -0500] [16888] [INFO] Using worker: sync
[2024-11-04 00:54:27 -0500] [16889] [INFO] Booting worker with pid: 16889
[2024-11-04 00:54:27 -0500] [16890] [INFO] Booting worker with pid: 16890
[2024-11-04 00:54:27 -0500] [16891] [INFO] Booting worker with pid: 16891
[2024-11-04 00:54:27 -0500] [16892] [INFO] Booting worker with pid: 16892
^C
[2024-11-04 00:54:49 -0500] [16888] [INFO] Handling signal: int
[2024-11-04 00:54:49 -0500] [16890] [INFO] Worker exiting (pid: 16890)
[2024-11-04 00:54:49 -0500] [16891] [INFO] Worker exiting (pid: 16891)
[2024-11-04 00:54:49 -0500] [16892] [INFO] Worker exiting (pid: 16892)
[2024-11-04 00:54:49 -0500] [16889] [INFO] Worker exiting (pid: 16889)


### `mlflow.<library>.autolog()`

In [26]:
experiment_name = 'autolog_library_functions'

experiment_id = mlflow.create_experiment(
    name=experiment_name,
    tags={
        "mlflow.source.name": "core_logging_functions",
        "mlflow.source.type": "NOTEBOOK",
        "mlflow.user": "alekzandev",
        "team.name": "MLOps",
        "team.department": "EAeIA",
        "team.project": "MLFlow-101",
        "team.organization": "Nequi",
        "team.country": "CO"
    },
    artifact_location="file:/Users/javallejos/projects/mlflow-101/models/artifacts/mlflow"
)

In [27]:
with mlflow.start_run(
    experiment_id=experiment_id,
    tags={
        "mlflow.runName": "autolog_runner",
        "mlflow.source.name": "4-Autologging.ipynb",
        "mlflow.source.type": "NOTEBOOK",
        "mlflow.user": "alekzandev",
        "mlflow.source.git.branch": "develop"
    }
    ):
    mlflow.sklearn.autolog(
        log_post_training_metrics=True,
        serialization_format='pickle'
    )
    random_state = 42
    # Split the data into training and test sets. (0.75, 0.25) split.
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=random_state)

    # Train a linear regression model
    model = LinearRegression()
    model.fit(X_train, y_train)

    # Make predictions
    predicted_qualities = model.predict(X_test)

    # Evaluate the model
    mse = mean_squared_error(y_test, predicted_qualities)
    mae = numpy.mean(numpy.abs(y_test - predicted_qualities))
    r2 = model.score(X_test, y_test)


In [1]:
! mlflow ui --backend-store-uri "file:/Users/javallejos/projects/mlflow-101/experiments"

[2024-11-04 00:56:12 -0500] [17017] [INFO] Starting gunicorn 23.0.0
[2024-11-04 00:56:12 -0500] [17017] [INFO] Listening at: http://127.0.0.1:5000 (17017)
[2024-11-04 00:56:12 -0500] [17017] [INFO] Using worker: sync
[2024-11-04 00:56:12 -0500] [17018] [INFO] Booting worker with pid: 17018
[2024-11-04 00:56:12 -0500] [17019] [INFO] Booting worker with pid: 17019
[2024-11-04 00:56:12 -0500] [17020] [INFO] Booting worker with pid: 17020
[2024-11-04 00:56:12 -0500] [17021] [INFO] Booting worker with pid: 17021
^C
[2024-11-04 00:56:28 -0500] [17017] [INFO] Handling signal: int
[2024-11-04 00:56:29 -0500] [17021] [INFO] Worker exiting (pid: 17021)
[2024-11-04 00:56:29 -0500] [17020] [INFO] Worker exiting (pid: 17020)
[2024-11-04 00:56:29 -0500] [17018] [INFO] Worker exiting (pid: 17018)
[2024-11-04 00:56:29 -0500] [17019] [INFO] Worker exiting (pid: 17019)
