## Master's in Big Data and Data Science

### Management and Design Methodologies for Big Data Projects

#### Practical Session 01 - mlflow Example

---

This notebook walks through a series of basic steps to verify the procedure for logging the training and evaluation of machine learning models with the mlflow library.

In [6]:
import mlflow
import numpy as np

from sklearn.datasets import load_iris
from sklearn.cluster import DBSCAN, KMeans
from sklearn.metrics import (
    silhouette_score,
    davies_bouldin_score,
)

The first step is to load the dataset, which is a generic one (Iris).

In [7]:
db = load_iris()
features = db.data
target = db.target
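As a quick sanity check on what was loaded, the shapes and names exposed by the dataset object can be inspected. This is only a minimal sketch; the variable names mirror the cell above and the printout is illustrative.

```python
from sklearn.datasets import load_iris

db = load_iris()
features = db.data      # feature matrix, one row per sample
target = db.target      # integer class labels (not needed by the clustering itself)

n_samples, n_features = features.shape
print(f"samples: {n_samples}, features: {n_features}")
print("feature names:", db.feature_names)
print("classes:", list(db.target_names))
```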

A clustering method is then run with the library's `autolog` feature enabled. This is a basic logging option that is useful for sklearn methods.

In [8]:
# Basic option: enable automatic logging
mlflow.autolog()

# Clustering with DBSCAN.
dbscan = DBSCAN(eps=0.5, min_samples=5)

dbscan.fit_predict(features)

2025/06/22 18:23:23 INFO mlflow.tracking.fluent: Autologging successfully enabled for sklearn.
2025/06/22 18:23:23 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID '2f7b6ffac12d468e843156d731c2c094', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current sklearn workflow


array([ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0, -1,  0,  0,  0,  0,  0,  0,  0,  0,  1,
        1,  1,  1,  1,  1,  1, -1,  1,  1, -1,  1,  1,  1,  1,  1,  1,  1,
       -1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1, -1,  1,  1,  1,  1,  1, -1,  1,  1,  1,  1, -1,  1,  1,  1,
        1,  1,  1, -1, -1,  1, -1, -1,  1,  1,  1,  1,  1,  1,  1, -1, -1,
        1,  1,  1, -1,  1,  1,  1,  1,  1,  1,  1,  1, -1,  1,  1, -1, -1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1])
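The label array above mixes cluster ids with `-1`, which DBSCAN uses for points it treats as noise. A minimal sketch of how the clusters and noise points could be counted, using the same parameters as the cell above:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import load_iris

features = load_iris().data
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(features)

# DBSCAN marks points that belong to no cluster with the label -1.
cluster_ids, counts = np.unique(labels, return_counts=True)
n_noise = int(np.sum(labels == -1))
n_clusters = len(set(labels.tolist()) - {-1})

for cluster_id, count in zip(cluster_ids, counts):
    name = "noise" if cluster_id == -1 else f"cluster {cluster_id}"
    print(f"{name}: {count} points")
print(f"clusters found: {n_clusters}, noise points: {n_noise}")
```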

In [9]:
# Clustering with KMeans.
kmeans = KMeans(n_clusters=3, random_state=42)

kmeans.fit_predict(features)

# davies_bouldin_score(features, kmeans.fit_predict(features))

2025/06/22 18:23:30 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID '4152b4eb1a364e50a4cb777552f5950c', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current sklearn workflow


array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 0, 2, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 2, 0, 0, 0, 0, 2, 0, 0, 0,
       0, 0, 0, 2, 2, 0, 0, 0, 0, 2, 0, 2, 0, 2, 0, 0, 2, 2, 0, 0, 0, 0,
       0, 2, 0, 0, 0, 0, 2, 0, 0, 0, 2, 0, 0, 0, 2, 0, 0, 2], dtype=int32)
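The commented-out line in the cell above hints at scoring the resulting partition. A minimal sketch of how both imported metrics could be computed on the KMeans labels; no values are quoted here because they were not part of the original output.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import davies_bouldin_score, silhouette_score

features = load_iris().data
labels = KMeans(n_clusters=3, random_state=42).fit_predict(features)

# Silhouette lies in [-1, 1]; higher means better separated clusters.
sil = silhouette_score(features, labels)
# Davies-Bouldin is >= 0; lower means more compact, better separated clusters.
dbi = davies_bouldin_score(features, labels)

print(f"silhouette: {sil:.3f}")
print(f"davies-bouldin: {dbi:.3f}")
```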

The following examples log their runs under a separately named experiment in order to keep a more detailed record.

In [13]:
# Runs can be grouped under a named experiment.
# create_experiment raises MlflowException if the name already exists
# (for example when this cell is re-run), so the experiment is reused if present.

exp_name = 'Clustering-Ejemplos'
experiment = mlflow.get_experiment_by_name(exp_name)
if experiment is None:
    exp_id = mlflow.create_experiment(name=exp_name)
else:
    exp_id = experiment.experiment_id

with mlflow.start_run(experiment_id=exp_id, run_name="Kmeans - K=2"):
    modelo_clusters = KMeans(n_clusters=2)
    trained_model = modelo_clusters.fit(features)
    cluster_labels = trained_model.labels_
    score = silhouette_score(features, cluster_labels)
    # Log the hyperparameter
    mlflow.log_param('value_of_k', 2)
    # Log the metric
    mlflow.log_metric('silhouette_score', score)
    # Log the model together with an input example
    input_example = np.array([features[0]])
    mlflow.sklearn.log_model(trained_model, "Clustering_Model", input_example=input_example)
    # The run is closed automatically when the with block exits

In [14]:
with mlflow.start_run(experiment_id=exp_id, run_name="Kmeans - K=3"):
    modelo_clusters = KMeans(n_clusters=3)
    trained_model = modelo_clusters.fit(features)
    cluster_labels = trained_model.labels_
    score = silhouette_score(features, cluster_labels)
    score_2 = davies_bouldin_score(features, cluster_labels)
    # Log the hyperparameter
    mlflow.log_param('value_of_k', 3)
    # Log the metrics
    mlflow.log_metric('silhouette_score', score)
    mlflow.log_metric('davies_bouldin_score', score_2)
    # Log the model together with an input example
    input_example = np.array([features[0]])
    mlflow.sklearn.log_model(trained_model, "Clustering_Model", input_example=input_example)
    # The run is closed automatically when the with block exits




In [15]:
with mlflow.start_run(experiment_id=exp_id, run_name="DBSCAN"):
    modelo_clusters = DBSCAN(eps=0.5, min_samples=5)
    trained_model = modelo_clusters.fit(features)
    cluster_labels = trained_model.labels_
    score = silhouette_score(features, cluster_labels)
    # Log the hyperparameters
    mlflow.log_param('min_samples', 5)
    mlflow.log_param('eps', 0.5)
    # Log the metric
    mlflow.log_metric('silhouette_score', score)
    # Log the model together with an input example
    input_example = np.array([features[0]])
    mlflow.sklearn.log_model(trained_model, "Clustering_Model", input_example=input_example)
    # The run is closed automatically when the with block exits




  "inputs": [
    [
      5.1,
      3.5,
      1.4,
      0.2
    ]
  ]
}. Alternatively, you can avoid passing input example and pass model signature instead when logging the model. To ensure the input example is valid prior to serving, please try calling `mlflow.models.validate_serving_input` on the model uri and serving input example. A serving input example can be generated from model input example using `mlflow.models.convert_input_example_to_serving_input` function.
Got error: Model does not have the "python_function" flavor
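A likely explanation for the final message, offered as an assumption rather than something the notebook confirms: DBSCAN, unlike KMeans, has no `predict` method, and mlflow's sklearn flavor appears to add the `python_function` flavor only to estimators that expose one, so the input example cannot be validated. The difference between the two estimators can be checked directly:

```python
from sklearn.cluster import DBSCAN, KMeans

# KMeans can assign new points to its learned centroids, so it exposes predict().
kmeans_has_predict = hasattr(KMeans(n_clusters=3), "predict")
# DBSCAN only labels the data it was fitted on and does not expose predict().
dbscan_has_predict = hasattr(DBSCAN(eps=0.5, min_samples=5), "predict")

print("KMeans.predict available:", kmeans_has_predict)
print("DBSCAN.predict available:", dbscan_has_predict)
```

Under that assumption, leaving out `input_example` for the DBSCAN run would probably avoid the validation attempt, although the logged model would still lack the `python_function` flavor.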


The following command starts the mlflow interface so that it can be viewed in a web browser. It should be run from the same location that contains the `mlruns` directory.

In [None]:
!mlflow ui
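Before launching the interface, it can help to confirm that the working directory really contains `mlruns`. The helper below, `has_mlruns`, is a hypothetical name introduced only for this sketch and is not part of mlflow.

```python
from pathlib import Path


def has_mlruns(directory: str = ".") -> bool:
    """Return True if `directory` contains an `mlruns` folder."""
    return (Path(directory) / "mlruns").is_dir()


if has_mlruns():
    print("mlruns found: `mlflow ui` can be started from here.")
else:
    print("No mlruns directory here: previously logged runs would not be found.")
```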