
# 🚀 **MLflow**

This block adds **MLflow Tracking** to the workflow so that every **parameter**, **metric**, **artifact** (plots), and the **model** itself are logged.

> **Server tracking URI:** `http://186.121.46.71:5000/`  
> **What you will log:** model hyperparameters, metrics (accuracy, macro and weighted F1), the confusion matrix, feature importances, and the model itself (with a *signature*).

---

## ✅ Prerequisites (run outside the notebook or in a separate cell)
1. An active environment (venv) with these libraries installed:
   ```bash
   pip install -U mlflow scikit-learn pandas matplotlib joblib
   ```
2. A running MLflow server (example on Windows):
   ```bash
   mlflow ui --host 0.0.0.0 --port 5000
   ```
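
Before pointing the notebook at the server, it can help to confirm that it answers at all. The sketch below is one way to do that using only the standard library. It assumes the server exposes a `/health` endpoint that replies with HTTP 200; `tracking_server_is_up` is a hypothetical helper name, not part of MLflow.

```python
import urllib.request


def tracking_server_is_up(base_url: str, timeout: float = 3.0) -> bool:
    """Return True if GET <base_url>/health answers with HTTP 200.

    Assumption: the tracking server serves a /health endpoint.
    """
    health_url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(health_url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError):
        # Connection refused, timeout, DNS failure, HTTP error, or malformed URL.
        return False


# Example call (needs network access to the server used in this notebook):
# tracking_server_is_up("http://186.121.46.71:5000/")
```

If the call returns `False`, check that the server process is running and that port 5000 is reachable from your machine before running the cells below.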


In [44]:

# ============================================================================
# 1) SETUP: imports and MLflow
# ============================================================================

import mlflow
import mlflow.sklearn
from mlflow.models.signature import infer_signature
from mlflow.tracking import MlflowClient

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, confusion_matrix

# --- Configure the MLflow server ---
MLFLOW_URI = "http://186.121.46.71:5000/"
EXPERIMENTO = "Clasificacion_Precios"

mlflow.set_tracking_uri(MLFLOW_URI)
mlflow.set_experiment(EXPERIMENTO)

print("Tracking URI  ->", mlflow.get_tracking_uri())
print("Experimento   ->", mlflow.get_experiment_by_name(EXPERIMENTO))


2025/10/07 14:47:02 INFO mlflow.tracking.fluent: Experiment with name 'Clasificacion_Precios' does not exist. Creating a new experiment.


Tracking URI  -> http://186.121.46.71:5000/
Experimento   -> <Experiment: artifact_location='mlflow-artifacts:/238271231404982644', creation_time=1759866422259, experiment_id='238271231404982644', last_update_time=1759866422259, lifecycle_stage='active', name='Clasificacion_Precios', tags={}>


In [45]:

# ============================================================================
# 2) DATA LOADING
#    - Looks for 'dataset.csv' in ./notebooks or next to the notebook
# ============================================================================

RUTA_PROYECTO = Path.cwd()

posibles = [
    RUTA_PROYECTO / "notebooks" / "dataset.csv",
    RUTA_PROYECTO / "dataset.csv",
    Path("dataset.csv"),
]

archivo = next((p for p in posibles if p.exists()), None)
if archivo is None:
    raise FileNotFoundError(
        "No se encontró dataset.csv. Colócalo en ./notebooks o junto al notebook, "
        "o modifica esta celda con la ruta absoluta."
    )

df = pd.read_csv(archivo, sep="|")
print("Archivo cargado desde:", archivo)
print("Shape:", df.shape)
display(df.head(3))


Archivo cargado desde: D:\Publico\Documentos\2024-2025\GIT\Taller1\Mlopstrabajofinal\notebooks\dataset.csv
Shape: (99608, 26)


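| Unnamed: 0.1 | Unnamed: 0 | bathrooms | bedrooms | price | square_feet | latitude | longitude | time | has_photo_bin | has_fee_bin | ... | cat_housing/rent/other | cat_housing/rent/short_term | pets_Cats | pets_Cats,Dogs | pets_Cats,Dogs,None | pets_Dogs | pets_No | pri_Monthly | pri_Monthly\|Weekly | pri_Weekly |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1.0 | 1.0 | 2195.0 | 542.0 | 33.852 | -118.3759 | 1577360000.0 | 1.0 | 0.0 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 1 | 1 | 1.5 | 3.0 | 1250.0 | 1500.0 | 37.0867 | -76.4941 | 1577360000.0 | 1.0 | 0.0 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| 2 | 2 | 2.0 | 3.0 | 1395.0 | 1650.0 | 35.823 | -78.6438 | 1577360000.0 | 1.0 | 0.0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |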


In [46]:

# ============================================================================
# 3) PREPROCESSING
#    - Builds a 3-class label (price: low/medium/high) from the 0.33 and 0.66
#      quantiles (approximate terciles)
#    - Drops non-predictor columns (price, Unnamed: 0)
#    - Note: 'Unnamed: 0.1' is not in the drop list, so it stays in X
# ============================================================================

assert "price" in df.columns, "Falta la columna 'price' en el dataset."

q1, q2 = df["price"].quantile(0.33), df["price"].quantile(0.66)

def price_to_class(p):
    if p < q1:
        return 0  # low
    if p <= q2:
        return 1  # medium
    return 2      # high

y = df["price"].apply(price_to_class)

drop_cols = [c for c in ["price", "Unnamed: 0"] if c in df.columns]
X = df.drop(columns=drop_cols)

print("X shape:", X.shape, " y shape:", y.shape)
print("Cortes de terciles -> q1:", float(q1), " q2:", float(q2))


X shape: (99608, 24)  y shape: (99608,)
Cortes de terciles -> q1: 1126.0  q2: 1599.0
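
To see how the two cut points printed above turn prices into labels, here is a minimal standalone sketch. It hard-codes the values 1126.0 and 1599.0 from that output (an assumption: they will differ for any other dataset) and mirrors the comparisons in `price_to_class`, so a price exactly equal to either cut point lands in the middle class.

```python
# Cut points copied from the cell output above (assumed fixed for this sketch).
Q1, Q2 = 1126.0, 1599.0


def price_to_class(price: float, q1: float = Q1, q2: float = Q2) -> int:
    """Same comparisons as the notebook: [.., q1) -> 0, [q1, q2] -> 1, (q2, ..) -> 2."""
    if price < q1:
        return 0  # low
    if price <= q2:
        return 1  # medium
    return 2      # high


boundary_prices = [1125.99, 1126.0, 1599.0, 1599.01]
labels = [price_to_class(p) for p in boundary_prices]
print(labels)  # [0, 1, 1, 2]
```

Because both boundaries are inclusive on the middle class, ties at a cut point are never assigned to the low or high class.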


In [47]:

# ============================================================================
# 4) TRAIN/TEST SPLIT (stratified)
# ============================================================================
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

print("Train:", X_train.shape, " Test:", X_test.shape)


Train: (79686, 24)  Test: (19922, 24)
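
The row counts above follow from simple arithmetic. The sketch below reproduces them under the assumption that `train_test_split` rounds a fractional `test_size` up to a whole number of rows and assigns the remainder to training; the total of 99608 rows comes from the `Shape` printed when the data was loaded.

```python
import math

n_samples = 99608   # rows in the dataset, from the load cell output
test_size = 0.2     # same fraction passed to train_test_split

# Assumed rounding rule: test rows are rounded up, training gets the rest.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)  # 79686 19922
```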


In [48]:

# ============================================================================
# 5) TRAINING AND METRICS
# ============================================================================
clf = RandomForestClassifier(
    n_estimators=120,
    random_state=0,
    n_jobs=-1,
    class_weight="balanced_subsample"
)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

metricas = {
    "exactitud": float(accuracy_score(y_test, y_pred)),
    "f1_macro": float(f1_score(y_test, y_pred, average="macro")),
    "f1_ponderado": float(f1_score(y_test, y_pred, average="weighted")),
}
metricas


{'exactitud': 0.8280293143258709,
 'f1_macro': 0.8272123211136865,
 'f1_ponderado': 0.8275870789193016}
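
The two F1 values above differ only in how the per-class scores are averaged: the macro average treats every class equally, while the weighted average scales each class by its number of true samples. The sketch below works this out by hand on a small made-up confusion matrix (`toy_cm` is illustrative and does not come from this run), with rows as true classes and columns as predicted classes.

```python
def per_class_f1(cm):
    """F1 per class from a square confusion matrix (rows = true, cols = predicted)."""
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # true class k, predicted as another
        fp = sum(cm[i][k] for i in range(n)) - tp   # other classes predicted as k
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


# Made-up 3-class example with unequal class sizes (10, 5, and 5 samples).
toy_cm = [
    [8, 2, 0],
    [1, 3, 1],
    [0, 1, 4],
]

f1 = per_class_f1(toy_cm)
support = [sum(row) for row in toy_cm]

f1_macro = sum(f1) / len(f1)
f1_weighted = sum(s * f for s, f in zip(support, f1)) / sum(support)

print(round(f1_macro, 4), round(f1_weighted, 4))  # 0.7292 0.7574
```

In the notebook the split is stratified on labels built from near-terciles, so the classes are close to equal in size and the two averages end up almost identical.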

In [49]:

# ============================================================================
# 6) PLOTS (saved as artifacts)
#    - Confusion matrix
#    - Feature importances (top 10)
# ============================================================================

# Path is already imported in the setup cell.
plots_dir = Path("mlflow_plots")
plots_dir.mkdir(exist_ok=True)

# Confusion matrix
cm = confusion_matrix(y_test, y_pred)
plt.figure()
plt.imshow(cm, interpolation="nearest")
plt.title("Matriz de confusión (RandomForest - rango de precio)")
plt.xlabel("Clase predicha")
plt.ylabel("Clase real")
for i in range(cm.shape[0]):
    for j in range(cm.shape[1]):
        plt.text(j, i, cm[i, j], ha="center", va="center")
plt.tight_layout()
cm_path = plots_dir / "matriz_confusion.png"
plt.savefig(cm_path)
plt.close()

# Feature importances (top 10)
importancias = (
    pd.Series(clf.feature_importances_, index=X.columns)
    .sort_values(ascending=False)
    .head(10)
)
plt.figure()
importancias.plot(kind="bar")
plt.title("Top‑10 importancia de variables (RandomForest)")
plt.ylabel("Importancia")
plt.tight_layout()
fi_path = plots_dir / "importancia_variables.png"
plt.savefig(fi_path)
plt.close()

print("Artefactos guardados:", cm_path, "y", fi_path)


Artefactos guardados: mlflow_plots\matriz_confusion.png y mlflow_plots\importancia_variables.png


In [50]:

# ============================================================================
# 7) TRACKING IN MLFLOW
#    - Parameters, metrics, artifacts, and the model with its signature
# ============================================================================

# Infer the input/output schema from the test features and predictions
firma = infer_signature(X_test, y_pred)

with mlflow.start_run(run_name="rf_precio_3clases"):
    # Parameters
    mlflow.log_params({
        "modelo": "RandomForestClassifier",
        "n_estimators": 120,
        "class_weight": "balanced_subsample",
        "random_state": 0,
        "num_features": X.shape[1],
        "precio_q1": float(q1),
        "precio_q2": float(q2),
    })
    # Metrics
    for k, v in metricas.items():
        mlflow.log_metric(k, v)

    # Artifacts (the two plots saved in the previous cell)
    mlflow.log_artifact(str(cm_path), artifact_path="plots")
    mlflow.log_artifact(str(fi_path), artifact_path="plots")

    # Model
    mlflow.sklearn.log_model(
        sk_model=clf,
        artifact_path="model",
        signature=firma,
        input_example=X_test.head(3)
    )

print("✅ Run registrado en MLflow. Revisa el UI en", MLFLOW_URI)




🏃 View run rf_precio_3clases at: http://186.121.46.71:5000/#/experiments/238271231404982644/runs/a1271e6d9f924eb08c83837330a64c96
🧪 View experiment at: http://186.121.46.71:5000/#/experiments/238271231404982644
✅ Run registrado en MLflow. Revisa el UI en http://186.121.46.71:5000/
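
The run URL printed above carries the run ID, which is enough to address the logged model later. The sketch below extracts it and builds a `runs:/<run_id>/model` URI, where `model` is the `artifact_path` used when logging. Both helper names are hypothetical, and whether `mlflow.sklearn.load_model` resolves that URI depends on your MLflow version and server setup, so treat the loading step as an assumption to verify.

```python
import re

# Matches the "#/experiments/<id>/runs/<32 hex chars>" part of an MLflow UI link.
RUN_URL_PATTERN = re.compile(
    r"#/experiments/(?P<experiment_id>\d+)/runs/(?P<run_id>[0-9a-f]{32})"
)


def model_uri_from_run_url(run_url: str, artifact_path: str = "model") -> str:
    """Build a runs:/ URI from a UI link (hypothetical helper, not an MLflow API)."""
    match = RUN_URL_PATTERN.search(run_url)
    if match is None:
        raise ValueError(f"No run ID found in: {run_url}")
    return f"runs:/{match['run_id']}/{artifact_path}"


def load_logged_model(run_url: str, tracking_uri: str):
    """Load the scikit-learn model logged under the given run (needs the server)."""
    import mlflow.sklearn  # imported lazily so the parsing helper works without MLflow

    mlflow.set_tracking_uri(tracking_uri)
    return mlflow.sklearn.load_model(model_uri_from_run_url(run_url))


run_url = (
    "http://186.121.46.71:5000/#/experiments/238271231404982644"
    "/runs/a1271e6d9f924eb08c83837330a64c96"
)
print(model_uri_from_run_url(run_url))
# runs:/a1271e6d9f924eb08c83837330a64c96/model
```

With the server reachable, `load_logged_model(run_url, "http://186.121.46.71:5000/")` would be expected to return a fitted classifier that can call `.predict` on data with the same columns as `X_test`.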
