# Model Management with MLflow and PyFunc
This notebook shows how to build a custom model with preprocessing logic using MLflow `pyfunc`, and how to save, load, and reuse it.

## 1. Import the required libraries

In [1]:
import pandas as pd
import json
import os
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from mlflow.models.signature import infer_signature
import mlflow
import mlflow.pyfunc
from sys import version_info
import sklearn

## 2. Load the data
We load the wine quality data (white and red wine).

[Download](https://archive.ics.uci.edu/dataset/186/wine+quality)

In [2]:
vino_blanco = pd.read_csv("winequality-white.csv", sep=";")
vino_tinto = pd.read_csv("winequality-red.csv", sep=";")

In [3]:
# Flag the wine type: 1 = red (tinto), 0 = white (blanco).
vino_tinto['es_tinto'] = 1
vino_blanco['es_tinto'] = 0

In [4]:
vino_blanco.columns

Index(['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar',
       'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density',
       'pH', 'sulphates', 'alcohol', 'quality', 'es_tinto'],
      dtype='object')

In [5]:
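# Stack both datasets and replace spaces in column names with underscores.
datos = pd.concat([vino_tinto, vino_blanco], axis=0)
datos.rename(columns=lambda x: x.replace(' ', '_'), inplace=True)
# Binarize the target: 1 for high-quality wines (score >= 7), 0 otherwise.
datos['quality'] = (datos.quality >= 7).astype(int)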

In [6]:
datos.reset_index(drop=True, inplace=True)
datos.dropna(inplace=True)

## 3. Split into training and test sets

In [7]:
# With no test_size given, scikit-learn holds out 25% of the rows for testing.
train, test = train_test_split(datos, random_state=123)
X_train = train.drop(["quality"], axis=1)
X_test = test.drop(["quality"], axis=1)
y_train = train.quality
y_test = test.quality

## 4. Create the custom class with preprocessing
This class defines a model that applies the same preprocessing logic before training and before predicting.

In [9]:
class RFWithPreprocess(mlflow.pyfunc.PythonModel):
    """Random forest classifier that applies the same preprocessing in fit and predict."""

    def __init__(self, params):
        """
        Initialize with just the model hyperparameters
        """
        self.params = params
        self.rf_model = None  # set by fit()

    def preprocess_input(self, model_input):
        """
        Return pre-processed model_input
        """
        # Work on a copy so the caller's DataFrame is never modified.
        processed_input = model_input.copy()
        # Placeholder: all preprocessing steps go here.
        print("Data Preprocesed")
        return processed_input

    def fit(self, X_train, y_train):
        """
        Uses the same preprocessing logic to fit the model
        """
        from sklearn.ensemble import RandomForestClassifier

        processed_model_input = self.preprocess_input(X_train)

        rf_model = RandomForestClassifier(**self.params)
        rf_model.fit(processed_model_input, y_train)

        self.rf_model = rf_model

    def predict(self, context, model_input):
        """
        This is the main entry point to the model in deployment systems
        """
        if self.rf_model is None:
            raise RuntimeError("Call fit() before predict().")
        # preprocess_input already copies its input, so no extra copy is needed.
        processed_model_input = self.preprocess_input(model_input)
        return self.rf_model.predict(processed_model_input)
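
The `preprocess_input` method above is left as a placeholder. As a rough illustration of what could go inside it, the sketch below applies a `log1p` transform to a couple of columns. The function name `preprocess_input_example`, the choice of columns in `SKEWED_COLUMNS`, and the transform itself are assumptions made for illustration only; the notebook does not specify any particular preprocessing.

```python
import numpy as np
import pandas as pd

# Hypothetical choice of columns to transform; not taken from the notebook.
SKEWED_COLUMNS = ["residual_sugar", "chlorides"]


def preprocess_input_example(model_input: pd.DataFrame) -> pd.DataFrame:
    """Return a transformed copy of model_input, leaving the original untouched."""
    processed = model_input.copy()
    for column in SKEWED_COLUMNS:
        # Skip columns that are absent so the function tolerates partial inputs.
        if column in processed.columns:
            processed[column] = np.log1p(processed[column])
    return processed


# Tiny made-up sample, only to exercise the function.
sample = pd.DataFrame(
    {
        "residual_sugar": [0.0, 1.9],
        "chlorides": [0.0, 0.076],
        "alcohol": [9.4, 9.8],
    }
)
processed_sample = preprocess_input_example(sample)
print(processed_sample)
```

Because the same function would run inside both `fit` and `predict`, any transform placed there is applied identically at training time and at inference time, which is the point of bundling it with the model.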




## 5. Define the model configuration

In [10]:
import json 
import os

params = {
    "n_estimators": 15, 
    "max_depth": 5
}
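
The cell above only defines `params` in memory. If the configuration should also be persisted as JSON, a minimal sketch could look like the following. The file name `config_modelo.json` and the use of a temporary directory are assumptions made for illustration; any writable path would do.

```python
import json
import os
import tempfile

params = {"n_estimators": 15, "max_depth": 5}

# Hypothetical location: a temporary directory keeps the sketch free of side effects.
config_dir = tempfile.mkdtemp()
config_path = os.path.join(config_dir, "config_modelo.json")  # assumed file name

# Write the hyperparameters to disk as JSON.
with open(config_path, "w", encoding="utf-8") as f:
    json.dump(params, f, indent=2)

# Read them back; the resulting dict can be passed to RFWithPreprocess(...).
with open(config_path, encoding="utf-8") as f:
    loaded_params = json.load(f)

print(loaded_params)
```

Keeping the hyperparameters in a file makes it possible to change them without editing the notebook and to version them alongside the code.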




## 6. Instantiate the model with the configuration and train it

In [11]:
model = RFWithPreprocess(params)

model.fit(X_train, y_train)

Data Preprocesed



## 7. Make predictions

In [12]:
predictions = model.predict(context=None, model_input=X_test)
predictions

Data Preprocesed


array([0, 0, 0, ..., 0, 0, 0], shape=(1625,))

## 8. Infer the model signature
The signature records the names and types of the model's inputs and outputs, so MLflow can validate data passed to the model.

In [13]:
firma = infer_signature(X_test, predictions)



## 10. Save the model in MLflow

In [14]:
with mlflow.start_run() as run:
    mlflow.pyfunc.log_model(
        name="modelo_rf_con_preprocesamiento",
        python_model=model,
        signature=firma,
        input_example=X_test[:3]
    )
    run_id = run.info.run_id
    print(f"SUCCESS! Model logged with run_id: {run_id}")

2025/08/14 09:34:46 INFO mlflow.pyfunc: Validating input example against model signature


Data Preprocesed
SUCCESS! Model logged with run_id: bd32bc4d414d402a89397225ab0be4b8
🏃 View run rebellious-yak-694 at: http://localhost:5000/#/experiments/112621643149097054/runs/bd32bc4d414d402a89397225ab0be4b8
🧪 View experiment at: http://localhost:5000/#/experiments/112621643149097054


## 11. Load the model from MLflow and make predictions

In [15]:
ruta_modelo = f"runs:/{run_id}/modelo_rf_con_preprocesamiento"
modelo_cargado = mlflow.pyfunc.load_model(ruta_modelo)

In [16]:
resultado = modelo_cargado.predict(X_test)
resultado[:10]

Data Preprocesed


array([0, 0, 0, 0, 0, 0, 1, 0, 0, 0])