# Experiments with ModelTrainers
This notebook uses the existing `ModelTrainer` classes in `src/models` together with the configuration in `params.yaml` to evaluate the supported models.


## 1. Environment setup
The project root (the folder that contains `src`) is added to `sys.path` so that the project's modules can be imported, and it becomes the working directory.


In [None]:
import os
import sys
from pathlib import Path

for candidate in [Path.cwd(), *Path.cwd().parents]:
    if (candidate / 'src').exists():
        project_root = candidate
        if str(project_root) not in sys.path:
            sys.path.append(str(project_root))
        break
else:
    raise FileNotFoundError('Could not find the src folder in the directory hierarchy.')

os.chdir(project_root)
print(f'Using project_root: {project_root}')


## 2. Load configuration and utilities
`params.yaml`, `DataLoader`, and `FeatureEngineer` are reused in the same way as in `src/main.py`.


In [None]:
import json
import yaml
import pandas as pd

from src.data.data_loader import DataLoader
from src.data.feature_engineer import FeatureEngineer
from src.models.xgboost_model.model_trainer import ModelTrainer as XGBTrainer
from src.models.random_forest_model.model_trainer import ModelTrainer as RFTrainer
from src.models.linear_regression_model.model_trainer import ModelTrainer as LRTrainer

params_path = project_root / 'params.yaml'
with params_path.open('r') as f:
    cfg = yaml.safe_load(f)
print(f'Configuration loaded from: {params_path}')
cfg
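
The later cells read a handful of keys from `cfg`, so a missing entry in `params.yaml` only surfaces as a `KeyError` further down. A minimal sketch of an early check follows. The key paths are the ones this notebook accesses; the helper `missing_config_keys` and the values in `example_cfg` are hypothetical and only illustrate the shape.

```python
# Key paths this notebook reads from `cfg` in the cells below.
REQUIRED_KEYS = [
    ("data", "raw_path"),
    ("data", "processed_path"),
    ("features", "selected"),
    ("features", "target"),
    ("train", "linear_regression"),
    ("train", "random_forest"),
    ("train", "xgboost"),
]


def missing_config_keys(cfg, required=REQUIRED_KEYS):
    """Return the dotted key paths from `required` that are absent in `cfg`."""
    missing = []
    for path in required:
        node = cfg
        for key in path:
            if not isinstance(node, dict) or key not in node:
                missing.append(".".join(path))
                break
            node = node[key]
    return missing


# Hypothetical configuration; the real values live in params.yaml.
example_cfg = {
    "data": {"raw_path": "data/raw.csv", "processed_path": "data/processed.csv"},
    "features": {"selected": ["feature_a", "feature_b"], "target": "target"},
    "train": {"linear_regression": {}, "random_forest": {"n_estimators": 100}},
}
print(missing_config_keys(example_cfg))  # ['train.xgboost']
```

Calling `missing_config_keys(cfg)` right after loading and stopping when the list is non-empty keeps the failure close to its cause.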


## 3. Run the data loader and feature engineering
This reproduces the initial steps of the pipeline: it loads the cleaned data and builds the training and test sets with `FeatureEngineer`.


In [None]:
raw_path = project_root / cfg['data']['raw_path']
processed_path = project_root / cfg['data']['processed_path']

loader = DataLoader(str(raw_path), str(processed_path))
processed_df = loader.run()  # returns the processed DataFrame

features = cfg['features']['selected']
target = cfg['features']['target']
feature_engineer = FeatureEngineer(features, target)
X_train, X_test, y_train, y_test = feature_engineer.run(processed_df)
X_train.shape, X_test.shape


## 4. Helper functions
Utilities for recording the metrics of each model.


In [None]:
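from IPython.display import display  # explicit import; Jupyter also exposes it as a builtin

def summarize_metrics(label, metrics):
    """Show one model's metrics as a single-row table rounded to 4 decimals."""
    display(pd.DataFrame([metrics], index=[label]).round(4))

def run_experiment(trainer_cls, params, label):
    """Train and evaluate one trainer, display its metrics, and return (metrics, cv_scores)."""
    trainer = trainer_cls(model_params=params)
    metrics = trainer.run(X_train, X_test, y_train, y_test)
    summarize_metrics(label, metrics)
    cv_scores = trainer.cross_validate(X_train, y_train)
    return metrics, cv_scores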
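
`run_experiment` relies on three things from every trainer: a constructor that accepts `model_params`, a `run` method that returns a mapping of metrics, and a `cross_validate` method that returns a sequence of scores. The sketch below spells out that contract as inferred from the calls above, not from the code in `src/models`. `TrainerLike`, `MeanBaselineTrainer`, the `cv` parameter, and the metric names are all hypothetical.

```python
from statistics import mean
from typing import Protocol, Sequence


class TrainerLike(Protocol):
    """Interface assumed by run_experiment (inferred from its calls)."""

    def run(self, X_train, X_test, y_train, y_test) -> dict: ...

    def cross_validate(self, X, y) -> Sequence[float]: ...


class MeanBaselineTrainer:
    """Hypothetical stand-in that always predicts the mean of the training target."""

    def __init__(self, model_params=None):
        self.model_params = model_params or {}

    def run(self, X_train, X_test, y_train, y_test):
        prediction = mean(y_train)
        # Metric names are illustrative; the real trainers may report others.
        return {
            "train_mae": mean(abs(y - prediction) for y in y_train),
            "test_mae": mean(abs(y - prediction) for y in y_test),
        }

    def cross_validate(self, X, y):
        folds = self.model_params.get("cv", 2)  # assumed parameter name
        scores = []
        for i in range(folds):
            held_out = y[i::folds]  # every `folds`-th value, starting at i
            kept = [v for j, v in enumerate(y) if j % folds != i]
            prediction = mean(kept)
            scores.append(mean(abs(v - prediction) for v in held_out))
        return scores
```

A stand-in like this can be useful for exercising the notebook's plumbing without fitting a real model, and it also gives a naive baseline to compare against.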


## 5. Experiments with the models defined in params.yaml
Each `ModelTrainer` is run with the hyperparameters specified there.


In [None]:
results = {}
cv_details = {}

# Linear Regression
lr_params = cfg['train']['linear_regression']
results['linear_regression'], cv_details['linear_regression'] = run_experiment(
    LRTrainer, lr_params, 'LinearRegression'
)

# Random Forest
rf_params = cfg['train']['random_forest']
results['random_forest'], cv_details['random_forest'] = run_experiment(
    RFTrainer, rf_params, 'RandomForestRegressor'
)

# XGBoost
xgb_params = cfg['train']['xgboost']
results['xgboost'], cv_details['xgboost'] = run_experiment(
    XGBTrainer, xgb_params, 'XGBRegressor'
)


## 6. Results comparison
The training and test metrics of all models are consolidated into a single table.


In [None]:
results_df = pd.DataFrame(results).T
results_df.round(4)
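
Once the metrics sit side by side, a common next step is to order the models by one of them. The sketch below does this on a plain dictionary with the same shape as `results`. The helper `rank_models`, the metric name `test_rmse`, and the numbers in `example_results` are hypothetical and are not outputs of this notebook.

```python
def rank_models(results, metric, lower_is_better=True):
    """Return model names sorted from best to worst on `metric`."""
    return sorted(
        results,
        key=lambda name: results[name][metric],
        reverse=not lower_is_better,
    )


# Made-up numbers for illustration only.
example_results = {
    "linear_regression": {"test_rmse": 5.0},
    "random_forest": {"test_rmse": 3.5},
    "xgboost": {"test_rmse": 3.2},
}
print(rank_models(example_results, "test_rmse"))
# ['xgboost', 'random_forest', 'linear_regression']
```

For a score where higher is better, such as R², pass `lower_is_better=False`.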


## 7. Cross-validation results
The CV scores are converted to plain lists and printed as JSON for later reference.


In [None]:
cv_json = {name: scores.tolist() if hasattr(scores, 'tolist') else list(scores)
           for name, scores in cv_details.items()}
print(json.dumps(cv_json, indent=2))
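
Raw per-fold scores are hard to compare at a glance, so it may help to reduce them to a mean and a spread per model. This is a minimal sketch that assumes each entry of `cv_json` is a list of numeric fold scores; `summarize_cv` is a hypothetical helper and uses only the standard library.

```python
from statistics import mean, stdev


def summarize_cv(cv_scores):
    """Map each model name to the mean, sample std, and fold count of its scores."""
    summary = {}
    for name, scores in cv_scores.items():
        scores = [float(s) for s in scores]
        summary[name] = {
            "mean": mean(scores),
            # stdev needs at least two values; fall back to 0.0 for a single fold.
            "std": stdev(scores) if len(scores) > 1 else 0.0,
            "folds": len(scores),
        }
    return summary
```

Here `summarize_cv(cv_json)` would return one small dictionary per model, which can be printed with `json.dumps` in the same way as the raw scores.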
