# 📊 MODEL DEVELOPMENT MASTER NOTEBOOK - MAIN COORDINATOR
## Multimodal Alzheimer's Monitoring and Prevention - Phase 4
 
**Objective**: Orchestrate the end-to-end development of models for predicting Alzheimer's risk
 
### 🎯 Phase 4 Objectives:
 - **Regression**: Predict `composite_risk_score` (continuous, 0-1)
 - **Classification**: Predict `risk_category` (Low/Moderate/High)
 - **Temporal Analysis**: Models of longitudinal progression
 - **Stratification**: Segment patients by risk
 
### 📋 Development Pipeline:
1. **ML environment setup** (MLflow, cross-validation)
2. **Data preparation** (splits, validation)
3. **Regression Models** → `09b_regression_models.ipynb`
4. **Classification Models** → `09c_classification_models.ipynb`
5. **Temporal Analysis** → `09d_temporal_analysis.ipynb`
6. **Risk Stratification** → `09e_risk_stratification.ipynb`
7. **Final Evaluation and Selection**

## Import libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

# ML Libraries
from sklearn.model_selection import train_test_split, StratifiedKFold, TimeSeriesSplit
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, RobustScaler
from sklearn.metrics import mean_squared_error, r2_score, classification_report
from sklearn.pipeline import Pipeline

# MLflow for experiment tracking
import mlflow
import mlflow.sklearn
import mlflow.xgboost
from mlflow.tracking import MlflowClient

# Utilities
import json
import os
from datetime import datetime
import sys
sys.path.append('../scripts/modeling')

In [2]:
# Plot style configuration
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

print("Starting Phase 4: Model Development")
print("=" * 60)

Starting Phase 4: Model Development


## 📁 PATH CONFIGURATION AND DATA LOADING

In [3]:
print("📁 Configuring paths and loading data...")

# File paths
DATA_PATH = '../data/processed/features/'
METADATA_PATH = '../data/processed/features/'
MODELS_PATH = '../models/'
RESULTS_PATH = '../reports/modeling/'

# Create directories if they do not exist
os.makedirs(MODELS_PATH, exist_ok=True)
os.makedirs(RESULTS_PATH, exist_ok=True)

# Load the final processed dataset
print("📊 Loading final dataset...")
df_final = pd.read_csv(f'{DATA_PATH}alzheimer_features_selected_20250605.csv')

# Load feature engineering metadata
print("📋 Loading feature engineering metadata...")
with open(f'{METADATA_PATH}feature_engineering_metadata_20250605.json', 'r') as f:
    fe_metadata = json.load(f)

print(f"✅ Dataset loaded: {df_final.shape}")
print(f"✅ Selected features: {len(fe_metadata['selected_features'])}")
print(f"✅ Valid records: {df_final['composite_risk_score'].notna().sum()}")


📁 Configuring paths and loading data...
📊 Loading final dataset...
📋 Loading feature engineering metadata...
✅ Dataset loaded: (48466, 189)
✅ Selected features: 192
✅ Valid records: 48466
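
The output reports 192 selected features in the metadata but only 189 columns in the loaded frame, so at least some metadata names cannot be present in the dataset. The sketch below shows one way to surface such a mismatch. The helper `check_feature_alignment` and the toy column names are hypothetical and not part of the project.

```python
import pandas as pd


def check_feature_alignment(df: pd.DataFrame, selected_features: list) -> dict:
    """Compare feature names from metadata with the columns actually loaded."""
    columns = set(df.columns)
    selected = set(selected_features)
    return {
        # Named in the metadata but absent from the DataFrame
        "missing_from_data": sorted(selected - columns),
        # Present in the DataFrame but not listed in the metadata
        "not_in_metadata": sorted(columns - selected),
    }


# Toy stand-ins for df_final and fe_metadata['selected_features']
toy_df = pd.DataFrame({"age": [70, 82], "mmse": [28, 21], "risk_category": ["Low", "High"]})
toy_selected = ["age", "mmse", "apoe4_count"]

report = check_feature_alignment(toy_df, toy_selected)
print(report)
```

Running the same comparison on `df_final.columns` and `fe_metadata['selected_features']` would list exactly which names differ.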


## 🔧 MLFLOW CONFIGURATION

In [5]:
print("\n🔧 Configuring MLflow for experiment tracking...")

# Configure MLflow
mlflow.set_tracking_uri("file:../mlruns")
experiment_name = "Alzheimer_Multimodal_Monitoring_Phase4"

# Reuse the experiment if it already exists; otherwise create it
experiment = mlflow.get_experiment_by_name(experiment_name)
if experiment is None:
    experiment_id = mlflow.create_experiment(experiment_name)
    print(f"✅ Experiment created: {experiment_name}")
else:
    experiment_id = experiment.experiment_id
    print(f"✅ Existing experiment: {experiment_name}")

mlflow.set_experiment(experiment_name)

# MLflow client for advanced management
client = MlflowClient()


🔧 Configuring MLflow for experiment tracking...
✅ Experiment created: Alzheimer_Multimodal_Monitoring_Phase4


## 📊 PRELIMINARY DATASET ANALYSIS

In [6]:
print("\n📊 Running preliminary dataset analysis...")

# Basic information
print("🔍 BASIC DATASET INFORMATION:")
print(f"  • Dataset shape: {df_final.shape}")
print(f"  • Unique records: {df_final.drop_duplicates().shape[0]}")
print(f"  • Memory usage: {df_final.memory_usage(deep=True).sum() / 1024**2:.1f} MB")

# Target variables
target_continuous = 'composite_risk_score'
target_categorical = 'risk_category'

print(f"\n🎯 TARGET VARIABLES:")
print(f"  • Continuous: {target_continuous}")
print(f"  • Categorical: {target_categorical}")

# Distribution of the continuous target
print(f"\n📈 DISTRIBUTION - {target_continuous.upper()}:")
target_stats = df_final[target_continuous].describe()
for stat, value in target_stats.items():
    print(f"  • {stat}: {value:.4f}")

# Distribution of the categorical target
print(f"\n📊 DISTRIBUTION - {target_categorical.upper()}:")
risk_dist = df_final[target_categorical].value_counts()
risk_pct = df_final[target_categorical].value_counts(normalize=True) * 100
for category in risk_dist.index:
    print(f"  • {category}: {risk_dist[category]:,} ({risk_pct[category]:.1f}%)")



📊 Running preliminary dataset analysis...
🔍 BASIC DATASET INFORMATION:
  • Dataset shape: (48466, 189)
  • Unique records: 48198
  • Memory usage: 92.6 MB

🎯 TARGET VARIABLES:
  • Continuous: composite_risk_score
  • Categorical: risk_category

📈 DISTRIBUTION - COMPOSITE_RISK_SCORE:
  • count: 48466.0000
  • mean: 0.3671
  • std: 0.2128
  • min: 0.0000
  • 25%: 0.1489
  • 50%: 0.3631
  • 75%: 0.5714
  • max: 0.9286

📊 DISTRIBUTION - RISK_CATEGORY:
  • Low: 22,501 (46.4%)
  • Moderate: 22,345 (46.1%)
  • High: 3,620 (7.5%)
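
The High category holds only 7.5% of the records, which is why the classifier grids later consider `class_weight='balanced'`. As a rough illustration, the sketch below applies the usual balanced-weight heuristic, `n_samples / (n_classes * class_count)`, to the counts printed above. Only the counts come from the notebook; the variable names are illustrative.

```python
import pandas as pd

# Class counts as reported in the notebook output
risk_counts = pd.Series({"Low": 22501, "Moderate": 22345, "High": 3620})

# Balanced heuristic: n_samples / (n_classes * class_count)
class_weights = risk_counts.sum() / (len(risk_counts) * risk_counts)

# How many majority-class records exist per minority-class record
imbalance_ratio = risk_counts.max() / risk_counts.min()

print(class_weights.round(3).to_dict())
print(f"Majority/minority ratio: {imbalance_ratio:.2f}")
```

Under this heuristic every class contributes the same total weight, so errors on High-risk records count several times more than errors on Low-risk ones.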


## 🧹 DATA PREPARATION FOR MODELING

In [8]:
print("\n🧹 Preparing data for modeling...")

# Columns excluded from the feature set (targets and identifiers)
features_to_exclude = [target_continuous, target_categorical, 'subject_id', 'ID', 'RID', 
                      'mapped_rid', 'subject_id_clinical', 'subject_id_genetics', 
                      'subject_id_activity', 'PTID_apoe', 'RID_genetics']

# Features for modeling
feature_cols = [col for col in df_final.columns if col not in features_to_exclude]
print(f"✅ Features for modeling: {len(feature_cols)}")

# Build the feature matrix and the target vectors
X = df_final[feature_cols].copy()
y_continuous = df_final[target_continuous].copy()
y_categorical = df_final[target_categorical].copy()

# Identify and handle missing values
missing_info = X.isnull().sum()
missing_cols = missing_info[missing_info > 0].sort_values(ascending=False)

if len(missing_cols) > 0:
    print(f"\n⚠️  MISSING VALUES DETECTED:")
    for col, missing_count in missing_cols.head(10).items():
        missing_pct = (missing_count / len(X)) * 100
        print(f"  • {col}: {missing_count:,} ({missing_pct:.1f}%)")
    
    # Simple imputation for the initial analysis. The median is only defined for
    # numeric columns, so non-numeric columns get their most frequent value instead.
    numeric_cols = X.select_dtypes(include='number').columns
    other_cols = X.columns.difference(numeric_cols)
    X[numeric_cols] = X[numeric_cols].fillna(X[numeric_cols].median())
    for col in other_cols:
        modes = X[col].mode()
        if not modes.empty:
            X[col] = X[col].fillna(modes.iloc[0])
    print("✅ Missing values imputed (median for numeric columns, mode for the rest)")

print(f"\n✅ Dataset prepared: {X.shape}")
print(f"✅ Valid targets: {len(y_continuous.dropna())}")


🧹 Preparing data for modeling...
✅ Features for modeling: 178

⚠️  MISSING VALUES DETECTED:
  • DXAD: 36,053 (74.4%)
  • DXNORM: 36,053 (74.4%)
  • CDRSB_CHANGE_ANNUAL_normalized: 28,462 (58.7%)
  • CDRSB_CHANGE_normalized: 28,447 (58.7%)
  • APTESTDT_year: 27,450 (56.6%)
  • USERDATE_day: 27,450 (56.6%)
  • APTESTDT_years_ago: 27,450 (56.6%)
  • USERDATE_years_ago: 27,450 (56.6%)
  • USERDATE_year: 27,450 (56.6%)
  • SITEID: 27,450 (56.6%)
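
Median imputation only applies to numeric columns, and computing imputation statistics on the full dataset before splitting lets test-set information influence training. One possible alternative, sketched here with scikit-learn's `ColumnTransformer`, imputes by column type and learns the statistics from the training rows only. The toy frame and its column names are hypothetical, and one-hot encoding is an assumed choice for the string column, not something the notebook specifies.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy frame: column names are illustrative, not the project's
toy = pd.DataFrame({
    "age": [70.0, np.nan, 82.0, 65.0, 77.0, np.nan],
    "score": [1.5, 2.0, np.nan, 0.5, 3.0, 1.0],
    "sex": pd.Series(["F", "M", np.nan, "F", "M", "F"], dtype=object),
})
train, test = toy.iloc[:4], toy.iloc[4:]

numeric_cols = toy.select_dtypes(include="number").columns.tolist()
categorical_cols = [col for col in toy.columns if col not in numeric_cols]

preprocessor = ColumnTransformer(
    transformers=[
        # Numeric columns: fill gaps with the training median
        ("num", SimpleImputer(strategy="median"), numeric_cols),
        # String columns: fill gaps with the training mode, then one-hot encode
        ("cat", Pipeline([
            ("impute", SimpleImputer(strategy="most_frequent")),
            ("encode", OneHotEncoder(handle_unknown="ignore")),
        ]), categorical_cols),
    ],
    sparse_threshold=0.0,  # always return a dense array
)

# Statistics are learned from the training rows and reused on the test rows
train_ready = preprocessor.fit_transform(train)
test_ready = preprocessor.transform(test)
print(train_ready.shape, test_ready.shape)
```

Placing such a preprocessor as the first step of each model's `Pipeline` would also keep the imputation inside every cross-validation fold.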

## 🔄 CROSS-VALIDATION CONFIGURATION

In [None]:
print("\n🔄 Configuring validation strategies...")

# Validation parameters
RANDOM_STATE = 42
TEST_SIZE = 0.2
N_SPLITS = 5

# Stratified split based on risk_category
print("📊 Performing stratified split...")
X_train, X_test, y_train_cont, y_test_cont, y_train_cat, y_test_cat = train_test_split(
    X, y_continuous, y_categorical,
    test_size=TEST_SIZE,
    stratify=y_categorical,
    random_state=RANDOM_STATE
)

print(f"✅ Training set: {X_train.shape}")
print(f"✅ Test set: {X_test.shape}")

# Stratified cross-validation
skf = StratifiedKFold(n_splits=N_SPLITS, shuffle=True, random_state=RANDOM_STATE)

print(f"✅ Cross-validation: {N_SPLITS} stratified folds")

# Verify the class distribution in each split
print(f"\n📊 DISTRIBUTION IN SPLITS:")
print("Training set:")
train_dist = y_train_cat.value_counts(normalize=True) * 100
for category, pct in train_dist.items():
    print(f"  • {category}: {pct:.1f}%")

print("Test set:")
test_dist = y_test_cat.value_counts(normalize=True) * 100
for category, pct in test_dist.items():
    print(f"  • {category}: {pct:.1f}%")
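
The split above stratifies by `risk_category` but samples individual rows. If one subject contributes several rows (the dataset carries identifiers such as `subject_id` and `RID`, and Phase 4 includes longitudinal analysis), rows from the same subject can land in both train and test. The following is a minimal sketch of a subject-level split with `GroupShuffleSplit`, assuming a single group label per row. The toy data is synthetic, and whether the real dataset has repeated subjects is an assumption to verify.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Synthetic stand-in: 20 subjects with 5 visits each
rng = np.random.default_rng(42)
toy = pd.DataFrame({
    "subject_id": np.repeat(np.arange(20), 5),
    "feature": rng.normal(size=100),
})

# test_size is a fraction of groups (subjects), not of rows
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(splitter.split(toy, groups=toy["subject_id"]))

train_subjects = set(toy.iloc[train_idx]["subject_id"])
test_subjects = set(toy.iloc[test_idx]["subject_id"])
print(f"Train subjects: {len(train_subjects)}, test subjects: {len(test_subjects)}")
print(f"Shared subjects: {len(train_subjects & test_subjects)}")
```

For cross-validation, `StratifiedGroupKFold` offers a comparable option that keeps groups intact while approximately preserving class proportions.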

## 🎯 ALGORITHM AND PIPELINE CONFIGURATION

In [None]:
print("\n🎯 Configuring algorithms for development...")

# Regression algorithms (composite_risk_score)
REGRESSION_ALGORITHMS = {
    'RandomForest': {
        'estimator': 'RandomForestRegressor',
        'params': {
            'n_estimators': [100, 200, 300],
            'max_depth': [10, 15, 20, None],
            'min_samples_split': [5, 10],
            'min_samples_leaf': [2, 4],
            'max_features': ['sqrt', 'log2']
        },
        'priority': 'high'
    },
    'XGBoost': {
        'estimator': 'XGBRegressor',
        'params': {
            'n_estimators': [100, 200, 300],
            'max_depth': [6, 8, 10],
            'learning_rate': [0.01, 0.1, 0.2],
            'subsample': [0.8, 0.9],
            'colsample_bytree': [0.8, 0.9]
        },
        'priority': 'high'
    },
    'LightGBM': {
        'estimator': 'LGBMRegressor',
        'params': {
            'n_estimators': [100, 200],
            'max_depth': [6, 8],
            'learning_rate': [0.01, 0.1],
            'num_leaves': [31, 63]
        },
        'priority': 'medium'
    },
    'ElasticNet': {
        'estimator': 'ElasticNet',
        'params': {
            'alpha': [0.001, 0.01, 0.1, 1.0],
            'l1_ratio': [0.1, 0.5, 0.7, 0.9]
        },
        'priority': 'baseline'
    }
}

# Classification algorithms (risk_category)
CLASSIFICATION_ALGORITHMS = {
    'RandomForest': {
        'estimator': 'RandomForestClassifier',
        'params': {
            'n_estimators': [100, 200, 300],
            'max_depth': [10, 15, 20],
            'min_samples_split': [5, 10],
            'class_weight': ['balanced', None]
        },
        'priority': 'high'
    },
    'XGBoost': {
        'estimator': 'XGBClassifier',
        'params': {
            'n_estimators': [100, 200],
            'max_depth': [6, 8],
            'learning_rate': [0.01, 0.1],
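            # scale_pos_weight is omitted: XGBoost applies it to binary targets only.
            # For the three-class risk_category, pass sample_weight to fit() instead.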
        },
        'priority': 'high'
    },
    'LogisticRegression': {
        'estimator': 'LogisticRegression',
        'params': {
            'C': [0.001, 0.01, 0.1, 1.0, 10.0],
            'penalty': ['l1', 'l2'],
            'solver': ['liblinear', 'saga'],
            'class_weight': ['balanced', None]
        },
        'priority': 'baseline'
    }
}

print(f"✅ Regression algorithms configured: {len(REGRESSION_ALGORITHMS)}")
print(f"✅ Classification algorithms configured: {len(CLASSIFICATION_ALGORITHMS)}")
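
Before launching the searches, it can help to know how many model fits the grids imply. The sketch below copies the regression grids from `REGRESSION_ALGORITHMS` and counts combinations with `ParameterGrid`. It assumes an exhaustive grid search with the 5 folds configured earlier; the notebook does not state which search strategy the downstream notebooks use.

```python
from sklearn.model_selection import ParameterGrid

# Regression grids copied from REGRESSION_ALGORITHMS
grids = {
    "RandomForest": {
        "n_estimators": [100, 200, 300],
        "max_depth": [10, 15, 20, None],
        "min_samples_split": [5, 10],
        "min_samples_leaf": [2, 4],
        "max_features": ["sqrt", "log2"],
    },
    "XGBoost": {
        "n_estimators": [100, 200, 300],
        "max_depth": [6, 8, 10],
        "learning_rate": [0.01, 0.1, 0.2],
        "subsample": [0.8, 0.9],
        "colsample_bytree": [0.8, 0.9],
    },
    "LightGBM": {
        "n_estimators": [100, 200],
        "max_depth": [6, 8],
        "learning_rate": [0.01, 0.1],
        "num_leaves": [31, 63],
    },
    "ElasticNet": {
        "alpha": [0.001, 0.01, 0.1, 1.0],
        "l1_ratio": [0.1, 0.5, 0.7, 0.9],
    },
}

N_SPLITS = 5  # same value as the cross-validation configuration

# Exhaustive search cost: one fit per parameter combination per fold
fit_counts = {name: len(ParameterGrid(params)) * N_SPLITS for name, params in grids.items()}
total_fits = sum(fit_counts.values())

for name, count in fit_counts.items():
    print(f"{name}: {count} fits")
print(f"Total: {total_fits} fits")
```

If the total is too expensive on roughly 48,000 rows, a randomized search over the same grids is a common way to cap the budget.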


## 📊 EVALUATION METRICS CONFIGURATION

In [None]:
print("\n📊 Configuring evaluation metrics...")

# Regression metrics (scikit-learn scorer names)
REGRESSION_METRICS = {
    'primary': ['r2', 'neg_mean_squared_error', 'neg_mean_absolute_error'],
    'secondary': ['neg_root_mean_squared_error'],
    # MAPE is unstable when the target is at or near 0
    'clinical': ['neg_mean_absolute_percentage_error']
}

# Classification metrics
CLASSIFICATION_METRICS = {
    'primary': ['accuracy', 'precision_macro', 'recall_macro', 'f1_macro'],
    'secondary': ['roc_auc_ovr', 'precision_weighted', 'recall_weighted'],
    'clinical': ['balanced_accuracy']
}

print("✅ Regression metrics configured:")
for metric_type, metrics in REGRESSION_METRICS.items():
    print(f"  • {metric_type}: {', '.join(metrics)}")

print("✅ Classification metrics configured:")
for metric_type, metrics in CLASSIFICATION_METRICS.items():
    print(f"  • {metric_type}: {', '.join(metrics)}")
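
The `neg_` prefix reflects scikit-learn's convention that a larger score is always better, so error metrics are reported with their sign flipped. The sketch below passes several of the scorer names above to `cross_validate` and converts the results back to positive errors. The data comes from `make_regression` and the `ElasticNet` settings are arbitrary; both are stand-ins, not the project's data or tuned model.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import KFold, cross_validate

# Synthetic stand-in for the feature matrix and continuous target
X_toy, y_toy = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=42)

scoring = ["r2", "neg_mean_squared_error", "neg_mean_absolute_error",
           "neg_root_mean_squared_error"]
cv = KFold(n_splits=5, shuffle=True, random_state=42)

results = cross_validate(ElasticNet(alpha=0.1), X_toy, y_toy, cv=cv, scoring=scoring)

# Flip the sign to recover ordinary (positive) error values per fold
rmse_per_fold = -results["test_neg_root_mean_squared_error"]
mse_per_fold = -results["test_neg_mean_squared_error"]

print(f"Mean R2:   {np.mean(results['test_r2']):.3f}")
print(f"Mean RMSE: {np.mean(rmse_per_fold):.3f}")
```

`KFold` is used here because `StratifiedKFold` needs class labels. Stratified folds for the regression task would have to be built from `risk_category` rather than from the continuous score.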

## 💾 SAVING CONFIGURATION AND METADATA

In [None]:
print("\n💾 Saving modeling configuration...")

# Build the metadata for the modeling phase
modeling_metadata = {
    'timestamp': datetime.now().strftime('%Y%m%d_%H%M%S'),
    'phase': 'model_development',
    'dataset_info': {
        'shape': df_final.shape,
        'features_count': len(feature_cols),
        'training_samples': len(X_train),
        'test_samples': len(X_test)
    },
    'targets': {
        'continuous': target_continuous,
        'categorical': target_categorical
    },
    'validation_config': {
        'test_size': TEST_SIZE,
        'cv_folds': N_SPLITS,
        'random_state': RANDOM_STATE,
        'stratification': 'risk_category'
    },
    'algorithms': {
        'regression': list(REGRESSION_ALGORITHMS.keys()),
        'classification': list(CLASSIFICATION_ALGORITHMS.keys())
    },
    'feature_engineering_source': fe_metadata['timestamp']
}

# Save metadata
metadata_file = f'{METADATA_PATH}modeling_metadata_{modeling_metadata["timestamp"]}.json'
with open(metadata_file, 'w') as f:
    json.dump(modeling_metadata, f, indent=2)

# Save the data splits for consistency across notebooks
print("💾 Saving data splits...")
train_indices = pd.DataFrame({'index': X_train.index})
test_indices = pd.DataFrame({'index': X_test.index})

train_indices.to_csv(f'{DATA_PATH}train_indices.csv', index=False)
test_indices.to_csv(f'{DATA_PATH}test_indices.csv', index=False)

print(f"✅ Metadata saved: {metadata_file}")
print(f"✅ Splits saved in {DATA_PATH}")
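
Saving only the index labels keeps the split files small. A downstream notebook can then rebuild the same partition by selecting rows with `.loc`. The sketch below shows that round trip in a temporary directory with a toy frame. The file name mirrors `train_indices.csv`, and everything else is illustrative.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy frame with a non-trivial index to show that labels, not positions, are stored
toy = pd.DataFrame({"x": range(10)}, index=range(100, 110))
train_index = toy.index[:8]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "train_indices.csv"
    # Same layout as the notebook: a single column named 'index'
    pd.DataFrame({"index": train_index}).to_csv(path, index=False)
    restored_index = pd.read_csv(path)["index"]

# Label-based selection rebuilds the original training rows
train_restored = toy.loc[restored_index]
print(train_restored.index.tolist())
```

This only works if the downstream notebook loads the same CSV with the same default index, so any row filtering before the split has to be repeated identically.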

## 🎯 TARGET DISTRIBUTION VISUALIZATION

In [None]:
print("\n🎯 Creating preliminary visualizations...")

# Fixed category order so each color always maps to the same risk level
category_order = ['Low', 'Moderate', 'High']
colors = ['green', 'orange', 'red']

fig, axes = plt.subplots(2, 2, figsize=(15, 12))
fig.suptitle('📊 Target Variable Analysis - Alzheimer Risk Prediction', 
             fontsize=16, fontweight='bold')

# 1. Distribution of the continuous score
axes[0, 0].hist(y_continuous.dropna(), bins=50, alpha=0.7, color='skyblue', edgecolor='black')
axes[0, 0].axvline(y_continuous.mean(), color='red', linestyle='--', 
                   label=f'Mean: {y_continuous.mean():.3f}')
axes[0, 0].set_title('Composite Risk Score Distribution')
axes[0, 0].set_xlabel('Risk Score')
axes[0, 0].set_ylabel('Frequency')
axes[0, 0].legend()

# 2. Categorical distribution
risk_counts = y_categorical.value_counts().reindex(category_order, fill_value=0)
axes[0, 1].pie(risk_counts.values, labels=risk_counts.index, autopct='%1.1f%%',
               colors=colors, startangle=90)
axes[0, 1].set_title('Risk Category Distribution')

# 3. Boxplot by category
df_plot = pd.DataFrame({
    'risk_score': y_continuous,
    'risk_category': y_categorical
}).dropna()

sns.boxplot(data=df_plot, x='risk_category', y='risk_score',
            order=category_order, ax=axes[1, 0])
axes[1, 0].set_title('Risk Score by Category')
axes[1, 0].set_xlabel('Risk Category')
axes[1, 0].set_ylabel('Composite Risk Score')

# 4. Train/test distribution
split_data = pd.DataFrame({
    'split': ['Train'] * len(y_train_cat) + ['Test'] * len(y_test_cat),
    'category': list(y_train_cat) + list(y_test_cat)
})

# unstack() sorts columns alphabetically, so restore the Low/Moderate/High order
split_counts = (
    split_data.groupby(['split', 'category']).size()
    .unstack(fill_value=0)
    .reindex(columns=category_order, fill_value=0)
)
split_counts.plot(kind='bar', ax=axes[1, 1], color=colors)
axes[1, 1].set_title('Train/Test Distribution by Category')
axes[1, 1].set_xlabel('Split')
axes[1, 1].set_ylabel('Number of Samples')
axes[1, 1].legend(title='Risk Category')
axes[1, 1].tick_params(axis='x', rotation=0)

plt.tight_layout()
plt.savefig(f'{RESULTS_PATH}target_distributions_analysis.png', dpi=300, bbox_inches='tight')
plt.show()

## 🚀 NOTEBOOK EXECUTION PLAN

In [None]:
print("\n🚀 EXECUTION PLAN - PHASE 4: MODEL DEVELOPMENT")
print("=" * 65)

execution_plan = {
    '04b_regression_models.ipynb': {
        'objetivo': 'Predict composite_risk_score (regression)',
        'algoritmos': list(REGRESSION_ALGORITHMS.keys()),
        'prioridad': 'High',
        'tiempo_estimado': '2-3 hours',
        'dependencias': ['Data prepared', 'MLflow configured']
    },
    '04c_classification_models.ipynb': {
        'objetivo': 'Predict risk_category (classification)',
        'algoritmos': list(CLASSIFICATION_ALGORITHMS.keys()),
        'prioridad': 'High',
        'tiempo_estimado': '2-3 hours',
        'dependencias': ['Regression completed']
    },
    '04d_temporal_analysis.ipynb': {
        'objetivo': 'Time series analysis and progression',
        'algoritmos': ['LSTM', 'ARIMA', 'Time Series RF'],
        'prioridad': 'Medium',
        'tiempo_estimado': '3-4 hours',
        'dependencias': ['Base models completed']
    },
    '04e_risk_stratification.ipynb': {
        'objetivo': 'Patient stratification and segmentation',
        'algoritmos': ['Clustering', 'Survival Analysis'],
        'prioridad': 'Medium',
        'tiempo_estimado': '2 hours',
        'dependencias': ['Temporal analysis completed']
    }
}

for notebook, info in execution_plan.items():
    print(f"\n📓 {notebook}")
    print(f"   🎯 Objective: {info['objetivo']}")
    print(f"   🤖 Algorithms: {', '.join(info['algoritmos'])}")
    print(f"   ⭐ Priority: {info['prioridad']}")
    print(f"   ⏱️  Estimated time: {info['tiempo_estimado']}")
    print(f"   📋 Dependencies: {', '.join(info['dependencias'])}")
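
The plan records its dependencies as free text, which works for reading but cannot be checked by code. As an illustration, the sketch below encodes one reading of those dependencies as explicit notebook-to-notebook edges and derives a run order with the standard library's `graphlib`. The edge set is an assumption based on the descriptions above, not something the notebook defines.

```python
from graphlib import TopologicalSorter

# Assumed reading of the plan's free-text dependencies:
# each key maps to the notebooks that must finish before it starts.
notebook_dependencies = {
    "04b_regression_models.ipynb": set(),
    "04c_classification_models.ipynb": {"04b_regression_models.ipynb"},
    "04d_temporal_analysis.ipynb": {
        "04b_regression_models.ipynb",
        "04c_classification_models.ipynb",
    },
    "04e_risk_stratification.ipynb": {"04d_temporal_analysis.ipynb"},
}

# static_order() yields every notebook after all of its prerequisites
run_order = list(TopologicalSorter(notebook_dependencies).static_order())

for step, notebook in enumerate(run_order, start=1):
    print(f"{step}. {notebook}")
```

`TopologicalSorter` raises `CycleError` if the edges ever contradict each other, which gives a cheap consistency check as the plan grows.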


## ✅ FINAL VERIFICATION AND NEXT STEP

In [None]:
print("\n" + "="*65)
print("✅ CONFIGURATION COMPLETE - PHASE 4: MODEL DEVELOPMENT")
print("="*65)

# Final checks
checks = {
    'Data loaded correctly': df_final.shape[0] > 0,
    'Features selected': len(feature_cols) > 0,
    'Splits created': len(X_train) > 0 and len(X_test) > 0,
    'MLflow configured': mlflow.get_experiment_by_name(experiment_name) is not None,
    'Valid targets': y_continuous.notna().sum() > 0,
    'Metadata saved': os.path.exists(metadata_file)
}

print("\n🔍 FINAL CHECKS:")
for check, status in checks.items():
    status_symbol = "✅" if status else "❌"
    print(f"   {status_symbol} {check}")

# Preparation summary
print(f"\n📊 PREPARATION SUMMARY:")
print(f"   • Dataset: {df_final.shape[0]:,} records, {len(feature_cols)} features")
print(f"   • Training: {len(X_train):,} samples")
print(f"   • Testing: {len(X_test):,} samples")
print(f"   • Continuous target: {target_continuous}")
print(f"   • Categorical target: {target_categorical}")
print(f"   • MLflow experiment: {experiment_name}")

print(f"\n🎯 NEXT STEP:")
print(f"   ➡️  Run: 04b_regression_models.ipynb")
print(f"   🎯 Objective: Develop regression models for composite_risk_score")
print(f"   📊 Priority algorithms: Random Forest, XGBoost")

print("\n🚀 Phase 4 is ready to begin model development!")
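
The checks above are only printed, so a failed check would not stop a "Run All" execution. A small, hypothetical helper such as `assert_all_checks` below could turn the same dictionary into a hard stop. The name and the toy check values are illustrative.

```python
def assert_all_checks(checks: dict) -> None:
    """Raise RuntimeError naming every check whose value is falsy."""
    failed = [name for name, passed in checks.items() if not passed]
    if failed:
        raise RuntimeError("Setup checks failed: " + ", ".join(failed))


# Toy example with one failing check
toy_checks = {"Data loaded correctly": True, "Splits created": False}

try:
    assert_all_checks(toy_checks)
    error_message = None
except RuntimeError as exc:
    error_message = str(exc)
    print(error_message)
```

Calling `assert_all_checks(checks)` at the end of the verification cell would halt the notebook before the downstream notebooks are started on incomplete setup.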


## 📝 LOGGING THE CONFIGURATION TO MLFLOW

In [None]:
print("📝 Logging initial configuration to MLflow...")

with mlflow.start_run(run_name="Phase4_Setup_Configuration"):
    # Log configuration parameters
    mlflow.log_param("phase", "model_development_setup")
    mlflow.log_param("dataset_shape", f"{df_final.shape[0]}x{df_final.shape[1]}")
    mlflow.log_param("features_count", len(feature_cols))
    mlflow.log_param("train_samples", len(X_train))
    mlflow.log_param("test_samples", len(X_test))
    mlflow.log_param("cv_folds", N_SPLITS)
    mlflow.log_param("random_state", RANDOM_STATE)
    
    # Log the target distribution
    for category, count in y_categorical.value_counts().items():
        mlflow.log_metric(f"target_distribution_{category}", count)
    
    mlflow.log_metric("target_mean", y_continuous.mean())
    mlflow.log_metric("target_std", y_continuous.std())
    
    # Save artifacts
    mlflow.log_artifact(metadata_file, "metadata")
    mlflow.log_artifact(f'{RESULTS_PATH}target_distributions_analysis.png', "visualizations")
    
    # Tags
    mlflow.set_tag("project", "Alzheimer_Risk_Prediction")
    mlflow.set_tag("phase", "4_model_development")
    mlflow.set_tag("setup_status", "completed")

print("✅ Configuration logged to MLflow")
print("\n🎉 Master Coordinator completed successfully!")

---

__Abraham Tartalos__