## Training with a temporal split for CAC_source_30

In [15]:
# Initialization
import os
import sys
import pandas as pd

# Add src to the path so the project scripts can be imported
sys.path.append(os.path.abspath(os.path.join(os.getcwd(), '..', 'src')))

# Imports from the training script
from train import (
    load_and_prepare_data,
    build_preprocessor,
    train_models,
    train_stacking_model,
    save_models
)

# Load and prepare the data; the last two return values
# (presumably the test split) are discarded in this notebook
print("Loading and preparing data for CAC_source_30...")
X_train, y_train, X_val_cac, y_val_cac, _, _ = load_and_prepare_data(
    path="../data/processed/final_dataset.csv",
    target="CAC_source_30",
    date_column="first_session"
)

# Automatic preprocessing
preprocessor = build_preprocessor(X_train)

Loading and preparing data for CAC_source_30...
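
The source of `load_and_prepare_data` is not shown in this notebook, so the following is only a minimal sketch of what a temporal split could look like. The function name `temporal_split`, the 70/15/15 proportions, and the synthetic frame are assumptions for illustration; only the column names `first_session` and `CAC_source_30` come from the cell above.

```python
import pandas as pd


def temporal_split(df, date_column, target, train_frac=0.7, val_frac=0.15):
    """Hypothetical chronological split: oldest rows train, newest rows test."""
    # Sort by date so later rows never end up in an earlier partition.
    ordered = df.sort_values(date_column).reset_index(drop=True)
    train_end = int(len(ordered) * train_frac)
    val_end = train_end + int(len(ordered) * val_frac)

    parts = (
        ordered.iloc[:train_end],
        ordered.iloc[train_end:val_end],
        ordered.iloc[val_end:],
    )
    out = []
    for part in parts:
        # The date column is dropped here; a real pipeline might derive features from it.
        out.append(part.drop(columns=[target, date_column]))
        out.append(part[target])
    return tuple(out)  # X_train, y_train, X_val, y_val, X_test, y_test


# Small synthetic frame, shuffled on purpose to show that sorting matters.
demo = pd.DataFrame({
    "first_session": pd.date_range("2024-01-01", periods=20, freq="D"),
    "feature": range(20),
    "CAC_source_30": [float(i) for i in range(20)],
}).sample(frac=1.0, random_state=0)

X_tr, y_tr, X_va, y_va, X_te, y_te = temporal_split(
    demo, date_column="first_session", target="CAC_source_30"
)
```

Because `feature` grows with the date in this toy frame, every training row is older than every validation row, and every validation row is older than every test row. That ordering is the property a temporal split is meant to guarantee.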


In [5]:
# Train the individual models
print("Training base and advanced models...")
modelos = train_models(X_train, y_train, preprocessor)

Training base and advanced models...
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.012558 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 2377
[LightGBM] [Info] Number of data points in the train set: 23174, number of used features: 26
[LightGBM] [Info] Start training from score 6296.056221
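
Neither `build_preprocessor` nor `train_models` is shown here. As a rough sketch of the pattern the calls suggest, the block below builds a dtype-based `ColumnTransformer` and fits one pipeline per estimator, returning them in a dict keyed by name. The `sketch_` function names, the choice of estimators (scikit-learn models rather than LightGBM), and the toy data are all assumptions.

```python
import pandas as pd
from sklearn.base import clone
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def sketch_build_preprocessor(X):
    """Hypothetical stand-in for build_preprocessor: route columns by dtype."""
    numeric_cols = X.select_dtypes(include="number").columns.tolist()
    categorical_cols = X.select_dtypes(exclude="number").columns.tolist()
    return ColumnTransformer([
        ("num", StandardScaler(), numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ])


def sketch_train_models(X, y, preprocessor):
    """Hypothetical stand-in for train_models: one fitted pipeline per estimator."""
    estimators = {
        "ridge": Ridge(alpha=1.0),
        "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
    }
    fitted = {}
    for name, estimator in estimators.items():
        # Clone the preprocessor so each pipeline owns its fitted state.
        pipeline = Pipeline([("prep", clone(preprocessor)), ("model", estimator)])
        fitted[name] = pipeline.fit(X, y)
    return fitted


# Toy data with one numeric and one categorical column.
demo_X = pd.DataFrame({
    "sessions": [1, 4, 2, 8, 5, 3, 7, 6],
    "channel": ["ads", "seo", "ads", "email", "seo", "ads", "email", "seo"],
})
demo_y = pd.Series([10.0, 40.0, 22.0, 85.0, 50.0, 31.0, 72.0, 61.0])

demo_models = sketch_train_models(demo_X, demo_y, sketch_build_preprocessor(demo_X))
demo_preds = {name: model.predict(demo_X) for name, model in demo_models.items()}
```

Keeping the preprocessor inside each pipeline means the scaler and encoder are fitted only on the data passed to `fit`, which matters once the pipelines are cross-validated.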


In [6]:
# Ensemble model (stacking)
print("Training ensemble model (stacking)...")
stacked_model = train_stacking_model(X_train, y_train, preprocessor, modelos)
modelos["stacking"] = stacked_model

Training ensemble model (stacking)...
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.003069 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 2377
[LightGBM] [Info] Number of data points in the train set: 23174, number of used features: 26
[LightGBM] [Info] Start training from score 6296.056221
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.003649 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 2358
[LightGBM] [Info] Number of data points in the train set: 18539, number of used features: 26
[LightGBM] [Info] Start training from score 6282.456550




[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.002682 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 2353
[LightGBM] [Info] Number of data points in the train set: 18539, number of used features: 26
[LightGBM] [Info] Start training from score 6315.656794




[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.008954 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 2362
[LightGBM] [Info] Number of data points in the train set: 18539, number of used features: 26
[LightGBM] [Info] Start training from score 6292.274192




[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.002486 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 2359
[LightGBM] [Info] Number of data points in the train set: 18539, number of used features: 26
[LightGBM] [Info] Start training from score 6298.552624




[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.009313 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 2361
[LightGBM] [Info] Number of data points in the train set: 18540, number of used features: 26
[LightGBM] [Info] Start training from score 6291.341199


  return linalg.solve(A, Xy, assume_a="pos", overwrite_a=True).T
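
The log above shows one LightGBM fit on the full 23174 rows followed by five fits on roughly four fifths of the data. That pattern is consistent with scikit-learn's `StackingRegressor`, which fits each base estimator on the full training set and then produces out-of-fold predictions for the meta-learner. The trailing `linalg.solve` line appears to be the tail of a warning raised by a ridge-style solver. Since `train_stacking_model` is not shown, the sketch below is an assumption about its general shape, using scikit-learn estimators in place of LightGBM and synthetic data.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import Ridge

# Synthetic regression problem standing in for the real training data.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

stack = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=30, random_state=0)),
        ("gbr", GradientBoostingRegressor(random_state=0)),
    ],
    # The meta-learner is trained on the base models' out-of-fold predictions.
    final_estimator=Ridge(alpha=1.0),
    cv=5,
)
stack.fit(X, y)

# One meta-learner coefficient per base model.
meta_weights = stack.final_estimator_.coef_
sample_pred = stack.predict(X[:5])
```

Training the meta-learner on out-of-fold predictions rather than in-sample predictions keeps it from simply rewarding whichever base model overfits the most.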


In [7]:
# Save the models
print("Saving models to the /models folder...")
save_models(modelos, target_name="CAC_source_30", save_path="../models/")

print("Training complete. Test set available for evaluation.")

Saving models to the /models folder...
Models saved successfully to ../models/
Training complete. Test set available for evaluation.
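
How `save_models` writes its files is not visible from this notebook. One common approach, sketched below under that caveat, is to dump each fitted model with `joblib` under a name built from the target and the model key. The `{target_name}_{name}.joblib` naming scheme and the `sketch_save_models` function are hypothetical.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression


def sketch_save_models(models, target_name, save_path):
    """Hypothetical stand-in for save_models: one joblib file per model."""
    out_dir = Path(save_path)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, model in models.items():
        file_path = out_dir / f"{target_name}_{name}.joblib"  # assumed naming scheme
        joblib.dump(model, file_path)
        written.append(file_path)
    return written


# Fit a tiny model, save it to a temporary folder, and load it back.
X_demo = np.arange(10, dtype=float).reshape(-1, 1)
y_demo = 3.0 * X_demo.ravel() + 1.0
model = LinearRegression().fit(X_demo, y_demo)
original_pred = model.predict(X_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    paths = sketch_save_models({"linear": model}, "demo_target", tmp_dir)
    saved_names = [p.name for p in paths]
    restored_pred = joblib.load(paths[0]).predict(X_demo)
```

Saving the whole pipeline, preprocessor included, means the evaluation notebook can call `predict` on raw features without rebuilding the preprocessing steps.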


## Training with a temporal split for LTV_180

In [16]:
# Load and prepare the data
print("Loading and preparing data for LTV_180...")
X_train, y_train, X_val_ltv, y_val_ltv, _, _ = load_and_prepare_data(
    path="../data/processed/final_dataset.csv",
    target="LTV_180",
    date_column="first_session"
)
# Preprocessing
preprocessor = build_preprocessor(X_train)

Loading and preparing data for LTV_180...


In [9]:
# Train base and advanced models
print("Training models...")
modelos = train_models(X_train, y_train, preprocessor)

Training models...
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.011495 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 2299
[LightGBM] [Info] Number of data points in the train set: 22294, number of used features: 26
[LightGBM] [Info] Start training from score 7.413806


In [10]:
# Ensemble (stacking)
print("Training stacking model...")
stacked_model = train_stacking_model(X_train, y_train, preprocessor, modelos)
modelos["stacking"] = stacked_model

Training stacking model...
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.002829 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 2299
[LightGBM] [Info] Number of data points in the train set: 22294, number of used features: 26
[LightGBM] [Info] Start training from score 7.413806
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.003054 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 2280
[LightGBM] [Info] Number of data points in the train set: 17835, number of used features: 26
[LightGBM] [Info] Start training from score 7.579161




[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.002250 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 2276
[LightGBM] [Info] Number of data points in the train set: 17835, number of used features: 26
[LightGBM] [Info] Start training from score 6.906667




[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.001937 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 2285
[LightGBM] [Info] Number of data points in the train set: 17835, number of used features: 26
[LightGBM] [Info] Start training from score 7.715320




[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.002672 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 2281
[LightGBM] [Info] Number of data points in the train set: 17835, number of used features: 26
[LightGBM] [Info] Start training from score 7.115687




[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.011271 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 2283
[LightGBM] [Info] Number of data points in the train set: 17836, number of used features: 26
[LightGBM] [Info] Start training from score 7.752178


  return linalg.solve(A, Xy, assume_a="pos", overwrite_a=True).T
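
The per-fold training sizes in both stacking logs (18539 and 18540 rows for CAC_source_30, 17835 and 17836 for LTV_180) match what an unshuffled 5-fold split produces for 23174 and 22294 rows. The check below reproduces those numbers with scikit-learn's `KFold`. That the training script actually uses 5-fold cross-validation is inferred from the logs, not confirmed by the source.

```python
import numpy as np
from sklearn.model_selection import KFold


def kfold_train_sizes(n_samples, n_splits=5):
    """Return the number of training rows in each fold of a plain K-fold split."""
    placeholder = np.zeros((n_samples, 1))  # only the row count matters
    splitter = KFold(n_splits=n_splits)
    return [len(train_idx) for train_idx, _ in splitter.split(placeholder)]


cac_sizes = kfold_train_sizes(23174)  # -> [18539, 18539, 18539, 18539, 18540]
ltv_sizes = kfold_train_sizes(22294)  # -> [17835, 17835, 17835, 17835, 17836]
```

The uneven last fold comes from the remainder: both row counts leave 4 when divided by 5, so the first four held-out folds get one extra row and their training sets are one row smaller.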


In [11]:
# Save
print("Saving models...")
save_models(modelos, target_name="LTV_180", save_path="../models/")

print("Training complete; test set ready for evaluation.")

Saving models...
Models saved successfully to ../models/
Training complete; test set ready for evaluation.
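
The notebook keeps `X_val_cac`, `y_val_cac`, `X_val_ltv`, and `y_val_ltv` for a later evaluation step that is not shown. As a hedged illustration of that step, the sketch below scores a dict of fitted models on a held-out set with MAE, RMSE, and R². The `sketch_evaluate` helper, the metric choice, and the synthetic data are assumptions, not the project's evaluation code.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def sketch_evaluate(models, X_val, y_val):
    """Hypothetical helper: regression metrics for each fitted model."""
    report = {}
    for name, model in models.items():
        pred = model.predict(X_val)
        report[name] = {
            "mae": float(mean_absolute_error(y_val, pred)),
            "rmse": float(np.sqrt(mean_squared_error(y_val, pred))),
            "r2": float(r2_score(y_val, pred)),
        }
    return report


# Synthetic linear data; the last 50 rows play the role of the validation set.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)
X_fit, y_fit = X_demo[:150], y_demo[:150]
X_hold, y_hold = X_demo[150:], y_demo[150:]

demo_models = {
    "mean_baseline": DummyRegressor().fit(X_fit, y_fit),
    "linear": LinearRegression().fit(X_fit, y_fit),
}
report = sketch_evaluate(demo_models, X_hold, y_hold)
```

Including a mean-prediction baseline gives each metric a reference point: a model that cannot beat it on the held-out period has learned little that transfers forward in time.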
