In this notebook we build and train the first models on the dataset created in EDA.ipynb.

In [1]:
import pandas as pd
import numpy as np

TRAIN_PATH = "data/processed/monaco_2025_colapinto_train.csv"
TEST_PATH  = "data/processed/monaco_2025_colapinto_test.csv"

train_df = pd.read_csv(TRAIN_PATH)
test_df  = pd.read_csv(TEST_PATH)

print(train_df.shape, test_df.shape)
train_df.head()


(74, 33) (35, 33)


Unnamed: 0,Time,Driver,DriverNumber,LapTime,LapNumber,Stint,PitOutTime,PitInTime,Sector1Time,Sector2Time,...,LapStartTime,LapStartDate,TrackStatus,Position,Deleted,DeletedReason,FastF1Generated,IsAccurate,LapTime_s,Session
0,0 days 00:20:13.553000,COL,43,0 days 00:01:21.554000,2.0,1.0,,,0 days 00:00:21.906000,0 days 00:00:38.005000,...,0 days 00:18:51.999000,2025-05-23 11:33:52.006,1,,False,,False,True,81.554,FP1
1,0 days 00:21:31.817000,COL,43,0 days 00:01:18.264000,3.0,1.0,,,0 days 00:00:20.706000,0 days 00:00:36.862000,...,0 days 00:20:13.553000,2025-05-23 11:35:13.560,1,,False,,False,True,78.264,FP1
2,0 days 00:32:59.095000,COL,43,0 days 00:01:17.385000,7.0,2.0,,,0 days 00:00:20.513000,0 days 00:00:36.473000,...,0 days 00:31:41.710000,2025-05-23 11:46:41.717,1,,False,,False,True,77.385,FP1
3,0 days 00:35:56.644000,COL,43,0 days 00:01:16.777000,9.0,2.0,,,0 days 00:00:20.280000,0 days 00:00:35.999000,...,0 days 00:34:39.867000,2025-05-23 11:49:39.874,1,,False,,False,True,76.777,FP1
4,0 days 00:38:50.930000,COL,43,0 days 00:01:15.875000,11.0,2.0,,,0 days 00:00:20.053000,0 days 00:00:35.723000,...,0 days 00:37:35.055000,2025-05-23 11:52:35.062,1,,False,,False,True,75.875,FP1


Make sure LapTime_s is available as a numeric target

In [2]:
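def ensure_laptime_seconds(df):
    """Return a copy of df with a numeric LapTime_s column (lap time in seconds)."""
    df = df.copy()
    if "LapTime_s" in df.columns:
        # Already present, nothing to do
        return df

    if "LapTime" not in df.columns:
        raise ValueError("Neither LapTime nor LapTime_s found in the dataframe.")

    if pd.api.types.is_numeric_dtype(df["LapTime"]):
        # Assumption: a plain numeric LapTime is already expressed in seconds
        df["LapTime_s"] = df["LapTime"].astype(float)
        return df

    # Parse strings into timedeltas (no-op if the column is already timedelta)
    df["LapTime"] = pd.to_timedelta(df["LapTime"])
    df["LapTime_s"] = df["LapTime"].dt.total_seconds()
    return df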

train_df = ensure_laptime_seconds(train_df)
test_df  = ensure_laptime_seconds(test_df)

train_df[["LapTime", "LapTime_s"]].head()


Unnamed: 0,LapTime,LapTime_s
0,0 days 00:01:21.554000,81.554
1,0 days 00:01:18.264000,78.264
2,0 days 00:01:17.385000,77.385
3,0 days 00:01:16.777000,76.777
4,0 days 00:01:15.875000,75.875
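
As a small illustration of the conversion that `ensure_laptime_seconds` falls back to when `LapTime_s` is missing, the sketch below parses two lap-time strings copied from the table above. The `sample_laps` name is hypothetical and only used here.

```python
import pandas as pd

# Two LapTime values copied from the output above (as read from the CSV, they are strings)
sample_laps = pd.Series(["0 days 00:01:21.554000", "0 days 00:01:18.264000"])

# Strings are parsed into timedelta64 values
as_timedelta = pd.to_timedelta(sample_laps)

# .dt.total_seconds() turns timedeltas into plain floats (seconds)
seconds = as_timedelta.dt.total_seconds()

print(as_timedelta.dtype)
print(seconds.round(3).tolist())
```

The rounded values should agree with the `LapTime_s` column shown above for the same laps.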


Define the "legal" columns for X

In [None]:
LEGAL_FEATURES = [
    "LapNumber",
    "Stint",
    "TyreLife",
    "Position",
    "Session",
    "Compound",
    "FreshTyre",
    "TrackStatus",
]

LEGAL_FEATURES = [c for c in LEGAL_FEATURES if c in train_df.columns]

print("Legal features to use:")
print(LEGAL_FEATURES)


Legal features to use:
['LapNumber', 'Stint', 'TyreLife', 'Position', 'Session', 'Compound', 'FreshTyre', 'TrackStatus']
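
The notebook does not spell out what makes a column "legal". One plausible reading, which is an assumption here, is that a feature must not be derived from the timing of the lap being predicted. Under that assumption, a quick guard against target leakage could look like the sketch below. `LEAKY_COLUMNS` and `find_leaky_features` are hypothetical names, and the set of leaky columns is illustrative rather than exhaustive.

```python
# Assumption: these columns are derived from the lap's own timing and would leak the target.
LEAKY_COLUMNS = {"LapTime", "LapTime_s", "Sector1Time", "Sector2Time"}


def find_leaky_features(features, leaky=LEAKY_COLUMNS):
    """Return the sorted list of features that also appear in the leaky set."""
    return sorted(set(features) & set(leaky))


# Feature list as printed by the cell above
features = ["LapNumber", "Stint", "TyreLife", "Position",
            "Session", "Compound", "FreshTyre", "TrackStatus"]

overlap = find_leaky_features(features)
print(overlap)  # an empty list means no overlap with the assumed leaky columns
```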


Build X and y for train and test

In [4]:
y_train = train_df["LapTime_s"].to_numpy()
y_test  = test_df["LapTime_s"].to_numpy()

X_train_raw = train_df[LEGAL_FEATURES].copy()
X_test_raw  = test_df[LEGAL_FEATURES].copy()

X_train_raw.head()


Unnamed: 0,LapNumber,Stint,TyreLife,Position,Session,Compound,FreshTyre,TrackStatus
0,2.0,1.0,2.0,,FP1,HARD,True,1
1,3.0,1.0,3.0,,FP1,HARD,True,1
2,7.0,2.0,7.0,,FP1,HARD,False,1
3,9.0,2.0,9.0,,FP1,HARD,False,1
4,11.0,2.0,11.0,,FP1,HARD,False,1
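
The `_raw` frames still contain text columns such as `Session` and `Compound`, which most models cannot consume directly. A minimal sketch of one way to encode them, assuming one-hot encoding with `pd.get_dummies` is acceptable, is shown below on toy data. The toy values (including the `Q` session and `SOFT` compound) are hypothetical and not taken from the real split. Missing values, such as the empty `Position` entries above, would still need separate handling.

```python
import pandas as pd

# Toy stand-ins for X_train_raw / X_test_raw (hypothetical values)
train = pd.DataFrame({
    "TyreLife": [2.0, 3.0, 7.0],
    "Session": ["FP1", "FP1", "FP2"],
    "Compound": ["HARD", "HARD", "SOFT"],
})
test = pd.DataFrame({
    "TyreLife": [1.0, 2.0],
    "Session": ["Q", "Q"],
    "Compound": ["SOFT", "SOFT"],
})

categorical = ["Session", "Compound"]

# One-hot encode each split separately
X_train = pd.get_dummies(train, columns=categorical, dtype=float)
X_test = pd.get_dummies(test, columns=categorical, dtype=float)

# Align test to the training columns: categories unseen in train are dropped,
# categories missing from test are filled with zeros
X_test = X_test.reindex(columns=X_train.columns, fill_value=0.0)

print(list(X_train.columns))
print(X_test)
```

Aligning on the training columns keeps both matrices the same width, at the cost of discarding any category that only appears in the test split.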
