# Ventilator Pressure Prediction - Baseline Model

This notebook implements a baseline LightGBM model for predicting ventilator pressure.

## Approach
- Use LightGBM with time series features
- Basic feature engineering: lags, rolling statistics
- 5-fold cross-validation split by breath (shuffled KFold over `breath_id`)
- Predict pressure for each time step

In [1]:
import pandas as pd
import numpy as np
import lightgbm as lgb
from sklearn.model_selection import KFold
from sklearn.metrics import mean_absolute_error
import warnings
warnings.filterwarnings('ignore')

# Set random seed for reproducibility
SEED = 42
np.random.seed(SEED)

## Load Data

In [2]:
# Load training and test data
train_df = pd.read_csv('/home/data/train.csv')
test_df = pd.read_csv('/home/data/test.csv')

print(f"Training data shape: {train_df.shape}")
print(f"Test data shape: {test_df.shape}")
print(f"Training columns: {train_df.columns.tolist()}")
print(f"Test columns: {test_df.columns.tolist()}")

# Display basic info about the data
print("\nTraining data info:")
print(train_df.head())
print(f"\nUnique breaths in train: {train_df['breath_id'].nunique()}")
print(f"Unique breaths in test: {test_df['breath_id'].nunique()}")

Training data shape: (5432400, 8)
Test data shape: (603600, 7)
Training columns: ['id', 'breath_id', 'R', 'C', 'time_step', 'u_in', 'u_out', 'pressure']
Test columns: ['id', 'breath_id', 'R', 'C', 'time_step', 'u_in', 'u_out']

Training data info:
   id  breath_id  R   C  time_step      u_in  u_out  pressure
0   1      85053  5  10   0.000000  4.174419      0  6.118700
1   2      85053  5  10   0.033812  7.050149      0  5.907794
2   3      85053  5  10   0.067497  7.564931      0  7.313837
3   4      85053  5  10   0.101394  8.103306      0  8.227765
4   5      85053  5  10   0.135344  8.502619      0  9.422901

Unique breaths in train: 67905
Unique breaths in test: 7545


## Feature Engineering

Create basic time series features:
- Lag features (previous `u_in` and `u_out` values within the same breath)
- Rolling statistics
- Interaction features
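
A minimal sketch of how per-breath lag, rolling, and cumulative features behave, using a made-up two-breath frame (the values and the window size of 2 are illustrative assumptions, not taken from the dataset). The point is that every operation is grouped by `breath_id`, so nothing leaks across breath boundaries.

```python
import pandas as pd

# Toy frame with two hypothetical breaths (values are made up for illustration)
toy = pd.DataFrame({
    "breath_id": [1, 1, 1, 2, 2, 2],
    "u_in": [1.0, 2.0, 4.0, 10.0, 20.0, 40.0],
})

# Lag within each breath: the first row of every breath has no predecessor,
# so it is filled with 0
toy["u_in_lag_1"] = toy.groupby("breath_id")["u_in"].shift(1).fillna(0)

# Rolling mean within each breath; the window never crosses a breath boundary
toy["u_in_rolling_mean_2"] = (
    toy.groupby("breath_id")["u_in"]
    .rolling(2, min_periods=1)
    .mean()
    .reset_index(level=0, drop=True)  # drop the breath_id level to realign rows
)

# Running total of u_in within each breath
toy["u_in_cumsum"] = toy.groupby("breath_id")["u_in"].cumsum()

print(toy)
```

Each derived column restarts at the first row of breath 2 instead of continuing from breath 1.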

In [3]:
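def create_features(df, is_train=True):
    """Create per-breath features; return the sorted frame and the feature column names.

    `is_train` is currently unused: the same features are built for train and test.
    """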
    
    # Sort by breath_id and time_step to ensure proper ordering
    df = df.sort_values(['breath_id', 'time_step']).reset_index(drop=True)
    
    # The raw columns (R, C, time_step, u_in, u_out) are used as features as-is;
    # they are picked up by the feature_cols filter at the end of this function
    
    # Create lag features (previous values within the same breath)
    for lag in [1, 2, 3]:
        df[f'u_in_lag_{lag}'] = df.groupby('breath_id')['u_in'].shift(lag)
        df[f'u_out_lag_{lag}'] = df.groupby('breath_id')['u_out'].shift(lag)
        
        # Fill NaN values with 0 for lag features
        df[f'u_in_lag_{lag}'] = df[f'u_in_lag_{lag}'].fillna(0)
        df[f'u_out_lag_{lag}'] = df[f'u_out_lag_{lag}'].fillna(0)
    
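    # Rolling statistics for u_in (std is NaN on the first row of each breath,
    # where the window holds a single value; these NaNs are filled with 0 later)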
    for window in [5, 10]:
        df[f'u_in_rolling_mean_{window}'] = df.groupby('breath_id')['u_in'].rolling(window, min_periods=1).mean().reset_index(0, drop=True)
        df[f'u_in_rolling_std_{window}'] = df.groupby('breath_id')['u_in'].rolling(window, min_periods=1).std().reset_index(0, drop=True)
    
    # Rate of change of u_in
    df['u_in_diff'] = df.groupby('breath_id')['u_in'].diff().fillna(0)
    
    # Interaction features
    df['R_C_interaction'] = df['R'] * df['C']
    df['u_in_R_interaction'] = df['u_in'] * df['R']
    df['u_in_C_interaction'] = df['u_in'] * df['C']
    
    # Time since start of breath
    df['time_since_start'] = df['time_step'] - df.groupby('breath_id')['time_step'].transform('min')
    
    # Cumulative sum of u_in within breath
    df['u_in_cumsum'] = df.groupby('breath_id')['u_in'].cumsum()
    
    # Every column except the identifiers and the target is used as a feature
    feature_cols = [col for col in df.columns if col not in ['id', 'breath_id', 'pressure']]
    
    return df, feature_cols

# Create features for training data
print("Creating features for training data...")
train_df, feature_cols = create_features(train_df, is_train=True)

# Create features for test data
print("Creating features for test data...")
test_df, _ = create_features(test_df, is_train=False)

print(f"Number of features: {len(feature_cols)}")
print(f"Feature columns: {feature_cols[:10]}...")  # Show first 10 features

Creating features for training data...
Creating features for test data...
Number of features: 21
Feature columns: ['R', 'C', 'time_step', 'u_in', 'u_out', 'u_in_lag_1', 'u_out_lag_1', 'u_in_lag_2', 'u_out_lag_2', 'u_in_lag_3']...


## Prepare Data for Training

In [4]:
# Prepare training data
X = train_df[feature_cols]
y = train_df['pressure']

print(f"Training features shape: {X.shape}")
print(f"Training target shape: {y.shape}")

# Check for any missing values
print(f"\nMissing values in features: {X.isnull().sum().sum()}")
print(f"Missing values in target: {y.isnull().sum()}")

# Fill any remaining NaN values with 0
X = X.fillna(0)
test_df[feature_cols] = test_df[feature_cols].fillna(0)

Training features shape: (5432400, 21)
Training target shape: (5432400,)

Missing values in features: 135810
Missing values in target: 0
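
The 135810 missing values are consistent with the rolling standard deviation being undefined for a single observation: one NaN per breath for each of the two window sizes, and 67905 breaths x 2 = 135810. A small sketch on made-up data (the toy frame is an assumption for illustration):

```python
import pandas as pd

# Two hypothetical breaths with made-up u_in values
toy = pd.DataFrame({
    "breath_id": [1, 1, 1, 2, 2],
    "u_in": [1.0, 3.0, 5.0, 2.0, 2.0],
})

rolling_std = (
    toy.groupby("breath_id")["u_in"]
    .rolling(5, min_periods=1)
    .std()
    .reset_index(level=0, drop=True)
)

# One NaN per breath: the first row has a single observation in its window,
# and the sample standard deviation of one value is undefined
nan_per_breath = rolling_std.isna().groupby(toy["breath_id"]).sum()
print(nan_per_breath)

# Scaling up to the notebook's numbers: 67905 breaths x 2 window sizes
print(67905 * 2)
```

Filling these with 0 is a reasonable default here, since a single value has no spread.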


## Cross-Validation Setup

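Split with KFold over the unique `breath_id` values rather than over individual rows. Rows within a breath form a time series and are strongly correlated, but breaths are independent of each other, so keeping each breath entirely in one fold avoids leakage between training and validation data.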
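
For comparison, scikit-learn's `GroupKFold` gives the same "whole breaths per fold" guarantee directly from a row-level group array. This is an alternative sketch on hypothetical toy data, not the split used in this notebook.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Hypothetical toy data: 6 breaths with 4 rows each
groups = np.repeat(np.arange(6), 4)
X_toy = np.zeros((len(groups), 1))

gkf = GroupKFold(n_splits=3)
seen_val = []
for fold, (train_idx, val_idx) in enumerate(gkf.split(X_toy, groups=groups)):
    train_breaths = set(int(b) for b in groups[train_idx])
    val_breaths = set(int(b) for b in groups[val_idx])
    # A breath never appears on both sides of a split
    assert train_breaths.isdisjoint(val_breaths)
    seen_val.extend(val_breaths)
    print(f"Fold {fold + 1}: val breaths {sorted(val_breaths)}")
```

With the real data, `groups` would be `train_df['breath_id']`.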

In [5]:
# Create breath-level splits for cross-validation
# Each breath is independent, so we can use KFold on breath_ids
breath_ids = train_df['breath_id'].unique()
n_splits = 5

kf = KFold(n_splits=n_splits, shuffle=True, random_state=SEED)

folds = []
for fold, (train_idx, val_idx) in enumerate(kf.split(breath_ids)):
    train_breaths = breath_ids[train_idx]
    val_breaths = breath_ids[val_idx]
    
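    # Get row indices for these breaths. These are index labels, but they double as
    # positions for .iloc later because create_features reset the index to 0..n-1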
    train_indices = train_df[train_df['breath_id'].isin(train_breaths)].index
    val_indices = train_df[train_df['breath_id'].isin(val_breaths)].index
    
    folds.append((train_indices, val_indices))
    
print(f"Created {len(folds)} folds")
print(f"Total breaths: {len(breath_ids)}")
for i, (train_idx, val_idx) in enumerate(folds):
    train_breath_count = len(train_df.loc[train_idx, 'breath_id'].unique())
    val_breath_count = len(train_df.loc[val_idx, 'breath_id'].unique())
    print(f"Fold {i+1}: {train_breath_count} train breaths, {val_breath_count} val breaths")

Created 5 folds
Total breaths: 67905
Fold 1: 54324 train breaths, 13581 val breaths
Fold 2: 54324 train breaths, 13581 val breaths
Fold 3: 54324 train breaths, 13581 val breaths
Fold 4: 54324 train breaths, 13581 val breaths
Fold 5: 54324 train breaths, 13581 val breaths


## Train Model with Cross-Validation

In [6]:
# LightGBM parameters
lgb_params = {
    'objective': 'regression',
    'metric': 'mae',
    'boosting_type': 'gbdt',
    'num_leaves': 100,
    'learning_rate': 0.05,
    'feature_fraction': 0.9,
    'bagging_fraction': 0.8,
    'bagging_freq': 5,
    'verbose': -1,
    'random_state': SEED,
    'n_jobs': -1
}

# Store predictions and scores
oof_predictions = np.zeros(len(train_df))
test_predictions = np.zeros(len(test_df))
fold_scores = []

print("Training LightGBM model with cross-validation...")

for fold, (train_idx, val_idx) in enumerate(folds):
    print(f"\nFold {fold + 1}/{n_splits}")
    
    # Split data
    X_train, X_val = X.iloc[train_idx], X.iloc[val_idx]
    y_train, y_val = y.iloc[train_idx], y.iloc[val_idx]
    
    # Create LightGBM datasets
    train_set = lgb.Dataset(X_train, label=y_train)
    val_set = lgb.Dataset(X_val, label=y_val)
    
    # Train model
    model = lgb.train(
        lgb_params,
        train_set,
        num_boost_round=10000,
        valid_sets=[val_set],
        callbacks=[
            lgb.early_stopping(100),
            lgb.log_evaluation(100)
        ]
    )
    
    # Predict on validation set
    val_pred = model.predict(X_val, num_iteration=model.best_iteration)
    oof_predictions[val_idx] = val_pred
    
    # Calculate validation score
    val_score = mean_absolute_error(y_val, val_pred)
    fold_scores.append(val_score)
    print(f"Fold {fold + 1} MAE: {val_score:.4f}")
    
    # Predict on test set
    test_pred = model.predict(test_df[feature_cols], num_iteration=model.best_iteration)
    test_predictions += test_pred / n_splits

# Calculate overall CV score
cv_score = mean_absolute_error(y, oof_predictions)
print(f"\nOverall CV MAE: {cv_score:.4f}")
print(f"Fold scores: {fold_scores}")
print(f"Mean ± Std: {np.mean(fold_scores):.4f} ± {np.std(fold_scores):.4f}")

Training LightGBM model with cross-validation...

Fold 1/5


Training until validation scores don't improve for 100 rounds


[100]	valid_0's l1: 0.788154


[200]	valid_0's l1: 0.710863


[300]	valid_0's l1: 0.677747


[400]	valid_0's l1: 0.653922


[500]	valid_0's l1: 0.634436


[600]	valid_0's l1: 0.621765


[700]	valid_0's l1: 0.610515


[800]	valid_0's l1: 0.599745


[900]	valid_0's l1: 0.591362


[1000]	valid_0's l1: 0.584154


[1100]	valid_0's l1: 0.577419


[1200]	valid_0's l1: 0.571325


[1300]	valid_0's l1: 0.566018


[1400]	valid_0's l1: 0.560813


[1500]	valid_0's l1: 0.556858


[1600]	valid_0's l1: 0.5525


[1700]	valid_0's l1: 0.548992


[1800]	valid_0's l1: 0.545408


[1900]	valid_0's l1: 0.542395


[2000]	valid_0's l1: 0.53925


[2100]	valid_0's l1: 0.535841


[2200]	valid_0's l1: 0.532875


[2300]	valid_0's l1: 0.530504


[2400]	valid_0's l1: 0.528012


[2500]	valid_0's l1: 0.525729


[2600]	valid_0's l1: 0.523177


[2700]	valid_0's l1: 0.521173


[2800]	valid_0's l1: 0.518999


[2900]	valid_0's l1: 0.517105


[3000]	valid_0's l1: 0.515367


[3100]	valid_0's l1: 0.513604


[3200]	valid_0's l1: 0.512232


[3300]	valid_0's l1: 0.510375


[3400]	valid_0's l1: 0.508791


[3500]	valid_0's l1: 0.50725


[3600]	valid_0's l1: 0.505661


[3700]	valid_0's l1: 0.504015


[3800]	valid_0's l1: 0.502839


[3900]	valid_0's l1: 0.5013


[4000]	valid_0's l1: 0.500029


[4100]	valid_0's l1: 0.498859


[4200]	valid_0's l1: 0.497553


[4300]	valid_0's l1: 0.496369


[4400]	valid_0's l1: 0.49526


[4500]	valid_0's l1: 0.494129


[4600]	valid_0's l1: 0.492972


[4700]	valid_0's l1: 0.491963


[4800]	valid_0's l1: 0.491032


[4900]	valid_0's l1: 0.490064


[5000]	valid_0's l1: 0.489084


[5100]	valid_0's l1: 0.488083


[5200]	valid_0's l1: 0.487165


[5300]	valid_0's l1: 0.48639


[5400]	valid_0's l1: 0.485447


[5500]	valid_0's l1: 0.484624


[5600]	valid_0's l1: 0.483769


[5700]	valid_0's l1: 0.482922


[5800]	valid_0's l1: 0.481961


[5900]	valid_0's l1: 0.480909


[6000]	valid_0's l1: 0.480074


[6100]	valid_0's l1: 0.479269


[6200]	valid_0's l1: 0.478424


[6300]	valid_0's l1: 0.477761


[6400]	valid_0's l1: 0.477098


[6500]	valid_0's l1: 0.476476


[6600]	valid_0's l1: 0.475796


[6700]	valid_0's l1: 0.475109


[6800]	valid_0's l1: 0.474413


[6900]	valid_0's l1: 0.473883


[7000]	valid_0's l1: 0.473165


[7100]	valid_0's l1: 0.472467


[7200]	valid_0's l1: 0.471997


[7300]	valid_0's l1: 0.47149


[7400]	valid_0's l1: 0.470862


[7500]	valid_0's l1: 0.470392


[7600]	valid_0's l1: 0.469928


[7700]	valid_0's l1: 0.469348


[7800]	valid_0's l1: 0.468772


[7900]	valid_0's l1: 0.468239


[8000]	valid_0's l1: 0.467813


[8100]	valid_0's l1: 0.46733


[8200]	valid_0's l1: 0.46678


[8300]	valid_0's l1: 0.466174


[8400]	valid_0's l1: 0.465602


[8500]	valid_0's l1: 0.465158


[8600]	valid_0's l1: 0.46476


[8700]	valid_0's l1: 0.464324


[8800]	valid_0's l1: 0.463974


[8900]	valid_0's l1: 0.463561


[9000]	valid_0's l1: 0.463138


[9100]	valid_0's l1: 0.462693


[9200]	valid_0's l1: 0.462323


[9300]	valid_0's l1: 0.461881


[9400]	valid_0's l1: 0.461414


[9500]	valid_0's l1: 0.461001


[9600]	valid_0's l1: 0.460565


[9700]	valid_0's l1: 0.460145


[9800]	valid_0's l1: 0.45981


[9900]	valid_0's l1: 0.459479


[10000]	valid_0's l1: 0.459097
Did not meet early stopping. Best iteration is:
[10000]	valid_0's l1: 0.459097


Fold 1 MAE: 0.4591



Fold 2/5


Training until validation scores don't improve for 100 rounds


[100]	valid_0's l1: 0.796596


[200]	valid_0's l1: 0.714961


[300]	valid_0's l1: 0.679905


[400]	valid_0's l1: 0.657115


[500]	valid_0's l1: 0.638669


[600]	valid_0's l1: 0.62314


[700]	valid_0's l1: 0.612064


[800]	valid_0's l1: 0.60126


[900]	valid_0's l1: 0.592633


[1000]	valid_0's l1: 0.585603


[1100]	valid_0's l1: 0.579024


[1200]	valid_0's l1: 0.57358


[1300]	valid_0's l1: 0.568064


[1400]	valid_0's l1: 0.563142


[1500]	valid_0's l1: 0.558597


[1600]	valid_0's l1: 0.554279


[1700]	valid_0's l1: 0.550354


[1800]	valid_0's l1: 0.547076


[1900]	valid_0's l1: 0.54343


[2000]	valid_0's l1: 0.540196


[2100]	valid_0's l1: 0.53719


[2200]	valid_0's l1: 0.534379


[2300]	valid_0's l1: 0.53175


[2400]	valid_0's l1: 0.528829


[2500]	valid_0's l1: 0.526176


[2600]	valid_0's l1: 0.523622


[2700]	valid_0's l1: 0.521508


[2800]	valid_0's l1: 0.519313


[2900]	valid_0's l1: 0.517418


[3000]	valid_0's l1: 0.515386


[3100]	valid_0's l1: 0.513633


[3200]	valid_0's l1: 0.511973


[3300]	valid_0's l1: 0.510075


[3400]	valid_0's l1: 0.508639


[3500]	valid_0's l1: 0.507235


[3600]	valid_0's l1: 0.505617


[3700]	valid_0's l1: 0.504224


[3800]	valid_0's l1: 0.502827


[3900]	valid_0's l1: 0.501613


[4000]	valid_0's l1: 0.500178


[4100]	valid_0's l1: 0.498916


[4200]	valid_0's l1: 0.497669


[4300]	valid_0's l1: 0.496606


[4400]	valid_0's l1: 0.495423


[4500]	valid_0's l1: 0.494419


[4600]	valid_0's l1: 0.493425


[4700]	valid_0's l1: 0.492369


[4800]	valid_0's l1: 0.491366


[4900]	valid_0's l1: 0.490344


[5000]	valid_0's l1: 0.48928


[5100]	valid_0's l1: 0.488397


[5200]	valid_0's l1: 0.487552


[5300]	valid_0's l1: 0.48661


[5400]	valid_0's l1: 0.485765


[5500]	valid_0's l1: 0.484888


[5600]	valid_0's l1: 0.484104


[5700]	valid_0's l1: 0.483414


[5800]	valid_0's l1: 0.482736


[5900]	valid_0's l1: 0.481982


[6000]	valid_0's l1: 0.481323


[6100]	valid_0's l1: 0.480567


[6200]	valid_0's l1: 0.479849


[6300]	valid_0's l1: 0.479293


[6400]	valid_0's l1: 0.478534


[6500]	valid_0's l1: 0.477878


[6600]	valid_0's l1: 0.477131


[6700]	valid_0's l1: 0.476482


[6800]	valid_0's l1: 0.475837


[6900]	valid_0's l1: 0.475289


[7000]	valid_0's l1: 0.474618


[7100]	valid_0's l1: 0.474031


[7200]	valid_0's l1: 0.473483


[7300]	valid_0's l1: 0.472817


[7400]	valid_0's l1: 0.472298


[7500]	valid_0's l1: 0.471837


[7600]	valid_0's l1: 0.471427


[7700]	valid_0's l1: 0.470861


[7800]	valid_0's l1: 0.470226


[7900]	valid_0's l1: 0.469736


[8000]	valid_0's l1: 0.469127


[8100]	valid_0's l1: 0.468675


[8200]	valid_0's l1: 0.468094


[8300]	valid_0's l1: 0.467615


[8400]	valid_0's l1: 0.467191


[8500]	valid_0's l1: 0.466782


[8600]	valid_0's l1: 0.466263


[8700]	valid_0's l1: 0.465818


[8800]	valid_0's l1: 0.465325


[8900]	valid_0's l1: 0.464731


[9000]	valid_0's l1: 0.464312


[9100]	valid_0's l1: 0.463902


[9200]	valid_0's l1: 0.463536


[9300]	valid_0's l1: 0.463063


[9400]	valid_0's l1: 0.462654


[9500]	valid_0's l1: 0.462258


[9600]	valid_0's l1: 0.46188


[9700]	valid_0's l1: 0.461396


[9800]	valid_0's l1: 0.460971


[9900]	valid_0's l1: 0.460556


[10000]	valid_0's l1: 0.460182
Did not meet early stopping. Best iteration is:
[10000]	valid_0's l1: 0.460182


Fold 2 MAE: 0.4602



Fold 3/5


Training until validation scores don't improve for 100 rounds


[100]	valid_0's l1: 0.790702


[200]	valid_0's l1: 0.709937


[300]	valid_0's l1: 0.675622


[400]	valid_0's l1: 0.652201


[500]	valid_0's l1: 0.632862


[600]	valid_0's l1: 0.618703


[700]	valid_0's l1: 0.606907


[800]	valid_0's l1: 0.597224


[900]	valid_0's l1: 0.589687


[1000]	valid_0's l1: 0.58289


[1100]	valid_0's l1: 0.576641


[1200]	valid_0's l1: 0.570078


[1300]	valid_0's l1: 0.56421


[1400]	valid_0's l1: 0.559419


[1500]	valid_0's l1: 0.55502


[1600]	valid_0's l1: 0.55032


[1700]	valid_0's l1: 0.546661


[1800]	valid_0's l1: 0.542934


[1900]	valid_0's l1: 0.53969


[2000]	valid_0's l1: 0.536348


[2100]	valid_0's l1: 0.533464


[2200]	valid_0's l1: 0.530918


[2300]	valid_0's l1: 0.528208


[2400]	valid_0's l1: 0.525963


[2500]	valid_0's l1: 0.523668


[2600]	valid_0's l1: 0.521395


[2700]	valid_0's l1: 0.519096


[2800]	valid_0's l1: 0.517134


[2900]	valid_0's l1: 0.515112


[3000]	valid_0's l1: 0.512999


[3100]	valid_0's l1: 0.511234


[3200]	valid_0's l1: 0.509518


[3300]	valid_0's l1: 0.507514


[3400]	valid_0's l1: 0.505781


[3500]	valid_0's l1: 0.504337


[3600]	valid_0's l1: 0.502827


[3700]	valid_0's l1: 0.501386


[3800]	valid_0's l1: 0.499927


[3900]	valid_0's l1: 0.498467


[4000]	valid_0's l1: 0.49721


[4100]	valid_0's l1: 0.495686


[4200]	valid_0's l1: 0.494676


[4300]	valid_0's l1: 0.493798


[4400]	valid_0's l1: 0.492644


[4500]	valid_0's l1: 0.491571


[4600]	valid_0's l1: 0.490482


[4700]	valid_0's l1: 0.489465


[4800]	valid_0's l1: 0.488458


[4900]	valid_0's l1: 0.48736


[5000]	valid_0's l1: 0.486413


[5100]	valid_0's l1: 0.48547


[5200]	valid_0's l1: 0.484541


[5300]	valid_0's l1: 0.48352


[5400]	valid_0's l1: 0.482596


[5500]	valid_0's l1: 0.48187


[5600]	valid_0's l1: 0.480891


[5700]	valid_0's l1: 0.480037


[5800]	valid_0's l1: 0.479431


[5900]	valid_0's l1: 0.478555


[6000]	valid_0's l1: 0.477762


[6100]	valid_0's l1: 0.476993


[6200]	valid_0's l1: 0.476188


[6300]	valid_0's l1: 0.475578


[6400]	valid_0's l1: 0.475039


[6500]	valid_0's l1: 0.47416


[6600]	valid_0's l1: 0.473447


[6700]	valid_0's l1: 0.472783


[6800]	valid_0's l1: 0.472137


[6900]	valid_0's l1: 0.471532


[7000]	valid_0's l1: 0.470758


[7100]	valid_0's l1: 0.470124


[7200]	valid_0's l1: 0.469485


[7300]	valid_0's l1: 0.468955


[7400]	valid_0's l1: 0.468346


[7500]	valid_0's l1: 0.467759


[7600]	valid_0's l1: 0.467057


[7700]	valid_0's l1: 0.466416


[7800]	valid_0's l1: 0.465918


[7900]	valid_0's l1: 0.465299


[8000]	valid_0's l1: 0.464787


[8100]	valid_0's l1: 0.464354


[8200]	valid_0's l1: 0.463919


[8300]	valid_0's l1: 0.463388


[8400]	valid_0's l1: 0.462897


[8500]	valid_0's l1: 0.462366


[8600]	valid_0's l1: 0.461906


[8700]	valid_0's l1: 0.461405


[8800]	valid_0's l1: 0.460975


[8900]	valid_0's l1: 0.46044


[9000]	valid_0's l1: 0.459941


[9100]	valid_0's l1: 0.4595


[9200]	valid_0's l1: 0.459026


[9300]	valid_0's l1: 0.458682


[9400]	valid_0's l1: 0.458227


[9500]	valid_0's l1: 0.457738


[9600]	valid_0's l1: 0.457252


[9700]	valid_0's l1: 0.456857


[9800]	valid_0's l1: 0.456475


[9900]	valid_0's l1: 0.456163


[10000]	valid_0's l1: 0.455745
Did not meet early stopping. Best iteration is:
[10000]	valid_0's l1: 0.455745


Fold 3 MAE: 0.4557



Fold 4/5


Training until validation scores don't improve for 100 rounds


[100]	valid_0's l1: 0.797474


[200]	valid_0's l1: 0.719423


[300]	valid_0's l1: 0.684803


[400]	valid_0's l1: 0.660914


[500]	valid_0's l1: 0.643058


[600]	valid_0's l1: 0.62641


[700]	valid_0's l1: 0.613724


[800]	valid_0's l1: 0.603915


[900]	valid_0's l1: 0.594513


[1000]	valid_0's l1: 0.586673


[1100]	valid_0's l1: 0.580288


[1200]	valid_0's l1: 0.5739


[1300]	valid_0's l1: 0.56854


[1400]	valid_0's l1: 0.563902


[1500]	valid_0's l1: 0.558801


[1600]	valid_0's l1: 0.554388


[1700]	valid_0's l1: 0.550416


[1800]	valid_0's l1: 0.547258


[1900]	valid_0's l1: 0.543364


[2000]	valid_0's l1: 0.54039


[2100]	valid_0's l1: 0.537668


[2200]	valid_0's l1: 0.535117


[2300]	valid_0's l1: 0.532674


[2400]	valid_0's l1: 0.530322


[2500]	valid_0's l1: 0.527955


[2600]	valid_0's l1: 0.525547


[2700]	valid_0's l1: 0.523395


[2800]	valid_0's l1: 0.520594


[2900]	valid_0's l1: 0.518616


[3000]	valid_0's l1: 0.516704


[3100]	valid_0's l1: 0.514867


[3200]	valid_0's l1: 0.513119


[3300]	valid_0's l1: 0.511265


[3400]	valid_0's l1: 0.509465


[3500]	valid_0's l1: 0.507805


[3600]	valid_0's l1: 0.505942


[3700]	valid_0's l1: 0.504513


[3800]	valid_0's l1: 0.503014


[3900]	valid_0's l1: 0.501544


[4000]	valid_0's l1: 0.500337


[4100]	valid_0's l1: 0.499001


[4200]	valid_0's l1: 0.497543


[4300]	valid_0's l1: 0.496225


[4400]	valid_0's l1: 0.495159


[4500]	valid_0's l1: 0.494024


[4600]	valid_0's l1: 0.492857


[4700]	valid_0's l1: 0.491878


[4800]	valid_0's l1: 0.490752


[4900]	valid_0's l1: 0.489836


[5000]	valid_0's l1: 0.488965


[5100]	valid_0's l1: 0.487971


[5200]	valid_0's l1: 0.487062


[5300]	valid_0's l1: 0.486343


[5400]	valid_0's l1: 0.485496


[5500]	valid_0's l1: 0.48461


[5600]	valid_0's l1: 0.483806


[5700]	valid_0's l1: 0.482999


[5800]	valid_0's l1: 0.482128


[5900]	valid_0's l1: 0.481326


[6000]	valid_0's l1: 0.480648


[6100]	valid_0's l1: 0.479886


[6200]	valid_0's l1: 0.479123


[6300]	valid_0's l1: 0.478275


[6400]	valid_0's l1: 0.477628


[6500]	valid_0's l1: 0.477055


[6600]	valid_0's l1: 0.476442


[6700]	valid_0's l1: 0.475873


[6800]	valid_0's l1: 0.475275


[6900]	valid_0's l1: 0.474678


[7000]	valid_0's l1: 0.474111


[7100]	valid_0's l1: 0.473425


[7200]	valid_0's l1: 0.472836


[7300]	valid_0's l1: 0.472239


[7400]	valid_0's l1: 0.47156


[7500]	valid_0's l1: 0.471019


[7600]	valid_0's l1: 0.470503


[7700]	valid_0's l1: 0.469943


[7800]	valid_0's l1: 0.469367


[7900]	valid_0's l1: 0.468836


[8000]	valid_0's l1: 0.468267


[8100]	valid_0's l1: 0.467762


[8200]	valid_0's l1: 0.467225


[8300]	valid_0's l1: 0.466724


[8400]	valid_0's l1: 0.466061


[8500]	valid_0's l1: 0.465571


[8600]	valid_0's l1: 0.465063


[8700]	valid_0's l1: 0.464644


[8800]	valid_0's l1: 0.464156


[8900]	valid_0's l1: 0.463616


[9000]	valid_0's l1: 0.463142


[9100]	valid_0's l1: 0.462669


[9200]	valid_0's l1: 0.462286


[9300]	valid_0's l1: 0.461906


[9400]	valid_0's l1: 0.461555


[9500]	valid_0's l1: 0.461238


[9600]	valid_0's l1: 0.460791


[9700]	valid_0's l1: 0.460301


[9800]	valid_0's l1: 0.459917


[9900]	valid_0's l1: 0.459515


[10000]	valid_0's l1: 0.459156
Did not meet early stopping. Best iteration is:
[10000]	valid_0's l1: 0.459156


Fold 4 MAE: 0.4592



Fold 5/5


Training until validation scores don't improve for 100 rounds


[100]	valid_0's l1: 0.796293


[200]	valid_0's l1: 0.716518


[300]	valid_0's l1: 0.682508


[400]	valid_0's l1: 0.658546


[500]	valid_0's l1: 0.639653


[600]	valid_0's l1: 0.625009


[700]	valid_0's l1: 0.612609


[800]	valid_0's l1: 0.603061


[900]	valid_0's l1: 0.594486


[1000]	valid_0's l1: 0.587626


[1100]	valid_0's l1: 0.581034


[1200]	valid_0's l1: 0.575094


[1300]	valid_0's l1: 0.569516


[1400]	valid_0's l1: 0.564665


[1500]	valid_0's l1: 0.559638


[1600]	valid_0's l1: 0.555484


[1700]	valid_0's l1: 0.551773


[1800]	valid_0's l1: 0.548498


[1900]	valid_0's l1: 0.545069


[2000]	valid_0's l1: 0.541564


[2100]	valid_0's l1: 0.538685


[2200]	valid_0's l1: 0.536084


[2300]	valid_0's l1: 0.533499


[2400]	valid_0's l1: 0.531159


[2500]	valid_0's l1: 0.528573


[2600]	valid_0's l1: 0.526053


[2700]	valid_0's l1: 0.523875


[2800]	valid_0's l1: 0.521852


[2900]	valid_0's l1: 0.519739


[3000]	valid_0's l1: 0.51803


[3100]	valid_0's l1: 0.516264


[3200]	valid_0's l1: 0.514634


[3300]	valid_0's l1: 0.51295


[3400]	valid_0's l1: 0.511231


[3500]	valid_0's l1: 0.509686


[3600]	valid_0's l1: 0.507843


[3700]	valid_0's l1: 0.506711


[3800]	valid_0's l1: 0.505189


[3900]	valid_0's l1: 0.50375


[4000]	valid_0's l1: 0.502472


[4100]	valid_0's l1: 0.500841


[4200]	valid_0's l1: 0.499541


[4300]	valid_0's l1: 0.498383


[4400]	valid_0's l1: 0.497004


[4500]	valid_0's l1: 0.496035


[4600]	valid_0's l1: 0.494998


[4700]	valid_0's l1: 0.493873


[4800]	valid_0's l1: 0.492834


[4900]	valid_0's l1: 0.491753


[5000]	valid_0's l1: 0.490917


[5100]	valid_0's l1: 0.48991


[5200]	valid_0's l1: 0.489099


[5300]	valid_0's l1: 0.488062


[5400]	valid_0's l1: 0.487315


[5500]	valid_0's l1: 0.486386


[5600]	valid_0's l1: 0.485443


[5700]	valid_0's l1: 0.484632


[5800]	valid_0's l1: 0.483754


[5900]	valid_0's l1: 0.482776


[6000]	valid_0's l1: 0.482085


[6100]	valid_0's l1: 0.48138


[6200]	valid_0's l1: 0.480732


[6300]	valid_0's l1: 0.479993


[6400]	valid_0's l1: 0.47936


[6500]	valid_0's l1: 0.478598


[6600]	valid_0's l1: 0.477939


[6700]	valid_0's l1: 0.477412


[6800]	valid_0's l1: 0.476777


[6900]	valid_0's l1: 0.476161


[7000]	valid_0's l1: 0.475541


[7100]	valid_0's l1: 0.474951


[7200]	valid_0's l1: 0.474326


[7300]	valid_0's l1: 0.473771


[7400]	valid_0's l1: 0.473141


[7500]	valid_0's l1: 0.472606


[7600]	valid_0's l1: 0.472094


[7700]	valid_0's l1: 0.471575


[7800]	valid_0's l1: 0.4711


[7900]	valid_0's l1: 0.470459


[8000]	valid_0's l1: 0.469936


[8100]	valid_0's l1: 0.469403


[8200]	valid_0's l1: 0.468882


[8300]	valid_0's l1: 0.468312


[8400]	valid_0's l1: 0.467749


[8500]	valid_0's l1: 0.467305


[8600]	valid_0's l1: 0.46672


[8700]	valid_0's l1: 0.466232


[8800]	valid_0's l1: 0.465728


[8900]	valid_0's l1: 0.465292


[9000]	valid_0's l1: 0.464892


[9100]	valid_0's l1: 0.464426


[9200]	valid_0's l1: 0.463975


[9300]	valid_0's l1: 0.463478


[9400]	valid_0's l1: 0.463101


[9500]	valid_0's l1: 0.462534


[9600]	valid_0's l1: 0.462108


[9700]	valid_0's l1: 0.461777


[9800]	valid_0's l1: 0.461383


[9900]	valid_0's l1: 0.460962


[10000]	valid_0's l1: 0.460511
Did not meet early stopping. Best iteration is:
[10000]	valid_0's l1: 0.460511


Fold 5 MAE: 0.4605



Overall CV MAE: 0.4589
Fold scores: [0.45909724434092763, 0.4601823214417843, 0.4557451057214325, 0.4591555154652729, 0.46051051291440553]
Mean ± Std: 0.4589 ± 0.0017
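
The CV MAE above is computed over every row. If the target evaluation only scores the inspiratory phase (rows with `u_out == 0`), a masked MAE would be the more relevant number. That scoring rule is an assumption here and is not checked anywhere in this notebook. The helper name `masked_mae` and the toy values below are hypothetical.

```python
import numpy as np

def masked_mae(y_true, y_pred, u_out):
    """MAE restricted to rows where u_out == 0 (the assumed scored phase)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mask = np.asarray(u_out) == 0
    return float(np.mean(np.abs(y_true[mask] - y_pred[mask])))

# Made-up values for illustration only
y_true = [10.0, 12.0, 6.0, 5.0]
y_pred = [11.0, 12.0, 9.0, 5.0]
u_out = [0, 0, 1, 1]

score = masked_mae(y_true, y_pred, u_out)
print(score)  # 0.5: only the first two rows are counted
```

On the real out-of-fold predictions this would be called as `masked_mae(y, oof_predictions, train_df['u_out'])`.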


## Feature Importance

In [None]:
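# Get feature importance by gain. Note that `model` is the model from the last
# fold only, so this reflects a single fold rather than an average over all five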
feature_importance = pd.DataFrame({
    'feature': feature_cols,
    'importance': model.feature_importance(importance_type='gain')
}).sort_values('importance', ascending=False)

print("Top 15 most important features:")
print(feature_importance.head(15))
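
Because only the last fold's model survives the training loop, a more stable ranking could be obtained by collecting each fold's importance array inside the loop and averaging afterwards. A minimal sketch, assuming the per-fold arrays have already been collected; the helper `mean_importance` and the numbers are hypothetical stand-ins for `model.feature_importance(importance_type='gain')`.

```python
import numpy as np
import pandas as pd

def mean_importance(feature_names, per_fold_importances):
    """Average per-fold importance arrays into one ranked table."""
    stacked = np.vstack(per_fold_importances)  # shape: (n_folds, n_features)
    return (
        pd.DataFrame({
            "feature": feature_names,
            "importance_mean": stacked.mean(axis=0),
            "importance_std": stacked.std(axis=0),
        })
        .sort_values("importance_mean", ascending=False)
        .reset_index(drop=True)
    )

# Made-up gains for three features over two folds
names = ["u_in", "R", "C"]
fold_gains = [np.array([10.0, 2.0, 4.0]), np.array([14.0, 4.0, 6.0])]

table = mean_importance(names, fold_gains)
print(table)
```

The standard deviation column gives a rough sense of how much a feature's rank depends on the fold.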

## Create Submission

In [None]:
# Create submission file
submission = pd.DataFrame({
    'id': test_df['id'],
    'pressure': test_predictions
})

# Ensure the submission has the correct format
print(f"Submission shape: {submission.shape}")
print("Submission head:")
print(submission.head())

# Save submission
submission_path = '/home/submission/submission.csv'
submission.to_csv(submission_path, index=False)
print(f"\nSubmission saved to: {submission_path}")

# Verify submission format matches sample
sample_submission = pd.read_csv('/home/data/sample_submission.csv')
print(f"\nSample submission shape: {sample_submission.shape}")
print("Sample submission head:")
print(sample_submission.head())

# Check if IDs match
if set(submission['id']) == set(sample_submission['id']):
    print("✓ Submission IDs match sample submission IDs")
else:
    print("⚠ Warning: Submission IDs don't match sample submission IDs")