### 1. Data Cleaning & Imputation

**Load Raw Data**

>We start by importing the raw training and test datasets. These will serve as the foundation for all future transformations, imputations, and feature engineering work. This step ensures we’re working with the unmodified data straight from the competition source.

In [1]:
import pandas as pd
# Load data
train = pd.read_csv('../data/train.csv')
test = pd.read_csv('../data/test.csv')

**Extract GroupID from PassengerId & Track Missing Values**

>The `PassengerId` is formatted as `GroupID_PassengerNumber`. Splitting on the underscore gives us a `GroupID`, which may be useful for inferring transport outcomes within families or travel groups. We also add a `MissingCount` column that records how many fields are missing in each row before any imputation; a high count may correlate with `CryoSleep` status or point to data-quality issues.

In [2]:
# Extract GroupID
train['GroupID'] = train['PassengerId'].str.split('_').str[0]
test['GroupID'] = test['PassengerId'].str.split('_').str[0]

# Track total missing fields per row
train['MissingCount'] = train.isnull().sum(axis=1)
test['MissingCount'] = test.isnull().sum(axis=1)
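
As a quick illustration of the two steps above, here is a minimal sketch on a toy frame. The values are made up for the example, and `GroupSize` is a hypothetical extra feature that the notebook itself does not create; it only shows one way a `GroupID` could be put to use.

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values, only to illustrate the two steps above
demo = pd.DataFrame({
    'PassengerId': ['0001_01', '0002_01', '0002_02'],
    'CryoSleep': [False, np.nan, True],
    'Spa': [120.0, np.nan, 0.0],
})

# Text before the underscore is the group identifier (kept as a string)
demo['GroupID'] = demo['PassengerId'].str.split('_').str[0]

# Count missing fields per row
demo['MissingCount'] = demo.isnull().sum(axis=1)

# Hypothetical extra feature: number of passengers sharing a GroupID
demo['GroupSize'] = demo.groupby('GroupID')['PassengerId'].transform('count')
```

Because `GroupID` is derived from `PassengerId`, it is never null here and so does not inflate `MissingCount`, even though it is added first.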


**Impute CryoSleep Based on Spending Behavior**

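>Passengers in `CryoSleep` almost never spend in categories such as `RoomService`, `FoodCourt`, `ShoppingMall`, `Spa`, or `VRDeck`. So if every spending field is zero or missing and `CryoSleep` is null, we treat the passenger as asleep. This is a heuristic rather than a guarantee, since an awake passenger can also have zero spending.

We also create a `NoSpend` flag marking passengers whose recorded spending totals zero (missing amounts are counted as zero), which is useful for downstream analysis.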

In [3]:
# Spending categories
spend_cols = ['RoomService', 'FoodCourt', 'ShoppingMall', 'Spa', 'VRDeck']

# Create 'NoSpend' flag
train['NoSpend'] = train[spend_cols].fillna(0).sum(axis=1) == 0
test['NoSpend'] = test[spend_cols].fillna(0).sum(axis=1) == 0

# Impute CryoSleep as True when no spend and missing CryoSleep
# (vectorized mask instead of a row-wise apply)
for df in (train, test):
    fill_mask = df['CryoSleep'].isna() & df['NoSpend']
    df.loc[fill_mask, 'CryoSleep'] = True
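
Before relying on this rule, it can help to check how well `NoSpend` lines up with `CryoSleep` among the rows where `CryoSleep` is known. The sketch below does that on a toy frame with made-up values (not taken from the competition files); the names `demo`, `known`, `check`, and the counters are illustrative assumptions, not part of the notebook.

```python
import numpy as np
import pandas as pd

spend_cols = ['RoomService', 'FoodCourt', 'ShoppingMall', 'Spa', 'VRDeck']

# Toy data with made-up values (not taken from the competition files)
demo = pd.DataFrame({
    'CryoSleep':    [True, False, False, np.nan, np.nan],
    'RoomService':  [0.0, 50.0, 0.0, 0.0, 10.0],
    'FoodCourt':    [0.0, 0.0, 0.0, np.nan, 0.0],
    'ShoppingMall': [0.0, 5.0, 0.0, 0.0, 0.0],
    'Spa':          [np.nan, 0.0, 0.0, 0.0, 0.0],
    'VRDeck':       [0.0, 0.0, 0.0, 0.0, 0.0],
})
demo['NoSpend'] = demo[spend_cols].fillna(0).sum(axis=1) == 0

# Among rows where CryoSleep is known, tabulate it against NoSpend
known = demo[demo['CryoSleep'].notna()]
check = pd.crosstab(known['CryoSleep'].astype(bool), known['NoSpend'])

# Awake passengers with zero spend are the cases where the rule would be wrong
awake_no_spend = int((~known['CryoSleep'].astype(bool) & known['NoSpend']).sum())

# Rows the rule would fill, and rows it would leave missing
n_filled = int((demo['CryoSleep'].isna() & demo['NoSpend']).sum())
n_still_missing = int((demo['CryoSleep'].isna() & ~demo['NoSpend']).sum())
```

On the real data, a large `awake_no_spend` count relative to the sleeping, no-spend count would suggest the imputation is less safe than it looks. Note also that rows with missing `CryoSleep` and some recorded spending are left untouched by this rule and still need handling later.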
