This was my first time using an XGBoost + CatBoost blend. I wanted to give it a go and see whether blending the two models could push the score higher, while also exploring which features work best. It was difficult but fun!
- Task: Binary classification — predict presence/absence of heart disease
- Metric: AUC (Area Under ROC Curve)
- Data: 630K train rows, 270K test rows, 13 numeric features, no nulls
- Link: https://www.kaggle.com/competitions/playground-series-s6e2
Leaderboard: #817 — Public LB Score: 0.95355
| Model | OOF AUC |
|---|---|
| XGBoost (7-seed avg) | 0.95542 |
| CatBoost (7-seed avg) | 0.95550 |
| 50/50 Blend | 0.95550 |
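The 50/50 blend is a plain average of the two models' predicted probabilities. A minimal sketch, using scikit-learn's `roc_auc_score` and toy prediction vectors (the values below are illustrative, not the competition's actual OOF predictions):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def blend(pred_a: np.ndarray, pred_b: np.ndarray, w: float = 0.5) -> np.ndarray:
    """Weighted average of two models' predicted probabilities."""
    return w * np.asarray(pred_a) + (1 - w) * np.asarray(pred_b)

# Toy example: two imperfect OOF prediction vectors for the same labels
y = np.array([0, 1, 0, 1, 1, 0])
xgb_oof = np.array([0.2, 0.7, 0.4, 0.9, 0.6, 0.1])
cat_oof = np.array([0.3, 0.8, 0.2, 0.6, 0.9, 0.3])

blended = blend(xgb_oof, cat_oof)
print(f"Blend AUC: {roc_auc_score(y, blended):.4f}")
```

Since AUC only depends on the ranking of predictions, averaging raw probabilities works well when the two models are calibrated similarly; otherwise averaging per-model ranks is a common alternative.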
| Run | Learning Rate | Seeds | OOF AUC | Public LB | Rank |
|---|---|---|---|---|---|
| v1 | 0.05 | 3 | 0.95547 | 0.95353 | #850 |
| v2 | 0.02 | 7 | 0.95550 | 0.95355 | #817 |
```
├── data.py               # Data loading and target encoding
├── features.py           # Feature column definitions and transformations
├── model.py              # Stratified K-fold CV with multi-seed averaging
├── submission.py         # Blend predictions and generate submission CSV
├── requirements.txt
└── data/
    ├── train.csv
    ├── test.csv
    └── sample_submission.csv
```
```bash
pip install -r requirements.txt
python submission.py
```