<div style="
background: radial-gradient(circle at top left,#0f2027,#203a43,#2c5364);
color:#ffffff;
padding:38px;
border-radius:20px;
box-shadow:0 0 35px rgba(0,255,255,0.35);
text-align:center;
font-family:Arial;
">

<div style="
font-size:34px;
font-weight:800;
letter-spacing:1.5px;
background: linear-gradient(90deg,#00ffff,#00c6ff);
-webkit-background-clip: text;
-webkit-text-fill-color: transparent;
">
🫀 Heart Disease Prediction
</div>

<div style="
font-size:18px;
margin-top:10px;
color:#d8faff;
letter-spacing:1px;
">
Kaggle Playground Series – S6E2
</div>

<hr style="
border:1px solid rgba(0,255,255,0.5);
margin:25px 0;
">

<div style="
display:flex;
justify-content:space-around;
flex-wrap:wrap;
margin-top:15px;
">

<div style="
background:rgba(0,255,255,0.1);
padding:14px 26px;
border-radius:14px;
margin:10px;
font-size:15px;
">
📊 Structured Clinical Data Modeling
</div>

<div style="
background:rgba(0,255,200,0.1);
padding:14px 26px;
border-radius:14px;
margin:10px;
font-size:15px;
">
🔁 Stratified Cross-Validation Framework
</div>

<div style="
background:rgba(0,200,255,0.12);
padding:14px 26px;
border-radius:14px;
margin:10px;
font-size:15px;
">
⚡ Dual Gradient Boosting Architecture
</div>

<div style="
background:rgba(255,255,255,0.08);
padding:14px 26px;
border-radius:14px;
margin:10px;
font-size:15px;
">
🎯 ROC-AUC Optimized Ensemble Strategy
</div>

</div>

<div style="
margin-top:28px;
font-size:14px;
color:#bceeff;
letter-spacing:1px;
">
Stable • Signal-Focused • Leaderboard-Driven
</div>

</div>

<div style="
background: radial-gradient(circle at top left,#0f2027,#203a43,#2c5364);
color:#ffffff;
padding:30px;
border-radius:18px;
box-shadow:0 0 30px rgba(0,255,255,0.3);
">

<div style="font-size:32px;font-weight:800;text-align:center;letter-spacing:1px;">
🫀 Heart Disease Prediction – Elite Dual Boost Pipeline
</div>

<hr style="border:1px solid rgba(0,255,255,0.6); margin:22px 0;">

<div style="font-size:22px;font-weight:bold;color:#00ffff;">
📖 Project Overview
</div>

<div style="font-size:16px;margin-top:10px;line-height:1.6;">
This notebook builds a robust and competition-driven machine learning pipeline for the Kaggle Playground Series S6E2 – Heart Disease Prediction challenge. The task focuses on estimating the probability of heart disease using structured clinical data, with performance evaluated through the ROC-AUC metric.
</div>

<br>

<div style="font-size:22px;font-weight:bold;color:#00ffff;">
⚙️ Methodology
</div>

<div style="font-size:16px;margin-top:10px;line-height:1.6;">
The solution follows a signal-focused and stability-oriented modeling strategy:
<ul style="margin-top:12px;">
<li>Target label mapping (Absence to 0, Presence to 1)</li>
<li>Three ratio-based interaction features (Risk_Index, HR_Age_ratio, Chol_BP_ratio)</li>
<li>Integer category codes assigned jointly on train and test</li>
<li>Stratified 5-fold cross-validation</li>
<li>Dual gradient boosting models (XGBoost & LightGBM)</li>
<li>Weighted blend, with the weight chosen by grid search on out-of-fold ROC-AUC</li>
<li>Probability-based final submission</li>
</ul>
</div>

<br>

<div style="font-size:22px;font-weight:bold;color:#00ffff;">
🎯 Objective
</div>

<div style="font-size:16px;margin-top:10px;line-height:1.6;">
The primary objective is to design a low-variance, high-performance ensemble model that ensures strong generalization across validation folds while achieving competitive leaderboard results through optimized blending and controlled model complexity.
</div>

</div>

<div style="
background: radial-gradient(circle at top left,#0f2027,#203a43,#2c5364);
padding:20px;
border-radius:16px;
margin-top:25px;
box-shadow:0 0 18px rgba(0,255,255,0.25);
color:#ffffff;
text-align:center;
">

<div style="font-size:22px;font-weight:bold;">
📚  Library Imports
</div>

<div style="font-size:14px;color:#c8faff;margin-top:8px;">
Import the Python libraries used for data processing, modeling, and evaluation.
</div>

</div>

In [1]:
import numpy as np
import pandas as pd
import warnings
warnings.filterwarnings("ignore")

from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

<div style="
background: radial-gradient(circle at top left,#0f2027,#203a43,#2c5364);
padding:20px;
border-radius:16px;
margin-top:25px;
box-shadow:0 0 18px rgba(0,255,255,0.25);
color:#ffffff;
text-align:center;
">

<div style="font-size:22px;font-weight:bold;">
📂  Data Loading
</div>

<div style="font-size:14px;color:#c8faff;margin-top:8px;">
Load training and test datasets and prepare the base structure for modeling.
</div>

</div>

In [2]:
train = pd.read_csv("/kaggle/input/playground-series-s6e2/train.csv")
test = pd.read_csv("/kaggle/input/playground-series-s6e2/test.csv")

test_ids = test["id"]

train.drop(columns=["id"], inplace=True)
test.drop(columns=["id"], inplace=True)

train["Heart Disease"] = train["Heart Disease"].map({
    "Absence": 0,
    "Presence": 1
})

TARGET = "Heart Disease"

X = train.drop(columns=[TARGET])
y = train[TARGET]

<div style="
background: radial-gradient(circle at top left,#0f2027,#203a43,#2c5364);
padding:20px;
border-radius:16px;
margin-top:25px;
box-shadow:0 0 18px rgba(0,255,255,0.25);
color:#ffffff;
text-align:center;
">

<div style="font-size:22px;font-weight:bold;">
🎯  Target Preparation
</div>

<div style="font-size:14px;color:#c8faff;margin-top:8px;">
Convert categorical target labels into numerical format for supervised learning.
</div>

</div>

In [3]:
# Risk Index
X["Risk_Index"] = (X["Age"] * X["BP"]) / (X["Max HR"] + 1)
test["Risk_Index"] = (test["Age"] * test["BP"]) / (test["Max HR"] + 1)

# HR Age Ratio
X["HR_Age_ratio"] = X["Max HR"] / (X["Age"] + 1)
test["HR_Age_ratio"] = test["Max HR"] / (test["Age"] + 1)

# Chol BP Ratio
X["Chol_BP_ratio"] = X["Cholesterol"] / (X["BP"] + 1)
test["Chol_BP_ratio"] = test["Cholesterol"] / (test["BP"] + 1)
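
The cell above writes each formula twice, once for `X` and once for `test`. A minimal sketch of the same three ratios wrapped in one helper, so both frames are guaranteed to receive identical transformations. The function name `add_ratio_features` and the toy values are assumptions for illustration and are not part of the original notebook.

```python
import pandas as pd


def add_ratio_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the three ratio features used in this notebook."""
    out = df.copy()
    # The +1 in each denominator guards against division by zero.
    out["Risk_Index"] = (out["Age"] * out["BP"]) / (out["Max HR"] + 1)
    out["HR_Age_ratio"] = out["Max HR"] / (out["Age"] + 1)
    out["Chol_BP_ratio"] = out["Cholesterol"] / (out["BP"] + 1)
    return out


# Toy rows (made-up values) with the column names the notebook relies on.
toy = pd.DataFrame({
    "Age": [49, 59],
    "BP": [120, 140],
    "Max HR": [149, 99],
    "Cholesterol": [242, 282],
})
toy_fe = add_ratio_features(toy)
print(toy_fe[["Risk_Index", "HR_Age_ratio", "Chol_BP_ratio"]])
```

In the notebook this would be applied as `X = add_ratio_features(X)` and `test = add_ratio_features(test)`.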

<div style="
background: radial-gradient(circle at top left,#0f2027,#203a43,#2c5364);
padding:20px;
border-radius:16px;
margin-top:25px;
box-shadow:0 0 18px rgba(0,255,255,0.25);
color:#ffffff;
text-align:center;
">

<div style="font-size:22px;font-weight:bold;">
⚙️  Feature Engineering
</div>

<div style="font-size:14px;color:#c8faff;margin-top:8px;">
Create meaningful interaction features to enhance predictive signal strength.
</div>

</div>

In [4]:
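# Encode train and test together so each category maps to the same integer in both.
n_train = len(X)  # remember the split point before X is reassigned
combined = pd.concat([X, test], axis=0)

cat_cols = combined.select_dtypes(include=["object"]).columns

for col in cat_cols:
    # .cat.codes assigns integers in sorted category order; missing values become -1.
    combined[col] = combined[col].astype("category").cat.codes

X = combined.iloc[:n_train]
test = combined.iloc[n_train:]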
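
To illustrate why the encoding is done on the concatenated frame, here is a small sketch on made-up data. The column name `Sex_label` and its values are hypothetical; the point is only that a category seen in just one of the two frames still receives a consistent code.

```python
import pandas as pd

# Hypothetical categorical column; "U" appears only in the test frame.
train_toy = pd.DataFrame({"Sex_label": ["M", "F", "M"]})
test_toy = pd.DataFrame({"Sex_label": ["F", "U"]})

n_train = len(train_toy)
combined_toy = pd.concat([train_toy, test_toy], axis=0, ignore_index=True)

# Categories are sorted before codes are assigned: F -> 0, M -> 1, U -> 2.
combined_toy["Sex_label"] = combined_toy["Sex_label"].astype("category").cat.codes

train_codes = combined_toy["Sex_label"].iloc[:n_train].tolist()
test_codes = combined_toy["Sex_label"].iloc[n_train:].tolist()
print(train_codes, test_codes)
```

Encoding each frame separately would have mapped "F" and "M" consistently here only by coincidence of sort order; a category missing from one frame can shift every code after it.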

<div style="
background: radial-gradient(circle at top left,#0f2027,#203a43,#2c5364);
padding:20px;
border-radius:16px;
margin-top:25px;
box-shadow:0 0 18px rgba(0,255,255,0.25);
color:#ffffff;
text-align:center;
">

<div style="font-size:22px;font-weight:bold;">
🔐  Categorical Encoding
</div>

<div style="font-size:14px;color:#c8faff;margin-top:8px;">
Apply stable encoding techniques to transform categorical variables safely.
</div>

</div>

In [5]:
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

oof_xgb = np.zeros(len(X))
oof_lgb = np.zeros(len(X))

test_xgb = np.zeros(len(test))
test_lgb = np.zeros(len(test))
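
As a quick illustration of what stratification buys, the sketch below splits a made-up label vector with a 40% positive rate and checks the positive rate inside every validation fold. The toy arrays are assumptions; only the `StratifiedKFold` settings mirror the cell above.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy labels: 60 negatives and 40 positives.
y_toy = np.array([0] * 60 + [1] * 40)
X_toy = np.zeros((len(y_toy), 1))  # features are irrelevant for this check

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Positive rate in each validation fold; stratification keeps it near the overall rate.
val_rates = [y_toy[val_idx].mean() for _, val_idx in skf.split(X_toy, y_toy)]
print(val_rates)
```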

<div style="
background: radial-gradient(circle at top left,#0f2027,#203a43,#2c5364);
padding:20px;
border-radius:16px;
margin-top:25px;
box-shadow:0 0 18px rgba(0,255,255,0.25);
color:#ffffff;
text-align:center;
">

<div style="font-size:22px;font-weight:bold;">
🔁  Cross-Validation Strategy
</div>

<div style="font-size:14px;color:#c8faff;margin-top:8px;">
Implement stratified 5-fold validation to ensure robust model evaluation.
</div>

</div>

In [6]:
for fold, (train_idx, val_idx) in enumerate(folds.split(X, y)):
    
    X_train, X_val = X.iloc[train_idx], X.iloc[val_idx]
    y_train, y_val = y.iloc[train_idx], y.iloc[val_idx]

    model = XGBClassifier(
        n_estimators=1600,
        learning_rate=0.028,
        max_depth=3,
        subsample=0.85,
        colsample_bytree=0.75,
        min_child_weight=3,
        gamma=0.05,
        reg_lambda=6,
        reg_alpha=2,
        tree_method="hist",
        eval_metric="auc",
        early_stopping_rounds=100,
        random_state=42,
        n_jobs=-1
    )

    model.fit(X_train, y_train,
              eval_set=[(X_val, y_val)],
              verbose=False)

    oof_xgb[val_idx] = model.predict_proba(X_val)[:,1]
    test_xgb += model.predict_proba(test)[:,1] / folds.n_splits

print("XGB AUC:", roc_auc_score(y, oof_xgb))

XGB AUC: 0.9553116286619634
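
The single pooled AUC above does not show how much the score varies between folds. A minimal sketch of a per-fold breakdown follows, assuming the folds can be regenerated from the labels with the same splitter settings. The helper name `per_fold_auc` and the synthetic scores are illustrative assumptions, not the notebook's results.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold


def per_fold_auc(y_true, oof_pred, n_splits=5, seed=42):
    """Compute ROC-AUC separately on each validation fold of an OOF vector."""
    y_true = np.asarray(y_true)
    oof_pred = np.asarray(oof_pred)
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    placeholder_X = np.zeros((len(y_true), 1))  # only the labels drive the split
    return np.array([
        roc_auc_score(y_true[val_idx], oof_pred[val_idx])
        for _, val_idx in skf.split(placeholder_X, y_true)
    ])


# Synthetic labels and noisy scores, for illustration only.
rng = np.random.default_rng(0)
y_demo = rng.integers(0, 2, size=1000)
oof_demo = y_demo + rng.normal(0.0, 1.0, size=1000)

fold_scores = per_fold_auc(y_demo, oof_demo)
print("per-fold AUC:", fold_scores.round(4))
print("mean:", fold_scores.mean(), "std:", fold_scores.std())
```

In the notebook this would be called as `per_fold_auc(y, oof_xgb)`; a small standard deviation across folds would support the low-variance objective stated in the overview.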


<div style="
background: radial-gradient(circle at top left,#0f2027,#203a43,#2c5364);
padding:20px;
border-radius:16px;
margin-top:25px;
box-shadow:0 0 18px rgba(0,255,255,0.25);
color:#ffffff;
text-align:center;
">

<div style="font-size:22px;font-weight:bold;">
🚀  XGBoost Model Training
</div>

<div style="font-size:14px;color:#c8faff;margin-top:8px;">
Train a shallow, regularized boosting model for stable decision boundaries.
</div>

</div>

In [7]:
for fold, (train_idx, val_idx) in enumerate(folds.split(X, y)):
    
    X_train, X_val = X.iloc[train_idx], X.iloc[val_idx]
    y_train, y_val = y.iloc[train_idx], y.iloc[val_idx]

    model = LGBMClassifier(
        n_estimators=2000,
        learning_rate=0.025,
        num_leaves=48,
        min_child_samples=25,
        subsample=0.8,
        colsample_bytree=0.7,
        reg_lambda=6,
        reg_alpha=3,
        random_state=42,
        n_jobs=-1
    )

    model.fit(X_train, y_train)

    oof_lgb[val_idx] = model.predict_proba(X_val)[:,1]
    test_lgb += model.predict_proba(test)[:,1] / folds.n_splits

print("LGB AUC:", roc_auc_score(y, oof_lgb))

[LightGBM] [Info] Number of positive: 225963, number of negative: 278037
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.030077 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1187
[LightGBM] [Info] Number of data points in the train set: 504000, number of used features: 16
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.448339 -> initscore=-0.207383
[LightGBM] [Info] Start training from score -0.207383
[LightGBM] [Info] Number of positive: 225963, number of negative: 278037
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.029147 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1182
[LightGBM] [Info] Number of data points in the train set: 504000, number of used features: 16
[... output truncated ...]

<div style="
background: radial-gradient(circle at top left,#0f2027,#203a43,#2c5364);
padding:20px;
border-radius:16px;
margin-top:25px;
box-shadow:0 0 18px rgba(0,255,255,0.25);
color:#ffffff;
text-align:center;
">

<div style="font-size:22px;font-weight:bold;">
🌿  LightGBM Model Training
</div>

<div style="font-size:14px;color:#c8faff;margin-top:8px;">
Train a complementary gradient boosting model to introduce structural diversity.
</div>

</div>

In [8]:
best_auc = 0
best_w = 0

for w in np.arange(0.45, 0.65, 0.005):
    
    blend = w * oof_xgb + (1 - w) * oof_lgb
    score = roc_auc_score(y, blend)
    
    if score > best_auc:
        best_auc = score
        best_w = w

print("Best AUC:", best_auc)
print("Best Weight:", best_w)

Best AUC: 0.9553633541832596
Best Weight: 0.6450000000000002
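
The best weight reported above, 0.645, is the last point of the grid `np.arange(0.45, 0.65, 0.005)`, which suggests the optimum may lie outside the searched range. A hedged sketch of a search over the full interval [0, 1] with an explicit edge check is shown below. The helper `search_blend_weight` and the synthetic predictions are assumptions for illustration and do not reproduce the notebook's scores.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def search_blend_weight(y_true, pred_a, pred_b, weights):
    """Return (best_weight, best_auc) for the blend w * pred_a + (1 - w) * pred_b."""
    best_w, best_auc = None, -np.inf
    for w in weights:
        auc = roc_auc_score(y_true, w * pred_a + (1 - w) * pred_b)
        if auc > best_auc:
            best_w, best_auc = float(w), auc
    return best_w, best_auc


# Synthetic stand-ins for two models' out-of-fold scores.
rng = np.random.default_rng(0)
y_toy = rng.integers(0, 2, size=2000)
pred_a = y_toy + rng.normal(0.0, 0.8, size=2000)
pred_b = y_toy + rng.normal(0.0, 1.2, size=2000)

weights = np.linspace(0.0, 1.0, 101)  # full range, step 0.01
best_w, best_auc = search_blend_weight(y_toy, pred_a, pred_b, weights)

# If the optimum sits on a grid edge, the range was probably too narrow.
on_edge = best_w in (float(weights[0]), float(weights[-1]))
print("best weight:", best_w, "best AUC:", best_auc, "on edge:", on_edge)
```

Because the weight is tuned on the same out-of-fold predictions used to report the blended AUC, that AUC is slightly optimistic; with a single tuned parameter the effect is usually small.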


<div style="
background: radial-gradient(circle at top left,#0f2027,#203a43,#2c5364);
padding:20px;
border-radius:16px;
margin-top:25px;
box-shadow:0 0 18px rgba(0,255,255,0.25);
color:#ffffff;
text-align:center;
">

<div style="font-size:22px;font-weight:bold;">
🏁  Model Blending & Submission
</div>

<div style="font-size:14px;color:#c8faff;margin-top:8px;">
Optimize ensemble weights and generate the final probability-based submission file.
</div>

</div>

In [9]:
final_test = best_w * test_xgb + (1 - best_w) * test_lgb

In [10]:
submission = pd.DataFrame({
    "id": test_ids,
    "Heart Disease": final_test
})

submission.to_csv("submission.csv", index=False)
submission.head()

       id  Heart Disease
0  630000       0.944759
1  630001       0.009609
2  630002       0.987763
3  630003       0.005456
4  630004       0.208192
