# 🦷 Dental Implant 10-Year Survival Prediction

## Notebook 04: XGBoost Model

**Objective:** Train and evaluate an XGBoost classifier, a gradient boosting algorithm that often performs well on tabular data.

---


### 🎨 Setup: Import Libraries & Configure Plotting


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import json
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, accuracy_score, confusion_matrix, classification_report, roc_curve
import xgboost as xgb
import warnings
warnings.filterwarnings('ignore')

# Periospot Brand Colors
COLORS = {
    'periospot_blue': '#15365a',
    'mystic_blue': '#003049',
    'periospot_red': '#6c1410',
    'crimson_blaze': '#a92a2a',
    'vanilla_cream': '#f7f0da',
    'black': '#000000',
    'white': '#ffffff',
    'classic_periospot_blue': '#0031af',
    'periospot_light_blue': '#0297ed',
    'periospot_dark_blue': '#02011e',
    'periospot_yellow': '#ffc430',
    'periospot_bright_blue': '#1040dd'
}

periospot_palette = [COLORS['periospot_blue'], COLORS['crimson_blaze'], 
                     COLORS['periospot_light_blue'], COLORS['periospot_yellow']]

# Configure matplotlib
plt.rcParams['font.family'] = 'DejaVu Sans'
plt.rcParams['axes.titlesize'] = 16
plt.rcParams['axes.labelsize'] = 12
plt.rcParams['xtick.labelsize'] = 10
plt.rcParams['ytick.labelsize'] = 10
plt.rcParams['figure.facecolor'] = COLORS['white']
plt.rcParams['axes.facecolor'] = COLORS['vanilla_cream']
plt.rcParams['axes.edgecolor'] = COLORS['periospot_blue']

sns.set_palette(periospot_palette)

print("✅ Libraries imported and plotting style configured!")
print(f"XGBoost version: {xgb.__version__}")


---

### 1. Load Processed Data & Setup


In [None]:
# Load the processed data
X = pd.read_csv('../data/processed/X_train.csv')
y = pd.read_csv('../data/processed/y_train.csv').values.ravel()

print(f"X shape: {X.shape}")
print(f"y shape: {y.shape}")

# Split into train and validation
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(f"\nTraining set: {X_train.shape[0]} samples")
print(f"Validation set: {X_val.shape[0]} samples")
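
The split above passes `stratify=y`, which keeps the class ratio the same in both subsets. A minimal sketch of that behaviour follows, using a synthetic label vector with 10% positives; the arrays `X_demo` and `y_demo` are made up for illustration and are not the implant data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the real labels (assumption: 10% positives).
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(1000, 5))
y_demo = np.array([1] * 100 + [0] * 900)

# Same call pattern as the notebook cell: 80/20 split, stratified on the labels.
X_tr, X_va, y_tr, y_va = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print(f"Positive rate overall:    {y_demo.mean():.3f}")
print(f"Positive rate train:      {y_tr.mean():.3f}")
print(f"Positive rate validation: {y_va.mean():.3f}")
```

Without `stratify`, a small validation set can end up with noticeably fewer (or more) positive cases than the training set, which makes ROC-AUC and accuracy harder to compare across runs.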


---

### 2. Train XGBoost Model

XGBoost (Extreme Gradient Boosting) builds an ensemble of decision trees sequentially, with each new tree fitted to correct the errors of the trees before it. It is known for its speed and strong performance on tabular data.


In [None]:
# TODO: Initialize the XGBoost Classifier with appropriate hyperparameters.
# Hint: Use xgb.XGBClassifier() with parameters like:
#   - n_estimators: number of boosting rounds (e.g., 100)
#   - max_depth: maximum tree depth (e.g., 6)
#   - learning_rate: step size shrinkage (e.g., 0.1)
#   - random_state: for reproducibility (42)

xgb_model = xgb.XGBClassifier(
    n_estimators=...,
    max_depth=...,
    learning_rate=...,
    random_state=42,
    eval_metric='auc'  # use_label_encoder is deprecated and ignored in recent XGBoost releases, so it is omitted
)

# TODO: Fit the model on the training data.
# Hint: You can use eval_set for early stopping monitoring.
# xgb_model.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)
...

print("✅ XGBoost model trained!")


---

### 3. Evaluate XGBoost Model


In [None]:
# TODO: Make predictions on the validation set.

y_pred_xgb = ...  # Class predictions
y_pred_xgb_proba = ...  # Probability predictions (use [:, 1] for positive class)

# TODO: Calculate metrics
roc_auc_xgb = ...
accuracy_xgb = ...

print("XGBoost Results:")
print(f"  - ROC-AUC: {roc_auc_xgb:.4f}")
print(f"  - Accuracy: {accuracy_xgb:.4f}")
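
The two metrics in this cell take different inputs: ROC-AUC is computed from the positive-class probabilities, while accuracy is computed from hard class labels. The sketch below illustrates this with small made-up arrays; `proba_pos` stands in for something like `xgb_model.predict_proba(X_val)[:, 1]`, and `y_pred` mimics a 0.5 cut-off on those probabilities.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# Made-up labels and positive-class probabilities (illustration only).
y_true = np.array([0, 0, 1, 1, 1])
proba_pos = np.array([0.2, 0.6, 0.55, 0.7, 0.9])

# Hard class predictions obtained by thresholding the probabilities at 0.5.
y_pred = (proba_pos >= 0.5).astype(int)

# ROC-AUC ranks the probabilities; accuracy compares the thresholded labels.
demo_auc = roc_auc_score(y_true, proba_pos)
demo_acc = accuracy_score(y_true, y_pred)

print(f"ROC-AUC:  {demo_auc:.4f}")
print(f"Accuracy: {demo_acc:.4f}")
```

Passing class labels instead of probabilities to `roc_auc_score` still runs, but it collapses the ranking to two values and usually understates the model's discriminative ability.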


In [None]:
# Classification report and confusion matrix

print("Classification Report:")
print(classification_report(y_val, y_pred_xgb))

# Plot confusion matrix
fig, ax = plt.subplots(figsize=(8, 6))
cm = confusion_matrix(y_val, y_pred_xgb)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', ax=ax,
            xticklabels=['Predicted 0', 'Predicted 1'],
            yticklabels=['Actual 0', 'Actual 1'])
ax.set_title('XGBoost - Confusion Matrix', fontweight='bold')
plt.tight_layout()
plt.savefig('../figures/xgb_confusion_matrix.png', dpi=150, bbox_inches='tight')
plt.show()
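
To read the heatmap numerically, the four cells of a binary confusion matrix can be unpacked and turned into sensitivity and specificity. The sketch below uses made-up labels rather than the notebook's validation set; in scikit-learn's layout, rows are actual classes and columns are predicted classes.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up labels and predictions (illustration only).
y_true = np.array([0, 0, 1, 1, 1])
y_pred = np.array([0, 1, 1, 1, 1])

cm_demo = confusion_matrix(y_true, y_pred)

# For binary labels, ravel() yields the cells in the order TN, FP, FN, TP.
tn, fp, fn, tp = cm_demo.ravel()

sensitivity = tp / (tp + fn)  # share of actual positives that were caught
specificity = tn / (tn + fp)  # share of actual negatives that were caught

print(cm_demo)
print(f"Sensitivity (recall for class 1): {sensitivity:.2f}")
print(f"Specificity (recall for class 0): {specificity:.2f}")
```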


In [None]:
# Plot ROC curve
fig, ax = plt.subplots(figsize=(10, 8))

fpr, tpr, _ = roc_curve(y_val, y_pred_xgb_proba)
ax.plot(fpr, tpr, label=f'XGBoost (AUC = {roc_auc_xgb:.4f})', 
        color=COLORS['periospot_blue'], linewidth=2)
ax.plot([0, 1], [0, 1], 'k--', label='Random Classifier')

ax.set_xlabel('False Positive Rate')
ax.set_ylabel('True Positive Rate')
ax.set_title('XGBoost - ROC Curve', fontweight='bold')
ax.legend(loc='lower right')
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('../figures/xgb_roc_curve.png', dpi=150, bbox_inches='tight')
plt.show()


---

### 4. Feature Importance Analysis


In [None]:
# TODO: Visualize feature importance from XGBoost.
# XGBoost provides multiple importance types: 'weight', 'gain', 'cover'

feature_importance = pd.DataFrame({
    'feature': X.columns,
    'importance': xgb_model.feature_importances_
}).sort_values('importance', ascending=False)

# Plot top 15 features
fig, ax = plt.subplots(figsize=(10, 8))
top_features = feature_importance.head(15)
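# A single color avoids cycling the 4-color palette across 15 bars
# (recent seaborn versions warn when `palette` is passed without `hue`).
sns.barplot(data=top_features, x='importance', y='feature',
            color=COLORS['periospot_blue'], ax=ax)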
ax.set_title('XGBoost - Top 15 Feature Importances', fontweight='bold')
ax.set_xlabel('Importance')
ax.set_ylabel('Feature')
plt.tight_layout()
plt.savefig('../figures/xgb_feature_importance.png', dpi=150, bbox_inches='tight')
plt.show()

print("\nTop 10 Most Important Features:")
print(feature_importance.head(10).to_string(index=False))


---

### 5. Save Results


In [None]:
# Save the XGBoost results to a JSON file

results_xgb = {
    "model": "XGBoost",
    "roc_auc": float(roc_auc_xgb),
    "accuracy": float(accuracy_xgb),
    "hyperparameters": {
        "n_estimators": xgb_model.n_estimators,
        "max_depth": xgb_model.max_depth,
        "learning_rate": xgb_model.learning_rate
    }
}

with open('../results/xgboost_results.json', 'w') as f:
    json.dump(results_xgb, f, indent=2)

print("✅ Results saved to results/xgboost_results.json")
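
The save step above assumes the `../results/` directory already exists. A small sketch of a more defensive version follows: it creates the directory first and reads the file back to confirm the round trip. It writes to a temporary directory so it can run anywhere, and the metric values are placeholders, not real results.

```python
import json
import tempfile
from pathlib import Path

import numpy as np

# Temporary location standing in for '../results' (assumption for this demo).
results_dir = Path(tempfile.mkdtemp()) / "results"
results_dir.mkdir(parents=True, exist_ok=True)  # no error if it already exists
out_path = results_dir / "xgboost_results.json"

# Placeholder metrics; float() converts NumPy scalars such as float32,
# which json cannot serialize directly, into plain Python floats.
demo_results = {
    "model": "XGBoost",
    "roc_auc": float(np.float32(0.5)),
    "accuracy": float(np.float32(0.5)),
}

with open(out_path, "w") as f:
    json.dump(demo_results, f, indent=2)

# Read the file back to check that what was written is what was intended.
with open(out_path) as f:
    loaded = json.load(f)

print(loaded)
```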


---

### ✅ XGBoost Training Complete!

**Next Steps:** 
- Try LightGBM in `05_LightGBM.ipynb`
- Try CatBoost in `06_CatBoost.ipynb`
- Compare all models to select the best one

