# 🦷 Dental Implant 10-Year Survival Prediction

## Notebook 05: LightGBM Model

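**Objective:** Train and evaluate a LightGBM (Light Gradient Boosting Machine) classifier, a gradient boosting framework whose histogram-based split finding keeps training fast and memory-efficient on large datasets, and which can handle categorical features natively.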

---


### 🎨 Setup: Import Libraries & Configure Plotting


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import json
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, accuracy_score, confusion_matrix, classification_report, roc_curve
import lightgbm as lgb
import warnings
warnings.filterwarnings('ignore')

# Periospot Brand Colors
COLORS = {
    'periospot_blue': '#15365a',
    'mystic_blue': '#003049',
    'periospot_red': '#6c1410',
    'crimson_blaze': '#a92a2a',
    'vanilla_cream': '#f7f0da',
    'black': '#000000',
    'white': '#ffffff',
    'classic_periospot_blue': '#0031af',
    'periospot_light_blue': '#0297ed',
    'periospot_dark_blue': '#02011e',
    'periospot_yellow': '#ffc430',
    'periospot_bright_blue': '#1040dd'
}

periospot_palette = [COLORS['periospot_blue'], COLORS['crimson_blaze'], 
                     COLORS['periospot_light_blue'], COLORS['periospot_yellow']]

# Configure matplotlib
plt.rcParams['font.family'] = 'DejaVu Sans'
plt.rcParams['axes.titlesize'] = 16
plt.rcParams['axes.labelsize'] = 12
plt.rcParams['xtick.labelsize'] = 10
plt.rcParams['ytick.labelsize'] = 10
plt.rcParams['figure.facecolor'] = COLORS['white']
plt.rcParams['axes.facecolor'] = COLORS['vanilla_cream']
plt.rcParams['axes.edgecolor'] = COLORS['periospot_blue']

sns.set_palette(periospot_palette)

print("✅ Libraries imported and plotting style configured!")
print(f"LightGBM version: {lgb.__version__}")


---

### 1. Load Processed Data & Setup


In [None]:
# Load the processed data
X = pd.read_csv('../data/processed/X_train.csv')
y = pd.read_csv('../data/processed/y_train.csv').values.ravel()

print(f"X shape: {X.shape}")
print(f"y shape: {y.shape}")

# Split into train and validation
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(f"\nTraining set: {X_train.shape[0]} samples")
print(f"Validation set: {X_val.shape[0]} samples")


---

### 2. Train LightGBM Model

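LightGBM (Light Gradient Boosting Machine) is a gradient boosting framework known for its speed and memory efficiency, especially on large datasets. Instead of evaluating every distinct feature value as a split candidate, it buckets continuous features into discrete bins (histograms) and searches for splits over the bin boundaries. Trees are grown leaf-wise, so `num_leaves` is the main control on model complexity, with `max_depth` acting as an optional cap.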
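
To make the histogram idea concrete, here is a minimal sketch in plain Python. It assumes equal-width bins and a made-up feature column (`feature_values`); LightGBM's actual binning is more sophisticated, so treat this as an illustration of the principle rather than the library's implementation.

```python
from bisect import bisect_right

def make_bin_edges(values, n_bins):
    """Return the interior edges of n_bins equal-width bins (illustrative assumption)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + width * i for i in range(1, n_bins)]

def assign_bins(values, edges):
    """Map each raw value to the index of the bin it falls into."""
    return [bisect_right(edges, v) for v in values]

# Hypothetical continuous feature (toy numbers, not from the dataset)
feature_values = [6.0, 8.0, 8.5, 10.0, 10.0, 11.5, 13.0, 13.0, 15.0, 16.0]

edges = make_bin_edges(feature_values, n_bins=4)
bins = assign_bins(feature_values, edges)

# An exact split search considers a threshold between each pair of distinct values;
# the histogram search only considers the bin edges.
exact_candidates = len(set(feature_values)) - 1
histogram_candidates = len(edges)

print(f"Bin edges: {edges}")
print(f"Bin index per sample: {bins}")
print(f"Split candidates: {exact_candidates} exact vs {histogram_candidates} histogram")
```

In LightGBM itself, the number of bins per feature is controlled by the `max_bin` parameter.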


In [None]:
# Initialize the LightGBM classifier. The values below are the starting points
# suggested for this exercise; tune them for your data.
#   - n_estimators: number of boosting rounds
#   - max_depth: maximum tree depth (-1 means no limit; try a value like 6 to constrain trees)
#   - learning_rate: step size shrinkage
#   - num_leaves: maximum number of leaves in one tree
#   - random_state: fixed for reproducibility

lgb_model = lgb.LGBMClassifier(
    n_estimators=100,
    max_depth=-1,
    learning_rate=0.1,
    num_leaves=31,
    random_state=42,
    verbose=-1
)

# Fit on the training data; eval_set tracks performance on the validation set.
lgb_model.fit(X_train, y_train, eval_set=[(X_val, y_val)])

print("✅ LightGBM model trained!")


---

### 3. Evaluate LightGBM Model
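
This section reports ROC-AUC and accuracy. The two metrics consume different inputs: ROC-AUC ranks the predicted probabilities, while accuracy compares thresholded class labels. The sketch below illustrates the difference on toy labels and scores using plain Python; `pairwise_auc` and `accuracy_at` are hypothetical helper names, not scikit-learn functions.

```python
def pairwise_auc(y_true, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy_at(y_true, scores, threshold=0.5):
    """Accuracy after converting scores to class labels at the given threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)

# Toy labels and predicted probabilities (not from the dataset)
y_toy = [0, 0, 1, 1]
scores_toy = [0.1, 0.4, 0.35, 0.8]

auc_toy = pairwise_auc(y_toy, scores_toy)
acc_default = accuracy_at(y_toy, scores_toy)                 # threshold 0.5
acc_strict = accuracy_at(y_toy, scores_toy, threshold=0.9)   # threshold 0.9

print(f"ROC-AUC: {auc_toy:.2f}")
print(f"Accuracy at 0.5: {acc_default:.2f}, at 0.9: {acc_strict:.2f}")
```

Accuracy moves with the chosen threshold, while ROC-AUC depends only on how the scores are ordered. This is why `roc_auc_score` should receive the positive-class probabilities rather than the class predictions.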


In [None]:
# Make predictions on the validation set.

y_pred_lgb = lgb_model.predict(X_val)  # Class predictions
y_pred_lgb_proba = lgb_model.predict_proba(X_val)[:, 1]  # Probability of the positive class

# Calculate metrics: ROC-AUC uses probabilities, accuracy uses class labels
roc_auc_lgb = roc_auc_score(y_val, y_pred_lgb_proba)
accuracy_lgb = accuracy_score(y_val, y_pred_lgb)

print("LightGBM Results:")
print(f"  - ROC-AUC: {roc_auc_lgb:.4f}")
print(f"  - Accuracy: {accuracy_lgb:.4f}")


In [None]:
# Classification report and confusion matrix

print("Classification Report:")
print(classification_report(y_val, y_pred_lgb))

# Plot confusion matrix
fig, ax = plt.subplots(figsize=(8, 6))
cm = confusion_matrix(y_val, y_pred_lgb)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', ax=ax,
            xticklabels=['Predicted 0', 'Predicted 1'],
            yticklabels=['Actual 0', 'Actual 1'])
ax.set_title('LightGBM - Confusion Matrix', fontweight='bold')
plt.tight_layout()
plt.savefig('../figures/lgb_confusion_matrix.png', dpi=150, bbox_inches='tight')
plt.show()


In [None]:
# Plot ROC curve
fig, ax = plt.subplots(figsize=(10, 8))

fpr, tpr, _ = roc_curve(y_val, y_pred_lgb_proba)
ax.plot(fpr, tpr, label=f'LightGBM (AUC = {roc_auc_lgb:.4f})', 
        color=COLORS['periospot_light_blue'], linewidth=2)
ax.plot([0, 1], [0, 1], 'k--', label='Random Classifier')

ax.set_xlabel('False Positive Rate')
ax.set_ylabel('True Positive Rate')
ax.set_title('LightGBM - ROC Curve', fontweight='bold')
ax.legend(loc='lower right')
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('../figures/lgb_roc_curve.png', dpi=150, bbox_inches='tight')
plt.show()


---

### 4. Feature Importance Analysis
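
`LGBMClassifier.feature_importances_` depends on the estimator's `importance_type` setting: the default `'split'` counts how many times a feature is used in a split, while `'gain'` sums the loss reduction from those splits. The toy sketch below shows how the two can rank features differently. The feature names and gain values are invented for illustration and do not come from the dataset or the model.

```python
from collections import Counter, defaultdict

# Hypothetical record of tree splits: (feature used, gain from that split)
toy_splits = [
    ("feature_a", 12.0),
    ("feature_b", 30.5),
    ("feature_a", 3.5),
    ("feature_c", 8.0),
    ("feature_a", 1.0),
]

# 'split'-style importance: how often each feature is used
split_importance = Counter(feature for feature, _ in toy_splits)

# 'gain'-style importance: total gain contributed by each feature
gain_importance = defaultdict(float)
for feature, gain in toy_splits:
    gain_importance[feature] += gain

top_by_split = split_importance.most_common(1)[0][0]
top_by_gain = max(gain_importance, key=gain_importance.get)

print(f"Top feature by split count: {top_by_split}")
print(f"Top feature by total gain: {top_by_gain}")
```

To request gain-based values from the model itself, pass `importance_type='gain'` when constructing `lgb.LGBMClassifier`.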


In [None]:
# Visualize feature importance from LightGBM (by default, the number of splits that use each feature).

feature_importance = pd.DataFrame({
    'feature': X.columns,
    'importance': lgb_model.feature_importances_
}).sort_values('importance', ascending=False)

# Plot top 15 features
fig, ax = plt.subplots(figsize=(10, 8))
top_features = feature_importance.head(15)
# Use a single brand color: the 4-color palette would cycle arbitrarily across 15 bars.
sns.barplot(data=top_features, x='importance', y='feature',
            color=COLORS['periospot_blue'], ax=ax)
ax.set_title('LightGBM - Top 15 Feature Importances', fontweight='bold')
ax.set_xlabel('Importance')
ax.set_ylabel('Feature')
plt.tight_layout()
plt.savefig('../figures/lgb_feature_importance.png', dpi=150, bbox_inches='tight')
plt.show()

print("\nTop 10 Most Important Features:")
print(feature_importance.head(10).to_string(index=False))


---

### 5. Save Results


In [None]:
# Save the LightGBM results to a JSON file

results_lgb = {
    "model": "LightGBM",
    "roc_auc": float(roc_auc_lgb),
    "accuracy": float(accuracy_lgb),
    "hyperparameters": {
        "n_estimators": lgb_model.n_estimators,
        "max_depth": lgb_model.max_depth,
        "learning_rate": lgb_model.learning_rate,
        "num_leaves": lgb_model.num_leaves
    }
}

with open('../results/lightgbm_results.json', 'w') as f:
    json.dump(results_lgb, f, indent=2)

print("✅ Results saved to ../results/lightgbm_results.json")


---

### ✅ LightGBM Training Complete!

**Next Steps:** 
- Try CatBoost in `06_CatBoost.ipynb`
- Compare all models to select the best one for submission
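
As a starting point for that comparison, the sketch below collects every `*_results.json` file in a results directory and ranks the models by ROC-AUC. It assumes the other notebooks save files with the same keys as `lightgbm_results.json`; the helper name `load_model_results`, the second file name, and the scores in the demo are placeholders, not real results.

```python
import json
import tempfile
from pathlib import Path

def load_model_results(results_dir):
    """Load all *_results.json files and sort them by ROC-AUC, best first."""
    results = []
    for path in sorted(Path(results_dir).glob("*_results.json")):
        with open(path) as f:
            results.append(json.load(f))
    return sorted(results, key=lambda r: r["roc_auc"], reverse=True)

# Demo with placeholder files in a temporary directory (scores are made up).
with tempfile.TemporaryDirectory() as tmp:
    placeholders = {
        "lightgbm_results.json": {"model": "LightGBM", "roc_auc": 0.5, "accuracy": 0.5},
        "other_model_results.json": {"model": "OtherModel", "roc_auc": 0.6, "accuracy": 0.4},
    }
    for name, payload in placeholders.items():
        (Path(tmp) / name).write_text(json.dumps(payload))

    ranked = load_model_results(tmp)

for r in ranked:
    print(f"{r['model']}: ROC-AUC={r['roc_auc']:.4f}, accuracy={r['accuracy']:.4f}")
```

In the notebook, call `load_model_results('../results')` once each model notebook has written its results file.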

