# XGBoost (Extreme Gradient Boosting)

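XGBoost is an optimized, open-source implementation of **gradient boosting**, an **ensemble learning** method that builds decision trees sequentially, with each new tree correcting the errors of the trees before it.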
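
To make "gradient boosting" concrete, here is a minimal sketch in plain Python, assuming a toy 1-D regression problem with made-up data. Each round fits a one-split tree (a stump) to the current residuals and adds a damped copy of it to the ensemble. This is plain gradient boosting on squared error, not XGBoost's actual algorithm, which additionally uses second-order gradient information and regularized tree scores. The helper names (`fit_stump`, `mse`) are illustrative only.

```python
# Toy 1-D regression data (hypothetical values, for illustration only)
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.1]


def fit_stump(xs, residuals):
    """Return the one-split tree that minimizes squared error on the residuals."""
    best = None
    # Every distinct x except the largest is a valid threshold (both sides non-empty)
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def mse(preds, targets):
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


learning_rate = 0.5   # shrinkage applied to every new stump
n_rounds = 20

# Start from a constant prediction (the mean of the targets)
preds = [sum(ys) / len(ys)] * len(ys)
history = [mse(preds, ys)]

for _ in range(n_rounds):
    # For squared error, the negative gradient is the residual y - prediction
    residuals = [y - p for y, p in zip(ys, preds)]
    stump = fit_stump(xs, residuals)
    preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    history.append(mse(preds, ys))

print("MSE before boosting:", round(history[0], 4))
print("MSE after boosting: ", round(history[-1], 4))
```

The training error shrinks round by round because every stump is fitted to what the current ensemble still gets wrong.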

### Why XGBoost?
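- Fast and efficient: tree construction is parallelized and optimized for speed and memory use.
- Handles missing values natively by learning a default split direction for them.
- Built-in L1/L2 regularization helps reduce overfitting.
- Widely used in Kaggle competitions and in industry.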

### Steps:
1. Load dataset
2. Train-test split
3. Train XGBoost model
4. Make predictions
5. Evaluate performance


In [1]:
# Install xgboost if not available (uncomment if needed)
# !pip install xgboost

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import seaborn as sns
import xgboost as xgb

In [2]:
# Load dataset (Breast Cancer)
data = load_breast_cancer()
X, y = data.data, data.target

print("Feature shape:", X.shape)
print("Classes:", data.target_names)

In [3]:
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
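
As a quick check of what `stratify=y` does in the split above, the sketch below compares the share of class 1 (benign) in the full dataset, the training set, and the test set. It is self-contained and reloads the data; the variable names `full_rate`, `train_rate`, and `test_rate` are introduced here for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# The labels are 0/1, so the mean equals the proportion of class 1 (benign)
full_rate = y.mean()
train_rate = y_train.mean()
test_rate = y_test.mean()

print("Train size:", len(y_train), "Test size:", len(y_test))
print("Benign share (full, train, test):", full_rate, train_rate, test_rate)
```

With stratification, the three proportions should be nearly identical, so the test set reflects the class balance of the whole dataset.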

In [4]:
# Train XGBoost model (default hyperparameters, log loss as the evaluation metric)
model = xgb.XGBClassifier(eval_metric='logloss', random_state=42)
model.fit(X_train, y_train)

In [5]:
# Predictions
y_pred = model.predict(X_test)

acc = accuracy_score(y_test, y_pred)
print("Accuracy:", acc)
print("\nClassification Report:\n", classification_report(y_test, y_pred, target_names=data.target_names))

In [6]:
# Confusion matrix
cm = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(5,4))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=data.target_names, yticklabels=data.target_names)
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show()
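
To show how the confusion matrix relates to the metrics printed earlier, here is a small worked example in plain Python. The counts are hypothetical and chosen only for illustration; they are not the results of this notebook. The layout follows scikit-learn's convention: rows are actual classes, columns are predicted classes, in label order (0 = malignant, 1 = benign).

```python
# Hypothetical confusion matrix (NOT this notebook's actual output)
# rows = actual, columns = predicted; index 0 = malignant, index 1 = benign
cm_example = [
    [40, 2],   # actual malignant: 40 predicted malignant, 2 predicted benign
    [1, 71],   # actual benign:     1 predicted malignant, 71 predicted benign
]

total = sum(sum(row) for row in cm_example)

# Accuracy: correct predictions (the diagonal) over all predictions
accuracy = (cm_example[0][0] + cm_example[1][1]) / total

# Precision for "benign": of everything predicted benign, how much was benign?
precision_benign = cm_example[1][1] / (cm_example[0][1] + cm_example[1][1])

# Recall for "benign": of everything actually benign, how much was found?
recall_benign = cm_example[1][1] / (cm_example[1][0] + cm_example[1][1])

# F1 is the harmonic mean of precision and recall
f1_benign = 2 * precision_benign * recall_benign / (precision_benign + recall_benign)

print(f"accuracy={accuracy:.4f}")
print(f"precision(benign)={precision_benign:.4f}")
print(f"recall(benign)={recall_benign:.4f}")
print(f"f1(benign)={f1_benign:.4f}")
```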

In [7]:
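# Feature importance plot: 'weight' counts how often each feature is used in a split.
# The model was trained on a NumPy array, so features are labeled f0, f1, ... by column index.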
xgb.plot_importance(model, max_num_features=10, importance_type='weight')
plt.show()

### Key Takeaways:
- XGBoost is often among the strongest-performing algorithms for classification and regression on tabular data.
- It handles missing data natively and uses regularization to reduce overfitting.
- Feature importance is easy to visualize with `xgb.plot_importance`.

👉 Often a go-to model in Kaggle competitions and real-world ML tasks.