# 🚀 Overview of XGBoost

---

## 📌 What is XGBoost?

XGBoost (Extreme Gradient Boosting) is an **optimized implementation** of the **gradient boosting algorithm**. It is designed for:

- **Speed** and **predictive performance**.
- **Efficiency** on large and **complex datasets**, through a set of enhancements over traditional gradient boosting.

---

## 🔧 Key Improvements Over Traditional Gradient Boosting

### 1. **Speed**
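- XGBoost is optimized for **faster execution** than traditional gradient boosting implementations.
- Boosting rounds remain sequential, but the work inside each round is **parallelized**: candidate splits are evaluated across features on multiple CPU threads, and the data layout is tuned for **cache-friendly access**.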

### 2. **Handling Missing Data**
- XGBoost handles missing values natively: at each split it **learns a default direction** (left or right) for rows where the feature is missing, so no separate imputation step is required.

### 3. **Regularization**
- It incorporates **L1** (Lasso) and **L2** (Ridge) regularization techniques to prevent **overfitting**.
- Regularization improves the generalization power of the model.
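
For reference, the regularized objective can be written as follows. This is a sketch in the notation of the XGBoost documentation, where $T$ is the number of leaves in a tree, $w_j$ is the weight of leaf $j$, and $\gamma$, $\lambda$, $\alpha$ correspond to the `gamma`, `lambda`, and `alpha` parameters.

```latex
\mathcal{L} = \sum_{i} l\big(y_i, \hat{y}_i\big) + \sum_{k} \Omega(f_k),
\qquad
\Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^{2} + \alpha \sum_{j=1}^{T} \lvert w_j \rvert
```

The first sum is the training loss over all examples; the second penalizes each tree $f_k$ for having many leaves or large leaf weights.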

### 4. **Custom Loss Functions**
- Users can define **custom loss functions** to tailor the optimization to specific problems, beyond the default regression or classification losses.

### 5. **Tree Pruning**
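- XGBoost grows each tree up to `max_depth` and then **prunes it backward**, removing splits whose loss reduction falls below the `gamma` (`min_split_loss`) threshold. Traditional gradient boosting typically stops as soon as a split shows no gain, which can miss a weak split that enables a strong one beneath it.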

---

## ✅ Summary

- **XGBoost** is an optimized boosting implementation that improves on traditional gradient boosting in **speed**, **regularization**, and **handling of missing and complex data**.
- Features such as **custom loss functions** and **tree pruning** make it suitable for a wide variety of machine learning tasks, including large-scale datasets.

---


# 🧩 Hyperparameters in XGBoost and How to Tune Them

---

## 📌 Key Hyperparameters

### 1. **Learning Rate (eta)**  
- **Purpose**: Controls the contribution of each tree to the overall model.
- **Typical Range**: 0.01 to 0.3  
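- **Effect**: Each tree's prediction is scaled by `eta` before it is added to the ensemble, so a lower learning rate needs more boosting rounds (and more computation) to reach the same training loss.  
- **Tip**: A lower learning rate combined with more rounds often generalizes better. In the scikit-learn wrapper the parameter is named `learning_rate`.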
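
In symbols, the role of the learning rate can be sketched as the additive update below, where $\hat{y}_i^{(t)}$ is the prediction for example $i$ after $t$ rounds, $f_t$ is the tree fitted in round $t$, and $\eta$ is `eta` (the notation is assumed here, following common gradient boosting conventions).

```latex
\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + \eta \, f_t(x_i), \qquad 0 < \eta \le 1
```

A smaller $\eta$ means each tree corrects only a fraction of the remaining error, which is why it is usually paired with more rounds.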

---

### 2. **Number of Trees (n_estimators)**  
- **Purpose**: Determines the total number of boosting rounds or trees to build.
- **Effect**:  
  - **Larger values** may improve performance but increase computation time.
  - Too many trees can lead to overfitting, so balancing this with the learning rate is essential.

---

### 3. **Tree Depth (max_depth)**  
- **Purpose**: Limits the depth of individual trees, helping to balance bias and variance.
- **Effect**:  
  - **Shallow trees** generalize better, but may not capture complex relationships.
  - **Deeper trees** may overfit and capture noise in the data.
- **Typical Range**: 3-10, with larger depths increasing model complexity.

---

### 4. **Subsample**  
- **Purpose**: Controls the fraction of data used to train each tree.
- **Effect**: Helps reduce overfitting by randomly selecting a subset of the data for training each tree.
- **Typical Range**: 0.5 to 1.0  
- **Tip**: Lower values make the model more robust to overfitting but may lead to underfitting if too small.

---

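### 5. **Column Subsampling (colsample_bytree)**  
- **Purpose**: Controls the fraction of features sampled once for each tree.
- **Effect**: Helps prevent overfitting because every tree is built from a random subset of the features. Per-level and per-split sampling are controlled separately by `colsample_bylevel` and `colsample_bynode`.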
- **Typical Range**: 0.5 to 1.0  
- **Tip**: Reducing the fraction can make the model more generalizable by preventing it from relying too heavily on specific features.

---

### 6. **Regularization Parameters**  
- **lambda (L2 Regularization)**: Penalizes the squared leaf weights, shrinking them toward zero to prevent overfitting. Named `reg_lambda` in the scikit-learn wrapper.
- **alpha (L1 Regularization)**: Penalizes the absolute leaf weights, which can push some of them to exactly zero. Named `reg_alpha` in the scikit-learn wrapper.

---

## ✅ Summary of Hyperparameter Tuning

- **Learning Rate (eta)**: Lower rates require more trees for convergence but may improve generalization.
- **Number of Trees (n_estimators)**: A higher number of trees can improve performance but increases computation time.
- **Tree Depth (max_depth)**: Controls bias-variance tradeoff; shallow trees generalize better.
- **Subsample**: Helps reduce overfitting by using a subset of the data for each tree.
- **Colsample_bytree**: Prevents overfitting by using only a random subset of features for each tree.
- **Regularization (lambda, alpha)**: Helps reduce overfitting by penalizing overly complex models.

---

### 🎯 Tuning Tips
- Start with a **higher number of trees** and a **lower learning rate**.
- Use **cross-validation** to find the optimal balance between regularization and tree complexity.
- **Grid search** or **random search** can help identify the best combination of hyperparameters.

---


```python
import xgboost as xgb
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from xgboost import XGBClassifier

# Load the breast cancer dataset and hold out 20% for testing
data = load_breast_cancer()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(f"features: {data.feature_names}")
print(f"Classes: {data.target_names}")

# Convert dataset to DMatrix, XGBoost's native data structure
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

# Train XGBoost model with the native API
params = {
    'objective': 'binary:logistic',
    'eval_metric': 'logloss',
    'max_depth': 3,
    'eta': 0.1
}

xgb_model = xgb.train(params, dtrain, num_boost_round=100)

# predict() returns probabilities for binary:logistic; threshold at 0.5
y_pred = (xgb_model.predict(dtest) > 0.5).astype(int)

accuracy = accuracy_score(y_test, y_pred)
print(accuracy)

# Define hyperparameter grid (scikit-learn wrapper parameter names)
param_grid = {
    'learning_rate': [0.01, 0.1, 0.2],
    'n_estimators': [50, 100, 200],
    'max_depth': [3, 5, 7],
    'subsample': [0.8, 1.0],
    'colsample_bytree': [0.8, 1.0]
}

# Initialize XGBoost classifier
xgb_clf = XGBClassifier(eval_metric='logloss', random_state=42)

# Perform Grid Search with 5-fold cross-validation
grid_search = GridSearchCV(estimator=xgb_clf, param_grid=param_grid, cv=5, scoring='accuracy', n_jobs=-1)
grid_search.fit(X_train, y_train)

print(f"Best Parameters: {grid_search.best_params_}")
print(f"Best Cross-Validation Accuracy: {grid_search.best_score_}")
```

Output from the dataset summary and the `xgb.train` model:

```
features: ['mean radius' 'mean texture' 'mean perimeter' 'mean area'
 'mean smoothness' 'mean compactness' 'mean concavity'
 'mean concave points' 'mean symmetry' 'mean fractal dimension'
 'radius error' 'texture error' 'perimeter error' 'area error'
 'smoothness error' 'compactness error' 'concavity error'
 'concave points error' 'symmetry error' 'fractal dimension error'
 'worst radius' 'worst texture' 'worst perimeter' 'worst area'
 'worst smoothness' 'worst compactness' 'worst concavity'
 'worst concave points' 'worst symmetry' 'worst fractal dimension']
Classes: ['malignant' 'benign']
0.956140350877193
```
