# 🔥 1. What is Gradient Boosting?

**Gradient Boosting is an ensemble (model-combining) technique where:**
  * Models (typically decision trees) are added one at a time.

  * Each new tree tries to correct the errors made by the previous trees.

  * It minimizes a loss function by gradient descent in function space: each new tree is fit to the negative gradient of the loss.
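
The three points above can be written as one update rule. This is the standard textbook form, not something specific to this notebook; the symbols $F_m$ (model after $m$ trees), $h_m$ (the $m$-th tree), $\eta$ (the learning rate) and $L$ (the loss) are notation introduced here for illustration.

$$
\begin{aligned}
F_0(x) &= \arg\min_{c} \sum_{i} L(y_i, c) \\
r_{im} &= -\left[\frac{\partial L(y_i, F(x_i))}{\partial F(x_i)}\right]_{F = F_{m-1}} \\
F_m(x) &= F_{m-1}(x) + \eta \, h_m(x), \quad \text{where } h_m \text{ is fit to } \{(x_i, r_{im})\}
\end{aligned}
$$

For squared error, $L = \tfrac{1}{2}(y - F)^2$, the negative gradient is $r_{im} = y_i - F_{m-1}(x_i)$, which is just the ordinary residual.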

**🧠 Think of it as:**
"Make a prediction → See where you're wrong → Train a new tree to fix those errors → Repeat."



### 🛠 Gradient Boosting Flow:
  * Start with an initial model (say, mean of target).

  * Calculate the residuals (errors).

  * Train a tree on the residuals.

  * Add this tree to the model prediction.

  * Repeat the process multiple times.
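
The flow above can be sketched from scratch for a one-feature regression problem. This is a minimal illustration, not how scikit-learn implements it: the synthetic data, the `fit_stump` helper, and the use of depth-1 trees (stumps) are all assumptions made for brevity.

```python
import numpy as np

def fit_stump(x, residuals):
    """Find the single-threshold split on a 1-D feature that minimizes squared error."""
    best = None
    for t in np.unique(x)[:-1]:  # skip the max so the right side is never empty
        left = x <= t
        left_value = residuals[left].mean()
        right_value = residuals[~left].mean()
        sse = ((residuals[left] - left_value) ** 2).sum() \
            + ((residuals[~left] - right_value) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left_value, right_value)
    _, t, left_value, right_value = best
    return lambda z: np.where(z <= t, left_value, right_value)

# Hypothetical toy data: a noisy sine curve
rng = np.random.default_rng(0)
x = rng.uniform(0, 6, size=200)
y = np.sin(x) + rng.normal(scale=0.1, size=200)

learning_rate, n_rounds = 0.1, 100
prediction = np.full_like(y, y.mean())       # start with the mean of the target
initial_mse = np.mean((y - prediction) ** 2)

stumps = []
for _ in range(n_rounds):
    residuals = y - prediction               # calculate the residuals
    stump = fit_stump(x, residuals)          # train a tree on the residuals
    prediction += learning_rate * stump(x)   # add the tree to the prediction
    stumps.append(stump)                     # repeat

final_mse = np.mean((y - prediction) ** 2)
print(f"MSE before boosting: {initial_mse:.4f} | after {n_rounds} rounds: {final_mse:.4f}")
```

Predicting for new inputs would mean starting from `y.mean()` and adding `learning_rate * stump(x_new)` for every stored stump.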

## Gradient Boosting using GradientBoostingClassifier
**We'll use the Breast Cancer dataset from sklearn:**

In [122]:
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
import pandas as pd 
from xgboost import XGBClassifier
import time
import numpy as np

In [18]:
data = load_breast_cancer()
data.keys()

dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename', 'data_module'])

In [19]:
df = pd.DataFrame(data.data, columns=data.feature_names)
df["Target"] = data.target
X, y = df.drop("Target", axis=1), df["Target"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


In [20]:
gb_model = GradientBoostingClassifier(n_estimators=100, max_depth=3, learning_rate=0.1, random_state=42)

In [21]:
gb_model.fit(X_train, y_train)

In [22]:
y_pred = gb_model.predict(X_test)

In [23]:
print("Gradient Boosting Accuracy:", accuracy_score(y_test, y_pred))

Gradient Boosting Accuracy: 0.956140350877193


## XGBoost using XGBClassifier

In [51]:
# use_label_encoder is deprecated in recent XGBoost releases, so it is omitted here
xgb_model = XGBClassifier(learning_rate=0.1, n_estimators=1000, max_depth=5, eval_metric="logloss")


In [52]:
xgb_model.fit(X_train, y_train)

In [53]:
y_pred_xgb = xgb_model.predict(X_test)

In [54]:
print("XGBoost Accuracy:", accuracy_score(y_test, y_pred_xgb))

XGBoost Accuracy: 0.956140350877193


## Issues with Gradient Boosting and How XGBoost Solves Them
* Slow Training (No Parallelism)
* No Regularization → Overfitting Risk
* Handling Missing Data: Not Supported Natively
* Limited Tree Pruning: Greedy & Shallow
* Resource Heavy on Large Data

## How XGBoost Solves These:

| Problem in GBM | XGBoost Fix |
|---|---|
| Slow training | Parallelizes the construction of each tree (`n_jobs`) |
| No regularization | Adds L1 (`reg_alpha`) and L2 (`reg_lambda`) penalties |
| Missing data | Handles missing values natively |
| Tree pruning | Uses smart pruning based on loss gain (`gamma`) |
| Resource heavy | Uses `DMatrix` for optimized memory |

### 1. Slow Training (No Parallelism) 
*Reusing the previously loaded data for the comparisons below.*

#### Train Gradient Boosting (Traditional) and Record Time

In [97]:
start_time_gb = time.time()
gb_model = GradientBoostingClassifier(n_estimators=100, random_state=42, learning_rate=0.1)
# gb_model = GradientBoostingClassifier(n_estimators=100, max_depth=3, learning_rate=0.1, random_state=42)
gb_model.fit(X_train, y_train)
end_time_gb = time.time()

gb_time = end_time_gb - start_time_gb
gb_preds = gb_model.predict(X_test)
gb_acc = accuracy_score(y_test, gb_preds)

print(f"Gradient Boosting Time: {gb_time:.4f}s | Accuracy: {gb_acc:.4f}")

Gradient Boosting Time: 0.3318s | Accuracy: 0.9415


In [101]:
print("🔁 Training XGBoost (default n_jobs, all available threads)...")
start_time = time.time()

# 2. XGBoost with parallel training (default: n_jobs=None, which uses all available threads)
xgb_parallel = XGBClassifier(
    n_estimators=100,
    learning_rate=0.1,
    max_depth=5,
    eval_metric="logloss"
)

xgb_parallel.fit(X_train, y_train)
time_parallel = time.time() - start_time
acc_parallel = accuracy_score(y_test, xgb_parallel.predict(X_test))

print(f"✅ XGBoost (Parallel) - Time: {time_parallel:.4f}s | Accuracy: {acc_parallel:.4f}\n")

# ------------------------------------------------------------

print("🔁 Training XGBoost (n_jobs = 1, no parallelism)...")
start_time = time.time()

# 3. XGBoost with a single thread (like sklearn)
xgb_single_thread = XGBClassifier(
    n_estimators=100,
    learning_rate=0.1,
    max_depth=5,
    eval_metric="logloss",
    n_jobs=1  # one thread only, matching the label printed above
)

xgb_single_thread.fit(X_train, y_train)
time_single = time.time() - start_time
acc_single = accuracy_score(y_test, xgb_single_thread.predict(X_test))

print(f"✅ XGBoost (Single Thread) - Time: {time_single:.4f}s | Accuracy: {acc_single:.4f}")

🔁 Training XGBoost (default n_jobs, all available threads)...
✅ XGBoost (Parallel) - Time: 0.1157s | Accuracy: 0.9357

🔁 Training XGBoost (n_jobs = 1, no parallelism)...
✅ XGBoost (Single Thread) - Time: 0.0507s | Accuracy: 0.9357


**On a dataset this small, `n_jobs` makes little difference and thread overhead can even dominate; the speedup only shows up on large datasets.**

### 2. No Regularization → Overfitting Risk

In [103]:
gb_overfit = GradientBoostingClassifier(n_estimators=500, max_depth=10, learning_rate=0.1)
gb_overfit.fit(X_train, y_train)

print("Train Accuracy:", gb_overfit.score(X_train, y_train))
print("Test Accuracy:", gb_overfit.score(X_test, y_test))

Train Accuracy: 1.0
Test Accuracy: 0.9122807017543859


In [116]:
xgb_no_reg = XGBClassifier(
    n_estimators=100,
    learning_rate=0.1,
    max_depth=4,
    eval_metric='logloss',
    reg_alpha=0,     # No L1 (the default is also 0)
    reg_lambda=0     # No L2 (the default is 1, so this switches it off)
)
xgb_no_reg.fit(X_train, y_train)

print("Train Accuracy with no Regularization:", xgb_no_reg.score(X_train, y_train))
print("Test Accuracy with no Regularization:", xgb_no_reg.score(X_test, y_test))
# With regularization
xgb_with_reg = XGBClassifier(
    n_estimators=100,
    learning_rate=0.1,
    max_depth=4,
    eval_metric='logloss',
    reg_alpha=10,     # L1 penalty (sparsity)
    reg_lambda=15     # L2 penalty (shrinkage)
)
xgb_with_reg.fit(X_train, y_train)
print("Reduce Overfitting")
print("Train Accuracy with Regularization:", xgb_with_reg.score(X_train, y_train))
print("Test Accuracy with Regularization:", xgb_with_reg.score(X_test, y_test))

Train Accuracy with no Regularization: 1.0
Test Accuracy with no Regularization: 0.956140350877193
Reduce Overfitting
Train Accuracy with Regularization: 0.9868131868131869
Test Accuracy with Regularization: 0.956140350877193
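
To see why the penalties pull training accuracy below 1.0, it helps to look at how they enter the value of a single leaf. The sketch below uses the closed-form leaf weight from the XGBoost objective, where `G` and `H` are the summed gradients and hessians of the rows in the leaf. The function name `leaf_weight` and the example numbers are made up for illustration.

```python
def leaf_weight(G, H, reg_lambda=1.0, reg_alpha=0.0):
    """Closed-form leaf value for summed gradient G and summed hessian H.

    L1 (reg_alpha) soft-thresholds the gradient sum, so weak leaves become 0.
    L2 (reg_lambda) enlarges the denominator, so every leaf shrinks toward 0.
    """
    if G > reg_alpha:
        return -(G - reg_alpha) / (H + reg_lambda)
    if G < -reg_alpha:
        return -(G + reg_alpha) / (H + reg_lambda)
    return 0.0

# Hypothetical leaf statistics
G, H = -6.0, 4.0
for alpha, lam in [(0, 0), (0, 15), (2, 15), (10, 15)]:
    w = leaf_weight(G, H, reg_lambda=lam, reg_alpha=alpha)
    print(f"reg_alpha={alpha:>2}, reg_lambda={lam:>2} -> leaf weight {w:.4f}")
```

With both penalties at 0 the leaf takes the full step `-G / H`; raising `reg_lambda` shrinks it, and a large enough `reg_alpha` zeroes it out entirely.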


### ✅ 3. No Native Handling of Missing Values
**❌ Problem:**
scikit-learn's `GradientBoostingClassifier` raises a `ValueError` if the input contains missing values (NaN).

**✅ XGBoost Solution:**
Handles missing data internally, so there is no need to fill or impute.

✅ At each split, XGBoost automatically learns which branch missing values should follow.
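
The idea of a learned default direction can be sketched for a single split: score the split once with the missing rows sent left and once with them sent right, then keep the better option. This is a simplified illustration of the principle, not XGBoost's actual code; the helper names, the toy gradients, and the fixed threshold are assumptions.

```python
import numpy as np

def structure_score(g, h, reg_lambda=1.0):
    """Score of a node given the gradients and hessians of its rows."""
    return g.sum() ** 2 / (h.sum() + reg_lambda)

def best_default_direction(x, g, h, threshold, reg_lambda=1.0):
    """Try sending NaN rows left, then right; return the better direction and both gains."""
    missing = np.isnan(x)
    x_filled = np.where(missing, threshold, x)   # placeholder so comparisons never see NaN
    left = ~missing & (x_filled < threshold)
    right = ~missing & (x_filled >= threshold)
    parent = structure_score(g, h, reg_lambda)

    gains = {}
    for direction in ("left", "right"):
        l_mask = left | missing if direction == "left" else left
        r_mask = right | missing if direction == "right" else right
        gains[direction] = 0.5 * (
            structure_score(g[l_mask], h[l_mask], reg_lambda)
            + structure_score(g[r_mask], h[r_mask], reg_lambda)
            - parent
        )
    return max(gains, key=gains.get), gains

# Toy feature with two missing entries whose gradients resemble the left-hand rows
x = np.array([1.0, 2.0, np.nan, 8.0, 9.0, np.nan])
g = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, -1.0])
h = np.ones_like(g)

direction, gains = best_default_direction(x, g, h, threshold=5.0)
print("Default direction for missing values:", direction)
print({k: round(v, 4) for k, v in gains.items()})
```

Because the missing rows have gradients similar to the rows below the threshold, grouping them on that side gives the larger gain.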

**🔍 Code Comparison:**

### Using the previous GB and XGB models
Here I just edit the data to add a missing value.

In [123]:
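X_train = X_train.copy()      # work on a copy so the assignment below is unambiguous
X_train.iloc[0, 0] = np.nan   # X_train is a DataFrame, so index by position with .iloc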

In [None]:
# ✅ XGBoost: handles the missing value
xgb_model.fit(X_train, y_train)
print("✅ XGBoost trained successfully with missing values!")
# ❌ Gradient Boosting: will raise an error
try:
    gb_model.fit(X_train, y_train)
except ValueError as e:
    print(f"❌ Gradient Boosting failed: {e}")

✅ XGBoost trained successfully with missing values!
❌ Gradient Boosting failed: Input X contains NaN.
GradientBoostingClassifier does not accept missing values encoded as NaN natively. For supervised learning, you might want to consider sklearn.ensemble.HistGradientBoostingClassifier and Regressor which accept missing values encoded as NaNs natively. Alternatively, it is possible to preprocess the data, for instance by using an imputer transformer in a pipeline or drop samples with missing values. See https://scikit-learn.org/stable/modules/impute.html You can find a list of all estimators that handle NaN values at the following page: https://scikit-learn.org/stable/modules/impute.html#estimators-that-handle-nan-values


### ✅ 4. Limited Tree Pruning (Greedy Splitting)
**❌ Problem:**
Gradient Boosting splits greedily and stops early if the gain is small → suboptimal trees.

**✅ XGBoost Solution:**
Uses post-pruning with gamma (γ): a branch is kept only if its gain is large enough.
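
The role of gamma can be sketched with the split-gain formula from the XGBoost objective: gamma is subtracted from the raw gain, and a split whose result is not positive is not worth keeping. The function names and the example statistics below are made up for illustration.

```python
def node_score(G, H, reg_lambda):
    """Score of a node with summed gradient G and summed hessian H."""
    return G * G / (H + reg_lambda)

def split_gain(G_left, H_left, G_right, H_right, reg_lambda=1.0, gamma=0.0):
    """Loss reduction from splitting a node, minus the complexity cost gamma."""
    parent = node_score(G_left + G_right, H_left + H_right, reg_lambda)
    children = node_score(G_left, H_left, reg_lambda) + node_score(G_right, H_right, reg_lambda)
    return 0.5 * (children - parent) - gamma

# Hypothetical gradient statistics for one candidate split
stats = dict(G_left=-4.0, H_left=3.0, G_right=2.0, H_right=3.0)

for gamma in (0.0, 5.0):
    gain = split_gain(**stats, gamma=gamma)
    decision = "keep split" if gain > 0 else "prune split"
    print(f"gamma={gamma}: gain={gain:.4f} -> {decision}")
```

The same candidate split is kept with `gamma=0` and pruned with `gamma=5`, which is why a larger gamma tends to produce simpler trees.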

**🔍 Code Comparison:**

In [None]:
xgb_no_gamma = XGBClassifier(max_depth=5, eval_metric='logloss')
xgb_no_gamma.fit(X_train, y_train)

xgb_gamma = XGBClassifier(max_depth=5, gamma=5, eval_metric='logloss')
xgb_gamma.fit(X_train, y_train)

print("Accuracy without gamma:", xgb_no_gamma.score(X_test, y_test))
print("Accuracy with gamma:", xgb_gamma.score(X_test, y_test))

✅ With gamma, trees are simpler; whether accuracy also improves depends on the dataset.

### ✅ 5. Resource-Heavy on Big Data
**❌ Problem:**
Gradient Boosting doesn't scale well: training becomes slow on large datasets.

**✅ XGBoost Solution:**
Uses the optimized data structure `DMatrix` together with `tree_method='hist'` for faster training and lower memory usage.
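
The histogram idea behind `tree_method='hist'` can be sketched with NumPy: bucket a feature into a small number of quantile bins, accumulate gradient sums per bin, and evaluate splits only at bin boundaries. This is a conceptual illustration rather than XGBoost's implementation; the random data, the bin count, and the variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)   # one feature column
g = rng.normal(size=10_000)   # stand-in per-row gradients

max_bins = 16
# Interior quantile cut points: max_bins - 1 candidate thresholds
edges = np.quantile(x, np.linspace(0, 1, max_bins + 1)[1:-1])

# Map every row to a bin index in 0 .. max_bins - 1
bin_idx = np.searchsorted(edges, x, side="right")

# One pass over the rows builds the per-bin statistics
grad_hist = np.bincount(bin_idx, weights=g, minlength=max_bins)
count_hist = np.bincount(bin_idx, minlength=max_bins)

# A prefix sum gives the gradient total on the left of each candidate split
left_grad = np.cumsum(grad_hist)[:-1]

print(f"{len(edges)} candidate thresholds instead of {np.unique(x).size - 1}")
print("Rows per bin:", count_hist)
```

Split search then costs a scan over 16 bins per feature rather than over every distinct value, which is where the speed and memory savings come from.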

**🔍 Code Comparison:**

In [None]:
from sklearn.datasets import make_classification
X_big, y_big = make_classification(n_samples=100000, n_features=50, random_state=42)

start = time.time()
xgb_big = XGBClassifier(n_estimators=100, tree_method='hist', eval_metric='logloss')
xgb_big.fit(X_big, y_big)
print("Training on large dataset:", round(time.time() - start, 2), "s")

✅ The hist method keeps training fast at this scale; time `GradientBoostingClassifier` on the same data to see the gap.