# 🔥 1. What is Gradient Boosting?

**Gradient Boosting is an ensemble (model-combining) technique where:**
  * Models (typically decision trees) are added one at a time.

  * Each new tree tries to correct the errors made by the previous trees.

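  * The ensemble minimizes a loss function by gradient descent: each new tree is fit to the negative gradient of the loss with respect to the current predictions.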

**🧠 Think of it as:**
"Make a prediction → See where you're wrong → Train a new tree to fix those errors → Repeat."



### 🛠 Gradient Boosting Flow:
  * Start with an initial model (say, mean of target).

  * Calculate the residuals (errors).

  * Train a tree on the residuals.

  * Add this tree's output, scaled by the learning rate, to the model's prediction.

  * Repeat the process multiple times.
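
The flow above can be sketched from scratch for a regression problem with squared-error loss, where the residuals are exactly the negative gradients. This is only a minimal illustration, not how sklearn implements it internally; the helper names (`fit_toy_boosting`, `predict_toy_boosting`), the toy sine data, and the hyperparameter values are all assumptions made for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_toy_boosting(X, y, n_trees=50, learning_rate=0.1, max_depth=2):
    """Toy gradient boosting for squared-error loss (hypothetical helper)."""
    base = float(np.mean(y))            # step 1: initial model = mean of target
    pred = np.full(len(y), base)
    trees = []
    for _ in range(n_trees):            # step 5: repeat
        residuals = y - pred            # step 2: residuals (negative gradient)
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residuals)          # step 3: train a tree on the residuals
        pred = pred + learning_rate * tree.predict(X)  # step 4: add it to the model
        trees.append(tree)
    return base, trees


def predict_toy_boosting(X, base, trees, learning_rate=0.1):
    """Sum the base value and the scaled output of every tree."""
    pred = np.full(X.shape[0], base)
    for tree in trees:
        pred = pred + learning_rate * tree.predict(X)
    return pred


# Toy data: a noisy sine curve (illustrative only)
rng = np.random.default_rng(0)
X_toy = rng.uniform(-3, 3, size=(200, 1))
y_toy = np.sin(X_toy[:, 0]) + rng.normal(scale=0.1, size=200)

base, trees = fit_toy_boosting(X_toy, y_toy)
mse_start = float(np.mean((y_toy - base) ** 2))
mse_end = float(np.mean((y_toy - predict_toy_boosting(X_toy, base, trees)) ** 2))
print(f"Training MSE with the mean only: {mse_start:.4f}")
print(f"Training MSE after {len(trees)} trees: {mse_end:.4f}")
```

The learning rate must be the same at fit and predict time, which is why both helpers default to 0.1.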

## Gradient Boosting using GradientBoostingClassifier
**We'll use the Breast Cancer dataset from sklearn:**

In [1]:
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
import pandas as pd 
from xgboost import XGBClassifier
import time
import numpy as np

In [2]:
data = load_breast_cancer()
data.keys()

dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename', 'data_module'])

In [3]:
df = pd.DataFrame(data.data, columns=data.feature_names)
df["Target"] = data.target
X, y = df.drop("Target", axis=1), df["Target"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train

Unnamed: 0,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,...,worst radius,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension
68,9.029,17.33,58.79,250.5,0.10660,0.14130,0.31300,0.04375,0.2111,0.08046,...,10.310,22.65,65.50,324.7,0.14820,0.43650,1.25200,0.17500,0.4228,0.11750
181,21.090,26.57,142.70,1311.0,0.11410,0.28320,0.24870,0.14960,0.2395,0.07398,...,26.680,33.48,176.50,2089.0,0.14910,0.75840,0.67800,0.29030,0.4098,0.12840
63,9.173,13.86,59.20,260.9,0.07721,0.08751,0.05988,0.02180,0.2341,0.06963,...,10.010,19.23,65.59,310.1,0.09836,0.16780,0.13970,0.05087,0.3282,0.08490
248,10.650,25.22,68.01,347.0,0.09657,0.07234,0.02379,0.01615,0.1897,0.06329,...,12.250,35.19,77.98,455.7,0.14990,0.13980,0.11250,0.06136,0.3409,0.08147
60,10.170,14.88,64.55,311.9,0.11340,0.08061,0.01084,0.01290,0.2743,0.06960,...,11.020,17.45,69.86,368.6,0.12750,0.09866,0.02168,0.02579,0.3557,0.08020
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
71,8.888,14.64,58.79,244.0,0.09783,0.15310,0.08606,0.02872,0.1902,0.08980,...,9.733,15.67,62.56,284.4,0.12070,0.24360,0.14340,0.04786,0.2254,0.10840
106,11.640,18.33,75.17,412.5,0.11420,0.10170,0.07070,0.03485,0.1801,0.06520,...,13.140,29.26,85.51,521.7,0.16880,0.26600,0.28730,0.12180,0.2806,0.09097
270,14.290,16.82,90.30,632.6,0.06429,0.02675,0.00725,0.00625,0.1508,0.05376,...,14.910,20.65,94.44,684.6,0.08567,0.05036,0.03866,0.03333,0.2458,0.06120
435,13.980,19.62,91.12,599.5,0.10600,0.11330,0.11260,0.06463,0.1669,0.06544,...,17.040,30.80,113.90,869.3,0.16130,0.35680,0.40690,0.18270,0.3179,0.10550


In [4]:
gb_model = GradientBoostingClassifier(n_estimators=100, max_depth=3, learning_rate=0.1, random_state=42)

In [5]:
gb_model.fit(X_train, y_train)

In [6]:
y_pred = gb_model.predict(X_test)

In [7]:
print("Gradient Boosting Accuracy:", accuracy_score(y_test, y_pred))

Gradient Boosting Accuracy: 0.956140350877193
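
To see the "one tree at a time" behaviour on this dataset, a small sketch can use `staged_predict`, which yields the ensemble's predictions after each added tree. The variable names below (`X_st`, `staged_model`, and so on) are chosen for this example so they do not overwrite the notebook's `X_train` and `gb_model`; the checkpoints printed are arbitrary.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X_st, y_st = load_breast_cancer(return_X_y=True)
X_st_train, X_st_test, y_st_train, y_st_test = train_test_split(
    X_st, y_st, test_size=0.2, random_state=42
)

staged_model = GradientBoostingClassifier(
    n_estimators=100, max_depth=3, learning_rate=0.1, random_state=42
)
staged_model.fit(X_st_train, y_st_train)

# staged_predict yields predictions after 1, 2, ..., n_estimators trees
staged_acc = [
    accuracy_score(y_st_test, stage_pred)
    for stage_pred in staged_model.staged_predict(X_st_test)
]

for n_trees in (1, 10, 50, 100):
    print(f"{n_trees:>3} trees -> test accuracy {staged_acc[n_trees - 1]:.4f}")
```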


## XGBoost using XGBClassifier

In [8]:
# xgb_model = XGBClassifier(learning_rate = 0.1, n_estimators = 1000, max_depth = 5, eval_metric = "logloss", use_label_encoder = False)
xgb_model = XGBClassifier(learning_rate = 0.1, n_estimators = 1000, max_depth = 5, eval_metric='logloss')


In [9]:
xgb_model.fit(X_train, y_train)

In [10]:
y_pred_xgb = xgb_model.predict(X_test)

In [11]:
print("XGBoost Accuracy:", accuracy_score(y_test, y_pred_xgb))

XGBoost Accuracy: 0.956140350877193


## Issues with Gradient Boosting and How XGBoost Solves Them
* Slow training (no parallelism)
* No explicit L1/L2 regularization → overfitting risk
* Missing data: not supported natively
* Limited tree pruning: greedy splitting only
* Resource heavy on large data

## How XGBoost Solves These:

**Problem in GBM --> XGBoost Fix**

* Slow training --> Parallelizes split finding within each tree across CPU threads (`n_jobs`)
* No explicit regularization --> Adds L1 (`reg_alpha`) and L2 (`reg_lambda`) penalties
* Missing data --> Handles missing values natively
* Limited tree pruning --> Prunes splits whose loss-based gain falls below a threshold (`gamma`)
* Resource heavy --> Uses DMatrix for optimized memory

### 1. Slow Training (No Parallelism) 
*Using the previously loaded data for the experiments below.*

#### Train Gradient Boosting (Traditional) and Record Time

In [12]:
start_time_gb = time.time()
gb_model = GradientBoostingClassifier(n_estimators=100, random_state=42, learning_rate=0.1)
# gb_model = GradientBoostingClassifier(n_estimators=100, max_depth=3, learning_rate=0.1, random_state=42)
gb_model.fit(X_train, y_train)
end_time_gb = time.time()

gb_time = end_time_gb - start_time_gb
gb_preds = gb_model.predict(X_test)
gb_acc = accuracy_score(y_test, gb_preds)

print(f"Gradient Boosting Time: {gb_time:.4f}s | Accuracy: {gb_acc:.4f}")

Gradient Boosting Time: 0.3637s | Accuracy: 0.9561


In [13]:
print("🔁 Training XGBoost (Default n_jobs = -1)...")
start_time = time.time()

# 2. XGBoost with parallel training (n_jobs left unset, so all available threads are used)
xgb_parallel = XGBClassifier(
    n_estimators=100,
    learning_rate=0.1,
    max_depth=5,
    eval_metric="logloss"
)

xgb_parallel.fit(X_train, y_train)
time_parallel = time.time() - start_time
acc_parallel = accuracy_score(y_test, xgb_parallel.predict(X_test))

print(f"✅ XGBoost (Parallel) - Time: {time_parallel:.4f}s | Accuracy: {acc_parallel:.4f}\n")

# ------------------------------------------------------------

print("🔁 Training XGBoost (n_jobs = 1, no parallelism)...")
start_time = time.time()

# 3. XGBoost with Single Thread (like sklearn)
xgb_single_thread = XGBClassifier(
    n_estimators=100,
    learning_rate=0.1,
    max_depth=5,
    eval_metric="logloss",
    n_jobs=1  # force a single thread (1, not 0)
)

xgb_single_thread.fit(X_train, y_train)
time_single = time.time() - start_time
acc_single = accuracy_score(y_test, xgb_single_thread.predict(X_test))

print(f"✅ XGBoost (Single Thread) - Time: {time_single:.4f}s | Accuracy: {acc_single:.4f}")

🔁 Training XGBoost (Default n_jobs = -1)...
✅ XGBoost (Parallel) - Time: 0.0487s | Accuracy: 0.9561

🔁 Training XGBoost (n_jobs = 1, no parallelism)...
✅ XGBoost (Single Thread) - Time: 0.0447s | Accuracy: 0.9561


**With only 455 training rows, `n_jobs` makes no measurable difference here; the benefit of multithreading shows up on large datasets.**

### 2. No Regularization → Overfitting Risk

In [14]:
gb_overfit = GradientBoostingClassifier(n_estimators=500, max_depth=10, learning_rate=0.1)
gb_overfit.fit(X_train, y_train)

print("Train Accuracy:", gb_overfit.score(X_train, y_train))
print("Test Accuracy:", gb_overfit.score(X_test, y_test))

Train Accuracy: 1.0
Test Accuracy: 0.9385964912280702


In [15]:
xgb_no_reg = XGBClassifier(
    n_estimators=100,
    learning_rate=0.1,
    max_depth=4,
    eval_metric='logloss',
    reg_alpha=0,     # no L1 penalty (0 is also the default)
    reg_lambda=0     # no L2 penalty (note: the default is 1, not 0)
)
xgb_no_reg.fit(X_train, y_train)

print("Train Accuracy with no Regularization:", xgb_no_reg.score(X_train, y_train))
print("Test Accuracy with no Regularization:", xgb_no_reg.score(X_test, y_test))

# With regularization
xgb_with_reg = XGBClassifier(
    n_estimators=100,
    learning_rate=0.1,
    max_depth=4,
    eval_metric='logloss',
    reg_alpha=10,     # L1 penalty (sparsity)
    reg_lambda=15     # L2 penalty (shrinkage)
)
xgb_with_reg.fit(X_train, y_train)
print("Reduce OverFitting")
print("Train Accuracy with Regularization:", xgb_with_reg.score(X_train, y_train))
print("Test Accuracy with Regularization:", xgb_with_reg.score(X_test, y_test))

Train Accuracy with no Regularization: 1.0
Test Accuracy with no Regularization: 0.956140350877193
Reduce OverFitting
Train Accuracy with Regularization: 0.9868131868131869
Test Accuracy with Regularization: 0.956140350877193
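
To get a feel for why `reg_alpha` and `reg_lambda` pull the training accuracy down, it helps to look at a single leaf. The sketch below is a simplified reading of XGBoost's regularized objective, in which the best leaf value is the soft-thresholded gradient sum divided by the Hessian sum plus lambda. The function name `leaf_weight` and the gradient/Hessian numbers are made up for illustration; this is not XGBoost's actual code.

```python
def leaf_weight(grad_sum, hess_sum, reg_lambda=1.0, reg_alpha=0.0):
    """Optimal value of one leaf under L1 (alpha) and L2 (lambda) penalties.

    grad_sum / hess_sum are the summed first and second derivatives of the
    loss over the samples in the leaf (hypothetical inputs for this sketch).
    """
    # L1: soft-threshold the gradient sum toward zero by alpha
    if grad_sum > reg_alpha:
        shrunk = grad_sum - reg_alpha
    elif grad_sum < -reg_alpha:
        shrunk = grad_sum + reg_alpha
    else:
        return 0.0  # the leaf is switched off entirely
    # L2: lambda inflates the denominator, shrinking the weight
    return -shrunk / (hess_sum + reg_lambda)


G, H = -12.0, 2.0  # illustrative gradient and Hessian sums for one leaf

w_none = leaf_weight(G, H, reg_lambda=0.0, reg_alpha=0.0)
w_l2 = leaf_weight(G, H, reg_lambda=15.0, reg_alpha=0.0)
w_both = leaf_weight(G, H, reg_lambda=15.0, reg_alpha=10.0)
w_off = leaf_weight(-4.0, H, reg_lambda=15.0, reg_alpha=10.0)

print("no regularization :", w_none)
print("lambda=15         :", w_l2)
print("lambda=15, alpha=10:", w_both)
print("small gradient, alpha=10:", w_off)
```

Each penalty makes the leaf's contribution smaller, so the model fits the training data less aggressively.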


### ✅ 3. No Native Handling of Missing Values
**❌ Problem:**
sklearn's `GradientBoostingClassifier` raises a `ValueError` if the input contains missing values (NaN).

**✅ XGBoost Solution:**
Handles missing data internally, so there is no need to fill or impute.

✅ For each split, XGBoost learns a default direction for missing values automatically.

**🔍 Code Comparison:**

### Using the previous GB and XGB models
Here we only edit the training data to add a missing value.

In [16]:
X_train = X_train.copy()       # work on an explicit copy of the training split
X_train.iloc[0, 0] = np.nan    # single-step positional assignment (no chained indexing)


In [17]:
# ✅ XGBoost: handles missing values
xgb_model.fit(X_train, y_train)
print("✅ XGBoost trained successfully with missing values!")

# ❌ Gradient Boosting: will raise an error
try:
    gb_model.fit(X_train, y_train)
except ValueError as e:
    print(f"❌ Gradient Boosting failed: {e}")

✅ XGBoost trained successfully with missing values!
❌ Gradient Boosting failed: Input X contains NaN.
GradientBoostingClassifier does not accept missing values encoded as NaN natively. For supervised learning, you might want to consider sklearn.ensemble.HistGradientBoostingClassifier and Regressor which accept missing values encoded as NaNs natively. Alternatively, it is possible to preprocess the data, for instance by using an imputer transformer in a pipeline or drop samples with missing values. See https://scikit-learn.org/stable/modules/impute.html You can find a list of all estimators that handle NaN values at the following page: https://scikit-learn.org/stable/modules/impute.html#estimators-that-handle-nan-values


### ✅ 4. Limited Tree Pruning (Greedy Splitting)
**❌ Problem:**
Gradient Boosting grows each tree greedily and stops splitting a node as soon as the gain there is small, which can produce suboptimal trees.

**✅ XGBoost Solution:**
Uses post-pruning with gamma (γ): a tree is grown to `max_depth` first, then splits whose gain is below gamma are pruned away.

✅ With a positive gamma, trees are simpler and can generalize better.

**🔍 Code Comparison:**


In [18]:
X_train = X_train.dropna()
y_train = y_train.loc[X_train.index]  # keep labels aligned

In [19]:
# Note: 'logloss' is the binary-classification metric ('mlogloss' is for multiclass)
# ✅ XGBoost with gamma=0 (no minimum gain required to keep a split)
xgb_no_gamma = XGBClassifier(max_depth=5, gamma=0, eval_metric='logloss')
xgb_no_gamma.fit(X_train, y_train)

# ✅ XGBoost with gamma=5 (post-pruning: splits with gain below 5 are removed)
xgb_gamma = XGBClassifier(max_depth=5, gamma=5, eval_metric='logloss')
xgb_gamma.fit(X_train, y_train)

# ❌ Gradient Boosting (greedy only, no post-pruning)
gb_model = GradientBoostingClassifier(max_depth=5)
gb_model.fit(X_train, y_train)

# 🔍 Accuracy Comparison
print("✅ XGBoost (No Gamma) Accuracy:", xgb_no_gamma.score(X_test, y_test))
print("✅ XGBoost (With Gamma) Accuracy:", xgb_gamma.score(X_test, y_test))
print("❌ GradientBoosting Accuracy:", gb_model.score(X_test, y_test))

✅ XGBoost (No Gamma) Accuracy: 0.956140350877193
✅ XGBoost (With Gamma) Accuracy: 0.956140350877193
❌ GradientBoosting Accuracy: 0.9649122807017544
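
The role of gamma can be made concrete with the split-gain expression from XGBoost's regularized objective: a split is worth keeping only if the improvement in the structure score exceeds gamma. The sketch below is a simplified illustration with L2 regularization only; the function name `split_gain` and the gradient/Hessian sums are hypothetical numbers, not values taken from the models above.

```python
def split_gain(g_left, h_left, g_right, h_right, reg_lambda=1.0, gamma=0.0):
    """Gain of splitting a node into left/right children, minus the gamma penalty.

    g_* and h_* are summed gradients and Hessians of the samples on each side
    (hypothetical inputs for this sketch).
    """
    def structure_score(g, h):
        return g * g / (h + reg_lambda)

    improvement = 0.5 * (
        structure_score(g_left, h_left)
        + structure_score(g_right, h_right)
        - structure_score(g_left + g_right, h_left + h_right)
    )
    return improvement - gamma


# The same candidate split, evaluated under two gamma settings
candidate = dict(g_left=-3.0, h_left=2.0, g_right=2.5, h_right=2.0)

gain_gamma_0 = split_gain(**candidate, gamma=0.0)
gain_gamma_5 = split_gain(**candidate, gamma=5.0)

print("gamma=0 ->", "keep" if gain_gamma_0 > 0 else "prune", f"(gain {gain_gamma_0:.3f})")
print("gamma=5 ->", "keep" if gain_gamma_5 > 0 else "prune", f"(gain {gain_gamma_5:.3f})")
```

A split that survives with `gamma=0` can be removed with `gamma=5`, which is how a larger gamma leads to smaller trees.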


### ✅ 5. Resource-Heavy on Big Data
**❌ Problem:**
Gradient Boosting doesn't scale well: training becomes slow on large datasets.

**✅ XGBoost Solution:**
Uses the optimized DMatrix data structure plus `tree_method='hist'` for faster training and lower memory usage.

**🔍 Code Comparison:**


In [22]:
from sklearn.datasets import make_classification

# Optional: to benchmark on a genuinely large dataset, uncomment the next line
# and fit both models on X_big, y_big. As written, the cell reuses the small
# X_train, y_train, which is what produced the timings below.
# X_big, y_big = make_classification(n_samples=100000, n_features=50, random_state=42)

# ✅ XGBoost - histogram-based tree method
start = time.time()
xgb_big = XGBClassifier(n_estimators=100, tree_method='hist', eval_metric='logloss')
xgb_big.fit(X_train, y_train)
print("XGBoost training time:", round(time.time() - start, 2), "seconds")

# ❌ Gradient Boosting - slower
start = time.time()
gb_big = GradientBoostingClassifier(n_estimators=100)
gb_big.fit(X_train, y_train)
print("Gradient Boosting training time:", round(time.time() - start, 2), "seconds")


XGBoost training time: 0.18 seconds
Gradient Boosting training time: 0.37 seconds


**Summary**

**Use GradientBoostingClassifier when:**

  * The dataset is small or medium-sized.

  * You want simplicity and integration with sklearn pipelines.

**Use XGBClassifier when:**

  * You need speed and scalability.

  * You want more regularization control.

  * You're working on larger, more complex datasets.
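
As an illustration of the pipeline integration mentioned for `GradientBoostingClassifier`, the sketch below chains an imputer and the classifier and scores the whole pipeline with cross-validation. The median imputation strategy, the 5 folds, and the variable names are assumptions made for the example. The imputer is a no-op on this clean dataset, but it is what would let the pipeline accept NaN inputs.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X_pipe, y_pipe = load_breast_cancer(return_X_y=True)

# Imputer + classifier behave as a single estimator
gb_pipeline = make_pipeline(
    SimpleImputer(strategy="median"),
    GradientBoostingClassifier(n_estimators=100, random_state=42),
)

# 5-fold cross-validated accuracy of the whole pipeline
cv_scores = cross_val_score(gb_pipeline, X_pipe, y_pipe, cv=5)
print("Fold accuracies:", cv_scores.round(4))
print("Mean accuracy:", round(float(cv_scores.mean()), 4))
```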