### Q1. What is Gradient Boosting Regression?

### Q2. Implement a simple gradient boosting algorithm from scratch using Python and NumPy. Use a simple regression problem as an example and train the model on a small dataset. Evaluate the model's performance using metrics such as mean squared error and R-squared.

### Q3. Experiment with different hyperparameters such as learning rate, number of trees, and tree depth to optimise the performance of the model. Use grid search or random search to find the best hyperparameters

### Q4. What is a weak learner in Gradient Boosting?

### Q5. What is the intuition behind the Gradient Boosting algorithm?

### Q6. How does Gradient Boosting algorithm build an ensemble of weak learners?

### Q7. What are the steps involved in constructing the mathematical intuition of Gradient Boosting algorithm?

## Answers

### Q1. What is Gradient Boosting Regression?



Gradient Boosting Regression is a machine learning technique used for regression tasks. It's an ensemble learning method that combines the predictions of multiple weak regression models (typically decision trees) to create a strong predictive model. Gradient Boosting Regression belongs to the broader family of boosting algorithms, and it's particularly effective for solving regression problems where the goal is to predict continuous numerical values rather than discrete categories.

#### Initialization:

- An initial model is created, usually a simple one like the mean or median of the target variable. This initial model serves as a starting point for the ensemble.

#### Sequential Training of Weak Learners:

- Gradient Boosting Regression trains a series of weak regression models sequentially.
- In each iteration, a new weak learner (typically a shallow decision tree) is trained to predict the residuals or errors from the previous ensemble's predictions.
- The residuals represent the difference between the actual target values and the predictions made by the current ensemble. The new weak learner is trained to capture the patterns in these residuals.
- The learning rate, which controls the contribution of each weak learner to the ensemble, can be adjusted to make the process more conservative or aggressive.

#### Combination of Weak Learners:

- After each iteration, the predictions of the new weak learner are combined with the predictions of the previous ensemble.
- The combination is done by adding the new weak learner's predictions (scaled by the learning rate) to the previous ensemble's predictions.

#### Sequential Iteration:

- The training and combination steps above are repeated for a predefined number of iterations or until a chosen performance criterion is met.
- With each iteration, the ensemble becomes more accurate in modeling the relationship between the input features and the target variable.

#### Final Model Output:

- The final prediction for a given input is the sum of the initial model's prediction and the predictions of all the weak learners in the ensemble.
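
The sequential improvement described above can be observed on a fitted model. The following is a minimal sketch, assuming a synthetic sine-shaped dataset (`X_demo`, `y_demo` are illustrative names, not part of the notebook) and using scikit-learn's `staged_predict`, which yields the ensemble's prediction after each added tree.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

# Synthetic data, assumed purely for illustration.
rng = np.random.default_rng(0)
X_demo = rng.uniform(-3, 3, size=(200, 1))
y_demo = np.sin(X_demo[:, 0]) + rng.normal(scale=0.1, size=200)

model = GradientBoostingRegressor(
    n_estimators=50, learning_rate=0.1, max_depth=2, random_state=0
)
model.fit(X_demo, y_demo)

# One training MSE per boosting stage: stage k uses the first k trees.
stage_mse = [
    mean_squared_error(y_demo, stage_pred)
    for stage_pred in model.staged_predict(X_demo)
]
print(f"MSE after 1 tree: {stage_mse[0]:.4f}")
print(f"MSE after {len(stage_mse)} trees: {stage_mse[-1]:.4f}")
```

The training error after the last stage should be well below the error after the first, which is the "each iteration corrects the previous ensemble" behaviour in numeric form.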

### Q2. Implement a simple gradient boosting algorithm from scratch using Python and NumPy. Use a simple regression problem as an example and train the model on a small dataset. Evaluate the model's performance using metrics such as mean squared error and R-squared.



In [1]:
import pandas as pd
import numpy as np
import seaborn as sns

In [2]:
df=sns.load_dataset('iris')

In [3]:
df.head()

|   | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |


In [4]:
# Encode the species labels as integers so they can be used as a numeric target
from sklearn.preprocessing import LabelEncoder
encoder=LabelEncoder()

In [5]:
df['species']=encoder.fit_transform(df['species'])

In [7]:
X=df.iloc[:,:-1]
y=df.iloc[:,-1]

In [8]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.25,random_state=42)

In [9]:
from sklearn.ensemble import GradientBoostingRegressor
gbr=GradientBoostingRegressor()

In [11]:
gbr.fit(X_train,y_train)

In [16]:
y_pred=gbr.predict(X_test)

In [29]:
from sklearn.metrics import r2_score,mean_squared_error

In [17]:
r2_score(y_test,y_pred)

0.9957266014015651

In [30]:
mean_squared_error(y_test,y_pred)

0.00300972740623842
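
The cells above rely on scikit-learn's `GradientBoostingRegressor`. For the "from scratch" part of the question, the following is a minimal NumPy-only sketch, not a reference implementation. It assumes squared-error loss, depth-1 trees (stumps) as weak learners, and a small synthetic dataset; the helper names `fit_stump`, `predict_stump`, `fit_gbm`, and `predict_gbm` are hypothetical and introduced only for this example.

```python
import numpy as np

def fit_stump(X, r):
    """Return (feature, threshold, left_value, right_value) minimizing squared error on r."""
    best_sse, best_stump = np.inf, None
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        # Candidate thresholds: midpoints between consecutive distinct feature values.
        for thr in (values[:-1] + values[1:]) / 2:
            left = X[:, j] <= thr
            left_val, right_val = r[left].mean(), r[~left].mean()
            sse = ((r[left] - left_val) ** 2).sum() + ((r[~left] - right_val) ** 2).sum()
            if sse < best_sse:
                best_sse, best_stump = sse, (j, thr, left_val, right_val)
    return best_stump

def predict_stump(stump, X):
    j, thr, left_val, right_val = stump
    return np.where(X[:, j] <= thr, left_val, right_val)

def fit_gbm(X, y, n_estimators=100, learning_rate=0.1):
    f0 = y.mean()                      # initial model: mean of the target
    pred = np.full(len(y), f0)
    stumps = []
    for _ in range(n_estimators):
        stump = fit_stump(X, y - pred)  # fit the current residuals
        pred += learning_rate * predict_stump(stump, X)
        stumps.append(stump)
    return f0, learning_rate, stumps

def predict_gbm(model, X):
    f0, learning_rate, stumps = model
    pred = np.full(X.shape[0], f0)
    for stump in stumps:
        pred += learning_rate * predict_stump(stump, X)
    return pred

# Small synthetic regression problem (an assumption, not the iris data used above).
rng = np.random.default_rng(0)
X_all = rng.uniform(-2, 2, size=(120, 2))
y_all = 3 * X_all[:, 0] + np.sin(2 * X_all[:, 1]) + rng.normal(scale=0.1, size=120)
X_fit, X_eval, y_fit, y_eval = X_all[:90], X_all[90:], y_all[:90], y_all[90:]

gbm = fit_gbm(X_fit, y_fit)
y_hat = predict_gbm(gbm, X_eval)

mse = np.mean((y_eval - y_hat) ** 2)
r2 = 1 - np.sum((y_eval - y_hat) ** 2) / np.sum((y_eval - y_eval.mean()) ** 2)
print(f"MSE={mse:.4f}, R2={r2:.4f}")
```

Stumps keep the split search short enough to read in one pass; deeper trees would need a recursive version of `fit_stump`.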

### Q3. Experiment with different hyperparameters such as learning rate, number of trees, and tree depth to optimise the performance of the model. Use grid search or random search to find the best hyperparameters



In [18]:
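param={
    'n_estimators':[100,200,300],
    'max_depth':[1,2,3],
    # In scikit-learn, alpha is the quantile used by the huber and quantile losses;
    # the learning rate is controlled by the separate learning_rate parameter.
    'alpha':[.5,.2,.9]
}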

In [19]:
import warnings
warnings.filterwarnings('ignore')
from sklearn.model_selection import GridSearchCV

In [20]:
clf_gbr=GridSearchCV(GradientBoostingRegressor(),param_grid=param,cv=5,scoring='r2')

In [22]:
clf_gbr.fit(X_train,y_train)

In [23]:
clf_gbr.best_params_

{'alpha': 0.2, 'max_depth': 1, 'n_estimators': 100}

In [24]:
new_gbr=GradientBoostingRegressor(alpha=0.2,max_depth=1,n_estimators=100)

In [25]:
new_gbr.fit(X_train,y_train)

In [27]:
y_pred_new=new_gbr.predict(X_test)

In [28]:
r2_score(y_test,y_pred_new)

0.9866286382623181

In [31]:
mean_squared_error(y_test,y_pred_new)

0.009417364880348025
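
To tune the learning rate itself, the grid needs a `learning_rate` key, since scikit-learn's `alpha` only parameterizes the huber and quantile losses. The following is a hedged sketch of such a search on synthetic data; `X_syn`, `y_syn`, and the specific grid values are assumptions chosen for illustration, so no results are quoted.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic data, assumed for illustration only.
rng = np.random.default_rng(42)
X_syn = rng.normal(size=(150, 4))
y_syn = X_syn[:, 0] - 2 * X_syn[:, 1] + rng.normal(scale=0.1, size=150)

param_grid = {
    "learning_rate": [0.01, 0.1, 0.3],  # shrinkage applied to each tree
    "n_estimators": [50, 100],          # number of trees
    "max_depth": [1, 2, 3],             # depth of each tree
}

search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid=param_grid,
    cv=3,
    scoring="r2",
)
search.fit(X_syn, y_syn)

print(search.best_params_)
print(f"best cross-validated R2: {search.best_score_:.4f}")
```

Learning rate and number of trees interact: a smaller learning rate usually needs more trees to reach the same fit, which is why they are searched together here.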

### Q4. What is a weak learner in Gradient Boosting?



In the context of Gradient Boosting, a weak learner refers to a simple, often shallow, and less expressive model that performs slightly better than random chance on the task at hand. Weak learners are also sometimes referred to as base learners. These models are typically simple decision trees, linear models, or other simple models.

The concept of using weak learners in Gradient Boosting is a key element of the algorithm's success. Here are some characteristics of weak learners in Gradient Boosting:

1. Low Complexity: Weak learners are deliberately kept simple and have low complexity. For decision trees, they are often shallow with a limited number of nodes and a small maximum depth.

2. High Bias, Low Variance: Weak learners such as shallow trees have high bias, meaning that on their own they underfit and capture only coarse patterns in the data. In return, they have low variance, so they are relatively insensitive to small changes in the training data. Boosting reduces the bias by adding many such learners sequentially.

3. Slightly Better Than Random: Weak learners should perform slightly better than random guessing. In binary classification, this means they should have an accuracy better than 50%, and in regression, they should have a predictive performance better than predicting the mean of the target variable.

4. Fast to Train: Weak learners are typically computationally efficient and quick to train. This is essential because Gradient Boosting involves sequentially training many weak learners.
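
As a rough illustration of these characteristics, the sketch below compares a single depth-1 tree with a boosted ensemble of depth-1 trees. The two-period sine data and the variable names are assumptions made for this example; exact scores will vary with the data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: two periods of a sine wave plus noise (illustrative assumption).
rng = np.random.default_rng(1)
X_w = rng.uniform(0, 4 * np.pi, size=(300, 1))
y_w = np.sin(X_w[:, 0]) + rng.normal(scale=0.1, size=300)
X_tr, X_te, y_tr, y_te = X_w[:200], X_w[200:], y_w[:200], y_w[200:]

# A single stump: one split, two constant predictions.
stump = DecisionTreeRegressor(max_depth=1).fit(X_tr, y_tr)

# Many stumps combined by gradient boosting.
boosted = GradientBoostingRegressor(
    max_depth=1, n_estimators=200, learning_rate=0.1, random_state=0
).fit(X_tr, y_tr)

stump_r2 = r2_score(y_te, stump.predict(X_te))
boosted_r2 = r2_score(y_te, boosted.predict(X_te))
print(f"single stump R2: {stump_r2:.3f}")
print(f"boosted stumps R2: {boosted_r2:.3f}")
```

An R-squared of 0 corresponds to always predicting the mean, so a stump scoring only modestly above 0 is "slightly better than the baseline" in the sense of point 3.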

### Q5. What is the intuition behind the Gradient Boosting algorithm?




The intuition behind the Gradient Boosting algorithm is to build a strong predictive model by sequentially combining the outputs of multiple weak models (typically simple decision trees) in a way that focuses on correcting the errors made by the previous models.

#### Start with a Simple Model:
Gradient Boosting begins with a simple model, often just the mean of the target (for regression) or a constant such as the log-odds of the positive class (for binary classification). This model serves as an initial approximation of the target variable or class distribution.

#### Iteratively Correct Errors:
The algorithm proceeds iteratively, where each iteration focuses on improving the model's predictions. In each iteration:

- A new weak learner (decision tree) is trained to predict the residuals or errors of the previous model. These residuals represent the differences between the actual target values and the current model's predictions.
- The new weak learner is chosen to minimize the errors (residuals) made by the previous model, effectively correcting the model's mistakes.
- The learning rate, which controls the contribution of each weak learner, can be adjusted to make the updates more conservative or aggressive.

#### Combine Weak Learners:
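The predictions of each weak learner are added to the overall model, scaled by the learning rate (and, in some formulations, by a step size chosen to minimize the loss). Unlike AdaBoost, Gradient Boosting does not weight learners by their individual accuracy; each one contributes by fitting whatever error the current ensemble still makes.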

#### Gradual Improvement: 
Over multiple iterations, the ensemble of weak learners gradually improves its predictive performance by reducing the errors in each iteration. The final model combines the strengths of all the weak learners to make accurate predictions.
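
A tiny numeric sketch can make this concrete. It assumes an idealized weak learner that reproduces the residuals exactly (real trees only approximate them) and a learning rate of 0.5; the three target values are made up for illustration.

```python
import numpy as np

y_toy = np.array([3.0, 5.0, 10.0])
learning_rate = 0.5
pred = np.full_like(y_toy, y_toy.mean())  # initial model: the mean, 6.0

max_errors = []
for step in range(1, 4):
    residuals = y_toy - pred
    # Idealized weak learner (assumption): its output equals the residuals.
    pred = pred + learning_rate * residuals
    max_errors.append(np.abs(y_toy - pred).max())
    print(step, pred, max_errors[-1])
```

Under this assumption every update removes a fraction `learning_rate` of the remaining error, so the largest residual goes 2.0, 1.0, 0.5 over the three steps. A smaller learning rate shrinks the error more slowly, which is the "conservative" setting mentioned above.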

### Q6. How does Gradient Boosting algorithm build an ensemble of weak learners?



The Gradient Boosting algorithm builds an ensemble of weak learners iteratively to create a strong predictive model.

#### Initialization:

The ensemble starts with an initial model, which is often a simple one, such as the mean (for regression) or a constant (for classification). This initial model serves as the starting point for the ensemble.

#### Sequential Training of Weak Learners:

- Gradient Boosting trains a series of weak learners sequentially, one after the other. Each iteration focuses on improving the model's predictions.
- In each iteration, a new weak learner (typically a shallow decision tree) is trained to predict the residuals or errors from the previous ensemble's predictions. These residuals represent the differences between the actual target values and the current model's predictions.
- The weak learner is chosen to minimize the errors (residuals) made by the previous model, effectively correcting the model's mistakes.
- The learning rate, which controls the contribution of each weak learner to the ensemble, can be adjusted to make the updates more conservative or aggressive.

#### Combination of Weak Learners:

- After each iteration, the predictions of the new weak learner are combined with the predictions of the previous ensemble.
- The combination is done by adding the new weak learner's predictions (scaled by the learning rate) to the previous ensemble's predictions.
- The ensemble's predictions are updated, aiming to reduce the residual errors from the previous iterations.

#### Sequential Iteration:

- The training and combination steps above are repeated for a predefined number of iterations or until a chosen performance criterion is met.
- With each iteration, the ensemble becomes more accurate in modeling the relationship between the input features and the target variable.

#### Final Model Output:

- The final prediction for a given input is the sum of the initial model's prediction and the predictions of all the weak learners in the ensemble.
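
The additive structure can be checked on a fitted scikit-learn model. The sketch below assumes the default squared-error loss and uses the fitted attributes `init_` (the initial model) and `estimators_` (the individual trees); the synthetic data and variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic data, assumed for illustration only.
rng = np.random.default_rng(7)
X_e = rng.normal(size=(100, 3))
y_e = X_e[:, 0] ** 2 + X_e[:, 1] + rng.normal(scale=0.1, size=100)

ens = GradientBoostingRegressor(
    n_estimators=20, learning_rate=0.1, max_depth=2, random_state=0
).fit(X_e, y_e)

# Start from the initial model's prediction (the training mean for squared error).
manual = np.asarray(ens.init_.predict(X_e), dtype=float).ravel()

# Add each tree's prediction, scaled by the learning rate.
for tree in ens.estimators_[:, 0]:
    manual += ens.learning_rate * tree.predict(X_e)

print(np.allclose(manual, ens.predict(X_e)))
```

If the two agree, the ensemble's output is exactly the initial prediction plus the learning-rate-scaled sum of the trees, as stated in the final step.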

### Q7. What are the steps involved in constructing the mathematical intuition of Gradient Boosting algorithm?

#### Initialization:

The ensemble starts with an initial model, often written as $F_0(x)$, which is a simple approximation of the target variable. For regression tasks, this can be the mean of the target values.

#### Residual Calculation:

- In each iteration $t$, the algorithm calculates the residuals (errors) of the current model's predictions. The residual for example $i$ is:

$$r_i^{(t)} = y_i - F_{t-1}(x_i)$$

- $y_i$ is the actual target value for example $i$.
- $F_{t-1}(x_i)$ is the ensemble's prediction for example $i$ after iteration $t-1$.

#### Training a Weak Learner:

- A new weak learner (e.g., a shallow decision tree) is trained to predict the residuals $r_i^{(t)}$. The goal is to find a weak learner $h_t(x)$ that minimizes the loss function:

$$L\big(y,\; F_{t-1}(x) + h_t(x)\big)$$

- Common loss functions include mean squared error (for regression) or logistic loss (for classification).
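
The name "gradient" comes from the link between residuals and the loss. As a short worked step, assuming the squared-error loss with a conventional factor of one half, the residual is exactly the negative gradient of the loss with respect to the current prediction:

```latex
L\big(y_i, F(x_i)\big) = \tfrac{1}{2}\,\big(y_i - F(x_i)\big)^2
\quad\Longrightarrow\quad
-\left.\frac{\partial L\big(y_i, F(x_i)\big)}{\partial F(x_i)}\right|_{F = F_{t-1}}
= y_i - F_{t-1}(x_i)
```

So fitting a weak learner to the residuals is a step of gradient descent on the loss, taken in the space of predictions. For other losses the same recipe applies with the negative gradient (often called the pseudo-residual) in place of the plain residual.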

#### Weighted Combination of Weak Learners:

- After training the new weak learner, its predictions $h_t(x)$ are combined with the predictions of the previous ensemble up to iteration $t-1$.

- The ensemble's prediction at iteration $t$ becomes:

$$F_t(x) = F_{t-1}(x) + \alpha_t\, h_t(x)$$

- $\alpha_t$ is the weight assigned to the new weak learner and represents its contribution to the ensemble. It can be chosen during training by minimizing the loss function, and in practice it is often multiplied by, or replaced with, a fixed learning rate.
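
As a sketch of how such a weight could be chosen, assume squared-error loss: the weight that minimizes the summed squared error of the updated ensemble then has a closed form, the projection of the residuals onto the weak learner's predictions. The residuals and the weak-learner outputs `h_pred` below are hypothetical numbers used only to illustrate the arithmetic.

```python
import numpy as np

residuals = np.array([1.0, -2.0, 0.5, 3.0])  # y - F_{t-1}(x), made-up values
h_pred = np.array([0.5, -1.0, 0.0, 2.0])     # hypothetical weak-learner outputs

def squared_loss(weight):
    # Loss of the updated ensemble when h_pred is added with the given weight.
    return np.sum((residuals - weight * h_pred) ** 2)

# Closed-form minimizer for squared error: <r, h> / <h, h>.
alpha_opt = (residuals @ h_pred) / (h_pred @ h_pred)

print(f"alpha_opt = {alpha_opt:.4f}")
print(squared_loss(alpha_opt), squared_loss(alpha_opt - 0.1), squared_loss(alpha_opt + 0.1))
```

Moving the weight in either direction away from `alpha_opt` increases the loss, which is what "determined by optimizing the loss function" means in this simple case.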

#### Sequential Iteration:

The residual calculation, weak-learner training, and combination steps are repeated for a predefined number of iterations, with each iteration reducing the residuals and improving the ensemble's predictions.
The ensemble gradually improves by learning to correct its previous errors and capture the remaining patterns in the data.

#### Final Model Output:

The final prediction of the Gradient Boosting ensemble after $T$ iterations is the sum of the initial model's prediction and the contributions of all the weak learners:

$$F_T(x) = F_0(x) + \sum_{t=1}^{T} \alpha_t\, h_t(x)$$