#### Q1. What is Gradient Boosting Regression?
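- Gradient boosting regression builds a predictive model in stages. At each stage it computes the residual, i.e. the difference between the known target value and the current prediction, and trains a weak model that maps the features to that residual. The weak model's output, scaled by a learning rate, is then added to the current prediction, so each stage corrects part of the error left by the previous ones.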
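
As a small worked illustration of one such stage, the sketch below uses made-up targets and an arbitrary learning rate of 0.5. It also assumes, purely for simplicity, that the weak model reproduces the residuals exactly; a real tree would only approximate them.

```python
# Toy illustration of one boosting stage for regression.
# The targets and the learning rate are made up for illustration.
y = [10.0, 20.0, 30.0]

# Start from a constant prediction: the mean of the targets.
initial = sum(y) / len(y)
pred = [initial] * len(y)

# Residual = known target minus current prediction.
residuals = [t - p for t, p in zip(y, pred)]

# Hypothetical "perfect" weak model: it outputs the residuals exactly.
weak_output = residuals

# Move each prediction a fraction of the way toward its target.
learning_rate = 0.5
pred = [p + learning_rate * h for p, h in zip(pred, weak_output)]
new_residuals = [t - p for t, p in zip(y, pred)]

print(residuals)      # [-10.0, 0.0, 10.0]
print(pred)           # [15.0, 20.0, 25.0]
print(new_residuals)  # [-5.0, 0.0, 5.0]
```

After one stage the residuals have halved; repeating the stage keeps shrinking them.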

#### Q4. What is a weak learner in Gradient Boosting?
- Decision trees are used as the weak learner in gradient boosting. Specifically, shallow regression trees are used: their leaves output real values, so the outputs of successive trees can be added together, and each new tree “corrects” the residuals left in the predictions so far.

#### Q5. What is the intuition behind the Gradient Boosting algorithm?
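- The key idea is to minimize a differentiable loss function that measures the difference between the predicted values and the true labels, such as mean squared error for regression problems or cross-entropy for classification problems. Rather than running gradient descent on the parameters of a single model, gradient boosting performs gradient descent in function space: each new weak learner is fitted to the negative gradient of the loss with respect to the current predictions, and adding it to the ensemble moves the predictions a small step toward lower loss.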
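
To make the link between gradients and residuals concrete, here is a short derivation for the squared-error case, written with the conventional factor of one half. The symbols are introduced only for illustration: $F_{m-1}(x)$ is the ensemble's prediction after $m-1$ stages, $h_m$ is the weak learner added at stage $m$, and $\nu$ is the learning rate.

$$
L(y, F) = \tfrac{1}{2}\,(y - F)^2, \qquad
-\frac{\partial L}{\partial F}\bigg|_{F = F_{m-1}(x)} = y - F_{m-1}(x)
$$

$$
F_m(x) = F_{m-1}(x) + \nu\, h_m(x), \qquad h_m \text{ fitted to } y - F_{m-1}(x)
$$

For squared error the negative gradient is exactly the residual, so fitting each weak learner to the residuals amounts to one gradient-descent step on the loss.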

#### Q6. How does Gradient Boosting algorithm build an ensemble of weak learners?
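- The Gradient Boosting algorithm builds the ensemble iteratively: it starts from a simple initial prediction, then repeatedly fits a new weak learner to the errors of the current ensemble and adds its scaled predictions to the running total. Earlier learners are left unchanged; each new one only corrects what the previous ones got wrong.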

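#### Q7. What are the steps involved in constructing the mathematical intuition of Gradient Boosting algorithm?
1. Initialize the ensemble with a constant prediction (for example, the mean of the targets).
2. Calculate the residuals of the current predictions.
3. Fit a weak learner to those residuals.
4. Update the predictions by adding the weak learner's output, scaled by the learning rate.
5. Repeat steps 2-4 for a fixed number of iterations.
6. Combine the weak learners: the final prediction is the initial value plus the sum of all scaled outputs.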
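
The steps above can be sketched from scratch in NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the weak learner is a single-split "stump" on one feature, the loss is squared error, the data are synthetic, and the helper names (`fit_stump`, `fit_boosting`, and so on) are invented for this example.

```python
import numpy as np

def fit_stump(x, r):
    """Best single-threshold split of 1-D feature x, predicting r by side means.
    Assumes x contains at least two distinct values."""
    best = None
    for t in np.unique(x)[:-1]:  # skip the max so both sides are non-empty
        left, right = r[x <= t], r[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    return best[1:]  # (threshold, left value, right value)

def predict_stump(stump, x):
    t, left_value, right_value = stump
    return np.where(x <= t, left_value, right_value)

def fit_boosting(x, y, n_estimators=200, learning_rate=0.1):
    f0 = y.mean()                           # initialize with a constant
    pred = np.full(len(y), f0)
    stumps = []
    for _ in range(n_estimators):           # repeat
        residuals = y - pred                # calculate the residuals
        stump = fit_stump(x, residuals)     # fit a weak learner to them
        pred = pred + learning_rate * predict_stump(stump, x)  # update
        stumps.append(stump)
    return f0, learning_rate, stumps

def predict_boosting(model, x):
    # Combine: initial value plus the scaled sum of all weak learners.
    f0, learning_rate, stumps = model
    return f0 + learning_rate * sum(predict_stump(s, x) for s in stumps)

# Synthetic regression problem (made up for illustration): noisy sine curve.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = np.sin(x) + rng.normal(0, 0.1, 200)
x_train, x_test, y_train, y_test = x[:150], x[150:], y[:150], y[150:]

model = fit_boosting(x_train, y_train)
y_pred = predict_boosting(model, x_test)

# Evaluate with mean squared error and R-squared on the held-out points.
mse = np.mean((y_test - y_pred) ** 2)
r2 = 1 - np.sum((y_test - y_pred) ** 2) / np.sum((y_test - y_test.mean()) ** 2)
baseline_mse = np.mean((y_test - y_train.mean()) ** 2)
print(f"MSE: {mse:.4f}  baseline MSE: {baseline_mse:.4f}  R2: {r2:.4f}")
```

The baseline here is the constant prediction from the initialization step, so comparing the two MSE values shows how much the added stumps contribute.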

#### Q2. Implement a simple gradient boosting algorithm from scratch using Python and NumPy. Use a simple regression problem as an example and train the model on a small dataset. Evaluate the model's performance using metrics such as mean squared error and R-squared.

#### Q3. Experiment with different hyperparameters such as learning rate, number of trees, and tree depth to optimise the performance of the model. Use grid search or random search to find the best hyperparameters.

In [1]:
###Q2 and Q3 combined below
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')
import seaborn as sns


In [2]:
data=sns.load_dataset('tips')

In [4]:
data.isnull().sum()

total_bill    0
tip           0
sex           0
smoker        0
day           0
time          0
size          0
dtype: int64

In [5]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 244 entries, 0 to 243
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype   
---  ------      --------------  -----   
 0   total_bill  244 non-null    float64 
 1   tip         244 non-null    float64 
 2   sex         244 non-null    category
 3   smoker      244 non-null    category
 4   day         244 non-null    category
 5   time        244 non-null    category
 6   size        244 non-null    int64   
dtypes: category(4), float64(2), int64(1)
memory usage: 7.4 KB


In [6]:
data.head()

,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2
1,10.34,1.66,Male,No,Sun,Dinner,3
2,21.01,3.5,Male,No,Sun,Dinner,3
3,23.68,3.31,Male,No,Sun,Dinner,2
4,24.59,3.61,Female,No,Sun,Dinner,4


In [22]:
## independent variables (all columns except total_bill) and dependent variable (total_bill)
X=data.iloc[:,1:]
X

,tip,sex,smoker,day,time,size
0,1.01,Female,No,Sun,Dinner,2
1,1.66,Male,No,Sun,Dinner,3
2,3.50,Male,No,Sun,Dinner,3
3,3.31,Male,No,Sun,Dinner,2
4,3.61,Female,No,Sun,Dinner,4
...,...,...,...,...,...,...
239,5.92,Male,No,Sat,Dinner,3
240,2.00,Female,Yes,Sat,Dinner,2
241,2.00,Male,Yes,Sat,Dinner,2
242,1.75,Male,No,Sat,Dinner,2


In [15]:
y=data.iloc[:,0]

In [17]:
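## numeric_only=True restricts the correlation to the numeric columns;
## pandas >= 2.0 raises an error on the categorical columns otherwise
data.corr(numeric_only=True)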

,total_bill,tip,size
total_bill,1.0,0.675734,0.598315
tip,0.675734,1.0,0.489299
size,0.598315,0.489299,1.0


In [18]:
from sklearn.model_selection import train_test_split

In [19]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

In [44]:
## create lists of the numerical and categorical columns
numeric=[i for i in X_train.columns if ((X_train[i].dtype=='int64')| (X_train[i].dtype=='float64')) ]

In [47]:
Category=[i for i in X_train.columns if ((X_train[i].dtype=='category')) ]
Category

['sex', 'smoker', 'day', 'time']

In [42]:
X_train['smoker'].dtype

CategoricalDtype(categories=['Yes', 'No'], ordered=False)

In [48]:
## create pipeline and column transformer
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer

In [50]:
## data encoding and feature scaling libraries
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import OneHotEncoder

In [51]:
numpipline=Pipeline(
    steps=[
        ('scaler',StandardScaler())
    ]
)

catpipline=Pipeline(
    steps=[
        ('encoder',OneHotEncoder())
    ]
)

In [52]:
preprocessing=ColumnTransformer([
    ('numpipline',numpipline,numeric),
    ('catpipline',catpipline,Category)
])

In [53]:
X_train=preprocessing.fit_transform(X_train)
X_test=preprocessing.transform(X_test)
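
An alternative worth considering is to chain the preprocessing and the regressor in a single `Pipeline`, so that the scaler and encoder are re-fit inside each cross-validation fold rather than once on the whole training set. The sketch below is a hedged illustration on a made-up DataFrame that only mimics a few of the tips columns; the step names, `handle_unknown="ignore"`, and `random_state=0` are choices made for this example, not part of the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the tips data (all values are made up).
rng = np.random.default_rng(0)
n = 120
toy = pd.DataFrame({
    "tip": rng.uniform(1, 10, n),
    "size": rng.integers(1, 7, n),
    "smoker": rng.choice(["Yes", "No"], n),
    "day": rng.choice(["Thur", "Fri", "Sat", "Sun"], n),
})
target = 5 * toy["tip"] + 2 * toy["size"] + rng.normal(0, 1, n)

# Scale numeric columns, one-hot encode categorical columns.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["tip", "size"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["smoker", "day"]),
])

# Preprocessing and model travel together, so each CV fold fits its own scaler.
pipe = Pipeline([
    ("preprocess", preprocess),
    ("model", GradientBoostingRegressor(random_state=0)),
])

scores = cross_val_score(pipe, toy, target, cv=5, scoring="r2")
print(scores.mean())
```

With this layout the raw DataFrame is passed straight to `fit`, `predict`, or a search object, and no separately transformed copies of `X_train` and `X_test` need to be kept around.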

In [54]:
## create model for gradient boosting
from sklearn.ensemble import GradientBoostingRegressor

In [61]:
from sklearn.metrics import r2_score,mean_squared_error
import matplotlib.pyplot as plt

In [56]:
model={
    'Gradient':GradientBoostingRegressor()
}


In [64]:
def evaluate(X_train, X_test, y_train, y_test, models):
    """Fit each model and report its R2 score and MSE on the test set."""
    report={}
    for name, model in models.items():
        ## fit the model
        model.fit(X_train,y_train)

        ## predict the values
        y_pred=model.predict(X_test)

        ## R2 score and mean squared error
        score=r2_score(y_test,y_pred)
        Mse=mean_squared_error(y_test,y_pred)
        report[name]={'score':score,'Mse':Mse}
    return report

In [65]:
evaluate(X_train, X_test, y_train, y_test,model)

{'Gradient': {'score': 0.4221690437364807, 'Mse': 43.006403083239704}}

In [66]:
## randomized search CV
from sklearn.model_selection import RandomizedSearchCV

In [67]:
gradient=GradientBoostingRegressor()

In [74]:
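parameter={
    'n_estimators':[100,200],
    'learning_rate':[0.1,0.001,0.0001],
    'max_depth':[3,5,7,8],
    'loss':['squared_error','huber'],
    ## alpha is only used by the huber (and quantile) loss and must lie strictly
    ## between 0 and 1; the values 1 and 2 are invalid and lead to the nan scores
    ## in the search log
    'alpha':[0.9,1,2]
}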

In [72]:
rcv=RandomizedSearchCV(gradient,param_distributions=parameter,scoring='neg_mean_squared_error',cv=5,verbose=3)

In [73]:
rcv.fit(X_train,y_train)

Fitting 5 folds for each of 10 candidates, totalling 50 fits
[CV 1/5] END alpha=0.9, learning_rate=0.001, loss=huber, max_depth=3, n_estimators=200;, score=-41.925 total time=   1.3s
[CV 2/5] END alpha=0.9, learning_rate=0.001, loss=huber, max_depth=3, n_estimators=200;, score=-102.068 total time=   1.3s
[CV 3/5] END alpha=0.9, learning_rate=0.001, loss=huber, max_depth=3, n_estimators=200;, score=-71.338 total time=   1.3s
[CV 4/5] END alpha=0.9, learning_rate=0.001, loss=huber, max_depth=3, n_estimators=200;, score=-81.374 total time=   1.6s
[CV 5/5] END alpha=0.9, learning_rate=0.001, loss=huber, max_depth=3, n_estimators=200;, score=-68.807 total time=   1.6s
[CV 1/5] END alpha=1, learning_rate=0.001, loss=huber, max_depth=8, n_estimators=100;, score=nan total time=   0.0s
[CV 2/5] END alpha=1, learning_rate=0.001, loss=huber, max_depth=8, n_estimators=100;, score=nan total time=   0.0s
[CV 3/5] END alpha=1, learning_rate=0.001, loss=huber, max_depth=8, n_estimators=100;, score=nan

RandomizedSearchCV(cv=5, estimator=GradientBoostingRegressor(),
                   param_distributions={'alpha': [0.9, 1, 2],
                                        'learning_rate': [0.1, 0.001, 0.0001],
                                        'loss': ['squared_error', 'huber'],
                                        'max_depth': [3, 5, 7, 8],
                                        'n_estimators': [100, 200, 300]},
                   scoring='neg_mean_squared_error', verbose=3)

In [75]:
rcv.best_params_

{'n_estimators': 100,
 'max_depth': 3,
 'loss': 'squared_error',
 'learning_rate': 0.1,
 'alpha': 0.9}

In [78]:
gradient=GradientBoostingRegressor(n_estimators=100,max_depth=3,loss='squared_error',learning_rate=0.1,alpha=0.9)

In [79]:
gradient.fit(X_train,y_train)

GradientBoostingRegressor()

In [80]:
y_predcv=gradient.predict(X_test)

In [82]:
score=r2_score(y_test,y_predcv)
Mse=mean_squared_error(y_test,y_predcv)
print('score',score)
print('Mse',Mse)

score 0.4309860343545726
Mse 42.350178198794644


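#### R2 increased slightly after hyperparameter tuning
- R2 rose from 0.422 to 0.431 and MSE fell from 43.01 to 42.35. The selected hyperparameters are the same as the scikit-learn defaults, so this small difference most likely reflects run-to-run randomness rather than a real gain from tuning.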