                                                        GRADIENT BOOST                                                       

Gradient boosting is an ensemble algorithm that is mainly used to address high bias (underfitting): it combines many weak learners, each too simple on its own, into a single strong model.

Gradient Boosting is an ensemble machine learning technique that builds a predictive model by sequentially adding models (often decision trees) to minimize the residual errors made by previous models. Unlike Random Forests, which build independent trees in parallel, Gradient Boosting builds trees in a sequential manner, where each tree tries to correct the mistakes of the previous ones.

Gradient Boosting is built on two concepts:-
- Boosting: An ensemble method that builds models sequentially, where each model aims to correct the errors of its predecessor.
- Gradient Descent: An optimization algorithm that minimizes a loss function by iteratively moving toward the optimal solution.

Construction of Gradient Boost:-

1) Initialize the Model with a Base Prediction: 
    - Set an Initial Prediction: Start with a simple model that provides a baseline prediction. For example, in regression, the mean of the target values y is often used as the initial prediction.
    - Compute Initial Residuals: The model calculates the residuals, which are the differences between the actual values y and the initial predictions. These residuals indicate how much each prediction differs from the actual values and serve as the target for the next tree to minimize.
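
A minimal sketch of this first step, assuming a small made-up target vector (the values in `y` are illustrative and do not come from the datasets used later):

```python
import numpy as np

# Hypothetical toy targets, chosen only for illustration.
y = np.array([3.0, 5.0, 8.0, 12.0])

# Baseline prediction: the mean of y (7.0), repeated for every sample.
initial_prediction = np.full_like(y, y.mean())

# Residuals: actual minus predicted. These become the next tree's targets.
residuals = y - initial_prediction  # -4, -2, 1, 5

print(initial_prediction)
print(residuals)
```

With a mean baseline the residuals always sum to zero, which is a quick sanity check on this step.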

2) Fit the First Weak Learner (Decision Tree) on Residuals:
    - Train a Weak Learner: A weak learner (usually a shallow decision tree) is trained on the residuals from the initial prediction, rather than the actual target values y.
    - Minimize the Residuals: The weak learner tries to fit the residuals, effectively predicting how much the initial model’s prediction needs to be adjusted. This new tree captures patterns in the errors (residuals) left by the initial model.
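
As a rough illustration of this step, the sketch below fits a depth-1 tree (a "stump") to the residuals of a mean baseline. The one-feature data set `X`, `y` is made up for the example, and `DecisionTreeRegressor` stands in for whichever weak learner is actually used.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical one-feature toy data, for illustration only.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 8.0, 12.0])

# Residuals of the mean baseline: -4, -2, 1, 5.
residuals = y - y.mean()

# The stump is fitted to the residuals, not to y itself.
stump = DecisionTreeRegressor(max_depth=1, random_state=0)
stump.fit(X, residuals)

# The stump's output is a suggested correction for each sample.
correction = stump.predict(X)
print(correction)
```

The stump splits the samples into two groups and predicts the mean residual of each group, so here the two smaller targets get a correction of -3 and the two larger ones a correction of +3.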

3) Update Predictions with a Learning Rate:
    - Update the Overall Prediction: The predictions of the first weak learner are added to the initial prediction to update the model’s prediction.
        - The learning rate, α, scales the contribution of each tree to the final prediction. Smaller learning rates make the process more gradual and reduce the risk of overfitting, at the cost of needing more trees.
            New Prediction = Initial Prediction + 𝛼 × Tree’s Prediction on Residuals
    - Compute New Residuals: With the updated predictions, the residuals are recalculated as the differences between the actual values and the new predictions. These new residuals highlight where the model is still making errors and needs further refinement.
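
The update rule can be checked by hand on a few numbers. In the sketch below, `tree_prediction` is an assumed output of a tree fitted to the residuals, and `alpha = 0.5` is an arbitrary learning rate picked for readability.

```python
import numpy as np

y = np.array([3.0, 5.0, 8.0, 12.0])                 # hypothetical targets
initial_prediction = np.full_like(y, y.mean())      # baseline: 7.0 everywhere
tree_prediction = np.array([-3.0, -3.0, 3.0, 3.0])  # assumed tree output on the residuals

alpha = 0.5  # learning rate, an illustrative value

# New Prediction = Initial Prediction + alpha * Tree's Prediction on Residuals
new_prediction = initial_prediction + alpha * tree_prediction  # 5.5, 5.5, 8.5, 8.5

# Residuals are recomputed against the updated predictions.
new_residuals = y - new_prediction  # -2.5, -0.5, -0.5, 3.5

old_sse = np.sum((y - initial_prediction) ** 2)  # 46.0
new_sse = np.sum(new_residuals ** 2)             # 19.0
print(new_prediction, new_residuals, old_sse, new_sse)
```

Even with only half of the tree's correction applied, the sum of squared residuals drops from 46 to 19. The remaining error is what the next tree is trained on.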

4) Iteratively Add More Trees:
    - Repeat the Process: Each subsequent weak learner is trained to predict the residuals of the current model (current residuals after all previous trees). Each tree incrementally corrects errors made by the ensemble up to that point.
    - Gradient Descent Optimization: Each weak learner approximates the negative gradient of the loss function with respect to the current predictions. For squared-error loss this negative gradient is exactly the residual, so adding the tree's scaled predictions is a gradient descent step that moves the predictions in the direction that reduces the loss.
    - Stopping Criteria: The process stops after a specified number of iterations (trees) or when the residuals are minimized below a threshold.
The final model is an aggregation of the initial prediction plus all weak learners weighted by the learning rate.
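
The four steps above can be sketched as a short from-scratch loop. This is a simplified illustration for squared-error regression only, not how scikit-learn implements it: the function names `fit_gradient_boosting` and `predict_gradient_boosting`, the synthetic sine data, and the default parameter values are all assumptions made for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_gradient_boosting(X, y, n_trees=50, learning_rate=0.1, max_depth=2):
    """Fit a squared-error boosting ensemble (illustrative sketch)."""
    base = float(np.mean(y))                  # step 1: initial prediction
    prediction = np.full(len(y), base)
    trees = []
    for _ in range(n_trees):                  # step 4: repeat
        residuals = y - prediction            # current errors
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residuals)                # step 2: fit tree on residuals
        prediction += learning_rate * tree.predict(X)  # step 3: update
        trees.append(tree)
    return base, trees


def predict_gradient_boosting(X, base, trees, learning_rate=0.1):
    """Initial prediction plus the learning-rate-weighted sum of all trees."""
    prediction = np.full(X.shape[0], base)
    for tree in trees:
        prediction += learning_rate * tree.predict(X)
    return prediction


# Synthetic data, made up for this example.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

base, trees = fit_gradient_boosting(X, y)
fitted = predict_gradient_boosting(X, base, trees)

baseline_mse = np.mean((y - base) ** 2)
boosted_mse = np.mean((y - fitted) ** 2)
print(baseline_mse, boosted_mse)
```

The same `learning_rate` must be used at prediction time as during fitting, because the final model is the initial prediction plus the scaled sum of the trees.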


Key Components and Parameters in Gradient Boosting:-

1. Loss Function: In gradient boosting, the loss function guides the model to minimize prediction errors at each step. Common loss functions include:
    - Mean Squared Error (MSE) for regression tasks
    - Log loss (binomial deviance) for binary classification
    - Multinomial deviance (multiclass log loss) for multiclass classification
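
To connect the loss function to the residuals used earlier: for the half squared error 0.5 × (y − f)², the negative gradient with respect to the prediction f is y − f, which is the residual. The sketch below checks this numerically with a finite difference. The helper `half_squared_error` and the toy numbers are assumptions made for the example.

```python
import numpy as np


def half_squared_error(y, f):
    """Per-sample loss 0.5 * (y - f)^2."""
    return 0.5 * (y - f) ** 2


y = np.array([3.0, 5.0, 8.0])  # hypothetical targets
f = np.array([4.0, 4.0, 4.0])  # hypothetical current predictions

# Central finite-difference estimate of dL/df for each sample.
eps = 1e-6
grad = (half_squared_error(y, f + eps) - half_squared_error(y, f - eps)) / (2 * eps)

negative_gradient = -grad
residuals = y - f  # -1, 1, 4

print(negative_gradient)
print(residuals)
```

For other loss functions the negative gradient is no longer the plain residual, which is why the quantity fitted by each tree is often called a pseudo-residual.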

2. Learning Rate (α): Controls the contribution of each new tree to the overall model. Lower values of the learning rate require more trees but make the model less likely to overfit, as each step is more conservative.

3. Number of Estimators (Trees): Determines how many weak learners are added. More trees generally improve performance up to a point, but too many trees can cause overfitting.

4. Tree Depth (max_depth): Sets the maximum depth of each tree. Shallow trees help prevent overfitting and speed up training.

5. Subsample: Determines the fraction of samples used for training each tree. Randomly sampling the data helps introduce variety among the trees and reduces overfitting.


Gradient Boosting Algorithm Pseudocode:-

To summarize, here’s the pseudocode for gradient boosting:

- Initialize the model with a base prediction, e.g., the mean of y.
- For each iteration i = 1, ..., number of trees:
    - Compute residuals based on the current model’s predictions.
    - Train a new tree to predict these residuals.
    - Update the overall prediction by adding the tree’s predictions (weighted by the learning rate) to the previous predictions.
- Stop when the specified number of trees is reached or the residuals fall below a threshold.

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor, GradientBoostingClassifier
from sklearn import metrics

In [2]:
raw_data_1 = pd.read_csv(r"S:\VS code\python\Data _Analytics\Dataset\WineQT.csv")
raw_data_1 = raw_data_1.drop(columns=['Id'])
raw_data_1

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
0,7.4,0.700,0.00,1.9,0.076,11.0,34.0,0.99780,3.51,0.56,9.4,5
1,7.8,0.880,0.00,2.6,0.098,25.0,67.0,0.99680,3.20,0.68,9.8,5
2,7.8,0.760,0.04,2.3,0.092,15.0,54.0,0.99700,3.26,0.65,9.8,5
3,11.2,0.280,0.56,1.9,0.075,17.0,60.0,0.99800,3.16,0.58,9.8,6
4,7.4,0.700,0.00,1.9,0.076,11.0,34.0,0.99780,3.51,0.56,9.4,5
...,...,...,...,...,...,...,...,...,...,...,...,...
1138,6.3,0.510,0.13,2.3,0.076,29.0,40.0,0.99574,3.42,0.75,11.0,6
1139,6.8,0.620,0.08,1.9,0.068,28.0,38.0,0.99651,3.42,0.82,9.5,6
1140,6.2,0.600,0.08,2.0,0.090,32.0,44.0,0.99490,3.45,0.58,10.5,5
1141,5.9,0.550,0.10,2.2,0.062,39.0,51.0,0.99512,3.52,0.76,11.2,6


In [4]:
raw_data_2 = pd.read_csv(r"S:\VS code\python\Data _Analytics\Dataset\Housing_Data.csv")
raw_data_2 = raw_data_2.drop(columns=['Address'])
raw_data_2

Unnamed: 0,Avg. Area Income,Avg. Area House Age,Avg. Area Number of Rooms,Avg. Area Number of Bedrooms,Area Population,Price
0,79545.458574,5.682861,7.009188,4.09,23086.800503,1.059034e+06
1,79248.642455,6.002900,6.730821,3.09,40173.072174,1.505891e+06
2,61287.067179,5.865890,8.512727,5.13,36882.159400,1.058988e+06
3,63345.240046,7.188236,5.586729,3.26,34310.242831,1.260617e+06
4,59982.197226,5.040555,7.839388,4.23,26354.109472,6.309435e+05
...,...,...,...,...,...,...
4995,60567.944140,7.830362,6.137356,3.46,22837.361035,1.060194e+06
4996,78491.275435,6.999135,6.576763,4.02,25616.115489,1.482618e+06
4997,63390.686886,7.250591,4.805081,2.13,33266.145490,1.030730e+06
4998,68001.331235,5.534388,7.130144,5.44,42625.620156,1.198657e+06


Building the Gradient Boosting Models

In [7]:
x_classific = raw_data_1.iloc[:, :-1]
y_classific = raw_data_1['quality']

In [10]:
x_regress = raw_data_2.iloc[:, :-1]
y_regress = raw_data_2['Price']

In [11]:
# Note: test_size=0.8 holds out 80% of the rows for testing and trains on the remaining 20%.
x_train_classific, x_test_classific, y_train_classific, y_test_classific = train_test_split(x_classific, y_classific, test_size=0.8, random_state=42)

In [12]:
# Note: as above, test_size=0.8 means only 20% of the rows are used for training.
x_train_regress, x_test_regress, y_train_regress, y_test_regress = train_test_split(x_regress, y_regress, test_size=0.8, random_state=42)
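
Since `test_size` is the fraction of rows held out for testing, the value chosen has a large effect on how much data the model sees. A small sketch on 100 dummy rows (made up for this illustration) shows the two common readings side by side:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 100 dummy samples, made up for this illustration.
X = np.arange(100).reshape(-1, 1)
y = np.arange(100)

# test_size=0.8: 20 rows for training, 80 rows for testing.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.8, random_state=42)
print(len(X_tr), len(X_te))

# test_size=0.2: 80 rows for training, 20 rows for testing.
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(X, y, test_size=0.2, random_state=42)
print(len(X_tr2), len(X_te2))
```

A split such as `test_size=0.2` is the more common choice when the goal is to give the model as much training data as possible. Whether the 80% test split above is intentional is not stated in this notebook.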

In [59]:
GradientBoosting_classification = GradientBoostingClassifier(n_estimators=200, max_depth=1, random_state=42, learning_rate=0.2)
GradientBoosting_classification.fit(x_train_classific, y_train_classific)

In [None]:
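# NOTE: learning_rate=0. looks truncated in this cell. A rate of 0 gives every tree zero weight,
# so supply a positive value for the boosting to have any effect.
GradientBoosting_regression = GradientBoostingRegressor(n_estimators=200, max_depth=3, random_state=42, learning_rate=0.)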
GradientBoosting_regression.fit(x_train_regress, y_train_regress)

In [60]:
predicationsClassification = GradientBoosting_classification.predict(x_test_classific)
predicationsClassification

array([5, 5, 5, 6, 6, 6, 5, 5, 6, 5, 6, 6, 5, 6, 4, 5, 5, 6, 5, 7, 6, 6,
       5, 6, 5, 6, 6, 6, 5, 6, 6, 6, 7, 6, 6, 5, 5, 6, 7, 6, 6, 5, 6, 5,
       5, 5, 5, 6, 5, 6, 6, 4, 5, 7, 6, 7, 6, 7, 7, 6, 5, 6, 6, 6, 6, 6,
       6, 5, 5, 6, 6, 5, 5, 4, 5, 5, 7, 6, 5, 6, 6, 5, 6, 5, 5, 7, 6, 6,
       7, 5, 5, 5, 5, 5, 5, 5, 5, 7, 6, 6, 6, 4, 5, 6, 5, 5, 6, 6, 6, 6,
       7, 5, 5, 6, 5, 5, 5, 6, 5, 5, 5, 5, 5, 6, 5, 5, 5, 6, 5, 6, 5, 5,
       6, 6, 8, 4, 4, 5, 6, 6, 7, 6, 6, 5, 6, 5, 5, 5, 5, 5, 6, 5, 6, 5,
       7, 6, 5, 6, 7, 6, 5, 5, 6, 6, 6, 5, 5, 5, 6, 5, 6, 4, 5, 5, 5, 5,
       6, 6, 6, 5, 6, 5, 7, 5, 5, 5, 4, 5, 5, 6, 6, 7, 7, 6, 6, 5, 5, 8,
       6, 6, 6, 6, 5, 6, 6, 5, 5, 5, 6, 6, 5, 6, 5, 6, 5, 5, 6, 5, 6, 6,
       5, 5, 5, 6, 4, 6, 5, 6, 6, 5, 6, 7, 6, 5, 6, 6, 5, 5, 6, 5, 6, 6,
       6, 5, 8, 6, 5, 6, 6, 5, 5, 5, 4, 6, 5, 6, 6, 6, 5, 5, 5, 5, 7, 4,
       5, 5, 7, 6, 5, 6, 6, 6, 5, 6, 5, 6, 5, 6, 5, 5, 5, 6, 5, 5, 5, 7,
       5, 5, 6, 5, 6, 5, 7, 6, 7, 6, 6, 5, 6, 5, 5,

In [18]:
y_test_classific

158     5
1081    6
291     5
538     6
367     6
       ..
123     5
1052    7
608     6
143     7
751     6
Name: quality, Length: 915, dtype: int64

In [20]:
predicationRegression = GradientBoosting_regression.predict(x_test_regress)
predicationRegression


array([1415056.07079386, 1227976.71395751, 1280471.34899197, ...,
       1027120.95496364, 1075257.93245793, 1269009.74619718])

In [21]:
y_test_regress

1501    1.339096e+06
2586    1.251794e+06
2653    1.340095e+06
1055    1.431508e+06
705     1.042374e+06
            ...     
3335    1.749820e+06
1920    9.951372e+05
3715    1.110932e+06
4646    9.850593e+05
946     1.285158e+06
Name: Price, Length: 4000, dtype: float64

Metrics Evaluation

In [61]:
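# Classification: percentage of test samples whose predicted quality equals the true quality.
print("Accuracy: (Classification) ", round(metrics.accuracy_score(y_test_classific, predicationsClassification)*100,2))
# Regression: this is 1 - MAPE (mean absolute percentage error) as a percentage, used here as a rough accuracy-like score.
print("Accuracy: (Regression) ", round((1- metrics.mean_absolute_percentage_error(y_test_regress, predicationRegression))*100, 2))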

Accuracy: (Classification)  55.41
Accuracy: (Regression)  90.2
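
A single accuracy-style number hides where the errors fall, so it can help to look at a few more metrics. The sketch below uses small made-up label and prediction arrays (not the notebook's results) to show a confusion matrix for classification and MAPE, MAE and R² for regression, all from `sklearn.metrics`.

```python
import numpy as np
from sklearn import metrics

# Hypothetical labels and predictions, for illustration only.
y_true_cls = np.array([5, 5, 6, 6, 7, 5, 6, 7])
y_pred_cls = np.array([5, 6, 6, 6, 6, 5, 5, 7])

accuracy = metrics.accuracy_score(y_true_cls, y_pred_cls)  # 5 of 8 correct
confusion = metrics.confusion_matrix(y_true_cls, y_pred_cls, labels=[5, 6, 7])
print("Accuracy:", accuracy)
print(confusion)  # rows: true class, columns: predicted class

# Hypothetical regression targets and predictions.
y_true_reg = np.array([100.0, 200.0, 300.0, 400.0])
y_pred_reg = np.array([110.0, 190.0, 330.0, 380.0])

mape = metrics.mean_absolute_percentage_error(y_true_reg, y_pred_reg)
mae = metrics.mean_absolute_error(y_true_reg, y_pred_reg)
r2 = metrics.r2_score(y_true_reg, y_pred_reg)
print("MAPE:", mape, "MAE:", mae, "R^2:", r2)
```

The confusion matrix shows which quality classes get mixed up, while MAE reports the error in the target's own units and R² reports the share of variance explained.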
