# Gradient Boosting

- Gradient Boosting is a powerful machine learning technique used for both classification and regression tasks.
- It builds models sequentially, with each new model correcting the errors of its predecessor.
- This is achieved by combining weak learners (typically decision trees) in a way that produces a strong learner.

# Key Components of Gradient Boosting

### 1.Weak Learner:

- Typically, decision trees with limited depth (shallow trees) are used as weak learners. These trees are simple models that can only capture basic patterns in the data.

### 2.Loss Function:

- The loss function measures how well the model fits the training data. Common loss functions include Mean Squared Error (MSE) for regression and Log Loss for classification. Gradient Boosting reduces this loss a little further at each iteration.

### 3.Additive Model:

- Gradient Boosting constructs the final model by iteratively adding weak learners. Each weak learner corrects the errors of the previous model, gradually improving performance.

### 4.Learning Rate:

- The learning rate (η) controls the contribution of each tree to the final model. Smaller values lead to more gradual learning, requiring more iterations but often resulting in better generalization.


# The Gradient Boosting Process

### 1.Initialization:

- The process starts with an initial prediction. For regression, this is usually the mean of the target variable, and for classification, it might be the log odds of the classes.
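
As a minimal sketch of these two starting points, assuming a small made-up label vector (the array `y` below is illustrative and is not the Titanic data):

```python
import numpy as np

# Hypothetical binary labels, for illustration only
y = np.array([0, 0, 1, 0, 1, 0, 0, 1])

# Regression with squared error: start from the mean of the target
f0_regression = y.mean()

# Binary classification with log loss: start from the log odds of the positive class
p = y.mean()
f0_classification = np.log(p / (1 - p))

# Applying the sigmoid to the log odds recovers the base rate p
recovered_p = 1 / (1 + np.exp(-f0_classification))

print(f0_regression, f0_classification, recovered_p)
```

The log odds are negative here because fewer than half of the labels are positive.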

### 2.Compute Residuals:

- For each iteration, the model computes the residuals, which are the differences between the actual target values and the current predictions (actual minus predicted). These residuals indicate the errors made by the current model.

### 3.Fit a Weak Learner:

- A weak learner (e.g., a shallow decision tree) is trained to predict the residuals (errors) from the previous step.

### 4.Update the Model:

- The predictions of the weak learner are scaled by the learning rate and added to the existing model's predictions.
\[ F_{m}(x) = F_{m-1}(x) + \eta \cdot h_{m}(x) \]

Where:

- \( F_{m}(x) \): Updated model  
- \( F_{m-1}(x) \): Previous model  
- \( h_{m}(x) \): New weak learner  
- \( \eta \): Learning rate

### 5.Repeat:

- The process repeats for a specified number of iterations or until the model achieves the desired performance.
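
The five steps can be sketched from scratch for squared-error regression. This is a toy illustration under stated assumptions, not how scikit-learn implements it: the helper names `fit_gradient_boosting` and `predict_gradient_boosting` and the synthetic sine data are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gradient_boosting(X, y, n_rounds=50, learning_rate=0.1, max_depth=2):
    """Toy gradient boosting for squared-error regression (illustrative only)."""
    f0 = float(np.mean(y))                      # Step 1: initial prediction (mean)
    pred = np.full(len(y), f0)
    trees = []
    for _ in range(n_rounds):                   # Step 5: repeat
        residuals = y - pred                    # Step 2: actual minus predicted
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residuals)                  # Step 3: fit a shallow tree to the residuals
        pred = pred + learning_rate * tree.predict(X)  # Step 4: F_m = F_{m-1} + eta * h_m
        trees.append(tree)
    return f0, trees

def predict_gradient_boosting(X, f0, trees, learning_rate=0.1):
    """Sum the initial prediction and the scaled contribution of every tree."""
    pred = np.full(X.shape[0], f0)
    for tree in trees:
        pred = pred + learning_rate * tree.predict(X)
    return pred

# Synthetic data: a noisy sine curve (an assumption made for this example)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

f0, trees = fit_gradient_boosting(X, y)
mse_start = np.mean((y - f0) ** 2)
mse_end = np.mean((y - predict_gradient_boosting(X, f0, trees)) ** 2)
print(f"training MSE: start={mse_start:.4f}, after boosting={mse_end:.4f}")
```

The same learning rate must be used for fitting and prediction, since it is part of the model update.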



# Gradient Boosting and Gradients

- The name "Gradient Boosting" comes from the use of gradients to minimize the loss function.
- At each step, the algorithm computes the gradient of the loss function with respect to the current predictions. The negative gradient gives the direction in which each prediction should move to reduce the loss; for squared error it equals the residual.
- The weak learner is trained to approximate this negative gradient, effectively performing a step of gradient descent to minimize the loss.
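
The link between residuals and gradients can be written out for the two losses mentioned earlier. This is a short derivation under two assumptions: the squared error is scaled by 1/2 for convenience, and for log loss \( F(x) \) is the predicted log odds, with \( p \) (a symbol introduced here) the predicted probability.

```latex
% Squared error: the negative gradient is exactly the residual
\[
L(y, F) = \tfrac{1}{2}\,\bigl(y - F(x)\bigr)^2
\quad\Longrightarrow\quad
-\frac{\partial L}{\partial F(x)} = y - F(x)
\]

% Log loss with F(x) as log odds and p = 1 / (1 + e^{-F(x)})
\[
L(y, F) = -\bigl[\, y \log p + (1 - y) \log (1 - p) \,\bigr]
\quad\Longrightarrow\quad
-\frac{\partial L}{\partial F(x)} = y - p
\]
```

In both cases the weak learner \( h_{m}(x) \) is fit to "actual minus predicted", measured on the scale the loss works in.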


# Regularization in Gradient Boosting

To prevent overfitting and improve generalization, Gradient Boosting includes several regularization techniques:

#### 1.Learning Rate:

- A smaller learning rate reduces the impact of each tree, forcing the model to learn slowly.

#### 2.Tree Constraints:

- Limiting the depth of trees, the number of leaves, or the minimum samples per leaf prevents overly complex trees.

#### 3.Subsampling:

- A random subset of the training data is used for each tree. This introduces randomness and reduces overfitting.

#### 4.Feature Subsampling:

- A random subset of features is used for splitting at each node of a tree, similar to Random Forest.
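
In scikit-learn, each of these four techniques maps to a constructor parameter of `GradientBoostingClassifier`. The sketch below uses synthetic data from `make_classification`, and the parameter values are illustrative assumptions rather than tuned recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic data, used only to make the example self-contained
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

gbc = GradientBoostingClassifier(
    learning_rate=0.05,    # 1. learning rate: shrink each tree's contribution
    n_estimators=200,      #    more trees to compensate for the smaller steps
    max_depth=3,           # 2. tree constraints: shallow trees
    min_samples_leaf=5,    #    and a minimum number of samples per leaf
    subsample=0.8,         # 3. subsampling: each tree sees 80% of the rows
    max_features="sqrt",   # 4. feature subsampling at each split
    random_state=0,
)
gbc.fit(X_train, y_train)

test_accuracy = gbc.score(X_test, y_test)
print(f"test accuracy: {test_accuracy:.3f}")
```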


# Advantages of Gradient Boosting

#### 1.Accuracy:

- Gradient Boosting often achieves state-of-the-art performance on structured data.

#### 2.Flexibility:

- It can optimize various loss functions and be applied to both regression and classification.

#### 3.Customizability:

- Hyperparameters such as learning rate, number of trees, and tree depth can be fine-tuned for specific problems.

# Disadvantages of Gradient Boosting

#### 1.Computational Cost:

- Training can be slow due to the sequential nature of the algorithm.

#### 2.Prone to Overfitting:

- Without proper regularization, Gradient Boosting can overfit the training data.

#### 3.Sensitivity to Hyperparameters:

- Performance depends heavily on careful tuning of hyperparameters like learning rate, number of trees, and tree depth.
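
One way to see this sensitivity is to track validation accuracy as trees are added, for several learning rates. The sketch below assumes synthetic data and an arbitrary set of learning rates; `staged_predict` yields the ensemble's predictions after each additional tree.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data, used only for illustration
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

results = {}
for lr in (0.01, 0.1, 1.0):
    gbc = GradientBoostingClassifier(
        learning_rate=lr, n_estimators=150, max_depth=3, random_state=0
    )
    gbc.fit(X_train, y_train)
    # One accuracy value per ensemble size: 1 tree, 2 trees, ..., 150 trees
    val_acc = [accuracy_score(y_val, pred) for pred in gbc.staged_predict(X_val)]
    best_n = int(np.argmax(val_acc)) + 1
    results[lr] = (best_n, max(val_acc))
    print(f"learning_rate={lr}: best n_estimators={best_n}, "
          f"validation accuracy={max(val_acc):.3f}")
```

The best number of trees usually shifts with the learning rate, which is why the two are tuned together.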

In [31]:
import pandas as pd
import numpy as np
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder, StandardScaler

In [35]:
data =  pd.read_csv(r"C:\Users\Shaik Sakhlaih\OneDrive\Desktop\titanic.csv")
data

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.2500,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.9250,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.0500,,S
...,...,...,...,...,...,...,...,...,...,...,...,...
886,887,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0000,,S
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0000,B42,S
888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.4500,,S
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0000,C148,C


In [37]:
data.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


In [39]:
data.tail()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
886,887,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0,,S
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0,B42,S
888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.45,,S
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0,C148,C
890,891,0,3,"Dooley, Mr. Patrick",male,32.0,0,0,370376,7.75,,Q


In [41]:
data.describe()

Unnamed: 0,PassengerId,Survived,Pclass,Age,SibSp,Parch,Fare
count,891.0,891.0,891.0,714.0,891.0,891.0,891.0
mean,446.0,0.383838,2.308642,29.699118,0.523008,0.381594,32.204208
std,257.353842,0.486592,0.836071,14.526497,1.102743,0.806057,49.693429
min,1.0,0.0,1.0,0.42,0.0,0.0,0.0
25%,223.5,0.0,2.0,20.125,0.0,0.0,7.9104
50%,446.0,0.0,3.0,28.0,0.0,0.0,14.4542
75%,668.5,1.0,3.0,38.0,1.0,0.0,31.0
max,891.0,1.0,3.0,80.0,8.0,6.0,512.3292


In [43]:
data.isnull().sum()

PassengerId      0
Survived         0
Pclass           0
Name             0
Sex              0
Age            177
SibSp            0
Parch            0
Ticket           0
Fare             0
Cabin          687
Embarked         2
dtype: int64

In [53]:
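# Fill missing ages with the most frequent age (the mode); the median is a common alternative
imputer = SimpleImputer(strategy='most_frequent')
data['Age'] = imputer.fit_transform(data[['Age']]).ravel()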
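
The choice of imputation strategy changes the value that gets filled in. A small comparison on a hypothetical toy column (the `toy` frame is not the Titanic data):

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical ages with one missing value
toy = pd.DataFrame({"Age": [20.0, 20.0, 30.0, 40.0, 50.0, np.nan]})

# fit_transform returns a 2-D array of shape (n_rows, 1); ravel() flattens it
mode_filled = SimpleImputer(strategy="most_frequent").fit_transform(toy[["Age"]]).ravel()
median_filled = SimpleImputer(strategy="median").fit_transform(toy[["Age"]]).ravel()

print(mode_filled[-1])    # 20.0, the most frequent age
print(median_filled[-1])  # 30.0, the median of the observed ages
```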

In [55]:
data.isnull().sum()

PassengerId      0
Survived         0
Pclass           0
Name             0
Sex              0
Age              0
SibSp            0
Parch            0
Ticket           0
Fare             0
Cabin          687
Embarked         2
dtype: int64

In [57]:
data['Cabin'].mode()[0]

'B96 B98'

In [59]:
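# Note: 'B98' is not the mode reported above ('B96 B98'); it acts as a separate placeholder category for missing cabins
data['Cabin'] = data['Cabin'].fillna('B98')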

In [61]:
data.isnull().sum()

PassengerId    0
Survived       0
Pclass         0
Name           0
Sex            0
Age            0
SibSp          0
Parch          0
Ticket         0
Fare           0
Cabin          0
Embarked       2
dtype: int64

In [63]:
data['Embarked'].mode()[0]

'S'

In [65]:
data['Embarked'] = data['Embarked'].fillna(data['Embarked'].mode()[0])

In [67]:
data.isnull().sum()

PassengerId    0
Survived       0
Pclass         0
Name           0
Sex            0
Age            0
SibSp          0
Parch          0
Ticket         0
Fare           0
Cabin          0
Embarked       0
dtype: int64

In [69]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PassengerId  891 non-null    int64  
 1   Survived     891 non-null    int64  
 2   Pclass       891 non-null    int64  
 3   Name         891 non-null    object 
 4   Sex          891 non-null    object 
 5   Age          891 non-null    float64
 6   SibSp        891 non-null    int64  
 7   Parch        891 non-null    int64  
 8   Ticket       891 non-null    object 
 9   Fare         891 non-null    float64
 10  Cabin        891 non-null    object 
 11  Embarked     891 non-null    object 
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB


In [71]:
from sklearn.preprocessing import OrdinalEncoder

ol = OrdinalEncoder()

data['Pclass'] = ol.fit_transform(data[['Pclass']]).astype(int)

data['Pclass'].value_counts()

Pclass
2    491
0    216
1    184
Name: count, dtype: int64

In [73]:
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()

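# Note: label-encoding high-cardinality text columns assigns arbitrary integer codes,
# which the trees will treat as ordered numbers
data['Name'] = le.fit_transform(data['Name'])
data['Cabin'] = le.fit_transform(data['Cabin'])
data['Ticket'] = le.fit_transform(data['Ticket'])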


a = data['Name'].value_counts()
b = data['Cabin'].value_counts()
c = data['Ticket'].value_counts()

print(a)
print(b)
print(c)

Name
108    1
98     1
267    1
284    1
566    1
      ..
431    1
518    1
411    1
428    1
220    1
Name: count, Length: 891, dtype: int64
Cabin
48     687
64       4
146      4
47       4
63       3
      ... 
125      1
77       1
73       1
126      1
61       1
Name: count, Length: 148, dtype: int64
Ticket
333    7
568    7
80     7
249    6
566    6
      ..
513    1
98     1
212    1
606    1
466    1
Name: count, Length: 681, dtype: int64


In [75]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PassengerId  891 non-null    int64  
 1   Survived     891 non-null    int64  
 2   Pclass       891 non-null    int32  
 3   Name         891 non-null    int32  
 4   Sex          891 non-null    object 
 5   Age          891 non-null    float64
 6   SibSp        891 non-null    int64  
 7   Parch        891 non-null    int64  
 8   Ticket       891 non-null    int32  
 9   Fare         891 non-null    float64
 10  Cabin        891 non-null    int32  
 11  Embarked     891 non-null    object 
dtypes: float64(2), int32(4), int64(4), object(2)
memory usage: 69.7+ KB


In [77]:
data['Age'] = data['Age'].astype(int)
data['Fare'] = data['Fare'].astype(int)

In [79]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   PassengerId  891 non-null    int64 
 1   Survived     891 non-null    int64 
 2   Pclass       891 non-null    int32 
 3   Name         891 non-null    int32 
 4   Sex          891 non-null    object
 5   Age          891 non-null    int32 
 6   SibSp        891 non-null    int64 
 7   Parch        891 non-null    int64 
 8   Ticket       891 non-null    int32 
 9   Fare         891 non-null    int32 
 10  Cabin        891 non-null    int32 
 11  Embarked     891 non-null    object
dtypes: int32(6), int64(4), object(2)
memory usage: 62.8+ KB


In [81]:
data = pd.get_dummies(data, columns = ['Sex'], drop_first = True)

In [83]:
data['Sex_male'].value_counts()

Sex_male
True     577
False    314
Name: count, dtype: int64

In [85]:
data.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Sex_male
0,1,0,2,108,22,1,0,523,7,48,S,True
1,2,1,0,190,38,1,0,596,71,82,C,False
2,3,1,2,353,26,0,0,669,7,48,S,False
3,4,1,0,272,35,1,0,49,53,56,S,False
4,5,0,2,15,35,0,0,472,8,48,S,True


In [87]:
data['Sex_male'] = data['Sex_male'].astype(int)

In [89]:
data.head(2)

Unnamed: 0,PassengerId,Survived,Pclass,Name,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Sex_male
0,1,0,2,108,22,1,0,523,7,48,S,1
1,2,1,0,190,38,1,0,596,71,82,C,0


In [91]:
data = pd.get_dummies(data, columns = ['Embarked'], drop_first = True)

In [93]:
data.head(1)

Unnamed: 0,PassengerId,Survived,Pclass,Name,Age,SibSp,Parch,Ticket,Fare,Cabin,Sex_male,Embarked_Q,Embarked_S
0,1,0,2,108,22,1,0,523,7,48,1,False,True


In [95]:
data[['Embarked_Q', 'Embarked_S']] = data[['Embarked_Q','Embarked_S']].astype(int)

In [97]:
data.head(1)

Unnamed: 0,PassengerId,Survived,Pclass,Name,Age,SibSp,Parch,Ticket,Fare,Cabin,Sex_male,Embarked_Q,Embarked_S
0,1,0,2,108,22,1,0,523,7,48,1,0,1


In [99]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 13 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   PassengerId  891 non-null    int64
 1   Survived     891 non-null    int64
 2   Pclass       891 non-null    int32
 3   Name         891 non-null    int32
 4   Age          891 non-null    int32
 5   SibSp        891 non-null    int64
 6   Parch        891 non-null    int64
 7   Ticket       891 non-null    int32
 8   Fare         891 non-null    int32
 9   Cabin        891 non-null    int32
 10  Sex_male     891 non-null    int32
 11  Embarked_Q   891 non-null    int32
 12  Embarked_S   891 non-null    int32
dtypes: int32(9), int64(4)
memory usage: 59.3 KB


# GradientBoostingClassifier

In [103]:
from sklearn.model_selection import train_test_split

x =  data.drop(['Survived'], axis = 1)
y = data['Survived']

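# Note: train_size=0.2 trains on only 20% of the rows (178) and tests on the other 80% (713); test_size=0.2 gives the more usual 80/20 split
x_train, x_test, y_train, y_test = train_test_split(x, y, train_size = 0.2, random_state = 0)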

a = x_train.shape, x_test.shape
b = y_train.shape, y_test.shape

print(a)
print(b)

((178, 12), (713, 12))
((178,), (713,))
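
The shapes above follow from `train_size=0.2`. A quick check on placeholder arrays of the same size (zeros, not the real data) shows how `train_size` and `test_size` differ:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the same shape as x and y (891 rows, 12 features)
X_demo = np.zeros((891, 12))
y_demo = np.zeros(891)

# train_size=0.2 keeps 20% of the rows for training
small_train = train_test_split(X_demo, y_demo, train_size=0.2, random_state=0)
# test_size=0.2 keeps 20% of the rows for testing
large_train = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

print(small_train[0].shape, small_train[1].shape)  # (178, 12) (713, 12)
print(large_train[0].shape, large_train[1].shape)  # (712, 12) (179, 12)
```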


In [113]:
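# Note: ccp_alpha=10 is a very strong pruning penalty; it likely prunes every tree to a single node, which is consistent with the all-zero predictions below
gbc = GradientBoostingClassifier(learning_rate=0.3, n_estimators=100, max_depth=3, min_samples_split=2, min_samples_leaf=1, ccp_alpha = 10 )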

model = gbc.fit(x_train, y_train)

model

In [115]:
y_pred = model.predict(x_test)
y_pred

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,

In [120]:
from sklearn.metrics import accuracy_score

acc = accuracy_score(y_test, y_pred)*100

print(f"Accuracy_Score = {acc}")

Accuracy_Score = 63.394109396914445
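
A plausible explanation for the all-zero predictions is the `ccp_alpha = 10` setting: cost-complexity pruning that strong can collapse every tree to a single node, so the model never moves away from its initial prediction. The sketch below reproduces the effect on synthetic, imbalanced data (an assumption; it does not use the Titanic file).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic data with roughly 70% of samples in class 0
X, y = make_classification(
    n_samples=500, n_features=10, weights=[0.7, 0.3], flip_y=0.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Same model twice: once with heavy pruning, once with pruning disabled
pruned = GradientBoostingClassifier(ccp_alpha=10, random_state=0).fit(X_train, y_train)
unpruned = GradientBoostingClassifier(ccp_alpha=0.0, random_state=0).fit(X_train, y_train)

pruned_pred = pruned.predict(X_test)
print("classes predicted with ccp_alpha=10:", np.unique(pruned_pred))
print("accuracy with ccp_alpha=10: ", pruned.score(X_test, y_test))
print("accuracy with ccp_alpha=0.0:", unpruned.score(X_test, y_test))
```

With the heavy penalty the classifier predicts only the majority class, so its accuracy equals the majority-class share of the test set.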


In [127]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor

x =  data.drop(['Survived'], axis = 1)
y = data['Survived']

x_train, x_test, y_train, y_test = train_test_split(x, y, train_size = 0.2, random_state = 0)

a = x_train.shape, x_test.shape
b = y_train.shape, y_test.shape

print(a)
print(b)

((178, 12), (713, 12))
((178,), (713,))


In [131]:
gbs = GradientBoostingRegressor(loss='squared_error', learning_rate=0.1, n_estimators=100, subsample=1.0)

model = gbs.fit(x_train, y_train)

model

In [137]:
y_pred = model.predict(x_test)
y_pred

array([ 2.19180180e-01, -1.18553274e-01,  1.53442323e-01,  9.21778922e-01,
        1.00292978e+00,  2.88651023e-01,  8.69585945e-01,  7.18322000e-01,
        1.08899061e+00,  9.24280769e-01,  8.44118509e-02,  9.27596355e-01,
        8.70821500e-02,  1.11511319e+00,  1.06671281e+00,  9.60005559e-01,
        3.58952837e-01,  2.38142620e-02,  1.23273790e-01,  7.20240986e-01,
        3.58159645e-01,  1.15429125e+00,  1.06065614e-01,  9.52542248e-01,
        9.14829035e-01,  1.06685891e+00,  1.49553763e-01,  5.03252424e-01,
        9.87437847e-01,  7.83858959e-01,  3.04603922e-02,  5.78939529e-01,
        1.22089669e-01,  8.18437555e-02,  2.35426375e-01,  2.38008307e-01,
        3.10543479e-01,  2.66614907e-01,  3.77976326e-01,  8.55037293e-02,
        7.36287609e-01,  5.58820500e-04,  2.03225401e-01,  1.85892636e-01,
        8.34099795e-01, -4.59842601e-02, -8.52055005e-03,  9.50437254e-01,
        2.41264729e-01,  2.89971217e-01,  9.12499094e-01,  3.68203346e-01,
        1.07199337e+00,  

In [143]:
from sklearn.metrics import r2_score

# R2 is reported on its own scale (1.0 is a perfect fit); it is not an accuracy percentage
r2 = r2_score(y_test, y_pred)

print(f"R2_Score = {r2:.4f}")

R2_Score = 0.1093
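
The regressor's predictions above include values below 0 and above 1, which are hard to interpret for a binary target such as `Survived`. A classifier's `predict_proba` gives bounded scores instead, and `confusion_matrix` and `classification_report` (imported at the top of the notebook) summarize the errors per class. This is a sketch on synthetic data, not a rerun of the notebook:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic binary classification data, used only for illustration
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

clf = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

proba = clf.predict_proba(X_test)[:, 1]   # probability of class 1, always in [0, 1]
pred = clf.predict(X_test)

cm = confusion_matrix(y_test, pred)       # rows: actual class, columns: predicted class
print(cm)
print(classification_report(y_test, pred))
```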
