# Gradient Boosting Classifier
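Gradient boosting classifiers are a family of machine learning algorithms that combine many weak learners, typically shallow decision trees, into a single strong predictive model. The trees are added one at a time, and each new tree is fitted to correct the errors of the ensemble built so far.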
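
To make the idea of combining weak learners concrete, here is a minimal sketch on synthetic data (not the dataset used in this notebook). The names `demo` and `staged_accuracy` and the choice of depth-1 trees are assumptions made for illustration only. It uses scikit-learn's `staged_predict` to score the ensemble after each added tree.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification data, used purely for illustration.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Depth-1 trees ("stumps") are deliberately weak learners.
demo = GradientBoostingClassifier(n_estimators=50, max_depth=1, random_state=0)
demo.fit(X_tr, y_tr)

# staged_predict yields the ensemble's predictions after 1, 2, ..., 50 trees.
staged_accuracy = [accuracy_score(y_te, pred) for pred in demo.staged_predict(X_te)]

for n_trees in (1, 10, 50):
    print(f"{n_trees:>2} trees: test accuracy = {staged_accuracy[n_trees - 1]:.3f}")
```

Comparing the printed values shows how much the ensemble gains over a single stump on this toy problem; the exact numbers depend on the data and the scikit-learn version.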

Why do we need them?
Gradient boosting models have become popular because they are effective at classifying complex datasets, and they have been used to win many Kaggle data science competitions.

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report, confusion_matrix
from pprint import pprint

In [2]:
cleaned = pd.read_csv('../3.Preprocessing/cleaned_dataset.csv')

In [3]:
print(cleaned.head(2))

        dur  spkts  dpkts  sbytes  dbytes         rate  sttl  dttl  \
0  0.000011      2      0     496       0   90909.0902   254     0   
1  0.000008      2      0    1762       0  125000.0003   254     0   

         sload  dload  ...  state_CON  state_ECO  state_FIN  state_INT  \
0  180363632.0    0.0  ...        0.0        0.0        0.0        1.0   
1  881000000.0    0.0  ...        0.0        0.0        0.0        1.0   

   state_PAR  state_REQ  state_RST  state_URN  state_no  label  
0        0.0        0.0        0.0        0.0       0.0      0  
1        0.0        0.0        0.0        0.0       0.0      0  

[2 rows x 198 columns]


In [4]:
# One-hot encode the categorical attack_cat column
cleaned = pd.get_dummies(cleaned, columns=['attack_cat'])

In [5]:
# Select the target with single brackets so y is a 1-D Series, as scikit-learn expects.
X_train, X_test, y_train, y_test = train_test_split(
    cleaned[['dload', 'sbytes', 'dttl']], cleaned['label'], test_size=0.3, random_state=0)
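
A note on the shape of the target: in pandas, double brackets return a 2-D DataFrame while single brackets return a 1-D Series, and scikit-learn classifiers expect a 1-D `y` (a column-vector target typically triggers a `DataConversionWarning` during `fit`). The small frame `toy` below is a hypothetical stand-in for `cleaned`, used only to show the difference.

```python
import pandas as pd

# Hypothetical miniature stand-in for the real dataset.
toy = pd.DataFrame({'dload': [0.0, 1.5, 0.0, 2.0], 'label': [0, 1, 0, 1]})

y_2d = toy[['label']]   # double brackets: DataFrame with one column
y_1d = toy['label']     # single brackets: Series

print(y_2d.shape)  # (4, 1)
print(y_1d.shape)  # (4,)

# An existing column-vector target can also be flattened explicitly.
y_flat = y_2d.to_numpy().ravel()
print(y_flat.shape)  # (4,)
```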

In [6]:
# Create a gradient boosting classifier
model = GradientBoostingClassifier(learning_rate=0.5, max_depth=3, random_state=42)

In [7]:
print('Parameters currently in use:\n')
pprint(model.get_params())

Parameters currently in use:

{'ccp_alpha': 0.0,
 'criterion': 'friedman_mse',
 'init': None,
 'learning_rate': 0.5,
 'loss': 'deviance',
 'max_depth': 3,
 'max_features': None,
 'max_leaf_nodes': None,
 'min_impurity_decrease': 0.0,
 'min_impurity_split': None,
 'min_samples_leaf': 1,
 'min_samples_split': 2,
 'min_weight_fraction_leaf': 0.0,
 'n_estimators': 100,
 'n_iter_no_change': None,
 'random_state': 42,
 'subsample': 1.0,
 'tol': 0.0001,
 'validation_fraction': 0.1,
 'verbose': 0,
 'warm_start': False}
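
The `n_iter_no_change`, `validation_fraction` and `tol` entries listed above control optional early stopping, which is disabled while `n_iter_no_change` is `None`. The sketch below shows how it could be switched on. It runs on synthetic data, and the values chosen (`n_estimators=500`, `n_iter_no_change=5`) are illustrative assumptions, not settings used in this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic data, used purely for illustration.
X_demo, y_demo = make_classification(n_samples=2000, n_features=10, random_state=0)

early = GradientBoostingClassifier(
    learning_rate=0.5,
    max_depth=3,
    n_estimators=500,         # upper bound on the number of trees
    n_iter_no_change=5,       # stop once 5 consecutive iterations fail to improve
    validation_fraction=0.1,  # share of the training data held out for that check
    tol=1e-4,                 # minimum improvement that counts as progress
    random_state=42,
)
early.fit(X_demo, y_demo)

# n_estimators_ reports how many trees were actually fitted.
print("Trees fitted:", early.n_estimators_)
```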


In [8]:
model.fit(X_train,y_train)

GradientBoostingClassifier(learning_rate=0.5, random_state=42)

In [9]:
predictions = model.predict(X_test)

In [10]:
print("Confusion Matrix:")
print(confusion_matrix(y_test, predictions))

Confusion Matrix:
[[23981  3911]
 [ 2432 46978]]
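
As a worked check on how to read this matrix: scikit-learn puts true classes on the rows and predicted classes on the columns, so for the labels 0 and 1 the layout is `[[TN, FP], [FN, TP]]`. The sketch below recomputes accuracy and the per-class precision and recall from the four counts printed above. The variable names are chosen here for illustration.

```python
# Counts copied from the confusion matrix above (rows = true, columns = predicted).
tn, fp = 23981, 3911
fn, tp = 2432, 46978

total = tn + fp + fn + tp
accuracy = (tn + tp) / total

# Class 1 is treated as the positive class.
precision_1 = tp / (tp + fp)
recall_1 = tp / (tp + fn)

# The same formulas with the roles swapped give the class 0 metrics.
precision_0 = tn / (tn + fn)
recall_0 = tn / (tn + fp)

print(f"samples:   {total}")
print(f"accuracy:  {accuracy:.2f}")
print(f"class 0:   precision={precision_0:.2f} recall={recall_0:.2f}")
print(f"class 1:   precision={precision_1:.2f} recall={recall_1:.2f}")
```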


In [11]:
print("Classification Report")
print(classification_report(y_test, predictions))

Classification Report
              precision    recall  f1-score   support

           0       0.91      0.86      0.88     27892
           1       0.92      0.95      0.94     49410

    accuracy                           0.92     77302
   macro avg       0.92      0.91      0.91     77302
weighted avg       0.92      0.92      0.92     77302
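
The `macro avg` and `weighted avg` rows summarise the per-class scores in two different ways: the macro average is the plain mean over classes, while the weighted average weights each class by its support. A minimal sketch, rebuilding both F1 averages from the confusion matrix counts reported earlier (variable names are illustrative):

```python
import numpy as np

# Confusion matrix reported earlier (rows = true class, columns = predicted class).
cm = np.array([[23981, 3911],
               [2432, 46978]])

support = cm.sum(axis=1)     # number of true samples per class
predicted = cm.sum(axis=0)   # number of predictions per class
correct = np.diag(cm)        # correctly classified samples per class

precision = correct / predicted
recall = correct / support
f1 = 2 * precision * recall / (precision + recall)

macro_f1 = f1.mean()                            # every class counts equally
weighted_f1 = np.average(f1, weights=support)   # classes weighted by support

print(f"macro avg f1:    {macro_f1:.2f}")
print(f"weighted avg f1: {weighted_f1:.2f}")
```

Because class 1 has the larger support and the higher F1 score, the weighted average comes out slightly above the macro average.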

