# Gradient Boosting Machines

Gradient Boosting Machines (GBM) are a generalization of AdaBoost that adapts readily to both classification and regression problems.  
Models are fitted in sequence, each one to the residuals of the ensemble built so far, and the series is combined into a single predictive model.  
The method rests on the idea of combining weak learners into a strong learner.
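
The residual-fitting idea can be sketched in a few lines of plain Python. This is a minimal illustration for squared-error regression on a single feature, using decision stumps as the weak learners; it is not how scikit-learn implements GBM, and the names `fit_stump` and `boost` as well as the toy data are hypothetical. For squared error the residuals coincide with the negative gradient of the loss, which is where the "gradient" in the name comes from.

```python
# Minimal gradient boosting for squared-error regression with decision stumps.
# fit_stump, boost, and the toy data are hypothetical, for illustration only.

def fit_stump(xs, residuals):
    """Find the threshold split on a 1-D feature that minimizes squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_estimators=50, learning_rate=0.1):
    """Fit stumps sequentially, each one to the current residuals."""
    base = sum(ys) / len(ys)  # initial prediction: the mean of the targets
    preds = [base] * len(ys)
    stumps, history = [], []
    for _ in range(n_estimators):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Shrink each stump's contribution by the learning rate
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
        history.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    return predict, history


xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 2.9, 5.2, 5.0]
predict, mse_history = boost(xs, ys)
print(mse_history[0], mse_history[-1])  # training MSE after the first and last stump
```

Each stump alone is a poor model, but the training error shrinks as more of them are added. `learning_rate` and `n_estimators` play the same roles here as the scikit-learn parameters of the same names used below.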

For more information, see the file 'Data-Science-and-Machine-Learning-Tutorial/VBO_Lectures/6-Machine Learning/3-Nonlinear Regression Models/08_GBM.ipynb'.

In [1]:
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Load the data and drop rows with missing values
diabetes = pd.read_csv('diabetes.csv')
df = diabetes.copy()
df = df.dropna()

# 'Outcome' is the target; every other column is a feature
y = df['Outcome']
X = df.drop('Outcome', axis=1)

# Hold out 30% of the rows as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.3,
                                                    random_state=238)

In [2]:
from sklearn.ensemble import GradientBoostingClassifier

In [3]:
gbm_model = GradientBoostingClassifier().fit(X_train, y_train)

In [4]:
y_train_pred = gbm_model.predict(X_train)
acc_train = accuracy_score(y_train, y_train_pred)
acc_train

0.9366852886405959

In [5]:
y_test_pred = gbm_model.predict(X_test)
acc_test = accuracy_score(y_test, y_test_pred)
acc_test

0.7705627705627706
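
The gap between the training accuracy (about 0.94) and the test accuracy (about 0.77) suggests some overfitting. One way to see how that gap develops as trees are added is `staged_predict`, which yields the predictions after each boosting stage. The sketch below uses synthetic data from `make_classification` as a stand-in (an assumption, since it is not the diabetes dataset), so its numbers will not match the ones above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption; not the diabetes dataset)
X_demo, y_demo = make_classification(n_samples=400, n_features=8, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X_demo, y_demo, test_size=0.3, random_state=0)

demo_model = GradientBoostingClassifier(n_estimators=100, random_state=0).fit(Xtr, ytr)

# staged_predict yields one array of predictions per boosting stage
train_curve = [accuracy_score(ytr, p) for p in demo_model.staged_predict(Xtr)]
test_curve = [accuracy_score(yte, p) for p in demo_model.staged_predict(Xte)]

# Accuracy on both splits after 1, 10, 50, and 100 trees
for n_trees in (1, 10, 50, 100):
    print(n_trees, round(train_curve[n_trees - 1], 3), round(test_curve[n_trees - 1], 3))
```

If the training curve keeps climbing while the test curve flattens or drops, a smaller `n_estimators` or a lower `learning_rate` may generalize better.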

## Model Tuning

In [7]:
gbm_model.get_params()

{'ccp_alpha': 0.0,
 'criterion': 'friedman_mse',
 'init': None,
 'learning_rate': 0.1,
 'loss': 'log_loss',
 'max_depth': 3,
 'max_features': None,
 'max_leaf_nodes': None,
 'min_impurity_decrease': 0.0,
 'min_samples_leaf': 1,
 'min_samples_split': 2,
 'min_weight_fraction_leaf': 0.0,
 'n_estimators': 100,
 'n_iter_no_change': None,
 'random_state': None,
 'subsample': 1.0,
 'tol': 0.0001,
 'validation_fraction': 0.1,
 'verbose': 0,
 'warm_start': False}

In [10]:
gbm_params = {"learning_rate": [0.001, 0.01, 0.1, 0.5],
              "n_estimators": [100, 500, 1000],
              "max_depth": [3, 5, 10],
              "min_samples_split": [2, 5, 10]
}
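
Before launching the search it can help to estimate its cost. A rough count, using only the standard library and the same grid as above (repeated so the snippet is self-contained):

```python
from itertools import product

# Same grid as in the cell above, repeated so the snippet is self-contained
grid = {"learning_rate": [0.001, 0.01, 0.1, 0.5],
        "n_estimators": [100, 500, 1000],
        "max_depth": [3, 5, 10],
        "min_samples_split": [2, 5, 10]}

n_combinations = len(list(product(*grid.values())))
n_folds = 10  # the cv value passed to GridSearchCV in the next cell
n_fits = n_combinations * n_folds

print(n_combinations, n_fits)  # 108 1080
```

With the default `refit=True`, `GridSearchCV` also fits the best combination once more on the full training set, so the search performs 1081 fits in total, many of them with 1000 trees.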

In [18]:
gbm = GradientBoostingClassifier()
gbm_cv_model = GridSearchCV(gbm, gbm_params, cv=10, n_jobs=-1, verbose=0)
gbm_cv_model.fit(X_train, y_train)

In [19]:
gbm_cv_model.best_params_

{'learning_rate': 0.1,
 'max_depth': 3,
 'min_samples_split': 5,
 'n_estimators': 100}
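
Retyping the selected values by hand invites transcription mistakes. Two alternatives are sketched below on synthetic stand-in data with a deliberately small grid (both are assumptions made to keep the example fast, and `small_grid`, `search`, `refit_model`, and `best_model` are illustrative names): unpack `best_params_` into a new estimator, or reuse `best_estimator_` directly.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data and a deliberately small grid (both assumptions)
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)
small_grid = {"learning_rate": [0.01, 0.1], "n_estimators": [20, 50]}

search = GridSearchCV(GradientBoostingClassifier(random_state=0),
                      small_grid, cv=3)
search.fit(X_demo, y_demo)

# Option 1: unpack the selected values into a fresh estimator
refit_model = GradientBoostingClassifier(random_state=0, **search.best_params_)
refit_model.fit(X_demo, y_demo)

# Option 2: reuse the estimator GridSearchCV already refit on the full data
# (available because refit=True is the default)
best_model = search.best_estimator_

print(search.best_params_)
```

Either way, the final model is guaranteed to use the hyperparameters that the search actually selected.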

In [20]:
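# Note: learning_rate and n_estimators here differ from the best_params_
# output above, which reported 0.1 and 100.
gbm_tuned = GradientBoostingClassifier(learning_rate=0.01, max_depth=3, min_samples_split=5, n_estimators=1000)
gbm_tuned.fit(X_train, y_train)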

In [21]:
y_train_pred = gbm_tuned.predict(X_train)
acc_train = accuracy_score(y_train, y_train_pred)
acc_train

0.9106145251396648

In [22]:
y_test_pred = gbm_tuned.predict(X_test)
acc_test = accuracy_score(y_test, y_test_pred)
acc_test

0.7489177489177489