# Category Boosting (CatBoost)

Another fast, high-performing GBM variant that can handle categorical variables automatically.
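To give a feel for what "handling categorical variables automatically" can mean, here is a minimal pure-Python sketch of an ordered target statistic, the kind of encoding CatBoost's documentation describes for categorical features. It is a simplification under stated assumptions: the rows are processed in the given order rather than in a random permutation, and the function name, the `prior` value and the toy data are all hypothetical and do not come from this notebook.

```python
from collections import defaultdict


def ordered_target_statistic(categories, targets, prior=0.5):
    """Encode each row's category using only the rows that precede it.

    Simplified illustration: (positives so far + prior) / (rows so far + 1).
    `prior` is an assumed smoothing value, not a CatBoost default.
    """
    count_in_class = defaultdict(int)  # positive labels seen so far, per category
    total_count = defaultdict(int)     # rows seen so far, per category
    encoded = []
    for category, target in zip(categories, targets):
        # The current row's own label is not used for its encoding
        encoded.append((count_in_class[category] + prior)
                       / (total_count[category] + 1))
        count_in_class[category] += target
        total_count[category] += 1
    return encoded


# Hypothetical toy data
colors = ["red", "blue", "red", "red", "blue"]
labels = [1, 0, 0, 1, 1]
encoded = ordered_target_statistic(colors, labels)
print(encoded)
```

Because each row only sees labels from earlier rows, its own target never leaks into its encoding. The diabetes data used below is purely numeric, so no such encoding is needed there.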

## Model & Prediction

In [1]:
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Load the data and drop rows with missing values
diabetes = pd.read_csv('diabetes.csv')
df = diabetes.copy()
df = df.dropna()

# Separate the target ('Outcome') from the features
y = df['Outcome']
X = df.drop('Outcome', axis=1)

# Hold out 30% of the rows as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.3,
                                                    random_state=238)

In [4]:
# !pip install catboost

In [5]:
from catboost import CatBoostClassifier

In [6]:
cat_model = CatBoostClassifier().fit(X_train, y_train)

Learning rate set to 0.0079
0:	learn: 0.6882931	total: 153ms	remaining: 2m 33s
1:	learn: 0.6838783	total: 157ms	remaining: 1m 18s
2:	learn: 0.6798772	total: 160ms	remaining: 53s
3:	learn: 0.6758977	total: 162ms	remaining: 40.4s
4:	learn: 0.6726836	total: 165ms	remaining: 32.9s
5:	learn: 0.6679158	total: 168ms	remaining: 27.8s
6:	learn: 0.6638231	total: 171ms	remaining: 24.2s
7:	learn: 0.6607234	total: 174ms	remaining: 21.5s
8:	learn: 0.6575214	total: 176ms	remaining: 19.4s
9:	learn: 0.6541715	total: 180ms	remaining: 17.8s
10:	learn: 0.6505439	total: 182ms	remaining: 16.4s
11:	learn: 0.6469680	total: 186ms	remaining: 15.3s
12:	learn: 0.6436875	total: 189ms	remaining: 14.3s
13:	learn: 0.6409665	total: 193ms	remaining: 13.6s
14:	learn: 0.6375338	total: 196ms	remaining: 12.9s
15:	learn: 0.6333083	total: 199ms	remaining: 12.2s
16:	learn: 0.6294457	total: 201ms	remaining: 11.6s
17:	learn: 0.6263514	total: 204ms	remaining: 11.1s
18:	learn: 0.6227957	total: 208ms	remaining: 10.7s
19:	learn: 0.

In [8]:
y_train_pred = cat_model.predict(X_train)
y_train_acc = accuracy_score(y_train, y_train_pred)
y_train_acc

0.9534450651769087

In [9]:
y_test_pred = cat_model.predict(X_test)
y_test_acc = accuracy_score(y_test, y_test_pred)
y_test_acc

0.7662337662337663
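The default model scores much higher on the training set than on the test set, which hints at overfitting. As a small sketch, the gap can be computed directly from the two accuracies printed above; the variable names are illustrative only.

```python
# Accuracies copied from the outputs of the two cells above
train_acc = 0.9534450651769087
test_acc = 0.7662337662337663

# Difference between training and test accuracy
gap = train_acc - test_acc
print(f"train-test accuracy gap: {gap:.4f}")
```

A gap of this size, close to 19 percentage points, is one motivation for the tuning step that follows.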

## Model Tuning

In [10]:
catb_params = {
    'iterations': [200, 500],
    'learning_rate': [0.01, 0.05, 0.1],
    'depth': [3, 5, 8]
}

In [11]:
catb = CatBoostClassifier()
# 10-fold cross-validated grid search over catb_params, using all CPU cores
catb_cv_model = GridSearchCV(catb, catb_params, cv=10,
                             n_jobs=-1, verbose=2).fit(X_train, y_train)
catb_cv_model.best_params_

Fitting 10 folds for each of 18 candidates, totalling 180 fits
0:	learn: 0.6636618	total: 2.52ms	remaining: 501ms
1:	learn: 0.6408477	total: 4.45ms	remaining: 441ms
2:	learn: 0.6193119	total: 6.23ms	remaining: 409ms
3:	learn: 0.6057019	total: 8.1ms	remaining: 397ms
4:	learn: 0.5899499	total: 9.88ms	remaining: 385ms
5:	learn: 0.5755574	total: 11.7ms	remaining: 377ms
6:	learn: 0.5581996	total: 13.4ms	remaining: 371ms
7:	learn: 0.5460749	total: 15.7ms	remaining: 377ms
8:	learn: 0.5350339	total: 17.7ms	remaining: 377ms
9:	learn: 0.5267558	total: 19.7ms	remaining: 374ms
10:	learn: 0.5192650	total: 21.8ms	remaining: 374ms
11:	learn: 0.5105104	total: 23.9ms	remaining: 374ms
12:	learn: 0.5015761	total: 25.8ms	remaining: 371ms
13:	learn: 0.4923129	total: 27.9ms	remaining: 371ms
14:	learn: 0.4849222	total: 30.1ms	remaining: 371ms
15:	learn: 0.4798209	total: 32.4ms	remaining: 372ms
16:	learn: 0.4733698	total: 34.6ms	remaining: 373ms
17:	learn: 0.4660859	total: 37ms	remaining: 374ms
18:	learn: 0.4

{'depth': 5, 'iterations': 200, 'learning_rate': 0.05}
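The search log reports 18 candidates and 180 fits. As a rough sketch, the same numbers can be reproduced by enumerating the grid by hand with `itertools.product`. The grid and the fold count are copied from the cells above; the other names are illustrative.

```python
from itertools import product

# Same grid as catb_params above
catb_params = {
    'iterations': [200, 500],
    'learning_rate': [0.01, 0.05, 0.1],
    'depth': [3, 5, 8],
}
n_folds = 10  # cv=10 in the GridSearchCV call

# Every combination of one value per hyperparameter
names = list(catb_params)
candidates = [dict(zip(names, values))
              for values in product(*(catb_params[name] for name in names))]

total_fits = len(candidates) * n_folds
print(len(candidates), total_fits)
```

The 180 fits cover cross-validation only. With the default `refit=True`, `GridSearchCV` then trains the best candidate once more on the full training set.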

In [12]:
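# Refit on the training set with the best parameters found by the grid search.
# With the default refit=True, GridSearchCV also exposes such a refit model
# as catb_cv_model.best_estimator_.
catb_tuned = CatBoostClassifier(depth=5,
                                iterations=200,
                                learning_rate=0.05)
catb_tuned.fit(X_train, y_train)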

0:	learn: 0.6636618	total: 2.06ms	remaining: 411ms
1:	learn: 0.6408477	total: 3.46ms	remaining: 343ms
2:	learn: 0.6193119	total: 4.7ms	remaining: 308ms
3:	learn: 0.6057019	total: 6.02ms	remaining: 295ms
4:	learn: 0.5899499	total: 7.31ms	remaining: 285ms
5:	learn: 0.5755574	total: 8.4ms	remaining: 272ms
6:	learn: 0.5581996	total: 9.77ms	remaining: 269ms
7:	learn: 0.5460749	total: 11ms	remaining: 263ms
8:	learn: 0.5350339	total: 12.3ms	remaining: 262ms
9:	learn: 0.5267558	total: 13.9ms	remaining: 263ms
10:	learn: 0.5192650	total: 15.3ms	remaining: 262ms
11:	learn: 0.5105104	total: 16.6ms	remaining: 260ms
12:	learn: 0.5015761	total: 17.8ms	remaining: 256ms
13:	learn: 0.4923129	total: 19.2ms	remaining: 254ms
14:	learn: 0.4849222	total: 20.5ms	remaining: 253ms
15:	learn: 0.4798209	total: 21.9ms	remaining: 252ms
16:	learn: 0.4733698	total: 23.3ms	remaining: 251ms
17:	learn: 0.4660859	total: 24.8ms	remaining: 251ms
18:	learn: 0.4626831	total: 26.3ms	remaining: 250ms
19:	learn: 0.4574139	total

<catboost.core.CatBoostClassifier at 0x29bab1000d0>

In [13]:
y_train_pred = catb_tuned.predict(X_train)
y_acc_train = accuracy_score(y_train, y_train_pred)
y_acc_train

0.9385474860335196

In [14]:
y_test_pred = catb_tuned.predict(X_test)
y_acc_test = accuracy_score(y_test, y_test_pred)
y_acc_test

0.7619047619047619
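The tuned model's test accuracy (0.7619) is slightly below the default model's (0.7662). To put that difference in perspective, the sketch below converts both accuracies into counts of correct predictions. The test-set size of 231 rows is an assumption: the notebook never prints it, but it is consistent with the reported fractions (177/231 and 176/231).

```python
N_TEST = 231  # assumed test-set size; not stated in the notebook

# Test accuracies copied from the outputs above
default_test_acc = 0.7662337662337663
tuned_test_acc = 0.7619047619047619

# Convert each accuracy into a number of correctly classified rows
default_correct = round(default_test_acc * N_TEST)
tuned_correct = round(tuned_test_acc * N_TEST)

difference = default_correct - tuned_correct
print(default_correct, tuned_correct, difference)
```

Under this assumption the two models differ by a single test row, so the tuned model is best read as roughly on par with the default one on this split, not clearly worse.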