# Category Boosting (CatBoost)

CatBoost is another GBM variant. It is fast, performs well, and can handle categorical variables automatically. It was developed by Yandex in 2017.

![image.png](image20.png)
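To give a feel for what handling categorical variables automatically can mean, here is a simplified sketch of ordered target statistics, the kind of encoding that CatBoost's documentation describes for categorical features. It is not the library's implementation: the function name `ordered_target_stats`, the `prior_weight` parameter, and the toy data are assumptions made for illustration only.

```python
import random


def ordered_target_stats(categories, targets, prior_weight=1.0, seed=0):
    """Encode one categorical column with ordered target statistics.

    Each row is encoded using only the targets of rows that precede it in a
    random permutation, so a row's own target never leaks into its encoding.
    (Hypothetical helper, not part of the catboost package.)
    """
    n = len(categories)
    prior = sum(targets) / n  # global target mean, used as the prior
    order = list(range(n))
    random.Random(seed).shuffle(order)  # random processing order over the rows

    sums, counts = {}, {}  # running target sum and row count per category
    encoded = [0.0] * n
    for i in order:
        c = categories[i]
        s, k = sums.get(c, 0.0), counts.get(c, 0)
        # Smoothed mean of the targets seen so far for this category
        encoded[i] = (s + prior_weight * prior) / (k + prior_weight)
        sums[c] = s + targets[i]
        counts[c] = k + 1
    return encoded


# Made-up toy data: a two-level category and a numeric target
leagues = ["A", "N", "A", "N", "A", "N"]
salaries = [500.0, 300.0, 700.0, 100.0, 600.0, 200.0]
enc = ordered_target_stats(leagues, salaries)
print(enc)
```

The first row seen for each category receives the prior (the global mean), and later rows move toward the mean of their own category. The cells in this notebook one-hot encode the categorical columns by hand instead, so this sketch only illustrates the idea.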

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Load the Hitters data and drop rows with missing values
hit = pd.read_csv("Hitters.csv")
df = hit.copy()
df = df.dropna()

# One-hot encode the three categorical columns
dms = pd.get_dummies(df[['League', 'Division', 'NewLeague']])

# Target: Salary; features: the numeric columns plus one dummy per categorical column
y = df["Salary"]
X_ = df.drop(['Salary', 'League', 'Division', 'NewLeague'], axis=1).astype('float')
X = pd.concat([X_, dms[['League_N', 'Division_W', 'NewLeague_N']]], axis=1)

# Hold out 25% of the rows as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.25,
                                                    random_state=42)

In [2]:
!pip install catboost



In [3]:
from catboost import CatBoostRegressor

catb = CatBoostRegressor()
catb_model = catb.fit(X_train, y_train)

Learning rate set to 0.031674
0:	learn: 437.6430699	total: 146ms	remaining: 2m 26s
1:	learn: 431.3923642	total: 147ms	remaining: 1m 13s
2:	learn: 424.8820360	total: 148ms	remaining: 49.1s
3:	learn: 418.2514904	total: 149ms	remaining: 37s
4:	learn: 412.6394021	total: 149ms	remaining: 29.7s
5:	learn: 406.6247020	total: 150ms	remaining: 24.9s
6:	learn: 400.5321206	total: 151ms	remaining: 21.4s
7:	learn: 394.6683437	total: 152ms	remaining: 18.8s
8:	learn: 388.2496484	total: 152ms	remaining: 16.8s
9:	learn: 382.9448842	total: 153ms	remaining: 15.2s
10:	learn: 377.2600080	total: 154ms	remaining: 13.8s
11:	learn: 372.4829606	total: 154ms	remaining: 12.7s
12:	learn: 366.6823437	total: 155ms	remaining: 11.8s
13:	learn: 362.6076230	total: 156ms	remaining: 11s
14:	learn: 358.0107745	total: 156ms	remaining: 10.3s
15:	learn: 353.2802665	total: 157ms	remaining: 9.66s
16:	learn: 348.5646265	total: 158ms	remaining: 9.12s
17:	learn: 343.6407912	total: 159ms	remaining: 8.66s
18:	learn: 339.2363847	total

In [4]:
y_pred = catb_model.predict(X_test)
# Test RMSE; scikit-learn's signature is mean_squared_error(y_true, y_pred)
np.sqrt(mean_squared_error(y_test, y_pred))

np.float64(351.194631344607)
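The value above is a root mean squared error (RMSE): the square root of the mean of the squared differences between the true and predicted values. A minimal pure-Python sketch of the same quantity follows. The helper `rmse` and the toy numbers are illustrative assumptions, not part of the notebook.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y_true - y_pred) ** 2))."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up values: the errors are -10, 10 and -30
y_true_toy = [100.0, 200.0, 300.0]
y_pred_toy = [110.0, 190.0, 330.0]
print(rmse(y_true_toy, y_pred_toy))  # sqrt((100 + 100 + 900) / 3)
```

Because RMSE is in the same units as the target, the figure reported by the notebook can be read directly on the scale of `Salary`.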

In [5]:
catb_grid = {
    'iterations': [500, 1000, 2000],
    'learning_rate': [0.01, 0.03, 0.05, 0.1],
    'depth': [3,4,5,6,7,8]
}
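A grid search tries every combination of the listed values, so the cost grows multiplicatively with each added parameter. The sketch below enumerates the combinations of this grid with `itertools.product`. The variable names `candidates` and `n_folds` are illustrative.

```python
from itertools import product

catb_grid = {
    'iterations': [500, 1000, 2000],
    'learning_rate': [0.01, 0.03, 0.05, 0.1],
    'depth': [3, 4, 5, 6, 7, 8],
}

# One dict per parameter combination: 3 * 4 * 6 candidates
candidates = [dict(zip(catb_grid, values))
              for values in product(*catb_grid.values())]

n_folds = 10  # number of cross-validation folds (the cv argument)
print(len(candidates))            # 72
print(len(candidates) * n_folds)  # 720
```

With 10-fold cross-validation this comes to 720 model fits, which is why a search over this grid takes a while even on a small dataset.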

In [6]:
from sklearn.model_selection import GridSearchCV

catb = CatBoostRegressor()
catb_cv_model = GridSearchCV(catb, catb_grid, cv=10, n_jobs=-1, verbose=2)
catb_cv_model.fit(X_train, y_train)

Fitting 10 folds for each of 72 candidates, totalling 720 fits
0:	learn: 425.7900818	total: 804us	remaining: 401ms
1:	learn: 404.8723520	total: 1.47ms	remaining: 365ms
2:	learn: 387.4057666	total: 2.06ms	remaining: 342ms
3:	learn: 372.2801584	total: 2.64ms	remaining: 327ms
4:	learn: 358.9204229	total: 3.27ms	remaining: 324ms
5:	learn: 347.0083933	total: 3.93ms	remaining: 324ms
6:	learn: 336.0130818	total: 4.5ms	remaining: 317ms
7:	learn: 324.3923300	total: 5.04ms	remaining: 310ms
8:	learn: 314.8690957	total: 5.55ms	remaining: 303ms
9:	learn: 308.5075563	total: 6.3ms	remaining: 309ms
10:	learn: 298.8587285	total: 6.92ms	remaining: 308ms
11:	learn: 294.7655438	total: 7.5ms	remaining: 305ms
12:	learn: 288.0697862	total: 8.04ms	remaining: 301ms
13:	learn: 282.6697154	total: 8.62ms	remaining: 299ms
14:	learn: 277.6121667	total: 9.15ms	remaining: 296ms
15:	learn: 273.4383979	total: 9.68ms	remaining: 293ms
16:	learn: 269.1556201	total: 10.2ms	remaining: 290ms
17:	learn: 264.8098704	total: 10.

In [11]:
catb_cv_model.best_params_

{'depth': 3, 'iterations': 500, 'learning_rate': 0.1}

In [12]:
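# Note: learning_rate and depth here differ from best_params_ above
# (learning_rate=0.1, depth=3); the output below belongs to the values used here.
catb_tuned = CatBoostRegressor(iterations=500,
                               learning_rate=0.01,
                               depth=8)

catb_tuned = catb_tuned.fit(X_train, y_train)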

0:	learn: 442.4903140	total: 13.3ms	remaining: 6.63s
1:	learn: 440.4621805	total: 16.8ms	remaining: 4.19s
2:	learn: 438.5132091	total: 20.9ms	remaining: 3.46s
3:	learn: 436.2180377	total: 23.4ms	remaining: 2.91s
4:	learn: 434.0461579	total: 25.6ms	remaining: 2.53s
5:	learn: 431.8437770	total: 27.5ms	remaining: 2.27s
6:	learn: 430.1594587	total: 29.4ms	remaining: 2.07s
7:	learn: 428.0941830	total: 31.5ms	remaining: 1.93s
8:	learn: 426.0998774	total: 33.9ms	remaining: 1.85s
9:	learn: 424.0249067	total: 34.5ms	remaining: 1.69s
10:	learn: 422.1921868	total: 36.5ms	remaining: 1.62s
11:	learn: 420.2506764	total: 38.5ms	remaining: 1.57s
12:	learn: 418.3116383	total: 41.7ms	remaining: 1.56s
13:	learn: 416.2966847	total: 44.1ms	remaining: 1.53s
14:	learn: 414.5776175	total: 46.4ms	remaining: 1.5s
15:	learn: 412.8009394	total: 49.2ms	remaining: 1.49s
16:	learn: 410.9774146	total: 53.7ms	remaining: 1.52s
17:	learn: 409.1047417	total: 57.3ms	remaining: 1.53s
18:	learn: 407.6243957	total: 59.7ms	re

In [13]:
y_pred = catb_tuned.predict(X_test)
# Test RMSE; scikit-learn's signature is mean_squared_error(y_true, y_pred)
np.sqrt(mean_squared_error(y_test, y_pred))

np.float64(359.3343039282949)
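The two test RMSE values reported in this notebook can be compared directly. The short calculation below uses only those two numbers. The variable names are illustrative.

```python
# Test RMSE values copied from the notebook output
rmse_default = 351.194631344607   # CatBoostRegressor() with default settings
rmse_refit = 359.3343039282949    # iterations=500, learning_rate=0.01, depth=8

abs_diff = rmse_refit - rmse_default
rel_diff_pct = 100 * abs_diff / rmse_default

print(f"absolute difference: {abs_diff:.2f}")      # 8.14
print(f"relative difference: {rel_diff_pct:.1f}%")  # 2.3%
```

On this particular split, the refit model scores about 2.3% worse on the test set than the default model. Note that the refit used `learning_rate=0.01` and `depth=8`, not the values returned in `best_params_`.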