# Category Boosting (CatBoost)

Another fast, high-performing GBM derivative, and one that can handle categorical variables automatically.

* Support for categorical variables
* Fast and scalable GPU support
* More accurate predictions
* Fast training and fast prediction
* Open-sourced by Yandex; Russia's first successful open-source ML project

"Handling them automatically" means that CatBoost can take categorical variables as they are, without manual encoding.

When you evaluate CatBoost on a given problem with a dataset in which all variables are continuous, its performance tends to be worse than in a scenario that includes categorical variables, where it is observed to do better.
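The idea behind this categorical handling can be sketched in plain Python. CatBoost's documentation describes encoding a category using target statistics computed only from rows that come earlier in an ordering, so a row never sees its own target. The snippet below is a simplified illustration under stated assumptions, not CatBoost's actual implementation: the function name `ordered_target_stats`, the smoothing weight `a`, and the toy league and salary values are all hypothetical, and the row order is used as given rather than a random permutation.

```python
def ordered_target_stats(categories, targets, prior, a=1.0):
    """Encode each category value using only the rows that precede it.

    Simplified sketch: encoding = (sum of earlier targets + a * prior) / (earlier count + a).
    """
    sums = {}    # running sum of targets seen so far, per category
    counts = {}  # running number of rows seen so far, per category
    encoded = []
    for cat, y in zip(categories, targets):
        s = sums.get(cat, 0.0)
        n = counts.get(cat, 0)
        # The current row's own target is NOT used for its encoding.
        encoded.append((s + a * prior) / (n + a))
        # Only after encoding do we add the current row to the running totals.
        sums[cat] = s + y
        counts[cat] = n + 1
    return encoded


# Toy data (hypothetical values, not taken from Hitters.csv).
leagues = ["A", "N", "A", "A", "N"]
salaries = [100.0, 200.0, 300.0, 500.0, 400.0]
prior = sum(salaries) / len(salaries)  # global mean used as the prior

encoded = ordered_target_stats(leagues, salaries, prior)
print(encoded)
```

The first occurrence of each category falls back to the prior, and later occurrences move toward the mean of the earlier targets for that category. Note that the notebook below one-hot encodes its categorical columns before training, so it does not exercise this mechanism.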

# Category Boosting (CatBoost) - Model

In [15]:
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV,cross_val_score
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt
from sklearn.preprocessing import scale
from sklearn import model_selection
from sklearn.tree import DecisionTreeRegressor, DecisionTreeClassifier
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import BaggingRegressor

# suppress warnings
from warnings import filterwarnings
filterwarnings('ignore')

In [16]:
import pandas as pd
hit = pd.read_csv("Hitters.csv")
df = hit.copy()
df = df.dropna()
dms = pd.get_dummies(df[['League', 'Division', 'NewLeague']])
y = df["Salary"]
X_ = df.drop(['Salary', 'League', 'Division', 'NewLeague'], axis=1).astype('float64')
X = pd.concat([X_, dms[['League_N', 'Division_W', 'NewLeague_N']]], axis=1)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.25,
                                                    random_state=42)

In [14]:
# !pip install catboost



In [17]:
from catboost import CatBoostRegressor

In [18]:
catb = CatBoostRegressor()

In [19]:
catb_model = catb.fit(X_train, y_train)

Learning rate set to 0.031674
0:	learn: 437.6430699	total: 2.85ms	remaining: 2.85s
1:	learn: 431.3923642	total: 4.28ms	remaining: 2.13s
2:	learn: 424.8820360	total: 5.72ms	remaining: 1.9s
3:	learn: 418.2514904	total: 7.15ms	remaining: 1.78s
4:	learn: 412.6394021	total: 8.54ms	remaining: 1.7s
5:	learn: 406.6247020	total: 9.97ms	remaining: 1.65s
6:	learn: 400.5321206	total: 11.4ms	remaining: 1.61s
7:	learn: 394.6683437	total: 12.8ms	remaining: 1.58s
8:	learn: 388.2496484	total: 14.2ms	remaining: 1.56s
9:	learn: 382.9448842	total: 15.7ms	remaining: 1.55s
10:	learn: 377.2600080	total: 17.9ms	remaining: 1.61s
11:	learn: 372.4829606	total: 19.5ms	remaining: 1.61s
12:	learn: 366.6823437	total: 21ms	remaining: 1.59s
13:	learn: 362.6076230	total: 22.4ms	remaining: 1.58s
14:	learn: 358.0107745	total: 23.8ms	remaining: 1.56s
15:	learn: 353.2802665	total: 25.3ms	remaining: 1.56s
16:	learn: 348.5646265	total: 26.8ms	remaining: 1.55s
17:	learn: 343.6407912	total: 28.2ms	remaining: 1.54s
18:	learn: 3

In [20]:
import catboost

# Category Boosting (CatBoost) - Prediction

In [21]:
y_pred = catb_model.predict(X_test)

In [22]:
np.sqrt(mean_squared_error(y_test, y_pred))

351.194631344607
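The value above is the root mean squared error (RMSE) on the test set: the square root of the mean of the squared differences between actual and predicted salaries. As a minimal sketch of what `np.sqrt(mean_squared_error(...))` computes, here is the same formula in plain Python; the helper name `rmse` and the toy numbers are assumptions for illustration only.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y_true - y_pred) ** 2))."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy values (hypothetical, not from the Hitters data).
toy_true = [100.0, 200.0, 300.0]
toy_pred = [110.0, 190.0, 330.0]
toy_rmse = rmse(toy_true, toy_pred)
print(toy_rmse)
```

Because the errors are squared before averaging, one large miss (the 30-unit error here) weighs more than several small ones. The result is in the same unit as the target, so the figure above is expressed in the units of `Salary`.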

# Category Boosting (CatBoost) - Model Tuning

We will use GridSearchCV to find the best hyperparameters.

In [23]:
catb_grid = {
    "iterations":[200,500,1000,2000],
    "learning_rate":[0.01,0.03,0.05,0.1],
    "depth":[3,4,5,6,7,8]
}
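It helps to estimate the size of this search before running it. The sketch below enumerates the grid with `itertools.product` and counts the fits for 5-fold cross-validation; the variable names other than `catb_grid` are assumptions for illustration. The iteration total is only a rough proxy for cost, since deeper trees take longer per iteration.

```python
from itertools import product

catb_grid = {
    "iterations": [200, 500, 1000, 2000],
    "learning_rate": [0.01, 0.03, 0.05, 0.1],
    "depth": [3, 4, 5, 6, 7, 8],
}

# Every combination of one value per hyperparameter is one candidate.
candidates = [
    dict(zip(catb_grid, values)) for values in product(*catb_grid.values())
]
n_candidates = len(candidates)

cv_folds = 5
n_fits = n_candidates * cv_folds  # each candidate is fitted once per fold

# Rough cost proxy: total boosting iterations across all cross-validation fits.
total_iterations = sum(c["iterations"] for c in candidates) * cv_folds

print(n_candidates, n_fits, total_iterations)
```

The grid yields 96 candidates and 480 fits, matching the counts reported in the search log. A smaller grid, or a cap on `iterations`, would shorten the run considerably.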

In [24]:
catb = CatBoostRegressor()

In [25]:
catb_cv_model = GridSearchCV(catb, catb_grid, cv=5, n_jobs=-1, verbose=2)

In [26]:
catb_cv_model.fit(X_train, y_train)

Fitting 5 folds for each of 96 candidates, totalling 480 fits
0:	learn: 422.4143448	total: 1.69ms	remaining: 1.69s
1:	learn: 404.1864276	total: 2.52ms	remaining: 1.26s
2:	learn: 386.3231718	total: 3.31ms	remaining: 1.1s
3:	learn: 370.5548032	total: 4.25ms	remaining: 1.06s
4:	learn: 354.9242038	total: 5.44ms	remaining: 1.08s
5:	learn: 342.3403984	total: 6.42ms	remaining: 1.06s
6:	learn: 328.2370070	total: 7.58ms	remaining: 1.07s
7:	learn: 317.5056526	total: 8.77ms	remaining: 1.09s
8:	learn: 306.6243511	total: 9.73ms	remaining: 1.07s
9:	learn: 297.3147023	total: 10.7ms	remaining: 1.06s
10:	learn: 288.3685892	total: 11.9ms	remaining: 1.07s
11:	learn: 281.0996220	total: 12.7ms	remaining: 1.05s
12:	learn: 273.2254898	total: 13.6ms	remaining: 1.04s
13:	learn: 266.9003385	total: 14.6ms	remaining: 1.02s
14:	learn: 261.9092500	total: 15.7ms	remaining: 1.03s
15:	learn: 256.2637350	total: 16.8ms	remaining: 1.03s
16:	learn: 250.3667935	total: 17.8ms	remaining: 1.03s
17:	learn: 244.8631098	total: 1

The CatBoost grid search took 14 minutes.

In [27]:
catb_cv_model.best_params_

{'depth': 5, 'iterations': 1000, 'learning_rate': 0.1}

In [28]:
catb_tuned = CatBoostRegressor(iterations = 1000,
                              learning_rate = 0.1,
                              depth = 5)

In [29]:
catb_tuned = catb_tuned.fit(X_train, y_train)

0:	learn: 422.4143448	total: 1.03ms	remaining: 1.03s
1:	learn: 404.1864276	total: 2.37ms	remaining: 1.18s
2:	learn: 386.3231718	total: 3.43ms	remaining: 1.14s
3:	learn: 370.5548032	total: 4.6ms	remaining: 1.15s
4:	learn: 354.9242038	total: 5.52ms	remaining: 1.1s
5:	learn: 342.3403984	total: 6.72ms	remaining: 1.11s
6:	learn: 328.2370070	total: 7.71ms	remaining: 1.09s
7:	learn: 317.5056526	total: 8.85ms	remaining: 1.1s
8:	learn: 306.6243511	total: 9.98ms	remaining: 1.1s
9:	learn: 297.3147023	total: 10.8ms	remaining: 1.07s
10:	learn: 288.3685892	total: 11.9ms	remaining: 1.07s
11:	learn: 281.0996220	total: 13ms	remaining: 1.07s
12:	learn: 273.2254898	total: 13.9ms	remaining: 1.05s
13:	learn: 266.9003385	total: 15.1ms	remaining: 1.06s
14:	learn: 261.9092500	total: 16.2ms	remaining: 1.06s
15:	learn: 256.2637350	total: 17.1ms	remaining: 1.05s
16:	learn: 250.3667935	total: 18.2ms	remaining: 1.05s
17:	learn: 244.8631098	total: 19.4ms	remaining: 1.06s
18:	learn: 240.1540669	total: 20.2ms	remaini

In [30]:
y_pred = catb_tuned.predict(X_test)

In [31]:
np.sqrt(mean_squared_error(y_test,y_pred))

356.665762904938
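The tuned model's test RMSE can be compared directly with the default model's score from the prediction section. The snippet below does only that arithmetic, using the two values printed in this notebook; the variable names are assumptions.

```python
# Test-set RMSE values copied from the notebook outputs.
rmse_default = 351.194631344607  # CatBoostRegressor() with default settings
rmse_tuned = 356.665762904938    # iterations=1000, learning_rate=0.1, depth=5

diff = rmse_tuned - rmse_default
relative = diff / rmse_default

print(f"absolute difference: {diff:.2f}")
print(f"relative difference: {relative:.2%}")
```

On this particular split, the tuned model scores slightly worse on the test set than the default one. One plausible reason, stated here as an assumption rather than a diagnosis, is that GridSearchCV selects parameters by cross-validated score on the training folds (with no `scoring` argument it uses the estimator's own `score` method, typically R² for regressors), and that choice does not guarantee a lower RMSE on one specific held-out test set.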

For this nonlinear regression problem, on this sample dataset, and given the train-test split and the personal choices made along the way, the most successful model...