# CatBoost

## Importing the libraries

In [9]:
!pip install catboost



In [10]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

## Importing the dataset

In [11]:
dataset = pd.read_csv('Data.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

## Splitting the dataset into the Training set and Test set

In [12]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

## Training CatBoost on the Training set

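- No feature scaling is required here, unlike for some other classification models: CatBoost builds decision trees, which split on feature thresholds and are therefore largely insensitive to the scale of the inputs.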
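
To illustrate why threshold-based splits do not depend on feature scale, here is a minimal sketch in plain Python. It is not CatBoost's actual split search; the toy data and the `best_split_partition` helper are hypothetical and exist only for this illustration. It finds the single best threshold split on a raw feature and on its standardized version, then checks that both produce the same grouping of samples.

```python
# Hypothetical toy data: one feature and binary labels (not taken from Data.csv).
feature = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = [0, 0, 0, 1, 1, 1]

def best_split_partition(values, targets):
    """Return the set of sample indices on the left of the lowest-error threshold split."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    best_errors, best_left = None, None
    for cut in range(1, len(order)):
        left, right = order[:cut], order[cut:]
        # Predict the majority class on each side and count the mistakes.
        errors = 0
        for side in (left, right):
            ones = sum(targets[i] for i in side)
            errors += min(ones, len(side) - ones)
        if best_errors is None or errors < best_errors:
            best_errors, best_left = errors, set(left)
    return best_left

# Standardize the feature (zero mean, unit variance), as a scaler would.
mean = sum(feature) / len(feature)
std = (sum((v - mean) ** 2 for v in feature) / len(feature)) ** 0.5
scaled = [(v - mean) / std for v in feature]

raw_partition = best_split_partition(feature, labels)
scaled_partition = best_split_partition(scaled, labels)
print(raw_partition == scaled_partition)
```

Standardization preserves the ordering of the values, so the candidate splits, and hence the chosen partition, are unchanged. Only the numeric value of the threshold differs.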

In [13]:
from catboost import CatBoostClassifier   # CatBoostRegressor is the equivalent class for regression tasks.
classifier = CatBoostClassifier()
classifier.fit(X_train, y_train)

Learning rate set to 0.007956
0:	learn: 0.6778283	total: 1.23ms	remaining: 1.23s
1:	learn: 0.6642874	total: 2.19ms	remaining: 1.09s
2:	learn: 0.6510578	total: 3.33ms	remaining: 1.11s
3:	learn: 0.6351685	total: 4.46ms	remaining: 1.11s
4:	learn: 0.6203906	total: 5.49ms	remaining: 1.09s
5:	learn: 0.6053561	total: 6.5ms	remaining: 1.08s
6:	learn: 0.5913363	total: 7.51ms	remaining: 1.06s
7:	learn: 0.5773888	total: 8.82ms	remaining: 1.09s
8:	learn: 0.5638394	total: 10ms	remaining: 1.1s
9:	learn: 0.5507421	total: 11.1ms	remaining: 1.09s
10:	learn: 0.5377201	total: 12.1ms	remaining: 1.09s
11:	learn: 0.5243873	total: 13.1ms	remaining: 1.08s
12:	learn: 0.5129034	total: 14.1ms	remaining: 1.07s
13:	learn: 0.5047204	total: 15.1ms	remaining: 1.06s
14:	learn: 0.4942404	total: 16.1ms	remaining: 1.06s
15:	learn: 0.4836253	total: 17.1ms	remaining: 1.05s
16:	learn: 0.4733355	total: 18.1ms	remaining: 1.05s
17:	learn: 0.4629416	total: 19.1ms	remaining: 1.04s
18:	learn: 0.4527778	total: 20.1ms	remaining: 1.

<catboost.core.CatBoostClassifier at 0x7f2a39214850>

## Making the Confusion Matrix

In [14]:
from sklearn.metrics import confusion_matrix, accuracy_score
y_pred = classifier.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
print(cm)
accuracy_score(y_test, y_pred)

[[84  3]
 [ 0 50]]


0.9781021897810219
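
The accuracy above can be recomputed by hand from the confusion matrix. The sketch below uses scikit-learn's convention that rows are true classes and columns are predicted classes. Treating the second class as the "positive" one is an assumption made here for illustration, since the actual label values in `Data.csv` are not shown. Precision and recall are added only as examples of other quantities the same matrix yields.

```python
# Confusion matrix printed above: rows = true class, columns = predicted class.
cm = [[84, 3],
      [0, 50]]

# Assumption: the second row/column is the positive class.
tn, fp = cm[0]
fn, tp = cm[1]

total = tn + fp + fn + tp
accuracy = (tn + tp) / total      # correct predictions / all predictions
precision = tp / (tp + fp)        # predicted positives that are truly positive
recall = tp / (tp + fn)           # true positives that were found

print(f"Samples:   {total}")
print(f"Accuracy:  {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall:    {recall:.4f}")
```

The diagonal holds the 134 correct predictions out of 137 test samples, which gives the accuracy reported by `accuracy_score`.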

## Applying k-Fold Cross Validation

In [15]:
from sklearn.model_selection import cross_val_score
accuracies = cross_val_score(estimator = classifier, X = X_train, y = y_train, cv = 10)
print(f"Accuracy: {accuracies.mean()*100:.2f} %")
print(f"Standard Deviation: {accuracies.std()*100:.2f} %")

Streaming output truncated to the last 5000 lines.
6:	learn: 0.6007221	total: 7.27ms	remaining: 1.03s
7:	learn: 0.5865261	total: 8.34ms	remaining: 1.03s
8:	learn: 0.5760173	total: 9.35ms	remaining: 1.03s
9:	learn: 0.5641784	total: 10.2ms	remaining: 1.01s
10:	learn: 0.5538549	total: 11.2ms	remaining: 1s
11:	learn: 0.5413434	total: 12.2ms	remaining: 1s
12:	learn: 0.5308262	total: 13.1ms	remaining: 996ms
13:	learn: 0.5187893	total: 14.1ms	remaining: 992ms
14:	learn: 0.5084890	total: 15.1ms	remaining: 995ms
15:	learn: 0.4986254	total: 16.1ms	remaining: 992ms
16:	learn: 0.4890714	total: 17.2ms	remaining: 992ms
17:	learn: 0.4790883	total: 18.1ms	remaining: 990ms
18:	learn: 0.4700108	total: 19.2ms	remaining: 994ms
19:	learn: 0.4630325	total: 20.3ms	remaining: 993ms
20:	learn: 0.4536134	total: 21.3ms	remaining: 991ms
21:	learn: 0.4429695	total: 22.2ms	remaining: 989ms
22:	learn: 0.4362340	total: 23.2ms	remaining: 986ms
23:	learn: 0.4280061	total: 24.2ms	remaining: 984ms
24:	learn

- The accuracy of the `CatBoost` model on the single train/test split is the same as that of `XGBoost`, but its 10-fold cross-validation accuracy of `97.26` % is better than the `96.53` % obtained with `XGBoost`.
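
The cross-validation figures are the mean and standard deviation of the per-fold accuracies. The following is a minimal sketch of that procedure in plain Python, under stated assumptions: `k_fold_indices`, `cross_val_accuracy`, the majority-label stand-in model, and the toy data are all hypothetical names introduced for this illustration. It uses plain contiguous folds, whereas `cross_val_score` with a classifier and an integer `cv` uses stratified folds, so it shows the idea rather than reproducing scikit-learn's behaviour.

```python
import statistics

def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds whose sizes differ by at most one."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_val_accuracy(X, y, k, fit, predict):
    """Train on k-1 folds, score accuracy on the held-out fold, repeat for every fold."""
    scores = []
    for held_out in k_fold_indices(len(X), k):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(model, X[i]) == y[i] for i in held_out)
        scores.append(hits / len(held_out))
    return scores

# Hypothetical stand-in model: always predict the majority training label.
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)

def predict_majority(model, x):
    return model

# Hypothetical toy data (not Data.csv).
X_toy = [[i] for i in range(20)]
y_toy = [0] * 13 + [1] * 7

scores = cross_val_accuracy(X_toy, y_toy, k=5, fit=fit_majority, predict=predict_majority)
print(f"Accuracy: {statistics.mean(scores)*100:.2f} %")
# pstdev is the population standard deviation, matching NumPy's default .std().
print(f"Standard Deviation: {statistics.pstdev(scores)*100:.2f} %")
```

Because every sample is held out exactly once, the mean over folds is usually a more stable estimate than the accuracy from a single train/test split, and the standard deviation indicates how much that estimate varies from fold to fold.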