# CatBoost - Categorical Boosting

CatBoost is a gradient boosting library designed to handle categorical features directly in supervised learning on tabular data. Like XGBoost, it supports both regression and classification. CatBoost encodes categorical variables with ordered target statistics, which limits target leakage, and builds symmetric (oblivious) trees, in which every node at a given depth uses the same split. The symmetric structure keeps prediction fast and acts as a form of regularization.
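
The categorical encoding mentioned above is based on ordered target statistics: each row's category is replaced by a smoothed average of the target, computed only from rows that come earlier in a random permutation, so a row never sees its own label. The snippet below is a simplified, pure-Python sketch of that idea, not CatBoost's actual implementation. The function name `ordered_target_statistics`, the smoothing weight `a`, and the toy `colors`/`labels` data are assumptions made for illustration.

```python
import random


def ordered_target_statistics(categories, targets, prior, a=1.0, seed=0):
    """Encode one categorical column using only 'earlier' rows of a random permutation.

    Illustrative sketch: encoded = (sum_of_earlier_targets + a * prior) / (earlier_count + a)
    """
    order = list(range(len(categories)))
    random.Random(seed).shuffle(order)  # random visiting order of the rows

    sums = {}    # running sum of targets seen so far, per category
    counts = {}  # running number of rows seen so far, per category
    encoded = [0.0] * len(categories)

    for idx in order:
        cat = categories[idx]
        s = sums.get(cat, 0.0)
        n = counts.get(cat, 0)
        # Use only the history *before* this row, so its own label is not leaked.
        encoded[idx] = (s + a * prior) / (n + a)
        # Only now add the current row to the running statistics.
        sums[cat] = s + targets[idx]
        counts[cat] = n + 1

    return encoded


# Toy data (hypothetical, not taken from the dataset used in this notebook).
colors = ["red", "blue", "red", "red", "blue", "green"]
labels = [1, 0, 1, 0, 0, 1]
prior = sum(labels) / len(labels)  # global mean of the target

encoded = ordered_target_statistics(colors, labels, prior)
for color, label, value in zip(colors, labels, encoded):
    print(f"{color:>5}  y={label}  encoded={value:.3f}")
```

A category that has not been seen earlier in the permutation falls back to the prior, which is why the single `green` row is encoded as the global mean. In the library itself, categorical columns are typically declared through the `cat_features` argument rather than encoded by hand.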

## Importing the libraries

In [None]:
!pip install catboost

Collecting catboost
  Downloading catboost-1.2.7-cp39-cp39-win_amd64.whl.metadata (1.2 kB)
Collecting graphviz (from catboost)
  Downloading graphviz-0.20.3-py3-none-any.whl.metadata (12 kB)
Collecting plotly (from catboost)
  Downloading plotly-5.24.1-py3-none-any.whl.metadata (7.3 kB)
Downloading catboost-1.2.7-cp39-cp39-win_amd64.whl (101.8 MB)

In [2]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

## Importing the dataset

In [3]:
dataset = pd.read_csv('../../datasets/breast-cancer.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

## Splitting the dataset into the Training set and Test set

In [4]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

## Training CatBoost on the Training set

In [5]:
from catboost import CatBoostClassifier
classifier = CatBoostClassifier()
classifier.fit(X_train, y_train)

Learning rate set to 0.007956
0:	learn: 0.6778283	total: 146ms	remaining: 2m 25s
1:	learn: 0.6642874	total: 147ms	remaining: 1m 13s
2:	learn: 0.6510578	total: 149ms	remaining: 49.4s
3:	learn: 0.6351685	total: 150ms	remaining: 37.3s
4:	learn: 0.6203906	total: 151ms	remaining: 30.1s
5:	learn: 0.6053561	total: 152ms	remaining: 25.2s
6:	learn: 0.5913363	total: 154ms	remaining: 21.8s
7:	learn: 0.5773888	total: 155ms	remaining: 19.2s
8:	learn: 0.5638394	total: 156ms	remaining: 17.2s
9:	learn: 0.5507421	total: 157ms	remaining: 15.6s
10:	learn: 0.5377201	total: 158ms	remaining: 14.2s
11:	learn: 0.5243873	total: 160ms	remaining: 13.2s
12:	learn: 0.5129034	total: 161ms	remaining: 12.2s
13:	learn: 0.5047204	total: 162ms	remaining: 11.4s
14:	learn: 0.4942404	total: 163ms	remaining: 10.7s
15:	learn: 0.4836253	total: 164ms	remaining: 10.1s
16:	learn: 0.4733355	total: 166ms	remaining: 9.57s
17:	learn: 0.4629416	total: 167ms	remaining: 9.11s
18:	learn: 0.4527778	total: 168ms	remaining: 8.68s
...

<catboost.core.CatBoostClassifier at 0x19e3f643400>

## Making the Confusion Matrix

In [6]:
from sklearn.metrics import confusion_matrix, accuracy_score
y_pred = classifier.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
print(cm)
accuracy_score(y_test, y_pred)

[[84  3]
 [ 0 50]]


0.9781021897810219
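
The accuracy above can be recovered by hand from the confusion matrix, which also gives precision and recall. The sketch below assumes scikit-learn's layout (rows are true labels, columns are predicted labels) and, as an assumption for illustration only, treats the second class as the positive one.

```python
# Confusion matrix printed above: rows = true class, columns = predicted class.
cm = [[84, 3],
      [0, 50]]

# Assumption: the second class (index 1) is treated as "positive".
tn, fp = cm[0]  # true class 0: predicted 0, predicted 1
fn, tp = cm[1]  # true class 1: predicted 0, predicted 1

total = tn + fp + fn + tp
accuracy = (tn + tp) / total   # share of all test rows classified correctly
precision = tp / (tp + fp)     # share of predicted positives that are truly positive
recall = tp / (tp + fn)        # share of true positives that were found

print(f"test rows: {total}")
print(f"accuracy:  {accuracy:.4f}")
print(f"precision: {precision:.4f}")
print(f"recall:    {recall:.4f}")
```

Here the three misclassified rows are all false positives under that assumption, so recall is perfect while precision is slightly lower than accuracy.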

## Applying k-Fold Cross Validation
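
k-fold cross validation splits the training set into k parts, trains on k-1 of them, and scores on the part held out, repeating until every part has served once as the validation fold. The sketch below shows only the index bookkeeping in plain Python. The helper `k_fold_indices` and the sample count of 25 are hypothetical, and the sketch does not shuffle or stratify, whereas `cross_val_score` with an integer `cv` and a classifier uses stratified folds.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) pairs for plain k-fold splitting."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    indices = list(range(n_samples))

    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


# Hypothetical example: 25 samples, 10 folds (matching cv = 10).
folds = list(k_fold_indices(n_samples=25, k=10))
for i, (train, validation) in enumerate(folds):
    print(f"fold {i}: train on {len(train)} rows, validate on rows {validation}")
```

Each model in the loop is fitted from scratch on roughly 90% of the training data, which is why the training log is repeated once per fold.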

In [7]:
from sklearn.model_selection import cross_val_score
accuracies = cross_val_score(estimator = classifier, X = X_train, y = y_train, cv = 10)
print("Accuracy: {:.2f} %".format(accuracies.mean()*100))
print("Standard Deviation: {:.2f} %".format(accuracies.std()*100))

Learning rate set to 0.007604
0:	learn: 0.6772057	total: 1.86ms	remaining: 1.86s
1:	learn: 0.6652633	total: 3.14ms	remaining: 1.56s
2:	learn: 0.6511784	total: 4.33ms	remaining: 1.44s
3:	learn: 0.6360094	total: 5.5ms	remaining: 1.37s
4:	learn: 0.6221218	total: 6.67ms	remaining: 1.33s
5:	learn: 0.6065689	total: 7.9ms	remaining: 1.31s
6:	learn: 0.5931935	total: 11ms	remaining: 1.56s
7:	learn: 0.5783827	total: 12.1ms	remaining: 1.5s
8:	learn: 0.5646398	total: 13.3ms	remaining: 1.47s
9:	learn: 0.5508871	total: 14.5ms	remaining: 1.44s
10:	learn: 0.5390346	total: 15.9ms	remaining: 1.43s
11:	learn: 0.5279564	total: 17.5ms	remaining: 1.44s
12:	learn: 0.5181720	total: 18.7ms	remaining: 1.42s
13:	learn: 0.5055409	total: 19.9ms	remaining: 1.4s
14:	learn: 0.4943123	total: 21.1ms	remaining: 1.39s
15:	learn: 0.4824866	total: 22.4ms	remaining: 1.37s
16:	learn: 0.4729889	total: 23.6ms	remaining: 1.36s
17:	learn: 0.4638986	total: 25.3ms	remaining: 1.38s
18:	learn: 0.4549171	total: 26.3ms	remaining: 1.36