# 🐱 CatBoost Algorithm -- CatBoostRegressor
---
## Notebook prepared by Muhammad Anas


## Definition
CatBoost is a gradient boosting algorithm built on decision trees. It handles categorical features natively, so they do not need manual one-hot or label encoding, and it can be used for both classification and regression tasks.
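
To give a feel for how a categorical column can be turned into numbers without letting a row see its own target, here is a minimal sketch of ordered target statistics, the idea behind CatBoost's categorical encoding. It is a simplified illustration, not CatBoost's actual implementation (which, among other things, works over random permutations of the rows). The function name `ordered_target_stats`, the `prior` and `prior_weight` parameters, and the toy data are assumptions made up for this example.

```python
from collections import defaultdict


def ordered_target_stats(categories, targets, prior, prior_weight=1.0):
    """Encode each row using only the targets of EARLIER rows in the same category.

    Hypothetical helper for illustration; not part of the catboost package.
    """
    sums = defaultdict(float)   # running sum of targets seen so far, per category
    counts = defaultdict(int)   # running count of rows seen so far, per category
    encoded = []
    for cat, y in zip(categories, targets):
        # Smoothed mean of the previously seen targets for this category.
        value = (sums[cat] + prior * prior_weight) / (counts[cat] + prior_weight)
        encoded.append(value)
        # Only now does the current row's target become visible to later rows.
        sums[cat] += y
        counts[cat] += 1
    return encoded


# Toy data (made up): a 'day' column and a 'tip' target.
days = ["Sun", "Sun", "Sat", "Sun", "Sat"]
tips = [1.0, 3.0, 2.0, 5.0, 4.0]
prior = sum(tips) / len(tips)  # global mean used as the prior in this sketch

encoded = ordered_target_stats(days, tips, prior)
print(encoded)
```

The first row of each category falls back to the prior because no earlier rows exist for it; later rows blend the prior with the running category mean.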

### Types of CatBoost
1. CatBoostClassifier: Used for classification tasks.
2. CatBoostRegressor: Used for regression tasks.


In [10]:
# importing the libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from catboost import CatBoostRegressor
# regression metrics (classification-only imports are not needed in this notebook)
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error

In [2]:
#load the dataset
data = sns.load_dataset('tips')
data.head()

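|   | total_bill | tip | sex | smoker | day | time | size |
|---|------------|-----|-----|--------|-----|------|------|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |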


In [3]:
#checking for null values
data.isnull().sum().sort_values(ascending=False)

total_bill    0
tip           0
sex           0
smoker        0
day           0
time          0
size          0
dtype: int64

In [4]:
#checking for the info of the dataset
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 244 entries, 0 to 243
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype   
---  ------      --------------  -----   
 0   total_bill  244 non-null    float64 
 1   tip         244 non-null    float64 
 2   sex         244 non-null    category
 3   smoker      244 non-null    category
 4   day         244 non-null    category
 5   time        244 non-null    category
 6   size        244 non-null    int64   
dtypes: category(4), float64(2), int64(1)
memory usage: 7.4 KB


In [5]:
#splitting the dataset into features and target variable
X = data.drop('tip', axis=1)
y = data['tip']

In [6]:
# columns that CatBoost should treat as categorical
categorical_features = ['sex', 'smoker', 'day', 'time']

In [7]:
#splitting the dataset into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
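
As a quick sanity check on the split sizes, the arithmetic can be sketched as below. This assumes scikit-learn rounds a fractional `test_size` up to a whole number of rows and gives the remainder to the training set; the variable names are only for illustration.

```python
import math

n_samples = 244   # rows in the tips dataset, as reported by data.info()
test_size = 0.2   # fraction passed to train_test_split

# Assumed rounding rule: the test share is rounded up, the rest goes to training.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)
```

With only about 49 test rows, the evaluation metrics further down can move noticeably from one `random_state` to another.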

In [8]:
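# creating the CatBoost regressor model
# cat_features is set once here, so it does not need to be repeated in fit()
model = CatBoostRegressor(iterations=1000, learning_rate=0.1, depth=6, cat_features=categorical_features)
# training the model
# note: the test set doubles as the evaluation set here, so the final model is cut back
# to the iteration that scored best on this same data and the test metrics below are
# likely optimistic
model.fit(
    X_train,
    y_train,
    eval_set=(X_test, y_test),
    verbose=100
)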


0:	learn: 1.3718734	test: 1.1621797	best: 1.1621797 (0)	total: 152ms	remaining: 2m 31s
100:	learn: 0.6711956	test: 0.8861799	best: 0.8489167 (37)	total: 1.43s	remaining: 12.7s
200:	learn: 0.5154613	test: 0.8552099	best: 0.8489167 (37)	total: 2.63s	remaining: 10.5s
300:	learn: 0.4151466	test: 0.8518470	best: 0.8487932 (294)	total: 3.83s	remaining: 8.88s
400:	learn: 0.3441838	test: 0.8422055	best: 0.8402592 (398)	total: 5.09s	remaining: 7.6s
500:	learn: 0.2970462	test: 0.8309149	best: 0.8299185 (491)	total: 6.36s	remaining: 6.34s
600:	learn: 0.2589017	test: 0.8230832	best: 0.8230832 (600)	total: 7.54s	remaining: 5s
700:	learn: 0.2245352	test: 0.8167887	best: 0.8167887 (700)	total: 8.71s	remaining: 3.72s
800:	learn: 0.1993923	test: 0.8148286	best: 0.8135134 (784)	total: 9.94s	remaining: 2.47s
900:	learn: 0.1791132	test: 0.8116223	best: 0.8116223 (900)	total: 11.2s	remaining: 1.23s
999:	learn: 0.1615269	test: 0.8112914	best: 0.8104056 (910)	total: 12.4s	remaining: 0us

bestTest = 0.8104055

<catboost.core.CatBoostRegressor at 0x1abadd00610>
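
The training log above is easier to reason about once it is in numeric form. The sketch below parses a few of the logged lines (copied from the output above, with tabs replaced by spaces) and prints the gap between the test and learn RMSE. The regular expression and the `parse_log` helper are assumptions written for this notebook's log format, not part of the CatBoost API.

```python
import re

# Matches lines such as:
# "100: learn: 0.6711956 test: 0.8861799 best: 0.8489167 (37) total: ..."
LOG_LINE = re.compile(
    r"^(\d+):\s+learn:\s+([\d.]+)\s+test:\s+([\d.]+)\s+best:\s+([\d.]+)\s+\((\d+)\)"
)


def parse_log(text):
    """Return one dict per recognised log line (hypothetical helper)."""
    rows = []
    for line in text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match:
            iteration, learn, test, best, best_iteration = match.groups()
            rows.append({
                "iteration": int(iteration),
                "learn": float(learn),
                "test": float(test),
                "best": float(best),
                "best_iteration": int(best_iteration),
            })
    return rows


# Three lines taken from the training output above.
log = """
100: learn: 0.6711956 test: 0.8861799 best: 0.8489167 (37)
500: learn: 0.2970462 test: 0.8309149 best: 0.8299185 (491)
999: learn: 0.1615269 test: 0.8112914 best: 0.8104056 (910)
"""

rows = parse_log(log)
for row in rows:
    gap = row["test"] - row["learn"]
    print(f'iteration {row["iteration"]:>3}: test - learn = {gap:.4f}')
```

Between iteration 100 and 999 the learn RMSE drops from about 0.67 to 0.16 while the test RMSE only moves from about 0.89 to 0.81, so the gap widens from roughly 0.21 to 0.65. That pattern suggests most of the later iterations fit the training data more closely without improving generalisation by much.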

In [9]:
#making predictions
y_pred = model.predict(X_test)

In [13]:
# evaluating the model
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f'Mean Squared Error: {mse}')
print(f'Root Mean Squared Error: {rmse}')
print(f'Mean Absolute Error: {mae}')
print(f'R^2 Score: {r2}')


Mean Squared Error: 0.6567572105825409
Root Mean Squared Error: 0.8104055840025665
Mean Absolute Error: 0.6437565020842931
R^2 Score: 0.4745821869706899
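
To make the four metrics above less of a black box, here is a minimal from-scratch sketch of how they can be computed. The `regression_metrics` helper and the toy values are assumptions made up for illustration; the notebook itself uses the scikit-learn functions.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, MAE and R^2 with plain Python (illustrative helper)."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mse = sum(e * e for e in errors) / n      # mean of squared errors
    rmse = math.sqrt(mse)                     # back in the units of the target
    mae = sum(abs(e) for e in errors) / n     # mean of absolute errors

    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                # share of variance explained

    return {"mse": mse, "rmse": rmse, "mae": mae, "r2": r2}


# Toy values (made up), not taken from the tips dataset.
y_true_toy = [1.0, 2.0, 3.0, 4.0]
y_pred_toy = [1.5, 2.0, 2.5, 4.0]

metrics = regression_metrics(y_true_toy, y_pred_toy)
print(metrics)
```

Read against the notebook's own output, the RMSE of about 0.81 is the square root of the MSE of about 0.657, and an R^2 of about 0.47 means the model explains a little under half of the variance in `tip` on this particular test split.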
