CatBoost
CatBoost is an open-source gradient boosting library developed by Yandex, a Russian multinational technology company. It is designed to handle categorical features natively and to deliver fast, scalable, and accurate predictions.
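
Since CatBoost is built on gradient boosting, it can help to see the bare idea first. The following is a minimal from-scratch sketch of plain gradient boosting for squared-error loss, using scikit-learn's DecisionTreeRegressor as the weak learner. It is not CatBoost's implementation, which adds symmetric trees, ordered boosting and many other refinements. The helper names `fit_gradient_boosting` and `predict_gradient_boosting` are hypothetical and exist only to show how each new tree is fitted to the negative gradient of the loss.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_gradient_boosting(X, y, n_rounds=50, learning_rate=0.1, max_depth=3):
    """Fit a toy gradient-boosted ensemble for squared-error loss (illustration only)."""
    base = float(np.mean(y))            # start from a constant prediction
    pred = np.full(len(y), base)
    trees = []
    for _ in range(n_rounds):
        # For L = 0.5 * (y - F)^2 the negative gradient w.r.t. F is the residual y - F.
        residuals = y - pred
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)
        pred += learning_rate * tree.predict(X)  # take a shrunken step toward the residuals
        trees.append(tree)
    return {"base": base, "learning_rate": learning_rate, "trees": trees}


def predict_gradient_boosting(model, X):
    """Add every tree's shrunken contribution to the constant start value."""
    out = np.full(len(X), model["base"])
    for tree in model["trees"]:
        out += model["learning_rate"] * tree.predict(X)
    return out


# Usage on a synthetic noisy sine curve.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

model = fit_gradient_boosting(X, y)
mse_base = np.mean((y - y.mean()) ** 2)
mse_boost = np.mean((y - predict_gradient_boosting(model, X)) ** 2)
print(f"constant-model MSE: {mse_base:.3f}, boosted MSE: {mse_boost:.3f}")
```

The learning rate shrinks each tree's contribution, so many small corrections add up instead of one tree over-fitting the residuals.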

Key Features of CatBoost:
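1. Gradient Boosting: CatBoost builds an ensemble of decision trees one at a time, fitting each new tree to the gradient of the loss function with respect to the current predictions. By default these trees are symmetric (oblivious), meaning every node at a given depth uses the same split.
2. Categorical Feature Support: CatBoost accepts categorical features directly, so they do not need to be one-hot or label encoded beforehand.
3. Efficient Handling of Categorical Features: CatBoost converts categories to numbers with ordered target statistics, where each row is encoded using only the target values of rows that precede it in a random permutation of the training data. Together with ordered boosting, this reduces target leakage and the prediction shift it causes.
4. GPU Support: CatBoost can train on GPUs, which speeds up training on large datasets.
5. Python and R Support: CatBoost provides Python and R packages (as well as a command-line tool), making it accessible to a wide range of users.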
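
The ordered target statistics idea from item 3 can be sketched for a single categorical column and a single random permutation. This is a simplified illustration, not CatBoost's code: CatBoost uses several permutations and also combines categorical features. The function name and the `prior` and `prior_weight` parameters are hypothetical stand-ins for the prior and its weight in the encoding formula (sum of earlier targets + weight × prior) / (count of earlier rows + weight).

```python
import numpy as np


def ordered_target_statistics(categories, targets, prior=0.5, prior_weight=1.0, seed=0):
    """Encode one categorical column with ordered target statistics (simplified sketch).

    Rows are visited in a random order; each row is encoded using only the
    targets of earlier rows with the same category, so its own label never
    leaks into its encoding.
    """
    categories = np.asarray(categories)
    targets = np.asarray(targets, dtype=float)
    order = np.random.default_rng(seed).permutation(len(categories))
    sums, counts = {}, {}
    encoded = np.empty(len(categories))
    for idx in order:
        cat = categories[idx]
        s = sums.get(cat, 0.0)
        c = counts.get(cat, 0)
        encoded[idx] = (s + prior_weight * prior) / (c + prior_weight)
        # Update the running statistics only after encoding the current row.
        sums[cat] = s + targets[idx]
        counts[cat] = c + 1
    return encoded


cats = ["a", "b", "a", "a", "b"]
ys = [1, 0, 1, 1, 1]
print(ordered_target_statistics(cats, ys))
```

The first row of each category (in permutation order) always receives the prior, and later rows drift toward that category's observed target rate.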

Advantages:
1. Handling Categorical Features: CatBoost's native support for categorical features makes it ideal for datasets with many categorical variables.
2. High Accuracy: CatBoost often reaches strong accuracy with its default parameters, which reduces the amount of hyperparameter tuning required.
3. Efficient: CatBoost is designed to be efficient and scalable, making it suitable for large datasets.
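
One design choice often cited for CatBoost's efficiency is that, by default, it grows symmetric (oblivious) trees: every node at the same depth applies the same feature/threshold test. A tree of depth d therefore reduces to d comparisons whose results form a d-bit leaf index. The sketch below evaluates such a tree with NumPy; the function name and the bit-ordering convention (depth k sets bit k) are assumptions for illustration and need not match CatBoost's internal layout.

```python
import numpy as np


def oblivious_tree_predict(X, features, thresholds, leaf_values):
    """Evaluate a symmetric (oblivious) decision tree on a batch of rows.

    features[k] and thresholds[k] define the single split used at depth k;
    leaf_values must contain 2 ** depth entries.
    """
    X = np.asarray(X, dtype=float)
    depth = len(features)
    if len(leaf_values) != 2 ** depth:
        raise ValueError("leaf_values must have 2 ** depth entries")
    leaf_index = np.zeros(len(X), dtype=int)
    for k, (f, t) in enumerate(zip(features, thresholds)):
        # Each depth contributes one bit: 1 if the row goes to the "greater than" side.
        leaf_index |= (X[:, f] > t).astype(int) << k
    return np.asarray(leaf_values)[leaf_index]


rows = [[0, 0], [1, 0], [0, 3], [1, 3]]
print(oblivious_tree_predict(rows, features=[0, 1], thresholds=[0.5, 2.0],
                             leaf_values=[10, 20, 30, 40]))  # → [10 20 30 40]
```

Because every row goes through the same sequence of tests, the whole batch is scored with a handful of vectorized comparisons instead of per-row branching.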

Applications:
1. Classification: CatBoost is widely used for classification on tabular data, such as fraud detection, customer churn prediction, and medical diagnosis.
2. Regression: CatBoost is used for regression tasks, such as predicting continuous outcomes like house prices or stock prices.
3. Ranking: CatBoost supports learning-to-rank tasks, such as ordering product recommendations or web search results.

CatBoost is a powerful library that's widely used in machine learning applications due to its ability to handle categorical features and provide accurate predictions.

from catboost import CatBoostClassifier
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    confusion_matrix, classification_report, mean_squared_error, r2_score,
)
from sklearn.model_selection import train_test_split

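# Load the Titanic dataset bundled with seaborn (fetched from the seaborn-data repository on first use)
data = sns.load_dataset("titanic")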

# Replace missing ages with the median age
data["age"] = data["age"].fillna(data["age"].median())

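# Label-encode every text or categorical column; astype(str) turns missing values into the string "nan"
for col in data.columns:
    if data[col].dtype == "object" or data[col].dtype.name == "category":
        le = LabelEncoder()
        data[col] = le.fit_transform(data[col].astype(str))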

# Features and target
X = data.drop(columns=["alive", "deck", "survived"])
y = data["survived"]

# Split (80% train, 20% test)
train_X, test_X, train_Y, test_Y = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Train CatBoost (defaults: 1,000 iterations; the learning rate is chosen automatically)
model = CatBoostClassifier()
model.fit(train_X, train_Y)

# Predict
y_pred = model.predict(test_X)

# Evaluate
print("Accuracy Score:", accuracy_score(test_Y, y_pred))
print("\nClassification Report:\n", classification_report(test_Y, y_pred))

Learning rate set to 0.008911
0:	learn: 0.6870374	total: 1.96ms	remaining: 1.96s
1:	learn: 0.6802282	total: 2.64ms	remaining: 1.32s
2:	learn: 0.6738666	total: 4.04ms	remaining: 1.34s
3:	learn: 0.6677662	total: 4.92ms	remaining: 1.23s
4:	learn: 0.6626316	total: 5.92ms	remaining: 1.18s
5:	learn: 0.6567517	total: 6.68ms	remaining: 1.11s
6:	learn: 0.6512725	total: 7.91ms	remaining: 1.12s
7:	learn: 0.6456234	total: 9.02ms	remaining: 1.12s
8:	learn: 0.6402984	total: 9.8ms	remaining: 1.08s
9:	learn: 0.6351999	total: 10.8ms	remaining: 1.07s
10:	learn: 0.6301893	total: 11.7ms	remaining: 1.05s
11:	learn: 0.6253030	total: 12.6ms	remaining: 1.03s
12:	learn: 0.6199236	total: 13.4ms	remaining: 1.02s
13:	learn: 0.6156303	total: 14.1ms	remaining: 990ms
14:	learn: 0.6119098	total: 14.5ms	remaining: 953ms
15:	learn: 0.6073336	total: 15.5ms	remaining: 954ms
16:	learn: 0.6028412	total: 16.8ms	remaining: 970ms
17:	learn: 0.5985806	total: 18ms	remaining: 981ms
18:	learn: 0.5944309	total: 18.9ms	remaining: 9