# Come, let's dive together!
<img src='https://avatars3.githubusercontent.com/u/29043415?s=300&v=3' style="float:right; width:10%"></img>

> **I guess a lot of us use CatBoost, but how many of us actually understand what's going on behind the scenes?** 
> In this kernel we'll be exploring how CatBoost works and some key insights that you may find helpful in this competition.

## References and Credits:
This notebook wouldn't have been possible without these amazing resources. Some of the text and most of the figures used in this notebook are taken from the resources below and combined into one place.
1. [CatBoost Demystified - Sharanyu Rane](https://towardsdatascience.com/catboost-demystified-8b0b538bfa31)
2. [Deep Dive into Catboost Functionalities for Model Interpretation - Alwira Swain](https://towardsdatascience.com/deep-dive-into-catboost-functionalities-for-model-interpretation-7cdef669aeed)
3. [CatBoost Team's Tutorials](https://github.com/catboost/tutorials)
4. [Working with categorical data: Catboost - Katerina](https://medium.com/whats-your-data/working-with-categorical-data-catboost-8b5e11267a37)
5. [Machine Learning with CatBoost - Jitao David Zhang](https://accio.github.io/machinelearning/2018/05/30/catboost.html)
6. [What's so special about CatBoost - Hanish Sai Rohit Pallapothu](https://medium.com/@hanishsidhu/whats-so-special-about-catboost-335d64d754ae)
7. [CatBoost: A machine learning library to handle categorical (CAT) data automatically - Sunil Ray](https://www.analyticsvidhya.com/blog/2017/08/catboost-automated-categorical-data/)

## Contents
<a href="#Introduction:">1. Introduction</a>  
<a href="#What's-cool?">2. What's Cool?</a>  
<a href="#Why-CatBoost-matters?">3. Why CatBoost matters?</a>
<br>
<a href="#Implementing-CatBoost-for-DS-Bowl🥣">4. Implementing CatBoost for DS Bowl 🥣</a> <br>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="#Data-Prep">4.1 Data Prep</a>   
&nbsp;&nbsp;&nbsp;&nbsp;<a href="#The-CatBoost-Algorithm">4.2 The CatBoost Algorithm</a>   
&nbsp;&nbsp;&nbsp;&nbsp;<a href="#Categorical-Feature-Handling">4.3 Categorical Feature Handling</a> <br>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="#Secret-of-CatBoost">4.4 Secrets of CatBoost</a> <br>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="#Creating-our-Model-Class:">4.5 Creating Our Model</a> <br>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="#Tuning-CatBoost">4.6 Tuning CatBoost</a> <br>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="#Model-Analysis">4.7 Model Analysis</a> <br>
<a href="#Key-Takeaways">5. Key Takeaways</a>


<font size=5 color='red'>Please give this kernel an UPVOTE to show your appreciation, if you find it useful.</font>

## Introduction:

In the past few years, anyone who has actively learned or applied Machine Learning, on Kaggle or in real life, has almost certainly come across Gradient Boosted Machines; that's how popular, useful and efficient they are compared to other techniques. In fact, Gradient Boosted Decision Trees and Random Forests are my favorite ML models for heterogeneous tabular datasets. These models are top performers in Kaggle competitions and are widely used in industry.

So what is CatBoost, you may ask? The name comes from "**Cat**egory" and "**Boost**ing", which is why it is often described as **Categorical Gradient Boosting**. (In case you were wondering: there are no cats in ML, at least not yet.)

> **CatBoost is a relatively new gradient boosting library developed by Yandex that, according to Yandex's own benchmarks, outperforms many existing boosting libraries such as XGBoost and LightGBM.**

<img src="https://miro.medium.com/max/1200/1*2p1GIUUcRSzyyJjSj4x7Iw.jpeg" style="width:60%"></img>


## What's cool?
While deep learning algorithms require lots of data and computational power, boosting algorithms are still the go-to choice for most business problems. However, boosting libraries like XGBoost can take hours to train, and tuning their hyperparameters can get frustrating.

On the other hand, CatBoost is **easy to implement and very powerful**, and it often gives impressive results on its very first run.

> One main difference between CatBoost and other boosting algorithms is that CatBoost builds **symmetric (oblivious) trees**, in which every node at the same depth uses the same split condition. This may sound crazy, but it helps **decrease prediction time**, which is extremely important in low-latency environments.
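
To make the idea concrete, here is a minimal NumPy sketch of how a single symmetric tree could make predictions. The function name, the bit ordering of the leaf index and the toy splits are illustrative assumptions, not CatBoost's internal code. The point it shows: a depth-d symmetric tree is just d (feature, threshold) pairs plus 2^d leaf values, so prediction reduces to d vectorized comparisons instead of walking a different path per row.

```python
import numpy as np

def oblivious_tree_predict(X, split_features, split_thresholds, leaf_values):
    """Predict with one symmetric (oblivious) tree.

    Level k applies the SAME test X[:, split_features[k]] > split_thresholds[k]
    to every row, so the d test results form a d-bit leaf index.
    (Illustrative sketch; CatBoost's internal layout may differ.)
    """
    X = np.asarray(X, dtype=float)
    leaf_idx = np.zeros(len(X), dtype=np.int64)
    for level, (feature, threshold) in enumerate(zip(split_features, split_thresholds)):
        bit = (X[:, feature] > threshold).astype(np.int64)
        leaf_idx |= bit << level  # set bit `level` of the leaf index
    return np.asarray(leaf_values)[leaf_idx]

# Hypothetical depth-2 tree: level 0 splits feature 0 at 0.5, level 1 splits feature 1 at 3.0
X = np.array([[0.2, 5.0],
              [0.9, 1.0],
              [0.9, 7.0]])
preds = oblivious_tree_predict(X, split_features=[0, 1], split_thresholds=[0.5, 3.0],
                               leaf_values=[10.0, 20.0, 30.0, 40.0])
print(preds)  # [30. 20. 40.]
```

Because every row goes through exactly the same sequence of tests, this layout is easy to vectorize, which is one intuition for why symmetric trees predict quickly.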


## Why CatBoost matters?

**1. Improved Results**

CatBoost achieves the best results on the benchmark below, and that's great (although Yandex, which conducted the benchmark, has a clear interest in CatBoost looking good 😅). That said, on datasets where categorical features play a large role, such as the Amazon and Internet datasets, the improvement is significant.

<img src="https://miro.medium.com/max/1818/1*vsg1IUlGtzCoNuGo9XqGwg.png" style="width:70%"></img>

**2. FAST**

Training can take longer than with other GBDT implementations, but according to the Yandex benchmark, prediction is 13–16 times faster than with the other libraries.

<img src="https://miro.medium.com/max/1664/1*BE8PZe54DMWe6gFdHlYsxg.png" style="width:70%"></img>

**3. Tried and Tested:**

Yandex relies heavily on CatBoost for ranking, forecasting and recommendations, serving more than 70 million users each month.

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/db/Yandex_Logo.svg/1200px-Yandex_Logo.svg.png" style="width:30%"></img>


# Implementing CatBoost for DS Bowl🥣
Since this is a practical guide, we'll dive right into the competition data and implement CatBoost, with a few tips along the way and a detailed explanation of how CatBoost works.

If you want a better understanding of this competition's data, please take a look at some of these great kernels for EDA and introductions.

1. https://www.kaggle.com/erikbruin/data-science-bowl-2019-eda-and-baseline

2. https://www.kaggle.com/robikscube/2019-data-science-bowl-an-introduction

3. https://www.kaggle.com/gpreda/2019-data-science-bowl-eda



## Data Prep

In [None]:
# Main Libs
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Utility libs
from tqdm import tqdm
import time
import datetime
from skopt import gp_minimize
from skopt.space import Real, Integer
from skopt.utils import use_named_args
from skopt.plots import plot_convergence
from copy import deepcopy
import pprint
import shap
import os

# Kaggle runtimes come with CatBoost preinstalled;
# on a local machine you may need to run `!pip install catboost` first
import catboost

from pathlib import Path
data_dir = Path('../input/data-science-bowl-2019')
os.listdir(data_dir)

In [None]:
%%time
train = pd.read_csv(data_dir / 'train.csv')
labels = pd.read_csv(data_dir / 'train_labels.csv')
test = pd.read_csv(data_dir / 'test.csv')
specs = pd.read_csv(data_dir / 'specs.csv')
sample_submission = pd.read_csv(data_dir / 'sample_submission.csv')

**What our data looks like**

In [None]:
train.head()

In [None]:
labels.head()

In [None]:
sample_submission.head()

In [None]:
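# Map every activity title seen in train or test to an integer id.
# Sorting makes the mapping reproducible (plain set order can change between runs).
list_of_user_activities = sorted(set(train['title'].unique()) | set(test['title'].unique()))
activities_map = dict(zip(list_of_user_activities, np.arange(len(list_of_user_activities))))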

train['title'] = train['title'].map(activities_map)
test['title'] = test['title'].map(activities_map)
labels['title'] = labels['title'].map(activities_map)

In [None]:
# Event code of an assessment "submission": 4110 for Bird Measurer, 4100 for every other title
win_code = {code: 4100 for code in activities_map.values()}
win_code[activities_map['Bird Measurer (Assessment)']] = 4110

train['timestamp'] = pd.to_datetime(train['timestamp'])
test['timestamp'] = pd.to_datetime(test['timestamp'])

In [None]:
# Thanks for this beautiful function https://www.kaggle.com/mhviraf/a-new-baseline-for-dsb-2019-catboost-model 
def get_data(user_sample, test_set=False):
    """Build one feature row per assessment for a single installation_id.

    Features only use sessions that happened *before* the current assessment.
    For the test set, only the last assessment (the one to predict) is returned.
    """
    last_activity = 0
    user_activities_count = {'Clip': 0, 'Activity': 0, 'Assessment': 0, 'Game': 0}
    accuracy_groups = {0: 0, 1: 0, 2: 0, 3: 0}
    all_assessments = []
    accumulated_accuracy_group = 0
    accumulated_accuracy = 0
    accumulated_correct_attempts = 0
    accumulated_uncorrect_attempts = 0
    accumulated_actions = 0
    counter = 0
    durations = []
    for _, session in user_sample.groupby('game_session', sort=False):
        session_type = session['type'].iloc[0]
        session_title = session['title'].iloc[0]
        # In train, skip assessment sessions with a single event (started, never played)
        second_condition = test_set or len(session) > 1

        if session_type == 'Assessment' and second_condition:
            # Submission events: 4110 for Bird Measurer, 4100 for the other assessments
            all_attempts = session.query(f'event_code == {win_code[session_title]}')
            true_attempts = all_attempts['event_data'].str.contains('true').sum()
            false_attempts = all_attempts['event_data'].str.contains('false').sum()

            # Features describe the user's history before this assessment
            features = user_activities_count.copy()
            features['session_title'] = session_title
            features['accumulated_correct_attempts'] = accumulated_correct_attempts
            features['accumulated_uncorrect_attempts'] = accumulated_uncorrect_attempts
            accumulated_correct_attempts += true_attempts
            accumulated_uncorrect_attempts += false_attempts
            features['duration_mean'] = np.mean(durations) if durations else 0
            # total_seconds() keeps the day component that .seconds would drop
            durations.append((session['timestamp'].iloc[-1] - session['timestamp'].iloc[0]).total_seconds())
            features['accumulated_accuracy'] = accumulated_accuracy / counter if counter > 0 else 0
            n_attempts = true_attempts + false_attempts
            accuracy = true_attempts / n_attempts if n_attempts != 0 else 0
            accumulated_accuracy += accuracy

            # Target: 3 = solved on 1st attempt, 2 = on 2nd, 1 = after 3+ attempts, 0 = never
            if accuracy == 0:
                features['accuracy_group'] = 0
            elif accuracy == 1:
                features['accuracy_group'] = 3
            elif accuracy == 0.5:
                features['accuracy_group'] = 2
            else:
                features['accuracy_group'] = 1

            features.update(accuracy_groups)
            features['accumulated_accuracy_group'] = accumulated_accuracy_group / counter if counter > 0 else 0
            features['accumulated_actions'] = accumulated_actions
            accumulated_accuracy_group += features['accuracy_group']
            accuracy_groups[features['accuracy_group']] += 1

            # In train, keep only assessments with at least one submission attempt
            if test_set or n_attempts > 0:
                all_assessments.append(features)
            counter += 1

        accumulated_actions += len(session)
        # Count transitions between activity types
        if last_activity != session_type:
            user_activities_count[session_type] += 1
            last_activity = session_type  # was misspelled `last_activitiy`, so it never updated
    if test_set:
        return all_assessments[-1]
    return all_assessments

In [None]:
compiled_data = []
grouped_train = train.groupby('installation_id', sort=False)
for ins_id, user_sample in tqdm(grouped_train, total=grouped_train.ngroups):
    compiled_data += get_data(user_sample)

In [None]:
new_train = pd.DataFrame(compiled_data)
del compiled_data
print("Train Data Shape:")
new_train.shape

In [None]:
import gc
gc.collect()

In [None]:
all_features = [x for x in new_train.columns if x not in ['accuracy_group']]
cat_features = ['session_title']
X, y = new_train[all_features], new_train['accuracy_group']
del train

## The CatBoost Algorithm

CatBoost does gradient boosting in a very elegant manner. Below is an explanation of CatBoost using a toy example.

Let’s say we have 10 data points in our dataset, ordered in time as shown below.

<img src="https://miro.medium.com/max/303/1*K-2XayuU9Y4OklIlDWg1AQ.png"></img>

> If the data has no natural time order, CatBoost creates an artificial one by randomly permuting the data points.

* **Step 1:** Calculate the residual of each data point using a model that has been trained only on the data points that come before it in time (for example, to calculate the residual of x5, we train a model on x1, x2, x3 and x4). Hence we train a different model for each data point, and every residual is computed by a model that has never seen that data point.
* **Step 2:** Train the model using the residuals of each data point.
* **Step 3:** Repeat Steps 1 and 2 (for n iterations).

For the toy dataset above, we would have to train 9 different models to get residuals for 9 data points, which becomes computationally expensive as the number of data points grows.
Hence, by default, instead of training a different model for each data point, CatBoost trains only about log2(num_of_datapoints) models: a model that has been trained on n data points is used to calculate the residuals of the next n data points.

* A model that has been trained on the first data point is used to calculate the residual of the second data point.
* Another model that has been trained on the first two data points is used to calculate the residuals of the third and fourth data points.

In the toy dataset above, we now calculate the residuals of x5, x6, x7 and x8 using a model that has been trained on x1, x2, x3 and x4.

The procedure explained so far is known as **ordered boosting**.
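
The doubling scheme above can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not CatBoost's implementation: the "model" is a hypothetical stand-in that predicts the mean of its training targets, and residuals are plain differences between target and prediction. It does show the key property: every residual comes from a model fitted only on earlier points, and only about log2(n) models are trained.

```python
import numpy as np

def ordered_residuals(y, fit, predict):
    """Residuals where each point is scored by a model that never saw it.

    Doubling scheme (simplified): a model fit on the first n points scores
    points n+1 .. 2n, so only about log2(len(y)) models are trained.
    The first point has no history and gets no residual (NaN).
    """
    y = np.asarray(y, dtype=float)
    residuals = np.full(len(y), np.nan)
    n = 1
    while n < len(y):
        model = fit(y[:n])                    # trained on the first n points only
        block = slice(n, min(2 * n, len(y)))  # the next (up to) n points
        residuals[block] = y[block] - predict(model, block.stop - block.start)
        n *= 2
    return residuals

# Hypothetical stand-in "model": predict the mean of the training targets
def fit_mean(y_prefix):
    return y_prefix.mean()

def predict_mean(model, k):
    return np.full(k, model)

y = [3, 5, 4, 6, 8, 7, 9, 10, 12, 11]  # toy targets for x1 .. x10
res = ordered_residuals(y, fit_mean, predict_mean)
print(res)  # x5..x8 are all scored by the model fit on x1..x4
```

In real gradient boosting the "model" is the ensemble built so far and the residuals are gradients of the loss, but the bookkeeping of who-was-trained-on-what is the same.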

**Random Permutations:**

CatBoost actually generates several random permutations of the dataset and applies ordered boosting to each of them. By default, CatBoost creates four random permutations. This randomness further helps prevent overfitting. Additional randomness can be controlled with the `bagging_temperature` parameter, which sets the intensity of the Bayesian bootstrap used to weight the training examples; this kind of bagging is something you have already seen in other boosting algorithms.

## Categorical Feature Handling

### Ordered Target Statistic:

Most GBDT practitioners and Kaggle competitors are already familiar with the Target Statistic (TS), also known as target mean encoding.

> It’s a simple yet effective approach in which we encode each categorical feature with the estimate of the expected target y conditioned by the category.

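Well, it turns out that applying this encoding carelessly (the average value of y over the training examples with the same category) results in target leakage: each example's own label is part of its encoding. In the extreme case of a category that appears only once, the encoding simply equals the label.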

To fight the resulting prediction shift, CatBoost uses a more effective strategy. It relies on the ordering principle and is inspired by online learning algorithms, which receive training examples sequentially in time. In this setting, the TS value of each example relies only on the observed history.

To adapt this idea to a standard offline setting, CatBoost introduces an artificial “time”: a random permutation σ1 of the training examples.

Then, for each example, it uses all the available “history” to compute its Target Statistic.
Note that with only one random permutation, earlier examples get Target Statistics with higher variance than later ones, because they are computed from less history. To address this, CatBoost uses different permutations for different steps of gradient boosting.
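
The ordering principle can be illustrated with a small NumPy sketch. The function names are hypothetical, and the smoothing formula (a prior p weighted by a) is assumed from the ordered-TS formulation in the CatBoost paper; CatBoost's real encoder has its own configurable priors and details. The greedy version reproduces the leakage described above: on an ID-like column where every category is unique, it copies the label straight into the feature.

```python
import numpy as np

def greedy_target_statistic(categories, y):
    """Naive target mean encoding: each row sees ALL labels of its category, including its own."""
    categories, y = np.asarray(categories), np.asarray(y, dtype=float)
    return np.array([y[categories == c].mean() for c in categories])

def ordered_target_statistic(categories, y, prior, a=1.0, permutation=None, seed=0):
    """Ordered TS: row i only sees rows that come before it in a permutation.

        TS_i = (sum of y over earlier rows with the same category + a * prior)
               / (number of such earlier rows + a)
    """
    categories, y = np.asarray(categories), np.asarray(y, dtype=float)
    if permutation is None:
        # Random permutation plays the role of the artificial "time"
        permutation = np.random.default_rng(seed).permutation(len(y))
    sums, counts = {}, {}
    encoded = np.empty(len(y))
    for i in permutation:
        c = categories[i]
        s, n = sums.get(c, 0.0), counts.get(c, 0)
        encoded[i] = (s + a * prior) / (n + a)  # history only, never y[i] itself
        sums[c], counts[c] = s + y[i], n + 1
    return encoded

# Leakage demo on a hypothetical ID-like column: every category is unique
ids = ['u1', 'u2', 'u3', 'u4']
labels = [1, 0, 1, 1]
print(greedy_target_statistic(ids, labels))                # copies the labels exactly
print(ordered_target_statistic(ids, labels, prior=0.75))   # no history, so every row gets the prior
```

Here `prior=0.75` is simply the global label mean of the toy data; passing `permutation=None` draws a random order, mirroring the artificial time σ1.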

### One-Hot Encoding:

* By default, CatBoost one-hot encodes a categorical feature only if it has at most two distinct categories; other categorical features are encoded with Target Statistics.

* If you would like to one-hot encode categorical features that have up to N different categories, set the parameter `one_hot_max_size = N`. The setting applies to every categorical feature with at most N distinct values.
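
The rule can be written down as a tiny pure-Python sketch. The helper `encoding_plan` and the toy columns are hypothetical; they only mirror the `one_hot_max_size` behaviour described above (one-hot for at most that many distinct values, Target Statistics otherwise) and do not call CatBoost.

```python
def encoding_plan(columns, cat_features, one_hot_max_size=2):
    """Decide, per categorical column, one-hot vs. target-statistic encoding.

    A categorical feature with at most `one_hot_max_size` distinct values is
    one-hot encoded; any other is encoded with (ordered) target statistics.
    """
    plan = {}
    for name in cat_features:
        n_unique = len(set(columns[name]))
        plan[name] = 'one-hot' if n_unique <= one_hot_max_size else 'target statistic'
    return plan

# Hypothetical toy columns
columns = {
    'is_weekend': [0, 1, 1, 0],
    'session_title': ['Bird Measurer', 'Cart Balancer', 'Mushroom Sorter', 'Chest Sorter'],
}
print(encoding_plan(columns, ['is_weekend', 'session_title']))
print(encoding_plan(columns, ['is_weekend', 'session_title'], one_hot_max_size=4))
```

With the default of 2 only the binary column is one-hot encoded; raising the limit to 4 one-hot encodes both.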

### Handling Numerical Features

CatBoost handles numerical features the same way other tree algorithms do: it evaluates candidate split thresholds and selects the one with the best split score (for example, the highest information gain).
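
A generic version of this split search for one numerical feature is sketched below under simplifying assumptions. Squared-error reduction stands in for the split score, and quantile borders stand in for CatBoost's own border selection (CatBoost quantizes numerical features into a limited number of borders, controlled by `border_count`). It is not CatBoost's exact scoring function; the function name and data are hypothetical.

```python
import numpy as np

def best_split(x, y, borders):
    """Pick the border on one numerical feature that most reduces squared error.

    Each candidate border b splits rows into x <= b and x > b; the gain is the
    drop in total squared error around each side's mean.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    base = ((y - y.mean()) ** 2).sum()
    best_border, best_gain = None, 0.0
    for b in borders:
        left, right = y[x <= b], y[x > b]
        if len(left) == 0 or len(right) == 0:
            continue  # a useful split must send rows to both sides
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        gain = base - sse
        if gain > best_gain:
            best_border, best_gain = b, gain
    return best_border, best_gain

# Hypothetical data with two clear clusters
x = [1, 2, 3, 10, 11, 12]
y = [0.0, 0.1, 0.0, 1.0, 0.9, 1.0]
border, gain = best_split(x, y, borders=np.quantile(x, [0.25, 0.5, 0.75]))
print(border, gain)  # the middle border separates the two clusters
```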



## Secret of CatBoost

CatBoost introduces two critical algorithmic advances: ordered boosting, a permutation-driven alternative to the classic algorithm, and an innovative algorithm for processing categorical features.
Both techniques use random permutations of the training examples to fight the prediction shift caused by a special kind of target leakage present in all other existing implementations of gradient boosting algorithms.

<img src="https://miro.medium.com/max/1796/1*nTMRk-U4KRFra3j8VMFz0A.png" style="width:70%"></img>

### Limitation
> **When the dataset has many numerical features, CatBoost takes more time to train than LightGBM.**

## Creating our Model Class:

In [None]:
class ModelOptimizer:
    """Wraps a model for Bayesian hyperparameter search with skopt's gp_minimize.

    Subclasses implement `evaluate_model`, which must return a score to MINIMIZE.
    """
    best_score = None
    opt = None
    
    def __init__(self, model, X_train, y_train, categorical_columns_indices=None, n_fold=3, seed=2405, early_stopping_rounds=30, is_stratified=True, is_shuffle=True):
        self.model = model
        self.X_train = X_train
        self.y_train = y_train
        self.categorical_columns_indices = categorical_columns_indices
        self.n_fold = n_fold
        self.seed = seed
        self.early_stopping_rounds = early_stopping_rounds
        self.is_stratified = is_stratified
        self.is_shuffle = is_shuffle
        
    def update_model(self, **kwargs):
        # Use set_params rather than setattr: CatBoost reads its parameters from
        # the dict managed by set_params/get_params, not from plain attributes
        self.model.set_params(**kwargs)
            
    def evaluate_model(self):
        raise NotImplementedError
    
    def optimize(self, param_space, max_evals=10, n_random_starts=2):
        start_time = time.time()
        
        @use_named_args(param_space)
        def _minimize(**params):
            self.model.set_params(**params)
            return self.evaluate_model()
        
        opt = gp_minimize(_minimize, param_space, n_calls=max_evals, n_random_starts=n_random_starts, random_state=2405, n_jobs=-1)
        best_values = opt.x
        optimal_values = dict(zip([param.name for param in param_space], best_values))
        best_score = opt.fun
        self.best_score = best_score
        self.opt = opt
        
        print('optimal_parameters: {}\noptimal score: {}\noptimization time: {}'.format(optimal_values, best_score, time.time() - start_time))
        print('updating model with optimal values')
        self.update_model(**optimal_values)
        plot_convergence(opt)
        return optimal_values


class CatboostOptimizer(ModelOptimizer):
    # Metrics where lower is better; any other metric (e.g. WKappa) is treated as higher-is-better
    LOSS_METRICS = {'MultiClass', 'Logloss', 'CrossEntropy', 'RMSE'}

    def evaluate_model(self):
        params = self.model.get_params()
        validation_scores = catboost.cv(
            catboost.Pool(self.X_train,
                          self.y_train,
                          cat_features=self.categorical_columns_indices),
            params,
            nfold=self.n_fold,
            stratified=self.is_stratified,
            seed=self.seed,
            early_stopping_rounds=self.early_stopping_rounds,
            shuffle=self.is_shuffle,
            plot=False)
        self.scores = validation_scores
        # cv returns columns such as 'iterations', 'test-<metric>-mean', 'test-<metric>-std', ...
        # so select the test mean of the evaluated metric by name, not by position
        metric = params.get('eval_metric', params['loss_function'])
        test_scores = validation_scores[f'test-{metric}-mean']
        if metric in self.LOSS_METRICS:
            return test_scores.min()
        return 1 - test_scores.max()

## Tuning CatBoost

`cat_features` — This parameter is a must if you want to leverage CatBoost's preprocessing of categorical features. If you encode the categorical features yourself and don’t pass the column indices (or names) as `cat_features`, you are missing the essence of CatBoost.

`one_hot_max_size` - CatBoost uses one-hot encoding for all categorical features with at most `one_hot_max_size` unique values. Here we leave it at 2, so `session_title` is encoded with Target Statistics instead, but depending on the dataset it may be a good idea to adjust this parameter.

`learning_rate & n_estimators` — The smaller the learning_rate, the more n_estimators are needed. Usually, the approach is to start with a relatively high learning_rate, tune the other parameters, and then decrease the learning_rate while increasing n_estimators.

`max_depth` — Depth of the base trees; this parameter has a high impact on training time.

`subsample` — Sample rate of rows; it can’t be used with the Bayesian bootstrap type (`bootstrap_type='Bayesian'`).

`colsample_bylevel`, `colsample_bytree`, `colsample_bynode`— Sample rate of columns.

`l2_leaf_reg` — L2 regularization coefficient (alias `reg_lambda`, the name used in the code below).

`random_strength` — Every candidate split gets a score, and random_strength adds some randomness to these scores, which helps reduce overfitting.


**1. With Default Params**

In [None]:
default_cb = catboost.CatBoostClassifier(loss_function='MultiClass',
                                         task_type='CPU',
                                         random_seed=12,
                                         silent=True
                                        )
default_cb_optimizer = CatboostOptimizer(default_cb, X, y, categorical_columns_indices=cat_features)
default_cb_optimizer.evaluate_model()

**2. Greedy Parameter Tuning**

In [None]:
greedy_cb = catboost.CatBoostClassifier(
    loss_function='MultiClass',
    task_type="CPU",
    learning_rate=0.01,
    iterations=2000,
    od_type="Iter",
    early_stopping_rounds=500,
    random_seed=24,
    silent=True
)

In [None]:
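from sklearn.metrics import confusion_matrix

def qwk(act, pred, n=4, hist_range=(0, 3)):
    """Quadratic weighted kappa for integer labels in 0..n-1."""
    # Observed matrix; fixing the label set keeps O at n x n even if a class never appears in `pred`
    O = confusion_matrix(act, pred, labels=list(range(n)))
    O = np.divide(O, np.sum(O))

    # Quadratic disagreement weights: 0 on the diagonal, 1 for the farthest pair of classes
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            W[i][j] = ((i - j) ** 2) / ((n - 1) ** 2)

    # Expected matrix under independence: outer product of the two label histograms
    act_hist = np.histogram(act, bins=n, range=hist_range)[0]
    prd_hist = np.histogram(pred, bins=n, range=hist_range)[0]
    E = np.outer(act_hist, prd_hist)
    E = np.divide(E, np.sum(E))

    num = np.sum(np.multiply(W, O))
    den = np.sum(np.multiply(W, E))
    return 1 - np.divide(num, den)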
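
For reference, the `qwk` function above computes the quadratic weighted kappa. With N classes (N = 4 here), O the normalized confusion matrix of actual vs. predicted labels, and E the normalized outer product of the two label histograms:

$$
w_{ij} = \frac{(i-j)^2}{(N-1)^2}, \qquad
\kappa = 1 - \frac{\sum_{i,j} w_{ij}\, O_{ij}}{\sum_{i,j} w_{ij}\, E_{ij}}
$$

For N = 4 the weights are 0 on the diagonal, 1/9 for predictions one class off, 4/9 for two classes off and 1 for three classes off. A perfect prediction gives κ = 1, and predictions that agree with the labels only as often as chance (O ≈ E) give κ ≈ 0.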

In [None]:
cb_optimizer = CatboostOptimizer(greedy_cb, X, y, categorical_columns_indices=cat_features)
params_space = [Real(0.01, 0.8, name='learning_rate'),]
cb_optimal_values = cb_optimizer.optimize(params_space)

**One Step Optimization**

In [None]:
cb = catboost.CatBoostClassifier(n_estimators=4000,
                         one_hot_max_size=2,
                         loss_function='MultiClass',
                         eval_metric='WKappa',
                         task_type='CPU',                
                         random_seed=5, 
                         use_best_model=True,
                         silent=True
                        )

In [None]:
one_cb_optimizer = CatboostOptimizer(cb, X, y, categorical_columns_indices=cat_features)
params_space = [Real(0.01, 0.8, name='learning_rate'), 
                Integer(2, 10, name='max_depth'), 
                Real(0.5, 1.0, name='colsample_bylevel'), 
                Real(0.0, 100, name='bagging_temperature'), 
                Real(0.0, 100, name='random_strength'), 
                Real(1.0, 100, name='reg_lambda')]
one_cb_optimal_values = one_cb_optimizer.optimize(params_space, max_evals=40, n_random_starts=4)

In [None]:
one_cb_optimizer.model.get_params()

In [None]:
def make_classifier():
    clf = catboost.CatBoostClassifier(
            n_estimators = 4000,
            task_type = 'CPU',
            one_hot_max_size = 2,
            random_seed = 31,
            loss_function = 'MultiClass',
            learning_rate = 0.8,
            max_depth = 6,
            colsample_bylevel = 0.5,
            bagging_temperature = 28.635664398579774,
            random_strength = 100.0,
            reg_lambda = 100.0,
            early_stopping_rounds=500,
    )
    return clf

In [None]:
from sklearn.model_selection import KFold
oof = np.zeros(len(X))
NFOLDS = 5
folds = KFold(n_splits=NFOLDS, shuffle=True, random_state=2019)

training_start_time = time.time()
for fold, (trn_idx, test_idx) in enumerate(folds.split(X, y)):
    start_time = time.time()
    print(f'Training on fold {fold+1}')
    clf = make_classifier()
    clf.fit(X.loc[trn_idx, all_features], y.loc[trn_idx], eval_set=(X.loc[test_idx, all_features], y.loc[test_idx]),
                          use_best_model=True, verbose=500, cat_features=cat_features)    
    oof[test_idx] = clf.predict(X.loc[test_idx, all_features]).reshape(len(test_idx))
    print('Fold {} finished in {}'.format(fold + 1, str(datetime.timedelta(seconds=time.time() - start_time))))
    
print('-' * 30)
print('OOF QWK:', qwk(y, oof))
print('-' * 30)

In [None]:
# train model on all data once
clf = make_classifier()
clf.fit(X, y, verbose=500, cat_features=cat_features)

In [None]:
# process test set: one feature row (the last assessment) per installation_id
new_test = []
grouped_test = test.groupby('installation_id', sort=False)
for ins_id, user_sample in tqdm(grouped_test, total=grouped_test.ngroups):
    new_test.append(get_data(user_sample, test_set=True))

X_test = pd.DataFrame(new_test)
del test

In [None]:
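# make predictions on test set once; selecting the training features keeps the
# column order identical to training and drops the placeholder 'accuracy_group' column
preds = clf.predict(X_test[all_features])
del X_test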

**Creating Submission File:**

In [None]:
# MultiClass predict returns class labels with shape (n, 1); flatten to a 1-D column
sample_submission['accuracy_group'] = np.asarray(preds).reshape(-1).astype('int')
sample_submission.to_csv('submission.csv', index=None)
sample_submission.head()

In [None]:
sample_submission['accuracy_group'].plot(kind='hist')

In [None]:
labels['accuracy_group'].plot(kind='hist')

In [None]:
pd.Series(oof).plot(kind='hist')

## Model Analysis
In addition to feature importance, which is commonly reported for GBDT models, CatBoost provides feature interactions and object (row) importance.

In [None]:
clf = deepcopy(one_cb_optimizer.model)
pool = catboost.Pool(X, y, cat_features=cat_features)
clf.set_params(use_best_model=False, reg_lambda=1.0)
clf.fit(pool, use_best_model=False)
interactions = clf.get_feature_importance(pool, fstr_type=catboost.EFstrType.Interaction, prettified=True)
shap_values = clf.get_feature_importance(pool, fstr_type=catboost.EFstrType.ShapValues,prettified=True)

In [None]:
feature_interaction = [[X.columns[interaction[0]], X.columns[interaction[1]], interaction[2]] for i,interaction in interactions.iterrows()]
feature_interaction_df = pd.DataFrame(feature_interaction, columns=['feature1', 'feature2', 'interaction_strength'])
feature_interaction_df.head(10)

In [None]:
pd.Series(index=zip(feature_interaction_df['feature1'], feature_interaction_df['feature2']), data=feature_interaction_df['interaction_strength'].values, name='interaction_strength').head(10).plot(kind='barh', figsize=(18, 10), fontsize=16, color='b')

In [None]:
shap.initjs()
shap.summary_plot(shap_values[:, 0, :-1], X, feature_names=X.columns.tolist())

In [None]:
shap.initjs()
shap.summary_plot(shap_values[:, 1, :-1], X, feature_names=X.columns.tolist())

In [None]:
shap.initjs()
shap.summary_plot(shap_values[:, 2, :-1], X, feature_names=X.columns.tolist())

In [None]:
shap.initjs()
shap.summary_plot(shap_values[:, 3, :-1], X, feature_names=X.columns.tolist())

In [None]:
shap.summary_plot(shap_values[:, 0,:-1], X, feature_names=X.columns.tolist(), plot_type="bar")

In [None]:
shap.dependence_plot("accumulated_accuracy", shap_values[:, 3, :-1], X)

## Key Takeaways

* CatBoost is built on the same approach as the “older” generation of GBDT models.
* CatBoost’s power lies in its **categorical feature preprocessing**, **prediction time** and **model analysis**.
* CatBoost’s weaknesses are its training and optimization times.
* Don’t forget to pass the `cat_features` argument to the classifier. You aren’t really utilizing the power of CatBoost without it.
* Although CatBoost performs well with default parameters, tuning several key parameters can still improve results significantly.
