<a href="https://colab.research.google.com/github/bhushanmandava/Gradiant-Boosting-Algos-Classification/blob/main/catboost_classifier.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# CatBoost Classifier

## Part 1 - Data Preprocessing

### Importing the dataset

In [None]:
import pandas as pd
dataset = pd.read_csv('churn_modelling.csv')

In [None]:
dataset.head()

### Checking missing data

In [None]:
dataset.info()

### Handling categorical variables

CustomerId and Surname columns (identifiers, dropped before modelling)

In [None]:
dataset.drop(['CustomerId', 'Surname'], axis = 1, inplace = True)

In [None]:
dataset.head()

Geography column

In [None]:
dataset['Geography'].unique()

In [None]:
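# dtype = int keeps the dummies as 0/1 integers (recent pandas versions return booleans by default)
geography_dummies = pd.get_dummies(dataset['Geography'], drop_first = True, dtype = int)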

In [None]:
geography_dummies
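
To see what `drop_first = True` does, here is a small sketch on a toy column. The toy frame and its country names are made up for illustration and are only assumed to resemble the real `Geography` values. The first category in sorted order is dropped and becomes the all-zeros baseline.

```python
import pandas as pd

# Toy column; the country names are assumed for illustration.
toy = pd.DataFrame({'Geography': ['France', 'Spain', 'Germany', 'France']})

# drop_first = True removes the first category in sorted order ('France'),
# so a row with zeros in every remaining dummy column means 'France'.
dummies = pd.get_dummies(toy['Geography'], drop_first = True, dtype = int)
print(dummies)
```

Only two columns remain for three categories, which avoids carrying a redundant column into the feature matrix.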

In [None]:
dataset = pd.concat([geography_dummies, dataset], axis = 1)

In [None]:
dataset.head()

In [None]:
dataset.drop(['Geography'], axis = 1, inplace = True)

In [None]:
dataset.head()

Gender column

In [None]:
dataset['Gender'].unique()

In [None]:
dataset['Gender'] = dataset['Gender'].apply(lambda x: 0 if x == 'Female' else 1)

In [None]:
dataset.head()

### Creating the Training Set and the Test Set

Getting the inputs and output

In [None]:
X = dataset.iloc[:, :-1].values

In [None]:
y = dataset.iloc[:, -1].values

In [None]:
X

In [None]:
y

Getting the Training Set and the Test Set

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

## Part 2 - Building and training the model

### Building the model

In [None]:
!pip install catboost

In [None]:
import catboost as cb
model = cb.CatBoostClassifier()

### Training the model

In [None]:
model.fit(X_train, y_train)

### Inference

In [None]:
y_pred = model.predict(X_test)

### Predicting the result of a single observation

Use the model to predict whether a customer with the following details will leave the bank:


Geography: France

Credit Score: 600

Gender: Male

Age: 40 years old

Tenure: 3 years

Balance: \$ 60000

Number of Products: 2

Does this customer have a credit card? Yes

Is this customer an Active Member: Yes

Estimated Salary: \$ 50000

So, should we say goodbye to that customer?

**Solution**

In [None]:
print(model.predict([[0, 0, 600, 1, 40, 3, 60000, 2, 1, 1, 50000]]))

Therefore, our model predicts that this customer stays with the bank!

**Important note 1:** Notice that the feature values are wrapped in two pairs of square brackets. The `predict` method expects a 2D array, where each row is an observation and each column is a feature. The double brackets turn our single observation into a 2D array with one row.

**Important note 2:** Notice also that "France" was not entered as a string but as "0, 0" in the first two columns. The model was trained on the dummy columns of the Geography variable, so `predict` expects those dummy values. Because `drop_first = True` dropped the first category, France is encoded as zeros in both remaining dummy columns.
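
One way to avoid hand-encoding mistakes is a small helper that builds the row in the column order the model was trained on. This is only a sketch and not part of the original notebook: `encode_customer` is a hypothetical helper, and the dummy column order (Germany, then Spain) is an assumption based on `drop_first = True` dropping France.

```python
def encode_customer(geography, credit_score, gender, age, tenure, balance,
                    num_products, has_card, is_active, salary):
    """Return a 2D list (one row) in the assumed training column order."""
    # Assumed dummy columns: [Germany, Spain]; France is the all-zeros baseline.
    germany = 1 if geography == 'Germany' else 0
    spain = 1 if geography == 'Spain' else 0
    # Same rule as the notebook: Female -> 0, anything else -> 1.
    gender_code = 0 if gender == 'Female' else 1
    row = [germany, spain, credit_score, gender_code, age, tenure,
           balance, num_products, int(has_card), int(is_active), salary]
    return [row]  # outer list makes it 2D: one row, eleven columns

observation = encode_customer('France', 600, 'Male', 40, 3, 60000, 2, True, True, 50000)
print(observation)
```

The result can be passed directly to `model.predict(observation)`.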

## Part 3 - Evaluating the model

### Making the Confusion Matrix

In [None]:
from sklearn.metrics import confusion_matrix
confusion_matrix(y_test, y_pred)

### Accuracy

In [None]:
from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_pred)
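
Accuracy is the share of predictions that fall on the diagonal of the confusion matrix. The sketch below shows that relationship on a made-up 2x2 matrix. The counts are illustrative only and are not results from this model.

```python
# Hypothetical counts, laid out as scikit-learn does:
# rows = true class, columns = predicted class.
cm = [[90, 10],
      [20, 30]]

# Correct predictions sit on the diagonal.
correct = sum(cm[i][i] for i in range(len(cm)))
total = sum(sum(row) for row in cm)
accuracy = correct / total
print("Accuracy: {:.2f} %".format(accuracy * 100))
```

The off-diagonal cells show which kind of error dominates, which a single accuracy number hides.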

### k-Fold Cross Validation

In [None]:
from sklearn.model_selection import cross_val_score
accuracies = cross_val_score(estimator = model,
                             X = X,
                             y = y,
                             scoring = 'accuracy',
                             cv = 10)
print("Accuracy: {:.2f} %".format(accuracies.mean()*100))
print("Standard Deviation: {:.2f} %".format(accuracies.std()*100))

### Grid Search

In [None]:
from sklearn.model_selection import GridSearchCV
parameters = [{'learning_rate': [0.001, 0.005, 0.01],
               'depth': [4, 7, 10],
               'l2_leaf_reg': [2, 6, 10],
               'random_strength': [0, 5, 10]}]
grid_search = GridSearchCV(estimator = model,
                           param_grid = parameters,
                           scoring = 'accuracy',
                           cv = 10)
grid_search.fit(X, y)
best_accuracy = grid_search.best_score_
best_parameters = grid_search.best_params_
print("Best Accuracy: {:.2f} %".format(best_accuracy*100))
print("Best Parameters:", best_parameters)
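
Before launching the search, it can help to estimate how many models it will train. The sketch below counts the candidate combinations in the grid above and multiplies by the number of folds. It uses only the standard library and does not train anything.

```python
from itertools import product

# Same grid and fold count as in the search above.
parameters = {'learning_rate': [0.001, 0.005, 0.01],
              'depth': [4, 7, 10],
              'l2_leaf_reg': [2, 6, 10],
              'random_strength': [0, 5, 10]}
cv = 10

# Every combination of one value per hyperparameter is one candidate.
candidates = list(product(*parameters.values()))
n_candidates = len(candidates)
n_fits = n_candidates * cv
print(n_candidates, "candidates,", n_fits, "cross-validation fits")
```

With the default `refit = True`, `GridSearchCV` also trains the best candidate once more on all the data passed to `fit`, so expect the search to take a while.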