### Problem statement<a class="anchor" id="task"></a>

**Dataset description**

* **Home Ownership** - home ownership status
* **Annual Income** - annual income
* **Years in current job** - number of years at the current job
* **Tax Liens** - tax liens
* **Number of Open Accounts** - number of open accounts
* **Years of Credit History** - number of years of credit history
* **Maximum Open Credit** - largest open credit
* **Number of Credit Problems** - number of credit problems
* **Months since last delinquent** - number of months since the last delinquent payment
* **Bankruptcies** - bankruptcies
* **Purpose** - loan purpose
* **Term** - loan term
* **Current Loan Amount** - current loan amount
* **Current Credit Balance** - current credit balance
* **Monthly Debt** - monthly debt
* **Credit Score** - credit score
* **Credit Default** - whether the credit obligations were not met (0 - repaid on time, 1 - delinquent)

**Importing libraries and scripts**

In [1398]:
!pip install catboost



In [1399]:
import pandas as pd
import numpy as np
import pickle
import random
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.model_selection import train_test_split, ShuffleSplit, cross_val_score, learning_curve
from sklearn.model_selection import KFold, GridSearchCV, RandomizedSearchCV
from sklearn.metrics import classification_report, f1_score, precision_score, recall_score

from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
import xgboost as xgb, lightgbm as lgbm, catboost as catb

%matplotlib inline

In [1400]:
import warnings
warnings.simplefilter('ignore')

**Paths to directories and files**

In [1401]:
# input
### TRAIN_DATASET_PATH = '../course_project_train.csv'
TRAIN_DATASET_PATH = '/content/drive/MyDrive/Colab 2nd notebook/Project/1/course_project_train.csv'
###DATASET_PATH = '/content/drive/MyDrive/Colab 2nd notebook/Project/1/course_project_train.csv'
# output
###TEST_DATASET_PATH = '../course_project_test.csv'
TEST_REAL_DATASET_PATH = '/content/drive/MyDrive/Colab 2nd notebook/Project/1/course_project_test.csv'

PREP_TRAIN_DATASET_PATH = '/content/drive/MyDrive/Colab 2nd notebook/Project/1/course_project_data_prep.csv'
PREP_DATASET_PATH = '/content/drive/MyDrive/Colab 2nd notebook/Project/1/course_project_data_prep.csv'

PREP_TEST_REAL_DATASET_PATH = '/content/drive/MyDrive/Colab 2nd notebook/Project/1/course_project_test_data_prep.csv'
TEST_REAL_NORM_PATH = '/content/drive/MyDrive/Colab 2nd notebook/Project/1/course_project_test_data_prepaired.csv'


# input
###DATASET_PATH = '../training_project_data.csv'
#DATASET_PATH = '/content/drive/MyDrive/Colab 2nd notebook/Project/1/course_project_train.csv'

###PREP_DATASET_PATH = '../training_project_data_prep.csv'
#PREP_DATASET_PATH = '/content/drive/MyDrive/Colab 2nd notebook/Project/1/course_project_data_prep.csv'

# output
###TRAIN_FULL_PATH = '../training_project_train_full.csv'
TRAIN_FULL_PATH = '/content/drive/MyDrive/Colab 2nd notebook/Project/1/course_project_train_full.csv'
###TRAIN_PART_PATH = '../training_project_train_part_b.csv'
TRAIN_PART_PATH = '/content/drive/MyDrive/Colab 2nd notebook/Project/1/course_project_train_part_b.csv'
###TEST_PART_PATH = '../training_project_test_part.csv'
TEST_PART_PATH = '/content/drive/MyDrive/Colab 2nd notebook/Project/1/course_project_test_part.csv'

###SCALER_FILE_PATH = '../scaler.pkl'
SCALER_FILE_PATH = '/content/drive/MyDrive/Colab 2nd notebook/Project/1/course_scaler.pkl'

#TEST_REAL_DATASET_PATH = '/content/drive/MyDrive/Colab 2nd notebook/Project/1/course_project_test.csv'
#PREP_TEST_REAL_DATASET_PATH = '/content/drive/MyDrive/Colab 2nd notebook/Project/1/course_project_test_data_prep.csv'
#TEST_REAL_NORM_PATH = '/content/drive/MyDrive/Colab 2nd notebook/Project/1/course_project_test_data_prepaired.csv'

In [1402]:
def get_classification_report(y_train_true, y_train_pred, y_test_true, y_test_pred):
    print('TRAIN\n\n' + classification_report(y_train_true, y_train_pred))
    print('TEST\n\n' + classification_report(y_test_true, y_test_pred))
    print('CONFUSION MATRIX\n')
    print(pd.crosstab(y_test_true, y_test_pred))

In [1403]:
def balance_df_by_target(df, target_name):

    target_counts = df[target_name].value_counts()

    # idxmax/idxmin return the class labels; argmax/argmin return positions
    major_class_name = target_counts.idxmax()
    minor_class_name = target_counts.idxmin()

    disbalance_coeff = int(target_counts[major_class_name] / target_counts[minor_class_name]) - 1

    for i in range(disbalance_coeff):
        sample = df[df[target_name] == minor_class_name].sample(target_counts[minor_class_name])
        # DataFrame.append was removed in pandas 2.0, so concatenate instead
        df = pd.concat([df, sample], ignore_index=True)

    return df.sample(frac=1)
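
To make the oversampling step concrete, here is a minimal sketch on a toy frame. The helper `oversample_minority`, the column name `target`, and the `random_state` argument are hypothetical and only illustrate the same idea: whole copies of the minority class are appended, then the rows are shuffled.

```python
import pandas as pd

def oversample_minority(df, target_name, random_state=21):
    """Append whole copies of the minority class, then shuffle the rows."""
    counts = df[target_name].value_counts()
    major, minor = counts.idxmax(), counts.idxmin()
    # Number of extra full copies of the minority class to append.
    extra_copies = int(counts[major] / counts[minor]) - 1
    minority = df[df[target_name] == minor]
    parts = [df] + [minority] * extra_copies
    # Shuffle so the appended rows are not clustered at the end.
    return pd.concat(parts, ignore_index=True).sample(frac=1, random_state=random_state)

# Toy data: 8 rows of class 0 and 2 rows of class 1.
toy = pd.DataFrame({'x': range(10), 'target': [0] * 8 + [1] * 2})
balanced = oversample_minority(toy, 'target')
print(balanced['target'].value_counts().to_dict())
```

With an 8:2 split the ratio is 4, so three extra copies are appended and both classes end up with 8 rows. When the ratio is not an integer, the classes stay only approximately balanced.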

In [1404]:
def Processing_Home_Ownership_NAN(X):
        
    X.loc[X['Home Ownership'].isna(), 'Home Ownership'] = X['Home Ownership'].mode()[0]

    return X

In [1405]:
def Processing_Annual_Income_NAN(X):

    X.loc[X['Annual Income'].isna(), 'Annual Income'] = X['Annual Income'].median()
 
    return X

In [1406]:
def Processing_Years_in_current_job_NAN(X):

    condition = X['Years in current job'].isna()

    X.loc[condition, 'Years in current job'] = X['Years in current job'].mode()[0]

    return X

In [1407]:
def Processing_Tax_Liens_NAN(X):

    X.loc[X['Tax Liens'].isna(), 'Tax Liens'] = X['Tax Liens'].median()
 
    return X

In [1408]:
def Processing_Number_of_Open_Accounts_NAN(X):

    X.loc[X['Number of Open Accounts'].isna(), 'Number of Open Accounts'] = X['Number of Open Accounts'].median()
 
    return X

In [1409]:
def Processing_Years_of_Credit_History_NAN(X):

    X.loc[X['Years of Credit History'].isna(), 'Years of Credit History'] = X['Years of Credit History'].median()
 
    return X

In [1410]:
def Processing_Maximum_Open_Credit_NAN(X):

    X.loc[X['Maximum Open Credit'].isna(), 'Maximum Open Credit'] = X['Maximum Open Credit'].median()
 
    return X

In [1411]:
def Processing_Number_of_Credit_Problems_NAN(X):

    X.loc[X['Number of Credit Problems'].isna(), 'Number of Credit Problems'] = X['Number of Credit Problems'].median()
 
    return X

In [1412]:
def Processing_Months_since_last_delinquent_NAN(X):
        
    X.loc[X['Months since last delinquent'].isna(), 'Months since last delinquent'] = X['Months since last delinquent'].median()

    return X

In [1413]:
def Processing_Bankruptcies_NAN(X):
  
    X.loc[X['Bankruptcies'].isna(), 'Bankruptcies'] = X['Bankruptcies'].median()

    return X

In [1414]:
def Processing_Purpose_NAN(X):
        
    X.loc[X['Purpose'].isna(), 'Purpose'] = X['Purpose'].mode()[0]

    return X

In [1415]:
def Processing_Term_NAN(X):
        
    X.loc[X['Term'].isna(), 'Term'] = X['Term'].mode()[0]

    return X

In [1416]:
def Processing_Current_Loan_Amount_NAN(X):
        
    X.loc[X['Current Loan Amount'].isna(), 'Current Loan Amount'] = X['Current Loan Amount'].median()

    return X

In [1417]:
def Processing_Current_Credit_Balance_NAN(X):

    X.loc[X['Current Credit Balance'].isna(), 'Current Credit Balance'] = X['Current Credit Balance'].median()
 
    return X

In [1418]:
def Processing_Monthly_Debt_NAN(X):

    X.loc[X['Monthly Debt'].isna(), 'Monthly Debt'] = X['Monthly Debt'].median()
 
    return X

In [1419]:
def Processing_Credit_Score_NAN(X):

    X.loc[X['Credit Score'].isna(), 'Credit Score'] = X['Credit Score'].median()

 
    return X
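
The imputation functions above all follow one pattern: numeric columns get the column median, categorical columns get the mode. As a rough sketch, the same logic could be expressed with a single helper. The name `fill_missing` and its arguments are hypothetical and are not part of the notebook.

```python
import numpy as np
import pandas as pd

def fill_missing(df, num_cols, cat_cols):
    """Return a copy with numeric NaNs set to the median and categorical NaNs to the mode."""
    out = df.copy()
    for col in num_cols:
        out[col] = out[col].fillna(out[col].median())
    for col in cat_cols:
        out[col] = out[col].fillna(out[col].mode()[0])
    return out

# Toy frame with one missing value per column.
toy = pd.DataFrame({
    'Annual Income': [100.0, np.nan, 300.0],
    'Term': ['Short Term', None, 'Short Term'],
})
filled = fill_missing(toy, num_cols=['Annual Income'], cat_cols=['Term'])
print(filled)
```

Unlike the functions above, this sketch works on a copy, so the input frame keeps its missing values.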

In [1420]:
def Processing_Years_in_current_job_outliers(X):
    condition0 = X['Years in current job']=="< 1 year"
    condition1 = X['Years in current job']=="1 year"
    condition2 = X['Years in current job']=="2 years"  
    condition3 = X['Years in current job']=="3 years" 
    condition4 = X['Years in current job']=="4 years" 
    condition5 = X['Years in current job']=="5 years" 
    condition6 = X['Years in current job']=="6 years" 
    condition7 = X['Years in current job']=="7 years" 
    condition8 = X['Years in current job']=="8 years" 
    condition9 = X['Years in current job']=="9 years" 
    condition10 = X['Years in current job']=="10+ years"
    X.loc[condition0, 'Years in current job'] = 'less 1 years'
    X.loc[condition1|condition2|condition3, 'Years in current job'] = '1 to 3 years'
    X.loc[condition4|condition5|condition6|condition7, 'Years in current job'] = '4 to 7 years'
    X.loc[condition8|condition9|condition10, 'Years in current job'] = 'more than 8 years'

    return X
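
The same bucketing can also be written as a lookup table, which keeps the grouping in one place. This is a sketch only: `JOB_YEARS_BUCKETS` and `bucket_job_years` are hypothetical names, and the bucket labels are copied from the function above.

```python
import pandas as pd

# Hypothetical lookup table reproducing the four buckets used above.
JOB_YEARS_BUCKETS = {
    '< 1 year': 'less 1 years',
    '1 year': '1 to 3 years', '2 years': '1 to 3 years', '3 years': '1 to 3 years',
    '4 years': '4 to 7 years', '5 years': '4 to 7 years',
    '6 years': '4 to 7 years', '7 years': '4 to 7 years',
    '8 years': 'more than 8 years', '9 years': 'more than 8 years',
    '10+ years': 'more than 8 years',
}

def bucket_job_years(series):
    """Map raw tenure strings to buckets; values missing from the table stay unchanged."""
    return series.map(JOB_YEARS_BUCKETS).fillna(series)

raw = pd.Series(['< 1 year', '2 years', '5 years', '10+ years'])
print(bucket_job_years(raw).tolist())
```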

In [1421]:
def Processing_Purpose_outliers(X):
    condition1 = X['Purpose']=="renewable energy"
    condition2 = X['Purpose']=="vacation"  
    condition3 = X['Purpose']=="educational expenses" 
    condition4 = X['Purpose']=="moving" 
    condition5 = X['Purpose']=="wedding" 
    condition6 = X['Purpose']=="small business" 
    condition7 = X['Purpose']=="buy house" 
    condition8 = X['Purpose']=="take a trip" 
    condition9 = X['Purpose']=="major purchase" 
    condition10 = X['Purpose']=="medical bills" 
    condition11 = X['Purpose']=="buy a car" 
    X.loc[condition1|condition2|condition3|condition4 \
                |condition5|condition6|condition7|condition8\
                |condition9|condition10|condition11, 'Purpose'] = 'Rest Purposes'

    return X

In [1422]:
def Processing_Current_Loan_Amount_outliers(X):
 
    X.loc[X['Current Loan Amount'] > 1000000.0, 'Current Loan Amount'] = X['Current Loan Amount'].median()

    return X

In [1423]:
def ID_addition_to_DF(X):
    X['ID'] = X.index.tolist()
    
    return X

In [1424]:
def Dummies_addition_to_DF(X):

    for cat_colname in X.select_dtypes(include='object').columns[0:]:
        X = pd.concat([X, pd.get_dummies(X[cat_colname], prefix=cat_colname)], axis=1)
    
    return X
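
A small illustration of what the dummy-encoding step produces, using a toy frame with made-up values. The original categorical column is kept, and one indicator column per category is appended with the column name as prefix.

```python
import pandas as pd

# Toy frame: the values are invented for illustration.
toy = pd.DataFrame({
    'Term': ['Short Term', 'Long Term', 'Short Term'],
    'Monthly Debt': [10.0, 20.0, 30.0],
})

dummies = pd.get_dummies(toy['Term'], prefix='Term')
encoded = pd.concat([toy, dummies], axis=1)
print(encoded.columns.tolist())
```

Because the source column stays in the frame, any later feature list has to select the indicator columns explicitly if the raw string column should not reach the model.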

In [1425]:
def Processing_ALL(X):

    ### MISSING VALUE HANDLING
    X = Processing_Home_Ownership_NAN(X)
    X = Processing_Annual_Income_NAN(X)
    X = Processing_Years_in_current_job_NAN(X)
    X = Processing_Tax_Liens_NAN(X)
    X = Processing_Number_of_Open_Accounts_NAN(X)
    X = Processing_Years_of_Credit_History_NAN(X)
    X = Processing_Maximum_Open_Credit_NAN(X)
    X = Processing_Number_of_Credit_Problems_NAN(X)
    X = Processing_Months_since_last_delinquent_NAN(X)
    X = Processing_Bankruptcies_NAN(X)
    X = Processing_Purpose_NAN(X)
    X = Processing_Term_NAN(X)
    X = Processing_Current_Loan_Amount_NAN(X)
    X = Processing_Current_Credit_Balance_NAN(X)
    X = Processing_Monthly_Debt_NAN(X)
    X = Processing_Credit_Score_NAN(X)
    
    ### OUTLIER HANDLING
    X = Processing_Years_in_current_job_outliers(X)
    X = Processing_Purpose_outliers(X)
    X = Processing_Current_Loan_Amount_outliers(X)

    ### BUILDING NEW FEATURES
    X = ID_addition_to_DF(X)
    X = Dummies_addition_to_DF(X)
    
    return X

In [1426]:
def Conversion_to_Category_type(X):

    for colname in CAT_FEATURE_NAMES:
        X[colname] = pd.Categorical(X[colname])
    
    ###X[CAT_FEATURE_NAMES].dtypes

    return X

### Loading data<a class="anchor" id="load_data"></a>

In [1427]:
df_base = pd.read_csv(TRAIN_DATASET_PATH)

In [1428]:
####df_test = pd.read_csv(TEST_REAL_DATASET_PATH)
df_test_real_base = pd.read_csv(TEST_REAL_DATASET_PATH)

**Handling missing values and outliers, building new features**

In [1429]:
####df_train = Processing_ALL(df_base)
# pass a copy: the processing functions modify their argument in place
df = Processing_ALL(df_base.copy())

In [1430]:
# pass a copy: the processing functions modify their argument in place
df_test_real_prep = Processing_ALL(df_test_real_base.copy())

### Saving the training dataset<a class="anchor" id="saving"></a>

In [1431]:
####df_train.to_csv(PREP_TRAIN_DATASET_PATH, index=False, encoding='utf-8')
#####df_test.to_csv(PREP_TEST_REAL_DATASET_PATH, index=False, encoding='utf-8')


## <center>Course project<a class="anchor" id="course_project"></a></center>

In [1432]:
####df_base = pd.read_csv(DATASET_PATH)
####df = pd.read_csv(PREP_DATASET_PATH)

####df_test_real_base = pd.read_csv(TEST_REAL_DATASET_PATH)
####df_test_real_prep = pd.read_csv(PREP_TEST_REAL_DATASET_PATH)

**Defining the target variable and feature groups**

In [1433]:
TARGET_NAME = 'Credit Default'
BASE_FEATURE_NAMES = df_base.columns.drop(TARGET_NAME).tolist()
NEW_FEATURE_NAMES = df.columns.drop([TARGET_NAME, 'ID'] + BASE_FEATURE_NAMES).tolist()

### Feature selection<a class="anchor" id="feature_selection"></a>

In [1434]:
NUM_FEATURE_NAMES = ['Annual Income', 'Tax Liens', 'Number of Open Accounts', 'Years of Credit History', \
                      'Maximum Open Credit', 'Number of Credit Problems', 'Months since last delinquent',\
                      'Bankruptcies','Current Loan Amount', 'Current Credit Balance', 'Monthly Debt', 'Credit Score']

CAT_FEATURE_NAMES = ['Home Ownership', 'Years in current job', 'Purpose', 'Term']

SELECTED_FEATURE_NAMES = NUM_FEATURE_NAMES + NEW_FEATURE_NAMES

### Type conversion for the CatBoost model

In [1435]:
####for colname in CAT_FEATURE_NAMES:
###    df[colname] = pd.Categorical(df[colname])

df = Conversion_to_Category_type(df)

###df[CAT_FEATURE_NAMES].dtypes

In [1436]:
###for colname in CAT_FEATURE_NAMES:
###    df_test_real_prepared[colname] = pd.Categorical(df_test_real_prepared[colname])

df_test_real_prep = Conversion_to_Category_type(df_test_real_prep)
    
###df_test_real_prepared[CAT_FEATURE_NAMES].dtypes

### Data normalization<a class="anchor" id="normalization"></a>

In [1437]:
scaler = StandardScaler()

df_norm = df.copy()
df_norm[NUM_FEATURE_NAMES] = scaler.fit_transform(df_norm[NUM_FEATURE_NAMES])

df = df_norm.copy()

df_norm_test_real = df_test_real_prep.copy()
# reuse the statistics fitted on the training data instead of refitting on the test set
df_norm_test_real[NUM_FEATURE_NAMES] = scaler.transform(df_norm_test_real[NUM_FEATURE_NAMES])

df_test_real = df_norm_test_real.copy()
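
The difference between reusing a fitted scaler and refitting it can be seen on a tiny made-up example. The arrays below are illustrative and are not taken from the dataset.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

train = np.array([[0.0], [10.0]])      # mean 5, standard deviation 5
new_data = np.array([[5.0], [15.0]])   # data the scaler has not seen

scaler = StandardScaler().fit(train)

# transform() applies the mean and scale learned from `train`.
scaled_new = scaler.transform(new_data)

# fit_transform() on a fresh scaler recomputes the statistics from `new_data` itself.
refit_new = StandardScaler().fit_transform(new_data)

print(scaled_new.ravel())
print(refit_new.ravel())
```

The two results differ: with the training statistics the value 5.0 maps to 0, while after refitting it maps to -1. A model trained on features scaled one way would see shifted inputs if new data were scaled the other way.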

**Saving the fitted scaler**

In [1438]:
with open(SCALER_FILE_PATH, 'wb') as file:
    pickle.dump(scaler, file)
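
For completeness, a sketch of the round trip: a pickled scaler can be restored with `pickle.load` and used without refitting. An in-memory buffer stands in for the file at `SCALER_FILE_PATH`, and the fitted values are made up.

```python
import io
import pickle

import numpy as np
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler().fit(np.array([[1.0], [3.0]]))

# An in-memory buffer is used here instead of a file on disk.
buffer = io.BytesIO()
pickle.dump(scaler, buffer)
buffer.seek(0)

restored = pickle.load(buffer)
print(restored.mean_, restored.scale_)
print(restored.transform(np.array([[3.0]])))
```

Pickles are tied to the library version that produced them, so the restored object is best loaded with the same scikit-learn version that saved it.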

### Train/test split<a class="anchor" id="train_and_test"></a>

In [1439]:
X = df[SELECTED_FEATURE_NAMES]
X_test_real = df_test_real[SELECTED_FEATURE_NAMES]
y = df[TARGET_NAME]

X_train, X_test, y_train, y_test = train_test_split(X, y, shuffle=True, test_size=0.30, random_state=21)

### Balancing the target variable<a class="anchor" id="target_balancing"></a>

In [1440]:
df_for_balancing = pd.concat([X_train, y_train], axis=1)
df_balanced = balance_df_by_target(df_for_balancing, TARGET_NAME)
    
df_balanced[TARGET_NAME].value_counts()

0    3771
1    2958
Name: Credit Default, dtype: int64
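
The classes are still not split 50/50 because `balance_df_by_target` appends only whole copies of the minority class. The arithmetic can be retraced from the counts above. The pre-balancing minority count of 1479 is an assumption, inferred as half of 2958, and is not printed anywhere in the notebook.

```python
# Counts after balancing, taken from the output above.
majority = 3771
minority_after = 2958

# Assumption: the minority class was doubled, so it started at 1479 rows.
minority_before = minority_after // 2

# Same formula as in balance_df_by_target: int(2.55) - 1 = 1 extra copy.
disbalance_coeff = int(majority / minority_before) - 1
minority_rebuilt = minority_before * (1 + disbalance_coeff)

print(minority_before, disbalance_coeff, minority_rebuilt)
print(round(minority_rebuilt / (majority + minority_rebuilt), 2))
```

Under that assumption one extra copy is appended, which reproduces the 2958 rows and leaves class 1 at roughly 44% of the balanced training set.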

In [1441]:
X_train = df_balanced.drop(columns=TARGET_NAME)
y_train = df_balanced[TARGET_NAME]

### Saving the training and test datasets<a class="anchor" id="train_test_saving"></a>

In [1442]:
train = pd.concat([X_train, y_train], axis=1)
test = pd.concat([X_test, y_test], axis=1)

In [1443]:
df.to_csv(TRAIN_FULL_PATH, index=False, encoding='utf-8')
train.to_csv(TRAIN_PART_PATH, index=False, encoding='utf-8')
test.to_csv(TEST_PART_PATH, index=False, encoding='utf-8')
X_test_real.to_csv(TEST_REAL_NORM_PATH, index=False, encoding='utf-8')

### Building and evaluating baseline models<a class="anchor" id="baseline_modeling"></a>

**Logistic regression**

In [1444]:
model_lr = LogisticRegression()
model_lr.fit(X_train, y_train)

y_train_pred = model_lr.predict(X_train)
y_test_pred = model_lr.predict(X_test)
y_test_pred_real = model_lr.predict(X_test_real)


get_classification_report(y_train, y_train_pred, y_test, y_test_pred)

TRAIN

              precision    recall  f1-score   support

           0       0.67      0.86      0.75      3771
           1       0.72      0.46      0.56      2958

    accuracy                           0.68      6729
   macro avg       0.69      0.66      0.66      6729
weighted avg       0.69      0.68      0.67      6729

TEST

              precision    recall  f1-score   support

           0       0.80      0.86      0.83      1616
           1       0.55      0.46      0.50       634

    accuracy                           0.74      2250
   macro avg       0.68      0.66      0.66      2250
weighted avg       0.73      0.74      0.74      2250

CONFUSION MATRIX

col_0              0    1
Credit Default           
0               1384  232
1                345  289
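
The class 1 row of the TEST report can be recomputed by hand from the confusion matrix, which helps when comparing the models below. The short sketch uses only the four counts printed above (rows are true labels, columns are predictions).

```python
# Counts from the logistic regression confusion matrix on the test split.
tn, fp = 1384, 232   # true class 0: predicted 0, predicted 1
fn, tp = 345, 289    # true class 1: predicted 0, predicted 1

precision = tp / (tp + fp)   # share of predicted defaults that are real defaults
recall = tp / (tp + fn)      # share of real defaults that were found
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(round(precision, 2), round(recall, 2), round(f1, 2), round(accuracy, 2))
```

The rounded values match the report: precision 0.55, recall 0.46, F1 0.50, and accuracy 0.74.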


In [1445]:
###y_test_pred_real[2000:2500]

In [1446]:
###X_test_real['Credit Default'] = np.array(y_test_pred_real)

In [1447]:
###X_test_real['Credit Default'].value_counts()

In [1448]:
###X_test_real.info()

In [1449]:
###X_test_real = pd.read_csv(TEST_REAL_NORM_PATH)

**k-nearest neighbors**

In [1450]:
model_knn = KNeighborsClassifier()
model_knn.fit(X_train, y_train)

y_train_pred = model_knn.predict(X_train)
y_test_pred = model_knn.predict(X_test)
y_test_pred_real = model_knn.predict(X_test_real)

get_classification_report(y_train, y_train_pred, y_test, y_test_pred)

TRAIN

              precision    recall  f1-score   support

           0       0.78      0.86      0.82      3771
           1       0.80      0.68      0.74      2958

    accuracy                           0.78      6729
   macro avg       0.79      0.77      0.78      6729
weighted avg       0.79      0.78      0.78      6729

TEST

              precision    recall  f1-score   support

           0       0.78      0.77      0.78      1616
           1       0.44      0.44      0.44       634

    accuracy                           0.68      2250
   macro avg       0.61      0.61      0.61      2250
weighted avg       0.68      0.68      0.68      2250

CONFUSION MATRIX

col_0              0    1
Credit Default           
0               1251  365
1                352  282


In [1451]:
#y_test_pred = model_knn.predict(X_test_real)
####y_test_pred_real[1000:1500]

**Boosting algorithms**

*XGBoost*

In [1452]:
model_xgb = xgb.XGBClassifier(random_state=21)
model_xgb.fit(X_train, y_train)

y_train_pred = model_xgb.predict(X_train)
y_test_pred = model_xgb.predict(X_test)
y_test_pred_real = model_xgb.predict(X_test_real)
get_classification_report(y_train, y_train_pred, y_test, y_test_pred)

TRAIN

              precision    recall  f1-score   support

           0       0.73      0.87      0.80      3771
           1       0.79      0.59      0.68      2958

    accuracy                           0.75      6729
   macro avg       0.76      0.73      0.74      6729
weighted avg       0.76      0.75      0.74      6729

TEST

              precision    recall  f1-score   support

           0       0.81      0.85      0.83      1616
           1       0.56      0.48      0.51       634

    accuracy                           0.75      2250
   macro avg       0.68      0.66      0.67      2250
weighted avg       0.74      0.75      0.74      2250

CONFUSION MATRIX

col_0              0    1
Credit Default           
0               1377  239
1                332  302


In [1453]:
###y_test_pred_real[1000:1500]

*LightGBM*

In [1454]:
model_lgbm = lgbm.LGBMClassifier(random_state=21)
model_lgbm.fit(X_train, y_train)

y_train_pred = model_lgbm.predict(X_train)
y_test_pred = model_lgbm.predict(X_test)
y_test_pred_real = model_lgbm.predict(X_test_real)

get_classification_report(y_train, y_train_pred, y_test, y_test_pred)

TRAIN

              precision    recall  f1-score   support

           0       0.91      0.96      0.93      3771
           1       0.95      0.88      0.91      2958

    accuracy                           0.92      6729
   macro avg       0.93      0.92      0.92      6729
weighted avg       0.93      0.92      0.92      6729

TEST

              precision    recall  f1-score   support

           0       0.79      0.84      0.82      1616
           1       0.52      0.45      0.48       634

    accuracy                           0.73      2250
   macro avg       0.66      0.64      0.65      2250
weighted avg       0.72      0.73      0.72      2250

CONFUSION MATRIX

col_0              0    1
Credit Default           
0               1355  261
1                351  283


*CatBoost*

In [1455]:
model_catb = catb.CatBoostClassifier(silent=True, random_state=21)
model_catb.fit(X_train, y_train)

y_train_pred = model_catb.predict(X_train)
y_test_pred = model_catb.predict(X_test)
y_test_pred_real = model_catb.predict(X_test_real)

get_classification_report(y_train, y_train_pred, y_test, y_test_pred)

TRAIN

              precision    recall  f1-score   support

           0       0.89      0.96      0.92      3771
           1       0.94      0.85      0.89      2958

    accuracy                           0.91      6729
   macro avg       0.92      0.90      0.91      6729
weighted avg       0.91      0.91      0.91      6729

TEST

              precision    recall  f1-score   support

           0       0.80      0.86      0.83      1616
           1       0.57      0.46      0.51       634

    accuracy                           0.75      2250
   macro avg       0.68      0.66      0.67      2250
weighted avg       0.74      0.75      0.74      2250

CONFUSION MATRIX

col_0              0    1
Credit Default           
0               1392  224
1                342  292


### Choosing the best model and tuning hyperparameters<a class="anchor" id="tuning_best_model"></a>

In [1456]:
model_catb = catb.CatBoostClassifier(class_weights=[1, 3.5], silent=True, random_state=21)

**Hyperparameter tuning**

In [1457]:
params = {'n_estimators':[50, 100, 200, 500, 700, 1000, 1200, 1500],
          'max_depth':[3, 5, 7]}

In [1458]:
cv=KFold(n_splits=3, random_state=21, shuffle=True)

In [1459]:
%%time

rs = RandomizedSearchCV(model_catb, params, scoring='f1', cv=cv, n_jobs=-1)
rs.fit(X, y)

CPU times: user 4.15 s, sys: 226 ms, total: 4.38 s
Wall time: 1min 58s


In [1460]:
rs.best_params_

{'max_depth': 5, 'n_estimators': 500}

In [1461]:
rs.best_score_

0.5416879704626523
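
`RandomizedSearchCV` does not try every combination: with its default `n_iter=10` it samples 10 candidates from the grid. The sketch below counts the full grid and draws a sample with scikit-learn's `ParameterSampler`. The `random_state` here is an arbitrary choice, so the sampled candidates need not match the ones evaluated above.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

params = {'n_estimators': [50, 100, 200, 500, 700, 1000, 1200, 1500],
          'max_depth': [3, 5, 7]}

# Full grid: 8 values of n_estimators times 3 values of max_depth.
grid_size = len(ParameterGrid(params))

# n_iter=10 mirrors the RandomizedSearchCV default.
sampled = list(ParameterSampler(params, n_iter=10, random_state=21))

print(grid_size, len(sampled))
for candidate in sampled[:3]:
    print(candidate)
```

So the best parameters reported above are the best among 10 of the 24 combinations, each scored with 3-fold cross-validation.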

**Training and evaluating the final model**

In [1462]:
%%time

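# n_estimators=200 is set by hand here; the search above returned n_estimators=500,
# and class_weights is not passed to this model
final_model = catb.CatBoostClassifier(n_estimators=200, max_depth=5,
                                      silent=True, random_state=21)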
final_model.fit(X_train, y_train)

y_train_pred = final_model.predict(X_train)
y_test_pred = final_model.predict(X_test)

get_classification_report(y_train, y_train_pred, y_test, y_test_pred)

TRAIN

              precision    recall  f1-score   support

           0       0.82      0.92      0.87      3771
           1       0.88      0.74      0.80      2958

    accuracy                           0.84      6729
   macro avg       0.85      0.83      0.83      6729
weighted avg       0.84      0.84      0.84      6729

TEST

              precision    recall  f1-score   support

           0       0.81      0.84      0.83      1616
           1       0.55      0.49      0.52       634

    accuracy                           0.74      2250
   macro avg       0.68      0.67      0.67      2250
weighted avg       0.74      0.74      0.74      2250

CONFUSION MATRIX

col_0              0    1
Credit Default           
0               1364  252
1                323  311
CPU times: user 1.32 s, sys: 54.9 ms, total: 1.37 s
Wall time: 955 ms
