## <center>Course project<a class="anchor" id="course_project"></a></center>

### Problem statement<a class="anchor" id="course_project_task"></a>

**Task**

Using the available data on bank customers, build a model on the training dataset that predicts default on the current loan, then produce predictions for the examples in the test dataset.

**Paths to directories and files**

In [1]:
TRAIN_DATASET_PATH = 'course_project_train.csv'
TEST_DATASET_PATH = 'course_project_test.csv'

**Target variable**

Credit Default - whether the borrower failed to meet the loan obligations

**Quality metric**

F1-score (sklearn.metrics.f1_score)

**Solution requirements**

*Target metric*
* F1 > 0.5
* The metric is evaluated on the quality of predictions for the main class (1 - loan default)
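
As a rough illustration of the target metric, the sketch below computes F1 for the positive class (1) from scratch. It is meant to mirror what `sklearn.metrics.f1_score` returns with its default binary settings; the function name `f1_positive` and the toy labels are assumptions made for this example.

```python
def f1_positive(y_true, y_pred):
    """F1 for class 1: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (not taken from the course dataset)
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
score = f1_positive(y_true, y_pred)
print(score)
```

Because only class 1 enters the formula, correctly predicted non-defaults do not raise the score by themselves.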

*The solution must contain*
1. A Jupyter Notebook with the code of your solution, named following the pattern {FullName}\_solution.ipynb, for example SShirkin\_solution.ipynb
2. A CSV file with the target variable predictions for the test dataset, named following the pattern {FullName}\_predictions.csv, for example SShirkin\_predictions.csv

*Recommendations for the code file (ipynb)*
1. The file should contain headings and comments (markdown)
2. Repeated operations are better wrapped in functions
3. Do not print large numbers of table rows (5-10 is enough)
4. Where possible, add plots that describe the data (about 3-5)
5. Include only the best model, that is, do not include every attempted variant of the solution
6. The project script must run from start to finish (from loading the data to exporting the predictions)
7. The whole project must be in a single script (ipynb file).
8. Python libraries and machine learning models covered in this course may be used.

**Submission deadline**

The project must be submitted within 5 days after the end of the last webinar.
Projects submitted before the deadline will be scored and presented as a ranking ordered by the given quality metric.
Projects submitted after the deadline or resubmitted are not included in the ranking, but their result will still be available.

### Approximate outline of the course project stages<a class="anchor" id="course_project_steps"></a>

**Building the classification model**
1. Overview of the training dataset
2. Outlier handling
3. Missing value handling
4. Data analysis
5. Feature selection
6. Class balancing
7. Trying models, obtaining a baseline
8. Choosing the best model, tuning hyperparameters
9. Quality check, combating overfitting
10. Interpretation of results

**Prediction on the test dataset**
1. Apply the same preprocessing and feature engineering steps to the test dataset
2. Predict the target variable using the model built on the training dataset
3. Predictions must be produced for all examples in the test dataset (for all rows)
4. Preserve the original order of the examples in the test dataset
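
The requirements for the test predictions (one prediction per row, original order preserved) can be checked mechanically. The following is a minimal sketch under stated assumptions: `make_predictions`, `toy_predict` and the output file name are hypothetical, and the threshold rule only stands in for a trained model.

```python
import pandas as pd


def make_predictions(df_test, predict_fn, target_name='Credit Default'):
    """Return one prediction per test row, aligned with the original row order."""
    preds = pd.Series(predict_fn(df_test), index=df_test.index, name=target_name)
    # Every test row must receive a prediction
    assert len(preds) == len(df_test)
    assert preds.notna().all()
    return preds.to_frame()


def toy_predict(frame):
    # Hypothetical rule used only for illustration, not a real model
    return (frame['Credit Score'].fillna(0) < 700).astype(int).to_numpy()


toy_test = pd.DataFrame({'Credit Score': [699.0, 739.0, None, 706.0]})
submission = make_predictions(toy_test, toy_predict)
# submission.to_csv('predictions.csv', index=False)  # placeholder file name
```

Reusing the index of the test frame for the predictions keeps the rows aligned even if the preprocessing step works on a copy of the data.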

### Data overview<a class="anchor" id="course_project_review"></a>

**Dataset description**

* **Home Ownership** - home ownership status
* **Annual Income** - annual income
* **Years in current job** - number of years in the current job
* **Tax Liens** - tax liens
* **Number of Open Accounts** - number of open accounts
* **Years of Credit History** - number of years of credit history
* **Maximum Open Credit** - largest open credit
* **Number of Credit Problems** - number of credit problems
* **Months since last delinquent** - number of months since the last delinquency
* **Bankruptcies** - bankruptcies
* **Purpose** - purpose of the loan
* **Term** - loan term
* **Current Loan Amount** - current loan amount
* **Current Credit Balance** - current credit balance
* **Monthly Debt** - monthly debt
* **Credit Default** - whether the loan obligations were not met (0 - repaid on time, 1 - default)

- - - 

**Importing libraries and scripts**

In [2]:
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

### Загрузка данных<a class="anchor" id="load_data"></a>

In [3]:
df_train = pd.read_csv(TRAIN_DATASET_PATH)
df_train.head()

Unnamed: 0,Home Ownership,Annual Income,Years in current job,Tax Liens,Number of Open Accounts,Years of Credit History,Maximum Open Credit,Number of Credit Problems,Months since last delinquent,Bankruptcies,Purpose,Term,Current Loan Amount,Current Credit Balance,Monthly Debt,Credit Score,Credit Default
0,Own Home,482087.0,,0.0,11.0,26.3,685960.0,1.0,,1.0,debt consolidation,Short Term,99999999.0,47386.0,7914.0,749.0,0
1,Own Home,1025487.0,10+ years,0.0,15.0,15.3,1181730.0,0.0,,0.0,debt consolidation,Long Term,264968.0,394972.0,18373.0,737.0,1
2,Home Mortgage,751412.0,8 years,0.0,11.0,35.0,1182434.0,0.0,,0.0,debt consolidation,Short Term,99999999.0,308389.0,13651.0,742.0,0
3,Own Home,805068.0,6 years,0.0,8.0,22.5,147400.0,1.0,,1.0,debt consolidation,Short Term,121396.0,95855.0,11338.0,694.0,0
4,Rent,776264.0,8 years,0.0,13.0,13.6,385836.0,1.0,,0.0,debt consolidation,Short Term,125840.0,93309.0,7180.0,719.0,0


In [4]:
df_train.shape

(7500, 17)

In [5]:
df_train.iloc[0]

Home Ownership                            Own Home
Annual Income                               482087
Years in current job                           NaN
Tax Liens                                        0
Number of Open Accounts                         11
Years of Credit History                       26.3
Maximum Open Credit                         685960
Number of Credit Problems                        1
Months since last delinquent                   NaN
Bankruptcies                                     1
Purpose                         debt consolidation
Term                                    Short Term
Current Loan Amount                          1e+08
Current Credit Balance                       47386
Monthly Debt                                  7914
Credit Score                                   749
Credit Default                                   0
Name: 0, dtype: object

In [6]:
df_train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7500 entries, 0 to 7499
Data columns (total 17 columns):
Home Ownership                  7500 non-null object
Annual Income                   5943 non-null float64
Years in current job            7129 non-null object
Tax Liens                       7500 non-null float64
Number of Open Accounts         7500 non-null float64
Years of Credit History         7500 non-null float64
Maximum Open Credit             7500 non-null float64
Number of Credit Problems       7500 non-null float64
Months since last delinquent    3419 non-null float64
Bankruptcies                    7486 non-null float64
Purpose                         7500 non-null object
Term                            7500 non-null object
Current Loan Amount             7500 non-null float64
Current Credit Balance          7500 non-null float64
Monthly Debt                    7500 non-null float64
Credit Score                    5943 non-null float64
Credit Default                  7

In [7]:
dtrain=df_train.describe()
dtrain

Unnamed: 0,Annual Income,Tax Liens,Number of Open Accounts,Years of Credit History,Maximum Open Credit,Number of Credit Problems,Months since last delinquent,Bankruptcies,Current Loan Amount,Current Credit Balance,Monthly Debt,Credit Score,Credit Default
count,5943.0,7500.0,7500.0,7500.0,7500.0,7500.0,3419.0,7486.0,7500.0,7500.0,7500.0,5943.0,7500.0
mean,1366392.0,0.030133,11.130933,18.317467,945153.7,0.17,34.6926,0.117152,11873180.0,289833.2,18314.454133,1151.087498,0.281733
std,845339.2,0.271604,4.908924,7.041946,16026220.0,0.498598,21.688806,0.347192,31926120.0,317871.4,11926.764673,1604.451418,0.449874
min,164597.0,0.0,2.0,4.0,0.0,0.0,0.0,0.0,11242.0,0.0,0.0,585.0,0.0
25%,844341.0,0.0,8.0,13.5,279229.5,0.0,16.0,0.0,180169.0,114256.5,10067.5,711.0,0.0
50%,1168386.0,0.0,10.0,17.0,478159.0,0.0,32.0,0.0,309573.0,209323.0,16076.5,731.0,0.0
75%,1640137.0,0.0,14.0,21.8,793501.5,0.0,50.0,0.0,519882.0,360406.2,23818.0,743.0,1.0
max,10149340.0,7.0,43.0,57.7,1304726000.0,7.0,118.0,4.0,100000000.0,6506797.0,136679.0,7510.0,1.0


In [8]:
df_test = pd.read_csv(TEST_DATASET_PATH)
df_test.head()

Unnamed: 0,Home Ownership,Annual Income,Years in current job,Tax Liens,Number of Open Accounts,Years of Credit History,Maximum Open Credit,Number of Credit Problems,Months since last delinquent,Bankruptcies,Purpose,Term,Current Loan Amount,Current Credit Balance,Monthly Debt,Credit Score
0,Rent,,4 years,0.0,9.0,12.5,220968.0,0.0,70.0,0.0,debt consolidation,Short Term,162470.0,105906.0,6813.0,
1,Rent,231838.0,1 year,0.0,6.0,32.7,55946.0,0.0,8.0,0.0,educational expenses,Short Term,78298.0,46037.0,2318.0,699.0
2,Home Mortgage,1152540.0,3 years,0.0,10.0,13.7,204600.0,0.0,,0.0,debt consolidation,Short Term,200178.0,146490.0,18729.0,7260.0
3,Home Mortgage,1220313.0,10+ years,0.0,16.0,17.0,456302.0,0.0,70.0,0.0,debt consolidation,Short Term,217382.0,213199.0,27559.0,739.0
4,Home Mortgage,2340952.0,6 years,0.0,11.0,23.6,1207272.0,0.0,,0.0,debt consolidation,Long Term,777634.0,425391.0,42605.0,706.0


In [9]:
df_test.shape

(2500, 16)

In [10]:
df_train.iloc[0]

Home Ownership                            Own Home
Annual Income                               482087
Years in current job                           NaN
Tax Liens                                        0
Number of Open Accounts                         11
Years of Credit History                       26.3
Maximum Open Credit                         685960
Number of Credit Problems                        1
Months since last delinquent                   NaN
Bankruptcies                                     1
Purpose                         debt consolidation
Term                                    Short Term
Current Loan Amount                          1e+08
Current Credit Balance                       47386
Monthly Debt                                  7914
Credit Score                                   749
Credit Default                                   0
Name: 0, dtype: object

In [11]:
df_test.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2500 entries, 0 to 2499
Data columns (total 16 columns):
Home Ownership                  2500 non-null object
Annual Income                   1987 non-null float64
Years in current job            2414 non-null object
Tax Liens                       2500 non-null float64
Number of Open Accounts         2500 non-null float64
Years of Credit History         2500 non-null float64
Maximum Open Credit             2500 non-null float64
Number of Credit Problems       2500 non-null float64
Months since last delinquent    1142 non-null float64
Bankruptcies                    2497 non-null float64
Purpose                         2500 non-null object
Term                            2500 non-null object
Current Loan Amount             2500 non-null float64
Current Credit Balance          2500 non-null float64
Monthly Debt                    2500 non-null float64
Credit Score                    1987 non-null float64
dtypes: float64(12), object(4)
me

In [12]:
dtest=df_test.describe()

Combine the training and test datasets

In [13]:
df = df_train.merge(df_test, how='outer')
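
An outer `merge` without explicit keys joins on all shared columns, so identical rows from the two frames may be matched to each other and the row order is left to pandas. If the goal is simply to stack the two frames, `pd.concat` is a more direct option. The sketch below uses toy frames and an extra `is_train` flag; both are assumptions for illustration and are not part of the original notebook.

```python
import pandas as pd

# Toy frames standing in for df_train and df_test
toy_train = pd.DataFrame({'Annual Income': [482087.0, 1025487.0],
                          'Credit Default': [0, 1]})
toy_test = pd.DataFrame({'Annual Income': [231838.0]})

stacked = pd.concat(
    [toy_train.assign(is_train=True), toy_test.assign(is_train=False)],
    ignore_index=True,  # build a fresh 0..n-1 index
    sort=False,         # keep the column order of the first frame
)

# Rows keep their order: train first, then test; the target is NaN for test rows
train_part = stacked[stacked['is_train']]
test_part = stacked[~stacked['is_train']]
```

The flag makes it possible to split the combined frame back into its two parts without relying on row positions.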

In [14]:
df

Unnamed: 0,Home Ownership,Annual Income,Years in current job,Tax Liens,Number of Open Accounts,Years of Credit History,Maximum Open Credit,Number of Credit Problems,Months since last delinquent,Bankruptcies,Purpose,Term,Current Loan Amount,Current Credit Balance,Monthly Debt,Credit Score,Credit Default
0,Own Home,482087.0,,0.0,11.0,26.3,685960.0,1.0,,1.0,debt consolidation,Short Term,99999999.0,47386.0,7914.0,749.0,0.0
1,Own Home,1025487.0,10+ years,0.0,15.0,15.3,1181730.0,0.0,,0.0,debt consolidation,Long Term,264968.0,394972.0,18373.0,737.0,1.0
2,Home Mortgage,751412.0,8 years,0.0,11.0,35.0,1182434.0,0.0,,0.0,debt consolidation,Short Term,99999999.0,308389.0,13651.0,742.0,0.0
3,Own Home,805068.0,6 years,0.0,8.0,22.5,147400.0,1.0,,1.0,debt consolidation,Short Term,121396.0,95855.0,11338.0,694.0,0.0
4,Rent,776264.0,8 years,0.0,13.0,13.6,385836.0,1.0,,0.0,debt consolidation,Short Term,125840.0,93309.0,7180.0,719.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9995,Home Mortgage,1020053.0,10+ years,0.0,14.0,29.1,559152.0,1.0,68.0,1.0,debt consolidation,Short Term,99999999.0,162735.0,15046.0,745.0,
9996,Home Mortgage,,2 years,0.0,15.0,17.0,1737780.0,0.0,77.0,0.0,debt consolidation,Short Term,468512.0,1439269.0,32996.0,,
9997,Home Mortgage,1171806.0,2 years,0.0,48.0,12.8,1706430.0,0.0,,0.0,debt consolidation,Short Term,430496.0,676438.0,36912.0,695.0,
9998,Rent,723520.0,10+ years,0.0,14.0,28.8,945780.0,0.0,,0.0,debt consolidation,Short Term,257774.0,391248.0,13506.0,744.0,


In [15]:
# Slightly extended summary of the DataFrame
def describe_plus(df):
    lst = np.array([[df[col].dtype, df[col].nunique()] for col in df.columns]).T
    stat = pd.DataFrame(df.describe(include =['object', 'float', 'int']))
    stat.loc['type'] = lst[0]
    stat.loc['NUnique'] = lst[1]
    stat.loc['NotNull'] =df.notnull().sum()
    return stat.T

In [16]:
stat = describe_plus(df)
stat

Unnamed: 0,count,unique,top,freq,mean,std,min,25%,50%,75%,max,type,NUnique,NotNull
Home Ownership,10000,4.0,Home Mortgage,4862.0,,,,,,,,object,4,10000
Annual Income,7930,,,,1366520.0,863828.0,106533.0,845989.0,1168810.0,1638690.0,14975600.0,float64,7107,7930
Years in current job,9543,11.0,10+ years,3142.0,,,,,,,,object,11,9543
Tax Liens,10000,,,,0.0314,0.304341,0.0,0.0,0.0,0.0,15.0,float64,9,10000
Number of Open Accounts,10000,,,,11.1443,4.89476,1.0,8.0,10.0,14.0,48.0,float64,42,10000
Years of Credit History,10000,,,,18.3196,7.09536,3.9,13.5,17.0,22.0,57.7,float64,423,10000
Maximum Open Credit,10000,,,,886508.0,13899800.0,0.0,278812.0,478181.0,794360.0,1304730000.0,float64,9096,10000
Number of Credit Problems,10000,,,,0.168,0.51459,0.0,0.0,0.0,0.0,15.0,float64,9,10000
Months since last delinquent,4561,,,,34.5646,21.772,0.0,16.0,32.0,50.0,118.0,float64,89,4561
Bankruptcies,9983,,,,0.114595,0.349729,0.0,0.0,0.0,0.0,5.0,float64,6,9983


1. The features Annual Income and Credit Score have the same number of missing values (5943 non-null values each in the training set);

In [17]:
#(dtrain[:-1]-dtest)/dtrain[:-1]*100

In [18]:
#dtrain.loc['mean']

In [19]:
#dtest.loc['mean']

### Type casting<a class="anchor" id="cast"></a>

In [20]:
#for colname in ['Home Ownership', 'Years in current job', 'Purpose', 'Term']:
#    df[colname] = df[colname].astype(str)

In [21]:
df.dtypes

Home Ownership                   object
Annual Income                   float64
Years in current job             object
Tax Liens                       float64
Number of Open Accounts         float64
Years of Credit History         float64
Maximum Open Credit             float64
Number of Credit Problems       float64
Months since last delinquent    float64
Bankruptcies                    float64
Purpose                          object
Term                             object
Current Loan Amount             float64
Current Credit Balance          float64
Monthly Debt                    float64
Credit Score                    float64
Credit Default                  float64
dtype: object

## Data overview<a class="anchor" id="review"></a>

**Target variable overview**

In [22]:
df['Credit Default'][0:df_train.shape[0]].value_counts()

0.0    5387
1.0    2113
Name: Credit Default, dtype: int64

The classes are noticeably imbalanced: 5387 observations of class 0 versus 2113 of class 1 (about 28% defaults).
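
To put the imbalance in context, the sketch below derives the share of defaults from the class counts above and the F1 that a trivial "always predict 1" classifier would obtain on the training set. This is only a back-of-the-envelope check and is not part of the original analysis.

```python
n_neg, n_pos = 5387, 2113  # class counts from the output above

positive_share = n_pos / (n_neg + n_pos)

# Predicting 1 for every row finds all positives (recall = 1),
# while precision equals the share of positives.
precision = positive_share
recall = 1.0
f1_always_one = 2 * precision * recall / (precision + recall)

print(round(positive_share, 3), round(f1_always_one, 3))
```

The trivial classifier lands at roughly 0.44, below the required F1 > 0.5, so a model has to do better than labelling every client as a defaulter.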

**Quantitative features overview**

In [23]:
df.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
Annual Income,7930.0,1366525.0,863827.6,106533.0,845989.25,1168813.5,1638693.0,14975610.0
Tax Liens,10000.0,0.0314,0.3043408,0.0,0.0,0.0,0.0,15.0
Number of Open Accounts,10000.0,11.1443,4.89476,1.0,8.0,10.0,14.0,48.0
Years of Credit History,10000.0,18.31958,7.095357,3.9,13.5,17.0,22.0,57.7
Maximum Open Credit,10000.0,886507.9,13899820.0,0.0,278811.5,478181.0,794359.5,1304726000.0
Number of Credit Problems,10000.0,0.168,0.5145896,0.0,0.0,0.0,0.0,15.0
Months since last delinquent,4561.0,34.56457,21.77199,0.0,16.0,32.0,50.0,118.0
Bankruptcies,9983.0,0.1145948,0.3497292,0.0,0.0,0.0,0.0,5.0
Current Loan Amount,10000.0,11943810.0,32008780.0,11242.0,180548.5,311718.0,521070.0,100000000.0
Current Credit Balance,10000.0,291474.1,333997.9,0.0,113225.75,209019.0,361950.0,6506797.0


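No glaring outliers stand out in the quantitative variables at first glance, although the maxima of Maximum Open Credit (about 1.3e9) and Current Loan Amount (100000000) lie far above their 75th percentiles and may deserve a closer look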

**Categorical features overview**

In [24]:
for cat_colname in df.select_dtypes(include='object').columns:
    print(str(cat_colname) + '\n\n' + str(df[cat_colname].value_counts()) + '\n' + '*' * 100 + '\n')

Home Ownership

Home Mortgage    4862
Rent             4224
Own Home          895
Have Mortgage      19
Name: Home Ownership, dtype: int64
****************************************************************************************************

Years in current job

10+ years    3142
2 years       917
3 years       848
< 1 year      770
5 years       685
1 year        657
4 years       621
6 years       563
7 years       536
8 years       446
9 years       358
Name: Years in current job, dtype: int64
****************************************************************************************************

Purpose

debt consolidation      7917
other                    905
home improvements        552
business loan            159
buy a car                130
medical bills             98
buy house                 53
take a trip               51
major purchase            49
small business            31
wedding                   17
educational expenses      13
moving                    12
vacation  

### 1. Annual Income and Credit Score

Annual Income and Credit Score both have missing values. Let us count the observations in which both features are missing at the same time.

In [25]:
n_mis = df_train[df_train['Annual Income'].isna() & df_train['Credit Score'].isna()].shape
n_mis

(1557, 17)

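Conclusion: the features are missing in pairs. Each column has 7500 - 5943 = 1557 missing values in the training set, and exactly 1557 rows lack both of them.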
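
The same conclusion can be checked directly by comparing the two missingness masks row by row. A minimal sketch on a toy frame (the values are illustrative, not the full dataset):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'Annual Income': [482087.0, np.nan, 751412.0, np.nan],
    'Credit Score': [749.0, np.nan, 742.0, np.nan],
})

income_missing = toy['Annual Income'].isna()
score_missing = toy['Credit Score'].isna()

# True only if every row has either both values or neither of them
missing_in_pairs = bool((income_missing == score_missing).all())

# Cross-tabulation of the two flags; off-diagonal cells should be zero
overlap = pd.crosstab(income_missing, score_missing)
print(missing_in_pairs)
print(overlap)
```

Unlike comparing two totals, the row-wise comparison would also catch the case where the counts match but the gaps sit in different rows.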

Let us compare the target class ratio between the rows where these values are missing and the rows where they are filled in

In [26]:
t0 = df_train['Credit Default'][df_train['Annual Income'].isna() & df_train['Credit Score'].isna()].value_counts()
t0

0    1028
1     529
Name: Credit Default, dtype: int64

In [27]:
t1 = df_train['Credit Default'][df_train['Annual Income'].notna() & df_train['Credit Score'].notna()].value_counts()
t1

0    4359
1    1584
Name: Credit Default, dtype: int64

Build the contingency table

In [28]:
table = np.array([t0,t1])
table[:,0]

array([1028, 4359], dtype=int64)

In [29]:
d_temp  = {"0":table[:,0],"1": table[:,1]}

In [30]:
tb = pd.DataFrame(d_temp, index=['missing', 'filled'])
tb

Unnamed: 0,0,1
missing,1028,529
filled,4359,1584


Let us test the significance of the difference with the chi-square test. The null hypothesis is that the frequency of classes 0 and 1 is the same in both samples.

In [31]:
chi2, p, dof, expected = chi2_contingency(table, correction=False)
p, chi2

(1.0809288763223263e-08, 32.68998224676483)
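
The statistic reported above can be reproduced by hand from the contingency table, which makes it easier to see what the test measures. The sketch below uses only the standard library; the shortcut for the p-value is valid for a 2x2 table (one degree of freedom) without continuity correction.

```python
import math

# Observed counts from the contingency table:
# rows = missing / filled, columns = class 0 / class 1
observed = [[1028, 529], [4359, 1584]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

chi2_stat = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Count expected if class frequencies were the same in both rows
        expected = row_totals[i] * col_totals[j] / total
        chi2_stat += (obs - expected) ** 2 / expected

# With 1 degree of freedom the chi-square survival function
# reduces to erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2_stat / 2))
print(chi2_stat, p_value)
```

The expected count of defaults among the rows with missing values is about 439, while 529 were observed, which is where most of the statistic comes from.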

The p-value is 1.08e-08, so the null hypothesis is rejected: the class frequencies do differ. The gaps therefore should not be filled with the overall sample mean. Let us instead look for quantiles of the filled-in data **in which the distribution of defaults matches the distribution in the rows with missing values as closely as possible**

In [32]:
round(df_train['Annual Income'].describe())

count        5943.0
mean      1366392.0
std        845339.0
min        164597.0
25%        844341.0
50%       1168386.0
75%       1640137.0
max      10149344.0
Name: Annual Income, dtype: float64

In [33]:
t_ann_inc = df_train['Credit Default'][df_train['Annual Income'] <= df_train['Annual Income'].quantile(0.15)].value_counts()
t_ann_inc

0    590
1    302
Name: Credit Default, dtype: int64

In [34]:
chi2, p, dof, expected = chi2_contingency([t0, t_ann_inc], correction=False)
p , chi2

(0.9522367550185078, 0.003587788120565959)

In [35]:
round(df['Credit Score'].describe())

count    7930.0
mean     1172.0
std      1640.0
min       585.0
25%       711.0
50%       731.0
75%       743.0
max      7510.0
Name: Credit Score, dtype: float64

In [36]:
t_cr_sc = df_train['Credit Default'][df_train['Credit Score'] <= df_train['Credit Score'].quantile(0.19)].value_counts()
t_cr_sc

0    759
1    390
Name: Credit Default, dtype: int64

In [37]:
chi2, p, dof, expected = chi2_contingency([t0, t_cr_sc], correction=False)
p , chi2

(0.9856898626940904, 0.0003217022133243376)
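
The thresholds 0.15 and 0.19 used above look hand-picked. One way such thresholds could be searched for is to scan a grid of quantiles and keep the one with the highest p-value against the reference counts. The sketch below is an assumption about how this might be automated: `scan_quantiles` is a hypothetical helper and the data is synthetic, not the course dataset.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency


def scan_quantiles(values, target, reference_counts, grid):
    """For each quantile q, compare class counts among rows with
    values <= the q-quantile against reference_counts (class 0, class 1)."""
    rows = []
    for q in grid:
        mask = values <= values.quantile(q)
        counts = target[mask].value_counts().reindex([0, 1], fill_value=0)
        if (counts == 0).any():
            continue  # the test needs both classes to be present
        _, p, _, _ = chi2_contingency(
            [list(reference_counts), counts.tolist()], correction=False)
        rows.append({'quantile': q, 'p_value': p})
    return pd.DataFrame(rows)


# Synthetic stand-in data
rng = np.random.default_rng(0)
values = pd.Series(rng.normal(size=2000))
target = pd.Series((rng.random(2000) < 0.3).astype(int))

result = scan_quantiles(values, target, reference_counts=[1028, 529],
                        grid=np.linspace(0.05, 0.5, 10))
best = result.loc[result['p_value'].idxmax()]
```

A high p-value only means that no difference was detected between the two class distributions; it does not show that the rows with missing values actually come from that part of the income or score range.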

In [38]:
annual_income = df_train['Annual Income'].loc[df_train['Annual Income'] <= df_train['Annual Income'].quantile(0.15)].mean()
credit_score = df_train['Credit Score'].loc[df_train['Credit Score'] <= df_train['Credit Score'].quantile(0.19)].mean()
print(annual_income, credit_score)

553752.7421524663 676.1235857267188


In [39]:
df.loc[df['Annual Income'].isna(), 'Annual Income'] = annual_income
df.loc[df['Credit Score'].isna(), 'Credit Score'] = credit_score

### 2.Years in current job

In [40]:
df['Years in current job'].isna()

0        True
1       False
2       False
3       False
4       False
        ...  
9995    False
9996    False
9997    False
9998    False
9999    False
Name: Years in current job, Length: 10000, dtype: bool

In [41]:
df['Years in current job'][df['Years in current job'].isna()].value_counts()

Series([], Name: Years in current job, dtype: int64)

In [42]:
t0 = df_train['Credit Default'][df_train['Years in current job'].isna()].value_counts()
t0

0    234
1    137
Name: Credit Default, dtype: int64

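The **train** part contains 371 rows with a missing value (234 + 137). The test part has gaps as well: `df_test.info()` shows 2414 non-null values out of 2500, that is, 86 missing values.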

In [43]:
t1 = df_train['Credit Default'][df_train['Years in current job'].notna()].value_counts()
t1

0    5153
1    1976
Name: Credit Default, dtype: int64

Fill **Years in current job** where the target is 1 with the most frequent value (mode) of **Years in current job** among rows with target 1, and likewise for target 0. The mode is used rather than the median because the feature is categorical.

In [44]:
med_1_YCJ = df['Years in current job'][(df['Years in current job'].notna()) & (df['Credit Default']==1)].mode()[0]
med_1_YCJ

'10+ years'

In [45]:
med_0_YCJ = df['Years in current job'][(df['Years in current job'].notna()) & (df['Credit Default']==0)].mode()[0]
med_0_YCJ

'10+ years'

In [46]:
med_YCJ = df['Years in current job'].mode()[0]

In [48]:
df.loc[df['Years in current job'].isna(), 'Years in current job'] = med_YCJ

In [49]:
# Note: the category is spelled '1 year' in the data, not '1 years'
df['YCJ'] = df['Years in current job'].map({'10+ years':'10', '2 years':'2', '3 years':'3', '4 years':'4', 
    '5 years':'5', '6 years':'6', '7 years':'7', '8 years':'8', '9 years':'9', '1 year':'1','< 1 year':'0'}).astype(float)
df['YCJ']

0       10.0
1       10.0
2        8.0
3        6.0
4        8.0
        ... 
9995    10.0
9996     2.0
9997     2.0
9998    10.0
9999    10.0
Name: YCJ, Length: 10000, dtype: float64
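
A dictionary mapping has to list every label exactly as it appears in the data, and any label it misses silently becomes NaN. As an alternative sketch, the number can be parsed out of the label itself; `years_to_number` is a hypothetical helper, and treating `'< 1 year'` as 0 follows the mapping used above.

```python
import numpy as np
import pandas as pd


def years_to_number(series):
    """Turn labels such as '10+ years', '1 year', '< 1 year' into floats."""
    years = series.str.extract(r'(\d+)', expand=False).astype(float)
    # '< 1 year' contains the digit 1 but means less than one year
    years[series.str.startswith('<', na=False)] = 0.0
    return years


labels = pd.Series(['10+ years', '1 year', '< 1 year', '8 years', np.nan])
parsed = years_to_number(labels)
unmapped = int(parsed.isna().sum())  # labels that produced no number
```

Counting NaN values after the conversion is a cheap way to notice a label that the conversion did not cover.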

In [50]:
df

Unnamed: 0,Home Ownership,Annual Income,Years in current job,Tax Liens,Number of Open Accounts,Years of Credit History,Maximum Open Credit,Number of Credit Problems,Months since last delinquent,Bankruptcies,Purpose,Term,Current Loan Amount,Current Credit Balance,Monthly Debt,Credit Score,Credit Default,YCJ
0,Own Home,4.820870e+05,10+ years,0.0,11.0,26.3,685960.0,1.0,,1.0,debt consolidation,Short Term,99999999.0,47386.0,7914.0,749.000000,0.0,10.0
1,Own Home,1.025487e+06,10+ years,0.0,15.0,15.3,1181730.0,0.0,,0.0,debt consolidation,Long Term,264968.0,394972.0,18373.0,737.000000,1.0,10.0
2,Home Mortgage,7.514120e+05,8 years,0.0,11.0,35.0,1182434.0,0.0,,0.0,debt consolidation,Short Term,99999999.0,308389.0,13651.0,742.000000,0.0,8.0
3,Own Home,8.050680e+05,6 years,0.0,8.0,22.5,147400.0,1.0,,1.0,debt consolidation,Short Term,121396.0,95855.0,11338.0,694.000000,0.0,6.0
4,Rent,7.762640e+05,8 years,0.0,13.0,13.6,385836.0,1.0,,0.0,debt consolidation,Short Term,125840.0,93309.0,7180.0,719.000000,0.0,8.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9995,Home Mortgage,1.020053e+06,10+ years,0.0,14.0,29.1,559152.0,1.0,68.0,1.0,debt consolidation,Short Term,99999999.0,162735.0,15046.0,745.000000,,10.0
9996,Home Mortgage,5.537527e+05,2 years,0.0,15.0,17.0,1737780.0,0.0,77.0,0.0,debt consolidation,Short Term,468512.0,1439269.0,32996.0,676.123586,,2.0
9997,Home Mortgage,1.171806e+06,2 years,0.0,48.0,12.8,1706430.0,0.0,,0.0,debt consolidation,Short Term,430496.0,676438.0,36912.0,695.000000,,2.0
9998,Rent,7.235200e+05,10+ years,0.0,14.0,28.8,945780.0,0.0,,0.0,debt consolidation,Short Term,257774.0,391248.0,13506.0,744.000000,,10.0
