## Data preparation and EDA

In this section I clean and prepare the dataset for the model. The steps are:

- Download the data from the given link.
- Recode the categorical columns (status, home, marital, records, and job) by mapping their numeric codes to descriptive labels.
- Replace the placeholder value 99999999 (the maximum of the income, assets, and debt columns, which marks a missing value) with NaN.
- Replace the NaNs in the dataframe with 0 (done later, when building the feature matrix).
- Keep only the rows whose status is either ok or default.
- Split the data in two steps into 60% train, 20% validation, and 20% test sets, using random seed 11.
- Convert the target variable status from categorical to binary, where 0 represents ok and 1 represents default.
- Finally, delete the target variable from the train, validation, and test dataframes.


In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt 

In [2]:
data=pd.read_csv("/Users/victoroshimua/Machine-learning-zoomcamp-/DATA/credit_risk.csv")

In [3]:
data.head()

Unnamed: 0,Status,Seniority,Home,Time,Age,Marital,Records,Job,Expenses,Income,Assets,Debt,Amount,Price
0,1,9,1,60,30,2,1,3,73,129,0,0,800,846
1,1,17,1,60,58,3,1,1,48,131,0,0,1000,1658
2,2,10,2,36,46,2,2,3,90,200,3000,0,2000,2985
3,1,0,1,60,24,1,1,1,63,182,2500,0,900,1325
4,1,0,1,36,26,1,1,1,46,107,0,0,310,910


In [4]:
data.columns

Index(['Status', 'Seniority', 'Home', 'Time', 'Age', 'Marital', 'Records',
       'Job', 'Expenses', 'Income', 'Assets', 'Debt', 'Amount', 'Price'],
      dtype='object')

In [5]:
data.columns=data.columns.str.lower()

In [6]:
data.status.value_counts()

1    3200
2    1254
0       1
Name: status, dtype: int64

In [7]:
data["status"]=data.status.map({1:"ok",2:"default",0:"unk"})
data.head()

Unnamed: 0,status,seniority,home,time,age,marital,records,job,expenses,income,assets,debt,amount,price
0,ok,9,1,60,30,2,1,3,73,129,0,0,800,846
1,ok,17,1,60,58,3,1,1,48,131,0,0,1000,1658
2,default,10,2,36,46,2,2,3,90,200,3000,0,2000,2985
3,ok,0,1,60,24,1,1,1,63,182,2500,0,900,1325
4,ok,0,1,36,26,1,1,1,46,107,0,0,310,910


In [8]:
home_values = {
    1: 'rent',
    2: 'owner',
    3: 'private',
    4: 'ignore',
    5: 'parents',
    6: 'other',
    0: 'unk'
}

data.home = data.home.map(home_values)

marital_values = {
    1: 'single',
    2: 'married',
    3: 'widow',
    4: 'separated',
    5: 'divorced',
    0: 'unk'
}

data.marital = data.marital.map(marital_values)

records_values = {
    1: 'no',
    2: 'yes',
    0: 'unk'
}

data.records = data.records.map(records_values)

job_values = {
    1: 'fixed',
    2: 'partime',
    3: 'freelance',
    4: 'others',
    0: 'unk'
}

data.job = data.job.map(job_values)


In [9]:
data.head()

Unnamed: 0,status,seniority,home,time,age,marital,records,job,expenses,income,assets,debt,amount,price
0,ok,9,rent,60,30,married,no,freelance,73,129,0,0,800,846
1,ok,17,rent,60,58,widow,no,fixed,48,131,0,0,1000,1658
2,default,10,owner,36,46,married,yes,freelance,90,200,3000,0,2000,2985
3,ok,0,rent,60,24,single,no,fixed,63,182,2500,0,900,1325
4,ok,0,rent,36,26,single,no,fixed,46,107,0,0,310,910


In [10]:
data.describe().round()

Unnamed: 0,seniority,time,age,expenses,income,assets,debt,amount,price
count,4455.0,4455.0,4455.0,4455.0,4455.0,4455.0,4455.0,4455.0,4455.0
mean,8.0,46.0,37.0,56.0,763317.0,1060341.0,404382.0,1039.0,1463.0
std,8.0,15.0,11.0,20.0,8703625.0,10217569.0,6344253.0,475.0,628.0
min,0.0,6.0,18.0,35.0,0.0,0.0,0.0,100.0,105.0
25%,2.0,36.0,28.0,35.0,80.0,0.0,0.0,700.0,1118.0
50%,5.0,48.0,36.0,51.0,120.0,3500.0,0.0,1000.0,1400.0
75%,12.0,60.0,45.0,72.0,166.0,6000.0,0.0,1300.0,1692.0
max,48.0,72.0,68.0,180.0,99999999.0,99999999.0,99999999.0,5000.0,11140.0


In [11]:
# According to the data description, 99999999 represents a missing value.
# Replace it with NaN so the missing values become visible.
for c in ["income","assets","debt"]:
    data[c] = data[c].replace(99999999,np.nan)


In [12]:
data.describe().round()

Unnamed: 0,seniority,time,age,expenses,income,assets,debt,amount,price
count,4455.0,4455.0,4455.0,4455.0,4421.0,4408.0,4437.0,4455.0,4455.0
mean,8.0,46.0,37.0,56.0,131.0,5403.0,343.0,1039.0,1463.0
std,8.0,15.0,11.0,20.0,86.0,11573.0,1246.0,475.0,628.0
min,0.0,6.0,18.0,35.0,0.0,0.0,0.0,100.0,105.0
25%,2.0,36.0,28.0,35.0,80.0,0.0,0.0,700.0,1118.0
50%,5.0,48.0,36.0,51.0,120.0,3000.0,0.0,1000.0,1400.0
75%,12.0,60.0,45.0,72.0,165.0,6000.0,0.0,1300.0,1692.0
max,48.0,72.0,68.0,180.0,959.0,300000.0,30000.0,5000.0,11140.0
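
To see why the placeholder has to go before computing any statistics, here is a minimal sketch on made-up numbers (not rows from the dataset). The name `SENTINEL` and the toy values are my own assumptions for illustration; only the code 99999999 comes from the notebook.

```python
import math
from statistics import mean

SENTINEL = 99999999  # missing-value code used in income, assets and debt

# Hypothetical income values; the last one is the missing-value code.
incomes = [80, 120, 165, SENTINEL]

# Treating the code as a real number inflates the mean enormously.
naive_mean = mean(incomes)

# Replace the code with NaN, then skip NaNs when averaging
# (pandas skips NaNs by default in describe() and mean()).
cleaned = [float("nan") if v == SENTINEL else float(v) for v in incomes]
clean_mean = mean(v for v in cleaned if not math.isnan(v))

print(naive_mean)
print(round(clean_mean, 1))
```

This is the same effect as in the two `describe()` outputs: the mean of income drops from hundreds of thousands to a plausible value once the code is treated as missing.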


In [13]:
data.isnull().sum()

status        0
seniority     0
home          0
time          0
age           0
marital       0
records       0
job           0
expenses      0
income       34
assets       47
debt         18
amount        0
price         0
dtype: int64

In [14]:
data.status.value_counts()

ok         3200
default    1254
unk           1
Name: status, dtype: int64

In [15]:
data=data[data.status != "unk"].reset_index(drop=True)

In [16]:
data.status.value_counts()

ok         3200
default    1254
Name: status, dtype: int64

In [17]:
from sklearn.model_selection import train_test_split

In [18]:
data["status"]=(data["status"]=="default").astype(int)
data["status"]

0       0
1       0
2       1
3       0
4       0
       ..
4449    1
4450    0
4451    1
4452    0
4453    0
Name: status, Length: 4454, dtype: int64

In [19]:
data_full_train,data_test = train_test_split(data,test_size=0.2,random_state=11)
data_train,data_val=train_test_split(data_full_train,test_size=0.25,random_state=11)
len(data_train), len(data_val),len(data_test)

(2672, 891, 891)
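
The second split uses `test_size=0.25` because 25% of the remaining 80% is 20% of the original data. The arithmetic can be sketched as below. It assumes the held-out share is rounded up to a whole number of rows, which is how I understand `train_test_split` to handle a fractional `test_size`; the variable names are mine.

```python
import math

n_rows = 4454  # rows left after dropping the single "unk" status

# First split: hold out 20% of all rows as the test set.
n_test = math.ceil(0.2 * n_rows)
n_full_train = n_rows - n_test

# Second split: 25% of the remaining 80% equals 20% of the original data.
n_val = math.ceil(0.25 * n_full_train)
n_train = n_full_train - n_val

print(n_train, n_val, n_test)
print(round(n_train / n_rows, 2), round(n_val / n_rows, 2), round(n_test / n_rows, 2))
```

The counts agree with the lengths printed above, and the shares come out at roughly 60/20/20.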

In [20]:
data_train=data_train.reset_index(drop=True)
data_test=data_test.reset_index(drop=True)
data_val=data_val.reset_index(drop=True)

In [21]:
data_train

Unnamed: 0,status,seniority,home,time,age,marital,records,job,expenses,income,assets,debt,amount,price
0,1,10,owner,36,36,married,no,freelance,75,0.0,10000.0,0.0,1000,1400
1,1,6,parents,48,32,single,yes,fixed,35,85.0,0.0,0.0,1100,1330
2,0,1,parents,48,40,married,no,fixed,75,121.0,0.0,0.0,1320,1600
3,1,1,parents,48,23,single,no,partime,35,72.0,0.0,0.0,1078,1079
4,0,5,owner,36,46,married,no,freelance,60,100.0,4000.0,0.0,1100,1897
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2667,0,18,private,36,45,married,no,fixed,45,220.0,20000.0,0.0,800,1600
2668,0,7,private,60,29,married,no,fixed,60,51.0,3500.0,500.0,1000,1290
2669,0,1,parents,24,19,single,no,fixed,35,28.0,0.0,0.0,400,600
2670,0,15,owner,48,43,married,no,freelance,60,100.0,18000.0,0.0,2500,2976


In [22]:
Y_train=data_train["status"].values
Y_test=data_test["status"].values
Y_val=data_val["status"].values

In [23]:
Y_train

array([1, 1, 0, ..., 0, 0, 0])

In [24]:
del data_train["status"]
del data_test["status"]
del data_val["status"]

In [25]:
data_train

Unnamed: 0,seniority,home,time,age,marital,records,job,expenses,income,assets,debt,amount,price
0,10,owner,36,36,married,no,freelance,75,0.0,10000.0,0.0,1000,1400
1,6,parents,48,32,single,yes,fixed,35,85.0,0.0,0.0,1100,1330
2,1,parents,48,40,married,no,fixed,75,121.0,0.0,0.0,1320,1600
3,1,parents,48,23,single,no,partime,35,72.0,0.0,0.0,1078,1079
4,5,owner,36,46,married,no,freelance,60,100.0,4000.0,0.0,1100,1897
...,...,...,...,...,...,...,...,...,...,...,...,...,...
2667,18,private,36,45,married,no,fixed,45,220.0,20000.0,0.0,800,1600
2668,7,private,60,29,married,no,fixed,60,51.0,3500.0,500.0,1000,1290
2669,1,parents,24,19,single,no,fixed,35,28.0,0.0,0.0,400,600
2670,15,owner,48,43,married,no,freelance,60,100.0,18000.0,0.0,2500,2976


## Decision trees

In [26]:
# A decision tree written by hand, without scikit-learn
def assess_risk(client):
    if client['records'] == 'yes':
        # the dataset spells this category 'partime' (see job_values)
        if client['job'] == 'partime':
            return 'default'
        else:
            return 'ok'
    else:
        if client['assets'] > 6000:
            return 'ok'
        else:
            return 'default'

In [27]:
dt=data_train.iloc[0].to_dict()
assess_risk(dt)

'ok'

In [28]:
for i in range(len(data_train)):
    dt = data_train.iloc[i].to_dict()
    risk_level = assess_risk(dt)
    print("Risk level for client", i+1, ":", risk_level)


Risk level for client 1 : ok
Risk level for client 2 : ok
Risk level for client 3 : default
Risk level for client 4 : default
Risk level for client 5 : default
Risk level for client 6 : ok
Risk level for client 7 : default
Risk level for client 8 : default
Risk level for client 9 : default
Risk level for client 10 : default
Risk level for client 11 : ok
Risk level for client 12 : default
Risk level for client 13 : default
Risk level for client 14 : default
Risk level for client 15 : default
Risk level for client 16 : default
Risk level for client 17 : default
Risk level for client 18 : default
Risk level for client 19 : default
Risk level for client 20 : default
Risk level for client 21 : default
Risk level for client 22 : default
Risk level for client 23 : default
Risk level for client 24 : default
Risk level for client 25 : ok
Risk level for client 26 : default
Risk level for client 27 : default
Risk level for client 28 : default
Risk level for client 29 : ok
...

Risk level for client 306 : default
Risk level for client 307 : ok
Risk level for client 308 : ok
Risk level for client 309 : default
Risk level for client 310 : default
Risk level for client 311 : ok
Risk level for client 312 : default
Risk level for client 313 : default
Risk level for client 314 : default
Risk level for client 315 : default
Risk level for client 316 : default
Risk level for client 317 : ok
Risk level for client 318 : ok
Risk level for client 319 : default
Risk level for client 320 : ok
Risk level for client 321 : ok
Risk level for client 322 : default
Risk level for client 323 : ok
Risk level for client 324 : ok
Risk level for client 325 : default
Risk level for client 326 : ok
Risk level for client 327 : default
Risk level for client 328 : default
Risk level for client 329 : default
Risk level for client 330 : default
Risk level for client 331 : ok
Risk level for client 332 : default
Risk level for client 333 : ok
Risk level for client 334 : default
...

Risk level for client 595 : default
Risk level for client 596 : default
Risk level for client 597 : ok
Risk level for client 598 : default
Risk level for client 599 : default
Risk level for client 600 : default
Risk level for client 601 : default
Risk level for client 602 : default
Risk level for client 603 : default
Risk level for client 604 : default
Risk level for client 605 : ok
Risk level for client 606 : default
Risk level for client 607 : default
Risk level for client 608 : default
Risk level for client 609 : default
Risk level for client 610 : ok
Risk level for client 611 : ok
Risk level for client 612 : default
Risk level for client 613 : default
Risk level for client 614 : default
Risk level for client 615 : default
Risk level for client 616 : default
Risk level for client 617 : default
Risk level for client 618 : ok
Risk level for client 619 : default
Risk level for client 620 : ok
Risk level for client 621 : ok
Risk level for client 622 : default
...

Risk level for client 914 : default
Risk level for client 915 : default
Risk level for client 916 : ok
Risk level for client 917 : ok
Risk level for client 918 : default
Risk level for client 919 : default
Risk level for client 920 : default
Risk level for client 921 : default
Risk level for client 922 : default
Risk level for client 923 : default
Risk level for client 924 : default
Risk level for client 925 : default
Risk level for client 926 : default
Risk level for client 927 : default
Risk level for client 928 : ok
Risk level for client 929 : default
Risk level for client 930 : ok
Risk level for client 931 : default
Risk level for client 932 : ok
Risk level for client 933 : ok
Risk level for client 934 : ok
Risk level for client 935 : ok
Risk level for client 936 : default
Risk level for client 937 : default
Risk level for client 938 : default
Risk level for client 939 : ok
Risk level for client 940 : default
Risk level for client 941 : default
Risk level for client 942 : ok
...

Risk level for client 1171 : default
Risk level for client 1172 : default
Risk level for client 1173 : ok
Risk level for client 1174 : default
Risk level for client 1175 : default
Risk level for client 1176 : ok
Risk level for client 1177 : default
Risk level for client 1178 : default
Risk level for client 1179 : default
Risk level for client 1180 : default
Risk level for client 1181 : default
Risk level for client 1182 : default
Risk level for client 1183 : ok
Risk level for client 1184 : default
Risk level for client 1185 : default
Risk level for client 1186 : default
Risk level for client 1187 : default
Risk level for client 1188 : ok
Risk level for client 1189 : default
Risk level for client 1190 : ok
Risk level for client 1191 : default
Risk level for client 1192 : default
Risk level for client 1193 : default
Risk level for client 1194 : default
Risk level for client 1195 : default
Risk level for client 1196 : default
Risk level for client 1197 : default
...

Risk level for client 1412 : default
Risk level for client 1413 : default
Risk level for client 1414 : ok
Risk level for client 1415 : default
Risk level for client 1416 : default
Risk level for client 1417 : default
Risk level for client 1418 : default
Risk level for client 1419 : ok
Risk level for client 1420 : default
Risk level for client 1421 : default
Risk level for client 1422 : ok
Risk level for client 1423 : ok
Risk level for client 1424 : default
Risk level for client 1425 : default
Risk level for client 1426 : default
Risk level for client 1427 : ok
Risk level for client 1428 : default
Risk level for client 1429 : ok
Risk level for client 1430 : ok
Risk level for client 1431 : default
Risk level for client 1432 : ok
Risk level for client 1433 : ok
Risk level for client 1434 : ok
Risk level for client 1435 : default
Risk level for client 1436 : default
Risk level for client 1437 : default
Risk level for client 1438 : default
Risk level for client 1439 : default
...

Risk level for client 1662 : default
Risk level for client 1663 : ok
Risk level for client 1664 : ok
Risk level for client 1665 : ok
Risk level for client 1666 : default
Risk level for client 1667 : default
Risk level for client 1668 : default
Risk level for client 1669 : ok
Risk level for client 1670 : default
Risk level for client 1671 : ok
Risk level for client 1672 : default
Risk level for client 1673 : default
Risk level for client 1674 : default
Risk level for client 1675 : default
Risk level for client 1676 : default
Risk level for client 1677 : default
Risk level for client 1678 : default
Risk level for client 1679 : default
Risk level for client 1680 : ok
Risk level for client 1681 : default
Risk level for client 1682 : ok
Risk level for client 1683 : ok
Risk level for client 1684 : ok
Risk level for client 1685 : ok
Risk level for client 1686 : default
Risk level for client 1687 : default
Risk level for client 1688 : ok
Risk level for client 1689 : default
...

Risk level for client 1925 : default
Risk level for client 1926 : default
Risk level for client 1927 : default
Risk level for client 1928 : ok
Risk level for client 1929 : ok
Risk level for client 1930 : default
Risk level for client 1931 : ok
Risk level for client 1932 : ok
Risk level for client 1933 : default
Risk level for client 1934 : default
Risk level for client 1935 : ok
Risk level for client 1936 : default
Risk level for client 1937 : default
Risk level for client 1938 : ok
Risk level for client 1939 : default
Risk level for client 1940 : default
Risk level for client 1941 : ok
Risk level for client 1942 : ok
Risk level for client 1943 : ok
Risk level for client 1944 : default
Risk level for client 1945 : default
Risk level for client 1946 : ok
Risk level for client 1947 : default
Risk level for client 1948 : default
Risk level for client 1949 : default
Risk level for client 1950 : default
Risk level for client 1951 : default
Risk level for client 1952 : default
...

Risk level for client 2223 : ok
Risk level for client 2224 : default
Risk level for client 2225 : default
Risk level for client 2226 : default
Risk level for client 2227 : default
Risk level for client 2228 : ok
Risk level for client 2229 : default
Risk level for client 2230 : default
Risk level for client 2231 : default
Risk level for client 2232 : default
Risk level for client 2233 : default
Risk level for client 2234 : default
Risk level for client 2235 : default
Risk level for client 2236 : default
Risk level for client 2237 : default
Risk level for client 2238 : ok
Risk level for client 2239 : ok
Risk level for client 2240 : ok
Risk level for client 2241 : default
Risk level for client 2242 : ok
Risk level for client 2243 : ok
Risk level for client 2244 : ok
Risk level for client 2245 : default
Risk level for client 2246 : default
Risk level for client 2247 : default
Risk level for client 2248 : default
Risk level for client 2249 : default
Risk level for client 2250 : default
...

Risk level for client 2524 : default
Risk level for client 2525 : ok
Risk level for client 2526 : ok
Risk level for client 2527 : ok
Risk level for client 2528 : ok
Risk level for client 2529 : default
Risk level for client 2530 : default
Risk level for client 2531 : ok
Risk level for client 2532 : ok
Risk level for client 2533 : ok
Risk level for client 2534 : default
Risk level for client 2535 : default
Risk level for client 2536 : default
Risk level for client 2537 : default
Risk level for client 2538 : default
Risk level for client 2539 : default
Risk level for client 2540 : default
Risk level for client 2541 : default
Risk level for client 2542 : default
Risk level for client 2543 : default
Risk level for client 2544 : default
Risk level for client 2545 : default
Risk level for client 2546 : ok
Risk level for client 2547 : default
Risk level for client 2548 : ok
Risk level for client 2549 : ok
Risk level for client 2550 : default
Risk level for client 2551 : default
...
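
Printing one line per client makes it hard to judge how good the hand-written rule is. A minimal sketch of scoring it against the 0/1 labels instead is shown below. The helper `rule_accuracy` and the toy clients are hypothetical, added only for illustration; the rule is the same as in the notebook, with the job label spelled 'partime' as it appears in the data.

```python
def assess_risk(client):
    """Hand-written rule, with the job label spelled 'partime' as in the data."""
    if client["records"] == "yes":
        return "default" if client["job"] == "partime" else "ok"
    return "ok" if client["assets"] > 6000 else "default"


def rule_accuracy(clients, labels):
    """Share of clients for which the rule agrees with the 0/1 label (1 = default)."""
    predictions = [int(assess_risk(c) == "default") for c in clients]
    hits = sum(p == y for p, y in zip(predictions, labels))
    return hits / len(labels)


# Hypothetical clients and labels, for illustration only.
toy_clients = [
    {"records": "yes", "job": "partime", "assets": 0.0},
    {"records": "yes", "job": "fixed", "assets": 3000.0},
    {"records": "no", "job": "fixed", "assets": 10000.0},
    {"records": "no", "job": "freelance", "assets": 0.0},
]
toy_labels = [1, 1, 0, 0]

print(rule_accuracy(toy_clients, toy_labels))
```

On the notebook's data the call would be `rule_accuracy(data_train.to_dict(orient="records"), Y_train)`. Note that a missing `assets` value (NaN) fails the `> 6000` comparison, so such clients fall into the 'default' branch.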

In [29]:
# Decision tree with scikit-learn
from sklearn.tree import DecisionTreeClassifier
from sklearn.feature_extraction import DictVectorizer
from sklearn.metrics import roc_auc_score
from sklearn.tree import export_text

In [30]:
train_dicts=data_train.fillna(0).to_dict(orient="records")
dv=DictVectorizer(sparse=False)
dv.fit(train_dicts)
X_train=dv.transform(train_dicts)


In [31]:

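# Note: get_feature_names() was removed in scikit-learn 1.2;
# on newer versions use dv.get_feature_names_out() instead.
dv.get_feature_names()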



['age',
 'amount',
 'assets',
 'debt',
 'expenses',
 'home=ignore',
 'home=other',
 'home=owner',
 'home=parents',
 'home=private',
 'home=rent',
 'home=unk',
 'income',
 'job=fixed',
 'job=freelance',
 'job=others',
 'job=partime',
 'job=unk',
 'marital=divorced',
 'marital=married',
 'marital=separated',
 'marital=single',
 'marital=unk',
 'marital=widow',
 'price',
 'records=no',
 'records=yes',
 'seniority',
 'time']
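
The feature names above show what the vectorizer does: numeric columns keep their name, and each categorical value becomes its own `column=value` flag. Below is a rough sketch of that idea in plain Python. The function `one_hot` and the toy records are my own; this is a simplification for illustration, not how `DictVectorizer` is implemented.

```python
def one_hot(records):
    """Encode a list of dicts: numbers pass through, strings become key=value flags."""
    names = set()
    for rec in records:
        for key, value in rec.items():
            names.add(f"{key}={value}" if isinstance(value, str) else key)
    names = sorted(names)  # alphabetical order, as in the output above
    index = {name: i for i, name in enumerate(names)}

    rows = []
    for rec in records:
        row = [0.0] * len(names)
        for key, value in rec.items():
            if isinstance(value, str):
                row[index[f"{key}={value}"]] = 1.0
            else:
                row[index[key]] = float(value)
        rows.append(row)
    return names, rows


# Hypothetical records, for illustration only.
toy = [
    {"job": "fixed", "records": "no", "income": 85.0},
    {"job": "partime", "records": "yes", "income": 72.0},
]
names, rows = one_hot(toy)
print(names)
print(rows)
```

Unlike this sketch, `DictVectorizer` separates `fit` from `transform`: it remembers the columns seen during `fit` and ignores categories it has not seen when transforming new data, which is why the validation set is only passed to `transform`.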

In [32]:
dt=DecisionTreeClassifier()
dt.fit(X_train,Y_train)

DecisionTreeClassifier()

In [33]:
val_dicts=data_val.fillna(0).to_dict(orient="records")
X_val=dv.transform(val_dicts)

In [34]:
y_pred=dt.predict_proba(X_val)[:,1]
y_pred

array([0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 1., 1., 1., 0., 0., 1., 0.,
       0., 0., 0., 0., 0., 0., 1., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0.,
       0., 0., 1., 1., 0., 0., 0., 1., 0., 1., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 1., 0., 0., 0., 1., 0., 0., 0., 0., 0., 1., 0., 1., 1., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 1., 0., 1., 0., 0.,
       0., 1., 0., 1., 0., 1., 0., 0., 0., 0., 0., 1., 1., 0., 0., 1., 0.,
       0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 1., 1., 0., 1., 0., 0., 0.,
       1., 0., 0., 0., 1., 0., 0., 1., 1., 1., 0., 0., 1., 1., 1., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 0., 1., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 1., 0.,
       0., 1., 0., 0., 0., 0., 1., 1., 0., 0., 0., 0., 0., 1., 0., 1., 1.,
       0., 0., 0., 0., 1., 0., 0., 1., 0., 0., 0., 0., 0., 0., 1., 0., 0.,
       0., 0., 1., 0., 0.

In [35]:
roc_auc_score(Y_val,y_pred)
# A low score on the validation set

0.6575373568089897
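
For intuition about the number above: ROC AUC can be read as the probability that a randomly chosen defaulting client gets a higher score than a randomly chosen non-defaulting one, with ties counted as half. A small sketch of that pairwise definition follows. The function `auc_by_pairs` and the toy labels and scores are hypothetical, and this brute-force version is only meant for tiny examples.

```python
def auc_by_pairs(y_true, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = default) and predicted scores.
y_true = [0, 0, 1, 1]
hard_scores = [0.0, 1.0, 0.0, 1.0]   # only 0 or 1, like the predictions above
soft_scores = [0.1, 0.4, 0.35, 0.8]  # graded probabilities

print(auc_by_pairs(y_true, hard_scores))
print(auc_by_pairs(y_true, soft_scores))
```

A fully grown tree outputs only 0 or 1, so many pairs are tied and the score cannot reflect any finer ranking of clients.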

In [36]:
# Check the ROC AUC on the training data
pred_1= dt.predict_proba(X_train)[:,1]
roc_auc_score(Y_train,pred_1)

1.0

## The model fits the training data perfectly (AUC 1.0) but predicts poorly on unseen data (validation AUC about 0.66): this is overfitting

In [None]:
# To reduce overfitting in a decision tree, I have to limit the depth of the tree (max_depth); the extreme case, a tree of depth 1, is a decision stump
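
A decision stump is a tree with a single split. As a rough sketch of the idea, the function below tries every midpoint between consecutive values of one numeric feature and keeps the threshold with the fewest misclassified rows. The name `fit_stump`, the error criterion, and the toy data are my own assumptions; scikit-learn's trees use an impurity measure (Gini by default) rather than this raw error count.

```python
def fit_stump(values, labels):
    """Find the threshold on one numeric feature that minimises misclassification."""
    best = None
    ordered = sorted(set(values))
    # Candidate thresholds: midpoints between consecutive distinct values.
    for lo, hi in zip(ordered, ordered[1:]):
        t = (lo + hi) / 2
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        # Each side predicts its majority class; the minority rows are errors.
        errors = (min(sum(left), len(left) - sum(left))
                  + min(sum(right), len(right) - sum(right)))
        if best is None or errors < best[1]:
            best = (t, errors)
    return best  # (threshold, number of errors)


# Hypothetical assets values and default labels (1 = default).
assets = [0, 2000, 3000, 5000, 8000, 9000]
default = [1, 1, 1, 0, 0, 0]

print(fit_stump(assets, default))
```

With scikit-learn, the equivalent experiment would be `DecisionTreeClassifier(max_depth=1)`.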

In [37]:
dtm=DecisionTreeClassifier(max_depth=3)
dtm.fit(X_train,Y_train)
y_pred=dtm.predict_proba(X_val)[:,1]
y_pred

array([0.11690761, 0.11690761, 0.26356589, 0.11690761, 0.11690761,
       0.3875969 , 0.11690761, 0.76811594, 0.11690761, 0.11690761,
       0.11690761, 0.11690761, 0.11690761, 0.11690761, 0.26356589,
       0.60294118, 0.11690761, 0.60829493, 0.11690761, 0.11690761,
       0.11690761, 0.3875969 , 0.11690761, 0.3875969 , 0.60829493,
       0.48387097, 0.60829493, 0.11690761, 0.60829493, 0.11690761,
       0.48387097, 0.11690761, 0.11690761, 0.11690761, 0.11690761,
       0.11690761, 0.76811594, 0.60829493, 0.60829493, 0.76811594,
       0.11690761, 0.76811594, 0.11690761, 0.11690761, 0.11690761,
       0.11690761, 0.76811594, 0.11690761, 0.11690761, 0.11690761,
       0.11690761, 0.11690761, 0.11690761, 0.11690761, 0.3875969 ,
       0.11690761, 0.11690761, 0.3875969 , 0.11690761, 0.11690761,
       0.11690761, 0.11690761, 0.26356589, 0.26356589, 0.11690761,
       0.3875969 , 0.26356589, 0.11690761, 0.11690761, 0.11690761,
       0.11690761, 0.11690761, 0.11690761, 0.11690761, ...

In [38]:
roc_auc_score(Y_val,y_pred)

0.7389079944782155

In [40]:
pred_1= dtm.predict_proba(X_train)[:,1]
roc_auc_score(Y_train,pred_1)

0.7761016984958594

In [41]:
# The model is better and no longer overfits after limiting the depth: validation AUC is about 0.74 against about 0.78 on the training data

In [44]:
print(export_text(dt, feature_names=dv.get_feature_names()))
# The unrestricted decision tree has many levels

|--- records=yes <= 0.50
|   |--- job=partime <= 0.50
|   |   |--- income <= 74.50
|   |   |   |--- assets <= 4250.00
|   |   |   |   |--- income <= 20.00
|   |   |   |   |   |--- seniority <= 1.50
|   |   |   |   |   |   |--- home=parents <= 0.50
|   |   |   |   |   |   |   |--- seniority <= 0.50
|   |   |   |   |   |   |   |   |--- class: 1
|   |   |   |   |   |   |   |--- seniority >  0.50
|   |   |   |   |   |   |   |   |--- price <= 1457.50
|   |   |   |   |   |   |   |   |   |--- expenses <= 55.00
|   |   |   |   |   |   |   |   |   |   |--- home=other <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- class: 1
|   |   |   |   |   |   |   |   |   |   |--- home=other >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- class: 0
|   |   |   |   |   |   |   |   |   |--- expenses >  55.00
|   |   |   |   |   |   |   |   |   |   |--- class: 0
|   |   |   |   |   |   |   |   |--- price >  1457.50
|   |   |   |   |   |   |   |   |   |--- class: 1
...

In [45]:
# Decision tree with only three levels
print(export_text(dtm, feature_names=dv.get_feature_names()))

|--- records=yes <= 0.50
|   |--- job=partime <= 0.50
|   |   |--- income <= 74.50
|   |   |   |--- class: 0
|   |   |--- income >  74.50
|   |   |   |--- class: 0
|   |--- job=partime >  0.50
|   |   |--- assets <= 8750.00
|   |   |   |--- class: 1
|   |   |--- assets >  8750.00
|   |   |   |--- class: 0
|--- records=yes >  0.50
|   |--- seniority <= 6.50
|   |   |--- amount <= 862.50
|   |   |   |--- class: 0
|   |   |--- amount >  862.50
|   |   |   |--- class: 1
|   |--- seniority >  6.50
|   |   |--- income <= 103.50
|   |   |   |--- class: 1
|   |   |--- income >  103.50
|   |   |   |--- class: 0
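
The printed depth-3 tree is small enough to write out by hand, in the same style as `assess_risk`. The function below is my own transcription of the splits shown above, working on the raw client dictionary instead of the one-hot matrix (so `records=yes <= 0.50` becomes `records != 'yes'`). The name `depth3_tree` and the toy client are hypothetical.

```python
def depth3_tree(client):
    """Return the predicted class (1 = default) by following the printed splits."""
    if client["records"] != "yes":          # records=yes <= 0.50
        if client["job"] != "partime":      # job=partime <= 0.50
            return 0                        # both income branches predict class 0
        return 1 if client["assets"] <= 8750 else 0
    if client["seniority"] <= 6.5:
        return 0 if client["amount"] <= 862.5 else 1
    return 1 if client["income"] <= 103.5 else 0


# Hypothetical client, for illustration only.
client = {
    "records": "yes", "job": "fixed", "seniority": 1,
    "amount": 1000, "income": 85.0, "assets": 0.0,
}
print(depth3_tree(client))
```

The two income branches under `job=partime <= 0.50` predict the same class, but their leaves can still carry different probabilities, which is what `predict_proba` and the AUC use.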

