<span style="color:#929591">Chapter 1.
# <span style="color:#820747">Property Loan Risk.

<img src="pic/home3.jpg">

<span style="color:#610023">Many people struggle to get loans because their credit histories are insufficient or non-existent. Unfortunately, untrustworthy lenders often take advantage of this population.
    
<span style="color:#610023">Home Credit strives to broaden financial inclusion for the unbanked population by providing a positive and safe borrowing experience. To make sure this underserved population has a positive loan experience, Home Credit uses a variety of alternative data, including telco and transactional information, to predict its clients' repayment abilities.
    
<span style="color:#610023">We will use various statistical and machine learning methods to make sure that clients capable of repayment are not rejected and that loans are given with a principal, maturity, and repayment calendar that will empower clients to be successful.

# <span style="color:#a83c09">Navigator:
<b>[2. Exploratory Data Analysis.](./Chapter 2__Exploratory Data Analysis.ipynb)

<b>[Dictionary](description.csv)

<img src="pic/lin.jpg">

# <span style="color:#820747">1. Data Sources Investigation.

<span style="color:#610023">Data provided by: http://www.homecredit.net/about-us.aspx

# <span style="color:#a83c09">Chapter Structure:

A. <b>7 different sources of data</b><br>

         Source №1 - application_train
         Source №2 - bureau
         Source №3 - bureau_balance
         Source №4 - previous_application
         Source №5 - POS_CASH_balance
         Source №6 - credit_card_balance
         Source №7 - installments_payments


In [33]:
import pandas as pd
import numpy as np

<img src="pic/lin.jpg">

In [2]:
%%time
df_train = pd.read_csv('application_train.csv')
df_bureau = pd.read_csv('bureau.csv')
df_bureau_balance = pd.read_csv('bureau_balance.csv')
df_previous_application = pd.read_csv('previous_application.csv')
df_POS_CASH_balance = pd.read_csv('POS_CASH_balance.csv')
df_credit_card_balance = pd.read_csv('credit_card_balance.csv')
df_installments_payments = pd.read_csv('installments_payments.csv')

Wall time: 47 s
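The seven reads above all follow the same pattern, so they could also be collected into a dictionary keyed by table name. The sketch below is one possible way to do that. `load_sources`, `SOURCES` and `data_dir` are hypothetical names that do not appear in the original notebook, and the sketch assumes all CSV files sit in one directory.

```python
from pathlib import Path

import pandas as pd

# Table names as listed in the chapter structure above.
SOURCES = [
    'application_train', 'bureau', 'bureau_balance', 'previous_application',
    'POS_CASH_balance', 'credit_card_balance', 'installments_payments',
]


def load_sources(data_dir='.', names=SOURCES):
    """Read every available <name>.csv in data_dir into a dict of DataFrames.

    Hypothetical helper: files that are missing are skipped rather than
    treated as errors.
    """
    frames = {}
    for name in names:
        path = Path(data_dir) / f'{name}.csv'
        if path.exists():
            frames[name] = pd.read_csv(path)
    return frames
```

With the files in the working directory, `frames = load_sources('.')` would give access to each table as, for example, `frames['bureau']`.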


<img src="pic/lin.jpg">

# <span style="color:#1e488f">A. There are 7 different sources of data:

<img src="pic/schema.jpg">

# <span style="color:#ffad01">Source №1

<span style="color: #be0119"><b>application_train</b></span>: the main table, with information about each loan application at Home Credit. Every loan has its own row and is identified by the feature <b>SK_ID_CURR</b>. The training application data comes with the <b>TARGET</b> column, where <b>0</b> means the loan was repaid and <b>1</b> means the loan was not repaid.
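Two quick checks follow naturally from this description: whether <b>SK_ID_CURR</b> really is unique per row, and how the two <b>TARGET</b> classes are distributed. The sketch below runs both on a tiny made-up frame (`toy_train` is a hypothetical stand-in, and its values are not taken from the real data); the same two lines would apply to `df_train`.

```python
import pandas as pd

# Toy stand-in for df_train; the values are invented for illustration only.
toy_train = pd.DataFrame({
    'SK_ID_CURR': [1, 2, 3, 4, 5],
    'TARGET': [0, 0, 0, 1, 0],
})

# SK_ID_CURR is described as the row identifier, so it should be unique.
ids_unique = toy_train['SK_ID_CURR'].is_unique

# Share of each class: 0 = repaid, 1 = not repaid.
target_share = toy_train['TARGET'].value_counts(normalize=True)
print(ids_unique)
print(target_share)
```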

In [3]:
df_train.columns.values

array(['SK_ID_CURR', 'TARGET', 'NAME_CONTRACT_TYPE', 'CODE_GENDER',
       'FLAG_OWN_CAR', 'FLAG_OWN_REALTY', 'CNT_CHILDREN',
       'AMT_INCOME_TOTAL', 'AMT_CREDIT', 'AMT_ANNUITY', 'AMT_GOODS_PRICE',
       'NAME_TYPE_SUITE', 'NAME_INCOME_TYPE', 'NAME_EDUCATION_TYPE',
       'NAME_FAMILY_STATUS', 'NAME_HOUSING_TYPE',
       'REGION_POPULATION_RELATIVE', 'DAYS_BIRTH', 'DAYS_EMPLOYED',
       'DAYS_REGISTRATION', 'DAYS_ID_PUBLISH', 'OWN_CAR_AGE',
       'FLAG_MOBIL', 'FLAG_EMP_PHONE', 'FLAG_WORK_PHONE',
       'FLAG_CONT_MOBILE', 'FLAG_PHONE', 'FLAG_EMAIL', 'OCCUPATION_TYPE',
       'CNT_FAM_MEMBERS', 'REGION_RATING_CLIENT',
       'REGION_RATING_CLIENT_W_CITY', 'WEEKDAY_APPR_PROCESS_START',
       'HOUR_APPR_PROCESS_START', 'REG_REGION_NOT_LIVE_REGION',
       'REG_REGION_NOT_WORK_REGION', 'LIVE_REGION_NOT_WORK_REGION',
       'REG_CITY_NOT_LIVE_CITY', 'REG_CITY_NOT_WORK_CITY',
       'LIVE_CITY_NOT_WORK_CITY', 'ORGANIZATION_TYPE', 'EXT_SOURCE_1',
       'EXT_SOURCE_2', 'EXT_SOURCE_3',

In [7]:
df_train.head(3)

| | SK_ID_CURR | TARGET | NAME_CONTRACT_TYPE | CODE_GENDER | FLAG_OWN_CAR | FLAG_OWN_REALTY | CNT_CHILDREN | AMT_INCOME_TOTAL | AMT_CREDIT | AMT_ANNUITY | ... | FLAG_DOCUMENT_18 | FLAG_DOCUMENT_19 | FLAG_DOCUMENT_20 | FLAG_DOCUMENT_21 | AMT_REQ_CREDIT_BUREAU_HOUR | AMT_REQ_CREDIT_BUREAU_DAY | AMT_REQ_CREDIT_BUREAU_WEEK | AMT_REQ_CREDIT_BUREAU_MON | AMT_REQ_CREDIT_BUREAU_QRT | AMT_REQ_CREDIT_BUREAU_YEAR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 100002 | 1 | Cash loans | M | N | Y | 0 | 202500.0 | 406597.5 | 24700.5 | ... | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 1 | 100003 | 0 | Cash loans | F | N | N | 0 | 270000.0 | 1293502.5 | 35698.5 | ... | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 100004 | 0 | Revolving loans | M | Y | Y | 0 | 67500.0 | 135000.0 | 6750.0 | ... | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |


In [8]:
print('Source №1 has: ',df_train.shape[0], ' records and', df_train.shape[1], ' features')

Source №1 has:  307511  records and 122  features


In [72]:
# Count number of Null values and their percentage in == df_train == dataset.
order = df_train.isnull().sum().sort_values(ascending = False)
percent = (df_train.isnull().sum()/df_train.isnull().count()*100).sort_values(ascending = False)
missing_df_train  = pd.concat([order, percent], axis=1, keys=['Null count', 'Null count (%)'])
missing_df_train.head(20)

| | Null count | Null count (%) |
|---|---|---|
| COMMONAREA_MEDI | 214865 | 69.872297 |
| COMMONAREA_AVG | 214865 | 69.872297 |
| COMMONAREA_MODE | 214865 | 69.872297 |
| NONLIVINGAPARTMENTS_MODE | 213514 | 69.432963 |
| NONLIVINGAPARTMENTS_MEDI | 213514 | 69.432963 |
| NONLIVINGAPARTMENTS_AVG | 213514 | 69.432963 |
| FONDKAPREMONT_MODE | 210295 | 68.386172 |
| LIVINGAPARTMENTS_MEDI | 210199 | 68.354953 |
| LIVINGAPARTMENTS_MODE | 210199 | 68.354953 |
| LIVINGAPARTMENTS_AVG | 210199 | 68.354953 |
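The same null-count calculation is repeated for every source in this chapter, so it may be worth wrapping in a function. The sketch below is one way to do it; `missing_report` is a hypothetical name, and the toy frame exists only to make the example self-contained. It uses `isnull().mean()` for the percentage, which gives the same result as dividing the null count by the row count.

```python
import numpy as np
import pandas as pd


def missing_report(df):
    """Return null counts and null percentages per column, most-missing first."""
    report = pd.DataFrame({
        'Null count': df.isnull().sum(),
        # The mean of a boolean mask is the fraction of True values.
        'Null count (%)': df.isnull().mean() * 100,
    })
    return report.sort_values('Null count', ascending=False)


# Toy frame: column 'a' is half missing, column 'b' is complete.
toy = pd.DataFrame({'a': [1.0, np.nan, np.nan, 4.0], 'b': [1, 2, 3, 4]})
report = missing_report(toy)
print(report)
```

Applied to the real data, `missing_report(df_train).head(20)` would stand in for the cell above.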


<img src="pic/lin.jpg">

# <span style="color:#ffad01">Source №2

<span style="color: #be0119"><b>bureau</b></span>: data concerning clients' <b>previous credits from other financial institutions</b>. Each previous credit has its own row in bureau, but one loan in the application data can have multiple previous credits.
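Because one loan in the application data can map to several rows in bureau, the bureau rows have to be aggregated to one row per <b>SK_ID_CURR</b> before they can be joined without duplicating loans. The sketch below shows that pattern on made-up data. The frame names and the output columns `BUREAU_CREDIT_COUNT` and `BUREAU_CREDIT_SUM` are assumptions chosen for illustration, not features defined by the notebook.

```python
import pandas as pd

# Toy stand-ins for df_train and df_bureau; values are invented.
toy_train = pd.DataFrame({
    'SK_ID_CURR': [100002, 100003, 100004],
    'TARGET': [1, 0, 0],
})
toy_bureau = pd.DataFrame({
    'SK_ID_CURR': [100002, 100002, 100003],
    'SK_ID_BUREAU': [1, 2, 3],
    'AMT_CREDIT_SUM': [1000.0, 2000.0, 500.0],
})

# Collapse bureau to one row per client.
bureau_agg = (
    toy_bureau.groupby('SK_ID_CURR')
    .agg(BUREAU_CREDIT_COUNT=('SK_ID_BUREAU', 'count'),
         BUREAU_CREDIT_SUM=('AMT_CREDIT_SUM', 'sum'))
    .reset_index()
)

# A left join keeps clients with no bureau history (their aggregates are NaN);
# validate= raises if either side has duplicate keys.
merged = toy_train.merge(bureau_agg, on='SK_ID_CURR', how='left',
                         validate='one_to_one')
print(merged)
```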

In [4]:
df_bureau.columns.values

array(['SK_ID_CURR', 'SK_ID_BUREAU', 'CREDIT_ACTIVE', 'CREDIT_CURRENCY',
       'DAYS_CREDIT', 'CREDIT_DAY_OVERDUE', 'DAYS_CREDIT_ENDDATE',
       'DAYS_ENDDATE_FACT', 'AMT_CREDIT_MAX_OVERDUE',
       'CNT_CREDIT_PROLONG', 'AMT_CREDIT_SUM', 'AMT_CREDIT_SUM_DEBT',
       'AMT_CREDIT_SUM_LIMIT', 'AMT_CREDIT_SUM_OVERDUE', 'CREDIT_TYPE',
       'DAYS_CREDIT_UPDATE', 'AMT_ANNUITY'], dtype=object)

In [13]:
df_bureau.head(3)

| | SK_ID_CURR | SK_ID_BUREAU | CREDIT_ACTIVE | CREDIT_CURRENCY | DAYS_CREDIT | CREDIT_DAY_OVERDUE | DAYS_CREDIT_ENDDATE | DAYS_ENDDATE_FACT | AMT_CREDIT_MAX_OVERDUE | CNT_CREDIT_PROLONG | AMT_CREDIT_SUM | AMT_CREDIT_SUM_DEBT | AMT_CREDIT_SUM_LIMIT | AMT_CREDIT_SUM_OVERDUE | CREDIT_TYPE | DAYS_CREDIT_UPDATE | AMT_ANNUITY |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 215354 | 5714462 | Closed | currency 1 | -497 | 0 | -153.0 | -153.0 | NaN | 0 | 91323.0 | 0.0 | NaN | 0.0 | Consumer credit | -131 | NaN |
| 1 | 215354 | 5714463 | Active | currency 1 | -208 | 0 | 1075.0 | NaN | NaN | 0 | 225000.0 | 171342.0 | NaN | 0.0 | Credit card | -20 | NaN |
| 2 | 215354 | 5714464 | Active | currency 1 | -203 | 0 | 528.0 | NaN | NaN | 0 | 464323.5 | NaN | NaN | 0.0 | Consumer credit | -16 | NaN |


In [14]:
print('Source №2 has: ',df_bureau.shape[0], ' records and', df_bureau.shape[1], ' features')

Source №2 has:  1716428  records and 17  features


In [73]:
# Count number of Null values and their percentage in == df_bureau == dataset.
order = df_bureau.isnull().sum().sort_values(ascending = False)
percent = (df_bureau.isnull().sum()/df_bureau.isnull().count()*100).sort_values(ascending = False)
missing_df_bureau  = pd.concat([order, percent], axis=1, keys=['Null count', 'Null count (%)'])
missing_df_bureau.head(9)

| | Null count | Null count (%) |
|---|---|---|
| AMT_ANNUITY | 1226791 | 71.47349 |
| AMT_CREDIT_MAX_OVERDUE | 1124488 | 65.513264 |
| DAYS_ENDDATE_FACT | 633653 | 36.916958 |
| AMT_CREDIT_SUM_LIMIT | 591780 | 34.477415 |
| AMT_CREDIT_SUM_DEBT | 257669 | 15.011932 |
| DAYS_CREDIT_ENDDATE | 105553 | 6.149573 |
| AMT_CREDIT_SUM | 13 | 0.000757 |
| CREDIT_TYPE | 0 | 0.0 |
| AMT_CREDIT_SUM_OVERDUE | 0 | 0.0 |


<img src="pic/lin.jpg">

# <span style="color:#ffad01">Source №3

<span style="color: #be0119"><b>bureau_balance</b></span>: monthly <b>data about the previous credits in bureau</b>. Each row is one month of a previous credit, and a single previous credit can have multiple rows, one for each month of the credit length.
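bureau_balance has no <b>SK_ID_CURR</b> column, so reaching the client level takes two steps: aggregate the monthly rows per <b>SK_ID_BUREAU</b>, then use bureau to map each previous credit to its client. A minimal sketch on made-up data follows; the frame names and the derived columns `MONTHS_RECORDED` and `OLDEST_MONTH` are hypothetical.

```python
import pandas as pd

# Toy stand-ins for df_bureau and df_bureau_balance; values are invented.
toy_bureau = pd.DataFrame({
    'SK_ID_CURR': [10, 10, 20],
    'SK_ID_BUREAU': [1, 2, 3],
})
toy_balance = pd.DataFrame({
    'SK_ID_BUREAU': [1, 1, 1, 2, 2, 3],
    'MONTHS_BALANCE': [0, -1, -2, 0, -1, 0],
    'STATUS': ['C', 'C', '0', '0', '0', 'X'],
})

# Step 1: one row per previous credit.
per_credit = (
    toy_balance.groupby('SK_ID_BUREAU')
    .agg(MONTHS_RECORDED=('MONTHS_BALANCE', 'size'),
         OLDEST_MONTH=('MONTHS_BALANCE', 'min'))
    .reset_index()
)

# Step 2: attach the client id through bureau, then one row per client.
per_client = (
    toy_bureau.merge(per_credit, on='SK_ID_BUREAU', how='left')
    .groupby('SK_ID_CURR')['MONTHS_RECORDED'].sum()
    .reset_index()
)
print(per_client)
```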

In [7]:
df_bureau_balance.columns.values

array(['SK_ID_BUREAU', 'MONTHS_BALANCE', 'STATUS'], dtype=object)

In [19]:
df_bureau_balance.head(3)

| | SK_ID_BUREAU | MONTHS_BALANCE | STATUS |
|---|---|---|---|
| 0 | 5715448 | 0 | C |
| 1 | 5715448 | -1 | C |
| 2 | 5715448 | -2 | C |


In [20]:
print('Source №3 has: ',df_bureau_balance.shape[0], ' records and', df_bureau_balance.shape[1], ' features')

Source №3 has:  27299925  records and 3  features


In [74]:
# Count number of Null values and their percentage in == df_bureau_balance == dataset.
order = df_bureau_balance.isnull().sum().sort_values(ascending = False)
percent = (df_bureau_balance.isnull().sum()/df_bureau_balance.isnull().count()*100).sort_values(ascending = False)
missing_df_bureau_balance  = pd.concat([order, percent], axis=1, keys=['Null count', 'Null count (%)'])
missing_df_bureau_balance.head(3)

| | Null count | Null count (%) |
|---|---|---|
| STATUS | 0 | 0.0 |
| MONTHS_BALANCE | 0 | 0.0 |
| SK_ID_BUREAU | 0 | 0.0 |


<img src="pic/lin.jpg">

# <span style="color:#ffad01">Source №4

<span style="color: #be0119"><b>previous_application</b></span>: previous applications for loans at Home Credit of <b>clients who have loans in the application data</b>. Each current loan in the application data can have multiple previous loans. Each previous application has one row and is identified by the feature <b>SK_ID_PREV</b>.
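The description implies two things that are easy to verify: <b>SK_ID_PREV</b> is unique per row, and a single <b>SK_ID_CURR</b> can appear many times. The sketch below checks both on a made-up frame; `toy_prev` is a hypothetical stand-in for `df_previous_application`.

```python
import pandas as pd

# Toy stand-in for df_previous_application; values are invented.
toy_prev = pd.DataFrame({
    'SK_ID_PREV': [11, 12, 13, 14],
    'SK_ID_CURR': [100, 100, 100, 200],
})

# Each previous application should have its own identifier.
prev_ids_unique = toy_prev['SK_ID_PREV'].is_unique

# Number of previous applications per current client.
prev_per_client = toy_prev.groupby('SK_ID_CURR')['SK_ID_PREV'].nunique()
print(prev_ids_unique)
print(prev_per_client)
```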

In [8]:
df_previous_application.columns.values

array(['SK_ID_PREV', 'SK_ID_CURR', 'NAME_CONTRACT_TYPE', 'AMT_ANNUITY',
       'AMT_APPLICATION', 'AMT_CREDIT', 'AMT_DOWN_PAYMENT',
       'AMT_GOODS_PRICE', 'WEEKDAY_APPR_PROCESS_START',
       'HOUR_APPR_PROCESS_START', 'FLAG_LAST_APPL_PER_CONTRACT',
       'NFLAG_LAST_APPL_IN_DAY', 'RATE_DOWN_PAYMENT',
       'RATE_INTEREST_PRIMARY', 'RATE_INTEREST_PRIVILEGED',
       'NAME_CASH_LOAN_PURPOSE', 'NAME_CONTRACT_STATUS', 'DAYS_DECISION',
       'NAME_PAYMENT_TYPE', 'CODE_REJECT_REASON', 'NAME_TYPE_SUITE',
       'NAME_CLIENT_TYPE', 'NAME_GOODS_CATEGORY', 'NAME_PORTFOLIO',
       'NAME_PRODUCT_TYPE', 'CHANNEL_TYPE', 'SELLERPLACE_AREA',
       'NAME_SELLER_INDUSTRY', 'CNT_PAYMENT', 'NAME_YIELD_GROUP',
       'PRODUCT_COMBINATION', 'DAYS_FIRST_DRAWING', 'DAYS_FIRST_DUE',
       'DAYS_LAST_DUE_1ST_VERSION', 'DAYS_LAST_DUE', 'DAYS_TERMINATION',
       'NFLAG_INSURED_ON_APPROVAL'], dtype=object)

In [22]:
df_previous_application.head(3)

| | SK_ID_PREV | SK_ID_CURR | NAME_CONTRACT_TYPE | AMT_ANNUITY | AMT_APPLICATION | AMT_CREDIT | AMT_DOWN_PAYMENT | AMT_GOODS_PRICE | WEEKDAY_APPR_PROCESS_START | HOUR_APPR_PROCESS_START | ... | NAME_SELLER_INDUSTRY | CNT_PAYMENT | NAME_YIELD_GROUP | PRODUCT_COMBINATION | DAYS_FIRST_DRAWING | DAYS_FIRST_DUE | DAYS_LAST_DUE_1ST_VERSION | DAYS_LAST_DUE | DAYS_TERMINATION | NFLAG_INSURED_ON_APPROVAL |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2030495 | 271877 | Consumer loans | 1730.43 | 17145.0 | 17145.0 | 0.0 | 17145.0 | SATURDAY | 15 | ... | Connectivity | 12.0 | middle | POS mobile with interest | 365243.0 | -42.0 | 300.0 | -42.0 | -37.0 | 0.0 |
| 1 | 2802425 | 108129 | Cash loans | 25188.615 | 607500.0 | 679671.0 | NaN | 607500.0 | THURSDAY | 11 | ... | XNA | 36.0 | low_action | Cash X-Sell: low | 365243.0 | -134.0 | 916.0 | 365243.0 | 365243.0 | 1.0 |
| 2 | 2523466 | 122040 | Cash loans | 15060.735 | 112500.0 | 136444.5 | NaN | 112500.0 | TUESDAY | 11 | ... | XNA | 12.0 | high | Cash X-Sell: high | 365243.0 | -271.0 | 59.0 | 365243.0 | 365243.0 | 1.0 |


In [23]:
print('Source №4 has: ',df_previous_application.shape[0], ' records and', df_previous_application.shape[1], ' features')

Source №4 has:  1670214  records and 37  features


In [76]:
# Count number of Null values and their percentage in == df_previous_application == dataset.
order = df_previous_application.isnull().sum().sort_values(ascending = False)
percent = (df_previous_application.isnull().sum()/df_previous_application.isnull().count()*100).sort_values(ascending = False)
missing_df_previous_application  = pd.concat([order, percent], axis=1, keys=['Null count', 'Null count (%)'])
missing_df_previous_application.head(17)

| | Null count | Null count (%) |
|---|---|---|
| RATE_INTEREST_PRIVILEGED | 1664263 | 99.643698 |
| RATE_INTEREST_PRIMARY | 1664263 | 99.643698 |
| RATE_DOWN_PAYMENT | 895844 | 53.63648 |
| AMT_DOWN_PAYMENT | 895844 | 53.63648 |
| NAME_TYPE_SUITE | 820405 | 49.119754 |
| DAYS_TERMINATION | 673065 | 40.298129 |
| NFLAG_INSURED_ON_APPROVAL | 673065 | 40.298129 |
| DAYS_FIRST_DRAWING | 673065 | 40.298129 |
| DAYS_FIRST_DUE | 673065 | 40.298129 |
| DAYS_LAST_DUE_1ST_VERSION | 673065 | 40.298129 |


<img src="pic/lin.jpg">

# <span style="color:#ffad01">Source №5

<span style="color: #be0119"><b>POS_CASH_balance</b></span>: monthly data about <b>previous point of sale or cash loans clients have had with Home Credit</b>. Each row is one month of a previous point of sale or cash loan, and a single previous loan can have many rows.
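Since a single previous loan spans many monthly rows, a per-loan summary is usually the first step before joining this table to anything else. The sketch below counts the observed months and takes the largest <b>SK_DPD</b> value per loan. It assumes that <b>SK_DPD</b> records days past due, as the column name suggests; the Dictionary should be checked before relying on that. The frame and output column names are hypothetical.

```python
import pandas as pd

# Toy stand-in for df_POS_CASH_balance; values are invented.
toy_pos = pd.DataFrame({
    'SK_ID_PREV': [1, 1, 1, 2, 2],
    'SK_ID_CURR': [10, 10, 10, 10, 10],
    'MONTHS_BALANCE': [-3, -2, -1, -2, -1],
    'SK_DPD': [0, 5, 0, 0, 0],
})

# One row per previous loan: months on record and the worst SK_DPD seen.
pos_per_loan = (
    toy_pos.groupby(['SK_ID_CURR', 'SK_ID_PREV'])
    .agg(MONTHS_OBSERVED=('MONTHS_BALANCE', 'size'),
         MAX_SK_DPD=('SK_DPD', 'max'))
    .reset_index()
)
print(pos_per_loan)
```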

In [9]:
df_POS_CASH_balance.columns.values

array(['SK_ID_PREV', 'SK_ID_CURR', 'MONTHS_BALANCE', 'CNT_INSTALMENT',
       'CNT_INSTALMENT_FUTURE', 'NAME_CONTRACT_STATUS', 'SK_DPD',
       'SK_DPD_DEF'], dtype=object)

In [25]:
df_POS_CASH_balance.head(3)

| | SK_ID_PREV | SK_ID_CURR | MONTHS_BALANCE | CNT_INSTALMENT | CNT_INSTALMENT_FUTURE | NAME_CONTRACT_STATUS | SK_DPD | SK_DPD_DEF |
|---|---|---|---|---|---|---|---|---|
| 0 | 1803195 | 182943 | -31 | 48.0 | 45.0 | Active | 0 | 0 |
| 1 | 1715348 | 367990 | -33 | 36.0 | 35.0 | Active | 0 | 0 |
| 2 | 1784872 | 397406 | -32 | 12.0 | 9.0 | Active | 0 | 0 |


In [26]:
print('Source №5 has: ',df_POS_CASH_balance.shape[0], ' records and', df_POS_CASH_balance.shape[1], ' features')

Source №5 has:  10001358  records and 8  features


In [78]:
# Count number of Null values and their percentage in == df_POS_CASH_balance == dataset.
order = df_POS_CASH_balance.isnull().sum().sort_values(ascending = False)
percent = (df_POS_CASH_balance.isnull().sum()/df_POS_CASH_balance.isnull().count()*100).sort_values(ascending = False)
missing_df_POS_CASH_balance  = pd.concat([order, percent], axis=1, keys=['Null count', 'Null count (%)'])
missing_df_POS_CASH_balance.head(4)

| | Null count | Null count (%) |
|---|---|---|
| CNT_INSTALMENT_FUTURE | 26087 | 0.260835 |
| CNT_INSTALMENT | 26071 | 0.260675 |
| SK_DPD_DEF | 0 | 0.0 |
| SK_DPD | 0 | 0.0 |


<img src="pic/lin.jpg">

# <span style="color:#ffad01">Source №6

<span style="color: #be0119"><b>credit_card_balance</b></span>: monthly data about <b>previous credit cards clients have had with Home Credit</b>. Each row is one month of a credit card balance, and a single credit card can have many rows.
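With one row per card per month, a common first question is what each card looked like in its most recent recorded month. The sketch below picks that row per <b>SK_ID_PREV</b>. It assumes that <b>MONTHS_BALANCE</b> counts months backwards from the application, so values closer to 0 are more recent; that reading should be confirmed against the Dictionary. `toy_cc` is a hypothetical stand-in for `df_credit_card_balance`.

```python
import pandas as pd

# Toy stand-in for df_credit_card_balance; values are invented.
toy_cc = pd.DataFrame({
    'SK_ID_PREV': [1, 1, 2, 2, 2],
    'MONTHS_BALANCE': [-2, -1, -3, -1, -2],
    'AMT_BALANCE': [100.0, 50.0, 0.0, 300.0, 200.0],
})

# Index label of the most recent month for each card.
latest_idx = toy_cc.groupby('SK_ID_PREV')['MONTHS_BALANCE'].idxmax()

# Keep only those rows: one snapshot per card.
latest = toy_cc.loc[latest_idx].reset_index(drop=True)
print(latest)
```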

In [10]:
df_credit_card_balance.columns.values

array(['SK_ID_PREV', 'SK_ID_CURR', 'MONTHS_BALANCE', 'AMT_BALANCE',
       'AMT_CREDIT_LIMIT_ACTUAL', 'AMT_DRAWINGS_ATM_CURRENT',
       'AMT_DRAWINGS_CURRENT', 'AMT_DRAWINGS_OTHER_CURRENT',
       'AMT_DRAWINGS_POS_CURRENT', 'AMT_INST_MIN_REGULARITY',
       'AMT_PAYMENT_CURRENT', 'AMT_PAYMENT_TOTAL_CURRENT',
       'AMT_RECEIVABLE_PRINCIPAL', 'AMT_RECIVABLE',
       'AMT_TOTAL_RECEIVABLE', 'CNT_DRAWINGS_ATM_CURRENT',
       'CNT_DRAWINGS_CURRENT', 'CNT_DRAWINGS_OTHER_CURRENT',
       'CNT_DRAWINGS_POS_CURRENT', 'CNT_INSTALMENT_MATURE_CUM',
       'NAME_CONTRACT_STATUS', 'SK_DPD', 'SK_DPD_DEF'], dtype=object)

In [28]:
df_credit_card_balance.head(3)

| | SK_ID_PREV | SK_ID_CURR | MONTHS_BALANCE | AMT_BALANCE | AMT_CREDIT_LIMIT_ACTUAL | AMT_DRAWINGS_ATM_CURRENT | AMT_DRAWINGS_CURRENT | AMT_DRAWINGS_OTHER_CURRENT | AMT_DRAWINGS_POS_CURRENT | AMT_INST_MIN_REGULARITY | ... | AMT_RECIVABLE | AMT_TOTAL_RECEIVABLE | CNT_DRAWINGS_ATM_CURRENT | CNT_DRAWINGS_CURRENT | CNT_DRAWINGS_OTHER_CURRENT | CNT_DRAWINGS_POS_CURRENT | CNT_INSTALMENT_MATURE_CUM | NAME_CONTRACT_STATUS | SK_DPD | SK_DPD_DEF |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2562384 | 378907 | -6 | 56.97 | 135000 | 0.0 | 877.5 | 0.0 | 877.5 | 1700.325 | ... | 0.0 | 0.0 | 0.0 | 1 | 0.0 | 1.0 | 35.0 | Active | 0 | 0 |
| 1 | 2582071 | 363914 | -1 | 63975.555 | 45000 | 2250.0 | 2250.0 | 0.0 | 0.0 | 2250.0 | ... | 64875.555 | 64875.555 | 1.0 | 1 | 0.0 | 0.0 | 69.0 | Active | 0 | 0 |
| 2 | 1740877 | 371185 | -7 | 31815.225 | 450000 | 0.0 | 0.0 | 0.0 | 0.0 | 2250.0 | ... | 31460.085 | 31460.085 | 0.0 | 0 | 0.0 | 0.0 | 30.0 | Active | 0 | 0 |


In [29]:
print('Source №6 has: ',df_credit_card_balance.shape[0], ' records and', df_credit_card_balance.shape[1], ' features')

Source №6 has:  3840312  records and 23  features


In [81]:
# Count number of Null values and their percentage in == df_credit_card_balance == dataset.
order = df_credit_card_balance.isnull().sum().sort_values(ascending = False)
percent = (df_credit_card_balance.isnull().sum()/df_credit_card_balance.isnull().count()*100).sort_values(ascending = False)
missing_df_credit_card_balance  = pd.concat([order, percent], axis=1, keys=['Null count', 'Null count (%)'])
missing_df_credit_card_balance.head(11)

| | Null count | Null count (%) |
|---|---|---|
| AMT_PAYMENT_CURRENT | 767988 | 19.998063 |
| AMT_DRAWINGS_OTHER_CURRENT | 749816 | 19.524872 |
| CNT_DRAWINGS_POS_CURRENT | 749816 | 19.524872 |
| CNT_DRAWINGS_OTHER_CURRENT | 749816 | 19.524872 |
| CNT_DRAWINGS_ATM_CURRENT | 749816 | 19.524872 |
| AMT_DRAWINGS_ATM_CURRENT | 749816 | 19.524872 |
| AMT_DRAWINGS_POS_CURRENT | 749816 | 19.524872 |
| CNT_INSTALMENT_MATURE_CUM | 305236 | 7.948208 |
| AMT_INST_MIN_REGULARITY | 305236 | 7.948208 |
| SK_DPD_DEF | 0 | 0.0 |


<img src="pic/lin.jpg">

# <span style="color:#ffad01">Source №7

<span style="color: #be0119"><b>installments_payments</b></span>: payment <b>history for previous loans at Home Credit</b>. There is one row for every payment made and one row for every missed payment.
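A payment history like this lends itself to two simple derived quantities: how late each payment was, and how much of the instalment was left unpaid. The sketch below computes both on made-up rows. It assumes that <b>DAYS_INSTALMENT</b> is the due date and <b>DAYS_ENTRY_PAYMENT</b> the actual payment date, both in days relative to the application, and that a null <b>AMT_PAYMENT</b> means no payment was recorded. These readings are assumptions, and the derived column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df_installments_payments; values are invented.
toy_inst = pd.DataFrame({
    'SK_ID_PREV': [1, 1, 2],
    'DAYS_INSTALMENT': [-1180.0, -1150.0, -63.0],
    'DAYS_ENTRY_PAYMENT': [-1187.0, -1145.0, np.nan],
    'AMT_INSTALMENT': [6948.36, 6948.36, 25425.0],
    'AMT_PAYMENT': [6948.36, 6000.0, np.nan],
})

# Positive = paid after the due date, negative = paid early.
toy_inst['DAYS_LATE'] = toy_inst['DAYS_ENTRY_PAYMENT'] - toy_inst['DAYS_INSTALMENT']

# Rows with no recorded payment amount.
toy_inst['NO_PAYMENT_RECORDED'] = toy_inst['AMT_PAYMENT'].isnull()

# Unpaid part of the instalment, treating a missing payment as zero.
toy_inst['AMT_SHORTFALL'] = toy_inst['AMT_INSTALMENT'] - toy_inst['AMT_PAYMENT'].fillna(0)
print(toy_inst[['DAYS_LATE', 'NO_PAYMENT_RECORDED', 'AMT_SHORTFALL']])
```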

In [11]:
df_installments_payments.columns.values

array(['SK_ID_PREV', 'SK_ID_CURR', 'NUM_INSTALMENT_VERSION',
       'NUM_INSTALMENT_NUMBER', 'DAYS_INSTALMENT', 'DAYS_ENTRY_PAYMENT',
       'AMT_INSTALMENT', 'AMT_PAYMENT'], dtype=object)

In [32]:
df_installments_payments.head(3)

| | SK_ID_PREV | SK_ID_CURR | NUM_INSTALMENT_VERSION | NUM_INSTALMENT_NUMBER | DAYS_INSTALMENT | DAYS_ENTRY_PAYMENT | AMT_INSTALMENT | AMT_PAYMENT |
|---|---|---|---|---|---|---|---|---|
| 0 | 1054186 | 161674 | 1.0 | 6 | -1180.0 | -1187.0 | 6948.36 | 6948.36 |
| 1 | 1330831 | 151639 | 0.0 | 34 | -2156.0 | -2156.0 | 1716.525 | 1716.525 |
| 2 | 2085231 | 193053 | 2.0 | 1 | -63.0 | -63.0 | 25425.0 | 25425.0 |


In [33]:
print('Source №7 has: ',df_installments_payments.shape[0], ' records and', df_installments_payments.shape[1], ' features')

Source №7 has:  13605401  records and 8  features


In [83]:
# Count number of Null values and their percentage in == df_installments_payments == dataset.
order = df_installments_payments.isnull().sum().sort_values(ascending = False)
percent = (df_installments_payments.isnull().sum()/df_installments_payments.isnull().count()*100).sort_values(ascending = False)
missing_df_installments_payments  = pd.concat([order, percent], axis=1, keys=['Null count', 'Null count (%)'])
missing_df_installments_payments.head(4)

| | Null count | Null count (%) |
|---|---|---|
| AMT_PAYMENT | 2905 | 0.021352 |
| DAYS_ENTRY_PAYMENT | 2905 | 0.021352 |
| AMT_INSTALMENT | 0 | 0.0 |
| DAYS_INSTALMENT | 0 | 0.0 |


<img src="pic/lin.jpg">

[GO NEXT >>](./Chapter 2__Exploratory Data Analysis.ipynb)