This is essentially a compilation of the information given on the [data page](https://www.kaggle.com/c/home-credit-default-risk/data) of the competition, to which I added the variable descriptions from the `HomeCredit_columns_description.csv` file. I find this format easier for looking up information.

In [None]:
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt
import os
plt.rcParams["patch.force_edgecolor"] = True
plt.style.use('fivethirtyeight')
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "last_expr"
pd.options.display.max_columns = 200
print(os.listdir("../input"))

In [None]:
POS_CASH_balance = pd.read_csv('../input/POS_CASH_balance.csv')
bureau_balance = pd.read_csv('../input/bureau_balance.csv')
application_train = pd.read_csv('../input/application_train.csv')
previous_application = pd.read_csv('../input/previous_application.csv')
installments_payments = pd.read_csv('../input/installments_payments.csv')
credit_card_balance = pd.read_csv('../input/credit_card_balance.csv')
sample_submission = pd.read_csv('../input/sample_submission.csv')
application_test = pd.read_csv('../input/application_test.csv')
bureau = pd.read_csv('../input/bureau.csv')

In [None]:
from IPython.display import display

def get_info(df):
    """Print the shape, a sample, and the dtype / null counts of each column."""
    print('Dataframe dimensions:', df.shape)
    print("File sample:")
    display(df.head())

    # One row per statistic, one column per variable.
    # pd.concat is used because DataFrame.append was removed in pandas 2.0.
    tab_info = pd.concat([
        df.dtypes.to_frame('column type').T,
        df.isnull().sum().to_frame('null values (nb)').T,
        (df.isnull().mean() * 100).to_frame('null values (%)').T,
    ])
    print("Data types and null values:")
    display(tab_info)

## 1. Examining the content of the dataframes
___
### 1.1 Application train

In [None]:
get_info(application_train)

One row represents one loan in the data sample. Every application is indexed by the `SK_ID_CURR` id, and the `TARGET` column indicates whether the client may have payment difficulties (`1`) or not (`0`). The `TARGET` is quite imbalanced: loans with payment difficulties account for:

In [None]:
# Share of rows with TARGET == 1, as a percentage.
print("positive values: {:<5.2f}%".format((application_train['TARGET'] == 1).mean() * 100))

Here comes a description of the variables:
- `NAME_CONTRACT_TYPE`: indicates if loan is cash or revolving
- `CODE_GENDER` / `NAME_FAMILY_STATUS` / `CNT_CHILDREN` / `CNT_FAM_MEMBERS`: client's gender / family status of the client / number of children / number of family members
- `OCCUPATION_TYPE` / `ORGANIZATION_TYPE`: the client's kind of occupation and the employer's organization type
- `DAYS_BIRTH`: Client's age in days at the time of application
- `DAYS_EMPLOYED`: how many days before the application the client started their current employment
- `DAYS_REGISTRATION`: how many days before the application the client changed their registration
- `DAYS_ID_PUBLISH`: how many days before the application the client changed their identity document
- `NAME_EDUCATION_TYPE`: level of highest education the client achieved
- `FLAG_OWN_CAR` (`OWN_CAR_AGE`) and `FLAG_OWN_REALTY`: respectively flag if client owns a car (and its age) or a house (flat)
- `FLAG_PHONE` / `FLAG_MOBIL` / `FLAG_WORK_PHONE` / `FLAG_EMP_PHONE`: whether the client provided a home / mobile / work / employer's phone
- `FLAG_CONT_MOBILE`: whether the mobile phone was reachable
- `DAYS_LAST_PHONE_CHANGE`: days since last phone change
- `FLAG_EMAIL`: whether the client provided an email address
- `REG_REGION_NOT_LIVE_REGION`: flag if client's permanent address does not match contact address (at region level)
- `REG_REGION_NOT_WORK_REGION`: flag if client's permanent address does not match work address (at region level)
- `LIVE_REGION_NOT_WORK_REGION`: flag if client's contact address does not match work address (at region level)
- `REG_CITY_NOT_LIVE_CITY`: flag if client's permanent address does not match contact address (at city level)
- `REG_CITY_NOT_WORK_CITY`: flag if client's permanent address does not match work address (at city level)
- `LIVE_CITY_NOT_WORK_CITY`: flag if client's contact address does not match work address (at city level)
- `NAME_HOUSING_TYPE`: housing situation of the client (renting, living with parents, ...)
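
The `DAYS_*` variables above are all expressed relative to the application date. A minimal sketch of turning `DAYS_BIRTH` into an age in years, assuming (this is not stated above) that the values are stored as negative day counts; the toy frame and the `AGE_YEARS` column name are made up for illustration:

```python
import pandas as pd

# Toy data; assumes DAYS_* values are negative (days before the application).
toy_app = pd.DataFrame({
    "SK_ID_CURR": [1, 2, 3],
    "DAYS_BIRTH": [-9461, -16765, -19046],
})

# Hypothetical helper column: age in years at application time.
toy_app["AGE_YEARS"] = -toy_app["DAYS_BIRTH"] / 365.25
print(toy_app)
```

The same sign flip and division would apply to the other `DAYS_*` columns if they follow the same convention, which is worth checking on the real data first.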
____
Information about the building where the client lives. The `_TAG` suffix stands for one of the average (`_AVG`), mode (`_MODE`) and median (`_MEDI`):
- `APARTMENTS_TAG`: apartment size
- `BASEMENTAREA_TAG`:  ???
- `YEARS_BEGINEXPLUATATION_TAG`: beginning of the building's exploitation
- `YEARS_BUILD_TAG`: age of building
- `COMMONAREA_TAG`: common area 
- `ELEVATORS_TAG`: number of elevators
- `ENTRANCES_TAG`: number of entrances
- `FLOORSMIN_TAG` / `FLOORSMAX_TAG`: number of floors
- `LANDAREA_TAG`: ???
- `LIVINGAPARTMENTS_TAG` / `NONLIVINGAPARTMENTS_TAG`: ???
- `LIVINGAREA_TAG` /  `NONLIVINGAREA_TAG` : living area 

Plus a few variables that only come with the `_MODE` suffix:
- `FONDKAPREMONT_MODE`: ???
- `HOUSETYPE_MODE`: house type
- `TOTALAREA_MODE`: total area
- `WALLSMATERIAL_MODE`: walls material
- `EMERGENCYSTATE_MODE`: emergency state
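
Because the housing variables share the `_AVG` / `_MODE` / `_MEDI` suffix scheme, they can be picked out by name. A small sketch on a made-up frame (the values, and the choice of columns, are for illustration only):

```python
import pandas as pd

# Toy frame with a few housing columns (values are made up).
toy_housing = pd.DataFrame({
    "APARTMENTS_AVG": [0.02, 0.10],
    "APARTMENTS_MODE": [0.03, 0.09],
    "APARTMENTS_MEDI": [0.02, 0.10],
    "ELEVATORS_AVG": [0.0, 0.08],
    "HOUSETYPE_MODE": ["block of flats", "block of flats"],
})

# Group the column names by their suffix.
cols_by_suffix = {
    suffix: [c for c in toy_housing.columns if c.endswith(suffix)]
    for suffix in ("_AVG", "_MODE", "_MEDI")
}
print(cols_by_suffix)
```

Note that the `_MODE` group also catches the variables that exist only with that suffix, such as `HOUSETYPE_MODE`.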
____
- `REGION_POPULATION_RELATIVE`: normalized population of region where client lives
- `AMT_INCOME_TOTAL` and `NAME_INCOME_TYPE`: income of the client and the client's income type (businessman, working, maternity leave, ...)
- `AMT_CREDIT`: amount of the loan
- `AMT_ANNUITY`: loan annuity
- `AMT_GOODS_PRICE`: for consumer loans it is the price of the goods for which the loan is given
- `NAME_TYPE_SUITE`: who accompanied the client when applying for the loan
- `WEEKDAY_APPR_PROCESS_START` / `HOUR_APPR_PROCESS_START` : on which day of the week / hour did the client apply for the loan
- `REGION_RATING_CLIENT` / `REGION_RATING_CLIENT_W_CITY`: Home Credit's rating of the region where the client lives / the same rating taking the city into account
- `EXT_SOURCE_1` / `EXT_SOURCE_2` / `EXT_SOURCE_3` : normalized scores from external data source
- `OBS_30_CNT_SOCIAL_CIRCLE` / `DEF_30_CNT_SOCIAL_CIRCLE`: number of observations of the client's social surroundings with an observable 30 days past due (DPD) default / number of those that defaulted at 30 DPD
- `OBS_60_CNT_SOCIAL_CIRCLE` / `DEF_60_CNT_SOCIAL_CIRCLE`: number of observations of the client's social surroundings with an observable 60 days past due (DPD) default / number of those that defaulted at 60 DPD
- `FLAG_DOCUMENT_X`: if client provided document nºX (X $\in$ [2-21])
- `AMT_REQ_CREDIT_BUREAU_tag` with `tag` in  [`HOUR`, `DAY`, `WEEK`, `MON`, `QRT`, `YEAR`]:  number of enquiries to Credit Bureau about the client one hour (day, week, ...) before application
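
As an illustration of the `FLAG_DOCUMENT_X` naming pattern, here is a minimal sketch that builds the column names for X from 2 to 21 and counts the documents provided per client. The toy data and the `N_DOCUMENTS` column are hypothetical:

```python
import pandas as pd

# Column names follow the FLAG_DOCUMENT_X pattern with X from 2 to 21.
doc_cols = [f"FLAG_DOCUMENT_{i}" for i in range(2, 22)]

# Toy data: two clients, all flags 0 except a few set by hand.
toy_docs = pd.DataFrame(0, index=[0, 1], columns=doc_cols)
toy_docs.loc[0, ["FLAG_DOCUMENT_3", "FLAG_DOCUMENT_8"]] = 1
toy_docs.loc[1, "FLAG_DOCUMENT_3"] = 1

# Hypothetical summary feature: number of documents provided per client.
toy_docs["N_DOCUMENTS"] = toy_docs[doc_cols].sum(axis=1)
print(toy_docs["N_DOCUMENTS"].tolist())
```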

___
### 1.2 Bureau

All of the client's previous credits provided by other financial institutions that were reported to the Credit Bureau (for clients who have a loan in the sample).

In [None]:
get_info(bureau)

The variables are:

- `SK_ID_CURR`: loan ID
- `SK_BUREAU_ID`: Recoded ID of previous Credit Bureau credit related to our loan 
- `CREDIT_ACTIVE`: Status of the Credit Bureau (CB) reported credits
- `CREDIT_CURRENCY`: Recoded currency of the Credit Bureau credit
- `DAYS_CREDIT`: How many days before current application did client apply for Credit Bureau credit
- `CREDIT_DAY_OVERDUE`:  Number of days past due on CB credit at the time of application for related loan in our sample
- `DAYS_CREDIT_ENDDATE`: Remaining duration of CB credit (in days) at the time of application in Home Credit
- `DAYS_ENDDATE_FACT`: Days since CB credit ended at the time of application in Home Credit (only for closed credit)
- `AMT_CREDIT_MAX_OVERDUE`: Maximal amount overdue on the Credit Bureau credit so far
- `CNT_CREDIT_PROLONG`: How many times was the Credit Bureau credit prolonged
- `AMT_CREDIT_SUM`:  Current credit amount for the Credit Bureau credit
- `AMT_CREDIT_SUM_DEBT`: Current debt on Credit Bureau credit
- `AMT_CREDIT_SUM_LIMIT`: Current credit limit of credit card reported in Credit Bureau
- `AMT_CREDIT_SUM_OVERDUE`:  Current amount overdue on Credit Bureau credit
- `CREDIT_TYPE`: Type of Credit Bureau credit (car, cash, ...)
- `DAYS_CREDIT_UPDATE`: How many days before loan application did last information about the Credit Bureau credit come
- `AMT_ANNUITY`:  Annuity of the Credit Bureau credit
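
Since one loan (`SK_ID_CURR`) can have several Credit Bureau records, this table usually has to be aggregated before it can be joined to the application data. A sketch of one possible aggregation on toy data; the `'Active'` / `'Closed'` labels and the output column names are assumptions made for the example:

```python
import pandas as pd

# Toy bureau records; the 'Active' / 'Closed' labels are assumed values.
toy_bureau = pd.DataFrame({
    "SK_ID_CURR": [100, 100, 100, 200],
    "CREDIT_ACTIVE": ["Active", "Closed", "Active", "Closed"],
    "AMT_CREDIT_SUM_DEBT": [5000.0, 0.0, 1500.0, 0.0],
})

# One row per loan: number of reported credits, how many are active, total debt.
bureau_agg = (
    toy_bureau
    .assign(IS_ACTIVE=toy_bureau["CREDIT_ACTIVE"].eq("Active"))
    .groupby("SK_ID_CURR")
    .agg(
        N_CREDITS=("CREDIT_ACTIVE", "size"),
        N_ACTIVE=("IS_ACTIVE", "sum"),
        TOTAL_DEBT=("AMT_CREDIT_SUM_DEBT", "sum"),
    )
)
print(bureau_agg)
```

The result is indexed by `SK_ID_CURR`, so it could then be merged onto the application table on that key.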

___
### 1.3 POS_CASH_balance

Monthly balance snapshots of previous POS (point of sales) and cash loans that the applicant had with Home Credit. This table has one row for each month of history of every previous credit in Home Credit (consumer credit and cash loans).

In [None]:
get_info(POS_CASH_balance)

The variables are:

- `SK_ID_CURR`: loan ID
- `SK_ID_PREV`: ID of previous credit in Home Credit
- `MONTH_BALANCE`: Month of balance relative to application date  
- `CNT_INSTALMENT`: Term of previous credit (can change over time)
- `CNT_INSTALMENT_FUTURE`:  Installments left to pay on the previous credit
- `NAME_CONTRACT_STATUS`: Contract status during the month
- `SK_DPD`: days past due during the month of previous credit
- `SK_DPD_DEF`: days past due during the month with tolerance (debts with low loan amounts are ignored) of the previous credit
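
With one row per month and per previous credit, two common reductions are the most recent snapshot and the worst delay over the history. A minimal sketch on toy data, assuming that `MONTH_BALANCE` is negative and that values closer to 0 are more recent:

```python
import pandas as pd

# Toy monthly snapshots; assumes MONTH_BALANCE is negative, with values
# closer to 0 being more recent.
toy_pos = pd.DataFrame({
    "SK_ID_PREV": [10, 10, 10, 20, 20],
    "MONTH_BALANCE": [-3, -2, -1, -5, -4],
    "CNT_INSTALMENT_FUTURE": [2, 1, 0, 6, 5],
    "SK_DPD": [0, 4, 0, 0, 0],
})

# Most recent snapshot of each previous credit.
latest = toy_pos.loc[toy_pos.groupby("SK_ID_PREV")["MONTH_BALANCE"].idxmax()]

# Worst delay observed over the whole history of each previous credit.
max_dpd = toy_pos.groupby("SK_ID_PREV")["SK_DPD"].max()
print(latest)
print(max_dpd)
```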

___
### 1.4 Previous application
For each application in the `application_train` and `application_test` files, the history of previous applications is listed.

In [None]:
get_info(previous_application)

The variables are:

- `SK_ID_CURR`: loan ID
- `SK_ID_PREV`: ID of previous credit in Home Credit
- `NAME_CONTRACT_TYPE`: Contract product type (Cash loan, consumer loan [POS] ,...) of the previous application
- `AMT_ANNUITY`: Annuity of previous application
- `AMT_APPLICATION`: For how much credit did client ask on the previous application
- `AMT_CREDIT`: Final credit amount on the previous application. It differs from `AMT_APPLICATION`: `AMT_APPLICATION` is the amount the client initially applied for, whereas `AMT_CREDIT` is the amount actually granted, which may have changed during the approval process
- `AMT_DOWN_PAYMENT`: Down payment on the previous application
- `AMT_GOODS_PRICE`: Price of the goods that the client asked for (if applicable) on the previous application
- `WEEKDAY_APPR_PROCESS_START`: On which day of the week did the client apply for previous application
- `HOUR_APPR_PROCESS_START`: Approximately at what hour of the day did the client apply for the previous application
- `FLAG_LAST_APPL_PER_CONTRACT`: Flag if it was last application for the previous contract. Sometimes by mistake of client or our clerk there could be more applications for one single contract
- `NFLAG_LAST_APPL_IN_DAY`: Flag if the application was the last application per day of the client. Sometimes clients apply for more applications a day. Rarely it could also be error in our system that one application is in the database twice
- `NFLAG_MICRO_CASH`: Flag Micro finance loan
- `RATE_DOWN_PAYMENT`: Down payment rate normalized on previous credit
- `RATE_INTEREST_PRIMARY`: Interest rate normalized on previous credit
- `RATE_INTEREST_PRIVILEGED`: Interest rate normalized on previous credit
- `NAME_CASH_LOAN_PURPOSE`: Purpose of the cash loan
- `NAME_CONTRACT_STATUS`: Contract status (approved, cancelled, ...) of previous application
- `DAYS_DECISION`: Relative to current application when was the decision about previous application made, time only relative to the application
- `NAME_PAYMENT_TYPE`: Payment method that client chose to pay for the previous application 
- `CODE_REJECT_REASON`: Why was the previous application rejected
- `NAME_TYPE_SUITE`: Who accompanied client when applying for the previous application
- `NAME_CLIENT_TYPE`: Was the client old or new client when applying for the previous application
- `NAME_GOODS_CATEGORY`: What kind of goods did the client apply for in the previous application
- `NAME_PORTFOLIO`: Was the previous application for CASH, POS, CAR
- `NAME_PRODUCT_TYPE`: Was the previous application x-sell or walk-in
- `CHANNEL_TYPE`: Through which channel we acquired the client on the previous application
- `SELLERPLACE_AREA`: Selling area of seller place of the previous application
- `NAME_SELLER_INDUSTRY`: The industry of the seller
- `CNT_PAYMENT`: Term of previous credit at application of the previous application
- `NAME_YIELD_GROUP`: Grouped interest rate into small medium and high of the previous application
- `PRODUCT_COMBINATION`: Detailed product combination of the previous application
- `DAYS_FIRST_DRAWING`: Relative to application date of current application when was the first disbursement of the previous application, time only relative to the application
- `DAYS_FIRST_DUE`: Relative to application date of current application when was the first due supposed to be of the previous application, time only relative to the application
- `DAYS_LAST_DUE_1ST_VERSION`: Relative to application date of current application when was the first due of the previous application, time only relative to the application
- `DAYS_LAST_DUE`: Relative to application date of current application when was the last due date of the previous application, time only relative to the application
- `DAYS_TERMINATION`: Relative to application date of current application when was the expected termination of the previous application, time only relative to the application
- `NFLAG_INSURED_ON_APPROVAL`: Did the client request insurance during the previous application
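
The difference between `AMT_APPLICATION` (amount requested) and `AMT_CREDIT` (amount finally granted) can be summarised as a ratio. A sketch on made-up amounts; the `CREDIT_TO_APPLICATION` column is a hypothetical name, and mapping a zero request to NaN is just one possible choice:

```python
import pandas as pd

# Toy previous applications (amounts are made up).
toy_prev = pd.DataFrame({
    "SK_ID_PREV": [1, 2, 3],
    "AMT_APPLICATION": [100000.0, 50000.0, 0.0],
    "AMT_CREDIT": [90000.0, 55000.0, 0.0],
})

# Hypothetical ratio feature; a zero requested amount is mapped to NaN
# to avoid dividing by zero.
requested = toy_prev["AMT_APPLICATION"].where(toy_prev["AMT_APPLICATION"] > 0)
toy_prev["CREDIT_TO_APPLICATION"] = toy_prev["AMT_CREDIT"] / requested
print(toy_prev)
```

A ratio below 1 means the client received less than requested, above 1 means more.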

___
### 1.5 Installments payments
Repayment history for the previously disbursed credits in Home Credit.

In [None]:
get_info(installments_payments)

The variables are:

- `SK_ID_CURR`: loan ID
- `SK_ID_PREV`: ID of previous credit in Home Credit
- `NUM_INSTALMENT_VERSION`: Version of installment calendar (0 is for credit card) of previous credit. Change of installment version from month to month signifies that some parameter of payment calendar has changed
- `NUM_INSTALMENT_NUMBER`: On which installment we observe payment
- `DAYS_INSTALMENT`: When the installment of previous credit was supposed to be paid (relative to application date of current loan)
- `DAYS_ENTRY_PAYMENT`: When the installment of previous credit was actually paid (relative to application date of current loan)
- `AMT_INSTALMENT`: What was the prescribed installment amount of previous credit on this installment
- `AMT_PAYMENT`: What the client actually paid on previous credit on this installment
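
The due and actual payment dates, together with the prescribed and paid amounts, lend themselves to two simple derived quantities: how late a payment was and how much of it was missing. A minimal sketch on toy data, assuming the due-date column is named `DAYS_INSTALMENT` and that days are stored as negative values; `DAYS_LATE` and `AMT_SHORTFALL` are hypothetical names:

```python
import pandas as pd

# Toy repayment records; the due-date column is assumed to be named
# DAYS_INSTALMENT, and days are assumed negative (before the application).
toy_inst = pd.DataFrame({
    "SK_ID_PREV": [10, 10, 20],
    "DAYS_INSTALMENT": [-60, -30, -45],
    "DAYS_ENTRY_PAYMENT": [-62, -25, -45],
    "AMT_INSTALMENT": [1000.0, 1000.0, 500.0],
    "AMT_PAYMENT": [1000.0, 800.0, 500.0],
})

# Positive values mean the payment came after its due date.
toy_inst["DAYS_LATE"] = toy_inst["DAYS_ENTRY_PAYMENT"] - toy_inst["DAYS_INSTALMENT"]
# Positive values mean the client paid less than prescribed.
toy_inst["AMT_SHORTFALL"] = toy_inst["AMT_INSTALMENT"] - toy_inst["AMT_PAYMENT"]
print(toy_inst[["SK_ID_PREV", "DAYS_LATE", "AMT_SHORTFALL"]])
```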