# Home Credit Default Risk
Home Credit is an international non-bank, consumer finance group. The company operates in 14 countries and focuses on lending primarily to people with little or no credit history.

*Credit* is the trust which allows one party to provide money or resources to another party where that second party does not reimburse the first party immediately (*thereby generating a debt*), but instead promises either to repay or return those resources at a later date. A *credit bureau* is an agency that collects account information from various creditors.

This notebook aims to **predict how capable an applicant is of repaying a loan**, using the datasets provided by Home Credit.

In [3]:
# Manipulating data
import numpy as np
import pandas as pd

# Managing files
import os

## The different datasets

* **application_train.csv** and **application_test.csv**
  * The main table in two files. The train set contains __labels/targets__ (0: the loan was *repaid* or 1: the loan was *not* repaid), and the test set does not.
  * Static data for all applications. One row represents one loan.  
  
  
* **bureau.csv**
  * All of the client's previous credits (loans) provided by other financial institutions and reported to the Credit Bureau.
  * For every loan in our sample, there are as many rows as the number of credits the client had in the Credit Bureau before the application date. In other words, each previous credit has its own row in bureau, but one loan in the application data can have multiple previous credits.
  
  
* **bureau_balance.csv**
  * Monthly balances of previous credits in Credit Bureau (**bureau.csv**).
  * This table has one row for each month of history of every previous credit reported to Credit Bureau - the table has (*# loans in sample * # of relative previous credits * # of months where we have some history observable for the previous credits*) rows. In other words, each row is one month of a previous credit, and a single previous credit can have multiple rows, *one for each month* of the credit length.
  
  
* **POS_CASH_balance.csv**
  * Monthly balance snapshots/data of previous **POS (point of sales) and cash loans** that the applicant had with Home Credit.
  * This table has one row for each month of history of every previous credit in Home Credit (consumer credit and cash loans) related to loans in our sample – i.e. the table has (*# loans in sample * # of relative previous credits * # of months in which we have some history observable for the previous credits*) rows.
  
  
* **credit_card_balance.csv**
  * Monthly balance snapshots of previous **credit cards** that the applicant has with Home Credit.
  * This table has one row for each month of history of every previous credit card in Home Credit related to loans in our sample - the table has (*# loans in sample * # of relative previous credit cards * # of months where we have some history observable for the previous credit card*) rows. In other words, each row is one month of a credit card balance, and a single credit card can have many rows.
  
  
* **previous_application.csv**
  * All previous applications for Home Credit loans of *clients who have loans* in our sample.
  * There is one row for each previous application related to loans in our data sample. Each current loan in the application data can have multiple previous loans. Each previous application has one row and is identified by the feature SK_ID_PREV. 
  
  
* **installments_payments.csv** (*A sum of money paid in small parts in a fixed period of time or a single payment within a staged payment plan of a loan*)
  * Repayment history for the previously disbursed credits in Home Credit related to the loans in our sample.
  * There is a) one row for **every payment that was made** plus b) one row for each **missed payment**.
  * One row is equivalent to one payment of one installment OR one installment corresponding to one payment of one previous Home Credit credit related to loans in our sample.
  
  
* **HomeCredit_columns_description.csv**
  * This file contains descriptions for the columns in the various data files.
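
The one-to-many relationships described above mean that a secondary table such as **bureau.csv** has to be collapsed to one row per application before it can be joined to the main table. The following is a minimal sketch of that idea on toy data, not the method used later in this notebook. The key names `SK_ID_CURR` and `SK_ID_BUREAU`, the `AMT_CREDIT_SUM` column and all values are assumptions made for illustration; only `SK_ID_PREV` is named in the descriptions above.

```python
import pandas as pd

# Toy stand-ins for application_train.csv and bureau.csv.
# Column names and values are illustrative assumptions, not real data.
applications = pd.DataFrame({
    "SK_ID_CURR": [100, 101, 102],   # assumed key of the current loan
    "TARGET": [0, 1, 0],             # 0: repaid, 1: not repaid
})
bureau = pd.DataFrame({
    "SK_ID_CURR": [100, 100, 101],   # loan 100 has two previous credits
    "SK_ID_BUREAU": [1, 2, 3],       # assumed key of the previous credit
    "AMT_CREDIT_SUM": [5000.0, 12000.0, 7000.0],
})

# Collapse the one-to-many table to one row per current loan.
bureau_agg = (
    bureau.groupby("SK_ID_CURR")
    .agg(
        BUREAU_CREDIT_COUNT=("SK_ID_BUREAU", "count"),
        BUREAU_CREDIT_SUM_MEAN=("AMT_CREDIT_SUM", "mean"),
    )
    .reset_index()
)

# A left join keeps loans without any bureau history (aggregates become NaN).
merged = applications.merge(bureau_agg, on="SK_ID_CURR", how="left")
print(merged)
```

The same pattern would apply one level deeper: monthly tables such as **bureau_balance.csv** would first be aggregated per previous credit, then per current loan.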

![the datasets](misc/datasets.png)


In [5]:
print(os.listdir("datasets/"))

['bureau_balance.csv', 'credit_card_balance.csv', 'application_test.csv', 'POS_CASH_balance.csv', 'installments_payments.csv', 'bureau.csv', 'HomeCredit_columns_description.csv', 'previous_application.csv', 'application_train.csv', 'sample_submission.csv']


In [6]:
X_train = pd.read_csv('datasets/application_train.csv')
X_test = pd.read_csv('datasets/application_test.csv')
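
Once both files are loaded, it can be worth checking that the train set carries the label column and the test set does not, as described in the dataset overview. The helper below is a hypothetical sketch: `summarize_split` is not part of pandas or of this notebook, the label column name `TARGET` is an assumption, and the toy frames stand in for the real CSV files.

```python
import pandas as pd

def summarize_split(train, test, target="TARGET"):
    """Return basic facts about a train/test pair of application tables.

    Hypothetical helper; `target` is an assumed name for the label column.
    """
    feature_cols = [c for c in train.columns if c != target]
    return {
        "train_shape": train.shape,
        "test_shape": test.shape,
        "target_in_test": target in test.columns,
        # True when both tables share the same feature columns, in order.
        "same_features": feature_cols == list(test.columns),
        # Share of loans labelled 1 (not repaid) in the train set.
        "default_rate": float(train[target].mean()),
    }

# Toy data standing in for the two application files.
toy_train = pd.DataFrame({
    "SK_ID_CURR": [1, 2, 3, 4],
    "AMT_CREDIT": [1000.0, 2500.0, 4000.0, 1500.0],
    "TARGET": [0, 0, 1, 0],
})
toy_test = pd.DataFrame({
    "SK_ID_CURR": [5, 6],
    "AMT_CREDIT": [3000.0, 800.0],
})

summary = summarize_split(toy_train, toy_test)
print(summary)
```

With the real data, the call would be `summarize_split(X_train, X_test)`, provided the label column is indeed named `TARGET`.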