# Modeling Personal Loan Delinquency with LendingClub Data

## Data

In [1]:
from utils.utils import load_dataframe, preprocess, split_data
from utils.models import build_mle_matrix, build_mc_no_priors, build_mc_with_priors
from utils.inference import compute_mle, infer_mc_no_priors, infer_mc_with_priors



In [None]:
df = load_dataframe()

Loading raw data from cache...
Retrieved 40,268,594 rows, 4 columns in 2.69 seconds


Our variable of interest is `loan_status`, which takes one of eight possible states. The descriptions below are the Loan Status Descriptions from the LendingClub [website](https://help.lendingclub.com/hc/en-us/articles/215488038-What-do-the-different-Note-statuses-mean-):

- **Current**: Loan is up to date on all outstanding payments. 

- **Fully Paid**: Loan has been fully repaid, either at the expiration of the 3- or 5-year term or as a result of a prepayment.

- **Late (16-30)**: Loan has not been current for 16 to 30 days.

- **Late (31-120)**: Loan has not been current for 31 to 120 days.
 
- **Charged Off**: Loan for which there is no longer a reasonable expectation of further payments. Upon Charge Off, the remaining principal balance of the Note is deducted from the account balance. Charge Off typically occurs when a loan is 120 days or more past due and there is no reasonable expectation of sufficient payment to prevent the charge off. Loans for which borrowers have filed for bankruptcy may be charged off earlier based on the date of bankruptcy notification. 

- **Default**: Loan has not been current for an extended period of time. The difference between Default and Charged Off is explained [here](https://help.lendingclub.com/hc/en-us/articles/216127747).

- **In Grace Period**: Loan is past due but within the 15-day grace period. 

- **Issued**: New loan that has passed all LendingClub reviews, received full funding, and has been issued.
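
For modeling, the eight statuses above need integer codes. A minimal sketch, assuming the alphabetical ordering used by the transition tables later in this notebook; the names `STATES`, `STATE_TO_INDEX`, and `encode` are illustrative and not part of the `utils` package:

```python
# Hypothetical encoding of the eight loan statuses (alphabetical order,
# matching the row/column order of the transition tables in this notebook).
STATES = [
    "Charged Off",
    "Current",
    "Default",
    "Fully Paid",
    "In Grace Period",
    "Issued",
    "Late (16-30 days)",
    "Late (31-120 days)",
]
STATE_TO_INDEX = {name: i for i, name in enumerate(STATES)}


def encode(statuses):
    """Map a sequence of status names to integer state indices."""
    return [STATE_TO_INDEX[s] for s in statuses]


# One illustrative (made-up) loan history, from issuance to charge-off.
history = ["Issued", "Current", "Late (16-30 days)", "Late (31-120 days)", "Charged Off"]
encoded = encode(history)
print(encoded)
```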

In [None]:
df = preprocess(df)

Mapping column names...
Loading preprocessed data from cache...


In [None]:
x_train, x_test = split_data(df)

## Experiment 1: Markov Model with Maximum Likelihood Estimates

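The maximum likelihood estimate (MLE) of a Markov chain's transition matrix is simply the empirical frequency of each transition: the number of observed moves from one state to another, divided by the total number of moves out of the originating state. Even though we want to solve the problem from a Bayesian perspective, this estimate is a useful baseline to keep in mind for the later experiments.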
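
Written out, with $n_{ij}$ denoting the number of observed transitions from state $i$ to state $j$ (notation introduced here for illustration), the estimate of the transition probability is:

```latex
\hat{T}_{ij} = \frac{n_{ij}}{\sum_{k} n_{ik}}
```

Each row of $\hat{T}$ therefore sums to one whenever state $i$ has at least one observed outgoing transition.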

### 1.1 Model

In [5]:
realized_transitions = build_mle_matrix(df)

Loading transition matrix from hdf5 cache...
Fetching transition matrix took 0.01 seconds


In [6]:
realized_transitions

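| From \ To | Charged Off | Current | Default | Fully Paid | In Grace Period | Issued | Late (16-30 days) | Late (31-120 days) |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| Charged Off | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Current | 774 | 24453702 | 3 | 707322 | 5831 | 0 | 160366 | 62102 |
| Default | 28897 | 147 | 2297 | 71 | 0 | 0 | 4 | 506 |
| Fully Paid | 0 | 0 | 0 | 8063 | 12 | 0 | 101 | 72 |
| In Grace Period | 0 | 276 | 0 | 11 | 22 | 0 | 59 | 41 |
| Issued | 0 | 17206 | 0 | 670 | 1 | 0 | 38 | 1 |
| Late (16-30 days) | 4548 | 32376 | 0 | 2066 | 257 | 0 | 13413 | 119621 |
| Late (31-120 days) | 105934 | 25434 | 29802 | 2146 | 56 | 0 | 3292 | 332762 |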
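
A count matrix like the one above could be built by tallying consecutive status pairs within each loan's history. The sketch below is an assumption about the general technique, not the actual implementation of `build_mle_matrix`; `count_transitions` and the toy sequences are hypothetical:

```python
import numpy as np


def count_transitions(sequences, n_states):
    """Tally transitions: counts[i, j] = number of times state i was followed by state j."""
    counts = np.zeros((n_states, n_states), dtype=np.int64)
    for seq in sequences:
        # Pair each status with the one observed in the next period.
        for src, dst in zip(seq[:-1], seq[1:]):
            counts[src, dst] += 1
    return counts


# Two made-up loan histories, using alphabetical state indices:
# 0=Charged Off, 1=Current, 3=Fully Paid, 5=Issued, 6=Late (16-30), 7=Late (31-120)
toy_sequences = [
    [5, 1, 1, 3],      # Issued -> Current -> Current -> Fully Paid
    [5, 1, 6, 7, 0],   # Issued -> Current -> Late -> Late -> Charged Off
]
toy_counts = count_transitions(toy_sequences, n_states=8)
print(toy_counts)
```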


### 1.2 Inference

In [7]:
compute_mle(realized_transitions)

| From \ To | Charged Off | Current | Default | Fully Paid | In Grace Period | Issued | Late (16-30 days) | Late (31-120 days) |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| Charged Off | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Current | 0.0 | 0.96 | 0.0 | 0.03 | 0.0 | 0.0 | 0.01 | 0.0 |
| Default | 0.91 | 0.0 | 0.07 | 0.0 | 0.0 | 0.0 | 0.0 | 0.02 |
| Fully Paid | 0.0 | 0.0 | 0.0 | 0.98 | 0.0 | 0.0 | 0.01 | 0.01 |
| In Grace Period | 0.0 | 0.67 | 0.0 | 0.03 | 0.05 | 0.0 | 0.14 | 0.1 |
| Issued | 0.0 | 0.96 | 0.0 | 0.04 | 0.0 | 0.0 | 0.0 | 0.0 |
| Late (16-30 days) | 0.03 | 0.19 | 0.0 | 0.01 | 0.0 | 0.0 | 0.08 | 0.69 |
| Late (31-120 days) | 0.21 | 0.05 | 0.06 | 0.0 | 0.0 | 0.0 | 0.01 | 0.67 |
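
The rounded matrix above is consistent with dividing each row of the count matrix by its row total. A minimal sketch, assuming NumPy and that `compute_mle` (whose source is not shown here) does something equivalent; `row_normalize` is a hypothetical name, and leaving all-zero rows as zeros is an assumption chosen to match the `Charged Off` row:

```python
import numpy as np

# Transition counts copied from the output of `realized_transitions` above
# (rows: from-state, columns: to-state, both in alphabetical order).
counts = np.array([
    [0, 0, 0, 0, 0, 0, 0, 0],
    [774, 24453702, 3, 707322, 5831, 0, 160366, 62102],
    [28897, 147, 2297, 71, 0, 0, 4, 506],
    [0, 0, 0, 8063, 12, 0, 101, 72],
    [0, 276, 0, 11, 22, 0, 59, 41],
    [0, 17206, 0, 670, 1, 0, 38, 1],
    [4548, 32376, 0, 2066, 257, 0, 13413, 119621],
    [105934, 25434, 29802, 2146, 56, 0, 3292, 332762],
], dtype=float)


def row_normalize(counts):
    """Divide each row by its total; rows with no observations stay all-zero."""
    counts = np.asarray(counts, dtype=float)
    row_totals = counts.sum(axis=1, keepdims=True)
    return np.divide(
        counts,
        row_totals,
        out=np.zeros_like(counts),   # default value where no division happens
        where=row_totals > 0,        # skip rows whose total is zero
    )


mle = row_normalize(counts)
print(np.round(mle, 2))
```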


### 1.3 Criticism

## Experiment 2: Stationary Markov Chain without Priors

In [8]:
chain_len = max(df.age_of_loan)
n_states = df.loan_status.unique().shape[0]

### 2.1 Model

In [9]:
x, T = build_mc_no_priors(n_states, chain_len)

### 2.2 Inference

In [10]:
infer_mc_no_priors(x_train, x, T, n_states, chain_len)

20000/20000 [100%] ██████████████████████████████ Elapsed: 107s | Loss: 2.416


| | Charged Off | Current | Default | Fully Paid | In Grace Period | Issued | Late (16-30 days) | Late (31-120 days) |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| Charged Off | 0.13 | 0.12 | 0.13 | 0.13 | 0.13 | 0.13 | 0.13 | 0.13 |
| Current | 0.18 | 0.18 | 0.17 | 0.17 | 0.18 | 0.17 | 0.17 | 0.18 |
| Default | 0.14 | 0.14 | 0.14 | 0.14 | 0.14 | 0.14 | 0.14 | 0.14 |
| Fully Paid | 0.11 | 0.11 | 0.1 | 0.11 | 0.1 | 0.11 | 0.11 | 0.1 |
| In Grace Period | 0.11 | 0.11 | 0.11 | 0.11 | 0.11 | 0.11 | 0.11 | 0.11 |
| Issued | 0.1 | 0.11 | 0.1 | 0.11 | 0.1 | 0.11 | 0.1 | 0.1 |
| Late (16-30 days) | 0.1 | 0.1 | 0.11 | 0.1 | 0.1 | 0.1 | 0.1 | 0.11 |
| Late (31-120 days) | 0.14 | 0.13 | 0.14 | 0.13 | 0.13 | 0.13 | 0.13 | 0.13 |
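
One quick sanity check for any estimated transition matrix is to see along which axis it sums to one. The sketch below is a hypothetical helper (`stochastic_axes` is not part of the `utils` package) applied to the rounded values printed above; the tolerance of 0.04 is an assumption that allows for two-decimal rounding across eight entries:

```python
import numpy as np


def stochastic_axes(matrix, tol=0.04):
    """Return (rows_ok, cols_ok): whether all row sums / all column sums are within tol of 1."""
    m = np.asarray(matrix, dtype=float)
    rows_ok = bool(np.all(np.abs(m.sum(axis=1) - 1.0) <= tol))
    cols_ok = bool(np.all(np.abs(m.sum(axis=0) - 1.0) <= tol))
    return rows_ok, cols_ok


# Rounded values copied from the inference output above.
printed = np.array([
    [0.13, 0.12, 0.13, 0.13, 0.13, 0.13, 0.13, 0.13],
    [0.18, 0.18, 0.17, 0.17, 0.18, 0.17, 0.17, 0.18],
    [0.14, 0.14, 0.14, 0.14, 0.14, 0.14, 0.14, 0.14],
    [0.11, 0.11, 0.10, 0.11, 0.10, 0.11, 0.11, 0.10],
    [0.11, 0.11, 0.11, 0.11, 0.11, 0.11, 0.11, 0.11],
    [0.10, 0.11, 0.10, 0.11, 0.10, 0.11, 0.10, 0.10],
    [0.10, 0.10, 0.11, 0.10, 0.10, 0.10, 0.10, 0.11],
    [0.14, 0.13, 0.14, 0.13, 0.13, 0.13, 0.13, 0.13],
])
rows_ok, cols_ok = stochastic_axes(printed)
print("rows sum to 1:", rows_ok, "| columns sum to 1:", cols_ok)
```

On these printed values the columns, not the rows, sum to approximately one, which may indicate that the table is displayed transposed or normalized along the other axis. That reading rests only on the rounded numbers shown here, not on the inference code itself.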


### 2.3 Criticism

## Experiment 3: Stationary Markov Chain with Priors

### 3.1 Model

In [11]:
batch_size = 1000

In [12]:
x, pi_0, pi_T = build_mc_with_priors(n_states, chain_len, batch_size)

### 3.2 Inference (Batch)

In [13]:
infer_mc_with_priors(x_train, x, pi_0, pi_T, n_states, chain_len, batch_size)

133700/133700 [100%] ██████████████████████████████ Elapsed: 539s | Loss: nan


| | Charged Off | Current | Default | Fully Paid | In Grace Period | Issued | Late (16-30 days) | Late (31-120 days) |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| Charged Off | 0.1 | 0.11 | 0.24 | 0.04 | 0.04 | 0.1 | 0.28 | 0.08 |
| Current | 0.03 | 0.09 | 0.17 | 0.15 | 0.02 | 0.03 | 0.31 | 0.2 |
| Default | 0.01 | 0.1 | 0.6 | 0.0 | 0.08 | 0.09 | 0.02 | 0.09 |
| Fully Paid | 0.07 | 0.23 | 0.21 | 0.14 | 0.04 | 0.01 | 0.13 | 0.18 |
| In Grace Period | 0.07 | 0.22 | 0.0 | 0.07 | 0.21 | 0.31 | 0.08 | 0.05 |
| Issued | 0.04 | 0.36 | 0.27 | 0.1 | 0.01 | 0.11 | 0.0 | 0.12 |
| Late (16-30 days) | 0.08 | 0.21 | 0.01 | 0.26 | 0.17 | 0.15 | 0.09 | 0.04 |
| Late (31-120 days) | 0.06 | 0.22 | 0.04 | 0.07 | 0.05 | 0.14 | 0.13 | 0.31 |


### 3.3 Criticism