# Modeling Personal Loan Delinquency with LendingClub Data

## Imports and Settings

In [2]:
import numpy as np
import pandas as pd

from utils.utils import load_dataframe, preprocess, split_data
from utils.models import build_mle_matrix, build_mc_no_priors, build_mc_with_priors
from utils.inference import compute_mle, infer_mc_no_priors, infer_mc_with_priors

In [3]:
pd.set_option('display.max_columns', 200)
pd.set_option('display.max_rows', 200)
pd.set_option('display.max_colwidth', None)  # None means no truncation; -1 is deprecated

## Data Loading and Preprocessing

In [4]:
df = load_dataframe()

Loading raw data from hdf5 cache...
Fetching raw data took 2.78 seconds
Retrieved 40,268,594 rows, 4 columns


Our variable of interest is `loan_status`, which takes one of eight possible states. The descriptions below are the Loan Status Descriptions from the LendingClub [website](https://help.lendingclub.com/hc/en-us/articles/215488038-What-do-the-different-Note-statuses-mean-):

- **Current**: Loan is up to date on all outstanding payments. 

- **Fully paid**: Loan has been fully repaid, either at the expiration of the 3- or 5-year term or as a result of a prepayment.
 
- **Late (16-30)**: Loan has not been current for 16 to 30 days.

- **Late (31-120)**: Loan has not been current for 31 to 120 days.
 
- **Charged Off**: Loan for which there is no longer a reasonable expectation of further payments. Upon Charge Off, the remaining principal balance of the Note is deducted from the account balance. Charge Off typically occurs when a loan is 120 days or more past due and there is no reasonable expectation of sufficient payment to prevent the charge off. Loans for which borrowers have filed for bankruptcy may be charged off earlier based on the date of bankruptcy notification. 

- **Default**: Loan has not been current for an extended period of time. See [here](https://help.lendingclub.com/hc/en-us/articles/216127747) for the difference between Default and Charged Off.

- **In Grace Period**: Loan is past due but within the 15-day grace period. 

- **Issued**: New loan that has passed all LendingClub reviews, received full funding, and has been issued.

In [5]:
df = preprocess(df)

Mapping column names...
Loading preprocessed data from hdf5 cache...
Fetching preprocessed data took 2.58 seconds
Preprocessed 27,641,460 rows, 4 columns


In [6]:
x_train, x_test = split_data(df)

Loading training and test data from hdf5 cache...
Fetching training and test data took 0.69 seconds
Training on 1,337,814 rows, 36 columns
Testing on 148,541 rows, 36 columns


## Experiment 1: Markov Model with Maximum Likelihood Estimates

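The maximum likelihood estimate (MLE) of a Markov chain's transition matrix is simply the empirical frequency of each transition: the number of observed moves from one state to another, divided by the total number of moves out of the starting state. Even though we ultimately want to solve the problem from a Bayesian perspective, this estimate is a useful baseline to keep in mind for the later experiments.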
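
The estimate described above can be sketched in a few lines of pandas. This is a minimal illustration on made-up toy data, not the implementation inside `utils`: the column name `loan_id`, the toy statuses, and the helper names `count_transitions` and `mle_transition_matrix` are all assumptions introduced here. It also assumes the rows of each loan are already sorted by time.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one row per (loan, period), sorted by period within each loan.
toy = pd.DataFrame({
    "loan_id": [1, 1, 1, 2, 2, 2, 2],
    "loan_status": ["Current", "Current", "Fully Paid",
                    "Current", "Late", "Late", "Current"],
})


def count_transitions(frame, id_col="loan_id", state_col="loan_status"):
    """Count observed from->to transitions within each loan's history."""
    # The next status of the same loan; the last observation of a loan has none.
    next_state = frame.groupby(id_col)[state_col].shift(-1)
    pairs = pd.DataFrame({"from": frame[state_col], "to": next_state}).dropna()
    states = sorted(frame[state_col].unique())
    counts = pd.crosstab(pairs["from"], pairs["to"])
    # Make the matrix square so that every state has a row and a column.
    return counts.reindex(index=states, columns=states, fill_value=0)


def mle_transition_matrix(counts):
    """Row-normalize the counts; rows with no observed exits are left as zeros."""
    row_totals = counts.sum(axis=1).replace(0, np.nan)
    return counts.div(row_totals, axis=0).fillna(0.0)


counts = count_transitions(toy)
p_hat = mle_transition_matrix(counts)
print(p_hat.round(2))
```

Leaving rows without any observed exits at zero is one possible convention, chosen here only so the sketch never divides by zero; every other row sums to one.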

### Model

In [7]:
realized_transitions = build_mle_matrix(df)

Loading transition matrix from hdf5 cache...
Fetching transition matrix took 0.01 seconds


In [8]:
realized_transitions

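| From \ To | Charged Off | Current | Default | Fully Paid | In Grace Period | Issued | Late (16-30 days) | Late (31-120 days) |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| Charged Off | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Current | 774 | 24453702 | 3 | 707322 | 5831 | 0 | 160366 | 62102 |
| Default | 28897 | 147 | 2297 | 71 | 0 | 0 | 4 | 506 |
| Fully Paid | 0 | 0 | 0 | 8063 | 12 | 0 | 101 | 72 |
| In Grace Period | 0 | 276 | 0 | 11 | 22 | 0 | 59 | 41 |
| Issued | 0 | 17206 | 0 | 670 | 1 | 0 | 38 | 1 |
| Late (16-30 days) | 4548 | 32376 | 0 | 2066 | 257 | 0 | 13413 | 119621 |
| Late (31-120 days) | 105934 | 25434 | 29802 | 2146 | 56 | 0 | 3292 | 332762 |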


### Inference

In [9]:
compute_mle(realized_transitions)

| From \ To | Charged Off | Current | Default | Fully Paid | In Grace Period | Issued | Late (16-30 days) | Late (31-120 days) |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| Charged Off | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Current | 0.0 | 0.96 | 0.0 | 0.03 | 0.0 | 0.0 | 0.01 | 0.0 |
| Default | 0.91 | 0.0 | 0.07 | 0.0 | 0.0 | 0.0 | 0.0 | 0.02 |
| Fully Paid | 0.0 | 0.0 | 0.0 | 0.98 | 0.0 | 0.0 | 0.01 | 0.01 |
| In Grace Period | 0.0 | 0.67 | 0.0 | 0.03 | 0.05 | 0.0 | 0.14 | 0.1 |
| Issued | 0.0 | 0.96 | 0.0 | 0.04 | 0.0 | 0.0 | 0.0 | 0.0 |
| Late (16-30 days) | 0.03 | 0.19 | 0.0 | 0.01 | 0.0 | 0.0 | 0.08 | 0.69 |
| Late (31-120 days) | 0.21 | 0.05 | 0.06 | 0.0 | 0.0 | 0.0 | 0.01 | 0.67 |
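
As a spot check, the Default row of this output can be recomputed by hand from the counts in `realized_transitions`. The symbols $n_{ij}$ (observed transitions from state $i$ to state $j$) and $\hat{p}_{ij}$ (the estimated transition probability) are notation introduced here for illustration:

$$
\hat{p}_{ij} = \frac{n_{ij}}{\sum_k n_{ik}}, \qquad
\hat{p}_{\text{Default} \to \text{Charged Off}}
= \frac{28897}{28897 + 147 + 2297 + 71 + 0 + 0 + 4 + 506}
= \frac{28897}{31922} \approx 0.91
$$

The same denominator gives $2297 / 31922 \approx 0.07$ for Default to Default and $506 / 31922 \approx 0.02$ for Default to Late (31-120 days), matching the row shown. The Charged Off row has no observed outgoing transitions, so the ratio is undefined there; the zeros in that row presumably reflect a convention applied by `compute_mle`.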


### Criticism

## Experiment 2: Stationary Markov Chain without Priors

In [10]:
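# Chain length: the largest loan age observed in the data
chain_len = max(df.age_of_loan)
# Number of states: the count of distinct loan statuses
n_states = df.loan_status.unique().shape[0]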

### Model

In [11]:
x, T = build_mc_no_priors(n_states, chain_len)

### Inference

In [12]:
infer_mc_no_priors(x_train, x, T, n_states, chain_len)

20000/20000 [100%] ██████████████████████████████ Elapsed: 102s | Loss: 0.273


| | Charged Off | Current | Default | Fully Paid | In Grace Period | Issued | Late (16-30 days) | Late (31-120 days) |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| Charged Off | 0.161711 | 0.162518 | 0.158327 | 0.161573 | 0.162255 | 0.160442 | 0.162124 | 0.160241 |
| Current | 0.08678 | 0.091728 | 0.089983 | 0.090928 | 0.08645 | 0.089542 | 0.086424 | 0.088475 |
| Default | 0.146862 | 0.143745 | 0.150052 | 0.148977 | 0.141731 | 0.151126 | 0.154098 | 0.144713 |
| Fully Paid | 0.164884 | 0.165224 | 0.16132 | 0.166085 | 0.176752 | 0.164623 | 0.161951 | 0.169826 |
| In Grace Period | 0.100248 | 0.100405 | 0.099736 | 0.096463 | 0.100387 | 0.102522 | 0.097602 | 0.094037 |
| Issued | 0.114879 | 0.113395 | 0.112255 | 0.112943 | 0.108466 | 0.108026 | 0.114654 | 0.110054 |
| Late (16-30 days) | 0.144974 | 0.144741 | 0.148639 | 0.142889 | 0.146379 | 0.144438 | 0.142573 | 0.151518 |
| Late (31-120 days) | 0.079662 | 0.078243 | 0.079688 | 0.080142 | 0.07758 | 0.07928 | 0.080574 | 0.081135 |


## Experiment 3: Stationary Markov Chain with Priors

### Model

In [13]:
batch_size = 1000

In [14]:
x, pi_0, pi_T = build_mc_with_priors(n_states, chain_len, batch_size)

### Inference (Batch)

In [15]:
infer_mc_with_priors(x_train, x, pi_0, pi_T, n_states, chain_len, batch_size)

6666/6685 [ 99%] █████████████████████████████  ETA: 0s | Loss: 28943.219

| | Charged Off | Current | Default | Fully Paid | In Grace Period | Issued | Late (16-30 days) | Late (31-120 days) |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| Charged Off | 0.18243 | 0.047205 | 0.14082 | 0.090194 | 0.08191 | 0.011021 | 0.390469 | 0.055952 |
| Current | 0.13261 | 0.085492 | 0.030641 | 0.085002 | 0.176433 | 0.154572 | 0.12116 | 0.21409 |
| Default | 0.182153 | 0.131054 | 0.132949 | 0.108605 | 0.211747 | 0.090682 | 0.085633 | 0.057176 |
| Fully Paid | 0.05338 | 0.11513 | 0.266013 | 0.019436 | 0.01931 | 0.103019 | 0.011012 | 0.4127 |
| In Grace Period | 0.081535 | 0.055636 | 0.266034 | 0.130157 | 0.051865 | 0.367471 | 0.033102 | 0.0142 |
| Issued | 0.079212 | 0.034491 | 0.175461 | 0.007455 | 0.06822 | 0.371271 | 0.078835 | 0.185055 |
| Late (16-30 days) | 0.027348 | 0.012592 | 0.115079 | 0.402022 | 0.283953 | 0.066506 | 0.031372 | 0.061128 |
| Late (31-120 days) | 0.019363 | 0.361003 | 0.07609 | 0.191144 | 0.005702 | 0.012486 | 0.05658 | 0.277631 |
