# Loan Default Prediction - Classification

## Predict Default w/ Jupyter Notebook

**This notebook contains a script that predicts default probabilities for open loans downloaded directly from LendingClub.com.**

---
Steps:
1. Sign in and go to https://www.lendingclub.com/browse/browse.action.
2. Click "Download All" at the bottom right of the page.
3. Find the downloaded file on your machine, move it to the folder where you want the predictions saved and/or where this Jupyter notebook is located, and enter its path in Section 2 below.
4. Locate the model_dict file and enter its path in Section 1 below.
5. Run all cells.
6. View the predicted loan default probabilities in the last cell.
---
<a id = 'toc'></a>
**Table of Contents:**
1. [enter model_dict location](#model)
2. [enter open loan data file location](#loan)
---

In [1]:
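# import libraries
import numpy as np
import pandas as pd
# use the full option names; the short forms are ambiguous on newer pandas versions
pd.set_option('display.max_columns', 150)
pd.set_option('display.max_rows', 180)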

import pickle

In [2]:
# model class
class LoanDefaultModel:
    """A model that takes loan data from LendingClub.com and predicts loan default probability."""
    
    def __init__(self, model_dict_location):
        with open(model_dict_location, 'rb') as f:
            model_dict = pickle.load(f)
            self.model = model_dict['model']
            self.clean_dict = model_dict['clean_dict']
            self.engineer_dict = model_dict['engineer_dict']
            self.threshold_dict = model_dict['threshold_dict']
    
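    def predict_proba(self, X, clean=True, augment=True):
        """Return (original data, model-ready features, predicted default probabilities)."""
        X_orig = X.copy()
        if clean:
            X = self.clean_data(X)
        if augment:
            X = self.engineer_features(X)
        # column 1 of the classifier output is used as the default probability
        return X_orig, X, self.model.predict_proba(X)[:, 1]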
    
    def clean_data(self, df):        
        clean_dict = self.clean_dict
        
        df = df.drop(clean_dict['drop_cols_in_open_not_in_master'], axis = 1)
        df = df.drop(clean_dict['drop_cols'], axis = 1)

        # Assign with df[col] = ... so each column takes the dtype of its new values;
        # on recent pandas versions df.loc[:, col] = ... may write in place and keep the old dtype.
        df['grade'] = df.grade.map(clean_dict['grade_map'])
        df['sub_grade'] = df.sub_grade.apply(lambda x: clean_dict['grade_map'][x[0]]*5 + int(x[1]))
        df['emp_length'] = df.emp_length.apply(lambda x: clean_dict['el_map'][x] if pd.notnull(x) else 0)
        df['sec_app_earliest_cr_line'] = pd.to_datetime(df.sec_app_earliest_cr_line, errors='coerce')
        df['earliest_cr_line'] = pd.to_datetime(df.earliest_cr_line, errors='coerce')
        df['list_d'] = pd.to_datetime(df.list_d, errors='coerce')
        df['home_ownership'] = df.home_ownership.replace(['NONE', 'OTHER'], 'ANY')
        df['purpose'] = df.purpose.apply(lambda x: x.replace(" ", "_").lower())
        try:
            df['purpose'] = df.purpose.replace(clean_dict['purpose_map'])
        except Exception as exc:
            # report the failure, then re-raise so the run stops
            print("Error with categorical feature purpose:", type(exc))
            raise
        df['term'] = df.term.replace({36: ' 36 months', 60: ' 60 months'})
        df['application_type'] = df.application_type.replace({'INDIVIDUAL': 'Individual', 'JOINT': 'Joint App'})

        # convert, fill, and flag missing values using the rules stored in clean_dict
        df[clean_dict['cat_to_num_cols']] = df[clean_dict['cat_to_num_cols']].apply(lambda s: pd.to_numeric(s, errors='coerce'))
        df[clean_dict['cat_cols']] = df[clean_dict['cat_cols']].fillna('Missing')
        df['verified_status_joint'] = df.verified_status_joint.replace({' ': 'Missing'})
        df = df.drop(clean_dict['num_cols_miss_sparse'], axis=1)
        for col in clean_dict['num_cols_miss']:
            df[col + '_mflag'] = df[col].isnull()
        for col in clean_dict['num_cols_miss_1200']:
            df[col] = df[col].fillna(1200)
        for col in clean_dict['num_cols_miss_median']:
            df[col] = df[col].fillna(clean_dict['num_cols_miss_median_values'][col])

        return df
    
    
    def engineer_features(self, df):
        engineer_dict = self.engineer_dict
        
        df.loc[:, 'credit_history'] = ((df.list_d - df.earliest_cr_line)/ np.timedelta64(1, 'D')).astype(int)
        df = df.drop(['earliest_cr_line','list_d'], axis = 1)
        df.loc[:, 'itlm'] = df.annual_inc/df.installment
        df = pd.get_dummies(df)
        df, _ = df.align(engineer_dict['abt_df'], join = 'right', axis = 1, fill_value = 0)   

        return df 
    

[back to top](#toc)
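
Two steps inside `LoanDefaultModel` are easier to follow with a toy example: the ordinal `sub_grade` encoding in `clean_data` and the column alignment at the end of `engineer_features`. The sketch below is illustrative only. The `grade_map` values, the column names in `train_layout`, and the sample rows are assumptions, not the contents of the real `model_dict`.

```python
import pandas as pd

# Assumed mapping for illustration; the real one lives in clean_dict['grade_map'].
grade_map = {"A": 0, "B": 1, "C": 2, "D": 3, "E": 4, "F": 5, "G": 6}

def encode_sub_grade(sub_grade):
    """Same formula as clean_data: mapped grade * 5 + sub-grade digit."""
    return grade_map[sub_grade[0]] * 5 + int(sub_grade[1])

sub_grade_codes = [encode_sub_grade(s) for s in ["A5", "B1", "D3"]]

# Hypothetical training-time layout; only its column names matter for align().
train_layout = pd.DataFrame(
    columns=["loan_amnt", "purpose_car", "purpose_credit_card", "purpose_other"]
)

# New loans: 'boat' was never seen in training; 'credit_card' and 'other' are absent.
new_loans = pd.DataFrame({"loan_amnt": [1000.0, 2500.0], "purpose": ["car", "boat"]})
encoded = pd.get_dummies(new_loans)

# join='right' keeps exactly the training columns, in training order:
# unseen dummy columns are dropped and missing ones are added, filled with 0.
aligned, _ = encoded.align(train_layout, join="right", axis=1, fill_value=0)

print(sub_grade_codes)
print(list(aligned.columns))
```

Under the assumed map, `A5` and `B1` encode to adjacent integers, so the numeric codes follow the sub-grade order. The alignment step is what lets the model accept a download whose categorical values differ from those seen in training, at the cost of silently ignoring any new category.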

<a id = 'model'></a>
### 1. Enter model_dict location

In [3]:
# enter the model_dict file path inside the parentheses
default_model = LoanDefaultModel('model_dict.pkl')
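
The constructor expects the pickled object to be a dictionary with the keys `model`, `clean_dict`, `engineer_dict`, and `threshold_dict`. A minimal sketch of a pre-flight check follows, assuming a hypothetical helper named `load_model_dict` that is not part of this notebook; it fails early with a readable message when the wrong file is supplied. Because `pickle.load` can execute arbitrary code, only load files from a source you trust.

```python
import os
import pickle
import tempfile

# Keys that LoanDefaultModel.__init__ reads from the pickled dictionary.
REQUIRED_KEYS = {"model", "clean_dict", "engineer_dict", "threshold_dict"}

def load_model_dict(path):
    """Hypothetical helper: load the pickle and verify the expected keys exist."""
    with open(path, "rb") as f:
        model_dict = pickle.load(f)
    if not isinstance(model_dict, dict):
        raise TypeError(f"expected a dict, got {type(model_dict).__name__}")
    missing = REQUIRED_KEYS - model_dict.keys()
    if missing:
        raise KeyError(f"model_dict is missing keys: {sorted(missing)}")
    return model_dict

# Demonstration with a placeholder file; the values are stand-ins, not a real model.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_model_dict.pkl")
    with open(demo_path, "wb") as f:
        pickle.dump({key: None for key in REQUIRED_KEYS}, f)
    loaded = load_model_dict(demo_path)
```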

[back to top](#toc)

<a id = 'loan'></a>
### 2. Enter open_loan_data file location

In [4]:
# enter the new open loan data file path inside the parentheses
df = pd.read_csv('../preds/open_loan_data_v20200309.csv')

In [5]:
orig_df, df, pred = default_model.predict_proba(df, clean = True, augment = True)
orig_df.loc[:, 'predict_proba'] = pred
orig_df.loc[:, 'gr_than_thres_05'] = (orig_df.predict_proba > default_model.threshold_dict['thres_05'])
orig_df.loc[:, 'gr_than_thres_08'] = (orig_df.predict_proba > default_model.threshold_dict['thres_08'])
orig_df.loc[:, 'gr_than_thres_10'] = (orig_df.predict_proba > default_model.threshold_dict['thres_10'])    

In [6]:
# view prediction
orig_df[['id','sub_grade', 'int_rate','term','loan_amnt','exp_default_rate', 'predict_proba', 
         'gr_than_thres_05', 'gr_than_thres_08', 'gr_than_thres_10']]

|   | id | sub_grade | int_rate | term | loan_amnt | exp_default_rate | predict_proba | gr_than_thres_05 | gr_than_thres_08 | gr_than_thres_10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 166274577 | A5 | 8.81 | 36 | 40000.0 | 2.63 | 0.159963 | True | True | True |
| 1 | 166375935 | D3 | 23.05 | 36 | 13825.0 | 12.86 | 0.28174 | True | True | True |
| 2 | 166325788 | D2 | 20.55 | 36 | 16800.0 | 12.86 | 0.302715 | True | True | True |
| 3 | 167354641 | A1 | 6.46 | 36 | 40000.0 | 2.63 | 0.00849 | False | False | False |
| 4 | 167715590 | C1 | 14.3 | 36 | 30000.0 | 7.96 | 0.130965 | True | True | False |
| 5 | 167897220 | D1 | 18.62 | 36 | 8500.0 | 12.86 | 0.241747 | True | True | True |
| 6 | 167647256 | A4 | 8.19 | 36 | 30000.0 | 2.63 | 0.076347 | True | False | False |
| 7 | 166806117 | B5 | 13.08 | 60 | 26575.0 | 5.53 | 0.175619 | True | True | True |
| 8 | 167608461 | B5 | 13.08 | 36 | 36000.0 | 5.36 | 0.158185 | True | True | True |
| 9 | 167655531 | D4 | 25.65 | 36 | 16575.0 | 12.86 | 0.156629 | True | True | True |


1. **The last four columns show each loan's predicted default probability and three flags marking whether the loan is predicted to default under thresholds that each target a specific default rate.**
2. **Once you decide on a loan, note its id and find it on the LendingClub platform to order the note.**
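
The three flag columns are plain comparisons of `predict_proba` against the values stored in `threshold_dict`. A small sketch of how one might shortlist loans with them follows. The threshold values and the three sample rows are made up for illustration, since the real thresholds come from `model_dict` and are not shown in this notebook.

```python
import pandas as pd

# Made-up thresholds; the real values live in default_model.threshold_dict.
threshold_dict = {"thres_05": 0.05, "thres_08": 0.10, "thres_10": 0.15}

# Made-up predictions standing in for orig_df.
preds = pd.DataFrame({
    "id": [101, 102, 103],
    "int_rate": [6.5, 14.3, 23.0],
    "predict_proba": [0.01, 0.12, 0.30],
})

# Same comparison as in the notebook: True means the probability exceeds the threshold.
for name, value in threshold_dict.items():
    preds["gr_than_" + name] = preds["predict_proba"] > value

# Shortlist loans that are not flagged at the lowest threshold, lowest risk first.
shortlist = (
    preds[~preds["gr_than_thres_05"]]
    .sort_values("predict_proba")
    .reset_index(drop=True)
)
print(shortlist[["id", "int_rate", "predict_proba"]])
```

Filtering on a different flag column relaxes the cut-off: with these assumed values, loan 102 is flagged at `thres_05` and `thres_08` but not at `thres_10`.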

**End of the file**

**[back to top](#toc)**