# Loan Default Prediction - Classification

## Part 0b. Data Aggregation

**This notebook aggregates the separate data files into one master data file, after checking that the columns and rows of each file are consistent with one another.**

---


<a id = 'toc'></a>
**Table of Contents**
1. [Check sample data files - individual file format & column alignment](#sample)
    1. [data file 3a: 2007-2011](#2007)
    2. [data file 3b: 2012-2013](#2012)
    3. [data file 2016Q1](#2016q1)
    4. [data file 2019Q3](#2019q3)
    
2. [Aggregate all data files into one](#agg)

3. [Notes](#notes)

---

In [2]:
import numpy as np
import pandas as pd
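# Use the full option names; the short forms can be ambiguous in newer pandas versions
pd.set_option('display.max_columns', 100)
pd.set_option('display.max_rows', 160)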

import os

In [3]:
dpath = "../data/raw/"  # directory holding the raw data files
files = os.listdir(dpath)

In [4]:
len(files)

20

In [28]:
files = sorted(files)
for i, name in enumerate(files):
    print(i, name)

0 .DS_Store
1 LoanStats3a_securev1.csv.zip
2 LoanStats3b_securev1.csv.zip
3 LoanStats3c_securev1.csv.zip
4 LoanStats3d_securev1.csv.zip
5 LoanStats_securev1_2016Q1.csv.zip
6 LoanStats_securev1_2016Q2.csv.zip
7 LoanStats_securev1_2016Q3.csv.zip
8 LoanStats_securev1_2016Q4.csv.zip
9 LoanStats_securev1_2017Q1.csv.zip
10 LoanStats_securev1_2017Q2.csv.zip
11 LoanStats_securev1_2017Q3.csv.zip
12 LoanStats_securev1_2017Q4.csv.zip
13 LoanStats_securev1_2018Q1.csv.zip
14 LoanStats_securev1_2018Q2.csv.zip
15 LoanStats_securev1_2018Q3.csv.zip
16 LoanStats_securev1_2018Q4.csv.zip
17 LoanStats_securev1_2019Q1.csv.zip
18 LoanStats_securev1_2019Q2.csv.zip
19 LoanStats_securev1_2019Q3.csv.zip
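
The listing above includes `.DS_Store`, which is a macOS metadata file rather than a data file. Below is a minimal sketch of one way to keep only the data archives, assuming every data file ends in `.csv.zip`. The helper name `select_data_files` and the sample list are hypothetical and not part of the original notebook.

```python
def select_data_files(names, suffix=".csv.zip"):
    """Return the sorted subset of `names` that end with `suffix`."""
    return sorted(n for n in names if n.endswith(suffix))


# Hypothetical directory listing, used only for illustration
listing = [".DS_Store", "LoanStats3b_securev1.csv.zip", "LoanStats3a_securev1.csv.zip"]
data_files = select_data_files(listing)
print(data_files)
```

If such a filter were applied in this notebook, every index would shift down by one relative to the listing above, so the `files[1]` and `files[2]` lookups would need adjusting.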


[**back to top**](#toc)

<a id = 'sample'></a>
### 1. Check sample data files for single file format and column consistency

<a id = '2007'></a>
#### 1.A. Check data file for 2007-2011

In [8]:
%%time
df_2007_2011 = pd.read_csv(os.path.join(dpath, files[1]), low_memory=False)

CPU times: user 3.18 s, sys: 382 ms, total: 3.56 s
Wall time: 3.6 s


In [9]:
df_2007_2011.head()

Unnamed: 0,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,Unnamed: 10,Unnamed: 11,Unnamed: 12,Unnamed: 13,Unnamed: 14,Unnamed: 15,Unnamed: 16,Unnamed: 17,Unnamed: 18,Unnamed: 19,Unnamed: 20,Unnamed: 21,Unnamed: 22,Unnamed: 23,Unnamed: 24,Unnamed: 25,Unnamed: 26,Unnamed: 27,Unnamed: 28,Unnamed: 29,Unnamed: 30,Unnamed: 31,Unnamed: 32,Unnamed: 33,Unnamed: 34,Unnamed: 35,Unnamed: 36,Unnamed: 37,Unnamed: 38,Unnamed: 39,Unnamed: 40,Unnamed: 41,Unnamed: 42,Unnamed: 43,Unnamed: 44,Unnamed: 45,Unnamed: 46,Unnamed: 47,Unnamed: 48,Unnamed: 49,Unnamed: 50,Unnamed: 51,Unnamed: 52,Unnamed: 53,Unnamed: 54,Unnamed: 55,Unnamed: 56,Unnamed: 57,Unnamed: 58,Unnamed: 59,Unnamed: 60,Unnamed: 61,Unnamed: 62,Unnamed: 63,Unnamed: 64,Unnamed: 65,Unnamed: 66,Unnamed: 67,Unnamed: 68,Unnamed: 69,Unnamed: 70,Unnamed: 71,Unnamed: 72,Unnamed: 73,Unnamed: 74,Unnamed: 75,Unnamed: 76,Unnamed: 77,Unnamed: 78,Unnamed: 79,Unnamed: 80,Unnamed: 81,Unnamed: 82,Unnamed: 83,Unnamed: 84,Unnamed: 85,Unnamed: 86,Unnamed: 87,Unnamed: 88,Unnamed: 89,Unnamed: 90,Unnamed: 91,Unnamed: 92,Unnamed: 93,Unnamed: 94,Unnamed: 95,Unnamed: 96,Unnamed: 97,Unnamed: 98,Unnamed: 99,Unnamed: 100,Unnamed: 101,Unnamed: 102,Unnamed: 103,Unnamed: 104,Unnamed: 105,Unnamed: 106,Unnamed: 107,Unnamed: 108,Unnamed: 109,Unnamed: 110,Unnamed: 111,Unnamed: 112,Unnamed: 113,Unnamed: 114,Unnamed: 115,Unnamed: 116,Unnamed: 117,Unnamed: 118,Unnamed: 119,Unnamed: 120,Unnamed: 121,Unnamed: 122,Unnamed: 123,Unnamed: 124,Unnamed: 125,Unnamed: 126,Unnamed: 127,Unnamed: 128,Unnamed: 129,Unnamed: 130,Unnamed: 131,Unnamed: 132,Unnamed: 133,Unnamed: 134,Unnamed: 135,Unnamed: 136,Unnamed: 137,Unnamed: 138,Unnamed: 139,Unnamed: 140,Unnamed: 141,Unnamed: 142,Unnamed: 143,Unnamed: 144,Unnamed: 145,Unnamed: 146,Unnamed: 147,Unnamed: 148,Notes offered by Prospectus (https://www.lendingclub.com/info/prospectus.action)
id,member_id,loan_amnt,funded_amnt,funded_amnt_inv,term,int_rate,installment,grade,sub_grade,emp_title,emp_length,home_ownership,annual_inc,verification_status,issue_d,loan_status,pymnt_plan,url,desc,purpose,title,zip_code,addr_state,dti,delinq_2yrs,earliest_cr_line,fico_range_low,fico_range_high,inq_last_6mths,mths_since_last_delinq,mths_since_last_record,open_acc,pub_rec,revol_bal,revol_util,total_acc,initial_list_status,out_prncp,out_prncp_inv,total_pymnt,total_pymnt_inv,total_rec_prncp,total_rec_int,total_rec_late_fee,recoveries,collection_recovery_fee,last_pymnt_d,last_pymnt_amnt,next_pymnt_d,last_credit_pull_d,last_fico_range_high,last_fico_range_low,collections_12_mths_ex_med,mths_since_last_major_derog,policy_code,application_type,annual_inc_joint,dti_joint,verification_status_joint,acc_now_delinq,tot_coll_amt,tot_cur_bal,open_acc_6m,open_act_il,open_il_12m,open_il_24m,mths_since_rcnt_il,total_bal_il,il_util,open_rv_12m,open_rv_24m,max_bal_bc,all_util,total_rev_hi_lim,inq_fi,total_cu_tl,inq_last_12m,acc_open_past_24mths,avg_cur_bal,bc_open_to_buy,bc_util,chargeoff_within_12_mths,delinq_amnt,mo_sin_old_il_acct,mo_sin_old_rev_tl_op,mo_sin_rcnt_rev_tl_op,mo_sin_rcnt_tl,mort_acc,mths_since_recent_bc,mths_since_recent_bc_dlq,mths_since_recent_inq,mths_since_recent_revol_delinq,num_accts_ever_120_pd,num_actv_bc_tl,num_actv_rev_tl,num_bc_sats,num_bc_tl,num_il_tl,num_op_rev_tl,num_rev_accts,num_rev_tl_bal_gt_0,num_sats,num_tl_120dpd_2m,num_tl_30dpd,num_tl_90g_dpd_24m,num_tl_op_past_12m,pct_tl_nvr_dlq,percent_bc_gt_75,pub_rec_bankruptcies,tax_liens,tot_hi_cred_lim,total_bal_ex_mort,total_bc_limit,total_il_high_credit_limit,revol_bal_joint,sec_app_fico_range_low,sec_app_fico_range_high,sec_app_earliest_cr_line,sec_app_inq_last_6mths,sec_app_mort_acc,sec_app_open_acc,sec_app_revol_util,sec_app_open_act_il,sec_app_num_rev_accts,sec_app_chargeoff_within_12_mths,sec_app_collections_12_mths_ex_med,sec_app_mths_since_last_major_derog,hardship_flag,hardship_type,hardship_reason,hardship_status,deferral_term,hardship_amount,hardship_start_date,hardship_end_date,payment_plan_start_date,hardship_length,hardship_dpd,hardship_loan_status,orig_projected_additional_accrued_interest,hardship_payoff_balance_amount,hardship_last_payment_amount,debt_settlement_flag,debt_settlement_flag_date,settlement_status,settlement_date,settlement_amount,settlement_percentage,settlement_term
1077501,,5000,5000,4975,36 months,10.65%,162.87,B,B2,,10+ years,RENT,24000,Verified,Dec-2011,Fully Paid,n,https://lendingclub.com/browse/loanDetail.action?loan_id=1077501,Borrower added on 12/22/11 > I need to upgrade my business technologies.<br>,credit_card,Computer,860xx,AZ,27.65,0,Jan-1985,735,739,1,,,3,0,13648,83.7%,9,f,0.00,0.00,5863.1551866952,5833.84,5000.00,863.16,0.0,0.0,0.0,Jan-2015,171.62,,May-2019,714,710,0,,1,Individual,,,,0,,,,,,,,,,,,,,,,,,,,,,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,,,,,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
1077430,,2500,2500,2500,60 months,15.27%,59.83,C,C4,Ryder,< 1 year,RENT,30000,Source Verified,Dec-2011,Charged Off,n,https://lendingclub.com/browse/loanDetail.action?loan_id=1077430,Borrower added on 12/22/11 > I plan to use this money to finance the motorcycle i am looking at. I plan to have it paid off as soon as possible/when i sell my old bike. I only need this money because the deal im looking at is to good to pass up.<br><br> Borrower added on 12/22/11 > I plan to use this money to finance the motorcycle i am looking at. I plan to have it paid off as soon as possible/when i sell my old bike.I only need this money because the deal im looking at is to good to pass up. I have finished college with an associates degree in business and its takingmeplaces<br>,car,bike,309xx,GA,1,0,Apr-1999,740,744,5,,,3,0,1687,9.4%,4,f,0.00,0.00,1014.53,1014.53,456.46,435.17,0.0,122.9,1.11,Apr-2013,119.66,,Oct-2016,499,0,0,,1,Individual,,,,0,,,,,,,,,,,,,,,,,,,,,,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,,,,,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
1077175,,2400,2400,2400,36 months,15.96%,84.33,C,C5,,10+ years,RENT,12252,Not Verified,Dec-2011,Fully Paid,n,https://lendingclub.com/browse/loanDetail.action?loan_id=1077175,,small_business,real estate business,606xx,IL,8.72,0,Nov-2001,735,739,2,,,2,0,2956,98.5%,10,f,0.00,0.00,3005.6668441393,3005.67,2400.00,605.67,0.0,0.0,0.0,Jun-2014,649.91,,Jun-2017,739,735,0,,1,Individual,,,,0,,,,,,,,,,,,,,,,,,,,,,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,,,,,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
1076863,,10000,10000,10000,36 months,13.49%,339.31,C,C1,AIR RESOURCES BOARD,10+ years,RENT,49200,Source Verified,Dec-2011,Fully Paid,n,https://lendingclub.com/browse/loanDetail.action?loan_id=1076863,"Borrower added on 12/21/11 > to pay for property tax (borrow from friend, need to pay back) & central A/C need to be replace. I'm very sorry to let my loan expired last time.<br>",other,personel,917xx,CA,20,0,Feb-1996,690,694,1,35,,10,0,5598,21%,37,f,0.00,0.00,12231.890000000902,12231.89,10000.00,2214.92,16.97,0.0,0.0,Jan-2015,357.48,,Apr-2016,604,600,0,,1,Individual,,,,0,,,,,,,,,,,,,,,,,,,,,,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,,,,,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,


In [10]:
df_2007_2011.tail()

Unnamed: 0,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,Unnamed: 10,Unnamed: 11,Unnamed: 12,Unnamed: 13,Unnamed: 14,Unnamed: 15,Unnamed: 16,Unnamed: 17,Unnamed: 18,Unnamed: 19,Unnamed: 20,Unnamed: 21,Unnamed: 22,Unnamed: 23,Unnamed: 24,Unnamed: 25,Unnamed: 26,Unnamed: 27,Unnamed: 28,Unnamed: 29,Unnamed: 30,Unnamed: 31,Unnamed: 32,Unnamed: 33,Unnamed: 34,Unnamed: 35,Unnamed: 36,Unnamed: 37,Unnamed: 38,Unnamed: 39,Unnamed: 40,Unnamed: 41,Unnamed: 42,Unnamed: 43,Unnamed: 44,Unnamed: 45,Unnamed: 46,Unnamed: 47,Unnamed: 48,Unnamed: 49,Unnamed: 50,Unnamed: 51,Unnamed: 52,Unnamed: 53,Unnamed: 54,Unnamed: 55,Unnamed: 56,Unnamed: 57,Unnamed: 58,Unnamed: 59,Unnamed: 60,Unnamed: 61,Unnamed: 62,Unnamed: 63,Unnamed: 64,Unnamed: 65,Unnamed: 66,Unnamed: 67,Unnamed: 68,Unnamed: 69,Unnamed: 70,Unnamed: 71,Unnamed: 72,Unnamed: 73,Unnamed: 74,Unnamed: 75,Unnamed: 76,Unnamed: 77,Unnamed: 78,Unnamed: 79,Unnamed: 80,Unnamed: 81,Unnamed: 82,Unnamed: 83,Unnamed: 84,Unnamed: 85,Unnamed: 86,Unnamed: 87,Unnamed: 88,Unnamed: 89,Unnamed: 90,Unnamed: 91,Unnamed: 92,Unnamed: 93,Unnamed: 94,Unnamed: 95,Unnamed: 96,Unnamed: 97,Unnamed: 98,Unnamed: 99,Unnamed: 100,Unnamed: 101,Unnamed: 102,Unnamed: 103,Unnamed: 104,Unnamed: 105,Unnamed: 106,Unnamed: 107,Unnamed: 108,Unnamed: 109,Unnamed: 110,Unnamed: 111,Unnamed: 112,Unnamed: 113,Unnamed: 114,Unnamed: 115,Unnamed: 116,Unnamed: 117,Unnamed: 118,Unnamed: 119,Unnamed: 120,Unnamed: 121,Unnamed: 122,Unnamed: 123,Unnamed: 124,Unnamed: 125,Unnamed: 126,Unnamed: 127,Unnamed: 128,Unnamed: 129,Unnamed: 130,Unnamed: 131,Unnamed: 132,Unnamed: 133,Unnamed: 134,Unnamed: 135,Unnamed: 136,Unnamed: 137,Unnamed: 138,Unnamed: 139,Unnamed: 140,Unnamed: 141,Unnamed: 142,Unnamed: 143,Unnamed: 144,Unnamed: 145,Unnamed: 146,Unnamed: 147,Unnamed: 148,Notes offered by Prospectus (https://www.lendingclub.com/info/prospectus.action)
72176,,2525.0,2525.0,225.0,36 months,9.33%,80.69,B,B3,,< 1 year,RENT,110000.0,Not Verified,Jun-2007,Does not meet the credit policy. Status:Fully Paid,n,https://lendingclub.com/browse/loanDetail.action?loan_id=72176,"I need to pay $2,100 for fixing my Volvo :) Any help appreciated!",other,Car repair bill,100xx,NY,10.0,,,710.0,714.0,,,,,,0.0,,,f,0.0,0.0,2904.49882892,258.82,2525.0,379.5,0.0,0.0,0.0,Jun-2010,82.03,Jul-2010,May-2007,714.0,710.0,,,1.0,Individual,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
71623,,6500.0,6500.0,0.0,36 months,8.38%,204.84,A,A5,,< 1 year,NONE,,Not Verified,Jun-2007,Does not meet the credit policy. Status:Fully Paid,n,https://lendingclub.com/browse/loanDetail.action?loan_id=71623,"Hi, I'm buying a used car. Anybody on facebook wants to finance me? Thanks",other,Buying a car,100xx,NY,4.0,,,740.0,744.0,,,,,,0.0,,,f,0.0,0.0,7373.904961698404,0.0,6500.0,873.9,0.0,0.0,0.0,Jun-2010,205.32,Jul-2010,Aug-2007,724.0,720.0,,,1.0,Individual,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
70686,,5000.0,5000.0,0.0,36 months,7.75%,156.11,A,A3,Homemaker,10+ years,MORTGAGE,70000.0,Not Verified,Jun-2007,Does not meet the credit policy. Status:Fully Paid,n,https://lendingclub.com/browse/loanDetail.action?loan_id=70686,"I need to make several improvements around the house - fix garage, fix back fencing, and misc other.",other,Aroundthehouse,068xx,CT,8.81,,,770.0,774.0,,,,,,0.0,,,f,0.0,0.0,5619.762090469702,0.0,5000.0,619.76,0.0,0.0,0.0,Jun-2010,156.39,Jul-2010,Feb-2015,794.0,790.0,,,1.0,Individual,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
Total amount funded in policy code 1: 460296150,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Total amount funded in policy code 2: 0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


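**NOTE: the first row of the file is a notice line, so the column names are on the second row (`header = 1`); the last two rows are summary totals rather than loans (`skipfooter = 2`).**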
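
To make the effect of these two settings concrete, here is a small sketch on an in-memory CSV that mimics the layout seen above: one notice line, the real header, a few rows, and two summary rows. The three-column sample is made up for illustration, while `header`, `skipfooter`, and `engine` are standard `pd.read_csv` parameters.

```python
import io

import pandas as pd

# Tiny stand-in for a raw file: notice line, header, two loans, two footer rows
raw = (
    "Notes offered by Prospectus (https://www.lendingclub.com/info/prospectus.action)\n"
    "id,loan_amnt,term\n"
    "1077501,5000,36 months\n"
    "1077430,2500,60 months\n"
    "Total amount funded in policy code 1: 460296150,,\n"
    "Total amount funded in policy code 2: 0,,\n"
)

# header=1 takes column names from the second line;
# skipfooter=2 drops the last two lines and is only supported by the python engine
demo = pd.read_csv(io.StringIO(raw), header=1, skipfooter=2, engine="python")
print(demo)
```

The python engine is slower than the default C engine, which is consistent with the longer wall time reported for the re-read of the full file.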

In [11]:
%%time
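# header=1: column names are on the second row; skipfooter=2: drop the two summary rows (python engine only)
df_2007_2011 = pd.read_csv(os.path.join(dpath, files[1]), header=1, skipfooter=2, engine='python')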

CPU times: user 4.59 s, sys: 269 ms, total: 4.86 s
Wall time: 4.9 s


In [12]:
df_2007_2011.head()

Unnamed: 0,id,member_id,loan_amnt,funded_amnt,funded_amnt_inv,term,int_rate,installment,grade,sub_grade,emp_title,emp_length,home_ownership,annual_inc,verification_status,issue_d,loan_status,pymnt_plan,url,desc,purpose,title,zip_code,addr_state,dti,delinq_2yrs,earliest_cr_line,fico_range_low,fico_range_high,inq_last_6mths,mths_since_last_delinq,mths_since_last_record,open_acc,pub_rec,revol_bal,revol_util,total_acc,initial_list_status,out_prncp,out_prncp_inv,total_pymnt,total_pymnt_inv,total_rec_prncp,total_rec_int,total_rec_late_fee,recoveries,collection_recovery_fee,last_pymnt_d,last_pymnt_amnt,next_pymnt_d,...,num_rev_accts,num_rev_tl_bal_gt_0,num_sats,num_tl_120dpd_2m,num_tl_30dpd,num_tl_90g_dpd_24m,num_tl_op_past_12m,pct_tl_nvr_dlq,percent_bc_gt_75,pub_rec_bankruptcies,tax_liens,tot_hi_cred_lim,total_bal_ex_mort,total_bc_limit,total_il_high_credit_limit,revol_bal_joint,sec_app_fico_range_low,sec_app_fico_range_high,sec_app_earliest_cr_line,sec_app_inq_last_6mths,sec_app_mort_acc,sec_app_open_acc,sec_app_revol_util,sec_app_open_act_il,sec_app_num_rev_accts,sec_app_chargeoff_within_12_mths,sec_app_collections_12_mths_ex_med,sec_app_mths_since_last_major_derog,hardship_flag,hardship_type,hardship_reason,hardship_status,deferral_term,hardship_amount,hardship_start_date,hardship_end_date,payment_plan_start_date,hardship_length,hardship_dpd,hardship_loan_status,orig_projected_additional_accrued_interest,hardship_payoff_balance_amount,hardship_last_payment_amount,debt_settlement_flag,debt_settlement_flag_date,settlement_status,settlement_date,settlement_amount,settlement_percentage,settlement_term
0,1077501,,5000.0,5000.0,4975.0,36 months,10.65%,162.87,B,B2,,10+ years,RENT,24000.0,Verified,Dec-2011,Fully Paid,n,https://lendingclub.com/browse/loanDetail.acti...,Borrower added on 12/22/11 > I need to upgra...,credit_card,Computer,860xx,AZ,27.65,0.0,Jan-1985,735.0,739.0,1.0,,,3.0,0.0,13648.0,83.7%,9.0,f,0.0,0.0,5863.155187,5833.84,5000.0,863.16,0.0,0.0,0.0,Jan-2015,171.62,,...,,,,,,,,,,0.0,0.0,,,,,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
1,1077430,,2500.0,2500.0,2500.0,60 months,15.27%,59.83,C,C4,Ryder,< 1 year,RENT,30000.0,Source Verified,Dec-2011,Charged Off,n,https://lendingclub.com/browse/loanDetail.acti...,Borrower added on 12/22/11 > I plan to use t...,car,bike,309xx,GA,1.0,0.0,Apr-1999,740.0,744.0,5.0,,,3.0,0.0,1687.0,9.4%,4.0,f,0.0,0.0,1014.53,1014.53,456.46,435.17,0.0,122.9,1.11,Apr-2013,119.66,,...,,,,,,,,,,0.0,0.0,,,,,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
2,1077175,,2400.0,2400.0,2400.0,36 months,15.96%,84.33,C,C5,,10+ years,RENT,12252.0,Not Verified,Dec-2011,Fully Paid,n,https://lendingclub.com/browse/loanDetail.acti...,,small_business,real estate business,606xx,IL,8.72,0.0,Nov-2001,735.0,739.0,2.0,,,2.0,0.0,2956.0,98.5%,10.0,f,0.0,0.0,3005.666844,3005.67,2400.0,605.67,0.0,0.0,0.0,Jun-2014,649.91,,...,,,,,,,,,,0.0,0.0,,,,,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
3,1076863,,10000.0,10000.0,10000.0,36 months,13.49%,339.31,C,C1,AIR RESOURCES BOARD,10+ years,RENT,49200.0,Source Verified,Dec-2011,Fully Paid,n,https://lendingclub.com/browse/loanDetail.acti...,Borrower added on 12/21/11 > to pay for prop...,other,personel,917xx,CA,20.0,0.0,Feb-1996,690.0,694.0,1.0,35.0,,10.0,0.0,5598.0,21%,37.0,f,0.0,0.0,12231.89,12231.89,10000.0,2214.92,16.97,0.0,0.0,Jan-2015,357.48,,...,,,,,,,,,,0.0,0.0,,,,,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
4,1075358,,3000.0,3000.0,3000.0,60 months,12.69%,67.79,B,B5,University Medical Group,1 year,RENT,80000.0,Source Verified,Dec-2011,Fully Paid,n,https://lendingclub.com/browse/loanDetail.acti...,Borrower added on 12/21/11 > I plan on combi...,other,Personal,972xx,OR,17.94,0.0,Jan-1996,695.0,699.0,0.0,38.0,,15.0,0.0,27783.0,53.9%,38.0,f,0.0,0.0,4066.908161,4066.91,3000.0,1066.91,0.0,0.0,0.0,Jan-2017,67.3,,...,,,,,,,,,,0.0,0.0,,,,,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,


In [13]:
df_2007_2011.tail()

Unnamed: 0,id,member_id,loan_amnt,funded_amnt,funded_amnt_inv,term,int_rate,installment,grade,sub_grade,emp_title,emp_length,home_ownership,annual_inc,verification_status,issue_d,loan_status,pymnt_plan,url,desc,purpose,title,zip_code,addr_state,dti,delinq_2yrs,earliest_cr_line,fico_range_low,fico_range_high,inq_last_6mths,mths_since_last_delinq,mths_since_last_record,open_acc,pub_rec,revol_bal,revol_util,total_acc,initial_list_status,out_prncp,out_prncp_inv,total_pymnt,total_pymnt_inv,total_rec_prncp,total_rec_int,total_rec_late_fee,recoveries,collection_recovery_fee,last_pymnt_d,last_pymnt_amnt,next_pymnt_d,...,num_rev_accts,num_rev_tl_bal_gt_0,num_sats,num_tl_120dpd_2m,num_tl_30dpd,num_tl_90g_dpd_24m,num_tl_op_past_12m,pct_tl_nvr_dlq,percent_bc_gt_75,pub_rec_bankruptcies,tax_liens,tot_hi_cred_lim,total_bal_ex_mort,total_bc_limit,total_il_high_credit_limit,revol_bal_joint,sec_app_fico_range_low,sec_app_fico_range_high,sec_app_earliest_cr_line,sec_app_inq_last_6mths,sec_app_mort_acc,sec_app_open_acc,sec_app_revol_util,sec_app_open_act_il,sec_app_num_rev_accts,sec_app_chargeoff_within_12_mths,sec_app_collections_12_mths_ex_med,sec_app_mths_since_last_major_derog,hardship_flag,hardship_type,hardship_reason,hardship_status,deferral_term,hardship_amount,hardship_start_date,hardship_end_date,payment_plan_start_date,hardship_length,hardship_dpd,hardship_loan_status,orig_projected_additional_accrued_interest,hardship_payoff_balance_amount,hardship_last_payment_amount,debt_settlement_flag,debt_settlement_flag_date,settlement_status,settlement_date,settlement_amount,settlement_percentage,settlement_term
42531,73582,,3500.0,3500.0,225.0,36 months,10.28%,113.39,C,C1,,< 1 year,RENT,180000.0,Not Verified,Jun-2007,Does not meet the credit policy. Status:Fully ...,n,https://lendingclub.com/browse/loanDetail.acti...,I am getting married on July 28 and will need ...,other,Wedding coming up,100xx,NY,10.0,,,685.0,689.0,,,,,,0.0,,,f,0.0,0.0,3719.43107,239.11,3500.0,219.43,0.0,0.0,0.0,Mar-2008,0.0,Mar-2008,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
42532,72998,,1000.0,1000.0,0.0,36 months,9.64%,32.11,B,B4,Halping hands company inc.,< 1 year,RENT,12000.0,Not Verified,Jun-2007,Does not meet the credit policy. Status:Fully ...,n,https://lendingclub.com/browse/loanDetail.acti...,I would like to buy some new furniture in my a...,other,delight,021xx,MA,10.0,,,695.0,699.0,,,,,,0.0,,,f,0.0,0.0,1155.600899,0.0,1000.0,155.6,0.0,0.0,0.0,Jun-2010,32.41,Jul-2010,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
42533,72176,,2525.0,2525.0,225.0,36 months,9.33%,80.69,B,B3,,< 1 year,RENT,110000.0,Not Verified,Jun-2007,Does not meet the credit policy. Status:Fully ...,n,https://lendingclub.com/browse/loanDetail.acti...,"I need to pay $2,100 for fixing my Volvo :) A...",other,Car repair bill,100xx,NY,10.0,,,710.0,714.0,,,,,,0.0,,,f,0.0,0.0,2904.498829,258.82,2525.0,379.5,0.0,0.0,0.0,Jun-2010,82.03,Jul-2010,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
42534,71623,,6500.0,6500.0,0.0,36 months,8.38%,204.84,A,A5,,< 1 year,NONE,,Not Verified,Jun-2007,Does not meet the credit policy. Status:Fully ...,n,https://lendingclub.com/browse/loanDetail.acti...,"Hi, I'm buying a used car. Anybody on faceb...",other,Buying a car,100xx,NY,4.0,,,740.0,744.0,,,,,,0.0,,,f,0.0,0.0,7373.904962,0.0,6500.0,873.9,0.0,0.0,0.0,Jun-2010,205.32,Jul-2010,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
42535,70686,,5000.0,5000.0,0.0,36 months,7.75%,156.11,A,A3,Homemaker,10+ years,MORTGAGE,70000.0,Not Verified,Jun-2007,Does not meet the credit policy. Status:Fully ...,n,https://lendingclub.com/browse/loanDetail.acti...,I need to make several improvements around the...,other,Aroundthehouse,068xx,CT,8.81,,,770.0,774.0,,,,,,0.0,,,f,0.0,0.0,5619.76209,0.0,5000.0,619.76,0.0,0.0,0.0,Jun-2010,156.39,Jul-2010,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,


In [12]:
df_2007_2011.shape

(42536, 150)

In [14]:
cl_d2007 = df_2007_2011.columns.tolist()

In [15]:
cl_d2007[:6]

['id', 'member_id', 'loan_amnt', 'funded_amnt', 'funded_amnt_inv', 'term']
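
The column list captured above can serve as the reference when checking the other files. Below is a minimal sketch of such a check. The helper `compare_columns` and the short column lists are hypothetical, and the notebook may perform this comparison differently.

```python
def compare_columns(reference, other):
    """Report how the column list `other` differs from `reference`."""
    ref_set, other_set = set(reference), set(other)
    return {
        # columns present in the reference but absent from the other file
        "missing": [c for c in reference if c not in other_set],
        # columns present in the other file but absent from the reference
        "extra": [c for c in other if c not in ref_set],
        # True only when both lists hold the same names in the same order
        "same_order": list(reference) == list(other),
    }


# Made-up column lists, for illustration only
reference = ["id", "member_id", "loan_amnt"]
other = ["id", "loan_amnt", "term"]
report = compare_columns(reference, other)
print(report)
```

An empty `missing` and `extra` together with `same_order` set to `True` would indicate that two files can be stacked without realigning columns.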

[**back to top**](#toc)

<a id = '2012'></a>
#### 1.B. Check data file for 2012-2013

In [16]:
%%time
df_2012_2013 = pd.read_csv(os.path.join(dpath, files[2]), low_memory=False)

CPU times: user 14.7 s, sys: 1.2 s, total: 15.9 s
Wall time: 16 s


In [17]:
df_2012_2013.head()

Unnamed: 0,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,Unnamed: 10,Unnamed: 11,Unnamed: 12,Unnamed: 13,Unnamed: 14,Unnamed: 15,Unnamed: 16,Unnamed: 17,Unnamed: 18,Unnamed: 19,Unnamed: 20,Unnamed: 21,Unnamed: 22,Unnamed: 23,Unnamed: 24,Unnamed: 25,Unnamed: 26,Unnamed: 27,Unnamed: 28,Unnamed: 29,Unnamed: 30,Unnamed: 31,Unnamed: 32,Unnamed: 33,Unnamed: 34,Unnamed: 35,Unnamed: 36,Unnamed: 37,Unnamed: 38,Unnamed: 39,Unnamed: 40,Unnamed: 41,Unnamed: 42,Unnamed: 43,Unnamed: 44,Unnamed: 45,Unnamed: 46,Unnamed: 47,Unnamed: 48,Unnamed: 49,Unnamed: 50,Unnamed: 51,Unnamed: 52,Unnamed: 53,Unnamed: 54,Unnamed: 55,Unnamed: 56,Unnamed: 57,Unnamed: 58,Unnamed: 59,Unnamed: 60,Unnamed: 61,Unnamed: 62,Unnamed: 63,Unnamed: 64,Unnamed: 65,Unnamed: 66,Unnamed: 67,Unnamed: 68,Unnamed: 69,Unnamed: 70,Unnamed: 71,Unnamed: 72,Unnamed: 73,Unnamed: 74,Unnamed: 75,Unnamed: 76,Unnamed: 77,Unnamed: 78,Unnamed: 79,Unnamed: 80,Unnamed: 81,Unnamed: 82,Unnamed: 83,Unnamed: 84,Unnamed: 85,Unnamed: 86,Unnamed: 87,Unnamed: 88,Unnamed: 89,Unnamed: 90,Unnamed: 91,Unnamed: 92,Unnamed: 93,Unnamed: 94,Unnamed: 95,Unnamed: 96,Unnamed: 97,Unnamed: 98,Unnamed: 99,Unnamed: 100,Unnamed: 101,Unnamed: 102,Unnamed: 103,Unnamed: 104,Unnamed: 105,Unnamed: 106,Unnamed: 107,Unnamed: 108,Unnamed: 109,Unnamed: 110,Unnamed: 111,Unnamed: 112,Unnamed: 113,Unnamed: 114,Unnamed: 115,Unnamed: 116,Unnamed: 117,Unnamed: 118,Unnamed: 119,Unnamed: 120,Unnamed: 121,Unnamed: 122,Unnamed: 123,Unnamed: 124,Unnamed: 125,Unnamed: 126,Unnamed: 127,Unnamed: 128,Unnamed: 129,Unnamed: 130,Unnamed: 131,Unnamed: 132,Unnamed: 133,Unnamed: 134,Unnamed: 135,Unnamed: 136,Unnamed: 137,Unnamed: 138,Unnamed: 139,Unnamed: 140,Unnamed: 141,Unnamed: 142,Unnamed: 143,Unnamed: 144,Unnamed: 145,Unnamed: 146,Unnamed: 147,Unnamed: 148,Notes offered by Prospectus (https://www.lendingclub.com/info/prospectus.action)
id,member_id,loan_amnt,funded_amnt,funded_amnt_inv,term,int_rate,installment,grade,sub_grade,emp_title,emp_length,home_ownership,annual_inc,verification_status,issue_d,loan_status,pymnt_plan,url,desc,purpose,title,zip_code,addr_state,dti,delinq_2yrs,earliest_cr_line,fico_range_low,fico_range_high,inq_last_6mths,mths_since_last_delinq,mths_since_last_record,open_acc,pub_rec,revol_bal,revol_util,total_acc,initial_list_status,out_prncp,out_prncp_inv,total_pymnt,total_pymnt_inv,total_rec_prncp,total_rec_int,total_rec_late_fee,recoveries,collection_recovery_fee,last_pymnt_d,last_pymnt_amnt,next_pymnt_d,last_credit_pull_d,last_fico_range_high,last_fico_range_low,collections_12_mths_ex_med,mths_since_last_major_derog,policy_code,application_type,annual_inc_joint,dti_joint,verification_status_joint,acc_now_delinq,tot_coll_amt,tot_cur_bal,open_acc_6m,open_act_il,open_il_12m,open_il_24m,mths_since_rcnt_il,total_bal_il,il_util,open_rv_12m,open_rv_24m,max_bal_bc,all_util,total_rev_hi_lim,inq_fi,total_cu_tl,inq_last_12m,acc_open_past_24mths,avg_cur_bal,bc_open_to_buy,bc_util,chargeoff_within_12_mths,delinq_amnt,mo_sin_old_il_acct,mo_sin_old_rev_tl_op,mo_sin_rcnt_rev_tl_op,mo_sin_rcnt_tl,mort_acc,mths_since_recent_bc,mths_since_recent_bc_dlq,mths_since_recent_inq,mths_since_recent_revol_delinq,num_accts_ever_120_pd,num_actv_bc_tl,num_actv_rev_tl,num_bc_sats,num_bc_tl,num_il_tl,num_op_rev_tl,num_rev_accts,num_rev_tl_bal_gt_0,num_sats,num_tl_120dpd_2m,num_tl_30dpd,num_tl_90g_dpd_24m,num_tl_op_past_12m,pct_tl_nvr_dlq,percent_bc_gt_75,pub_rec_bankruptcies,tax_liens,tot_hi_cred_lim,total_bal_ex_mort,total_bc_limit,total_il_high_credit_limit,revol_bal_joint,sec_app_fico_range_low,sec_app_fico_range_high,sec_app_earliest_cr_line,sec_app_inq_last_6mths,sec_app_mort_acc,sec_app_open_acc,sec_app_revol_util,sec_app_open_act_il,sec_app_num_rev_accts,sec_app_chargeoff_within_12_mths,sec_app_collections_12_mths_ex_med,sec_app_mths_since_last_major_derog,hardship_flag,hardship_type,hardship_reason,hardship_status,deferral_term,hardship_amount,hardship_start_date,hardship_end_date,payment_plan_start_date,hardship_length,hardship_dpd,hardship_loan_status,orig_projected_additional_accrued_interest,hardship_payoff_balance_amount,hardship_last_payment_amount,debt_settlement_flag,debt_settlement_flag_date,settlement_status,settlement_date,settlement_amount,settlement_percentage,settlement_term
10129454,,12000,12000,12000,36 months,10.99%,392.81,B,B2,Project Manager,4 years,RENT,60000,Not Verified,Dec-2013,Fully Paid,n,https://lendingclub.com/browse/loanDetail.action?loan_id=10129454,Borrower added on 12/31/13 > I would like to use this money to payoff existing credit card debt and use the remaining about to purchase a used car that is fuel efficient.<br>,debt_consolidation,No Regrets,281xx,NC,4.62,0,Dec-2009,720,724,1,,,15,0,7137,24%,18,f,0.00,0.00,13988.6099956242,13988.61,12000.00,1988.61,0.0,0.0,0.0,Apr-2016,3775.55,,Aug-2018,569,565,0,,1,Individual,,,,0,0,7137,,,,,,,,,,,,29700,,,,8,476,15216,15.9,0,0,,48,1,1,0,1,,3,,0,4,7,8,10,0,15,18,7,15,0,0,0,4,100,0,0,0,29700,7137,18100,0,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
10149488,,4800,4800,4800,36 months,10.99%,157.13,B,B2,Surgical Technician,2 years,MORTGAGE,39600,Source Verified,Dec-2013,Fully Paid,n,https://lendingclub.com/browse/loanDetail.action?loan_id=10149488,"Borrower added on 12/31/13 > Just bought a house, and would like a little extra funds to improve aspects of the house such as, duct work, electrical outlets, backyard, and other minor areas.<br>",home_improvement,For The House,782xx,TX,2.49,0,Aug-1995,755,759,2,,,3,0,4136,16.1%,8,w,0.00,0.00,5157.5194567178,5157.52,4800.00,357.52,0.0,0.0,0.0,Sep-2014,3900.48,,Jan-2017,534,530,0,,1,Individual,,,,0,0,4136,,,,,,,,,,,,25700,,,,0,1379,21564,16.1,0,0,104,220,25,25,0,25,,3,,0,2,2,3,4,1,3,7,2,3,0,0,0,0,100,0,0,0,25700,4136,25700,0,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
10148122,,12000,12000,12000,36 months,7.62%,373.94,A,A3,Systems Engineer,3 years,MORTGAGE,96500,Not Verified,Dec-2013,Fully Paid,n,https://lendingclub.com/browse/loanDetail.action?loan_id=10148122,"Borrower added on 12/31/13 > Bought a new house, furniture, water softener, a second car, etc. Got our lives started and now a manageable monthly payment will help keep them going!<br>",debt_consolidation,Debt Consolidation and Credit Transfer,782xx,TX,12.61,0,Sep-2003,705,709,0,,,17,0,13248,55.7%,30,f,0.00,0.00,13397.5399977648,13397.54,12000.00,1397.54,0.0,0.0,0.0,Jun-2016,2927.22,,Nov-2019,814,810,0,,1,Individual,,,,0,0,200314,,,,,,,,,,,,23800,,,,4,11783,2441,83.5,0,0,123,118,10,9,1,10,,10,,0,4,5,4,10,15,8,14,5,17,0,0,0,3,100,100,0,0,233004,46738,14800,53404,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
10149342,,27050,27050,27050,36 months,10.99%,885.46,B,B2,Team Leadern Customer Ops & Systems,10+ years,OWN,55000,Verified,Dec-2013,Fully Paid,n,https://lendingclub.com/browse/loanDetail.action?loan_id=10149342,Borrower added on 12/31/13 > Combining high interest credit cards to lower interest rate.<br>,debt_consolidation,Debt Consolidation,481xx,MI,22.87,0,Oct-1986,730,734,0,,,14,0,36638,61.2%,27,w,0.00,0.00,31752.53,31752.53,27050.00,4702.53,0.0,0.0,0.0,Jul-2016,6074.19,,Mar-2018,809,805,0,,1,Individual,,,,0,0,114834,,,,,,,,,,,,59900,,,,3,9570,16473,53.9,0,0,117,326,16,6,4,16,,8,,0,2,4,4,8,8,10,15,4,14,0,0,0,1,100,25,0,0,138554,70186,35700,33054,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,


In [18]:
df_2012_2013.tail()

Unnamed: 0,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,Unnamed: 10,Unnamed: 11,Unnamed: 12,Unnamed: 13,Unnamed: 14,Unnamed: 15,Unnamed: 16,Unnamed: 17,Unnamed: 18,Unnamed: 19,Unnamed: 20,Unnamed: 21,Unnamed: 22,Unnamed: 23,Unnamed: 24,Unnamed: 25,Unnamed: 26,Unnamed: 27,Unnamed: 28,Unnamed: 29,Unnamed: 30,Unnamed: 31,Unnamed: 32,Unnamed: 33,Unnamed: 34,Unnamed: 35,Unnamed: 36,Unnamed: 37,Unnamed: 38,Unnamed: 39,Unnamed: 40,Unnamed: 41,Unnamed: 42,Unnamed: 43,Unnamed: 44,Unnamed: 45,Unnamed: 46,Unnamed: 47,Unnamed: 48,Unnamed: 49,Unnamed: 50,Unnamed: 51,Unnamed: 52,Unnamed: 53,Unnamed: 54,Unnamed: 55,Unnamed: 56,Unnamed: 57,Unnamed: 58,Unnamed: 59,Unnamed: 60,Unnamed: 61,Unnamed: 62,Unnamed: 63,Unnamed: 64,Unnamed: 65,Unnamed: 66,Unnamed: 67,Unnamed: 68,Unnamed: 69,Unnamed: 70,Unnamed: 71,Unnamed: 72,Unnamed: 73,Unnamed: 74,Unnamed: 75,Unnamed: 76,Unnamed: 77,Unnamed: 78,Unnamed: 79,Unnamed: 80,Unnamed: 81,Unnamed: 82,Unnamed: 83,Unnamed: 84,Unnamed: 85,Unnamed: 86,Unnamed: 87,Unnamed: 88,Unnamed: 89,Unnamed: 90,Unnamed: 91,Unnamed: 92,Unnamed: 93,Unnamed: 94,Unnamed: 95,Unnamed: 96,Unnamed: 97,Unnamed: 98,Unnamed: 99,Unnamed: 100,Unnamed: 101,Unnamed: 102,Unnamed: 103,Unnamed: 104,Unnamed: 105,Unnamed: 106,Unnamed: 107,Unnamed: 108,Unnamed: 109,Unnamed: 110,Unnamed: 111,Unnamed: 112,Unnamed: 113,Unnamed: 114,Unnamed: 115,Unnamed: 116,Unnamed: 117,Unnamed: 118,Unnamed: 119,Unnamed: 120,Unnamed: 121,Unnamed: 122,Unnamed: 123,Unnamed: 124,Unnamed: 125,Unnamed: 126,Unnamed: 127,Unnamed: 128,Unnamed: 129,Unnamed: 130,Unnamed: 131,Unnamed: 132,Unnamed: 133,Unnamed: 134,Unnamed: 135,Unnamed: 136,Unnamed: 137,Unnamed: 138,Unnamed: 139,Unnamed: 140,Unnamed: 141,Unnamed: 142,Unnamed: 143,Unnamed: 144,Unnamed: 145,Unnamed: 146,Unnamed: 147,Unnamed: 148,Notes offered by Prospectus (https://www.lendingclub.com/info/prospectus.action)
1059224,,35000.0,35000.0,35000.0,36 months,15.96%,1229.81,C,C5,Tom and Holly Gores,3 years,MORTGAGE,160000.0,Source Verified,Jan-2012,Fully Paid,n,https://lendingclub.com/browse/loanDetail.action?loan_id=1059224,,small_business,Small Business Loan,922xx,CA,4.9,0.0,Sep-2000,720.0,724.0,1.0,,,9.0,0.0,23665.0,62.4%,17.0,f,0.0,0.0,44272.9399842368,44272.94,35000.0,9272.94,0.0,0.0,0.0,Dec-2014,1244.72,,Dec-2014,739.0,735.0,0.0,,1.0,Individual,,,,0.0,,,,,,,,,,,,,,,,,,,,,,0.0,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,0.0,0.0,,,,,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
1058722,,12000.0,12000.0,12000.0,36 months,16.29%,423.61,D,D1,,,MORTGAGE,35000.0,Source Verified,Jan-2012,Charged Off,n,https://lendingclub.com/browse/loanDetail.action?loan_id=1058722,"Borrower added on 12/06/11 > need to pay off my truck note, other expenses.<br><br> Borrower added on 12/06/11 > I am happy to get this offer, so i can get off my personal loans and pay off other bills.<br>",other,others,770xx,TX,12.93,0.0,Aug-2001,675.0,679.0,0.0,,,14.0,0.0,15006.0,93.2%,27.0,f,0.0,0.0,2077.189999999999,2077.19,1063.61,629.79,0.0,383.79,3.99,Apr-2012,423.61,,Oct-2016,584.0,580.0,0.0,,1.0,Individual,,,,0.0,,,,,,,,,,,,,,,,,,,,,,0.0,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,0.0,0.0,,,,,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
1058291,,12000.0,7775.0,7775.0,60 months,15.27%,186.08,C,C4,,7 years,RENT,50000.0,Verified,Jan-2012,Fully Paid,n,https://lendingclub.com/browse/loanDetail.action?loan_id=1058291,"Borrower added on 12/06/11 > Want to close down credit cards and pay them off. This way I will only have one payment a month for the loan,, will make it alot easier for monthly money magagement<br>",credit_card,refinance,220xx,VA,5.5,0.0,Feb-2003,715.0,719.0,1.0,,,13.0,0.0,7008.0,37.9%,25.0,f,0.0,0.0,11163.9482553723,11163.95,7775.0,3388.95,0.0,0.0,0.0,Dec-2016,185.23,,Jan-2019,569.0,565.0,0.0,,1.0,Individual,,,,0.0,,,,,,,,,,,,,,,,,,,,,,0.0,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,0.0,0.0,,,,,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
Total amount funded in policy code 1: 2700702175,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Total amount funded in policy code 2: 81866225,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


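**NOTE: file format is consistent with the first data file: the first row is a prospectus notice, so the column names sit on the second row (`header = 1`), and the last two rows are summary totals rather than loans (`skipfooter = 2`).**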
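
As a minimal sketch of the layout described in the note, the snippet below parses a tiny in-memory stand-in for one raw file. The three-column header is an assumption made for brevity (the real files have 150 columns); the ids, amounts, and footer text are copied from the raw output above.

```python
import io

import pandas as pd

# Made-up miniature of one raw file: a notice line, the real header,
# two loan rows, and two summary rows at the bottom.
raw = (
    "Notes offered by Prospectus (https://www.lendingclub.com/info/prospectus.action)\n"
    "id,loan_amnt,term\n"
    "1059224,35000,36 months\n"
    "1058722,12000,36 months\n"
    "Total amount funded in policy code 1: 2700702175,,\n"
    "Total amount funded in policy code 2: 81866225,,\n"
)

# header=1 takes the column names from the second line;
# skipfooter=2 drops the two summary rows and needs the Python engine.
sample = pd.read_csv(io.StringIO(raw), header=1, skipfooter=2, engine="python")
print(sample.shape)
print(sample.columns.tolist())
```

The Python engine is slower than the default C engine, which is likely why the reads below take noticeably longer than the first, unparsed read of the 2007-2011 file.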

In [19]:
%%time
df_2012_2013 = pd.read_csv(dpath+'{}'.format(files[2]), header = 1, skipfooter = 2, engine = 'python' )

CPU times: user 24.1 s, sys: 1.27 s, total: 25.4 s
Wall time: 25.8 s


In [20]:
df_2012_2013.head()

Unnamed: 0,id,member_id,loan_amnt,funded_amnt,funded_amnt_inv,term,int_rate,installment,grade,sub_grade,emp_title,emp_length,home_ownership,annual_inc,verification_status,issue_d,loan_status,pymnt_plan,url,desc,purpose,title,zip_code,addr_state,dti,delinq_2yrs,earliest_cr_line,fico_range_low,fico_range_high,inq_last_6mths,mths_since_last_delinq,mths_since_last_record,open_acc,pub_rec,revol_bal,revol_util,total_acc,initial_list_status,out_prncp,out_prncp_inv,total_pymnt,total_pymnt_inv,total_rec_prncp,total_rec_int,total_rec_late_fee,recoveries,collection_recovery_fee,last_pymnt_d,last_pymnt_amnt,next_pymnt_d,...,num_rev_accts,num_rev_tl_bal_gt_0,num_sats,num_tl_120dpd_2m,num_tl_30dpd,num_tl_90g_dpd_24m,num_tl_op_past_12m,pct_tl_nvr_dlq,percent_bc_gt_75,pub_rec_bankruptcies,tax_liens,tot_hi_cred_lim,total_bal_ex_mort,total_bc_limit,total_il_high_credit_limit,revol_bal_joint,sec_app_fico_range_low,sec_app_fico_range_high,sec_app_earliest_cr_line,sec_app_inq_last_6mths,sec_app_mort_acc,sec_app_open_acc,sec_app_revol_util,sec_app_open_act_il,sec_app_num_rev_accts,sec_app_chargeoff_within_12_mths,sec_app_collections_12_mths_ex_med,sec_app_mths_since_last_major_derog,hardship_flag,hardship_type,hardship_reason,hardship_status,deferral_term,hardship_amount,hardship_start_date,hardship_end_date,payment_plan_start_date,hardship_length,hardship_dpd,hardship_loan_status,orig_projected_additional_accrued_interest,hardship_payoff_balance_amount,hardship_last_payment_amount,debt_settlement_flag,debt_settlement_flag_date,settlement_status,settlement_date,settlement_amount,settlement_percentage,settlement_term
0,10129454,,12000,12000,12000.0,36 months,10.99%,392.81,B,B2,Project Manager,4 years,RENT,60000.0,Not Verified,Dec-2013,Fully Paid,n,https://lendingclub.com/browse/loanDetail.acti...,Borrower added on 12/31/13 > I would like to...,debt_consolidation,No Regrets,281xx,NC,4.62,0,Dec-2009,720,724,1,,,15,0,7137,24%,18,f,0.0,0.0,13988.609996,13988.61,12000.0,1988.61,0.0,0.0,0.0,Apr-2016,3775.55,,...,18.0,7.0,15.0,0.0,0.0,0.0,4.0,100.0,0.0,0,0,29700.0,7137.0,18100.0,0.0,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
1,10149488,,4800,4800,4800.0,36 months,10.99%,157.13,B,B2,Surgical Technician,2 years,MORTGAGE,39600.0,Source Verified,Dec-2013,Fully Paid,n,https://lendingclub.com/browse/loanDetail.acti...,Borrower added on 12/31/13 > Just bought a h...,home_improvement,For The House,782xx,TX,2.49,0,Aug-1995,755,759,2,,,3,0,4136,16.1%,8,w,0.0,0.0,5157.519457,5157.52,4800.0,357.52,0.0,0.0,0.0,Sep-2014,3900.48,,...,7.0,2.0,3.0,0.0,0.0,0.0,0.0,100.0,0.0,0,0,25700.0,4136.0,25700.0,0.0,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
2,10148122,,12000,12000,12000.0,36 months,7.62%,373.94,A,A3,Systems Engineer,3 years,MORTGAGE,96500.0,Not Verified,Dec-2013,Fully Paid,n,https://lendingclub.com/browse/loanDetail.acti...,Borrower added on 12/31/13 > Bought a new ho...,debt_consolidation,Debt Consolidation and Credit Transfer,782xx,TX,12.61,0,Sep-2003,705,709,0,,,17,0,13248,55.7%,30,f,0.0,0.0,13397.539998,13397.54,12000.0,1397.54,0.0,0.0,0.0,Jun-2016,2927.22,,...,14.0,5.0,17.0,0.0,0.0,0.0,3.0,100.0,100.0,0,0,233004.0,46738.0,14800.0,53404.0,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
3,10149342,,27050,27050,27050.0,36 months,10.99%,885.46,B,B2,Team Leadern Customer Ops & Systems,10+ years,OWN,55000.0,Verified,Dec-2013,Fully Paid,n,https://lendingclub.com/browse/loanDetail.acti...,Borrower added on 12/31/13 > Combining high ...,debt_consolidation,Debt Consolidation,481xx,MI,22.87,0,Oct-1986,730,734,0,,,14,0,36638,61.2%,27,w,0.0,0.0,31752.53,31752.53,27050.0,4702.53,0.0,0.0,0.0,Jul-2016,6074.19,,...,15.0,4.0,14.0,0.0,0.0,0.0,1.0,100.0,25.0,0,0,138554.0,70186.0,35700.0,33054.0,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
4,10129477,,14000,14000,14000.0,36 months,12.85%,470.71,B,B4,Assistant Director - Human Resources,4 years,RENT,88000.0,Not Verified,Dec-2013,Fully Paid,n,https://lendingclub.com/browse/loanDetail.acti...,,debt_consolidation,Debt consolidation,282xx,NC,10.02,1,Jun-1988,670,674,0,16.0,115.0,6,1,3686,81.9%,14,f,0.0,0.0,16945.318783,16945.32,14000.0,2945.32,0.0,0.0,0.0,Jan-2017,470.47,,...,10.0,4.0,6.0,0.0,0.0,0.0,0.0,78.6,100.0,1,0,31840.0,17672.0,3900.0,27340.0,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,


In [21]:
df_2012_2013.tail()

Unnamed: 0,id,member_id,loan_amnt,funded_amnt,funded_amnt_inv,term,int_rate,installment,grade,sub_grade,emp_title,emp_length,home_ownership,annual_inc,verification_status,issue_d,loan_status,pymnt_plan,url,desc,purpose,title,zip_code,addr_state,dti,delinq_2yrs,earliest_cr_line,fico_range_low,fico_range_high,inq_last_6mths,mths_since_last_delinq,mths_since_last_record,open_acc,pub_rec,revol_bal,revol_util,total_acc,initial_list_status,out_prncp,out_prncp_inv,total_pymnt,total_pymnt_inv,total_rec_prncp,total_rec_int,total_rec_late_fee,recoveries,collection_recovery_fee,last_pymnt_d,last_pymnt_amnt,next_pymnt_d,...,num_rev_accts,num_rev_tl_bal_gt_0,num_sats,num_tl_120dpd_2m,num_tl_30dpd,num_tl_90g_dpd_24m,num_tl_op_past_12m,pct_tl_nvr_dlq,percent_bc_gt_75,pub_rec_bankruptcies,tax_liens,tot_hi_cred_lim,total_bal_ex_mort,total_bc_limit,total_il_high_credit_limit,revol_bal_joint,sec_app_fico_range_low,sec_app_fico_range_high,sec_app_earliest_cr_line,sec_app_inq_last_6mths,sec_app_mort_acc,sec_app_open_acc,sec_app_revol_util,sec_app_open_act_il,sec_app_num_rev_accts,sec_app_chargeoff_within_12_mths,sec_app_collections_12_mths_ex_med,sec_app_mths_since_last_major_derog,hardship_flag,hardship_type,hardship_reason,hardship_status,deferral_term,hardship_amount,hardship_start_date,hardship_end_date,payment_plan_start_date,hardship_length,hardship_dpd,hardship_loan_status,orig_projected_additional_accrued_interest,hardship_payoff_balance_amount,hardship_last_payment_amount,debt_settlement_flag,debt_settlement_flag_date,settlement_status,settlement_date,settlement_amount,settlement_percentage,settlement_term
188176,1062400,,20500,20500,20500.0,36 months,16.77%,728.54,D,D2,,7 years,RENT,60000.0,Source Verified,Jan-2012,Fully Paid,n,https://lendingclub.com/browse/loanDetail.acti...,Borrower added on 12/08/11 > I will be payin...,debt_consolidation,Payoff Loan,100xx,NY,16.4,1,Oct-1988,700,704,3,20.0,,10,0,15417,58.4%,20,f,0.0,0.0,26176.440002,26176.44,20500.0,5676.44,0.0,0.0,0.0,Nov-2014,10.21,,...,,,,,,,,,,0,0,,,,,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
188177,1059394,,15000,15000,15000.0,36 months,15.27%,521.97,C,C4,,3 years,RENT,57600.0,Source Verified,Jan-2012,Fully Paid,n,https://lendingclub.com/browse/loanDetail.acti...,,small_business,business loan,900xx,CA,8.35,2,Jan-2004,680,684,1,10.0,,18,0,8897,33.1%,30,f,0.0,0.0,18790.720008,18790.72,15000.0,3790.72,0.0,0.0,0.0,Jan-2015,541.95,,...,,,,,,,,,,0,0,,,,,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
188178,1059224,,35000,35000,35000.0,36 months,15.96%,1229.81,C,C5,Tom and Holly Gores,3 years,MORTGAGE,160000.0,Source Verified,Jan-2012,Fully Paid,n,https://lendingclub.com/browse/loanDetail.acti...,,small_business,Small Business Loan,922xx,CA,4.9,0,Sep-2000,720,724,1,,,9,0,23665,62.4%,17,f,0.0,0.0,44272.939984,44272.94,35000.0,9272.94,0.0,0.0,0.0,Dec-2014,1244.72,,...,,,,,,,,,,0,0,,,,,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
188179,1058722,,12000,12000,12000.0,36 months,16.29%,423.61,D,D1,,,MORTGAGE,35000.0,Source Verified,Jan-2012,Charged Off,n,https://lendingclub.com/browse/loanDetail.acti...,Borrower added on 12/06/11 > need to pay off...,other,others,770xx,TX,12.93,0,Aug-2001,675,679,0,,,14,0,15006,93.2%,27,f,0.0,0.0,2077.19,2077.19,1063.61,629.79,0.0,383.79,3.99,Apr-2012,423.61,,...,,,,,,,,,,0,0,,,,,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
188180,1058291,,12000,7775,7775.0,60 months,15.27%,186.08,C,C4,,7 years,RENT,50000.0,Verified,Jan-2012,Fully Paid,n,https://lendingclub.com/browse/loanDetail.acti...,Borrower added on 12/06/11 > Want to close d...,credit_card,refinance,220xx,VA,5.5,0,Feb-2003,715,719,1,,,13,0,7008,37.9%,25,f,0.0,0.0,11163.948255,11163.95,7775.0,3388.95,0.0,0.0,0.0,Dec-2016,185.23,,...,,,,,,,,,,0,0,,,,,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,


In [22]:
df_2012_2013.shape

(188181, 150)

In [23]:
cl_d2012 = df_2012_2013.columns.tolist()

In [24]:
cl_d2007 == cl_d2012

True

**NOTE: good sign: the column lists of the first and second data files are identical, in both names and order.**

[**back to top**](#toc)

<a id = '2016q1'></a>
#### 1.C. Check data file 2016Q1

In [29]:
%%time
df_2016q1 = pd.read_csv(dpath+'{}'.format(files[5]), header = 1, skipfooter = 2, engine = 'python' )

CPU times: user 17.1 s, sys: 378 ms, total: 17.4 s
Wall time: 17.5 s


In [30]:
df_2016q1.shape

(133887, 150)

In [31]:
df_2016q1.head()

Unnamed: 0,id,member_id,loan_amnt,funded_amnt,funded_amnt_inv,term,int_rate,installment,grade,sub_grade,emp_title,emp_length,home_ownership,annual_inc,verification_status,issue_d,loan_status,pymnt_plan,url,desc,purpose,title,zip_code,addr_state,dti,delinq_2yrs,earliest_cr_line,fico_range_low,fico_range_high,inq_last_6mths,mths_since_last_delinq,mths_since_last_record,open_acc,pub_rec,revol_bal,revol_util,total_acc,initial_list_status,out_prncp,out_prncp_inv,total_pymnt,total_pymnt_inv,total_rec_prncp,total_rec_int,total_rec_late_fee,recoveries,collection_recovery_fee,last_pymnt_d,last_pymnt_amnt,next_pymnt_d,...,num_rev_accts,num_rev_tl_bal_gt_0,num_sats,num_tl_120dpd_2m,num_tl_30dpd,num_tl_90g_dpd_24m,num_tl_op_past_12m,pct_tl_nvr_dlq,percent_bc_gt_75,pub_rec_bankruptcies,tax_liens,tot_hi_cred_lim,total_bal_ex_mort,total_bc_limit,total_il_high_credit_limit,revol_bal_joint,sec_app_fico_range_low,sec_app_fico_range_high,sec_app_earliest_cr_line,sec_app_inq_last_6mths,sec_app_mort_acc,sec_app_open_acc,sec_app_revol_util,sec_app_open_act_il,sec_app_num_rev_accts,sec_app_chargeoff_within_12_mths,sec_app_collections_12_mths_ex_med,sec_app_mths_since_last_major_derog,hardship_flag,hardship_type,hardship_reason,hardship_status,deferral_term,hardship_amount,hardship_start_date,hardship_end_date,payment_plan_start_date,hardship_length,hardship_dpd,hardship_loan_status,orig_projected_additional_accrued_interest,hardship_payoff_balance_amount,hardship_last_payment_amount,debt_settlement_flag,debt_settlement_flag_date,settlement_status,settlement_date,settlement_amount,settlement_percentage,settlement_term
0,74121690,,6000,6000,6000.0,36 months,12.99%,202.14,C,C2,Salesman,9 years,MORTGAGE,43000.0,Not Verified,Mar-2016,Fully Paid,n,https://lendingclub.com/browse/loanDetail.acti...,,debt_consolidation,Debt consolidation,211xx,MD,14.4,0,May-2006,675,679,1.0,,,16,0,15055,63%,21,f,0.0,0.0,7268.153165,7268.15,6000.0,1268.15,0.0,0.0,0.0,Apr-2019,201.91,,...,18,11,16,0.0,0,0,1,100.0,37.5,0,0,23900,15055,8900,0,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
1,74724861,,21000,21000,21000.0,60 months,19.53%,550.9,D,D5,Human Resources,10+ years,MORTGAGE,65000.0,Verified,Mar-2016,Current,n,https://lendingclub.com/browse/loanDetail.acti...,,credit_card,Credit card refinancing,430xx,OH,43.83,1,Mar-2000,710,714,2.0,15.0,,13,0,124946,65.5%,25,w,8123.69,8123.69,23643.13,23643.13,12876.31,10766.82,0.0,0.0,0.0,Oct-2019,550.9,Dec-2019,...,12,5,13,0.0,0,0,3,96.0,50.0,0,0,276173,204786,47800,106573,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
2,74826201,,7200,7200,7200.0,36 months,5.32%,216.83,A,A1,Mechanic,10+ years,MORTGAGE,49000.0,Source Verified,Mar-2016,Fully Paid,n,https://lendingclub.com/browse/loanDetail.acti...,,debt_consolidation,Debt consolidation,554xx,MN,19.05,0,Dec-2001,750,754,1.0,41.0,,11,0,9309,18.4%,36,w,0.0,0.0,7223.41,7223.41,7200.0,23.41,0.0,0.0,0.0,Apr-2016,7227.67,,...,16,2,11,0.0,0,0,4,97.2,20.0,0,0,83614,33681,43300,33014,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
3,75061311,,12000,12000,12000.0,60 months,11.99%,266.88,C,C1,Mechanic,10+ years,MORTGAGE,49000.0,Not Verified,Mar-2016,Charged Off,n,https://lendingclub.com/browse/loanDetail.acti...,,credit_card,,333xx,FL,6.59,1,Dec-1999,670,674,1.0,19.0,,10,0,12152,50.6%,29,w,0.0,0.0,8448.9,8448.9,5038.27,2952.14,15.0,443.49,79.8282,Nov-2018,281.88,,...,26,3,10,0.0,0,0,2,82.8,50.0,0,0,183065,12152,17200,0,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
4,75091735,,11425,11425,11425.0,36 months,19.53%,421.87,D,D5,Nurse,5 years,RENT,26000.0,Source Verified,Mar-2016,Fully Paid,n,https://lendingclub.com/browse/loanDetail.acti...,,major_purchase,Major purchase,328xx,FL,35.56,0,Sep-2008,730,734,0.0,,,11,0,2096,7.2%,18,f,0.0,0.0,14331.034913,14331.03,11425.0,2906.03,0.0,0.0,0.0,Nov-2017,6399.14,,...,8,4,11,0.0,0,0,1,94.4,0.0,0,0,56880,13148,21800,27780,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,


In [32]:
df_2016q1.tail()

Unnamed: 0,id,member_id,loan_amnt,funded_amnt,funded_amnt_inv,term,int_rate,installment,grade,sub_grade,emp_title,emp_length,home_ownership,annual_inc,verification_status,issue_d,loan_status,pymnt_plan,url,desc,purpose,title,zip_code,addr_state,dti,delinq_2yrs,earliest_cr_line,fico_range_low,fico_range_high,inq_last_6mths,mths_since_last_delinq,mths_since_last_record,open_acc,pub_rec,revol_bal,revol_util,total_acc,initial_list_status,out_prncp,out_prncp_inv,total_pymnt,total_pymnt_inv,total_rec_prncp,total_rec_int,total_rec_late_fee,recoveries,collection_recovery_fee,last_pymnt_d,last_pymnt_amnt,next_pymnt_d,...,num_rev_accts,num_rev_tl_bal_gt_0,num_sats,num_tl_120dpd_2m,num_tl_30dpd,num_tl_90g_dpd_24m,num_tl_op_past_12m,pct_tl_nvr_dlq,percent_bc_gt_75,pub_rec_bankruptcies,tax_liens,tot_hi_cred_lim,total_bal_ex_mort,total_bc_limit,total_il_high_credit_limit,revol_bal_joint,sec_app_fico_range_low,sec_app_fico_range_high,sec_app_earliest_cr_line,sec_app_inq_last_6mths,sec_app_mort_acc,sec_app_open_acc,sec_app_revol_util,sec_app_open_act_il,sec_app_num_rev_accts,sec_app_chargeoff_within_12_mths,sec_app_collections_12_mths_ex_med,sec_app_mths_since_last_major_derog,hardship_flag,hardship_type,hardship_reason,hardship_status,deferral_term,hardship_amount,hardship_start_date,hardship_end_date,payment_plan_start_date,hardship_length,hardship_dpd,hardship_loan_status,orig_projected_additional_accrued_interest,hardship_payoff_balance_amount,hardship_last_payment_amount,debt_settlement_flag,debt_settlement_flag_date,settlement_status,settlement_date,settlement_amount,settlement_percentage,settlement_term
133882,66055600,,6000,6000,6000.0,36 months,9.17%,191.28,B,B2,,,RENT,32640.0,Not Verified,Jan-2016,Fully Paid,n,https://lendingclub.com/browse/loanDetail.acti...,,debt_consolidation,Debt consolidation,600xx,IL,22.76,0,Sep-1997,695,699,1.0,,101.0,9,1,6898,37.1%,17,w,0.0,0.0,6903.429166,6903.43,6000.0,903.43,0.0,0.0,0.0,Jan-2019,191.02,,...,11,5,9,0.0,0,0,3,100.0,66.7,1,0,45927,30895,11100,27327,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
133883,65854936,,6000,6000,6000.0,36 months,7.89%,187.72,A,A5,Warehouse Clerk,< 1 year,OWN,38000.0,Source Verified,Jan-2016,Fully Paid,n,https://lendingclub.com/browse/loanDetail.acti...,,credit_card,Credit card refinancing,432xx,OH,12.35,0,Feb-2006,695,699,0.0,,,10,0,7867,36.8%,14,w,0.0,0.0,6755.041951,6755.04,6000.0,755.04,0.0,0.0,0.0,Jan-2019,187.47,,...,13,7,10,0.0,0,0,1,100.0,33.3,0,0,25300,9223,11600,3900,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
133884,66141895,,14400,14400,14400.0,60 months,13.18%,328.98,C,C3,Meatcutter,10+ years,RENT,47000.0,Verified,Jan-2016,Charged Off,n,https://lendingclub.com/browse/loanDetail.acti...,,credit_card,Credit card refinancing,531xx,WI,19.64,0,Oct-1976,670,674,4.0,48.0,,7,0,10164,56%,19,w,0.0,0.0,10368.61,10368.61,5549.75,3651.15,0.0,1167.71,210.1878,May-2018,328.98,,...,7,5,7,,0,0,2,79.0,100.0,0,0,49049,35979,5800,30749,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
133885,65673209,,34050,34050,34050.0,36 months,15.41%,1187.21,D,D1,Supervisor,10+ years,MORTGAGE,87800.0,Source Verified,Jan-2016,Fully Paid,n,https://lendingclub.com/browse/loanDetail.acti...,,credit_card,Credit card refinancing,212xx,MD,12.1,1,Nov-2005,680,684,1.0,8.0,,14,0,25473,53.2%,21,w,0.0,0.0,42914.121754,42914.12,34050.0,8864.12,0.0,0.0,0.0,Jan-2019,1186.87,,...,18,9,14,0.0,0,0,1,95.2,50.0,0,0,152900,25473,16900,0,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
133886,65744272,,5000,5000,5000.0,36 months,11.22%,164.22,B,B5,carpenter,7 years,MORTGAGE,65000.0,Source Verified,Jan-2016,Fully Paid,n,https://lendingclub.com/browse/loanDetail.acti...,,home_improvement,Home improvement,201xx,VA,3.1,0,Jul-2005,665,669,1.0,,77.0,6,1,5763,48%,8,w,0.0,0.0,5908.587038,5908.59,5000.0,908.59,0.0,0.0,0.0,Jan-2019,164.01,,...,6,3,6,0.0,0,0,1,100.0,75.0,1,0,223105,5763,11000,0,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,


In [33]:
def compare_df_columns(df1, df2):
    """Print the column counts of two DataFrames and whether their column lists match in name and order."""
    cl_df1 = df1.columns.tolist()
    cl_df2 = df2.columns.tolist()

    print('Number of columns: first df: {}, second df: {}'.format(len(cl_df1), len(cl_df2)))
    print('Columns from both dfs are identical:', cl_df1 == cl_df2)

In [34]:
compare_df_columns(df_2016q1, df_2007_2011)

Number of columns: first df: 150, second df: 150
Columns from both dfs are identical: True
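
The check above only reports `True` or `False`. If it ever returned `False`, it would help to know which columns differ. The sketch below is one possible helper for that; `diff_columns` is a hypothetical name, not part of this notebook, and the column lists are made up for illustration.

```python
def diff_columns(cols_a, cols_b):
    """Return (only_in_a, only_in_b, same_order) for two lists of column names."""
    set_a, set_b = set(cols_a), set(cols_b)
    # Preserve the original order of each list when reporting differences.
    only_in_a = [c for c in cols_a if c not in set_b]
    only_in_b = [c for c in cols_b if c not in set_a]
    same_order = list(cols_a) == list(cols_b)
    return only_in_a, only_in_b, same_order


# Made-up column lists for illustration only.
old_cols = ["id", "loan_amnt", "term"]
new_cols = ["id", "term", "loan_amnt", "settlement_term"]
print(diff_columns(old_cols, new_cols))
```

With real data, the inputs would be `df1.columns.tolist()` and `df2.columns.tolist()`.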


[**back to top**](#toc)

<a id = '2019q3'></a>
#### 1.D. Check data file 2019Q3

In [35]:
%%time
df_2019q3 = pd.read_csv(dpath+'{}'.format(files[-1]), header = 1, skipfooter = 2, engine = 'python' )

CPU times: user 18.7 s, sys: 416 ms, total: 19.1 s
Wall time: 19.1 s


In [36]:
df_2019q3.shape

(143035, 150)

In [37]:
df_2019q3.head()

Unnamed: 0,id,member_id,loan_amnt,funded_amnt,funded_amnt_inv,term,int_rate,installment,grade,sub_grade,emp_title,emp_length,home_ownership,annual_inc,verification_status,issue_d,loan_status,pymnt_plan,url,desc,purpose,title,zip_code,addr_state,dti,delinq_2yrs,earliest_cr_line,fico_range_low,fico_range_high,inq_last_6mths,mths_since_last_delinq,mths_since_last_record,open_acc,pub_rec,revol_bal,revol_util,total_acc,initial_list_status,out_prncp,out_prncp_inv,total_pymnt,total_pymnt_inv,total_rec_prncp,total_rec_int,total_rec_late_fee,recoveries,collection_recovery_fee,last_pymnt_d,last_pymnt_amnt,next_pymnt_d,...,num_rev_accts,num_rev_tl_bal_gt_0,num_sats,num_tl_120dpd_2m,num_tl_30dpd,num_tl_90g_dpd_24m,num_tl_op_past_12m,pct_tl_nvr_dlq,percent_bc_gt_75,pub_rec_bankruptcies,tax_liens,tot_hi_cred_lim,total_bal_ex_mort,total_bc_limit,total_il_high_credit_limit,revol_bal_joint,sec_app_fico_range_low,sec_app_fico_range_high,sec_app_earliest_cr_line,sec_app_inq_last_6mths,sec_app_mort_acc,sec_app_open_acc,sec_app_revol_util,sec_app_open_act_il,sec_app_num_rev_accts,sec_app_chargeoff_within_12_mths,sec_app_collections_12_mths_ex_med,sec_app_mths_since_last_major_derog,hardship_flag,hardship_type,hardship_reason,hardship_status,deferral_term,hardship_amount,hardship_start_date,hardship_end_date,payment_plan_start_date,hardship_length,hardship_dpd,hardship_loan_status,orig_projected_additional_accrued_interest,hardship_payoff_balance_amount,hardship_last_payment_amount,debt_settlement_flag,debt_settlement_flag_date,settlement_status,settlement_date,settlement_amount,settlement_percentage,settlement_term
0,158303403,,12000,12000,12000,36 months,8.19%,377.09,A,A4,Teacher,10+ years,RENT,84000.0,Source Verified,Sep-2019,Current,n,https://lendingclub.com/browse/loanDetail.acti...,,debt_consolidation,Debt consolidation,937xx,CA,10.86,1,Jan-1996,680,684,0,17.0,,5,0,22489,71.6%,15,w,11713.0,11713.0,371.63,371.63,287.0,84.63,0.0,0.0,0.0,Nov-2019,377.09,Dec-2019,...,5,3,5,0.0,0,0,0,93.3,33.3,0,0,77713,51361,16400,46313,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
1,158628181,,20000,20000,20000,36 months,8.81%,634.23,A,A5,Remodel expert,3 years,RENT,44000.0,Not Verified,Sep-2019,Current,n,https://lendingclub.com/browse/loanDetail.acti...,,debt_consolidation,Debt consolidation,891xx,NV,13.72,0,Feb-2006,745,749,0,,,6,0,5435,21.8%,12,w,18947.67,18947.67,1258.67,1258.67,1052.33,206.34,0.0,0.0,0.0,Nov-2019,634.23,Dec-2019,...,7,4,6,0.0,0,0,2,100.0,0.0,0,0,34900,11722,18900,10000,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
2,159231951,,18000,18000,18000,60 months,13.08%,410.3,B,B5,Operations Supervisor,3 years,RENT,50000.0,Source Verified,Sep-2019,Current,n,https://lendingclub.com/browse/loanDetail.acti...,,debt_consolidation,Debt consolidation,750xx,TX,14.47,0,Sep-2010,660,664,0,,,13,0,13973,85.7%,13,w,17785.9,17785.9,397.22,397.22,214.1,183.12,0.0,0.0,0.0,Oct-2019,410.3,Dec-2019,...,4,4,13,0.0,0,0,2,100.0,33.3,0,0,47197,43223,15000,30897,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
3,159289397,,10000,10000,10000,36 months,10.33%,324.23,B,B1,,< 1 year,RENT,59300.0,Source Verified,Sep-2019,Current,n,https://lendingclub.com/browse/loanDetail.acti...,,debt_consolidation,Debt consolidation,600xx,IL,19.81,0,Jan-2005,720,724,0,41.0,90.0,17,1,3121,8.9%,20,w,9761.85,9761.85,318.49,318.49,238.15,80.34,0.0,0.0,0.0,Oct-2019,324.23,Dec-2019,...,15,8,17,0.0,0,0,4,94.4,0.0,1,0,78772,25938,26100,43772,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
4,159296485,,7000,7000,7000,36 months,18.62%,255.25,D,D1,Street sweeper,6 years,RENT,55000.0,Verified,Sep-2019,Current,n,https://lendingclub.com/browse/loanDetail.acti...,,debt_consolidation,Debt consolidation,902xx,CA,16.98,2,Sep-2010,670,674,0,17.0,,8,0,4546,58.3%,19,w,6853.37,6853.37,248.01,248.01,146.63,101.38,0.0,0.0,0.0,Oct-2019,255.25,Dec-2019,...,15,5,8,0.0,0,0,3,78.9,100.0,0,0,49928,36027,2200,42128,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,


In [38]:
df_2019q3.tail()

Unnamed: 0,id,member_id,loan_amnt,funded_amnt,funded_amnt_inv,term,int_rate,installment,grade,sub_grade,emp_title,emp_length,home_ownership,annual_inc,verification_status,issue_d,loan_status,pymnt_plan,url,desc,purpose,title,zip_code,addr_state,dti,delinq_2yrs,earliest_cr_line,fico_range_low,fico_range_high,inq_last_6mths,mths_since_last_delinq,mths_since_last_record,open_acc,pub_rec,revol_bal,revol_util,total_acc,initial_list_status,out_prncp,out_prncp_inv,total_pymnt,total_pymnt_inv,total_rec_prncp,total_rec_int,total_rec_late_fee,recoveries,collection_recovery_fee,last_pymnt_d,last_pymnt_amnt,next_pymnt_d,...,num_rev_accts,num_rev_tl_bal_gt_0,num_sats,num_tl_120dpd_2m,num_tl_30dpd,num_tl_90g_dpd_24m,num_tl_op_past_12m,pct_tl_nvr_dlq,percent_bc_gt_75,pub_rec_bankruptcies,tax_liens,tot_hi_cred_lim,total_bal_ex_mort,total_bc_limit,total_il_high_credit_limit,revol_bal_joint,sec_app_fico_range_low,sec_app_fico_range_high,sec_app_earliest_cr_line,sec_app_inq_last_6mths,sec_app_mort_acc,sec_app_open_acc,sec_app_revol_util,sec_app_open_act_il,sec_app_num_rev_accts,sec_app_chargeoff_within_12_mths,sec_app_collections_12_mths_ex_med,sec_app_mths_since_last_major_derog,hardship_flag,hardship_type,hardship_reason,hardship_status,deferral_term,hardship_amount,hardship_start_date,hardship_end_date,payment_plan_start_date,hardship_length,hardship_dpd,hardship_loan_status,orig_projected_additional_accrued_interest,hardship_payoff_balance_amount,hardship_last_payment_amount,debt_settlement_flag,debt_settlement_flag_date,settlement_status,settlement_date,settlement_amount,settlement_percentage,settlement_term
143030,153709478,,23300,23300,23300,60 months,22.50%,650.17,D,D3,Service Representative,10+ years,MORTGAGE,60000.0,Not Verified,Jul-2019,Current,n,https://lendingclub.com/browse/loanDetail.acti...,,debt_consolidation,Debt consolidation,336xx,FL,30.0,0,Aug-2000,695,699,0,,,19,0,24776,37.9%,31,w,22422.52,22422.52,2571.56,2571.56,877.48,1694.08,0.0,0.0,0.0,Nov-2019,650.17,Dec-2019,...,21,4,19,0.0,0,0,3,96.8,25.0,0,0,357545,50720,24700,34245,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
143031,153714644,,20000,20000,20000,60 months,20.00%,529.88,D,D2,Teen director,7 years,MORTGAGE,55000.0,Not Verified,Jul-2019,Current,n,https://lendingclub.com/browse/loanDetail.acti...,,home_improvement,Home improvement,681xx,NE,25.73,0,Jul-1992,730,734,0,,,7,0,9773,82.1%,13,w,19193.94,19193.94,2097.3,2097.3,806.06,1291.24,0.0,0.0,0.0,Nov-2019,529.88,Dec-2019,...,4,2,7,0.0,0,0,0,100.0,100.0,0,0,64985,48490,11400,53085,40798.0,605.0,609.0,Feb-2001,1.0,3.0,13.0,80.8,3.0,10.0,1.0,0.0,5.0,N,,,,,,,,,,,,,,,N,,,,,,
143032,152956054,,26575,26575,26575,60 months,15.24%,635.58,C,C2,,< 1 year,RENT,100000.0,Not Verified,Jul-2019,Current,n,https://lendingclub.com/browse/loanDetail.acti...,,debt_consolidation,Debt consolidation,900xx,CA,14.15,0,Jul-2002,715,719,0,35.0,,10,0,34813,49.6%,21,w,25359.78,25359.78,2519.82,2519.82,1215.22,1304.6,0.0,0.0,0.0,Nov-2019,635.58,Dec-2019,...,16,6,10,0.0,0,0,1,95.2,16.7,0,0,83385,35545,62200,13185,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
143033,153350995,,10000,10000,10000,36 months,8.81%,317.12,A,A5,Program Coordinator,10+ years,MORTGAGE,52116.0,Verified,Jul-2019,Current,n,https://lendingclub.com/browse/loanDetail.acti...,,debt_consolidation,Debt consolidation,921xx,CA,22.45,0,Feb-1998,685,689,0,48.0,,14,0,6847,21.1%,30,w,9034.41,9034.41,1261.14,1261.14,965.59,295.55,0.0,0.0,0.0,Nov-2019,317.12,Dec-2019,...,17,6,14,0.0,0,0,3,83.3,0.0,0,0,371448,71652,21000,83759,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
143034,153062400,,20000,20000,20000,60 months,22.50%,558.08,D,D3,Clerk,10+ years,RENT,48000.0,Verified,Jul-2019,Current,n,https://lendingclub.com/browse/loanDetail.acti...,,debt_consolidation,Debt consolidation,463xx,IN,27.83,2,Jun-2009,665,669,1,14.0,,8,0,4960,42.4%,21,w,19246.82,19246.82,2169.82,2169.82,753.18,1416.64,0.0,0.0,0.0,Nov-2019,558.08,Dec-2019,...,7,5,8,0.0,0,2,1,80.0,100.0,0,0,39493,21395,2400,27793,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,


In [39]:
compare_df_columns(df_2007_2011, df_2019q3)

Number of columns: first df: 150, second df: 150
Columns from both dfs are identical: True


**NOTE: The four sampled data files are consistent in format and column layout, so the remaining files are assumed to follow the same layout.**
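
Only a sample of the files is inspected above. One cheap way to extend the check to every file would be to read just the header row of each. The sketch below assumes the same layout (column names on the second line); `read_column_names` is a hypothetical helper, and the in-memory file is a made-up stand-in.

```python
import io

import pandas as pd


def read_column_names(source):
    """Return the column names of one raw file without loading any data rows."""
    # header=1: names are on the second line; nrows=0: stop right after the header.
    return pd.read_csv(source, header=1, nrows=0).columns.tolist()


# Made-up stand-in for a raw file.
fake_file = io.StringIO(
    "Notes offered by Prospectus\n"
    "id,loan_amnt,term\n"
    "1077501,5000,36 months\n"
)
print(read_column_names(fake_file))
```

Because no data rows are parsed, the footer rows never come into play here. Passed a path ending in `.zip`, pandas should infer the compression from the extension, as it does for the reads elsewhere in this notebook.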

[**back to top**](#toc)

<a id = 'agg'></a>
### 2. Aggregate data from all files into one

In [42]:
%%time
# files[0] is .DS_Store, so start at index 1; skipfooter requires the Python engine
df_list = []
for i in range(1, len(files)):
    df = pd.read_csv((dpath+'{}'.format(files[i])), header = 1, skipfooter = 2, engine = 'python')
    df_list.append(df)

CPU times: user 6min 22s, sys: 12.4 s, total: 6min 35s
Wall time: 6min 44s
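
The loop above starts at index 1 because `.DS_Store` happens to sort first. A slightly more defensive alternative is to select files by suffix, so a stray file cannot shift the indices. This is a sketch; `select_data_files` is a hypothetical helper and the listing is a shortened copy of the directory contents shown earlier.

```python
def select_data_files(names, suffix=".csv.zip"):
    """Keep only names ending in `suffix`, sorted for a stable order."""
    return sorted(n for n in names if n.endswith(suffix))


# Shortened directory listing for illustration.
listing = [
    ".DS_Store",
    "LoanStats3b_securev1.csv.zip",
    "LoanStats3a_securev1.csv.zip",
    "LoanStats_securev1_2016Q1.csv.zip",
]
data_files = select_data_files(listing)
print(data_files)
```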


In [45]:
len(df_list)

19

In [46]:
%%time
df = pd.concat(df_list, ignore_index=True)

CPU times: user 14.4 s, sys: 7.34 s, total: 21.7 s
Wall time: 19.5 s


In [47]:
df.shape

(2650518, 150)
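
A quick sanity check after concatenation is to confirm that no rows were lost and to count repeated loan ids. The sketch below uses two tiny made-up frames (ids taken from the outputs above); `summarize_concat` is a hypothetical helper, and using `id` as the key is an assumption.

```python
import pandas as pd


def summarize_concat(frames, combined, key="id"):
    """Compare the combined frame against its parts."""
    return {
        # The combined row count should equal the sum over the parts.
        "rows_match": len(combined) == sum(len(f) for f in frames),
        # Number of rows whose key already appeared earlier in the frame.
        "duplicate_keys": int(combined[key].duplicated().sum()),
    }


# Two tiny stand-ins for per-file DataFrames.
part_a = pd.DataFrame({"id": [1077501, 1077430], "loan_amnt": [5000.0, 2500.0]})
part_b = pd.DataFrame({"id": [10129454, 10149488], "loan_amnt": [12000, 4800]})
combined = pd.concat([part_a, part_b], ignore_index=True)

report = summarize_concat([part_a, part_b], combined)
print(report)
```

In this notebook the call would be `summarize_concat(df_list, df)`.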

In [48]:
df.head()

Unnamed: 0,id,member_id,loan_amnt,funded_amnt,funded_amnt_inv,term,int_rate,installment,grade,sub_grade,emp_title,emp_length,home_ownership,annual_inc,verification_status,issue_d,loan_status,pymnt_plan,url,desc,purpose,title,zip_code,addr_state,dti,delinq_2yrs,earliest_cr_line,fico_range_low,fico_range_high,inq_last_6mths,mths_since_last_delinq,mths_since_last_record,open_acc,pub_rec,revol_bal,revol_util,total_acc,initial_list_status,out_prncp,out_prncp_inv,total_pymnt,total_pymnt_inv,total_rec_prncp,total_rec_int,total_rec_late_fee,recoveries,collection_recovery_fee,last_pymnt_d,last_pymnt_amnt,next_pymnt_d,...,num_rev_accts,num_rev_tl_bal_gt_0,num_sats,num_tl_120dpd_2m,num_tl_30dpd,num_tl_90g_dpd_24m,num_tl_op_past_12m,pct_tl_nvr_dlq,percent_bc_gt_75,pub_rec_bankruptcies,tax_liens,tot_hi_cred_lim,total_bal_ex_mort,total_bc_limit,total_il_high_credit_limit,revol_bal_joint,sec_app_fico_range_low,sec_app_fico_range_high,sec_app_earliest_cr_line,sec_app_inq_last_6mths,sec_app_mort_acc,sec_app_open_acc,sec_app_revol_util,sec_app_open_act_il,sec_app_num_rev_accts,sec_app_chargeoff_within_12_mths,sec_app_collections_12_mths_ex_med,sec_app_mths_since_last_major_derog,hardship_flag,hardship_type,hardship_reason,hardship_status,deferral_term,hardship_amount,hardship_start_date,hardship_end_date,payment_plan_start_date,hardship_length,hardship_dpd,hardship_loan_status,orig_projected_additional_accrued_interest,hardship_payoff_balance_amount,hardship_last_payment_amount,debt_settlement_flag,debt_settlement_flag_date,settlement_status,settlement_date,settlement_amount,settlement_percentage,settlement_term
0,1077501,,5000.0,5000.0,4975.0,36 months,10.65%,162.87,B,B2,,10+ years,RENT,24000.0,Verified,Dec-2011,Fully Paid,n,https://lendingclub.com/browse/loanDetail.acti...,Borrower added on 12/22/11 > I need to upgra...,credit_card,Computer,860xx,AZ,27.65,0.0,Jan-1985,735.0,739.0,1.0,,,3.0,0.0,13648.0,83.7%,9.0,f,0.0,0.0,5863.155187,5833.84,5000.0,863.16,0.0,0.0,0.0,Jan-2015,171.62,,...,,,,,,,,,,0.0,0.0,,,,,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
1,1077430,,2500.0,2500.0,2500.0,60 months,15.27%,59.83,C,C4,Ryder,< 1 year,RENT,30000.0,Source Verified,Dec-2011,Charged Off,n,https://lendingclub.com/browse/loanDetail.acti...,Borrower added on 12/22/11 > I plan to use t...,car,bike,309xx,GA,1.0,0.0,Apr-1999,740.0,744.0,5.0,,,3.0,0.0,1687.0,9.4%,4.0,f,0.0,0.0,1014.53,1014.53,456.46,435.17,0.0,122.9,1.11,Apr-2013,119.66,,...,,,,,,,,,,0.0,0.0,,,,,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
2,1077175,,2400.0,2400.0,2400.0,36 months,15.96%,84.33,C,C5,,10+ years,RENT,12252.0,Not Verified,Dec-2011,Fully Paid,n,https://lendingclub.com/browse/loanDetail.acti...,,small_business,real estate business,606xx,IL,8.72,0.0,Nov-2001,735.0,739.0,2.0,,,2.0,0.0,2956.0,98.5%,10.0,f,0.0,0.0,3005.666844,3005.67,2400.0,605.67,0.0,0.0,0.0,Jun-2014,649.91,,...,,,,,,,,,,0.0,0.0,,,,,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
3,1076863,,10000.0,10000.0,10000.0,36 months,13.49%,339.31,C,C1,AIR RESOURCES BOARD,10+ years,RENT,49200.0,Source Verified,Dec-2011,Fully Paid,n,https://lendingclub.com/browse/loanDetail.acti...,Borrower added on 12/21/11 > to pay for prop...,other,personel,917xx,CA,20.0,0.0,Feb-1996,690.0,694.0,1.0,35.0,,10.0,0.0,5598.0,21%,37.0,f,0.0,0.0,12231.89,12231.89,10000.0,2214.92,16.97,0.0,0.0,Jan-2015,357.48,,...,,,,,,,,,,0.0,0.0,,,,,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
4,1075358,,3000.0,3000.0,3000.0,60 months,12.69%,67.79,B,B5,University Medical Group,1 year,RENT,80000.0,Source Verified,Dec-2011,Fully Paid,n,https://lendingclub.com/browse/loanDetail.acti...,Borrower added on 12/21/11 > I plan on combi...,other,Personal,972xx,OR,17.94,0.0,Jan-1996,695.0,699.0,0.0,38.0,,15.0,0.0,27783.0,53.9%,38.0,f,0.0,0.0,4066.908161,4066.91,3000.0,1066.91,0.0,0.0,0.0,Jan-2017,67.3,,...,,,,,,,,,,0.0,0.0,,,,,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,


In [49]:
df_2007_2011.head()

Unnamed: 0,id,member_id,loan_amnt,funded_amnt,funded_amnt_inv,term,int_rate,installment,grade,sub_grade,emp_title,emp_length,home_ownership,annual_inc,verification_status,issue_d,loan_status,pymnt_plan,url,desc,purpose,title,zip_code,addr_state,dti,delinq_2yrs,earliest_cr_line,fico_range_low,fico_range_high,inq_last_6mths,mths_since_last_delinq,mths_since_last_record,open_acc,pub_rec,revol_bal,revol_util,total_acc,initial_list_status,out_prncp,out_prncp_inv,total_pymnt,total_pymnt_inv,total_rec_prncp,total_rec_int,total_rec_late_fee,recoveries,collection_recovery_fee,last_pymnt_d,last_pymnt_amnt,next_pymnt_d,...,num_rev_accts,num_rev_tl_bal_gt_0,num_sats,num_tl_120dpd_2m,num_tl_30dpd,num_tl_90g_dpd_24m,num_tl_op_past_12m,pct_tl_nvr_dlq,percent_bc_gt_75,pub_rec_bankruptcies,tax_liens,tot_hi_cred_lim,total_bal_ex_mort,total_bc_limit,total_il_high_credit_limit,revol_bal_joint,sec_app_fico_range_low,sec_app_fico_range_high,sec_app_earliest_cr_line,sec_app_inq_last_6mths,sec_app_mort_acc,sec_app_open_acc,sec_app_revol_util,sec_app_open_act_il,sec_app_num_rev_accts,sec_app_chargeoff_within_12_mths,sec_app_collections_12_mths_ex_med,sec_app_mths_since_last_major_derog,hardship_flag,hardship_type,hardship_reason,hardship_status,deferral_term,hardship_amount,hardship_start_date,hardship_end_date,payment_plan_start_date,hardship_length,hardship_dpd,hardship_loan_status,orig_projected_additional_accrued_interest,hardship_payoff_balance_amount,hardship_last_payment_amount,debt_settlement_flag,debt_settlement_flag_date,settlement_status,settlement_date,settlement_amount,settlement_percentage,settlement_term
0,1077501,,5000.0,5000.0,4975.0,36 months,10.65%,162.87,B,B2,,10+ years,RENT,24000.0,Verified,Dec-2011,Fully Paid,n,https://lendingclub.com/browse/loanDetail.acti...,Borrower added on 12/22/11 > I need to upgra...,credit_card,Computer,860xx,AZ,27.65,0.0,Jan-1985,735.0,739.0,1.0,,,3.0,0.0,13648.0,83.7%,9.0,f,0.0,0.0,5863.155187,5833.84,5000.0,863.16,0.0,0.0,0.0,Jan-2015,171.62,,...,,,,,,,,,,0.0,0.0,,,,,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
1,1077430,,2500.0,2500.0,2500.0,60 months,15.27%,59.83,C,C4,Ryder,< 1 year,RENT,30000.0,Source Verified,Dec-2011,Charged Off,n,https://lendingclub.com/browse/loanDetail.acti...,Borrower added on 12/22/11 > I plan to use t...,car,bike,309xx,GA,1.0,0.0,Apr-1999,740.0,744.0,5.0,,,3.0,0.0,1687.0,9.4%,4.0,f,0.0,0.0,1014.53,1014.53,456.46,435.17,0.0,122.9,1.11,Apr-2013,119.66,,...,,,,,,,,,,0.0,0.0,,,,,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
2,1077175,,2400.0,2400.0,2400.0,36 months,15.96%,84.33,C,C5,,10+ years,RENT,12252.0,Not Verified,Dec-2011,Fully Paid,n,https://lendingclub.com/browse/loanDetail.acti...,,small_business,real estate business,606xx,IL,8.72,0.0,Nov-2001,735.0,739.0,2.0,,,2.0,0.0,2956.0,98.5%,10.0,f,0.0,0.0,3005.666844,3005.67,2400.0,605.67,0.0,0.0,0.0,Jun-2014,649.91,,...,,,,,,,,,,0.0,0.0,,,,,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
3,1076863,,10000.0,10000.0,10000.0,36 months,13.49%,339.31,C,C1,AIR RESOURCES BOARD,10+ years,RENT,49200.0,Source Verified,Dec-2011,Fully Paid,n,https://lendingclub.com/browse/loanDetail.acti...,Borrower added on 12/21/11 > to pay for prop...,other,personel,917xx,CA,20.0,0.0,Feb-1996,690.0,694.0,1.0,35.0,,10.0,0.0,5598.0,21%,37.0,f,0.0,0.0,12231.89,12231.89,10000.0,2214.92,16.97,0.0,0.0,Jan-2015,357.48,,...,,,,,,,,,,0.0,0.0,,,,,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
4,1075358,,3000.0,3000.0,3000.0,60 months,12.69%,67.79,B,B5,University Medical Group,1 year,RENT,80000.0,Source Verified,Dec-2011,Fully Paid,n,https://lendingclub.com/browse/loanDetail.acti...,Borrower added on 12/21/11 > I plan on combi...,other,Personal,972xx,OR,17.94,0.0,Jan-1996,695.0,699.0,0.0,38.0,,15.0,0.0,27783.0,53.9%,38.0,f,0.0,0.0,4066.908161,4066.91,3000.0,1066.91,0.0,0.0,0.0,Jan-2017,67.3,,...,,,,,,,,,,0.0,0.0,,,,,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,


In [50]:
df.tail()

Unnamed: 0,id,member_id,loan_amnt,funded_amnt,funded_amnt_inv,term,int_rate,installment,grade,sub_grade,emp_title,emp_length,home_ownership,annual_inc,verification_status,issue_d,loan_status,pymnt_plan,url,desc,purpose,title,zip_code,addr_state,dti,delinq_2yrs,earliest_cr_line,fico_range_low,fico_range_high,inq_last_6mths,mths_since_last_delinq,mths_since_last_record,open_acc,pub_rec,revol_bal,revol_util,total_acc,initial_list_status,out_prncp,out_prncp_inv,total_pymnt,total_pymnt_inv,total_rec_prncp,total_rec_int,total_rec_late_fee,recoveries,collection_recovery_fee,last_pymnt_d,last_pymnt_amnt,next_pymnt_d,...,num_rev_accts,num_rev_tl_bal_gt_0,num_sats,num_tl_120dpd_2m,num_tl_30dpd,num_tl_90g_dpd_24m,num_tl_op_past_12m,pct_tl_nvr_dlq,percent_bc_gt_75,pub_rec_bankruptcies,tax_liens,tot_hi_cred_lim,total_bal_ex_mort,total_bc_limit,total_il_high_credit_limit,revol_bal_joint,sec_app_fico_range_low,sec_app_fico_range_high,sec_app_earliest_cr_line,sec_app_inq_last_6mths,sec_app_mort_acc,sec_app_open_acc,sec_app_revol_util,sec_app_open_act_il,sec_app_num_rev_accts,sec_app_chargeoff_within_12_mths,sec_app_collections_12_mths_ex_med,sec_app_mths_since_last_major_derog,hardship_flag,hardship_type,hardship_reason,hardship_status,deferral_term,hardship_amount,hardship_start_date,hardship_end_date,payment_plan_start_date,hardship_length,hardship_dpd,hardship_loan_status,orig_projected_additional_accrued_interest,hardship_payoff_balance_amount,hardship_last_payment_amount,debt_settlement_flag,debt_settlement_flag_date,settlement_status,settlement_date,settlement_amount,settlement_percentage,settlement_term
2650513,153709478,,23300.0,23300.0,23300.0,60 months,22.50%,650.17,D,D3,Service Representative,10+ years,MORTGAGE,60000.0,Not Verified,Jul-2019,Current,n,https://lendingclub.com/browse/loanDetail.acti...,,debt_consolidation,Debt consolidation,336xx,FL,30.0,0.0,Aug-2000,695.0,699.0,0.0,,,19.0,0.0,24776.0,37.9%,31.0,w,22422.52,22422.52,2571.56,2571.56,877.48,1694.08,0.0,0.0,0.0,Nov-2019,650.17,Dec-2019,...,21.0,4.0,19.0,0.0,0.0,0.0,3.0,96.8,25.0,0.0,0.0,357545.0,50720.0,24700.0,34245.0,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
2650514,153714644,,20000.0,20000.0,20000.0,60 months,20.00%,529.88,D,D2,Teen director,7 years,MORTGAGE,55000.0,Not Verified,Jul-2019,Current,n,https://lendingclub.com/browse/loanDetail.acti...,,home_improvement,Home improvement,681xx,NE,25.73,0.0,Jul-1992,730.0,734.0,0.0,,,7.0,0.0,9773.0,82.1%,13.0,w,19193.94,19193.94,2097.3,2097.3,806.06,1291.24,0.0,0.0,0.0,Nov-2019,529.88,Dec-2019,...,4.0,2.0,7.0,0.0,0.0,0.0,0.0,100.0,100.0,0.0,0.0,64985.0,48490.0,11400.0,53085.0,40798.0,605.0,609.0,Feb-2001,1.0,3.0,13.0,80.8,3.0,10.0,1.0,0.0,5.0,N,,,,,,,,,,,,,,,N,,,,,,
2650515,152956054,,26575.0,26575.0,26575.0,60 months,15.24%,635.58,C,C2,,< 1 year,RENT,100000.0,Not Verified,Jul-2019,Current,n,https://lendingclub.com/browse/loanDetail.acti...,,debt_consolidation,Debt consolidation,900xx,CA,14.15,0.0,Jul-2002,715.0,719.0,0.0,35.0,,10.0,0.0,34813.0,49.6%,21.0,w,25359.78,25359.78,2519.82,2519.82,1215.22,1304.6,0.0,0.0,0.0,Nov-2019,635.58,Dec-2019,...,16.0,6.0,10.0,0.0,0.0,0.0,1.0,95.2,16.7,0.0,0.0,83385.0,35545.0,62200.0,13185.0,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
2650516,153350995,,10000.0,10000.0,10000.0,36 months,8.81%,317.12,A,A5,Program Coordinator,10+ years,MORTGAGE,52116.0,Verified,Jul-2019,Current,n,https://lendingclub.com/browse/loanDetail.acti...,,debt_consolidation,Debt consolidation,921xx,CA,22.45,0.0,Feb-1998,685.0,689.0,0.0,48.0,,14.0,0.0,6847.0,21.1%,30.0,w,9034.41,9034.41,1261.14,1261.14,965.59,295.55,0.0,0.0,0.0,Nov-2019,317.12,Dec-2019,...,17.0,6.0,14.0,0.0,0.0,0.0,3.0,83.3,0.0,0.0,0.0,371448.0,71652.0,21000.0,83759.0,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
2650517,153062400,,20000.0,20000.0,20000.0,60 months,22.50%,558.08,D,D3,Clerk,10+ years,RENT,48000.0,Verified,Jul-2019,Current,n,https://lendingclub.com/browse/loanDetail.acti...,,debt_consolidation,Debt consolidation,463xx,IN,27.83,2.0,Jun-2009,665.0,669.0,1.0,14.0,,8.0,0.0,4960.0,42.4%,21.0,w,19246.82,19246.82,2169.82,2169.82,753.18,1416.64,0.0,0.0,0.0,Nov-2019,558.08,Dec-2019,...,7.0,5.0,8.0,0.0,0.0,2.0,1.0,80.0,100.0,0.0,0.0,39493.0,21395.0,2400.0,27793.0,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,


In [51]:
df_2019q3.tail()

Unnamed: 0,id,member_id,loan_amnt,funded_amnt,funded_amnt_inv,term,int_rate,installment,grade,sub_grade,emp_title,emp_length,home_ownership,annual_inc,verification_status,issue_d,loan_status,pymnt_plan,url,desc,purpose,title,zip_code,addr_state,dti,delinq_2yrs,earliest_cr_line,fico_range_low,fico_range_high,inq_last_6mths,mths_since_last_delinq,mths_since_last_record,open_acc,pub_rec,revol_bal,revol_util,total_acc,initial_list_status,out_prncp,out_prncp_inv,total_pymnt,total_pymnt_inv,total_rec_prncp,total_rec_int,total_rec_late_fee,recoveries,collection_recovery_fee,last_pymnt_d,last_pymnt_amnt,next_pymnt_d,...,num_rev_accts,num_rev_tl_bal_gt_0,num_sats,num_tl_120dpd_2m,num_tl_30dpd,num_tl_90g_dpd_24m,num_tl_op_past_12m,pct_tl_nvr_dlq,percent_bc_gt_75,pub_rec_bankruptcies,tax_liens,tot_hi_cred_lim,total_bal_ex_mort,total_bc_limit,total_il_high_credit_limit,revol_bal_joint,sec_app_fico_range_low,sec_app_fico_range_high,sec_app_earliest_cr_line,sec_app_inq_last_6mths,sec_app_mort_acc,sec_app_open_acc,sec_app_revol_util,sec_app_open_act_il,sec_app_num_rev_accts,sec_app_chargeoff_within_12_mths,sec_app_collections_12_mths_ex_med,sec_app_mths_since_last_major_derog,hardship_flag,hardship_type,hardship_reason,hardship_status,deferral_term,hardship_amount,hardship_start_date,hardship_end_date,payment_plan_start_date,hardship_length,hardship_dpd,hardship_loan_status,orig_projected_additional_accrued_interest,hardship_payoff_balance_amount,hardship_last_payment_amount,debt_settlement_flag,debt_settlement_flag_date,settlement_status,settlement_date,settlement_amount,settlement_percentage,settlement_term
143030,153709478,,23300,23300,23300,60 months,22.50%,650.17,D,D3,Service Representative,10+ years,MORTGAGE,60000.0,Not Verified,Jul-2019,Current,n,https://lendingclub.com/browse/loanDetail.acti...,,debt_consolidation,Debt consolidation,336xx,FL,30.0,0,Aug-2000,695,699,0,,,19,0,24776,37.9%,31,w,22422.52,22422.52,2571.56,2571.56,877.48,1694.08,0.0,0.0,0.0,Nov-2019,650.17,Dec-2019,...,21,4,19,0.0,0,0,3,96.8,25.0,0,0,357545,50720,24700,34245,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
143031,153714644,,20000,20000,20000,60 months,20.00%,529.88,D,D2,Teen director,7 years,MORTGAGE,55000.0,Not Verified,Jul-2019,Current,n,https://lendingclub.com/browse/loanDetail.acti...,,home_improvement,Home improvement,681xx,NE,25.73,0,Jul-1992,730,734,0,,,7,0,9773,82.1%,13,w,19193.94,19193.94,2097.3,2097.3,806.06,1291.24,0.0,0.0,0.0,Nov-2019,529.88,Dec-2019,...,4,2,7,0.0,0,0,0,100.0,100.0,0,0,64985,48490,11400,53085,40798.0,605.0,609.0,Feb-2001,1.0,3.0,13.0,80.8,3.0,10.0,1.0,0.0,5.0,N,,,,,,,,,,,,,,,N,,,,,,
143032,152956054,,26575,26575,26575,60 months,15.24%,635.58,C,C2,,< 1 year,RENT,100000.0,Not Verified,Jul-2019,Current,n,https://lendingclub.com/browse/loanDetail.acti...,,debt_consolidation,Debt consolidation,900xx,CA,14.15,0,Jul-2002,715,719,0,35.0,,10,0,34813,49.6%,21,w,25359.78,25359.78,2519.82,2519.82,1215.22,1304.6,0.0,0.0,0.0,Nov-2019,635.58,Dec-2019,...,16,6,10,0.0,0,0,1,95.2,16.7,0,0,83385,35545,62200,13185,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
143033,153350995,,10000,10000,10000,36 months,8.81%,317.12,A,A5,Program Coordinator,10+ years,MORTGAGE,52116.0,Verified,Jul-2019,Current,n,https://lendingclub.com/browse/loanDetail.acti...,,debt_consolidation,Debt consolidation,921xx,CA,22.45,0,Feb-1998,685,689,0,48.0,,14,0,6847,21.1%,30,w,9034.41,9034.41,1261.14,1261.14,965.59,295.55,0.0,0.0,0.0,Nov-2019,317.12,Dec-2019,...,17,6,14,0.0,0,0,3,83.3,0.0,0,0,371448,71652,21000,83759,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
143034,153062400,,20000,20000,20000,60 months,22.50%,558.08,D,D3,Clerk,10+ years,RENT,48000.0,Verified,Jul-2019,Current,n,https://lendingclub.com/browse/loanDetail.acti...,,debt_consolidation,Debt consolidation,463xx,IN,27.83,2,Jun-2009,665,669,1,14.0,,8,0,4960,42.4%,21,w,19246.82,19246.82,2169.82,2169.82,753.18,1416.64,0.0,0.0,0.0,Nov-2019,558.08,Dec-2019,...,7,5,8,0.0,0,2,1,80.0,100.0,0,0,39493,21395,2400,27793,,,,,,,,,,,,,,N,,,,,,,,,,,,,,,N,,,,,,
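The two `tail()` outputs above read as a spot check: the last rows of the master frame hold the same values as the last rows of the 2019Q3 file, apart from the row index and integer-versus-float formatting. A minimal sketch of automating that comparison follows. The helper name `tails_match` and the tiny demo frames are hypothetical, not part of the original notebook.

```python
import pandas as pd


def tails_match(master, part, n=5):
    """Return True if the last n rows of both frames hold the same values.

    Row labels are ignored (the master frame was re-indexed) and dtypes are
    not compared, so 23300 and 23300.0 count as equal.
    """
    a = master.tail(n).reset_index(drop=True)
    b = part.tail(n).reset_index(drop=True)
    try:
        pd.testing.assert_frame_equal(a, b, check_dtype=False)
    except AssertionError:
        return False
    return True


# Hypothetical stand-ins for df_2019q3 (integers) and df (floats after concat).
part = pd.DataFrame({"id": [153709478, 153714644], "loan_amnt": [23300, 20000]})
first = pd.DataFrame({"id": [1077501], "loan_amnt": [5000.0]})
master = pd.concat([first, part.astype({"loan_amnt": "float64"})],
                   ignore_index=True)

print(tails_match(master, part, n=2))
```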


In [46]:
%%time
df.to_csv('./data/master_loan_datav2019q3.csv', index = False)

CPU times: user 7min 1s, sys: 7.6 s, total: 7min 9s
Wall time: 7min 13s


[**back to top**](#toc)

<a id = 'notes'></a>

## 3. Notes

1. Individual data files:
    - read each file with `pd.read_csv` options header = 1, skipfooter = 2, engine = 'python'
    - columns are aligned across files
    
2. Raw data: > 2.6 million rows (loans), 150 features
    - too large to train on a local machine; consider a cloud service, e.g., AWS
    - may start with sampled subsets on a local machine to explore ideas
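
The notes above can be sketched as one helper: read every file with the same options, stop if any file's columns differ from the first, and optionally keep only a random fraction of rows for local prototyping. This is an illustration under assumptions, not the notebook's actual aggregation code. The function name `load_and_stack`, the `sample_frac` option, and the in-memory demo text are all hypothetical.

```python
import io

import pandas as pd

# Options from the notes: the first line of each raw file is a banner, so the
# real header sits on row 1, and the last two lines are not loan records.
READ_KWARGS = dict(header=1, skipfooter=2, engine="python")


def load_and_stack(sources, sample_frac=None, seed=0):
    """Read each source, verify identical columns, and concatenate.

    `sources` is an iterable of paths or file-like objects accepted by
    pandas.read_csv. `sample_frac` (hypothetical) keeps a random fraction of
    rows from each file.
    """
    frames = []
    expected = None
    for src in sources:
        frame = pd.read_csv(src, **READ_KWARGS)
        cols = list(frame.columns)
        if expected is None:
            expected = cols
        elif cols != expected:
            raise ValueError(f"column mismatch in {src!r}")
        if sample_frac is not None:
            frame = frame.sample(frac=sample_frac, random_state=seed)
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)


# Hypothetical in-memory files that mimic the banner/header/footer layout.
demo_a = "banner line\nid,loan_amnt\n1,5000\n2,2500\nfooter one\nfooter two"
demo_b = "banner line\nid,loan_amnt\n3,2400\nfooter one\nfooter two"
demo = load_and_stack([io.StringIO(demo_a), io.StringIO(demo_b)])
print(demo.shape)
```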
---
**End of current notebook**

**[back to top](#toc)**   

---
**Previous notebook**: **[part 0a - webScrape-dataCollection](proj-classification-loanDefault-p0a-webScrape-dataCollection-max-v2019Dec.ipynb)**

**Next notebook**: **[part 1 - EDA](proj-classification-loanDefault-p1-EDA-max-v2020Jan.ipynb)**