# Exploratory Data Analysis: Loan Payments

The aim of this project is to conduct exploratory data analysis (EDA) on a database of loan payments for a financial institution.

The code block below loads the SQL table used in this analysis into a Pandas dataframe, using the `db_utils.py` script in the root directory of this project.

In [1]:
import db_utils as dbu
import pandas as pd

credentials = dbu.load_yaml('credentials.yaml')
data = dbu.RDSDatabaseConnector(credentials)
data.start_sqlalchemy_engine()
df = data.get_data('loan_payments')  # load the SQL table into a Pandas dataframe
pd.set_option('display.max_columns', 50)  # the table has 43 columns; by default pandas truncates wide dataframes when displaying them
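
The `db_utils` module itself is not shown in this notebook. As a rough illustration of the same idea (reading a whole SQL table into a dataframe), the sketch below uses an in-memory SQLite database as a stand-in for the real connection. The `read_table` helper and the three sample rows are hypothetical and are not taken from `db_utils.py`.

```python
import sqlite3

import pandas as pd


def read_table(connection, table_name):
    """Return the full contents of a SQL table as a pandas DataFrame."""
    # Table names cannot be bound as query parameters, so only pass trusted names.
    return pd.read_sql_query(f'SELECT * FROM "{table_name}"', connection)


# Stand-in for the real database: a tiny in-memory table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan_payments (id INTEGER, loan_amount INTEGER, term TEXT)")
conn.executemany(
    "INSERT INTO loan_payments VALUES (?, ?, ?)",
    [(1, 8000, "36 months"), (2, 13200, "36 months"), (3, 10000, None)],
)
conn.commit()

sample_df = read_table(conn, "loan_payments")
conn.close()
print(sample_df.shape)
```

SQL `NULL` values arrive in the dataframe as missing values, which is why the null counts in `df.info()` reflect gaps in the source table.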

In [2]:
# Raw dataframe info
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 54231 entries, 0 to 54230
Data columns (total 43 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   id                           54231 non-null  int64  
 1   member_id                    54231 non-null  int64  
 2   loan_amount                  54231 non-null  int64  
 3   funded_amount                51224 non-null  float64
 4   funded_amount_inv            54231 non-null  float64
 5   term                         49459 non-null  object 
 6   int_rate                     49062 non-null  float64
 7   instalment                   54231 non-null  float64
 8   grade                        54231 non-null  object 
 9   sub_grade                    54231 non-null  object 
 10  employment_length            52113 non-null  object 
 11  home_ownership               54231 non-null  object 
 12  annual_inc                   54231 non-null  float64
 13  verification_sta

In [3]:
# Raw dataframe sample
df.head(10)

Unnamed: 0,id,member_id,loan_amount,funded_amount,funded_amount_inv,term,int_rate,instalment,grade,sub_grade,employment_length,home_ownership,annual_inc,verification_status,issue_date,loan_status,payment_plan,purpose,dti,delinq_2yrs,earliest_credit_line,inq_last_6mths,mths_since_last_delinq,mths_since_last_record,open_accounts,total_accounts,out_prncp,out_prncp_inv,total_payment,total_payment_inv,total_rec_prncp,total_rec_int,total_rec_late_fee,recoveries,collection_recovery_fee,last_payment_date,last_payment_amount,next_payment_date,last_credit_pull_date,collections_12_mths_ex_med,mths_since_last_major_derog,policy_code,application_type
0,38676116,41461848,8000,8000.0,8000.0,36 months,7.49,248.82,A,A4,5 years,MORTGAGE,46000.0,Not Verified,Jan-2021,Current,n,credit_card,19.54,2,Oct-1987,1,5.0,,12,27,5529.7,5529.7,2982.51,2982.51,2470.3,512.21,0.0,0.0,0.0,Jan-2022,248.82,Feb-2022,Jan-2022,0.0,5.0,1,INDIVIDUAL
1,38656203,41440010,13200,13200.0,13200.0,36 months,6.99,407.52,A,A3,9 years,RENT,50000.0,Not Verified,Jan-2021,Current,n,credit_card,24.2,0,Sep-2001,0,,,15,31,9102.83,9102.83,4885.11,4885.11,4097.17,787.94,0.0,0.0,0.0,Jan-2022,407.52,Feb-2022,Jan-2022,0.0,,1,INDIVIDUAL
2,38656154,41439961,16000,16000.0,16000.0,36 months,7.49,497.63,A,A4,8 years,MORTGAGE,73913.0,Source Verified,Jan-2021,Fully Paid,n,credit_card,16.92,0,Sep-1998,0,69.0,,7,18,0.0,0.0,16824.54,16824.54,16000.0,824.54,0.0,0.0,0.0,Oct-2021,12850.16,,Oct-2021,0.0,,1,INDIVIDUAL
3,38656128,41439934,15000,15000.0,15000.0,36 months,14.31,514.93,C,C4,1 year,RENT,42000.0,Source Verified,Jan-2021,Fully Paid,n,debt_consolidation,35.52,0,Jun-2008,0,74.0,,6,13,0.0,0.0,15947.47,15947.47,15000.0,947.47,0.0,0.0,0.0,Jun-2021,13899.67,,Jun-2021,0.0,,1,INDIVIDUAL
4,38656121,41439927,15000,15000.0,15000.0,36 months,6.03,456.54,A,A1,10+ years,MORTGAGE,145000.0,Verified,Jan-2021,Current,n,debt_consolidation,3.33,0,Apr-2002,1,37.0,,23,50,10297.47,10297.47,5473.46,5473.46,4702.53,770.93,0.0,0.0,0.0,Jan-2022,456.54,Feb-2022,Jan-2022,0.0,,1,INDIVIDUAL
5,38656111,41439917,2525,2525.0,2525.0,36 months,11.44,83.2,B,B4,< 1 year,OWN,32000.0,Source Verified,Jan-2021,Current,n,home_improvement,6.6,1,Mar-2011,0,8.0,,3,4,1842.68,1842.68,913.6,913.6,682.32,231.28,0.0,0.0,0.0,Jan-2022,91.39,Feb-2022,Jan-2022,0.0,,1,INDIVIDUAL
6,38656110,41439916,6675,6675.0,6675.0,,21.99,254.89,E,E5,,RENT,13536.0,Verified,Jan-2021,Fully Paid,n,debt_consolidation,16.13,0,Nov-2006,2,,,3,4,0.0,0.0,6963.53,6963.53,6675.0,288.53,0.0,0.0,0.0,Mar-2021,6724.95,,Mar-2021,0.0,,1,INDIVIDUAL
7,38656067,41439872,26500,26500.0,26200.0,,19.99,701.95,E,E3,< 1 year,RENT,78000.0,Source Verified,Jan-2021,Charged Off,n,debt_consolidation,13.71,0,Mar-2001,0,43.0,,10,37,0.0,0.0,4182.27,4134.92,1197.35,2984.92,0.0,0.0,0.0,Aug-2021,701.95,,Jan-2022,0.0,43.0,1,INDIVIDUAL
8,38656063,41439868,10000,10000.0,10000.0,60 months,12.99,227.48,C,C2,< 1 year,RENT,50048.0,Source Verified,Jan-2021,Current,n,credit_card,20.67,0,Nov-2005,0,,,8,11,8480.91,8480.91,2722.54,2722.54,1519.09,1203.45,0.0,0.0,0.0,Jan-2022,227.48,Feb-2022,Jan-2022,0.0,,1,INDIVIDUAL
9,38656052,41439857,10000,,10000.0,36 months,8.19,314.25,A,A5,10+ years,MORTGAGE,103000.0,Not Verified,Jan-2021,Current,n,credit_card,15.95,0,Feb-2002,1,35.0,,14,35,6934.63,6934.63,3766.45,3766.45,3065.37,701.08,0.0,0.0,0.0,Jan-2022,314.25,Feb-2022,Jan-2022,0.0,,1,INDIVIDUAL


#### Columns with null data
- funded_amount
- term
- int_rate
- employment_length
- mths_since_last_delinq
- mths_since_last_record
- last_payment_date
- next_payment_date
- last_credit_pull_date
- collections_12_mths_ex_med
- mths_since_last_major_derog
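
A list like the one above can be produced programmatically rather than read off `df.info()`. The sketch below is one possible way to do it, using a small made-up dataframe in place of the real one; the `null_summary` helper is a hypothetical name, not part of `db_utils.py`.

```python
import numpy as np
import pandas as pd


def null_summary(frame):
    """Count and percentage of nulls per column, keeping only columns that have any."""
    counts = frame.isna().sum()
    summary = pd.DataFrame({
        "null_count": counts,
        "null_pct": (counts / len(frame) * 100).round(2),
    })
    # Drop fully populated columns and show the worst-affected columns first.
    return summary[summary["null_count"] > 0].sort_values("null_pct", ascending=False)


# Made-up rows that mimic the shape of the loan data.
toy = pd.DataFrame({
    "funded_amount": [8000.0, np.nan, 16000.0, 15000.0],
    "term": ["36 months", None, None, "60 months"],
    "grade": ["A", "A", "C", "B"],
})
report = null_summary(toy)
print(report)
```

Calling `null_summary(df)` on the real dataframe should give the same kind of table, with the percentage column helping to decide whether a column is worth imputing or dropping.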

#### Columns to convert
1. CATEGORICAL DATA
- term: would work as a category, but contains nulls and is probably better stored as a number in case it is needed in calculations later
- grade: category
- sub_grade: category
- employment_length: category, contains nulls
- home_ownership: category
- verification_status: category
- loan_status: category
- payment_plan: category, but 99.9% of values are n and only 1 response is y, so the column is potentially unnecessary
- purpose: category
- application_type: category, but the column is unnecessary since it contains only 1 type of response
2. DATE DATA
- issue_date: period
- earliest_credit_line: period
- last_payment_date: period
- next_payment_date: period
- last_credit_pull_date: period
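
The remarks about `payment_plan` and `application_type` being nearly or fully constant can be checked with `value_counts`. The sketch below is a minimal illustration on made-up data; `dominant_share` is a hypothetical helper name.

```python
import pandas as pd


def dominant_share(series):
    """Share of non-null rows taken by the most common value (1.0 means constant)."""
    shares = series.value_counts(normalize=True)  # sorted from most to least common
    return float(shares.iloc[0]) if not shares.empty else float("nan")


# Made-up columns: one nearly constant, one constant, one evenly spread.
toy = pd.DataFrame({
    "payment_plan": ["n"] * 999 + ["y"],
    "application_type": ["INDIVIDUAL"] * 1000,
    "grade": ["A", "B", "C", "D"] * 250,
})
shares = {col: dominant_share(toy[col]) for col in toy.columns}
print(shares)
```

A share close to 1.0 suggests the column carries little information and is a candidate for removal.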


In [4]:
# Converting categorical data columns from object to category data type
category_columns = ["grade", "sub_grade", "employment_length", "home_ownership", "verification_status", "loan_status", "payment_plan", "purpose", "application_type"]
convert_to_categories = dbu.DataTransform(df, category_columns)
df = convert_to_categories.to_category()
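
The implementation of `DataTransform.to_category` lives in `db_utils.py` and is not shown here. A plain pandas equivalent might look like the sketch below, which is an assumption about the behaviour rather than the actual method; the sample rows are made up.

```python
import pandas as pd


def to_category(frame, columns):
    """Return a copy of frame with the given columns cast to the category dtype."""
    return frame.astype({col: "category" for col in columns})


toy = pd.DataFrame({
    "grade": ["A", "C", "A", "E"],
    "home_ownership": ["MORTGAGE", "RENT", "RENT", "OWN"],
    "loan_amount": [8000, 15000, 16000, 6675],
})
toy = to_category(toy, ["grade", "home_ownership"])
print(toy.dtypes)
```

Columns not named in the list keep their original dtype, and null values in a converted column stay null instead of becoming a category of their own.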

In [5]:
# Converting date columns from object (month-year strings such as "Jan-2021") to datetime64 (day set to the first of the month)
date_columns = ["issue_date", "earliest_credit_line", "last_payment_date", "next_payment_date", "last_credit_pull_date"]
convert_to_datetime = dbu.DataTransform(df, date_columns)
df = convert_to_datetime.to_datetime()
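
As with the category conversion, `DataTransform.to_datetime` is defined in `db_utils.py` and not shown. The sketch below shows how month-year strings such as `Jan-2021` can be parsed with plain pandas, and how the result can also be expressed as a monthly period. The helper name `month_year_to_datetime` is hypothetical, and the sample values are illustrative.

```python
import pandas as pd


def month_year_to_datetime(series):
    """Parse 'Mon-YYYY' strings; nulls become NaT and the day defaults to the 1st."""
    # %b matches abbreviated month names in the current locale (English assumed here).
    return pd.to_datetime(series, format="%b-%Y")


raw = pd.Series(["Jan-2021", "Oct-1987", None])
parsed = month_year_to_datetime(raw)

# Optional: keep only month and year by converting to a monthly period.
as_period = parsed.dt.to_period("M")
print(parsed)
print(as_period)
```

Whether to keep `datetime64` or switch to a period is a design choice: `datetime64` works with most plotting and arithmetic out of the box, while a monthly period makes it explicit that the day component is not real data.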

In [6]:
# Renaming term column to prevent confusion later
df = df.rename(columns={"term": "term_in_months"})

# Removing the " months" suffix from each value
df["term_in_months"] = df["term_in_months"].str.replace(" months", "", regex=False)

In [7]:
# Converting term column from object to a numeric type
# (the result is float64 rather than int while null values remain)
convert_to_int = dbu.DataTransform(df, "term_in_months")
df = convert_to_int.to_int()
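
Because the term column still contains nulls, a plain integer dtype is not possible and pandas falls back to `float64`. If whole-number months are wanted before the nulls are handled, the nullable `Int64` dtype is one option. The sketch below is a possible approach on made-up values, not the `to_int` method from `db_utils.py`; `term_to_months` is a hypothetical helper.

```python
import pandas as pd


def term_to_months(series):
    """Extract the leading number from strings like '36 months' as nullable integers."""
    digits = series.str.extract(r"(\d+)", expand=False)
    # to_numeric yields floats when nulls are present; Int64 stores them as <NA> instead.
    return pd.to_numeric(digits, errors="coerce").astype("Int64")


raw_term = pd.Series(["36 months", None, "60 months"])
months = term_to_months(raw_term)
print(months)
```

This combines the suffix removal and the numeric conversion in one step, and the values stay integers even though one row is missing.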

In [8]:
# Dataframe info post-conversion
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 54231 entries, 0 to 54230
Data columns (total 43 columns):
 #   Column                       Non-Null Count  Dtype         
---  ------                       --------------  -----         
 0   id                           54231 non-null  int64         
 1   member_id                    54231 non-null  int64         
 2   loan_amount                  54231 non-null  int64         
 3   funded_amount                51224 non-null  float64       
 4   funded_amount_inv            54231 non-null  float64       
 5   term_in_months               49459 non-null  float64       
 6   int_rate                     49062 non-null  float64       
 7   instalment                   54231 non-null  float64       
 8   grade                        54231 non-null  category      
 9   sub_grade                    54231 non-null  category      
 10  employment_length            52113 non-null  category      
 11  home_ownership               54231 non-nu

In [9]:
# Dataframe sample post-conversion
df.head(10)

Unnamed: 0,id,member_id,loan_amount,funded_amount,funded_amount_inv,term_in_months,int_rate,instalment,grade,sub_grade,employment_length,home_ownership,annual_inc,verification_status,issue_date,loan_status,payment_plan,purpose,dti,delinq_2yrs,earliest_credit_line,inq_last_6mths,mths_since_last_delinq,mths_since_last_record,open_accounts,total_accounts,out_prncp,out_prncp_inv,total_payment,total_payment_inv,total_rec_prncp,total_rec_int,total_rec_late_fee,recoveries,collection_recovery_fee,last_payment_date,last_payment_amount,next_payment_date,last_credit_pull_date,collections_12_mths_ex_med,mths_since_last_major_derog,policy_code,application_type
0,38676116,41461848,8000,8000.0,8000.0,36.0,7.49,248.82,A,A4,5 years,MORTGAGE,46000.0,Not Verified,2021-01-01,Current,n,credit_card,19.54,2,1987-10-01,1,5.0,,12,27,5529.7,5529.7,2982.51,2982.51,2470.3,512.21,0.0,0.0,0.0,2022-01-01,248.82,2022-02-01,2022-01-01,0.0,5.0,1,INDIVIDUAL
1,38656203,41440010,13200,13200.0,13200.0,36.0,6.99,407.52,A,A3,9 years,RENT,50000.0,Not Verified,2021-01-01,Current,n,credit_card,24.2,0,2001-09-01,0,,,15,31,9102.83,9102.83,4885.11,4885.11,4097.17,787.94,0.0,0.0,0.0,2022-01-01,407.52,2022-02-01,2022-01-01,0.0,,1,INDIVIDUAL
2,38656154,41439961,16000,16000.0,16000.0,36.0,7.49,497.63,A,A4,8 years,MORTGAGE,73913.0,Source Verified,2021-01-01,Fully Paid,n,credit_card,16.92,0,1998-09-01,0,69.0,,7,18,0.0,0.0,16824.54,16824.54,16000.0,824.54,0.0,0.0,0.0,2021-10-01,12850.16,NaT,2021-10-01,0.0,,1,INDIVIDUAL
3,38656128,41439934,15000,15000.0,15000.0,36.0,14.31,514.93,C,C4,1 year,RENT,42000.0,Source Verified,2021-01-01,Fully Paid,n,debt_consolidation,35.52,0,2008-06-01,0,74.0,,6,13,0.0,0.0,15947.47,15947.47,15000.0,947.47,0.0,0.0,0.0,2021-06-01,13899.67,NaT,2021-06-01,0.0,,1,INDIVIDUAL
4,38656121,41439927,15000,15000.0,15000.0,36.0,6.03,456.54,A,A1,10+ years,MORTGAGE,145000.0,Verified,2021-01-01,Current,n,debt_consolidation,3.33,0,2002-04-01,1,37.0,,23,50,10297.47,10297.47,5473.46,5473.46,4702.53,770.93,0.0,0.0,0.0,2022-01-01,456.54,2022-02-01,2022-01-01,0.0,,1,INDIVIDUAL
5,38656111,41439917,2525,2525.0,2525.0,36.0,11.44,83.2,B,B4,< 1 year,OWN,32000.0,Source Verified,2021-01-01,Current,n,home_improvement,6.6,1,2011-03-01,0,8.0,,3,4,1842.68,1842.68,913.6,913.6,682.32,231.28,0.0,0.0,0.0,2022-01-01,91.39,2022-02-01,2022-01-01,0.0,,1,INDIVIDUAL
6,38656110,41439916,6675,6675.0,6675.0,,21.99,254.89,E,E5,,RENT,13536.0,Verified,2021-01-01,Fully Paid,n,debt_consolidation,16.13,0,2006-11-01,2,,,3,4,0.0,0.0,6963.53,6963.53,6675.0,288.53,0.0,0.0,0.0,2021-03-01,6724.95,NaT,2021-03-01,0.0,,1,INDIVIDUAL
7,38656067,41439872,26500,26500.0,26200.0,,19.99,701.95,E,E3,< 1 year,RENT,78000.0,Source Verified,2021-01-01,Charged Off,n,debt_consolidation,13.71,0,2001-03-01,0,43.0,,10,37,0.0,0.0,4182.27,4134.92,1197.35,2984.92,0.0,0.0,0.0,2021-08-01,701.95,NaT,2022-01-01,0.0,43.0,1,INDIVIDUAL
8,38656063,41439868,10000,10000.0,10000.0,60.0,12.99,227.48,C,C2,< 1 year,RENT,50048.0,Source Verified,2021-01-01,Current,n,credit_card,20.67,0,2005-11-01,0,,,8,11,8480.91,8480.91,2722.54,2722.54,1519.09,1203.45,0.0,0.0,0.0,2022-01-01,227.48,2022-02-01,2022-01-01,0.0,,1,INDIVIDUAL
9,38656052,41439857,10000,,10000.0,36.0,8.19,314.25,A,A5,10+ years,MORTGAGE,103000.0,Not Verified,2021-01-01,Current,n,credit_card,15.95,0,2002-02-01,1,35.0,,14,35,6934.63,6934.63,3766.45,3766.45,3065.37,701.08,0.0,0.0,0.0,2022-01-01,314.25,2022-02-01,2022-01-01,0.0,,1,INDIVIDUAL
