# Loan Data Exploration
This document explores a dataset of 113,937 loans with 81 variables per loan, including loan amount, borrower rate (interest rate), current loan status, borrower income, and many others. The dataset comes from the peer-to-peer lending platform Prosper.


### Question to ask.
What are the characteristics of a defaulter?

In [13]:
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import datetime as dt

In [None]:
df_total = pd.read_csv("prosperLoanData.csv")

In [None]:
pd.set_option("display.max_columns", len(df_total.columns))
df_total.head()

## Overview of Data 

In [None]:
df_total.shape

In [None]:
df_total.info()

#### The dataset has 81 columns. It is not practical to evaluate all of them at once, so we will first keep only the variables that are likely to help our analysis. Most of the variables are numeric.
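
One way to back up the claim that most variables are numeric is to count columns by dtype. The sketch below runs on a small made-up frame (`toy`) rather than on `df_total`; the column names are borrowed from the dataset, but the values are invented for illustration.

```python
import pandas as pd

# Toy stand-in for df_total; values are made up for illustration.
toy = pd.DataFrame({
    "Term": [36, 60, 36],
    "BorrowerRate": [0.158, 0.092, 0.275],
    "LoanStatus": ["Completed", "Current", "Chargedoff"],
    "IsBorrowerHomeowner": [True, False, True],
})

# Count how many columns fall under each dtype.
dtype_counts = toy.dtypes.value_counts()
print(dtype_counts)

# Names of the numeric columns only (bool columns are not matched by "number").
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
print(numeric_cols)
```

Running the same two calls on `df_total` should give a quick view of how the 81 columns split between numeric, boolean, and text types.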

In [10]:
#Columns to be discarded 
discarded_vars = ["MemberKey","LP_CustomerPayments","LP_CustomerPrincipalPayments","LP_InterestandFees",
                  "LP_ServiceFees","LP_CollectionFees","LP_GrossPrincipalLoss","LP_NetPrincipalLoss",
                  "LP_NonPrincipalRecoverypayments","LoanOriginationQuarter","LoanOriginationDate",
                  "LoanNumber","LoanMonthsSinceOrigination","LoanFirstDefaultedCycleNumber",
                  "LoanCurrentDaysDelinquent","ScorexChangeAtTimeOfListing",
                 "LoanKey","ListingKey","ListingNumber","ListingCreationDate",
                 "CreditGrade","AmountDelinquent","BorrowerState","ClosedDate","CurrentDelinquencies",
                  "GroupKey","TotalTrades","TradesNeverDelinquent (percentage)","DateCreditPulled",
                 "DelinquenciesLast7Years","EstimatedEffectiveYield","EstimatedLoss","EstimatedReturn",
                 "FirstRecordedCreditLine","OnTimeProsperPayments","OpenCreditLines","TradesOpenedLast6Months",
                 "TotalInquiries"]

In [11]:
# Drop the unused columns. errors="ignore" keeps the cell safe to re-run:
# without it, a second run raises KeyError because the columns are already gone.
df_total = df_total.drop(columns=discarded_vars, errors="ignore")

In [12]:
df_total.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 113937 entries, 0 to 113936
Data columns (total 42 columns):
Term                                   113937 non-null int64
LoanStatus                             113937 non-null object
BorrowerAPR                            113937 non-null float64
BorrowerRate                           113937 non-null float64
LenderYield                            113937 non-null float64
ProsperRating (numeric)                84853 non-null float64
ProsperRating (Alpha)                  84853 non-null object
ProsperScore                           84853 non-null float64
ListingCategory (numeric)              113937 non-null int64
Occupation                             113937 non-null object
EmploymentStatus                       113937 non-null object
EmploymentStatusDuration               113937 non-null float64
IsBorrowerHomeowner                    113937 non-null bool
CurrentlyInGroup                       113937 non-null bool
CreditScoreRangeLower   

## Data Wrangling

In [27]:
#ListingCategory (numeric)

In [10]:
# Removing PercentFunded not 1 
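
The cell above only names the step. A minimal sketch of such a filter might look like the following, assuming the intent is to keep rows where `PercentFunded` equals exactly 1. The frame `toy` and its values are hypothetical and stand in for `df_total`.

```python
import pandas as pd

# Hypothetical sample; in the notebook this would be df_total.
toy = pd.DataFrame({
    "LoanOriginalAmount": [4000, 10000, 2500],
    "PercentFunded": [1.0, 0.85, 1.0],
})

# Keep only fully funded listings; .copy() gives an independent frame
# so later assignments do not trigger chained-assignment warnings.
fully_funded = toy[toy["PercentFunded"] == 1].copy()
print(len(toy) - len(fully_funded), "rows removed")
```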

### Filling Missing Values

In [9]:
# Fill value per column: 0 for numeric fields, a placeholder label for text fields.
fill_values = {
    "AvailableBankcardCredit": 0,
    "BankcardUtilization": 0,
    "BorrowerAPR": 0,
    "CurrentCreditLines": 0,
    "EmploymentStatus": "Not available",
    "EmploymentStatusDuration": 0,
    "InquiriesLast6Months": 0,
    "Occupation": "Other",
    "ProsperPaymentsLessThanOneMonthLate": 0,
    "ProsperPaymentsOneMonthPlusLate": 0,
    "TotalProsperLoans": 0,
    "RevolvingCreditBalance": 0,
    "ProsperPrincipalBorrowed": 0,
    "ProsperPrincipalOutstanding": 0,
}
# Only fill columns that still exist in df_total. Attribute access on a missing
# column (e.g. df_total.ProsperPrincipalOutstanding) raises AttributeError.
fill_values = {col: val for col, val in fill_values.items() if col in df_total.columns}
df_total = df_total.fillna(value=fill_values)

In [15]:
df_total.DebtToIncomeRatio.isna().sum()

8554
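
As a quick sanity check, the count above can be expressed as a share of all rows. The sketch below uses only the two figures reported in this notebook (8,554 missing values, 113,937 loans); the variable names are illustrative.

```python
# Figures taken from the notebook outputs.
missing = 8554
total = 113937

# Share of loans with no DebtToIncomeRatio.
share = missing / total
print(f"{share:.6f}")  # 0.075077
```

The result agrees with the `DebtToIncomeRatio` entry in the missing-value table later in this notebook.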

In [165]:
def inspect(df,col):
    """Print the number of missing values in df[col] and its first five rows."""
    print(df[col].isna().sum())
    print(df[col].head())

In [218]:
inspect(df,"ProsperPrincipalOutstanding")

91852
0       NaN
1       NaN
2       NaN
3       NaN
4    9947.9
Name: ProsperPrincipalOutstanding, dtype: float64


### Checking the Fraction of Missing Values

In [22]:
# Fraction of missing values in each of the first 40 columns
df_total.isnull().mean().iloc[:40]

ListingKey                     0.000000
ListingNumber                  0.000000
ListingCreationDate            0.000000
CreditGrade                    0.745886
Term                           0.000000
LoanStatus                     0.000000
ClosedDate                     0.516496
BorrowerAPR                    0.000219
BorrowerRate                   0.000000
LenderYield                    0.000000
EstimatedEffectiveYield        0.255264
EstimatedLoss                  0.255264
EstimatedReturn                0.255264
ProsperRating (numeric)        0.255264
ProsperRating (Alpha)          0.255264
ProsperScore                   0.255264
ListingCategory (numeric)      0.000000
BorrowerState                  0.048404
Occupation                     0.031491
EmploymentStatus               0.019792
EmploymentStatusDuration       0.066923
IsBorrowerHomeowner            0.000000
CurrentlyInGroup               0.000000
GroupKey                       0.882909
DateCreditPulled               0.000000


In [23]:
# Fraction of missing values in the remaining columns
df_total.isnull().mean().iloc[40:]

RevolvingCreditBalance                 0.066739
BankcardUtilization                    0.066739
AvailableBankcardCredit                0.066212
TotalTrades                            0.066212
TradesNeverDelinquent (percentage)     0.066212
TradesOpenedLast6Months                0.066212
DebtToIncomeRatio                      0.075077
IncomeRange                            0.000000
IncomeVerifiable                       0.000000
StatedMonthlyIncome                    0.000000
LoanKey                                0.000000
TotalProsperLoans                      0.806165
TotalProsperPaymentsBilled             0.806165
OnTimeProsperPayments                  0.806165
ProsperPaymentsLessThanOneMonthLate    0.806165
ProsperPaymentsOneMonthPlusLate        0.806165
ProsperPrincipalBorrowed               0.806165
ProsperPrincipalOutstanding            0.806165
ScorexChangeAtTimeOfListing            0.833873
LoanCurrentDaysDelinquent              0.000000
LoanFirstDefaultedCycleNumber          0
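
The two slices above can also be read as one ranked list. The sketch below is one possible way to do that, shown on a small made-up frame (`toy`) instead of `df_total`. The 0.5 cutoff is a hypothetical threshold chosen for illustration, not a value used elsewhere in this notebook.

```python
import numpy as np
import pandas as pd

# Made-up sample with different amounts of missing data per column.
toy = pd.DataFrame({
    "CreditGrade": [np.nan, np.nan, np.nan, "A"],
    "BorrowerAPR": [0.12, 0.20, np.nan, 0.30],
    "Term": [36, 36, 60, 36],
})

# isna().mean() gives the fraction of missing values per column directly;
# sorting puts the emptiest columns first.
missing_frac = toy.isna().mean().sort_values(ascending=False)
print(missing_frac)

# Hypothetical cutoff: list columns that are more than half empty.
mostly_empty = missing_frac[missing_frac > 0.5].index.tolist()
print(mostly_empty)
```

Applied to `df_total`, a list like `mostly_empty` could serve as a starting point when deciding which sparse columns to drop or to fill.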