# Effects of Prosper Loan Characteristics on Their Outcomes
## by Samuel Aderemi

## Investigation Overview


> In this investigation, I wanted to look at the factors that affect the loan status of Prosper loans. The main focus was on the borrower rate and the rating systems (CreditGrade and ProsperRating).


## Dataset Overview

> The data consisted of loan statuses and attributes of 113,937 listed loans. The attributes included stated monthly income, employment duration, and the listing category of each loan. Eighty-one data points were removed from the analysis because they were duplicate records.

In [None]:
# import all packages and set plots to be embedded inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

# suppress warnings from final output
import warnings
warnings.simplefilter("ignore")

In [None]:
# load in the dataset into a pandas dataframe
loans = pd.read_csv('prosperLoanData.csv')
loans_sub = loans.copy()
loans_sub = loans_sub[["ListingKey", "LenderYield", "EmploymentStatus", "ListingCategory (numeric)", "StatedMonthlyIncome", "AmountDelinquent", "IncomeRange", "TotalProsperLoans", "EmploymentStatusDuration", "LoanStatus", "CreditGrade", "Term", "BorrowerRate", 'ProsperRating (Alpha)']]
loans_sub = loans_sub[~(loans_sub.ListingKey.duplicated())]
loans_sub.ListingKey.duplicated().sum()
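
The de-duplication step above can be tried on a small table. This is a minimal sketch on made-up data; the frame `toy` and its values are hypothetical. It uses `drop_duplicates`, which keeps the first row for each key, the same rows the boolean mask on `duplicated()` keeps.

```python
import pandas as pd

# Toy stand-in for the loan table; the values are made up for illustration.
toy = pd.DataFrame({
    "ListingKey": ["a1", "b2", "a1", "c3"],
    "BorrowerRate": [0.10, 0.20, 0.10, 0.30],
})

# Keep the first row for each ListingKey and drop later repeats.
deduped = toy.drop_duplicates(subset="ListingKey", keep="first")

# Count how many rows were dropped and confirm no duplicate keys remain.
n_removed = len(toy) - len(deduped)
remaining_duplicates = int(deduped["ListingKey"].duplicated().sum())
print(n_removed, remaining_duplicates)
```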

In [None]:
def rename_listing_cat_values(row):
    """Return the descriptive listing category for a single loan (one row)."""

    values_dict = {0:'Not Available', 1:'Debt Consolidation', 2:'Home Improvement', 3:'Business',
                    4:'Personal Loan', 5:'Student Use', 6:'Auto', 7:'Other',
                    8:'Baby&Adoption', 9:'Boat', 10:'Cosmetic Procedure', 11:'Engagement Ring',
                    12:'Green Loans', 13:'Household Expenses', 14:'Large Purchases', 15:'Medical/Dental',
                    16:'Motorcycle', 17:'RV', 18:'Taxes', 19:'Vacation', 20:'Wedding Loans'}

    # look up the numeric code directly; unknown codes return None
    return values_dict.get(row['ListingCategory (numeric)'])


# apply the lookup row by row to build the descriptive column
loans_sub['ListingCategory'] = loans_sub.apply(rename_listing_cat_values, axis=1)
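
A vectorized alternative to the row-wise lookup is `Series.map`, which translates a whole column through a dictionary at once. The sketch below is an illustration on made-up data, not the notebook's own method; `labels` is an abbreviated, hypothetical subset of the dictionary above and the code `99` is invented to show what happens to unknown codes.

```python
import pandas as pd

# Abbreviated, hypothetical subset of the code-to-label dictionary.
labels = {0: "Not Available", 1: "Debt Consolidation", 2: "Home Improvement"}

# Made-up numeric codes; 99 is deliberately absent from the dictionary.
toy = pd.DataFrame({"ListingCategory (numeric)": [1, 0, 2, 1, 99]})

# Series.map looks up every value in the dictionary in one pass;
# codes missing from the dictionary become NaN.
toy["ListingCategory"] = toy["ListingCategory (numeric)"].map(labels)

print(toy)
```

On a table with over a hundred thousand rows, this form usually runs much faster than `apply(..., axis=1)` because it avoids building a Series object for every row.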

In [None]:
# convert ordinal variables to ordered categorical types
ordinal_var_dict = {'IncomeRange': ['Not employed', 'Not displayed', '$0', '$1-24,999', '$25,000-49,999', '$50,000-74,999', '$75,000-99,999', '$100,000+'],
                    'ProsperRating (Alpha)': ['AA', 'A', 'B', 'C', 'D', 'E', 'HR'],
                    'CreditGrade': ['AA', 'A', 'B', 'C', 'D', 'E', 'HR', 'NC']
                    }

for var in ordinal_var_dict:
    ordered_var = pd.api.types.CategoricalDtype(ordered=True, categories=ordinal_var_dict[var])

    loans_sub[var] = loans_sub[var].astype(ordered_var)

loans_sub.EmploymentStatus = loans_sub.EmploymentStatus.astype('category')
loans_sub.ListingCategory = loans_sub.ListingCategory.astype('category')
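
To see what the ordered categorical type provides, the sketch below applies the same `CategoricalDtype` construction to a handful of made-up ratings. The series `ratings` is hypothetical; the category order is the one used above for `ProsperRating (Alpha)`.

```python
import pandas as pd

# Same rating order as above, from lowest risk (AA) to highest risk (HR).
rating_order = ["AA", "A", "B", "C", "D", "E", "HR"]
rating_dtype = pd.api.types.CategoricalDtype(categories=rating_order, ordered=True)

# Made-up ratings for illustration.
ratings = pd.Series(["C", "AA", "HR", "B"]).astype(rating_dtype)

# Sorting follows the declared category order, not alphabetical order.
sorted_ratings = ratings.sort_values().tolist()

# Comparisons against a category are defined for ordered categoricals.
riskier_than_b = (ratings > "B").tolist()

print(sorted_ratings)
print(riskier_than_b)
```

Plotting libraries such as seaborn also pick up this order, so the bars in the rating plots appear from `AA` to `HR` without extra sorting.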

## Distribution of Monthly Income

> Monthly income in the dataset takes on a very large range of values, from about \$0 at the lowest to about \$1.75M at the highest. Plotted on a logarithmic scale, the distribution of monthly income is roughly normal.

In [None]:
log_binsize = 0.05
# bin edges are powers of 10, so the upper limit must use the base-10 logarithm
bins = 10 ** np.arange(0, np.log10(loans_sub['StatedMonthlyIncome'].max())+log_binsize, log_binsize)

plt.figure(figsize=[8,5])
plt.hist(data=loans_sub, x='StatedMonthlyIncome', bins=bins)
plt.xscale('log')
plt.xlim([80, loans_sub['StatedMonthlyIncome'].max()+log_binsize])
plt.xticks([100, 200, 500, 1e3, 2e3, 5e3, 1e4, 2e4, 5e4, 1e5, 2e5, 5e5, 1e6, 2e6], [100, 200, 500, '1K', '2K', '5K', '10K', '20K', '50K', '100K', '200K', '500K', '1M', '2M'])
plt.xlabel('Monthly Income ($)')
plt.ylabel('Count')
plt.title('Distribution of Monthly Income')
plt.show()
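
The log-spaced bin edges can be checked on their own. This is a minimal sketch, assuming a maximum income of 1,750,000 (roughly the figure quoted above; the real value comes from the data). It uses `np.log10` so that the exponents match the base-10 powers used for the edges.

```python
import numpy as np

log_binsize = 0.05
max_income = 1_750_000  # assumed maximum, roughly the figure quoted above

# Exponents run from 0 (i.e. $1) up to just past log10(max_income).
exponents = np.arange(0, np.log10(max_income) + log_binsize, log_binsize)
bins = 10 ** exponents

# Adjacent edges share a constant ratio, so bars have equal width on a log axis.
ratios = bins[1:] / bins[:-1]

print(len(bins), bins[0], bins[-1])
```

Because every edge is the previous edge times `10 ** log_binsize`, the histogram bars look evenly spaced once `plt.xscale('log')` is applied.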

## Loan Status vs. Borrower Rate

> Loans that are paid on time or on track for completion have borrower rates below 0.20 on average, while loans in default have rates above 0.20.

In [None]:
plt.figure(figsize=[12,8])
sns.barplot(data=loans_sub, x='BorrowerRate', y='LoanStatus', color=sns.color_palette()[5])
plt.title('Loan Status vs. Borrower Rate');
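
By default, `sns.barplot` draws the mean of the numeric variable for each category, so the bar lengths can be cross-checked with a plain `groupby`. The sketch below does this on made-up rates; the frame `toy` and the status values in it are hypothetical.

```python
import pandas as pd

# Made-up rates for three loan statuses, for illustration only.
toy = pd.DataFrame({
    "LoanStatus": ["Completed", "Completed", "Current", "Defaulted", "Defaulted"],
    "BorrowerRate": [0.10, 0.14, 0.18, 0.22, 0.26],
})

# One mean per status: these are the values a bar chart of means would draw.
mean_rates = toy.groupby("LoanStatus")["BorrowerRate"].mean()

# Flag the statuses whose mean rate exceeds the 0.20 threshold discussed above.
above_threshold = mean_rates[mean_rates > 0.20].index.tolist()

print(mean_rates)
print(above_threshold)
```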

## Borrower Rate vs. Prosper Rating

> There is a steady decline in borrower rate as the level of classified risk decreases. Loans with minimal risk have rates slightly above 0.15, intermediate-risk loans slightly above 0.25, and the riskiest loans above 0.25.


In [None]:
plt.figure(figsize=[12,8])
sns.barplot(data=loans_sub, x='ProsperRating (Alpha)', y='BorrowerRate', palette=sns.color_palette('viridis', 9))
plt.title('Borrower Rate vs. Prosper Rating');

## Mean Borrower Rates for Loan Status and Credit Grade

> Restricting the data to valid Credit Grade values and averaging Borrower Rate by Loan Status and Credit Grade reveals a clear relationship among the three variables. Completed loans show the lowest mean borrower rates across all risk classifications, with the minimum at the lowest-risk grades, although the rate still rises with the rated risk. The other loan statuses, Charged Off and Defaulted, show higher mean borrower rates, which likewise increase with rated risk.



In [None]:
credit_grade_df = loans_sub.loc[loans_sub['LoanStatus'].isin(['Cancelled', 'Chargedoff', 'Completed', 'Defaulted'])]
# select BorrowerRate before averaging so non-numeric columns are never aggregated
borrow_rate_means = credit_grade_df.groupby(['LoanStatus', 'CreditGrade'])['BorrowerRate'].mean()
borrow_rate_means = borrow_rate_means.reset_index(name = 'BorrowRate_avg')
borrow_rate_means = borrow_rate_means.pivot(index = 'LoanStatus', columns = 'CreditGrade',
                            values = 'BorrowRate_avg')
sns.heatmap(borrow_rate_means, annot = True, fmt = '.3f',
           cbar_kws = {'label' : 'mean(borrower_rate_avg)'})

plt.title('Matrix of Loan Status & Credit Grade for Mean Borrower Rate');
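
The group-then-pivot steps above can also be written as a single `pivot_table` call. The sketch below shows this on made-up data as an alternative formulation, not as the notebook's own method; the frame `toy` and its values are hypothetical.

```python
import pandas as pd

# Made-up loans covering two statuses and two credit grades.
toy = pd.DataFrame({
    "LoanStatus": ["Completed", "Completed", "Defaulted", "Defaulted", "Completed"],
    "CreditGrade": ["AA", "AA", "AA", "HR", "HR"],
    "BorrowerRate": [0.08, 0.10, 0.12, 0.30, 0.26],
})

# Rows are loan statuses, columns are credit grades, cells are mean rates.
rate_matrix = toy.pivot_table(index="LoanStatus", columns="CreditGrade",
                              values="BorrowerRate", aggfunc="mean")

print(rate_matrix)
```

The resulting frame has the same layout as the one passed to `sns.heatmap` above, with one row per loan status and one column per credit grade.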

In [None]:
!jupyter nbconvert Part_II_slide_deck_template.ipynb --to slides --post serve --no-input --no-prompt