<a href="https://colab.research.google.com/github/bthodla/danano/blob/master/prj5/exploration_template.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Prosper Loan Data
## by Bhasker Thodla

## Preliminary Wrangling

> The dataset includes about 114,000 loan records from Prosper Loans, a company founded in 2005 to facilitate peer-to-peer lending in the US. The loan data is obfuscated to hide the identities of both borrowers and lenders, and it contains no personally identifiable information (PII), which protects the privacy of the participants.

> The loan data covers the period from Nov 2005 to Mar 2014 (by loan origination date).
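
The date range can be checked directly from the `LoanOriginationDate` column. The sketch below is only an illustration: the helper name `origination_date_range` is hypothetical, and the sample rows are made-up values standing in for the real dataframe.

```python
import pandas as pd


def origination_date_range(df, date_col="LoanOriginationDate"):
    """Return the earliest and latest origination timestamps in df."""
    # errors="coerce" turns unparseable or missing entries into NaT
    # instead of raising; min() and max() then skip those NaT values.
    dates = pd.to_datetime(df[date_col], errors="coerce")
    return dates.min(), dates.max()


# Made-up sample rows (assumption); in the notebook, pass the loaded dataframe.
sample = pd.DataFrame({
    "LoanOriginationDate": [
        "2010-06-01 00:00:00",
        "2006-02-01 00:00:00",
        None,
        "2013-09-30 00:00:00",
    ]
})

first, last = origination_date_range(sample)
print("Earliest origination:", first.date())
print("Latest origination:", last.date())
```

In the notebook itself, the same call would presumably be made as `origination_date_range(pld_full_df)` once the CSV has been loaded.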

> The data is mostly clean, although several fields have missing values.
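
One way to quantify the missing values is to count the nulls per column and express them as a share of all rows. This is a minimal sketch, not part of the original analysis: `missing_value_summary` is a hypothetical helper, and the small sample frame uses invented values with column names borrowed from the dataset.

```python
import numpy as np
import pandas as pd


def missing_value_summary(df):
    """Per-column count and percentage of missing values, largest first."""
    counts = df.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "pct_missing": (counts / len(df) * 100).round(2),
    })
    # Keep only the columns that actually contain gaps
    summary = summary[summary["missing"] > 0]
    return summary.sort_values("missing", ascending=False)


# Invented sample values (assumption); the real input would be the loan dataframe.
sample = pd.DataFrame({
    "BorrowerAPR": [0.21, np.nan, 0.16, 0.28],
    "ProsperRating": [4.0, np.nan, np.nan, 6.0],
    "Term": [36, 36, 60, 12],
})

print(missing_value_summary(sample))
```

Columns without gaps (here `Term`) are dropped from the summary, so the result lists only the fields that may need attention before plotting.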

> No data wrangling has been done on this data.



In [0]:
# import all packages and set plots to be embedded inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
from google.colab import drive
%matplotlib inline

In [7]:
# Load the Prosper loan data saved in Google Drive
# (drive is already imported in the first cell, so it is not imported again here)

drive.mount('/content/gdrive')
pld_full_df = pd.read_csv('/content/gdrive/My Drive/Colab Notebooks/data_visualization/prj5/prosperLoanData.csv')

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


In [0]:
# Some cleanup
# We will begin with renaming some columns

pld_full_df = pld_full_df.rename(columns={
    'ProsperRating (numeric)': 'ProsperRating',
    'ProsperRating (Alpha)': 'ProsperRatingStr',
    'ListingCategory (numeric)': 'ListingCategory',
    'TradesNeverDelinquent (percentage)': 'TradesNeverDelinquentPct',
})

# We will then pick a subset of columns for analysis and exclude the rest by creating a new dataframe
sel_columns = [
'ListingKey',
'ListingCreationDate',
'CreditGrade',
'Term',
'LoanStatus',
'BorrowerAPR',
'BorrowerRate',
'LenderYield',
'EstimatedEffectiveYield',
'EstimatedLoss',
'EstimatedReturn',
'ProsperRating',
'ProsperRatingStr',
'ProsperScore',
'ListingCategory',
'BorrowerState',
'Occupation',
'EmploymentStatus',
'EmploymentStatusDuration',
'IsBorrowerHomeowner',
'CreditScoreRangeLower',
'CreditScoreRangeUpper',
'FirstRecordedCreditLine',
'CurrentCreditLines',
'OpenCreditLines',
'TotalCreditLinespast7years',
'OpenRevolvingAccounts',
'OpenRevolvingMonthlyPayment',
'InquiriesLast6Months',
'TotalInquiries',
'CurrentDelinquencies',
'AmountDelinquent',
'DelinquenciesLast7Years',
'PublicRecordsLast10Years',
'PublicRecordsLast12Months',
'RevolvingCreditBalance',
'BankcardUtilization',
'AvailableBankcardCredit',
'TotalTrades',
'TradesNeverDelinquentPct',
'TradesOpenedLast6Months',
'DebtToIncomeRatio',
'IncomeRange',
'StatedMonthlyIncome',
'TotalProsperLoans',
'TotalProsperPaymentsBilled',
'OnTimeProsperPayments',
'ProsperPaymentsLessThanOneMonthLate',
'ProsperPaymentsOneMonthPlusLate',
'ProsperPrincipalBorrowed',
'ProsperPrincipalOutstanding',
'LoanCurrentDaysDelinquent',
'LoanMonthsSinceOrigination',
'LoanOriginalAmount',
'LoanOriginationDate',
'LoanOriginationQuarter',
'MonthlyLoanPayment',
'LP_ServiceFees',
'LP_CollectionFees',
'LP_GrossPrincipalLoss',
'LP_NetPrincipalLoss',
'LP_NonPrincipalRecoverypayments',
'Recommendations',
'InvestmentFromFriendsCount',
'InvestmentFromFriendsAmount',
'Investors'
]

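# .copy() makes pld_df an independent dataframe, so later modifications
# do not raise SettingWithCopyWarning or touch pld_full_df
pld_df = pld_full_df[sel_columns].copy()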

In [12]:
# Some preliminary statistics about the data
print ('%s %s' % ('Number of rows and columns: ', pld_df.shape))
print (pld_df.describe())

Number of rows and columns:  (113937, 66)
                Term    BorrowerAPR   BorrowerRate    LenderYield  \
count  113937.000000  113912.000000  113937.000000  113937.000000   
mean       40.830248       0.218828       0.192764       0.182701   
std        10.436212       0.080364       0.074818       0.074516   
min        12.000000       0.006530       0.000000      -0.010000   
25%        36.000000       0.156290       0.134000       0.124200   
50%        36.000000       0.209760       0.184000       0.173000   
75%        36.000000       0.283810       0.250000       0.240000   
max        60.000000       0.512290       0.497500       0.492500   

       EstimatedEffectiveYield  EstimatedLoss  EstimatedReturn  ProsperRating  \
count             84853.000000   84853.000000     84853.000000   84853.000000   
mean                  0.168661       0.080306         0.096068       4.072243   
std                   0.068467       0.046764         0.030403       1.673227   
min         

> Load your dataset and describe its properties through the questions below.
Try to motivate your exploration goals through this section.

### What is the structure of your dataset?

> Your answer here!

### What is/are the main feature(s) of interest in your dataset?

> Your answer here!

### What features in the dataset do you think will help support your investigation into your feature(s) of interest?

> Your answer here!

## Univariate Exploration

> In this section, investigate distributions of individual variables. If
you see unusual points or outliers, take a deeper look to clean things up
and prepare yourself to look at relationships between variables.

> Make sure that, after every plot or related series of plots, you
include a Markdown cell with comments about what you observed and what
you plan on investigating next.

### Discuss the distribution(s) of your variable(s) of interest. Were there any unusual points? Did you need to perform any transformations?

> Your answer here!

### Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

> Your answer here!

## Bivariate Exploration

> In this section, investigate relationships between pairs of variables in your
data. Make sure the variables that you cover here have been introduced in some
fashion in the previous section (univariate exploration).

### Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?

> Your answer here!

### Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

> Your answer here!

## Multivariate Exploration

> Create plots of three or more variables to investigate your data even
further. Make sure that your investigations are justified, and follow from
your work in the previous sections.

### Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

> Your answer here!

### Were there any interesting or surprising interactions between features?

> Your answer here!

> At the end of your report, make sure that you export the notebook as an
html file from the `File > Download as... > HTML` menu. Make sure you keep
track of where the exported file goes, so you can put it in the same folder
as this notebook for project submission. Also, make sure you remove all of
the quote-formatted guide notes like this one before you finish your report!