# Part I - Exploring Loan Data from Prosper
## by Aly Mobarak

## Introduction
> This dataset contains 113,937 loans with 81 variables for each loan, including loan amount, borrower rate (interest rate), current loan status and borrower income. It comes from Prosper, the company that introduced U.S. consumers to peer-to-peer lending, an approach to personal finance in which individuals lend directly to other individuals. Almost twenty years later, Prosper reports having helped over 1.7 million customers through its range of financial products.


## Preliminary Wrangling


```python
# import all packages and set plots to be embedded inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb

%matplotlib inline
```

```python
loans = pd.read_csv('loans.csv')
```

```python
loans.shape
```

```
(113937, 81)
```

```python
loans.head(5)
```

| | ListingKey | ListingNumber | ListingCreationDate | CreditGrade | Term | LoanStatus | ClosedDate | BorrowerAPR | BorrowerRate | LenderYield | ... | LP_ServiceFees | LP_CollectionFees | LP_GrossPrincipalLoss | LP_NetPrincipalLoss | LP_NonPrincipalRecoverypayments | PercentFunded | Recommendations | InvestmentFromFriendsCount | InvestmentFromFriendsAmount | Investors |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1021339766868145413AB3B | 193129 | 2007-08-26 19:09:29.263000000 | C | 36 | Completed | 2009-08-14 00:00:00 | 0.16516 | 0.158 | 0.138 | ... | -133.18 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0 | 0 | 0.0 | 258 |
| 1 | 10273602499503308B223C1 | 1209647 | 2014-02-27 08:28:07.900000000 | NaN | 36 | Current | NaN | 0.12016 | 0.092 | 0.082 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0 | 0 | 0.0 | 1 |
| 2 | 0EE9337825851032864889A | 81716 | 2007-01-05 15:00:47.090000000 | HR | 36 | Completed | 2009-12-17 00:00:00 | 0.28269 | 0.275 | 0.24 | ... | -24.2 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0 | 0 | 0.0 | 41 |
| 3 | 0EF5356002482715299901A | 658116 | 2012-10-22 11:02:35.010000000 | NaN | 36 | Current | NaN | 0.12528 | 0.0974 | 0.0874 | ... | -108.01 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0 | 0 | 0.0 | 158 |
| 4 | 0F023589499656230C5E3E2 | 909464 | 2013-09-14 18:38:39.097000000 | NaN | 36 | Current | NaN | 0.24614 | 0.2085 | 0.1985 | ... | -60.27 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0 | 0 | 0.0 | 20 |
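
In the preview, date columns such as ListingCreationDate hold text timestamps, and `pd.read_csv` keeps them as strings unless asked to parse them. Converting them is a natural wrangling step before studying lending trends over time. The sketch below is one way to summarize loans per month, assuming the `LoanOriginationDate` and `LoanOriginalAmount` columns; the helper `monthly_summary` is hypothetical and the toy rows are illustrative, not real Prosper loans.

```python
import pandas as pd


def monthly_summary(df: pd.DataFrame, date_col: str = "LoanOriginationDate") -> pd.DataFrame:
    """Loan count and mean LoanOriginalAmount per calendar month (hypothetical helper)."""
    # Parse the text timestamps; unparseable entries become NaT instead of raising
    dates = pd.to_datetime(df[date_col], errors="coerce")
    # Collapse each timestamp to its calendar month, e.g. 2013-09
    month = dates.dt.to_period("M")
    # Group amounts by month; rows with NaT dates are dropped by groupby
    return df.groupby(month)["LoanOriginalAmount"].agg(["count", "mean"])


# Toy rows for illustration only (assumed column names, invented values)
toy = pd.DataFrame({
    "LoanOriginationDate": ["2013-09-14 00:00:00", "2013-09-30 00:00:00", "2014-02-27 00:00:00"],
    "LoanOriginalAmount": [4000, 10000, 9000],
})
summary = monthly_summary(toy)
print(summary)
```

On the real data, `monthly_summary(loans)` would return one row per month that can be drawn as a line chart. Passing `parse_dates=['LoanOriginationDate']` to `pd.read_csv` is an alternative that parses the column once at load time.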


### Structure of Dataset

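> The dataset has 113,937 rows (one per loan) and 81 columns (features). Several columns apply only to part of the lending history: CreditGrade covers listings before 2009, while ProsperRating and the estimated loss and return fields cover loans originated after July 2009.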
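
Because many columns are only populated for part of the history, a per-column missing-value summary helps decide which features can be compared across the whole dataset. A minimal sketch, assuming the data sits in a pandas DataFrame; `missing_share` is a hypothetical helper and the toy frame stands in for `loans`:

```python
import numpy as np
import pandas as pd


def missing_share(df: pd.DataFrame) -> pd.Series:
    """Fraction of missing values per column, most incomplete first (hypothetical helper)."""
    # isna() marks NaN/None cells; the mean of booleans is the missing fraction
    share = df.isna().mean()
    # Keep only columns with at least one missing value, sorted descending
    return share[share > 0].sort_values(ascending=False)


# Toy stand-in for the real `loans` DataFrame (values are illustrative only)
toy = pd.DataFrame({
    "CreditGrade": ["C", np.nan, "HR", np.nan],
    "ClosedDate": ["2009-08-14", np.nan, "2009-12-17", np.nan],
    "Term": [36, 36, 36, 36],
    "ProsperScore": [np.nan, 7.0, np.nan, np.nan],
})
print(missing_share(toy))
```

With the real data, `missing_share(loans).head(20)` would list the 20 least complete columns.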

### Main feature(s) of Interest

1. **CreditGrade & ProsperRating (numeric/Alpha)**: These ratings summarize the credit risk of each loan. CreditGrade applies to listings created before 2009, while ProsperRating applies to loans originated after July 2009, so neither column alone covers the whole dataset. Together, they describe the risk profile of the borrowers.

2. **LoanStatus**: The current status of the loan, which is central to analyzing loan performance and to identifying patterns in defaults and early repayments.

3. **BorrowerAPR & BorrowerRate**: These measure the cost of borrowing: BorrowerRate is the interest rate, while BorrowerAPR also reflects fees. Analyzing them together with credit ratings and loan performance could reveal pricing trends and how sensitive borrowers are to interest rates.

4. **EstimatedLoss, EstimatedEffectiveYield, EstimatedReturn**: Applicable for loans originated after July 2009, these estimates are crucial for assessing the expected performance of loans, allowing for a risk-return analysis.

5. **ProsperScore**: Prosper's custom risk score, which may be a strong indicator of the probability that a loan defaults. Checking how ProsperScore correlates with loan performance and with ProsperRating could show how accurate Prosper's risk assessment is.

6. **DebtToIncomeRatio**: This is an important measure of a borrower's financial stability and capacity to repay the loan. High ratios may indicate higher risk of default.

7. **LoanOriginalAmount & LoanOriginationDate**: Analyzing the loan amount together with the origination date shows how lending volume and average loan size change by month or year, which may reveal shifts in borrower behavior.

8. **ListingCategory**: The purpose of the loan might influence its risk profile and repayment behavior. For instance, debt consolidation loans might perform differently from home improvement loans, since the two may differ in amount and repayment period.

9. **EmploymentStatus & StatedMonthlyIncome**: These variables offer insight into the borrower's job stability and income level, which are critical for assessing their ability to repay the loan.

10. **CurrentDelinquencies & DelinquenciesLast7Years**: These capture the borrower's current and recent delinquency record, which could help predict future loan performance.

11. **Investors**: The number of investors who funded the loan might indicate their confidence in the borrower's ability to repay, and may be related to other loan characteristics.

12. **LP_GrossPrincipalLoss, LP_NetPrincipalLoss**: These are pivotal for understanding the actual loss experienced due to loan defaults.
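
Because CreditGrade covers listings before 2009 and ProsperRating covers loans originated after July 2009, one option is to merge them into a single ordered rating so the whole period can be analyzed together. The sketch below assumes the columns are named `CreditGrade` and `ProsperRating (Alpha)` and that both use letter grades from HR (highest risk) to AA (lowest risk); the helper `combined_rating` and the `RATING_ORDER` list are assumptions, not part of the original notebook. Any other value, or a loan with neither rating, ends up missing.

```python
import numpy as np
import pandas as pd

# Assumed letter grades shared by both columns, from highest to lowest risk
RATING_ORDER = ["HR", "E", "D", "C", "B", "A", "AA"]


def combined_rating(df: pd.DataFrame) -> pd.Series:
    """Merge pre-2009 CreditGrade with post-July-2009 ProsperRating (hypothetical helper)."""
    # Prefer the newer ProsperRating; fall back to CreditGrade where it is missing
    merged = df["ProsperRating (Alpha)"].fillna(df["CreditGrade"])
    # Ordered categorical: allows sorting and comparison; values outside RATING_ORDER become NaN
    dtype = pd.CategoricalDtype(categories=RATING_ORDER, ordered=True)
    return merged.astype(dtype)


# Toy rows for illustration only
toy = pd.DataFrame({
    "CreditGrade": ["C", np.nan, "HR", np.nan],
    "ProsperRating (Alpha)": [np.nan, "A", np.nan, np.nan],
})
toy["Rating"] = combined_rating(toy)
print(toy["Rating"].tolist())
```

An ordered categorical also keeps the grades in risk order on plot axes instead of alphabetical order.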

## Univariate Exploration

> In this section, investigate distributions of individual variables. If
you see unusual points or outliers, take a deeper look to clean things up
and prepare yourself to look at relationships between variables.
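
Monetary variables such as LoanOriginalAmount and StatedMonthlyIncome are likely to be right-skewed, and log-spaced histogram bins are a common transformation for that case. The sketch below computes such bin edges with NumPy; the helper `log_bins`, its `step` parameter (measured in powers of ten) and the sample amounts are assumptions for illustration.

```python
import numpy as np


def log_bins(values, step=0.1):
    """Histogram bin edges spaced evenly in log10 space (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    # log10 is undefined for zero, negative or missing values, so keep only positive ones
    positive = values[values > 0]
    # Round the exponent range outward to whole powers of ten
    lo = np.floor(np.log10(positive.min()))
    hi = np.ceil(np.log10(positive.max()))
    # Number of edges so that consecutive exponents differ by roughly `step`
    num = int(round((hi - lo) / step)) + 1
    return 10 ** np.linspace(lo, hi, num)


# Illustrative amounts only, not taken from the Prosper data
amounts = np.array([1200, 2500, 4000, 4000, 15000, 35000])
edges = log_bins(amounts, step=0.5)
counts, _ = np.histogram(amounts, bins=edges)
print(edges)
print(counts)
```

For a plot, the edges can be passed to `plt.hist(..., bins=edges)` followed by `plt.xscale('log')`, so that the bars appear evenly spaced.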



### Discuss the distribution(s) of your variable(s) of interest. Were there any unusual points? Did you need to perform any transformations?

> Your answer here!

### Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

> Your answer here!

## Bivariate Exploration

> In this section, investigate relationships between pairs of variables in your
data. Make sure the variables that you cover here have been introduced in some
fashion in the previous section (univariate exploration).
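
For a pair such as LoanStatus and a credit rating, one way to compare groups is the share of finished loans that ended badly within each rating. The sketch assumes LoanStatus uses the labels `Completed`, `Chargedoff` and `Defaulted` for finished loans and treats the last two as bad outcomes; loans still running (for example `Current`) are excluded because their outcome is not yet known. The helper `bad_outcome_share` and the `Rating` column are assumptions.

```python
import pandas as pd

# Assumed LoanStatus labels for finished loans; "bad" outcomes are a subset
FINISHED = {"Completed", "Chargedoff", "Defaulted"}
BAD = {"Chargedoff", "Defaulted"}


def bad_outcome_share(df: pd.DataFrame, by: str) -> pd.Series:
    """Share of finished loans that were charged off or defaulted, per group (hypothetical)."""
    # Drop loans whose outcome is not known yet (e.g. "Current")
    finished = df[df["LoanStatus"].isin(FINISHED)]
    is_bad = finished["LoanStatus"].isin(BAD)
    # Mean of a boolean Series within each group = fraction of bad outcomes
    return is_bad.groupby(finished[by]).mean()


# Toy rows for illustration only
toy = pd.DataFrame({
    "Rating": ["A", "A", "A", "HR", "HR", "HR"],
    "LoanStatus": ["Completed", "Completed", "Current",
                   "Defaulted", "Chargedoff", "Completed"],
})
print(bad_outcome_share(toy, by="Rating"))
```

The result is a small Series indexed by rating, which suits a bar chart better than plotting every individual loan.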

### Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?

> Your answer here!

### Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

> Your answer here!

## Multivariate Exploration

> Create plots of three or more variables to investigate your data even
further. Make sure that your investigations are justified, and follow from
your work in the previous sections.
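
A three-variable view can be prepared as a pivot table, for example the median BorrowerRate for each combination of rating and Term, which can then be drawn as a heatmap. The `Rating` column (for instance a combined CreditGrade/ProsperRating column), the helper `median_rate_table` and the choice of the median are assumptions for this sketch.

```python
import pandas as pd


def median_rate_table(df: pd.DataFrame) -> pd.DataFrame:
    """Median BorrowerRate per rating (rows) and loan term in months (columns)."""
    return pd.pivot_table(
        df,
        values="BorrowerRate",
        index="Rating",
        columns="Term",
        aggfunc="median",  # median is less sensitive to extreme rates than the mean
    )


# Toy rows for illustration only
toy = pd.DataFrame({
    "Rating": ["A", "A", "A", "HR", "HR"],
    "Term": [36, 36, 60, 36, 36],
    "BorrowerRate": [0.08, 0.10, 0.12, 0.30, 0.32],
})
table = median_rate_table(toy)
print(table)
```

With the notebook's seaborn import, `sb.heatmap(table, annot=True, fmt='.2f')` would then render the table; combinations with no loans appear as blank cells.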

### Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

> Your answer here!

### Were there any interesting or surprising interactions between features?

> Your answer here!

## Conclusions
>You can write a summary of the main findings and reflect on the steps taken during the data exploration.




