# Part II - Prosper Loan dataset
## by Zainab Mustapha


## Investigation Overview

This presentation highlights the key relationships found in my analysis and shows how they help answer the following questions about the Prosper loan dataset:

1. What affects the borrower’s APR?
2. What factors affect a loan’s outcome status?
3. What factors affect loan amount?




## Dataset Overview

The Prosper loan dataset contains 113,937 loans from 2005 to 2014, with 81 variables per loan. The variables most useful for drawing conclusions are loan status, Prosper rating, Prosper score, borrower APR, loan amount, home ownership, state, income range, and employment status. The dataset needed minimal cleaning: null values in the APR column were filled with the median, employment statuses were reclassified, and listing categories were mapped from numeric codes to text labels.
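A minimal sketch of the cleaning steps described above, assuming the column names `BorrowerAPR`, `EmploymentStatus`, and `ListingCategory (numeric)` from the Prosper data dictionary. The listing-category labels follow that data dictionary. The employment regrouping, the `clean_loans` helper, and the new `ListingCategory` column are hypothetical, because the exact reclassification used in this analysis is not shown here.

```python
import pandas as pd

# Text labels for the numeric listing-category codes, as given in the
# Prosper data dictionary.
LISTING_CATEGORIES = {
    0: "Not Available", 1: "Debt Consolidation", 2: "Home Improvement",
    3: "Business", 4: "Personal Loan", 5: "Student Use", 6: "Auto",
    7: "Other", 8: "Baby & Adoption", 9: "Boat", 10: "Cosmetic Procedure",
    11: "Engagement Ring", 12: "Green Loans", 13: "Household Expenses",
    14: "Large Purchases", 15: "Medical/Dental", 16: "Motorcycle", 17: "RV",
    18: "Taxes", 19: "Vacation", 20: "Wedding Loans",
}

# Hypothetical regrouping: fold the detailed working statuses into "Employed".
EMPLOYMENT_GROUPS = {
    "Full-time": "Employed",
    "Part-time": "Employed",
    "Self-employed": "Employed",
}


def clean_loans(loan: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of the loan data (hypothetical helper)."""
    loan = loan.copy()
    # Fill missing APR values with the column median.
    loan["BorrowerAPR"] = loan["BorrowerAPR"].fillna(loan["BorrowerAPR"].median())
    # Collapse employment statuses; labels not in the mapping are kept as-is.
    loan["EmploymentStatus"] = loan["EmploymentStatus"].replace(EMPLOYMENT_GROUPS)
    # Map numeric listing-category codes to readable text in a new column.
    loan["ListingCategory"] = loan["ListingCategory (numeric)"].map(LISTING_CATEGORIES)
    return loan
```

Working on a copy leaves the raw `loan` DataFrame loaded below untouched, so the cleaning can be re-run or compared against the original.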

```python
# import all packages and set plots to be embedded inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb

%matplotlib inline

# suppress warnings from final output
import warnings
warnings.simplefilter("ignore")
```

```python
# load the dataset into a pandas DataFrame
loan = pd.read_csv('/Users/zom3/Project 3/Zainab Mustapha_Project 3/prosperLoanData.csv')
```

### What factors affect the borrower's APR?



## Visualization 1

This multivariate scatter plot shows that the riskier the loan (ratings run from AA, the lowest risk, to HR, the highest risk), the higher the APR and the smaller the loan amount offered to the borrower.



![image.png](attachment:image.png)
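The trend in the scatter plot can be backed by a per-rating summary table. This is a minimal sketch, assuming the Prosper data dictionary column names `ProsperRating (Alpha)`, `BorrowerAPR`, and `LoanOriginalAmount`; the `summarize_by_rating` helper is hypothetical. An ordered categorical keeps the ratings in risk order rather than alphabetical order.

```python
import pandas as pd

# Prosper ratings ordered from lowest risk (AA) to highest risk (HR).
RATING_ORDER = ["AA", "A", "B", "C", "D", "E", "HR"]


def summarize_by_rating(loan: pd.DataFrame) -> pd.DataFrame:
    """Mean APR and median loan amount per Prosper rating (hypothetical helper)."""
    # Ratings outside RATING_ORDER (or missing) become NaN and are dropped by groupby.
    rating = pd.Categorical(loan["ProsperRating (Alpha)"],
                            categories=RATING_ORDER, ordered=True)
    return (loan.assign(Rating=rating)
                .groupby("Rating", observed=True)
                .agg(mean_apr=("BorrowerAPR", "mean"),
                     median_amount=("LoanOriginalAmount", "median")))
```

Calling `summarize_by_rating(loan)` on the DataFrame loaded above returns one row per rating, from AA down to HR, so the rising APR and shrinking loan amount can be read directly from the table.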


## Visualization 2

This plot of borrower APR against Prosper rating reinforces the conclusion above: APR rises as the rating moves from AA (lowest risk) to HR (highest risk).

![image.png](attachment:image.png)

## Visualization 3

This plot of borrower APR against Prosper score shows the expected relationship. The Prosper score measures a borrower's creditworthiness, from 1 (least creditworthy) to 11 (most creditworthy), and APR increases as the score decreases.

![image.png](attachment:image.png)
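The direction and strength of the score–APR relationship can also be expressed as a single number. This minimal sketch computes a Spearman rank correlation, which suits an ordinal score because it measures a monotonic rather than strictly linear trend. It assumes the columns `ProsperScore` and `BorrowerAPR`; the helper name is hypothetical.

```python
import pandas as pd


def score_apr_correlation(loan: pd.DataFrame) -> float:
    """Spearman rank correlation between Prosper score and APR (hypothetical helper)."""
    # Keep only loans that have both a score and an APR.
    pair = loan[["ProsperScore", "BorrowerAPR"]].dropna()
    # Pearson correlation on average ranks is Spearman's rho (ties handled).
    ranks = pair.rank()
    return ranks["ProsperScore"].corr(ranks["BorrowerAPR"])
```

A value close to -1 indicates that APR falls consistently as the score rises. Ranking the columns and then using the default Pearson correlation keeps the sketch dependent on pandas alone.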

### What factors affect a loan’s outcome status?

## Visualization 4

This clustered bar chart shows that the Prosper rating categorizes loans appropriately: low-risk loans have fewer defaults and charge-offs than high-risk loans.

![image.png](attachment:image.png)
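Because the rating groups contain different numbers of loans, comparing the share of bad outcomes within each rating is a fairer check than comparing raw counts. This minimal sketch assumes the columns `ProsperRating (Alpha)` and `LoanStatus` and the status labels `Defaulted` and `Chargedoff`; the helper name is hypothetical.

```python
import pandas as pd

# Prosper ratings ordered from lowest risk (AA) to highest risk (HR).
RATING_ORDER = ["AA", "A", "B", "C", "D", "E", "HR"]


def bad_outcome_share(loan: pd.DataFrame) -> pd.Series:
    """Share of loans per rating that are Defaulted or Chargedoff (hypothetical helper)."""
    rating = pd.Categorical(loan["ProsperRating (Alpha)"],
                            categories=RATING_ORDER, ordered=True)
    is_bad = loan["LoanStatus"].isin(["Defaulted", "Chargedoff"])
    # The mean of a boolean Series is the proportion of True values in each group.
    return is_bad.groupby(rating, observed=True).mean()
```

If the rating separates risk well, the returned shares should rise from AA toward HR.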

### What factors affect loan amount?

## Visualization 5

A plot of loan amount in dollars against Prosper rating shows that the lower the risk associated with a loan, the higher the loan amount.

![image.png](attachment:image.png)

## Visualization 6

A plot of loan amount in dollars against income range shows that borrowers in higher income ranges received higher loan amounts.

![image.png](attachment:image.png)
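Income ranges are stored as text, and `groupby` sorts string keys alphabetically, which would place the top bracket `$100,000+` ahead of the 25,000 to 49,999 bracket. This minimal sketch uses an ordered categorical to keep the brackets in income order. It assumes the `IncomeRange` labels below (from the Prosper data) and the `LoanOriginalAmount` column; the helper name is hypothetical, and the "Not employed" and "Not displayed" labels are deliberately left out.

```python
import pandas as pd

# Income brackets in increasing order; "Not employed" and "Not displayed"
# are excluded, so those rows are dropped from the summary.
INCOME_ORDER = ["$0", "$1-24,999", "$25,000-49,999", "$50,000-74,999",
                "$75,000-99,999", "$100,000+"]


def median_amount_by_income(loan: pd.DataFrame) -> pd.Series:
    """Median loan amount per income bracket, in income order (hypothetical helper)."""
    income = pd.Categorical(loan["IncomeRange"],
                            categories=INCOME_ORDER, ordered=True)
    return loan["LoanOriginalAmount"].groupby(income, observed=True).median()
```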

## Visualization 7

A plot of loan amount in dollars against Prosper score shows a trend similar to that of Prosper rating: the higher the Prosper score (that is, the more creditworthy the borrower), the higher the loan amount.

![image.png](attachment:image.png)
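One way to check the Prosper score trend numerically is to take the median loan amount for each score and see whether it rises with the score. This minimal sketch assumes the `ProsperScore` and `LoanOriginalAmount` columns; the helper name is hypothetical.

```python
import pandas as pd


def median_amount_by_score(loan: pd.DataFrame) -> pd.Series:
    """Median loan amount per Prosper score (hypothetical helper)."""
    # groupby sorts the scores ascending and skips loans with no score.
    return loan.groupby("ProsperScore")["LoanOriginalAmount"].median()
```

`median_amount_by_score(loan).is_monotonic_increasing` is an all-or-nothing check, so on the full data a single dip between neighbouring scores would return `False` even when the overall trend is upward; reading the table itself is more informative.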

### Key Takeaways

1. High-risk loans have higher APR and lower loan amounts.
2. Borrower APR has a strong negative relationship with both Prosper rating and Prosper score: the better the rating or score, the lower the APR.
3. Prosper rating is appropriate for categorizing loans, as low-risk loans have lower default and charge-off rates than high-risk loans.
4. The lower the risk associated with a loan (the higher its Prosper rating), the higher the loan amount offered.
5. Higher income ranges correspond to higher loan amounts.
6. A higher Prosper score (better creditworthiness) corresponds to higher loan amounts.




### Conclusion

Loan amounts range from 1,000 to 35,000 dollars, with most loans between 1,000 and 15,000 dollars. Both the borrower rate and the APR have bimodal distributions. Most loans are still ongoing, with smaller numbers past due or completed. California has the most borrowers and North Dakota the fewest. Most borrowers' incomes fall between 25,000 and 74,999 dollars.
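Several of the headline numbers in this paragraph can be recomputed in one pass. This is a minimal sketch, assuming the `LoanOriginalAmount` and `BorrowerState` columns; the `conclusion_summary` helper and its dictionary keys are hypothetical.

```python
import pandas as pd


def conclusion_summary(loan: pd.DataFrame) -> dict:
    """Recompute some headline numbers from the conclusion (hypothetical helper)."""
    amount = loan["LoanOriginalAmount"]
    # value_counts() sorts from most to least frequent and skips missing states.
    states = loan["BorrowerState"].value_counts()
    return {
        "min_amount": amount.min(),
        "max_amount": amount.max(),
        # between() includes both endpoints by default.
        "share_1k_to_15k": amount.between(1_000, 15_000).mean(),
        "most_borrowers": states.idxmax(),
        "fewest_borrowers": states.idxmin(),
    }
```

If two states tie for the fewest borrowers, `idxmin` reports only one of them, so the full `value_counts()` output is worth a glance as well.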

APR has a linear relationship with interest rate: higher interest rates correspond to higher APRs. Loans for debt consolidation and business have the highest loan amounts, and employed individuals received more loans than unemployed individuals. Borrowers with higher Prosper scores tend to receive higher loan amounts. The plots of APR against Prosper score and Prosper rating show that better creditworthiness is associated with a lower APR. High-risk loans have higher default and charge-off rates than low-risk loans, and most defaulted and charged-off loans have a 36-month term.
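Two of these claims lend themselves to a quick numeric check: the linear APR–interest rate relationship (a Pearson correlation) and the term mix among defaulted and charged-off loans. This minimal sketch assumes the columns `BorrowerRate`, `BorrowerAPR`, `LoanStatus`, and `Term` (in months); both helper names are hypothetical.

```python
import pandas as pd


def apr_rate_correlation(loan: pd.DataFrame) -> float:
    """Pearson (linear) correlation between interest rate and APR."""
    # Series.corr defaults to Pearson and ignores rows where either value is missing.
    return loan["BorrowerRate"].corr(loan["BorrowerAPR"])


def term_share_of_bad_loans(loan: pd.DataFrame) -> pd.Series:
    """Share of each loan term among Defaulted and Chargedoff loans."""
    bad = loan[loan["LoanStatus"].isin(["Defaulted", "Chargedoff"])]
    return bad["Term"].value_counts(normalize=True)
```

A correlation near 1 supports the linear relationship, and a share above 0.5 for the 36-month term supports the claim that most bad loans have that term.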

```python
# convert this notebook to a slide deck without code cells or prompts, then serve it locally
!jupyter nbconvert "Part_II_slide_deck_Zainab Mustapha.ipynb" --to slides --post serve --no-input --no-prompt
```