# Part II - Prosper Loan Data Explanatory Analysis
## by Imaobong Anwana

>**Before you start**: You must have the README.md file ready. It should include a summary of the main findings that reflects on the steps taken during the data exploration (Part I notebook), and it should also describe the key insights that will be conveyed by the explanatory slide deck (Part II outcome).



## Investigation Overview


This investigation asks two questions: does a borrower's monthly salary influence the loan amount, and does being a homeowner affect it? The main purpose of the exploration was to describe the demographics of borrowers (state, occupation, employment status) and to look for indicators of the loan amount granted: what determined how much a borrower could get, and whether other variables were related to the loan amount.


## Dataset Overview

The dataset contains information on over 100,000 loan listings and their characteristics. Variables include the listing category (why the loan was taken), credit scores, employment status, occupation, and the number of investors who contributed to each loan, among others.

In [None]:
# import all packages and set plots to be embedded inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

# suppress warnings from final output
import warnings
warnings.simplefilter("ignore")

In [None]:
# load in the dataset into a pandas dataframe
loan = pd.read_csv('prosperLoanData.csv')

In [None]:
def fig_size(a, b):
    """Create and return a new figure that is a inches wide and b inches tall."""
    return plt.figure(figsize=(a, b))

In [None]:
def labels(a, b, c):
    """Set the plot title (a), the y-axis label (b) and the x-axis label (c)."""
    return plt.title(a), plt.ylabel(b), plt.xlabel(c)

In [None]:
fig_size(12,8);
color = sns.color_palette()[0]
order = loan.BorrowerState.value_counts().iloc[:10].index
sns.countplot(data=loan, y='BorrowerState', color=color, order=order)
labels('Borrower State','State Abbreviation','Frequency');


As we saw in the previous slide, most borrowers are from California, Texas and New York. Next, we limit the plot to the 10 most common occupations, which shows that Professionals and Computer Programmers are the occupations that take the most loans.



In [None]:
fig_size(12,8)
color = sns.color_palette()[0]
order = loan.Occupation.value_counts().iloc[:10].index
sns.countplot(data=loan, y='Occupation', color=color, order=order)
labels("Borrower's Occupation",'Occupation','Frequency');
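
The two plots above keep only the most frequent categories by slicing the result of `value_counts()`. As a minimal sketch of why that works, the example below applies the same idea to a small toy series. The values are made up for illustration and are not taken from the Prosper data, and `top_n` is a hypothetical name.

```python
import pandas as pd

# Toy stand-in for loan.BorrowerState; these values are made up for illustration.
states = pd.Series(["CA", "CA", "CA", "TX", "TX", "NY", "FL"], name="BorrowerState")

# value_counts() sorts by descending frequency by default,
# so slicing the first rows keeps the most common categories.
top_n = 2  # the plots above use 10
order = states.value_counts().iloc[:top_n].index.tolist()

print(order)
```

Passing a list like this as `order` to `sns.countplot` both restricts the plot to those categories and sorts the bars from most to least frequent.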


## (Visualization 2)

Now that we have a general idea of the population of the borrowers, let's further show their employment status and income range to gain further insight into the finances of people taking loans.



In [None]:
fig_size(12,8)
color = sns.color_palette()[0]
order = loan.EmploymentStatus.value_counts().index
sns.countplot(data=loan, x='EmploymentStatus', color=color, order=order)
labels('Employment Status','Frequency','Status');


Most borrowers are employed or have a full-time job. Contrary to the initial impression that self-employed people would top the list because most of the loans were taken for business, they do not. Borrowers in the $25,000-$75,000 income range take the most loans, as we can see in the plot below.



In [None]:
fig_size(12,8)
color = sns.color_palette()[0]
order = ['Not displayed','Not employed','$0','$1-24,999','$25,000-49,999','$50,000-74,999','$75,000-99,999','$100,000+']
sns.countplot(data=loan, x='IncomeRange', color=color, order=order)
labels('Income Range','Frequency','Income Range');
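
The income plots in this notebook pass the same `order` list each time. One possible alternative, sketched here on toy data, is to store the ordering in the column itself as an ordered categorical. The toy rows are made up for illustration, and `income_order` and `income_dtype` are hypothetical names that do not appear in the original analysis.

```python
import pandas as pd

# Same ordering as the `order` list used for the income plots.
income_order = ['Not displayed', 'Not employed', '$0', '$1-24,999',
                '$25,000-49,999', '$50,000-74,999', '$75,000-99,999', '$100,000+']

# Toy stand-in for the loan dataframe; these rows are made up for illustration.
toy = pd.DataFrame({'IncomeRange': ['$50,000-74,999', '$0', '$100,000+',
                                    '$25,000-49,999', '$50,000-74,999']})

# Attach the ordering to the column as an ordered categorical dtype.
income_dtype = pd.CategoricalDtype(categories=income_order, ordered=True)
toy['IncomeRange'] = toy['IncomeRange'].astype(income_dtype)

# Sorting the counts by index now follows the category order, not the alphabet.
counts = toy['IncomeRange'].value_counts().sort_index()
print(counts)
```

With the column typed this way, seaborn should generally pick up the category order on its own, although passing `order` explicitly as the notebook does remains the more visible choice.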

## (Visualization 3)

Finally, we show that a higher income raises the chances of getting a larger loan, as the violin plot below shows. The higher the income range, the wider the violin is in its upper region, which corresponds to higher loan amounts.

In [None]:

fig_size(12,8)
color = sns.color_palette()[0]
order = ['Not displayed','Not employed','$0','$1-24,999','$25,000-49,999','$50,000-74,999','$75,000-99,999','$100,000+']
sns.violinplot(data=loan, x='IncomeRange', y='LoanOriginalAmount', color=color, order=order)
labels('Loan Amount by Income Range','Loan Amount ($)','Income Range');
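
A violin plot shows the shape of each distribution, and a numeric summary can back up what the eye sees. The sketch below computes the median and maximum loan amount per income range on a toy dataframe. The numbers are made up for illustration and are not results from the Prosper data.

```python
import pandas as pd

# Toy stand-in for the loan dataframe; these amounts are made up for illustration.
toy = pd.DataFrame({
    'IncomeRange': ['$1-24,999', '$1-24,999', '$1-24,999',
                    '$100,000+', '$100,000+', '$100,000+'],
    'LoanOriginalAmount': [2000, 3000, 4000, 10000, 15000, 25000],
})

# Median and maximum loan amount within each income range.
summary = toy.groupby('IncomeRange')['LoanOriginalAmount'].agg(['median', 'max'])
print(summary)
```

Running the same `groupby` on `loan` would give the per-range medians to quote alongside the violin plot.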

Also, the higher the income, the more likely the borrower is a homeowner. Among borrowers below the $50,000 income range, there are fewer homeowners than renters.

In [None]:
fig_size(12,8)
order = ['Not displayed','Not employed','$0','$1-24,999','$25,000-49,999','$50,000-74,999','$75,000-99,999','$100,000+']
sns.countplot(data=loan, x='IncomeRange', hue='IsBorrowerHomeowner', order=order)
labels('Home Owner Status by Income Range','Frequency','Income range');
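
The grouped count plot compares raw counts, which also reflect how many borrowers fall in each income range. As a minimal sketch, the share of homeowners per range can be computed directly, since the mean of a boolean column is the fraction of `True` values. The toy rows below are made up for illustration, and `homeowner_share` is a hypothetical name.

```python
import pandas as pd

# Toy stand-in for the loan dataframe; these rows are made up for illustration.
toy = pd.DataFrame({
    'IncomeRange': ['$25,000-49,999'] * 4 + ['$75,000-99,999'] * 4,
    'IsBorrowerHomeowner': [True, False, False, False,
                            True, True, True, False],
})

# The mean of a boolean column is the fraction of True values,
# i.e. the share of homeowners within each income range.
homeowner_share = toy.groupby('IncomeRange')['IsBorrowerHomeowner'].mean()
print(homeowner_share)
```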

### Generate Slideshow
Once you're ready to generate your slideshow, use the `jupyter nbconvert` command to generate the HTML slideshow.

In [None]:
# Use this command if you are running this file locally

!jupyter nbconvert Part_II_slide_deck_template.ipynb --to slides --post serve --no-input --no-prompt