In [3]:
from __future__ import division
import glob
import os
import pandas as pd
import numpy as np
from scipy.stats import binned_statistic, linregress
from scipy.stats import randint as sp_randint
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split
import seaborn as sns
import statsmodels.api as sm
from IPython.display import clear_output, Image

from s3_connect import s3_connect

import plotly.plotly as py
import plotly.graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot

from EDA_plotting_functions import (make_kdeplot, default_rate_binned_barplot, default_rate_categorical_barplot, 
                                   default_rate_by_state)

init_notebook_mode(connected=True)

tmp_localdir = '~/'

pd.options.display.max_columns = 999

%pylab inline
clear_output()

# Optimizing Loan Choices
Now that we have established an intuition about what makes a loan attractive, let's look at how we would automate loan selection. This can be accomplished in four steps:
1. Compute loan amortization for each loan.
2. Extrapolate potential profit for each loan assuming repayment.
3. Calculate expected profit based on modeled default probability.
4. Invest in loans with the highest expected profit.

## 1. Compute loan amortization for each loan.

We will need to know the monthly payment for each loan before we can calculate potential profit. 

Although this information is not available in the data, it can be calculated using the following amortization formula, where:

* p = principal amount
* i = interest rate per payment period (for monthly payments, the annual rate divided by 12)
* n = total number of payments

$$\Large payment = p\frac{i(1+i)^n}{(1+i)^n-1}$$
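As a minimal sketch of the amortization formula, the function below computes a fixed monthly payment. The function name, its arguments, and the example loan are illustrative assumptions, not values from the dataset; it also assumes the rate is quoted as an annual decimal (0.12 for 12%) and that payments are monthly.

```python
def monthly_payment(principal, annual_rate, n_payments):
    """Fixed payment for a fully amortizing loan with monthly payments.

    annual_rate is assumed to be a decimal (e.g. 0.12 for 12%).
    """
    i = annual_rate / 12.0  # periodic (monthly) interest rate
    if i == 0:
        # With no interest, the principal is simply split evenly.
        return principal / float(n_payments)
    growth = (1 + i) ** n_payments
    return principal * i * growth / (growth - 1)


# Hypothetical loan: $10,000 at 12% annual interest over 36 months.
payment = monthly_payment(10000, 0.12, 36)
print(round(payment, 2))  # 332.14
```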

## 2. Extrapolate potential profit for each loan assuming repayment.
We can calculate the profit, which is the total interest earned if the loan is repaid in full, by multiplying the monthly payment by the total number of payments and subtracting the principal:

$$\Large profit = (n * payment) - principal$$
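The profit formula translates directly into code. This is a small sketch with hypothetical names; the payment of $332.14 is the approximate monthly payment on an illustrative $10,000, 36-month loan at 12% annual interest, not a figure from the data.

```python
def potential_profit(payment, n_payments, principal):
    """Total interest earned if every scheduled payment is made."""
    return n_payments * payment - principal


# Hypothetical loan: 36 payments of $332.14 on a $10,000 principal.
profit = potential_profit(332.14, 36, 10000)
print(round(profit, 2))  # 1957.04
```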

## 3. Calculate expected profit based on modeled default probability.
We can now use our deployed machine learning model to assess risk. Given our model and a set of features for each loan, we can assign a probability of default to each loan, P(default). Using P(default), profit, and principal, we can estimate the amount of profit we can expect to receive after taking into account default risk. 

The expected profit has two terms: a repayment term (left) and a default term (right):
1. We can expect the loan to be repaid with a probability of `1-P(default)`. Multiplying this probability by the total profit we stand to earn, `(1-P(default)) * profit`, gives the average profit we would receive if we simulated this process many times.
2. On the other hand, we also know that a loan will default with a probability of `P(default)`. Multiplying this probability by the principal, `P(default) * principal`, gives the average loss we would incur if we simulated this process many times.

Finally, we subtract the expected loss from the expected gain to obtain the expected profit for a particular loan:

$$\Large E[profit] = [ (1-P(default)) * profit ] - [ P(default) * principal ]$$

*Note: For simplicity we assume that the lender loses the entire principal in the event of a default.*
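The expected-profit formula can be sketched as below. The function names and the example numbers are hypothetical, and the sketch keeps the same simplifying assumption that the full principal is lost on default. The second function follows from setting the expected profit to zero and solving for P(default).

```python
def expected_profit(p_default, profit, principal):
    """Expected profit: repayment term minus default term."""
    if not 0.0 <= p_default <= 1.0:
        raise ValueError("p_default must be a probability in [0, 1]")
    return (1.0 - p_default) * profit - p_default * principal


def break_even_p_default(profit, principal):
    """Default probability at which expected profit is exactly zero."""
    return profit / float(profit + principal)


# Hypothetical loan: $1,957.04 potential profit on a $10,000 principal,
# with a modeled default probability of 10%.
print(round(expected_profit(0.10, 1957.04, 10000), 2))  # 761.34
print(round(break_even_p_default(1957.04, 10000), 3))   # 0.164
```

In this illustrative case the loan only has a positive expected profit while the modeled default probability stays below roughly 16%.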

## 4. Invest in loans with the highest expected profit.

Now that the three decision parameters (interest rate, loan amount, probability of default) have been folded into a single metric, we can pick the loans with the highest expected profit to maximize our return. 

While this method will **on average** produce the highest return, some investors may still want to limit their risk exposure.

Choosing loans under a fairly conservative risk threshold produces a lower yield on average, but one can be more certain of a favorable payout. Illustrated below is the expected ROI (%) after funding the 10 loans with the greatest expected profit that meet each of 3 different risk tolerance thresholds.
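The selection rule described in this section can be sketched as follows: keep only the loans whose modeled default probability is under a chosen cap, rank them by expected profit, and fund the top `k`. The function names, dictionary keys, and the four toy loans are made up for illustration and are not taken from the dataset. ROI here is assumed to mean total expected profit divided by total principal invested.

```python
def select_loans(loans, k=10, max_p_default=1.0):
    """Top-k loans by expected profit among those under the risk cap."""
    eligible = [loan for loan in loans if loan["p_default"] <= max_p_default]
    ranked = sorted(eligible, key=lambda loan: loan["expected_profit"], reverse=True)
    return ranked[:k]


def expected_roi_pct(selected):
    """Expected profit as a percentage of the principal invested."""
    invested = sum(loan["principal"] for loan in selected)
    if invested == 0:
        return 0.0
    gained = sum(loan["expected_profit"] for loan in selected)
    return 100.0 * gained / invested


# Toy inputs for illustration only.
loans = [
    {"id": "A", "principal": 10000, "p_default": 0.05, "expected_profit": 900.0},
    {"id": "B", "principal": 10000, "p_default": 0.20, "expected_profit": 1500.0},
    {"id": "C", "principal": 5000, "p_default": 0.10, "expected_profit": 400.0},
    {"id": "D", "principal": 8000, "p_default": 0.30, "expected_profit": -200.0},
]

aggressive = select_loans(loans, k=2)                        # no risk cap
conservative = select_loans(loans, k=2, max_p_default=0.10)  # cap at 10%

print([loan["id"] for loan in aggressive], expected_roi_pct(aggressive))
print([loan["id"] for loan in conservative], round(expected_roi_pct(conservative), 2))
```

With these toy numbers, the uncapped portfolio picks loans B and A for an expected ROI of 12%, while the 10% cap picks A and C for about 8.67%, which mirrors the trade-off between yield and certainty.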