# Analyzing loan approval decisions automated by IBM Operational Decision Manager
## Analyzing your decisions in Python with pandas and Brunel

This Python notebook shows how to load a decision set produced by IBM ODM, and how to apply analytics with the Brunel library to get insights into the decisions.
The decision set was produced by running business rules on randomly generated loan applications, and has been written in CSV format.

This notebook has been developed with a pandas dataframe and runs on Spark 2.0.

The intent of applying data science to decisions is to check that decision automation works as expected. In other words, we want to check that the executed rules fit well with the segmentation of the data. From there, we can potentially find optimizations to better automate your decision making. You will be able to extend the notebook to create new views on your decisions by using pandas dataframes and Brunel visualization capabilities.
    
To get the most out of this notebook, you should have some familiarity with the Python programming language.

## Contents 
This notebook contains the following main sections:

1. [Load the loan validation decision set.](#overview)
2. [View an approval distribution pie chart.](#viewapprovaldistribution)
3. [View approvals in a chord chart.](#viewapprovaldistributionincordchart) 
4. [View the income on credit score distribution.](#incomeoncreditscoredistribution)
5. [View the loan amount on credit score distribution.](#loanamountoncreditscoredistribution)
6. [View the loan amount distribution.](#viewamountdistribution)
7. [Summary and next steps.](#next)

<a id="overview"></a>
## 1. Load the Loan Validation decision set.
The loan validation dataset has been generated with Operational Decision Manager as a CSV file.
The following code reads this dataset file to construct a dataframe of processed loan applications.

In [1]:
from io import StringIO

import requests
import json
import pandas as pd
import brunel

df = pd.read_csv("https://raw.githubusercontent.com/ODMDev/decisions-on-spark/master/data/miniloan/miniloan-decisions-ls-10K.csv")
df.head()

| Unnamed: 0 | name | creditScore | income | loanAmount | monthDuration | approval | rate | yearlyReimbursement |
|---|---|---|---|---|---|---|---|---|
| 0 | John Doe | 436 | 290532 | 136331 | 19 | True | 0.08 | 13979.403261089992 |
| 1 | John Doe | 333 | 95440 | 516245 | 24 | False | 0.08 | 48447.78364788128 |
| 2 | John Doe | 805 | 43242 | 982572 | 14 | False | 0.067 | 108746.07131251293 |
| 3 | John Doe | 313 | 3773 | 286564 | 19 | False | 0.08 | 30025.806398532397 |
| 4 | John Doe | 639 | 141075 | 603802 | 14 | False | 0.067 | 69141.80143043594 |
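The `Unnamed: 0` column in this output is how pandas labels a column whose header cell is empty, which usually means the file carries a saved row index. As a minimal sketch, assuming the hosted file starts its header line with an empty cell, the five rows above can be reloaded from an in-memory string with `index_col=0` so that this column becomes the index. The names `SAMPLE_CSV`, `with_default_index` and `with_file_index` are illustrative and not part of the notebook.

```python
from io import StringIO

import pandas as pd

# Five rows copied from the df.head() output; the empty first header cell
# is an assumption about how the hosted CSV file is laid out.
SAMPLE_CSV = """\
,name,creditScore,income,loanAmount,monthDuration,approval,rate,yearlyReimbursement
0,John Doe,436,290532,136331,19,True,0.08,13979.403261089992
1,John Doe,333,95440,516245,24,False,0.08,48447.78364788128
2,John Doe,805,43242,982572,14,False,0.067,108746.07131251293
3,John Doe,313,3773,286564,19,False,0.08,30025.806398532397
4,John Doe,639,141075,603802,14,False,0.067,69141.80143043594
"""

# Default behaviour: the unnamed first column is kept as a data column.
with_default_index = pd.read_csv(StringIO(SAMPLE_CSV))

# index_col=0 uses the first column as the row index instead.
with_file_index = pd.read_csv(StringIO(SAMPLE_CSV), index_col=0)

print(list(with_default_index.columns))
print(list(with_file_index.columns))
print(with_file_index.dtypes)
```

The same `index_col=0` argument can be passed to the `pd.read_csv` call in cell 1 if the extra column gets in the way of later views.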


A dataframe has been created to capture 10,000 loan application decisions automated with business rules. Business rules have been used to determine eligibility based on credit score, loan amount, and debt-to-income ratio. Decision outcomes are represented by the approval, rate and yearlyReimbursement columns.

The table above represents a decision set. Each row shows a decision with its input and output parameters:
   * inputs
      * name
      * creditScore
      * income
      * loanAmount
      * monthDuration
   * outputs
      * approval
      * rate (the loan rate)
      * yearlyReimbursement
     
For example, row 0 shows a loan application submitted for John Doe, with a credit score of 436, a yearly income of USD 290532, a loan amount of USD 136331 and a loan duration of 19 months. The automated decision approves the loan at a rate of 8% with a yearly reimbursement of USD 13979.4.
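The split between input and output parameters can be made explicit in code. The following is a minimal sketch that rebuilds the first two rows of the table by hand and selects each group of columns. The names `sample`, `INPUT_COLUMNS` and `OUTPUT_COLUMNS` are illustrative; in the notebook the same selections would apply to `df`.

```python
import pandas as pd

# First two rows of the decision set, copied from the table above.
sample = pd.DataFrame({
    "name": ["John Doe", "John Doe"],
    "creditScore": [436, 333],
    "income": [290532, 95440],
    "loanAmount": [136331, 516245],
    "monthDuration": [19, 24],
    "approval": [True, False],
    "rate": [0.08, 0.08],
    "yearlyReimbursement": [13979.403261089992, 48447.78364788128],
})

# Column groups as described in the inputs/outputs list.
INPUT_COLUMNS = ["name", "creditScore", "income", "loanAmount", "monthDuration"]
OUTPUT_COLUMNS = ["approval", "rate", "yearlyReimbursement"]

inputs = sample[INPUT_COLUMNS]
outputs = sample[OUTPUT_COLUMNS]

# Read a single decision: what went in, and what the rules decided.
first = sample.iloc[0]
print(f"{first['name']}: score {first['creditScore']}, "
      f"income USD {first['income']}, amount USD {first['loanAmount']}")
print(f"approved={first['approval']}, rate={first['rate']:.0%}, "
      f"yearly reimbursement USD {first['yearlyReimbursement']:.1f}")
```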

In [2]:
total_rows = df.shape[0]
print("The size of the decision set is " + str(total_rows))

The size of the decision set is 10000


<a id="viewapprovaldistribution"></a>
## 2. View the loan approval distribution in a pie chart.
A simple pie chart shows the approval distribution in the decision set.

In [3]:
%brunel data('df') stack polar bar x("const") y(#count) color(approval) legends(none) label(approval) :: width=200, height=300

[Interactive Brunel output: pie chart of the approval distribution]
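The proportions drawn in the pie chart can also be read as numbers. This is a minimal sketch using only the `approval` values of the five rows displayed earlier, so the figures describe that tiny sample and not the full decision set. The `outcome` labels are illustrative.

```python
import pandas as pd

# approval values of the five rows shown by df.head()
sample = pd.DataFrame({"approval": [True, False, False, False, False]})

# Map booleans to readable labels before counting.
outcome = sample["approval"].map({True: "approved", False: "rejected"})

counts = outcome.value_counts()                # absolute counts per outcome
shares = outcome.value_counts(normalize=True)  # fraction of the sample per outcome

print(counts)
print(shares)
```

Applied to the notebook's `df`, the same two calls give the counts and shares behind each slice.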

<a id="viewapprovaldistributionincordchart"></a>
## 3. View the loan approval distribution per credit score in a chord chart.
A chord chart that shows the approval count per credit score.

In [4]:
%brunel data('df') chord x(approval) y(creditScore) color(#count) tooltip(#all)

[Interactive Brunel output: chord chart linking approval to creditScore, colored by count]
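A tabular counterpart of the chord chart is a cross-tabulation of credit score bands against the approval outcome. The sketch below assumes band edges of 300, 500, 700 and 900, which are chosen for illustration only and do not come from the rules, and it runs on the five rows displayed earlier.

```python
import pandas as pd

# creditScore and approval values of the five rows shown by df.head()
sample = pd.DataFrame({
    "creditScore": [436, 333, 805, 313, 639],
    "approval": [True, False, False, False, False],
})

# Hypothetical band edges; pd.cut builds right-closed intervals such as (300, 500].
bands = pd.cut(
    sample["creditScore"],
    bins=[300, 500, 700, 900],
    labels=["300-500", "500-700", "700-900"],
).astype(str)

outcome = sample["approval"].map({True: "approved", False: "rejected"})

# Rows are score bands, columns are outcomes, cells are decision counts.
table = pd.crosstab(bands, outcome)
print(table)
```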

<a id="incomeoncreditscoredistribution"></a>
## 4. View income on credit score distribution.
Do we see trends or limits in credit score or income for accepted loan applications? We can observe graphically that the higher the credit score and income, the more approvals we get.

In [5]:
%brunel data('df') x(income) y(creditScore) color(approval:yellow-green) :: width=800, height=300

[Interactive Brunel output: scatter plot of creditScore against income, colored by approval]
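The visual trend can be cross-checked by comparing typical values for approved and rejected applications. The following sketch groups the five rows displayed earlier by outcome and takes the median of each column. With so few rows it only illustrates the mechanics; the trend itself has to be checked on the full `df`.

```python
import pandas as pd

# creditScore, income and approval values of the five rows shown by df.head()
sample = pd.DataFrame({
    "creditScore": [436, 333, 805, 313, 639],
    "income": [290532, 95440, 43242, 3773, 141075],
    "approval": [True, False, False, False, False],
})

outcome = sample["approval"].map({True: "approved", False: "rejected"})

# Median credit score and income per outcome.
summary = sample.groupby(outcome)[["creditScore", "income"]].median()
print(summary)
```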

<a id="loanamountoncreditscoredistribution"></a>
## 5. View the loan amount on credit score distribution.
Do we see limits in score or amount for accepted loan applications? We observe that, as expected:
- the higher the loan amount, the higher the rejection rate.
- the lower the credit score, the higher the rejection rate.

We also observe that there are no green points for loan amounts greater than USD 1,000,000. This is consistent with a rule that rejects applications for amounts above this threshold.

In [6]:
%brunel data('df') x(loanAmount) y(creditScore) color(approval:yellow-green) :: width=800, height=300

[Interactive Brunel output: scatter plot of creditScore against loanAmount, colored by approval]
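The threshold observation can be verified directly instead of being read off the plot. This is a minimal sketch on the five rows displayed earlier: it keeps the approved decisions and checks whether any of them exceeds USD 1,000,000. The constant name `AMOUNT_THRESHOLD` is illustrative.

```python
import pandas as pd

# loanAmount and approval values of the five rows shown by df.head()
sample = pd.DataFrame({
    "loanAmount": [136331, 516245, 982572, 286564, 603802],
    "approval": [True, False, False, False, False],
})

AMOUNT_THRESHOLD = 1_000_000  # threshold suggested by the scatter plot

# Boolean indexing keeps only the approved decisions.
approved = sample[sample["approval"]]

max_approved_amount = approved["loanAmount"].max()
approved_above = int((approved["loanAmount"] > AMOUNT_THRESHOLD).sum())

print("Largest approved amount:", max_approved_amount)
print("Approved decisions above the threshold:", approved_above)
```

Run against the notebook's `df`, a count of zero would support the presence of such a rule, while any non-zero count would point to decisions worth inspecting.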

<a id="viewamountdistribution"></a>
## 6. Loan amount distribution.
The loan amounts of the applications are visualized in a bar chart.
The bar chart shows a balanced distribution, as the input data has been randomly generated.

In [7]:
%brunel data('df') bar x(loanAmount) y(#count) bin(loanAmount) style("size:100%") :: width=800, height=300

[Interactive Brunel output: bar chart of the decision count per binned loanAmount]
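The counts behind such a bar chart can be reproduced with pandas binning. In the sketch below the bin edges are an assumption made for illustration, since Brunel's `bin()` chooses its own, and the data is limited to the five rows displayed earlier.

```python
import pandas as pd

# loanAmount values of the five rows shown by df.head()
sample = pd.DataFrame({"loanAmount": [136331, 516245, 982572, 286564, 603802]})

# Hypothetical equal-width bins of USD 250,000.
binned = pd.cut(
    sample["loanAmount"],
    bins=[0, 250_000, 500_000, 750_000, 1_000_000],
    labels=["0-250k", "250k-500k", "500k-750k", "750k-1M"],
)

# Count decisions per bin and list the bins in ascending order.
bin_counts = binned.value_counts().sort_index()
print(bin_counts)
```

On the full decision set, roughly equal counts per bin would confirm the balanced distribution.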

<a id="next"></a>
# Summary and next steps
You have manipulated dataframes and views of a decision set powered by IBM ODM and captured in a CSV format. You can expand this notebook by adapting the views and adding new ones to get more insights about your decisions and make better decisions in the future.
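As one possible extension, a derived column can feed a new view. The sketch below adds the share of yearly income taken by the yearly reimbursement, on the assumption that this approximates the ratio between income and debt that the rules evaluate. The column name `reimbursementToIncome` is hypothetical, and the data is limited to the five rows displayed earlier.

```python
import pandas as pd

# income, yearlyReimbursement and approval values of the five rows shown by df.head()
sample = pd.DataFrame({
    "income": [290532, 95440, 43242, 3773, 141075],
    "yearlyReimbursement": [
        13979.403261089992,
        48447.78364788128,
        108746.07131251293,
        30025.806398532397,
        69141.80143043594,
    ],
    "approval": [True, False, False, False, False],
})

# Hypothetical derived measure: share of yearly income spent on reimbursement.
sample = sample.assign(
    reimbursementToIncome=sample["yearlyReimbursement"] / sample["income"]
)

approved_ratios = sample.loc[sample["approval"], "reimbursementToIncome"]
rejected_ratios = sample.loc[~sample["approval"], "reimbursementToIncome"]

print(sample[["approval", "reimbursementToIncome"]])
```

Once the column exists on `df`, it can be plotted with Brunel in the same way as `income` or `loanAmount`.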

Copyright © 2018 IBM. This notebook and its source code are released under the terms of the MIT License.

<a id="authors"></a>
## Authors

Pierre Feillet is an engineer at the IBM Decision Lab. Pierre is an architect in decision automation, and is passionate about data science and machine learning.
