# AccelerateAI: Logistic Regression

## Assignment : Credit Grant Outcome

A group of 20 customers holds portfolios ranging from 0.5 to 6.5 million USD at one of the largest financial services majors in South America. As an analyst, you are tasked with finding out how the portfolio amount affects the probability of a customer getting a credit grant. Please refer to the dataset provided on GitHub: CreditGrantOutcome.csv.

Portfolio Value, given in million USD, is X. Credit Grant Decision is y, which is either 0 or 1, where 1 means the customer got a grant.

- Find the odds ratio for every customer record captured here. What is the odds ratio when Portfolio Value X = 2 mUSD?
- Find the optimum values of the coefficients beta_0 and beta_1.



In [1]:
import pandas as pd 
import statsmodels.api as sm
from statsmodels.api import Logit, add_constant

import warnings
warnings.filterwarnings('ignore')

In [2]:
df = pd.read_csv('./CreditGrantOutcome.csv') 

df.sample(5)

|    | PortfolioValue | CreditGrantDecision |
|----|----------------|---------------------|
| 10 | 2.75           | 1                   |
| 14 | 4.0            | 1                   |
| 16 | 4.5            | 1                   |
| 6  | 1.75           | 1                   |
| 17 | 4.75           | 1                   |


In [3]:
df.columns

Index(['PortfolioValue', 'CreditGrantDecision'], dtype='object')

In [4]:
y = df[['CreditGrantDecision']]
X = df[['PortfolioValue']]

In [5]:
log_reg1 = sm.Logit(y, X).fit()

# print the model summary
print(log_reg1.summary())

Optimization terminated successfully.
         Current function value: 0.639808
         Iterations 5
                            Logit Regression Results                           
Dep. Variable:     CreditGrantDecision   No. Observations:                   20
Model:                           Logit   Df Residuals:                       19
Method:                            MLE   Df Model:                            0
Date:                 Mon, 19 Sep 2022   Pseudo R-squ.:                 0.07695
Time:                         15:27:51   Log-Likelihood:                -12.796
converged:                        True   LL-Null:                       -13.863
Covariance Type:             nonrobust   LLR p-value:                       nan
                     coef    std err          z      P>|z|      [0.025      0.975]
----------------------------------------------------------------------------------
PortfolioValue     0.2179      0.157      1.390      0.165      -0.089       0.525


The approach above is incomplete: statsmodels' Logit does not add an intercept on its own, so we need to add a constant term to X ourselves.

We fitted the model above intentionally, to showcase a problem that comes up at times. Without the intercept/constant term, the feature-level statistical significance is not reliable. Note also that the summary reports Df Model = 0 and an LLR p-value of nan.

Now let's re-fit with the intercept/constant term included and see how the results look.
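One way to see why the no-intercept fit is problematic: without a constant, the log-odds are forced to equal beta_1 * X. The model therefore predicts a probability of exactly 0.5 at X = 0. With the positive slope 0.2179 reported in the first summary, it predicts more than 0.5 for every customer with a positive portfolio. The following is a minimal sketch using only the standard library. The helper names `sigmoid` and `p_no_intercept` are illustrative and not part of the notebook.

```python
import math

# Slope copied from the first (no-constant) summary in this notebook.
BETA_1_NO_CONST = 0.2179


def sigmoid(z: float) -> float:
    """Logistic function: maps log-odds z to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def p_no_intercept(x: float) -> float:
    """Predicted grant probability when the model has no constant term."""
    return sigmoid(BETA_1_NO_CONST * x)


# Portfolio values (million USD) chosen only for illustration.
for x in (0.0, 0.5, 2.0, 6.5):
    print(f"X={x}: p={p_no_intercept(x):.3f}")
```

Because the curve is pinned at 0.5 for X = 0, this model cannot express that customers with small portfolios are unlikely to receive a grant.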

In [6]:
# add intercept manually
X_const = add_constant(X)
# build model and fit training data
log_reg2 = Logit(y, X_const).fit()
# print the model summary
print(log_reg2.summary())

Optimization terminated successfully.
         Current function value: 0.401494
         Iterations 7
                            Logit Regression Results                           
Dep. Variable:     CreditGrantDecision   No. Observations:                   20
Model:                           Logit   Df Residuals:                       18
Method:                            MLE   Df Model:                            1
Date:                 Mon, 19 Sep 2022   Pseudo R-squ.:                  0.4208
Time:                         15:27:51   Log-Likelihood:                -8.0299
converged:                        True   LL-Null:                       -13.863
Covariance Type:             nonrobust   LLR p-value:                 0.0006365
                     coef    std err          z      P>|z|      [0.025      0.975]
----------------------------------------------------------------------------------
const             -4.0777      1.761     -2.316      0.021      -7.529      -0.626
Portfolio

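Hence beta_0 = -4.0777 (taken as -4.077 below) and beta_1 = 1.5046

Strictly speaking, exp(beta_0 + beta_1 * X) is the odds of getting a grant, which is what the assignment calls the odds ratio.

For X=2, Odds = exp(-4.077 + 2 * 1.5046)
              = exp(-1.0678)
              = 0.3438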
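For reference, the value computed above comes from the standard form of the logistic regression model. A short derivation follows, where p denotes the probability of a grant (a symbol the notebook does not use itself):

```latex
\log\frac{p}{1-p} = \beta_0 + \beta_1 X
\quad\Longrightarrow\quad
\frac{p}{1-p} = e^{\beta_0 + \beta_1 X}
\quad\Longrightarrow\quad
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}
```

The middle expression is the odds of a grant at portfolio value X. Increasing X by one unit multiplies those odds by e^{\beta_1}.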

In [8]:
import numpy as np
np.exp(-4.077 + 2 * 1.5046)

0.3437639668467007

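So we can generate the odds for every customer record in the same way. For X=2, the odds are approx. 0.344. Odds are not a percentage: they correspond to a grant probability of 0.344 / (1 + 0.344), i.e. approx. 0.256 or 25.6%.

Secondly, beta_0 = -4.077 and beta_1 = 1.5046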
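The per-customer calculation mentioned above could be sketched as follows. This is an illustration rather than the notebook's own code. It reuses the coefficients from In [8], evaluates X = 2 together with the five portfolio values visible in the `df.sample(5)` output, and uses only the standard library. The function names `odds` and `probability` are assumptions made for this example.

```python
import math

# Coefficients as used in the notebook's own calculation (In [8]).
BETA_0 = -4.077
BETA_1 = 1.5046


def odds(x: float) -> float:
    """Odds of a grant, p / (1 - p), at portfolio value x (million USD)."""
    return math.exp(BETA_0 + BETA_1 * x)


def probability(x: float) -> float:
    """Grant probability recovered from the odds: p = odds / (1 + odds)."""
    o = odds(x)
    return o / (1.0 + o)


# X = 2 plus the five portfolio values shown in the df.sample(5) output.
portfolio_values = [2.0, 1.75, 2.75, 4.0, 4.5, 4.75]

for x in portfolio_values:
    print(f"X={x:<5} odds={odds(x):8.4f}  probability={probability(x):.4f}")

# Odds ratio for a 1 million USD increase in portfolio value:
# the factor by which the odds are multiplied per unit of X.
odds_ratio_per_unit = math.exp(BETA_1)
print(f"odds ratio per unit of X: {odds_ratio_per_unit:.2f}")
```

With the dataset loaded, the same two functions could be applied to each value of `df['PortfolioValue']` to cover all 20 customers.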