# Credit Card Application

We use logistic regression to model the propensity (probability) that a customer responds to a personal loan campaign. The fitted probabilities are used to classify customers, and the model coefficients indicate which factors influence the response. The objective is a model that identifies the clients most likely to accept the loan offer in upcoming personal loan campaigns.

### 1) Importing required libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
import os
import joblib
import itertools
import subprocess
from time import time
from scipy import stats
import scipy.optimize as opt  
from scipy.stats import chi2_contingency
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve




### 2) Importing and Descriptive Stats

BankABC wants to create a direct marketing channel to sell its loan products to customers who already hold deposit accounts. To test cross-selling personal loans to its existing clients, the bank ran a pilot campaign: an attractive personal loan offer with a processing charge waiver was sent to a random group of 20000 clients. The data contains information on the targeted clients together with their response to the marketing offer.

In [2]:
# READ DATA
data = pd.read_excel("CreditCard.xlsx") 
data.shape  

data.head()

Unnamed: 0,upgraded,purchases,extraCards
0,0,32.1007,0
1,1,34.3706,1
2,0,4.8749,0
3,0,8.1263,0
4,0,12.9783,0


In [3]:
import statsmodels.formula.api as sm 
import statsmodels.api as sma 
# glm stands for Generalized Linear Model
mylogit = sm.glm( formula = "upgraded ~ purchases + extraCards", 
    data = data, 
    family = sma.families.Binomial() ).fit() 

mylogit.summary()

Dep. Variable:,upgraded,No. Observations:,30.0
Model:,GLM,Df Residuals:,27.0
Model Family:,Binomial,Df Model:,2.0
Link Function:,logit,Scale:,1.0
Method:,IRLS,Log-Likelihood:,-10.038
Date:,"Fri, 24 Jun 2022",Deviance:,20.077
Time:,06:55:52,Pearson chi2:,18.5
No. Iterations:,6,,
Covariance Type:,nonrobust,,

,coef,std err,z,P>|z|,[0.025,0.975]
Intercept,-6.9398,2.947,-2.355,0.019,-12.716,-1.163
purchases,0.1395,0.068,2.049,0.040,0.006,0.273
extraCards,2.7743,1.193,2.326,0.020,0.437,5.112
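
As a rough illustration of how to read the coefficients above, the sketch below converts them to odds ratios and computes the fitted probability for one row of `data.head()`. The coefficient values are copied by hand from the summary; the helper name `predicted_probability` is introduced here for illustration and is not part of the original notebook.

```python
import math

# Coefficients copied from the GLM summary above
coef = {"Intercept": -6.9398, "purchases": 0.1395, "extraCards": 2.7743}

# exp(coef) is the multiplicative change in the odds of upgrading
# for a one-unit increase in that predictor, holding the other fixed
odds_ratios = {name: math.exp(b) for name, b in coef.items() if name != "Intercept"}

def predicted_probability(purchases, extra_cards):
    """Apply the inverse logit to the linear predictor (hypothetical helper)."""
    z = (coef["Intercept"]
         + coef["purchases"] * purchases
         + coef["extraCards"] * extra_cards)
    return 1.0 / (1.0 + math.exp(-z))

# Row with index 1 in data.head(): purchases = 34.3706, extraCards = 1
p_row1 = predicted_probability(34.3706, 1)

print(odds_ratios)
print(round(p_row1, 3))
```

In practice `mylogit.predict(data)` should give the same probabilities without copying numbers by hand; the manual version is only meant to make the arithmetic visible.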


In [4]:
#Train-Test Split
credittrain, credittest = train_test_split(data, train_size=0.70, random_state=1)

In [6]:
import statsmodels.api as sm

# defining the dependent and independent variables
Xtrain = credittrain[['purchases', 'extraCards']]
ytrain = credittrain[['upgraded']]
   
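# building the model and fitting the data
# note: unlike the formula interface used earlier, sm.Logit does not add an
# intercept automatically, so this model is fit without a constant term
log_reg = sm.Logit(ytrain, Xtrain).fit()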

Optimization terminated successfully.
         Current function value: 0.491848
         Iterations 6


In [7]:
print(log_reg.summary())

                           Logit Regression Results                           
Dep. Variable:               upgraded   No. Observations:                   21
Model:                          Logit   Df Residuals:                       19
Method:                           MLE   Df Model:                            1
Date:                Fri, 24 Jun 2022   Pseudo R-squ.:                  0.2273
Time:                        06:56:26   Log-Likelihood:                -10.329
converged:                       True   LL-Null:                       -13.367
Covariance Type:            nonrobust   LLR p-value:                   0.01370
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
purchases     -0.0556      0.027     -2.070      0.038      -0.108      -0.003
extraCards     3.4853      1.440      2.420      0.016       0.662       6.308


In [8]:
Xtest = credittest[['purchases', 'extraCards']]
ytest = credittest['upgraded']
  
# performing predictions on the test dataset
yhat = log_reg.predict(Xtest)
prediction = list(map(round, yhat))
  
# comparing original and predicted values of y
print('Actual values', list(ytest.values))
print('Predictions :', prediction)

Actual values [1, 1, 1, 1, 1, 0, 0, 0, 1]
Predictions : [1, 1, 1, 1, 1, 1, 0, 0, 0]
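
The cell above turns probabilities into classes with the built-in `round`. An explicit cutoff does the same job but makes the decision rule visible and adjustable. The sketch below uses hypothetical probabilities (not model output) to show the one case where the two differ: Python rounds an exact 0.5 to the nearest even integer, which is 0.

```python
# Hypothetical probabilities, not taken from the model above
example_probs = [0.2, 0.5, 0.8]

# Built-in round() sends an exact 0.5 to the nearest even integer (0)
rounded = [round(p) for p in example_probs]

# An explicit cutoff states the rule directly and is easy to change
threshold = 0.5
thresholded = [int(p >= threshold) for p in example_probs]

print(rounded)      # [0, 0, 1]
print(thresholded)  # [0, 1, 1]
```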


In [9]:
from sklearn.metrics import (confusion_matrix, 
                           accuracy_score)
  
# confusion matrix
cm = confusion_matrix(ytest, prediction) 
print ("Confusion Matrix : \n", cm) 
  
# accuracy score of the model
print('Test accuracy = ', accuracy_score(ytest, prediction))
# accuracy is 0.778 (7 of 9 test cases correct), above 0.7;
# with only 9 test observations this is a very rough estimate

Confusion Matrix : 
 [[2 1]
 [1 5]]
Test accuracy =  0.7777777777777778
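
Accuracy alone can hide how the errors are distributed. As a small worked example, the counts in the confusion matrix can be recomputed from the printed actual and predicted lists, together with precision, recall and the accuracy of always predicting the majority class. The variable names below are chosen for this sketch.

```python
# Lists copied from the printed output above
actual = [1, 1, 1, 1, 1, 0, 0, 0, 1]
predicted = [1, 1, 1, 1, 1, 1, 0, 0, 0]

pairs = list(zip(actual, predicted))
tp = sum(a == 1 and p == 1 for a, p in pairs)  # true positives
tn = sum(a == 0 and p == 0 for a, p in pairs)  # true negatives
fp = sum(a == 0 and p == 1 for a, p in pairs)  # false positives
fn = sum(a == 1 and p == 0 for a, p in pairs)  # false negatives

accuracy = (tp + tn) / len(actual)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

# Accuracy obtained by always predicting the most common class
positives = sum(actual)
baseline = max(positives, len(actual) - positives) / len(actual)

print(tp, tn, fp, fn)
print(accuracy, precision, recall, baseline)
```

On this test set the majority-class baseline is 6/9, so the model's 7/9 corresponds to a single additional correct prediction.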


In [None]:
#fores



# define dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=3)
# summarize the dataset
print(X.shape, y.shape)