# Naive Bayes Classifier - Personal Loan Acceptance

This program is a solution to problem 8.1 in chapter 8 of the following book:

Data Mining for Business Analytics: Concepts, Techniques, and Applications in Python, First Edition.

Galit Shmueli, Peter C. Bruce, Peter Gedeck, and Nitin R. Patel

© 2020 John Wiley & Sons, Inc. Published 2020 by John Wiley & Sons, Inc.

## Importing Libraries

In [1]:
import pandas as pd
import sklearn
from sklearn.model_selection import train_test_split

Printing versions of libraries

In [2]:
print('pandas version: {}'.format(pd.__version__))
print('sklearn version: {}'.format(sklearn.__version__))

pandas version: 1.5.3
sklearn version: 1.2.1


## Loading Dataset

In [3]:
df = pd.read_csv('UniversalBank.csv')
df.head()

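| | ID | Age | Experience | Income | ZIP Code | Family | CCAvg | Education | Mortgage | Personal Loan | Securities Account | CD Account | Online | CreditCard |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 25 | 1 | 49 | 91107 | 4 | 1.6 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
| 1 | 2 | 45 | 19 | 34 | 90089 | 3 | 1.5 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
| 2 | 3 | 39 | 15 | 11 | 94720 | 1 | 1.0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 4 | 35 | 9 | 100 | 94112 | 1 | 2.7 | 2 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 5 | 35 | 8 | 45 | 91330 | 4 | 1.0 | 2 | 0 | 0 | 0 | 0 | 0 | 1 |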


Converting to categorical

In this exercise we are concerned with only three variables: Online, CreditCard, and Personal Loan. We therefore convert these variables to the categorical type.

In [4]:
df.Online = df.Online.astype('category')
df.CreditCard = df.CreditCard.astype('category')
df['Personal Loan'] = df['Personal Loan'].astype('category')
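
The same conversion can also be written as a single call. The following is a minimal sketch, assuming a small made-up frame named `toy` (hypothetical data, not the UniversalBank records), that passes a column-to-dtype mapping to `astype`:

```python
import pandas as pd

# Hypothetical toy data standing in for the three columns of interest
toy = pd.DataFrame({
    'Online': [0, 1, 1],
    'CreditCard': [1, 0, 0],
    'Personal Loan': [0, 0, 1],
})

# astype accepts a column-to-dtype mapping, so all three columns convert together
cols = ['Online', 'CreditCard', 'Personal Loan']
toy = toy.astype({col: 'category' for col in cols})

print(toy.dtypes)
# Each categorical column exposes its distinct levels
print(toy['Online'].cat.categories.tolist())
```

Checking `dtypes` after the conversion is a quick way to confirm that every column of interest is categorical.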

## Partitioning Data

Partition the data into training (60%) and validation (40%) sets.

In [5]:
# Split the three columns of interest into training (60%) and validation (40%) sets; a fixed random_state makes the split reproducible
train_df, valid_df = train_test_split(df[['CreditCard', 'Personal Loan', 'Online']], test_size=0.4, random_state = 1)
display(train_df.head())

| | CreditCard | Personal Loan | Online |
|---|---|---|---|
| 4522 | 0 | 0 | 0 |
| 2851 | 0 | 0 | 1 |
| 2313 | 1 | 0 | 1 |
| 982 | 1 | 0 | 0 |
| 1164 | 0 | 1 | 1 |
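
To see how `test_size=0.4` divides the rows, here is a minimal sketch on a made-up 10-row frame (the `toy` data and its variable names are assumptions for illustration, not the UniversalBank records):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 customers, not the UniversalBank records
toy = pd.DataFrame({
    'CreditCard': [0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
    'Online':     [1, 1, 0, 1, 0, 1, 0, 1, 1, 0],
})

# test_size=0.4 holds out 40% of the rows; random_state fixes the shuffle
toy_train, toy_valid = train_test_split(toy, test_size=0.4, random_state=1)

# Sizes of the two partitions
print(len(toy_train), len(toy_valid))
# The two partitions share no rows
print(toy_train.index.intersection(toy_valid.index).empty)
```

With 10 rows this gives 6 training rows and 4 validation rows. The original row labels are kept, which is why the training data above shows shuffled index values.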


## Create a pivot table for the training data

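Create a pivot table for the training data with Online as the column variable, CC as the row variable, and Loan as the secondary row variable, where each cell holds a count. The table is built with the pandas DataFrame methods melt() and pivot_table(). Note that the code sums the 0/1 Online indicator, so the resulting table has a single Online column holding the number of customers with Online = 1 in each CreditCard/Personal Loan group.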

In [38]:
# Melt the Online column into long form: 'Online' holds the variable name and 'Count' holds its 0/1 value
train_df_melted = train_df.melt(id_vars=['CreditCard', 'Personal Loan'], var_name='Online', value_name='Count')

# Sum the 0/1 values per (CreditCard, Personal Loan) group, i.e. count the customers with Online = 1
train_df_pivoted = train_df_melted.pivot_table(index=['CreditCard', 'Personal Loan'], columns='Online', values='Count', aggfunc='sum')
# Display the pivot table
display(train_df_pivoted)

| CreditCard | Personal Loan | Online |
|---|---|---|
| 0 | 0 | 1117 |
| 0 | 1 | 126 |
| 1 | 0 | 477 |
| 1 | 1 | 49 |
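
The table above contains only the counts for Online = 1. A table with a separate column for each Online value could be built with `pd.crosstab`. The following is a minimal sketch on made-up data (the `toy` frame and the names `counts`, `accepted`, and `total` are assumptions for illustration, not part of the original solution):

```python
import pandas as pd

# Hypothetical toy data, not the UniversalBank records
toy = pd.DataFrame({
    'CreditCard':    [1, 1, 1, 0, 0, 1, 0, 1],
    'Personal Loan': [0, 1, 0, 0, 1, 0, 0, 1],
    'Online':        [1, 1, 0, 1, 0, 1, 1, 1],
})

# Rows: CreditCard, then Personal Loan; columns: Online; cells: customer counts
counts = pd.crosstab(
    index=[toy['CreditCard'], toy['Personal Loan']],
    columns=toy['Online'],
)
print(counts)

# P(Loan = 1 | CC = 1, Online = 1), read from the Online = 1 column
accepted = counts.loc[(1, 1), 1]
total = counts.loc[(1, 0), 1] + counts.loc[(1, 1), 1]
print(accepted / total)
```

Because `crosstab` counts rows rather than summing an indicator, it also reports the Online = 0 cells, which the other parts of the exercise may need.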


Consider the task of classifying a customer who owns a bank credit card and actively uses online banking services. Using the pivot table created above, we calculate the probability that this customer will accept the loan offer. This is the probability of loan acceptance (Loan = 1) conditional on having a bank credit card (CC = 1) and being an active user of online banking services (Online = 1).
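
Written out as a formula, with the counts taken from the CC = 1 rows of the pivot table above (the notation $n(\cdot)$ for "number of training records" is introduced here for illustration):

```latex
P(\text{Loan}=1 \mid \text{CC}=1,\ \text{Online}=1)
  = \frac{n(\text{Loan}=1,\ \text{CC}=1,\ \text{Online}=1)}{n(\text{CC}=1,\ \text{Online}=1)}
  = \frac{49}{49 + 477}
  = \frac{49}{526}
  \approx 0.093
```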

In [40]:
# Cross-check the pivot table by counting the same rows directly in the training data.
total_count = len(train_df[(train_df['CreditCard'] == 1) & (train_df['Online'] == 1)].index)
print(total_count)
personal_loan_n_count = len(train_df[(train_df['CreditCard'] == 1) & (train_df['Online'] == 1) & (train_df['Personal Loan'] == 0)].index)
print(personal_loan_n_count)
personal_loan_y_count = len(train_df[(train_df['CreditCard'] == 1) & (train_df['Online'] == 1) & (train_df['Personal Loan'] == 1)].index)
print(personal_loan_y_count)

# Calculate the probability that the customer described above will accept the loan offer.
# Probability = (number of desired or successful outcomes) / (total number of possible outcomes).
# The number of successful outcomes is the numerator; the total number of possible outcomes is the denominator.
numerator = 49  # CC = 1, Loan = 1 count, read from the pivot table above
denominator = 49 + 477  # sum of both CC = 1 counts, read from the pivot table above
probability = numerator / denominator
print('Probability:', probability)

526
477
49
Probability: 0.09315589353612168


## Result

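Therefore, the probability that a customer who holds a bank credit card and actively uses online banking will accept the loan offer is 49/526 ≈ 0.093 (about 9.3%), as estimated from the training data.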