# **Marketing Campaign for Banking Products**

# **Objective**
The classification goal is to predict the likelihood of a liability customer buying a personal loan.

# **Context**

The bank has a growing customer base and wants to expand its base of borrowers (asset customers) to bring in more loan business and earn more through interest on loans. To do so, the bank wants to convert its liability customers into personal loan customers while retaining them as depositors. A campaign that the bank ran last year for liability customers achieved a healthy conversion rate of over 9%. The department wants you to build a model that identifies the potential customers who are more likely to purchase the loan. Targeting those customers should increase the campaign's success rate while reducing its cost.

# **Data Description**


The file Bank.xls contains data on 5000 customers. The data include customer demographic information (age, income, etc.), the customer's relationship with the bank (mortgage, securities account, etc.), and the customer response to the last personal loan campaign (Personal Loan). Among these 5000 customers, only 480 (= 9.6%) accepted the personal loan that was offered to them in the earlier campaign.
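Because only 9.6% of customers accepted the loan, accuracy alone is a weak yardstick for this problem: a model that always predicts "no loan" would already be right for the remaining 90.4%. The arithmetic below uses only the counts stated above.

```python
# Class balance implied by the counts quoted above (480 acceptors out of 5000 customers)
n_customers = 5000
n_accepted = 480

acceptance_rate = n_accepted / n_customers          # share of positive examples
majority_baseline_accuracy = 1 - acceptance_rate    # accuracy of always predicting "no loan"

print(f"Acceptance rate: {acceptance_rate:.1%}")
print(f"Always-'no' baseline accuracy: {majority_baseline_accuracy:.1%}")
```

Any model worth deploying should be judged on how well it finds the minority class (for example with recall or precision on the acceptors), not only on overall accuracy.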

Data Link: https://www.kaggle.com/itsmesunil/bank-loan-modelling/download

## Data Attribute Information:

● ID: Customer ID

● Age: Customer's age in completed years

● Experience: Number of years of professional experience

● Income: Annual income of the customer ($000)

● ZIP Code: Home Address ZIP code.

● Family: Family size of the customer

● CCAvg: Avg. spending on credit cards per month ($000)

● Education: Education Level. 1: Undergrad; 2: Graduate; 3: Advanced/Professional

● Mortgage: Value of house mortgage if any. ($000)

● Personal Loan: Did this customer accept the personal loan offered in the last campaign?

● Securities Account: Does the customer have a securities account with the bank?

● CD Account: Does the customer have a certificate of deposit (CD) account with the bank?

● Online: Does the customer use internet banking facilities?

● Credit card: Does the customer use a credit card issued by the bank?
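Five of the attributes above are yes/no flags, so a quick sanity check is that each of them holds only 0 and 1 after loading. The sketch below assumes the spreadsheet column names are `Personal Loan`, `Securities Account`, `CD Account`, `Online` and `CreditCard`; the last spelling is an assumption (the list above writes "Credit card"), so compare it with `data.columns` before relying on it. The toy frame is hypothetical and exists only to show the report.

```python
import pandas as pd

# ASSUMED column names for the binary flags; verify against data.columns
BINARY_COLUMNS = ["Personal Loan", "Securities Account", "CD Account", "Online", "CreditCard"]

def check_binary_columns(df: pd.DataFrame, columns=BINARY_COLUMNS) -> dict:
    """Report, for each expected flag column, whether it exists and holds only 0/1."""
    report = {}
    for col in columns:
        if col not in df.columns:
            report[col] = "missing"
        elif set(df[col].dropna().unique()) <= {0, 1}:
            report[col] = "ok"
        else:
            report[col] = "non-binary values"
    return report

# Hypothetical toy frame: "CD Account" has an invalid value, "CreditCard" is absent
toy = pd.DataFrame({
    "Personal Loan": [0, 1, 0],
    "Securities Account": [1, 0, 0],
    "CD Account": [0, 0, 2],
    "Online": [1, 1, 0],
})
print(check_binary_columns(toy))
```

On the real data the call would be `check_binary_columns(data)`.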

# CODE

In [None]:
#Importing Libraries
import numpy as np # linear algebra
import pandas as pd # data processing
import matplotlib.pyplot as plt #graph plotting
import seaborn as sns
%matplotlib inline
sns.set(style="ticks")
from scipy.stats import zscore
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

"""print("Hello1")
from scipy.stats import zscore
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn import model_selection
print("Hello2")"""

In [None]:
#Importing Dataset (the 'Data' sheet of the workbook)
data = pd.read_excel('Bank_Personal_Loan_Modelling.xlsx', sheet_name='Data')

In [None]:
#Displaying number of rows and columns
data.shape

In [None]:
#Displaying the first 10 rows
data.head(10)

In [None]:
data.tail()

In [None]:
data.dtypes

In [None]:
#Displaying the column names
data.columns

In [None]:
data.info()

In [None]:
data.describe()

In [None]:
#Counting the missing values in each column
data.apply(lambda x: sum(x.isnull().values), axis = 0) # For columns

In [None]:
#Counting the missing values in each row
data.apply(lambda x: sum(x.isnull().values), axis = 1) # For rows

In [None]:
#Total number of missing values in the whole DataFrame
np.count_nonzero(data.isnull().values)
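The three missing-value checks count the same cells from different angles, and pandas offers a shorter spelling for the per-column and per-row versions: `isnull().sum()`. A sketch on a hypothetical three-row frame shows that the per-column sum, the per-row sum and the overall count agree.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with two missing cells
toy = pd.DataFrame({"Age": [25, np.nan, 40], "Income": [49, 34, np.nan]})

per_column = toy.isnull().sum()       # same counts as apply(lambda x: sum(x.isnull().values), axis=0)
per_row = toy.isnull().sum(axis=1)    # same counts as the axis=1 version
total = np.count_nonzero(toy.isnull().values)

# Every view counts the same missing cells, so the grand totals must match
print(per_column.sum(), per_row.sum(), total)
```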

In [None]:
#Number of unique values in each column
data.apply(lambda x: len(x.unique()))
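`len(x.unique())` and `DataFrame.nunique()` give the same answer here only because the data has no missing values: `unique()` keeps NaN as a value, while `nunique()` drops it unless `dropna=False` is passed. A small hypothetical column makes the difference visible.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({"Family": [1, 2, 2, np.nan]})   # hypothetical values, one missing

via_unique = toy.apply(lambda x: len(x.unique())).tolist()   # NaN counted as a value
via_nunique = toy.nunique().tolist()                          # NaN ignored
via_nunique_keep_nan = toy.nunique(dropna=False).tolist()     # NaN counted again

print(via_unique, via_nunique, via_nunique_keep_nan)   # [3] [2] [3]
```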

In [None]:
#Removing the ID column: it is only an identifier and carries no predictive information
data.drop('ID',axis=1,inplace=True)

In [None]:
data.describe()

In [None]:
#Age Distribution
data.hist(column="Age")

In [None]:
sns.histplot(data['Age'], kde=True)  # histplot replaces the deprecated distplot

In [None]:
sns.histplot(data['Experience'], kde=True)

In [None]:
sns.histplot(data['Income'], kde=True)

In [None]:
sns.histplot(data['CCAvg'], kde=True)

In [None]:
sns.histplot(data['Mortgage'], kde=True)

In [None]:
#Skewness of each column (positive values indicate a long right tail)
data.skew()
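Positive values from `data.skew()` indicate a long right tail. The notebook does not transform any column; if a model later proves sensitive to such tails (for example Income, CCAvg or Mortgage, if their skew values are large), one common option is `np.log1p`, which is defined at zero and therefore also handles customers with no mortgage. The sketch below uses synthetic lognormal data as a stand-in, not the bank data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic right-skewed column (NOT the real data), standing in for e.g. Income
s = pd.Series(rng.lognormal(mean=0.0, sigma=1.0, size=10_000))

skew_before = s.skew()
skew_after = np.log1p(s).skew()   # log(1 + x): safe for zero values

print(f"skew before: {skew_before:.2f}, after log1p: {skew_after:.2f}")
```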

In [None]:
data_z= data.apply(zscore)
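`scipy.stats.zscore` standardizes each column as (x - mean) / std, using the population standard deviation (`ddof=0`) by default. The sketch below, on a hypothetical two-column frame, reproduces that by hand and confirms the two agree.

```python
import numpy as np
import pandas as pd
from scipy.stats import zscore

toy = pd.DataFrame({"Age": [25, 35, 45, 55], "Income": [49, 34, 11, 100]})   # hypothetical values

# Manual z-score with the population standard deviation, matching scipy's default
manual = (toy - toy.mean()) / toy.std(ddof=0)
via_scipy = toy.apply(zscore)

print(np.allclose(manual.values, via_scipy.values))   # True
```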

In [None]:
plt.matshow(data_z.corr())

In [None]:
data_z.corr()
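Standardizing does not change the correlation matrix: Pearson correlation is unaffected by shifting a column or rescaling it by a positive factor, so `data_z.corr()` and `data.corr()` give the same numbers. A check on synthetic data (an assumption, not the bank data):

```python
import numpy as np
import pandas as pd
from scipy.stats import zscore

rng = np.random.default_rng(1)
# Synthetic columns with an induced dependence (NOT the real data)
toy = pd.DataFrame({"Income": rng.normal(70, 40, 200), "CCAvg": rng.normal(2, 1.5, 200)})
toy["CCAvg"] += 0.02 * toy["Income"]

raw_corr = toy.corr()
z_corr = toy.apply(zscore).corr()

print(np.allclose(raw_corr.values, z_corr.values))   # True
```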

In [None]:
sns.heatmap(data_z.corr(),vmin=-1,vmax=1,cmap='seismic')

In [None]:
data_z.var()
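`data_z.var()` will not print exactly 1 for each column. `zscore` divides by the `ddof=0` standard deviation, whereas `DataFrame.var()` defaults to the sample variance (`ddof=1`), so each standardized column reports n / (n - 1); with 5000 rows that is about 1.0002. A toy column (hypothetical values) shows the effect more strongly:

```python
import numpy as np
import pandas as pd
from scipy.stats import zscore

toy = pd.DataFrame({"Age": [25, 35, 45, 55, 65]})   # hypothetical values
z = toy.apply(zscore)
n = len(toy)

# Sample variance (ddof=1, the pandas default) of a ddof=0 z-score equals n / (n - 1)
print(z.var().iloc[0], n / (n - 1))
# Population variance (ddof=0) is 1, up to floating-point rounding
print(z.var(ddof=0).iloc[0])
```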

In [None]:
sns.pairplot(data_z)

In [None]:
data['Personal Loan'].value_counts()

In [None]:
sns.countplot(x='Personal Loan', data=data)

In [None]:
df = data
loan_true = len(df.loc[df['Personal Loan'] == 1])
loan_false = len(df.loc[df['Personal Loan'] == 0])
total = loan_true + loan_false
print("Number of customers who accepted the loan: {0} ({1:2.2f}%)".format(loan_true, loan_true / total * 100))
print("Number of customers who did not accept the loan: {0} ({1:2.2f}%)".format(loan_false, loan_false / total * 100))

## **STEP 5**

In [None]:
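# Features: the standardized predictors; target: the original 0/1 "Personal Loan" labels
# (taking the target from data_z would give continuous z-scores that a classifier cannot use)
bank_feature_df = data_z.drop(labels="Personal Loan", axis=1)
bank_labels = data["Personal Loan"]
X = np.array(bank_feature_df)
Y = np.array(bank_labels)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.20, random_state=1)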
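`LogisticRegression` is imported at the top of the notebook but not yet used. A minimal sketch of the next step follows, run on synthetic data from `make_classification` as an assumed stand-in for the real features, with roughly the 9.6% positive rate quoted earlier. It also passes `stratify=Y`, which the split above does not: with so few positives, stratifying keeps the acceptance rate the same in the training and test sets. `predict_proba` then returns the estimated likelihood of accepting the loan, which is what the objective asks for.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in (NOT the bank data): 5000 rows, 12 features, about 9.6% positives
X, Y = make_classification(n_samples=5000, n_features=12, weights=[0.904], random_state=1)

# stratify=Y preserves the positive share in both splits
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.20, random_state=1, stratify=Y
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, Y_train)

# Column 1 of predict_proba is the estimated probability of class 1 (loan accepted)
p_accept = model.predict_proba(X_test)[:, 1]

print(f"positive share train: {Y_train.mean():.3f}, test: {Y_test.mean():.3f}")
print(f"test accuracy: {model.score(X_test, Y_test):.3f}")
```

Ranking customers by `p_accept` and contacting the highest-scoring ones first is one way to turn these probabilities into a targeted campaign list.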

**Till NOW**

In [None]:
# There are 52 records with negative experience; they need to be cleaned before proceeding any further
data[data['Experience'] < 0]['Experience'].count()

In [30]:
#clean the negative variable
dfExp = data.loc[data['Experience'] >= 0]   # reference rows: valid experience, zero years included
negExp = data.Experience < 0                 # boolean mask of rows with negative experience
# The ID column was dropped earlier, so the row index identifies these customers
mylist = data.loc[negExp].index.tolist()

In [None]:
# there are 52 records with negative experience
negExp.value_counts()

In [None]:
# Medians can be fractional (e.g. 2.5), so store Experience as float before filling
data['Experience'] = data['Experience'].astype(float)
for idx in mylist:
    age = data.at[idx, 'Age']
    education = data.at[idx, 'Education']
    # Median experience of valid customers with the same age and education level
    df_filtered = dfExp[(dfExp.Age == age) & (dfExp.Education == education)]
    exp = df_filtered['Experience'].median()   # NaN if no such customer exists
    data.loc[idx, 'Experience'] = exp
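The loop above replaces each negative experience value with the median experience of customers who share the same age and education level. The same idea can be written without a Python loop using `groupby`. The sketch below runs on a hypothetical five-row frame and assumes the reference set is every row with experience >= 0; a group with no valid rows would receive NaN, just as the loop would.

```python
import pandas as pd

# Hypothetical toy data: row 2 has an impossible negative experience value
toy = pd.DataFrame({
    "Age":        [25, 25, 25, 30, 30],
    "Education":  [1, 1, 1, 2, 2],
    "Experience": [1, 3, -1, 5, 6],
})

valid = toy["Experience"] >= 0
# Median experience of valid customers in each (Age, Education) group
group_median = toy[valid].groupby(["Age", "Education"])["Experience"].median()

# Look up each row's group median, then use it only where experience is negative
lookup = pd.MultiIndex.from_frame(toy[["Age", "Education"]])
fill_values = pd.Series(group_median.reindex(lookup).values, index=toy.index)
toy["Experience"] = toy["Experience"].where(valid, fill_values)

print(toy["Experience"].tolist())   # [1.0, 3.0, 2.0, 5.0, 6.0]
```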

In [None]:
# checking if there are records with negative experience
data[data['Experience'] < 0]['Experience'].count()

In [None]:
data.describe().transpose()