# Bank Marketing

## Overview

The  data is related with direct marketing campaigns of a Portuguese banking institution. The marketing campaigns were based on phone calls. Often, more than one contact to the same client was required, in order to access if the product (bank term deposit) would be ('yes') or not ('no') subscribed.  

We chose this dataset because its big enough, so when we split it into train, validation and test, these subsets will be relatively big for our algorithms to learn well increasing their accuracy and reliability also because Marketing is a very crucial part of every business, so knowing how to win a customer sounded very interesting to us since we are anticipating on changing the world in the business field in future.

## Dataset Overview
### The Dataset has 20 Input Variables  (Features):

**#bank client data:**  
1 - _Age_ (numeric)  
2 - _Job_ **:** type of job (categorical: 'admin.', 'blue-collar', 'entrepreneur', 'housemaid', 'management', 'retired', 'self-employed', 'services', 'student', 'technician', 'unemployed', 'unknown')  
3 - _Marital_ **:** marital status (categorical: 'divorced','married','single','unknown'; note: 'divorced' means divorced or widowed)  
4 - _Education_ (categorical:   'basic.4y', 'basic.6y', 'basic.9y', 'high.school', 'illiterate', 'professional.course', 'university.degree', 'unknown')  
5 - _Default_ **:** has credit in default? (categorical: 'no', 'yes', 'unknown')
6 - _Housing_ **:** has housing loan? (categorical: 'no', 'yes', 'unknown')  
7 - _Loan_ **:** has personal loan? (categorical: 'no', 'yes', 'unknown')  

**#related with the last contact of the current campaign:**  
8 - _Contact_ **:** contact communication type (categorical: 'cellular', 'telephone')  
9 - _Month_ **:** last contact month of year (categorical: 'jan', 'feb', 'mar', ..., 'nov', 'dec')  
10 - _Day_of_week_ **:** last contact day of the week (categorical: 'mon', 'tue', 'wed', 'thu', 'fri')  
11 - _Duration_ **:** last contact duration, in seconds (numeric). Important note: this attribute highly affects the output target (e.g., if duration=0 then y='no'). Yet, the duration is not known before a call is performed. Also, after the end of the call y is obviously known. Thus, this input should only be included for benchmark purposes and should be discarded if the intention is to have a realistic predictive model.  

**#other attributes:**  
12 - _Campaign_ **:** number of contacts performed during this campaign and for this client (numeric, includes last contact)  
13 - _pDays_ **:** number of days that passed by after the client was last contacted from a previous campaign (numeric; 999 means client was not previously contacted)  
14 - _Previous_ **:** number of contacts performed before this campaign and for this client (numeric)  
15 - _pOutcome_ **:** outcome of the previous marketing campaign (categorical: 'failure', 'nonexistent', 'success')  

**#social and economic context attributes**  
16 - _emp.var.rate_ **:** employment variation rate - quarterly indicator (numeric)  
17 - _cons.price.idx_ **:** consumer price index - monthly indicator (numeric)  
18 - _cons.conf.idx_ **:** consumer confidence index - monthly indicator (numeric)  
19 - _Euribor3m_ **:** euribor 3 month rate - daily indicator (numeric)  
20 - _nr.employed_ **:** number of employees - quarterly indicator (numeric)  

### Output Variable (Desired Target):
y - has the client subscribed a term deposit? (binary: 'yes', 'no')

## Contributors:
* Phillip Moyo – 2185695   
* Moshito Charles Makgakga – 1445435   
* Godfrey T Chamunogwa – 2234379
* Fankholoro Vincent Sebothoma – 1671848   

# Neural Networks

# Import Libraries

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder,OneHotEncoder,MinMaxScaler

# Import the Dataset


In [2]:
df = pd.read_csv('bank-full.csv', sep=";")

## Trimming the Data
The Data is heavily biased, so we are trimming it to make the different output classes more even (unbiased)

In [3]:
df_yes = df[df['y']=='yes']
df_no = df[df['y']=='no']
df_no = df_no.iloc[:5289, :]

df = pd.concat([df_yes, df_no])
df = df.sample(frac=1).reset_index(drop=True)       #shuffle the rows
df

Unnamed: 0,age,job,marital,education,default,balance,housing,loan,contact,day,month,duration,campaign,pdays,previous,poutcome,y
0,42,blue-collar,single,primary,no,19,yes,no,unknown,7,may,158,1,-1,0,unknown,no
1,19,student,single,secondary,no,329,no,no,cellular,22,oct,252,2,-1,0,unknown,yes
2,72,retired,married,secondary,no,5715,no,no,cellular,17,may,1114,1,181,2,success,yes
3,28,technician,single,secondary,no,312,yes,no,unknown,6,may,392,1,-1,0,unknown,no
4,59,management,married,tertiary,no,5397,no,no,cellular,23,jun,671,3,-1,0,unknown,yes
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10573,32,technician,married,primary,no,62,yes,yes,unknown,13,may,120,1,-1,0,unknown,no
10574,42,management,married,tertiary,no,372,yes,no,cellular,4,aug,153,3,1,2,success,yes
10575,32,management,married,tertiary,no,0,yes,no,unknown,5,may,179,1,-1,0,unknown,no
10576,45,technician,married,primary,no,718,yes,no,unknown,14,may,135,1,-1,0,unknown,no


# Feature Scaling

In [4]:
##--scaling column(y)=> 'yes'=1 and 'no'=0 also @there are no null values in our dataset---##
y = LabelEncoder()
df.iloc[:,-1] = y.fit_transform(df.iloc[:,-1])

In [5]:
##--scaling column(poutcome)=> 'failure'=0, 'other'=1, 'success'=2, 'unknown'=3
poutcome = LabelEncoder()
df.iloc[:,-2] = poutcome.fit_transform(df.iloc[:,-2])

In [6]:
##--scaling column(contact)=> 'cellular'=0, 'telephone'=1, 'unknown'=2
contact = LabelEncoder()
df.iloc[:,8] = contact.fit_transform(df.iloc[:,8])

In [7]:
##--scaling column(marital)=> 'married'=1, 'divorced'=0, 'single'=2
marital = LabelEncoder()
df.iloc[:,2] = marital.fit_transform(df.iloc[:,2])

In [8]:
##--scaling column(education)=> 'primary'=0, 'secondary'=1, 'tertiary'=2, 'unknown'=3
education = LabelEncoder()
df.iloc[:,3] = education.fit_transform(df.iloc[:,3])

In [9]:
##--scaling column(default)=> 'yes'=1, 'no'=0'
default = LabelEncoder()
df.iloc[:,4] = default.fit_transform(df.iloc[:,4])

In [10]:
##--scaling column(housing)=> 'yes'=1, 'no'=0'
housing = LabelEncoder()
df.iloc[:,6] = housing.fit_transform(df.iloc[:,6])

In [11]:
##--scaling column(loan)=> 'yes'=1, 'no'=0'
loan = LabelEncoder()
df.iloc[:,7] = loan.fit_transform(df.iloc[:,7])

In [12]:
##--scaling column(month)=> 'jan'=1,'feb'=2, 'mar'=3,...,'dec'=12
month = df.iloc[:,10].replace(['jan','feb','mar','apr','may','jun','jul','aug','sep','oct','nov','dec'],[1,2,3,4,5,6,7,8,9,10,11,12])
df.iloc[:,10] = month 

In [13]:
##--scaling column(job)
job = df.iloc[:,1].replace(['blue-collar', 'admin.', 'technician', 'management', 'retired','student', 'entrepreneur', 'services', 'self-employed','unemployed', 'housemaid', 'unknown'],[1,2,3,4,5,6,7,8,9,10,11,12])
df.iloc[:,1] = job 

In [14]:
# add bias column of 1's
df.insert(0, 'bias', np.ones(df.shape[0]))

In [15]:
## scale the features
scaler = MinMaxScaler()
columns = ['age', 'job', 'marital', 'education', 'default', 'balance', 'housing', 'loan', 'contact', 'day', 'month', 'duration', 'campaign', 'pdays', 'previous', 'poutcome'] 
df[columns] = scaler.fit_transform(df[columns])
df

Unnamed: 0,bias,age,job,marital,education,default,balance,housing,loan,contact,day,month,duration,campaign,pdays,previous,poutcome,y
0,1.0,0.311688,0.000000,1.0,0.000000,0.0,0.039424,1.0,0.0,1.0,0.200000,0.363636,0.040217,0.000000,0.000000,0.000000,1.000000,0
1,1.0,0.012987,0.454545,1.0,0.333333,0.0,0.043092,0.0,0.0,0.0,0.700000,0.818182,0.064450,0.016129,0.000000,0.000000,1.000000,1
2,1.0,0.701299,0.363636,0.5,0.333333,0.0,0.106819,0.0,0.0,0.0,0.533333,0.363636,0.286672,0.000000,0.212865,0.034483,0.666667,1
3,1.0,0.129870,0.181818,1.0,0.333333,0.0,0.042891,1.0,0.0,1.0,0.166667,0.363636,0.100541,0.000000,0.000000,0.000000,1.000000,0
4,1.0,0.532468,0.272727,0.5,0.666667,0.0,0.103056,0.0,0.0,0.0,0.733333,0.454545,0.172467,0.032258,0.000000,0.000000,1.000000,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10573,1.0,0.181818,0.181818,0.5,0.000000,0.0,0.039933,1.0,1.0,1.0,0.400000,0.363636,0.030420,0.000000,0.000000,0.000000,1.000000,0
10574,1.0,0.311688,0.272727,0.5,0.666667,0.0,0.043601,1.0,0.0,0.0,0.100000,0.636364,0.038928,0.032258,0.002339,0.034483,0.666667,1
10575,1.0,0.181818,0.272727,0.5,0.666667,0.0,0.039199,1.0,0.0,1.0,0.133333,0.363636,0.045630,0.000000,0.000000,0.000000,1.000000,0
10576,1.0,0.350649,0.181818,0.5,0.000000,0.0,0.047695,1.0,0.0,1.0,0.433333,0.363636,0.034287,0.000000,0.000000,0.000000,1.000000,0


# Splitting the Dataset into Training, Validation and Test set

#### Training data (60% of the data)

In [16]:
# trining dataset
train_data = df.iloc[:6347:]

# training features
train_features = train_data.iloc[:,:-1].values

# training targets
train_targets = train_data.iloc[:,-1].values


#### Validation data (20% of the data)

In [17]:
# validation dataset
validate_data = df.iloc[6347:8463:]

# validation features
validate_features = validate_data.iloc[:,:-1].values

# validation targets
validate_targets = validate_data.iloc[:,-1].values


#### Testing data (20% of the data)

In [18]:
# testing dataset
test_data = df.iloc[8463::]

# testing features
test_features = test_data.iloc[:,:-1].values

# testing targets
test_targets = test_data.iloc[:,-1].values

# Training the Nueral Networks Model on the Training Set

#### Architectural veiw of the network
<img src="files/neuralNetwork.png"  width='800rem'>

#### Initial weights and they include the intercept (bias)

In [19]:
Q1 = np.random.uniform(0, 1, size=(2,17)) # 2x17
Q2 = np.random.uniform(0, 1, size=(1, 3)) # 2x3 -> 1X3

#### Useful functions

In [20]:
def sigmoid(x):
    return 1/(1+np.exp(-x))

In [21]:
# forward prop. X data point, W1, W2 weight for the layers
def forwardprop(X, W1, W2):
    a1 = sigmoid(W1 @ X)
    
    a1 = np.insert(a1,0,1)
    
    a2 = sigmoid(W2 @ a1)
    
    return [a1,a2]
# no change

In [22]:
def backprop(Y, h, W, a):
    e2 = h - Y
    e1 = (W.T @ e2) * (a *(1 - a)) 
    return [e1[1::], e2]
# no change

In [23]:
def grad(x, e1, a, e2):
    DL2 = np.array([e2]).T @ np.array([a])
    DL1 = np.array([e1]).T @ np.array([x])

    return [DL1, DL2]
# no change

In [24]:
def Errors(xValues,yValues,parametre1):
    hx1 = 1/(1 + np.exp(-(parametre1 @ xValues.T)))
    ans = -(1/len(yValues))*((yValues @ np.log(hx1).T +  (1 - yValues) @ np.log(1 - hx1).T))
    return ans

In [25]:
def regularizer(W,lambda_):
    y = W
    y[:,0] = 0
    return lambda_ * (W+y)

In [26]:
def NNlearning(data, target, Q1, Q2, alpha,epsalon, lambda_):
    W1 = np.zeros((Q1.shape[0],Q1.shape[1]))
    W2 = np.zeros((Q2.shape[0],Q2.shape[1]))
    itr = 0
    
    while (np.linalg.norm(Q1-W1) > epsalon and np.linalg.norm(Q2-W2) > epsalon) and (itr != 100):
        W1 = Q1
        W2 = Q2
        
        for i in range(len(data)):
            a, h = forwardprop(data[i], Q1, Q2) # forward prope
            e1, e2 = backprop(target[i], h, Q2, a) # backpropergation
            grad1, grad2 = grad(data[i], e1, a, e2) # gradients
            
            # gradient decent update
            Q1 = W1 - (1/len(W1))*(alpha*(grad1) + regularizer(W1,lambda_))
            Q2 = W2 - (1/len(W2))*(alpha*(grad2) + regularizer(W2,lambda_))
            
        itr = itr + 1
        
    return [Q1, Q2, itr]

In [27]:
# drows the confusion matrix
def printConfusionMatrix(predicted_test):
    #find the confusion matrix for the model
    class_0_0 = 0 # in class 0 and classified in class 0
    class_0_1 = 0 # in class 0 and classified in class 1
    class_1_0 = 0 # in class 1 and classified in class 0
    class_1_1 = 0 # in class 1 and classified in class 1
    predicted = np.round(predicted_test)
    for i in range(len(predicted)):
        if(predicted[i] == 0 and test_targets[i] == 0):
            ++class_0_0
        if(predicted[i] == 0 and test_targets[i] == 0):
            class_0_0 += 1

        elif (predicted[i] == 0 and test_targets[i] == 1):
            class_0_1 += 1

        elif (predicted[i] == 1 and test_targets[i] == 0):
            class_1_0 += 1

        elif (predicted[i] == 1 and test_targets[i] == 1):
            class_1_1 += 1
        else:
            print("i couldint classify: ", y_predicted)

    print('       confusion Matrix        ')
    print('-------------------------------')
    print('%-s %-7s %-s %-5s %-s %-5s %-s' %('|',' ','|','class 0','|','class 1','|',))
    print('-------------------------------')
    print('%-s %-5s %-s %-7i %-s %-7i %-s' %('|','class 0','|',class_0_0,'|',class_0_1,'|'))
    print('-------------------------------')
    print('%-s %-5s %-s %-7i %-s %-7i %-s' %('|','class 1','|',class_1_0,'|',class_1_1,'|'))
    print('-------------------------------')
    accuracy = ((class_0_0 + class_1_1) / (class_0_0 + class_0_1 + class_1_0 + class_1_1)) * 100
    print('The model is', accuracy,'%', 'acurate')

#### Training the model on the training data

In [28]:

alpha = 0.001
epsalon = 0.0001
lambda_ = 0.1
W_t1, W_t2, count_t  = NNlearning(train_features, train_targets, Q1, Q2, alpha, epsalon, lambda_)


#### Tweeking the hypaparameter using validation data
1. Alpha 
2. lambda
3. epsalon

In [56]:
# 0.0001
# 0.001
# 0.1
alpha = 0.0001
epsalon = 0.001
lambda_ = 0.1
W_v1, W_v2, count_v  = NNlearning(validate_features, validate_targets, Q1, Q2, alpha, epsalon, lambda_)

# Predicting the Test Results


In [57]:
predicted_test1 = np.array([])
for i in range(len(test_features)):
    predicted_test1 = np.append(predicted_test1,forwardprop(test_features[i],W_t1,W_t2)[1])

# validation
predicted_test2 = np.array([])
for i in range(len(test_features)):
    predicted_test2 = np.append(predicted_test2,forwardprop(test_features[i],W_v1,W_v2)[1])

# Analyzing the Accuracy of the Model



#### confusion matrix

In [60]:
print("MODEL WITHOUT VALIDATED PARAMETRES")
printConfusionMatrix(predicted_test1)
print()
print("MODEL WITH VALIDATED PARAMETRES")
printConfusionMatrix(predicted_test2)

MODEL WITHOUT VALIDATED PARAMETRES
       confusion Matrix        
-------------------------------
|         | class 0 | class 1 |
-------------------------------
| class 0 | 1055    | 1060    |
-------------------------------
| class 1 | 0       | 0       |
-------------------------------
The model is 49.881796690307326 % acurate

MODEL WITH VALIDATED PARAMETRES
       confusion Matrix        
-------------------------------
|         | class 0 | class 1 |
-------------------------------
| class 0 | 0       | 0       |
-------------------------------
| class 1 | 1055    | 1060    |
-------------------------------
The model is 50.11820330969267 % acurate


# Visualizing Set Results

