# Bank Marketing

## Overview

The data relates to the direct marketing campaigns of a Portuguese banking institution. The campaigns were based on phone calls. Often, more than one contact with the same client was required in order to assess whether the product (a bank term deposit) would be subscribed ('yes') or not ('no').

We chose this dataset for two reasons. First, it is big enough that the training, validation and test subsets remain relatively large after splitting, which should help our algorithms learn well and improve their accuracy and reliability. Second, marketing is a crucial part of every business, so knowing how to win a customer sounded very interesting to us, since we hope to change the world in the business field in the future.

## Dataset Overview
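### The Dataset Has 20 Input Variables (Features):

Note: the list below describes the 20-feature version of the dataset. The `bank-full.csv` file loaded later in this notebook shows 16 input columns, including `balance` and `day`, and none of the social and economic context attributes.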

**#bank client data:**  
1 - _Age_ (numeric)  
2 - _Job_ **:** type of job (categorical: 'admin.', 'blue-collar', 'entrepreneur', 'housemaid', 'management', 'retired', 'self-employed', 'services', 'student', 'technician', 'unemployed', 'unknown')  
3 - _Marital_ **:** marital status (categorical: 'divorced','married','single','unknown'; note: 'divorced' means divorced or widowed)  
4 - _Education_ (categorical:   'basic.4y', 'basic.6y', 'basic.9y', 'high.school', 'illiterate', 'professional.course', 'university.degree', 'unknown')  
5 - _Default_ **:** has credit in default? (categorical: 'no', 'yes', 'unknown')  
6 - _Housing_ **:** has housing loan? (categorical: 'no', 'yes', 'unknown')  
7 - _Loan_ **:** has personal loan? (categorical: 'no', 'yes', 'unknown')  

**#related with the last contact of the current campaign:**  
8 - _Contact_ **:** contact communication type (categorical: 'cellular', 'telephone')  
9 - _Month_ **:** last contact month of year (categorical: 'jan', 'feb', 'mar', ..., 'nov', 'dec')  
10 - _Day_of_week_ **:** last contact day of the week (categorical: 'mon', 'tue', 'wed', 'thu', 'fri')  
11 - _Duration_ **:** last contact duration, in seconds (numeric). Important note: this attribute highly affects the output target (e.g., if duration=0 then y='no'). Yet, the duration is not known before a call is performed. Also, after the end of the call y is obviously known. Thus, this input should only be included for benchmark purposes and should be discarded if the intention is to have a realistic predictive model.  

**#other attributes:**  
12 - _Campaign_ **:** number of contacts performed during this campaign and for this client (numeric, includes last contact)  
13 - _pDays_ **:** number of days that passed by after the client was last contacted from a previous campaign (numeric; 999 means client was not previously contacted)  
14 - _Previous_ **:** number of contacts performed before this campaign and for this client (numeric)  
15 - _pOutcome_ **:** outcome of the previous marketing campaign (categorical: 'failure', 'nonexistent', 'success')  

**#social and economic context attributes**  
16 - _emp.var.rate_ **:** employment variation rate - quarterly indicator (numeric)  
17 - _cons.price.idx_ **:** consumer price index - monthly indicator (numeric)  
18 - _cons.conf.idx_ **:** consumer confidence index - monthly indicator (numeric)  
19 - _Euribor3m_ **:** euribor 3 month rate - daily indicator (numeric)  
20 - _nr.employed_ **:** number of employees - quarterly indicator (numeric)  

### Output Variable (Desired Target):
y - has the client subscribed to a term deposit? (binary: 'yes', 'no')

## Contributors:
* Phillip Moyo – 2185695   
* Moshito Charles Makgakga – 1445435   
* Godfrey T Chamunogwa – 2234379
* Fankholoro Vincent Sebothoma – 1671848   

# Decision Trees Model

## Import Libraries

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.preprocessing import LabelEncoder,OneHotEncoder,MinMaxScaler

## Import the Dataset


In [2]:
df = pd.read_csv('bank-full.csv', sep=";")

## Trimming the Data
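The data is heavily imbalanced (far more 'no' than 'yes' outcomes), so we trim the 'no' class to 5289 rows to make the two output classes more even. Note that `iloc[:5289]` keeps the first 5289 'no' rows in file order rather than a random sample.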

In [3]:
df_yes = df[df['y']=='yes']
df_no = df[df['y']=='no']
df_no = df_no.iloc[:5289, :]

df = pd.concat([df_yes, df_no])
df = df.sample(frac=1).reset_index(drop=True)       #shuffle the rows
df

| | age | job | marital | education | default | balance | housing | loan | contact | day | month | duration | campaign | pdays | previous | poutcome | y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 34 | blue-collar | married | secondary | no | 243 | yes | no | cellular | 18 | may | 507 | 1 | -1 | 0 | unknown | yes |
| 1 | 36 | technician | single | secondary | no | 1053 | yes | no | unknown | 21 | may | 147 | 1 | -1 | 0 | unknown | no |
| 2 | 31 | admin. | single | secondary | no | 150 | yes | yes | unknown | 13 | may | 431 | 2 | -1 | 0 | unknown | no |
| 3 | 32 | management | single | tertiary | no | -345 | yes | no | unknown | 21 | may | 74 | 1 | -1 | 0 | unknown | no |
| 4 | 44 | management | married | tertiary | no | 309 | yes | no | cellular | 5 | feb | 676 | 1 | -1 | 0 | unknown | yes |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 10573 | 38 | management | married | unknown | no | 1759 | yes | no | unknown | 6 | may | 440 | 1 | -1 | 0 | unknown | no |
| 10574 | 27 | technician | single | tertiary | no | 931 | yes | no | cellular | 4 | feb | 1078 | 1 | -1 | 0 | unknown | yes |
| 10575 | 37 | technician | married | unknown | no | 189 | no | no | cellular | 1 | oct | 238 | 1 | 107 | 2 | success | yes |
| 10576 | 25 | student | single | secondary | no | 1957 | no | no | cellular | 24 | jun | 1207 | 4 | 385 | 1 | failure | yes |
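
The trimming step above uses `df_no.iloc[:5289, :]`, which keeps the first rows in file order, so the retained 'no' examples may not be representative of the whole class. A minimal sketch of random undersampling follows, assuming a target column named `y`. The helper `undersample` and the `random_state` value are illustrative and not part of the original notebook.

```python
import pandas as pd

def undersample(df, target="y", random_state=0):
    """Randomly downsample every class to the size of the smallest class."""
    n_min = df[target].value_counts().min()
    parts = [
        group.sample(n=n_min, random_state=random_state)
        for _, group in df.groupby(target)
    ]
    # Concatenate the per-class samples, then shuffle the rows.
    balanced = pd.concat(parts)
    return balanced.sample(frac=1, random_state=random_state).reset_index(drop=True)

# Toy data (invented): 2 'yes' rows and 6 'no' rows.
toy = pd.DataFrame({"age": range(8), "y": ["yes"] * 2 + ["no"] * 6})
balanced = undersample(toy)
print(balanced["y"].value_counts().to_dict())
```

Sampling at random keeps each class at the same size while avoiding any dependence on the order of the rows in the file.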


## Feature Encoding

In [4]:
##--encoding column(y)=> 'yes'=1 and 'no'=0; there are no null values in our dataset---##
y = LabelEncoder()
df.iloc[:,-1] = y.fit_transform(df.iloc[:,-1])

In [5]:
##--encoding column(poutcome)=> 'failure'=0, 'other'=1, 'success'=2, 'unknown'=3
poutcome = LabelEncoder()
df.iloc[:,-2] = poutcome.fit_transform(df.iloc[:,-2])

In [6]:
##--encoding column(contact)=> 'cellular'=0, 'telephone'=1, 'unknown'=2
contact = LabelEncoder()
df.iloc[:,8] = contact.fit_transform(df.iloc[:,8])

In [7]:
##--encoding column(marital)=> 'divorced'=0, 'married'=1, 'single'=2
marital = LabelEncoder()
df.iloc[:,2] = marital.fit_transform(df.iloc[:,2])

In [8]:
##--encoding column(education)=> 'primary'=0, 'secondary'=1, 'tertiary'=2, 'unknown'=3
education = LabelEncoder()
df.iloc[:,3] = education.fit_transform(df.iloc[:,3])

In [9]:
##--encoding column(default)=> 'yes'=1, 'no'=0
default = LabelEncoder()
df.iloc[:,4] = default.fit_transform(df.iloc[:,4])

In [10]:
##--encoding column(housing)=> 'yes'=1, 'no'=0
housing = LabelEncoder()
df.iloc[:,6] = housing.fit_transform(df.iloc[:,6])

In [11]:
##--encoding column(loan)=> 'yes'=1, 'no'=0
loan = LabelEncoder()
df.iloc[:,7] = loan.fit_transform(df.iloc[:,7])

In [12]:
##--encoding column(month)=> 'jan'=1,'feb'=2, 'mar'=3,...,'dec'=12
month = df.iloc[:,10].replace(['jan','feb','mar','apr','may','jun','jul','aug','sep','oct','nov','dec'],[1,2,3,4,5,6,7,8,9,10,11,12])
df.iloc[:,10] = month 

In [13]:
##--encoding column(job) with a manual mapping: 'blue-collar'=1, 'admin.'=2, ..., 'unknown'=12
job = df.iloc[:,1].replace(['blue-collar', 'admin.', 'technician', 'management', 'retired','student', 'entrepreneur', 'services', 'self-employed','unemployed', 'housemaid', 'unknown'],[1,2,3,4,5,6,7,8,9,10,11,12])
df.iloc[:,1] = job 
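
`LabelEncoder` assigns integer codes in the sorted order of the distinct labels, which is where the mappings in the comments above come from. The sketch below mimics that rule in plain Python so the mappings can be checked by hand. `label_codes` is a hypothetical helper, not a scikit-learn function.

```python
def label_codes(values):
    """Map each distinct label to its rank in sorted order."""
    return {label: code for code, label in enumerate(sorted(set(values)))}

print(label_codes(["married", "single", "divorced", "married"]))
# {'divorced': 0, 'married': 1, 'single': 2}
print(label_codes(["unknown", "failure", "success", "other"]))
# {'failure': 0, 'other': 1, 'success': 2, 'unknown': 3}
```

On a fitted encoder, the `classes_` attribute lists the labels in code order, which gives the same information directly.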

In [14]:
# # add bias column of 1's
# df.insert(0, 'bias', np.ones(df.shape[0]))
df

Unnamed: 0,age,job,marital,education,default,balance,housing,loan,contact,day,month,duration,campaign,pdays,previous,poutcome,y
0,34,1,1,1,0,243,1,0,0,18,5,507,1,-1,0,3,1
1,36,3,2,1,0,1053,1,0,2,21,5,147,1,-1,0,3,0
2,31,2,2,1,0,150,1,1,2,13,5,431,2,-1,0,3,0
3,32,4,2,2,0,-345,1,0,2,21,5,74,1,-1,0,3,0
4,44,4,1,2,0,309,1,0,0,5,2,676,1,-1,0,3,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10573,38,4,1,3,0,1759,1,0,2,6,5,440,1,-1,0,3,0
10574,27,3,2,2,0,931,1,0,0,4,2,1078,1,-1,0,3,1
10575,37,3,1,3,0,189,0,0,0,1,10,238,1,107,2,2,1
10576,25,6,2,1,0,1957,0,0,0,24,6,1207,4,385,1,0,1


In [15]:
# ## scale the features
# scaler = MinMaxScaler()
# columns = ['age', 'job', 'marital', 'education', 'default', 'balance', 'housing', 'loan', 'contact', 'day', 'month', 'duration', 'campaign', 'pdays', 'previous', 'poutcome'] 
# df[columns] = scaler.fit_transform(df[columns])
# df

## Splitting the Dataset into Training, Validation and Test set

#### Training data (60% of the data)

In [16]:
# training dataset
train_data = df.iloc[:6347]

# training features
train_features = train_data.iloc[:,:-1].values

# training targets
train_targets = train_data.iloc[:,-1].values


#### Validation data (20% of the data)

In [17]:
# validation dataset
validate_data = df.iloc[6347:8463]

# validation features
validate_features = validate_data.iloc[:,:-1].values

# validation targets
validate_targets = validate_data.iloc[:,-1].values


#### Testing data (20% of the data)

In [18]:
# testing dataset
test_data = df.iloc[8463:]

# testing features
test_features = test_data.iloc[:,:-1].values

# testing targets
test_targets = test_data.iloc[:,-1].values
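
The slice boundaries above are hard-coded. As a sketch, they could instead be derived from the row count. `split_points` is a hypothetical helper, and rounding up is an assumption, chosen because it yields 6347 and 8463 for an input of 10578 rows.

```python
def split_points(n_rows, train_pct=60, val_pct=20):
    """Return (train_end, val_end) row indices for a train/validation/test split.

    Integer arithmetic with rounding up avoids floating-point surprises.
    """
    if not (0 < train_pct and 0 <= val_pct and train_pct + val_pct < 100):
        raise ValueError("percentages must leave room for a test set")
    train_end = (n_rows * train_pct + 99) // 100
    val_end = (n_rows * (train_pct + val_pct) + 99) // 100
    return train_end, val_end

train_end, val_end = split_points(10578)
print(train_end, val_end)  # 6347 8463

# With a DataFrame `df` (as in the notebook) the slices would then be:
# train_data    = df.iloc[:train_end]
# validate_data = df.iloc[train_end:val_end]
# test_data     = df.iloc[val_end:]
```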

## Training the Decision Trees Model on the Training Set

#### Calculate Probability

In [19]:
def calc_probability(target,targets):
    count = 0
    for i in targets:
        if i== target:
            count += 1
    if len(targets)!=0:
        return count/len(targets)
    else:
        return 0
# calc_probability(1,train_targets)

#### Calculating Entropy

In [20]:
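# prob0 = probability of class 0, prob1 = probability of class 1
def calc_entropy(prob0,prob1):
    # validate the inputs before using them
    if prob0>1 or prob0<0 or prob1>1 or prob1<0:
        raise ValueError('probabilities must lie in [0, 1]')
    # a pure node (one class has probability 0) has zero entropy
    if prob0==0 or prob1==0:
        return 0
    return -(prob0*np.log2(prob0) + prob1*np.log2(prob1))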
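
For a two-class target the entropy is H = -(p0·log2(p0) + p1·log2(p1)). It is 0 for a pure node and reaches its maximum of 1 bit when both classes are equally likely. A small worked example using only the standard library is sketched below; `binary_entropy` is an illustrative name, not part of the notebook.

```python
import math

def binary_entropy(p0):
    """Entropy (in bits) of a two-class distribution with P(class 0) = p0."""
    if not 0.0 <= p0 <= 1.0:
        raise ValueError("p0 must lie in [0, 1]")
    p1 = 1.0 - p0
    # By convention 0 * log2(0) is taken as 0, so zero-probability terms are skipped.
    return sum(-p * math.log2(p) for p in (p0, p1) if p > 0)

print(binary_entropy(0.5))            # 1.0
print(round(binary_entropy(0.9), 3))  # 0.469
print(binary_entropy(1.0) == 0)       # True (pure node)
```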

#### Calculate Gain

In [21]:
def calc_gain(indexf,data):
    f1 = data[:,indexf] 
    unique,counts = np.unique(f1,return_counts=True)

    #feature_entropies =[]
    fval_entropy = {}
    for val in range(len(unique)):
        fval1 = data[data[:,indexf]==unique[val]] #assumes discrete values for the feature
        prob0 = calc_probability(0,fval1[:,-1])
        prob1 = 1-prob0
        fval1_len = len(fval1)
        #feature_entropies.append(calc_entropy(prob0,prob1))
        fval_entropy[unique[val]] = calc_entropy(prob0,prob1)

    sum_entropy_len = 0
    for j in range(len(unique)):
        sum_entropy_len += fval_entropy[unique[j]] * counts[j]
        #print(unique[j],fval_entropy[unique[j]],counts[j])

    entropy = calc_entropy(calc_probability(0,data[:,-1]),calc_probability(1,data[:,-1]))
    length = len(data[:,-1])
    
    if length!=0:
        gain_f1 =  entropy - (1/length) * sum_entropy_len
    else:
        gain_f1 = 0
    
    return gain_f1
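
In symbols, the quantity that `calc_gain` computes for a feature f on a dataset D can be written as follows, where D_v is the subset of D in which f takes the value v and H is the entropy of the target column:

```latex
\mathrm{Gain}(D, f) \;=\; H(D) \;-\; \sum_{v \,\in\, \mathrm{values}(f)} \frac{|D_v|}{|D|}\, H(D_v)
```

The sum corresponds to `(1/length) * sum_entropy_len` in the code: each subset's entropy is weighted by its share of the rows.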

#### Get Feature with maximum gain

In [22]:
def best_feature(dataset):
    # start with the first feature as the current best
    feature_index = 0
    gainf = calc_gain(feature_index,dataset)
    
    col = len(dataset[0])-1                 # the last column holds the target
    for indexf in range(1,col):
        gain = calc_gain(indexf,dataset)    # evaluate each gain only once
        if gain > gainf:
            feature_index = indexf
            gainf = gain
    return feature_index, gainf
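
To see the selection at work on a case small enough to check by hand, here is a self-contained sketch in plain Python. The names `entropy`, `information_gain` and `pick_best_feature` are hypothetical, and the toy rows are invented for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, feature):
    """Gain of splitting `rows` (tuples with the label last) on column `feature`."""
    labels = [row[-1] for row in rows]
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row[-1])
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def pick_best_feature(rows):
    """Index and gain of the feature with the largest gain (first one wins ties)."""
    n_features = len(rows[0]) - 1
    gains = [information_gain(rows, f) for f in range(n_features)]
    best = max(range(n_features), key=gains.__getitem__)
    return best, gains[best]

# Toy data: (feature 0, feature 1, label). Feature 1 separates the labels perfectly.
rows = [("a", "x", 0), ("b", "x", 0), ("a", "y", 1), ("b", "y", 1)]
print(pick_best_feature(rows))  # (1, 1.0)
```

Feature 0 leaves each subset with one example of each class, so its gain is 0, while feature 1 produces two pure subsets and recovers the full 1 bit of entropy.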
    

#### Get the most probable class

In [23]:
def get_class(classes):
    unique,counts = np.unique(classes,return_counts=True)
    
    clss =  unique[0]
    count=  counts[0]
    for i in range(1,len(unique)):
        if counts[i] > count:
            count = counts[i]
            clss  = unique[i]
    return clss
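
The same majority vote can be sketched with the standard library. `majority_class` is an illustrative name. It resolves ties in favour of the smallest label, which matches the behaviour of the loop above, since `np.unique` returns the labels in sorted order and the loop only replaces the current choice on a strictly larger count.

```python
from collections import Counter

def majority_class(labels):
    """Most frequent label; ties go to the smallest label."""
    counts = Counter(labels)
    top = max(counts.values())
    return min(label for label, count in counts.items() if count == top)

print(majority_class([1, 0, 1, 1, 0]))  # 1
print(majority_class([1, 0]))           # 0 (tie)
```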

### Feature, TreeNode, Tree objects

In [24]:
class Feature:
    def __init__(self,name,values,gain,clss):
        self.name=name
        self.gain=gain
        self.values=values
        self.clss=clss

In [25]:
class TreeNode:
    def __init__(self,feature):
        self.feature=feature
        self.children=[]
        self.childnames=[]
        self.parent = None
        self.parent_edge = None
        self.isLeaf = False
        
        
    def add_child(self, child,parent_edge):
        if child.feature.name == self.feature.name:
            pass
        elif child.feature.name not in self.childnames:

            child.parent = self
            child.parent_edge = parent_edge
            self.children.append(child)
            self.childnames.append(child.feature.name)
                
        
    def get_parent(self):
        return self.parent.feature.name
    
    def get_level(self):
        level=0
        p=self.parent
        while p:
            level+=1
            p=p.parent
        return level
    
    def display(self):
        spaces = ' '*self.get_level()*4
        prefix_node = spaces+'|___' if self.parent else ""
        prefix = spaces+'      ' if self.parent else ""
        print(prefix_node + '\033[1;31;36m'+self.feature.name+' \033[0;0m')
        #print('\033[2;31;43m'+self.feature.name+' \033[0;0m')
        print(prefix+ 'gain', self.feature.gain)
        #print(prefix+ 'values', len(self.feature.values))
        print(prefix+ '\033[1;31;33m class', self.feature.clss,' \033[0;0m')
        
        if self.parent != None:
            print(prefix + 'parent =',self.get_parent())
            print(prefix + 'parent_edge =',self.parent_edge)
        if self.isLeaf:
            print(prefix + 'leaf node')
        if self.children:
            #children=[]
            #for i in range(len(self.children)):
                #children.append(self.children[i].feature.name)
            #print(prefix+ 'children =',self.childnames)
            for child in self.children:
                child.display()

In [26]:
class Tree:
    def __init__(self,root):
        self.root = root
        
    def join(self,tree):
        for child in self.root.children:
            if(child.isLeaf == False):
                child.add_child(tree.root,child.feature.values[0])
        
    def display(self):
        self.root.display()  

### The Decision Tree

In [27]:
class DecisionTreeClassifier:
    
    #the function gets the dataset and builds a branch whose root is the best feature of the very dataset
    def branch(self,the_data,dataframe):
        #1 best feature
        feature_index, gainf = best_feature(the_data)

        feature= Feature(dataframe.columns[feature_index],np.unique(the_data[:,feature_index]),gainf,get_class(the_data[:,-1]))
        root = TreeNode(feature)
        #2 create branches on the best feature
        for value in root.feature.values:

            n_data = the_data[the_data[:,feature_index]==value]

            f_index, f_gain = best_feature(n_data)

            ft= Feature(dataframe.columns[f_index],np.unique(n_data[:,f_index]),f_gain,get_class(n_data[:,-1]))       

            child = TreeNode(ft)
            root.add_child(child,value)


        #
        for node in root.children:
            if node.feature.gain <=0.3 and not node.children:
                node.isLeaf = True

        return root,feature_index
    
    def fit(self,the_data,dataframe):
        #ID3 Algorithm
        root,col = self.branch(the_data,dataframe)
        tree = Tree(root)

        while len(the_data[0])>2:
            the_data = np.delete(the_data,np.s_[col],1)
            dataframe = dataframe.drop(dataframe.columns[col],axis=1)
            rt,col = self.branch(the_data,dataframe)
            tr = Tree(rt)
            tree.join(tr)

        return tree
    
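    def classifier(self,data_point):
        # NOTE: this rule is hard-coded and does not traverse the fitted tree.
        # Column 8 is 'contact'; the encoded value 2 means 'unknown'.
        if data_point[8]==2:
            clss=0
        else:
            clss=1

        return clss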
    
    def predict(self,the_data):
        length = len(the_data)
        results =  np.empty([length,2])

        for i in range(length):
            results[i][0]= the_data[i,-1] #actual target
            results[i][1]= self.classifier(the_data[i])#predicted target
        return results
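
`classifier` above applies a single fixed rule on the `contact` column. For comparison, a prediction routine that walks a fitted tree could look like the sketch below. The `Node` class and `predict_one` function are hypothetical stand-ins (they are not the notebook's `TreeNode`), and the toy tree is invented for illustration.

```python
class Node:
    """Minimal tree node: a leaf has feature=None; an inner node splits on `feature`."""
    def __init__(self, label, feature=None):
        self.label = label        # majority class at this node
        self.feature = feature    # column index to split on (None for a leaf)
        self.children = {}        # feature value -> child Node

def predict_one(node, row):
    """Follow the edges matching `row` until a leaf or an unseen value is reached."""
    while node.feature is not None:
        child = node.children.get(row[node.feature])
        if child is None:         # value not seen in training: use this node's majority class
            break
        node = child
    return node.label

# Toy tree: split on column 0, then (for value 2) on column 1.
root = Node(label=0, feature=0)
root.children[0] = Node(label=1)
inner = Node(label=0, feature=1)
inner.children[5] = Node(label=1)
root.children[2] = inner

print(predict_one(root, [0, 9]))  # 1
print(predict_one(root, [2, 5]))  # 1
print(predict_one(root, [2, 7]))  # 0 (unseen value, majority class of the inner node)
print(predict_one(root, [1, 5]))  # 0 (unseen value at the root)
```

Storing the majority class on every node, as the notebook's `Feature.clss` does, gives a natural fallback when a test row carries a value that never appeared in training.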


### The model on Validation data

In [28]:
dtree = DecisionTreeClassifier()
tree = dtree.fit(np.array(train_data),train_data)
#tree.display()

In [29]:
validation_results = dtree.predict(np.array(validate_data))

### Predicting the Test Results


In [30]:
test_results = dtree.predict(np.array(test_data))

### Analyzing the Accuracy of the Model



In [31]:
def calc_error(results):
    # fraction of rows where the actual (column 0) and predicted (column 1) targets differ
    return np.mean(results[:,0] != results[:,1])


def metrics(results):
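    count00 = 0 #actual 0, predicted 0
    count11 = 0 #actual 1, predicted 1
    count01= 0 #predicted 1 but actual 0
    count10= 0 #predicted 0 but actual 1
    
    for i in range(len(results)):
        if results[i,0]== 0 and results[i,1]==0:
            count00 +=1
        elif results[i,0]== 1 and results[i,1]==1:
            count11 +=1
        elif results[i,0]== 0 and results[i,1]==1:
            count01 +=1
        elif results[i,0]== 1 and results[i,1]==0:
            count10 +=1
            
    classification_error = (count10+count01)/len(results)
    accuracy = 1 - classification_error
    # NOTE: with class 1 as the positive class, the ratios below are taken over
    # the predicted rows of the confusion matrix, so they differ from the usual
    # textbook definitions of these names:
    false_alarm = count01/(count01+ count11)   # FP/(FP+TP), usually called the false discovery rate
    miss = count10/(count10 + count00)         # FN/(FN+TN), usually called the false omission rate
    recall = 1 - miss                          # TN/(TN+FN), the negative predictive value
    precision = count00/(count00+count01)      # TN/(TN+FP), usually called specificity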
    
    return np.array([[count00,count10],[count01,count11]]),classification_error,accuracy,false_alarm,miss,recall,precision
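
For reference, the textbook definitions with class 1 as the positive class can be sketched as below. `standard_metrics` and `_ratio` are hypothetical helpers, and the counts in the usage lines are toy values invented for illustration, not results from this notebook.

```python
def _ratio(num, den):
    """Divide, returning 0.0 when the denominator is zero."""
    return num / den if den else 0.0

def standard_metrics(tn, fp, fn, tp):
    """Textbook binary-classification metrics with class 1 as the positive class."""
    total = tn + fp + fn + tp
    return {
        "accuracy": _ratio(tp + tn, total),
        "precision": _ratio(tp, tp + fp),          # TP / (TP + FP)
        "recall": _ratio(tp, tp + fn),             # TP / (TP + FN)
        "miss_rate": _ratio(fn, tp + fn),          # FN / (TP + FN) = 1 - recall
        "false_alarm_rate": _ratio(fp, fp + tn),   # FP / (FP + TN)
    }

# Toy counts, invented for illustration.
m = standard_metrics(tn=50, fp=10, fn=5, tp=35)
print(m["accuracy"])   # 0.85
print(m["recall"])     # 0.875
print(m["miss_rate"])  # 0.125
```

In terms of the counters used by `metrics`, `count00` is TN, `count01` is FP, `count10` is FN and `count11` is TP.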

#### Error on the training data

In [32]:
training_error = calc_error(dtree.predict(np.array(train_data)))
training_error

0.0499448558374035

#### Error on the validation data

In [33]:
validation_error = calc_error(dtree.predict(np.array(validate_data)))
validation_error

0.04962192816635161

#### Error on the testing data

In [34]:
test_error = calc_error(dtree.predict(np.array(test_data)))
test_error

0.05106382978723404

#### Confusion matrix

In [35]:
cm,c_error,accuracy,false_alarm,miss,recall,precision = metrics(test_results)
    
print('Confusion Matrix')
print('               actual')
print('             |   0   |    1 |')
print('-----------------------------')
print('predicted: 0 |  {0}  |   {1}  |'.format(cm[0,0],cm[0,1]))
print('-----------------------------')
print('predicted: 1 |  {0}  |   {1}  |'.format(cm[1,0],cm[1,1]))

print('classification error:',c_error)
print('accuracy:            ',accuracy)
print('false_alarm:         ',false_alarm)
print('miss:                ',miss)
print('recall:              ',recall)
print('precision:           ',precision)

Confusion Matrix
               actual
             |   0   |    1 |
-----------------------------
predicted: 0 |  1022  |   108  |
-----------------------------
predicted: 1 |  0  |   985  |
classification error: 0.05106382978723404
accuracy:             0.948936170212766
false_alarm:          0.0
miss:                 0.09557522123893805
recall:               0.904424778761062
precision:            1.0


## Visualizing Set Results



In [36]:
#decision tree
tree.display()  

contact
gain 0.7581862606860327
class 0
    |___age
          gain 0.0
          class 1
          parent = contact
          parent_edge = 0
          leaf node
    |___duration
          gain 0.3477523712672367
          class 0
          parent = contact
          parent_edge = 2
        |___month
              gain 0.6574826492377019
              class 0
              parent = duration
              parent_edge = 2
            |___age
                  gain 0.0
                  class 1
                  parent = month
                  parent_edge = 1
                  leaf node
            |___duration
                  gain 0.3896580364124955
                  class 0
                  parent = month
                  parent_edge = 5
        |___[1;31;36mbalance [