## Data Set Information:
The data relates to direct marketing campaigns of a Portuguese banking institution. The marketing campaigns were based on phone calls. Often, more than one contact with the same client was required in order to assess whether the product (a bank term deposit) would be subscribed ('yes') or not ('no').

## Citation
[Moro et al., 2014] S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, In press, http://dx.doi.org/10.1016/j.dss.2014.03.001

## The benefits of using this tool
There are many clients to phone when selling a term deposit. The full dataset has 41188 clients, but only 4640 of them (11.27%) subscribed to a term deposit. The telemarketers spent a lot of time making phone calls, yet the success rate was only about 11.27%, which wasted a lot of money and time.
If we could predict which clients will subscribe to a term deposit, the telemarketers could focus on the clients who are predicted to subscribe. That would increase the probability of success for the marketing campaigns and, at the same time, significantly reduce the time spent on phone calls. If the telemarketers spend 5 minutes on average per phone call, skipping the 36548 clients who did not subscribe would save about 3046 hours (36548*5/60).
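
The arithmetic behind these figures can be checked directly. The sketch below only uses the numbers quoted in the text; the variable names are illustrative, and the result is an upper bound because it assumes every non-subscriber is correctly identified and never called.

```python
# Figures quoted in the text above
total_clients = 41188
subscribed = 4640
minutes_per_call = 5  # average call length assumed in the text

# Clients who did not subscribe, i.e. calls a perfect predictor would skip
not_subscribed = total_clients - subscribed

success_rate = subscribed / total_clients
hours_saved = not_subscribed * minutes_per_call / 60

print(f"Success rate: {success_rate:.2%}")
print(f"Calls avoided: {not_subscribed}")
print(f"Hours saved (upper bound): {hours_saved:.0f}")
```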

In [123]:
import pandas as pd
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split

In [124]:
pd.options.mode.chained_assignment = None  # default='warn'

In [125]:
# Download data without header
df = pd.read_csv("https://raw.githubusercontent.com/Yali20212021/Marketing---Classifier-TensorFlow-/main/bank-additional.csv",header=None)
df.head(5)

Unnamed: 0,0
0,"age;""job"";""marital"";""education"";""default"";""hou..."
1,"30;""blue-collar"";""married"";""basic.9y"";""no"";""ye..."
2,"39;""services"";""single"";""high.school"";""no"";""no""..."
3,"25;""services"";""married"";""high.school"";""no"";""ye..."
4,"38;""services"";""married"";""basic.9y"";""no"";""unkno..."


In [126]:
# split column
df1=df[0].str.split(';',expand=True)
df1.head(5)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,11,12,13,14,15,16,17,18,19,20
0,age,"""job""","""marital""","""education""","""default""","""housing""","""loan""","""contact""","""month""","""day_of_week""",...,"""campaign""","""pdays""","""previous""","""poutcome""","""emp.var.rate""","""cons.price.idx""","""cons.conf.idx""","""euribor3m""","""nr.employed""","""y"""
1,30,"""blue-collar""","""married""","""basic.9y""","""no""","""yes""","""no""","""cellular""","""may""","""fri""",...,2,999,0,"""nonexistent""",-1.8,92.893,-46.2,1.313,5099.1,"""no"""
2,39,"""services""","""single""","""high.school""","""no""","""no""","""no""","""telephone""","""may""","""fri""",...,4,999,0,"""nonexistent""",1.1,93.994,-36.4,4.855,5191,"""no"""
3,25,"""services""","""married""","""high.school""","""no""","""yes""","""no""","""telephone""","""jun""","""wed""",...,1,999,0,"""nonexistent""",1.4,94.465,-41.8,4.962,5228.1,"""no"""
4,38,"""services""","""married""","""basic.9y""","""no""","""unknown""","""unknown""","""telephone""","""jun""","""fri""",...,3,999,0,"""nonexistent""",1.4,94.465,-41.8,4.959,5228.1,"""no"""


In [127]:
# define column names
df1.columns = df1.iloc[0]
df1.head(5)

Unnamed: 0,age,"""job""","""marital""","""education""","""default""","""housing""","""loan""","""contact""","""month""","""day_of_week""",...,"""campaign""","""pdays""","""previous""","""poutcome""","""emp.var.rate""","""cons.price.idx""","""cons.conf.idx""","""euribor3m""","""nr.employed""","""y"""
0,age,"""job""","""marital""","""education""","""default""","""housing""","""loan""","""contact""","""month""","""day_of_week""",...,"""campaign""","""pdays""","""previous""","""poutcome""","""emp.var.rate""","""cons.price.idx""","""cons.conf.idx""","""euribor3m""","""nr.employed""","""y"""
1,30,"""blue-collar""","""married""","""basic.9y""","""no""","""yes""","""no""","""cellular""","""may""","""fri""",...,2,999,0,"""nonexistent""",-1.8,92.893,-46.2,1.313,5099.1,"""no"""
2,39,"""services""","""single""","""high.school""","""no""","""no""","""no""","""telephone""","""may""","""fri""",...,4,999,0,"""nonexistent""",1.1,93.994,-36.4,4.855,5191,"""no"""
3,25,"""services""","""married""","""high.school""","""no""","""yes""","""no""","""telephone""","""jun""","""wed""",...,1,999,0,"""nonexistent""",1.4,94.465,-41.8,4.962,5228.1,"""no"""
4,38,"""services""","""married""","""basic.9y""","""no""","""unknown""","""unknown""","""telephone""","""jun""","""fri""",...,3,999,0,"""nonexistent""",1.4,94.465,-41.8,4.959,5228.1,"""no"""


In [128]:
# Drop row 0
df2=df1.drop([0])
df2.head(5)

Unnamed: 0,age,"""job""","""marital""","""education""","""default""","""housing""","""loan""","""contact""","""month""","""day_of_week""",...,"""campaign""","""pdays""","""previous""","""poutcome""","""emp.var.rate""","""cons.price.idx""","""cons.conf.idx""","""euribor3m""","""nr.employed""","""y"""
1,30,"""blue-collar""","""married""","""basic.9y""","""no""","""yes""","""no""","""cellular""","""may""","""fri""",...,2,999,0,"""nonexistent""",-1.8,92.893,-46.2,1.313,5099.1,"""no"""
2,39,"""services""","""single""","""high.school""","""no""","""no""","""no""","""telephone""","""may""","""fri""",...,4,999,0,"""nonexistent""",1.1,93.994,-36.4,4.855,5191.0,"""no"""
3,25,"""services""","""married""","""high.school""","""no""","""yes""","""no""","""telephone""","""jun""","""wed""",...,1,999,0,"""nonexistent""",1.4,94.465,-41.8,4.962,5228.1,"""no"""
4,38,"""services""","""married""","""basic.9y""","""no""","""unknown""","""unknown""","""telephone""","""jun""","""fri""",...,3,999,0,"""nonexistent""",1.4,94.465,-41.8,4.959,5228.1,"""no"""
5,47,"""admin.""","""married""","""university.degree""","""no""","""yes""","""no""","""cellular""","""nov""","""mon""",...,1,999,0,"""nonexistent""",-0.1,93.2,-42.0,4.191,5195.8,"""no"""
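
The three steps above (split on `;`, promote row 0 to the header, drop row 0) could probably be collapsed into one call by telling `read_csv` about the separator. This is a minimal sketch on an in-memory sample rather than the real URL; `sample` and `df_alt` are names made up for the illustration, and the rows are trimmed to a few columns from the preview above. With the default quote handling, the literal `"` characters should also disappear from column names and values, and numeric columns are parsed as numbers.

```python
import io

import pandas as pd

# A few rows in the same layout as bank-additional.csv
# (values copied from the preview above, trimmed to four columns)
sample = io.StringIO(
    'age;"job";"marital";"y"\n'
    '30;"blue-collar";"married";"no"\n'
    '39;"services";"single";"no"\n'
)

# sep=";" splits the columns; the first line is used as the header by default
df_alt = pd.read_csv(sample, sep=";")

print(df_alt.columns.tolist())
print(df_alt.dtypes)
```

Note that the rest of this notebook refers to columns by their quoted names such as `'"job"'`, so switching to this approach would mean using the plain names (`'job'`) everywhere.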


In [129]:
# Check for null values
print(df2.isnull().sum())

0
age                 0
"job"               0
"marital"           0
"education"         0
"default"           0
"housing"           0
"loan"              0
"contact"           0
"month"             0
"day_of_week"       0
"duration"          0
"campaign"          0
"pdays"             0
"previous"          0
"poutcome"          0
"emp.var.rate"      0
"cons.price.idx"    0
"cons.conf.idx"     0
"euribor3m"         0
"nr.employed"       0
"y"                 0
dtype: int64


In [130]:
# Make sure data types are correct (no objects)

df2.dtypes

0
age                 object
"job"               object
"marital"           object
"education"         object
"default"           object
"housing"           object
"loan"              object
"contact"           object
"month"             object
"day_of_week"       object
"duration"          object
"campaign"          object
"pdays"             object
"previous"          object
"poutcome"          object
"emp.var.rate"      object
"cons.price.idx"    object
"cons.conf.idx"     object
"euribor3m"         object
"nr.employed"       object
"y"                 object
dtype: object

In [131]:
# Convert the data type of the numeric variables from object to numeric
numeric_cols = ["age", '"campaign"', '"pdays"', '"previous"', '"emp.var.rate"',
                '"cons.price.idx"', '"cons.conf.idx"', '"euribor3m"', '"nr.employed"']
df2[numeric_cols] = df2[numeric_cols].apply(pd.to_numeric)

In [132]:
df2.dtypes

0
age                   int64
"job"                object
"marital"            object
"education"          object
"default"            object
"housing"            object
"loan"               object
"contact"            object
"month"              object
"day_of_week"        object
"duration"           object
"campaign"            int64
"pdays"               int64
"previous"            int64
"poutcome"           object
"emp.var.rate"      float64
"cons.price.idx"    float64
"cons.conf.idx"     float64
"euribor3m"         float64
"nr.employed"       float64
"y"                  object
dtype: object

In [133]:
# Drop the "duration" column: as described in the dataset, it is not suitable to include it for prediction
df3=df2.drop(['"duration"'], axis=1)

In [134]:
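# Test multicollinearity between predictors: numeric independent variables (IVs) should correlate below 0.8
# select_dtypes keeps only the numeric columns, so corr() also works on pandas versions that reject object columns
df3.select_dtypes(include="number").corr()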

Unnamed: 0,age,"""campaign""","""pdays""","""previous""","""emp.var.rate""","""cons.price.idx""","""cons.conf.idx""","""euribor3m""","""nr.employed"""
age,1.0,-0.014169,-0.043425,0.050931,-0.019192,-0.000482,0.098135,-0.015033,-0.041936
"""campaign""",-0.014169,1.0,0.058742,-0.09149,0.176079,0.145021,0.007882,0.159435,0.161037
"""pdays""",-0.043425,0.058742,1.0,-0.587941,0.270684,0.058472,-0.09209,0.301478,0.381983
"""previous""",0.050931,-0.09149,-0.587941,1.0,-0.415238,-0.164922,-0.05142,-0.458851,-0.514853
"""emp.var.rate""",-0.019192,0.176079,0.270684,-0.415238,1.0,0.755155,0.195022,0.970308,0.897173
"""cons.price.idx""",-0.000482,0.145021,0.058472,-0.164922,0.755155,1.0,0.045835,0.657159,0.47256
"""cons.conf.idx""",0.098135,0.007882,-0.09209,-0.05142,0.195022,0.045835,1.0,0.276595,0.107054
"""euribor3m""",-0.015033,0.159435,0.301478,-0.458851,0.970308,0.657159,0.276595,1.0,0.942589
"""nr.employed""",-0.041936,0.161037,0.381983,-0.514853,0.897173,0.47256,0.107054,0.942589,1.0


In [135]:
# Drop the "euribor3m" and "nr.employed" columns, because they are highly correlated with "emp.var.rate"
df5=df3.drop(['"euribor3m"','"nr.employed"'], axis=1)
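
The choice of columns to drop can be reproduced from the correlation table above instead of being read off by eye. The sketch below hard-codes the pairwise values for four of the macro-economic columns, copied from that table (every other pair in the table is below 0.8 in absolute value); the dictionary layout and the names `corr`, `THRESHOLD` and `high_pairs` are only illustrative.

```python
# Pairwise correlations copied from the correlation table above
corr = {
    ("emp.var.rate", "cons.price.idx"): 0.755155,
    ("emp.var.rate", "euribor3m"): 0.970308,
    ("emp.var.rate", "nr.employed"): 0.897173,
    ("cons.price.idx", "euribor3m"): 0.657159,
    ("cons.price.idx", "nr.employed"): 0.47256,
    ("euribor3m", "nr.employed"): 0.942589,
}

THRESHOLD = 0.8  # cut-off used in this notebook

# Keep the pairs whose absolute correlation reaches the threshold
high_pairs = sorted(pair for pair, r in corr.items() if abs(r) >= THRESHOLD)

for a, b in high_pairs:
    print(f"{a} vs {b}: r = {corr[(a, b)]:.3f}")
```

All three flagged pairs involve `euribor3m` or `nr.employed`, which is consistent with dropping those two columns and keeping `emp.var.rate`.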

In [136]:
df5

Unnamed: 0,age,"""job""","""marital""","""education""","""default""","""housing""","""loan""","""contact""","""month""","""day_of_week""","""campaign""","""pdays""","""previous""","""poutcome""","""emp.var.rate""","""cons.price.idx""","""cons.conf.idx""","""y"""
1,30,"""blue-collar""","""married""","""basic.9y""","""no""","""yes""","""no""","""cellular""","""may""","""fri""",2,999,0,"""nonexistent""",-1.8,92.893,-46.2,"""no"""
2,39,"""services""","""single""","""high.school""","""no""","""no""","""no""","""telephone""","""may""","""fri""",4,999,0,"""nonexistent""",1.1,93.994,-36.4,"""no"""
3,25,"""services""","""married""","""high.school""","""no""","""yes""","""no""","""telephone""","""jun""","""wed""",1,999,0,"""nonexistent""",1.4,94.465,-41.8,"""no"""
4,38,"""services""","""married""","""basic.9y""","""no""","""unknown""","""unknown""","""telephone""","""jun""","""fri""",3,999,0,"""nonexistent""",1.4,94.465,-41.8,"""no"""
5,47,"""admin.""","""married""","""university.degree""","""no""","""yes""","""no""","""cellular""","""nov""","""mon""",1,999,0,"""nonexistent""",-0.1,93.200,-42.0,"""no"""
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4115,30,"""admin.""","""married""","""basic.6y""","""no""","""yes""","""yes""","""cellular""","""jul""","""thu""",1,999,0,"""nonexistent""",1.4,93.918,-42.7,"""no"""
4116,39,"""admin.""","""married""","""high.school""","""no""","""yes""","""no""","""telephone""","""jul""","""fri""",1,999,0,"""nonexistent""",1.4,93.918,-42.7,"""no"""
4117,27,"""student""","""single""","""high.school""","""no""","""no""","""no""","""cellular""","""may""","""mon""",2,999,1,"""failure""",-1.8,92.893,-46.2,"""no"""
4118,58,"""admin.""","""married""","""high.school""","""no""","""no""","""no""","""cellular""","""aug""","""fri""",1,999,0,"""nonexistent""",1.4,93.444,-36.1,"""no"""


In [137]:
df5.describe()

Unnamed: 0,age,"""campaign""","""pdays""","""previous""","""emp.var.rate""","""cons.price.idx""","""cons.conf.idx"""
count,4119.0,4119.0,4119.0,4119.0,4119.0,4119.0,4119.0
mean,40.11362,2.537266,960.42219,0.190337,0.084972,93.579704,-40.499102
std,10.313362,2.568159,191.922786,0.541788,1.563114,0.579349,4.594578
min,18.0,1.0,0.0,0.0,-3.4,92.201,-50.8
25%,32.0,1.0,999.0,0.0,-1.8,93.075,-42.7
50%,38.0,2.0,999.0,0.0,1.1,93.749,-41.8
75%,47.0,3.0,999.0,0.0,1.4,93.994,-36.4
max,88.0,35.0,999.0,6.0,1.4,94.767,-26.9


In [138]:
df5.select_dtypes(include="number").corr()

Unnamed: 0,age,"""campaign""","""pdays""","""previous""","""emp.var.rate""","""cons.price.idx""","""cons.conf.idx"""
age,1.0,-0.014169,-0.043425,0.050931,-0.019192,-0.000482,0.098135
"""campaign""",-0.014169,1.0,0.058742,-0.09149,0.176079,0.145021,0.007882
"""pdays""",-0.043425,0.058742,1.0,-0.587941,0.270684,0.058472,-0.09209
"""previous""",0.050931,-0.09149,-0.587941,1.0,-0.415238,-0.164922,-0.05142
"""emp.var.rate""",-0.019192,0.176079,0.270684,-0.415238,1.0,0.755155,0.195022
"""cons.price.idx""",-0.000482,0.145021,0.058472,-0.164922,0.755155,1.0,0.045835
"""cons.conf.idx""",0.098135,0.007882,-0.09209,-0.05142,0.195022,0.045835,1.0


In [139]:
df5['"y"'].value_counts()

"no"     3668
"yes"     451
Name: "y", dtype: int64

In [140]:
# Convert the output variable to a binary variable (1 = "yes", 0 = "no")
df5['"y"'] = [1 if status == '"yes"' else 0 for status in df5['"y"']]

In [141]:
df5

Unnamed: 0,age,"""job""","""marital""","""education""","""default""","""housing""","""loan""","""contact""","""month""","""day_of_week""","""campaign""","""pdays""","""previous""","""poutcome""","""emp.var.rate""","""cons.price.idx""","""cons.conf.idx""","""y"""
1,30,"""blue-collar""","""married""","""basic.9y""","""no""","""yes""","""no""","""cellular""","""may""","""fri""",2,999,0,"""nonexistent""",-1.8,92.893,-46.2,0
2,39,"""services""","""single""","""high.school""","""no""","""no""","""no""","""telephone""","""may""","""fri""",4,999,0,"""nonexistent""",1.1,93.994,-36.4,0
3,25,"""services""","""married""","""high.school""","""no""","""yes""","""no""","""telephone""","""jun""","""wed""",1,999,0,"""nonexistent""",1.4,94.465,-41.8,0
4,38,"""services""","""married""","""basic.9y""","""no""","""unknown""","""unknown""","""telephone""","""jun""","""fri""",3,999,0,"""nonexistent""",1.4,94.465,-41.8,0
5,47,"""admin.""","""married""","""university.degree""","""no""","""yes""","""no""","""cellular""","""nov""","""mon""",1,999,0,"""nonexistent""",-0.1,93.200,-42.0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4115,30,"""admin.""","""married""","""basic.6y""","""no""","""yes""","""yes""","""cellular""","""jul""","""thu""",1,999,0,"""nonexistent""",1.4,93.918,-42.7,0
4116,39,"""admin.""","""married""","""high.school""","""no""","""yes""","""no""","""telephone""","""jul""","""fri""",1,999,0,"""nonexistent""",1.4,93.918,-42.7,0
4117,27,"""student""","""single""","""high.school""","""no""","""no""","""no""","""cellular""","""may""","""mon""",2,999,1,"""failure""",-1.8,92.893,-46.2,0
4118,58,"""admin.""","""married""","""high.school""","""no""","""no""","""no""","""cellular""","""aug""","""fri""",1,999,0,"""nonexistent""",1.4,93.444,-36.1,0


In [144]:
df5['"y"'].value_counts()

0    3668
1     451
Name: "y", dtype: int64
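
One property of the list comprehension used above is that any value other than `'"yes"'` silently becomes 0. A possible alternative, sketched here on a toy series, is to strip the literal quotes and map the known labels explicitly, so that unexpected values surface as `NaN`. The series `y_raw` and the label `"maybe"` are invented for the illustration and do not occur in the dataset.

```python
import pandas as pd

# Toy target column using the same quoted labels as the dataset,
# plus one made-up unexpected label
y_raw = pd.Series(['"no"', '"yes"', '"no"', '"maybe"'])

# Strip the literal quote characters, then map known labels;
# anything not in the mapping becomes NaN instead of 0
y_encoded = y_raw.str.strip('"').map({"yes": 1, "no": 0})

print(y_encoded)
print("Unexpected labels:", y_encoded.isna().sum())
```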

In [145]:
df5.dtypes

0
age                   int64
"job"                object
"marital"            object
"education"          object
"default"            object
"housing"            object
"loan"               object
"contact"            object
"month"              object
"day_of_week"        object
"campaign"            int64
"pdays"               int64
"previous"            int64
"poutcome"           object
"emp.var.rate"      float64
"cons.price.idx"    float64
"cons.conf.idx"     float64
"y"                   int64
dtype: object

In [146]:
# Only the categorical predictors are used for prediction
catpredictors = ['"job"', '"marital"', '"education"', '"default"', '"housing"', '"loan"','"contact"','"month"','"day_of_week"','"poutcome"']
catpredictors

['"job"',
 '"marital"',
 '"education"',
 '"default"',
 '"housing"',
 '"loan"',
 '"contact"',
 '"month"',
 '"day_of_week"',
 '"poutcome"']

In [147]:
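# Convert categorical features to dummy variables
# dtype="uint8" keeps the 0/1 integer columns shown below on pandas versions that default to bool
Dummy_Categorical_features = pd.get_dummies(df5[catpredictors], drop_first=True, dtype="uint8")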
Dummy_Categorical_features

Unnamed: 0,"""job""_""blue-collar""","""job""_""entrepreneur""","""job""_""housemaid""","""job""_""management""","""job""_""retired""","""job""_""self-employed""","""job""_""services""","""job""_""student""","""job""_""technician""","""job""_""unemployed""",...,"""month""_""may""","""month""_""nov""","""month""_""oct""","""month""_""sep""","""day_of_week""_""mon""","""day_of_week""_""thu""","""day_of_week""_""tue""","""day_of_week""_""wed""","""poutcome""_""nonexistent""","""poutcome""_""success"""
1,1,0,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,1,0
2,0,0,0,0,0,0,1,0,0,0,...,1,0,0,0,0,0,0,0,1,0
3,0,0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,1,1,0
4,0,0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,1,0
5,0,0,0,0,0,0,0,0,0,0,...,0,1,0,0,1,0,0,0,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4115,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,1,0,0,1,0
4116,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,1,0
4117,0,0,0,0,0,0,0,1,0,0,...,1,0,0,0,1,0,0,0,0,0
4118,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,1,0
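
To make the effect of `drop_first=True` concrete, here is a small sketch on a toy column. It also shows one way new cases could be aligned to the training columns before prediction, using `reindex`; the frames `train` and `new` and their values are made up for the example, and the notebook itself loads pre-encoded prediction cases instead.

```python
import pandas as pd

# Toy training data with one categorical column
train = pd.DataFrame({"marital": ["married", "single", "divorced", "married"]})

# drop_first=True removes the first category in sorted order ("divorced"),
# which becomes the implicit baseline encoded as all zeros
train_dummies = pd.get_dummies(train, drop_first=True, dtype=int)
print(train_dummies.columns.tolist())

# A new case only produces columns for the categories it contains,
# so it is reindexed to the training columns, filling the missing ones with 0
new = pd.DataFrame({"marital": ["single"]})
new_dummies = pd.get_dummies(new, dtype=int).reindex(
    columns=train_dummies.columns, fill_value=0
)
print(new_dummies)
```

Keeping the column order identical between training and prediction matters here because the model receives a plain tensor and has no access to column names.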


In [148]:
# Pop label

target = df5.pop('"y"')

In [104]:
target

1       0
2       0
3       0
4       0
5       0
       ..
4115    0
4116    0
4117    0
4118    0
4119    0
Name: "y", Length: 4119, dtype: int64

In [149]:
# split into training and test
train_Dummy_Categorical_features, test_Dummy_Categorical_features, train_target, test_target = train_test_split(Dummy_Categorical_features, target, test_size=0.4,
random_state=1)

In [150]:
# Check the sizes of the training and test datasets
len(df5)

4119

In [151]:
len(train_Dummy_Categorical_features)

2471

In [152]:
len(test_Dummy_Categorical_features)

1648

In [153]:
len(train_target)

2471

In [154]:
len(test_target)

1648
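
The split sizes above follow from `test_size=0.4`. As far as I know, scikit-learn rounds the test share up and gives the remainder to the training set, which the short check below reproduces; the variable names are illustrative.

```python
import math

n_samples = 4119   # len(df5)
test_size = 0.4    # value passed to train_test_split

# Test share rounded up, remainder goes to training
n_test = math.ceil(n_samples * test_size)
n_train = n_samples - n_test

print(n_train, n_test)
```

Because only 451 of the 4119 cases are positive, passing `stratify=target` to `train_test_split` would be one way to keep the yes/no ratio the same in both parts.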

In [155]:
# Convert data to Tensorflow tensor

train_Dummy_Categorical_features = tf.convert_to_tensor(train_Dummy_Categorical_features)
train_Dummy_Categorical_features

<tf.Tensor: shape=(2471, 43), dtype=uint8, numpy=
array([[1, 0, 0, ..., 0, 1, 0],
       [0, 0, 0, ..., 0, 1, 0],
       [0, 0, 0, ..., 0, 1, 0],
       ...,
       [0, 0, 0, ..., 0, 1, 0],
       [1, 0, 0, ..., 0, 1, 0],
       [1, 0, 0, ..., 0, 1, 0]], dtype=uint8)>

In [156]:
test_Dummy_Categorical_features = tf.convert_to_tensor(test_Dummy_Categorical_features)
test_Dummy_Categorical_features

<tf.Tensor: shape=(1648, 43), dtype=uint8, numpy=
array([[1, 0, 0, ..., 1, 1, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 1],
       ...,
       [1, 0, 0, ..., 0, 1, 0],
       [0, 0, 0, ..., 1, 1, 0],
       [0, 0, 0, ..., 0, 1, 0]], dtype=uint8)>

In [157]:
# Normalize data

train_normalizer = tf.keras.layers.Normalization(axis=-1)
train_normalizer.adapt(train_Dummy_Categorical_features)

In [158]:
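# Note: this second normalizer is adapted on the test data but is never used below;
# the model applies train_normalizer (training-set statistics) to every input.
test_normalizer = tf.keras.layers.Normalization(axis=-1)
test_normalizer.adapt(test_Dummy_Categorical_features)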
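
Conceptually, adapting a normalization layer stores a mean and variance per feature, and the layer then rescales its inputs with those stored statistics. The NumPy sketch below mimics that idea on toy 0/1 data (it is not the Keras implementation, and the arrays are made up). It shows why only the training statistics are needed: the test data is transformed with the mean and standard deviation learned from the training data.

```python
import numpy as np

# Toy dummy-variable matrices (rows = cases, columns = features)
train = np.array([[1, 0], [0, 0], [1, 1], [1, 0]], dtype=float)
test = np.array([[0, 1], [1, 1]], dtype=float)

# Statistics are computed on the training data only
mean = train.mean(axis=0)
std = train.std(axis=0)  # population standard deviation (square root of the variance)

# The same training statistics are applied to both sets
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print(train_scaled.mean(axis=0))  # approximately 0 for each feature
print(test_scaled)
```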

In [159]:
train_normalizer

<keras.layers.preprocessing.normalization.Normalization at 0x7faac0eb82d0>

In [160]:
# Define model

def get_basic_model():
  model = tf.keras.Sequential([
    train_normalizer,
    tf.keras.layers.Dense(3, activation='relu'),
    # tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(1)  # single logit for the binary label (0 or 1)
  ])

  model.compile(optimizer='adam',
                # loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
                metrics=['accuracy'])
  return model
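
Because the last layer has no activation and the loss uses `from_logits=True`, the model outputs logits rather than probabilities. The sketch below shows how a cut-off on the logit translates to a cut-off on the probability; `sigmoid` and `logit` are helper functions written for this illustration. My understanding is that the `'accuracy'` metric string compares the raw output against 0.5, which would correspond to a probability of about 0.62 rather than 0.5; if that is the case, `tf.keras.metrics.BinaryAccuracy(threshold=0.0)` would be one way to measure accuracy at a probability of 0.5.

```python
import math

def sigmoid(z):
    """Map a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Map a probability (0 < p < 1) back to a logit."""
    return math.log(p / (1.0 - p))

# A logit of 0 corresponds to a probability of 0.5
print(sigmoid(0.0))

# A cut-off of 0.5 applied to the logit is a stricter probability cut-off
print(round(sigmoid(0.5), 4))

# Logit cut-off equivalent to a probability cut-off of 0.5
print(logit(0.5))
```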

In [161]:
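# Train model
# Note: fit() is called on the full dataset (Dummy_Categorical_features, target),
# which includes the rows held out as the test set. To train on the training split only,
# pass train_Dummy_Categorical_features and train_target instead.

BATCH_SIZE = 128

model = get_basic_model()
model.fit(Dummy_Categorical_features, target, epochs=500, batch_size=BATCH_SIZE)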

Epoch 1/500
Epoch 2/500
...

<keras.callbacks.History at 0x7faac0d74450>

## Evaluating loss and accuracy
So far I have used only one hidden layer with 3 neurons, and the loss and accuracy are already reasonable. I could try adding hidden layers and neurons to get a lower loss and a higher accuracy.
By the end of the 500 epochs the loss was still decreasing slowly and the accuracy was still increasing slowly, so I could try training for more epochs to see whether the loss keeps decreasing and the accuracy keeps increasing.
A bigger BATCH_SIZE is not always better: a bigger BATCH_SIZE may give a lower accuracy.

In [162]:
# Evaluate the trained model on the test data

score = model.evaluate(test_Dummy_Categorical_features, test_target, verbose=1)
print(f'Test loss: {score[0]} / Test accuracy: {score[1]}')

Test loss: 0.26409968733787537 / Test accuracy: 0.9065533876419067
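
To put the test accuracy in context, it helps to compare it with the accuracy of always predicting the majority class. The sketch below uses the class counts printed earlier (3668 "no", 451 "yes") and the test accuracy reported above. The baseline is computed on the full dataset, so the share in the test split may differ slightly.

```python
# Class counts from the value_counts() output earlier in the notebook
n_no, n_yes = 3668, 451

# Test accuracy reported by model.evaluate above
test_accuracy = 0.9065533876419067

# Accuracy obtained by always predicting the majority class ("no")
baseline = max(n_no, n_yes) / (n_no + n_yes)

print(f"Majority-class baseline: {baseline:.4f}")
print(f"Model minus baseline:    {test_accuracy - baseline:.4f}")
```

With such an imbalanced target, accuracy alone says little about how many actual subscribers the model finds; precision and recall on the "yes" class would be a natural complement.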


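## Checking for overfitting
The accuracy on the test dataset is almost the same as the accuracy on the training dataset, which suggests there is no overfitting problem and that the model works well on the test dataset. One caveat: `model.fit` above was called on the full dataset, which includes the test rows, so this comparison is optimistic and does not show performance on unseen data.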

In [163]:
# Save model (optional)

# model.save('iris_model')

In [164]:
# Download cases to be predicted

df_predict = pd.read_csv("https://raw.githubusercontent.com/Yali20212021/Marketing---Classifier-TensorFlow-/main/bank_dummy_categorical_predict.csv")
df_predict

Unnamed: 0,"""job""_""blue-collar""","""job""_""entrepreneur""","""job""_""housemaid""","""job""_""management""","""job""_""retired""","""job""_""self-employed""","""job""_""services""","""job""_""student""","""job""_""technician""","""job""_""unemployed""",...,"""month""_""may""","""month""_""nov""","""month""_""oct""","""month""_""sep""","""day_of_week""_""mon""","""day_of_week""_""thu""","""day_of_week""_""tue""","""day_of_week""_""wed""","""poutcome""_""nonexistent""","""poutcome""_""success"""
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,1,0
1,0,0,0,0,0,1,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,1,1,0


In [165]:
# Check data type
df_predict.dtypes

"job"_"blue-collar"                  int64
"job"_"entrepreneur"                 int64
"job"_"housemaid"                    int64
"job"_"management"                   int64
"job"_"retired"                      int64
"job"_"self-employed"                int64
"job"_"services"                     int64
"job"_"student"                      int64
"job"_"technician"                   int64
"job"_"unemployed"                   int64
"job"_"unknown"                      int64
"marital"_"married"                  int64
"marital"_"single"                   int64
"marital"_"unknown"                  int64
"education"_"basic.6y"               int64
"education"_"basic.9y"               int64
"education"_"high.school"            int64
"education"_"illiterate"             int64
"education"_"professional.course"    int64
"education"_"university.degree"      int64
"education"_"unknown"                int64
"default"_"unknown"                  int64
"default"_"yes"                      int64
"housing"_"

In [166]:
# Convert data to Tensorflow tensor

predict_categrical_features = tf.convert_to_tensor(df_predict)
predict_categrical_features

<tf.Tensor: shape=(3, 43), dtype=int64, numpy=
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
        0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
       [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
        0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0]])>

In [167]:
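# Normalize
# Note: this normalizer is adapted on the prediction data but is not used below;
# the model already contains train_normalizer as its first layer.

normalizer = tf.keras.layers.Normalization(axis=-1)
normalizer.adapt(predict_categrical_features)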

In [168]:
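# Predict labels
# Note: the model has a single output unit, so each row of `predictions` holds one logit.
# tf.argmax over a single value is always 0 and tf.nn.softmax over a single value is always 1.0,
# so this loop labels every case class_names[0] ('yes') with certainty 1.0, whatever the logit is.
# A probability for 'yes' would come from tf.sigmoid(logits) instead.

class_names = ['yes', 'no']

predictions = model(predict_categrical_features, training=False)

# Create new columns in dataframe
df_predict['label'] = None
df_predict['certainty'] = None

for i, logits in enumerate(predictions):
  class_idx = tf.argmax(logits).numpy()
  p = tf.nn.softmax(logits)[class_idx]
  name = class_names[class_idx]
  print("Example {} prediction: {} ({:4.1f}%)".format(i, name, 100*p))

  # Save predictions to dataframe (.loc avoids chained assignment)
  df_predict.loc[i, "label"] = name
  df_predict.loc[i, 'certainty'] = format(p)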


Example 0 prediction: yes (100.0%)
Example 1 prediction: yes (100.0%)
Example 2 prediction: yes (100.0%)
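
All three cases come out as "yes (100.0%)" because taking `argmax` and `softmax` over a single logit always gives index 0 and probability 1.0. For a model with one output logit and a target encoded as 1 = yes, the label would normally be derived with a sigmoid. The sketch below shows that logic in plain Python; the function `label_from_logit`, the 0.5 threshold and the example logits are assumptions for illustration, not outputs of the trained model.

```python
import math

def label_from_logit(logit, threshold=0.5):
    """Turn one logit into a (label, certainty) pair, assuming 1 = 'yes'."""
    p_yes = 1.0 / (1.0 + math.exp(-logit))  # sigmoid
    if p_yes >= threshold:
        return "yes", p_yes
    return "no", 1.0 - p_yes

# Hypothetical logits, not taken from the model above
example_logits = [-2.0, 0.0, 3.0]

results = [label_from_logit(z) for z in example_logits]
for i, (name, certainty) in enumerate(results):
    print("Example {} prediction: {} ({:4.1f}%)".format(i, name, 100 * certainty))
```

In the notebook, the equivalent change would be to apply `tf.sigmoid` to each row of `predictions` in place of the `argmax` and `softmax` calls.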


In [169]:
df_predict

Unnamed: 0,"""job""_""blue-collar""","""job""_""entrepreneur""","""job""_""housemaid""","""job""_""management""","""job""_""retired""","""job""_""self-employed""","""job""_""services""","""job""_""student""","""job""_""technician""","""job""_""unemployed""",...,"""month""_""oct""","""month""_""sep""","""day_of_week""_""mon""","""day_of_week""_""thu""","""day_of_week""_""tue""","""day_of_week""_""wed""","""poutcome""_""nonexistent""","""poutcome""_""success""",label,certainty
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,yes,1.0
1,0,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,yes,1.0
2,0,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,1,1,0,yes,1.0
