Bank Marketing
Abstract:
The data relate to direct marketing campaigns (phone calls) of a Portuguese banking institution. The classification goal is to predict whether the client will subscribe to a term deposit (variable y).

Data Set Information:
The data relate to direct marketing campaigns of a Portuguese banking institution. The marketing campaigns were based on phone calls. Often, more than one contact with the same client was required in order to assess whether the product (bank term deposit) would be subscribed ('yes') or not ('no').

Attribute Information:
Bank client data:
Age (numeric)
Job : type of job (categorical: 'admin.', 'blue-collar', 'entrepreneur', 'housemaid', 'management', 'retired', 'self-employed', 'services', 'student', 'technician', 'unemployed', 'unknown')

Marital : marital status (categorical: 'divorced', 'married', 'single', 'unknown' ; note: 'divorced' means divorced or widowed)

Education (categorical: 'basic.4y', 'basic.6y', 'basic.9y', 'high.school', 'illiterate', 'professional.course', 'university.degree', 'unknown')

Default: has credit in default? (categorical: 'no', 'yes', 'unknown')

Housing: has housing loan? (categorical: 'no', 'yes', 'unknown')

Loan: has personal loan? (categorical: 'no', 'yes', 'unknown')

Related to the last contact of the current campaign:

Contact: contact communication type (categorical:
'cellular','telephone')

Month: last contact month of year (categorical: 'jan', 'feb', 'mar',
…, 'nov', 'dec')

Day_of_week: last contact day of the week (categorical:
'mon','tue','wed','thu','fri')

Duration: last contact duration, in seconds (numeric). Important
note: this attribute highly affects the output target (e.g., if
duration=0 then y='no'). Yet, the duration is not known before a call
is performed. Also, after the end of the call y is obviously known.
Thus, this input should only be included for benchmark purposes and
should be discarded if the intention is to have a realistic
predictive model.

Other attributes:

Campaign: number of contacts performed during this campaign and for
this client (numeric, includes last contact)

Pdays: number of days that passed by after the client was last
contacted from a previous campaign (numeric; 999 means client was not
previously contacted)
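
Because 999 is a sentinel rather than a real number of days, it may help to separate it from genuine values before modeling. The following is a minimal sketch on made-up toy data; the column names `was_contacted_before` and `pdays_clean` are hypothetical and not part of the dataset.

```python
import pandas as pd

# Toy data standing in for the real file; the values are made up for illustration.
toy = pd.DataFrame({"pdays": [999, 3, 999, 12]})

# Flag clients who were contacted in a previous campaign.
toy["was_contacted_before"] = toy["pdays"] != 999

# Keep real day counts and turn the 999 sentinel into NaN,
# so it is not treated as a very long gap of 999 days.
toy["pdays_clean"] = toy["pdays"].where(toy["pdays"] != 999)

print(toy)
```

Whether to keep the flag, the cleaned column, or both depends on the model; the sketch only shows how the sentinel could be isolated.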

Previous: number of contacts performed before this campaign and for
this client (numeric)

Poutcome: outcome of the previous marketing campaign (categorical:
'failure','nonexistent','success')

Social and economic context attributes:
Emp.var.rate: employment variation rate - quarterly indicator
(numeric)

Cons.price.idx: consumer price index - monthly indicator (numeric)

Cons.conf.idx: consumer confidence index - monthly indicator
(numeric)

Euribor3m: euribor 3 month rate - daily indicator (numeric)
Nr.employed: number of employees - quarterly indicator (numeric)

Output variable (desired target):
y - has the client subscribed a term deposit? (binary: 'yes', 'no')

In [None]:
import pandas as pd
import numpy as np 
import matplotlib.pyplot as plt
import plotly.express as px
import seaborn as sns

<h1>EDA</h1>

In [None]:
df = pd.read_csv("bank.csv",delimiter=";")
print(df.shape)

df.head()

In [None]:
df.info()

In [None]:
# Count the missing (NaN) values in each column
df.isna().sum()
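
According to the attribute description, missing categorical values are stored as the string 'unknown', so `isna()` will not count them. A minimal sketch of how they could be counted, using a made-up toy frame instead of the real file:

```python
import pandas as pd

# Toy data; the values are made up for illustration.
toy = pd.DataFrame({
    "job": ["admin.", "unknown", "services", "unknown"],
    "education": ["basic.4y", "unknown", "high.school", "university.degree"],
    "age": [31, 45, 52, 28],
})

# isna() reports nothing, because 'unknown' is an ordinary string.
print(toy.isna().sum())

# Count 'unknown' entries in every non-numeric column.
cat_cols = toy.select_dtypes(exclude=["number"]).columns
unknown_counts = (toy[cat_cols] == "unknown").sum()
print(unknown_counts)
```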

In [None]:
# Age distribution of the clients
fig = px.histogram(df, x="age", title="Age distribution of the clients", nbins=10)
fig.update_xaxes(title_text="Age")
fig.update_yaxes(title_text="Frequency")
fig.show()
# Most clients contacted in this campaign are between 30 and 50 years old

In [None]:
# Bin the age column into groups
df_age = df[["age", "y"]].rename(columns={"age": "Age"})
age_bins = [-np.inf, 30, 40, 50, 60, np.inf]
age_labels = ["<30 y_old", "[30-40[ y_old", "[40-50[ y_old", "[50-60[ y_old", ">=60 y_old"]
# right=False makes every bin closed on the left and open on the right, e.g. [30, 40)
df_age["Age"] = pd.cut(df_age["Age"], bins=age_bins, labels=age_labels, right=False).astype(str)

# Influence of the age group on the target
fig = px.bar(data_frame=df_age, x="Age", color="y", title="Influence of age on the target", barmode='group')
fig.update_xaxes(title_text="Age")
fig.update_yaxes(title_text="Frequency")
fig.show()
# Although clients older than 60 or younger than 30 are few, a notably high share of them subscribed to a term deposit
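
The remark about the youngest and oldest groups concerns proportions rather than counts, so it can be checked by computing the subscription rate per age group. A minimal sketch on made-up toy data (the ages and answers below are not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Toy data; the values are made up for illustration.
toy = pd.DataFrame({
    "age": [22, 28, 35, 37, 44, 51, 63, 70],
    "y":   ["yes", "no", "no", "no", "no", "yes", "yes", "no"],
})

# Left-closed bins: [-inf, 30), [30, 40), [40, 50), [50, 60), [60, inf)
bins = [-np.inf, 30, 40, 50, 60, np.inf]
labels = ["<30", "[30-40[", "[40-50[", "[50-60[", ">=60"]
toy["age_group"] = pd.cut(toy["age"], bins=bins, labels=labels, right=False)

# Share of 'yes' answers inside each age group.
rate = (toy["y"] == "yes").groupby(toy["age_group"], observed=False).mean()
print(rate)
```

On the real data, the same two steps applied to `df` would give the rate for each group, which is easier to compare than raw bar heights when the groups differ a lot in size.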

In [None]:
fig = px.bar(data_frame=df, x="marital", color="y", title="Influence of marital status on the target", barmode='group')
fig.update_xaxes(title_text="Marital Status")
fig.update_yaxes(title_text="Frequency")
fig.show()

In [None]:
# Grouped bar plots with Seaborn: counts per category, split by target
plt.figure(figsize=(20, 10))

for i, col in enumerate(["marital", "education", "default", "housing"], start=1):
    plt.subplot(2, 2, i)
    # Count the rows for each (category, target) pair
    table = df.groupby([col, "y"]).size().reset_index(name="Frequency")
    table = table.rename(columns={"y": "Target"})
    sns.barplot(data=table, x=col, y="Frequency", hue="Target")
    plt.title(f"Influence of {col}")
    plt.xlabel(col)
    plt.ylabel("Frequency")

In [None]:
#sns.pairplot(df,hue = "y")
# Grouped bar plots with Seaborn: counts per category, split by target
plt.figure(figsize=(20, 10))

for i, col in enumerate(["loan", "contact", "month", "poutcome"], start=1):
    plt.subplot(2, 2, i)
    # Count the rows for each (category, target) pair
    table = df.groupby([col, "y"]).size().reset_index(name="Frequency")
    table = table.rename(columns={"y": "Target"})
    sns.barplot(data=table, x=col, y="Frequency", hue="Target")
    plt.title(f"Influence of {col}")
    plt.xlabel(col)
    plt.ylabel("Frequency")


1 - The best months for this campaign are December, October, September, and March.
2 - When the previous campaign (poutcome) was a success, the probability of success in this campaign is also higher.
3 - Contact by telephone performs better.
4 - In general, having a personal loan affects the probability of success.
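
These observations compare success probabilities, which are easier to read from row-normalized tables than from raw counts. A minimal sketch with `pd.crosstab` on made-up toy data (the rows below are not taken from the dataset):

```python
import pandas as pd

# Toy data; the values are made up for illustration.
toy = pd.DataFrame({
    "poutcome": ["success", "success", "failure", "failure", "nonexistent", "nonexistent"],
    "y":        ["yes",     "yes",     "no",      "yes",     "no",          "no"],
})

# normalize="index" turns each row into proportions that sum to 1,
# so the 'yes' column is the subscription rate for that category.
rates = pd.crosstab(toy["poutcome"], toy["y"], normalize="index")
print(rates)
```

The same call with `df["month"]`, `df["contact"]`, or `df["loan"]` in place of the toy column would give the rates behind each of the four observations.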

In [None]:
# Drop the columns that directly influence the target
df = df.drop(columns=["duration","balance"])

<h1>Split</h1>

In [None]:
# Separate the features from the target
target = df["y"]
X = df.drop(columns="y")

In [None]:
X.head()
#target.value_counts()

In [None]:
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split

columns = X.columns
# Prepare the transformers
transform_scaler = StandardScaler()
transform_categorical = LabelEncoder()

# Encode the target as 0/1
income_tar = {"no": 0, "yes": 1}
target = target.replace(income_tar).astype("int")

# Encode every categorical column as integer codes
cols_cat = X.select_dtypes(include=["object"]).columns
for col in cols_cat:
    X[col] = transform_categorical.fit_transform(X[col])

# Standardize the features
X = transform_scaler.fit_transform(X)
X = pd.DataFrame(X, columns=columns)

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, target, test_size=0.2)

print('X_train:', X_train.shape)
print('y_train:', y_train.shape)
print('X_test:', X_test.shape)
print('y_test:', y_test.shape)
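
In the cell above the scaler is fitted on the full data before the split, so statistics from the test rows influence the training features. One possible alternative, sketched here on made-up toy data, is to split first and fit the preprocessing on the training part only. Note that this sketch swaps in `OneHotEncoder` for the notebook's `LabelEncoder` and adds `stratify` and `random_state`; those are assumptions for illustration, not the notebook's choices.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data; the values are made up for illustration.
toy_X = pd.DataFrame({
    "age": [25, 38, 47, 52, 61, 33, 44, 29],
    "marital": ["single", "married", "married", "divorced",
                "married", "single", "divorced", "single"],
})
toy_y = pd.Series([0, 0, 1, 0, 1, 0, 1, 0])

# Split first; stratify keeps the yes/no ratio similar in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    toy_X, toy_y, test_size=0.25, stratify=toy_y, random_state=0
)

# Scale numeric columns and one-hot encode categorical ones.
pre = ColumnTransformer([
    ("num", StandardScaler(), ["age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["marital"]),
])

# Fit on the training rows only, then reuse the fitted transform on the test rows.
X_tr_enc = pre.fit_transform(X_tr)
X_te_enc = pre.transform(X_te)

print(X_tr_enc.shape, X_te_enc.shape)
```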

<h1>Model</h1>

In [None]:
from keras import layers,callbacks
from keras.models import Sequential

# Model definition
module = Sequential()
module.add(layers.Dense(256,activation='relu',input_shape=(X_train.shape[1],)))
module.add(layers.Dense(64,activation='relu'))
module.add(layers.Dense(32,activation='relu'))
module.add(layers.Dense(1,activation="sigmoid"))
module.summary()



In [None]:
# Define EarlyStopping callback
early_stopping = callbacks.EarlyStopping(
    monitor='val_accuracy',  # Monitor validation accuracy
    patience=20,               # Number of epochs with no improvement after which training will be stopped
    min_delta=0.01,           # Minimum change to qualify as an improvement
    mode='max',               # 'max' means training will stop when the quantity monitored has stopped increasing
    verbose=1                 # Print messages about early stopping to the console
)

module.compile(optimizer="adam",loss="binary_crossentropy",metrics=['accuracy'])
module.fit(X_train,y_train,validation_data=(X_test,y_test),callbacks=[early_stopping],epochs=100)
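
To see how `patience` and `min_delta` interact, the stopping rule can be imitated in plain Python. This is a simplified sketch of the idea, not the Keras implementation; the function name `early_stopping_epoch` and the score lists are hypothetical.

```python
def early_stopping_epoch(val_scores, patience, min_delta):
    """Return the 1-based epoch at which training would stop, or None.

    A score counts as an improvement only if it beats the best
    score seen so far by more than min_delta (mode='max').
    """
    best = float("-inf")
    wait = 0
    for epoch, score in enumerate(val_scores, start=1):
        if score > best + min_delta:
            best = score
            wait = 0          # improvement: reset the counter
        else:
            wait += 1         # no sufficient improvement this epoch
            if wait >= patience:
                return epoch
    return None


# Gains smaller than min_delta do not reset the counter.
slow_gains = [0.80, 0.805, 0.806, 0.807]
print(early_stopping_epoch(slow_gains, patience=2, min_delta=0.01))

# Clear gains every epoch never trigger the stop.
clear_gains = [0.5, 0.6, 0.7]
print(early_stopping_epoch(clear_gains, patience=2, min_delta=0.01))
```

With `min_delta=0.01`, accuracy gains of less than one percentage point are ignored, so a slowly improving model can still be stopped once `patience` epochs have passed.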

In [None]:
# Evaluate the model on the test set
test_loss, test_acc = module.evaluate(X_test, y_test, verbose=2)
test_loss, test_acc

In [None]:
#predictions
preds = module.predict(X_test).reshape(-1,)
binary_preds = (preds > 0.5).astype(int)
print(binary_preds.shape)



In [None]:
# Predictions vs. y_test

data = {
    "y_test" : y_test,
    "predictions" : binary_preds
}
data = pd.DataFrame(data)
data
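
Comparing the two columns row by row is hard to read, and accuracy alone can look good when most clients answer 'no'. A confusion matrix with precision and recall summarizes the comparison. A minimal sketch with scikit-learn on made-up labels (not the model's actual output):

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

# Made-up labels: 8 clients said 'no' (0) and 2 said 'yes' (1).
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 0])

# Rows are the true classes, columns are the predicted classes.
cm = confusion_matrix(y_true, y_pred)
accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)  # share of predicted 'yes' that are correct
recall = recall_score(y_true, y_pred)        # share of true 'yes' that were found

print(cm)
print(accuracy, precision, recall)
```

In this toy case the accuracy is 0.8 while only half of the actual subscribers are found, which is the kind of gap these extra metrics are meant to expose. On the notebook's data the same calls would take `y_test` and `binary_preds`.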