![o](https://www.appletechsoft.com/wp-content/uploads/2021/05/AI-Machine-Learning-are-Transforming-the-Banking-Industry.jpg)

### Problem Definition:

The objective is to analyze the dataset to find insights and develop strategies to improve the effectiveness of future marketing campaigns for a financial institution. Specifically, we aim to identify patterns and factors influencing whether clients subscribe to a term deposit account. By understanding these patterns, the bank can tailor its marketing strategies to target the right audience segments and enhance campaign performance.
<hr/>

**Term deposit**

A term deposit, also called a fixed deposit or time deposit, is a type of investment offered by banks and financial institutions. The client deposits a sum of money with the bank for a fixed period, known as the term or maturity period, and the money earns a fixed interest rate for that whole period.
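
To make the "fixed interest rate" concrete, here is a minimal Python sketch of a deposit's value at maturity. It assumes interest is compounded `compounds_per_year` times a year and reinvested until the end of the term. Real products may instead pay simple interest or credit it on other schedules, and the amount and rate used here are hypothetical.

```python
def maturity_value(principal: float, annual_rate: float, years: float,
                   compounds_per_year: int = 1) -> float:
    """Value at maturity of a deposit earning a fixed, compounded rate.

    Assumption: interest is credited `compounds_per_year` times per year
    and reinvested until the end of the term.
    """
    periods = compounds_per_year * years
    rate_per_period = annual_rate / compounds_per_year
    return principal * (1 + rate_per_period) ** periods


# Hypothetical deposit: 1,000 for a 2-year term at a fixed 5% per year
annual = maturity_value(1000, 0.05, 2)        # interest credited once a year
monthly = maturity_value(1000, 0.05, 2, 12)   # interest credited monthly
print(round(annual, 2), round(monthly, 2))
```

More frequent crediting yields slightly more at maturity for the same nominal rate, because earlier interest also earns interest.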



### Dataset Overview:
The dataset contains information collected during the bank's marketing campaigns. It includes various features related to bank clients, their interactions with the bank, and the outcomes of previous marketing efforts. The target variable indicates whether a client has subscribed to a term deposit account.


### Description of Columns:
1. **Age**: Numeric feature representing the age of the bank client.
2. **Job**: Categorical feature indicating the type of job the client has.
3. **Marital**: Categorical feature indicating the marital status of the client.
4. **Education**: Categorical feature representing the educational level of the client.
5. **Default**: Categorical feature indicating whether the client has credit in default.
6. **Housing**: Categorical feature indicating whether the client has a housing loan.
7. **Loan**: Categorical feature indicating whether the client has a personal loan.
8. **Balance**: Numeric feature representing the client's account balance.
9. **Contact**: Categorical feature indicating the communication type used to contact the client.
10. **Month**: Categorical feature indicating the month of the last contact.
11. **Day**: Numeric feature indicating the day of the month of the last contact.
12. **Duration**: Numeric feature representing the duration of the last contact in seconds.
13. **Campaign**: Numeric feature representing the number of contacts performed during the current campaign for this client.
14. **Pdays**: Numeric feature representing the number of days since the client was last contacted in a previous campaign (-1 means the client was not previously contacted).
15. **Previous**: Numeric feature representing the number of contacts performed before the current campaign for this client.
16. **Poutcome**: Categorical feature representing the outcome of the previous marketing campaign.
17. **Deposit (Target)**: Binary feature indicating whether the client subscribed to a term deposit.
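
The `pdays` column mixes two kinds of information: whether a client was contacted in an earlier campaign at all, and if so, how long ago. One option is to split these apart into an explicit flag. The sketch below assumes, as in the UCI bank-marketing data, that `pdays == -1` marks clients who were never contacted before; the `previously_contacted` column name and the sample rows are hypothetical.

```python
import pandas as pd


def add_contact_history_flag(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `frame` with a boolean `previously_contacted` column.

    Assumption: pdays == -1 is the sentinel for "not contacted in a previous campaign".
    """
    out = frame.copy()
    out["previously_contacted"] = out["pdays"] != -1
    return out


# Synthetic rows for illustration (not taken from bank.csv)
sample = pd.DataFrame({
    "pdays": [-1, 92, -1, 5],
    "poutcome": ["unknown", "failure", "unknown", "success"],
})
flagged = add_contact_history_flag(sample)
print(flagged)
```

On the notebook's data this would be applied as `df = add_contact_history_flag(df)` before the encoding and scaling steps.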

### Methodology

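We train and compare the following models:
- Linear Regression (with predictions thresholded at 0.5, as a baseline)
- Random Forest
- Logistic Regression
- Linear Support-Vector Classifier (Linear SVC)
- Convolutional Neural Network (CNN)

Each model is evaluated on a held-out test set using:
- accuracy
- precision
- recall
- F1-score
- ROC-AUC

This data-driven approach helps the bank tailor future campaigns to the client segments most likely to subscribe.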
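
Accuracy, precision, recall and F1 all come from the four confusion-matrix counts (true/false positives and negatives). The minimal sketch below computes them by hand on hypothetical labels and checks them against scikit-learn. ROC-AUC is different: it needs continuous scores rather than 0/1 predictions.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Hypothetical labels (1 = deposit, 0 = no deposit) and predictions
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_hat = np.array([1, 0, 0, 1, 0, 1, 1, 1])

# For binary 0/1 labels sklearn lays the matrix out as [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # of clients predicted to subscribe, share who did
recall = tp / (tp + fn)      # of clients who subscribed, share we caught
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, f1)
```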

# Import Libraries

In [None]:
import numpy as np 
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.svm import SVC,LinearSVC
from tensorflow.keras.models import Sequential
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import ConfusionMatrixDisplay
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.linear_model import LinearRegression,LogisticRegression
from tensorflow.keras.layers import Dense, Dropout, Conv2D, MaxPooling2D, Flatten
from sklearn.metrics import mean_squared_error,accuracy_score,f1_score,precision_score,recall_score,confusion_matrix,classification_report,roc_curve,auc
%matplotlib inline
import warnings
warnings.filterwarnings("ignore")


#### Data Loading and Exploration

In [None]:
df=pd.read_csv('/kaggle/input/bank-marketing-dataset/bank.csv')
df.head()

In [None]:
df.tail()

In [None]:
df.shape

In [None]:
df.info()

In [None]:
df.describe()

In [None]:
df.sample(5)

#### Missing Values

In [None]:
null_counts = df.isnull().sum()
print(null_counts)

There are no missing values in any column.
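
`isnull()` only counts real missing values (NaN/None). Categorical columns can also hide missing information behind a placeholder string; in the UCI bank-marketing data that placeholder is `"unknown"`, so it is worth checking whether this copy uses it too. A minimal sketch on a synthetic frame; on the notebook's data the call would be `count_placeholder(df)`.

```python
import pandas as pd


def count_placeholder(frame: pd.DataFrame, token: str = "unknown") -> pd.Series:
    """Count cells equal to `token` in each non-numeric column (zero counts dropped)."""
    text_cols = frame.select_dtypes(exclude="number")
    counts = (text_cols == token).sum()
    return counts[counts > 0].sort_values(ascending=False)


# Synthetic example, not rows from bank.csv
sample = pd.DataFrame({
    "job": ["admin.", "unknown", "technician"],
    "education": ["unknown", "unknown", "tertiary"],
    "age": [30, 41, 52],
})
result = count_placeholder(sample)
print(result)
```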

# Exploratory Data Analysis (EDA)

In [None]:
df.head()

Number of clients with and without a personal loan

In [None]:
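counts = df["loan"].value_counts()
print(counts)
# take the labels from value_counts itself so each bar matches its category
plt.bar(counts.index, counts.values)
# set x/y labels and plot title
plt.xlabel("loan")
plt.ylabel("Number of clients")
plt.title("Clients with a personal loan")
plt.show()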
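
Raw counts show how many clients hold a loan, but not whether loan holders subscribe less often. Normalizing a cross-tabulation by row gives the subscription rate within each group. A minimal sketch on synthetic rows; with the notebook's data (before the label-encoding step) the call would be `subscription_rate(df, "loan")`.

```python
import pandas as pd


def subscription_rate(frame: pd.DataFrame, feature: str, target: str = "deposit") -> pd.DataFrame:
    """Share of each target class within every level of `feature` (each row sums to 1)."""
    return pd.crosstab(frame[feature], frame[target], normalize="index")


# Synthetic rows for illustration, not taken from bank.csv
sample = pd.DataFrame({
    "loan": ["no", "no", "yes", "no", "yes", "no"],
    "deposit": ["yes", "no", "no", "yes", "no", "no"],
})
rates = subscription_rate(sample, "loan")
print(rates)
```

The same helper works for any categorical column, such as `job`, `housing` or `poutcome`.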

In [None]:
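# pairplot draws its own figure, so put the title on that figure instead of the last subplot
g = sns.pairplot(df.select_dtypes(exclude="object"))
g.figure.suptitle('Pair Plot of Numerical Variables', y=1.02)
plt.show()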

In [None]:
plt.figure(figsize=(8, 5))
plt.title('Job vs Deposit')
sns.countplot(x='job', hue='deposit', data=df)
plt.xticks(rotation=70)
plt.yticks([])
plt.legend(title='Deposit?', ncol=1, fancybox=True, shadow=True)
plt.show()

In [None]:
plt.figure(figsize=(10, 6))
job_counts = df['job'].value_counts()
job_counts.plot(kind='bar', color='lightgreen')
plt.xlabel('Job')
plt.ylabel('Count')
plt.title('Bar Plot of Job Distribution')
plt.xticks(range(len(job_counts)), job_counts.index, rotation=45)
plt.show()

In [None]:
plt.figure(figsize=(10, 6))
sns.violinplot(data=df, x='education', y='balance')
plt.xlabel('Education')
plt.ylabel('Balance')
plt.title('Violin Plot of Balance by Education')
plt.show()

In [None]:
plt.figure(figsize=(8, 6))
df['deposit'].value_counts().plot(kind='bar', color=['blue', 'green'])
plt.xlabel('Deposit')
plt.ylabel('Count')
plt.title('Bar Plot of Deposit')
plt.xticks(rotation=0)
plt.show()

In [None]:
plt.figure(figsize=(8, 6))
sns.boxplot(data=df, x='marital', y='age')
plt.xlabel('Marital Status')
plt.ylabel('Age')
plt.title('Box Plot of Age by Marital Status')
plt.show()

In [None]:
df2 = df.copy()
fig = plt.figure(figsize=(12, 8))
# Encode the target as 0/1 so it is included in the correlation matrix
df2['deposit'] = LabelEncoder().fit_transform(df2['deposit'])

numeric_df = df2.select_dtypes(exclude="object")
corr = numeric_df.corr()

sns.heatmap(corr, cbar=True)
plt.title("Correlation Matrix", fontsize=16)
plt.show()
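
A heatmap of every pair can be hard to read; for this problem the row that matters most is each feature's correlation with the target. A minimal sketch that ranks numeric columns by the absolute value of that correlation, on synthetic data where `duration` drives the target by construction; with the notebook's frame the call would be `target_correlations(df2, "deposit")`. Pearson correlation only measures linear association, so a low value does not rule out a useful feature.

```python
import numpy as np
import pandas as pd


def target_correlations(frame: pd.DataFrame, target: str) -> pd.Series:
    """Pearson correlation of each numeric column with `target`, strongest (by |r|) first."""
    corr = frame.select_dtypes(include="number").corr()[target].drop(target)
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


# Synthetic data: the target depends on duration only (an assumption for the demo)
rng = np.random.default_rng(0)
n = 500
label = rng.integers(0, 2, n)
demo = pd.DataFrame({
    "age": rng.normal(40, 10, n),
    "duration": label * 200 + rng.normal(0, 50, n),
    "deposit": label,
})
ranked = target_correlations(demo, "deposit")
print(ranked)
```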

# Prepare Data for Modeling

In [None]:
df.head(2)

## Separate categorical and numerical columns

In [None]:
categorical_columns = df.select_dtypes(include=['object']).columns
numerical_columns = df.select_dtypes(exclude=['object']).columns

In [None]:
df[numerical_columns].head(3)

In [None]:
df[categorical_columns].head(3)

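## Convert categorical columns to numerical

Apply label encoding to the categorical columns. Because `deposit` is also stored as text, this step encodes the target as well (`LabelEncoder` assigns codes in sorted order, so `no` becomes 0 and `yes` becomes 1).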

In [None]:
label_encoder = LabelEncoder()
for column in categorical_columns:
    df[column] = label_encoder.fit_transform(df[column])
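
`LabelEncoder` assigns integers in alphabetical order, which gives nominal features such as `job` or `marital` an ordering with no real meaning, and linear models (Linear/Logistic Regression, Linear SVC) will treat that order as if it were real. One-hot encoding is a common alternative, swapped in here for illustration; it would be applied to the original text columns in place of the loop above. A minimal sketch with `pd.get_dummies` on synthetic rows:

```python
import pandas as pd

# Synthetic rows, not taken from bank.csv
sample = pd.DataFrame({
    "job": ["admin.", "technician", "services", "admin."],
    "marital": ["married", "single", "married", "divorced"],
    "age": [30, 41, 52, 28],
})

# One 0/1 column per category; drop_first removes one redundant column per feature
encoded = pd.get_dummies(sample, columns=["job", "marital"], drop_first=True, dtype=int)
print(encoded.columns.tolist())
print(encoded)
```

Tree-based models such as Random Forest are usually less sensitive to this choice than linear ones.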

Standardize the numerical columns to zero mean and unit variance

In [None]:
scaler = StandardScaler()
df[numerical_columns] = scaler.fit_transform(df[numerical_columns])
df.head(3)
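
Fitting `StandardScaler` on all rows before `train_test_split`, as above, lets test-set statistics (means and standard deviations) influence preprocessing, a mild form of data leakage. One way to avoid it is to wrap preprocessing and the model in a scikit-learn `Pipeline`, so every transformer is fit on the training rows only. The sketch runs on synthetic data rather than bank.csv; the column subset, the synthetic target and the use of `OneHotEncoder` are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for a few bank.csv columns (random values)
rng = np.random.default_rng(42)
n = 400
data = pd.DataFrame({
    "age": rng.integers(18, 90, n),
    "balance": rng.normal(1500, 3000, n),
    "duration": rng.exponential(300, n),
    "job": rng.choice(["admin.", "technician", "services"], n),
    "housing": rng.choice(["yes", "no"], n),
})
target = (data["duration"] + rng.normal(0, 100, n) > 300).astype(int)

num_cols = ["age", "balance", "duration"]
preprocess = ColumnTransformer([
    ("num", StandardScaler(), num_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["job", "housing"]),
])
clf = Pipeline([("prep", preprocess), ("model", LogisticRegression(max_iter=1000))])

X_tr, X_te, y_tr, y_te = train_test_split(
    data, target, test_size=0.2, random_state=42, stratify=target)
clf.fit(X_tr, y_tr)  # scaler and encoder statistics come from X_tr only
print("test accuracy:", clf.score(X_te, y_te))
```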

# Data Splitting

Split the data into features (X) and the target variable (y)

In [None]:
X = df.drop(columns=['deposit'])  
y = df['deposit']

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
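
`train_test_split` shuffles rows at random, so the share of subscribers can differ slightly between the training and test sets. Passing `stratify=y` keeps the class ratio the same in both splits, which matters more the more imbalanced the target is. A minimal sketch on synthetic labels with a hypothetical 30% positive rate:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic features and imbalanced labels: 300 positives, 700 negatives
rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 3))
labels = np.array([1] * 300 + [0] * 700)

X_tr, X_te, y_tr, y_te = train_test_split(
    features, labels, test_size=0.2, random_state=42, stratify=labels)

# Positive rate in each split
print(y_tr.mean(), y_te.mean())
```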

In [None]:
print('train features shape: ',X_train.shape)
print('train target   shape: ',y_train.shape)
print('__________\n')
print('test  features shape: ',X_test.shape)
print('test  target   shape: ',y_test.shape)

# Modeling

## Linear Regression

### Training

In [None]:
LR=LinearRegression()
LR.fit(X_train,y_train)

### Evaluation

In [None]:
y_pred=LR.predict(X_test)
y_pred

In [None]:
y_test

In [None]:
threshold = 0.5
y_pred = np.where(y_pred >= threshold, 1, 0)
print(classification_report(y_test,y_pred,target_names=['not deposit','deposit']))

In [None]:
LR.coef_

In [None]:
LR.intercept_
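
The raw `coef_` array is easier to read when each weight is paired with its feature name and sorted by magnitude, e.g. `pd.Series(LR.coef_, index=X.columns)`. A minimal sketch on synthetic data where `duration` has the largest true effect by construction. Magnitudes are only comparable across features on the same scale; in this notebook the numerical columns are standardized but the label-encoded categorical columns are not.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic standardized features; true effects are 0.8 (duration), 0 (balance), -0.1 (age)
rng = np.random.default_rng(1)
X_demo = pd.DataFrame(rng.normal(size=(200, 3)), columns=["duration", "balance", "age"])
y_demo = 0.8 * X_demo["duration"] - 0.1 * X_demo["age"] + rng.normal(0, 0.1, 200)

reg = LinearRegression().fit(X_demo, y_demo)
# Pair each coefficient with its column name, largest absolute effect first
coefs = pd.Series(reg.coef_, index=X_demo.columns).sort_values(key=np.abs, ascending=False)
print(coefs)
```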

Create a table to store each model's evaluation results

In [None]:
results = pd.DataFrame(columns=['Model Name','Accuracy','Precision Score','Recall Score','F1-Score','roc_auc'])

### Confusion Matrix

In [None]:
cm_matrix = confusion_matrix(y_test, y_pred)
cm_display = ConfusionMatrixDisplay(confusion_matrix=cm_matrix, display_labels=['Not Deposit', 'Deposit'])
cm_display.plot()
plt.title("Confusion Matrix of Linear Regression Classifier")
plt.show()

In [None]:
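# Compute the ROC curve for the positive class (deposit).
# Use the raw regression output as the score: the thresholded 0/1 labels give only a single operating point.
fpr, tpr, _ = roc_curve(y_test, LR.predict(X_test))
roc_auc = auc(fpr, tpr)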

# Plot ROC curve
plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr, color='darkorange', lw=2, label='ROC curve (area = %0.2f)' % roc_auc)
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC)')
plt.legend(loc="lower right")
plt.show()
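
`roc_curve` sweeps a threshold over whatever scores it receives. Given hard 0/1 predictions it can only find one operating point, so the "curve" is two straight segments and the AUC reduces to (TPR + 1 - FPR) / 2 at that single point. Continuous scores, such as `predict_proba(...)[:, 1]`, `decision_function(...)` or a regression's raw output, trace the full curve. A minimal comparison on synthetic scores:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(7)
y_true = rng.integers(0, 2, 500)
# Hypothetical model scores: positives score higher on average
scores = y_true + rng.normal(0, 0.8, 500)
hard = (scores >= 0.5).astype(int)  # what .predict() would return at a 0.5 cut-off

_, _, thr_hard = roc_curve(y_true, hard)
_, _, thr_soft = roc_curve(y_true, scores)
auc_hard = roc_auc_score(y_true, hard)
auc_soft = roc_auc_score(y_true, scores)

print(f"hard labels: {len(thr_hard)} thresholds, AUC = {auc_hard:.3f}")
print(f"scores:      {len(thr_soft)} thresholds, AUC = {auc_soft:.3f}")
```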

In [None]:
model_result = ['Linear Regression',accuracy_score(y_test,y_pred), 
              precision_score(y_test,y_pred), recall_score(y_test,y_pred),
              f1_score(y_test,y_pred),roc_auc]
results.loc[len(results)]=model_result
results

## Random Forest Classifier

### Model Training

In [None]:
RF = RandomForestClassifier(n_estimators=10, criterion='entropy', random_state=42)
RF.fit(X_train, y_train)
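
A fitted `RandomForestClassifier` exposes impurity-based `feature_importances_`, which sum to 1 and can be paired with column names, e.g. `pd.Series(RF.feature_importances_, index=X_train.columns)`. A minimal sketch on synthetic data where only `duration` carries signal by construction. Impurity-based importances can overstate features with many distinct values; `sklearn.inspection.permutation_importance` on the test set is a common cross-check.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: only duration influences the label
rng = np.random.default_rng(3)
X_demo = pd.DataFrame(rng.normal(size=(500, 3)), columns=["duration", "balance", "age"])
y_demo = (X_demo["duration"] + rng.normal(0, 0.3, 500) > 0).astype(int)

forest = RandomForestClassifier(n_estimators=50, criterion="entropy", random_state=42)
forest.fit(X_demo, y_demo)

importances = pd.Series(forest.feature_importances_, index=X_demo.columns).sort_values(ascending=False)
print(importances)
```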

### Evaluation

In [None]:
y_pred=RF.predict(X_test)
y_pred

In [None]:
print(classification_report(y_test,y_pred,target_names=['not deposit','deposit']))

### Confusion Matrix

In [None]:
cm_matrix = confusion_matrix(y_test, y_pred)
cm_display = ConfusionMatrixDisplay(confusion_matrix=cm_matrix, display_labels=['Not Deposit', 'Deposit'])
cm_display.plot()
plt.title("Confusion Matrix of Random Forest Classifier")
plt.show()

In [None]:
# Compute the ROC curve for the positive class (deposit) from predicted probabilities, not hard 0/1 labels
fpr, tpr, _ = roc_curve(y_test, RF.predict_proba(X_test)[:, 1])
roc_auc = auc(fpr, tpr)

# Plot ROC curve
plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr, color='darkorange', lw=2, label='ROC curve (area = %0.2f)' % roc_auc)
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC)')
plt.legend(loc="lower right")
plt.show()

### Save Results

In [None]:
model_result = ['Random Forest',accuracy_score(y_test,y_pred), 
              precision_score(y_test,y_pred), recall_score(y_test,y_pred),
              f1_score(y_test,y_pred),roc_auc]
results.loc[len(results)]=model_result
results

## Logistic Regression

### Model Training

In [None]:
Logistic_regression = LogisticRegression(penalty='l2', dual=False, C=0.9, fit_intercept=True,
                                         random_state=41, max_iter=1000)
Logistic_regression.fit(X_train, y_train)
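
Logistic-regression coefficients are changes in the log-odds of subscribing, so `np.exp(coef)` turns them into odds ratios: the factor by which the odds are multiplied per one-unit increase in a feature (one standard deviation for the standardized columns). A minimal sketch on synthetic data with known true coefficients (+1.5 for `duration`, -0.5 for `age`); with the notebook's model the equivalent is `np.exp(Logistic_regression.coef_[0])`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Synthetic data drawn from a known logistic model
rng = np.random.default_rng(5)
X_demo = pd.DataFrame(rng.normal(size=(2000, 2)), columns=["duration", "age"])
true_logit = 1.5 * X_demo["duration"] - 0.5 * X_demo["age"]
y_demo = (rng.random(2000) < 1 / (1 + np.exp(-true_logit))).astype(int)

logit = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)
# > 1 means the feature raises the odds of subscribing, < 1 means it lowers them
odds_ratios = pd.Series(np.exp(logit.coef_[0]), index=X_demo.columns)
print(odds_ratios)
```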

### Model Evaluation

In [None]:
y_pred=Logistic_regression.predict(X_test)
y_pred

In [None]:
print(classification_report(y_test,y_pred,target_names=['not deposit','deposit']))

In [None]:
cm_matrix = confusion_matrix(y_test, y_pred)
cm_display = ConfusionMatrixDisplay(confusion_matrix=cm_matrix, display_labels=['Not Deposit', 'Deposit'])
cm_display.plot()
plt.title("Confusion Matrix of Logistic Regression Classifier")
plt.show()

In [None]:
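# Compute the ROC curve for the positive class (deposit) from predicted probabilities, not hard 0/1 labels
fpr, tpr, _ = roc_curve(y_test, Logistic_regression.predict_proba(X_test)[:, 1])
roc_auc = auc(fpr, tpr)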

# Plot ROC curve
plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr, color='darkorange', lw=2, label='ROC curve (area = %0.2f)' % roc_auc)
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC)')
plt.legend(loc="lower right")
plt.show()

Save the results

In [None]:
model_result = ['Logistic Regression',accuracy_score(y_test,y_pred), 
              precision_score(y_test,y_pred), recall_score(y_test,y_pred),
              f1_score(y_test,y_pred),roc_auc]
results.loc[len(results)]=model_result
results

## Linear Support-Vector Classifier

### Model Training

In [None]:
LSVC = LinearSVC(penalty='l2',dual=False, C=.9)
LSVC.fit(X_train,y_train)

### Model Evaluation

In [None]:
y_pred=LSVC.predict(X_test)
y_pred

In [None]:
print(classification_report(y_test,y_pred,target_names=['not deposit','deposit']))

In [None]:
cm_matrix = confusion_matrix(y_test, y_pred)
cm_display = ConfusionMatrixDisplay(confusion_matrix=cm_matrix, display_labels=['Not Deposit', 'Deposit'])
cm_display.plot()
plt.title("Confusion Matrix of Linear SVC Classifier")
plt.show()

In [None]:
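# Compute the ROC curve for the positive class (deposit).
# LinearSVC has no predict_proba; its signed distance to the hyperplane works as a ranking score.
fpr, tpr, _ = roc_curve(y_test, LSVC.decision_function(X_test))
roc_auc = auc(fpr, tpr)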

# Plot ROC curve
plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr, color='darkorange', lw=2, label='ROC curve (area = %0.2f)' % roc_auc)
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC)')
plt.legend(loc="lower right")
plt.show()

In [None]:
model_result = ['Linear SVC',accuracy_score(y_test,y_pred), 
              precision_score(y_test,y_pred), recall_score(y_test,y_pred),
              f1_score(y_test,y_pred),roc_auc]
results.loc[len(results)]=model_result
results

## CNN

### Convert data to images
CNNs are designed for image-like inputs, so we reshape each row of features into a 1 × N_features single-channel "image".

In [None]:
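# Min-max normalize the features only; normalizing `df` would leak the target column `deposit` into the inputs
X_normalized = (X - X.min()) / (X.max() - X.min())

X_array = X_normalized.values

N_samples, N_features = X_array.shape
height = 1
width = N_features
channels = 1
# Conv2D expects input of shape (samples, height, width, channels)
X_reshaped = X_array.reshape(N_samples, height, width, channels)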
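
The normalization and reshaping steps above can be wrapped in a small helper that also guards against constant columns, where max - min is 0 and plain min-max scaling would divide by zero. A minimal sketch; the function name is hypothetical, and in a stricter setup the minimum and maximum would be computed on the training rows only and reused for the validation and test data.

```python
import numpy as np
import pandas as pd


def rows_to_images(features: pd.DataFrame) -> np.ndarray:
    """Min-max scale each column to [0, 1], then reshape rows to (1, n_features, 1) images.

    `features` must not contain the target column.
    """
    col_min = features.min()
    span = (features.max() - col_min).replace(0, 1)  # constant columns map to 0 instead of NaN
    scaled = (features - col_min) / span
    values = scaled.to_numpy(dtype="float32")
    n_samples, n_features = values.shape
    # Conv2D expects (samples, height, width, channels)
    return values.reshape(n_samples, 1, n_features, 1)


# Synthetic features; "flag" is constant on purpose
demo = pd.DataFrame({
    "age": [25, 40, 55, 70],
    "balance": [0.0, 500.0, 1000.0, 2000.0],
    "flag": [1, 1, 1, 1],
})
images = rows_to_images(demo)
print(images.shape)
```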


In [None]:
img=X_reshaped[3]
plt.imshow(img)
plt.show()

### Split Data


In [None]:
X_train, X_test, y_train, y_test = train_test_split(X_reshaped, y, test_size=0.2, random_state=42)
X_train, X_valid, y_train, y_valid = train_test_split(X_train, y_train, test_size=0.15, random_state=42)

In [None]:
print('train features shape: ',X_train.shape)
print('train target   shape: ',y_train.shape)
print('__________\n')
print('valid  features shape: ',X_valid.shape)
print('valid  target   shape: ',y_valid.shape)
print('__________\n')
print('test  features shape: ',X_test.shape)
print('test  target   shape: ',y_test.shape)

In [None]:
model = Sequential([
    Conv2D(2, (3, 3), activation='relu', input_shape=(height, width, channels), padding='same'),
    MaxPooling2D((1, 2)),
    Flatten(),
    Dense(6, activation='relu'),
    Dropout(.1),
    Dense(1, activation='sigmoid') 
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()
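
As a sanity check on `model.summary()`, the parameter count can be worked out by hand. The sketch below assumes the architecture above: one input channel, `padding='same'` in the convolution, and Keras' default `'valid'` padding with stride equal to the pool size in `MaxPooling2D`. For 16 input features it gives 129 trainable parameters. The arithmetic also exposes a quirk: because the input is only one row high, `padding='same'` means the top and bottom rows of each 3×3 kernel always multiply zero padding, so only the middle row (6 of the 18 convolution weights) ever sees real data; a `(1, 3)` kernel would express the same model more directly.

```python
def cnn_param_count(n_features: int, filters: int = 2, kernel: tuple = (3, 3),
                    pool_width: int = 2, hidden: int = 6) -> int:
    """Trainable parameters of the Conv2D -> MaxPooling2D -> Dense(hidden) -> Dense(1)
    stack for an input of shape (1, n_features, 1)."""
    kh, kw = kernel
    in_channels = 1
    conv = kh * kw * in_channels * filters + filters   # kernel weights + one bias per filter
    # 'same' padding keeps the width; 'valid' pooling with stride == pool size then shrinks it
    pooled_width = (n_features - pool_width) // pool_width + 1
    flat = 1 * pooled_width * filters                   # height stays 1 after pooling
    dense_hidden = flat * hidden + hidden               # weights + biases
    dense_out = hidden * 1 + 1                          # single sigmoid unit
    return conv + dense_hidden + dense_out              # Flatten and Dropout add no parameters


print(cnn_param_count(16))
```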

In [None]:
# Train Model
history = model.fit(X_train, y_train, epochs=10, validation_data=(X_valid,y_valid))

### Evaluate Model

In [None]:

test_loss, test_acc = model.evaluate(X_test, y_test)
print('Test Accuracy:', test_acc)


In [None]:
y_pred=model.predict(X_test)
print(y_pred.shape)
y_pred[:5]

In [None]:
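threshold = 0.5
# model.predict returns probabilities of shape (n, 1); threshold them and flatten to 1-D labels
y_pred = np.where(y_pred >= threshold, 1, 0).ravel()
print(classification_report(y_test, y_pred, target_names=['not deposit', 'deposit']))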
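
The 0.5 cut-off is a default, not a requirement. If the bank cares more about catching subscribers (recall) or about not wasting calls (precision), the threshold can be chosen on the validation set, never the test set, for example to maximize F1. A minimal sketch using `precision_recall_curve` on synthetic scores; the helper name is hypothetical, and with the notebook's CNN the inputs would be `y_valid` and `model.predict(X_valid).ravel()`.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve


def best_f1_threshold(y_true, scores):
    """Return (threshold, f1) for the cut-off that maximizes F1 on the given labels and scores."""
    precision, recall, thresholds = precision_recall_curve(y_true, scores)
    # precision and recall have one extra final entry (recall 0) with no threshold; drop it
    p, r = precision[:-1], recall[:-1]
    f1 = np.divide(2 * p * r, p + r, out=np.zeros_like(p), where=(p + r) > 0)
    best = int(np.argmax(f1))
    return float(thresholds[best]), float(f1[best])


# Synthetic validation labels and scores (placeholders for y_valid and the model's probabilities)
rng = np.random.default_rng(11)
y_valid_demo = rng.integers(0, 2, 400)
scores_demo = np.clip(0.3 * y_valid_demo + rng.normal(0.35, 0.2, 400), 0, 1)

threshold, best_f1 = best_f1_threshold(y_valid_demo, scores_demo)
print(f"best threshold = {threshold:.3f}, validation F1 = {best_f1:.3f}")
```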

In [None]:
cm_matrix = confusion_matrix(y_test, y_pred)
cm_display = ConfusionMatrixDisplay(confusion_matrix=cm_matrix, display_labels=['Not Deposit', 'Deposit'])
cm_display.plot()
plt.title("Confusion Matrix of CNN")
plt.show()

In [None]:
# Compute the ROC curve for the positive class (deposit) from the predicted probabilities, not the thresholded labels
fpr, tpr, _ = roc_curve(y_test, model.predict(X_test).ravel())
roc_auc = auc(fpr, tpr)

# Plot ROC curve
plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr, color='darkorange', lw=2, label='ROC curve (area = %0.2f)' % roc_auc)
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC)')
plt.legend(loc="lower right")
plt.show()

In [None]:
model_result = ['CNN',accuracy_score(y_test,y_pred), 
              precision_score(y_test,y_pred), recall_score(y_test,y_pred),
              f1_score(y_test,y_pred),roc_auc]
results = results.drop(results[results['Model Name'] == 'CNN'].index, errors='ignore')
results.loc[len(results)]=model_result
results

In [None]:
model.save('cnn_model.h5')

# Evaluation and Comparison

In [None]:
results

### Accuracy Comparison

In [None]:
plt.subplots(figsize=(5, 4))
sns.barplot(x="Model Name", y="Accuracy", data=results)
plt.xticks(rotation=90)
plt.title('Test Accuracy by Model')
plt.show()
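
Accuracy alone can hide differences in precision and recall. One option is to plot every metric side by side by setting the model name as the index and letting pandas draw grouped bars. A minimal sketch with placeholder numbers (not this notebook's results). With the real table the call would be `results.set_index('Model Name').astype(float).plot(kind='bar')`; the `astype(float)` is there because a frame built row by row from an empty `pd.DataFrame(columns=...)` may hold its scores as `object` dtype, which pandas refuses to plot.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder scores for illustration only
demo_results = pd.DataFrame({
    "Model Name": ["Model A", "Model B", "Model C"],
    "Accuracy": [0.80, 0.85, 0.83],
    "Recall Score": [0.70, 0.82, 0.88],
    "roc_auc": [0.79, 0.90, 0.86],
})

# Rank by ROC-AUC so the best overall ranker appears first
table = demo_results.set_index("Model Name").sort_values("roc_auc", ascending=False)
ax = table.plot(kind="bar", figsize=(8, 4), rot=0)
ax.set_ylabel("Score")
ax.set_ylim(0, 1)
ax.set_title("Test-Set Metrics by Model")
ax.legend(loc="lower right")
plt.tight_layout()
plt.show()
```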