# __Portuguese Bank Marketing__
### Predicting customer response based on telemarketing data

## ___Table of Contents:___

1. [__Problem Statement__](#Problem)
2. [__Data Cleaning__](#DataCleaning)
3. [__Exploratory Data Analysis__]()
4. [__Data Visualization__]()
5. [__Model Selection__]()
6. [__Model Evaluation__]()
7. [__Hyperparameter Tuning__]()
8. [__Conclusion & Further Research Recommendations__]()

### __Part 1. Problem Statement__

- The dataset records every call made during a Portuguese bank's marketing campaign for its __term deposit__ service. The task is to predict, from the features provided in the dataset, whether a client will accept or decline the term deposit offered by the bank.

- The dataset records the following features:

| feature          | brief                                             | type        |
| ---------------- | ------------------------------------------------- | ----------- |
| age              | age                                               | numeric     |
| job              | type of job                                       | categorical |
| marital          | marital status                                    | categorical |
| education        | education level                                   | categorical |
| default          | has credit in default ?                           | categorical |
| housing          | has housing loan ?                                | categorical |
| loan             | has personal loan ?                               | categorical |
| contact          | contact communication type                        | categorical |
| month            | last contact month of the year                    | categorical |
| day\_of\_week    | day of the week on which contacted                | categorical |
| duration         | last call duration in seconds                     | numeric     |
| campaign         | number of contacts performed during this campaign | numeric     |
| pdays            | number of days since last contact                 | numeric     |
| previous         | number of contacts performed before this campaign | numeric     |
| poutcome         | outcome of previous marketing campaign            | categorical |
| emp\_var\_rate   | employment variation rate                         | numeric     |
| cons\_price\_idx | consumer price index                              | numeric     |
| cons\_conf\_idx  | consumer confidence index                         | numeric     |
| euribor3m        | euribor 3 month rate                              | numeric     |
| nr\_employed     | number of employees                               | numeric     |

- Banks spend heavily on telemarketing campaigns to promote their services, make customers aware of new offerings, and attract new customers. The purpose of this project is to analyse the data and build a predictive model that uses the features provided to predict whether a customer will purchase the term deposit offered by the bank.
- Such a model can reduce the bank's telemarketing cost, because it identifies ahead of time the customers and other factors that lead to a successful campaign.

### __Part 1.2: Data Loading__

#### Import all required libraries and dependencies

In [1]:
# important data analytics and data manipulation lib
import os
import pandas as pd
import numpy as np
from scipy.stats import skew, kurtosis

# Data Visualization lib
import matplotlib.pyplot as plt
import seaborn as sns
import bokeh

# ML libraries for data prep and modeling
import tensorflow as tf
from tensorflow import keras
from sklearn.preprocessing import LabelEncoder, MultiLabelBinarizer, OneHotEncoder
from sklearn.model_selection import train_test_split

%matplotlib inline

#### Data is loaded using a .env file to maintain data privacy

In [3]:
# loading data using .env file

from dotenv import load_dotenv

# build the path with os.path.join so it also works outside Windows
dotenv_path = os.path.join(os.getcwd(), 'local.env')
load_dotenv(dotenv_path=dotenv_path)

True
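Because `os.getenv` returns `None` when a variable is missing, a typo in `local.env` would only surface later as a confusing error inside `pd.read_csv`. The following is a minimal sketch of a guard that fails early instead. The helper name `resolve_data_path` and the example path are hypothetical and not part of the original notebook; only the variable name `bank_data` comes from it.

```python
import os
from pathlib import Path
from typing import Mapping


def resolve_data_path(var_name: str = "bank_data",
                      env: Mapping[str, str] = os.environ) -> Path:
    """Return the data path stored in an environment variable.

    Hypothetical helper: raises a clear error if the variable is
    missing or empty, instead of passing None on to the CSV reader.
    """
    value = env.get(var_name)
    if not value:
        raise KeyError(
            f"Environment variable {var_name!r} is not set; check local.env"
        )
    return Path(value)


# Usage with an explicit mapping (the path below is a made-up placeholder)
example_env = {"bank_data": "some/dir/bank.csv"}
data_path = resolve_data_path("bank_data", example_env)
print(data_path)
```

In the notebook this would be called after `load_dotenv(...)` with the default `env`, and the returned path passed to `pd.read_csv`.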

##### For modelling we use the bank additional dataset, because it combines socio-economic features with the marketing features and can therefore support better predictions

### Load the data

In [4]:
df = pd.read_csv(os.getenv('bank_data'), sep=';')
df.head()

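|   | age | job | marital | education | default | balance | housing | loan | contact | day | month | duration | campaign | pdays | previous | poutcome | y |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 58 | management | married | tertiary | no | 2143 | yes | no | unknown | 5 | may | 261 | 1 | -1 | 0 | unknown | no |
| 1 | 44 | technician | single | secondary | no | 29 | yes | no | unknown | 5 | may | 151 | 1 | -1 | 0 | unknown | no |
| 2 | 33 | entrepreneur | married | secondary | no | 2 | yes | yes | unknown | 5 | may | 76 | 1 | -1 | 0 | unknown | no |
| 3 | 47 | blue-collar | married | unknown | no | 1506 | yes | no | unknown | 5 | may | 92 | 1 | -1 | 0 | unknown | no |
| 4 | 33 | unknown | single | unknown | no | 1 | no | no | unknown | 5 | may | 198 | 1 | -1 | 0 | unknown | no |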


In [5]:
df.shape

(45211, 17)

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45211 entries, 0 to 45210
Data columns (total 17 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   age        45211 non-null  int64 
 1   job        45211 non-null  object
 2   marital    45211 non-null  object
 3   education  45211 non-null  object
 4   default    45211 non-null  object
 5   balance    45211 non-null  int64 
 6   housing    45211 non-null  object
 7   loan       45211 non-null  object
 8   contact    45211 non-null  object
 9   day        45211 non-null  int64 
 10  month      45211 non-null  object
 11  duration   45211 non-null  int64 
 12  campaign   45211 non-null  int64 
 13  pdays      45211 non-null  int64 
 14  previous   45211 non-null  int64 
 15  poutcome   45211 non-null  object
 16  y          45211 non-null  object
dtypes: int64(7), object(10)
memory usage: 5.9+ MB
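The feature table in Part 1 lists socio-economic columns such as `euribor3m`, while the `df.info()` output above reports 17 columns including `balance` and `day`. A small check like the one below makes that gap explicit. It is only a sketch: both lists are copied by hand from this notebook, and the helper name `compare_columns` is hypothetical.

```python
# Feature names as listed in the feature table of the problem statement
documented = [
    "age", "job", "marital", "education", "default", "housing", "loan",
    "contact", "month", "day_of_week", "duration", "campaign", "pdays",
    "previous", "poutcome", "emp_var_rate", "cons_price_idx",
    "cons_conf_idx", "euribor3m", "nr_employed",
]

# Column names as reported by df.info() for the loaded file
loaded = [
    "age", "job", "marital", "education", "default", "balance", "housing",
    "loan", "contact", "day", "month", "duration", "campaign", "pdays",
    "previous", "poutcome", "y",
]


def compare_columns(expected, actual):
    """Return (missing, unexpected) column names, preserving input order."""
    expected_set, actual_set = set(expected), set(actual)
    missing = [c for c in expected if c not in actual_set]
    unexpected = [c for c in actual if c not in expected_set]
    return missing, unexpected


missing, unexpected = compare_columns(documented, loaded)
print("documented but not loaded:", missing)
print("loaded but not documented:", unexpected)
```

In the notebook, `loaded` could be replaced by `list(df.columns)` so the check runs against whichever file `bank_data` points to.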


In [7]:
df.describe()

|   | age | balance | day | duration | campaign | pdays | previous |
| --- | --- | --- | --- | --- | --- | --- | --- |
| count | 45211.0 | 45211.0 | 45211.0 | 45211.0 | 45211.0 | 45211.0 | 45211.0 |
| mean | 40.93621 | 1362.272058 | 15.806419 | 258.16308 | 2.763841 | 40.197828 | 0.580323 |
| std | 10.618762 | 3044.765829 | 8.322476 | 257.527812 | 3.098021 | 100.128746 | 2.303441 |
| min | 18.0 | -8019.0 | 1.0 | 0.0 | 1.0 | -1.0 | 0.0 |
| 25% | 33.0 | 72.0 | 8.0 | 103.0 | 1.0 | -1.0 | 0.0 |
| 50% | 39.0 | 448.0 | 16.0 | 180.0 | 2.0 | -1.0 | 0.0 |
| 75% | 48.0 | 1428.0 | 21.0 | 319.0 | 3.0 | -1.0 | 0.0 |
| max | 95.0 | 102127.0 | 31.0 | 4918.0 | 63.0 | 871.0 | 275.0 |
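In the summary above, `pdays` is -1 at the minimum and at all three quartiles, which suggests that -1 is a placeholder rather than a real day count. Assuming it marks clients who were not previously contacted (an assumption this notebook does not confirm), one possible way to keep the placeholder from distorting statistics is sketched below. The function name, the new column names, and the sample values are all hypothetical.

```python
import pandas as pd


def split_pdays(df: pd.DataFrame, sentinel: int = -1) -> pd.DataFrame:
    """Separate the assumed 'never contacted' sentinel from real day counts.

    Adds a boolean flag column and a cleaned column in which the
    sentinel is replaced by NaN. The input frame is left unchanged.
    """
    out = df.copy()
    out["previously_contacted"] = out["pdays"] != sentinel
    # Series.where keeps values where the condition holds, NaN elsewhere
    out["pdays_clean"] = out["pdays"].where(out["previously_contacted"])
    return out


# Made-up sample values, not taken from the dataset
sample = pd.DataFrame({"pdays": [-1, -1, 151, -1, 92]})
result = split_pdays(sample)
print(result)
```

With the sentinel replaced by NaN, `result["pdays_clean"].describe()` would summarise only clients with a recorded previous contact.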


### __Part 2: Data Cleaning__

### __Part 3: Exploratory Data Analysis__

### __Part 4: Data Visualization__

### __Part 5: Model Selection__

### __Part 6: Model Evaluation__

### __Part 7: Hyperparameter Tuning__

### __Part 8: Conclusion and Further Research Recommendations__