# BankView – Exploratory Data Analysis (EDA)

This notebook explores the raw banking dataset used in the BankView project.
The goal is to understand the data structure, assess data quality, and identify
key metrics (KPIs) relevant for business decision support.


In [6]:
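import pandas as pd
import numpy as np

# Load the raw dataset (path is relative to the notebook location)
df = pd.read_csv("../data/raw/bank.csv")

# Structure checks. In a single cell only printed output and the last
# expression are rendered, so wrap these in print() to see each result.
df.head()
df.shape
df.columns
df.info()  # prints dtypes and non-null counts

# Data quality: missing values per column, then summary statistics
df.isnull().sum()
df.describe(include="all")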




<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11162 entries, 0 to 11161
Data columns (total 17 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   age        11162 non-null  int64 
 1   job        11162 non-null  object
 2   marital    11162 non-null  object
 3   education  11162 non-null  object
 4   default    11162 non-null  object
 5   balance    11162 non-null  int64 
 6   housing    11162 non-null  object
 7   loan       11162 non-null  object
 8   contact    11162 non-null  object
 9   day        11162 non-null  int64 
 10  month      11162 non-null  object
 11  duration   11162 non-null  int64 
 12  campaign   11162 non-null  int64 
 13  pdays      11162 non-null  int64 
 14  previous   11162 non-null  int64 
 15  poutcome   11162 non-null  object
 16  deposit    11162 non-null  object
dtypes: int64(7), object(10)
memory usage: 1.4+ MB


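| | age | job | marital | education | default | balance | housing | loan | contact | day | month | duration | campaign | pdays | previous | poutcome | deposit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| count | 11162.0 | 11162 | 11162 | 11162 | 11162 | 11162.0 | 11162 | 11162 | 11162 | 11162.0 | 11162 | 11162.0 | 11162.0 | 11162.0 | 11162.0 | 11162 | 11162 |
| unique | | 12 | 3 | 4 | 2 | | 2 | 2 | 3 | | 12 | | | | | 4 | 2 |
| top | | management | married | secondary | no | | no | no | cellular | | may | | | | | unknown | no |
| freq | | 2566 | 6351 | 5476 | 10994 | | 5881 | 9702 | 8042 | | 2824 | | | | | 8326 | 5873 |
| mean | 41.231948 | | | | | 1528.538524 | | | | 15.658036 | | 371.993818 | 2.508421 | 51.330407 | 0.832557 | | |
| std | 11.913369 | | | | | 3225.413326 | | | | 8.42074 | | 347.128386 | 2.722077 | 108.758282 | 2.292007 | | |
| min | 18.0 | | | | | -6847.0 | | | | 1.0 | | 2.0 | 1.0 | -1.0 | 0.0 | | |
| 25% | 32.0 | | | | | 122.0 | | | | 8.0 | | 138.0 | 1.0 | -1.0 | 0.0 | | |
| 50% | 39.0 | | | | | 550.0 | | | | 15.0 | | 255.0 | 2.0 | -1.0 | 0.0 | | |
| 75% | 49.0 | | | | | 1708.0 | | | | 22.0 | | 496.0 | 3.0 | 20.75 | 1.0 | | |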

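The output above reports no null values, yet `unknown` is the most frequent `poutcome` value (8,326 of 11,162 rows) and `pdays` has a median of -1. This suggests that placeholder values stand in for missing data. The following is a minimal sketch of how such placeholders could be counted. The helper name `placeholder_summary` and the sample rows are hypothetical and only illustrate the idea; treating `unknown` and `-1` as missing-value markers is an assumption to confirm against the dataset documentation.

```python
import pandas as pd


def placeholder_summary(df: pd.DataFrame, token: str = "unknown") -> pd.Series:
    """Count how often `token` appears in each non-numeric column."""
    text_cols = df.select_dtypes(exclude="number")
    counts = (text_cols == token).sum()
    # Keep only columns where the placeholder actually occurs
    return counts[counts > 0].sort_values(ascending=False)


# Made-up rows for illustration only (not taken from bank.csv)
sample = pd.DataFrame({
    "job": ["management", "unknown", "technician", "services"],
    "poutcome": ["unknown", "unknown", "success", "unknown"],
    "pdays": [-1, -1, 92, -1],
})

unknown_counts = placeholder_summary(sample)
# Assumption: -1 in pdays marks rows with no earlier contact
never_contacted_share = (sample["pdays"] == -1).mean()

print(unknown_counts)
print(f"share of rows with pdays == -1: {never_contacted_share:.2f}")
```

Running the same helper on `df` would show which categorical columns need an explicit "unknown" category or imputation before segmentation.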

## Initial Business Observations

- **Main entities identified:**
  - Customers (demographic and financial profile)
  - Marketing campaigns (contact attempts and outcomes)
  - Banking products (deposits, loans, housing loans)

- **Potential customer identifier:**
  - No explicit customer ID is provided.
  - Each row represents one customer–campaign interaction. In the absence of an ID, each row is treated as a distinct customer record for analysis purposes.

- **Date columns:**
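  - `day` (day of the month) and `month` (month name, 12 distinct values) record when the marketing contact took place.
  - The dataset has no year column, so combining these fields into a derived contact date requires an assumed year. Without one, they support only seasonal analysis (by month or day of month).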

- **Numeric fields useful for KPIs:**
  - `age` (customer demographics)
  - `balance` (average yearly account balance)
  - `duration` (last contact duration)
  - `campaign` (number of contacts during the campaign)
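  - `pdays` (days since the customer was last contacted in a previous campaign; the value `-1`, which is both the minimum and the median in the summary above, appears to mark customers with no previous contact and should be handled separately before averaging)
  - `previous` (number of contacts before the current campaign)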

- **Categorical fields for segmentation:**
  - `job`
  - `marital`
  - `education`
  - `default`
  - `housing`
  - `loan`
  - `contact`
  - `month`
  - `poutcome`
  - `deposit` (target variable indicating product subscription)

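Two of the observations above can be sketched in code: deriving a contact date from `day` and `month`, and computing a subscription rate per segment from the `deposit` target. This is an illustrative sketch, not the project's implementation. The helper names, `ASSUMED_YEAR`, and the sample rows are hypothetical, and the code assumes that `month` holds lowercase three-letter abbreviations (as the value `may` in the summary suggests) and that `deposit` takes the values `yes` and `no`.

```python
import pandas as pd

# Hypothetical placeholder: the dataset has no year column, so a year must be assumed.
ASSUMED_YEAR = 2000

MONTH_NUMBERS = {
    "jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
    "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12,
}


def add_contact_date(df: pd.DataFrame, year: int = ASSUMED_YEAR) -> pd.DataFrame:
    """Return a copy of df with a contact_date column built from day and month."""
    parts = pd.DataFrame({
        "year": year,
        "month": df["month"].str.lower().map(MONTH_NUMBERS),
        "day": df["day"],
    })
    out = df.copy()
    # errors="coerce" turns impossible dates into NaT instead of raising
    out["contact_date"] = pd.to_datetime(parts, errors="coerce")
    return out


def subscription_rate(df: pd.DataFrame, by: str) -> pd.Series:
    """Share of rows with deposit == 'yes' within each value of column `by`."""
    subscribed = df["deposit"].eq("yes")
    return subscribed.groupby(df[by]).mean().sort_values(ascending=False)


# Made-up rows for illustration only (not taken from bank.csv)
sample = pd.DataFrame({
    "day": [5, 30, 12, 18],
    "month": ["may", "jun", "may", "nov"],
    "job": ["management", "technician", "management", "technician"],
    "deposit": ["yes", "no", "yes", "yes"],
})

with_dates = add_contact_date(sample)
rates = subscription_rate(sample, by="job")
print(with_dates[["day", "month", "contact_date"]])
print(rates)
```

Because the year is assumed, `contact_date` is suitable for grouping by month or weekday pattern only under that assumption, not for measuring elapsed time across years. The same `subscription_rate` helper can be applied to any of the categorical segmentation fields listed above.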