# Customer Segmentation Analysis - Banking Marketing Campaign

This notebook focuses on analyzing and segmenting customers in the banking marketing campaign dataset to identify key customer segments and their characteristics.

## Contents
1. Data Loading and Initial Exploration
2. Customer Segmentation Analysis
3. High-Potential Customer Identification
4. Visualization of Customer Segments

In [1]:
import os
import sys

# Add the project root to the path so that `src` can be imported
sys.path.append(os.path.abspath(".."))

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from src.banking_utils import DataLoader, CampaignAnalysis, CustomerSegmentation

## 1. Data Loading and Initial Exploration

In [2]:
# Load the dataset, downloading it first if it is not already available locally
df = DataLoader.load_banking_data(
    data_path='../data/banking/bank-additional',           # Store data in a data directory one level up from notebooks
    file_name='bank-additional-full.csv',                   # Use the full dataset
    force_download=False                                    # Only download if file doesn't exist
)

# Display basic information about the dataset
print("\nDataset Overview:")
df.info()  # info() prints its report directly and returns None, so it is not wrapped in print()
print("\nFirst few rows:")
print(df.head())

Successfully loaded bank-additional-full.csv
Dataset shape: 41188 rows × 21 columns

Dataset Overview:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 41188 entries, 0 to 41187
Data columns (total 21 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   age             41188 non-null  int64  
 1   job             41188 non-null  object 
 2   marital         41188 non-null  object 
 3   education       41188 non-null  object 
 4   default         41188 non-null  object 
 5   housing         41188 non-null  object 
 6   loan            41188 non-null  object 
 7   contact         41188 non-null  object 
 8   month           41188 non-null  object 
 9   day_of_week     41188 non-null  object 
 10  duration        41188 non-null  int64  
 11  campaign        41188 non-null  int64  
 12  pdays           41188 non-null  int64  
 13  previous        41188 non-null  int64  
 14  poutcome        41188 non-null  object 
 15  emp.var.rate    4
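
The `force_download=False` argument in the loading cell implies a cache-first policy: fetch the file only when no local copy exists, or when a refresh is explicitly requested. The following is a minimal sketch of that decision, not the actual `DataLoader` implementation. `needs_download` is a hypothetical helper, and no network access is performed.

```python
from pathlib import Path
import tempfile


def needs_download(data_path, file_name, force_download=False):
    """Return True when the file should be fetched (hypothetical helper).

    The file is fetched if a refresh is forced or if no local copy exists.
    """
    target = Path(data_path) / file_name
    return force_download or not target.is_file()


# Demonstrate the policy in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    missing_before = needs_download(tmp_dir, "bank-additional-full.csv")

    # Simulate a previously downloaded file.
    (Path(tmp_dir) / "bank-additional-full.csv").write_text("age;job\n")

    cached_after = needs_download(tmp_dir, "bank-additional-full.csv")
    forced = needs_download(tmp_dir, "bank-additional-full.csv", force_download=True)

print(missing_before, cached_after, forced)
```

With this policy, rerunning the notebook reuses the local file unless `force_download=True` is passed.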

## 2. Customer Segmentation Analysis

In [3]:
# Initialize segmentation analysis
segmentation = CustomerSegmentation(df)

# Create and analyze customer profiles
customer_profiles = segmentation.create_customer_profiles()
print("\nCustomer Profiles Analysis:")
print(customer_profiles)

KeyError: 'balance'
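
The `KeyError` above indicates that the profiling step looks up a `balance` column, which does not appear among the columns listed by `df.info()` in section 1. A quick check of required columns before profiling makes this kind of mismatch explicit. The sketch below is an illustration only: `find_missing_columns` and `REQUIRED_COLUMNS` are hypothetical names, and the column list is limited to the names visible in the truncated `df.info()` output.

```python
# Columns a hypothetical profiling step might need; 'balance' is included
# because the KeyError above names it.
REQUIRED_COLUMNS = ["age", "job", "balance"]


def find_missing_columns(available, required):
    """Return the required column names that are absent, in their given order."""
    available_set = set(available)
    return [name for name in required if name not in available_set]


# Column names visible in the df.info() output (the listing is truncated).
shown_columns = [
    "age", "job", "marital", "education", "default", "housing", "loan",
    "contact", "month", "day_of_week", "duration", "campaign", "pdays",
    "previous", "poutcome", "emp.var.rate",
]

missing = find_missing_columns(shown_columns, REQUIRED_COLUMNS)
print(f"Missing columns: {missing}")
```

In the notebook itself, `df.columns` can be passed in place of `shown_columns`.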

## 3. High-Potential Customer Identification

In [None]:
# Identify high-potential customers
high_potential = segmentation.identify_high_potential_customers()

print("\nHigh Potential Customer Overview:")
print(f"Number of high potential customers: {len(high_potential)}")
print("\nSuccess rate in high potential segment:")
print(f"{(high_potential['y'] == 'yes').mean() * 100:.2f}%")
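
The success-rate calculation in the cell above can be tried on a small, made-up DataFrame. This sketch assumes a simple selection rule (`poutcome == "success"`) purely for illustration; it is not the logic of `identify_high_potential_customers`, and the rows are invented toy data.

```python
import pandas as pd

# Invented toy data using two columns that exist in the dataset.
toy = pd.DataFrame(
    {
        "poutcome": ["success", "success", "failure", "nonexistent", "success", "nonexistent"],
        "y": ["yes", "yes", "no", "no", "no", "yes"],
    }
)

# Assumed rule: customers whose previous campaign outcome was a success.
high_potential_toy = toy[toy["poutcome"] == "success"]

# Share of 'yes' responses, computed the same way as in the cell above.
segment_rate = (high_potential_toy["y"] == "yes").mean() * 100
overall_rate = (toy["y"] == "yes").mean() * 100

print(f"Number of high potential customers: {len(high_potential_toy)}")
print(f"Segment success rate: {segment_rate:.2f}%")
print(f"Overall success rate: {overall_rate:.2f}%")
```

Comparing the segment rate against the overall rate shows whether the selected segment actually converts better than the full customer base.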

## 4. Visualization of Customer Segments

In [None]:
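# Create visualizations
# Visualization is not imported in the first cell; this import assumes it lives in src.banking_utils
from src.banking_utils import Visualization
viz = Visualization(df)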
viz.plot_key_insights()