In [2]:
import pandas as pd
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score

In [3]:
# 1. Load data
df = pd.read_csv('../Data/Bank Customer Segmentation.csv')
df.head()

Unnamed: 0,CUST_ID,BALANCE,BALANCE_FREQUENCY,PURCHASES,ONEOFF_PURCHASES,INSTALLMENTS_PURCHASES,CASH_ADVANCE,PURCHASES_FREQUENCY,ONEOFF_PURCHASES_FREQUENCY,PURCHASES_INSTALLMENTS_FREQUENCY,CASH_ADVANCE_FREQUENCY,CASH_ADVANCE_TRX,PURCHASES_TRX,CREDIT_LIMIT,PAYMENTS,MINIMUM_PAYMENTS,PRC_FULL_PAYMENT,TENURE
0,C10001,40.900749,0.818182,95.4,0.0,95.4,0.0,0.166667,0.0,0.083333,0.0,0,2,1000.0,201.802084,139.509787,0.0,12
1,C10002,3202.467416,0.909091,0.0,0.0,0.0,6442.945483,0.0,0.0,0.0,0.25,4,0,7000.0,4103.032597,1072.340217,0.222222,12
2,C10003,2495.148862,1.0,773.17,773.17,0.0,0.0,1.0,1.0,0.0,0.0,0,12,7500.0,622.066742,627.284787,0.0,12
3,C10004,1666.670542,0.636364,1499.0,1499.0,0.0,205.788017,0.083333,0.083333,0.0,0.083333,1,1,7500.0,0.0,,0.0,12
4,C10005,817.714335,1.0,16.0,16.0,0.0,0.0,0.083333,0.083333,0.0,0.0,0,1,1200.0,678.334763,244.791237,0.0,12


In [7]:
# 2. Handle missing values (simple example)
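# Assign the result back: calling fillna(..., inplace=True) on a column selection
# is deprecated in recent pandas and may not modify df.
df['MINIMUM_PAYMENTS'] = df['MINIMUM_PAYMENTS'].fillna(df['MINIMUM_PAYMENTS'].median())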
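
Before and after imputing, it can help to count the missing values per column so the effect of the fill is visible. The following is a minimal sketch on a made-up toy frame (the values are illustrative, not taken from the bank data); the same calls would apply to df.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real data (values are made up for illustration).
toy = pd.DataFrame({
    "MINIMUM_PAYMENTS": [139.5, 1072.3, np.nan, 244.8],
    "PAYMENTS": [201.8, 4103.0, 0.0, 678.3],
})

# Count missing values per column before imputing.
missing_before = toy.isna().sum()

# Fill with the column median and assign the result back to the frame.
median_value = toy["MINIMUM_PAYMENTS"].median()
toy["MINIMUM_PAYMENTS"] = toy["MINIMUM_PAYMENTS"].fillna(median_value)

# Count again to confirm nothing is left missing.
missing_after = toy.isna().sum()
print(missing_before, missing_after, sep="\n")
```

The median is used here, as in the notebook, because it is less sensitive to extreme values than the mean.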

In [9]:
# 3. Select features for clustering
features = [
    'BALANCE', 'BALANCE_FREQUENCY', 'PURCHASES', 'ONEOFF_PURCHASES',
    'INSTALLMENTS_PURCHASES', 'CASH_ADVANCE', 'PURCHASES_FREQUENCY',
    'CASH_ADVANCE_FREQUENCY', 'PAYMENTS', 'PRC_FULL_PAYMENT'
]
X = df[features]

In [11]:
# 4. Scale features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
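
K-Means relies on Euclidean distance, so features on large scales (such as BALANCE) would otherwise dominate features bounded in [0, 1] (such as the frequency columns). The sketch below uses a small made-up array, not the bank data, to show what StandardScaler produces: each column ends up with mean close to 0 and standard deviation close to 1.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up values on very different scales (e.g. a balance and a frequency).
X_toy = np.array([
    [40.9, 0.82],
    [3202.5, 0.91],
    [2495.1, 1.00],
    [1666.7, 0.64],
])

X_toy_scaled = StandardScaler().fit_transform(X_toy)

# Column-wise mean and (population) standard deviation after scaling.
col_means = X_toy_scaled.mean(axis=0)
col_stds = X_toy_scaled.std(axis=0)
print(col_means.round(6), col_stds.round(6))
```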

In [13]:
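# 5. Determine optimal k using silhouette
best_k = 0
best_score = -1  # silhouette scores lie in [-1, 1], so any real score beats this

for k in range(2, 10):  # tries k = 2 through 9
    kmeans = KMeans(n_clusters=k, random_state=42)
    labels = kmeans.fit_predict(X_scaled)
    # Mean silhouette over all samples; higher means better-separated clusters
    score = silhouette_score(X_scaled, labels)
    if score > best_score:
        best_k = k
        best_score = score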

print("Best k:", best_k, "with silhouette score:", best_score)

Best k: 2 with silhouette score: 0.3323427494971046
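
The loop above keeps only the winning k. Recording the score for every k makes it easier to see how close the runners-up are. The following is a sketch on synthetic data (three blobs generated with make_blobs, not the bank data); the explicit n_init=10 is an assumption added here because the default differs between scikit-learn versions.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for X_scaled: three well-separated blobs (not the bank data).
X_demo, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 0], [0, 10]],
    cluster_std=0.6,
    random_state=42,
)
X_demo = StandardScaler().fit_transform(X_demo)

rows = []
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    labels = model.fit_predict(X_demo)
    rows.append({
        "k": k,
        "silhouette": silhouette_score(X_demo, labels),
        "inertia": model.inertia_,  # within-cluster sum of squares
    })

# One row per k, so the whole curve can be inspected rather than just the maximum.
scores = pd.DataFrame(rows).set_index("k")
print(scores)
```

On the real data, replacing X_demo with X_scaled would give the full silhouette curve for k = 2 through 9.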


In [14]:
# 6. Final K-Means with best_k
kmeans = KMeans(n_clusters=best_k, random_state=42)
df['Cluster'] = kmeans.fit_predict(X_scaled)

In [15]:
# 7. Analyze cluster profiles
cluster_profiles = df.groupby('Cluster')[features].mean()
print(cluster_profiles)

             BALANCE  BALANCE_FREQUENCY    PURCHASES  ONEOFF_PURCHASES  \
Cluster                                                                  
0        4269.415218           0.968436  1620.283752       1051.306889   
1         849.551326           0.853175   840.109106        471.156841   

         INSTALLMENTS_PURCHASES  CASH_ADVANCE  PURCHASES_FREQUENCY  \
Cluster                                                              
0                    569.042961   3514.283434             0.376528   
1                    369.314316    308.754365             0.520434   

         CASH_ADVANCE_FREQUENCY     PAYMENTS  PRC_FULL_PAYMENT  
Cluster                                                         
0                      0.403065  3920.481355          0.050056  
1                      0.064332  1155.024278          0.181112  
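
Means alone can mislead when a feature is skewed or when one cluster is much smaller than the other. A possible extension, sketched here on a made-up toy frame rather than the bank data, is to report the count and median next to the mean for each cluster.

```python
import pandas as pd

# Toy data with made-up values; 'Cluster' plays the role of the K-Means labels.
toy = pd.DataFrame({
    "Cluster": [0, 0, 0, 1, 1, 1],
    "BALANCE": [4000.0, 4500.0, 9000.0, 800.0, 900.0, 850.0],
})

# Size, mean, and median per cluster for one feature.
profile = toy.groupby("Cluster")["BALANCE"].agg(["count", "mean", "median"])
print(profile)
```

In the toy cluster 0, a single large balance pulls the mean well above the median, which is the kind of gap worth checking before interpreting the averages.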


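Below is a concise interpretation of each cluster, based on the mean feature values above. The silhouette search selected k = 2, so there are two clusters: Cluster 0 and Cluster 1. The silhouette score of about 0.33 indicates only moderate separation, so the profiles describe broad tendencies rather than sharply distinct groups.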

High-Level Comparison

Feature	Cluster 0 (mean)	Cluster 1 (mean)	Interpretation
BALANCE	4,269.42	849.55	Cluster 0 carries a balance roughly five times higher.
BALANCE_FREQUENCY	0.97	0.85	Cluster 0’s balance is updated more often, indicating a more continuously active account.
PURCHASES	1,620.28	840.11	Cluster 0 spends about twice as much in total purchases.
ONEOFF_PURCHASES	1,051.31	471.16	Cluster 0 spends more than twice as much on one-off purchases.
INSTALLMENTS_PURCHASES	569.04	369.31	Cluster 0 also spends more via installments.
CASH_ADVANCE	3,514.28	308.75	Cluster 0 takes far more in cash advances (about 11 times the amount).
PURCHASES_FREQUENCY	0.38	0.52	Cluster 0’s purchase frequency is lower, meaning purchases occur in fewer periods.
CASH_ADVANCE_FREQUENCY	0.40	0.06	Cluster 0 uses cash advances regularly (frequency 0.40) vs. very rarely in Cluster 1 (0.06).
PAYMENTS	3,920.48	1,155.02	Cluster 0 makes large payments but still carries a higher balance overall.
PRC_FULL_PAYMENT	0.05	0.18	Cluster 0 rarely pays the full statement (5% of the time) vs. Cluster 1 (18% of the time).
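
The comparisons in the table can be made explicit by dividing one cluster mean by the other. This small sketch copies four of the rounded means from the table above; the column name ratio_0_to_1 is just an illustrative label.

```python
import pandas as pd

# Cluster means copied from the table above (rounded to two decimals).
means = pd.DataFrame(
    {
        "Cluster 0": [4269.42, 1620.28, 3514.28, 3920.48],
        "Cluster 1": [849.55, 840.11, 308.75, 1155.02],
    },
    index=["BALANCE", "PURCHASES", "CASH_ADVANCE", "PAYMENTS"],
)

# Ratio of Cluster 0 to Cluster 1 for each feature.
means["ratio_0_to_1"] = (means["Cluster 0"] / means["Cluster 1"]).round(2)
print(means)
```

The ratios come out at roughly 5 for BALANCE, 2 for PURCHASES, 11 for CASH_ADVANCE, and 3.4 for PAYMENTS, which shows that cash advances, not purchases, are what most separates the two groups.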

Cluster-by-Cluster Insights

Cluster 0
	•	High BALANCE & High CASH_ADVANCE:
They carry a large balance (4,269) and take significant cash advances (3,514 on average). This suggests they often rely on credit/cash advances, potentially incurring higher interest/fees.
	•	Large Purchases But Lower Frequency:
While total purchases are high (1,620), the purchase frequency is only 0.38, so their spending is concentrated in fewer periods and skews toward large one-off purchases (1,051).
	•	Rarely Pay in Full:
Their PRC_FULL_PAYMENT is just 0.05, indicating they mostly revolve their balance and rarely clear the statement completely. This could be profitable for the bank (interest income) but also riskier.
	•	Heavy Users:
They make large payments (3,920) yet still carry a high balance, which suggests they repeatedly draw on credit and repay only part of what they owe.

Possible Characterization:
“High-Spend, High-Credit, Revolver Segment” — They use a lot of credit, frequently take cash advances, and don’t pay the full balance. They might be profitable due to interest/fees but carry higher risk if their financial situation changes.

Potential Marketing Actions:
	•	Risk Assessment: Monitor credit risk closely; they might default if circumstances change.
	•	Upsell Specialized Products: If they’re high-income or stable, offer premium credit lines or personal loans (to consolidate balances).
	•	Encourage Payment Plans: Since they revolve balances, offering structured installment plans might help them (and reduce bank risk).

Cluster 1
	•	Lower BALANCE & Purchases:
With a balance around 850 and total purchases at 840, they’re relatively low-volume credit card users compared to Cluster 0.
	•	Higher Purchase Frequency:
Their purchase frequency is 0.52 (vs. 0.38), meaning they use the card more regularly across different months, but likely for smaller amounts.
	•	Minimal Cash Advances:
Both the amount (309) and frequency (0.06) are quite low, suggesting they rarely rely on cash advances.
	•	Higher PRC_FULL_PAYMENT:
At 0.18, they pay off the statement in full nearly 1 out of 5 months, which is significantly higher than Cluster 0, though still not extremely high overall.

Possible Characterization:
“Moderate Spenders, Steadier Payment Habits” — They keep modest balances, make smaller but more frequent purchases, and are more likely to pay off their balance in full from time to time.

Potential Marketing Actions:
	•	Reward Programs: Since they purchase frequently (but in smaller amounts) and pay more regularly, a cash-back or points reward system could encourage them to spend more on the card.
	•	Limit Increase: If they appear lower risk, the bank could consider gradually increasing their credit limit to capture more spend.
	•	Cross-Sell: Offer additional products (like auto loans or mortgages) if they have stable repayment behavior.

Overall Interpretation
	•	Cluster 0 is a “high usage, high balance, cash-advance heavy” group with relatively lower full payment rates. They make fewer but larger purchases, rely significantly on credit (including cash advances), and revolve debt more often.
	•	Cluster 1 is a “moderate-spend, higher frequency, lower balance” group who occasionally pay off the card in full and rarely use cash advances.

In Banking Terms:
	•	Cluster 0 might be more profitable in terms of interest and fees but also carry higher credit risk.
	•	Cluster 1 is more conservative in usage, lower risk, but might not generate as much interest revenue—though they have growth potential if incentivized properly.

Conclusion & Next Steps
	1.	Tailored Marketing:
	•	Cluster 0: Possibly offer installment plans or debt consolidation loans. Monitor risk.
	•	Cluster 1: Provide rewards to encourage higher spending, or upsell other banking products.
	2.	Risk Management:
	•	Cluster 0: Watch credit utilization and delinquency risks.
	•	Cluster 1: Lower risk; might be eligible for credit line increases or premium product offers.
	3.	Further Analysis:
	•	Evaluate if these clusters align with additional data (demographics, income) for even finer targeting.
	•	Consider whether two clusters are enough. The silhouette score at k=2 is modest (about 0.33), so k=3 or k=4 may reveal more nuanced segments even if the score is slightly lower.
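
One way to follow up on the last point is to look at the silhouette per cluster rather than a single overall number, since a weak cluster can hide behind a good average. The sketch below is an illustration on synthetic blobs (not the bank data), with n_init=10 set explicitly as an assumption; on the real data X_demo would be replaced by X_scaled.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples

# Synthetic stand-in for X_scaled: four blobs on a square (not the bank data).
X_demo, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [6, 0], [0, 6], [6, 6]],
    cluster_std=1.0,
    random_state=0,
)

results = {}
for k in (2, 3, 4):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X_demo)
    # One silhouette value per sample, then summarized per cluster.
    per_point = pd.Series(silhouette_samples(X_demo, labels))
    results[k] = per_point.groupby(labels).agg(["count", "mean"])

for k, table in results.items():
    print(f"k={k}")
    print(table)
```

A cluster with a noticeably lower mean silhouette than the others is a candidate for being split or merged at a different k.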