# Cluster Analysis

Customer segmentation is a crucial step in understanding the unique characteristics and behaviors of different groups within a customer base. In this project, clustering was performed using **Principal Component Analysis (PCA)** for dimensionality reduction and **KMeans** for grouping customers into clusters. The resulting segmentation aims to uncover patterns and insights that can inform targeted strategies and decision-making.

This phase focuses on analyzing the clusters to understand their distinct features and distributions. By exploring the characteristics that define each group and differentiate them from others, we aim to:

- Identify key attributes that influence the formation of clusters.
- Interpret the patterns and preferences within each segment.
- Highlight differences across clusters to provide actionable insights.

The findings from this analysis will serve as a foundation for strategic business decisions, such as personalized marketing, product recommendations, and resource allocation.
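The clustering step itself is not shown in this notebook. As a rough illustration of the PCA-then-KMeans approach described above, the sketch below runs on synthetic data. The scaling step, the choice of two principal components, the random seed, and the synthetic feature matrix are all assumptions made for the example, not details taken from the project. Eight clusters are used only because `Kmeans_labels` takes the values 0 to 7 in the summaries that follow.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for numeric customer features (assumption: 400 rows, 5 columns)
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))

# Standardize so that no single feature dominates the principal components
X_scaled = StandardScaler().fit_transform(X)

# Reduce to 2 components (illustrative choice, not taken from the project)
X_reduced = PCA(n_components=2, random_state=0).fit_transform(X_scaled)

# Group the reduced points into 8 clusters
kmeans = KMeans(n_clusters=8, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_reduced)

print(X_reduced.shape)      # (400, 2)
print(np.bincount(labels))  # number of points per cluster
```

Fitting KMeans on the reduced components rather than on the raw features is one common design; whether the project scaled its features or how many components it kept cannot be read from this notebook.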


In [22]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

import sweetviz as sv

%run ../customer_personality_analysis/utils/pandas_explorer.py

In [23]:
path = '../customer_personality_analysis/data/clustered.csv'
df = pd.read_csv(path)
df.head()

Unnamed: 0,Year_Birth,Education,Marital_Status,Income,Kidhome,Teenhome,Recency,MntWines,MntFruits,MntMeatProducts,...,AcceptedCmp1,AcceptedCmp2,Complain,Response,Tenure_Months,Age,Total_bought,Total_purchases,Cmp_Accepted,Kmeans_labels
0,1957,Graduation,Uncommitted,58138.0,0,0,58,635,88,546,...,0,0,0,1,21,57,1617,25,0,1
1,1954,Graduation,Uncommitted,46344.0,1,1,38,11,1,6,...,0,0,0,0,3,60,27,6,0,3
2,1965,Graduation,Committed,71613.0,0,0,26,426,49,127,...,0,0,0,0,10,49,776,21,0,0
3,1984,Graduation,Committed,26646.0,1,0,26,11,4,20,...,0,0,0,0,4,30,53,8,0,6
4,1981,PhD,Committed,58293.0,1,0,94,173,43,118,...,0,0,0,0,5,33,422,19,0,6


In [None]:
df['Cmp_Accepted'].value_counts()

Cmp_Accepted
0    1747
1     322
2      81
3      44
4      11
Name: count, dtype: int64
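For context, these counts can be turned into shares of the customer base. The snippet below re-enters the counts printed above by hand, rather than reading `clustered.csv`, so that it runs on its own; the variable names are illustrative.

```python
import pandas as pd

# Cmp_Accepted counts copied from the output above
counts = pd.Series({0: 1747, 1: 322, 2: 81, 3: 44, 4: 11}, name="count")

total = counts.sum()
share_pct = (counts / total * 100).round(2)

# Share of customers who accepted at least one campaign
at_least_one_pct = round(100 - share_pct[0], 2)

print(total)             # 2205
print(share_pct[0])      # 79.23
print(at_least_one_pct)  # 20.77
```

Roughly four in five customers accepted no campaign, which is a useful baseline when reading the per-cluster engagement figures.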

## Selecting features used for clustering

In [31]:
categorical_features = ['Education','Marital_Status','Kidhome','Teenhome','Cmp_Accepted']
continuous_features = ['Tenure_Months','Age','Income','Total_bought','Total_purchases']
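The per-cluster cells below repeat the same two loops with a different label each time. A small helper can wrap that pattern; the following is only a sketch, and `summarize_cluster` and the `toy` frame are hypothetical names introduced for illustration, not part of the project code.

```python
import pandas as pd


def summarize_cluster(df, label, categorical, continuous, label_col="Kmeans_labels"):
    """Return (category shares in %, describe() table) for one cluster."""
    subset = df[df[label_col] == label]
    # Percentage of each category within the cluster, one Series per column
    shares = {col: subset[col].value_counts(normalize=True) * 100 for col in categorical}
    # count / mean / std / min / quartiles / max for the continuous columns
    stats = subset[continuous].describe()
    return shares, stats


# Made-up rows, used only so the example runs without clustered.csv
toy = pd.DataFrame({
    "Kmeans_labels": [0, 0, 0, 0, 1, 1],
    "Education": ["PhD", "PhD", "Master", "Graduation", "Basic", "PhD"],
    "Income": [70000.0, 80000.0, 60000.0, 90000.0, 20000.0, 30000.0],
})

shares, stats = summarize_cluster(toy, 0, ["Education"], ["Income"])
print(shares["Education"])
print(stats)
```

On the real data the call would presumably be `summarize_cluster(df, 0, categorical_features, continuous_features)`, once per cluster label.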

## Cluster 0 Summary

In [47]:
cluster = df[df['Kmeans_labels'] == 0]  # rows assigned to cluster 0

# Share (%) of each category within the cluster
for col in categorical_features:
    print("#"*32, col, "#"*32)
    print(cluster[col].value_counts(normalize=True) * 100, "\n")

# Summary statistics for each continuous feature
for col in continuous_features:
    print("#"*32, col, "#"*32)
    print(f"{cluster[col].describe()}\n")

################################ Education ################################
Education
Graduation    51.211073
PhD           24.567474
Master        15.916955
2n Cycle       8.304498
Name: proportion, dtype: float64 

################################ Marital_Status ################################
Marital_Status
Committed    100.0
Name: proportion, dtype: float64 

################################ Kidhome ################################
Kidhome
0    92.733564
1     7.266436
Name: proportion, dtype: float64 

################################ Teenhome ################################
Teenhome
0    55.017301
1    44.290657
2     0.692042
Name: proportion, dtype: float64 

################################ Cmp_Accepted ################################
Cmp_Accepted
0    64.013841
1    21.107266
2     8.650519
3     5.190311
4     1.038062
Name: proportion, dtype: float64 

################################ Tenure_Months ################################
count    289.000000
mean       5.304498



- **Education**: Predominantly well-educated; 51.21% Graduation, 24.57% PhD, 15.92% Master, 8.30% 2n Cycle.
- **Marital Status**: Entirely "Committed."
- **Family Structure**:
  - **Kidhome**: 92.73% have no children, 7.27% have one.
  - **Teenhome**: 55.02% have no teenagers, 44.29% one, and 0.69% two.
- **Campaign Engagement (Cmp_Accepted)**: Moderate; 64.01% accepted no campaigns, while the remaining 35.99% accepted 1–4.


- **Tenure**: Short, averaging 5.30 months (range: 0–11).
- **Age**: Middle-aged, mean 45.93 years (range: 20–71).
- **Income**: Financially stable, mean $71,356 (range: $49,605–$96,876).
- **Consumption**:
  - **Total Bought**: High, averaging 1037 (range: 42–2524).
  - **Total Purchases**: Medium, averaging 19.63 (range: 5–32).

### Key Insights
This cluster represents highly educated, financially stable, middle-aged individuals with high spending, few young children at home, moderate campaign engagement, and relatively short tenure.



## Cluster 1 Summary

In [48]:
cluster = df[df['Kmeans_labels'] == 1]  # rows assigned to cluster 1

# Share (%) of each category within the cluster
for col in categorical_features:
    print("#"*32, col, "#"*32)
    print(cluster[col].value_counts(normalize=True) * 100, "\n")

# Summary statistics for each continuous feature
for col in continuous_features:
    print("#"*32, col, "#"*32)
    print(f"{cluster[col].describe()}\n")

################################ Education ################################
Education
Graduation    47.791165
PhD           24.096386
Master        18.875502
2n Cycle       9.236948
Name: proportion, dtype: float64 

################################ Marital_Status ################################
Marital_Status
Uncommitted    100.0
Name: proportion, dtype: float64 

################################ Kidhome ################################
Kidhome
0    86.345382
1    12.851406
2     0.803213
Name: proportion, dtype: float64 

################################ Teenhome ################################
Teenhome
1    57.028112
0    40.963855
2     2.008032
Name: proportion, dtype: float64 

################################ Cmp_Accepted ################################
Cmp_Accepted
0    73.092369
1    18.875502
2     4.417671
3     2.008032
4     1.606426
Name: proportion, dtype: float64 

################################ Tenure_Months ################################
count    249.000000
...


- **Education**: Highly educated; 47.79% Graduation, 24.10% PhD, 18.88% Master, 9.24% 2n Cycle.
- **Marital Status**: Entirely "Uncommitted."
- **Family Structure**:
  - **Kidhome**: 86.35% have no children, 12.85% one, and 0.80% two.
  - **Teenhome**: 57.03% have one teenager, 40.96% none, and 2.01% two.
- **Campaign Engagement (Cmp_Accepted)**: Low acceptance rates; 73.09% accepted none, smaller proportions accepted 1–4 campaigns.


- **Tenure**: Long tenure, averaging 17.19 months (range: 12–22).
- **Age**: Middle-aged, mean 48.97 years (range: 19–71).
- **Income**: Moderate, mean $63,439 (range: $22,304–$102,692).
- **Consumption**:
  - **Total Bought**: High, averaging 1021 (range: 18–2209).
  - **Total Purchases**: Medium, averaging 20.44 (range: 4–33).

### Key Insights
This cluster represents financially stable, middle-aged individuals with high education levels but "Uncommitted" marital status. They have longer tenures, moderate income, and significant consumption. Engagement with marketing campaigns remains low, suggesting room for tailored strategies.
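Clusters 0 and 1 are each made up of a single marital status ("Committed" and "Uncommitted", respectively). One way to check such a split for every cluster at once is a row-normalized cross-tabulation. The sketch below uses a small made-up frame (`toy`), not the project data; on the real data the same call would presumably take `df['Kmeans_labels']` and `df['Marital_Status']`.

```python
import pandas as pd

# Made-up rows, used only so the example runs without clustered.csv
toy = pd.DataFrame({
    "Kmeans_labels": [0, 0, 1, 1, 1, 2],
    "Marital_Status": [
        "Committed", "Committed",
        "Uncommitted", "Uncommitted", "Uncommitted",
        "Committed",
    ],
})

# Each row sums to 100: the marital-status mix within one cluster
table = pd.crosstab(
    toy["Kmeans_labels"], toy["Marital_Status"], normalize="index"
) * 100
print(table.round(2))
```

A row that reads 100 / 0 indicates a cluster that is pure with respect to that feature, as in the summaries above.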


## Cluster 2 Summary

In [49]:
cluster = df[df['Kmeans_labels'] == 2]  # rows assigned to cluster 2

# Share (%) of each category within the cluster
for col in categorical_features:
    print("#"*32, col, "#"*32)
    print(cluster[col].value_counts(normalize=True) * 100, "\n")

# Summary statistics for each continuous feature
for col in continuous_features:
    print("#"*32, col, "#"*32)
    print(f"{cluster[col].describe()}\n")

################################ Education ################################
Education
Graduation    40.972222
Master        27.777778
PhD           26.736111
2n Cycle       4.166667
Basic          0.347222
Name: proportion, dtype: float64 

################################ Marital_Status ################################
Marital_Status
Committed    100.0
Name: proportion, dtype: float64 

################################ Kidhome ################################
Kidhome
0    61.111111
1    36.458333
2     2.430556
Name: proportion, dtype: float64 

################################ Teenhome ################################
Teenhome
1    84.027778
2     8.680556
0     7.291667
Name: proportion, dtype: float64 

################################ Cmp_Accepted ################################
Cmp_Accepted
0    85.416667
1    12.500000
2     1.736111
3     0.347222
Name: proportion, dtype: float64 

################################ Tenure_Months ################################
count    288.000


- **Education**: Predominantly well-educated; 40.97% Graduation, 27.78% Master, 26.74% PhD. Few with 2n Cycle (4.17%) or Basic education (0.35%).
- **Marital Status**: Entirely "Committed."
- **Family Structure**:
  - **Kidhome**: 61.11% have no children, 36.46% one, and 2.43% two.
  - **Teenhome**: 84.03% have one teenager, 8.68% two, and 7.29% none.
- **Campaign Engagement (Cmp_Accepted)**: Very low; 85.42% accepted none, with minimal engagement in 1–3 campaigns.

- **Tenure**: Short tenure, averaging 10.07 months (range: 0–22).
- **Age**: Older demographic, mean 56.04 years (range: 38–71).
- **Income**: Modest, mean $48,110 (range: $4,428–$93,404).
- **Consumption**:
  - **Total Bought**: Low, mean 359.56 (range: 11–1616).
  - **Total Purchases**: Moderate frequency, mean 14.11 (range: 1–35).

### Key Insights
This cluster represents older, committed individuals with modest incomes and shorter tenure. Despite their high education levels, consumption is relatively low, and engagement with marketing campaigns is minimal. Strategies could focus on increasing loyalty and targeting low-cost value propositions.


## Cluster 3 Summary

In [50]:
cluster = df[df['Kmeans_labels'] == 3]  # rows assigned to cluster 3

# Share (%) of each category within the cluster
for col in categorical_features:
    print("#"*32, col, "#"*32)
    print(cluster[col].value_counts(normalize=True) * 100, "\n")

# Summary statistics for each continuous feature
for col in continuous_features:
    print("#"*32, col, "#"*32)
    print(f"{cluster[col].describe()}\n")

################################ Education ################################
Education
Graduation    51.778656
PhD           29.644269
Master        14.229249
2n Cycle       4.347826
Name: proportion, dtype: float64 

################################ Marital_Status ################################
Marital_Status
Uncommitted    100.0
Name: proportion, dtype: float64 

################################ Kidhome ################################
Kidhome
0    77.470356
1    22.134387
2     0.395257
Name: proportion, dtype: float64 

################################ Teenhome ################################
Teenhome
1    49.011858
0    47.430830
2     3.557312
Name: proportion, dtype: float64 

################################ Cmp_Accepted ################################
Cmp_Accepted
0    71.936759
1    18.181818
2     5.928854
3     3.557312
4     0.395257
Name: proportion, dtype: float64 

################################ Tenure_Months ################################
count    253.000000
...


- **Education**: High proportion of highly educated individuals; 51.78% have a Graduation degree, 29.64% PhD, 14.23% Master, and 4.35% with 2n Cycle.
- **Marital Status**: Entirely "Uncommitted."
- **Family Structure**:
  - **Kidhome**: 77.47% have no children, 22.13% one, and 0.40% two.
  - **Teenhome**: Evenly distributed between households with one teenager (49.01%) and none (47.43%), with 3.56% having two teenagers.
- **Campaign Engagement (Cmp_Accepted)**: Moderate; 71.94% accepted none, while 18.18% accepted one, and smaller proportions accepted more.

- **Tenure**: Very short tenure, averaging 5.25 months (range: 0–11).
- **Age**: Diverse age range with a mean of 48.36 years (range: 19–74).
- **Income**: Relatively high, mean $63,369 (range: $7,144–$113,734).
- **Consumption**:
  - **Total Bought**: High, mean 785.77 (range: 11–2525).
  - **Total Purchases**: Frequent, mean 16.25 (range: 4–32).

### Key Insights
This cluster is characterized by uncommitted, educated individuals with short tenure and relatively high income. They purchase actively, in both total amount and frequency, but engage only moderately with marketing campaigns. Efforts could focus on loyalty programs and campaigns that emphasize value for educated, recently acquired customers.



## Cluster 4 Summary

In [51]:
cluster = df[df['Kmeans_labels'] == 4]  # rows assigned to cluster 4

# Share (%) of each category within the cluster
for col in categorical_features:
    print("#"*32, col, "#"*32)
    print(cluster[col].value_counts(normalize=True) * 100, "\n")

# Summary statistics for each continuous feature
for col in continuous_features:
    print("#"*32, col, "#"*32)
    print(f"{cluster[col].describe()}\n")

################################ Education ################################
Education
Graduation    51.181102
PhD           15.748031
2n Cycle      12.992126
Master        11.023622
Basic          9.055118
Name: proportion, dtype: float64 

################################ Marital_Status ################################
Marital_Status
Committed    100.0
Name: proportion, dtype: float64 

################################ Kidhome ################################
Kidhome
1    75.984252
0    21.259843
2     2.755906
Name: proportion, dtype: float64 

################################ Teenhome ################################
Teenhome
0    72.834646
1    25.984252
2     1.181102
Name: proportion, dtype: float64 

################################ Cmp_Accepted ################################
Cmp_Accepted
0    91.338583
1     8.267717
2     0.393701
Name: proportion, dtype: float64 

################################ Tenure_Months ################################
count    254.000000
mean      1


- **Education**: Balanced educational background:
  - 51.18% with Graduation, 15.75% PhD, 12.99% 2n Cycle, 11.02% Master, and 9.06% Basic.
- **Marital Status**: Entirely "Committed."
- **Family Structure**:
  - **Kidhome**: 75.98% have one child, 21.26% have none, and 2.76% have two children.
  - **Teenhome**: Predominantly no teenagers (72.83%), followed by one teenager (25.98%) and 1.18% with two teenagers.
- **Campaign Engagement (Cmp_Accepted)**: Low; 91.34% accepted none, while 8.27% accepted one, and 0.39% accepted two.

- **Tenure**: Long tenure, averaging 17.28 months (range: 12–22).
- **Age**: Younger demographic, mean age 37.80 years (range: 18–67).
- **Income**: Low to moderate, mean $31,897 (range: $2,447–$66,373).
- **Consumption**:
  - **Total Bought**: Low, mean 187.38 (range: 11–1,730).
  - **Total Purchases**: Moderate, mean 10.20 (range: 4–43).

### Key Insights
This cluster comprises younger, committed individuals with low to moderate income. They are characterized by long tenure and minimal engagement with campaigns. Purchasing behavior is modest in both frequency and total amount. Strategies for this group could focus on personalized offers to increase purchasing activity, particularly for families with one child and no teenagers.


## Cluster 5 Summary

In [55]:
cluster = df[df['Kmeans_labels'] == 5]  # rows assigned to cluster 5

# Share (%) of each category within the cluster
for col in categorical_features:
    print("#"*32, col, "#"*32)
    print(cluster[col].value_counts(normalize=True) * 100, "\n")

# Summary statistics for each continuous feature
for col in continuous_features:
    print("#"*32, col, "#"*32)
    print(f"{cluster[col].describe()}\n")

################################ Education ################################
Education
Graduation    57.097792
PhD           22.712934
Master        12.933754
2n Cycle       7.255521
Name: proportion, dtype: float64 

################################ Marital_Status ################################
Marital_Status
Committed    100.0
Name: proportion, dtype: float64 

################################ Kidhome ################################
Kidhome
0    89.905363
1    10.094637
Name: proportion, dtype: float64 

################################ Teenhome ################################
Teenhome
1    50.788644
0    47.634069
2     1.577287
Name: proportion, dtype: float64 

################################ Cmp_Accepted ################################
Cmp_Accepted
0    67.507886
1    19.873817
2     7.255521
3     4.416404
4     0.946372
Name: proportion, dtype: float64 

################################ Tenure_Months ################################
count    317.000000
mean      17.593060



- **Education**: This cluster has a high level of education:
  - 57.10% have a Graduation degree, 22.71% have a PhD, 12.93% have a Master's degree, and 7.26% completed the 2nd Cycle of education.
- **Marital Status**: All individuals in this cluster are "Committed."
- **Family Structure**:
  - **Kidhome**: The majority (89.91%) have no children, while 10.09% have one child.
  - **Teenhome**: Half (50.79%) have one teenager, while 47.63% have no teenagers, and 1.58% have two teenagers.
- **Campaign Engagement (Cmp_Accepted)**: Most individuals (67.51%) have not accepted any campaigns, with 19.87% accepting one, 7.26% accepting two, 4.42% accepting three, and 0.95% accepting four campaigns.

- **Tenure**: The average tenure is 17.59 months, with a range from 12 to 23 months.
- **Age**: The average age is 46.42 years, ranging from 20 to 73 years.
- **Income**: On average, income is $68,281, ranging from $35,797 to $105,471.
- **Consumption**:
  - **Total Bought**: The average total bought is 1,179, ranging from 68 to 2,440.
  - **Total Purchases**: On average, 21.21 purchases were made, ranging from 7 to 39.

### Key Insights
This cluster represents well-educated, committed individuals with relatively high incomes (mean $68,281) and long tenure. Most have no young children, but about half have a teenager at home. Campaign engagement is moderate: 32.49% have accepted at least one offer. Consumption is the highest of all clusters (mean total bought of 1,179), with frequent purchases (mean 21.21). More personalized campaigns could further increase engagement, especially for products or services that match their high purchasing levels and family needs.



## Cluster 6 Summary

In [56]:
cluster = df[df['Kmeans_labels'] == 6]  # rows assigned to cluster 6

# Share (%) of each category within the cluster
for col in categorical_features:
    print("#"*32, col, "#"*32)
    print(cluster[col].value_counts(normalize=True) * 100, "\n")

# Summary statistics for each continuous feature
for col in continuous_features:
    print("#"*32, col, "#"*32)
    print(f"{cluster[col].describe()}\n")

################################ Education ################################
Education
Graduation    48.920863
Master        16.187050
2n Cycle      15.827338
PhD           15.467626
Basic          3.597122
Name: proportion, dtype: float64 

################################ Marital_Status ################################
Marital_Status
Committed    100.0
Name: proportion, dtype: float64 

################################ Kidhome ################################
Kidhome
1    82.374101
0    11.151079
2     6.474820
Name: proportion, dtype: float64 

################################ Teenhome ################################
Teenhome
0    73.381295
1    26.618705
Name: proportion, dtype: float64 

################################ Cmp_Accepted ################################
Cmp_Accepted
0    91.007194
1     8.992806
Name: proportion, dtype: float64 

################################ Tenure_Months ################################
count    278.000000
mean       5.402878
std        3.493613
...


- **Education**: The education distribution in this cluster includes:
  - 48.92% with a Graduation, 16.19% with a Master's degree, 15.83% with a 2nd Cycle of education, 15.47% with a PhD, and 3.60% with a Basic level of education.
- **Marital Status**: All individuals are "Committed."
- **Family Structure**:
  - **Kidhome**: 82.37% of individuals in this cluster have one child, 11.15% have no children, and 6.47% have two children.
  - **Teenhome**: 73.38% have no teenagers, while 26.62% have one teenager.
- **Campaign Engagement (Cmp_Accepted)**: A significant majority (91.01%) have not accepted any campaigns, while 8.99% have accepted one campaign.

- **Tenure**: The average tenure is 5.4 months, with a range from 0 to 11 months.
- **Age**: The average age is 38.14 years, with a range from 18 to 55 years.
- **Income**: On average, income is $32,922, with a range from $4,023 to $71,427.
- **Consumption**:
  - **Total Bought**: The average total bought is 98, ranging from 8 to 989.
  - **Total Purchases**: On average, 7.97 purchases were made, with a range from 4 to 25.

### Key Insights
This cluster is characterized by committed individuals with moderate to high levels of education. They are mostly young to middle-aged (average age of 38 years) and have a short tenure (average of 5.4 months). Most have one child, and few engage with campaigns. Income is low to moderate, averaging $32,922. Consumption is the lowest of all clusters, with an average total bought of 98. Targeted offers might be effective, especially for products that appeal to young families and early-career professionals.


## Cluster 7 Summary

In [57]:
cluster = df[df['Kmeans_labels'] == 7]  # rows assigned to cluster 7

# Share (%) of each category within the cluster
for col in categorical_features:
    print("#"*32, col, "#"*32)
    print(cluster[col].value_counts(normalize=True) * 100, "\n")

# Summary statistics for each continuous feature
for col in continuous_features:
    print("#"*32, col, "#"*32)
    print(f"{cluster[col].describe()}\n")

################################ Education ################################
Education
Graduation    54.151625
Master        14.801444
PhD           13.718412
2n Cycle      10.108303
Basic          7.220217
Name: proportion, dtype: float64 

################################ Marital_Status ################################
Marital_Status
Uncommitted    100.0
Name: proportion, dtype: float64 

################################ Kidhome ################################
Kidhome
1    77.617329
0    18.411552
2     3.971119
Name: proportion, dtype: float64 

################################ Teenhome ################################
Teenhome
0    71.119134
1    28.158845
2     0.722022
Name: proportion, dtype: float64 

################################ Cmp_Accepted ################################
Cmp_Accepted
0    91.335740
1     8.303249
2     0.361011
Name: proportion, dtype: float64 

################################ Tenure_Months ################################
count    277.000000
...


- **Education**: The education distribution in this cluster includes:
  - 54.15% have a Graduation degree, 14.80% hold a Master's degree, 13.72% have a PhD, 10.11% have a 2nd Cycle education, and 7.22% have a Basic education level.
- **Marital Status**: All individuals are "Uncommitted."
- **Family Structure**:
  - **Kidhome**: 77.62% have one child, 18.41% have no children, and 3.97% have two children.
  - **Teenhome**: 71.12% have no teenagers, 28.16% have one teenager, and 0.72% have two teenagers.
- **Campaign Engagement (Cmp_Accepted)**: A large proportion (91.34%) of individuals have not accepted any campaigns, with a small percentage (8.30%) accepting one campaign and 0.36% accepting two.

- **Tenure**: The average tenure in this cluster is 11.5 months, with a range from 0 to 22 months.
- **Age**: The average age is 38.53 years, with a range from 19 to 65 years.
- **Income**: On average, income is $31,121, ranging from $1,730 to $68,274.
- **Consumption**:
  - **Total Bought**: The average total bought is 119, ranging from 5 to 835.
  - **Total Purchases**: On average, 8.52 purchases were made, with a range from 0 to 25.

### Key Insights
This cluster is characterized by a high share of individuals with a Graduation-level education (54.2%) and a relatively young to middle-aged profile (average age of 38.5 years). Most individuals have one child, and a large majority have no teenagers. Their tenure is moderate (average of 11.5 months). This group has an average income of about $31,100, the lowest of all clusters, and shows low purchasing activity, with an average total bought of 119. Only a small fraction has accepted campaigns, making them a potential target for specific offers, particularly for lower-cost or essential products.
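To compare the clusters side by side, the reported figures can be collected into one table. The sketch below re-enters the means and the "no campaign accepted" shares from the cluster summaries by hand so that it runs without `clustered.csv`; `summary` and the `No_campaign_pct` column are names made up for this example. On the full data, `df.groupby('Kmeans_labels')[continuous_features].mean()` would presumably give the same means directly.

```python
import pandas as pd

# Values copied from the per-cluster summaries in this notebook
summary = pd.DataFrame(
    {
        "Tenure_Months": [5.30, 17.19, 10.07, 5.25, 17.28, 17.59, 5.40, 11.5],
        "Age": [45.93, 48.97, 56.04, 48.36, 37.80, 46.42, 38.14, 38.53],
        "Income": [71356, 63439, 48110, 63369, 31897, 68281, 32922, 31121],
        "Total_bought": [1037, 1021, 359.56, 785.77, 187.38, 1179, 98, 119],
        "Total_purchases": [19.63, 20.44, 14.11, 16.25, 10.20, 21.21, 7.97, 8.52],
        "No_campaign_pct": [64.01, 73.09, 85.42, 71.94, 91.34, 67.51, 91.01, 91.34],
    },
    index=pd.Index(range(8), name="Kmeans_labels"),
)

# Rank clusters from highest to lowest mean income
by_income = summary.sort_values("Income", ascending=False)
print(by_income)

print(summary["Total_bought"].idxmax())     # 5: highest mean total bought
print(summary["No_campaign_pct"].idxmin())  # 0: smallest share with no campaign accepted
```

Sorted this way, the table separates the higher-income, higher-consumption clusters (0, 5, 1, 3) from the lower-income ones (2, 6, 4, 7), which matches the narrative in the individual summaries.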