In [None]:
# Let's get started
import pandas as pd

# Load the dataset
df = pd.read_csv('/kaggle/input/customer-segmentation-data-for-marketing-analysis/customer_segmentation_data.csv')

# Display the first few rows of the dataframe
print(df.head())


Now that we have viewed the data, let's perform some basic exploratory data analysis (EDA) to understand its distribution and characteristics. This will include:

    1) Summary statistics of the numerical columns.
    2) Distribution of categorical variables.
    3) Checking for any missing values.


In [None]:
# Summary statistics of numerical columns
summary_stats = df.describe()
print(summary_stats)

# Distribution of categorical variables
gender_distribution = df['gender'].value_counts()
category_distribution = df['preferred_category'].value_counts()
print(gender_distribution)
print(category_distribution)

# Checking for missing values
missing_values = df.isnull().sum()
print(missing_values)

Let's proceed with the clustering analysis using the K-means algorithm. We'll use the following features for clustering:

    Age
    Income
    Spending Score
    Membership Years
    Purchase Frequency
    Last Purchase Amount

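We'll first standardize these features so they are on the same scale, and then apply the K-means algorithm to identify clusters. K-means groups points by Euclidean distance, so without scaling, a wide-ranging feature such as income would dominate the result.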
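
To make the standardization step concrete, here is a minimal sketch of the z-score that `StandardScaler` computes for each column: subtract the column mean and divide by the population standard deviation. The `incomes` values and the `standardize` helper are made up for illustration and are not part of the dataset or the notebook.

```python
from statistics import mean, pstdev

def standardize(values):
    """Return z-scores: (x - mean) / population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population std (ddof=0), as StandardScaler uses
    return [(x - mu) / sigma for x in values]

# Hypothetical income values, for illustration only
incomes = [40_000, 60_000, 80_000, 100_000, 120_000]
z_scores = standardize(incomes)
print([round(z, 3) for z in z_scores])
```

After this transformation every feature has mean 0 and standard deviation 1, so age in years and income in dollars contribute comparably to the distances K-means uses.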



In [None]:
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
import seaborn as sns

# Features to be used for clustering
features = ['age', 'income', 'spending_score', 'membership_years', 'purchase_frequency', 'last_purchase_amount']

# Standardize the features
scaler = StandardScaler()
scaled_features = scaler.fit_transform(df[features])

# Determine the optimal number of clusters using the elbow method
wcss = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, init='k-means++', max_iter=300, n_init=10, random_state=42)
    kmeans.fit(scaled_features)
    wcss.append(kmeans.inertia_)

# Plot the elbow graph
plt.figure(figsize=(10, 6))
plt.plot(range(1, 11), wcss, marker='o')
plt.title('Elbow Method For Optimal Number of Clusters')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')
plt.show()
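
The WCSS plotted above is what scikit-learn exposes as `inertia_`: the sum of squared distances from each point to its closest cluster centre. As a rough illustration, the sketch below computes it by hand for a toy 2-D example. The `points`, the centroid coordinates, and the `wcss_by_hand` helper are hypothetical and not part of the notebook.

```python
def wcss_by_hand(points, centroids):
    """Sum of squared Euclidean distances from each point to its nearest centroid."""
    total = 0.0
    for p in points:
        total += min(
            sum((a - b) ** 2 for a, b in zip(p, c))
            for c in centroids
        )
    return total

# Toy data: two tight groups of two points each (illustrative only)
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

one_cluster = wcss_by_hand(points, [(5.0, 1.0)])
two_clusters = wcss_by_hand(points, [(0.0, 1.0), (10.0, 1.0)])
print(one_cluster, two_clusters)
```

Adding the second centroid cuts the total from 104 to 4. The elbow method looks for the number of clusters after which such drops become small.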

In [None]:
# Apply K-means clustering with 4 clusters
kmeans = KMeans(n_clusters=4, init='k-means++', max_iter=300, n_init=10, random_state=42)
clusters = kmeans.fit_predict(scaled_features)

# Add the cluster labels to the original dataframe
df['cluster'] = clusters

# Display the first few rows of the dataframe with the cluster labels
print(df.head())

In [None]:
# Visualize the clusters using a pair plot
sns.pairplot(df, hue='cluster', vars=features, palette='viridis')
plt.show()

In [None]:
# Calculate the mean values of the numeric features for each cluster
numeric_features = ['age', 'income', 'spending_score', 'membership_years', 'purchase_frequency', 'last_purchase_amount']
cluster_summary_numeric = df.groupby('cluster')[numeric_features].mean()
print(cluster_summary_numeric)
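
Cluster means are easier to interpret alongside the cluster sizes and the overall means. One possible sketch, using a small made-up DataFrame in place of `df` (the `toy` name and all of its values are illustrative assumptions, not results from the dataset):

```python
import pandas as pd

# Made-up stand-in for df with a 'cluster' column; values are illustrative only
toy = pd.DataFrame({
    'cluster': [0, 0, 1, 1, 1],
    'age': [30, 32, 55, 53, 57],
    'spending_score': [20, 28, 45, 40, 50],
})

cols = ['age', 'spending_score']

# Number of customers in each cluster
cluster_sizes = toy['cluster'].value_counts().sort_index()

# Per-cluster means minus the overall mean: positive means above average
relative_means = toy.groupby('cluster')[cols].mean() - toy[cols].mean()

print(cluster_sizes)
print(relative_means.round(2))
```

Applied to the real data, the same two lines would run on `df` with `numeric_features` in place of `cols`. Labels such as "high" or "low" can then be read off the sign and size of each difference.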

# **Summary of Key Characteristics for Each Cluster:**

    Cluster 0:
        Younger customers with an average age of approximately 30.67 years.
        Moderate income with an average of around $98,862.
        Low spending score of about 23.84.
        Average membership years of about 5.40.
        High purchase frequency of approximately 28.39.
        Moderate last purchase amount of around $442.92.

    Cluster 1:
        Older customers with an average age of approximately 53.26 years.
        Lower income with an average of around $55,592.
        Moderate spending score of about 43.32.
        Average membership years of about 5.66.
        Moderate purchase frequency of approximately 25.76.
        High last purchase amount of around $665.22.

    Cluster 2:
        Older customers with an average age of approximately 56.74 years.
        Higher income with an average of around $121,045.
        Moderate spending score of about 54.36.
        Average membership years of about 5.11.
        Moderate purchase frequency of approximately 25.46.
        Moderate last purchase amount of around $473.45.

    Cluster 3:
        Younger customers with an average age of approximately 33.84 years.
        Moderate income with an average of around $80,967.
        High spending score of about 77.69.
        Average membership years of about 5.67.
        Moderate purchase frequency of approximately 26.92.
        Lower last purchase amount of around $385.28.


# **Conclusion**

Using these segments to improve customer satisfaction means tailoring your strategies to the needs and preferences of each segment. Here are some actionable steps based on the characteristics of each cluster:

**Cluster 0: Younger, Moderate Income, Low Spending Score, High Purchase Frequency**

* Personalized Promotions: Offer targeted promotions and discounts to encourage higher spending. Since they have a high purchase frequency, loyalty programs or rewards for frequent purchases can be effective.
* Engagement Campaigns: Use engagement campaigns to increase their spending score. This could include personalized recommendations, exclusive offers, and early access to sales.
* Product Recommendations: Highlight products that align with their purchasing history and preferences to increase the average purchase amount.

**Cluster 1: Older, Lower Income, Moderate Spending Score, High Last Purchase Amount**

* Value-Based Offers: Provide value-based offers and discounts to cater to their lower income. Highlight the value and quality of products to justify the higher last purchase amounts.
* Customer Service: Enhance customer service and support for this segment. Older customers may appreciate more personalized and attentive service.
* Loyalty Programs: Implement loyalty programs that reward them for their spending and encourage repeat purchases.

**Cluster 2: Older, Higher Income, Moderate Spending Score, Moderate Purchase Frequency**

* Premium Products: Promote premium and high-end products to this segment, as they have a higher income. Highlight the quality and exclusivity of these products.
* Exclusive Offers: Provide exclusive offers and personalized experiences to make them feel valued. This could include VIP events, early access to new products, and personalized recommendations.
* Customer Feedback: Actively seek feedback from this segment to understand their needs and preferences better. Use this feedback to improve products and services.

**Cluster 3: Younger, Moderate Income, High Spending Score, Lower Last Purchase Amount**

* Upselling and Cross-Selling: Implement upselling and cross-selling strategies to increase the last purchase amount. Recommend complementary products and bundles.
* Engagement and Retention: Focus on engagement and retention strategies to maintain their high spending score. This could include personalized emails, social media engagement, and loyalty rewards.
* Innovative Products: Introduce innovative and trendy products that appeal to younger customers. Keep them engaged with new and exciting offerings.

**General Strategies:**

* Personalization: Use data-driven personalization to tailor marketing messages, product recommendations, and offers to each segment.
* Customer Experience: Enhance the overall customer experience by providing excellent service, easy navigation on your website, and a seamless shopping experience.
* Feedback and Improvement: Continuously gather feedback from customers and use it to improve your products and services. Show customers that their opinions matter and that you are committed to meeting their needs.

By understanding the unique characteristics and preferences of each customer segment, you can create targeted strategies that improve customer satisfaction and drive loyalty.