### Clustering Method for Analyzing Relationships Between Interviewed and Non-interviewed Customers

#### Project Overview
This data science project examines a scenario with 200 customers, 10 of whom have taken part in in-depth discovery interviews. Our goal is to identify patterns and similarities between the interviewed and non-interviewed groups, so that what we learned in the interviews can help us understand the needs of all customers.

#### Goals
- **Identify Similarities:** Use clustering algorithms to detect similarities and patterns in customer behavior and preferences.
- **Extrapolate Interview Insights:** Use insights from the 10 interviews to better understand the customers who were not interviewed.
- **Enhance Value Delivery:** Use these findings to inform strategies for delivering value more efficiently to all customers, especially those not interviewed.

#### Approach
Our methodology applies clustering techniques to the customer data to group customers by their attributes. The resulting clusters show how closely non-interviewed customers resemble the interviewed ones, which indicates how far the interview insights can be applied to the wider customer base.
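
To make the idea concrete, here is a minimal sketch of how clusters could be checked for interview coverage. It is an illustration under stated assumptions, not the project's actual pipeline: it uses scikit-learn's `KMeans`, random synthetic features in place of the real data, an arbitrary `n_clusters=5`, and a randomly chosen set of 10 "interviewed" customers. The helper name `interview_coverage` is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans


def interview_coverage(features, interviewed, n_clusters=5, seed=0):
    """Cluster all customers, then count interviewed customers per cluster."""
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = model.fit_predict(features)

    coverage = {}
    for cluster in range(n_clusters):
        in_cluster = labels == cluster
        coverage[cluster] = {
            "size": int(in_cluster.sum()),
            "interviewed": int((in_cluster & interviewed).sum()),
        }
    return labels, coverage


# Synthetic stand-in for three scaled features (assumption: NOT the real data).
rng = np.random.default_rng(42)
features = rng.random((200, 3))

# Assumption: mark 10 randomly chosen customers as interviewed.
interviewed = np.zeros(200, dtype=bool)
interviewed[rng.choice(200, size=10, replace=False)] = True

labels, coverage = interview_coverage(features, interviewed)
for cluster, stats in coverage.items():
    print(cluster, stats)
```

A cluster with no interviewed customers would suggest a segment the interviews did not reach, while a cluster containing several interviewees is one where the interview insights are more likely to transfer.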

#### Data Source
Our customer data comes from a [Kaggle dataset](https://www.kaggle.com/datasets/shrutimechlearn/customer-data) and comprises 200 data points with the following features:
- **CustomerID:** IDs ranging from 1 to 200.
- **Gender:** 56% female and 44% male customers.
- **Age:** Ranges from 18 to 70 years.
- **Annual Income:** Ranges from $15,000 to $137,000.
- **Spending Score:** Ranges from 1 to 99 and indicates purchasing behavior.


In [22]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler

In [33]:
# Import data
data_path = 'data/Mall_Customers.csv'
data = pd.read_csv(data_path)

In [34]:
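# Format and clean data
data = data.drop(columns=['CustomerID'])  # drop the ID column; it is not used as a clustering feature
data = data.rename(columns={'Genre': 'Gender'})  # the raw file labels gender as 'Genre'
data.columns = data.columns.str.lower()
data['gender'] = data['gender'].astype('category')

# Work on a copy so the unscaled values stay available in `data`
df = data.copy()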

In [35]:
# Normalize data: rescale each numeric feature to the [0, 1] range
numeric_cols = ['age', 'annual_income_(k$)', 'spending_score']
df[numeric_cols] = MinMaxScaler().fit_transform(data[numeric_cols])
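
As a quick illustration of what the scaling step does, the sketch below applies the min-max formula `(x - min) / (max - min)` by hand and compares the result with `MinMaxScaler`. The three rows are illustrative values built from the feature ranges listed under Data Source, not actual rows from the dataset.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Illustrative rows (age, annual income in k$, spending score); not real customers.
sample = np.array([
    [18.0, 15.0, 1.0],     # column minimums
    [44.0, 76.0, 50.0],    # midpoints of each range
    [70.0, 137.0, 99.0],   # column maximums
])

# Manual min-max scaling, applied column by column.
col_min = sample.min(axis=0)
col_max = sample.max(axis=0)
manual = (sample - col_min) / (col_max - col_min)

# The same transformation through scikit-learn.
scaled = MinMaxScaler().fit_transform(sample)

print(manual)
print(np.allclose(manual, scaled))
```

Each column now spans 0 to 1, so no single feature dominates the distance calculations that clustering relies on merely because of its units.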