## Project 2: Customer Segmentation

## Objective: 
### Assist the automobile company in segmenting new potential customers for targeted outreach and communication based on the existing successful segmentation strategy in their current market.

Project Stages:

1. Familiarize Myself with the Dataset:  

    <ins>Objective</ins>: Familiarize myself with the dataset provided for the analysis.  
    <ins>Steps</ins>: Load the dataset and inspect its columns, data types, summary statistics, and missing values; do a first cleaning pass.  

2. Exploratory Data Analysis (EDA):  

    <ins>Objective</ins>: Understand the characteristics of the dataset and identify patterns that may influence customer segmentation.  
    <ins>Steps</ins>: Conduct exploratory analysis to grasp the distribution of variables such as gender, marital status, age, education, profession, work experience, spending score, family size, and anonymized category (Var_1).
    Visualize relationships and trends within the dataset to uncover potential insights that may inform customer segmentation.   

3. Data Preprocessing:  

    <ins>Objective</ins>: Prepare the dataset for segmentation modeling by handling missing values and encoding categorical variables.  
    <ins>Steps</ins>: Address any missing or inconsistent data points.
    Encode categorical variables, such as gender, marital status, education, profession, and Var_1, to make them suitable for segmentation modeling.  

4. Customer Segmentation Modeling:  

    <ins>Objective</ins>: Develop a segmentation model to categorize new potential customers into groups similar to the existing customer segments (A, B, C, D).  
    <ins>Steps</ins>: Define the target variable as "Segmentation" and other relevant features for segmentation modeling.
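    Choose an appropriate algorithm. Because existing customers already carry segment labels (A, B, C, D), this is a supervised classification task (e.g., logistic regression, decision tree, or random forest); an unsupervised method such as k-means clustering ignores those labels and is useful at most as a comparison.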
    Train the model on the existing customer data with known segments (A, B, C, D).
    Apply the trained model to predict segments for the new potential customers.  

5. Segmentation Validation:  

    <ins>Objective</ins>: Validate the segmentation model's effectiveness and assess its performance on the existing dataset.  
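    <ins>Steps</ins>: Evaluate the model on held-out labeled data using classification metrics (e.g., accuracy, per-segment precision and recall, and a confusion matrix); the silhouette score applies only if a clustering method such as k-means is used.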
    Validate the predicted segments against the known segments in the existing dataset.
    Adjust the model as needed based on validation results.  

6. Interpretation and Profiling:  

    <ins>Objective</ins>: Interpret the characteristics of each customer segment and create customer profiles.  
    <ins>Steps</ins>: Analyze the features that contribute to the segmentation of customers.
    Create detailed profiles for each segment, highlighting the distinguishing characteristics.
    Provide insights into the preferences and behaviors associated with each segment.  

7. Recommendations for Outreach:  

    <ins>Objective</ins>: Provide actionable recommendations for targeted outreach and communication strategies for each customer segment.  
    <ins>Steps</ins>: Based on the identified customer profiles, suggest personalized communication approaches for each segment.
    Highlight specific product offerings or marketing messages that may resonate with each segment.
    Propose strategies for engaging customers in a way that aligns with their segment preferences.  

8. Presentation of Results:  

    <ins>Objective</ins>: Compile and present a comprehensive report for the management team, emphasizing the identified segments and actionable recommendations.  
    <ins>Steps</ins>: Organize key findings, segmentation results, and outreach recommendations into a clear and cohesive report.
    Prepare a visually compelling presentation summarizing the project, showcasing the effectiveness of the segmentation model, and providing strategic guidance for customer outreach in new markets.  

9. Deliverables:  

    <ins>Dataset</ins>: Cleaned and preprocessed dataset.  
    <ins>Analysis Report</ins>: Detailed report showcasing customer segments, behavioral insights, and actionable recommendations.  
    <ins>Machine Learning Model (if applicable)</ins>: Presentation of the developed ML model and the segments it predicts for new potential customers.  
    <ins>Presentation</ins>: Engaging presentation summarizing key findings and strategies for tailored marketing.  
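Because segments A-D are already known for existing customers, stages 4 and 5 amount to supervised classification. The sketch below illustrates that idea under stated assumptions and is not the project's final model: it uses a hypothetical toy DataFrame, one-hot encodes the categorical columns with `pd.get_dummies`, standardizes the features, and applies a nearest-centroid classifier (a deliberately simple stand-in for models such as logistic regression or random forests). The feature subset, `encode`, and `predict_segment` are illustrative names, not part of the dataset or any library.

```python
import pandas as pd

# Hypothetical toy data standing in for labeled existing customers (values are made up).
train = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female", "Male", "Female", "Male", "Female"],
    "Age": [22, 38, 67, 40, 25, 60, 30, 65],
    "Spending_Score": ["Low", "Average", "High", "High", "Low", "High", "Low", "Average"],
    "Segmentation": ["D", "A", "B", "A", "D", "B", "D", "B"],
})
FEATURES = ["Gender", "Age", "Spending_Score"]   # assumed feature subset
CATEGORICAL = ["Gender", "Spending_Score"]

def encode(df: pd.DataFrame, columns=None) -> pd.DataFrame:
    """One-hot encode categorical features; optionally align to the training columns."""
    X = pd.get_dummies(df[FEATURES], columns=CATEGORICAL, dtype=float)
    if columns is not None:
        # Categories absent from this batch become all-zero columns.
        X = X.reindex(columns=columns, fill_value=0.0)
    return X

X_train = encode(train)
mean = X_train.mean()
std = X_train.std(ddof=0).replace(0, 1.0)   # avoid dividing by zero for constant columns
Z_train = (X_train - mean) / std             # standardize so Age does not dominate distances

# "Training" a nearest-centroid classifier: mean standardized vector per known segment.
centroids = Z_train.groupby(train["Segmentation"]).mean()

def predict_segment(new_customers: pd.DataFrame) -> pd.Series:
    """Assign each row to the segment whose centroid is closest (Euclidean distance)."""
    Z = (encode(new_customers, X_train.columns) - mean) / std
    diffs = Z.to_numpy()[:, None, :] - centroids.to_numpy()[None, :, :]
    dists = (diffs ** 2).sum(axis=2)         # shape: (n_customers, n_segments)
    return pd.Series(centroids.index[dists.argmin(axis=1)], index=new_customers.index)

# Hypothetical new potential customers without a Segmentation label.
new_customers = pd.DataFrame({
    "Gender": ["Female", "Male"],
    "Age": [39, 23],
    "Spending_Score": ["Average", "Low"],
})
print(predict_segment(new_customers))
```

On the real data, part of the labeled rows would be held out so that accuracy and a confusion matrix can be reported before the predictions for new customers are trusted.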

In [2]:
import numpy as np
import pandas as pd
import plotly.express as px    
import plotly.graph_objects as go

### Stage 1 - Familiarization with the Data

In [3]:
customer_data = pd.read_csv("Customer_Segmentation_Dataset.csv")

In [4]:
customer_data.columns

Index(['ID', 'Gender', 'Ever_Married', 'Age', 'Graduated', 'Profession',
       'Work_Experience', 'Spending_Score', 'Family_Size', 'Var_1',
       'Segmentation'],
      dtype='object')

In [5]:
customer_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8068 entries, 0 to 8067
Data columns (total 11 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   ID               8068 non-null   int64  
 1   Gender           8068 non-null   object 
 2   Ever_Married     7928 non-null   object 
 3   Age              8068 non-null   int64  
 4   Graduated        7990 non-null   object 
 5   Profession       7944 non-null   object 
 6   Work_Experience  7239 non-null   float64
 7   Spending_Score   8068 non-null   object 
 8   Family_Size      7733 non-null   float64
 9   Var_1            7992 non-null   object 
 10  Segmentation     8068 non-null   object 
dtypes: float64(2), int64(2), object(7)
memory usage: 693.5+ KB


In [6]:
customer_data.describe()

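|       | ID            | Age       | Work_Experience | Family_Size |
|-------|---------------|-----------|-----------------|-------------|
| count | 8068.0        | 8068.0    | 7239.0          | 7733.0      |
| mean  | 463479.214551 | 43.466906 | 2.641663        | 2.850123    |
| std   | 2595.381232   | 16.711696 | 3.406763        | 1.531413    |
| min   | 458982.0      | 18.0      | 0.0             | 1.0         |
| 25%   | 461240.75     | 30.0      | 0.0             | 2.0         |
| 50%   | 463472.5      | 40.0      | 1.0             | 3.0         |
| 75%   | 465744.25     | 53.0      | 4.0             | 4.0         |
| max   | 467974.0      | 89.0      | 14.0            | 9.0         |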
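`describe()` summarizes only the numeric columns by default, so the seven non-numeric columns (Gender, Ever_Married, Graduated, and so on) are missing from the table above. A small sketch on a hypothetical toy frame: passing `exclude=[np.number]` makes `describe` report, for each categorical column, the non-null count, the number of distinct values (`unique`), the most frequent value (`top`), and how often it occurs (`freq`). On the real data the analogous call would be `customer_data.describe(exclude=[np.number])`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows shaped like the customer data (values are made up).
toy = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male", "Female"],
    "Spending_Score": ["Low", "Average", "Low", "High", "Low"],
    "Age": [22, 38, 67, 67, 40],
})

# Summarize only the non-numeric columns: count, unique, top, freq.
cat_summary = toy.describe(exclude=[np.number])
print(cat_summary)
```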


In [2]:
customer_data[customer_data['Segmentation'] == 'A'].info()

In [None]:
customer_data[customer_data['Segmentation'] == 'B'].info()

In [None]:
customer_data[customer_data['Segmentation'] == 'C'].info()

In [None]:
customer_data[customer_data['Segmentation'] == 'D'].info()
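The four per-segment `.info()` calls can be collapsed into one table of missing values per segment. A minimal sketch on a hypothetical toy frame: the boolean `isnull()` mask is grouped by Segmentation and summed, giving one row per segment and one column per feature. On the real data the analogous expression would be `customer_data.drop(columns="Segmentation").isnull().groupby(customer_data["Segmentation"]).sum()`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows (values are made up).
toy = pd.DataFrame({
    "Segmentation": ["A", "A", "B", "B", "C", "D"],
    "Ever_Married": ["Yes", None, "Yes", "No", None, "No"],
    "Work_Experience": [1.0, np.nan, np.nan, 0.0, 4.0, np.nan],
})

# Missing-cell mask, summed within each segment (True counts as 1).
missing_by_segment = (
    toy.drop(columns="Segmentation")
       .isnull()
       .groupby(toy["Segmentation"])
       .sum()
)
# Segment sizes, to put the missing counts in proportion.
segment_sizes = toy["Segmentation"].value_counts().sort_index()
print(missing_by_segment)
print(segment_sizes)
```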

In [7]:
customer_data.head()

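|   | ID     | Gender | Ever_Married | Age | Graduated | Profession    | Work_Experience | Spending_Score | Family_Size | Var_1 | Segmentation |
|---|--------|--------|--------------|-----|-----------|---------------|-----------------|----------------|-------------|-------|--------------|
| 0 | 462809 | Male   | No           | 22  | No        | Healthcare    | 1.0             | Low            | 4.0         | Cat_4 | D            |
| 1 | 462643 | Female | Yes          | 38  | Yes       | Engineer      | NaN             | Average        | 3.0         | Cat_4 | A            |
| 2 | 466315 | Female | Yes          | 67  | Yes       | Engineer      | 1.0             | Low            | 1.0         | Cat_6 | B            |
| 3 | 461735 | Male   | Yes          | 67  | Yes       | Lawyer        | 0.0             | High           | 2.0         | Cat_6 | B            |
| 4 | 462669 | Female | Yes          | 40  | Yes       | Entertainment | NaN             | High           | 6.0         | Cat_6 | A            |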


In [8]:
customer_data.shape

(8068, 11)

In [9]:
customer_data.isnull().sum()

ID                   0
Gender               0
Ever_Married       140
Age                  0
Graduated           78
Profession         124
Work_Experience    829
Spending_Score       0
Family_Size        335
Var_1               76
Segmentation         0
dtype: int64
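Raw counts are easier to compare as percentages of the 8,068 rows. The sketch below reuses the non-zero counts from the `isnull().sum()` output above; on the live DataFrame the equivalent one-liner is `customer_data.isnull().mean() * 100`. It shows that Work_Experience is missing in roughly 10% of rows, more than twice the share of any other column.

```python
import pandas as pd

N_ROWS = 8068  # total rows reported by customer_data.shape

# Non-zero missing counts copied from the isnull().sum() output.
missing_counts = pd.Series({
    "Ever_Married": 140,
    "Graduated": 78,
    "Profession": 124,
    "Work_Experience": 829,
    "Family_Size": 335,
    "Var_1": 76,
})

# Percentage of rows missing per column, largest first.
missing_pct = (missing_counts / N_ROWS * 100).round(1).sort_values(ascending=False)
print(missing_pct)
```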

In [10]:
customer_data_cleaned = customer_data.dropna()

In [11]:
customer_data_cleaned.shape

(6665, 11)

In [13]:
8068 - 6665

1403
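The 1,403 dropped rows are about 17.4% of the 8,068 customers, a sizeable loss. The same count can be obtained without building a second DataFrame by counting rows that contain at least one missing value. A toy sketch (hypothetical values) confirming that this matches what `dropna()` removes:

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows: row 0 is complete, rows 1-3 each have a gap.
toy = pd.DataFrame({
    "Work_Experience": [1.0, np.nan, 3.0, np.nan],
    "Var_1": ["Cat_4", "Cat_6", None, None],
})

rows_with_missing = int(toy.isnull().any(axis=1).sum())  # rows with >= 1 missing cell
rows_dropped = len(toy) - len(toy.dropna())                # what dropna() removes
share_dropped = rows_dropped / len(toy)

# Same arithmetic on the real figures from the cells above.
real_share_pct = round(1403 / 8068 * 100, 1)
print(rows_with_missing, rows_dropped, share_dropped, real_share_pct)
```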

In [14]:
customer_data_cleaned.isnull().sum()

ID                 0
Gender             0
Ever_Married       0
Age                0
Graduated          0
Profession         0
Work_Experience    0
Spending_Score     0
Family_Size        0
Var_1              0
Segmentation       0
dtype: int64

Rather than dropping 1,403 rows, impute the missing values in Ever_Married, Graduated, Profession, Work_Experience, Family_Size, and Var_1 where possible. Grouping the data by Segmentation first, and then by other columns where the groups stay large enough, should make the imputed values more accurate.
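A minimal sketch of that plan on a hypothetical toy frame: categorical gaps are filled with the most frequent value within each segment, and numeric gaps with the segment median. `fill_with_mode` is an illustrative helper, and mode/median are assumed choices rather than a validated strategy. One caveat: Segmentation is exactly what is unknown for the new potential customers, so any imputation applied to them has to group by other columns (e.g., Gender or age bands) instead. Finer groupings can be tried by passing a list of keys to `groupby`, falling back to the coarser grouping when a subgroup has no observed value.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows (values are made up).
toy = pd.DataFrame({
    "Segmentation": ["A", "A", "A", "B", "B", "B"],
    "Ever_Married": ["Yes", "Yes", None, "No", None, "No"],
    "Work_Experience": [1.0, 3.0, np.nan, 0.0, np.nan, 8.0],
})

def fill_with_mode(s: pd.Series) -> pd.Series:
    """Fill missing values with the group's most frequent value, if the group has one."""
    modes = s.mode()  # NaN is ignored by default
    return s.fillna(modes.iloc[0]) if not modes.empty else s

imputed = toy.copy()
# Categorical column: within-segment mode.
imputed["Ever_Married"] = (
    imputed.groupby("Segmentation")["Ever_Married"].transform(fill_with_mode)
)
# Numeric column: within-segment median.
imputed["Work_Experience"] = imputed["Work_Experience"].fillna(
    imputed.groupby("Segmentation")["Work_Experience"].transform("median")
)
print(imputed)
```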

In [1]:
customer_data[customer_data['Ever_Married'].isnull()]

In [15]:
customer_data_cleaned.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 6665 entries, 0 to 8067
Data columns (total 11 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   ID               6665 non-null   int64  
 1   Gender           6665 non-null   object 
 2   Ever_Married     6665 non-null   object 
 3   Age              6665 non-null   int64  
 4   Graduated        6665 non-null   object 
 5   Profession       6665 non-null   object 
 6   Work_Experience  6665 non-null   float64
 7   Spending_Score   6665 non-null   object 
 8   Family_Size      6665 non-null   float64
 9   Var_1            6665 non-null   object 
 10  Segmentation     6665 non-null   object 
dtypes: float64(2), int64(2), object(7)
memory usage: 624.8+ KB


In [None]:
# index=False keeps the (gapped) row labels out of the saved file
customer_data_cleaned.to_csv("Cleaned_Customer_Segmentation_Dataset.csv", index=False)
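`customer_data_cleaned` keeps its original row labels (6,665 entries labeled between 0 and 8067, per its `info()` output), and `to_csv` writes the index by default. When such a file is read back, the index returns as an extra column named `Unnamed: 0`. A small in-memory round trip on a hypothetical two-row frame shows the difference `index=False` makes:

```python
import io
import pandas as pd

# Hypothetical frame with a gapped index, like a frame after dropna().
toy = pd.DataFrame({"ID": [462809, 462643], "Segmentation": ["D", "A"]}, index=[0, 5])

buf_default = io.StringIO()
toy.to_csv(buf_default)                # index written as an unnamed first column
buf_default.seek(0)
reloaded_default = pd.read_csv(buf_default)

buf_no_index = io.StringIO()
toy.to_csv(buf_no_index, index=False)  # index omitted
buf_no_index.seek(0)
reloaded_no_index = pd.read_csv(buf_no_index)

print(list(reloaded_default.columns))
print(list(reloaded_no_index.columns))
```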

In [16]:
customer_data.groupby("Profession")

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7f9f71bc4a50>
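As the output shows, `groupby` on its own returns a lazy `DataFrameGroupBy` object; nothing is computed until an aggregation is applied. To see how professions are distributed across segments, a cross-tabulation is one option. A sketch on hypothetical toy rows; on the real data the analogous call would be `pd.crosstab(customer_data["Profession"], customer_data["Segmentation"])`.

```python
import pandas as pd

# Hypothetical toy rows (values are made up).
toy = pd.DataFrame({
    "Profession": ["Healthcare", "Engineer", "Engineer", "Lawyer", "Healthcare", "Engineer"],
    "Segmentation": ["D", "A", "B", "B", "D", "A"],
})

# A groupby only produces numbers once aggregated, e.g. rows per profession.
rows_per_profession = toy.groupby("Profession").size()

# Counts of each profession in each segment.
counts = pd.crosstab(toy["Profession"], toy["Segmentation"])
# Row-normalized: share of each profession falling into each segment.
shares = pd.crosstab(toy["Profession"], toy["Segmentation"], normalize="index")
print(rows_per_profession)
print(counts)
print(shares.round(2))
```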