# Customer Personality Analysis

## Understanding and Cleaning the Dataset

## ID:

Customer's unique identifier

## Year_Birth:

Customer's birth year

## Education: 

Customer's education level

## Marital_Status:

Customer's marital status

## Income: 

Customer's yearly household income

## Kidhome:

Number of children in customer's household

## Teenhome: 

Number of teenagers in customer's household

## Dt_Customer: 

Date of customer's enrollment with the company

## Recency:

Number of days since customer's last purchase

## Complain: 

1 if the customer complained in the last 2 years, 0 otherwise

## MntWines:

Amount spent on wine in last 2 years

## MntFruits: 

Amount spent on fruits in last 2 years

## MntMeatProducts:

Amount spent on meat in last 2 years

## MntFishProducts: 

Amount spent on fish in last 2 years

## MntSweetProducts: 

Amount spent on sweets in last 2 years

## MntGoldProds: 

Amount spent on gold in last 2 years


## NumDealsPurchases: 

Number of purchases made with a discount

## AcceptedCmp1:

1 if customer accepted the offer in the 1st campaign, 0 otherwise

## AcceptedCmp2: 

1 if customer accepted the offer in the 2nd campaign, 0 otherwise

## AcceptedCmp3:

1 if customer accepted the offer in the 3rd campaign, 0 otherwise

## AcceptedCmp4:

1 if customer accepted the offer in the 4th campaign, 0 otherwise

## AcceptedCmp5: 

1 if customer accepted the offer in the 5th campaign, 0 otherwise

## Response: 

1 if customer accepted the offer in the last campaign, 0 otherwise

## NumWebPurchases:

Number of purchases made through the company’s website

## NumCatalogPurchases: 

Number of purchases made using a catalogue

## NumStorePurchases: 

Number of purchases made directly in stores

## NumWebVisitsMonth: 

Number of visits to the company’s website in the last month

In [53]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error

from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn import metrics

In [9]:
# Loading the Dataset
df = pd.read_csv('marketing_campaign.csv', sep='\t')

In [10]:
df

Unnamed: 0,ID,Year_Birth,Education,Marital_Status,Income,Kidhome,Teenhome,Dt_Customer,Recency,MntWines,...,NumWebVisitsMonth,AcceptedCmp3,AcceptedCmp4,AcceptedCmp5,AcceptedCmp1,AcceptedCmp2,Complain,Z_CostContact,Z_Revenue,Response
0,5524,1957,Graduation,Single,58138.0,0,0,04-09-2012,58,635,...,7,0,0,0,0,0,0,3,11,1
1,2174,1954,Graduation,Single,46344.0,1,1,08-03-2014,38,11,...,5,0,0,0,0,0,0,3,11,0
2,4141,1965,Graduation,Together,71613.0,0,0,21-08-2013,26,426,...,4,0,0,0,0,0,0,3,11,0
3,6182,1984,Graduation,Together,26646.0,1,0,10-02-2014,26,11,...,6,0,0,0,0,0,0,3,11,0
4,5324,1981,PhD,Married,58293.0,1,0,19-01-2014,94,173,...,5,0,0,0,0,0,0,3,11,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2235,10870,1967,Graduation,Married,61223.0,0,1,13-06-2013,46,709,...,5,0,0,0,0,0,0,3,11,0
2236,4001,1946,PhD,Together,64014.0,2,1,10-06-2014,56,406,...,7,0,0,0,1,0,0,3,11,0
2237,7270,1981,Graduation,Divorced,56981.0,0,0,25-01-2014,91,908,...,6,0,1,0,0,0,0,3,11,0
2238,8235,1956,Master,Together,69245.0,0,1,24-01-2014,8,428,...,3,0,0,0,0,0,0,3,11,0


In [12]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2240 entries, 0 to 2239
Data columns (total 29 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   ID                   2240 non-null   int64  
 1   Year_Birth           2240 non-null   int64  
 2   Education            2240 non-null   object 
 3   Marital_Status       2240 non-null   object 
 4   Income               2216 non-null   float64
 5   Kidhome              2240 non-null   int64  
 6   Teenhome             2240 non-null   int64  
 7   Dt_Customer          2240 non-null   object 
 8   Recency              2240 non-null   int64  
 9   MntWines             2240 non-null   int64  
 10  MntFruits            2240 non-null   int64  
 11  MntMeatProducts      2240 non-null   int64  
 12  MntFishProducts      2240 non-null   int64  
 13  MntSweetProducts     2240 non-null   int64  
 14  MntGoldProds         2240 non-null   int64  
 15  NumDealsPurchases    2240 non-null   i

In [13]:
# Identifying Null Values
df.isnull().sum()

ID                      0
Year_Birth              0
Education               0
Marital_Status          0
Income                 24
Kidhome                 0
Teenhome                0
Dt_Customer             0
Recency                 0
MntWines                0
MntFruits               0
MntMeatProducts         0
MntFishProducts         0
MntSweetProducts        0
MntGoldProds            0
NumDealsPurchases       0
NumWebPurchases         0
NumCatalogPurchases     0
NumStorePurchases       0
NumWebVisitsMonth       0
AcceptedCmp3            0
AcceptedCmp4            0
AcceptedCmp5            0
AcceptedCmp1            0
AcceptedCmp2            0
Complain                0
Z_CostContact           0
Z_Revenue               0
Response                0
dtype: int64

In [14]:
# Columns containing the '?' placeholder (DataFrame.map requires pandas >= 2.1)
columns_with_null = df.columns[df.map(lambda x: '?' in str(x)).any()].values
print('The columns with Null Values(?):', columns_with_null)

# Rows containing the '?' placeholder
rows_with_null = df[df.map(lambda x: '?' in str(x)).any(axis=1)]
print('\nRows with Null Values(?):')
rows_with_null

The columns with Null Values(?): []

Rows with Null Values(?):


Unnamed: 0,ID,Year_Birth,Education,Marital_Status,Income,Kidhome,Teenhome,Dt_Customer,Recency,MntWines,...,NumWebVisitsMonth,AcceptedCmp3,AcceptedCmp4,AcceptedCmp5,AcceptedCmp1,AcceptedCmp2,Complain,Z_CostContact,Z_Revenue,Response
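
The placeholder check above tests whether the string form of each cell contains `?`. As a minimal sketch, assuming a hypothetical toy frame named `toy`, the same substring test can be written with column-wise string methods, which also run on pandas releases that predate `DataFrame.map`:

```python
import pandas as pd

# Hypothetical toy data; column 'b' holds a single '?' placeholder
toy = pd.DataFrame({'a': [1, 2, 3], 'b': ['x', '?', 'y']})

# Boolean mask: True wherever the string form of a cell contains '?'
# regex=False treats '?' as a literal character, not a regex quantifier
mask = toy.astype(str).apply(lambda col: col.str.contains('?', regex=False))

# Columns and rows with at least one placeholder
cols_with_placeholder = toy.columns[mask.any()].tolist()
rows_with_placeholder = toy[mask.any(axis=1)]

print(cols_with_placeholder)
print(rows_with_placeholder)
```

On the real dataset the mask would be built from `df` instead of `toy`; an empty column list, as in the output above, means no `?` placeholders are present.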


In [18]:
# Dropping rows with null values (the 24 rows with missing Income)
df.dropna(inplace=True)

In [19]:
# Final Shape
print(df.shape)

(2216, 29)
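
Called without arguments, `dropna()` removes every row that has a missing value in any column. Only `Income` has nulls here, so the 24 dropped rows account for the change from 2240 to 2216 rows. A minimal sketch on a hypothetical toy frame (the `Note` column is an assumption for illustration) shows how the `subset` parameter limits the drop to one column:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one missing Income, one unrelated missing Note
toy = pd.DataFrame({
    'Income': [58138.0, np.nan, 71613.0],
    'Note': ['a', 'b', None],
})

# Drops any row with a null in any column (rows 1 and 2)
dropped_any = toy.dropna()

# Drops only rows where Income is null (row 1)
dropped_income = toy.dropna(subset=['Income'])

print(dropped_any.shape, dropped_income.shape)
```

Using `subset` states the intent explicitly and keeps the drop from widening if another column later gains nulls.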


In [52]:
# Value counts for 'Education' & 'Marital_Status'
print('Value Count for:\n')
print(df['Education'].value_counts())
print('\nValue Count for:\n')
print(df['Marital_Status'].value_counts())

Value Count for:

Education
Graduation    1116
PhD            481
Master         365
2n Cycle       200
Basic           54
Name: count, dtype: int64

Value Count for:

Marital_Status
Married     857
Together    573
Single      471
Divorced    232
Widow        76
Alone         3
Absurd        2
YOLO          2
Name: count, dtype: int64


# Feature Engineering

In [58]:
# Converting 'Year_Birth' to age (reference year hard-coded as 2024)
df['Age'] = 2024 - df['Year_Birth']

# 'Marital_Status' has too many categories. Collapse it to 'Attached' or 'Single'
# ('Single' itself is left unchanged by the mapping)
df['Relationship_Status'] = df['Marital_Status'].replace({'Married': 'Attached',
                                                          'Together': 'Attached',
                                                          'Divorced': 'Single',
                                                          'Widow': 'Single',
                                                          'Alone': 'Single',
                                                          'Absurd': 'Single',
                                                          'YOLO': 'Single'})
# Doing the same for 'Education'
df['Education'] = df['Education'].replace({'Graduation': 'Graduate',
                                           'PhD': 'Post-Grad',
                                           'Master': 'Post-Grad',
                                           '2n Cycle': 'Pre-Grad',
                                           'Basic': 'Pre-Grad'})


# Shorter product column names for comprehensibility
df = df.rename(columns={'MntWines': 'Wines',
                        'MntFruits': 'Fruits',
                        'MntMeatProducts': 'Meat',
                        'MntFishProducts': 'Fish',
                        'MntSweetProducts': 'Sweet',
                        'MntGoldProds': 'Gold'})

# Total number of people in a household: adults (1 or 2) plus children
df['Children'] = df['Kidhome'] + df['Teenhome']
# map() returns integers directly, avoiding a string-to-number replace()
df['Total_Household'] = df['Relationship_Status'].map({'Single': 1, 'Attached': 2}) + df['Children']

# Adding new column labeled 'Parent' (1 if the household has any children)
df['Parent'] = np.where(df['Children'] > 0, 1, 0)

# Dropping redundant features
df.drop(columns=['Marital_Status', 'Dt_Customer', 'Z_CostContact', 'Z_Revenue', 'Year_Birth', 'ID'], inplace=True)

In [60]:
# Updated Dataset
df

Unnamed: 0,Education,Income,Kidhome,Teenhome,Recency,Wines,Fruits,Meat,Fish,Sweet,...,AcceptedCmp5,AcceptedCmp1,AcceptedCmp2,Complain,Response,Age,Relationship_Status,Children,Total_Household,Parent
0,Graduate,58138.0,0,0,58,635,88,546,172,88,...,0,0,0,0,1,67,Single,0,1,0
1,Graduate,46344.0,1,1,38,11,1,6,2,1,...,0,0,0,0,0,70,Single,2,3,1
2,Graduate,71613.0,0,0,26,426,49,127,111,21,...,0,0,0,0,0,59,Attached,0,2,0
3,Graduate,26646.0,1,0,26,11,4,20,10,3,...,0,0,0,0,0,40,Attached,1,3,1
4,Post-Grad,58293.0,1,0,94,173,43,118,46,27,...,0,0,0,0,0,43,Attached,1,3,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2235,Graduate,61223.0,0,1,46,709,43,182,42,118,...,0,0,0,0,0,57,Attached,1,3,1
2236,Post-Grad,64014.0,2,1,56,406,0,30,0,0,...,0,1,0,0,0,78,Attached,3,5,1
2237,Graduate,56981.0,0,0,91,908,48,217,32,12,...,0,0,0,0,0,43,Single,0,1,0
2238,Post-Grad,69245.0,0,1,8,428,30,214,80,30,...,0,0,0,0,0,68,Attached,1,3,1
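
The derived household columns can be checked by hand. The sketch below is a minimal reconstruction, not the notebook's own cell: it rebuilds `Children`, `Total_Household`, and `Parent` for the first five rows shown earlier, using only their `Marital_Status`, `Kidhome`, and `Teenhome` values. The frame `toy` and the set `attached` are illustrative names.

```python
import numpy as np
import pandas as pd

# First five rows of the original dataset (relevant columns only)
toy = pd.DataFrame({
    'Marital_Status': ['Single', 'Single', 'Together', 'Together', 'Married'],
    'Kidhome': [0, 1, 0, 1, 1],
    'Teenhome': [0, 1, 0, 0, 0],
})

# Statuses counted as a two-adult household; everything else is 'Single'
attached = {'Married', 'Together'}
toy['Relationship_Status'] = np.where(
    toy['Marital_Status'].isin(attached), 'Attached', 'Single'
)

# Children = kids + teens; household = adults + children
toy['Children'] = toy['Kidhome'] + toy['Teenhome']
adults = toy['Relationship_Status'].map({'Single': 1, 'Attached': 2})
toy['Total_Household'] = adults + toy['Children']

# Parent flag: 1 if there is at least one child at home
toy['Parent'] = (toy['Children'] > 0).astype(int)

print(toy[['Relationship_Status', 'Children', 'Total_Household', 'Parent']])
```

The resulting `Children`, `Total_Household`, and `Parent` values agree with rows 0 to 4 of the updated dataset above.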


In [61]:
# Describe the Dataset
df.describe()

Unnamed: 0,Income,Kidhome,Teenhome,Recency,Wines,Fruits,Meat,Fish,Sweet,Gold,...,AcceptedCmp4,AcceptedCmp5,AcceptedCmp1,AcceptedCmp2,Complain,Response,Age,Children,Total_Household,Parent
count,2216.0,2216.0,2216.0,2216.0,2216.0,2216.0,2216.0,2216.0,2216.0,2216.0,...,2216.0,2216.0,2216.0,2216.0,2216.0,2216.0,2216.0,2216.0,2216.0,2216.0
mean,52247.251354,0.441787,0.505415,49.012635,305.091606,26.356047,166.995939,37.637635,27.028881,43.965253,...,0.074007,0.073105,0.064079,0.013538,0.009477,0.150271,55.179603,0.947202,2.592509,0.71435
std,25173.076661,0.536896,0.544181,28.948352,337.32792,39.793917,224.283273,54.752082,41.072046,51.815414,...,0.261842,0.260367,0.24495,0.115588,0.096907,0.357417,11.985554,0.749062,0.905722,0.451825
min,1730.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,28.0,0.0,1.0,0.0
25%,35303.0,0.0,0.0,24.0,24.0,2.0,16.0,3.0,1.0,9.0,...,0.0,0.0,0.0,0.0,0.0,0.0,47.0,0.0,2.0,0.0
50%,51381.5,0.0,0.0,49.0,174.5,8.0,68.0,12.0,8.0,24.5,...,0.0,0.0,0.0,0.0,0.0,0.0,54.0,1.0,3.0,1.0
75%,68522.0,1.0,1.0,74.0,505.0,33.0,232.25,50.0,33.0,56.0,...,0.0,0.0,0.0,0.0,0.0,0.0,65.0,1.0,3.0,1.0
max,666666.0,2.0,2.0,99.0,1493.0,199.0,1725.0,259.0,262.0,321.0,...,1.0,1.0,1.0,1.0,1.0,1.0,131.0,3.0,5.0,1.0
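
The summary shows a maximum `Income` of 666666 and a maximum `Age` of 131, both far above their 75th percentiles. One common way to flag such values, not necessarily the approach this notebook takes, is the 1.5 x IQR rule. A minimal sketch, with `iqr_bounds` as a hypothetical helper and a toy series standing in for a real column:

```python
import pandas as pd

def iqr_bounds(series, k=1.5):
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical toy data with one obvious extreme value
toy = pd.Series([1, 2, 3, 4, 5, 100])

lower, upper = iqr_bounds(toy)

# Values outside the fences are flagged as candidate outliers
outliers = toy[(toy < lower) | (toy > upper)]

print(lower, upper)
print(outliers.tolist())
```

With the quartiles reported above, the upper fence works out to 68522 + 1.5 * (68522 - 35303) = 118350.5 for `Income` and 65 + 1.5 * (65 - 47) = 92 for `Age`, so both maxima would be flagged. Whether to drop, cap, or keep such rows is a separate modelling decision.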
