A company can grow rapidly when it understands the personality and behavior of its customers, because it can then offer better services and benefits to customers who have the potential to become loyal. This project processes historical marketing campaign data to improve campaign performance and target the right customers, so that they make transactions on the company's platform. From these insights, our focus is to build a cluster prediction model that makes it easier for the company to make decisions.
- Conversion Rate Analysis Based On Income, Spending And Age
- Data Modeling
- Customer Personality Analysis for Marketing Retargeting
Feature Name | Description |
---|---|
CUSTOMER ATTRIBUTES | |
Unnamed : 0 | Index number |
ID | Customer's unique identifier |
Year_Birth | Customer's birth year |
Education | Customer's education level |
Marital_Status | Customer's marital status |
Income | Customer's yearly household income |
Kidhome | Number of children in customer's household |
Teenhome | Number of teenagers in customer's household |
Dt_Customer | Date of customer's enrollment with the company |
Recency | Number of days since customer's last purchase |
Complain | 1 if the customer complained in the last 2 years, 0 otherwise |
PRODUCTS ATTRIBUTES | |
MntCoke | Amount spent on coke in last 2 years |
MntFruits | Amount spent on fruits in last 2 years |
MntMeatProducts | Amount spent on meat in last 2 years |
MntFishProducts | Amount spent on fish in last 2 years |
MntSweetProducts | Amount spent on sweets in last 2 years |
MntGoldProds | Amount spent on gold in last 2 years |
PROMOTION ATTRIBUTES | |
NumDealsPurchases | Number of purchases made with a discount |
AcceptedCmp1 | 1 if customer accepted the offer in the 1st campaign, 0 otherwise |
AcceptedCmp2 | 1 if customer accepted the offer in the 2nd campaign, 0 otherwise |
AcceptedCmp3 | 1 if customer accepted the offer in the 3rd campaign, 0 otherwise |
AcceptedCmp4 | 1 if customer accepted the offer in the 4th campaign, 0 otherwise |
AcceptedCmp5 | 1 if customer accepted the offer in the 5th campaign, 0 otherwise |
Response | 1 if the customer accepted the offer in the last campaign, 0 otherwise |
PLACE ATTRIBUTES | |
NumWebPurchases | Number of purchases made through the company’s website |
NumCatalogPurchases | Number of purchases made using a catalog |
NumStorePurchases | Number of purchases made directly in stores |
NumWebVisitsMonth | Number of visits to the company’s website in the last month |
Z_CostContact | Cost to contact a customer |
Z_Revenue | Revenue after client accepting campaign |
- There are 2240 rows with 30 features
- There is only 1 column with a null value, namely the Income column (24 null values)
- The data type of the Dt_Customer column needs to be changed to DateTime
- No duplicate data
- There is a lot of numerical data but not many outliers
- Feature extraction (age, number of children, total transactions, total spending, conversion rate, etc.) brings the total to 36 features
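
The checks and derived features listed above can be sketched roughly as follows. This is a minimal illustration on made-up rows, not the notebook's actual code: the derived column names (`Age`, `Children`, `TotalSpending`, `TotalTrx`, `TotalAccCmp`) follow the names used later in this README, but the reference year and the exact columns summed into each total are assumptions.

```python
import pandas as pd

# Made-up sample rows; only the column names come from the feature table.
df = pd.DataFrame({
    "Year_Birth": [1980, 1992, 1965],
    "Kidhome": [1, 0, 0],
    "Teenhome": [0, 0, 2],
    "Dt_Customer": ["2013-05-14", "2014-01-02", "2012-11-30"],
    "Income": [52_000_000.0, None, 81_000_000.0],
    "MntCoke": [100, 20, 400], "MntFruits": [10, 5, 40],
    "MntMeatProducts": [50, 10, 300], "MntFishProducts": [0, 5, 60],
    "MntSweetProducts": [5, 0, 30], "MntGoldProds": [15, 10, 70],
    "NumWebPurchases": [4, 1, 6], "NumCatalogPurchases": [1, 0, 5],
    "NumStorePurchases": [5, 2, 8],
    "AcceptedCmp1": [0, 0, 1], "AcceptedCmp2": [0, 0, 0],
    "AcceptedCmp3": [1, 0, 0], "AcceptedCmp4": [0, 0, 1],
    "AcceptedCmp5": [0, 0, 1],
})

# Basic quality checks: date type, nulls per column, duplicated rows.
df["Dt_Customer"] = pd.to_datetime(df["Dt_Customer"])
null_counts = df.isna().sum()
n_duplicates = int(df.duplicated().sum())

# Assumption: age is measured against a fixed reference year.
REFERENCE_YEAR = 2022
# Assumption: which columns feed each total is a guess for illustration.
spend_cols = [c for c in df.columns if c.startswith("Mnt")]
trx_cols = ["NumWebPurchases", "NumCatalogPurchases", "NumStorePurchases"]
cmp_cols = [f"AcceptedCmp{i}" for i in range(1, 6)]

df["Age"] = REFERENCE_YEAR - df["Year_Birth"]
df["Children"] = df["Kidhome"] + df["Teenhome"]
df["TotalSpending"] = df[spend_cols].sum(axis=1)
df["TotalTrx"] = df[trx_cols].sum(axis=1)
df["TotalAccCmp"] = df[cmp_cols].sum(axis=1)

print(null_counts[null_counts > 0])
print(df[["Age", "Children", "TotalSpending", "TotalTrx", "TotalAccCmp"]])
```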
1. Data Distribution
From the data distribution, many features are close to a normal distribution, even though `Children` and `TotalAccCmp` take only small values. The other features are right-skewed.
2. Outliers Checking
The features `Age`, `Income`, `TotalSpending`, `TotalTrx`, and `CVR` have outliers. The `Age` outliers do not make sense because they are more than 80 years old, so it is best to delete these rows to keep outliers out of the clustering process. Likewise, the outliers in the `Income` column are worth more than 600,000,000. `TotalSpending`, `TotalTrx`, and `CVR` also show outliers, so they need further handling.
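
One common way to flag and drop such rows is the interquartile-range (IQR) rule. The sketch below is only an illustration on made-up ages; the helper `iqr_bounds` and the 1.5 multiplier are assumptions, and the notebook may handle outliers differently.

```python
import pandas as pd


def iqr_bounds(series: pd.Series, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) limits: k * IQR beyond the first and third quartiles."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Made-up ages, including one implausible value.
df = pd.DataFrame({"Age": [30, 35, 41, 44, 52, 58, 63, 129]})

low, high = iqr_bounds(df["Age"])
cleaned = df[df["Age"].between(low, high)]

print(f"Kept {len(cleaned)} of {len(df)} rows; limits = ({low:.2f}, {high:.2f})")
```

The same helper could be applied column by column to `Income`, `TotalSpending`, `TotalTrx`, and `CVR`.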
3. Regression Plot of Features and Conversion Rate
4. Categorical Features
The categorical features look neat and clean, but `Marital_Status` can be simplified into a smaller set of values.
Conversion rate analysis looks for insight into the percentage of website visitors whose actions during a visit result in a purchase transaction. This is done by performing feature engineering on the available variables to produce a new column, the Conversion Rate.
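
As a rough sketch of that feature-engineering step, the conversion rate can be computed as transactions per website visit. The formula used here (`TotalTrx / NumWebVisitsMonth`) and the choice to set the rate to 0 for customers with no visits are assumptions for illustration; the notebook's definition may differ.

```python
import pandas as pd

# Made-up values; TotalTrx is the derived total-transactions column.
df = pd.DataFrame({
    "TotalTrx": [10, 4, 0, 6],
    "NumWebVisitsMonth": [5, 8, 3, 0],
})

# Mask zero visits as NaN so the division never divides by zero.
visits = df["NumWebVisitsMonth"].where(df["NumWebVisitsMonth"] > 0)

# Assumption: customers with no recorded visits get a conversion rate of 0.
df["CVR"] = (df["TotalTrx"] / visits).fillna(0.0)

print(df)
```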
Based on the cleaned data, the youngest customer is 27 and the oldest is 80. Customers in their late twenties to thirties are our most promising customers, as the graph shows they have the highest conversion rate. The least promising is the 41-50 group, which sits in the middle of the age range. The conversion rate falls from the youngest group down to this group, then starts to rise again for older customers (>51 years old).
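
A comparison by age group like this one can be produced by binning the ages and averaging the conversion rate per bin. The sketch below uses made-up values, and the bin edges and labels are assumptions chosen only to match the groups mentioned in the text.

```python
import pandas as pd

# Made-up ages and conversion rates, for illustration only.
df = pd.DataFrame({
    "Age": [27, 33, 38, 45, 49, 55, 62, 80],
    "CVR": [0.9, 0.6, 0.5, 0.2, 0.4, 0.5, 0.6, 0.7],
})

# Assumed bin edges; pd.cut uses right-closed intervals, e.g. (40, 50].
bins = [20, 30, 40, 50, 60, 80]
labels = ["21-30", "31-40", "41-50", "51-60", "61-80"]
df["AgeGroup"] = pd.cut(df["Age"], bins=bins, labels=labels)

# Mean conversion rate per age group.
cvr_by_age = df.groupby("AgeGroup", observed=True)["CVR"].mean()
print(cvr_by_age)
```

The same pattern applies to income groups by swapping in `Income` and suitable bin edges.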
The conversion rate tends to increase with income group. The highest conversion rate comes from the 90-100M income group. This suggests that income has a positive, roughly linear correlation with the conversion rate.
Customer spending is strongly correlated with the conversion rate: the higher a customer's spending, the higher their conversion rate, and the more likely they are to make further transactions.
Before modeling, make sure the data has been cleaned and preprocessed (the detailed steps are in the Jupyter Notebook). In this stage, we cluster the data based on several variables.
First, let's use the elbow method and visualize the inertia. The elbow method is often used to determine the number of clusters for K-Means clustering. Inertia measures how well a dataset was clustered by K-Means. It is calculated by measuring the distance between each data point and the centroid of its assigned cluster, squaring this distance, and summing these squares over all data points. Inertia always tends to fall as the number of clusters grows, so we look for the "elbow" where the decrease starts to level off.
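
The elbow method can be sketched with scikit-learn as follows. The data here is synthetic (four well-separated blobs with assumed centers), not the customer data, and the range of `k` values is an arbitrary choice for illustration. The last lines recompute the inertia by hand to show that it matches the definition above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the preprocessed customer features.
centers = [[0, 0], [10, 0], [0, 10], [10, 10]]
X, _ = make_blobs(n_samples=400, centers=centers, cluster_std=1.0, random_state=42)
X = StandardScaler().fit_transform(X)

# Fit K-Means for several cluster counts and record the inertia of each.
inertias = {}
models = {}
for k in range(2, 9):
    models[k] = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias[k] = models[k].inertia_

for k, value in inertias.items():
    print(f"k={k}: inertia={value:.2f}")

# Manual check for k=4: squared distance of each point to its own centroid.
model = models[4]
own_centroids = model.cluster_centers_[model.labels_]
manual_inertia = float(((X - own_centroids) ** 2).sum())
```

Plotting `inertias` against `k` (for example with matplotlib) gives the elbow chart.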
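The silhouette score of a point compares how close that point is to the other points in its own cluster with how close it is to the points in the nearest neighboring cluster. It ranges from -1 to 1, where higher values mean the point is well matched to its own cluster and well separated from the others. Averaged over all points, it describes clustering quality and can be used to decide whether the current clustering needs further refinement.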
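
A minimal sketch of choosing the number of clusters by mean silhouette score, again on synthetic blobs with assumed centers rather than the customer data:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the preprocessed customer features.
centers = [[0, 0], [10, 0], [0, 10], [10, 10]]
X, _ = make_blobs(n_samples=400, centers=centers, cluster_std=1.0, random_state=42)
X = StandardScaler().fit_transform(X)

# Mean silhouette score for each candidate number of clusters.
scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# The candidate with the highest mean silhouette score.
best_k = max(scores, key=scores.get)
print(f"best k by silhouette: {best_k}")
```

In practice the silhouette result is read together with the elbow chart rather than on its own.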
Based on the Elbow Method and the Silhouette Score, the optimal number of clusters is 4, and the data is well distributed across the clusters.
The distribution of each cluster can be seen below.
The clustering results can be interpreted based on the characteristics of each group, how each cluster tends to respond to existing marketing campaigns, and the potential revenue if we carry out marketing retargeting for that cluster.
Now, let's see the statistics for each cluster from some features (Recency, Total Transactions, Spending, Total Accepted Campaign, and Conversion Rate).
The graph of the median from some features corresponding to each cluster.
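
Per-cluster statistics of this kind can be computed with a pandas `groupby`. The sketch below uses made-up values and an assumed `Cluster` label column; it does not reproduce the project's actual results.

```python
import pandas as pd

# Made-up rows; "Cluster" is an assumed name for the K-Means label column.
df = pd.DataFrame({
    "Cluster": [0, 0, 1, 1, 2, 2, 3, 3],
    "Recency": [50, 54, 48, 49, 47, 51, 49, 50],
    "TotalTrx": [20, 22, 5, 7, 9, 11, 19, 21],
    "TotalSpending": [1500, 1700, 60, 80, 300, 340, 800, 900],
    "TotalAccCmp": [2, 1, 0, 0, 0, 1, 1, 0],
    "CVR": [5.0, 6.0, 0.8, 1.0, 1.5, 2.0, 3.0, 3.5],
})

features = ["Recency", "TotalTrx", "TotalSpending", "TotalAccCmp", "CVR"]

# Mean and median of every feature for each cluster.
summary = df.groupby("Cluster")[features].agg(["mean", "median"])
print(summary)

# Rank clusters by one feature, here median spending, from highest to lowest.
ranking = (
    df.groupby("Cluster")["TotalSpending"]
    .median()
    .sort_values(ascending=False)
    .index.tolist()
)
print(ranking)
```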
`Recency` and `Age` do not seem to have a big impact on differentiating the clusters because the gap between clusters is small. We only know that cluster 0 has the highest recency.
Meanwhile, `Total Transactions` has a similar mean and median, and clusters 0 and 3 are the highest. For the other features, the pattern is similar: ranked by potential, the clusters are in the order 0 > 3 > 2 > 1.
Customers in this cluster tend to respond to existing marketing campaigns. This cluster has the most total transactions and the highest income and spending of all clusters. It also has the highest Conversion Rate. For this cluster, rewards or occasional gifts are highly recommended. The best campaign for Cluster 0 is a special gift after spending a certain amount (for example, a minimum transaction of 1 million).
This cluster has as many transactions as Cluster 0 but spends less. These customers may transact often but in small amounts, because they also have lower income than Cluster 0. However, their Conversion Rate is low compared to Cluster 0. This may indicate that the large number of total transactions comes from the large number of customers, since this cluster has the most customers (615), rather than from a strong tendency to convert. The best campaigns for this cluster are lower prices on bundled products, so that spending per transaction is higher than before, or special discounts after a number of purchases (for example, after 5 transactions), which would increase the conversion rate.
This cluster has lower total transactions and spending than the 2 previous clusters. However, its income is quite normal (range 4 of 8). This may indicate that these are economical customers who only buy what they need. The best campaign for Cluster 2 is to offer high-quality products at high prices, so that even with fewer transactions, spending can still be high.
This cluster has the least potential. It ranks lowest on all indicators. One interpretation is that because this cluster has the lowest income, its total spending, total transactions, and even Conversion Rate are all affected. The best campaign for Cluster 1 is to encourage these customers to try new kinds of products by offering special prices on the first purchase.