Vaibhav3M/Customer_Segmentation-Arvato

Arvato-Udacity-Capstone

In this project, I will analyze demographics data for customers of a mail-order sales company in Germany, comparing it against demographics information for the general population.

Throughout this project, I will focus on the following:

  • Using unsupervised learning techniques to perform customer segmentation.
  • Identifying the parts of the population that best describe the core customer base of the company.
  • Applying what was learned to a third dataset with demographics information for targets of a marketing campaign, and using a model to predict which individuals are most likely to become customers of the company.

Dataset

The data for this project was provided by Arvato and cannot be shared publicly.

Kaggle link

https://www.kaggle.com/c/udacity-arvato-identify-customers

Libraries used:

  • numpy==1.18.3
  • pandas==0.23.4
  • scikit-learn==0.22.2.post1
  • matplotlib==3.0.3
  • seaborn==0.9.1

Files

  • Arvato Project Workbook.ipynb
    The notebook is divided into 3 major parts:
    Part 0: Get to Know the Data : In this part I look at the data and perform the necessary preprocessing steps, such as handling missing values, scaling the data, and modifying column names.
    Part 1: Customer Segmentation Report : I use PCA and k-means to describe the relationship between the demographics of the company's existing customers and the general population of Germany.
    Part 2: Supervised Learning Model : Here I test and finalize a classification model for prediction. Various models were tried, and GridSearchCV was used to tune the hyperparameters of the final model.

  • Helper.py

This file contains helper methods used in the analysis above: data preprocessing, plotting, and grid search implementations.
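The segmentation flow described for Part 0 and Part 1 can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the Arvato data is not public, so the tables here are synthetic, and the column names, the median imputation, the number of PCA components, and the number of clusters are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)


def make_fake_demographics(n_rows, shift):
    """Build a synthetic stand-in for a demographics table (hypothetical columns)."""
    data = rng.normal(loc=shift, scale=1.0, size=(n_rows, 6))
    data[rng.random(data.shape) < 0.05] = np.nan  # sprinkle in missing values
    return pd.DataFrame(data, columns=[f"feature_{i}" for i in range(6)])


population = make_fake_demographics(1000, shift=0.0)  # "general population"
customers = make_fake_demographics(300, shift=0.8)    # "existing customers"

# Impute, scale, reduce with PCA, then cluster with k-means.
# The pipeline is fitted on the general population only.
N_CLUSTERS = 4  # assumed value for illustration
segmenter = make_pipeline(
    SimpleImputer(strategy="median"),
    StandardScaler(),
    PCA(n_components=3, random_state=0),
    KMeans(n_clusters=N_CLUSTERS, n_init=10, random_state=0),
)
population_clusters = segmenter.fit_predict(population)
customer_clusters = segmenter.predict(customers)


def cluster_shares(labels, n_clusters=N_CLUSTERS):
    """Return the fraction of rows assigned to each cluster."""
    counts = np.bincount(labels, minlength=n_clusters)
    return counts / counts.sum()


comparison = pd.DataFrame({
    "population_share": cluster_shares(population_clusters),
    "customer_share": cluster_shares(customer_clusters),
})
# A ratio above 1 means the cluster is over-represented among customers.
comparison["ratio"] = comparison["customer_share"] / comparison["population_share"]
print(comparison.round(3))
```

Fitting on the general population and only predicting on the customers keeps both groups in the same cluster space, so the per-cluster shares can be compared directly.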

Results

After training multiple machine learning models and comparing their results, the CatBoost classifier achieved the best result, with a ROC AUC score of 0.80028.
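As a rough illustration of how a grid search scored by ROC AUC might look, here is a minimal sketch. It does not reproduce the result above. The data is synthetic, its class imbalance is an assumption, scikit-learn's GradientBoostingClassifier stands in for CatBoost to keep the example within the libraries listed earlier, and the parameter grid is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

# Synthetic, imbalanced binary data (assumption: responders are the rare class).
X, y = make_classification(
    n_samples=1500,
    n_features=15,
    n_informative=5,
    weights=[0.9, 0.1],
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical grid; a stand-in model is used instead of CatBoost.
param_grid = {"n_estimators": [50, 100], "max_depth": [2, 3]}
search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid=param_grid,
    scoring="roc_auc",  # rank candidates by cross-validated ROC AUC
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
)
search.fit(X_train, y_train)

# ROC AUC is computed from predicted probabilities, not hard class labels.
test_scores = search.predict_proba(X_test)[:, 1]
test_auc = roc_auc_score(y_test, test_scores)

print("best params:", search.best_params_)
print(f"cross-validated ROC AUC: {search.best_score_:.3f}")
print(f"held-out ROC AUC: {test_auc:.3f}")
```

Stratified folds keep the share of the rare class similar in every split, which tends to make ROC AUC estimates more stable on imbalanced data.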

For a detailed analysis of the results, read the Medium article below:

Medium post :
https://medium.com/@malhotra.vaibhav0304/effectively-target-customers-use-data-for-customer-segmentation-fb6425b593fd
