Project for customer identification, and purchase prediction modelling using Principal Component Analysis, K-means clustering, XGBoost and Random Forest Classifiers.

Danieldacruz7/Customer-Segmentation-Modelling

Customer-Segmentation-Modelling

A project for customer identification and purchase prediction modelling using Principal Component Analysis, K-means clustering, Random Forest classifiers and XGBoost.

Table of Contents

  1. Project Motivation
  2. Installations
  3. File Descriptions
  4. How To Interact With the Project
  5. Licensing, Authors, Acknowledgements

Project Motivation

Arvato, a financial services company in Germany, wants to expand its consumer base but first needs insight into which consumers to target. The company has provided demographic information about its existing customers and would like to understand what characterizes them.

To expand that base, Arvato wants to know which individuals in the general population are most likely to buy its product, and hopes a model can be built to identify them. The datasets are large, so a further goal is to simplify the process and extract as much value from the data as possible.

Installations

Below is a list of the libraries used in this project:

  • NumPy
  • pandas
  • random
  • seaborn
  • scikit-learn (imported as sklearn)
  • Matplotlib
  • statistics
  • Yellowbrick

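If you clone the project, use pip install to install the third-party libraries before running the notebook. The random and statistics modules are part of the Python standard library and need no installation.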
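As a quick check before opening the notebook, the short sketch below reports which of the listed libraries are not yet installed and prints a matching pip install command. The helper name missing_packages is hypothetical, and xgboost is included only as an assumption based on the project description, since it does not appear in the list above.

```python
from importlib.util import find_spec

# Import names of the third-party libraries listed above.
# "xgboost" is an assumption taken from the project description.
THIRD_PARTY = [
    "numpy", "pandas", "seaborn", "sklearn",
    "matplotlib", "yellowbrick", "xgboost",
]

# pip package names that differ from the import name.
PIP_NAMES = {"sklearn": "scikit-learn"}


def missing_packages(import_names):
    """Return the pip names of the packages that cannot be found."""
    return [
        PIP_NAMES.get(name, name)
        for name in import_names
        if find_spec(name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages(THIRD_PARTY)
    if missing:
        print("pip install " + " ".join(missing))
    else:
        print("All listed libraries are installed.")
```

Running the script prints either a ready-to-copy pip install command or a confirmation that nothing is missing.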

File Descriptions

The main file, Arvato Project Workbook.ipynb, contains the code and the full analysis of the Arvato financial services data.

The two main data files are CUSTOMERS and AZDIAS. They are not included because the datasets are too large to upload to GitHub. However, the Udacity_MAILOUT_052018_TRAIN and Udacity_MAILOUT_052018_TEST files are used to build the customer identification model, as these are subsets of the CUSTOMERS file.
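Because the datasets are private, the following is only a minimal sketch of the kind of workflow the project description names: PCA followed by K-means for segmentation, then a Random Forest classifier for purchase prediction. It runs on synthetic data, and every shape, column count, label, and hyperparameter here is an assumption for illustration, not the notebook's actual configuration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for demographic features (the real files are private).
X = rng.normal(size=(500, 20))
# Inject some missing values (an assumption about the real data).
X[rng.random(X.shape) < 0.05] = np.nan
# Hypothetical binary label: 1 = responded to the mailout.
y = (rng.random(500) < 0.2).astype(int)

# Segmentation: impute, scale, reduce with PCA, then cluster with K-means.
segmenter = make_pipeline(
    SimpleImputer(strategy="median"),
    StandardScaler(),
    PCA(n_components=5, random_state=0),
    KMeans(n_clusters=4, n_init=10, random_state=0),
)
segments = segmenter.fit_predict(X)

# Purchase prediction: hold out a test split and fit a Random Forest.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
classifier = make_pipeline(
    SimpleImputer(strategy="median"),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
classifier.fit(X_train, y_train)

# Probability of the positive class for each held-out individual.
purchase_probability = classifier.predict_proba(X_test)[:, 1]
```

Wrapping each stage in a pipeline keeps the imputation and scaling fitted only on the data passed to fit, which avoids leaking information from the held-out rows into the model.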

How To Interact With the Project

The project layout is simple. Open the Arvato Project Workbook.ipynb file to view the entire project. A more concise blog post covering the most important aspects of the project is available at https://bit.ly/35816oG.

If the notebook does not render on GitHub, you can view it at https://nbviewer.org/github/Danieldacruz7/Customer-Segmentation-Modelling/blob/main/Arvato%20Project%20Workbook.ipynb.

Licensing, Authors, Acknowledgements

I would like to thank Arvato financial services for providing the private datasets (the customer, population, training and test datasets) as well as the idea for the project. I would also like to thank Udacity for the lessons and the highly engaging content.
