A project for customer identification and purchase prediction modelling using Principal Component Analysis, K-means clustering, Random Forest classifiers, and XGBoost.
- Project Motivation
- Installations
- File Descriptions
- How To Interact With the Project
- Licensing, Authors, Acknowledgements
Arvato, a financial services company in Germany, is hoping to expand its consumer base, but it needs insight into which consumers to target. The company has provided demographic information about its existing customers and would like to understand what characterises them.
To expand its consumer base, Arvato wants to know which individuals in the general population are most likely to buy its product, and hopes a model can be built to identify them. There is a lot of data to deal with, so the aim is to simplify the process and extract as much value from the data as possible.
Below is a list of the libraries used in this project:
- Numpy
- Pandas
- Random
- Seaborn
- Sklearn
- Matplotlib
- Statistics
- Yellowbrick
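If you clone the project, install the third-party libraries with pip install. Random and Statistics are part of Python's standard library and need no installation, and Sklearn is published on PyPI under the name scikit-learn.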
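As a quick sanity check before opening the notebook, the following sketch reports which of the third-party libraries listed above cannot be found in the current environment. It is only an illustration: the `REQUIRED_MODULES` list and the `find_missing` helper are hypothetical names introduced here, and the entries are import names, which may differ from the pip package names (for example, `sklearn` is installed as `scikit-learn`).

```python
import importlib.util

# Import names (not pip package names) for the third-party libraries listed above.
# This list is an assumption based on the README, not taken from the notebook itself.
REQUIRED_MODULES = ["numpy", "pandas", "seaborn", "sklearn", "matplotlib", "yellowbrick"]


def find_missing(modules):
    """Return the module names that cannot be located in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED_MODULES)
if missing:
    print("Missing libraries:", ", ".join(missing))
else:
    print("All listed libraries are available.")
```

Any name reported as missing can then be installed with pip before running the workbook.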
The main file, Arvato Project Workbook, contains the code and analysis of the project. Open the file to view the entire analysis of the Arvato financial services data.
The two main data files are the CUSTOMERS and AZDIAS files. They are not included because the datasets are too large to upload to GitHub. The Udacity_MAILOUT_052018_TEST and Udacity_MAILOUT_052018_TRAIN files, which are subsets of the CUSTOMERS file, are used to build a model for customer identification.
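Because the datasets themselves are not distributed with the repository, the sketch below uses synthetic data to illustrate the general shape of the techniques this project names: scaling, Principal Component Analysis, and K-means for segmentation, followed by a Random Forest classifier for purchase prediction. It is not the workbook's actual code. The array shapes, number of components, number of clusters, and the synthetic label are all assumptions chosen for illustration, and XGBoost is left out to keep the example within scikit-learn.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for demographic data: 300 rows, 10 numeric features.
X = rng.normal(size=(300, 10))
# Hypothetical binary response label, loosely tied to the first feature.
y = (X[:, 0] + rng.normal(scale=0.5, size=300) > 0).astype(int)

# Segmentation: scale the features, keep 3 principal components, then cluster.
segmenter = make_pipeline(
    StandardScaler(),
    PCA(n_components=3, random_state=0),
    KMeans(n_clusters=4, n_init=10, random_state=0),
)
segments = segmenter.fit_predict(X)

# Prediction: fit a random forest on the first 200 rows,
# then score the remaining 100 rows with a response probability.
classifier = RandomForestClassifier(n_estimators=50, random_state=0)
classifier.fit(X[:200], y[:200])
response_probability = classifier.predict_proba(X[200:])[:, 1]

print("Rows per segment:", np.bincount(segments, minlength=4))
print("Mean predicted response probability:", response_probability.mean())
```

With the real data, the segmentation step would be fitted on the general population and applied to the customers to compare cluster membership, while the classifier would be trained on the labelled TRAIN file and used to score the TEST file.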
The project layout is relatively simple. Click on the Arvato Project Workbook.ipynb file to view the entire project. The blog post can be viewed at https://bit.ly/35816oG. The blog post is more concise, and covers the most important aspects of the project.
If the notebook fails to render on GitHub, you can view it instead at https://nbviewer.org/github/Danieldacruz7/Customer-Segmentation-Modelling/blob/main/Arvato%20Project%20Workbook.ipynb.
I would like to thank Arvato financial services for providing the private datasets, as well as the idea for the project. These included the customer and general population datasets, as well as the training and test datasets. I would also like to thank Udacity for the lessons and the highly engaging content.