
Analyzing the Bank Marketing dataset using various data mining techniques to predict whether a customer will subscribe to a term deposit.


cghimire/Bank-Marketing-Data-Mining


Bank Marketing Data Mining


This project applies several data mining classification techniques to build a model that predicts whether a customer will subscribe to a bank long-term deposit. The customer data were collected by a Portuguese retail bank from 2008 through 2013 through telephone contacts.

🧐 About

This project aims to compare candidate models and recommend the best one, because such a model can help the bank filter clients and use its available resources more efficiently to reach its goals.

The objective of this data analysis project is to demonstrate how classification and predictive analytics can be applied to produce real, tangible improvements in a company’s business performance.

🎈 Data Understanding and Exploring


This plot shows the correlation between the variables. No predictor is strongly correlated with the predicted variable y; however, the predictor duration shows some relationship with y.
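As a rough illustration of the kind of check behind the correlation plot, the sketch below computes a Pearson correlation between a call-duration column and a 0/1 encoding of y. The values are made up for illustration and are not taken from the Bank Marketing data, and the `pearson` helper is a hypothetical name, not code from this project.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up toy values, NOT taken from the Bank Marketing dataset.
duration = [60, 120, 300, 45, 600, 90, 480, 30]  # call length in seconds
y = [0, 0, 1, 0, 1, 0, 1, 0]                     # 1 = subscribed, 0 = did not

r = pearson(duration, y)
print(f"correlation(duration, y) = {r:.2f}")
```

A value near 0 would indicate little linear relationship, while a value closer to 1 or -1 would indicate a stronger one.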

This plot shows the number of clients by job category. The largest number of clients come from the "admin" job category, followed by the blue-collar category. By contrast, relatively few students were involved in the telemarketing campaign.
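The counts behind a bar chart like this one can be tabulated in a few lines. The sketch below uses a short, made-up list of job labels as a stand-in for the dataset's job column; the label spellings are assumptions.

```python
from collections import Counter

# Made-up sample of job labels; the real column has many more rows.
jobs = [
    "admin", "blue-collar", "admin", "student", "technician",
    "blue-collar", "admin", "services",
]

# Count how many clients fall into each job category.
job_counts = Counter(jobs)

# Print categories from most to least frequent.
for job, count in job_counts.most_common():
    print(f"{job:12s} {count}")
```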

⛏️ Data Preparation


This figure compares two plots of the data: one with outliers and one with the outliers removed.
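The text does not say which rule was used to identify outliers. One common choice is the 1.5 × IQR rule, sketched below on made-up values; both the rule and the `remove_outliers_iqr` helper are assumptions for illustration, not the project's actual procedure.

```python
import statistics


def remove_outliers_iqr(values, k=1.5):
    """Keep only values within k * IQR of the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


# Made-up ages, NOT taken from the Bank Marketing dataset.
ages = [25, 31, 38, 42, 47, 53, 58, 95]

cleaned = remove_outliers_iqr(ages)
print("with outliers:   ", ages)
print("without outliers:", cleaned)
```

Plotting `ages` and `cleaned` side by side would give a before-and-after comparison similar in spirit to the figure.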

🚀 Data Modeling

To model the data, I apply three data mining classification techniques: 1) Logistic Regression, 2) Decision Tree, and 3) Random Forest.
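A minimal sketch of fitting these three model types is shown below. It assumes scikit-learn, which the text does not name as the tooling, and uses a synthetic imbalanced dataset from `make_classification` in place of the bank data. The hyperparameters are arbitrary placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the bank data (assumption): mostly "no" answers.
X, y = make_classification(
    n_samples=500, n_features=10, weights=[0.85, 0.15], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# The three model families named in the text; settings are placeholders.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit each model on the training split and score it on the held-out split.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name:20s} accuracy = {accuracies[name]:.3f}")
```

Because the classes are imbalanced, accuracy alone can be misleading; it is used here only because it is the metric the text mentions.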


This figure shows the structure of the decision tree. For example, if the number of employees is greater than 5088, the client falls in the NO category with 94% probability, meaning the client is more likely to say no.
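To show how a leaf probability such as the 94% above arises, the sketch below applies a single threshold split and reports the majority class and its share on each side. The records are made up, so the printed shares will not match the figure; only the 5088 threshold comes from the text, and `leaf_summary` is a hypothetical helper.

```python
from collections import Counter

# Made-up (number employed, answer) pairs, NOT taken from the dataset.
records = [
    (5191, "no"), (5228, "no"), (5099, "no"), (5195, "yes"),
    (5076, "yes"), (5017, "no"), (4991, "yes"), (5228, "no"),
]

THRESHOLD = 5088  # split value quoted in the text


def leaf_summary(rows):
    """Return the majority label in a leaf and the share of rows with it."""
    counts = Counter(label for _, label in rows)
    label, n = counts.most_common(1)[0]
    return label, n / len(rows)


# Route each record to one side of the split, as a tree node would.
above = [row for row in records if row[0] > THRESHOLD]
below = [row for row in records if row[0] <= THRESHOLD]

print("above threshold:", leaf_summary(above))
print("at or below threshold:", leaf_summary(below))
```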

Model Evaluation and Conclusion


This figure shows the effect of increasing the tree count on accuracy in the Random Forest model.
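An experiment of this kind can be sketched as a loop over tree counts. The example below assumes scikit-learn and synthetic data from `make_classification`, so its numbers are not the project's results; the list of tree counts is an arbitrary choice.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the bank data (assumption).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

tree_counts = [1, 5, 10, 50, 100]  # arbitrary values for illustration
accuracy_by_count = []

# Train one forest per tree count and record its held-out accuracy.
for n_trees in tree_counts:
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    forest.fit(X_train, y_train)
    acc = accuracy_score(y_test, forest.predict(X_test))
    accuracy_by_count.append(acc)
    print(f"{n_trees:4d} trees -> accuracy = {acc:.3f}")
```

Plotting `accuracy_by_count` against `tree_counts` would give a curve of the same type as the figure.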

I fit three different classification models to predict whether a customer would subscribe to a term deposit. Based on the models built for this project, the Decision Tree and Random Forest models predict the output more accurately. The Random Forest model is the recommended model for this classification problem.

Because I used several different data mining techniques, I expect the proposed classification models to predict the output well. However, the proposed methods have some limitations. Because of time constraints, it was not feasible to study in detail all the variables that might be useful for predicting the output.

🎉 Acknowledgements

I would like to thank some special people who helped me a lot on this project. My terrific professor, Dr. Xinlian Liu, encouraged me to start this project. His ideas and suggestions are always valuable to me, not only for this project but also for my future career in data science. Finally, I am very thankful to my entire CS 522 - Data Mining class for their feedback and encouragement.
