This is a supervised classification problem to predict if the client would subscribe to a term deposit based on a marketing campaign.
A Portuguese bank has seen its revenue decline and wants to know what actions to take. After investigating, the bank found that the root cause is that its clients are not depositing as frequently as before. Term deposits allow a bank to hold a deposit for a fixed period, so the bank can invest the money in higher-yield financial products to make a profit. In addition, a bank has a better chance of persuading term deposit clients to buy other products, such as funds or insurance, to further increase revenue. As a result, the Portuguese bank would like to identify existing clients who are more likely to subscribe to a term deposit and focus its marketing efforts on those clients.
Predict if the client will subscribe to a term deposit based on the analysis of the marketing campaigns the bank performed.
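The evaluation metric is AUC (area under the ROC curve), which can be read as the probability that the model scores a randomly chosen subscriber higher than a randomly chosen non-subscriber.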
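To make the probabilistic reading of AUC concrete, here is a minimal sketch that compares scikit-learn's `roc_auc_score` with a direct count of correctly ranked (subscriber, non-subscriber) pairs. The labels, the scores, and the helper `pairwise_auc` are made up for illustration and are not taken from the bank data.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical labels (1 = subscriber) and model scores, for illustration only.
y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1])
y_score = np.array([0.10, 0.40, 0.35, 0.20, 0.80, 0.70, 0.50, 0.90])


def pairwise_auc(y_true, y_score):
    """Fraction of (subscriber, non-subscriber) pairs where the subscriber
    gets the higher score; tied scores count as half a correct pair."""
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    diff = pos[:, None] - neg[None, :]  # every subscriber vs. every non-subscriber
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)


auc_sklearn = roc_auc_score(y_true, y_score)
auc_pairs = pairwise_auc(y_true, y_score)
print(auc_sklearn, auc_pairs)
```

Both numbers agree, which is why an AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of subscribers from non-subscribers.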
The main objective of this template is to take you through the entire working pipeline followed when approaching a machine learning problem.
- Analysis and processing of past Bank Term Subscription data.
- Training and testing on past data, and prediction on new data.
- Understand the business objective.
- Translate business objectives into a data science problem.
- Load the data and check the dimensions.
- Create a data dictionary.
- Identify null values.
- Null values treatment.
- Outlier detection.
- EDA: univariate.
- EDA: bivariate.
- Outlier treatment (optional).
- Convert categories to numbers: Label encoding/one-hot encoding.
- Split data into train and validation sets (80:20 ratio).
- Treat class imbalance only on training data.
- Train ML model on training dataset.
- Make predictions on validation dataset.
- Compare predictions with actual values (AUC score)
- Repeat the training, prediction, and comparison steps above with multiple algorithms.
- Select the algorithm with best result for deployment.
- Evaluate results on blind data/ future data and retrain the model if needed.
- Present findings to stakeholders.
- Improvements.
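
The modelling steps in the list above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the notebook's actual implementation: the data is synthetic, the column names (`age`, `balance`, `job`, `subscribed`) are hypothetical, median imputation and random oversampling are just one simple choice each for null treatment and class imbalance, and the split assumes 80% training and 20% validation.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Synthetic stand-in for the bank data; all column names are hypothetical.
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "age": rng.integers(18, 90, n).astype(float),
    "balance": rng.normal(1500, 1000, n),
    "job": rng.choice(["admin", "technician", "retired"], n),
})
# Imbalanced target: subscribing is more likely with a higher balance.
p = 1 / (1 + np.exp(-(df["balance"] - 3500) / 800))
df["subscribed"] = (rng.random(n) < p).astype(int)
df.loc[rng.choice(n, 50, replace=False), "age"] = np.nan  # inject some nulls

# Load-and-check step: dimensions and null counts per column.
print(df.shape)
print(df.isnull().sum())

# Null treatment (assumption: median imputation).
df["age"] = df["age"].fillna(df["age"].median())

# Convert categories to numbers with one-hot encoding.
X = pd.get_dummies(df.drop(columns="subscribed"), columns=["job"])
y = df["subscribed"]

# Split before balancing so the validation set keeps the real class ratio.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Treat class imbalance on the training data only: oversample subscribers
# (the minority class in this synthetic data) up to the majority count.
train = X_train.assign(subscribed=y_train)
minority = train[train["subscribed"] == 1]
majority = train[train["subscribed"] == 0]
minority_up = resample(
    minority, replace=True, n_samples=len(majority), random_state=0
)
train_bal = pd.concat([majority, minority_up])
X_bal = train_bal.drop(columns="subscribed")
y_bal = train_bal["subscribed"]

# Train several algorithms, predict on validation data, and compare AUC.
models = {
    "logistic_regression": LogisticRegression(max_iter=5000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_bal, y_bal)
    val_proba = model.predict_proba(X_val)[:, 1]  # probability of subscribing
    scores[name] = roc_auc_score(y_val, val_proba)

best_name = max(scores, key=scores.get)
print(scores, "best:", best_name)
```

Balancing after the split matters: if oversampled rows leaked into the validation set, the AUC would be computed on duplicated subscribers and would not reflect performance on real, imbalanced data.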