This repository has been archived by the owner on Feb 12, 2022. It is now read-only.

Predicted probabilities from machine learning classification algorithms can be used to tackle imbalanced data. The study uses the Portuguese bank marketing dataset as a case study and was published in Towards Data Science on Medium.com.

at-tan/Predicted_Probabilities_Bank_Marketing

Tackling Imbalanced Data with Predicted Probabilities

Predicted probabilities from classification algorithms provide a key tuning mechanism for improving predictive power, especially on imbalanced data. However, the predicted probabilities need to be calibrated before they can be used to find the probability threshold that maximises the scoring metric of choice.
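As a rough illustration of threshold tuning, the sketch below sweeps candidate thresholds and keeps the one with the highest F1 score. It is not the article's code: the helper names `f1_at_threshold` and `best_threshold`, the choice of F1 as the metric, and the toy labels and probabilities are all assumptions made for this example, and the probabilities are taken to be already calibrated.

```python
def f1_at_threshold(y_true, y_prob, threshold):
    """F1 score when every probability >= threshold is labelled positive."""
    tp = fp = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    # With no true or predicted positives, F1 is undefined; report 0.0.
    return 2 * tp / denom if denom else 0.0


def best_threshold(y_true, y_prob, candidates=None):
    """Return the candidate threshold with the highest F1 (lowest one on ties)."""
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]  # 0.01 .. 0.99
    return max(candidates, key=lambda t: f1_at_threshold(y_true, y_prob, t))


# Hypothetical toy data: 2 positives out of 10, with low predicted probabilities,
# as is typical for the minority class in imbalanced data.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
y_prob = [0.02, 0.05, 0.08, 0.10, 0.12, 0.15, 0.20, 0.30, 0.35, 0.45]

default_f1 = f1_at_threshold(y_true, y_prob, 0.5)
tuned = best_threshold(y_true, y_prob)
tuned_f1 = f1_at_threshold(y_true, y_prob, tuned)
print(f"F1 at 0.5: {default_f1:.2f}; best threshold: {tuned:.2f}, F1: {tuned_f1:.2f}")
```

On this toy data the default 0.5 cut-off predicts no positives at all, while a lower threshold recovers both. In practice the threshold should be chosen on a validation set rather than on the data used to report the final score.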

The article discusses some key scoring metrics for imbalanced data. It then explores the differences in predicted probabilities across five machine learning algorithms: Logistic Regression, Naive Bayes, Random Forest, Support Vector Classification and XGBoost. Finally, it demonstrates in a case study how predicted probabilities may be used to improve these models' performance. The data is adapted from the 2014 Portuguese bank marketing dataset, where the target variable indicates a successful subscription to a term deposit. The probability of picking a "yes" in the target variable is just 11.27%.
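To see why the choice of scoring metric matters at an 11.27% positive rate, consider a model that always predicts "no". The sketch below uses a hypothetical sample of 10,000 records (the sample size is an assumption for easy arithmetic, not the dataset's actual size) and computes accuracy, precision and recall by hand.

```python
# Hypothetical sample: 10,000 records with the 11.27% "yes" rate quoted above.
n_total = 10_000
n_yes = 1_127

y_true = [1] * n_yes + [0] * (n_total - n_yes)
y_pred = [0] * n_total  # a trivial model that always predicts "no"

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

accuracy = (tp + tn) / n_total
# Guard against division by zero when there are no predicted or actual positives.
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0

print(f"accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f}")
```

Accuracy comes out at 88.73% even though the model never identifies a single subscriber, which is why metrics that focus on the positive class, such as precision, recall or F1, are usually more informative for data like this.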

These files contain the data and Python code used for the article published in: https://towardsdatascience.com/tackling-imbalanced-data-with-predicted-probabilities-3293602f0f2?sk=b3f5fda7915625b7008f454d9c4046a6
