Introduction

This project predicts the purchase of a bank product using Naive Bayes, based on 15 features of 4521 customer records. Some of these features are categorical and the rest are numerical. The classification goal is to predict whether a client will subscribe to a term deposit (yes/no; the target variable).

Dataset

The dataset consists of 4521 individual customer records with 15 features each. 7 of the features are continuous and the rest are categorical. The dataset can be downloaded here: https://drive.google.com/file/d/14B93Y0wfHHkH4B1V5l_JdgCEnnPEiuXt/view?usp=sharing

Source: https://archive.ics.uci.edu/ml/datasets/bank+marketing

Approach

Because this is a binary classification problem, I use a Naive Bayes algorithm, which requires the dataset to be divided into two parts: a subset for the 'Yes' class and a subset for the 'No' class. Feature-wise likelihoods are then estimated for each subset, using a Gaussian distribution for continuous features and counting for categorical features.
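The estimation step described above can be sketched in plain Java, without the Hadoop and Hive parts. This is a minimal illustration and not the repository's code: the class name `NaiveBayesSketch`, its method names, the toy values, and the equal priors are all assumptions made for the example. It computes a Gaussian likelihood from the per-class mean and standard deviation, a categorical likelihood from relative frequencies, and compares log-scores for the two classes.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names and data are illustrative, not taken from the repository.
public class NaiveBayesSketch {

    /** Gaussian density N(x; mean, sd), used as the likelihood of a continuous feature. */
    static double gaussianPdf(double x, double mean, double sd) {
        double z = (x - mean) / sd;
        return Math.exp(-0.5 * z * z) / (sd * Math.sqrt(2.0 * Math.PI));
    }

    /** Mean of one continuous feature within one class subset. */
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) sum += v;
        return sum / values.length;
    }

    /** Sample standard deviation (n - 1 denominator) of one continuous feature. */
    static double sampleSd(double[] values) {
        double m = mean(values);
        double sq = 0.0;
        for (double v : values) sq += (v - m) * (v - m);
        return Math.sqrt(sq / (values.length - 1));
    }

    /** Relative frequency of each category within one class subset (likelihood by counting). */
    static Map<String, Double> categoryLikelihoods(List<String> values) {
        Map<String, Double> freq = new HashMap<>();
        for (String v : values) freq.merge(v, 1.0, Double::sum);
        freq.replaceAll((category, count) -> count / values.size());
        return freq;
    }

    /** log(prior) + sum of log-likelihoods; working in logs avoids numeric underflow. */
    static double logScore(double prior, double[] likelihoods) {
        double score = Math.log(prior);
        for (double l : likelihoods) score += Math.log(l);
        return score;
    }

    public static void main(String[] args) {
        // Toy class subsets (made-up values): one continuous and one categorical feature.
        double[] ageYes = {30, 40, 50};
        double[] ageNo = {20, 25, 30};
        List<String> maritalYes = List.of("married", "married", "single");
        List<String> maritalNo = List.of("single", "single", "married");

        // Record to classify.
        double age = 45;
        String marital = "married";

        double yes = logScore(0.5, new double[] {
            gaussianPdf(age, mean(ageYes), sampleSd(ageYes)),
            categoryLikelihoods(maritalYes).get(marital)});
        double no = logScore(0.5, new double[] {
            gaussianPdf(age, mean(ageNo), sampleSd(ageNo)),
            categoryLikelihoods(maritalNo).get(marital)});

        System.out.println(yes > no ? "yes" : "no");
    }
}
```

A category that never occurs in a class subset gets no entry in the frequency map, so a real implementation would need smoothing or a default value for that case; the sketch leaves it out for brevity.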

Features

The Naive Bayes algorithm is written with Hadoop MapReduce.
Big data file handling is done with HiveQL.
The project assumes the dataset is very large, so the same approach can be applied to other big datasets (up to petabytes).
Performance can be tuned by dropping features, which requires only a small change in Probability_Compute_Mapper.
The Gaussian distribution can be swapped for another distribution (e.g. chi-square) by changing the code in Gaussian.java, to explore further improvements.
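The last two points, dropping features and swapping the distribution, can be illustrated with a small plain-Java sketch. It does not reproduce the repository's Gaussian.java or Probability_Compute_Mapper, whose contents are not shown here: the `Density` interface, the class names, and the set of dropped feature indices are hypothetical, and an exponential density stands in for whichever alternative distribution one might try.

```java
import java.util.Set;

// Hypothetical sketch; not the repository's Gaussian.java or mapper code.
public class PluggableLikelihoodSketch {

    /** Any density that can serve as the likelihood of a continuous feature. */
    interface Density {
        double pdf(double x);
    }

    /** Gaussian likelihood with a given mean and standard deviation. */
    static final class Gaussian implements Density {
        private final double mean;
        private final double sd;

        Gaussian(double mean, double sd) {
            this.mean = mean;
            this.sd = sd;
        }

        @Override
        public double pdf(double x) {
            double z = (x - mean) / sd;
            return Math.exp(-0.5 * z * z) / (sd * Math.sqrt(2.0 * Math.PI));
        }
    }

    /** Exponential likelihood, used here only as an example of a swapped-in distribution. */
    static final class Exponential implements Density {
        private final double rate;

        Exponential(double rate) {
            this.rate = rate;
        }

        @Override
        public double pdf(double x) {
            return x < 0 ? 0.0 : rate * Math.exp(-rate * x);
        }
    }

    /** Sum of per-feature log-likelihoods, skipping the feature indices listed in dropped. */
    static double logLikelihood(double[] record, Density[] densities, Set<Integer> dropped) {
        double sum = 0.0;
        for (int i = 0; i < record.length; i++) {
            if (dropped.contains(i)) continue; // feature excluded from the model
            sum += Math.log(densities[i].pdf(record[i]));
        }
        return sum;
    }

    public static void main(String[] args) {
        // Made-up record with two continuous features and made-up parameters.
        double[] record = {45.0, 1200.0};
        Density[] densities = {new Gaussian(40.0, 10.0), new Exponential(1.0 / 1500.0)};

        System.out.println(logLikelihood(record, densities, Set.of()));  // all features
        System.out.println(logLikelihood(record, densities, Set.of(1))); // feature 1 dropped
    }
}
```

Keeping the density behind a single interface means the scoring loop stays unchanged when a distribution is replaced, and dropping a feature becomes a data change rather than a code change.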

Future Scope

The dataset is not large enough to capture a general distribution of the domain, so data augmentation is needed to generate more in-domain data and achieve better performance.

Folders and files

Categorical_Probs.java
Gaussian.java
LICENSE
Load_Split.sql
Predict_Mapper.java
Prediction_Driver.java
Probability_Compute_Driver.java
Probability_Compute_Mapper.java
README.md
Summary.java
cat_count_n.sql
cat_count_y.sql
mean_sd.sql

Arko98/Product_Purchase_Prediction
