The goal of this project is to predict whether a person's income exceeds $50K/yr based on census data. Spark MLlib is used to build the prediction model, drawing on its feature transformation, model selection, hyperparameter tuning, and evaluation metric tools.

RavaliGupta/Customer-Identification-for-Business-Expansion-using-SparkML

A fictitious clothing company in San Diego wants to expand its business. It plans to identify potential customers in the city and to promote its brand by sending offers and deals to those with an income above $50K. To find these customers, the company obtained census data from the UCI repository (https://archive.ics.uci.edu/ml/datasets/Census+Income). The goal is to build a classification model that identifies potential customers from this census data.

Spark ML is used to build the predictive model. Spark MLlib techniques such as feature transformation, model selection, hyperparameter tuning, and evaluation metrics are used to transform the data into a usable format, to identify the best model and its best parameters, and finally to evaluate that model on metrics such as the area under the ROC curve (AreaROC).
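To make the evaluation metric concrete, here is a minimal sketch in plain Python (not Spark, and not taken from the project notebook) of what the area under the ROC curve measures: the probability that a randomly chosen customer with income >50K gets a higher model score than a randomly chosen customer without, with ties counted as half. The function name `area_under_roc` and the sample labels and scores are hypothetical, chosen only for illustration. Spark's own evaluator derives the value from the ROC curve, so its result on real data may differ slightly from this pairwise form.

```python
def area_under_roc(labels, scores):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 values (1 = income >50K in this sketch).
    scores: iterable of model scores, higher meaning "more likely positive".
    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical labels and scores, for illustration only.
example_labels = [1, 0, 1, 0, 0, 1]
example_scores = [0.9, 0.3, 0.6, 0.7, 0.2, 0.8]
example_auc = area_under_roc(example_labels, example_scores)
print(f"AUC = {example_auc:.3f}")
```

A value of 1.0 means every high-income customer is scored above every other customer, while 0.5 is what a model that ranks at random would achieve. That makes the metric a reasonable basis for comparing candidate models and parameter settings during tuning.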

To view the notebook and results, please use this link: https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/43821639432640/601698659811114/3675512428940269/latest.html
