A fictitious clothing company in San Diego wants to expand its business. It plans to identify customers in the city who have an income above 50K and send them offers and deals to promote its brand. To find these potential customers, the company obtained census data from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Census+Income). The goal is to build a predictive classification model that identifies potential customers from this census data.
The predictive model is built with Spark ML. Spark MLlib techniques such as feature transformation, model selection, hyperparameter tuning, and evaluation metrics are used to transform the data into a usable format, to identify the best model and its best parameters, and finally to evaluate the resulting model on metrics such as the area under the ROC curve.
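To make the evaluation metric concrete, here is a minimal pure-Python sketch of what the area under the ROC curve measures: the probability that a randomly chosen positive example (income >50K) receives a higher score than a randomly chosen negative one. This is only an illustration and not the code from the notebook; the function name `area_under_roc` and the toy labels and scores are made up for this example, and Spark's own evaluator computes the metric differently under the hood.

```python
def area_under_roc(labels, scores):
    """Pairwise estimate of the area under the ROC curve.

    labels: iterable of 0/1 class labels (1 = income >50K in this project).
    scores: iterable of model scores, higher meaning "more likely positive".
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data, not taken from the census dataset.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.6, 0.4, 0.1]
auc = area_under_roc(labels, scores)
print(f"area under ROC: {auc:.3f}")
```

A value of 1.0 means every positive is ranked above every negative, while 0.5 is what random scoring would give. The pairwise loop is quadratic in the number of examples, so it is only suitable for small illustrations like this one.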
To view the notebook and results, please use this link: https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/43821639432640/601698659811114/3675512428940269/latest.html