Data Analysis and Feature selection on Spark on anonymous data

akshatshreemali/Anonymized_Data_Analysis

Anonymized_Data_Analysis

Research project on a completely anonymized dataset: nothing is known about what the data represents, and the features have no descriptive labels. The data files are in Apache Parquet format and contain 7 million rows. The dataset is highly imbalanced: 99.995% of the target values are 0 and 0.005% are 1.
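To make the imbalance concrete, here is a small arithmetic sketch. It assumes the 7 million figure is a row count and uses the common "balanced" class-weight heuristic (total / (2 * class count)); neither the weighting scheme nor the names `class_counts` and `balanced_weights` come from the project itself.

```python
# Assumption: "7 million" is the number of rows in the dataset.
TOTAL_ROWS = 7_000_000
# 0.005 percent of the target values are 1.
POSITIVE_RATE = 0.005 / 100


def class_counts(total_rows, positive_rate):
    """Return (negatives, positives) for a binary target."""
    positives = round(total_rows * positive_rate)
    return total_rows - positives, positives


def balanced_weights(negatives, positives):
    """Weight each class inversely to its frequency (two-class case)."""
    total = negatives + positives
    return {0: total / (2 * negatives), 1: total / (2 * positives)}


negatives, positives = class_counts(TOTAL_ROWS, POSITIVE_RATE)
weights = balanced_weights(negatives, positives)
print(f"negatives={negatives}, positives={positives}")
print(f"class weights: {weights}")
```

Under these assumptions only about 350 rows are positive, which is why plain accuracy is a poor metric here: a model that always predicts 0 would already be 99.995% accurate.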

Goal 1: Identify, from a list of 500+ variables, the features that actually affect the output variables.

Goal 2: Build a predictive model based on the newly selected features.
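The README does not say which selection method was used. As one possible approach to Goal 1, the sketch below ranks features by the importance scores of a class-weighted random forest in Spark ML. Everything named here is an assumption for illustration: the column names `label` and `weight`, the helper names, the hyperparameters, and the use of a random forest at all. It also assumes Spark 3.0 or later, where `RandomForestClassifier` accepts `weightCol`.

```python
def top_k_features(names, importances, k):
    """Return the k (name, score) pairs with the largest importance scores."""
    ranked = sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def rank_features(df, feature_cols, label_col="label", weight_col="weight", k=20):
    """Fit a weighted random forest on a Spark DataFrame and rank its features.

    `df` must hold numeric feature columns, a numeric 0/1 label column, and a
    per-row weight column. All column names are hypothetical placeholders.
    """
    # Imported lazily so the pure-Python helper above works without Spark.
    from pyspark.ml.classification import RandomForestClassifier
    from pyspark.ml.feature import VectorAssembler

    # Pack the individual feature columns into the single vector column
    # that Spark ML estimators expect; rows with nulls/NaNs are skipped.
    assembler = VectorAssembler(
        inputCols=feature_cols, outputCol="features", handleInvalid="skip"
    )
    assembled = assembler.transform(df)

    forest = RandomForestClassifier(
        featuresCol="features",
        labelCol=label_col,
        weightCol=weight_col,  # up-weights the rare positive class
        numTrees=50,
        seed=42,
    )
    model = forest.fit(assembled)

    scores = model.featureImportances.toArray().tolist()
    return top_k_features(feature_cols, scores, k)


# Hypothetical usage (path and column names are placeholders):
#   df = spark.read.parquet("path/to/data.parquet")
#   feature_cols = [c for c in df.columns if c not in ("label", "weight")]
#   print(rank_features(df, feature_cols, k=20))
```

The shortlisted columns could then feed the Goal 2 model. Given the imbalance, that model is better judged with precision-recall metrics than with accuracy.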
