Research project on a completely anonymized dataset: nothing was known about the data, and the features had no labels. The data files are in Apache Parquet format and the dataset contains 7 million records. The dataset is highly imbalanced, with 99.995% 0's and 0.005% 1's.
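To make the imbalance concrete, here is a small arithmetic sketch. It assumes that "7 million" refers to the number of rows, which the description does not state explicitly, and it uses the common "balanced" class-weight formula purely as an illustration, not as the weighting actually used in the project.

```python
# Figures from the description; treating 7 million as a row count is an assumption.
n_rows = 7_000_000
positive_rate = 0.005 / 100  # 0.005 percent expressed as a fraction

n_pos = round(n_rows * positive_rate)  # rows labelled 1
n_neg = n_rows - n_pos                 # rows labelled 0

# A model that always predicts 0 already reaches this accuracy,
# so plain accuracy says little about performance on the rare class.
baseline_accuracy = n_neg / n_rows

# Illustrative "balanced" class weights: n_rows / (n_classes * class_count).
weights = {0: n_rows / (2 * n_neg), 1: n_rows / (2 * n_pos)}

print(n_pos, n_neg)
print(baseline_accuracy)
print(weights)
```

Under that assumption only a few hundred rows belong to the positive class, which is why metrics such as precision and recall are usually more informative here than accuracy.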
Goal 1: Find the useful features, out of a list of 500+ variables, that affect the output variables.
Goal 2: Build a predictive model based on the new features.
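One simple way to approach the first goal is a univariate filter that scores each feature by how well it separates the two classes. The sketch below is only an illustration under stated assumptions: the data is synthetic, the column names (`f_signal`, `f_noise_a`, `f_noise_b`) and the helper functions are hypothetical, and the scoring rule (absolute difference in class means divided by the overall standard deviation) is one possible choice, not necessarily the method used in the project.

```python
import random
import statistics


def standardized_mean_difference(values, labels):
    """Absolute difference in class means, divided by the overall standard deviation."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    spread = statistics.pstdev(values)
    # Guard against an empty class or a constant column.
    if not pos or not neg or spread == 0:
        return 0.0
    return abs(statistics.fmean(pos) - statistics.fmean(neg)) / spread


def rank_features(columns, labels):
    """Return (name, score) pairs sorted from most to least class-separating."""
    scores = {
        name: standardized_mean_difference(values, labels)
        for name, values in columns.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Synthetic stand-in for the anonymized data: one informative column, two noise columns.
rng = random.Random(0)
n = 20_000
labels = [1 if rng.random() < 0.01 else 0 for _ in range(n)]
columns = {
    "f_signal": [rng.gauss(3.0 if y else 0.0, 1.0) for y in labels],
    "f_noise_a": [rng.gauss(0.0, 1.0) for _ in range(n)],
    "f_noise_b": [rng.gauss(0.0, 1.0) for _ in range(n)],
}

ranking = rank_features(columns, labels)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

The top-ranked columns from a filter like this could then feed the predictive model of the second goal. A univariate score misses interactions between features, so in practice it would likely be combined with model-based importance measures.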