Azure Databricks notebooks that use the Iris dataset from sklearn for Feature Engineering of Continuous Values and Feature Selection
Mounts containers for storing processed files
Reads the Iris dataset from sklearn, preprocesses it into three dataframes (Features, Targets, and Features + Targets), and saves each dataframe to Parquet files in the mounted containers
Reads in Features + Targets from the Parquet files in the mounted containers and performs Pandas Profiling on the entire dataframe. Identifies which columns are not highly correlated with the target or with each other, and flags duplicate rows and rows / columns with missing values
Reads in Features from Parquet files in mounted containers and scales the values using different methods
Reads in Features + Targets from the Parquet files in the mounted containers, then computes and plots the importance of each feature
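
The notebook steps above can be sketched locally in a few lines. This is only an illustrative outline, not the notebooks' actual code: the three scalers (StandardScaler, MinMaxScaler, RobustScaler) and the random forest used for feature importance are assumptions, since the description does not name the specific methods, and plain pandas checks stand in for the Pandas Profiling report. Container mounting and Parquet I/O are left out because they depend on the workspace's storage account and mount paths.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

# Load iris as pandas objects (as_frame requires scikit-learn >= 0.23).
iris = load_iris(as_frame=True)
features = iris.data                              # four continuous columns
targets = iris.target.to_frame(name="target")     # integer class labels
features_targets = pd.concat([features, targets], axis=1)

# In the notebooks each dataframe is written to a mounted container, e.g.
# features.to_parquet("<mount path>/features.parquet")  # path is hypothetical

# Lightweight stand-ins for the profiling checks: duplicates, missing
# values, and correlation of each feature with the target.
n_duplicates = int(features_targets.duplicated().sum())
n_missing = int(features_targets.isna().sum().sum())
target_corr = features_targets.corr()["target"].drop("target")

# Scale the features with several methods (the choice of scalers is an assumption).
scalers = {
    "standard": StandardScaler(),
    "minmax": MinMaxScaler(),
    "robust": RobustScaler(),
}
scaled = {
    name: pd.DataFrame(scaler.fit_transform(features), columns=features.columns)
    for name, scaler in scalers.items()
}

# Estimate feature importance with a random forest (assumed model).
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(features, targets["target"])
importances = pd.Series(
    model.feature_importances_, index=features.columns
).sort_values(ascending=False)

print(target_corr)
print(importances)
```

Calling `importances.plot.barh()` in a notebook cell would give a quick bar chart of the result, provided matplotlib is available on the cluster.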