Enderlogic/Markov-Blanket-based-Feature-Selection

Markov Blanket-based Feature Selection

This repository contains example code for this ICLR 2023 paper, in which we propose Markov Blanket-based Feature Selection (MBFS), a feature selection algorithm for missing data imputation.

Command inputs:

  • data_type: synthetic or real-world
  • data_name: ecoli70 (synthetic) or breast (real-world)
  • data_size: sample size (synthetic data only)
  • missing_type: type of missingness (MCAR, MAR or MNAR)
  • error_rate: maximum error rate for partially observed variables
  • ratio_of_partially_observed_variables: proportion of partially observed variables
  • feature_selection: feature selection approach (None or mbfs)
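
To make the interplay of missing_type, error_rate and ratio_of_partially_observed_variables concrete, here is a minimal sketch of MCAR missingness injection. It is not taken from this repository: the function name inject_mcar is hypothetical, and it assumes that error_rate is an upper bound on each variable's missing rate and that the ratio selects how many columns become partially observed. The actual code may differ.

```python
import random


def inject_mcar(rows, error_rate, ratio, seed=0):
    """Mask entries completely at random (MCAR) in a list of rows.

    Assumptions (not confirmed by the repository):
    - `ratio` is the fraction of columns that become partially observed;
    - each such column gets its own missing rate drawn from [0, error_rate].
    Missing entries are represented by None.
    """
    rng = random.Random(seed)
    n_cols = len(rows[0])
    n_partial = round(ratio * n_cols)
    # Pick which columns will contain missing values.
    partial_cols = sorted(rng.sample(range(n_cols), n_partial))
    # Draw one missing rate per selected column, capped at error_rate.
    rates = {col: rng.uniform(0.0, error_rate) for col in partial_cols}

    masked = []
    for row in rows:
        new_row = list(row)
        for col in partial_cols:
            # MCAR: the decision ignores every value in the data.
            if rng.random() < rates[col]:
                new_row[col] = None
        masked.append(new_row)
    return masked, partial_cols


data = [[1.0] * 10 for _ in range(1000)]
masked, partial_cols = inject_mcar(data, error_rate=0.3, ratio=0.5)
print(len(partial_cols))  # 5 of the 10 columns are partially observed
```

Under MAR or MNAR the masking decision would instead depend on other observed values or on the masked value itself, which this sketch does not attempt to model.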

Example:

$ python3 main.py --data_type 'synthetic' --data_name 'ecoli70' --data_size 1000 --missing_type 'MCAR' --error_rate 0.3 --ratio_of_partially_observed_variables 0.5 --feature_selection 'mbfs'
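
For readers who want to see how the flags above fit together, the following sketch declares them with argparse and parses the example command. The parser in main.py is not shown in this README, so the types, choices and defaults here are assumptions inferred from the option descriptions.

```python
import argparse
import shlex


def build_parser() -> argparse.ArgumentParser:
    # Illustrative parser; types, choices and defaults are assumptions.
    parser = argparse.ArgumentParser(description="MBFS imputation example")
    parser.add_argument("--data_type", choices=["synthetic", "real-world"], required=True)
    parser.add_argument("--data_name", required=True)
    # Only meaningful for synthetic data, so it is optional here.
    parser.add_argument("--data_size", type=int, default=None)
    parser.add_argument("--missing_type", choices=["MCAR", "MAR", "MNAR"], required=True)
    parser.add_argument("--error_rate", type=float, required=True)
    parser.add_argument("--ratio_of_partially_observed_variables", type=float, required=True)
    # "None" is kept as a string because it arrives from the command line.
    parser.add_argument("--feature_selection", choices=["None", "mbfs"], default="None")
    return parser


command = (
    "--data_type 'synthetic' --data_name 'ecoli70' --data_size 1000 "
    "--missing_type 'MCAR' --error_rate 0.3 "
    "--ratio_of_partially_observed_variables 0.5 --feature_selection 'mbfs'"
)
# shlex.split removes the shell quotes exactly as the shell would.
args = build_parser().parse_args(shlex.split(command))
print(args.data_name, args.data_size, args.missing_type)  # ecoli70 1000 MCAR
```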

Output:

  • imputed_data: imputed data set

Reproducibility:

To reproduce our results, follow these steps:

  1. Download the data sets from this repository and use them to replace the original data folder.
  2. Run experiments_bn.py (synthetic) or experiments_uci.py (real-world) in the reproducible files folder to generate the imputed data sets. The imputed data are saved in the imputed_data folder.
  3. Run evaluation_bn.py (synthetic) or evaluation_uci.py (real-world) in the reproducible files folder to produce the evaluation results. The final results are saved as results_bn.csv or results_uci.csv, respectively, in the main folder.
  4. Run plot.R in the reproducible files folder to plot the results of the synthetic experiments. The plot is saved in the main folder.
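
Steps 2 to 4 can be chained in a small driver script. The sketch below is not part of the repository: build_steps and run_steps are hypothetical helpers, the folder name "reproducible files" is assumed from the wording above, and the scripts are assumed to run from the repository root without arguments. The block only prints the commands; call run_steps to execute them.

```python
import subprocess
import sys
from pathlib import Path

# Script suffix per experiment type, as named in the steps above.
SUFFIX = {"synthetic": "bn", "real-world": "uci"}


def build_steps(data_type, scripts_dir="reproducible files"):
    """Return the commands for steps 2 to 4 as argument lists.

    `scripts_dir` is an assumed folder name; adjust it to the real path.
    """
    suffix = SUFFIX[data_type]
    folder = Path(scripts_dir)
    steps = [
        [sys.executable, str(folder / f"experiments_{suffix}.py")],
        [sys.executable, str(folder / f"evaluation_{suffix}.py")],
    ]
    if data_type == "synthetic":
        # Plotting is described for the synthetic experiments only.
        steps.append(["Rscript", str(folder / "plot.R")])
    return steps


def run_steps(steps):
    """Run each command in order and stop at the first failure."""
    for cmd in steps:
        subprocess.run(cmd, check=True)


for cmd in build_steps("synthetic"):
    print(" ".join(cmd))
```

Passing the commands as argument lists rather than a single shell string avoids quoting problems with the space in the folder name.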
