This repository contains example code for our ICLR 2023 paper, in which we propose a feature selection algorithm called Markov Blanket-based Feature Selection (MBFS) for missing data imputation.
- data_type: synthetic or real-world
- data_name: ecoli70 (synthetic) or breast (real-world)
- data_size: number of samples to use (synthetic data only)
- missing_type: type of missingness (MCAR, MAR or MNAR)
- error_rate: maximum error rate for partially observed variables
- ratio_of_partially_observed_variables: proportion of partially observed variables
- feature_selection: feature selection approach (None or mbfs)
$ python3 main.py --data_type 'synthetic' --data_name 'ecoli70' --data_size 1000 --missing_type 'MCAR' --error_rate 0.3 --ratio_of_partially_observed_variables 0.5 --feature_selection 'mbfs'
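For readers who want to see how the flags above fit together, here is a minimal sketch of an argument parser that accepts the same options as the example command. It is an illustration only, not the actual contents of main.py: the function name build_parser, the argument types, and the restriction of missing_type to three choices are assumptions inferred from the parameter list.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (hypothetical sketch)."""
    parser = argparse.ArgumentParser(
        description="Illustrative parser for the MBFS example flags"
    )
    # Free-form strings: the exact accepted values are defined by main.py.
    parser.add_argument("--data_type", type=str)
    parser.add_argument("--data_name", type=str)
    # Assumed to be an integer sample count (synthetic data only).
    parser.add_argument("--data_size", type=int)
    # The three missingness mechanisms named in the parameter list.
    parser.add_argument("--missing_type", choices=["MCAR", "MAR", "MNAR"])
    # Assumed to be fractions given as floats.
    parser.add_argument("--error_rate", type=float)
    parser.add_argument("--ratio_of_partially_observed_variables", type=float)
    # Kept as a plain string, so 'None' arrives as the text "None".
    parser.add_argument("--feature_selection", type=str)
    return parser


# Parse the same values as the example command, passed as an explicit list.
example_argv = [
    "--data_type", "synthetic",
    "--data_name", "ecoli70",
    "--data_size", "1000",
    "--missing_type", "MCAR",
    "--error_rate", "0.3",
    "--ratio_of_partially_observed_variables", "0.5",
    "--feature_selection", "mbfs",
]
args = build_parser().parse_args(example_argv)
print(vars(args))
```

Passing an explicit list to parse_args keeps the sketch independent of the real command line; calling parse_args() with no arguments would read sys.argv instead.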
- imputed_data: imputed data set
To reproduce our results, follow these steps:
- Download the data sets from this repository and replace the original data folder.
- Run experiments_bn.py (synthetic) or experiments_uci.py (real-world) in the reproducible files folder to generate the imputed data sets. The imputed data are saved in the imputed_data folder.
- Run evaluation_bn.py (synthetic) or evaluation_uci.py (real-world) in the reproducible files folder to produce the evaluation results. The final results are saved as results_bn.csv or results_uci.csv, respectively, in the main folder.
- Run plot.R in the reproducible files folder to plot the results of the synthetic experiments. The plot is saved in the main folder.
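The order of the steps above can be sketched as a small helper that lists the commands to run for each experiment type. This is a hedged illustration rather than a script shipped with the repository: the function reproduction_commands, the default folder name reproducible_files, and the use of python3 and Rscript as launchers are assumptions, so adjust them to match your checkout.

```python
from pathlib import Path


def reproduction_commands(experiment="bn", repro_dir="reproducible_files"):
    """Return the commands for one experiment type, in execution order.

    experiment: "bn" (synthetic) or "uci" (real-world).
    repro_dir: hypothetical name of the reproducible files folder.
    """
    if experiment not in ("bn", "uci"):
        raise ValueError("experiment must be 'bn' or 'uci'")
    repro = Path(repro_dir)
    commands = [
        # Step 1: generate the imputed data sets.
        ["python3", str(repro / f"experiments_{experiment}.py")],
        # Step 2: evaluate them and write results_<experiment>.csv.
        ["python3", str(repro / f"evaluation_{experiment}.py")],
    ]
    if experiment == "bn":
        # Step 3: plotting is described for the synthetic experiments only.
        commands.append(["Rscript", str(repro / "plot.R")])
    return commands


# Print the commands instead of executing them.
for command in reproduction_commands("bn"):
    print(" ".join(command))
```

To execute the steps rather than print them, each command could be passed to subprocess.run(command, check=True), which stops at the first failing step. Whether the scripts expect to be launched from the main folder or from inside the reproducible files folder is not stated here, so check the working directory first.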