We use Python 3.6 for all our experiments; our scripts will not work with earlier Python versions.

To run our scripts, install the dependencies packaged in `requirements.txt`. You will need `pip`: run `pip install -r requirements.txt`. We recommend doing this inside a virtual environment such as `virtualenv`.
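For example, assuming a Unix-like shell and that `python3.6` is on your `PATH` (the environment name `venv` below is arbitrary), the full setup might look like this:

```sh
# Create and activate an isolated Python 3.6 environment.
virtualenv -p python3.6 venv
source venv/bin/activate

# Install the pinned dependencies.
pip install -r requirements.txt
```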
For each figure in the paper, we include the script that generated it. The scripts have been simplified to remove some command-line parameters and options, but are otherwise very close to what we used.
Here is the mapping between figures and scripts:

- Figure 1 is generated by `throughput_vs_size_hexplot.py`. Simply run `python throughput_vs_size_hexplot.py` to create the figure.
- Figure 2 is generated by `distance_matrices.py`. Simply run `python distance_matrices.py` to create the figure.
- Figure 3 is generated by `tree_breakdown.py`. Simply run `python tree_breakdown.py` to create the figure.
- Figure 4 is generated by `local_vs_global_models.py`. Simply run `python local_vs_global_models.py` to create the figure.
- Figure 5 is generated by `permutation_feature_importance.py`. Simply run `python permutation_feature_importance.py` to create the figure.
- Figure 6 is generated by `dashboard.py`. Simply run `python dashboard.py` to create the figure.
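If you want to regenerate all six figures in one go, a simple shell loop over the scripts listed above works (assuming each script runs to completion without further input):

```sh
# Regenerate Figures 1-6 in order.
for script in throughput_vs_size_hexplot.py distance_matrices.py \
              tree_breakdown.py local_vs_global_models.py \
              permutation_feature_importance.py dashboard.py; do
    python "$script"
done
```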
Aside from those, the root directory contains only two scripts:

- `dataset.py`, which loads the (anonymized) dataset and contains the preprocessing pipeline. Note that the pipeline itself is not used: because we had to anonymize the data and keep files under 100 MB, we preprocessed the data ahead of time and stored the result in `data/anonimized_io.csv` (see the sketch after this list).
- `feature_name_mapping.py`, which is just a map from feature names to human-friendly names.
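As a quick sanity check that the preprocessed data is in place, you can load the CSV directly. This is only a sketch, assuming `pandas` is among the installed dependencies; the actual columns are whatever our preprocessing produced:

```python
import pandas as pd

# Load the preprocessed, anonymized dataset shipped with the repository.
df = pd.read_csv("data/anonimized_io.csv")

# Basic sanity checks: dimensions and a peek at the first rows.
print(df.shape)
print(df.head())
```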
Good luck!