mosco/unsupervised-preprocessing

This code base produces the figures of the paper: "On the cross-validation bias due to unsupervised pre-processing" by Amit Moscovich and Saharon Rosset. https://arxiv.org/abs/1901.08974v4

By running produce_all_figures_from_scratch.py, you should be able to exactly reproduce the figures in the paper.

Using the default number of repetitions (as used in the paper), this simulation takes 1-2 years of compute time on a single core. It is therefore highly recommended to:

  1. Do a test run with much smaller values of the constants RESCALED_LASSO_LOW_DIM_N_REPETITIONS, etc.
  2. Run this program on a strong multi-core machine. The code automatically parallelizes the simulations using Python's multiprocessing.Pool.
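The two recommendations above can be illustrated with a minimal sketch. It is not the repository's code: `N_REPETITIONS`, `one_repetition` and `run_repetitions` are hypothetical names, and the toy repetition only averages random draws. It shows the general pattern of spreading independent, seeded repetitions over a `multiprocessing.Pool`, with a deliberately small repetition count for a quick test run.

```python
import random
from multiprocessing import Pool

# Hypothetical constant standing in for constants such as
# RESCALED_LASSO_LOW_DIM_N_REPETITIONS; kept small for a quick test run.
N_REPETITIONS = 8


def one_repetition(seed):
    """Toy stand-in for one simulation repetition (not the paper's code)."""
    rng = random.Random(seed)  # a per-repetition seed keeps results reproducible
    samples = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    return sum(samples) / len(samples)


def run_repetitions(n_repetitions, n_processes=None):
    """Run independent repetitions, serially or across worker processes."""
    seeds = list(range(n_repetitions))
    if n_processes == 1:
        # Serial path: handy for debugging and for timing a single core.
        return [one_repetition(seed) for seed in seeds]
    # n_processes=None lets Pool start os.cpu_count() workers.
    with Pool(processes=n_processes) as pool:
        return pool.map(one_repetition, seeds)


if __name__ == "__main__":
    results = run_repetitions(N_REPETITIONS)
    print(len(results), "repetitions finished")
```

Timing a small serial run (`n_processes=1`) and scaling by the full repetition count gives a rough estimate of the total cost before committing a multi-core machine to the full simulation.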

Prerequisites

Python 3 is required, along with the SciPy, scikit-learn, mkl and mkl_random modules. The easiest way to install these is to download the Anaconda Python distribution.

Since the figures use LaTeX rendering for the labels, you also need:

  • TeX Live. The latex binary must be in the command path.
  • dvipdf and dvipng (alternatively, remove the TeX code from the labels used in plotting the figures)
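Whether the TeX toolchain listed above is present can be checked before starting a long run. The sketch below is an illustration rather than part of the repository: the helper names are hypothetical, and it assumes the figures are drawn with matplotlib, which the README does not state. When a binary is missing, it switches matplotlib to its built-in text rendering instead of LaTeX.

```python
import shutil

# Binaries the README lists for LaTeX-rendered labels.
REQUIRED_TEX_BINARIES = ("latex", "dvipdf", "dvipng")


def missing_tex_binaries(binaries=REQUIRED_TEX_BINARIES):
    """Return the listed binaries that are not found on the command path."""
    return [name for name in binaries if shutil.which(name) is None]


def configure_label_rendering():
    """Enable LaTeX labels only when the whole toolchain is present."""
    use_tex = not missing_tex_binaries()
    try:
        import matplotlib  # assumption: the figures are drawn with matplotlib
    except ImportError:
        return use_tex
    matplotlib.rcParams["text.usetex"] = use_tex
    return use_tex


if __name__ == "__main__":
    missing = missing_tex_binaries()
    if missing:
        print("LaTeX labels disabled; missing:", ", ".join(missing))
    configure_label_rendering()
```

Matplotlib's built-in mathtext handles only a subset of TeX, so labels that rely on LaTeX-only commands may still need to be simplified by hand.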

Contact

Feel free to shoot me an email.

Amit Moscovich amit@moscovich.org

About

Supporting source code for the paper "Rescaling and other forms of unsupervised preprocessing may bias cross-validation" by Amit Moscovich and Saharon Rosset.
