This code base produces the figures of the paper: "On the cross-validation bias due to unsupervised pre-processing" by Amit Moscovich and Saharon Rosset. https://arxiv.org/abs/1901.08974v4

By running produce_all_figures_from_scratch.py, you should be able to exactly reproduce the figures in the paper.

Using the default number of repetitions (as used in the paper), this simulation takes 1-2 years on a single core. Therefore it is highly recommended to:

Do a test run with much smaller values of the constants RESCALED_LASSO_LOW_DIM_N_REPETITIONS, etc.
Run this program on a strong multi-core machine. The code automatically parallelizes the simulations using Python's multiprocessing.Pool.
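
The two recommendations above can be illustrated with a minimal sketch. It is not taken from this code base: N_REPETITIONS is a hypothetical stand-in for constants such as RESCALED_LASSO_LOW_DIM_N_REPETITIONS, and run_one_repetition is a toy placeholder for a real simulation. It only shows the general pattern of mapping independent repetitions over a multiprocessing.Pool.

```python
import random
from multiprocessing import Pool

# Hypothetical stand-in for a repetition-count constant; a small value
# keeps a test run short before committing to the full simulation.
N_REPETITIONS = 8


def run_one_repetition(seed):
    """Toy placeholder simulation: mean of 1000 standard normal draws.

    Each repetition gets its own seed so results are reproducible
    regardless of which worker process runs it.
    """
    rng = random.Random(seed)
    return sum(rng.gauss(0.0, 1.0) for _ in range(1000)) / 1000


def run_all_repetitions(n_repetitions, processes=None):
    """Distribute independent repetitions over worker processes.

    processes=None lets Pool use one worker per available CPU core.
    """
    with Pool(processes=processes) as pool:
        return pool.map(run_one_repetition, range(n_repetitions))


if __name__ == "__main__":
    results = run_all_repetitions(N_REPETITIONS)
    print(f"{len(results)} repetitions, mean {sum(results) / len(results):+.4f}")
```

Because the repetitions are independent, the wall-clock time of this pattern shrinks roughly in proportion to the number of cores, which is why a multi-core machine matters so much here.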

Prerequisites

Python 3 is required, along with the SciPy, scikit-learn, mkl and mkl_random modules. The easiest way to install these is to download the Anaconda Python distribution.
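
Before starting a long run, it may help to confirm that the modules are importable. The following is a small sketch, not part of this code base. It assumes the import names scipy, sklearn (for scikit-learn), mkl and mkl_random; the helper find_missing_modules is hypothetical.

```python
import importlib.util

# Assumed import names for the required modules.
# scikit-learn is imported as "sklearn".
REQUIRED_MODULES = ["scipy", "sklearn", "mkl", "mkl_random"]


def find_missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```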

Since the figures use LaTeX rendering for the labels, you need:

TeX Live. The latex binary must be in the command path.
dvipdf and dvipng (or you can just remove the TeX code from the labels used in plotting the figures)
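
A quick way to check that these tools are in the command path is sketched below. This is an illustrative helper, not part of this code base; the function name find_missing_binaries is hypothetical, and the list of binaries is simply the three named above.

```python
import shutil

# Executables named in the requirements above.
REQUIRED_BINARIES = ["latex", "dvipdf", "dvipng"]


def find_missing_binaries(names):
    """Return the executables that shutil.which cannot locate in the command path."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing_binaries(REQUIRED_BINARIES)
    if missing:
        print("Not found in the command path:", ", ".join(missing))
    else:
        print("latex, dvipdf and dvipng were all found.")
```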

Contact

Feel free to shoot me an email.

Amit Moscovich amit@moscovich.org

