Skip to content


Switch branches/tags

Latest commit


Git stats


Failed to load latest commit information.
Latest commit message
Commit time

Survey of machine-learning experimental methods at NeurIPS2019 and ICLR2020 - Data and code

How do machine-learning researchers run their empirical validation? In the context of a push for improved reproducibility and benchmarking, this question is important to develop new tools for model comparison. We ran a simple survey asking to authors of two leading conferences, NeurIPS 2019 and ICLR 2020, a few quantitative questions on their experimental procedures.

A technical report on HAL summarizes our finding. It gives a simple picture of how hyper-parameters are set, how many baselines and datasets are included, or how seeds are used. Below, we give a very short summary, but please read (and cite) the full report if you are interested.

Highlights The response rates were 35.6% for NeurIPS and 48.6% for ICLR. A vast majority of empirical works optimize model hyper-parameters, thought almost half of these use manual tuning and most of the automatic hyper-parameter optimization is done with grid search. The typical number of hyper-parameter set is in interval 3-5, and less than 50 model fits are used to explore the search space. In addition, most works also optimized their baselines (typically, around 4 baselines). Finally, studies typically reported 4 results per model per task to provide a measure of variance, and around 50% of them used a different random seed for each experiment.

Sample results

How many papers with experiments optimized hyperparameters.

What hyperparameter optimization method were used.

Number of different datasets used for benchmarking.

Number of results reported for each model (ex: for different seeds)

These are just samples. Read the full report for more results.


The results are saved in 2 csv files: data/iclr.csv and data/neurips.csv.

The function load_data_pd(path) in file gives an example on how to load the data with panda.


There is no strict requirement to work with the csv files. To use the code in or, you can install the requirements using this command.

$ pip install -r requirements.txt

Generating plots

You can generate the plots from report using the script

$ python --help
usage: [-h] [--output OUTPUT] [--type {pdf,png}]
               [--plot [{emp-vs-theory,hpo,hpo-methods,n-hps,n-hps-grid-search,n-trials,n-trials-grid-search,baseline,n-baselines,n-datasets,n-seeds,seeding,n-points} [{emp-vs-theory,hpo,hpo-methods,n-hps,n-hps-grid-search,n-trials,n-trials-grid-search,baseline,n-baselines,n-datasets,n-seeds,seeding,n-points} ...]]]

optional arguments:
  -h, --help            show this help message and exit
  --output OUTPUT
  --type {pdf,png}
  --plot [{emp-vs-theory,hpo,hpo-methods,n-hps,n-hps-grid-search,n-trials,n-trials-grid-search,baseline,n-baselines,n-datasets,n-seeds,seeding,n-points} [{emp-vs-theory,hpo,hpo-methods,n-hps,n-hps-grid-search,n-trials,n-trials-grid-search,baseline,n-baselines,n-datasets,n-seeds,seeding,n-points} ...]]

Example to generate the plot for question 3, 4, 5 and 6 in pdf format, in folder img.

$ python --output img --type pdf --plot hpo hpo-methods n-hps n-nps-trials


If this report is helpful for your research, please cite it using the following bibtex entry.

  TITLE = {{Survey of machine-learning experimental methods at NeurIPS2019 and ICLR2020}},
  AUTHOR = {Bouthillier, Xavier and Varoquaux, Ga{\"e}l},
  URL = {},
  TYPE = {Research Report},
  INSTITUTION = {{Inria Saclay Ile de France}},
  YEAR = {2020},
  MONTH = Jan,
  PDF = {},
  HAL_ID = {hal-02447823},
  HAL_VERSION = {v1},


Results of the survey on machine-learning experimental methods at NeurIPS2019 and ICLR2020






No releases published


No packages published