Self-Tuning Networks

This repository contains the code used for the paper Self-Tuning Networks: Bilevel Optimization of Hyperparameters using Structured Best-Response Functions (ICLR 2019).
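At a high level, an STN adjusts hyperparameters by descending a validation loss through a best-response function that maps hyperparameters to (approximately) optimal weights. The following is a minimal, dependency-free sketch of that outer loop on a toy quadratic problem where the best response has a closed form; all names and the toy losses are illustrative and are not taken from this codebase, which instead learns a structured approximation to the best response for real networks.

```python
# Toy illustration of the STN idea: descend a validation loss through a
# best-response function w*(lam), rather than re-training for each lam.
# The quadratic losses and names here are illustrative only.

def train_loss(w, lam):
    # L2-regularized fit; lam plays the role of a weight-decay hyperparameter
    return (w - 2.0) ** 2 + lam * w ** 2

def best_response(lam):
    # Closed-form minimizer of train_loss over w (exists for this quadratic);
    # STNs learn a structured approximation to this map instead.
    return 2.0 / (1.0 + lam)

def val_loss(w):
    return (w - 1.0) ** 2

def num_grad(f, x, eps=1e-5):
    # Central-difference gradient, standing in for backprop
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)

lam = 0.0
for _ in range(500):
    # Outer step: move the hyperparameter to reduce validation loss
    lam -= 0.05 * num_grad(lambda h: val_loss(best_response(h)), lam)

# lam converges near 1.0, where w*(lam) = 1 minimizes the validation loss
```

Here the inner problem is solved exactly by `best_response`; the paper's contribution is making this map cheap and learnable when no closed form exists.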


Dependencies

  • Python 3.6.x
  • PyTorch 0.4.1


The following is an example of how to create an environment with the appropriate versions of the dependencies:

conda create -n stn-env python=3.6
source activate stn-env
conda install pytorch=0.4.1 cuda80 -c pytorch
conda install torchvision -c pytorch
pip install -r requirements.txt


CNN Experiments

The CNN code in this repository is built on the Cutout codebase. These commands should be run from inside the cnn folder.

To train a Self-Tuning CNN:

python --tune_all --tune_scales --entropy_weight=1e-3 --save

To train a baseline CNN:


LSTM Experiments

The LSTM code in this repository is built on the AWD-LSTM codebase. The commands for the LSTM experiments should be run from inside the lstm folder.

First, download the PTB dataset:


Schedule Experiments

The commands in this section can be used to obtain results for Table 1 in the paper.

  • Using a fixed value for output dropout discovered by grid search

    python --dropouto=0.68 --prefix=dropouto
  • Gaussian-perturbed output dropout rate, with std=0.05

    python --dropouto=0.68 --prefix=dropouto_gauss --perturb_type=gaussian --perturb_std=0.05 --perturb_dropouto
  • Sinusoid-perturbed output dropout rate, with amplitude=0.1 and period=1200 minibatches

    python --dropouto=0.68 --prefix=dropouto_sin --perturb_type=sinusoid --amplitude=0.1 --sinusoid_period=1200 --perturb_dropouto
  • STN-tuned output dropout

    python --dropouto=0.05 --tune_dropouto --save_dir=dropouto_stn
  • Train from scratch following the STN schedule for output dropout (replace the path in --load_schedule with the one generated by the STN command above):

    python --load_schedule=logs/dropouto_stn/2019-06-15/epoch.csv
  • Train from scratch with the final output dropout value from STN training:

    python --dropouto=0.78 --prefix=dropouto_final
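The two fixed-perturbation baselines above can be sketched as follows; the function and variable names are illustrative, not taken from the repo, but the schedules match the flags described (a Gaussian resample around the base rate each minibatch, clipped to a valid probability, and a sinusoid with a given amplitude and period in minibatches).

```python
import math
import random

def gaussian_perturb(base, std, rng):
    # Resample the dropout rate around its base value each minibatch,
    # clipping to a valid probability in [0, 1]
    return min(1.0, max(0.0, base + rng.gauss(0.0, std)))

def sinusoid_perturb(base, amplitude, period, step):
    # Oscillate the dropout rate around its base value; period is
    # measured in minibatches
    return base + amplitude * math.sin(2.0 * math.pi * step / period)

rng = random.Random(0)
# Matches --dropouto=0.68 --perturb_type=gaussian --perturb_std=0.05
gauss_rates = [gaussian_perturb(0.68, 0.05, rng) for _ in range(1000)]
# Matches --dropouto=0.68 --amplitude=0.1 --sinusoid_period=1200
sin_rates = [sinusoid_perturb(0.68, 0.1, 1200, t) for t in range(1200)]
```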
Schedules with Different Hyperparameter Initializations
  • The following commands find STN schedules starting from different initial dropout values (in {0.05, 0.3, 0.5, 0.7, 0.9}):

    python --dropouto=0.05 --tune_dropouto --save_dir=dropouto_schedule_init05
    python --dropouto=0.3 --tune_dropouto --save_dir=dropouto_schedule_init30
    python --dropouto=0.5 --tune_dropouto --save_dir=dropouto_schedule_init50
    python --dropouto=0.7 --tune_dropouto --save_dir=dropouto_schedule_init70
    python --dropouto=0.9 --tune_dropouto --save_dir=dropouto_schedule_init90
  • To plot the schedules, first modify the variables log_dir_init05, log_dir_init30, log_dir_init50, log_dir_init70, and log_dir_init90 in the plotting script to point to the directories created by the commands above, and then run:


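A minimal plotting sketch along these lines is shown below. The CSV layout (an `epoch` column and a `dropouto` column in each run's epoch.csv) and the log paths are assumptions about the logs written during STN training, not confirmed details of this repo.

```python
import csv
import os

def load_schedule(path, key="dropouto"):
    # Read one run's per-epoch log; assumes "epoch" and `key` columns
    with open(path) as f:
        rows = list(csv.DictReader(f))
    return [int(r["epoch"]) for r in rows], [float(r[key]) for r in rows]

# Hypothetical log directories matching the --save_dir flags above
runs = [(0.05, "logs/dropouto_schedule_init05/epoch.csv"),
        (0.30, "logs/dropouto_schedule_init30/epoch.csv"),
        (0.90, "logs/dropouto_schedule_init90/epoch.csv")]

series = [(init, *load_schedule(path)) for init, path in runs
          if os.path.exists(path)]

if series:
    import matplotlib
    matplotlib.use("Agg")  # headless rendering
    import matplotlib.pyplot as plt
    for init, epochs, rates in series:
        plt.plot(epochs, rates, label=f"init={init}")
    plt.xlabel("Epoch")
    plt.ylabel("Output dropout rate")
    plt.legend()
    plt.savefig("dropout_schedules.png")
```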
Tuning Multiple LSTM Hyperparameters

Run the following command to tune the input/hidden/output/embedding dropout, weight DropConnect, and the coefficients of activation regularization (alpha) and temporal activation regularization (beta):

python --seed=3 --tune_all --save_dir=st-lstm
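Tuning rate-valued hyperparameters such as dropout by gradient descent requires keeping them in (0, 1); a standard way to do this, sketched below on a toy objective, is to optimize an unconstrained logit and map it through a sigmoid. The names and the toy target are illustrative, not taken from the repo.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    return math.log(p / (1.0 - p))

# Start from an initial rate, as with --dropouto=0.05
theta = logit(0.05)
target = 0.7  # stand-in for whatever rate the validation signal favors

for _ in range(2000):
    rate = sigmoid(theta)
    # Toy objective: squared distance to the target rate;
    # chain rule through the sigmoid keeps rate in (0, 1)
    grad = 2.0 * (rate - target) * rate * (1.0 - rate)
    theta -= 0.5 * grad
```

Because the update acts on the logit, the recovered rate can never leave (0, 1), no matter the step size or gradient signal.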

Project Structure

├── cnn
│   ├── datasets
│   ├── hypermodels
│   ├── models
│   └── util
├── lstm
├── requirements.txt
└── stn_utils

7 directories, 34 files

If you use this code, please cite:

  • Matthew MacKay, Paul Vicol, Jonathan Lorraine, David Duvenaud, and Roger Grosse. Self-Tuning Networks: Bilevel Optimization of Hyperparameters using Structured Best-Response Functions. International Conference on Learning Representations (ICLR), 2019.

@inproceedings{mackay2019selftuning,
  title={Self-Tuning Networks: Bilevel Optimization of Hyperparameters using Structured Best-Response Functions},
  author={Matthew MacKay and Paul Vicol and Jonathan Lorraine and David Duvenaud and Roger Grosse},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2019}
}