Self-Tuning Networks

This repository contains the code used for the paper Self-Tuning Networks: Bilevel Optimization of Hyperparameters using Structured Best-Response Functions (ICLR 2019).
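In brief: a Self-Tuning Network (STN) replaces fixed weights w with a learned best-response function w(λ) of the hyperparameters, trains that function on the training loss at perturbed hyperparameter values, and updates the hyperparameters on the validation loss through the best-response. The following is a toy NumPy sketch of this bilevel loop (not the repository's code; all names here are illustrative), using 1-D ridge regression where λ is the L2 penalty and the best-response is approximated by an affine function w(λ) = w0 + w1·λ:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 1-D linear regression with true slope 3.0, split into train/val.
X_tr = rng.normal(size=(50, 1))
y_tr = 3.0 * X_tr[:, 0] + rng.normal(scale=0.5, size=50)
X_va = rng.normal(size=(50, 1))
y_va = 3.0 * X_va[:, 0] + rng.normal(scale=0.5, size=50)

phi = np.zeros(2)   # phi = [w0, w1]; affine best-response w(lam) = w0 + w1*lam
lam = 1.0           # hyperparameter (L2 penalty) being tuned
lr_w, lr_lam, eps, std = 0.02, 0.05, 1e-4, 0.4

def inner_step(phi, lam):
    # One training-loss step at a hyperparameter value perturbed around lam;
    # the perturbation is what lets the affine fit learn dw*/dlam.
    lam_s = max(0.0, lam + rng.normal(scale=std))
    w = phi[0] + phi[1] * lam_s
    g_w = 2 * np.mean((X_tr[:, 0] * w - y_tr) * X_tr[:, 0]) + 2 * lam_s * w
    return phi - lr_w * g_w * np.array([1.0, lam_s])

def val_loss(w):
    return np.mean((X_va[:, 0] * w - y_va) ** 2)

# Warm-up: fit the best-response parameters before touching lam.
for _ in range(300):
    phi = inner_step(phi, lam)

for _ in range(100):
    for _ in range(10):                 # inner updates on the training loss
        phi = inner_step(phi, lam)
    # Outer update: validation gradient through the best-response function
    # (finite differences here for clarity).
    g_lam = (val_loss(phi[0] + phi[1] * (lam + eps))
             - val_loss(phi[0] + phi[1] * (lam - eps))) / (2 * eps)
    lam = max(0.0, lam - lr_lam * g_lam)
```

On this toy problem the outer loop drives λ toward zero (the unregularized model does not overfit one parameter), while w(λ) tracks the ridge solution at the current λ. The repository implements the same alternation with structured hypernetworks over full network weights.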


Dependencies

  • Python 3.6.x
  • PyTorch 0.4.1


The following is an example of how to create an environment with the appropriate versions of the dependencies:

conda create -n stn-env python=3.6
source activate stn-env
conda install pytorch=0.4.1 cuda80 -c pytorch
conda install torchvision -c pytorch
pip install -r requirements.txt


CNN Experiments

The CNN code in this repository is built on the Cutout codebase. These commands should be run from inside the cnn folder.

To train a Self-Tuning CNN:

python --tune_all --tune_scales --entropy_weight=1e-3 --save

To train a baseline CNN:


LSTM Experiments

The LSTM code in this repository is built on the AWD-LSTM codebase. The commands for the LSTM experiments should be run from inside the lstm folder.

First, download the PTB dataset:


Schedule Experiments

The commands in this section can be used to obtain results for Table 1 in the paper.

  • Using a fixed value for output dropout discovered by grid search

    python --dropouto=0.68 --prefix=dropouto
  • Gaussian-perturbed output dropout rate, with std=0.05

    python --dropouto=0.68 --prefix=dropouto_gauss --perturb_type=gaussian --perturb_std=0.05 --perturb_dropouto
  • Sinusoid-perturbed output dropout rate, with amplitude=0.1 and period=1200 minibatches

    python --dropouto=0.68 --prefix=dropouto_sin --perturb_type=sinusoid --amplitude=0.1 --sinusoid_period=1200 --perturb_dropouto
  • STN-tuned output dropout

    python --dropouto=0.05 --tune_dropouto --save_dir=dropouto_stn
  • Train from scratch following the STN schedule for output dropout (replace the path in --load_schedule with the one generated by the STN command above):

    python --load_schedule=logs/dropouto_stn/2019-06-15/epoch.csv
  • Train from scratch with the final output dropout value from STN training:

    python --dropouto=0.78 --prefix=dropouto_final
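The two perturbation baselines above can be summarized as follows (an illustrative helper, not the repository's implementation): Gaussian perturbation adds zero-mean noise with the given std to the fixed rate, and sinusoid perturbation oscillates around it with the given amplitude and period in minibatches.

```python
import math
import random

def perturbed_dropout(base, step, kind, std=0.05, amplitude=0.1, period=1200):
    """Return the dropout rate to use at minibatch index `step`."""
    if kind == "gaussian":
        value = base + random.gauss(0.0, std)
    elif kind == "sinusoid":
        value = base + amplitude * math.sin(2.0 * math.pi * step / period)
    else:
        value = base
    # Clamp so the result is always a valid dropout probability.
    return min(max(value, 0.0), 1.0)
```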
Schedules with Different Hyperparameter Initializations
  • The following commands find STN schedules starting from different initial dropout values (in {0.05, 0.3, 0.5, 0.7, 0.9}):

    python --dropouto=0.05 --tune_dropouto --save_dir=dropouto_schedule_init05
    python --dropouto=0.3 --tune_dropouto --save_dir=dropouto_schedule_init30
    python --dropouto=0.5 --tune_dropouto --save_dir=dropouto_schedule_init50
    python --dropouto=0.7 --tune_dropouto --save_dir=dropouto_schedule_init70
    python --dropouto=0.9 --tune_dropouto --save_dir=dropouto_schedule_init90
  • To plot the schedules, first modify the variables log_dir_init05, log_dir_init30, log_dir_init50, log_dir_init70, and log_dir_init90 in the plotting script to point to the directories created by the commands above, and then run:


Tuning Multiple LSTM Hyperparameters

Run the following command to tune the input/hidden/output/embedding dropout, weight DropConnect, and the coefficients of activation regularization (alpha) and temporal activation regularization (beta):

python --seed=3 --tune_all --save_dir=st-lstm
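Tuning many bounded hyperparameters jointly is typically done in an unconstrained space, so gradient updates can never leave the valid range; for example, a dropout rate in (0, 1) can be represented by a real-valued logit (a generic sketch, not the repository's code):

```python
import math

def to_dropout(raw):
    # Map an unconstrained value to a valid dropout rate in (0, 1).
    return 1.0 / (1.0 + math.exp(-raw))

def from_dropout(p):
    # Inverse mapping (logit), so tuning can start from a chosen rate.
    return math.log(p / (1.0 - p))

raw = from_dropout(0.68)   # start tuning from dropout = 0.68
```

The optimizer then updates `raw` freely, and the model always sees `to_dropout(raw)` as the actual rate.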

Project Structure

├── cnn
│   ├── datasets
│   ├── hypermodels
│   ├── models
│   └── util
├── lstm
├── requirements.txt
└── stn_utils

7 directories, 34 files

If you use this code, please cite:

  • Matthew MacKay, Paul Vicol, Jonathan Lorraine, David Duvenaud and Roger Grosse. Self-Tuning Networks: Bilevel Optimization of Hyperparameters using Structured Best-Response Functions. International Conference on Learning Representations (ICLR), 2019.
@inproceedings{mackay2019selftuning,
  title={Self-Tuning Networks: Bilevel Optimization of Hyperparameters using Structured Best-Response Functions},
  author={Matthew MacKay and Paul Vicol and Jonathan Lorraine and David Duvenaud and Roger Grosse},
  booktitle={{International Conference on Learning Representations (ICLR)}},
  year={2019}
}
