Self-Tuning Networks

This repository contains the code used for the paper Self-Tuning Networks: Bilevel Optimization of Hyperparameters using Structured Best-Response Functions (ICLR 2019): https://arxiv.org/abs/1903.03088

Requirements

  • Python 3.6.x
  • PyTorch 0.4.1

Setup

The following is an example of how to create an environment with the appropriate versions of the dependencies:

conda create -n stn-env python=3.6
source activate stn-env
conda install pytorch=0.4.1 cuda80 -c pytorch
conda install torchvision -c pytorch
pip install -r requirements.txt
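
As an optional sanity check (not part of the original instructions), you can confirm from Python that the expected PyTorch version is installed and that a GPU is visible:

import torch

print(torch.__version__)          # expected: 0.4.1
print(torch.cuda.is_available())  # True if a CUDA-capable GPU is visible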

Experiments

CNN Experiments

The CNN code in this repository is built on the Cutout codebase. These commands should be run from inside the cnn folder.

To train a Self-Tuning CNN:

python hypertrain.py --tune_all --tune_scales --entropy_weight=1e-3 --save

To train a baseline CNN:

python train_basic.py

LSTM Experiments

The LSTM code in this repository is built on the AWD-LSTM codebase. The commands for the LSTM experiments should be run from inside the lstm folder.

First, download the PTB dataset:

./getdata.sh

Schedule Experiments

The commands in this section reproduce the results in Table 1 of the paper.

  • Using a fixed value for output dropout discovered by grid search

    python train_basic.py --dropouto=0.68 --prefix=dropouto
    
  • Gaussian-perturbed output dropout rate, with std=0.05

    python train_basic.py --dropouto=0.68 --prefix=dropouto_gauss --perturb_type=gaussian --perturb_std=0.05 --perturb_dropouto
    
  • Sinusoid-perturbed output dropout rate, with amplitude=0.1 and period=1200 minibatches

    python train_basic.py --dropouto=0.68 --prefix=dropouto_sin --perturb_type=sinusoid --amplitude=0.1 --sinusoid_period=1200 --perturb_dropouto
    
  • STN-tuned output dropout

    python train.py --dropouto=0.05 --tune_dropouto --save_dir=dropouto_stn
    
  • Train from scratch following the STN schedule for output dropout (replace the path passed to --load_schedule with the one generated by the STN command above; a sketch for locating this file follows this list):

    python train_basic.py --load_schedule=logs/dropouto_stn/2019-06-15/epoch.csv
    
  • Train from scratch with the final output dropout value from STN training:

    python train_basic.py --dropouto=0.78 --prefix=dropouto_final
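
The epoch.csv path used with --load_schedule depends on the date-stamped directory created during STN training. Assuming the logs/<save_dir>/<date>/epoch.csv layout suggested by the command above, a minimal Python sketch to list the candidates:

import glob

# List schedule files produced by the STN run above
# (assumes a logs/dropouto_stn/<date>/epoch.csv layout).
for path in sorted(glob.glob("logs/dropouto_stn/*/epoch.csv")):
    print(path)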
    
Schedules with Different Hyperparameter Initializations
  • The following commands find STN schedules starting with different initial dropout values (in {0.05, 0.3, 0.5, 0.7, 0.9})

    python train.py --dropouto=0.05 --tune_dropouto --save_dir=dropouto_schedule_init05
    python train.py --dropouto=0.3 --tune_dropouto --save_dir=dropouto_schedule_init30
    python train.py --dropouto=0.5 --tune_dropouto --save_dir=dropouto_schedule_init50
    python train.py --dropouto=0.7 --tune_dropouto --save_dir=dropouto_schedule_init70
    python train.py --dropouto=0.9 --tune_dropouto --save_dir=dropouto_schedule_init90
    
  • To plot the schedules, first modify the variables log_dir_init05, log_dir_init30, log_dir_init50, log_dir_init70, and log_dir_init90 in save_dropouto_schedule_plot.py so that they point to the directories created by the commands above (an example of these edits is sketched below), and then run:

    python save_dropouto_schedule_plot.py
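
For example, the edited variables might look like the following; the directory names are placeholders, so substitute the paths your own runs produced (and check whether the script expects the date-stamped subdirectory or its parent):

# In save_dropouto_schedule_plot.py -- placeholder paths, replace with your own.
log_dir_init05 = "logs/dropouto_schedule_init05/<date>"
log_dir_init30 = "logs/dropouto_schedule_init30/<date>"
log_dir_init50 = "logs/dropouto_schedule_init50/<date>"
log_dir_init70 = "logs/dropouto_schedule_init70/<date>"
log_dir_init90 = "logs/dropouto_schedule_init90/<date>"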
    

Tuning Multiple LSTM Hyperparameters

Run the following command to tune the input/hidden/output/embedding dropout, weight DropConnect, and the coefficients of activation regularization (alpha) and temporal activation regularization (beta):

python train.py --seed=3 --tune_all --save_dir=st-lstm

Project Structure

.
├── README.md
├── cnn
│   ├── datasets
│   │   ├── __init__.py
│   │   ├── cifar.py
│   │   └── loaders.py
│   ├── hypermodels
│   │   ├── __init__.py
│   │   ├── alexnet.py
│   │   ├── hyperconv2d.py
│   │   ├── hyperlinear.py
│   │   └── small.py
│   ├── hypertrain.py
│   ├── logger.py
│   ├── models
│   │   ├── __init__.py
│   │   ├── alexnet.py
│   │   └── small.py
│   ├── train_basic.py
│   └── util
│       ├── __init__.py
│       ├── cutout.py
│       ├── dropout.py
│       └── hyperparameter.py
├── lstm
│   ├── data.py
│   ├── embed_regularize.py
│   ├── getdata.sh
│   ├── hyperlstm.py
│   ├── locked_dropout.py
│   ├── logger.py
│   ├── model_basic.py
│   ├── save_dropouto_schedule_plot.py
│   ├── train.py
│   ├── train_basic.py
│   ├── utils.py
│   └── weight_drop.py
├── requirements.txt
└── stn_utils
    ├── __init__.py
    └── hyperparameter.py

7 directories, 34 files

Code Contributors

Citation

If you use this code, please cite:

  • Matthew MacKay, Paul Vicol, Jonathan Lorraine, David Duvenaud and Roger Grosse. Self-Tuning Networks: Bilevel Optimization of Hyperparameters using Structured Best-Response Functions. International Conference on Learning Representations (ICLR), 2019.
@inproceedings{STN2019,
  title={Self-Tuning Networks: Bilevel Optimization of Hyperparameters using Structured Best-Response Functions},
  author={Matthew MacKay and Paul Vicol and Jonathan Lorraine and David Duvenaud and Roger Grosse},
  booktitle={{International Conference on Learning Representations (ICLR)}},
  year={2019}
}