Self-Tuning Networks

This repository contains the code used for the paper Self-Tuning Networks: Bilevel Optimization of Hyperparameters using Structured Best-Response Functions (ICLR 2019).
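At a high level, an STN adjusts hyperparameters by descending a validation loss through a best-response function that maps hyperparameters to (approximately) optimal weights. The following is a minimal, dependency-free sketch of that outer loop on a toy quadratic problem where the best response has a closed form; all names and the toy losses are illustrative and are not taken from this codebase, which instead learns a structured approximation to the best response for real networks.

```python
# Toy illustration of the STN idea: descend a validation loss through a
# best-response function w*(lam), rather than re-training for each lam.
# The quadratic losses and names here are illustrative only.

def train_loss(w, lam):
    # L2-regularized fit; lam plays the role of a weight-decay hyperparameter
    return (w - 2.0) ** 2 + lam * w ** 2

def best_response(lam):
    # Closed-form minimizer of train_loss over w (exists for this quadratic);
    # STNs learn a structured approximation to this map instead.
    return 2.0 / (1.0 + lam)

def val_loss(w):
    return (w - 1.0) ** 2

def num_grad(f, x, eps=1e-5):
    # Central-difference gradient, standing in for backprop
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)

lam = 0.0
for _ in range(500):
    # Outer step: move the hyperparameter to reduce validation loss
    lam -= 0.05 * num_grad(lambda h: val_loss(best_response(h)), lam)

# lam converges near 1.0, where w*(lam) = 1 minimizes the validation loss
```

Here the inner problem is solved exactly by `best_response`; the paper's contribution is making this map cheap and learnable when no closed form exists.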


Dependencies

  • Python 3.6.x
  • PyTorch 0.4.1


The following is an example of how to create an environment with the appropriate versions of the dependencies:

conda create -n stn-env python=3.6
source activate stn-env
conda install pytorch=0.4.1 cuda80 -c pytorch
conda install torchvision -c pytorch
pip install -r requirements.txt


CNN Experiments

The CNN code in this repository is built on the Cutout codebase. These commands should be run from inside the cnn folder.

To train a Self-Tuning CNN:

python --tune_all --tune_scales --entropy_weight=1e-3 --save

To train a baseline CNN:


LSTM Experiments

The LSTM code in this repository is built on the AWD-LSTM codebase. The commands for the LSTM experiments should be run from inside the lstm folder.

First, download the PTB dataset:


Schedule Experiments

The commands in this section can be used to obtain results for Table 1 in the paper.

  • Using a fixed value for output dropout discovered by grid search

    python --dropouto=0.68 --prefix=dropouto
  • Gaussian-perturbed output dropout rate, with std=0.05

    python --dropouto=0.68 --prefix=dropouto_gauss --perturb_type=gaussian --perturb_std=0.05 --perturb_dropouto
  • Sinusoid-perturbed output dropout rate, with amplitude=0.1 and period=1200 minibatches

    python --dropouto=0.68 --prefix=dropouto_sin --perturb_type=sinusoid --amplitude=0.1 --sinusoid_period=1200 --perturb_dropouto
  • STN-tuned output dropout

    python --dropouto=0.05 --tune_dropouto --save_dir=dropouto_stn
  • Train from scratch following the STN schedule for output dropout (replace the path in --load_schedule with the one generated by the STN command above):

    python --load_schedule=logs/dropouto_stn/2019-06-15/epoch.csv
  • Train from scratch with the final output dropout value from STN training:

    python --dropouto=0.78 --prefix=dropouto_final
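The two fixed-perturbation baselines above can be sketched as follows; the function and variable names are illustrative, not taken from the repo, but the schedules match the flags described (a Gaussian resample around the base rate each minibatch, clipped to a valid probability, and a sinusoid with a given amplitude and period in minibatches).

```python
import math
import random

def gaussian_perturb(base, std, rng):
    # Resample the dropout rate around its base value each minibatch,
    # clipping to a valid probability in [0, 1]
    return min(1.0, max(0.0, base + rng.gauss(0.0, std)))

def sinusoid_perturb(base, amplitude, period, step):
    # Oscillate the dropout rate around its base value; period is
    # measured in minibatches
    return base + amplitude * math.sin(2.0 * math.pi * step / period)

rng = random.Random(0)
# Matches --dropouto=0.68 --perturb_type=gaussian --perturb_std=0.05
gauss_rates = [gaussian_perturb(0.68, 0.05, rng) for _ in range(1000)]
# Matches --dropouto=0.68 --amplitude=0.1 --sinusoid_period=1200
sin_rates = [sinusoid_perturb(0.68, 0.1, 1200, t) for t in range(1200)]
```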
Schedules with Different Hyperparameter Initializations
  • The following commands find STN schedules starting from different initial dropout values (in {0.05, 0.3, 0.5, 0.7, 0.9}):

    python --dropouto=0.05 --tune_dropouto --save_dir=dropouto_schedule_init05
    python --dropouto=0.3 --tune_dropouto --save_dir=dropouto_schedule_init30
    python --dropouto=0.5 --tune_dropouto --save_dir=dropouto_schedule_init50
    python --dropouto=0.7 --tune_dropouto --save_dir=dropouto_schedule_init70
    python --dropouto=0.9 --tune_dropouto --save_dir=dropouto_schedule_init90
  • To plot the schedules, first modify the variables log_dir_init05, log_dir_init30, log_dir_init50, log_dir_init70, and log_dir_init90 in the plotting script to point to the directories created by the commands above, and then run:


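A minimal plotting sketch along these lines is shown below. The CSV layout (an `epoch` column and a `dropouto` column in each run's epoch.csv) and the log paths are assumptions about the logs written during STN training, not confirmed details of this repo.

```python
import csv
import os

def load_schedule(path, key="dropouto"):
    # Read one run's per-epoch log; assumes "epoch" and `key` columns
    with open(path) as f:
        rows = list(csv.DictReader(f))
    return [int(r["epoch"]) for r in rows], [float(r[key]) for r in rows]

# Hypothetical log directories matching the --save_dir flags above
runs = [(0.05, "logs/dropouto_schedule_init05/epoch.csv"),
        (0.30, "logs/dropouto_schedule_init30/epoch.csv"),
        (0.90, "logs/dropouto_schedule_init90/epoch.csv")]

series = [(init, *load_schedule(path)) for init, path in runs
          if os.path.exists(path)]

if series:
    import matplotlib
    matplotlib.use("Agg")  # headless rendering
    import matplotlib.pyplot as plt
    for init, epochs, rates in series:
        plt.plot(epochs, rates, label=f"init={init}")
    plt.xlabel("Epoch")
    plt.ylabel("Output dropout rate")
    plt.legend()
    plt.savefig("dropout_schedules.png")
```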
Tuning Multiple LSTM Hyperparameters

Run the following command to tune the input/hidden/output/embedding dropout, weight DropConnect, and the coefficients of activation regularization (alpha) and temporal activation regularization (beta):

python --seed=3 --tune_all --save_dir=st-lstm
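Tuning rate-valued hyperparameters such as dropout by gradient descent requires keeping them in (0, 1); a standard way to do this, sketched below on a toy objective, is to optimize an unconstrained logit and map it through a sigmoid. The names and the toy target are illustrative, not taken from the repo.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    return math.log(p / (1.0 - p))

# Start from an initial rate, as with --dropouto=0.05
theta = logit(0.05)
target = 0.7  # stand-in for whatever rate the validation signal favors

for _ in range(2000):
    rate = sigmoid(theta)
    # Toy objective: squared distance to the target rate;
    # chain rule through the sigmoid keeps rate in (0, 1)
    grad = 2.0 * (rate - target) * rate * (1.0 - rate)
    theta -= 0.5 * grad
```

Because the update acts on the logit, the recovered rate can never leave (0, 1), no matter the step size or gradient signal.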

Project Structure

├── cnn
│   ├── datasets
│   ├── hypermodels
│   ├── models
│   └── util
├── lstm
├── requirements.txt
└── stn_utils

7 directories, 34 files

If you use this code, please cite:

  • Matthew MacKay, Paul Vicol, Jonathan Lorraine, David Duvenaud, and Roger Grosse. Self-Tuning Networks: Bilevel Optimization of Hyperparameters using Structured Best-Response Functions. International Conference on Learning Representations (ICLR), 2019.

@inproceedings{mackay2019selftuning,
  title={Self-Tuning Networks: Bilevel Optimization of Hyperparameters using Structured Best-Response Functions},
  author={Matthew MacKay and Paul Vicol and Jonathan Lorraine and David Duvenaud and Roger Grosse},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2019}
}