SEDONA: Searching for Decoupled Neural Architectures

Official PyTorch implementation of the paper

SEDONA: Search for Decoupled Neural Networks toward Greedy Block-wise Learning
Myeongjang Pyeon, Jihwan Moon, Taeyoung Hahn, and Gunhee Kim
ICLR 2021

In greedy block-wise learning, layers are grouped into gradient-isolated blocks and trained with local error signals. The local error signals of each block are generated by its own auxiliary network.

Given an architecture, SEDONA automatically finds a block configuration for greedy learning. The block configuration consists of two types of decisions: 1) how to group layers into blocks and 2) which auxiliary network to use for each block.

Requirements

Python >= 3.8, PyTorch >= 1.6.0, torchvision >= 0.7.0

Install required packages by:

pip3 install -r requirements.txt

Search

We already upload found decouplings by SEDONA for VGG-19, ResNet-50/101/152. If you want to search them manually, you can use the following scripts.

1. Pretrain

python init_memory.py --name [ARCHITECTURE] --train-iters 40000 --valid-size 0.4

ARCHITECTURE should be one of [vgg19, resnet50, resnet101, resnet152].

2. Search `alpha` and `beta`

python search.py --train-iters 40000 --meta-train-iters 2000 --name [ARCHITECTURE] --num-inner-steps [T] --valid-batch-size 1024 --valid-size 0.4

Here T deontes number of the inner step size for meta learning (default: 5).

For meta-learning, we use a modified version of higher, which is included in this repository.

See config.py for more options and their descriptions.

Evaluation

For evaluation, you can use evaluate.py. In this literature, evaluating includes training a found decoupling. In order to select the number of blocks for greedy learning (i.e. K in the paper), you can modify --target-num-blocks.

See config.py for more options and their descriptions.

CIFAR-10

SEDONA

python evaluate.py --batch-size 128 --train-iters 64000 --weight-decay [WEIGHT_DECAY] --lr [LR] --lr-min [LR_MIN] --dataset cifar10 \
                   --target-num-blocks [K] --eval-base-dir sedona_outputs/[ARCHITECTURE] --name [ARCHITECTURE]

DGL & PredSim

python evaluate.py --batch-size 128 --train-iters 64000 --weight-decay [WEIGHT_DECAY] --lr [LR] --lr-min [LR_MIN] --dataset cifar10 \
                   --target-num-blocks [K] --baseline-dec-type [BASELINE] --name [ARCHITECTURE]-[BASELINE]

BASELINE should be either mlp_sr or predsim.

Backprop

python evaluate.py --batch-size 128 --train-iters 64000 --weight-decay [WEIGHT_DECAY] --lr [LR] --lr-min [LR_MIN] --dataset cifar10 \
                   --target-num-blocks 1 --name [ARCHITECTURE]-backprop

Tiny-ImageNet

SEDONA

python evaluate.py --batch-size 256 --train-iters 30000 --weight-decay [WEIGHT_DECAY] --lr [LR] --lr-min [LR_MIN] --dataset tiny-imagenet \
                   --target-num-blocks [K] --eval-base-dir sedona_outputs/[ARCHITECTURE] --feat-mult 4 --fp16 --num-workers 32 \
                   --name [ARCHITECTURE]

PredSim

python evaluate.py --batch-size 256 --train-iters 30000 --weight-decay [WEIGHT_DECAY] --lr [LR] --lr-min [LR_MIN] --dataset tiny-imagenet \
                   --target-num-blocks [K] --baseline-dec-type predsim --feat-mult 4 --fp16 --num-workers 32 \
                   --optimizer adam --classes-per-batch 40 --classes-per-batch-until-iter 15000 --name [ARCHITECTURE]-predsim

DGL

python evaluate.py --batch-size 256 --train-iters 30000 --weight-decay [WEIGHT_DECAY] --lr [LR] --lr-min [LR_MIN] --dataset tiny-imagenet \
                   --target-num-blocks [K] --baseline-dec-type mlp_sr --feat-mult 4 --fp16 --num-workers 32 --name [ARCHITECTURE]-dgl

Backprop

python evaluate.py --batch-size 256 --train-iters 30000 --weight-decay [WEIGHT_DECAY] --lr [LR] --lr-min [LR_MIN] --dataset tiny-imagenet \
                   --target-num-blocks 1 --feat-mult 4 --fp16 --num-workers 32 --name [ARCHITECTURE]-backprop

ImageNet

SEDONA

python evaluate.py --batch-size 256 --train-iters 600000 --weight-decay [WEIGHT_DECAY] --lr [LR] --lr-min [LR_MIN] --dataset imagenet \
                   --target-num-blocks [K] --eval-base-dir sedona_outputs/[ARCHITECTURE] --feat-mult 4 --fp16 --num-workers 32 \
                   --valid-freq 5000 --name [ARCHITECTURE]

DGL

python evaluate.py --batch-size 256 --train-iters 600000 --weight-decay [WEIGHT_DECAY] --lr [LR] --lr-min [LR_MIN] --dataset imagenet \
                   --target-num-blocks [K] --baseline-dec-type mlp_sr --feat-mult 4 --fp16 --num-workers 32 --valid-freq 5000 \
                   --name [ARCHITECTURE]-dgl

Backprop

python evaluate.py --batch-size 256 --train-iters 600000 --weight-decay [WEIGHT_DECAY] --lr [LR] --lr-min [LR_MIN] --dataset imagenet \
                   --target-num-blocks 1 --feat-mult 4 --fp16 --num-workers 32 --valid-freq 5000 --name [ARCHITECTURE]-backprop

test with aux. ensemble (no training)

python evaluate.py --batch-size 256 --train-iters 600000 --weight-decay [WEIGHT_DECAY] --lr [LR] --lr-min [LR_MIN] --dataset imagenet \
                   --target-num-blocks [K] --eval-base-dir sedona_outputs/[ARCHITECTURE] --feat-mult 4 --fp16 --num-workers 32 \
                   --valid-freq 5000 -name [ARCHITECTURE] --test --num-ensemble 2

python evaluate.py --batch-size 256 --train-iters 600000 --weight-decay [WEIGHT_DECAY] --lr [LR] --lr-min [LR_MIN] --dataset imagenet \
                   --target-num-blocks [K] --feat-mult 4 --fp16 --num-workers 32 --valid-freq 5000 --name [ARCHITECTURE]-[BASELINE] \
                   --test --num-ensemble 2

Citation

@inproceedings{
    pyeon2021sedona,
    title={{\{}SEDONA{\}}: Search for Decoupled Neural Networks toward Greedy Block-wise Learning},
    author={Myeongjang Pyeon and Jihwan Moon and Taeyoung Hahn and Gunhee Kim},
    booktitle={International Conference on Learning Representations},
    year={2021},
}

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
higher		higher
misc		misc
models		models
sedona_outputs		sedona_outputs
.gitignore		.gitignore
README.md		README.md
config.py		config.py
dataset.py		dataset.py
decoder.py		decoder.py
evaluate.py		evaluate.py
init_memory.py		init_memory.py
loss.py		loss.py
module.py		module.py
requirements.txt		requirements.txt
search.py		search.py
sedona.py		sedona.py
train.py		train.py
utils.py		utils.py
weight_memory.py		weight_memory.py

mjpyeon/sedona

Folders and files

Latest commit

History

Repository files navigation

SEDONA: Searching for Decoupled Neural Architectures

Requirements

Search

1. Pretrain

2. Search alpha and beta

Evaluation

CIFAR-10

SEDONA

DGL & PredSim

Backprop

Tiny-ImageNet

SEDONA

PredSim

DGL

Backprop

ImageNet

SEDONA

DGL

Backprop

test with aux. ensemble (no training)

Citation

About

Topics

Resources

Stars

Watchers

Forks

Languages

2. Search `alpha` and `beta`