Skip to content
No description, website, or topics provided.
Branch: master
Clone or download
Anonymous Anonymous
Anonymous Anonymous initial commit
Latest commit 7fe9f43 May 28, 2019
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
experiments initial commit May 28, 2019
learning_schedules initial commit May 28, 2019
LICENSE initial commit May 28, 2019
README.md initial commit May 28, 2019
main.py initial commit May 28, 2019
model.py initial commit May 28, 2019
parameterized_tensors.py initial commit May 28, 2019
reparameterized_layers.py initial commit May 28, 2019
requirements.txt initial commit May 28, 2019

README.md

Dynamic parameter reallocation in deep CNNs

The code implements the experiments in the ICML 2019 submission: Parameter efficient training of deep convolutional neural networks by dynamic sparse reparameterization

Instructions

This code implements the dynamic parameterization scheme in the ICML 2019 submission: Parameter efficient training of deep convolutional neural networks by dynamic sparse reparameterization. It also implements previous dynamic parameterization schemes such as the DeepR algorithm by Bellec at al. 2018 and the SET algorithm by Mocanu et al. 2018 as well as static parameterizations based on tied parameters similar to the HashedNet paper. It also implements iterative pruning where it can take a dense model and prune it down to the required sparsity.

The main python executable is main.py. Results are saved under a ./runs/ directory created at the invocation directory. An invocation of main.py will save various accuracy metrics as well as the model parameters in the file ./runs/{model name}_{job idx}. Accuracy figures as well as several diagnostics are also printed out.

General usage

main.py [-h] [--epochs EPOCHS] [--start-epoch START_EPOCH]
               [--model {mnist_mlp,cifar10_WideResNet,imagenet_resnet50}]
               [-b BATCH_SIZE] [--lr LR] [--momentum MOMENTUM]
               [--nesterov NESTEROV] [--weight-decay WEIGHT_DECAY]
               [--L1-loss-coeff L1_LOSS_COEFF] [--print-freq PRINT_FREQ]
               [--layers LAYERS]
               [--start-pruning-after-epoch START_PRUNING_AFTER_EPOCH]
               [--prune-epoch-frequency PRUNE_EPOCH_FREQUENCY]
               [--prune-target-sparsity-fc PRUNE_TARGET_SPARSITY_FC]
               [--prune-target-sparsity-conv PRUNE_TARGET_SPARSITY_CONV]
               [--prune-iterations PRUNE_ITERATIONS]
               [--post-prune-epochs POST_PRUNE_EPOCHS]
               [--n-prune-params N_PRUNE_PARAMS] [--threshold-prune] [--prune]
               [--validate-set] [--rewire-scaling] [--tied]
               [--rescale-tied-gradient] [--rewire] [--no-validate-train]
               [--DeepR] [--DeepR_eta DEEPR_ETA]
               [--stop-rewire-epoch STOP_REWIRE_EPOCH] [--no-batch-norm]
               [--rewire-fraction REWIRE_FRACTION]
               [--sub-kernel-granularity SUB_KERNEL_GRANULARITY]
               [--cubic-prune-schedule] [--sparse-resnet-downsample]
               [--conv-group-lasso] [--big-new-weights]
               [--widen-factor WIDEN_FACTOR]
               [--initial-sparsity-conv INITIAL_SPARSITY_CONV]
               [--initial-sparsity-fc INITIAL_SPARSITY_FC] [--job-idx JOB_IDX]
               [--no-augment] [--data DIR] [-j N]
               [--copy-mask-from COPY_MASK_FROM] [--resume RESUME]
               [--schedule-file SCHEDULE_FILE] [--name NAME]

Optional arguments:

-h, --help            show this help message and exit
  --epochs EPOCHS       number of total epochs to run
  --start-epoch START_EPOCH
                        manual epoch number (useful on restarts)
  --model {mnist_mlp,cifar10_WideResNet,imagenet_resnet50}
                        network name (default: mnist_mlp)
  -b BATCH_SIZE, --batch-size BATCH_SIZE
                        mini-batch size (default: 100)
  --lr LR, --learning-rate LR
                        initial learning rate
  --momentum MOMENTUM   momentum
  --nesterov NESTEROV   nesterov momentum
  --weight-decay WEIGHT_DECAY, --wd WEIGHT_DECAY
                        weight decay (default: 1e-4)
  --L1-loss-coeff L1_LOSS_COEFF
                        Lasso coefficient (default: 0.0)
  --print-freq PRINT_FREQ, -p PRINT_FREQ
                        print frequency (default: 10)
  --layers LAYERS       total number of layers for wide resnet (default: 28)
  --start-pruning-after-epoch START_PRUNING_AFTER_EPOCH
                        Epoch after which to start pruning (default: 20)
  --prune-epoch-frequency PRUNE_EPOCH_FREQUENCY
                        Intervals between prunes (default: 2)
  --prune-target-sparsity-fc PRUNE_TARGET_SPARSITY_FC
                        Target sparsity when pruning fully connected layers
                        (default: 0.98)
  --prune-target-sparsity-conv PRUNE_TARGET_SPARSITY_CONV
                        Target sparsity when pruning conv layers (default:
                        0.5)
  --prune-iterations PRUNE_ITERATIONS
                        Number of prunes. Set to 1 for single prune, larger
                        than 1 for gradual pruning (default: 1)
  --post-prune-epochs POST_PRUNE_EPOCHS
                        Epochs to train after pruning is done (default: 10)
  --n-prune-params N_PRUNE_PARAMS
                        Number of parameters to re-allocate per re-allocation
                        iteration (default: 600)
  --threshold-prune     Prune based on a global threshold and not a fraction
                        (default: False)
  --prune               whether to use pruning or not (default: False)
  --validate-set        whether to use a validation set to select epoch with
                        best accuracy or not (default: False)
  --rewire-scaling      Move weights between layers during parameter re-
                        allocation (default: False)
  --tied                whether to use tied weights instead of sparse ones
                        (default: False)
  --rescale-tied-gradient
                        whether to divide the gradient of tied weights by the
                        number of their repetitions (default: False)
  --rewire              whether to run parameter re-allocation (default:
                        False)
  --no-validate-train   whether to run validation on training set (default:
                        False)
  --DeepR               Train using deepR. prune and re-allocated weights that
                        cross zero every iteration (default: False)
  --DeepR_eta DEEPR_ETA
                        eta coefficient for DeepR (default: 0.1)
  --stop-rewire-epoch STOP_REWIRE_EPOCH
                        Epoch after which to stop rewiring (default: 1000)
  --no-batch-norm       no batch normalization in the mnist_mlp
                        network(default: False)
  --rewire-fraction REWIRE_FRACTION
                        Fraction of weight to rewire (default: 0.1)
  --sub-kernel-granularity SUB_KERNEL_GRANULARITY
                        prune granularity (default: 2)
  --cubic-prune-schedule
                        Use sparsity schedule following a cubic function as in
                        Zhu et al. 2018 (instead of an exponential function).
                        (default: False)
  --sparse-resnet-downsample
                        Use sub-kernel granularity while rewiring(default:
                        False)
  --conv-group-lasso    Use group lasso to penalize an entire kernel
                        patch(default: False)
  --big-new-weights     Use weights initialized from the initial distribution
                        for the new connections instead of zeros(default:
                        False)
  --widen-factor WIDEN_FACTOR
                        widen factor for wide resnet (default: 10)
  --initial-sparsity-conv INITIAL_SPARSITY_CONV
                        Initial sparsity of conv layers(default: 0.5)
  --initial-sparsity-fc INITIAL_SPARSITY_FC
                        Initial sparsity for fully connected layers(default:
                        0.98)
  --job-idx JOB_IDX     job index provided by the job manager
  --no-augment          whether to use standard data augmentation (default:
                        use data augmentation)
  --data DIR            path to imagenet dataset
  -j N, --workers N     number of data loading workers (default: 8)
  --copy-mask-from COPY_MASK_FROM
                        checkpoint from which to copy mask data(default: none)
  --resume RESUME       path to latest checkpoint (default: none)
  --schedule-file SCHEDULE_FILE
                        yaml file containing learning rate schedule and rewire
                        period schedule
  --name NAME           name of experiment

Specific experiments

The two yaml files : wrnet_experiments.yaml and resnet_experiments.yaml contain YAML lists of all the invocations of the python executable needed to run the imagenet and the CIFAR10 experiments in the paper's main text and supplementary materials.

Important notes

  • Code development and all experiments were done with Python 3.6 and pytorch 0.4.1.
  • All experiments were conducted on NVidia TitanXP GPUs.
  • Imagenet experiments require multi-GPU data parallelism, which is done by default using all available GPUs specified by environment variable CUDA_VISIBLE_DEVICES.
You can’t perform that action at this time.