CerebrasResearch/RevBiFPN

RevBiFPN

RevBiFPN: The Fully Reversible Bidirectional Feature Pyramid Network

Introduction

This is the official code of RevBiFPN: The Fully Reversible Bidirectional Feature Pyramid Network implemented in PyTorch. RevSilo, the first reversible bidirectional multi-scale feature fusion module (implemented in ./rev_structs), is used to create the RevBiFPN backbone. We augment the RevBiFPN backbone with a classification head to pre-train RevBiFPN-S0 through RevBiFPN-S6 on ImageNet.

RevBiFPN with classification head.

Network Motivation

A neural network uses hidden activations to compute the gradient of the loss with respect to the weights. During training, autograd frameworks cache the hidden activations from the forward pass for use during backpropagation. This activation cache consumes the majority of the accelerator's memory, limiting network scaling.

Networks that use reversible recomputation can recompute their hidden activations instead of storing them. This work is the first to create a fully reversible bidirectional multi-scale feature fusion pyramid network that serves as a drop-in replacement for FPN backbones such as EfficientDet and HRNet. The figure below shows that, for classification, RevBiFPN uses significantly less memory than EfficientNet at all scales. For example, RevBiFPN-S6 achieves accuracy comparable to EfficientNet-B7 on ImageNet (84.2% vs 84.3%) with comparable MACs (38B vs 37B) while using 19.8x less training memory per sample.

MACs vs measured memory usage for ImageNet training on 1 GPU.
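The idea of recomputing activations rather than storing them can be illustrated with a minimal sketch. It uses a generic additive coupling on plain Python lists, which is an assumption chosen for illustration and is not the RevSilo module from ./rev_structs. The functions `f` and `g` are hypothetical stand-ins for arbitrary (even non-invertible) sub-networks.

```python
# Illustrative additive coupling; NOT the RevSilo implementation.
# f and g are hypothetical stand-ins for arbitrary sub-networks.

def f(x):
    # A non-invertible function (affine map followed by ReLU).
    return [max(0.0, 0.5 * v + 1.0) for v in x]

def g(x):
    # Another non-invertible function (element-wise square).
    return [v * v for v in x]

def add(a, b):
    return [p + q for p, q in zip(a, b)]

def sub(a, b):
    return [p - q for p, q in zip(a, b)]

def forward(x1, x2):
    # Each stream is updated using only the other stream.
    y1 = add(x1, f(x2))
    y2 = add(x2, g(y1))
    return y1, y2

def inverse(y1, y2):
    # Undo the updates in reverse order; f and g are only re-evaluated,
    # never inverted, so the inputs need not be cached.
    x2 = sub(y2, g(y1))
    x1 = sub(y1, f(x2))
    return x1, x2

x1, x2 = [1.0, -2.0, 3.0], [0.5, 4.0, -1.0]
y1, y2 = forward(x1, x2)
r1, r2 = inverse(y1, y2)
print(r1 == x1 and r2 == x2)
```

Because the inputs can be rebuilt from the outputs, a training framework built this way only needs to keep the final activations and can regenerate earlier ones during the backward pass.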

Systems using the RevBiFPN backbone consume considerably less memory for detection and segmentation with Faster R-CNN and Mask R-CNN, respectively.

Object detection in the Faster R-CNN framework.

Instance segmentation in the Mask R-CNN framework.

ImageNet models

| Model | #Params | Res | GMACs | Top-1 acc | Model ckpt SHA-256 |
| --- | --- | --- | --- | --- | --- |
| RevBiFPN-S0 | 3.42M | 224 | 0.31 | 72.8% | a9ee012a2670003ea18deca1afaed7c1323ffaafc83b0a30874d262bf2403cfa |
| RevBiFPN-S1 | 5.11M | 256 | 0.62 | 75.9% | 584b0c3ea677ac5eff6c0f54b4b683973e7533bfde334155cd770aef041673c4 |
| RevBiFPN-S2 | 10.6M | 256 | 1.37 | 79.0% | 62ff9387b498550d31e248742a002be22cb29e800cd387ec9c93b6da7418dcc8 |
| RevBiFPN-S3 | 19.6M | 288 | 3.33 | 81.1% | 1695576b09ee9fc584df616abaf0762188122468825cc92b5abfeec63b609d25 |
| RevBiFPN-S4 | 48.7M | 320 | 10.6 | 83.0% | 61d7b65524000bb147aac26ad559906a2380e55a499d9d420063cc2b9e2ef42a |
| RevBiFPN-S5 | 82.0M | 352 | 21.8 | 83.7% | d7713a25c7f62bf4b9ebaa2693a7b6896e7965c6647cbd1d98eddeae8b74cdc3 |
| RevBiFPN-S6 | 142.3M | 352 | 38.1 | 84.2% | 31f355d5fb54610ad9051d08d0ec10bb0d33f40c0936352cb863b9f9a3d4fa09 |

Pretrained model loading assumes a /tmp/model_ckpts/revbifpn/revbifpn_s#.pth.tar file structure. Modify model_dir = "/tmp/model_ckpts/revbifpn" in revbifpn.py if necessary. When instantiating a model, setting pretrained = True will download the associated checkpoint into the aforementioned directory.
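A downloaded checkpoint can be checked against the digests listed in the table above. The following is a minimal sketch using only the Python standard library. It assumes that the listed SHA-256 values are digests of the checkpoint files themselves and that `#` in the file name stands for the scale index (so `revbifpn_s0.pth.tar` for RevBiFPN-S0). The helper names `sha256_of_file` and `checkpoint_matches` are hypothetical and not part of this repository.

```python
import hashlib
from pathlib import Path

def sha256_of_file(path, chunk_size=1 << 20):
    # Hash the file in chunks so large checkpoints are not read into memory at once.
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def checkpoint_matches(path, expected_hex):
    # Compare case-insensitively, since hex digests may be written in either case.
    return sha256_of_file(path) == expected_hex.strip().lower()

# Digest listed for RevBiFPN-S0 in the table above.
EXPECTED_S0 = "a9ee012a2670003ea18deca1afaed7c1323ffaafc83b0a30874d262bf2403cfa"

# Assumed file name: '#' in revbifpn_s#.pth.tar replaced by the scale index.
ckpt = Path("/tmp/model_ckpts/revbifpn/revbifpn_s0.pth.tar")
if ckpt.exists():
    print("checksum OK" if checkpoint_matches(ckpt, EXPECTED_S0) else "checksum mismatch")
```

A mismatch usually indicates an interrupted or corrupted download, in which case deleting the file and fetching it again is a reasonable first step.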

Training

Classification

For classification, we train RevBiFPN using pytorch-image-models' train.py. Hyperparameters can be found in the RevBiFPN paper.

Note: running python revbifpn.py will instantiate RevBiFPN-S0 through RevBiFPN-S6 and report each network's MAC and parameter counts (uses thop).

Detection and Segmentation

For detection and segmentation, we use MMDetection. HRNet configs are used to fine-tune networks with RevBiFPN backbones.

Citation

To cite this work use:

@article{chiley2022revbifpn,
  title={RevBiFPN: The Fully Reversible Bidirectional Feature Pyramid Network},
  author={Chiley, Vitaliy and Thangarasa, Vithursan and Gupta, Abhay and Samar, Anshul and Hestness, Joel and DeCoste, Dennis},
  journal={arXiv preprint arXiv:2206.14098},
  year={2022}
}
