All about acceleration and compression of Deep Neural Networks

Quantization

  • XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks

    A classic paper on binary neural networks, in which all weights and activations are binarized.

    Implementation: MXNet, Pytorch, Torch (origin)
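The core idea can be sketched in a few lines (an illustrative NumPy sketch, not the authors' code): each weight tensor W is approximated as alpha * sign(W), where alpha is the mean absolute value, so the expensive multiplications reduce to XNOR and bit-count operations on binary values.

```python
import numpy as np

def binarize(w):
    """Approximate w as alpha * sign(w), where alpha = mean(|w|) minimizes
    the L2 error of the approximation. XNOR-Net computes one alpha per
    filter; a single per-tensor alpha is shown here for brevity."""
    alpha = np.abs(w).mean()
    return alpha * np.sign(w), alpha

w = np.array([0.5, -1.5, 2.0, -1.0])
wb, alpha = binarize(w)  # alpha = 1.25, wb = [1.25, -1.25, 1.25, -1.25]
```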

  • DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients

    Full-stack quantization of weights, activations, and gradients.

    Implementation: Tensorpack
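A rough NumPy sketch of the paper's k-bit quantizer (illustrative only; in training it is paired with a straight-through estimator so gradients pass through the rounding unchanged):

```python
import numpy as np

def quantize_k(x, k):
    """Quantize x in [0, 1] to k bits: round to the nearest of 2^k - 1
    uniformly spaced levels. During training the round() is backed by a
    straight-through estimator (identity gradient)."""
    n = 2 ** k - 1
    return np.round(x * n) / n

def quantize_weights(w, k):
    """DoReFa-style k-bit weight quantization: squash weights into [0, 1]
    via tanh, quantize, then map back to [-1, 1]."""
    t = np.tanh(w)
    x = t / (2 * np.abs(t).max()) + 0.5   # map to [0, 1]
    return 2 * quantize_k(x, k) - 1       # map back to [-1, 1]

x = np.array([0.0, 0.3, 0.7, 1.0])
print(quantize_k(x, 2))  # 2 bits -> values in {0, 1/3, 2/3, 1}
```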

  • Deep Learning with Low Precision by Half-wave Gaussian Quantization

    Tries to improve the expressiveness of the quantized activation function.

    Implementation: Caffe (origin)

  • Quantizing deep convolutional networks for efficient inference: A whitepaper

    An unofficial technical report on quantization from Google. It contains many technical details about quantization.
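The uniform affine (asymmetric) scheme the whitepaper centers on can be sketched as follows (illustrative NumPy with per-tensor scaling; the function names are my own): real values are mapped to 8-bit integers via a scale and a zero-point chosen so that 0.0 is exactly representable.

```python
import numpy as np

def affine_quantize(x, num_bits=8):
    """Uniform affine quantization: q = clamp(round(x / scale) + zero_point).
    The real range is extended to include 0 so that zero is exact, which
    matters for zero-padding in convolutions."""
    qmin, qmax = 0, 2 ** num_bits - 1
    x_min, x_max = min(x.min(), 0.0), max(x.max(), 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    zero_point = int(round(qmin - x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return scale * (q.astype(np.float32) - zero_point)

x = np.array([-1.0, 0.0, 0.5, 1.0], dtype=np.float32)
q, s, z = affine_quantize(x)
x_hat = dequantize(q, s, z)  # close to x; error at most about scale/2
```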

  • Data-Free Quantization through Weight Equalization and Bias Correction

    Implementation: Pytorch

  • Additive Noise Annealing and Approximation Properties of Quantized Neural Networks

    Implementation: Pytorch

  • Towards Learning of Filter-Level Heterogeneous Compression of Convolutional Neural Networks

    Finds optimal bit-widths with NAS.

    Implementation: Pytorch

  • Progressive Stochastic Binarization of Deep Networks

    Uses power-of-2 representations.

    Implementation: TF
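A minimal sketch of the power-of-2 idea (illustrative only, not the paper's exact progressive scheme): magnitudes are rounded in log2 space, so multiplication by a quantized value reduces to a bit-shift on integer hardware.

```python
import numpy as np

def round_to_power_of_2(x, eps=1e-12):
    """Round each magnitude to the nearest power of two (nearest in log2
    space), keeping the sign. eps avoids log2(0) on exact zeros."""
    sign = np.sign(x)
    exp = np.round(np.log2(np.abs(x) + eps))
    return sign * 2.0 ** exp

x = np.array([0.3, -0.7, 1.2, 5.0])
print(round_to_power_of_2(x))  # [0.25, -0.5, 1.0, 4.0]
```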

  • Trained Quantization Thresholds for Accurate and Efficient Fixed-Point Inference of Deep Neural Networks

    Shows how to find the optimal quantization thresholds by training them.

    Implementation: TF

  • FAT: Fast Adjustable Threshold for Uniform Neural Network Quantization (Winning Solution on LPIRC-II)

    Implementation: TF

  • Proximal Mean-field for Neural Network Quantization

    Implementation: Pytorch

  • A Survey on Methods and Theories of Quantized Neural Networks

    A nice survey of quantization methods (up to Dec. 2018).

  • Balanced Binary Neural Networks with Gated Residual
  • IR-Net: Forward and Backward Information Retention for Highly Accurate Binary Neural Networks


  • Differentiable Product Quantization for Embedding Compression

    Compresses the embedding table with end-to-end learned KD codes via differentiable product quantization (DPQ).

    Implementation: TF

  • Model Compression with Adversarial Robustness: A Unified Optimization Framework

    This paper studies model compression through a different lens: could we compress models without hurting their robustness to adversarial attacks, in addition to maintaining accuracy?

    Implementation: Pytorch


Pruning

  • Learning both Weights and Connections for Efficient Neural Networks

    A very simple way to introduce arbitrary sparsity.
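The prune step amounts to simple magnitude thresholding (illustrative NumPy sketch; the paper alternates pruning with retraining to recover accuracy):

```python
import numpy as np

def magnitude_prune(w, sparsity):
    """Zero out the smallest-magnitude fraction of weights and return the
    pruned tensor plus the binary mask of surviving connections."""
    k = int(sparsity * w.size)
    if k == 0:
        return w.copy(), np.ones_like(w, dtype=bool)
    threshold = np.sort(np.abs(w), axis=None)[k - 1]
    mask = np.abs(w) > threshold
    return w * mask, mask

w = np.array([[0.1, -2.0], [0.05, 1.5]])
pruned, mask = magnitude_prune(w, 0.5)
# the smallest half (0.05 and 0.1) is removed -> [[0, -2.0], [0, 1.5]]
```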

  • Learning Structured Sparsity in Deep Neural Networks

    A unified way to introduce structured sparsity.

    Implementation: Caffe
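The structured-sparsity regularizer is essentially a group Lasso over structures such as filters; a minimal sketch (illustrative, assuming one group per output filter):

```python
import numpy as np

def group_lasso(w):
    """Group-Lasso penalty sum_g ||w_g||_2 over the filters of a conv
    weight tensor w of shape (out_channels, in_channels, kh, kw). Added
    to the training loss, it drives whole filters to zero, i.e. structured
    sparsity that maps directly to a smaller dense layer."""
    groups = w.reshape(w.shape[0], -1)   # one group per output filter
    return np.sqrt((groups ** 2).sum(axis=1)).sum()

w = np.zeros((2, 1, 2, 2))
w[0] = 3.0            # ||group 0||_2 = sqrt(4 * 9) = 6, group 1 is zero
print(group_lasso(w)) # 6.0
```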

Neural Architecture Search (NAS)

  • Resource
  • Partial Channel Connections for Memory-Efficient Differentiable Architecture Search

    A memory-efficient differentiable architecture search method: (i) the batch size can be increased to further accelerate the search on CIFAR-10, and (ii) it can search directly on ImageNet, achieving among the best reported results (24.2%/7.3% top-1/top-5 error) under the mobile setting. The search on CIFAR-10 requires only 0.1 GPU-days, i.e., ~3 hours on one NVIDIA GTX 1080 Ti (1.5 hours on one Tesla V100).

    Implementation: PyTorch (origin)


  • Benchmark Analysis of Representative Deep Neural Network Architectures [IEEE Access, University of Milano-Bicocca]

    This work presents an in-depth analysis of most state-of-the-art deep neural network (DNN) architectures for image recognition in terms of GFLOPs, #weights, top-1 accuracy, and so on.

  • Net2Net : Accelerating Learning via Knowledge Transfer

    An interesting way to change the architecture of a model while keeping its output the same.

    Implementation: TF, Pytorch
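The Net2WiderNet operation can be sketched for a pair of dense layers (illustrative NumPy; the paper also handles convolutions and adds small noise to break symmetry): copied hidden units have their outgoing weights divided by the replication count, so the widened network computes exactly the same function.

```python
import numpy as np

def net2wider(w1, w2, new_width):
    """Widen the hidden layer of y = w2 @ relu(w1 @ x) in a
    function-preserving way: copy randomly chosen existing units and
    rescale their outgoing weights by the replication count."""
    old_width = w1.shape[0]
    idx = np.concatenate([np.arange(old_width),
                          np.random.randint(0, old_width, new_width - old_width)])
    counts = np.bincount(idx, minlength=old_width)
    w1_new = w1[idx]                   # duplicate rows of the first layer
    w2_new = w2[:, idx] / counts[idx]  # rescale outgoing weights
    return w1_new, w2_new

rng = np.random.default_rng(0)
w1, w2, x = rng.normal(size=(3, 4)), rng.normal(size=(2, 3)), rng.normal(size=4)
w1n, w2n = net2wider(w1, w2, 5)
y_old = w2 @ np.maximum(w1 @ x, 0)
y_new = w2n @ np.maximum(w1n @ x, 0)   # identical output after widening
```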

Embedded System

  • EMDL: Embedded and mobile deep learning research notes

    Embedded and mobile deep learning research notes on GitHub.



Tools

  • slimmable_networks

    An open source framework for slimmable training on tasks of ImageNet classification and COCO detection, which has enabled numerous projects.

  • distiller

    A Python package for neural network compression research.

  • QPyTorch

    QPyTorch is a low-precision arithmetic simulation package in PyTorch. It is designed to support research on low-precision machine learning, especially low-precision training.

  • Graffitist

    Graffitist is a flexible and scalable framework built on top of TensorFlow to process low-level graph descriptions of deep neural networks (DNNs) for accurate and efficient inference on fixed-point hardware. It comprises a (growing) library of transforms that apply various neural network compression techniques such as quantization, pruning, and compression. Each transform consists of unique pattern-matching and manipulation algorithms that, when run sequentially, produce an optimized output graph.


  • dabnn

    dabnn is an accelerated binary neural network inference framework for mobile platforms.
