All about acceleration and compression of Deep Neural Networks
XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
A classic paper on binary neural networks, in which both weights and activations are binarized.
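A minimal sketch of the core idea (an illustration, not the authors' code): each weight tensor is approximated as alpha * sign(W), where the scaling factor alpha is the mean absolute weight.

```python
import numpy as np

def binarize(w):
    """XNOR-Net-style weight binarization: W ~= alpha * sign(W),
    with alpha = mean(|W|) (the optimal scale for this approximation)."""
    alpha = np.abs(w).mean()
    b = np.where(w >= 0, 1.0, -1.0)  # sign(W), mapping 0 to +1
    return alpha * b

w = np.array([0.5, -1.2, 0.3, -0.1])
wb = binarize(w)  # every entry is +/- 0.525
```

In the full method the scale is computed per output filter and convolutions reduce to XNOR + popcount on the binarized operands.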
DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients
Full-stack quantization of weights, activations, and gradients.
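A rough illustration of the DoReFa-Net scheme (a sketch from the paper's formulas, not its released code): the k-bit quantizer snaps a value in [0, 1] onto a uniform grid of 2^k levels, and weights are first squashed into that range via tanh.

```python
import numpy as np

def quantize_k(x, k):
    """DoReFa-Net k-bit uniform quantizer for x in [0, 1]."""
    n = 2 ** k - 1
    return np.round(x * n) / n

def quantize_weights(w, k):
    """Squash weights into [0, 1] via tanh, quantize, then map back to [-1, 1]."""
    t = np.tanh(w)
    x = t / (2 * np.abs(t).max()) + 0.5
    return 2 * quantize_k(x, k) - 1
```

For k = 1 this degenerates to binary weights in {-1, +1}; in training, the round is treated as identity in the backward pass (straight-through estimator).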
Deep Learning with Low Precision by Half-wave Gaussian Quantization
Attempts to improve the expressiveness of quantized activation functions.
Implementation: Caffe (origin)
Quantizing deep convolutional networks for efficient inference: A whitepaper
An unofficial technical report on quantization from Google. You can find a lot of practical details about quantization in this paper.
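The asymmetric uniform ("affine") scheme described in the whitepaper, x ~= scale * (q - zero_point), can be sketched as a simplified per-tensor version (function names are mine):

```python
import numpy as np

def affine_quantize(x, num_bits=8):
    """Per-tensor asymmetric uniform quantization.
    The real-valued range is forced to include 0 so that zero is exactly
    representable (important for zero-padding in convolutions)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    xmin, xmax = min(x.min(), 0.0), max(x.max(), 0.0)
    scale = (xmax - xmin) / (qmax - qmin)
    zero_point = int(round(qmin - xmin / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax)
    return q.astype(np.uint8), scale, zero_point

def dequantize(q, scale, zero_point):
    return scale * (q.astype(np.float32) - zero_point)
```

For example, quantizing `[-1.0, 0.0, 2.0]` to 8 bits gives scale 3/255 and zero point 85, and dequantization recovers the original values exactly in this case. (A degenerate all-zero tensor would need a guard against scale = 0, omitted here for brevity.)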
Data-Free Quantization through Weight Equalization and Bias Correction
Additive Noise Annealing and Approximation Properties of Quantized Neural Networks
Towards Learning of Filter-Level Heterogeneous Compression of Convolutional Neural Networks
Finds the optimal bit-width with NAS.
Progressive Stochastic Binarization of Deep Networks
Trained Quantization Thresholds for Accurate and Efficient Fixed-Point Inference of Deep Neural Networks
How to find the optimal quantization threshold.
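For intuition, a naive baseline is to grid-search the clipping threshold that minimizes quantization MSE; this sketch illustrates the problem being solved, not the paper's trained-threshold method:

```python
import numpy as np

def quantize_with_threshold(x, t, num_bits=8):
    """Symmetric uniform quantization with clipping threshold t."""
    n = 2 ** (num_bits - 1) - 1
    scale = t / n
    return np.clip(np.round(x / scale), -n, n) * scale

def search_threshold(x, num_bits=8, steps=100):
    """Grid-search the threshold minimizing quantization MSE.
    A smaller t clips outliers but shrinks the step size; the optimum
    trades clipping error against rounding error."""
    best_t, best_err = None, float("inf")
    max_abs = np.abs(x).max()
    for t in np.linspace(max_abs / steps, max_abs, steps):
        err = np.mean((x - quantize_with_threshold(x, t, num_bits)) ** 2)
        if err < best_err:
            best_t, best_err = t, err
    return best_t
```

The paper instead makes the thresholds trainable parameters and learns them by backpropagation.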
FAT: Fast Adjustable Threshold for Uniform Neural Network Quantization (Winning Solution on LPIRC-II)
Proximal Mean-field for Neural Network Quantization
A Survey on Methods and Theories of Quantized Neural Networks
Nice survey on quantization (up to Dec. 2018)
- Balanced Binary Neural Networks with Gated Residual
- IR-Net: Forward and Backward Information Retention for Highly Accurate Binary Neural Networks
- Differentiable Product Quantization for Embedding Compression
compress the embedding table with end-to-end learned KD codes via differentiable product quantization (DPQ)
- Model Compression with Adversarial Robustness: A Unified Optimization Framework
This paper studies model compression through a different lens: could we compress models without hurting their robustness to adversarial attacks, in addition to maintaining accuracy?
Learning both Weights and Connections for Efficient Neural Networks
A very simple way to introduce arbitrary sparsity.
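The basic recipe, magnitude pruning, fits in a few lines (an illustration, not the paper's code): zero out the smallest-magnitude weights and keep a mask so that pruned connections stay zero during fine-tuning.

```python
import numpy as np

def magnitude_prune(w, sparsity):
    """Zero out the fraction `sparsity` of smallest-magnitude weights.
    Returns the pruned weights and a boolean mask of surviving connections."""
    k = int(sparsity * w.size)
    if k == 0:
        return w.copy(), np.ones_like(w, dtype=bool)
    threshold = np.sort(np.abs(w).ravel())[k - 1]
    mask = np.abs(w) > threshold  # ties at the threshold are also pruned
    return w * mask, mask
```

In the paper this prune step alternates with retraining, which recovers most of the lost accuracy.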
Learning Structured Sparsity in Deep Neural Networks
A unified way to introduce structured sparsity.
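Structured sparsity is induced with a group-lasso penalty over structured groups such as whole filters; a sketch (my own illustration) of such a regularizer:

```python
import numpy as np

def group_lasso(w):
    """Group-lasso penalty over output channels: sum of per-filter L2 norms.
    Because the L2 norm is non-differentiable at zero, the penalty drives
    entire filters exactly to zero, so whole channels can be removed.
    w shape: (out_channels, in_channels, kh, kw)."""
    return np.sqrt((w ** 2).reshape(w.shape[0], -1).sum(axis=1)).sum()
```

Unlike element-wise sparsity, the resulting zero structures (filters, channels, layers) map directly to speedups on dense hardware.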
Neural Architecture Search (NAS)
- Partial Channel Connections for Memory-Efficient Differentiable Architecture Search
Our approach is memory-efficient: (i) the batch size can be increased to further accelerate the search on CIFAR-10, and (ii) it can search directly on ImageNet. Searched on ImageNet, it achieves one of the best results (24.2%/7.3%) under the mobile setting. The search on CIFAR-10 requires only 0.1 GPU-days, i.e., ~3 hours on one Nvidia 1080 Ti (1.5 hours on one Tesla V100).
Implementation: PyTorch (origin)
Benchmark Analysis of Representative Deep Neural Network Architectures [IEEE Access, University of Milano-Bicocca]
This work presents an in-depth analysis of most state-of-the-art deep neural networks (DNNs) for image recognition, in terms of GFLOPs, number of weights, top-1 accuracy, and so on.
Net2Net : Accelerating Learning via Knowledge Transfer
An interesting way to change a model's architecture while keeping its output unchanged.
- EMDL: Embedded and mobile deep learning research notes
Embedded and mobile deep learning research notes on GitHub.
An open-source framework for slimmable training on ImageNet classification and COCO detection, which has enabled numerous projects.
A Python package for neural network compression research.
QPyTorch is a low-precision arithmetic simulation package in PyTorch. It is designed to support research on low-precision machine learning, especially low-precision training.
Graffitist is a flexible and scalable framework built on top of TensorFlow to process low-level graph descriptions of deep neural networks (DNNs) for accurate and efficient inference on fixed-point hardware. It comprises a (growing) library of transforms implementing neural network compression techniques such as quantization, pruning, and compression. Each transform consists of unique pattern-matching and manipulation algorithms that, when run sequentially, produce an optimized output graph.
dabnn is an accelerated binary neural network inference framework for mobile platforms.