This hardware is specialized to accelerate inference on already trained deep learning models.
- Training deep neural networks with low precision multiplications - One of the first papers to systematically study the effect of numerical precision on DNN inference.
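To give a flavor of the low-precision idea studied in the paper above, here is a minimal sketch of rounding weights to a fixed-point grid. This is an illustrative example, not the paper's exact scheme; the function name and bit-width are assumptions.

```python
import numpy as np

def quantize_fixed_point(w, frac_bits=8):
    """Round values to a fixed-point grid with `frac_bits` fractional bits.

    Illustrative sketch only: real low-precision schemes also constrain
    the integer range and the precision of accumulators.
    """
    scale = 2.0 ** frac_bits
    return np.round(w * scale) / scale

w = np.array([0.1234567, -0.9876543, 0.5])
wq = quantize_fixed_point(w, frac_bits=8)
```

Rounding to the nearest grid point bounds the per-weight error by half a quantization step, which is why moderate bit-widths often cost little accuracy at inference time.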
- Efficient Processing of Deep Neural Networks: A Tutorial and Survey - The best survey for understanding the various aspects of deep learning inference, with special emphasis on hardware design choices. Highly recommended for anyone with minimal knowledge of deep learning who is looking for references to explore the topic more deeply and exhaustively.
- Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding - Arguably the best paper introducing a systematic pipeline of pruning unnecessary weights, quantization, and weight clustering to reduce the amount of computation in CNNs.
- EIE: Efficient Inference Engine on Compressed Deep Neural Network - A hardware implementation for CNN acceleration based on the ideas from the Deep Compression paper above.
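The pruning and weight-sharing steps of the Deep Compression pipeline can be sketched roughly as follows. This is a simplified illustration (magnitude pruning plus 1-D k-means over the surviving weights); the function names are assumptions, and the paper's full pipeline also retrains after each step and Huffman-codes the result.

```python
import numpy as np

def prune_by_magnitude(w, sparsity=0.5):
    """Zero out the smallest-magnitude weights (simplified sketch)."""
    k = int(w.size * sparsity)
    threshold = np.sort(np.abs(w).ravel())[k]
    mask = np.abs(w) >= threshold
    return w * mask, mask

def cluster_weights(w, n_clusters=4, iters=20):
    """Share the surviving weights via 1-D k-means (simplified sketch)."""
    nz = w[w != 0]
    centroids = np.linspace(nz.min(), nz.max(), n_clusters)
    for _ in range(iters):
        # Assign each nonzero weight to its nearest centroid, then update.
        assign = np.argmin(np.abs(nz[:, None] - centroids[None, :]), axis=1)
        for c in range(n_clusters):
            if np.any(assign == c):
                centroids[c] = nz[assign == c].mean()
    shared = w.copy()
    shared[w != 0] = centroids[assign]
    return shared, centroids

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))
pruned, mask = prune_by_magnitude(w, sparsity=0.5)
shared, centroids = cluster_weights(pruned, n_clusters=4)
```

After clustering, each surviving weight only needs a small index into the centroid table, which is where most of the storage savings comes from.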
- ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA - Arguably the best paper describing the effects of pruning and quantization on an LSTM model for speech recognition. It also presents an FPGA implementation that exploits the sparsity obtained by removing redundant weights.
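The core operation that sparse accelerators like EIE and ESE speed up is a matrix-vector product that touches only the stored nonzeros. A minimal CSR sketch of that idea (not the papers' actual storage format, which also compresses indices):

```python
import numpy as np

def to_csr(dense):
    """Convert a dense matrix to CSR arrays (values, column indices, row pointers)."""
    vals, cols, ptrs = [], [], [0]
    for row in dense:
        nz = np.nonzero(row)[0]
        vals.extend(row[nz])
        cols.extend(nz)
        ptrs.append(len(vals))
    return (np.array(vals, dtype=float),
            np.array(cols, dtype=int),
            np.array(ptrs, dtype=int))

def csr_matvec(vals, cols, ptrs, x):
    """Compute y = A @ x visiting only the stored nonzeros of A."""
    y = np.zeros(len(ptrs) - 1)
    for i in range(len(y)):
        s, e = ptrs[i], ptrs[i + 1]
        y[i] = vals[s:e] @ x[cols[s:e]]
    return y

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 6))
A[np.abs(A) < 0.8] = 0.0  # emulate a pruned weight matrix
x = rng.normal(size=6)
vals, cols, ptrs = to_csr(A)
y = csr_matvec(vals, cols, ptrs, x)
```

With a 90%-sparse LSTM weight matrix, this does roughly a tenth of the multiply-accumulates of the dense product, which is the arithmetic saving the hardware is built around.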
- Recurrent Neural Networks Hardware Implementation on FPGA - An FPGA implementation of RNNs trained for a character-level language model.
- Stanford CS231n - Lecture 15 | Efficient Methods and Hardware for Deep Learning - In this invited talk, Song Han describes ideas such as pruning, quantization, and clustering used in the creation of efficient hardware for deep learning models.
- Toward Efficient Deep Neural Network Deployment: Deep Compression and EIE - Another interesting talk by Song Han.
This hardware is used to accelerate the training phase of deep learning models.