NLP Compression

File metadata and controls

41 lines (29 loc) · 1.47 KB

RNN

  • HitNet: Hybrid Ternary Recurrent Neural Network
  • Learning Compact Recurrent Neural Networks With Block-Term Tensor Decomposition
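
A minimal sketch of ternary weight quantization for a recurrent cell, in the spirit of HitNet above. This is the generic ternary-weight heuristic (threshold plus per-tensor scale), not HitNet's exact hybrid scheme; the 0.7 factor and the GRU sizes are illustrative.

```python
import torch

def ternarize(weight: torch.Tensor) -> torch.Tensor:
    """Map a weight tensor to {-alpha, 0, +alpha} (generic ternary heuristic)."""
    delta = 0.7 * weight.abs().mean()                  # heuristic zeroing threshold
    mask = (weight.abs() > delta).float()              # entries kept as +/-1
    alpha = (weight.abs() * mask).sum() / mask.sum().clamp(min=1.0)
    return alpha * torch.sign(weight) * mask

# Example: ternarize the input-to-hidden weights of a small GRU cell
cell = torch.nn.GRUCell(input_size=16, hidden_size=32)
with torch.no_grad():
    cell.weight_ih.copy_(ternarize(cell.weight_ih))
```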

BERT

  • ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
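
One of ALBERT's parameter savings comes from factorizing the embedding matrix: a small embedding dimension E is projected up to the hidden size H, so the embedding cost drops from V*H to V*E + E*H. The sketch below shows only that idea; the sizes are illustrative and cross-layer parameter sharing is omitted.

```python
import torch
import torch.nn as nn

class FactorizedEmbedding(nn.Module):
    """ALBERT-style factorized embedding: V x E lookup followed by E x H projection."""
    def __init__(self, vocab_size=30000, embed_dim=128, hidden_dim=768):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)   # V x E lookup table
        self.project = nn.Linear(embed_dim, hidden_dim)    # E x H up-projection

    def forward(self, token_ids):
        return self.project(self.embed(token_ids))

emb = FactorizedEmbedding()
out = emb(torch.randint(0, 30000, (2, 16)))                # shape (2, 16, 768)
```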

Quantization

  • Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
  • Q8BERT: Quantized 8Bit BERT
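
A minimal sketch of symmetric per-tensor int8 quantization of a linear layer, the basic operation behind the 8-bit papers above. Q8BERT additionally uses quantization-aware training and Q-BERT uses Hessian information to set per-group bit widths; neither is shown here.

```python
import torch

def quantize_int8(weight: torch.Tensor):
    """Symmetric per-tensor int8 quantization: returns int8 codes and a scale."""
    scale = weight.abs().max() / 127.0
    q = torch.clamp(torch.round(weight / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale

layer = torch.nn.Linear(768, 768)
q, s = quantize_int8(layer.weight.data)
w_hat = dequantize(q, s)                            # approximate float weights
print((layer.weight.data - w_hat).abs().max())      # worst-case quantization error
```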

Pruning

  • Reducing Transformer Depth on Demand with Structured Dropout
  • BERT-of-Theseus: Compressing BERT by Progressive Module Replacing
  • Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
  • Are Sixteen Heads Really Better than One?
  • Structured Pruning of Large Language Models
  • Pruning a BERT-based Question Answering Model
  • DynaBERT: Dynamic BERT with Adaptive Width and Depth
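
A minimal sketch of structured head pruning, the idea behind the head-pruning papers above ("Are Sixteen Heads Really Better than One?", Voita et al.): whole attention heads are removed rather than individual weights. Here a head is silenced by zeroing the output-projection columns it feeds; real implementations also skip the pruned heads' computation and choose heads by an importance score rather than by hand.

```python
import torch

def prune_heads_out_proj(mha: torch.nn.MultiheadAttention, heads_to_prune):
    """Zero the output-projection slices corresponding to the given heads."""
    head_dim = mha.embed_dim // mha.num_heads
    with torch.no_grad():
        for h in heads_to_prune:
            start, end = h * head_dim, (h + 1) * head_dim
            mha.out_proj.weight[:, start:end] = 0.0    # columns read from head h

mha = torch.nn.MultiheadAttention(embed_dim=768, num_heads=12, batch_first=True)
prune_heads_out_proj(mha, heads_to_prune=[0, 5, 7])    # head indices chosen arbitrarily
```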

Distillation

  • Another Summary
  • TinyBERT: Distilling BERT for Natural Language Understanding
  • DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
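
A minimal sketch of the soft-target distillation loss (Hinton-style logit matching) that DistilBERT and TinyBERT build on: the student matches the teacher's temperature-softened distribution in addition to the hard labels. The temperature T and mixing weight alpha below are illustrative values, and TinyBERT's intermediate-layer losses are not shown.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Weighted sum of soft-target KL loss and hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                                  # rescale gradients after temperature
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Toy usage with random logits for a 3-way classification batch
student = torch.randn(8, 3, requires_grad=True)
teacher = torch.randn(8, 3)
labels = torch.randint(0, 3, (8,))
distillation_loss(student, teacher, labels).backward()
```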

Tensorization

  • A Tensorized Transformer for Language Modeling
  • Low-Rank Bottleneck in Multi-head Attention Models
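
A minimal sketch of compressing a weight matrix with a low-rank factorization, the simplest relative of the tensor decompositions used in the papers above (which rely on block-term / Tucker decompositions rather than plain SVD). The layer sizes and rank are illustrative.

```python
import torch

def low_rank_factorize(linear: torch.nn.Linear, rank: int) -> torch.nn.Sequential:
    """Replace a d_out x d_in Linear with two rank-r factors via truncated SVD."""
    W = linear.weight.data                       # (d_out, d_in)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]                   # (d_out, r)
    B = Vh[:rank, :]                             # (r, d_in)
    first = torch.nn.Linear(W.shape[1], rank, bias=False)
    second = torch.nn.Linear(rank, W.shape[0], bias=True)
    first.weight.data = B.clone()
    second.weight.data = A.clone()
    if linear.bias is not None:
        second.bias.data = linear.bias.data.clone()
    return torch.nn.Sequential(first, second)    # ~ r*(d_in + d_out) parameters

compressed = low_rank_factorize(torch.nn.Linear(768, 768), rank=64)
```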

Comprehensive Study

  • Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning
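
A minimal sketch of unstructured magnitude pruning, the operation whose effect on transfer learning the study above measures: zero out the smallest-magnitude fraction of a weight tensor. The sparsity levels here are arbitrary examples.

```python
import torch

def magnitude_prune_(weight: torch.Tensor, sparsity: float = 0.5) -> torch.Tensor:
    """In-place: zero the `sparsity` fraction of entries with smallest magnitude."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return weight
    threshold = weight.abs().flatten().kthvalue(k).values
    weight[weight.abs() <= threshold] = 0.0
    return weight

layer = torch.nn.Linear(768, 3072)
with torch.no_grad():
    magnitude_prune_(layer.weight, sparsity=0.6)
print((layer.weight == 0).float().mean())        # achieved sparsity, ~0.6
```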

Understanding and Visualization

  • What Does BERT Look at? An Analysis of BERT’s Attention
  • Visualizing and Understanding Neural Machine Translation
  • An Analysis of Encoder Representations in Transformer-Based Machine Translation
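
A minimal sketch of extracting per-head attention maps for inspection, in the spirit of "What Does BERT Look At?". It uses a plain nn.MultiheadAttention on random input rather than an actual BERT checkpoint, and the average_attn_weights flag assumes a reasonably recent PyTorch.

```python
import torch

mha = torch.nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
x = torch.randn(1, 10, 64)                       # (batch, seq_len, dim)
_, attn = mha(x, x, x, need_weights=True, average_attn_weights=False)
print(attn.shape)                                # (batch, num_heads, seq, seq)
# Each attn[0, h] is one head's attention map, ready to plot as a heatmap.
```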