This repository contains the implementation of our paper *Length-Adaptive Distillation: Customizing Small Language Model for Dynamic Token Pruning*, published in Findings of EMNLP 2023.
Our implementation is mainly built on transformers. We use the same data augmentation code provided by TinyBERT, and we follow LAT to calculate the speedup ratio.
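
For intuition on what a length-based speedup ratio measures, below is a minimal sketch that estimates it from per-layer theoretical FLOPs, in the spirit of LAT's length-based accounting. This is not the repo's actual code: the function names, the simplified FLOPs formula, and the example pruning schedule are all our own assumptions for illustration.

```python
# Minimal sketch (not the repo's actual code): estimating a speedup ratio
# from per-layer transformer FLOPs under token pruning. The FLOPs formula
# below is a simplification that counts only the dominant matmuls.

def layer_flops(seq_len: int, hidden: int = 768, ffn: int = 3072) -> float:
    """Approximate FLOPs of one transformer layer at a given sequence length.

    Counts QKV + output projections (4 * L * H^2), attention
    score/context products (2 * L^2 * H), and the two FFN projections
    (2 * L * H * F). Layer norms and constant factors are ignored.
    """
    attn_proj = 4 * seq_len * hidden * hidden
    attn_matmul = 2 * seq_len * seq_len * hidden
    ffn_proj = 2 * seq_len * hidden * ffn
    return attn_proj + attn_matmul + ffn_proj


def speedup_ratio(full_len: int, kept_lens: list[int]) -> float:
    """Speedup = FLOPs of the unpruned model / FLOPs with token pruning.

    kept_lens[i] is the number of tokens retained at layer i; the
    unpruned baseline keeps full_len tokens at every layer.
    """
    full = sum(layer_flops(full_len) for _ in kept_lens)
    pruned = sum(layer_flops(n) for n in kept_lens)
    return full / pruned


# Hypothetical example: a 6-layer student progressively pruning a
# 128-token input.
print(speedup_ratio(128, [128, 112, 96, 80, 64, 48]))
```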