Vcc: Scaling Transformers to 128K Tokens or More by Prioritizing Important Tokens, Zhanpeng Zeng+, N/A, arXiv'23
Transformer models are foundational to natural language processing (NLP) and computer vision. Despite various recent works devoted to reducing the quadratic cost of such models (as a function of the sequence length $n$), dealing with ultra long sequences efficiently (e.g., with more than 16K tokens) remains challenging. Applications such as answering questions based on an entire book or summarizing a scientific article are inefficient or infeasible. In this paper, we propose to significantly reduce the dependency of a Transformer model's complexity on $n$ by compressing the input into a representation whose size $r$ is independent of $n$ at each layer. Specifically, by exploiting the fact that in many tasks only a small subset of special tokens (which we call VIP-tokens) are most relevant to the final prediction, we propose a VIP-token centric compression (Vcc) scheme which selectively compresses the input sequence based on each token's impact on approximating the representation of these VIP-tokens. Compared with competitive baselines, the proposed algorithm is not only efficient (achieving more than $3\times$ efficiency improvement over baselines on 4K and 16K lengths), but also achieves competitive or better performance on a large number of tasks. Further, we show that our algorithm can be scaled to 128K tokens (or more) while consistently offering accuracy improvement.
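The abstract only sketches the mechanism, so here is a minimal, illustrative PyTorch sketch of the core idea: score every non-VIP token by how much attention the VIP-tokens pay to it and keep only the top $r$, so each layer operates on $r + |\mathrm{VIP}|$ tokens instead of $n$. Note this is my own simplification for intuition: the function name `vip_token_compress` and the hard top-$r$ selection are assumptions, while the paper's actual Vcc scheme uses a more elaborate multi-resolution compression and decompression not reproduced here.

```python
import torch

def vip_token_compress(x: torch.Tensor, vip_idx: torch.Tensor, r: int) -> torch.Tensor:
    """Illustrative sketch (not the paper's exact algorithm).

    Keeps the r non-VIP tokens that receive the most dot-product
    attention from the VIP-tokens, so the per-layer sequence length
    is r + |VIP| regardless of the original length n.
    """
    n, d = x.shape
    vip = x[vip_idx]                                   # (v, d) VIP-token embeddings
    rest_mask = torch.ones(n, dtype=torch.bool)
    rest_mask[vip_idx] = False
    rest = x[rest_mask]                                # (n - v, d) remaining tokens

    # Relevance of each remaining token to the VIP-tokens: scaled
    # dot-product attention scores, aggregated over all VIP queries.
    attn = ((vip @ rest.T) / d**0.5).softmax(dim=-1)   # (v, n - v)
    scores = attn.sum(dim=0)                           # (n - v,)

    keep = scores.topk(min(r, rest.shape[0])).indices  # indices of top-r tokens
    return torch.cat([vip, rest[keep]], dim=0)         # (v + r, d)
```

For example, with `x = torch.randn(16384, 768)` and `vip_idx = torch.tensor([0])` (a single [CLS]-style VIP-token), `vip_token_compress(x, vip_idx, r=512)` returns a `(513, 768)` tensor, giving downstream layers a cost independent of the 16K input length.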