Vcc: Scaling Transformers to 128K Tokens or More by Prioritizing Important Tokens, Zhanpeng Zeng+, N/A, arXiv'23
Transformer models are foundational to natural language processing (NLP) and computer vision. Despite various recent works devoted to reducing the quadratic cost of such models (as a function of the sequence length $n$), dealing with ultra long sequences efficiently (e.g., with more than 16K tokens) remains challenging. Applications such as answering questions based on an entire book or summarizing a scientific article are inefficient or infeasible. In this paper, we propose to significantly reduce the dependency of a Transformer model's complexity on $n$ by compressing the input into a representation whose size $r$ is independent of $n$ at each layer. Specifically, by exploiting the fact that in many tasks only a small subset of special tokens (which we call VIP-tokens) are most relevant to the final prediction, we propose a VIP-token centric compression (Vcc) scheme which selectively compresses the input sequence based on each token's impact on approximating the representation of these VIP-tokens. Compared with competitive baselines, the proposed algorithm is not only efficient (achieving more than $3\times$ efficiency improvement over baselines on 4K and 16K lengths), but also achieves competitive or better performance on a large number of tasks. Further, we show that our algorithm can be scaled to 128K tokens (or more) while consistently offering accuracy improvement.
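The abstract only sketches the mechanism, so here is a minimal, illustrative PyTorch sketch of the core idea: score every non-VIP token by how much attention the VIP-tokens pay to it and keep only the top $r$, so each layer operates on $r + |\mathrm{VIP}|$ tokens instead of $n$. Note this is my own simplification for intuition: the function name `vip_token_compress` and the hard top-$r$ selection are assumptions, while the paper's actual Vcc scheme uses a more elaborate multi-resolution compression and decompression not reproduced here.

```python
import torch

def vip_token_compress(x: torch.Tensor, vip_idx: torch.Tensor, r: int) -> torch.Tensor:
    """Illustrative sketch (not the paper's exact algorithm).

    Keeps the r non-VIP tokens that receive the most dot-product
    attention from the VIP-tokens, so the per-layer sequence length
    is r + |VIP| regardless of the original length n.
    """
    n, d = x.shape
    vip = x[vip_idx]                                   # (v, d) VIP-token embeddings
    rest_mask = torch.ones(n, dtype=torch.bool)
    rest_mask[vip_idx] = False
    rest = x[rest_mask]                                # (n - v, d) remaining tokens

    # Relevance of each remaining token to the VIP-tokens: scaled
    # dot-product attention scores, aggregated over all VIP queries.
    attn = ((vip @ rest.T) / d**0.5).softmax(dim=-1)   # (v, n - v)
    scores = attn.sum(dim=0)                           # (n - v,)

    keep = scores.topk(min(r, rest.shape[0])).indices  # indices of top-r tokens
    return torch.cat([vip, rest[keep]], dim=0)         # (v + r, d)
```

For example, with `x = torch.randn(16384, 768)` and `vip_idx = torch.tensor([0])` (a single [CLS]-style VIP-token), `vip_token_compress(x, vip_idx, r=512)` returns a `(513, 768)` tensor, giving downstream layers a cost independent of the 16K input length.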