Randomized Positional Encodings Boost Length Generalization of Transformers, ACL'23 #820

AkihikoWatanabe · 2023-07-14T06:33:15Z

https://virtual2023.aclweb.org/paper_P5597.html

AkihikoWatanabe · 2023-07-22T15:44:05Z

Transformers have impressive generalization capabilities on tasks with a fixed context length. However, they fail to generalize to sequences of arbitrary length, even for seemingly simple tasks such as duplicating a string. Moreover, simply training on longer sequences is inefficient due to the quadratic computation complexity of the global attention mechanism. In this work, we demonstrate that this failure mode is linked to positional encodings being out-of-distribution for longer sequences (even for relative encodings) and introduce a novel family of positional encodings that can overcome this problem. Concretely, our randomized positional encoding scheme simulates the positions of longer sequences and randomly selects an ordered subset to fit the sequence's length. Our large-scale empirical evaluation of 6000 models across 15 algorithmic reasoning tasks shows that our method allows Transformers to generalize to sequences of unseen length (increasing test accuracy by 12.0% on average).

Translation (by gpt-3.5-turbo)

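Transformers have impressive generalization capabilities on tasks with a fixed context length. However, even on seemingly simple tasks such as duplicating a string, they cannot generalize to sequences of arbitrary length. Moreover, simply training on longer sequences is inefficient because of the quadratic computational complexity of the global attention mechanism. This work shows that this failure mode is linked to positional encodings being out-of-distribution for longer sequences, and introduces a new family of positional encodings that can overcome the problem. Concretely, the randomized positional encoding scheme simulates the positions of longer sequences and randomly selects an ordered subset to fit the sequence's length. A large-scale empirical evaluation of 6000 models across 15 algorithmic reasoning tasks shows that the method allows Transformers to generalize to sequences of unseen length, improving test accuracy by 12.0% on average.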

Summary (by gpt-3.5-turbo)

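Transformers generalize well on fixed-length tasks but cannot handle sequences of arbitrary length. To address this, the paper proposes a new positional encoding method. The randomized positional encoding scheme simulates the positions of longer sequences and randomly selects an ordered subset of them. A large-scale empirical evaluation shows that the method improves the generalization of Transformers, raising test accuracy by 12.0% on average.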
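The core step in the abstract, "simulates the positions of longer sequences and randomly selects an ordered subset to fit the sequence's length", can be sketched in a few lines. This is a minimal illustration based only on the abstract, not the authors' implementation: the function name `sample_randomized_positions` and the parameter `max_len` (the simulated longer length) are hypothetical, and uniform sampling without replacement is an assumption.

```python
import random


def sample_randomized_positions(seq_len, max_len, rng=None):
    """Pick seq_len distinct position indices from range(max_len), ascending.

    Assumption: positions are drawn uniformly without replacement; the
    abstract only states that an ordered subset is selected at random.
    """
    if seq_len > max_len:
        raise ValueError("seq_len must not exceed max_len")
    rng = rng or random.Random()
    # Sorting keeps the relative order of tokens intact.
    return sorted(rng.sample(range(max_len), seq_len))


# Hypothetical usage: a sequence of 8 tokens, positions simulated up to 64.
positions = sample_randomized_positions(seq_len=8, max_len=64, rng=random.Random(0))
print(positions)
```

Under this sketch, the sampled indices would stand in for the usual `0 .. seq_len - 1` when looking up positional encodings, so that during training the model already sees position values as large as those it would meet on longer test sequences.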

AkihikoWatanabe added the translation_required label Jul 22, 2023

AkihikoWatanabe changed the title ~~Randomized Positional Encodings Boost Length Generalization of Transformers~~ Randomized Positional Encodings Boost Length Generalization of Transformers, ACL'23 Oct 22, 2023

AkihikoWatanabe added NLP Transformer LongSequence PositionalEncoding labels Oct 22, 2023
