Transformers have impressive generalization capabilities on tasks with a fixed context length. However, they fail to generalize to sequences of arbitrary length, even for seemingly simple tasks such as duplicating a string. Moreover, simply training on longer sequences is inefficient due to the quadratic computation complexity of the global attention mechanism. In this work, we demonstrate that this failure mode is linked to positional encodings being out-of-distribution for longer sequences (even for relative encodings) and introduce a novel family of positional encodings that can overcome this problem. Concretely, our randomized positional encoding scheme simulates the positions of longer sequences and randomly selects an ordered subset to fit the sequence's length. Our large-scale empirical evaluation of 6000 models across 15 algorithmic reasoning tasks shows that our method allows Transformers to generalize to sequences of unseen length (increasing test accuracy by 12.0% on average).
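The scheme described in the abstract (simulate the positions of longer sequences and keep an ordered subset matching the actual sequence length) can be sketched roughly as below. This is a minimal illustration, not the paper's implementation; the names `randomized_positions` and `max_len` are placeholders I chose, with `max_len` standing in for the longest sequence length the encoding should cover.

```python
import numpy as np

def randomized_positions(seq_len: int, max_len: int, rng: np.random.Generator) -> np.ndarray:
    """Sample an ordered subset of position indices from a larger range.

    Instead of assigning positions 0..seq_len-1, draw seq_len distinct
    indices from [0, max_len) and sort them, so training sequences already
    expose the model to position values that longer test sequences would use.
    """
    positions = rng.choice(max_len, size=seq_len, replace=False)
    return np.sort(positions)

# Example: a length-8 training sequence, simulating positions for sequences up to length 128.
rng = np.random.default_rng(0)
print(randomized_positions(8, 128, rng))  # prints 8 sorted indices drawn from [0, 128)
```

The sampled indices would then be fed to whatever positional encoding the model uses (absolute or relative) in place of the default consecutive positions.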
Randomized Positional Encodings Boost Length Generalization of Transformers, Anian Ruoss+, N/A, arXiv'23
URL
Affiliations
Abstract
Translation (by gpt-3.5-turbo)
Summary (by gpt-3.5-turbo)