Transformers have impressive generalization capabilities on tasks with a fixed context length. However, they fail to generalize to sequences of arbitrary length, even for seemingly simple tasks such as duplicating a string. Moreover, simply training on longer sequences is inefficient due to the quadratic computation complexity of the global attention mechanism. In this work, we demonstrate that this failure mode is linked to positional encodings being out-of-distribution for longer sequences (even for relative encodings) and introduce a novel family of positional encodings that can overcome this problem. Concretely, our randomized positional encoding scheme simulates the positions of longer sequences and randomly selects an ordered subset to fit the sequence's length. Our large-scale empirical evaluation of 6000 models across 15 algorithmic reasoning tasks shows that our method allows Transformers to generalize to sequences of unseen length (increasing test accuracy by 12.0% on average).
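The scheme described in the abstract (simulate the positions of longer sequences and keep an ordered subset matching the actual sequence length) can be sketched roughly as below. This is a minimal illustration, not the paper's implementation; the names `randomized_positions` and `max_len` are placeholders I chose, with `max_len` standing in for the longest sequence length the encoding should cover.

```python
import numpy as np

def randomized_positions(seq_len: int, max_len: int, rng: np.random.Generator) -> np.ndarray:
    """Sample an ordered subset of position indices from a larger range.

    Instead of assigning positions 0..seq_len-1, draw seq_len distinct
    indices from [0, max_len) and sort them, so training sequences already
    expose the model to position values that longer test sequences would use.
    """
    positions = rng.choice(max_len, size=seq_len, replace=False)
    return np.sort(positions)

# Example: a length-8 training sequence, simulating positions for sequences up to length 128.
rng = np.random.default_rng(0)
print(randomized_positions(8, 128, rng))  # prints 8 sorted indices drawn from [0, 128)
```

The sampled indices would then be fed to whatever positional encoding the model uses (absolute or relative) in place of the default consecutive positions.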
Randomized Positional Encodings Boost Length Generalization of Transformers, Anian Ruoss+, N/A, arXiv'23
URL
Affiliations
Abstract
Translation (by gpt-3.5-turbo)
Summary (by gpt-3.5-turbo)