Repeated tokens get incorrect offsets, especially in the presence of extra whitespace and whitespace-like characters. I found examples with spaces and with a right-to-left mark followed by a space. I left out the right-to-left mark example because it's invisible, but you can recreate the behavior by changing the first space to a right-to-left mark in #2 or #3 below. I haven't tested any other space-like characters.
Here is a real-life example from Vietnamese Wikipedia that shows duplicates farther apart. Sorry if the text doesn't make sense: I edited out a bit of Arabic script, which also contained an invisible right-to-left mark. I've added an asterisk (*) before the lines with incorrect offsets.
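For anyone trying to reproduce this, here is a minimal Python sketch of one way to flag suspicious offsets. It is not the tokenizer's own logic, and the function names `expected_offsets` and `find_bad_offsets` are hypothetical. It assumes each token appears verbatim in the input (no lowercasing or other normalization) and that offsets count characters from 0 with an exclusive end. The idea is to search for each token starting at the end of the previous one, so a repeated token can never be matched against an earlier occurrence.

```python
def expected_offsets(text, tokens):
    """Locate each token at or after the end of the previous token.

    Assumes tokens occur verbatim and in order in `text`.
    """
    offsets = []
    cursor = 0
    for token in tokens:
        start = text.find(token, cursor)
        if start == -1:
            raise ValueError(
                f"token {token!r} not found at or after offset {cursor}"
            )
        end = start + len(token)
        offsets.append((start, end))
        cursor = end  # never look behind the previous token
    return offsets


def find_bad_offsets(text, tokens, reported):
    """Return indexes of tokens whose reported (start, end) differs."""
    expected = expected_offsets(text, tokens)
    return [
        i for i, (exp, got) in enumerate(zip(expected, reported))
        if exp != got
    ]


if __name__ == "__main__":
    # A repeated pair of tokens with an extra space in between.
    text = "a b  a b"
    tokens = ["a", "b", "a", "b"]
    # Hypothetical bad output: the repeats reuse the first occurrence's offsets.
    reported = [(0, 1), (2, 3), (0, 1), (2, 3)]
    print(find_bad_offsets(text, tokens, reported))
```

The same check works when the first space is replaced by a right-to-left mark (`"\u200f"`), since the scan skips whatever lies between tokens. Note that Python indexes by code point, so the numbers would differ from UTF-16-based offsets for characters outside the Basic Multilingual Plane.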