Tracker needs to handle characters outside of US-ASCII #7

Closed
hannahwhy opened this issue Sep 14, 2013 · 3 comments

@hannahwhy
Member

It's possible for users to use characters that are not in US-ASCII in their usernames. Currently, the tracker claims page errors out with an invalid byte sequence error when displaying claims for usernames that contain such characters. One example:

http://paste.archivingyoursh.it/xucamucege.vbs

Strings returned by Redis have an unset encoding. redis/redis-rb#48 looks like it fixed up the Ruby client to honor the default external encoding. We might be able to fix this with just an Encoding::default_external = 'utf-8'.
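
A rough sketch of what that fix would change, using only core Ruby and no Redis connection. It assumes the failing process ran with a non-UTF-8 locale, so that externally sourced strings were tagged US-ASCII, and it assumes the client tags replies with the default external encoding, as redis/redis-rb#48 appears to do. The sample username is made up.

```ruby
# Simulate bytes arriving from outside the process: the UTF-8 encoding of
# "zoë", but tagged US-ASCII (assumption: this is roughly what the claims
# page saw when it failed).
mis_tagged = "zo\xC3\xAB".b.force_encoding(Encoding::US_ASCII)
puts mis_tagged.valid_encoding?   # false: 0xC3 and 0xAB are not US-ASCII

# The fix proposed above: make UTF-8 the default external encoding.
Encoding.default_external = Encoding::UTF_8

# What a client honoring default_external would then effectively do
# (assumption about the client; the bytes themselves are unchanged).
retagged = mis_tagged.dup.force_encoding(Encoding.default_external)
puts retagged.valid_encoding?     # true
puts retagged.length              # 3 characters: z, o, ë
```

Note that `force_encoding` only relabels the bytes; it does not transcode them, so this helps only when the sender really did write UTF-8.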

@hannahwhy
Member Author

This happened in our puu.sh project, but could really happen just about anywhere else.

@hannahwhy
Member Author

This happened in the isoprey project yesterday.

Also, it's worth considering that nicknames are often used to construct upload targets and WARC names. Permitting characters outside basic ASCII puts additional requirements on filesystems and transfer protocols: not onerous ones, but ones that people often don't consider.
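
One possible way to sidestep those filesystem and protocol requirements is to derive an ASCII-only name from the nickname before using it in a path. The sketch below is only an illustration: `ascii_safe_name` is a hypothetical helper, not something the tracker or Seesaw is known to provide, and the `_XX` byte-escaping scheme is an arbitrary choice.

```ruby
# Hypothetical helper: escape every byte outside a conservative safe set
# as "_XX" (uppercase hex). The underscore itself is escaped too, so two
# different nicknames can never map to the same name.
def ascii_safe_name(nickname)
  nickname.b.gsub(/[^A-Za-z0-9.\-]/) { |byte| format('_%02X', byte.ord) }
          .force_encoding(Encoding::US_ASCII)
end

puts ascii_safe_name("zo\u00EB")   # zo_C3_AB
puts ascii_safe_name("plain-name") # plain-name
```

Working on the binary copy (`String#b`) means the helper never raises on mis-tagged or invalid input, which matters if the name is built before any encoding check has run.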

hannahwhy added a commit that referenced this issue Nov 11, 2013
This fix has been running in production for a couple of months now with
no evident ill effects, so I'm submitting it for permanent inclusion in
the tracker codebase.

The main source of external data in the tracker comes from its Warrior
clients and seesaw clients via its Redis instance.  Redis is
encoding-agnostic, and Warrior/Seesaw processes _usually_ run UTF-8.

I don't believe that this is a total fix; I don't think there's anything
stopping someone from sending e.g. a username encoded in UTF-16.  But
this will solve the common cause of #7, and we can build on top of this
work, e.g. enforcing UTF-8 for all Seesaw data returned to the tracker.
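
A minimal sketch of what enforcing UTF-8 on incoming data might look like, assuming the check runs on raw bytes before they are stored or displayed. `utf8_or_nil` is a hypothetical name, not part of the tracker.

```ruby
# Hypothetical check: accept the bytes only if they form valid UTF-8.
# Returns a UTF-8 tagged copy, or nil if the bytes are not valid UTF-8.
def utf8_or_nil(bytes)
  candidate = bytes.dup.force_encoding(Encoding::UTF_8)
  candidate.valid_encoding? ? candidate : nil
end

utf8_bytes  = "zo\u00EB".b                             # sender used UTF-8
utf16_bytes = "zo\u00EB".encode(Encoding::UTF_16LE).b  # sender used UTF-16

p utf8_or_nil(utf8_bytes)    # "zoë"
p utf8_or_nil(utf16_bytes)   # nil

# Caveat: this is a heuristic. A purely ASCII name sent as UTF-16 is still
# valid UTF-8 (with embedded NUL bytes) and would pass this check.
```
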
@hannahwhy
Member Author

Probably fixed now: I haven't been able to trigger this category of bug since 01911de was added. If we see similar errors, we'll file new issues.

(And note to self: that Travis build needs to be fixed.)

hannahwhy added a commit that referenced this issue Nov 17, 2013
Set default external encoding to UTF-8.  #7.