Rehash trashes chars in story containing direct-entered chars > 0x0100 #41

Closed
marty-b opened this issue May 7, 2015 · 8 comments

Comments

@marty-b
Contributor

marty-b commented May 7, 2015

(1) Launched https://dev.soylentnews.org using Pale Moon Atom/WinXP (v25.3.2)
(2) Logged in
(3) Clicked Submit Story link on left-hand side of main page
(4) Copy/pasted text from file: UTF-8gen.000001-0007ff.txt
(5) Selected "Plain Old Text" as submission format
(6) Clicked PreviewStory
(7) Clicked SubmitStory
(8) Loaded Submissions List page: https://dev.soylentnews.org/submit.pl?op=list
(9) Clicked on submitted story - text in story looked okay
(10) Clicked Preview - text in story looked okay
(11) Clicked Submit
(12) Waited for story to appear on the main page
(13) Noticed that almost all chars >= 0x0100 were replaced with question marks "?"
(14) Story can be viewed at: https://dev.soylentnews.org/article.pl?sid=15/05/07/1946248
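
One way the symptom in step 13 could arise is a stage that converts text into a single-byte charset and substitutes "?" for anything it cannot represent. That cause is an assumption, not something this report confirms. A minimal Python sketch of the effect, where `simulate_lossy_store` is a hypothetical name and ISO-8859-1 is an assumed charset:

```python
def simulate_lossy_store(text: str, charset: str = "iso-8859-1") -> str:
    """Round-trip text through a single-byte charset, replacing unencodable chars."""
    # errors="replace" makes the encoder emit b"?" for each character
    # that the target charset cannot represent.
    return text.encode(charset, errors="replace").decode(charset)


# U+00E9 fits in ISO-8859-1; U+0100, U+03A9 and U+07FF do not.
sample = "caf\u00e9 \u0100 \u03a9 \u07ff"
print(simulate_lossy_store(sample))  # café ? ? ?
```

Characters up to U+00FF survive because ISO-8859-1 maps them one-to-one, which matches the boundary reported above.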

@TheMightyBuzzard
Contributor

Saw that.
Hit Edit. Looked fine again.
Hit preview. Looked fine again.
Hit update. Looked fine on regular display.
Hit the main page and it was again borked.
I'll look into it tomorrow or Monday but it's a hell of a quirky bug.

@TheMightyBuzzard
Contributor

Okay, no, that was a varnish or memcached issue and it looks fine after clearing the cache.

Reproduce it if you would.

@TheMightyBuzzard
Contributor

Right, this time I did NOTHING to any of the stories and they still turned into proper glyphs. Probably from some slashd job or other.

@marty-b
Contributor Author

marty-b commented May 14, 2015

Not fixed.
1.) I have three stories that I created and in which I inserted U+0001..U+07ff directly as characters at different points in the story submission process:
a.) In the original submission: https://dev.soylentnews.org/article.pl?sid=15/05/07/1946248
b.) Pasted in upon first view of the story in the story submissions queue: https://dev.soylentnews.org/article.pl?sid=15/05/08/0320231
c.) Pasted in after story has been approved and exists in the Story queue: https://dev.soylentnews.org/article.pl?sid=15/05/08/0149218
2.) All stories, when viewed on the main page (https://dev.soylentnews.org/), show chars whose Unicode code point is greater than U+00ff as question marks instead of the expected character.
3.) When I attempted to edit each of the stories, they appeared fine in the edit area and preview area.
4.) When I viewed each story directly (using the above story links), it looked okay in my browser.
5.) But, when I loaded the dev.soylentnews.org main page, all of them exhibited the replacement of chars greater than U+00ff as "?" (question mark) chars.
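
To pin down exactly which code points are lost, the submitted text can be compared with the displayed text character by character. The sketch below is only illustrative: `build_test_text` and `lost_code_points` are hypothetical helpers, and the "displayed" text is simulated with a lossy ISO-8859-1 round trip (an assumption) rather than fetched from the site.

```python
def build_test_text(first: int = 0x0001, last: int = 0x07FF) -> str:
    """Return one character per code point in [first, last]."""
    return "".join(chr(cp) for cp in range(first, last + 1))


def lost_code_points(original: str, displayed: str) -> list:
    """Code points whose character differs between the two texts."""
    return [ord(a) for a, b in zip(original, displayed) if a != b]


original = build_test_text()
# Stand-in for the main-page rendering (assumption): anything outside
# ISO-8859-1 comes back as "?".
displayed = original.encode("iso-8859-1", errors="replace").decode("iso-8859-1")

lost = lost_code_points(original, displayed)
print(hex(lost[0]), hex(lost[-1]), len(lost))  # 0x100 0x7ff 1792
```

Against the real site, `displayed` would be the text extracted from the rendered main page, which may also have control characters stripped.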

@NCommander
Contributor

This is an incredibly irritating bug. I suspect it depends on the charset in which the form replies are sent. mod_perl should be setting the UTF-8 flag if that is in fact the case. I use Ubuntu, and both Firefox and Chrome send their replies in UTF-8 encoding. I don't know what @TheMightyBuzzard uses, but it appears to use the same transmission encoding. @marty-b is likely using a different browser/OS combination that sends it in a locale that isn't UTF-8, and rehash self-destructs when trying to encode/decode it.
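
The charset mismatch described here can be illustrated in isolation. This is a Python sketch of the general effect, not rehash or mod_perl code, and the choice of cp1252 as the "non-UTF-8 locale" is an assumption made for the example.

```python
text = "\u0100\u03a9"  # two characters, both above U+00FF

# Bytes a UTF-8 form submission would carry for this text.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)  # b'\xc4\x80\xce\xa9'

# If the receiver wrongly treats those bytes as ISO-8859-1,
# every byte turns into its own character.
misread = utf8_bytes.decode("iso-8859-1")
print(len(text), len(misread))  # 2 4

# A legacy single-byte charset has no byte for these characters at all.
try:
    text.encode("cp1252")
    failed_at = None
except UnicodeEncodeError as exc:
    failed_at = ord(text[exc.start])
print(hex(failed_at))  # 0x100
```

Either mismatch corrupts characters above U+00FF while leaving plain ASCII untouched, which is why such bugs can go unnoticed for a long time.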

@paulej72
Contributor

There is a form attribute that sets the charset. I wonder if we need to set that now. I think we were fudging it before.

@NCommander
Contributor

@paulej72 I'm concerned that the site 500s if someone submits invalid requests. I need to rewrite the AppArmor rules for A2 though ...
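
One way to address the 500 concern is to decode submitted bytes strictly and turn a decoding failure into a client error instead of an unhandled exception. The following is a Python sketch of that idea only; `decode_form_value` is a hypothetical helper and does not reflect how rehash handles requests.

```python
from typing import Optional, Tuple


def decode_form_value(raw: bytes) -> Tuple[Optional[str], int]:
    """Return (text, 200) for valid UTF-8, or (None, 400) for invalid bytes."""
    try:
        return raw.decode("utf-8", errors="strict"), 200
    except UnicodeDecodeError:
        # Reject the request cleanly rather than letting the exception
        # surface as an internal server error.
        return None, 400


print(decode_form_value("\u0100".encode("utf-8")))  # ('Ā', 200)
print(decode_form_value(b"\xff\xfe"))               # (None, 400)
```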

@NCommander
Contributor

This appears to have been fixed via a my.cnf fix on lithium. Closing.


4 participants