New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deal with non-UTF-8 cookies #189
Comments
Ideally the way this would be specified is by sharing logic with |
Thanks and +1. @bsittler did substantial investigation into cross-browser and API/header behavior with encoding, but I'm not sure that research was retained and was definitely not reflected into the spec. We should audit test coverage and get coverage/spec updates. (Sharing w/ HTML also SGTM. PRs/concrete suggestions welcome!) |
FYI, here's the current behavior in Chrome: cookie_test(async t => {
const value = 'value\u00C6'; // Æ
const utf8_value = 'value\xC3\x86'; // UTF-8 encoding
const non_utf8_value = 'value\xC6'; // ISO-8859-1 encoding
await setCookieBinaryHttp(`utf8=${utf8_value}; path=/`);
assert_equals(await getCookieBinaryHttp(), `utf8=${utf8_value}`);
// Set a non-UTF-8 cookie on the server, verify the server thinks it's there.
await setCookieBinaryHttp(`nonutf8=${non_utf8_value}; path=/`);
assert_equals(await getCookieBinaryHttp(), `utf8=${utf8_value}; nonutf8=${non_utf8_value}`);
// document.cookie vs. mixed cookie encodings
assert_equals(await getCookieStringDocument(), undefined,
'document.cookie yields nothing if non-UTF-8 cookies served');
// Cookie Store API vs. mixed cookie encodings
const cookies = await cookieStore.getAll();
assert_equals(cookies.length, 2, 'Both cookies are present by name');
assert_not_equals(
cookies.find(c => c.name === 'utf8'), undefined, 'utf8 cookie present');
assert_equals(
cookies.find(c => c.name === 'utf8').value, value,
'utf8 cookie value should be preserved');
assert_not_equals(
cookies.find(c => c.name === 'nonutf8'), undefined,
'nonutf8 cookie present');
assert_equals(
cookies.find(c => c.name === 'nonutf8').value, '',
'nonutf8 cookie value is empty');
}, 'Non-UTF-8 cookies dropped by browser and not reflected by API'); Appears to be:
I should also try non-UTF-8 cookie names. |
Names: cookie_test(async t => {
const name = 'name\u00C6'; // Æ
const utf8_name = 'name\xC3\x86'; // UTF-8 encoding
const non_utf8_name = 'name\xC6'; // ISO-8859-1 encoding
await setCookieBinaryHttp(`${utf8_name}=value1; path=/`);
assert_equals(await getCookieBinaryHttp(), `${utf8_name}=value1`);
// Set a non-UTF-8 cookie on the server, verify the server thinks it's there.
await setCookieBinaryHttp(`${non_utf8_name}=value2; path=/`);
assert_equals(
await getCookieBinaryHttp(),
`${utf8_name}=value1; ${non_utf8_name}=value2`);
// document.cookie vs. mixed cookie encodings
assert_equals(await getCookieStringDocument(), undefined,
'document.cookie yields nothing if non-UTF-8 cookies served');
// Cookie Store API vs. mixed cookie encodings
const cookies = await cookieStore.getAll();
assert_equals(cookies.length, 2, 'Both cookies are present');
assert_not_equals(
cookies.find(c => c.name === name), undefined, 'utf8 cookie present');
assert_equals(
cookies.find(c => c.name === name).value, 'value1',
'utf8 cookie value should be preserved');
assert_not_equals(
cookies.find(c => c.name === ''), undefined, 'nonutf8 cookie name empty');
assert_equals(
cookies.find(c => c.name === '').value, 'value2',
'nonutf8 cookie value should be preserved');
}, 'Non-UTF-8 cookie names dropped by browser and not reflected by API'); ISTM this behavior would be described (as @annevk mentions) by changing the decode definition to be: To decode a value, run UTF-8 decode without BOM or fail on value, returning the result if not failure, or the empty string otherwise. |
Thanks for playing with this! I think consistently excluding is the way to go, but willing to be convinced of better alternatives. |
Seems like 3 options:
I think 1 is weird and not intended behavior. I'm optimistic that it's web-compatible to move away from that. Between 2 or 3, any preference? (Integration between this spec and the RFC is not incredibly rigorous at the moment; I could imagine one approach that takes the cookie-string output of the RFC and re-parses it, which could involve attempting to decode the whole string and therefore option 3 "falls out"? That also seems a bit weird, though.) |
Oh, I didn't realize |
I think I raised this before, but I don't really see text about it in the specification. If you limit yourself to UTF-8, you need to deal with cookies that do not decode as UTF-8, presumably by dropping them as Chrome appears to do for
document.cookie
. That should be specified somehow.I think that means cookies need to use "UTF-8 decode without BOM or fail" from Encoding.
The text was updated successfully, but these errors were encountered: