Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deal with non-UTF-8 cookies #189

Open
annevk opened this issue Dec 2, 2020 · 7 comments
Open

Deal with non-UTF-8 cookies #189

annevk opened this issue Dec 2, 2020 · 7 comments

Comments

@annevk
Copy link
Collaborator

annevk commented Dec 2, 2020

I think I raised this before, but I don't really see text about it in the specification. If you limit yourself to UTF-8, you need to deal with cookies that do not decode as UTF-8, presumably by dropping them as Chrome appears to do for document.cookie. That should be specified somehow.

I think that means cookies need to use "UTF-8 decode without BOM or fail" from Encoding.

@annevk
Copy link
Collaborator Author

annevk commented Dec 2, 2020

Ideally the way this would be specified is by sharing logic with document.cookie, FWIW. (Something I also raised before.)

@inexorabletash
Copy link
Collaborator

Thanks and +1. @bsittler did substantial investigation into cross-browser and API/header behavior with encoding, but I'm not sure that research was retained and was definitely not reflected into the spec.

We should audit test coverage and get coverage/spec updates.

(Sharing w/ HTML also SGTM. PRs/concrete suggestions welcome!)

@inexorabletash
Copy link
Collaborator

inexorabletash commented Jul 6, 2023

FYI, here's the current behavior in Chrome:

cookie_test(async t => {
  const value = 'value\u00C6'; // Æ
  const utf8_value = 'value\xC3\x86'; // UTF-8 encoding
  const non_utf8_value = 'value\xC6'; // ISO-8859-1 encoding

  await setCookieBinaryHttp(`utf8=${utf8_value}; path=/`);
  assert_equals(await getCookieBinaryHttp(), `utf8=${utf8_value}`);

  // Set a non-UTF-8 cookie on the server, verify the server thinks it's there.
  await setCookieBinaryHttp(`nonutf8=${non_utf8_value}; path=/`);
  assert_equals(await getCookieBinaryHttp(), `utf8=${utf8_value}; nonutf8=${non_utf8_value}`);

  // document.cookie vs. mixed cookie encodings
  assert_equals(await getCookieStringDocument(), undefined,
                'document.cookie yields nothing if non-UTF-8 cookies served');

  // Cookie Store API vs. mixed cookie encodings
  const cookies = await cookieStore.getAll();
  assert_equals(cookies.length, 2, 'Both cookies are present by name');
  assert_not_equals(
      cookies.find(c => c.name === 'utf8'), undefined, 'utf8 cookie present');
  assert_equals(
      cookies.find(c => c.name === 'utf8').value, value,
      'utf8 cookie value should be preserved');
  assert_not_equals(
      cookies.find(c => c.name === 'nonutf8'), undefined,
      'nonutf8 cookie present');
  assert_equals(
      cookies.find(c => c.name === 'nonutf8').value, '',
      'nonutf8 cookie value is empty');
}, 'Non-UTF-8 cookies dropped by browser and not reflected by API');

Appears to be:

  • document.cookie yields nothing if any of the cookies have non-UTF-8 value (!)
  • Cookie Store API getAll() yields UTF-8 cookies with name/value as expected; cookies with non-UTF-8 values are present (!) with empty string as value (!)

I should also try non-UTF-8 cookie names.

@inexorabletash
Copy link
Collaborator

inexorabletash commented Jul 6, 2023

Names:

cookie_test(async t => {
  const name = 'name\u00C6'; // Æ
  const utf8_name = 'name\xC3\x86'; // UTF-8 encoding
  const non_utf8_name = 'name\xC6'; // ISO-8859-1 encoding

  await setCookieBinaryHttp(`${utf8_name}=value1; path=/`);
  assert_equals(await getCookieBinaryHttp(), `${utf8_name}=value1`);

  // Set a non-UTF-8 cookie on the server, verify the server thinks it's there.
  await setCookieBinaryHttp(`${non_utf8_name}=value2; path=/`);
  assert_equals(
      await getCookieBinaryHttp(),
      `${utf8_name}=value1; ${non_utf8_name}=value2`);

  // document.cookie vs. mixed cookie encodings
  assert_equals(await getCookieStringDocument(), undefined,
                'document.cookie yields nothing if non-UTF-8 cookies served');

  // Cookie Store API vs. mixed cookie encodings
  const cookies = await cookieStore.getAll();
  assert_equals(cookies.length, 2, 'Both cookies are present');
  assert_not_equals(
      cookies.find(c => c.name === name), undefined, 'utf8 cookie present');
  assert_equals(
      cookies.find(c => c.name === name).value, 'value1',
    'utf8 cookie value should be preserved');
  assert_not_equals(
      cookies.find(c => c.name === ''), undefined, 'nonutf8 cookie name empty');
  assert_equals(
      cookies.find(c => c.name === '').value, 'value2',
    'nonutf8 cookie value should be preserved');
}, 'Non-UTF-8 cookie names dropped by browser and not reflected by API');

ISTM this behavior would be described (as @annevk mentions) by changing the decode definition to be:

To decode a value, run UTF-8 decode without BOM or fail on value, returning the result if not failure, or the empty string otherwise.

@annevk
Copy link
Collaborator Author

annevk commented Jul 6, 2023

Thanks for playing with this! I think consistently excluding is the way to go, but willing to be convinced of better alternatives.

@inexorabletash
Copy link
Collaborator

Seems like 3 options:

  1. For any given part of a cookie - if it fails to decode, yield empty string for that part (i.e. what Chrome appears to be doing).
  2. If any given part of a cookie fails to decode, elide that cookie .
  3. If any given part of any cookie fails to decode, elide all cookies (i.e. what document.cookie appears to do in Chrome)

I think 1 is weird and not intended behavior. I'm optimistic that it's web-compatible to move away from that. Between 2 or 3, any preference?

(Integration between this spec and the RFC is not incredibly rigorous at the moment; I could imagine one approach that takes the cookie-string output of the RFC and re-parses it, which could involve attempting to decode the whole string and therefore option 3 "falls out"? That also seems a bit weird, though.)

@annevk
Copy link
Collaborator Author

annevk commented Jul 7, 2023

Oh, I didn't realize document.cookie excluded all. That does seem weird, yeah.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants