Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

URL: Fix URLSearchParams to further avoid decoding URIError #1174

Merged
merged 1 commit into from
Mar 16, 2022

Conversation

Krinkle
Copy link
Contributor

@Krinkle Krinkle commented Mar 15, 2022

Ref #1173.
Ref #4.

Co-authored-by: David Chan david@troi.org

@Krinkle Krinkle requested a review from a team as a code owner March 15, 2022 22:32
@origamiserviceuser origamiserviceuser added this to Backlog in Origami ✨ Mar 15, 2022
@github-actions github-actions bot added the library Relates to an Origami library label Mar 15, 2022
@JakeChampion
Copy link
Owner

Thanks for this @Krinkle - I've triggered CI to run the tests, they should be running at https://github.com/Financial-Times/polyfill-library/actions/runs/1989744310

Comment on lines +66 to +85
// This can't simply use decodeURIComponent (part of ECMAScript) as that's limited to
// decoding to valid UTF-8 only. It throws URIError for literals that look like percent
// encoding (e.g. `x=%`, `x=%a`, and `x=a%2sf`) and for non-UTF8 binary data that was
// percent encoded and cannot be turned back into binary within a JavaScript string.
//
// The spec deals with this as follows:
// * Read input as UTF-8 encoded bytes. This needs low-level access or a modern
// Web API, like TextDecoder. Old browsers don't have that, and it'd a large
// dependency to add to this polyfill.
// * For each percentage sign followed by two hex, blindly decode the byte in binary
// form. This would require TextEncoder to not corrupt multi-byte chars.
// * Replace any bytes that would be invalid under UTF-8 with U+FFFD.
//
// Instead we:
// * Use the fact that UTF-8 is designed to make validation easy in binary.
// You don't have to decode first. There are only a handful of valid prefixes and
// ranges, per RFC 3629. <https://datatracker.ietf.org/doc/html/rfc3629#section-3>
// * Safely create multi-byte chars with decodeURIComponent, by only passing it
// valid and full characters (e.g. "%F0" separately from "%F0%9F%92%A9" throws).
// Anything else is kept as literal or replaced with U+FFFD, as per the URL spec.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is excellent, thank you for this detailed comment

@JakeChampion JakeChampion merged commit 0ece79c into JakeChampion:master Mar 16, 2022
Origami ✨ automation moved this from Backlog to Done Mar 16, 2022
@Krinkle Krinkle deleted the fix-percent-decode branch March 16, 2022 10:28
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 17, 2023
@robertboulton robertboulton removed this from Done in Origami ✨ Jul 26, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
library Relates to an Origami library
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants