URL: Fix URLSearchParams to further avoid decoding URIError #1174
Merged
Ref JakeChampion#1173. Ref JakeChampion#4. Co-authored-by: David Chan <david@troi.org>
7e0b5f0 to d80a5ce
Thanks for this @Krinkle - I've triggered CI to run the tests, they should be running at https://github.com/Financial-Times/polyfill-library/actions/runs/1989744310
Comment on lines +66 to +85
// This can't simply use decodeURIComponent (part of ECMAScript) as that's limited to
// decoding to valid UTF-8 only. It throws URIError for literals that look like percent
// encoding (e.g. `x=%`, `x=%a`, and `x=a%2sf`) and for non-UTF8 binary data that was
// percent encoded and cannot be turned back into binary within a JavaScript string.
//
// The spec deals with this as follows:
// * Read input as UTF-8 encoded bytes. This needs low-level access or a modern
//   Web API, like TextDecoder. Old browsers don't have that, and it'd be a large
//   dependency to add to this polyfill.
// * For each percentage sign followed by two hex digits, blindly decode the byte in binary
//   form. This would require TextEncoder to not corrupt multi-byte chars.
// * Replace any bytes that would be invalid under UTF-8 with U+FFFD.
//
// Instead we:
// * Use the fact that UTF-8 is designed to make validation easy in binary.
//   You don't have to decode first. There are only a handful of valid prefixes and
//   ranges, per RFC 3629. <https://datatracker.ietf.org/doc/html/rfc3629#section-3>
// * Safely create multi-byte chars with decodeURIComponent, by only passing it
//   valid and full characters (e.g. "%F0" separately from "%F0%9F%92%A9" throws).
//   Anything else is kept as literal or replaced with U+FFFD, as per the URL spec.
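The approach described in the comment above can be illustrated with a rough sketch. This is not the code merged in this PR: the names `percentDecodeLenient`, `VALID_UTF8`, `HEX_BYTE` and `CONT` are hypothetical, and the byte ranges are transcribed from the UTF-8 syntax table in RFC 3629. It also simplifies the spec by emitting one U+FFFD per stray byte rather than one per truncated sequence, and it leaves out the `+` to space step of form decoding.

```javascript
// Hypothetical sketch, not the polyfill's actual implementation.
// One percent-encoded UTF-8 continuation byte (0x80..0xBF).
var CONT = '%[89ab][0-9a-f]';

// One complete, well-formed UTF-8 sequence in percent-encoded form,
// following the byte ranges in RFC 3629 (no overlongs, no surrogates).
var VALID_UTF8 = new RegExp('^(?:' + [
  '%[0-7][0-9a-f]',                  // 00..7F
  '%(?:c[2-9a-f]|d[0-9a-f])' + CONT, // C2..DF + 1 continuation
  '%e0%[ab][0-9a-f]' + CONT,         // E0, A0..BF, continuation
  '%e[1-9a-cef]' + CONT + CONT,      // E1..EC and EE..EF
  '%ed%[89][0-9a-f]' + CONT,         // ED, 80..9F (excludes surrogates)
  '%f0%[9ab][0-9a-f]' + CONT + CONT, // F0, 90..BF
  '%f[1-3]' + CONT + CONT + CONT,    // F1..F3
  '%f4%8[0-9a-f]' + CONT + CONT      // F4, 80..8F (up to U+10FFFF)
].join('|') + ')', 'i');

// A percent sign followed by two hex digits, i.e. any encoded byte.
var HEX_BYTE = /^%[0-9a-f]{2}/i;

function percentDecodeLenient(input) {
  var out = '';
  var i = 0;
  while (i < input.length) {
    if (input.charAt(i) !== '%') {
      out += input.charAt(i);
      i += 1;
      continue;
    }
    // The longest valid sequence is 4 bytes, i.e. 12 characters.
    var chunk = input.slice(i, i + 12);
    var valid = VALID_UTF8.exec(chunk);
    if (valid) {
      // Only validated, complete sequences reach decodeURIComponent,
      // so it cannot throw URIError here.
      out += decodeURIComponent(valid[0]);
      i += valid[0].length;
    } else if (HEX_BYTE.test(chunk)) {
      // An encoded byte that is not part of a valid sequence.
      out += '\uFFFD';
      i += 3;
    } else {
      // A literal "%" that is not followed by two hex digits.
      out += '%';
      i += 1;
    }
  }
  return out;
}
```

Under these assumptions, `percentDecodeLenient('x=a%2sf')` returns the input unchanged, `percentDecodeLenient('%F0%9F%92%A9')` returns the single character U+1F4A9, and `percentDecodeLenient('%F0')` returns `'\uFFFD'`, whereas `decodeURIComponent` would throw URIError for the first and last of these.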
This is excellent, thank you for this detailed comment
JakeChampion approved these changes on Mar 16, 2022
Ref #1173.
Ref #4.
Co-authored-by: David Chan david@troi.org