Use punycode more appropriately #7

Merged: 1 commit merged on Jan 9, 2017


@domenic (Member) commented Jan 6, 2017

Previously we used punycode's toASCII and toUnicode exports, which themselves implement incomplete parts of TR46 (with buggy "is ASCII" checks until recently; see #5 and mathiasbynens/punycode.js#59). Now we use the lower-level encode and decode exports, with appropriate tweaks to the surrounding code so that it more fully conforms to the surrounding TR46 algorithm steps.

Fixes #5.
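As a rough illustration of the split between the low-level `encode` export and the TR46 steps around it, here is a minimal sketch (not tr46's actual code) of RFC 3492 Punycode encoding, followed by the per-label "xn--" handling that the caller now does itself. The names `punycodeEncode`, `labelToASCII`, and `domainToASCII` are hypothetical; in tr46 the encoding step would come from punycode.js's `encode`, and the real ToASCII also performs mapping, normalization, and validation, which this sketch skips.

```javascript
// Minimal sketch of RFC 3492 Punycode encoding plus a TR46-style per-label
// wrapper. Hypothetical helper names; not tr46's actual code.
// RFC 3492 overflow checks are omitted for brevity.
const BASE = 36, TMIN = 1, TMAX = 26, SKEW = 38, DAMP = 700;
const INITIAL_BIAS = 72, INITIAL_N = 128;

// Bias adaptation function (RFC 3492, section 6.1).
function adapt(delta, numPoints, firstTime) {
  delta = firstTime ? Math.floor(delta / DAMP) : Math.floor(delta / 2);
  delta += Math.floor(delta / numPoints);
  let k = 0;
  while (delta > Math.floor(((BASE - TMIN) * TMAX) / 2)) {
    delta = Math.floor(delta / (BASE - TMIN));
    k += BASE;
  }
  return k + Math.floor(((BASE - TMIN + 1) * delta) / (delta + SKEW));
}

// Map a digit 0..25 to 'a'..'z' and 26..35 to '0'..'9'.
function digitToChar(d) {
  return String.fromCharCode(d < 26 ? d + 97 : d + 22);
}

// Low-level encode: Unicode label -> Punycode string, without "xn--".
function punycodeEncode(input) {
  const codePoints = Array.from(input, (c) => c.codePointAt(0));
  let output = codePoints.filter((cp) => cp < 0x80)
    .map((cp) => String.fromCharCode(cp)).join("");
  const basicLength = output.length;
  let handled = basicLength;
  if (basicLength > 0) output += "-";

  let n = INITIAL_N, delta = 0, bias = INITIAL_BIAS;
  while (handled < codePoints.length) {
    // Find the smallest code point that has not been handled yet.
    let m = Infinity;
    for (const cp of codePoints) if (cp >= n && cp < m) m = cp;
    delta += (m - n) * (handled + 1);
    n = m;
    for (const cp of codePoints) {
      if (cp < n) delta++;
      if (cp === n) {
        // Emit delta as a generalized variable-length integer.
        let q = delta;
        for (let k = BASE; ; k += BASE) {
          const t = k <= bias ? TMIN : k >= bias + TMAX ? TMAX : k - bias;
          if (q < t) break;
          output += digitToChar(t + ((q - t) % (BASE - t)));
          q = Math.floor((q - t) / (BASE - t));
        }
        output += digitToChar(q);
        bias = adapt(delta, handled + 1, handled === basicLength);
        delta = 0;
        handled++;
      }
    }
    delta++;
    n++;
  }
  return output;
}

// TR46-style caller: only labels containing non-ASCII become "xn--" + encode(label).
function labelToASCII(label) {
  return /[^\x00-\x7F]/.test(label) ? "xn--" + punycodeEncode(label) : label;
}

function domainToASCII(domain) {
  return domain.split(".").map(labelToASCII).join(".");
}

console.log(domainToASCII("bücher.example")); // xn--bcher-kva.example
```

Because the non-ASCII check and the "xn--" prefix live in the caller, the TR46 algorithm steps decide when a label is converted, rather than the punycode library's own partial TR46 logic.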


There are still 488 failing tests, both before and after this change. However, it improves jsdom/whatwg-url's results on the host parser tests in web-platform-tests/wpt#4504 to only 3 failures, none of which seem to be TR46-related.

```diff
@@ -12,6 +12,16 @@ function normalize(str) { // fix bug in v8
return str.split('\u0000').map(function (s) { return s.normalize('NFC'); }).join('\u0000');
}

function containsNonASCII(str) {
```
Member

Regexes seem to be much faster for this.

Member Author

Dang, I should have checked.
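The reviewer's point can be sketched as two interchangeable implementations. The diff excerpt shows only the signature of `containsNonASCII`, so the loop version below (`containsNonASCIILoop`) is a hypothetical reconstruction, and no timings are claimed here; relative speed depends on the JavaScript engine.

```javascript
// Two ways to ask "does this string contain any non-ASCII code unit?".
// containsNonASCIILoop is a hypothetical reconstruction for comparison.

// Character-by-character scan over UTF-16 code units.
function containsNonASCIILoop(str) {
  for (let i = 0; i < str.length; i++) {
    if (str.charCodeAt(i) > 0x7F) return true;
  }
  return false;
}

// Single regex test, as suggested in review. No "g" flag, so test() is stateless.
const NON_ASCII = /[^\x00-\x7F]/;
function containsNonASCII(str) {
  return NON_ASCII.test(str);
}

for (const s of ["example.com", "bücher", "\u{1F600}", ""]) {
  console.log(JSON.stringify(s), containsNonASCII(s), containsNonASCIILoop(s));
}
```

Keeping the regex at module scope without the `g` flag matters: a global regex carries `lastIndex` state between `test` calls, which can make repeated checks on different strings return inconsistent results.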

@Sebmaster (Member)

Regarding the test failures: a lot of them come down to Unicode's inadequate description of the test suite, so getting the failure count down is basically only a heuristic.

@Sebmaster (Member)

We should also switch to punycode.js proper (the npm package) instead of the Node.js built-in module while we're at it.

Previously we used punycode's toASCII and toUnicode exports, which themselves implement incomplete parts of TR46 (with buggy "is ASCII" checks until recently; see jsdom#5 and mathiasbynens/punycode.js#59). Now we use the lower-level encode and decode exports, with appropriate tweaks to the surrounding code so that it more fully conforms to the surrounding TR46 algorithm steps.

This also upgrades to the latest punycode.js published on npm, instead of using the now-deprecated one bundled with Node.js.

Fixes jsdom#5.
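For the ToUnicode direction, here is a minimal sketch (not tr46's actual code): a self-contained RFC 3492 decoder standing in for the low-level `decode` export, with the caller stripping the "xn--" prefix itself. `punycodeDecode`, `labelToUnicode`, and `domainToUnicode` are hypothetical names; RFC 3492's overflow checks and TR46's error recording and validation are left out.

```javascript
// Minimal sketch of RFC 3492 Punycode decoding plus a TR46-style per-label
// wrapper. Hypothetical helper names; not tr46's actual code.
// RFC 3492 overflow checks are omitted for brevity.
const BASE = 36, TMIN = 1, TMAX = 26, SKEW = 38, DAMP = 700;
const INITIAL_BIAS = 72, INITIAL_N = 128;

// Bias adaptation function (RFC 3492, section 6.1).
function adapt(delta, numPoints, firstTime) {
  delta = firstTime ? Math.floor(delta / DAMP) : Math.floor(delta / 2);
  delta += Math.floor(delta / numPoints);
  let k = 0;
  while (delta > Math.floor(((BASE - TMIN) * TMAX) / 2)) {
    delta = Math.floor(delta / (BASE - TMIN));
    k += BASE;
  }
  return k + Math.floor(((BASE - TMIN + 1) * delta) / (delta + SKEW));
}

// Map 'a'..'z' / 'A'..'Z' to 0..25 and '0'..'9' to 26..35; BASE means invalid.
function charToDigit(c) {
  if (c >= 0x30 && c <= 0x39) return c - 22;
  if (c >= 0x41 && c <= 0x5A) return c - 0x41;
  if (c >= 0x61 && c <= 0x7A) return c - 0x61;
  return BASE;
}

// Low-level decode: Punycode string (without "xn--") -> Unicode string.
function punycodeDecode(input) {
  const output = [];
  const basicEnd = Math.max(input.lastIndexOf("-"), 0);
  for (let j = 0; j < basicEnd; j++) {
    const cp = input.charCodeAt(j);
    if (cp >= 0x80) throw new RangeError("Non-basic code point before delimiter");
    output.push(cp);
  }
  let n = INITIAL_N, i = 0, bias = INITIAL_BIAS;
  for (let idx = basicEnd > 0 ? basicEnd + 1 : 0; idx < input.length; ) {
    const oldi = i;
    // Read one generalized variable-length integer into i.
    for (let w = 1, k = BASE; ; k += BASE) {
      if (idx >= input.length) throw new RangeError("Truncated input");
      const digit = charToDigit(input.charCodeAt(idx++));
      if (digit >= BASE) throw new RangeError("Invalid digit");
      i += digit * w;
      const t = k <= bias ? TMIN : k >= bias + TMAX ? TMAX : k - bias;
      if (digit < t) break;
      w *= BASE - t;
    }
    const len = output.length + 1;
    bias = adapt(i - oldi, len, oldi === 0);
    n += Math.floor(i / len);
    i %= len;
    output.splice(i, 0, n); // insert code point n at position i
    i++;
  }
  return String.fromCodePoint(...output);
}

// TR46-style caller: strip the "xn--" prefix itself, then decode the rest.
function labelToUnicode(label) {
  return /^xn--/i.test(label) ? punycodeDecode(label.slice(4)) : label;
}

function domainToUnicode(domain) {
  return domain.split(".").map(labelToUnicode).join(".");
}

console.log(domainToUnicode("xn--mnchen-3ya.de")); // münchen.de
```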
@domenic domenic merged commit ca795bf into jsdom:master Jan 9, 2017
@domenic domenic deleted the better-punycode branch January 9, 2017 20:56
2 participants