Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Subsume json #7985

Merged
merged 6 commits into from
May 19, 2018
Merged

Subsume json #7985

merged 6 commits into from
May 19, 2018

Conversation

jridgewell
Copy link
Member

https://github.com/tc39/proposal-json-superset is at stage 3.

This allows U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR to appear unescaped inside strings and directives.

@jridgewell jridgewell added PR: Spec Compliance 👓 A type of pull request used for our changelog categories PR: New Feature 🚀 A type of pull request used for our changelog categories labels May 19, 2018
expect("
".length).toBe(1);
// ^ That's a U+2028 LINE SEPARATOR UTF-16 char

expect("\
".length).toBe(0);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe someone more familiar with the spec can explain why \ followed by U+2028 gets parsed as an empty string.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, that seems like a bug. The proposal doesn't appear to change the behavior for escaped unicode characters, just unescaped ones, and JSON.parse("\\\u2028") throws, so this probably should too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't a parse issue, I'm reasonably certain this is according to the spec.

 ✓ jridgewell@Dandy  ~/src/babel  subsume-json*  cat test.js
console.log("\
".length)

 ✓ jridgewell@Dandy  ~/src/babel  subsume-json*  nvm exec 0.12 node test.js
Running node v0.12.10 (npm v2.14.9)
0

 ✓ jridgewell@Dandy  ~/src/babel  subsume-json*  nvm exec 10 node test.js
Running node v10.1.0 (npm v5.6.0)
0

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohh you're right, I looked at the wrong escape logic. This is the handling for cases like

var foo = "first\
second";

where the backslash escapes the newline, and those are treated as non-existent charaters. See https://www.ecma-international.org/ecma-262/8.0/#sec-static-semantics-sv

The SV of DoubleStringCharacter::LineContinuation is the empty code unit sequence.

and

The SV of SingleStringCharacter::LineContinuation is the empty code unit sequence.

Copy link
Contributor

@mathiasbynens mathiasbynens May 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is indeed a LineContinuation. The proposal doesn’t change how "\<U+2028>" or "\<U+2029>" behave.

As a side note, please only ever link to https://tc39.github.io/ecma262/, not to the versioned snapshot on ecma-international.org.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mathiasbynens I think it depends what you're linking. I generally purposefully link to the explicit versioned page because it means that links won't break over time if a section gets renamed or something. If it's something that's specifically landed since the last version, then totally, but that isn't the case here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you’re trying to avoid links that “break over time”, that’s all the more reason to link to the latest draft! The content in the link you posted was broken the moment a new commit was pushed to https://tc39.github.io/ecma262/ that was not reflected in the snapshot.

IDs are preserved even when section names change, so that’s not an issue. And in this PR we’re discussing something that hasn’t even landed in the draft spec yet, so there is no reason at all to link to an outdated spec.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The content in the link you posted was broken the moment

That's why I said it is dependent on what you're linking. I'm not trying to link to an ever-updating version of the document, because this specific discussion is about the current state of things, and I want it to stay pointing at the current state of things at the time of the discussion. This is the same reason I always link to files in git repos with SHA-specific links, rather than to master. If I could link to a specific SHA when linking to https://tc39.github.io/ecma262/ I'd happily do so, but I don't know if that's possible. If so, I'd be happy to do so. If my objective was to link to a persistent updating version of that section, I'd 100% agree with what you're saying, but that was not my intention in this discussion.

IDs are preserved even when section names change, so that’s not an issue.

That's good to know. I've never heard that as an official policy anywhere, do you have a link to support that?

And in this PR we’re discussing something that hasn’t even landed in the draft spec yet, so there is no reason at all to link to an outdated spec.

The text I quoted is unchanged from what I linked to in both the proposal and the current spec text, so it seems entirely my choice.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same reason I always link to files in git repos with SHA-specific links, rather than to master.

This is an apples-to-oranges comparison. Linking to specific lines of code is different from linking to a spec.

When you link to the Babel repo itself, do you link to https://github.com/babel/babel or to https://github.com/babel/babel/tree/c558dedd7b5ffacf4d462177857cc031e7f4efe0? Why would it be different when linking to a spec?

That's good to know. I've never heard that as an official policy anywhere, do you have a link to support that?

It’s why we have es6num entries in the spec.

I'm not trying to link to an ever-updating version of the document, because this specific discussion is about the current state of things

The only document reflecting the current state of things is the draft. In this particular case, it’s the draft + the proposal. In this context especially, linking to an outdated snapshot doesn’t make sense.

Web standards are a moving target, and pretending otherwise is harmful. The WHATWG FAQ has a useful entry on this, which applies not just to their own standards, but to web standards in general: https://whatwg.org/faq#stable

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an apples-to-oranges comparison. Linking to specific lines of code is different from linking to a spec.

I guess this is the primary point I disagree on. Any discussion has both a content, and a temporal aspect. If my intention was to answer "what section of the spec defines the string-value algorithm, I'd 100% agree that linking to https://tc39.github.io/ecma262/ with an ID alone would be the right call. But that's not what I was doing, I was quoting text from the spec at a specific point in time. That may change in the future, that's great, but I want to my link to stay pointing at the exact version that I took the quoted text from, because this discussion focuses on the spec as it is today, but as it will be in a year or whatever.

When you link to the Babel repo itself, do you link to babel/babel or to c558ded?

That would entirely depend on if I'm attempting to link to the repo in general, or the repo at a specific point in time.

It’s why we have es6num entries in the spec.

Awesome, I'm happy to hear that it is an official goal to keep IDs consistent.

Web standards are a moving target, and pretending otherwise is harmful. The WHATWG FAQ has a useful entry on this, which applies not just to their own standards, but to web standards in general: whatwg.org/faq#stable

I don't disagree, and I'm in no way pretending otherwise, but it was my only option because I can't provide a moment in time to view the spec on Github, and the most recent immutable version I could reach for was the published 8.0 document. I'd love to be able to link to https://tc39.github.io/ecma262/?SHA=ABC... to link to a specific moment in time. Then the spec page itself could trivially say "hey this section has been updated since the link was create" or something, but it doesn't currently do that.

The fact that specs are moving toward being living standards is fantastic and I fully support it, but it doesn't mean it isn't useful to be able to link to a spec at a specific moment.

@babel-bot
Copy link
Collaborator

babel-bot commented May 19, 2018

Build successful! You can test your changes in the REPL here: https://babeljs.io/repl/build/8067/

@babel babel deleted a comment from babel-bot May 19, 2018
@@ -1,3 +1,4 @@
// break directive parsing
breakDirectiveParsing;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can do

("before
after");

instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

@loganfsmyth loganfsmyth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM other than the README. I don't feel strongly about the breakDirectiveParsing; thing if you want to leave it.

@@ -0,0 +1,60 @@
# @babel/plugin-proposal-optional-catch-binding
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Old README content.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

expect("
".length).toBe(1);
// ^ That's a U+2028 LINE SEPARATOR UTF-16 char

expect("\
".length).toBe(0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, that seems like a bug. The proposal doesn't appear to change the behavior for escaped unicode characters, just unescaped ones, and JSON.parse("\\\u2028") throws, so this probably should too.

https://github.com/tc39/proposal-json-superset is at stage 3.

This allows U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR to appear unescaped inside strings and directives.

export default declare(api => {
api.assertVersion(7);
const regex = /(^|[^\\])([\u2028\u2029])/g;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This type of check with [^\\] is always a bit dangerous because it will fail on code like

"before\\
after";

since there is a slash beforehand, but it was already escaped and thus not an issue. Not super common, but it is an edge case that isn't handled here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, fixed.

// If there's an odd number, that means the separator itself was escaped.
// "\X" escapes X.
// "\\X" escapes the backslash, so X is unescaped.
const isEscaped = escapes.length % 2 === 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice fix!

Copy link
Member

@existentialism existentialism left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@jridgewell jridgewell merged commit 0879a6d into babel:master May 19, 2018
@jridgewell jridgewell deleted the subsume-json branch May 19, 2018 19:32
storyn26383 pushed a commit to UniSharp/babel that referenced this pull request Aug 6, 2018
* Add Subsume JSON transform

https://github.com/tc39/proposal-json-superset is at stage 3.

This allows U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR to appear unescaped inside strings and directives.

* Move to Stage 3

* Break diretive parsing

* Update README

* Handle multi-escape sequences

* Remove babylon files after rename
@lock lock bot added the outdated A closed issue/PR that is archived due to age. Recommended to make a new issue label Oct 4, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Oct 4, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
outdated A closed issue/PR that is archived due to age. Recommended to make a new issue PR: New Feature 🚀 A type of pull request used for our changelog categories PR: Spec Compliance 👓 A type of pull request used for our changelog categories
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants