-
-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Subsume json #7985
Subsume json #7985
Conversation
expect(" ".length).toBe(1); | ||
// ^ That's a U+2028 LINE SEPARATOR UTF-16 char | ||
|
||
expect("\ ".length).toBe(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe someone more familiar with the spec can explain why \
followed by U+2028 gets parsed as an empty string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh, that seems like a bug. The proposal doesn't appear to change the behavior for escaped unicode characters, just unescaped ones, and JSON.parse("\\\u2028")
throws, so this probably should too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't a parse issue, I'm reasonably certain this is according to the spec.
✓ jridgewell@Dandy ~/src/babel subsume-json* cat test.js
console.log("\
".length)
✓ jridgewell@Dandy ~/src/babel subsume-json* nvm exec 0.12 node test.js
Running node v0.12.10 (npm v2.14.9)
0
✓ jridgewell@Dandy ~/src/babel subsume-json* nvm exec 10 node test.js
Running node v10.1.0 (npm v5.6.0)
0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohh you're right, I looked at the wrong escape logic. This is the handling for cases like
var foo = "first\
second";
where the backslash escapes the newline, and those are treated as non-existent charaters. See https://www.ecma-international.org/ecma-262/8.0/#sec-static-semantics-sv
The SV of DoubleStringCharacter::LineContinuation is the empty code unit sequence.
and
The SV of SingleStringCharacter::LineContinuation is the empty code unit sequence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is indeed a LineContinuation
. The proposal doesn’t change how "\<U+2028>"
or "\<U+2029>"
behave.
As a side note, please only ever link to https://tc39.github.io/ecma262/, not to the versioned snapshot on ecma-international.org.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mathiasbynens I think it depends what you're linking. I generally purposefully link to the explicit versioned page because it means that links won't break over time if a section gets renamed or something. If it's something that's specifically landed since the last version, then totally, but that isn't the case here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you’re trying to avoid links that “break over time”, that’s all the more reason to link to the latest draft! The content in the link you posted was broken the moment a new commit was pushed to https://tc39.github.io/ecma262/ that was not reflected in the snapshot.
IDs are preserved even when section names change, so that’s not an issue. And in this PR we’re discussing something that hasn’t even landed in the draft spec yet, so there is no reason at all to link to an outdated spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The content in the link you posted was broken the moment
That's why I said it is dependent on what you're linking. I'm not trying to link to an ever-updating version of the document, because this specific discussion is about the current state of things, and I want it to stay pointing at the current state of things at the time of the discussion. This is the same reason I always link to files in git repos with SHA-specific links, rather than to master
. If I could link to a specific SHA when linking to https://tc39.github.io/ecma262/ I'd happily do so, but I don't know if that's possible. If so, I'd be happy to do so. If my objective was to link to a persistent updating version of that section, I'd 100% agree with what you're saying, but that was not my intention in this discussion.
IDs are preserved even when section names change, so that’s not an issue.
That's good to know. I've never heard that as an official policy anywhere, do you have a link to support that?
And in this PR we’re discussing something that hasn’t even landed in the draft spec yet, so there is no reason at all to link to an outdated spec.
The text I quoted is unchanged from what I linked to in both the proposal and the current spec text, so it seems entirely my choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the same reason I always link to files in git repos with SHA-specific links, rather than to
master
.
This is an apples-to-oranges comparison. Linking to specific lines of code is different from linking to a spec.
When you link to the Babel repo itself, do you link to https://github.com/babel/babel or to https://github.com/babel/babel/tree/c558dedd7b5ffacf4d462177857cc031e7f4efe0? Why would it be different when linking to a spec?
That's good to know. I've never heard that as an official policy anywhere, do you have a link to support that?
It’s why we have es6num
entries in the spec.
I'm not trying to link to an ever-updating version of the document, because this specific discussion is about the current state of things
The only document reflecting the current state of things is the draft. In this particular case, it’s the draft + the proposal. In this context especially, linking to an outdated snapshot doesn’t make sense.
Web standards are a moving target, and pretending otherwise is harmful. The WHATWG FAQ has a useful entry on this, which applies not just to their own standards, but to web standards in general: https://whatwg.org/faq#stable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an apples-to-oranges comparison. Linking to specific lines of code is different from linking to a spec.
I guess this is the primary point I disagree on. Any discussion has both a content, and a temporal aspect. If my intention was to answer "what section of the spec defines the string-value algorithm, I'd 100% agree that linking to https://tc39.github.io/ecma262/ with an ID alone would be the right call. But that's not what I was doing, I was quoting text from the spec at a specific point in time. That may change in the future, that's great, but I want to my link to stay pointing at the exact version that I took the quoted text from, because this discussion focuses on the spec as it is today, but as it will be in a year or whatever.
When you link to the Babel repo itself, do you link to babel/babel or to c558ded?
That would entirely depend on if I'm attempting to link to the repo in general, or the repo at a specific point in time.
It’s why we have es6num entries in the spec.
Awesome, I'm happy to hear that it is an official goal to keep IDs consistent.
Web standards are a moving target, and pretending otherwise is harmful. The WHATWG FAQ has a useful entry on this, which applies not just to their own standards, but to web standards in general: whatwg.org/faq#stable
I don't disagree, and I'm in no way pretending otherwise, but it was my only option because I can't provide a moment in time to view the spec on Github, and the most recent immutable version I could reach for was the published 8.0
document. I'd love to be able to link to https://tc39.github.io/ecma262/?SHA=ABC...
to link to a specific moment in time. Then the spec page itself could trivially say "hey this section has been updated since the link was create" or something, but it doesn't currently do that.
The fact that specs are moving toward being living standards is fantastic and I fully support it, but it doesn't mean it isn't useful to be able to link to a spec at a specific moment.
Build successful! You can test your changes in the REPL here: https://babeljs.io/repl/build/8067/ |
@@ -1,3 +1,4 @@ | |||
// break directive parsing | |||
breakDirectiveParsing; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can do
("before
after");
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM other than the README. I don't feel strongly about the breakDirectiveParsing;
thing if you want to leave it.
@@ -0,0 +1,60 @@ | |||
# @babel/plugin-proposal-optional-catch-binding |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Old README content.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
expect(" ".length).toBe(1); | ||
// ^ That's a U+2028 LINE SEPARATOR UTF-16 char | ||
|
||
expect("\ ".length).toBe(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh, that seems like a bug. The proposal doesn't appear to change the behavior for escaped unicode characters, just unescaped ones, and JSON.parse("\\\u2028")
throws, so this probably should too.
https://github.com/tc39/proposal-json-superset is at stage 3. This allows U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR to appear unescaped inside strings and directives.
|
||
export default declare(api => { | ||
api.assertVersion(7); | ||
const regex = /(^|[^\\])([\u2028\u2029])/g; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This type of check with [^\\]
is always a bit dangerous because it will fail on code like
"before\\
after";
since there is a slash beforehand, but it was already escaped and thus not an issue. Not super common, but it is an edge case that isn't handled here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, fixed.
// If there's an odd number, that means the separator itself was escaped. | ||
// "\X" escapes X. | ||
// "\\X" escapes the backslash, so X is unescaped. | ||
const isEscaped = escapes.length % 2 === 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice fix!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
* Add Subsume JSON transform https://github.com/tc39/proposal-json-superset is at stage 3. This allows U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR to appear unescaped inside strings and directives. * Move to Stage 3 * Break diretive parsing * Update README * Handle multi-escape sequences * Remove babylon files after rename
https://github.com/tc39/proposal-json-superset is at stage 3.
This allows U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR to appear unescaped inside strings and directives.