Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Numeric separators #20324

Merged
merged 11 commits into from
Dec 9, 2017
Merged

Numeric separators #20324

merged 11 commits into from
Dec 9, 2017

Conversation

weswigham
Copy link
Member

This implements the now stage-3 numeric separators proposal.

This enables us to parse literals with underscore separators such as 1_000_000_000, 0b1101_0101_1001, and 0xAE_FE_2F. At present, these are always downleveled to a non-underscore-containing version.

@@ -345,7 +345,7 @@ namespace ts {
export function getLiteralText(node: LiteralLikeNode, sourceFile: SourceFile) {
// If we don't need to downlevel and we can reach the original source text using
// the node's parent reference, then simply get the text as it was originally written.
if (!nodeIsSynthesized(node) && node.parent) {
if (!nodeIsSynthesized(node) && node.parent && !(isNumericLiteral(node) && node.numericLiteralFlags & TokenFlags.ContainsSeparator)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should emit them as is for ESNext i would say..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acctually ES2018 and later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add an es2018 target now or wait until the ES2018 spec is official?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mhegazy said offline that we should open an issue to add an ES2018 target, since there are some lib additions to make in addition to adding an output mode that preserves these separators.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened #20342 to track.

@mhegazy
Copy link
Contributor

mhegazy commented Nov 29, 2017

can you run the perf tests as well.

Copy link
Member

@rbuckton rbuckton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you take a look at https://docs.google.com/presentation/d/1E8yKRJwA4iX_EctpY48KGBwAsCtNZpTOz4wu7tbxTqE/edit#slide=id.g29f65e0163_0_202 to make sure we test those examples as well? It looks like you covered most of them, but I didn't see tests for a few like ._ and 1\u005F01234 that it would be worth verifying we cover.

error(Diagnostics.Numeric_seperators_are_not_allowed_at_the_end_of_a_literal, 1);
pos++;
}
return result = (result || "") + text.substring(start, pos);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You definitely assign result to at least "", so why not just initialize result above?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, you don't need to reassign result here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You definitely assign result to at least "", so why not just initialize result above?

result is only assigned in the presence of a separator.

Also, you don't need to reassign result here.

Done.

break;
}
if (text.charCodeAt(pos - 1) === CharacterCodes._) {
pos--;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do this exact set of three lines multiple times in this PR. Would it make sense to provide an optional argument to error to override the position it uses?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not add a look-ahead and check whether the subsequent character could continue the literal? That way you don't need to rewind at the end.

start = pos;
continue;
}
if (isDigit(ch)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering the similarities between this, scanHexDigit, and scanBinaryOrOctalDigits perhaps we should unify (at least in part) these three functions by having scanNumberFragment take a callback for isDigit and then calling it as scanNumberFragment(isDigit), scanNumberFragment(isHexDigit), scanNumberFragment(isBinaryDigit), and scanNumberFragment(isOctalDigit).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scanBinaryOrOctalDigits and scanHexDigit also do arithmetic to calculate the actual numeric value, however (rather than relying on the host being able to parse the literal correctly). I would also need to pass thru another callback to retain that behavior, too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scanHexDigit also has to deal with validating that there are a certain number of digits and have an option to not recognize separators, for when its used while handling escape sequences. IMO, while the general structure is similar, they're distinct enough to not warrant a shared base.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, you can pass in a base of type 2 | 8 | 10 | 16 like in scanBinaryOrOctalDigits and special case hex characters for base 16?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth keeping focus on performance when parsing numbers, even at a cost of flexibility. Too hot a code path.

@@ -345,7 +345,7 @@ namespace ts {
export function getLiteralText(node: LiteralLikeNode, sourceFile: SourceFile) {
// If we don't need to downlevel and we can reach the original source text using
// the node's parent reference, then simply get the text as it was originally written.
if (!nodeIsSynthesized(node) && node.parent) {
if (!nodeIsSynthesized(node) && node.parent && !(isNumericLiteral(node) && node.numericLiteralFlags & TokenFlags.ContainsSeparator)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add an es2018 target now or wait until the ES2018 spec is official?

==== tests/cases/conformance/parser/ecmascriptnext/numericSeparators/3.ts (3 errors) ====
0_B0101
~
!!! error TS6188: Numeric seperators are not allowed at the end of a literal.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message isn't clear considering the context. Perhaps it should be Numeric separators are not allowed here.

@@ -3407,6 +3407,10 @@
"category": "Message",
"code": 6187
},
"Numeric seperators are not allowed at the end of a literal.": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

separators


!!! error TS1177: Binary digit expected.
~~~~
!!! error TS2304: Cannot find name '_110'.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we consider just parsing out the rest of the number rather than just stop? Then we'd avoid the excess Cannot find name errors.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The proposed grammar says the production is effectively Digit Sep Digit, so we'd be much more lenient than is needed. Which seems fine; I don't think 10__01 can be treated as anything other than a number with too many separators.

!!! error TS2304: Cannot find name 'B0101'.

==== tests/cases/conformance/parser/ecmascriptnext/numericSeparators/4.ts (3 errors) ====
0b01__11
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, after we report the numeric separator error, we may want to continue to parse out the rest of the number to avoid excessive unrelated errors, though I'd like to get @mhegazy's or @DanielRosenwasser's opinion on that.

0x00_11;
0X0_1;
0x1100_0011;
0X0_11_0101;
Copy link
Member

@rbuckton rbuckton Nov 29, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per tc39/proposal-numeric-separator#25 it looks like "\u{10_ffff}" is permitted as an extended Unicode escape sequence. Can you add this to your existing tests?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussion on tc39/proposal-numeric-separator#25 has continued and it is apparently now an explicit syntax error to witness an _ in a literal in an escape sequence.

end = pos;
}
}
if (tokenFlags & TokenFlags.ContainsSeparator) {
if (decimalFragment && scientificFragment) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about just

let result = mainFragment;
if (decimalFragment) {
    result += "." + decimalFragment;
}
if (scientificFragment) {
    result += "." + scientificFragment;
}
return "" + +mainFragment;

@@ -909,8 +957,16 @@ namespace ts {
function scanHexDigits(minCount: number, scanAsManyAsPossible: boolean): number {
let digits = 0;
let value = 0;
let seperatorAllowed = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

separatorAllowed

@@ -1218,8 +1279,17 @@ namespace ts {
// For counting number of digits; Valid binaryIntegerLiteral must have at least one binary digit following B or b.
// Similarly valid octalIntegerLiteral must have at least one octal digit following o or O.
let numberOfDigits = 0;
let seperatorAllowed = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

separator

@weswigham
Copy link
Member Author

@rbuckton I think I've hit on all your comments except factoring scanNumberFragment to be a common base to all the number-scanners (which, IMO, isn't worth it with all the specialized functionality the hex scanner needs that the other two do not).

It's probably worth noting that I included regexp literals with the u flag in my unicode escape test; we don't currently support them (rather, we don't downlevel them at all), so none of them have errors flagged, but I figure it would be good to include them as a reference.

@weswigham
Copy link
Member Author

weswigham commented Dec 8, 2017

@mhegazy

Benchmark Name Iterations
Current numeric-seperators 10
Baseline master 10
Metric master numeric-seperators Delta Best Worst
Compiler - Unions - node (v9.0.0, x64)
Memory used 205,991k (± 0.01%) 206,039k (± 0.01%) +49k (+ 0.02%) 205,981k 206,092k
Parse Time 0.68s (± 4.36%) 0.65s (± 0.68%) -0.02s (- 3.69%) 0.64s 0.66s
Bind Time 0.64s (± 3.92%) 0.62s (± 1.52%) -0.01s (- 2.19%) 0.61s 0.65s
Check Time 11.35s (± 3.83%) 10.93s (± 0.49%) -0.42s (- 3.70%) 10.85s 11.10s
Emit Time 0.00s (± 0.00%) 0.00s (± 0.00%) 0.00s ( NaN%) 0.00s 0.00s
Total Time 12.67s (± 3.80%) 12.21s (± 0.46%) -0.46s (- 3.62%) 12.13s 12.39s
Monaco - node (v9.0.0, x64)
Memory used 333,101k (± 0.01%) 333,125k (± 0.02%) +24k (+ 0.01%) 333,005k 333,269k
Parse Time 1.59s (± 4.24%) 1.53s (± 1.80%) -0.05s (- 3.15%) 1.50s 1.64s
Bind Time 0.68s (± 5.63%) 0.67s (± 4.55%) -0.01s (- 1.61%) 0.63s 0.76s
Check Time 3.58s (± 2.46%) 3.48s (± 1.02%) -0.09s (- 2.63%) 3.40s 3.54s
Emit Time 2.46s (± 4.36%) 2.35s (± 3.21%) -0.11s (- 4.39%) 2.29s 2.56s
Total Time 8.31s (± 2.64%) 8.05s (± 0.88%) -0.26s (- 3.13%) 7.94s 8.23s
TFS - node (v9.0.0, x64)
Memory used 289,271k (± 0.01%) 289,275k (± 0.01%) +3k (+ 0.00%) 289,187k 289,356k
Parse Time 1.09s (± 1.46%) 1.10s (± 2.28%) +0.01s (+ 0.64%) 1.07s 1.19s
Bind Time 0.53s (± 0.89%) 0.54s (± 2.49%) +0.01s (+ 2.08%) 0.52s 0.59s
Check Time 3.10s (± 0.61%) 3.15s (± 1.73%) +0.05s (+ 1.58%) 3.11s 3.37s
Emit Time 2.05s (± 1.43%) 2.08s (± 1.77%) +0.03s (+ 1.42%) 2.03s 2.22s
Total Time 6.78s (± 0.56%) 6.87s (± 1.50%) +0.10s (+ 1.42%) 6.77s 7.24s

Perf shows little to no change (erring towards slight parse time improvement). Should we merge?

Copy link
Member

@DanielRosenwasser DanielRosenwasser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Ideally, we'd continue parsing Unicode escapes with separators, and give a better error on consecutive separators, but that can be done later.

while (true) {
const ch = text.charCodeAt(pos);
// Numeric seperators are allowed anywhere within a numeric literal, except not at the beginning, or following another seperator
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

separator

break;
}
if (text.charCodeAt(pos - 1) === CharacterCodes._) {
pos--;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not add a look-ahead and check whether the subsequent character could continue the literal? That way you don't need to rewind at the end.

result += text.substring(start, pos);
}
else {
error(Diagnostics.Numeric_separators_are_not_allowed_here, pos, 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Multiple consecutive numeric separators are not permitted."

rhysd added a commit to rhysd/typescript-vim that referenced this pull request Jan 11, 2018
ECMAScript numeric separator proposal is now on stage3

https://github.com/tc39/proposal-numeric-separator

and will come to TypeScript at version 2.7.

microsoft/TypeScript#20324

Example:

```typescript
const i = 123_456;
const x = 0x1fa_EF6;
const f = 123_456.0_456;
const f2 = 123_456.4_5e1_2;
```
@microsoft microsoft locked and limited conversation to collaborators Jun 14, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants