-
Notifications
You must be signed in to change notification settings - Fork 12.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Numeric separators #20324
Numeric separators #20324
Conversation
@@ -345,7 +345,7 @@ namespace ts { | |||
export function getLiteralText(node: LiteralLikeNode, sourceFile: SourceFile) { | |||
// If we don't need to downlevel and we can reach the original source text using | |||
// the node's parent reference, then simply get the text as it was originally written. | |||
if (!nodeIsSynthesized(node) && node.parent) { | |||
if (!nodeIsSynthesized(node) && node.parent && !(isNumericLiteral(node) && node.numericLiteralFlags & TokenFlags.ContainsSeparator)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should emit them as is for ESNext i would say..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Acctually ES2018
and later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add an es2018
target now or wait until the ES2018 spec is official?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mhegazy said offline that we should open an issue to add an ES2018 target, since there are some lib additions to make in addition to adding an output mode that preserves these separators.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened #20342 to track.
can you run the perf tests as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you take a look at https://docs.google.com/presentation/d/1E8yKRJwA4iX_EctpY48KGBwAsCtNZpTOz4wu7tbxTqE/edit#slide=id.g29f65e0163_0_202 to make sure we test those examples as well? It looks like you covered most of them, but I didn't see tests for a few like ._
and 1\u005F01234
that it would be worth verifying we cover.
src/compiler/scanner.ts
Outdated
error(Diagnostics.Numeric_seperators_are_not_allowed_at_the_end_of_a_literal, 1); | ||
pos++; | ||
} | ||
return result = (result || "") + text.substring(start, pos); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You definitely assign result
to at least ""
, so why not just initialize result
above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, you don't need to reassign result
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You definitely assign
result
to at least""
, so why not just initializeresult
above?
result
is only assigned in the presence of a separator.
Also, you don't need to reassign
result
here.
Done.
src/compiler/scanner.ts
Outdated
break; | ||
} | ||
if (text.charCodeAt(pos - 1) === CharacterCodes._) { | ||
pos--; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do this exact set of three lines multiple times in this PR. Would it make sense to provide an optional argument to error
to override the position it uses?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not add a look-ahead and check whether the subsequent character could continue the literal? That way you don't need to rewind at the end.
start = pos; | ||
continue; | ||
} | ||
if (isDigit(ch)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Considering the similarities between this, scanHexDigit
, and scanBinaryOrOctalDigits
perhaps we should unify (at least in part) these three functions by having scanNumberFragment
take a callback for isDigit
and then calling it as scanNumberFragment(isDigit)
, scanNumberFragment(isHexDigit)
, scanNumberFragment(isBinaryDigit)
, and scanNumberFragment(isOctalDigit)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
scanBinaryOrOctalDigits
and scanHexDigit
also do arithmetic to calculate the actual numeric value, however (rather than relying on the host being able to parse the literal correctly). I would also need to pass thru another callback to retain that behavior, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
scanHexDigit
also has to deal with validating that there are a certain number of digits and have an option to not recognize separators, for when its used while handling escape sequences. IMO, while the general structure is similar, they're distinct enough to not warrant a shared base.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, you can pass in a base
of type 2 | 8 | 10 | 16
like in scanBinaryOrOctalDigits
and special case hex characters for base 16?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth keeping focus on performance when parsing numbers, even at a cost of flexibility. Too hot a code path.
@@ -345,7 +345,7 @@ namespace ts { | |||
export function getLiteralText(node: LiteralLikeNode, sourceFile: SourceFile) { | |||
// If we don't need to downlevel and we can reach the original source text using | |||
// the node's parent reference, then simply get the text as it was originally written. | |||
if (!nodeIsSynthesized(node) && node.parent) { | |||
if (!nodeIsSynthesized(node) && node.parent && !(isNumericLiteral(node) && node.numericLiteralFlags & TokenFlags.ContainsSeparator)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add an es2018
target now or wait until the ES2018 spec is official?
==== tests/cases/conformance/parser/ecmascriptnext/numericSeparators/3.ts (3 errors) ==== | ||
0_B0101 | ||
~ | ||
!!! error TS6188: Numeric seperators are not allowed at the end of a literal. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message isn't clear considering the context. Perhaps it should be Numeric separators are not allowed here
.
src/compiler/diagnosticMessages.json
Outdated
@@ -3407,6 +3407,10 @@ | |||
"category": "Message", | |||
"code": 6187 | |||
}, | |||
"Numeric seperators are not allowed at the end of a literal.": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
separators
|
||
!!! error TS1177: Binary digit expected. | ||
~~~~ | ||
!!! error TS2304: Cannot find name '_110'. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we consider just parsing out the rest of the number rather than just stop? Then we'd avoid the excess Cannot find name
errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The proposed grammar says the production is effectively Digit Sep Digit
, so we'd be much more lenient than is needed. Which seems fine; I don't think 10__01
can be treated as anything other than a number with too many separators.
!!! error TS2304: Cannot find name 'B0101'. | ||
|
||
==== tests/cases/conformance/parser/ecmascriptnext/numericSeparators/4.ts (3 errors) ==== | ||
0b01__11 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, after we report the numeric separator error, we may want to continue to parse out the rest of the number to avoid excessive unrelated errors, though I'd like to get @mhegazy's or @DanielRosenwasser's opinion on that.
0x00_11; | ||
0X0_1; | ||
0x1100_0011; | ||
0X0_11_0101; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per tc39/proposal-numeric-separator#25 it looks like "\u{10_ffff}"
is permitted as an extended Unicode escape sequence. Can you add this to your existing tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussion on tc39/proposal-numeric-separator#25 has continued and it is apparently now an explicit syntax error to witness an _
in a literal in an escape sequence.
src/compiler/scanner.ts
Outdated
end = pos; | ||
} | ||
} | ||
if (tokenFlags & TokenFlags.ContainsSeparator) { | ||
if (decimalFragment && scientificFragment) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about just
let result = mainFragment;
if (decimalFragment) {
result += "." + decimalFragment;
}
if (scientificFragment) {
result += "." + scientificFragment;
}
return "" + +mainFragment;
src/compiler/scanner.ts
Outdated
@@ -909,8 +957,16 @@ namespace ts { | |||
function scanHexDigits(minCount: number, scanAsManyAsPossible: boolean): number { | |||
let digits = 0; | |||
let value = 0; | |||
let seperatorAllowed = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
separatorAllowed
src/compiler/scanner.ts
Outdated
@@ -1218,8 +1279,17 @@ namespace ts { | |||
// For counting number of digits; Valid binaryIntegerLiteral must have at least one binary digit following B or b. | |||
// Similarly valid octalIntegerLiteral must have at least one octal digit following o or O. | |||
let numberOfDigits = 0; | |||
let seperatorAllowed = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
separator
@rbuckton I think I've hit on all your comments except factoring It's probably worth noting that I included regexp literals with the |
Perf shows little to no change (erring towards slight parse time improvement). Should we merge? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Ideally, we'd continue parsing Unicode escapes with separators, and give a better error on consecutive separators, but that can be done later.
src/compiler/scanner.ts
Outdated
while (true) { | ||
const ch = text.charCodeAt(pos); | ||
// Numeric seperators are allowed anywhere within a numeric literal, except not at the beginning, or following another seperator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
separator
src/compiler/scanner.ts
Outdated
break; | ||
} | ||
if (text.charCodeAt(pos - 1) === CharacterCodes._) { | ||
pos--; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not add a look-ahead and check whether the subsequent character could continue the literal? That way you don't need to rewind at the end.
result += text.substring(start, pos); | ||
} | ||
else { | ||
error(Diagnostics.Numeric_separators_are_not_allowed_here, pos, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Multiple consecutive numeric separators are not permitted."
ECMAScript numeric separator proposal is now on stage3 https://github.com/tc39/proposal-numeric-separator and will come to TypeScript at version 2.7. microsoft/TypeScript#20324 Example: ```typescript const i = 123_456; const x = 0x1fa_EF6; const f = 123_456.0_456; const f2 = 123_456.4_5e1_2; ```
This implements the now stage-3 numeric separators proposal.
This enables us to parse literals with underscore separators such as
1_000_000_000
,0b1101_0101_1001
, and0xAE_FE_2F
. At present, these are always downleveled to a non-underscore-containing version.