Digit separators in number literals.

_This is currently under implementation: [implementation issue](https://github.com/dart-lang/sdk/issues/56188), [feature specification](https://github.com/dart-lang/language/blob/main/accepted/future-releases/digit-separators/feature-specification.md)._

---

Solution to #1.

To make long number literals more readable, allow authors to inject [digit group separators](https://en.wikipedia.org/wiki/Decimal_separator#Digit_grouping) inside numbers.
Examples with different possible separators:
```text
100 000 000 000 000 000 000  // space 
100,000,000,000,000,000,000  // comma
100.000.000.000.000.000.000  // period
100'000'000'000'000'000'000  // apostrophe (C++)
100_000_000_000_000_000_000  // underscore (many programming languages).
```
The syntax must work even with just a single separator, so it can't be anything that can already validly separate two expressions (which excludes all infix operators and the comma), and it should not already be part of a number literal (which excludes the decimal point).
So, the comma and decimal point are probably never going to work, even if they are already the standard "thousands separator" in text in different parts of the world.

Space separation is dangerous because it's hard to see whether it's just a space or an accidental tab character. If we allow spacing, should we allow arbitrary whitespace, including line terminators? If so, then this suddenly becomes quite dangerous. Forget a comma at the end of a line in a multiline list, and two adjacent integers are automatically combined (we already have that problem with strings). So, probably not a good choice, even if it is the preferred formatting for printed text.

The apostrophe is also the string single-quote character. We don't currently allow adjacent numbers and strings, but if we ever do, then this syntax becomes ambiguous. It's still possible (we disambiguate by assuming it's a digit separator). It is currently used by C++14 as a digit group separator, so it is definitely possible.

That leaves underscore, which could be the start of an identifier. Currently `100_000` would be tokenized as "integer literal 100" followed by "identifier _000". However, users would never write an identifier adjacent to another token that contains identifier-valid characters (unlike strings, which have clear delimiters that do not occur anywhere else), so this is unlikely to happen in practice. Underscore is already used by a large number of programming languages including Java, Swift, and Python.

We also want to allow multiple separators for higher-level grouping, e.g.:
```dart
100__000_000_000__000_000_000
```
For this purpose, the underscore extends gracefully. So does space, but it has the disadvantage that it collapses when inserted into HTML, whereas `''` looks odd.

For ease of reading and ease of parsing, we should only allow a digit separator that actually separates digits - it must occur between two digits of the number, not at the end or beginning, and if used in double literals, not adjacent to the `.` or `e{+,-,}` characters, or next to an `x` in a hexadecimal literal.

## Examples
```dart
100__000_000__000_000__000_000  // one hundred million million millions!
0x4000_0000_0000_0000
0.000_000_000_01
0x00_14_22_01_23_45  // MAC address
555_123_4567  // US Phone number
```

**Invalid** literals:
```dart
100_
0x_00_14_22_01_23_45 
0._000_000_000_1
100_.1
1.2e_3
```
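The "separator only between digits" rule can be written down as a regular expression. The following is a minimal sketch of that rule, not the SDK scanner: the name `isValidNumberLiteral` and the helper patterns are hypothetical, and the number grammar is simplified to hexadecimal integers and decimal literals with an optional fraction and exponent.

```dart
// Hypothetical helper patterns: one or more digits, optionally followed by
// groups of "one or more underscores, then one or more digits".
const _digits = r'[0-9]+(?:_+[0-9]+)*';
const _hexDigits = r'[0-9a-fA-F]+(?:_+[0-9a-fA-F]+)*';

// Assumed, simplified literal grammar: a hex integer, or a decimal number
// with an optional fraction and exponent. Because every digit run must start
// and end with a digit, an underscore can never touch `x`, `.`, `e`, a sign,
// or either end of the literal.
final RegExp _numberLiteral = RegExp(
  '^(?:'
  '0[xX]${_hexDigits}'
  '|(?:${_digits}(?:\\.${_digits})?|\\.${_digits})(?:[eE][+-]?${_digits})?'
  ')\$',
);

/// Returns whether [source] is a number literal under the sketched rule.
bool isValidNumberLiteral(String source) => _numberLiteral.hasMatch(source);
```

Under this sketch, every literal in the first list above is accepted and every literal in the **Invalid** list is rejected.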

`_100` is a valid identifier, and `_100._100` is a valid member access. If users learn the "separator only between digits" rule quickly, this will likely not be an issue.

## Implementation issues
This should be trivial to implement at the parsing level. The only issue is that a parser might need to copy the digits (without the separators) before calling a parse function, where currently it might get away with pointing a native parse function directly at its input bytes.
This should have no effect after parsing.
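As a rough illustration of the copying step, a parser could strip the separators from an already-validated lexeme and hand the result to an unmodified parse routine. The function name `parseNumberLiteral` is hypothetical, and using `num.parse` here is an assumption made for the sketch; a real front end would use its own number parsing.

```dart
/// Hypothetical helper: converts a number literal lexeme to its value.
///
/// Assumes the scanner has already checked that every underscore sits
/// between two digits, so no placement validation happens here.
num parseNumberLiteral(String lexeme) {
  // Only allocate a copy when the lexeme actually contains a separator.
  final digits = lexeme.contains('_') ? lexeme.replaceAll('_', '') : lexeme;
  // `num.parse` tries integer parsing first (including the `0x` prefix),
  // then falls back to double parsing.
  return num.parse(digits);
}
```

Literals without separators take the same path as today, so the extra copy is only paid by literals that use the feature.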

Style guides might introduce a preference for digit grouping (say, numbers with more than six digits should use separators) so a formatter or linter may want access to the actual source as well as the numerical value. The front end should make this available for source processing tools.
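To show why such a tool needs the source text, here is a sketch of a hypothetical lint check. The name `shouldSuggestSeparators`, its parameter, and the six-digit threshold are illustrative assumptions, and only plain decimal integer lexemes are considered.

```dart
/// Hypothetical lint check: returns true when a decimal integer lexeme has
/// more than [maxUngroupedDigits] digits and uses no separators.
bool shouldSuggestSeparators(String lexeme, {int maxUngroupedDigits = 6}) {
  // A literal that already uses separators is left alone.
  if (lexeme.contains('_')) return false;
  // Ignore anything that is not a plain run of decimal digits
  // (hex literals, doubles, and so on are out of scope for this sketch).
  if (!RegExp(r'^[0-9]+$').hasMatch(lexeme)) return false;
  return lexeme.length > maxUngroupedDigits;
}
```

The lexemes `1000000` and `1_000_000` have the same numerical value, so only the source text lets the check tell them apart.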

## Library issues
Should `int.parse`/`double.parse` accept inputs with underscores? I think it's fine to *not* accept such input. It is not generated by `int.toString()`, and if a user has a string containing such an input, they can remove underscores manually before calling `int.parse`. That is not an option for source code literals.
I'd prefer to keep `int.parse` as efficient as possible, which means not adding a special case in the inner loop.
In JavaScript, parsing uses the built-in `parseInt` or `Number` functions, which do not accept underscores, so it would add (another) overhead for JavaScript compiled code.
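For user code that does receive grouped input, the manual workaround might look like the sketch below. The helper name `tryParseGroupedInt` is hypothetical; it applies the same "only between digits" rule as the literal syntax before stripping the underscores, which is a design choice of this sketch rather than something the proposal requires.

```dart
/// Hypothetical user-level helper: parses a decimal integer that may contain
/// underscores between digits, or returns null if the input is malformed.
int? tryParseGroupedInt(String input) {
  // Optional sign, then digit runs joined by one or more underscores.
  final wellFormed = RegExp(r'^[+-]?[0-9]+(?:_+[0-9]+)*$').hasMatch(input);
  if (!wellFormed) return null;
  // `int.tryParse` itself does not accept underscores, so strip them first.
  return int.tryParse(input.replaceAll('_', ''));
}
```

This keeps the extra work out of `int.parse` itself, so callers that never see underscores pay nothing for it.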

## Related work
Java [digit separators](https://docs.oracle.com/javase/8/docs/technotes/guides/language/underscores-literals.html).

