# Proposal: Digit separators #216

Closed
opened this Issue Feb 3, 2015 · 41 comments

### MadsTorgersen commented Feb 3, 2015

Being able to group digits in large numeric literals would have great readability impact and no significant downside. Adding binary literals (#215) would increase the likelihood of numeric literals being long, so the two features enhance each other.

We would follow Java and others, and use an underscore `_` as a digit separator. It would be able to occur everywhere in a numeric literal (except as the first and last character), since different groupings may make sense in different scenarios and especially for different numeric bases:

```
int bin = 0b1001_1010_0001_0100;
int hex = 0x1b_a0_44_fe;
int dec = 33_554_432;
int weird = 1_2__3___4____5_____6______7_______8________9;
double real = 1_000.111_1e-1_000;
```

Any sequence of digits may be separated by underscores, possibly more than one underscore between two consecutive digits. They are allowed in decimals as well as exponents, but following the previous rule, they may not appear next to the decimal (`10_.0`), next to the exponent character (`1.1e_1`), or next to the type specifier (`10_f`). When used in binary and hexadecimal literals, they may not appear immediately following the `0x` or `0b`.

The syntax is straightforward, and the separators have no semantic impact - they are simply ignored. This has broad value and is easy to implement.
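The placement rules in the proposal can be illustrated with a small checker. This is a minimal Python sketch, not the Roslyn implementation: the function names `digits_for` and `separators_well_placed` are hypothetical, and the check only looks at where the underscores sit (it does not validate the rest of the literal).

```python
import re

def digits_for(literal: str) -> str:
    """Return the characters that count as digits for the literal's base."""
    lowered = literal.lower()
    if lowered.startswith("0x"):
        return "0123456789abcdef"
    if lowered.startswith("0b"):
        return "01"
    return "0123456789"

def separators_well_placed(literal: str) -> bool:
    """True when every run of underscores sits directly between two digits."""
    digits = digits_for(literal)
    for run in re.finditer(r"_+", literal):
        before = literal[run.start() - 1:run.start()].lower() if run.start() > 0 else ""
        after = literal[run.end():run.end() + 1].lower()
        if not before or not after:
            return False  # leading or trailing separator
        if before not in digits or after not in digits:
            return False  # next to '.', 'e', a type suffix, or the 0x/0b prefix
    return True
```

Under these assumptions the five example literals from the proposal pass, while `10_.0`, `1.1e_1`, `10_f` and `0x_1b` are rejected. Because the separators carry no meaning, a value can then be computed from the literal with the underscores removed.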

### svick commented Feb 3, 2015

 Does this apply to real literals as well? For example, would `1_0_._5_e_-_1_6_m_` be valid? I have no idea if this would be useful, just curious.

### chrisaut commented Feb 4, 2015

Don't shoot me, but would it be too hard to parse "space" as a separator? Or does that make the grammar ambiguous?

```
int two = 0b 10;
short max = 0x ffff;
long oneMillion = 1 000 000;
```

Just thinking out loud.

### AdamSpeight2008 commented Feb 4, 2015

Digit separators were included in VB.net (vNext CTP); would it be beneficial to also describe what was allowed in VB? I think `1_000` was allowed but `1__000` wasn't. Comma usage could be an issue as it would clash with array literals: you couldn't tell what was a number and what was an array element.

### mburbea commented Feb 4, 2015

I agree, I like space more than underscore. It's generally easier to type and makes it easier when working with something like hex numbers, e.g. `0x8080 8080 8080 8080UL` is so much easier to read and make sure I've filled all the slots vs something like `0x8080808080808080UL` where I have to sit and count to see if I got 16 characters or I only typed 14 or something. How about `'` as well?

### alanfo commented Feb 6, 2015

 I don't see how you could use a space as the separator because numeric literals would then potentially consist of not one but several tokens. This would make them very difficult to parse. The underscore seems the best choice of separator to me, particularly as it's already used by several other languages. I'm not so keen on allowing multiple consecutive underscores but I suppose it does no harm.

### AdamSpeight2008 commented Feb 6, 2015

This grammar wouldn't allow consecutive separators.

```
digit   ::= '0' - '9'
sep     ::= '_'
prefix  ::=
literal ::= prefix (sep? digit)+
```

I think spaces could also be possible

```
digit     ::= '0'-'9'
separator ::= ' '
literal   ::= digit (separator? digit)*
```
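Both grammars translate directly into regular expressions, which makes it easy to see what they accept. The following Python sketch is illustrative only: the helper names are hypothetical, `digit` is kept as `'0'-'9'` exactly as in the comment, and since the comment leaves `prefix` empty, the prefix is passed in as a parameter (an assumption).

```python
import re

def underscore_literal(prefix: str = "") -> re.Pattern:
    """literal ::= prefix (sep? digit)+   with sep = '_'"""
    return re.compile(re.escape(prefix) + r"(?:_?[0-9])+")

# literal ::= digit (separator? digit)*   with separator = ' '
SPACE_LITERAL = re.compile(r"[0-9](?: ?[0-9])*")

def accepts(pattern: re.Pattern, text: str) -> bool:
    """True when the whole text is one literal under the given grammar."""
    return pattern.fullmatch(text) is not None
```

With these definitions `1_000` is accepted and `1__000` is not, matching the stated intent. Note that the first grammar also accepts a separator directly after the prefix (for example `0b_1001` with prefix `0b`), and with an empty prefix it accepts a leading `_1`, so it is not identical to the rules in the original proposal.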

### paulomorgado commented Feb 6, 2015

I think it would be very hard to use spaces. I haven't looked at the parser, but it's probably doing something like breaking the text at white spaces, parentheses, braces, whatever and analyzing the tokens from there. Assuming that the rest of a numeric literal might come after it is doable, but I don't think it is worth the cost.

And what next? This?

```
var a = 1111 1111 1111 1111;
```

Or this?

```
var a = 1111 // comment
        1111 // comment
        1111 // comment
        1111; // comment
```

Although it might be an itsy bitsy harder to write in most keyboard configurations, the semantic break of the numeric literal is the same with the `_` and I would argue that it's even better because it gives separation and cohesion.

### AdamSpeight2008 commented Feb 6, 2015

 Wonder if the parser supports significant whitespace?

### AnthonyDGreen commented Feb 7, 2015

The VB implementation of digit group separators prototyped last year actually supported three different separators originally: underscore, back tick, and space. So you could write `&B1111 0010` or `1_000_000` or `` 3`600 ``. We quickly decided that back tick didn't make enough sense to anyone and cut it. The VB preview still supported both underscores and spaces. The biggest motivation for spaces was binary literals, another feature prototyped at the same time, because binary numbers are conventionally separated with spaces.

As to implementation, it's not hard at all really - at least in VB, particularly when you don't allow multiple consecutive separators. Normally the scanner encounters a digit and starts scanning an integral literal one character at a time until it encounters a character that's not a digit for the base being used (decimal, hex, octal), then it stops. We changed it so that if the non-digit character were an underscore or space it would peek one more character ahead, and if that character were a digit it would keep scanning it as a single token. There are some corner cases you have to put extra recovery around but it's not very complicated, particularly because in VB it's not valid to have two integer literals follow one another, so it's non-breaking to interpret `1 1` as `11`.

I think C# is the same here, though in C# we were pretty settled that underscore would be the sole separator. I think the biggest concern about that is that tools would be confused thinking the space was a word boundary (not VS, the editor is smart enough in VS to handle space) and we just couldn't foresee what havoc spaces would be unleashing on the world (if any). Another more minor concern was complexity - would users benefit more from having a single consistent separator used everywhere? If we decided to pick one it would likely be the underscore, so space was only a possibility if we were ok with having two separators, which was an open question.

-ADG
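The peek-ahead scanning described above can be sketched in a few lines. This is a simplified Python illustration for decimal digits only, not the actual VB or Roslyn scanner; `scan_integer`, `DIGITS` and `SEPARATORS` are hypothetical names, and the recovery for corner cases mentioned in the comment is left out.

```python
DIGITS = "0123456789"
SEPARATORS = "_ "  # the two separators the VB preview supported

def scan_integer(text: str, start: int = 0) -> tuple:
    """Scan one decimal integer token beginning at text[start].

    Returns (digits with separators removed, index just past the token).
    """
    i = start
    digits = []
    while i < len(text):
        ch = text[i]
        if ch in DIGITS:
            digits.append(ch)
            i += 1
        elif (ch in SEPARATORS and digits
              and i + 1 < len(text) and text[i + 1] in DIGITS):
            # Peek one character ahead: a single separator followed by a
            # digit keeps the scan inside the same token.
            i += 1
        else:
            break  # any other character ends the literal
    return "".join(digits), i
```

Because only one character is peeked, two consecutive separators end the token, which is the behaviour the comment describes when multiple consecutive separators are disallowed. A trailing space before an operator, as in `12 + 3`, also ends the token as usual.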

### thomaslevesque commented Feb 7, 2015

 Using space as a separator would probably be a bad idea, because it would cause hard-to-spot mistakes. For instance, `int[] numbers = { 1 2 }` looks like an array with the numbers 1 and 2, but it would actually be an array with only the number 12. Forgetting a comma would silently change the meaning of the code, instead of causing an error.
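The missing-comma hazard is easy to reproduce with two toy lexing policies. The Python sketch below is only an illustration under assumed rules (single spaces or runs of underscores between digit groups); the pattern and function names are hypothetical and no real C# lexer is involved.

```python
import re

# Hypothetical policies: which character may join digit groups into one literal.
SPACE_POLICY = re.compile(r"[0-9]+(?: [0-9]+)*")
UNDERSCORE_POLICY = re.compile(r"[0-9]+(?:_+[0-9]+)*")

def number_tokens(pattern: re.Pattern, source: str) -> list:
    """Return the numeric literals found in source, with separators stripped."""
    return [m.replace(" ", "").replace("_", "") for m in pattern.findall(source)]
```

With the space policy, `{ 1 2 }` yields the single literal `12`, so the forgotten comma goes unnoticed. With the underscore policy the same text yields two adjacent literals, `1` and `2`, which a parser can reject as a syntax error.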

### chrisaut commented Feb 7, 2015

 @thomaslevesque that is a very good point, before I suggested it I quickly tried to think of places where two numbers would follow each other, but I had totally missed this obvious one. I think that is probably a deal breaker. Seems generally people are not for using space, and I think I have come to agree with this point. Still don't like how "1_000" looks, but it might be the best and easiest option.

### AdamSpeight2008 commented Feb 7, 2015

Isn't this proposal about digit separators for the literals that have a prefix?

### thomaslevesque commented Feb 7, 2015

 @AdamSpeight2008, no, it's for all numeric literals.

### AnthonyDGreen commented Feb 8, 2015

@AdamSpeight2008, we did consider restricting space in particular to its most obvious use case - binary literals. It would be unusual, but I think it's worth considering if it gives us more confidence in the feature.

@thomaslevesque, @chrisaut, I find that developers tend to bias negatively on what would confuse other developers and how often. Just about every feature ever proposed or introduced has someone saying "this will cause hard to spot mistakes for everyone ever". There are also features which at first seem harmless - then later turn out to be pits of failure. Fortunately, with "Roslyn" and a managed code base it's much easier to quickly prototype language features - even the scary ones and experiment and make decisions after making observations. I think that will give us the most room to explore the full potential of the language without being committed to doing or not doing a feature a particular way too early.

It's still very very early in the design of VB15 (this idea has 0% chance of making it into C#) and given how often space has been proposed or preferred by different VB users we've spoken to I'd hate to cut the idea down prematurely if it could actually produce a better experience for those users.

Regards, -ADG

### mikedn commented Feb 8, 2015

I'd say `'` or `` ` `` are better choices than `_`:

- They're easier to type (single keystroke instead of combination)
- Even though they are placed at the top of the text, they look more similar to commas and dots that are used as digit separators in various cultures
- The `_` might be useful to other features, such as user defined literals.

### d-kr commented Feb 8, 2015

@mikedn

> They're easier to type (single keystroke instead of combination)

Sadly this holds true only for the US keyboard layout. At least in the German layout all three require two key strokes. Only space is one keystroke here, too.

### tomasr commented Feb 8, 2015

Agreed, `` ` `` or `'` are undesirable for the reasons already mentioned. I actually don't mind using `_` as a separator at all, and, frankly, anything here is better than nothing :) Using space seems like a recipe for conflicts all over the place, and I don't see it adding that much value. I dislike the idea of allowing multiple, alternative separators; while anyone reusing Roslyn wouldn't care, other tools doing their own lexing of C# code would have to do much more work.

### AdamSpeight2008 commented Feb 9, 2015

 `'` is used for a comment in VB.net

### AdamSpeight2008 commented Feb 9, 2015

 In VB.net `_` is also used as a line continuation. Would that cause a misread of the user's intent?

### ViIvanov commented Feb 10, 2015

@mikedn, @tomasr `` ` `` or `'` is good only for decimal digits. Let's see other cases:

```
int bin = 0b`1001`1010`0001`0100;
int hex = 0x1b`a0`44`fe;
int dec = 33`554`432;
int weird = 1`2``3```4````5`````6``````7```````8````````9`````````;
```

I think `_` is better because it is more universal.

### AdamSpeight2008 commented Feb 10, 2015

@ViIvanov `` ` `` and `'` make it look like the numbers are indicating degrees, or feet and inches.

### AnthonyDGreen commented Feb 11, 2015

@AdamSpeight2008, in VB the explicit line continuation is actually a space followed by an underscore, to ensure that the underscore is never a trailing character of an identifier or other token, so it wouldn't be a problem.

I agree that `` ` `` and `'` look more like units of measurement. `_` has a precedent in identifiers as a chunk separator. Space is used for binary numbers in particular and has been recommended by various bodies as a standard separator alternative to either comma or period (http://en.wikipedia.org/wiki/Decimal_mark#Digit_grouping).

I haven't seen a good scenario for multiple consecutive separators yet and am likely to advocate disallowing them.

### paulomorgado commented Feb 11, 2015

Just to reinforce what @AnthonyDGreen and @d-kr said, on the Portuguese keyboard layout `` ` `` requires me to type **[SHIFT]**+**[`]** followed by [SPACE] if the following character is a vowel. You couldn't possibly imagine how hard it was for me to type code in markdown.

### zippec commented Jun 26, 2015

 I like this proposal. But I don't know why this restriction is necessary: "When used in binary and hexadecimal literals, they may not appear immediately following the `0x` or `0b`." I feel like `int bin = 0b_1001_1010_0001_0100;` is much better than `int bin = 0b1001_1010_0001_0100;` and I can't imagine any problem with allowing this.

### AdamSpeight2008 commented Jun 26, 2015

@jveselka Me too, especially as the general grammar would be `literal ::= prefix (sep? digit)+`

### yume-chan commented Jun 3, 2016

 @gafter So the final decision is disallowing separators immediately after prefixes?

### AdamSpeight2008 commented Jun 3, 2016

@CnSimonChan I think it is implemented in the `Future` branch, but it needs the feature flag to be set (or the language version to be VB15). Not sure if these features are available by default in that version (15) of the language.

### jskeet commented Jul 22, 2016

 @zippec: Completely agree. @jaredpar should we break out the feature request for `0x_1001_1000` to be valid into a separate issue?

### jaredpar commented Jul 22, 2016

 @jskeet yes let's use a separate issue since this feature is implemented as spec'd here. We can use the new issue to track changing to allow that syntax.

### weitzhandler commented Mar 6, 2017 • edited

 Would be nice. Although space feels less C#ish, I still vote for spaces, I mean can it go wrong as long as we're expecting a `;`? Anyway, I think it should only be allowed in binary/hex/o̷c̷t̷ etc.?

### paulomorgado commented Mar 6, 2017

 @weitzhandler, I think that changing C# 7 and Visual Studio for tomorrow is, most probably, out of the question. 😄

### alrz commented Mar 6, 2017

 so `var a = 1 0;` is actually just ten?

### paulomorgado commented Mar 6, 2017

 @alrz, that's no worst than ``````var a = 1______________________________________________________________________________________________________________________________________________________________________________________________________________________________________0; `````` The greater issue here is that, in this particular case and only in this particular case, space is a special case for white spaces. And that's bad. Very bad.

### alrz commented Mar 6, 2017

@paulomorgado No, the space is worse because it's invisible. In your example it's impossible to overlook the zero because the literal goes on and on. and on.

### weitzhandler commented Mar 6, 2017 • edited

Limit to single space (surely no line breaks 😡). `_` is definitely more C#ish anyway. And separation only makes sense in binary/hex.

### alrz commented Mar 6, 2017

 @weitzhandler No it doesn't. C# doesn't mind how many spaces you are using between tokens at all.

### weitzhandler commented Mar 6, 2017

 We should keep the discussion here.

### gafter commented Mar 6, 2017

 @weitzhandler I think you mean discussion has moved here.

### paulomorgado commented Mar 7, 2017

 @alrz, Visual Studio can make white space visible. But I still think that would be the least of the problems.

### gafter commented Mar 7, 2017

 Discussion for this feature has been moved here.
