Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement number separator #10480

Closed
Simn opened this issue Nov 15, 2021 · 17 comments
Closed

Implement number separator #10480

Simn opened this issue Nov 15, 2021 · 17 comments

Comments

@Simn
Copy link
Member

Simn commented Nov 15, 2021

See HaxeFoundation/haxe-evolution#90

Simn added a commit that referenced this issue Nov 15, 2021
@Simn
Copy link
Member Author

Simn commented Nov 15, 2021

Now that I'm looking at this we might have been a bit too quick to fully consider this. Since this opens up more accepted syntax which is visible to macros, there will be questions about support for this new syntax. This is going to have ramifications for parseInt and parseFloat as well which could turn this into a somewhat involved change.

I originally thought that we would just discard the _ during lexing, but that would be inconsistent with how we treat literals in general.

@Aurel300
Copy link
Member

Isn't that true of the numeric suffixes as well? It might be good to provide an API for macros to extract values from EConst

@Simn
Copy link
Member Author

Simn commented Nov 15, 2021

I expect that those are dealt with by a second optional argument to CInt that denotes the suffix (as a String).

@Aurel300
Copy link
Member

Sure, but if it is i64 it should represent an Int64 literal, and Std.parseInt won't deal with that.

@Simn
Copy link
Member Author

Simn commented Nov 15, 2021

Yeah well, it's a good thing we don't talk about parsing so much regarding that feature!

Unlike the separator, it's easy to argue (and document) that the suffix is not part of the literal and that the best the run-time could do with it is ignore it, which I think it already does anyway.

@back2dos
Copy link
Member

I originally thought that we would just discard the _ during lexing, but that would be inconsistent with how we treat literals in general.

String literals also don't necessarily contain the exact source: https://try.haxe.org/#5f35A102

I suppose the question is more relevant whether or not someone will want to perform some shenanigans based on the separators, but since any binary operator may be used to syntactically string together groups of digits I'd say those use cases are already covered.

@nanjizal
Copy link
Contributor

Would it work to have the numbers as an abstract. That way they have no impact on Float's and Int's directly and any side effects or bugs that are created are limited to use of these special abstracts. I don't know how it would look...
var int: __Int = 0xFF_00_FF_00;
But I think if I had a vote I would have agreed with Gama11, maybe you guys can pull it off but seems quite a risky change for not huge gain. Would like Binary numbers though.

@kLabz
Copy link
Contributor

kLabz commented Nov 23, 2021

Would it be allow combination with #10481 as for example var a = 123456_i64;?

@RealyUniqueName RealyUniqueName added this to the Backlog milestone Nov 29, 2021
@Simn
Copy link
Member Author

Simn commented Jan 6, 2022

This is the macro case we'll have to look into:

import haxe.macro.Expr;

macro function test(e:Expr) {
	switch (e.expr) {
		case EConst(CInt(s)):
			trace(s); // 1_2
			trace(Std.parseInt(s)); // 1
		case _:
	}
	return macro null;
}

function main() {
	test(1_2);
}

Would it be allow combination with #10481 as for example var a = 123456_i64;?

Yes. We don't allow trailing _ on their own, but we do allow separating suffixes like that.

@kLabz
Copy link
Contributor

kLabz commented Jan 6, 2022

Maybe turning CInt(s:String) into CInt(s:String, ?raw:String) (in this example, CInt("12", "1_2")) would do? One can still use CInt(s) in pattern matching if raw is optional. And raw could be left out if no number separator is included (to justify it being optional).

@Simn
Copy link
Member Author

Simn commented Jan 6, 2022

I think it's enough if we adapt the macro version of parseInt and parseFloat. If anyone has some custom parse function, it can be updated in a backwards-compatible way.

@Simn
Copy link
Member Author

Simn commented Jan 6, 2022

Technically this is a breaking change, because parseInt is specified to parse until invalid characters:

In decimal mode, parsing continues until an invalid character is detected

So if anybody relied on Std.parseInt("1_2") resulting in 1, well...

@Simn
Copy link
Member Author

Simn commented Jan 6, 2022

... and then we're also back to the problem that parseInt("1_2") would be 12 for macros and 1 everywhere else, so that's no good... argh

Simn added a commit that referenced this issue Jan 6, 2022
* [lexer] accept `_` as number separator

see #10480

* add some tests

* deal with generators

* well whatever

* allow _f64
@Aurel300
Copy link
Member

Aurel300 commented Jan 7, 2022

I don't like that Std.parseInt and Std.parseFloat are used in macros to parse Haxe-like literals. Making the functions do anything other than "just" parsing numbers seems weird. I already find it very annoying that you need to prepend "0x" to the argument of parseInt to parse hexadecimals, rather than having a proper radix argument.

@Simn
Copy link
Member Author

Simn commented Jan 7, 2022

At this point I think you're right. I originally wanted the separator to be completely invisible, but I realize that this isn't going to be the case at syntactic level anyway. Nor should it be, because 12 and 1_2 are two different things there.

I'll think about it a bit more, but I'll probably just close this issue then.

@Simn
Copy link
Member Author

Simn commented Jan 18, 2022

Yeah this is fine!

@Simn Simn closed this as completed Jan 18, 2022
@nanjizal
Copy link
Contributor

Have implemented a simple String->String parser with StringBuffer in pure haxe with no macros, it is suitable for parsing runtime user String ( within xml/json feeds ) into a format suitable for current parseInt and parseFloat.

It could be tweaked and added to StringTools, if that seem a good route, without any backwards compatibility concerns, and could be used directly only when a developer required the extra safety or wished to allow users to use commas in numbers, obviously it could be optimised per target in the future.

https://github.com/nanjizal/parseNumber

ParseNumberScreenshot2

Currently it is functional in js and c++ targets, although it should work in all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants