-
Notifications
You must be signed in to change notification settings - Fork 26
Patch numeric underscores #112
Patch numeric underscores #112
Conversation
* Add constant.numeric.hexfloat.haskell for HexFloatLiterals extension [1] * Add constant.numeric.binary.haskell for BinaryLiterals extension [2] * Modify lit_num for NumericUnderscores extension [3][4] [1]: https://ghc.haskell.org/trac/ghc/ticket/9224 [2]: https://ghc.haskell.org/trac/ghc/ticket/13126 [3]: https://ghc.haskell.org/trac/ghc/ticket/14473 [4]: https://github.com/ghc-proposals/ghc-proposals/blob/master/proposals/0009-numeric-underscores.rst
Hello. Thank you for your contribution. I can't not notice though that except There are also some questions about regular expressions used, which I will post shortly. |
src/include/repository.coffee
Outdated
name: 'constant.numeric.hexadecimal.haskell' | ||
match: '0[xX][0-9a-fA-F]+' | ||
match: '0[xX]_*[0-9a-fA-F](_*[0-9a-fA-F])*' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely sure why you're using this rather convoluted expression here (and other places). For all intents and purposes, 0[xX]_*[0-9a-fA-F](_*[0-9a-fA-F])*
is the same as 0[xX](?:_*[0-9a-fA-F])+
. And if we're not aiming for technical correctness (which we aren't), latter expression is roughly equivalent to 0[xX][_0-9a-fA-F]+
, which would also accept cases like 0x_
(which I don't think is valid Haskell), but is measurably faster.
Also please don't introduce new capturing groups if you can avoid it, it can potentially break some stuff, and test coverage isn't 100%.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for telling me your policy.
I'm not entirely sure why you're using this rather convoluted expression here (and other places)
I simply implemented it according to the accepted formal specification.
... but is measurably faster.
You mean that this project has adopted a policy that gives priority to speed over accuracy.
I understand.
It is one of the valuable decisions for many platforms.
Also please don't introduce new capturing groups if you can avoid it, it can potentially break some stuff, and test coverage isn't 100%.
OK. Is it also better to merge the binary with octal?
In consideration of speed, I propose the simplified two plans below.
Are you acceptable for this?
Plan A: BinaryLiterals
+ NumericUnderscores
extension (without HexFloatLiterals
)
lit_num: [
name: 'constant.numeric.hexadecimal.haskell'
- match: '0[xX][0-9a-fA-F]+'
+ match: '0[xX][_0-9a-fA-F]+'
,
name: 'constant.numeric.octal.haskell'
- match: '0[oO][0-7]+'
+ match: '0[oObB][_0-7]+'
,
name: 'constant.numeric.float.haskell'
- match: '[0-9]+(\\.[0-9]+[eE][+-]?|\\.|[eE][+-]?)[0-9]+'
+ match: '[_0-9]+(\\.[_0-9]+[eE][+-]?|\\.|[eE][+-]?)[_0-9]+'
,
name: 'constant.numeric.decimal.haskell'
- match: '[0-9]+'
+ match: '[0-9][_0-9]*'
]
Plan B: Plan A + HexFloatLiterals
This is difference for Plan A.
name: 'constant.numeric.float.haskell'
- match: '[_0-9]+(\\.[_0-9]+[eE][+-]?|\\.|[eE][+-]?)[_0-9]+'
+ match: '(?:0[xX])?[_0-9a-fA-F]+(?:\\.[_0-9a-fA-F]+[eEpP][+-]?|\\.|[eEpP][+-]?)[_0-9a-fA-F]+'
Thank you for your time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it also better to merge the binary with octal?
No, those are different beasts, and parser has a rather obvious choice based on 2-character lookahead.
Same goes for hexadecimal floats, those are obviously (from point of view of parser) syntactically different from decimal floats. There's no need to create horrifyingly convoluted regular expressions (also apparently incorrect, (?:0[xX])?[_0-9a-fA-F]+(?:\\.[_0-9a-fA-F]+[eEpP][+-]?|\\.|[eEpP][+-]?)[_0-9a-fA-F]+
matches 0xfade1
, which I don't think it should). Just abstract away a common pattern for floats. F.ex.:
floatPattern = (digit, exp) -> "#{digit}+(\\.#{digit}+#{exp}[+-]?|\\.|#{exp}[+-]?)#{digit}+"
and then use it for both regular floats and hexfloats:
float: "[0-9]#{floatPattern('[0-9_]', '[eE]')}" # see below why [0-9] at the start
hexfloat: "0[xX]#{floatPattern('[0-9a-fA-F_]','[pP]')}"
Everything else seems reasonable. Except maybe in 'Plan A' I would use the same approach in float
as in decimal
(only for the first digit mind)
[0-9][_0-9]*(\\.[_0-9]+[eE][+-]?|\\.|[eE][+-]?)[_0-9]+
so that it could bail early on holes (like f _x _y z = z
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, those are different beasts, and parser has a rather obvious choice based on 2-character lookahead.
Thank you. I understand.
I was misunderstanding the matter of the capturing groups.
Just abstract away a common pattern for floats. F.ex.:
Beautiful!
I realized what you pointed out.
Everything else seems reasonable. Except maybe in 'Plan A' I would use the same approach in float as in decimal
OK, the policy of pattern matching is below.
- The leading of the pattern should be accurate for identification with other tokens.
- Subsequent patterns prioritize speed.
I will push the revised version after this.
src/include/repository.coffee
Outdated
@@ -496,17 +496,23 @@ module.exports= | |||
name: 'punctuation.separator.comma.haskell' | |||
match: /,/ | |||
lit_num: [ | |||
name: 'constant.numeric.hexfloat.haskell' | |||
match: '0[xX]_*[0-9a-fA-F](_*[0-9a-fA-F])*((\\.[0-9a-fA-F](_*[0-9a-fA-F])*(_*[pP][+-]?[0-9](_*[0-9])*)*)|(_*[pP][+-]?[0-9](_*[0-9])*))' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is rather hard to read frankly, and I'm not a fan of duplication. Abstracting general "floating point" pattern into a function accepting a character class could be a good idea (as far as I can tell, character class is the only difference)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will comment together in the next place.
@lierdakil thank you for your kind explanation.
Sorry that there is little information.
The implementation of I believe that this feature will be widely used in the future. |
Thank you again and again. I modifed Because the exponent part of hexfloat should be also decimal numbers. Thank you for taking so much time. |
src/include/util.coffee
Outdated
@@ -22,4 +22,8 @@ balanced = (name, left, right, inner, ignore = '') -> | |||
else | |||
"(?<#{name}>(?:[^#{left}#{right}#{ignore}]|#{left}\\g<#{name}>#{right})*)" | |||
|
|||
module.exports = {list, listMaybe, concat, balanced} | |||
floatPattern = (digit, exp) -> | |||
"#{digit}*(?:(?:\\.#{digit}+)?#{exp}[+-]?[0-9_]+|\\.#{digit}+)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, now this expression here doesn't match floats without explicit exponents, e.g. 3.1415
.
In general, Haskell floating points have syntactical from similar to one of:
3.1415
314e-2
31.415e-1
I would imagine it's similar for hexfloats?
Anyway, we'd like to catch all of those, that's why original float regular expression is kinda convoluted -- you can have either fractional part, exponent part, or both of them, but you can't have a float without at least one. In quasi-EBNF terms, this would be somewhat simple to express:
float → digit+ ( fraction | exponent | fraction exponent )
fraction → "." digit+
exponent → [eE] [+-]? digit+
Equivalent regular expression was trivial to factorize, because all alternatives end on digit+
, but if digit
in exponent
can differ from digit
elsewhere, this won't work, so you'll need to write out alternatives.
However, I should point out, fraction
and fraction exponent
have the same prefix, and hence would require backtracking. We can avoid that by rewriting expression a little:
float → digit+ ( fraction exponent? | exponent )
this way, regexp engine can bail from any branch early
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, disregard the first line, I made a mistake when testing. Still, not starting branches on optional parts will on average be about twice as effective, i.e.
float → digit+ ( fraction exponent? | exponent )
is more effective than
float → digit+ ( fraction? exponent | fraction )
which you apparently have here (notice common prefix, remember that regexp parses left-to-right, and realize that fraction part will be effectively matched twice if there is no exponent
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which you apparently have here (notice common prefix, remember that regexp parses left-to-right, and realize that fraction part will be effectively matched twice if there is no exponent)
Thank you for teaching me about effective regexp.
You gave me a wonderful insight about behavior of parsing regex.
I will revise the pattern.
I revised the pattern according to your suggestion. here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, thanks for bearing with my nitpicks. It looks like everything's in order, so I'll merge this.
Released as v1.15.0. |
I'm grateful for your kind response. |
Dear authors,
Sorry if it is not the correct procedure.
Could you please fix this for syntax highlighting of numeric literals?
This patch contains three fixes:
I visually checked this patch with the these files 5 6.
Thanks for your wonderful work