Patch numeric underscores #112

takenobu-hs · 2017-12-03T04:12:23Z

Dear authors,

Sorry if it is not the correct procedure.
Could you please fix this for syntax highlighting of numeric literals?

This patch contains three fixes:

* Add constant.numeric.hexfloat.haskell for HexFloatLiterals extension [1]
* Add constant.numeric.binary.haskell for BinaryLiterals extension [2]
* Modify lit_num for NumericUnderscores extension [3][4]

I visually checked this patch with these files [5][6].

Thanks for your wonderful work

* Add constant.numeric.hexfloat.haskell for HexFloatLiterals extension [1]
* Add constant.numeric.binary.haskell for BinaryLiterals extension [2]
* Modify lit_num for NumericUnderscores extension [3][4]

[1]: https://ghc.haskell.org/trac/ghc/ticket/9224
[2]: https://ghc.haskell.org/trac/ghc/ticket/13126
[3]: https://ghc.haskell.org/trac/ghc/ticket/14473
[4]: https://github.com/ghc-proposals/ghc-proposals/blob/master/proposals/0009-numeric-underscores.rst

lierdakil · 2017-12-03T17:39:35Z

Hello. Thank you for your contribution. I can't help noticing, though, that apart from BinaryLiterals, the other proposals aren't implemented in any currently released GHC version (please correct me if I'm wrong). I'm a bit reluctant to include experimental GHC features here.

There are also some questions about regular expressions used, which I will post shortly.

lierdakil · 2017-12-03T17:45:20Z

src/include/repository.coffee

    name: 'constant.numeric.hexadecimal.haskell'
-    match: '0[xX][0-9a-fA-F]+'
+    match: '0[xX]_*[0-9a-fA-F](_*[0-9a-fA-F])*'


I'm not entirely sure why you're using this rather convoluted expression here (and in other places). For all intents and purposes, 0[xX]_*[0-9a-fA-F](_*[0-9a-fA-F])* is the same as 0[xX](?:_*[0-9a-fA-F])+. And if we're not aiming for technical correctness (which we aren't), the latter expression is roughly equivalent to 0[xX][_0-9a-fA-F]+, which would also accept cases like 0x_ (which I don't think is valid Haskell), but is measurably faster.
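The equivalence claimed for the three hexadecimal patterns can be checked with a short script. This is only a sketch: it uses Python's `re` module as a stand-in for the regex engine the grammar actually runs on (an assumption), and the names `STRICT`, `GROUPED`, `LOOSE` and `accepts` are made up for illustration.

```python
import re

# The three hexadecimal patterns from the comment above, as plain regexes.
STRICT = r"0[xX]_*[0-9a-fA-F](_*[0-9a-fA-F])*"  # pattern from the patch
GROUPED = r"0[xX](?:_*[0-9a-fA-F])+"            # non-capturing rewrite
LOOSE = r"0[xX][_0-9a-fA-F]+"                   # single character class


def accepts(pattern: str, text: str) -> bool:
    """Return True if the pattern matches the whole string."""
    return re.fullmatch(pattern, text) is not None


SAMPLES = ["0xff", "0x_ff", "0xdead_beef", "0xF__F", "0x_", "0xff_", "0x"]

for sample in SAMPLES:
    print(
        f"{sample!r:14}"
        f" strict={accepts(STRICT, sample)}"
        f" grouped={accepts(GROUPED, sample)}"
        f" loose={accepts(LOOSE, sample)}"
    )
```

On these samples the first two patterns agree everywhere, while the character-class version additionally accepts `0x_` and a trailing underscore such as `0xff_`. That is the accuracy given up in exchange for the simpler pattern.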

Also please don't introduce new capturing groups if you can avoid it, it can potentially break some stuff, and test coverage isn't 100%.

Thank you for telling me your policy.

> I'm not entirely sure why you're using this rather convoluted expression here (and other places)

I simply implemented it according to the accepted formal specification.

https://github.com/ghc-proposals/ghc-proposals/blob/master/proposals/0009-numeric-underscores.rst#new-syntax-this-proposal

> ... but is measurably faster.

So this project has adopted a policy that gives priority to speed over accuracy.
I understand.
That is a reasonable trade-off for many platforms.

> Also please don't introduce new capturing groups if you can avoid it, it can potentially break some stuff, and test coverage isn't 100%.

OK. Is it also better to merge the binary with octal?

With speed in mind, I propose the two simplified plans below.
Are these acceptable to you?

Plan A: BinaryLiterals + NumericUnderscores extension (without HexFloatLiterals)

  lit_num: [
    name: 'constant.numeric.hexadecimal.haskell'
-    match: '0[xX][0-9a-fA-F]+'
+    match: '0[xX][_0-9a-fA-F]+'
  ,
    name: 'constant.numeric.octal.haskell'
-    match: '0[oO][0-7]+'
+    match: '0[oObB][_0-7]+'
  ,
    name: 'constant.numeric.float.haskell'
-    match: '[0-9]+(\\.[0-9]+[eE][+-]?|\\.|[eE][+-]?)[0-9]+'
+    match: '[_0-9]+(\\.[_0-9]+[eE][+-]?|\\.|[eE][+-]?)[_0-9]+'
  ,
    name: 'constant.numeric.decimal.haskell'
-    match: '[0-9]+'
+    match: '[0-9][_0-9]*'
  ]

Plan B: Plan A + HexFloatLiterals

This is the difference from Plan A.

    name: 'constant.numeric.float.haskell'
-    match: '[_0-9]+(\\.[_0-9]+[eE][+-]?|\\.|[eE][+-]?)[_0-9]+'
+    match: '(?:0[xX])?[_0-9a-fA-F]+(?:\\.[_0-9a-fA-F]+[eEpP][+-]?|\\.|[eEpP][+-]?)[_0-9a-fA-F]+'

Thank you for your time.

> Is it also better to merge the binary with octal?

No, those are different beasts, and parser has a rather obvious choice based on 2-character lookahead.

Same goes for hexadecimal floats, those are obviously (from point of view of parser) syntactically different from decimal floats. There's no need to create horrifyingly convoluted regular expressions (also apparently incorrect, (?:0[xX])?[_0-9a-fA-F]+(?:\\.[_0-9a-fA-F]+[eEpP][+-]?|\\.|[eEpP][+-]?)[_0-9a-fA-F]+ matches 0xfade1, which I don't think it should). Just abstract away a common pattern for floats. F.ex.:

floatPattern = (digit, exp) -> "#{digit}+(\\.#{digit}+#{exp}[+-]?|\\.|#{exp}[+-]?)#{digit}+"

and then use it for both regular floats and hexfloats:

float: "[0-9]#{floatPattern('[0-9_]', '[eE]')}" # see below why [0-9] at the start
hexfloat: "0[xX]#{floatPattern('[0-9a-fA-F_]','[pP]')}"
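To see what the shared helper buys over the merged Plan B pattern, here is a rough Python port of the snippet above. Python's `re` stands in for the grammar's regex engine, which is an assumption, and `float_pattern`, `FLOAT`, `HEXFLOAT`, `MERGED` and `accepts` are illustrative names rather than code from the repository.

```python
import re


def float_pattern(digit: str, exp: str) -> str:
    """Port of the suggested floatPattern: digits, then a fraction and/or exponent marker, then digits."""
    return rf"{digit}+(\.{digit}+{exp}[+-]?|\.|{exp}[+-]?){digit}+"


# Composed exactly as in the suggestion above.
FLOAT = "[0-9]" + float_pattern("[0-9_]", "[eE]")
HEXFLOAT = "0[xX]" + float_pattern("[0-9a-fA-F_]", "[pP]")

# The single merged pattern from Plan B, for comparison.
MERGED = (
    r"(?:0[xX])?[_0-9a-fA-F]+"
    r"(?:\.[_0-9a-fA-F]+[eEpP][+-]?|\.|[eEpP][+-]?)"
    r"[_0-9a-fA-F]+"
)


def accepts(pattern: str, text: str) -> bool:
    """Return True if the pattern matches the whole string."""
    return re.fullmatch(pattern, text) is not None


for sample in ["0xfade1", "0x1.8p3", "0x1p-2", "31.415e-1", "314e-2", "3.14"]:
    print(
        f"{sample!r:12}"
        f" merged={accepts(MERGED, sample)}"
        f" float={accepts(FLOAT, sample)}"
        f" hexfloat={accepts(HEXFLOAT, sample)}"
    )
```

The merged pattern accepts `0xfade1` because the `e` inside the hex digits can be read as an exponent marker, while the separate patterns reject it. One caveat of the literal composition: `[0-9]` followed by `digit+` needs two characters before the fraction or exponent, so `3.14` is not accepted by `FLOAT` as written here. The helper pushed later in this thread starts with `digit*` instead.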

Everything else seems reasonable. Except maybe in 'Plan A' I would use the same approach in float as in decimal (only for the first digit mind)

[0-9][_0-9]*(\\.[_0-9]+[eE][+-]?|\\.|[eE][+-]?)[_0-9]+

so that it could bail early on holes (like f _x _y z = z)
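The effect of requiring a real digit in the first position can be illustrated as follows. This is a sketch under assumptions: Python's `re` replaces the grammar's regex engine, only the regex behaviour is shown (not how the full grammar orders its rules), and the names `PLAN_A`, `LEADING_DIGIT` and `starts_float` are hypothetical.

```python
import re

# Plan A float pattern: an underscore is allowed in the very first position.
PLAN_A = r"[_0-9]+(\.[_0-9]+[eE][+-]?|\.|[eE][+-]?)[_0-9]+"
# Suggested variant: the first character must be a digit.
LEADING_DIGIT = r"[0-9][_0-9]*(\.[_0-9]+[eE][+-]?|\.|[eE][+-]?)[_0-9]+"


def starts_float(pattern: str, text: str) -> bool:
    """Return True if the pattern matches at the start of the text."""
    return re.match(pattern, text) is not None


for sample in ["1_000.5", "_e1", "_x"]:
    print(
        f"{sample!r:10}"
        f" plan_a={starts_float(PLAN_A, sample)}"
        f" leading_digit={starts_float(LEADING_DIGIT, sample)}"
    )
```

With the Plan A pattern the engine has to consume the underscore before it can give up on `_x`, and it even accepts an identifier-like string such as `_e1`. With the leading `[0-9]` both are rejected on the first character.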

> No, those are different beasts, and parser has a rather obvious choice based on 2-character lookahead.

Thank you. I understand.
I had misunderstood the point about capturing groups.

> Just abstract away a common pattern for floats. F.ex.:

Beautiful!
Now I see what you were pointing out.

> Everything else seems reasonable. Except maybe in 'Plan A' I would use the same approach in float as in decimal

OK, so the policy for pattern matching is as follows:

* The leading part of a pattern should be accurate, so that the token is distinguished from other tokens.
* The rest of the pattern prioritizes speed.

I will push the revised version after this.

lierdakil · 2017-12-03T17:47:40Z

src/include/repository.coffee

@@ -496,17 +496,23 @@ module.exports=
    name: 'punctuation.separator.comma.haskell'
    match: /,/
  lit_num: [
+    name: 'constant.numeric.hexfloat.haskell'
+    match: '0[xX]_*[0-9a-fA-F](_*[0-9a-fA-F])*((\\.[0-9a-fA-F](_*[0-9a-fA-F])*(_*[pP][+-]?[0-9](_*[0-9])*)*)|(_*[pP][+-]?[0-9](_*[0-9])*))'


This is rather hard to read frankly, and I'm not a fan of duplication. Abstracting general "floating point" pattern into a function accepting a character class could be a good idea (as far as I can tell, character class is the only difference)

I will reply to this together with the next comment.

takenobu-hs · 2017-12-04T12:45:34Z

@lierdakil thank you for your kind explanation.

> Hello. Thank you for your contribution. I can't help noticing, though, that apart from BinaryLiterals, the other proposals aren't implemented in any currently released GHC version (please correct me if I'm wrong). I'm a bit reluctant to include experimental GHC features here.

Sorry for giving so little information.
HexFloatLiterals and NumericUnderscores were both officially accepted as ghc-proposals.

HexFloatLiterals has been implemented in master and in the ghc 8.4.1 branch.
It will be shipped with ghc 8.4.1.

https://phabricator.haskell.org/D3066

The implementation of NumericUnderscores is under review.

https://phabricator.haskell.org/D4235

I believe that this feature will be widely used in the future.

Propose underscores in numeric literals ghc-proposals/ghc-proposals#76 (comment)

takenobu-hs · 2017-12-05T13:41:26Z

Thank you again and again.
I pushed the revised version. here
Could you please review again?

I modified floatPattern from your suggestion to handle the hexfloat pattern. here

This is because the exponent part of a hexfloat must also be written in decimal digits.
If it is complicated, please tell me. I will further simplify it.

Thank you for taking so much time.

lierdakil · 2017-12-05T19:49:09Z

src/include/util.coffee

@@ -22,4 +22,8 @@ balanced = (name, left, right, inner, ignore = '') ->
  else
    "(?<#{name}>(?:[^#{left}#{right}#{ignore}]|#{left}\\g<#{name}>#{right})*)"

-module.exports = {list, listMaybe, concat, balanced}
+floatPattern = (digit, exp) ->
+  "#{digit}*(?:(?:\\.#{digit}+)?#{exp}[+-]?[0-9_]+|\\.#{digit}+)"


Okay, now this expression here doesn't match floats without explicit exponents, e.g. 3.1415.

In general, Haskell floating-point literals have a syntactic form similar to one of:

3.1415

314e-2

31.415e-1

I would imagine it's similar for hexfloats?

Anyway, we'd like to catch all of those, that's why original float regular expression is kinda convoluted -- you can have either fractional part, exponent part, or both of them, but you can't have a float without at least one. In quasi-EBNF terms, this would be somewhat simple to express:

float → digit+ ( fraction | exponent | fraction exponent )
fraction → "." digit+
exponent → [eE] [+-]? digit+

The equivalent regular expression was trivial to factorize, because all alternatives end in digit+, but if the digit class in the exponent can differ from the digit class elsewhere, this won't work, so you'll need to write out the alternatives.

However, I should point out, fraction and fraction exponent have the same prefix, and hence would require backtracking. We can avoid that by rewriting expression a little:

float → digit+ ( fraction exponent? | exponent )

this way, regexp engine can bail from any branch early

Okay, disregard the first line, I made a mistake when testing. Still, not starting branches on optional parts will on average be about twice as effective, i.e.

float → digit+ ( fraction exponent? | exponent )

is more effective than

float → digit+ ( fraction? exponent | fraction )

which you apparently have here (notice common prefix, remember that regexp parses left-to-right, and realize that fraction part will be effectively matched twice if there is no exponent)
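The two factorizations can be written out and compared directly. The sketch below assumes plain decimal digits without underscores and uses Python's `re` in place of the grammar's regex engine. `FRACTION_FIRST`, `OPTIONAL_FIRST` and `accepts` are names invented for this example. It only checks that both forms accept the same literals and does not measure the speed difference.

```python
import re

DIGIT = r"[0-9]"
FRACTION = rf"\.{DIGIT}+"           # fraction → "." digit+
EXPONENT = rf"[eE][+-]?{DIGIT}+"    # exponent → [eE] [+-]? digit+

# float → digit+ ( fraction exponent? | exponent )
FRACTION_FIRST = rf"{DIGIT}+(?:{FRACTION}(?:{EXPONENT})?|{EXPONENT})"

# float → digit+ ( fraction? exponent | fraction )
OPTIONAL_FIRST = rf"{DIGIT}+(?:(?:{FRACTION})?{EXPONENT}|{FRACTION})"


def accepts(pattern: str, text: str) -> bool:
    """Return True if the pattern matches the whole string."""
    return re.fullmatch(pattern, text) is not None


VALID = ["3.1415", "314e-2", "31.415e-1"]
INVALID = ["314", "3.", ".5", "3e"]

for sample in VALID + INVALID:
    print(
        f"{sample!r:12}"
        f" fraction_first={accepts(FRACTION_FIRST, sample)}"
        f" optional_first={accepts(OPTIONAL_FIRST, sample)}"
    )
```

Both forms accept the same strings. The difference is in the work done: for `3.1415` the second form first consumes the fraction as part of `fraction? exponent`, fails to find an exponent, and then has to match the fraction again in the second branch.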

> which you apparently have here (notice common prefix, remember that regexp parses left-to-right, and realize that fraction part will be effectively matched twice if there is no exponent)

Thank you for teaching me about efficient regexps.
You gave me wonderful insight into how a regex engine processes a pattern.

I will revise the pattern.

takenobu-hs · 2017-12-06T12:28:56Z

I revised the pattern according to your suggestion. here
Please review again when you have time.
Thank you very much.

lierdakil

Okay, thanks for bearing with my nitpicks. It looks like everything's in order, so I'll merge this.

lierdakil · 2017-12-06T19:41:24Z

Released as v1.15.0.

takenobu-hs · 2017-12-07T12:28:38Z

> Okay, thanks for bearing with my nitpicks. It looks like everything's in order, so I'll merge this.

I'm grateful for your kind response.
Your insight makes the user experience better for the Haskell community.
Thank you very much for taking the time :)

takenobu-hs added 2 commits December 3, 2017 12:50

Update generated files

88628c3

lierdakil reviewed Dec 3, 2017

takenobu-hs added 2 commits December 5, 2017 21:35

Modify to improve speed and readability

eb978e9

Fix hexfloat pattern for decimal exponent

f641a65

lierdakil reviewed Dec 5, 2017

Modify regexp pattern to improve efficiency

c34207e

lierdakil approved these changes Dec 6, 2017

lierdakil merged commit f90eb7d into atom-haskell:master Dec 6, 2017
