<ws> poorly defined #1729

Closed
jdoege opened this issue Jan 11, 2018 · 5 comments

@jdoege jdoege commented Jan 11, 2018

In Language/grammars.pod, ws is defined at line 209 as one or more whitespace characters or a word boundary, i.e. / \s+ | <|w> /.
If, however, you define token ws { \s+ | <|w> } in a grammar, you may find that your previously working parser no longer parses.
After looking in Language/regexes, I found some examples of redefining ws, but no statement of how ws is defined by default. Those examples provided a clue that perhaps ws is actually defined as token ws { <!ww> | \s+ }. Searching further, I found a comment by moritz on Stack Overflow saying that ws is in fact defined as token ws { <!ww> | \s+ }. https://stackoverflow.com/questions/47728466/perl-6-grammar-doesnt-match-like-i-think-it-should
The default definition of ws should be corrected in grammars.pod and added to the sigspace section of regexes.pod.

One other point that might be made clear is that a grammar that explicitly defines ws behaves a bit differently than one which uses the default definition. Using the default, ws gets thrown away. When ws is explicitly defined, whatever it parses gets put into the parse tree.

Collaborator

@tisonkun tisonkun commented Jan 12, 2018

@coke coke added the docs label Jan 17, 2018
Contributor

@ronaldxs ronaldxs commented Mar 23, 2018

Language/grammars in the middle of the section on <ws> says:

https://github.com/perl6/doc/blob/9db0df2854fbf86601ff1f3ab1a3b160ba3c1a1b/doc/Language/grammars.pod6#L212-L213

The sigspace section of Language/regexes first implies the same idea saying

By default, <.ws> makes sure that words are separated

and then shows, if you knew what you were looking for in the first place, that it is not talking about <|w> by saying

"^&" ... will match <.ws> in the middle

which is correct but neither <|w> nor \s+ matches that case.

https://github.com/perl6/doc/blob/9b6ee713775238f3dab58397f72f8b1c318d6886/doc/Language/regexes.pod6#L1546-L1548

In the Stack Overflow comment, moritz says the <ws> definition is ws { <!ww> \s* }, which is a shade different and close to the source:

moritz stackoverflow: https://stackoverflow.com/questions/47728466/perl-6-grammar-doesnt-match-like-i-think-it-should#comment82426178_47728653
source: https://github.com/perl6/nqp/blob/a2f66567052e827a39cfda6d8908f62532ef3b12/src/NQP/Grammar.nqp#L59-L68

I think it might be helpful for the <!ww> \s* approximation to be in the relevant docs.
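
To make the difference concrete, here is a rough sketch in Python's `re` module rather than Raku. It is only an emulation under stated assumptions: `<!ww>` is approximated as "not between two word characters" using lookarounds, `<|w>` is approximated by `\b`, and the names `DEFAULT_WS`, `DOCUMENTED_WS` and `joins` are made up for this illustration. Python's `\w` and `\s` are not guaranteed to agree with Raku's on every character.

```python
import re

# Approximation of moritz's simplification  <!ww> \s* :
# succeed only when NOT between two word characters, then take optional whitespace.
DEFAULT_WS = r"(?:(?!(?<=\w)(?=\w))\s*)"

# Approximation of the definition the docs gave at the time:  \s+ | <|w>
DOCUMENTED_WS = r"(?:\s+|\b)"


def joins(ws, left, right, text):
    """Return True if `left`, then `ws`, then `right` match somewhere in `text`.

    `left` and `right` are taken literally (hypothetical helper, not a Raku API).
    """
    pattern = re.escape(left) + ws + re.escape(right)
    return re.search(pattern, text) is not None


# The "^&" case from the sigspace docs: no whitespace and no word boundary.
print(joins(DEFAULT_WS, "^", "&", "^&"))
print(joins(DOCUMENTED_WS, "^", "&", "^&"))

# Inside a word neither variant lets the two halves join.
print(joins(DEFAULT_WS, "a", "b", "ab"))
print(joins(DOCUMENTED_WS, "a", "b", "ab"))
```

Under these assumptions the two variants agree on `"a b"` and `"ab"` and differ only on cases like `"^&"`, which is the case the sigspace section mentions.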

The idea that <ws> matches word separation other than spaces between words is adequately explained but a bit counterintuitive and buried in the middle of the relevant sections. For the regexes sigspace section a fourth example might be added to the first three demonstrating it (for example):

say so "I used a Photoshop(photo shop)" ~~ m:i:s/ photo shop /;

Or a stronger hint could be added to the second example by adding a '.' period at the end of the sentence:

say so "I used a photo shop." ~~ m:i:s/ photo shop /;
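
As a sanity check on what these two examples are meant to show, the following is a Python emulation, not Raku. It assumes that `m:i:s/ photo shop /` behaves like `photo <.ws> shop <.ws>` with case-insensitive matching, and that `<.ws>` can be approximated by `<!ww> \s*` written with lookarounds. `WS`, `PHOTO_SHOP` and `found` are hypothetical names for this sketch.

```python
import re

# Approximation of  <!ww> \s* : not between two word characters, then optional whitespace.
WS = r"(?:(?!(?<=\w)(?=\w))\s*)"

# Assumed expansion of  m:i:s/ photo shop /  as  photo <.ws> shop <.ws>
PHOTO_SHOP = re.compile("photo" + WS + "shop" + WS, re.IGNORECASE)


def found(text):
    """Return the matched substring, or None when there is no match."""
    m = PHOTO_SHOP.search(text)
    return m.group(0) if m else None


# "Photoshop" is skipped (no word separation); the text in parentheses matches.
print(found("I used a Photoshop(photo shop)"))

# The trailing period still counts as word separation after "shop".
print(found("I used a photo shop."))

# With no separation anywhere, nothing matches.
print(found("I used Photoshop"))
```

In this emulation the first two calls return `photo shop` and the third returns `None`, which is the counterintuitive point: separation after "shop" can be a parenthesis or a period, not only a space.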

In the grammars <ws> section the sentence starting with

The default ws

and the block of examples just below it might be moved to the top, and again @moritz's simplification of the <ws> definition to { <!ww> \s* } might be included there.

The two sections should also point to each other.

So I was confused about <ws>, as was the original poster of the issue. My confusion came in part from looking at @albastev's modelica grammar, which defines ws as zero or more spaces but has no concept of an actual word break; this, I believe, led the grammar to add many otherwise unneeded <|w> tests next to keywords.

Thanks to @AlexDaniel for his patience helping to explain some of this on IRC #perl6.

Contributor

@JJ JJ commented Jul 24, 2018

Maybe this issue #2211 will also solve this one? At any rate, can you please check now for the definition?

Contributor

@JJ JJ commented Jun 1, 2019

@tisonkun if I got the Rakudo bug correctly, the main problem is when re-defining ws in a Grammar, not in the definition of ws itself, which this issue is about, right?

@JJ JJ added the external label Jun 1, 2019
Contributor

@JJ JJ commented Jun 1, 2019

Added the external label since it's waiting for this issue to be solved (via @tisonkun)

JJ added a commit that referenced this issue Jun 1, 2019
@JJ JJ changed the title from "<.ws> poorly defined" to "<ws> poorly defined" Jun 1, 2019
@JJ JJ closed this in c0e9876 Jun 1, 2019