Support for grammar macros #674

Closed
blainehansen opened this issue Mar 6, 2018 · 4 comments

blainehansen commented Mar 6, 2018

From the MANY_SEP and AT_LEAST_ONE_SEP docs:

Note that for the purposes of deciding on whether or not another iteration exists Only a single Token is examined (The separator). Therefore if the grammar being implemented is so "crazy" to require multiple tokens to identify an item separator please use the more basic DSL methods to implement it.

It would be nice to have a generic grammar "macro" that takes more complex formations of rules and intersperses them with each other. Nearley has a concept like this.

A basic case where this becomes very useful is a separator that has to be more complex than a single token. I'm building a grammar where whitespace can't simply be ignored (even spaces are sometimes mandatory and disambiguate between options), and it's causing a lot of redundancy and bloat.
This example doesn't show a very terrifying situation, but it shows that this functionality cleans things up.

this.RULE('callParams', () => {
  this.CONSUME(OpenParen)
  this.SUBRULE(this._)

  this.OPTION(() => {
    this.SUBRULE(this.expression)
    this.MANY(() => {
      this.SUBRULE(this.whiteSpacedComma)
      this.SUBRULE2(this.expression)
    })
  })

  this.SUBRULE2(this._)
  this.CONSUME(CloseParen)
})

A possible API:

  • MACRO for definitions
  • MACRO_ARG for using the parameters within the definition
  • SUBMACRO for uses

Here's what my above rule could look like:

this.MACRO('internalSeparated', ([item, separator]) => {
  this.MACRO_ARG(item)
  this.MANY(() => {
    this.MACRO_ARG(separator)
    this.MACRO_ARG2(item)
  })
})

this.RULE('callParams', () => {
  this.CONSUME(OpenParen)
  this.SUBRULE(this._)

  this.OPTION(() => this.SUBMACRO(this.internalSeparated, [
    this.SUBRULE(this.expression),
    this.SUBRULE(this.whiteSpacedComma)
  ]))

  this.SUBRULE2(this._)
  this.CONSUME(CloseParen)
})

A more verbose and abstract example:

// the parameters could always be in an array,
// so that normal parameterized rules would still work
// further parameters could be the rest of the args
this.MACRO('testMacro', ([one, two, three], someParameter) => {
  this.MACRO_ARG(one)
  this.SUBRULE(this.sep)
  this.MACRO_ARG(two)
  this.MANY(() => this.MACRO_ARG(three))
})

this.RULE('myRule', () => {
  this.SUBMACRO(this.testMacro, [
    this.SUBRULE(this.aSubrule),
    this.OPTION(() => this.SUBRULE(this.aDifferentSubrule)),
    this.OR([
      { ALT: () => {
        this.SUBRULE(this.thirdSubrule)
        this.SUBRULE2(this.aSubrule)
      }},
      { ALT: () => this.CONSUME(Keyword) }
    ]),
  ])
})

@bd82
Member

bd82 commented Mar 6, 2018

On Macros

Macros are very cool. Chevrotain's core is actually based on a minimal read-only, macro-like system in JavaScript that "reads" the user's implementation using Function.prototype.toString to build the grammar structure.

However, because we have fewer levels of abstraction here than a parser generator does, I would be wary of implementing "write" macros as well, since the existing system is already a little fragile.
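To make the "read-only" idea concrete, here is a toy sketch of reading a rule implementation as text. It is not Chevrotain's actual analysis code: the function name listDslCalls, the regular expression, and the set of recognized methods are all assumptions made for illustration.

```javascript
// Illustrative only: a toy scanner, not Chevrotain's real grammar analysis.
// It lists the DSL calls found in a rule implementation's source text.
function listDslCalls(implementation) {
  // Function.prototype.toString returns the source text of a user-defined function.
  const source = implementation.toString();
  // Matches e.g. `this.CONSUME(`, `this.SUBRULE2(`, `this.MANY(`.
  const pattern = /this\.(CONSUME|SUBRULE|OPTION|MANY|OR)(\d*)\s*\(/g;
  const calls = [];
  for (const match of source.matchAll(pattern)) {
    // `suffix` is the raw numeric suffix ("" when the call has none).
    calls.push({ method: match[1], suffix: match[2] });
  }
  return calls;
}

// This function is never executed, only read as text,
// so the undeclared names inside it are harmless.
function callParams() {
  this.CONSUME(OpenParen);
  this.OPTION(() => {
    this.SUBRULE(this.expression);
    this.MANY(() => {
      this.SUBRULE2(this.expression);
    });
  });
  this.CONSUME2(CloseParen);
}

const calls = listDslCalls(callParams);
console.log(calls.map((c) => c.method + c.suffix).join(" "));
// prints "CONSUME OPTION SUBRULE MANY SUBRULE2 CONSUME2"
```

A scanner like this only sees text that is literally present in the function body, which is one way to picture why "write" macros would add complexity on top of it.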

Do you really need Macros?

Is your language partially or fully whitespace sensitive? If you only need to consume whitespace at very specific places, it may be possible to implement multiple token streams by overriding a few parser methods. You could then ignore whitespace 95% of the time and only handle it when you actually care about it. That should reduce the verbosity...
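As a rough sketch of that idea, independent of Chevrotain: a token cursor can skip whitespace by default and expose it only on request. The TokenCursor class, the keepWhitespace option, and the token shape below are hypothetical and are not Chevrotain parser methods.

```javascript
// Hypothetical token shape: { type: string, image: string }.
class TokenCursor {
  constructor(tokens) {
    this.tokens = tokens;
    this.position = 0;
  }

  // Returns the next token. Whitespace tokens are skipped unless the
  // caller explicitly asks to see them.
  next({ keepWhitespace = false } = {}) {
    if (!keepWhitespace) {
      while (
        this.position < this.tokens.length &&
        this.tokens[this.position].type === "Whitespace"
      ) {
        this.position += 1;
      }
    }
    const token = this.tokens[this.position];
    if (token !== undefined) {
      this.position += 1;
    }
    return token; // undefined once the input is exhausted
  }
}

const tokens = [
  { type: "OpenParen", image: "(" },
  { type: "Whitespace", image: " " },
  { type: "Class", image: ".thing" },
  { type: "Whitespace", image: " " },
  { type: "Class", image: ".stuff" },
  { type: "CloseParen", image: ")" },
];

const cursor = new TokenCursor(tokens);
const seen = [
  cursor.next().type, // whitespace not requested
  cursor.next().type, // the first space is skipped
  cursor.next({ keepWhitespace: true }).type, // the space between selectors matters
  cursor.next().type,
  cursor.next().type,
];
console.log(seen.join(" "));
// prints "OpenParen Class Whitespace Class CloseParen"
```

Only the rules that care about whitespace would pay for it; the rest of the grammar stays free of whitespace handling.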

Maybe the _SEP methods could be upgraded to support more complex separators, or at least GATEs that would allow more complex lookahead conditions.

If you still really want macros and the extra abstractions that enable them, you can build your own custom APIs on top of the Chevrotain engine.

Those could be generator-style or combinator-style APIs. Because you would have an additional level of abstraction there (code generation) and would already be creating a whole new API, implementing macros should be pretty straightforward.

@blainehansen
Author

It's an indentation-based language. Indents, deindents, newlines, and tabs are always semantic and not allowed just anywhere. It's also a language that parses CSS selectors, so spaces are semantic: they distinguish nesting of elements from compounding of elements (.thing .stuff vs .thing.stuff). I'll look into overriding the parser to make dealing with spaces easier in places where they don't mean anything.

I don't think I want to try the custom API route, since I'm building a parser with embedded actions, and I'll need to use parameterized rules.

Honestly though, from where I'm standing it seems possible to use the existing Function.toString analysis system with these kinds of macro definitions, without changing the fundamental structure of Chevrotain. Isn't the MACRO idea conceptually similar to SUBRULE? When a MACRO definition is made, Chevrotain could analyze the function string to see how rules and macro args are applied and save that as a template. Then, when SUBMACRO is called, it could use that template along with an analysis of the args to know how to build the GAST.
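A minimal sketch of that template idea, using a made-up grammar tree rather than Chevrotain's own GAST classes. The node shapes (RuleRef, MacroArg, Many) and the expandMacro function are assumptions for illustration only.

```javascript
// Hypothetical node constructors for a tiny grammar tree.
const ruleRef = (name) => ({ type: "RuleRef", name });
const macroArg = (index) => ({ type: "MacroArg", index });
const many = (definition) => ({ type: "Many", definition });

// Template that would be recorded once for the proposed
// `internalSeparated` macro: item (separator item)*
const internalSeparated = [
  macroArg(0),
  many([macroArg(1), macroArg(0)]),
];

// Expands one use site: every MacroArg placeholder is replaced by the
// list of nodes supplied for that argument.
function expandMacro(template, args) {
  return template.flatMap((node) => {
    if (node.type === "MacroArg") {
      if (node.index >= args.length) {
        throw new Error(`missing macro argument ${node.index}`);
      }
      return args[node.index];
    }
    if (node.type === "Many") {
      // Recurse so placeholders nested inside repetitions are expanded too.
      return [many(expandMacro(node.definition, args))];
    }
    return [node];
  });
}

// Use site corresponding to the `callParams` example:
// item = expression, separator = whiteSpacedComma.
const expanded = expandMacro(internalSeparated, [
  [ruleRef("expression")],
  [ruleRef("whiteSpacedComma")],
]);
console.log(JSON.stringify(expanded, null, 2));
```

The expanded tree contains no macro nodes, so in this sketch everything downstream of expansion would see an ordinary rule body.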

I haven't looked through enough of the code to understand this all deeply though, and if what I'm talking about is impractical I completely understand.

@blainehansen
Author

It certainly would be nice if the _SEP methods accepted full rules rather than just tokens.

@bd82
Member

bd82 commented Mar 7, 2018

It's an indentation based language. Indents, Deindents, Newlines, and Tabs, are always semantic and not allowed just anywhere.

This complexity could indeed explain your use case for macros.

Honestly though, from where I'm standing it seems possible to use the existing Function.toString analysis system with these kinds of macro definitions.

I think it is possible to get something working. I'm just not sure how complex it would be to build a "proper" solution that provides good error messages (at both design time and runtime), or how that would interact with automatic error recovery.

That is why I am not sure if I want to introduce this feature to Chevrotain at this time.

Way Forward

Option 1

However, it may be simple to refactor the grammar analysis phase to be extensible, so that end users could implement macros for their own less general use cases.

Option 2

Chevrotain grammars are plain JavaScript, so why are we trying to solve text replacement macros and general code transformation within the scope of the library? The problem can be reduced to simple parameterized text replacement macros (C/C++ macros, as opposed to LISP "true" macros).

Some Options:

I think babel-plugin-macros is the better choice, as we only need simple text replacements for your use case.
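To illustrate what "simple parameterized text replacement" could mean here, below is a hand-rolled sketch that expands placeholders in a source template at build time. It does not use babel-plugin-macros or any other tool's API; expandTextMacro and the $name placeholder syntax are assumptions for illustration.

```javascript
// Hypothetical build-time helper: replaces $name placeholders in a
// source-text template with the given parameter text.
function expandTextMacro(template, params) {
  return template.replace(/\$(\w+)/g, (whole, name) => {
    if (!Object.prototype.hasOwnProperty.call(params, name)) {
      throw new Error(`unknown macro parameter: ${name}`);
    }
    return params[name];
  });
}

// Template for the "item, then (separator item)*" pattern from the
// `callParams` rule earlier in the thread.
const separatedTemplate = [
  "this.SUBRULE($item)",
  "this.MANY(() => {",
  "  this.SUBRULE($separator)",
  "  this.SUBRULE2($item)",
  "})",
].join("\n");

const generated = expandTextMacro(separatedTemplate, {
  item: "this.expression",
  separator: "this.whiteSpacedComma",
});
console.log(generated);
```

Because the expansion happens before the grammar source is ever read, the generated text would look to the library like any hand-written rule body.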
