Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatic creation of a Concrete Syntax Tree. #215

Closed
13 of 14 tasks
bd82 opened this issue Jul 16, 2016 · 12 comments · Fixed by #365
Closed
13 of 14 tasks

Automatic creation of a Concrete Syntax Tree. #215

bd82 opened this issue Jul 16, 2016 · 12 comments · Fixed by #365

Comments

@bd82
Copy link
Member

bd82 commented Jul 16, 2016

Thus allowing the creation of "pure" grammars without any actions.
It may be possible as the Parser always knows the current "position in the grammar"
so perhaps the SUBRULE and CONSUME methods may be overridden with pre/post
hooks to assemble the required data structures.

CST

  • Define CST Structure.
  • CST Option in ParserConfig.
  • Integrate CST Building into parsing Flow.
  • Support nested rules defined by "named" DSL Methods.
  • Validation that Terminal and NonTerminals never have identical names.
  • Validation that nested rules do not have conflicting names with other nested rules.
  • Validation that nested rules names always start with '$'
  • Optimize CST building performance
  • Use No-Op pattern to avoid performance hit when CST is disabled.
  • CstNodes for results of error recovery.
  • More permissive pattern for name props during "self analysis"
  • 100% coverage.

Docs & Examples

  • Runnable example.
  • Usage and detailed Docs.
@Zumbala
Copy link

Zumbala commented Oct 2, 2016

I came across the same problem and did something like below to overcome this.

// create place holder for your ast
var ast = {};
// give a descriptive name for adding to the same level, value = null or object
// value null will clear the array
function AST(name, value) {
 if (value === null) ast[name] = [];
 if (value) ast[name].push(value);
 return ast[name];
}

// sample usage:

$.start = $.RULE("start",function () {
  return {
    schema : $.SUBRULE($.define),
    rules : (function(name) { 
              AST(name,null);
              $.MANY(function () { AST(name,$.SUBRULE($.definitions));});
              return AST(name);
             })("rules"),
    start : $.SUBRULE($.root)   
  };
});

So by using an IIFE you can create an AST.

Maybe better is to have the MANY and others have the same implementation as the OR function.
Meaning the return value can be anything coming from the consuming functions.
In this case an ARRAY should be returned as MANY suggests doing so.

$.definitions = $.RULE("definitions",function () {
   return $.OR([
    { ALT : function () { return $.SUBRULE($.validator); } },
    { ALT : function () { return $.SUBRULE($.type); } },
    { ALT : function () { return $.SUBRULE($.object); } }
  ]);

});

@bd82
Copy link
Member Author

bd82 commented Oct 2, 2016

Thanks @Zumbala.

I'm not sure what is actually happening in the example you provided.
By I think understand the gist of it of minimizing the AST building code so the grammar would be more "pronounced".

I've done something similar by building a very simple "ParseTree" structure and deferring the full AST building to a later stage.
https://github.com/SAP/chevrotain/blob/master/examples/language_services/src/examples/json/parser.ts

but this is just a mitigation of the problem. I'm interested in creating a completely pure grammar and defining the "actions" outside of it (listener/visitor/...). That would be a full solution.

I believe this is possible with Chevrotain by overriding CONSUME/SUBRULE/RULE.
And hope to find time to investigate in the future 😄

@bd82
Copy link
Member Author

bd82 commented Oct 2, 2016

Maybe better is to have the MANY and others have the same implementation as the OR function.
Meaning the return value can be anything coming from the consuming functions.
In this case an ARRAY should be returned as MANY suggests doing so.

Possibly, however:

  • This won't be suitable for all use cases as sometimes the "MANY** has complex contents as its not always a single "SUBRULE" invocation.
  • It would also conflict with "MANY_SEP" which returns an array of separators.
    If you are interested in this open a different issue and we can discuss it farther, however
  • And it would have a negative performance impact even in situations the returned array is not used.

I'm not sure I can implement this without causing issues...

You could easily override MANY1-5 and "manyInternal"
to accomplish this, but with the disadvantage of possibly having extra work when upgrading Chevrotain versions.

@bd82 bd82 changed the title Investigate automatic creation of a Syntax Tree. Investigate automatic creation of a Syntax Tree And Grammar Listeners Oct 25, 2016
@tlrobinson
Copy link

tlrobinson commented Dec 12, 2016

If this idea is what I think it is, it would be really nice. I have a grammar that already has actions, which I use to parse and compile a simple language. I'd like to re-use the same grammar for syntax highlighting. It would be neat if I could toggle a "syntax tree only" mode that ignored the actions and returned a syntax tree instead.

Awesome project, by the way!

@bd82 bd82 changed the title Investigate automatic creation of a Syntax Tree And Grammar Listeners Separate the Grammar from the Grammar Actions. Dec 12, 2016
@bd82
Copy link
Member Author

bd82 commented Dec 12, 2016

Awesome project, by the way!

Thanks @tlrobinson. 😄

I've renamed the issue to be clearer.
It is similar to what you described, but instead of having grammar actions that compile a simple language and having another flow which would disable those actions and build a tree.

You will have a pure grammar without any actions (example)
And two grammar listeners.

  • One used to parse and compile.
  • One used to do syntax highlights.

So there will be a stronger separation of concerns between the syntax definition
and the actions performed on the input.

You could still write grammar actions if you prefer.
But I have no way to selectively disable the user defined actions because they are
always mixed with the grammar in the same JavaScript function.
Example:

$.RULE("atomicExpression", function() {
            return $.OR([
                {ALT: function(){ return $.SUBRULE($.parenthesisExpression)}},
                // The grammar action (parseInt) can't be separated from the grammar (CONSUME)
                {ALT: function(){ return parseInt($.CONSUME(NumberLiteral).image, 10)}}
            ]);
    });

@tlrobinson
Copy link

tlrobinson commented Dec 12, 2016

Sounds good.

In the meantime, how would you recommend re-using a grammar for both purposes? Perhaps I could have an abstract class that has the grammar with actions that call abstract methods that are implemented by two different subclasses.

But I have no way to selectively disable the user defined actions because they are
always mixed with the grammar in the same JavaScript function.

Doesn't the content assist mode bypass the actions?

@bd82
Copy link
Member Author

bd82 commented Dec 13, 2016

moved discussion related to metabase grammar here: #327

bd82 added a commit that referenced this issue Feb 9, 2017
fixes #215
@bd82 bd82 mentioned this issue Feb 9, 2017
Merged
bd82 added a commit that referenced this issue Feb 9, 2017
fixes #215
bd82 added a commit that referenced this issue Feb 9, 2017
fixes #215
bd82 added a commit that referenced this issue Feb 16, 2017
fixes #215
bd82 added a commit that referenced this issue Feb 17, 2017
fixes #215
bd82 added a commit that referenced this issue Feb 17, 2017
fixes #215
bd82 added a commit that referenced this issue Feb 19, 2017
fixes #215
bd82 added a commit that referenced this issue Feb 20, 2017
fixes #215
bd82 added a commit that referenced this issue Feb 20, 2017
fixes #215
bd82 added a commit that referenced this issue Feb 20, 2017
fixes #215
bd82 added a commit that referenced this issue Feb 21, 2017
fixes #215
bd82 added a commit that referenced this issue Feb 21, 2017
fixes #215
bd82 added a commit that referenced this issue Feb 22, 2017
fixes #215
bd82 added a commit that referenced this issue Feb 22, 2017
fixes #215
bd82 added a commit that referenced this issue Mar 3, 2017
fixes #215
bd82 added a commit that referenced this issue Mar 3, 2017
fixes #215
bd82 added a commit that referenced this issue Mar 3, 2017
fixes #215
@bd82
Copy link
Member Author

bd82 commented Mar 4, 2017

Benchmarked using a JSON grammar with a large 1,000 lines input.
When CST building is enabled the performance is about 70% of when it is not. (on V8).

This is a little slower than expected, but it does make sense as there is a quite a bit of additional overhead
required to "insert" the CST building into the flow, not to mention that the CST itself is by definition
very detailed and includes information on the entire parse tree so that is more work than only collecting
specific parts for a specific flow.

bd82 added a commit that referenced this issue Mar 4, 2017
fixes #215
bd82 added a commit that referenced this issue Mar 4, 2017
fixes #215
bd82 added a commit that referenced this issue Mar 4, 2017
fixes #215
bd82 added a commit that referenced this issue Mar 4, 2017
fixes #215
bd82 added a commit that referenced this issue Mar 7, 2017
fixes #215
bd82 added a commit that referenced this issue Mar 7, 2017
fixes #215
bd82 added a commit that referenced this issue Mar 9, 2017
fixes #215
bd82 added a commit that referenced this issue Mar 9, 2017
fixes #215
bd82 added a commit that referenced this issue Mar 11, 2017
fixes #215
bd82 added a commit that referenced this issue Mar 11, 2017
fixes #215
@bd82
Copy link
Member Author

bd82 commented Mar 11, 2017

The Semantic Actions on the Concrete Syntax Tree output will be handled in a separate issue
as this is already a huge change.

bd82 added a commit that referenced this issue Mar 11, 2017
fixes #215
bd82 added a commit that referenced this issue Mar 12, 2017
fixes #215
bd82 added a commit that referenced this issue Mar 12, 2017
fixes #215
bd82 added a commit that referenced this issue Mar 12, 2017
Allows writing "pure" grammars and perform semantic actions afterwards.

In the (near?) future will add CST visitors to make it easy to traverse the CST structure.

fixes #215
bd82 added a commit that referenced this issue Mar 12, 2017
Allows writing "pure" grammars and perform semantic actions afterwards.

In the (near?) future will add CST visitors to make it easy to traverse the CST structure.

fixes #215
@bd82 bd82 closed this as completed in #365 Mar 12, 2017
@bd82
Copy link
Member Author

bd82 commented Mar 12, 2017

Just missing Docs now.

@bd82 bd82 reopened this Mar 12, 2017
bd82 added a commit that referenced this issue Mar 16, 2017
It is too complicate from the perspective of and end user
to try and figure out the type of the children dictionary value (mandatory/optional/collection)
As it depends on (potentially) traversing multiple paths in the same grammar rule.

Related to #215.
bd82 pushed a commit that referenced this issue Mar 16, 2017
Relates to #215.
@bd82 bd82 mentioned this issue Mar 16, 2017
@bd82 bd82 changed the title Separate the Grammar from the Grammar Actions. Automatic creation of a Concrete Syntax Tree. Mar 16, 2017
@bd82
Copy link
Member Author

bd82 commented Mar 16, 2017

Left overs related to docs & benchmark will be handled separately.

@bd82 bd82 closed this as completed Mar 16, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants