Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Metabase Expressions Grammar Assistance Thread #327

Closed
bd82 opened this issue Dec 13, 2016 · 11 comments
Closed

Metabase Expressions Grammar Assistance Thread #327

bd82 opened this issue Dec 13, 2016 · 11 comments

Comments

@bd82
Copy link
Member

bd82 commented Dec 13, 2016

Moved Discussion from:#215 (comment) to here:

@tlrobinson Wrote:

If this idea is what I think it is, it would be really nice. I have a grammar that already has actions, which I use to parse and compile a simple language. I'd like to re-use the same grammar for syntax highlighting. It would be neat if I could toggle a "syntax tree only" mode that ignored the actions and returned a syntax tree instead.

How would you recommend re-using a grammar for both purposes? Perhaps I could have an abstract class that has the grammar with actions that call abstract methods that are implemented by two different subclasses.

FYI Here's a prototype of my editor (without my Chevrotain grammar integrated): https://jsfiddle.net/tlrobinson/ghakomg1/ As you can see identifiers inside of Sum are highlighted blue (field names), and outside they're highlighted green (aggregation names).

And here's the grammar I'm using for compiling and autosuggest https://github.com/metabase/metabase/blob/b9a04774b0c96253afd4ef301e8e17fe8b0185a3/frontend/src/metabase/lib/expressions/parser.js

@bd82 bd82 added the question label Dec 13, 2016
@bd82
Copy link
Member Author

bd82 commented Dec 13, 2016

Cool @tlrobinson I did not know there was Open Source BI & analytics software. 👍

I see you have an expression language which you compile to some data structure.
additionally, you seem to building an mini editor for this expression language (Content Assist / Formatter / Syntax Highlights).

The Problem:

Is that during your parsing flow you lose the vast majority of the syntactic information examples:

@bd82
Copy link
Member Author

bd82 commented Dec 13, 2016

A Possible Solution:

Basically instead of building an Abstract Syntax Tree which does not contain the entire syntactic information. The Parser should create a more low level data structure: A Concrete Syntax Tree
This CST could later be transformed to more high level data structures for semantic related analysis.

Example building Language Services:

  • The Parser builds a very low level Parse Tree which contains all the Tokens.
  • This Parse Tree is converted into an AST in a separate step in the compilation flow.
  • This specific AST Still contains all the syntactic tokens so we can now use a Visitor/Dispatcher To implement different editor flows (Outline / Semantic Checks / Syntax highlights).

For your use case to support syntactic highlights depending on grammar context, you would visit all the AST nodes, find the Identifier tokens in each Node and decide on the color do highlight depending on current parent AstNode.

This may be over-designed for a small expression language, not all these steps are necessary for all use cases...

@bd82
Copy link
Member Author

bd82 commented Dec 13, 2016

The "Separate the Grammar from the Grammar Actions" issue when implemented will provide some
of the flow described above for "free".

It will not build any AST, but it will create some low level Concrete Syntax Tree and provide the capability to iterate over it and invoke actions. So something like grammar context dependent syntactic highlights would be trivial to implement using that future Chevrotain functionality.

But this issue won't help with semantic related capabilities like "Semantic Checks".

@tlrobinson
Copy link

@bd82 Thanks for all your help.

I've got my parser outputting a syntax tree, however the last wrinkle I forgot about is my lexer skips whitespace, which obviously leaves gaps in my syntax tree. For my purposes I could just assume any gaps are spaces (don't care about tabs/newlines) but is there another way to get the full syntax tree without adding $.OPTION(()=> $.CONSUME(Whitespace)) all over my grammar?

@bd82
Copy link
Member Author

bd82 commented Dec 13, 2016

You are welcome @tlrobinson.

For handling whitespace you could use tokenGroups

Currently you define the whitespace tokens as "skipped" Which means they are both ignored and sent into oblivion. If you define their group as some custom name, for example "whitespace".
All the whitespace tokens will be collected in the lexer result. See example of accessing collected comments

So if you have all the whitespace tokens available(including the distinction between tabs/newlines).
and in addition you have a syntax tree which supports "ranges". so you know in which offset
each tree node starts and ends, you can combine this information to augment the syntax tree
to include the whitespace information.

@tlrobinson
Copy link

@bd82 Thanks, that makes sense, I'll give that a try.

Another question, (sorry if this isn't the best place to ask), my expression language has "aggregations" (like Sum(expression), Average(expression)) which can't be nested, so I use a outsideAggregation parameter on some of the rules and GATE to enforce this. Also "metrics" can't appear inside aggregations, and "fields" must only appear inside aggregations.

I'd like for identifiers and string literals to parse as a "fields" inside aggregations, and "metrics" (basically a user-defined aggregation) outside aggregations, e.x. Aggregation("Field") + "Metric") (previously the grammar required metrics be identifiers followed by parens, e.x. Metric())

I tried this:

$.RULE("atomicExpression", (outsideAggregation) => {
    return $.OR([
        // aggregations and metrics are not allowed inside other aggregations
        {GATE: () => outsideAggregation, ALT: () => $.SUBRULE($.aggregationExpression, [false]) },
        {GATE: () => outsideAggregation, ALT: () => $.SUBRULE($.metricExpression) },
        // fields are not allowed outside aggregations
        {GATE: () => !outsideAggregation, ALT: () => $.SUBRULE($.fieldExpression) },
        {ALT: () => $.SUBRULE($.parenthesisExpression, [outsideAggregation]) },
        {ALT: () => $.SUBRULE($.numberLiteral) }
    ]);
});

(both metricExpression and fieldExpression match stringLiteral or identifier)

But now I'm getting this:

Uncaught Error: Parser Definition Errors detected
: Ambiguous alternatives: <2 ,3> in <OR1> inside <atomicExpression> Rule,
<StringLiteral> may appears as a prefix path in all these alternatives.
To Resolve this, try one of of the following:
1. Refactor your grammar to be LL(K) for the current value of k (by default k=5)
2. Increase the value of K for your grammar by providing a larger 'maxLookahead' value in the parser's config
3. This issue can be ignored (if you know what you are doing...), see http://sap.github.io/chevrotain/documentation/0_9_0/interfaces/iparserconfig.html for

Is there a way to resolve this without duplicating all the rules that can appear inside and outside of aggregations? (e.x. parenthesisExpression, additiveExpression, multiplicativeExpression, atomicExpression etc)

@bd82
Copy link
Member Author

bd82 commented Dec 14, 2016

(sorry if this isn't the best place to ask)

Github issues seem more convenient than google groups, Perhaps you are familiar with a better solution for handling these discussions? I know gitter but this use case is not a live chat...

Perhaps I will rename this issue to "Metabase Expressions Grammar Assistance" 😄

Back to the Problem:

The problem is that the errors are on a pure grammar level (design time) and the gates only exist at runtime.
So there is no way to understand that gate used solves the ambiguity.

But I am not certain that really is needed.
How do these metricExpression and fieldExpression rules look like?
Do they both just consume a single identifier / string literal?

Can you link a fuller example?

@bd82 bd82 changed the title Discussion: Building a code editor using Chevrotain. Metabase Expressions Grammar Assistance Thread Dec 14, 2016
@tlrobinson
Copy link

Currently metricExpression must be an identifier followed by () but I want to change it to be an identifier or string literal (see the commented out code): https://github.com/metabase/metabase/blob/e4667a621d18cc53f11ba3de7d8f46a86fdabab6/frontend/src/metabase/lib/expressions/parser.js#L90-L97

fieldExpression is an identifier or string literal: https://github.com/metabase/metabase/blob/e4667a621d18cc53f11ba3de7d8f46a86fdabab6/frontend/src/metabase/lib/expressions/parser.js#L106-L110

fieldExpressions only appear inside aggregations, and metricExpressions only appear outside aggregations, so there shouldn't be any ambiguity, but I don't know the best way to express it in Chevrotain (preferably without duplicating the rules for inside/outside aggregations)

@tlrobinson
Copy link

@bd82 I tried disabling the error and everything seems to work correctly, thanks!

@bd82
Copy link
Member Author

bd82 commented Dec 14, 2016

I tried disabling the error and everything seems to work correctly, thanks!

While this works it just hides the symptom. Note that both the metricExpression and the fieldExpression have exactly the same syntax.

You can create a single rule which combines them both.
Because from a syntactic point of view they are the same, the semantics (what you do with the syntax) is the only difference...

What about something like:

$.RULE("metricOrFieldExpression", (isFieldContext) => {
            const metricOrFieldName = $.OR([
                 {ALT: () => $.SUBRULE($.stringLiteral) },
                 {ALT: () => $.SUBRULE($.identifier) }
            ]);

            let result
            if (isFieldContext) {
               // do something and create a Field
               result = ...
            }
            else {
              // do something and create a Metric 
              result = ...
            }
            
            return result;
});

@bd82
Copy link
Member Author

bd82 commented Jan 3, 2017

Closing this issue for now as it seems the MetaBase expressions parser using Chevrotain was successfully merged.
https://github.com/metabase/metabase/blob/master/package.json#L15
https://github.com/metabase/metabase/blob/master/frontend/src/metabase/lib/expressions/parser.js

@bd82 bd82 closed this as completed Jan 3, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants