Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

parsing function definitions for sort #197

Closed
StrayAlien opened this issue Nov 23, 2018 · 13 comments
Closed

parsing function definitions for sort #197

StrayAlien opened this issue Nov 23, 2018 · 13 comments

Comments

@StrayAlien
Copy link
Contributor

StrayAlien commented Nov 23, 2018

The TCK has some coverage for sort(). Presently, either just using primitive types for a & b, or context types with simple property names.

Those are pretty parseable. But consider the following (hopefully illustrative) nasty example - using things not yet covered by the TCK.

sort(precedes: function(a, b) b.tax + total + a.a + a > max(list: for i in )a.another + total ..5(, j in (2..10( return i*2 + j*a + total), list: foo)

It is utterly horrific, but, I believe valid. It show a function defn using complex identifiers, a function call using a multi-iteration-context for/in that has open/closed ranges as iterations contexts with a final 'return' that may (or may not) be using the value of 'a'.

In order to execute the sort() we need to parse the 'precedes' and 'list' named params.

at the time of parsing 'preceeds', we do not yet know what type 'list' is so

for 'b.tax + total + a'

we don't know whether "tax + total + a" is a property on b, or whether it is just "b.tax" etc or indeed whether "tax + total + a.a + a" is just a property on b. The usual weirdo stuff with identifiers with spaces and plus (etc) signs.

Function definitions are not delimited so we don't quite know where it ends ... so we can't grab the text and park it for later 'parsing on demand'.
We can't scan forward looking for a comma, because the for in is multi-iteration-context and it has commas.
We can't scan looking for a ':' because we may invocations using named params inside the func def
We can't scan looking for matching parenthesis because the iteration contexts are intervals, with closed / open end combination like ")x .. y(" and "(x .. y(". (not covered yet in TCK)

Note: the spec doesn't specifically say ranges are permitted in for/in loops, but follow the grammar and an expression, including a unary test is. At any rate, a range could be embedded in some other way inside an anon function definition.

If we take the given range of ")a.another + total ..5(" we don't know whether "another + total" is part property on 'a' or not.

If we know the type of 'list' we can - but, interestingly, probably not it if it is a list of type 'any'.

The thing is though, we have to parse 'preceeds' to get to 'list', and we have to parse them both before invoking sort(). My issue is I can't see how we can parse 'precedes' to an AST without first knowing the list type. We really need to understand the a & b type, and for that we need to understand 'list'.

If the function definition is not embedded but in a context, or is a BKM, then life is simpler, we get "sort(precedes: bkm, list: this + that)" and we're sweet. (well, sweeter ...)

Does that make sense?

... or do I have it all wrong .... ?

All advice appreciated,

Greg.

EDIT: typos.

@StrayAlien
Copy link
Contributor Author

Further to this I guess it goes without saying that the same holds for passing a lambda like this to a BKM. If the function definition body requires context to disambiguate then that context is only known when the BKM invokes the lambda ... which is after you need to parse it ...

Again, all comments welcome.

Greg.

@etirelli
Copy link
Contributor

Greg, there are a few syntax problems with your example, but I will answer on what I think is the spirit of your question: the anonymous function syntax (lambdas) can't always be fully resolved at compile time, because of the problems you raised. In fact, a number of the problems you raised here and in other tickets will only be fully solved when the spec moves to a context free grammar.

For DMN 1.2, there was a suggestion to use quoted names for variables with spaces or special characters in it, but there wasn't time in the end to fully resolve all the concerns, as well as there is a concern to maintain backward compatibility in DMN 1.x.

For now, the guideline that the members of the TCK have been using is to ensure compatibility between tools as much as possible and for the most common cases. For edge cases like your example, we will probably only have a complete guarantee of compatibility when DMN 2.0 is released, hopefully with a context free grammar.

@StrayAlien
Copy link
Contributor Author

StrayAlien commented Dec 3, 2018

@etirelli Thanks Edson. Appreciated. Referring to one of the things I alluded to in the 'thoughts' threads is that the spec may actually cover for it already. I am not a FEEL nor language grammar expert but here is my take on it. Comments welcome.

Bear with me here in my explanations. I can think of two possible solutions that keep a context free grammar as long as we accept small agree-upon limitations.

Possible solution 1:

Despite the sort() example in the spec, the grammar seems to indicate that a function definition cannot be contained in a textual expression. They can be in 'boxed' expressions, not 'textual'. Grammar rules 1, 2, 55.

1. expression = a.boxed expression | b.textual expression ;

2. textual expression = a.for expression | if expression | quantified expression | b.disjunction | c.conjunction | d. comparison | e. arithmetic expression | f. instance of | g. path expression | filter expression | function invocation | h. literal | simple positive unary test | name | "(" , textual expression , ")" ;

55. boxed expression = list | function definition | context ;

Note 'textual expression has 'function invocation', but not 'function definition'. If this is the case, then I think we're good and do not need a context free grammar. Like I said, not an expert.

A function definition that is not embedded inside in a literal expression, but resides as a stand-alone expression, say, as a context value, can be 'lazy compiled' at execution time. At that point all the context will be available in scope to parse and execute it. The return type may not be known, but that is a lesser evil.

Then the sort function, for example, does not actually have the lambda inline, but references a context value in scope as a param. That context value has he function definition but it is not yet compiled. It is compiled on invocation.

I hope that makes sense.

Possible solution 2

The other possible solution gives embedded lambdas but imposes another small restriction.

Grammar rules 57/58:

57. function definition = "function" , "(" , [ formal parameter { "," , formal parameter } ] , ")" , [ "external" ] , expression ;

58. formal parameter = parameter name [":" type ] ;

A function definition parameters can be typed. So we could have

sort(precedes: function(a:tFoo, b:tBar) etc ...)

If we know the types, we can parse before execution time.

Hopefully, that makes sense also.

Personally, I'd go for 'define a function as a context entry' and stick with that as it is easy to communicate and places less emphasis for users to know types etc etc.

Comments welcome.

Greg.

PS. As an aside, I thought the above 'sort()' example was valid syntax. Very (very) weird .. but valid.

@etirelli
Copy link
Contributor

etirelli commented Dec 7, 2018

@StrayAlien your overall reasoning makes sense, but from a TCK perspective, we need to take the spec as is, validate/implement/test/critique, and go back to the OMG RTF group with problems that we eventually find.

In particular, in the case of the sort() function and lambda function support, it works if we set constraints according to some rules in the spec.

For instance, I took your example and tested on Drools, and it parses it fine (after fixing the syntax error). Here is the quick parser test I created:

    @Test
    public void testSortExpression() {
        String inputExpression = "sort(precedes: function(a, b) b.tax + total + a.a + a > max(list: for i in ]a.another + total ..5[, j in (2..10[ return i*2 + j*a + total), list: foo)";
        BaseNode pathBase = parse( inputExpression );

        assertThat( pathBase, is( instanceOf( FunctionInvocationNode.class ) ) );
        FunctionInvocationNode sort = (FunctionInvocationNode) pathBase;
        assertThat( sort.getName().getText(), is( "sort" ) );
        assertThat( sort.getParams().getElements().size(), is( 2 ) );

        NamedParameterNode precedes = (NamedParameterNode) sort.getParams().getElements().get(0);
        assertThat( precedes.getName().getText(), is( "precedes" ) );
        assertThat( precedes.getExpression().getText(), is( "function(a, b) b.tax + total + a.a + a > max(list: for i in ]a.another + total ..5[, j in (2..10[ return i*2 + j*a + total)" ) );

        NamedParameterNode list = (NamedParameterNode) sort.getParams().getElements().get(1);
        assertThat( list.getName().getText(), is( "list" ) );
        assertThat( list.getExpression().getText(), is( "foo" ) );
    }

This will probably fail at runtime, because things like 'a.another + total' is probably(?) an expression, but is being parsed as a path variable name (according with the default parsing rules of the grammar). But again, that is how the spec works today. If the user wants to disambiguate that, he has to provide type information as you mention in your "possible solution 2", or use parenthesis as suggested in the spec: (a.another) + total.

PS: the syntax error I referred to in my previous post is that the exclusive range character is the flipped square bracket, not parenthesis. So the correct expression would be:

sort(precedes: function(a, b) b.tax + total + a.a + a > max(list: for i in ]a.another + total ..5[, j in (2..10[ return i*2 + j*a + total), list: foo)

Just a final note, don't pay too much attention to the grammar rule names. The fact that some of the constructions are named "boxed expressions" in the grammar have no effect on their use in the language. They do refer to the fact that those expression have boxed expressions equivalents in DMN, but in no way they prevent their use in textual literal expressions.

@StrayAlien
Copy link
Contributor Author

StrayAlien commented Dec 8, 2018

@etirelli . Thank you Edson. Appreciated. I'll digest that in the coming days (just working on some other DMN tests atm). You have an eagle eye. Well done. - yes, the left range delimiter is not correct. My reading of the spec had it differently. Thank you again. Something (else) to add to the test suite!

As another note on func ambiguity .. I'll raise another issue. Just writing 'instance of' tests ...

@etirelli
Copy link
Contributor

@StrayAlien can we close this ticket?

As discussed, the grammar is ambiguous, but it is possible to parse expressions like you presented and there are some guide rules on how to decide when there are ambiguities. That is the best we can aim for from a testing perspective at the moment.

@StrayAlien
Copy link
Contributor Author

Hi @etirelli, thanks for replying - appreciated. What are the guide rules? If we all stick to them then great. Dare I ask if they are documented somewhere handy?

@etirelli
Copy link
Contributor

@StrayAlien there are some references spread on chapter 10 of the spec, but I was referring specifically to chapter "10.3.1.6 Ambiguity". In particular to:

Ambiguity is resolved using the scope. Name tokens are matched from left to right against the names in-scope, and the longest match is preferred. In the case where the longest match is not desired, parenthesis or other punctuation (that is not allowed in a name) can be used to disambiguate a FEEL expression. For example, to subtract b from a if a-b is the name of an in-scope context entry, one could write (a)-(b). Notice that it does not help to write a - b, using space to separate the tokens, because the space is not part of the token sequence and thus not part of the name.

As I mentioned previously, such guidelines are not fool proof. For instance, the guideline defines how to parse "a + b" if "a", "b" and/or "a + b" are previously declared as variables, but it doesn't define what to do when none of them are declared. It also is not clear about fields of variables without type. But, it does allow users to use ( ) to disambiguate expressions, in case type information is not fully available at parsing time.

@opatrascoiu
Copy link
Contributor

I believe that an issues was raised https://issues.omg.org/issues/DMN13-121

@opatrascoiu
Copy link
Contributor

There is a proposal that is part of Ballot 6. Should be voted by 13th of March.

@StrayAlien
Copy link
Contributor Author

Hi @opatrascoiu, anywhere we can see that proposal?

@opatrascoiu
Copy link
Contributor

The proposal has been voted this week.

The normal procedure is to attach documents when submitted a proposal. The attached documents are available in the public UI. In this case, becuase the change was small, it was specified in the jira ticket, hence is not visible in the public UI.

The change is

Change grammar rule 2.h. on pg 111 of formal-19-01-05 to
literal | simple positive unary test | name | "(" , expression , ")" ;

@agilepro
Copy link
Contributor

I think this is all done. Closing. REopen if there are any more questions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants