Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using generic expressions inside operand? #165

Closed
CP3088 opened this issue Mar 12, 2020 · 7 comments
Closed

Using generic expressions inside operand? #165

CP3088 opened this issue Mar 12, 2020 · 7 comments

Comments

@CP3088
Copy link
Contributor

CP3088 commented Mar 12, 2020

Hello,

I have been working on my parser more trying to overcome the issues from last time.
For the past few weeks I have been toying around with generic expressions, and seem to have run into a issue where generic expressions cannot be reused as operands without performance loss.

Given the following grammar:

[Production("test_stmt : LBEG [d] id DEFINE [d] arith_expr LEND [d]")]

[Production("arith_expr : PineParser_expressions")]
[Production("arith_expr : ternary_expr")] // TODO: | if_expr | for_expr

[Production("ternary_expr : PineParser_expressions COND [d] PineParser_expressions COND_ELSE [d] PineParser_expressions")]
[Production("sqbr_expr : id LSQBR [d] PineParser_expressions RSQBR [d]")]
[Production("group_expr : LPAR [d] PineParser_expressions RPAR [d]")]

[Operand]
[Production("atom : id")]
[Production("atom : literal")]
[Production("atom : sqbr_expr")]
[Production("atom : group_expr")]

/* ... */

I am getting these results:

a = close[3] / (2 + 2)       21860 ms
a = close[3] / 2 + 2         7435 ms           Removed Group Rule
a = close[3] / 2 + 2         3633 ms           Removed Ternary Rule

Function calls will be required to act as an operand while containing PineParser_expressions as well.
What do you recommend for fixing this recursion?

Thanks,
CP3088

@b3b00
Copy link
Owner

b3b00 commented Mar 15, 2020

hi @CP3088 , glad to see that you keep with your project.
I dont see any obvious flaw in yiur grammar. I dont undrstqnd the 3 time drop you get when removing the group rule, it seems impossible to have a link in your particular case.
I will look at it later when having time, surely tomorrow.

@b3b00
Copy link
Owner

b3b00 commented Mar 15, 2020

I look at it quickly tyrying to reproduc a minimal parser (see below). And I do not reproduce any performance drop ! removing or leting rules does not make the parser any faster or slower. It takes on ~20ms to parse your examples.
I will later check your full grammar to see if there is something else.
For now I think the parser excerpt that you show is completely ok.

  [Production("test_stmt :  ID EQ [d] arith_expr ")]
        public object assign(Token<Issue165Lexer> id, object expr)
        {
            return null;
        }

        [Operation((int) Issue165Lexer.PLUS, Affix.InFix, Associativity.Right, 10)]
        [Operation("MINUS", Affix.InFix, Associativity.Left, 10)]
        public object BinaryTermExpression(object left, Token<Issue165Lexer> operation, object right)
        {
            return null;
        }

        [Operation((int) Issue165Lexer.TIMES, Affix.InFix, Associativity.Right, 50)]
        [Operation("DIVIDE", Affix.InFix, Associativity.Left, 50)]
        public object BinaryFactorExpression(double left, Token<Issue165Lexer> operation, object right)
        {
            return null;
        }



        [Production("arith_expr : Issue165Parser_expressions")]
        [Production("arith_expr : ternary_expr")]
        public object arith(object h)
        {
            return h;
        }

        [Production(
            "ternary_expr : Issue165Parser_expressions QUESTIONMARK [d] Issue165Parser_expressions COLON [d] Issue165Parser_expressions")]
        public object ternary(object j, object k, object l)
        {
            return null;
        }

        [Production("sqbr_expr : ID LBR [d] Issue165Parser_expressions RBR [d]")]
        public object sqrbrack(Token<Issue165Lexer> token, object expr)
        {
            return null;
        }

        [Production("group_expr : LPAR [d] Issue165Parser_expressions RPAR [d]")]
        public object group(object subexpr)
        {
            return subexpr;
        }

        [Operand]
        [Production("atom : ID")]
        public object id(Token<Issue165Lexer> i)
        {
            return null;
        }
        [Operand]
        [Production("atom : INT")]
        public object integer(Token<Issue165Lexer> i)
        {
            return i;
        }
        [Operand]
        [Production("atom : sqbr_expr")]
        public object sq(object i)
        {
            return i;
        }
        [Operand]
        [Production("atom : group_expr")]
        public object id(object g)
        {
            return g;
        }

b3b00 added a commit that referenced this issue Mar 15, 2020
@b3b00
Copy link
Owner

b3b00 commented Mar 16, 2020

Hello @CP3088 ,

I've tried to reproduce your issue using master branch on your repo :

           var parser = BuildParser();
            string source = "a = c[3] / (2 + 2)";
            Stopwatch sw = new Stopwatch();
            sw.Start();
            var res = parser.Parse(source,"var_def");
            sw.Stop();
            Console.WriteLine($"{res.IsOk} ! {sw.ElapsedMilliseconds} ms");

But parse fails with

unexpected "c" (ID) at line 0, column 4.

Could you tell me what do you use to test ?

Olivier

@CP3088
Copy link
Contributor Author

CP3088 commented Mar 17, 2020

Good morning @b3b00 ,

I have just now updated my repo. It is mostly simplified to demonstrate this specific issue.

|B|a = close[3] / (2 + 2)|E|

Parsing...
Done 29994ms

There are some ideas I have for your parser which I will attempt to test today, but please let me know if you can see the issue in this repo. When debugging csly, I find that some unnecessary looping might be occurring. Additionally, if we phase out System.Linq there will be considerable increases in performance for large grammars such as mine.

CP3088

@CP3088
Copy link
Contributor Author

CP3088 commented Mar 18, 2020

Hello,

Removing all my unused Generic Expressions brought the time down to 70ms even with the ternary and group rules.

//[Operation((int)PineToken.NOT, Affix.PreFix, Associativity.Left, 80)]
//[Operation((int)PineToken.PLUS, Affix.PreFix, Associativity.Left, 80)]
//[Operation((int)PineToken.MINUS, Affix.PreFix, Associativity.Left, 80)]

//[Operation((int)PineToken.MUL, Affix.InFix, Associativity.Left, 70)]
[Operation((int)PineToken.DIV, Affix.InFix, Associativity.Left, 70)]
//[Operation((int)PineToken.MOD, Affix.InFix, Associativity.Left, 70)]
[Operation((int)PineToken.PLUS, Affix.InFix, Associativity.Left, 60)]
//[Operation((int)PineToken.MINUS, Affix.InFix, Associativity.Left, 60)]

//[Operation((int)PineToken.GT, Affix.InFix, Associativity.Left, 50)]
//[Operation((int)PineToken.LT, Affix.InFix, Associativity.Left, 50)]
//[Operation((int)PineToken.GE, Affix.InFix, Associativity.Left, 50)]
//[Operation((int)PineToken.LE, Affix.InFix, Associativity.Left, 50)]
//[Operation((int)PineToken.OR, Affix.InFix, Associativity.Left, 20)]
//[Operation((int)PineToken.AND, Affix.InFix, Associativity.Left, 30)]
//[Operation((int)PineToken.EQ, Affix.InFix, Associativity.Left, 40)]
//[Operation((int)PineToken.NEQ, Affix.InFix, Associativity.Left, 40)]

The parser tries every one of these, and must exhaust an exponential number of possibilities, even though it doesn't contain the required token. We are talking O(n^2) or worse from my testing.

What do you think of a look ahead? The same way the parser will attempt to get PossibleLeadingTokens, it should maybe narrow that list down when the 2nd token of a rule is terminal and not optional?

CP3088

@b3b00
Copy link
Owner

b3b00 commented Mar 18, 2020

indeed it should solve the problem. A lookahead of two would be fine for your case, but maybe this could be 3 or even 4 for another.
I am quite busy these days so I have no time for CSLY. If you find yourself confident to propose a solution go ahead and send me a PR.

CP3088 added a commit to JQluv/csly that referenced this issue Apr 1, 2020
CP3088 added a commit to JQluv/csly that referenced this issue Apr 1, 2020
@CP3088
Copy link
Contributor Author

CP3088 commented Apr 14, 2020

Optimized how the expression generator works.

#170

@CP3088 CP3088 closed this as completed Apr 14, 2020
b3b00 added a commit that referenced this issue Apr 15, 2020
Optimized Expression Rules for #165
b3b00 added a commit that referenced this issue Apr 15, 2020
b3b00 added a commit that referenced this issue Apr 15, 2020
* 'dev' of github.com:b3b00/csly:
  Update README.md
  Update README.md
  #165 fix issues after @CP3088 pull-request
  Optimized Expression Rules for #165
  looking at #165
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants