Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementing a syntaxicTree for version 2.0 of ExpressionEvaluator #58

Open
codingseb opened this issue Jul 8, 2020 · 10 comments
Open
Assignees
Milestone

Comments

@codingseb
Copy link
Owner

codingseb commented Jul 8, 2020

I create here an issue to follow and document (first for myself but for who is interested) the evolution of the work to version 2.0

The goal is to deeply refactor ExpressionEvaluator to internally build a syntaxic tree with different kinds of tokens.
The evaluation will be executed in 2 phases.

  1. Parse the expression string and build the syntaxic tree.
  2. Execute the syntaxic tree.

This will allow a lot of stuffs that are not possible with the current "on the fly" way of evaluating things.

It should allow to :

  • Cache expressions to speed up evaluations when executed multiple times.
  • Managing better the evaluation flow.
    • To correct ConditionalAnd with variables #56 (Conditional and and Conditional always execute their right part) without breaking operators precedence
    • To correct Sum of decimal fail #65 (Problem to select the good method when multiple method with same name defined that take lambda as arguments)
  • Validate more things before executing and prevent partial executions
  • Make better self explained exceptions.

Risks and drawdowns

  • I'm not sure yet that all existing functionalities can work with this new way of doing things.
  • Will take time
  • A lot of tests exists to ensure that what is working now will still working after but there are not exaustive. So there is a high risk of breaking things. More tests are needed.
  • As the structure will change, the way that some of the existing functionalities works will change. It means that the corresponding documentation will need to be rewrite.

I will also decide how to split things between version 1 and 2.

  • Fork, Branch or continue in the same repo and same release flow
  • How to manage documentation differences.
  • Split code in multiple files, keep it in one for easy copy or make a precompile merge or something like this.
@TheoVC
Copy link

TheoVC commented Jul 21, 2021

I've been making a little investigation and found that this library make something "like" the version 2 of this library should make.
https://github.com/zzzprojects/System.Linq.Dynamic.Core
It has a parselamba method wich parse a string into an AST abtract syntactic tree. Using linq.Expressions. so you can compile at the end and get an amazing fast lambda wich make whatever you want.
This library can manage a Context (because can be strongly typed). however it doesn't have variables functionality, does not support ?. or ?? operators... and has a lot of other database functionality that is innecesary.
We should study it...
Tomorrow I'm gonna post a little demo that I did.

@TheoVC
Copy link

TheoVC commented Jul 21, 2021

This is the context

public class Person
{
    public string Name { get; set; }
    public decimal Value { get; set; }
}

those the usings

using System.Linq.Dynamic.Core;
using System.Linq.Expressions;
using System.Diagnostics;

and this is the main test

List<Person> MyList = new List<Person>();

for(int i= 0; i< 100000; i++)
{
    MyList.Add(new Person { Name = "Theo" + i, Value = i + 0.11m });
}

var testFunc = Test3();

Stopwatch Reloj = new Stopwatch();

Reloj.Start();

foreach (var p in MyList)
{
    var rtn = testFunc(p);
    if(rtn) Console.WriteLine(rtn + p.Name);
}

Reloj.Stop();

Console.WriteLine(Reloj.ElapsedMilliseconds + " milisecs");
public static Func<Person, decimal> Test2()
{
    var arbol = DynamicExpressionParser.ParseLambda<Person, decimal>(new ParsingConfig() { }, false, "Value + 3.3m");
    return arbol.Compile();
}
public static Func<Person, bool> Test3()
{
    var tree = DynamicExpressionParser.ParseLambda<Person, bool>(new ParsingConfig() { }, false, "Name.Trim() == \"Theo45\"");
    // at this point you have the expression tree (in Linq.Expressions) in tree and walk trought it.
    // at the end we compile
    return tree.Compile();
}

In my machine, I get 3 miliseconds for the 100,000 iterations.
making something similar with codingSeb it took 24000 miliseconds

@codingseb
Copy link
Owner Author

codingseb commented Jul 23, 2021

@TheoVC Thanks for your your investigation and your tests. Yes I think syntaxic tree or compilation are a big speed enhencement for multiple execution. Speed is a big drawdown of EE.
I will investigate a bit on how it is done. The issue I see for now, is that a lot of current functionalities available in current version of EE seems to be hard to implement with these kind of systems. On the fly evaluations events, syntax customization and script evaluations for example. I need to investigate more to find solutions to implement them.

@codingseb
Copy link
Owner Author

I investigated a bit in the wonderful world of System.Linq.Expressions.
What I see for now and reflect on

  • The gains in performance are enormous.
  • Some recurring EE Pitfalls are easier to manage or handle automatically (Method override selection, && and || right operand evaluation versus syntax exception)
  • Everything need to be typed and so a lot of casting need to be done compared the duck typing of the current version of EE.
  • So variables types must be defined at parsing time and can not change for each evaluation.
  • On the fly events equivalent should occurs at parsing time to define the type of the value to return and define a delegate to call at evaluation time.
  • A lot of options and syntax customization will applied at parsing time.
  • After compilation a lot of stuffs are rigid.
  • async await asked in [suggestion] Implement async / await #76 are not supported in System.Linq.Expressions for now (refs : stackoverflow, Microsoft)

@TheoVC
Copy link

TheoVC commented Aug 25, 2021

For investigation purposes... I have done a recursive descent parser based on the Richard Weeks series.
https://www.youtube.com/watch?v=_hYV5N_NAgk&list=PLHLYG7mk_iQloafxJ55Uqxc2n7m4NnNO7&index=12
and extended it to implement more complicated c# syntax. don't know if it would be usefull to you. I can send it to you if you wish.
as you said everything have to be strong typed, even variables, and of course the context.
so the constructor could receive the type of the context...

var evaluator = new Evaluator(typeof( PERSON ));

of maybe

var evaluator = new Evaluator<PERSONA>();

the type could be null if you don't want a context... and just use the variables feature

another feature that would be wonderfull is that the library not just evaluates an expression, but also produce an string explaining to the user what’s happening inside the expression.

Something like this:

For expression:

200.0 + variable1 + myFunction( variable2 ) + ( variable3 ? 100 : variable 4 )

produce an explanation like this:

200.0 + /* variable1 */ 200 + /*myFunction( [variable2] 5 )*/ 20 + ( /* variable3 */ true ? 100 : variable 4 )

if you remove the coments you can have something like this;

200.0 + 200 + 20 + ( true ? 100 : variable 4 )

So this string can be displayed by the application and easily the IT guy or even the user himself can deduce what it’s going wrong? or simply to know why ??? you get that result.

We already achieve this using codingseb but has a lot of manual parsing and it’s slow…
I did a lot of research and there's NO library that can do it.

@PhenX
Copy link
Contributor

PhenX commented Aug 30, 2021

Hello, if that's of any help, what do you think about this library ? https://github.com/davideicardi/DynamicExpresso
It looks like a good starting point.

@codingseb
Copy link
Owner Author

codingseb commented Aug 31, 2021

Hello, if that's of any help, what do you think about this library ? https://github.com/davideicardi/DynamicExpresso
It looks like a good starting point.

Yes it's a really good lib.

I created a new repo where I test some partial rewriting of EE with a ExpressionTree compilation.
It work already for small operation without all variables, methods and properties access and on the fly stuffs :
CSXPression.

I made a lot of copy paste modify operations from EE so it's not really a Lexer/ASTNode but I made a parser part that build a tokens tree from which I can get an expression tree and then compile to Func<object> delegate.

@codingseb
Copy link
Owner Author

codingseb commented Aug 31, 2021

I am just a bit stuck on how to adapt the variables and on the fly events (that were for me the big cool stuff of EE with syntax customization) without doing just the same as these already existing libraries.

@codingseb
Copy link
Owner Author

For investigation purposes... I have done a recursive descent parser based on the Richard Weeks series.
https://www.youtube.com/watch?v=_hYV5N_NAgk&list=PLHLYG7mk_iQloafxJ55Uqxc2n7m4NnNO7&index=12

Also very interesting stuff here thank for sharing this serie. I am watching it. And I am asking myself about the perfs of Lexer/ASTNodes char by char parsing versus my big regex way of parsing.

@lofcz
Copy link
Contributor

lofcz commented Sep 1, 2021

@codingseb one of the cleanest architectures of hand written parser is here - https://github.com/scriban/scriban
I would possibly use one of LL(k) parser generators not only for perf but also for ease of maintenance.
Currently I can't use EE due to perf so I put together a js-c# like language that transpiles to lua scripts and I execute them with Moonsharp (heavy clr usage makes up for slower vm compared to other lua implementations for c#).

Anyway for v2 our main goal should be performance as I think we can live without cool customization when such implementaion can't be then used in real world applications. Hence a good solution would imho be able to generate bytecode for either native c# or custom vm which would then interpret this simpler language on runtime. Lua vm has only ~50 bytecode opcodes.

Of almost the same importance is imho ability to properly sandbox the script evaluation as untrusted code will be interpreted in many use cases. This should include recursion depth limit, total instructions to interpret limit, no access to clr classes unless explicitly provided.

Last thing I would love to see is native async / await support. This is becoming more and more important with each new version of c#.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants