Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New math parser #956

Merged
merged 5 commits into from
Oct 31, 2023
Merged

New math parser #956

merged 5 commits into from
Oct 31, 2023

Conversation

adamchalmers
Copy link
Collaborator

@adamchalmers adamchalmers commented Oct 31, 2023

Performance

The math parser benchmark now runs in only 3.37% of its previous time (30x speedup). Here's the criterion data from parsing math.kcl:

// Main
time:   [54.612 ms 54.651 ms 54.699 ms]   
// This PR
time:   [1.8411 ms 1.8420 ms 1.8430 ms]
change: [-96.630% -96.626% -96.622%] (p = 0.00 < 0.05)

Algorithmic differences

The previous algorithm received as input a vector of KCL syntax tokens. This meant the previous math parser had to identify which tokens were operands and which were operators. For example, the input "a[0] + (b[0] + c)" has to be parsed to figure out that a[0] is an operand, and to convert the flat vec of tokens into a nested structure (by understanding the parentheses around b[0] + c).

This algorithm is a little different. It receives a vec of operators and operands -- not tokens. The previous parser step in Winnow already parsed the KCL syntax tokens into operators and operands, so there's no need for the math parser to redo this work. The Winnow parser also started building a tree of binary expressions, and it understands nested brackets.

So, my new parser has a lot less work to do (because it reuses previous Winnow steps). My parser really just needs to understand precedence and associativity, and convert the flat vec of operator/operands into a binary expression tree. Because some of these operands are already binary expressions, this is kind of combining expression trees into one big tree.

As a result, I use a simplified version of the "shunting yard" algorithm for converting infix to postfix. This algorithm doesn't have to worry about open/close parentheses (those parentheses have already been converted into expression trees by previous Winnow parsing step). This applies precedence and associativity.

Then I run the evaluate(rpn) function to convert the postfix tokens into a binary expression tree. You just read operands/operators off the postfix list. You push operands into a stack, and if you get an operator, you pop its left and right subtree from the stack, make a new tree, and push that tree onto the stack.

Also, my algorithm iterates over a stack rather than using recursion, which explains at least some performance improvement (though not much compared to reusing the winnow operand/operator parsing). It should reduce the risk of stack overflows.

Copy link

vercel bot commented Oct 31, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Updated (UTC)
modeling-app ✅ Ready (Inspect) Visit Preview Oct 31, 2023 7:04pm

@adamchalmers adamchalmers force-pushed the achalmers/new-math-parser branch 2 times, most recently from 37c9c48 to fb347b2 Compare October 31, 2023 18:10
@adamchalmers adamchalmers marked this pull request as ready for review October 31, 2023 18:12
@atlantis-for-zoo
Copy link

Error: This repo is not allowlisted for Atlantis.

@@ -50,6 +50,9 @@ engine = []
panic = "abort"
debug = true

[profile.bench]
debug = true
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flamegraphs of the benchmarks need this for accurately knowing the names of functions

@@ -2175,6 +2175,7 @@ impl BinaryExpression {
BinaryOperator::Mul => (left * right).into(),
BinaryOperator::Div => (left / right).into(),
BinaryOperator::Mod => (left % right).into(),
BinaryOperator::Pow => (left.powf(right)).into(),
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added solely so I could test the new math parser with examples from wikipedia

Copy link
Collaborator

@Irev-Dev Irev-Dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, just had one question about some commented stuff, just wanted to make sure they weren't forgotten about?

src/wasm-lib/kcl/src/parser/parser_impl.rs Outdated Show resolved Hide resolved
@adamchalmers adamchalmers merged commit 023c3cb into main Oct 31, 2023
14 checks passed
@adamchalmers adamchalmers deleted the achalmers/new-math-parser branch October 31, 2023 19:16
Copy link
Collaborator

@paultag paultag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, this is a definite improvement

@@ -2257,13 +2258,46 @@ pub enum BinaryOperator {
#[serde(rename = "%")]
#[display("%")]
Mod,
/// Raise a number to a power.
#[serde(rename = "^")]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pow being ^ does make sense from a math point of view (specifically those coming from LaTeX, etc), but Python, Rust, C, Go, etc use ** for exponentiation. We may want to document this pretty heavily and/or come out with a "We don't do bitwise maths" which may help avoid confusion since XOR wouldn't ever be present.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think given our audience (mech eng) and that we never manipulate network etc, we're probably not going to support bitwise operations.

pub fn associativity(&self) -> Associativity {
match self {
Self::Add | Self::Sub | Self::Mul | Self::Div | Self::Mod => Associativity::Left,
Self::Pow => Associativity::Right,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. I've seen a lot of naive parsers that get X^Y^Z wrong.

}
operator_stack.push(o1);
}
o @ BinaryExpressionToken::Operand(_) => output.push(o),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL about the match @ operator, this is neat.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants