New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Defer #advanceExact on expression dependencies until their values are needed #12560
Conversation
… needed This extends the idea in GH#11878 to avoid advancing dependencies that are never referenced because of expression branching (i.e., ternary expressions).
if (currentDoc == doc) { | ||
return true; | ||
} | ||
for (DoubleValues v : functionValues) { | ||
v.advanceExact(doc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused... Isn't this calling advanceExact
more eagerly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this layer, yes. But all these DoubleValue instances come from ExpressionValueSource#zeroWhenUnpositioned
(see the setup done in ExpressionValueSource#getValues
). In this PR, I'm proposing pushing down the lazy advancing into that layer. Does this make sense? We could keep the lazy advance logic up here as well, but I didn't think it was really necessary since the delegation to the underlying advancing is now lazy...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, but still a little confused about why this is supposed to be better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you have an expression like a == 0 ? b : c
where a
, b
and c
are all other references in your bindings, only b
or c
need to be evaluated for any given doc, not both. But today, we greedily call advanceExact
on all three up-front, for every doc. So we're doing wasted advancing work for one of b
or c
for every single doc. I added a similar example in a test case.
Depending on what the advancing is doing, this could come with significant cost (e.g., if b
or c
is another complex expression tree that ends up propagating the advance down all of its dependencies)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thank you for explaining, makes sense to me!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still a bit confused - it seems like we're pushing things around, but I know you have some compelling-looking results so it must be OK? But I think it would be great if you could distill whatever knowledge you've garnered into a comment here. The missing link for me is -- what is ExpressionFunctionValues representing? Is that used for functions only (like max()
) or generally for any expression? I guess if it's only for functions then it makes sense that we are now getting lazy advance for other expressions that might not need all their subs. But then have we lost some laziness for functions? Does this work for short-circuited booleans to? Like (1 == 1) || expensive()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a bit of indirection here, so maybe this will help:
ExpressionFunctionValues
represents a computed expression value for a given doc. This comes from any instance ofExpression
(e.g., fromJavascriptCompiler#compile
).Expression#getDoubleValuesSource
provides an instance ofExpressionValueSource
, which in turn providesExpressionFunctionValues
when you call#getValues
for a specific segment context.ExpressionFunctionValues#advanceExact
always returns true (expressions produce results for all docs), andExpressionFunctionValues#doubleValue
computes the value for the doc.- In
ExpressionValueSource#getValues
, we take all the expression's arguments (which are all instances ofDoubleValues
) and wrap them with an anonymous class provided by#zeroWhenUnpositioned
. The main purpose of this wrapper is to provide a value of0
for any expression argument that isn't available for a doc (i.e.,advanceExact
returned false for the doc). This PR extends the functionality of this wrapper to lazily advance until the point whendoubleValues
is called. - So without this PR,
ExpressionFunctionValues#advanceExact
is lazy but onceExpressionFunctionValues#doubleValues
is called, it advances all of the arguments (regardless of whether-or-not that argument is ever needed for a given doc). With this change, theadvanceExact
also becomes lazy, so it gets deferred until a value is actually needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UPDATE: I brought back the lazy ExpressionFunctionValues#advanceExact
logic because it created confusion and isn't necessary to remove. It doesn't hurt to keep this lazy logic in place. I had removed it (making it greedy again) as an attempt to simplify, but since it created confusion, I figured I'd bring it back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I also added another test-case to show that this laziness also works for condition short-circuiting (like you asked about @msokolov). And just to clarify, this is for all expressions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you, nice!
if (currentDoc == doc) { | ||
return true; | ||
} | ||
for (DoubleValues v : functionValues) { | ||
v.advanceExact(doc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thank you for explaining, makes sense to me!
public boolean advanceExact(int doc) throws IOException { | ||
return positioned = in.advanceExact(doc); | ||
public boolean advanceExact(int doc) { | ||
currentDoc = doc; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there any sense in checking if we are advancing to the same doc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion. I'll include that logic.
if (currentDoc == doc) { | ||
return true; | ||
} | ||
for (DoubleValues v : functionValues) { | ||
v.advanceExact(doc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still a bit confused - it seems like we're pushing things around, but I know you have some compelling-looking results so it must be OK? But I think it would be great if you could distill whatever knowledge you've garnered into a comment here. The missing link for me is -- what is ExpressionFunctionValues representing? Is that used for functions only (like max()
) or generally for any expression? I guess if it's only for functions then it makes sense that we are now getting lazy advance for other expressions that might not need all their subs. But then have we lost some laziness for functions? Does this work for short-circuited booleans to? Like (1 == 1) || expensive()
?
…n and maybe for slight additional benefit
Circling back on this: For Amazon's Product Search engine, we make fairly heavy use of these expression implementations. I pulled this change into our Lucene fork early (currently on 9.7) and ran internal benchmarks we have, and this produced a ~23% redline QPS improvement. Milage may vary of course, but the impact was significant so others may find a nice win as well (if heavy expression users). |
Description
This extends the idea in GH#11878 to avoid advancing dependencies that are never referenced because of expression branching (i.e., ternary expressions). I think we should be able to be a little more optimal in terms of not unnecessarily advancing dependent double values.