Issue 102: Prevent AccumulateWithJoinFilterNode from creating binding groups when there are no corresponding elements #210
Conversation
Force-pushed from 2d9d0c8 to 934a922
+1 to fixing this.
;; If no new elements matched the token, we don't need to do anything for this token
;; since the final result is guaranteed to be the same.
:when (seq new-filtered-facts)
We can skip the
previous-filtered-facts (filter-accum-facts join-filter-fn token previous-candidates)
binding as well if this is empty, right? That could be a fairly significant savings too, since the join filter can be an expensive operation and we may have a lot of previous-candidates. I didn't notice this when looking through the logic previously. Something like:
(doseq [token matched-tokens
:let [new-filtered-facts (filter-accum-facts join-filter-fn token candidates)]
;; If no new elements matched the token, we don't need to do anything for this token
;; since the final result is guaranteed to be the same.
:when (seq new-filtered-facts)
:let [previous-filtered-facts (filter-accum-facts join-filter-fn token previous-candidates)
previous-accum-result-init <etc>]]
<etc>)
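For reference, doseq evaluates its binding modifiers in order, so a :when placed before a :let skips that :let entirely when the test fails. A minimal standalone demo with throwaway values:

(doseq [x [[] [:a]]
        :when (seq x)      ; (seq []) => nil, so the empty case is skipped
        :let  [y (vec x)]] ; this binding only runs when :when passes
  (println y))             ; prints [:a] once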
Agreed, I have fixed this.
Force-pushed from 934a922 to e625710 (…groups when there are no corresponding elements)
;; ordering consistent allows us to skip the filter step on previous elements on right-activations.
(let [combined-candidates (into []
                                cat
                                [previous-candidates candidates])]
Just for my own education, have you found the use of the cat transducer here to be more efficient than (into previous-candidates candidates)? (I have no objections to this approach, but it surprised me a bit.)
The point of the into call into an empty collection here is actually to ensure that new items are always added to the end, rather than the beginning. This is important since, while rules have to be OK with getting things in any order, activations and retractions both need to accumulate in the same order, because accumulators don't have to return equal results for different orderings.

The acc/all accumulator is the most obvious example of this: the rules have to be OK with getting [FactA FactB] or [FactB FactA], but trying to retract a token that contains [FactA FactB] won't work if the tokens stored in memory for downstream nodes actually contain [FactB FactA].

Getting this ordering consistent (assuming I'm thinking about things correctly) allows us to make the optimization on lines 1208-1216 of adding the new elements onto the previous accumulator result rather than fully accumulating twice.
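A minimal REPL sketch of the point, using keywords as stand-ins for facts (all names here are illustrative only):

;; Structural equality is order-sensitive, so an accumulated result of
;; [FactA FactB] won't match a stored token holding [FactB FactA]:
(= [:fact-a :fact-b] [:fact-b :fact-a]) ;=> false

;; (into [] cat colls) concatenates eagerly and always appends the new
;; candidates after the previous ones, keeping the ordering stable:
(into [] cat [[:prev-1 :prev-2] [:new-1]]) ;=> [:prev-1 :prev-2 :new-1]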
Ah, makes sense. Thanks for the clarification.
Transducers are nice here to give us concat-like behavior without the problem of stacking lazy seqs to unbounded depths, where we get a stack overflow later. into is always dependent on the conj behavior of the target collection. into would have been faster for the case of the target being a vector, though, since it would only have to copy the second collection onto the first. Either way, it's irrelevant here.
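A quick REPL illustration of the conj dependence and the lazy-concat hazard (values are arbitrary):

;; into uses the target collection's conj, so result order depends on its type:
(into [1 2] [3 4])  ;=> [1 2 3 4]  (vectors conj at the end)
(into '(1 2) [3 4]) ;=> (4 3 1 2)  (lists conj at the front)

;; (into [] cat [xs ys]) always yields a vector with ys appended after xs,
;; whatever collection types xs and ys are:
(into [] cat ['(1 2) [3 4]]) ;=> [1 2 3 4]

;; Deeply nested lazy concats, by contrast, can blow the stack on realization:
;; (first (reduce concat (repeat 100000 [1]))) ; => StackOverflowError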
This is really excellent work, @WilliamParker. I'm fine with merging it if you and @mrrodriguez feel it's ready.
For my part, I think this is ready to merge.
+1 from me.
Merged. Thanks!
I removed the special 1-element case of mem/remove-first-of-each since it changes the ordering of the items that were not removed. I believe the into call [1] is causing this: if we look at split-with [2], it uses take-while, which creates a lazy sequence, and into adds to the front of a sequence (it relies on conj), so elements that came after the removed element can move to an earlier position. I don't see a reason why this special case should be more performant than the current implementation for an arbitrary number of elements to remove; if anyone sees such a reason, please explain it. From looking through the history, it looks like the general case was less efficient at one time. Being able to guarantee that the ordering is unchanged allows us to make some performance optimizations.
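To make the reordering concrete, a hypothetical sketch (the removed special case isn't shown above, so this only demonstrates how rejoining split-with halves via into can reorder the surviving elements):

;; Removing :b from [:a :b :c :d] by splitting and rejoining with into:
(let [[before [_ & after]] (split-with #(not= % :b) [:a :b :c :d])]
  ;; into conj'es onto the front of the lazy `before` seq, so the
  ;; survivors come back as (:d :c :a) instead of (:a :c :d).
  (into before after))
;;=> (:d :c :a)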
The optimizations I made were:
- Skipping all work for a token on right-activation when no new elements pass the join filter for it, since the final result is guaranteed to be the same.
- Adding the new elements onto the previous accumulator result rather than fully accumulating twice, which depends on the element ordering staying consistent.

I think these are valid, but they probably merit some thought from you and anyone else looking at this PR.
Otherwise this is fairly similar to my changes for AccumulateNode, with some additional tests added to cover AccumulateWithJoinFilterNode-specific paths, and the previous AccumulateNode tests parameterized to test both nodes. I'll note that I decided to refer to the join types rather than the node names in the parameterization: on further thought, referring to concrete node types in test comments and descriptions seemed like unnecessary brittleness, and I think my previous pattern of doing so was a mistake.