Issue 102: Prevent AccumulateWithJoinFilterNode from creating binding groups when t… #210

WilliamParker · 2016-08-10T12:09:36Z

…here are no corresponding elements

I removed the special 1-element case of mem/remove-first-of-each since it changes the ordering of members of the collection of items that were not removed. I believe the into call [1] is causing this. If we look at split-with [2] it uses take-while, which creates a lazy sequence. Into will append to the start of this sequence, so elements after the element removed will move to an earlier position. I don't see a reason why this special case should be more performant than the current implementation for an arbitrary number of elements to remove; if anyone sees such a reason please explain it. From looking through the history it looks like the general case was less efficient at one time. Being able to guarantee that the ordering is unchanged allows us to make some performance optimizations.

The optimizations I made were

In right-activate, accumulate the new elements on top of the result of the previous accumulation rather than re-accumulating from scratch.
In right-retract, don't do anything for a token other than update the memory if the number of elements filtered for that token is the same before and after the retraction i.e. if none of the retracted elements matched that token in the first place. If remove-first-of-each changes the ordering of elements then the ordering of elements in the memory will change, thus changing the accumulation order. We cannot have a state where the elements are ordered like [element1 element2] in the memory but the downstream token has [element2 element1], since future operations may attempt to retract something using the former ordering, but fail to actually retract anything since the downstream tokens have another ordering. Note that while rules need to accept any ordering the engine uses, accumulator results will not necessarily be equal (and in the case of acc/all they won't be).

I think these are valid, but would probably merit some thought from you too and anyone else looking at this PR.

Otherwise this is fairly similar to my changes for AccumulateNode, with some additional tests added to reflect AccumulateWithJoinFilterNode specific paths and previous tests for AccumulateNode parameterized to test both nodes. I'll note that I decided to refer to the join types rather than the node names in the parameterization, since on further thought referring to node types in the tests seemed like unnecessary brittleness of test comments and descriptions. I think my previous pattern of referring to the concrete node types was a mistake.

mrrodriguez · 2016-08-10T12:35:52Z

I don't see a reason why this special case should be more performant than the current implementation for an arbitrary number of elements to remove; if anyone sees such a reason please explain it. From looking through the history it looks like the general case was less efficient at one time.

+1 to fixing mem/remove-first-of-each. I worked on the last change to that area of code and I left the special case for size=1 just because it wasn't specifically what I was trying to fix at the time. The behavior of the size=1 case is sneaky that it can change the order and is likely to lead to other subtle issues if left around. Now the that the size>1 has been optimized anyways, there is really no great benefit to what the special case is doing.
Even with all that said, this function is barely used in the CLJ implementation anymore. It's biggest use-case that I see is in AccumulateWithJoinFilterNode so we should certainly be sure to make it useful for that node. :)

mrrodriguez · 2016-08-10T17:18:55Z

src/main/clojure/clara/rules/engine.cljc

+
+              ;; If no new elements matched the token, we don't need to do anything for this token
+              ;; since the final result is guaranteed to be the same.
+              :when (seq new-filtered-facts)


We can skip

previous-filtered-facts (filter-accum-facts join-filter-fn token previous-candidates)

as well if this is empty right? That could be a fairly significant savings too since sometimes the join filter can be an expensive operation and we may have a lot of previous-candidates. I didn't notice when looking through the logic you had previously.

(doseq [token matched-tokens :let [new-filtered-facts (filter-accum-facts join-filter-fn token candidates)] ;; If no new elements matched the token, we don't need to do anything for this token ;; since the final result is guaranteed to be the same. :when (seq new-filtered-facts) :let [previous-filtered-facts (filter-accum-facts join-filter-fn token previous-candidates) previous-accum-result-init <etc>]] <etc>)

Agreed, I have fixed this.

… groups when there are no corresponding elements

rbrush · 2016-08-11T16:02:33Z

src/main/clojure/clara/rules/engine.cljc

+      ;; ordering consistent allows us to skip the filter step on previous elements on right-activations.
+      (let [combined-candidates (into []
+                                      cat
+                                      [previous-candidates candidates])]


Just for my own education, have you found the use of the cat transducer here to be more efficient than "(into previous-candidates candidates)" ? (I have no objections to this approach but it surprised me a bit.)

The point of the into call into an empty collection here is actually to ensure that new items are always added to the end, rather than the beginning. This is important since while rules have to be OK with getting things in any order, both activations and retractions need to accumulate in the same order since accumulators don't have to return equal results for different orders. The acc/all accumulator is the most obvious example of this; the rules have to be OK with getting [FactA FactB] or [FactB FactA] but trying to retract a token that contains [FactA FactB] won't work if the tokens stored in memory for downstream nodes actually have [FactB FactA]. Getting this ordering consistent (assuming I'm thinking about things correctly) allows us to make the optimization on lines 1208-1216 of adding the new elements onto the previous accumulator result rather than fully accumulating twice.

Ah, makes sense. Thanks for the clarification.

Transducers are nice here to give us concat like behavior without the problem of stacking the lazyseqs unbounded depths where we get a stack overflow later. into is always dependent on the conj behavior of the target collection.

into would have been faster for the case of the target being a vector though since it would only have to copy the second collection onto the first. Either way, irrelevant here.

rbrush · 2016-08-11T16:05:41Z

This is really excellent work, @WilliamParker. I'm fine with merging it if you and @mrrodriguez feel it's ready.

WilliamParker · 2016-08-11T16:42:06Z

For my part, I think this is ready to merge.

mrrodriguez · 2016-08-11T18:20:27Z

+1 from me.

rbrush · 2016-08-11T18:27:34Z

Merged. Thanks!

WilliamParker force-pushed the issue102-AccumulateWithJoinFilterNode branch from 2d9d0c8 to 934a922 Compare August 10, 2016 12:11

WilliamParker changed the title ~~Issue 102: Prevent AccumulateNode from creating binding groups when t…~~ Issue 102: Prevent AccumulateWithJoinFilterNode from creating binding groups when t… Aug 10, 2016

mrrodriguez reviewed Aug 10, 2016
View reviewed changes

Issue 102: Prevent AccumulateWithJoinFilterNode from creating binding…

e625710

… groups when there are no corresponding elements

WilliamParker force-pushed the issue102-AccumulateWithJoinFilterNode branch from 934a922 to e625710 Compare August 11, 2016 09:03

rbrush reviewed Aug 11, 2016
View reviewed changes

rbrush merged commit 1fdf9d6 into oracle-samples:master Aug 11, 2016

This was referenced Aug 12, 2016

Accumulators with initial value may fire with unbound internal values. #102

Closed

Should AccumulateNode use the cat transducer instead of simple into call? #211

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Issue 102: Prevent AccumulateWithJoinFilterNode from creating binding groups when t… #210

Issue 102: Prevent AccumulateWithJoinFilterNode from creating binding groups when t… #210

WilliamParker commented Aug 10, 2016

mrrodriguez commented Aug 10, 2016

mrrodriguez Aug 10, 2016

WilliamParker Aug 11, 2016

rbrush Aug 11, 2016

WilliamParker Aug 11, 2016

rbrush Aug 11, 2016

mrrodriguez Aug 11, 2016

rbrush commented Aug 11, 2016

WilliamParker commented Aug 11, 2016

mrrodriguez commented Aug 11, 2016

rbrush commented Aug 11, 2016

Issue 102: Prevent AccumulateWithJoinFilterNode from creating binding groups when t… #210

Issue 102: Prevent AccumulateWithJoinFilterNode from creating binding groups when t… #210

Conversation

WilliamParker commented Aug 10, 2016

mrrodriguez commented Aug 10, 2016

mrrodriguez Aug 10, 2016

Choose a reason for hiding this comment

WilliamParker Aug 11, 2016

Choose a reason for hiding this comment

rbrush Aug 11, 2016

Choose a reason for hiding this comment

WilliamParker Aug 11, 2016

Choose a reason for hiding this comment

rbrush Aug 11, 2016

Choose a reason for hiding this comment

mrrodriguez Aug 11, 2016

Choose a reason for hiding this comment

rbrush commented Aug 11, 2016

WilliamParker commented Aug 11, 2016

mrrodriguez commented Aug 11, 2016

rbrush commented Aug 11, 2016