
Issue 102: Prevent AccumulateWithJoinFilterNode from creating binding groups when t… #210

Conversation

WilliamParker
Collaborator

…here are no corresponding elements

I removed the special 1-element case of mem/remove-first-of-each since it changes the ordering of the members of the collection of items that were not removed. I believe the into call [1] is the cause. If we look at split-with [2], it uses take-while [3], which creates a lazy sequence. into conjoins onto the front of a sequence, so elements that came after the removed element are moved to an earlier position. I don't see a reason why this special case should be more performant than the current implementation for an arbitrary number of elements to remove; if anyone sees such a reason, please explain it. From looking through the history, it appears the general case was less efficient at one time. Being able to guarantee that the ordering is unchanged allows us to make some performance optimizations.
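
For illustration, here is a minimal REPL sketch (not the actual mem/remove-first-of-each code) of the reordering:

;; Remove :b from the collection, keeping the other items.
(let [items [:a :b :c :d]
      [before [_removed & after]] (split-with #(not= :b %) items)]
  ;; `before` is the lazy seq (:a). `into` conj's each element of `after`
  ;; onto it, and conj on a seq prepends, so the items that followed :b
  ;; now come first.
  (into before after))
;; => (:d :c :a), though the surviving items were originally (:a :c :d)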

The optimizations I made were:

  • In right-activate, accumulate the new elements on top of the result of the previous accumulation rather than re-accumulating from scratch.
  • In right-retract, don't do anything for a token other than update the memory if the number of elements filtered for that token is the same before and after the retraction, i.e. if none of the retracted elements matched that token in the first place. If remove-first-of-each changes the ordering of elements, then the ordering of elements in the memory will change, thus changing the accumulation order. We cannot have a state where the elements are ordered like [element1 element2] in the memory but the downstream token has [element2 element1]: future operations may attempt to retract something using the former ordering, but fail to actually retract anything since the downstream tokens have another ordering. Note that while rules need to accept any ordering the engine uses, accumulator results will not necessarily be equal across orderings (and in the case of acc/all they won't be). A sketch of this scenario follows the list.
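
To make the retraction concern concrete, here is a minimal sketch (hypothetical fact keywords, with conj standing in for an acc/all-style reduce function):

;; right-activate folds only the new elements onto the previous result:
(def previous-result [:element1])
(def new-result (reduce conj previous-result [:element2]))
;; => [:element1 :element2]

;; A later retraction rebuilds the downstream token from the elements in
;; memory. If the memory were reordered to [:element2 :element1], the engine
;; would try to retract a token holding [:element2 :element1] while the
;; downstream memory holds [:element1 :element2], and the retraction would
;; silently fail to remove anything.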

I think these are valid, but would probably merit some thought from you too and anyone else looking at this PR.

Otherwise this is fairly similar to my changes for AccumulateNode, with some additional tests added to cover AccumulateWithJoinFilterNode-specific paths, and previous tests for AccumulateNode parameterized to test both nodes. I'll note that I decided to refer to the join types rather than the node names in the parameterization; on further thought, referring to node types in the tests seemed to introduce unnecessary brittleness into test comments and descriptions. I think my previous pattern of referring to the concrete node types was a mistake.

  1. https://github.com/rbrush/clara-rules/blob/0.11.1/src/main/clojure/clara/rules/memory.cljc#L125
  2. https://github.com/clojure/clojure/blob/clojure-1.8.0/src/clj/clojure/core.clj#L2888
  3. https://github.com/clojure/clojure/blob/clojure-1.8.0/src/clj/clojure/core.clj#L2795

@WilliamParker force-pushed the issue102-AccumulateWithJoinFilterNode branch from 2d9d0c8 to 934a922 on August 10, 2016
@WilliamParker changed the title from "Issue 102: Prevent AccumulateNode from creating binding groups when t…" to "Issue 102: Prevent AccumulateWithJoinFilterNode from creating binding groups when t…" on Aug 10, 2016
@mrrodriguez
Collaborator

I don't see a reason why this special case should be more performant than the current implementation for an arbitrary number of elements to remove; if anyone sees such a reason please explain it. From looking through the history it looks like the general case was less efficient at one time.

+1 to fixing mem/remove-first-of-each. I worked on the last change to that area of code and left the special case for size=1 only because it wasn't specifically what I was trying to fix at the time. The size=1 case is sneaky in that it can change the order, and it is likely to lead to other subtle issues if left around. Now that the size>1 case has been optimized anyway, there is really no great benefit to what the special case is doing.
Even with all that said, this function is barely used in the CLJ implementation anymore. Its biggest use case that I see is in AccumulateWithJoinFilterNode, so we should certainly be sure to make it useful for that node. :)


;; If no new elements matched the token, we don't need to do anything for this token
;; since the final result is guaranteed to be the same.
:when (seq new-filtered-facts)
Collaborator

We can skip

previous-filtered-facts (filter-accum-facts join-filter-fn token previous-candidates)

as well if this is empty, right? That could be a fairly significant savings too, since the join filter can sometimes be an expensive operation and we may have a lot of previous-candidates. I didn't notice this when looking through the logic previously.

(doseq [token matched-tokens
        :let [new-filtered-facts (filter-accum-facts join-filter-fn token candidates)]
        ;; If no new elements matched the token, we don't need to do anything for this token
        ;; since the final result is guaranteed to be the same.
        :when (seq new-filtered-facts)
        :let [previous-filtered-facts (filter-accum-facts join-filter-fn token previous-candidates)
              previous-accum-result-init <etc>]]
  <etc>)

Collaborator Author

Agreed, I have fixed this.

… groups when there are no corresponding elements
@WilliamParker WilliamParker force-pushed the issue102-AccumulateWithJoinFilterNode branch from 934a922 to e625710 Compare August 11, 2016 09:03
;; ordering consistent allows us to skip the filter step on previous elements on right-activations.
(let [combined-candidates (into []
                                cat
                                [previous-candidates candidates])]
Contributor

Just for my own education, have you found the use of the cat transducer here to be more efficient than (into previous-candidates candidates)? (I have no objections to this approach, but it surprised me a bit.)

Collaborator Author

The point of the into call onto an empty collection here is actually to ensure that new items are always added to the end rather than the beginning. This is important since, while rules have to be OK with getting things in any order, both activations and retractions need to accumulate in the same order, because accumulators don't have to return equal results for different orders. The acc/all accumulator is the most obvious example of this: the rules have to be OK with getting [FactA FactB] or [FactB FactA], but trying to retract a token that contains [FactA FactB] won't work if the tokens stored in memory for downstream nodes actually have [FactB FactA]. Getting this ordering consistent (assuming I'm thinking about things correctly) allows us to make the optimization on lines 1208-1216 of adding the new elements onto the previous accumulator result rather than fully accumulating twice.
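
For example (a sketch, not the code in this PR), if the previous candidates happened to be stored as a seq, plain into would put the new items at the front, while into an empty vector with cat always appends:

(into '(:a :b) [:c :d])           ;; => (:d :c :a :b) -- new items first
(into [] cat ['(:a :b) [:c :d]])  ;; => [:a :b :c :d]  -- order preserved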

Contributor

Ah, makes sense. Thanks for the clarification.

Collaborator

Transducers are nice here: they give us concat-like behavior without the problem of stacking lazy seqs to unbounded depths, where we would get a stack overflow later. into is always dependent on the conj behavior of the target collection.

into would have been faster for the case of the target being a vector, though, since it would only have to copy the second collection onto the first. Either way, it's irrelevant here.
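
A quick, illustrative sketch of the trade-offs:

;; Stacking lazy concats to unbounded depth can overflow the stack when
;; the result is finally realized:
(comment
  (first (reduce concat [] (repeat 100000 [1]))))  ; may throw StackOverflowError

;; into onto a vector copies the second collection onto the end of the first:
(into [1 2] [3 4])                ;; => [1 2 3 4]

;; The cat transducer concatenates eagerly into the target, avoiding the
;; nested lazy seqs entirely:
(into [] cat [[1 2] [3 4]])       ;; => [1 2 3 4]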

@rbrush
Contributor

rbrush commented Aug 11, 2016

This is really excellent work, @WilliamParker. I'm fine with merging it if you and @mrrodriguez feel it's ready.

@WilliamParker
Collaborator Author

For my part, I think this is ready to merge.

@mrrodriguez
Collaborator

+1 from me.

@rbrush merged commit 1fdf9d6 into oracle-samples:master on Aug 11, 2016
@rbrush
Contributor

rbrush commented Aug 11, 2016

Merged. Thanks!
