Rethinking AccumulateNode #190

Closed
mrrodriguez opened this issue May 12, 2016 · 14 comments
@mrrodriguez
Collaborator

I'd like to propose a fairly major shift in the AccumulateNode for a few reasons. This relates to the old topic of #127, but there is more to the overall story now.

I apologize for this being long-winded, but I would like to get some feedback on these points. I have toyed with a few implementation ideas at this point, but I don't want to go too deep into that without discussing it some more first.


The AccumulateWithJoinFilterNode has very different operational semantics from AccumulateNode. This can cause some interesting and surprising performance differences between the two, among other issues we may have around propagating nil, retracting the correct bindings, etc. On the binding-retraction part, the comment made at #188 (comment) is of interest, because it is actually not possible (?) to know if we have retracted the "last item" from an AccumulateNode. We can detect this in an AccumulateWithJoinFilterNode, however.

So my basic proposal here is this:

(1) Implement the default :retract-fn logic directly in the engine.
(2) Make the :combine-fn an optional feature of accumulators. If the caller does not want to provide a :combine-fn, and does not want their :reduce-fn to double as a combiner (either because it cannot be one, or because they don't want those semantics), they should be able to opt out. Actually, I'd prefer that the caller opt in to combine-function logic at this point, which I'm going to get into below.

Firstly, why do I think that (1) is a good idea?

Performance

(a)

The wrapper implementation for :retract-fn that Clara uses with the built-in accumulators, and that consumers may choose to use as well, can have very bad performance characteristics. A large factor in Clara's efficiency is that the engine performs batched propagation and batched retraction of facts. The wrapper implementation of :retract-fn only has visibility into a single fact being retracted at a time, even though the engine is aware of some N facts to be retracted at any given time. Our default :retract-fn does an eager re-reduce over the items left after removing a single fact at a time, which ends up requiring N^2 operations to remove N batched retractions within the engine. This can quickly degrade performance in heavy-data, large-Rete-network scenarios where retraction can be much more common due to truth-maintenance changes. Overall, we want the engine to be robust in these sorts of situations.
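To make the quadratic cost concrete, here is a minimal sketch of the wrapper-style retraction described above. This is not Clara's actual wrapper; the function names are illustrative only:

```clojure
;; Hypothetical sketch of a wrapper-style :retract-fn. Each retraction
;; re-reduces all remaining items from the :initial-value, so retracting
;; N facts one at a time costs O(N^2) reduce steps in total.
(defn remove-one
  "Removes a single occurrence of fact from items."
  [items fact]
  (let [[before after] (split-with #(not= % fact) items)]
    (concat before (rest after))))

(defn wrapper-retract
  [reduce-fn initial-value items-seen fact]
  (let [remaining (remove-one items-seen fact)]
    {:items remaining
     ;; the expensive part: a full re-reduce for every retracted fact
     :value (reduce reduce-fn initial-value remaining)}))
```

A batch-aware default in the engine could instead drop all N retracted facts from its memory at once and re-reduce a single time, an O(N) operation.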

We could try to gloss over this issue again by making our retraction wrapper "smarter". It could batch up the facts being retracted itself, then "lazily" figure out the reduced value only when it is needed later, i.e. when the next :reduce-fn or the :convert-return-fn is called.

  • This complicates the wrapper more and causes operations to happen at weird times from a consumer perspective.
  • I consider this sort of a bandaid anyways. I also have more reasons for my (1) proposal.

(b)

This part of the discussion leaks somewhat into my point (2) above, but it does apply here. A goal of our accumulator logic, at least in our most typical use cases, should be to be as lazy as possible about evaluating the accumulator. Under the current implementation of AccumulateWithJoinFilterNode, the accumulation is not done until the left-activate or left-retract of tokens. This is a good thing: the accumulator will not perform accumulation on elements when there are not any (and may never be any) matching tokens upstream of it. This adds a slight edge of "laziness" to the AccumulateWithJoinFilterNode.

In the AccumulateNode, however, the situation is quite different. The AccumulateNode has no real reason to wait to accumulate, since all tokens given in left-activate and left-retract will always match all elements given in right-activate and right-retract. So the AccumulateNode immediately starts accumulating (and perhaps combining with previous accumulations) as right activations happen.
The result is that the AccumulateNode is more eager than it needs to be. There may be some cases where it would make sense to reduce early, e.g. if the elements would otherwise have to be stored in some slower-access memory, but I'm not sure this is ever the case in practice. I think we should tailor to the predominant pattern now and offer that sort of option later.
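As a rough illustration of the lazier strategy (a sketch only, not the engine's real memory model; the function names are invented here), right activations could merely record elements, deferring the reduce until a token actually arrives on the left:

```clojure
;; Sketch: accumulate lazily. right-activate only stores the elements;
;; the reduce is deferred until a left activation actually needs the value.
(defn right-activate [node-memory elements]
  (update node-memory :elements into elements))

(defn left-activate [node-memory {:keys [reduce-fn initial-value]}]
  ;; pay the accumulation cost only now, when a token has matched
  (reduce reduce-fn initial-value (:elements node-memory)))
```

If no token ever arrives on the left, the reduce is never performed at all.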

Correct semantics

There are some edge cases we discussed recently around the different semantics of retracting the last item and arriving back at the :initial-value vs retracting an item and getting nil. I also think there are some issues in understanding what the bindings should be when propagating, when we aren't sure whether we have "come back" to the :initial-value or not. This issue is due to the AccumulateNode not retaining memory of the items it has seen so far.

Overall

I know the AccumulateNode is set up to be capable of doing some sort of "streaming" operation, but I think anyone who relies on this pattern right now is relying on implementation details of the engine that are not checked very well. If I had a rule that could effectively be streamed, and I needed to be sure never to retain all elements in memory at once (say, on a Storm cluster node), I could have a rule like:

(defrule my-rule
  [?f <- FactA]
  [?streaming-result <- find-streaming-result :from [FactB (= v (:v ?f))]]
  =>
  <etc>)

This would be "streamable", with the caveat that you may not be able to rely on truth maintenance unless you could write a streaming-style :retract-fn that didn't need to remember all items ever seen. However, if you relied on this and later couldn't use = anymore because the join criteria became more complex, you'd have:

(defrule my-rule
  [?f <- FactA]
  [?streaming-result <- find-streaming-result :from [FactB (some-test-fn v (:v ?f))]]
  =>
  <etc>)

Now your streaming would be lost. The node would now be an AccumulateWithJoinFilterNode, and you'd run out of memory or hit whatever else you were trying to avoid. The engine would not make it at all clear what caused this.

My proposal to support a streaming mode like this (in the future) would be to make sure we don't corner ourselves out of it, but instead make it a first-class feature of an accumulate node. We could probably even mark accumulators as "streamable" when they can support this feature, and then fail in the engine if we are in "streaming mode" and "streamable" accumulators are used in places with join filters where streaming won't work.

It is hard to tailor to both this "streaming" case and our typical case in a way that makes sense for both. However, I'm now thinking: why do we need to? More options to mk-session, or something similar, seem completely reasonable to me for when the caller has different sorts of use cases.

So why make the :combine-fn optional i.e. (2)?

Lazy evaluation

(a)

If the accumulator becomes more lazy in that it only attempts to accumulate when there are left activations/retractions, then it doesn't necessarily make sense to use the :combine-fn as currently used.

(b)

If we wanted to use parallelism via something like clojure.core.reducers in the future, again I think this is similar to "streaming" mode. This could be explicitly turned on for accumulators that desired it.

Performance

I've seen some real-world examples now where it may be more difficult and less efficient to write a :combine-fn than to just use :reduce-fn calls alone. Even if we reduced at some earlier point and new facts then arrive, we can just keep reducing into our current reduction.
An example of this is when the :reduce-fn is building a more complex data structure, like a map with nested maps. It can be fairly difficult to write a performant :combine-fn in these cases, when it would be cheaper to just keep adding onto the accumulating map with :reduce-fn calls. This also comes up with the use of transient data structures as the return value of the :reduce-fn. The transients may not be easy to combine without persisting one of them and dumping it into the other, and it is wasteful to copy from one to another like that.
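For example, with a transient map as the reduced value (an illustrative sketch; the map shape and helper names are invented here), continuing to :reduce-fn into a single transient is cheap, while a :combine-fn is forced to persist one side and copy every entry into the other:

```clojure
;; Cheap path: keep assoc!-ing new facts into one transient map.
(defn reduce-into-map [tm fact]
  (assoc! tm (:id fact) fact))

;; Expensive path: combining two transients requires persisting one of
;; them and copying all of its entries into the other - this is the
;; wasteful copy the text above refers to.
(defn combine-transient-maps [tm1 tm2]
  (reduce-kv (fn [acc k v] (assoc! acc k v))
             tm1
             (persistent! tm2)))
```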

Overall

:combine-fn can be a good and useful thing. I just think there should be a way to opt out of it if it isn't needed. The :reduce-fn can just be used in a way that doesn't need to combine two independently reduced values. This may go against some of the "streaming" mode semantics, but I think that should probably just be its own mode as discussed elsewhere.

@rbrush
Contributor

rbrush commented May 13, 2016

A shift to a lazy AccumulateNode and making combine optional makes sense. I think we should still offer the option of a user-defined :retract-fn because in some cases it can be significantly more efficient than a re-reduce (e.g., sums and averages have a trivial arithmetic retract). I'm not sure how this impacts your thoughts on making :retract-fn internal to the node. I suppose AccumulateNode could check for the presence of a user-defined :retract-fn and behave appropriately?
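For instance, a running-sum accumulator (sketched here in the shape of Clara's accumulator map; this particular definition is illustrative, not one of Clara's built-ins) can retract in O(1) by subtracting, rather than re-reducing everything:

```clojure
;; Sketch of an accumulator where a user-supplied :retract-fn is far
;; cheaper than a re-reduce: retracting a fact just subtracts its value.
(def sum-accumulator
  {:initial-value 0
   :reduce-fn  (fn [total v] (+ total v))
   :retract-fn (fn [total v] (- total v)) ; O(1) arithmetic retract
   :combine-fn +})
```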

In any case, making the default behavior of an accumulator just a lazy reduce seems reasonable...I'd just like to preserve the option of a :retract-fn and :combine-fn for accumulators and workloads that may benefit from it.

As for the streaming characteristics, I agree this hasn't gained as much traction in practice as I thought it would, and the surprising performance differences between the two accumulate node implementations could cause problems. So I'm fine with dropping that behavior and perhaps looking at adding deeper guarantees in the future, maybe some sort of "streaming accumulator" declared by the user.

@mrrodriguez
Collaborator Author

Yes, sorry if my proposal didn't make that part clear. This idea still accepts a caller-given :retract-fn and/or :combine-fn, with the presumption that the caller-given one can do a better job of retraction and/or that the caller has an efficient way to combine reduced values.

If you are on board with this, I will try to investigate this a little more and see where I get. I'll post back here when I get something working.

@rbrush
Contributor

rbrush commented May 13, 2016

Gotcha. This strategy sounds good to me. Feel free to take a stab at it if
you're interested, and I'm happy to help in any way.


@WilliamParker
Collaborator

FYI, I don't currently plan to do anything in the engine accumulator implementation in the immediate future apart from any fixes that may arise from discussion on #191 . I'm planning on focusing on #192 and other ways to increase consistency.

@mrrodriguez , are you planning to address #188 and/or #189 with these changes?

@mrrodriguez
Collaborator Author

@WilliamParker I'll consider them when doing this. If they are obvious, or in the way of giving these changes clear semantics, then they may be involved. However, I wouldn't want to go out of my way in this work to fix them if they can be done separately.

@mrrodriguez
Collaborator Author

I've been looking into this a bit more and I think these changes will end up encompassing a few of these issues. I haven't laid out exactly which ones will be covered.

I'm thinking

May naturally need to be done with these changes.

#102 will be out of scope though.

@WilliamParker
Collaborator

@mrrodriguez , is one of the goals here to make rules like

(defrule possibly-blocked-rule
  [:not [BlockerFact]]
  [?result <- (really-expensive-accumulator)]
  =>
  ;; the RHS doesn't matter here
  nil)

(defrule possibly-blocked-rule-2
  [RequiredFact]
  [?result <- (really-expensive-accumulator)]
  =>
  ;; the RHS doesn't matter here
  nil)

not take much time if there is a BlockerFact present or there is no RequiredFact present (depending on which rule)? I believe we have cases like these where we end up preventing the rule from firing. Perhaps in some of these cases we could add functionality to not pass such rules to Clara, but it would be nicer if rules like this were more performant (and in some cases we might not know at session compilation time that the rule won't be used). The description above and your branch at https://github.com/mrrodriguez/clara-rules/tree/issue190 suggest the answer is yes, but I'd like to be sure I understand correctly.

If this is correct, have you considered whether the compiler will consistently place the AccumulateNode in such a way that it can benefit? I haven't traced through all of this (yet, anyway), but I'm wondering if we need to worry about a scenario like:

  • The AccumulateNode is left-activated by the empty-token.
  • The AccumulateNode performs an expensive accumulation.
  • The AccumulateNode propagates tokens to a node from a "blocker" condition which never gets satisfied, so the rule doesn't fire.

Right now when I tried these two rules:

(defrule accum-order-rule
  [?c <- Cold]
  [?h <- (acc/all :from [Hot])]
  =>
  (println "Hello world"))

(defrule accum-order-rule-reversed
  [?h <- (acc/all :from [Hot])]
  [?c <- Cold]
  =>
  (println "Hello world"))

and made a session from each:

(-> (mk-session [accum-order-rule-reversed]) .rulebase :beta-roots first)
(-> (mk-session [accum-order-rule]) .rulebase :beta-roots first)
;; I also checked that each rulebase had only a single beta root

which type (Cold vs Hot) was the beta root, and would thus be left-activated by the empty-token [1]
depended on which came first in the rule. With this change, we'd potentially be relying heavily on node ordering for performance.

It might be reasonable to establish an expectation that where possible, conditions will be ordered sequentially, but has anyone thought through the implications of that? @rbrush , has any expectation around such ordering been established previously? Alternatively, is it possible for the compiler to autodetect the best ordering somehow?

Sorry if I'm rambling a bit here, but I just wanted to start this conversation now since I may not have time to respond here for a few days.

  1. https://github.com/rbrush/clara-rules/blob/0.11.1/src/main/clojure/clara/rules/engine.cljc#L1216

@mrrodriguez
Collaborator Author

@WilliamParker the work I'm doing on this aims to make the AccumulateNode lazier than it currently is (no big work on AccumulateWithJoinFilterNode in this pass), and yes, it would benefit in cases where it is downstream of other nodes that don't yet have tokens to propagate.

This isn't the main goal of this though and it only enables the laziness from the perspective of the AccumulateNode. The order in which the compiler chooses to put the nodes is an orthogonal concern and I think that'd be the work of a separate issue.

The main reasons I'm working on these changes right now are

  1. to provide an efficient :retract-fn default
  2. to not require a :combine-fn when it is not helpful; it can actually be hard to implement one that beats just continually reducing onto a single reduced (possibly transient) value
  3. to enable the lazier behavior discussed above - this is only a minor added benefit of this work

Along with that, I've been able to make some better choices in the semantics of :initial-value propagation and retraction that solves the issues #188 and #189 for the AccumulateNode.
I think the same solution will likely work in AccumulateWithJoinFilterNode, but I'll probably make that a separate issue since the changes will already be heavy enough with this pass.

Any work on the condition ordering for performance issue seems to me to be closely related to what we do in clara.rules.compiler/sort-conditions [1] which is outside the scope of what I'm doing here. I do think this is an important topic to consider though.

[1] https://github.com/rbrush/clara-rules/blob/0.11.1/src/main/clojure/clara/rules/compiler.clj#L693-L759

@mrrodriguez
Collaborator Author

mrrodriguez commented Jun 9, 2016

#202 is out for the changes I have related to this issue.

For the AccumulateNode case, it also fixes #188 and #189. It also deals with the issue faced in #200 .

The changes are a pretty big overhaul of AccumulateNode. I'm not sure how easy the diff will be to compare; it may be one of those things best reviewed non-diff'ed to get the most value.

I believe this issue's main description still discusses enough detail to describe the goals of these changes. If there are more questions, ask away.

I haven't seen any regression in behavior with these changes in our tests or in some production cases, and I have seen some performance improvements. Most of this comes from the lazier behavior of not accumulating when no tokens have been passed, as well as from no longer wrapping the :retract-fn in the way that produced N^2 behavior when the given :reduce-fn couldn't retract cheaply.

A microbench to show this:

(let [n 1e4
      temps (vec
             (for [i (range n)]
               (->Temperature i "MCI")))
      r (dsl/parse-rule [[Temperature (= ?t temperature)]]
                        (insert! (->Cold ?t)))
      q (dsl/parse-query []
                         [[?min <- (acc/min :temperature :returns-fact true)
                           :from [Cold]]])
      s (mk-session [r q])
      with-temps (-> s
                     (insert-all temps)
                     fire-rules)]

  (is (= (frequencies [{:?min (-> temps first :temperature ->Cold)}])
         (frequencies (query with-temps q))))

  (is (= #{}
         (-> (time (fire-rules (apply retract with-temps temps)))
             (query q)
             set))))

;; BEFORE TIME 
"Elapsed time: 18124.600923 msecs"

;; AFTER TIME 
"Elapsed time: 2346.586594 msecs"

Going to n = 1e5 the results are

;; BEFORE TIME 
"Elapsed time: 2717871.481783 msecs"

;; AFTER TIME 
"Elapsed time: 196069.910082 msecs"

I do have another set of changes (for the JVM-side only) that when applied on top of these changes speeds up these numbers to something like:

;; n = 1e4
"Elapsed time: 945.914718 msecs"

;; n = 1e5
"Elapsed time: 69259.13893 msecs"

This is using mutable Java data structures for the AccumulateNode's memory of facts seen so far. This set of changes is in the spirit of #184, but I'm not fully done with my testing of it yet, so I figured I may just make a new PR out of it after this one is merged in. Profiling with these coming changes shows that the 69 seconds here is mostly contained to the accumulator itself at this point. That'd be a good place to get to.

Again, the changes are not all about performance here. Some are for correctness around the :initial-value propagation. Also, not having to specify the :combine-fn if it doesn't benefit your use-case is a positive.

@mrrodriguez
Collaborator Author

I am working on fixing the issues from #188 and #189 for AccumulateWithJoinFilterNode now. I won't be working on #102 as of yet though. @WilliamParker said he may be looking at that one some.

@rbrush
Contributor

rbrush commented Aug 19, 2016

Similar to #188 and #189. Is it safe to close this one out now?

@WilliamParker
Collaborator

I concur with @rbrush ; I think this can be closed. Do you have any objections to closing this @mrrodriguez ?

@mrrodriguez
Collaborator Author

+1 to closing

@rbrush rbrush added this to the 0.12.0 milestone Aug 22, 2016
@rbrush
Contributor

rbrush commented Aug 22, 2016

Closing for 0.12 release
