Optimize LocalTransport retraction #183
Merged
Background
In clara.rules.engine the LocalTransport implementation of `retract-elements` and `retract-tokens` has the potential to waste quite a bit of time. In the reverse direction, `send-elements` and `send-tokens` do not have this issue. It looks like they were optimized a while back in 12f0cdb#diff-fc240c836498f0d897381182d11d3df8L88. That optimization was beneficial for batch propagation to nodes with no join bindings, and it also created smaller map keys when doing a `group-by` (or our tuned variant). Since `retract-elements` and `retract-tokens` are essentially the same implementation, I'll just refer to `retract-elements` from here on.
Problem
`retract-elements` always performs something of the form `(platform/tuned-group-by :bindings elements)` regardless of the join bindings of the nodes it is going to call `right-retract` on. This causes performance degradation for three reasons that I can immediately see.
(1)
`(platform/tuned-group-by :bindings elements)` must hash all of the `:bindings` for each element, since these become the map keys. We only really need the `:bindings` involved in join bindings to be the keys of this map. This extra hashing can become expensive, especially when the values of the `:bindings` have slow hash code implementations, which is more likely when hashing `:fact-bindings` or `:result-bindings` from accumulators. In practice we also wouldn't typically join on something like a fact binding or a large accumulated `:result-binding`.
e.g. In my (very contrived) `my-rule`, there are no explicit join bindings between `?big` and `?marker`. With the usage of `insert-unconditional!` we actually never need to hash the BigFact at all. However, we currently do hash it upon retraction of previously accumulated values. Note: this case is actually an issue with `retract-tokens`, but as I said, their implementation details are the same. Overall this is a sneaky consequence, but we have rules of this form in practice, and the calculation of hash codes for our "BigFact"s can be expensive; ideally the engine should avoid it when it is not necessary. It would also be much rarer to actually use this BigFact as a real join binding.
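The rule itself isn't reproduced inline above, so here is a hypothetical reconstruction of its shape; `BigFact`, `Marker`, `Result`, and the accumulator choice are illustrative assumptions, not the exact originals:

```clojure
(ns example.rules
  (:require [clara.rules :refer [defrule insert-unconditional!]]
            [clara.rules.accumulators :as acc]))

(defrecord BigFact [payload]) ; large fact whose hashCode is expensive
(defrecord Marker [id])
(defrecord Result [marker-id])

(defrule my-rule
  ;; No binding is shared between the two conditions, so there is no join.
  [?big <- (acc/all) :from [BigFact]]
  [?marker <- Marker]
  =>
  ;; insert-unconditional! opts out of truth maintenance, so the engine never
  ;; needs to hash ?big for that purpose; yet retracting previously
  ;; accumulated values still hashes the full :bindings today.
  (insert-unconditional! (->Result (:id ?marker))))
```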
(2)
When a node has no join bindings, such as a RootJoinNode [1], the elements to retract are still batched by their `:bindings` prior to propagation across the network. This causes unnecessary separation of batched propagations. Batched propagation across the network tends to lead to significant performance gains, so it is good to keep these batches as large as possible. Note: other nodes may also have no join bindings. It is fairly common for rules to use independent conditions, in which case we get HashJoinNodes, AccumulateNodes, NegationNodes, etc. with no join bindings.
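To see the batch-splitting effect in isolation, here is a plain-Clojure sketch using `clojure.core/group-by` as a stand-in for the tuned variant; the element maps are made up for illustration:

```clojure
;; Ten elements that a no-join node could propagate as one batch,
;; but whose :bindings maps are all distinct.
(def elements
  (for [i (range 10)]
    {:bindings {:?v i} :fact i}))

;; Grouping by the full :bindings map splits the batch into ten
;; single-element propagations:
(count (group-by :bindings elements)) ;=> 10

;; Grouping only by the node's join keys (none here) keeps one batch:
(count (group-by #(select-keys (:bindings %) []) elements)) ;=> 1
```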
(3)
Relating to (2), we do not need to perform a `group-by` operation on `:bindings` at all if there are no join bindings for the node anyway. Since `group-by` is hit pretty hard within the engine and has been a bottleneck before, it is good to avoid it whenever unnecessary.
Proposed changes
The changes in this commit use the same pattern as `send-elements` and `send-tokens` in the `retract-elements` and `retract-tokens` paths of the LocalTransport. This should behave functionally equivalently to before, only with less potential to waste time in the retraction paths. Since all four of these paths are now nearly identical, I pulled out a helper function to hold the shared logic. I did this to avoid the maintenance burden and the risk of missing one path when updating another, since they are intended to be the same. I don't think this extraction adds any overhead worth worrying about; I have profiled it with criterium and the difference looks to be negligible.
I don't necessarily have any great tests to add here, since the functionality is unchanged. A benchmark may be a good idea; however, this isn't really a good scenario for a Drools vs. Clara analysis, since this is much more a detail of Clara's implementation. I can definitely demonstrate a significant difference in how this can hurt batched retraction propagation in the REPL.
The following uses the same setup as the criterium benchmark above. I didn't use criterium here because the BEFORE version takes 30+ seconds, while AFTER takes less than 3 seconds; running it manually many times yields similar numbers anyway. Profiling shows that the BEFORE case causes many more small changes to ripple through the (in this case small) network, while AFTER a single change is propagated in batch across the nodes in the network.
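For reference, the shared helper pattern described above might look roughly like the following sketch; the name, docstring, and exact signature are illustrative, not necessarily the committed code:

```clojure
(defn- propagate-items-to-nodes
  "Sketch: group items by each node's join keys only, and skip the
  grouping entirely when the node has no join bindings."
  [nodes items propagate-fn]
  (doseq [node nodes
          :let [join-keys (get-join-keys node)]]
    (if (seq join-keys)
      ;; Join bindings present: key the groups by just the join bindings.
      (doseq [[join-bindings item-group]
              (platform/tuned-group-by
               #(select-keys (:bindings %) join-keys) items)]
        (propagate-fn node join-bindings item-group))
      ;; No join bindings: one large batched propagation, no group-by.
      (propagate-fn node {} items))))
```

The same helper can then back `send-elements`, `send-tokens`, `retract-elements`, and `retract-tokens`, with each supplying its own `propagate-fn`.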
Results
When I ran with these changes in place in our production environment, overall time was slightly better than before. What was significant, however, was that our worst-case scenarios improved substantially. In our environment we identify when a particular rule session against a particular dataset runs much longer than others, and I've seen a significant reduction in sessions showing up as long-running now.
Furthermore, the issue I mentioned in point (1) regarding excessive hashing of large objects is even more critical in some workflows we are working on, where we need quick one-off runs of rules against a given dataset.
Notes
[1] In RootJoinNode's case I think `(get-join-keys [node] binding-keys)` will always result in an empty set, because only join bindings seem to be included here by clara.rules.compiler. The RootJoinNode shouldn't have any joins, since it is the root "dummy" node starting the beta network.