Speed up _compute_centroid_quantile by using an accumulation tree #28

Closed
wants to merge 5 commits

Conversation

tkluck-booking

A profiling run shows that _compute_centroid_quantile is by far the biggest bottleneck, and that intuitively makes sense: its runtime is linear in the number of centroids in the tree, O(n), as opposed to O(log(n)) for the other tree operations.

Ted Dunning's original paper suggests speeding up this computation by storing partial sums at every node in the tree. This commit does that.
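
A minimal sketch of the idea (illustrative field names, not this PR's code): each node caches the total count of its subtree, so the prefix count that _compute_centroid_quantile needs follows a single root-to-leaf path instead of visiting every centroid.

    def count_below(node, x):
        # Total count of centroids with key < x; O(log n) on a balanced tree.
        # Assumes each node caches subtree_count = the count of its whole subtree.
        if node is None:
            return 0
        if x <= node.key:
            # node and its right subtree only hold keys >= x: descend left.
            return count_below(node.left, x)
        # node.key < x: the whole left subtree and node itself lie below x.
        left = node.left.subtree_count if node.left else 0
        return left + node.count + count_below(node.right, x)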

In this proof-of-concept, this means trading the C speed of FastRBTree for the Python speed of RBTree. Still, I get an almost 50% speedup for a test use case:

    time python -c'import random; from tdigest import TDigest; TDigest().batch_update(random.random() for _ in range(10000))'

proving yet again that better algorithms beat faster implementation languages.

A follow-up could be to write an accumulation tree in C or Cython, which should give the best of both worlds.

@tkluck-booking (Author)

^^ Fixing that integration check. I'd noticed it failing before but, for unrelated reasons, I didn't think this commit had introduced it. It did, obviously :)

On it - correctness is much more important than speed. Will re-open this pull request once that is done.

…tree

The major issue was that _path_to_key chose the wrong child node when
descending into the tree. The other issues are cosmetic, although this commit
also fixes the method get_accumulation (which is currently unused by TDigest).

The unit tests now (1) pass and (2) run 5 times faster.
@tkluck-booking (Author)

Just fixed the unit tests with a follow-up commit; let's see what Travis thinks.

@tkluck-booking reopened this Feb 3, 2017
@@ -55,8 +56,7 @@ def _add_centroid(self, centroid):

    def _compute_centroid_quantile(self, centroid):
        denom = self.n
        cumulative_sum = sum(
            c_i.count for c_i in self.C.value_slice(-float('Inf'), centroid.mean))
CamDavidsonPilon (Owner):

Oh! Interesting, this is linear. Nice catch.

@CamDavidsonPilon (Owner)

PR looks really fascinating, I bet it was fun to write. I don't have time now to review, but will this weekend!

@tkluck commented Feb 5, 2017

I now have a Cython implementation of this 'accumulation_tree'. It brings make test down to 11 seconds (from 160 seconds before the work in this branch, or 30 seconds with the pure-Python implementation). I could either add it to tdigest, or release it as a separate library and add it as a dependency. I could also add it as a pull request to bintrees, but its GitHub page says that its development has been abandoned. What do you think?

    def _get_left_accumulation(self, node, upper):
        if not node:
            return self._zero
        if(upper > node.key):
CamDavidsonPilon (Owner):

Correct me if I'm wrong, but this means our accumulation is not inclusive, right? I.e. [a, b)

Reply:

Correct.
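
To make the half-open convention concrete, a tiny self-contained illustration (not the PR's code) of what a left accumulation over counts returns:

    import bisect

    keys = [1.0, 2.0, 3.0]    # sorted centroid means
    counts = [5, 7, 11]       # centroid counts

    def left_accumulation(upper):
        # Sum of counts for keys strictly below upper: the interval [min, upper).
        return sum(counts[:bisect.bisect_left(keys, upper)])

    assert left_accumulation(3.0) == 12   # key 3.0 itself is excluded
    assert left_accumulation(3.5) == 23   # now it is included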

                    self._get_full_accumulation(node[0]),
                    self._get_left_accumulation(node[1], upper),
                ),
                self._mapper(node.value),
CamDavidsonPilon (Owner):

A part I'm not sure about: does the node store in its accumulation variable the sum of everything below it and its own value too? Or just everything below? Here it looks like we add in its own value, but below in _update_accumulation it looks like we already include it.

@tkluck (Feb 5, 2017):

does the node store in its accumulation variable the sum of everything below it and its own value too?

Yes, that's what it's supposed to do. So in particular, if it has no children, then node.accumulation = self._mapper(node.value). Which part of _update_accumulation makes you think this does not hold? (It's perfectly possible I missed a case.)
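
For the record, a sketch of how _update_accumulation could maintain that invariant (illustrative; the actual function in this branch may differ in details):

    def _update_accumulation(self, node):
        # Invariant: node.accumulation covers the node itself plus both
        # subtrees, so a leaf gets node.accumulation == self._mapper(node.value).
        accumulation = self._mapper(node.value)
        for child in (node[0], node[1]):
            if child is not None:
                accumulation = self._reducer(accumulation, child.accumulation)
        node.accumulation = accumulation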


@tkluck (Feb 5, 2017):

Ah yes, I see where you are coming from. No, I don't think we're adding the value twice: when we visit node in that function, we are not sure that we need the entire accumulation under node, which is why you'll notice node.accumulation is not accessed at all. This is the reason there is no double counting: we either use self._mapper(node.value) or node.accumulation, but never both in the same function.

Let's go over it in detail. By definition, _get_left_accumulation(self, node, upper) returns the sum of all nodes n under node (including node itself) which satisfy n.key < upper. "By definition" is to say: that's what I want it to do. Let me try to convince you that indeed, it does:

def _get_left_accumulation(self, node, upper):

if node is None, this accumulation is zero. This is the stopping condition for the recursion.

        if not node:
            return self._zero

if upper > node.key, we now need to retrieve the sum of the following three things:

  • node itself (see definition);
  • the sum of node.left (node[0]) and all its children, because all of them have n.key < node.key, and therefore also n.key < upper. We call _get_full_accumulation on node.left;
  • the part of the tree rooted at node.right that satisfies n.key < upper. We just call this function recursively.
        if(upper > node.key):
            return self._reducer(
                self._reducer(
                    self._get_full_accumulation(node[0]),
                    self._get_left_accumulation(node[1], upper),
                ),
                self._mapper(node.value),
            )

if upper <= node.key, we can ignore this node and everything larger, so we ignore node[1] and only recurse into node[0].

        else:
            return self._get_left_accumulation(node[0], upper)

Does this explanation help? You can reason all other functions through in the same way.

(edit: fix flipped node[0] vs node[1] in the explanation for case three.)
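
For completeness, the base case the explanation leans on, _get_full_accumulation, just reads the cached value; a sketch consistent with the snippets above (not necessarily the exact code in this branch):

    def _get_full_accumulation(self, node):
        # node.accumulation already covers node and its entire subtree.
        if not node:
            return self._zero
        return node.accumulation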

CamDavidsonPilon (Owner):

gotcha, I now understand after some more thought. Thanks for the explanation.


        self._dirty_nodes = set()

        class AccumulatingNode(Node):
CamDavidsonPilon (Owner):

Having this class inside the Tree class is a very clever way to share variables like _dirty_nodes 👍

Reply:

Haha yes, ever since I learned about it, I've been able to cut a lot of

def __init__(self, variable1, variable2, ...):
    self._variable1 = variable1
    self._variable2 = variable2
    ....

boilerplate.
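
A minimal sketch of the pattern (hypothetical method names; the real AccumulatingNode does more): a class defined inside a method closes over that method's locals, so the tree never has to be passed into every node constructor.

    class Node:
        pass

    class Tree:
        def __init__(self):
            self._dirty_nodes = set()
            tree = self  # captured by the closure below

            class AccumulatingNode(Node):
                def mark_dirty(self):
                    # No tree-passing boilerplate: this method closes
                    # over `tree` from the enclosing __init__.
                    tree._dirty_nodes.add(self)

            self._node_class = AccumulatingNode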

@CamDavidsonPilon (Owner)

I have a few questions (above). Overall the code is excellent. I feel like I'm looking at code that I want to write in a few years. Not to mention the algorithm looks great. 👏

(Did you write this yourself, or did you have a reference to help?)

@CamDavidsonPilon (Owner)

I could also add it as a pull request to bintrees - but its github page says that its development has been abandoned

I think this is the least favourable option. It probably wouldn't see the light of day if the project is abandoned.

I could either add it to tdigest

Since we've already abstracted the Cython parts (bintrees), tdigest can be made pure Python (pros: a simpler setup.py, easier installation, etc.). So, with that in mind ...

release it as a separate library and add it as a dependency.

I think this is the best option: it abstracts the Cython bits outside of this library, and other libraries can also benefit from the implementation (https://github.com/jcrist/crick is an example).

@tkluck commented Feb 5, 2017

release it as a separate library and add it as a dependency.

I think this is the best option: it abstracts the Cython bits outside of this library, and other libraries can also benefit from the implementation (https://github.com/jcrist/crick is an example).

All right, I'll get to it. I guess that means I'll move the code from this branch to that library instead. I'll try to mimic bintrees' behaviour of falling back gracefully to a Python implementation in case Cython is not available.

I have a few questions (above). Overall the code is excellent. I feel like I'm looking at code that I want to write in a few years. Not to mention the algorithm looks great. 👏

(Did you write this yourself, or did you have a reference to help?)

Thanks for being positive and open to pull requests! I must say, reaching out on GitHub can be hit-or-miss, and you seem to be one of those maintainers who manage to set exactly the right atmosphere for a community to form. Keep up the good spirit!

FWIW, yes, I wrote this. I'd done something similar before for computing quick accumulations of slices of arrays. The new bit was to combine those ideas with a binary search tree.

@tkluck commented Feb 5, 2017

^^ Just released that package; I haven't tested with Python 3 yet, so that's likely to fail the integration testing. Will look into that tomorrow.

requirements.txt Outdated
@@ -1,2 +1,3 @@
bintrees
CamDavidsonPilon (Owner):

we can drop bintrees now that it's a dependency of your package

… tree now

(Pushing this commit to the pull request also bumps Travis to run the
integration test against the newer version of the accumulation_tree library,
v0.3, which should solve the issue with Python 3.)
@tkluck commented Feb 6, 2017

Oh no! The integration checks are failing. The reason seems to be that I'm depending on a relatively new version of setuptools, which fixed this issue.

It was very hard to distribute a Python 2/3 Cython extension before that change. There were two ways to distribute your package:

  • either just require that the user always has Cython installed;
  • or ship the .c file that Cython generated in your source distribution, and fall back to that whenever Cython is not available.

Unfortunately, the second option can't be simultaneously Python 2 and Python 3 compatible, because Cython generates different C sources for each.

The newer version of setuptools allows you to just specify setup_requires=["cython"] and you're all set.
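
Concretely, that allows a setup.py along these lines (a sketch; the module path is illustrative, not necessarily accumulation_tree's actual layout):

    from setuptools import setup
    from setuptools.extension import Extension

    setup(
        name="accumulation_tree",
        # With setuptools >= 18.0, this pulls Cython in at build time,
        # so neither a shipped .c file nor a pre-installed Cython is needed.
        setup_requires=["cython"],
        ext_modules=[
            Extension(
                "accumulation_tree.accumulation_tree",
                ["accumulation_tree/accumulation_tree.pyx"],
            )
        ],
    )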

I'll try to reproduce Travis's environment and see what I can do to address this.

@@ -17,7 +17,8 @@ def read(fname):
    long_description=read('README.md'),
    install_requires=[
        "bintrees",
CamDavidsonPilon (Owner):

let's nuke this here too

@FlorianWilhelm

It's really nice to see that work is going on to improve the performance. On my laptop I see that about 10,000 updates take about 1 second. For my actual use case I would have to ingest hundreds of millions of values (or I just subsample :-). Nevertheless, what speedup do you expect from this PR with the new partial-sum method implemented in Cython?

@tkluck-booking (Author)

@FlorianWilhelm: the speedup is that each insert operation moves from being linear in the number of centroids to being logarithmic, so it should be much more than a constant factor. Feel free to check out this branch and test its performance on your machine.
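
A rough way to measure it (a sketch; run once on master and once on this branch to compare):

    import random
    import time

    from tdigest import TDigest

    for n in (10000, 100000):
        start = time.time()
        digest = TDigest()
        digest.batch_update(random.random() for _ in range(n))
        # With the linear prefix sum, total time grows roughly quadratically
        # in n; with the accumulation tree it should be closer to n*log(n).
        print(n, "updates:", round(time.time() - start, 2), "s")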

@CamDavidsonPilon (Owner)

Are you still able to contribute to this PR @tkluck? Otherwise I don't mind taking it over.

@tkluck commented Feb 19, 2017

@CamDavidsonPilon feel free to beat me to fixing this :) I'll pick it up eventually if you don't; for external reasons, it has dropped a few places on my list of priorities.

Thanks for checking in!
