RFC: use pairwise summation for sum, cumsum, and cumprod #4039

Merged
merged 3 commits into from Aug 13, 2013

Projects

None yet

7 participants

@stevengj
The Julia Language member

This patch modifies the sum and cumsum functions (and cumprod) to use pairwise summation for summing AbstractArrays, in order to achieve much better accuracy at a negligible computational cost.

Pairwise summation recursively divides the array in half, sums the halves recursively, and then adds the two sums. As long as the base case is large enough (here, n=128 seemed to suffice), the overhead of the recursion is negligible compared to naive summation (a simple loop). The advantage of this algorithm is that it achieves O(sqrt(log n)) mean error growth, versus O(sqrt(n)) for naive summation, which is almost indistinguishable from the O(1) error growth of Kahan compensated summation.

For example:

A = rand(10^8)
es = sum_kbn(A)
(oldsum(A) - es)/es, (newsum(A) - es)/es

where oldsum and newsum are the old and new implementations, respectively, gives (-1.2233732622777777e-13,0.0) on my machine in a typical run: the old sum loses three significant digits, whereas the new sum loses none. On the other hand, their execution time is nearly identical:

@time oldsum(A);
@time newsum(A);

gives

elapsed time: 0.116988841 seconds (105816 bytes allocated)
elapsed time: 0.110502884 seconds (64 bytes allocated)

(The difference is within the noise.)

@JeffBezanson and @StefanKarpinski, I think I mentioned this possibility to you at Berkeley.

@stevengj
The Julia Language member

(We could use the old definitions for AbstractArray{T<:Integer}, as pairwise summation has no accuracy advantage there, although it doesn't really hurt either.)

@GunnarFarneback

If you do image processing with single precision, this is an enormously big deal. E.g.

julia> mean(fill(1.5f0, 3000, 4000))
1.766983f0

(Example chosen for its striking effect. Unfortunately the mean implementation is disjoint from the sum implementation that this patch modifies.)

@stevengj
The Julia Language member

@GunnarFarneback, yes, it is a pain to modify all of the different sum variants, although it would be possible in principle. But it would help with the case you mentioned:

julia> newsum(fill!(Array(Float32, 3000*4000), 1.5f0)) / (3000*4000) 
1.5f0
@stevengj
The Julia Language member

Of course, in single precision, the right thing is to just do summation and other accumulation in double precision, rounding back to single precision at the end.

@GunnarFarneback

To clarify my example was meant show the insufficiency of the current state of summing. That pairwise summation is an effective solution I considered as obvious, so I'm all in favor of this patch. That mean and some other functions fail to take advantage of it is a separate issue.

@GunnarFarneback

The right thing with respect to precision, not necessarily with respect to speed.

@StefanKarpinski
The Julia Language member

I'm good with this as a first step and think we should, as @GunnarFarneback points out, integrate even further so that mean and other stats functions also use pairwise summation.

@stevengj
The Julia Language member

Updated to use pairwise summation for mean, var, and varm as well. At least when looking at the whole array; for sums etc. along a subset of the dimensions of a larger array, we need a pairwise variant of reducedim (for associative operations).

The variance computation is also more efficient now because (at least when operating on the whole array) it no longer constructs a temporary array of the same size. (Would be even better if abs2 were inlined, but that will be easy once something like #3796 lands.)

Also, I noticed that var was buggy for complex matrices, since it used x.*x instead of abs2(x). Our varm function used dot, which did the complex conjugation, but it returned a complex number with zero imaginary part instead of a real number. Now these both should work and return a real value; I added a test case.

@stevengj
The Julia Language member

Also added an associative variant of mapreduce which uses pairwise reduction.

Question: Although reduce and mapreduce currently do left-associative reduction ("fold left"), this isn't documented in the manual. Is this intentional, i.e. are these operations supposed unconstrained in their association ordering? (The docs should really be explicit on this point.) If so, then we don't need a separate mapreduce_associative function, right?

@StefanKarpinski
The Julia Language member

Another approach would be to have a Associativity type that can be passed into reduce and mapreduce as a keyword (defaulting to pairwise?) and then we can specialize the internals of the reductions on that type, which will minimize the overhead. I.e. similar to the sorting infrastructure.

@pao
The Julia Language member

Although reduce and mapreduce currently do left-associative reduction ("fold left"), this isn't documented in the manual. Is this intentional, i.e. are these operations supposed unconstrained in their association ordering? (The docs should really be explicit on this point.)

That would be a good idea--for instance, https://github.com/pao/Monads.jl/blob/master/src/Monads.jl#L54 relies on the fold direction.

@stevengj
The Julia Language member

The only sensible associativities to have in Base are left, right, and unspecified. A Pairwise associativity makes no sense because we don't even want to implement a fully pairwise case (because, for performance, the base case should be a simple left-associative loop) and we want to be free to change the details as needed. Note also that for parallel operations you really want to have unspecified associativity so that more efficient tree algorithms can be used.

My suggestion would be that mapreduce and reduce should be explicitly documented as having implementation-dependent associativity, and then to have separate reduce_left and mapreduce_left functions which enforce left-associativity. (And maybe right-associative variants? But those can't easily be implemented for iterators, and I'm not sure how useful they are; I would just punt on those until there is demand.) Since there are only three useful cases, defining a type for this doesn't make sense to me. But probably this should be a separate issue.

@stevengj
The Julia Language member

@StefanKarpinski, should I go ahead and merge this?

@ViralBShah
The Julia Language member

This is great, and I would love to see this merged. Also, we should probably not export sum_kbn anymore, given this patch.

@stevengj
The Julia Language member

sum_kbn is still more accurate in some cases. e.g. sum([1 1e-100 -1]) still gives 0.0, but sum_kbn gives 1.0e-100. But I agree that the need for sum_kbn is greatly reduced.

@StefanKarpinski
The Julia Language member

I'm good with merging this. @JeffBezanson?

@JeffBezanson JeffBezanson merged commit c8f89d1 into JuliaLang:master Aug 13, 2013

1 check passed

Details default The Travis CI build passed
@stevengj stevengj added a commit that referenced this pull request Aug 13, 2013
@stevengj stevengj mention #4039 in NEWS de7bc3f
@timholy
The Julia Language member

This is great. I had played with my own variant of this idea, breaking it up into blocks of size sqrt(N), but for small arrays the sqrt was a performance-killer. This is much better, thanks for contributing it.

@stevengj stevengj deleted the stevengj:pairwise branch Dec 5, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment