
RFC: use pairwise summation for sum, cumsum, and cumprod #4039

Merged
merged 3 commits into JuliaLang:master from stevengj:pairwise on Aug 13, 2013

Conversation

7 participants
@stevengj
Member

stevengj commented Aug 13, 2013

This patch modifies the sum and cumsum functions (and cumprod) to use pairwise summation for summing AbstractArrays, in order to achieve much better accuracy at a negligible computational cost.

Pairwise summation recursively divides the array in half, sums the halves recursively, and then adds the two sums. As long as the base case is large enough (here, n=128 seemed to suffice), the overhead of the recursion is negligible compared to naive summation (a simple loop). The advantage of this algorithm is that it achieves O(sqrt(log n)) mean error growth, versus O(sqrt(n)) for naive summation, which is almost indistinguishable from the O(1) error growth of Kahan compensated summation.
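The recursion described above can be sketched in Julia as follows (an illustrative standalone version, not the exact code in this patch; the function name is made up, and the n <= 128 cutoff is the one mentioned above):

```julia
# Illustrative sketch of pairwise summation (not the actual patch code).
# Sums a[i], a[i+1], ..., a[i+n-1]; assumes n >= 1.
function pairwise_sum(a::AbstractArray, i::Int, n::Int)
    if n <= 128
        # Base case: a simple left-to-right loop.  A cutoff this large
        # makes the recursion overhead negligible compared to the loop.
        s = a[i]
        for k = i+1:i+n-1
            s += a[k]
        end
        return s
    else
        # Recursive case: split in half, sum each half, add the two sums.
        n2 = n >> 1
        return pairwise_sum(a, i, n2) + pairwise_sum(a, i + n2, n - n2)
    end
end

pairwise_sum(a::AbstractArray) = pairwise_sum(a, 1, length(a))
```

Because each element participates in only O(log n) additions along any root-to-leaf path, rounding errors accumulate far more slowly than in a single left-to-right loop.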

For example:

A = rand(10^8)
es = sum_kbn(A)
(oldsum(A) - es)/es, (newsum(A) - es)/es

where oldsum and newsum are the old and new implementations, respectively, gives (-1.2233732622777777e-13,0.0) on my machine in a typical run: the old sum loses three significant digits, whereas the new sum loses none. On the other hand, their execution time is nearly identical:

@time oldsum(A);
@time newsum(A);

gives

elapsed time: 0.116988841 seconds (105816 bytes allocated)
elapsed time: 0.110502884 seconds (64 bytes allocated)

(The difference is within the noise.)

@JeffBezanson and @StefanKarpinski, I think I mentioned this possibility to you at Berkeley.

@stevengj

Member

stevengj commented Aug 13, 2013

(We could use the old definitions for AbstractArray{T<:Integer}, as pairwise summation has no accuracy advantage there, although it doesn't really hurt either.)

@GunnarFarneback

Contributor

GunnarFarneback commented Aug 13, 2013

If you do image processing with single precision, this is an enormously big deal. E.g.

julia> mean(fill(1.5f0, 3000, 4000))
1.766983f0

(Example chosen for its striking effect. Unfortunately the mean implementation is disjoint from the sum implementation that this patch modifies.)

@stevengj

Member

stevengj commented Aug 13, 2013

@GunnarFarneback, yes, it is a pain to modify all of the different sum variants, although it would be possible in principle. But it would help with the case you mentioned:

julia> newsum(fill!(Array(Float32, 3000*4000), 1.5f0)) / (3000*4000) 
1.5f0
@stevengj

Member

stevengj commented Aug 13, 2013

Of course, in single precision, the right thing is to just do summation and other accumulation in double precision, rounding back to single precision at the end.
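That approach can be sketched as follows (illustrative only, using current Julia syntax; sum64 is a hypothetical name, and this is not what the present patch does):

```julia
# Hypothetical sketch: accumulate a Float32 array in a Float64 register,
# rounding back to single precision only at the end.
function sum64(a::AbstractArray{Float32})
    s = 0.0              # Float64 accumulator
    for x in a
        s += x           # each Float32 element is widened to Float64
    end
    return Float32(s)    # round back to single precision at the end
end
```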

@GunnarFarneback

Contributor

GunnarFarneback commented Aug 13, 2013

To clarify: my example was meant to show the inadequacy of the current state of summation. That pairwise summation is an effective solution I considered obvious, so I'm all in favor of this patch. That mean and some other functions fail to take advantage of it is a separate issue.

@GunnarFarneback

Contributor

GunnarFarneback commented Aug 13, 2013

The right thing with respect to precision, not necessarily with respect to speed.

@StefanKarpinski

Member

StefanKarpinski commented Aug 13, 2013

I'm good with this as a first step and think we should, as @GunnarFarneback points out, integrate even further so that mean and other stats functions also use pairwise summation.

@stevengj

Member

stevengj commented Aug 13, 2013

Updated to use pairwise summation for mean, var, and varm as well, at least when operating on the whole array; for sums etc. along a subset of the dimensions of a larger array, we would need a pairwise variant of reducedim (for associative operations).

The variance computation is also more efficient now because (at least when operating on the whole array) it no longer constructs a temporary array of the same size. (Would be even better if abs2 were inlined, but that will be easy once something like #3796 lands.)

Also, I noticed that var was buggy for complex matrices, since it used x.*x instead of abs2(x). Our varm function used dot, which did the complex conjugation, but it returned a complex number with zero imaginary part instead of a real number. Now these both should work and return a real value; I added a test case.

@stevengj

Member

stevengj commented Aug 13, 2013

Also added an associative variant of mapreduce which uses pairwise reduction.

_Question:_ Although reduce and mapreduce currently do left-associative reduction ("fold left"), this isn't documented in the manual. Is this intentional, i.e. are these operations supposed to be unconstrained in their association ordering? (The docs should really be explicit on this point.) If so, then we don't need a separate mapreduce_associative function, right?

@StefanKarpinski

Member

StefanKarpinski commented Aug 13, 2013

Another approach would be to have an Associativity type that can be passed into reduce and mapreduce as a keyword (defaulting to pairwise?), and then we can specialize the internals of the reductions on that type, which will minimize the overhead, similar to the sorting infrastructure.

@pao

Member

pao commented Aug 13, 2013

Although reduce and mapreduce currently do left-associative reduction ("fold left"), this isn't documented in the manual. Is this intentional, i.e. are these operations supposed to be unconstrained in their association ordering? (The docs should really be explicit on this point.)

That would be a good idea; for instance, https://github.com/pao/Monads.jl/blob/master/src/Monads.jl#L54 relies on the fold direction.

@stevengj

Member

stevengj commented Aug 13, 2013

The only sensible associativities to have in Base are left, right, and unspecified. A Pairwise associativity makes no sense because we don't even want to implement a fully pairwise case (because, for performance, the base case should be a simple left-associative loop) and we want to be free to change the details as needed. Note also that for parallel operations you really want to have unspecified associativity so that more efficient tree algorithms can be used.

My suggestion would be that mapreduce and reduce should be explicitly documented as having implementation-dependent associativity, and then to have separate reduce_left and mapreduce_left functions which enforce left-associativity. (And maybe right-associative variants? But those can't easily be implemented for iterators, and I'm not sure how useful they are; I would just punt on those until there is demand.) Since there are only three useful cases, defining a type for this doesn't make sense to me. But probably this should be a separate issue.
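A minimal sketch of the proposed left-associative variant (mapreduce_left is the hypothetical name from this discussion, shown here with an explicit initial value v0 for simplicity):

```julia
# Hypothetical sketch of mapreduce_left: guaranteed left-to-right
# ("fold left") association, with an explicit initial value v0.
function mapreduce_left(f, op, v0, itr)
    acc = v0
    for x in itr
        acc = op(acc, f(x))   # strictly left-associative
    end
    return acc
end

# With a non-associative op the ordering is observable:
# mapreduce_left(identity, -, 0, 1:3) computes ((0 - 1) - 2) - 3 == -6
```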

@stevengj

Member

stevengj commented Aug 13, 2013

@StefanKarpinski, should I go ahead and merge this?

@ViralBShah

Member

ViralBShah commented Aug 13, 2013

This is great, and I would love to see this merged. Also, we should probably not export sum_kbn anymore, given this patch.

@stevengj

Member

stevengj commented Aug 13, 2013

sum_kbn is still more accurate in some cases: e.g. sum([1 1e-100 -1]) still gives 0.0, whereas sum_kbn gives 1.0e-100. But I agree that the need for sum_kbn is greatly reduced.
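For reference, a sketch of the compensated (Kahan–Babuška–Neumaier) algorithm behind sum_kbn, which catches exactly this kind of cancellation (illustrative, not the Base implementation; assumes a nonempty array):

```julia
# Sketch of Kahan–Babuška–Neumaier compensated summation (illustrative).
function kbn_sum(a)
    s = a[1]
    c = zero(s)                    # compensation for lost low-order bits
    for i = 2:length(a)
        t = s + a[i]
        if abs(s) >= abs(a[i])
            c += (s - t) + a[i]    # low-order bits of a[i] were lost
        else
            c += (a[i] - t) + s    # low-order bits of s were lost
        end
        s = t
    end
    return s + c                   # apply the accumulated correction
end

# kbn_sum([1.0, 1e-100, -1.0]) == 1.0e-100
```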

@StefanKarpinski

Member

StefanKarpinski commented Aug 13, 2013

I'm good with merging this. @JeffBezanson?

JeffBezanson added a commit that referenced this pull request Aug 13, 2013

Merge pull request #4039 from stevengj/pairwise
RFC: use pairwise summation for sum, cumsum, and cumprod

@JeffBezanson JeffBezanson merged commit c8f89d1 into JuliaLang:master Aug 13, 2013

1 check passed: The Travis CI build passed

stevengj added a commit that referenced this pull request Aug 13, 2013

@timholy

Member

timholy commented Aug 14, 2013

This is great. I had played with my own variant of this idea, breaking it up into blocks of size sqrt(N), but for small arrays the sqrt was a performance-killer. This is much better, thanks for contributing it.

@simonster simonster referenced this pull request Oct 29, 2013

Closed

Pairwise reductions? #21

@stevengj stevengj referenced this pull request Jan 6, 2015

Closed

more accurate cumsum #9648

@stevengj stevengj deleted the stevengj:pairwise branch Dec 5, 2015
