implement a better summation algorithm #199

Closed
JeffBezanson opened this Issue Sep 19, 2011 · 10 comments

Owner

JeffBezanson commented Sep 19, 2011

sum should use a better algorithm, or at least we should provide an alternative function that does a better job. Candidates include Kahan summation (http://en.wikipedia.org/wiki/Kahan_summation_algorithm) and pairwise summation.

Owner

StefanKarpinski commented Sep 20, 2011

By pairwise summation, I assume you mean recursive pairwise, as in this sort of thing:

pairwise_sum(x::Vector) = length(x) == 0 ? zero(eltype(x)) :
                          length(x) == 1 ? x[1] :
                          pairwise_sum(x[1:div(end,2)]) + pairwise_sum(x[div(end,2)+1:end])
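In Julia, `x[a:b]` copies the slice, so a slicing version like the one above allocates a new array at every level of recursion. A minimal sketch of the same recursive pairwise scheme that recurses on index ranges instead; `pairwise_sum_range` is a hypothetical name, not a Base function:

```julia
# Recursive pairwise summation over x[lo:hi]. The index range is split in
# half instead of slicing the array, so no temporary arrays are allocated.
function pairwise_sum_range(x::AbstractVector, lo::Int, hi::Int)
    if lo > hi
        return zero(eltype(x))          # empty range
    elseif lo == hi
        return x[lo]                    # single element
    end
    mid = (lo + hi) >>> 1               # midpoint without overflow
    return pairwise_sum_range(x, lo, mid) + pairwise_sum_range(x, mid + 1, hi)
end

# Convenience wrapper covering the whole vector.
pairwise_sum_range(x::AbstractVector) =
    pairwise_sum_range(x, firstindex(x), lastindex(x))
```

The recursion depth is only about log2(n). For 2^k copies of a single value the tree is exact, because every addition just doubles an exactly representable value: `pairwise_sum_range(fill(0.1, 1024)) == 0.1 * 1024` holds.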
Owner

JeffBezanson commented Sep 20, 2011

Kahan summation looks promising since it looks like it can be done with just a couple extra arithmetic ops on values already in registers.
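To make the "couple of extra arithmetic ops" concrete, here is a sketch of one Kahan update written as a fold step. The names `kahan_step` and `kahan_fold` are illustrative; the update itself is the standard Kahan recurrence:

```julia
# One step of Kahan's compensated summation. The state is (s, c): s is the
# running sum and c is the running compensation (the negated low-order part
# lost so far). Compared with a plain `s + x`, each step costs three extra
# floating-point operations on values that can stay in registers.
function kahan_step((s, c), x)
    y = x - c          # apply the correction from the previous step
    t = s + y          # low-order bits of y may be lost here
    c = (t - s) - y    # recover (the negative of) what was lost
    return (t, c)
end

# Kahan summation as a left fold over the input; returns the running sum.
kahan_fold(x) =
    first(foldl(kahan_step, x; init = (zero(eltype(x)), zero(eltype(x)))))
```

For example, with `x = [1.0; fill(1e-16, 10)]`, a plain `foldl(+, x)` returns exactly `1.0` because each `1e-16` is absorbed, whereas `kahan_fold(x)` ends up much closer to `1 + 1e-15`.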

Owner

StefanKarpinski commented Mar 15, 2012

Maybe a keyword option for this: alg="kahan". We could also implement recursive and sorted summation algorithms.
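A sketch of what an `alg` keyword could look like, covering the Kahan, recursive (pairwise), and sorted variants mentioned. `mysum` and the algorithm symbols are hypothetical, and only floating-point vectors are handled:

```julia
# Hypothetical front end: one summation function with an `alg` keyword.
# None of these names are part of Base; they only illustrate the proposal.
function mysum(x::AbstractVector{T}; alg::Symbol = :naive) where {T<:AbstractFloat}
    if alg === :naive
        # Plain left-to-right accumulation.
        return foldl(+, x; init = zero(T))
    elseif alg === :kahan
        # Kahan's compensated summation.
        s = zero(T)
        c = zero(T)
        for xi in x
            y = xi - c
            t = s + y
            c = (t - s) - y
            s = t
        end
        return s
    elseif alg === :pairwise
        # Recursive pairwise summation on views (no copying).
        n = length(x)
        n == 0 && return zero(T)
        n == 1 && return first(x)
        f = firstindex(x)
        h = div(n, 2)
        return mysum(view(x, f:f+h-1); alg = :pairwise) +
               mysum(view(x, f+h:lastindex(x)); alg = :pairwise)
    elseif alg === :sorted
        # Add terms in order of increasing magnitude.
        return foldl(+, sort(x; by = abs); init = zero(T))
    else
        throw(ArgumentError("unknown summation algorithm: $alg"))
    end
end
```

Sorting by magnitude helps when small terms would otherwise be absorbed by a large partial sum: with `y = [1.0; fill(1e-16, 10)]`, `mysum(y)` returns `1.0`, while `mysum(y; alg = :sorted)` returns a value greater than `1.0`. The price is a copy and an O(n log n) sort.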

Contributor

JeffreySarnoff commented Jul 5, 2012

sorry about that -- this is the part that matters

# bettersum.jl
#
# bettersum(Vector{Float64}) is more accurate and faster than kahansum()
#
# Jeffrey Sarnoff on 2012-Jul-05



# Kahan's compensated summation
# W. Kahan.
# Further remarks on reducing truncation errors.
# Comm. ACM, 8:40, 1965


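function kahansum(x)
    n = length(x)
    if n == 0
        return zero(eltype(x))
    end

    s = x[1]              # running sum
    c = zero(s)           # running compensation for lost low-order bits
    for i in 2:n
        y = x[i] - c      # apply the correction from the previous step
        t = s + y         # low-order bits of y may be lost here
        c = (t - s) - y   # recover (the negative of) what was lost
        s = t
    end
    s
end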


# Kahan and Babuska summation, Neumaier variant
# A. Neumaier.
# Rundungsfehleranalyse einiger Verfahren zur Summation endlicher Summen.
# Z. Angew. Math. Mech., 54:39–51, 1974.

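function bettersum(x)
    n = length(x)
    if n == 0
        return zero(eltype(x))
    end

    s = x[1]              # running sum
    c = zero(s)           # accumulated compensation, added back at the end
    for i in 2:n
        t = s + x[i]
        if abs(s) >= abs(x[i])
            c += (s - t) + x[i]   # low-order bits of x[i] were lost
        else
            c += (x[i] - t) + s   # low-order bits of s were lost
        end
        s = t
    end

    s + c
end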


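# The test vector is Tim Peters';
# its true sum is 2_000.0.

vec = [1, 1e100, 1, -1e100] * 1000

const kbnsum = bettersum   # KBN (Kahan-Babuska-Neumaier) is the usual name

sum(vec)      == 0.0       # the large terms cancel and both 1000s are lost
kahansum(vec) == 0.0       # Kahan's correction does not recover them either
kbnsum(vec)   == 2_000.0   # Neumaier's variant returns the true sum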


Owner

JeffBezanson commented Jul 6, 2012

Is kbnsum always better? Maybe we should use this by default for float arrays.

Contributor

JeffreySarnoff commented Jul 6, 2012

kbnsum (implemented above as bettersum, and called kbnsum, the truer name, in the test)
is never less accurate than kahansum. On long vectors with elements of similar magnitude,
the two approaches often give the same result. Whether it is due to LLVM's JIT alone or in
combination with my hardware, kbnsum runs 4 times faster than kahansum on both small and
large vectors. Relative to sum, kahansum takes about 28 times as long and kbnsum about
7 times as long.

I recommend using kbnsum for Julia until there is a compelling reason to use a different
algorithm. Some of the alternatives are best used only for vectors longer than some n.
There are not that many alternatives, and, for me, part of getting comfortable with a new
programming language is porting or implementing better numerics. Given Julia's nature,
that ground is likely to get covered. Kbnsum has the virtue of being straightforward;
the alternatives involve more lines of code. I have used them elsewhere but have not
coded them in Julia (yet). If I find something notably better, you will hear about it.
Meanwhile, and perhaps for a long while, kbnsum will serve you well when others test
against languages that use kahansum internally.

kmsquire added a commit to kmsquire/julia that referenced this issue Jul 11, 2012

use K-B-N summation for float arrays, with thanks to @JeffreySarnoff
written carefully it is no more than 20% slower
closes #199
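The commit message does not show how the code was written. As one illustration of what "written carefully" might mean, here is a KBN loop with type-stable accumulators, `@inbounds`, and a branch-free update via `ifelse`. `kbn_sum` is a hypothetical name, and no speed claim is made for this sketch:

```julia
# Kahan-Babuska-Neumaier summation with type-stable accumulators and a
# branch-free compensation update. Hypothetical name, not Base API.
function kbn_sum(x::AbstractArray{T}) where {T<:AbstractFloat}
    s = zero(T)   # running sum
    c = zero(T)   # running compensation, added back at the end
    @inbounds for i in eachindex(x)
        xi = x[i]
        t = s + xi
        # Neumaier's correction: recover the low-order bits lost from
        # whichever operand has the smaller magnitude. `ifelse` evaluates
        # both candidates, which avoids a data-dependent branch.
        c += ifelse(abs(s) >= abs(xi), (s - t) + xi, (xi - t) + s)
        s = t
    end
    return s + c
end
```

On Tim Peters' vector, `kbn_sum([1.0, 1e100, 1.0, -1e100] .* 1000)` returns `2000.0`, the same as `bettersum`.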
Owner

ViralBShah commented Apr 18, 2013

Now that we have optional arguments, perhaps sum and cumsum can have an option for KBN summation, and we can remove sum_kbn and cumsum_kbn.
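For reference, a compensated cumulative sum applies the same KBN update and records the corrected partial sum at each step. A sketch with the hypothetical name `cumsum_kbn_sketch` (this is not Base's `cumsum_kbn`):

```julia
# Cumulative Kahan-Babuska-Neumaier summation: element i of the result is
# the compensated sum of the first i elements of x.
function cumsum_kbn_sketch(x::AbstractVector{T}) where {T<:AbstractFloat}
    out = similar(x)
    s = zero(T)   # running sum
    c = zero(T)   # running compensation
    for i in eachindex(x)
        xi = x[i]
        t = s + xi
        c += abs(s) >= abs(xi) ? (s - t) + xi : (xi - t) + s
        s = t
        out[i] = s + c    # report the corrected partial sum
    end
    return out
end
```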

Contributor

JeffreySarnoff commented Apr 18, 2013

If so, with sum(..., kbn=false) as the default, one should be able to override
the default and use KBN summation everywhere throughout a third-party package
on one day, and use, say, the package's default summation on another day, with
a single override (without requiring each call to sum or cumsum to be changed).


Contributor

JeffreySarnoff commented Apr 18, 2013

e.g.

pkg.Require("CumsumAnalysis", kbn=true)
passes the package-level option kbn=true to CumsumAnalysis, and CumsumAnalysis, if it
requires other packages, passes that package-level option on to them by default.
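One way to get a single override without touching each call site is a module-level default that the `kbn` keyword reads at call time. This is only a sketch: `SumConfig`, `USE_KBN`, `kbn_sum_impl`, and `mysum` are hypothetical, and it is a single global switch rather than the per-package option passed through `pkg.Require` proposed above:

```julia
# Hypothetical module-level default that every call site picks up.
module SumConfig

# Global default; flip it with `SumConfig.USE_KBN[] = true`.
const USE_KBN = Ref(false)

# Kahan-Babuska-Neumaier summation.
function kbn_sum_impl(x)
    s = zero(eltype(x))
    c = zero(eltype(x))
    for xi in x
        t = s + xi
        c += abs(s) >= abs(xi) ? (s - t) + xi : (xi - t) + s
        s = t
    end
    return s + c
end

# The `kbn` keyword defaults to the module-level setting at call time.
mysum(x; kbn::Bool = USE_KBN[]) = kbn ? kbn_sum_impl(x) : sum(x)

end # module
```

Because Julia evaluates keyword defaults on every call, setting `SumConfig.USE_KBN[] = true` switches every `mysum(x)` call to KBN, while an explicit `kbn = false` still wins at an individual call site.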

Owner

stevengj commented Aug 14, 2013

PS. For interested parties on this thread, note that we now use pairwise summation (#4039), which is often surprisingly close to Kahan summation for large arrays, but without the performance penalty.
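Recursing all the way down to single elements is slow, so practical pairwise summation typically switches to a plain loop below some block size. A sketch under that assumption; the name `blocked_pairwise_sum` and the default block size of 1024 are choices made for this example, not a description of the code in #4039:

```julia
# Pairwise summation with a sequential base case: ranges of at most
# `blocksize` elements are added left to right; longer ranges are split
# in half, each half is summed recursively, and the two results are added.
function blocked_pairwise_sum(x::AbstractVector{T}, lo::Int, hi::Int;
                              blocksize::Int = 1024) where {T<:AbstractFloat}
    if hi - lo + 1 <= max(blocksize, 1)
        s = zero(T)
        @inbounds for i in lo:hi
            s += x[i]
        end
        return s
    end
    mid = (lo + hi) >>> 1
    return blocked_pairwise_sum(x, lo, mid; blocksize = blocksize) +
           blocked_pairwise_sum(x, mid + 1, hi; blocksize = blocksize)
end

# Sum the whole vector.
blocked_pairwise_sum(x::AbstractVector; kw...) =
    blocked_pairwise_sum(x, firstindex(x), lastindex(x); kw...)
```

On `x = fill(0.1, 10^6)`, the result lands closer to `1e5` than a left-to-right `foldl(+, x)`, while doing essentially the same number of additions as the naive loop.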

StefanKarpinski pushed a commit that referenced this issue Feb 8, 2018

Merge pull request #199 from JuliaLang/tk/dontexportstring
Don't export String since it is already exported by Base