Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow to specify the result type of a sum #5311

Closed
lindahua opened this issue Jan 5, 2014 · 17 comments
Closed

Allow to specify the result type of a sum #5311

lindahua opened this issue Jan 5, 2014 · 17 comments

Comments

@lindahua
Copy link
Contributor

lindahua commented Jan 5, 2014

Currently, the sum function use the element type (except for the boolean case) of the input array as result type. This is sometimes not a desirable choice:

function mysum(R, x)
    s = zero(R)
    for v in x; s += v; end
    return s
end

julia> x = uint8(rand(1:10, 50));

julia> sum(x)
0x0000000000000122

# to prevent overflow
julia> mysum(Int, x)
290

julia> a = rand(Float32, 10^6);

julia> sum(float64(a)) - sum(a)
-0.018809685949236155

# to get higher accuracy
julia> sum(float64(a)) - mysum(0.0, a)
1.1641532182693481e-10

My proposal is to add a method like sum(R, x) where R is the user-specifed type. This won't break any existing codes, but provide the user additional choices.

@lindahua
Copy link
Contributor Author

lindahua commented Jan 5, 2014

I can make a PR to implement this if people agree that this is useful.

@nalimilan
Copy link
Member

Interesting. And how about (in addition to adding your proposed function) defaulting to Int (resp. Float,Uint) for all shorter signed integer (resp. unsigned integer, float) types? My reasoning is that you may want to store a large vector of values in e.g. a Uint8 to save space, because you know the values are small, but typically if you sum them they won't hold in a Uint8. If you really want to save memory, use the detailed function. Just a thought.

@lindahua
Copy link
Contributor Author

lindahua commented Jan 5, 2014

I agree with using a more sensible choice in determining the default type.

@ivarne
Copy link
Sponsor Member

ivarne commented Jan 5, 2014

This seems like a very natural extension of the current functionality. I have some problem to understand a few gotchas.

  1. Currently int8(100) + int8(100) === int64(200) because (int8, int8) gets promoted to Int64/Int32 depending on the system. This feels wrong to me, but I know someone raised it as an issue before and it was not changed. Because of this, the stock sum function gives does not overflow in your first example and the only difference is that Unsigned Ints prints in hex. Your mysum is also sometimes wrong and type unstable because of promotion rules and I can not see how to fix that without adding typecasts after every computation.
  2. This would make the current behavior for reduce with an initial vector v0 even more confusing, because that is only a alias for reduce(op, v0, itr) = op(v0, reduce(op, itr)), and the type of v0 does not affect the rounding and overflow in the computation. Maybe we should deprecate reduce with initial value?

@lindahua
Copy link
Contributor Author

lindahua commented Jan 5, 2014

The correct implementation of mysum should be

function mysum(R, x)
    s = zero(R)
    for v in x
         s = convert(R, s + v)
    end
    return s
end

When s + v is of the type R, the convert operation is no-op, and will not cause performance problems. This, I believe, is the case where this function is intended to help.

@lindahua
Copy link
Contributor Author

lindahua commented Jan 5, 2014

I agree that the current implementation of reduce is not perfect -- but that should be a different issue.

@ivarne
Copy link
Sponsor Member

ivarne commented Jan 5, 2014

Yes it is kind of a different issue, but sum is mentioned as a special case of reduce in the documentation, so they are definitely related. Your new mysum would could also easily be used for general reduction also if op was not hard-coded as +, and s was initialized to the first element in x.

@lindahua
Copy link
Contributor Author

lindahua commented Jan 5, 2014

Currently, sum is implemented separately (like other reduction functions maximum, minimum, all, any). Their implementation is not based on reduce, which is simply too slow (without the compiler's support on inlining input functions).

Also, sum over floating point arrays are using pairwise summation, maximum & minimum using special mechanism to deal with NaN at the beginning, and all & any using shortcut. These special mechanisms can not be easily subsumed under a reduce function. From a practical standpoint, reduce should only be used when you want to apply a special operation that none of the builtin reduction functions support.

@ivarne
Copy link
Sponsor Member

ivarne commented Jan 5, 2014

My point was that these functions are closely related, and if one want to change the interface, all the interfaces should be considered together.

I have some understanding of the implementation details, and did NOT intend to suggest that the other reduction functions should be implemented using our reduce implementation. Standard library functions should use every trick possible to get the correct result in the most efficient way.

@lindahua
Copy link
Contributor Author

lindahua commented Jan 5, 2014

I agree that the implementation of reduce(op, v0, itr) needs improvement. I can revise this reduce method to ensure its behavior is consistent with sum.

@carlobaldassi
Copy link
Member

On the subject of making reduce and sum consistent, I'll take the chance to point out the following, which I recently found out: having a Vector{Vector{Float64}} with elements all of the same size, e.g. Ws=Vector{Float64}[[1.0,2.0,3.0], [3.0,2.0,0.5]], the following used to work in julia 0.2:

W = sum(Ws)

but it now fails because zero{Vector{Float64}} is not defined (and understandably so); instead, this works fine at the moment:

W = reduce(+, Ws)

I'm not sure if the former should actually work or not, but if the documentation says that the two should be equivalent, it should.

BTW the error for the sum(Ws) form is rather cryptic:

julia> sum(Ws)
ERROR: no method convert(Type{Array{Float64,1}}, Int64)
 in sum_seq at abstractarray.jl:1445
 in sum_pairwise at abstractarray.jl:1505
 in sum at abstractarray.jl:1512

@carlobaldassi
Copy link
Member

Note: the issue in the previous comment about sum(x) != reduce(+,x) was fixed by @lindahua in #5318.

@oscardssmith
Copy link
Member

oscardssmith commented Oct 30, 2016

Given the existence of #18423, should this be closed?

@kmsquire
Copy link
Member

@oscardssmith typically issues are closed only after relevant pull requests are merged.

@yuyichao
Copy link
Contributor

Or when the two issues are duplicate of each other (or when one covers the other one). These two issues seems to be closely related but are not quite the same. (A fix should probably take both into account though.)

@martinholters
Copy link
Member

The original issue seems to be solved, sum(OutputType, x) does work:

julia> a = rand(Float32, 10^6);

julia> sum(Float64.(a)) - sum(a)
-0.020957350730895996

julia> sum(Float64.(a)) - sum(Float64,a)
0.0

Close or change title?

PS: I haven't verified there is a test for this; we might want to do so before closing.

@KristofferC
Copy link
Sponsor Member

Works now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants