Provide method for `sum` that accepts a neutral element #18423

eschnett · 2016-09-09T14:56:45Z

Currently, the expression sum([]) leads to an error, since there is no way to know the intended neutral element ("zero") for an array of type Array{Any}. I suggest to add a method to sum (or to provide a new function sum0) that accepts such a zero element. This would correspond to the existing method reduce(op, v0, itr).

The implementation would be straightforward:

function sum0(itr, v0)
    isempty(itr) && return v0
    sum(itr)
end

Equivalent methods should be added for prod, minimum, and maximum as well.

Whether we can use the name sum for this, or need to use a new name sum0, depends on what methods currently exist for sum and whether there would be a conflict.

The text was updated successfully, but these errors were encountered:

timholy · 2016-09-09T15:21:26Z

What's wrong with calling it reduce(+, itr, v0)?

eschnett · 2016-09-09T16:11:09Z

There's a discussion elsewhere where people complain about / are puzzled about / don't like that sum([]) doesn't return 1, or true, or a similar "generic" value for zero. If people feel strongly enough about sum doing the right thing, then such a method would help.

Otherwise, we need to update the documentation of sum pointing out that one either needs to ensure the collection is not empty, or needs to ensure that it has an eltype that isn't Any. Otherwise, one needs to fall back to using reduce(+) instead. And since there is already a function sum that doesn't do much more than call reduce(+), I'd argue that our beginner-friendliness needs to be extended here.

StefanKarpinski · 2016-09-09T18:52:37Z

We could potentially handle this the same way we handle inferring the element type of empty comprehensions. Of course, that approach has its own predictability issues, but in the future, if we decide that the approach is acceptable we could handle empty reductions the same way – i.e. by calling zero(T) or one(T) when the element type of the result can be inferred as concrete.

stevengj · 2016-09-09T19:39:48Z

@StefanKarpinski, we already call zero(T) in sum for an empty Array{T}. The problem is what to do with T=Any and similar cases where it is impossible to define zero in a completely general way.

I'm inclined to think this is not a practical problem. If you are summing an Any array, and it throws an exception on the case of an empty array, what should you do:

Realize that you should probably be using a different type of array?
Call reduce(+, itr, v0)?
Call some special form of sum that lets you supply an extra element?

In 95% of such cases, you should probably be using a more concretely typed array. In the remaining 5% of cases, I don't see the problem with calling reduce. We could add a line to the sum documentation mentioning this possibility.

Adding an extra optional argument to sum won't really solve anything because it will be optional. The people who are seeing exceptions now for sum([]) are precisely the people who wouldn't supply the optional argument to sum until after they encounter the exception, at which point they might as well change the array type or call reduce.

simonster · 2016-09-09T20:28:18Z

See also #5311

ararslan · 2016-09-09T22:12:48Z

See #18367 for some relevant discussion as well

TotalVerb · 2016-09-09T22:39:09Z

It's possible to just call reduce(+, itr, v0), and this is the workaround that made its way into Iterators.jl. But I still don't understand what's desirable about throwing an error as opposed to returning an additive identity of the simplest possible type. Isn't the whole point of providing sum, prod, etc. to provide a better interface?

sum is defined as a generic function — it's not just const sum = xs -> reduce(+, xs). It comes with features like finding a meaningful zero type, which reduce can't do. So why does it just give up on the empty case? Sure, there's no "universal zero" (although there's nothing preventing that from being added, as @andyferris has suggested). But there's a value that's good enough as universal zero in 95% of cases: currently, false.

There is no precedent for "if this doesn't hold in full generality, bail out". Imagine if vect did this:

julia> Base.vect(1, 2)
2-element Array{Int64,1}:
 1
 2

julia> Base.vect(1.0, 2.0)
2-element Array{Float64,1}:
 1.0
 2.0

julia> Base.vect()
ERROR: Cannot figure out type.

That's ridiculous, so Base.vect() picks the array type most likely to work in most situations — Any.

Same goes for Base.map():

julia> map(x -> 0, 1)
0

julia> map(x -> 0, [1])
1-element Array{Int64,1}:
 0

julia> map(() -> 0)
0

There's no correct return type here either. But a scalar is the sane choice, because it will work properly in most cases. The remaining 5% of cases isn't worth throwing an error for.

Similarly, I think Base.sum([]) and Base.prod([]) ought to pick the value that will work in most cases. Since it's an Any array, there's no chance of its return value being type stable, so there's no disadvantage to picking something that will promote to any type: Bool at present, possibly UInt1 or Zero if desired.

@stevengj I disagree that it's not a practical problem; it's shown up in Iterators.jl, so clearly it's not merely academic. I agree with your point that it's easy to work around — reduce(+, itr, v0) works as well — but I'm simply not convinced that this workaround is necessary. Why can't sum just do the right thing?

stevengj · 2016-09-10T00:05:30Z

Because the empty sum has a well defined answer — the additive identity — but there is no single additive identity that works for any type defining +, hence there is no possible correct answer for sum(Any[]). If there were some possible correct answer, even if there were multiple choices, I wouldn't mind us picking one arbitrarily.

In contrast, if Base.vect(args...) is defined as a container that holds all of its arguments Base.vect() --> Any[] conforms with that definition, hence it is a possible correct answer.

Similarly, map(f) -> f() is consistent with the definition of map. It is true that there are other possible consistent definitions, but when have multiple correct choices I don't think it's a problem to pick the one must likely to be useful.

TotalVerb · 2016-09-10T00:09:40Z

I disagree that false is not a correct answer. Sure, it is not always correct, but it is consistent with the definition of sum the same way map(f) -> f() is consistent with the definition of map. Why does it have to work for any type defining +? false works for a large number of types; certainly more types than MethodError does. If we want it to work on all types, then something like a singleton ZeroType might make sense.

TotalVerb · 2016-09-10T00:14:37Z

This issue has been marked with the doc tag. If we aren't going to fix sum and prod to work properly, then I would be fine with including a warning of that in the documentation, recommending reduce(+, itr, false) instead when itr does not have concrete eltype. I don't think a neutral element is necessary.

stevengj · 2016-09-10T01:37:08Z

The definition for sum of an empty collection of type T is the additive identity of type T. Since there is no universal additive identity of type Any, then sum must throw an error. false is never the correct answer to the question "what is the additive identity for Any (i.e. for all types)."

The documentation should definitely suggest reduce(+, itr, false) if one must use a collection of a type that does not have a zero, or alternatively using a more strongly typed collection.

However, the type doesn't have to be concrete. It would be perfectly reasonable to define zero(::Type{Number}) = false, for example.

stevengj · 2016-09-10T01:42:01Z

(Currently, zero(Number) --> Int(0), which is questionable, although this is an additive identity in the sense of == if not isequal.)

TotalVerb · 2016-09-10T02:29:34Z

That sounds like a fair solution to me. 0 being an additive identity for zero(Number) is also not true in general, at least in package land: Unitful.jl makes dimensioned quantities subtype Number also. (Though, arguably, this is the fault of Unitful — Base.Dates does not do this.)

ajkeller34 · 2016-09-10T03:48:01Z

To follow up on the previous comment, I'm still open to changing Unitful, ideally before I register it as a package and people start relying on it to behave a certain way. I had a productive discussion with @timholy about related issues here: PainterQubits/Unitful.jl#9. To avoid derailing this thread, I'm happy to engage you there if you have thoughts specific to whether or not physical quantities should be a subtype of Number.

andyferris · 2016-09-11T13:23:06Z

Because the empty sum has a well defined answer — the additive identity — but there is no single additive identity that works for any type defining +, hence there is no possible correct answer for sum(Any[]). If there were some possible correct answer, even if there were multiple choices, I wouldn't mind us picking one arbitrarily.

Since I've been mentioned here, I'll respond to this (highlighted by me) comment by @stevengj by saying that there seems to be an underlying demand for having access to an abstract additive identity. The fact that there isn't one suggests that we should really consider adding one (or just use the zero function itself), along with generic definitions like +(::typeof{zero}, x) = x. The advantages would be twofold: you can use it in cases like Vector{Any} where code isn't type stable so the abstract type won't slow you down but will give you a correct answer for sum([]), or you can use it in type-stable code as an optimization to remove useless operations (if you don't want to rely on LLVM to maybe do that for you). One could further define other conveniences along the lines of colon(a,b) = a:one:b. If you wanted to double-down on abstractions, perhaps "zero" as Identity{+} and "one" as Identity{*} could be similarly useful.

To me, the downside is there is no limit to where the abstractions end. E.g. we could make assumptions like + is commutative and * may or may not be (this thinking already pervades Julia, e.g. "taking string concatenation seriously" #11030). But then what? Implement the distributive law as a default fallback? And so on and so on...

felixrehren · 2017-02-10T17:13:52Z

Going back to the feasibility of the opening suggestion: I would love it, because it is being grok-able, type-stable and work for empties, but there are practical problems to making the nicest form of this syntax available. For me, a proposal for sum(xs, zero) should address this design question:

julia> sum([i^2 for i in 1:5], 0)
ERROR: ArgumentError: dimension must be = 1, got 0
julia> sum(i^2 for i in 1:5, 0)
ERROR: syntax: invalid iteration specification

summing over Vectors clashes with the second argument giving the dimension, one has to write sum([i^2 for i in 1:5], 1, 0) instead
summing over generators clashes with the compact notation for nested for loops, where the comma leads the syntax parser expects something like for i in 1:5, j in ... instead of a second argument for sum. This can be avoided by sum((i^2 for i in 1:5), 0), but we don't usually force parenthesis for generators.

So if the sum(xs, zero) would force ugly notation in most(?) of the useful cases anyway, you might as well give up and write reduce(+, zero, xs) -- unfortunately. Or is there another way?

Looking at #5311, I realised that the notation sum(T::Type, xs) can be extended to achieve most of my goals for sum(xs, zero). It just needs this PR: #20561. Could that functionality be an adequate replacement for the sum(xs, zero) proposed here?

KristofferC · 2020-06-23T11:02:22Z

#36188

stevengj added the domain:docs This change adds or pertains to documentation label Sep 9, 2016

oscardssmith mentioned this issue Oct 30, 2016

Allow to specify the result type of a sum #5311

Closed

StefanKarpinski mentioned this issue Feb 10, 2017

stop being clever with reduction result types #20560

Closed

martinholters mentioned this issue Feb 13, 2017

give sum(T::Type, xs) = zero(T) when xs is empy #20561

Closed

KristofferC closed this as completed Jun 23, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Provide method for `sum` that accepts a neutral element #18423

Provide method for `sum` that accepts a neutral element #18423

eschnett commented Sep 9, 2016

timholy commented Sep 9, 2016

eschnett commented Sep 9, 2016

StefanKarpinski commented Sep 9, 2016

stevengj commented Sep 9, 2016 •

edited

simonster commented Sep 9, 2016

ararslan commented Sep 9, 2016

TotalVerb commented Sep 9, 2016

stevengj commented Sep 10, 2016 •

edited

TotalVerb commented Sep 10, 2016

TotalVerb commented Sep 10, 2016

stevengj commented Sep 10, 2016 •

edited

stevengj commented Sep 10, 2016

TotalVerb commented Sep 10, 2016

ajkeller34 commented Sep 10, 2016

andyferris commented Sep 11, 2016

felixrehren commented Feb 10, 2017

KristofferC commented Jun 23, 2020

Provide method for sum that accepts a neutral element #18423

Provide method for sum that accepts a neutral element #18423

Comments

eschnett commented Sep 9, 2016

timholy commented Sep 9, 2016

eschnett commented Sep 9, 2016

StefanKarpinski commented Sep 9, 2016

stevengj commented Sep 9, 2016 • edited

simonster commented Sep 9, 2016

ararslan commented Sep 9, 2016

TotalVerb commented Sep 9, 2016

stevengj commented Sep 10, 2016 • edited

TotalVerb commented Sep 10, 2016

TotalVerb commented Sep 10, 2016

stevengj commented Sep 10, 2016 • edited

stevengj commented Sep 10, 2016

TotalVerb commented Sep 10, 2016

ajkeller34 commented Sep 10, 2016

andyferris commented Sep 11, 2016

felixrehren commented Feb 10, 2017

KristofferC commented Jun 23, 2020

Provide method for `sum` that accepts a neutral element #18423

Provide method for `sum` that accepts a neutral element #18423

stevengj commented Sep 9, 2016 •

edited

stevengj commented Sep 10, 2016 •

edited

stevengj commented Sep 10, 2016 •

edited