Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sum(f, itr) when itr is an empty collection should return zero #36262

Closed
CameronBieganek opened this issue Jun 12, 2020 · 12 comments
Closed

sum(f, itr) when itr is an empty collection should return zero #36262

CameronBieganek opened this issue Jun 12, 2020 · 12 comments
Labels
fold sum, maximum, reduce, foldl, etc.

Comments

@CameronBieganek
Copy link
Contributor

The sum of an empty, typed array is zero:

julia> sum(Int[])
0

julia> sum(Bool[])
0

Broadcasting over an empty array returns an empty array:

julia> (x -> 2x).(Int[])
0-element Array{Int64,1}

julia> Int[] .== 1
0-element BitArray{1}

So, as expected, summing over the above broadcasted operations returns zero:

julia> sum((x -> 2x).(Int[]))
0

julia> sum(Int[] .== 1)
0

However, if we perform the same operations using the sum(f, itr) method, we get an ArgumentError:

julia> sum(x -> 2x, Int[])
ERROR: ArgumentError: reducing over an empty collection is not allowed
Stacktrace:
 [1] _empty_reduce_error() at ./reduce.jl:295
 [2] mapreduce_empty(::Function, ::Function, ::Type{T} where T) at ./reduce.jl:334
 [3] _mapreduce(::var"#7#8", ::typeof(Base.add_sum), ::IndexLinear, ::Array{Int64,1}) at ./reduce.jl:392
 [4] _mapreduce_dim(::Function, ::Function, ::NamedTuple{(),Tuple{}}, ::Array{Int64,1}, ::Colon) at ./reducedim.jl:312
 [5] #mapreduce#580 at ./reducedim.jl:307 [inlined]
 [6] mapreduce at ./reducedim.jl:307 [inlined]
 [7] _sum at ./reducedim.jl:657 [inlined]
 [8] #sum#584 at ./reducedim.jl:653 [inlined]
 [9] sum(::Function, ::Array{Int64,1}) at ./reducedim.jl:653
 [10] top-level scope at REPL[8]:1

julia> sum(==(1), Int[])
ERROR: ArgumentError: reducing over an empty collection is not allowed
Stacktrace:
 [1] _empty_reduce_error() at ./reduce.jl:295
 [2] mapreduce_empty(::Function, ::Function, ::Type{T} where T) at ./reduce.jl:334
 [3] _mapreduce(::Base.Fix2{typeof(==),Int64}, ::typeof(Base.add_sum), ::IndexLinear, ::Array{Int64,1}) at ./reduce.jl:392
 [4] _mapreduce_dim(::Function, ::Function, ::NamedTuple{(),Tuple{}}, ::Array{Int64,1}, ::Colon) at ./reducedim.jl:312
 [5] #mapreduce#580 at ./reducedim.jl:307 [inlined]
 [6] mapreduce at ./reducedim.jl:307 [inlined]
 [7] _sum at ./reducedim.jl:657 [inlined]
 [8] #sum#584 at ./reducedim.jl:653 [inlined]
 [9] sum(::Function, ::Array{Int64,1}) at ./reducedim.jl:653
 [10] top-level scope at REPL[9]:1
@DilumAluthge
Copy link
Member

DilumAluthge commented Jun 12, 2020

What if, for example, f(x) always returns some specific type, regardless of the type of x?

For example, suppose we define f(x) = 1.0. So f(x) always returns a Float64, regardless of the type of x.

In this example, sum(f, Int[1]) will return a Float64, and sum(f, Int[1, 2]) will return a Float64, but sum(f, Int[]) will return an Int?

Won't that mean that this method of sum is not type-stable? Because you can get two different output types for the same input types.

@CameronBieganek
Copy link
Contributor Author

CameronBieganek commented Jun 12, 2020

Well, ok. I guess I shouldn't have put zero(eltype(itr)) in the title. Ideally sum(f, itr) would return the zero of the output type of f (or more precisely, the zero of the type resulting from summing an array of the type of the output of f). Because this is inconsistent:

julia> sum((x -> 1.0).(Int[]))
0.0

julia> sum(x -> 1.0, Int[])
ERROR: ArgumentError: reducing over an empty collection is not allowed
Stacktrace:
 [1] _empty_reduce_error() at ./reduce.jl:295
 [2] mapreduce_empty(::Function, ::Function, ::Type{T} where T) at ./reduce.jl:334
 [3] _mapreduce(::var"#13#14", ::typeof(Base.add_sum), ::IndexLinear, ::Array{Int64,1}) at ./reduce.jl:392
 [4] _mapreduce_dim(::Function, ::Function, ::NamedTuple{(),Tuple{}}, ::Array{Int64,1}, ::Colon) at ./reducedim.jl:312
 [5] #mapreduce#580 at ./reducedim.jl:307 [inlined]
 [6] mapreduce at ./reducedim.jl:307 [inlined]
 [7] _sum at ./reducedim.jl:657 [inlined]
 [8] #sum#584 at ./reducedim.jl:653 [inlined]
 [9] sum(::Function, ::Array{Int64,1}) at ./reducedim.jl:653
 [10] top-level scope at REPL[28]:1

However, off the top of my head, I don't know how to implement that. Is there an internal function somewhere in Base or Core that returns the inferred output type of a function?

@CameronBieganek CameronBieganek changed the title sum(f, itr) when itr is an empty collection should return zero(eltype(itr)) sum(f, itr) when itr is an empty collection should return zero Jun 12, 2020
@CameronBieganek
Copy link
Contributor Author

CameronBieganek commented Jun 12, 2020

I can't speak to the efficiency of this, but here's an implementation that has the desired behavior:

julia> mysum(f, itr) = sum(f.(itr))
mysum (generic function with 1 method)

julia> mysum(x -> 1.0, Int[])
0.0

julia> mysum(==(1), Int[])
0

@goretkin
Copy link
Contributor

What zero should it return, when the type is not inferrable?

On the other hand:

julia> sum(Array{Union{Float64, Int64},1}())
0

So if sum(x -> x%2==0 ? 1 : 2.0, Int64[]) === 0, then that's consistent.

@tkf
Copy link
Member

tkf commented Jun 12, 2020

Defining sum(f, itr) for empty generic itr and arbitrary f is not possible in the current Julia infrastructure.

But we now have init for sum #36188. So, as of Julia 1.6, the best practice would be to use init if you want sum to work for generic possibly empty collections.

(Note: The confusion here is that broadcasting is "cheating" by using the internal inference API return_type in the empty case. The broadcasting implementation weighs more on convenience and performance than the correctness. So, I don't think we should be using the behavior of broadcasting as a reference.)

@CameronBieganek
Copy link
Contributor Author

Defining sum(f, itr) for empty generic itr and arbitrary f is not possible in the current Julia infrastructure.

Well, technically it's possible if you "cheat" and use broadcasting: mysum(f, itr) = sum(f.(itr)).

But we now have init for sum #36188. So, as of Julia 1.6, the best practice would be to use init if you want sum to work for generic possibly empty collections.

Ok. And I see that you've updated the docstrings for sum, so that helps to clarify what happens with empty collections. However, I feel that there's still a semantic difference between sum(f, itr) and sum(f.(itr)) that is not explained in the docstring.

@tkf Question:
After #36188, does sum(Int[]) still return zero, or do you have to call sum(Int[]; init=0) to not get an error?

@tkf
Copy link
Member

tkf commented Jun 13, 2020

Defining sum(f, itr) for empty generic itr and arbitrary f is not possible in the current Julia infrastructure.

Well, technically it's possible if you "cheat" and use broadcasting: mysum(f, itr) = sum(f.(itr)).

Actually, no, it's not defining any concrete API. It's exposing undefined behavior of return_type rather than throwing an exception.

However, I feel that there's still a semantic difference between sum(f, itr) and sum(f.(itr)) that is not explained in the docstring.

Personally, I think it's better to make f.(itr) "more correct" by returning Array{Union{}} when the input is empty. But I know that this is not very popular opinion.

After #36188, does sum(Int[]) still return zero, or do you have to call sum(Int[]; init=0) to not get an error?

sum(Int[]) should be fine. More generally, if you have sum(xs) with IteratorEltype(xs) isa HasEltype and zero(eltype(xs)) is defined, sum(xs) should return the zero. For example, sum(Iterators.filter(>(0), Int[])) works. If not, please open an issue.

@goretkin
Copy link
Contributor

Note: The confusion here is that broadcasting is "cheating" by using the internal inference API return_type in the empty case.

Is that the whole story?
Regarding #36262 (comment)
Broadcasting is using that internal inference API to produce the result Array{Union{Float64, Int64},1}(), but that is separate from sum(Array{Union{Float64, Int64},1}()) === 0

@tkf
Copy link
Member

tkf commented Jun 13, 2020

On the other hand:

julia> sum(Array{Union{Float64, Int64},1}())
0

So if sum(x -> x%2==0 ? 1 : 2.0, Int64[]) === 0, then that's consistent.

Actually I miss why these behaviors are inconsistent. I'd expect these behaviors if using return_type is OK.

@tkf tkf added the fold sum, maximum, reduce, foldl, etc. label Jun 13, 2020
@NightMachinery
Copy link

NightMachinery commented Aug 27, 2020

@tkf commented on Jun 13, 2020, 4:51 AM GMT+4:30:

sum(Int[]) should be fine. More generally, if you have sum(xs) with IteratorEltype(xs) isa HasEltype and zero(eltype(xs)) is defined, sum(xs) should return the zero. For example, sum(Iterators.filter(>(0), Int[])) works. If not, please open an issue.

sum(1 for i in []) doesn't work though.

I am trying to count something using a for comprehension. I tried length(nothing for i in [1,2] if i == 1), but that throws ERROR: MethodError: no method matching length(::Base.Iterators.Filter{var"#246#248",Array{Int64,1}}).

@martinholters
Copy link
Member

count(==(1), [1,2]) for that case. More generally, the recently introduced init keyword for sum will help out:

julia> sum(1 for i in []; init = 0)
0

Given the difficulty in determining which zero (i.e. of which type) to return in general for an empty input collection, I wonder whether init is sufficient to close this?

@vtjnash
Copy link
Sponsor Member

vtjnash commented Nov 23, 2021

closing since we now have init, both as an argument, and in the error message, as mentioned above:

But we now have init for sum #36188. So, as of Julia 1.6, the best practice would be to use init if you want sum to work for generic possibly empty collections

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fold sum, maximum, reduce, foldl, etc.
Projects
None yet
Development

No branches or pull requests

7 participants