(Soft-)deprecate `init` for `sum!` etc? #36266

tkf · 2020-06-13T05:27:54Z

As @timholy pointed out in #35839, the meaning of init in in-place multi-dimensional reducers like sum! and count! is incompatible with its meaning in reduce etc.

Now that sum(...; init) #36188 is merged and will ship with Julia 1.6, it's a bit concerning that init in sum!(r, A; init = false) and in sum(A; init = false) mean completely different things. I think it might make sense to deprecate or "soft-deprecate" init for sum! etc. By "soft-deprecate" I mean adding a new keyword argument that uses init as its fallback, without a depwarn. This new keyword argument would be documented as the preferred way to control the initialization of the destination array. Since sum! etc. do not actually document init, going directly to deprecation might be OK. But since this keyword argument has existed for a long time, I'm inclined to treat it as a fully-fledged public API. (This is a somewhat tricky case because it's unclear whether a performance-breaking but semantics-preserving change should be treated as non-breaking.)
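To make the mismatch concrete, here is a minimal sketch. It assumes Julia 1.6 or later for `sum(...; init)`, and it relies on the undocumented `init::Bool` keyword of `sum!` behaving as described in this thread; the variable names are only for illustration.

```julia
A = [1.0 2.0; 3.0 4.0]

# Out-of-place: `init` is a *value*. It is what an empty reduction returns
# and is expected to be a neutral element of the operation.
empty_total = sum(Float64[]; init = 0.0)
total       = sum(A; init = 0.0)

# In-place: `init` is a *Bool* saying whether `r` is zeroed before reducing.
r = [10.0 20.0]
sum!(r, A)                    # default init = true: old contents are ignored
overwritten = copy(r)

r = [10.0 20.0]
sum!(r, A; init = false)      # old contents are kept and added to
accumulated = copy(r)
```

So `init = false` is a numeric seed in the first case (`false` acts as zero) and a "do not zero the destination" switch in the second.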

For the actual name of the new keyword argument, how about sum!(r, A; fill = true) instead of sum!(r, A; init = true)?
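A rough sketch of what the soft-deprecation could look like, written as a standalone wrapper rather than a change to Base. The function name `sum_fill!` is hypothetical, and `fill` is just the keyword name floated above. The accumulate branch goes through a temporary array so that the sketch does not depend on the undocumented flag itself.

```julia
# Hypothetical wrapper: the new keyword `fill` falls back to the old `init`
# keyword, and no deprecation warning is emitted for `init`.
function sum_fill!(r::AbstractArray, A::AbstractArray;
                   init::Bool = true, fill::Bool = init)
    if fill
        sum!(r, A)                  # overwrite r with the reduction of A
    else
        r .+= sum!(similar(r), A)   # add the reduction of A onto existing r
    end
    return r
end

A = [1.0 2.0; 3.0 4.0]
default_result = sum_fill!([10.0 20.0], A)
new_keyword    = sum_fill!([10.0 20.0], A; fill = false)
old_keyword    = sum_fill!([10.0 20.0], A; init = false)
```

Existing callers that pass `init = false` keep working unchanged, while new code can use the new name.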


tkf · 2020-06-14T20:34:08Z

Looking at it again, now I'm not so sure about fill as the new keyword argument. Brainstorming some other names:

fillinit
initialize
initfirst
needinit

Alternatively, it also makes sense to rename init of reduce etc. to identity, id, neutral, or something. We need something similar to init for foldl/foldr/accumulate, though.

(cc'ing other people who might be interested in this: @nalimilan @mbauman @JeffBezanson)

mcabbott · 2020-06-14T22:07:44Z

How about sum!(r, A; keep = false) or sum!(r, A; discard = true)?

tkf · 2020-06-14T22:15:51Z

I think discard is a good candidate! I guess keep is kind of the same idea, but keeping some value only matters if the corresponding region of the input array is zero (or if NaN/missing is in the destination). I think discard is better because discarding the old values always happens.

timholy · 2020-06-14T23:41:34Z

I also like discard, great suggestion @mcabbott and thanks @tkf for spearheading this and so many other advances!

StefanKarpinski · 2020-06-15T19:24:16Z

Doesn't the keyword argument to sum! control whether the array gets filled first or not? This keyword isn't documented anywhere that I can see and even figuring out what it does requires perusing some fairly confusing code in reducedim.jl. Perhaps I'm missing something but I don't get how keep or discard are more appropriate than fill.

tkf · 2020-06-15T19:58:41Z

Right, init for sum! is not mentioned. So very strictly speaking it's not a public API. But I don't know how it's used in the wild.

In Julia 1.5, we'll have count! which mentions init:

julia/base/reducedim.jl, lines 393 to 400 in 13b07fc:

        count!([f=identity,] r, A; init=true)

    Count the number of elements in `A` for which `f` returns `true` over the
    singleton dimensions of `r`, writing the result into `r` in-place.

    If `init` is `true`, values in `r` are initialized to zero.

    !!! compat "Julia 1.5"
        inplace `count!` was added in Julia 1.5.

> I don't get how keep or discard are more appropriate than fill.

Well, I'm OK with fill, too. I thought discard was better because fill could mean filling something else (e.g., filling missing/NaN, although I think the chance of such a misunderstanding is very low). OK, maybe the same can be said of discard ("discarding" missings). Naming is damn hard....

I actually think initialize is slightly better than discard. I think it also avoids confusion with filling missing/NaN.

timholy · 2020-06-15T20:29:20Z

Another option is reset

mbauman · 2020-06-15T21:38:37Z

I've never used a bang reduction with init, and even knowing from the first post that this was a boolean flag, my first instinct was that the flag would work the opposite way from how it actually does (that is, if true, use the existing values in the output array as the reducer's init kwarg). So that's another motivation to change this.

In some senses, I think a pretty good API would be one in which we defaulted to using the values in the output array and if an init kwarg is provided we fill!(output, init) before performing the reduction. Unfortunately that's definitely and hugely breaking.

Alternatively, we could use a sentinel like nothing or undef to opt out of initialization and allow init'ing arbitrary values. Again, though, that's changing the meaning of init in the wild. It's not used much though — that's a fairly conservative search and really only flags GPUArrays (but it misses multi-line arg lists).

If we're going to use a boolean flag, reset is the best option yet. I wish we could also convey the zeroing property, but neither zero nor zero! make for good kwargs. We could alternatively define this as a function that "preprocesses" the output array, with a default of out->fill!(out, zero(eltype(out)))... but specifying preprocess=identity doesn't feel like a good way of opting out of this behavior.

StefanKarpinski · 2020-06-15T22:06:46Z

In hindsight, it seems like it would be best if the in-place accumulators were defined to always increment the target array: if you want to start at zero, pass in a zeroed array.

tkf · 2020-06-15T22:14:20Z

maximum! does something a bit different than filling the identity element (it uses the first slice of the input array). I think auto-initialization is a useful API as initial value handling can be a headache sometimes.

mbauman · 2020-06-15T22:20:05Z

> maximum! does something a bit different than filling the identity element (it uses the first slice of the input array)

You could still think of this as filling an identity element — the "maximative" identity is the typemin, yeah?

Edit: in fact, this framing would allow for a non-erroring empty reduction:

julia> sum!(fill(NaN, 1,4), ones(0,4))
1×4 Matrix{Float64}:
 0.0  0.0  0.0  0.0

julia> maximum!(fill(NaN, 1,4), ones(0,4))
ERROR: BoundsError: attempt to access 0×4 Matrix{Float64} at index [1:1, 1:4]

tkf · 2020-06-15T22:34:48Z

Well, I'm a bit ambivalent about using typemin/typemax for maximum/minimum. I think erroring out is a bit better. This is because the "semantic" domain of the function may not be captured by the type. For example, if you have

ys .= sin.(xs)
maximum(ys)

it's in some sense not desirable to get -Inf when xs is empty. It'd be nice if it's -1 but of course, that's rather impossible.
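A small sketch of the trade-off, assuming Julia 1.6 or later (where `maximum` accepts `init`). The empty reduction errors by default, and a type-based fallback can only be requested explicitly:

```julia
xs = Float64[]
ys = sin.(xs)                          # empty Vector{Float64}

# Without `init`, reducing an empty collection with `maximum` throws.
threw = try
    maximum(ys)
    false
catch
    true
end

# With an explicit `init`, the caller opts in to a type-level fallback,
# even though -Inf lies outside the range of `sin`.
fallback = maximum(ys; init = -Inf)
```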

nalimilan · 2020-06-16T07:43:47Z

> In Julia 1.5, we'll have count! which mentions init:

Should we remove the mention of that keyword argument before people start using it?

tkf · 2020-06-16T08:14:46Z

Yeah, good point. Now that everyone agrees that renaming is the way to go (and we seem to be converging on reset as the new name?), it makes sense to at least hide it in the documentation. I just opened #36305, which does that.

tkf · 2020-06-16T19:16:17Z

Is everyone OK with renaming init to reset?

StefanKarpinski · 2020-06-16T19:22:50Z

Can someone write a sentence explaining what the artist formerly known as init does?

tkf · 2020-06-16T19:37:20Z

f!(r, A; init) does the following before the reduction iff init = true:

sum!, count!: fill!(r, zero(eltype(r)))
prod!: fill!(r, one(eltype(r)))
all!: fill!(r, true)
any!: fill!(r, false)
maximum!, minimum!: fill r with the first slice of A
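The list above can be checked with a short sketch. It relies on the undocumented `init::Bool` keyword as it exists in the Julia versions discussed here; the arrays are made up for illustration.

```julia
A = [1.0 2.0 3.0;
     4.0 5.0 6.0]

# init = true (the default): whatever was in r is discarded first.
r = fill(NaN, 1, 3)
sum!(r, A)
auto_filled = copy(r)

# The same result spelled out by hand: fill, then reduce without resetting.
r = fill(NaN, 1, 3)
fill!(r, zero(eltype(r)))
sum!(r, A; init = false)
manual_filled = copy(r)

# init = false: the existing values of r take part in the reduction.
p = prod!([2.0 2.0 2.0], A; init = false)      # 2 * column products
m = maximum!([10.0 0.0 0.0], A; init = false)  # max(old value, column max)
```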

mbauman · 2020-06-16T19:40:21Z

In the context of 0-element reductions:

init is used as the output. If not provided, error.

In the context of 1+-element reductions:

init is used as an argument to the first call to the reducer. If not provided and there's only one element, just return that one element itself.

In the context of bang (in-place) reductions, it's really hard. Here's the best I can do:

This flag chooses whether to use the existing values in the output as the reducer's init argument or not.
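The first two cases can be illustrated as follows. This sketch uses `foldl` rather than `reduce`, since the `reduce` docs only specify the result when `init` is a neutral element, and `add` is a made-up operator so that Base has no known identity to fall back on for empty input.

```julia
add(a, b) = a + b          # illustrative operator with no registered identity

# 0 elements: `init` is the output; without it the call errors.
empty_with_init = foldl(add, Int[]; init = 0)
empty_threw = try
    foldl(add, Int[])
    false
catch
    true
end

# 1+ elements: `init` is the first argument of the first call to the reducer.
one_elem           = foldl(add, [5])             # just the element itself
one_elem_with_init = foldl(add, [5]; init = 10)  # add(10, 5)
```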

StefanKarpinski · 2020-06-17T12:53:42Z

The first two are the ones we're keeping the init term for (and it fits), so I'm not worried about them. It's the last, tough one I'm trying to get a handle on, and I fear that reset is no clearer. That's why I wanted to play the "write a sentence that explains what it does" game: there's usually a word in that sentence that is a good choice for the argument name. One name that keeps occurring to me is initialized, with the opposite meaning of the current name: it would indicate whether the incoming argument is already initialized or not (if it isn't, then the reducer should initialize it).

tkf · 2020-06-17T19:42:20Z

I created a WIP PR for this: #36332. I updated the docstring to describe what it does. I also added a few doctest demos. I guess it'd help the game?

Here is the canonical phrase I used in the PR:

If reset is true, values in r are discarded. Otherwise, they are used as the initial value of the accumulation.

(reset is just a placeholder for now)

mbauman · 2020-06-17T20:06:02Z

One option would be to use the word init but pass either an array the same shape as out or nothing. This would allow folks to pass out itself, and we could still easily deprecate init::Bool.

tkf · 2020-06-17T20:19:06Z

Hm... I'm a bit confused. Do you mean sum!(A; out = r) instead of sum!(r, A; init = false)? What is the signature for sum!(r, A)?

mbauman · 2020-06-17T20:25:38Z

No, I mean you'd call sum!(r, A; init=r) in cases where you want to preserve the values in r as the init state of the reductions. By default we'd pass init=nothing, which would be the sentinel that flags not having an init in each reduction. You could also pass init=zeros(size(r)) or whatever to explicitly choose what values you want (without messing with the output ahead of time).

Edit: or we could default to maximum!(r, A; init = view(A, #= whatever indexing would give you the first slice in the shape of r =#))

tkf · 2020-06-17T20:33:50Z

If the out array is not the one mutated, isn't it incompatible with the name out(put)? (Or at least that's how it is used in Numpy so I was confused.)

tkf · 2020-06-17T20:36:06Z

@StefanKarpinski's initialized sounds good to me too, BTW.

mbauman · 2020-06-17T20:38:17Z

Hunh? This is just determining what the output array gets pre-filled with. E.g.,

function sum!(r, A; init=zeros(size(r)))
    if init === r
        # do nothing
    elseif init isa Bool
        # depwarn
    else
        r .= init
    end
    # now proceed with reduction in-place using the values in `r` as the first argument
    return r
end
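For anyone who wants to try the idea, here is a runnable variant of the sketch above under a different name so that it does not clash with `Base.sum!`. The name `sum_proposed!` is hypothetical, the default is the `nothing` sentinel mentioned earlier rather than `zeros(size(r))`, the `Bool` deprecation branch is left out, and the accumulation goes through a temporary array for simplicity.

```julia
# Hypothetical stand-in for the proposed `sum!(r, A; init)` signature.
function sum_proposed!(r::AbstractArray, A::AbstractArray; init = nothing)
    if init === r
        # keep the existing contents of r as the starting values
    elseif init === nothing
        fill!(r, zero(eltype(r)))   # sentinel: start from the additive identity
    else
        r .= init                   # start from caller-supplied values
    end
    r .+= sum!(similar(r), A)       # add the reduction of A onto r
    return r
end

A = [1.0 2.0; 3.0 4.0]

r = [10.0 20.0]
kept = copy(sum_proposed!(r, A; init = r))

reset_result = sum_proposed!([10.0 20.0], A)
seeded       = sum_proposed!([10.0 20.0], A; init = [100.0 200.0])
```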

tkf · 2020-06-17T20:41:41Z

Oh, I misunderstood your proposal. I thought you were suggesting to use out as the name of the keyword argument.

tkf · 2020-06-17T21:36:02Z

Thinking about the init :: AbstractArray API, I still think a flag-based interface is easier to compose. For example, I think

r = sum!(preprocess(), A; reset = false)

is much cleaner than

r = preprocess()
sum!(r, A; init = r)

tkf added the domain:fold sum, maximum, reduce, foldl, etc. label Jun 13, 2020

tkf mentioned this issue Jun 16, 2020

Remove init from count! docstring #36305

Merged

tkf linked a pull request Jun 17, 2020 that will close this issue

Introduce keyword argument reset for sum! etc. #36332

Draft

tkf mentioned this issue Jun 20, 2020

Define extrema using mapreduce; support init #36265

Closed

simeonschaub mentioned this issue Sep 8, 2020

add init argument to count #37461

Merged

mcabbott mentioned this issue Nov 21, 2020

init kwarg on sum!, any!, prod!, etc not documented #38512

Open
