Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add handling of an empty iterator for mean and var #29033

Merged
merged 5 commits into from Oct 11, 2018

Conversation

4 participants
@bkamins
Copy link
Contributor

commented Sep 4, 2018

This is a follow up of #28777 fixing what is minimally needed to be consistent in Julia 1.0 between empty arrays and empty collections (needed for skipmissing).

Major cleanup of consistency of the outputs for different scenarios of application of functions from Statistics module are left for Julia 2.0.

@@ -86,6 +86,10 @@ end
@test ismissing(mean([missing, NaN]))
@test isequal(mean([missing 1.0; 2.0 3.0], dims=1), [missing 2.0])
@test mean(skipmissing([1, missing, 2])) === 1.5
@test isequal(mean(Complex{Float64}[]), NaN+NaN*im)

This comment has been minimized.

Copy link
@nalimilan

nalimilan Sep 4, 2018

Contributor

It would also make sense to test that an empty Int input gives a Float64 output, and that an error is thrown when the element type isn't known (e.g. using a generator).

BTW, you could probably use ===.

This comment has been minimized.

Copy link
@bkamins

bkamins Sep 4, 2018

Author Contributor

This PR is in muddy waters :) You cannot use === as we have different NaN in Julia and here it would fail. Notice that:

julia> NaN === -NaN
false

and this is exactly the case here, that is why I had to use isequal.

I will add the tests.

This comment has been minimized.

Copy link
@nalimilan

nalimilan Sep 4, 2018

Contributor

I must be missing something: you really expect NaN::Float64 here, right?

This comment has been minimized.

Copy link
@bkamins

bkamins Sep 4, 2018

Author Contributor

The situation is more complex (and that is why I have used this as a test):

julia> x = mean(Complex{Float64}[])
NaN + NaN*im

julia> bitstring(x.re)
"1111111111111000000000000000000000000000000000000000000000000000"

julia> bitstring(x.im)
"1111111111111000000000000000000000000000000000000000000000000000"

julia> bitstring(NaN) # see that the bitstring is different
"0111111111111000000000000000000000000000000000000000000000000000"

also you have:

julia> x = mean(Int[])
NaN

julia> x === NaN
false

For the same reason that we have different NaNs in Julia (and as you see they actually happen in practice).

@@ -245,6 +249,9 @@ end
@test ismissing(f([missing, NaN], missing))
@test f(skipmissing([1, missing, 2]), 0) === f([1, 2], 0)
end

@test isequal(mean(Complex{Float64}[]), NaN)

This comment has been minimized.

Copy link
@nalimilan

nalimilan Sep 4, 2018

Contributor

I guess you intended to test var?

This comment has been minimized.

Copy link
@bkamins

bkamins Sep 4, 2018

Author Contributor

Sure. Fixed.

@@ -148,7 +149,8 @@ var(iterable; corrected::Bool=true, mean=nothing) = _var(iterable, corrected, me
function _var(iterable, corrected::Bool, mean)
y = iterate(iterable)
if y === nothing
throw(ArgumentError("variance of empty collection undefined: $(repr(iterable))"))
T = eltype(iterable)
return typeof((abs2(zero(T)) + abs2(zero(T)))/2)(NaN)

This comment has been minimized.

Copy link
@nalimilan

nalimilan Sep 4, 2018

Contributor

So this throws an error when calling zero if eltype returns Any. Maybe better call throw a more explicit error similar to the existing one?

Could use oftype.

This comment has been minimized.

Copy link
@bkamins

bkamins Sep 4, 2018

Author Contributor

I would leave it as is, because this is how var for AbstractArray is implemented and we want to be non-breaking (so it should have 100% the same behavior - the same outputs and the same errors - so using array or iterator is transparent).

For sure in Julia 2.0 approach we could consider cleaning it up in both places.

This comment has been minimized.

Copy link
@nalimilan

nalimilan Sep 4, 2018

Contributor

I just mean that instead of letting zero throw the error, it would be clearer to check T === Any and throw a more explicit error.

This comment has been minimized.

Copy link
@bkamins

bkamins Sep 4, 2018

Author Contributor

The problem is that T does not have to be Any. It can be an arbitrary type that is or is not a subtype of Number. And upfront we do not know if this type:

  • defines zero
  • has abs2 defined
  • implements a constructor from NaN

Any of those may or may not be true so the only way to be consistent is to repeat the same code that is used for a version of var defined for arrays (otherwise I will be able to create a situation where array and iterator var differs).

This comment has been minimized.

Copy link
@nalimilan

nalimilan Sep 4, 2018

Contributor

OK. We could still print a nice error in the most common case, but then we'd lose consistency.

This comment has been minimized.

Copy link
@bkamins

bkamins Sep 4, 2018

Author Contributor

This is the point for now. As discussed in Julia 2.0 we can probably clean it up in both methods.

This comment has been minimized.

Copy link
@TotalVerb

TotalVerb Sep 13, 2018

Contributor

personally, would use oftype as @nalimilan suggested for clarity

@bkamins bkamins force-pushed the bkamins:empty_collections branch from b7d48b9 to 79be956 Sep 4, 2018

@bkamins bkamins force-pushed the bkamins:empty_collections branch from 79be956 to 40b2ea7 Sep 4, 2018

@@ -86,6 +86,19 @@ end
@test ismissing(mean([missing, NaN]))
@test isequal(mean([missing 1.0; 2.0 3.0], dims=1), [missing 2.0])
@test mean(skipmissing([1, missing, 2])) === 1.5
@test isequal(mean(Complex{Float64}[]), NaN+NaN*im)
@test isequal(mean(skipmissing(Complex{Float64}[])), NaN+NaN*im)
@test mean(Complex{Float64}[]) isa Complex{Float64}

This comment has been minimized.

Copy link
@nalimilan

nalimilan Sep 4, 2018

Contributor

Why not also test the value? Same in other places.

This comment has been minimized.

Copy link
@bkamins

bkamins Sep 4, 2018

Author Contributor

The value is tested above in line 89 (unless you meant something else).

This comment has been minimized.

Copy link
@nalimilan

nalimilan Sep 4, 2018

Contributor

Sorry, this one is, I originally spotted mean(Int[]). It would probably be clearer to have them side by side.

This comment has been minimized.

Copy link
@bkamins

bkamins Sep 4, 2018

Author Contributor

All cleaned up now (hopefully 😄, as you have a keener eye for details 👍)

@bkamins bkamins force-pushed the bkamins:empty_collections branch from 720b43b to 9e0b0ba Sep 4, 2018

@nalimilan
Copy link
Contributor

left a comment

Thanks, looks good to me, but another review wouldn't hurt.

@bkamins

This comment has been minimized.

Copy link
Contributor Author

commented Sep 13, 2018

One CI error seems unrelated.

@StefanKarpinski

This comment has been minimized.

Copy link
Member

commented Sep 13, 2018

Who would be a good person to review?

@nalimilan

This comment has been minimized.

Copy link
Contributor

commented Sep 13, 2018

@TotalVerb
Copy link
Contributor

left a comment

overall lgtm

There's a mean(::AbstractRange{<:Real}) method on line 134. Might be worth changing that behavior too for consistency, even though ranges can't have missings.

@@ -148,7 +149,8 @@ var(iterable; corrected::Bool=true, mean=nothing) = _var(iterable, corrected, me
function _var(iterable, corrected::Bool, mean)
y = iterate(iterable)
if y === nothing
throw(ArgumentError("variance of empty collection undefined: $(repr(iterable))"))
T = eltype(iterable)
return typeof((abs2(zero(T)) + abs2(zero(T)))/2)(NaN)

This comment has been minimized.

Copy link
@TotalVerb

TotalVerb Sep 13, 2018

Contributor

personally, would use oftype as @nalimilan suggested for clarity

@bkamins

This comment has been minimized.

Copy link
Contributor Author

commented Sep 14, 2018

Thank you. Changed to oftype in both places for consistency and changed mean of empty ranges (with added tests).

@bkamins bkamins force-pushed the bkamins:empty_collections branch from 0aa0683 to 0487d17 Sep 14, 2018

@@ -131,8 +131,8 @@ mean(A::AbstractArray; dims=:) = _mean(A, dims)
_mean(A::AbstractArray{T}, region) where {T} = mean!(Base.reducedim_init(t -> t/2, +, A, region), A)
_mean(A::AbstractArray, ::Colon) = sum(A) / length(A)

function mean(r::AbstractRange{<:Real})
isempty(r) && throw(ArgumentError("mean of an empty range is undefined"))
function Statistics.mean(r::AbstractRange{<:Real})

This comment has been minimized.

Copy link
@nalimilan

nalimilan Sep 14, 2018

Contributor

Why add a module prefix?

This comment has been minimized.

Copy link
@bkamins

bkamins Sep 14, 2018

Author Contributor

Typo. Will fix.

@bkamins

This comment has been minimized.

Copy link
Contributor Author

commented Sep 28, 2018

How do we want to proceed with this PR?

@nalimilan

This comment has been minimized.

Copy link
Contributor

commented Sep 28, 2018

Just merge?

@StefanKarpinski

This comment has been minimized.

Copy link
Member

commented Oct 11, 2018

I think just go ahead with this. We can always revert it.

@nalimilan

This comment has been minimized.

Copy link
Contributor

commented Oct 11, 2018

I like when everybody approves but nobody actually merges... :-)

@nalimilan nalimilan merged commit b519147 into JuliaLang:master Oct 11, 2018

4 checks passed

ci/circleci: build-i686 Your tests passed on CircleCI!
Details
ci/circleci: build-x86_64 Your tests passed on CircleCI!
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@nalimilan nalimilan removed the triage label Oct 11, 2018

@bkamins bkamins deleted the bkamins:empty_collections branch Oct 11, 2018

fredrikekre added a commit that referenced this pull request Dec 5, 2018

Addition of NEWS and compat admonitions for important
changes between Julia 1.0 and 1.1, including:

- Custom .css-style for compat admonitions.

- Information about compat annotations to CONTRIBUTING.md.

- NEWS.md entry for PRs #30090, #30035, #30022, #29978,
  #29969, #29858, #29845, #29754, #29638, #29636, #29615,
  #29600, #29506, #29469, #29316, #29259, #29178, #29153,
  #29033, #28902, #28761, #28745, #28708, #28696, #29997,
  #28790, #29092, #29108, #29782

- Compat annotation for PRs #30090, #30013, #29978,
  #29890, #29858, #29827, #29754, #29679, #29636, #29623,
  #29600, #29440, #29316, #29259, #29178, #29157, #29153,
  #29033, #28902, #28878, #28761, #28708, #28156, #29733,
  #29670, #29997, #28790, #29092, #29108, #29782, #25278

- Documentation for broadcasting CartesianIndices (#30230).
- Documentation for Base.julia_cmd().
- Documentation for colon constructor of CartesianIndices (#29440).
- Documentation for ^(::Matrix, ::Number) and ^(::Number, ::Matrix).

- Run NEWS-update.jl.

Co-authored-by: Morten Piibeleht <morten.piibeleht@gmail.com>
Co-authored-by: Fredrik Ekre <ekrefredrik@gmail.com>

fredrikekre added a commit that referenced this pull request Dec 5, 2018

Addition of NEWS and compat admonitions for important
changes between Julia 1.0 and 1.1, including:

- Custom .css-style for compat admonitions.

- Information about compat annotations to CONTRIBUTING.md.

- NEWS.md entry for PRs #30090, #30035, #30022, #29978,
  #29969, #29858, #29845, #29754, #29638, #29636, #29615,
  #29600, #29506, #29469, #29316, #29259, #29178, #29153,
  #29033, #28902, #28761, #28745, #28708, #28696, #29997,
  #28790, #29092, #29108, #29782

- Compat annotation for PRs #30090, #30013, #29978,
  #29890, #29858, #29827, #29754, #29679, #29636, #29623,
  #29600, #29440, #29316, #29259, #29178, #29157, #29153,
  #29033, #28902, #28878, #28761, #28708, #28156, #29733,
  #29670, #29997, #28790, #29092, #29108, #29782, #25278

- Documentation for broadcasting CartesianIndices (#30230).
- Documentation for Base.julia_cmd().
- Documentation for colon constructor of CartesianIndices (#29440).
- Documentation for ^(::Matrix, ::Number) and ^(::Number, ::Matrix).

- Run NEWS-update.jl.

Co-authored-by: Morten Piibeleht <morten.piibeleht@gmail.com>
Co-authored-by: Fredrik Ekre <ekrefredrik@gmail.com>

fredrikekre added a commit that referenced this pull request Dec 5, 2018

Compat admonitions and NEWS for Julia 1.1 (#30230)
Addition of NEWS and compat admonitions for important changes between Julia 1.0 and 1.1, including:

- Custom .css-style for compat admonitions.

- Information about compat annotations to CONTRIBUTING.md.

- NEWS.md entry for PRs #30090, #30035, #30022, #29978,
  #29969, #29858, #29845, #29754, #29638, #29636, #29615,
  #29600, #29506, #29469, #29316, #29259, #29178, #29153,
  #29033, #28902, #28761, #28745, #28708, #28696, #29997,
  #28790, #29092, #29108, #29782

- Compat annotation for PRs #30090, #30013, #29978,
  #29890, #29858, #29827, #29754, #29679, #29636, #29623,
  #29600, #29440, #29316, #29259, #29178, #29157, #29153,
  #29033, #28902, #28878, #28761, #28708, #28156, #29733,
  #29670, #29997, #28790, #29092, #29108, #29782, #25278

- Documentation for broadcasting CartesianIndices (#30230).
- Documentation for Base.julia_cmd().
- Documentation for colon constructor of CartesianIndices (#29440).
- Documentation for ^(::Matrix, ::Number) and ^(::Number, ::Matrix).

- Run NEWS-update.jl.


Co-authored-by: Morten Piibeleht <morten.piibeleht@gmail.com>
Co-authored-by: Fredrik Ekre <ekrefredrik@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.