Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make PValue and TestStat non-Real + add tests with Aqua #866

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

devmotion
Copy link
Member

@devmotion devmotion commented Jun 17, 2023

Fixes #861 as suggested in #861 (comment) by making PValue and TestStat non-Real. Also adds tests for the issue (and other method ambiguities, unbound type parameters, ...) with Aqua: https://juliatesting.github.io/Aqua.jl/

The change is technically breaking but based on a search on Juliahub (https://juliahub.com/ui/Search?q=PValue&type=code&w=true and https://juliahub.com/ui/Search?q=TestStat&type=code&w=true) the two types are used by other packages only for pretty printing which is not affected by this PR. Generally, I'd argue this change is desirable regardless of whether Aqua flags these method ambiguities or not - creating new numeric types is generally quite brittle, prone to ambiguity issues, and requires implementing many methods.

Edit: This change is non-breaking - the types are neither documented nor exported.

@devmotion
Copy link
Member Author

The Documenter failure is known (see e.g. #863 (comment)) and fixed upstream.

@nalimilan
Copy link
Member

@palday You requested adding this at #668, what do you think?

Personally I'm not totally convinced that it's worth changing given that no real issue has been reported. Breaking changes should be done for strong reasons IMO.

@palday
Copy link
Member

palday commented Jun 21, 2023

I'm pretty opposed to this both because it's breaking in the abstract (and we're just in the last week or two finally seeing the necessary compat for StatsBase 0.34 propagate far enough to get compatible Makie and AlgebraOfGraphics releases) and because it's breaking in the concrete for me -- I introduced this behavior precisely because I have code that depends on it.

@devmotion
Copy link
Member Author

Can you point me to where exactly you use something that's removed in this PR? As mentioned above, based on the Juliahub search results it seems these types are only used for printing results which is not affected by this PR. All tests still pass. If there's a specific instance where the removed functionality is used, I'd happily fix these downstream issues (before the PR is merged, of course).

Personally I'm not totally convinced that it's worth changing given that no real issue has been reported.

#861 reports the issues: Even though it seems as these types support comparisons with other real numbers, this is not the case in practice - you don't even have to come up with other weird real types but just trying to compare them with values of type AbstractIrrational will throw an error:

julia> using StatsBase

julia> StatsBase.TestStat(3.0) == π
ERROR: MethodError: ==(::StatsBase.TestStat, ::Irrational{:π}) is ambiguous.

Candidates:
  ==(x::Union{StatsBase.PValue, StatsBase.TestStat}, y::Real)
    @ StatsBase ~/.julia/packages/StatsBase/iMkPf/src/statmodels.jl:90
  ==(x::Real, y::AbstractIrrational)
    @ Base irrationals.jl:90

Possible fix, define
  ==(::Union{StatsBase.PValue, StatsBase.TestStat}, ::AbstractIrrational)

Stacktrace:
 [1] top-level scope
   @ REPL[6]:1

julia> StatsBase.TestStat(complex(1.0))
ERROR: MethodError: StatsBase.TestStat(::ComplexF64) is ambiguous.

Candidates:
  (::Type{T})(z::Complex) where T<:Real
    @ Base complex.jl:44
  StatsBase.TestStat(v)
    @ StatsBase ~/.julia/packages/StatsBase/iMkPf/src/statmodels.jl:80

Possible fix, define
  StatsBase.TestStat(::Complex)

Stacktrace:
 [1] top-level scope
   @ REPL[7]:1

I've tried to work with custom Reals at some point myself but the impression I got is that it's just a bad idea in general. My feeling was that there is no way to do this in a concise way that does not lead to ambiguities and problems, even when only considering compatibility with Base - and even more so when considering other more obscure Real types. Since the main purpose of these types is printing, the effort and issues introduced by making them subtypes of Real does not seem worth it to me.

Breaking changes should be done for strong reasons IMO.

My impression was that this change is not breaking in practice. More importantly though, these types are not exported - so it's actually technically not breaking regarding how much these types are changed.

@palday
Copy link
Member

palday commented Jun 21, 2023

Can you point me to where exactly you use something that's removed in this PR? As mentioned above, based on the Juliahub search results it seems these types are only used for printing results which is not affected by this PR.

Not all code is indexed by JuliaHub or even public.

All tests still pass.

Except the deleted ones.

@devmotion
Copy link
Member Author

Not all code is indexed by JuliaHub or even public.

True, but I guess JuliaHub is the best way to judge practical implications of the PR. And based on the publicly available indexed code the PR does not seem to break anything (additionally, technically these changes are not breaking since these types are neither exported nor documented). Generally, if the PR breaks anything it is easy to fix the broken code: Operators and comparisons for Reals just have to be applied to x.v instead of x::Union{PValue,TestStat}.

Except the deleted ones.

True, I implicitly excluded those as the PR is supposed to remove the functionality they test. So rather: all printing-related tests still pass.

@palday
Copy link
Member

palday commented Jun 21, 2023

"Technically non breaking" is not great for something near the root of the dependency tree. Fixing the ambiguities here is straightforward (and certainly no more difficult than "just" replacing a bunch of calls in a different code base).

We could for example do something like:

for op in [:(==), :<, :, :(isless), :(isequal)] # isless and < to place nice with NaN
    @eval begin
        Base.$op(x::Union{TestStat, PValue}, y::T) where {T<:Real} = $op(x.v, y)
        Base.$op(y::T, x::Union{TestStat, PValue}) where {T<:Real} = $op(y, x.v)
        Base.$op(x1::Union{TestStat, PValue}, x2::Union{TestStat, PValue}) = $op(x1.v, x2.v)
    end
end

EDIT: okay, that doesn't quite work, but resolving the ambiguities is also a definitely non breaking way to fix this.

@devmotion
Copy link
Member Author

"Technically non breaking" is not great for something near the root of the dependency tree.

The types are neither exported nor documented, so IMO this PR is clearly non-breaking. I had missed this initially but corrected the OP. I don't think the number of dependencies changes this fact.

okay, that doesn't quite work, but resolving the ambiguities is also a definitely non breaking way to fix this.

Surely, the ambiguities could be fixed in a different way. But IMO the better fix is to get rid of the Real supertype. It reduces code complexity whereas other approaches would require more code and can become broken again at any time by changes in Julia itself.

@palday
Copy link
Member

palday commented Jun 22, 2023

Removing undocumented unexported "clearly" internal type aliases was viewed as breaking:

#840

@devmotion
Copy link
Member Author

To me it seems the main reason for tagging a breaking release in that case (even though it was a non-breaking change) was that it was clear from the publicly available code (e.g., by searching on JuliaHub) that this non-breaking change would break publicly available downstream packages. This is not the case here as the PR does not modify the printing functionality used by downstream packages.

@nalimilan
Copy link
Member

Is it really so hard to fix the issue reported above? It seems that we should stop using ::Union{TestStat, PValue} but instead define separate methods for each of these types?

@devmotion
Copy link
Member Author

It's still not clear to me why we should add even more code to the already lengthy implementation and spend the time and effort to try to do it in a somewhat satisfying way (which can be broken by Base at any time), when this functionality is not documented, not exported, and does not seem to be used in any of the public downstream packages?

It seems that we should stop using ::Union{TestStat, PValue} but instead define separate methods for each of these types?

That's not sufficient - we have to add e.g. all combinations of ==(::TestStat, ::SomeOtherType) etc. for all types SomeOtherType <: Real for which ==(::Real, ::SomeOtherType) is defined (since otherwise there will be a method ambiguity with ==(::TestStat, ::Real)). As an example, even after replacing Union{...} with separate definitions for PValue and TestStat the following errors:

julia> using StatsBase

julia> StatsBase.TestStat(3.0) == π
ERROR: MethodError: ==(::StatsBase.TestStat, ::Irrational{:π}) is ambiguous.

Candidates:
  ==(x::Real, y::AbstractIrrational)
    @ Base irrationals.jl:90
  ==(x::StatsBase.TestStat, y::Real)
    @ StatsBase ~/.julia/dev/StatsBase/src/statmodels.jl:91

Possible fix, define
  ==(::StatsBase.TestStat, ::AbstractIrrational)

Stacktrace:
 [1] top-level scope
   @ REPL[3]:1

@nalimilan
Copy link
Member

OK. Maybe the right solution to implement a Real is to define promote_rule and leave all other methods alone?

anyway, to be clear, the proposal I favor is to spent zero time on this by just leaving it alone until a real issue is reported. :-) Is it really the new recommended approach to check that no ambiguities exist using Aqua.jl? This seems fragile as lots of irrelevant changes in Base can break these tests.

@devmotion
Copy link
Member Author

This seems fragile as lots of irrelevant changes in Base can break these tests.

The point is that in contrast to the current master and alternative proposals this PR ensures that PValue and TestStat won't be affected by changes in Base 🙂

There are a lot of possibilities to fine-tune the Aqua tests and eg to exclude functions from the ambiguity checks (similar settings are about to be added also for the piracy checks), disable some tests completely (or for some modules or mark them as broken). So it should be easy to fix tests in case something breaks, and generally I assume it should not be more problematic than the existing ambiguity checks.

@palday
Copy link
Member

palday commented Jun 27, 2023

I still think this is technically breaking, though I've come around to the idea that PValue should not be a numeric type. We should also document this a little, even if it's just to say "this is a convenience type for display purposes and should be used to store actual p values" -- GLM.jl explicitly imports it, so there are definitely major downstream consumers.

@ararslan
Copy link
Member

Very tangential, just thinking out loud:

using Printf
using Printf: Format, format

"""
    PValueDisplay

Type used for pretty-printing p-values.

    PValueDisplay(value; limit, formatless, formatequal)

Construct a `PValueDisplay` for the given p-value `value`. When `show` is called,
the `printf`-style format string `formatless` is used if `value < limit`, otherwise
`formatequal` is used.
"""
struct PValueDisplay{T<:Real}
    value::T
    limit::T
    formatless::String
    formatequal::String

    function PValueDisplay(p; limit=0.001, formatless="<%.0e", formatequal="%.4f")
        0 <= p <= 1 || throw(ArgumentError("expected p-value 0 ≤ p ≤ 1, got $p"))
        return new{typeof(p)}(p, limit, formatless, formatequal)
    end
end

function Base.show(io::IO, p::PValueDisplay)
    if p.value < p.limit
        fmt = p.formatless
        val = p.limit
    else
        fmt = p.formatequal
        val = p.value
    end
    return format(io, Format(fmt), val)
end

The name makes its intent clearer and doesn't give the impression that number-like operations would be supported. It also provides some ability to customize since in some cases you may want to show different numbers of digits, standard or scientific notation, space after <, etc.

@devmotion
Copy link
Member Author

I guess such generalizations (and possible renaming) could be useful, but my initial feeling is that such more significant changes should maybe be left for follow-up PRs?

@palday
Copy link
Member

palday commented Jun 29, 2023

@devmotion I think that was very much the intent of @ararslan -- proposing a longer-term solution that should not be implemented in this PR but which could inform our strategy for getting this PR merged.

@ararslan
Copy link
Member

Yep, exactly. I just had some thoughts that I wanted to write down somewhere and ended up doing it here. 😅

@devmotion devmotion mentioned this pull request Aug 23, 2023
@devmotion devmotion mentioned this pull request Sep 16, 2023
@devmotion
Copy link
Member Author

So, what's the conclusion here? What's missing or to resolve?

@nalimilan
Copy link
Member

We could leave PValue as it is in StatsBase, but when importing most of StatsBase features to Statistics (JuliaStats/Statistics.jl#87), take the occasion to make it not <:Real and/or rename it to PValueDisplay.

@devmotion
Copy link
Member Author

So you'd like to close the PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Aqua.jl found ambiguities
4 participants