Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow nominal ranges consisting of different functions. #666

Merged
merged 16 commits into from
Nov 7, 2021

Conversation

davnn
Copy link
Contributor

@davnn davnn commented Oct 29, 2021

Outlier detection models might tune different score normalization functions, which is not possible currently.

@codecov-commenter
Copy link

codecov-commenter commented Oct 29, 2021

Codecov Report

Merging #666 (190dca3) into dev (4eac65c) will decrease coverage by 0.04%.
The diff coverage is 80.00%.

❗ Current head 190dca3 differs from pull request most recent head 3cdde60. Consider uploading reports for the commit 3cdde60 to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##              dev     #666      +/-   ##
==========================================
- Coverage   85.30%   85.26%   -0.05%     
==========================================
  Files          39       39              
  Lines        3417     3434      +17     
==========================================
+ Hits         2915     2928      +13     
- Misses        502      506       +4     
Impacted Files Coverage Δ
src/hyperparam/one_dimensional_ranges.jl 71.42% <77.77%> (-0.45%) ⬇️
src/utilities.jl 81.67% <81.81%> (+0.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4eac65c...3cdde60. Read the comment docs.

@ablaom
Copy link
Member

ablaom commented Oct 31, 2021

This should be possible already, as in:

julia> range(Function, :f, values=[cos, sin, tan])

An error is thrown, but only because of a bug in the display. If I suppress display, it works:

julia> r = range(Any, :f, values=[cos, sin, tan]);

julia> iterator(r)
3-element Vector{Function}:
 cos (generic function with 13 methods)
 sin (generic function with 13 methods)
 tan (generic function with 12 methods)

However, I expect your complaint is with the case where you use a model in the first argument to automatically infer the type, as in

mutable struct MyModel <: Deterministic
   f::Function
end

julia> range(MyModel(sin), :f, values=[sin, cos, tan])
ERROR: ArgumentError: `Function[sin, cos, tan]` must be an instance of type `AbstractVector{<:typeof(sin)}`.
Stacktrace:
 [1] nominal_range(T::Type, field::Symbol, values::Vector{Function})
   @ MLJBase ~/MLJ/MLJBase/src/hyperparam/one_dimensional_ranges.jl:176
 [2] range(model::MyModel, field::Symbol; values::Vector{Function}, lower::Nothing, upper::Nothing, origin::Nothing, unit::Nothing, scale::Nothing)
   @ MLJBase ~/MLJ/MLJBase/src/hyperparam/one_dimensional_ranges.jl:115
 [3] top-level scope
   @ REPL[25]:1

We have run into related issues frequently, for example when a hyperparameter has type Union{Nothing, Int} and the instance used for inferring the type has nothing but the values are Int. A more generic solution to that suggested in this PR is to get the hyperparameter type from the model type instead of model instance . I feel there must have been a reason we dismissed this, but I can't think of one now!

@davnn Thoughts?

@davnn
Copy link
Contributor Author

davnn commented Oct 31, 2021

A more generic solution to that suggested in this PR is to get the hyperparameter type from the model typeinstead of model instance .

I was actually surprised that it doesnt work that way, but yes that should be the solution if it‘s doable imho.

Edit: Should be trivial to implement https://docs.julialang.org/en/v1/devdocs/reflection/. I might get to it in a couple of days.

@davnn
Copy link
Contributor Author

davnn commented Nov 1, 2021

@ablaom awaiting your comments. We should probably add some tests ;)

@ablaom
Copy link
Member

ablaom commented Nov 1, 2021

@davnn Thanks for that. This looks good.

There could be some unexpected behaviour for the occasional field that is not typed in the struct definition (ie, is Any). Perhaps if the type is Any we should fall back to the type of the instance and issue a warning like "Inferred type of range values is Any. Using $T. To override specify type explicitly, as in range(Int64, :my_parameter, ...)."?

Also, if we have Union{Nothing, Integer}, say, then we really want to pass Integer on to the more specialised methods or get unexpected behaviour (in numeric ranges, integers imply rounding, other values do not).

What do you think?

By the way, I have in mind to factor this code out of MLJBase, in which case the first argument will always be a type (to avoid dependency on MLJ) and this "type inference" stuff will be MLJ-specific overloading of the externally defined method.

And, yes some tests would be good.

Your revisions would trigger a breaking release, which could be a week or two away yet.

If you want to kick the can on this, I'd be happy to merge your original fix immediately, but I'm also happy to support the new effort. In the latter case, please rebase onto the branch "for-0-point-19-release branch".

@davnn
Copy link
Contributor Author

davnn commented Nov 1, 2021

There could be some unexpected behaviour for the occasional field that is not typed in the struct definition (ie, is Any). Perhaps if the type is Any we should fall back to the type of the instance and issue a warning like "Inferred type of range values is Any. Using $T. To override specify type explicitly, as in range(Int64, :my_parameter, ...)."?

Sounds good, I've implemented the changes.

Also, if we have Union{Nothing, Integer}, say, then we really want to pass Integer on to the more specialised methods or get unexpected behaviour (in numeric ranges, integers imply rounding, other values do not).

My proposal didn't really address this case, but I have added it now. The problem is what should we do if we get an ambiguous union like Union{Float32, Integer, Nothing}. We could always take the first <: Real and throw a warning, that's how I've implemented it for now.

If you want to kick the can on this, I'd be happy to merge your original fix immediately, but I'm also happy to support the new effort. In the latter case, please rebase onto the branch "for-0-point-19-release branch".

Rebased.

@ablaom ablaom changed the base branch from dev to for-0-point-19-release November 1, 2021 21:06
@ablaom ablaom closed this Nov 1, 2021
@ablaom ablaom reopened this Nov 1, 2021
@ablaom
Copy link
Member

ablaom commented Nov 1, 2021

Double checking that branch: #669

@ablaom
Copy link
Member

ablaom commented Nov 1, 2021

Okay that branch is not passing tests. I'll investigate and get back to you.

@ablaom ablaom closed this Nov 1, 2021
@ablaom ablaom reopened this Nov 1, 2021
else
T = model
end
if T <: Real && values === nothing
union_types = Base.uniontypes(T)
Copy link
Member

@ablaom ablaom Nov 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmm. I suspect uniontypes is not part of the public API, right? (Although I'm very glad to learn about this method, as it's better than T.a and T.b !) Additionally, I don't think "first" is well-defined, because Union{Int,Char} == Union{Char,Int}.

How about we revert to the typeof instance in the case of ambiguous unions as well? I feel this is a corner of a corner case anyway.

Copy link
Member

@ablaom ablaom Nov 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, this is little more subtle than I thought at first. What about using Base.typesplit insead? This is used in the implementation of MIssings.nonmissingtype:

julia> Base.typesplit(Union{Nothing,Float64,Int}, Nothing)
Union{Float64, Int64}

That is, if Nothing <: T then replace T with typesplit(T, Nothing).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmm. I suspect uniontypes is not part of the public API, right? (Although I'm very glad to learn about this method, as it's better than T.a and T.b !) Additionally, I don't think "first" is well-defined, because Union{Int,Char} == Union{Char,Int}.

Nope it's not public, we could also copy the implementation (see below). It's first as in first of the uniontypes array, but we should remove that from the warning message and show the used type instead.

_uniontypes(x::Union, ts) = (_uniontypes(x.a,ts); _uniontypes(x.b,ts); ts)
_uniontypes(@nospecialize(x), ts) = (push!(ts, x); ts)
uniontypes(@nospecialize(x)) = _uniontypes(x, Any[])

Not sure how Base.typesplit would help us here? What range would you create with the resulting Union{Float64, Int64}? You'd still have to guess what type is meant or not? I mean instead of guessing we could just error when there is an ambiguity.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'd still have to guess what type is meant or not?

You are right!

I mean instead of guessing we could just error when there is an ambiguity.

Yes, let's throw error in case of ambiguity. With a suggestion to explicitly specify the type, as we already have in existing warnings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, let's throw error in case of ambiguity. With a suggestion to explicitly specify the type, as we already have in existing warnings.

Done, see tests for examples.

@ablaom
Copy link
Member

ablaom commented Nov 1, 2021

That branch is now fixed. It seems you need to make new commits to re-trigger CI.

@ablaom ablaom merged commit 191fae3 into JuliaAI:for-0-point-19-release Nov 7, 2021
@ablaom ablaom mentioned this pull request Nov 7, 2021
18 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants