Add type coercion for vectors and columns #114
Okay, great start, thanks. Can we please:
Notes: (1) This raises a new issue. The user may want conversion to a custom type
Ok, I think I addressed all comments. Thanks for pointing out tests for Regarding conversion to
Yes, that would seem to work for floats. (And I realise I have made a mistake in declaring all
However the discrete case is different, for consider this:
The vector
I believe that if you apply your coercion to it you will lose some of the levels (there are four of them), for consider this:
So, I think you need to explicitly check (using
Incidentally, some of the scitype methods changed name in MLJBase 0.1.1. In particular
Some of the nomenclature surrounding scitypes is hard to keep
julia> X = (height = Float64[183, 145, 160, 78, 182, 76],
gender = categorical([:m, :f, :f, :f, :m, :m]),
weight = Float64[92, 67, 62, 25, 80, 31],
age = Float64[53, 12, 60, 5, 31, 7],
julia> scitypes(X)
(height = Continuous,
gender = Multiclass{2},
weight = Continuous,
age = Continuous,
overall_health = FiniteOrderedFactor{3},)
The method
Oops, closed by mistake.
In addition:
* fix output type of `coerce(types, X)`
* do not convert `AbstractVector{<:Integer}` to `Vector{Int}`
* add more tests
* check if a vector already has the requested scitype in the discrete case
Discrete case fixed as suggested, thanks!
Great. One last request. If a vector has I think the preferred behaviour is for the missing values to be ignored with a warning thrown. So, for example, Start by defining new methods Sound reasonable to you?
julia> MLJ.coerce(Continuous, [4, missing, 7])
3-element Array{Union{Missing, Float64},1}:
4.0
missing
7.0
In order to show the warning, for the continuous case we can just catch arrays with missing values with:
function coerce(T::Type{Continuous}, y::AbstractVector{Union{S,Missing}}) where S
    @warn "Missing values encountered. Coerced to Union{Missing,Continuous} instead of Continuous."
    return float(y)
end
coerce(T::Type{Continuous}, y) = float(y)
[...]
julia> MLJ.coerce(Continuous, [4, missing, 7])
┌ Warning: Missing values encountered. Coerced to Union{Missing,Continuous} instead of Continuous.
└ @ MLJ ~/.julia/dev/MLJ/src/tasks.jl:36
3-element Array{Union{Missing, Float64},1}:
4.0
missing
7.0
Caveat: this is the same test for missing values used by
We can do something similar for the other cases. This isn't exactly what you suggested but gives pretty much the same result (provided that the caveat above is acceptable).
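A self-contained sketch of the dispatch approach shown above, using a locally defined `Continuous` stand-in rather than the real MLJ type (that stand-in is an assumption made so the snippet runs on its own). It also illustrates one property of dispatching on the element type that may be worth keeping in mind: a vector whose element type allows `missing` takes the warning path even when it holds no missing values.

```julia
# Stand-in for the scitype; the real one is defined in the MLJ ecosystem.
abstract type Continuous end

# Method selected when the element type is a Union with Missing.
function coerce(::Type{Continuous}, y::AbstractVector{Union{S,Missing}}) where S
    @warn "Missing values encountered. Coerced to Union{Missing,Continuous} instead of Continuous."
    return float(y)
end

# Fallback for everything else.
coerce(::Type{Continuous}, y) = float(y)

a = coerce(Continuous, [4, 5, 7])                    # fallback method, no warning
b = coerce(Continuous, [4, missing, 7])              # warning method
c = coerce(Continuous, Union{Missing,Int}[4, 5, 7])  # warning method, although no value is missing
```

Whether the third case should warn is a design choice; checking `any(ismissing, y)` inside a single method would be one alternative that warns only when a missing value is actually present.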
You raise a good point. I want
Ok, here is how I went on to deal with missing values for the Note: actually, only the last method could be kept, and all tests would pass (apart from those that check that
Nice.
Happy for the same approach to the Discrete scitypes.
There is a bug here I think (from MLJ/tasks.jl):
So there should be a default type to coerce to? Like
No. You don't touch columns that are not keys of the dictionary. That is, if :foo is a column of
Just pass a column as is when there is no coercion type in the provided dictionary.
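A minimal sketch of the pass-through behaviour requested here, assuming the table is a named tuple of column vectors as in the example earlier in the thread. The names `coerce_table` and `coerce_column`, and the local `Continuous` stand-in, are hypothetical and not the PR's actual code.

```julia
# Stand-in for the scitype; the real one is defined in the MLJ ecosystem.
abstract type Continuous end

# Hypothetical single-column coercion, covering only the continuous case.
coerce_column(::Type{Continuous}, v::AbstractVector) = float(v)

# Coerce only the columns that appear as keys of `types`;
# every other column is passed through untouched.
function coerce_table(types::AbstractDict, X::NamedTuple)
    colnames = keys(X)
    cols = map(colnames) do name
        haskey(types, name) ? coerce_column(types[name], X[name]) : X[name]
    end
    return NamedTuple{colnames}(cols)
end

X = (height = [183, 145, 160], label = ["a", "b", "c"])
Xc = coerce_table(Dict(:height => Continuous), X)
```

Here `Xc.height` is a `Vector{Float64}`, while `Xc.label` is the very same object as `X.label`, since `:label` is not a key of the dictionary.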
Ok, I see what you mean, thanks. There are quite a few tests, I hope I covered all relevant cases. BTW, it'd be useful to enable test coverage; I recommend https://coveralls.io, for which it's possible to disable annoying PR comments.
Very happy with the fix. A few fussy points:
the following would be sufficient:
And let's add
That's already the case:
Yes, good point, I just did it
Thanks. Regarding coerce to Multiclass/FiniteOrderedFactor, your methods already allow for missing values. All that's needed, I believe, is to add the warning.
Well, I was just delegating the work to
Done!
Thanks!!
Address first part of #109. I've added basic tests, but they can probably be improved.