-
Notifications
You must be signed in to change notification settings - Fork 20
Conversation
Hi! Really cool that you are doing this. I agree that we should allow for the obsdim to be specified. In fact we already have a "system" for doing so which we use at MLDataPatterns.jl (that package will be the new back-end for MLDataUtils for all data subsetting, k-folds etc). Would be cool if you could adapt the code to that "system". I describe the general way of doing it here: joshday/OnlineStats.jl#40 (comment) . For most code we allow any order array, but it would already be a big improvement to just have code for vectors an matrices edit: |
Cool. I ll definitely try to adapt to that scheme! |
somewhat like this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good! Thanks for this. Made a few comments. Other than that the main thing missing are a few tests
src/feature_scaling.jl
Outdated
@@ -1,55 +1,130 @@ | |||
""" | |||
`μ = center!(X[, μ])` | |||
`μ = center!(X, obsdim[, μ])` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
obsdim
should be depicted as the last parameter and also optional
src/feature_scaling.jl
Outdated
center!(X, mu, obsdim) | ||
end | ||
|
||
function center!{T}(X::AbstractVector{T}, bosdim::ObsDim.Constant{1}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo in bosdim
, although you could just write , ::ObsDim.Constant{1}
src/feature_scaling.jl
Outdated
X[j, i] = X[j, i] - μ[j] | ||
end | ||
function center!(X, mu; obsdim=LearnBase.default_obsdim(X)) | ||
center!(X, mu, LearnBase.obs_dim(obsdim)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the current default indentation for Julia is 4 spaces. so each indentation level should be 4 spaces more
src/feature_scaling.jl
Outdated
""" | ||
`μ, σ = rescale!(X[, μ, σ])` | ||
`μ, σ = rescale!(X, obsdim[, μ, σ])` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
obsdim
should be listed as the last parameter and also as optional
Thanks for the comments, I ll add some tests later on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm! Nice work. One last request
src/feature_scaling.jl
Outdated
for i = 1:n | ||
X[j, i] = X[j, i] - μ[j] | ||
function center!(X, μ; obsdim=LearnBase.default_obsdim(X)) | ||
center!(X, μ, LearnBase.obs_dim(obsdim)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you change
LearnBase.obs_dim(obsdim)
everywhere to
convert(LearnBase.ObsDimension, obsdim)
(I deprecated the former just a few days ago)
src/feature_scaling.jl
Outdated
@@ -127,14 +127,14 @@ function rescale!(X::AbstractMatrix, μ::AbstractVector, σ::AbstractVector, ::O | |||
μ, σ | |||
end | |||
|
|||
function rescale!(X::AbstractVector, μ::AbstractVector, σ::AbstractVector, ::ObsDim.Constant{1}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did you remove this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i figured this would include ObsDim.First() and .Last(). No?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, since X
is a vector, the fallback method you implemented transforms ObsDim.Last
to ObsDim.Constant{1}
anyway.
I think here ::ObsDim.Constant{1}
was cleaner than ::ObsDim.Constant
(which would also allow ObsDim.Constant{2}()
), because if the data X
is a vector, then 1
is the only sensible dimension to choose. i.e. I'd expect a method error if I use ObsDim.Constant{2}()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense. i ll undo my improvements :)
This is optional, but if you feel up to it, it would be cool to update the corresponding section in the documentation: https://raw.githubusercontent.com/JuliaML/MLDataUtils.jl/master/docs/data/feature.rst |
lgtm. Ready to merge when you are |
thanks! |
thanks for all the comments! also learned about singleton types in the process. :) |
I will add a |
It might be useful to also have the feature rescaling functions working on both dimensions.
Currently the tests are failing, but I figured to check with you first if you like the general idea.