Scalers #88

Open
leonardtschora opened this issue Jul 22, 2020 · 2 comments

@leonardtschora

Hi everyone, I'm starting to work with ScikitLearn in Julia.

As I understand it, a few models and tools are currently implemented in Julia, and the rest are bindings to Python functions. All of scikit-learn's functionality can be used through the `@sk_import` macro.

If I'm not mistaken, the scalers have not been ported to Julia yet.

As a quick workaround, I implemented my own scaler in Julia, and it seems to be much faster than scikit-learn's (which is why we are using Julia in the first place, right?). My scaler is far from complete (no keyword arguments, for instance), and it seems that similar scalers already exist in JuliaML.

Is there a reason why Scalers (and the question could be extended to a lot of other tools) are not currently in ScikitLearn.jl?
Thanks again for your time.

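```julia
using Statistics, ScikitLearn, ScikitLearnBase, BenchmarkTools
import ScikitLearnBase: fit!, transform, inverse_transform
@sk_import preprocessing: StandardScaler

"""
A Julia StandardScaler: centers each column to zero mean and scales it to unit
standard deviation. Standard deviations below `epsilon` are replaced by `epsilon`
to avoid dividing by (near) zero.
"""
mutable struct JStandardScaler <: BaseEstimator
    epsilon::Float64

    # Concrete field types let the compiler specialize code that reads them.
    mean_::Matrix{Float64}
    std_::Matrix{Float64}
    real_std_::Matrix{Float64}
    JStandardScaler(; epsilon=0.001) = new(epsilon)
end

function fit!(model::JStandardScaler, X, y=nothing)
    model.mean_ = mean(X, dims=1)
    # corrected=false gives the population standard deviation (divide by n),
    # which is what scikit-learn's StandardScaler uses.
    model.real_std_ = std(X, dims=1, corrected=false)
    # Floor tiny standard deviations at epsilon.
    model.std_ = map(model.real_std_) do x
        x > model.epsilon && return x
        return model.epsilon
    end
    return model
end

function transform(model::JStandardScaler, X)
    return @. (X - model.mean_) / model.std_
end

function inverse_transform(model::JStandardScaler, X)
    return @. X * model.std_ + model.mean_
end

# 10^7 rows x 12 columns of random Int64 values (about 1 GB).
n = Int(10e6)
X = rand(Int, n, 12)

# Round-trip check for the Julia scaler
julia_scaler = JStandardScaler()
fit!(julia_scaler, X)
X_ = transform(julia_scaler, X)
X__ = inverse_transform(julia_scaler, X_)
@assert isapprox(X, X__)

# Same check for scikit-learn's StandardScaler (called through PyCall)
python_scaler = StandardScaler()
fit!(python_scaler, X)
X_ = transform(python_scaler, X)
X__ = inverse_transform(python_scaler, X_)
@assert isapprox(X, X__)

# Benchmarks. Inside each block X_ is a local variable, so it is not
# interpolated with `$`: `$X_` would splice in the global X_ left over
# from the Python round trip above.
julia_scaler = JStandardScaler()
@btime begin
    fit!($julia_scaler, $X)
    X_ = transform($julia_scaler, $X)
    X__ = inverse_transform($julia_scaler, X_)
end

python_scaler = StandardScaler()
@btime begin
    fit!($python_scaler, $X)
    X_ = transform($python_scaler, $X)
    X__ = inverse_transform($python_scaler, X_)
end
```
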
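
To check the arithmetic without a Python installation, here is a minimal sketch of the same scaler that depends only on the `Statistics` standard library. The type `SimpleScaler` and the functions `fit_scaler!`, `apply_scaler` and `invert_scaler` are hypothetical names chosen for this illustration; they are not part of ScikitLearn.jl or ScikitLearnBase. It assumes the population standard deviation (`corrected = false`, as in scikit-learn's `StandardScaler`) and keeps the `epsilon` floor from `JStandardScaler`, written with `max.` instead of the `map`/`do` block (the two differ only when a standard deviation is `NaN`).

```julia
using Statistics

# Hypothetical standalone scaler (not part of ScikitLearn.jl / ScikitLearnBase).
mutable struct SimpleScaler
    epsilon::Float64
    mean_::Matrix{Float64}
    std_::Matrix{Float64}
    SimpleScaler(; epsilon = 0.001) = new(epsilon)
end

# Column means and population standard deviations, with std floored at epsilon.
function fit_scaler!(s::SimpleScaler, X::AbstractMatrix)
    s.mean_ = mean(X; dims = 1)
    s.std_ = max.(std(X; dims = 1, corrected = false), s.epsilon)
    return s
end

# Broadcasting the 1×k row matrices against the n×k data works column by column.
apply_scaler(s::SimpleScaler, X::AbstractMatrix) = (X .- s.mean_) ./ s.std_
invert_scaler(s::SimpleScaler, Z::AbstractMatrix) = Z .* s.std_ .+ s.mean_

# Usage: two random columns and one constant column.
X = hcat(randn(1_000), 5 .+ 2 .* randn(1_000), fill(3.0, 1_000))
s = fit_scaler!(SimpleScaler(), X)
Z = apply_scaler(s, X)
X_back = invert_scaler(s, Z)
```

The constant third column shows why the floor matters: its standard deviation is 0, so it is divided by `epsilon` and maps to zeros instead of `NaN`.
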
@cstjean
Owner

cstjean commented Jul 22, 2020

> Is there a reason why Scalers (and the question could be extended to a lot of other tools) are not currently in ScikitLearn.jl?

Just lack of time! If you would like to contribute them, that would be a very nice PR.

Meanwhile, as happy as I am to see interest in ScikitLearn.jl... Have you checked out MLJ.jl? It is very actively developed. Unless someone steps up to push it further, ScikitLearn.jl will continue its life as a "gateway package", easing Python users into a new ecosystem.

@leonardtschora
Author

> If you would like to contribute them, that would be a very nice PR.

I don't know yet if I will have the time to make a nicer Scaler object. For now, I just need the very basic one.

I checked out MLJ, but it seems like just another wrapper/interface for third-party machine learning packages. If it turns out to be more efficient performance-wise, I might switch to it, but for now I prefer using ScikitLearn's algorithms.

Thanks a lot for your help, I might ask new questions soon.
