Fixes to FillImputer #289

Merged
merged 3 commits into from
Aug 5, 2020

Conversation

@ablaom ablaom commented Aug 5, 2020

This PR addresses #287 and #286. To this end I have added a UnivariateFillImputer. This should make refactoring after #288 easier, or might serve as a model for implementing #288.

To do:

  • update model metadata

@ablaom ablaom merged commit a0fd450 into dev Aug 5, 2020
@ablaom ablaom mentioned this pull request Aug 5, 2020
if Missing <: elscitype(vnew)
w = copy(vnew) # transform must be non-mutating
w[ismissing.(w)] .= filler
w_tight = convert.(nonmissing(eltype(w)), w)

@ablaom I know this has already been merged. I hope you don't mind me making a slight change here.

w = copy(vnew) # transform must be non-mutating
w[ismissing.(w)] .= filler
w_tight = convert.(nonmissing(eltype(w)), w)

The code above allocates more than necessary: both w = copy(vnew) and w_tight = convert.(nonmissing(eltype(w)), w) create a new array. Since a new array must always be created anyway, because transform must be non-mutating, I think the following rewrite is slightly more efficient (any small gain in efficiency is worthwhile here, right?):

w_tight = similar(vnew, nonmissing(eltype(vnew)))
@inbounds for i in eachindex(vnew)
    w_tight[i] = ismissing(vnew[i]) ? filler : vnew[i]
end

Maybe the following contrived example can help show why?

using Random, BenchmarkTools

# Single allocation: create the tight output once, then copy or fill.
# (Despite the trailing `!`, neither function mutates its input.)
function h1!(vnew, filler)
    w = similar(vnew, nonmissingtype(eltype(vnew)))
    @inbounds for i in eachindex(vnew)
        w[i] = ismissing(vnew[i]) ? filler : vnew[i]
    end
    w
end

# Double allocation: copy, fill in place, then convert to the tight type.
function h2!(vnew, filler)
    w = copy(vnew) # transform must be non-mutating
    w[ismissing.(w)] .= filler
    w_tight = convert.(nonmissingtype(eltype(w)), w)
    w_tight
end

n = [repeat([missing],10000)..., rand(20000)...]; # array containing missing values
n1 = copy(n); # unshuffled: all missing values at the front
n2 = copy(n);
shuffle!(n);
n3 = copy(n); # shuffled: missing values scattered throughout
n4 = copy(n);
julia> @btime h1!($n1, 0);
  38.010 μs (2 allocations: 234.45 KiB)
julia> @btime h2!($n2, 0);
  144.883 μs (9 allocations: 584.45 KiB)
julia> @btime h1!($n3, 0);
  166.380 μs (2 allocations: 234.45 KiB)

julia> @btime h2!($n4, 0);
  173.802 μs (9 allocations: 584.45 KiB)

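h1! is faster than h2! in both cases: roughly 3.8 times faster on the unshuffled input (38.010 μs vs 144.883 μs) and marginally faster on the shuffled input (166.380 μs vs 173.802 μs). It also makes 2 allocations (234.45 KiB) instead of 9 (584.45 KiB), which becomes important if this code is called by other functions.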
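As a quick sanity check that the rewrite does not change behaviour, here is a minimal sketch comparing the two approaches on a small vector. The names fill_single_pass and fill_two_step are hypothetical and used only for illustration, and Base's nonmissingtype stands in for the nonmissing helper used in the merged code, on the assumption that the two are equivalent.

```julia
# Single-pass fill: allocate the tight output once, then copy or fill.
# (Hypothetical helper name, for illustration only.)
function fill_single_pass(v::AbstractVector, filler)
    w = similar(v, nonmissingtype(eltype(v)))
    @inbounds for i in eachindex(v)
        x = v[i]
        w[i] = ismissing(x) ? filler : x
    end
    return w
end

# Two-step fill, mirroring the merged code: copy, fill, then convert.
# (Hypothetical helper name, for illustration only.)
function fill_two_step(v::AbstractVector, filler)
    w = copy(v)
    w[ismissing.(w)] .= filler
    return convert.(nonmissingtype(eltype(w)), w)
end

v = [1.0, missing, 3.0, missing]
a = fill_single_pass(v, 0)
b = fill_two_step(v, 0)

a == b                    # both approaches return the same values
eltype(a) === Float64     # the element type no longer admits missing
count(ismissing, v) == 2  # the input vector is left untouched
```

Both functions leave the input unmodified, so either one should satisfy the requirement that transform be non-mutating; the single-pass version simply skips the intermediate copy.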

Yes great idea! Can you make a new PR?

And thank you for investigating.
