Use custom iterators for replace(), skip() and fail() #50

Merged: 3 commits, Oct 19, 2017

@nalimilan nalimilan commented Oct 15, 2017

This dramatically improves the performance of skip() on arrays by getting
rid of the type instability which is currently not well handled.
This optimization cannot be applied to non-array iterables since it relies
on passing indices and accessing an entry several times in some cases.
However, forcing inlining makes the code somewhat faster even for non-arrays.
Performance improvements are smaller but still significant for replace()
and skip(). There is a 2× regression when passing a generator to fail(),
but the gain for the array case is worth it.

The second advantage of using custom iterators is that eltype() returns
Nulls.T(eltype(x)) when x is an array, while when using plain generators
it returned Any.

Cf. https://discourse.julialang.org/t/nulls-skip-is-very-slow/6351. With these changes, sum(Nulls.fail(::Array{Union{Int, Null}})) and sum(Nulls.skip(::Array{Union{Int, Null}})) are about 13-25 times slower than a plain sum(::Array{Int}). But they are 8-17 times faster than sum(::Array{Union{Int, Null}}).
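
As an illustration of the custom-iterator idea described above, here is a minimal sketch written for current Julia (1.x), where `missing` and the `iterate` protocol have replaced `null` and `start`/`next`/`done`. It is not the code from this PR: the type name `SkipMissingSketch` is hypothetical, and `nonmissingtype` is assumed to play roughly the role of `Nulls.T`.

```julia
# Hypothetical wrapper type, for illustration only (not the PR's implementation).
struct SkipMissingSketch{T}
    x::T
end

# The number of non-missing entries is not known in advance.
Base.IteratorSize(::Type{<:SkipMissingSketch}) = Base.SizeUnknown()
Base.IteratorEltype(::Type{SkipMissingSketch{T}}) where {T} = Base.IteratorEltype(T)

# Unlike a plain generator, the wrapper can report a precise element type.
Base.eltype(::Type{SkipMissingSketch{T}}) where {T} = nonmissingtype(eltype(T))

function Base.iterate(itr::SkipMissingSketch, state...)
    y = iterate(itr.x, state...)
    # Advance the wrapped iterator until a non-missing value is found.
    while y !== nothing
        v, s = y
        v isa Missing || return (v, s)
        y = iterate(itr.x, s)
    end
    return nothing
end

x = Union{Int, Missing}[1, missing, 3]
sum(SkipMissingSketch(x))      # 4
eltype(SkipMissingSketch(x))   # Int
eltype(2v for v in x)          # Any: generators do not track their element type
```

The `eltype` difference on the last two lines is the second advantage mentioned in the description: downstream code such as `collect` can allocate a concretely typed container.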

Full benchmark on Julia 0.6.0:

julia> using Nulls

julia> using BenchmarkTools

julia> x = Vector{Union{Int, Null}}(rand(Int, 100_000));

julia> x[rand(1:length(x), 10_000)] = null;

julia> skip_gen(x) = sum(v for v in x if !isnull(v))
skip_gen (generic function with 1 method)

julia> skip_iter(x) = sum(Nulls.skip(x))
skip_iter (generic function with 1 method)

julia> @btime skip_gen(x);
  55.129 ms (1266013 allocations: 24.84 MiB)

julia> @btime skip_iter(x);
  288.650 μs (3 allocations: 48 bytes)

julia> @btime skip_gen(v for v in x);
  57.628 ms (1356477 allocations: 26.22 MiB)

julia> @btime skip_iter(v for v in x);
  32.116 ms (1175154 allocations: 19.31 MiB)

julia> replace_gen(x, y) = sum(ifelse(isnull(v), y, v) for v in x)
replace_gen (generic function with 1 method)

julia> replace_iter(x, y) = sum(Nulls.replace(x, y))
replace_iter (generic function with 1 method)

julia> @btime replace_gen(x, 0);
  6.732 ms (661853 allocations: 10.10 MiB)

julia> @btime replace_iter(x, 0);
  113.393 μs (3 allocations: 64 bytes)

julia> @btime replace_gen((v for v in x), 0);
  8.918 ms (842780 allocations: 12.86 MiB)

julia> @btime replace_iter((v for v in x), 0);
  1.671 ms (90471 allocations: 1.38 MiB)

julia> y = Vector{Union{Int, Null}}(rand(Int, 100_000));

julia> fail_gen(x) = sum(v !== null ? v : throw(NullException()) for v in x)
fail_gen (generic function with 1 method)

julia> fail_iter(x) = sum(Nulls.fail(x))
fail_iter (generic function with 1 method)

julia> @btime fail_gen(y);
  1.587 ms (100003 allocations: 1.53 MiB)

julia> @btime fail_iter(y);
  563.937 μs (3 allocations: 48 bytes)

julia> @btime fail_gen(v for v in y);
  3.669 ms (300004 allocations: 4.58 MiB)

julia> @btime fail_iter(v for v in y);
  7.589 ms (700001 allocations: 10.68 MiB)

# once in done() to find the next non-null entry, and once in next() to return it.
# This works around the type instability problem of the generic fallback.
@inline function _next_nonnull_ind(x::AbstractArray, s)
idx = eachindex(x)
Member Author

I couldn't find a simpler way to work with indices which would be completely generic. Linear indices would of course work, but they would be slow for LinearSlow arrays.

Member

Couldn't we dispatch and use linear indices for most arrays but a specialized implementation for LinearSlow?
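
A rough sketch of what such a dispatch could look like in current Julia, where the trait formerly called `LinearSlow` is named `IndexCartesian` and is queried with `IndexStyle`. The helper names `count_present` and `_count_present` are hypothetical and are not part of Nulls.jl; `missing` stands in for `null`.

```julia
# Hypothetical helper: count non-missing entries, choosing the index kind by trait.
count_present(x::AbstractArray) = _count_present(IndexStyle(x), x)

# Arrays with fast linear indexing: iterate over integer indices.
function _count_present(::IndexLinear, x)
    n = 0
    for i in LinearIndices(x)
        n += !(x[i] isa Missing)
    end
    return n
end

# Arrays without fast linear indexing: iterate over Cartesian indices instead.
function _count_present(::IndexCartesian, x)
    n = 0
    for I in CartesianIndices(x)
        n += !(x[I] isa Missing)
    end
    return n
end

a = Union{Int, Missing}[1 missing; 3 4]
count_present(a)                      # plain Matrix: IndexLinear path
count_present(view(a, 1:2, 2:2))      # non-contiguous view: IndexCartesian path
```

Note that `eachindex(x)` already makes this choice internally, which is presumably why the generic version in the diff relies on it.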

@nalimilan
Member Author

Any ideas about why Nulls.fail is slower with the custom iterator when passed an iterator?

@codecov-io

codecov-io commented Oct 15, 2017

Codecov Report

Merging #50 into master will increase coverage by 1.76%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master      #50      +/-   ##
==========================================
+ Coverage   92.15%   93.91%   +1.76%     
==========================================
  Files           1        1              
  Lines         102      148      +46     
==========================================
+ Hits           94      139      +45     
- Misses          8        9       +1
Impacted Files Coverage Δ
src/Nulls.jl 93.91% <100%> (+1.76%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bff939d...1633af1. Read the comment docs.

src/Nulls.jl Outdated
Union{Nulls.T(eltype(itr.x)), typeof(itr.replacement)}
@inline function Base.next(itr::EachReplaceNull, state)
v, s = next(itr.x, state)
((isnull(v) ? itr.replacement : v)::eltype(itr), s)
Member

Would ifelse in place of the explicit branch help at all here?

Member Author

Actually, it gives slightly slower performance for some obscure reason. I suspect that the consequences of type instability propagate more with ifelse. Anyway, I think the compiler is able to get rid of simple branches like this if it considers that faster. We should revisit this once the handling of Unions has improved, though.

src/Nulls.jl Outdated
Base.start(itr::EachReplaceNull) = start(itr.x)
Base.done(itr::EachReplaceNull, state) = done(itr.x, state)
Base.eltype(itr::EachReplaceNull) =
Union{Nulls.T(eltype(itr.x)), typeof(itr.replacement)}
Member

If the user is replacing null values, presumably they would expect the resulting element type to be typejoin(Nulls.T(eltype(itr.x)), typeof(itr.replacement)). Otherwise, wouldn't replace([1,null,3], 2.0) have element type Union{Int, Float64}? Seems like this might cause some slowdowns.

Member Author

Hmm, that's a subtle distinction and I never know which one is the most appropriate. Wouldn't typejoin always return an abstract type when replacement is of a different type? I'm not sure Real is better than Union{Int, Float64}. A third option is to use promotion to choose the best type, and perform conversion on the fly. I'm really not sure what's the best approach, maybe in part because I don't have a use case in mind.

Member

Oh sorry, not typejoin, I mean promote types.

Member Author

OK. Makes sense to use promote indeed. I'll do that unless somebody is of a different opinion.
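
For reference, the three candidate element types discussed in this thread can be compared directly. This small sketch uses only Base functions, with `Int` and `Float64` as in the `replace([1,null,3], 2.0)` example above.

```julia
A, B = Int, Float64

u = Union{A, B}          # small union: values keep their original types
j = typejoin(A, B)       # closest common supertype, here the abstract type Real
p = promote_type(A, B)   # common type reachable by conversion, here Float64

isabstracttype(j)        # true
isconcretetype(p)        # true
```

Only the promoted type is concrete, which is the argument for it; the cost is that every element then has to be converted on the fly.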

@nalimilan
Member Author

FWIW, for a 1M vector like in the benchmark above, both R and DataArrays (with sum(da, skipnull=true)) take 3ms to compute the sum when skipping nulls, while Vector{Union{Int, Null}} (with sum(Nulls.skip(x))) takes 8ms. So we're not that far.

Contributor

@cjprybol cjprybol left a comment

Benchmarks and tests look great, thanks Milan!

@nalimilan
Member Author

OK, I've added a commit to use promotion.

@nalimilan
Member Author

Woops, it seems to kill performance. Need to investigate why.

@nalimilan
Member Author

Unfortunately, I couldn't get the same performance as before. Calling convert on values from the input iterable is enough to make the function a lot slower. So for now I've added code so that replacement is converted to the element type of the input iterable. In most cases that's what people want anyway, and if they need something more general they should use something else like Base.replace or CategoricalArrays.recode. We should be able to revisit this when the compiler has improved.
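
The approach described here (convert the replacement once, up front, rather than converting every element) might look roughly like the following in current Julia. This is a sketch under assumptions, not the merged code: `ReplaceMissingSketch` and `replace_missing_sketch` are hypothetical names, and `missing`/`nonmissingtype` stand in for `null`/`Nulls.T`.

```julia
# Hypothetical wrapper type, for illustration only.
struct ReplaceMissingSketch{T, U}
    x::T
    replacement::U
end

# Convert the replacement a single time, to the non-missing element type of the input.
function replace_missing_sketch(x, replacement)
    T = nonmissingtype(eltype(x))
    return ReplaceMissingSketch(x, convert(T, replacement))
end

Base.IteratorSize(::Type{<:ReplaceMissingSketch}) = Base.SizeUnknown()
Base.eltype(::Type{ReplaceMissingSketch{T, U}}) where {T, U} = nonmissingtype(eltype(T))

function Base.iterate(itr::ReplaceMissingSketch, state...)
    y = iterate(itr.x, state...)
    y === nothing && return nothing
    v, s = y
    # Values from the input are passed through untouched; no per-element convert.
    if v isa Missing
        return (itr.replacement, s)
    else
        return (v, s)
    end
end

x = Union{Int, Missing}[1, missing, 3]
r = replace_missing_sketch(x, 2.0)   # 2.0 is stored as the Int 2
sum(r)                               # 6
```

One consequence of this design is that a replacement which cannot be represented in the input's element type, such as `2.5` for an `Int` array, fails at construction with an `InexactError` instead of widening the element type.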

end

"""
Nulls.fail(itr)
Member

I still don't really understand the use for this iterator/function. Can you remind me?

Member Author

Well, it can be used to be sure you don't pass an iterator containing nulls to a function which would accept it silently. It's faster than any(isnull, x) since checking happens on the fly. Currently sum(Nulls.fail(x)) is also much faster than sum(x) since there's no type instability, but with compiler improvements I guess this difference could go away.
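
To make the "checking happens on the fly" point concrete, here is a minimal sketch in current Julia. The name `FailMissingSketch` is hypothetical, `missing` stands in for `null`, and Base's `MissingException` is used where the benchmark above throws `NullException`.

```julia
# Hypothetical wrapper type, for illustration only.
struct FailMissingSketch{T}
    x::T
end

Base.IteratorSize(::Type{<:FailMissingSketch}) = Base.SizeUnknown()
Base.eltype(::Type{FailMissingSketch{T}}) where {T} = nonmissingtype(eltype(T))

function Base.iterate(itr::FailMissingSketch, state...)
    y = iterate(itr.x, state...)
    y === nothing && return nothing
    v, s = y
    # The check is part of the single pass over the data: no separate any(ismissing, x) scan.
    v === missing && throw(MissingException("missing value encountered"))
    return (v, s)
end

sum(FailMissingSketch(Union{Int, Missing}[1, 2, 3]))   # 6
# sum(FailMissingSketch(Union{Int, Missing}[1, missing, 3])) throws a MissingException
```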

src/Nulls.jl Outdated
Base.eltype(itr::EachReplaceNull) = Nulls.T(eltype(itr.x))
@inline function Base.next(itr::EachReplaceNull, state)
v, s = next(itr.x, state)
((isnull(v) ? itr.replacement : v)::eltype(itr), s)
Member

I get a slight speedup by doing

    if v isa Null
        return (itr.replacement, s)
    else
        return (v, s)
    end

Don't ask me why or even if it will work for you, but worth a try?

Member Author

Wow, indeed that makes a significant difference (I've updated the timings above). Now we're about as fast as R. It's unfortunate that it isn't extensible to custom types, but for now it's certainly worth it.

I couldn't use isa Null for fail though because of a weird codegen bug: JuliaLang/julia#24177.

@nalimilan
Member Author

Merge?
