Use custom iterators for replace(), skip() and fail() #50

nalimilan · 2017-10-15T12:39:52Z

This dramatically improves the performance of skip() on arrays by getting
rid of the type instability, which is currently not well handled.
This optimization cannot be applied to non-array iterables since it relies
on passing indices and accessing an entry several times in some cases.
However, forcing inlining makes the code somewhat faster even for non-arrays.
Performance improvements are smaller but still significant for replace()
and skip(). There is a 2× regression when passing a generator to fail(), though;
the gain in the array case is worth it.

The second advantage of using custom iterators is that eltype() returns
Nulls.T(eltype(x)) when x is an array, whereas with plain generators
it returned Any.
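A minimal sketch of the index-based idea for arrays, written for Julia 1.x rather than 0.6 (an assumption about the reader's version): there `null`/`Null` became `missing`/`Missing`, the start/next/done protocol was replaced by a single `iterate` function, and Base ships `skipmissing` for this job. `SkipMissingArray` is a hypothetical name, not the Nulls.jl type. Because `iterate` both finds and returns the next entry, each entry is read once rather than once in done() and once in next(); `nonmissingtype` plays the role of Nulls.T.

```julia
# Hypothetical array-only "skip missing" iterator (Julia 1.x sketch, not Nulls.jl code).
struct SkipMissingArray{A<:AbstractArray}
    x::A
end

# Element type with Missing removed: the Julia 1.x analogue of Nulls.T(eltype(x)).
Base.eltype(::Type{SkipMissingArray{A}}) where {A} = nonmissingtype(eltype(A))
# The number of non-missing entries is unknown without scanning the array.
Base.IteratorSize(::Type{<:SkipMissingArray}) = Base.SizeUnknown()

# The state is the state of `eachindex(x)`, so this works for both linear
# and Cartesian indexing without converting between the two.
function Base.iterate(itr::SkipMissingArray, it = iterate(eachindex(itr.x)))
    inds = eachindex(itr.x)
    while it !== nothing
        i, s = it
        v = @inbounds itr.x[i]
        # Return the first non-missing entry together with the index state
        # just past it; missing entries are skipped.
        v === missing || return (v, iterate(inds, s))
        it = iterate(inds, s)
    end
    return nothing
end
```

Usage mirrors the benchmark: `sum(SkipMissingArray(x))` for `x::Vector{Union{Int, Missing}}`.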

Cf. https://discourse.julialang.org/t/nulls-skip-is-very-slow/6351. With these changes, sum(Nulls.fail(::Array{Union{Int, Null}})) and sum(Nulls.skip(::Array{Union{Int, Null}})) are about 13-25 times slower than a plain sum(::Array{Int}), but they are 8-17 times faster than sum(::Array{Union{Int, Null}}).

Full benchmark on Julia 0.6.0:

julia> using Nulls

julia> using BenchmarkTools

julia> x = Vector{Union{Int, Null}}(rand(Int, 100_000));

julia> x[rand(1:length(x), 10_000)] = null;

julia> skip_gen(x) = sum(v for v in x if !isnull(v))
skip_gen (generic function with 1 method)

julia> skip_iter(x) = sum(Nulls.skip(x))
skip_iter (generic function with 1 method)

julia> @btime skip_gen(x);
  55.129 ms (1266013 allocations: 24.84 MiB)

julia> @btime skip_iter(x);
  288.650 μs (3 allocations: 48 bytes)

julia> @btime skip_gen(v for v in x);
  57.628 ms (1356477 allocations: 26.22 MiB)

julia> @btime skip_iter(v for v in x);
  32.116 ms (1175154 allocations: 19.31 MiB)

julia> replace_gen(x, y) = sum(ifelse(isnull(v), y, v) for v in x)
replace_gen (generic function with 1 method)

julia> replace_iter(x, y) = sum(Nulls.replace(x, y))
replace_iter (generic function with 1 method)

julia> @btime replace_gen(x, 0);
  6.732 ms (661853 allocations: 10.10 MiB)

julia> @btime replace_iter(x, 0);
  113.393 μs (3 allocations: 64 bytes)

julia> @btime replace_gen((v for v in x), 0);
  8.918 ms (842780 allocations: 12.86 MiB)

julia> @btime replace_iter((v for v in x), 0);
  1.671 ms (90471 allocations: 1.38 MiB)

julia> y = Vector{Union{Int, Null}}(rand(Int, 100_000));

julia> fail_gen(x) = sum(v !== null ? v : throw(NullException()) for v in x)
fail_gen (generic function with 1 method)

julia> fail_iter(x) = sum(Nulls.fail(x))
fail_iter (generic function with 1 method)

julia> @btime fail_gen(y);
  1.587 ms (100003 allocations: 1.53 MiB)

julia> @btime fail_iter(y);
  563.937 μs (3 allocations: 48 bytes)

julia> @btime fail_gen(v for v in y);
  3.669 ms (300004 allocations: 4.58 MiB)

julia> @btime fail_iter(v for v in y);
  7.589 ms (700001 allocations: 10.68 MiB)

nalimilan · 2017-10-15T12:45:30Z

src/Nulls.jl

+# once in done() to find the next non-null entry, and once in next() to return it.
+# This works around the type instability problem of the generic fallback.
+@inline function _next_nonnull_ind(x::AbstractArray, s)
+    idx = eachindex(x)


I couldn't find a simpler way to work with indices which would be completely generic. Linear indices would of course work, but they would be slow for LinearSlow arrays.

Couldn't we dispatch and use linear indices for most arrays but a specialized implementation for LinearSlow?
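The dispatch suggested here can be sketched with the `IndexStyle` trait; in Julia 1.x the `LinearSlow` trait mentioned above is spelled `IndexCartesian` (and `LinearFast` is `IndexLinear`). `skip_indices` is a hypothetical helper. Note that `eachindex` already performs essentially this dispatch, which is why the `eachindex(x)` call in the diff is fully generic.

```julia
# Hypothetical helper: choose the index iterator from the array's IndexStyle trait.
skip_indices(x::AbstractArray) = skip_indices(IndexStyle(x), x)

# Arrays with fast linear indexing: a plain integer range is cheapest.
skip_indices(::IndexLinear, x::AbstractArray) = firstindex(x):lastindex(x)

# Arrays without fast linear indexing (formerly LinearSlow): iterate
# Cartesian indices so no linear index has to be converted to subscripts.
skip_indices(::IndexCartesian, x::AbstractArray) = CartesianIndices(x)
```

A skip iterator would then loop over `skip_indices(x)` exactly as it loops over `eachindex(x)`.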

nalimilan · 2017-10-15T12:47:17Z

Any ideas about why Nulls.fail is slower with the custom iterator when passed an iterator?

codecov-io · 2017-10-15T12:55:51Z

Codecov Report

Merging #50 into master will increase coverage by 1.76%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master      #50      +/-   ##
==========================================
+ Coverage   92.15%   93.91%   +1.76%     
==========================================
  Files           1        1              
  Lines         102      148      +46     
==========================================
+ Hits           94      139      +45     
- Misses          8        9       +1

Impacted Files	Coverage Δ
src/Nulls.jl	`93.91% <100%> (+1.76%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bff939d...1633af1.

ararslan · 2017-10-15T18:18:01Z

src/Nulls.jl

+    Union{Nulls.T(eltype(itr.x)), typeof(itr.replacement)}
+@inline function Base.next(itr::EachReplaceNull, state)
+    v, s = next(itr.x, state)
+    ((isnull(v) ? itr.replacement : v)::eltype(itr), s)


Would ifelse in place of the explicit branch help at all here?

Actually, it is slightly slower, for some obscure reason. I suspect that the consequences of type instability propagate further with ifelse. Anyway, I think the compiler is able to get rid of simple branches like this when it considers that faster, though we should revisit this once the handling of Unions has improved.
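The two forms under discussion, side by side in Julia 1.x syntax (`missing` in place of `null`; `pick_branch` and `pick_ifelse` are hypothetical names). One plausible reason for the slowdown, offered as an assumption rather than something established in this thread: `ifelse` is an ordinary function, so both arguments are evaluated and `v` is passed with its full `Union` type, whereas the ternary lets inference narrow `v` to the non-missing type in its else branch.

```julia
# Ternary: only the chosen branch runs, and in the `else` branch inference
# knows `v` is not Missing, so the result type can be concrete.
pick_branch(v, r) = v isa Missing ? r : v

# ifelse: both `r` and `v` are evaluated before the call, and `v` is passed
# with its declared Union type; inference generally cannot narrow it here.
pick_ifelse(v, r) = ifelse(v isa Missing, r, v)

# Inspect the inferred return types on your Julia version.
@show Base.return_types(pick_branch, (Union{Int, Missing}, Int))
@show Base.return_types(pick_ifelse, (Union{Int, Missing}, Int))
```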

ararslan · 2017-10-15T18:20:38Z

src/Nulls.jl

+Base.start(itr::EachReplaceNull) = start(itr.x)
+Base.done(itr::EachReplaceNull, state) = done(itr.x, state)
+Base.eltype(itr::EachReplaceNull) =
+    Union{Nulls.T(eltype(itr.x)), typeof(itr.replacement)}


If the user is replacing null values, presumably they would expect the resulting element type to be typejoin(Nulls.T(eltype(itr.x)), typeof(itr.replacement)). Otherwise, wouldn't replace([1,null,3], 2.0) have element type Union{Int, Float64}? Seems like this might cause some slowdowns.

Hmm, that's a subtle distinction and I never know which one is the most appropriate. Wouldn't typejoin always return an abstract type when replacement is of a different type? I'm not sure Real is better than Union{Int, Float64}. A third option is to use promotion to choose the best type, and perform conversion on the fly. I'm really not sure what the best approach is, maybe in part because I don't have a use case in mind.

Oh sorry, not typejoin; I meant promote_type.

OK, using promotion makes sense indeed. I'll do that unless somebody is of a different opinion.
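The three candidate element types for the replace([1, null, 3], 2.0) example (Int values, Float64 replacement) can be compared directly in Julia. Because concrete Julia types cannot be subtyped, the typejoin of two distinct concrete types is always abstract, which is the concern raised above.

```julia
T, R = Int, Float64   # element type of the input, type of the replacement

@show Union{T, R}          # small Union: each element keeps its concrete type
@show typejoin(T, R)       # nearest common supertype: the abstract type Real
@show promote_type(T, R)   # concrete common type: Float64
@show convert(promote_type(T, R), 3)  # what "conversion on the fly" does to an Int
```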

nalimilan · 2017-10-16T12:12:35Z

FWIW, for a 1M vector like in the benchmark above, both R and DataArrays (with sum(da, skipnull=true)) take 3 ms to compute the sum when skipping nulls, while Vector{Union{Int, Null}} (with sum(Nulls.skip(x))) takes 8 ms. So we're not that far off.

cjprybol

Benchmarks and tests look great, thanks Milan!

nalimilan · 2017-10-16T18:47:30Z

OK, I've added a commit to use promotion.

nalimilan · 2017-10-16T18:54:52Z

Whoops, it seems to kill performance. Need to investigate why.

nalimilan · 2017-10-16T19:56:33Z

Unfortunately, I couldn't get the same performance as before. Calling convert on values from the input iterable is enough to make the function a lot slower. So for now I've added code so that the replacement is converted to the element type of the input iterable. In most cases that's what people want anyway, and if they need something more general they should use something else like Base.replace or CategoricalArrays.recode. We should be able to revisit this when the compiler has improved.
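A minimal sketch of this approach in Julia 1.x terms (`missing` for `null`, `iterate` for start/next/done); `ReplaceMissing` and `replace_missing` are hypothetical names, not the Nulls.jl API. The replacement is converted once, when the iterator is built, to the input's element type with Missing removed, so iteration itself never calls `convert`.

```julia
# Hypothetical replace-missing iterator (Julia 1.x sketch, not Nulls.jl code).
struct ReplaceMissing{I, T}
    x::I
    replacement::T
end

# Convert the replacement up front to the input's non-missing element type.
# T is passed explicitly so an input with eltype Any gives T == Any.
function replace_missing(x, replacement)
    T = nonmissingtype(eltype(x))
    return ReplaceMissing{typeof(x), T}(x, convert(T, replacement))
end

Base.eltype(::Type{ReplaceMissing{I, T}}) where {I, T} = T

# Same length as the input whenever the input's length is known.
function Base.IteratorSize(::Type{ReplaceMissing{I, T}}) where {I, T}
    sz = Base.IteratorSize(I)
    return sz isa Union{Base.HasLength, Base.HasShape} ? Base.HasLength() : sz
end
Base.length(itr::ReplaceMissing) = length(itr.x)

function Base.iterate(itr::ReplaceMissing, state...)
    y = iterate(itr.x, state...)
    y === nothing && return nothing
    v, s = y
    # Explicit branch on `isa Missing` (compare quinnj's suggestion in this thread).
    return (v isa Missing ? itr.replacement : v, s)
end
```

Because the conversion happens once, `replace_missing(Union{Int, Missing}[1, missing], 2.5)` throws an `InexactError` instead of widening the element type, which is the trade-off accepted here.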

quinnj · 2017-10-17T03:31:52Z

src/Nulls.jl

+end
+
+"""
+    Nulls.fail(itr)


I still don't really understand the use for this iterator/function. Can you remind me?

Well, it can be used to make sure you don't pass an iterator containing nulls to a function which would silently accept it. It's faster than any(isnull, x) since checking happens on the fly. Currently sum(Nulls.fail(x)) is also much faster than sum(x) since there's no type instability, but with compiler improvements I guess this difference could go away.
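A sketch of such a checking iterator in Julia 1.x terms: `FailMissing` and `fail_missing` are hypothetical names, and Base's `MissingException` stands in for the `NullException` used in the benchmark. Each value is checked as it is produced, so no separate `any(ismissing, x)` pass over the data is needed.

```julia
# Hypothetical fail-on-missing iterator (Julia 1.x sketch, not Nulls.jl code).
struct FailMissing{I}
    x::I
end
fail_missing(x) = FailMissing(x)

# Returned values are never missing, so Missing is dropped from the eltype.
Base.eltype(::Type{FailMissing{I}}) where {I} = nonmissingtype(eltype(I))
# Kept simple: length information is not forwarded from the input.
Base.IteratorSize(::Type{<:FailMissing}) = Base.SizeUnknown()

function Base.iterate(itr::FailMissing, state...)
    y = iterate(itr.x, state...)
    y === nothing && return nothing
    v, s = y
    # Check on the fly: throw as soon as a missing value is reached.
    v isa Missing && throw(Base.MissingException("missing value encountered"))
    return (v, s)
end
```

Usage: `sum(fail_missing(y))` sums `y` and throws if it contains `missing`.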

quinnj · 2017-10-17T03:55:29Z

src/Nulls.jl

+Base.eltype(itr::EachReplaceNull) = Nulls.T(eltype(itr.x))
+@inline function Base.next(itr::EachReplaceNull, state)
+    v, s = next(itr.x, state)
+    ((isnull(v) ? itr.replacement : v)::eltype(itr), s)


I get a slight speedup by doing

    if v isa Null
        return (itr.replacement, s)
    else
        return (v, s)
    end

Don't ask me why or even if it will work for you, but worth a try?

Wow, indeed that makes a significant difference (I've updated the timings above). Now we're about as fast as R. It's unfortunate that it isn't extensible to custom types, but for now it's certainly worth it.

I couldn't use isa Null for fail though because of a weird codegen bug: JuliaLang/julia#24177.

nalimilan · 2017-10-19T16:42:59Z

Merge?

nalimilan force-pushed the nl/itr branch from 4863e50 to 2060fe5 Compare October 15, 2017 13:08

ararslan reviewed Oct 15, 2017

cjprybol reviewed Oct 16, 2017

nalimilan force-pushed the nl/itr branch from 4cc05fa to 2e213d5 Compare October 16, 2017 18:51

Convert replacement to element type of input iterator

65297f9

nalimilan force-pushed the nl/itr branch from 2e213d5 to 65297f9 Compare October 16, 2017 19:53

quinnj reviewed Oct 17, 2017

nalimilan mentioned this pull request Oct 17, 2017

Codegen bug with isa JuliaLang/julia#24177

Closed

Use isa Null instead of isnull() to improve performance

1633af1

nalimilan force-pushed the nl/itr branch from 921d956 to 1633af1 Compare October 17, 2017 07:43

quinnj merged commit b540c7a into master Oct 19, 2017

quinnj deleted the nl/itr branch October 19, 2017 16:45

nalimilan mentioned this pull request Oct 19, 2017

Make Each(Drop|Replace|Fail)Null iterators faster JuliaStats/DataArrays.jl#289

Merged

davidanthoff mentioned this pull request Oct 26, 2017

Make sure our dropna is fast queryverse/DataValues.jl#30

Open
