Add fast pathway for copy, collect, tcollect, and tcopy for size-stable operations #553

Closed
wants to merge 18 commits into from

MasonProtter
Member

Current State

Fundamentally, Transducers is quite good at reductions, but collecting results into an output array is a major weakness. Currently, collecting essentially amounts to

foldxl(append!!, Map(f), coll)

(or foldxt for the parallel version). If f is expensive to evaluate, this extra overhead isn't so bad, but for functions that take only a CPU cycle or two, it's catastrophic.

Here's how it currently looks with a very cheap function (abs):

julia> let A = rand(100_000)
           @btime map(abs, $A)
           @btime collect(Map(abs), $A)
           @btime tcollect(Map(abs), $A)
       end;
  31.440 μs (2 allocations: 781.30 KiB)
  70.460 μs (12 allocations: 1.83 MiB)
  212.270 μs (123 allocations: 4.54 MiB)

And here's a more expensive function (sin):

julia> let A = rand(100_000)
           @btime map(sin, $A)
           @btime collect(Map(sin), $A)
           @btime tcollect(Map(sin), $A)
       end;
  447.810 μs (2 allocations: 781.30 KiB)
  486.680 μs (12 allocations: 1.83 MiB)
  302.360 μs (123 allocations: 4.54 MiB)

This PR

In this PR I added a version of collect(xf::Transducer, coll) (and similarly for copy) that checks whether xf preserves the size of coll (e.g. Map is okay, but Filter is not) and whether coll has a size known at runtime. If both conditions hold, we use a more optimized method that involves setindex!! on arrays.
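
The dispatch described above can be sketched roughly as follows. This is a toy illustration in plain Base Julia, not the code in this PR: `ToyMap`, `ToyFilter`, `preserves_size`, `has_known_size`, `store_or_widen`, and `collect_sketch` are all hypothetical names, and `store_or_widen` only imitates the widening role that setindex!! plays.

```julia
# Toy stand-ins for transducers (hypothetical; NOT the Transducers.jl types).
struct ToyMap{F}
    f::F
end
struct ToyFilter{P}
    pred::P
end

# Outputs produced for one input, as a tuple of zero or one items.
emit(op::ToyMap, x) = (op.f(x),)
emit(op::ToyFilter, x) = op.pred(x) ? (x,) : ()

# Trait: does the operation produce exactly one output per input?
preserves_size(::ToyMap) = true
preserves_size(::ToyFilter) = false

# Does the collection report its length or shape up front?
has_known_size(coll) =
    Base.IteratorSize(typeof(coll)) isa Union{Base.HasLength,Base.HasShape}

# General path: grow the output as items arrive.
function slow_collect(op, coll)
    out = Any[]
    for x in coll
        append!(out, emit(op, x))
    end
    return map(identity, out)  # narrow the element type where possible
end

# Store y at index i; if y does not fit the element type, copy what has been
# written so far into a wider array and return that instead.
function store_or_widen(out::Vector{T}, y, i) where {T}
    if y isa T
        out[i] = y
        return out
    end
    wider = Vector{typejoin(T, typeof(y))}(undef, length(out))
    copyto!(wider, 1, out, 1, i - 1)
    wider[i] = y
    return wider
end

# Fast path: allocate once from the known length, then assign by index.
function fast_collect(op::ToyMap, coll)
    y1 = op.f(first(coll))
    out = Vector{typeof(y1)}(undef, length(coll))
    out[1] = y1
    for (i, x) in enumerate(Iterators.drop(coll, 1))
        out = store_or_widen(out, op.f(x), i + 1)
    end
    return out
end

function collect_sketch(op, coll)
    if preserves_size(op) && has_known_size(coll) && !isempty(coll)
        return fast_collect(op, coll)
    end
    return slow_collect(op, coll)
end
```

In this sketch, `collect_sketch(ToyMap(abs), A)` takes the fast path for an array `A`, while `ToyFilter` or an iterator of unknown size (such as `Iterators.filter`) falls back to the growing path.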

We can't do the setindex!! thing directly for tcollect, since setindex!! may return a new array instead of mutating the existing one, and that would cause race conditions if the output object changed while other tasks were writing to it. So instead, for tcollect, I split the collection into chunks whose size is determined by basesize. (I currently use Iterators.partition for this and want to switch to SplittablesBase.jl before merging.)
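
To make the chunking idea concrete, here is a minimal sketch in plain Base Julia, assuming a vector input and a plain function instead of a transducer. `chunked_map` is a hypothetical name, and the default `basesize` below is my own assumption; each task fills its own output array, so no task ever writes into an array owned by another.

```julia
using Base.Threads: @spawn, nthreads

# Hypothetical helper, not the PR's implementation.
function chunked_map(f, coll::AbstractVector;
                     basesize::Integer = max(1, length(coll) ÷ nthreads()))
    isempty(coll) && return map(f, coll)
    # Each chunk is a view of at most `basesize` elements.
    chunks = Iterators.partition(coll, basesize)
    # One task per chunk; every task builds a private result vector.
    tasks = map(chunks) do chunk
        @spawn map(f, chunk)
    end
    # Combine the per-chunk results in order.
    return mapreduce(fetch, vcat, tasks)
end
```

For example, `chunked_map(sin, rand(100_000); basesize = 10_000)` would spawn ten tasks and concatenate their results. The final concatenation is extra copying that a single preallocated output would avoid.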

Now here's what those benchmarks look like with my new changes:
abs:

julia> let A = rand(100_000)
           @btime map(abs, $A)
           @btime collect(Map(abs), $A)
           @btime tcollect(Map(abs), $A)
       end;
  28.860 μs (2 allocations: 781.30 KiB)
  28.870 μs (2 allocations: 781.30 KiB)
  162.670 μs (244 allocations: 3.15 MiB)

and sin:

julia> let A = rand(100_000)
           @btime map(sin, $A)
           @btime collect(Map(sin), $A)
           @btime tcollect(Map(sin), $A)
       end;
  481.480 μs (2 allocations: 781.30 KiB)
  482.801 μs (2 allocations: 781.30 KiB)
  217.760 μs (244 allocations: 3.15 MiB)

So that's a nice speedup. tcollect is still leaving some performance on the table, but it's an improvement. This should help alleviate tkf/ThreadsX.jl#196, though it still won't be as fast as ThreadsX.map!, since the way we combine the results from different arrays is not as efficient as preallocating and then just assigning.
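
For contrast, the "preallocate and then just assign" strategy can be sketched like this. It is only an illustration in plain Base Julia and makes no claim about how ThreadsX.map! is actually implemented; `prealloc_tmap` is a hypothetical name, and the sketch assumes the caller already knows the output element type `T`, which is exactly what a general collect cannot assume.

```julia
using Base.Threads: @spawn

# Hypothetical helper: the output element type T is supplied by the caller.
function prealloc_tmap(f, ::Type{T}, coll::Vector; basesize::Integer = 1024) where {T}
    out = Vector{T}(undef, length(coll))
    # Tasks write to disjoint index ranges of one shared output array,
    # so there is nothing to combine afterwards.
    @sync for idxs in Iterators.partition(eachindex(coll), basesize)
        @spawn for i in idxs
            out[i] = f(coll[i])
        end
    end
    return out
end
```

Because the output never needs to be widened or replaced, the tasks can share it safely, and the per-chunk concatenation step disappears.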

codecov bot commented May 4, 2023

Codecov Report

Merging #553 (c616391) into master (f8d0dfe) will increase coverage by 0.11%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #553      +/-   ##
==========================================
+ Coverage   95.43%   95.54%   +0.11%     
==========================================
  Files          32       32              
  Lines        2233     2268      +35     
==========================================
+ Hits         2131     2167      +36     
+ Misses        102      101       -1     
Flag        Coverage Δ
Pkg.test    94.54% <100.00%> (-0.02%) ⬇️
Run.test    95.41% <100.00%> (+0.20%) ⬆️

Impacted Files        Coverage Δ
src/Transducers.jl    73.33% <ø> (ø)
src/core.jl           93.15% <100.00%> (+0.09%) ⬆️
src/dreduce.jl        100.00% <100.00%> (ø)
src/processes.jl      94.71% <100.00%> (+0.50%) ⬆️
src/reduce.jl         96.61% <100.00%> (+0.18%) ⬆️

... and 2 files with indirect coverage changes