Skip to content

Conversation

@tkf
Copy link
Member

@tkf tkf commented Mar 9, 2021

Commit Message

Add SplitBy (#461)

This patch adds SplitBy iterator transformation and an underlying
(ATM internal) ReduceSplitBy transducer. SplitBy can be used for
processing like eachline and split in parallel and/or streaming
programs.

@codecov
Copy link

codecov bot commented Mar 9, 2021

Codecov Report

Merging #461 (aad1c7f) into master (c11379f) will increase coverage by 0.54%.
The diff coverage is 96.38%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #461      +/-   ##
==========================================
+ Coverage   90.86%   91.41%   +0.54%     
==========================================
  Files          31       32       +1     
  Lines        2037     2120      +83     
==========================================
+ Hits         1851     1938      +87     
+ Misses        186      182       -4     
Flag Coverage Δ
unittests 91.41% <96.38%> (+0.54%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/Transducers.jl 71.42% <ø> (ø)
src/basics.jl 95.31% <ø> (+6.25%) ⬆️
src/splitby.jl 96.29% <96.29%> (ø)
src/lister.jl 47.72% <100.00%> (-38.64%) ⬇️
src/library.jl 93.76% <0.00%> (-0.26%) ⬇️
src/processes.jl 94.71% <0.00%> (+0.44%) ⬆️
src/progress.jl 92.30% <0.00%> (+0.96%) ⬆️
src/core.jl 89.37% <0.00%> (+0.96%) ⬆️
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c11379f...aad1c7f. Read the comment docs.

@tkf tkf changed the title Add ReduceSplitBy Add SplitBy Mar 15, 2021
@tkf
Copy link
Member Author

tkf commented Mar 15, 2021

ID time GC time memory allocations
... ... ... ... ...
["splitby", "count", "foldl"] 3.797 ms (5%) 96 bytes (1%) 4
["splitby", "count", "man"] 1.504 ms (5%)
["splitby", "count", "reduce"] 786.290 μs (5%) 2.84 KiB (1%) 54

--- https://github.com/JuliaFolds/Transducers-data/blob/multi-thread-benchmark-results/2021/03/15/010146/result.md


The performance is nice, but it's strange that reduce is much better than foldl, even with no multithreading:

julia> suite = include("benchmark/multi-thread/bench_splitby.jl")
       results = run(suite; verbose = true)
(1/1) benchmarking "count"...
  (1/3) benchmarking "foldl"...
  done (took 6.10017195 seconds)
  (2/3) benchmarking "reduce"...
  done (took 5.655999351 seconds)
  (3/3) benchmarking "man"...
  done (took 5.316228889 seconds)
done (took 17.61219897 seconds)
1-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "count" => 3-element BenchmarkTools.BenchmarkGroup:
          tags: []
          "foldl" => Trial(3.618 ms)
          "reduce" => Trial(1.788 ms)
          "man" => Trial(1.473 ms)

julia> Threads.nthreads()
1

then with more threads:

1-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "count" => 3-element BenchmarkTools.BenchmarkGroup:
          tags: []
          "foldl" => Trial(3.611 ms)
          "reduce" => Trial(383.696 μs)
          "man" => Trial(1.440 ms)

julia> Threads.nthreads()
4

@mergify mergify bot merged commit eb6f73a into master Mar 15, 2021
@mergify mergify bot deleted the reduce-split-by branch March 15, 2021 02:19
mergify bot pushed a commit that referenced this pull request Apr 17, 2021
It looks like the filtering condition introduced in #461 for hiding
`ReduceSplitBy` (IIRC) was unnecessary and is actually hiding
everything.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants