Median by brenhinkeller · Pull Request #12 · JuliaSIMD/VectorizedStatistics.jl

brenhinkeller · 2022-01-02T05:52:13Z

So this only manages to vectorize a relatively small part of the underlying sorting, but still seems to beat base pretty handily in most non-memory-bandwidth-limited cases. Good enough for now?

julia> A = rand(100); sort(A) == vsort!(A)
true

julia> @benchmark sort!(A) setup = A = rand(100)
BenchmarkTools.Trial: 10000 samples with 199 evaluations.
 Range (min … max):  413.638 ns …  1.454 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     442.970 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   474.331 ns ± 82.305 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▇▅█▆▃▄▅▂▁ ▂▂▃▄▁   ▂▂▃▃▃                                  ▁  ▂
  ▇███████████████████████▇▆▇▇▇▇▆▆▆▅▅▇▁▆▅▅▅▄▅▅▄▄▄▅▅▃▃▅▃▅▅██▆██ █
  414 ns        Histogram: log(frequency) by time       860 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark vsort!(A) setup = A = rand(100)
BenchmarkTools.Trial: 10000 samples with 962 evaluations.
 Range (min … max):  85.747 ns … 246.201 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     86.377 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   92.525 ns ±  16.230 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █ ▁ ▄  ▁   ▄        ▄                                      ▂ ▁
  █████▇▇█▇▇▇██▇▇▆▆▆▆▆█▆▆▆▅▅▅▆▅▅▅▅▄▃▅▅▅▄▅▄▄▅▄▄▄▄▃▃▁▄▃▄▄▁▃▄▃▁▄█ █
  85.7 ns       Histogram: log(frequency) by time       168 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark sort!(A) setup = A = rand(10_000)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  410.692 μs … 933.455 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     431.529 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   458.630 μs ±  67.393 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▂▆█▇▄▄▅▄▂▂▂▂▂▃▃▂▁▁ ▁▁▂▃▂▁                                     ▂
  ██████████████████████████▇▆▇▆▆▇▆▆▅▅▅▆▆▄▅▅▆▄▄▅▆▆▁▄▄▅▆▅▅▅▄▄▅▆▇ █
  411 μs        Histogram: log(frequency) by time        827 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark vsort!(A) setup = A = rand(10_000)
BenchmarkTools.Trial: 6582 samples with 5 evaluations.
 Range (min … max):  130.921 μs … 274.166 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     137.940 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   144.805 μs ±  16.559 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

    ▁█▇▁                                                         
  ▂▄████▆▅▄▄▄▄▄▃▃▃▃▃▃▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▂▂ ▃
  131 μs           Histogram: frequency by time          214 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

The way I've implemented this currently, we also do especially well with arrays that are antisorted or have large sorted/antisorted runs (especially given that the reversal can be nicely vectorized)

julia> @benchmark sort!(A) setup = A = collect(1000:-1:1)
BenchmarkTools.Trial: 10000 samples with 8 evaluations.
 Range (min … max):  3.814 μs … 15.853 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     4.085 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.524 μs ±  1.012 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▅█▅▆▄▁▇▁ ▅▁    ▄▄ ▁▁   ▅▂                                ▃ ▂
  ████████▇███▇▆▅██████▇▇██▆▅▅▆▅▆▅▅▅▇▆▅▄▅▅▅▅▄▄▅▄▅▅▄▂▄▃▂▄▂▃▃█ █
  3.81 μs      Histogram: log(frequency) by time     8.36 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark vsort!(A) setup = A = collect(1000:-1:1)
BenchmarkTools.Trial: 10000 samples with 200 evaluations.
 Range (min … max):  417.085 ns …  1.540 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     441.080 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   480.585 ns ± 96.013 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▅█▅▃▄ ▆▃▂▄▂▁ ▁▁ ▄       ▅                                  ▃ ▂
  █████▇███████████▇███▇▇▅███▅▅▅▆▇▆▆▅▅▅▆▄▄▄▃▃▅▅▄▄▅▄▅▁▃▁▃▃▄▄▁▃█ █
  417 ns        Histogram: log(frequency) by time       909 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

brenhinkeller · 2022-01-02T05:56:23Z

As for the actual median,
Small arrays, where quicksort! is used by default:

julia> A = rand(100); Statistics.median(A) == vmedian!(A)
true

julia> @benchmark Statistics.median!(A) setup = A = rand(100)
BenchmarkTools.Trial: 10000 samples with 250 evaluations.
 Range (min … max):  306.088 ns … 988.924 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     316.208 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   342.815 ns ±  64.051 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▅█▆▃▂▄▃ ▁▁ ▂▄▃▁     ▃▄▃▁                                  ▁▁▂ ▂
  █████████████████▇▇█████▇▆▆▅▇▇▅▆▆▅▆▃▅▅▁▄▄▅▃▅▄▄▄▄▁▄▁▄▁▄▃▃▄▆███ █
  306 ns        Histogram: log(frequency) by time        617 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark vmedian!(A) setup = A = rand(100)
BenchmarkTools.Trial: 10000 samples with 963 evaluations.
 Range (min … max):  83.958 ns … 278.579 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     84.805 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   91.658 ns ±  16.531 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▄▃ ▅▁ ▁   ▄▁       ▄                                      ▂ ▁
  ██████████▇████▆▇▇█▇███▆▅▆▆▇▇▆▆▆▆▆▆▆▅▅▆▅▃▅▄▅▄▅▅▆▄▁▄▅▃▅▅▁▄▄▆█ █
  84 ns         Histogram: log(frequency) by time       165 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

Larger arrays, where quickselect! is used by default:

julia> A = rand(10_000); Statistics.median(A) == vmedian!(A)
true

julia> @benchmark Statistics.median!(A) setup = A = rand(10_000)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):   61.734 μs … 278.162 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     114.070 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   115.872 μs ±  23.300 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

               ▁▂▄▄▆▇██▇▇▇▄▂                                     
  ▁▁▁▂▂▂▃▄▄▅▆▇▇██████████████▇▅▄▃▃▃▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  61.7 μs          Histogram: frequency by time          221 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark vmedian!(A) setup = A = rand(10_000)
BenchmarkTools.Trial: 10000 samples with 5 evaluations.
 Range (min … max):  15.664 μs … 101.265 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     27.004 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   27.720 μs ±   5.672 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

               ▁▃▄▅█▇▇█▆▅▃▁                                     
  ▁▁▁▂▂▃▃▄▅▆▇██████████████▇▆▅▅▄▃▃▃▃▂▃▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  15.7 μs         Histogram: frequency by time           51 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

brenhinkeller · 2022-01-02T06:02:44Z

Quantile/percentile, small arrays (quicksort!)

julia> A = rand(100); StatsBase.quantile(A, 0.33) == vquantile!(A, 0.33)
true

julia> @benchmark StatsBase.quantile!(A, 0.33) setup = A = rand(100)
BenchmarkTools.Trial: 10000 samples with 230 evaluations.
 Range (min … max):  325.417 ns …  1.102 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     332.954 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   364.569 ns ± 68.025 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▇█▂▁▄▅  ▁▁ ▃▅       ▄▃                                     ▂ ▁
  ██████▇▇███████▇▇▇▇▇██▇▆▆▆▆▆▆▇▆▅▆▅▅▅▅▄▆▅▄▅▄▄▄▄▄▄▅▄▄▂▃▅▃▅▅▃██ █
  325 ns        Histogram: log(frequency) by time       646 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark vquantile!(A, 0.33) setup = A = rand(100)
BenchmarkTools.Trial: 10000 samples with 960 evaluations.
 Range (min … max):  87.403 ns … 309.148 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     88.156 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   95.835 ns ±  17.409 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▄▁ ▅▁ ▁   ▄▁       ▄▁                                     ▂ ▁
  ███▇████▆▇▆██▇▇▆▆▅▇▆██▆▅▅▆▅▆▅▅▅▅▅▅▄▃▅▄▄▄▄▅▅▃▃▃▅▃▃▄▄▄▄▄▃▄▃▃▄█ █
  87.4 ns       Histogram: log(frequency) by time       171 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

Quantile/percentile, larger arrays (quickselect!)

julia> A = rand(10_000); StatsBase.quantile(A, 0.33) == vquantile!(A, 0.33)
true

julia> @benchmark StatsBase.quantile!(A, 0.33) setup = A = rand(10_000)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):   52.767 μs … 410.060 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     111.597 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   112.350 μs ±  24.431 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

                ▁▁▂▂▅▆▆▆▇█▇█▅▅▃                                  
  ▁▁▁▂▂▂▃▄▄▅▅▅▆▇████████████████▆▆▄▃▃▂▂▂▂▂▂▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  52.8 μs          Histogram: frequency by time          213 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark vquantile!(A, 0.33) setup = A = rand(10_000)
BenchmarkTools.Trial: 10000 samples with 6 evaluations.
 Range (min … max):  12.238 μs … 58.217 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     22.722 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   23.199 μs ±  5.066 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

               ▂▃▄▄▆▇▇█▆▆▄▁                                    
  ▁▁▂▂▂▃▄▅▅▆▇▇██████████████▇▅▅▄▄▃▃▃▃▃▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  12.2 μs         Histogram: frequency by time        43.9 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

codecov · 2022-01-02T06:08:14Z

Codecov Report

Merging #12 (e568c52) into main (15fda55) will decrease coverage by 1.77%.
The diff coverage is 93.48%.

@@            Coverage Diff             @@
##             main      #12      +/-   ##
==========================================
- Coverage   98.71%   96.93%   -1.78%     
==========================================
  Files           7       11       +4     
  Lines         934     1404     +470     
==========================================
+ Hits          922     1361     +439     
- Misses         12       43      +31

Impacted Files	Coverage Δ
src/VectorizedStatistics.jl	`100.00% <ø> (ø)`
src/vsort.jl	`91.48% <91.48%> (ø)`
src/quicksort.jl	`91.51% <91.51%> (ø)`
src/vmedian.jl	`95.28% <95.28%> (ø)`
src/vquantile.jl	`96.19% <96.19%> (ø)`
src/vcov.jl	`97.63% <100.00%> (ø)`
src/vmean.jl	`99.57% <100.00%> (ø)`
src/vstd.jl	`100.00% <100.00%> (ø)`
src/vsum.jl	`99.55% <100.00%> (ø)`
src/vvar.jl	`98.13% <100.00%> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 15fda55...e568c52. Read the comment docs.

…ather than allowing it to stackoverflowerror

brenhinkeller added 18 commits November 1, 2021 12:10

Add partially-simd sorting functions

415bb3b

Add vsort, reduce-all only

5261b34

Add baseline vmedian implementation

c60bd4c

include vsort, vmedian

b513dc9

Add docstrings for vsort

b5629ac

Short-circuit methods for sortnans! when type can't contain NaNs

d16622e

Full multidimensional vmedian implementation

75b373f

Fix sortnans! edge case, export vsort!

274c06e

specify module for vsort tests

341fa82

Minor improvement to quicksort!

f3c159b

Docstring, test, and minor changes to vmedian!

e643ff4

Docstring, test, and minor changes to vmedian!

552d81d

Add vpctile!/vquantile!

4142aeb

Rename partialquicksort! to quickselect!, more accurately

05dc019

Simplified pivoting scheme in quicksort

0252de0

Safer quickselect!

6e4bc18

Update README with new functions

b3badb9

Spell out percentile in full, as in StatsBase

fa55311

chriselrod reviewed Jan 2, 2022

View reviewed changes

Comment thread src/vsort.jl Outdated

brenhinkeller added 5 commits January 2, 2022 18:36

Should explicitly disallow reducing median over non-contiguous dims r…

d1ce708

…ather than allowing it to stackoverflowerror

Use === for comparison in multithreading options

ba60e51

Slightly improve test coverage on sorting and median

3896a77

Support sorting dims for feature parity with Base.sort!

9a897b9

Improve docstrings for new functions

e568c52

brenhinkeller merged commit 4861579 into main Jan 5, 2022

brenhinkeller deleted the median branch January 5, 2022 00:04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Median#12

Median#12
brenhinkeller merged 23 commits intomainfrom
median

brenhinkeller commented Jan 2, 2022 •

edited

Loading

Uh oh!

brenhinkeller commented Jan 2, 2022 •

edited

Loading

Uh oh!

brenhinkeller commented Jan 2, 2022

Uh oh!

codecov bot commented Jan 2, 2022 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

brenhinkeller commented Jan 2, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

brenhinkeller commented Jan 2, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

brenhinkeller commented Jan 2, 2022

Uh oh!

codecov bot commented Jan 2, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

brenhinkeller commented Jan 2, 2022 •

edited

Loading

brenhinkeller commented Jan 2, 2022 •

edited

Loading

codecov bot commented Jan 2, 2022 •

edited

Loading