chore: Sampling algorithms now use nth method. Fixes #667.#760
chore: Sampling algorithms now use nth method. Fixes #667.#760RobertJacobsonCDC merged 3 commits intomainfrom
Conversation
`EntitySetIterator` now selects sampling algorithm based on size hint (like `sample_entity` already does).
Sampling algorithm benchmarksSummarySome algorithm benches (which apply sampling functions directly to a BenchmarksSmaller (faster) time in bold, relative speedup = main ÷ dev
Algorithm & Sampling Benchmarks (main vs dev)
sample_entity Benchmarks (main vs dev)
The apparent regression in sampling Benchmarks (main vs dev)
Running the benchmarks# On main:
cargo bench -p ixa-bench 'sampl' -- --save-baseline 'main'
# On the dev branch:
cargo bench -p ixa-bench 'sampl' -- --baseline 'main' |
Benchmark ResultsHyperfineCriterionNote: A comparison could not be generated. Maybe you added new benchmarks? |
|
A dump of some of my notes for posterity. Sampling Benchmark Coverage AnalysisAfter switching the sampling algorithms to use
These two properties combine into three execution paths that matter in practice:
Gap in benchmark coverageThere are two ways to end up with an
We add two benchmarks to cover case 2: Running the benchmarks# On main:
cargo bench -p ixa-bench 'sampl' -- --save-baseline 'main'
# On the dev branch:
cargo bench -p ixa-bench 'sampl' -- --baseline 'main' |
| // index `idx` we skip `idx - consumed` where `consumed` tracks how many | ||
| // elements have already been consumed. | ||
| for idx in indexes { | ||
| if let Some(item) = iter.nth(idx - consumed) { |
There was a problem hiding this comment.
I'm confused by this change, why don't you break early anymore
There was a problem hiding this comment.
Previously we iterated over the elements of iter (the source set), checked if we found the next index, and if we did land on the next index, updated the next index from the list of precomputed indexes. We break if there are no more indexes.
In the new version, we iterate over the precomputed indexes. There's no need to break, because it's implicit in the for idx in indexes. We move through the iter iterator by calling iter.nth on it.
Benchmark ResultsHyperfineCriterionNote: A comparison could not be generated. Maybe you added new benchmarks? |
This PR
Iterator::nthinstead of iterating over every element.There are some dramatic speedups that do not represent actual performance improvements in the code, and there are small apparent regressions that are not real in the sense that the code in their execution path is unchanged. See notes on benchmarks below for details.