Unable to complete 10 samples #322

Closed
Tockra opened this issue Aug 26, 2019 · 9 comments

Tockra commented Aug 26, 2019

Hey,

I'm using criterion 0.3 and saw that I need to increase the sample_size to at least 10, so I did. Now I get the following warning: "Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 62.2s or reduce sample count to 10."
This message doesn't make a lot of sense in my opinion. Or did I overlook something in the guide?

T

@bheisler (Owner)

No, that's just an oversight on my part.

The idea here is that Criterion.rs will automatically suggest reducing the sample count to make your benchmarks fit into the specified measurement time. Unfortunately, your benchmark is so long that it would have to reduce the sample count below the minimum safe value of 10, so it gives a confusing recommendation.

I'll leave this open as a reminder to myself to skip the warning if the recommended count is equal to the actual sample count.
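
In the meantime you can work around it by raising the measurement time (and/or pinning the sample count) yourself. A minimal sketch, assuming a placeholder function expensive_work standing in for whatever you're benchmarking:

    use std::time::Duration;
    use criterion::{criterion_group, criterion_main, Criterion};

    fn bench(c: &mut Criterion) {
        let mut group = c.benchmark_group("slow");
        // Give the run roughly what the warning asked for; tune for your workload.
        group.measurement_time(Duration::from_secs(63));
        group.sample_size(10); // 10 is the minimum Criterion.rs accepts
        // expensive_work is a placeholder for the function under test.
        group.bench_function("long_running", |b| b.iter(|| expensive_work()));
        group.finish();
    }

    criterion_group!(benches, bench);
    criterion_main!(benches);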

@bheisler bheisler added this to the Version 0.3.1 milestone Aug 26, 2019
@bheisler bheisler added Beginner Suitable for people new to Criterion.rs Bug Something isn't working right labels Aug 26, 2019

jadbox commented Sep 18, 2019

What are we supposed to do if we want to use a large amount of data to test a function per iteration, but doing so 10x takes hours? I'm not exactly sure whether iter_batched works for this use case (see the sketch below), but ideally I want it to iterate through each item in the source vector until it runs out of time.

Here's my setup, note that words is a vector of hundreds of thousands of words:
https://github.com/jadbox/rust_trie/blob/master/benches/benchmark.rs
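
For reference, the iter_batched shape I'm unsure about looks roughly like this (just a sketch; Node and words are from the benchmark linked above):

    use criterion::BatchSize;

    c.bench_function("trie insert (batched)", |b| {
        b.iter_batched(
            || words.clone(), // setup: a fresh input per batch, not timed
            |ws| {
                let mut trie = Node::new('\x00');
                for w in &ws {
                    trie.insert(w); // timed: inserts the whole vector each iteration
                }
                trie // returning the trie keeps its drop out of the timed routine
            },
            BatchSize::LargeInput,
        )
    });

But that still processes the entire vector per iteration, which is exactly the part that takes hours.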

@bheisler (Owner)

To be honest, I'd recommend just not doing that. It's tempting to assume that larger benchmarks are better, but (at least from a statistical point of view) things are already about as good as they're going to get after a few seconds of benchmarking. There isn't much to be gained by extending a single benchmark to an hour or more unless your benchmarking environment is super noisy (in which case you might look at improving that instead).

If you want to do it anyway, I would say that Criterion.rs is not intended for that use case and you might consider /usr/bin/time instead. If your benchmark is that long, you don't have time to generate a meaningful sample, so the statistics would be too unreliable to be useful anyway. I'd like to add better support for measuring very long benchmarks eventually (if only so that they can be included in the reports), but Criterion.rs' internal code makes that tricky to do, so it will be a while before that's ready.


jadbox commented Sep 19, 2019

Thanks for the insight, @bheisler. What do you think about this approach, where instead of benching over the entire range, I change the sample per iteration?

    let words = BIG_VECTOR_OF_WORDS();
    let mut _trie = Node::new('\x00');
    c.bench_function("trie insert", |b| {
        let mut i = 0;

        b.iter(|| {
            _trie.insert(&words[i]);
            // advance to the next word to insert, wrapping at the end
            i += 1;
            if i == words.len() {
                i = 0;
            }
        })
    });

    // Populate a trie for benchmarking reads
    let mut trie = Node::new('\x00');
    for w in words.iter() {
        trie.insert(&w);
    }

    c.bench_function("trie lookup", |b| {
        let mut i = 0;
        b.iter(|| {
            trie.lookup(&words[i], None, None);
            // advance to the next word to look up, wrapping at the end
            i += 1;
            if i == words.len() {
                i = 0;
            }
        })
    });

@bheisler (Owner)

Hey, thanks for your patience, I've been super busy IRL lately so I haven't had the energy to keep up with Criterion as much as I'd like.

What you're doing (looking up a different word on each iteration) is... probably OK, as long as you're doing a lot of iterations. It does slightly bias the measurements towards over-weighting the values in the beginning of the list, and the analysis kind of assumes every iteration does the same work. In this case, the differences from one iteration to the next are probably small and probably washed out by having lots of iterations. Probably.

Honestly, I wouldn't take the chance; I'd take a random sample of the BIG_VECTOR_OF_WORDS before starting the benchmark (ideally saving the random sample, or at least the random seed, in Git for reproducibility) and then have each iteration look up all of the words in the subsample. That gives you shorter iterations without any chance of statistical bias, as long as your random sample itself isn't significantly biased.
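
As a sketch, that could look something like this (assumes the rand crate; Node and lookup are from your repo):

    use criterion::{black_box, Criterion};
    use rand::seq::SliceRandom;
    use rand::{rngs::StdRng, SeedableRng};

    fn bench_lookup(c: &mut Criterion, words: &[String], trie: &Node) {
        // Fixed seed so the subsample is reproducible from run to run.
        let mut rng = StdRng::seed_from_u64(0xC0FFEE);
        let sample: Vec<&String> = words.choose_multiple(&mut rng, 1_000).collect();

        c.bench_function("trie lookup (1k-word random sample)", |b| {
            b.iter(|| {
                for &w in &sample {
                    black_box(trie.lookup(w, None, None));
                }
            })
        });
    }

Each iteration then does the same amount of work, so the analysis assumptions hold.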


ghost commented May 26, 2021

@bheisler Regarding "You may wish to increase target time to...": may I ask whether we can set the target time? No answer from Google so far.

@bheisler (Owner)

Yes, using the measurement_time functions:

https://docs.rs/criterion/0.3.4/criterion/struct.BenchmarkGroup.html#method.measurement_time

https://docs.rs/criterion/0.3.4/criterion/struct.Criterion.html#method.measurement_time
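
For the group-level method, usage looks something like this (a sketch; the group/bench names and work() are placeholders):

    use std::time::Duration;

    let mut group = c.benchmark_group("my_group");
    group.measurement_time(Duration::from_secs(100));
    group.bench_function("my_bench", |b| b.iter(|| work()));
    group.finish();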


gthb commented Feb 6, 2022

How is the latter intended to be used? Trying just c.measurement_time(Duration::from_secs(100)).bench_function(...) I get:

❯ cargo bench
   Compiling waspiary-parser v0.8.0 (/Users/gthb/git/waspiary/waspiary-parser)
error[E0507]: cannot move out of `*c` which is behind a mutable reference
  --> waspiary-parser/benches/parsefile.rs:22:3
   |
22 |   c.measurement_time(Duration::from_secs(100)).bench_function("parse all", |b| {
   |   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ move occurs because `*c` has type `Criterion`, which does not implement the `Copy` trait


gthb commented Feb 6, 2022

Oh, I should have perused a little more before asking — the answer is to use config in the criterion_group! macro invocation (in the full form), like:

criterion_group!{
  name = benches;
  config = Criterion::default().measurement_time(Duration::from_secs(100));
  targets = criterion_benchmark
}

... right? (At least that's working for me.)

storojs72 added a commit to lurk-lab/arecibo that referenced this issue Feb 26, 2024
Folks suggest using a 'measurement_time' of 100s in order to
make measurements more resilient to transitory peak loads caused by
external programs.

bheisler/criterion.rs#322
github-merge-queue bot pushed a commit to lurk-lab/arecibo that referenced this issue Feb 29, 2024
* feat: Merge Hyperkzg and Shplonk

* bench: Include HyperKzg+Shplonk PCS to the benchmark

Folks suggest using a 'measurement_time' of 100s in order to
make measurements more resilient to transitory peak loads caused by
external programs.

bheisler/criterion.rs#322

* chore: Move HyperKZG+Shplonk code to hyperkzg.rs source file

* chore: Requested refactoring

* chore: Implement SubAssign trait for UniPoly