
Record Return Values in Trial #314

Conversation

@AlexanderNenninger commented Jun 13, 2023

This PR is intended as a minor extension to the Trial and TrialEstimate types. It enables benchmarking of non-deterministic functions, which is useful in Monte Carlo settings for calculating success probabilities and expected runtimes.

Currently, given the code below,

using BenchmarkTools
using Random

function mayfail()
    if rand() < 0.1
        return "Returned early due to lazyness."
    end
    # Some expensive operations ...
end

suite = BenchmarkGroup()
suite["mayfail"] = @benchmarkable mayfail()
run(suite)

there is no way to determine, post-benchmark, what the return value of the function was.

Summary of Changes

  • Added return_values::Vector{Any} to Trial and TrialEstimate (see the usage sketch after this list)
  • Modified functions taking Trial and TrialEstimate where it makes sense.
  • If no return value is provided, push! et al. default to pushing nothing. This adds a little memory overhead, since most Trials are expected to contain a Vector of nothing.
  • The serialization tests in tests/SerializationTests.jl relied on all fields of Trial being comparable with isapprox. This invariant no longer holds, so the local eq helper has been modified: eq now falls back to isequal if isapprox is not defined for its arguments.
  • Added a few tests in tests/TrialsTests.jl
  • copy(::TrialJudgement) was broken independently of the changes listed above. Fixed and added test.
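
To make the intended use concrete, here is a rough sketch building on the mayfail example above. It assumes return values are accessible as a plain field on Trial, with one recorded entry per sample, as described in the summary:

using BenchmarkTools

suite = BenchmarkGroup()
suite["mayfail"] = @benchmarkable mayfail()
results = run(suite)

trial = results["mayfail"]
# With this PR, trial.return_values holds the recorded value for each sample
# (nothing when no value was recorded).
early_returns = count(v -> v isa String, trial.return_values)
early_return_probability = early_returns / length(trial.return_values)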

@AlexanderNenninger marked this pull request as draft on June 14, 2023 13:24
@AlexanderNenninger marked this pull request as ready for review on June 20, 2023 07:21
@AlexanderNenninger (Author) commented Jun 20, 2023

Is the feature implemented in this PR, in principle, something that could be merged, and if so, what are the requirements?

@gdalle (Collaborator) commented Sep 18, 2023

Hey @AlexanderNenninger, sorry for the neglect and thanks for the PR!
I think this is not something we want to enable by default, since function outputs can take up a huge amount of memory. Users might be surprised by this, for instance when they dump the benchmarking result into a file and it takes up several GB.
I'm wondering if we can enable it with an additional macro argument like evals or samples?
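
For concreteness, such an opt-in might look like the following; the record_returns keyword is purely hypothetical and does not exist in BenchmarkTools, it is only meant to illustrate passing the option the same way as evals or samples:

# Hypothetical opt-in flag, shown only to illustrate the suggested interface.
b = @benchmarkable mayfail() samples=100 record_returns=true
trial = run(b)
trial.return_values  # would only be populated when the flag is set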

@AlexanderNenninger (Author) commented Sep 22, 2023

Hi @gdalle,

So you mean keeping the Trial.return_values vector in place, but making the pushing of computation results onto it conditional on some sort of call argument with a default?

Then there's still the issue of how we deal with a lack of computation results. I just checked, and a Vector{Nothing} is always zero-sized, independent of its length, so we can use that for missing return values.
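
A quick check of that claim:

# Nothing is a zero-size singleton type, so the vector stores no element data.
v = fill(nothing, 10^6)
sizeof(v)            # 0 bytes of element data
Base.summarysize(v)  # only the array header, regardless of length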

If we make Trial.return_values of type AbstractVector, the feature will be near zero cost:

# The field is abstractly typed, so a zero-size Vector{Nothing} can be stored
# when no return values are recorded.
mutable struct ContainsVec
    v::AbstractVector
end

cv = ContainsVec(fill(nothing, 2^32))
Base.summarysize(cv) # ≈ 48 bytes: just the struct (one pointer field) and the array header, no element data

cv.v[2^32] # = nothing — indexing works even though no element data is stored

@gdalle (Collaborator) commented Sep 22, 2023

Given the discussion on Discourse, I'm really skeptical about the benefit/risk ratio of this feature. What do you think?

@AlexanderNenninger (Author) commented Sep 23, 2023

The feature would be quite useful, and the implementation is a fairly minor change to the code base. I think some empirical evidence is needed at this point. To gather that evidence, it would be good to know:

  • What benchmark cases are likely to be impacted by this change?
  • What metrics are likely to be impacted?
  • Is there already a low noise environment, where someone could gather data?

If we can show that there's only negligible (to be defined) impact on benchmark quality and acceptable (to be defined) impact on the simplicity of the code base, this PR makes sense; otherwise not.

Edit: Discussion on Discourse for reference.

@vchuravy (Member)

It enables benchmarking of non-deterministic functions.

For me this is a non-goal of BenchmarkTools, and I also don't understand how recording the values helps with that?

There is a maintenance cost and a mental model cost associated with changing the inner benchmarking code.
The contract BT has with its users is that it tries to execute the user code precisely as given, with as little extra fluff as possible.

You could write your own harness code to do what you want, but as the maintainers of BT we must ask ourselves if a feature for a few people is worth it.
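
To make that concrete, a minimal standalone harness in that spirit could look like the sketch below, using the mayfail function from the original example (not part of BenchmarkTools; the timings are naive and skip BT's calibration and tuning):

# Time each call ourselves and keep every return value.
function benchmark_with_returns(f; samples = 1000)
    times = Vector{Float64}(undef, samples)
    values = Vector{Any}(undef, samples)
    for i in 1:samples
        t0 = time_ns()
        values[i] = f()
        times[i] = (time_ns() - t0) / 1e9  # seconds
    end
    return times, values
end

times, values = benchmark_with_returns(mayfail)
early_return_probability = count(v -> v isa String, values) / length(values)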

@vchuravy (Member)

I should add that I do appreciate you putting in the effort! I also handed over the maintenance burden almost entirely to @gdalle, so I will leave the final call up to him.

@AlexanderNenninger (Author)

For me this is a non-goal of BenchmarkTools,

Alright.

@gdalle (Collaborator) commented Sep 23, 2023

Sorry for the wasted efforts, and thank you for the contribution nonetheless!
I do agree that given the dwindling and underqualified maintenance team (me), keeping the library simple and to the point takes on an even bigger role. Maybe we could think of something to add to the docs for the case of non-deterministic benchmarks?

@gdalle (Collaborator) commented Sep 23, 2023

#335 is also tangentially related

@AlexanderNenninger (Author)

Sorry for the wasted efforts, and thank you for the contribution nonetheless!
I do agree that given the dwindling and underqualified maintenance team (me), keeping the library simple and to the point takes on an even bigger role. Maybe we could think of something to add to the docs for the case of non-deterministic benchmarks?

No effort was wasted. We're using the code from this PR anyway; it just stays in a fork.
