
Record Return Values in Trial #314

Conversation

@AlexanderNenninger commented Jun 13, 2023

This PR is intended as a minor extension to the Trial and TrialEstimate types. It enables benchmarking of non-deterministic functions, which is useful in Monte Carlo settings for calculating success probabilities and expected runtimes.

Currently, given the code below,

using BenchmarkTools
using Random

function mayfail()
    if rand() < 0.1
        return "Returned early due to lazyness."
    end
    # Some expensive operations ...
end

suite = BenchmarkGroup()
suite["mayfail"] = @benchmarkable mayfail()
run(suite)

there is no way to determine, post-benchmark, what the return value of the function was.

Summary of Changes

  • Added return_values::Vector{Any} to Trial and TrialEstimate (see the usage sketch after this list)
  • Modified functions taking Trial and TrialEstimate where it makes sense.
  • If no return value is provided, push! et al. default to pushing nothing. This adds a little memory overhead, since most Trials are expected to contain a Vector of nothing.
  • The serialization tests in tests/SerializationTests.jl relied on all fields of Trial being comparable with isapprox. This invariant no longer holds, so the local eq helper has been modified: eq now falls back to isequal if isapprox is not defined for its arguments.
  • Added a few tests in tests/TrialsTests.jl
  • copy(::TrialJudgement) was broken independently of the changes listed above. Fixed and added test.
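
To make the intended use concrete, here is a rough sketch building on the mayfail example above. It assumes return values are accessible as a plain field on Trial, with one recorded entry per sample, as described in the summary:

using BenchmarkTools

suite = BenchmarkGroup()
suite["mayfail"] = @benchmarkable mayfail()
results = run(suite)

trial = results["mayfail"]
# With this PR, trial.return_values holds the recorded value for each sample
# (nothing when no value was recorded).
early_returns = count(v -> v isa String, trial.return_values)
early_return_probability = early_returns / length(trial.return_values)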

@AlexanderNenninger marked this pull request as draft on June 14, 2023 13:24
@AlexanderNenninger marked this pull request as ready for review on June 20, 2023 07:21
@AlexanderNenninger (Author) commented Jun 20, 2023

Is the feature implemented in this PR, in principle, something that could be merged, and if so, what are the requirements?

@gdalle (Collaborator) commented Sep 18, 2023

Hey @AlexanderNenninger, sorry for the neglect and thanks for the PR!
I think this is not something we want to enable by default, since function outputs can take up a huge amount of memory. Users might be surprised by this, for instance when they dump the benchmarking result into a file and it takes up several GB.
I'm wondering if we can enable it with an additional macro argument like evals or samples?
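
For concreteness, such an opt-in might look like the following; the record_returns keyword is purely hypothetical and does not exist in BenchmarkTools, it is only meant to illustrate passing the option the same way as evals or samples:

# Hypothetical opt-in flag, shown only to illustrate the suggested interface.
b = @benchmarkable mayfail() samples=100 record_returns=true
trial = run(b)
trial.return_values  # would only be populated when the flag is set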

@AlexanderNenninger (Author) commented Sep 22, 2023

Hi @gdalle,

So you mean keeping the Trial.return_values vector in place, but making the pushing of computation results onto it conditional on some sort of call argument with a default?

Then there's still the issue of how we deal with a lack of computation results. I just checked, and a Vector{Nothing} is always zero-sized, independent of its length, so we can use that for missing return values.
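
A quick check of that claim:

# Nothing is a zero-size singleton type, so the vector stores no element data.
v = fill(nothing, 10^6)
sizeof(v)            # 0 bytes of element data
Base.summarysize(v)  # only the array header, regardless of length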

If we make Trial.return_values of type AbstractVector, the feature will be near zero cost:

# The field is abstractly typed, so a zero-size Vector{Nothing} can be stored
# when no return values are recorded.
mutable struct ContainsVec
    v::AbstractVector
end

cv = ContainsVec(fill(nothing, 2^32))
Base.summarysize(cv) # ≈ 48 bytes: just the struct (one pointer field) and the array header, no element data

cv.v[2^32] # = nothing — indexing works even though no element data is stored

@gdalle (Collaborator) commented Sep 22, 2023

Given the discussion on Discourse, I'm really skeptical about the benefit/risk ratio of this feature. What do you think?

@AlexanderNenninger (Author) commented Sep 23, 2023

The feature would be quite useful, and the implementation is a fairly minor change to the code base. I think some empirical evidence is needed at this point. To gather that evidence, it would be good to know:

  • What benchmark cases are likely to be impacted by this change?
  • What metrics are likely to be impacted?
  • Is there already a low noise environment, where someone could gather data?

If we can show that there's only negligible (to be defined) impact on benchmark quality and acceptable (to be defined) impact on the simplicity of the code base, this PR makes sense; otherwise not.

Edit: Discussion on Discourse for reference.

@vchuravy (Member)

It enables benchmarking of non-deterministic functions.

For me this is a non-goal of BenchmarkTools, and I also don't understand how recording the values helps with that?

There is a maintenance cost and a mental model cost associated with changing the inner benchmarking code.
The contract BT has with its users is that it tries to execute the user code precisely as given, with as little extra fluff as possible.

You could write your own harness code to do what you want, but as the maintainers of BT we must ask ourselves if a feature for a few people is worth it.
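
To make that concrete, a minimal standalone harness in that spirit could look like the sketch below, using the mayfail function from the original example (not part of BenchmarkTools; the timings are naive and skip BT's calibration and tuning):

# Time each call ourselves and keep every return value.
function benchmark_with_returns(f; samples = 1000)
    times = Vector{Float64}(undef, samples)
    values = Vector{Any}(undef, samples)
    for i in 1:samples
        t0 = time_ns()
        values[i] = f()
        times[i] = (time_ns() - t0) / 1e9  # seconds
    end
    return times, values
end

times, values = benchmark_with_returns(mayfail)
early_return_probability = count(v -> v isa String, values) / length(values)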

@vchuravy (Member)

I should add that I do appreciate you putting in the effort! I also handed over the maintenance burden almost entirely to @gdalle, so I will leave the final call up to him.

@AlexanderNenninger (Author)

For me this is a non-goal of BenchmarkTools,

Alright.

@gdalle (Collaborator) commented Sep 23, 2023

Sorry for the wasted efforts, and thank you for the contribution nonetheless!
I do agree that given the dwindling and underqualified maintenance team (me), keeping the library simple and to the point takes on an even bigger role. Maybe we could think of something to add to the docs for the case of non-deterministic benchmarks?

@gdalle (Collaborator) commented Sep 23, 2023

#335 is also tangentially related

@AlexanderNenninger (Author)

Sorry for the wasted efforts, and thank you for the contribution nonetheless!
I do agree that given the dwindling and underqualified maintenance team (me), keeping the library simple and to the point takes on an even bigger role. Maybe we could think of something to add to the docs for the case of non-deterministic benchmarks?

No effort was wasted. We're using the code from this PR anyway; it just stays in a fork.
