Provide option to run a function after/before every benchmark invocation (teardown/setup) + before/after a scenario #59

Closed

PragTob opened this issue Feb 26, 2017 · 10 comments


PragTob commented Feb 26, 2017

Sometimes I wish I could run a function after every invocation of the benchmarking function, of course outside of the measurement of the benchmarked function. Why would I want that?

  • run assertions on the return value of the function to see that it really does the right thing all the time. When benchmarking n functions that should all do the same thing, we could have one assertion that checks that they really do. Of course, that's what tests are for, but when trying out which function to use those tests often end up rather duplicated.
  • reset some global state (for instance cache busting) if we don't wanna benchmark the caching, as doing the same call 1000s of times is rather atypical.

Based on these use cases, the after function would need access to:

  • return value of the function
  • input or input name currently benchmarked
  • maybe name of the job currently benchmarked

A good name (at least going by benchfella) seems to be teardown, and as teardown alone would be lonely we can add a setup sibling that behaves similarly.

Progress

  • before_each/after_each hooks
  • before/after_scenario hooks
  • passing values along from before hooks (before_scenario → before_each → benchmarking function)
  • passing results to after hooks
  • README documentation

edit: updated as this blog post also calls for some setup/teardown

edit2: also before/after a scenario sounds sensible to do (specifically I think I need it for benchmarking Hound as I need to start hound for our specific PID)

@PragTob PragTob changed the title Provide option to run a function after every benchmark invocation Provide option to run a function after/before every benchmark invocation (teardown/setup) Feb 26, 2017
@PragTob PragTob changed the title Provide option to run a function after/before every benchmark invocation (teardown/setup) Provide option to run a function after/before every benchmark invocation (teardown/setup) + before/after a scenario Aug 28, 2017

PragTob commented Aug 28, 2017

As this will discuss the API, it's also interesting to delve into #124 / keep it in mind while discussing this, so that the API stays reasonably consistent with itself.

Feedback is very welcome, although I know it's a long read :)

Problem/Requirements

Basically we need both:

  • invocation before/after every function invocation (i.e. every benchmarking sample)
  • invocation before/after every scenario (job + input, so every distinct category that shows up afterwards)

(we don't need before/after all as that can just be done in the script)

Each of those needs some way of knowing which job (and maybe which input) it is running for, as we probably don't want to run them for all of them. I.e. in my example I want to run something before the scenario only for the hound use case, not for wallaby.

Also, bonus points: before functions should have some way of passing their result to the benchmarking function. I.e. they might want to start a process and pass the pid on.
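
To make that a bit more concrete, here's a rough sketch of where such hooks would sit relative to the measurement. All names are hypothetical, none of this exists yet:

# sketch per scenario (job + input); only the job function itself is measured
before_scenario.(input)

run_times =
  for _ <- 1..number_of_samples do
    before_each.()
    {time, _return_value} = :timer.tc(fn -> job_function.(input) end)
    after_each.()
    time
  end

after_scenario.()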

Possible solutions

Very much looking forward to other proposals and feedback

Tuple options

(Akin to #124 on some level)

The functions could be provided along with the benchmarking jobs in the initial map by using a tuple with the function and the options.

"My function" => {fn -> magic end, before_each: some_function, after_each: some_other_function, before_scenario: cool_beans}
"Other function" => fn -> dont_need_no_options end

Pros:

  • before/afters are defined together with the job they are for so no/little need to perform any kind of matching

Cons:

  • Possibly annoying if something should apply to all jobs (think cache busting)

General options

A before_each/after_each function could be defined as part of the general options. It could then be handed a Scenario-like struct with information about the job and the input to decide what to do.

before_each: fn(%{job_name: job, input_name: input}) -> if job == ..., do: ... end

Pros:

  • Sits together with the other options
  • Easy to do if before/after applies to all jobs

Cons:

  • Might require some matching logic if the befores/afters are only attached to certain jobs (sketched below)
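
For illustration, that matching could look roughly like this (purely hypothetical, bust_cache/0 is made up):

# hypothetical: only bust the cache for one specific job, do nothing for the rest
before_each: fn(%{job_name: job_name, input_name: _input_name}) ->
  if job_name == "My function", do: bust_cache()
end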

The Job-Module

Instead of just defining a benchmarking function, one could define a whole module that optionally implements the before_/after_ callbacks.

defmodule MyJob do
  @behaviour Benchee.Job

  def before_scenario do
    # something before the whole scenario
  end

  def before_each do
    # something before every invocation
  end

  def benchmark do
    # actual benchmark
  end
end

# ...

"My Job" => MyJob

Pros:

  • attached to a specific job so it's easy to do custom stuff for it
  • Introduces an (optional) module for benchmarking code, which somehow feels right

Cons:

  • bit of overhead if people want to do something before every benchmark
  • Might feel like a loss of simplicity for people

The passing of values

That is: from before_* to the benchmarking function, and the return value of the benchmarking function to after_*.

From before_* to benchmarking function

This is the fun part. We could introduce another (optional) argument to the benchmarking function, but that'd just get confusing because its meaning would always change based on whether an input is supplied, whether a before_* function is supplied and so on. No thanks.

We could pass the return values (and the input?) in a map so we could pattern match on it. I'd have to benchmark this, but I feel slightly cautious about it since that'd add pattern matching overhead to the function invocation, which could have a negative impact on micro benchmarks. Hell, people even benchmark pattern matching ;)

My favorite idea (so far) is to pass the input into before_scenario, treat its return value as the new input and pass that to before_each, and then pass the return value of before_each to the job function. If people just want to add stuff they can put it in a tuple, otherwise they just change the one argument etc...
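
A tiny sketch of that chaining (all names made up):

# each before hook's return value becomes the argument for the next step in the chain
scenario_input = before_scenario.(original_input)
each_input     = before_each.(scenario_input)
return_value   = job_function.(each_input)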

From benchmarking function to after_*

Rather simply, we just take the return value (which :timer.tc gives us either way) and pass it on. Dunno if this deserves its own section. 😜

Here, have a fresh photo (~1 minute old!) of my little bunny overlords as thanks for reading this far!


@devonestes

I like where this is headed. I think that since this is such a large addition, it might make sense to try and find a sort of MVP for this feature. I would think that MVP could be to allow a user to:

  1. Define a function to run before and/or after every scenario
  2. Define a function to run before and/or after each scenario in a given job

I think we can do that with a sort of combination of two of the options you have above. I would imagine that something like the following wouldn't be too hard for a user:

Benchee.run(%{
  "job 1" => {fn -> needs_setup end, before_each: fn -> do_a_thing end, after_each: fn -> do_another_thing end},
  "job 2" => fn -> does_not_need_setup end
}, before_each: fn -> seed_process_with_data end,
   after_each: fn -> truncate_database end)

I would think that the "global" setup in this case would run before the individual setup (like they do in RSpec). That seems to make sense to me - what do you think?
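
Roughly, per invocation I'd expect the order to be something like this (the ordering of the after hooks is my assumption, mirroring RSpec where the innermost hooks run first on the way out):

# sketch of the per-invocation ordering, assuming both global and job-level hooks are given
global_before_each.()   # from the top-level options
job_before_each.()      # from the job's tuple
job_function.()         # the measured call
job_after_each.()
global_after_each.()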

One thing I feel kind of strongly about is that the Job Module is asking a lot of a user. I would bet that for 85-90% of people, most of the time they use Benchee it'll be for pretty simple usage. Having these more advanced features is really great, but it would make sense to me to optimize for the more simple use case. Basically, make Benchee.run/2 as easy as possible to use for simple benchmarks.

I think it might make sense to keep Benchee.run/2 really simple and maybe not add these kinds of advanced configuration options to that function. We already have the sort of "lower level" data transformation approach available and documented, and it might make sense to only add options there so Benchee.run/2 doesn't keep getting bigger and more complicated.

Basically, I think it's fair to tell a user "If you want to do simple benchmarks, use Benchee.run/2. If you want to start doing advanced configuration, we have another API that sits below Benchee.run/2 with much more fine grained levels of customization that you're welcome to use. It's a little more difficult to use, but it gives you much more flexibility." That way we offer the flexibility for "advanced" users without sacrificing the simplicity of Benchee.run/2 for "beginner" users.


PragTob commented Aug 31, 2017

👋 Thanks for the input once again! 👍

I agree about the MVP thought. I wanna get it as far as possible before releasing because I don't wanna break the API again :D Plus, I believe it's actually not that big. I believe hooks that don't get any data passed in (and have no way to pass data along) can achieve very little if people write pure functions.

Ah it took me a while to realize what you want to do... so before_each really means before_each but you can supply it in both the global setup and for an individual job? Cool beans 👍 . That actually sounds more complicated to implement, but seems far more user friendly than either repeating it all over or having to switch on different scenarios/inputs all over.

Global setup vs. individual setup: global setup definitely runs first! 👍

Job Module: maybe I didn't make my intention clear or I misunderstood the comment. The module would be optional, as in normally you can just hand it a function and be done with it. If you want to do more fancy stuff, provide a module. However, another downside that I see now is that often the module would be defined in a separate file, which makes just posting the benchmarking code harder. Also, if we go with "optional tuples" for the formatters in #124, then I think doing about the same thing makes sense here.

I like the idea of an advanced interface versus beginner interface :) Although, the current "advanced" interface is very verbose and also more aimed at people who wanna just substitute a whole step/enhance it. If we can still comfortably fit something into Benchee.run then I'd do that :)

Omnomnom changes incoming



PragTob commented Aug 31, 2017

I'll try to give this a somewhat time boxed shot :) 🚀


PragTob commented Aug 31, 2017

Ok and I hit the first fun edge case...

When iterations are too fast we run them multiple times to get some sort of sensible measurement. Now, we have 2 options:

  • run before_each before the repeated invocation, meaning we won't exactly run it before each function invocation
  • run before_each for every invocation, thereby adding it to the total of the measured time

I absolutely wanna avoid the latter option personally, but one way or another it's broken when we have to run the function multiple times. Either we run before_each just once before a "batch" of invocations or the before_each time gets added to the measured times. 😓

@devonestes

I don't think we can run before_each before the batch. It needs to actually be run before each invocation of a given function. I can see plenty of ways it would blow up for a user if we did that.

I think we can definitely avoid having the before_each run times added to the actual function run times - it'll require some additional changes. The general shape of it could be like:

def record_runtime(before_each, fun, arg, after_each) do
  before_each.()
  # only the function call itself is measured
  run_time = :timer.tc(fn -> fun.(arg) end)
  after_each.()
  run_time
end

It'll take some modification, but I think something like the above might work. This will definitely add significant complexity to this operation, though. I can't see any way around that.


PragTob commented Aug 31, 2017

@devonestes we can't for repeated times :( What you're showing is the normal case and that's what I do, but for repeating it's different...

The whole reason we do repeated measurements is that we can't get a good measurement for just one iteration through :timer.tc because the call is too fast, so the measurements don't give us anything. Which is why we repeat the call n times and measure the whole time:

    {microseconds, _return_value} = :timer.tc fn ->
      RepeatN.repeat_n(function, num_iterations)
    end

If we also wanted to run before_each for every repetition, it would have to be within that repeat block. Chopping the batch up and adding the individual measurements together won't help either, because then the individual measurements would still be wrong.
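
To spell it out, running before_each for every repetition would put it inside the measured block, roughly like this (sketch only):

    # sketch: the before_each calls would unavoidably become part of the measured time
    {microseconds, _return_value} = :timer.tc fn ->
      RepeatN.repeat_n(fn ->
        before_each.()
        function.()
      end, num_iterations)
    end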


PragTob commented Sep 16, 2017

Updated with progress :)


PragTob commented Sep 18, 2017

I'm gonna be taking a stab/stabs at the before_/after_scenario things :)


PragTob commented Oct 16, 2017

And we're all done 🎉

@PragTob PragTob closed this as completed Oct 16, 2017