Adding `Benchee.CollectionData` struct #264

devonestes · 2019-01-29T09:57:42Z

There are two parts to this commit - adding a new Benchee.CollectionData struct, and renaming the previous measurers to collectors. Now when we have a new thing that we're benchmarking, we keep the raw samples and the calculated statistics together for that thing.

Resolves #259

PragTob

👋

Heryho, thanks for tackling this! 🎉

Getting it in now would delay a release that we've delayed for long already (hell I looked it up, I implemented nano second measurements in May last year and we still haven't released it), on the other hand this touches about everything so if we can't merge it soon it'd be a bad experience for us.

Once again, I'll try if I can get 0.14 released this weekend and then we can see whether to do it or not do it.

As for the PR, I left lots of comments but they're mainly the same and mostly minor:

Should the main keys in Scenario be called #{type}_data, #{type}s or just #{type}
I think whenever we setup test data, we should use the correct struct aka CollectionData and not just a map - I know we don't reliably do this with other stuff everywhere but I think it'd be the right call? What do you think? (Also this sounds bothersome, but I think a search and replace should fix this rather easily)
I think when matching on it we don't always have to specify the Struct, although it can be nice in places to provide more conext/name the concepts 🤷‍♀️

lib/benchee/benchmark/scenario.ex

lib/benchee/collection_data.ex

lib/benchee/benchmark/runner.ex

lib/benchee/conversion.ex

test/benchee/formatters/console/memory_test.exs

PragTob · 2019-02-09T11:21:30Z

test/benchee/formatters/console/memory_test.exs

-            maximum: 333.3,
-            mode: 201.2,
-            sample_size: 50_000
+          memory_usage_data: %{


Yeah I think everywhere in this file it should be DataCollection here in teh test setup :)

test/benchee/formatters/console/memory_test.exs

test/benchee/formatters/console/run_time_test.exs

Formatters often have to check whether run time or memory data was processed - so if data was collected at all and if statistics have already been generated so that they can be rendered. This should simplify this/not tie all of them to the internal data structure. The decision as made to check directly on statistics to see if values actually made it into it, rather than just checking the configuration if it was intended to measure, so that we're sure that values are there. This might sound like a check on `Suite` to check if all scenarios have all data might be a good thing, however mostly formatters split things by input so it will more often be checked on a sub set of scenarios or individual scenarios. This PR doesn't use the function in our formatters yet, to avoid merge conflicts with #264 :) #264 will have to adjust its implementation though, but that should be very easy and straight forward.

* Add Scenario.data_processed? to simplify checks in formatters Formatters often have to check whether run time or memory data was processed - so if data was collected at all and if statistics have already been generated so that they can be rendered. This should simplify this/not tie all of them to the internal data structure. The decision as made to check directly on statistics to see if values actually made it into it, rather than just checking the configuration if it was intended to measure, so that we're sure that values are there. This might sound like a check on `Suite` to check if all scenarios have all data might be a good thing, however mostly formatters split things by input so it will more often be checked on a sub set of scenarios or individual scenarios. This PR doesn't use the function in our formatters yet, to avoid merge conflicts with #264 :) #264 will have to adjust its implementation though, but that should be very easy and straight forward.

devonestes · 2019-02-10T09:41:37Z

I keep going back and forth about where to put ‘CollectionData’ actually. My thinking is this:

In pattern matches, it’s best to pattern match on the “simplest” possible pattern to do what we need. Basically, if I’m seeing a pattern match on one struct type in one place, then I would expect to see a similar match on a different type right next to it.

Same with test data. I sort of like the idea of using the simplest test data that will work. If a bare map will work, then there’s no reason to over specify in the tests the data that our function accepts. It makes it (theoretically) easier to both change the implementation later on if the tests are as general as possible, and to refactor the existing implementation if the test isn’t as coupled to the current implementation.

The trade off there I guess is in explicitness and self-documentation.

I do think that in doctests I should probably specify the struct, since those tests serve a specific documentation purpose.

That said, those are just theoretical benefits/drawbacks and my personal taste. I don’t think it will make a big difference to the actual maintenance later on.

One thing I feel pretty strongly about is the “memory_usage_data” and “run_time_data” keys. Just “run_time” or “run_times” I think is much too general for the value for those keys. Otherwise you need to know the type of the data stored at that key to know what that “run_time” is referring to, and I think it should be the other way around - they key tells you (or at least gives you a hint about) what’s stored in that key.

So, I think I’m going to go back and add they struct type, but I also think we should keep those key names.

PragTob · 2019-02-10T10:34:56Z

👋

Didn't expect you back here yet :D

If you feel strongly about the key names, I'm happy to keep them! Was more a hunch with me 👍

Regarding pattern matches, I agree - simplest thing that works seems good 👍

As for test data, I used to think the same - aka just maps with the needed keys in tests and we're good. However, now I think we should use the right struct more because:

we have type specs, then technically the tests violate the type specs 🤷‍♂️
I think it's still kinda the simplest thing that works because the difference between the map and the struct is specifying the struct type - we still only supply the keys relevant to the test in question so I think that property still kinda holds
if we'd ever pattern match on the struct in function heads or what not, the tests would break as we only supplied a map - so supplying the struct seems to make it more robust to me :)

Cheerio!

This just changes some names of files and functions from `measurer` to `collector, `measure` to `collect`.

Now when we have a thing we're measuring, we keep all the collected samples and the calculated statistics together in a single data structure.

PragTob

🚀

devonestes force-pushed the Issue-259 branch from cb424f7 to 09d321a Compare January 29, 2019 10:12

PragTob reviewed Feb 9, 2019

View reviewed changes

PragTob mentioned this pull request Feb 9, 2019

Add Scenario.data_processed? to simplify checks in formatters #268

Merged

devonestes added 2 commits February 11, 2019 17:23

Refactor for collectors

be7e01f

This just changes some names of files and functions from `measurer` to `collector, `measure` to `collect`.

Add new Benchee.CollectionData struct

646a1fd

Now when we have a thing we're measuring, we keep all the collected samples and the calculated statistics together in a single data structure.

devonestes force-pushed the Issue-259 branch from 0ccf0cf to 3aa0af7 Compare February 11, 2019 16:30

specify struct types in tests

864b030

devonestes force-pushed the Issue-259 branch from 3aa0af7 to 864b030 Compare February 11, 2019 16:34

PragTob reviewed Feb 11, 2019

View reviewed changes

PragTob merged commit bdfeb2e into master Feb 11, 2019

PragTob deleted the Issue-259 branch February 11, 2019 19:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adding `Benchee.CollectionData` struct #264

Adding `Benchee.CollectionData` struct #264

devonestes commented Jan 29, 2019 •

edited

Loading

PragTob left a comment •

edited

Loading

PragTob Feb 9, 2019

devonestes commented Feb 10, 2019

PragTob commented Feb 10, 2019

PragTob left a comment

Adding Benchee.CollectionData struct #264

Adding Benchee.CollectionData struct #264

Conversation

devonestes commented Jan 29, 2019 • edited Loading

PragTob left a comment • edited Loading

Choose a reason for hiding this comment

PragTob Feb 9, 2019

Choose a reason for hiding this comment

devonestes commented Feb 10, 2019

PragTob commented Feb 10, 2019

PragTob left a comment

Choose a reason for hiding this comment

Adding `Benchee.CollectionData` struct #264

Adding `Benchee.CollectionData` struct #264

devonestes commented Jan 29, 2019 •

edited

Loading

PragTob left a comment •

edited

Loading