New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding `Benchee.CollectionData` struct #264

Merged
merged 3 commits into from Feb 11, 2019

Conversation

Projects
None yet
2 participants
@devonestes
Copy link
Collaborator

devonestes commented Jan 29, 2019

There are two parts to this commit - adding a new Benchee.CollectionData struct, and renaming the previous measurers to collectors. Now when we have a new thing that we're benchmarking, we keep the raw samples and the calculated statistics together for that thing.

Resolves #259

@devonestes devonestes force-pushed the Issue-259 branch from cb424f7 to 09d321a Jan 29, 2019

@PragTob
Copy link
Owner

PragTob left a comment

👋

Heryho, thanks for tackling this! 🎉

Getting it in now would delay a release that we've delayed for long already (hell I looked it up, I implemented nano second measurements in May last year and we still haven't released it), on the other hand this touches about everything so if we can't merge it soon it'd be a bad experience for us.

Once again, I'll try if I can get 0.14 released this weekend and then we can see whether to do it or not do it.

As for the PR, I left lots of comments but they're mainly the same and mostly minor:

  • Should the main keys in Scenario be called #{type}_data, #{type}s or just #{type}
  • I think whenever we setup test data, we should use the correct struct aka CollectionData and not just a map - I know we don't reliably do this with other stuff everywhere but I think it'd be the right call? What do you think? (Also this sounds bothersome, but I think a search and replace should fix this rather easily)
  • I think when matching on it we don't always have to specify the Struct, although it can be nice in places to provide more conext/name the concepts 🤷‍♀️

whatsapp image 2019-01-21 at 15 24 38 1

Show resolved Hide resolved lib/benchee/benchmark/scenario.ex
Show resolved Hide resolved lib/benchee/collection_data.ex Outdated
Show resolved Hide resolved lib/benchee/benchmark/runner.ex
Show resolved Hide resolved lib/benchee/conversion.ex
Show resolved Hide resolved lib/benchee/conversion.ex
Show resolved Hide resolved test/benchee/formatters/console/memory_test.exs Outdated
maximum: 333.3,
mode: 201.2,
sample_size: 50_000
memory_usage_data: %{

This comment has been minimized.

@PragTob

PragTob Feb 9, 2019

Owner

Yeah I think everywhere in this file it should be DataCollection here in teh test setup :)

Show resolved Hide resolved test/benchee/formatters/console/memory_test.exs Outdated
Show resolved Hide resolved test/benchee/formatters/console/memory_test.exs Outdated
Show resolved Hide resolved test/benchee/formatters/console/run_time_test.exs Outdated

PragTob added a commit that referenced this pull request Feb 9, 2019

Add Scenario.data_processed? to simplify checks in formatters
Formatters often have to check whether run time or memory data
was processed - so if data was collected at all and if statistics
have already been generated so that they can be rendered.

This should simplify this/not tie all of them to the internal
data structure.

The decision as made to check directly on statistics to see if
values actually made it into it, rather than just checking the
configuration if it was intended to measure, so that we're sure
that values are there.

This might sound like a check on `Suite` to check if all scenarios
have all data might be a good thing, however mostly formatters
split things by input so it will more often be checked on a
sub set of scenarios or individual scenarios.

This PR doesn't use the function in our formatters yet, to avoid
merge conflicts with #264 :) #264 will have to adjust its
implementation though, but that should be very easy and straight
forward.

PragTob added a commit that referenced this pull request Feb 10, 2019

Add Scenario.data_processed? to simplify checks in formatters (#268)
* Add Scenario.data_processed? to simplify checks in formatters

Formatters often have to check whether run time or memory data
was processed - so if data was collected at all and if statistics
have already been generated so that they can be rendered.

This should simplify this/not tie all of them to the internal
data structure.

The decision as made to check directly on statistics to see if
values actually made it into it, rather than just checking the
configuration if it was intended to measure, so that we're sure
that values are there.

This might sound like a check on `Suite` to check if all scenarios
have all data might be a good thing, however mostly formatters
split things by input so it will more often be checked on a
sub set of scenarios or individual scenarios.

This PR doesn't use the function in our formatters yet, to avoid
merge conflicts with #264 :) #264 will have to adjust its
implementation though, but that should be very easy and straight
forward.
@devonestes

This comment has been minimized.

Copy link
Collaborator Author

devonestes commented Feb 10, 2019

I keep going back and forth about where to put ‘CollectionData’ actually. My thinking is this:

In pattern matches, it’s best to pattern match on the “simplest” possible pattern to do what we need. Basically, if I’m seeing a pattern match on one struct type in one place, then I would expect to see a similar match on a different type right next to it.

Same with test data. I sort of like the idea of using the simplest test data that will work. If a bare map will work, then there’s no reason to over specify in the tests the data that our function accepts. It makes it (theoretically) easier to both change the implementation later on if the tests are as general as possible, and to refactor the existing implementation if the test isn’t as coupled to the current implementation.

The trade off there I guess is in explicitness and self-documentation.

I do think that in doctests I should probably specify the struct, since those tests serve a specific documentation purpose.

That said, those are just theoretical benefits/drawbacks and my personal taste. I don’t think it will make a big difference to the actual maintenance later on.

One thing I feel pretty strongly about is the “memory_usage_data” and “run_time_data” keys. Just “run_time” or “run_times” I think is much too general for the value for those keys. Otherwise you need to know the type of the data stored at that key to know what that “run_time” is referring to, and I think it should be the other way around - they key tells you (or at least gives you a hint about) what’s stored in that key.

So, I think I’m going to go back and add they struct type, but I also think we should keep those key names.

@PragTob

This comment has been minimized.

Copy link
Owner

PragTob commented Feb 10, 2019

👋

Didn't expect you back here yet :D

If you feel strongly about the key names, I'm happy to keep them! Was more a hunch with me 👍

Regarding pattern matches, I agree - simplest thing that works seems good 👍

As for test data, I used to think the same - aka just maps with the needed keys in tests and we're good. However, now I think we should use the right struct more because:

  • we have type specs, then technically the tests violate the type specs 🤷‍♂️
  • I think it's still kinda the simplest thing that works because the difference between the map and the struct is specifying the struct type - we still only supply the keys relevant to the test in question so I think that property still kinda holds
  • if we'd ever pattern match on the struct in function heads or what not, the tests would break as we only supplied a map - so supplying the struct seems to make it more robust to me :)

Cheerio!

29090720_339834176521801_6828362667602739200_n

devonestes added some commits Jan 29, 2019

Refactor for collectors
This just changes some names of files and functions from `measurer` to
`collector, `measure` to `collect`.
Add new `Benchee.CollectionData` struct
Now when we have a thing we're measuring, we keep all the collected
samples and the calculated statistics together in a single data
structure.

@devonestes devonestes force-pushed the Issue-259 branch from 0ccf0cf to 3aa0af7 Feb 11, 2019

@devonestes devonestes force-pushed the Issue-259 branch from 3aa0af7 to 864b030 Feb 11, 2019

@PragTob
Copy link
Owner

PragTob left a comment

🚀

@PragTob PragTob merged commit bdfeb2e into master Feb 11, 2019

5 checks passed

continuous-integration/appveyor/branch AppVeyor build succeeded
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details
coverage/coveralls Coverage remained the same at 93.81%
Details

@PragTob PragTob deleted the Issue-259 branch Feb 11, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment