Calculate memory stats #180

Merged
PragTob merged 18 commits into master from calculate-memory-stats on Mar 10, 2018

Conversation

3 participants
@devonestes
Collaborator

devonestes commented Feb 20, 2018

OK, a lot is going on here at this point. Right now there are minimal tests on the new behavior, which I've used to refactor the initial messy implementation into something not too bad. The things I still have left to do are:

  • rename some functions that still imply relationship to run time measurements
  • add a bunch more tests for new behavior
  • document this fully
  • add extended output for memory usage stats if requested

The major things I've done in this PR so far are:

  • add @michalmuskala's new memory measurement module
  • add memory measurement to the console formatter
  • split the console formatter into three modules - one for run times, one for memory, and one for shared functions between the two of them

So, right now just about everything is working (except the extended statistics for memory), and if y'all wanted to you could play with it. It'll take some time to fully test and document everything, which is why I wanted to put this up now to get feedback while I'm starting on that stuff.

devonestes added some commits Feb 17, 2018

Rough calculation of statistics for memory usage
A bunch of tests need to be updated and a ton of functions need to be
renamed, but I want to make this just work first so Tobi can play
around with it.
Add actually useful memory measurement
Got this from @michalmuskala, and it works _way_ better than what we
had. Actually, because of how well it works, we need to add a new
output in cases where the deviation is 0%.
Fix memory measurement to be in bytes
Turns out the previous calculation was in words, which we need to
multiply by the word size (which is either 4 or 8 bytes depending on
the CPU architecture).
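The word-to-byte conversion this commit describes can be sketched as follows; the `heap_words` value here is illustrative, not taken from the PR:

```elixir
# Word size is 8 bytes on 64-bit BEAM builds and 4 bytes on 32-bit ones.
word_size = :erlang.system_info(:wordsize)

# Illustrative heap-size delta, measured in machine words.
heap_words = 1_000

bytes = heap_words * word_size
IO.puts("#{heap_words} words = #{bytes} bytes")
```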
All tests passing
Now that measuring memory usage is optional, we have far fewer tests to
update!

I still need to add comprehensive tests to measure memory usage, then
refactor a ton of stuff cause the current implementation is a mess, and
then I need to document the heck out of this.
Add first tests
I think we need some more before we can say this is ready for
refactoring.
Split console formatter on run time vs. memory usage
I was having some issues keeping things straight with everything in one
file for the console formatter, especially since there was so much that
was _similar_ but subtly different, so now I've pulled that apart into
three modules so it'll be easier to keep stuff straight.
Update output for memory measurement
Now, if we're not showing statistics because all of the measurements
were the same, we only show the comparison report.
Add tests for MemoryMeasure
I'm just covering that it returns the correct result (in bytes) and the
result of the function to be measured.
@PragTob

Owner

PragTob commented Feb 21, 2018

Wow, this is terrific! Thank you so much for all the work you put into this! Also thanks to @michalmuskala on this!!

20180120_124611-animation

I don't have the mental capacity to review it atm - but I'll take it and play with it :D

Quick fix relative display of memory usage
Memory usage statistics:

Name           Memory usage
flat_map          625.54 KB
map.flatten       781.85 KB - 0.80x more memory

--->

Memory usage statistics:

Name           Memory usage
flat_map          625.54 KB
map.flatten       781.85 KB - 1.25x memory usage
@PragTob

Owner

PragTob commented Feb 21, 2018

Contributed a short fix for relative memory display 🎉

Memory usage statistics:

Name           Memory usage
flat_map          625.54 KB
map.flatten       781.85 KB - 0.80x more memory

--->

Memory usage statistics:

Name           Memory usage
flat_map          625.54 KB
map.flatten       781.85 KB - 1.25x memory usage
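The fix boils down to dividing each scenario's memory usage by the reference's rather than the other way around. A hedged sketch, using the numbers from the output above (the function shape is assumed, not taken from the PR):

```elixir
# Hypothetical sketch of the corrected relative display: divide the
# scenario's memory usage by the reference's, so the larger consumer
# shows "1.25x memory usage" instead of the inverted "0.80x more memory".
format_relative = fn memory, reference ->
  "#{Float.round(memory / reference, 2)}x memory usage"
end

IO.puts(format_relative.(781.85, 625.54))  # "1.25x memory usage"
```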
@PragTob

Owner

PragTob commented Feb 21, 2018

FYI I just ran my TCO blogpost code (tail-recursive vs. body-recursive) with memory measurements enabled, results look good:

tobi@comfy ~/github/elixir_playground $ mix run bench/tco_small_sample.exs 
Operating System: Linux
CPU Information: Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz
Number of Available Cores: 4
Available memory: 7.67 GB
Elixir 1.6.1
Erlang 20.2
Benchmark suite executing with the following configuration:
warmup: 10 s
time: 20 s
parallel: 1
inputs: Big (1 Million), Bigger (5 Million), Middle (100 Thousand), Small (10 Thousand)
Estimated total run time: 6 min


Benchmarking body-recursive with input Big (1 Million)...
Benchmarking body-recursive with input Bigger (5 Million)...
Benchmarking body-recursive with input Middle (100 Thousand)...
Benchmarking body-recursive with input Small (10 Thousand)...
Benchmarking stdlib map with input Big (1 Million)...
Benchmarking stdlib map with input Bigger (5 Million)...
Benchmarking stdlib map with input Middle (100 Thousand)...
Benchmarking stdlib map with input Small (10 Thousand)...
Benchmarking tail-recursive with input Big (1 Million)...
Benchmarking tail-recursive with input Bigger (5 Million)...
Benchmarking tail-recursive with input Middle (100 Thousand)...
Benchmarking tail-recursive with input Small (10 Thousand)...

##### With input Big (1 Million) #####
Name                     ips        average  deviation         median         99th %
stdlib map             36.53       27.37 ms    ±12.06%       27.98 ms       29.02 ms
body-recursive         35.21       28.40 ms    ±10.47%       28.71 ms       30.03 ms
tail-recursive         32.22       31.04 ms    ±15.69%       30.24 ms       40.86 ms

Comparison: 
stdlib map             36.53
body-recursive         35.21 - 1.04x slower
tail-recursive         32.22 - 1.13x slower

Memory usage statistics:

Name              Memory usage
stdlib map            15.26 MB
body-recursive        15.26 MB - 1.00x memory usage
tail-recursive        28.74 MB - 1.88x memory usage

**All measurements for memory usage were the same**

##### With input Bigger (5 Million) #####
Name                     ips        average  deviation         median         99th %
tail-recursive          5.58      179.14 ms    ±17.44%      173.43 ms      332.49 ms
stdlib map              5.06      197.75 ms     ±2.16%      197.19 ms      215.91 ms
body-recursive          4.82      207.58 ms     ±2.08%      207.76 ms      220.62 ms

Comparison: 
tail-recursive          5.58
stdlib map              5.06 - 1.10x slower
body-recursive          4.82 - 1.16x slower

Memory usage statistics:

Name              Memory usage
tail-recursive       150.15 MB
stdlib map            76.30 MB - 0.51x memory usage
body-recursive        76.30 MB - 0.51x memory usage

**All measurements for memory usage were the same**

##### With input Middle (100 Thousand) #####
Name                     ips        average  deviation         median         99th %
stdlib map            445.85        2.24 ms    ±44.11%        2.12 ms        2.78 ms
body-recursive        421.86        2.37 ms    ±45.30%        2.22 ms        2.98 ms
tail-recursive        320.45        3.12 ms     ±5.86%        3.15 ms        3.75 ms

Comparison: 
stdlib map            445.85
body-recursive        421.86 - 1.06x slower
tail-recursive        320.45 - 1.39x slower

Memory usage statistics:

Name              Memory usage
stdlib map             1.53 MB
body-recursive         1.53 MB - 1.00x memory usage
tail-recursive         1.80 MB - 1.18x memory usage

**All measurements for memory usage were the same**

##### With input Small (10 Thousand) #####
Name                     ips        average  deviation         median         99th %
stdlib map            3.96 K      252.47 μs    ±49.77%         237 μs     1300.85 μs
body-recursive        3.83 K      261.47 μs    ±47.22%         246 μs     1309.53 μs
tail-recursive        3.22 K      310.91 μs   ±125.02%         294 μs      957.68 μs

Comparison: 
stdlib map            3.96 K
body-recursive        3.83 K - 1.04x slower
tail-recursive        3.22 K - 1.23x slower

Memory usage statistics:

Name              Memory usage
stdlib map           156.85 KB
body-recursive       156.85 KB - 1.00x memory usage
tail-recursive       291.46 KB - 1.86x memory usage

**All measurements for memory usage were the same**

@michalmuskala

Contributor

michalmuskala commented Feb 23, 2018

Leaving a comment here, so it's not forgotten.

As we talked about, there's a need for some tests with the memory measurement module to make sure it won't leak processes when the user code raises or fails in some other way.

devonestes added some commits Feb 24, 2018

Add tests for memory measure module and memory formatting module
We now have pretty solid testing over most of the newly introduced
behavior here, including making sure that we're not leaking processes
in the event of a failure by a user's function.
Add documentation
Now we have the Readme updated, and a solid example case so folks can
see how to enable this and how the output looks.
defp measure_time_and_memory(function, true) do
  {microseconds, return_value} = :timer.tc(function)
  {_, memory_usage} = MemoryMeasure.apply(function)
  {microseconds, memory_usage, return_value}
end
@michalmuskala

michalmuskala Feb 24, 2018

Contributor

Does this mean the tests with memory measurements will run twice as long? If so, is it included in the time estimate?

What if instead we had a separate configuration like memory_measure_time or something like that? This would mean the two things, measuring memory and ips, would be completely separate. I think it could make sense, since they are measured separately anyway. Additionally, because memory measurements are so consistent, it might make sense to default to just a very small number of runs.

@devonestes

devonestes Feb 26, 2018

Collaborator

At the moment, the run time remains unchanged. So if you're measuring memory usage, you'll be getting a significantly smaller sample size.

I think the UX for this feature is going to need some time to settle. We'll need to find a balance between what is intuitive and what we just have to document and hope that folks actually read, and I'm not sure we're quite there yet.

For example, I could make an argument that we could take just one memory usage measurement and document that choice, making it clear that if for some reason there is randomness in your benchmark, that won't be measured. Or, we could separate them entirely - that would work too!

I kind of want to release this as it is first and see how folks use it, and once we get a little more user feedback then that will help inform how things work later on.

mem_collected = get_collected_memory(tracer)

final =
  {result, (info_after[:heap_size] - info_before[:heap_size] + mem_collected) * word_size}
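A standalone sketch of the heap-delta idea in this hunk, minus the tracer process (so memory reclaimed by GC mid-run is not counted, which is exactly what `get_collected_memory/1` adds back in the real code):

```elixir
# Rough, tracer-free approximation of the hunk above: diff the process
# heap size (in machine words) before and after the work, then convert
# words to bytes. Without the tracer this misses anything that was
# garbage-collected while the function ran.
word_size = :erlang.system_info(:wordsize)

{:heap_size, words_before} = Process.info(self(), :heap_size)
_data = Enum.to_list(1..10_000)
{:heap_size, words_after} = Process.info(self(), :heap_size)

IO.puts("~#{(words_after - words_before) * word_size} bytes allocated")
```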

@michalmuskala

michalmuskala Feb 24, 2018

Contributor

Would it make sense to switch the order in the tuple to be consistent with :timer.tc?

@devonestes

devonestes Feb 26, 2018

Collaborator

Yeah, that couldn't hurt.

safe_exit(tracer, parent)
catch
action, argument ->
IO.puts("Received `#{action}` with the argument `#{argument}`")

@michalmuskala

michalmuskala Feb 24, 2018

Contributor

We should be able to cover both things with just catch:

try do
  # ...
catch
  kind, reason ->
    stacktrace = System.stacktrace()
    IO.puts(Exception.format(kind, reason, stacktrace))
end
:otp_release
|> :erlang.system_info()
|> to_string
|> Integer.parse()

@michalmuskala

michalmuskala Feb 24, 2018

Contributor

List.to_integer will be simpler.
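Because `:erlang.system_info(:otp_release)` returns a charlist (e.g. `'20'`), the three-step pipeline above collapses to a single call:

```elixir
# :erlang.system_info(:otp_release) returns a charlist such as '20',
# which List.to_integer/1 parses directly.
otp_version = List.to_integer(:erlang.system_info(:otp_release))
IO.puts("Running on OTP #{otp_version}")
```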

@devonestes

devonestes Feb 26, 2018

Collaborator

🙇

|> to_string
|> Integer.parse()
if otp_version > 18 do

@michalmuskala

michalmuskala Feb 24, 2018

Contributor

I think it might be better to do this using tags, configured from test_helper.exs. For example, this whole module could use @moduletag :memory_measure, and then test_helper.exs could call:

otp_release = List.to_integer(:erlang.system_info(:otp_release))
exclusions = if otp_release > 18, do: [], else: [memory_measure: true]

ExUnit.start(exclude: exclusions)

@devonestes

devonestes Feb 26, 2018

Collaborator

Oh, yeah, I forgot to fix this all up a bit. Danke!

@PragTob

Looks great, thank you so much for all the work you poured into this! 😍 💚

Sorry it took me so long to get to this, I really wanted to have a relaxed moment (or rather multiple) to give this its due diligence. :)

Test-wise, I think it'd be nice to have a high-level test in the Benchee test and/or the runner test, but these might already exist from the previous approach, in which case I'll shut up :)

img_20180302_120240-animation

@@ -0,0 +1,2 @@
elixir 1.6.0
erlang 18.3


@devonestes

devonestes Mar 6, 2018

Collaborator

Yeah, I don't know what this is, but I always get this file when I use asdf. Is it not normal?

I usually try to avoid checking it in, but I'll delete it now.

@PragTob

PragTob Mar 7, 2018

Owner

I don't usually get that file 🤔

You could add it to your global git ignore

README.md Outdated
@@ -584,6 +584,25 @@ Benchee.run(%{}, load: "save.benchee")
In the more verbose API this is triggered via `Benchee.load/1`.
### Measuring memory consumption
Starting with version 0.13, users can now get measurements of how much memory their benchmarks use. This measurement is **not** the actual effect on the size of the BEAM VM, but the total amount of memory that was allocated during the execution of a given scenario. This includes all memory that was garbage collected during the execution of that scenario.

@PragTob

PragTob Mar 5, 2018

Owner

This is only the total amount of memory that was allocated in the given benchmarking function, right? Every other process that was started or used more memory is not taken into account.

@devonestes

devonestes Mar 6, 2018

Collaborator

Yeah, that can be clearer. Let me make that more explicit.

README.md Outdated
This measurement of memory does not affect the measurement of run times.
In cases where all measurements of memory consumption are identical, which happens very frequently, the full statistics will be omitted from the standard console formatter.

@PragTob

PragTob Mar 5, 2018

Owner

Maybe we want to add that this is because a deterministic function is likely to require the same amount of memory on every run.

otp_version = List.to_integer(:erlang.system_info(:otp_release))
{microseconds, memory_usage, return_value} =
measure_time_and_memory(function, measure_memory && otp_version > 18)

@PragTob

PragTob Mar 5, 2018

Owner

I think we could do this at an earlier stage, i.e. in the config, and then force-set it to false? Might be difficult for the tests, alas. I mean, we're already getting the OTP version in system, so doing it in a measure_iteration call seems rather late. Maybe before we even start benchmarking? Not sure.

{microseconds, memory_usage_after - memory_usage_before}
{microseconds, memory_usage, _return_value} =
measure_time_and_memory(function, measure_memory)

@PragTob

PragTob Mar 5, 2018

Owner

this doesn't respect the same OTP logic as above? Although I'd like to get rid of it anyway :)

%Scenario{scenario | run_time_statistics: stats}
end)
@spec statistics(Suite.t()) :: Suite.t()
def statistics(suite = %Suite{scenarios: scenarios, configuration: %{measure_memory: measure_memory}}) do

@PragTob

PragTob Mar 5, 2018

Owner

if we kept memory_usages we wouldn't need to worry about configuration here. It'd just be an empty list and we could then just go on our way.

For instance, I think right now on OTP 18 this might break

median = Map.fetch!(percentiles, 50)
mode = Mode.mode(run_times)
minimum = Enum.min run_times
maximum = Enum.max run_times

@PragTob

PragTob Mar 5, 2018

Owner

😢 😢 😢

Such beauty.. utterly destroyed.

mode: 400,
minimum: 200,
maximum: 900,
sample_size: 8
}
}
]
}
"""
def add_percentiles(suite = %Suite{scenarios: scenarios}, percentile_ranks) do

@PragTob

PragTob Mar 5, 2018

Owner

is this function actually called/used anywhere?

@devonestes

devonestes Mar 9, 2018

Collaborator

Huh, funny enough it doesn't look like it is. I remember when we added this it was so folks could see data on custom percentiles, but maybe we forgot to fully implement it?

Should we remove it, or make that feature actually work?

@PragTob

PragTob Mar 9, 2018

Owner

Good point that's what it was there for... I ummm - dunno. It's not a point to handle in this PR. So far no one but us wanted custom percentiles so maybe we drop it... on the other hand, it shouldn't be hard to implement and it does sound nifty 🤔

@@ -1,4 +1,4 @@
defmodule Benchee.ConversionTest do
use ExUnit.Case
use ExUnit.Case, async: true

@PragTob
MemoryMeasure.apply(Enum, :to_list, [1..10])
assert memory_used > 350
assert memory_used < 380

@PragTob
Incorporate feedback from Tobi
This is now much better. Still a couple things we might want to change,
but I think we're much closer now.
@PragTob

🎉 🎉 🎉

Looks great, I have a couple more minor points somewhere, but we can do those another time. This has been open long enough and should finally land :)

There's one super tiny thing but I'll just fix it right away.

Thanks Devon! 💚 Also thanks @michalmuskala for the idea, input, et al. 👋

yum yum I like to measure the length of sticks

img_20180226_084549-animation

column_descriptors(label_width, hide_statistics),
scenario_reports(scenarios, units, label_width, hide_statistics),
comparison_report(scenarios, units, label_width, config, hide_statistics)
])

@PragTob

PragTob Mar 10, 2018

Owner

Lots better, thanks!

format_comparison(scenario, units, label_width, slower)
end)
end
defp calculate_slower_value(0, _), do: "N/A"
defp calculate_slower_value(_, 0), do: "N/A"
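The zero guards in this hunk avoid a division by zero. A hedged sketch of the idea as a standalone function (the anonymous-function shape is assumed for illustration, not taken verbatim from the PR):

```elixir
# Hypothetical sketch of the zero guards above: when either measurement
# is 0 the ratio is undefined, so the formatter falls back to "N/A".
calculate_slower = fn
  0, _ -> "N/A"
  _, 0 -> "N/A"
  reference, value -> "#{Float.round(value / reference, 2)}x"
end

IO.puts(calculate_slower.(0, 500))   # "N/A"
IO.puts(calculate_slower.(200, 500)) # "2.5x"
```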

@PragTob
stacktrace = System.stacktrace()
IO.puts(Exception.format(kind, reason, stacktrace))
exit(:normal)
kind, reason -> graceful_exit(kind, reason, tracer, parent)

@PragTob

PragTob Mar 10, 2018

Owner

Loving high level methods, thanks! 👍

@@ -126,7 +126,7 @@ defmodule BencheeTest do
test "integration super fast function warning is printed once per job" do
output = capture_io fn ->
Benchee.run(%{"Fast" => fn -> 0 end}, time: 0.001, warmup: 0.001)
Benchee.run(%{"Fast" => fn -> 0 end}, time: 1.001, warmup: 0.001)

@PragTob

PragTob Mar 10, 2018

Owner

this shouldn't be like this - should it?

@PragTob PragTob merged commit 6763dcc into master Mar 10, 2018

3 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details
coverage/coveralls Coverage decreased (-0.3%) to 94.707%
Details

@PragTob PragTob deleted the calculate-memory-stats branch Mar 10, 2018

@PragTob

Owner

PragTob commented Mar 10, 2018

🎉

@PragTob PragTob referenced this pull request Mar 10, 2018

Closed

Measure memory consumption #4
