Measure memory 3 #168

Merged
merged 2 commits into master from measure-memory-3 on Feb 8, 2018

Conversation

@devonestes
Collaborator

devonestes commented Jan 9, 2018

I think :erlang.process_info(self(), :memory) was the trick! Using
that function, I got no negative values, and relatively consistent
memory usage. This just adds those measured values to our scenarios,
but doesn't do anything with them yet.
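
Roughly, the measurement idea looks like this (a simplified sketch; function here stands for the benchmarked function, and the actual call sites in the PR may differ):

{:memory, memory_before} = :erlang.process_info(self(), :memory)
function.()
{:memory, memory_after} = :erlang.process_info(self(), :memory)
memory_usage = memory_after - memory_before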

Here's a screenshot of the unique values I got after running the test I added. It seems to be consistently giving us a highest value of 133952 for the function we're measuring in the test.

For now I built this on top of #167 to make merging easier later on, so just check the third commit if you want to see the new stuff.

[screenshot: memory_usage]

Encouraging progress!

@PragTob

looks good imo :) Thanks 🎉

It has the problem that we only track the current process (so no other processes are measured), but there are some ways around that for future us.

(Namely taking whole BEAM memory again, or allowing users to track the memory of specified processes - I think we might need to implement these different variants of measuring memory, but not now) :)

@@ -155,17 +160,20 @@ defmodule Benchee.Benchmark.Runner do
defp iteration_time(scenario, scenario_context = %ScenarioContext{
num_iterations: num_iterations
}) do
measure_iteration(scenario, scenario_context) / num_iterations
{microseconds, memory_usage} = measure_iteration(scenario, scenario_context)
{microseconds / num_iterations, memory_usage}

@PragTob

PragTob Jan 9, 2018

Owner

shouldn't we also divide memory_usage by num_iterations?

@devonestes

devonestes Jan 9, 2018

Collaborator

Yep - good catch!
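
That is, something like this (a sketch based on the diff above, not necessarily the exact final code):

defp iteration_time(scenario, scenario_context = %ScenarioContext{
       num_iterations: num_iterations
     }) do
  {microseconds, memory_usage} = measure_iteration(scenario, scenario_context)
  {microseconds / num_iterations, memory_usage / num_iterations}
end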

@devonestes

Collaborator

devonestes commented Jan 9, 2018

Actually, we are tracking the memory of the process running the benchmarks! I realized that since we're calling this within the process that's running in the Task, self() in this case is in reference to the Task process, not in reference to the main process. For example, I just put some logging in there to check, and I got this:

MAIN PROCESS ID: #PID<0.232.0>
TASK PROCESS ID: #PID<0.232.0>
TASK PROCESS ID: #PID<0.233.0>
TASK PROCESS ID: #PID<0.234.0>
TASK PROCESS ID: #PID<0.235.0>
TASK PROCESS ID: #PID<0.236.0>
TASK PROCESS ID: #PID<0.237.0>
TASK PROCESS ID: #PID<0.232.0>
TASK PROCESS ID: #PID<0.238.0>
TASK PROCESS ID: #PID<0.239.0>
TASK PROCESS ID: #PID<0.240.0>
TASK PROCESS ID: #PID<0.241.0>
TASK PROCESS ID: #PID<0.242.0>
TASK PROCESS ID: #PID<0.243.0>
TASK PROCESS ID: #PID<0.244.0>
TASK PROCESS ID: #PID<0.245.0>
TASK PROCESS ID: #PID<0.246.0>
TASK PROCESS ID: #PID<0.247.0>
TASK PROCESS ID: #PID<0.248.0>
TASK PROCESS ID: #PID<0.249.0>
TASK PROCESS ID: #PID<0.250.0>
TASK PROCESS ID: #PID<0.251.0>
TASK PROCESS ID: #PID<0.252.0>
TASK PROCESS ID: #PID<0.253.0>
TASK PROCESS ID: #PID<0.254.0>
TASK PROCESS ID: #PID<0.255.0>
TASK PROCESS ID: #PID<0.256.0>
TASK PROCESS ID: #PID<0.257.0>
TASK PROCESS ID: #PID<0.258.0>
TASK PROCESS ID: #PID<0.259.0>
TASK PROCESS ID: #PID<0.260.0>
TASK PROCESS ID: #PID<0.261.0>
TASK PROCESS ID: #PID<0.262.0>

So, those are the values for self() as we're running the actual benchmarks. The main process is used to call the benchmarking function once (for warmup) which is why it's duplicated, but after that all executions of the scenario take place in separate processes.

I have to admit, keeping track of what process is running what is starting to get a little complicated at times, but I think that's why this is working so well now. It's basically the same idea we spoke about earlier, but without having to spin up a GenServer.
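
A tiny standalone illustration (not Benchee code) of why self() inside the task refers to the task's own process rather than the caller's:

parent = self()

task =
  Task.async(fn ->
    # Inside the task, self() is the task's pid, not the parent's
    IO.inspect(self(), label: "TASK PROCESS ID")
    IO.inspect(parent, label: "MAIN PROCESS ID")
  end)

Task.await(task)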

@PragTob

Owner

PragTob commented Jan 9, 2018

@devonestes Yes, I much prefer this to a separate GenServer, and it's what I meant yesterday. So, tracking self is good, but that's the problem I mean: we are only tracking the current benchmarking process. If the benchmark directly or indirectly involves spawning additional processes, we won't measure their memory consumption. That might be new processes that are spawned during benchmarking (super hard to track manually afaik) - it might also be existing processes that are used to handle the workload and which also grow in size over time, like worker processes that take over work (think ecto or something bigger). This won't track their memory consumption as far as I can tell :)

@devonestes

Collaborator

devonestes commented Jan 10, 2018

Ohhh, I get it now... Yeah, this approach will only work for benchmarks that are contained within a single process. That's a pretty big drawback.

I have a feeling that it might be really difficult to measure the "whole picture" of memory usage. Let's imagine the following code is being benchmarked:

def expensive_operation(big_list) do
  results =
    big_list
    |> Enum.map(fn(element) -> Task.async(fn() -> expensive_function(element) end) end)
    |> Enum.map(&Task.await(&1, :infinity))

  List.flatten(results)
end

# get initial memory usage snapshot
memory_before = :erlang.memory(:total)
# run function
expensive_operation(big_list)
# get total memory usage snapshot
memory_after = :erlang.memory(:total)
# memory used = total memory - initial memory
memory_used = memory_after - memory_before

I don't think those Task processes will still be around after the function completes, right? I think they'll be shut down and their process heaps will be allocated back to the VM as unused memory. So we wouldn't know how much memory is being used because any process that's finished during the function being measured is garbage collected before we get a chance to measure it.

Maybe we could use some sort of asynchronous process that monitors the VM memory usage during the execution of the function being measured? Something like this:

  1. Spin up a process that continually takes measurements of total VM memory usage
  2. Run the function to be measured
  3. When the function finishes, get all readings from the memory measurement process

Because the BEAM VM will allocate time fairly to all processes with frequent context switches, that "memory monitor" process would actually get a pretty good idea of the memory used during the execution of the function, and we could take whatever the highest number is there as the max_memory_used or something? This would of course mean that it's pulling CPU resources to do that memory monitoring and the speed would be impacted, but that's not the end of the world.
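
A rough sketch of what such a monitor could look like (a hypothetical MemoryMonitor module, not part of this PR; the API and polling interval are made up):

defmodule MemoryMonitor do
  # Spawns a process that samples total VM memory until told to stop,
  # then reports the highest reading it saw.
  def start(interval_ms \\ 1) do
    spawn_link(fn -> loop(interval_ms, :erlang.memory(:total)) end)
  end

  # Stops the monitor and returns the maximum total VM memory it observed.
  def stop(monitor) do
    send(monitor, {:stop, self()})

    receive do
      {:max_memory, max} -> max
    end
  end

  defp loop(interval_ms, max_so_far) do
    receive do
      {:stop, from} -> send(from, {:max_memory, max_so_far})
    after
      interval_ms ->
        loop(interval_ms, max(max_so_far, :erlang.memory(:total)))
    end
  end
end

# Usage sketch: take the peak total VM memory observed while the function runs
monitor = MemoryMonitor.start()
expensive_operation(big_list)
max_memory_used = MemoryMonitor.stop(monitor)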

What do you think - go with this single process measurement for now and just make it really clear that we're only measuring memory consumption for the "main" process that the benchmark runs in and not any other processes that it spawns, or try to get the "whole picture" of memory consumption?

Boy, this is way harder than in Ruby! Maybe we need to keep our own fork of OTP that allows us to turn off GC that folks can use for memory measurement? That kind of sounds easier at this point! :trollface:

@PragTob

Owner

PragTob commented Jan 12, 2018

:trollface:

haha yeah we should do that :D

Great find on the case with tasks... I'm not sure if the VM immediately reclaims the memory after the process finishes, but that would make sense if it is reclaimed (albeit concurrently, so a race condition is likely).

I think a single process is like the ~80% use case (although Michal won't be as happy ;P) and then we can improve upon it later through:

  • option to track memory usage of additional processes known upfront (this is what Michal wants for ecto afaik)
  • other "max memory usage during execution" measurement method

We'll see when we develop the second, but I think both have their advantages/disadvantages. I think we can implement either in a backwards compatible release, so these could be minor releases and we don't have to hold off on 1.0 but can already supply some form of memory measurement.

It's also good for me to leave my perfectionism behind a bit :D

Thanks a lot for this!

@@ -101,7 +105,7 @@ defmodule Benchee.Benchmark.Runner do
end
defp measure_runtimes(scenario, context, run_time, fast_warning)
defp measure_runtimes(scenario, _, 0, _), do: scenario
defp measure_runtimes(scenario, _, 0, _), do: []

@devonestes

devonestes Jan 13, 2018

Collaborator

Found this bug while working on this. measure_runtimes/4 now returns a list, not a Scenario struct. I figured that an empty list was enough, and the modifications I made here ended up catching this bug with existing tests, so I didn't think I needed to add a new one.

@PragTob

Owner

PragTob commented Jan 15, 2018

Sorry, forgot to comment here. Wanted to hold off here until we get 0.12 out of the door so that we can focus on memory for 0.13 (so that we actually have a UI etc. for it and not just gather it).

Also means I'll backport the bugfix and go up in tears that the test suite didn't catch it.

PragTob added a commit that referenced this pull request Jan 15, 2018

devonestes added a commit that referenced this pull request Jan 16, 2018

Merge pull request #174 from PragTob/0-runtime-handling
Backported fix for 0 runtime handling from #168
@OvermindDL1

OvermindDL1 commented Jan 16, 2018

Great find on the case with tasks... I'm not sure if the VM immediately reclaims the memory after the process finishes, but that would make sense if it is reclaimed (albeit concurrently, so a race condition is likely).

As I recall, when an actor/process 'ends' (returns) the memory is 'reclaimed' immediately, though it is reclaimed to ERTS for immediate re-use, not to the underlying OS.

@PragTob

Owner

PragTob commented Feb 2, 2018

👋
With 0.12 out the door, I think it might be a good time for us to go nuts on checking how memory measurements work/can work.

Plan would be:

  • get this merged
  • compute and populate statistics
  • console formatter output
  • output in HTML (as a dev version I'm fine skipping the others, but I'm REALLY interested in seeing the raw memory graph)

What do you think?

Al likes the sound of this.

[photo: img_20180126_102207]

Add measurement of memory usage
This gives us a very basic measurement of memory usage. It is
constrained to measuring the memory used by the _single_ process
executing the function. If the function being measured spawns further
processes, the memory used by those processes would not be counted with
this method. However, this single process measurement represents the
vast majority of use cases for Benchee, so it's a solid first step.
@devonestes

Collaborator

devonestes commented Feb 5, 2018

Seems like a good plan! I've rebased this so it should be in good shape to merge now, and then we can get cracking on the rest of that plan for 0.13!

@PragTob

Minor thing, otherwise golden, thank you!

[photo: img_20180205_193223]

)
when current_time > end_time do
# restore correct order - important for graphing
Enum.reverse(run_times)
{Enum.reverse(run_times), memory_usages}

@PragTob

PragTob Feb 6, 2018

Owner

I think we also want correct order for the memory usages :)
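
That is, something like:

{Enum.reverse(run_times), Enum.reverse(memory_usages)}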

@PragTob

Owner

PragTob commented Feb 8, 2018

Thanks 🎉

@PragTob PragTob merged commit 3b60c17 into master Feb 8, 2018

3 checks passed

continuous-integration/travis-ci/pr: The Travis CI build passed
continuous-integration/travis-ci/push: The Travis CI build passed
coverage/coveralls: Coverage increased (+0.05%) to 95.0%

@PragTob PragTob deleted the measure-memory-3 branch Feb 8, 2018
