
Measure with best available precision #219

Merged
merged 9 commits into master from nano-seconds on May 21, 2018

Conversation

@PragTob (Member) commented May 10, 2018

Fixes #207

  • Introduce a separate measurer to use during warmup which uses native time (see the sketch below)
  • add nanoseconds to our list of available units
  • change up the display of times in places
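
For the curious, a minimal sketch of what such a native-time measurer boils down to. The module name matches the diff further down, but the exact shape here is my reconstruction, not the literal code:

defmodule Benchee.Benchmark.Measure.NativeTime do
  # Measures elapsed time in the VM's native time unit via
  # :erlang.monotonic_time/0, skipping the conversion to microseconds
  # that :timer.tc/1 always performs.
  def measure(function) do
    start = :erlang.monotonic_time()
    result = function.()
    finish = :erlang.monotonic_time()
    {finish - start, result}
  end
end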

* native time measurer for determining n_times
* added nanosecond unit, unit tests pass there

PICK ME UP:
* Integration tests failing
* running fast_functions seems to break as well
@PragTob (Member, Author) commented May 10, 2018

Fast functions are still fast, but they no longer trigger a fast warning on nanosecond systems, so we had to tag a couple of tests to not execute anymore. Their times still don't differ by much; I blame compiler optimization - they all basically return static values.

tobi@speedy:~/github/benchee(nano-seconds)$ mix run samples/fast_functions.exs 
Operating System: Linux"
CPU Information: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
Number of Available Cores: 8
Available memory: 15.61 GB
Elixir 1.6.4
Erlang 20.3

Benchmark suite executing with the following configuration:
warmup: 1 s
time: 1 s
memory time: 1 s
parallel: 1
inputs: none specified
Estimated total run time: 18 s


Benchmarking ++ array concat...
Benchmarking Enum.map(10)...
Benchmarking Integer addition...
Benchmarking String concatention...
Benchmarking adding a head to an array...
Benchmarking noop...

Name                                ips        average  deviation         median         99th %
String concatention             12.01 M       83.25 ns    ±39.91%          81 ns         115 ns
noop                            12.01 M       83.29 ns    ±38.90%          81 ns         108 ns
adding a head to an array       11.97 M       83.57 ns    ±33.53%          81 ns         123 ns
Integer addition                11.91 M       83.95 ns    ±52.59%          82 ns         117 ns
++ array concat                 11.69 M       85.51 ns   ±119.00%          81 ns         127 ns
Enum.map(10)                     1.87 M      533.42 ns  ±1974.45%         443 ns      876.06 ns

Comparison: 
String concatention             12.01 M
noop                            12.01 M - 1.00x slower
adding a head to an array       11.97 M - 1.00x slower
Integer addition                11.91 M - 1.01x slower
++ array concat                 11.69 M - 1.03x slower
Enum.map(10)                     1.87 M - 6.41x slower

Memory usage statistics:

Name                         Memory usage
String concatention                 616 B
noop                                616 B - 1.00x memory usage
adding a head to an array           616 B - 1.00x memory usage
Integer addition                    616 B - 1.00x memory usage
++ array concat                     616 B - 1.00x memory usage
Enum.map(10)                        424 B - 0.69x memory usage

Normal functions behave normally:

tobi@speedy:~/github/benchee(nano-seconds)$ mix run samples/run.exs 
Operating System: Linux"
CPU Information: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
Number of Available Cores: 8
Available memory: 15.61 GB
Elixir 1.6.4
Erlang 20.3

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 10 s
memory time: 2 s
parallel: 1
inputs: none specified
Estimated total run time: 28 s


Benchmarking flat_map...
Benchmarking map.flatten...

Name                  ips        average  deviation         median         99th %
flat_map           2.26 K      443.23 μs    ±13.60%      430.59 μs      816.87 μs
map.flatten        1.20 K      830.05 μs    ±18.35%      783.26 μs     1305.12 μs

Comparison: 
flat_map           2.26 K
map.flatten        1.20 K - 1.87x slower

Memory usage statistics:

Name           Memory usage
flat_map          625.54 KB
map.flatten       781.85 KB - 1.25x memory usage

**All measurements for memory usage were the same**

@PragTob (Member, Author) commented May 10, 2018

Ah jeez, coveralls is failing now because all the repetition code in the runner is now untested, as only the Linux builds report to it :'(

We could adjust the coveralls maximum coverage drop, or extract the code and test it in more isolation.

* RepeatedMeasurement to unit test it and not pollute the general
runner with this very specific code for runtime measurements
that's only relevant on Windows these days, it seems
* Hooks code in its own module to not pollute and to reuse it from both
places
* made `measure` public on the runner... not 100% happy with this,
but putting it into yet another module seems like overhead
for now.
* temporarily duplicated `main_function` code
@PragTob (Member, Author) commented May 11, 2018

Ok this should start passing now :)

@PragTob (Member, Author) left a comment

Don't be scared by the big diff please - this is a lot of moving code around and simple changes :)

@@ -4,7 +4,6 @@ elixir:
- 1.5.3
- 1.6.4
otp_release:
- 18.3
@PragTob (Member, Author):

Scaling native time to nanoseconds threw an error on OTP 18, so I'm inclined to believe it doesn't support it. As newer Elixir versions also drop OTP 18 support, I thought it was OK for us to do so as well.
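
For illustration, the conversion in question presumably boils down to a call like this (my reconstruction, not the literal diff; if I remember correctly, OTP 18 only knows the since-deprecated plural unit atoms, so the singular :nanosecond raises an error there):

native = :erlang.monotonic_time()

# works on OTP >= 19
:erlang.convert_time_unit(native, :native, :nanosecond)

# OTP 18 only accepts the plural spelling
:erlang.convert_time_unit(native, :native, :nano_seconds)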

This module actually runs our benchmark scenarios, adding information about
run time and memory usage to each scenario.
"""
@moduledoc false
@PragTob (Member, Author):

Going towards 1.0, I think we should also be more vigilant about what we want to be official API vs. our internal stuff.

I also mainly did this because I made some functions here public to be used from our repetition code.

@devonestes (Collaborator):

👍

end
@no_input Benchmark.no_input()
def main_function(function, @no_input), do: function
def main_function(function, input), do: fn -> function.(input) end
@PragTob (Member, Author):

I had hoped to get rid of this altogether by saving it in the ScenarioContext - something I've wanted to do for a long time, as I also fear this might introduce performance overhead / stand in the way of optimizations.

However, as hooks can change the input, it's not that easy. Now that we have our own measurers, we could just call them with an argument if we feel like it and implement them like this. I'd wanna leave that for another day though :)
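
To illustrate "call them with an argument": a hypothetical measurer (not part of this PR) could accept the input directly and let :timer.tc/2 apply it, removing the need for the closure:

defmodule Measure.TimeWithInput do
  # Hypothetical sketch: :timer.tc/2 applies `function` to the argument
  # list and returns {microseconds, return_value}, so no wrapping
  # fn -> function.(input) end closure is needed.
  def measure(function, input) do
    :timer.tc(function, [input])
  end
end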

end)
end)
end

@PragTob (Member, Author):

This test was always a flaky candidate, and it started failing consistently with more than 3 std_dev, so I don't think it's worth keeping anymore.

},
measurer
) do
Runner.measure(scenario, scenario_context, measurer)
@PragTob (Member, Author):

Not a fan of this personally. It's a circular dependency (Runner depends on us, we depend on Runner), and if things go very wrong we might even trigger an endless loop.

Extracting the function to another module might be the better call, but then it'd (likely) only have that one function, and I had trouble coming up with a name for it - which suggests that I haven't found the higher-level concept yet, or that it really wouldn't be an extraction of a higher-level concept.

@devonestes (Collaborator):

I think the thing we're seeing here is that these functions are really part of the Runner process, which, honestly, they are. I think that's the higher concept this dependency graph is trying to tell us. But I also don't think it's a problem to split this up as it is now.

@PragTob (Member, Author) commented May 11, 2018

Rebuilding AppVeyor - it seems to have problems accessing hex.pm.

@PragTob (Member, Author) commented May 12, 2018

OK, I wrote the beginning of a little benchmark I wanted to do for board games and compared this branch against master (yes, I know lists are bad for this; the end goal is to compare them vs. maps, tuples, and records):

defmodule BoardBuilder do
  def list(size) do
    range = 1..size
    Enum.map(range, fn _ ->
      Enum.map(range, fn _ -> :value end)
    end)
  end
end

list = BoardBuilder.list(19)

Benchee.run(
  %{
    "get(0, 0)" => fn -> list |> Enum.at(0) |> Enum.at(0) end,
    "get(max, max)" => fn -> list |> Enum.at(18) |> Enum.at(18) end
  }
)

this branch:

Name                    ips        average  deviation         median         99th %
get(0, 0)            5.86 M      170.62 ns ±11344.34%         150 ns         226 ns
get(max, max)        2.08 M      481.84 ns  ±2347.39%         459 ns         693 ns

Comparison: 
get(0, 0)            5.86 M
get(max, max)        2.08 M - 2.82x slower

master:

# fast warning with, I think, even 100 repetitions

Name                    ips        average  deviation         median         99th %
get(0, 0)           12.25 M      0.0816 μs   ±292.68%      0.0800 μs       0.130 μs
get(max, max)        2.62 M        0.38 μs     ±9.06%        0.38 μs        0.44 μs

Comparison: 
get(0, 0)           12.25 M
get(max, max)        2.62 M - 4.68x slower

so I decided to put in some manual repetition:

defmodule BoardBuilder do
  def list(size) do
    range = 1..size
    Enum.map(range, fn _ ->
      Enum.map(range, fn _ -> :value end)
    end)
  end
end

alias Benchee.Utility.RepeatN

list = BoardBuilder.list(19)

Benchee.run(
  %{
    "get(0, 0)" => fn -> RepeatN.repeat_n(fn -> list |> Enum.at(0) |> Enum.at(0) end, 100) end,
    "get(max, max)" => fn -> RepeatN.repeat_n(fn -> list |> Enum.at(18) |> Enum.at(18) end, 100) end
  }
)

Then my results are about the same as on master but more precise:

Name                    ips        average  deviation         median         99th %
get(0, 0)          119.71 K        8.35 μs    ±89.99%        8.25 μs       10.35 μs
get(max, max)       26.03 K       38.42 μs     ±8.97%       38.23 μs       40.53 μs

Comparison: 
get(0, 0)          119.71 K
get(max, max)       26.03 K - 4.60x slower

It makes it seem to me like the overhead from the function call is bigger than the one from the repetition - i.e. I feel that the repeated samples are somewhat more realistic 🤔
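
For reference, RepeatN.repeat_n/2 presumably amounts to a simple recursive loop along these lines (a sketch, not the verbatim module):

defmodule RepeatN do
  # Call the function `count` times, discarding its return value; the
  # point is only to stretch one measured span over many invocations.
  def repeat_n(_function, 0), do: :ok

  def repeat_n(function, count) do
    function.()
    repeat_n(function, count - 1)
  end
end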

Now, if I set an empty function call as the baseline, we get these results:

Name                    ips        average  deviation         median         99th %
fn -> nil end       12.62 M       79.22 ns    ±34.72%          78 ns         103 ns
get(0, 0)            5.48 M      182.34 ns ±10699.05%         163 ns         263 ns
get(max, max)        2.04 M      489.85 ns  ±2556.78%         461 ns         698 ns

Comparison: 
fn -> nil end       12.62 M
get(0, 0)            5.48 M - 2.30x slower
get(max, max)        2.04 M - 6.18x slower

If I subtract the empty average, we end up with 103 ns and 410 ns, which says that get(max, max) is about 4 times slower than my get(0, 0). If I do the same using the median (flipping the bird to outliers), we get 85 ns and 383 ns respectively, arriving at ~4.5x slower.
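
Spelled out with the numbers from above:

# averages minus the ~79.22 ns empty-call baseline
(489.85 - 79.22) / (182.34 - 79.22)  # => ~3.98, so about 4x
# medians minus the 78 ns empty-call baseline
(461 - 78) / (163 - 78)              # => ~4.51, so about 4.5x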

So now the question is - which one of these values is the most trustworthy?

  • 2.8 (original nanosecond branch)
  • 4.0 (nanosecond branch trying to account for function call overhead with average)
  • ~4.5-4.7 (master and manual repetition on nanosecond, nanosecond branch accounting for overhead using median)

@devonestes (Collaborator) left a comment

I think we're basically there. Just a couple little questions!


alias Benchee.Benchmark.{Scenario, ScenarioContext}

def run_before_scenario(
@devonestes (Collaborator):

So, a formatting thing. I find these really tricky to read, and one rule they're using in the Elixir codebase itself is basically: "if you don't need to pattern match in the function head, don't - unless it fits on one line." Do you think you'd be cool with adopting a similar heuristic for benchee? This function would then end up looking something like this:

  def run_before_scenario(scenario, scenario_context) do
    %Scenario{
      before_scenario: local_before_scenario,
      input: input
    } = scenario

    %ScenarioContext{
      config: %{before_scenario: global_before_scenario}
    } = scenario_context

    input
    |> run_before_function(global_before_scenario)
    |> run_before_function(local_before_scenario)
  end

What do you think?

@PragTob (Member, Author):

I've heard/seen this rule before... and I'm not sure. I sort of like just pattern matching on everything in the function "head" so that it's easy to see at first glance what data the function needs.

Although, to be fair, in your version it's fairly easy as well, as the matches just move to the first couple of lines. I'll think on it, but yeah, it's probably a good idea.

@devonestes (Collaborator):

Well, I also like that in this case, when you see pattern matching in a function, you know it's needed for functionality. Like, I expect to see multiple function heads if I see pattern matching in a function head now.

It's just a personal thing that I've sort of gotten used to. I'm sure I'd get used to the opposite with enough time as well 😉

@PragTob (Member, Author):

I propose we postpone this to a separate PR/discussion so as not to hold this one up. I think I'll cook up a topic on ElixirForum to see what the feeling in the community is :)

@devonestes (Collaborator):

Yeah, totally fine! We can discuss more later.

@@ -0,0 +1,18 @@
defmodule Benchee.Benchmark.Measure.NativeTime do
@moduledoc """
@devonestes (Collaborator):

If we're thinking about what should be public and private API now, I'm not sure this should be public. The only reasons I can think of for making something public are that users need it to write their benchmarks or that formatters need it, and I don't think users or formatters will need these modules, right?

@PragTob (Member, Author):

They shouldn't, right 👍

@@ -1,11 +1,23 @@
defmodule Benchee.Benchmark.Measure.Time do
@moduledoc """
Measure the time consumed by a executing function.
Measure the time elapsed while executing a given function.
@devonestes (Collaborator):

Should this be private API as well?

@PragTob (Member, Author):

Here I'm... a bit torn. It's nothing I expect 99% of benchee users to ever need - so probably not. On the other hand, it can contain documentation on how benchee conducts its measurements, which might be interesting to the more inclined reader - and also serve as an example if someone wants to implement their own measurer.
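
For reference, the contract such a custom measurer would have to satisfy is presumably a single callback along these lines (my sketch, inferred from the Time and NativeTime modules in this PR):

defmodule Benchee.Benchmark.Measure do
  # A measurer takes a 0-arity function and returns the measurement
  # (run time or memory) together with the function's return value.
  @callback measure((() -> any)) :: {non_neg_integer, any}
end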

@devonestes (Collaborator):

We don't allow people to implement their own measurer yet - is that something you see coming in the future? That strikes me as a thing we might not want to let people do, since you've put a lot of time and effort into making sure measurements are taken in the best way possible.

And we can still document inline with # comments to show how it's done if folks are curious. It's a little tricky, since José has said that the only way a library can mark something as private API is to not document it, so we're sort of without options here. My preference is usually to make as much as possible private, and then later on make things public if a good reason comes up to do so.

@PragTob (Member, Author):

You're right, thanks Devon. Will change it!

# Mac OS and we've failed to produce a measurable function call that takes less than 10 nano
# seconds.
#
# That's also why this code lives in a separate module and not `Runner` - as it's rarely used
@devonestes (Collaborator):

Thanks for the helpful documentation!

@PragTob merged commit e885f6b into master on May 21, 2018
@PragTob deleted the nano-seconds branch on May 21, 2018 18:40