
First shot at an appveyor configuration #195

Merged
merged 3 commits into from Apr 4, 2018

Conversation

3 participants
@PragTob
Owner

PragTob commented Mar 12, 2018

Fixes #79

@PragTob PragTob force-pushed the appveyor branch from 4dc14e9 to 5d1ef2f Mar 12, 2018

@PragTob

Owner Author

PragTob commented Mar 12, 2018

It builds 🎉 🎉 🎉

But apparently it's super slow and hence a bunch of tests fail 😢

https://ci.appveyor.com/project/PragTob/benchee/build/build19

............................................................................................................................................................................................

  1) test .run_scenarios doesn't take longer than advertised for very fast funs (Benchee.Benchmark.RunnerTest)
     test/benchee/benchmark/runner_test.exs:162
     excution took too long 47000 vs. 30000 +- 1.2e4
     code: retrying(fn ->
     stacktrace:
       test/benchee/benchmark/runner_test.exs:163: (test)

..................................................................

  2) test .run_scenarios runs a benchmark suite and enriches it with measurements (Benchee.Benchmark.RunnerTest)
     test/benchee/benchmark/runner_test.exs:43
     Assertion with >= failed
     code:  assert length(run_times) >= 5
     left:  4
     right: 5
     stacktrace:
       test/benchee/benchmark/runner_test.exs:44: (test)

..........

  3) test .run_scenarios each triggers for every invocation, scenario once (Benchee.Benchmark.RunnerTest)
     test/benchee/benchmark/runner_test.exs:587
     Assertion with >= failed
     code:  assert hook_call_count >= 2
     left:  1
     right: 2
     stacktrace:
       test/benchee/benchmark/runner_test.exs:656: (test)

..........

  4) test formatters have full access to the suite data, values in assigns (BencheeTest)
     test/benchee_test.exs:231
     Assertion with =~ failed
     code:  assert output =~ ~r"Run time: #{@rough_10_milli_s}$"m
     left:  "Operating System: Windows\nCPU Information: Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz\nNumber of Available Cores: 2\nAvailable memory: 1.46 GB\nElixir 1.6.3\nErlang 20.2\nBenchmark suite executing with the following configuration:\nwarmup: 5 ms\ntime: 10 ms\nparallel: 1\ninputs: none specified\nEstimated total run time: 15 ms\n\n\nBenchmarking Sleeps...\nRun time: 16 ms\nAverage: 16 ms\nCustom value\n"
     right: ~r/Run time: ((8|9|10|11|12|13|14)\.\d{2} ms)$/m
     stacktrace:
       test/benchee_test.exs:261: anonymous fn/0 in BencheeTest."test formatters have full access to the suite data, values in assigns"/1
       test/benchee_test.exs:232: (test)

  5) test formatters can be supplied as the output/1 function (BencheeTest)
     test/benchee_test.exs:194
     Expected false or nil, got true
     code: refute Regex.match?(~r"fast"i, output)
     stacktrace:
       test/benchee_test.exs:506: BencheeTest.readme_sample_asserts/2
       test/benchee_test.exs:208: (test)

  6) test integration expanded README example (BencheeTest)
     test/benchee_test.exs:99
     Expected false or nil, got true
     code: refute Regex.match?(~r"fast"i, output)
     stacktrace:
       test/benchee_test.exs:506: BencheeTest.readme_sample_asserts/2
       test/benchee_test.exs:116: (test)

...

  7) test erlang style :benchee integration (BencheeTest)
     test/benchee_test.exs:84
     Expected false or nil, got true
     code: refute Regex.match?(~r"fast"i, output)
     stacktrace:
       test/benchee_test.exs:506: BencheeTest.readme_sample_asserts/2
       test/benchee_test.exs:96: (test)

.

  8) test formatters can be supplied as just the module name (BencheeTest)
     test/benchee_test.exs:177
     Expected false or nil, got true
     code: refute Regex.match?(~r"fast"i, output)
     stacktrace:
       test/benchee_test.exs:506: BencheeTest.readme_sample_asserts/2
       test/benchee_test.exs:191: (test)

.

  9) test integration keywordlist as options in second place (BencheeTest)
     test/benchee_test.exs:69
     Expected false or nil, got true
     code: refute Regex.match?(~r"fast"i, output)
     stacktrace:
       test/benchee_test.exs:506: BencheeTest.readme_sample_asserts/2
       test/benchee_test.exs:81: (test)

.

 10) test .run_scenarios populates results for all inputs (Benchee.Benchmark.RunnerTest)
     test/benchee/benchmark/runner_test.exs:265
     Assertion with >= failed
     code:  assert length(run_times_for(new_suite, "sleep", "Short wait")) >= 8
     left:  7
     right: 8
     stacktrace:
       test/benchee/benchmark/runner_test.exs:281: anonymous fn/0 in Benchee.Benchmark.RunnerTest."test .run_scenarios populates results for all inputs"/1
       test/benchee/benchmark/runner_test.exs:266: (test)

.....

 11) test inputs feature version of readme example (BencheeTest)
     test/benchee_test.exs:267
     Expected false or nil, got true
     code: refute Regex.match?(~r"fast"i, output)
     stacktrace:
       test/benchee_test.exs:506: BencheeTest.readme_sample_asserts/2
       test/benchee_test.exs:281: (test)

...

 12) test integration high level README example old school map config (BencheeTest)
     test/benchee_test.exs:53
     Expected false or nil, got true
     code: refute Regex.match?(~r"fast"i, output)
     stacktrace:
       test/benchee_test.exs:506: BencheeTest.readme_sample_asserts/2
       test/benchee_test.exs:66: (test)

...

 13) test .run_scenarios runs a suite with multiple jobs and gathers results (Benchee.Benchmark.RunnerTest)
     test/benchee/benchmark/runner_test.exs:60
     Assertion with >= failed
     code:  assert length(run_times_for(new_suite, "Name 2")) >= 8
     left:  7
     right: 8
     stacktrace:
       test/benchee/benchmark/runner_test.exs:61: (test)

..................

Finished in 10.1 seconds
106 doctests, 217 tests, 13 failures

For reference, even Elixir 1.4 with Erlang 18.3 on Travis took only 3.8 seconds...

@PragTob

Owner Author

PragTob commented Mar 12, 2018

So that leaves us at an ever dangerous crossroads:

  • make our performance tests even more lenient and/or add retries
  • tag performance-sensitive tests not to run on Windows CI and declare them good enough if they pass on Linux?

There's also some other stuff failing that's not entirely related to fast/slow timing - I'll have to take a look another day though :)

@devonestes

Collaborator

devonestes commented Mar 13, 2018

This is great! Well, great-ish.

I'm only seeing failures related to speed. Either the expected counts of run times don't match, or it's not printing the fast function warning, which we're expecting to see in the captured output.

As to a solution, I don't think we should make the existing performance tests more lenient. I think the two best options are:

  1. Excluding those performance tests from AppVeyor and just ignoring performance specifically for Windows, or
  2. Keeping a separate set of performance tests for AppVeyor with different expectations.

Personally, I think the cases where we would make a change that doesn't fail one of these performance tests on Linux but would make benchee noticeably slower on Windows are so rare that we don't need to worry about it. Frankly, I don't think it would ever happen, but I can't say that for sure.

That's why I'd lean towards option 1, tagging these tests to be skipped on Windows machines.
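Option 1 could be sketched roughly like this in ExUnit - this is not benchee's actual setup, and the `:performance` tag name is made up here for illustration:

```elixir
# test/test_helper.exs
# Skip performance-sensitive tests when the suite runs on Windows.
exclude =
  case :os.type() do
    {:win32, _} -> [performance: true]
    _other -> []
  end

ExUnit.start(exclude: exclude)

# In the test modules, the flaky-on-Windows tests would then be tagged:
#
#   @tag performance: true
#   test ".run_scenarios doesn't take longer than advertised for very fast funs" do
#     ...
#   end
```

The nice part of this approach is that the tests still run everywhere else, and a single `mix test --include performance` can force them back on locally on Windows when needed.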

@PragTob

Owner Author

PragTob commented Mar 13, 2018

The regex thing still has me puzzled... it refutes that "fast" occurs in the text. But it does occur... so are the functions unknowingly super fast, or does "fast" just occur somewhere else? To the debugger we go!

@PragTob

Owner Author

PragTob commented Mar 13, 2018

Omg... we actually print the fast warning... but WHY? It's slow in the measurement... it's weird:

3) test formatters can be supplied as just the module name (BencheeTest)
     test/benchee_test.exs:177
     Refute with =~ failed
     code:  refute output =~ ~r"fast"i
     left:  "Operating System: Windows\nCPU Information: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz\nNumber of Available Cores: 2\nAvailable memory: 1.46 GB\nElixir 1.6.3\nErlang 20.2\nBenchmark suite executing with the following configuration:\nwarmup: 5 ms\ntime: 10 ms\nparallel: 1\ninputs: none specified\nEstimated total run time: 30 ms\n\n\nBenchmarking flat_map...\nWarning: The function you are trying to benchmark is super fast, making measures more unreliable! See: https://github.com/PragTob/benchee/wiki/Benchee-Warnings#fast-execution-warning\n\nYou may disable this warning by passing print: [fast_warning: false] as configuration options.\n\nBenchmarking map.flatten...\n\nName                  ips        average  deviation         median         99th %\nflat_map              800        1.25 ms     ±0.00%        1.25 ms        1.25 ms\nmap.flatten         62.50          16 ms     ±0.00%          16 ms          16 ms\n\nComparison: \nflat_map              800\nmap.flatten         62.50 - 12.80x slower\n"
     right: ~r/fast/i
     stacktrace:
       test/benchee_test.exs:506: BencheeTest.readme_sample_asserts/2
       test/benchee_test.exs:191: (test)

..

(I mean, look at the average - it's milliseconds)

I have no idea. Hoping for insight from @OvermindDL1?

@OvermindDL1


OvermindDL1 commented Mar 19, 2018

Is it printed to stderr instead of stdout or so? (/me has not looked yet)

@PragTob

Owner Author

PragTob commented Mar 19, 2018

@OvermindDL1 The problem is that we print the warning at all, which shouldn't happen since the benchmark isn't that fast. I vaguely remembered something about the native time unit possibly changing, but couldn't find it again. Also, :timer.tc's doc says it returns the time in microseconds, so I don't see how our usually slower AppVeyor should suddenly be that much faster.

@PragTob

Owner Author

PragTob commented Mar 28, 2018

Good news! I'm able to relatively reliably reproduce all/most of these errors - my last test run also had 13 errors, including the very weird "fast" errors as well as the errors about being too slow. Mind you, this is the exact same machine I often develop on with Linux. So either Elixir/Erlang performance on Windows is that bad, or it's Windows itself ;P

Funnily enough, I also get the fast warning when I just run mix run samples/run.exs so something is definitely fishy here:

Operating System: Windows
CPU Information: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
Number of Available Cores: 8
Available memory: 15.94 GB
Elixir 1.6.4
Erlang 20.2
Benchmark suite executing with the following configuration:
warmup: 2 s
time: 10 s
parallel: 1
inputs: none specified
Estimated total run time: 24 s


Benchmarking flat_map...
Warning: The function you are trying to benchmark is super fast, making measurements more unreliable!
This holds especially true for memory measurements.
See: https://github.com/PragTob/benchee/wiki/Benchee-Warnings#fast-execution-warning

You may disable this warning by passing print: [fast_warning: false] as configuration options.

Benchmarking map.flatten...
Warning: The function you are trying to benchmark is super fast, making measurements more unreliable!
This holds especially true for memory measurements.
See: https://github.com/PragTob/benchee/wiki/Benchee-Warnings#fast-execution-warning

You may disable this warning by passing print: [fast_warning: false] as configuration options.


Name                  ips        average  deviation         median         99th %
flat_map           1.25 K        0.80 ms     ±6.33%        0.78 ms        0.94 ms
map.flatten        0.81 K        1.23 ms     ±4.00%        1.25 ms        1.25 ms

Comparison:
flat_map           1.25 K
map.flatten        0.81 K - 1.54x slower

(seems to be about half the speed of what it does on Linux)

@PragTob

Owner Author

PragTob commented Mar 28, 2018

Grumble mumble Windows grumble mumble... TL;DR: the code is right

19> Data1000 = lists:seq(1, 1000).
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
 23,24,25,26,27,28,29|...]
20> Data10000 = lists:seq(1, 10000).
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
 23,24,25,26,27,28,29|...]
21> Data100000 = lists:seq(1, 100000).
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
 23,24,25,26,27,28,29|...]
22> timer:tc(lists, map, [Succ, Data1000]).
{0,
 [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,
  24,25,26,27,28|...]}
23> timer:tc(lists, map, [Succ, Data10000]).
{15000,
 [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,
  24,25,26,27,28|...]}
24> timer:tc(lists, map, [Succ, Data100000]).
{77999,
 [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,
  24,25,26,27,28|...]}

Seems like the native time unit in Erlang on Windows isn't microseconds but milliseconds. That's why the time of fast functions doesn't register.

There's something on time correction in the Erlang docs - but before we all go mumbo jumbo - timer:tc/1 already does the right thing:

tc(F) ->
    T1 = erlang:monotonic_time(),
    Val = F(),
    T2 = erlang:monotonic_time(),
    Time = erlang:convert_time_unit(T2 - T1, native, microsecond),
    {Time, Val}.

source

--> we either gotta run more juicy benchmarks or exclude these tests or their specific checks on Windows. Sigh.

Side note: it's also documented in the Elixir System docs, where I feel it should be mentioned that you can also just use :timer.tc - I should send a patch and see what they say.

@devonestes

Collaborator

devonestes commented Mar 29, 2018

Woah! That's a pretty huge difference between Windows and Unix!!

Happy you got to the bottom of it! 🎉

@PragTob

Owner Author

PragTob commented Mar 30, 2018

Interestingly, the native unit on Linux seems to be nanoseconds. I believe I tried this out once - but probably didn't take monotonic time 🤦‍♂️

Might be worth another shot. It means we have to replace :timer.tc, but that's fine by me - we'll basically call out to the same functions it does, so we should be as fast as it is.

iex(11)> time = :erlang.monotonic_time()
-576460357193550789
iex(12)> :erlang.convert_time_unit(time, :native, :microsecond)
-576460357193551
iex(13)> :erlang.convert_time_unit(time, :native, :nanosecond)
-576460357193550789
iex(14)> :erts_internal.time_unit()
1000000000

edit: leaves the question whether we use the above workaround (converting times until the values are the same) or whether we're naughty, call :erts_internal and trust our CI to tell us if the Erlang folks want to remove it :D

%% As the module name imply, this module is here for ERTS internal
%% functionality. As an application programmer you should *never*
%% call anything in this module directly. Functions exported by
%% this module may change behaviour or be removed at any time
%% without any notice whatsoever. Everything in this module is
%% intentionally left undocumented, and should remain so.
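A simpler alternative to the convert-until-equal loop: `:erlang.convert_time_unit/3` can be asked directly how many native units make up one second, which should match the `:erts_internal.time_unit()` value above without touching the internal module (a sketch under that assumption, not benchee's actual code):

```elixir
# Native time units per second, derived from the public API only.
# 1_000 would mean millisecond resolution (the Windows case above),
# 1_000_000_000 nanosecond resolution (the Linux iex session above).
units_per_second = :erlang.convert_time_unit(1, :second, :native)
```

With that value in hand, raw `:erlang.monotonic_time/0` differences can be kept in native units and only converted for display.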
@OvermindDL1


OvermindDL1 commented Mar 30, 2018

Just tested here - Windows and Linux report VERY different time values. This really seems like a bug to me; maybe report it? At the very least, not reporting the temporal range of the return value is a bug.

PragTob added some commits Mar 13, 2018

Get better error messages for regex test failures
* =~ seems to print the text that was matched against while
  Regex.match? doesn't

@PragTob PragTob force-pushed the appveyor branch from faa6ef2 to 955d9af Apr 1, 2018

@PragTob

Owner Author

PragTob commented Apr 1, 2018

Excluded these tests or the problematic assertions and verified it on my Windows machine --> high hopes that this passes now :)

@OvermindDL1 What exactly do you mean by different values? You mean return values from timer.tc or monotonic time or what? But yeah I'm thinking of reporting a couple of things:

  • a timer:tc variant/new function that always reports in the native time unit but also returns what that native time unit is
  • an interface to maybe access the native time unit of the running erlang vm (there's a workaround I can think of and I don't like it...)
  • can we maybe have better time resolution than milliseconds on Windows?

Not sure how the OTP team rolls, but I want to report these one of these days :)
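The first bullet could be prototyped on top of the public API while waiting on OTP - a hypothetical `tc_native/1` (module and function names are made up here, not benchee's implementation):

```elixir
defmodule TimeSketch do
  # Like :timer.tc/1, but keeps the elapsed time in the VM's native
  # time unit and returns how many of those units fit into one second,
  # so callers can judge the measurement's resolution themselves.
  def tc_native(fun) do
    t1 = :erlang.monotonic_time()
    val = fun.()
    t2 = :erlang.monotonic_time()
    units_per_second = :erlang.convert_time_unit(1, :second, :native)
    {t2 - t1, units_per_second, val}
  end
end

# Usage sketch:
# {elapsed, resolution, _result} =
#   TimeSketch.tc_native(fn -> Enum.map(1..1_000, &(&1 + 1)) end)
```

On a millisecond-resolution VM like the Windows boxes in this thread, `resolution` would come back as 1_000, flagging that `elapsed` can only ever be a multiple of one millisecond.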

@PragTob

Owner Author

PragTob commented Apr 2, 2018

It passed 💃

@OvermindDL1


OvermindDL1 commented Apr 2, 2018

For grabbing time, like monotonic time. The extra accuracy on Linux is welcome, but we need to know what that accuracy is. ^.^;

Thus I'd think an extra function to get the timing accuracy of the BEAM VM on the current system would be both very nice and backwards compatible.

@devonestes
Collaborator

devonestes left a comment

This all looks good to me!! I'm happy we got to the bottom of why that's happening (and that you included that history in comments 😉)

@PragTob

Owner Author

PragTob commented Apr 4, 2018

:shipit:

@PragTob PragTob merged commit 250bd0f into master Apr 4, 2018

5 checks passed

  • continuous-integration/appveyor/branch: AppVeyor build succeeded
  • continuous-integration/appveyor/pr: AppVeyor build succeeded
  • continuous-integration/travis-ci/pr: The Travis CI build passed
  • continuous-integration/travis-ci/push: The Travis CI build passed
  • coverage/coveralls: Coverage remained the same at 94.216%

@PragTob PragTob deleted the appveyor branch Apr 4, 2018
