
BUG: change default timer to timeit.default_timer to fix resolution on Windows #780

Merged (1 commit) Dec 20, 2018

Conversation

@qwhelan (Contributor) commented Dec 17, 2018

This PR closes #775 in two different ways:

  • The clock used to determine how many runs are needed to reach sample_time is time.time(), which has a resolution of 15.6 ms on Windows. This introduces substantial error into the iteration scaling, and that measurement should be wall time in any case.
  • Additionally, we change the default for Benchmark.timer to timeit.default_timer, which fixes the poor benchmark resolution on Windows.
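As background (an illustrative check, not part of the PR): on Python 3, timeit.default_timer is simply time.perf_counter, a high-resolution wall clock, so its effective tick can be estimated and compared against the ~15.6 ms granularity described above. The min_tick helper below is a hypothetical name, not asv code:

```python
import time
import timeit

# On Python 3, timeit.default_timer is time.perf_counter (per the timeit docs),
# so switching asv's default timer yields a high-resolution wall clock.
assert timeit.default_timer is time.perf_counter

def min_tick(clock, trials=100):
    """Estimate the smallest observable increment of `clock`
    (illustrative helper, not part of asv)."""
    deltas = []
    for _ in range(trials):
        t0 = clock()
        t1 = clock()
        while t1 == t0:  # spin until the clock visibly advances
            t1 = clock()
        deltas.append(t1 - t0)
    return min(deltas)

# perf_counter typically advances in well under a microsecond;
# on Windows, time.time() advances in ~15.6 ms steps.
print(min_tick(time.perf_counter))
```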

To see proof of the latter, we can run the following:

asv run HEAD^..HEAD --record-samples

When testing this change, be careful not to use asv run to iterate through commits, as it uses asv/benchmark.py from HEAD. Instead, step through the commits with git and run the command above at each step.

For our baseline, we see the following output in asv's own benchmarks:

    "results": {
        "step_detect.Simple.time_detect_regressions": {
            "result": [
                0.125
            ],
            "samples": [
                [
                    0.125,
                    0.125,
                    0.125,
                    0.125,
                    0.125,
                    0.125,
                    0.125,
                    0.125,
                    0.125,
                    0.109375
                ]
            ],
            "stats": [
                {
                    "ci_99": [
                        0.109375,
                        0.125
                    ],
                    "max": 0.125,
                    "mean": 0.1234375,
                    "min": 0.109375,
                    "number": 1,
                    "q_25": 0.125,
                    "q_75": 0.125,
                    "repeat": 10,
                    "std": 0.0046875
                }
            ]
        },
        "step_detect.Simple.time_solve_potts_approx": {
            "result": [
                0.015625
            ],
            "samples": [
                [
                    0.015625,
                    0.0,
                    0.015625,
                    0.015625,
                    0.015625,
                    0.015625,
                    0.015625,
                    0.0,
                    0.015625,
                    0.015625
                ]
            ],
            "stats": [
                {
                    "ci_99": [
                        0.0,
                        0.015625
                    ],
                    "max": 0.015625,
                    "mean": 0.0125,
                    "min": 0.0,
                    "number": 1,
                    "q_25": 0.015625,
                    "q_75": 0.015625,
                    "repeat": 10,
                    "std": 0.006250000000000001
                }
            ]
        }
    },

As you can see, the quantization of process_time() on Windows is very clear.
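The quantization is exact: every baseline sample for time_detect_regressions is an integer multiple of the 1/64 s (15.625 ms) Windows timer tick. A quick check, with the sample values copied from the output above:

```python
# Every baseline sample is an exact multiple of the Windows timer tick.
tick = 1 / 64  # 15.625 ms, the default Windows timer interrupt period
samples = [0.125] * 9 + [0.109375]  # time_detect_regressions samples above
multiples = sorted({s / tick for s in samples})
assert all(m.is_integer() for m in multiples)
print(multiples)  # → [7.0, 8.0]
```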

And now with this PR:

    "results": {
        "step_detect.Simple.time_detect_regressions": {
            "result": [
                0.1247326135635376
            ],
            "samples": [
                [
                    0.12278103828430176,
                    0.1226201057434082,
                    0.13060903549194336,
                    0.1316690444946289,
                    0.1291341781616211,
                    0.12431907653808594,
                    0.1271800994873047,
                    0.12331914901733398,
                    0.1220710277557373,
                    0.12514615058898926
                ]
            ],
            "stats": [
                {
                    "ci_99": [
                        0.1220710277557373,
                        0.1316690444946289
                    ],
                    "max": 0.1316690444946289,
                    "mean": 0.12588489055633545,
                    "min": 0.1220710277557373,
                    "number": 1,
                    "q_25": 0.12291556596755981,
                    "q_75": 0.128645658493042,
                    "repeat": 10,
                    "std": 0.003353031858757166
                }
            ]
        },
        "step_detect.Simple.time_solve_potts_approx": {
            "result": [
                0.011478066444396973
            ],
            "samples": [
                [
                    0.011333942413330078,
                    0.011632919311523438,
                    0.011541128158569336,
                    0.011580944061279297,
                    0.011635065078735352,
                    0.01182699203491211,
                    0.011229991912841797,
                    0.011066913604736328,
                    0.011046171188354492,
                    0.01141500473022461
                ]
            ],
            "stats": [
                {
                    "ci_99": [
                        0.011046171188354492,
                        0.01182699203491211
                    ],
                    "max": 0.01182699203491211,
                    "mean": 0.011430907249450683,
                    "min": 0.011046171188354492,
                    "number": 1,
                    "q_25": 0.011255979537963867,
                    "q_75": 0.011619925498962402,
                    "repeat": 10,
                    "std": 0.0002458795224520173
                }
            ]
        }
    },

In both cases, the standard deviation has decreased; for time_solve_potts_approx it is reduced by 25x. The effect would be even more pronounced if the mean did not happen to fall so close to the process_time() quantum.
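The 25x figure can be reproduced directly from the time_solve_potts_approx samples above; the reported "std" matches the population standard deviation of the samples:

```python
import statistics

# Sample lists copied verbatim from the benchmark output above.
baseline = [0.015625, 0.0, 0.015625, 0.015625, 0.015625,
            0.015625, 0.015625, 0.0, 0.015625, 0.015625]
patched = [0.011333942413330078, 0.011632919311523438, 0.011541128158569336,
           0.011580944061279297, 0.011635065078735352, 0.01182699203491211,
           0.011229991912841797, 0.011066913604736328, 0.011046171188354492,
           0.01141500473022461]

# pstdev reproduces the "std" fields reported above (0.00625 vs ~0.000246).
ratio = statistics.pstdev(baseline) / statistics.pstdev(patched)
print(round(ratio, 1))  # → 25.4
```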

@h-vetinari (Contributor) left a comment:
Haven't had time to test it yet, but LGTM aside from two things I don't understand.

asv/benchmark.py (outdated; resolved)
@@ -17,4 +17,4 @@ def time_detect_regressions(self):
         step_detect.detect_regressions(self.y)

     def time_solve_potts_approx(self):
-        step_detect.solve_potts_approx(self.y, 0.3, p=1)
+        step_detect.solve_potts_approx(self.y, [0.3] * len(self.y), gamma=1)
Contributor:
I don't understand how that change relates to the PR.

@qwhelan (Contributor, author) replied:
This is actually #779, which came about from my wanting to use asv's own suite to demonstrate this change, but the one benchmark in the vicinity of 15.6 ms was broken due to a typo. The fix is necessary to reproduce the results above, but I will remove it prior to merge.

CHANGES.rst (outdated; resolved)
docs/source/benchmarks.rst (outdated; resolved)
@pv (Collaborator) commented Dec 17, 2018

Looks ok, modulo the comments above.
Not sure how I'd write tests for this.
The CI failures seem to be due to some changes in Conda; I'm not going to look at fixing that in the immediate future, so they should be ignored.

@qwhelan (Contributor, author) commented Dec 17, 2018

@pv All changes implemented.

As for testing, a simple regression test could examine the samples output from a test like:

def test_pass():
    pass

For the baseline, we get:

            "samples": [
                [
                    1.939644470927057e-07,
                    1.939644470927057e-07,
                    0.0,
                    1.939644470927057e-07,
                    1.939644470927057e-07,
                    1.939644470927057e-07,
                    1.939644470927057e-07,
                    0.0,
                    1.939644470927057e-07,
                    1.939644470927057e-07
                ]
            ],

And for this PR:

            "samples": [
                [
                    1.2104441769390362e-07,
                    1.1922779948882427e-07,
                    1.2616902625495711e-07,
                    1.2339782410223055e-07,
                    1.2387379393762252e-07,
                    1.2318628195316746e-07,
                    1.2182183509171048e-07,
                    1.2359878914384048e-07,
                    1.2223434228238353e-07,
                    1.2985250392551827e-07
                ]
            ],

Simply looking for zeros (or values below the clock resolution) seems like it would be sufficient, albeit stochastic and much less reliable on slower machines like cloud VMs.
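A minimal sketch of that regression test (function names and sample counts are illustrative, not asv's internals): time a trivial statement with timeit.default_timer and fail if any sample collapses to zero, which would indicate the measured runtime fell below the timer's resolution.

```python
import timeit

def collect_samples(n_samples=10, number=1000):
    """Time a no-op `number` times per sample; return per-loop durations."""
    timer = timeit.Timer(stmt="pass", timer=timeit.default_timer)
    return [timer.timeit(number=number) / number for _ in range(n_samples)]

def test_timer_resolution():
    # With a high-resolution timer, no sample should quantize down to 0.0
    # (with time.process_time on Windows, zeros appear, as shown above).
    samples = collect_samples()
    assert all(s > 0.0 for s in samples)

test_timer_resolution()
```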

@pv pv merged commit 58f5396 into airspeed-velocity:master Dec 20, 2018
@pv pv added this to the 0.4 milestone Dec 20, 2018

Successfully merging this pull request may close these issues.

Default sample_time < clock resolution on Windows, leading to very noisy results
3 participants