
BUG: change default timer to timeit.default_timer to fix resolution on Windows #780

Merged (1 commit) Dec 20, 2018

Conversation

@qwhelan (Contributor) commented Dec 17, 2018

This PR closes #775 in two different ways:

  • The clock used to determine how many runs are needed to reach sample_time is time.time(), which has a resolution of 15.6 ms on Windows. This introduces substantial error into the iteration scaling, and that measurement should be wall time in any case.
  • Additionally, we change the default for Benchmark.timer to timeit.default_timer, which fixes the poor benchmark resolution on Windows.
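As background (an illustrative check, not part of the PR): on Python 3, timeit.default_timer is simply time.perf_counter, a high-resolution wall clock, so its effective tick can be estimated and compared against the ~15.6 ms granularity described above. The min_tick helper below is a hypothetical name, not asv code:

```python
import time
import timeit

# On Python 3, timeit.default_timer is time.perf_counter (per the timeit docs),
# so switching asv's default timer yields a high-resolution wall clock.
assert timeit.default_timer is time.perf_counter

def min_tick(clock, trials=100):
    """Estimate the smallest observable increment of `clock`
    (illustrative helper, not part of asv)."""
    deltas = []
    for _ in range(trials):
        t0 = clock()
        t1 = clock()
        while t1 == t0:  # spin until the clock visibly advances
            t1 = clock()
        deltas.append(t1 - t0)
    return min(deltas)

# perf_counter typically advances in well under a microsecond;
# on Windows, time.time() advances in ~15.6 ms steps.
print(min_tick(time.perf_counter))
```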

To see proof of the latter, we can run the following:

asv run HEAD^..HEAD --record-samples

When testing this change, be careful not to use asv run to iterate through commits, as it uses asv/benchmark.py from HEAD. Instead, step through the commits with git and run the command above at each step.

For our baseline, we see the following output in asv's own benchmarks:

    "results": {
        "step_detect.Simple.time_detect_regressions": {
            "result": [
                0.125
            ],
            "samples": [
                [
                    0.125,
                    0.125,
                    0.125,
                    0.125,
                    0.125,
                    0.125,
                    0.125,
                    0.125,
                    0.125,
                    0.109375
                ]
            ],
            "stats": [
                {
                    "ci_99": [
                        0.109375,
                        0.125
                    ],
                    "max": 0.125,
                    "mean": 0.1234375,
                    "min": 0.109375,
                    "number": 1,
                    "q_25": 0.125,
                    "q_75": 0.125,
                    "repeat": 10,
                    "std": 0.0046875
                }
            ]
        },
        "step_detect.Simple.time_solve_potts_approx": {
            "result": [
                0.015625
            ],
            "samples": [
                [
                    0.015625,
                    0.0,
                    0.015625,
                    0.015625,
                    0.015625,
                    0.015625,
                    0.015625,
                    0.0,
                    0.015625,
                    0.015625
                ]
            ],
            "stats": [
                {
                    "ci_99": [
                        0.0,
                        0.015625
                    ],
                    "max": 0.015625,
                    "mean": 0.0125,
                    "min": 0.0,
                    "number": 1,
                    "q_25": 0.015625,
                    "q_75": 0.015625,
                    "repeat": 10,
                    "std": 0.006250000000000001
                }
            ]
        }
    },

As you can see, the quantization of process_time() on Windows is very clear.
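The quantization is exact: every baseline sample for time_detect_regressions is an integer multiple of the 1/64 s (15.625 ms) Windows timer tick. A quick check, with the sample values copied from the output above:

```python
# Every baseline sample is an exact multiple of the Windows timer tick.
tick = 1 / 64  # 15.625 ms, the default Windows timer interrupt period
samples = [0.125] * 9 + [0.109375]  # time_detect_regressions samples above
multiples = sorted({s / tick for s in samples})
assert all(m.is_integer() for m in multiples)
print(multiples)  # → [7.0, 8.0]
```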

And now with this PR:

    "results": {
        "step_detect.Simple.time_detect_regressions": {
            "result": [
                0.1247326135635376
            ],
            "samples": [
                [
                    0.12278103828430176,
                    0.1226201057434082,
                    0.13060903549194336,
                    0.1316690444946289,
                    0.1291341781616211,
                    0.12431907653808594,
                    0.1271800994873047,
                    0.12331914901733398,
                    0.1220710277557373,
                    0.12514615058898926
                ]
            ],
            "stats": [
                {
                    "ci_99": [
                        0.1220710277557373,
                        0.1316690444946289
                    ],
                    "max": 0.1316690444946289,
                    "mean": 0.12588489055633545,
                    "min": 0.1220710277557373,
                    "number": 1,
                    "q_25": 0.12291556596755981,
                    "q_75": 0.128645658493042,
                    "repeat": 10,
                    "std": 0.003353031858757166
                }
            ]
        },
        "step_detect.Simple.time_solve_potts_approx": {
            "result": [
                0.011478066444396973
            ],
            "samples": [
                [
                    0.011333942413330078,
                    0.011632919311523438,
                    0.011541128158569336,
                    0.011580944061279297,
                    0.011635065078735352,
                    0.01182699203491211,
                    0.011229991912841797,
                    0.011066913604736328,
                    0.011046171188354492,
                    0.01141500473022461
                ]
            ],
            "stats": [
                {
                    "ci_99": [
                        0.011046171188354492,
                        0.01182699203491211
                    ],
                    "max": 0.01182699203491211,
                    "mean": 0.011430907249450683,
                    "min": 0.011046171188354492,
                    "number": 1,
                    "q_25": 0.011255979537963867,
                    "q_75": 0.011619925498962402,
                    "repeat": 10,
                    "std": 0.0002458795224520173
                }
            ]
        }
    },

In both cases, the standard deviation has decreased; for time_solve_potts_approx it is reduced by 25x. The effect would be even more pronounced if the mean did not happen to fall so close to the process_time() quantum.
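The 25x figure can be reproduced directly from the time_solve_potts_approx samples above; the reported "std" matches the population standard deviation of the samples:

```python
import statistics

# Sample lists copied verbatim from the benchmark output above.
baseline = [0.015625, 0.0, 0.015625, 0.015625, 0.015625,
            0.015625, 0.015625, 0.0, 0.015625, 0.015625]
patched = [0.011333942413330078, 0.011632919311523438, 0.011541128158569336,
           0.011580944061279297, 0.011635065078735352, 0.01182699203491211,
           0.011229991912841797, 0.011066913604736328, 0.011046171188354492,
           0.01141500473022461]

# pstdev reproduces the "std" fields reported above (0.00625 vs ~0.000246).
ratio = statistics.pstdev(baseline) / statistics.pstdev(patched)
print(round(ratio, 1))  # → 25.4
```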

@h-vetinari (Contributor) left a comment:
Haven't had time to test it yet, but LGTM aside from two things I don't understand.

asv/benchmark.py (outdated; resolved)
@@ -17,4 +17,4 @@ def time_detect_regressions(self):
         step_detect.detect_regressions(self.y)

     def time_solve_potts_approx(self):
-        step_detect.solve_potts_approx(self.y, 0.3, p=1)
+        step_detect.solve_potts_approx(self.y, [0.3] * len(self.y), gamma=1)
Contributor:
I don't understand how that change relates to the PR.

@qwhelan (Contributor, author) replied:
This is actually #779, which came about from my wanting to use asv's own suite to demonstrate this change, but the one benchmark in the vicinity of 15.6 ms was broken due to a typo. The fix is necessary to reproduce the results above, but I will remove it prior to merge.

CHANGES.rst (outdated; resolved)
docs/source/benchmarks.rst (outdated; resolved)
@pv (Collaborator) commented Dec 17, 2018

Looks ok, modulo the comments above.
Not sure how I'd write tests for this.
The CI failures seem to be due to some changes in Conda; I'm not going to look at fixing that in the immediate future, so they should be ignored.

@qwhelan (Contributor, author) commented Dec 17, 2018

@pv All changes implemented.

As for testing, a simple regression test could examine the samples output from a test like:

def test_pass():
    pass

For the baseline, we get:

            "samples": [
                [
                    1.939644470927057e-07,
                    1.939644470927057e-07,
                    0.0,
                    1.939644470927057e-07,
                    1.939644470927057e-07,
                    1.939644470927057e-07,
                    1.939644470927057e-07,
                    0.0,
                    1.939644470927057e-07,
                    1.939644470927057e-07
                ]
            ],

And for this PR:

            "samples": [
                [
                    1.2104441769390362e-07,
                    1.1922779948882427e-07,
                    1.2616902625495711e-07,
                    1.2339782410223055e-07,
                    1.2387379393762252e-07,
                    1.2318628195316746e-07,
                    1.2182183509171048e-07,
                    1.2359878914384048e-07,
                    1.2223434228238353e-07,
                    1.2985250392551827e-07
                ]
            ],

Simply looking for zeros (or values below the clock resolution) seems like it would be sufficient, albeit stochastic and much less reliable on slower machines like cloud VMs.
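A minimal sketch of that regression test (function names and sample counts are illustrative, not asv's internals): time a trivial statement with timeit.default_timer and fail if any sample collapses to zero, which would indicate the measured runtime fell below the timer's resolution.

```python
import timeit

def collect_samples(n_samples=10, number=1000):
    """Time a no-op `number` times per sample; return per-loop durations."""
    timer = timeit.Timer(stmt="pass", timer=timeit.default_timer)
    return [timer.timeit(number=number) / number for _ in range(n_samples)]

def test_timer_resolution():
    # With a high-resolution timer, no sample should quantize down to 0.0
    # (with time.process_time on Windows, zeros appear, as shown above).
    samples = collect_samples()
    assert all(s > 0.0 for s in samples)

test_timer_resolution()
```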

@pv pv merged commit 58f5396 into airspeed-velocity:master Dec 20, 2018
@pv pv added this to the 0.4 milestone Dec 20, 2018

Successfully merging this pull request may close these issues.

Default sample_time < clock resolution on Windows, leading to very noisy results
3 participants