
Difference between asv dev and asv run (due to OpenMP) #671

Closed
prisae opened this issue Jul 12, 2018 · 22 comments
Labels
question Triaged as a question, not requiring action after answering

Comments

prisae (Contributor) commented Jul 12, 2018

I ran into a weird issue that I cannot track down (using asv v0.2.1), so I thought I'd ask here whether it is a known or common problem.

If I run a simple test using asv dev with a parameter that can take three different values, I get the following output:

asv/> asv dev -b dipole_time
· Discovering benchmarks
· Running 1 total benchmarks (1 commits * 1 environments * 1 benchmarks)
[  0.00%] ·· Building for existing-py_home_dtr_anaconda3_bin_python
[  0.00%] ·· Benchmarking existing-py_home_dtr_anaconda3_bin_python
[100.00%] ··· Running model.Dipole.time_dipole_time                                    ok
[100.00%] ···· 
               ====== ==========
                loop            
               ------ ----------
                None   139.11ms 
                freq   322.62ms 
                off    139.09ms 
               ====== ==========

These times agree with the times I get if I run the tests simply with %timeit in an IPython console.

Now if I run asv run instead of asv dev,

asv/> asv run -b time_dipole_time
· Fetching recent changes
· Creating environments
· Discovering benchmarks
·· Uninstalling from conda-py3.6
·· Building for conda-py3.6
·· Installing into conda-py3.6.
· Running 1 total benchmarks (1 commits * 1 environments * 1 benchmarks)
[  0.00%] · For empymod commit hash b804477a:
[  0.00%] ·· Building for conda-py3.6.
[  0.00%] ·· Benchmarking conda-py3.6
[100.00%] ··· Running model.Dipole.time_dipole_time                          655.54ms;...

the result stored in b804477a-conda-py3.6.json is:

[snip]
    "results": {
        "model.Dipole.time_dipole_time": {
            "params": [
                [
                    "None",
                    "'freq'",
                    "'off'"
                ]
            ],
            "result": [
                0.6555442148,
                0.3023681289999999,
                0.6842122744999999
            ]
        }
[snip]

The times vastly differ between the asv dev and the asv run results. Any idea why that could be?


prisae commented Jul 12, 2018

I just saw that v0.2.2rc1 was released, and I tested the above issue with this version. However, the result is the same.

pv (Collaborator) commented Jul 12, 2018 via email


pv commented Jul 12, 2018 via email


pv commented Jul 12, 2018 via email


pv commented Jul 12, 2018 via email


prisae commented Jul 12, 2018

Thanks @pv. Running asv run --quick indeed gives the same result as asv dev. I learned something new here. Would you say, then, that the default %timeit is heavily flawed? I have always relied on the default %timeit to compare runtimes of different things. It seems that is not such a good idea.

Not running on a laptop, but on an Intel NUC, which might indeed behave similarly.

> In addition, maybe you get a different linear algebra library (e.g. MKL
> vs openblas) in the new environment created by asv run.

I hope not. I think in both cases it uses conda MKL.

prisae closed this as completed Jul 12, 2018

pv commented Jul 12, 2018 via email


prisae commented Jul 12, 2018

Nope. This is the default output:

In [22]: for loop in [None, 'freq', 'off']:
    ...:     %timeit dipole(loop=loop, **model)
    ...:     
141 ms ± 5.35 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
355 ms ± 28.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
137 ms ± 2.34 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

So the times agree with asv dev and asv run --quick. The %timeit numbers don't change even if I fix -r 1 -n 1, -r 7 -n 10, or -r 10 -n 10; they are all similar.
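For anyone without IPython, the %timeit runs above can be reproduced with the stdlib timeit module. A minimal sketch, with a stand-in workload because dipole and model come from the user's empymod setup:

```python
import timeit


def dipole_stub():
    # Stand-in workload for dipole(loop=loop, **model)
    return sum(i * i for i in range(10_000))


# repeat=7, number=10 mirrors %timeit's "7 runs, 10 loops each"
samples = timeit.repeat(dipole_stub, repeat=7, number=10)
per_loop_ms = min(samples) / 10 * 1e3
print(f"{per_loop_ms:.3f} ms per loop (best of 7 runs, 10 loops each)")
```

The -r and -n flags of %timeit map directly onto the repeat and number arguments of timeit.repeat.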

There seems to be another issue: If I run asv run, the CPU usage (CPU @ 1.60GHz × 4) goes to 100%. However, when I run asv dev, asv run --quick, or %timeit, it tops out at 50%. Can it be that asv run handles threads differently?


prisae commented Jul 12, 2018

Is it possible to limit the number of threads for asv?


pv commented Jul 12, 2018 via email


prisae commented Jul 12, 2018

I'll give it a shot with v0.3.dev and report back.


pv commented Jul 12, 2018 via email


prisae commented Jul 12, 2018

If I set number = 1 and repeat = 5, I get the same output with asv run as with asv run --quick, asv dev, and %timeit.
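For reference, number and repeat are per-benchmark attributes in asv's benchmark classes. A minimal sketch of what such a setting could look like; the class name and parameters follow the issue, but the benchmark body is a stand-in for the real call to empymod's dipole:

```python
class Dipole:
    # asv timing attributes: one call per sample, five samples
    number = 1
    repeat = 5

    # Parametrize over the `loop` argument, as in the issue
    params = [[None, 'freq', 'off']]
    param_names = ['loop']

    def setup(self, loop):
        # Stand-in for building the `model` dict used in the issue
        self.data = list(range(100_000))

    def time_dipole_time(self, loop):
        # Stand-in workload for dipole(loop=loop, **model)
        sum(self.data)
```

These attributes can also be set globally via asv.conf.json or per-class as shown here.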


prisae commented Jul 12, 2018

...and the CPU usage does not explode...


prisae commented Jul 12, 2018

With v0.3.dev (commit 747d928) it looks different.

  • asv dev yields slower times than in v0.2.2rc1.
  • asv run and asv run --quick are now the same, yielding similar results to asv run in v0.2.2rc1.
  • The trick with number = 1 and repeat = 5 does not seem to have an effect.

asv dev (v0.3.dev):

asv/> asv dev
· Discovering benchmarks
· Running 1 total benchmarks (1 commits * 1 environments * 1 benchmarks)
[  0.00%] ·· Building for existing-py_home_dtr_anaconda3_bin_python
[  0.00%] ·· Benchmarking existing-py_home_dtr_anaconda3_bin_python
[100.00%] ··· Running model.Dipole.time_dipole_time                           ok
[100.00%] ···· 
               ====== =======
                loop         
               ------ -------
                None   278ms 
                freq   321ms 
                off    341ms 
               ====== =======

asv run (v0.3.dev) :

        "model.Dipole.time_dipole_time": {
            "params": [
                [
                    "None",
                    "'freq'",
                    "'off'"
                ]
            ],
            "result": [
                0.6451392059999999,
                0.8246050810000001,
                0.6298039939999999
            ],
            "stats": [
                {
                    "ci_99": [
                        0.6274002239703472,
                        0.6647882010296525
                    ],
                    "max": 0.6549872219999999,
                    "mean": 0.6460942124999999,
                    "min": 0.6391112159999999,
                    "n": 4,
                    "q_25": 0.6402948367499998,
                    "q_75": 0.6509385817500001,
                    "std": 0.0065052649672145
                },
                {
                    "ci_99": [
                        0.5469139810000008,
                        1.1022961809999992
                    ],
                    "max": 0.8301589030000001,
                    "mean": 0.8246050810000001,
                    "min": 0.8190512590000001,
                    "n": 2,
                    "q_25": 0.8218281700000001,
                    "q_75": 0.8273819920000001,
                    "std": 0.005553821999999986
                },
                {
                    "ci_99": [
                        0.5361086224943821,
                        0.7356948780056181
                    ],
                    "max": 0.689744374,
                    "mean": 0.63590175025,
                    "min": 0.5942546389999999,
                    "n": 4,
                    "q_25": 0.6095456847499998,
                    "q_75": 0.6561600594999999,
                    "std": 0.03594302763343558
                }
            ]
        }
    },

Here again is %timeit for comparison:

In [22]: for loop in [None, 'freq', 'off']:
    ...:     %timeit dipole(loop=loop, **model)
    ...:     
141 ms ± 5.35 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
355 ms ± 28.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
137 ms ± 2.34 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


pv commented Jul 12, 2018

Finally, to check again for the environment effects, you can do asv run -e --python=same (without specifying commit) instead of asv run. Note that asv dev == asv run --quick -e --python=same. The -e makes it print the result tables (so you don't need to paste the json).

If you don't get the strange bigger timings with asv 0.2.x for any combination of number and repeat (e.g. larger number, smaller repeat), then I'm running out of guesses.


prisae commented Jul 12, 2018

If I set number and repeat higher (say number = 10, repeat = 5), then the times do go up to similar levels.

I ran the whole benchmark suite of my project (https://github.com/empymod/asv), paying close attention to the CPU. Almost the entire benchmark runs on one thread (25% CPU usage); as you say, asv does not know about threads. But there are 4 benchmarks where the CPU usage immediately shoots up to 100% (4 threads). So I wonder whether in some cases something goes wrong and it tries to distribute the tests, which then makes them slower?


pv commented Jul 12, 2018

Intel MKL is multithreaded, so that's one possibility, and perhaps it tries to enable them in a "smart" way that's problematic here, so you can try with MKL_NUM_THREADS=1 or setting some of the other MKL environment variables.


prisae commented Jul 12, 2018

If I run asv run -e --python=same, I get yet another result: slower than asv dev, but faster than asv run. It uses 50% of the CPU, versus 25% with asv dev and 100% with asv run.


prisae commented Jul 12, 2018

There we go. Setting export OMP_NUM_THREADS=1 in bash before running asv run did the trick. Now everything runs in one thread, and the times of asv run agree with %timeit and asv dev.
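A minimal sketch of the fix described above; the MKL variable follows pv's earlier suggestion, and the asv invocation itself (using the benchmark name from this issue) is left commented out:

```shell
# Pin OpenMP (and, per pv's suggestion, MKL) to a single thread so that
# `asv run` timings match `asv dev` and %timeit.
export OMP_NUM_THREADS=1
export MKL_NUM_THREADS=1
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS MKL_NUM_THREADS=$MKL_NUM_THREADS"
# asv run -b time_dipole_time
```

Note that these variables must be exported before the benchmarked process starts, as threading libraries typically read them at initialization.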

Not sure whether this is something that should be mentioned in the docs or set by asv itself to ensure consistency. Thanks @pv for resolving this!

pv changed the title from "Difference between asv dev and asv run" to "Difference between asv dev and asv run (due to OpenMP)" Jul 12, 2018
pv added the question label Jul 12, 2018

pv commented Jul 12, 2018

There could be a FAQ on setting up the benchmarking environment, as it does have caveats of this sort.


prisae commented Jul 12, 2018

That could help. I guess the question tag is also OK; I did check the issues before posting, so I would have found it that way. Thanks again for letting me pick your brain, and for maintaining asv.
