New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

average() indentation error #1048

Closed
painter1 opened this Issue Feb 18, 2015 · 8 comments

Comments

Projects
None yet
3 participants
@painter1
Contributor

painter1 commented Feb 18, 2015

In cdutil/times.py, the function average(), the block beginning "if len(sub)>1:" is indented too far. The effect is to repeat a calculation more than needed. This has major performance implications.

@painter1 painter1 self-assigned this Feb 18, 2015

@painter1 painter1 added Bug High labels Feb 18, 2015

@doutriaux1

This comment has been minimized.

Member

doutriaux1 commented Feb 18, 2015

@painter1 can you push a PR for this since you already have it on your computer? Thanks!

@painter1

This comment has been minimized.

Contributor

painter1 commented Feb 18, 2015

Yes, I'll do a PR but want to test some more first.

@doutriaux1

This comment has been minimized.

Member

doutriaux1 commented Feb 18, 2015

That's awesome, even better would be to include the test in the test suite. If you don't remember how do it, just put the tests in testing/genutil and I will integrate them.

@painter1

This comment has been minimized.

Contributor

painter1 commented Feb 18, 2015

What I did was to run some diagnostics which print out scalars. I happen to have some old values in my notes so I could eyeball them and see that they aren't changed. It will be another job to turn this into an automated test. Note that a few gigabytes of data are required. Here's what I ran:
diags --path ~/metrics_data/cam35_data/ --path2 /Users/painter1/metrics_data/obs_data_5.6/ --package AMWG --set 1 --seasons ANN

@durack1

This comment has been minimized.

Member

durack1 commented Feb 18, 2015

@painter1 out of curiosity what's the speed up by removing this unintentional loop?

painter1 added a commit that referenced this issue Feb 18, 2015

Reduce indentation level of the block beginning "if len(sub)>1:" in a…
…verager().

This will dramatically improve performance because some major calculations are done only once now.
This fixes issue #1048.
@painter1

This comment has been minimized.

Contributor

painter1 commented Feb 18, 2015

I'm only profiling on Rhea (at Oak Ridge) which is abysmally slow for reasons which I haven't identified, but must be about the OS and file system. I'm not sure that you will notice the difference on any other machine, maybe you will. Anyway, on my test which is probably typical, the core average calculation was repeated 12 times before and is being done once now. On Rhea, the overall speedup is 3 times on the largish test data I've been using.

@painter1

This comment has been minimized.

Contributor

painter1 commented Feb 18, 2015

The test I've been profiling is climatology.py in the UV-CDAT diagnostics.

@doutriaux1

This comment has been minimized.

Member

doutriaux1 commented Feb 19, 2015

fixed by #1050

@doutriaux1 doutriaux1 closed this Feb 19, 2015

@doutriaux1 doutriaux1 added this to the 2.2 milestone Feb 19, 2015

@durack1 durack1 added the cdutil label May 11, 2015

@doutriaux1 doutriaux1 added High Priority and removed High labels Jul 20, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment