Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Weight problems in cdutil.ANNUALCYCLE.climatology (daily data) #4

Closed
chaosphere2112 opened this issue Nov 23, 2016 · 7 comments
Closed

Weight problems in cdutil.ANNUALCYCLE.climatology (daily data) #4

chaosphere2112 opened this issue Nov 23, 2016 · 7 comments
Labels
Milestone

Comments

@chaosphere2112
Copy link

@chaosphere2112 chaosphere2112 commented Nov 23, 2016

I have found out that when using cdutil.ANNUALCYCLE.climatology on daily data, the months of different years seem to be equally weighted.

This results in (slight but visible) errors:

  • for February: eg for years, when averaging the February values for 1959-1960-1961, the weights used seem to be [1, 1, 1], when they should be the actual month length: [28, 29, 28]
  • for months that have missing days: e.g. if the first month of the data file starts on January 2nd, but all the other January month in the time series have 31 days, the weights used seem to be [1, 1, 1, ...] when they should be [30, 31, 31, ..., 31]

Test script: https://files.lsce.ipsl.fr/public.php?service=files&t=115de1c1cb57217b85776fd08fe21b23
Test data: https://files.lsce.ipsl.fr/public.php?service=files&t=c7258c48f12d3451bfa67e5585bbedef

Running the test script above will get the following output, and you can see that the difference between ANNUALCYCLEclimatology and computation by hand for all months, including unweighted February is ~1e6. But there is a bigger difference when I compute a correctly weighted February mean

python -i bug_ANNUALCYCLEclimatology.py
Variable shape = (3653, 9, 11)
Time axes go from 1959-1-1 12:0:0.0 to 1968-12-31 12:0:0.0
Latitude values = [-66.7283256  -67.84978441 -68.97123954 -70.09269035 -71.21413608
 -72.33557575 -73.45700815 -74.57843166 -75.69984422]
Longitude values = [ 120.375  121.5    122.625  123.75   124.875  126.     127.125  128.25
  129.375  130.5    131.625]
Fri Nov  6 17:35:08 2015  - Computing the climatology with cdutil.ANNUALCYCLE.climatology,
                please wait...
Fri Nov  6 17:35:26 2015  - Done!
time.clock diff = 17.36

Fri Nov  6 17:35:26 2015  - Computing the climatology by hand...
year, month, nb of selected days = 1959  2 28
year, month, nb of selected days = 1960  2 29
year, month, nb of selected days = 1961  2 28
year, month, nb of selected days = 1962  2 28
year, month, nb of selected days = 1963  2 28
year, month, nb of selected days = 1964  2 29
year, month, nb of selected days = 1965  2 28
year, month, nb of selected days = 1966  2 28
year, month, nb of selected days = 1967  2 28
year, month, nb of selected days = 1968  2 29
Difference range for month 1 = (-4.903731820604662e-06, 6.318861416332311e-06)
Difference range for February (unweighted) = (-5.2588326724389844e-06, 5.322724128120626e-06)
Difference range for February (WEIGHTED) = (-0.014045400572531008, -0.0006070483053850495)
Difference range for month 3 = (-7.192550192769431e-06, 6.8172331566529465e-06)
Difference range for month 4 = (-8.519490563685395e-06, 6.39597574547679e-06)
Difference range for month 5 = (-7.727838379878449e-06, 7.407895935784836e-06)
Difference range for month 6 = (-8.519490563685395e-06, 6.81559244952723e-06)
Difference range for month 7 = (-7.284841260002395e-06, 9.992045690410123e-06)
Difference range for month 8 = (-8.761498278886393e-06, 7.900114923131696e-06)
Difference range for month 9 = (-8.850097671597723e-06, 7.451375310552066e-06)
Difference range for month 10 = (-7.30945221505408e-06, 6.18965391652182e-06)
Difference range for month 11 = (-6.2370300426550784e-06, 6.243387851156967e-06)
Difference range for month 12 = (-6.288097772255696e-06, 6.534207223296562e-06)
Fri Nov  6 17:35:26 2015  - Finished computing the climatology by hand!
time.clock diff = 0.3
Acceleration = 57.8666666667

Migrated from: CDAT/cdat#1664

@lee1043
Copy link

@lee1043 lee1043 commented Feb 27, 2017

Same issue found when using cdutil.ANNUALCYCLE.departures(). Wondering if there was any progress with this issue...? @chaosphere2112 @doutriaux1

Loading

@durack1
Copy link
Member

@durack1 durack1 commented Feb 27, 2017

@jypeter it seems you were lost in the original issue migration across repos, so pinging you here for your input

Loading

@jypeter
Copy link
Member

@jypeter jypeter commented Feb 28, 2017

I had kind of forgotten about this issue! I don't have anything new to add to this except maybe that the original thread also mentioned that improving the processing speed would be a nice addition

Loading

@golaz
Copy link

@golaz golaz commented Mar 3, 2017

Thanks to @jypeter for reporting this bug. Let's hope it gets fixed soon.

Adding my 2 cents here regarding speed. I came up with an alternate direct computation of the climatologies using only 4 lines of code and got an even bigger speed-up. I'm seeing a 30x speed-up over cdutil with @jypeter's implementation and 1500x speed-up with my alternate implementation.

Here are the key lines:

# Compute climatologies
climato_values2 = np.zeros([12]+list(np.shape(var))[1:])
month = np.array( [ timax_absolute[i].month for i in range(len(timax_absolute)) ], dtype=np.int)
for i in range(1,13):
  climato_values2[i-1] = np.average(var[np.argwhere(month==i)],axis=0)

Attached is the script and below the full output. Just trying to raise the bar for @dnadeau4 and @doutriaux1 😉


Variable shape = (3653, 9, 11)
Time axes go from 1959-1-1 12:0:0.0 to 1968-12-31 12:0:0.0
Latitude values = [-66.7283256  -67.84978441 -68.97123954 -70.09269035 -71.21413608
 -72.33557575 -73.45700815 -74.57843166 -75.69984422]
Longitude values = [ 120.375  121.5    122.625  123.75   124.875  126.     127.125  128.25
  129.375  130.5    131.625]
Thu Mar  2 17:47:15 2017  - Computing the climatology with cdutil.ANNUALCYCLE.climatology,
				please wait...
Thu Mar  2 17:47:20 2017  - Done!
time.clock diff = 4.515801

Thu Mar  2 17:47:20 2017  - Computing the climatology by hand...
year, month, nb of selected days = 1959  2 28
year, month, nb of selected days = 1960  2 29
year, month, nb of selected days = 1961  2 28
year, month, nb of selected days = 1962  2 28
year, month, nb of selected days = 1963  2 28
year, month, nb of selected days = 1964  2 29
year, month, nb of selected days = 1965  2 28
year, month, nb of selected days = 1966  2 28
year, month, nb of selected days = 1967  2 28
year, month, nb of selected days = 1968  2 29
Difference range for month 1 = (-4.903731820604662e-06, 6.318861416332311e-06)
Difference range for February (unweighted) = (-5.2588326724389844e-06, 5.322724128120626e-06)
Difference range for February (WEIGHTED) = (-0.014047166386458088, -0.0006071392919806406)
Difference range for month 3 = (-7.192550192769431e-06, 6.8172331566529465e-06)
Difference range for month 4 = (-8.519490563685395e-06, 6.39597574547679e-06)
Difference range for month 5 = (-7.727838379878449e-06, 7.407895935784836e-06)
Difference range for month 6 = (-8.519490563685395e-06, 6.81559244952723e-06)
Difference range for month 7 = (-7.284841260002395e-06, 9.992045690410123e-06)
Difference range for month 8 = (-8.761498278886393e-06, 7.900114923131696e-06)
Difference range for month 9 = (-8.850097671597723e-06, 7.451375310552066e-06)
Difference range for month 10 = (-7.30945221505408e-06, 6.18965391652182e-06)
Difference range for month 11 = (-6.2370300426550784e-06, 6.243387851156967e-06)
Difference range for month 12 = (-6.288097772255696e-06, 6.534207223296562e-06)
Thu Mar  2 17:47:20 2017  - Finished computing the climatology by hand!
time.clock diff = 0.155475
Acceleration = 29.0451905451
Thu Mar  2 17:47:20 2017  - Computing the climatology by hand, second algorithm
Difference range for month 1 = (-6.318861437648593e-06, 4.903731806393807e-06)
Difference range for month 2 = (-2.3858707294266424e-06, 2.938530052176702e-06)
Difference range for month 3 = (-6.8172331779692286e-06, 7.192550164347722e-06)
Difference range for month 4 = (-6.39597574547679e-06, 8.519490556579967e-06)
Difference range for month 5 = (-7.4078959784174e-06, 7.727838337245885e-06)
Difference range for month 6 = (-6.81559244952723e-06, 8.519490556579967e-06)
Difference range for month 7 = (-9.992045747253542e-06, 7.284841231580685e-06)
Difference range for month 8 = (-7.900114979975115e-06, 8.761498236253829e-06)
Difference range for month 9 = (-7.451375324762921e-06, 8.850097657386868e-06)
Difference range for month 10 = (-6.189653952048957e-06, 7.309452179526943e-06)
Difference range for month 11 = (-6.243387858262395e-06, 6.237030028444224e-06)
Difference range for month 12 = (-6.5342072517182714e-06, 6.288097750939414e-06)
Thu Mar  2 17:47:20 2017  - Finished computing the climatology by hand, second algorithm!
time.clock diff = 0.002744
Acceleration = 1645.70007289

bug_ANNUALCYCLEclimatology2.py

Loading

@doutriaux1
Copy link
Contributor

@doutriaux1 doutriaux1 commented Mar 6, 2017

@golaz this a totally different issues your script will barf on any general dataset. This is the difference between one off user geenrated script that works for his data only and general application software that works on any dataset. cdutil will work on 6hourly data for example. It will work on data starting in February. It will work on data with missing months, etc...

Loading

@golaz
Copy link

@golaz golaz commented Mar 6, 2017

@doutriaux1, sure, everyone understands that there will be a performance penalty for generality, and generality is what makes cdutil in general and ANNUALCYCLE in particular so useful. Still, a factor 1500x in speed would suggest that there is substantial room for performance improvement without loss of generality.

Any progress with the fix?

Loading

@doutriaux1
Copy link
Contributor

@doutriaux1 doutriaux1 commented Mar 6, 2017

@golaz I believe fix is in place. @dnadeau4 ?

Loading

@doutriaux1 doutriaux1 added the bug label May 9, 2017
@doutriaux1 doutriaux1 added this to the 2.10 milestone May 9, 2017
@doutriaux1 doutriaux1 closed this May 9, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
6 participants