
Cache for area-weighted regridder #2472

Closed
11 of 12 tasks
corinnebosley opened this issue Mar 31, 2017 · 21 comments
Comments

@corinnebosley
Member

corinnebosley commented Mar 31, 2017

This has been a major issue for several people; here are a few links to provide some context:
#2370
https://exxconfigmgmt:6391/browse/EPM-1542


@abooton - Updating the description 03/12/2019.
As a user of the area-weighted regridder I would like to cache the area-weights (as well as snapshot the grid info) so that I can reduce the time taken to regrid multiple cubes.

Description:
As described in the iris documentation:
https://scitools.org.uk/iris/docs/latest/userguide/interpolation_and_regridding.html?highlight=regridding#caching-a-regridder
"If you need to regrid multiple cubes with a common source grid onto a common target grid you can ‘cache’ a regridder to be used for each of these regrids. This can shorten the execution time of your code as the most computationally intensive part of a regrid is setting up the regridder."

Unfortunately the weights are not currently cached, so the benefit described is not realised when carrying out area-weighted regridding.
Note that although the majority of the time is currently spent calculating the weighted mean, computing the weights can still be significant, e.g. ~25% for non-masked arrays (not the masked case, for which stats are reported below).

See #2370 for a good example of setting the regridder up.
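The cached-regridder pattern referred to above can be sketched generically; this is an illustrative toy (made-up class name and toy weights, not the iris implementation):

```python
import numpy as np

class CachedRegridder:
    # Illustrative stand-in for iris.analysis.AreaWeighted().regridder():
    # all the expensive setup (here, toy normalised weights) happens once
    # in the constructor, so each subsequent call is cheap.
    def __init__(self, n_src, n_tgt):
        w = np.random.default_rng(0).random((n_tgt, n_src))
        self.weights = w / w.sum(axis=1, keepdims=True)

    def __call__(self, data):
        # Weighted mean per target cell.
        return self.weights @ data

regridder = CachedRegridder(n_src=8, n_tgt=3)   # setup cost paid once
result = regridder(np.ones(8))                  # reused for many cubes
```

The constructor plays the role of `AreaWeighted().regridder(src_cube, tgt_cube)`; each call then reuses the stored weights.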

Acceptance Criteria:

  • Main API: the regrid_area_weighted_rectilinear_src_and_grid API should be maintained as-is
  • Optimisation: regridding one cube should not take longer than it does at present (according to ASV benchmarking)
  • The area-weights should only be computed once when using the regridder method iris.analysis.AreaWeighted().regridder(cube1, cube2) (see example in Area weighted regridder caching #2370)
  • If using masked data whose mask is all False, treat it as non-masked during the mean calculation.
  • The regridder should retain grid checking for compatibility with the src_cube
  • The current infrastructure should be followed (i.e. __prepare and __perform)
  • The _regrid_area_weighted code should remain in 'experimental/regrid.py' module (moving it into the main code is out of scope)
  • Weights should be calculated akin to now (i.e. reading from a file is out-of-scope)
  • Weights should be included as part of the regridder cache (i.e. reading from a file, or making the weights available for further use is out-of-scope)
  • If weights computation is refactored, ASV testing is required
  • If numba is used for code speed-up, it should be implemented on an "if available - use it" basis
  • Any other performance results should preferably reference the examples in this or the associated tickets (see above).
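The all-False-mask criterion above can be illustrated with a minimal helper (hypothetical sketch, not the iris code):

```python
import numpy as np
import numpy.ma as ma

def weighted_mean(data, weights, axis=None):
    # Hypothetical helper: a masked array whose mask is all False takes
    # the much faster plain np.average path; anything else goes through
    # the slower ma.average.
    if ma.isMaskedArray(data) and not ma.getmaskarray(data).any():
        return np.average(np.asarray(data), weights=weights, axis=axis)
    return ma.average(data, weights=weights, axis=axis)

arr = ma.masked_array(np.arange(6.0), mask=[False] * 6)
result = weighted_mean(arr, np.ones(6))
```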

Note:
The weights are currently computed alongside the weighted mean calculation, in the loop. If the weights calculation is refactored, the grid points will be looped over twice instead of once. Therefore, it is suggested that the work is developed in a feature branch, and implemented once code refactoring and optimisation are both complete.
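The prepare/perform split suggested here can be sketched as follows; the function names and toy weights are illustrative only, not the iris internals:

```python
import numpy as np

def prepare(n_src, n_tgt):
    # Stand-in for the expensive area-overlap step: one normalised
    # weight row per target cell (illustrative values only).
    w = np.random.default_rng(0).random((n_tgt, n_src))
    return w / w.sum(axis=1, keepdims=True)

def perform(data, weights):
    # Weighted mean for every target cell in a single vectorised step,
    # outside any per-point loop.
    return weights @ data

weights = prepare(n_src=6, n_tgt=3)        # done once
out_a = perform(np.full(6, 2.0), weights)  # repeated cheaply per cube
out_b = perform(np.full(6, 5.0), weights)
```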

@pp-mo pp-mo added this to the v2.0 milestone Apr 3, 2017
@cpelley

cpelley commented Apr 4, 2017

The same is true for all of the regridders currently in iris I believe (Linear/Nearest/AreaWeighted). No caching of weights and indices takes place.

@marqh
Member

marqh commented Apr 7, 2017

I would like to put this together into a use case for some planned enhancement work we have for later this year.

Further comments and thoughts on this, to keep it in my notifications, are much appreciated.

@tv3141
Contributor

tv3141 commented Apr 13, 2017

Use case: AreaWeighted is currently the only conservative scheme.

AreaWeighted is ~1000 times slower than Linear or Nearest.

Is it necessary to have two ways of doing regridding?

  • pass a regridding scheme to the regrid method of a cube
  • create a Regridder object

If the weight creation is the most expensive step, a regridding function could return the weights explicitly, and feed them into the regridding function the next time. This would not add more complexity to the user than using Regridder objects, but would potentially be much simpler.
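A minimal sketch of this suggestion, with made-up names and toy weights:

```python
import numpy as np

def regrid(data, n_tgt=3, weights=None):
    # Sketch of the suggested API (illustrative, not iris code): compute
    # the weights on the first call and return them, so the caller can
    # pass them back in and skip the expensive step next time.
    if weights is None:
        w = np.random.default_rng(0).random((n_tgt, data.shape[0]))
        weights = w / w.sum(axis=1, keepdims=True)
    return weights @ data, weights

src = np.linspace(0.0, 1.0, 8)
out1, w = regrid(src)             # weights computed here
out2, _ = regrid(src, weights=w)  # weights reused, no recomputation
```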

Testing the current caching method seems difficult. (Is it caching? Is it using the cache? Is there a benefit to using the cache?)

@cpelley

cpelley commented Apr 18, 2017

@tv3141 the current UI allows for a generic interface to regridding/interpolation, with the implementation details hidden from the user. This framework is what allows caching to be developed without changing the UI for users.
I think the issue you raise would be better approached by extending the docstrings, which should highlight current capabilities and limitations. help(iris.analysis.Linear), for example, is rather lacking in necessary details.

@tv3141
Contributor

tv3141 commented May 3, 2017

The bottleneck is averaging the source grid points using np.ma.average. The weight calculation takes <10%.

The area-weighted regridding is done in a large loop over the target grid points. The weights are calculated inside the loop for each target grid point. The regridding is done in one step for all dimensions (time, levels).

https://github.com/SciTools/iris/blob/v1.12.x/lib/iris/experimental/regrid.py#L503

def _regrid_area_weighted_array(src_data, x_dim, y_dim,
                                src_x_bounds, src_y_bounds,
                                grid_x_bounds, grid_y_bounds,
                                grid_x_decreasing, grid_y_decreasing,
                                area_func, circular=False, mdtol=0):
    
    [...]

    # Simple for loop approach.
    for j, (y_0, y_1) in enumerate(grid_y_bounds):
        
        [...]
        
        for i, (x_0, x_1) in enumerate(grid_x_bounds):
            
            [...]
            
                data = src_data[tuple(indices)]

                # Calculate weights based on areas of cropped bounds.
                weights = area_func(y_bounds, x_bounds)
    
                [...]

                # Calculate weighted mean taking into account missing data.
                new_data_pt = _weighted_mean_with_mdtol(data, 
                    weights=weights, axis=axis, mdtol=mdtol)

                # Insert data (and mask) values into new array.
                
                [...]
                
                new_data[tuple(indices)] = new_data_pt

    [...]

    return new_data


def _weighted_mean_with_mdtol(data, weights, axis=None, mdtol=0):
    """
    ...
    """
    res = ma.average(data, weights=weights, axis=axis)

    [...]
    
    return res

Profiling

  • around 3/4 of the time is spent in np.ma.average
  • weight calculation takes currently less than 10% of the time
  • linear complexity
  • largest factor: number of target grid points
  • run time is almost independent of source grid
  • increasing the amount of data (more time steps or levels) increases the run time by a small linear factor
import cProfile
import pstats
import iris

# create_cube and resolution are user-defined helpers (not shown)
areaweighted_regridder = iris.analysis.AreaWeighted().regridder(
    create_cube(*resolution['N768']),
    create_cube(*resolution['N96']))
cube = create_cube(*resolution['N768'], levels=3)
cProfile.run('areaweighted_regridder(cube)', 'aw_regrid.prof')
stats = pstats.Stats('aw_regrid.prof')
stats.strip_dirs()
stats.sort_stats('cumulative')
stats.print_stats()
Tue May  2 17:05:24 2017    aw_regrid.prof

         9573273 function calls (9573143 primitive calls) in 12.114 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   12.114   12.114 <string>:1(<module>)
        1    0.000    0.000   12.114   12.114 _area_weighted.py:89(__call__)
        1    0.000    0.000   12.114   12.114 regrid.py:627(regrid_area_weighted_rectilinear_src_and_grid)
        1    0.487    0.487   12.110   12.110 regrid.py:405(_regrid_area_weighted_array)
    27648    0.070    0.000    9.490    0.000 regrid.py:353(_weighted_mean_with_mdtol)
--> 27648    0.270    0.000    9.400    0.000 extras.py:464(average)
   138241    1.139    0.000    3.060    0.000 core.py:2865(__array_finalize__)
    27648    0.066    0.000    2.861    0.000 core.py:3908(__truediv__)
    55296    0.230    0.000    2.842    0.000 core.py:1002(reduce)
   248833    1.224    0.000    2.785    0.000 core.py:2839(_update_from)
    27648    0.774    0.000    2.769    0.000 core.py:1109(__call__)
    ...
    27648    0.041    0.000    0.801    0.000 regrid.py:324(_spherical_area)  # weights

np.ma.average 10x slower than np.average

import numpy as np
print np.__version__

arr = np.random.rand(1,8,8,3)
w = np.random.rand(1,8,8,3)

print 'np.ma.average axis'
print np.ma.average(arr, weights=w, axis=(1,2))
%timeit a = np.ma.average(arr, weights=w, axis=(1,2))
print

print 'np.average axis'
print np.ma.average(arr, weights=w, axis=(1,2))  # NB: np.average presumably intended here
%timeit a = np.average(arr, weights=w, axis=(1,2))
1.12.1
np.ma.average axis
[[0.5558622851924222 0.5015630215812361 0.512327031679091]]
1000 loops, best of 3: 240 µs per loop

np.average axis
[[0.5558622851924222 0.5015630215812361 0.512327031679091]]
10000 loops, best of 3: 27.4 µs per loop

Even without a mask, np.ma.average is ~10x slower than the plain-array version np.average. I do not understand why; seemingly identical lines of code have large differences in execution time.

import numpy as np
print np.__version__  # 1.12.1
from line_profiler import LineProfiler

arr = np.random.rand(1,8,8,3)
w = np.random.rand(1,8,8,3)

lp = LineProfiler()
lp_wrapper = lp(np.ma.average)
lp_wrapper(arr, weights=w)
lp.print_stats()

lp = LineProfiler()
lp_wrapper = lp(np.average)
lp_wrapper(arr, weights=w)
lp.print_stats()
Timer unit: 1e-06 s

Total time: 0.000392 s
File: ~/localdata/python_test/miniconda/envs/iris_new_np/lib/python2.7/site-packages/numpy/ma/extras.py
Function: average at line 521

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   521                                           def average(a, axis=None, weights=None, returned=False):

   572         1          105    105.0     26.8      a = asarray(a)
   573         1            4      4.0      1.0      m = getmask(a)
   574                                           
   575                                               # inspired by 'average' in numpy/lib/function_base.py
   576                                           
   577         1            1      1.0      0.3      if weights is None:
   578                                                   avg = a.mean(axis)
   579                                                   scl = avg.dtype.type(a.count(axis))
   580                                               else:
   581         1            6      6.0      1.5          wgt = np.asanyarray(weights)
   582                                           
   583         1            3      3.0      0.8          if issubclass(a.dtype.type, (np.integer, np.bool_)):
   584                                                       result_dtype = np.result_type(a.dtype, wgt.dtype, 'f8')
   585                                                   else:
   586         1            3      3.0      0.8              result_dtype = np.result_type(a.dtype, wgt.dtype)
   587                                           
   588                                                   # Sanity checks
   589         1            2      2.0      0.5          if a.shape != wgt.shape:
                                         
   605         1            1      1.0      0.3          if m is not nomask:
   606                                                       wgt = wgt*(~a.mask)
   607                                           
   608         1           35     35.0      8.9          scl = wgt.sum(axis=axis, dtype=result_dtype)
   609         1          230    230.0     58.7          avg = np.multiply(a, wgt, dtype=result_dtype).sum(axis)/scl
   610                                           
   611         1            1      1.0      0.3      if returned:
   612                                                   if scl.shape != avg.shape:
   613                                                       scl = np.broadcast_to(scl, avg.shape).copy()
   614                                                   return avg, scl
   615                                               else:
   616         1            1      1.0      0.3          return avg


Timer unit: 1e-06 s

Total time: 9.7e-05 s
File: ~/localdata/python_test/miniconda/envs/iris_new_np/lib/python2.7/site-packages/numpy/lib/function_base.py
Function: average at line 1021

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  1021                                           def average(a, axis=None, weights=None, returned=False):

  1095                                               # 3/19/2016 1.12.0:
  1096                                               # replace the next few lines with "a = np.asanyarray(a)"
  1097         1            3      3.0      3.1      if (type(a) not in (np.ndarray, np.matrix) and
                                    
  1106         1            3      3.0      3.1      if not isinstance(a, np.matrix):
  1107         1            7      7.0      7.2          a = np.asarray(a)
  1108                                           
  1109         1            1      1.0      1.0      if weights is None:
  1110                                                   avg = a.mean(axis)
  1111                                                   scl = avg.dtype.type(a.size/avg.size)
  1112                                               else:
  1113         1            4      4.0      4.1          wgt = np.asanyarray(weights)
  1114                                           
  1115         1            4      4.0      4.1          if issubclass(a.dtype.type, (np.integer, np.bool_)):
  1116                                                       result_dtype = np.result_type(a.dtype, wgt.dtype, 'f8')
  1117                                                   else:
  1118         1            3      3.0      3.1              result_dtype = np.result_type(a.dtype, wgt.dtype)
  1119                                           
  1120                                                   # Sanity checks
  1121         1            2      2.0      2.1          if a.shape != wgt.shape:
                                         
  1137         1           32     32.0     33.0          scl = wgt.sum(axis=axis, dtype=result_dtype)
  1138         1           16     16.0     16.5          if (scl == 0.0).any():
  1139                                                       raise ZeroDivisionError(
  1140                                                           "Weights sum to zero, can't be normalized")
  1141                                           
  1142         1           20     20.0     20.6          avg = np.multiply(a, wgt, dtype=result_dtype).sum(axis)/scl
  1143                                           
  1144         1            1      1.0      1.0      if returned:
  1145                                                   if scl.shape != avg.shape:
  1146                                                       scl = np.broadcast_to(scl, avg.shape).copy()
  1147                                                   return avg, scl
  1148                                               else:
  1149         1            1      1.0      1.0          return avg

@tv3141
Contributor

tv3141 commented May 3, 2017

To learn more about np.ma.average, I copied its code into a Jupyter notebook and started removing non-essential code. I noticed that running a copy of the function np.ma.average in the current namespace speeds it up more than 10x. It also produces single precision results instead of double precision.

Any ideas what makes np.ma.average slow, and what causes the speed-up?
What determines the precision of the calculation?

import inspect
import numpy as np
print np.__version__  # 1.12.1

# imports for copied code
from numpy import asarray
from numpy.ma import getmask, nomask

def ma_average(a, axis=None, weights=None, returned=False):
    # body copied verbatim from np.ma.average; source file located via
    # print inspect.getsourcefile(np.ma.average)

    [...]

arr = np.random.rand(1,8,8,3)
w = np.random.rand(1,8,8,3)

print 'np.ma.average'
print np.ma.average(arr, weights=w, axis=(1,2))
%timeit a = np.ma.average(arr, weights=w, axis=(1,2))
print

print 'ma_average'
print ma_average(arr, weights=w, axis=(1,2))
%timeit a = ma_average(arr, weights=w, axis=(1,2))
print

print 'np.average'
print np.ma.average(arr, weights=w, axis=(1,2))  # NB: np.average presumably intended here
%timeit a = np.average(arr, weights=w, axis=(1,2))
print
np.ma.average
[[0.5798026614723218 0.45410607639507333 0.4378237178069648]]
1000 loops, best of 3: 238 µs per loop

ma_average
[[ 0.57980266  0.45410608  0.43782372]]
10000 loops, best of 3: 21.9 µs per loop

np.average
[[0.5798026614723218 0.45410607639507333 0.4378237178069648]]
10000 loops, best of 3: 27.4 µs per loop

@tv3141
Contributor

tv3141 commented May 9, 2017

np.ma.average calls a = asarray(a) to convert a into a masked array. When I copied the code I imported the wrong asarray function:

from numpy import asarray 

when it should have been

from numpy.ma import asarray

This also makes the 'local' ma_average slow again, so the behaviour now makes sense.

np.ma.average is much slower than np.average when applied to small unmasked arrays, because of the expensive conversion to a masked array (np.ma.asarray(a)), and because np.multiply is much slower when called with a masked array.
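The difference between the two asarray functions can be seen directly:

```python
import numpy as np

# The distinction behind the mixup: numpy.asarray returns a plain ndarray
# unchanged, whereas numpy.ma.asarray wraps it in a MaskedArray, after
# which every subsequent operation dispatches through the slower ma machinery.
arr = np.random.rand(4)
plain = np.asarray(arr)
masked = np.ma.asarray(arr)
```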

@cpelley

cpelley commented Dec 4, 2017

I realised I perhaps neglected to mention that I would also really like to see caching implemented in iris :)
Really interesting observations on masked arrays.
Feels like quite a significant refactor to make this happen.
To make the weights really useful, they should also be accessible to the end user (for example, to build sparse arrays from them directly), and secondly it would be worth considering caching them to disk or similar. A lot to think about, I think...

@abooton
Contributor

abooton commented Dec 9, 2019

  1. Re-order to be [..., y_dim, x_dim].
    See PR PI-2472: Tweak area weighting regrid move xdim and ydim axes #3594 to feature branch (was originally PR PI-2472: _regrid_area_weighted_array: Set order y_dim, x_dim axis to be last dimensions #3587)
  2. Ensure x_dim and y_dim.
    See PR PI-2472: Tweak area weighting regrid enforce xdim ydim #3595 to feature branch. (was originally PR PI-2472: _regrid_area_weighted_array:: enforce xdim ydim #3588)
  3. Optimise averaging: Move averaging out of loop and refactor weights.
    See PR PI-2472: Tweak area weighting regrid move averaging out of loop #3596 to feature branch (was originally PR PI-2472: _regrid_area_weighted_array: Move averaging out of loop #3589)

This section has been reviewed in the "area_weighted_regridding" feature branch.
It is self-contained and gives the same results as current master: See PR #3598

@abooton
Contributor

abooton commented Dec 10, 2019

Update from refinement meeting:

The current status of the work was discussed. We think completing it is achievable with approx:
• ½ day review work associated with re-jigging the current regrid_area_weighted_rectilinear_src_and_grid function
• 4 days refactoring the tests (1 day for weights function, 1 day for __prepare, __perform, 1 day regridder, 1 day review comments)
• 1 day review work for the above
• Assistance from the sprint team (e.g. setting up an Iris build, updating test data)
• (This is additional to the ~week of scoping/dev work already undertaken by AVD)

This nicely aligns with the 5 days dev work that Emma/Andy are providing.

@abooton
Contributor

abooton commented Dec 20, 2019

Optimisation AC: the time to process a single cube is quicker than previously; see #2370

@abooton
Contributor

abooton commented Dec 20, 2019

The feature branch, upstream/area_weighted_regridding can be merged.
There are four outstanding items:

  1. Some extra comments describing the regrid_info weight variables would be useful
  2. The "jit" could be added in the future for further speed up
  3. The ASV test needs further investigation, as it should demonstrate a speed-up. (Our other testing shows the optimisation delivers a speed-up.)
  4. The dtype of the returned data remains unchanged. However, we think the regridding should return the original data dtype, and it isn't necessarily doing this at present.

@cpelley

cpelley commented Dec 24, 2019

Hi @abooton apologies if this isn't useful or I'm too late, but for what it's worth:

  • Awesome stuff! The issue of offline weights that you rightly keep off the table for this ticket might be worth talking about when you are ready - we have done this for our own ESMF regridder scheme.
  • I wonder what the implications are for memory usage from these changes. Do you have any thoughts? My gut feeling is that it should still be reasonable, given the alternative approaches available, which certainly give iris a lot more wiggle room in this regard :)

...However, we think the current regridding should return the original data dtype...

I'm not 100% sure I have understood the context here, but I would have thought it reasonable/flexible that area-weighted regridding return data whose dtype relates to the source dtype by the following relation:

tgt_dtype = np.promote_types(src_dtype, 'float16')

Hope this helps in some way.

Merry Christmas!

@abooton
Contributor

abooton commented Dec 29, 2019

The following quick check indicates that the ASV tests should have seen a noticeable improvement:

# commit  738093f1:  time: areaweighted_regridder 13.124 s  pre-optimisation
# commit  0d1ddb40:  time: areaweighted_regridder 1.575 s   post-optimisation

from iris import tests
import iris
from time import time

def regridding_test():
    file_path = tests.get_data_path(
                ["NetCDF", "global", "xyt", "SMALL_hires_wind_u_for_ipcc4.nc"]
            )
    cube = iris.load_cube(file_path)
    scheme_area_w = iris.analysis.AreaWeighted()
    ts = time()
    cube.regrid(cube, scheme_area_w)
    tf = time()
    print('areaweighted_regridder', round(tf - ts, 3), 's')

regridding_test()

@abooton
Contributor

abooton commented Jan 3, 2020

Hi @cpelley,

Thanks for your thoughts.

  • Regarding the memory use:

I wonder what the implications are for memory usage from these changes. Do you have any thoughts? My gut feeling is that it should still be reasonable, given the alternative approaches available, which certainly give iris a lot more wiggle room in this regard :)

We'd not specifically looked at this, as the user requirement was for reducing computation time. The regridder object now retains the weights (which may be similar in size to the 2D data array, or might be bigger, as they are float64).

I would think the memory required to average the data would be more, particularly for 2D diagnostics, because 1) the weights are pushed into an additional array, and 2) the src data is initially stored in an array larger than that of the target grid prior to computing the mean. (Previously the weights and mean were computed independently by looping through every point on the target grid, so I think the memory would have been released between each calculation.)

It should be noted that the total memory usage of the new code could easily be optimised in a future PR, by initially storing the weights in the numpy array structure required by the regridding "perform" code. This further refactoring hasn't been done yet, as we didn't wish to introduce a bug into the weights calculations.

  • Regarding the data type:

I'm not 100% sure I have understood the context here but I would have thought it reasonable/flexible that area weighted regridding return data which relates to the source dtype by the following relation:
tgt_dtype = np.promote_types(src_dtype, 'float16')

To clarify, the changes here retain the current dtype behaviour. However I was concerned that integer source data can be returned as type float.

src_dtype = np.int32
tgt_dtype = np.promote_types(src_dtype, 'float16')
print(tgt_dtype)
>> float64

@cpelley

cpelley commented Jan 3, 2020

...could easily be optimised in a future PR, by initally storing the weights in the numpy array structure required by the regridding "perform" code...

Awesome! Yes this is how we made ESMF regridding approachable (the memory overhead there really is too large). Do feel free to give me a shout for a chat if/when you think it might be useful to do so.

To clarify, the changes here retain the current dtype behaviour...

I don't think AreaWeighted/bilinear regridding should ever return integer types. Performing area conservative regridding implies to me floats in the return since area calculations should be float. The result needs to be always both locally and globally conservative.
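A toy conservation check, with invented overlap areas, shows why a floating-point result follows naturally from the conservation requirement:

```python
import numpy as np

# Toy illustration of why conservative regridding implies floats:
# overlap[t, s] is the (made-up) area of source cell s lying inside
# target cell t, and each target value is the area-weighted mean.
overlap = np.array([[1.0, 0.5, 0.0, 0.0],
                    [0.0, 0.5, 1.0, 1.0]])
src_area = overlap.sum(axis=0)   # every source cell fully covered
tgt_area = overlap.sum(axis=1)

src = np.array([1.0, 2.0, 3.0, 4.0])
tgt = overlap @ src / tgt_area   # area-weighted mean per target cell
```

The area-weighted sums over source and target grids agree, which is only possible in general with a floating-point result.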

@ehogan
Contributor

ehogan commented Jan 6, 2020

I wonder what the implications are for memory usage from these changes. Do you have any thoughts? My gut feeling is that it should still be reasonable, given the alternative approaches available, which certainly give iris a lot more wiggle room in this regard :)

@cpelley, using memory_profiler I generated the following plots by loading a cube (where cube.data.shape = (31, 160, 320)), then performing regridding 10 times:

using 738093f, profiling regrid_area_weighted_rectilinear_src_and_grid:
[memory profile plot]

using the feature branch, profiling _regrid_area_weighted_rectilinear_src_and_grid__prepare and _regrid_area_weighted_rectilinear_src_and_grid__perform:
[memory profile plot]

Edit: There is an ~15% increase in the average memory usage, but an ~90% decrease in the total time taken to perform the regridding.

@cpelley

cpelley commented Jan 23, 2020

Generating data using float16 vs float64 reveals huge differences. In hindsight, I don't think data recorded as integers should be treated as anything but the highest precision (float64): integer data doesn't mean the data is precise only to an integer. An example here is the land-cover-type fraction ancillary generation (int8 -> float64).
I think it would make sense to always promote to float64 on this basis, as not doing so assumes what the user's data represents. Sorry for going back and forth on this.
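A quick check of the precision argument, with illustrative values:

```python
import numpy as np

# float16 has an 11-bit significand, so it cannot hold every int32 value
# exactly; float64 represents all integers up to 2**53, which is why
# promoting integer inputs straight to float64 is the safe choice.
v = np.int32(30001)
lossy = np.float16(v)   # rounds to the nearest representable float16
exact = np.float64(v)
```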

@pp-mo
Member

pp-mo commented Feb 25, 2020

I think this is probably addressed by #3617.
It's complicated, so we need to spend some effort to confirm that this is the case.

@trexfeathers
Contributor

The following quick check indicates that the ASV tests should have seen a noticeable improvement:
commit 738093f: time: areaweighted_regridder 13.124 s pre-optimisation
commit 0d1ddb4: time: areaweighted_regridder 1.575 s post-optimisation

This has been addressed by #3660 - asv now demonstrates a change in performance at the appropriate commit.

@abooton
Contributor

abooton commented Sep 24, 2020

Closed by #3623.

The outstanding (numba) acceptance criterion is a "nice to have".

@abooton abooton closed this as completed Sep 24, 2020