Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spatial averaging over a polygon - roadmap discussion #422

Closed
huard opened this issue Apr 24, 2020 · 21 comments
Closed

Spatial averaging over a polygon - roadmap discussion #422

huard opened this issue Apr 24, 2020 · 21 comments

Comments

@huard
Copy link
Collaborator

huard commented Apr 24, 2020

Description

Propose strategy to compute spatial averages over a polygon. Needs to account for non-uniform grid area and partial overlap, holes in polygons.

Options discussed so far:

  • conservative regriding using ESMPy.
  • rioxarray

Document below experience with both approaches, testing, benchmarks, etc.

@mathause
Copy link

I'm interested in this discussion.

For regular grids I thought of subsampling each grid point 10 x 10 times, apply contains and then coarsen the grid again. This should yield a precision of about 1%.

For irregular grids I don't have an idea...

@huard
Copy link
Collaborator Author

huard commented Apr 27, 2020

Hi @mathause,

Please do chime in.

@huard huard added this to the v0.18 milestone May 12, 2020
@huard huard modified the milestones: v0.18, v0.19 Jun 26, 2020
@Zeitsperre
Copy link
Collaborator

@huard Wouldn't this be better discussed in https://github.com/roocs/clisops?

@aulemahal
Copy link
Collaborator

@Zeitsperre not a bad idea. So let me write here the conclusions of the little review I did:

I looked at many python package providing geospatial tools for xarray datasets and realized that there aren't many. Of all studied packages, rioxarray was found to be the most complete, but even then, it did not provide the features wanted in this issue and was lacking many features useful for xclim-users. As such, clisops seems the best option for going on with xclim-useful geospatial tools. Thus, #508.

As more specifically for spatial averages over a polygon, no other xarray-compatible package offer the feature. There is an implementation in ocgis and it looks like ESMF could do it, pending implementations in ESMFpy and xESMF. However, methods suggest by myself in internal discussions seem easier to implement and would be fit for clisops. I suggest we close this issue and reopen an new one in clisops discussing these.

@tlogan2000
Copy link
Collaborator

tlogan2000 commented Aug 25, 2020

In my mind the the most complete methodology would be to approach this more as a GIS analysis.
Assuming we have (or can derive) the 4 corners of each grid point we can:

  1. Create a polygon layer of the model grid via geopandas or shapely
  2. Intersect the grid polygon layer from step 1 with the subset polygon(s) (see https://geopandas.org/set_operations.html)
  3. Reproject the interesected layer and calculate areas using an appropriate area conserving projection (https://en.wikipedia.org/wiki/Map_projection#Equal-area ... potential here to have as user input)
  4. Use intersected polygons areas to derive weights for weighted spatial averaging

This should work for both irregular and regular grids.. biggest challenge (as often is the case) is likely dealing with irregular grid cells that cross the -180,180 longitude or are near the poles.

This could also potentially be computationally 'long' depending on grid resolution and spatial domain (e.g. 1km over the entire globe?) in which case we might like to have different options

@aulemahal
Copy link
Collaborator

I agree completely. What I tried and presented last time was mainly that without 3. It should be easy to implement, I am just waiting on both an answer on my clisops issue and some "free" time on my side.
Also, as far as my experimentation went, the method you are suggesting was faster than resampling at higher frequency in most cases.

@huard
Copy link
Collaborator Author

huard commented Aug 25, 2020

@aulemahal
Copy link
Collaborator

aulemahal commented Aug 25, 2020

Just read the code quickly. Implementing the ESMF Mesh into xESMF would indeed be quite useful. It looks like a lot of work, though!
Only intuition here, but I believe the shapely/geopandas/clisops solution suggested above would be faster in most cases for spatial averaging. All the ESMF overhead of creating a mesh of 1 element could potentially be heavier than not?

Implementing our method in clisops wouldl be much faster for quantitatively similar results. But working on xESMF has a greater potential and userbase.

@huard
Copy link
Collaborator Author

huard commented Aug 26, 2020

I thought that since we had a reference implementation, the amount of work would not be so large, but that was just a guess, not the result of an actual analysis.
My intuition was the opposite: that a fortran library would run faster.
Let's just not underestimate the time needed to get something operational, with all the details worked out. I agree it might be quicker to get 90% of the job done by starting from scratch, but the remaining 10% can be very time consuming. I'm just wary of building "yet another averager".

@tlogan2000
Copy link
Collaborator

See https://github.com/NCPP/ocgis/blob/61d88c60e9070215f28c1317221c2e074f8fb145/src/ocgis/regrid/base.py#L24 doing pretty much this with xESMF.

Getting a bit confused ... but the example actually using ESMpy directly correct? And we would like to reproduce via xESMF in order to be able to leverage dask?

@huard
Copy link
Collaborator Author

huard commented Aug 26, 2020

Yes, xESMF was not a thing at the time.
I would say we mostly want to leverage the xarray interface (coordinates) to facilitate the process. The dask parallelism for me is a bonus. If the conclusion is that this would be nice but too time consuming, I can work on finding collaborators for this.

@tlogan2000
Copy link
Collaborator

In any case there seems to be a lot of robust code in the example @huard linked to ... I would tend to agree that we should do our best to avoid reproducing something that exists already unless we can make a very clear case for it.

Would there be a way to use the code to provide an output of spatial weights only.... e,g, identify grid ids in the subset (any grid that touches the subset polygon) and associated areas as the output and not necessarily apply the regridding itself? Dask shouldn't be critical in this case. A subsequently call to xarray.weighted could handle the heavy lifting?

@aulemahal
Copy link
Collaborator

aulemahal commented Aug 26, 2020

Maybe my lack of experience with ESMF is making me estimate longer times, but the main problem is that xESMF is lacking all implementation of the Mesh object of ESMpy/ESMF. This is the part I think is not simple, if even possible? Lacking in ESMpy is mesh element vith more than 4 corners, but it seems to be lacking in ESMF also ! https://esmf-org.github.io/801branch_docs/ESMF_refdoc/node5.html#SECTION050102100000000000000
EDIT : I read the thing too fast, N-polygons are supported by dividing them under the hood. But that is lacking in ESMpy, it will need to be implemented.

Speed wise, Fortran is clearly faster for large tasks, but the shapely option should be fast enough too.

But you're right : even if a first draft implementation would be easy, all the testing + covering all cases will take some time.

@tlogan2000
Copy link
Collaborator

In any case there seems to be a lot of robust code in the example @huard linked to ... I would tend to agree that we should do our best to avoid reproducing something that exists already unless we can make a very clear case for it.

Would there be a way to use the code to provide an output of spatial weights only.... e,g, identify grid ids in the subset (any grid that touches the subset polygon) and associated areas as the output and not necessarily apply the regridding itself? Dask shouldn't be critical in this case. A subsequently call to xarray.weighted could handle the heavy lifting?

i.e. in my case ESMpy should be fine

@huard
Copy link
Collaborator Author

huard commented Aug 26, 2020

@aulemahal By Mesh, do you mean the grid with the cell corners ? If yes, there is a PR for this already by yours truly. Still needs to work, but nothing major.
@tlogan2000 Yep, xESMF first computes the weight, then applies them to the fields. The weights are saved as sparse matrices in memory and can be written to disk (pangeo-data/xESMF#3). I like the idea of using weighted, but unclear if it would work out of the box with sparse arrays.

@tlogan2000
Copy link
Collaborator

Answering myself but there seems to be a weights_only flag in the regrid_field method

@aulemahal
Copy link
Collaborator

@huard Cool! I'm guessing you mean "generating the grid corners" in order to use conservative methods?
What I am talking about is the "Mesh" grid type of ESMF. Passing a polygon to ESMF will need to transform it to a Mesh of one element with a large number of sides. I can check if ESMpy accepts those this morning, but my first understanding is that it doesn't?
Also question : which is the main repo, pangeo-data's or JiaweiZhuang's?

@tlogan2000
Copy link
Collaborator

@huard sorry if I am still a bit confused.

Does xESMF weight calculation etc only work on 'grids' .. i.e. it will correctly intersect and weight the transfer of one grid to another but would not necessarily handle a case where we intersect a grid with an irregular shaped drainage basin / province boundary ... in this case gridcells intersecting the edge of the polygon create oddly shaped 'slivers' with mutliple vertices (likely >4)

This could be linked to @aulemahal comments about mesh triangles as well I suppose?

@huard
Copy link
Collaborator Author

huard commented Aug 26, 2020

@aulemahal Got it. Yes, it's possible that it does not yet exist in ESMPy. I agree this would be problematic, since it's not just a matter of xESMF wrapping.

@tlogan2000 Correct for xESMF, but the fortran ESMF library handles the case of irregular polygons. OCGIS uses this functionality, which we used in flyingpigeon.

@aulemahal
Copy link
Collaborator

@huard @tlogan2000
So I did some tests. I computed weight masks for the spatial averaging of a given grid (xarray's air tutorial) to a region (mexico). I already had a working version using shapely for this simple case with a regular grid and a simple poly. Weights are fraction of the input gridcell [0, 1] that intersects the polygon.

  1. I was wrong : ESMpy does support mesh with generic-size element. Thus we can represent a polygon as a mesh of 1 element.
  2. I was also somewhat wrong about the speed. For a spatial averaging of a 130x150 grid to a 170-node polygon, I had times of: 35 ms (ESMpy) vs 47 ms (shapely / pyproj).
  3. Results are very close, with variation in the fractions of about 0.4 % between the two versions. There are strange features in the ESMpy output, but they do not contribute significantly to the results. A bug for later.

For single polygons, I believe implementation to xESMF shouldn't be so hard, given that the PR computing lat and lon bounds is merged. However it lacks:

  • Multipolygon support : Mesh elements cannot represent polygon collections. A solution would be to use a Mesh of many elements and sum the weights?
  • Holes : mesh elements cannot have holes. I guess we could recompute weights for the holes and substract them?

Note that adding the dependency of either shapely + pyproj or directly geopandas would be necessary for implementation in xESMF. A logical way I see would be to accept GeoDataFrames as inputs (instead of xarray datasets) in order to generate ESMpy Meshes.

@huard
Copy link
Collaborator Author

huard commented Sep 14, 2020

I suggest we close this issue now that the roadmap is fairly clear.

@huard huard closed this as completed Sep 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants