-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Compare different zonal statistics method to quantify the effect of area weights and cross-validate the methods #209
Comments
Here is a notebook comparing different methods: https://gist.github.com/j08lue/ac09ed69e5dd58f4058466ec3b108b99 It runs without extra AWS S3 configuration in https://hub.ghg.center/ - please download it into the hub and run it there, if you want to try out or fix things. Once the validation is completed, we will add this notebook to the GHG Center docs. Main takeawaysThe zonal average calculated with the current TiTiler (v0.2.3) implementation is identical to a plain (unweighted) zonal average computed with Xarray / rioxarray. So far so good. Grid cell area weighting (here implemented by calculating area on a spherical Earth) makes a difference (of on average 35%) for CASA-GFED3 averaged over the Northern Hemisphere. The difference is negligible though for an area spanning ~2000 km north-south near the equator (Dem Rep Congo bbox, not shown here). Next stepsWe need to repeat this calculation with the new TiTiler implementation, either running locally or deployed. If you have any comments on this comparison or want to further iterate on it, please go ahead - contributions welcome! |
😓 looking at the notebook I think there is a miss understood about what we are trying to achieve here! in TiTiler/rio-tiler we added a method to take sub-pixel shape intersection in order to weight the statistics by the pixel coverage. In the notebook you're using the with xr.open_dataset("area.nc") as ds:
area_cdo = ds["cell_area"].values
...
data = xds_clip["band_data"].isel(band=0).to_masked_array()
weights = xds_clip["area"].to_masked_array()
weights.mask = data.mask
timeseries["average_weighted"].append((data * weights).sum() / weights.sum()) Those are two different things and applying weights from another dataset is not possible via titiler/rio-tiler directly! |
Thanks for checking! The external weights (CDO-generated |
Had a clarifying call. Conclusion was that I'll try out the Furthermore, we discussed that climate data (mostly NetCDF or Zarr in lat/lon projection) specific methods might be better served by a package that is optimized for that kind of data, possibly titiler-xarray or a future evolution of a Xarray-serving service. |
I did some testing with Updated notebook: The very different behavior of different equal-area projections makes me think that the method of reprojecting the image and geometry to some different projection is perhaps not safe enough. Not a surprise - there probably does not exist one projection that works for all regions and sizes. Since this method is most important for climate datasets (mostly on regular geographic coordinate grids), maybe we should focus on the approach to implement this for that specific case only. |
Ok, turns out the I take my true / benchmark to be the lat/lon grid cell area weighted average (with weights from CDO or an equivalent function). For a dataset with great variation over latitude like global net primary production (NPP), the methods compare like this: Where Both the rio-tiler method above and a rasterio function I coded up that uses WarpedVRT in a few lines give the expected results. tl;dr@vincentsarago, I need to use this How do we do this best? Expose I'll do some more testing with other source image reference systems than geographic coordinates and smaller geometries, to be on the safe side, but let's move forward making a plan for the implementation in VEDA and (most importantly) GHG Center backend, please. |
@j08lue I might have an idea! We could have a custom algorithm which could be applied on the image before calculating the statistics. This will only be available for the Basically you would tell the tiler to apply This will be possible with next version of titiler developmentseed/titiler#726 (but compatible with the next veda-backend release) Algorithm sketch: import numpy
from rio_tiler.models import ImageData
from rio_tiler.constants import WGS84_CRS
from titiler.core.algorithm.base import BaseAlgorithm
def _get_coord_specs(arr):
assert numpy.ndim(arr) == 1
delta = _get_unique_diff(arr)
return ((arr[0] - delta / 2), (arr[-1] + delta / 2), len(arr))
def surfaceAreaGrid(lat_specs, lon_specs):
"""
Construct a 2D array (nlat, nlon) where each element is the surface area of the corresponding grid in square meters.
lat_specs is a tuple, (starting latitude, ending latitude, latitude divisions)
lon_specs is a tuple, (starting longitude, ending longitude, longitude divisions)
Example: surfaceAreaGrid((-30,60,45), (20,80,120)) should return a (45,120) array with the surface areas of the
(2 degree lat x 0.5 degree lon) cells in square meters.
"""
lat_beg, lat_end, lat_div = lat_specs
lon_beg, lon_end, lon_div = lon_specs
R_e = 6371000.0 # Radius of Earth in meters
dLon = (numpy.pi / 180.) * (lon_end - lon_beg) / lon_div
dS = numpy.zeros((lat_div + 1, lon_div), numpy.float64)
Lats = (numpy.pi / 180.) * numpy.linspace(lat_beg, lat_end, lat_div + 1)
for i, lat in enumerate(Lats):
dS[i] = R_e * R_e * dLon * numpy.sin(lat)
dS = numpy.diff(dS, axis=0)
return dS
class WeightedArea(BaseAlgorithm):
"""Apply Area Weight."""
# metadata
input_nbands: int = 1
output_nbands: int = 1
def __call__(self, img: ImageData) -> ImageData:
"""apply weight."""
assert img.crs == WGS84_CRS, "'WeighedArea' algorithm only work with dataset in EPSG:4326 projection"
b1 = img.array[0].astype("float32")
# Compute weight array using img.bounds and shape
lat = numpy.linspace(img.bounds[1], img.bounds[3], img.height)
area_lat = _spherical_grid_cell_area(lat)
area_lat_2d = numpy.ones((img.height, img.width)) * area_lat[:, numpy.newaxis]
weights = surfaceAreaGrid(
(img.bounds[1], img.bounds[3], img.height),
(img.bounds[0], img.bounds[2], img.width),
)
arr = numpy.where(img.mask, img.array, img.array / weights)
return ImageData(
arr,
assets=img.assets,
crs=img.crs,
bounds=img.bounds,
band_names=img.band_names,
) Next
|
Oooh, I see I'll test that right away. |
@j08lue you jinxed me! we commented two solution at the same time!
You should be able to pass
in titiler-pgstac: https://github.com/stac-utils/titiler-pgstac/blob/0.8.0/titiler/pgstac/factory.py#L1053 ( (Note to self: I need to update the veda-backend requirements for titiler)
adding a new flag is a bit more complex!
starting with titiler 0.15.2, there are now 2 resampling options:
ref: https://github.com/developmentseed/titiler/blob/0.15.3/CHANGES.md#0152-2023-10-23
If we are happy with using |
This is awesome news, @vincentsarago! I'll do some more testing with our main use case - datasets with WGS84 coordinates. Scientific accuracy is key. If I find that the |
Hmm, getting a 500 error with a POST request to
while the request without RASTER_API_URL = "https://test-raster.delta-backend.com"
ITEM_ASSET_URL = "s3://veda-data-store-staging/no2-monthly/OMI_trno2_0.10x0.10_202109_Col3_V4.nc.tif"
AOI = {
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"properties": {},
"geometry": {
"coordinates": [
[
[
-115,
82
],
[
-115,
-82
],
[
-43,
-82
],
[
-43,
82
],
[
-115,
82
]
]
],
"type": "Polygon"
}
}
]
}
result = requests.post(
f"{RASTER_API_URL}/cog/statistics",
params={
"url": ITEM_ASSET_URL,
"dst_crs": "+proj=cea"
},
json=AOI,
)
result.raise_for_status()
result.json() Do you have access to the function logs to see what is going wrong, @vincentsarago? |
Note:
There might be a bug in rio-tiler or titiler
☝️ this is a log from titiler.xyz (devseed aws) |
Sadly this will require a new deployment to catch newest rio-tiler version |
I can confirm that with this bug fix (testing against I will prepare the notebooks for our (GHG Center and VEDA) docs, so we have somewhere to point to for others to validate our methods. |
To complete our goal of providing grid-cell area weighted zonal statistic
we want to make sure that our zonal statistics calculations are accurate.
For that, we need to quantify
The text was updated successfully, but these errors were encountered: