Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent behaviour of clip_box #617

Closed
gcaria opened this issue Dec 13, 2022 · 14 comments
Closed

Inconsistent behaviour of clip_box #617

gcaria opened this issue Dec 13, 2022 · 14 comments
Labels
question Further information is requested

Comments

@gcaria
Copy link

gcaria commented Dec 13, 2022

import geopandas as gpd
import rioxarray as rxr

root = 'https://esa-worldcover.s3.eu-central-1.amazonaws.com/v100/2020/map/ESA_WorldCover_10m_2020_v100_{fname}_Map.tif'
da1 = rxr.open_rasterio(root.format(fname='N45E003'))
da2 = rxr.open_rasterio(root.format(fname='N45E000'))

xr.testing.assert_equal(da1.y, da2.y) # all good

gdf = gpd.read_file("https://d9-wret.s3.us-west-2.amazonaws.com/assets/palladium/production/s3fs-public/atoms/files/WRS2_descending_0.zip")
gdf_scene = gdf[((gdf.PATH==198) & (gdf.ROW==28))]
bounds = gdf_scene.geometry.values[0].bounds

da1_clipped = da1.rio.clip_box(*bounds)
da2_clipped = da2.rio.clip_box(*bounds)
xr.testing.assert_equal(da1_clipped.y, da2_clipped.y) # AssertionError: Left and right DataArray objects are not equal

Problem description

I am clipping two rasters which are side by side along the x-axis (so they share the same y-axis coordinates) with the same box polygon, so expect the outputs to also share the y-axis coordinates, but they don't.

Expected Output

I expect the outputs to also share the y-axis coordinates.

Environment Information

rioxarray (0.13.2) deps:
  rasterio: 1.3.4
    xarray: 2022.12.0
      GDAL: 3.5.3
      GEOS: 3.11.1
      PROJ: 9.0.1
 PROJ DATA: /home/ubuntu/envs/exp/lib/python3.9/site-packages/rasterio/proj_data
 GDAL DATA: /home/ubuntu/envs/exp/lib/python3.9/site-packages/rasterio/gdal_data
Other python deps:
     scipy: 1.9.3
    pyproj: 3.4.1

System:
    python: 3.9.13 (main, May 27 2022, 10:19:08)  [GCC 11.2.0]
executable: /home/ubuntu/envs/exp/bin/python
   machine: Linux-5.15.0-1026-aws-x86_64-with-glibc2.35

Installation method

pypi

@gcaria gcaria added the bug Something isn't working label Dec 13, 2022
@snowman2
Copy link
Member

The the x-coordinates the same? If not, that is likely the reason.

@gcaria
Copy link
Author

gcaria commented Dec 13, 2022

If also the x-coordinate were the same, the two input rasters would overlap completely (as the MRE shows, that's not the case).

A simple diagram that I hope complements the MRE above:
Screen Shot 2022-12-13 at 6 51 41 pm

@snowman2
Copy link
Member

image

@snowman2
Copy link
Member

len(da1_clipped.y),len(da2_clipped.y)
(22018, 22017)

This is true:

xr.testing.assert_equal(da1_clipped.y[1:], da2_clipped.y)

@snowman2
Copy link
Member

When clipping rasters, you cannot be guaranteed that everything will mach up exactly in your example here. Floating point issues and other things can have an impact on the behavior of the clip. If you want to join them together, I recommend using merge:

from rioxarray.merge import merge_arrays

da_merged = merge_arrays([da1_clipped, da2_clipped])

@snowman2 snowman2 added question Further information is requested and removed bug Something isn't working labels Dec 19, 2022
@gcaria
Copy link
Author

gcaria commented Dec 20, 2022

I'm very surprised. I was using clip_box as a shorter version of:

da1_clipped = da1.sel(x=slice(bounds[0], bounds[2]), y=slice(bounds[3], bounds[1]))
da2_clipped = da2.sel(x=slice(bounds[0], bounds[2]), y=slice(bounds[3], bounds[1]))
xr.testing.assert_equal(da1_clipped.y, da2_clipped.y) # no exception raised

Out of curiosity, why are the results of clip_box different?

@snowman2
Copy link
Member

The implementation is different. clip_box uses rasterio to determine the window to select:

window_error = None

numeric precision differences when calculating the transform can cause there to be a slight offset.

@snowman2
Copy link
Member

One part I forgot to mention is that the example you showed only considers the centroid of the grid cell whereas the rasterio version considers the entire grid cell.

@gcaria
Copy link
Author

gcaria commented Dec 23, 2022

I can see how that's one difference, but wouldn't that impact the "left" and "right" rasters (da1 and da2) in the same way? My main issue with this result from clip_box is its lack of symmetry.

@snowman2
Copy link
Member

wouldn't that impact the "left" and "right" rasters (da1 and da2) in the same way?

Not necessarily. Depending on how close the boundary is to the edge of the grid cells, floating point precision differences could make it include grid cells on one side versus the other. Also, the boundary may be slightly shifted compared to the raster. If the boundary or raster is shifted, there can be differences in the resulting raster.

@snowman2
Copy link
Member

If you want to be sure everything matches up exactly on both sides, one option is to merge the rasters before clipping.

@snowman2
Copy link
Member

I dug into the code and here are the rasterio Windows used for clipping:

da1:

Window(col_off=-20099.16, row_off=12617.999999999884, width=34670.16, height=22016.40000000014)
# this is converted to:
# row_start=12617.999999999884 row_stop=34634.40000000002 col_start=-20099.16 col_stop=14571.000000000004

da2:

Window(col_off=15900.84, row_off=12618.0, width=34670.16, height=22016.40000000014)
# this is converted to:
# row_start=12618.0 row_stop=34634.40000000014 col_start=15900.84 col_stop=50571.0

Note that da1 has row_start=12617.999999999884 . In rio.isel_window it uses math.floor to select the index of the row to ensure any intersecting cells are added. For da1 this converts row_start=12617 and for da2 this is row_start=12618.

And that is why there is an additional row for one and not the other. It all comes down to floating point precision issues.

@gcaria
Copy link
Author

gcaria commented Jan 2, 2023

Thanks for looking into this, any reason why math.floor has been chosen instead of round?

@snowman2
Copy link
Member

snowman2 commented Jan 2, 2023

" it uses math.floor to select the index of the row to ensure any intersecting cells are added."

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants