Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Raster downsampling and cropping operations seem to be incompatible #408

Closed
atedstone opened this issue Oct 18, 2023 · 8 comments · Fixed by #448
Closed

Raster downsampling and cropping operations seem to be incompatible #408

atedstone opened this issue Oct 18, 2023 · 8 comments · Fixed by #448
Labels
bug Something isn't working

Comments

@atedstone
Copy link
Member

Describe the bug
When opening a Raster with the downsample option, running Raster.crop() immediately afterwards results in a blank Raster.

To Reproduce
Steps to reproduce the behavior:

im = gu.Raster(im_uri, downsample=5)
im2 = im.crop(v, inplace=False)
im2.show()

In contrast, if loading im is forced with

im = gu.Raster(im_uri, downsample=5, load_data=True)

then the crop functions as expected.

Expected behavior
The cropping operation should take account of the downsampling during the on-the-fly data loading, returning a cropped raster at the specified downsample factor.

System (please complete the following information):
JupyterLab, geoutils=0.0.16.dev2+gaf2fc8c (also geoutils=0.0.11)

Additional context
Add any other context about the problem here.

@atedstone atedstone added the bug Something isn't working label Oct 18, 2023
@adehecq
Copy link
Member

adehecq commented Oct 20, 2023

Hi @atedstone,
could you provide the data to reproduce the error, or even better make a reproducible example with the sample data that comes with geoutils?
I've tried reproducing the error with a sample data and it actually worked fine:

import geoutils as gu
import matplotlib.pyplot as plt
rst1 = gu.Raster(gu.examples.get_path("everest_landsat_b4"), downsample=2)
rst2 = gu.Raster(gu.examples.get_path("everest_landsat_b4_cropped"))
rst_new = rst1.crop(rst2, inplace=False)
rst_new.show(cmap="gray")
plt.show()

The outcome looks fine to me...
I was wondering if your error was related to #409 where downsampling also causes an error.

@adehecq
Copy link
Member

adehecq commented Oct 20, 2023

There is also another bug with downsampling that may explain the issue. The bounds are incorrect:

import geoutils as gu
import matplotlib.pyplot as plt
rst1 = gu.Raster(gu.examples.get_path("everest_landsat_b4"))
print(rst1.bounds)
# -> BoundingBox(left=478000.0, bottom=3088490.0, right=502000.0, top=3108140.0)

rst2 = gu.Raster(gu.examples.get_path("everest_landsat_b4"), downsample=2)
print(rst2.bounds)
# -> BoundingBox(left=478000.0, bottom=3068840.0, right=526000.0, top=3108140.0)

This is because self.height and self.width are not updated in this case. The issue comes from the setter at this line. The definition is different whether the data is loaded or not, which explains @atedstone's bug.

However when saving to file, the bounds are correct, so the issue can be difficult to see...

@atedstone
Copy link
Member Author

atedstone commented Nov 8, 2023

Thanks for the extra info @adehecq. Yes, it looks like this is at leastly partly the cause of the problem, but not the entire explanation. I have tried modifying the height and width definitions to reflect the _out_shape, as follows:

   @property
    def height(self) -> int:
        """Height of the raster in pixels."""
        if not self.is_loaded:
            if self._out_shape is not None:
                # Image will be downsampled on load, so return this shape
                return self._out_shape[1]
            else:
                return self._disk_shape[1]  # type: ignore
...

For my current test case, this yields:

r1 = gu.Raster(file, downsample=5, indexes=[5])
r2 = gu.Raster(file, indexes=[5])
print(r1.bounds)
print(r2.bounds)
print(r1._out_shape)
print(r2._out_shape)

BoundingBox(left=-1634200.0, bottom=-3385700.0, right=1920800.0, top=-560700.0)
BoundingBox(left=-1634200.0, bottom=-3383700.0, right=1919800.0, top=-560700.0)
(1130, 1422)
(5646, 7108)

which is about right, but does not fix the subsequent crop() operation, which comes back with a Raster of correct spatial extent but data filled with nans. So more work is required here.

@rhugonnet
Copy link
Contributor

Nice catch!
I think the main reason for this bug is that crop() is one of the few class methods of Raster that, instead of loading (often implicitly by just calling self.data somewhere) + applying the function, actually re-calls rasterio on the filename when the data is unloaded (to save memory if possible when loading the cropped image):
https://github.com/GlacioHack/geoutils/blob/main/geoutils/raster/raster.py#L1881

And the if/else loop we currently have is too "crude" to deal with arguments such as downsample defined at instantiation (and stored in out_shape).
We would need to refine the rasterio cropping functionality to accept a list of argument similar than the one for load(): https://github.com/GlacioHack/geoutils/blob/main/geoutils/raster/raster.py#L563.

@atedstone
Copy link
Member Author

Thanks @rhugonnet - maybe we can refactor crop() to also use load() for the rasterio functionality then?

@rhugonnet
Copy link
Contributor

Yes I think that's the way to go!

@atedstone
Copy link
Member Author

atedstone commented Jan 18, 2024

@adehecq here is a simplified test case which uses our example data.

import geoutils as gu
f = gu.examples.get_path('everest_landsat_b4')

# Check the image dimensions without downsampling
r_fullsize = gu.Raster(f)
r_fullsize.shape
(655, 800)

# Load downsampled image
r = gu.Raster(f, downsample=5)
# shape is smaller, as expected
r.shape
(131, 160)

# Apply a crop
r2 = r.crop([479000, 3001000, 597000, 3107000], inplace=False)

# Shape should be smaller than (131, 160)
r2.shape
(647, 787)

# transform of r2 is invalid for the outputs as it states pixel size of 150 m, should be 30 m given the shape reported above
r2.transform
Affine(150.0, 0.0, 478900.0,
       0.0, -150.0, 3107090.0)

Note that, unlike the initial bug report, this minimum working example does produce an output, but that output is wrong. I believe that this is because in the initial report I was working with a VRT mosaic, rather than the single GeoTIFF MWE here.

@adehecq
Copy link
Member

adehecq commented Jan 22, 2024

Thanks @rhugonnet - maybe we can refactor crop() to also use load() for the rasterio functionality then?

I started looking into this. load() does not have the functionality needed to load a subset of the data without modifying the raster's attributes transform and shape. We should probably use _load_rio or directly rasterio's read. But we need to store the transform of the file on disk in any case, as it is the one than we need to use, not the modified transform.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Development

Successfully merging a pull request may close this issue.

3 participants