Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve cutline handling #598

Merged
merged 3 commits into from
May 2, 2023

Conversation

yellowcap
Copy link
Contributor

This is another attempt to resolve #588

The proposed changes ensure that the FirstMethod finishes when all pixels over the target geometry is filled.

The main implementation challenge is: the pixel_selection class needs to know the target mask to use in is_done function.

This means that the input geometry needs to be converted to a mask as numpy array to which the pixel_selection needs to have access.

For this case, the cutline vrt_option approach does not work, because for gdal cutlines the mask generation happens on the gdal side and the mask is not returned. So the proposed changes do not use the vrt_option argument, but rasterized the geometry directly and applies this mask on the python side after requesting pixels.

import datetime

import httpx
import matplotlib.pyplot as plt
from rasterio.crs import CRS
from rasterio.warp import transform_geom

from rio_tiler.io import STACReader
from rio_tiler.mosaic import mosaic_reader

stac_endpoint = "https://earth-search.aws.element84.com/v0/search"

# Date filter
date_min = "2022-08-01"
date_max = "2022-08-31"

start = datetime.datetime.strptime(date_min, "%Y-%m-%d").strftime("%Y-%m-%dT00:00:00Z")
end = datetime.datetime.strptime(date_max, "%Y-%m-%d").strftime("%Y-%m-%dT23:59:59Z")

# Bern large
aoi = {
    "type": "Feature",
    "properties": {},
    "geometry": {
        "type": "Polygon",
        "coordinates": [
            [
                [296.138689305058506, -3217.936052156386722],
                [2770.964285295431182, -5479.41461400965818],
                [3496.34420136157496, -8466.273091929069778],
                [2045.584369229287404, -11794.486824467847327],
                [1106.857419026044226, -15506.725218453404523],
                [4435.071151564818138, -16701.46860962116989],
                [10110.10225961170363, -16232.105134519548301],
                [10408.78810740364861, -15336.047591143724276],
                [10238.110480093964725, -12903.891401980770752],
                [9811.416411819765926, -11367.792756193644891],
                [7763.284884103592049, -7271.529700761304412],
                [5629.814542732583504, -4113.993595532210747],
                [3795.030049153516302, -1980.523254161202203],
                [1490.882080472823873, -401.755201546653552],
                [296.138689305058506, -3217.936052156386722],
            ]
        ],
    },
}
epsg = 21780

query = {
    "collections": [
        "sentinel-s2-l2a-cogs"
    ],  # Make sure to query only sentinel-2 COGs collection
    "datetime": f"{start}/{end}",
    "query": {"eo:cloud_cover": {"lt": 90}},
    "intersects": transform_geom(f"epsg:{epsg}", "epsg:4326", aoi["geometry"]),
    "limit": 2,
    "fields": {
        "include": [
            "id",
            "properties.datetime",
            "properties.eo:cloud_cover",
        ],
        "exclude": ["links"],
    },
}

headers = {
    "Content-Type": "application/json",
    "Accept-Encoding": "gzip",
    "Accept": "application/geo+json",
}


data = httpx.post(stac_endpoint, headers=headers, json=query).json()
print(data["context"])

sceneid = [f["id"] for f in data["features"]]
cloudcover = [f["properties"]["eo:cloud_cover"] for f in data["features"]]
dates = [f["properties"]["datetime"][0:10] for f in data["features"]]

items = [feat | {"stac_version": data["stac_version"]} for feat in data["features"]]


def tiler(asset, *args, **kwargs):
    print(asset["id"])
    with STACReader(None, asset) as src:
        img = src.feature(
            shape=kwargs["shape"],
            shape_crs=kwargs["shape_crs"],
            assets=kwargs["assets"],
        )
        print("min/max/shape", img.data.min(), img.data.max(), img.data.shape)

        return img

img, assets_used = mosaic_reader(
    items,
    tiler,
    shape=aoi,
    shape_crs=CRS.from_epsg(21780),
    assets=["B04", "B03", "B02"],
    threads=1,
)

fig = plt.figure(figsize=(30, 10))

ax = fig.add_subplot(1, 2, 1)
img.rescale(in_range=((0, 5000),))
ax.imshow(img.data_as_image())

ax = fig.add_subplot(1, 2, 2)
ax.imshow(img.mask)

plt.show()

@yellowcap
Copy link
Contributor Author

For the docs, it seems like aside from the all_touched part, there is no api change on the functions. So maybe just adding another example would be a good way to document?

Either an extension of https://cogeotiff.github.io/rio-tiler/examples/Using-rio-tiler-mosaic/

or a fresh one similar to the snipped in the ticket above.

@yellowcap yellowcap marked this pull request as ready for review April 24, 2023 17:17
@vincentsarago
Copy link
Member

👍 all good for me.

Would you be able to write a note in the changelog 🙏

@yellowcap
Copy link
Contributor Author

Added changelog note

@yellowcap yellowcap force-pushed the cutline-improvements branch 2 times, most recently from 214a1a3 to a6e70d9 Compare April 26, 2023 11:38
@yellowcap
Copy link
Contributor Author

One sidenote: the create_cutline utils function is no longer used in this codebase. What to do in this case? Just leave it be, or deprecate?

@vincentsarago
Copy link
Member

@yellowcap why is the create_cutline no more used? I think it should still be!

@yellowcap
Copy link
Contributor Author

@yellowcap why is the create_cutline no more used? I think it should still be!

I first left it in, but the problem is that the cutline mask is done in the projection of the source image, so it won't exactly align with the cutline mask done in the target projection. The "finish early" will fail in some of these cases because there are pixels that are masked in the original projection, but not anymore in the target one. So these pixels will always be masked in the target projection.

I thought about this in terms of what it means for data transfer efficiency. My guess is that this will not have implications for data transfers, but not sure. Maybe some COG internal tile requests can be skipped with the cutline vrt_option in place. Do you know if GDAL optimizes that under the hood, i.e. identifying what tiles not to get based on the cutline?

@yellowcap
Copy link
Contributor Author

Also, I just realized that the masking will not fully work for rasters with multiple resolutions (10m vs 20m) in this case. So I'll have to digg into that, the stop early fails when multiple resolutions are included.

@yellowcap
Copy link
Contributor Author

I added another commit that solves the multiple resolution problem. The approach is to only apply the same cutline mask to all assets using the highest resolution rasterization.
0894fad

@vincentsarago
Copy link
Member

vincentsarago commented Apr 27, 2023

I thought about this in terms of what it means for data transfer efficiency. My guess is that this will not have implications for data transfers, but not sure. Maybe some COG internal tile requests can be skipped with the cutline vrt_option in place. Do you know if GDAL optimizes that under the hood, i.e. identifying what tiles not to get based on the cutline?

It might (really doubt) but I don't think this will be a major issueQ 👍

Let's keep create_cutline in utils because it might be useful outside the feature method (e.g to mask tiles developmentseed/titiler#376)

for img in data:
if img.height == max_h and img.width == max_w:
continue
arr = numpy.ma.MaskedArray(
resize_array(img.array.data, max_h, max_w),
mask=resize_array(img.array.mask * 1, max_h, max_w).astype("bool"),
mask=cutline_mask,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 what happens if there is no cutline_mask in the ImageData object? I think this line shouldn't be changed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are absolutely right again! But that opens the problem that pixels might never get unmasked when upsampling. In the following example, the green pixel would always be masked at the coarse resolution, but at the higher resolution it would not be masked. So the finish early would not work again. The solution is to go back to what you already suggested earlier - setting always all_touched to True. In that case the finer mask should always be included in the coarser mask. Will make change accordingly.

image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 we could also add a mechanism to set an option for finish early when like 95% of the cutline is filled 🤷

@@ -551,7 +549,7 @@ def feature(
[shape],
out_shape=(img.height, img.width),
transform=img.transform,
all_touched=all_touched,
all_touched=True, # Necesary for matching masks at different resolutions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@vincentsarago
Copy link
Member

@yellowcap all seems good to me now! do you want to add a tests for the multiple resolution case?

@yellowcap
Copy link
Contributor Author

ok test for multires added, took the blue.tif test file, resampled from 900m to 1800m, and renamed the resampled file to lowres.tif. Added this as an asset to the stac.json item file, and using the lowres file in the test.

Copy link
Member

@vincentsarago vincentsarago left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to handle projection difference between the input feature and output crs

@vincentsarago vincentsarago self-requested a review May 2, 2023 20:26
@vincentsarago
Copy link
Member

thanks a ton @yellowcap for working on this 🙏

@vincentsarago vincentsarago merged commit 54319e9 into cogeotiff:dev May 2, 2023
@yellowcap
Copy link
Contributor Author

Awesome! Thanks for all the help and getting it over the last humps @vincentsarago !

vincentsarago added a commit that referenced this pull request Jun 1, 2023
* update readme

* PER_BAND mask and MaskedArray (#580)

* sketch use of MaskedArray with PER_BAND mask

* update PointData and fix tests

* update mosaics

* update nodata mask

* remove unused

* update changelog

* migration guide

* split resampling option to RasterIO and Warp options (#590)

* refactor Mask tests and Benchmark

* refactor Mask tests and Benchmark (#591)

* change arguments in dynamic_tiler.md (#592)

* change arguments

The argument  ("tile", {"z": "{z}", "x": "{x}", "y": "{y}"}) causes errors below.

File "/home/ubuntu/app/app.py", line 48, in tilejson
    tile_url = request.url_for("tile", {"z": "{z}", "x": "{x}", "y": "{y}"})
TypeError: HTTPConnection.url_for() takes 2 positional arguments but 3 were given

("tile", z='{z}', x='{x}', y='{y}') is correct.

* Update docs/src/advanced/dynamic_tiler.md

---------

Co-authored-by: Vincent Sarago <vincent.sarago@gmail.com>

* refactor MosaicMethods (#594)

* refactor MosaicMethods

* fix tests

* Optional boto3 (#597)

* make boto3 an optional dependency

* update migration

* update test dependencies

* add flake8 rules in ruff (#600)

* update dev version

* update mosaic example notebook custom pixel_selection class (#602)

* improve cutline handling (#598)

* Improve cutline handling

Closes #588

* handle projection

---------

Co-authored-by: vincentsarago <vincent.sarago@gmail.com>

* remove useless cache

* save benchmarks

* add benchmarking in docs

* fix nodata/mask/alpha forwarding for GCPS dataset (#604)

* remove boto3

* Allow clip-box to auto expand (#608)

* Allow clip-box to auto expand

* Add test

* Update tests/test_io_xarray.py

* Fix import order

* Changes from black linter

* Change default to true

* use auto_expand in part and update changelog

---------

Co-authored-by: Vincent Sarago <vincent.sarago@gmail.com>

* allow morecantile 4.0 (#606)

* allow morecantile 4.0

* set morecantile min version to 4.0

* update changelog

* forward statistics from STAC raster:bands (#611)

* forward statistics from STAC raster:bands

* forward tags as metadata and forward metadata when creating ImageData from list

* update changelog

* handle nodata in XarrayReader (#612)

* handle nodata in XarrayReader

* update changelog

* add AWS credential overrides for S3 stac (#613)

* add AWS credential overrides for S3 stac

* catch warnings

* deprecated model methods (#615)

* update readme

---------

Co-authored-by: TTY6335 <36815385+TTY6335@users.noreply.github.com>
Co-authored-by: Daniel Wiesmann <yellowcap@users.noreply.github.com>
Co-authored-by: Aimee Barciauskas <aimee@developmentseed.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Finish mosaicking when all pixels within a cutline are populated
2 participants