
alignment load hint documentation is incorrect #1382

Open
clausmichele opened this issue Jan 16, 2023 · 14 comments · Fixed by #1384

@clausmichele

clausmichele commented Jan 16, 2023

Expected behaviour

Data loaded from ODC should be aligned with the original one.

Actual behaviour

I am working with data projected in EPSG:4326:

Size is 4332, 2000
Origin = (4.003000000000000,49.000000000000000)
Pixel Size = (0.003000000000000,-0.003000000000000)

Unfortunately, when loading the data with ODC, the result has one additional row and column, leading to a shift in the data. I attach a zip with a sample file and the product and dataset definitions.
This screenshot shows the misalignment between the resulting data loaded from ODC vs the original one (layer below in dark gray is ODC):

[screenshot: ODC output in dark gray overlaid on the original data]

I also tried using the align parameter, but nothing changed; I am not sure how to use it properly. I tried the load hints as well, but again no difference.

Steps to reproduce the behaviour

import datacube
dc = datacube.Datacube()
data = dc.load(product='SAMPLE_DATACUBE')
print(data)

<xarray.Dataset>
Dimensions:      (time: 1, latitude: 2001, longitude: 4333)
Coordinates:
  * time         (time) datetime64[ns] 2022-12-16T10:09:22
  * latitude     (latitude) float64 49.0 49.0 48.99 48.99 ... 43.01 43.0 43.0
  * longitude    (longitude) float64 4.003 4.006 4.01 4.013 ... 16.99 17.0 17.0
    spatial_ref  int32 4326
Data variables:
    band_0       (time, latitude, longitude) float64 0.0 0.0 0.0 ... nan nan 0.0
Attributes:
    crs:           EPSG:4326
    grid_mapping:  spatial_ref
data.geobox.extent.boundingbox

BoundingBox(left=4.002, bottom=42.999, right=17.001, top=49.002)

Environment information

Datacube version = 1.8.9
Python version = 3.9.13

sample_data.zip

@SpacemanPaul
Contributor

SpacemanPaul commented Jan 16, 2023

I feel like I've seen something like this before, but can't remember the details. I suspect there's a mismatch in assumptions about cell coordinates (edge vs centre) somewhere.

Indeed, the offset in the attached image looks like it is off by half-a-pixel rather than a whole pixel.

@Kirill888
Member

@clausmichele it's most likely doing the right thing, but documentation on the topic is lacking. Please read through this issue first:

opendatacube/odc-stac#95

@clausmichele
Author

I know that the coordinates of the data I'm seeing are the pixel centres and not the boundaries, and I don't mind the padding; the problem is that the data is not aligned with the original. Could you please try with the data I provided?
I will change the title of this issue to reflect the problem.

@clausmichele clausmichele changed the title from "ODC is padding 1 row and 1 column of zeros on X and Y when loading" to "ODC data not aligned with source after loading" Jan 17, 2023
@Kirill888
Member

But that's the point: it won't be aligned with the original data. It's simply not possible in the general case, when there is more than one dataset, or even more than one band...

To get exactly the grid you need, you have to use like=GeoBox(...). "Native load" is not something dc.load supports, even in situations where it should be possible (one band of one dataset, for example). Trying to use the align=(...) parameter is just an exercise in frustration and confusion; the best option is to construct the exact GeoBox you want and pass it in via the like= parameter.

@Kirill888
Member

Also, you should remove the "storage" section from your product spec.

@clausmichele
Author

clausmichele commented Jan 17, 2023

Thanks @Kirill888, using like + GeoBox solved the issue for me! However, I didn't find any good docs about this; I guess it's a recurring issue that many will encounter.

import datacube
from datacube.utils.geometry import GeoBox
import affine

# GeoBox matching the original raster: 4332 x 2000 pixels, 0.003 degree resolution, origin at (4.003, 49.0)
ref_geobox = GeoBox(4332, 2000, affine.Affine(0.0030000000000001137, 0.0, 4.003, 0.0, -0.0030000000000001137, 49.0), 'EPSG:4326')
dc = datacube.Datacube()
data = dc.load(product='SAMPLE_DATACUBE', like=ref_geobox)
print(data)

<xarray.Dataset>
Dimensions:      (time: 1, latitude: 2000, longitude: 4332)
Coordinates:
  * time         (time) datetime64[ns] 2022-12-16T10:09:22
  * latitude     (latitude) float64 49.0 49.0 48.99 48.99 ... 43.01 43.0 43.0
  * longitude    (longitude) float64 4.005 4.008 4.011 ... 16.99 16.99 17.0
    spatial_ref  int32 4326
Data variables:
    band_0       (time, latitude, longitude) float64 0.0 0.0 0.0 ... 0.0 0.0 0.0
Attributes:
    crs:           EPSG:4326
    grid_mapping:  spatial_ref

Regarding storage, why should I remove it? Just in this case, since I'll use like=GeoBox, or in general?
I ask because I didn't see any deprecation warnings when using it, and it's still in the docs: https://datacube-core.readthedocs.io/en/latest/installation/product-definitions.html?highlight=product

@Kirill888
Member

Kirill888 commented Jan 17, 2023

For storage, why should I remove it? Just in this case since I'll use like=GeoBox or in general?

@clausmichele In general, this definition is incomplete: it's missing tile_size, so it has zero impact. See the code snippet below; the storage section maps to .grid_spec, but since it's incomplete, it's as if it's not even there.

gs_params = {name: extract_point(name)
             for name in ('tile_size', 'resolution', 'origin')}
complete = all(gs_params[k] is not None for k in ('tile_size', 'resolution'))
if not complete:
    return None
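
As a quick check (a sketch, using the product name from this issue), the incomplete storage section should show up as a missing grid spec:

import datacube

dc = datacube.Datacube()
product = dc.index.products.get_by_name('SAMPLE_DATACUBE')
print(product.grid_spec)  # None, because tile_size is missing from the storage section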

The storage section was originally there for ingested products, i.e. products that are regularly tiled. It's not enough to have uniform resolution and projection across datasets; each dataset is also expected to have the same size in pixels, and any two datasets must overlap either fully or not at all.

Instead, you want to use "load hints" to supply the default projection and resolution used by dc.load for this product:

https://datacube-core.readthedocs.io/en/latest/about-core-concepts/products.html#product-definition-api
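
If the hints are picked up, they should be visible on the product object; something like this (a sketch from memory, untested, the exact method name may differ between versions):

import datacube

dc = datacube.Datacube()
product = dc.index.products.get_by_name('SAMPLE_DATACUBE')
print(product.load_hints())  # should echo the crs/resolution/align defaults from the 'load' section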

Errors in docs

@SpacemanPaul @pindge the docs for the product storage section are missing the essential tile_size parameter, and also have indentation issues in the YAML:

https://github.com/opendatacube/datacube-core/blob/develop/docs/installation/product-definitions.rst

@clausmichele
Author

clausmichele commented Jan 18, 2023

@Kirill888 I did some tests using load instead of storage, but I can't understand how it is supposed to work, since the values do not seem to follow any logic.

So, as a starting point, using like + GeoBox gives the right coordinates:

import datacube
from datacube.utils.geometry import GeoBox
import affine

ref_geobox = GeoBox(4332, 2000, affine.Affine(0.0030000000000001137, 0.0, 4.003, 0.0, -0.0030000000000001137, 49.0), 'EPSG:4326')
dc = datacube.Datacube()
query = {}
query['product'] = 'SAMPLE_DATACUBE'
datasets = dc.find_datasets(**query)
data = dc.load(datasets=datasets, like=ref_geobox, **query)
print(data.latitude.values)
print(data.longitude.values)

[48.9985 48.9955 48.9925 ... 43.0075 43.0045 43.0015]
[ 4.0045  4.0075  4.0105 ... 16.9915 16.9945 16.9975]

Then, here are the results using different align parameters:

import datacube

# load:
#     crs: EPSG:4326
#     resolution:
#         longitude: 0.003
#         latitude: -0.003
#     align:
#         longitude: 0.0
#         latitude: 0.0

dc = datacube.Datacube()
query = {}
query['product'] = 'SAMPLE_DATACUBE'
datasets = dc.find_datasets(**query)
data = dc.load(datasets=datasets,**query)
print(data.latitude.values)
print(data.longitude.values)

[49.0005 48.9975 48.9945 ... 43.0065 43.0035 43.0005]
[ 4.0035  4.0065  4.0095 ... 16.9935 16.9965 16.9995]

-> wrong

import datacube

# load:
#     crs: EPSG:4326
#     resolution:
#         longitude: 0.003
#         latitude: -0.003
#     align:
#         longitude: 0.0015
#         latitude: 0.0015

dc = datacube.Datacube()
query = {}
query['product'] = 'SAMPLE_DATACUBE'
datasets = dc.find_datasets(**query)
data = dc.load(datasets=datasets,**query)
print(data.latitude.values)
print(data.longitude.values)

[48.999 48.996 48.993 ... 43.005 43.002 42.999]
[ 4.002  4.005  4.008 ... 16.992 16.995 16.998]

-> wrong

import datacube

# load:
#     crs: EPSG:4326
#     resolution:
#         longitude: 0.003
#         latitude: -0.003
#     align:
#         longitude: 0.003
#         latitude: 0.003

dc = datacube.Datacube()
query = {}
query['product'] = 'SAMPLE_DATACUBE'
datasets = dc.find_datasets(**query)
data = dc.load(datasets=datasets,**query)
print(data.latitude.values)
print(data.longitude.values)

[49.0005 48.9975 48.9945 ... 43.0065 43.0035 43.0005]
[ 4.0035  4.0065  4.0095 ... 16.9935 16.9965 16.9995]

-> wrong

import datacube

# load:
#     crs: EPSG:4326
#     resolution:
#         longitude: 0.003
#         latitude: -0.003
#     align:
#         longitude: 0.001
#         latitude: 0.001

dc = datacube.Datacube()
query = {}
query['product'] = 'SAMPLE_DATACUBE'
datasets = dc.find_datasets(**query)
data = dc.load(datasets=datasets,**query)
print(data.latitude.values)
print(data.longitude.values)

[48.9985 48.9955 48.9925 ... 43.0075 43.0045 43.0015]
[ 4.0045  4.0075  4.0105 ... 16.9915 16.9945 16.9975]

-> correct!

But before finding this value, I also tried 0.002, 0.0005 and many more. So what's the logic behind it? In my opinion, looking at the docs, the right align value should be 0.0015 (half the resolution, which here is 0.003):

align.{x,y} (optional)

By default the pixel grid is aligned such that pixel boundaries fall on x,y axis. This option allows us to translate the pixel grid. For example, to ensure that the pixel center of a 30m pixel grid is coincident with 0,0 use align:{x:15,y:15}.

@clausmichele
Author

This issue shouldn't be closed yet. It is not yet clear why the load + align parameters in the product definition give the outputs shown above.

@SpacemanPaul SpacemanPaul reopened this Jan 19, 2023
@SpacemanPaul
Contributor

SpacemanPaul commented Jan 19, 2023

A value in the coordinate list = align + (n + 0.5) * resolution (where n is an integer).

The +0.5 comes from converting from grid-edge coordinates to pixel-centre coordinates.

E.g. for:

# load:
#     crs: EPSG:4326
#     resolution:
#         longitude: 0.003
#         latitude: -0.003
#     align:
#         longitude: 0.0
#         latitude: 0.0

[49.0005 48.9975 48.9945 ... 43.0065 43.0035 43.0005]
[ 4.0035  4.0065  4.0095 ... 16.9935 16.9965 16.9995]

49.0005 = 0.0 + (n+0.5) * 0.003 (n=16333, an integer)

I think all the numbers you see as "wrong" work out with that.
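
For example, a quick arithmetic check in Python, using the first latitude value from the align: 0.0 output above:

align, resolution, n = 0.0, 0.003, 16333
print(align + (n + 0.5) * resolution)  # ~49.0005, matching the first latitude value above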

@Kirill888
Member

@clausmichele for a coordinate of a pixel EDGE x, resolution r and alignment argument a, we have the following:

  • 0 <= a < |r|, where |..| is absolute value operator
  • x == (N*r + a), where N is an integer

So every pixel spans from (N*r + a) to ((N+1)*r + a), with the centre being N*r + a + r/2. So if x' = ds.x[0] (the left-most coordinate recorded in xarray), you should expect (x' - r/2 - a)/r to be pretty close to some integer value. Obviously, floating-point issues are likely when working with a resolution like 0.003.
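
Plugging in the numbers from the correct (align: 0.001) case above as a quick sanity check:

x_first = 48.9985  # ds.latitude[0] from the align: 0.001 load
r = -0.003
a = 0.001
print((x_first - r / 2 - a) / r)  # ~ -16333.0, i.e. close to an integer, as expected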

@clausmichele
Author

clausmichele commented Feb 2, 2023

x == (N*r + a), where N is an integer

So, if a = 0 and I know x, which is the pixel edge, I can find N.
In my case, x = 49.000, r = -0.003 -> N = x/r = 49/-0.003 = -16333.(3); rounding the result to an integer gives N = -16333.

Now that I know N, if I want my pixel edge to be at 49.00:

(N*r + a) = 49.00 -> a = 49.00 - N*r -> a = 49.00 - (-16333 * -0.003) -> a = 0.001
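
Given the 0 <= a < |r| constraint from the previous comment, this is just x modulo |r|; sketching the same computation with exact arithmetic to avoid floating-point issues:

from fractions import Fraction

x = Fraction('49.0')   # desired pixel edge
r = Fraction('0.003')  # absolute value of the resolution
a = x % r              # alignment = x modulo |r|
print(a)               # 1/1000, i.e. align = 0.001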

Maybe all of this should be documented better in the ODC docs? This rounding to an integer (when computing N) is what creates the unexpected pixel centre. A good example would probably help other people avoid the same doubts that I have now resolved thanks to you!

Edit: is the rounding to get N always down to the closest integer?

@SpacemanPaul
Contributor

SpacemanPaul commented Feb 6, 2023

There's a lot that could be better documented in the ODC docs. Documentation PRs are most welcome!

@SpacemanPaul SpacemanPaul changed the title from "ODC data not aligned with source after loading" to "alignment load hint documentation is incorrect" Jun 2, 2023
@SpacemanPaul
Contributor

Re: load vs storage in product metadata.

storage should be deprecated; use load instead. The exact behaviour of storage at the moment is complicated, but it almost certainly doesn't do what you want. Yes, the documentation is wildly incorrect.
