first/last: getting first/last element of empty array should error #408
Comments
In "dimension-based" reducers such as reduce_dimension this case doesn't actually happen, as empty dimensions are not a thing in openEO: they would be dropped if empty. But this case can happen in e.g. aggregate_spatial, where you get an empty array when you have no data for a given polygon. In this case you need to return something, as "void" is not defined in openEO; every process needs to return something. So I'm not sure how I can fix this. Also, this behavior is actually defined in all reducers, e.g. mean, median, max, min, sum etc. Can't you just check if the array is empty and return null in this special case (or catch the error)? |
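For illustration, the "check if the array is empty and return null" suggestion could look roughly like the sketch below. The function name first_or_null is hypothetical and not part of openeo-processes-dask; Python's None stands in for the openEO null value.

```python
import numpy as np


def first_or_null(data):
    """Return the first element of `data`, or None if `data` is empty.

    Hypothetical helper: None is used here as a stand-in for openEO's null.
    """
    arr = np.asarray(data)
    if arr.size == 0:
        # Special case: nothing to pick from, so signal "no data".
        return None
    # `flat` gives element access regardless of the array's shape.
    return arr.flat[0]
```

This works for a single scalar result. It does not by itself solve the problem of storing that None inside a typed (e.g. integer) result array.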
Okay, but
True, but it's effectively only a problem for the not-only-numerical reducers (e.g.
|
ignore_nodata is available in most numerical reducers. If that's set to true, it indeed can lead to the same issue. That's why it's probably a good idea to return a value. Throwing an error doesn't make sense in the aggregate_spatial use case, I think. If there's no data it should indicate that there's no data instead of aborting the whole workflow. Again, can't you handle the special cases explicitly in code? |
I agree that for these reducers it's fine to return a null value, because we can use the float NaN for that. But that's not the case for these other reducers.
I think the error would be thrown by the reducer and caught within aggregate_spatial, so it doesn't mean that the whole workflow has to be aborted!
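A minimal sketch of that idea, under stated assumptions: EmptyArrayError, strict_first and aggregate_per_geometry are hypothetical names made up for this example, not openEO or openeo-processes-dask APIs. The reducer raises on empty input, and the parent decides what to store for that geometry.

```python
import numpy as np


class EmptyArrayError(Exception):
    """Hypothetical error for a reducer that receives an empty array."""


def strict_first(values):
    """Child process: return the first element, raise if there is none."""
    arr = np.asarray(values)
    if arr.size == 0:
        raise EmptyArrayError("first: cannot take an element of an empty array")
    return arr.flat[0]


def aggregate_per_geometry(values_per_geometry, reducer, nodata=None):
    """Hypothetical stand-in for the parent process.

    Applies `reducer` to the values of each geometry and substitutes
    `nodata` when the reducer reports an empty input.
    """
    results = []
    for values in values_per_geometry:
        try:
            results.append(reducer(values))
        except EmptyArrayError:
            # Only this geometry is affected; the other results are kept.
            results.append(nodata)
    return results


# The second geometry has no data; the workflow still completes.
result = aggregate_per_geometry([[4, 5], [], [7]], strict_first)
```

Here the parent, not the child, chooses the no-data representation, which is the point being argued in this comment.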
I can't think of a good way to do it. E.g. I don't want to go into |
I'm really not into this, but can't you set None/null for other data types? Alternatively use a masked array or so?
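As a rough sketch of the masked-array idea (an illustration only, not what openeo-processes-dask does): numpy.ma keeps the original integer dtype and tracks missing entries in a separate boolean mask.

```python
import numpy as np

# Integer results for three geometries; the second one had no data.
values = np.array([3, 0, 7], dtype=np.int16)
valid = np.array([True, False, True])

# The mask marks invalid entries, so it is the inverse of `valid`.
masked = np.ma.masked_array(values, mask=~valid)

# The dtype stays integer; the missing slot is flagged instead of overwritten.
has_missing_second = masked[1] is np.ma.masked

# When a concrete array is needed, pick an explicit fill value.
filled = masked.filled(-1)
```

The trade-off is that every downstream consumer has to understand masked arrays or agree on a fill value.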
Ah, if you are referring to throwing an error in the context of openEO processes I assume it means openEO error, which always aborts execution. If it's possible to recover, you can still catch it internally and make it a warning or so, sure. But if you catch, you'd eventually also just set some kind of no-data values for the geometry that failed, which is essentially what the process wants you to do, no?
Aren't we here just talking about empty arrays? I'm already seeing |
Yes, but I can only do that within the parent process, not from within the child process! From the perspective of the child process, this operation isn't doable, so it should error out. The first item of an empty array is not a null value!
Hmm, is this set in stone? I think it does make sense for a child process graph to throw a standardised openEO error and for its parent process to attempt to catch it.
Ah, the crucial piece is that the code as proposed in this PR leads to the same issue I raise in #410, In short, |
@m-mohr could we reopen this one please? Forgot about it yesterday because it's closed, but it does seem worth discussing in the next telco! |
Sure, although I'm not sure what exactly to discuss? Isn't this basically the same issue (regarding nodata) and "solution" as in #410? |
Update: |
But an empty array means there is no data, i.e. the process indicates that there is no data by returning null, which is the nodata value. Doesn't that make sense? This is how openEO tries to work meaningfully on raster data cubes: by making use of no-data values instead of always throwing errors. Other processes also return null on empty arrays, e.g. all, any, array_find, max, min, ... basically every reducer except for count. Deviating from this for just first and last doesn't seem very consistent. If you want to enforce an error on empty arrays, use array_element instead. |
Process ID: first/last
Describe the issue:
Currently, according to example 4 in the docs, calling first or last on an empty array should return null:
In our implementation this is causing problems when calling first or last as a reducer on an empty datacube, because it suggests that the process should return the value null and return that as the reduced array (i.e. reduced dimension ending up as [null]). In numpy this is only possible for float-type arrays, where I can return [np.nan], but for all other datatypes, such a missing data value does not exist.
Proposed solution:
IMHO first and last should just error if the array is empty, similar to how calling array[0] on an empty array will result in an index out of bounds exception.
Additional context:
This is also how numpy/xarray handles this case:
>>> np.take(np.array([]), indices=0)
IndexError: cannot do a non-empty take from an empty axes.
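The dtype limitation mentioned in the issue description can be reproduced with a short sketch. It only assumes plain numpy: a float array can hold NaN as a missing-value marker, while building an integer array from NaN fails.

```python
import numpy as np

# Float arrays have NaN available as a "missing" marker.
float_result = np.array([np.nan], dtype=np.float64)

# Integer dtypes have no such marker: constructing the array fails.
try:
    np.array([np.nan], dtype=np.int32)
    int_holds_nan = True
except ValueError:
    int_holds_nan = False
```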
PR in openeo-processes-dask
cc @ValentinaHutter