Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cube list concatenation oddity #3696

Closed
valeriupredoi opened this issue Apr 6, 2020 · 13 comments
Closed

Cube list concatenation oddity #3696

valeriupredoi opened this issue Apr 6, 2020 · 13 comments

Comments

@valeriupredoi
Copy link

valeriupredoi commented Apr 6, 2020

Hey guys, hope yous doing well and healthy and working from home 😁
I got a question described in this comment and it seems to me there has to be some sort of check in iris for such cases. here goes:

  • function used: iris.cube.CubeList(cubes).concatenate()
  • used case: a cube list like [A, B, C, D] where A spans over a time t, B continues from where A stopped and spans for another time t, C starts a bit before B finishes and spans for another time t, D starts where C stopped and spans for another time t (t is arbitrary, what matters is where cubes start and end on the time axis)
  • result of concatenation: A, B and D will be concatenated into a composite M, C will not be concatenated so the result of iris.cube.CubeList([A, B, C, D]).concatenate() is [M=(A+B+D), C]

This is pretty bad since there is a serious time gap between end of B and start of D (where C should be, partially) -> is this something that you would consider writing a warning or a fix for? It's particularly difficult to notice such gapey concatenations when dealing with, say, multiple CMIP experiments to be concatenated into one.

If you guys say meh, then we'll plug a fix in our concatenation wrapper, but I just wanted to alert you on this - 🍺

@valeriupredoi
Copy link
Author

btw I just gave you a star so now you have to answer 😀 🍺

@pp-mo pp-mo added this to Backlog in Iris v3.0.0 via automation May 14, 2020
@pp-mo pp-mo added this to the v3.0.0 milestone May 14, 2020
@bjlittle bjlittle added this to Backlog in Iris v3.1.0 via automation Sep 8, 2020
@bjlittle bjlittle removed this from Backlog in Iris v3.0.0 Sep 8, 2020
@bjlittle bjlittle modified the milestones: v3.0.0, v3.1.0 Sep 8, 2020
@bjlittle bjlittle modified the milestones: v3.1.0, v3.2.0 Aug 4, 2021
@bjlittle bjlittle removed this from Backlog in Iris v3.1.0 Aug 4, 2021
@bjlittle bjlittle added this to Backlog in Iris v3.2.0 via automation Aug 4, 2021
@jamesp jamesp modified the milestones: v3.2.0, v3.3.0 Oct 28, 2021
@wjbenfold wjbenfold self-assigned this Jan 12, 2022
@wjbenfold
Copy link
Contributor

wjbenfold commented Jan 12, 2022

Hi @valeriupredoi, sorry for the delay getting back to you on this.

In terms of what Iris should do here, what's the ideal behaviour for you? My go-to list of options (below) mostly have big enough downsides that I don't think we'd want to implement them. The behaviour is currently expected to "warn" (I realise it's a cryptic warning) by returning you (from your example) [M, C] which will be a longer list than your intended result ([M]) and thus indicate that something went wrong/didn't "complete".

My options and their downsides:

  • Drop spare data (silently dropping data seems bad, we'd have to prioritise some data over other data)
  • Track some form of "maximum gappiness" in a concatenated cube and warn if it's too high (this seems a bit too close to attempting magic IMO as reading user's minds is hard)
  • Raise an exception (breaks existing code of users who are using the "get multiple cubes out" functionality)
  • Warn that concatenation wasn't "completed" (not got so much against this actually, though introducing warnings to code people think is working fine might worry them a bit)
  • Make a new concatenate_to_one (name TBC) method for CubeLists (I actually quite like this - it would do for concatenate what load_cube does for load or extract_cube does for extract...)

Let me know what you think!

@rcomer
Copy link
Member

rcomer commented Jan 12, 2022

We already have concatenate_cube.

@valeriupredoi
Copy link
Author

cheers muchly @wjbenfold for your reply and being willing to improve the functionality - I have just opened an issue where we may discuss this from our end's point, it may as well be that people are perfectly fine with the current functionality so we'll close this, but let's give my fellow devs a couple days see if there are any suggestions/complaints 😁 Many thanks to you guys for being well responsive to our input BTW 🍺

@rcomer
Copy link
Member

rcomer commented Mar 11, 2022

An instance of this just came up on MO Yammer. In that particular case, the overlapping period had equal data, metadata, etc. So in that case Iris might reasonably be expected to throw away the duplicate part and concatenate anyway. How easy that would be to implement I've no idea!

@wjbenfold
Copy link
Contributor

How easy that would be to implement I've no idea!

My immediate thought is that we'd at least have to realise the data to do that check, which might be enough to rule it out without at least a special mode that users have actively chosen?

@rcomer
Copy link
Member

rcomer commented Mar 14, 2022

I notice that over at ESMValGroup/ESMValCore#1423, @zklaus has argued that such functionality should be added in downstream packages to address specific use-cases.

We could maybe improve the error message from concatenate_cube though.

import iris

fname = iris.sample_data_path("SOI_Darwin.nc")
complete_cube = iris.load_cube(fname)

overlapping_parts = iris.cube.CubeList([complete_cube[:100], complete_cube[50:]])
overlapping_parts.concatenate_cube()
ConcatenateError: failed to concatenate into a single cube.
  An unexpected problem prevented concatenation.
  Expected only a single cube, found 2.

@wjbenfold
Copy link
Contributor

What do you reckon would be more helpful @rcomer? Given that a failure of the cubes to all look the same will, from my understanding, fail inside concatenate because of error_on_mismatch=True, maybe we can go for

ConcatenateError: failed to concatenate into a single cube.
  The cubes provided may overlap.
  Expected only a single cube, found 2.

If that seems sensible then I can put up the PR to make it happen (or you can and I'll review it)

@wjbenfold
Copy link
Contributor

Looking back at this issue, if there's future appetite for functionality that handles overlap, I can imagine an argument to concatenate_cube (or an additional function) that prioritises cubes that come earlier in the cubeList and drops data from later cubes in the list if it was already provided earlier in the list.

@rcomer
Copy link
Member

rcomer commented Mar 14, 2022

I think match becomes False here if the coordinates are overlapping. If I’ve got that right, could we raise an exception there to definitively say so? (edit: only if error_on_mismatch is True of course.)

@wjbenfold
Copy link
Contributor

I think match becomes False here if the coordinates are overlapping. If I’ve got that right, could we raise an exception there to definitively say so? (edit: only if error_on_mismatch is True of course.)

We could add descriptive errors all the way through that function (_ProtoCube.register) I think?

It looks like _sequence will also have a problem with overlapping bounds so we'd probably want to mention that too

@wjbenfold wjbenfold removed their assignment Jun 21, 2022
@stephenworsley stephenworsley removed this from the v3.3.0 milestone Jun 29, 2022
@pp-mo
Copy link
Member

pp-mo commented Jun 27, 2023

Put this in Iris 3.7 , but can't commit to resolving it : we will look into it now

@pp-mo pp-mo assigned stephenworsley and acchamber and unassigned bjlittle Jul 17, 2023
stephenworsley pushed a commit that referenced this issue Jul 26, 2023
* first draft of fix with comments to be removed

* Removed comments, added warning

* responded to pull request

* Updated tests and test warnings

* fixed dtypes on bounds tests

* Updated "what's new" docs section

* added link to github page

---------

Co-authored-by: alex.chamberlain-clay <achamber@vld760.cmpd1.metoffice.gov.uk>
@trexfeathers
Copy link
Contributor

Closed by #5382

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Status: 🏁 Done
Status: Done
Status: Done
Development

No branches or pull requests

9 participants