Ensure cubelists only contain cubes #3238

rcomer · 2018-12-07T11:28:55Z

For completeness, I have addressed insert and extend as well as append (have I missed any?), although I can't think of a reason to ever use insert.
I'm not sure I've quite got the behaviour right for extend, particularly in the case where a cube is passed. Would the distinction between append and extend seem a bit arbitrary to a new Python user?

pp-mo · 2018-12-07T18:17:44Z

I scanned the methods, and if we want to be strict about this I think we also have to consider the special methods __setitem__, __add__, __iadd__ : That is, effectively, the operators for cubelist[i] = x, x = cubelist + y and cubelist += x.

Read on ...

pp-mo · 2018-12-07T18:17:49Z

... ASIDE :

In the process, I have found that the existing code in the __new__ method is wrong !!

The reason you can say, e.g. cl = CubeList([None, 1, 'anything']) without error is because it does ...

    cube_list = list.__new__(cls, list_of_cubes)
    if not all([isinstance(cube, Cube) for cube in cube_list]):
        ...

That is wrong, because the list.__new__() call does not initialise the new 'cube_list' object
(instead, the __init__ call is supposed to do that) : it just ignores the passed 'list_of_cubes'.
So cube_list is in fact always empty, and this 'check' on it is doing nothing.
( It took me ages to work this out !! )
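
A minimal sketch of the behaviour described above, using a hypothetical `Demo` subclass rather than the real CubeList code: `list.__new__` hands back an empty list whatever it is passed, and the contents only arrive when `__init__` runs afterwards.

```python
class Demo(list):
    """Hypothetical stand-in for CubeList; not the Iris code."""

    seen_in_new = None

    def __new__(cls, items=None):
        new_list = list.__new__(cls, items)
        # Record what the new object holds at this point. It is always
        # empty, because list.__new__ ignores 'items'.
        cls.seen_in_new = list(new_list)
        return new_list


demo = Demo([None, 1, 'anything'])
print(Demo.seen_in_new)  # what a check inside __new__ would have looked at
print(demo)              # the contents after list.__init__ has run
```

So any `all(...)` check written inside `__new__` against the freshly created object is vacuously true.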

So, the __new__ code seems to me to be "both ancient + wrong" (!)

In fact, I don't think we need a 'new' at all -- an 'init' will work just fine for this.
Here's what I have tried + seems to be working for me ...

    def __init__(self, list_of_cubes=None):
        # Check that all items in the incoming list (if any) are cubes.
        if list_of_cubes is not None:
            if not all(isinstance(cube, Cube) for cube in list_of_cubes):
                raise ValueError('CubeList create arguments are not all Cube '
                                 'instances : {}'.format(list_of_cubes))
        else:
            # Start empty, avoiding 'list(None)' which is an error.
            list_of_cubes = []

        # Initialise as a list.
        super(CubeList, self).__init__(list_of_cubes)

But I also found that making this work as (apparently) intended does break some other tests.
So, I will submit some such proposal shortly ...

pp-mo · 2018-12-07T18:27:16Z

... MEANWHILE ...
Following up on __setitem__, __add__, __iadd__ .
I found that the __add__ call is already overridden, in such a way that it forces a new creation -- so that does the new check, as it should (once the create check works properly).
But the __setitem__ and __iadd__ do seem to need fixing (I can still pass them non-cubes).

rcomer · 2018-12-08T15:35:25Z

😮

It’s never as simple as it first appears is it? 😆

Looks like __setitem__ and __iadd__ should be straightforward to sort out.

Apart from the fact that the check in __new__ doesn’t check anything, is there a good reason to use a ValueError rather than a TypeError?

pp-mo · 2018-12-10T11:40:09Z

> It’s never as simple as it first appears is it? 😆

!never! @rcomer 😭

> Apart from the fact that the check in __new__ doesn’t check anything, is there a good reason to use a ValueError rather than a TypeError?

I guess it's a question of intent + interpretation.
A snippet from https://docs.python.org/3/library/exceptions.html :

TypeError : "Passing arguments of the wrong type (e.g. passing a list when an int is expected) should result in a TypeError, but passing arguments with the wrong value (e.g. a number outside expected boundaries) should result in a ValueError."

So I think in this case the strict interpretation would be that TypeError is only appropriate if the arg is not iterable, whereas bad contents should cause a ValueError ?
But the other view might have its points.
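
To make the "strict interpretation" concrete, here is a small sketch. The helper name `check_all_of_type` and its messages are invented for illustration and are not part of Iris: a non-iterable argument gives a TypeError, while an iterable with bad contents gives a ValueError.

```python
def check_all_of_type(items, expected_type):
    """Hypothetical helper: return 'items' as a list, or raise."""
    try:
        as_list = list(items)
    except TypeError:
        # The argument itself is the wrong kind of object.
        raise TypeError('expected an iterable, got {}'.format(
            type(items).__name__))

    bad = [item for item in as_list if not isinstance(item, expected_type)]
    if bad:
        # Right kind of argument, but with unacceptable contents.
        raise ValueError('items are not all {} instances: {!r}'.format(
            expected_type.__name__, bad))
    return as_list


print(check_all_of_type([1, 2, 3], int))
```

Under the other view, both cases would simply raise TypeError.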

rcomer

> So I think in this case the strict interpretation would be that TypeError is only appropriate if the arg is not iterable, whereas bad contents should cause a ValueError

OK thanks. So in that case I think my last TypeError in extend should be a ValueError, but all the others are fine.

lib/iris/cube.py

rcomer · 2018-12-13T10:05:25Z

I now have a bunch of methods that follow very similar patterns, and repeated code makes me feel slightly queasy. Is there a more elegant way to do this?

rcomer · 2018-12-13T10:11:11Z

This TODO has been there since the initial commit to GitHub. I'm not clear what "overload" means in this context. Does it mean what we are currently doing? If so, I can make a note to take it out.

rcomer · 2018-12-13T11:51:42Z

I've added tests for setting more than one item in a slice, e.g. cubelist[:2] = [a, b]. These new tests prove that I don't know how to implement __setitem__.
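
For reference, one way a checking `__setitem__` could handle both a single index and a slice on Python 3. This is only a sketch: `Item` and `ItemList` are stand-ins so that it runs without Iris, the constructor is deliberately left unchecked, and on Python 2.7 simple slices go through `__setslice__` instead.

```python
class Item(object):
    """Stand-in for iris.cube.Cube, so the sketch runs without Iris."""


class ItemList(list):
    """Hypothetical list that checks objects on item assignment only."""

    @staticmethod
    def _check(obj):
        if not isinstance(obj, Item):
            raise ValueError('{!r} is not an Item'.format(obj))

    def __setitem__(self, key, value):
        if isinstance(key, slice):
            # Slice assignment accepts any iterable. Materialise it once so
            # that a generator is not exhausted by the check.
            value = list(value)
            for obj in value:
                self._check(obj)
        else:
            self._check(value)
        super(ItemList, self).__setitem__(key, value)


a, b = Item(), Item()
items = ItemList([Item(), Item(), Item()])
items[:2] = [a, b]            # slice assignment: every new object is checked
items[0] = b                  # single index: the one object is checked
items[1:] = (x for x in [a])  # a generator works too
```

Because the check runs before the call to `list.__setitem__`, a failed assignment leaves the list unchanged.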

rcomer · 2018-12-14T18:48:58Z

lib/iris/tests/unit/cube/test_CubeList.py

@@ -171,7 +172,7 @@ def setUp(self):
        self.cubelist2 = iris.cube.CubeList([self.cube2])

    def test_pass(self):
-        cubelist = self.cubelist1.copy()
+        cubelist = copy.copy(self.cubelist1)
        cubelist += self.cubelist2


Changed copy method to copy.copy function for Python2 compatibility, but now cubelist becomes None after the add. 🤷‍♀️

Travis output showing the above:
Python2.7

    FAIL: test_pass (iris.tests.unit.cube.test_CubeList.Test_iadd)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/home/travis/miniconda/envs/test-environment/lib/python2.7/site-packages/scitools_iris-2.3.0.dev0-py2.7.egg/iris/tests/unit/cube/test_CubeList.py", line 177, in test_pass
        self.assertEqual(cubelist, self.cubelist1 + self.cubelist2)
    AssertionError: None != [<iris 'Cube' of foo / (unknown) (scalar cube)>, <iris 'Cube' of bar / (unknown) (scalar cube)>]

Python3.6

    FAIL: test_pass (iris.tests.unit.cube.test_CubeList.Test_iadd)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/home/travis/miniconda/envs/test-environment/lib/python3.6/site-packages/scitools_iris-2.3.0.dev0-py3.6.egg/iris/tests/unit/cube/test_CubeList.py", line 177, in test_pass
        self.assertEqual(cubelist, self.cubelist1 + self.cubelist2)
    AssertionError: None != [<iris 'Cube' of foo / (unknown) (scalar [50 chars]be)>]

This test passes if I point it at the master branch. The equivalent test for extend passes on this branch. So this appearance of None is specific to:

Use of copy.copy function (or copying via cubelist = self.cubelist1[:]) rather than copy method.

Using __iadd__ with the changes I've made.

I'm officially confused.
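
One possible explanation, offered as an assumption because the `__iadd__` from this branch is not shown here: `x += y` rebinds `x` to whatever `__iadd__` returns, so an override that forgets `return self` leaves the name bound to None. On Python 3, the `copy()` method of a list subclass returns a plain `list`, which would use the built-in `__iadd__` and hide the problem, whereas `copy.copy` keeps the subclass. A sketch with hypothetical `Broken` and `Fixed` classes:

```python
import copy


class Broken(list):
    """Hypothetical list subclass whose __iadd__ forgets to return self."""

    def __iadd__(self, other):
        super(Broken, self).__iadd__(other)
        # No 'return self' here, so 'x += y' rebinds x to None.


class Fixed(list):
    """The same override with the return statement in place."""

    def __iadd__(self, other):
        super(Fixed, self).__iadd__(other)
        return self


original = Broken([1])

via_method = original.copy()        # a plain list on Python 3
via_method += [2]                   # uses list.__iadd__, so it works

via_function = copy.copy(original)  # still a Broken instance
via_function += [2]                 # uses Broken.__iadd__, result is None

fixed = Fixed([1])
fixed += [2]

print(type(original.copy()).__name__, via_method, via_function, fixed)
```

If this is what happened, it would also be consistent with the equivalent `extend` test passing, since `extend` is a plain method call and its return value is not assigned back to the name.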

lib/iris/tests/unit/cube/test_CubeList.py

rcomer · 2018-12-14T18:52:12Z

I think I've fixed the __setitem__ for Python3 but I've also looked at Python2, which caused some problems (see inline comments).

rcomer · 2018-12-17T11:34:18Z

Addressed __setslice__ for Python2.7 compatibility. Do we have a list somewhere of "things we can take out when we stop supporting Python2"?

pelson · 2019-02-01T15:57:46Z

Similar question in general regarding ensuring types within a list: https://stackoverflow.com/questions/12201811/subclass-python-list-to-validate-new-items/12203829#12203829

Personally, I'd implement this as a warning rather than an error - we don't want to completely prevent duck typed Cubes going in (if it acts like a duck, and quacks like a duck, then treat it as a duck).

rcomer · 2019-02-01T16:26:02Z

Thanks @pelson, yes I'd seen that or similar advice to inherit from collections.MutableSequence instead of list. If the CubeList class was changed to inherit from MutableSequence, could this cause headaches elsewhere? E.g. if user code has something like if isinstance(cubelist, list):.

The warning seems sensible to me, as it would still provide some information to help debugging when you try to print/extract/save your cubelist.
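
A rough sketch of the MutableSequence approach, to show both its appeal and the `isinstance` concern. `CheckedSequence` and its `item_type` parameter are made up for illustration, and on current Python 3 the base class lives in `collections.abc`.

```python
from collections.abc import MutableSequence


class CheckedSequence(MutableSequence):
    """Hypothetical typed sequence that wraps a private list."""

    def __init__(self, item_type, items=()):
        self._item_type = item_type
        self._items = []
        self.extend(items)

    def _check(self, obj):
        if not isinstance(obj, self._item_type):
            raise ValueError('{!r} is not a {}'.format(
                obj, self._item_type.__name__))

    # The five methods that MutableSequence requires.
    def __getitem__(self, index):
        return self._items[index]

    def __setitem__(self, index, value):
        if isinstance(index, slice):
            value = list(value)
            for obj in value:
                self._check(obj)
        else:
            self._check(value)
        self._items[index] = value

    def __delitem__(self, index):
        del self._items[index]

    def __len__(self):
        return len(self._items)

    def insert(self, index, value):
        self._check(value)
        self._items.insert(index, value)


numbers = CheckedSequence(int, [1, 2])
numbers.append(3)   # mixin method, routed through insert()
numbers += [4]      # mixin __iadd__, routed through extend() and insert()
print(list(numbers), isinstance(numbers, list))
```

The mixin methods (`append`, `extend`, `__iadd__`) all end up in `insert`, so the check only has to be written for `insert` and `__setitem__`. The cost is the one raised above: the object is no longer an instance of `list`.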

lib/iris/cube.py

rcomer · 2019-02-06T12:03:29Z

I've changed the exceptions to warnings following @pelson's suggestion. I also rationalised the message, which allowed me to make everything a lot cleaner!

I have modified some of the test_fail tests to reflect the fact that the exceptions are now coming from somewhere else (e.g. an attempt to iterate over a non-iterable). Is this the right thing to do, or should I simply remove these tests as redundant?

Still having problems with a copied cubelist becoming None when I try to += to it (see inline comments from December).

rcomer · 2019-02-06T11:57:27Z

lib/iris/cube.py

@@ -216,7 +216,30 @@ def __repr__(self):
        """Runs repr on every cube."""
        return '[%s]' % ',\n'.join([repr(cube) for cube in self])

+    def _check_iscube(self, obj):


Style question: Is this the right place for these checking functions, or should they be defined outside the class?

My take : these don't use instance properties, so they could be static methods.
... They don't use any class properties either, so they don't really need to be in the class at all.

In fact, they don't use private properties of Cube, so they don't really need to be in the module.
At that point (they are just functions), they could go somewhere else.

However, for personal preference, I'd remove them from the class but keep them as private methods in cube.py, just in case they might need to use 'private' cube concepts in future.

Also ... the use of isinstance prevents any duck typing (lookalike objects can't masquerade as Cubes), which is arguably un-Pythonic.
We have previously used hasattr('add_aux_coord') for this elsewhere
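
To illustrate the difference between the two checks, here is a sketch with stand-in classes. Nothing in it is Iris code, and `add_aux_coord` is used only because it is the attribute mentioned above.

```python
class Cube(object):
    """Stand-in for iris.cube.Cube."""

    def add_aux_coord(self, coord):
        pass


class LookalikeCube(object):
    """Unrelated class that happens to offer the same method."""

    def add_aux_coord(self, coord):
        pass


def is_cube_strict(obj):
    # Type check: only Cube and its subclasses pass.
    return isinstance(obj, Cube)


def is_cube_like(obj):
    # Duck-type check: anything with an 'add_aux_coord' attribute passes.
    return hasattr(obj, 'add_aux_coord')


for candidate in (Cube(), LookalikeCube(), None):
    print(type(candidate).__name__,
          is_cube_strict(candidate), is_cube_like(candidate))
```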

OK, so if we check whether the object quacks with an auxcoord instead of checking the type, is there any reason not to revert to raising an exception rather than a warning?

Oh dear, I had somehow skimmed over that latest discussion without taking it in.
Now I see that @pelson and I are just advocating different approaches, and I honestly don't know how to choose between them.

Personally though, I must say I do hate all the warnings in Iris. There are still far too many, most occurrences are a pointless nuisance, and on the rare occasions when they aren't no-one is listening any more.

No problem: I'm partly using this as a learning exercise, so exploring different solutions to the same issue is fine 👍

I think I prefer to have an exception on the grounds that, if my code is going to fail, it's better for it to fail sooner rather than later. Also, having the failure at the point that the object is included into the cubelist means that the traceback is going to point me a lot closer to where I made the mistake. Which so far has always been

cubelist.append(some_function_i_forgot_to_put_a_return_statement_in())

we should get our ducks in a row 🦆 🦆 🦆 before sending @rcomer on a wild goose chase

nicely done! 😆

As a user, I'd still rather have an exception if possible, for the reasons I gave above.

If something does go wrong with my cubelist, the first thing I'm going to do in an attempt to debug is print it. So if we're looking for a minimal set of cube-like attributes, summary ought to be up there.

Are we thinking about this the wrong way round? Rather than trying to define which types should be allowed in a cubelist, it might be easier to focus on which types should definitely be rejected.

This started because a rogue None in a cubelist caused problems, and I wanted a more informative error message. So far I’m not aware of any other types that have caused issues. So my case would be solved by simply throwing an exception if object is None. We could generalise that a bit if we decide that, at minimum, the cubelist should be printable, so reject any objects that don’t have a summary attribute.

The exception message needn’t say anything about how similar to a Cube the object is, but could just say ”object of type [whichever] does not belong in a cubelist”.
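
A sketch of that "reject what clearly does not belong" idea. The function name is invented, and ValueError is just one choice, since the exception type is not settled in this comment.

```python
def reject_if_not_printable(obj):
    """Hypothetical check: require a 'summary' attribute and nothing more."""
    if not hasattr(obj, 'summary'):
        raise ValueError(
            'object of type {} does not belong in a cubelist'.format(
                type(obj).__name__))


try:
    reject_if_not_printable(None)
except ValueError as err:
    print(err)
```

This catches the rogue None while saying nothing about how cube-like an accepted object has to be.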

They don't have to be cube-like in all respects.

> Are we thinking about this the wrong way round?

I don't think so in this case. Because users can create and use their own CubeList instances, the minimum set of behaviour required for an entry into a CubeList is precisely the Cube's behaviour (and no less).

in fact I was strongly opposed to creating another warning here

Useful to know, thank you. So my biggest concern is that we are essentially introducing a breaking change if we do this as an exception - if @rcomer has been adding None into a cube list by accident, just think of all the wild things that some of our less educated users have been doing! 😭
I guess there is a workaround though... if users really want to do this they can still do list.append(cube_list_instance, thing_that_isnt_a_cube) until they sort their 💩 out.
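
The workaround relies on calling the base-class method directly, which skips any override on the subclass. A small sketch with a hypothetical `StrictList`:

```python
class StrictList(list):
    """Hypothetical list that refuses None in append()."""

    def append(self, obj):
        if obj is None:
            raise ValueError('None does not belong in a StrictList')
        super(StrictList, self).append(obj)


strict = StrictList()
try:
    strict.append(None)      # goes through the checking override
except ValueError as err:
    print(err)

list.append(strict, None)    # bypasses the override entirely
print(strict)
```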

In an attempt to get consensus and prevent this conversation from being open-ended, my refined suggestion:

1. CubeList._assert_is_cube - raise a ValueError if not isinstance of cube.
2. ~~CubeList._assert_is_iterable_of_cubes~~ -> just construct a CubeList of the subset - that way you can honour iterators, and then add the constructed CubeList as necessary.
3. Update the existing call in CubeList.__new__ to use _assert_is_cube.
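
A sketch of how that suggestion might look, under several assumptions: `Cube` is a stand-in class, `CubeListSketch` is not the Iris implementation, and the construction check is placed in `__init__` (as proposed earlier in this thread) rather than `__new__`.

```python
class Cube(object):
    """Stand-in for iris.cube.Cube."""


class CubeListSketch(list):
    """Hypothetical cubelist with a single checking method."""

    def __init__(self, cubes=()):
        cubes = list(cubes)  # consume an iterator exactly once
        for cube in cubes:
            self._assert_is_cube(cube)
        super(CubeListSketch, self).__init__(cubes)

    @staticmethod
    def _assert_is_cube(obj):
        if not isinstance(obj, Cube):
            raise ValueError('{!r} is not a Cube'.format(obj))

    def append(self, cube):
        self._assert_is_cube(cube)
        super(CubeListSketch, self).append(cube)

    def extend(self, cubes):
        # Constructing a new list of the same type validates every item
        # before anything is added, and works for iterators too.
        super(CubeListSketch, self).extend(type(self)(cubes))

    def __iadd__(self, cubes):
        self.extend(cubes)
        return self


class DuckFriendlyList(CubeListSketch):
    """Replaces the one check to let any object in."""

    @staticmethod
    def _assert_is_cube(obj):
        pass


cubes = CubeListSketch([Cube()])
cubes.extend(cube for cube in [Cube()])
cubes += [Cube()]
ducks = DuckFriendlyList([None, 'quack'])
print(len(cubes), len(ducks))
```

Because every route goes through `_assert_is_cube`, a subclass that wants to admit duck types only has to replace that one method.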

> So my biggest concern is that we are essentially introducing a breaking change if we do this as an exception... just think of all the wild things that some of our less educated users have been doing!

I hadn't considered the possibility of cubelists being used to store random types 😮 . Semi-serious question: how far away is Iris 3?

> refined suggestion

Just to check I've understood: we make it strict so only Cube instances are allowed. Because the check is restricted to one method, someone who wants to include ducks in the cubelist just needs to replace that one method?

Points 1 and 2 sound good to me.

Point 3: I think I need to wait for #3264 to be merged, and then update __init__!

> @pelson: my biggest concern is that we are essentially introducing a breaking change if we do this as an exception - if @rcomer has been adding None into a cube list by accident, just think of all the wild things that some of our less educated users have been doing! 😭

Not sure if it helps, but..
In my mind, even if it was previously possible to put non-cubes into a CubeList, that was never intended behaviour -- evidence the code covered by #3264.
So that is a bug, and fixing a bug is not a "breaking" change.
Weaselly, but we've accepted that principle before.

rcomer · 2019-02-06T12:08:25Z

Still to do:

address __new__/__init__ issue
update whatsnew


rcomer · 2019-06-18T10:11:36Z

Hi @pp-mo, do you think it's worth persevering with this? If yes, given the discussion above about it possibly being a breaking change, should we target it for Iris3.0?

I'm happy to carry on if you think it's worth it, but the issue has only come up for me twice (3 years apart) so I wouldn't say it's top priority.

rcomer · 2019-10-03T21:14:06Z

I’m now leaning towards the conclusion that this isn’t really worth it: the amount of new code here seems disproportionate to the size of the problem I was originally trying to solve, particularly given the stated desire to have less in this module/class.

And then there’s the ducks 🦆 🦆🦆😳

Should this PR be put out of its misery?

pp-mo · 2019-10-08T13:50:59Z

> now leaning towards the conclusion that this isn’t really worth it

@rcomer I've been racking my brains for a single killer answer to this one !

It probably is the case that, without this, certain awkward + confusing bugs are easy to come by.
I also think it is clear that, according to the __new__ method, it was originally intended that a CubeList could be trusted to only contain cubes. So fixing that is a reasonable thing, and in fact a bugfix.

But as you say, it's quite a lot of fuss to completely resolve it.
A search on "python typed list" resulted in this gist.
I think that looks well constructed, and indicates that the code you have is about right.
It may also contain useful pointers to reducing the duplicated code.
Except ... I'd just like to repeat a personal preference for somehow allowing "cubelike objects", instead of type checking.
( Note: this code also suggests that you should add __radd__ to your list of methods to override. )

TBH I've always been a wee bit ambivalent about CubeList myself, as it doesn't seem to serve much purpose except as a home for merge+concatenate, and a specialist printing format -- and only the print format really needs a class object.
So, another possible approach is simply to improve those operations that fail when you give them a list (ideally, an iterable??) of things that it "expects" to be cubes.
Your issue doesn't explain where your original confusing failure occurred : what was the failing operation ?
Frankly though, such a piecemeal approach sounds to me like more code to maintain than this approach.

rcomer · 2019-10-08T14:11:51Z

Thanks @pp-mo for giving this some thought.

Some operations that fail were listed further down the issue (comment and comment) . Specifically:

[cube] = cubelist.extract(name)

gives

AttributeError: 'NoneType' object has no attribute 'ndim'

If you try to print a cubelist with None in it, you get

AttributeError: 'NoneType' object has no attribute 'summary'

If you try to save a cubelist with a None to NetCDF you get

AttributeError: 'NoneType' object has no attribute 'attributes'

bjlittle · 2019-10-22T20:10:30Z

@rcomer and @pp-mo Where we at guys on the state of play for this PR?

rcomer · 2019-10-23T16:37:56Z

@bjlittle basically we need a decision whether it’s actually worth all the new code (see our last 3 comments). I did not realise how complicated it would get when I started! 😬

If it is worth it, there is also a question of how you could allow cube-like objects (duck types) to exist in the CubeList. The current implementation uses isinstance(thing, Cube). Both @pp-mo and @pelson have expressed a desire to allow the ducks in.

pp-mo · 2019-11-05T13:56:31Z

Hi @rcomer @bjlittle .
Just spotted a user problem that looks like a case for this. The reported problem is

"I am trying to concatenate/merge them into one cube but I get "AttributeError: 'CubeList' object has no attribute 'standard_name'".

Sounds to me like this person has a CubeList with another CubeList inside it !
Easily done if you are a bit vague about what the load functions return (as, I observe, many naive users are).
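
A guess at how that can happen, sketched with plain lists and stand-ins (`Cube` and `load` here are hypothetical, not the Iris API): appending the result of a loader that returns a list nests it, whereas extending flattens it.

```python
class Cube(object):
    """Stand-in for iris.cube.Cube with just one attribute."""

    def __init__(self, standard_name):
        self.standard_name = standard_name


def load(name):
    """Hypothetical loader that returns a list of cubes, not a single cube."""
    return [Cube(name)]


nested = []
nested.append(load('air_temperature'))  # a list inside a list

flat = []
flat.extend(load('air_temperature'))    # the cubes themselves

print([cube.standard_name for cube in flat])
try:
    [item.standard_name for item in nested]
except AttributeError as err:
    message = str(err)
    print(message)
```

A check at the point of `append` would report the mistake where it is made, rather than later inside merge or concatenate.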

rcomer · 2020-08-02T10:04:08Z

Right. I think my enthusiasm for this one has waned somewhat.

Due to staff changes in my team I don’t currently have the spare capacity I did so, even if we knew where we wanted to go with it, I’d struggle to justify the time any time soon. So I think it’s time to call it a day.

Thanks @pp-mo for all your advice. I learned a lot, so you can still chalk it up against your teaching objectives 👍

rcomer mentioned this pull request Dec 8, 2018

cftime again #3239

Closed

rcomer commented Dec 10, 2018

rcomer force-pushed the cubelist-contain-only-cubes branch from 4e6b744 to 0ff3183 Compare December 12, 2018 17:53

rcomer commented Dec 14, 2018

rcomer mentioned this pull request Jan 11, 2019

Should cubelists contain objects that are not cubes? #1897

Closed

rcomer added 6 commits February 1, 2019 13:17

ensure cubelists only contain cubes

4ab7cfa

address __iadd__ and __setitem__

0b293ec

__setitem__ tests

8a74df9

test for setting more than 1 item

2622254

Fix __setitem__ and Py2 tweaks

39d5f4d

implement __setslice__ for Python2.7

4187162

rcomer force-pushed the cubelist-contain-only-cubes branch from cac1f49 to 4187162 Compare February 1, 2019 13:18

change exceptions to warnings

569f35e

stickler-ci reviewed Feb 6, 2019

stickler

5b13dcc

rcomer commented Feb 6, 2019

pp-mo mentioned this pull request Feb 7, 2019

Replace '__new__' with '__init__' in CubeList creation. #3264

Closed

duck type check; move helpers outside class

8f272fb

stickler-ci reviewed Feb 8, 2019

rcomer added 2 commits February 8, 2019 09:30

blank lines

f7134c1

proposed: revert warnings to exceptions

bc7eb7a

stickler-ci reviewed Feb 8, 2019

rcomer added 3 commits February 8, 2019 10:23

remove stray extra 'test_fail'

d961a68

Merge branch 'master' of github.com:SciTools/iris into cubelist-conta…

a84823e

…in-only-cubes * 'master' of github.com:SciTools/iris: Test with newest cf-units: no longer called cf_units. (SciTools#3265)

pass sequences through __init__; _assert_is_cube

7fe5d0a

pelson assigned pp-mo Apr 16, 2019

rcomer added the Status: Decision Required label Oct 3, 2019

rcomer mentioned this pull request Nov 5, 2019

Straw man: alternative solution for non-cubes in cubelists #3510

Closed

rcomer closed this Aug 2, 2020

rcomer added the Status: Stalled label Aug 2, 2020

pp-mo mentioned this pull request May 25, 2022

Cubelist contain only cubes -- resurrected #4767

Merged

rcomer deleted the cubelist-contain-only-cubes branch June 28, 2022 12:56
