Ensure cubelists only contain cubes #3238

Closed
wants to merge 14 commits into from

Conversation

rcomer
Member

@rcomer rcomer commented Dec 7, 2018

See #1897.

For completeness, I have addressed insert and extend as well as append (have I missed any?), although

  • I can't think of a reason to ever use insert.
  • I'm not sure I've quite got the behaviour right for extend, particularly in the case where a cube is passed. Would the distinction between append and extend seem a bit arbitrary to a new Python user?

@pp-mo
Member

pp-mo commented Dec 7, 2018

I scanned the methods, and if we want to be strict about this I think we also have to consider the special methods __setitem__, __add__, __iadd__ : That is, effectively, the operators for cubelist[i] = x, x = cubelist + y and cubelist += x.

Read on ...
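For reference, a minimal sketch of how those three operators reach the special methods on a list subclass. `TracingList` is a hypothetical name used only for illustration (it is not part of Iris), and the `super()` calls assume Python 3.

```python
class TracingList(list):
    """Hypothetical list subclass that records which special method ran."""

    calls = []

    def __setitem__(self, key, value):
        self.calls.append('__setitem__')
        super().__setitem__(key, value)

    def __add__(self, other):
        self.calls.append('__add__')
        return TracingList(super().__add__(other))

    def __iadd__(self, other):
        self.calls.append('__iadd__')
        # An in-place operator must return the object the name is rebound to.
        return super().__iadd__(other)


items = TracingList([1, 2])
items[0] = 10            # cubelist[i] = x
combined = items + [3]   # x = cubelist + y
items += [4]             # cubelist += x
print(TracingList.calls)
```

Any of these three entry points can let a non-cube in unless it is overridden with a check.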

@pp-mo
Member

pp-mo commented Dec 7, 2018

... ASIDE :

In the process, I have found that the existing code in the __new__ method is wrong !!

The reason you can say, e.g. cl = CubeList([None, 1, 'anything']) without error is because it does ...

    cube_list = list.__new__(cls, list_of_cubes)
    if not all([isinstance(cube, Cube) for cube in cube_list]):
        ...

That is wrong, because the list.__new__() call does not initialise the new 'cube_list' object
(instead, the __init__ call is supposed to do that) : it just ignores the passed 'list_of_cubes'.
So cube_list is in fact always empty, and this 'check' on it is doing nothing.
( It took me ages to work this out !! )
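
The effect can be reproduced without Iris. This is a minimal sketch, assuming Python 3; `UncheckedList` and `seen_in_new` are hypothetical names, and `int` stands in for `Cube`.

```python
seen_in_new = []


class UncheckedList(list):
    """Hypothetical class mirroring the __new__ pattern described above."""

    def __new__(cls, items=None):
        new_list = list.__new__(cls, items)
        # list.__new__ only allocates the object; list.__init__ adds the
        # items later, so new_list is still empty at this point.
        seen_in_new.append(list(new_list))
        if not all(isinstance(item, int) for item in new_list):
            raise ValueError('not all items are ints')
        return new_list


result = UncheckedList([None, 1, 'anything'])
print(seen_in_new)
print(result)
```

The check runs over an empty list, so `all(...)` is trivially true and the bad items arrive afterwards through the inherited `__init__`.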

So, the __new__ code seems to me to be "both ancient + wrong" (!)

In fact, I don't think we need a 'new' at all -- an 'init' will work just fine for this.
Here's what I have tried + seems to be working for me ...

    def __init__(self, list_of_cubes=None):
        # Check that all items in the incoming list (if any) are cubes.
        if list_of_cubes is not None:
            if not all(isinstance(cube, Cube) for cube in list_of_cubes):
                raise ValueError('CubeList create arguments are not all Cube '
                                 'instances : {}'.format(list_of_cubes))
        else:
            # Start empty, avoiding 'list(None)' which is an error.
            list_of_cubes = []

        # Initialise as a list.
        super(CubeList, self).__init__(list_of_cubes)

But I also found that making this work as (apparently) intended does break some other tests.
So, I will submit some such proposal shortly ...

@pp-mo
Member

pp-mo commented Dec 7, 2018

... MEANWHILE ...
Following up on __setitem__, __add__, __iadd__ .
I found that the __add__ call is already overridden, in such a way that it forces a new creation -- so that does the new check, as it should (once the create check works properly).
But the __setitem__ and __iadd__ do seem to need fixing (I can still pass them non-cubes).

@rcomer
Member Author

rcomer commented Dec 8, 2018

😮

It’s never as simple as it first appears is it? 😆

Looks like __setitem__ and __iadd__ should be straightforward to sort out.

Apart from the fact that the check in __new__ doesn’t check anything, is there a good reason to use a ValueError rather than a TypeError?

@rcomer rcomer mentioned this pull request Dec 8, 2018
@pp-mo
Member

pp-mo commented Dec 10, 2018

It’s never as simple as it first appears is it? 😆

!never! @rcomer 😭

Apart from the fact that the check in __new__ doesn’t check anything, is there a good reason to use a ValueError rather than a TypeError?

I guess it's a question of intent + interpretation.
A snippet from https://docs.python.org/3/library/exceptions.html :

  • TypeError : "Passing arguments of the wrong type (e.g. passing a list when an int is expected) should result in a TypeError, but passing arguments with the wrong value (e.g. a number outside expected boundaries) should result in a ValueError."

So I think in this case the strict interpretation would be that TypeError is only appropriate if the arg is not iterable, whereas bad contents should cause a ValueError ?
But the other view might have its points.
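
A minimal sketch of that strict interpretation, assuming Python 3. `check_cubes` is a hypothetical helper and `Cube` here is a stand-in class, not the real `iris.cube.Cube`.

```python
class Cube:
    """Stand-in for iris.cube.Cube, so the sketch runs without Iris."""


def check_cubes(candidates):
    """Return the candidates as a list, or raise if they are unsuitable."""
    try:
        items = list(candidates)
    except TypeError:
        # Wrong kind of argument altogether: not iterable.
        raise TypeError('expected an iterable of cubes, got {}'.format(
            type(candidates).__name__))
    bad = [item for item in items if not isinstance(item, Cube)]
    if bad:
        # Right kind of argument, wrong contents.
        raise ValueError('not all items are Cube instances: {!r}'.format(bad))
    return items
```

Under the other view, the second branch would raise `TypeError` as well, on the grounds that each item has the wrong type.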

Member Author

@rcomer rcomer left a comment

So I think in this case the strict interpretation would be that TypeError is only appropriate if the arg is not iterable, whereas bad contents should cause a ValueError

OK thanks. So in that case I think my last TypeError in extend should be a ValueError, but all the others are fine.

lib/iris/cube.py Outdated
@rcomer
Member Author

rcomer commented Dec 13, 2018

I now have a bunch of methods that follow very similar patterns, and repeated code makes me feel slightly queasy. Is there a more elegant way to do this?

@rcomer
Member Author

rcomer commented Dec 13, 2018

This TODO has been there since the initial commit to GitHub. I'm not clear what "overload" means in this context. Does it mean what we are currently doing? If so, I can make a note to take it out.

@rcomer
Member Author

rcomer commented Dec 13, 2018

I've added tests for setting more than one item in a slice, e.g. cubelist[:2] = [a, b]. These new tests prove that I don't know how to implement __setitem__.
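
One way a checked `__setitem__` can cope with both a single index and a slice is sketched below. It assumes Python 3 (where slice assignment goes through `__setitem__`), and `Cube`, `_assert_is_cube` and `CheckedList` are hypothetical stand-ins rather than the code in this PR.

```python
class Cube:
    """Stand-in for iris.cube.Cube, so the sketch runs without Iris."""


def _assert_is_cube(obj):
    if not isinstance(obj, Cube):
        raise ValueError('{!r} is not a Cube'.format(obj))


class CheckedList(list):
    """Hypothetical list that validates assignments only."""

    def __setitem__(self, key, value):
        if isinstance(key, slice):
            # Materialise first, so a one-shot iterator is not used up
            # by the check before the assignment happens.
            value = list(value)
            for item in value:
                _assert_is_cube(item)
        else:
            _assert_is_cube(value)
        super().__setitem__(key, value)


a, b = Cube(), Cube()
cubelist = CheckedList([Cube(), Cube(), Cube()])
cubelist[:2] = iter([a, b])   # slice assignment from an iterator
cubelist[2] = b               # single-item assignment
```

Because the check runs before the `super()` call, a rejected assignment leaves the list unchanged.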

@@ -171,7 +172,7 @@ def setUp(self):
         self.cubelist2 = iris.cube.CubeList([self.cube2])

     def test_pass(self):
-        cubelist = self.cubelist1.copy()
+        cubelist = copy.copy(self.cubelist1)
         cubelist += self.cubelist2
Member Author

Changed copy method to copy.copy function for Python2 compatibility, but now cubelist becomes None after the add. 🤷‍♀️

Member Author

Travis output showing the above:
Python2.7

FAIL: test_pass (iris.tests.unit.cube.test_CubeList.Test_iadd)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/miniconda/envs/test-environment/lib/python2.7/site-packages/scitools_iris-2.3.0.dev0-py2.7.egg/iris/tests/unit/cube/test_CubeList.py", line 177, in test_pass
    self.assertEqual(cubelist, self.cubelist1 + self.cubelist2)
AssertionError: None != [<iris 'Cube' of foo / (unknown) (scalar cube)>,
<iris 'Cube' of bar / (unknown) (scalar cube)>]

Python3.6

FAIL: test_pass (iris.tests.unit.cube.test_CubeList.Test_iadd)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/miniconda/envs/test-environment/lib/python3.6/site-packages/scitools_iris-2.3.0.dev0-py3.6.egg/iris/tests/unit/cube/test_CubeList.py", line 177, in test_pass
    self.assertEqual(cubelist, self.cubelist1 + self.cubelist2)
AssertionError: None != [<iris 'Cube' of foo / (unknown) (scalar [50 chars]be)>]

Member Author

This test passes if I point it at the master branch. The equivalent test for extend passes on this branch. So this appearance of None is specific to:

  • Use of copy.copy function (or copying via cubelist = self.cubelist1[:]) rather than copy method.
  • Using __iadd__ with the changes I've made.

I'm officially confused.
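
One possible explanation, which nothing in this thread confirms, is an `__iadd__` override that does not return `self`. In Python 3, `list.copy()` returns a plain `list`, so `+=` on that copy never reaches the override, whereas `copy.copy()` keeps the subclass. The sketch below uses hypothetical classes, not the code from this branch.

```python
import copy


class ForgetfulList(list):
    """Hypothetical subclass whose __iadd__ forgets to return self."""

    def __iadd__(self, other):
        super().__iadd__(other)  # extends in place, but the result is dropped


class ReturningList(list):
    """Same idea, with the return statement in place."""

    def __iadd__(self, other):
        return super().__iadd__(other)


# list.copy() gives back a plain list, so the override is never called.
plain = ForgetfulList([1]).copy()
plain += [2]

# copy.copy() keeps the subclass; the name is rebound to __iadd__'s result.
broken = copy.copy(ForgetfulList([1]))
broken += [2]

fixed = copy.copy(ReturningList([1]))
fixed += [2]

print(type(plain).__name__, plain)
print(broken)
print(type(fixed).__name__, fixed)
```

If this is what is happening, the `extend` test would be unaffected because `extend` is called for its side effect and its return value is not assigned back to the name.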

@rcomer
Member Author

rcomer commented Dec 14, 2018

I think I've fixed the __setitem__ for Python3 but I've also looked at Python2, which caused some problems (see inline comments).

@rcomer
Member Author

rcomer commented Dec 17, 2018

Addressed __setslice__ for Python2.7 compatibility. Do we have a list somewhere of "things we can take out when we stop supporting Python2"?

@rcomer rcomer force-pushed the cubelist-contain-only-cubes branch from cac1f49 to 4187162 Compare February 1, 2019 13:18
@pelson
Member

pelson commented Feb 1, 2019

Similar question in general regarding ensuring types within a list: https://stackoverflow.com/questions/12201811/subclass-python-list-to-validate-new-items/12203829#12203829

Personally, I'd implement this as a warning rather than an error - we don't want to completely prevent duck typed Cubes going in (if it acts like a duck, and quacks like a duck, then treat it as a duck).

@rcomer
Member Author

rcomer commented Feb 1, 2019

Thanks @pelson, yes I'd seen that or similar advice to inherit from collections.MutableSequence instead of list. If the CubeList class was changed to inherit from MutableSequence, could this cause headaches elsewhere? E.g. if user code has something like if isinstance(cubelist, list):.

The warning seems sensible to me, as it would still provide some information to help debugging when you try to print/extract/save your cubelist.
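
A minimal sketch of the `MutableSequence` route, to make the `isinstance(cubelist, list)` concern concrete. It assumes Python 3, where the class lives in `collections.abc`; `CubeSequence` and the stand-in `Cube` are hypothetical and raise an error rather than a warning to keep the example short.

```python
from collections.abc import MutableSequence


class Cube:
    """Stand-in for iris.cube.Cube, so the sketch runs without Iris."""


class CubeSequence(MutableSequence):
    """Hypothetical typed container that wraps a list instead of subclassing it."""

    def __init__(self, cubes=()):
        self._cubes = []
        self.extend(cubes)

    @staticmethod
    def _check(obj):
        if not isinstance(obj, Cube):
            raise ValueError('{!r} is not a Cube'.format(obj))

    def __getitem__(self, index):
        return self._cubes[index]

    def __setitem__(self, index, value):
        if isinstance(index, slice):
            value = list(value)
            for item in value:
                self._check(item)
        else:
            self._check(value)
        self._cubes[index] = value

    def __delitem__(self, index):
        del self._cubes[index]

    def __len__(self):
        return len(self._cubes)

    def insert(self, index, value):
        # append, extend and += are mixin methods that all end up here.
        self._check(value)
        self._cubes.insert(index, value)


seq = CubeSequence([Cube()])
seq.append(Cube())
seq += [Cube()]
print(len(seq), isinstance(seq, list))
```

The checks live in two places only, but `isinstance(seq, list)` is false, so user code that tests for `list` would behave differently.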

lib/iris/cube.py Outdated
@rcomer
Member Author

rcomer commented Feb 6, 2019

I've changed the exceptions to warnings following @pelson's suggestion. I also rationalised the message, which allowed me to make everything a lot cleaner!

I have modified some of the test_fail tests to reflect the fact that the exceptions are now coming from somewhere else (e.g. an attempt to iterate over a non-iterable). Is this the right thing to do, or should I simply remove these tests as redundant?

Still having problems with a copied cubelist becoming None when I try to += to it (see inline comments from December).

lib/iris/cube.py Outdated
@@ -216,7 +216,30 @@ def __repr__(self):
         """Runs repr on every cube."""
         return '[%s]' % ',\n'.join([repr(cube) for cube in self])

+    def _check_iscube(self, obj):
Member Author

@rcomer rcomer Feb 6, 2019

Style question: Is this the right place for these checking functions, or should they be defined outside the class?

Member

@pp-mo pp-mo Feb 7, 2019

My take : these don't use instance properties, so they could be static methods.
... They don't use any class properties either, so they don't really need to be in the class at all.

In fact, they don't use private properties of Cube, so they don't really need to be in the module.
At that point (they are just functions), they could go somewhere else.

However, for personal preference, I'd remove them from the class but keep them as private methods in cube.py, just in case they might need to use 'private' cube concepts in future.

Also ... the use of isinstance prevents any duck typing (lookalike objects can't masquerade as Cubes), which is arguably un-Pythonic.
We have previously used hasattr(obj, 'add_aux_coord') for this elsewhere.
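
A minimal sketch of the difference between the two checks. `is_cubelike`, `Cube` and `DuckCube` are hypothetical names; only the `add_aux_coord` attribute name comes from the comment above.

```python
class Cube:
    """Stand-in for iris.cube.Cube, so the sketch runs without Iris."""

    def add_aux_coord(self, *args, **kwargs):
        pass


class DuckCube:
    """Hypothetical lookalike: not a Cube subclass, but has the same method."""

    def add_aux_coord(self, *args, **kwargs):
        pass


def is_cubelike(obj):
    """Duck-typed check: accept anything offering add_aux_coord."""
    return hasattr(obj, 'add_aux_coord')


for candidate in (Cube(), DuckCube(), None):
    print(type(candidate).__name__,
          isinstance(candidate, Cube),
          is_cubelike(candidate))
```

The `hasattr` check lets the lookalike through and still rejects `None`, which is the case that started this PR.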

Member Author

OK, so if we check whether the object quacks with an auxcoord instead of checking the type, is there any reason not to revert to raising an exception rather than a warning?

Member

@pp-mo pp-mo Feb 8, 2019

Oh dear, I had somehow skimmed over that latest discussion without taking it in.
Now I see that @pelson and I are just advocating different approaches, and I honestly don't know how to choose between.

Personally though, I must say I do hate all the warnings in Iris. There are still far too many, most occurrences are a pointless nuisance, and on the rare occasions when they aren't no-one is listening any more.

Member Author

@rcomer rcomer Feb 8, 2019

No problem: I'm partly using this as a learning exercise, so exploring different solutions to the same issue is fine 👍

I think I prefer to have an exception on the grounds that, if my code is going to fail, it's better for it to fail sooner rather than later. Also, having the failure at the point that the object is included into the cubelist means that the traceback is going to point me a lot closer to where I made the mistake. Which so far has always been

cubelist.append(some_function_i_forgot_to_put_a_return_statement_in())

Member Author

we should get our ducks in a row 🦆 🦆 🦆 before sending @rcomer on a wild goose chase

nicely done! 😆

As a user, I'd still rather have an exception if possible, for the reasons I gave above.

If something does go wrong with my cubelist, the first thing I'm going to do in an attempt to debug is print it. So if we're looking for a minimal set of cube-like attributes, summary ought to be up there.

Member Author

Are we thinking about this the wrong way round? Rather than trying to define which types should be allowed in a cubelist, it might be easier to focus on which types should definitely be rejected.

This started because a rogue None in a cubelist caused problems, and I wanted a more informative error message. So far I’m not aware of any other types that have caused issues. So my case would be solved by simply throwing an exception if object is None. We could generalise that a bit if we decide that, at minimum, the cubelist should be printable, so reject any objects that don’t have a summary attribute.

The exception message needn’t say anything about how similar to a Cube the object is, but could just say ”object of type [whichever] does not belong in a cubelist”.

Member

They don't have to be cube-like in all respects.

Are we thinking about this the wrong way round?

I don't think so in this case. Because users can create and use their own CubeList instances, the minimum set of behaviour required for an entry into a CubeList is precisely the Cube's behaviour (and no less).

in fact I was strongly opposed to creating another warning here

Useful to know, thank you. So my biggest concern is that we are essentially introducing a breaking change if we do this as an exception - if @rcomer has been adding None into a cube list by accident, just think of all the wild things that some of our less educated users have been doing! 😭
I guess there is a workaround though... if users really want to do this they can still do list.append(cube_list_instance, thing_that_isnt_a_cube) until they sort their 💩 out.
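
That workaround relies on calling the base-class method directly, which skips any override. A minimal sketch, assuming Python 3, with a hypothetical `StrictList` and a stand-in `Cube`:

```python
class Cube:
    """Stand-in for iris.cube.Cube, so the sketch runs without Iris."""


class StrictList(list):
    """Hypothetical list whose append rejects non-cubes."""

    def append(self, item):
        if not isinstance(item, Cube):
            raise ValueError('{!r} is not a Cube'.format(item))
        super().append(item)


strict = StrictList()
try:
    strict.append(None)        # goes through the override and is rejected
except ValueError:
    rejected = True

list.append(strict, None)      # base-class call bypasses the check
print(rejected, strict)
```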

In an attempt to get consensus and prevent this conversation from being open-ended, my refined suggestion:

  • CubeList._assert_is_cube - raise a ValueError if not isinstance of cube.
  • CubeList._assert_is_iterable_of_cubes -> just construct a CubeList of the subset - that way you can honour iterators, and then add the constructed CubeList as necessary.
  • Update the existing call in CubeList.__new__ to use _assert_is_cube.
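
A rough sketch of how that suggestion might look, assuming Python 3. `CubeListSketch` and the stand-in `Cube` are hypothetical, and the check is wired into `__init__` rather than `__new__`, given the problem with `__new__` described earlier in the thread.

```python
class Cube:
    """Stand-in for iris.cube.Cube, so the sketch runs without Iris."""


class CubeListSketch(list):
    """Hypothetical list that only accepts Cube instances."""

    def __init__(self, cubes=None):
        super().__init__()
        if cubes is not None:
            for cube in cubes:
                self.append(cube)

    @staticmethod
    def _assert_is_cube(obj):
        if not isinstance(obj, Cube):
            raise ValueError('{!r} is not a Cube'.format(obj))

    def append(self, cube):
        self._assert_is_cube(cube)
        super().append(cube)

    def extend(self, other):
        # Building a checked list first validates every item and also
        # copes with one-shot iterators such as generators.
        super().extend(CubeListSketch(other))

    def __iadd__(self, other):
        self.extend(other)
        return self


cubes = CubeListSketch([Cube()])
cubes.extend(Cube() for _ in range(2))
cubes += iter([Cube()])
print(len(cubes))
```

With every entry point funnelled through `_assert_is_cube`, someone who wants to admit lookalikes would only need to replace that one method. `insert`, `__setitem__` and `__add__` are left out here for brevity.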

Member Author

So my biggest concern is that we are essentially introducing a breaking change if we do this as an exception... just think of all the wild things that some of our less educated users have been doing!

I hadn't considered the possibility of cubelists being used to store random types 😮 . Semi-serious question: how far away is Iris 3?

refined suggestion

Just to check I've understood: we make it strict so only Cube instances are allowed. Because the check is restricted to one method, someone who wants to include ducks in the cubelist just needs to replace that one method?

Points 1 and 2 sound good to me.

Point 3: I think I need to wait for #3264 to be merged, and then update __init__!

Member

@pp-mo pp-mo Feb 15, 2019

@pelson my biggest concern is that we are essentially introducing a breaking change if we do this as an exception - if @rcomer has been adding None into a cube list by accident, just think of all the wild things that some of our less educated users have been doing! 😭

Not sure if it helps, but..
In my mind, even if it was previously possible to put non-cubes into a CubeList, that was never intended behaviour -- evidence the code covered by #3264.
So that is a bug, and fixing a bug is not a "breaking" change.
Weaselly, but we've accepted that principle before.

@rcomer
Member Author

rcomer commented Feb 6, 2019

Still to do:

  • address __new__/__init__ issue
  • update whatsnew

lib/iris/cube.py Outdated
lib/iris/cube.py Outdated
@rcomer
Member Author

rcomer commented Jun 18, 2019

Hi @pp-mo, do you think it's worth persevering with this? If yes, given the discussion above about it possibly being a breaking change, should we target it for Iris3.0?

I'm happy to carry on if you think it's worth it, but the issue has only come up for me twice (3 years apart) so I wouldn't say it's top priority.

@rcomer
Member Author

rcomer commented Oct 3, 2019

I’m now leaning towards the conclusion that this isn’t really worth it: the amount of new code here seems disproportionate to the size of the problem I was originally trying to solve, particularly given the stated desire to have less in this module/class.

And then there’s the ducks 🦆 🦆🦆😳

Should this PR be put out of its misery?

@pp-mo
Member

pp-mo commented Oct 8, 2019

now leaning towards the conclusion that this isn’t really worth it

@rcomer I've been racking my brains for a single killer answer to this one !

It probably is the case that, without this, certain awkward + confusing bugs are easy to come by.
I also think it is clear that, according to the __new__ method, it was originally intended that a CubeList could be trusted to only contain cubes. So fixing that is a reasonable thing, and in fact a bugfix.

But as you say, it's quite a lot of fuss to completely resolve it.
A search on "python typed list" resulted in this gist.
I think that looks well constructed, and indicates that the code you have is about right.
It may also contain useful pointers to reducing the duplicated code.
Except ... I'd just like to repeat a personal preference for somehow allowing "cubelike objects", instead of type checking.
( Note: this code also suggests that you should add __radd__ to your list of methods to override. )
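
A minimal sketch of why `__radd__` matters, assuming Python 3; `CheckedList` and `Cube` are hypothetical stand-ins. Without `__radd__`, an expression such as `plain_list + checked_list` falls back to ordinary list concatenation and returns a plain, unchecked `list`.

```python
class Cube:
    """Stand-in for iris.cube.Cube, so the sketch runs without Iris."""


class CheckedList(list):
    """Hypothetical list that validates its contents on creation."""

    def __init__(self, items=()):
        items = list(items)
        for item in items:
            if not isinstance(item, Cube):
                raise ValueError('{!r} is not a Cube'.format(item))
        super().__init__(items)

    def __add__(self, other):
        # checked + other
        return CheckedList(list(self) + list(other))

    def __radd__(self, other):
        # other + checked, where other is e.g. a plain list
        return CheckedList(list(other) + list(self))


combined = [Cube()] + CheckedList([Cube()])
print(type(combined).__name__, len(combined))
```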

TBH I've always been a wee bit ambivalent about CubeList myself, as it doesn't seem to serve much purpose except as a home for merge+concatenate, and a specialist printing format -- and only the print format really needs a class object.
So, another possible approach is simply to improve those operations that fail when you give them a list (ideally, an iterable??) of things that it "expects" to be cubes.
Your issue doesn't explain where your original confusing failure occurred : what was the failing operation ?
Frankly though, such a piecemeal approach sounds to me like more code to maintain than this approach.

@rcomer
Member Author

rcomer commented Oct 8, 2019

Thanks @pp-mo for giving this some thought.

Some operations that fail were listed further down the issue (comment and comment) . Specifically:

[cube] = cubelist.extract(name)

gives

AttributeError: 'NoneType' object has no attribute 'ndim'

If you try to print a cubelist with None in it, you get

AttributeError: 'NoneType' object has no attribute 'summary'

If you try to save a cubelist with a None to NetCDF you get

AttributeError: 'NoneType' object has no attribute 'attributes'

@bjlittle
Member

@rcomer and @pp-mo Where we at guys on the state of play for this PR?

@rcomer
Member Author

rcomer commented Oct 23, 2019

@bjlittle basically we need a decision whether it’s actually worth all the new code (see our last 3 comments). I did not realise how complicated it would get when I started! 😬

If it is worth it, there is also a question of how you could allow cube-like objects (duck types) to exist in the CubeList. The current implementation uses isinstance(thing, Cube). Both @pp-mo and @pelson have expressed a desire to allow the ducks in.

@pp-mo
Member

pp-mo commented Nov 5, 2019

Hi @rcomer @bjlittle .
Just spotted a user problem that looks like a case for this. The reported problem is

"I am trying to concatenate/merge them into one cube but I get "AttributeError: 'CubeList' object has no attribute 'standard_name'".

Sounds to me like this person has a CubeList with another CubeList inside it !
Easily done if you are a bit vague about what the load functions return (as, I observe, many naive users are).
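
A sketch of how that nesting can arise, using plain lists so it runs without Iris. `fake_load` and the stand-in `Cube` are hypothetical, and this is only a guess at what the user did.

```python
class Cube:
    """Stand-in for iris.cube.Cube with just a standard_name."""

    def __init__(self, standard_name):
        self.standard_name = standard_name


def fake_load():
    """Hypothetical stand-in for a load call that returns a list of cubes."""
    return [Cube('air_temperature'), Cube('air_pressure')]


nested = []
nested.append(fake_load())   # adds ONE item: the whole returned list

flat = []
flat.extend(fake_load())     # adds each cube individually

try:
    names = [item.standard_name for item in nested]
except AttributeError as err:
    message = str(err)

print(len(nested), len(flat))
print(message)
```

A check at `append` time would flag the nested list at the point it is added, rather than later during merge or concatenate.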

@rcomer
Member Author

rcomer commented Aug 2, 2020

Right. I think my enthusiasm for this one has waned somewhat.

Due to staff changes in my team I don’t currently have the spare capacity I did so, even if we knew where we wanted to go with it, I’d struggle to justify the time any time soon. So I think it’s time to call it a day.

Thanks @pp-mo for all your advice. I learned a lot, so you can still chalk it up against your teaching objectives 👍

@rcomer rcomer closed this Aug 2, 2020
@rcomer rcomer deleted the cubelist-contain-only-cubes branch June 28, 2022 12:56

5 participants