Conversation

@pp-mo
Member

@pp-mo pp-mo commented Jan 18, 2017

Note the "temporary" testing approach: just selected new tests, for the new code functions:
python -m unittest discover -v lib/iris/tests/integration/temp_dask

Addresses #2307

@marqh
Member

marqh commented Jan 18, 2017

Hi @pp-mo

Please could you target the dask feature branch with such changes in the short term, rather than master?

thank you

@pp-mo pp-mo changed the base branch from master to dask January 18, 2017 18:56
@pp-mo
Member Author

pp-mo commented Jan 18, 2017

target the dask feature branch

Sorry, forgot about that!

@pp-mo
Member Author

pp-mo commented Jan 18, 2017

WIP
I'm going to try amending the .travis.yml to get it to install dask and run the temporary tests...

@pp-mo pp-mo force-pushed the dask_pp branch 4 times, most recently from 7a4e5d1 to 524775c Compare January 18, 2017 20:17
@pp-mo
Member Author

pp-mo commented Jan 18, 2017

Selected limited testing is now working!

Hoping you like it, as I want to build on this testing strategy in further work...

Member

@DPeterK DPeterK left a comment

@pp-mo like this! Just a few thoughts here from me.

if isinstance(other_attr, biggus.NumpyArrayAdapter):
other_attr = other_attr.concrete
self_attr = as_concrete_data(getattr(self, attr))
other_attr = as_concrete_data(getattr(other, attr))
Member

I'm intrigued you have to concrete each attr to compare it - can dask do lazy object comparisons? Is such a thing even possible??

Member Author

@pp-mo pp-mo Jan 19, 2017

It certainly is possible: (self_attr == other_attr) would be a lazy object, i.e. it hasn't looked at the data yet
(and you can further process that, index it, etc., all still being lazy).

So I think we really want this code to be explicit that it is realising the content here.
In fact, I fully expected that you would always be required to call 'compute' and that nothing else would do it (as with biggus);
however, it seems that applying np.all() will realise it anyway (and so will do the compare).
In my view that is not actually nice, and may even be a bug, as it's not documented anywhere -- see comment on #2308

Member

@bjlittle bjlittle Jan 20, 2017

@dkillick Yup, I can confirm that dask maintains the laziness when comparing lazy objects; it just generates another lazy dask graph, which you need to realize to get the answer. So the following gives a lazy result:

result = getattr(self, attr) == getattr(other, attr)

And the answer is realized with, for example:

>>> result.compute()
array([ True,  True,  True,  True,  True,  True,  True,  True,  True,  True], dtype=bool)

Or indeed, as @pp-mo suggests, the lazy dask result is made concrete by np.all:

>>> np.all(result)
True

... which is all good to know, and pretty darn cool!
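The explicit-realisation pattern discussed above can be shown in a minimal self-contained sketch. The `realise` helper below is a hypothetical stand-in for Iris's `as_concrete_data` (real dask is omitted so the snippet runs anywhere): anything exposing a `compute()` method is realised, and everything else is passed through as a plain ndarray.

```python
import numpy as np

def realise(data):
    # Stand-in for as_concrete_data: realise a lazy object explicitly,
    # rather than relying on np.all() doing it implicitly.
    compute = getattr(data, "compute", None)
    return compute() if callable(compute) else np.asarray(data)

self_attr = np.arange(10)
other_attr = np.arange(10)
print(np.all(realise(self_attr) == realise(other_attr)))  # True
```

With a real `dask.array`, `realise(lazy)` would trigger `lazy.compute()`, making the point of realisation visible at the call site.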

# You should have received a copy of the GNU Lesser General Public License
# along with Iris. If not, see <http://www.gnu.org/licenses/>.
"""
Test lazy data handlingin iris.fileformats.pp.
Member

Typo: "handlingin" --> "handling in".

Member Author

Will fix!

@@ -1,4 +1,4 @@
# (C) British Crown Copyright 2010 - 2016, Met Office
# (C) British Crown Copyright 2010 - 2017, Met Office
Member

What happened here?

Member Author

@pp-mo pp-mo Jan 19, 2017

I made some changes, then didn't need them and reverted them all in a subsequent commit.
But then you can't revert the date without the licence check failing.
I think you can remove this when you squash (if you remember!).

cube = self.cube
raw_data = cube._my_data
lazy = cube.lazy_data()
self.assertIs(cube.lazy_data(), raw_data)
Member

Did you want to check the type of lazy here too? I don't think you've checked that cube.lazy_data returns a dask object.

If you do choose to do this then this test method will get quite long, with lots of assertions, so it would be worth looking into splitting it up a bit.

Member Author

@pp-mo pp-mo Jan 19, 2017

I didn't want to be explicit about the type, having hidden the details behind the iris._lazy_data interface.
It also doesn't seem right to use is_lazy_data when we are also testing that here in another class.
I'll fix it anyway: I have done that in test_load above.

self.assertIsNot(cube.lazy_data(), raw_data)
self.assertArrayAllClose(lazy.compute(), raw_data.compute())

def test_lazy_data__set(self):
Member

To me this name implies you're going to run a set() operation on the lazy data.

Member Author

@pp-mo pp-mo Jan 19, 2017

Ok, I'll rename these functions.
I thought of "test_lazy_data__new_content", but it's getting lengthy for the subsequent ones.
How about...

test_lazy_data__newdata
test_lazy_data__fail_newdata_bad_shape
test_lazy_data__fail_newdata_not_lazy

New thought (TODO):
we can usefully split the class into tests for ".data", ".has_lazy_data" and ".lazy_data()".
That would add context, so you can then simplify the testcase names.



# A magic value, borrowed from biggus
_MAX_CHUNK_SIZE = 8 * 1024 * 1024 * 2
Member

@bjlittle bjlittle Jan 20, 2017

@pp-mo It's difficult to know what this should default to when it's unknown (at this point) what type of operation is going to be performed; the choice of chunking should really be aligned with the expected operation/use in order to be optimal (from what I understand).

Member Author

@pp-mo pp-mo Jan 20, 2017

Well, this is obviously a preliminary.
I propose to "just not worry" about this for now!

Member

Agreed
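To make the byte budget concrete, here is one illustrative (and entirely hypothetical) way such a constant could be turned into a chunk shape, splitting along the leading axis; the PR itself just borrows the biggus magic value and defers the chunking question, as agreed above.

```python
import numpy as np

_MAX_CHUNK_SIZE = 8 * 1024 * 1024 * 2  # bytes -- the "magic value" above

def first_axis_chunks(shape, dtype):
    # Bytes occupied by a single slice along the leading axis.
    row_bytes = int(np.prod(shape[1:], dtype=np.int64)) * np.dtype(dtype).itemsize
    # Fit as many leading-axis slices into the budget as possible (at least one).
    rows = max(1, _MAX_CHUNK_SIZE // max(1, row_bytes))
    return (min(rows, shape[0]),) + tuple(shape[1:])

print(first_axis_chunks((1000, 2048, 2048), np.float64))  # (1, 2048, 2048)
```

As the review comment notes, no single default is optimal: a chunking aligned with the expected access pattern (e.g. whole 2-D fields per chunk, as here) suits some operations and penalises others.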

A :class:`biggus.Array` representing the multi-dimensional
data of the Cube.
"""
Member

@pp-mo Should we care about updating the doc-string at this point?

Member Author

Ok, I might as well...

# Unmask the array only if it is filled.
if isinstance(data, np.ndarray) and ma.count_masked(data) == 0:
if (isinstance(data, np.ma.masked_array) and
ma.count_masked(data) == 0):
Member

@bjlittle bjlittle Jan 20, 2017

@pp-mo It's not possible to put a masked array into a dask.array.Array (well, certainly not one that actually has its mask set) ... so is this really still valid in the new dask world? Or am I missing something here ...

Are you imagining that a user has a non-masked masked array wrapped up in a dask.array.Array ?

Member Author

@pp-mo pp-mo Jan 20, 2017

At the moment I'm just totally ignoring the masked issue.
This code is effectively a dead branch with the latest changes, as as_concrete_data will never return a masked result.
But this will need fixing later, so I left it in.

Member

As discussed at length 😄
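The "unmask only if filled" check from the diff above can be sketched in isolation: a masked array with no points actually masked can safely be demoted to a plain ndarray. The helper name here is illustrative, not from Iris.

```python
import numpy as np
import numpy.ma as ma

def unmask_if_filled(data):
    # Demote a masked array to a plain ndarray only when no element
    # is actually masked (so no information is lost).
    if isinstance(data, ma.MaskedArray) and ma.count_masked(data) == 0:
        data = data.data
    return data

filled = ma.masked_array([1.0, 2.0, 3.0])          # mask never set
partly = ma.masked_array([1.0, 2.0], mask=[0, 1])  # one point masked
print(type(unmask_if_filled(filled)).__name__)   # ndarray
print(type(unmask_if_filled(partly)).__name__)   # MaskedArray
```

This matches the narrowed isinstance check in the diff (np.ma.masked_array rather than np.ndarray): a plain ndarray never needs unmasking, so only genuine MaskedArray instances are examined.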

from iris._lazy_data import is_lazy_data, as_lazy_data, as_concrete_data


class MixinLazyTests(object):
Member

@pp-mo MixinLazyTestData really ... kinda, sorta, rather than tests.

Member Author

ok



class Test_as_lazy_data(MixinLazyTests, tests.IrisTest):
def test_lazy(self):
Member

@pp-mo This is really a test_lazy_pass_thru ...

Member Author

renamed now (coming soon)...

@pp-mo
Member Author

pp-mo commented Jan 20, 2017

The latest push hopefully addresses the outstanding review comments?
Please re-review, @dkillick @bjlittle

@marqh
Member

marqh commented Jan 25, 2017

Ok, late to the party here, but I am fully on board with this implementation, given its target is the 'dask' feature branch.

I am highly minded to merge this and work from here, e.g. in #2319.

Any objections to a merge? Otherwise I'll aim to merge tomorrow am (GMT).

@QuLogic QuLogic added this to the dask milestone Feb 13, 2017
@pp-mo pp-mo closed this Mar 8, 2017
@QuLogic QuLogic modified the milestones: dask, v2.0 Aug 2, 2017
@pp-mo pp-mo deleted the dask_pp branch March 18, 2022 15:38