LeafletFinder issues #76

orbeckst · 2018-10-30T01:19:53Z

Quick issue collecting various problems that came up during PR #66; apologies for the messy report. See PR #81 for initial (failing) tests.

n_jobs

LeafletFinder with n_jobs == 2 does not pass tests, see #66 (comment)

Currently marked XFAIL in #66 (update to dask 0.18.0) and in master.

distributed

LeafletFinder with a distributed.client scheduler fails; see also the started PR #81.

______________________________________________ TestLeafLet.test_leaflet_single_frame[distributed-2-1] _______________________________________________

self = <test_leaflet.TestLeafLet object at 0xd26ea5fd0>, u_one_frame = <Universe with 5040 atoms>
correct_values_single_frame = [array([   1,   13,   25,   37,   49,   61,   73,   85,   97,  109,  121,
        133,  145,  157,  169,  181,  193,  ..., 4477, 4489,
       4501, 4513, 4525, 4537, 4549, 4561, 4573, 4585, 4597, 4609, 4621,
       4633, 4645, 4657, 4669])]
n_jobs = 1, scheduler = <Client: scheduler='tcp://127.0.0.1:56156' processes=2 cores=4>

    @pytest.mark.parametrize('n_jobs', (-1, 1, 2))
    def test_leaflet_single_frame(self,
                                  u_one_frame,
                                  correct_values_single_frame,
                                  n_jobs,
                                  scheduler):
        lipid_heads = u_one_frame.select_atoms("name PO4")
        u_one_frame.trajectory.rewind()
        leaflets = leaflet.LeafletFinder(u_one_frame,
                                         lipid_heads).run(start=0, stop=1,
                                                          n_jobs=n_jobs,
>                                                         scheduler=scheduler)

/Volumes/Data/oliver/Biop/Projects/Methods/MDAnalysis/pmda/pmda/test/test_leaflet.py:67:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/Volumes/Data/oliver/Biop/Projects/Methods/MDAnalysis/pmda/pmda/leaflet.py:295: in run
    cutoff=cutoff)
/Volumes/Data/oliver/Biop/Projects/Methods/MDAnalysis/pmda/pmda/leaflet.py:205: in _single_frame
    Components = parAtomsMap.compute(**scheduler_kwargs)
/Users/oliver/anaconda3/envs/pmda/lib/python3.6/site-packages/dask/base.py:155: in compute
    (result,) = compute(self, traverse=False, **kwargs)
/Users/oliver/anaconda3/envs/pmda/lib/python3.6/site-packages/dask/base.py:392: in compute
    results = schedule(dsk, keys, **kwargs)
/Users/oliver/anaconda3/envs/pmda/lib/python3.6/site-packages/distributed/client.py:2308: in get
    direct=direct)
/Users/oliver/anaconda3/envs/pmda/lib/python3.6/site-packages/distributed/client.py:1647: in gather
    asynchronous=asynchronous)
/Users/oliver/anaconda3/envs/pmda/lib/python3.6/site-packages/distributed/client.py:665: in sync
    return sync(self.loop, func, *args, **kwargs)
/Users/oliver/anaconda3/envs/pmda/lib/python3.6/site-packages/distributed/utils.py:277: in sync
    six.reraise(*error[0])
/Users/oliver/anaconda3/envs/pmda/lib/python3.6/site-packages/six.py:693: in reraise
    raise value
/Users/oliver/anaconda3/envs/pmda/lib/python3.6/site-packages/distributed/utils.py:262: in f
    result[0] = yield future
/Users/oliver/anaconda3/envs/pmda/lib/python3.6/site-packages/tornado/gen.py:1099: in run
    value = future.result()
/Users/oliver/anaconda3/envs/pmda/lib/python3.6/site-packages/tornado/gen.py:1107: in run
    yielded = self.gen.throw(*exc_info)
/Users/oliver/anaconda3/envs/pmda/lib/python3.6/site-packages/distributed/client.py:1492: in _gather
    traceback)
/Users/oliver/anaconda3/envs/pmda/lib/python3.6/site-packages/six.py:692: in reraise
    raise value.with_traceback(tb)
/Users/oliver/anaconda3/envs/pmda/lib/python3.6/site-packages/distributed/protocol/pickle.py:59: in loads
    return pickle.loads(x)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   for k in _ANCHOR_UNIVERSES.keys()])))
E   RuntimeError: Couldn't find a suitable Universe to unpickle AtomGroup onto with Universe hash 'f065a285-b5d1-44db-a2e9-c1de8b73c716'.  Available hashes:

/Users/oliver/anaconda3/envs/pmda/lib/python3.6/site-packages/MDAnalysis/core/groups.py:127: RuntimeError
--------------------------------------------------------------- Captured stderr call ----------------------------------------------------------------
distributed.worker - WARNING - Could not deserialize task
Traceback (most recent call last):
  File "/Users/oliver/anaconda3/envs/pmda/lib/python3.6/site-packages/MDAnalysis/core/groups.py", line 119, in _unpickle
    u = _ANCHOR_UNIVERSES[uhash]
  File "/Users/oliver/anaconda3/envs/pmda/lib/python3.6/weakref.py", line 137, in __getitem__
    o = self.data[key]()
KeyError: UUID('f065a285-b5d1-44db-a2e9-c1de8b73c716')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/oliver/anaconda3/envs/pmda/lib/python3.6/site-packages/distributed/worker.py", line 1387, in add_task
    self.tasks[key] = _deserialize(function, args, kwargs, task)
  File "/Users/oliver/anaconda3/envs/pmda/lib/python3.6/site-packages/distributed/worker.py", line 801, in _deserialize
    function = pickle.loads(function)
  File "/Users/oliver/anaconda3/envs/pmda/lib/python3.6/site-packages/distributed/protocol/pickle.py", line 59, in loads
    return pickle.loads(x)
  File "/Users/oliver/anaconda3/envs/pmda/lib/python3.6/site-packages/MDAnalysis/core/groups.py", line 127, in _unpickle
    for k in _ANCHOR_UNIVERSES.keys()])))
RuntimeError: Couldn't find a suitable Universe to unpickle AtomGroup onto with Universe hash 'f065a285-b5d1-44db-a2e9-c1de8b73c716'.  Available hashes:

(complete error message from pytest)


orbeckst · 2018-10-30T01:24:09Z

These things came up in the context of me trying out various tests in #66. We need to test these problems separately here, just to make sure it's not something from me running the tests incorrectly.

iparask · 2018-10-30T01:31:53Z

Okay, let me check them and I will get back to you.

VOD555 · 2018-11-01T23:21:42Z

@orbeckst @iparask It's due to line 74 in leaflet.py

        self._atomgroup = atomgroups

distributed.client cannot pickle an AtomGroup or a Universe. Currently, the only workaround is to avoid storing AtomGroup or Universe objects as attributes of self.

EDIT (@orbeckst ): see #79
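For readers trying to follow the traceback: the failure looks like a lookup in a process-local registry (the `_ANCHOR_UNIVERSES[uhash]` line) that comes up empty on the worker. Below is a toy model of that mechanism, not the MDAnalysis implementation; `ToyUniverse`, `ToyGroup`, `_REGISTRY` and `_rebuild_group` are hypothetical names made up for illustration.

```python
import pickle
import uuid
import weakref

# Process-local registry, loosely modelled on the lookup visible in the
# traceback. All names in this sketch are hypothetical.
_REGISTRY = weakref.WeakValueDictionary()


class ToyUniverse:
    def __init__(self, n_atoms, anchor=None):
        self.n_atoms = n_atoms
        self.anchor = anchor if anchor is not None else uuid.uuid4()
        _REGISTRY[self.anchor] = self


def _rebuild_group(anchor, indices):
    # Runs in whichever process unpickles the group.
    try:
        universe = _REGISTRY[anchor]
    except KeyError:
        raise RuntimeError(
            "no universe with anchor {!r}; available: {}".format(
                anchor, list(_REGISTRY.keys())))
    return ToyGroup(universe, indices)


class ToyGroup:
    def __init__(self, universe, indices):
        self.universe = universe
        self.indices = list(indices)

    def __reduce__(self):
        # Only the anchor and the indices travel; the universe does not.
        return (_rebuild_group, (self.universe.anchor, self.indices))


u = ToyUniverse(5040)
payload = pickle.dumps(ToyGroup(u, [1, 13, 25]))

# Same process: the anchor is registered, so unpickling succeeds.
restored = pickle.loads(payload)

# Simulate a fresh worker by emptying the registry: the same payload fails.
_REGISTRY.clear()
try:
    pickle.loads(payload)
    failed = False
except RuntimeError:
    failed = True
```

In this model the pickled group is only usable in a process that already holds a matching universe, which would be consistent with the empty "Available hashes:" list in the error message.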

kain88-de · 2018-11-04T11:05:21Z

So MDAnalysis technically supports pickling and unpickling. We never documented how it should be used, though. @richardjgowers @jbarnoud

iparask · 2018-11-05T16:52:19Z

Hello @orbeckst @kain88-de,

Concerning the distributed issue, I could use a deep copy and essentially create a new numpy object in memory for the atomgroups. However, I think that a solution such as the one in #65 is more reasonable. Any preference on this?

The reason for the first error is that the number of atoms present is not divisible by the number of processes (see leaflet.py#L192). There are two things I can think of doing here:

1. Reduce the number of partitions at runtime. That is, on the fly I would find a value of n_jobs that divides the number of atoms and is as close as possible to what the user selected. This would also mean that cluster utilization drops.
2. Introduce dummy atoms (by copying one of them) so that the number of atoms becomes divisible by n_jobs, and filter them out during the reduce phase.

Any preference here?
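To make the two options concrete, here is a minimal sketch in plain Python. The helper names `closest_divisor` and `pad_with_dummies` are hypothetical, and "as close as possible" is taken here to mean the largest divisor that does not exceed the requested n_jobs, which is an assumption.

```python
def closest_divisor(n_atoms, n_jobs):
    """Option 1: largest divisor of n_atoms that does not exceed n_jobs."""
    for candidate in range(min(n_jobs, n_atoms), 0, -1):
        if n_atoms % candidate == 0:
            return candidate
    return 1


def pad_with_dummies(indices, n_jobs):
    """Option 2: repeat the last index until the length divides by n_jobs.

    `indices` must be non-empty. Returns the padded list and the number of
    dummy entries appended, so they can be dropped in the reduce phase.
    """
    n_dummies = (n_jobs - len(indices) % n_jobs) % n_jobs
    return list(indices) + [indices[-1]] * n_dummies, n_dummies


indices = list(range(10))       # toy selection of 10 atoms
requested_jobs = 4

reduced_jobs = closest_divisor(len(indices), requested_jobs)
padded, n_dummies = pad_with_dummies(indices, requested_jobs)

# Reduce phase of option 2: discard the dummy entries again.
filtered = padded[:len(padded) - n_dummies]
```

With 10 atoms and 4 requested jobs, option 1 falls back to 2 jobs, while option 2 keeps 4 jobs at the cost of 2 redundant atoms.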

orbeckst · 2018-11-05T17:33:21Z

AtomGroups

If we can use pickling of AGs then that would be great. Otherwise the approach in the standard serial version should work, whereby you

1. communicate index arrays
2. re-build the universe
3. re-create the index group by slicing the new universe

However, come to think of it, that would be awful for performance because you would be doing this for every frame. So scratch that idea.

Can you write it such that only coordinates are communicated to the dask workers? Numpy arrays are not problematic.
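A rough sketch of what "only coordinates" could look like, assuming the per-block work is a distance search with a cutoff. The function `contacts_in_block` and the toy coordinates are hypothetical, plain tuples stand in for numpy arrays, and periodic boundaries are ignored.

```python
import pickle


def contacts_in_block(block, coords, cutoff):
    """Return (i, j) pairs with i in block, i < j, at most cutoff apart.

    `coords` is a plain list of (x, y, z) tuples; no AtomGroup or Universe
    is needed inside the worker.
    """
    cutoff_sq = cutoff * cutoff
    pairs = []
    for i in block:
        xi, yi, zi = coords[i]
        for j in range(i + 1, len(coords)):
            xj, yj, zj = coords[j]
            d_sq = (xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2
            if d_sq <= cutoff_sq:
                pairs.append((i, j))
    return pairs


# Hypothetical head-group positions, extracted once per frame in the
# main process before any task is submitted.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0),
          (10.0, 0.0, 0.0), (10.5, 0.0, 0.0)]

# The task arguments survive a pickle round trip without any registry
# lookup, which is what a scheduler has to do to ship them to a worker.
task = pickle.loads(pickle.dumps((range(0, 2), coords, 1.5)))
pairs_first_block = contacts_in_block(*task)
pairs_second_block = contacts_in_block(range(2, 4), coords, 1.5)
```

The main process would then only need the index pairs back to build the graph and find connected components.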

Perhaps @VOD555 and @kain88-de have some better ideas.

n_jobs

Do the partitions have to be the same size? Is it not possible to have partitions of different sizes?

Changing n_jobs, at least if the default is to use as many workers as cores, might lead to unexpected performance degradation. There was some discussion on related matters in #71 (and associated PR #75).

If possible, unequal partition sizes would be my preferred solution, followed by dummies. Alternatively, oversubscribing workers might also help but I'd be interested in seeing performance data.
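For illustration, unequal partitions whose sizes differ by at most one can be computed as below. This is a stand-alone sketch, not code from pmda; `balanced_partitions` is a hypothetical helper name.

```python
def balanced_partitions(n_items, n_parts):
    """Split range(n_items) into n_parts contiguous slices.

    Slice sizes differ by at most one; the first `n_items % n_parts`
    slices get the extra item.
    """
    if n_parts < 1:
        raise ValueError("n_parts must be >= 1")
    base, extra = divmod(n_items, n_parts)
    slices = []
    start = 0
    for part in range(n_parts):
        size = base + (1 if part < extra else 0)
        slices.append(slice(start, start + size))
        start += size
    return slices


parts = balanced_partitions(10, 4)
sizes = [s.stop - s.start for s in parts]
```

Every worker stays busy and no dummy atoms are needed; the price is that the per-block code can no longer assume a fixed block size.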

jbarnoud · 2018-11-05T17:44:36Z

I'll have a look at the pickling, see if I can recall how it works. But I never really needed to use it. @mnmelo is probably one who knows the most about it, though. Ping?

kain88-de · 2018-11-05T18:02:14Z

For your n_jobs problem you can also use make_balanced_slices (https://github.com/MDAnalysis/pmda/blob/master/pmda/util.py#L62). It solves the same problem for our standard classes.

richardjgowers · 2018-11-05T19:25:06Z

Pickling should work like:

import pickle
import MDAnalysis as mda

# In the main process: create the Universe with an anchor_name
u = mda.Universe(...., anchor_name='this')
# make a pickle of each atomgroup
pickles = [pickle.dumps(ag) for ag in atomgroups]

# In parallel processes:
# make a Universe with the same anchor_name
# (this only has to happen once per worker, so it could be done with the
# `initializer` argument of multiprocessing.Pool)
u = mda.Universe(....., anchor_name='this')

# unpickle the atomgroups onto the anchored Universe
ags = [pickle.loads(s) for s in pickles]

orbeckst · 2019-05-07T20:23:11Z

@iparask could you have a look at this issue again? It would be good to have this fixed for the SciPy paper.

orbeckst added the bug label Oct 30, 2018

orbeckst assigned iparask Oct 30, 2018

orbeckst mentioned this issue Oct 30, 2018

update to dask 0.18.0 #66

Merged

4 tasks

VOD555 mentioned this issue Nov 1, 2018

distributed.client error with atomgroup or universe type self attribute #79

Open

This was referenced Nov 2, 2018

fixes for leafletfinder #80

Closed

LeafletFinder should work with distributed scheduler #81

Open

orbeckst mentioned this issue Jul 15, 2020

PMDA with refactored _single_frame #128

Draft

4 tasks
