Skip to content

Conversation

ogrisel
Copy link
Contributor

@ogrisel ogrisel commented May 31, 2018

This requires: joblib/joblib#689 to work efficiently but should not crash or degrade performance when used with an older joblib.

@ogrisel ogrisel requested a review from jcrist May 31, 2018 22:51
@ogrisel
Copy link
Contributor Author

ogrisel commented May 31, 2018

@TomAugspurger if you have a chance it would be great to update the docker image of your pangeo instance to test nested parallelism with this new feature (this requires upgrading the joblib branch too).

@ogrisel
Copy link
Contributor Author

ogrisel commented Jun 1, 2018

I ran some tests on @TomAugspurger's pangeo instance with 30 workers (randomized search for a random forests) and everything looks fine: the CPUs get saturated and there is very little white space in the task stream view of the diagnostics page of the dask scheduler.

@ogrisel
Copy link
Contributor Author

ogrisel commented Jun 1, 2018

I also checked that all the futures are garbage collected by the end of the grid search: client.futures is empty.

Copy link
Member

@jcrist jcrist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this seems fine to me.

return False


class _WeakKeyDictionary:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason not to subclass from MutableMapping, which would turn this into a full dict interface?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would need to implement __delitem__ (easy) and __iter__ not as simple but we have no use for those. So I don't think it's worth it.

if f is None and arg_id in self.data_futures:
f = self.data_futures[arg_id]

if f is None and call_data_futures is not None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be elif f is None... instead?

try:
f = call_data_futures[arg]
except KeyError:
if (call_data_futures is not None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call_data_futures will always be not None due to the if check above, this check is not needed.

f = call_data_futures[arg]
except KeyError:
if (call_data_futures is not None
and is_weakrefable(arg)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, _WeakKeyDictionary can handle non-weakrefable objects, so this check is not needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it's an oversight on my side when implementing the _WeakKeyDictionary. I don't think it's possible to safely handle non-weakrefable objects like dict instances: if the id of a recently collected dict argument is reused we will compute the wrong thing. I will make the _WeakKeyDictionary refuse to accept non-weakrefable keys.

# The explicit call to clear is required to break a cycling reference
# to the futures.
self.call_data_futures.clear()
self.call_data_futures = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why set this to None? If call_data_futures always exists and is just cleared after running, then you never need to check if it's None below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright.

@ogrisel
Copy link
Contributor Author

ogrisel commented Jun 1, 2018

@jcrist I think I addressed your comments. Let's see of the CI likes it.

@ogrisel
Copy link
Contributor Author

ogrisel commented Jun 1, 2018

I think the travis failure on test_administration is unrelated to this PR.

@mrocklin
Copy link
Member

mrocklin commented Jun 1, 2018 via email

@TomAugspurger
Copy link
Member

CI seems green now.

@mrocklin
Copy link
Member

mrocklin commented Jun 2, 2018

I think @jcrist gave a +1 in person. Merging this now.

@mrocklin mrocklin merged commit a5db7c9 into dask:master Jun 2, 2018
@ogrisel ogrisel deleted the auto-scatter branch June 4, 2018 18:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants