Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inheritance of Purity #2073

Closed
jrmlhermitte opened this issue Mar 13, 2017 · 9 comments
Closed

Inheritance of Purity #2073

jrmlhermitte opened this issue Mar 13, 2017 · 9 comments
Milestone

Comments

@jrmlhermitte
Copy link
Contributor

jrmlhermitte commented Mar 13, 2017

(As discussed here: http://stackoverflow.com/questions/42773134/inheritance-of-purity/42773643)

I have a question about inheritance of function purity. For example, consider this case:

In [1]: from dask import delayed

In [2]: myArr = delayed(np.ones, pure=True)((10,10))

In [3]: myArr
Out[3]: Delayed('ones-850ff41a84b309775d01d5f3e5c4d1c4')

In [4]: myArr = delayed(np.ones, pure=True)((10,10))

In [5]: myArr
Out[5]: Delayed('ones-850ff41a84b309775d01d5f3e5c4d1c4')

In [6]: myArr.shape
Out[6]: Delayed('getattr-de14f312-2605-4fc4-a419-c376520f73b4')

In [7]: myArr.shape
Out[7]: Delayed('getattr-9e547a62-0bc6-43e5-8d3c-cac532b73511')

In [8]: delayed(getattr, pure=True)(myArr, 'shape')
Out[8]: Delayed('getattr-5224be928bad33c9778d4f96d610cc37')

In [9]: delayed(getattr, pure=True)(myArr, 'shape')
Out[9]: Delayed('getattr-5224be928bad33c9778d4f96d610cc37')

I would expect line [6] and [7] to yield the same key, since they are accessing an attribute from a delayed instance declared pure. However, it is not the case, and I must explicitly declare the getattr method to be pure for this to work, as seen in lines [8] and [9].

As discussed in the stackoverflow link, this seems to be purely an issue of dask semantics. I would be inclined to voting for purity to be inherited. But I am a new user, so I would appreciate feedback/comments on this. Thanks!

PS: The eventual use case is for dask distributed. It seems that the way purity is treated in this case may possibly be different than the case listed here.

@shoyer
Copy link
Member

shoyer commented Mar 14, 2017

delayed(getattr,pure=True)(myArr, 'shape',pure=True) looks like one more pure=True than necessary -- are both of those really necessary?

@jrmlhermitte
Copy link
Contributor Author

oops yes, that was a typo, it still ran since it's treated as a kwarg to a delayed object. I corrected it, thanks!

@jrmlhermitte
Copy link
Contributor Author

For the record, getitem seems to be forced as pure=True, as opposed to getattr:

In [6]: myArr = delayed(np.ones, pure=False)((10,10))

In [7]: myArr[0]
Out[7]: Delayed('getitem-f7b0bc2869f1fe6c1221edc2aaac0a73')

In [8]: myArr[0]
Out[8]: Delayed('getitem-f7b0bc2869f1fe6c1221edc2aaac0a73')

Would it make sense instead of choosing purity from an absolute standpoint to choose it from a relative inherited standpoint? I don't think getitem will always necessary yield something pure. (For example, could be an object with a simulated set of random images fed into some image processing algorithm that runs on image sequences, in a loop.) This may be opening a can of worms. Any discussion/feedback appreciated!

@mrocklin
Copy link
Member

Just to be clear on terms here, a delayed object does not currently know if it is "pure" or not. Only delayed functions can be labeled as pure. I suspect that the confusion here arises because we use the function delayed both to mark a value as part of a graph and to decorate a function.

Even if we kept the label around it's not clear that the attributes of "pure" objects would themselves be pure.

@jcrist
Copy link
Member

jcrist commented Mar 14, 2017

I'd be fine with the following:

  • There's no concept of a "pure" object. Nested attribute accesses, function calls, etc... make this too hard to reason about. This should be clarified in the docs as needed.
  • Attribute access is always pure. We already assume that all "magic" methods on delayed objects are pure, so we shouldn't make __getattr__ the exception (which was unintentional).
  • Perhaps error if delayed is called on a non-callable object with pure=True to better indicate that pure=True only indicates that calling the resulting object is a pure operation.

Thoughts?

@mrocklin
Copy link
Member

delayed(obj, pure=True) is currently used as a signal that we should take the trouble to hash the object down to get the key.

@jcrist
Copy link
Member

jcrist commented Mar 14, 2017

Ah, ok, not the last one then.

jcrist added a commit to jcrist/dask that referenced this issue Mar 14, 2017
Makes attribute access pure on delayed objects, and updates docstring to
better reflect this. It is now consistent that all "magic" methods are
pure on all delayed objects. Fixes dask#2073.
@jcrist
Copy link
Member

jcrist commented Mar 14, 2017

I applied my proposal above in #2084. I think this is internally consistent, so this approach would be my vote.

jcrist added a commit to jcrist/dask that referenced this issue Mar 14, 2017
Makes attribute access pure on delayed objects, and updates docstring to
better reflect this. It is now consistent that all "magic" methods are
pure on all delayed objects. Fixes dask#2073.
jcrist added a commit to jcrist/dask that referenced this issue Mar 14, 2017
Makes attribute access pure on delayed objects, and updates docstring to
better reflect this. It is now consistent that all "magic" methods are
pure on all delayed objects. Fixes dask#2073.
@jrmlhermitte
Copy link
Contributor Author

Thanks, I vote for this as well :-). I like the elaborated documentation, very useful as well!

jcrist added a commit that referenced this issue Mar 14, 2017
Makes attribute access pure on delayed objects, and updates docstring to
better reflect this. It is now consistent that all "magic" methods are
pure on all delayed objects. Fixes #2073.
@sinhrks sinhrks added this to the 0.14.1 milestone Mar 30, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants