e.g., suppose `x` is a dask.array or dataframe that we want to verify is finite:

```python
@dask.delayed
def assert_(condition):
    assert condition

x = x.with_dependencies([assert_(da.notnull(x).all())])
```
Under the covers, this would create a new dask object whose tasks each depend on the assertion tasks having been computed first.
Potentially, this would be quite useful for tools like xarray, so we could defer equality checks until we've built the entire graph. One concern is that this might trigger a fail case for the dask scheduler (#874) when checks inevitably require looking at the entire dataset.
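To make the proposed graph rewrite concrete, here is a minimal sketch using plain dict graphs in dask's `(callable, *args)` task format, with a toy recursive evaluator. `tiny_get` and this `with_dependencies` are illustrative stand-ins, not existing dask APIs:

```python
def tiny_get(graph, key):
    """Evaluate `key` in a dask-style task graph, resolving key arguments."""
    task = graph[key]
    if isinstance(task, tuple) and callable(task[0]):
        func, *args = task
        return func(*(tiny_get(graph, a) if a in graph else a
                      for a in args))
    return task

def assert_(condition):
    assert condition
    return True

def with_dependencies(graph, key, dep_keys):
    """Return a new graph where fetching the result also forces dep_keys."""
    def passthrough(value, *_checks):
        return value
    new = dict(graph)
    # the extra args make every check a dependency of the final key
    new[key + "-checked"] = (passthrough, key, *dep_keys)
    return new

# x = a + b, with an assertion task wired in as a dependency:
graph = {"a": 1.0, "b": 2.0,
         "x": (lambda a, b: a + b, "a", "b"),
         "check": (assert_, True)}
g2 = with_dependencies(graph, "x", ["check"])
print(tiny_get(g2, "x-checked"))
```

A real implementation would merge the check's graph into `x`'s graph and rewrite the output keys, but the dependency-injection step is the same idea.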
```python
res1 = x.with_dependencies([assert_(da.notnull(x).all())])

# Is semantically equivalent to:
def check(x, cond):
    assert cond
    return x

res2 = x.map_blocks(check, da.notnull(x).all())
```
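A pure-Python stand-in illustrates why this equivalence serializes the computation: the reduction over every chunk (`cond`) must finish before any output block can be emitted. `blocks`, `check`, and `toy_map_blocks` here are illustrative names, not dask APIs:

```python
import math

def check(block, cond):
    assert cond
    return block

def toy_map_blocks(func, blocks, *args):
    # every output block depends on all of *args, mirroring how each
    # dask chunk would depend on the fully reduced condition
    return [func(b, *args) for b in blocks]

blocks = [[1.0, 2.0], [3.0, 4.0]]                         # chunks of x
cond = all(not math.isnan(v) for c in blocks for v in c)  # notnull(x).all()
res2 = toy_map_blocks(check, blocks, cond)
print(res2)
```

In the real dask version the intermediate chunks would have to be computed (and held) before the check passes, which is the caching concern raised below.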
As you stated above, global checks will inevitably lead to computing and caching the entire array, removing any out-of-core benefits. As such, I'm reluctant to add a method for this when it will result in poor performance in many use cases (I'd rather not make it easy to do poorly performing things).
Agreed. I do think this is further evidence of how significant a shortcoming this is for the scheduler, though, because the need for this sort of pattern is quite common.
This is a more general solution to #97, inspired by the corresponding design from TensorFlow:
https://www.tensorflow.org/versions/r0.10/api_docs/python/check_ops.html#assert_negative