Does ddf.pipe() make sense? #1555
Comments
I'm not sure I understand the concern here. How would this be more confusing for dask.dataframe than for pandas?
I think this makes sense too.
Would dask expect the function to be a delayed function? Or would |
Consider this scenario:
    import pandas as pd
    import dask.dataframe as dd
    df = pd.DataFrame({'x': [1, 2, 3, 4, 5],
                       'y': [1., 2., 3., 4., 5.]})
    ddf = dd.from_pandas(df, npartitions=2)
    def f(df):
        return df.x.sum()
ddf.pipe(f).compute()
while pandas
What should be the correct behavior here?
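For reference, the pandas side of this comparison can be checked directly. This is a small sketch using the same data as the scenario above; the variable names `piped` and `direct` are just for illustration. In pandas, `DataFrame.pipe(f)` calls `f(df)`, so a columnar reduction comes back as a plain scalar.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4, 5],
                   "y": [1.0, 2.0, 3.0, 4.0, 5.0]})


def f(frame):
    # Columnar reduction: sum of column x.
    return frame.x.sum()


piped = df.pipe(f)   # same as calling f(df)
direct = f(df)

print(piped, direct)
```

Both values are the sum of column `x`, which is 15 for this data.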
I think that dask.dataframe should almost always follow pandas semantics.
In that case, it makes sense for the input function to be a delayed method. Thoughts?
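One way to read "follow pandas semantics" is that `pipe` only forwards the object to the function and returns whatever the function returns. The sketch below is a standalone, hypothetical `pipe` helper modeled on that behavior, not dask's actual code. It uses plain lists as stand-ins so it runs without any dataframe library; `total` and `scaled` are made-up example functions.

```python
def pipe(obj, func, *args, **kwargs):
    """Call func with obj, mirroring pandas-style pipe semantics (sketch)."""
    if isinstance(func, tuple):
        # (callable, keyword) form: pass obj under the given keyword name.
        func, target = func
        if target in kwargs:
            raise ValueError(
                f"{target} is both the pipe target and a keyword argument"
            )
        kwargs[target] = obj
        return func(*args, **kwargs)
    # Default form: obj is the first positional argument.
    return func(obj, *args, **kwargs)


def total(values, scale=1):
    # Hypothetical reduction over a stand-in "column".
    return scale * sum(values)


def scaled(scale, data):
    # Hypothetical function that takes its data as a keyword, not first.
    return [scale * v for v in data]


result = pipe([1, 2, 3, 4, 5], total, scale=2)
keyword_result = pipe([1, 2, 3], (scaled, "data"), 10)
print(result, keyword_result)
```

Under this reading the function does not need to be wrapped in anything. It receives the collection itself, so with a dask dataframe it would receive the lazy object, and calling `.compute()` on the result would stay the caller's job.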
I would literally copy the implementation of
Oftentimes, I end up writing a function that takes in a dask.dataframe. pandas implements DataFrame.pipe(func), which I find pretty convenient.
This is pretty easy to implement, but I think pipe may be pretty confusing in the dask.dataframe world, especially if someone tries to do columnar reductions. Thoughts on whether it makes sense to implement?