New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure that df
is immutable after from_pandas
#10383
Conversation
@@ -5592,7 +5592,7 @@ def test_attrs_series_in_dataframes(): | |||
# the attrs dict for series in a dataframe. Dask uses df.iloc[:0] | |||
# when creating the _meta dataframe in make_meta_pandas(x, index=None). | |||
# Should start xpassing when df.iloc works. Remove the xfail then. | |||
assert df.A.attrs == ddf.A.attrs | |||
assert {"unit": "KG"} == ddf.A.attrs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is interesting:
df2 = df.copy()
removes the attrs from df
. Almost certainly a pandas bug
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out this is not supposed to work and the fact that this worked at all was an accident (it's related to caching in pandas, we should definitely not rely on this especially since this will change with CoW).
# the attrs dict for series in a dataframe. Dask uses df.iloc[:0] | ||
# when creating the _meta dataframe in make_meta_pandas(x, index=None). | ||
# Should start xpassing when df.iloc works. Remove the xfail then. | ||
assert df.A.attrs == ddf.A.attrs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to double-check: We don't support attrs (anymore)? Should we document this somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, but doing
df.A.attrs = ...
shouldn't work. df.A
is a temporary object in a sense, which means that it's metadata should not propagate to df
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more context: This works because pandas uses caching under the hood so that
df.A
df.A
returns the same object. This is an implementation detail that we should not rely on (it will also change when Copy-on-Write becomes the default).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @phofl!
pre-commit run --all-files
We had a similar problem in dask-expr