DataFrame._data deprecation in pandas #10081

Merged
3 commits merged into dask:main on Mar 17, 2023

@j-bennet (Contributor) commented Mar 17, 2023

Fixes a test failure in the upstream build:

FAILED dask/dataframe/tseries/tests/test_resample.py::test_series_resample[frame-ohlc-5-h-left-right] - FutureWarning: DataFrame._data is deprecated and will be removed in a future version. Use public APIs instead.

Xref pandas-dev/pandas#52003.

  • Tests added / passed
  • Passes pre-commit run --all-files

@jrbourbeau (Member) left a comment

Thanks @j-bennet! I pushed a small commit to just use ._mgr unconditionally since that appears to exist for all supported pandas versions (including dev pandas).

@phofl the deprecation warning says to use public APIs instead of ._data. ._mgr works, but it is still private. Is there a public API you'd recommend? Alternatively, is there a different approach we could take in this function? It is meant to produce a unique, deterministic hash for a given pd.DataFrame.

@jrbourbeau changed the title from "NDFrame._data deprecation in pandas" to "DataFrame._data deprecation in pandas" on Mar 17, 2023
@phofl (Collaborator) commented Mar 17, 2023

_mgr should be fine for now.

You could iterate over all columns of the DataFrame to collect the arrays, but this is obviously slower than accessing them directly, so it's not a good idea if performance matters here. I guess hash_pandas_object is not what you are looking for?
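The column-iteration alternative might look roughly like the following. This is a minimal sketch under stated assumptions, not dask's actual tokenization code: `tokenize_frame_by_columns` is a hypothetical helper, SHA-1 is an arbitrary digest choice, and only public APIs (`DataFrame.iloc`, `Series.to_numpy`) are used. The per-column Python loop, plus whatever conversion `to_numpy` performs, is the extra cost compared with reading the arrays straight from `_mgr`.

```python
import hashlib

import numpy as np
import pandas as pd


def tokenize_frame_by_columns(df: pd.DataFrame) -> str:
    """Hypothetical helper: deterministic hex digest of a DataFrame,
    built only from public pandas APIs, one column at a time."""
    h = hashlib.sha1()
    # Column labels and the index are part of the frame's identity.
    h.update(repr(df.columns.tolist()).encode())
    h.update(repr(df.index.tolist()).encode())
    # Iterate by position so duplicate column labels are handled.
    for i in range(df.shape[1]):
        col = df.iloc[:, i]
        h.update(str(col.dtype).encode())
        values = col.to_numpy()
        if values.dtype == object:
            # Object arrays hold pointers to Python objects; hash their
            # reprs instead (assumes element reprs are deterministic).
            h.update(repr(values.tolist()).encode())
        else:
            # Fixed-width dtypes (numeric, bool, datetime64): hash raw bytes.
            h.update(np.ascontiguousarray(values).tobytes())
    return h.hexdigest()


df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
print(tokenize_frame_by_columns(df))
```

Hashing the dtype string alongside the bytes keeps, for example, an int64 column and a float64 column with the same numbers from colliding on dtype alone.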

@jrbourbeau (Member) commented

> _mgr should be fine for now.

Sounds good 👍

> I guess hash_pandas_object is not what you are looking for?

Ah, good point. If there were a utility in pandas we could offload this to, without serious performance regressions, we probably would use it. I'll open a separate issue for this.
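Offloading the hashing to pandas could be sketched as below. It assumes a hypothetical helper name, `tokenize_frame_via_pandas`, and an arbitrary SHA-1 digest; it is not what dask does. `pandas.util.hash_pandas_object` returns one uint64 hash per row (mixing in the index when `index=True`), and the sketch folds those row hashes into a single digest. Column labels and dtypes are added explicitly rather than assuming the row hashes capture them. Whether this avoids serious performance regressions would still need measuring.

```python
import hashlib

import pandas as pd


def tokenize_frame_via_pandas(df: pd.DataFrame) -> str:
    """Hypothetical helper: let pandas hash the values, then fold the
    per-row hashes into one digest."""
    # One uint64 per row; index=True mixes the index into each row's hash.
    row_hashes = pd.util.hash_pandas_object(df, index=True)
    h = hashlib.sha1(row_hashes.to_numpy().tobytes())
    # Add labels and dtypes explicitly instead of relying on the row hashes.
    h.update(repr(df.columns.tolist()).encode())
    h.update(repr([str(dt) for dt in df.dtypes]).encode())
    return h.hexdigest()


df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
print(tokenize_frame_via_pandas(df))
```

The vectorized per-row hashing happens inside pandas, so the only Python-level work left is the final fold into one digest.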

@jrbourbeau (Member) commented

xref #10083

@jrbourbeau merged commit 37c7afb into dask:main on Mar 17, 2023
25 checks passed
@j-bennet deleted the j-bennet/pandas-mgr-deprecation branch on March 17, 2023 at 15:56
@j-bennet mentioned this pull request on Mar 20, 2023

3 participants