Dask Dataframe index renaming doesn't work in place #8082
Comments
Thanks for reporting @marcelned! I'm able to reproduce. My guess is that since we regenerate Lines 480 to 489 in ab69fe1
There's some underlying state that |
I used the following script to help find the offending/missing logic:
# test_script.py
import pandas as pd
import dask
import dask.dataframe as dd

df = pd.DataFrame(
    {
        "Animal": ["Falcon", "Falcon", "Parrot", "Parrot"],
        "Max Speed": [380.0, 370.0, 24.0, 26.0],
    }
)
ddf = dd.from_pandas(df, npartitions=2)

if __name__ == "__main__":
    # dask dataframe index inplace rename test
    ddf.index.rename("foo", inplace=True)
    try:
        assert ddf.index.name == "foo"
    except AssertionError:
        print(f"{ddf.index.name} != foo")
    # dask series inplace rename test
    ddf["Animal"].rename("Bird", inplace=True)
    try:
        assert ddf["Animal"].name == "Bird"
    except AssertionError:
        print(f'{ddf["Animal"].name} != Bird')
Output:
❯ python test_script.py
None != foo
Animal != Bird
So the problem isn't limited to the index; it affects renaming of Dask Series in general. Tracing the stack reveals that the name is indeed changed, but the newly renamed Dask Series object is then discarded: Line 3157 in ab69fe1
I'm not knowledgeable enough about the inner workings of core Dask to know how to fix this, but one could argue that removing this parameter is a possible solution, unless users would get some performance gain from an inplace renaming mechanism.
Whoa, I had no idea that a series could have a different name than its column name. For anyone else reading this, this is what pandas does:
import pandas as pd

df = pd.DataFrame(
    {
        "Animal": ["Falcon", "Falcon", "Parrot", "Parrot"],
        "Max Speed": [380.0, 370.0, 24.0, 26.0],
    }
)
df["Animal"].rename("Bird", inplace=True)
print(df["Animal"])
# 0 Falcon
# 1 Falcon
# 2 Parrot
# 3 Parrot
# Name: Bird, dtype: object
I think you are right to bring up the option of removing the
Incidentally, the inplace rename does work for a series in Dask. The issue you are running into is that when you access the series by getting it off the dataframe like Line 3993 in e1974bf
I still like your idea of not allowing inplace as an option for rename; I'm just trying to give more context about what's going on.
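For anyone following along, here is a toy model of the behavior described above. It is only a sketch under one assumption: that getting a column off the dataframe builds a fresh Series object on every access. `ToySeries` and `ToyFrame` are hypothetical names invented for illustration; they are not Dask classes and this is not Dask's implementation.

```python
class ToySeries:
    """Hypothetical stand-in for a lazy Series; not Dask's real class."""

    def __init__(self, name):
        self.name = name

    def rename(self, new_name, inplace=False):
        if inplace:
            # Mutates this object only; nothing else learns about the change.
            self.name = new_name
            return self
        return ToySeries(new_name)


class ToyFrame:
    """Hypothetical stand-in for a lazy DataFrame; not Dask's real class."""

    def __init__(self, columns):
        self.columns = list(columns)

    def __getitem__(self, key):
        if key not in self.columns:
            raise KeyError(key)
        # Assumption: a brand-new series is built on every access (no cache).
        return ToySeries(key)


frame = ToyFrame(["Animal", "Max Speed"])

# Renaming a temporary: the renamed object is dropped right away.
frame["Animal"].rename("Bird", inplace=True)
lost_name = frame["Animal"].name

# Renaming a held reference: the change is visible on that reference.
animal = frame["Animal"]
animal.rename("Bird", inplace=True)
kept_name = animal.name

print(lost_name, kept_name)  # prints "Animal Bird"
```

In this model the inplace rename itself works, but only on the object you keep a reference to, which is why holding on to the series (or assigning the result of a non-inplace rename) behaves as expected.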
* Deprecate 'inplace' argument for dask series renaming

  See issue #8082

* Remove inplace dataframe renaming equality test
* Formatted commits with black
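As a rough illustration of what deprecating such an argument can look like, here is a minimal sketch using the standard `warnings` module. It is not the actual patch from the linked pull request: the `ToySeries` class, the warning category (`FutureWarning`), and the message text are all assumptions made for this example.

```python
import warnings


class ToySeries:
    """Hypothetical stand-in for a Series; not Dask's real class."""

    def __init__(self, name):
        self.name = name

    def rename(self, new_name, inplace=False):
        if inplace:
            # Assumed category and wording; the real patch may differ.
            warnings.warn(
                "'inplace' is deprecated for rename; assign the result instead",
                FutureWarning,
                stacklevel=2,
            )
            self.name = new_name
            return self
        return ToySeries(new_name)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Old style: still works in this sketch, but emits a warning.
    ToySeries("Animal").rename("Bird", inplace=True)
    # Preferred style: keep the returned object; no warning is emitted.
    renamed = ToySeries("Animal").rename("Bird")

print(len(caught), renamed.name)  # prints "1 Bird"
```

Warning first and removing later gives existing callers time to switch to the assignment form before the argument disappears.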
Closed by #8136
Inplace parameter doesn't behave as expected: