BUG: Writes to DataFrame.attrs are not preserved #7401

Open
@noloerino

Modin version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest released version of Modin.

  • I have confirmed this bug exists on the main branch of Modin. (In order to do this you can follow this guide.)

Reproducible Example

import modin.pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})  # any frame reproduces the issue
df.attrs["x"] = 1
df.attrs  # attrs dict is still empty

Issue Description

DataFrame.attrs lets users attach metadata to a frame; pandas deep-copies that metadata to the new DataFrames produced when operations are performed. In Modin, attrs defaults to pandas, which means that a write to it is not reflected in the original frame, much less propagated to the results of other operations.

When a write to attrs is attempted, it only modifies the attrs field of the native pandas.DataFrame that's produced within DataFrame._default_to_pandas, and the modin.pandas.DataFrame has no knowledge of this operation.
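
The failure pattern can be illustrated with a toy model that uses neither Modin nor pandas. `NativeFrame` and `DistributedFrame` are hypothetical stand-ins for `pandas.DataFrame` and `modin.pandas.DataFrame`, and `_to_native` plays the role of `_default_to_pandas`. This is only a sketch of the pattern, not Modin's actual code:

```python
class NativeFrame:
    """Hypothetical stand-in for pandas.DataFrame: owns a mutable attrs dict."""

    def __init__(self, data):
        self.data = list(data)
        self.attrs = {}


class DistributedFrame:
    """Hypothetical stand-in for modin.pandas.DataFrame."""

    def __init__(self, data):
        self._data = list(data)

    def _to_native(self):
        # Plays the role of _default_to_pandas: builds a fresh native
        # object on every call.
        return NativeFrame(self._data)

    @property
    def attrs(self):
        # The dict returned here belongs to a temporary native frame.
        return self._to_native().attrs


df = DistributedFrame([1, 2, 3])
df.attrs["x"] = 1   # mutates the temporary's dict, which is then discarded
print(df.attrs)     # {}
```

Each read of `attrs` builds a new native object, so the write lands on a dict that nothing else references.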

Expected Behavior

Writes to attrs are reflected in subsequent read operations, and propagated across operations.
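
One possible shape for this behavior, sketched with a toy class rather than Modin's real internals: the frame keeps its own attrs dict and deep-copies it into the result of each operation. `Frame` and its `head` method are hypothetical, and nothing in this issue states that Modin should implement the fix this way:

```python
import copy


class Frame:
    """Hypothetical frame wrapper that stores attrs on itself."""

    def __init__(self, data, attrs=None):
        self._data = list(data)
        self._attrs = {} if attrs is None else attrs

    @property
    def attrs(self):
        # Return the wrapper's own dict so in-place writes persist.
        return self._attrs

    def head(self, n=2):
        # Deep-copy attrs into the result so the two frames do not share
        # mutable metadata.
        return Frame(self._data[:n], attrs=copy.deepcopy(self._attrs))


df = Frame([1, 2, 3])
df.attrs["x"] = [1]
out = df.head()
out.attrs["x"].append(2)    # does not affect df
print(df.attrs, out.attrs)  # {'x': [1]} {'x': [1, 2]}
```

The deep copy matters because attrs values can be mutable; a shallow copy would let a change on the result leak back into the source frame.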

Installed Versions

INSTALLED VERSIONS

commit : 1c4d173
python : 3.10.13.final.0
python-bits : 64
OS : Darwin
OS-release : 23.6.0
Version : Darwin Kernel Version 23.6.0: Mon Jul 29 21:13:04 PDT 2024; root:xnu-10063.141.2~1/RELEASE_ARM64_T6020
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

Modin dependencies

modin : 0.32.0+6.g1c4d173d
ray : 2.34.0
dask : 2024.8.1
distributed : 2024.8.1

pandas dependencies

pandas : 2.2.2
numpy : 1.26.4
pytz : 2023.3.post1
dateutil : 2.8.2
setuptools : 68.0.0
pip : 23.3
Cython : None
pytest : 8.3.2
hypothesis : None
sphinx : 5.3.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 5.3.0
html5lib : None
pymysql : None
psycopg2 : 2.9.9
jinja2 : 3.1.4
IPython : 8.17.2
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.2
bottleneck : None
dataframe-api-compat : None
fastparquet : 2024.5.0
fsspec : 2024.6.1
gcsfs : None
matplotlib : 3.9.2
numba : None
numexpr : 2.10.1
odfpy : None
openpyxl : 3.1.5
pandas_gbq : 0.23.1
pyarrow : 17.0.0
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : 2024.6.1
scipy : 1.14.1
sqlalchemy : 2.0.32
tables : 3.10.1
tabulate : None
xarray : 2024.7.0
xlrd : 2.0.1
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None

Metadata

Assignees

No one assigned

    Labels

    P2 (Minor bugs or low-priority feature requests), bug 🦗 (Something isn't working), pandas concordance 🐼 (Functionality that does not match pandas)
