BUG: Preserve column names in DataFrame.from_records when nrows=0 #61143

Open
kpvenkat47 wants to merge 4 commits into base: main

Conversation

@kpvenkat47 kpvenkat47 commented Mar 18, 2025

Description

Updates DataFrame.from_records in pandas/core/frame.py so that the returned empty DataFrame keeps its column names when nrows == 0. The early return changes from return cls() to return cls(columns=columns).

Closes #61140

Changes Made

  • Modified the if nrows == 0 branch in pandas/core/frame.py.
  • Added a test in pandas/tests/frame/test_constructors.py.

Testing

  • Added test case for empty DataFrame column retention.
  • Verified locally with pytest.
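
A minimal sketch of the change described above, not the actual pandas source. MiniFrame is a hypothetical stand-in for DataFrame, and its from_records models only the nrows == 0 early return for iterator input. The keep_columns switch is purely illustrative and is not a pandas parameter. It shows why returning cls() drops the column names while cls(columns=columns) keeps them.

```python
from collections.abc import Iterator


class MiniFrame:
    """Hypothetical stand-in for DataFrame; holds only column names and rows."""

    def __init__(self, rows=None, columns=None):
        self.rows = list(rows) if rows is not None else []
        self.columns = list(columns) if columns is not None else []

    @classmethod
    def from_records(cls, data, columns=None, nrows=None, keep_columns=True):
        # keep_columns is an illustrative switch (not a pandas parameter):
        # False models the old early return, True models the patched one.
        if isinstance(data, Iterator) and nrows == 0:
            return cls(columns=columns) if keep_columns else cls()
        if isinstance(data, Iterator) and nrows is not None:
            # Consume at most nrows records from the iterator.
            data = [row for _, row in zip(range(nrows), data)]
        return cls(rows=data, columns=columns)


old = MiniFrame.from_records(
    iter([]), columns=["col_1", "Col_2"], nrows=0, keep_columns=False
)
new = MiniFrame.from_records(iter([]), columns=["col_1", "Col_2"], nrows=0)
print(old.columns)  # []
print(new.columns)  # ['col_1', 'Col_2']
```

In both cases the frame has zero rows; only the patched path carries the requested column names through.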

kpvenkat47 and others added 2 commits March 18, 2025 19:25
…if nrows == 0' to return Cls(columns=columns) in core/frame.py. - Added test to verify column preservation.
Member

@rhshadrach rhshadrach left a comment

Thanks for the PR! When making a PR, please follow the steps here, namely step 4:

https://pandas.pydata.org/pandas-docs/dev/development/contributing.html#making-a-pull-request

Comment on lines +2 to +6
def test_empty_df_preserve_col():
    rows = []
    df = pd.DataFrame.from_records(iter(rows), columns=['col_1', 'Col_2'], nrows=0)
    assert list(df.columns)==['col_1', 'Col_2']
    assert len(df) == 0

Can you follow the dev docs here: https://pandas.pydata.org/pandas-docs/dev/development/contributing_codebase.html#writing-tests

Namely, search the current tests for from_records and that should give a good indication of where to place this test.

This file needs to be removed.

@rhshadrach rhshadrach added the IO Data and Bug labels Mar 19, 2025
@kpvenkat47
Author

I have updated the PR with the latest changes based on feedback. Please review again.

a-holm commented Mar 31, 2025

Looks good! The change in pandas/core/frame.py appears to correctly address the issue of preserving column names when using DataFrame.from_records with nrows=0 and an empty iterator. The added test test_from_records_empty_iterator_with_preserve_columns in pandas/tests/frame/test_constructors.py also looks good and verifies the fix effectively.

One minor point: It looks like the test was also added as a new standalone file pandas/core/frame_test_constructors.py. This seems unintentional, duplicates the test already added in the correct test module, and places test code within the pandas/core directory. Or is it intentional?

Member

@rhshadrach rhshadrach left a comment

Looking good! This will also need an entry in the whatsnew for v3.0.0, in the I/O section.

@@ -2780,6 +2780,12 @@ def test_construction_nan_value_timedelta64_dtype(self):
        )
        tm.assert_frame_equal(result, expected)

    def test_from_records_empty_iterator_with_preserve_columns(self):

Can you start the test with a comment referencing the issue: # GH#61140

Comment on lines +2 to +6
def test_empty_df_preserve_col():
    rows = []
    df = pd.DataFrame.from_records(iter(rows), columns=['col_1', 'Col_2'], nrows=0)
    assert list(df.columns)==['col_1', 'Col_2']
    assert len(df) == 0

This file needs to be removed.

@@ -2780,6 +2780,12 @@ def test_construction_nan_value_timedelta64_dtype(self):
        )
        tm.assert_frame_equal(result, expected)

    def test_from_records_empty_iterator_with_preserve_columns(self):

Can you move this test to tests/frame/constructors/test_from_records.py?

    def test_from_records_empty_iterator_with_preserve_columns(self):

        rows = []
        df = pd.DataFrame.from_records(iter(rows), columns=["col_1", "Col_2"], nrows=0)

Can you name the result variable result instead of df?

Comment on lines +2787 to +2788
        assert list(df.columns) == ["col_1", "Col_2"]
        assert len(df) == 0

Instead of these two lines, can you check the entire result?

expected = DataFrame(...)
tm.assert_frame_equal(result, expected)

@rhshadrach rhshadrach added this to the 3.0 milestone Mar 31, 2025
@rhshadrach rhshadrach changed the title from "Preserve column names in empty Dataframe" to "BUG: Preserve column names in DataFrame.from_records when nrows=0" Mar 31, 2025
Labels
Bug, IO Data
Development

Successfully merging this pull request may close these issues.

BUG: DataFrame.from_records() ignores columns with iterator and nrows=0
3 participants