ARROW-10643: [Python] Pandas<->pyarrow roundtrip failing to recreate index for empty dataframe #12311

Closed
AlenkaF wants to merge 10 commits from the ARROW-10643 branch

AlenkaF commented Feb 1, 2022

This PR tries to fix the roundtrip of an empty pandas.DataFrame that has a RangeIndex (no columns, but a non-zero number of rows). It adds a check for empty columns combined with a pandas.RangeIndex in the from_arrays method called from from_pandas, and in that case creates an empty table from the schema and num_rows.

AlenkaF commented Feb 2, 2022

I have split the code; I hope it makes sense.
I also added the same logic for RecordBatch, since both Table and RecordBatch use dataframe_to_arrays in from_pandas.

jorisvandenbossche left a comment

Thanks for the updates!

Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
jorisvandenbossche left a comment

Thanks!

ursabot commented Feb 7, 2022

Benchmark runs are scheduled for baseline = 4144c17 and contender = bd35629. bd35629 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️0.68% ⬆️0.04%] test-mac-arm
[Finished ⬇️0.36% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.26% ⬆️0.26%] ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

AlenkaF deleted the ARROW-10643 branch on February 9, 2022, 07:24