ARROW-9096: [Python] Pandas roundtrip with dtype="object" underlying numeric column index #7822
python/pyarrow/pandas_compat.py
Outdated
@@ -1009,12 +1009,15 @@ def _is_generated_index_name(name):
    return re.match(pattern, name) is not None

# ARROW-9096: added integer and floating
nit: generally only have comment like this for TODOs. git blame/git log can track when things were added.
python/pyarrow/pandas_compat.py
Outdated
@@ -1080,8 +1083,10 @@ def _reconstruct_columns_from_metadata(columns, column_indexes):
    ]

    # Convert each level to the dtype provided in the metadata
    # ARROW-9096: need numpy_type to match cast against original DataFrame
nit: drop JIRA reference. maybe also the whole comment since this is used directly below?
makes sense - it's easy to see what's going on w/o this line so dropped it as suggested
+1
thanks @wesm @emkornfield for reviewing!!
…numeric column index

- [x] closes ARROW-9096
- [x] tests added & passed

This PR fixes the roundtrip conversion for Pandas DataFrames whose column index is numeric but has `dtype=object`, such as
```
df = pd.DataFrame([1], columns=pd.Index([1], dtype=object)) # underlying int
df = pd.DataFrame([1], columns=pd.Index([1.1], dtype=object)) # underlying float
df = pd.DataFrame([1], columns=pd.Index([datetime(2018, 1, 1)], dtype='object')) # underlying datetime
```
https://issues.apache.org/jira/browse/ARROW-3651 largely solved the datetime variant of this problem (such that the conversion ran correctly excepting that the dtype after roundtrip did not match). With the current fix a roundtrip of the problematic DataFrames from ARROW-3651 returns the exact original frame.

Closes apache#7822 from arw2019/ARROW-9096

Authored-by: arw2019 <andrew.r.wieteska@gmail.com>
Signed-off-by: Wes McKinney <wesm@apache.org>
This PR fixes the roundtrip conversion for Pandas DataFrames whose column index is numeric but has `dtype=object`, such as `pd.DataFrame([1], columns=pd.Index([1], dtype=object))`.

https://issues.apache.org/jira/browse/ARROW-3651 largely solved the datetime variant of this problem (the conversion ran correctly, except that the dtype after the roundtrip did not match). With the current fix, a roundtrip of the problematic DataFrames from ARROW-3651 returns the exact original frame.