ARROW-1997: [C++/Python] Ignore zero-copy-option in to_pandas when strings_to_categorical is True #1480
@Licht-T The code looks good, but I don't understand the initial problem, so I can't tell what the change is supposed to do. Can you explain it a bit more?
Thanks @xhochy, I've added an explanation to the PR description!
Now I get it 👍
ss << "Needed to copy " << data.num_chunks() << " chunks with "
   << indices_first.null_count() << " indices nulls, but zero_copy_only was True";
if (needs_copy_) {
  ss << "Zero-copy is not allowed, but zero_copy_only was True";
Is it possible to add a unit test that hits this code path, or is there a test already that does?
This error message is a little unclear; I will tweak it.
be58af6 to c1bc353
Thanks @wesm, added tests!
Change-Id: I30da415b6685795994ffe280cf531bf5110de9a7
+1. Will merge once the build is passing.
This closes ARROW-1997.
The problem:
When strings_to_categorical=True, the categorical index is newly created inside the to_pandas procedure. However, that index is then passed to Python by zero-copy, so Python ends up referencing an array that has already been deallocated.
https://github.com/Licht-T/arrow/blob/be58af6dd0333652abbe2333ee5968df3f2e371f/cpp/src/arrow/python/arrow_to_pandas.cc#L1040
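
The copy decision described above can be sketched as a small pure-Python model. This is only an illustration, not Arrow's implementation: the names `ZeroCopyError`, `dictionary_encode`, `write_categorical`, and `convert_to_categorical` are hypothetical, a Python list stands in for an Arrow buffer, and `-1` stands in for a null index. The one thing taken from the diff is the guard on a `needs_copy` flag and its error message.

```python
class ZeroCopyError(Exception):
    """Raised when a copy is required but zero_copy_only was requested."""


def dictionary_encode(strings):
    """Build (indices, categories) from a list of strings; None marks a null."""
    categories = []
    positions = {}
    indices = []
    for value in strings:
        if value is None:
            indices.append(-1)  # assumption: -1 stands in for a null index
            continue
        if value not in positions:
            positions[value] = len(categories)
            categories.append(value)
        indices.append(positions[value])
    return indices, categories


def write_categorical(indices, needs_copy, zero_copy_only):
    """Hand the indices to the caller, copying when the source is short-lived."""
    if needs_copy:
        if zero_copy_only:
            raise ZeroCopyError(
                "Zero-copy is not allowed, but zero_copy_only was True")
        return list(indices)  # an owned copy that outlives the source
    return indices  # same object handed out: the stand-in for zero-copy


def convert_to_categorical(strings, zero_copy_only=False):
    """Model of the strings_to_categorical=True path."""
    indices, categories = dictionary_encode(strings)
    # The indices were created inside this call, so a borrowed view would
    # dangle once the call returns. Force a copy and ignore zero_copy_only.
    return write_categorical(indices, needs_copy=True,
                             zero_copy_only=False), categories
```

In this model, `convert_to_categorical(["a", "b", None, "a"], zero_copy_only=True)` succeeds because the option is ignored on the categorical path, while calling `write_categorical` directly with both `needs_copy=True` and `zero_copy_only=True` raises the error from the diff.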