GH-34884: [Python]: Support pickling pyarrow.dataset Partitioning subclasses #36462
Thanks for adding this!
cdef cppclass CSegmentEncoding" arrow::dataset::SegmentEncoding":
    pass
    bint operator==(CSegmentEncoding)
Just for my own curiosity, is `bint` different than `c_bool`?
`c_bool` comes from `from libcpp cimport bool as c_bool`, while `bint` is a built-in Cython type (for C boolean values, i.e. int with 0/non-0 values for False/True). Not fully sure in which case which should be used (or if it matters). I think generally where we expect to pass a Python boolean (e.g. in signatures) we use `bint`, while `c_bool` is mostly used when interfacing with the C++ declared methods. Although my expectation is that Cython will cast the one to the other if needed.
In this case I copied it from another place in the pxd file that was using the same pattern.
Co-authored-by: Weston Pace <weston.pace@gmail.com>
…ory objects (#36550)

### Rationale for this change

#36462 already added support for pickling Partitioning objects, but not yet the PartitioningFactory objects.

The problem for PartitioningFactory is that we currently don't really expose the full class hierarchy in Python, just the base class PartitioningFactory. We also don't expose creating those factory objects, except through the `discover` methods of the Partitioning classes. I think it would be nice to keep this minimal binding, but that means if we want to make them serializable with pickle, we need another way to do that (and if we don't want to add custom code for serialization on the C++ side).

In this PR, I went for the route of essentially storing the constructor (the `discover` static method) and the arguments that were passed to the constructor, on the factory object, so we can use this info for pickling. Not the nicest code, but the simplest solution I could think of.

### Are these changes tested?

Yes

* Closes: #34884

Authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
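The "store the constructor and its arguments" idea described in that commit message can be sketched in plain Python. This is only an illustration of the pattern, not the code from the PR: `SketchFactory`, `SketchDirectoryPartitioning`, and the `_constructor` / `_options` attribute names are hypothetical stand-ins, and the real bindings are written in Cython around C++ objects.

```python
import pickle


class SketchFactory:
    """Hypothetical factory that remembers how it was constructed."""

    def __init__(self, constructor, options):
        self._constructor = constructor  # e.g. a `discover` static method
        self._options = options          # positional arguments passed to it

    def __reduce__(self):
        # On load, pickle calls self._constructor(*self._options).
        return self._constructor, self._options


class SketchDirectoryPartitioning:
    """Hypothetical stand-in for a Partitioning class with `discover`."""

    @staticmethod
    def discover(field_names, infer_dictionary=False):
        options = (tuple(field_names), infer_dictionary)
        return SketchFactory(SketchDirectoryPartitioning.discover, options)


factory = SketchDirectoryPartitioning.discover(["year", "month"], True)
restored = pickle.loads(pickle.dumps(factory))
print(restored._options)
```

The factory itself never needs to know its own internal state for pickling; it only replays the call that created it, which is why no serialization code is needed on the wrapped side.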
…ring with other type (#36661)

### Rationale for this change

Ensure that `part == other` doesn't crash when `other` is not a Partitioning instance.

Small follow-up on #36462

* Closes: #36659

Authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
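A type guard of the kind this follow-up describes might look like the sketch below. `SketchPartitioning` and its `field_names` attribute are hypothetical, and returning `NotImplemented` is one common Python convention for this case; the actual pyarrow method may handle the mismatch differently (for instance by returning `False` directly).

```python
class SketchPartitioning:
    """Hypothetical stand-in for a Partitioning class."""

    def __init__(self, field_names):
        self.field_names = tuple(field_names)

    def __eq__(self, other):
        # Guard first: only compare internals against the same kind of object.
        if not isinstance(other, SketchPartitioning):
            return NotImplemented
        return self.field_names == other.field_names


part = SketchPartitioning(["year", "month"])
same = SketchPartitioning(["year", "month"])
print(part == same)
print(part == "year")
```

When both operands return `NotImplemented`, Python falls back to an identity comparison, so comparing against an unrelated type evaluates to `False` instead of raising.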
Conbench analyzed the 6 benchmark runs on this commit. There were 64 benchmark results indicating a performance regression. The full Conbench report has more details.
Rationale for this change
Add support for pickling Directory/Hive/FilenamePartitioning objects.
This PR does not yet fully fix issue #34884, because it only addresses the Partitioning subclasses and not the PartitioningFactory subclasses.
Are these changes tested?
Yes
Are there any user-facing changes?
Only new support for pickling and the `==` operation.