[Python] Introspect partition keys and values in fragments. #33825
Comments
There is
That's an interesting broader use case. Can you add a new GH issue for the broader ask of a simpler way to do aggregations on partitions?
Yes, we should probably do that (we did the same for …). Also, @coady, feel free to use the current "private" method already. It is private because it was not thought to be really user-facing, but we know it is used (e.g. dask uses it as well), so we promise some stability for it.
Something related that might be worth mentioning (not that it solves your exact use case here, though): there is also a
…blicly (get key/value from partition expression)
Opened #33862 to make it public.
… (get key/value from partition expression) (#33862)

#### Rationale for this change

We have an existing "semi-private" `pyarrow.dataset._get_partition_keys` function (to get the partitioning field's key/value from the partition expression of a certain fragment). This is used by external projects (eg dask), and generally useful for advanced users, so let's just make it public.

* Closes: #33825

Authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Describe the enhancement requested

It's not possible to programmatically determine the values of partition keys in a fragment. Fragments have a `partition_expression` attribute, but the `Expression` type doesn't allow any further introspection. I don't want to have to parse the string representation of the expression.

My broader use case is more performant (speed and memory) aggregation of partitioned data. Using `pc._group_by` requires loaded arrays, so it ignores that the data is already partitioned. And iterating `get_fragments` is crippled if one can't identify the fragment.

Component(s)

Python