[Python] pyarrow.concat_tables does not actually enable unify_schemas; it only sets the unify_schemas options without enabling it #38809
Comments
Looking at the last PR that implemented type promotion for concatenating pyarrow/Arrow Tables (see #36845), there is a structure defining which types can be promoted and how. In Python, promote options are passed in arrow/python/pyarrow/table.pxi (lines 5251 to 5253 in eb5de18),
but note that unify_schemas=True in pa.concat_tables([t1, t2], unify_schemas=True, promote_options="permissive") doesn't have any effect.
I am not sure, but at first look it seems to me that the struct type is not handled. I suggest looking at the Python tests in the linked PR to see what kind of type promotion is possible.
@AlenkaF Thanks! I get it. So options.unify_schemas is automatically set to true if promote_options is set.
I'm also hitting this issue; I might provide a fix when I get the time.
Hi @Fokko,
any update on this? I guess the key is to implement MergeStructTypes in type.cc. Some thinking about this feature:
Describe the bug, including details regarding any error messages, version, and platform.
Describe the enhancement requested
In R there is
concat_tables(..., unify_schemas = TRUE)
(https://arrow.apache.org/docs/r/reference/concat_tables.html)
In Python this function does not allow concatenating tables with different schemas.
I tried promote_options="permissive", which does not seem to do the trick.
It raises:
ArrowTypeError: struct fields don't match or are in the wrong order: Input fields: struct<k1: null, k3: null> output fields: struct<k1: int64, k2: int64, k3: string>
I expected this to just work.
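To make the expected result concrete, here is a rough pure-Python sketch of merging the two struct types from the error message. It is only an illustration under simplifying assumptions, not Arrow's implementation: a struct is modeled as a plain dict of field name to type name, and `merge_struct_fields` is a hypothetical helper, not a pyarrow function.

```python
# Hypothetical model: a struct type is a dict mapping field name -> type name.
def merge_struct_fields(left, right):
    """Merge two struct field mappings by name, promoting "null" fields."""
    merged = dict(left)  # keeps the field order of `left`
    for name, dtype in right.items():
        current = merged.get(name)
        if current is None or current == "null":
            # Field is new, or only known as null so far: take the other type.
            merged[name] = dtype
        elif dtype != "null" and dtype != current:
            # Two different concrete types: this sketch does not widen them.
            raise TypeError(f"cannot merge field {name!r}: {current} vs {dtype}")
    return merged


# Field lists taken from the error message above.
input_fields = {"k1": "null", "k3": "null"}
output_fields = {"k1": "int64", "k2": "int64", "k3": "string"}

merged = merge_struct_fields(output_fields, input_fields)
print(merged)
```

Matching fields by name rather than by position is the design choice that would avoid the "wrong order" part of the error; whether Arrow should behave this way is a separate question.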
We can see the bool field is already defined in there, but there is no way to set its value in the actual function:
arrow/python/pyarrow/table.pxi
Line 5169 in f98a132
In the code you only set field_merge_options, which is an option of unify_schema().
I think you should either initialize unify_schemas to true, or actually handle the unify_schemas keyword. Otherwise, the field merge options are not in effect at all.
Component(s)
Parquet, Python