R documentation for the unify_schemas argument: If TRUE, the schemas of the tables will first be unified, with fields of the same name being merged, then each table will be promoted to the unified schema before being concatenated. Otherwise, all tables should have the same schema.
AlenkaF changed the title from "pyarrow.concat_tables did not really enable the unify_schemas, it only supply unify_schemas options but not actually enable it" to "[Python] pyarrow.concat_tables did not really enable the unify_schemas, it only supply unify_schemas options but not actually enable it" on Nov 27, 2023
Looking at the last PR that implemented type promotion for concatenating pyarrow/Arrow tables (see #36845), there is a defined structure of which types can be promoted and how.
@AlenkaF Thanks! I get it. So options.unify_schemas is automatically set to true if promote_options is set.
It is unfortunate, though, that the struct type is not handled here.
Describe the bug, including details regarding any error messages, version, and platform.
Describe the enhancement requested
In R there is
concat_tables(..., unify_schemas = TRUE)
(https://arrow.apache.org/docs/r/reference/concat_tables.html)
In Python, this function does not allow concatenating tables with different schemas.
I tried promote_options="permissive", but that does not seem to do the trick.
It fails with:
ArrowTypeError: struct fields don't match or are in the wrong order: Input fields: struct<k1: null, k3: null> output fields: struct<k1: int64, k2: int64, k3: string>
I expected this to just work.
We can see that the bool field is already defined there, but there is no way to set its value from the actual function:
arrow/python/pyarrow/table.pxi
Line 5169 in f98a132
In the code, you only set field_merge_options, which is an option of unify_schema().
I think you should either initialize unify_schemas to true, or actually handle a unify_schemas keyword. Otherwise, the field merge options have no effect at all.
Component(s)
Parquet, Python