Spark DataFrame write fails if the input DataFrame has columns in a different order than the Iceberg schema #741
Was this resolved in #745?
Hello, is this issue resolved? I am still getting this error in Iceberg 1.4.2 while trying to write in Iceberg format to ADLS using Spark Structured Streaming.
It was actually resolved earlier but I overlooked it. To ignore the column ordering, add the Spark write config "
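The comment above is cut off, but the option being referred to is presumably Iceberg's `check-ordering` Spark write option (present in recent Iceberg releases; verify the exact name and default against the docs for your version). A hedged sketch of how it would be applied; the table identifier is illustrative:

```java
// Assumption: the truncated config above is Iceberg's "check-ordering" write option,
// which, when false, matches input columns to table columns by name instead of position.
df.select("data", "id")                 // columns in a different order than the table schema
  .write()
  .format("iceberg")
  .option("check-ordering", "false")    // skip the positional ordering check
  .mode("append")
  .save("db.table");                    // hypothetical table identifier
```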
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
This issue has been closed because it has not received any activity in the 14 days since being marked as 'stale'.
In this test case, https://github.com/apache/incubator-iceberg/blob/6f28abfa62838d531be4faa93273965665af933d/spark/src/test/java/org/apache/iceberg/spark/source/TestPartitionValues.java
if I replace https://github.com/apache/incubator-iceberg/blob/6f28abfa62838d531be4faa93273965665af933d/spark/src/test/java/org/apache/iceberg/spark/source/TestPartitionValues.java#L135 with
df.select("data", "id").write()
the test case fails with the error below:
java.lang.IllegalArgumentException: Cannot write incompatible dataset to table with schema:
table {
  1: id: optional int
  2: data: optional string
}
Problems:
However, if I set checkOrdering to false here, https://github.com/apache/incubator-iceberg/blob/949c6a98ac80acec10568070772082c1178eb739/api/src/main/java/org/apache/iceberg/types/CheckCompatibility.java, the write goes through but the test then fails with corrupted data:
Result rows should match expected:<[{"id"=1,"data"="a"}, {"id"=2,"data"="b"}, {"id"=3,"data"="c"}, {"id"=4,"data"="null"}]> but was:<[{"id"=1,"data"=""}, {"id"=2,"data"=""}, {"id"=3,"data"=""}, {"id"=4,"data"="�"}]>
Expected: [{"id"=1,"data"="a"}, {"id"=2,"data"="b"}, {"id"=3,"data"="c"}, {"id"=4,"data"="null"}]
Actual:   [{"id"=1,"data"=""}, {"id"=2,"data"=""}, {"id"=3,"data"=""}, {"id"=4,"data"="�"}]
This is because the PartitionSpec accessors are built from the Iceberg table schema. If we update the code to build the accessors from the input schema instead, the reordered-column test case passes.
This is shown in the PR below:
#745
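The accessor mismatch can be illustrated without Spark or Iceberg: if accessors resolve field positions against the table schema (`id` first, `data` second) but the incoming rows are laid out in the input order (`data` first, `id` second), every lookup reads the wrong field. A minimal sketch in plain Java; the schema lists and accessor helpers are illustrative, not Iceberg's actual types:

```java
import java.util.List;

class AccessorDemo {
    // A row is an ordered list of values, laid out in the *input* schema order.
    static final List<String> INPUT_SCHEMA = List.of("data", "id");
    static final List<String> TABLE_SCHEMA = List.of("id", "data");

    // Accessor built from the table schema: resolves positions against the table layout.
    static Object tableAccessor(List<Object> row, String field) {
        return row.get(TABLE_SCHEMA.indexOf(field)); // wrong when rows use input order
    }

    // Accessor built from the input schema: resolves positions against the actual row layout.
    static Object inputAccessor(List<Object> row, String field) {
        return row.get(INPUT_SCHEMA.indexOf(field)); // correct for reordered input
    }

    public static void main(String[] args) {
        List<Object> row = List.of("a", 1); // laid out as ("data", "id")

        System.out.println(tableAccessor(row, "id"));   // reads the wrong column: "a"
        System.out.println(inputAccessor(row, "id"));   // reads the right column: 1
        System.out.println(inputAccessor(row, "data")); // reads the right column: "a"
    }
}
```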
We are trying to understand whether there is any specific reason to set checkOrdering to true by default without exposing it as a parameter, and to build the accessors in PartitionSpec from the table schema instead of the input schema.
If possible, we would like checkOrdering to be exposed as a configurable parameter so that it can be turned off and write jobs do not have to use the same column ordering as the Iceberg table.
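The request above, sketched in miniature: a compatibility check where ordering enforcement is a parameter rather than hard-coded. This is plain Java with schemas as lists of field names, not Iceberg's actual CheckCompatibility API:

```java
import java.util.List;

class CompatibilityCheckSketch {
    // Hypothetical simplified check: schemas are just ordered lists of field names.
    static boolean isCompatible(List<String> tableSchema, List<String> inputSchema,
                                boolean checkOrdering) {
        if (checkOrdering) {
            return tableSchema.equals(inputSchema);       // names AND positions must match
        }
        return tableSchema.size() == inputSchema.size()
            && tableSchema.containsAll(inputSchema);      // names must match, order ignored
    }

    public static void main(String[] args) {
        List<String> table = List.of("id", "data");
        List<String> reordered = List.of("data", "id");

        System.out.println(isCompatible(table, reordered, true));  // current behavior: rejected
        System.out.println(isCompatible(table, reordered, false)); // requested behavior: accepted
    }
}
```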