NullPointerException after deleting old partition column #10626
Comments
Based on the stack trace, it looks like the partitions table needs to evaluate all historical partition specs to build the partition value.
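To make the point above concrete, here is a toy Python model of the failure mode. It is not Iceberg's code: the `PartitionField` class, the helper functions, the field ids, and the `ts` column with a `day` transform are all hypothetical, chosen only to mirror the table described in this issue. It sketches how resolving every historical spec against the current schema can hit a missing source column once that column has been dropped.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class PartitionField:
    source_id: int  # id of the schema column this partition field reads from
    transform: str  # e.g. "identity" or "day"
    name: str


def find_column(schema: Dict[int, str], field_id: int) -> Optional[str]:
    """Return the column name for a field id, or None if it was dropped."""
    return schema.get(field_id)


def unified_partition_fields(
    schema: Dict[int, str], specs: List[List[PartitionField]]
) -> List[str]:
    """Resolve every field of every historical spec against the current schema."""
    resolved = []
    for spec in specs:
        for field in spec:
            column = find_column(schema, field.source_id)
            if column is None:
                # Code that skips this check would go on to dereference the
                # missing column, which is one way a null pointer style
                # failure can arise.
                raise LookupError(
                    f"partition field {field.name!r} refers to "
                    f"missing column id {field.source_id}"
                )
            resolved.append(f"{field.name}={field.transform}({column})")
    return resolved


# Hypothetical table: spec 0 is the old identity partition on `day`,
# spec 1 is the newer hidden partitioning on an assumed `ts` column.
specs = [
    [PartitionField(source_id=1, transform="identity", name="day")],
    [PartitionField(source_id=2, transform="day", name="ts_day")],
]
schema_before_drop = {1: "day", 2: "ts"}
schema_after_drop = {2: "ts"}  # `day` removed by DROP COLUMN

print(unified_partition_fields(schema_before_drop, specs))
try:
    unified_partition_fields(schema_after_drop, specs)
except LookupError as err:
    print("lookup failed:", err)
```

In this model the old spec still refers to column id 1 after the drop, so any code path that walks all specs fails even though the current spec is fine.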
Hi, same issue as in #10234.
@lurnagao-dahua thanks for linking that, I did not find it when searching. I see a few things possibly different here (though of course the underlying cause could be the same):
@mgmarino Can you check if this is still the case with 1.6? This is a known issue, and there has been some work around it.
@Fokko Just tried with Spark 3.5.3, Iceberg 1.6.1, still the same null pointer error following a DROP COLUMN. The only recovery was to go back to the previous metadata file (as before). I'm happy to try and dig into this a little more, just let me know what I can do.
Thanks! If you could share the stack trace of 1.6.1 that would be awesome, hope you still have it at hand. Let's see if we can get this fixed.
Sure, no problem. I can also provide metadata files (preferably via email or similar) if that would help:
@mgmarino Could you try to come up with a minimal example, like the test in iceberg/core/src/test/java/org/apache/iceberg/TestPartitionSpecInfo.java (lines 124 to 145 at commit bedc711)?
Apache Iceberg version
1.5.2 (latest release)
Query engine
Spark
Please describe the bug 🐞
We have an Iceberg table where we changed the partitioning, going from an identity partition to hidden partitioning.
The partition specs are defined in the metadata JSON file:
We did this evolution quite some time ago (unfortunately, I can't remember which version of Iceberg we were using when we changed the partitioning), and are now trying to clean up the table by removing the old day column. Running a DROP COLUMN in Spark (3.5.1, using Iceberg 1.5.2) succeeds, but a subsequent read on the table, or e.g. on the partitions metadata table, results in:
This fails in Spark, but writes/commits from Flink (1.18.1, also using Iceberg 1.5.2) also fail following this change. There the stack trace looks like:
We are using the AWS Glue Catalog to store information about the table. Here are the current table properties set:
The only way for us to recover was to force the table to point to the metadata file right before the change.
I can provide the two metadata files if that's helpful, but I would rather do that privately if possible.
This seems quite similar to #7386; the table was initially written using Iceberg 1.2.1.
Please let me know if I can provide any other information!