Core: Include all reachable snapshots with v1 format and REF snapshot mode #7621
Conversation
    return ancestorsOf(snapshotId, lookup, true);
  }

  public static Iterable<Snapshot> ancestorsOf(
This method only exists so that we have consistent error reporting in

    .hasMessage("Invalid table metadata: Cannot find current version");
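The `ancestorsOf` overload in the diff walks a snapshot's parent chain through a lookup. A minimal, hypothetical sketch of that traversal (simplified from Iceberg's real `SnapshotUtil`, which works with `Snapshot` objects and a lookup function rather than a plain map; all names here are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified sketch of ancestor traversal: follow parent ids
// from a starting snapshot until the chain ends or a parent is unknown.
class AncestorWalk {
  // lookup maps a snapshot id to its parent id; a null parent ends the chain
  static List<Long> ancestorsOf(long snapshotId, Map<Long, Long> lookup) {
    List<Long> ancestors = new ArrayList<>();
    Long current = snapshotId;
    while (current != null && lookup.containsKey(current)) {
      ancestors.add(current);
      current = lookup.get(current); // step to the parent snapshot
    }
    return ancestors;
  }
}
```

For example, with a chain 3 → 2 → 1, `ancestorsOf(3, lookup)` yields the ids newest-first.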
    // we don't have any other branches.
    if (this.formatVersion == 1
        && refs.containsKey(SnapshotRef.MAIN_BRANCH)
        && refs.size() == 1) {
I don't see a reason that we need to clear out newer snapshots for v1 tables. I think we were just trying to avoid the case where a newer sequence number causes a failure. It should be fine for a v1 table to have newer snapshots, since extra snapshots don't affect the current table state or branching. It's probably safer just to keep them all.
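Following the reviewer's point, the guard in the diff identifies the case where keeping every snapshot is safe. A hedged sketch of that decision (class and method names are hypothetical; the real logic lives inside Iceberg's `TableMetadata`):

```java
import java.util.Map;

// Hypothetical sketch: a v1 table whose only ref is the main branch can
// safely retain all snapshots, since extra snapshots newer than the current
// one do not affect the current table state or branching.
class RetentionSketch {
  static final String MAIN_BRANCH = "main";

  static boolean keepAllSnapshots(int formatVersion, Map<String, Long> refs) {
    return formatVersion == 1
        && refs.containsKey(MAIN_BRANCH)
        && refs.size() == 1;
  }
}
```

A v2 table, or a v1 table with extra branch/tag refs, falls outside this fast path and goes through ref-based retention instead.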
* Hive: Set commit state as Unknown before throwing CommitStateUnknownException (apache#7931) (apache#8029)
* Spark 3.4: WAP branch not propagated when using DELETE without WHERE (apache#7900) (apache#8028)
* Core: Include all reachable snapshots with v1 format and REF snapshot mode (apache#7621) (apache#8027)
* Spark 3.3: Backport 'WAP branch not propagated when using DELETE without WHERE' (apache#8033) (apache#8036)
* Flink: remove the creation of default database in FlinkCatalog open method (apache#7795) (apache#8039)
* Core: Handle optional fields (apache#8050) (apache#8064)
* Core: Handle allow optional fields

  We expect:
  - current-snapshot-id
  - properties
  - snapshots

  to be there, but they are actually optional.
* Use AssertJ
* Core: Abort file groups should be under same lock as committerService (apache#7933) (apache#8060)
* Spark 3.4: Fix rewrite_position_deletes for certain partition types (apache#8059)
* Spark 3.3: Fix rewrite_position_deletes for certain partition types (apache#8059) (apache#8069)
* Spark: Add actions for disaster recovery.
* Fix the compile error.
* Fix merge conflicts and formatting
* All tests are working and code integrated with Spark 3.3
* Fix union error and snapshots test
* Fix Spark broadcast error
* Add RewritePositionDeleteFilesSparkAction

---------

Co-authored-by: Eduard Tudenhoefner <etudenhoefner@gmail.com>
Co-authored-by: Fokko Driesprong <fokko@apache.org>
Co-authored-by: Xianyang Liu <liu-xianyang@hotmail.com>
Co-authored-by: Szehon Ho <szehon.apache@gmail.com>
Co-authored-by: Yufei Gu <yufei_gu@apple.com>
Co-authored-by: yufeigu <yufei@apache.org>
Co-authored-by: Laith Alzyoud <laith.alzyoud@revolut.com>
Co-authored-by: vaultah <4944562+vaultah@users.noreply.github.com>
I noticed that not all relevant snapshots are retained for a V1 table when using branches. I added a test that reproduces this issue and would fail with

This is because, for V1 tables, we removed non-relevant snapshots based only on timestamp, rather than on whether a ref could still reach them.
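To illustrate why timestamp-only filtering is lossy: a snapshot pointed to by a branch ref can be dropped even though it is still reachable. A hypothetical sketch of ref-reachability (names are illustrative, not Iceberg's API) that keeps every snapshot on any ref's ancestor chain:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: collect every snapshot id reachable from the heads of
// all refs by walking parent chains, instead of filtering by timestamp.
class Reachability {
  // parents maps snapshot id -> parent id; root snapshots are absent as keys
  static Set<Long> reachableFrom(Iterable<Long> refHeads, Map<Long, Long> parents) {
    Set<Long> reachable = new HashSet<>();
    Deque<Long> stack = new ArrayDeque<>();
    refHeads.forEach(stack::push);
    while (!stack.isEmpty()) {
      Long id = stack.pop();
      if (reachable.add(id) && parents.containsKey(id)) {
        stack.push(parents.get(id)); // also retain the ancestor chain
      }
    }
    return reachable;
  }
}
```

With two refs heading chains 2 → 1 and 4 → 3, all four snapshots are retained, even if some fall on the wrong side of a timestamp cutoff.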