Core: Include all reachable snapshots with v1 format and REF snapshot mode #7621
Conversation
    return ancestorsOf(snapshotId, lookup, true);
  }

  public static Iterable<Snapshot> ancestorsOf(
This method only exists so that we have consistent error reporting in

    .hasMessage("Invalid table metadata: Cannot find current version");
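The `ancestorsOf` overload in the diff walks a snapshot's parent chain through a lookup. A minimal, hypothetical sketch of that traversal (simplified from Iceberg's real `SnapshotUtil`, which works with `Snapshot` objects and a lookup function rather than a plain map; all names here are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified sketch of ancestor traversal: follow parent ids
// from a starting snapshot until the chain ends or a parent is unknown.
class AncestorWalk {
  // lookup maps a snapshot id to its parent id; a null parent ends the chain
  static List<Long> ancestorsOf(long snapshotId, Map<Long, Long> lookup) {
    List<Long> ancestors = new ArrayList<>();
    Long current = snapshotId;
    while (current != null && lookup.containsKey(current)) {
      ancestors.add(current);
      current = lookup.get(current); // step to the parent snapshot
    }
    return ancestors;
  }
}
```

For example, with a chain 3 → 2 → 1, `ancestorsOf(3, lookup)` yields the ids newest-first.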
    // we don't have any other branches.
    if (this.formatVersion == 1
        && refs.containsKey(SnapshotRef.MAIN_BRANCH)
        && refs.size() == 1) {
I don't see a reason that we need to clear out newer snapshots for v1 tables. I think we were just trying to avoid the case where a newer sequence number causes a failure. It should be fine for a v1 table to have newer snapshots, since extra snapshots don't affect the current table state or branching. It's probably safer just to keep them all.
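Following the reviewer's point, the guard in the diff identifies the case where keeping every snapshot is safe. A hedged sketch of that decision (class and method names are hypothetical; the real logic lives inside Iceberg's `TableMetadata`):

```java
import java.util.Map;

// Hypothetical sketch: a v1 table whose only ref is the main branch can
// safely retain all snapshots, since extra snapshots newer than the current
// one do not affect the current table state or branching.
class RetentionSketch {
  static final String MAIN_BRANCH = "main";

  static boolean keepAllSnapshots(int formatVersion, Map<String, Long> refs) {
    return formatVersion == 1
        && refs.containsKey(MAIN_BRANCH)
        && refs.size() == 1;
  }
}
```

A v2 table, or a v1 table with extra branch/tag refs, falls outside this fast path and goes through ref-based retention instead.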
* Hive: Set commit state as Unknown before throwing CommitStateUnknownException (apache#7931) (apache#8029)
* Spark 3.4: WAP branch not propagated when using DELETE without WHERE (apache#7900) (apache#8028)
* Core: Include all reachable snapshots with v1 format and REF snapshot mode (apache#7621) (apache#8027)
* Spark 3.3: Backport 'WAP branch not propagated when using DELETE without WHERE' (apache#8033) (apache#8036)
* Flink: remove the creation of default database in FlinkCatalog open method (apache#7795) (apache#8039)
* Core: Handle optional fields (apache#8050) (apache#8064)
* Core: Handle allow optional fields

  We expect:
  - current-snapshot-id
  - properties
  - snapshots

  to be there, but they are actually optional.
* Use AssertJ
* Core: Abort file groups should be under same lock as committerService (apache#7933) (apache#8060)
* Spark 3.4: Fix rewrite_position_deletes for certain partition types (apache#8059)
* Spark 3.3: Fix rewrite_position_deletes for certain partition types (apache#8059) (apache#8069)
* Spark: Add actions for disaster recovery.
* Fix the compile error.
* Fix merge conflicts and formatting
* All tests are working and code integrated with Spark 3.3
* Fix union error and snapshots test
* Fix Spark broadcast error
* Add RewritePositionDeleteFilesSparkAction

---------

Co-authored-by: Eduard Tudenhoefner <etudenhoefner@gmail.com>
Co-authored-by: Fokko Driesprong <fokko@apache.org>
Co-authored-by: Xianyang Liu <liu-xianyang@hotmail.com>
Co-authored-by: Szehon Ho <szehon.apache@gmail.com>
Co-authored-by: Yufei Gu <yufei_gu@apple.com>
Co-authored-by: yufeigu <yufei@apache.org>
Co-authored-by: Laith Alzyoud <laith.alzyoud@revolut.com>
Co-authored-by: vaultah <4944562+vaultah@users.noreply.github.com>
I noticed that not all relevant snapshots are retained for a V1 table when using branches. I added a test that reproduces this issue and would fail with

This is because, for V1 tables, we removed non-relevant snapshots based only on timestamp, rather than on whether a ref could still reach them.
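To illustrate why timestamp-only filtering is lossy: a snapshot pointed to by a branch ref can be dropped even though it is still reachable. A hypothetical sketch of ref-reachability (names are illustrative, not Iceberg's API) that keeps every snapshot on any ref's ancestor chain:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: collect every snapshot id reachable from the heads of
// all refs by walking parent chains, instead of filtering by timestamp.
class Reachability {
  // parents maps snapshot id -> parent id; root snapshots are absent as keys
  static Set<Long> reachableFrom(Iterable<Long> refHeads, Map<Long, Long> parents) {
    Set<Long> reachable = new HashSet<>();
    Deque<Long> stack = new ArrayDeque<>();
    refHeads.forEach(stack::push);
    while (!stack.isEmpty()) {
      Long id = stack.pop();
      if (reachable.add(id) && parents.containsKey(id)) {
        stack.push(parents.get(id)); // also retain the ancestor chain
      }
    }
    return reachable;
  }
}
```

With two refs heading chains 2 → 1 and 4 → 3, all four snapshots are retained, even if some fall on the wrong side of a timestamp cutoff.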