Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Core: Include all reachable snapshots with v1 format and REF snapshot mode #7621

Merged
merged 1 commit into from Jun 19, 2023

Conversation

nastra
Copy link
Contributor

@nastra nastra commented May 16, 2023

I noticed that not all relevant snapshots are being retained with a V1 table and when using branches. I added a test that reproduces this issue and would fail with

Snapshot for reference SnapshotRef{snapshotId=7532742534704478452, type=BRANCH, minSnapshotsToKeep=null, maxSnapshotAgeMs=null, maxRefAgeMs=null} does not exist in the existing snapshots list
java.lang.IllegalArgumentException: Snapshot for reference SnapshotRef{snapshotId=7532742534704478452, type=BRANCH, minSnapshotsToKeep=null, maxSnapshotAgeMs=null, maxRefAgeMs=null} does not exist in the existing snapshots list
	at org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkArgument(Preconditions.java:220)
	at org.apache.iceberg.TableMetadata.validateRefs(TableMetadata.java:835)
	at org.apache.iceberg.TableMetadata.ensureSnapshotsLoaded(TableMetadata.java:528)
	at org.apache.iceberg.TableMetadata.snapshots(TableMetadata.java:493)
	at org.apache.iceberg.BaseTable.snapshots(BaseTable.java:140)
	at org.apache.iceberg.rest.TestRESTCatalog.testTableSnapshotLoadingWithDivergedBranches(TestRESTCatalog.java:953)

This is because with V1 tables we removed non-relevant snapshots only based on timestamp.

@github-actions github-actions bot added the core label May 16, 2023
@nastra nastra requested a review from danielcweeks May 16, 2023 13:19
@nastra nastra force-pushed the ref-snapshot-loading-and-branches branch from cbcc2fa to 8bf71d5 Compare May 16, 2023 14:26
return ancestorsOf(snapshotId, lookup, true);
}

public static Iterable<Snapshot> ancestorsOf(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this method only exists so that we have consistent error reporting in

.hasMessage("Invalid table metadata: Cannot find current version");
for V1 and V2 tables (because we want to skip the validation of the starting snapshot id).

@nastra nastra force-pushed the ref-snapshot-loading-and-branches branch from 8bf71d5 to 3452888 Compare June 2, 2023 07:50
@nastra nastra requested a review from danielcweeks June 2, 2023 07:51
@nastra nastra force-pushed the ref-snapshot-loading-and-branches branch from 3452888 to 79ed04e Compare June 16, 2023 15:22
// we don't have any other branches.
if (this.formatVersion == 1
&& refs.containsKey(SnapshotRef.MAIN_BRANCH)
&& refs.size() == 1) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a reason that we need to clear out newer snapshots for v1 tables. I think we were just trying to avoid the case where a newer sequence number causes a failure. It should be fine for a v1 table to have newer snapshots, since extra snapshots don't affect the current table state or branching. It's probably safer just to keep them all.

@nastra nastra force-pushed the ref-snapshot-loading-and-branches branch from 79ed04e to a1b5c68 Compare June 17, 2023 04:53
@nastra nastra requested a review from rdblue June 17, 2023 04:53
@nastra nastra force-pushed the ref-snapshot-loading-and-branches branch from a1b5c68 to 7daa002 Compare June 17, 2023 05:19
@rdblue rdblue merged commit 6586788 into apache:master Jun 19, 2023
41 checks passed
@nastra nastra deleted the ref-snapshot-loading-and-branches branch June 19, 2023 15:49
@nastra nastra added this to the Iceberg 1.3.1 milestone Jul 7, 2023
nastra added a commit to nastra/iceberg that referenced this pull request Jul 10, 2023
@nastra nastra requested a review from szehon-ho July 10, 2023 06:53
laithalzyoud added a commit to revolut-engineering/iceberg that referenced this pull request Jan 30, 2024
* Hive: Set commit state as Unknown before throwing CommitStateUnknownException (apache#7931) (apache#8029)

* Spark 3.4: WAP branch not propagated when using DELETE without WHERE (apache#7900) (apache#8028)

* Core: Include all reachable snapshots with v1 format and REF snapshot mode (apache#7621) (apache#8027)

* Spark 3.3: Backport 'WAP branch not propagated when using DELETE without WHERE' (apache#8033) (apache#8036)

* Flink: remove the creation of default database in FlinkCatalog open method (apache#7795) (apache#8039)

* Core: Handle optional fields (apache#8050) (apache#8064)

* Core: Handle allow optional fields

We expect:

- current-snapshot-id
- properties
- snapshots

to be there, but they are actually optional.

* Use AssertJ

* Core: Abort file groups should be under same lock as committerService (apache#7933) (apache#8060)

* Spark 3.4: Fix rewrite_position_deletes for certain partition types (apache#8059)

* Spark 3.3: Fix rewrite_position_deletes for certain partition types (apache#8059) (apache#8069)

* Spark: Add actions for disater recovery.

* Fix the compile error.

* Fix merge conflicts and formatting

* All tests are working and code integrated with Spark 3.3

* Fix union error and snapshots test

* Fix Spark broadcast error

* Add RewritePositionDeleteFilesSparkAction

---------

Co-authored-by: Eduard Tudenhoefner <etudenhoefner@gmail.com>
Co-authored-by: Fokko Driesprong <fokko@apache.org>
Co-authored-by: Xianyang Liu <liu-xianyang@hotmail.com>
Co-authored-by: Szehon Ho <szehon.apache@gmail.com>
Co-authored-by: Yufei Gu <yufei_gu@apple.com>
Co-authored-by: yufeigu <yufei@apache.org>
Co-authored-by: Laith Alzyoud <laith.alzyoud@revolut.com>
Co-authored-by: vaultah <4944562+vaultah@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants