Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Core: Handle optional fields #8050

Merged
merged 2 commits into from Jul 14, 2023
Merged

Conversation

Fokko
Copy link
Contributor

@Fokko Fokko commented Jul 12, 2023

In the Java code, we expect:

  • current-snapshot-id
  • properties
  • snapshots

to be there, but they are actually optional. Related #8049

image

Java still writes -1 for the current-snapshot-id when there is no snapshot, but I think it is important to fix the readers first.

@github-actions github-actions bot added the core label Jul 12, 2023
@Fokko Fokko changed the title Core: Handle null current-snapshot-id Core: Handle optional fields Jul 12, 2023
We expect:

- current-snapshot-id
- properties
- snapshots

to be there, but they are actually optional.
@zinking
Copy link
Contributor

zinking commented Jul 13, 2023

LGTM

on the other side, there doesn't seem to be parsing exceptions from my experience, is that because the writer essentially always write out the optional field. in this sense, what is changed ? or it's just to stick more strictly with the spec?

Copy link
Contributor

@nastra nastra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with one nit

@Fokko
Copy link
Contributor Author

Fokko commented Jul 13, 2023

on the other side, there doesn't seem to be parsing exceptions from my experience, is that because the writer essentially always write out the optional field. in this sense, what is changed ? or it's just to stick more strictly with the spec?

@zinking Thanks for the review. Let me elaborate. It is sticking to the spec on the read side. In the spec, the current-snapshot-id, properties, and snapshots fields can be omitted, but the parser doesn't allow this.

The Java code still writes a -1 when writing a Metadata JSON:

generator.writeNumberField(
CURRENT_SNAPSHOT_ID,
metadata.currentSnapshot() != null ? metadata.currentSnapshot().snapshotId() : -1);

But I don't want to change that today, since other readers might expect a -1 instead of omitting the field (I know Athena does). I think aking sure that the read-side adheres to the spec is a good first step.

@szehon-ho szehon-ho added this to the Iceberg 1.3.1 milestone Jul 13, 2023
@Fokko Fokko merged commit 3d929d4 into apache:master Jul 14, 2023
41 checks passed
@Fokko Fokko deleted the fd-allow-reading-a-null branch July 14, 2023 07:26
@Fokko
Copy link
Contributor Author

Fokko commented Jul 14, 2023

Thanks @zinking @nastra for the review, and @szehon-ho for adding the milestone. Let me create the backport right away

Fokko added a commit to Fokko/iceberg that referenced this pull request Jul 14, 2023
* Core: Handle allow optional fields

We expect:

- current-snapshot-id
- properties
- snapshots

to be there, but they are actually optional.

* Use AssertJ
Fokko added a commit that referenced this pull request Jul 14, 2023
* Core: Handle allow optional fields

We expect:

- current-snapshot-id
- properties
- snapshots

to be there, but they are actually optional.

* Use AssertJ
nastra pushed a commit to nastra/iceberg that referenced this pull request Jul 18, 2023
* Core: Handle allow optional fields

We expect:

- current-snapshot-id
- properties
- snapshots

to be there, but they are actually optional.

* Use AssertJ
nastra pushed a commit to nastra/iceberg that referenced this pull request Aug 15, 2023
* Core: Handle allow optional fields

We expect:

- current-snapshot-id
- properties
- snapshots

to be there, but they are actually optional.

* Use AssertJ
laithalzyoud added a commit to revolut-engineering/iceberg that referenced this pull request Jan 30, 2024
* Hive: Set commit state as Unknown before throwing CommitStateUnknownException (apache#7931) (apache#8029)

* Spark 3.4: WAP branch not propagated when using DELETE without WHERE (apache#7900) (apache#8028)

* Core: Include all reachable snapshots with v1 format and REF snapshot mode (apache#7621) (apache#8027)

* Spark 3.3: Backport 'WAP branch not propagated when using DELETE without WHERE' (apache#8033) (apache#8036)

* Flink: remove the creation of default database in FlinkCatalog open method (apache#7795) (apache#8039)

* Core: Handle optional fields (apache#8050) (apache#8064)

* Core: Handle allow optional fields

We expect:

- current-snapshot-id
- properties
- snapshots

to be there, but they are actually optional.

* Use AssertJ

* Core: Abort file groups should be under same lock as committerService (apache#7933) (apache#8060)

* Spark 3.4: Fix rewrite_position_deletes for certain partition types (apache#8059)

* Spark 3.3: Fix rewrite_position_deletes for certain partition types (apache#8059) (apache#8069)

* Spark: Add actions for disater recovery.

* Fix the compile error.

* Fix merge conflicts and formatting

* All tests are working and code integrated with Spark 3.3

* Fix union error and snapshots test

* Fix Spark broadcast error

* Add RewritePositionDeleteFilesSparkAction

---------

Co-authored-by: Eduard Tudenhoefner <etudenhoefner@gmail.com>
Co-authored-by: Fokko Driesprong <fokko@apache.org>
Co-authored-by: Xianyang Liu <liu-xianyang@hotmail.com>
Co-authored-by: Szehon Ho <szehon.apache@gmail.com>
Co-authored-by: Yufei Gu <yufei_gu@apple.com>
Co-authored-by: yufeigu <yufei@apache.org>
Co-authored-by: Laith Alzyoud <laith.alzyoud@revolut.com>
Co-authored-by: vaultah <4944562+vaultah@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants