Correct schema behavior #247
Conversation
When we alter the schema, we want to use the latest schema by default, except when you select a specific snapshot that has a schema-id.
```python
snapshot_schema = self.table.schemas()[snapshot.schema_id]
current_schema = self.table.schema()
if self.snapshot_id is not None:
    snapshot = self.table.snapshot_by_id(self.snapshot_id)
```
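For illustration, a minimal sketch of the lookup order the description implies (the function name and the `scan` argument are made up for this example; the table and snapshot accessors are the ones visible in the diff above):

```python
def projected_schema(scan) -> "Schema":  # illustrative helper, not the PR code
    # Default: the table's current (latest) schema.
    schema = scan.table.schema()
    if scan.snapshot_id is not None:
        snapshot = scan.table.snapshot_by_id(scan.snapshot_id)
        if snapshot is not None and snapshot.schema_id is not None:
            # Time travel: use the schema that was current when this
            # snapshot was written.
            schema = scan.table.schemas()[snapshot.schema_id]
    return schema
```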
I think it would be an invalid state if `snapshot` is None but a `snapshot_id` is set, should we throw?
Maybe we could consider a `schema_for(snapshot_id)` API similar to https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/util/SnapshotUtil.java#L368.
I think there's a difference between the Java implementation and the Python implementation in the case where there is a schema ID on the snapshot but, for whatever reason, the schema with that ID cannot be found. In the `schemaFor` Java API implementation we throw, but here we fall back to the latest schema. I think we should probably throw rather than assume the latest in that case, because it implies there is some bad metadata, and it's safer to fail than to coerce to the latest schema. I think the latest schema should only be used when there is no schema ID on the snapshot, and in the original case where there is no `snapshot_id` set. What do you think?
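For illustration, a `schema_for`-style helper that fails on bad metadata instead of falling back might look roughly like this (the name, error messages, and `.get()` lookup are assumptions; only `snapshot_by_id`, `schemas()`, and `schema()` come from the code above):

```python
def schema_for(table, snapshot_id: int) -> "Schema":  # hypothetical API
    snapshot = table.snapshot_by_id(snapshot_id)
    if snapshot is None:
        raise ValueError(f"Cannot find snapshot with ID {snapshot_id}")
    if snapshot.schema_id is None:
        # Older metadata may not record a schema ID; use the current schema.
        return table.schema()
    schema = table.schemas().get(snapshot.schema_id)
    if schema is None:
        # Like Java's SnapshotUtil.schemaFor: missing schema metadata is an
        # error rather than a silent fallback to the latest schema.
        raise ValueError(f"Cannot find schema with ID {snapshot.schema_id}")
    return schema
```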
Great catch @amogh-jahagirdar. I'm not super strong on this one. Typically, I would not fail in these situations, but I agree that raising a warning might be appropriate here.
I know there are thoughts of pruning old schemas, which might lead to this situation, but I wouldn't expect this to happen regularly.
I've updated the code with a warning, let me know what you think!
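Roughly, the warning-based fallback could look like this sketch (illustrative only, not the exact diff):

```python
import warnings

def snapshot_schema(table, snapshot) -> "Schema":  # illustrative sketch
    if snapshot.schema_id is not None:
        schema = table.schemas().get(snapshot.schema_id)
        if schema is not None:
            return schema
        # The snapshot references a schema ID that is no longer in the
        # metadata (e.g. after pruning old schemas): warn and fall back.
        warnings.warn(
            f"Schema with ID {snapshot.schema_id} is missing from the table "
            "metadata, falling back to the current schema"
        )
    return table.schema()
```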
I think the warning makes sense for the missing schema ID case, but what about the case where the `snapshot_id` is set but cannot be found (if line 948 returns None)? I think the only option there would be to throw, because that means there was some established `snapshot_id` but we can't find it anymore.
Oof, that's a good one. I think we should check if the snapshot-id is valid earlier in the process. I've added a check now, but I'll follow up with another PR to make this more strict.
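As a sketch, the up-front check could be as simple as the following (the function name is illustrative):

```python
from typing import Optional

def validate_snapshot_id(table, snapshot_id: Optional[int]) -> None:  # illustrative
    # Fail fast on a dangling snapshot ID instead of discovering it mid-scan.
    if snapshot_id is not None and table.snapshot_by_id(snapshot_id) is None:
        raise ValueError(f"Snapshot not found: {snapshot_id}")
```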
Sweet, this looks great to me now, thanks @Fokko !
* Correct schema behavior: when we alter the schema, we want to use the latest schema by default, except when you select a specific snapshot that has a schema-id.
* Add a warning if the schema-id is missing from the metadata
* Catch non-existent snapshots