-
Notifications
You must be signed in to change notification settings - Fork 2.9k
Core: Fix the NPE while updating event in the context of eventual consistency. #2552
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| // consistency problems in refresh. | ||
| LOG.warn("Failed to load committed snapshot, leave sequence number to 0"); | ||
| } else { | ||
| sequenceNumber = justSaved.sequenceNumber(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the fix looks good but I'm not sure if we should default sequenceNumber to 0, as this may result in incorrect data for users that rely on this sequence number in the event. Since without this change we probably will not send out the event at all, if we don't want to wait until the data become eventual I wonder if returning -1 would be better since 0 may refer to an actual sequence number of the table (although in v1 all sequence number will be 0 so return 0 for v1 is better, but I'm not sure how easy it is to check table version here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Make sense to set it to -1 since its initial value is 0 and increase monotonically as this line shows
| public long nextSequenceNumber() { |
Posted a new commit to fix it.
|
Can you take a look? @rdblue @aokolnychyi @RussellSpitzer @karuppayya @kbendick |
|
I think this is a good fix to the NPE. But I do want to make sure we warn folks that an eventually consistent Catalog can break the history of an Iceberg table. This would only be acceptable for a catalog which guaranteed consistent reads during the lock phase. My thinking is basically |
Is this something that could come up with the DynamoDB based lock? I agree this is a good fix to the NPE, but I have personally had weird edge cases with other software that uses a DynamoDB table as a lock (such as terraform). Ideally testing handles this and the DynamoDB / Glue catalog doesn't suffer from these issues. Possibly I need to review the DynamoDB table lock again, but felt it might be worth noting as I've had weird issues with terraform table locks and consistency issues. |
|
User should avoid eventually consistent Catalog if possible. If Hive catalog is used, we normally expect HMS is strong consistency. It is NOT the case some times, for example, HMS could be an eventual consistency system if it uses Cassandra as the backend DB. But I like the call-out from @RussellSpitzer and @kbendick, avoid it as much as possible. However, we cannot remove eventual consistency related code if Iceberg still support Hadoop catalog, which based on an eventual consistency system, HDFS. |
| if (justSaved == null) { | ||
| // The snapshot just saved may not be present if the latest metadata couldn't be loaded due to eventual | ||
| // consistency problems in refresh. | ||
| LOG.warn("Failed to load committed snapshot, leave its sequence number to an invalid number(-1)"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a possible follow up, should we consider retrying this with a backoff (or does this already do that)? Or is that something we'd rather avoid getting into?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a retry in the main part of method commit(), not for clean-up and notifyListeners. I'm not sure if it's a good idea to introduce retry to them. I didn't see much downside, at most delay the commit, but feel like it is a bit over engineering. It is going to be the another PR anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for back-off. That is handled by the table operations when it tries to load the new metadata. All this needs to do is handle the case where the table metadata can't be loaded.
kbendick
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me.
Left a comment about possibly retrying the refresh for the snapshot with the given snapshotId, perhaps as a follow up. But I'm not sure of the implications of that and ideally users are using consistent catalogs / metastores.
| long snapshotId = snapshotId(); | ||
| long sequenceNumber = ops.refresh().snapshot(snapshotId).sequenceNumber(); | ||
| Snapshot justSaved = ops.refresh().snapshot(snapshotId); | ||
| long sequenceNumber = TableMetadata.INVALID_SEQUENCE_NUMBER; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I am not sure what should be the best way to react here. We do expect the catalog to be consistent so one option is to just log an error and not throw an event or throw some special even type.
cc @rdsr who implemented this part and @rdblue who uses the event-based system. What do you expect in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the situation where the snapshot would not be in metadata? If the snapshot wasn't committed, then I would expect to not send an event.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nevermind, I see the comment about not being able to load the metadata.
I'd probably prefer null, but -1 is fine so that we don't change a public API.
| Map<String, String> summary, | ||
| List<ManifestFile> dataManifests) { | ||
| this(io, INITIAL_SEQUENCE_NUMBER, snapshotId, parentId, timestampMillis, operation, summary, null); | ||
| this(io, TableMetadata.INITIAL_SEQUENCE_NUMBER, snapshotId, parentId, timestampMillis, operation, summary, null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change looks unrelated to me. Can you remove it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trying to piggy-back a change to consolidate INITIAL_SEQUENCE_NUMBER from multiple places. I can put it in another PR if it is preferred.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, please do. We try to avoid unnecessary changes so that we don't cause unnecessary commit conflicts for people maintaining branches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed it in a new commit.
| if (justSaved == null) { | ||
| // The snapshot just saved may not be present if the latest metadata couldn't be loaded due to eventual | ||
| // consistency problems in refresh. | ||
| LOG.warn("Failed to load committed snapshot, leave its sequence number to an invalid number(-1)"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can improve the message; "leave its sequence number to an invalid number(-1)" isn't very clear. How about "Failed to load committed snapshot: omitting sequence number from notifications (not known)"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed it in a new commit.
|
The test failures are unrelated. |
|
Thanks, @flyrain! |
We need the same thing as here in the context of eventual consistency. Otherwise, we hits a NPE like this. We need a fix even though it is a WARN.