Conversation

@flyrain (Contributor) commented May 3, 2021

We need the same thing as here in the context of eventual consistency; otherwise, we hit an NPE like the one below. We need a fix even though it only surfaces as a WARN.

timestamp="2021-04-23T22:35:23,938+0000",level="WARN",threadName="pool-1-thread-1",appName="spark-driver",logger="org.apache.iceberg.SnapshotProducer",message="Failed to notify listeners",exception="java.lang.NullPointerException
	at org.apache.iceberg.MergingSnapshotProducer.updateEvent(MergingSnapshotProducer.java:377)
	at org.apache.iceberg.SnapshotProducer.notifyListeners(SnapshotProducer.java:329)
	at org.apache.iceberg.SnapshotProducer.commit(SnapshotProducer.java:324)
	at org.apache.iceberg.spark.source.SparkWrite.commitOperation(SparkWrite.java:236)
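
For context, the NPE comes from dereferencing the result of a snapshot lookup that can return null when the refreshed table metadata does not yet contain the just-committed snapshot. A minimal sketch of the failing line and the guarded shape this PR moves toward (names follow the diff snippets later in the thread; the surrounding method is omitted):

// Before: NPEs when refresh() does not yet see the newly committed snapshot
long sequenceNumber = ops.refresh().snapshot(snapshotId).sequenceNumber();

// After: guard against a null snapshot under eventual consistency
Snapshot justSaved = ops.refresh().snapshot(snapshotId);
long sequenceNumber = TableMetadata.INVALID_SEQUENCE_NUMBER;
if (justSaved == null) {
  LOG.warn("Failed to load committed snapshot: omitting sequence number from notifications");
} else {
  sequenceNumber = justSaved.sequenceNumber();
}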

github-actions bot added the core label May 3, 2021
  // consistency problems in refresh.
  LOG.warn("Failed to load committed snapshot, leave sequence number to 0");
} else {
  sequenceNumber = justSaved.sequenceNumber();
Contributor

I think the fix looks good, but I'm not sure we should default sequenceNumber to 0, as this may result in incorrect data for users who rely on this sequence number in the event. Without this change we probably would not send out the event at all, so if we don't want to wait until the data becomes consistent, I wonder if returning -1 would be better, since 0 may refer to an actual sequence number of the table. (In v1 all sequence numbers will be 0, so returning 0 for v1 would be better, but I'm not sure how easy it is to check the table version here.)
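
To make the concern concrete, a hypothetical consumer of the commit event could distinguish the two cases like the sketch below (illustrative only; the event and accessor names are assumed from this discussion): -1 can never be a real sequence number, while 0 can.

// Hypothetical listener: -1 unambiguously means "sequence number unknown",
// whereas 0 is also the real sequence number of every v1 table snapshot.
void onCommit(CreateSnapshotEvent event) {
  if (event.sequenceNumber() < 0) {
    // metadata could not be reloaded after the commit; sequence number is unknown
  } else {
    // a genuine sequence number of the committed snapshot
  }
}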

@flyrain (Contributor Author) May 4, 2021

Good catch. It makes sense to set it to -1, since the sequence number's initial value is 0 and it increases monotonically, as this line shows:

public long nextSequenceNumber() {
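
For reference, that method in TableMetadata is roughly the following (a sketch; the formatVersion and lastSequenceNumber field names are assumed here, but the key point from the discussion holds: v1 tables stay at the initial value 0, while v2 tables count up by one per commit):

public long nextSequenceNumber() {
  // v1 tables always report the initial value (0); v2 tables increment monotonically
  return formatVersion > 1 ? lastSequenceNumber + 1 : INITIAL_SEQUENCE_NUMBER;
}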

Posted a new commit to fix it.

@flyrain (Contributor Author) commented May 4, 2021

Can you take a look? @rdblue @aokolnychyi @RussellSpitzer @karuppayya @kbendick

@RussellSpitzer (Member)

I think this is a good fix to the NPE. But I do want to make sure we warn folks that an eventually consistent Catalog can break the history of an Iceberg table. This would only be acceptable for a catalog which guaranteed consistent reads during the lock phase.

My thinking is basically

Consistent - Read Your Writes  Catalog = Best
Eventually Consistent Reads - Consistent when locking/committing = Adequate but not great
Eventually consistent during lock/commit = Will cause lost writes, non-linearizable histories and weirdness. 

@kbendick (Contributor) commented May 6, 2021

> I think this is a good fix to the NPE. But I do want to make sure we warn folks that an eventually consistent Catalog can break the history of an Iceberg table. This would only be acceptable for a catalog which guaranteed consistent reads during the lock phase.
>
> My thinking is basically
>
> Consistent - Read Your Writes Catalog = Best
> Eventually Consistent Reads - Consistent when locking/committing = Adequate but not great
> Eventually consistent during lock/commit = Will cause lost writes, non-linearizable histories and weirdness.

Is this something that could come up with the DynamoDB-based lock? I agree this is a good fix to the NPE, but I have personally hit weird edge cases with other software that uses a DynamoDB table as a lock (such as Terraform). Ideally testing handles this and the DynamoDB / Glue catalog doesn't suffer from these issues. Possibly I need to review the DynamoDB table lock again, but it felt worth noting, since I've had weird consistency issues with Terraform table locks.

@flyrain (Contributor Author) commented May 6, 2021

Users should avoid an eventually consistent catalog if possible. If the Hive catalog is used, we normally expect HMS to be strongly consistent. That is NOT always the case; for example, HMS can be an eventually consistent system if it uses Cassandra as the backend DB. But I like the call-out from @RussellSpitzer and @kbendick: avoid it as much as possible.

However, we cannot remove the eventual-consistency-related code as long as Iceberg still supports the Hadoop catalog, which is based on an eventually consistent system, HDFS.

if (justSaved == null) {
  // The snapshot just saved may not be present if the latest metadata couldn't be loaded due to eventual
  // consistency problems in refresh.
  LOG.warn("Failed to load committed snapshot, leave its sequence number to an invalid number(-1)");
Contributor

As a possible follow-up, should we consider retrying this with a backoff (or does it already do that)? Or is that something we'd rather avoid getting into?

@flyrain (Contributor Author) May 8, 2021

There is a retry in the main part of the commit() method, but not for clean-up or notifyListeners. I'm not sure it's a good idea to introduce a retry for them; I don't see much downside beyond delaying the commit a bit, but it feels like over-engineering. It would be another PR anyway.
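
Purely to illustrate the idea being weighed here (a hypothetical follow-up, not part of this PR), a retry with backoff around the snapshot lookup could look like this sketch; ops and snapshotId follow the names in the diff:

// Hypothetical sketch only: retry the snapshot lookup a few times with exponential backoff
Snapshot justSaved = ops.refresh().snapshot(snapshotId);
for (int attempt = 1; attempt <= 2 && justSaved == null; attempt++) {
  try {
    Thread.sleep(100L << attempt); // 200ms, then 400ms
  } catch (InterruptedException e) {
    Thread.currentThread().interrupt();
    break;
  }
  justSaved = ops.refresh().snapshot(snapshotId);
}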

Contributor

No need for back-off. That is handled by the table operations when it tries to load the new metadata. All this needs to do is handle the case where the table metadata can't be loaded.

@kbendick (Contributor) left a comment

This looks good to me.

Left a comment about possibly retrying the refresh for the snapshot with the given snapshotId, perhaps as a follow-up. But I'm not sure of the implications of that, and ideally users are on consistent catalogs/metastores.

  long snapshotId = snapshotId();
- long sequenceNumber = ops.refresh().snapshot(snapshotId).sequenceNumber();
+ Snapshot justSaved = ops.refresh().snapshot(snapshotId);
+ long sequenceNumber = TableMetadata.INVALID_SEQUENCE_NUMBER;
Contributor

Well, I am not sure what the best way to react here is. We do expect the catalog to be consistent, so one option is to just log an error and not send an event, or to send some special event type.

cc @rdsr who implemented this part and @rdblue who uses the event-based system. What do you expect in this case?

Contributor

What is the situation where the snapshot would not be in metadata? If the snapshot wasn't committed, then I would expect to not send an event.

Contributor

Nevermind, I see the comment about not being able to load the metadata.

I'd probably prefer null, but -1 is fine so that we don't change a public API.

  Map<String, String> summary,
  List<ManifestFile> dataManifests) {
- this(io, INITIAL_SEQUENCE_NUMBER, snapshotId, parentId, timestampMillis, operation, summary, null);
+ this(io, TableMetadata.INITIAL_SEQUENCE_NUMBER, snapshotId, parentId, timestampMillis, operation, summary, null);
Contributor

This change looks unrelated to me. Can you remove it?

Contributor Author

Trying to piggy-back a change to consolidate INITIAL_SEQUENCE_NUMBER from multiple places. I can put it in another PR if it is preferred.
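
For context, the two sequence-number constants in play are roughly these (a sketch; the 0 and -1 values come from the discussion above, and the consolidation would keep a single definition on TableMetadata):

// Sketch of the constants on TableMetadata: 0 is the sequence number before any v2 commits;
// -1 is the sentinel this PR reports when the committed snapshot can't be reloaded.
static final long INITIAL_SEQUENCE_NUMBER = 0;
static final long INVALID_SEQUENCE_NUMBER = -1;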

Contributor

Yes, please do. We try to avoid unnecessary changes so that we don't cause unnecessary commit conflicts for people maintaining branches.

Contributor Author

Removed it in a new commit.

if (justSaved == null) {
  // The snapshot just saved may not be present if the latest metadata couldn't be loaded due to eventual
  // consistency problems in refresh.
  LOG.warn("Failed to load committed snapshot, leave its sequence number to an invalid number(-1)");
Contributor

I think we can improve the message; "leave its sequence number to an invalid number(-1)" isn't very clear. How about "Failed to load committed snapshot: omitting sequence number from notifications (not known)"

Contributor Author

Fixed it in a new commit.

@flyrain (Contributor Author) commented May 19, 2021

The test failures are unrelated.

@rdblue merged commit ba54f9c into apache:master on May 20, 2021
@rdblue (Contributor) commented May 20, 2021

Thanks, @flyrain!
