Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDDS-7032. RDBStore#getUpdatesSince should throw SequenceNumberNotFoundException back to the caller in order for Recon to fall back to full snapshot #3613

Merged
merged 4 commits into from Jul 21, 2022

Conversation

smengcl
Copy link
Contributor

@smengcl smengcl commented Jul 21, 2022

What changes were proposed in this pull request?

Recon expects an exception to be thrown in order to fall back to full snapshot when an OM DB delta update cannot be retrieved:

LOG.info("Obtaining delta updates from Ozone Manager");
// Get updates from OM and apply to local Recon OM DB.
getAndApplyDeltaUpdatesFromOM(currentSequenceNumber,
omdbUpdatesHandler);
// Update timestamp of successful delta updates query.
ReconTaskStatus reconTaskStatusRecord = new ReconTaskStatus(
OmSnapshotTaskName.OmDeltaRequest.name(),
System.currentTimeMillis(), getCurrentOMDBSequenceNumber());
reconTaskStatusDao.update(reconTaskStatusRecord);
// Pass on DB update events to tasks that are listening.
reconTaskController.consumeOMEvents(new OMUpdateEventBatch(
omdbUpdatesHandler.getEvents()), omMetadataManager);
} catch (InterruptedException intEx) {
Thread.currentThread().interrupt();
} catch (Exception e) {
metrics.incrNumDeltaRequestsFailed();
LOG.warn("Unable to get and apply delta updates from OM.", e);
fullSnapshot = true;
}
}
if (fullSnapshot) {

However, the current logic implies that RDBStore#getUpdatesSince will never throw SequenceNumberNotFoundException back to the client (Recon) because it is caught as IOException in try-catch. A patch earlier unintentionally caused the regression.

The solution is to restore the intended behavior by explicitly catching SequenceNumberNotFoundException and throwing it back to the client.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-7032

How was this patch tested?

  • Manual tested on a repro cluster.

…ndException back to the caller in order for Recon to fall back to full snapshot

Change-Id: I6e90b47f28d963a458572e8cd2de879619fc423b
@kerneltime
Copy link
Contributor

Thanks @smengcl for finding this.

@@ -322,8 +322,16 @@ public DBUpdatesWrapper getUpdatesSince(long sequenceNumber, long limitCount)
}
transactionLogIterator.next();
}
} catch (SequenceNumberNotFoundException e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should revert back to throwing all IOExceptions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe SequenceNumberNotFoundException is the only expected checked exception to be thrown anyway, judging from the method signature:

public DBUpdatesWrapper getUpdatesSince(long sequenceNumber, long limitCount)
throws SequenceNumberNotFoundException {

Change-Id: Ic40e547c2f547f85460a0c256ea08eca1af9aada
…ch can circumvent the existing throw SNNFE logic below. Tested on a cluster.

Change-Id: Ica660650d0e2cd54f01215fbdd5b8af30d0e1e26
Copy link
Contributor

@swagle swagle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Change-Id: I6b5c2804328df6522a6a6838d9935ec71f59130a
Copy link
Contributor

@kerneltime kerneltime left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we might have to revisit the exception handling here.

@kerneltime kerneltime merged commit c57c9bc into apache:master Jul 21, 2022
@smengcl
Copy link
Contributor Author

smengcl commented Jul 21, 2022

I think we might have to revisit the exception handling here.

Yeah at some point we might want to refactor it.

Thanks @swagle and @kerneltime for the review.

@adoroszlai adoroszlai changed the title HDDS-7032. RDBStore#getUpdatesSince should throw SequenceNumberNotFou… HDDS-7032. RDBStore#getUpdatesSince should throw SequenceNumberNotFoundException back to the caller in order for Recon to fall back to full snapshot Jul 22, 2022
@adoroszlai
Copy link
Contributor

Unfortunately the commit message cannot be fixed...

@kerneltime
Copy link
Contributor

Unfortunately the commit message cannot be fixed...

Will keep this in mind.

duongkame pushed a commit to duongkame/ozone that referenced this pull request Aug 16, 2022
…NumberNotFou… (apache#3613)

(cherry picked from commit c57c9bc)

Change-Id: I7fe61567a1c7e49c5c34cbe2225a5fab1ef2e597
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants