HDDS-8882. Manage status of DeleteBlocksCommand in SCM to avoid sending duplicates to Datanode #4988
… sending duplicate delete transactions to the DN
// todo. Implementing unit test and integration test,
@adoroszlai PTAL Thanks.
@xichen01 thanks for working on this. This looks like a good improvement: SCM can send new blocks and retry after some delay while avoiding duplicate commands. This is feasible now that the strict ordering check on transactionId at the DN has been removed (HDDS-8228). With this change, the outOfOrder metrics added at the DN may no longer be required, since out-of-order transactions will be common.
Additionally, at SCM, state is managed in the DB with retries and in multiple maps. We need to take another look and refactor so that the transactions have a single combined state.
...-scm/src/main/java/org/apache/hadoop/hdds/scm/block/SCMDeleteBlocksCommandStatusManager.java
.../main/java/org/apache/hadoop/ozone/container/common/report/CommandStatusReportPublisher.java
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/SCMNodeManager.java
…CommandStatusManager
@sumitagrawl PTAL Thanks.
@xichen01 I tried to understand the changes and their impact, but I am getting lost in the code.
My understanding:
There is a state machine to avoid sending duplicate DeleteBlock commands, but I cannot find any meaningful action for these states:
TO_BE_SENT: initial state
NEED_EXECUTED, EXECUTED: just removed on timeout; no other action is taken while in these states
PENDING_EXECUTED, SENT: only used to avoid retry; removed on timeout
Do we really need so many states? Or just "INITIAL" and "SENT", with cleanup on timeout or once executed?
- One improvement I can see in this PR is that the next command includes a new set of blocks (even if the current blocks have not yet been executed).
I think we should refactor the code, including transactionToDNsCommitMap and transactionToRetryCountMap, to take advantage of this state management and simplify the code.
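As a rough illustration of the two-state idea raised in this comment, the following sketch tracks commands as INITIAL or SENT and drops them either when they are executed or when they time out. All class and method names here are hypothetical, and the choice to expire only SENT entries is an assumption made for the example; none of this is code from the PR.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the reviewer's two-state suggestion; not code from the PR. */
final class TimeoutCleanupSketch {

  enum State { INITIAL, SENT }

  private static final class Entry {
    private State state = State.INITIAL;
    private Instant updatedAt;

    Entry(Instant now) {
      this.updatedAt = now;
    }
  }

  // Command ID -> tracked entry. Guarded by "this".
  private final Map<Long, Entry> commands = new HashMap<>();
  private final Duration timeout;

  TimeoutCleanupSketch(Duration timeout) {
    this.timeout = timeout;
  }

  synchronized void add(long cmdId, Instant now) {
    commands.put(cmdId, new Entry(now));
  }

  /** INITIAL -> SENT; the timestamp restarts the timeout window. */
  synchronized void markSent(long cmdId, Instant now) {
    Entry e = commands.get(cmdId);
    if (e != null) {
      e.state = State.SENT;
      e.updatedAt = now;
    }
  }

  /** Cleanup on execution, whatever the outcome reported by the Datanode. */
  synchronized void onExecuted(long cmdId) {
    commands.remove(cmdId);
  }

  /** Assumption: only SENT entries expire. Returns the number of entries dropped. */
  synchronized int cleanTimedOut(Instant now) {
    int before = commands.size();
    commands.values().removeIf(e -> e.state == State.SENT
        && Duration.between(e.updatedAt, now).compareTo(timeout) > 0);
    return before - commands.size();
  }

  synchronized boolean isTracked(long cmdId) {
    return commands.containsKey(cmdId);
  }

  public static void main(String[] args) {
    TimeoutCleanupSketch tracker = new TimeoutCleanupSketch(Duration.ofSeconds(30));
    Instant start = Instant.now();
    tracker.add(1L, start);
    tracker.markSent(1L, start);
    System.out.println("dropped: " + tracker.cleanTimedOut(start.plusSeconds(60)));
  }
}
```

Passing the current time in as a parameter keeps the sketch deterministic to test; a real service would more likely read a clock internally.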
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/DeletedBlockLogImpl.java
.../src/main/java/org/apache/hadoop/hdds/scm/block/SCMDeletedBlockTransactionStatusManager.java
OK, I'll try to compress some of the states to reduce the complexity of the code.
@xichen01, thanks for working on this.
# Conflicts: # hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/DeletedBlockLogImpl.java
Yes,
@xichen01 Thanks for the update. I have left a few comments on this PR; overall it looks good.
I will recheck the cleanup of commandStatusMap after the fix.
.../src/main/java/org/apache/hadoop/hdds/scm/block/SCMDeletedBlockTransactionStatusManager.java
...-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/SCMBlockDeletingService.java
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/DeletedBlockLogImpl.java
… useless code; Fix thread issue
# Conflicts: # hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/SCMNodeManager.java
Thanks @xichen01 for updating the patch. Can you please check https://github.com/xichen01/ozone/actions/runs/7057702518/job/19212125760#step:5:1833
@adoroszlai @sumitagrawl
@xichen01 LGTM
Thanks again @xichen01 for the patch.
SCMDeletedBlockTransactionStatusManager
    getSCMDeletedBlockTransactionStatusManager();
The DeletedBlockLog interface is defined in terms of operations. I don't think exposing a manager object is appropriate for the interface; it should be an implementation detail. Similarly, sharing the same lock between the two objects does not seem right.
Maybe the interface should define operations that the implementation passes through to the manager. Alternatively, the manager object should have a separately defined interface and act as a way to manipulate the DeletedBlockLog.
Removed the getSCMDeletedBlockTransactionStatusManager method from the DeletedBlockLog interface and added the DeletedBlockTransactionStatusManager-related operations to DeletedBlockLog.
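The shape discussed in this thread, an interface that exposes operations while the implementation passes them through to a manager it owns, might look roughly like the sketch below. The names BlockLog, BlockLogImpl, StatusManager, recordCommit, and commitCount are hypothetical stand-ins chosen for the example; they are not the methods added to DeletedBlockLog in this PR.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

/** Hypothetical names throughout; this is not the DeletedBlockLog API from the PR. */
final class DelegationSketch {

  /** The interface exposes operations only, never the manager object itself. */
  interface BlockLog {
    void recordCommit(long txId, UUID dnId);

    int commitCount(long txId);
  }

  /** Implementation detail: tracks which Datanodes committed each transaction. */
  static final class StatusManager {
    private final Map<Long, Set<UUID>> commits = new HashMap<>();

    void recordCommit(long txId, UUID dnId) {
      commits.computeIfAbsent(txId, k -> new HashSet<>()).add(dnId);
    }

    int commitCount(long txId) {
      Set<UUID> dns = commits.get(txId);
      return dns == null ? 0 : dns.size();
    }
  }

  /** The implementation owns the manager and the lock, and passes calls through. */
  static final class BlockLogImpl implements BlockLog {
    private final StatusManager manager = new StatusManager();

    @Override
    public synchronized void recordCommit(long txId, UUID dnId) {
      manager.recordCommit(txId, dnId);
    }

    @Override
    public synchronized int commitCount(long txId) {
      return manager.commitCount(txId);
    }
  }

  public static void main(String[] args) {
    BlockLog log = new BlockLogImpl();
    UUID dn = UUID.randomUUID();
    log.recordCommit(1L, dn);
    log.recordCommit(1L, dn); // the same Datanode twice is stored once
    System.out.println("commits for tx 1: " + log.commitCount(1L));
  }
}
```

Because callers only see BlockLog, the manager and its locking can change without touching the interface, which is the concern the review comment raises about exposing the manager directly.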
What changes were proposed in this pull request?
Currently, SCM sends a duplicate DeletedBlocksTransaction to a DN if that DN has not reported via heartbeat that the transactions have finished. So if the DeleteBlocksCommandHandler thread of a DN is blocked for some reason (such as waiting for a container lock), SCM will send a duplicate DeletedBlocksTransaction to this DN.
Summary
The Status of DeleteBlocksCommand
State Transfer
TO_BE_SENT -> SENT: The DeleteBlocksCommand has been sent by SCM, but the Datanode has not yet reported a follow-up status.
SENT -> null (the state record is removed from SCMDeleteBlocksCommandStatusManager): Once the DN executes a DeleteBlocksCommand, its record is deleted, regardless of whether the execution succeeded.
Successful DeleteBlocksCommands are recorded in SCMDeletedBlockTransactionStatusManager#transactionToDNsCommitMap.
DeleteBlocksCommand resent
A DeleteBlocksCommand in the TO_BE_SENT or SENT state will not be resent by SCM.
SCMDeletedBlockTransactionStatusManager
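The state transfer and the resend rule described above can be sketched as a small per-Datanode tracker. This is only an illustration under stated assumptions: the class DeleteCommandStatusSketch and its methods are hypothetical, the "null" state is modelled as the absence of a record, and tracking which transaction IDs each command carries is an assumption made so that the resend check has something to look up. It is not the SCMDeleteBlocksCommandStatusManager implementation.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

/** Hypothetical sketch; not the SCMDeleteBlocksCommandStatusManager from this PR. */
final class DeleteCommandStatusSketch {

  /** The two states named in the description; "null" is modelled as "no record". */
  enum CmdStatus { TO_BE_SENT, SENT }

  private static final class Tracked {
    private final Set<Long> txIds;
    private CmdStatus status = CmdStatus.TO_BE_SENT;

    Tracked(Set<Long> txIds) {
      this.txIds = new HashSet<>(txIds);
    }
  }

  // Datanode ID -> (command ID -> tracked command). Guarded by "this".
  private final Map<UUID, Map<Long, Tracked>> byDn = new HashMap<>();

  /** A new command starts in TO_BE_SENT. */
  synchronized void recordNewCommand(UUID dnId, long cmdId, Set<Long> txIds) {
    byDn.computeIfAbsent(dnId, k -> new HashMap<>()).put(cmdId, new Tracked(txIds));
  }

  /** TO_BE_SENT -> SENT; returns false if the command is unknown or already SENT. */
  synchronized boolean markSent(UUID dnId, long cmdId) {
    Map<Long, Tracked> cmds = byDn.get(dnId);
    Tracked t = cmds == null ? null : cmds.get(cmdId);
    if (t == null || t.status != CmdStatus.TO_BE_SENT) {
      return false;
    }
    t.status = CmdStatus.SENT;
    return true;
  }

  /** SENT -> removed, whether the Datanode reported success or failure. */
  synchronized void onDatanodeReport(UUID dnId, long cmdId) {
    Map<Long, Tracked> cmds = byDn.get(dnId);
    Tracked t = cmds == null ? null : cmds.get(cmdId);
    if (t != null && t.status == CmdStatus.SENT) {
      cmds.remove(cmdId);
    }
  }

  /** True while a TO_BE_SENT or SENT command for this Datanode carries the transaction. */
  synchronized boolean isInFlight(UUID dnId, long txId) {
    Map<Long, Tracked> cmds = byDn.get(dnId);
    if (cmds == null) {
      return false;
    }
    for (Tracked t : cmds.values()) {
      if (t.txIds.contains(txId)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    DeleteCommandStatusSketch tracker = new DeleteCommandStatusSketch();
    UUID dn = UUID.randomUUID();
    Set<Long> txIds = new HashSet<>();
    txIds.add(100L);
    tracker.recordNewCommand(dn, 1L, txIds);
    tracker.markSent(dn, 1L);
    System.out.println("in flight before report: " + tracker.isInFlight(dn, 100L));
    tracker.onDatanodeReport(dn, 1L);
    System.out.println("in flight after report: " + tracker.isInFlight(dn, 100L));
  }
}
```

A sender would consult isInFlight before adding a transaction to the next command for a Datanode, so that a blocked handler thread on the DN does not cause the same transaction to be queued again.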
SCMDeletedBlockTransactionStatusManager contains the transactionToDNsCommitMap, migrated from DeletedBlockLogImpl, which is used to manage the committed DeletedBlocksTransaction entries.
SCMDeletedBlockTransactionStatusManager#SCMDeleteBlocksCommandStatusManager is used to manage the DeletedBlocksTransaction entries that are uncommitted.
"Committed" means that the DeletedBlockTransaction has been executed on the DN and reported to SCM via heartbeat.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-8882
How was this patch tested?
integration test