HDDS-4761. Implement increment count optimization in DeletedBlockLog V2 #1852
Conversation
R: @runzhiwang @GlenGeng
Without this optimization, the next time master is synced to HDDS-2823, I think the current V2 implementation won't pass the unit test in ozone/hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/block/TestDeletedBlockLog.java (line 242 in 8d3817c).
@amaliujia Thanks for backporting this fix to HDDS-2823. I consider we have to move … The reason is, the content of … The design decision is to only encapsulate …
@GlenGeng I have updated this PR based on the discussion. Basically, we still need an in-memory state in the state manager. Meanwhile, I updated it to clear that state upon leader change. I also updated the UT: it was changed during the master-to-2823 sync, so before this PR it was not functionally correct.
LGTM
   */
-  public void clearTransactionToDNsCommitMap() {
+  public void clear() {
Please rename to onBecomeLeader()
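The rename suggestion ties clearing the in-memory commit map to the leader-change event. A minimal sketch of that idea, assuming names other than onBecomeLeader() (the map name comes from the diff above, the rest is illustrative):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: the in-memory commit map is only meaningful on the current
// leader, so it is wiped when this SCM becomes leader. Only the method
// name onBecomeLeader() comes from the review; the rest is assumed.
public class LeaderStateSketch {
  private final Map<Long, Integer> transactionToDNsCommitMap = new HashMap<>();

  void record(long txID, int dnCount) {
    transactionToDNsCommitMap.put(txID, dnCount);
  }

  // renamed from clear() per the review comment
  public void onBecomeLeader() {
    transactionToDNsCommitMap.clear();
  }

  public static void main(String[] args) {
    LeaderStateSketch s = new LeaderStateSketch();
    s.record(1L, 3);
    s.record(2L, 3);
    s.onBecomeLeader();
    System.out.println("size=" + s.transactionToDNsCommitMap.size());
  }
}
```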
@@ -42,4 +42,6 @@ void increaseRetryCountOfTransactionInDB(ArrayList<Long> txIDs)
       KeyValue<Long, DeletedBlocksTransaction>> getReadOnlyIterator();
+
+  void onFlush();
+
+  void clear();
ditto. onBecomeLeader()
  // then set the retry value to -1, stop retrying, admins can
  // analyze those blocks and purge them manually by SCMCli.
  builder.setCount(-1);
  deletedTable.putWithBatch(
Better to call transactionRetryCountMap.remove(txID) here? Otherwise it might be a leak.
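The concern is that once a transaction's count is capped at -1 and it stops retrying, its entry in the in-memory map becomes dead weight unless it is removed. A hedged sketch of the suggested fix (class name, MAX_RETRY value, and helper shape are assumptions; only transactionRetryCountMap.remove(txID) comes from the review):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: when a transaction exceeds the retry limit, its count is set
// to -1 in the DB (left for manual purge via SCMCli) and, per the
// suggestion, its in-memory entry is removed so the map cannot leak.
public class RetryCapSketch {
  static final int MAX_RETRY = 4096; // illustrative limit
  final Map<Long, Integer> transactionRetryCountMap = new HashMap<>();

  // returns the new count, or -1 once the transaction has been given up on
  int increaseRetryCount(long txID) {
    int count = transactionRetryCountMap.merge(txID, 1, Integer::sum);
    if (count > MAX_RETRY) {
      // real code would do: builder.setCount(-1); deletedTable.putWithBatch(...)
      transactionRetryCountMap.remove(txID); // the suggested leak fix
      return -1;
    }
    return count;
  }

  public static void main(String[] args) {
    RetryCapSketch s = new RetryCapSketch();
    int last = 0;
    for (int i = 0; i <= MAX_RETRY; i++) {
      last = s.increaseRetryCount(7L);
    }
    System.out.println("last=" + last + " mapSize="
        + s.transactionRetryCountMap.size());
  }
}
```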
Hey Rui, after a second thought, I realized that my suggestion (the second commit) is wrong, sorry for that. Given that across different SCMs, … The generated … The real problem here is that this solution does not decrease the number of fsyncs. In non-HA mode, each write to the DB will call fsync; in HA mode, the writes will be buffered, but each write will trigger a Raft client request, which will call fsync on the Raft log. Let's go through this problem again.
After a third thought, I consider that neither the first commit nor the first + second commits can ensure the determinism of the retry counter map, and that the optimization in HA mode should be to reduce the Raft calls. What do you think?
@GlenGeng Optimizing to reduce the Ratis calls makes sense. In that case, we will need an in-memory state in both DeletedBlockLog and the state manager: the former reduces Ratis calls, and the latter is required because of the transaction buffer. What do you think?
Agree. The key point of a deterministic solution is to let each Ratis call write RocksDB.
@GlenGeng Unfortunately, with the transaction buffer we cannot flush the retry count into the DB immediately, so we probably have to accept a non-deterministic result for now. First of all, the current implementation is already bad: it reads from the DB and then tries to write to the DB, so in the worst case the DB value never increases (because previous writes are still in the buffer), and a txID retries forever. So we have to introduce the in-memory state in the state manager.
At least the current PR avoids the worst case. What do you think?
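The worst case described above can be shown with a toy model: if the increment is computed as DB-read + 1 while earlier writes still sit in the unflushed transaction buffer, the computed value never advances, whereas an in-memory counter does. All names below are illustrative, not the actual Ozone code:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Toy model: writes go to a buffer that is never flushed, so a
// read-modify-write against the DB keeps producing 1, while the
// in-memory counter in the state manager advances correctly.
public class BufferedWriteSketch {
  final Map<Long, Integer> db = new HashMap<>();        // flushed DB state
  final Deque<int[]> buffer = new ArrayDeque<>();       // unflushed writes
  final Map<Long, Integer> inMemory = new HashMap<>();  // state-manager map

  void incrementViaDbRead(long txID) {
    int next = db.getOrDefault(txID, 0) + 1; // ignores buffered writes
    buffer.add(new int[]{(int) txID, next});
  }

  int incrementInMemory(long txID) {
    return inMemory.merge(txID, 1, Integer::sum);
  }

  public static void main(String[] args) {
    BufferedWriteSketch s = new BufferedWriteSketch();
    int mem = 0;
    for (int i = 0; i < 5; i++) {
      s.incrementViaDbRead(42L);
      mem = s.incrementInMemory(42L);
    }
    // every buffered write computed the same stale value
    int lastBuffered = s.buffer.peekLast()[1];
    System.out.println("buffered=" + lastBuffered + " inMemory=" + mem);
  }
}
```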
Not flushing the retry count into the DB immediately may not lead to non-determinism. Let's try to prove it: what we need to take care of is to move the counter-check logic into the ReadOnlyIterator, e.g., …
Given the …
Hey Rui, the trxBuffer introduces some issues into the delete block log; let me draft a design for it, and we can discuss there.
I will open another PR to apply the retry-count optimization that we have discussed.
What changes were proposed in this pull request?
Implement the increment-count optimization in DeletedBlockLog V2. We only write the retry count to the DB once every 100 increments.
This optimization is already in master branch: 39027e4
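The idea of the optimization can be sketched as follows: keep the retry count in memory and only persist it once every 100 increments, cutting DB writes by roughly 100x. This is a minimal illustration under assumed names (PERSIST_INTERVAL, the dbWrites stand-in), not the actual patch:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the increment-count optimization: the retry count lives in
// an in-memory map, and only every 100th increment is written through
// to the DB (modeled here by the dbWrites counter).
public class RetryCountSketch {
  static final int PERSIST_INTERVAL = 100; // flush to DB every 100 increments
  final Map<Long, Integer> transactionRetryCountMap = new ConcurrentHashMap<>();
  int dbWrites = 0; // stand-in for actual RocksDB writes

  void increaseRetryCount(long txID) {
    int count = transactionRetryCountMap.merge(txID, 1, Integer::sum);
    if (count % PERSIST_INTERVAL == 0) {
      dbWrites++; // real code would do: deletedTable.putWithBatch(...)
    }
  }

  public static void main(String[] args) {
    RetryCountSketch log = new RetryCountSketch();
    for (int i = 0; i < 250; i++) {
      log.increaseRetryCount(1L);
    }
    // 250 increments -> only 2 DB writes (at counts 100 and 200)
    System.out.println("dbWrites=" + log.dbWrites);
  }
}
```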
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-4761
How was this patch tested?
UT