HDDS-7111. SCMHADBTransactionBuffer not flushed in time #3670
What changes were proposed in this pull request?
We recently found that after deleting all keys, some data still remains on the cluster. One of the causes is the following:
SCM adds deleted blocks to transactions when it receives the request from OM. When HA is enabled, DBTransactionBuffer is implemented by SCMHADBTransactionBufferImpl, which does not flush the buffer immediately. Normally, the buffer is flushed when SCM takes a snapshot, and the snapshot gap threshold defaults to 1000. If the cluster is under light load (no new Ratis log entries are being written), the buffer stays pending in memory indefinitely. The actual deletion happens in SCMBlockDeletingService, which scans the DB for transactions; it cannot find transactions that are still in the in-memory buffer. This is why the DNs never receive the deleted-block information.
This PR adds a flush monitor that periodically checks the buffer and flushes it when it is non-empty. This could serve as a general mechanism if other cases similar to the deleted blocks appear in the future.
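To illustrate the idea, here is a minimal, self-contained sketch of a periodic "flush if non-empty" monitor. It is not the code in this patch: `SketchBuffer`, `SketchFlushMonitor`, `scanDb`, and the interval parameter are hypothetical names made up for the example, and the in-memory `flushed` list only stands in for the DB.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a transaction buffer that batches writes in memory. */
class SketchBuffer {
  private final List<String> pending = new ArrayList<>(); // not yet visible to DB scans
  private final List<String> flushed = new ArrayList<>(); // stands in for the DB

  synchronized void add(String txn) {
    pending.add(txn);
  }

  synchronized boolean isEmpty() {
    return pending.isEmpty();
  }

  /** Moves all pending transactions into the "DB". */
  synchronized void flush() {
    flushed.addAll(pending);
    pending.clear();
  }

  /** What a scanning service would see: flushed transactions only. */
  synchronized List<String> scanDb() {
    return new ArrayList<>(flushed);
  }
}

/** Hypothetical monitor: checks the buffer on a fixed delay, flushes only if non-empty. */
class SketchFlushMonitor implements AutoCloseable {
  private final SketchBuffer buffer;
  private final ScheduledExecutorService executor =
      Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "sketch-flush-monitor");
        t.setDaemon(true); // do not keep the JVM alive for the monitor
        return t;
      });

  SketchFlushMonitor(SketchBuffer buffer, long intervalMillis) {
    this.buffer = buffer;
    executor.scheduleWithFixedDelay(
        this::checkAndFlush, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
  }

  /** Returns true if a flush was performed, false if the buffer was empty. */
  boolean checkAndFlush() {
    synchronized (buffer) { // same lock as the buffer's own synchronized methods
      if (buffer.isEmpty()) {
        return false;
      }
      buffer.flush();
      return true;
    }
  }

  @Override
  public void close() {
    executor.shutdownNow();
  }
}
```

In this sketch, a transaction added with `add` stays invisible to `scanDb` until either the scheduled task or a direct call to `checkAndFlush` moves it over, which mirrors the situation described above where the scanning service cannot see transactions that are still buffered.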
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-7111
How was this patch tested?
Unit tests.