
HDDS-7111. SCMHADBTransactionBuffer not flushed in time #3670

Closed
wants to merge 1 commit

Conversation

Xushaohong
Contributor

What changes were proposed in this pull request?

We recently found that after deleting all the keys, the cluster still retains some data. One of the reasons is the following:

SCM adds deleted blocks into transactions when it receives the request from OM. When HA is enabled, DBTransactionBuffer is implemented as SCMHADBTransactionBufferImpl, and inside this implementation the buffer is not flushed immediately. Normally it is flushed when SCM takes a snapshot, and the default snapshot gap threshold is 1000. If the user puts little load on the cluster (no new Ratis log entries are written), the buffer stays pending in memory indefinitely. The real deletion happens in SCMBlockDeletingService, which scans the DB to get the transactions; it cannot see the transactions still sitting in the in-memory buffer. This is why the DNs have not yet received the deleted block info.

This PR adds a flush monitor that checks the buffer regularly and triggers a flush when it is non-empty. This could also serve as a common mechanism if other cases like the deleted-blocks one appear in the future. A simplified sketch of the idea follows.
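A minimal sketch of the periodic flush-monitor idea, with a hypothetical `TransactionBuffer` interface standing in for the real buffer; this is an illustration of the mechanism, not the exact patch code:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class BufferFlushMonitor {

  /** Minimal view of the transaction buffer the monitor needs (assumed interface). */
  public interface TransactionBuffer {
    boolean hasPendingTransactions();
    void flush() throws Exception;
  }

  private final TransactionBuffer buffer;
  private final long intervalSeconds;
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  public BufferFlushMonitor(TransactionBuffer buffer, long intervalSeconds) {
    this.buffer = buffer;
    this.intervalSeconds = intervalSeconds;
  }

  /** Start the periodic check; flush only when there is buffered work. */
  public void start() {
    scheduler.scheduleWithFixedDelay(() -> {
      try {
        if (buffer.hasPendingTransactions()) {
          buffer.flush();
        }
      } catch (Exception e) {
        // Log and retry on the next tick; a missed flush is not fatal here.
        System.err.println("Buffer flush failed: " + e.getMessage());
      }
    }, intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
  }

  public void stop() {
    scheduler.shutdown();
  }
}
```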

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-7111

How was this patch tested?

UT

@Xushaohong
Contributor Author

@ChenSammi PTAL~

@JacksonYao287
Contributor

Thanks @Xushaohong for the patch!
I would prefer not to use DBTransactionBuffer for DeleteBlockLog, since deletion is not time-sensitive. We can just use a RocksDB write batch for each OM delete request and commit it to RocksDB directly, so that the iterator can always get the latest view of the deleted block log. A rough sketch of this alternative is shown below.

@errose28
Contributor

Hi @Xushaohong. The issue you've brought up is definitely valid. We have discussed this as well in HDDS-6721. After some discussion, the preferred solution in that Jira was to add a time-based Ratis snapshot feature in RATIS-1583, which has not been implemented yet. IMO a time-based Ratis snapshot is the easiest way forward because it would provide a general-purpose Ratis feature that we can use with a single config. For example, have Ratis take a snapshot (flush to DB) every 1000 transactions or every 10 minutes (arbitrary example values), whichever comes first; a sketch of that trigger logic is shown below.
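A minimal sketch of the "whichever comes first" trigger logic described above (illustrative only; `SnapshotTrigger` is a made-up class, not the RATIS-1583 implementation):

```java
import java.time.Duration;
import java.time.Instant;

public class SnapshotTrigger {

  private final long txnThreshold;       // e.g. 1000 transactions
  private final Duration timeThreshold;  // e.g. 10 minutes
  private long txnsSinceSnapshot = 0;
  private Instant lastSnapshot = Instant.now();

  public SnapshotTrigger(long txnThreshold, Duration timeThreshold) {
    this.txnThreshold = txnThreshold;
    this.timeThreshold = timeThreshold;
  }

  /** Called for every applied transaction; returns true when a snapshot is due. */
  public synchronized boolean onTransactionApplied() {
    txnsSinceSnapshot++;
    return snapshotDue();
  }

  /** Also called from a periodic timer so an idle cluster still snapshots eventually. */
  public synchronized boolean snapshotDue() {
    boolean countHit = txnsSinceSnapshot >= txnThreshold;
    boolean timeHit = txnsSinceSnapshot > 0
        && Duration.between(lastSnapshot, Instant.now()).compareTo(timeThreshold) >= 0;
    return countHit || timeHit;
  }

  public synchronized void onSnapshotTaken() {
    txnsSinceSnapshot = 0;
    lastSnapshot = Instant.now();
  }
}
```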

@Xushaohong
Contributor Author


Makes sense, but the refactoring in Ratis will take a long time to come into Ozone. If I have time, I will take a look. Closing this PR now.

@errose28
Contributor

Thanks @Xushaohong, I will close this for now. If you need a temporary workaround to get the data deleted, you can restart the SCMs to force the logs to flush.

@errose28 closed this on Aug 16, 2022