disable restart for destroy-pending repl-dev#605
Conversation
Codecov Report

@@ Coverage Diff @@
##           master     #605      +/-   ##
===========================================
+ Coverage   56.51%   66.51%   +10.00%
===========================================
  Files         108      109        +1
  Lines       10300    10836      +536
  Branches     1402     1484       +82
===========================================
+ Hits         5821     7208     +1387
+ Misses       3894     2921      -973
- Partials      585      707      +122
xiaoxichen
left a comment
Also, one thing I am not clear about:
void LogDev::handle_unopened_log_stores(bool format) {
This seems to aim at a similar target, GC-ing the leaked log stores. It does not take care of the logdev, but it would not be hard to add: when a logdev has zero open log stores, we GC the logdev.
I think it is better to have one clear solution. If we believe we have handled the GC here, handle_unopened_log_stores should be changed to an ASSERT or a LOG_ERROR.
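The suggestion above can be sketched as a toy model. Everything here is hypothetical for illustration (ToyLogDev and gc_leaked_stores_and_check are made-up names, not HomeStore's actual API): first GC the leaked stores, then check whether the logdev itself is left with zero open stores and can be GC-ed too.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Toy logdev: a set of stores opened by their owners, plus leaked
// (never-opened) stores that should be garbage-collected.
struct ToyLogDev {
    std::set< uint32_t > open_stores;     // stores opened by their owners
    std::set< uint32_t > unopened_stores; // leaked stores to GC
};

// GC the leaked stores, then report whether the whole logdev is now unused
// (zero open stores) and can itself be GC-ed.
inline bool gc_leaked_stores_and_check(ToyLogDev& ld) {
    ld.unopened_stores.clear();      // the handle_unopened_log_stores part
    return ld.open_stores.empty();   // zero open stores => GC the logdev too
}
```

With this shape, the "one clear solution" the reviewer asks for is a single sweep that both drops leaked stores and reclaims now-empty logdevs, rather than two overlapping mechanisms.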
xiaoxichen
left a comment
lgtm
Feel free to merge; if you want, create a ticket and work on the enhancement later.
This is a good question. After going through the code, the logic is as follows:
1. Destroy log store: when log truncation happens, if all the log entries of this log store are truncated, it will be permanently destroyed.
2. Destroy logdev: coming to the repl_dev case, if we do not open the logdev, it will also be permanently destroyed.
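The two destroy paths above can be condensed into a small decision function. This is a toy model of the logic as described in the comment, not the real HomeStore code (GcAction and gc_decision are illustrative names):

```cpp
#include <cassert>

// Which GC path applies, per the two cases described above:
//   path 1: a log store whose entries are all truncated is destroyed;
//   path 2: a logdev that is not re-opened on restart is destroyed.
enum class GcAction { None, DestroyLogStore, DestroyLogDev };

inline GcAction gc_decision(bool logdev_reopened, bool all_entries_truncated) {
    if (!logdev_reopened) return GcAction::DestroyLogDev;        // path 2
    if (all_entries_truncated) return GcAction::DestroyLogStore; // path 1
    return GcAction::None;
}
```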
Force-pushed from fba9034 to 4933dfb
// 3 logdev will be destroyed in delete_unopened_logdevs() if we don't open it (create repl_dev) here, so skip
// it.
We need to do nothing here: if we do not create the repl_dev, the related logdev will not be opened, and as a result the logdev will eventually be destroyed at delete_unopened_logdevs().
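The sweep described above can be sketched as follows. This is a minimal toy model of a delete_unopened_logdevs()-style pass (ToyLogDevMgr and its members are hypothetical names): any logdev that was never opened during recovery, because its repl_dev was never created, gets destroyed.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy manager tracking which logdevs exist on disk and which were opened
// during recovery.
struct ToyLogDevMgr {
    std::set< std::string > all_logdevs; // logdevs found on disk
    std::set< std::string > opened;      // logdevs opened by a repl_dev

    void open(const std::string& id) { opened.insert(id); }

    // Destroy every logdev that was never opened; return how many were GC-ed.
    int delete_unopened_logdevs() {
        int n = 0;
        for (auto it = all_logdevs.begin(); it != all_logdevs.end();) {
            if (opened.count(*it) == 0) {
                it = all_logdevs.erase(it); // destroy the leaked logdev
                ++n;
            } else {
                ++it;
            }
        }
        return n;
    }
};
```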
When we try to destroy a repl_dev, we first mark it as destroy-pending; a background GC thread then finds it periodically and destroys it permanently. However, if a crash happens before it is permanently destroyed, some issues are left behind:
1. A destroy-pending repl-dev will not be put into m_rd_map, so raft_group_config_found will return a nullptr for this repl_dev, and thus repl_dev->restart will cause a null-pointer fault (fixed).
2. When we permanently destroy a repl_dev, we destroy its superblk and then remove its log store.
If a crash happens after m_rd_sb.destroy() but before m_data_journal->remove_store(), we will have no chance to reclaim the log-related resources for this repl_dev.
This PR checks and reclaims the resources at startup, and destroys the repl_dev superblk only after all the related resources have been reclaimed.