-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add more detail info for metablk debug #426
Conversation
8787390
to
3a7a95e
Compare
try { | ||
m_sb_vdev->sync_write((const char*)m_ssb, block_size(), m_ssb->bid); | ||
} catch (std::exception& e) { HS_REL_ASSERT(false, "exception happen during write {}", e.what()); } | ||
auto error = m_sb_vdev->sync_write((const char*)m_ssb, block_size(), m_ssb->bid); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about create a new function MetaBlkService::sync_write
and move all error handling to that function, to avoid duplicated error handling?
void MetaBlkService::sync_write(const char* data, uint32_t size, BlkId const& bid) {
// copy line: 199~206 here
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we do that , we also have to handle the the error returned by MetaBlkService::sync_write
whereever it is called. that is almost the same as we use m_sb_vdev->sync_write
directly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. MetaBlkService::sync_write::sync_write
is returning void and only error handling happens in MetaBlkService::sync_write
, not in the caller. So instead of 4 places doing the same error handling (e.g. a release assert), only one place is doing that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok , will do this in another PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
the metablk crc mismatch issue is not easy to be reproduced, and one most possible reason is write failure, which will lead to crc mismatch.
as we discussed in homestore meeting, this PR add more detail debug info which will be helpful when the issue occurs. also in this PR, we will terminate the process with an assert failure so that it will not lead to later CRC mismatch when recovery
the strategy of handling write failures of different service of homestore will be written up to a doc and discussed later on