New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rbd-mirror: fix potential infinite loop when formatting status message #20349
Conversation
@@ -201,16 +202,16 @@ void ReplayStatusFormatter<I>::handle_update_tag_cache(uint64_t master_tag_tid, | |||
tag_data.predecessor.mirror_uuid != | |||
librbd::Journal<>::ORPHAN_MIRROR_UUID) { | |||
dout(20) << "hit remote image non-primary epoch" << dendl; | |||
tag_data.predecessor.tag_tid = mirror_tag_tid; | |||
tag_data.predecessor = {"", true, mirror_tag_tid, 0}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: might as well just set the tag_tid
to zero since you cannot make any progress in the decode loop: tag_data.predecessor = {}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, updated. I also had to update send_update_tag_cache to test for master_tag_tid == 0
as a termination condition.
@@ -201,16 +202,16 @@ void ReplayStatusFormatter<I>::handle_update_tag_cache(uint64_t master_tag_tid, | |||
tag_data.predecessor.mirror_uuid != | |||
librbd::Journal<>::ORPHAN_MIRROR_UUID) { | |||
dout(20) << "hit remote image non-primary epoch" << dendl; | |||
tag_data.predecessor.tag_tid = mirror_tag_tid; | |||
tag_data.predecessor = {"", true, mirror_tag_tid, 0}; | |||
} else if (tag_data.predecessor.tag_tid == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: this conditional can now be dropped since the tag is already zeroed if the retrieve/decode failed
5d85e6c
to
cf9d2de
Compare
<< mirror_tag_tid << dendl; | ||
|
||
if (master_tag_tid == mirror_tag_tid) { | ||
if (master_tag_tid == 0 || master_tag_tid == mirror_tag_tid || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A tag_tid
of zero is perfectly valid so it should still be loaded. Also, should master_tag_tid == mirror_tag_tid
be changed to master_tag_tid <= mirror_tag_tid
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we start from tag_tid = 1?
Because I saw tag_tid of zero only for mirror_position, which I supposed was only for cases when it was initialized with cls::journal::ObjectPosition()
and was not filled with a real position.
I saw a case when it tried to fetch tag_tid of zero and failed:
2018-02-06 15:53:46.602 7fcf89ffb700 -1 rbd::mirror::image_replayer::ReplayStatusFormatter: 0x7fcf2000d5e0 handle_update_tag_cache: error retrieving tag 0: (2) No such file or directory
Also, what predecessor.tag_tid
is expected for this tag_tid of zero? I suppose 0, and then it would try to fetch again.
Anyway, I agree about master_tag_tid <= mirror_tag_tid
condition, and then it will include master_tag_tid == 0
too, so no need to explicitly test this. Updating.
@dillaman the update pushed |
master = {0, tag_data.predecessor.tag_tid, tag_data.predecessor.entry_tid}; | ||
} | ||
if (master.tag_tid == mirror_tag_tid) { | ||
m_entries_behind_master += master.entry_tid - m_mirror_position.entry_tid; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: should probably protect against this going massively negative as well (if mirror is ahead of master)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Updated.
The improvements include: - tag_tid values should always be increasing, so loop only if master.tag_tid > mirror_tag_tid in calculate_behind_master_or_send_update; - in send_update_tag_cache don't refetch a tag if it is already in the cache; - make fake tags with tag_data.predecessor.tag_tid set to zero; - make sure the new tag is inserted to the cache if an old entry with this id happens to exist. Fixes: http://tracker.ceph.com/issues/22932 Signed-off-by: Mykola Golub <mgolub@suse.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
The improvements include:
master.tag_tid > mirror_tag_tid in calculate_behind_master_or_send_update;
cache;
are properly logged
with this id happens to exist.
Fixes: http://tracker.ceph.com/issues/22932
Signed-off-by: Mykola Golub mgolub@suse.com