Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rbd-mirror: fix potential infinite loop when formatting status message #20349

Merged
merged 1 commit into from Feb 9, 2018

Conversation

trociny
Copy link
Contributor

@trociny trociny commented Feb 7, 2018

The improvements include:

  • tag_tid values should always be increasing, so loop only if
    master.tag_tid > mirror_tag_tid in calculate_behind_master_or_send_update;
  • in send_update_tag_cache don't refetch a tag if it is already in the
    cache;
  • make fake tags with predecessor.commit_valid set to true, so they
    are properly logged
  • make sure the new tag is inserted to the cache if an old entry
    with this id happens to exist.

Fixes: http://tracker.ceph.com/issues/22932
Signed-off-by: Mykola Golub mgolub@suse.com

@@ -201,16 +202,16 @@ void ReplayStatusFormatter<I>::handle_update_tag_cache(uint64_t master_tag_tid,
tag_data.predecessor.mirror_uuid !=
librbd::Journal<>::ORPHAN_MIRROR_UUID) {
dout(20) << "hit remote image non-primary epoch" << dendl;
tag_data.predecessor.tag_tid = mirror_tag_tid;
tag_data.predecessor = {"", true, mirror_tag_tid, 0};
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: might as well just set the tag_tid to zero since you cannot make any progress in the decode loop: tag_data.predecessor = {}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, updated. I also had to update send_update_tag_cache to test for master_tag_tid == 0 as a termination condition.

@@ -201,16 +202,16 @@ void ReplayStatusFormatter<I>::handle_update_tag_cache(uint64_t master_tag_tid,
tag_data.predecessor.mirror_uuid !=
librbd::Journal<>::ORPHAN_MIRROR_UUID) {
dout(20) << "hit remote image non-primary epoch" << dendl;
tag_data.predecessor.tag_tid = mirror_tag_tid;
tag_data.predecessor = {"", true, mirror_tag_tid, 0};
} else if (tag_data.predecessor.tag_tid == 0) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this conditional can now be dropped since the tag is already zeroed if the retrieve/decode failed

@trociny trociny force-pushed the wip-22932 branch 2 times, most recently from 5d85e6c to cf9d2de Compare February 7, 2018 15:17
<< mirror_tag_tid << dendl;

if (master_tag_tid == mirror_tag_tid) {
if (master_tag_tid == 0 || master_tag_tid == mirror_tag_tid ||
Copy link

@dillaman dillaman Feb 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A tag_tid of zero is perfectly valid so it should still be loaded. Also, should master_tag_tid == mirror_tag_tid be changed to master_tag_tid <= mirror_tag_tid?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we start from tag_tid = 1?

Because I saw tag_tid of zero only for mirror_position, which I supposed was only for cases when it was initialized with cls::journal::ObjectPosition() and was not filled with a real position.

I saw a case when it tried to fetch tag_tid of zero and failed:

2018-02-06 15:53:46.602 7fcf89ffb700 -1 rbd::mirror::image_replayer::ReplayStatusFormatter: 0x7fcf2000d5e0 handle_update_tag_cache: error retrieving tag 0: (2) No such file or directory

Also, what predecessor.tag_tid is expected for this tag_tid of zero? I suppose 0, and then it would try to fetch again.

Anyway, I agree about master_tag_tid <= mirror_tag_tid condition, and then it will include master_tag_tid == 0 too, so no need to explicitly test this. Updating.

@trociny
Copy link
Contributor Author

trociny commented Feb 7, 2018

@dillaman the update pushed

master = {0, tag_data.predecessor.tag_tid, tag_data.predecessor.entry_tid};
}
if (master.tag_tid == mirror_tag_tid) {
m_entries_behind_master += master.entry_tid - m_mirror_position.entry_tid;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: should probably protect against this going massively negative as well (if mirror is ahead of master)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Updated.

The improvements include:

- tag_tid values should always be increasing, so loop only if
  master.tag_tid > mirror_tag_tid in calculate_behind_master_or_send_update;
- in send_update_tag_cache don't refetch a tag if it is already in the
  cache;
- make fake tags with tag_data.predecessor.tag_tid set to zero;
- make sure the new tag is inserted to the cache if an old entry
  with this id happens to exist.

Fixes: http://tracker.ceph.com/issues/22932
Signed-off-by: Mykola Golub <mgolub@suse.com>
Copy link

@dillaman dillaman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@dillaman dillaman merged commit 6c3d5aa into ceph:master Feb 9, 2018
@trociny trociny deleted the wip-22932 branch February 9, 2018 08:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants