Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mon/OSDMonitor: Respect paxos_propose_interval #22243

Merged
merged 1 commit into from
May 26, 2018

Conversation

xiaoxichen
Copy link
Contributor

@xiaoxichen xiaoxichen commented May 25, 2018

This change doesn't looks right and causing twice as much proposal as we targeted to (limited by paxos_propose_interval).

Imaging we have a sequence of pg_temp/up_thru during a large recovery.

now =T
The 1st up_thru/pg_temp will go through fast path and trigger propose at T + paxos_min_wait, last_attempted_minwait_time = T.

now = T+ paxos_min_wait
The [2, K] up_thru will failed by (now - last_attempted_minwait_time > g_conf->paxos_propose_interval)
and go through PaxosService::should_propose, which will schedule the propose at) T+paxos_propose_interval

now= T+ paxos_propose_interval + paxos_min_wait
The K+1 up_thru/pg_temp comes, both
(now - last_attempted_minwait_time > g_conf->paxos_propose_interval)
and
(now - paxos->get_last_commit_time() > g_conf->paxos_min_wait)
satisfied, so we trigger another propose at now+ paxos_min_wait = T+ paxos_propose_interval +paxos_min_wait.

clearly we made TWO proposal in each paxos_propose_interval.

This reverts commit ca65210.

Signed-off-by: Xiaoxi CHEN xiaoxchen@ebay.com

This change doesn't looks right and causing twice as much proposal as we targeted to (limited by paxos_propose_interval).

Imaging we have a sequence of pg_temp/up_thru during a large recovery.

now =T
The 1st up_thru/pg_temp will go through fast path and trigger propose at T + paxos_min_wait, last_attempted_minwait_time = T.

now = T+ paxos_min_wait
The [2, K] up_thru will failed by (now - last_attempted_minwait_time > g_conf->paxos_propose_interval)
and go through PaxosService::should_propose, which will schedule the propose at) T+paxos_propose_interval

now= T+ paxos_propose_interval + paxos_min_wait
The K+1 up_thru/pg_temp comes, both (now - last_attempted_minwait_time > g_conf->paxos_propose_interval
and now - paxos->get_last_commit_time() > g_conf->paxos_min_wait satisfied, so we trigger another propose
in now+ paxos_min_wait = T+ paxos_propose_interval +paxos_min_wait.

clearly we made TWO proposal in each paxos_propose_interval.

This reverts commit ca65210.

Signed-off-by: Xiaoxi CHEN <xiaoxchen@ebay.com>
Copy link
Member

@gregsfortytwo gregsfortytwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I'm following the logic about how this causes us to double our commit rate. I think it's correctly letting us do one fast commit, and then delaying every subsequent commit to the paxos_propose_interval period?

dout(15) << " propose as fast as possible for up_thru/pg_temp" << dendl;

utime_t now = ceph_clock_now();
if (now - last_attempted_minwait_time > g_conf->paxos_propose_interval
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's an issue with the logic, I think it's here — perhaps we need to take the most recent of last_attempted_minwait_time and paxos->get_last_commit_time() in this first check of the if-block?
My head's a little fuzzy right now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes exactly it correctly letting us do one fast .the problem is how to define first commit, the code logic now make it first whenever after a regular commit, so it goes to fast/slow/fast/slow loop instead of fast/slow/slow/slow.

And yes we can fix the logical with adding more if and make it more fuzzy. but it looks to me like the paxoservice::should_propose already implement what we want and clear enough.

 utime_t now = ceph_clock_now();
    if ((now - paxos->last_commit_time) > g_conf->paxos_propose_interval)
      delay = (double)g_conf->paxos_min_wait;
    else
      delay = (double)(g_conf->paxos_propose_interval + paxos->last_commit_time
		       - now);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I get it now.

@liewegas liewegas changed the title mon/OSDMonitor: Respect paxos_propose_interval(revert ca652104fe91ac41c7c1788a2907178c36fbe6ef) mon/OSDMonitor: Respect paxos_propose_interval May 25, 2018
@liewegas
Copy link
Member

can you prepare a backport for mimic and/or luminous?

@liewegas liewegas merged commit 44283e5 into ceph:master May 26, 2018
liewegas added a commit that referenced this pull request May 26, 2018
* refs/pull/22243/head:
	Revert "mon: no delay for single message MSG_ALIVE and MSG_PGTEMP"

Reviewed-by: Gregory Farnum <gfarnum@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
@xiaoxichen
Copy link
Contributor Author

sure, will do it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants