-
Notifications
You must be signed in to change notification settings - Fork 6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mon/OSDMonitor: Respect paxos_propose_interval #22243
Conversation
This change doesn't looks right and causing twice as much proposal as we targeted to (limited by paxos_propose_interval). Imaging we have a sequence of pg_temp/up_thru during a large recovery. now =T The 1st up_thru/pg_temp will go through fast path and trigger propose at T + paxos_min_wait, last_attempted_minwait_time = T. now = T+ paxos_min_wait The [2, K] up_thru will failed by (now - last_attempted_minwait_time > g_conf->paxos_propose_interval) and go through PaxosService::should_propose, which will schedule the propose at) T+paxos_propose_interval now= T+ paxos_propose_interval + paxos_min_wait The K+1 up_thru/pg_temp comes, both (now - last_attempted_minwait_time > g_conf->paxos_propose_interval and now - paxos->get_last_commit_time() > g_conf->paxos_min_wait satisfied, so we trigger another propose in now+ paxos_min_wait = T+ paxos_propose_interval +paxos_min_wait. clearly we made TWO proposal in each paxos_propose_interval. This reverts commit ca65210. Signed-off-by: Xiaoxi CHEN <xiaoxchen@ebay.com>
6083cad
to
44283e5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I'm following the logic about how this causes us to double our commit rate. I think it's correctly letting us do one fast commit, and then delaying every subsequent commit to the paxos_propose_interval period?
dout(15) << " propose as fast as possible for up_thru/pg_temp" << dendl; | ||
|
||
utime_t now = ceph_clock_now(); | ||
if (now - last_attempted_minwait_time > g_conf->paxos_propose_interval |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there's an issue with the logic, I think it's here — perhaps we need to take the most recent of last_attempted_minwait_time and paxos->get_last_commit_time() in this first check of the if-block?
My head's a little fuzzy right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes exactly it correctly letting us do one fast .the problem is how to define first
commit, the code logic now make it first
whenever after a regular commit, so it goes to fast/slow/fast/slow loop instead of fast/slow/slow/slow.
And yes we can fix the logical with adding more if
and make it more fuzzy. but it looks to me like the paxoservice::should_propose already implement what we want and clear enough.
utime_t now = ceph_clock_now();
if ((now - paxos->last_commit_time) > g_conf->paxos_propose_interval)
delay = (double)g_conf->paxos_min_wait;
else
delay = (double)(g_conf->paxos_propose_interval + paxos->last_commit_time
- now);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I get it now.
can you prepare a backport for mimic and/or luminous? |
* refs/pull/22243/head: Revert "mon: no delay for single message MSG_ALIVE and MSG_PGTEMP" Reviewed-by: Gregory Farnum <gfarnum@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>
sure, will do it |
This change doesn't looks right and causing twice as much proposal as we targeted to (limited by paxos_propose_interval).
Imaging we have a sequence of pg_temp/up_thru during a large recovery.
now =T
The 1st up_thru/pg_temp will go through fast path and trigger propose at T + paxos_min_wait, last_attempted_minwait_time = T.
now = T+ paxos_min_wait
The [2, K] up_thru will failed by (now - last_attempted_minwait_time > g_conf->paxos_propose_interval)
and go through PaxosService::should_propose, which will schedule the propose at) T+paxos_propose_interval
now= T+ paxos_propose_interval + paxos_min_wait
The K+1 up_thru/pg_temp comes, both
(now - last_attempted_minwait_time > g_conf->paxos_propose_interval)
and
(now - paxos->get_last_commit_time() > g_conf->paxos_min_wait)
satisfied, so we trigger another propose at now+ paxos_min_wait = T+ paxos_propose_interval +paxos_min_wait.
clearly we made TWO proposal in each paxos_propose_interval.
This reverts commit ca65210.
Signed-off-by: Xiaoxi CHEN xiaoxchen@ebay.com