Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

paxos fixes #457

Merged
merged 7 commits into from Jul 23, 2013
Merged

paxos fixes #457

merged 7 commits into from Jul 23, 2013

Conversation

liewegas
Copy link
Member

No description provided.

@liewegas
Copy link
Member Author

there is a set of logs referenced in the bug, #5698

@@ -370,9 +394,13 @@ void Paxos::handle_last(MMonPaxos *last)
return;
}

assert(g_conf->paxos_kill_at != 1);

// store any committed values if any are specified in the message
store_state(last);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to guard this function from old values too, not just the changes to our uncommitted values down below. store_state() looks like it tramples over whatever's there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

share/store_state are only dealing with committed values, so if they trample over an uncommitted value it is with good reason. it's also only writing new stuff outside of [first_committed,last_committed]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good enough.

@gregsfortytwo
Copy link
Member

Okay, this all looks good to me modulo a couple commit message typos.
Not sure if/when we want to merge this though, so I'll leave that for you guys to decide on.

Sage Weil added 7 commits July 22, 2013 14:12
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>
We may have an uncommitted value from our perspective (it is our lc + 1)
when the collector has a much larger lc (because we have been out for
the last few rounds).  Only share an uncommitted value if it is in fact
the next value.

Signed-off-by: Sage Weil <sage@inktank.com>
If an older peer sends an uncommitted value, make sure we only take it
if it is in the future, and at least as new as any current uncommitted
value.

(Prior to the previous patch, peers could send values from long-past
rounds.  The pn values are also bogus.)

Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>
During the collect/last exchange, peers share any uncommitted values
with the leader.  They are supposed to also share the pn under which
that value was accepted, but were instead using the just-accepted pn
value.  This effectively meant that we *always* took the uncommitted
value; if there were multiples, which one we accepted depended on what
order the LAST messages arrived, not which pn the values were generated
under.

The specific failure sequence I observed:

 - collect
  - learned uncommitted value for 262 from myself
  - send collect with pn 901
 - got last with pn 901 (incorrect) for 200 (old) from peer
  - discard our own value, remember the other
 - finish collect phase
  - ignore old uncommitted value

Fix this by storing a pending_v and pending_pn value whenever we accept
a value.  Use this to send an appropriate pn value in the LAST reply
so that the leader can make it's decision about which uncommitted value
to accept based on accurate information.  Also use it when we learn
the uncommitted value from ourselves.

We could probably be more clever about storing less information here,
for example by omitting pending_v and clearing pending_pn at the
appropriate point, but that would be more fragile.  Similarly, we could
store a pn for *every* commit if we wanted to lay some groundwork for
having multiple uncommitted proposals in flight, but I don't want to
speculate about what is necessary or sufficient for a correct solution
there.

Fixes: #5698
Backport: cuttlefish, bobtail
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>
liewegas pushed a commit that referenced this pull request Jul 23, 2013
paxos fixes

Reviewed-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
@liewegas liewegas merged commit 9626f77 into next Jul 23, 2013
@liewegas liewegas deleted the wip-paxos branch July 23, 2013 01:28
mgfritch added a commit to mgfritch/ceph that referenced this pull request Mar 11, 2022
mgr/prometheus: Fix regression with OSD/host details/overview dashboards

Reviewed-by: Michael Fritch <mfritch@suse.com>
tobias-urdin pushed a commit to tobias-urdin/ceph that referenced this pull request Aug 2, 2023
remove tests that fail on boto3's parameter validation
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants