#11481: MDS resilience to weird mdsmaps #4658

Merged
merged 5 commits into from May 19, 2015

4 participants
@jcsp
Contributor

commented May 12, 2015

Plus some bonus bits from when I was looking through this code.

Boost statemachine wasn't the right tool for the job here: to use it we'd need code to translate each (oldstate, state) tuple into an event (and reject state transitions that weren't valid events), by which point we would already have done all the validation we need.

@gregsfortytwo gregsfortytwo self-assigned this May 12, 2015

@gregsfortytwo

Member

commented May 13, 2015

Included in tonight's greg-fs-testing run to make sure it doesn't break anything.

@gregsfortytwo gregsfortytwo assigned jcsp and unassigned gregsfortytwo May 13, 2015

@gregsfortytwo

Member

commented May 13, 2015

ubuntu@teuthology:/a/ubuntu-2015-05-12_23:32:41-fs-greg-fs-testing-testing-basic-multi/889677

mds.0.446 Invalid state transition up:replay->up:reconnect

John Spray added some commits May 12, 2015

John Spray
mds: on suicide(), send a DNE beacon to MDSMonitor
...using the same timeout beacon send routine that
was created for damaged().

Signed-off-by: John Spray <john.spray@redhat.com>
John Spray
mon: handle DNE beacon from MDS
...by calling fail_mds_gid for that MDS.

Signed-off-by: John Spray <john.spray@redhat.com>
John Spray
mds: respawn instead of suicide on blacklist
This was already the case in general, but the case
in RecoveryQueue slipped through.

Signed-off-by: John Spray <john.spray@redhat.com>
John Spray
mds: validate the state+rank in MDS map
Especially:
 * once I have been assigned a rank, it
can't be taken away without restarting
the daemon.
 * once I have entered standby, I can
only go upwards through the states.

Fixes: #11481
Signed-off-by: John Spray <john.spray@redhat.com>
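
As a rough illustration of the two rules in the commit message above, here is a minimal Python sketch. It is not the code from this PR (the MDS is written in C++). The `State` ordering, `RANK_NONE`, and `is_valid_update` are hypothetical names chosen for the example, and treating any change of an assigned rank as invalid is an assumption.

```python
from enum import IntEnum


class State(IntEnum):
    # Hypothetical ordering for illustration; higher value = further along.
    STANDBY = 0
    REPLAY = 1
    RESOLVE = 2
    RECONNECT = 3
    REJOIN = 4
    CLIENTREPLAY = 5
    ACTIVE = 6


RANK_NONE = -1  # hypothetical marker for "no rank assigned"


def is_valid_update(old_rank, old_state, new_rank, new_state):
    """Return True if a new map entry is consistent with the old one."""
    # Rule 1: once a rank has been assigned, it can't be taken away
    # (assumption: changing to a different rank is rejected as well).
    if old_rank != RANK_NONE and new_rank != old_rank:
        return False
    # Rule 2: after entering standby, the state may only move upwards.
    # Skipping intermediate states is allowed; moving backwards is not.
    if new_state < old_state:
        return False
    return True


# Standby daemon is given rank 0 and starts replay: accepted.
print(is_valid_update(RANK_NONE, State.STANDBY, 0, State.REPLAY))
# Rank 0 is taken away from a running daemon: rejected.
print(is_valid_update(0, State.REJOIN, RANK_NONE, State.STANDBY))
```

A monotonic check like this accepts transitions that skip a state (for example replay straight to reconnect) while still rejecting any move backwards.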

@jcsp jcsp force-pushed the wip-11481 branch from 13fdd67 to 0c5b9a2 May 14, 2015

@jcsp

Contributor Author

commented May 14, 2015

Updated to handle the cases where the MDS skips past RESOLVE or CLIENTREPLAY.

The "handle DNE beacon from MDS" commit was also broken, because it didn't propose the new OSDMap after blacklisting; that's fixed now.

@jcsp jcsp assigned gregsfortytwo and unassigned jcsp May 14, 2015

@gregsfortytwo

Member

commented May 15, 2015

Running through greg-fs-testing again.

John Spray
mds: fix handle_mds_map in standby_replay
Broken by "mds: validate the state+rank in MDS map"

Signed-off-by: John Spray <john.spray@redhat.com>
@jcsp

Contributor Author

commented May 15, 2015

Pushed a fix. @gregsfortytwo are you planning on scheduling another suite today?

@gregsfortytwo

Member

commented May 15, 2015

I think so

ukernel added a commit that referenced this pull request May 19, 2015

Merge pull request #4658 from ceph/wip-11481
#11481: MDS resilience to weird mdsmaps

@ukernel ukernel merged commit e585ddf into master May 19, 2015

@ukernel ukernel deleted the wip-11481 branch May 19, 2015

@gregsfortytwo

Member

commented May 19, 2015

@ukernel please remember to include Reviewed-by: lines when merging. It's important to demonstrate that the reviews have happened, in addition to leaving breadcrumbs that help when things break and feeding the release statistics. ;)
