Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mds: fix cond that prevents sending MLock messages to starting MDS #21577

Closed
wants to merge 1 commit into from

Conversation

batrick
Copy link
Member

@batrick batrick commented Apr 21, 2018

Fixes: http://tracker.ceph.com/issues/23812

Signed-off-by: Patrick Donnelly pdonnell@redhat.com

@batrick
Copy link
Member Author

batrick commented Apr 21, 2018

This is not quite right. That actual fix is to change MDSMap::is_degraded to return true if a rank is starting. I'll fix that shortly. (Time for dinner!)

@@ -890,7 +890,7 @@ void Locker::eval_gather(SimpleLock *lock, bool first, bool *pneed_issue, list<M
return;
}

if (!mds->is_cluster_degraded() ||
if (!mds->is_cluster_degraded() &&
mds->mdsmap->get_state(auth) >= MDSMap::STATE_REJOIN) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not wrong. !mds->is_cluster_degraded() check is an optimization, it avoids checking target mds' state for each message

@batrick
Copy link
Member Author

batrick commented Apr 21, 2018

@ukernel PTAL. It looks to fix the problem in my testing. Please also look at #18278 (comment)

@ukernel
Copy link
Contributor

ukernel commented Apr 21, 2018

The root cause is that mds discovers root inode when it's still in starting state. It should handle lock message after it get replica of root inode. I think a better to discover root inode when mds is active. (just like the code for creating new mds rank). The subtle part is that, when mds is in starting state, it needs to load dirfrag of its mdsdir, then creates a new log segments

@batrick
Copy link
Member Author

batrick commented Apr 23, 2018

@ukernel, that may be the cause but does it need fixed if we do this patch? It seems to me that having one rank in up:starting is actually a degraded state since the rank is unavailable which could prevent lock processing

@batrick
Copy link
Member Author

batrick commented Apr 23, 2018

Actually what I said makes no sense since a up:starting MDS wouldn't be holding any locks or authoritative for any subtrees.

Still, I feel "degraded" might not be the right thing here though and perhaps we just need a different term to represent that the MDS cluster has ranks which are creating/starting/recovering.

@ukernel
Copy link
Contributor

ukernel commented Apr 23, 2018

I think degraded is OK for cluster with failed or recovering mds. If we make mds not discover inode when it's in creating/starting state, I think we don't need to create new state for cluster with creating/starting mds.

@ukernel
Copy link
Contributor

ukernel commented Apr 23, 2018

@batrick do you want me to fix #23812

@batrick
Copy link
Member Author

batrick commented Apr 23, 2018

Sure!

@batrick batrick closed this Apr 23, 2018
@batrick batrick deleted the i23812 branch April 23, 2018 17:51
@batrick batrick restored the i23812 branch May 23, 2018 18:45
@batrick batrick deleted the i23812 branch September 7, 2018 01:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
2 participants