mds: handle fragment notify race #24580

Merged: 2 commits, Nov 29, 2018

@ukernel ukernel commented Oct 15, 2018

In the normal case, the mds does not trim a dir inode whose child dirfrags
are likely being fragmented (see trim_inode()). But when fragmenting
subtree roots, the following race can happen:

  • mds.a (auth mds of dirfrag) sends a fragment_notify message to
    mds.c and drops its wrlock on dirfragtreelock.
  • mds.b (auth mds of dir inode) changes the dirfragtreelock state to
    SYNC and sends a lock message to mds.c.
  • mds.c receives the lock message and changes the dirfragtreelock state
    to SYNC.
  • mds.c trims the dirfrag and dir inode from its cache.
  • mds.c receives the fragment_notify message, after the dirfrag and
    dir inode it refers to have already been trimmed.

Fixes: http://tracker.ceph.com/issues/36035
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
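
The message ordering in the race above can be illustrated with a toy model of the replica (mds.c). This is only a sketch, not Ceph code: the class, the method names, and the rule that a SYNC lock state makes the inode trimmable are simplifying assumptions made for illustration.

```python
# Toy model of the replica (mds.c). All names are hypothetical; not Ceph code.
class ReplicaCache:
    def __init__(self):
        self.has_dir_inode = True      # replica copy of the dir inode and its dirfrag
        self.dftlock_state = "LOCK"    # dirfragtreelock state as seen by the replica
        self.log = []

    def handle_lock_sync(self):
        # Lock message from mds.b: dirfragtreelock becomes SYNC.
        self.dftlock_state = "SYNC"
        self.log.append("lock->SYNC")

    def try_trim(self):
        # Assumption: once the lock is SYNC, nothing marks the inode as
        # "being fragmented", so the cache is free to trim it.
        if self.has_dir_inode and self.dftlock_state == "SYNC":
            self.has_dir_inode = False
            self.log.append("trimmed")

    def handle_fragment_notify(self):
        # Notify from mds.a: only applicable while the inode is still cached.
        if not self.has_dir_inode:
            self.log.append("notify: inode missing")
            return False
        self.log.append("notify: applied")
        return True


def run(events):
    """Deliver the named events to a fresh replica, in order."""
    replica = ReplicaCache()
    results = [getattr(replica, name)() for name in events]
    return replica, results


# Racy order: the lock message overtakes the notify, and the trim happens in between.
racy, racy_results = run(["handle_lock_sync", "try_trim", "handle_fragment_notify"])

# Benign order: the notify is handled before the lock state changes.
ok, ok_results = run(["handle_fragment_notify", "handle_lock_sync", "try_trim"])

print(racy.log)
print(ok.log)
```

The two messages come from different senders (mds.a and mds.b), so nothing in this model forces the benign order.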

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

@ukernel ukernel added bug-fix cephfs Ceph File System labels Oct 15, 2018
In the normal case, the mds does not trim a dir inode whose child dirfrags
are likely being fragmented (see trim_inode()). But when fragmenting
subtree roots, the following race can happen:

- mds.a (auth mds of dirfrag) sends a fragment_notify message to
  mds.c and drops its wrlock on dirfragtreelock.
- mds.b (auth mds of dir inode) changes the dirfragtreelock state to
  SYNC and sends a lock message to mds.c.
- mds.c receives the lock message and changes the dirfragtreelock state
  to SYNC.
- mds.c trims the dirfrag and dir inode from its cache.
- mds.c receives the fragment_notify message, after the dirfrag and
  dir inode it refers to have already been trimmed.

The fix is to have replicas ack the fragment_notify message and to
unlock dirfragtreelock only after the mds has received all acks.

Fixes: http://tracker.ceph.com/issues/36035
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
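
The ack-gated unlock described in the commit message above can be sketched as follows. This is a minimal illustration, not the actual patch: `FragmentOp`, `pending_acks`, and the method names are hypothetical, and it assumes the mds that sends fragment_notify is the one that collects the acks.

```python
# Hypothetical bookkeeping on the mds that sends fragment_notify. Not Ceph code.
class FragmentOp:
    def __init__(self, replicas):
        self.pending_acks = set(replicas)  # replicas that still owe an ack
        self.wrlocked = True               # wrlock on dirfragtreelock is held
        self.sent = []                     # (message, destination) pairs, for inspection

    def send_notifies(self):
        # Send fragment_notify to every replica but keep the wrlock for now.
        for replica in sorted(self.pending_acks):
            self.sent.append(("fragment_notify", replica))
        self._maybe_unlock()  # covers the case of no replicas at all

    def handle_ack(self, replica):
        # A replica confirmed that it processed the notify.
        self.pending_acks.discard(replica)
        self._maybe_unlock()

    def _maybe_unlock(self):
        # Drop the wrlock only once every replica has acked.
        if not self.pending_acks:
            self.wrlocked = False


op = FragmentOp({"mds.b", "mds.c"})
op.send_notifies()
op.handle_ack("mds.b")
locked_after_first_ack = op.wrlocked   # still waiting on mds.c
op.handle_ack("mds.c")
locked_after_all_acks = op.wrlocked    # all acks in, lock released
```

In this model the lock cannot move to SYNC while the wrlock is held, so a replica cannot see the lock message before it has handled the notify.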

@batrick batrick left a comment

Need to bump cluster protocol?

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
batrick added a commit to batrick/ceph that referenced this pull request Nov 22, 2018
* refs/pull/24580/head:
	mds: bump mds protocol version
	mds: handle fragment notify race
@batrick batrick merged commit 6d83ba6 into ceph:master Nov 29, 2018
batrick added a commit that referenced this pull request Nov 29, 2018
* refs/pull/24580/head:
	mds: bump mds protocol version
	mds: handle fragment notify race

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
@ukernel ukernel deleted the wip-36035 branch November 13, 2019 13:28