Jewel mds: order directories by hash and fix simultaneous readdir races #9655
Merged
Don't distinguish the leftmost frag from other frags; always use 2 as the first entry's offset. Signed-off-by: Yan, Zheng <zyan@redhat.com> (cherry picked from commit 6572c2a) Signed-off-by: Greg Farnum <gfarnum@redhat.com>
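As an illustration of "always use 2 as the first entry's offset", here is a minimal sketch. It assumes a readdir position is split into a high part (the frag) and a low 28-bit part (the entry index within the frag), with indices 0 and 1 reserved for "." and "..". The names `make_pos` and `first_entry_pos` are hypothetical and are not Ceph's actual helpers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout: high bits = frag, low kShift bits = entry index in frag.
constexpr unsigned kShift = 28;
// Indices 0 and 1 are assumed to be reserved for "." and "..".
constexpr uint32_t kFirstEntryOffset = 2;

uint64_t make_pos(uint32_t frag, uint32_t index) {
  assert(index < (1u << kShift));  // index must fit in the low bits
  return (static_cast<uint64_t>(frag) << kShift) | index;
}

// No special case for the leftmost frag: every frag starts at index 2.
uint64_t first_entry_pos(uint32_t frag) {
  return make_pos(frag, kFirstEntryOffset);
}
```

With one rule for every frag, the client never has to ask whether a frag is leftmost before interpreting the low bits of a position.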
Signed-off-by: Yan, Zheng <zyan@redhat.com> (cherry picked from commit c41ceb9) Signed-off-by: Greg Farnum <gfarnum@redhat.com>
The current code saves the readdir result into the MetaRequest, then updates dir_result_t according to the MetaRequest. I can't see any reason why we need to do this. Signed-off-by: Yan, Zheng <zyan@redhat.com> (cherry picked from commit db5d60d) Signed-off-by: Greg Farnum <gfarnum@redhat.com>
This gives us a stable ordering of dentries. (Previously, the ordering of dentries changed after the directory got fragmented.) Signed-off-by: Yan, Zheng <zyan@redhat.com> (cherry picked from commit f483224) Signed-off-by: Greg Farnum <gfarnum@redhat.com>
so that we can introduce new flags for the readdir reply. Signed-off-by: Yan, Zheng <zyan@redhat.com> (cherry picked from commit 92cfbdf) Signed-off-by: Greg Farnum <gfarnum@redhat.com>
Client::seekdir doesn't reset dirp->at_cache_name for a forward seek within the same frag, so the dentry with name == at_cache_name may not be the one immediately before the readdir position. Signed-off-by: Yan, Zheng <zyan@redhat.com> (cherry picked from commit 0e32115) Signed-off-by: Greg Farnum <gfarnum@redhat.com>
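The seekdir problem can be shown with a simplified cursor. This is a hypothetical sketch, not Ceph's `dir_result_t`: it assumes the cursor caches the name of the dentry just before `offset`, and that any change of `offset` (forward or backward, same frag or not) must invalidate that cached name.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical, simplified readdir cursor (not Ceph's dir_result_t).
struct DirCursor {
  uint64_t offset = 2;
  std::string at_cache_name;  // name of the dentry just before `offset`

  void seekdir(uint64_t new_offset) {
    if (new_offset != offset) {
      // Any real seek, including a forward seek within the same frag,
      // means at_cache_name no longer names the entry before new_offset.
      at_cache_name.clear();
    }
    offset = new_offset;
  }
};
```

Seeking to the current offset keeps the cached name, since it is still accurate; every other seek drops it so the next readdir cannot resume from the wrong dentry.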
This is preparation for using the hash value as the dentry 'offset'. Signed-off-by: Yan, Zheng <zyan@redhat.com> (cherry picked from commit bd6546e) Signed-off-by: Greg Farnum <gfarnum@redhat.com>
If the MDS sorts dentries in a dirfrag in hash order, we use the hash value to compose the dentry offset. The dentry offset is: (0xff << 52) | ((24-bit hash) << 28) | (the nth entry with a hash collision). This offset is stable across directory fragmentation. Signed-off-by: Yan, Zheng <zyan@redhat.com> (cherry picked from commit 680766e) Signed-off-by: Greg Farnum <gfarnum@redhat.com>
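The offset layout from the commit message can be written out as bit operations. This is a sketch of that formula only; the function names are hypothetical. Bits 52 to 59 carry the 0xff marker, bits 28 to 51 the 24-bit hash, and the low 28 bits the index among entries that share the same hash.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned kShift = 28;
constexpr uint64_t kHashFlag = 0xffULL << (kShift + 24);  // == 0xff << 52
constexpr uint64_t kLowMask = (1ULL << kShift) - 1;

// (0xff << 52) | ((24-bit hash) << 28) | (nth entry with this hash)
uint64_t make_hash_offset(uint32_t hash24, uint32_t nth) {
  assert(hash24 < (1u << 24));
  assert(nth <= kLowMask);
  return kHashFlag | (static_cast<uint64_t>(hash24) << kShift) | nth;
}

bool is_hash_offset(uint64_t off) { return (off & kHashFlag) == kHashFlag; }

uint32_t offset_hash(uint64_t off) {
  return static_cast<uint32_t>((off >> kShift) & 0xffffff);
}

uint32_t offset_nth(uint64_t off) {
  return static_cast<uint32_t>(off & kLowMask);
}
```

Because the hash sits above the collision index, comparing two such offsets as integers orders them by hash first, matching the MDS's hash ordering; nothing in the value depends on which frag currently holds the entry.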
Now the ordering of dentries is stable across directory fragmentation, so there is no need to reset the readdir offset if the directory gets fragmented in the middle of a readdir. Signed-off-by: Yan, Zheng <zyan@redhat.com> (cherry picked from commit 98a01af) Signed-off-by: Greg Farnum <gfarnum@redhat.com>
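A toy model shows why fragmentation no longer forces an offset reset. It assumes (hypothetically) that a dirfrag covers a half-open hash range and lists its dentries sorted by (hash, name). Reading the frags left to right then yields the same sequence whether the directory is one frag or has been split, so a hash-based position stays valid.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: a dentry is (hash, name).
using Dentry = std::pair<uint32_t, std::string>;

// List the dentries of one frag covering hashes in [lo, hi), in hash order
// (ties broken by name in this model).
std::vector<Dentry> list_frag(const std::vector<Dentry>& all, uint64_t lo, uint64_t hi) {
  std::vector<Dentry> out;
  for (const auto& d : all)
    if (d.first >= lo && d.first < hi) out.push_back(d);
  std::sort(out.begin(), out.end());
  return out;
}

// Read the whole directory frag by frag; `bounds` are the frag boundaries.
std::vector<Dentry> readdir_all(const std::vector<Dentry>& all,
                                const std::vector<uint64_t>& bounds) {
  std::vector<Dentry> out;
  for (std::size_t i = 0; i + 1 < bounds.size(); ++i) {
    auto part = list_frag(all, bounds[i], bounds[i + 1]);
    out.insert(out.end(), part.begin(), part.end());
  }
  return out;
}
```

With frag-relative indices instead, an entry's index would change when its frag splits (an entry that was third in the parent can become first in a child), which is why the old code had to reset the readdir offset.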
Signed-off-by: Yan, Zheng <zyan@redhat.com> (cherry picked from commit 9b17d14) Signed-off-by: Greg Farnum <gfarnum@redhat.com>
We close Inode::dir when it's empty. Once the dir is closed, we lose track of {release,ordered}_count. This causes the directory to be wrongly marked as complete when the dir is trimmed to empty in the middle of a readdir. Signed-off-by: Yan, Zheng <zyan@redhat.com> (cherry picked from commit 235fcf6) Signed-off-by: Greg Farnum <gfarnum@redhat.com>
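One way to avoid losing the counters is to keep them on the inode rather than on the closable dir object. The sketch below assumes that approach; the types and member names are simplified and hypothetical, not Ceph's actual `Inode`/`Dir`. A readdir snapshots the release count at its start and may mark the directory complete only if the count is unchanged, even if the dir object was closed in between.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical, simplified types (not Ceph's Inode/Dir).
struct Dir { /* cached dentries would live here */ };

struct Inode {
  std::unique_ptr<Dir> dir;        // reset when the dir becomes empty
  uint64_t dir_release_count = 1;  // stored on the inode, so it survives
  uint64_t dir_ordered_count = 1;  // closing `dir`

  void close_dir() { dir.reset(); }
  void on_dentry_released() { ++dir_release_count; }
};

// Completeness check made at the end of a readdir.
bool can_mark_complete(const Inode& in, uint64_t release_count_at_start) {
  return in.dir_release_count == release_count_at_start;
}
```

If the counters lived inside `Dir`, closing and reopening it would restart them and the comparison above could falsely succeed.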
The current readdir code uses a list to track the order of the dentries in readdir replies. When handling a readdir reply, it pushes the resulting dentries to the back of the directory's dentry_list. After readdir finishes, the dentry_list reflects how the MDS sorts dentries. This method is racy when there are simultaneous readdirs. The fix is to use a vector instead of a list to track how dentries are sorted in their parent directory. As long as shared_gen doesn't change, each dentry is at a fixed position in the vector, so concurrent readdirs do not affect each other. Fixes: http://tracker.ceph.com/issues/15508 Signed-off-by: Yan, Zheng <zyan@redhat.com> (cherry picked from commit 9d297c5) Signed-off-by: Greg Farnum <gfarnum@redhat.com>
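The difference between the list and the vector can be sketched as follows. This is a hypothetical, simplified cache, not the Ceph client code: each reply says at which position its batch starts, and the batch is written into those fixed slots. Interleaved or repeated replies from concurrent readdirs therefore land in the same places, whereas appending to a list would yield an order like c, d, a, b, c, d.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical cache: slot i holds the dentry at readdir position i, valid
// only for the shared_gen under which it was filled.
struct ReaddirCache {
  uint64_t shared_gen = 0;
  std::vector<std::string> slots;

  // Record one readdir reply whose first entry is at position `start`.
  void fill(uint64_t gen, std::size_t start, const std::vector<std::string>& names) {
    if (gen != shared_gen) {  // directory changed: cached positions are stale
      slots.clear();
      shared_gen = gen;
    }
    if (slots.size() < start + names.size())
      slots.resize(start + names.size());
    for (std::size_t i = 0; i < names.size(); ++i)
      slots[start + i] = names[i];  // position does not depend on arrival order
  }
};
```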
gregsfortytwo
changed the title
Jewel mds: order directories by hash and fix simultaneous readdir races
DNM Jewel mds: order directories by hash and fix simultaneous readdir races
Jun 12, 2016
DNM until testing is done (shortly).
Changelog:
gregsfortytwo
changed the title
DNM Jewel mds: order directories by hash and fix simultaneous readdir races
Jewel mds: order directories by hash and fix simultaneous readdir races
Jun 13, 2016
http://pulpito.ceph.com/gregf-2016-06-12_19:51:59-kcephfs-greg-fs-jewel-testing---basic-mira/ had SELinux failures and missing support for pool namespace vxattrs (just an old kernel, I think).
http://pulpito.ceph.com/gregf-2016-06-12_16:01:34-fs-greg-fs-jewel-testing---basic-mira/ had some OSD issues but is otherwise good.
http://tracker.ceph.com/issues/16251
Ordering directories by hash makes readdir stable across directory fragmentation, and allows
an easier fix for racing readdirs on the client side.