Jewel mds: order directories by hash and fix simultaneous readdir races #9655

Merged 12 commits on Jun 13, 2016

Commits on Jun 12, 2016

  1. client: simplify 'offset in frag'

    Don't distinguish the leftmost frag from other frags. Always use 2 as
    the first entry's offset.
    
    Signed-off-by: Yan, Zheng <zyan@redhat.com>
    (cherry picked from commit 6572c2a)
    
    Signed-off-by: Greg Farnum <gfarnum@redhat.com>
    ukernel authored and gregsfortytwo committed Jun 12, 2016
    98e36d1
  2. client: don't allocate dir_result_t::buffer dynamically

    Signed-off-by: Yan, Zheng <zyan@redhat.com>
    (cherry picked from commit c41ceb9)
    
    Signed-off-by: Greg Farnum <gfarnum@redhat.com>
    ukernel authored and gregsfortytwo committed Jun 12, 2016
    75edb5b
  3. client: save readdir result into dir_result_t directly

    The current code saves the readdir result into MetaRequest, then updates
    dir_result_t according to MetaRequest. I can't see any reason why
    we need to do this.
    
    Signed-off-by: Yan, Zheng <zyan@redhat.com>
    (cherry picked from commit db5d60d)
    
    Signed-off-by: Greg Farnum <gfarnum@redhat.com>
    ukernel authored and gregsfortytwo committed Jun 12, 2016
    288160b
  4. mds: sort dentries in CDir in hash order

    This gives us a stable ordering of dentries. (Previously, the ordering
    of dentries changed after a directory was fragmented.)
    
    Signed-off-by: Yan, Zheng <zyan@redhat.com>
    (cherry picked from commit f483224)
    
    Signed-off-by: Greg Farnum <gfarnum@redhat.com>
    ukernel authored and gregsfortytwo committed Jun 12, 2016
    9ce73cd
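
To illustrate why hash ordering is stable across fragmentation, here is a minimal sketch. It is not the CDir code: `toy_hash`, `DentryMap`, and the helper functions are hypothetical names, and the hash is a plain FNV-1a stand-in for whatever hash the MDS actually uses. It assumes fragments partition the hash space, modeled here as a split on the top hash bit.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the MDS name hash (FNV-1a, 32-bit).
inline uint32_t toy_hash(const std::string& name) {
  uint32_t h = 2166136261u;          // FNV-1a offset basis
  for (unsigned char c : name) {
    h ^= c;
    h *= 16777619u;                  // FNV-1a prime
  }
  return h;
}

// Key dentries by (hash, name): iteration order depends only on the
// names, not on how the directory is split into fragments.
using DentryKey = std::pair<uint32_t, std::string>;
using DentryMap = std::map<DentryKey, uint64_t>;  // value: inode number

inline void add_dentry(DentryMap& dir, const std::string& name, uint64_t ino) {
  assert(!name.empty());
  dir[{toy_hash(name), name}] = ino;
}

// List names in (hash, name) order.
inline std::vector<std::string> list_names(const DentryMap& dir) {
  std::vector<std::string> out;
  for (const auto& kv : dir) out.push_back(kv.first.second);
  return out;
}

// Split the directory into two fragments on the top hash bit, then
// list the low fragment followed by the high fragment.
inline std::vector<std::string> list_fragmented(const DentryMap& dir) {
  DentryMap low, high;
  for (const auto& kv : dir)
    ((kv.first.first & 0x80000000u) ? high : low).insert(kv);
  std::vector<std::string> out = list_names(low);
  for (const auto& n : list_names(high)) out.push_back(n);
  return out;
}
```

Under these assumptions, `list_names(dir)` and `list_fragmented(dir)` return the same sequence for any set of names, because every entry in the low fragment sorts before every entry in the high one.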
  5. mds: define end/complete in readdir reply as single u16 flags

    This lets us introduce new flags for the readdir reply.
    
    Signed-off-by: Yan, Zheng <zyan@redhat.com>
    (cherry picked from commit 92cfbdf)
    
    Signed-off-by: Greg Farnum <gfarnum@redhat.com>
    ukernel authored and gregsfortytwo committed Jun 12, 2016
    cf26125
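
A rough sketch of what a single u16 flags field could look like. The flag names and bit positions below are hypothetical, not taken from the commit. Placing "complete" at bit 8 is an assumption, made so that the low and high bytes mirror two separate one-byte fields.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag names and bit positions, for illustration only.
enum : uint16_t {
  READDIR_END      = 1u << 0,   // last reply for this fragment
  READDIR_COMPLETE = 1u << 8,   // reply covers the whole fragment
  READDIR_NEW_FLAG = 1u << 9,   // example of a flag added later
};

// Fold the two former booleans into one flags word.
inline uint16_t pack_flags(bool end, bool complete) {
  uint16_t f = 0;
  if (end) f |= READDIR_END;
  if (complete) f |= READDIR_COMPLETE;
  return f;
}

inline bool is_end(uint16_t flags) { return (flags & READDIR_END) != 0; }
inline bool is_complete(uint16_t flags) { return (flags & READDIR_COMPLETE) != 0; }
```

With this layout a new flag only needs an unused bit. The encoded size of the reply does not change.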
  6. client: fix cached readdir after seekdir

    Client::seekdir doesn't reset dirp->at_cache_name for a forward seek
    within the same frag. So the dentry with name == at_cache_name may not
    be the one prior to the readdir position.
    
    Signed-off-by: Yan, Zheng <zyan@redhat.com>
    (cherry picked from commit 0e32115)
    
    Signed-off-by: Greg Farnum <gfarnum@redhat.com>
    ukernel authored and gregsfortytwo committed Jun 12, 2016
    8361b98
  7. client: record 'offset' for each entry of dir_result_t::buffer

    This is preparation for using the hash value as the dentry 'offset'.
    
    Signed-off-by: Yan, Zheng <zyan@redhat.com>
    (cherry picked from commit bd6546e)
    
    Signed-off-by: Greg Farnum <gfarnum@redhat.com>
    ukernel authored and gregsfortytwo committed Jun 12, 2016
    a65b3ef
  8. client: using hash value to compose dentry offset

    If the MDS sorts dentries in a dirfrag in hash order, we use the hash
    value to compose the dentry offset. The dentry offset is:
    
      (0xff << 52) | ((24 bits hash) << 28) |
      (n, for the nth entry with the same hash)
    
    This offset is stable across directory fragmentation.
    
    Signed-off-by: Yan, Zheng <zyan@redhat.com>
    (cherry picked from commit 680766e)
    
    Signed-off-by: Greg Farnum <gfarnum@redhat.com>
    ukernel authored and gregsfortytwo committed Jun 12, 2016
    3fe5a09
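
The offset formula in the commit message above can be sketched as a pair of pack/unpack helpers. The function and constant names are hypothetical. Only the bit layout (0xff tag at bit 52, 24-bit hash at bit 28, collision index in the low 28 bits) comes from the commit message.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper names; the layout follows the commit message:
//   (0xff << 52) | (24-bit hash << 28) | n
constexpr int      kHashShift = 28;
constexpr uint64_t kIndexMask = (1ULL << kHashShift) - 1;  // low 28 bits
constexpr uint64_t kHashMask  = 0xffffffULL;               // 24 bits
constexpr uint64_t kHashTag   = 0xffULL << 52;             // marks a hash-based offset

// Compose an offset from a hash and the index n among colliding entries.
inline uint64_t make_offset(uint32_t hash, uint32_t n) {
  assert(n <= kIndexMask);
  return kHashTag | ((hash & kHashMask) << kHashShift) | n;
}

inline bool is_hash_offset(uint64_t off) {
  return (off & kHashTag) == kHashTag;
}

inline uint32_t offset_hash(uint64_t off) {
  return static_cast<uint32_t>((off >> kHashShift) & kHashMask);
}

inline uint32_t offset_index(uint64_t off) {
  return static_cast<uint32_t>(off & kIndexMask);
}
```

Because the hash occupies higher bits than the collision index, comparing two such offsets as integers orders entries by hash first, then by index. That is what lets an offset remain meaningful after the directory is split or merged.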
  9. mds: don't reset readdir offset if client supports hash order dentry

    Now the ordering of dentries is stable across directory fragmentation.
    There is no need to reset the readdir offset if the directory gets
    fragmented in the middle of a readdir.
    
    Signed-off-by: Yan, Zheng <zyan@redhat.com>
    (cherry picked from commit 98a01af)
    
    Signed-off-by: Greg Farnum <gfarnum@redhat.com>
    ukernel authored and gregsfortytwo committed Jun 12, 2016
    51a7506
  10. ceph_test_libcephfs: check order of entries in readdir result

    Signed-off-by: Yan, Zheng <zyan@redhat.com>
    (cherry picked from commit 9b17d14)
    
    Signed-off-by: Greg Farnum <gfarnum@redhat.com>
    ukernel authored and gregsfortytwo committed Jun 12, 2016
    f5db278
  11. client: move dir_{release,ordered}_count into class Inode

    We close Inode::dir when it's empty. Once the dir is closed, we lose
    track of {release,ordered}_count. This causes the directory to be
    wrongly marked as complete. (The dir is trimmed to empty in the middle
    of a readdir.)
    
    Signed-off-by: Yan, Zheng <zyan@redhat.com>
    (cherry picked from commit 235fcf6)
    
    Signed-off-by: Greg Farnum <gfarnum@redhat.com>
    ukernel authored and gregsfortytwo committed Jun 12, 2016
    ba9fa11
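
A simplified model of the problem this commit describes, under stated assumptions: the `Dir`, `Inode`, and `ReaddirState` types below are toy stand-ins, not the real client classes, and only the release counter is modeled. The point is that a counter stored on the inode survives the directory cache being closed and reopened mid-readdir.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <set>
#include <string>

// Toy per-directory dentry cache; destroyed when it becomes empty.
struct Dir {
  std::set<std::string> dentries;
};

// Toy inode: the counter lives here, so closing `dir` does not reset it.
struct Inode {
  std::unique_ptr<Dir> dir;
  uint64_t dir_release_count = 0;
  bool complete = false;

  void add(const std::string& name) {
    if (!dir) dir.reset(new Dir);
    dir->dentries.insert(name);
  }

  void trim(const std::string& name) {
    if (!dir) return;
    if (dir->dentries.erase(name)) ++dir_release_count;
    if (dir->dentries.empty()) dir.reset();  // close the empty Dir
  }
};

// Snapshot taken when a readdir starts.
struct ReaddirState {
  uint64_t start_release_count;
};

inline ReaddirState readdir_begin(const Inode& in) {
  return {in.dir_release_count};
}

// Mark the directory complete only if nothing was released meanwhile.
inline void readdir_end(Inode& in, const ReaddirState& st) {
  assert(st.start_release_count <= in.dir_release_count);
  in.complete = (st.start_release_count == in.dir_release_count);
}
```

If the counter were a member of `Dir` instead, trimming the directory to empty and repopulating it would produce a fresh counter of zero, and a readdir that started before the trim could compare equal and mark the directory complete.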
  12. client: fix simultaneous readdirs race

    The current readdir code uses a list to track the order of the dentries
    in readdir replies. When handling a readdir reply, it pushes the
    resulting dentries to the back of the directory's dentry_list. After
    the readdir finishes, the dentry_list reflects how the MDS sorts dentries.
    
    This method is racy when there are simultaneous readdirs. The fix
    is to use a vector instead of a list to track how dentries are sorted
    in their parent directory. As long as shared_gen doesn't change, each
    dentry is at a fixed position in the vector. So concurrent readdirs
    do not affect each other.
    
    Fixes: http://tracker.ceph.com/issues/15508
    Signed-off-by: Yan, Zheng <zyan@redhat.com>
    (cherry picked from commit 9d297c5)
    
    Signed-off-by: Greg Farnum <gfarnum@redhat.com>
    ukernel authored and gregsfortytwo committed Jun 12, 2016
    d61e3dd
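
The difference between the list and vector approaches in the last commit can be sketched as follows. This is an illustration, not the client code: `ReaddirCache`, `record`, and `record_list_style` are hypothetical names, and dentries are reduced to plain strings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <string>
#include <vector>

// Vector-style tracking: each dentry has a fixed slot for a given
// generation, so replaying the same reply twice is harmless.
struct ReaddirCache {
  uint64_t shared_gen = 0;
  std::vector<std::string> slots;

  // Record the dentry found at position idx; ignore stale generations.
  void record(uint64_t gen, std::size_t idx, const std::string& name) {
    assert(!name.empty());
    if (gen != shared_gen) return;
    if (idx >= slots.size()) slots.resize(idx + 1);
    slots[idx] = name;
  }

  // Ordering changed: bump the generation and drop cached positions.
  void invalidate() {
    ++shared_gen;
    slots.clear();
  }
};

// List-style tracking for contrast: every reply appends, so two
// interleaved readdirs duplicate and reorder entries.
inline void record_list_style(std::list<std::string>& order,
                              const std::string& name) {
  order.push_back(name);
}
```

With two readers walking the same directory in lockstep, the vector ends up as `a, b, c` regardless of interleaving, while the list-style tracker accumulates `a, a, b, b, c, c`.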