msg/async/rdma: fix Tx buffer leakage that can introduce "heartbeat no reply" #18053

Merged: 3 commits, Oct 3, 2017

Commits on Sep 30, 2017

  1. msg/async/rdma: fix Tx buffer leakage which can introduce "heartbeat no reply"

    The "heartbeat no reply" failures come from running out of Tx buffers. This
    can be reproduced by marking some OSDs down in a big Ceph cluster, say 300+
    OSDs.
    
    Root cause: when RDMAStack deletes faulty connections, their QPs may
    still have inflight CQEs, and therefore inflight Tx buffers. If the QPs
    are destroyed without waiting for those completions, the buffers are
    never returned, and the Tx buffer pool eventually runs out.
    
    Fix: ideally, QPs would be destroyed gracefully, for example via
    to_dead(). For now we just rely on the number of Tx WQEs and CQEs to
    avoid buffer leakage. RDMAStack polling is always running, so it is safe
    to simply bypass QPs that are not yet in the 'complete' state.
    
    Signed-off-by: Yan Lei <yongyou.yl@alibaba-inc.com>
    ownedu committed Sep 30, 2017
    92c3499
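
The root cause and fix described in this commit message can be illustrated with a small model. This is only a sketch under stated assumptions, not the code from the PR: `FakeQP`, `TxPool`, `post_send`, `on_tx_completion`, and `reap_dead_qps` are hypothetical names, and the buffer pool is reduced to a free count. It shows a polling pass that skips dead QPs whose Tx WQE and CQE counts do not yet match, so their buffers return to the pool before the QP is destroyed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a queue pair; not the Ceph QueuePair class.
struct FakeQP {
  unsigned tx_wqe = 0;  // Tx work requests posted so far
  unsigned tx_cqe = 0;  // Tx completions polled so far

  // "Complete" here means every posted Tx WQE has produced a CQE.
  bool tx_complete() const { return tx_wqe == tx_cqe; }
};

// Hypothetical fixed-size Tx buffer pool, modelled as a free count only.
struct TxPool {
  std::size_t free_buffers;
  explicit TxPool(std::size_t n) : free_buffers(n) {}
};

// Post one send: take a buffer from the pool and record the WQE.
inline bool post_send(TxPool& pool, FakeQP& qp) {
  if (pool.free_buffers == 0) return false;  // out of Tx buffers
  --pool.free_buffers;
  ++qp.tx_wqe;
  return true;
}

// Handle one polled Tx completion: record the CQE and return the buffer.
inline void on_tx_completion(TxPool& pool, FakeQP& qp) {
  assert(qp.tx_cqe < qp.tx_wqe);  // a CQE must match an earlier WQE
  ++qp.tx_cqe;
  ++pool.free_buffers;
}

// One polling pass over dead QPs: destroy only those with no inflight Tx
// work, and bypass the rest so a later pass can retry them.
inline std::size_t reap_dead_qps(std::vector<FakeQP>& dead) {
  std::size_t reaped = 0;
  for (auto it = dead.begin(); it != dead.end();) {
    if (it->tx_complete()) {
      it = dead.erase(it);
      ++reaped;
    } else {
      ++it;
    }
  }
  return reaped;
}
```

In this model, destroying a QP while `tx_wqe != tx_cqe` would drop the only record of its outstanding buffers, which is the leak the commit message describes. Because polling runs continuously, bypassing an incomplete QP only delays its destruction.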

Commits on Oct 1, 2017

  1. Addressing CR comments from tchaikov (Kefu Chai).

    Signed-off-by: Yan Lei <yongyou.yl@alibaba-inc.com>
    ownedu committed Oct 1, 2017
    303e640
  2. Addressing CR comments from alex-mikheev (Alex Mikheev), to use a single atomic counter for inflight Tx CQEs.
    
    Signed-off-by: Yan Lei <yongyou.yl@alibaba-inc.com>
    ownedu committed Oct 1, 2017
    e323771
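
The single atomic counter mentioned in the last commit could look roughly like the following. This is a hedged sketch, not the PR's code: the class name `InflightTx` and its methods are hypothetical. The assumed usage is that the counter is raised when Tx work requests are posted and lowered when their completions are polled, so a QP is safe to destroy once the counter reads zero.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical per-QP counter of Tx work requests that have been posted
// but whose completions have not been polled yet.
class InflightTx {
  std::atomic<std::uint32_t> inflight{0};

public:
  // Call after posting n Tx work requests.
  void add(std::uint32_t n) { inflight.fetch_add(n); }

  // Call after polling n Tx completions for this QP.
  void dec(std::uint32_t n) {
    std::uint32_t prev = inflight.fetch_sub(n);
    assert(prev >= n);  // more completions than posts would be a bug
    (void)prev;         // silence unused warning when NDEBUG is defined
  }

  std::uint32_t get() const { return inflight.load(); }

  // True once every posted Tx work request has completed.
  bool drained() const { return get() == 0; }
};
```

One counter leaves a single value to check before destroying a QP, and the atomic operations let the posting path and the polling path update it without an extra lock.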