Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os/bluestore: fix deferred_queue locking #38934

Merged
merged 1 commit into from
Jan 20, 2021

Conversation

ifed01
Copy link
Contributor

@ifed01 ifed01 commented Jan 16, 2021

#30027 introduced a gap in osr
protection (in _deferred_queue()) which could cause improper deferred_pending value while
processing osr from _deferred_aio_finish().
As a result both segmentation fault in _deferred_aio_finish() or deadlock could occur.

Fixes: https://tracker.ceph.com/issues/48776

Signed-off-by: Igor Fedotov ifedotov@suse.com

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

ceph#30027 introduced a gap in osr
protection (in _deferred_queue()) which could cause improper deferred_pending value while
processing osr from _deferred_aio_finish().
As a result both segmentation fault in _deferred_aio_finish() or deadlock could occur.

Fixes: https://tracker.ceph.com/issues/48776

Signed-off-by: Igor Fedotov <ifedotov@suse.com>
@majianpeng
Copy link
Member

@ifed01 . Sorry, i don't know the root cause from the git log. Can you give more explain? Thanks !

@ifed01
Copy link
Contributor Author

ifed01 commented Jan 18, 2021

@ifed01 . Sorry, i don't know the root cause from the git log. Can you give more explain? Thanks !

@majianpeng - current implementation of BlueStore::_deferred_queue() has an unprotected gap when OSR's deferred_pending is reset and preserved in tmp local var. This might cause concurring _deferred_aio_finish() behave differently and trigger an improper removal from deferred_queue (e.g. https://tracker.ceph.com/issues/48876):

    if (!osr->deferred_pending) {
      dout(20) << __func__ << " dequeueing" << dendl;
      {
	deferred_lock.lock();
	auto q = deferred_queue.iterator_to(*osr);
	deferred_queue.erase(q);
	deferred_lock.unlock();
      }
      osr->deferred_lock.unlock();
    } else {

Another appearance (which I haven't investigated deeply but it dissolves after my patch) is an occasionally hanging ceph_test_objectstore's IORemount/2 test case: https://tracker.ceph.com/issues/48776

Both tickets are quite easily reproducible by multiple running of the above mentioned ceph_test_objectstore's IORemount/2 test case

@majianpeng
Copy link
Member

majianpeng commented Jan 19, 2021

@ifed01 . w/ your pr. In __deffered_queue func,

  txc->osr->deferred_lock.lock();
  {
    if (!txc->osr->deferred_pending) {
      tmp = new DeferredBatch(cct, txc->osr.get());
      txc->osr->deferred_lock.unlock();
// >>For this case, it still don't insert osr into deferred_queue. And  after unlock, _deffered_aio_finish 
//   will call deferred_queue.erase(). So i think for this case we can't unlock. and
//   for deferred_pending != NULL, it can unlock.
      need_lock_again = true;
    } else {
      tmp  = txc->osr->deferred_pending;
    }
  }

@majianpeng
Copy link
Member

@ifed01 . For the hung problem is :

 _deferred_queue()                                                                      _deferred_aio_finish()
txc->osr->deferred_lock.lock();
 tmp  = txc->osr->deferred_pending;
txc->osr->deferred_pending = nullptr;
 txc->osr->deferred_lock.unlock();
                                                                               osr->deferred_lock.lock();
                                                                               osr->deferred_running = nullptr;
                                                                                if (!osr->deferred_pending) {
                                                                              dout(20) << __func__ << " dequeueing" << dendl;
                                                                              {
                                                                                      deferred_lock.lock();
                                                                                      auto q = deferred_queue.iterator_to(*osr);
                                                                                       deferred_queue.erase(q);
                                                                                         deferred_lock.unlock();
                                                                                 }
                                                                                  osr->deferred_lock.unlock();

  txc->osr->deferred_lock.lock();                                                                             
 if (!txc->osr->deferred_running && (tmp->txcs.size() == 1)) {

tmp->txcs.size() > 1. So osr can't insert deferred_queue.
when drain_osr, it will always waitting.

@ifed01
Copy link
Contributor Author

ifed01 commented Jan 19, 2021

@ifed01 . w/ your pr. In __deffered_queue func,

  txc->osr->deferred_lock.lock();
  {
    if (!txc->osr->deferred_pending) {
      tmp = new DeferredBatch(cct, txc->osr.get());
      txc->osr->deferred_lock.unlock();
// >>For this case, it still don't insert osr into deferred_queue. And  after unlock, _deffered_aio_finish 
//   will call deferred_queue.erase(). So i think for this case we can't unlock. and
//   for deferred_pending != NULL, it can unlock.
      need_lock_again = true;
    } else {
      tmp  = txc->osr->deferred_pending;
    }
  }

Agree that we should keep lock for deferred_pending == null case but IMO for deferred_pending we should hold it as well.
Perhaps this is rather a general consideration but since we're performing some manipulations on tmp which is shared as it's equal to deferred_pending) we need to protect it with lock. Not sure this makes much sense for now but just to avoid any issues with future code modifications.

@jtlayton
Copy link
Contributor

jtlayton commented Jan 20, 2021

I can anecdotally confirm that this PR makes the OSD much more stable. It was routinely crashing for me in testing before I applied it. Since then, no OSD crashes.

@majianpeng
Copy link
Member

@ifed01 . w/ your pr. In __deffered_queue func,

  txc->osr->deferred_lock.lock();
  {
    if (!txc->osr->deferred_pending) {
      tmp = new DeferredBatch(cct, txc->osr.get());
      txc->osr->deferred_lock.unlock();
// >>For this case, it still don't insert osr into deferred_queue. And  after unlock, _deffered_aio_finish 
//   will call deferred_queue.erase(). So i think for this case we can't unlock. and
//   for deferred_pending != NULL, it can unlock.
      need_lock_again = true;
    } else {
      tmp  = txc->osr->deferred_pending;
    }
  }

Agree that we should keep lock for deferred_pending == null case but IMO for deferred_pending we should hold it as well.
Perhaps this is rather a general consideration but since we're performing some manipulations on tmp which is shared as it's equal to deferred_pending) we need to protect it with lock. Not sure this makes much sense for now but just to avoid any issues with future code modifications.

I agree your point.

@neha-ojha neha-ojha merged commit aa3eec2 into ceph:master Jan 20, 2021
@neha-ojha
Copy link
Member

@ifed01 could you please create a pacific backport PR for this?

@neha-ojha
Copy link
Member

@ifed01 could you please create a pacific backport PR for this?

#38989

@ifed01 ifed01 deleted the wip-ifed-fix-48776 branch January 26, 2021 13:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants