New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mimic: osd: crash in OpTracker::unregister_inflight_op via OSD::get_health_metrics #23026

Merged
merged 1 commit into from Oct 17, 2018

Conversation

@smithfarm
Copy link
Contributor

smithfarm commented Jul 13, 2018

common: fix races when visiting a TrackedOp.
Before the commit following race can happen:
  ```
  A           : OpTracker::visit_ops_in_flight(..., callable leaking TrackedOpRef outside)
  A           : Mutex::Locker::Locker(sdata->ops_in_flight_lock_sharded)
  A with lock : (nref > 0) == true

  B           : TrackedOp::put(), nref := 0 // updating the counter is done without the lock
  B           : OpTracker::unregister_inflight_op()
  B           : Mutex::Locker::Locker(sdata->ops_in_flight_lock_sharded)

  A with lock : visit() -> TrackedOp::get(), nref := 1
  A with lock : Mutex::Locker::~Locker()

  B with lock : boost::intrusive::list::iterator_to(op)
  B with lock : boost::intrusive::list::erase(iter)
  B with lock : Mutex::Locker::~Locker()

  A           : TrackedOp::put(), nref := 0
  A           : OpTracker::unregister_inflight_op()
  A           : Mutex::Locker::Locker(sdata->ops_in_flight_lock_sharded)
  A with lock : boost::intrusive::list::iterator_to(op) // oops as op doesn't belong to the list anymore
  ```

Fixes: https://tracker.ceph.com/issues/24664
Related-To: https://tracker.ceph.com/issues/24037
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit 871cbf0)

@smithfarm smithfarm self-assigned this Jul 13, 2018

@smithfarm smithfarm added this to the mimic milestone Jul 13, 2018

@smithfarm smithfarm added common and removed core labels Jul 13, 2018

@rzarzynski
Copy link
Contributor

rzarzynski left a comment

LGTM.

@liewegas

This comment has been minimized.

Copy link
Member

liewegas commented Jul 13, 2018

Let's let this bake in master for a few weeks first. These races are hard to hit.

@smithfarm smithfarm added the DNM label Jul 13, 2018

@gregsfortytwo

This comment has been minimized.

Copy link
Member

gregsfortytwo commented Sep 13, 2018

It's been a couple months; do we think this is ready? (Or we can wait until we're definitely done with the upcoming release; not sure what its branch status is.)

@smithfarm smithfarm added needs-qa mimic-batch-1 and removed DNM labels Oct 3, 2018

@smithfarm

This comment has been minimized.

Copy link
Contributor

smithfarm commented Oct 3, 2018

@gregsfortytwo Queued for 13.2.3

@yuriw

This comment has been minimized.

Copy link
Contributor

yuriw commented Oct 15, 2018

@yuriw yuriw merged commit bc4ac11 into ceph:mimic Oct 17, 2018

4 checks passed

Docs: build check OK - docs built
Details
Signed-off-by all commits in this PR are signed
Details
Unmodified Submodules submodules for project are unmodified
Details
make check make check succeeded
Details

@smithfarm smithfarm deleted the smithfarm:wip-24889-mimic branch Oct 17, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment