mimic: osdc: guard op->on_notify_finish with lock #21834

Merged
merged 1 commit into from May 9, 2018

2 participants
@tchaikov
Contributor

commented May 6, 2018

Fixes: http://tracker.ceph.com/issues/23966
Signed-off-by: Kefu Chai <kchai@redhat.com>

osdc: guard op->on_notify_finish with lock
Fixes: http://tracker.ceph.com/issues/23966
Signed-off-by: Kefu Chai <kchai@redhat.com>

@tchaikov tchaikov added the bug fix label May 6, 2018

@tchaikov tchaikov added this to the mimic milestone May 6, 2018

@tchaikov tchaikov requested review from jdurgin and dillaman May 6, 2018

@tchaikov

Contributor Author

commented May 6, 2018

#1  0x00007fc98df33c39 in Context::complete (this=0x556e940743d0, r=<optimized out>) at /build/ceph-13.1.0-76-ga29c008/src/include/Context.h:77
#2  0x00007fc98df7630c in Objecter::_linger_commit (this=0x556e94018dd0, info=0x556e94074fe0, r=-2, outbl=...) at /build/ceph-13.1.0-76-ga29c008/src/osdc/Objecter.cc:596
#3  0x00007fc98df33c39 in Context::complete (this=0x556e940755f0, r=<optimized out>) at /build/ceph-13.1.0-76-ga29c008/src/include/Context.h:77
#4  0x00007fc98df8bd4e in Objecter::handle_osd_op_reply (this=this@entry=0x556e94018dd0, m=m@entry=0x7fc958001850) at /build/ceph-13.1.0-76-ga29c008/src/osdc/Objecter.cc:3548
#5  0x00007fc98df9764b in Objecter::ms_dispatch (this=0x556e94018dd0, m=0x7fc958001850) at /build/ceph-13.1.0-76-ga29c008/src/osdc/Objecter.cc:973
#6  0x00007fc98df9cde2 in non-virtual thunk to Objecter::ms_fast_dispatch(Message*) () at /build/ceph-13.1.0-76-ga29c008/src/osdc/Objecter.h:2114
#7  0x00007fc985255486 in Messenger::ms_fast_dispatch (m=0x7fc958001850, this=0x556e940182e0) at /build/ceph-13.1.0-76-ga29c008/src/msg/Messenger.h:638
#8  DispatchQueue::fast_dispatch (this=0x556e94018488, m=0x7fc958001850) at /build/ceph-13.1.0-76-ga29c008/src/msg/DispatchQueue.cc:71
#9  0x00007fc9852e5670 in Pipe::reader (this=0x556e94078770) at /build/ceph-13.1.0-76-ga29c008/src/msg/simple/Pipe.cc:1774
#10 0x00007fc9852ec65d in Pipe::Reader::entry (this=<optimized out>) at /build/ceph-13.1.0-76-ga29c008/src/msg/simple/Pipe.h:51
2018-05-05T17:06:40.734 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: process_1_[17109]: starting.
2018-05-05T17:06:40.734 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: process_1_[17109]: creating pool foo.16966
2018-05-05T17:06:40.734 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: process_1_[17109]: created object 0...
2018-05-05T17:06:40.734 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: process_1_[17109]: finishing.
2018-05-05T17:06:40.735 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: process_1_[17109]: shutting down.
2018-05-05T17:06:40.735 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: process_3_[17111]: starting.
2018-05-05T17:06:40.735 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: process_3_[17111]: notifying object 0.obj
2018-05-05T17:06:40.735 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: process_3_[17111]: shutting down.
2018-05-05T17:06:40.735 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: process_2_[17110]: starting.
2018-05-05T17:06:40.735 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: process_2_[17110]: watching object 0.obj
2018-05-05T17:06:40.735 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: process_2_[17110]: shutting down.
2018-05-05T17:06:40.735 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: process_4_[17112]: starting.
2018-05-05T17:06:40.735 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: process_4_[17112]: deleting pool foo.16966
2018-05-05T17:06:40.735 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: process_4_[17112]: shutting down.
2018-05-05T17:06:40.736 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: *******************************
2018-05-05T17:06:40.736 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: process_5_[18746]: starting.
2018-05-05T17:06:40.736 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: process_5_[18746]: creating pool foo.16966.
2018-05-05T17:06:40.736 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: process_5_[18746]: finishing.
2018-05-05T17:06:40.736 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: process_5_[18746]: shutting down.
2018-05-05T17:06:40.736 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: *******************************
2018-05-05T17:06:40.736 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: process_6_[18747]: starting.
2018-05-05T17:06:40.736 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: process_6_[18747]: watching object 0.obj
2018-05-05T17:06:40.736 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: process_6_[18747]: shutting down.
2018-05-05T17:06:40.736 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: *******************************
2018-05-05T17:06:40.737 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: test2: got error: run_until_finished: runnable process_7: got error: [16966]: (33) Numerical argument out of domain
2018-05-05T17:06:40.737 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: *******************************
2018-05-05T17:06:40.737 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: process_8_[18749]: starting.
2018-05-05T17:06:40.737 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: process_8_[18749]: deleting pool foo.16966.
2018-05-05T17:06:40.737 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: process_8_[18749]: shutting down.

/a//kchai-2018-05-05_14:56:43-rados-wip-kefu-testing-2018-05-05-1912-distro-basic-smithi/2481186

This should address some of the failures in http://pulpito.ceph.com/kchai-2018-05-05_14:56:43-rados-wip-kefu-testing-2018-05-05-1912-distro-basic-smithi/

@tchaikov tchaikov modified the milestones: mimic, core May 6, 2018

@tchaikov tchaikov added the core label May 6, 2018

@tchaikov tchaikov modified the milestones: core, mimic May 6, 2018


@dillaman

Contributor

commented May 7, 2018

@tchaikov I think the bug is in IoCtxImpl::notify -- a notify_finish_cond.wait(); should be put here [1] (ignoring the output).

[1] https://github.com/ceph/ceph/blob/mimic/src/librados/IoCtxImpl.cc#L1862
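One way to read this suggestion: a synchronous call that hands a stack-allocated waiter to an asynchronous completion should wait on it even on the error path, so the completion never fires against an object that has already gone out of scope. The sketch below illustrates only that pattern. `OneShotCond` and `sync_call` are hypothetical names, not Ceph's classes or the code in `IoCtxImpl::notify`.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical one-shot waiter (illustrative only, not Ceph's implementation).
class OneShotCond {
  std::mutex m;
  std::condition_variable cv;
  bool done = false;
  int rval = 0;

public:
  // Called by the asynchronous side when the operation finishes.
  void complete(int r) {
    std::lock_guard<std::mutex> g(m);
    rval = r;
    done = true;
    cv.notify_all();
  }

  // Blocks until complete() has been called, then returns its value.
  int wait() {
    std::unique_lock<std::mutex> l(m);
    cv.wait(l, [this] { return done; });
    return rval;
  }
};

// Simulated synchronous call. `submit_result` stands in for the result of
// submitting the request, `async_result` for the value delivered later.
int sync_call(int submit_result, int async_result) {
  OneShotCond finish;  // lives on this stack frame

  // Stand-in for a dispatch thread that completes the waiter asynchronously.
  std::thread completer(
      [&finish, async_result] { finish.complete(async_result); });

  int r = submit_result;
  if (r < 0) {
    // Error path: still wait, discarding the value, so `finish` is not
    // destroyed while the completer may still call complete() on it.
    finish.wait();
  } else {
    r = finish.wait();
  }

  // Joining is only needed to clean up the std::thread in this demo; a real
  // dispatch thread cannot be joined, which is why the wait above matters.
  completer.join();
  return r;
}
```

Here `sync_call(-2, 0)` returns the submit error unchanged, while `sync_call(0, 7)` returns the value delivered by the completion.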

@tchaikov

Contributor Author

commented May 9, 2018

@dillaman After applying #21859, the failure does go away. But I think we might need this PR as well, because the check-complete-reset sequence on op->on_notify_finish should be guarded by a lock; otherwise it is not atomic. What do you think?
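To make the atomicity concern concrete, here is a minimal sketch of a check-complete-reset sequence guarded by a mutex. It is not the code in this PR: `NotifyOp`, `complete_once` and `race_completions` are hypothetical names, and running the callback after releasing the lock is a design choice of this sketch, not something the PR states.

```cpp
#include <atomic>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for an op that owns a one-shot completion callback.
struct NotifyOp {
  std::mutex lock;                     // guards on_finish
  std::function<void(int)> on_finish;  // plays the role of a finish callback
};

// Check, take and reset the callback under the lock, then run it.
// Returns true only for the caller that actually ran the callback.
bool complete_once(NotifyOp& op, int r) {
  std::function<void(int)> cb;
  {
    std::lock_guard<std::mutex> g(op.lock);
    if (!op.on_finish) {
      return false;  // already completed (or never set)
    }
    cb = std::move(op.on_finish);
    op.on_finish = nullptr;  // explicit reset; moved-from state is unspecified
  }
  cb(r);  // invoked outside the lock in this sketch
  return true;
}

// Race `nthreads` callers against one op and count callback invocations.
int race_completions(int nthreads) {
  NotifyOp op;
  std::atomic<int> calls{0};
  op.on_finish = [&calls](int) { ++calls; };

  std::vector<std::thread> threads;
  for (int i = 0; i < nthreads; ++i) {
    threads.emplace_back([&op] { complete_once(op, 0); });
  }
  for (auto& t : threads) {
    t.join();
  }
  return calls.load();
}
```

Without the lock, two threads could both see a non-null callback before either resets it, so the callback could run twice or be invoked after it was cleared. With the lock, `race_completions` should report exactly one invocation regardless of the number of threads.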

@dillaman
Contributor

left a comment

lgtm

@tchaikov tchaikov merged commit 1b38039 into ceph:mimic May 9, 2018

4 checks passed

Docs: build check: OK - docs built
Signed-off-by: all commits in this PR are signed
Unmodified Submodules: submodules for project are unmodified
make check: make check succeeded

@tchaikov tchaikov deleted the tchaikov:mimic-23966 branch May 9, 2018
