mimic: osdc: guard op->on_notify_finish with lock #21834

Merged
tchaikov merged 1 commit into ceph:mimic from mimic-23966 on May 9, 2018

@tchaikov tchaikov commented May 6, 2018

Fixes: http://tracker.ceph.com/issues/23966
Signed-off-by: Kefu Chai <kchai@redhat.com>
@tchaikov tchaikov added this to the mimic milestone May 6, 2018
@tchaikov tchaikov requested review from jdurgin and dillaman May 6, 2018 03:14
tchaikov commented May 6, 2018

#1  0x00007fc98df33c39 in Context::complete (this=0x556e940743d0, r=<optimized out>) at /build/ceph-13.1.0-76-ga29c008/src/include/Context.h:77
#2  0x00007fc98df7630c in Objecter::_linger_commit (this=0x556e94018dd0, info=0x556e94074fe0, r=-2, outbl=...) at /build/ceph-13.1.0-76-ga29c008/src/osdc/Objecter.cc:596
#3  0x00007fc98df33c39 in Context::complete (this=0x556e940755f0, r=<optimized out>) at /build/ceph-13.1.0-76-ga29c008/src/include/Context.h:77
#4  0x00007fc98df8bd4e in Objecter::handle_osd_op_reply (this=this@entry=0x556e94018dd0, m=m@entry=0x7fc958001850) at /build/ceph-13.1.0-76-ga29c008/src/osdc/Objecter.cc:3548
#5  0x00007fc98df9764b in Objecter::ms_dispatch (this=0x556e94018dd0, m=0x7fc958001850) at /build/ceph-13.1.0-76-ga29c008/src/osdc/Objecter.cc:973
#6  0x00007fc98df9cde2 in non-virtual thunk to Objecter::ms_fast_dispatch(Message*) () at /build/ceph-13.1.0-76-ga29c008/src/osdc/Objecter.h:2114
#7  0x00007fc985255486 in Messenger::ms_fast_dispatch (m=0x7fc958001850, this=0x556e940182e0) at /build/ceph-13.1.0-76-ga29c008/src/msg/Messenger.h:638
#8  DispatchQueue::fast_dispatch (this=0x556e94018488, m=0x7fc958001850) at /build/ceph-13.1.0-76-ga29c008/src/msg/DispatchQueue.cc:71
#9  0x00007fc9852e5670 in Pipe::reader (this=0x556e94078770) at /build/ceph-13.1.0-76-ga29c008/src/msg/simple/Pipe.cc:1774
#10 0x00007fc9852ec65d in Pipe::Reader::entry (this=<optimized out>) at /build/ceph-13.1.0-76-ga29c008/src/msg/simple/Pipe.h:51
2018-05-05T17:06:40.734 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: process_1_[17109]: starting.
2018-05-05T17:06:40.734 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: process_1_[17109]: creating pool foo.16966
2018-05-05T17:06:40.734 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: process_1_[17109]: created object 0...
2018-05-05T17:06:40.734 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: process_1_[17109]: finishing.
2018-05-05T17:06:40.735 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: process_1_[17109]: shutting down.
2018-05-05T17:06:40.735 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: process_3_[17111]: starting.
2018-05-05T17:06:40.735 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: process_3_[17111]: notifying object 0.obj
2018-05-05T17:06:40.735 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: process_3_[17111]: shutting down.
2018-05-05T17:06:40.735 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: process_2_[17110]: starting.
2018-05-05T17:06:40.735 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: process_2_[17110]: watching object 0.obj
2018-05-05T17:06:40.735 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: process_2_[17110]: shutting down.
2018-05-05T17:06:40.735 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: process_4_[17112]: starting.
2018-05-05T17:06:40.735 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: process_4_[17112]: deleting pool foo.16966
2018-05-05T17:06:40.735 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: process_4_[17112]: shutting down.
2018-05-05T17:06:40.736 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: *******************************
2018-05-05T17:06:40.736 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: process_5_[18746]: starting.
2018-05-05T17:06:40.736 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: process_5_[18746]: creating pool foo.16966.
2018-05-05T17:06:40.736 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: process_5_[18746]: finishing.
2018-05-05T17:06:40.736 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: process_5_[18746]: shutting down.
2018-05-05T17:06:40.736 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: *******************************
2018-05-05T17:06:40.736 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: process_6_[18747]: starting.
2018-05-05T17:06:40.736 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: process_6_[18747]: watching object 0.obj
2018-05-05T17:06:40.736 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: process_6_[18747]: shutting down.
2018-05-05T17:06:40.736 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: *******************************
2018-05-05T17:06:40.737 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: test2: got error: run_until_finished: runnable process_7: got error: [16966]: (33) Numerical argument out of domain
2018-05-05T17:06:40.737 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: *******************************
2018-05-05T17:06:40.737 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: process_8_[18749]: starting.
2018-05-05T17:06:40.737 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: process_8_[18749]: deleting pool foo.16966.
2018-05-05T17:06:40.737 INFO:tasks.workunit.client.0.smithi100.stdout:             watch_notify: process_8_[18749]: shutting down.

/a//kchai-2018-05-05_14:56:43-rados-wip-kefu-testing-2018-05-05-1912-distro-basic-smithi/2481186

This should address some of the failures in http://pulpito.ceph.com/kchai-2018-05-05_14:56:43-rados-wip-kefu-testing-2018-05-05-1912-distro-basic-smithi/

@tchaikov tchaikov modified the milestones: mimic, core May 6, 2018
@tchaikov tchaikov added the core label May 6, 2018
@tchaikov tchaikov modified the milestones: core, mimic May 6, 2018
dillaman commented May 7, 2018

@tchaikov I think the bug is in IoCtxImpl::notify -- a notify_finish_cond.wait(); should be put here [1] (ignoring the output).

[1] https://github.com/ceph/ceph/blob/mimic/src/librados/IoCtxImpl.cc#L1862
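
One reading of the suggestion above, as a minimal sketch rather than the actual librados code: on the error path the caller still blocks until the notify-finish completion has fired, and simply discards its result, so the callback can never run after the caller has returned. `OneShotCond` and `notify_sketch` are hypothetical names built on the standard library; only `notify_finish_cond` comes from the comment, and the `-2` mirrors the `r=-2` seen in the backtrace.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical one-shot completion: complete() may be called from another
// thread, wait() blocks until it has been called and returns its result.
class OneShotCond {
  std::mutex m;
  std::condition_variable cv;
  bool done = false;
  int rval = 0;

public:
  void complete(int r) {
    std::lock_guard<std::mutex> l(m);
    rval = r;
    done = true;
    cv.notify_all();
  }

  int wait() {
    std::unique_lock<std::mutex> l(m);
    cv.wait(l, [this] { return done; });
    return rval;
  }
};

// Hypothetical stand-in for a notify call. A helper thread plays the role of
// the dispatch thread that fires both completions.
int notify_sketch(bool linger_failed) {
  OneShotCond linger_cond;         // result of registering the linger op
  OneShotCond notify_finish_cond;  // fired when the notify itself finishes

  const int result = linger_failed ? -2 : 0;
  std::thread dispatcher([&] {
    linger_cond.complete(result);
    notify_finish_cond.complete(result);
  });

  int r = linger_cond.wait();
  if (r < 0) {
    // Error path: still wait for the finish callback, ignoring its output,
    // so it cannot touch these stack objects after we return.
    notify_finish_cond.wait();
  } else {
    r = notify_finish_cond.wait();
  }

  dispatcher.join();
  return r;
}
```

Both paths wait on `notify_finish_cond`; the only difference is whether the returned value is used.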

tchaikov commented May 9, 2018

@dillaman After applying #21859, the failure does go away, but I think we might need this PR as well: the check-complete-reset sequence on op->on_notify_finish should be guarded with a lock, otherwise it is not atomic. What do you think?
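
To illustrate why check-complete-reset needs a lock, here is a minimal sketch that is not the Objecter code itself. `LingerOpSketch`, `finish_once`, and `race_finish` are hypothetical names; only `on_notify_finish` is taken from the PR title. The callback is checked, taken, and reset under one lock, so even when several threads race to finish the op, it runs exactly once.

```cpp
#include <atomic>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for an op that owns a one-shot completion callback.
struct LingerOpSketch {
  std::mutex lock;
  std::function<void(int)> on_notify_finish;  // empty once consumed

  // Check, take, and reset the callback as one atomic step, then invoke it
  // outside the lock. Returns true if this caller ran the callback.
  bool finish_once(int r) {
    std::function<void(int)> cb;
    {
      std::lock_guard<std::mutex> l(lock);
      if (!on_notify_finish) {
        return false;  // another caller already consumed it
      }
      cb = std::move(on_notify_finish);
      on_notify_finish = nullptr;  // moved-from state is unspecified; reset explicitly
    }
    cb(r);
    return true;
  }
};

// Races n_threads callers against one op and returns how many times the
// callback actually ran.
int race_finish(int n_threads) {
  LingerOpSketch op;
  std::atomic<int> calls{0};
  op.on_notify_finish = [&calls](int) { ++calls; };

  std::vector<std::thread> threads;
  for (int i = 0; i < n_threads; ++i) {
    threads.emplace_back([&op] { op.finish_once(0); });
  }
  for (auto& t : threads) {
    t.join();
  }
  return calls.load();
}
```

Without the lock, two threads could both see a non-empty callback before either resets it, and the completion would run twice or be used after it was freed.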

@dillaman dillaman left a comment

lgtm

@tchaikov tchaikov merged commit 1b38039 into ceph:mimic May 9, 2018
@tchaikov tchaikov deleted the mimic-23966 branch May 9, 2018 13:03