New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

msg/simple: call clear_pipe in wait() shutdown path #12633

Merged
merged 1 commit into from Dec 30, 2016

Conversation

Projects
None yet
2 participants
@liewegas
Member

liewegas commented Dec 22, 2016

No description provided.

msg/simple: clear_pipe when wait() is mopping up pipes
When wait is mopping up connections it may hit one that
is in the process of accepting.  It will unregister it
whilst the accept() thread is trying to set it up,
aborting the accept and getting it reaped.  However,
the pipe mop-up does not clear_pipe() the way that
mark_down(), mark_down_all(), and fault() do, which
leads to this assert.

Pipe is accepting...

  -161> 2016-12-22 17:31:45.460613 37353700 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e2a5c20 sd=31 :6804 s=0 pgs=0 cs=0 l=1 c=0x3e2a6f40).accept:  setting up session_security.
  -160> 2016-12-22 17:31:45.460733 37353700 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e2a5c20 sd=31 :6804 s=0 pgs=0 cs=0 l=1 c=0x3e2a6f40).accept new session
  -159> 2016-12-22 17:31:45.460846 37353700 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e2a5c20 sd=31 :6804 s=2 pgs=7 cs=1 l=1 c=0x3e2a6f40).accept success, connect_seq = 1, sending READY
  -158> 2016-12-22 17:31:45.460959 37353700 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e2a5c20 sd=31 :6804 s=2 pgs=7 cs=1 l=1 c=0x3e2a6f40).accept features 1152921504336314367

wait() is shutting down...

  -156> 2016-12-22 17:31:45.461882 9506ac0 20 -- 172.21.15.14:6804/20738 wait: stopping accepter thread
  -155> 2016-12-22 17:31:45.462994 9506ac0 10 accepter.stop accept listening on: 15
...
  -116> 2016-12-22 17:31:45.482137 9506ac0 10 -- 172.21.15.14:6804/20738 wait: closing pipes
  -115> 2016-12-22 17:31:45.482850 9506ac0 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e2a5c20 sd=31 :6804 s=2 pgs=7 cs=1 l=1 c=0x3e2a6f40).unregister_pipe
  -114> 2016-12-22 17:31:45.483421 9506ac0 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e2a5c20 sd=31 :6804 s=2 pgs=7 cs=1 l=1 c=0x3e2a6f40).stop

...which interrupts the accept()...

  -113> 2016-12-22 17:31:45.484164 37353700 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e2a5c20 sd=31 :6804 s=4 pgs=7 cs=1 l=1 c=0x3e2a6f40).accept fault after register

and makes accept() return failure, and reader() to exit
and reap...

  -110> 2016-12-22 17:31:45.486103 9506ac0 10 -- 172.21.15.14:6804/20738 wait: waiting for pipes 0x3e2a5c20 to close
  -109> 2016-12-22 17:31:45.487146 37353700 10 -- 172.21.15.14:6804/20738 queue_reap 0x3e2a5c20
  -108> 2016-12-22 17:31:45.487658 9506ac0 10 -- 172.21.15.14:6804/20738 reaper
  -107> 2016-12-22 17:31:45.487722 9506ac0 10 -- 172.21.15.14:6804/20738 reaper reaping pipe 0x3e2a5c20 172.21.15.35:0/146098963
  -106> 2016-12-22 17:31:45.487816 9506ac0 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e2a5c20 sd=31 :6804 s=4 pgs=7 cs=1 l=1 c=0x3e2a6f40).discard_queue
  -105> 2016-12-22 17:31:45.494742 37353700 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e2a5c20 sd=31 :6804 s=4 pgs=7 cs=1 l=1 c=0x3e2a6f40).reader done
...
   -92> 2016-12-22 17:31:45.527589 9506ac0 -1 /mnt/jenkins/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/11.1.0-6151-ge1781dd/rpm/el7/BUILD/ceph-11.1.0-6151-ge1781dd/src/msg/simple/SimpleMessenger.cc: In function 'void SimpleMessenger::reaper()' thread 9506ac0 time 2016-12-22 17:31:45.488264
/mnt/jenkins/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/11.1.0-6151-ge1781dd/rpm/el7/BUILD/ceph-11.1.0-6151-ge1781dd/src/msg/simple/SimpleMessenger.cc: 235: FAILED assert(!cleared)

Fixes: http://tracker.ceph.com/issues/15784
Signed-off-by: Sage Weil <sage@redhat.com>

@liewegas liewegas merged commit eef842e into ceph:master Dec 30, 2016

3 checks passed

Signed-off-by all commits in this PR are signed
Details
Unmodifed Submodules submodules for project are unmodified
Details
default Build finished.
Details

@liewegas liewegas deleted the liewegas:wip-15784 branch Dec 30, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment