msg/simple: clear_pipe when wait() is mopping up pipes
When wait() is mopping up connections it may hit one that
is in the process of accepting.  It will unregister the pipe
while the accept() thread is still setting it up, which
aborts the accept and gets the pipe reaped.  However,
the pipe mop-up does not clear_pipe() the way that
mark_down(), mark_down_all(), and fault() do, which
leads to the assert below.

Pipe is accepting...

  -161> 2016-12-22 17:31:45.460613 37353700 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e2a5c20 sd=31 :6804 s=0 pgs=0 cs=0 l=1 c=0x3e2a6f40).accept:  setting up session_security.
  -160> 2016-12-22 17:31:45.460733 37353700 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e2a5c20 sd=31 :6804 s=0 pgs=0 cs=0 l=1 c=0x3e2a6f40).accept new session
  -159> 2016-12-22 17:31:45.460846 37353700 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e2a5c20 sd=31 :6804 s=2 pgs=7 cs=1 l=1 c=0x3e2a6f40).accept success, connect_seq = 1, sending READY
  -158> 2016-12-22 17:31:45.460959 37353700 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e2a5c20 sd=31 :6804 s=2 pgs=7 cs=1 l=1 c=0x3e2a6f40).accept features 1152921504336314367

wait() is shutting down...

  -156> 2016-12-22 17:31:45.461882 9506ac0 20 -- 172.21.15.14:6804/20738 wait: stopping accepter thread
  -155> 2016-12-22 17:31:45.462994 9506ac0 10 accepter.stop accept listening on: 15
...
  -116> 2016-12-22 17:31:45.482137 9506ac0 10 -- 172.21.15.14:6804/20738 wait: closing pipes
  -115> 2016-12-22 17:31:45.482850 9506ac0 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e2a5c20 sd=31 :6804 s=2 pgs=7 cs=1 l=1 c=0x3e2a6f40).unregister_pipe
  -114> 2016-12-22 17:31:45.483421 9506ac0 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e2a5c20 sd=31 :6804 s=2 pgs=7 cs=1 l=1 c=0x3e2a6f40).stop

...which interrupts the accept()...

  -113> 2016-12-22 17:31:45.484164 37353700 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e2a5c20 sd=31 :6804 s=4 pgs=7 cs=1 l=1 c=0x3e2a6f40).accept fault after register

and makes accept() return failure, so reader() exits
and the pipe is queued for reaping...

  -110> 2016-12-22 17:31:45.486103 9506ac0 10 -- 172.21.15.14:6804/20738 wait: waiting for pipes 0x3e2a5c20 to close
  -109> 2016-12-22 17:31:45.487146 37353700 10 -- 172.21.15.14:6804/20738 queue_reap 0x3e2a5c20
  -108> 2016-12-22 17:31:45.487658 9506ac0 10 -- 172.21.15.14:6804/20738 reaper
  -107> 2016-12-22 17:31:45.487722 9506ac0 10 -- 172.21.15.14:6804/20738 reaper reaping pipe 0x3e2a5c20 172.21.15.35:0/146098963
  -106> 2016-12-22 17:31:45.487816 9506ac0 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e2a5c20 sd=31 :6804 s=4 pgs=7 cs=1 l=1 c=0x3e2a6f40).discard_queue
  -105> 2016-12-22 17:31:45.494742 37353700 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e2a5c20 sd=31 :6804 s=4 pgs=7 cs=1 l=1 c=0x3e2a6f40).reader done
...
   -92> 2016-12-22 17:31:45.527589 9506ac0 -1 /mnt/jenkins/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/11.1.0-6151-ge1781dd/rpm/el7/BUILD/ceph-11.1.0-6151-ge1781dd/src/msg/simple/SimpleMessenger.cc: In function 'void SimpleMessenger::reaper()' thread 9506ac0 time 2016-12-22 17:31:45.488264
/mnt/jenkins/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/11.1.0-6151-ge1781dd/rpm/el7/BUILD/ceph-11.1.0-6151-ge1781dd/src/msg/simple/SimpleMessenger.cc: 235: FAILED assert(!cleared)

Fixes: http://tracker.ceph.com/issues/15784
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 948f97b)
liewegas authored and smithfarm committed Jul 5, 2017
1 parent e12eae9 commit a7af766
Showing 1 changed file with 4 additions and 0 deletions.
4 changes: 4 additions & 0 deletions src/msg/simple/SimpleMessenger.cc
@@ -572,6 +572,10 @@ void SimpleMessenger::wait()
       p->unregister_pipe();
       p->pipe_lock.Lock();
       p->stop_and_wait();
+      // don't generate an event here; we're shutting down anyway.
+      PipeConnectionRef con = p->connection_state;
+      if (con)
+        con->clear_pipe(p);
       p->pipe_lock.Unlock();
     }
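
The effect of the four added lines can be illustrated with a toy model. This is a minimal sketch, not the Ceph code: ToyPipe, ToyConnection, mop_up() and reaper_invariant_holds() are hypothetical stand-ins, and it assumes clear_pipe() reports whether it actually detached the pipe, which is what the reaper's assert(!cleared) suggests. Locking and threads are left out entirely.

```cpp
#include <cassert>
#include <memory>

struct ToyPipe;

// Hypothetical stand-in for the connection object that points back at its pipe.
struct ToyConnection {
  ToyPipe* pipe = nullptr;

  // Detach old_p if it is the pipe currently attached.
  // Returns true only when something was actually detached (assumed semantics).
  bool clear_pipe(ToyPipe* old_p) {
    if (pipe == old_p) {
      pipe = nullptr;
      return true;
    }
    return false;
  }
};

// Hypothetical stand-in for a pipe holding a reference to its connection.
struct ToyPipe {
  std::shared_ptr<ToyConnection> connection_state;
};

// Shutdown-side cleanup, shaped like the lines added to wait().
void mop_up(ToyPipe& p) {
  std::shared_ptr<ToyConnection> con = p.connection_state;
  if (con)
    con->clear_pipe(&p);
}

// Reaper-side check: the reaper expects someone else to have detached the
// pipe already, so its own clear_pipe() call must find nothing to clear.
bool reaper_invariant_holds(ToyPipe& p) {
  if (!p.connection_state)
    return true;
  bool cleared = p.connection_state->clear_pipe(&p);
  return !cleared;
}
```

In this model, a pipe that reaches reaper_invariant_holds() while its connection still points at it fails the check, which corresponds to the FAILED assert(!cleared) in the log. Calling mop_up() first detaches the pipe, so the later check passes.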
