Segfault in conn_close on master #71

mloughran · 2011-09-07T22:03:04Z

The segfault occurs on the master branch (e3158bc) with the following backtrace:

delay for default t->waiting.used=0 t->ready.len=0 t->pause=0
delay for timer t->waiting.used=0 t->ready.len=0 t->pause=0
accepted conn, fd=7
client hung up fd=7
client hung up fd=6

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x0000000000000160
0x0000000100001f54 in conn_close (c=0x100101240) at conn.c:269
269     c->use->using_ct--;
(gdb) backtrace
#0  0x0000000100001f54 in conn_close (c=0x100101240) at conn.c:269
#1  0x00000001000095c7 in h_conn (fd=6, which=104, c=0x100101240) at prot.c:1724
#2  0x0000000100009656 in prothandle (c=0x100101240, ev=104) at prot.c:1736
#3  0x000000010000a913 in handle (s=0x100101258, filt=-1, flags=32769) at sock-bsd.c:105
#4  0x000000010000a8ce in sockmain () at sock-bsd.c:94
#5  0x000000010000aa64 in srv (s=0x7fff5fbff520) at srv.c:34
#6  0x000000010000cd2a in main (argc=1, argv=0x7fff5fbff6b8) at main.c:195
(gdb) print c
$1 = (conn) 0x100101240
(gdb) print c->use
$2 = (tube) 0x0
(gdb) print c->use->using_ct
Cannot access memory at address 0x160
(gdb) print *c              
$3 = {
  prev = 0x100101240, 
  next = 0x100101240, 
  srv = 0x7fff5fbff520, 
  sock = {
    fd = 6, 
    f = 0x100009631 <prothandle>, 
    x = 0x100101240, 
    added = 1
  }, 
  state = 0 '\0', 
  type = 0 '\0', 
  rw = 114, 
  pending_timeout = -1, 
  tickat = 0, 
  tickpos = 0, 
  cmd = "watch timer\000\ntch timer\r\n", '\0' <repeats 183 times>, 
  cmd_len = 0, 
  cmd_read = 0, 
  reply = 0x100101380 "WATCHING 2\r\n", 
  reply_len = 12, 
  reply_sent = 0, 
  reply_buf = "WATCHING 2\r\n", '\0' <repeats 195 times>, 
  in_job = 0x0, 
  soonest_job = 0x0, 
  in_job_read = 0, 
  out_job = 0x0, 
  out_job_sent = 0, 
  use = 0x0, 
  watch = {
    used = 0, 
    cap = 0, 
    last = 0, 
    items = 0x0, 
    oninsert = 0x1000016b0 <on_watch>, 
    onremove = 0x1000016e6 <on_ignore>
  }, 
  reserved_jobs = {
    r = {
      id = 0, 
      pri = 0, 
      delay = 0, 
      ttr = 0, 
      body_size = 0, 
      created_at = 0, 
      deadline_at = 0, 
      reserve_ct = 0, 
      timeout_ct = 0, 
      release_ct = 0, 
      bury_ct = 0, 
      kick_ct = 0, 
      state = 0 '\0'
    }, 
    pad = "\000\000\000\000\000", 
    tube = 0x0, 
    prev = 0x1001014b0, 
    next = 0x1001014b0, 
    ht_next = 0x0, 
    heap_index = 0, 
    file = 0x0, 
    fnext = 0x0, 
    fprev = 0x0, 
    reserver = 0x0, 
    walresv = 0, 
    walused = 0, 
    body = 0x100101558 ""
  }
}

I've seen this segfault several times this evening (while developing locally on OS X 10.7), but unfortunately I can't identify steps to reproduce it reliably. As the backtrace suggests, it happens when I try to close the connection to beanstalkd.

/cc @miksago who's been debugging this with me


kr · 2011-09-25T04:41:47Z

This is fixed in my tree now.

The problem is that in sock-bsd.c, we're reading up to 500 events
before processing any of them. If two of these events are for the
same conn, and the first one closes the conn, it'll be freed. When
we get around to processing the second one, we're working with
invalid memory.

The easy way to fix this is to read only one event at a time, so
that's what I've done.
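
The hazard described above can be sketched with a toy model. This is not the code from sock-bsd.c: `Conn`, `Event`, `Source`, `conn_close_sketch` and `drain` are hypothetical names, the queue holds 8 events rather than 500, and the model assumes that closing a conn discards events still queued in the source but cannot reach copies already handed to the caller. Instead of freeing memory, the sketch only marks the conn as closed, so a stale event can be counted safely rather than crashing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { MAX_PENDING = 8 };  /* toy capacity; the real batch was up to 500 */

/* Hypothetical stand-ins; only the fields this sketch needs. */
typedef struct Conn { int fd; int closed; } Conn;
typedef struct Event { Conn *c; int hangup; } Event;
typedef struct Source { Event pending[MAX_PENDING]; size_t n; } Source;

/* Queue an event in the source; returns 0 if the toy queue is full. */
static int source_push(Source *s, Conn *c, int hangup)
{
    if (s->n == MAX_PENDING) return 0;
    s->pending[s->n].c = c;
    s->pending[s->n].hangup = hangup;
    s->n++;
    return 1;
}

/* Move up to max pending events into out; returns how many were moved. */
static size_t source_take(Source *s, Event *out, size_t max)
{
    size_t k = s->n < max ? s->n : max;
    memcpy(out, s->pending, k * sizeof *out);
    memmove(s->pending, s->pending + k, (s->n - k) * sizeof *out);
    s->n -= k;
    return k;
}

/* Assumption of this model: closing drops events still queued in the
 * source, but not copies that were already read into a caller's buffer. */
static void conn_close_sketch(Source *s, Conn *c)
{
    size_t i, j = 0;
    for (i = 0; i < s->n; i++)
        if (s->pending[i].c != c)
            s->pending[j++] = s->pending[i];
    s->n = j;
    c->closed = 1;  /* a real server would free c at this point */
}

/* Drain the source `batch` events at a time. Returns the number of events
 * delivered for a conn that had already been closed; in a real server each
 * of these would touch freed memory. */
static int drain(Source *s, size_t batch)
{
    Event buf[MAX_PENDING];
    size_t i, k;
    int stale = 0;

    assert(batch > 0);
    if (batch > MAX_PENDING) batch = MAX_PENDING;
    while ((k = source_take(s, buf, batch)) > 0) {
        for (i = 0; i < k; i++) {
            if (buf[i].c->closed) { stale++; continue; }
            if (buf[i].hangup) conn_close_sketch(s, buf[i].c);
        }
    }
    return stale;
}
```

With two events queued for the same conn, the first being a hangup, `drain(&s, MAX_PENDING)` reports one stale event, while `drain(&s, 1)` reports none, because the second event is discarded from the source before it is ever read. Reading one event at a time trades some batching efficiency for that guarantee.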

mloughran · 2011-09-26T00:20:09Z

Thanks so much Keith - I don't think I'd have worked that out! Will let you know if I find any other issues :)

zhenghouzz · 2012-05-31T22:01:41Z

I ran into a similar problem with a v1.6 build on Ubuntu 10.04, built from the downloaded 1.6.zip. I double-checked the source, and cb16211 has been applied.

My use case: I have a Python consumer daemon that dequeues from beanstalkd. The daemon generates a lot of log output, so I use logrotate to rotate the log, and I restart the daemon in postrotate so that it picks up the new log file. However, whenever I run logrotate, the beanstalkd server goes down. I enabled core dumps, and the info is below. Let me know if you need the core file.

Core was generated by `/usr/bin/beanstalkd -l 0.0.0.0 -p 11300 -b /var/lib/beanstalkd'.
Program terminated with signal 11, Segmentation fault.
#0 0x00000000004023a2 in connclose (c=0x1412d80) at conn.c:232
232 c->use->using_ct--;
(gdb) bt
#0 0x00000000004023a2 in connclose (c=0x1412d80) at conn.c:232
#1 0x0000000000409eb2 in update_conns () at prot.c:1763
#2 0x000000000040a135 in prottick (s=0x6124e0) at prot.c:1831
#3 0x000000000040b873 in srvtick (s=0x6124e0, ev=0) at srv.c:68
#4 0x000000000040b5e3 in sockmain () at sock-linux.c:81
#5 0x000000000040b7a2 in srvserve (s=0x6124e0) at srv.c:40
#6 0x000000000040d77b in main (argc=7, argv=0x7fff46099d88) at main.c:86

(gdb) print *c
$1 = {srv = 0x3b598, sock = {fd = 100, f = 0, x = 0x12a05f200, added = 169}, state = 3 '\003', type = -110 '\222', next = 0x0, use = 0x0, tickat = 0, tickpos = -1, soonest_job = 0x0, rw = 119, pending_timeout = -1, cmd = "\200-A\001\000\000\000\000\200-A\001", '\000' <repeats 195 times>, cmd_len = 0, cmd_read = 0, reply = 0x40e6a0 "TIMED_OUT\r\n",
reply_len = 11, reply_sent = 0, reply_buf = "RESERVED 243094 \301\001\000\000\000\000\000\000\230\256\230\256\035\177\000\000\230\256\230\256\035\177", '\000' <repeats 169 times>, in_job_read = 0, in_job = 0x0, out_job = 0x0, out_job_sent = 0, watch = {used = 0, cap = 0, last = 0, items = 0x0, oninsert = 0x401cd4 <on_watch>,
onremove = 0x401d0d <on_ignore>}, reserved_jobs = {r = {id = 0, pri = 0, delay = 0, ttr = 0, body_size = 0, created_at = 0, deadline_at = 0, reserve_ct = 0, timeout_ct = 0, release_ct = 0, bury_ct = 0, kick_ct = 0, state = 0 '\000'}, pad = "\000\000\000\000\000", tube = 0x0, prev = 0x1412fe8, next = 0x1412fe8, ht_next = 0x0, heap_index = 0, file = 0x0,
fnext = 0x0, fprev = 0x0, reserver = 0x0, walresv = 0, walused = 0, body = 0x1412d80 "\230\265\003"}}
(gdb) print c->use
$2 = (tube) 0x0

zhenghouzz · 2012-05-31T22:40:18Z

Can we reopen this issue? I am running this in production, and the frequent crashes have caused a lot of issues. Thanks!

kr closed this as completed in cb16211 Sep 25, 2011

zhenghouzz mentioned this issue Sep 4, 2012

segfault in connclose (v1.6) #126

Closed
