
Segfault in conn_close on master #71

Closed
mloughran opened this issue Sep 7, 2011 · 4 comments

@mloughran

The segfault occurs on the master branch (e3158bc), with the following output and backtrace:

delay for default t->waiting.used=0 t->ready.len=0 t->pause=0
delay for timer t->waiting.used=0 t->ready.len=0 t->pause=0
accepted conn, fd=7
client hung up fd=7
client hung up fd=6

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x0000000000000160
0x0000000100001f54 in conn_close (c=0x100101240) at conn.c:269
269     c->use->using_ct--;
(gdb) backtrace
#0  0x0000000100001f54 in conn_close (c=0x100101240) at conn.c:269
#1  0x00000001000095c7 in h_conn (fd=6, which=104, c=0x100101240) at prot.c:1724
#2  0x0000000100009656 in prothandle (c=0x100101240, ev=104) at prot.c:1736
#3  0x000000010000a913 in handle (s=0x100101258, filt=-1, flags=32769) at sock-bsd.c:105
#4  0x000000010000a8ce in sockmain () at sock-bsd.c:94
#5  0x000000010000aa64 in srv (s=0x7fff5fbff520) at srv.c:34
#6  0x000000010000cd2a in main (argc=1, argv=0x7fff5fbff6b8) at main.c:195
(gdb) print c
$1 = (conn) 0x100101240
(gdb) print c->use
$2 = (tube) 0x0
(gdb) print c->use->using_ct
Cannot access memory at address 0x160
(gdb) print *c              
$3 = {
  prev = 0x100101240, 
  next = 0x100101240, 
  srv = 0x7fff5fbff520, 
  sock = {
    fd = 6, 
    f = 0x100009631 <prothandle>, 
    x = 0x100101240, 
    added = 1
  }, 
  state = 0 '\0', 
  type = 0 '\0', 
  rw = 114, 
  pending_timeout = -1, 
  tickat = 0, 
  tickpos = 0, 
  cmd = "watch timer\000\ntch timer\r\n", '\0' <repeats 183 times>, 
  cmd_len = 0, 
  cmd_read = 0, 
  reply = 0x100101380 "WATCHING 2\r\n", 
  reply_len = 12, 
  reply_sent = 0, 
  reply_buf = "WATCHING 2\r\n", '\0' <repeats 195 times>, 
  in_job = 0x0, 
  soonest_job = 0x0, 
  in_job_read = 0, 
  out_job = 0x0, 
  out_job_sent = 0, 
  use = 0x0, 
  watch = {
    used = 0, 
    cap = 0, 
    last = 0, 
    items = 0x0, 
    oninsert = 0x1000016b0 <on_watch>, 
    onremove = 0x1000016e6 <on_ignore>
  }, 
  reserved_jobs = {
    r = {
      id = 0, 
      pri = 0, 
      delay = 0, 
      ttr = 0, 
      body_size = 0, 
      created_at = 0, 
      deadline_at = 0, 
      reserve_ct = 0, 
      timeout_ct = 0, 
      release_ct = 0, 
      bury_ct = 0, 
      kick_ct = 0, 
      state = 0 '\0'
    }, 
    pad = "\000\000\000\000\000", 
    tube = 0x0, 
    prev = 0x1001014b0, 
    next = 0x1001014b0, 
    ht_next = 0x0, 
    heap_index = 0, 
    file = 0x0, 
    fnext = 0x0, 
    fprev = 0x0, 
    reserver = 0x0, 
    walresv = 0, 
    walused = 0, 
    body = 0x100101558 ""
  }
}

I've seen this segfault several times this evening while developing locally on OS X 10.7, but unfortunately I can't identify reliable steps to reproduce it. Needless to say, it happens when I close a connection to beanstalkd!

/cc @miksago, who's been debugging this with me

@kr
Member

kr commented Sep 25, 2011

This is fixed in my tree now.

The problem is that in sock-bsd.c, we're reading up to 500 events
before processing any of them. If two of these events are for the
same conn, and the first one closes the conn, it'll be freed. When
we get around to processing the second one, we're working with
invalid memory.

The easy way to fix this is to read only one event at a time, so
that's what I've done.

kr closed this as completed in cb16211 on Sep 25, 2011
@mloughran
Author

Thanks so much, Keith! I don't think I'd have worked that out. I'll let you know if I find any other issues :)

@zhenghouzz

I ran into a similar problem with a v1.6 build on Ubuntu 10.04, built from the downloaded 1.6.zip. I double-checked the source, and cb16211 has been applied.

My use case: I have a Python consumer daemon that dequeues jobs from beanstalkd. The daemon generates a lot of logs, so I use logrotate to rotate them, and I restart the daemon in the postrotate script so that it picks up the new log file. However, whenever logrotate runs, the beanstalkd server goes down. I enabled core dumps; the details are below. Let me know if you need the core file.

Core was generated by `/usr/bin/beanstalkd -l 0.0.0.0 -p 11300 -b /var/lib/beanstalkd'.
Program terminated with signal 11, Segmentation fault.
#0 0x00000000004023a2 in connclose (c=0x1412d80) at conn.c:232
232 c->use->using_ct--;
(gdb) bt
#0 0x00000000004023a2 in connclose (c=0x1412d80) at conn.c:232
#1 0x0000000000409eb2 in update_conns () at prot.c:1763
#2 0x000000000040a135 in prottick (s=0x6124e0) at prot.c:1831
#3 0x000000000040b873 in srvtick (s=0x6124e0, ev=0) at srv.c:68
#4 0x000000000040b5e3 in sockmain () at sock-linux.c:81
#5 0x000000000040b7a2 in srvserve (s=0x6124e0) at srv.c:40
#6 0x000000000040d77b in main (argc=7, argv=0x7fff46099d88) at main.c:86

(gdb) print *c
$1 = {srv = 0x3b598, sock = {fd = 100, f = 0, x = 0x12a05f200, added = 169}, state = 3 '\003', type = -110 '\222', next = 0x0, use = 0x0, tickat = 0, tickpos = -1, soonest_job = 0x0, rw = 119, pending_timeout = -1, cmd = "\200-A\001\000\000\000\000\200-A\001", '\000' <repeats 195 times>, cmd_len = 0, cmd_read = 0, reply = 0x40e6a0 "TIMED_OUT\r\n",
reply_len = 11, reply_sent = 0, reply_buf = "RESERVED 243094 \301\001\000\000\000\000\000\000\230\256\230\256\035\177\000\000\230\256\230\256\035\177", '\000' <repeats 169 times>, in_job_read = 0, in_job = 0x0, out_job = 0x0, out_job_sent = 0, watch = {used = 0, cap = 0, last = 0, items = 0x0, oninsert = 0x401cd4 <on_watch>,
onremove = 0x401d0d <on_ignore>}, reserved_jobs = {r = {id = 0, pri = 0, delay = 0, ttr = 0, body_size = 0, created_at = 0, deadline_at = 0, reserve_ct = 0, timeout_ct = 0, release_ct = 0, bury_ct = 0, kick_ct = 0, state = 0 '\000'}, pad = "\000\000\000\000\000", tube = 0x0, prev = 0x1412fe8, next = 0x1412fe8, ht_next = 0x0, heap_index = 0, file = 0x0,
fnext = 0x0, fprev = 0x0, reserver = 0x0, walresv = 0, walused = 0, body = 0x1412d80 "\230\265\003"}}
(gdb) print c->use
$2 = (tube) 0x0

@zhenghouzz

Can we reopen this issue? I am running this in production, and the frequent crashes are causing a lot of problems. Thanks!
