segfault in connclose (v1.6) #126

zhenghouzz · 2012-06-01T03:42:09Z

I ran into a problem similar to #71. I built from v1.6.zip on Ubuntu 10.04. I double-checked the source, and the cb16211 patch has been applied.

My use case: I have a Python consumer daemon that dequeues from beanstalk. The daemon generates a lot of log output, so I use logrotate to rotate the log, and I restart the daemon in postrotate so that it picks up the new log file. However, whenever logrotate runs, the beanstalkd server goes down. I enabled core dumps, and the info is below. Let me know if you need the core file.

I made the following patch (sorry for the format, I don't know how to paste it without triggering the markup), but I'm not sure it is the right fix. I wonder how the conn got into this corrupted state.

diff conn.c.new conn.c
232,235c232,233
<     if (c->use) {
<         c->use->using_ct--;
<         TUBE_ASSIGN(c->use, NULL);
<     }
---
>     c->use->using_ct--;
>     TUBE_ASSIGN(c->use, NULL);

Core was generated by `/usr/bin/beanstalkd -l 0.0.0.0 -p 11300 -b /var/lib/beanstalkd'.
Program terminated with signal 11, Segmentation fault.
#0 0x00000000004023a2 in connclose (c=0x1412d80) at conn.c:232
232 c->use->using_ct--;
(gdb) bt
#0 0x00000000004023a2 in connclose (c=0x1412d80) at conn.c:232
#1 0x0000000000409eb2 in update_conns () at prot.c:1763
#2 0x000000000040a135 in prottick (s=0x6124e0) at prot.c:1831
#3 0x000000000040b873 in srvtick (s=0x6124e0, ev=0) at srv.c:68
#4 0x000000000040b5e3 in sockmain () at sock-linux.c:81
#5 0x000000000040b7a2 in srvserve (s=0x6124e0) at srv.c:40
#6 0x000000000040d77b in main (argc=7, argv=0x7fff46099d88) at main.c:86

(gdb) print *c
$1 = {srv = 0x3b598, sock = {fd = 100, f = 0, x = 0x12a05f200, added = 169}, state = 3 '\003', type = -110 '\222', next = 0x0, use = 0x0, tickat = 0, tickpos = -1, soonest_job = 0x0, rw = 119, pending_timeout = -1, cmd = "\200-A\001\000\000\000\000\200-A\001", '\000' , cmd_len = 0, cmd_read = 0, reply = 0x40e6a0 "TIMED_OUT\r\n", 
reply_len = 11, reply_sent = 0, reply_buf = "RESERVED 243094 \301\001\000\000\000\000\000\000\230\256\230\256\035\177\000\000\230\256\230\256\035\177", '\000' , in_job_read = 0, in_job = 0x0, out_job = 0x0, out_job_sent = 0, watch = {used = 0, cap = 0, last = 0, items = 0x0, oninsert = 0x401cd4 , 
onremove = 0x401d0d }, reserved_jobs = {r = {id = 0, pri = 0, delay = 0, ttr = 0, body_size = 0, created_at = 0, deadline_at = 0, reserve_ct = 0, timeout_ct = 0, release_ct = 0, bury_ct = 0, kick_ct = 0, state = 0 '\000'}, pad = "\000\000\000\000\000", tube = 0x0, prev = 0x1412fe8, next = 0x1412fe8, ht_next = 0x0, heap_index = 0, file = 0x0, 
fnext = 0x0, fprev = 0x0, reserver = 0x0, walresv = 0, walused = 0, body = 0x1412d80 "\230\265\003"}}
(gdb) print c->use
$2 = (tube) 0x0
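
For readers following the trace: gdb shows `c->use` is `0x0` at conn.c:232, so `c->use->using_ct--` dereferences a null pointer. The guard in the proposed patch can be sketched in isolation as below. This is a minimal illustration only. The struct and function names (`tube_sketch`, `conn_sketch`, `conn_release_use_sketch`) are hypothetical stand-ins, not the real beanstalkd types, and a plain assignment stands in for `TUBE_ASSIGN`. It does not explain how the conn reached this state, and the fix that was eventually committed may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; NOT the real beanstalkd structs. */
struct tube_sketch {
    unsigned int using_ct; /* number of connections currently using this tube */
};

struct conn_sketch {
    struct tube_sketch *use; /* NULL once the tube has been released */
};

/*
 * Release the connection's "use" tube at most once.
 * Returns 1 if a tube was released, 0 if there was nothing to release.
 */
int
conn_release_use_sketch(struct conn_sketch *c)
{
    assert(c != NULL);

    if (c->use == NULL) {
        /* Already released: the unguarded version would crash here. */
        return 0;
    }

    c->use->using_ct--;
    c->use = NULL; /* stands in for TUBE_ASSIGN(c->use, NULL) */
    return 1;
}
```

With the guard, closing the same connection twice decrements the counter only once. Without it, the second close is the null dereference seen in frame #0.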


kr · 2012-06-11T02:02:56Z

Thanks. The backtrace is really helpful.

I tried to fix the markup, but I'm not sure this is what you intended.
Feel free to edit the original post to correct it.

zhenghouzz · 2012-06-14T22:56:14Z

FYI, I have now downgraded to v1.4.6. The same client code reads from and writes to it, and it never crashes. So I suspect the newer versions, 1.5 and 1.6, are vulnerable to unclean client connection termination.

kr · 2012-08-31T04:12:24Z

@zhenghouzz could you please run this build of beanstalkd and let
me know if you see any problems?

https://s3.amazonaws.com/krheroku/beanstalkd

$ ./beanstalkd -v
beanstalkd 1.6+4+g236c669

If it works well I'll make a release.

zhenghouzz · 2012-08-31T18:19:02Z

@kr, will give it a try in production today or tomorrow. Thanks for the effort to get this fixed.

kr · 2012-09-03T19:27:11Z

@zhenghouzz how did the test go?

zhenghouzz · 2012-09-04T07:26:08Z

@kr, I put this version in production, and so far it works fine. I restarted the connected clients and the server stayed up. The same action triggered the beanstalkd crash on the older v1.6.

kr · 2012-09-04T20:34:28Z

Excellent, thanks!

Closing this as fixed in 7261f57.

kr · 2012-09-04T20:36:20Z

Actually, this is a dup of #119.

kr mentioned this issue Aug 31, 2012

Better attempt to fix issue #134 #136

Merged

kr closed this as completed Sep 4, 2012
