Mosquitto v 2.0.17 segfault on debian bookworm and huge CPU usage #2881

Closed
Nunak opened this issue Aug 23, 2023 · 4 comments

@Nunak

Nunak commented Aug 23, 2023

After upgrading to version 2.0.17 on my Debian bookworm system, I am getting segfaults and Mosquitto's CPU usage is huge.

Aug 23 10:01:02 has-001 kernel: [ 6998.132959] traps: mosquitto[8587] general protection fault ip:7facecf9023b sp:7ffd8364ced8 error:0 in libc.so.6[7facece45000+155000]
Aug 23 10:03:41 has-001 kernel: [ 7157.142230] traps: mosquitto[9352] general protection fault ip:55ece008f045 sp:7ffee0f151a0 error:0 in mosquitto[55ece0081000+31000]
Aug 23 10:23:57 has-001 kernel: [ 8373.151116] traps: mosquitto[13047] general protection fault ip:7f7ac1ac023b sp:7fff0e457b58 error:0 in libc.so.6[7f7ac1975000+155000]
Aug 23 10:28:05 has-001 kernel: [ 8621.072459] mosquitto[13198]: segfault at 5000000b4 ip 000056033fe19109 sp 00007ffffe658380 error 4 in mosquitto[56033fe0b000+31000] likely on CPU 0 (core 0, socket 0)
Aug 23 10:28:05 has-001 kernel: [ 8621.072474] Code: 38 48 89 45 00 49 8b 46 30 48 83 7c c8 f8 00 0f 84 35 fe ff ff 49 8b 84 24 58 01 00 00 8b 54 24 04 48 85 c0 0f 84 9f 01 00 00 <83> b8 94 00 00 00 01 0f 85 92 01 00 00 41 83 7c 24 04 ff 0f 85 86
Aug 23 10:32:11 has-001 kernel: [ 8866.691515] mosquitto[13334]: segfault at 61 ip 000055e682ef9045 sp 00007ffce00b3e20 error 4 in mosquitto[55e682eeb000+31000] likely on CPU 1 (core 0, socket 0)
Aug 23 10:32:11 has-001 kernel: [ 8866.691531] Code: 00 00 41 88 7d 2b 40 38 c6 49 89 4d 18 0f 46 c6 41 88 45 2a 83 fa 0b 0f 84 28 02 00 00 48 8b 4d 00 48 85 c9 0f 84 fb 02 00 00 <48> 8b 31 49 89 75 00 4c 89 6e 08 4c 89 29 49 c7 45 08 00 00 00 00
Aug 23 10:34:16 has-001 kernel: [ 8992.013300] mosquitto[13478]: segfault at 71 ip 000055a5a69d5045 sp 00007ffccea17de0 error 4 in mosquitto[55a5a69c7000+31000] likely on CPU 0 (core 0, socket 0)
Aug 23 10:34:16 has-001 kernel: [ 8992.013319] Code: 00 00 41 88 7d 2b 40 38 c6 49 89 4d 18 0f 46 c6 41 88 45 2a 83 fa 0b 0f 84 28 02 00 00 48 8b 4d 00 48 85 c9 0f 84 fb 02 00 00 <48> 8b 31 49 89 75 00 4c 89 6e 08 4c 89 29 49 c7 45 08 00 00 00 00
Aug 23 10:42:27 has-001 kernel: [ 9483.142494] traps: mosquitto[13539] general protection fault ip:563ac110a045 sp:7ffe526084b0 error:0 in mosquitto[563ac10fc000+31000]
Aug 23 10:54:44 has-001 kernel: [10219.771071] mosquitto[14037]: segfault at 51 ip 00005573d0c28045 sp 00007fff89514750 error 4 in mosquitto[5573d0c1a000+31000] likely on CPU 1 (core 0, socket 0)
Aug 23 10:54:44 has-001 kernel: [10219.771092] Code: 00 00 41 88 7d 2b 40 38 c6 49 89 4d 18 0f 46 c6 41 88 45 2a 83 fa 0b 0f 84 28 02 00 00 48 8b 4d 00 48 85 c9 0f 84 fb 02 00 00 <48> 8b 31 49 89 75 00 4c 89 6e 08 4c 89 29 49 c7 45 08 00 00 00 00
Aug 23 11:19:21 has-001 kernel: [11696.852903] traps: mosquitto[14921] general protection fault ip:7f4c0cf9023b sp:7ffe952bf128 error:0 in libc.so.6[7f4c0ce45000+155000]
Aug 23 11:25:28 has-001 kernel: [12063.867427] traps: mosquitto[16121] general protection fault ip:7f294ff331d6 sp:7fffd6c50148 error:0 in libc.so.6[7f294fde8000+155000]
Aug 23 11:27:31 has-001 kernel: [12186.649847] mosquitto[16469]: segfault at 2 ip 00007fa1c87901d6 sp 00007ffea567ce98 error 4 in libc.so.6[7fa1c8645000+155000] likely on CPU 1 (core 0, socket 0)
Aug 23 11:27:31 has-001 kernel: [12186.649862] Code: 00 8b 0c 8a 8b 04 82 29 c8 c3 66 2e 0f 1f 84 00 00 00 00 00 89 f1 89 f8 48 83 e1 3f 48 83 e0 3f 83 f9 30 77 3f 83 f8 30 77 3a <f3> 0f 6f 0f f3 0f 6f 16 66 0f ef c0 66 0f 74 c1 66 0f 74 ca 66 0f
Aug 23 11:39:22 has-001 kernel: [12898.035162] hrtimer: interrupt took 9105010 ns
Aug 23 12:03:27 has-001 kernel: [14343.089493] traps: mosquitto[16582] general protection fault ip:7f079dac8bb6 sp:7ffe78dcbbb8 error:0 in libc.so.6[7f079da45000+155000]
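
The raw `ip` values in the log above differ between runs because the binary is mapped at a different address each time. Subtracting the bracketed mapping base from `ip` gives an offset that can be compared across crashes. The following is only a rough sketch: the function name `fault_offset` and the regular expression are illustrative assumptions, and the bracketed base is the start of the mapping containing the instruction pointer, which is not necessarily the load base of the whole binary.

```python
import re

# Handles both formats seen in the kernel log:
#   "segfault at 61 ip 000055e682ef9045 ... in mosquitto[55e682eeb000+31000]"
#   "general protection fault ip:563ac110a045 ... in mosquitto[563ac10fc000+31000]"
FAULT_RE = re.compile(
    r"\bip[: ](?P<ip>[0-9a-f]+) .* in (?P<obj>[^\[\s]+)"
    r"\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def fault_offset(line):
    """Return (object name, ip - mapping base) for a kernel fault line, or None."""
    match = FAULT_RE.search(line)
    if match is None:
        return None
    offset = int(match.group("ip"), 16) - int(match.group("base"), 16)
    return match.group("obj"), offset


# Two lines copied from the log above (hostname and timestamp prefix trimmed).
SAMPLE_LINES = [
    "mosquitto[13334]: segfault at 61 ip 000055e682ef9045 sp 00007ffce00b3e20 "
    "error 4 in mosquitto[55e682eeb000+31000] likely on CPU 1 (core 0, socket 0)",
    "traps: mosquitto[13539] general protection fault ip:563ac110a045 "
    "sp:7ffe526084b0 error:0 in mosquitto[563ac10fc000+31000]",
]

for sample in SAMPLE_LINES:
    obj, off = fault_offset(sample)
    print(obj, hex(off))  # both lines print: mosquitto 0xe045
```

Both sample crashes reduce to the same offset inside the `mosquitto` mapping, which suggests they hit the same instruction. Mapping that offset back to a source line would require the exact same binary with debug symbols.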

@halfgaar

I have the same issue with 2.0.16:

mosquitto[2407936]: segfault at 1 ip 00007f3530c7fade sp 00007fffb4a8a968 error 4 in libc.so.6[7f3530b0f000+195000]
mosquitto[2403739]: segfault at 21 ip 00007feaa120fade sp 00007fffc6b6a088 error 4 in libc.so.6[7feaa109f000+195000]
mosquitto[2408207]: segfault at 1 ip 00007fbc5bd8bade sp 00007fffdb521648 error 4 in libc.so.6[7fbc5bc1b000+195000]

These are all null (or near-null) pointer dereferences in libc, which usually indicates some kind of printf bug, or feeding NULL to strlen(). An example of the latter:

segfault at 0 ip 00007fb8840856e5 sp 00007ffc939ef988 error 4 in libc-2.31.so[7fb883f1f000+178000]
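
The `error 4` field in these `segfault at` lines is the x86 page-fault error code, and decoding it supports the null-dereference reading. A minimal sketch of the decoding follows; the function name `decode_page_fault_error` is an illustrative assumption, and only the commonly used low bits are covered. It applies to the `segfault at` lines only, not to the `general protection fault` lines, which report a different exception.

```python
def decode_page_fault_error(code):
    """Decode the low bits of an x86 page-fault error code."""
    return {
        # Bit 0: clear means the page was not present; set means a protection violation.
        "protection_violation": bool(code & 0x01),
        # Bit 1: clear means the access was a read; set means a write.
        "write": bool(code & 0x02),
        # Bit 2: set means the fault happened in user mode.
        "user_mode": bool(code & 0x04),
        # Bit 4: set means the fault was caused by an instruction fetch.
        "instruction_fetch": bool(code & 0x10),
    }


print(decode_page_fault_error(4))
```

For `error 4` only the user-mode bit is set: a user-space read from an unmapped page, which is what reading through a NULL or near-NULL pointer looks like.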

I've gotten the crashes after 10 and 30 minutes running time.

I have not yet had the time to investigate this with a debug build and a debugger.

@halfgaar

halfgaar commented Sep 11, 2023

Ha, my system actually still had a core handler set, so I have core dumps of it. The (slightly redacted) backtrace, of a release build unfortunately, is:

Core was generated by `/opt/mosquitto/mosquitto -c /etc/mosquitto/mosquitto10.conf'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  __strcmp_avx2 () at ../sysdeps/x86_64/multiarch/strcmp-avx2.S:115
115     ../sysdeps/x86_64/multiarch/strcmp-avx2.S: No such file or directory.
[Current thread is 1 (Thread 0x7f3530a79580 (LWP 2407936))]
(gdb) bt
#0  __strcmp_avx2 () at ../sysdeps/x86_64/multiarch/strcmp-avx2.S:115
#1  0x00005621235e1ecb in sub__add_leaf (context=context@entry=0x5621fa3e8dd0, qos=qos@entry=0 '\000', identifier=identifier@entry=0, options=options@entry=0, head=head@entry=0x5621f92773b8, newleaf=newleaf@entry=0x7fffb4a8aa20)
    at subs.c:165
#2  0x00005621235e4385 in sub__add_normal (subhier=0x5621f9277370, options=0, identifier=0, qos=112 'p', sub=0x5621751acc40 "N/[redacted]/vebus/276/State", context=0x5621fa3e8dd0) at subs.c:289
#3  sub__add_context (sharename=0x0, topics=<optimized out>, subhier=0x5621f9277370, options=0, identifier=0, qos=112 'p', topic_filter=0x5621751acc40 "N/[redacted]/vebus/276/State", context=0x5621fa3e8dd0) at subs.c:362
#4  sub__add (context=context@entry=0x5621fa3e8dd0, sub=0x5621751acc40 "N/[redacted]/vebus/276/State", qos=qos@entry=0 '\000', identifier=0, options=0, root=<optimized out>) at subs.c:613
#5  0x00005621235cb5f1 in handle__subscribe (context=0x5621fa3e8dd0) at handle_subscribe.c:191
#6  0x00005621235d91cd in handle__packet (context=context@entry=0x5621fa3e8dd0) at read_handle.c:69
#7  0x00005621235e847b in callback_mqtt (wsi=<optimized out>, reason=<optimized out>, user=<optimized out>, in=0x5621e13eba93, len=37) at websockets.c:369
#8  0x00007f3530d32b3d in ?? () from /lib/x86_64-linux-gnu/libwebsockets.so.16
#9  0x00007f3530d336be in ?? () from /lib/x86_64-linux-gnu/libwebsockets.so.16
#10 0x00007f3530d26809 in lws_service_fd_tsi () from /lib/x86_64-linux-gnu/libwebsockets.so.16
#11 0x00005621235cdf30 in loop_handle_reads_writes (events=1, context=0x5621fa3e8dd0) at mux_epoll.c:253
#12 mux_epoll__handle () at mux_epoll.c:207
#13 0x00005621235cd7b9 in mux__handle (listensock=listensock@entry=0x5621e99b9320, listensock_count=listensock_count@entry=2) at mux.c:76
#14 0x00005621235ccb83 in mosquitto_main_loop (listensock=0x5621e99b9320, listensock_count=2) at loop.c:205
#15 0x00005621235ba688 in main (argc=<optimized out>, argv=<optimized out>) at mosquitto.c:576

This was done with a binary compiled from source from tag v2.0.16.

The qos=112 is interesting. In my other core dump, it's qos=224.
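
For context, MQTT only defines QoS levels 0, 1 and 2, so neither value is a legal QoS. A small sketch that prints both values in hex and binary (the helper name `describe_qos` is just for illustration):

```python
VALID_QOS = (0, 1, 2)  # the only QoS levels defined by MQTT


def describe_qos(value):
    """Format a QoS value in decimal, hex and binary and flag out-of-range values."""
    status = "valid" if value in VALID_QOS else "out of range"
    return f"qos={value} (0x{value:02x}, bits {value:08b}): {status}"


for observed in (112, 224):
    print(describe_qos(observed))
# qos=112 (0x70, bits 01110000): out of range
# qos=224 (0xe0, bits 11100000): out of range
```

Note that frames #1 and #4 of the same backtrace show qos=0, while only the inlined frames #2 and #3 show 112. In an optimized release build the debugger may be displaying stale register contents for those frames, so the odd value is not necessarily evidence of memory corruption on its own.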

@NorbertHeusser
Contributor

Thanks for the stack trace. I will try to look into it today. The only strcmp on this line compares the id of the context of an existing subscription with the id of the context of the subscription currently being received. But the id, which represents the client ID of the session, should never be NULL.
@halfgaar: As this seems to happen in the context of a persistent session (more precisely, clean_session=false), do you have persistence enabled in your broker, and are you using the normal Mosquitto snapshot file persistence?
I just want to find out whether this might be a problem with restoring broker state from disk.

@halfgaar

I also sent some details to Roger per request.

Persistence is enabled. Mosquitto started crashing after being restarted into 2.0.16; I then reverted to 2.0.15 and it was stable again. The restart process consisted of shutting down 2.0.15, which saved a 700 MB mosquitto.db, which was then loaded on startup.

I have two core dumps. Both have virtually the same stack trace, including the fact that libwebsockets was involved.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 17, 2023