deadlock in server while shutting down due to invalid free in v1.8.0 #532

Closed
stefvanvlierberghe opened this issue Oct 31, 2017 · 8 comments
Labels
notourbug This issue needs to be resolved elsewhere

Comments

@stefvanvlierberghe
stefvanvlierberghe commented Oct 31, 2017

We call: vncserver -kill :90

stdout:

Killing Xvnc process ID 36933
Xvnc seems to be deadlocked.  Kill the process manually and then re-run
    /cfmu/appcm/TOOL/GNU!21.5.0.31/build_G!17.OP.L7/generated/tigervnc/usr/bin/vncserver -kill :90
to clean up the socket files.

stderr:

Mon Oct 30 22:07:24 2017
 vncext:      VNC extension running!
 vncext:      Listening for VNC connections on all interface(s), port 5990
 vncext:      created VNC server for screen 0
access control disabled, clients can connect from any host
(EE) 
(EE) Backtrace:
(EE) 0: /cfmu/appcm/TOOL/GNU!21.5.0.31/build_G!17.OP.L7/generated/tigervnc/usr/bin/Xvnc (xorg_backtrace+0x3f) [0x5fa68f]
(EE) 1: /cfmu/appcm/TOOL/GNU!21.5.0.31/build_G!17.OP.L7/generated/tigervnc/usr/bin/Xvnc (0x400000+0x1fdb19) [0x5fdb19]
(EE) 2: /lib64/libpthread.so.0 (0x7ffff7598000+0xf370) [0x7ffff75a7370]
(EE) 3: /lib64/libc.so.6 (0x7ffff71d7000+0x7c6c9) [0x7ffff72536c9]
(EE) 4: /cfmu/appcm/TOOL/GNU!21.5.0.31/build_G!17.OP.L7/generated/tigervnc/usr/bin/Xvnc (SrvXkbFreeClientMap+0x13d) [0x58dc5d]
(EE) 5: /cfmu/appcm/TOOL/GNU!21.5.0.31/build_G!17.OP.L7/generated/tigervnc/usr/bin/Xvnc (SrvXkbFreeKeyboard+0xfb) [0x58a30b]
(EE) 6: /cfmu/appcm/TOOL/GNU!21.5.0.31/build_G!17.OP.L7/generated/tigervnc/usr/bin/Xvnc (XkbFreeInfo+0xd9) [0x581559]
(EE) 7: /cfmu/appcm/TOOL/GNU!21.5.0.31/build_G!17.OP.L7/generated/tigervnc/usr/bin/Xvnc (0x400000+0x1a1416) [0x5a1416]
(EE) 8: /cfmu/appcm/TOOL/GNU!21.5.0.31/build_G!17.OP.L7/generated/tigervnc/usr/bin/Xvnc (0x400000+0x1a1712) [0x5a1712]
(EE) 9: /cfmu/appcm/TOOL/GNU!21.5.0.31/build_G!17.OP.L7/generated/tigervnc/usr/bin/Xvnc (CloseDownDevices+0x79) [0x5a1e29]
(EE) 10: /cfmu/appcm/TOOL/GNU!21.5.0.31/build_G!17.OP.L7/generated/tigervnc/usr/bin/Xvnc (main+0x41c) [0x4bfd0c]
(EE) 11: /lib64/libc.so.6 (__libc_start_main+0xf5) [0x7ffff71f8b35]
(EE) 12: /cfmu/appcm/TOOL/GNU!21.5.0.31/build_G!17.OP.L7/generated/tigervnc/usr/bin/Xvnc (0x400000+0xc17e3) [0x4c17e3]
(EE) 
(EE) Segmentation fault at address 0xffffffff00000028

Fatal server error:
Caught signal 11 (Segmentation fault). Server aborting

backtrace using gdb on core dump:

#0  __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
#1  0x00007ffff7259991 in _L_lock_4780 () from /lib64/libc.so.6
#2  0x00007ffff72531f8 in _int_free (av=0x7ffff7591760 <main_arena>, p=0xcf13e0, have_lock=0) at malloc.c:3940
#3  0x00007ffff720fa70 in __run_exit_handlers (status=1, listp=0x7ffff75916c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:86
#4  0x00007ffff720fa95 in __GI_exit (status=<optimized out>) at exit.c:99
#5  0x0000000000603d1d in AbortServer ()
#6  0x0000000000604448 in FatalError ()
#7  0x00000000005fdb6e in ?? ()
#8  <signal handler called>
#9  0x00007ffff72536c9 in _int_free (av=0x7ffff7591760 <main_arena>, p=0xf24130, have_lock=0) at malloc.c:3984
#10 0x000000000058dc5d in SrvXkbFreeClientMap ()
#11 0x000000000058a30b in SrvXkbFreeKeyboard ()
#12 0x0000000000581559 in XkbFreeInfo ()
#13 0x00000000005a1416 in ?? ()
#14 0x00000000005a1712 in ?? ()
#15 0x00000000005a1e29 in CloseDownDevices ()
#16 0x00000000004bfd0c in main ()

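It looks like the first call to free (_int_free, frame #9) raised SIGSEGV, presumably while the malloc arena lock was held. The signal handler then called FatalError, AbortServer and exit(), and the exit handlers made a re-entrant call to _int_free (frame #2) that waits on the same lock, causing the deadlock.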
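
To illustrate the re-entrancy described above, here is a minimal C sketch. It is not glibc or Xorg code: `arena_lock`, `on_signal` and `handler_would_deadlock` are hypothetical names, an `atomic_flag` stands in for the allocator's non-recursive lock, and the handler tries the lock only once instead of blocking, so the demo cannot hang.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdatomic.h>
#include <string.h>

/* Hypothetical stand-in for the allocator's non-recursive arena lock. */
static atomic_flag arena_lock = ATOMIC_FLAG_INIT;

/* Set by the handler when it finds the lock already taken. */
static volatile sig_atomic_t handler_found_lock_held = 0;

static void on_signal(int signo)
{
    (void)signo;
    /* A handler that called free() would wait on this lock forever,
     * because its owner is the very code the signal interrupted.
     * This sketch only tries once and records the result. */
    if (atomic_flag_test_and_set(&arena_lock)) {
        handler_found_lock_held = 1;
    } else {
        atomic_flag_clear(&arena_lock);
    }
}

/* Returns 1 if the handler saw the lock held, 0 if not, -1 on error. */
int handler_would_deadlock(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    handler_found_lock_held = 0;
    atomic_flag_test_and_set(&arena_lock); /* "free()" takes the lock   */
    raise(SIGUSR1);                        /* fault while lock is held  */
    atomic_flag_clear(&arena_lock);        /* the real server never got
                                              back to its unlock step   */
    return handler_found_lock_held;
}
```

Calling `handler_would_deadlock()` returns 1 here: the handler runs on the same thread that already owns the lock, so a blocking acquire inside the handler could never succeed.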

This is probably heap corruption; valgrind might clarify.
A simple test under valgrind shows an invalid free during shutdown:

valgrind --tool=memcheck /cm/ot/TOOL/GNU.22.0.0.5/build_G.17.IP.L7/generated/tigervnc/usr/bin/Xvnc :99 -auth /tmp/kde-vvl/xauth-401-_0 -depth 24 -desktop 'Xvnc' -fp catalogue:/etc/X11/fontpath.d -geometry 1920x1200 -pn  -rfbauth /auto/home/vvl/.vnc/passwd -rfbport 5999 -rfbwait 30000 '-desktop' 'vvl@dhws029::99' '-MaxCutText' '100000000' '-AlwaysShared' '-SecurityTypes' 'None' '-AcceptSetDesktopSize=0' '-UseIPv6=0' '-dpi' '100' '-cc' '4'

==39677== Invalid free() / delete / delete[] / realloc()
==39677==    at 0x4028CFA: free (vg_replace_malloc.c:530)
==39677==    by 0x53CFB9B: __libc_freeres (in /usr/lib64/libc-2.17.so)
==39677==    by 0x402275F: _vgnU_freeres (vg_preloaded.c:77)
==39677==    by 0x52A1A2A: __run_exit_handlers (exit.c:92)
==39677==    by 0x52A1AB4: exit (exit.c:99)
==39677==    by 0x528AC0B: (below main) (libc-start.c:308)
==39677==  Address 0x56253d0 is 0 bytes inside data symbol "noai6ai_cached"

Such an invalid free may appear to "work" most of the time and only occasionally trigger the signal.



@CendioOssman
Member

I'm afraid I don't see much TigerVNC in any of those traces, so it might be a general Xorg bug. Can you reproduce this? And does valgrind give any more clues?

@stefvanvlierberghe
Author

stefvanvlierberghe commented Oct 31, 2017 via email

@CendioOssman
Member

Sure, vncserver could probably be patched to be more aggressive. In this case it crashed though, so I'm not sure why it gave up.

But that's a band-aid. We would still like to find the bug causing the crash, and that will be difficult without a reproducible test case.

Have you tested with our binaries?

@stefvanvlierberghe
Author

stefvanvlierberghe commented Nov 2, 2017 via email

@CendioOssman
Member

Ah, alright. The log suggested it terminated, but I guess it logged that line and then promptly locked up. :)

I agree, such work should not be done from the signal handler. However, we've inherited that code from Xorg, so there might be something fundamental preventing a cleanup. I'll have a look.
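
As a rough illustration of what "not doing such work from the signal handler" could look like, here is a minimal C sketch. It is not the Xorg handler and not a proposed patch: `fatal_signal_handler`, `install_fatal_handler` and `crash_in_child` are hypothetical names. It assumes that skipping all cleanup on a fatal signal is acceptable, and restricts the handler to the async-signal-safe calls `write()` and `_exit()`.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical handler: report and terminate with async-signal-safe calls. */
static void fatal_signal_handler(int signo)
{
    static const char msg[] = "fatal signal: exiting without cleanup\n";
    /* write() is async-signal-safe; fprintf(), free() and exit() are not. */
    ssize_t ignored = write(STDERR_FILENO, msg, sizeof(msg) - 1);
    (void)ignored;
    /* _exit() skips atexit handlers, so nothing re-enters malloc/free. */
    _exit(128 + signo);
}

/* Install the handler for SIGSEGV; returns 0 on success, -1 on error. */
int install_fatal_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = fatal_signal_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Demo: raise SIGSEGV in a child process and return the child's exit
 * status, or -1 if the child could not be run or did not exit normally. */
int crash_in_child(void)
{
    int status = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (install_fatal_handler() != 0)
            _exit(1);
        raise(SIGSEGV);
        _exit(2); /* only reached if the handler did not terminate us */
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The trade-off in this sketch is that socket and lock files are not removed on a crash, so an outer tool such as `vncserver -kill` would still have to clean those up afterwards.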

@CendioOssman
Member

Right, so the poor signal handling is indeed in Xorg code. It's not part of TigerVNC at all. I'm afraid that's a bug report that will have to be filed with them.

What we might be able to fix is the initial corruption, provided you find some good way to reproduce the issue so we can find the offending code.

@stefvanvlierberghe
Author

We are getting this issue again, and more frequently.

Filed a bug with xorg: https://bugs.freedesktop.org/show_bug.cgi?id=106146

Still using Xvnc TigerVNC 1.8.0 - built May 16 2017 14:01:59, and still no clue what affects the frequency of this bug.

@CendioOssman
Member

Since the issue has been reported to Xorg, I don't think there is much more we can do on our end.

@CendioOssman closed this as not planned on Jul 31, 2024
@CendioOssman added the notourbug label on Jul 31, 2024