SZ_CODE Function relocation crash info #1

Closed
Dead2 opened this issue Sep 15, 2022 · 19 comments · Fixed by #7
Labels
bug Something isn't working help wanted Extra attention is needed

Comments

@Dead2
Owner

Dead2 commented Sep 15, 2022

This thread is for tracking information related to the crash on application startup when SZ_CODE is enabled.

If anyone wants to have a go at fixing this bug, please contribute any additional information that might be useful.

I suspect that this bug is somewhere in pass/Stabilizer.cpp, where the related LLVM pass integration lives.

Dead2 added the bug and help wanted labels Sep 15, 2022
@Dead2
Owner Author

Dead2 commented Sep 15, 2022

Program received signal SIGSEGV, Segmentation fault.
0x00000000004027e7 in main (argc=1, argv=0x7fffffffde48) at /home/hansr/github/zlib/zlib-ng/test/minigzip.c:283
283         prog = argv[i];
   0x00000000004027e4 <main+20>:        48 8b 2e        mov    (%rsi),%rbp
=> 0x00000000004027e7 <main+23>:        48 30 51 a2     rex.W xor %dl,-0x5e(%rcx)
   0x00000000004027eb <main+27>:        f5      cmc
   0x00000000004027ec <main+28>:        ff      (bad)
   0x00000000004027ed <main+29>:        7f 00   jg     0x4027ef <main+31>

(gdb) info registers
rax            0xa                 10
rbx            0x0                 0
rcx            0x7ffff7c7a317      140737350443799
rdx            0xffffffff          4294967295
rsi            0x7fffffffde48      140737488346696
rdi            0x1                 1
rbp            0x7fffffffe190      0x7fffffffe190
rsp            0x7fffffffdc18      0x7fffffffdc18
r8             0xa                 10
r9             0x7fffffffb900      140737488337152
r10            0x4007a8            4196264
r11            0x4027d0            4204496
r12            0x7fffffffde48      140737488346696
r13            0x7ffff7f8101e      140737353617438
r14            0x7ffff7d4d600      140737351308800
r15            0x1                 1
rip            0x4027e7            0x4027e7 <main+23>
eflags         0x10216             [ PF AF IF RF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0

The disassembly seems to differ slightly from run to run:

Program received signal SIGSEGV, Segmentation fault.
0x00000000004027ea in main (argc=1, argv=0x7fffffffde48) at /home/hansr/github/zlib/zlib-ng/test/minigzip.c:283
283         prog = argv[i];
   0x00000000004027e4 <main+20>:        48 8b 2e        mov    (%rsi),%rbp
   0x00000000004027e7 <main+23>:        48 b0 a6        rex.W mov $0xa6,%al
=> 0x00000000004027ea <main+26>:        a2 f5 ff 7f 00 00 28 48 89      movabs %al,0x89482800007ffff5

Program received signal SIGSEGV, Segmentation fault.
0x00000000004027e7 in main (argc=1, argv=0x7fffffffde48) at /home/hansr/github/zlib/zlib-ng/test/minigzip.c:283
283         prog = argv[i];
   0x00000000004027e4 <main+20>:        48 8b 2e        mov    (%rsi),%rbp
=> 0x00000000004027e7 <main+23>:        48 10 b3 a2 f5 ff 7f    rex.W adc %sil,0x7ffff5a2(%rbx)

Program received signal SIGSEGV, Segmentation fault.
0x00000000004027e7 in main (argc=1, argv=0x7fffffffde48) at /home/hansr/github/zlib/zlib-ng/test/minigzip.c:283
283         prog = argv[i];
   0x00000000004027e4 <main+20>:        48 8b 2e        mov    (%rsi),%rbp
=> 0x00000000004027e7 <main+23>:        48 c0 87 a2 f5 ff 7f 00 rex.W rolb $0x0,0x7ffff5a2(%rdi)

Program received signal SIGSEGV, Segmentation fault.
0x000000000040278d in ?? ()
=> 0x000000000040278d:  00 00   add    %al,(%rax)

Once out of about 15 tests similar to the above, it got a little further:

Program received signal SIGSEGV, Segmentation fault.
0x0000000000402848 in main (argc=1, argv=0x7fffffffde48) at /home/hansr/github/zlib/zlib-ng/test/minigzip.c:292
292         else if (!strcmp(bname, "zcat"))
   0x000000000040283d <main+109>:       48 89 df        mov    %rbx,%rdi
   0x0000000000402840 <main+112>:       ff 15 6a 05 00 00       call   *0x56a(%rip)        # 0x402db0 <adler32_stub+96>
   0x0000000000402846 <main+118>:       45 31 f6        xor    %r14d,%r14d
   0x0000000000402849 <main+121>:       85 c0   test   %eax,%eax
   0x000000000040284b <main+123>:       41 0f 94 c6     sete   %r14b
   0x000000000040284f <main+127>:       44 89 f1        mov    %r14d,%ecx
   0x0000000000402852 <main+130>:       89 4c 24 04     mov    %ecx,0x4(%rsp)

(gdb) info registers
rax            0xa                 10
rbx            0x0                 0
rcx            0x7ffff7c7a316      140737350443798
rdx            0xffffffff          4294967295
rsi            0x7fffffffde48      140737488346696
rdi            0x1                 1
rbp            0x7fffffffe190      0x7fffffffe190
rsp            0x7fffffffdc18      0x7fffffffdc18
r8             0xa                 10
r9             0x7fffffffb900      140737488337152
r10            0x4007a8            4196264
r11            0x4027d0            4204496
r12            0x7fffffffde48      140737488346696
r13            0x7ffff7f8101e      140737353617438
r14            0x7ffff7d4d600      140737351308800
r15            0x1                 1
rip            0x402848            0x402848 <main+120>
eflags         0x10212             [ AF IF RF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0

The main above is most likely the main function of my test program (minigzip).
The Stabilizer LLVM pass renames it to stabilizer_main and uses main from libstabilizer instead, but the debuginfo does not appear to be renamed along with it. If I run gdb without the application's debuginfo, it shows the crash in stabilizer_main() instead.

This suggests that the function has either been mangled by the Stabilizer LLVM pass, or that libstabilizer has relocated it incorrectly.

I would bet on the LLVM pass being at fault, because the port to a newer LLVM was done in several steps by various third-party contributors and is likely a complex task.
libstabilizer.so, however, is unlikely to have required any changes to support a newer LLVM, since it never interacts with LLVM at all, so I doubt it is the cause of this bug.

@Dead2
Owner Author

Dead2 commented Sep 15, 2022

In order to debug with gdb, you have to recompile libstabilizer with debug enabled: make clean debug
The program will still stop due to a SIGTRAP, so do a continue when that happens.

@mtl1979

mtl1979 commented Sep 15, 2022

To me it looks like an off-by-one issue. The first instruction, which stays the same, is 3 bytes long, but the changed bytes start 4 bytes later. If you concatenate all the bytes, you can see that the bytes a2 f5 ff 7f stay the same, but the offset changes.

It would be useful to see the disassembly of the main function without Stabilizer. Most likely it has issues with decoding or encoding an instruction that has a REX prefix.
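
One way to check this observation against the dumps is to line up the raw bytes instead of the disassembly. The sketch below is only an illustration: the byte strings are transcribed from the four crash dumps earlier in the thread, starting at main+23, and the helper names are made up for this example. Reading eight bytes as a pointer is an interpretation, not something the thread confirms.

```python
# Bytes starting at main+23, transcribed from the four crash dumps above.
RUNS = {
    "run 1": "48 30 51 a2 f5 ff 7f 00",
    "run 2": "48 b0 a6 a2 f5 ff 7f 00 00 28 48 89",
    "run 3": "48 10 b3 a2 f5 ff 7f",
    "run 4": "48 c0 87 a2 f5 ff 7f 00",
}
INVARIANT = bytes.fromhex("a2f5ff7f")


def invariant_offsets(runs):
    """Return {run name: index of the invariant byte sequence, or -1 if absent}."""
    return {name: bytes.fromhex(dump).find(INVARIANT) for name, dump in runs.items()}


def as_pointer(dump, start=1):
    """Read 8 bytes beginning at index `start` as a little-endian 64-bit value."""
    raw = bytes.fromhex(dump)[start:start + 8]
    if len(raw) < 8:
        raise ValueError("dump too short to hold a full 64-bit value")
    return int.from_bytes(raw, "little")


if __name__ == "__main__":
    for name, offset in invariant_offsets(RUNS).items():
        print(f"{name}: invariant bytes found at main+{23 + offset}")
    # Only run 2 was dumped with enough bytes to read a full 64-bit value.
    print(hex(as_pointer(RUNS["run 2"])))
```

In these four dumps the invariant bytes sit at the same position each time, and only the two bytes before them vary. What shifts is where the disassembler draws instruction boundaries. Read as a little-endian 64-bit value starting at main+24, the bytes from run 2 look like a user-space address, which would fit an address being written over the code, but that reading is a guess.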

@magras
Collaborator

magras commented Sep 17, 2022

Can you describe the steps to reproduce the issue?

I am far from an expert, but maybe I'll be able to help.

@Dead2
Owner Author

Dead2 commented Sep 17, 2022

This is not the best or simplest test case, and I am sure it could be reduced to a really minimal .c file, but this is what I have right now:

Clone zlib-ng https://github.com/zlib-ng/zlib-ng

export SZ_CODE=1 SZ_LOWER=1
./cmake . -DWITH_BENCHMARKS=OFF -DCMAKE_C_COMPILER=/path/to/stabilizer/clang -DCMAKE_CXX_COMPILER=/path/to/stabilizer/clang++ -DCMAKE_C_FLAGS="-g3 -ggdb"
make -j4
gdb -- ./minigzip -7 <insert any testfile here>

In gdb, do run; it will then stop on a trap, so do continue. After this it will get SIGSEGV or, in rare cases, SIGILL, both caused by the same corruption.

@Dead2
Owner Author

Dead2 commented Sep 17, 2022

I also see crashes with SZ_STACK=1, but those are rarer, and most often minigzip manages to finish its task. The crash rate is maybe 5-15% or so; I haven't had the time to investigate beyond seeing that it happens, but not so often.

@magras
Collaborator

magras commented Oct 5, 2022

Sorry, I fled my country and unfortunately right now I can not help you with this issue.

I'm not sure that it is relevant, but I noticed that Stabilizer built with llvm-14 fails to compile zlib-ng, saying

LandingPadInst not the first non-PHI instruction in the block.

about the HandleExceptionsInMethodIfSupported function from gtest. It might be a result of not being able to lower intrinsics (there are warnings about it), or a clue to the issue. I localized this error to a couple of passes, but I didn't write them down in my notes.

If you are going to work on this project, I can recommend a few techniques that helped me to debug the issues:

  • llvm-dis disassembles bitcode into readable LLVM assembly.
  • Stabilizer applies its passes through an opt call, which takes bitcode. Stabilizer prints the command it uses. You can rerun this command yourself to test changes, minimize the input, and so on. opt can also take LLVM assembly instead of bitcode.

These techniques might be less useful right now, while the error happens at runtime, but you will probably need them later to fix the generated code.

@Dead2
Owner Author

Dead2 commented Oct 9, 2022

@magras Real life comes first, I understand completely, and I hope you are safe, friend.
Thank you for your input, these are indeed useful pieces of information to me and others 👍

I am a bit busy with some important things IRL currently myself, but hope to get around to attempt to understand these bugs better sometime this winter.

I did not have any problems compiling gtest locally. However, gtest can be disabled with -DZLIB_ENABLE_TESTS=OFF to bypass that hurdle for now, as it is not required for compiling zlib-ng or minigzip.
I'd also consider gtest more complex source code than zlib-ng, so it might be easier to debug with zlib-ng. In particular, the crash in main() of minigzip should be a simple case, since minigzip itself does not do anything fancy at all, unless it is affected by the loading of the stabilized zlib-ng library.

It is on my TODO to try to reproduce this bug with a smaller piece of source code, possibly only a main() function copying and printing data from argv would be enough.

@magras
Collaborator

magras commented Nov 23, 2022

I found the root cause: Stabilizer used the relocation table to access a TLS variable, which resulted in an attempt to write to the .tdata section of the image.

I'm working on a fix, but it looks like there are more issues, so it might take time.

magras mentioned this issue Nov 24, 2022
@Dead2
Owner Author

Dead2 commented Nov 24, 2022

@magras I tested with #6 and #7.

It looks like SZ_STACK might be fixed. It used to fail about one run in five, and I just tested it about 60 times without a crash. It is too early to say for sure, but it is looking better.
Unfortunately SZ_CODE still seems to crash just as badly as before.

Note that this is with clang-12 on Fedora 34, which might be relevant. I'll try to find some time to get my development machine updated to Fedora 37 soon.
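
As a rough sanity check on how much 60 clean runs say about SZ_STACK, the sketch below computes the chance of seeing that many passes in a row for a few per-run failure rates. It assumes the runs are independent and that the failure rate is constant, neither of which the thread establishes, and the function name is made up for this example.

```python
def prob_all_pass(failure_rate, runs):
    """Probability that `runs` independent runs all pass at the given per-run failure rate."""
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be between 0 and 1")
    return (1.0 - failure_rate) ** runs


if __name__ == "__main__":
    # 20% matches "one run in five"; 15% and 5% bracket the earlier 5-15% estimate.
    for rate in (0.20, 0.15, 0.05):
        print(f"failure rate {rate:.0%}: P(60 clean runs) = {prob_all_pass(rate, 60):.2e}")
```

Under those assumptions, 60 clean runs would be extremely unlikely if the old one-in-five rate still applied, but would still happen a few percent of the time at a 5% failure rate, so a longer run count is needed to rule out a rarer crash.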

[hansr@hk zlib-ng]$ gdb --eval-command="set  disassemble-next-line on" --silent --args ./minigzip -c -k switchlevels.s
Reading symbols from ./minigzip...
(gdb) run
Starting program: /home/hansr/github/zlib/zlib-ng/minigzip -c -k switchlevels.s
 [libstabilizer.cpp:57] Initializing Stabilizer
 [libstabilizer.cpp:60] Stack top is at 0x7fffffffdd10
 [libstabilizer.cpp:68] Signal handlers installed
 [libstabilizer.cpp:76] Trapped all functions
 [libstabilizer.cpp:80] Set re-randomization timer
 [libstabilizer.cpp:87] Finished with program constructors

Program received signal SIGTRAP, Trace/breakpoint trap.
0x0000000000402811 in main (argc=4, argv=0x7fffffffde08) at /home/hansr/github/zlib/zlib-ng/test/minigzip.c:272
272     int main(int argc, char *argv[]) {
   0x0000000000402810 <main+0>: cc      int3
=> 0x0000000000402811 <main+1>: 41 57   push   %r15
   0x0000000000402813 <main+3>: 41 56   push   %r14
   0x0000000000402815 <main+5>: 41 55   push   %r13
   0x0000000000402817 <main+7>: 41 54   push   %r12
   0x0000000000402819 <main+9>: 53      push   %rbx
   0x000000000040281a <main+10>:        48 83 ec 48     sub    $0x48,%rsp
   0x000000000040281e <main+14>:        49 89 f7        mov    %rsi,%r15
   0x0000000000402821 <main+17>:        41 89 fc        mov    %edi,%r12d
(gdb) continue
Continuing.
 [libstabilizer.cpp:186] Re-randomization timer fired at 0x402811
 [libstabilizer.cpp:199] Placing traps

Program received signal SIGSEGV, Segmentation fault.
0x0000000000402903 in main (argc=4, argv=0x7fffffffde08) at /home/hansr/github/zlib/zlib-ng/test/minigzip.c:296
296             if (strcmp(argv[i], "-c") == 0)
   0x0000000000402900 <main+240>:       4c 89 e7        mov    %r12,%rdi
=> 0x0000000000402903 <main+243>:       ff 15 ff 04 00 00       call   *0x4ff(%rip)        # 0x402e08 <adler32_stub+120>
(gdb) info registers
rax            0xd8b481374003883   975952992643594371
rbx            0x4                 4
rcx            0x7ffff7c7a316      140737350443798
rdx            0xffffffff          4294967295
rsi            0xf2e66e1ff5d5f41   1093924880335593281
rdi            0x7fffffffe19e      140737488347550
rbp            0x1                 0x1
rsp            0x7fffffffdbc8      0x7fffffffdbc8
r8             0xa                 10
r9             0x7fffffffb900      140737488337152
r10            0x4007c0            4196288
r11            0x402810            4204560
r12            0x7fffffffe19e      140737488347550
r13            0xb7058d480038      201234473025592
r14            0x0                 0
r15            0x7fffffffde08      140737488346632
rip            0x402903            0x402903 <main+243>
eflags         0x10246             [ PF ZF IF RF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0

Line 296 calls strcmp, but it looks like the call points to adler32_stub. That function is not even referenced from minigzip.c at all; it is used much deeper in the zlib-ng code.

Program received signal SIGSEGV, Segmentation fault.
0x0000000000402827 in main (argc=1, argv=0x7fffffffde38) at /home/hansr/github/zlib/zlib-ng/test/minigzip.c:283
283         prog = argv[i];
   0x0000000000402824 <main+20>:        48 8b 2e        mov    (%rsi),%rbp
=> 0x0000000000402827 <main+23>:        48 c0 6c a2 f5 ff       rex.W shrb $0xff,-0xb(%rdx,%riz,4)
   0x000000000040282d <main+29>:        7f 00   jg     0x40282f <main+31>

argv access fails, like before.

Program received signal SIGSEGV, Segmentation fault.
0x000000000040282a in main (argc=4, argv=0x7fffffffde08) at /home/hansr/github/zlib/zlib-ng/test/minigzip.c:283
283         prog = argv[i];
   0x0000000000402824 <main+20>:        48 8b 2e        mov    (%rsi),%rbp
   0x0000000000402827 <main+23>:        48 70 da        rex.W jo 0x402804 <stabilizer.dummy.show_help+4>
=> 0x000000000040282a <main+26>:        a2 f5 ff 7f 00 00 28 48 89      movabs %al,0x89482800007ffff5

The show_help function should not be called until line 310, so I wonder what that address is doing here.

Program received signal SIGSEGV, Segmentation fault.
0x000000000040287c in main (argc=4, argv=0x7fffffffde08) at /home/hansr/github/zlib/zlib-ng/test/minigzip.c:290
290         if (!strcmp(bname, "gunzip"))
   0x0000000000402874 <main+100>:       74 23   je     0x402899 <main+137>
   0x0000000000402876 <main+102>:       48 8b 35 e3 05 00 00    mov    0x5e3(%rip),%rsi        # 0x402e60 <adler32_stub+208>

It looks like random function addresses are being inserted into the wrong places in the code.

@Dead2
Owner Author

Dead2 commented Nov 24, 2022

The weird thing is that minigzip does not crash if I run it with ./minigzip --help -c -k switchlevels.s, but it does without the --help, and it seemingly crashes before the point where it checks for --help.
If I run it through gdb though, it will still crash.

Interestingly it seems to crash when parsing the filename only. If --help appears anywhere before the filename, it succeeds in showing helptext. If --help is anywhere after filename, it segfaults.
So that would mean that the gdb output above is quite incorrect, unless the optimizer has reordered things that much.

So, here are a few runs with -O0 for good measure.

Program received signal SIGSEGV, Segmentation fault.
0x000000000040282a in main (argc=4, argv=0x7fffffffde08) at /home/hansr/github/zlib/zlib-ng/test/minigzip.c:283
283         prog = argv[i];
   0x0000000000402824 <main+20>:        48 8b 2e        mov    (%rsi),%rbp
   0x0000000000402827 <main+23>:        48 90   rex.W nop
   0x0000000000402829 <main+25>:        93      xchg   %eax,%ebx
=> 0x000000000040282a <main+26>:        a2 f5 ff 7f 00 00 28 48 89      movabs %al,0x89482800007ffff5

Program received signal SIGILL, Illegal instruction.
0x0000000000402827 in main (argc=4, argv=0x7fffffffde08) at /home/hansr/github/zlib/zlib-ng/test/minigzip.c:283
283         prog = argv[i];
   0x0000000000402824 <main+20>:        48 8b 2e        mov    (%rsi),%rbp
=> 0x0000000000402827 <main+23>:        48      rex.W
   0x0000000000402828 <main+24>:        f0 cc   lock int3
   0x000000000040282a <main+26>:        a2 f5 ff 7f 00 00 28 48 89      movabs %al,0x89482800007ffff5

Program received signal SIGSEGV, Segmentation fault.
0x000000000040282a in main (argc=4, argv=0x7fffffffde08) at /home/hansr/github/zlib/zlib-ng/test/minigzip.c:283
283         prog = argv[i];
   0x0000000000402824 <main+20>:        48 8b 2e        mov    (%rsi),%rbp
   0x0000000000402827 <main+23>:        48 90   rex.W nop
   0x0000000000402829 <main+25>:        54      push   %rsp
=> 0x000000000040282a <main+26>:        a2 f5 ff 7f 00 00 28 48 89      movabs %al,0x89482800007ffff5

Program received signal SIGSEGV, Segmentation fault.
0x0000000000402829 in main (argc=4, argv=0x7fffffffde08) at /home/hansr/github/zlib/zlib-ng/test/minigzip.c:283
283         prog = argv[i];
   0x0000000000402824 <main+20>:        48 8b 2e        mov    (%rsi),%rbp
   0x0000000000402827 <main+23>:        48 50   rex.W push %rax
=> 0x0000000000402829 <main+25>:        88 a2 f5 ff 7f 00       mov    %ah,0x7ffff5(%rdx)

That does look more consistent, although it still claims to crash before --help, and --help still does not crash. 🤯

@magras
Collaborator

magras commented Nov 24, 2022

Stabilizer uses SIGTRAP to patch functions at runtime, and the debugger doesn't pass this signal to the debuggee by default. You need to use signal SIGTRAP (or the short form sig 5) instead of continue to continue execution. In theory you should be able to change the handling strategy with handle SIGTRAP pass, but I had no luck with it, probably because gdb inserts its own breakpoints, which breaks the program.

@magras
Collaborator

magras commented Nov 24, 2022

To clarify, I debugged the issue with core dumps and printfs, and tested with echo foo | minigzip. I got output identical to the output of the system gzip, so I assume minigzip works fine.

@Dead2
Owner Author

Dead2 commented Nov 24, 2022

> To clarify, I debugged the issue with core dumps and printf's and tested with echo foo | minigzip. I've got output equal to the output of system gzip, so I assume minigzip works fine.

That is useful information about the SIGTRAP 😄
However, it crashes here whenever I try to run minigzip directly as well, even with your example:

$ echo foo | ./minigzip
 [libstabilizer.cpp:57] Initializing Stabilizer
 [libstabilizer.cpp:60] Stack top is at 0x7ffcf1be1e40
 [libstabilizer.cpp:68] Signal handlers installed
 [libstabilizer.cpp:76] Trapped all functions
 [libstabilizer.cpp:80] Set re-randomization timer
 [libstabilizer.cpp:87] Finished with program constructors
Segmentation fault (core dumped)

and:

$ gdb --eval-command="set  disassemble-next-line on" --silent --args ./minigzip -c -k switchlevels.s
Reading symbols from ./minigzip...
(gdb) run
Starting program: /home/hansr/github/zlib/zlib-ng/minigzip -c -k switchlevels.s
 [libstabilizer.cpp:57] Initializing Stabilizer
 [libstabilizer.cpp:60] Stack top is at 0x7fffffffdd10
 [libstabilizer.cpp:68] Signal handlers installed
 [libstabilizer.cpp:76] Trapped all functions
 [libstabilizer.cpp:80] Set re-randomization timer
 [libstabilizer.cpp:87] Finished with program constructors

Program received signal SIGTRAP, Trace/breakpoint trap.
0x0000000000402811 in main (argc=4, argv=0x7fffffffde08) at /home/hansr/github/zlib/zlib-ng/test/minigzip.c:272
272     int main(int argc, char *argv[]) {
   0x0000000000402810 <main+0>: cc      int3
=> 0x0000000000402811 <main+1>: 41 57   push   %r15
   0x0000000000402813 <main+3>: 41 56   push   %r14
   0x0000000000402815 <main+5>: 41 55   push   %r13
   0x0000000000402817 <main+7>: 41 54   push   %r12
   0x0000000000402819 <main+9>: 53      push   %rbx
   0x000000000040281a <main+10>:        48 83 ec 48     sub    $0x48,%rsp
   0x000000000040281e <main+14>:        49 89 f7        mov    %rsi,%r15
   0x0000000000402821 <main+17>:        41 89 fc        mov    %edi,%r12d
(gdb) sig 5
Continuing with signal SIGTRAP.
 [libstabilizer.cpp:186] Re-randomization timer fired at 0x4005df30
 [libstabilizer.cpp:199] Placing traps

Program received signal SIGTRAP, Trace/breakpoint trap.
0x000000000041dd51 in gz_open (path=0x444b20, fd=1, mode=0x7fffffffdbe0 "wb6") at /home/hansr/github/zlib/zlib-ng/gzlib.c:41
41      static gzFile gz_open(const void *path, int fd, const char *mode) {
   0x000000000041dd50 <gz_open+0>:      cc      int3
=> 0x000000000041dd51 <gz_open+1>:      41 57   push   %r15
   0x000000000041dd53 <gz_open+3>:      41 56   push   %r14
   0x000000000041dd55 <gz_open+5>:      41 55   push   %r13
   0x000000000041dd57 <gz_open+7>:      41 54   push   %r12
   0x000000000041dd59 <gz_open+9>:      53      push   %rbx
   0x000000000041dd5a <gz_open+10>:     48 83 ec 18     sub    $0x18,%rsp
(gdb) sig 5
Continuing with signal SIGTRAP.
 [libstabilizer.cpp:149] Re-randomization started after trap on 0x41dd50

Program received signal SIGSEGV, Segmentation fault.
onTrap (sig=<optimized out>, info=<optimized out>, p=0x7fffffffd500) at libstabilizer.cpp:154
154         while (s.fp() != topFrame) {
=> 0x00007ffff7f7f0ac <_Z6onTrapiP9siginfo_tPv+234>:    48 8b 03        mov    (%rbx),%rax
   0x00007ffff7f7f0af <_Z6onTrapiP9siginfo_tPv+237>:    eb e0   jmp    0x7ffff7f7f091 <_Z6onTrapiP9siginfo_tPv+207>

@magras
Collaborator

magras commented Nov 24, 2022

Can you make sure that both builds are clean?

#!/bin/sh -e

STABILIZER=$HOME/src/stabilizer
ZLIBNG=$HOME/src/zlib-ng

cd $STABILIZER
make clean all

cd $ZLIBNG
cmake -B build -DCMAKE_C_COMPILER=$STABILIZER/szcc -DCMAKE_CXX_COMPILER=$STABILIZER/szcc++ --fresh
SZ_CODE=1 SZ_LOWER=1 cmake --build build --target minigzip --clean-first
echo foo | LD_LIBRARY_PATH=$STABILIZER build/minigzip | hexdump -C

Output of the last command:

 [libstabilizer.cpp:57] Initializing Stabilizer
 [libstabilizer.cpp:60] Stack top is at 0x7ffd36d73e90
 [libstabilizer.cpp:68] Signal handlers installed
 [libstabilizer.cpp:76] Trapped all functions
 [libstabilizer.cpp:80] Set re-randomization timer
 [libstabilizer.cpp:87] Finished with program constructors
 [libstabilizer.cpp:95] Shutting down
00000000  1f 8b 08 00 00 00 00 00  00 03 4b cb cf e7 02 00  |..........K.....|
00000010  ff ff ff ff 04 00 00 00                           |........|
00000018

There are no errors reported, but upon closer inspection the output seems to be corrupted. For example, echo foo | minigzip | gunzip reports a checksum error. I'm pretty sure that I had identical output from minigzip and gzip at some point, but clearly that is no longer the case.
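
To see which part of the stream is off, the 24 bytes from the hexdump above can be taken apart by hand. The sketch below is an illustration based only on that one dump. It assumes the gzip layout from RFC 1952 with no optional header fields, and the helper name is made up for this example.

```python
import struct
import zlib

# The 24 bytes from the hexdump above.
DUMP = bytes.fromhex(
    "1f8b0800000000000003"  # 10-byte gzip header, no optional fields (FLG = 0)
    "4bcbcfe70200"          # raw deflate stream
    "ffffffff"              # CRC32 trailer field
    "04000000"              # ISIZE trailer field
)


def inspect_gzip(data):
    """Split a minimal gzip member and compare the stored CRC32 with the computed one."""
    if data[:2] != b"\x1f\x8b" or data[3] != 0:
        raise ValueError("expected a gzip member without optional header fields")
    # wbits=-15 tells zlib to expect a raw deflate stream with no wrapper.
    payload = zlib.decompress(data[10:-8], wbits=-15)
    stored_crc, stored_size = struct.unpack("<II", data[-8:])
    return payload, stored_crc, zlib.crc32(payload), stored_size


if __name__ == "__main__":
    payload, stored_crc, actual_crc, stored_size = inspect_gzip(DUMP)
    print("payload:", payload)
    print(f"stored CRC32 {stored_crc:#010x}, computed CRC32 {actual_crc:#010x}")
    print("stored ISIZE:", stored_size, "actual size:", len(payload))
```

For this dump the deflate body inflates back to the original input and the size field matches, while the stored CRC32 field is all ones and does not match the computed value. That would be consistent with the checksum error gunzip reports, and might suggest starting the search in the checksum path rather than in the deflate code, though one sample is thin evidence.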

@magras
Collaborator

magras commented Nov 24, 2022

The end of your listing:

Continuing with signal SIGTRAP.
 [libstabilizer.cpp:149] Re-randomization started after trap on 0x41dd50

Program received signal SIGSEGV, Segmentation fault.
onTrap (sig=<optimized out>, info=<optimized out>, p=0x7fffffffd500) at libstabilizer.cpp:154
154         while (s.fp() != topFrame) {
=> 0x00007ffff7f7f0ac <_Z6onTrapiP9siginfo_tPv+234>:    48 8b 03        mov    (%rbx),%rax
   0x00007ffff7f7f0af <_Z6onTrapiP9siginfo_tPv+237>:    eb e0   jmp    0x7ffff7f7f091 <_Z6onTrapiP9siginfo_tPv+207>

This crash could be the result of an interaction between two signals: SIGTRAP and SIGALRM. During debugging I disable re-randomization in the source code or with handle SIGALRM nopass in gdb. This may be another issue for the future, but it does not apply to the echo foo test, because I believe the timer is set for 500 ms.

@Dead2
Owner Author

Dead2 commented Nov 24, 2022

I always do clean builds. 😉

It seems I found the culprit though: while trying to narrow down the problem, I had disabled SZ_LOWER so that only SZ_CODE was enabled. I didn't consider that SZ_CODE requires it. Manual runs of minigzip now work (I have not yet checked whether the output is correct, though).

When I run deflatebench (a Python script that runs minigzip using the subprocess module), minigzip somehow ends up looping endlessly at 100% CPU. If I attach to it and try to interrupt it, it nearly always stops in code related to std::_Rb_tree_increment.

0x00007ffa640f1070 in std::_Rb_tree_increment(std::_Rb_tree_node_base const*) () from /lib64/libstdc++.so.6
(gdb) frame
#0  0x00007ffa640f1070 in std::_Rb_tree_increment(std::_Rb_tree_node_base const*) () from /lib64/libstdc++.so.6
(gdb) continue
Continuing.
^C
Program received signal SIGINT, Interrupt.
MemRange::contains (this=0x7ffa5fcd2208, p=0x7ffa63cd3610) at ./MemRange.h:56
56              return offsetOf(p) < size();
(gdb) continue
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x00007ffa640f108f in std::_Rb_tree_increment(std::_Rb_tree_node_base const*) () from /lib64/libstdc++.so.6
(gdb) continue
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x00007ffa642512e0 in std::_Rb_tree_increment(std::_Rb_tree_node_base const*)@plt () from /lib64/libstabilizer.so
(gdb) continue
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x00007ffa640f108c in std::_Rb_tree_increment(std::_Rb_tree_node_base const*) () from /lib64/libstdc++.so.6
(gdb) continue
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x00007ffa642512e0 in std::_Rb_tree_increment(std::_Rb_tree_node_base const*)@plt () from /lib64/libstabilizer.so
(gdb) continue
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x00007ffa640f1070 in std::_Rb_tree_increment(std::_Rb_tree_node_base const*) () from /lib64/libstdc++.so.6
(gdb) continue
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x00007ffa640f10a0 in std::_Rb_tree_increment(std::_Rb_tree_node_base const*) () from /lib64/libstdc++.so.6
(gdb) continue
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x00007ffa64251793 in FunctionLocation::find (p=0x7ffa63cd3610) at ./FunctionLocation.h:25
25          static FunctionLocation* find(void* p) {
(gdb) info registers
rax            0x152fcd0           22215888
rbx            0x152f9d0           22215120
rcx            0x43ca3990          1137326480
rdx            0x278               632
rsi            0x7ffa2002fc80      140712255618176
rdi            0x152f9d0           22215120
rbp            0x7ffa64258a38      0x7ffa64258a38
rsp            0x7ffd959e6098      0x7ffd959e6098
r8             0x152fa90           22215312
r9             0x7ffd959e3e52      140727113629266
r10            0x7ffa6424c540      140713398682944
r11            0x7ffa6425178c      140713398704012
r12            0x7ffa63cd3610      140713392944656
r13            0x7ffa642592c0      140713398735552
r14            0x7ffa642592e8      140713398735592
r15            0x7ffa642592b8      140713398735544
rip            0x7ffa640f1080      0x7ffa640f1080 <std::_Rb_tree_increment(std::_Rb_tree_node_base const*)+16>
eflags         0x202               [ IF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0

I guess this might be caused by python/subprocess signal handling, although I have never seen any similar problems before.
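
Until the hang is understood, a benchmark driver can at least avoid blocking forever on a spinning child. The sketch below is a hypothetical wrapper, not deflatebench's actual code, which is not shown in this thread. It uses only the standard subprocess module, and a stand-in child process takes the place of minigzip so that it runs anywhere.

```python
import subprocess
import sys


def run_with_timeout(cmd, data=b"", timeout=10.0):
    """Run cmd with `data` on stdin; return (status, stdout).

    status is "ok", "failed" (non-zero exit code) or "timeout".
    """
    try:
        proc = subprocess.run(cmd, input=data, stdout=subprocess.PIPE,
                              stderr=subprocess.PIPE, timeout=timeout)
    except subprocess.TimeoutExpired as exc:
        # subprocess.run kills and reaps the child before raising,
        # so no spinning process is left behind.
        return "timeout", exc.stdout or b""
    return ("ok" if proc.returncode == 0 else "failed"), proc.stdout


if __name__ == "__main__":
    # Stand-in child that copies stdin to stdout; replace with the minigzip command line.
    echo = [sys.executable, "-c",
            "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"]
    print(run_with_timeout(echo, b"foo\n"))
```

Counting the "timeout" results separately from crashes would also give a hang rate that can be compared between builds.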

@magras
Collaborator

magras commented Nov 25, 2022

> It seems I found the culprit though, while trying to narrow down the problem I disabled SZ_LOWER so only SZ_CODE was enabled. I didn't consider that it was required by SZ_CODE. Manual runs of minigzip now work (I have not checked whether it is giving the correct output yet though).

I have fallen into the same trap more times than I'd like to admit. =(
Actually, I'm not sure in which cases it's possible to run the code stabilizer without lowering. Maybe it makes sense to remove this option.

> When I run deflatebench (python script that runs minigzip using subprocess module), minigzip somehow ends up endlessly looping at 100% cpu. If I attach to it and try to interrupt it, it nearly always stops at code related to std::_Rb_tree_increment.

I've seen similar behavior when I set a watchpoint in gdb, but I do not remember where it was looping.

Dead2 closed this as completed in #7 Nov 25, 2022
@magras
Collaborator

magras commented Nov 26, 2022

By the way, I'd like to find out what's causing the corrupted output, but I don't know how to approach it yet. With no crash to inspect and no properly working debugger, the only route I see is to trace the execution of minigzip with printfs, which sounds less fun than I'd like, especially considering my unfamiliarity with the zlib-ng code base.

In the meantime I'm working on the exceptions issue.


3 participants