SZ_CODE Function relocation crash info #1
Comments
It seems the disassembly differs slightly from run to run
Once out of about 15 tests similar to the above, it got a little further:
This suggests that the function has either been mangled by the stabilizer llvm pass, or the libstabilizer has relocated it incorrectly. I would bet on the llvm pass being at fault because the porting to a newer llvm was done in several steps by various 3rd-party people and is likely a complex task. |
In order to debug with gdb, you have to recompile libstabilizer with debug enabled: |
To me it looks like an off-by-one issue... The first instruction that stays the same is 3 bytes, but the changed bytes start 4 bytes later. If you concatenate all bytes, you see that bytes Would be useful to see disassembly of the |
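A quick way to check a suspicion like this is to compare the raw bytes of the original function against the relocated copy and report the first offset where they diverge. The following is only a minimal sketch: the function name `first_mismatch` and the sample byte strings are made up for illustration and are not taken from the actual dumps in this thread.

```python
def first_mismatch(expected: bytes, actual: bytes):
    """Return the offset of the first differing byte, or None if both are equal."""
    for offset, (a, b) in enumerate(zip(expected, actual)):
        if a != b:
            return offset
    # One sequence is a strict prefix of the other: they diverge where the shorter ends.
    if len(expected) != len(actual):
        return min(len(expected), len(actual))
    return None


# Hypothetical byte sequences, NOT the real function bytes from this issue.
original = bytes.fromhex("554889e541574156")
relocated = bytes.fromhex("554889e541ff4156")
print(first_mismatch(original, relocated))
```

Comparing that offset against the instruction boundaries in the disassembly would show whether the corruption starts on a boundary or one byte off.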
Can you describe the steps to reproduce the issue? I am far from an expert but maybe I'll be able to help. |
Not the best or simplest testcase, I am sure it could be simplified to a really minimal .c file, but this is what I have right now: Clone zlib-ng https://github.com/zlib-ng/zlib-ng
In gdb do |
I also see crashes with |
Sorry, I fled my country and unfortunately right now I cannot help you with this issue. I'm not sure that it is relevant, but I noticed that stabilizer built with llvm-14 doesn't compile zlib-ng saying
about If you are going to work on this project, I can recommend a few techniques that helped me to debug the issues:
These techniques might be less useful right now, when the error happens at runtime, but you will probably need them later to fix the generated code. |
@magras Real life comes first, I understand completely, and I hope you are safe friend. I am a bit busy with some important things IRL currently myself, but hope to get around to attempt to understand these bugs better sometime this winter. I did not have any problems compiling gtest locally, however gtest can be disabled with It is on my TODO to try to reproduce this bug with a smaller piece of source code, possibly only a main() function copying and printing data from argv would be enough. |
I found the root cause: stabilizer used the relocation table to access a TLS variable, which resulted in an attempt to write to I'm working on a fix, but it looks like there are more issues, so it might take time. |
@magras I tested with #6 and #7. It looks like SZ_STACK might be fixed. It used to fail around 1/5 times, and I just tested it about 60 times without a crash. Too early to say for sure, but looking better. This is with clang-12 on Fedora 34 though, as that might be relevant. I'll try to find some time to get my development machine updated to Fedora 37 soon.
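As a rough sanity check on the "60 runs without a crash" observation: if the old failure rate of about 1 in 5 still applied, and if the runs were independent (both are assumptions, not something measured here), the chance of seeing 60 clean runs in a row can be estimated like this. The helper name `prob_all_pass` is just for illustration.

```python
def prob_all_pass(failure_rate: float, runs: int) -> float:
    """Probability that `runs` independent runs all pass, given a per-run failure rate."""
    return (1.0 - failure_rate) ** runs


# Assumed numbers from the comment above: ~1/5 failure rate before the fix, 60 test runs.
p = prob_all_pass(0.2, 60)
print(f"chance of 60 clean runs if nothing changed: {p:.2e}")
```

The result is on the order of one in a million, so 60 clean runs is decent evidence that something changed, although it says nothing about rarer failure modes.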
line 296 calls strcmp, but it looks like it points to adler32_stub. That function isn't even referenced from minigzip.c at all; it is used much deeper in the zlib-ng code.
argv access fails, like before.
show_help function should not be called until line 310, wonder what that address does here.
Looks like random function addresses are inserted to the wrong places in the code..? |
The weird thing is that minigzip does not crash if I run it with Interestingly it seems to crash when parsing the filename only. If --help appears anywhere before the filename, it succeeds in showing helptext. If --help is anywhere after filename, it segfaults. So, here are a few runs with
That does look more consistent, although it still claims to crash before --help, and --help still does not crash. 🤯 |
Stabilizer uses |
To clarify, I debugged the issue with core dumps and printf's and tested with |
That is useful information about the SIGTRAP 😄
and:
|
Can you make sure that both builds are clean?
Output of the last command:
There are no errors reported, but upon closer inspection the output seems to be corrupted. For example |
The end of your listing:
This crash could be the result of interaction between two signals: |
I always do clean builds. 😉 It seems I found the culprit though: while trying to narrow down the problem I disabled SZ_LOWER so only SZ_CODE was enabled. I didn't consider that it was required by SZ_CODE. Manual runs of minigzip now work (I have not checked whether it is giving the correct output yet though). When I run deflatebench (a Python script that runs minigzip using the subprocess module), minigzip somehow ends up endlessly looping at 100% CPU. If I attach to it and try to interrupt it, it nearly always stops at code related to std::_Rb_tree_increment.
I guess this might be caused by python/subprocess signal handling, although I have never seen any similar problems before. |
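To tell hangs apart from crashes when driving a binary through Python's subprocess module, a small wrapper with a timeout can help. This is a minimal sketch and not how deflatebench actually works; the name `classify_run`, the default timeout, and the result labels are all made up for illustration.

```python
import subprocess
import sys


def classify_run(cmd, timeout_s=10.0):
    """Run cmd once and return 'ok', 'exit:<code>', 'signal:<n>' or 'timeout'."""
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # the child is killed when the timeout expires
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    if proc.returncode < 0:
        # On POSIX, a negative return code means the child was killed by that signal.
        return f"signal:{-proc.returncode}"
    return "ok" if proc.returncode == 0 else f"exit:{proc.returncode}"


# Stand-in command; replace with the real minigzip invocation when testing.
print(classify_run([sys.executable, "-c", "pass"]))
```

Running this in a loop and counting the labels would give a rough picture of how often the endless loop shows up compared to plain segfaults.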
I have fallen into the same trap more times than I'd like to admit. =(
I've seen similar behavior when I set watch point in gdb, but I do not remember where it was looping. |
By the way I'd like to find out what's causing the corrupted output, but I don't know how to approach it yet. Without crashes and properly working debugger the only route I see is to trace the execution of minigzip with printfs, which sounds not as fun as I'd like, especially considering my unfamiliarity with the zlib-ng's code base. In the meantime I'm working on the exceptions issue. |
This thread is for tracking information related to the crash on application-startup when SZ_CODE is enabled.
If anyone wants to have a go at fixing this bug, please contribute any additional information that can be useful in fixing this bug.
I suspect that this bug is somewhere in pass/Stabilizer.cpp, where the related llvm pass integration is.