How to solve the UNIX signal handling mess #818
Just came across this issue. I am hitting the assert in the exception handler and eventually traced it down to the case where multiple …
The assert has been removed in 9c62bc7, which probably makes this section far more dangerous in a multithreaded environment (though maybe this is just user error on my part). Specifically:
When multiple threads are running, you very quickly end up in a situation where the cached "old" handler is actually the Yara handler installed by another thread. The whole process then ends up with the Yara signal handler installed outside any Yara scan context. Without the abort()/assert(), this is dangerous, since SIGSEGVs outside a Yara scan get ignored. I've reintroduced the assert() as an abort(); my local behavior is then:
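The race described above can be reproduced without threads by replaying one bad interleaving by hand. This is a minimal sketch, assuming each scan caches the current SIGSEGV handler when it starts and restores that cached handler when it ends, as the comments describe; it is not YARA's code, and library_fault_handler, scan_begin, scan_end and replay_bad_interleaving are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

typedef void (*handler_fn)(int);

/* Stand-in for the library's fault handler (hypothetical name). */
static void library_fault_handler(int sig) { (void)sig; }

/* Per-scan state: each scan caches whatever SIGSEGV handler it found. */
typedef struct {
  struct sigaction old_segv;
} scan_state;

/* Start of a scan: cache the current handler, then install our own. */
static void scan_begin(scan_state *s) {
  struct sigaction sa;
  memset(&sa, 0, sizeof sa);
  sa.sa_handler = library_fault_handler;
  sigemptyset(&sa.sa_mask);
  sigaction(SIGSEGV, &sa, &s->old_segv);
}

/* End of a scan: put back whatever was cached at the start. */
static void scan_end(scan_state *s) {
  sigaction(SIGSEGV, &s->old_segv, NULL);
}

/* Replays, in one thread, an interleaving that two real threads can hit,
   and returns the SIGSEGV handler left installed afterwards. */
static handler_fn replay_bad_interleaving(void) {
  scan_state a, b;
  struct sigaction now;
  signal(SIGSEGV, SIG_DFL);  /* application starts with the default action */
  scan_begin(&a);            /* A caches SIG_DFL                           */
  scan_begin(&b);            /* B caches A's library handler as "old"      */
  scan_end(&a);              /* A restores SIG_DFL                         */
  scan_end(&b);              /* B "restores" the library handler           */
  sigaction(SIGSEGV, NULL, &now);
  return now.sa_handler;
}
```

After the replay, the library handler stays installed even though no scan is running, which matches the "installed outside Yara scan context" state reported above.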
The exception handling logic has changed since this issue was opened. Re-open if this is still an issue.
I just upgraded our software from v3.5 to v4.2.3 of libyara. We had an issue in production that showed this is still a problem even after the changes listed above. I tracked down the issue myself and then found this discussion, which reinforces my original thoughts. Some comments: hillu was right on with the following:
I had no idea we were using this in 3.5 (with only SIGBUS), and it was only with the introduction of SIGSEGV that the bad side effects started hitting hard.
This, +=100! Nowhere does the documentation mention what is happening.
Again, yes! I can verify that it is NOT thread-safe. russianfool's description matches exactly the conclusion I came to after a few days of dealing with misbehaving code. Some of his comments relate to the old tidx methodology, but the gist still holds even after the 9c62bc7 patch. His point about how easily a multithreaded process calling yr_rules_scan_mem() will quickly and irrevocably replace the signal handler with yara's exception_handler() (assuming the non-yara code doesn't touch signal handlers) is correct, based on my experience and testing.

Further, russianfool's comment #3 describes the (mis)behavior we were seeing. Before, when our code segfaulted, it would dump core (which we could examine and debug) and Kubernetes would restart it. With the infinite loop, CPU usage just jumped to 100% and we waited until other threads started reacting and the process failed in various other ways.

In discussing this with colleagues, we haven't been able to find a way to use a global signal handler in a local fashion, especially in a third-party library that gives no warning that the behavior is not thread-safe.

Our solution was to enable the SCAN_FLAGS_NO_TRYCATCH flag. (We found this by digging through the source code, not via any documentation.) However, my contention is that this feature should be off by default and only enabled by a SCAN_FLAGS_THREAD_UNSAFE_TRYCATCH flag.
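One way to keep the old "segfault, dump core, restart" behavior while a library handler is installed is for the handler to fall back to the default action whenever the faulting thread is not inside a scan. The following is a minimal sketch of that idea, not libyara's code: in_scan, fault_handler, install_fault_handler and child_fault_outside_scan are hypothetical, raise() stands in for a real fault, and fork() is used only so the demo can observe that the child is killed by SIGSEGV instead of spinning.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical per-thread flag: nonzero only while this thread is scanning. */
static _Thread_local volatile sig_atomic_t in_scan = 0;

static void fault_handler(int sig) {
  if (!in_scan) {
    /* Fault came from application code: restore the default action and
       re-raise, so the process terminates (and may dump core). The signal
       stays pending until this handler returns and it is unblocked. */
    signal(sig, SIG_DFL);
    raise(sig);
    return;
  }
  /* A real implementation would siglongjmp back into the scan here. */
}

static void install_fault_handler(void) {
  struct sigaction sa;
  memset(&sa, 0, sizeof sa);
  sa.sa_handler = fault_handler;
  sigemptyset(&sa.sa_mask);
  sigaction(SIGSEGV, &sa, NULL);
  sigaction(SIGBUS, &sa, NULL);
}

/* Triggers a fault outside any scan in a child process. Returns the signal
   that killed the child, 0 if it exited normally, or -1 if fork failed. */
static int child_fault_outside_scan(void) {
  pid_t pid = fork();
  if (pid < 0)
    return -1;
  if (pid == 0) {
    install_fault_handler();
    raise(SIGSEGV);  /* simulated fault in non-yara code */
    _exit(0);        /* only reached if the fault was swallowed */
  }
  int status = 0;
  waitpid(pid, &status, 0);
  return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

signal() and raise() are both on the POSIX list of async-signal-safe functions, which is why they can be called from inside the handler.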
The yr_rules_scan_* functions install signal handlers for SIGSEGV and SIGBUS. I added those so that an application using libyara would not be terminated when scanning files on corrupted file systems. Another case I found was files being modified (truncated) concurrently. It was a bad idea to make this behavior largely transparent to the user; adding SCAN_FLAGS_NO_TRYCATCH to skip installing any signal handlers was not a very good solution, partly because of the lack of documentation. Moreover, I am pretty sure that YR_TRYCATCH, as implemented for UNIX so far, is not thread-safe.
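For readers unfamiliar with the mechanism, a UNIX try/catch of this kind can be sketched with sigsetjmp()/siglongjmp(): install a handler, run the work, jump back if a fault arrives, then restore the previously cached handler. This is an illustrative sketch under that assumption, not YARA's actual YR_TRYCATCH macro; try_call, current_jmp, ok_work and faulting_work are hypothetical names, and raise(SIGBUS) stands in for touching a truncated mapping.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Per-thread jump target; NULL when this thread is not inside a "try". */
static _Thread_local sigjmp_buf *current_jmp = NULL;

static void trycatch_handler(int sig) {
  (void)sig;
  if (current_jmp != NULL)
    siglongjmp(*current_jmp, 1);  /* unwind back into try_call() */
  /* Otherwise just return. For a real hardware fault this re-executes the
     faulting instruction, which faults again: an endless loop. */
}

/* Runs fn(arg) with SIGSEGV/SIGBUS handlers installed. Returns 0 on success
   or -1 if a fault was caught. The process-wide sigaction() calls on every
   call are what make this pattern unsafe with multiple threads. */
static int try_call(void (*fn)(void *), void *arg) {
  struct sigaction sa, old_segv, old_bus;
  sigjmp_buf jb;
  int result = 0;

  memset(&sa, 0, sizeof sa);
  sa.sa_handler = trycatch_handler;
  sigemptyset(&sa.sa_mask);
  sigaction(SIGSEGV, &sa, &old_segv);  /* cache the "old" handlers */
  sigaction(SIGBUS, &sa, &old_bus);

  if (sigsetjmp(jb, 1) == 0) {  /* 1: also save and restore the signal mask */
    current_jmp = &jb;
    fn(arg);
  } else {
    result = -1;  /* reached via siglongjmp from the handler */
  }
  current_jmp = NULL;

  sigaction(SIGSEGV, &old_segv, NULL);  /* restore whatever was cached */
  sigaction(SIGBUS, &old_bus, NULL);
  return result;
}

static void ok_work(void *arg) { (void)arg; }
static void faulting_work(void *arg) { (void)arg; raise(SIGBUS); }
```

Usage: try_call(ok_work, NULL) returns 0 and try_call(faulting_work, NULL) returns -1, with the original SIGBUS disposition back in place afterwards.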
I'd like to collect some ideas here on what a "proper" implementation might look like:
yr_rules_install_exception_handler/yr_rules_uninstall_exception_handler functions?
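A rough sketch of what explicit install/uninstall functions could look like, assuming the application calls install once at startup (before creating threads) and uninstall once at shutdown. The names below are hypothetical stand-ins modeled on the names floated above and are not part of libyara. The handler checks a thread-local jump target; if the faulting thread is not scanning, it chains to whatever handler was installed before, falling back to the default action.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

static struct sigaction prev_segv, prev_bus;       /* saved once, at install */
static _Thread_local sigjmp_buf *scan_jmp = NULL;  /* per-thread scan target */

/* Hand a fault that did not happen during a scan to the previous handler. */
static void chain_to_previous(int sig, siginfo_t *info, void *ctx) {
  struct sigaction *prev = (sig == SIGSEGV) ? &prev_segv : &prev_bus;
  if (prev->sa_flags & SA_SIGINFO) {
    prev->sa_sigaction(sig, info, ctx);
  } else if (prev->sa_handler == SIG_IGN) {
    return;  /* the application explicitly asked to ignore it */
  } else if (prev->sa_handler == SIG_DFL) {
    signal(sig, SIG_DFL);  /* terminate (and maybe dump core) as before */
    raise(sig);
  } else {
    prev->sa_handler(sig);
  }
}

static void exception_handler(int sig, siginfo_t *info, void *ctx) {
  if (scan_jmp != NULL)
    siglongjmp(*scan_jmp, 1);  /* fault inside a scan: abort that scan */
  chain_to_previous(sig, info, ctx);
}

/* Hypothetical: call once, before any scanning threads are started. */
static int sketch_install_exception_handler(void) {
  struct sigaction sa;
  memset(&sa, 0, sizeof sa);
  sa.sa_sigaction = exception_handler;
  sa.sa_flags = SA_SIGINFO;
  sigemptyset(&sa.sa_mask);
  if (sigaction(SIGSEGV, &sa, &prev_segv) != 0) return -1;
  if (sigaction(SIGBUS, &sa, &prev_bus) != 0) return -1;
  return 0;
}

/* Hypothetical: call once, after all scanning threads have finished. */
static int sketch_uninstall_exception_handler(void) {
  if (sigaction(SIGSEGV, &prev_segv, NULL) != 0) return -1;
  if (sigaction(SIGBUS, &prev_bus, NULL) != 0) return -1;
  return 0;
}

/* Per-scan guard: touches only thread-local state, never sigaction(). */
static int guarded_call(void (*fn)(void *), void *arg) {
  sigjmp_buf jb;
  int result = 0;
  if (sigsetjmp(jb, 1) == 0) {
    scan_jmp = &jb;
    fn(arg);
  } else {
    result = -1;  /* a fault was caught during fn */
  }
  scan_jmp = NULL;
  return result;
}

static void demo_ok(void *arg) { (void)arg; }
static void demo_fault(void *arg) { (void)arg; raise(SIGSEGV); }
```

The design choice in this sketch is that sigaction() runs only at install/uninstall time, so concurrent scans never race on the process-wide disposition; each scan only sets and clears its own thread-local jump target.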