Join GitHub today
GitHub is home to over 40 million developers working together to host and review code, manage projects, and build software together.Sign up
2.11 RC1: Nessus Scan crash the Windows-Client. #7431
Describe the bug
We tested the 2.11 RC1 against the Nessus Scan again. So it's a litte bit a follow up from #6559: the Linux Client survived on CentOS 7 - Great! But not on our Windows System. Here the service is stopped after the scan. 2.11 RC1 is installed on a Windows Server 2012 R2 Standard.
If you Need something more pls tell us.
Scan with a security scanner against a Windows system
The Windows Client should also not Crash like the Linux Client.
Include as many relevant details about the environment you experienced the problem in
My college did the security scan yesterday between 15:31 and 15:44 for the test environment. I filtered and anonymize the icinga2.log for this period. There was also only one entry in the Windows Event log. Icinga didn't produce a crash log.
For me it looks like, that the windows client has problems with the GET-requests at 15:39:21, like GET %24%7B%7B57550614+16044095%7D%7D//. After that the client stopped working.
Thanks, the logs unfortunately don't tell much except for the connections and requests made. The last log line before the restart doesn't unveil much.
I've performed the HTTP GET request against my Windows 10 VM, works as expected.
Just our of interest - I read that Nessus is a free scanner, but obviously has enterprise modules and so on. Which version of it are you using and do you think that there is a chance to reproduce this just with using the free scanner?
Ok, than I ask my college for another scan and I try to "clean" (and stop all other Services from our other software on that Server) to get good logs; inclusive debug.log. Also I ask him which version and modules we use from Nessus. What I know we use the commercial version.
So this is the Information I got from my colleague about the version:
logs are coming later
So my colleague from the security department and I did a lot of scans with various scenarios.
Logs for this:
an interesting fact from yesterday's scan (before wie read the issue with the debuglog)
I hope this helps you to find the bug. If you need more, pls tell us.
With google I found some postings in forums, that also other people has problems with this DLL. They also have crashes. They didn't write the reason. But maybe this is helpful for you:
Both Stefan and Philipp are doing a marvelous job here - they provided me with a full self-written PDF on how to install and configure Nessus Essentials to trigger the full scan.
Previously I tested with a macOS install, but with a wrong subnet Nessus locked me out of the trial (yeah, enterprise software which always wants you to buy something). Now I am using a CentOS 7 Vagrant box.
I can see that Icinga stops running when such a scan happens. This is good news, so I can dig deeper.
For security reasons, I am not sharing any details about Nessus and its configuration nor the exact logs for triggering things until I have found and implemented a fix.
That are a Little bit bad news. Because that means for you more hard work to fix. But you can do that, we're sure.
http::async_read_header() may return an error during liveness disconnect. For some reason, reading the header doesn't return in this case. Calling cancel() in Disconnect() does not cancel running operations, nor would it react shuttingDown. Either we're overlapping buffers here (which I highly doubt), or there's a race condition involved where Boost asio fails. Or something fails with coroutines and strands here.
Upgrade to Boost 1.71 would also be a test option for next week. Code resides in bugfix/windows-nessus-crash-http now @Al2Klimov.
Last resort: Disable IOCP on Windows and force to use select() for IO handling with Boost Asio.
The error is somewhat different, at some point the thrown error is somehow broken. Maybe there is an overlap with the error codes as well. Looked into ceph's handling here too https://github.com/ceph/ceph/blob/6171399fdedd928b4249d135b4036e3de25079aa/src/rgw/rgw_asio_frontend.cc
Btw, you cannot even go into the Windows specific DLLs to read source code. Only Assembler. Closed source sucks.
Dynamically link Boost DLLs
Still crashes, but at least a crash log is now written with an unhandled SEH exception.
When run within a coroutine, exceptions on Windows may influence bad behaviour here. Instead, we'll check for the error code and extract the message from memory. In contrast to exceptions which are stored on the stack frame and then return, this costs a little more memory but simplifies the logic. This doesn't fix the linked issue, but is related to the analysis. refs #7431
…them in sbin/ This comes with the motivation to avoid bundling Boost context into our main binary. Whenever the context library allocates stack memory, or does something weird, we want to see this in the stack traces as origin from an external DLL and not the main icinga2.exe binary. TL;DR - this doesn't fix #7431 but makes debugging and development a bit more easy. And it follows the Windows approach with DLLs.
Exceptions in Disconnect() might be thrown (this has been reworked into error_code locally) which are swallowed inside the Destructor for being dangerous. On the other hand, swallowing them may corrupt the stack unwinding operation from the coroutine layer. The best is to avoid Defer inside lib/remote and call Disconnect() directly after breaking from other operations. refs #7351 refs #7431
TL;DR - 2 weeks with emotional ups and downs and a lot of technical background learned. Icinga now survives a Nessus scan.
Both Nessus and OpenVAS work. Thanks to Stefan & Philipp for providing a setup documentation for Nessus Essential. Won't be provided for security reasons here.
Wireshark TCP Dump
Change the cipher_list, add the node's private key for following the TLS traffic, extract the sent malformed requests.
Only failing requests would lead to stack buffer underrun or memory corruption errors.
Boost Beast HTTP Parser
http::async_read_header may throw exceptions leading into crashes. In order not to populate the exception
Boost ASIO Socket IO
Same strand in m_IoStrand as with other operations. No special influence here.
ASIO wraps them and makes them easier to use.
Coroutine Stack Unwinding on Complete
Swallowing exceptions is evil with Boost Coroutines.
Coroutines work in the way that they allocate their own stack. Whenever an exception is thrown,
Exceptions are asynchronous and put onto the stack. This stack is also used by the Coroutine to
At some point, the thrown exceptions corrupt the stack. Only on Windows, only with Visual Studio as compiler.
In order to trigger this problem, either fire many Nessus Scan Attacks against the HTTP API.
The easiest way to reproduce this is to use the debug console and evaluate a script expression
Coroutines and Context Stack
Our (exception) stack is foobar at some point. Either Beast exceptions throw, our own ScriptError exceptions throw and something asio related itself.
One can manipulate the coroutine stack size allocated by Boost.Context. This defaults to 64KB.
Sets the default stack size to 8 MB. 64MB. Does not immediately crash, but later.
Incompatibility between std, boost and ScriptError exceptions?
I've found a library which wraps the coroutine exceptions from their different types.
The implementation is safe and good to use - since I've now learned that coroutines only support
The Visual Studio Compiler optimizes several bits in a way we as developers do not expect it.
In terms of Boost, we need to disable certain flags already:
These optimizations don't have much of influence here.
Instead, I've found some compiler optimizations going on to reduce the binary sizes.
ASIO IO Threads manipulating memory?
Tested this, but it did not fix the crash. Won't apply the patch on Windows therefore.
Well, turns out that the default allocated stack size for a coroutine is 64 KB. Whenever it has exceptions inside, this stack may be too low.
Raising the default coroutine stack on Windows actually fixes the problem.
Whenever valid requests are made, everything is fine.
Boost Beast http::async_read_header can throw an exception. We can re-build this into error codes. Doesn't change.
Not all of them are boost compatible exceptions.
This includes a stack deallocate() call which calls std::free(). This crashes with pointing to memory which is corrupted.
Yes. Rewriting EnsureValidHeaders() to not use these exceptions allows the code to complete.
No fucking idea what the Visual Studio compiler renders from this code bit. It may originate from static INLINE too.
The stack size with 64 KB is too low. Why? Look into what's stored inside the exception. The whole DebugInfo struct is massively bloated.
It may also be the case that exceptions inside coroutines cause an overhead which is
Yes, Boost Coroutine doesn't work with the unwind exception.
Yes, these two patterns are incompatible within the new network stack. Avoid them at all cost.
Tests have proven something between 8 and 64 MB. This still is too high, but mitigates the issue.
Ensure that Coroutine Stack unwinding is fully supported within the API.
Ensure that HTTP Disconnects don't fire I/O.
Wrap thrown exceptions into Boost compatible exceptions in boost::asio::spawn().
References and URLs
Everything I have had opened and shed some light into finding the cause.
Thanks for the kind feedback, now we can start preparing the 2.11 release.
As written on Twitter already, the intention with all the details is to not only help an Icinga developer, but also everyone developing with Boost, Coroutines, etc. on Windows and debugging the same issue. Sharing knowledge is what makes open source strong, not only the source code.