GNUTLS segmentation fault with connection timeouts #215
Comments
|
so the suggested "naive" patch works, but it probably hides the problem under the rug - i can't imagine it's a good thing to just skip those callbacks ... @aaronmdjones maybe you have a better idea here? |
|
I suspect it is related to his TLS cleanups. It looks like rb_ssl_timeout() is being called without F->accept ever being set. |
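For readers following along, the kind of guard being discussed could look roughly like the sketch below. It is not the libratbox code: `struct fde_sketch`, `struct accept_sketch`, `ssl_timeout_guarded` and `record_status` are hypothetical stand-ins, and the only thing taken from the thread is the idea that the timeout handler may run while the accept data is still NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real libratbox structures differ. */
struct accept_sketch {
    void (*callback)(int status, void *data);
    void *data;
};

struct fde_sketch {
    struct accept_sketch *accept; /* may still be NULL when a timeout fires */
    int timed_out;
};

/* Marks the descriptor as timed out and calls the accept callback only if
 * one is registered. Returns 1 if the callback ran, 0 if it was skipped. */
static int ssl_timeout_guarded(struct fde_sketch *F)
{
    assert(F != NULL);
    F->timed_out = 1;
    if (F->accept == NULL || F->accept->callback == NULL)
        return 0; /* nothing to notify; avoids the NULL dereference */
    F->accept->callback(-1, F->accept->data);
    return 1;
}

/* Example callback: stores the status it was given in an int. */
static void record_status(int status, void *data)
{
    *(int *)data = status;
}
```

A guard like this only stops the crash; it does not explain why the handler fires before the accept data exists, which is the concern raised above.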
|
I did not change any of the logic in the GNUTLS backend. As far as I remember I only ever changed the way it reports its version, and prevented a crash bug in |
|
is there anyone that groks that code? otherwise maybe my patch should just be added to the other backends, which are probably susceptible to similar crashes |
|
I could have a look at the GNUTLS backend tomorrow and see what it does differently from the OpenSSL backend with respect to its callbacks. |
|
i did some of that earlier - couldn't find anything. at this point, i wouldn't assume OpenSSL works at all - i have been running those servers with GnuTLS and 3.4/3.5 codebases without problems until we deployed qwebirc, which triggers the bug. |
|
The OpenSSL backend should work, as it's using the same non-OpenSSL-specific backend code as the MbedTLS backend, which I threw a battery of tests at. You can see with |
|
well, that's just because i pushed to have GnuTLS support restored in Charybdis, so yeah I "own" the whole file... i didn't actually commit anything in there: https://github.com/charybdis-ircd/charybdis/commits/release/3.5/libratbox/src/gnutls.c regarding those tests: can those be targeted at the gnutls backend to see if we can reproduce this reliably then? |
|
Unfortunately they were all manual so I'll give it a go tomorrow. Off the top of my head I tried:
And maybe a few others. Then the usual course of tests with normal connections (completing user registration) on the 3 different libraries to ensure interop. These were all done under Valgrind and ASan. |
it would be great to have that stress test suite available somewhere... |
|
I have done some experimentation against the GNUTLS backend and cannot find any problems. If you can find what exactly in qwebirc is tripping up the IRCd I can write a reproducible test for it. |
|
I have made some progress in the investigation. I suspect it is caused by negotiation failure -- the current GNUTLS backend code does not distinguish between negotiation success and failure before indicating to the IRCd that everything is okay. I am currently gutting the GNUTLS backend to bring it in line with the MbedTLS one. I will update the issue when I have some code for you to test. |
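As a rough illustration of the distinction described here: `gnutls_handshake()` returns 0 on success and a negative error code otherwise, and `GNUTLS_E_AGAIN` and `GNUTLS_E_INTERRUPTED` mean "try again" rather than failure. The sketch below is not the backend code. It avoids the GnuTLS headers, so the `SKETCH_E_*` constants are placeholders whose numeric values are assumptions, and `classify_handshake` and `handshake_ok` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder codes standing in for GNUTLS_E_SUCCESS, GNUTLS_E_AGAIN and
 * GNUTLS_E_INTERRUPTED; real code would use the GnuTLS constants. */
#define SKETCH_E_SUCCESS     0
#define SKETCH_E_AGAIN       (-28)
#define SKETCH_E_INTERRUPTED (-52)

enum hs_result { HS_DONE, HS_RETRY, HS_FAILED };

/* Maps a handshake return value onto the three cases the caller cares about. */
static enum hs_result classify_handshake(int ret)
{
    if (ret == SKETCH_E_SUCCESS)
        return HS_DONE;
    if (ret == SKETCH_E_AGAIN || ret == SKETCH_E_INTERRUPTED)
        return HS_RETRY;  /* wait for more I/O, then call the handshake again */
    return HS_FAILED;     /* any other negative value: tear the session down */
}

/* Returns 1 only when the session is usable, so that the IRCd is never told
 * "everything is okay" after a failed or unfinished negotiation. */
static int handshake_ok(int ret, enum hs_result *out)
{
    assert(out != NULL);
    *out = classify_handshake(ret);
    return *out == HS_DONE;
}
```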
|
Please test commit c0f6591 in aaronmdjones/charybdis |
|
wow... that diff is scary... did you test that code at all? |
|
Nope! That's why I asked you to test it... and why I didn't push it upstream. However, I did just test commit fbe8a69 and I can establish a session with that. |
|
I'm happy with commit 569d5f6 if you are. I'll be throwing more tests at it and then opening a pull request for other chary developers to review. I've also done the same to the OpenSSL backend. The diffs between all 3 of them are now fairly minimal. |
|
if i understand this correctly, i actually need 3 patches: c0f6591, fbe8a69 and 569d5f6? it's too bad that the diff is so far-reaching: it makes it basically impossible to review, and hard to backport to 3.4, short of just copying the file in place completely. that's basically what i'll do to test this here, because i am not sure that those three patches are sufficient. thanks for working on this anyways! |
|
Oh yeah you'd definitely be better off just replacing the file wholesale; I did mean "the current state of the file at commit ...". I don't need you to actually work on integrating it though, I just need you to confirm it fixes the issue, and then we'll go about integrating it into upstream and you can just grab the repository at that commit instead. Since I'm doing work on all 3 backends and at least this new backend will be a bug fix (I also discovered several memory leaks in the current upstream GNUTLS backend) we may even end up doing a new release for it, which should make your work even easier. |
|
that's great news! the problem is i am still maintaining a 3.4 branch for debian stable here. i factored in a few things there, including pulling in all of gnutls.c from scratch, as it is simply not present in 3.4 :( so that would be a little harder to merge in as well... :) |
|
Well if you already pulled in one whole file ... ;) |
|
problem is auditability: the whole file changed, so there's no way for us to figure out wth happened. gnutls support generally works very well in 3.4, it's just this one bug that was left. :p |
|
The main problem (the one I suspect causing this reported issue) is that the current upstream GNUTLS backend does not properly check the return value of It's still going to leak memory all over the place though. Adjusting the test for handshakes that succeed is only going to make the leaks worse -- every failed handshake will also leak memory too (right now they just crash the backend if they fail in the right way), in addition to the current rehash leaks. |
|
Also, I sure hope you backported commit 818a3fd (if applicable). |
|
818a3fd has been released in Debian last week, iirc, i made sure of that: https://security-tracker.debian.org/tracker/CVE-2016-7143 heck, i'm the one that requested that CVE. :p |
|
@kaniini Do we even still support version 3.4? I've not been prodded (by you or anyone else) to push any security improvements or fixes to it. |
|
regarding there are two other places where i would sure prefer such a patch for now, because right now i'm battling with my crappy silencing patch and it looks pretty ugly. i think memory leaks are not so critical as to warrant a full update of the gnutls backend in stable, but i could ask the release team for approval on that explicitly. regarding 3.4 support, unfortunately i have to support it here, at least for security, because we have that in stable. if you make a clear statement that using 3.4 is harmful, then maybe i could convince the release team to upgrade to 3.5 altogether, but that (running the same version across all suites) happens rarely. |
|
At the very least I'd say it should be strongly discouraged. I doubt you have the configurable fingerprint digest algorithm functionality added in 3.5, for example -- which would mean your GNUTLS backend port is using SHA-1 for digesting client and server certificates. SHA-1 is no longer considered acceptable for this purpose; hell, NIST was recommending against it 6 years ago. |
|
yeep, sha-1. :/ i guess that'll be a good argument for upgrading, but for now i'm just trying to fix this one crash. ;) |
|
@aaronmdjones i'm open to doing a 3.4.3 security release if you want, but would rather emphasize adoption of 3.5 and 4.0 branches as charybdis 5 is not due to be stable for at least another 18 months most likely. |
|
I'd rather encourage adoption of 3.5 too. |
|
so i'd be glad to jump ship and join the 3.5 bandwagon, although i doubt i could convince the stable release managers to agree in debian. but that hardly matters: i need to fix this bug. i stepped away from this for a while and now i'm back to square one, because there's no release that bundles those fixes in a meaningful way. 3.5.3 was released before those branches were merged... what's the way forward here? chary 4? |
|
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256
I have just pushed a release 3.5.4.
…-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
iQIcBAEBCAAGBQJYtexpAAoJEOxvhu6c2EC1R5UP/3tyWY4WKU1V1751QKcBXaHA
1A8y4wemcVlVMvrwMb3PvfEhn/RDpMHnhUM7PTzv93/cYMDPCQN2ffDSOIK7XsUW
YU82Ou51UyqJE1QFPfTSfiiJTypfAWMsK8H8yr/9XChacUwfiSisodzuGoWhSKUK
B6flsKXZ1w7RqhT5j7DnFDuqAgehLD3E+TQ7f6obiy38CfLMqCqsdJA/Jvfw2w3/
XINiZWweMIbpte/2Xk+WDaNQQKXtMJbV7I6OFplxg6YvsTCdCjvsjVGs3ie9p3xP
zHdRGvFsFiET2yA34ekzD7WpLLGNKelPAR7uPf8M0syvRS/VuhQKq/ciWuFCIAi+
SeJBLu/SzveGG4lyHpLvC+tbBopjshz9UcCZd+2F5z1MyKLBLabV17lAoOis3UPH
8SwGM7hqpy8ilKBRwAZUYRuH+ZHM0iv+1gyc46avpdJ98nEx4lDsNTaY70DL9+5M
EudRZl4dK1iK1Qkg4HdZ9zS21Xm6euG8FEPNZ8pXOyCuZmPirNNFQTUkXDQB6Hdj
FtczHaikx7brInOUfFEjRVF8GIOyzFnWndORRq3HuKYvtMsaDFK4TsEv4scVJ/Nc
CQp058eGpkI5Rujehq+dvXkoFpvT0fgOFCzHRJ32tH6xLbtvp35PsO6cMI7qiKB5
zKS2U2wdrGMDUAb5ggCw
=TW1L
-----END PGP SIGNATURE-----
|
|
@aaronmdjones thanks - i'll test that and report back here. |
|
@aaronmdjones i'm getting this in 3.5.4, investigating: SSL connections fail: |
|
btw, the |
|
well, what a clusterfuck... turns out the gnutls code doesn't really work at all the minute you actually have a certificate to load (or maybe a certificate chain? who knows...), see #238 for details. the second error was fixed by using a custom i would have thought that a working default would have been shipped in the gnutls module... oh well. |
|
oh, and with #238, i can confirm this issue is resolved in 3.5.4. i am not sure we can say it's really fixed until that's merged, because i can't really test it, but i assume it is... |
|
Apologies. It seems the default I shipped uses a token that was introduced with a recent GNUTLS but Debian is still stuck using an old one. I will use older tokens. |
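One way to avoid shipping a default that an older library rejects is to try several candidate strings in order and keep the first one that is accepted. The sketch below is only an illustration under assumptions: `first_accepted_priority`, `priority_init_fn` and `old_library_init` are hypothetical, and "HYPOTHETICAL-NEW-TOKEN" is a made-up placeholder rather than a real priority token. In real code the `try_init` callback would wrap `gnutls_priority_init()`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Caller-supplied initialiser: returns 0 if the library accepts the string. */
typedef int (*priority_init_fn)(const char *priority);

/* Tries each candidate in order and returns the first accepted one,
 * or NULL if the library rejects all of them. */
static const char *first_accepted_priority(const char *const *candidates,
                                           size_t count,
                                           priority_init_fn try_init)
{
    assert(candidates != NULL && try_init != NULL);
    for (size_t i = 0; i < count; i++) {
        if (try_init(candidates[i]) == 0)
            return candidates[i];
    }
    return NULL;
}

/* Stand-in for an older library that only understands "NORMAL". */
static int old_library_init(const char *priority)
{
    return strcmp(priority, "NORMAL") == 0 ? 0 : -1;
}
```

Listing the most specific string first and a conservative one last keeps newer installations on the preferred settings while older ones still start.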
|
With your PR merged and commit 5d8a480 does it work? |
|
@aaronmdjones yes, thanks. |
I have this strange crash here which I can reproduce only when using a qwebirc webchat frontend. From the user's point of view, the connexion fails with:
From the server side, it's much more fun though:
It looks like a NULL pointer is passed as F->accept to the rb_ssl_timeout callback. it is unclear why or when that is called - it looks like it's being called during the client setup. i have indeed found the handler is called with a NULL pointer here:
but it could be that it's supposed to be configured after the handler is setup? that seems fishy. in fact, using a rb_settimeout breakpoint, all the handler setups would have a NULL pointer there.
i wonder if a naive fix like this would be enough:
any feedback would be appreciated.
This is Charybdis 3.5.3 or 3.4.2 on Debian jessie. This server was working fine when it was running 3.3.0.