Blightmud immediately disconnecting when connecting to TLS mud #788
Comments
Sorry about hijacking this comment but I don't want the original issue to creep into this new one. And it's better for tracing. Once we have some logs with this error from Mac I don't think it should be too hard to figure out the issue (famous last words). |
Here are the instructions for producing the logs we need: #621 (comment) (Perhaps we should add these instructions to some form of doc?) |
I was able to borrow a MacBook Air (M1, 2020) running macOS Monterey 12.4 to do some quick testing. I tried both a release and a debug build of 0ba7d13 (tip of dev at the time of writing). With both builds I ran
@cmetz Are you still seeing this behaviour? I think logs from your machine (and maybe a pcap) are going to be required to make progress until one of us can reproduce the bug reliably. |
I think I'll just close this. If anyone does experience the bug in the future just drop a new issue and we'll deal with that. |
I've been looking into tweaking some socket options and noticed this same behaviour happens to me occasionally. I'm on Linux, building tip of
That's my theory too. Re-opening this to remind myself to try to debug. |
I added some more logging and was able to catch an instance of this happening with a good lead for further debugging:
Client side:
Server side:
|
I've done a lot more investigation and my conclusions are a bit fuzzy at this point but I figured I could share my results so far since I can't dig any deeper today. I'm only able to reliably reproduce this issue with the Dune MUD server/configuration, but I'm not sure yet that it's a server-side bug. To chase this down I wrote an autoloading Lua config that would:
With this approach I was able to reliably reproduce the issue of seeing an immediate disconnect after receiving a TLS protocol version alert from the server, but only with Dune/DevDune. I was unable to reproduce with an
With Dune/DevDune it usually took around 30 cycles to reproduce the disconnect. With the unaffected servers I could do 500+ cycles without issue.
During a failed attempt I took a packet capture and configured Blightmud to write its TLS session keys to a file. With the keys + pcap I could see in Wireshark that Blightmud completes the full TLS 1.3 handshake and we begin exchanging application data records with the server. The "smoking gun" seems to be a message of TCP "continuation data" with a TLS payload that Blightmud sends to the server. Afterwards the server sends application data, and then the incorrect protocol version alert. The continuation data could be caused by a few things, but since successful connection attempts don't have a similar message, and it's the last thing we send the server before a disconnect, it seems linked to the root cause (whatever it is).
Probably the most important aspect so far is that my reproduction script doesn't work with the other TLS-enabled MUD servers I've tested. Finding another server I could reliably reproduce the issue with would be a great help.
With the data I have so far it seems like the possibilities are:
Here's the lua script I used:
    blight.output("!!! Loaded repro script")
    local host = "localhost"
    local port = 4243
    local cert_verify = true
    local wait = 0.2
    local max_checks = 1000
    local checks = 0
    local function repro()
        blight.output(string.format("Check try number: %d", checks))
        if checks > max_checks then
            return
        end
        if mud.is_connected() == true then
            blight.output("!!! Disconnecting from existing conn")
            mud.disconnect()
        end
        blight.output("!!! Connecting")
        mud.connect(host, port, true, cert_verify)
        blight.output("!!! Registering wait timer")
        timer.add(wait, 1, function()
            blight.output("!!! Checking after wait")
            if mud.is_connected() == true then
                checks = checks + 1
                blight.output("!!! Connected, so repeating...")
                repro()
            else
                blight.output("!!! Disconnected - quitting - check the logs!!!")
                blight.quit()
            end
        end)
    end
    blight.output("Kicking off repro loop")
    repro()
Here's the relevant part of the pcap + session keys: |
I have been able to reliably reproduce this issue with the following MUDs (MUCKs actually) as well:
tintin++ connects to these servers over TLS without issues. By the way, how do I turn on debug info? I'd like to attach debug logs for these servers if possible. |
Thanks - that's useful data 👍
Start Blightmud with
I owe this ticket an update: I did more digging since my last comment. I suspect (but haven't proved) that the RwStream abstraction may be involved and that it's a race on the Blightmud side. The
The Blightmud architecture wants to split the read/write sides of a TCP stream and use blocking I/O on each half in separate send/receive threads. TLS streams don't easily map to that abstraction, since reads can cause writes (e.g. for alerts) and there's state to maintain that assumes (and enforces through the type system) sole access. I think that's why the original native-tls code was fighting being put into this arrangement, requiring the
If this is the cause (again, just a theory at this point) I think the right fix would be to reconstitute the read/write threads into one thread that uses non-blocking I/O with something like mio. |
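To make the "reads can cause writes" point concrete, here is a minimal std-only Rust sketch of the single-owner arrangement described above. It is an illustration under stated assumptions, not Blightmud's code: `ToySession`, `Event`, and `run_events` are hypothetical names, the session is a toy stand-in rather than Rustls, and a plain `Vec` stands in for the socket. It does not use mio; it only shows why funnelling both directions through one owner removes the need to share the session between threads.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for a TLS session (not Rustls): it is stateful, and
/// handling inbound data can queue outbound data, much as a real TLS stack
/// may need to send an alert in response to something it read.
struct ToySession {
    outbound: VecDeque<Vec<u8>>,
}

/// Everything the single I/O owner can be asked to handle.
enum Event {
    /// Bytes arrived from the server.
    FromPeer(Vec<u8>),
    /// The user typed something that should be sent to the server.
    FromUser(Vec<u8>),
}

impl ToySession {
    fn new() -> Self {
        ToySession { outbound: VecDeque::new() }
    }

    /// Process inbound bytes. A leading 0xFF is this toy's "control message"
    /// and makes the session queue a reply, so a read causes a write.
    fn on_read(&mut self, data: &[u8]) {
        if data.first() == Some(&0xFF) {
            self.outbound.push_back(b"ALERT".to_vec());
        }
    }

    /// Queue user data for sending.
    fn queue_write(&mut self, data: &[u8]) {
        self.outbound.push_back(data.to_vec());
    }
}

/// Drive the session from one place. Only this function holds the session
/// mutably, so writes triggered by reads and writes requested by the user
/// are serialized in event order without any locking.
fn run_events(events: Vec<Event>) -> Vec<Vec<u8>> {
    let mut session = ToySession::new();
    let mut wire = Vec::new(); // stands in for the socket's send side
    for event in events {
        match event {
            Event::FromPeer(bytes) => session.on_read(&bytes),
            Event::FromUser(bytes) => session.queue_write(&bytes),
        }
        // Flush whatever the session wants to send after every event.
        while let Some(chunk) = session.outbound.pop_front() {
            wire.push(chunk);
        }
    }
    wire
}

fn main() {
    let wire = run_events(vec![
        Event::FromUser(b"look".to_vec()),
        Event::FromPeer(vec![0xFF]),
        Event::FromUser(b"north".to_vec()),
    ]);
    for chunk in &wire {
        println!("{}", String::from_utf8_lossy(chunk));
    }
}
```

In a real implementation the event source would be a readiness-based poll loop (mio or similar) rather than a pre-built `Vec`, but the ownership shape would be the same: one thread, one mutable session.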
Thanks! The logs don't seem particularly helpful with regards to the TLS issue though. :( Logs for
Another run with the
Should look like this (output from
|
Yeah :-( Let me push a patch I've been using locally that adds more debugging in one place. Though, for these servers I think the problem I'm chasing is unrelated:
This one is definitely unrelated to the bug at hand: the server only speaks TLS 1.0 on this port and the TLS implementation that Blightmud uses only supports TLS 1.2 and TLS 1.3. Unfortunately TLS 1.0 is woefully insecure. TinTin++ offers a much less opinionated implementation, which is better for compatibility with MUDs like this one.
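The version mismatch can be illustrated with a small Rust sketch. This is a simplification and an assumption on my part about how to model it: real TLS negotiation involves the ClientHello legacy version field and, for TLS 1.3, the `supported_versions` extension, none of which is modelled here. `TlsVersion` and `highest_common` are hypothetical names; only the two-byte version codes are taken from the TLS specifications.

```rust
/// TLS protocol versions by their two-byte wire encoding.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum TlsVersion {
    Ssl30 = 0x0300,
    Tls10 = 0x0301,
    Tls11 = 0x0302,
    Tls12 = 0x0303,
    Tls13 = 0x0304,
}

/// Pick the highest version both sides support, or `None` when the two
/// sets do not overlap (in which case the handshake cannot succeed).
fn highest_common(client: &[TlsVersion], server: &[TlsVersion]) -> Option<TlsVersion> {
    client
        .iter()
        .copied()
        .filter(|v| server.contains(v))
        .max()
}

fn main() {
    // A client that only supports TLS 1.2 and 1.3, as described above.
    let client = [TlsVersion::Tls12, TlsVersion::Tls13];
    // A server that only speaks older protocol versions.
    let legacy_server = [TlsVersion::Ssl30, TlsVersion::Tls10];
    // A server that still overlaps with the client on TLS 1.2.
    let overlapping_server = [TlsVersion::Tls11, TlsVersion::Tls12];

    println!("{:?}", highest_common(&client, &legacy_server));
    println!("{:?}", highest_common(&client, &overlapping_server));
}
```

With no overlap there is nothing to fall back to, which is why a more permissive client can connect where a TLS 1.2+ client cannot.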
I'll have to look closer at this one when I have time, but I suspect it might also just be a server that's generally incompatible with the TLS library. I can reproduce a handshake failure with a standalone Rustls program, separate from Blightmud. |
|
If it helps, here's the same connection for the last server being negotiated over TLS 1.2, using the openssl s_client:
|
I would not be surprised if this is the cause of the problem we are seeing. IIRC this will be tricky to switch out. |
This server's TLS configuration is also pretty dire (it's still offering SSLv3 💀). The issue of importance w.r.t. Blightmud is that
Indeed :-( I started looking into it over the holidays and have some ideas but they will be time intensive to realize. I'll see if I can find time to make a PR sometime in Jan. Confirming this is the cause of the issue before diving into that rabbit hole would be nice, but I figure even if it doesn't fix the bug it will eliminate some |
Issue flagged as stale |
@cpu yeah it's not crashing anymore 👍 but with tls enabled i now get instantly disconnected when directly launched via terminal. i'll test this later on linux also.
steps to reproduce:
results in:
but sometimes it works and it stays connected. without tls enabled, it always seems to work. maybe it's a timing issue.
Originally posted by @cmetz in #621 (comment)