Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Blightmud immediately disconnecting when connecting to TLS mud #788

Closed
LiquidityC opened this issue Apr 12, 2023 · 18 comments
Closed

Blightmud immediately disconnecting when connecting to TLS mud #788

LiquidityC opened this issue Apr 12, 2023 · 18 comments
Labels
bug Something isn't working help wanted Extra attention is needed stale No activity
Milestone

Comments

@LiquidityC
Copy link
Member

@cpu yeah it's not crashing anymore 👍 but with tls enabled i now get instantly disconnected when directly launched via terminal. i'll test this later on linux also.

steps to reproduce:

> blightmud -n -t -c mg.mud.de:4712

results in:
grafik

but sometimes it works and it stays conneted. without tls enabled, it always seems to work. maybe its a timing issue.

Originally posted by @cmetz in #621 (comment)

@LiquidityC
Copy link
Member Author

Sorry about hijacking this comment but I don't want the original issue to creep into this new one. And it's better for tracing. Once we have some logs with this error from Mac I don't think it should be too hard to figure out the issue (famous last words).

@LiquidityC
Copy link
Member Author

Here are the instructions for producing the logs we need: #621 (comment)

(Perhaps we should add these instructions to some form of doc?)

@LiquidityC LiquidityC added bug Something isn't working help wanted Extra attention is needed labels Apr 12, 2023
@cpu
Copy link
Member

cpu commented Apr 22, 2023

I was able to borrow a MacBook Air (M1, 2020) running MacOS Monterey 12.4 to do some quick testing. I tried both a release and debug build of 0ba7d13 (tip of dev at the time of writing). With both builds I ran blightmud -n -t -c mg.mud.de:4712 ~10 times and never saw an abrupt unexplained disconnect. I'm afraid I can't reproduce at all :-(

@cmetz Are you still seeing this behaviour? I think logs from your machine (and maybe a pcap) are going to be required to make progress until one of us can reproduce the bug reliably.

@LiquidityC
Copy link
Member Author

I think I'll just close this. If anyone does experience the bug in the future just drop a new issue and we'll deal with that.

@LiquidityC LiquidityC added this to the Version 5.3 milestone Sep 29, 2023
@cpu cpu reopened this Nov 9, 2023
@cpu
Copy link
Member

cpu commented Nov 9, 2023

I've been looking into tweaking some socket options and noticed this same behaviour happens to me occasionally. I'm on Linux, building tip of dev and using /connect <host> <tls port> true true. Once in a while (maybe 1 in 5 tries?) the connection succeeds, the login screen appears, and there's an immediate disconnect. Reconnecting works reliably.

maybe its a timing issue.

That's my theory too. Re-opening this to remind myself to try to debug.

@cpu cpu changed the title Blightmud immediately disconnecting when connecting to TLS mud from command line on Mac Blightmud immediately disconnecting when connecting to TLS mud Nov 9, 2023
@cpu
Copy link
Member

cpu commented Nov 9, 2023

I added some more logging and was able to catch an instance of this happening with a good lead for further debugging:

Client side:

<snipped>
[00:00:00.275] (7fb55570c6c0) DEBUG  END prompt: Please enter your name:
[00:00:00.331] (7fb55570c6c0) DEBUG  Read an err from stream: Err(Custom { kind: InvalidData, error: AlertReceived(ProtocolVersion) })
[00:00:00.331] (7fb55570c6c0) DEBUG  Bytes: []
[00:00:00.331] (7fb55570c6c0) DEBUG  read empty bytes. Disconnecting
[00:00:00.331] (7fb55570c6c0) DEBUG  Receive stream closing
[00:00:00.331] (7fb5582e7a40) DEBUG  Disconnecting from dev.dunemud.net:4241
[00:00:00.332] (7fb5582e7a40) DEBUG  Sending warning alert CloseNotify
[00:00:00.332] (7fb5551096c0) DEBUG  Transmit stream closing

Server side:

2023.11.09 22:45:16 TLS: Received corrupted data (1). Closing the connection.
2023.11.09 22:45:16 Dumping OpenSSL error stack...
  336130315: error:1408F10B:SSL routines:ssl3_get_record:wrong version number
2023.11.09 22:45:16 end of OpenSSL error stack dump.
2023.11.09 22:45:16 Error (108) upon reading socket 17 (ip <snipped> 'secure/login/login#154842'), closing connection: Cannot send after transport endpoint shutdown

@cpu
Copy link
Member

cpu commented Nov 12, 2023

I've done a lot more investigation and my conclusions are a bit fuzzy at this point but I figured I could share my results so far since I can't dig any deeper today. I'm only able to reliably reproduce this issue with the Dune MUD server/configuration, but I'm not sure yet that it's a server-side bug.

To chase this down I wrote an autoloading Lua config that would:

  • connect over TLS
  • wait briefly
  • if the connection state was disconnected (the unexpected state), exit
  • otherwise, disconnect and repeat

With this approach I was able to reliably reproduce the issue of seeing an immediate disconnect after receiving a TLS protocol version alert from the server, but only with Dune/DevDune. I was unable to reproduce with an openssl s_server test server, LDMud running the demo LP-245 game, StickMUD (LDMud), or the mg.mud.de game the OP tried (also LDMud based).

With Dune/DevDune it usually took around ~30 cycles to reproduce the disconnect. With the unaffected servers I could do 500+ cycles without issue. During a failure attempt I took a packet capture and configured Blightmud to write its TLS session keys to a file. With the keys + pcap I could see in Wireshark that Blightmud completes the full TLS 1.3 handshake and we begin exchanging application data records with the server. The "smoking gun" seems to be a message of TCP "continuation data" with a TLS payload that Blightmud sends to the server. Afterwards the server sends application data, and then the incorrect protocol version alert. The continuation data could be caused by a few things but given successful connection attempts don't have a similar message and it's the last thing we send the server before a disconnect it seems linked to the root cause (whatever it is).

Probably the most important aspect so far is that my reproduction script doesn't work with other TLS enabled MUD servers I've tested. Finding another server I could reliably reproduce the issue with would be a great help.

With the data I have so far it seems like the possibilities are:

  • It's a Blightmud bug, but the conditions under which it manifests depend on server side properties that Dune happens to meet by chance (seems possible but I'm running short of ideas of how to narrow that down)
  • It's a bug specific to Dune (seems most likely at this point - I will try bisecting our game code)
  • It's a bug specific to LDMud (plausible since mg.mud.de is also an LDMud game, but unlikely since I can't repro with that server or a minimal LDMud lp-245 reproducer)
  • Something else :-P
Here's the lua script I used:
blight.output("!!! Loaded repro script")

local host = "localhost"
local port = 4243
local cert_verify = true
local wait = 0.2
local max_checks = 1000

local checks = 0

local function repro()
    blight.output(string.format("Check try number: %d", checks))
    if checks > max_checks then
        return
    end
    if mud.is_connected() == true then
        blight.output("!!! Disconnecting from existing conn")
        mud.disconnect()
    end

    blight.output("!!! Connecting")
    mud.connect(host, port, true, cert_verify)

    blight.output("!!! Registering wait timer")
    timer.add(wait, 1, function()
        blight.output("!!! Checking after wait")
        if mud.is_connected() == true then
            checks = checks + 1
            blight.output("!!! Connected, so repeating...")
            repro()
        else
            blight.output("!!! Disconnected - quitting - check the logs!!!")
            blight.quit()
        end
    end)

end

blight.output("Kicking off repro loop")
repro()

Here's the relevant part of the pcap + session keys:
tls.disconnect.badversion.pcap.tar.gz

@pganguli
Copy link

pganguli commented Jan 1, 2024

I have been able to reliably reproduce this issue with the following MUDs (MUCKs actually) as well:

[**] furry        Host: furrymuck.com             Port: 8899 TLS:  on Verify: off
[**] tapestries   Host: tapestries.fur.com        Port: 6699 TLS:  on Verify:  on

tintin++ connects to these servers over TLS without issues.

By the way, how do I turn on debug info? I 'd like to attach debug logs for these servers if possible.

@cpu
Copy link
Member

cpu commented Jan 1, 2024

I have been able to reliably reproduce this issue with the following MUDs (MUCKs actually) as well:

Thanks - that's useful data 👍

By the way, how do I turn on debug info? I 'd like to attach debug logs for these servers if possible.

Start Blightmud with --verbose or -V and you should find the log in $DATA_DIR/logs/log.txt has a lot more data (and $DATA_DIR is easy to find w/ /lua blight.data_dir() in-client).

I owe this ticket an update - I did more digging since my last comment. I suspect (but haven't proved) the RwStream abstraction may be involved and its a race on the Blightmud side. The RwStream was added when the initial native-tls support was added in #113 (I'd guess based on this Stack Overflow answer). As some comments there point out I think even with the Arc<Mutex<>> layers this code is unsound and a prime suspect for a weird race condition.

The Blightmud architecture wants to split the read/write sides of a TCP stream and use blocking I/O on each half in separate send/receive threads. TLS streams don't easily map to that abstraction since reads can cause writes (e.g. for alerts) and there's state to maintain that assumes (and enforces through the type system) sole access. I think that's why the original native-tls code was fighting being put into this arrangement, requiring the unsafe workaround that was persisted in the rustls update.

If this is the cause (again just a theory at this point) I think the right fix would be to reconstitute the read/write threads into one thread that uses non-blocking I/O with something like mio.

@pganguli
Copy link

pganguli commented Jan 1, 2024

Start Blightmud with --verbose or -V and you should find the log in $DATA_DIR/logs/log.txt has a lot more data (and $DATA_DIR is easy to find w/ /lua blight.data_dir() in-client).

Thanks!

The logs don't seem particularly helpful with regards to the TLS issue though. :(

Logs for /connect tapestries.fur.com 6699 tls (server with up-to-date TLS ciphers, but has an expired cert):

[00:00:00.000] (7f37594b4a00) INFO   Starting application
[00:00:00.003] (7f37594b4a00) DEBUG  Reading from disk: __command_history -> Some("[\"/connect\"]")
[00:00:00.032] (7f3749fb66c0) DEBUG  Input stream spawned
[00:00:00.032] (7f37594b4a00) DEBUG  Drawing help file: welcome
[00:00:00.032] (7f37499b06c0) DEBUG  starting new connection: https://api.github.com/
[00:00:00.121] (7f37499b06c0) DEBUG  No cached session for DnsName("api.github.com")
[00:00:00.121] (7f37499b06c0) DEBUG  Not resuming any session
[00:00:00.176] (7f37499b06c0) DEBUG  Using ciphersuite TLS13_AES_128_GCM_SHA256
[00:00:00.176] (7f37499b06c0) DEBUG  Not resuming
[00:00:00.176] (7f37499b06c0) DEBUG  TLS1.3 encrypted extensions: [ServerNameAck, Protocols([ProtocolName(6832)])]
[00:00:00.176] (7f37499b06c0) DEBUG  ALPN protocol is Some(b"h2")
[00:00:17.434] (7f37594b4a00) DEBUG  Input sent. Mode: UnterminatedPrompt
[00:00:17.435] (7f3749bb46c0) DEBUG  Connecting to tapestries.fur.com:6699 tls: true verify: true
[00:00:17.436] (7f3749bb46c0) DEBUG  resolving IP addresses for tapestries.fur.com:6699
[00:00:18.312] (7f3749bb46c0) DEBUG  resolved 1 potential addresses
[00:00:18.312] (7f3749bb46c0) DEBUG  attempting to connect to 192.81.132.253:6699
[00:00:18.554] (7f3749bb46c0) DEBUG  connected to 192.81.132.253:6699
[00:00:18.554] (7f3749bb46c0) DEBUG  enabling TCP keepalive
[00:00:18.554] (7f3749bb46c0) DEBUG  No cached session for DnsName("tapestries.fur.com")
[00:00:18.555] (7f3749bb46c0) DEBUG  Not resuming any session
[00:00:18.555] (7f37594b4a00) DEBUG  Connected to tapestries.fur.com:6699
[00:00:18.555] (7f3749bb46c0) DEBUG  Receive stream spawned
[00:00:18.556] (7f37499b06c0) DEBUG  Transmit stream spawned
[00:00:18.797] (7f3749bb46c0) DEBUG  Bytes: []
[00:00:18.797] (7f3749bb46c0) DEBUG  Receive stream closing
[00:00:18.797] (7f37594b4a00) DEBUG  Disconnecting from tapestries.fur.com:6699
[00:00:18.797] (7f37594b4a00) DEBUG  Sending warning alert CloseNotify
[00:00:18.798] (7f37594b4a00) DEBUG  Disconnected from tapestries.fur.com:6699
[00:00:18.798] (7f37499b06c0) DEBUG  Transmit stream closing
[00:00:18.798] (7f37594b4a00) DEBUG  Writing to disk: __command_history -> ["/connect tapestries.fur.com 6699 tls"]
[00:00:23.587] (7f37594b4a00) INFO   Shutting down

Another run with the no-verify option:

[00:00:00.000] (7fd641f1ca00) INFO   Starting application
[00:00:00.003] (7fd641f1ca00) DEBUG  Reading from disk: __command_history -> Some("[\"/connect\",\"/connect tapestries.fur.com 6699 tls\"]")
[00:00:00.034] (7fd63adfc6c0) DEBUG  Input stream spawned
[00:00:00.034] (7fd641f1ca00) DEBUG  Drawing help file: welcome
[00:00:00.035] (7fd63a7f96c0) DEBUG  starting new connection: https://api.github.com/
[00:00:00.124] (7fd63a7f96c0) DEBUG  No cached session for DnsName("api.github.com")
[00:00:00.124] (7fd63a7f96c0) DEBUG  Not resuming any session
[00:00:00.185] (7fd63a7f96c0) DEBUG  Using ciphersuite TLS13_AES_128_GCM_SHA256
[00:00:00.185] (7fd63a7f96c0) DEBUG  Not resuming
[00:00:00.185] (7fd63a7f96c0) DEBUG  TLS1.3 encrypted extensions: [ServerNameAck, Protocols([ProtocolName(6832)])]
[00:00:00.186] (7fd63a7f96c0) DEBUG  ALPN protocol is Some(b"h2")
[00:00:05.683] (7fd641f1ca00) DEBUG  Input sent. Mode: UnterminatedPrompt
[00:00:05.684] (7fd63a9fa6c0) DEBUG  Connecting to tapestries.fur.com:6699 tls: true verify: false
[00:00:05.684] (7fd63a9fa6c0) DEBUG  resolving IP addresses for tapestries.fur.com:6699
[00:00:06.557] (7fd63a9fa6c0) DEBUG  resolved 1 potential addresses
[00:00:06.557] (7fd63a9fa6c0) DEBUG  attempting to connect to 192.81.132.253:6699
[00:00:06.793] (7fd63a9fa6c0) DEBUG  connected to 192.81.132.253:6699
[00:00:06.793] (7fd63a9fa6c0) DEBUG  enabling TCP keepalive
[00:00:06.793] (7fd63a9fa6c0) DEBUG  No cached session for DnsName("tapestries.fur.com")
[00:00:06.794] (7fd63a9fa6c0) DEBUG  Not resuming any session
[00:00:06.794] (7fd641f1ca00) DEBUG  Connected to tapestries.fur.com:6699
[00:00:06.794] (7fd63a7f96c0) DEBUG  Transmit stream spawned
[00:00:06.794] (7fd63a9fa6c0) DEBUG  Receive stream spawned
[00:00:07.031] (7fd63a9fa6c0) DEBUG  Bytes: []
[00:00:07.031] (7fd63a9fa6c0) DEBUG  Receive stream closing
[00:00:07.031] (7fd641f1ca00) DEBUG  Disconnecting from tapestries.fur.com:6699
[00:00:07.031] (7fd641f1ca00) DEBUG  Sending warning alert CloseNotify
[00:00:07.031] (7fd641f1ca00) DEBUG  Disconnected from tapestries.fur.com:6699
[00:00:07.032] (7fd63a7f96c0) DEBUG  Transmit stream closing
[00:00:07.032] (7fd641f1ca00) DEBUG  Writing to disk: __command_history -> ["/connect","/connect tapestries.fur.com 6699 tls","/connect tapestries.fur.com 6699 tls no-verify"]
[00:00:09.074] (7fd641f1ca00) INFO   Shutting down

Should look like this (output from tintin++):

#ssl taps tapestries.fur.com 6699
#TRYING TO CONNECT 'taps' TO 'tapestries.fur.com' PORT '6699'.
#SSL ALERT: CERTIFICATE HAS EXPIRED (Thu Mar 14 12:30:37 2013)
#SSL ALERT: YOU MAY BE VULNERABLE TO MAN-IN-THE-MIDDLE ATTACKS.

#SESSION 'taps' CONNECTED TO 'tapestries.fur.com' PORT '6699'
      ___
    -   ---___-                           ,
       (' ||      _                      ||         '
      ((  ||     < \, -_-_   _-_   _-_, =||= ,._-_ \\  _-_   _-_,
     ((   ||     /-|| || \\ || \\ ||_.   ||   ||   || || \\ ||_.
      (( //     (( || || || ||/    ~ ||  ||   ||   || ||/    ~ ||
        -____-   \/\\ ||-'  \\,/  ,-_-   \\,  \\,  \\ \\,/  ,-_-
                      |/
                      '     http://www.fur.com/tapestries/

@cpu
Copy link
Member

cpu commented Jan 1, 2024

The logs don't seem particularly helpful with regards to the TLS issue though. :(

Yeah :-( Let me push a patch I've been using locally that adds more debugging in one place. Though, for these servers I think the problem I'm chasing is unrelated:

I have been able to reliably reproduce this issue with the following MUDs (MUCKs actually) as well:

[**] furry Host: furrymuck.com Port: 8899 TLS: on Verify: off

This one is definitely unrelated to the bug at hand: the server only speaks TLS 1.0 on this port and the TLS implementation that Blightmud uses only supports TLS 1.2 and TLS 1.3. Unfortunately TLS 1.0 is woefully insecure. TinTin++ offers a much less opinionated implementation, which is better for compatibility with MUDs like this one.

[**] tapestries Host: tapestries.fur.com Port: 6699 TLS: on Verify: on

I'll have to look closer at this one when I have time but I suspect it also might just be a server generally incompatible with the TLS library. I can reproduce a handshake failure with a separate Rustls program than Blightmud.

@cpu
Copy link
Member

cpu commented Jan 1, 2024

Let me push a patch I've been using locally that adds more debugging in one place.

#969

@pganguli
Copy link

pganguli commented Jan 1, 2024

If it helps, here's the same connection for the last server being negotiated over TLS 1.2, using the openssl s_client:

$ openssl s_client -tls1_2 tapestries.fur.com:6699
CONNECTED(00000003)
depth=2 C = IL, O = StartCom Ltd., OU = Secure Digital Certificate Signing, CN = StartCom Certification Authority
verify error:num=19:self-signed certificate in certificate chain
verify return:1
depth=0 description = AVr07HF6AI9aqC93, C = US, CN = tapestries.fur.com, emailAddress = postmaster@fur.com
verify error:num=68:CA signature digest algorithm too weak
verify return:1
depth=1 C = IL, O = StartCom Ltd., OU = Secure Digital Certificate Signing, CN = StartCom Class 1 Primary Intermediate Server CA
verify error:num=68:CA signature digest algorithm too weak
verify return:1
depth=2 C = IL, O = StartCom Ltd., OU = Secure Digital Certificate Signing, CN = StartCom Certification Authority
verify return:1
depth=1 C = IL, O = StartCom Ltd., OU = Secure Digital Certificate Signing, CN = StartCom Class 1 Primary Intermediate Server CA
verify error:num=10:certificate has expired
notAfter=Oct 24 20:54:17 2017 GMT
verify return:1
depth=1 C = IL, O = StartCom Ltd., OU = Secure Digital Certificate Signing, CN = StartCom Class 1 Primary Intermediate Server CA
notAfter=Oct 24 20:54:17 2017 GMT
verify return:1
depth=0 description = AVr07HF6AI9aqC93, C = US, CN = tapestries.fur.com, emailAddress = postmaster@fur.com
verify error:num=10:certificate has expired
notAfter=Mar 14 07:00:37 2013 GMT
verify return:1
depth=0 description = AVr07HF6AI9aqC93, C = US, CN = tapestries.fur.com, emailAddress = postmaster@fur.com
notAfter=Mar 14 07:00:37 2013 GMT
verify return:1
---
Certificate chain
 0 s:description = AVr07HF6AI9aqC93, C = US, CN = tapestries.fur.com, emailAddress = postmaster@fur.com
   i:C = IL, O = StartCom Ltd., OU = Secure Digital Certificate Signing, CN = StartCom Class 1 Primary Intermediate Server CA
   a:PKEY: rsaEncryption, 2048 (bit); sigalg: RSA-SHA1
   v:NotBefore: Mar 13 11:18:45 2012 GMT; NotAfter: Mar 14 07:00:37 2013 GMT
 1 s:C = IL, O = StartCom Ltd., OU = Secure Digital Certificate Signing, CN = StartCom Class 1 Primary Intermediate Server CA
   i:C = IL, O = StartCom Ltd., OU = Secure Digital Certificate Signing, CN = StartCom Certification Authority
   a:PKEY: rsaEncryption, 2048 (bit); sigalg: RSA-SHA1
   v:NotBefore: Oct 24 20:54:17 2007 GMT; NotAfter: Oct 24 20:54:17 2017 GMT
 2 s:C = IL, O = StartCom Ltd., OU = Secure Digital Certificate Signing, CN = StartCom Certification Authority
   i:C = IL, O = StartCom Ltd., OU = Secure Digital Certificate Signing, CN = StartCom Certification Authority
   a:PKEY: rsaEncryption, 4096 (bit); sigalg: RSA-SHA1
   v:NotBefore: Sep 17 19:46:36 2006 GMT; NotAfter: Sep 17 19:46:36 2036 GMT
---
Server certificate
-----BEGIN CERTIFICATE-----
MIIHHzCCBgegAwIBAgIDBbiqMA0GCSqGSIb3DQEBBQUAMIGMMQswCQYDVQQGEwJJ
TDEWMBQGA1UEChMNU3RhcnRDb20gTHRkLjErMCkGA1UECxMiU2VjdXJlIERpZ2l0
YWwgQ2VydGlmaWNhdGUgU2lnbmluZzE4MDYGA1UEAxMvU3RhcnRDb20gQ2xhc3Mg
MSBQcmltYXJ5IEludGVybWVkaWF0ZSBTZXJ2ZXIgQ0EwHhcNMTIwMzEzMTExODQ1
WhcNMTMwMzE0MDcwMDM3WjBoMRkwFwYDVQQNExBBVnIwN0hGNkFJOWFxQzkzMQsw
CQYDVQQGEwJVUzEbMBkGA1UEAxMSdGFwZXN0cmllcy5mdXIuY29tMSEwHwYJKoZI
hvcNAQkBFhJwb3N0bWFzdGVyQGZ1ci5jb20wggEiMA0GCSqGSIb3DQEBAQUAA4IB
DwAwggEKAoIBAQDAhFWqv6mIwwBTTjhTDMBbtnc7jzEWy6Sx348TKqtDYJNNnrm8
9Cgl3AFImco1mjw6McVc3HbXMKF7NtT4bpIfLF+Sv+nER7qxqbpTVoUpOdFF1jxQ
gGK3Ze42e5xZKjRTok5ctgRqlakfVqQ4715lcsMS917g9nHX9Yn85ZObyunWDSTI
37R4U8WWY4qycKycdavE2WouS8Xn35pcAyoRzuAekSh28Z2+OU4QuO0w7HN9O0E+
ycQpq+RTdzmSTAJn5nHBSkMnQ2G7sWFSTXw7Gc9jo9Y9FEmYVk1hqnb8r7MSQXVi
PG3e93HK+7opE/SdYcEb/jU5n5BbRCwlnHeXAgMBAAGjggOrMIIDpzAJBgNVHRME
AjAAMAsGA1UdDwQEAwIDqDATBgNVHSUEDDAKBggrBgEFBQcDATAdBgNVHQ4EFgQU
KHNc6czRPjrvGzFy53bhJi4/U5kwHwYDVR0jBBgwFoAU60I00Jiwq5/0G2sI98xk
Lu8OLEUwJgYDVR0RBB8wHYISdGFwZXN0cmllcy5mdXIuY29tggdmdXIuY29tMIIC
IQYDVR0gBIICGDCCAhQwggIQBgsrBgEEAYG1NwECAjCCAf8wLgYIKwYBBQUHAgEW
Imh0dHA6Ly93d3cuc3RhcnRzc2wuY29tL3BvbGljeS5wZGYwNAYIKwYBBQUHAgEW
KGh0dHA6Ly93d3cuc3RhcnRzc2wuY29tL2ludGVybWVkaWF0ZS5wZGYwgfcGCCsG
AQUFBwICMIHqMCcWIFN0YXJ0Q29tIENlcnRpZmljYXRpb24gQXV0aG9yaXR5MAMC
AQEagb5UaGlzIGNlcnRpZmljYXRlIHdhcyBpc3N1ZWQgYWNjb3JkaW5nIHRvIHRo
ZSBDbGFzcyAxIFZhbGlkYXRpb24gcmVxdWlyZW1lbnRzIG9mIHRoZSBTdGFydENv
bSBDQSBwb2xpY3ksIHJlbGlhbmNlIG9ubHkgZm9yIHRoZSBpbnRlbmRlZCBwdXJw
b3NlIGluIGNvbXBsaWFuY2Ugb2YgdGhlIHJlbHlpbmcgcGFydHkgb2JsaWdhdGlv
bnMuMIGcBggrBgEFBQcCAjCBjzAnFiBTdGFydENvbSBDZXJ0aWZpY2F0aW9uIEF1
dGhvcml0eTADAgECGmRMaWFiaWxpdHkgYW5kIHdhcnJhbnRpZXMgYXJlIGxpbWl0
ZWQhIFNlZSBzZWN0aW9uICJMZWdhbCBhbmQgTGltaXRhdGlvbnMiIG9mIHRoZSBT
dGFydENvbSBDQSBwb2xpY3kuMDUGA1UdHwQuMCwwKqAooCaGJGh0dHA6Ly9jcmwu
c3RhcnRzc2wuY29tL2NydDEtY3JsLmNybDCBjgYIKwYBBQUHAQEEgYEwfzA5Bggr
BgEFBQcwAYYtaHR0cDovL29jc3Auc3RhcnRzc2wuY29tL3N1Yi9jbGFzczEvc2Vy
dmVyL2NhMEIGCCsGAQUFBzAChjZodHRwOi8vYWlhLnN0YXJ0c3NsLmNvbS9jZXJ0
cy9zdWIuY2xhc3MxLnNlcnZlci5jYS5jcnQwIwYDVR0SBBwwGoYYaHR0cDovL3d3
dy5zdGFydHNzbC5jb20vMA0GCSqGSIb3DQEBBQUAA4IBAQCKqrk0nbfu0FuPqtFj
+EHkn8I+hFBPZL+7OdndPNq7FvLjDa/7+doYfCVTrtLer0Df0XvpMqPv2F79RZmj
utYiX+wWlWgHYmKoeugoNQeS9vYRcKb1jhLYcTQwd3Ax/x8WF1oklvTN+eaxR5Md
BuSUD3OzVTpDSQFNoygeqtPJN50g9AhYnlAUsnQAQV/nr2eDpY00YluMNCRaM5gx
NRDZg0S88VHNGj4n4/RI9EIDikWnCkaczh+ONK4RF8N3Eo6+JbmqrBhsMFJnk+Wy
ZQV+15wxzPNCAMuHiFzR0hIV+/sqHDrm1G8GTMrZlszE/V6HqpBddg+UmgWJzyI9
wC2H
-----END CERTIFICATE-----
subject=description = AVr07HF6AI9aqC93, C = US, CN = tapestries.fur.com, emailAddress = postmaster@fur.com
issuer=C = IL, O = StartCom Ltd., OU = Secure Digital Certificate Signing, CN = StartCom Class 1 Primary Intermediate Server CA
---
No client certificate CA names sent
---
SSL handshake has read 5746 bytes and written 533 bytes
Verification error: certificate has expired
---
New, TLSv1.2, Cipher is AES256-GCM-SHA384
Server public key is 2048 bit
Secure Renegotiation IS supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
SSL-Session:
    Protocol  : TLSv1.2
    Cipher    : AES256-GCM-SHA384
    Session-ID: 8327D1939334699F7E32E3F9256858696DF4FCB8CDDE58A1F6412C4F9469C091
    Session-ID-ctx:
    Master-Key: A126100775360A7D52C559E5A9999909A1BFF203E6E68BB51F0ABAD8A9683D1888D03492C67EFA4F0E4E976DB80E6B81
    PSK identity: None
    PSK identity hint: None
    SRP username: None
    TLS session ticket lifetime hint: 300 (seconds)
    TLS session ticket:
    0000 - c3 d8 b1 73 d2 e3 3a da-36 ac d5 0f 0f 8b 30 79   ...s..:.6.....0y
    0010 - 8a f9 34 b1 27 1b 0a 16-e3 c2 d5 97 fa 78 50 9f   ..4.'........xP.
    0020 - 73 fc 79 47 a3 e3 e0 8a-ee a5 a3 19 9b 86 55 98   s.yG..........U.
    0030 - 2c 2c c4 00 c5 4d 50 e9-93 d8 07 9b 93 ec 78 4f   ,,...MP.......xO
    0040 - dd 6a b5 ff 56 1c c6 a6-9f 66 36 95 7c 7a 3b b7   .j..V....f6.|z;.
    0050 - 28 91 c9 c6 ec ac 3b 60-92 35 e3 e2 3a 92 18 3d   (.....;`.5..:..=
    0060 - 6f 0f 0a 4b 2c c6 ef 73-c2 5c f5 24 5a 6d 5d 6d   o..K,..s.\.$Zm]m
    0070 - 35 11 48 6e 74 d3 11 57-f5 18 d3 86 71 7b 08 a1   5.Hnt..W....q{..
    0080 - 84 d8 be 07 9c d4 4f e7-cb d1 cf 38 55 1e d7 ce   ......O....8U...
    0090 - 09 4e 2f 13 dc 85 87 e4-d6 3d e6 79 58 39 94 dd   .N/......=.yX9..
    00a0 - 69 11 a7 22 b3 ea 79 6c-c3 09 8d 25 95 be 42 9e   i.."..yl...%..B.

    Start Time: 1704076290
    Timeout   : 7200 (sec)
    Verify return code: 10 (certificate has expired)
    Extended master secret: no
---
      ___
    -   ---___-                           ,
       (' ||      _                      ||         '
      ((  ||     < \, -_-_   _-_   _-_, =||= ,._-_ \\  _-_   _-_,
     ((   ||     /-|| || \\ || \\ ||_.   ||   ||   || || \\ ||_.
      (( //     (( || || || ||/    ~ ||  ||   ||   || ||/    ~ ||
        -____-   \/\\ ||-'  \\,/  ,-_-   \\,  \\,  \\ \\,/  ,-_-
                      |/
                      '     http://www.fur.com/tapestries/

@LiquidityC
Copy link
Member Author

I owe this ticket an update - I did more digging since my last comment. I suspect (but haven't proved) the RwStream abstraction may be involved and its a race on the Blightmud side. The RwStream was added when the initial native-tls support was added in #113 (I'd guess based on this Stack Overflow answer). As some comments there point out I think even with the Arc<Mutex<>> layers this code is unsound and a prime suspect for a weird race condition.

The Blightmud architecture wants to split the read/write sides of a TCP stream and use blocking I/O on each half in separate send/receive threads. TLS streams don't easily map to that abstraction since reads can cause writes (e.g. for alerts) and there's state to maintain that assumes (and enforces through the type system) sole access. I think that's why the original native-tls code was fighting being put into this arrangement, requiring the unsafe workaround that was persisted in the rustls update.

If this is the cause (again just a theory at this point) I think the right fix would be to reconstitute the read/write threads into one thread that uses non-blocking I/O with something like mio.

I would not be surprised if this is the cause of the problem we are seeing. IIRC this will be tricky to switch out.

@cpu
Copy link
Member

cpu commented Jan 2, 2024

If it helps, here's the same connection for the last server being negotiated over TLS 1.2, using the openssl s_client:

This server's TLS configuration is also pretty dire (It's still offering SSLv3 💀). The issue of importance w.r.t Blightmud is that tapestries.fur.com:6699 only supports RSA key exchange and that is intentionally not implemented in Rustls (and so by continuation, Blightmud). Since this server doesn't support any other key exchange mechanisms the connection fails. It's challenging to surface useful error messages in this domain but there might be improvements to make here.

I would not be surprised if this is the cause of the problem we are seeing. IIRC this will be tricky to switch out.

Indeed :-( I started looking into it over the holidays and have some ideas but they will be time intensive to realize. I'll see if I can find time to make a PR sometime in Jan. Confirming this is the cause of the issue before diving into that rabbit hole would be nice, but I figure even if it doesn't fix the bug it will eliminate some unsafe code, so maybe not necessary?

Copy link

github-actions bot commented Mar 3, 2024

Issue flagged as stale

@github-actions github-actions bot added the stale No activity label Mar 3, 2024
@LiquidityC LiquidityC removed the stale No activity label Mar 4, 2024
Copy link

github-actions bot commented May 3, 2024

Issue flagged as stale

@github-actions github-actions bot added the stale No activity label May 3, 2024
@LiquidityC LiquidityC removed the stale No activity label May 7, 2024
Copy link

github-actions bot commented Jul 6, 2024

Issue flagged as stale

@github-actions github-actions bot added the stale No activity label Jul 6, 2024
@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Jul 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working help wanted Extra attention is needed stale No activity
Projects
None yet
Development

No branches or pull requests

3 participants