Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

client/comms: check for non-standard compliant error #2130

Merged
merged 4 commits into from Feb 17, 2023

Conversation

ukane-philemon
Copy link
Contributor

@ukane-philemon ukane-philemon commented Feb 15, 2023

This seems to be a known issue on some versions of MacOS or OS X where an error: x509: “invalid-cert-name” certificate is not standards compliant is returned. Since we cannot check the error type using errors.Is or errors.As, this PR checks if the returned error contains the known error message certificate is not standards compliant.

Users could run varying versions of macOS and could be affected by this issue. This PR will ensure that a retry loop is not started because the cert is considered Invalid.

Issue: golang/go#51991

For more context, I use macOS and was a bit flustered that TestWsConn kept failing, so I had to investigate.

@chappjc
Copy link
Member

chappjc commented Feb 15, 2023

What precise go version?
golang/go#52010 (comment)

@ukane-philemon
Copy link
Contributor Author

ukane-philemon commented Feb 15, 2023

What precise go version? golang/go#52010 (comment)

Go version: 1.20.1
MacOS version: macOS Catalina 10.15.7

@ukane-philemon
Copy link
Contributor Author

ukane-philemon commented Feb 15, 2023

See also https://go-review.googlesource.com/c/go/+/460896/2/src/crypto/x509/root_darwin.go#61

case macOS.ErrSecNotTrusted does not seem to catch this error.

What do you think? Do you suggest we do something like this instead?

https://github.com/kubernetes/kubernetes/pull/109102/files#diff-4a1205c4eb7e8f2b8cf6feb57eba6df5472e88fc58302e7c89c01924231243a1R53

@chappjc
Copy link
Member

chappjc commented Feb 15, 2023

What do you think? Do you suggest we do something like this instead?

We should do a combination of things.:

  1. The three different substrings from that kubes PR
  2. The other x509 errors: CertificateInvalidError, HostnameError, and UnknownAuthorityError... probably more from https://pkg.go.dev/crypto/x509
  3. figure out what about the certificate that macOS doesn't like
  4. see if there's a reason the Mac certificate resolver is involved instead of the Go resolver

@chappjc
Copy link
Member

chappjc commented Feb 15, 2023

4. see if there's a reason the Mac certificate resolver is involved instead of the Go resolver

What I mean about this is, why are we even bothering with SystemCertPool in this:

rootCAs, _ := x509.SystemCertPool()
if rootCAs == nil {
rootCAs = x509.NewCertPool()
}
if ok := rootCAs.AppendCertsFromPEM(cfg.Cert); !ok {
return nil, ErrInvalidCert
}
tlsConfig = &tls.Config{
RootCAs: rootCAs,

I suspect all would work fine for you if we juts used rootCAs = x509.NewCertPool() always.

@ukane-philemon
Copy link
Contributor Author

see if there's a reason the Mac certificate resolver is involved instead of the Go resolver

I think this is the reason: https://github.com/golang/go/blob/master/src/crypto/x509/verify.go#L767-#L784

SystemCertPool returns the *CertPool type with systemPool set to true which will force two verifications("one using the roots provided by the caller, and one using the system platform verifier.") according to the docs. However, since we are not even providing a cert(in the test that failed for me), Go will always attempt to use platform verifiers.

While x509.NewCertPool() could force Go to use its verifier, we currently don't set the *tlsConfig values used by a connection if a cert is not provided. Do we want to change this? i.e set the tlsConfig values regardless of whether a cert was provided. Only the RootCAs value would be modified in the if block when/if a cert is provided.

@chappjc
Copy link
Member

chappjc commented Feb 16, 2023

we currently don't set the *tlsConfig values used by a connection if a cert is not provided. Do we want to change this? i.e set the tlsConfig values regardless of whether a cert was provided

// RootCAs defines the set of root certificate authorities
// that clients use when verifying server certificates.
// If RootCAs is nil, TLS uses the host's root CA set.
RootCAs *[x509](https://pkg.go.dev/crypto/x509).[CertPool](https://pkg.go.dev/crypto/x509#CertPool)

nil is correct to use the hosts root CA set if you specify no certificate, meaning perhaps it's not self-signed.

Anyway, can you link to the test that uses no cert and what it's expected to do?

@ukane-philemon
Copy link
Contributor Author

Anyway, can you link to the test that uses no cert and what it's expected to do?

We are currently unable to catch extra invalid cert errors but we depend on our ErrInvalidCert for some actions like stopping a conn retry loop.
https://github.com/decred/dcrdex/blob/master/client/comms/wsconn_test.go#L258

@chappjc
Copy link
Member

chappjc commented Feb 16, 2023

Right, I just wanted to see why we're giving a nil cert. In this case what "test no cert error" really means is see if the system cert store has it. So this test really depends on the system.

We still need to do the fixes in #2130 (comment).

@ukane-philemon
Copy link
Contributor Author

ukane-philemon commented Feb 16, 2023

Does HostnameError really fall within InvalidCert errors? I don't think so. But we would want to stop the retry loop for such errors too, right?

@chappjc
Copy link
Member

chappjc commented Feb 16, 2023

Does HostnameError really fall within InvalidCert errors? I don't think so.

Yup:

HostnameError results when the set of authorized names doesn't match the requested name.

Regardless, it's an error from x509 and retrying isn't going to help.

@ukane-philemon
Copy link
Contributor Author

ukane-philemon commented Feb 16, 2023

Okay.

EDIT:

We still need to do the fixes in #2130 (comment).

Done in a88f34b

client/comms/wsconn.go Outdated Show resolved Hide resolved
client/comms/wsconn.go Outdated Show resolved Hide resolved
@norwnd
Copy link
Contributor

norwnd commented Feb 16, 2023

On a related note, I've just observed this issue happening for the first time today on my Mac too:

2023-02-16 09:33:41.488 [INF] CORE: Connecting wallet for eth
2023-02-16 09:33:42.606 [DBG] CORE[eth][ETH][RPC]: couldn't get a websocket connection for "wss://goerli.infura.io/ws/v3/e068872d3aa044aebbcf96f8e31104cc" (original scheme: "https") (OK)
2023-02-16 09:33:43.784 [ERR] CORE[eth][ETH][RPC]: Failed to get chain ID from "https://goerli.infura.io/v3/e068872d3aa044aebbcf96f8e31104cc": Post "https://goerli.infura.io/v3/e068872d3aa044aebbcf96f8e31104cc": x509: “*.infura.io” certificate is not standards compliant
2023-02-16 09:33:43.784 [ERR] WEB: error placing order: eth connectAndUnlock error: connectWallet: failed to connect eth wallet: connect failure: failed to connect

It seems to have disappeared after restarting Mac, but not right after restart (I tried, it was still there) - more like 1h after. Looks more like Mac-specific (or maybe golang-specific) issue anyway, nothing to do here, just leaving it for anyone hitting in the future.

I'm running dexc with go version go1.19.2 darwin/amd64 on Monterey 12.6 (21G115).

@chappjc
Copy link
Member

chappjc commented Feb 16, 2023

On a related note, I've just observed this issue happening for the first time today on my Mac too:

2023-02-16 09:33:41.488 [INF] CORE: Connecting wallet for eth
2023-02-16 09:33:42.606 [DBG] CORE[eth][ETH][RPC]: couldn't get a websocket connection for "wss://goerli.infura.io/ws/v3/e068872d3aa044aebbcf96f8e31104cc" (original scheme: "https") (OK)
2023-02-16 09:33:43.784 [ERR] CORE[eth][ETH][RPC]: Failed to get chain ID from "https://goerli.infura.io/v3/e068872d3aa044aebbcf96f8e31104cc": Post "https://goerli.infura.io/v3/e068872d3aa044aebbcf96f8e31104cc": x509: “*.infura.io” certificate is not standards compliant
2023-02-16 09:33:43.784 [ERR] WEB: error placing order: eth connectAndUnlock error: connectWallet: failed to connect eth wallet: connect failure: failed to connect

It seems to have disappeared after restarting Mac, but not right after restart (I tried, it was still there) - more like 1h after. Looks more like Mac-specific (or maybe golang-specific) issue anyway, nothing to do here, just leaving it for anyone hitting in the future.

I'm running dexc with go version go1.19.2 darwin/amd64 on Monterey 12.6 (21G115).

Wow, so it affects host names with certificates signed by trusted root CAs. That's messed up. Literally nothing we can do about that, esp. within the eth rpcclient. (This PR pertains the client/comms, which is for client<->server comms of course)

I suggest updating your Go version though.

client/comms/wsconn.go Outdated Show resolved Hide resolved
@chappjc
Copy link
Member

chappjc commented Feb 16, 2023

Will you also please try this on your end?

diff --git a/client/comms/wsconn.go b/client/comms/wsconn.go
index fb1a8123e..b118873a0 100644
--- a/client/comms/wsconn.go
+++ b/client/comms/wsconn.go
@@ -174,11 +174,7 @@ func NewWsConn(cfg *WsCfg) (WsConn, error) {
                        return nil, fmt.Errorf("error parsing URL: %w", err)
                }
 
-               rootCAs, _ := x509.SystemCertPool()
-               if rootCAs == nil {
-                       rootCAs = x509.NewCertPool()
-               }
-
+               rootCAs := x509.NewCertPool()
                if ok := rootCAs.AppendCertsFromPEM(cfg.Cert); !ok {
                        return nil, ErrInvalidCert
                }

If the user is specifying a certificate to use, I don't see a point in appending it to the SystemCertPool.
I'm hesitant to actually make that change though, but I am curious if it changes anything for you with Mac

@ukane-philemon
Copy link
Contributor Author

ukane-philemon commented Feb 16, 2023

If the user is specifying a certificate to use, I don't see a point in appending it to the SystemCertPool.
I'm hesitant to actually make that change though, but I am curious if it changes anything for you with Mac.

I will check. But I don't think it would make any difference. I haven't had any issues with the system verifier when a cert is provided.

@chappjc
Copy link
Member

chappjc commented Feb 16, 2023

I will check. But I don't think it would make any difference. I haven't have any issues with the system verifier when a cert is provided.

I wouldn't expect it to affect the nil cert test case.

@ukane-philemon
Copy link
Contributor Author

but I am curious if it changes anything for you with Mac

Well, since it does not change anything for mac(the invalid cert errors have been handled in this PR already). I think it's best we continue to use SystemCertPool as default and x509.NewCertPool() as a fallback as we've been doing.

Unless there's a reason a particular reason we shouldn't.

@chappjc chappjc added this to the 0.6 milestone Feb 17, 2023
@chappjc
Copy link
Member

chappjc commented Feb 17, 2023

Labeling for 0.6 since it's a show stopper for a subset of mac users.

@chappjc chappjc merged commit f9531e8 into decred:master Feb 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants