Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

client: support server TLS certificate change #1602

Merged
merged 4 commits into from May 21, 2022

Conversation

martonp
Copy link
Contributor

@martonp martonp commented Apr 30, 2022

  • client/comms: Instead of just keeping track of the connection
    being conneccted or not, a ConnectionStatus type is introduced
    which includes InvalidCert in addition to Connected and
    Disconnected. This is then passed to core and the UI through
    the ConnEventNote.

  • client/core: A new function UpdateCert is added which uses
    TLS certificate supplied by the user to connect to a server. If
    the connection is successful, the new certificate is stored in
    the db.

  • client/webserver: A new api request /api/updatecert is added
    to allow the user to update the TLS Certificate for a host.

  • ui: The settings page is updated. Instead of all the options for
    each server being on the settings page, there is just a link to a
    dex settings page specific to a server. This contains all the options
    that previously existed on the settings page in addition to the new
    option to update the TLS Certificate.

Closes #1571

Copy link
Member

@buck54321 buck54321 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably downsize the connection status and icon a bit

After the certificate is accepted, there's really no indication that things went correctly. I wonder if we could just display a little message or checkmark next to the button or something.

client/core/account.go Outdated Show resolved Hide resolved
client/core/account.go Outdated Show resolved Hide resolved
client/core/core.go Outdated Show resolved Hide resolved
client/webserver/http.go Outdated Show resolved Hide resolved
client/webserver/site/src/font/icomoon.svg Outdated Show resolved Hide resolved
client/webserver/site/src/js/app.ts Outdated Show resolved Hide resolved
client/comms/wsconn.go Show resolved Hide resolved
Copy link
Member

@JoeGruffins JoeGruffins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Working well, but, if you change the cert while dexc is offline, it will only say Disconnected after you start up dexc and navigate to the settings screen, even if the cert is the problem.

client/comms/wsconn.go Outdated Show resolved Hide resolved
@martonp
Copy link
Contributor Author

martonp commented May 14, 2022

While fixing the issue of the dex settings screen showing Disconnected if the cert is updated while the client is offline, I also noticed that the client will not do keep alive if the initial connection failed. If the initial connection fails, the ConnectionMaster will cancel the context, causing the keep alive loop to end. This is a bug isn't it?

@martonp martonp force-pushed the tlsReconfig branch 2 times, most recently from 2053473 to 3d7ca20 Compare May 14, 2022 15:45
@chappjc
Copy link
Member

chappjc commented May 14, 2022

While fixing the issue of the dex settings screen showing Disconnected if the cert is updated while the client is offline, I also noticed that the client will not do keep alive if the initial connection failed. If the initial connection fails, the ConnectionMaster will cancel the context, causing the keep alive loop to end. This is a bug isn't it?

Haven't tried this pr yet, but the reconnect loop starts if the server is unreachable on dexc startup. c782ffb

I would expect it to do the same on server reconfigure, but I have to look into that

@martonp
Copy link
Contributor Author

martonp commented May 14, 2022

The loop starts, but it is stopped immediately because in ConnectionMaster.Connect, if c.connector.Connect returns an error, then the context is cancelled.

@chappjc
Copy link
Member

chappjc commented May 14, 2022

OK, I'll look at it. On master and 0.4, the reconnect loops until connection succeeds. I can start dexc with network down and it starts connection attempts, backing off, and when network comes up it connects fine.

@martonp
Copy link
Contributor Author

martonp commented May 15, 2022

In this commit here: f222e12

Do you not see the "no longer keeping alive" message right after it fails to connect? For me, using master, it does not reconnect to the dex if it fails to connect at startup.

@chappjc
Copy link
Member

chappjc commented May 15, 2022

Oh WTF. It's broken on master.
I'll hunt down what broke it

@chappjc
Copy link
Member

chappjc commented May 15, 2022

Oh, it was this: e8fd8ac

I'll try to figure out what needs to be done there.

@chappjc
Copy link
Member

chappjc commented May 15, 2022

Let's consider this: #1616
See if you can get this PR working with that.

@martonp
Copy link
Contributor Author

martonp commented May 17, 2022

It works well after rebasing. I've gotten rid of the last unnecessary commit.

Copy link
Member

@JoeGruffins JoeGruffins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we don't need to do the reconnect loop if we failed for the certificate. We would expect it to continue failing until the certificate is changed. I don't know if this is a big change though. Working well.

client/core/core.go Outdated Show resolved Hide resolved
client/core/core.go Show resolved Hide resolved
client/core/core.go Outdated Show resolved Hide resolved
client/core/core.go Outdated Show resolved Hide resolved
client/core/core.go Outdated Show resolved Hide resolved
client/comms/wsconn.go Outdated Show resolved Hide resolved
@@ -416,7 +427,7 @@ func (conn *wsConn) Connect(ctx context.Context) (*sync.WaitGroup, error) {
go func() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To not start the reconnect loop when the initial connection attempt fails on account of a bad or missing cert, maybe the following:

func (conn *wsConn) Connect(ctx context.Context) (*sync.WaitGroup, error) {
	var ctxInternal context.Context
	ctxInternal, conn.cancel = context.WithCancel(ctx)

	err := conn.connect(ctxInternal)
	if err != nil {
		// If the certificate is invalid or missing, do not start the reconnect
		// loop, and return an error with no WaitGroup.
		if errors.Is(err, ErrInvalidCert) || errors.Is(err, ErrCertRequired) {
			conn.cancel()
			conn.wg.Wait() // probably a no-op
			close(conn.readCh)
			return nil, err
		}

		// The read loop would normally trigger keepAlive, but it wasn't started
		// on account of a connect error.
		conn.log.Errorf("Initial connection failed, starting reconnect loop: %v", err)
		time.AfterFunc(5*time.Second, func() {
			conn.reconnectCh <- struct{}{}
		})
	}

	conn.wg.Add(2)
	go func() {
		defer conn.wg.Done()
		conn.keepAlive(ctxInternal)
	}()
	go func() {
		defer conn.wg.Done()
		<-ctxInternal.Done()
		conn.setConnectionStatus(Disconnected)
		conn.wsMtx.Lock()
		if conn.ws != nil {
			conn.log.Debug("Sending close 1000 (normal) message.")
			conn.close()
		}
		conn.wsMtx.Unlock()

		close(conn.readCh) // signal to MessageSource receivers that the wsConn is dead
	}()

	return &conn.wg, err
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In feeling out the above change, I think we have some unfortunate patterns and conventions we're fighting with.

Back in (*Core).connectDEX, consider adding this comment at this location:

@@ -5630,6 +5631,14 @@ func (c *Core) connectDEX(acctInfo *db.AccountInfo, temporary ...bool) (*dexConn
        conn, err := c.wsConstructor(&wsCfg)
        if err != nil {
                return nil, err
        }

        dc.WsConn = conn
        dc.connMaster = dex.NewConnectionMaster(conn)
 
+       // At this point, we have a valid dexConnection object whether or not we can
+       // actually connect. In any return below, we return the dexConnection so it
+       // may be tracked in the c.conns map (and listed as a known DEX). TODO:
+       // split the above code into a dexConnection constructor, and the below into
+       // a startDexConnection function so we don't have the anti-pattern of
+       // returning a non-nil object with a non-nil error and requiring the caller
+       // to check both!
+
        // Start listening for messages. The listener stops when core shuts down or
        // the dexConnection's ConnectionMaster is shut down. This goroutine should
        // be started as long as the reconnect loop is running. It only returns when
        // the wsConn is stopped.
        if listen {
                c.wg.Add(1)
                go c.listen(dc)
        }

        err = dc.connMaster.Connect(c.ctx)

I don't think we should act on that TODO now, but I wanted to call out this design issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And related, I believe there's more trouble with ConnectionMaster:

diff --git a/dex/runner.go b/dex/runner.go
index e6ae29e7..4858f7e5 100644
--- a/dex/runner.go
+++ b/dex/runner.go
@@ -110,10 +110,14 @@ func (c *ConnectionMaster) Connect(ctx context.Context) (err error) {
        c.init(ctx)
        c.mtx.Lock()
        c.wg, err = c.connector.Connect(c.ctx)
+       if c.wg == nil { // don't expect a waitgroup if Connect errored
+               c.wg = new(sync.WaitGroup) // don't let Disconnect or Wait panic
+               c.cancel()                 // we're not "On"
+       }
        c.mtx.Unlock()
        // NOTE: Even if err is non-nil, we can't cancel the internal context
-       // because the connector may be attempting to reconnect. The caller should
-       // decide to let it go or to call Disconnect.
+       // because the connector may be attempting to reconnect (if wg is non-nil).
+       // The caller should decide to let it go or to call Disconnect.
        return err
 }

Because the Disconnect and Wait methods would panic if the Connector.Connect call returned nil for the wg (like the asset.Wallet implementations do!). I will get back to #1474 soon because removing this wg pointer field was one thing I thought should be done.

And as an aside, in retrospect, I don't think we should have used ConnectionMaster so broadly given it's quirks and tendency to hide the actual types and their statuses, but here we are.

In general, we should resist patterns that return non-nil objects when error is non-nil. We're stuck with it for now however.

Copy link
Member

@chappjc chappjc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more fix. Otherwise working well:

image

image


accountInfo.Cert = cert

_, connected := c.connectAccount(accountInfo)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's a dexConnection already in the map, it will be orphaned. You can observe this if a failing reconnect loop is going for the old connection as it never stops even though the conn was replaced with a working one with a new TLS cert.

I believe connectAccount needs a check that stops any existing connection (prior to connecting the new one otherwise the frontend gets the notifications in the wrong order and it looks disconnected until you refresh the page):

diff --git a/client/core/core.go b/client/core/core.go
index 48b1002f..4dfcf333 100644
--- a/client/core/core.go
+++ b/client/core/core.go
@@ -4824,7 +4824,8 @@ func (c *Core) initialize() {
 // connectAccount makes a connection to the DEX for the given account. If a
 // non-nil dexConnection is returned, it was inserted into the conns map even if
 // the initial connection attempt failed (connected == false), and the connect
-// retry / keepalive loop is active.
+// retry / keepalive loop is active. If there was already a dexConnection, it is
+// first stopped.
 func (c *Core) connectAccount(acct *db.AccountInfo) (dc *dexConnection, connected bool) {
        if !acct.Paid && len(acct.FeeCoin) == 0 {
                // Register should have set this when creating the account that was
@@ -4839,6 +4840,14 @@ func (c *Core) connectAccount(acct *db.AccountInfo) (dc *dexConnection, connecte
                c.log.Errorf("skipping loading of %s due to address parse error: %v", host, err)
                return
        }
+
+       c.connMtx.RLock()
+       if dc := c.conns[host]; dc != nil {
+               dc.connMaster.Disconnect()
+               dc.acct.lock()
+       } // leave it in the map so it remains listed if connectDEX fails
+       c.connMtx.RUnlock()
+
        dc, err = c.connectDEX(acct)
        if dc == nil {
                c.log.Errorf("Cannot connect to DEX %s: %v", host, err)

Copy link
Member

@chappjc chappjc May 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should close the book feeds too, but haven't tested with that:

	c.connMtx.RLock()
	if dc := c.conns[host]; dc != nil {
		dc.connMaster.Disconnect()
		dc.acct.lock()
		dc.booksMtx.Lock()
		for m, b := range dc.books {
			b.closeFeeds()
			if b.closeTimer != nil {
				b.closeTimer.Stop()
			}
			delete(dc.books, m)
		}
		dc.booksMtx.Unlock()
	} // leave it in the map so it remains listed if connectDEX fails
	c.connMtx.RUnlock()

Copy link
Member

@chappjc chappjc May 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With rebase and the above diff: https://github.com/chappjc/dcrdex/commits/tlsReconfig-chappjc (untested!)

As an aside, this enUS map creates a lot of churn. In rebasing, I just used a shorter, but equally good, key to spare modifying a ton of code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, the changes work well.

client/core/core.go Outdated Show resolved Hide resolved
client/webserver/site/src/font/icomoon.svg Outdated Show resolved Hide resolved
client/webserver/site/src/js/app.ts Outdated Show resolved Hide resolved
client/webserver/site/src/html/dexsettings.tmpl Outdated Show resolved Hide resolved
- `client/comms`: Instead of just keeping track of the connection
  being conneccted or not, a `ConnectionStatus` type is introduced
  which includes `InvalidCert` in addition to `Connected` and
  `Disconnected`. This is then passed to core and the UI through
  the `ConnEventNote`.

- `client/core`: A new function `UpdateCert` is added which uses
  TLS certificate supplied by the user to connect to a server. If
  the connection is successful, the new certificate is stored in
  the db.

- `client/webserver`: A new api request `/api/updatecert` is added
  to allow the user to update the TLS Certificate for a host.

- `ui`: The settings page is updated. Instead of all the options for
  each server being on the settings page, there is just a link to a
  dex settings page specific to a server. This contains all the options
  that previously existed on the settings page in addition to the new
  option to update the TLS Certificate.
@martonp martonp force-pushed the tlsReconfig branch 2 times, most recently from 5213df4 to 98f2b25 Compare May 21, 2022 09:21
AcctID string `json:"acctID"`
Markets map[string]*Market `json:"markets"`
Assets map[uint32]*dex.Asset `json:"assets"`
ConnectionStatus comms.ConnectionStatus `json:"connectionStatus"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok because we are still at v0.x but note that when we reach v1, we're making a promise to Go consumers not to break the API. So doing something like this in the future would cause the module to become core/v2.

However, if we left Connected bool, it would not be a breaking change even if it appears redundant with ConnectionStatus.

Just pointing it out so we can get used to stabilizing the API.

@chappjc
Copy link
Member

chappjc commented May 21, 2022

All still looking good to me with the latest update. LOL re the Exchange method. That one went right over my head.

@chappjc chappjc merged commit 30e29d7 into decred:master May 21, 2022
@chappjc chappjc added this to the 0.5 milestone May 25, 2022
@martonp martonp deleted the tlsReconfig branch January 20, 2024 13:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

client: support server TLS certificate change
4 participants