Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

client/core: safe swap and comms tweaks #663

Merged
merged 3 commits into from
Sep 10, 2020

Conversation

chappjc
Copy link
Member

@chappjc chappjc commented Sep 3, 2020

Swap contracts are no longer permitted to be broadcast if the DEX connection is known to be down. This can happen when a computer or client process comes out of suspend and immediately tries to take their action, particularly the taker trying to bcast their swap. The net connection's read timeout must still put the link into a failed state quickly for this to work during a resume, but this connectivity check is also valuable with normal connectivity trouble, not just machine/process suspend.

Triggering of ticks with the tick timer is also modified to skip tick execution if a long delay since the previous timer fire is detected (computer resume) to allow time for DEX re-connection. This together with the swap check described above prevents needlessly broadcasting a swap contract for a match that is revoked unbeknownst to the client. The swap check is still necessary however since asset tip changes also trigger a tick.

client/comms.WsConn reconnect logic is tweaked to prevent back-to-back or duplicate queued reconnect attempts. This is especially necessary now since send errors in Send and the ping handler now also trigger reconnect. Previously read errors would only trigger reconnect. The (*wsConn).read loop now detects and logs timeouts immediately before all other error checks to hasten flagging the connection as down and to log the timeout as the cause in plain language.

^Z
[1]+  Stopped                 ./dexc --simnet --rpcaddr=localhost:6757 --webaddr=localhost:6758
$    fg 1
./dexc --simnet --rpcaddr=localhost:6757 --webaddr=localhost:6758
2020-09-03 17:21:00.365 [TRC] CORE: Since last tick: 2m31.813685043s
2020-09-03 17:21:00.365 [TRC] CORE: Skipping this tick.
2020-09-03 17:21:00.365 [ERR] WS: Websocket receive error from peer 127.0.0.1: read tcp 127.0.0.1:6758->127.0.0.1:37896: i/o timeout
2020-09-03 17:21:00.365 [TRC] WS: Websocket output handler done for peer 127.0.0.1
2020-09-03 17:21:00.365 [INF] WS: Sending 0 queued outgoing messages for 127.0.0.1.
2020-09-03 17:21:00.365 [ERR] COMMS: pong write error: write tcp 127.0.0.1:36142->127.0.0.1:7232: write: broken pipe
2020-09-03 17:21:00.365 [INF] COMMS: Attempting to reconnect to wss://127.0.0.1:7232/ws...
2020-09-03 17:21:00.365 [ERR] COMMS: read error (write tcp 127.0.0.1:36142->127.0.0.1:7232: write: broken pipe), attempting reconnection
2020-09-03 17:21:00.365 [DBG] COMMS: Already reconnecting.
2020-09-03 17:21:00.365 [DBG] WS: Sent 25 and dropped 0 messages to 127.0.0.1 before shutdown.
2020-09-03 17:21:00.365 [TRC] CORE: notify: |DATA| (epoch) - Index: 266528585
2020-09-03 17:21:00.366 [DBG] WS: Sending close 1000 (normal) message.
2020-09-03 17:21:00.366 [DBG] CORE[dcr]: tip change: 364 (6f2a3f15bd86fdc470ea4e852e76463124a824f44d77cfa7837e06b674cccea6) => 365 (1f72387413ae2459b27495a7a8f7ce2a41761e21a15b8a9dd878efcf59c49cba)
2020-09-03 17:21:00.367 [TRC] CORE: processing tip change for dcr
2020-09-03 17:21:00.367 [WRN] WEB[WS]: Failed to send notify notification to client 1 at 127.0.0.1: peer disconnected
2020-09-03 17:21:00.367 [TRC] WEB[WS]: Disconnected websocket client 127.0.0.1
2020-09-03 17:21:00.367 [TRC] CORE: notify: |DATA| (epoch) - Index: 266528586
2020-09-03 17:21:00.368 [TRC] CORE: notify: |DATA| (epoch) - Index: 266528587
2020-09-03 17:21:00.369 [WRN] CORE: notify: |WARNING| (conn) DEX disconnected - DEX at 127.0.0.1:7232 has disconnected
2020-09-03 17:21:00.371 [DBG] CORE[btc]: tip change: 318 (79999e936ec7e2ca59744fc2c1068f723bfb97794536e415f0652c11534c4a0f) => 321 (21792ad1199d2c94e2031195f4e4fcef92a2ab44a622923d8453df33ec48fa9c)
2020-09-03 17:21:00.371 [TRC] CORE: processing tip change for btc
2020-09-03 17:21:00.375 [DBG] CORE: Swappable match c3408198ead3a8f29d3206e8981ef5eaaaad348ad66e2371230e2e8f56430e98 for order a331c9490e965498164fda049ccc8008f0534e1723ed8659c4ee93cd38dc3899 (Taker)
2020-09-03 17:21:00.376 [ERR] CORE: notify: |ERROR| (order) Swap error - Error encountered sending a swap output(s) worth 20.00000000 dcr on order a331c949 - Order: a331c9490e965498164fda049ccc8008f0534e1723ed8659c4ee93cd38dc3899
2020-09-03 17:21:00.376 [ERR] CORE: 127.0.0.1:7232 tick error: 127.0.0.1:7232 tick: {swapMatches order a331c9490e965498164fda049ccc8008f0534e1723ed8659c4ee93cd38dc3899 - {not broadcasting swap while DEX 127.0.0.1:7232 connection is down (could be revoked)}}
2020-09-03 17:21:00.378 [DBG] CORE: Swappable match c3408198ead3a8f29d3206e8981ef5eaaaad348ad66e2371230e2e8f56430e98 for order a331c9490e965498164fda049ccc8008f0534e1723ed8659c4ee93cd38dc3899 (Taker)
2020-09-03 17:21:00.379 [ERR] CORE: notify: |ERROR| (order) Swap error - Error encountered sending a swap output(s) worth 20.00000000 dcr on order a331c949 - Order: a331c9490e965498164fda049ccc8008f0534e1723ed8659c4ee93cd38dc3899
2020-09-03 17:21:00.379 [ERR] CORE: 127.0.0.1:7232 tick error: 127.0.0.1:7232 tick: {swapMatches order a331c9490e965498164fda049ccc8008f0534e1723ed8659c4ee93cd38dc3899 - {not broadcasting swap while DEX 127.0.0.1:7232 connection is down (could be revoked)}}
2020-09-03 17:21:00.383 [TRC] CORE: notify: |DATA| (balance) balance updated
2020-09-03 17:21:00.384 [TRC] CORE: notify: |DATA| (balance) balance updated
2020-09-03 17:21:00.390 [TRC] CORE: notify: |DATA| (balance) balance updated
2020-09-03 17:21:00.477 [INF] COMMS: Successfully reconnected.
2020-09-03 17:21:00.478 [INF] CORE: notify: |SUCCESS| (conn) DEX connected - DEX at 127.0.0.1:7232 has connected
2020-09-03 17:21:00.494 [WRN] CORE: notify: |WARNING| (order) Match revoked - Match c3408198 has been revoked - Order: a331c9490e965498164fda049ccc8008f0534e1723ed8659c4ee93cd38dc3899
2020-09-03 17:21:00.494 [TRC] CORE: notify: |DATA| (order) revoke - Order: 
2020-09-03 17:21:00.494 [WRN] CORE: Match c3408198ead3a8f29d3206e8981ef5eaaaad348ad66e2371230e2e8f56430e98 revoked in status MakerSwapCast for order a331c9490e965498164fda049ccc8008f0534e1723ed8659c4ee93cd38dc3899
2020-09-03 17:21:00.495 [DBG] CORE: Authenticated connection to 127.0.0.1:7232, 0 active matches
2020-09-03 17:21:00.495 [TRC] CORE: Revoked match c3408198ead3a8f29d3206e8981ef5eaaaad348ad66e2371230e2e8f56430e98 (Taker) in status MakerSwapCast considered inactive.
2020-09-03 17:21:00.495 [WRN] CORE: DEX 127.0.0.1:7232 did not report active match c3408198ead3a8f29d3206e8981ef5eaaaad348ad66e2371230e2e8f56430e98 on order 6897f02e97c9dc919f829b6d96ab1049e37ad355e52a7767edcdbfd1e0356ca1 - assuming revoked.
2020-09-03 17:21:00.497 [ERR] CORE: notify: |ERROR| (order) Missing matches - 1 matches for order 6897f02e were not reported by "127.0.0.1:7232" and are considered revoked - Order: 6897f02e97c9dc919f829b6d96ab1049e37ad355e52a7767edcdbfd1e0356ca1

Comment on lines -363 to -374
// reconnect begins reconnection immediately.
func (conn *wsConn) reconnect() {
conn.setConnected(false)
conn.reconnectCh <- struct{}{}
}

// queueReconnect queues a reconnection attempt.
func (conn *wsConn) queueReconnect(wait time.Duration) {
conn.setConnected(false)
log.Infof("Attempting reconnect to %s in %d seconds.", conn.cfg.URL, wait/time.Second)
time.AfterFunc(wait, func() { conn.reconnectCh <- struct{}{} })
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using these from outside read() would be all kinds of problematic, so they are moved into read.

err = conn.ws.WriteMessage(websocket.TextMessage, b)
if err != nil {
log.Errorf("write error: %v", err)
log.Errorf("Send: WriteMessage error: %v", err)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had briefly added a conn.close() here to cause the read loop to error and trigger reconnect, but the write error itself seems to be enough to cause reads (e.g. ws.ReadJSON) to return with an error immediately.

client/core/core.go Show resolved Hide resolved
client/core: prevent swap bcast with stale dex conn

more careful reconnect, remove max reconnects
@chappjc chappjc merged commit 1282073 into decred:master Sep 10, 2020
@chappjc chappjc deleted the safe-tick-swap branch September 10, 2020 18:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants