server/swap: inaction check fixes #680

chappjc · 2020-09-09T23:57:42Z

The contains a number if fixes to the server's inaction checks:

Fix inaction asset ID checks skipping checks because of incorrect swapAsset.
Set swapConfirmed time from the block time, not time.Now, which was causing inaction check to pass by fractions of a second.
On startup, schedule an inaction check for each asset.
Separate block based and "event"-based checks. NewlyMatched->MakerSwapCast timing starts at the time the match requests are sent (event-based), MakerSwapCast->TakerSwapCast starts at the time the maker's swap reaches swapConf (block-based), TakerSwapCast->MakerRedeemed starts at the time the taker's swap reaches swapConf (block-based), and MakerRedeemed->MatchComplete starts at the time the maker broadcasts their redeem, notifies server, and server informs taker (event based).

Other changes:

Triggering of inaction checks is refactored, usings AfterFuncs.
Default broadcast timeout is not 5 minutes, up from 1 minute, but the dcrdex harness still uses 1 min via the config setting.
Swapper's txn waiter expiration is now 2 min, up from 1 min.
On Swapper construction, the tx waiter expiration is checked against the configured bcast timeout so that it is never longer than bcast timeout.
Swapper's coin recheck interval is slightly smaller (5 sec -> 3 sec)

chappjc · 2020-09-09T23:58:51Z

server/swap/swap.go

@@ -1002,7 +1017,7 @@ func (s *Swapper) tryConfirmSwap(status *swapStatus) {
 	if confs >= int64(swapConf) {
 		log.Debugf("Swap %v (%s) has reached %d confirmations (%d required)",
 			status.swap, dex.BipIDSymbol(status.swapAsset), confs, swapConf)
-		status.swapConfirmed = time.Now().UTC()
+		status.swapConfirmed = confTime.UTC()


Because the block time is used to set inaction check timer, and this time.Now was just a hair later than block time, the inaction check could just barely pass. With extra logging, I saw inaction checks pass with "59.998358778s ago".

chappjc · 2020-09-09T23:59:35Z

server/swap/swap.go

-			if match.makerStatus.swapAsset != assetID {
-				return
-			}


The first action timeout check is not relative to a block time.

chappjc · 2020-09-10T00:00:45Z

server/swap/swap.go

-			if match.takerStatus.swapAsset != assetID {
-				return
-			}


These swapAsset checks were not necessary, but in this MakerSwapCast case, it should have been checking makerStatus.swapAsset not takerStatus.swapAsset. Part of why we were seeing very long delays in penalization, even indefinite delays in this case.

chappjc · 2020-09-10T00:01:51Z

server/swap/swap.go

-			// If the maker has sent their swap tx, check the taker's broadcast
-			// timeout against the time of the swap's SwapConf'th confirmation.
-			if match.takerStatus.swapTime.IsZero() &&
-				!match.makerStatus.swapConfirmed.IsZero() &&


Since the status is MakerSwapCast, takerStatus.swapTime cannot be set. When it gets set, it would be TakerSwapCast. We only need to check makerStatus.swapConfirmed.

chappjc · 2020-09-10T00:02:30Z

server/swap/swap.go

-			// If the taker has sent their swap tx, check the maker's broadcast
-			// timeout (for redemption) against the time of the swap's SwapConf'th
-			// confirmation.
-			if match.makerStatus.redeemTime.IsZero() &&


Similar to the previous case, if makerStatus.redeemTime were set, it wouldn't be TakerSwapCast.

chappjc · 2020-09-10T00:03:52Z

server/swap/swap.go

 			// If the maker has redeemed, the taker can redeem immediately, so
 			// check the timeout against the time the Swapper received the
 			// maker's `redeem` request (and sent the taker's 'redemption').
-			if match.takerStatus.redeemTime.IsZero() &&
-				!match.makerStatus.redeemTime.IsZero() &&


This goes further and eliminates the !match.makerStatus.redeemTime.IsZero() because this is a bcast time that is set when it becomes MakerRedeem when the redeem is at least seen, unlike the swap cast states where swapConfirmed is only set after SwapConf

use Timers and AfterFunc for inaction checks more precise latencyQ expirations dcrdex: 5 min default bcast timeout fixing up broadcast timeout tests

itswisdomagain

I recognize this is still a draft, I'm eager to have this in, so took an early look. Haven't gone all through but one recurrent observation is inaction checks not accounting for the time it may take for an actor to receive and maybe validate the previous match step e.g. after server sends maker's swap, it might take some time for taker to receive the audit request and maybe some more time to find and audit the swap. Server expecting taker to act bTimeout after maker's swap gets the required confs may be unrealistic.

client/core/trade_simnet_test.go

dex/testing/dcr/harness.sh

server/swap/swap.go

itswisdomagain · 2020-09-15T11:21:17Z

server/swap/swap.go

+		case order.NewlyMatched:
+			// Maker has not broadcast their swap. They have until match time
+			// plus bTimeout.
+			if tooOld(match.time) {


match.time is set to current time in readMatches:

dcrdex/server/swap/swap.go

Line 2528 in 9e720a7

matches := readMatches(matchSets, feeRates)

but the match requests are sent a little later

dcrdex/server/swap/swap.go

Lines 2651 to 2654 in 9e720a7

// Send the ack request.

s.authMgr.RequestWhenConnected(u, req, func(_ comms.Link, resp *msgjson.Message) {

s.processMatchAcks(u, resp, m)

}, auth.DefaultRequestTimeout, auth.DefaultConnectTimeout, expireFunc)

Also, maker may not get the request until auth.DefaultRequestTimeout + auth.DefaultConnectTimeout later.

The "sent a little later" bit is about as brief as it gets, but as for the request/connect timeouts, that's intentionally deducted from the inaction timeout. If they aren't connected, that's should be dealt with by the inaction check. Users should not be able to draw out the swap intentionally by using some exception added for the request timeouts.

I also meant to comment that DefaultRequestTimeout is perhaps poorly named since it's a response timeout that starts when the request is successfully sent. But I admit there is a peculiar interaction between the response and connect timeouts (a user who is not connected can delay their response by just not being connected when the request is initially generated and queued for send).

Anyway, inaction doesn't kick in until bTimeout after the match request is sent (or queued for send), and I think that's the right way to do this. If the user is not online and that eats into their action time, that is to be held against them. In light of this, I don't have a strong objection to having the response timeout being the time until inaction/penalization, except that the RequestWhenConnected syntax doesn't permit an absolute response deadline, which would make this straightforward. I can imagine a different request function that replaces the response timeout and connect timeout with a single deadline, but I don't want to explore that in this PR.

itswisdomagain · 2020-09-15T11:26:45Z

server/swap/swap.go

+			// If the maker has redeemed, the taker can redeem immediately, so
+			// check the timeout against the time the Swapper received the
+			// maker's `redeem` request (and sent the taker's 'redemption').
+			match.makerStatus.mtx.RLock()
+			defer match.makerStatus.mtx.RUnlock()
+			if tooOld(match.makerStatus.redeemTime) {


Server sends the redemption to taker a brief moment after setting the redeemTime; and taker may not receive the server-sent redemption until auth.DefaultRequestTimeout + auth.DefaultConnectTimeout later. It'd be basically pointless resending the request if taker reconnects after s.bTimeout because the match would have been revoked by then.

Same comment regarding the brief moment. I don't think we need to account for a few milliseconds.

I do agree it is pointless to send the request after bTimeout, so ideally the connect timeout would be capped at bTimeout.

chappjc · 2020-09-15T13:52:39Z

Thanks for having a look @itswisdomagain. I'm gonna leave this PR as is for a bit while I work on other things, so it's good to have eyes on it now.

it might take some time for taker to receive the audit request and maybe some more time to find and audit the swap. Server expecting taker to act bTimeout after maker's swap gets the required confs may be unrealistic.

I think this is all built into the bTimeout setting. There has to be a limit to how long an actor is reasonably allowed to do these things. Admittedly, some steps like the ones involving an audit require more time than others. But with the audit, the server has already found the txn, so the actor should be able to find it too. The bTimeout defines how long they have to take these actions, including the audit. Further, recall that the audit happens before the swap txn reaches swapConf, so there's essentially no time pressure.

I have considered a separate timeout for the audit. In fact, at one point I toyed with inserting Penalize on the audit request expiry, which would allow the txn bcast to be a completely separate timeout. But I'm not convinced that's really necessary.

server/swap/swap.go

server/swap/swap_test.go

chappjc · 2020-09-17T21:00:10Z

server/swap/swap.go

-		mt.mtx.Lock()
-		defer mt.mtx.Unlock()


If there were two concurrent match acks (e.g. different users) targeting the same match IDs, this could deadlock in the loop. Really a poor idea in retrospect.

chappjc · 2020-09-17T21:01:09Z

server/swap/swap.go

 		}
-		err = storFn(mid, ack.Sig)
+		err = storFn(mid, ackSig)


The DB operation now comes after all the ack sigs are stored in memory so they can all be registered asap as they are needed to handle the maker's 'init', which can come at any moment.

Co-authored-by: Wisdom Arerosuoghene <wisdom.arerosuoghene@gmail.com>

chappjc · 2020-09-18T15:16:12Z

server/swap/swap.go

 		case order.TakerSwapCast:
 			if match.takerStatus.swapAsset != block.assetID {
 				break statusSwitch
 			}
 			// If the taker has broadcast their transaction, the maker's broadcast
 			// timeout (for redemption) starts once the maker's swap has SwapConf
 			// confs.
-			s.tryConfirmSwap(match.takerStatus)


BTW, it occurred to me that I don't see much point in the RedeemStatus stuff below in the MatchComplete case, where we check the confirmation count on the redeem transactions. Planning to investigate changes w.r.t. this as there's not much reason to wait for redeem confirms to stop tracking the match. This is closely related to the issue of the maker's ack of the taker's redeem, so I'm going to consider that at the same time and how these things interact w.r.t. timing (e.g. I think we've seen matches removed before the maker's final pointless redeem ack -- on both client and server!).

chappjc · 2020-09-18T15:21:54Z

server/swap/swap.go

+	// The counterparty does not need to actually locate the redemption txn, but
+	// they do need to author and broadcast their own redeem, so allow the
+	// request to wait for a response up until the inaction penalty.
 	s.authMgr.RequestWhenConnected(ack.user, notification, func(_ comms.Link, resp *msgjson.Message) {
 		s.processAck(resp, ack) // resp.ID == notification.ID
-	}, auth.DefaultRequestTimeout, auth.DefaultConnectTimeout, expireFunc)
+	}, s.bTimeout, auth.DefaultConnectTimeout, expireFunc)
+	// maybe: s.bTimeout-time.Since(redeemTime) or time.Until(redeemTime.Add(s.bTimeout))


This pertains to the issue resolved by the final commit, where we saw a maker with simultaneous 20+ matches that were processed in series instead of concurrently. The timeout on the response was too short. Even though the client will be doing them concurrently and faster with 4a22c9d in PR #688, I'm not sure I see a reason to timeout on the response before inaction time.

However, a different RequestWhenConnected with a single deadline instead of two timeouts for response and connect would be more appropriate and less confusing (presently the response timeout only starts after the request is sent, which could be after a good deal of time waiting for connect, but is only accurate as a deadline if the user is already connected at the time this request is made right here).

chappjc · 2020-09-18T15:26:01Z

server/swap/swap.go

+	// Do not search for the transaction past the inaction deadline. For maker,
+	// this is bTimeout after match request. For taker, this is bTimeout after
+	// maker's swap reached swapConfs.
+	lastEvent := stepInfo.match.time // NewlyMatched - the match request time, not matchTime
+	if stepInfo.step == order.MakerSwapCast {
+		lastEvent = stepInfo.match.makerStatus.swapConfTime()
+	}
+	expireTime := time.Now().Add(txWaitExpiration).UTC()
+	if lastEvent.IsZero() {
+		log.Warnf("Prematurely received 'init' from %v at step %v", makerTaker(stepInfo.actor.isMaker), stepInfo.step)
+	} else if deadline := lastEvent.Add(s.bTimeout); expireTime.After(deadline) {
+		expireTime = deadline
+	}
+	log.Debugf("Allowing until %v (%v) to locate contract from %v (%v), match %v",
+		expireTime, time.Until(expireTime), makerTaker(stepInfo.actor.isMaker),
+		stepInfo.step, matchID)
+


This is a fair bit of complexity for little gain in appropriateness of the latencyQ waiter timeout. But it is more appropriate than always going with txWaitExpiration from now, which might be after the inaction time.

buck54321

Looks good on a first pass. Since this is critical code, I want to look again with fresh eyes later. Anything to add?

server/swap/swap.go

chappjc · 2020-09-18T21:08:29Z

No further additions planned for this PR, although I have my eye on the seemingly unnecessary redeem confirm checks and the related issue of the final redeem ack from the maker.

buck54321

Looks good to me.

itswisdomagain

Looks good. Just curious about some stuffs...

itswisdomagain · 2020-09-20T07:56:21Z

client/core/trade_simnet_test.go

-				assetID, nBlocks := assetToMine.ID, uint16(assetToMine.SwapConf)+1 // mine 1 extra to ensure tx gets req. confs
+				assetID, nBlocks := assetToMine.ID, uint16(assetToMine.SwapConf)


Out of curiosity... does mining an additional block (i.e. 2 blocks in quick succession) cause issues for the swap inaction checker?

No, but the extra blocks were added in (pretty sure) because we were grappling with issues getting txns mined on simnet, then we figured out we needed the regentemplate RPC in mine-alpha, which seems to have entirely resolved the mining issue. Do you think we still need an extra block in any of these cases? The appropriate tests would only mine the blocks that reach inaction (just SwapConf).

itswisdomagain · 2020-09-20T08:03:41Z

server/matcher/match.go

@@ -184,7 +184,7 @@ func (m *Matcher) Match(book Booker, queue []*OrderRevealed) (seed []byte, match
 			removed, ok := book.Remove(o.TargetOrderID)
 			if !ok {
 				// The targeted order might be down queue or non-existent.
-				log.Debugf("Failed to remove order %v set by a cancel order %v",
+				log.Debugf("Order %v not removed by a cancel order %v (target either non-existent or down queue in this epoch)",


What does down queue in this context mean? Could the target order be booked at this point but not matched to this cancel order?

Down queue means after shuffle of an epoch queue that contains both a trade and a cancel order targeting that trade, the target order would get processed after the cancel order, so the cancel order would not be able to cancel it since it hasn't been booked yet.

server/swap/swap.go

chappjc commented Sep 10, 2020

View reviewed changes

chappjc force-pushed the inaction branch 2 times, most recently from 340c632 to a659e5f Compare September 13, 2020 02:16

server/swap: inaction check fixes

7589939

use Timers and AfterFunc for inaction checks more precise latencyQ expirations dcrdex: 5 min default bcast timeout fixing up broadcast timeout tests

chappjc force-pushed the inaction branch from a659e5f to 7589939 Compare September 14, 2020 19:54

chappjc added 2 commits September 14, 2020 20:50

harness tweaks and optimizations

0aebdcc

simplify inaction tickers/timers/triggers

bf91bf5

itswisdomagain suggested changes Sep 15, 2020

View reviewed changes

chappjc marked this pull request as ready for review September 15, 2020 13:58

chappjc added 2 commits September 16, 2020 16:59

TESTING logs and timeout tweaks

831ec23

that was a recipie for deadlock

a2897ce

This comment has been minimized.

Sign in to view

JoeGruffins reviewed Sep 17, 2020

View reviewed changes

chappjc added 2 commits September 17, 2020 15:52

fix data race on swapStatus time accesses

a7611e2

TESTS no longer sleep on new block, recv signal instead

0dd6524

chappjc commented Sep 17, 2020

View reviewed changes

chappjc and others added 2 commits September 17, 2020 21:59

dcr harness: use sendmany to fund wallets

06fe8a1

Co-authored-by: Wisdom Arerosuoghene <wisdom.arerosuoghene@gmail.com>

review items

5420c42

chappjc commented Sep 18, 2020

View reviewed changes

buck54321 reviewed Sep 18, 2020

View reviewed changes

server/swap/swap.go Outdated Show resolved Hide resolved

server/swap/swap.go Outdated Show resolved Hide resolved

review

0ed20a1

buck54321 mentioned this pull request Sep 19, 2020

server/market,client/core: order router waits for funding coins on asset net #688

Merged

chappjc mentioned this pull request Sep 19, 2020

capability of server request with a deadline #697

Closed

buck54321 approved these changes Sep 20, 2020

View reviewed changes

itswisdomagain approved these changes Sep 20, 2020

View reviewed changes

chappjc added 2 commits September 21, 2020 08:30

revise comment in latencyQ ExpireFunc

9780f88

grammer

659fc3d

chappjc merged commit e557253 into decred:master Sep 21, 2020

chappjc deleted the inaction branch September 21, 2020 14:28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

server/swap: inaction check fixes #680

server/swap: inaction check fixes #680

chappjc commented Sep 9, 2020 •

edited

Loading

chappjc Sep 9, 2020

chappjc Sep 9, 2020

chappjc Sep 10, 2020 •

edited

Loading

chappjc Sep 10, 2020 •

edited

Loading

chappjc Sep 10, 2020

chappjc Sep 10, 2020 •

edited

Loading

itswisdomagain left a comment

itswisdomagain Sep 15, 2020

chappjc Sep 15, 2020 •

edited

Loading

chappjc Sep 18, 2020 •

edited

Loading

itswisdomagain Sep 15, 2020

chappjc Sep 15, 2020

chappjc commented Sep 15, 2020 •

edited

Loading

This comment has been minimized.

chappjc Sep 17, 2020 •

edited

Loading

chappjc Sep 17, 2020 •

edited

Loading

chappjc Sep 18, 2020 •

edited

Loading

chappjc Sep 18, 2020 •

edited

Loading

chappjc Sep 18, 2020 •

edited

Loading

buck54321 left a comment

chappjc commented Sep 18, 2020

buck54321 left a comment

itswisdomagain left a comment

itswisdomagain Sep 20, 2020

chappjc Sep 21, 2020

itswisdomagain Sep 20, 2020

chappjc Sep 21, 2020 •

edited

Loading

	// Send the ack request.
	s.authMgr.RequestWhenConnected(u, req, func(_ comms.Link, resp *msgjson.Message) {
	s.processMatchAcks(u, resp, m)
	}, auth.DefaultRequestTimeout, auth.DefaultConnectTimeout, expireFunc)

		assetID, nBlocks := assetToMine.ID, uint16(assetToMine.SwapConf)+1 // mine 1 extra to ensure tx gets req. confs
		assetID, nBlocks := assetToMine.ID, uint16(assetToMine.SwapConf)

server/swap: inaction check fixes #680

server/swap: inaction check fixes #680

Conversation

chappjc commented Sep 9, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

chappjc Sep 10, 2020 • edited Loading

Choose a reason for hiding this comment

chappjc Sep 10, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

chappjc Sep 10, 2020 • edited Loading

Choose a reason for hiding this comment

itswisdomagain left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

chappjc Sep 15, 2020 • edited Loading

Choose a reason for hiding this comment

chappjc Sep 18, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

chappjc commented Sep 15, 2020 • edited Loading

This comment has been minimized.

chappjc Sep 17, 2020 • edited Loading

Choose a reason for hiding this comment

chappjc Sep 17, 2020 • edited Loading

Choose a reason for hiding this comment

chappjc Sep 18, 2020 • edited Loading

Choose a reason for hiding this comment

chappjc Sep 18, 2020 • edited Loading

Choose a reason for hiding this comment

chappjc Sep 18, 2020 • edited Loading

Choose a reason for hiding this comment

buck54321 left a comment

Choose a reason for hiding this comment

chappjc commented Sep 18, 2020

buck54321 left a comment

Choose a reason for hiding this comment

itswisdomagain left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

chappjc Sep 21, 2020 • edited Loading

Choose a reason for hiding this comment

chappjc commented Sep 9, 2020 •

edited

Loading

chappjc Sep 10, 2020 •

edited

Loading

chappjc Sep 10, 2020 •

edited

Loading

chappjc Sep 10, 2020 •

edited

Loading

chappjc Sep 15, 2020 •

edited

Loading

chappjc Sep 18, 2020 •

edited

Loading

chappjc commented Sep 15, 2020 •

edited

Loading

chappjc Sep 17, 2020 •

edited

Loading

chappjc Sep 17, 2020 •

edited

Loading

chappjc Sep 18, 2020 •

edited

Loading

chappjc Sep 18, 2020 •

edited

Loading

chappjc Sep 18, 2020 •

edited

Loading

chappjc Sep 21, 2020 •

edited

Loading