Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server/{dex,market}: cleaner shutdown #787

Merged
merged 4 commits into from
Nov 2, 2020

Conversation

chappjc
Copy link
Member

@chappjc chappjc commented Oct 26, 2020

server/dex

This reorders the subsystem stack so that DEX manager shutdown is ordered properly now. Namely:

  • comms server shuts down after the subsystems that use it (all but assets) comms didn't move because we want it started after the subsystems that use it are up for the sake of incoming message handling, even though stopping it before those dependent subsystems stop can result in dropped outgoing messages
  • book router shuts down after the markets that use it

server/market

When a market is stopped by context cancellation (not a market suspend admin command), it does not wait for the current epoch to close, so any orders in that truncated epoch were just being dropped without changing their status from epoch to executed (how a no-match/fail is normally handled when epochs are processed by the matcher). This updates the primary defer function in (*Market).Run so that if there are orders left in the epochOrders map after the closed epoch pipeline is drained, their statuses are change in the DB so that they aren't reported to clients as active orders for eternity.

Resolves #474, although it didn't hang any more.

@chappjc chappjc added this to the 0.1.1 milestone Oct 26, 2020
Copy link
Member

@buck54321 buck54321 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comms server shuts down after the subsystems that use it (all but assets)

Is this to allow the sub-systems to continue sending messages. Seems like we'd want to immediately end incoming messages before the shutting down the lower-level systems.

Comment on lines 478 to 485
// Client comms RPC server.
server, err := comms.NewServer(cfg.CommsCfg)
if err != nil {
abort()
return nil, fmt.Errorf("NewServer failed: %v", err)
}
startSubSys("Comms Server", server)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see how this would achieve the desired effect of shutting down the server last, but is it ok to start the server before the lower-level systems that it relies on? I realize of course that sub-system Run methods are run in goroutines with no guarantee of sequence anyway, but we still want ot start the server last, no?

Copy link
Member Author

@chappjc chappjc Oct 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point about the comms order. The issue I noticed was that messages at shutdown are dropped if comms goes dows first. But your point about start-up is taken.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comms subsys is moved back to its original location, but I'll let you resolve this comment in case you have thoughts about the outgoing messages on shutdown.

Comment on lines 794 to 807
// Revoke any unmatched epoch orders (if context was canceled, not a
// clean suspend stopped the market).
for oid, ord := range m.epochOrders {
log.Infof("Dropping epoch order %v", oid)
if co, ok := ord.(*order.CancelOrder); ok {
if err := m.storage.FailCancelOrder(co); err != nil {
log.Errorf("Failed set orphaned epoch cancel order %v as executed: %v", oid, err)
}
continue
}
if err := m.storage.ExecuteOrder(ord); err != nil {
log.Errorf("Failed set orphaned epoch trade order %v as executed: %v", oid, err)
}
}
Copy link
Member Author

@chappjc chappjc Oct 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The suspend command should be used to complete the epoch if that is desired, but ctrl+c should at least not ditch orders in epoch status even though they do not go through matching.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noticing the difference between the executed status here, and the revoked status when we do a book purge. Seems like similar situations.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, maybe revoked is better for the trades too. Will switch it. Really just avoiding eternal epoch status orders though.

Copy link
Member Author

@chappjc chappjc Oct 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so we could use RevokeOrder{Uncounted} for the trade order, but that creates a cancel order to go with it. The general idea of order revocation is that it's a forced unbooking, a server-generated cancellation. So while executed may not fix exactly here, going from epoch->executed is more inline with how this is handled elsewhere, such as epoch orders not matching becoming executed, not revoked. This is another (wacky) way epoch orders can become executed, just without the matcher.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would an operator have any way to know by looking at the order row in the DB whether this order was executed via matching vs. dirty shutdown?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No preimage.

Copy link
Member Author

@chappjc chappjc Oct 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, need to double check that these won't count as a preimage miss, but it had better be pretty darn uncommon to do a dirty shutdown without suspending the markets first.

Copy link
Member Author

@chappjc chappjc Oct 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so by making it executed we also escape having it counted as a preimage miss because the preimage miss query looks for revoked status orders.
Still thinking through other pitfalls with executed status orders that have no preimage stored.

@chappjc chappjc merged commit c7d7e28 into decred:master Nov 2, 2020
@chappjc chappjc deleted the server-market-shutdown branch November 2, 2020 23:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cmd/dcrdex: Unable to stop app with active orders on the book
2 participants