server/{dex,market}: cleaner shutdown #787
Conversation
comms server shuts down after the subsystems that use it (all but assets)
Is this to allow the sub-systems to continue sending messages? Seems like we'd want to immediately end incoming messages before shutting down the lower-level systems.
server/dex/dex.go
Outdated
// Client comms RPC server.
server, err := comms.NewServer(cfg.CommsCfg)
if err != nil {
	abort()
	return nil, fmt.Errorf("NewServer failed: %v", err)
}
startSubSys("Comms Server", server)
I see how this would achieve the desired effect of shutting down the server last, but is it ok to start the server before the lower-level systems that it relies on? I realize of course that sub-system Run methods are run in goroutines with no guarantee of sequence anyway, but we still want to start the server last, no?
That's a good point about the comms order. The issue I noticed was that messages at shutdown are dropped if comms goes down first. But your point about start-up is taken.
The comms subsys is moved back to its original location, but I'll let you resolve this comment in case you have thoughts about the outgoing messages on shutdown.
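For context, a minimal runnable sketch of the pattern under discussion, with illustrative names rather than dcrdex's actual types: when every subsystem's Run is launched in a goroutine against one shared context, cancellation imposes no stop ordering at all, which is why comms can drop outgoing messages on the way down.

package main

import (
	"context"
	"fmt"
	"sync"
)

// subsystem pairs a name with a Run method that blocks until shutdown.
type subsystem struct {
	name string
	run  func(ctx context.Context)
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	var wg sync.WaitGroup

	// Started in dependency order, comms last, so incoming messages only
	// arrive once the subsystems that handle them are running.
	stack := []subsystem{
		{"DB", func(ctx context.Context) { <-ctx.Done() }},
		{"Markets", func(ctx context.Context) { <-ctx.Done() }},
		{"Comms Server", func(ctx context.Context) { <-ctx.Done() }},
	}
	for _, s := range stack {
		wg.Add(1)
		go func(s subsystem) {
			defer wg.Done()
			s.run(ctx)
			fmt.Println(s.name, "stopped")
		}(s)
	}

	// Cancel (e.g. ctrl+c): all three unblock concurrently, in no
	// particular order, so comms may exit before the markets have
	// flushed their final outgoing messages.
	cancel()
	wg.Wait()
}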
server/market/market.go
Outdated
// Revoke any unmatched epoch orders (this path is reached when the context
// was canceled, not when a clean suspend stopped the market).
for oid, ord := range m.epochOrders {
	log.Infof("Dropping epoch order %v", oid)
	if co, ok := ord.(*order.CancelOrder); ok {
		if err := m.storage.FailCancelOrder(co); err != nil {
			log.Errorf("Failed to set orphaned epoch cancel order %v as failed: %v", oid, err)
		}
		continue
	}
	if err := m.storage.ExecuteOrder(ord); err != nil {
		log.Errorf("Failed to set orphaned epoch trade order %v as executed: %v", oid, err)
	}
}
The suspend command should be used to complete the epoch if that is desired, but ctrl+c should at least not ditch orders in epoch status even though they do not go through matching.
Just noticing the difference between the executed status here, and the revoked status when we do a book purge. Seems like similar situations.
Yeah, maybe revoked is better for the trades too. Will switch it. Really just avoiding eternal epoch status orders though.
OK, so we could use RevokeOrder{Uncounted} for the trade order, but that creates a cancel order to go with it. The general idea of order revocation is that it's a forced unbooking, a server-generated cancellation. So while executed may not fit exactly here, going from epoch->executed is more in line with how this is handled elsewhere, such as epoch orders that don't match becoming executed, not revoked. This is another (wacky) way epoch orders can become executed, just without the matcher.
Would an operator have any way to know by looking at the order row in the DB whether this order was executed via matching vs. dirty shutdown?
No preimage.
Hmm, need to double check that these won't count as a preimage miss, but it had better be pretty darn uncommon to do a dirty shutdown without suspending the markets first.
OK, so by making it executed we also escape having it counted as a preimage miss because the preimage miss query looks for revoked status orders.
Still thinking through other pitfalls with executed status orders that have no preimage stored.
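To make the operator-facing distinction concrete, here's a sketch of the kind of query being described, using database/sql under an assumed schema (the table and column names are illustrative, not dcrdex's actual tables): an executed order with no stored preimage never went through the matcher.

// Operator-side sketch; imports "context" and "database/sql". Table and
// column names are assumptions, not dcrdex's actual schema. Since the
// preimage-miss query looks for revoked orders, these rows also escape
// being counted as misses.
func dirtyShutdownOrders(ctx context.Context, db *sql.DB) ([]string, error) {
	rows, err := db.QueryContext(ctx,
		`SELECT oid FROM orders WHERE status = 'executed' AND preimage IS NULL`)
	if err != nil {
		return nil, err
	}
	defer rows.Close()
	var oids []string
	for rows.Next() {
		var oid string
		if err := rows.Scan(&oid); err != nil {
			return nil, err
		}
		oids = append(oids, oid)
	}
	return oids, rows.Err()
}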
server/dex
This reorders the subsystem stack so that DEX manager shutdown is properly ordered. Namely:
- (edited) comms server shuts down after the subsystems that use it (all but assets). In the end, comms didn't move, because we want it started after the subsystems that use it are up for the sake of incoming message handling, even though stopping it before those dependent subsystems stop can result in dropped outgoing messages (see the sketch below).
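A minimal sketch of that trade-off, with illustrative types rather than dcrdex's actual ones (imports "context"): if each subsystem carries its own cancel function, Stop can unwind the stack in reverse start order, but then starting comms last necessarily stops it first.

// Illustrative types, not dcrdex's actual ones.
type subsystem struct {
	name   string
	cancel context.CancelFunc
	done   chan struct{} // closed when the subsystem's Run returns
}

type DEX struct {
	subsystems []subsystem // in start order
}

// Stop unwinds the stack in reverse start order, waiting for each Run to
// return before cancelling the next. Because comms is started last (for
// the sake of incoming message handling), it is also stopped first, which
// is where outgoing messages can be dropped.
func (d *DEX) Stop() {
	for i := len(d.subsystems) - 1; i >= 0; i-- {
		sub := d.subsystems[i]
		sub.cancel()
		<-sub.done
	}
}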
server/market
When a market is stopped by context cancellation (not by a market suspend admin command), it does not wait for the current epoch to close, so any orders in that truncated epoch were just being dropped without changing their status from epoch to executed (how a no-match/fail is normally handled when epochs are processed by the matcher). This updates the primary defer function in (*Market).Run so that if there are orders left in the epochOrders map after the closed epoch pipeline is drained, their statuses are changed in the DB so that they aren't reported to clients as active orders for eternity.
Resolves #474, although it no longer hangs anyway.
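For illustration, a sketch of the two stop paths (all names assumed, not dcrdex's actual (*Market).Run; imports "context" and "log"): a suspend lets the current epoch close and drain normally, while context cancellation truncates it, so the deferred cleanup must deal with whatever is left in epochOrders.

type epochOrder struct{ id string }

// runMarket sketches the two stop paths. On a clean suspend the epoch
// pipeline drains before return, so the map is empty; on cancellation the
// epoch is truncated and leftovers must be handled.
func runMarket(ctx context.Context, suspend <-chan struct{}, incoming <-chan epochOrder) {
	epochOrders := make(map[string]epochOrder)
	defer func() {
		for oid := range epochOrders {
			// In the real code these are set to failed (cancels) or
			// executed (trades) in the DB rather than just logged.
			log.Printf("dropping truncated-epoch order %v", oid)
		}
	}()
	for {
		select {
		case <-suspend: // clean suspend: epoch closes first
			return
		case <-ctx.Done(): // e.g. ctrl+c: epoch truncated
			return
		case ord := <-incoming:
			epochOrders[ord.id] = ord
		}
	}
}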