admission: bypass admission queueing for debug requests #72977
Labels
A-admission-control
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
O-sre
For issues SRE opened or otherwise cares about tracking.
T-admission-control
Admission Control
Is your feature request related to a problem? Please describe.
As of 21.1, if a CRDB cluster is overloaded, it is difficult to get debug data from it. The admin UI may be slow or unavailable. Taking a debug zip may fail. Even scraping metrics could fail (though I am not sure I have seen that happen). This makes fixing the overload difficult.
Describe the solution you'd like
As of 21.2, we have an admission control system. It gives us more control over the extent to which hardware resources are over-spent during an overload scenario. We also have the ability to decide which work waits in admission control queues and which doesn't. Lastly, there is an explicit notion of priority in the admission control system. As a result, we should be able to keep the cluster healthy enough to always enable critical flows such as viewing the admin UI and taking a debug zip.
I figure there is some remaining work here, beyond turning on admission control by default. One likely item is to make the various operations needed to view the admin UI or take a debug zip either skip the admission control queues or at least run at a higher priority than other work.
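To make the two options concrete, here is a minimal sketch in Python of a toy admission queue. It is not CockroachDB's admission control code or API: the `AdmissionQueue` class, the `bypass` flag, the priority constants, and the request names are all hypothetical, chosen only to illustrate "skip the queue" versus "wait with higher priority".

```python
import heapq
import itertools

# Hypothetical priority levels; a larger value is admitted earlier.
LOW_PRI, NORMAL_PRI, HIGH_PRI = 0, 1, 2


class AdmissionQueue:
    """Toy model: a fixed number of slots plus a priority-ordered wait queue."""

    def __init__(self, slots):
        self.free_slots = slots
        self._waiting = []             # heap of (-priority, arrival_seq, name)
        self._seq = itertools.count()  # tie-breaker: FIFO within a priority
        self.admitted = []             # names in the order they were admitted

    def submit(self, name, priority=NORMAL_PRI, bypass=False):
        """Return True if the request is admitted now, False if it is queued."""
        if bypass:
            # Option 1: debug work skips queueing and does not take a slot.
            self.admitted.append(name)
            return True
        if self.free_slots > 0:
            self.free_slots -= 1
            self.admitted.append(name)
            return True
        # Option 2: the request waits, ordered by priority and then arrival.
        heapq.heappush(self._waiting, (-priority, next(self._seq), name))
        return False

    def work_done(self):
        """A slot holder finished; pass its slot to the best waiter, if any."""
        if self._waiting:
            _, _, name = heapq.heappop(self._waiting)
            self.admitted.append(name)
            return name
        self.free_slots += 1
        return None


if __name__ == "__main__":
    q = AdmissionQueue(slots=1)
    q.submit("user-query-1")                     # takes the only slot
    q.submit("user-query-2")                     # queued
    q.submit("admin-ui-rpc", priority=HIGH_PRI)  # queued ahead of user-query-2
    q.submit("debug-zip", bypass=True)           # admitted without waiting
    print(q.admitted, q.work_done())
```

The trade-off in the sketch mirrors the one in the request: bypassing guarantees the debug request is never stuck behind other work, but its resource use is then unaccounted for, whereas a higher priority keeps it accounted for and can still leave it waiting while every slot is held.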
Describe alternatives you've considered
N/A
Additional context
Rece
Jira issue: CRDB-11363