cli: administrative commands meant to work over unavailable clusters are broken #43974
Comments
`cockroach node status` is broken
@tbg can you check the above? I'd like to point out that Interestingly, #20713 was listing It seems to me that |
I can see some value in As noted, we'd need to tell a new story about authn/authz for such a sql user. Perhaps we want to provide a mechanism to create tokens with an expiration and well scoped access to "control" data. Furthermore, pushing such data underneath SQL will muddle the interface. Perhaps my preference is to do both. Define a clear "control" portion which is implemented on top of SQL and can bypass authz explicitly. Then, with less urgency, provide a mechanism to authorize SQL connections to access the same data. |
Yeah and that info will be stored ... where? In a KV range that needs quorum to access? That sounds like a snake biting its tail. I'm cooking a high-level description of the problem that underlines how we've been designing with conflicting goals. There will be fallout. |
In my head the proposal was that the token be irrevocable or best effort revocable. The idea was that it could be a signed token which stores its expiration so that verification can occur without access to the database. |
We have that already with TLS client certs (signed + embedded expiration) Except that it's not server-side revocable. And we can't control/revoke privileges. |
None of the options discussed seems very appealing. Take a look at the SQL used for Lines 148 to 189 in 0917dcc
Cobbling this together without SQL is unsavory. In fact, the previous code doing so was... bad, we can't revert to it. The two options I find palatable are:
|
I like 2 as it's well aligned with another item on the CLI roadmap. I'll mull it over. |
44022: server,sql,pgwire: let client conns timeout on unavailable clusters r=ajwerner a=knz Fixes #44012. Informs #43974 and #30887. Release note (general change): CockroachDB will now report a timeout error when a client attempts to connect via SQL or the web UI and some system ranges are unavailable. The previous behavior was to wait indefinitely. The timeout is configurable via the cluster setting `server.user_login.timeout` and is set to 10 seconds by default. The value 0 means "indefinitely" and can be used to restore the pre-20.1 behavior. Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
We have marked this issue as stale because it has been inactive for |
Found while working on #43893
`cockroach node status` is meant to work even if the node is unhealthy (e.g. a system range is unavailable), as per a comment in roachtests and my own (knz's) personal recollections of past discussions.
Therefore, currently `cockroach node status` happens to work for the `root` user because of some bypass in the code when the username is exactly "root". However, it is broken if used with any client cert other than `root`'s.
There are two perspectives about this:
A. either we consider that SQL should always authenticate (which creates consistency and regularity), in which case `cockroach node status` should be made to use the status RPC entirely and bypass SQL.
For this to be viable we must start to commit to (and better understand) a separation between "control" (RPC) and "data" (SQL) planes in the APIs.
B. or, we should consider that the "main" administration API for crdb is SQL and that this admin API should always work even when system ranges are unavailable, i.e. we must implement and document a way for clients to establish SQL connections when the system ranges including system.users are unavailable.
For this to be viable we need a special "operator" account that's always special in this way, but without removing the ability for users to use a non-special account that's also admin in the UI (i.e. properly address #43847 / #43870)
(needless to say, my (knz) heart goes towards direction A)
Jira issue: CRDB-6332