server: simplify/consolidate API authn/authz #50472
Comments
cc @dhartunian I thought you might be interested

xref #50483
50494: server: clarify the ordering of authn/authn r=tbg a=knz

This PR accompanies #50472 and #50483.

This patch ensures that the various API endpoints initialize in the following order:

1. annotate their `context.Context` so that log messages are properly contextualized with the client details.
2. authenticate the user.
3. optionally, check for the cluster setting "remote debugging allowed".

The check for the cluster setting should be performed after authentication to prevent an unauthenticated attacker from using this ordering to determine the current value of the cluster setting.

Additionally, it removes the option for non-admin users to retrieve redacted log entries. Now only admin can use the "get logs" endpoints.

Release note: None

Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
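The ordering described in the PR message above can be sketched as a small HTTP middleware. This is only an illustration under stated assumptions, not the code from the patch: the `guard` wrapper, the `authenticate` helper, the `X-Demo-User` header, and the `remoteDebuggingAllowed` variable are all hypothetical stand-ins for the real session check and cluster setting.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
)

type ctxKey string

const (
	clientKey ctxKey = "client" // hypothetical key for the client address
	userKey   ctxKey = "user"   // hypothetical key for the authenticated user
)

// remoteDebuggingAllowed is a hypothetical stand-in for the cluster setting.
var remoteDebuggingAllowed = false

// authenticate is a hypothetical stand-in for the real session check.
// Here a request counts as authenticated if it carries a demo header.
func authenticate(r *http.Request) (string, bool) {
	user := r.Header.Get("X-Demo-User")
	return user, user != ""
}

// guard applies the three steps in order: annotate, authenticate,
// then (optionally) consult the setting.
func guard(checkSetting bool, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// 1. Annotate the context with client details first.
		ctx := context.WithValue(r.Context(), clientKey, r.RemoteAddr)

		// 2. Authenticate before looking at any setting.
		user, ok := authenticate(r)
		if !ok {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		ctx = context.WithValue(ctx, userKey, user)

		// 3. Only now consult the setting, so that an unauthenticated
		// caller always sees the same response whatever its value is.
		if checkSetting && !remoteDebuggingAllowed {
			http.Error(w, "forbidden", http.StatusForbidden)
			return
		}
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

// status sends one request through the guard and returns the status code.
func status(user string) int {
	h := guard(true, http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	}))
	req := httptest.NewRequest(http.MethodGet, "/debug/vars", nil)
	if user != "" {
		req.Header.Set("X-Demo-User", user)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(status(""))      // unauthenticated: 401
	fmt.Println(status("alice")) // authenticated, setting off: 403
}
```

Because the setting is read only after authentication succeeds, an unauthenticated request gets the same 401 whether the setting is on or off, which is the property the PR message asks for.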
Updated by #50483
Let's look at the debug setting (

| y | y | y | Logs are where I see the best argument for additional restrictions. We do not make them available through SQL or other user-facing interfaces. However, logs are available to admin users on non-localhost addresses via
| y | y | y | This is log-like information (in an interface inherited from GRPC).
| y | y | y | An admin user already has access to this information through the SQL interface, so a localhost-only restriction is unneeded. What's the question mark on
| y | y | y | Memory metrics from the Go runtime. Not sensitive. (also probably not necessary since there's much more information in
| y | y | y | Pebble LSM visualization. No more sensitive than, say,
| y | y | y | Profiles and stack traces, mostly coming from Go by default. Similar to metrics, not sensitive.
| y | y | y | Accessors for saved profiles (currently limited to heap profiles and goroutine dumps)
| y | y | y | I think this is similar to stack traces, although it's possible that there may be sites in the codebase that pass more sensitive data to the stopper.
| y | y | y | The only thing on this list that's not read-only. But it's just a combination of forcing something that the range scanner would do anyway and returning logs, so not particularly dangerous.
| y | y | y | Logs from simulated allocator runs.
| y | y | y | Like logs, gossip is not available through other user-facing interfaces, but it's not intended to include sensitive information and is included in debug zips.
| y | y | y | Similar to what's in

So overall, I don't see any reason to impose additional restrictions on these paths (in addition to requiring SQL admin credentials) and I'd feel OK about removing the remote-debugging setting. (and with making all of this functionality available via SQL interfaces as we've discussed elsewhere).
Next let's look at the unauthenticated endpoints (first column)

| n | n | n | These are essentially static documentation and metadata about the timeseries metrics. Not sensitive (or even cluster-dependent at all that I can see).
| n | n | n | Cluster ID and a couple other bits of metadata (enterprise and telemetry status). It feels a little strange to have the latter bits world-readable (or exposed here at all). The cluster ID is questionable and I'm not sure what the intended use case is, but it seems comparable to details that servers will happily expose in their TLS certificates pre-auth.
| n | n | n | Basic binary health checks. Important for these to be open without auth because they're used by all kinds of monitoring and orchestration systems.
| n | n | n | Slightly more detailed information about the health of the whole cluster. As with the cluster ID, it's not particularly sensitive data, but I'm not sure what use case would require this to be world-readable.
| n This is the interface where users can see the information that gets sent up to our telemetry servers. It's not sensitive data (if it were, we shouldn't be collecting it), but we still shouldn't be exposing it to anybody on the network. This should go behind a login.
| n Yet another interface to the metrics. It appears to be the same data as
| n Metrics and command-line flags for all nodes. The command-line flags at least should probably be behind a login.
| n I think this one is exposed for debugging practicality, so it can be accessed even when the cluster is having trouble. It's a safe one to expose as far as our debugging interfaces go (only exposes range IDs), although exposing this page on its own doesn't seem like the most useful policy.
| n | n | n | These are special cases; login is by definition not protected by the same auth as the other pages. Similarly, logout does nothing unless there is a valid session.
| n [1] | n | n | Personally I'd mark these as The The
| n [4] | n | n | Again, these endpoints are authenticated, albeit in a nonstandard way, so I'd mark them

So to summarize: diagnostics definitely belong behind a login (admin-only, I think). I'd prefer to put all of
Thanks Ben for the review. That's amazing insight.

Cc @itsbilal - not necessarily for you to take action on, but I think you can take the knowledge.
We have marked this issue as stale because it has been inactive for |
I have conducted a review of the authn/authz rules for all the endpoints exposed via HTTP. To my knowledge, this is the first time such an analysis was carried out.
Here are my findings:
/_admin/v1/chartcatalog
/_admin/v1/cluster
/_admin/v1/health, /health
/_admin/v1/liveness
/_admin/v1/metricmetadata
/_status/diagnostics/{node_id}
/_status/metrics/{node_id}
/_status/nodes/{node_id}
/_status/nodes
/_status/problemranges
/login
/logout
/debug/events
/debug/hba_conf
/debug/logspy
/debug/lsm/%d
/debug/metrics
/debug/pprof/
/debug/pprof/cmdline
/debug/pprof/goroutineui/
/debug/pprof/profile
/debug/pprof/symbol
/debug/pprof/trace
/debug/pprof/ui/
/debug/requests
/debug/stopper
/debug/threads
/debug/vars
/_admin/v1/databases
/_admin/v1/jobs
/_admin/v1/queryplan
/_admin/v1/locations
/_admin/v1/users
/_admin/v1/uidata
/_admin/v1/uidata
/_admin/v1/databases/{database}/tables/{table}
/_admin/v1/databases/{database}
/_admin/v1/events
/_admin/v1/settings
/_status/cancel_query/{node_id}
/_status/cancel_session/{node_id}
/_status/sessions
/_admin/v1/data_distribution
/_admin/v1/databases/{database}/tables/{table}/stats
/_admin/v1/nontablestats
/_admin/v1/rangelog/{range_id}
/_admin/v1/rangelog
/_status/certificates/{node_id}
/_status/details/{node_id}
/_status/enginestats/{node_id}
/_status/logfiles/{node_id}/{file}
/_status/logs/{node_id}
/_status/profile/{node_id}
/_status/raft
/_status/range/{range_id}
/_status/ranges/{node_id}
/_status/span
/_status/stacks/{node_id}
/_status/statements
/_status/stores/{node_id}
/_status/stmtdiag/{statement_diagnostics_id}
/_status/stmtdiagreports
/_status/stmtdiagreports
/_status/local_sessions
/_admin/v1/enqueue_range
/_status/allocator/node/{node_id}
/_status/allocator/range/{range_id}
/_status/files/{node_id}
/_status/gossip/{node_id}
/_status/hotranges
/_status/job/{job_id}
/_status/job_registry/{node_id}
/_status/logfiles/{node_id}
Notes:
[1] Various endpoints (e.g. `_admin/v1/databases`, but not `databases/{database}`)
use a non-standard authentication check, then
delegate the principal they discovered to the SQL engine.
Hopefully, the SQL privilege checks take care of authorization,
although this is more by accident than deliberate design.
[2] The `.../locations` endpoints use the non-standard
auth check and then forward the principal to SQL (see note [1]).
For non-admin users, this causes the lookup to fail because
`system.locations` is a system table.
This is the cause of issue "non-admin user cannot use node map".
[3] The statement diagnostics endpoints require "any admin user"
but then operate in SQL using `root` specifically. This
would cause invalid entries in audit logs.
[4] The "UI data" endpoints (K/V pairs for UI customizations) use
a non-standard authn/authz protocol, due to backward compatibility.
This is legacy debt that should be refactored, preferably
by adding a new endpoint (for the new version), deprecating the current one,
then removing it in a later version.
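To illustrate the concern in note [3], here is a minimal sketch of the alternative: authorize the caller as an admin, then run the SQL work under the caller's own identity rather than `root`, so that the audit trail names the real user. Everything here is hypothetical and chosen for illustration: the `executor` type, its `execAs` method, the `admins` map, and the `requestDiagnostics` helper do not correspond to actual server code.

```go
package main

import (
	"errors"
	"fmt"
)

// auditEntry is a hypothetical audit-log record.
type auditEntry struct {
	User string
	Stmt string
}

// executor is a hypothetical stand-in for an internal SQL executor
// that records which user each statement ran as.
type executor struct {
	audit []auditEntry
}

// execAs pretends to run stmt as user and records an audit entry.
func (e *executor) execAs(user, stmt string) {
	e.audit = append(e.audit, auditEntry{User: user, Stmt: stmt})
}

var errNotAdmin = errors.New("this operation requires admin privilege")

// requestDiagnostics checks that caller is an admin, then executes the
// statement as caller (not as root) so the audit entry is attributed
// to the user who made the request.
func requestDiagnostics(e *executor, caller string, admins map[string]bool, stmt string) error {
	if !admins[caller] {
		return errNotAdmin
	}
	e.execAs(caller, stmt)
	return nil
}

func main() {
	e := &executor{}
	admins := map[string]bool{"alice": true}

	// A non-admin caller is rejected and leaves no audit entry.
	fmt.Println(requestDiagnostics(e, "bob", admins, "SELECT 1"))

	// An admin caller succeeds and is recorded under their own name.
	fmt.Println(requestDiagnostics(e, "alice", admins, "SELECT 1"))
	fmt.Println(e.audit[0].User)
}
```

The design point is that the authorization check and the execution identity use the same principal, which avoids the mismatch between "any admin user" at the API layer and `root` at the SQL layer.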
Epic: CRDB-1473
Jira issue: CRDB-4118