server: simplify/consolidate API authn/authz #50472

knz · 2020-06-22T13:41:46Z

I have conducted a review of the authn/authz rules for all the endpoints exposed via HTTP. To my knowledge, this is the first time such an analysis was carried out.

Here are my findings:

Authn?	Debug setting?	Admin-only	Endpoint	Method
n	n	n	`/_admin/v1/chartcatalog`	GET
n	n	n	`/_admin/v1/cluster`	GET
n	n	n	`/_admin/v1/health`, `/health`	GET
n	n	n	`/_admin/v1/liveness`	GET
n	n	n	`/_admin/v1/metricmetadata`	GET
n ⚠️	n	n	`/_status/diagnostics/{node_id}`	GET
n ⚠️	n	n	`/_status/metrics/{node_id}`	GET
n ⚠️	n	n	`/_status/nodes/{node_id}`	GET
n ⚠️	n	n	`/_status/nodes`	GET
n ⚠️	n	n	`/_status/problemranges`	GET
n	n	n	`/login`	POST
n	n	n	`/logout`	GET
y️	y	y	`/debug/events`	GET
y️	y	y	`/debug/hba_conf`	GET
y️	y	y	`/debug/logspy`	GET
y️	y	y	`/debug/lsm/%d`	GET
y️	y	y	`/debug/metrics`	GET
y️	y	y	`/debug/pprof/`	GET
y️	y	y	`/debug/pprof/cmdline`	GET
y️	y	y	`/debug/pprof/goroutineui/`	GET
y️	y	y	`/debug/pprof/profile`	GET
y️	y	y	`/debug/pprof/symbol`	GET
y️	y	y	`/debug/pprof/trace`	GET
y️	y	y	`/debug/pprof/ui/`	GET
y️	y	y	`/debug/requests`	GET
y️	y	y	`/debug/stopper`	GET
y️	y	y	`/debug/threads`	GET
y️	y	y	`/debug/vars`	GET
n [1]	n	n	`/_admin/v1/databases`	GET
n [1]	n	n	`/_admin/v1/jobs`	GET
n [1]	n	n	`/_admin/v1/queryplan`	GET
n [1]	n	y [2] 😓	`/_admin/v1/locations`	GET
n [1]	n	y [2]	`/_admin/v1/users`	GET
n [4]	n	n	`/_admin/v1/uidata`	GET
n [4]	n	n	`/_admin/v1/uidata`	POST
y	n	n	`/_admin/v1/databases/{database}/tables/{table}`	GET
y	n	n	`/_admin/v1/databases/{database}`	GET
y	n	n	`/_admin/v1/events`	GET
y	n	n	`/_admin/v1/settings`	GET
y	n	n	`/_status/cancel_query/{node_id}`	GET
y	n	n	`/_status/cancel_session/{node_id}`	GET
y	n	n	`/_status/sessions`	GET
y	n	y	`/_admin/v1/data_distribution`	GET
y	n	y	`/_admin/v1/databases/{database}/tables/{table}/stats`	GET
y	n	y	`/_admin/v1/nontablestats`	GET
y	n	y	`/_admin/v1/rangelog/{range_id}`	GET
y	n	y	`/_admin/v1/rangelog`	GET
y	n	y	`/_status/certificates/{node_id}`	GET
y	n	y	`/_status/details/{node_id}`	GET
y	n	y	`/_status/enginestats/{node_id}`	GET
y	n	y	`/_status/logfiles/{node_id}/{file}`	GET
y	n	y	`/_status/logs/{node_id}`	GET
y	n	y	`/_status/profile/{node_id}`	GET
y	n	y	`/_status/raft`	GET
y	n	y	`/_status/range/{range_id}`	GET
y	n	y	`/_status/ranges/{node_id}`	GET
y	n	y	`/_status/span`	POST
y	n	y	`/_status/stacks/{node_id}`	GET
y	n	y	`/_status/statements`	GET
y	n	y	`/_status/stores/{node_id}`	GET
y	n	y [3] ⚠️	`/_status/stmtdiag/{statement_diagnostics_id}`	GET
y	n	y [3] ⚠️	`/_status/stmtdiagreports`	GET
y	n	y [3] ⚠️	`/_status/stmtdiagreports`	POST
y	y ❓	n	`/_status/local_sessions`	GET
y	y	y	`/_admin/v1/enqueue_range`	POST
y	y	y	`/_status/allocator/node/{node_id}`	GET
y	y	y	`/_status/allocator/range/{range_id}`	GET
y	y	y	`/_status/files/{node_id}`	GET
y	y	y	`/_status/gossip/{node_id}`	GET
y	y	y	`/_status/hotranges`	GET
y	y	y	`/_status/job/{job_id}`	GET
y	y	y	`/_status/job_registry/{node_id}`	GET
y	y	y	`/_status/logfiles/{node_id}`	GET

Notes:

[1] Various endpoints (e.g. _admin/v1/databases, but not
databases/{database}) use a non-standard authentication check, then
delegate the principal they discovered to the SQL engine.
Hopefully, the SQL privilege checks take care of authorization,
although this is more by accident than deliberate design.

[2] the .../locations endpoints uses the non-standard
auth check and then forward the principal to SQL. (see note [1])
For non-admin users, this causes the lookup to fail because
system.locations is a system table.
This is the cause of issue "non-admin user cannot use node map".

[3] The statement diagnostics endpoint require "any admin user"
but then operate in SQL using root specifically. This
would cause invalid entries in audit logs.

[4] The "UI data" endpoints (K/V pair for UI customizations) uses
a non-standard authn/authz protocol, due to backward compatibility.
This is legacy debt that should be refactored, preferably
by adding a new endpoint (for the new version), deprecating the current one,
then removing it in a later version.

Epic: CRDB-1473

Jira issue: CRDB-4118

The text was updated successfully, but these errors were encountered:

knz · 2020-06-22T13:45:10Z

cc @dhartunian I thought you might be interested

knz · 2020-06-22T15:52:28Z

xref #50483

50494: server: clarify the ordering of authn/authn r=tbg a=knz This PR accompanies #50472 and #50483 This patch ensures that the various API endpoints initialize in the following order: 1. annotate their `context.Context` so that log messages are properly contextualized with the client details. 2. authenticate the user. 3. optionally, check for the cluster setting "remote debugging allowed". The check for the cluster setting should be performed after authentication to prevent an unauthenticated attacker from using this ordering to determine the current value of the cluster setting. Additionally, it removes the option for non-admin users to retrieve redacted log entries. Now only admin can use the "get logs" endpoints. Release note: None Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>

knz · 2020-06-24T12:17:43Z

Updated by #50483

bdarnell · 2020-11-04T18:45:59Z

Let's look at the debug setting (server.remote_debugging.mode). This was introduced before we had proper authentication for the web interfaces, so it used to be the only line of defense for the paths that used it. Now we have authentication and every path that uses the setting also requires admin authentication. Is it necessary to have an extra hurdle for some paths in addition to admin auth? (and if so, is localhost-only the right extra check? Should it be extended to anything else?) If the answer is no (and I expect it is), then we should default the setting to any or remove it altogether.

| y️ | y | y | /debug/logspy | GET |
| y | y | y | /_status/logfiles/{node_id} | GET |

Logs are where I see the best argument for additional restrictions. We do not make them available through SQL or other user-facing interfaces. However, logs are available to admin users on non-localhost addresses via debug zip, so if we wanted to limit log access to localhost we'd need to lock that down too (in which case you'd be unable to get a full-featured debug zip without changing the debug mode setting).

| y️ | y | y | /debug/events | GET |
| y️ | y | y | /debug/requests | GET |

This is log-like information (in an interface inherited from GRPC).

| y️ | y | y | /debug/hba_conf | GET |
| y | y ❓ | n | /_status/local_sessions | GET |
| y | y | y | /_status/job/{job_id} | GET |
| y | y | y | /_status/job_registry/{node_id} | GET |

An admin user already has access to this information through the SQL interface, so a localhost-only restriction is unneeded. What's the question mark on local_sessions for, @knz ?

| y️ | y | y | /debug/metrics | GET |
| y️ | y | y | /debug/vars | GET |

Memory metrics from the go runtime. Not sensitive. (also probably not necessary since there's much more information in /_status/vars)

| y️ | y | y | /debug/lsm/%d | GET |

Pebble LSM visualization. No more sensitive than, say, crdb_internal.ranges

| y️ | y | y | /debug/pprof/ | GET |
| y️ | y | y | /debug/pprof/cmdline | GET |
| y️ | y | y | /debug/pprof/goroutineui/ | GET |
| y️ | y | y | /debug/pprof/profile | GET |
| y️ | y | y | /debug/pprof/symbol | GET |
| y️ | y | y | /debug/pprof/trace | GET |
| y️ | y | y | /debug/pprof/ui/ | GET |
| y️ | y | y | /debug/threads | GET |

Profiles and stack traces, mostly coming from Go by default. Similar to metrics, not senstive.

/pprof/cmdline is an odd one; I don't think it's used by anything and it might be nice to remove it. Command lines also appear in logs so there is no additional data leak here.

| y | y | y | /_status/files/{node_id} | GET |

Accessors for saved profiles (currently limited to heap profiles and goroutine dumps)

| y️ | y | y | /debug/stopper | GET |

I think this is similar to stack traces, although it's possible that there may be sites in the codebase that pass more sensitive data to the stopper.

| y | y | y | /_admin/v1/enqueue_range | POST |

The only thing on this list that's not read-only. But it's just a combination of forcing something that the range scanner would do anyway and returning logs, so not particularly dangerous.

| y | y | y | /_status/allocator/node/{node_id} | GET |
| y | y | y | /_status/allocator/range/{range_id} | GET |

Logs from simulated allocator runs.

| y | y | y | /_status/gossip/{node_id} | GET |

Like logs, gossip is not available through other user-facing interfaces, but it's not intended to include sensitive information and is include in debug zips.

| y | y | y | /_status/hotranges | GET |

Similar to what's in crdb_internal.ranges and system.jobs.

So overall, I don't see any reason to impose additional restrictions on these paths (in addition to requiring SQL admin credentials) and I'd feel OK about removing the remote-debugging setting. (and with making all of this functionality available via SQL interfaces as we've discussed elsewhere).

bdarnell · 2020-11-24T19:23:39Z

Next let's look at the unauthenticated endpoints (first column)

| n | n | n | /_admin/v1/chartcatalog | GET |
| n | n | n | /_admin/v1/metricmetadata | GET |

These are essentially static documentation and metadata about the timeseries metrics. Not sensitive (or even cluster-dependent at all that I can see).

| n | n | n | /_admin/v1/cluster | GET |

Cluster ID and a couple other bits of metadata (enterprise and telemetry status). It feels a little strange to have the latter bits world-readable (or exposed here at all). The cluster ID is questionable and I'm not sure what the intended use case is, but it seems comparable to details that servers will happily expose in their TLS certificates pre-auth.

| n | n | n | /_admin/v1/health, /health | GET |

Basic binary health checks. Important for these to be open without auth because they're used by all kinds of monitoring and orchestration systems.

| n | n | n | /_admin/v1/liveness | GET |

Slightly more detailed information about the health of the whole cluster. As with the cluster ID, it's not particularly sensitive data, but I'm not sure what use case would require this to be world-readable.

| n ⚠️ | n | n | /_status/diagnostics/{node_id} | GET |

This is the interface where users can see the information that gets sent up to our telemetry servers. It's not sensitive data (if it were, we shouldn't be collecting it), but we still shouldn't be exposing it to anybody on the network. This should go behind a login.

| n ⚠️ | n | n | /_status/metrics/{node_id} | GET |

Yet another interface to the metrics. It appears to be the same data as /_status/vars, but translated into json, base64 encoded, then wrapped in another json dict. This seems unlikely to be intentional; is this endpoint used anywhere?

/_status/vars was not in this table, but it should have been. Anyway, as for the security implications, we already want metrics to be publicly visible (at least by default. Per #50126 maybe there should be a setting to require a login for metrics?) for consumption by prometheus and similar systems. So this endpoint is fine but it would be nice to cut down on the redundancy and weird formatting.

| n ⚠️ | n | n | /_status/nodes/{node_id} | GET |
| n ⚠️ | n | n | /_status/nodes | GET |

Metrics and command-line flags for all nodes. The command-line flags at least should probably be behind a login.

| n ⚠️ | n | n | /_status/problemranges | GET |

I think this one is exposed for debugging practicality, so it can be accessed even when the cluster is having trouble. It's a safe one to expose as far as our debugging interfaces go (only exposes range IDs), although exposing this page on its own doesn't seem like the most useful policy.

| n | n | n | /login | POST |
| n | n | n | /logout | GET |

These are special cases; login is by definition not protected by the same auth as the other pages. Similarly, logout does nothing unless there is a valid session.

| n [1] | n | n | /_admin/v1/databases | GET |
| n [1] | n | n | /_admin/v1/jobs | GET |
| n [1] | n | n | /_admin/v1/queryplan | GET |
| n [1] | n | y [2] 😓 | /_admin/v1/locations | GET |
| n [1] | n | y [2] | /_admin/v1/users | GET |

Personally I'd mark these as y [1] instead of n [1] - the auth used is not quite our standard pattern, but it looks valid.

The [2] issue with locations has been fixed: the locations are now visible to any logged-in user. (The ease with which this could be changed is honestly probably the worst part of the [1] auth pattern).

The [2] issue with users is that while the endpoint works for non-admin users, they see only themselves. This seems reasonable and I wouldn't change anything here.

| n [4] | n | n | /_admin/v1/uidata | GET |
| n [4] | n | n | /_admin/v1/uidata | POST |

Again, these endpoints are authenticated, albeit in a nonstandard way, so I'd mark them y [4] (and I agree with the footnote that this interface could really use a redesign).

So to summarize: diagnostics definitely belong behind a login (admin-only, I think). I'd prefer to put all of /_status and /admin/liveness behind a login although I'd be willing to make exceptions for specific use cases. /admin/cluster should probably have everything but the cluster ID removed and perhaps moved behind a login.

knz · 2020-11-24T22:01:28Z

Thanks ben for the review. That's amazing insight.

Cc @itsbilal - not necessarily for you to take action on, but I think you can take the knowledge.

github-actions · 2023-09-12T11:11:21Z

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

knz added C-investigation Further steps needed to qualify. C-label will change. A-webui-security A-security labels Jun 22, 2020

knz added this to To do in DB Server & Security Jun 22, 2020

knz mentioned this issue Jun 22, 2020

server: clarify the ordering of authn/authn #50494

Merged

knz moved this from To do to Linked issues (see roadmap) in DB Server & Security Jul 15, 2020

knz mentioned this issue Jul 30, 2020

/_status/vars is world readable #50126

Closed

bdarnell mentioned this issue Nov 24, 2020

server: Remove remote_debugging setting #57072

Closed

jlinder added the T-server-and-security DB Server & Security label Jun 16, 2021

knz added A-server-architecture Relates to the internal APIs and src org for server code A-authentication Pertains to authn subsystems labels Jul 29, 2021

github-actions bot added the no-issue-activity label Sep 12, 2023

knz added T-observability-inf and removed T-server-and-security DB Server & Security no-issue-activity labels Sep 12, 2023

blathers-crl bot added the A-observability-inf label Sep 12, 2023

exalate-issue-sync bot added T-observability and removed T-observability-inf labels Mar 21, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

server: simplify/consolidate API authn/authz #50472

server: simplify/consolidate API authn/authz #50472

knz commented Jun 22, 2020 •

edited by cockroach-jira-scripts

knz commented Jun 22, 2020

knz commented Jun 22, 2020

knz commented Jun 24, 2020

bdarnell commented Nov 4, 2020

bdarnell commented Nov 24, 2020

knz commented Nov 24, 2020

github-actions bot commented Sep 12, 2023

server: simplify/consolidate API authn/authz #50472

server: simplify/consolidate API authn/authz #50472

Comments

knz commented Jun 22, 2020 • edited by cockroach-jira-scripts

knz commented Jun 22, 2020

knz commented Jun 22, 2020

knz commented Jun 24, 2020

bdarnell commented Nov 4, 2020

bdarnell commented Nov 24, 2020

knz commented Nov 24, 2020

github-actions bot commented Sep 12, 2023

knz commented Jun 22, 2020 •

edited by cockroach-jira-scripts