Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server: simplify/consolidate API authn/authz #50472

Open
knz opened this issue Jun 22, 2020 · 7 comments
Open

server: simplify/consolidate API authn/authz #50472

knz opened this issue Jun 22, 2020 · 7 comments
Labels
A-authentication Pertains to authn subsystems A-observability-inf A-security A-server-architecture Relates to the internal APIs and src org for server code A-webui-security C-investigation Further steps needed to qualify. C-label will change. T-observability

Comments

@knz
Copy link
Contributor

knz commented Jun 22, 2020

I have conducted a review of the authn/authz rules for all the endpoints exposed via HTTP. To my knowledge, this is the first time such an analysis was carried out.

Here are my findings:

Authn? Debug setting? Admin-only Endpoint Method
n n n /_admin/v1/chartcatalog GET
n n n /_admin/v1/cluster GET
n n n /_admin/v1/health, /health GET
n n n /_admin/v1/liveness GET
n n n /_admin/v1/metricmetadata GET
n ⚠️ n n /_status/diagnostics/{node_id} GET
n ⚠️ n n /_status/metrics/{node_id} GET
n ⚠️ n n /_status/nodes/{node_id} GET
n ⚠️ n n /_status/nodes GET
n ⚠️ n n /_status/problemranges GET
n n n /login POST
n n n /logout GET
y️ y y /debug/events GET
y️ y y /debug/hba_conf GET
y️ y y /debug/logspy GET
y️ y y /debug/lsm/%d GET
y️ y y /debug/metrics GET
y️ y y /debug/pprof/ GET
y️ y y /debug/pprof/cmdline GET
y️ y y /debug/pprof/goroutineui/ GET
y️ y y /debug/pprof/profile GET
y️ y y /debug/pprof/symbol GET
y️ y y /debug/pprof/trace GET
y️ y y /debug/pprof/ui/ GET
y️ y y /debug/requests GET
y️ y y /debug/stopper GET
y️ y y /debug/threads GET
y️ y y /debug/vars GET
n [1] n n /_admin/v1/databases GET
n [1] n n /_admin/v1/jobs GET
n [1] n n /_admin/v1/queryplan GET
n [1] n y [2] 😓 /_admin/v1/locations GET
n [1] n y [2] /_admin/v1/users GET
n [4] n n /_admin/v1/uidata GET
n [4] n n /_admin/v1/uidata POST
y n n /_admin/v1/databases/{database}/tables/{table} GET
y n n /_admin/v1/databases/{database} GET
y n n /_admin/v1/events GET
y n n /_admin/v1/settings GET
y n n /_status/cancel_query/{node_id} GET
y n n /_status/cancel_session/{node_id} GET
y n n /_status/sessions GET
y n y /_admin/v1/data_distribution GET
y n y /_admin/v1/databases/{database}/tables/{table}/stats GET
y n y /_admin/v1/nontablestats GET
y n y /_admin/v1/rangelog/{range_id} GET
y n y /_admin/v1/rangelog GET
y n y /_status/certificates/{node_id} GET
y n y /_status/details/{node_id} GET
y n y /_status/enginestats/{node_id} GET
y n y /_status/logfiles/{node_id}/{file} GET
y n y /_status/logs/{node_id} GET
y n y /_status/profile/{node_id} GET
y n y /_status/raft GET
y n y /_status/range/{range_id} GET
y n y /_status/ranges/{node_id} GET
y n y /_status/span POST
y n y /_status/stacks/{node_id} GET
y n y /_status/statements GET
y n y /_status/stores/{node_id} GET
y n y [3] ⚠️ /_status/stmtdiag/{statement_diagnostics_id} GET
y n y [3] ⚠️ /_status/stmtdiagreports GET
y n y [3] ⚠️ /_status/stmtdiagreports POST
y y ❓ n /_status/local_sessions GET
y y y /_admin/v1/enqueue_range POST
y y y /_status/allocator/node/{node_id} GET
y y y /_status/allocator/range/{range_id} GET
y y y /_status/files/{node_id} GET
y y y /_status/gossip/{node_id} GET
y y y /_status/hotranges GET
y y y /_status/job/{job_id} GET
y y y /_status/job_registry/{node_id} GET
y y y /_status/logfiles/{node_id} GET

Notes:

[1] Various endpoints (e.g. _admin/v1/databases, but not
databases/{database}) use a non-standard authentication check, then
delegate the principal they discovered to the SQL engine.
Hopefully, the SQL privilege checks take care of authorization,
although this is more by accident than deliberate design.

[2] the .../locations endpoints uses the non-standard
auth check and then forward the principal to SQL. (see note [1])
For non-admin users, this causes the lookup to fail because
system.locations is a system table.
This is the cause of issue "non-admin user cannot use node map".

[3] The statement diagnostics endpoint require "any admin user"
but then operate in SQL using root specifically. This
would cause invalid entries in audit logs.

[4] The "UI data" endpoints (K/V pair for UI customizations) uses
a non-standard authn/authz protocol, due to backward compatibility.
This is legacy debt that should be refactored, preferably
by adding a new endpoint (for the new version), deprecating the current one,
then removing it in a later version.

Epic: CRDB-1473

Jira issue: CRDB-4118

@knz knz added C-investigation Further steps needed to qualify. C-label will change. A-webui-security A-security labels Jun 22, 2020
@knz
Copy link
Contributor Author

knz commented Jun 22, 2020

cc @dhartunian I thought you might be interested

@knz knz added this to To do in DB Server & Security Jun 22, 2020
@knz
Copy link
Contributor Author

knz commented Jun 22, 2020

xref #50483

craig bot pushed a commit that referenced this issue Jun 23, 2020
50494: server: clarify the ordering of authn/authn r=tbg a=knz

This PR accompanies #50472 and #50483 

This patch ensures that the various API endpoints initialize in the
following order:

1. annotate their `context.Context` so that log messages are properly
   contextualized with the client details.
2. authenticate the user.
3. optionally, check for the cluster setting "remote debugging allowed".

The check for the cluster setting should be performed after
authentication to prevent an unauthenticated attacker from using this
ordering to determine the current value of the cluster setting.

Additionally, it removes the option for non-admin users to retrieve
redacted log entries. Now only admin can use the "get logs" endpoints.

Release note: None

Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
@knz
Copy link
Contributor Author

knz commented Jun 24, 2020

Updated by #50483

@knz knz moved this from To do to Linked issues (see roadmap) in DB Server & Security Jul 15, 2020
@bdarnell
Copy link
Member

bdarnell commented Nov 4, 2020

Let's look at the debug setting (server.remote_debugging.mode). This was introduced before we had proper authentication for the web interfaces, so it used to be the only line of defense for the paths that used it. Now we have authentication and every path that uses the setting also requires admin authentication. Is it necessary to have an extra hurdle for some paths in addition to admin auth? (and if so, is localhost-only the right extra check? Should it be extended to anything else?) If the answer is no (and I expect it is), then we should default the setting to any or remove it altogether.

| y️ | y | y | /debug/logspy | GET |
| y | y | y | /_status/logfiles/{node_id} | GET |

Logs are where I see the best argument for additional restrictions. We do not make them available through SQL or other user-facing interfaces. However, logs are available to admin users on non-localhost addresses via debug zip, so if we wanted to limit log access to localhost we'd need to lock that down too (in which case you'd be unable to get a full-featured debug zip without changing the debug mode setting).

| y️ | y | y | /debug/events | GET |
| y️ | y | y | /debug/requests | GET |

This is log-like information (in an interface inherited from GRPC).

| y️ | y | y | /debug/hba_conf | GET |
| y | y ❓ | n | /_status/local_sessions | GET |
| y | y | y | /_status/job/{job_id} | GET |
| y | y | y | /_status/job_registry/{node_id} | GET |

An admin user already has access to this information through the SQL interface, so a localhost-only restriction is unneeded. What's the question mark on local_sessions for, @knz ?

| y️ | y | y | /debug/metrics | GET |
| y️ | y | y | /debug/vars | GET |

Memory metrics from the go runtime. Not sensitive. (also probably not necessary since there's much more information in /_status/vars)

| y️ | y | y | /debug/lsm/%d | GET |

Pebble LSM visualization. No more sensitive than, say, crdb_internal.ranges

| y️ | y | y | /debug/pprof/ | GET |
| y️ | y | y | /debug/pprof/cmdline | GET |
| y️ | y | y | /debug/pprof/goroutineui/ | GET |
| y️ | y | y | /debug/pprof/profile | GET |
| y️ | y | y | /debug/pprof/symbol | GET |
| y️ | y | y | /debug/pprof/trace | GET |
| y️ | y | y | /debug/pprof/ui/ | GET |
| y️ | y | y | /debug/threads | GET |

Profiles and stack traces, mostly coming from Go by default. Similar to metrics, not senstive.

/pprof/cmdline is an odd one; I don't think it's used by anything and it might be nice to remove it. Command lines also appear in logs so there is no additional data leak here.

| y | y | y | /_status/files/{node_id} | GET |

Accessors for saved profiles (currently limited to heap profiles and goroutine dumps)

| y️ | y | y | /debug/stopper | GET |

I think this is similar to stack traces, although it's possible that there may be sites in the codebase that pass more sensitive data to the stopper.

| y | y | y | /_admin/v1/enqueue_range | POST |

The only thing on this list that's not read-only. But it's just a combination of forcing something that the range scanner would do anyway and returning logs, so not particularly dangerous.

| y | y | y | /_status/allocator/node/{node_id} | GET |
| y | y | y | /_status/allocator/range/{range_id} | GET |

Logs from simulated allocator runs.

| y | y | y | /_status/gossip/{node_id} | GET |

Like logs, gossip is not available through other user-facing interfaces, but it's not intended to include sensitive information and is include in debug zips.

| y | y | y | /_status/hotranges | GET |

Similar to what's in crdb_internal.ranges and system.jobs.

So overall, I don't see any reason to impose additional restrictions on these paths (in addition to requiring SQL admin credentials) and I'd feel OK about removing the remote-debugging setting. (and with making all of this functionality available via SQL interfaces as we've discussed elsewhere).

@bdarnell
Copy link
Member

Next let's look at the unauthenticated endpoints (first column)

| n | n | n | /_admin/v1/chartcatalog | GET |
| n | n | n | /_admin/v1/metricmetadata | GET |

These are essentially static documentation and metadata about the timeseries metrics. Not sensitive (or even cluster-dependent at all that I can see).

| n | n | n | /_admin/v1/cluster | GET |

Cluster ID and a couple other bits of metadata (enterprise and telemetry status). It feels a little strange to have the latter bits world-readable (or exposed here at all). The cluster ID is questionable and I'm not sure what the intended use case is, but it seems comparable to details that servers will happily expose in their TLS certificates pre-auth.

| n | n | n | /_admin/v1/health, /health | GET |

Basic binary health checks. Important for these to be open without auth because they're used by all kinds of monitoring and orchestration systems.

| n | n | n | /_admin/v1/liveness | GET |

Slightly more detailed information about the health of the whole cluster. As with the cluster ID, it's not particularly sensitive data, but I'm not sure what use case would require this to be world-readable.

| n ⚠️ | n | n | /_status/diagnostics/{node_id} | GET |

This is the interface where users can see the information that gets sent up to our telemetry servers. It's not sensitive data (if it were, we shouldn't be collecting it), but we still shouldn't be exposing it to anybody on the network. This should go behind a login.

| n ⚠️ | n | n | /_status/metrics/{node_id} | GET |

Yet another interface to the metrics. It appears to be the same data as /_status/vars, but translated into json, base64 encoded, then wrapped in another json dict. This seems unlikely to be intentional; is this endpoint used anywhere?

/_status/vars was not in this table, but it should have been. Anyway, as for the security implications, we already want metrics to be publicly visible (at least by default. Per #50126 maybe there should be a setting to require a login for metrics?) for consumption by prometheus and similar systems. So this endpoint is fine but it would be nice to cut down on the redundancy and weird formatting.

| n ⚠️ | n | n | /_status/nodes/{node_id} | GET |
| n ⚠️ | n | n | /_status/nodes | GET |

Metrics and command-line flags for all nodes. The command-line flags at least should probably be behind a login.

| n ⚠️ | n | n | /_status/problemranges | GET |

I think this one is exposed for debugging practicality, so it can be accessed even when the cluster is having trouble. It's a safe one to expose as far as our debugging interfaces go (only exposes range IDs), although exposing this page on its own doesn't seem like the most useful policy.

| n | n | n | /login | POST |
| n | n | n | /logout | GET |

These are special cases; login is by definition not protected by the same auth as the other pages. Similarly, logout does nothing unless there is a valid session.

| n [1] | n | n | /_admin/v1/databases | GET |
| n [1] | n | n | /_admin/v1/jobs | GET |
| n [1] | n | n | /_admin/v1/queryplan | GET |
| n [1] | n | y [2] 😓 | /_admin/v1/locations | GET |
| n [1] | n | y [2] | /_admin/v1/users | GET |

Personally I'd mark these as y [1] instead of n [1] - the auth used is not quite our standard pattern, but it looks valid.

The [2] issue with locations has been fixed: the locations are now visible to any logged-in user. (The ease with which this could be changed is honestly probably the worst part of the [1] auth pattern).

The [2] issue with users is that while the endpoint works for non-admin users, they see only themselves. This seems reasonable and I wouldn't change anything here.

| n [4] | n | n | /_admin/v1/uidata | GET |
| n [4] | n | n | /_admin/v1/uidata | POST |

Again, these endpoints are authenticated, albeit in a nonstandard way, so I'd mark them y [4] (and I agree with the footnote that this interface could really use a redesign).

So to summarize: diagnostics definitely belong behind a login (admin-only, I think). I'd prefer to put all of /_status and /admin/liveness behind a login although I'd be willing to make exceptions for specific use cases. /admin/cluster should probably have everything but the cluster ID removed and perhaps moved behind a login.

@knz
Copy link
Contributor Author

knz commented Nov 24, 2020

Thanks ben for the review. That's amazing insight.

Cc @itsbilal - not necessarily for you to take action on, but I think you can take the knowledge.

@jlinder jlinder added the T-server-and-security DB Server & Security label Jun 16, 2021
@knz knz added A-server-architecture Relates to the internal APIs and src org for server code A-authentication Pertains to authn subsystems labels Jul 29, 2021
@github-actions
Copy link

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-authentication Pertains to authn subsystems A-observability-inf A-security A-server-architecture Relates to the internal APIs and src org for server code A-webui-security C-investigation Further steps needed to qualify. C-label will change. T-observability
Projects
DB Server & Security
  
Linked issues (from the roadmap colum...
Development

No branches or pull requests

3 participants