Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server: properly authenticate some admin RPC requests #42563

Merged
merged 3 commits into from
Nov 25, 2019

Conversation

knz
Copy link
Contributor

@knz knz commented Nov 19, 2019

Needed for #42520.
Fixes the vulnerability described in #42567 (but does not durably prevent it from creeping back, so more work is needed to fully close the issue).

The the individual commits for details.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@knz
Copy link
Contributor Author

knz commented Nov 19, 2019

@rolandcrosby @bdarnell do you know if the current/previous behavior was intentional? If not, perhaps this may warrant a security announcement?

@rolandcrosby
Copy link

@piyush-singh can you speak to the intended admin UI behavior and whether this was by design or not?

@knz
Copy link
Contributor Author

knz commented Nov 19, 2019

Filed a separate issue #42567 to describe the problem.

Copy link
Contributor

@mberhault mberhault left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did anything stop working on the admin UI? List of databases/tables/users? Some of those are unlikely to work as non-admin users.

SELECT crdb_internal.is_admin()
----
false

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you grant admin role membership to testuser and try again? It should now be true. Or would that need to be done in the CCL logictests?

@@ -3284,6 +3284,31 @@ may increase either contention or retry errors, or both.`,
},
),

// Returns true iff the current user has admin role.
// Note: it would be a privacy leak to extend this to check arbitrary usernames.
"crdb_internal.is_admin": makeBuiltin(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice builtin!

Copy link
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did anything stop working on the admin UI? List of databases/tables/users? Some of those are unlikely to work as non-admin users.

I checked manually: the databases/tables/users list now only display what a user would be able to see in SQL otherwise (intended feature). The statements page is empty (this is a admin-only feature, so intended behavior).

The jobs page reports an error, but again I think that's OK since listing all jobs is also admin-only.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell, @knz, and @mberhault)


pkg/sql/logictest/testdata/logic_test/crdb_internal, line 616 at r1 (raw file):

Previously, mberhault (marc) wrote…

Can you grant admin role membership to testuser and try again? It should now be true. Or would that need to be done in the CCL logictests?

Done (in ccllogictest).

@knz
Copy link
Contributor Author

knz commented Nov 19, 2019

Discussed offline: this PR is not a complete fix for #42567 but it's moving the needle forward and also it's a prerequisite to #42520 which I'd also like to get off my plate.

So I'd like to motion to merge this even though it's not a complete fix yet.

@knz knz changed the title server: properly authenticate admin RPC requests server: properly authenticate some admin RPC requests Nov 19, 2019
@piyush-singh
Copy link

This was mostly just unimplemented on our end. We intended to add Authz checks after the addition of Authn in the 2.1 release. Requiring Authn was already a strict improvement as previously the HTTP endpoints were completely unsecured for the admin UI in secure clusters. However before we were able to actually make progress on adding Authz checks, the team was dissolved. Thank you @knz for picking this back up, I should probably have discussed this as a security issue more publicly after the team was dissolved.

Copy link
Contributor

@mberhault mberhault left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell and @knz)

Copy link
Contributor

@bdarnell bdarnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this PR is not a complete fix for #42567

What's left?

Reviewed 18 of 18 files at r1, 2 of 2 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz)

@knz
Copy link
Contributor Author

knz commented Nov 22, 2019

What's left?

The code currently works by extracting the authenticated username and passing it to the SQL queries, and letting SQL validate privileges.

There are two problems with this:

  • some SQL-based endpoints work with a non-root username but we may not want them to, because they are cluster-wide expensive (risk of DoS) or they contain a non-SQL part which can reveal sensitive information. An example of this is DataDistribution.

  • Not all endpoints use SQL in the first place. All those that don't are currently still unauthenticated and function as root.

The following endpoints remain problematic in either of these two ways:

  • Liveness, data leak, it can reveal IP addresses of nodes
  • Drain - DoS
  • DecommissionStatus - data leak
  • Decommission - DoS
  • DataDistribution - DoS and data leak
  • EnqueueRange - data leak: you give it any range ID and it gives you back a detailed trace including key strings full of PII
  • any additional admin RPC added in the future that does not use SQL, this PR does not add any guardrail to ensure the future engineer does the right thing.

Also see my additional comment on the issue.

Prior to this patch, all the admin RPCs would perform
whatever SQL operations they needed to do using the root user.
See issue cockroachdb#42567.

Moreover, because of this the "UI data" payload containing all the UI
configuration settings was shared between all users.

This was possibly a security flaw, as it could enable any
authenticated user (eg via a certificate on the CLI, or via the web
UI) to access SQL data otherwise only available to root.

This patch fixes **part of** issue cockroachdb#42567 by
authorizing the SQL execution of admin RPC endpoints properly.

It also introduces a new SQL built-in function
`crdb_internal.is_admin()` used internally by the admin user.

**The following problem from cockroachdb#42567 is not fixed: RPCs that do not use
SQL are still allowed to proceed as root regardless of the
authenticated user.** This will be addressed by a subsequent commit.

Release note (admin ui change): Certain web UI pages (like the list of
databases, tables) now restrict their content to match the privileges
of the logged-in user.

Release note (admin ui change): The event log now presents all cluster
settings changes, unredacted, when an admin user uses the page.

Release note (admin ui change): Customization of the UI by
users is only properly saved if the user has write privilege
to `system.ui` (i.e. admin users). Also, all authenticated
users share the same customizations. This is a known limitation
and should be lifted in a future version.
The Drain() (`cockroach quit`) RPC is used over gRPC by `cockroach
quit`. There is no requirement to have it exposed via HTTP, and
moreover it is a security problem (DoS) waiting to happen
since the request is not fully authorized.

Release note: none
@knz
Copy link
Contributor Author

knz commented Nov 22, 2019

I have investigated all the remaining RPCs, closed the remaining gaps, documented the UX changes in release notes and left TODOs for the remaining work. So this means the PR will remove the "vulnerability" part of #42567.

I still think the issue should remain open until we have a better mechanism (common check with whitelist) and more tests! But at least we can get this through for the release.

Previously, admin/status RPCs available over HTTP but that did not
perform SQL operations would not verify authorization from
clients (the only authorization was that performed by SQL privilege
checks).

This patch fixes it.

The following services are now blocked from non-admin users:

- `TableStats`. This is a temporary limitation, until this RPC learns
  how to verify privileges on the specific table for which stats are
  requested.
- `NonTableStats`
- `DataDistribution`
- `EnqueueRange`
- `RangeLog`
- `Gossip`
- `EngineStats`
- `Allocator`
- `AllocatorRange`
- `Certificates`
- `Details`
- `GetFiles`
- `LogFilesList`
- `LogFile`
- `Logs`
- `Stacks`
- `Profile`
- `RaftDebug`
- `Ranges`
- `HotRanges`
- `Range`
- `SpanStats`
- `Stores`

For reference, the following services were already properly authorized
by SQL privilege checks:

- `Databases`
- `DatabaseDetails`
- `TableDetails`
- `Users`
- `Events`
- `Jobs`
- `Locations`
- `QueryPlan`

The following were already authorized using a custom check:

- `ListSessions` (user must be either admin or can only list their own
  sessions)
- `ListLocalSessions` (ditto)

The following are now authorized using a custom check:

- `CancelSession` (user must be either admin or operate on their own sessions)
- `CancelQuery` (user must be either admin or operate on their own queries)

The following RPCs are not authorized but do not leak sensitive
information:

- `Health`
- `Liveness`
- `Settings`

Release note (admin ui change): Access to table statistics are
temporary blocked from access by non-admin users until further notice,
for security reasons.

Release note (admin ui change): Certain debug pages have been blocked
from non-admin users for security reasons.
@craig
Copy link
Contributor

craig bot commented Nov 25, 2019

Build failed

@knz
Copy link
Contributor Author

knz commented Nov 25, 2019

Transient upstream pip error in ORM tests

bors r+

craig bot pushed a commit that referenced this pull request Nov 25, 2019
42563: server: properly authenticate some admin RPC requests r=knz a=knz

Needed for #42520.
Fixes the vulnerability described in #42567 (but does not durably prevent it from creeping back, so more work is needed to fully close the issue).

The the individual commits for details.


Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
@knz
Copy link
Contributor Author

knz commented Nov 25, 2019

@bdarnell it's quite a hassle to do port this to 2.1, essentially a lot of the pieces have to be re-done from scratch due to arch changes in the 2.1->19.1 transition. Should I still look into it?

@craig
Copy link
Contributor

craig bot commented Nov 25, 2019

Build succeeded

@craig craig bot merged commit 04c947d into cockroachdb:master Nov 25, 2019
@knz knz deleted the 20191119-admin-server branch November 25, 2019 12:44
craig bot pushed a commit that referenced this pull request Nov 25, 2019
42520: settings: classify public vs non-public cluster settings r=knz a=knz

First commit from #42563.

This PR makes three user-visible changes recommended/requested by @johnrk: 

- it annotates all the cluster settings into broad public (= documented+supported) and non-public (= non-documented/non-supported) categories.
- the categories are now displayed alongside the settings in the output of `SHOW ALL CLUSTER SETTINGS`
- a new statement `SHOW CLUSTER SETTINGS` is introduced that lists only the public settings.

By default during implementation, a setting starts as non-public.

New settings should only be promoted to the public category after the team has built confidence in a broad understanding of all its possible effects, and after starting a doc project to explain the setting.

In this PR, I have manually opted in the following settings to become public:

```
                          variable                         |                    desc
+----------------------------------------------------------+---------------------------------------------+
  cloudstorage.gs.default.key                              | if set, JSON key to use during Google Cl...
  cloudstorage.http.custom_ca                              | custom root CA (appended to system's def...
  cloudstorage.timeout                                     | the timeout for import/export storage op...
  cluster.organization                                     | organization name...
  cluster.preserve_downgrade_option                        | disable (automatic or manual) cluster ve...
  debug.panic_on_failed_assertions                         | panic when an assertion fails rather tha...
  diagnostics.forced_stat_reset.interval                   | interval after which pending diagnostics...
  diagnostics.reporting.enabled                            | enable reporting diagnostic metrics to c...
  diagnostics.reporting.interval                           | interval at which diagnostics data shoul...
  external.graphite.endpoint                               | if nonempty, push server metrics to the ...
  external.graphite.interval                               | the interval at which metrics are pushed...
  kv.allocator.load_based_lease_rebalancing.enabled        | set to enable rebalancing of range lease...
  kv.allocator.load_based_rebalancing                      | whether to rebalance based on the distri...
  kv.allocator.qps_rebalance_threshold                     | minimum fraction away from the mean a st...
  kv.allocator.range_rebalance_threshold                   | minimum fraction away from the mean a st...
  kv.bulk_io_write.max_rate                                | the rate limit (bytes/sec) to use for wr...
  kv.closed_timestamp.follower_reads_enabled               | allow (all) replicas to serve consistent...
  kv.range_split.by_load_enabled                           | allow automatic splits of ranges based o...
  kv.range_split.load_qps_threshold                        | the QPS over which, the range becomes a ...
  kv.rangefeed.enabled                                     | if set, rangefeed registration is enable...
  kv.replication_reports.interval                          | the frequency for generating the replica...
  kv.snapshot_rebalance.max_rate                           | the rate limit (bytes/sec) to use for re...
  kv.snapshot_recovery.max_rate                            | the rate limit (bytes/sec) to use for re...
  kv.transaction.max_intents_bytes                         | maximum number of bytes used to track wr...
  kv.transaction.max_refresh_spans_bytes                   | maximum number of bytes used to track re...
  server.eventlog.ttl                                      | if nonzero, event log entries older than...
  server.host_based_authentication.configuration           | host-based authentication configuration ...
  server.rangelog.ttl                                      | if nonzero, range log entries older than...
  server.remote_debugging.mode                             | set to enable remote debugging, localhos...
  server.time_until_store_dead                             | the time after which if there is no new ...
  server.web_session_timeout                               | the duration that a newly created web se...
  sql.defaults.default_int_size                            | the size, in bytes, of an INT type...
  sql.defaults.distsql                                     | default distributed SQL execution mode [...
  sql.defaults.results_buffer.size                         | default size of the buffer that accumula...
  sql.defaults.serial_normalization                        | default handling of SERIAL in table defi...
  sql.distsql.max_running_flows                            | maximum number of concurrent flows that ...
  sql.distsql.temp_storage.joins                           | set to true to enable use of disk for di...
  sql.distsql.temp_storage.sorts                           | set to true to enable use of disk for di...
  sql.metrics.statement_details.dump_to_logs               | dump collected statement statistics to n...
  sql.metrics.statement_details.enabled                    | collect per-statement query statistics...
  sql.metrics.statement_details.plan_collection.enabled    | periodically save a logical plan for eac...
  sql.metrics.statement_details.plan_collection.period     | the time until a new logical plan is col...
  sql.metrics.statement_details.threshold                  | minimum execution time to cause statisti...
  sql.metrics.transaction_details.enabled                  | collect per-application transaction stat...
  sql.stats.automatic_collection.enabled                   | automatic statistics collection mode...
  sql.stats.histogram_collection.enabled                   | histogram collection mode...
  sql.stats.post_events.enabled                            | if set, an event is logged for every CRE...
  sql.trace.log_statement_execute                          | set to true to enable logging of execute...
  sql.trace.session_eventlog.enabled                       | set to true to enable session tracing. N...
  sql.trace.txn.enable_threshold                           | duration beyond which all transactions a...
  timeseries.storage.enabled                               | if set, periodic timeseries data is stor...
  timeseries.storage.resolution_10s.ttl                    | the maximum age of time series data stor...
  timeseries.storage.resolution_30m.ttl                    | the maximum age of time series data stor...
  trace.debug.enable                                       | if set, traces for recent requests can b...
  trace.lightstep.token                                    | if set, traces go to Lightstep using thi...
  trace.zipkin.collector                                   | if set, traces go to the given Zipkin in...
  version                                                  | set the active cluster version in the fo...
```

Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
@knz knz added this to To do in DB Server & Security via automation Jan 13, 2020
@knz knz moved this from To do to Done 20.1 in DB Server & Security Jan 13, 2020
knz added a commit to knz/cockroach that referenced this pull request Feb 15, 2020
Summary table:

| Time frame  | R/W of UI customization | different customizations per user |
|-------------|-------------------------|-----------------------------------|
| Pre-cockroachdb#42563  | all users               | broken (shared between users)     |
| Post-cockroachdb#42563 | admin only (regression) | broken                            |
| This commit | all users               | yes - each user has its own       |

The separation of customization between users is done by mixing the
username with the key with a `$` character when looking up rows in
`system.ui`.

This approach can look perhaps baroque when SQL would prefer us to
have two separate *columns* in the PK of `system.ui` and use that for
lookups and updates. Unfortunately PK changes are not yet supported,
and a PK change would incur a complex story in mixed-version clusters.

Additionally, this approach is suitable for backporting in 2.1, 19.1
and 19.2 clusters where the functionality was broken by cockroachdb#42563.

Release note (admin ui change): The display options are now saved
separately for each authenticated user. Note: when upgrading to a
version with this change, all current display customizations for admin
users are lost.

Release note (bug fix): Customizations of the Admin UI are again
properly saved across sessions.
craig bot pushed a commit that referenced this pull request Mar 4, 2020
45127: server: make UI customizations work for non-admin users r=dhartunian a=knz

Fixes  #45126.

cc @piyush-singh  - I am considering this for backporting in 19.1/19.2 since it fixes a regression introduced in a previous patch revision.

Summary table:

| Time frame  | R/W of UI customization | different customizations per user |
|-------------|-------------------------|-----------------------------------|
| Pre-#42563  | all users               | broken (shared between users)     |
| Post-#42563 | admin only (regression) | broken                            |
| This commit | all users               | yes - each user has its own       |

The separation of customization between users is done by mixing the
username with the key with a `$` character when looking up rows in
`system.ui`.

This approach can look perhaps baroque when SQL would prefer us to
have two separate *columns* in the PK of `system.ui` and use that for
lookups and updates. Unfortunately PK changes are not yet supported,
and a PK change would incur a complex story in mixed-version clusters.

Additionally, this approach is suitable for backporting in 2.1, 19.1
and 19.2 clusters where the functionality was broken by #42563.

Release note (admin ui change): The display options are now saved
separately for each authenticated user. Note: when upgrading to a
version with this change, all current display customizations for admin
users are lost.

Release note (bug fix): Customizations of the Admin UI are again
properly saved across sessions.

Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
knz added a commit to knz/cockroach that referenced this pull request Mar 24, 2020
Summary table:

| Time frame  | R/W of UI customization | different customizations per user |
|-------------|-------------------------|-----------------------------------|
| Pre-cockroachdb#42563  | all users               | broken (shared between users)     |
| Post-cockroachdb#42563 | admin only (regression) | broken                            |
| This commit | all users               | yes - each user has its own       |

The separation of customization between users is done by mixing the
username with the key with a `$` character when looking up rows in
`system.ui`.

This approach can look perhaps baroque when SQL would prefer us to
have two separate *columns* in the PK of `system.ui` and use that for
lookups and updates. Unfortunately PK changes are not yet supported,
and a PK change would incur a complex story in mixed-version clusters.

Additionally, this approach is suitable for backporting in 2.1, 19.1
and 19.2 clusters where the functionality was broken by cockroachdb#42563.

Release note (admin ui change): The display options are now saved
separately for each authenticated user. Note: when upgrading to a
version with this change, all current display customizations for admin
users are lost.

Release note (bug fix): Customizations of the Admin UI are again
properly saved across sessions.

Release justification: low risk, high benefit changes to existing
functionality
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants