Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pgwire: accept non-TLS client conns safely in secure mode #53991

Merged
merged 2 commits into from
Sep 10, 2020

Conversation

knz
Copy link
Contributor

@knz knz commented Sep 5, 2020

Fixes #44842.
Informs cockroachdb/helm-charts#228.
Informs #53404.

This change makes it possible for a DBA / system administrator to
reconfigure individual nodes in a secure cluster to accept SQL
client sessions over TCP without mandating a TLS
handshake. Authentication remains mandatory as per the HBA rules.

Motivation: we have at least two high-profile customers who keep their
nodes and client apps in a private secure network (with network-level
encryption / privacy) and who experience client-side TLS as
unnecessary and expensive friction.

Additionally, this feature is a prerequisite to upgrade an insecure
cluster to secure mode without downtime.

Why this does not impair security:

  • authentication remains mandatory (as per the HBA rules -- 1 2).
  • the feature is opt-in: the operator must set a command-line flag
    (--accept-sql-without-tls), which is not enabled by default.
  • there is an interlock: the user must both set up the flag
    and set log-in passwords for their SQL users (by default,
    users get created without a password and thus cannot log
    in without client certs).
  • for now, node-node connections still require TLS.

For context, the default HBA configuration is the following:

host  all root all cert-password # fixed rule
host  all all  all cert-password # built-in CockroachDB default
local all all      password      # built-in CockroachDB default

The directive host covers both TLS and non-TLS incoming TCP
connections (local is for the unix socket). The method
cert-password means "client cert or password": without a cert, the
password is mandatory.

As previously, the user can further secure the configuration by
restricting non-TLS connections to just a subnetwork, for example:

host  all all 10.0.0.0/8 password # accept conns on the 10/8 network.
host  all all all        reject   # refuse conns from other nets.
local all all            password

Note that this change is limited to the server side: CockroachDB's own
cockroach CLI commands do not yet know how to connect to a
CockroachDB server without TLS; such connections are only supported
from psql or SQL client drivers in apps. See #53994 for a follow-up.

Release justification: fixes for high-priority or high-severity bugs in existing functionality

@knz knz requested a review from a team as a code owner September 5, 2020 09:24
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@knz
Copy link
Contributor Author

knz commented Sep 5, 2020

@bdarnell @petermattis I'd like to petition for this PR to be included in the 20.2 release, even though it is coming late in the cycle. This had been tracked before, but we have only learned in the past few days how important this feature is to our customers and why the security impact is minimal (as described in the commit message). Moreover, the code change is minimal (1 conditional) and the feature is opt-in.

@knz knz force-pushed the 20200905-sql-no-tls branch 2 times, most recently from b8a0724 to c6bfd16 Compare September 5, 2020 09:31
@knz
Copy link
Contributor Author

knz commented Sep 5, 2020

cc @thtruo for tracking

Copy link
Contributor

@irfansharif irfansharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code changes LGTM. Thanks for the detailed write up in your commit message, it was instructive!

Reviewed 1 of 1 files at r1, 12 of 12 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aaron-crl and @knz)


pkg/sql/pgwire/testdata/auth/secure_non_tls, line 18 at r2 (raw file):

----

# Since root and testuser do not have a password, they stil

typo: still.


pkg/sql/pgwire/testdata/auth/secure_non_tls, line 49 at r2 (raw file):

# Active authentication configuration on this node:
# Original configuration:
# host  all root all cert-password # CockroachDB mandatory rule

I think the column alignment here is a bit off.

@aaron-crl
Copy link

aaron-crl commented Sep 8, 2020

Code and feature :lgtm:.

Copy link
Contributor

@bdarnell bdarnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To introduce this feature into an existing cluster, proceed as
follows:

Using this feature on a new cluster is awkward (assuming you want to use the hostnossl directive to restrict non-TLS access to a certain network). You have to start the cluster without the command-line flag, connect with TLS to set the new hba.conf, then restart the cluster with the flag.

Instead, if the default hba.conf were more restrictive, say by defaulting to localhost-only, the process would be to start the cluster with the flag, then connect from localhost without TLS to set a less-restrictive hba.conf. (but if you're unable to easily connect from localhost, which may be an issue in some orchestrated environments, you may be back to the more complicated procedure)

On the other hand, are these hba.conf settings even necessary? A user who is comfortable with this flag is presumably practicing some sort of perimeter-based security, in which case I would expect that only the trusted part of the network is able to route packets to the database (and even if untrusted machines were able to talk to the database, they'd still need a valid password to get in). Unless we have specific user requests for this capability, I might leave the hba.conf out.

Reviewed 11 of 12 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @aaron-crl and @knz)


pkg/cli/flags.go, line 350 at r2 (raw file):

		varFlag(f, addrSetter{&serverHTTPAddr, &serverHTTPPort}, cliflags.ListenHTTPAddr)
		stringFlag(f, &serverSocketDir, cliflags.SocketDir)
		boolFlag(f, &startCtx.unencryptedLocalhostHTTP, cliflags.UnencryptedLocalhostHTTP)

It's unfortunate that --unencrypted-localhost-http combines non-TLS access with restricting the port to the loopback interface. It would be nice to split that up some day (but not today - I think there are a number of cleanups we could make to the network configuration flags)


pkg/cli/cliflags/flags.go, line 540 at r2 (raw file):

	}

	AllowSecureSQLWithoutTLS = FlagInfo{

It always smells bad to me when the FlagInfo variable name and the flag itself don't match. Consider removing the word Secure here.


pkg/cli/cliflags/flags.go, line 541 at r2 (raw file):

	AllowSecureSQLWithoutTLS = FlagInfo{
		Name: "accept-sql-without-tls",

In the counterpart PR #53994, the flag name makes it clear that passwords will be sent unencrypted. Should we do more to make that clear with the server-side flag?


pkg/sql/pgwire/auth_test.go, line 56 at r2 (raw file):

//       the test file is applicable to both.)
//
// allow_non_tls

nit: Consider s/non/without/ for consistency with the other variables.


pkg/sql/pgwire/hba_conf.go, line 153 at r2 (raw file):

				)
			}
		case hba.ConnHostSSL, hba.ConnHostNoSSL:

Maybe I'm just unfamiliar with this code, but I don't see where the hostssl and hostnossl directives are actually used.


pkg/sql/pgwire/testdata/auth/secure_non_tls, line 35 at r2 (raw file):

ok

# Now testuser can log in.

Add a test to verify that testuser can't log in with an invalid password.

Copy link
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Ben I was also waiting for you to chime in before moving forward with this.

Using this feature on a new cluster is awkward (assuming you want to use the hostnossl directive to restrict non-TLS access to a certain network). You have to start the cluster without the command-line flag, connect with TLS to set the new hba.conf, then restart the cluster with the flag.

Good point. You're right but I believe, based on the next paragraph, that you were accidentally right. I'll clarify in the release note. This would be improved if we ever solve #26722 .

Instead, if the default hba.conf were more restrictive, say by defaulting to localhost-only, the process would be to start the cluster with the flag, then connect from localhost without TLS to set a less-restrictive hba.conf.

No that's not true - even if the SQL client can establish the connection, root still cannot log in: without a valid TLS client cert, root would need a password, and there is no password by default. We'd need a trust rule in the HBA config to log in without a password, but I'm feeling icky about that.

Which means that above you probably didn't meanm "you need to start the cluster without the flag to connect". In fact, you can connect with the flag just fine using the unix datagram socket and the default HBA config. However, you cannot log in as long as the password is not set.

This is why #26722 is so important: not just to customize the HBA conf, but also to set a root password.

A user who is comfortable with this flag is presumably practicing some sort of perimeter-based security, in which case I would expect that only the trusted part of the network is able to route packets to the database (and even if untrusted machines were able to talk to the database, they'd still need a valid password to get in).

This was motivated by the general principle of defense in depth.

Unless we have specific user requests for this capability, I might leave the hba.conf out.

I can leave it out of the release note, but I think the code is more natural to read this way (and closer to pg).

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @aaron-crl, @bdarnell, and @knz)


pkg/cli/flags.go, line 350 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

It's unfortunate that --unencrypted-localhost-http combines non-TLS access with restricting the port to the loopback interface. It would be nice to split that up some day (but not today - I think there are a number of cleanups we could make to the network configuration flags)

Agreed.


pkg/cli/cliflags/flags.go, line 541 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

In the counterpart PR #53994, the flag name makes it clear that passwords will be sent unencrypted. Should we do more to make that clear with the server-side flag?

What do you have in mind?


pkg/sql/pgwire/hba_conf.go, line 153 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Maybe I'm just unfamiliar with this code, but I don't see where the hostssl and hostnossl directives are actually used.

The code to check them was already implemented in 20.1 and used internally in unit tests, but the condition was never encountered in a running server because of the conditional in server.go.

Copy link
Contributor

@bdarnell bdarnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which means that above you probably didn't meanm "you need to start the cluster without the flag to connect".

I didn't mean you have to start the cluster without the flag in order to connect; you have to start the cluster without the flag to avoid temporarily exposing non-TLS SQL sessions on unintended networks. It's true that it's accidentally safe because there are no users with passwords by default, but that's something we need to fix anyway (it's a prerequisite for #51991 cert-free setup). And when we set up a default user with a password, it becomes necessary to use this two-restart dance for new clusters if we accept the requirement that users need to be able to restrict non-TLS connections by IP range.

So that's why I'd prefer to reject this requirement unless we have a more explicit indication that it's important (especially if this is to be a last-minute addition to a release, which I'm not thrilled with. New flags are surface area we have to support for a long time, and if we're doing a wholesale change to the security setup model with #51991 in the next release I'm not sure it's a good idea to add a couple more knobs to the old scheme without time to really validate the interface in this release)

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @aaron-crl, @bdarnell, and @knz)


pkg/cli/cliflags/flags.go, line 541 at r2 (raw file):

Previously, knz (kena) wrote…

What do you have in mind?

I'm not sure; everything I can think of seems very cumbersome. But again coming back to the scram discussion from #53994, it seems likely that in the future we'd want to allow non-TLS connections only if we're able to use scram for authentication. Maybe just make the flag --allow-unencrypted-passwords for both client and server?


pkg/sql/pgwire/hba_conf.go, line 153 at r2 (raw file):

Previously, knz (kena) wrote…

The code to check them was already implemented in 20.1 and used internally in unit tests, but the condition was never encountered in a running server because of the conditional in server.go.

A cluster version check for this seems excessive to me, but OK.

... to refer to issue cockroachdb#53404.

Release justification: non-production code changes

Release note: None
This change makes it possible for a DBA / system administrator to
reconfigure individual nodes *in a secure cluster* to accept SQL
client sessions over TCP without mandating a TLS
handshake. Authentication remains mandatory as per the HBA rules.

Motivation: we have at least two high-profile customers who keep their
nodes and client apps in a private secure network (with network-level
encryption / privacy) and who experience client-side TLS as
unnecessary and expensive friction.

Why this does not impair security:

- authentication remains mandatory (as per the HBA rules).
- the feature is opt-in: the operator must set a command-line flag
  (`--accept-sql-without-tls`), which is not enabled by default.
- there is an interlock: the user must both set up the flag
  and set log-in passwords for their SQL users (by default,
  users get created without a password and thus cannot log
  in without client certs).
- for now, node-node connections still require TLS.

For context, the default HBA configuration is the following:

```
host  all root all cert-password # fixed rule
host  all all  all cert-password # built-in CockroachDB default
local all all      password      # built-in CockroachDB default
```

The directive `host` covers both TLS and non-TLS incoming TCP
connections (`local` is for the unix socket). The method
`cert-password` means "client cert or password": without a cert, the
password is mandatory.

As previously, the user can further secure the configuration by
restricting connections to just a subnetwork, for example:

```
host       all all 10.0.0.0/8 cert-password # accept conns on the 10/8 network.
host       all all all        reject # reject conns from other networks
local      all all            password
```

Note that this change is limited to the server side: CockroachDB's own
`cockroach` CLI commands do not yet know how to connect to a
CockroachDB server without TLS; such connections are only supported
from `psql` or SQL client drivers in apps.

(PostgreSQL's HBA rule types `hostssl` and `hostnossl` are now also
recognized. They operate like in PostgreSQL. However we don't have a
compelling use case for them yet so we don't emphasize them.)

Release justification: fixes for high-priority or high-severity bugs in existing functionality

Release note (security update): A new experimental flag
`--accept-sql-without-tls` has been introduced for `cockroach start`
and `start-single-node`: when specified, a secure node will also
accept secure SQL connections without TLS. When this flag is enabled:

- Node-to-node connections still use TLS: the server must still be
  started with `--certs-dir` and valid TLS cert configuration for nodes.
- Client authentication (spoof protection) and authorization (access control
  and privilege escalation prevention) is performed by CockroachDB
  as usual, subject to the HBA configuration (for authn) and SQL
  privileges (for authz).
- Transport-level security (integrity and confidentiality) for client
  connections must then be provided by the operator outside of
  CockroachDB -- for example, by using a private network or VPN
  dedicated to CockroachDB and its client app(s).
- The flag only applies to the SQL interface. TLS is still required
  for the HTTP endpoint (unless `--unencrypted-localhost-http` is
  passed) and for the RPC endpoint.

To introduce this feature into an existing cluster, proceed as
follows:
1. ensure the cluster ugprade is finalized.
2. set up the HBA configuration to reject `host` connections
   for any network other than the one that has been secured.
3. add the command-line flag and restart the nodes.

Note that even when the flag is supplied, clients can still negotiate
TLS and present a valid TLS certificate to identify themselves (at
least under the default HBA configuration).

Finally, this flag is experimental and its ergonomics will likely
change in a later version.
Copy link
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you have to start the cluster without the flag to avoid temporarily exposing non-TLS SQL sessions on unintended networks

Not exactly. The user can still limit somewhat with --listen-addr= to localhost or something like that.
Also we should really talk about #26722 which would solve both this and facilitate #51991 like you point out.

So that's why I'd prefer to reject this requirement ... not sure it's a good idea to add a couple more knobs

For context we're focusing on the hostnossl and hostssl rule match right? These are pg-compatible directives and the logic for them was already implemented in v20.1. What is new here is that the code path is activated. The user doesn't even have to know about them since host matches both TLS and non-TLS and they could use that already.

So I'll take your hint we dont want to talk about them and removed them from the release note.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aaron-crl, @bdarnell, and @irfansharif)


pkg/cli/cliflags/flags.go, line 540 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

It always smells bad to me when the FlagInfo variable name and the flag itself don't match. Consider removing the word Secure here.

Done.


pkg/cli/cliflags/flags.go, line 541 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

I'm not sure; everything I can think of seems very cumbersome. But again coming back to the scram discussion from #53994, it seems likely that in the future we'd want to allow non-TLS connections only if we're able to use scram for authentication. Maybe just make the flag --allow-unencrypted-passwords for both client and server?

These are two fully unrelated concerns.
SCRAM would work equally well with or without TLS. SCRAM doesn't need a flag.

The flag here is introduced to ensure that a cluster does not get mistaknely started without TLS, unless the user opts in explicitly.


pkg/sql/pgwire/auth_test.go, line 56 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

nit: Consider s/non/without/ for consistency with the other variables.

Done.


pkg/sql/pgwire/hba_conf.go, line 153 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

A cluster version check for this seems excessive to me, but OK.

We discovered last time it is necessary: the other not-yet-upgraded nodes in the cluster would choke if they see a new keyword they don't understand in the HBA config.


pkg/sql/pgwire/testdata/auth/secure_non_tls, line 18 at r2 (raw file):

Previously, irfansharif (irfan sharif) wrote…

typo: still.

Done.


pkg/sql/pgwire/testdata/auth/secure_non_tls, line 35 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Add a test to verify that testuser can't log in with an invalid password.

Done.


pkg/sql/pgwire/testdata/auth/secure_non_tls, line 49 at r2 (raw file):

Previously, irfansharif (irfan sharif) wrote…

I think the column alignment here is a bit off.

this is auto-generated. If there's an improvement to be made it doesn't belong to this PR.

@knz
Copy link
Contributor Author

knz commented Sep 10, 2020

bors r=aaron-crl,irfansharif,bdarnell

@craig
Copy link
Contributor

craig bot commented Sep 10, 2020

Build succeeded:

@craig craig bot merged commit eab6bcb into cockroachdb:master Sep 10, 2020
@knz knz deleted the 20200905-sql-no-tls branch September 10, 2020 14:33
craig bot pushed a commit that referenced this pull request Sep 10, 2020
53842: server: always create a liveness record before starting up r=irfansharif a=irfansharif

Previously it used to be the case that it was possible for a node to be
up and running, and for there to be no corresponding liveness record for
it. This was a very transient situation as liveness records are created
for a given node as soon as it out its first heartbeat. Still, given
that this could take a few seconds, it lent to a lot of complexity in
our handling of node liveness where we had to always anticipate the
possibility of there being no corresponding liveness record for a given
node (and thus creating it if necessary).

Having a liveness record for each node always present is a crucial
building block for long running migrations (#48843). There the intention
is to have the orchestrator process look towards the list of liveness
records for an authoritative view of cluster membership. Previously when
it was possible for an active member of the cluster to not have a
corresponding liveness record (no matter how unlikely or short-lived in
practice), we could not generate such a view.

---

This is an alternative implementation for #53805. Here we choose to
manually write the liveness record for the bootstrapping node when
writing initial cluster data. For all other nodes, we do it on the
server-side of the join RPC. We're also careful to do it in the legacy
codepath when joining a cluster through gossip.

Release note: None


53994: cli: allow SQL commands to use password authn in more cases r=bdarnell,irfansharif,aaron-crl a=knz

First two commits from #53991.

Previously, SQL password authn was only allowed over TLS connections.

With this change, password authn is allowed regardless of whether the
connection uses TLS.

This is implemented by also only asking for a password interactively
the first time that the server complains that pw auth has failed. This
way, no password is ever requested interactively if the server
"trusts" the connection (via HBA rules or `--insecure`).

Release justification: low risk, high benefit changes to existing functionality


54035: sql: emit more tracing events from the stats cache r=RaduBerinde a=RaduBerinde

The stats cache has various "slow" paths (where we need to query the
system table). These are currently only logged if verbosity is high.

This change switches to `VEvent` in most cases, so that these are
visible during tracing (including in statement diagnostics bundles).
This will allow us to diagnose slow planning times, e.g. due to the
stats cache getting full.

Release justification: low-risk change to existing functionality, high
potential benefit for debugging issues.

Release note: None

Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cli,server: enable SQL clients on TCP with non-TLS modes
5 participants