Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sqlproxyccl: minor fixes and enhancements to the proxy handler and denylist #66412

Merged
merged 2 commits into from
Jun 16, 2021

Conversation

jaylim-crl
Copy link
Collaborator

@jaylim-crl jaylim-crl commented Jun 13, 2021

sqlproxyccl: allow denylist entries that do not expire

Previously, we assumed that all denylist entries have an expiration key. When
denylist entries do not specify an expiration key, the entries are marked as
expired right away since their values default to the zero instant time. This
might be cumbersome for operators to specify an expiration when the intention
was to not allow the rule to expire at all. This patch changes the behavior of
the denylist such that entries without any expiration keys represent rules
that do not expire.

sqlproxyccl: minor fixes around the proxy handler

In #65164, we migrated the sqlproxy in the CC code to the DB repository, and
there were a few buglets:

  • sqlproxy crashes when the tenant ID supplied in the connection string is 0
    because roachpb.MakeTenantID panics when the tenant ID is 0.
  • sqlproxy leaks internal parsing errors to the client.

This patch hides internal parsing errors, and replaces them with friendly
user-facing errors (e.g. "Invalid cluster name"). We also add a bounds check
to the parsed tenant ID so that the process does not crash on an invalid
tenant ID. More tests were added as well.

Release note: None

@jaylim-crl jaylim-crl requested a review from a team as a code owner June 13, 2021 05:44
@cockroach-teamcity
Copy link
Member

cockroach-teamcity commented Jun 13, 2021

CLA assistant check
All committers have signed the CLA.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator Author

@jaylim-crl jaylim-crl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @darinpp, and @meridional)


pkg/ccl/sqlproxyccl/proxy_handler.go, line 548 at r2 (raw file):

	}

	parts := strings.SplitN(databaseParam, ".", 2)

The original behavior in the old code splits the database parameter into multiple parts, and errors out if there are more than two parts. I reverted this change to match what we currently have in the CC repository.


pkg/ccl/sqlproxyccl/proxy_handler_test.go, line 452 at r2 (raw file):

}

func TestInsecureDoubleProxy(t *testing.T) {

I don't think we want to support this since the syntax of specifying the cluster names is ambiguous and unclear. At the moment, we're assuming that database names should be in the form of <cluster name>.<database name> if the cluster name is present:

// parseDatabaseParam parses the database parameter from the PG connection
// string, and tries to extract the cluster name if present. The cluster
// name should be embedded in the database parameter using the dot (".")
// delimiter in the form of "<cluster name>.<database name>". This approach
// is safe because dots are not allowed in the database names themselves.

If we were to do this, we need to think about which convention we want to go with and document them (e.g. should we read the cluster name from the left or right? (e.g. for foo.bar.defaultdb, is foo or bar the first cluster?)

Copy link
Collaborator Author

@jaylim-crl jaylim-crl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @darinpp, and @meridional)


pkg/ccl/sqlproxyccl/proxy_handler_test.go, line 735 at r2 (raw file):

}

func TestClusterNameAndTenantFromParams(t *testing.T) {

Note that this was migrated from the CC repository as well.

@andy-kimball
Copy link
Contributor

Does the denylist change make the old lists more backwards-compatible (say for the E2E test cases)?

Copy link
Contributor

@andy-kimball andy-kimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @darinpp, @jaylim-crl, and @meridional)


pkg/ccl/sqlproxyccl/proxy_handler.go, line 202 at r2 (raw file):

	backendStartupMsg, clusterName, tenID, err := clusterNameAndTenantFromParams(msg)
	if err != nil {
		clientErr := newErrorf(codeParamsRoutingFailed, "Invalid cluster name")

What's the harm in letting these errors be public? They look like they contain useful details to help the user debug their connection string.


pkg/ccl/sqlproxyccl/proxy_handler.go, line 215 at r2 (raw file):

	ipAddr, _, err := net.SplitHostPort(conn.RemoteAddr().String())
	if err != nil {
		clientErr := newErrorf(codeParamsRoutingFailed, "Unexpected connection address")

I wonder if we should be capitalizing all errors that are safe for external viewing and lowercasing all that aren't. Right now it seems like it's a mishmash.

Copy link
Contributor

@meridional meridional left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @darinpp and @jaylim-crl)


pkg/ccl/sqlproxyccl/denylist/file.go, line 159 at r1 (raw file):

	if ent, ok := dl.mu.entries[entity]; ok &&
		(ent.Expiration.IsZero() || !ent.Expiration.Before(dl.timeSource.Now())) {

I think this change is good for editing the config manually. Though not sure it's gonna be helpful much down the road: we have a GRPC endpoint on console (https://github.com/cockroachlabs/managed-service/blob/master/console/consolepb/admin.proto#L338) (and intrusion) for updating it. This will be replacing the manual way. (And we can't have both ways to coexist because the grpc endpoint will overwrite the manual changes.)

E.g., we can do admin-cli console block-access 123.123.123.123 IP "test" "ye@" 1h to block an IP for 1 hour.

How it works is, the admin-cli sends a GRPC to console endpoint, which in turn calls a corresponding intrusion's endpoint with all current non-expired deny list entries (not just the one passed to the admin-cli command). Intrusion server will then push the new list to sqlproxy. So if we update the configmap by editing it directly, our change will get overwritten by the next admin-cli console block-access call.

Note: the block-access call is not enabled right now: https://github.com/cockroachlabs/managed-service/pull/5442.

Copy link
Collaborator Author

@jaylim-crl jaylim-crl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTRs!

Does the denylist change make the old lists more backwards-compatible (say for the E2E test cases)?

The new denylist change is not backwards-compatible to start with. If sqlproxy is running with denylist entries today, the entries will simply be ignored whenever we move to the new sqlproxy. (Fortunately, I don't think we're using the denylist extensively today, so we're good there). Regardless of this change, the E2E test cases would need to be updated to reflect the new format. This change just makes it easier so we don't have to specify an expiration date when testing.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @darinpp, and @meridional)


pkg/ccl/sqlproxyccl/proxy_handler.go, line 202 at r2 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

What's the harm in letting these errors be public? They look like they contain useful details to help the user debug their connection string.

Hm, I looked at clusterNameAndTenantFromParams again, and I think you have a point. I was trying to avoid the case in which ParseUint gets exposed, i.e.

FATAL: codeParamsRoutingFailed: cannot parse 3dd as uint64: strconv.ParseUint: parsing "3dd": invalid syntax
FATAL: codeParamsRoutingFailed: cannot parse 300000000000000000000000000000 as uint64: strconv.ParseUint: parsing "300000000000000000000000000000": value out of range

This doesn't seem safe from a security standpoint since one could infer how the parsing works by looking at the error messages. I'm going to change clusterNameAndTenantFromParams to return errors in the form of a codeError, and get the caller to check. If it's not a user-facing error, we'll just return an "invalid cluster name" error. Would that be better?


pkg/ccl/sqlproxyccl/proxy_handler.go, line 215 at r2 (raw file):

Right now it seems like it's a mishmash.

This is really true. It's also really hard to tell when a function will return a user-facing error since we have multiple levels. I think the ideal situation would be to return user-facing errors only on the top level (i.e. proxy_handler), and everything else should return regular errors. I'm +1 on the idea that was proposed as well.


pkg/ccl/sqlproxyccl/denylist/file.go, line 159 at r1 (raw file):

Previously, meridional (Ye Ji) wrote…

I think this change is good for editing the config manually. Though not sure it's gonna be helpful much down the road: we have a GRPC endpoint on console (https://github.com/cockroachlabs/managed-service/blob/master/console/consolepb/admin.proto#L338) (and intrusion) for updating it. This will be replacing the manual way. (And we can't have both ways to coexist because the grpc endpoint will overwrite the manual changes.)

E.g., we can do admin-cli console block-access 123.123.123.123 IP "test" "ye@" 1h to block an IP for 1 hour.

How it works is, the admin-cli sends a GRPC to console endpoint, which in turn calls a corresponding intrusion's endpoint with all current non-expired deny list entries (not just the one passed to the admin-cli command). Intrusion server will then push the new list to sqlproxy. So if we update the configmap by editing it directly, our change will get overwritten by the next admin-cli console block-access call.

Note: the block-access call is not enabled right now: https://github.com/cockroachlabs/managed-service/pull/5442.

Thanks for the clarifications. The sqlproxy will eventually come with cockroach-operator, which shouldn't depend on anything we have in intrusion. If we want to make expiration a valid field, I think we should pass the configmap into some kind of validator when the denylist reads the new file. Every entry should conform to specific rules (e.g. reason specified, IP in the right format, necessary keys, etc.). If the validator fails, we shouldn't reload the denylist map. At the moment, if keys are missing or specified incorrectly, things might not work as expected since the parsing would just ignore them, and default values to whatever the type is.

Also, what if one wants to block access to an IP or tenant forever? (though the latter case would probably be rare). I think this change could be helpful if we don't want to put an expiration on the denylist entry.

What are your thoughts here?

Copy link
Contributor

@meridional meridional left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @darinpp, and @jaylim-crl)


pkg/ccl/sqlproxyccl/denylist/file.go, line 159 at r1 (raw file):
I'm fine with the change. I was mostly just wanting to point out that we might not be manually editing the config in CC, eventually.

If the validator fails, we shouldn't reload the denylist map. At the moment, if keys are missing or specified incorrectly, things might not work as expected since the parsing would just ignore them, and default values to whatever the type is.
Yeah, I agree. More validation should be done.

Also, what if one wants to block access to an IP or tenant forever? (though the latter case would probably be rare). I think this change could be helpful if we don't want to put an expiration on the denylist entry.
Yeah, I agree.

Copy link
Contributor

@meridional meridional left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @darinpp, and @jaylim-crl)


pkg/ccl/sqlproxyccl/denylist/file.go, line 159 at r1 (raw file):

Previously, meridional (Ye Ji) wrote…

I'm fine with the change. I was mostly just wanting to point out that we might not be manually editing the config in CC, eventually.

If the validator fails, we shouldn't reload the denylist map. At the moment, if keys are missing or specified incorrectly, things might not work as expected since the parsing would just ignore them, and default values to whatever the type is.
Yeah, I agree. More validation should be done.

Also, what if one wants to block access to an IP or tenant forever? (though the latter case would probably be rare). I think this change could be helpful if we don't want to put an expiration on the denylist entry.
Yeah, I agree.

Hm. I somehow messed up the formatting..

I agree that we should have more validation on the config format. And I agree that this is change for the better.

Copy link
Contributor

@darinpp darinpp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

Copy link
Contributor

@andy-kimball andy-kimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jaylim-crl)


pkg/ccl/sqlproxyccl/proxy_handler.go, line 202 at r2 (raw file):

Previously, jaylim-crl (Jay Lim) wrote…

Hm, I looked at clusterNameAndTenantFromParams again, and I think you have a point. I was trying to avoid the case in which ParseUint gets exposed, i.e.

FATAL: codeParamsRoutingFailed: cannot parse 3dd as uint64: strconv.ParseUint: parsing "3dd": invalid syntax
FATAL: codeParamsRoutingFailed: cannot parse 300000000000000000000000000000 as uint64: strconv.ParseUint: parsing "300000000000000000000000000000": value out of range

This doesn't seem safe from a security standpoint since one could infer how the parsing works by looking at the error messages. I'm going to change clusterNameAndTenantFromParams to return errors in the form of a codeError, and get the caller to check. If it's not a user-facing error, we'll just return an "invalid cluster name" error. Would that be better?

Yes, I'd only use the codes when the error is known to be safe, and then in updateMetricsAndSendErrToClient I'd map any non-code error to some kind of generic error. That said, I'd insert the cluster name into the error message so the user knows what exactly was invalid.


pkg/ccl/sqlproxyccl/proxy_handler.go, line 215 at r2 (raw file):

Previously, jaylim-crl (Jay Lim) wrote…

Right now it seems like it's a mishmash.

This is really true. It's also really hard to tell when a function will return a user-facing error since we have multiple levels. I think the ideal situation would be to return user-facing errors only on the top level (i.e. proxy_handler), and everything else should return regular errors. I'm +1 on the idea that was proposed as well.

I think it might be a pain to use the top-level idea, since it's very common to extract out functions, move stuff around, etc. Instead, as I suggested above, I'd use codes to mean "user-safe" and then do a final mapping of anything not user-safe just in case we miss an error somewhere.

Copy link
Collaborator Author

@jaylim-crl jaylim-crl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTRs!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball)


pkg/ccl/sqlproxyccl/proxy_handler.go, line 202 at r2 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

Yes, I'd only use the codes when the error is known to be safe, and then in updateMetricsAndSendErrToClient I'd map any non-code error to some kind of generic error. That said, I'd insert the cluster name into the error message so the user knows what exactly was invalid.

Done.


pkg/ccl/sqlproxyccl/proxy_handler.go, line 215 at r2 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

I think it might be a pain to use the top-level idea, since it's very common to extract out functions, move stuff around, etc. Instead, as I suggested above, I'd use codes to mean "user-safe" and then do a final mapping of anything not user-safe just in case we miss an error somewhere.

Hm, good point. Done. I made all the errors lowercase instead since we also return errors in a similar fashion from the DB, e.g.

ERROR: password authentication failed for user jay
Failed running "sql"

pkg/ccl/sqlproxyccl/denylist/file.go, line 159 at r1 (raw file):

Previously, meridional (Ye Ji) wrote…

Hm. I somehow messed up the formatting..

I agree that we should have more validation on the config format. And I agree that this is change for the better.

Sounds good. Thanks for pointing that out. For now, I'll leave this patch as-is (i.e. making expiration an optional field). We can update this again once we start adding validators to the denylist.

Previously, we assumed that all denylist entries have an expiration key. When
denylist entries do not specify an expiration key, the entries are marked as
expired right away since their values default to the zero instant time. This
might be cumbersome for operators to specify an expiration when the intention
was to not allow the rule to expire at all. This patch changes the behavior of
the denylist such that entries without any expiration keys represent rules
that do not expire.

Release note: None
In cockroachdb#65164, we migrated the sqlproxy in the CC code to the DB repository, and
there were a few buglets:
- sqlproxy crashes when the tenant ID supplied in the connection string is 0
  because roachpb.MakeTenantID panics when the tenant ID is 0.
- sqlproxy leaks internal parsing errors to the client.

This patch hides internal parsing errors, and replaces them with friendly
user-facing errors (e.g. "invalid cluster name"). We also add a bounds check
to the parsed tenant ID so that the process does not crash on an invalid
tenant ID. More tests were added as well.

Release note: None
Copy link
Contributor

@andy-kimball andy-kimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jaylim-crl)

@jaylim-crl
Copy link
Collaborator Author

bors r+

@craig
Copy link
Contributor

craig bot commented Jun 16, 2021

Build succeeded:

@craig craig bot merged commit 0ba1894 into cockroachdb:master Jun 16, 2021
@jaylim-crl
Copy link
Collaborator Author

TFTRs!

@jaylim-crl jaylim-crl deleted the sqlproxy-fixes branch June 16, 2021 02:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants