Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: implement fine-grained transaction error redaction #93760

Merged
merged 1 commit into from
Jan 14, 2023

Conversation

dhartunian
Copy link
Collaborator

Previously, transaction errors would be redacted in full since there was no redaction implemented for their internals. This commit updates all the errors defined in pkg/roachpb/errors.proto to implement SafeErrorFormatter in order to be properly redacted.

The tests in place are checking to see that redaction markers do not appear in the logs. They will be in place when Errors contain redacted values such as Keys containing row data.

An additional test was added to the telemetry logging suite to ensure that these errors are surfaced unredacted in telemetry logs as well.

Some message fields in Error protobufs have been updated to use RedactedString to better support fine grained redaction. A simple migration was applied for those fields but further work may be necessary for further improvements.

Epic: CRDB-12732
Resolves: CRDB-14087

Release note (ops change): Transaction errors will contain more detailed information in redacted logs.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@dhartunian dhartunian requested review from knz and a team December 16, 2022 00:04
@dhartunian dhartunian marked this pull request as ready for review December 16, 2022 00:04
@dhartunian dhartunian requested a review from a team as a code owner December 16, 2022 00:04
@dhartunian dhartunian requested a review from a team December 16, 2022 00:04
@dhartunian dhartunian requested a review from a team as a code owner December 16, 2022 00:04
@dhartunian dhartunian force-pushed the improve-sql-error-redaction branch 2 times, most recently from e6fdd68 to f17d15c Compare December 16, 2022 20:41
Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good overall, but we need to be careful on the protobuf definition. This needs more work.

Reviewed 17 of 17 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian)


pkg/roachpb/errors.go line 1 at r1 (raw file):

// Copyright 2014 The Cockroach Authors.

Throughout this file. I wonder if the message method shouldn't be modified to return RedactableString too.

Then you could also implement the Error() method each time as return redact.Sprint(e).StripMarkers() , which indirectly calls your new SafeFormat method. We're also doing this other places.


pkg/roachpb/errors.go line 216 at r1 (raw file):

	if txn := e.GetTxn(); txn != nil {
		p.Printf(": %s", txn)

: %v


pkg/roachpb/errors.go line 413 at r1 (raw file):

func (e *NodeUnavailableError) SafeFormatError(p errors.Printer) (next error) {
	p.Printf(e.message(nil))

That Printf doesn't seem right. e.message is not guaranteed safe. Did you mean Print?


pkg/roachpb/errors.go line 485 at r1 (raw file):

func (e *LeaseRejectedError) SafeFormatError(p errors.Printer) (next error) {
	p.Printf(e.message(nil))

ditto


pkg/roachpb/errors.go line 607 at r1 (raw file):

	mr, err := e.MismatchedRange()
	if err != nil {
		s.Printf("%s", err)

%v


pkg/roachpb/errors.go line 757 at r1 (raw file):

		return s
	}
	return fmt.Sprintf("txn %s %s", pErr.GetTxn(), s)

I'd argue txn %v %s here


pkg/roachpb/errors.go line 760 at r1 (raw file):

}
func (e *TransactionPushError) SafeFormatError(p errors.Printer) (next error) {
	p.Printf("failed to push %s", e.PusheeTxn)

%v


pkg/roachpb/errors.go line 840 at r1 (raw file):

func (e *TransactionStatusError) message(pErr *Error) string {
	return fmt.Sprintf("%s: %s", e.Error(), pErr.GetTxn())

%v: %v


pkg/roachpb/errors.go line 1090 at r1 (raw file):

func (e *OpRequiresTxnError) SafeFormatError(p errors.Printer) (next error) {
	p.Printf(e.message(nil))

ditto safety of message


pkg/roachpb/errors.go line 1110 at r1 (raw file):

func (e *ConditionFailedError) SafeFormatError(p errors.Printer) (next error) {
	p.Printf(e.message(nil))

ditto


pkg/roachpb/errors.go line 1126 at r1 (raw file):

func (e *RaftGroupDeletedError) SafeFormatError(p errors.Printer) (next error) {
	p.Printf(e.message(nil))

ditto


pkg/roachpb/errors.go line 1213 at r1 (raw file):

func (e *StoreNotFoundError) SafeFormatError(p errors.Printer) (next error) {
	p.Printf(e.message(nil))

ditto


pkg/roachpb/errors.go line 1287 at r1 (raw file):

func (e *UnsupportedRequestError) SafeFormatError(p errors.Printer) (next error) {
	p.Printf(e.message(nil))

ditto


pkg/roachpb/errors.go line 1303 at r1 (raw file):

func (e *BatchTimestampBeforeGCError) SafeFormatError(p errors.Printer) (next error) {
	p.Printf(e.message(nil))

ditto


pkg/roachpb/errors.go line 1383 at r1 (raw file):

func (e *MergeInProgressError) SafeFormatError(p errors.Printer) (next error) {
	p.Printf(e.message(nil))

ditto


pkg/roachpb/errors.go line 1410 at r1 (raw file):

func (e *RangeFeedRetryError) SafeFormatError(p errors.Printer) (next error) {
	p.Printf(e.message(nil))

ditto


pkg/roachpb/errors.go line 1459 at r1 (raw file):

func (e *InvalidLeaseError) SafeFormatError(p errors.Printer) (next error) {
	p.Printf(e.message(nil))

ditto


pkg/roachpb/errors.go line 1485 at r1 (raw file):

func (e *OptimisticEvalConflictsError) SafeFormatError(p errors.Printer) (next error) {
	p.Printf(e.message(nil))

ditto


pkg/roachpb/errors.go line 1516 at r1 (raw file):

func (e *MinTimestampBoundUnsatisfiableError) SafeFormatError(p errors.Printer) (next error) {
	p.Printf(e.message(nil))

ditto


pkg/roachpb/errors.proto line 246 at r1 (raw file):

message TransactionRetryError {
  optional TransactionRetryReason reason = 1 [(gogoproto.nullable) = false];
  optional string extra_msg = 2 [(gogoproto.nullable) = false, (gogoproto.customtype) = "github.com/cockroachdb/redact.RedactableString"];

So this is where we need more work.

  1. the field with proto tag 2 needs to continue to be supported (for prev-version servers)
  2. there needs to be a new field with proto tag 3 with the new type.
  3. when creating an error object, both fields should be populated (for the benefit of prev-version servers)
  4. when reading an error object, inspect the new field; if non-empty use that, otherwise use the old one.

pkg/roachpb/errors.proto line 256 at r1 (raw file):

// only monotonically increase.
message TransactionStatusError {
  optional string msg = 1 [(gogoproto.nullable) = false, (gogoproto.customtype) = "github.com/cockroachdb/redact.RedactableString"];

ditto


pkg/roachpb/errors.proto line 452 at r1 (raw file):

message TransactionRetryWithProtoRefreshError {
  // A user-readable message.
  optional string msg = 1 [(gogoproto.nullable) = false, (gogoproto.customtype) = "github.com/cockroachdb/redact.RedactableString"];

ditto

@dhartunian dhartunian requested a review from a team December 17, 2022 00:31
Copy link
Collaborator Author

@dhartunian dhartunian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz)


pkg/roachpb/errors.go line 1 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

Throughout this file. I wonder if the message method shouldn't be modified to return RedactableString too.

Then you could also implement the Error() method each time as return redact.Sprint(e).StripMarkers() , which indirectly calls your new SafeFormat method. We're also doing this other places.

I removed the message method. Couldn't find any usages.


pkg/roachpb/errors.go line 216 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

: %v

done.


pkg/roachpb/errors.go line 413 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

That Printf doesn't seem right. e.message is not guaranteed safe. Did you mean Print?

In the cases where I used .Printf it was when the message was already safe, that't not obvious in the diff. However, now that I'm removing message. The usage should be clear.


pkg/roachpb/errors.go line 485 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

ditto

Done.


pkg/roachpb/errors.go line 607 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

%v

Done.


pkg/roachpb/errors.go line 757 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

I'd argue txn %v %s here

removed.


pkg/roachpb/errors.go line 760 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

%v

Done.


pkg/roachpb/errors.go line 840 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

%v: %v

Removed.


pkg/roachpb/errors.go line 1090 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

ditto safety of message

Done.


pkg/roachpb/errors.go line 1110 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

ditto

Done.


pkg/roachpb/errors.go line 1126 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

ditto

Done.


pkg/roachpb/errors.go line 1213 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

ditto

Done.


pkg/roachpb/errors.go line 1287 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

ditto

Done.


pkg/roachpb/errors.go line 1303 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

ditto

Done.


pkg/roachpb/errors.go line 1383 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

ditto

Done.


pkg/roachpb/errors.go line 1410 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

ditto

Done.


pkg/roachpb/errors.go line 1459 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

ditto

Done.


pkg/roachpb/errors.go line 1485 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

ditto

Done.


pkg/roachpb/errors.go line 1516 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

ditto

Done.


pkg/roachpb/errors.proto line 246 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

So this is where we need more work.

  1. the field with proto tag 2 needs to continue to be supported (for prev-version servers)
  2. there needs to be a new field with proto tag 3 with the new type.
  3. when creating an error object, both fields should be populated (for the benefit of prev-version servers)
  4. when reading an error object, inspect the new field; if non-empty use that, otherwise use the old one.

Done. Thx for the reminder. Usages are nicely encapsulated in the constructor and Formatter methods so it's not too bad.


pkg/roachpb/errors.proto line 256 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

ditto

Done.


pkg/roachpb/errors.proto line 452 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

ditto

Done.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i still like it

Reviewed 5 of 5 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian)


pkg/roachpb/errors.go line 607 at r1 (raw file):

Previously, dhartunian (David Hartunian) wrote…

Done.

nb this is also equivalent to s.Print(err).


pkg/roachpb/errors.proto line 246 at r2 (raw file):

message TransactionRetryError {
  optional TransactionRetryReason reason = 1 [(gogoproto.nullable) = false];
  // Note: Deprecated. `extra_msg_redactable` takes precedence if present.

What wlll be the condition for us to remove the field outright and the conditional logic around it? Whivh version? Usually we state that in the explanatory comment.


pkg/roachpb/errors.proto line 258 at r2 (raw file):

// only monotonically increase.
message TransactionStatusError {
  // Note: Deprecated. `msg_redactable` takes precedence if present.

same


pkg/roachpb/errors.proto line 456 at r2 (raw file):

message TransactionRetryWithProtoRefreshError {
  // A user-readable message.
  // Note: Deprecated in favor of `msg_redactable` below. Both fields will be

same

Copy link
Contributor

@abarganier abarganier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: overall. I agree it'd be nice to outline in the proto comments the criteria for removing/fully deprecating the older error message fields.

I think some crdb_internal tables will benefit from this! For example, in crdb_internal.cluster_transactions today we omit the last_auto_retry_reason column from redacted debug zips. Looks like these fields are eventually pulled from the txnState. I see at least one usage where the error in the txnState is set to a TransactionRetryWithProtoRefreshError. When this is merged I can take a deeper look to see if we can reintroduce this column into redacted debug zip bundles.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @dhartunian)

@rafiss rafiss removed the request for review from a team January 4, 2023 21:09
Copy link
Contributor

@abarganier abarganier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Forgot to actually approve :|)

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @dhartunian)

@dhartunian dhartunian requested a review from a team as a code owner January 13, 2023 03:27
@dhartunian dhartunian force-pushed the improve-sql-error-redaction branch 2 times, most recently from f3871ca to a8759b1 Compare January 13, 2023 16:42
Previously, transaction errors would be redacted in full since there was no
redaction implemented for their internals. This commit updates all the errors
defined in pkg/roachpb/errors.proto to implement `SafeErrorFormatter` in order
to be properly redacted.

The methods `message` in `ErrorDetailInterface` contained no usages so it was
removed.

The tests in place are checking to see that redaction markers *do not* appear
in the logs. They will be in place when Errors contain redacted values such as
Keys containing row data.

An additional test was added to the telemetry logging suite to ensure that
these errors are surfaced unredacted in telemetry logs as well.

Some message fields in Error protobufs have been updated to use
`RedactedString` to better support fine grained redaction. A simple migration
was applied for those fields but further work may be necessary for further
improvements.

Epic: CRDB-12732
Resolves: CRDB-14087

Release note (ops change): Transaction errors will contain more detailed
information in redacted logs.
Copy link
Collaborator Author

@dhartunian dhartunian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @knz)


pkg/roachpb/errors.go line 607 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

nb this is also equivalent to s.Print(err).

Done.


pkg/roachpb/errors.proto line 246 at r2 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

What wlll be the condition for us to remove the field outright and the conditional logic around it? Whivh version? Usually we state that in the explanatory comment.

Added note and TODO


pkg/roachpb/errors.proto line 258 at r2 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

same

Added note and TODO


pkg/roachpb/errors.proto line 456 at r2 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

same

Added note and TODO

@dhartunian
Copy link
Collaborator Author

TFTRs!

bors r=abarganier

@craig
Copy link
Contributor

craig bot commented Jan 13, 2023

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Jan 14, 2023

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants