pgwire: use HashString for usernames in auth log calls#165804

Merged
trunk-io[bot] merged 1 commit into cockroachdb:master from visheshbardia:hash-redaction-pgwire-usernames
Mar 17, 2026

@visheshbardia
Contributor

Convert usernames in authentication log calls to use redact.HashString() instead of passing them as plain unsafe format arguments. When hash-based redaction is enabled (via log config), these values now produce deterministic 8-character hex hashes instead of being fully redacted to ×, enabling cross-entry correlation in support diagnostics without exposing the actual data.

Also adds TestHashRedactionPerSink which verifies that hash-based redaction respects per-sink settings: a sink with redact=true hashes HashString values while a sink with redact=false shows cleartext.

Epic: CRDB-47199

Release note (ops change): When hash-based redaction is enabled in the logging configuration, usernames in authentication logs now produce deterministic hashes instead of being fully redacted. This allows support engineers to correlate the same user across multiple log entries without seeing the actual values.
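The per-sink behavior described above can be modeled with a minimal, self-contained Go sketch. Everything in it is hypothetical: `sinkConfig`, `hashToken`, and `renderUser` are illustrative names, and `hashToken` assumes an unsalted SHA-256 truncated to 8 hex characters, which may not match what `redact.HashString` actually does.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sinkConfig is a hypothetical model of one log sink's settings.
type sinkConfig struct {
	redact bool // redact=true: sensitive values must not appear in cleartext
	hash   bool // hash-based redaction enabled in the log config (assumed flag)
}

// hashToken is an assumed stand-in for the hash applied to HashString
// values: unsalted SHA-256 truncated to 8 hex characters. The real
// algorithm (and any salting) may differ.
func hashToken(s string) string {
	sum := sha256.Sum256([]byte(s))
	return hex.EncodeToString(sum[:4])
}

// renderUser formats a username the way each kind of sink would, using
// ‹ › to mark the sensitive part in redacted output.
func renderUser(user string, cfg sinkConfig) string {
	switch {
	case !cfg.redact:
		// Cleartext sink; this sketch omits redaction markers here.
		return "user=" + user
	case cfg.hash:
		// Same input, same token: entries can be correlated.
		return "user=‹" + hashToken(user) + "›"
	default:
		// Full redaction: every user looks identical.
		return "user=‹×›"
	}
}

func main() {
	sinks := []struct {
		name string
		cfg  sinkConfig
	}{
		{"cleartext sink", sinkConfig{redact: false}},
		{"redacted sink", sinkConfig{redact: true}},
		{"redacted sink with hashing", sinkConfig{redact: true, hash: true}},
	}
	for _, s := range sinks {
		fmt.Printf("%-28s %s  %s\n", s.name, renderUser("alice", s.cfg), renderUser("bob", s.cfg))
	}
}
```

The middle case is the one the PR improves on: `‹×›` is identical for every user, while the hashed token differs per user but stays stable across entries, which is what makes correlation possible.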

@visheshbardia visheshbardia requested review from a team as code owners March 16, 2026 10:34
@visheshbardia visheshbardia requested review from aa-joshi and angles-n-daemons and removed request for a team March 16, 2026 10:34
@trunk-io
Contributor

trunk-io bot commented Mar 16, 2026

😎 Merged successfully - details.

Collaborator

@rafiss rafiss left a comment

Thanks for working on this! The Warningf changes are a good start, but I think there's a bigger opportunity here.

The 5 Warningf call sites are only part of the picture. Every call to ac.LogAuthOK, ac.LogAuthFailed, and ac.LogAuthInfof emits a structured event containing CommonSessionDetails.User and CommonSessionDetails.SystemIdentity as plain strings. In the proto definition, these fields are not marked as safe-from-redaction:

// The database username the session is for. This username will have
// undergone case-folding and Unicode normalization.
string user = 2 [(gogoproto.jsontag) = ",omitempty"];
// The original system identity provided by the client, if an identity
// mapping was used per Host-Based Authentication rules. This may be a
// GSSAPI or X.509 principal or any other external value, so no
// specific assumptions should be made about the contents of this
// field.
string system_identity = 3 [(gogoproto.jsontag) = ",omitempty"];

That means in the generated JSON encoder (json_encode_generated.go:2349-2370), these fields are wrapped with StartMarker()/EndMarker(), so they get fully redacted to × in redacted sinks. Similarly, CommonConnectionDetails.RemoteAddress is treated as sensitive.

If we centralize the hashing at the proto field level, we can fix all of these structured auth events at once. We need that in addition to the individual log call sites you fixed. See the inline comment for the suggested approach.
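A rough Go model of the full-redaction path described in this comment: an encoder wraps a field that is not marked safe in redaction markers, and a redacted sink later replaces everything between the markers with `×`. `encodeUnsafeField` and `redactMarked` are hypothetical helpers, not the generated encoder or the `redact` library's API.

```go
package main

import (
	"fmt"
	"strings"
)

// Redaction markers as they appear in redactable log output.
const (
	startMarker = "‹"
	endMarker   = "›"
)

// encodeUnsafeField is a hypothetical stand-in for how a generated JSON
// encoder emits a field that is not marked safe: the value is enclosed in
// redaction markers.
func encodeUnsafeField(key, value string) string {
	return fmt.Sprintf(`"%s":"%s%s%s"`, key, startMarker, value, endMarker)
}

// redactMarked replaces the contents of every ‹...› span with ×, which is
// what a redacted sink does for values it is not allowed to show.
func redactMarked(s string) string {
	var b strings.Builder
	for {
		i := strings.Index(s, startMarker)
		if i < 0 {
			b.WriteString(s)
			return b.String()
		}
		b.WriteString(s[:i])
		b.WriteString(startMarker + "×" + endMarker)
		j := strings.Index(s[i:], endMarker)
		if j < 0 {
			// Unterminated marker: redact the remainder to stay safe.
			return b.String()
		}
		s = s[i+j+len(endMarker):]
	}
}

func main() {
	event := "{" + encodeUnsafeField("User", "alice") + "," +
		encodeUnsafeField("SystemIdentity", "alice@EXAMPLE.COM") + "}"
	fmt.Println(event)
	fmt.Println(redactMarked(event))
}
```

Because the markers carry no information about the value, every redacted username collapses to the same `‹×›`, so two entries about the same user cannot be linked.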

@rafiss made 5 comments.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on aa-joshi, angles-n-daemons, and visheshbardia).


pkg/sql/pgwire/auth.go line 148 at r1 (raw file):

	matchedIdentity, err := c.checkClientUsernameMatchesMapping(ctx, ac, behaviors, systemIdentity)
	if err != nil {
		log.Dev.Warningf(ctx, "unable to map incoming identity %q, or san identities %q to any database user: %+v", systemIdentity, behaviors.GetSANIdentities(), err)

systemIdentity is logged as plain %q

There were also two more places I saw that should be updated:

  • auth_methods.go:1261: user.Normalized() in the LDAP authorization log
  • auth_methods.go:1269: user in errors.Wrapf, which flows into the structured event Detail

pkg/sql/pgwire/auth.go line 177 at r1 (raw file):

	if err != nil {
		log.Dev.Warningf(ctx, "user retrieval failed for user=%q: %+v", redact.HashString(dbUser.Normalized()), err)
		ac.LogAuthFailed(ctx, eventpb.AuthFailReason_USER_RETRIEVAL_ERROR, err)

The HashString wrapping here is correct, but note that the ac.LogAuthFailed call on this line also carries the username, via p.authDetails.User inside CommonSessionDetails. That field is a plain string in the proto, so it gets fully redacted to × in redacted sinks (not hashed).

Let's make a more centralized fix: change CommonSessionDetails.User and SystemIdentity (and CommonConnectionDetails.RemoteAddress) to RedactableString with redact:"mixed" (like Detail and Info already use), then update SetDbUser/SetSystemIdentity to store the hashable value:

func (p *authPipe) SetDbUser(dbUser username.SQLUsername) {
    p.authDetails.User = redact.Sprintf("%s", redact.HashString(dbUser.Normalized()))
}

func (p *authPipe) SetSystemIdentity(systemIdentity string) {
    p.authDetails.SystemIdentity = redact.Sprintf("%s", redact.HashString(systemIdentity))
}

This way:

  • Non-redacted sinks: cleartext (same as today)
  • Redacted sinks with hashing enabled: deterministic hash (instead of ×)
  • All structured auth events (LogAuthOK, LogAuthFailed, LogAuthInfof) automatically get hashed usernames without touching each call site
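A simplified Go sketch of why centralizing in the setter covers every event: the value is wrapped once when stored, and each event built from the shared details inherits it. `hashable`, `authDetails`, `setDbUser`, and `eventLine` are hypothetical stand-ins for the real `redact.HashString`, `CommonSessionDetails`, `SetDbUser`, and event logging, and the hashing again assumes truncated SHA-256.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hashable is a hypothetical stand-in for a value wrapped with
// redact.HashString: sensitive, but eligible for hashing in redacted sinks.
type hashable string

// authDetails is a hypothetical, simplified stand-in for the shared
// CommonSessionDetails payload that every structured auth event embeds.
type authDetails struct {
	User hashable
}

type authPipe struct {
	details authDetails
}

// setDbUser wraps the username once, at the point where it is stored.
func (p *authPipe) setDbUser(normalized string) {
	p.details.User = hashable(normalized)
}

// render shows a hashable value as a redacted, hash-enabled sink might
// (assumed: SHA-256 truncated to 8 hex characters).
func render(v hashable) string {
	sum := sha256.Sum256([]byte(v))
	return "‹" + hex.EncodeToString(sum[:4]) + "›"
}

// eventLine builds a log line for any auth event from the shared details,
// so no individual call site has to remember to wrap the username.
func (p *authPipe) eventLine(event string) string {
	return fmt.Sprintf("%s user=%s", event, render(p.details.User))
}

func main() {
	p := &authPipe{}
	p.setDbUser("alice")
	fmt.Println(p.eventLine("auth-ok"))
	fmt.Println(p.eventLine("auth-failed"))
}
```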

pkg/sql/pgwire/auth.go line 254 at r1 (raw file):

			keyVal := strings.SplitN(setting, "=", 2)
			if len(keyVal) != 2 {
				log.Ops.Warningf(ctx, "%s has malformed default setting: %q", redact.HashString(dbUser.Normalized()), setting)

redact.HashString(dbUser.Normalized()) is repeated 5 times in this function. Since dbUser is assigned once (line 164) and never reassigned, consider extracting into a local hashedUser variable after line 164.


pkg/util/log/redact_test.go line 457 at r1 (raw file):

	require.NotContains(t, string(redactedContents), "alice",
		"redacted sink should not contain cleartext")
	require.Contains(t, string(redactedContents), "user=‹",

This assertion doesn't distinguish hashing from full redaction. "user=‹" would also match "user=‹×›". Consider adding:

require.NotContains(t, string(redactedContents), "‹×›",
    "redacted sink should hash, not fully redact")
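To see why the extra assertion matters, here is a small Go sketch that tells the two redacted forms apart. `classify` is a hypothetical helper, and the hashed pattern assumes a lowercase 8-character hex token, based on the PR description's "8-character hex hashes"; adjust it if the real output differs.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// hashedUser matches a marker-enclosed 8-character hex token
// (assumed lowercase hex).
var hashedUser = regexp.MustCompile(`user=‹[0-9a-f]{8}›`)

// classify reports how a redacted sink rendered the user field.
func classify(line string) string {
	switch {
	case strings.Contains(line, "user=‹×›"):
		return "fully redacted"
	case hashedUser.MatchString(line):
		return "hashed"
	case strings.Contains(line, "user=‹"):
		return "marked, unrecognized form"
	default:
		return "no marked user field"
	}
}

func main() {
	for _, line := range []string{
		"auth failed user=‹×›",
		"auth failed user=‹1a2b3c4d›",
	} {
		// Both lines pass a bare "user=‹" check; only classify tells them apart.
		fmt.Printf("%-30s prefix=%t -> %s\n", line, strings.Contains(line, "user=‹"), classify(line))
	}
}
```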

@visheshbardia visheshbardia force-pushed the hash-redaction-pgwire-usernames branch from 88f7726 to 6f0ed5f March 17, 2026 07:12
@visheshbardia
Contributor Author

visheshbardia commented Mar 17, 2026

TFTR @rafiss, I've addressed the quick fixes:

  • Added the missed call sites in auth.go and auth_methods.go
  • Used hashedUser variable instead of repeating redact.HashString(dbUser.Normalized())
  • Added require.NotContains(t, ..., "‹×›") to the test to distinguish hashing from full redaction

For the centralized approach at the SetDbUser/SetSystemIdentity level, I agree that's the right long-term solution. Would it be okay to land the per-log-line wrapping as part of 26.2 and add the centralized approach in a follow-up PR?

@visheshbardia visheshbardia requested a review from rafiss March 17, 2026 07:22
Collaborator

@rafiss rafiss left a comment

Using hashing for the common fields would make this PR much more valuable for improving the debugging experience and would be worthwhile to include in the 26.2 release. Can you explain more about why you don't want to add it now? It's only about a 20 line code change.

I made a draft version of the change here: 1634179

@visheshbardia visheshbardia force-pushed the hash-redaction-pgwire-usernames branch from 6f0ed5f to 3b84194 March 17, 2026 15:51
Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
@visheshbardia visheshbardia force-pushed the hash-redaction-pgwire-usernames branch from 3b84194 to bea5388 March 17, 2026 15:54
@visheshbardia
Contributor Author

Thanks for the draft; that helped a lot! Initially, we approached this as a first step toward hash-based redaction: just wrapping the specific log lines that TSE identified as critical for debugging. The per-log-line approach felt safer since I wasn't sure how changing proto field types would interact with the event rendering pipeline. But your draft made it clear it's a straightforward change, so I've adopted it.

One thing I ran into: SetSystemIdentity("") is called to clear the identity on a failed mapping. Since redact.Sprintf on an empty string produces "‹›" (non-empty), omitempty no longer omitted the field from JSON. I fixed this by explicitly resetting the field to "" in the empty case, so the existing behavior is preserved.
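The `omitempty` pitfall can be reproduced with the standard library's `encoding/json`. This is an analogy rather than the real code path: the event fields are serialized by a generated encoder, `wrap` is a hypothetical stand-in for `redact.Sprintf("%s", redact.HashString(s))` that simply adds markers, and `setSystemIdentity` mirrors the fix described in the comment.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sessionDetails approximates the JSON shape of the event fields; the
// ",omitempty" tag drops a field only when it is exactly "".
type sessionDetails struct {
	User           string `json:",omitempty"`
	SystemIdentity string `json:",omitempty"`
}

// wrap is a hypothetical stand-in for the hashable wrapping: it adds
// markers, so even an empty input becomes a non-empty string.
func wrap(s string) string { return "‹" + s + "›" }

// setSystemIdentity keeps an empty identity as "" so omitempty still
// drops the field, and wraps non-empty identities.
func setSystemIdentity(d *sessionDetails, id string) {
	if id == "" {
		d.SystemIdentity = ""
		return
	}
	d.SystemIdentity = wrap(id)
}

func mustJSON(d sessionDetails) string {
	b, err := json.Marshal(d)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// Naive: wrapping "" yields "‹›", which omitempty does not drop.
	naive := sessionDetails{User: wrap("alice"), SystemIdentity: wrap("")}
	fmt.Println(mustJSON(naive))

	// Fixed: an empty identity is stored as "", so the field is omitted.
	fixed := sessionDetails{User: wrap("alice")}
	setSystemIdentity(&fixed, "")
	fmt.Println(mustJSON(fixed))
}
```

Running it prints `{"User":"‹alice›","SystemIdentity":"‹›"}` followed by `{"User":"‹alice›"}`.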

@visheshbardia visheshbardia requested a review from rafiss March 17, 2026 15:57
Copy link
Collaborator

@rafiss rafiss left a comment

:lgtm:

Great work on this! This will be really useful for improving the debugging/support experience.

@rafiss made 1 comment.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on aa-joshi, angles-n-daemons, and visheshbardia).

@visheshbardia
Contributor Author

TFTR!
/trunk merge

@trunk-io trunk-io bot merged commit 6e210ba into cockroachdb:master Mar 17, 2026
38 of 39 checks passed