pgwire: use HashString for usernames in auth log calls#165804
pgwire: use HashString for usernames in auth log calls#165804trunk-io[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
|
😎 Merged successfully - details. |
rafiss
left a comment
There was a problem hiding this comment.
Thanks for working on this! The Warningf changes are a good start, but I think there's a bigger opportunity here.
The 5 Warningf call sites are only part of the picture. Every call to ac.LogAuthOK, ac.LogAuthFailed, and ac.LogAuthInfof emits a structured event containing CommonSessionDetails.User and CommonSessionDetails.SystemIdentity as plain strings. In the proto definition, these fields are not marked as safe-from-redaction:
cockroach/pkg/util/log/eventpb/session_events.proto
Lines 54 to 62 in 1f9236d
That means in the generated JSON encoder (json_encode_generated.go:2349-2370), these fields are wrapped with StartMarker()/EndMarker(), so they get fully redacted to × in redacted sinks. Similarly, CommonConnectionDetails.RemoteAddress is treated as sensitive.
If we centralize the hashing at the proto field level, we can fix all of these structured auth events at once, We need that in addition to the individual log call sites you fixed. See inline comment for the suggested approach.
@rafiss made 5 comments.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on aa-joshi, angles-n-daemons, and visheshbardia).
pkg/sql/pgwire/auth.go line 148 at r1 (raw file):
matchedIdentity, err := c.checkClientUsernameMatchesMapping(ctx, ac, behaviors, systemIdentity) if err != nil { log.Dev.Warningf(ctx, "unable to map incoming identity %q, or san identities %q to any database user: %+v", systemIdentity, behaviors.GetSANIdentities(), err)
systemIdentity is logged as plain %q
there were also two more places i saw that should be updated:
auth_methods.go:1261—user.Normalized()in LDAP authorization logauth_methods.go:1269—userinerrors.Wrapfthat flows into structured eventDetail
pkg/sql/pgwire/auth.go line 177 at r1 (raw file):
if err != nil { log.Dev.Warningf(ctx, "user retrieval failed for user=%q: %+v", redact.HashString(dbUser.Normalized()), err) ac.LogAuthFailed(ctx, eventpb.AuthFailReason_USER_RETRIEVAL_ERROR, err)
The HashString wrapping here is correct, but note that the ac.LogAuthFailed call on this line also carries the username, via p.authDetails.User inside CommonSessionDetails. That field is a plain string in the proto, so it gets fully redacted to × in redacted sinks (not hashed).
Let's make a more centralized fix: change CommonSessionDetails.User and SystemIdentity (and CommonConnectionDetails.RemoteAddress) to RedactableString with redact:"mixed" (like Detail and Info already use), then update SetDbUser/SetSystemIdentity to store the hashable value:
func (p *authPipe) SetDbUser(dbUser username.SQLUsername) {
p.authDetails.User = redact.Sprintf("%s", redact.HashString(dbUser.Normalized()))
}
func (p *authPipe) SetSystemIdentity(systemIdentity string) {
p.authDetails.SystemIdentity = redact.Sprintf("%s", redact.HashString(systemIdentity))
}This way:
- Non-redacted sinks: cleartext (same as today)
- Redacted sinks with hashing enabled: deterministic hash (instead of
×) - All structured auth events (
LogAuthOK,LogAuthFailed,LogAuthInfof) automatically get hashed usernames without touching each call site
pkg/sql/pgwire/auth.go line 254 at r1 (raw file):
keyVal := strings.SplitN(setting, "=", 2) if len(keyVal) != 2 { log.Ops.Warningf(ctx, "%s has malformed default setting: %q", redact.HashString(dbUser.Normalized()), setting)
redact.HashString(dbUser.Normalized()) is repeated 5 times in this function. Since dbUser is assigned once (line 164) and never reassigned, consider extracting into a local hashedUser variable after line 164.
pkg/util/log/redact_test.go line 457 at r1 (raw file):
require.NotContains(t, string(redactedContents), "alice", "redacted sink should not contain cleartext") require.Contains(t, string(redactedContents), "user=‹",
This assertion doesn't distinguish hashing from full redaction. "user=‹" would also match "user=‹×›". Consider adding:
require.NotContains(t, string(redactedContents), "‹×›",
"redacted sink should hash, not fully redact")88f7726 to
6f0ed5f
Compare
|
TFTR @rafiss, I've addressed the quick fixes:
For the centralized approach at |
rafiss
left a comment
There was a problem hiding this comment.
Using hashing for the common fields would make this PR much more valuable for improving the debugging experience and would be worthwhile to include in the 26.2 release. Can you explain more about why you don't want to add it now? It's only about a 20 line code change.
I made a draft version of the change here: 1634179
6f0ed5f to
3b84194
Compare
Convert usernames in authentication log calls to use redact.HashString() instead of passing them as plain unsafe format arguments. When hash-based redaction is enabled (via log config), these values now produce deterministic 8-character hex hashes instead of being fully redacted to ×, enabling cross-entry correlation in support diagnostics without exposing the actual data. Also adds TestHashRedactionPerSink which verifies that hash-based redaction respects per-sink settings: a sink with redact=true hashes HashString values while a sink with redact=false shows cleartext. Epic: CRDB-47199 Release note (ops change): When hash-based redaction is enabled in the logging configuration, usernames in authentication logs now produce deterministic hashes instead of being fully redacted. This allows support engineers to correlate the same user across multiple log entries without seeing the actual values. Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
3b84194 to
bea5388
Compare
|
Thanks for the draft, that helped a lot! Initially, we approached this as a first step for hash-based redaction: just wrapping the specific log lines that TSE identified as critical for debugging. The per-log-line approach felt safer since I wasn't sure how changing proto field types would interact with the event rendering pipeline. But your draft made it clear it's a straightforward change, so I've adopted it. One thing I ran into: when |
rafiss
left a comment
There was a problem hiding this comment.
great work on this! this will be really useful for improving the debugging/support experience
@rafiss made 1 comment.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on aa-joshi, angles-n-daemons, and visheshbardia).
|
TFTR! |
Convert usernames in authentication log calls to use redact.HashString() instead of passing them as plain unsafe format arguments. When hash-based redaction is enabled (via log config), these values now produce deterministic 8-character hex hashes instead of being fully redacted to ×, enabling cross-entry correlation in support diagnostics without exposing the actual data.
Also adds TestHashRedactionPerSink which verifies that hash-based redaction respects per-sink settings: a sink with redact=true hashes HashString values while a sink with redact=false shows cleartext.
Epic: CRDB-47199
Release note (ops change): When hash-based redaction is enabled in the logging configuration, usernames in authentication logs now produce deterministic hashes instead of being fully redacted. This allows support engineers to correlate the same user across multiple log entries without seeing the actual values.