Potential fix for code scanning alert no. 2: Clear-text storage of sensitive information #8
Draft
domfahey wants to merge 1 commit into main from alert-autofix-2
Check failure
Code scanning / CodeQL
Clear-text storage of sensitive information High
Copilot Autofix
AI 4 months ago
In general, to fix clear-text storage of sensitive information in reports/logs, you either (1) avoid including the sensitive fields altogether, or (2) replace them with non-reversible pseudonyms that cannot be used to reconstruct the original values and do not expose any direct substring of the data. Deterministic, secret-key–based tokens are better than raw hashes, and exposing no raw prefix at all is safer than exposing a few characters.
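As a rough illustration of why a keyed token is preferable to a raw hash, the sketch below contrasts the two. The names `SECRET_KEY`, `keyed_token`, and `raw_hash` are hypothetical and do not come from the project; the 16-character truncation is likewise an arbitrary choice for readability.

```python
import hashlib
import hmac
import secrets

# Hypothetical key for illustration; a real deployment would decide where
# the key lives and how long it is kept.
SECRET_KEY = secrets.token_bytes(32)


def keyed_token(value: str, key: bytes = SECRET_KEY) -> str:
    """Deterministic token that cannot be recomputed without `key`."""
    mac = hmac.new(key, value.encode("utf-8"), hashlib.sha256)
    return mac.hexdigest()[:16]


def raw_hash(value: str) -> str:
    """Unkeyed digest: anyone can hash a list of guesses and compare."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:16]
```

Both functions map equal inputs to equal outputs, but only `raw_hash` can be reproduced by someone who holds a list of candidate values and no key.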
For this specific code, the problem is localized in `_pseudonymize` and its use in `write_group_to_file`. We can fix the issue without changing existing functionality in a user-visible way by tightening `_pseudonymize` so that it no longer includes any portion of the original string, and only outputs a fixed label plus a short, deterministic pseudonymous token derived from the input. This keeps stable equality within a run (the same input always yields the same masked output), so duplicate grouping and report readability (“same masked name appears several times”) are preserved, while eliminating direct leakage of clear-text characters.

To further reduce the risk of offline guessing attacks, we can introduce a process-local random salt, so that the digest used for masking cannot be precomputed from the raw database values. Since the salt only needs to be consistent within a single run (for generating this report), a single generated salt in the module scope is sufficient and does not require persistent storage. Masked values will therefore differ from one run to the next.

Concretely, in `scripts/analyze_duplicates.py`:

- Import `secrets` and `base64` (both from the standard library) alongside the existing imports.
- Define a module-level salt (e.g. `PSEUDONYM_SALT = secrets.token_bytes(16)`).
- Update `_pseudonymize` to:
  - Treat `None` / empty the same as today (`"N/A"`).
  - Compute `digest = hashlib.sha256(PSEUDONYM_SALT + text.encode("utf-8")).digest()`.
  - Return a value such as `anon:<token>` or `pseudonym:<token>` without embedding any substring of `text`.
- Leave `write_group_to_file` unchanged; it will automatically start writing the safer tokens instead of the current `"prefix…:digest"` values.

This modification is restricted to the shown file and lines, uses only standard-library modules, preserves the behavior that identical inputs yield identical masked outputs within a run, and removes the clear-text leakage that CodeQL flags.