badwords: refactored for comments and whitelisting #20909
Pull request overview
Refactors the scripts/badwords scanner to focus on source-code comments and double-quoted strings, consolidate scanning into a single run, and move whitelist configuration into scripts/badwords.txt.
Changes:
- Add a C comment/string extraction path in `scripts/badwords` and switch to extension-based handling.
- Move whitelist entries into `scripts/badwords.txt` and remove `scripts/badwords.ok`.
- Consolidate `badwords-all` to a single invocation over sources and markdown.
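As an illustration of the comment/string extraction and extension-based handling listed above, here is a minimal sketch in Python. It is not the code from `scripts/badwords`. The names `extract_c_text` and `classify`, the regular expression, and the extension sets are all assumptions made for this example.

```python
import re
from pathlib import Path

# Hypothetical extension sets; the list used by the real script is not shown here.
SOURCE_EXTS = {".c", ".h"}
MARKDOWN_EXTS = {".md"}

# A single alternation, so whichever construct starts first wins. This keeps
# "//" inside a string from being read as a comment, and a quote inside a
# comment from being read as the start of a string.
TOKEN = re.compile(
    r'/\*.*?\*/'               # block comment
    r'|//[^\n]*'               # line comment
    r'|"(?:\\.|[^"\\\n])*"'    # double-quoted string, escapes allowed
    r"|'(?:\\.|[^'\\\n])+'",   # char literal, matched only so that '"' is skipped
    re.DOTALL,
)

def extract_c_text(source):
    """Return the comments and double-quoted strings found in C source text."""
    return [m.group(0) for m in TOKEN.finditer(source)
            if not m.group(0).startswith("'")]

def classify(filename):
    """Pick a scanning mode from the file name extension, or None if unknown."""
    suffix = Path(filename).suffix.lower()
    if suffix in SOURCE_EXTS:
        return "source"
    if suffix in MARKDOWN_EXTS:
        return "markdown"
    return None
```

Only the returned pieces would then be checked for bad words, so identifiers and other code are never flagged.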
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| scripts/badwords.txt | Documents/hosts new whitelist configuration entries alongside existing badword rules. |
| scripts/badwords.ok | Removed in favor of in-file whitelist syntax in badwords.txt. |
| scripts/badwords-all | Consolidates scanning into one command invocation. |
| scripts/badwords | Implements new source-code scanning mode + new whitelist parsing behavior. |
| scripts/Makefile.am | Stops distributing badwords.ok (consistent with whitelist move). |
- when scanning source code, this now only checks source code comments and double-quote strings. No more finding bad words as part of code
- this allows the full scan to be done in a single invocation
- detects source code or markdown by file name extension
- moved the whitelist words config into the single `badwords.txt` file, no more having them separately (see top of file for syntax)
- all whitelisted words are checked case insensitively now
- removed support for whitelisting words on a specific line number. We did not use it and it is too fragile

Removing the actual code from getting scanned made the script take an additional 0.5 seconds on my machine. Scanning 1525 files now takes a little under 1.7 seconds for me.

Closes #20909
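The case-insensitive whitelist check described above could look roughly like the following Python sketch. It does not reproduce the `badwords.txt` syntax or the script's real logic. The function name `find_badwords` is hypothetical, and treating bad words as plain strings matched without regard to case is an assumption made to keep the example short.

```python
import re

def find_badwords(text, badwords, whitelist):
    """Return (offset, match) pairs for bad words not covered by the whitelist."""
    # Blank out whitelisted phrases first, ignoring case, so a bad word that
    # is part of an allowed phrase is not reported. Padding with spaces keeps
    # the offsets of the remaining text unchanged.
    for phrase in whitelist:
        text = re.sub(re.escape(phrase),
                      lambda m: " " * len(m.group(0)),
                      text, flags=re.IGNORECASE)
    hits = []
    for word in badwords:
        pattern = r"\b" + re.escape(word) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((m.start(), m.group(0)))
    return sorted(hits)
```

With the bad word `simply` and the whitelisted phrase `simply connected`, the text `Simply use the Simply Connected Space here` yields a single hit at offset 0, because the second occurrence is covered by the whitelist regardless of its capitalization.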
3b8ed09 to c4af243
augment review
🤖 Augment PR Summary
Summary: Refactors the
Changes:
Technical Notes: The scanner enumerates targets via
Also used in curl-www in two CVE