sdk%fix(codeql): deal with inconsistent caches, allocate half of available memory, fix CSV parsing bug, add --cache/-c#6
Conversation
📝 WalkthroughWalkthroughThis PR centralizes workspace discovery and resource estimation in contrib/lint/common.py, refactors lint_codeql.py to manage database/results lifecycles, changes CodeQL CSV diagnostics parsing, updates CodeQL predicates, and migrates other lint scripts to use root_dir(). ChangesLint infrastructure refactoring with shared environment utilities
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Warning This pull request may have conflicts, please coordinate with the authors of these pull requests. Potential conflicts |
There was a problem hiding this comment.
🧹 Nitpick comments (4)
contrib/lint/common.py (1)
116-125: 💤 Low valueAdd
ValueErrorto the WMIC exception handling for consistency.The macOS path (lines 98-100) catches
ValueErrorfromint()conversion, but the WMIC fallback does not. If WMIC produces malformed output,int()on line 123 would raise an uncaughtValueErrorinstead of falling through to theRuntimeErroron line 126.Suggested fix
try: out = subprocess.check_output( ["wmic", "computersystem", "get", # noqa: S607 "TotalPhysicalMemory", "/value"], ).decode() for line in out.splitlines(): if line.startswith("TotalPhysicalMemory="): return int(line.split("=", 1)[1].strip()) - except (FileNotFoundError, subprocess.CalledProcessError): + except (FileNotFoundError, subprocess.CalledProcessError, ValueError): pass🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@contrib/lint/common.py` around lines 116 - 125, The WMIC fallback block in contrib/lint/common.py currently only catches FileNotFoundError and subprocess.CalledProcessError but can raise ValueError when int() parsing fails; update the except clause for the WMIC branch (the try around subprocess.check_output and the loop that returns int(line.split(...))) to also include ValueError so malformed WMIC output falls through to the final RuntimeError, e.g., add ValueError to the exception tuple that currently references FileNotFoundError and subprocess.CalledProcessError.contrib/lint/lint_codeql.py (3)
173-183: 💤 Low valueClarify
--skip-cachehelp text.The help text "rebuild the database from scratch" could be misinterpreted as rebuilding the persistent cache. In reality,
--skip-cachebypasses the cache entirely by using a temporary directory that is discarded afterward.Consider rephrasing to "bypass cache using temporary database" or "skip persistent cache" for clarity.
♻️ Proposed help text clarification
parser.add_argument( "-s", "--skip-cache", action="store_true", - help="rebuild the database from scratch", + help="bypass cache using a temporary database", )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@contrib/lint/lint_codeql.py` around lines 173 - 183, Update the help string for the --skip-cache argument in _parse_args to clarify it does not rebuild the persistent cache but instead bypasses it by using a temporary, discarded database; change the help text to something like "bypass persistent cache and use a temporary database (discarded after run)" so callers of _parse_args and the parser.add_argument("--skip-cache") option reflect the correct behavior.
242-247: ⚡ Quick winConsider cache directory migration or cleanup guidance.
The cache structure has changed to use
.cache/dband.cache/resultssubdirectories (see lines 168-170). If developers have existing caches from before this PR, they may encounter issues with incompatible directory structures.Consider adding a check to detect and clean up old cache formats, or document the need to manually clear old caches after this change.
🔄 Proposed cache cleanup check
cache_dir = query_dir / ".cache" + # Clean up old cache format (pre-migration) + old_db_manifest = cache_dir / "codeql-database.yml" + if old_db_manifest.is_file(): + print("Detected old cache format, cleaning up...", file=sys.stderr) + shutil.rmtree(cache_dir, ignore_errors=True) with _workspace_dirs(cache_dir, skip_cache=args.skip_cache) as (🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@contrib/lint/lint_codeql.py` around lines 242 - 247, Before calling _workspace_dirs (where cache_dir is defined), add a check that detects legacy cache layouts (e.g., .cache containing files/directories but missing the new .cache/db and .cache/results subdirs) and either migrate or remove them when args.skip_cache is false; implement this using the existing cache_dir variable and args.skip_cache flag, performing safe migration of files into cache_dir/"db" and cache_dir/"results" or deleting old entries and logging the action, then proceed to call _workspace_dirs to yield active_db and results_dir.
147-171: ⚡ Quick winConsider logging cleanup failures for observability.
The cleanup at line 166 uses
shutil.rmtree(..., ignore_errors=True), which silently suppresses errors like permission issues or file locks. While this prevents the script from crashing during cleanup, it may leave temporary directories behind without any indication.Consider logging cleanup failures to stderr so administrators can detect and address accumulating temp directories.
♻️ Proposed enhancement for cleanup observability
finally: - shutil.rmtree(tmp, ignore_errors=True) + try: + shutil.rmtree(tmp) + except OSError as e: + print(f"warning: cleanup failed for {tmp}: {e}", file=sys.stderr)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@contrib/lint/lint_codeql.py` around lines 147 - 171, The cleanup in _workspace_dirs currently calls shutil.rmtree(tmp, ignore_errors=True) which swallows failures; change this to attempt rmtree without ignore_errors and catch exceptions, then log the failure (including the tmp path and exception details) to stderr or the module logger so administrators can detect lingering temp dirs; specifically update the finally block in _workspace_dirs to wrap shutil.rmtree(tmp) in try/except and use logging.error or sys.stderr.write with the path and exception information.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@contrib/lint/common.py`:
- Around line 116-125: The WMIC fallback block in contrib/lint/common.py
currently only catches FileNotFoundError and subprocess.CalledProcessError but
can raise ValueError when int() parsing fails; update the except clause for the
WMIC branch (the try around subprocess.check_output and the loop that returns
int(line.split(...))) to also include ValueError so malformed WMIC output falls
through to the final RuntimeError, e.g., add ValueError to the exception tuple
that currently references FileNotFoundError and subprocess.CalledProcessError.
In `@contrib/lint/lint_codeql.py`:
- Around line 173-183: Update the help string for the --skip-cache argument in
_parse_args to clarify it does not rebuild the persistent cache but instead
bypasses it by using a temporary, discarded database; change the help text to
something like "bypass persistent cache and use a temporary database (discarded
after run)" so callers of _parse_args and the
parser.add_argument("--skip-cache") option reflect the correct behavior.
- Around line 242-247: Before calling _workspace_dirs (where cache_dir is
defined), add a check that detects legacy cache layouts (e.g., .cache containing
files/directories but missing the new .cache/db and .cache/results subdirs) and
either migrate or remove them when args.skip_cache is false; implement this
using the existing cache_dir variable and args.skip_cache flag, performing safe
migration of files into cache_dir/"db" and cache_dir/"results" or deleting old
entries and logging the action, then proceed to call _workspace_dirs to yield
active_db and results_dir.
- Around line 147-171: The cleanup in _workspace_dirs currently calls
shutil.rmtree(tmp, ignore_errors=True) which swallows failures; change this to
attempt rmtree without ignore_errors and catch exceptions, then log the failure
(including the tmp path and exception details) to stderr or the module logger so
administrators can detect lingering temp dirs; specifically update the finally
block in _workspace_dirs to wrap shutil.rmtree(tmp) in try/except and use
logging.error or sys.stderr.write with the path and exception information.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: a7e1c045-1c13-4153-9053-18e76646600c
📒 Files selected for processing (5)
contrib/lint/common.pycontrib/lint/lint_codeql.pycontrib/lint/lint_javascript.pycontrib/lint/lint_python.pycontrib/lint/lint_semgrep.py
|
@coderabbitai review |
✅ Action performedReview finished.
|
There was a problem hiding this comment.
🧹 Nitpick comments (2)
contrib/codeql/lib/policy.qll (1)
149-158: 💤 Low valueMinor: Regex may not match generic impl declarations.
The regex pattern on line 156-157 uses
\bword boundaries around the type name, which won't match generic types likeimpl serde::Serialize for Container<T>since<isn't a word character. This is likely acceptable since this is a fallback for extractor limitations, but be aware it may miss some generic manual impls.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@contrib/codeql/lib/policy.qll` around lines 149 - 158, The regex used in the exists(...) clause (via sourceLineContent(...).regexpMatch(...)) uses \b around t.getName().getText() which fails to match generic impls like "impl serde::Serialize for Container<T>"; update the pattern to allow an optional generic parameter or non-word boundary after the type name (e.g., permit '<' or end/non-word boundary) so regexpMatch will match both plain and generic impls; adjust the concatenation that builds the pattern using traitName and t.getName().getText() accordingly so fileRelPath/sourceLineContent still find manual impls for serde traits.contrib/lint/lint_codeql.py (1)
154-164: 💤 Low valuePR title mentions
--skip-cache/-sbut implementation uses--cache/-c.The PR title describes adding a
--skip-cacheflag, but the implementation uses--cachewith opposite semantics (opt-in caching vs opt-out). The current implementation (opt-in caching) is a sensible default—just ensure the PR description is updated to reflect the actual behavior.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@contrib/lint/lint_codeql.py` around lines 154 - 164, The PR title and description reference a `--skip-cache`/`-s` flag but the actual implementation in the _parse_args function uses `--cache`/`-c` with opposite semantics (opt-in caching). Since the implementation is correct and sensible, update the PR title and description to accurately reflect the actual `--cache`/`-c` flag behavior instead of the `--skip-cache` terminology mentioned in the PR metadata.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@contrib/codeql/lib/policy.qll`:
- Around line 149-158: The regex used in the exists(...) clause (via
sourceLineContent(...).regexpMatch(...)) uses \b around t.getName().getText()
which fails to match generic impls like "impl serde::Serialize for
Container<T>"; update the pattern to allow an optional generic parameter or
non-word boundary after the type name (e.g., permit '<' or end/non-word
boundary) so regexpMatch will match both plain and generic impls; adjust the
concatenation that builds the pattern using traitName and t.getName().getText()
accordingly so fileRelPath/sourceLineContent still find manual impls for serde
traits.
In `@contrib/lint/lint_codeql.py`:
- Around line 154-164: The PR title and description reference a
`--skip-cache`/`-s` flag but the actual implementation in the _parse_args
function uses `--cache`/`-c` with opposite semantics (opt-in caching). Since the
implementation is correct and sensible, update the PR title and description to
accurately reflect the actual `--cache`/`-c` flag behavior instead of the
`--skip-cache` terminology mentioned in the PR metadata.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 0b490ff3-13d0-4d2c-a25b-01b4f8612e1c
📒 Files selected for processing (7)
contrib/codeql/import.qlcontrib/codeql/lib/policy.qllcontrib/lint/common.pycontrib/lint/lint_codeql.pycontrib/lint/lint_javascript.pycontrib/lint/lint_python.pycontrib/lint/lint_semgrep.py
🚧 Files skipped from review as they are similar to previous changes (3)
- contrib/lint/lint_semgrep.py
- contrib/lint/lint_javascript.py
- contrib/lint/common.py
--skip-cache/-s--cache/-c
Additional Information
BaseCodecand{impl,codec}_*helper macros to reduce boilerplate, enforcecreate::preludeuse, drop premature limits #4Breaking Changes
None expected.
How Has This Been Tested?
Checklist