Summary
The current semgrep-cloud-platform/scan ruleset for dotCMS/core does not include rules tailored to dotCMS's DotConnect and HibernateUtil SQL APIs. As a result, classic SQL injection anti-patterns in our DB layer (raw string concatenation into setSQL / setQuery / executeStatement, String.format-based query assembly, and "'" + var + "'" literal-wrap quoting) are not flagged by PR scans today.
This issue tracks adding a .semgrep/dotcms-sqli.yml rule pack that catches these patterns and wiring it into the PR Semgrep GitHub Action so every PR is scanned against it.
Background
semgrep ci runs against every PR via .github/workflows/cicd_comp_semgrep-phase.yml, but it currently uses only the rules configured in the Semgrep Cloud Platform UI. Adding --config .semgrep/ to the semgrep ci invocation lets us version-control dotCMS-specific rules alongside the code, and run them against PRs in addition to the cloud rules.
Goals
- Catch new occurrences of these patterns in PR scans before merge:
  - dc.setSQL(... + var + ...) and dc.setSQL(String.format(...))
  - dc.executeStatement(... + var + ...) and dc.executeStatement(String.format(...))
  - dh.setQuery(... + var + ...) and dh.setQuery(String.format(...)) (HibernateUtil)
  - "'" + var + "'" and "... = '" + var + "'..." (literal-wrap quoting)
  - sql.replace(":named", value) (manual placeholder replacement)
- Use Semgrep's diff-aware mode so the existing baseline doesn't break unrelated PRs.
- Exclude known-safe areas (startup Task migrations, integrity checkers, schema introspection) to keep signal-to-noise reasonable.
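To illustrate why these shapes are grouped together, here is a minimal, self-contained Java sketch. It uses no dotCMS classes; the table name demo_table, the column id, and the class and method names are hypothetical and exist only for this example. It shows that literal-wrap quoting, String.format assembly, and manual placeholder replacement all yield the same SQL text once a hostile value is supplied, which is the string that would then reach setSQL / setQuery / executeStatement.

```java
// Hypothetical demo class: plain string handling only, no database access.
class SqlInjectionPatternsDemo {

    // Anti-pattern 1: literal-wrap quoting ("'" + var + "'").
    static String literalWrap(String id) {
        return "SELECT * FROM demo_table WHERE id = '" + id + "'";
    }

    // Anti-pattern 2: String.format-based query assembly.
    static String formatted(String id) {
        return String.format("SELECT * FROM demo_table WHERE id = '%s'", id);
    }

    // Anti-pattern 3: manual replacement of a named placeholder.
    static String manualReplace(String id) {
        String sql = "SELECT * FROM demo_table WHERE id = ':id'";
        return sql.replace(":id", id);
    }

    // Safer shape: the SQL text is a constant with a ? placeholder,
    // and the value is bound separately by the database API.
    static final String PARAMETERIZED = "SELECT * FROM demo_table WHERE id = ?";

    public static void main(String[] args) {
        // The embedded quote closes the literal early and appends a condition.
        String hostile = "x' OR '1'='1";

        System.out.println(literalWrap(hostile));
        System.out.println(formatted(hostile));
        System.out.println(manualReplace(hostile));

        // The parameterized text never changes, whatever the value is.
        System.out.println(PARAMETERIZED);
    }
}
```

In all three anti-patterns the caller's value becomes part of the SQL text, so a single quote in the input changes the structure of the query. That is the property the rule pack would need to detect at the call site.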
Non-goals
- Fixing the existing baseline of findings (tracked separately).
- Replacing the Semgrep Cloud Platform ruleset — local rules are additive.
Acceptance criteria