Add Semgrep rules to catch SQL injection anti-patterns in PR scans #35547

@mbiuki

Summary

The current semgrep-cloud-platform/scan ruleset for dotCMS/core does not include rules tailored to dotCMS's DotConnect and HibernateUtil SQL APIs. As a result, classic SQL injection anti-patterns in our DB layer (raw string concatenation into setSQL / setQuery / executeStatement, String.format-based query assembly, and "'" + var + "'" literal-wrap quoting) are not flagged by PR scans today.

This issue tracks adding a .semgrep/dotcms-sqli.yml rule pack that catches these patterns and wiring it into the PR Semgrep GitHub Action so every PR is scanned against it.

Background

semgrep ci runs against every PR via .github/workflows/cicd_comp_semgrep-phase.yml, but it currently uses only the rules configured in the Semgrep Cloud Platform UI. Adding --config .semgrep/ to the semgrep ci invocation lets us version-control dotCMS-specific rules alongside the code, and run them against PRs in addition to the cloud rules.

Goals

  • Catch new occurrences of these patterns in PR scans before merge:
    • dc.setSQL(... + var + ...) and dc.setSQL(String.format(...))
    • dc.executeStatement(... + var + ...) and dc.executeStatement(String.format(...))
    • dh.setQuery(... + var + ...) and dh.setQuery(String.format(...)) (HibernateUtil)
    • "'" + var + "'" and "... = '" + var + "'..." (literal-wrap quoting)
    • sql.replace(":named", value) (manual placeholder replacement)
  • Use Semgrep's diff-aware mode so the existing baseline doesn't break unrelated PRs.
  • Exclude known-safe areas (startup Task migrations, integrity checkers, schema introspection) to keep signal-to-noise reasonable.
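As a starting point for discussion, the patterns above could be expressed roughly as follows. This is a minimal sketch, not the final rule pack: the rule ids, severities, messages, and the exclude globs are all hypothetical placeholders, and the patterns have not been tuned against the existing baseline.

```yaml
rules:
  # Hypothetical rule id; the real pack may split this into one rule per API.
  - id: dotcms-sqli-concat-or-format
    languages: [java]
    severity: ERROR
    message: >-
      SQL passed to setSQL / executeStatement / setQuery is built with string
      concatenation or String.format. Use bind parameters instead.
    patterns:
      - pattern-either:
          - pattern: $OBJ.setSQL($LEFT + $RIGHT)
          - pattern: $OBJ.setSQL(String.format(...))
          - pattern: $OBJ.executeStatement($LEFT + $RIGHT, ...)
          - pattern: $OBJ.executeStatement(String.format(...), ...)
          - pattern: $OBJ.setQuery($LEFT + $RIGHT)
          - pattern: $OBJ.setQuery(String.format(...))
      # Intended to skip calls whose first argument Semgrep can resolve
      # to a constant string (for example, two literals joined with +).
      - pattern-not: $OBJ.$METHOD("...", ...)
    paths:
      exclude:
        # Placeholder globs for the known-safe areas; real paths must be
        # confirmed against the repository layout.
        - "**/startup/**"

  # Hypothetical rule id. Covers only the bare "'" + var + "'" form; the
  # "... = '" + var + "'" variant would need an additional pattern.
  - id: dotcms-sqli-literal-wrap-quoting
    languages: [java]
    severity: WARNING
    message: >-
      Value is wrapped in single quotes by concatenation. Use a bind
      parameter instead of quoting by hand.
    pattern: |
      "'" + $VAR + "'"
```

The first rule deliberately matches on the method name rather than the receiver type, so it may also flag unrelated classes that happen to expose a setSQL or setQuery method; narrowing by type is a possible follow-up if that turns out to be noisy.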

Non-goals

  • Fixing the existing baseline of findings (tracked separately).
  • Replacing the Semgrep Cloud Platform ruleset — local rules are additive.

Acceptance criteria

  • .semgrep/dotcms-sqli.yml exists and validates with semgrep --validate --config .semgrep/dotcms-sqli.yml.
  • cicd_comp_semgrep-phase.yml invokes semgrep ci with --config .semgrep/.
  • Local rules catch the PublishAuditAPIImpl.java:231 pattern (regression test).
  • PRs that introduce a new setSQL(... + var + ...) or similar fail the PR Semgrep check.
  • PRs that don't touch DB-layer code see no new findings.
