chore(ci): fix Dependabot Docker paths after src/ refactor #504
Merged
Conversation
The three Docker base image blocks pointed at /JIM.Web, /JIM.Worker, and /JIM.Scheduler, but those paths ceased to exist when commit 70b3555 ("refactor: Move source projects into src/ directory to declutter repo root") relocated the Dockerfiles under src/. Dependabot has therefore been silently failing to scan any of the three Dockerfiles since mid-February, leaving base image digests stale and allowing OS-level CVEs to accumulate in release SBOMs.

Consolidate the three near-identical blocks into a single entry using the multi-directory `directories` key (GitHub, June 2024), pointing at the correct src/ paths. Also include /.devcontainer, which uses a Dockerfile but was never tracked by Dependabot.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
JayVDZ added a commit that referenced this pull request on Apr 10, 2026
The previous scan-base-images job had three gaps that together weakened
JIM's supply chain compliance posture:
1. Scheduler was never scanned. The matrix had three legs but only
covered two Dockerfiles (Web once, Worker twice for runtime+sdk).
Worker and Scheduler use the same base image digest today, but
"they're identical" is a dangerous assumption for security-critical
scanning: nothing prevents them from drifting apart in a future
commit with no scan coverage on the Scheduler leg.
2. Digest-pinning was policy, not enforcement. engineering/DEVELOPER_GUIDE.md
states that production Dockerfiles must pin base images by @sha256:
digest, but there was no CI check enforcing it. A future commit could
silently remove a digest and the change would pass all existing CI.
3. The matrix was hand-maintained. Adding a new production Dockerfile
required a manual ci.yml edit; a forgotten edit would silently
leave the new Dockerfile unscanned. This is exactly the class of
drift that caused the Dependabot path bug fixed in #504.
Rewrite scan-base-images as a two-job discovery-then-scan pattern:
- discover-base-images: a new PowerShell script at
.github/scripts/discover-base-images.ps1 walks the repository for
files named Dockerfile, identifies production images by the
machine-readable directive "# jim-compliance: production-image" on
their own line, parses every external FROM line, enforces
digest-pinning, and emits a deduplicated matrix of unique image
references for the downstream scan job. Discovering zero production
Dockerfiles, or any non-digest-pinned FROM in a production Dockerfile,
fails the build with a clear message.
- scan-base-images: now consumes the dynamic matrix via
needs.discover-base-images.outputs.matrix. Trivy now emits SARIF
instead of table format and uploads findings to GitHub code scanning
(security-events: write permission scoped to this job only), so
vulnerabilities are surfaced in the Security tab and auditable after
the fact, not just visible to whoever happens to read the Actions
log. The severity threshold (CRITICAL,HIGH), exit-code behaviour,
and ignore-unfixed setting are all preserved from the existing job.
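The discovery step's validation logic can be sketched as follows. The repository's actual implementation is the PowerShell script named above; this Python port is purely illustrative, and the function name, regexes, and FROM-parsing details (e.g. no handling of `--platform` flags) are assumptions, not the script's real behaviour. The matrix shape assumes the standard GitHub Actions `include` form.

```python
import json
import re
from pathlib import Path

DIRECTIVE = "# jim-compliance: production-image"
FROM_RE = re.compile(r"^FROM\s+(\S+)", re.IGNORECASE)
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def discover_base_images(repo_root: str) -> str:
    """Walk repo_root for files named Dockerfile, keep those carrying the
    production-image directive, enforce digest pinning on every external
    FROM, and emit a deduplicated GitHub Actions matrix as JSON."""
    images: set[str] = set()
    production_count = 0
    for dockerfile in sorted(Path(repo_root).rglob("Dockerfile")):
        lines = [line.strip() for line in dockerfile.read_text().splitlines()]
        if DIRECTIVE not in lines:
            continue  # dev/test Dockerfile: deliberately untracked
        production_count += 1
        stage_names: set[str] = set()
        for line in lines:
            m = FROM_RE.match(line)
            if not m:
                continue
            ref = m.group(1)
            parts = line.split()
            if len(parts) >= 4 and parts[2].upper() == "AS":
                stage_names.add(parts[3])  # remember multi-stage aliases
            if ref in stage_names or ref == "scratch":
                continue  # internal build stage, not an external image
            if not DIGEST_RE.search(ref):
                raise SystemExit(f"{dockerfile}: FROM is not digest-pinned: {ref}")
            images.add(ref)
    if production_count == 0:
        raise SystemExit("no production Dockerfiles discovered")
    return json.dumps({"include": [{"image": i} for i in sorted(images)]})
```

Both failure modes named above (zero production Dockerfiles, unpinned FROM) abort with a message rather than emitting an empty or partial matrix.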
The three production Dockerfiles (src/JIM.Web, src/JIM.Worker,
src/JIM.Scheduler) are labelled with the compliance directive. The
.devcontainer/Dockerfile and integration test fixtures under
test/integration/docker/ are deliberately left unlabelled: they are
dev and test infrastructure, not customer-shipped artefacts, and
tracking upstream tags is the correct behaviour for them.
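A labelled production Dockerfile then carries the directive alongside its pinned base image. The image name below is illustrative and the digest is a placeholder, not a value from the repository:

```dockerfile
# jim-compliance: production-image
# FROM must be pinned by digest; the digest below is a placeholder.
FROM mcr.microsoft.com/dotnet/aspnet:10.0-noble@sha256:<digest-of-approved-base-image>
```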
Adding a new production Dockerfile now requires only adding the
compliance directive to the file itself; discovery and scanning are
automatic. The approach is locality-of-reference correct: the policy
lives with the artefact, eliminating the class of drift that caused
the Dependabot path breakage.
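The discovery-then-scan wiring might look like the sketch below. The job names, script path, and matrix output come from the description above; the step layout, `pwsh` invocation, and action versions are assumptions, not the workflow's exact contents:

```yaml
jobs:
  discover-base-images:
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.discover.outputs.matrix }}
    steps:
      - uses: actions/checkout@v4
      - id: discover
        shell: pwsh
        run: ./.github/scripts/discover-base-images.ps1

  scan-base-images:
    needs: discover-base-images
    runs-on: ubuntu-latest
    permissions:
      security-events: write   # scoped to this job only, for SARIF upload
    strategy:
      matrix: ${{ fromJSON(needs.discover-base-images.outputs.matrix) }}
    steps:
      - name: Scan ${{ matrix.image }}
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: ${{ matrix.image }}
          format: sarif
          output: trivy-results.sarif
      - name: Upload Trivy scan results
        if: always()
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: trivy-results.sarif
```

Because the matrix is computed at run time, a newly labelled Dockerfile flows into the scan job with no edit to ci.yml.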
engineering/COMPLIANCE_MAPPING.md is updated to reflect that:
- Base image digest-pinning is now machine-enforced, strengthening
alignment with NIST CSF GV.SC (Supply Chain Risk Management),
UK Software Security Code of Practice Principle 7 (Manage and
secure third-party components), and NIST SP 800-53 SI-3
(Malicious Code Protection).
- A "Planned" entry is added under Code of Practice Principle 8
(Deploy securely) and NIST SP 800-53 SA-11 (Developer Testing and
Evaluation) referencing #518, which tracks the future pre-release
integration test gate.
- Document version bumped 1.0 -> 1.1.
engineering/DEVELOPER_GUIDE.md is updated to document the compliance
directive convention and the CI enforcement so future contributors
know how to label new production Dockerfiles.
Related tracking issues created alongside this change:
- #517: Pin all GitHub Actions by commit SHA (v0.9-STABILISATION)
- #518: Release gate for full integration test suite (v1.0-ILM-COMPLETE)
- #519: Continuous SBOM generation on main (v1.0-ILM-COMPLETE)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
JayVDZ added a commit that referenced this pull request on Apr 10, 2026
chore(ci): enforce digest-pinning and cover all production base images (#520)

* chore(ci): enforce digest-pinning and cover all production base images

(Commit message identical to the commit quoted in full above.)
* chore(ci): disable Trivy cache and pin least-privilege perms on metrics-sync

Two workflow hardening fixes surfaced by the first CI run on this PR:

1. Disable Trivy DB cache on scan-base-images. In the first CI run, all three Trivy scan legs exited 1 within ~12ms of logging "Detecting vulnerabilities", without writing any findings to the SARIF file. Running Trivy v0.69.3 locally against the exact same image digests with the exact same flags returned exit 0 with zero findings in every leg, confirming the base images are actually clean. The phantom exit 1 in CI tracks to a corrupted Trivy vuln DB cache restored from key "cache-trivy-2026-04-10". Setting cache: 'false' on the scan step sidesteps the cache and forces a fresh DB download per run, which adds ~5-10s per matrix leg but is acceptable for a security-critical scan.

2. Add explicit "permissions: contents: read" at the workflow level on metrics-sync.yml. This satisfies the CodeQL GitHub Actions analyser finding "Workflow does not contain permissions" (actions/missing-workflow-permissions, #102), which surfaced in the Security tab when SARIF upload started working. The workflow only needs to checkout + git diff (reads) and then dispatch to TetronIO/jim-metrics via a separate PAT stored in secrets.METRICS_REPO_DISPATCH_TOKEN, so the default GITHUB_TOKEN needs no writes. contents: read is the correct minimum.

Compliance alignment: both fixes strengthen UK Software Security Code of Practice Principle 5 (Protect the build environment) and NIST CSF GV.SC (Supply Chain Risk Management).
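The second fix is a single block at the top of metrics-sync.yml:

```yaml
# metrics-sync.yml — workflow-level default for all jobs
permissions:
  contents: read   # checkout + git diff only; repo dispatch uses a separate PAT
```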
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore(ci): pre-pull base image and evaluate SARIF findings explicitly

The previous CI run still had all three scan-base-images legs failing with exit 1 despite disabling the Trivy DB cache. Further investigation showed this was caused by image acquisition, not vulnerability detection:

- Running Trivy v0.69.3 locally with the exact CI env vars against the exact image digest via docker-in-docker always returned exit 0 with zero findings written to a valid SARIF file.
- The CI log for each scan leg showed Trivy reaching the "Detecting vulnerabilities" INFO line and then exiting 1 ~5.7 seconds later with no further output.
- Locally, Trivy's DEBUG output showed it was finding the image via source="docker" (the local Docker daemon), which had the image pre-pulled as part of this session's earlier troubleshooting.
- On GitHub-hosted runners, there is no pre-pulled image, and the trivy-action's image acquisition path appears to fail silently when combined with format=sarif output, producing exit 1 with no finding content.

Three fixes:

1. Add an explicit "docker pull" step before the Trivy scan. This guarantees Trivy finds the image via source="docker" and skips whatever acquisition path was silently failing.
2. Set trivy-action exit-code from 1 to 0. The action's exit-code mechanism is the observed point of failure. Instead of trusting it, we evaluate findings ourselves in a follow-up step.
3. Add a PowerShell "Fail build on Trivy findings" step that parses trivy-results.sarif, counts runs[*].results[*], prints a clear "N findings" message, links to the Security tab, and exits 1 if any findings were reported. This is locally testable (verified against both a clean SARIF and a synthetic SARIF with one finding), produces better log output than the previous behaviour, and is robust against whatever trivy-action internal path was misbehaving.

The Upload Trivy scan results step is unchanged and still runs on if: always(), so SARIF findings reach the Security tab regardless of whether the fail-build step fires.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore(ci): force Trivy severity filter via step env vars not action inputs

The previous CI run produced 36 Trivy alerts in the Security tab (12 per scan leg) despite the action being configured with severity: CRITICAL,HIGH. Investigation of the alerts revealed:

- All 36 alerts are LOW severity openssl CVEs (CVE-2026-28387 through CVE-2026-31790). Trivy's own "Severity" field on each rule says LOW. Trivy's security_severity_level on each rule says "low". GitHub Code Scanning classifies them as "low" or "medium".
- The same Trivy version (0.69.3) running locally against the same image digests with the same environment variables returned zero findings, correctly filtering LOW-severity CVEs out at the source.
- The trivy-action wrapper uses a set_env_var_if_provided shell helper that writes TRIVY_SEVERITY=CRITICAL,HIGH to a temp file named trivy_envs.txt. Based on the CI log, this file is generated but the severity filter is not being honoured by Trivy at scan time, allowing LOW CVEs to leak through when format=sarif is in use.

Fix: remove the severity and ignore-unfixed inputs from the trivy-action step entirely, and set TRIVY_SEVERITY and TRIVY_IGNORE_UNFIXED as step-level environment variables instead. Trivy reads these directly from its own environment, bypassing trivy-action's wrapper logic, which is the observed point of failure.

This preserves the existing structure (docker pull -> trivy scan -> PowerShell SARIF evaluation -> code scanning upload) and only changes how the filter flags reach Trivy. If step-level env vars still don't take effect, the next escalation is running trivy as a direct shell command via aquasecurity/setup-trivy, bypassing trivy-action entirely.
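The scan step after this change might look like the fragment below, with the filter flags reaching Trivy as step-level environment variables rather than action inputs. Step name and action version are illustrative, and the `cache`/`exit-code` inputs reflect the earlier fixes described above:

```yaml
- name: Run Trivy vulnerability scanner
  uses: aquasecurity/trivy-action@master
  env:
    TRIVY_SEVERITY: CRITICAL,HIGH   # read directly by Trivy, bypassing the wrapper
    TRIVY_IGNORE_UNFIXED: "true"    # don't block on CVEs with no upstream fix
  with:
    image-ref: ${{ matrix.image }}
    format: sarif
    output: trivy-results.sarif
    cache: 'false'                  # fresh vuln DB each run
    exit-code: '0'                  # findings are evaluated by a follow-up step
```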
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore(ci): filter Trivy findings by CVSS score in PowerShell, not Trivy

Investigation of the previous CI run revealed why we kept getting phantom HIGH findings:

- We queried code scanning for the 36 Trivy alerts on the PR ref and inspected each rule's properties.
- All 36 findings had CVSS security-severity scores in the 2.0-5.5 range (low/medium), with tags ["LOW", "security", "vulnerability"].
- Despite TRIVY_SEVERITY=CRITICAL,HIGH being set both via the trivy-action input AND as a step-level environment variable, Trivy in CI did not filter these LOW-severity findings out of its SARIF output.
- Running the same Trivy version (0.69.3) locally with the exact same env vars correctly filtered them out, returning zero results.

Rather than continue diagnosing why Trivy's severity filter is unreliable when running under trivy-action in CI, switch to a two-stage approach:

1. Trivy scans without a severity filter and writes everything it finds to the SARIF file. This is reliable.
2. The PowerShell evaluation step reads each rule's CVSS score from rule.properties.'security-severity' (the same field GitHub Code Scanning uses to classify alerts), and counts only findings with CVSS >= 7.0 as blocking (HIGH or CRITICAL).

This is strictly better than relying on Trivy's filter:

- It uses CVSS as the source of truth, which is the industry standard for severity.
- It matches how GitHub Code Scanning classifies alerts in the Security tab, so our gate and the Security tab agree.
- It is fully testable locally (verified with both a real Trivy SARIF containing 17 results [14 LOW, 3 MEDIUM, 0 HIGH/CRITICAL] and a synthetic SARIF containing 1 LOW + 1 HIGH + 1 CRITICAL).
- It surfaces a clear severity breakdown in the CI log ("CRITICAL: 0, HIGH: 0, MEDIUM: 3, LOW: 14") and lists each blocking CVE by ID and CVSS when failing.
- It is robust against trivy-action wrapper bugs.

TRIVY_IGNORE_UNFIXED=true is preserved as a step env var so we do not block on CVEs that have no upstream fix yet. For the current state of the digest-pinned base images, this parser correctly classifies every finding as LOW or MEDIUM, so all three scan legs should now pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: document upstream-only base image CVE response procedure

When the scan-base-images CI gate fires on a fixable HIGH/CRITICAL CVE that lives in a Microsoft-published base image layer (e.g., Ubuntu package CVEs in dotnet/runtime:10.0-noble that have an upstream fix but have not yet been absorbed into a refreshed Microsoft image), the JIM project cannot apply the fix directly. The fix has to come from a Microsoft rebuild, which happens on its own cadence.

This commit documents what to do in that situation:

- engineering/DEVELOPER_GUIDE.md gains a new subsection ("When the scan-base-images gate blocks on an upstream-only CVE") under the existing Docker Base Images section. It explains the four available response options in order of preference (wait for the Microsoft rebuild, in-Dockerfile apt-get upgrade, temporary gate threshold downgrade, or alert dismissal in the Security tab) and explicitly forbids continue-on-error: true as a permanent workaround.
- engineering/COMPLIANCE_MAPPING.md gains a new "Operational Considerations" section that briefly describes the situation, acknowledges it as a known limitation of digest-pinned base images, clarifies it is not a compliance gap (digest pinning, scanning, and SBOM generation all still operate correctly), and links readers to the developer guide for the response procedure.

There is no code change in this commit. The operational reality has not changed; the documentation is being added now because the investigation that produced PR #520 surfaced the question and we want the answer captured before it is forgotten.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
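The CVSS-based SARIF gate described in the squash message above can be sketched as follows. The repository's actual step is PowerShell; this Python port is illustrative only, and the function name and return shape are assumptions. It reads each rule's `security-severity` property, the field GitHub Code Scanning itself classifies on:

```python
import json


def blocking_findings(sarif_path: str, threshold: float = 7.0) -> list[tuple[str, float]]:
    """Return (ruleId, CVSS) pairs for SARIF results at or above threshold.

    Reads rule.properties['security-severity'] — the same field GitHub Code
    Scanning uses to classify alerts — so this gate and the Security tab
    agree on what counts as HIGH/CRITICAL.
    """
    with open(sarif_path) as f:
        sarif = json.load(f)
    blocking = []
    for run in sarif.get("runs", []):
        # Index the tool's rules so each result can be joined to its severity.
        rules = {r["id"]: r
                 for r in run.get("tool", {}).get("driver", {}).get("rules", [])}
        for result in run.get("results", []):
            props = rules.get(result.get("ruleId"), {}).get("properties", {})
            score = float(props.get("security-severity", 0.0))
            if score >= threshold:
                blocking.append((result["ruleId"], score))
    return blocking
```

A CI step would fail the build when the returned list is non-empty, printing each blocking CVE by ID and score.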
Closed
Summary
Dependabot's three Docker blocks for JIM.Web, JIM.Worker, and JIM.Scheduler have been silently failing for ~8 weeks. The paths reference the pre-`src/` layout and no longer match where the Dockerfiles live.
Root cause
Dependabot PRs #321 (chore(deps): Bump dotnet/sdk from `a574e62` to `03a7d87` in /JIM.Worker) and #326 (chore(deps): Bump dotnet/sdk from `a574e62` to `03a7d87` in /JIM.Web) were updating base image digests against `/JIM.Web/Dockerfile`, `/JIM.Worker/Dockerfile`, and `/JIM.Scheduler/Dockerfile`.
Fix
Consolidate the three near-identical Docker blocks into a single entry using the multi-directory `directories` key (GitHub, June 2024), pointing at the correct `src/` paths. Also include `.devcontainer/`, which ships a Dockerfile but was never tracked by Dependabot.
```yaml
directories:
  - "/src/JIM.Web"
  - "/src/JIM.Worker"
  - "/src/JIM.Scheduler"
  - "/.devcontainer"
```
All other settings (schedule, reviewers, groups, ignore rules) are preserved unchanged.
What to expect after merge
Test plan