fix: remove localdns metrics exporter flaky security hardening directives check from e2e #8356
Conversation
The systemd security directive validation (steps 13-17) checked static unit file properties via systemctl show over a bastion SSH tunnel. The 8KB WebSocket buffer limit caused random carriage return injection, making grep anchored matches fail on different properties each run. These checks validate build-time unit file configuration that doesn't change at runtime — code review and shellspec tests are the right place for this, not e2e tests over a flaky tunnel. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Validate all 16 systemd security hardening directives are present in the unit file. This replaces the flaky e2e runtime checks that were unreliable over bastion SSH tunnels with deterministic file-level tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR aims to improve e2e reliability by removing flaky LocalDNS exporter systemd security-hardening validation from the e2e path (run over an SSH/WebSocket tunnel) and shifting static unit-file directive validation into ShellSpec, which is a better fit for build-time configuration checks.
Changes:
- Add ShellSpec checks to validate
localdns-exporter@.servicesecurity hardening directives directly from the unit file. - Remove the e2e script’s live
systemctl show-based verification of 16 security directives plus runtime enforcement checks.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| spec/parts/linux/cloud-init/artifacts/localdns_exporter_spec.sh | Adds ShellSpec assertions for security hardening directives in the systemd unit file. |
| e2e/localdns/validate-localdns-exporter-metrics.sh | Removes the flaky live systemd security directive checks and runtime enforcement validations. |
Steps 11-12 (nc connection hold + instance discovery) only existed to support the security directive checks removed in the previous commit. Without them, this code was dead weight that could still cause flaky test failures if no instance was found. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
LGTM on the approach — moving static unit file checks to ShellSpec is the right call given the bastion SSH 8KB WebSocket buffer flakiness.
One note: ShellSpec grepping the source unit file isn't fully equivalent to the previous E2E runtime checks (systemctl show on a live instance). The gap is narrow but exists — e.g. directives ignored by the kernel/systemd version, drop-in overrides, or incorrect file installation during VHD build. For full end-to-end coverage of the metrics exporter (including security hardening enforcement + actual Prometheus scraping pipeline), we should add a real E2E test in aks-rp that validates the metrics are scraped correctly by vmagent
What this PR does / why we need it:
The systemd security directive validation (steps 13-17) in the localdns metrics exporter e2e checked static unit file properties via systemctl show over a bastion SSH tunnel. The 8KB WebSocket buffer limit caused random carriage return injection, making grep anchored matches fail on different properties each run.
These checks validate build-time unit file configuration that doesn't change at runtime. code review and shellspec tests are the right place for this, not e2e tests over a flaky tunnel. So I'm also adding shellspec tests instead for validation.
Which issue(s) this PR fixes:
Fixes #