Skip to content

fix: remove localdns metrics exporter flaky security hardening directives check from e2e #8356

Merged
ganeshkumarashok merged 3 commits intomainfrom
jingwenwu/fix-restrict-address-families-check
Apr 20, 2026
Merged

fix: remove localdns metrics exporter flaky security hardening directives check from e2e #8356
ganeshkumarashok merged 3 commits intomainfrom
jingwenwu/fix-restrict-address-families-check

Conversation

@jingwenw15
Copy link
Copy Markdown
Member

@jingwenw15 jingwenw15 commented Apr 20, 2026

What this PR does / why we need it:

The systemd security directive validation (steps 13-17) in the localdns metrics exporter e2e checked static unit file properties via systemctl show over a bastion SSH tunnel. The 8KB WebSocket buffer limit caused random carriage return injection, making grep anchored matches fail on different properties each run.

These checks validate build-time unit file configuration that doesn't change at runtime. code review and shellspec tests are the right place for this, not e2e tests over a flaky tunnel. So I'm also adding shellspec tests instead for validation.

Screenshot 2026-04-20 at 2 03 53 PM

Which issue(s) this PR fixes:

Fixes #

jingwenw15 and others added 2 commits April 20, 2026 13:38
The systemd security directive validation (steps 13-17) checked static
unit file properties via systemctl show over a bastion SSH tunnel. The
8KB WebSocket buffer limit caused random carriage return injection,
making grep anchored matches fail on different properties each run.

These checks validate build-time unit file configuration that doesn't
change at runtime — code review and shellspec tests are the right place
for this, not e2e tests over a flaky tunnel.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Validate all 16 systemd security hardening directives are present in
the unit file. This replaces the flaky e2e runtime checks that were
unreliable over bastion SSH tunnels with deterministic file-level tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to improve e2e reliability by removing flaky LocalDNS exporter systemd security-hardening validation from the e2e path (run over an SSH/WebSocket tunnel) and shifting static unit-file directive validation into ShellSpec, which is a better fit for build-time configuration checks.

Changes:

  • Add ShellSpec checks to validate localdns-exporter@.service security hardening directives directly from the unit file.
  • Remove the e2e script’s live systemctl show-based verification of 16 security directives plus runtime enforcement checks.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
spec/parts/linux/cloud-init/artifacts/localdns_exporter_spec.sh Adds ShellSpec assertions for security hardening directives in the systemd unit file.
e2e/localdns/validate-localdns-exporter-metrics.sh Removes the flaky live systemd security directive checks and runtime enforcement validations.

Comment thread e2e/localdns/validate-localdns-exporter-metrics.sh Outdated
Steps 11-12 (nc connection hold + instance discovery) only existed to
support the security directive checks removed in the previous commit.
Without them, this code was dead weight that could still cause flaky
test failures if no instance was found.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Member

@yewmsft yewmsft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM on the approach — moving static unit file checks to ShellSpec is the right call given the bastion SSH 8KB WebSocket buffer flakiness.

One note: ShellSpec grepping the source unit file isn't fully equivalent to the previous E2E runtime checks (systemctl show on a live instance). The gap is narrow but exists — e.g. directives ignored by the kernel/systemd version, drop-in overrides, or incorrect file installation during VHD build. For full end-to-end coverage of the metrics exporter (including security hardening enforcement + actual Prometheus scraping pipeline), we should add a real E2E test in aks-rp that validates the metrics are scraped correctly by vmagent

@ganeshkumarashok ganeshkumarashok merged commit f880933 into main Apr 20, 2026
25 of 32 checks passed
@ganeshkumarashok ganeshkumarashok deleted the jingwenwu/fix-restrict-address-families-check branch April 20, 2026 23:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants