Skip to content

Add agent readiness runtime support#345

Merged
PrzemyslawKlys merged 7 commits intomainfrom
codex/agent-readiness-runtime-support
May 5, 2026
Merged

Add agent readiness runtime support#345
PrzemyslawKlys merged 7 commits intomainfrom
codex/agent-readiness-runtime-support

Conversation

@PrzemyslawKlys
Copy link
Copy Markdown
Member

@PrzemyslawKlys PrzemyslawKlys commented May 5, 2026

Summary

Adds reusable PowerForge.Web support for agent-readiness runtime behavior across Apache-hosted sites.

Changes

  • Adds an Apache support configuration model for generated .htaccess behavior.
  • Emits managed Apache directives for Link headers, Content-Signal, markdown negotiation, discovery resource headers, and markdown content types.
  • Extends verification to understand generated .htaccess headers and declarative WebMCP annotations.
  • Preserves quoted JSON-LD script types after HTML optimization so brittle scanners still recognize schema blocks.
  • Updates schema/docs and adds focused tests for Apache support, WebMCP detection, and JSON-LD optimization.

Operator Note

Missing optional agent-skills and agents-json remote scan resources now report info instead of fail. Sites that used those failures as CI gates should enable explicit required discovery checks or inspect those IDs directly if they want absence to remain blocking.

Why

Cloudflare/isagentready-style checks expect runtime headers and agent discovery signals that static artifacts alone cannot express reliably on Apache deployments. This makes the behavior reusable from the engine instead of requiring site-specific .htaccess hacks.

Validation

  • dotnet test .\PowerForge.Tests\PowerForge.Tests.csproj --filter "WebAgentReadinessTests|WebSiteAuditOptimizeBuildTests" --no-restore
  • dotnet build .\PowerForge.Web\PowerForge.Web.csproj --no-restore
  • git diff --check

@claude
Copy link
Copy Markdown

claude Bot commented May 5, 2026

Code Review — PR #345: Add agent readiness runtime support

Overview

This PR adds Apache .htaccess generation to the PowerForge.Web agent-readiness pipeline, broadens WebMCP detection to include declarative tool annotations, fixes HTML optimization stripping quotes from application/ld+json script type attributes, and makes a handful of related improvements to scan result reporting and HTTP header inspection. The scope is reasonable and the approach is generally sound.


Positives

  • Good test coverage. Three focused integration tests cover Apache config generation, declarative WebMCP detection, and JSON-LD quote preservation. The test for Apache output is particularly thorough, asserting specific directive values.
  • Idempotent .htaccess management. Using RemoveGeneratedBlock() before re-writing prevents accumulation of managed blocks on repeated runs.
  • Proper escaping helpers. EscapeApacheQuotedValue and EscapeApacheExpressionString are correctly implemented and consistently used.
  • HeaderExists/HeaderContains refactor. Checking both response and content headers is more correct and the expanded form is easier to reason about.
  • agent-skills/agents-json downgrade to info. These are optional resources, so fail was too aggressive. The info level is appropriate.

Issues

1. Bug — Vary: Accept may not be emitted when only markdown negotiation is enabled

WebAgentReadiness.cs, UpdateApacheConfig:

if (writeLinkHeaders || writeContentSignals || security.Enabled || writeDiscoveryHeaders)
{
    sb.AppendLine("<IfModule mod_headers.c>");
    if (writeMarkdownNegotiation)
        sb.AppendLine("  Header merge Vary \"Accept\"");
    ...
}

Vary: Accept is nested inside the outer <IfModule mod_headers.c> block, which is only opened when at least one of writeLinkHeaders, writeContentSignals, security.Enabled, or writeDiscoveryHeaders is true. If a user sets all four to false but leaves MarkdownNegotiation = true, the Vary header is silently omitted even though the rewrite rules are still emitted. Caches upstream will serve stale HTML to markdown-accepting clients.

Fix: Include writeMarkdownNegotiation in the outer block condition, or move the Vary header into its own <IfModule mod_headers.c> sub-block inside the if (writeMarkdownNegotiation) section.


2. False positives — HtmlContainsWebMcpSignal substring search is too broad

WebAgentReadiness.cs:

private static bool HtmlContainsWebMcpSignal(string html)
    => !string.IsNullOrWhiteSpace(html) &&
       (html.Contains("navigator.modelContext.provideContext", StringComparison.Ordinal) ||
        html.Contains("tool-name=", StringComparison.OrdinalIgnoreCase) ||
        html.Contains("toolname=", StringComparison.OrdinalIgnoreCase) ||
        html.Contains("tool-description=", StringComparison.OrdinalIgnoreCase) ||
        html.Contains("tooldescription=", StringComparison.OrdinalIgnoreCase));

tool-name= and tool-description= are very generic substrings. Any page containing these strings in a comment, a JavaScript string literal, a CSS custom property, or a third-party attribute (e.g. <button data-tool-name="copy">) will produce a false positive. The imperative signal (navigator.modelContext.provideContext) is precise; the declarative ones are not.

Consider tightening to attribute-in-tag context: check for \btool-name\s*= on actual HTML elements (a lightweight regex anchored to tag boundaries), or document the conservatism explicitly in the check message so operators know what it means.


3. Fragile — effectiveHeadersText merges two incompatible formats for security header checks

WebAgentReadiness.cs, Verify:

var effectiveHeadersText = string.Join(Environment.NewLine, headersText, apacheText);
...
AddSecurityHeaderChecks(checks, effectiveHeadersText, ...);

_headers uses Header-Name: value syntax while .htaccess uses Header set Header-Name "value". If AddSecurityHeaderChecks was written expecting the Cloudflare _headers format, security header checks will silently fail when only an .htaccess is present, even if the headers are correctly declared there. The link-header check explicitly handles both formats with separate Contains calls, but AddSecurityHeaderChecks does not appear to do the same.

Either extend AddSecurityHeaderChecks to accept the two formats, or run it twice — once against each file with its own path — and union the results.


4. Design — AgentApacheSupportSpec.OutputPath has a confusing nullable annotation

Models/AgentReadinessSpec.cs:

public string? OutputPath { get; set; } = ".htaccess";

The property is string? (nullable) but initialized to a non-null default. This implies callers must still guard against explicit null (e.g. JSON deserialization with "outputPath": null), which is why the null checks are duplicated throughout:

string.IsNullOrWhiteSpace(spec.Apache?.OutputPath) ? ".htaccess" : spec.Apache!.OutputPath!

This pattern appears in at least three places. A cleaner approach would be a computed property:

public string EffectiveOutputPath => string.IsNullOrWhiteSpace(OutputPath) ? ".htaccess" : OutputPath!;

This is a minor DRY issue but also reduces the chance of a caller forgetting the fallback.


5. Limitation (worth documenting) — RewriteRule misses non-trailing-slash deep paths

UpdateApacheConfig:

RewriteRule ^$ /index.md [L,T=text/markdown]          ← root only
RewriteRule ^(.+)/$ /$1/index.md [L,T=text/markdown]  ← trailing-slash dirs

A request to /blog/post (no trailing slash) with Accept: text/markdown will not match either rule. This is likely intentional for a static-site generator context (where pages are typically served as index.html under a directory), but it is a silent gap: the user gets HTML back with no indication that markdown negotiation failed. A note in the doc block or a comment in the generated .htaccess block would help operators debug this.


6. Minor — Directory.CreateDirectory with null-forgiving operator

UpdateApacheConfig:

Directory.CreateDirectory(Path.GetDirectoryName(path)!);

Path.GetDirectoryName returns null for root paths on some platforms. While this can't happen in practice (.htaccess will be inside siteRoot), the ! silences the nullable warning rather than making the safety contract explicit. A guard like var dir = Path.GetDirectoryName(path); if (!string.IsNullOrEmpty(dir)) Directory.CreateDirectory(dir); would be more defensive and compiler-friendly.


7. Minor — unrelated scan behavior change is bundled

The agent-skills and agents-json severity downgrade from fail to info is a sensible fix but is unrelated to the Apache feature. If there are any consumers expecting fail status for those checks (monitoring, CI gates), this is a quiet breaking change. A mention in the PR description or a separate commit would make the intent easier to audit.


Summary

The core implementation is solid and the test coverage is good. The Vary: Accept omission under a specific configuration is the only functional bug. The false-positive risk in WebMCP detection and the mixed-format security header check are the next most important issues. Everything else is minor polish.

@claude
Copy link
Copy Markdown

claude Bot commented May 5, 2026

Code Review: PR #345 — Add agent readiness runtime support

Overview

This PR adds Apache .htaccess generation for agent-readiness signals (Link headers, Content-Signals, Markdown negotiation, discovery resource Content-Type/CORS headers). It also extends WebMCP detection to cover declarative HTML tool attributes, downgrades missing optional discovery documents from fail to info in the remote scan, and fixes a post-minification regression where the HTML optimizer was stripping quotes from type="application/ld+json".

The feature is well-scoped and the implementation is consistent with the existing _headers generation pattern. Test coverage is solid.


Code Quality and Best Practices

Positives:

  • AgentApacheSupportSpec is clean and symmetrical with existing spec classes; sensible opt-in defaults (Enabled = false).
  • Extraction of CreateParentDirectory eliminates a subtle inline Directory.CreateDirectory that was missing the null guard; good defensive refactoring.
  • BuildLinkHeaderValue, BuildContentSignalValue, EscapeApacheQuotedValue, and EscapeApacheExpressionString are small and focused.
  • Managed block strategy (AgentBlockStart/AgentBlockEnd) is reused correctly, keeping the idempotent append/replace pattern consistent.
  • The comment explaining why per-file Header blocks are avoided for Markdown artifacts is exactly the right kind of inline comment.
  • ScanAsync change downgrading agent-skills / agents-json from fail to info aligns with the stated design principle of not forcing optional discovery contracts.

Minor concerns:

  1. Bare catch in HtmlContainsWebMcpSignal silently swallows all exceptions from HtmlParser.ParseWithAngleSharp. Returning false is a safe fallback, but a thrown ObjectDisposedException or OutOfMemoryException would be suppressed too. Consider catching only Exception (or a narrower type) and either logging or re-throwing fatal exceptions.

  2. (spec.Apache ?? new AgentApacheSupportSpec()).EffectiveOutputPath in the Verify path allocates a throwaway object on every call. A private constant DefaultHtaccessPath = ".htaccess" used directly would be simpler and clearer.

  3. writeDiscoveryHeaders condition: var writeDiscoveryHeaders = apache.DiscoveryResourceHeaders defaults to true, so the outer <IfModule mod_headers.c> block will always open even when no discovery specs are enabled and only the Vary header or nothing is written. Not a correctness bug, but the emitted .htaccess would contain an empty or near-empty mod_headers block in minimal configurations.


Potential Bugs

  1. HeaderDirectiveExists regex is not compiled. The method is called 5–6 times per Verify invocation and constructs two Regex objects from user-controlled strings each time. Regex.Escape is correct, but the pattern should use RegexOptions.Compiled or be pre-compiled for repeated use. Currently this re-JITs on every security header check.

  2. JsonLdScriptTypeRegex lookahead(?=(?:\s|>|/)) correctly handles > and / as terminators but would miss an unquoted type immediately followed by \r (bare carriage return). This is an edge case that is unlikely to matter in practice but worth noting.

  3. No path traversal test for apache.OutputPath. The existing Prepare_RejectsAgentOutputPathOutsideSiteRoot test covers HeadersPath. An equivalent test for Apache.OutputPath = "../../etc/something" is missing. If ResolveSitePath already enforces boundary checking, a test still helps document the guarantee.


Performance Considerations

  1. HtmlContainsWebMcpSignal invokes AngleSharp for every HTML file when looking for declarative tool annotations. For a 500-file site with no WebMCP signal, this parses all 500 documents. A fast pre-check before entering AngleSharp would help significantly:

    if (!html.Contains("tool-name", StringComparison.OrdinalIgnoreCase) &&
        !html.Contains("tool-description", StringComparison.OrdinalIgnoreCase))
        return false;
  2. As noted above, HeaderDirectiveExists uses uncompiled Regex.IsMatch in a hot path. Pre-compiling the patterns (or using a GeneratedRegex attribute if .NET 7+ is the target) would eliminate repeated regex compilation.


Security Considerations

  • EscapeApacheQuotedValue (escapes \ and ") and EscapeApacheExpressionString (escapes \ and ') look correct for their respective contexts.
  • The values being embedded come from spec configuration, not from untrusted user input, which keeps the injection risk low.
  • CORS (Access-Control-Allow-Origin: *) in Verify_PassesSecurityHeadersFromApacheConfig is tested correctly; the wildcard is intentional for agent discovery resources.
  • No concerns about the managed block removal regex — it's anchored to the known sentinel strings.

Summary

Area Status
Core feature correctness ✅ Looks correct
Test coverage ✅ Solid; one gap (Apache output path traversal)
Performance ⚠️ AngleSharp called unconditionally; regex not compiled
Security ✅ No issues with current scope
Code style ✅ Consistent with existing codebase

Suggested before merge:

  1. Add a fast Contains pre-check in HtmlContainsWebMcpSignal before invoking AngleSharp.
  2. Add a test asserting that an Apache.OutputPath pointing outside the site root is rejected (mirrors the existing headers-path test).
  3. Consider narrowing the catch in HtmlContainsWebMcpSignal to avoid suppressing fatal exceptions.

@claude
Copy link
Copy Markdown

claude Bot commented May 5, 2026

Code Review — PR #345: Add agent readiness runtime support

Overview

This PR adds Apache .htaccess generation to PowerForge.Web's agent-readiness pipeline, enabling mod_headers/mod_rewrite rules for Link headers, Content-Signal, Markdown negotiation, and discovery resource headers on Apache-hosted static sites. It also refines WebMCP detection to accept declarative HTML tool annotations alongside the existing imperative navigator.modelContext check, fixes a JSON-LD script type quote-stripping bug in the HTML optimizer, and downgrades missing optional discovery documents from fail to info in remote scans.

The change is well-scoped and the new model, service, schema, and tests all hang together coherently.


Potential Bug: Unconditional Apache file read in Verify()

In WebAgentReadiness.cs, apacheText is always read regardless of whether Apache support is enabled:

var apachePath = ResolveSitePath(siteRoot, spec.Apache?.EffectiveOutputPath ?? AgentApacheSupportSpec.DefaultOutputPath);
var apacheText = File.Exists(apachePath) ? File.ReadAllText(apachePath) : string.Empty;
var effectiveHeadersText = string.Join(Environment.NewLine, headersText, apacheText);

If a site has an existing hand-crafted .htaccess (e.g., with redirect rules that coincidentally contain Strict-Transport-Security in a comment), the security header checks will produce false positives even when spec.Apache is null or Enabled = false. Consider gating the read on spec.Apache?.Enabled == true.


Security: Missing newline sanitization in Apache header values

EscapeApacheQuotedValue escapes \ and " but does not strip or reject newline characters (\r, \n). A newline embedded in a link Href or Rel (sourced from site.json) could break the .htaccess structure and inject arbitrary Apache directives. Since this is a build tool fed by developer-controlled config the practical risk is low, but sanitizing newlines in AppendApacheHeaderSet/AppendApacheHeaderAdd would make the output reliably safe:

private static string EscapeApacheQuotedValue(string value)
    => value
        .Replace("\r", string.Empty, StringComparison.Ordinal)
        .Replace("\n", string.Empty, StringComparison.Ordinal)
        .Replace("\\", "\\\\", StringComparison.Ordinal)
        .Replace("\"", "\\\"", StringComparison.Ordinal);

JSON-LD Regex Edge Case

The new JsonLdScriptTypeRegex pattern:

(<script\b[^>]*?\btype\s*=\s*)application/ld\+json(?=(?:\s|>|/))

The [^>]*? segment will stop at the first > it encounters, which is correct for well-formed HTML. However, an inline event handler like <script onload="x>y" type=application/ld+json> could fool the pattern into not matching. This is an unlikely real-world case, but the lookahead does cover the common patterns. If the HTML optimizer (HtmlOptimizer.OptimizeHtml) already normalises tags before this runs, document this assumption in the method.


Minor: AppendApacheHeaderAdd vs Header always add

The generated .htaccess uses plain Header add Link "...". On some Apache configurations, response headers set without the always keyword are not emitted for error responses (4xx/5xx). For discovery Link headers that agents might request on any response, Header always add Link "..." is more robust. Worth a note in the docs or a config option.


Positive highlights

  • Path traversal guard: Prepare_RejectsApacheOutputPathOutsideSiteRoot correctly catches ../outside.htaccess and is tested with Assert.Throws<ArgumentException>.
  • HeaderDirectiveExists(): The two-format parser (Cloudflare _headers colon style + Apache Header set/add/append/merge style) is clean and handles the always keyword correctly.
  • HtmlContainsWebMcpSignal(): The fast-path string check before invoking AngleSharp parsing is a good performance guard, and the exception catch correctly excludes OutOfMemoryException.
  • ScanAsync downgrade: Changing agent-skills and agents-json from fail to info when absent is the right semantic — these are optional discovery documents and shouldn't fail a site that hasn't adopted them.
  • BuildContentSignalValue / BuildLinkHeaderValue: These new helpers centralise the value-building logic that was previously inlined, making future changes easier.
  • Test coverage: Five new focused tests cover the happy path, the Vary-only path, .htaccess security header detection, both polarities of WebMCP declarative detection, and the path traversal rejection — solid coverage for the new surface area.

Summary

Three action items before merge:

  1. Gate apacheText read in Verify() on spec.Apache?.Enabled == true to prevent false-positive security checks from pre-existing .htaccess files.
  2. Add newline stripping to EscapeApacheQuotedValue (low severity but good hygiene for a code-generation path).
  3. Consider whether Header always add is preferable to Header add for the Link headers (not a blocker, but worth a conscious decision).

The rest of the implementation is clean, well-tested, and follows the existing patterns in the codebase.

@claude
Copy link
Copy Markdown

claude Bot commented May 5, 2026

Code Review – PR #345: Add agent readiness runtime support

Overview

This PR adds reusable Apache .htaccess generation for agent-readiness headers and Markdown content negotiation. It also extends WebMCP detection to support declarative HTML tool annotations, preserves quoted JSON-LD type attributes after HTML minification, refactors optional-discovery checks to emit info instead of fail, and unifies header detection to handle both Cloudflare-style _headers and Apache Header directives in one pass.

The overall implementation is solid: good security practices, path traversal protection, injection escaping, and comprehensive tests.


Strengths

  • Injection protection is thorough. EscapeApacheQuotedValue() strips embedded newlines and escapes backslashes/quotes before emitting values into .htaccess. The test Prepare_WritesApacheHeadersAndMarkdownNegotiationRulesWhenEnabled exercises this with a CSP value containing \r\nHeader set X-Injected "bad" and asserts the injected directive does not appear. Good defensive coverage.
  • Path traversal is validated. Prepare_RejectsApacheOutputPathOutsideSiteRoot confirms that OutputPath = "../outside.htaccess" throws ArgumentException.
  • HeaderDirectiveExists() is well-structured. It correctly handles both Header-Name: value (Cloudflare format) and Header [always] {set|add|append|merge} HeaderName (Apache format), skips comments, and supports mixed line endings via explicit split on \r\n, \n, \r. The : suffix guard prevents X-Frame-Options from matching X-Frame-Options-Extra:.
  • Eliminated double evaluation. The original ScanAsync called ValidateAgentSkillsIndexText and ValidateAgentsJsonText twice each. Introducing agentSkillsValid / agentsJsonValid variables is a clean fix.
  • info for optional discovery docs is the right call. These are voluntary contracts; a missing agents.json on a site that hasn't published one is not a failure.

Issues and Suggestions

1. Header set vs. Header always set for discovery resource Content-Type (minor correctness)

AppendApacheResourceHeaders emits Header set Content-Type ... without always. Apache's Header set only applies to 2xx responses. If a discovery resource returns 4xx (e.g. misconfigured rewrite), the Content-Type override is silently skipped. The Link headers already use Header always add, so always would be consistent:

-    AppendApacheHeaderSet(sb, "Content-Type", contentType, indent: "    ");
+    AppendApacheHeaderAlwaysSet(sb, "Content-Type", contentType, indent: "    ");

Or rename the helper to accept an always flag. Not a blocking issue, but worth making consistent.

2. apachePath is resolved even when Apache support is disabled (minor clarity)

var apachePath = apacheEnabled
    ? ResolveSitePath(siteRoot, spec.Apache!.EffectiveOutputPath)
    : ResolveSitePath(siteRoot, AgentApacheSupportSpec.DefaultOutputPath);  // never used

When apacheEnabled = false, apacheText is forced to string.Empty anyway, so the path is computed but never read. Simplifying to:

var apachePath = apacheEnabled
    ? ResolveSitePath(siteRoot, spec.Apache!.EffectiveOutputPath)
    : (string?)null;
var apacheText = apachePath is not null && File.Exists(apachePath) ? File.ReadAllText(apachePath) : string.Empty;
var effectiveHeadersPath = File.Exists(headersPath) ? headersPath : (apachePath ?? headersPath);

...would remove the implicit fallback that only resolves to .htaccess in the disabled case.

3. Split((char[]?)null, ...) idiom in HeaderDirectiveExists (readability)

var parts = line.Split((char[]?)null, StringSplitOptions.RemoveEmptyEntries);

The (char[]?)null cast is valid—it invokes the whitespace-splitting overload—but it is easy to misread as a null-check rather than a split selector. line.Split([' ', '\t'], StringSplitOptions.RemoveEmptyEntries) is more explicit and also slightly safer: Apache directives don't span lines, so splitting only on spaces/tabs is semantically tighter than splitting on all Unicode whitespace.

4. markdownExtension used before null-guard (worth confirming)

var markdownExtension = NormalizeMarkdownExtension(spec.MarkdownArtifacts?.Extension);
...
sb.Append("  AddType text/markdown ").Append(markdownExtension).AppendLine();

If NormalizeMarkdownExtension returns null or empty (e.g. when spec.MarkdownArtifacts is null), the generated AddType line would be malformed. writeMarkdownNegotiation guards the outer block, but that flag requires spec.MarkdownArtifacts?.Enabled == true—not null-safe for the extension alone. Worth adding a guard inside NormalizeMarkdownExtension or asserting the extension is non-empty before the AppendLine.

5. Long inline HTML in tests (minor readability)

Several test methods embed the same single-line index.html content across multiple facts:

<body><header><nav><a href="/">Home</a></nav></header><main><h1>Example?</h1>...</main><footer>Footer</footer></body>

A small private helper (e.g. MinimalIndexHtml(string bodyContent)) or shared raw string literal would reduce duplication and make assertion diffs easier to read. Not a blocking concern given the tests are clearly focused, but worth cleaning up if the file grows further.


Tests

Coverage for the new code is good:

  • Prepare_WritesApacheHeadersAndMarkdownNegotiationRulesWhenEnabled — full integration including injection protection
  • Prepare_WritesApacheVaryHeaderWhenOnlyMarkdownNegotiationIsEnabled — isolated Markdown negotiation path
  • Verify_PassesSecurityHeadersFromApacheConfig — Apache .htaccess recognized during local verify
  • Verify_IgnoresApacheConfigWhenApacheSupportIsDisabled — correct non-regression when disabled
  • Verify_PassesWebMcpWhenDeclarativeToolAnnotationsExist — new declarative detection
  • Verify_DoesNotPassWebMcpForCommentsOrDataAttributes — negative case for false positives
  • Prepare_RejectsApacheOutputPathOutsideSiteRoot — path traversal guard
  • OptimizeDetailed_PreservesQuotedJsonLdScriptType — HTML minification regression

One gap worth considering: no test covers the case where .htaccess already contains an existing managed PowerForge block that must be replaced cleanly on a second prepare run. The RemoveGeneratedBlock + re-append pattern is load-bearing; a round-trip test would make the idempotency guarantee explicit.


Summary

This is a clean, well-thought-out addition. The security considerations (escaping, path traversal, injection tests) are handled correctly, and the behavior changes to optional discovery checks are clearly motivated. The four issues above are all minor; the most actionable is probably Header set vs Header always set for discovery resource headers and the idempotency test gap. Everything else is readability/style.

@claude
Copy link
Copy Markdown

claude Bot commented May 5, 2026

Code Review: PR #345 — Add agent readiness runtime support

Overview

This PR adds Apache .htaccess generation support to the PowerForge.Web agent-readiness pipeline — emitting mod_headers directives for Link headers, Content-Signal, security headers, and CORS, plus mod_rewrite rules for Accept: text/markdown negotiation. It also extends WebMCP detection to cover declarative tool-name/tool-description HTML attributes, unifies header-presence checks across Cloudflare-style _headers and Apache .htaccess formats, and fixes a post-HTML-optimization bug where application/ld+json script type quotes were being stripped.

The implementation is solid overall. A few items worth discussing:


Potential Bugs

effectiveHeadersPath may report the wrong file in diagnostics

var effectiveHeadersPath = File.Exists(headersPath) ? headersPath : (apachePath ?? headersPath);

When both _headers and .htaccess exist, effectiveHeadersPath always resolves to headersPath. If the relevant header was found only in .htaccess, check messages will point users to _headers. This could produce confusing output during debugging. Consider resolving to whichever file actually contained the match, or listing both paths in the message.

BuildLinkHeaderValue does not escape Href or Rel

sb.Append('<').Append(target.Href).Append(">; rel=\"").Append(target.Rel).Append('"');

If target.Href contains > or ", the emitted Apache Header always add line will be malformed. In practice these values come from config, but using EscapeApacheQuotedValue on the Rel/Type fields and validating that Href is a safe URL-path would harden this.


Behavioral Change Worth Flagging

ScanAsync: agent-skills and agents-json changed from "fail" to "info" when absent

// Before
agentSkills.Success && ValidateAgentSkillsIndexText(agentSkills.Text) ? "pass" : "fail"

// After
agentSkillsValid ? "pass" : "info"

This is a silent severity downgrade for any existing pipeline that depends on those scan failures to block a deploy. The PR description mentions it, but since ScanAsync is likely called in CI, users relying on the old behavior may not notice until their scanner goes green for a misconfigured site. Consider gating the "info" path on a flag or documenting the migration prominently in the changelog.


Minor Issues

HtmlContainsWebMcpSignal fast-path interacts with comment content

The substring pre-check html.Contains("tool-name") will trigger DOM parsing even when tool-name appears only inside an HTML comment (e.g., <!-- tool-name="..." -->). The DOM parse correctly returns false in that case (confirmed by the negative test), but the unnecessary parse adds overhead. A comment noting this edge case would help future readers.

AgentApacheSupportSpec.OutputPath has redundant defaults

public string? OutputPath { get; set; } = DefaultOutputPath;
public string EffectiveOutputPath => string.IsNullOrWhiteSpace(OutputPath) ? DefaultOutputPath : OutputPath!;

Both the property initializer and EffectiveOutputPath guard against null/empty. No functional issue, but the property type string? combined with a non-null default is slightly confusing — a reader might wonder why EffectiveOutputPath is needed at all.


Missing Test Cases

  • New-file creation: No test exercises the path where .htaccess does not exist before Prepare runs. The idempotency test seeds an existing file; a greenfield test would give explicit coverage of CreateParentDirectory and the no-cleaned-text branch.
  • ContentSignalsHeader = false: No test verifies that disabling ContentSignalsHeader suppresses the Content-Signal directive while other Apache output is still emitted.
  • BuildLinkHeaderValue with special characters: As noted above, a test with a Link target whose Href or Rel contains " or > would lock in the escaping behaviour.

What's Well Done

  • Idempotency via managed block markers — the BEGIN/END PowerForge Agent Readiness pattern is clean and the test runs Prepare twice to confirm exactly one occurrence of each marker.
  • Injection hardeningEscapeApacheQuotedValue strips \r\n before embedding user-controlled values (e.g., CSP) into .htaccess quoted strings; the test with a newline-injected CSP value validates this explicitly.
  • Path traversal guardPrepare_RejectsApacheOutputPathOutsideSiteRoot covers the ../outside.htaccess case.
  • JSON-LD quote preservation — the regex approach with a lookahead to avoid re-quoting already-quoted values is correct, and the test is straightforward.
  • HeaderDirectiveExists unification — parsing both Cloudflare-style Header-Name: value lines and Apache Header [always] set|add|… Name value lines in a single pass is a neat abstraction that avoids duplicating all six security-header checks.
  • Test coverage is broad — 9 new focused tests cover prepare, verify, idempotency, security-header reading, WebMCP true/false positives, path rejection, and the optimizer fix.

Overall this is a well-structured addition. The main asks before merging: address the effectiveHeadersPath diagnostic ambiguity, add escaping to BuildLinkHeaderValue, and decide whether the ScanAsync severity downgrade needs a config flag or a more visible migration note.

@claude
Copy link
Copy Markdown

claude Bot commented May 5, 2026

Code Review: PR #345 — Add agent readiness runtime support (Apache .htaccess)

Overview

This PR adds reusable Apache .htaccess generation for agent-readiness signals — Link headers, Content-Signal, Markdown negotiation via mod_rewrite, and discovery resource Content-Type/CORS headers. It also extends verification to read headers from both _headers and .htaccess, adds declarative WebMCP attribute detection, and fixes JSON-LD script type quote-stripping during HTML optimization.

The implementation is solid overall: it follows existing patterns, ships focused tests, and handles several security edge cases proactively. A few specific items below are worth addressing before merge.


Issues Worth Addressing

1. Redundant linkHeadersPresent computation (potential logic confusion)

In WebAgentReadiness.cs (~line 745-746):

var linkHeaderPath = ResolveHeaderSourcePath("Link", headersText, headersPath, apacheText, apachePath ?? headersPath, fallbackHeadersPath);
var linkHeadersPresent = linkHeaderPath != fallbackHeadersPath || HeaderDirectiveExists(effectiveHeadersText, "Link");

ResolveHeaderSourcePath already calls HeaderDirectiveExists on both headersText and apacheText internally. If it returns fallbackHeadersPath, neither source has the header, so HeaderDirectiveExists(effectiveHeadersText, "Link") will also return false. The || branch is dead code. This makes linkHeadersPresent effectively equivalent to linkHeaderPath != fallbackHeadersPath. Consider simplifying to avoid misleading future readers:

var linkHeadersPresent = linkHeaderPath != fallbackHeadersPath;

2. Misleading nullable annotation on AgentApacheSupportSpec.OutputPath

In Models/AgentReadinessSpec.cs:

public string? OutputPath { get; set; } = DefaultOutputPath;

The property is declared nullable but initialised to a non-null constant. The EffectiveOutputPath computed property then has to use the ! null-forgiving operator. Since the nullable annotation implies callers may intentionally pass null, this creates a mismatch. Either make it non-nullable (the simpler fix), or document explicitly that null means "use default".

3. T=text/markdown RewriteRule flag has version dependency

The mod_rewrite rules use the T flag:

RewriteRule ^$ /index.md [L,T=text/markdown]

The T (type-map) flag is only available in Apache 2.4.26+. For older Apache deployments, this silently produces a directive parse error or ignored rule. Consider adding a comment in the generated .htaccess block (or the documentation) noting the minimum Apache version required.


Positive Observations

  • Security: header injection preventionStripHeaderControlCharacters, EscapeApacheQuotedValue, and EscapeApacheExpressionString all handle the relevant injection vectors. The test with ContentSecurityPolicyValue = "default-src 'self'\r\nHeader set X-Injected \"bad\"" concretely verifies CRLF injection is neutralised.

  • Security: path traversal validationPrepare_RejectsApacheOutputPathOutsideSiteRoot confirms the guard works for ../outside.htaccess.

  • IdempotencyPrepare_ReplacesExistingApacheManagedBlock verifies that running prepare twice produces exactly one managed block, using RemoveGeneratedBlock consistent with the existing _headers pattern.

  • WebMCP declarative detectionHtmlContainsWebMcpSignal uses a quick string pre-check before invoking AngleSharp, which avoids HTML parsing overhead for pages without tool annotations. The false-positive tests for HTML comments and data-tool-name are exactly the right edge cases to cover.

  • agent-skills / agents-json scan downgrade to "info" — Changing missing optional discovery documents from "fail" to "info" is the right call; these are opt-in features and their absence should not fail a scan by default.

  • BuildLinkHeaderValue extraction — Good refactoring: centralises the link header format and adds escaping for both the _headers and Apache paths, removing duplicated inline construction.

  • HeaderDirectiveExists dual-mode parsing — Cleanly handles both Cloudflare-style Header-Name: value and Apache-style Header [always] set|add|... Header-Name formats in a single method.

  • PreserveJsonLdScriptTypeQuotes — The targeted fix for the type=application/ld+json quote-stripping is appropriately narrow and the compiled regex is correct.


Minor Notes (non-blocking)

  • effectiveHeadersText = string.Join(Environment.NewLine, headersText, apacheText) allocates a combined string that is only used for the (now-redundant) linkHeadersPresent fallback. If that redundancy is removed, effectiveHeadersText itself becomes unused and can be deleted.
  • The CreateParentDirectory helper is a 2-line extract of Path.GetDirectoryName + Directory.CreateDirectory. It's fine, but it adds a layer of indirection that offers minimal abstraction value. Consider inlining it, especially since the pattern only appears twice.
  • In AgentApacheSupportSpec, the XML doc on EffectiveOutputPath is absent while all other properties have summaries — worth adding for consistency.

Test Coverage

The 7 new tests cover the key scenarios well:

  • Happy-path .htaccess generation with all features enabled
  • Idempotent re-run (block replacement)
  • Vary-only mode (markdown negotiation without Link/Content-Signal headers)
  • Security header verification via .htaccess
  • Apache disabled (.htaccess headers ignored)
  • WebMCP declarative attributes pass
  • WebMCP false-positive rejection (comments, data-attributes)
  • Path traversal rejection

The JSON-LD quote-preservation test in WebSiteAuditOptimizeBuildTests.Part2b.cs is also appropriately scoped.

One gap: there is no test verifying that a CSP value with embedded newlines is safely included in the final .htaccess as a single quoted value (rather than leaking onto a new line). The injection test checks DoesNotContain(Environment.NewLine + " Header set X-Injected", apache) which covers the injection scenario, but it would be clearer to also assert that the CSP header value is present as a complete line.


Overall: This is a well-implemented, security-conscious feature addition. The redundant linkHeadersPresent branch and the OutputPath nullable annotation are the two items I'd suggest fixing before merge; everything else is at most stylistic.

@claude
Copy link
Copy Markdown

claude Bot commented May 5, 2026

Code Review: PR #345 — Add agent readiness runtime support (Follow-up / Synthesis)

Note: several prior reviews exist on this PR. This comment corrects a few inaccurate findings in those reviews and consolidates what actually needs attention.


Corrections to Prior Review Findings

Several earlier comments flagged issues that the actual diff already handles correctly:

  • Vary: Accept omission — The outer <IfModule mod_headers.c> condition in UpdateApacheConfig already includes || writeMarkdownNegotiation, and the test Prepare_WritesApacheVaryHeaderWhenOnlyMarkdownNegotiationIsEnabled covers this path. The bug described in review Release version number does not automatically update #1 is not present.

  • apacheText read when Apache is disabledapachePath is gated on apacheEnabled. When disabled, apachePath = null and apacheText = "", so pre-existing .htaccess files cannot produce false-positive security header matches. Review Update for building binary modules #3's concern does not apply to the as-written code.

  • BuildLinkHeaderValue not escapingEscapeLinkUriReference and EscapeLinkParameterValue are used on Href, Rel, and Type respectively, and StripHeaderControlCharacters strips \r/\n in both. Review Allow dependencies without GUID and version #5's claim that escaping is absent is incorrect.

  • HtmlContainsWebMcpSignal calls AngleSharp unconditionally — A Contains("tool-name") / Contains("tool-description") fast-path pre-check is present before AngleSharp is invoked. The performance concern in review Customized Release Path #2 is already addressed.

  • Header always add for Link headersAppendApacheHeaderAdd emits Header always add, not a bare Header add. Review Rework how the configuration is done #4's note on this was already handled.

  • Apache 2.4.26+ T flag — The generated .htaccess already includes the comment # The T= response type flag requires Apache 2.4.26 or newer.


Genuine Issues Remaining

1. ScanAsync severity downgrade is a silent breaking change

Changing agent-skills and agents-json from "fail" to "info" is semantically correct, but any CI pipeline calling ScanAsync with --fail-on-failures or checking the result for "fail" statuses will silently go green for sites that no longer publish these documents. This change is not flagged in the PR description as a breaking change. At minimum, add a changelog entry or a comment in the code documenting the intentional semantic change.

2. Empty <IfModule mod_headers.c> block when no discovery resources are enabled

var writeDiscoveryHeaders = apache.DiscoveryResourceHeaders && HasApacheDiscoveryResources(spec) computes to false when all discovery specs are disabled. However, if writeLinkHeaders, writeContentSignals, security.Enabled, and writeMarkdownNegotiation are all also false, the outer <IfModule> block is skipped entirely — which is correct. But if only apache.DiscoveryResourceHeaders = true with no enabled discovery specs and nothing else, writeDiscoveryHeaders is false but DiscoveryResourceHeaders stays true as a default — the block is not opened, which is correct. This is fine; noting it here only because multiple prior reviews misread it.

The one real variant: if security.Enabled = true but HasApacheDiscoveryResources returns false, AppendApacheDiscoveryHeaders is never called (guarded by writeDiscoveryHeaders) so the block won't be empty. No issue here.

3. HeaderDirectiveExists split pattern

var parts = line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);

This is correct, but several prior reviews mistakenly described this as a compiled regex. For clarity: no regex is used in HeaderDirectiveExists. This is efficient string splitting and is fine as-is.


What Is Well Done

  • Injection hardening: CRLF injection test with ContentSecurityPolicyValue containing a newline-injected directive is a concrete, high-value security test.
  • Path traversal guard tested and confirmed.
  • Idempotent managed block pattern is consistent with the _headers approach.
  • EffectiveOutputPath computed property is a clean DRY improvement.
  • 9 new focused tests cover the full feature surface including negative cases.

Summary

The implementation is correct. The one genuine pre-merge ask is a changelog or PR description note for the ScanAsync "fail""info" severity change for agent-skills and agents-json, so operators relying on the old behavior know to update their CI gates. Everything else is minor style or already addressed.

@PrzemyslawKlys PrzemyslawKlys merged commit f453b55 into main May 5, 2026
6 checks passed
@PrzemyslawKlys PrzemyslawKlys deleted the codex/agent-readiness-runtime-support branch May 5, 2026 10:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant