
Add PowerForge.Web agent readiness support#318

Merged
PrzemyslawKlys merged 5 commits into main from feature/agent-readiness
Apr 17, 2026

Conversation

@PrzemyslawKlys
Member

Summary

  • Add a PowerForge.Web agent readiness engine with prepare, verify, and scan operations plus an agent-ready pipeline task.
  • Generate and validate agent-facing assets for Cloudflare/isagentready scanners: robots content signals, Link headers, security headers, API catalog, Agent Skills index, agents.json, A2A card, MCP card metadata, OpenAPI discovery, and markdown/WebMCP checks.
  • Add site/pipeline schema coverage, docs, regression tests, and a direct System.Security.Cryptography.Xml 10.0.6 override to clear NU1903.

Validation

  • dotnet build .\PowerForge.Web.Cli\PowerForge.Web.Cli.csproj -c Release --no-restore
  • dotnet restore .\PowerForge.Tests\PowerForge.Tests.csproj --force-evaluate
  • dotnet list PowerForge.PowerShell\PowerForge.PowerShell.csproj package --include-transitive --vulnerable
  • dotnet list PSPublishModule\PSPublishModule.csproj package --include-transitive --vulnerable
  • dotnet list PowerForge.Tests\PowerForge.Tests.csproj package --include-transitive --vulnerable
  • dotnet test .\PowerForge.Tests\PowerForge.Tests.csproj -c Release --no-restore --filter WebAgentReadinessTests

Full solution test note: dotnet test .\PSPublishModule.sln -c Release --no-restore is no longer blocked by NU1903, but it hit pre-existing, unrelated failures and then hung, so I stopped the dotnet/vstest processes.

@claude

claude Bot commented Apr 17, 2026

Code Review — PR #318: Add PowerForge.Web agent readiness support

Overview

This PR adds a comprehensive agent-ready subsystem to PowerForge.Web that prepares, verifies, and scans static sites for AI/agent discovery compliance (robots.txt content signals, API catalog, Agent Skills, A2A, MCP, security headers, etc.). It also bumps System.Security.Cryptography.Xml and Pkcs from 8.0.0 to 10.0.6 to clear NU1903.

The feature is well-scoped and clearly documented. The main concerns below are roughly ordered by severity.


Issues & Suggestions

1. Duplicate constants — dead alias (minor bug)

// WebAgentReadiness.cs
private const string AgentBlockStart  = "# BEGIN PowerForge Agent Readiness";
private const string AgentBlockEnd    = "# END PowerForge Agent Readiness";
private const string HeadersBlockStart = "# BEGIN PowerForge Agent Readiness";   // identical to AgentBlockStart
private const string HeadersBlockEnd   = "# END PowerForge Agent Readiness";     // identical to AgentBlockEnd

HeadersBlockStart / HeadersBlockEnd are never used in the diff — they duplicate AgentBlockStart / AgentBlockEnd with identical values. This is either dead code or a copy-paste mistake that will confuse future readers. Remove the Headers* pair and reuse the Agent* constants everywhere.


2. ResolveSitePath has no path-traversal guard

private static string ResolveSitePath(string siteRoot, string path)
{
    var normalized = path.Trim().Trim('"')
        .Replace('/', Path.DirectorySeparatorChar)
        .TrimStart(Path.DirectorySeparatorChar);
    return Path.GetFullPath(Path.Combine(siteRoot, normalized));
}

TrimStart removes a leading separator but does not prevent ../../ sequences. A config value like spec.OutputPath = "../../etc/cron.d/backdoor" would resolve to a path outside siteRoot. The fix is one extra check after GetFullPath:

var resolved = Path.GetFullPath(Path.Combine(siteRoot, normalized));
// Compare against the root plus a trailing separator so a sibling directory
// like "C:\site-evil" cannot pass a plain prefix check against "C:\site".
var root = Path.TrimEndingDirectorySeparator(Path.GetFullPath(siteRoot)) + Path.DirectorySeparatorChar;
if (!resolved.StartsWith(root, StringComparison.OrdinalIgnoreCase))
    throw new ArgumentException($"Path '{path}' resolves outside site root.", nameof(path));
return resolved;

Similarly, AgentSkillSpec.SourcePath is expanded with Path.GetFullPath but never verified to stay within any expected boundary. Treating a value from site.json as trusted is a defensible choice, but it deserves a comment in the code or an explicit constraint in the docs.


3. Blocking async calls in a synchronous context

Both CLI and pipeline runner call ScanAsync synchronously:

// WebCliCommandHandlers.AgentReady.cs
var result = WebAgentReadiness.ScanAsync(...).GetAwaiter().GetResult();

// WebPipelineRunner.Tasks.AgentReady.cs
result = WebAgentReadiness.ScanAsync(...).GetAwaiter().GetResult();

GetAwaiter().GetResult() is fine in a console / pipeline runner that has no SynchronizationContext, but ScanAsync also creates a bare new HttpClient on each call. Consider extracting an IHttpClientFactory-style shared instance or at minimum a static readonly HttpClient. The per-call using var http = new HttpClient { ... } is tolerable for a one-shot CLI run, but note that disposing the client does not immediately free the underlying socket — it lingers in TIME_WAIT — so this pattern becomes a real footgun if it is ever copied into a hot path.


4. Default CSP allows 'unsafe-inline' for both scripts and styles

The default ContentSecurityPolicyValue emitted into every site's _headers:

script-src 'self' 'unsafe-inline'; style-src 'self' 'unsafe-inline'

This is intentional for static-site compatibility (the doc calls it out), but it means the generated security headers would fail a strict CSP audit. Since this is the default, many users will ship it without realizing it weakens XSS protection. Consider:

  • Defaulting to a stricter policy and documenting how to loosen it, or
  • Emitting a warn-level check in Verify when the CSP contains unsafe-inline, so users are aware.
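If the warn-level route is taken, the check could look roughly like this — a sketch that borrows the AddCheck(checks, id, category, title, status, message) shape visible in other snippets in this thread; the "csp-strictness" id and the message text are invented:

```csharp
// Sketch only; AddCheck shape inferred from other snippets, id is hypothetical.
var csp = spec.SecurityHeaders?.ContentSecurityPolicyValue ?? string.Empty;
var usesUnsafeInline = csp.Contains("'unsafe-inline'", StringComparison.OrdinalIgnoreCase);
AddCheck(checks, "csp-strictness", "security-headers", "CSP avoids 'unsafe-inline'",
    usesUnsafeInline ? "warn" : "pass",
    usesUnsafeInline
        ? "CSP permits 'unsafe-inline'; consider nonces or hashes for stronger XSS protection."
        : "CSP does not rely on 'unsafe-inline'.");
```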

5. ScanAsync creates a new HttpClient inline

using var http = new HttpClient { Timeout = TimeSpan.FromMilliseconds(...) };

The using ensures disposal, but repeated instantiation puts pressure on the socket pool. For a CLI one-shot this is fine; if ScanAsync is ever called in a loop (e.g., multiple sites in one pipeline run), it would exhaust ephemeral ports. Accepting an optional HttpClient parameter would make the API testable and reuse-friendly without breaking the current call sites.
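A self-contained sketch of the reuse-friendly shape — AgentScanHttp, ProbeAsync, and the 30-second timeout are illustrative names and values, not the PR's API:

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

static class AgentScanHttp
{
    // One process-wide client avoids exhausting ephemeral ports when
    // scans run in a loop.
    private static readonly HttpClient Shared = new() { Timeout = TimeSpan.FromSeconds(30) };

    // Callers (and tests) may inject their own client/handler; otherwise
    // the shared instance is reused.
    public static async Task<bool> ProbeAsync(
        string url, HttpClient? http = null, CancellationToken ct = default)
    {
        http ??= Shared;
        using var response = await http.GetAsync(
            url, HttpCompletionOption.ResponseHeadersRead, ct);
        return response.IsSuccessStatusCode;
    }
}
```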


6. Test coverage is narrow but adequate for the feature scope

The two tests (Prepare_WritesDiscoveryFilesAndHeaders, RunPipeline_AgentReady_PreparesAfterBuildAndSitemap) cover the happy path well. Missing coverage:

  • Verify without a prior Prepare — does it report the right failures?
  • failOnFailures path — no test asserts that a bad site causes a non-zero exit code.
  • Path traversal — no test for OutputPath = "../../outside".
  • Scan operation — not tested at all (understandable without a live server, but a MockHttpMessageHandler would help).
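For the Scan gap, a minimal stub handler is enough to drive the HTTP path without a live server (all names here are hypothetical):

```csharp
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Returns one canned response for every request, so ScanAsync-style code
// can be exercised deterministically in a unit test.
sealed class StubHttpMessageHandler : HttpMessageHandler
{
    private readonly HttpStatusCode _status;
    private readonly string _body;

    public StubHttpMessageHandler(HttpStatusCode status, string body)
        => (_status, _body) = (status, body);

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
        => Task.FromResult(new HttpResponseMessage(_status)
        {
            Content = new StringContent(_body),
            RequestMessage = request
        });
}

// Usage: var http = new HttpClient(new StubHttpMessageHandler(HttpStatusCode.OK, "{}"));
```

This presupposes ScanAsync can accept an injected HttpClient, as suggested above.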

7. agentready alias in dispatch and pipeline

"agent-ready" => HandleAgentReady(...),
"agentready" => HandleAgentReady(...),

The alias is undocumented in PrintUsage() and the docs. Either drop it to reduce surface area or document it.


8. Dependency bump scope

The System.Security.Cryptography.Xml 10.0.6 override is applied via an explicit PackageReference only to projects that already target net8.0/net10.0. The net472 frame is excluded. This is correct because the Windows-inbox assembly handles it there, but a comment in the .csproj would make the intentional exclusion clear to future maintainers.


Strengths

  • Documentation in Docs/PowerForge.Web.AgentReadiness.md is thorough and accurately describes what the tooling can and cannot do (Cloudflare Markdown for Agents caveat is well stated).
  • Schema files (powerforge.web.sitespec.schema.json, powerforge.web.pipelinespec.schema.json) are updated — good.
  • ResolveSpec appears to return a safe default spec when AgentReadiness is null, avoiding null-ref chains throughout.
  • Slugify is conservative (letter/digit only) — good for producing safe file names.
  • Prepare runs Verify internally and surfaces the results in a single pass — clean design.
  • TryDeleteDirectory in tests silently eats cleanup failures — correct pattern for temp-dir tests.

Summary

The two items that warrant a fix before merge are the duplicate constants (cosmetic but confusing) and the missing path-traversal guard in ResolveSitePath (low risk given current callers but a correctness hole). The CSP default and HttpClient concerns are worth addressing but are not blockers. Everything else is polish.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3a4c5e7e97

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".


var apiCatalogPath = Path.Combine(siteRoot, ".well-known", "api-catalog");
AddCheck(checks, "api-catalog", "api-auth-mcp-skill-discovery", "API Catalog (RFC 9727)",
ValidateApiCatalog(apiCatalogPath, out var apiCatalogMessage) ? "pass" : "fail",

P1 Badge Honor disabled discovery checks in verify mode

This check is unconditional even when agentReadiness.apiCatalog.enabled is set to false, so Verify() (and therefore Prepare(), which calls Verify()) can report Success=false for configurations that explicitly opt out of API catalog generation. The same pattern appears for other optional checks in this method, which makes feature flags ineffective and can fail CI when failOnFailures is enabled despite a valid config.


if (!string.IsNullOrWhiteSpace(apiCatalogPath))
{
written.Add(apiCatalogPath!);
linkTargets.Add(new HeaderLinkTarget("/.well-known/api-catalog", "api-catalog", "application/linkset+json"));

P1 Badge Use configured API catalog path in discovery link headers

The API catalog file path is configurable via AgentApiCatalogSpec.OutputPath, but the generated link target is hard-coded to /.well-known/api-catalog. When a site uses a custom output path, headers (and downstream checks/scanners that follow them) point to the wrong URL, so readiness signals become inconsistent and can be reported as broken even though the file was generated.


@claude

claude Bot commented Apr 17, 2026

Code Review: PowerForge.Web Agent Readiness Engine

Overview

This PR adds a substantial new feature (~2,800 lines net) that enables static sites to prepare, verify, and scan AI/agent-readiness signals including robots.txt Content Signals, security headers, API catalog, Agent Skills, agents.json, A2A Agent Card, MCP Server Card, and structured data checks. The feature is split cleanly into models, a service class, CLI handlers, a pipeline task, JSON schema updates, and docs. Overall the code is well-structured and includes good defensive coding.


Correctness / Bugs

agentready alias not covered by pipeline schema
WebPipelineRunner.Tasks.cs and Dispatch.cs both dispatch on "agentready" (no hyphen) as an alias for "agent-ready", but the pipeline JSON schema (powerforge.web.pipelinespec.schema.json) only declares the "agent-ready" const. Users who write "task": "agentready" in their pipeline will get a schema validation error even though it works at runtime.

_headers always emits well-known route entries regardless of feature flags
UpdateHeaders in WebAgentReadiness.cs unconditionally appends CORS + Content-Type entries for /.well-known/api-catalog, /.well-known/agent-skills/index.json, /agents.json, /.well-known/agents.json, /.well-known/agent-card.json, and /.well-known/mcp/server-card.json even when those features are disabled in the spec. If only LinkHeaders is enabled with everything else off, the _headers file will still advertise routes for files that were never written.

ResolveSpec mutates the caller's spec object

// WebAgentReadiness.cs ~line 1843
if (spec.Enabled)
{
    spec.SecurityHeaders ??= new AgentSecurityHeadersSpec();
    ...
}
return spec;

This silently modifies the AgentReadinessSpec instance owned by the caller (SiteSpec.AgentReadiness). If the same spec object is reused or inspected after the call, it will have been patched in place, which is surprising and could cause issues in multi-call scenarios.
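A non-mutating shape would apply defaults to a copy — a sketch only; the Clone helper is hypothetical and the real default list is longer:

```csharp
private static AgentReadinessSpec ResolveSpec(AgentReadinessSpec? spec)
{
    // Work on a copy so the caller's SiteSpec.AgentReadiness stays untouched.
    var resolved = spec is null ? new AgentReadinessSpec() : spec.Clone();
    if (resolved.Enabled)
    {
        resolved.SecurityHeaders ??= new AgentSecurityHeadersSpec();
        // ...remaining defaults applied to 'resolved', never to 'spec'...
    }
    return resolved;
}
```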

sitemap check in Verify always fails if file is missing, regardless of config
The sitemap check status is "fail" unconditionally when sitemap.xml is absent:

AddCheck(checks, "sitemap", "discoverability", "sitemap.xml",
    ValidateSitemap(sitemapPath, out var sitemapMessage) ? "pass" : "fail", ...)

Unlike every other check in the method, there is no "info" branch for when sitemap generation was never part of the configured pipeline. A site that only uses prepare + verify without a prior sitemap step will always see a fail here.
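An "info" branch in the style of the other checks might look like this — SitemapExpected is an invented flag standing in for however the config records that a sitemap step was part of the pipeline:

```csharp
// Sketch; mirrors the AddCheck shape above, SitemapExpected is hypothetical.
var sitemapOk = ValidateSitemap(sitemapPath, out var sitemapMessage);
AddCheck(checks, "sitemap", "discoverability", "sitemap.xml",
    sitemapOk ? "pass" : spec.SitemapExpected ? "fail" : "info",
    sitemapOk
        ? sitemapMessage
        : "sitemap.xml not found; add a sitemap step or mark it expected to enforce this check.");
```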


Code Quality

HasRobotsUserAgent has redundant case variants

private static bool HasRobotsUserAgent(string text)
    => text.Contains("User-agent:", StringComparison.OrdinalIgnoreCase) ||
       text.Contains("User-Agent:", StringComparison.OrdinalIgnoreCase);

OrdinalIgnoreCase already makes both checks match the same input. The second branch is dead code.

HasReasonableHeadingHierarchy(html) called twice
In AddHtmlSemanticsChecks, the method is called twice in the same AddCheck invocation (once for the status condition and once for the message). Extract the result to a local variable.

Inline Regex patterns are not cached
AddHtmlSemanticsChecks issues multiple Regex.IsMatch(html, @"...", ...) calls with ad-hoc patterns. Only GeneratedBlockRegex is compiled and cached at class level. For local verify operations this is fine, but ScanAsync would pay the compilation cost on every call. Consider adding static compiled fields alongside GeneratedBlockRegex for the patterns used in HTML semantics checks.
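On the target frameworks in this PR (net8.0/net10.0), the [GeneratedRegex] source generator is the idiomatic way to cache these; the pattern below is illustrative, not the one in the diff:

```csharp
using System.Text.RegularExpressions;

// Patterns are compiled at build time; each accessor returns a cached instance.
internal static partial class HtmlSemanticsPatterns
{
    [GeneratedRegex(@"<meta\s+name=""robots""[^>]*>", RegexOptions.IgnoreCase)]
    internal static partial Regex MetaRobots();
}

// Usage: HtmlSemanticsPatterns.MetaRobots().IsMatch(html)
```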

Synchronous wait on async in CLI and pipeline

// WebCliCommandHandlers.AgentReady.cs
result = WebAgentReadiness.ScanAsync(...).GetAwaiter().GetResult();

This is a standard pattern for console apps and is fine here, but it's worth noting that if the surrounding host ever becomes async-friendly, these call sites should be updated.


Performance

WebMcp check reads up to 500 HTML files

Directory.EnumerateFiles(siteRoot, "*.html", SearchOption.AllDirectories)
    .Take(500)
    .Any(file => File.ReadAllText(file).Contains("navigator.modelContext.provideContext", ...));

This can read a large amount of data for big sites. Since Any short-circuits on the first match, the worst case is 500 full HTML file reads. A streaming read (e.g. checking the first few KB of each file) would be more efficient.
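A sketch of the streaming variant — ContainsInPrefix and the 16 KB cap are invented; a marker straddling the cap boundary would be missed, which is acceptable for a heuristic check:

```csharp
using System;
using System.IO;

// Reads at most 'prefixChars' characters of the file instead of the whole body.
static bool ContainsInPrefix(string path, string marker, int prefixChars = 16 * 1024)
{
    using var reader = new StreamReader(path);
    var buffer = new char[prefixChars];
    var read = reader.ReadBlock(buffer, 0, buffer.Length);
    return new string(buffer, 0, read).Contains(marker, StringComparison.OrdinalIgnoreCase);
}

// Usage:
// Directory.EnumerateFiles(siteRoot, "*.html", SearchOption.AllDirectories)
//     .Take(500)
//     .Any(f => ContainsInPrefix(f, "navigator.modelContext.provideContext"));
```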

Prepare always runs Verify immediately after writing
Every Prepare call ends with a full Verify call over the same files that were just written. For large sites with many HTML files (the semantics checks enumerate HTML) this doubles the I/O. Worth documenting or making opt-in if not already expected behavior.


Security

Default CSP includes unsafe-inline

"default-src 'self'; script-src 'self' 'unsafe-inline'; style-src 'self' 'unsafe-inline'; ..."

This is pragmatic for documentation sites that use inline scripts and styles, and users can override it. The documentation and the doc comment clearly note it can be customized. The current default is fine as a starting point, but consider adding a note in the docs about tightening this for sites that don't need unsafe-inline.

Path traversal protection is solid
ResolveSitePath correctly rejects paths that escape the site root, and the dedicated test Prepare_RejectsAgentOutputPathOutsideSiteRoot verifies this.

CORS wildcard on well-known discovery resources
Using Access-Control-Allow-Origin: * on public agent discovery JSON documents (api-catalog, agents.json, etc.) is correct per the relevant specifications. No issue here.


Test Coverage

The four tests cover the key happy paths and one security path well:

  • Prepare_WritesDiscoveryFilesAndHeaders — good integration coverage of the prepare flow
  • RunPipeline_AgentReady_PreparesAfterBuildAndSitemap — good pipeline ordering test
  • Verify_HonorsDisabledOptionalAgentReadinessChecks — ensures disabled features produce no failures
  • Prepare_RejectsAgentOutputPathOutsideSiteRoot — important security regression test

Missing coverage worth considering:

  • No test for WriteMcpServerCard (missing endpoint warning path)
  • No test for UpdateHeaders respecting disabled features (which would surface the bug noted above)
  • No test for the sitemap "fail" when sitemap is absent but robots is also disabled

Dependency Update

Bumping System.Security.Cryptography.Xml from 8.0.0 → 10.0.6 across all projects to resolve NU1903 is correct and well-motivated. The PR explicitly lists the vulnerability validation commands, which is good practice.


Summary

This is a large, well-executed feature with clean architecture, solid defensive coding, and useful documentation. The three most actionable items before merging:

  1. Add "agentready" to the pipeline JSON schema (or remove the alias from the dispatcher if it's not intended)
  2. Guard _headers well-known route entries behind their respective feature flags
  3. Refactor ResolveSpec to return a new spec rather than mutating the caller's object (or document the mutation contract clearly)


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6b08cecc26


Comment on lines +141 to +145
var spec = ResolveSpec(options.AgentReadiness);
var checks = new List<WebAgentReadinessCheck>();
var warnings = new List<string>();

var robotsPath = Path.Combine(siteRoot, "robots.txt");

P1 Badge Short-circuit verify when agent readiness is disabled

Verify() resolves the spec and immediately runs all checks without honoring AgentReadinessSpec.Enabled == false, so agent-ready verify can fail on missing robots/headers/sitemap even when the feature is explicitly turned off in site.json. This makes a valid opt-out configuration fail CI for verify-mode workflows.


Comment on lines +108 to +109
if (spec.LinkHeaders)
written.Add(UpdateHeaders(siteRoot, spec, linkTargets));

P1 Badge Generate security headers independently of Link header toggle

Header generation is gated entirely by spec.LinkHeaders, so setting linkHeaders: false prevents writing _headers at all, including security/CORS directives controlled by securityHeaders. In that configuration (with security defaults still enabled), Prepare() produces no security headers and then Verify() reports failures, so a valid “no Link hints” setup becomes self-failing.


Comment on lines +691 to +693
sb.AppendLine("/.well-known/api-catalog");
sb.AppendLine(" Content-Type: application/linkset+json; profile=\"https://www.rfc-editor.org/info/rfc9727\"");
AppendCorsHeaders(sb, security);

P2 Badge Use configured API catalog route in _headers resource block

The _headers block hard-codes /.well-known/api-catalog for content-type/CORS rules instead of using apiCatalog.outputPath. When a custom catalog path is configured, the generated catalog file won’t receive the intended static header rules while a non-existent default route does, leading to inconsistent discovery behavior (especially MIME/CORS checks) despite successful file generation.


@claude

claude Bot commented Apr 17, 2026

Code Review: Add PowerForge.Web Agent Readiness Support

Overview

This PR adds a comprehensive agent-ready feature to PowerForge.Web — a prepare/verify/scan pipeline for generating and validating AI agent discovery assets (robots.txt content signals, Link headers, security headers, agents.json, A2A Agent Card, MCP server card, Agent Skills index, API catalog, OpenAPI detection). It also includes a direct System.Security.Cryptography.Xml 10.0.6 override to resolve NU1903 vulnerability advisory warnings.

The code is well-structured and the feature scope is clearly defined. Below are observations grouped by concern.


Security

ResolveSitePath path traversal guard is good — the path traversal check correctly resolves both the root and the target to full absolute paths before comparing, and handles the directory-separator-suffix edge case. The test Prepare_RejectsAgentOutputPathOutsideSiteRoot covers this explicitly. ✅

CSP default in docs is permissive — The example site.json in the README uses script-src 'self' 'unsafe-inline' and style-src 'self' 'unsafe-inline'. Since this is a documentation default it won't affect generated code, but it may be copy-pasted into real site configs. Consider adding a note that these are conservative starting points, not hardened values.

Input values written verbatim into _headers / robots.txt — Fields like spec.HstsValue and ContentSecurityPolicyValue flow directly into _headers. A newline (\n, \r) injected via a crafted config value could pollute the file with extra header entries. Consider stripping CR/LF from header values before writing them.
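A sketch of such a sanitizer (helper name hypothetical): collapse CR/LF so a crafted value cannot smuggle extra header lines into _headers:

```csharp
using System;

// Splits on CR/LF and rejoins with single spaces, so multi-line input
// always collapses to one header line.
static string SanitizeHeaderValue(string value) =>
    string.Join(' ', value.Split(
        new[] { '\r', '\n' },
        StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries));

// SanitizeHeaderValue("default-src 'self'\r\nX-Injected: evil")
// collapses to a single line: "default-src 'self' X-Injected: evil"
```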


Correctness / Potential Bugs

HasRobotsUserAgent is trivially satisfied by its own output — BuildRobotsBlock always writes User-agent: *, so the verify check will always pass after prepare. This is arguably correct, but it means the check won't catch a case where crawler-specific rules were expected but missing.

AddHtmlSemanticsChecks duplicates regex evaluation — For example meta-robots calls Regex.IsMatch twice (once for the check, once for the message). These use ad-hoc inline regexes; moving them to static compiled fields alongside GeneratedBlockRegex would avoid repeated allocations on every verify call.

ReadFirstHtml fallback is filesystem-order-dependent — If index.html is absent, the method picks the first .html file found via Directory.EnumerateFiles, which has no defined order. Fine for heuristic use; just worth noting if consistency is ever needed.

ValidateAgentSkillsIndex silently skips external-URL skills — Skills whose url does not start with / are not digest-verified. The behaviour is correct for static sites but a comment noting the intention would help future maintainers.

WebMCP check reads up to 500 files in full — File.ReadAllText is called on each file before the string.Contains check. A streaming StreamReader loop with early exit would reduce memory pressure on large sites.


Code Quality

ResolveSpec is a shallow copy — The resolved spec copies top-level properties but shares child object references (ApiCatalog, AgentSkills, etc.). Nothing currently mutates the children post-resolution, but documenting this invariant (or deep-cloning the children) would prevent a future regression. The existing test only checks that null properties remain null, not that shared references are not mutated.

BuildDefaultSkill raw string literal indentation — The default SKILL.md template is indented with four leading spaces inside the method body. C# raw string literals ($""") strip indentation relative to the position of the closing delimiter, so whether those spaces leak into the output depends on where the closing quotes sit; if they do, every line of the generated SKILL.md gains leading spaces and the YAML front-matter --- block breaks. Please verify the generated file content is correct.

ScanAsync creates HttpClient directly — Fine for one-shot CLI/pipeline use, but it makes the scan path untestable without a live server. An HttpClient/HttpMessageHandler injection overload would enable unit testing; this could be tracked as a follow-up task.


Dependency / Package Changes

Upgrading System.Security.Cryptography.Xml to 10.0.6 to clear NU1903 is a sound approach. The explicit PackageReference on PowerForge.PowerShell.csproj forces the patched version transitively. Lock files across all four projects are updated consistently. ✅


Test Coverage

The seven new xUnit tests cover the main happy paths and important edge cases well. Notable gaps worth tracking as follow-up:

  • ScanAsync — No tests; the live HTTP path is currently untestable without server infrastructure or client injection.
  • BuildRobotsBlock with explicit BotRules — The prepare test uses the default all-allow config; a test with a disallow rule would catch formatting regressions.
  • Verify on corrupted prepare output — e.g., malformed JSON in the API catalog.

Documentation

Docs/PowerForge.Web.AgentReadiness.md is thorough and accurate. The pipeline examples are clear, and the note that Cloudflare Markdown-for-Agents is a host-level feature (not provable from static output) is an important caveat that is well-placed. ✅


Summary

Solid feature addition with good security hygiene (path-traversal guard, spec immutability design, NU1903 override). Three items to address before merging:

  1. Header value injection — strip newlines from user-supplied values written into _headers and robots.txt.
  2. BuildDefaultSkill raw string indentation — verify the generated SKILL.md has correct front-matter with no spurious leading spaces.
  3. WebMCP file scan — consider streaming reads to avoid loading up to 500 full HTML files into memory.

Everything else noted above is advisory / low priority.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a7f0f5fd32


{
private static void ExecuteAgentReady(JsonElement step, string baseDir, string lastBuildOutPath, WebPipelineStepResult stepResult)
{
var operation = GetString(step, "operation") ?? GetString(step, "mode") ?? "prepare";

P1 Badge Stop using pipeline mode as agent-ready operation

mode is already used by pipeline scheduling (ShouldExecuteStepMode) to mean execution mode constraints, but this fallback also interprets it as the agent-ready operation. For a step like { "task": "agent-ready", "mode": "ci" } (with no explicit operation), operation becomes ci, the switch falls into the default case, and the step throws instead of running the default prepare behavior.


Comment on lines +546 to +547
if (string.IsNullOrWhiteSpace(siteRoot))
return Array.Empty<string>();

P2 Badge Treat implicit build output as cacheable agent-ready output root

Returning no expected outputs when siteRoot is omitted makes cache validation too weak for this task: ExecuteAgentReady can still run using lastBuildOutPath, but cache checks consider an empty output list as present. In that configuration, if generated agent-ready artifacts are removed between runs, the step can be incorrectly skipped as a cache hit and leave required files missing.


@claude

claude Bot commented Apr 17, 2026

Code Review: PR #318 — PowerForge.Web Agent Readiness Support

Overview

This is a substantial, well-scoped feature that adds a three-operation engine (prepare, verify, scan) for generating and checking AI/agent-readiness discovery signals on static sites. The architecture — spec models, a single static service, CLI handlers, pipeline task, JSON schemas, docs, and tests — follows the established patterns in this codebase cleanly.


Strengths

  • Security-conscious path handling: ResolveSitePath and ToSiteRoute both enforce containment within the site root, and the test Prepare_RejectsAgentOutputPathOutsideSiteRoot verifies this is not bypassable via ../ traversal. Well done.
  • Spec immutability: ResolveSpec creates a copy rather than mutating the caller's object, and ResolveSpec_DoesNotMutateCallerSpec explicitly asserts this. Important when Prepare calls Verify with the same (resolved) spec internally.
  • Default-safe spec: ResolveSpec(null) returns a sensible enabled default, preventing silent no-ops when config is absent.
  • SHA-256 digest verification: Skill content integrity is verified against the index on Verify, and the test confirms local digest roundtrip.
  • Pipeline cache propagation: The fix to thread lastBuildOutPath into GetExpectedStepOutputs and capture it from cached build steps is correct and tested.

Issues

1. BotRules allows newline injection into robots.txt (security)

BuildRobotsBlock writes rule.UserAgent.Trim(), rule.Disallow, and rule.Allow values directly from config into robots.txt without stripping newlines. A user-agent value containing \n would corrupt the file format.

// WebAgentReadiness.cs ~line 2119
sb.Append("User-agent: ").Append(rule.UserAgent.Trim()).AppendLine();

Recommend stripping \r and \n (and normalising whitespace) from any value written into robots.txt:

static string SanitizeRobotsToken(string value) =>
    Regex.Replace(value.Trim(), @"[\r\n]+", " ");

Same applies to Allow/Disallow route values.

2. File size limits missing in TryGetTextAsync and ReadFirstHtml (performance / robustness)

TryGetTextAsync reads the full response body with ReadAsStringAsync and no size cap. A large HTML page or oversized API response during scan could consume significant memory. Similarly, ReadFirstHtml has no file-size guard.

For TryGetTextAsync, consider reading with a byte limit:

// e.g. cap at 2 MB
var bytes = await response.Content.ReadAsByteArrayAsync(cancellationToken);
if (bytes.Length > 2 * 1024 * 1024)
    return new HttpTextResult(false, "Response too large", string.Empty, response);
text = Encoding.UTF8.GetString(bytes);

3. GetAwaiter().GetResult() on async scan (minor, known pattern)

HandleAgentReadyScan and ExecuteAgentReady both call .GetAwaiter().GetResult() on ScanAsync. In a CLI host without a sync context this is safe, but it is a well-known deadlock risk if the call site ever gains a sync context. The existing codebase likely does this elsewhere, so this is low priority — but worth tracking.

4. WebAgentReadiness.cs is 1516 lines — exceeds the repo's 800-line discipline

AGENTS.md calls out node .\Build\linecount.js . 800 as the expected ceiling. At 1516 lines this file nearly doubles it. The logical sub-domains are well-defined (robots, API catalog, agent skills, agents.json, A2A card, MCP card, headers, HTML semantics, remote scan, shared helpers) — splitting into a Services/WebAgentReadiness/ subdirectory with partial files per sub-domain would bring each under budget without changing the public API.

5. Reflection test on a private method (test smell)

AgentReadyExpectedOutputsUseLastBuildOutputWhenSiteRootIsImplicit uses BindingFlags.NonPublic | BindingFlags.Static to invoke GetExpectedStepOutputs. This binds the test to an implementation detail (method name, signature, parameter order). The same assertion is more robustly covered by the full pipeline integration test RunPipeline_AgentReady_PreparesAfterBuildAndSitemap. Consider making GetExpectedStepOutputs internal and using [InternalsVisibleTo] rather than reflection, or relying solely on the integration test.

6. ShouldWriteHeaders can write _headers even with zero link targets

When SecurityHeaders.Enabled = true but all optional features are disabled and linkTargets is empty, ShouldWriteHeaders returns true and UpdateHeaders writes a _headers block containing only security headers (no Link: line). This is likely the intended behavior based on Prepare_WritesSecurityHeadersWhenLinkHeadersAreDisabled, but the condition spec.ApiCatalog?.Enabled == true || ... in ShouldWriteHeaders always evaluates to true for spec defaults (which auto-enable ApiCatalog). Worth confirming the security-headers-only path is intentional by adding a dedicated comment or test assertion.


Minor / Nits

  • Dependency bump is well-reasoned: The System.Security.Cryptography.Xml override from 8.0.0 → 10.0.6 to clear NU1903 is correct and the lock files are updated consistently across all four affected projects.

  • TrySendAsync treats 3xx as success (status >= 200 && status < 400) while TryGetTextAsync uses < 300. Since HttpClient follows redirects by default, the final response is usually 2xx, so this is a no-op in practice — but the inconsistency is a readability hazard. Align both to < 300.

  • Verify is called with a resolved spec internally from Prepare: This is correct but worth a one-line comment noting that spec here is already resolved, since the double-resolution path in Verify would otherwise be confusing.

  • Schema duplication (PascalCase + camelCase) for every property in powerforge.web.sitespec.schema.json is consistent with existing schema patterns in the repo, so no action needed — just noting it adds ~250 lines.


Test Coverage Assessment

| Scenario | Covered |
| --- | --- |
| Prepare writes all expected files | ✅ |
| Prepare with custom OutputPath | ✅ |
| Prepare rejects path traversal | ✅ |
| Prepare with link headers disabled | ✅ |
| Verify disabled short-circuit | ✅ |
| Verify disabled optional checks (no failures) | ✅ |
| ResolveSpec non-mutation | ✅ |
| Pipeline integration (agent-ready after build+sitemap) | ✅ |
| BotRules newline injection | ❌ |
| Remote scan (live HTTP) | ❌ (expected — HTTP mocking would help) |
| Large response body / OOM guard | ❌ |

Summary

The feature is well-designed and ready for use. The robots.txt injection issue (#1) is the only concern worth addressing before merging — the rest are robustness improvements or post-merge housekeeping. Excellent work on the path traversal guards and spec immutability.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4b29cb062f


Comment on lines +548 to +549
if (string.IsNullOrWhiteSpace(siteRoot))
return Array.Empty<string>();

P1: Disable cache hits for agent-ready scan steps

For agent-ready steps that run operation: "scan", pipelines typically provide only url and no siteRoot; this branch then returns an empty expected-output list. In RunPipeline, an empty output list is treated as present, so a prior successful scan can be replayed as a cache hit and the live HTTP checks are skipped on later runs, allowing remote regressions to pass unnoticed. scan should be excluded from caching (or use a remote-aware invalidation signal).


Comment on lines +554 to +558
Path.Combine(siteRoot, "robots.txt"),
Path.Combine(siteRoot, ".well-known", "api-catalog"),
Path.Combine(siteRoot, ".well-known", "agent-skills", "index.json"),
Path.Combine(siteRoot, ".well-known", "mcp", "server-card.json"),
Path.Combine(siteRoot, headersPath)

P2: Stop requiring optional agent-ready artifacts for cache reuse

The cache output list for agent-ready is hard-coded to include artifacts like /.well-known/mcp/server-card.json, even though MCP is optional and disabled by default; as a result, AreExpectedOutputsPresent will fail on normal configurations and the step will never get a cache hit. This also ignores configured custom output paths, so valid runs are treated as cache misses indefinitely.


@claude

claude Bot commented Apr 17, 2026

Code Review — PR 318: Add PowerForge.Web agent readiness support

Overview

This PR adds a comprehensive agent-ready pipeline task and CLI surface that generates, verifies, and scans agent-facing discovery assets (robots.txt signals, headers, agents.json, A2A card, MCP card, API catalog, Agent Skills index, OpenAPI detection). It also bumps System.Security.Cryptography.Xml / Pkcs from 8.0.0 to 10.0.6 to clear NU1903.

The design is solid and the feature coverage is excellent. A few issues and observations follow.


Security

Path-traversal guard is correct but SourcePath has a gap

ResolveSitePath uses Path.GetFullPath + prefix check and is properly tested by Prepare_RejectsAgentOutputPathOutsideSiteRoot. Good.

However, the skill.SourcePath read path calls Path.GetFullPath(skill.SourcePath.Trim().Trim('"')) but does NOT verify the resolved path sits inside the site root (or any other safe boundary). An operator who controls site.json could point sourcePath at "../../../../etc/passwd" and the file would be read and embedded into generated output. Consider applying the same ResolveSitePath guard or restricting to the config-file directory.
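
A sketch of the same boundary check applied to `sourcePath` (helper name, comparison mode, and error type are assumptions, not the actual API):

```csharp
using System;
using System.IO;

static class SafePaths
{
    public static string ResolveUnderRoot(string siteRoot, string relative)
    {
        // Normalise the root with a trailing separator so "/srv/site-evil"
        // cannot pass a prefix check against "/srv/site".
        var root = Path.GetFullPath(siteRoot).TrimEnd(Path.DirectorySeparatorChar)
                   + Path.DirectorySeparatorChar;
        var candidate = Path.GetFullPath(Path.Combine(root, relative.Trim().Trim('"')));
        // Ordinal here; Windows deployments may prefer OrdinalIgnoreCase.
        if (!candidate.StartsWith(root, StringComparison.Ordinal))
            throw new InvalidOperationException($"sourcePath '{relative}' escapes the site root.");
        return candidate;
    }
}
```

A `sourcePath` of `"../../../../etc/passwd"` then throws instead of being read and embedded.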

HttpClient instantiated per ScanAsync call

new HttpClient in a per-call pattern causes socket exhaustion if ScanAsync is ever called more than once per process. In a single-use CLI tool this is low-risk, but a code comment explaining the single-call assumption would prevent a future regression. A shared static SocketsHttpHandler is the safer long-term fix.
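
A minimal sketch of the shared-client pattern (class name, lifetime, and timeout values are illustrative):

```csharp
using System;
using System.Net.Http;

static class ScanHttp
{
    // One client per process: connections are pooled, and the bounded
    // PooledConnectionLifetime lets DNS changes propagate over time.
    public static readonly HttpClient Client = new(new SocketsHttpHandler
    {
        PooledConnectionLifetime = TimeSpan.FromMinutes(2),
        ConnectTimeout = TimeSpan.FromSeconds(10)
    });
}
```

Note `SocketsHttpHandler` is .NET (Core) only; on a net472 target a plain shared static `HttpClient` gives the same socket-reuse benefit.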


Code Quality / Design

Reflection in tests couples tests to implementation details

GetExpectedStepOutputs and IsCacheableStep are private methods tested via reflection. A rename or signature change makes the tests silently no-ops — Assert.NotNull(method) only guards the null case, not a signature mismatch. Consider making these internal and exposing them via [InternalsVisibleTo] so the compiler validates call sites.
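
The `[InternalsVisibleTo]` route is a small change (the assembly names below are assumed):

```csharp
// In the PowerForge.Web project, e.g. Properties/AssemblyInfo.cs:
[assembly: System.Runtime.CompilerServices.InternalsVisibleTo("PowerForge.Tests")]

// GetExpectedStepOutputs / IsCacheableStep then become `internal` and the
// test project calls them directly, so a rename breaks at compile time
// instead of silently turning the reflection-based test into a no-op.
```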

--fail-on-warnings is silently aliased to --fail-on-failures

All three handlers treat --fail-on-warnings identically to --fail-on-failures, but the flag is absent from the help text. Either document it with distinct semantics (exit non-zero when Warnings.Count > 0), or remove the alias before it becomes a hard-to-change undocumented contract.
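
If the flag keeps distinct semantics, separate exit codes make the contract documentable and testable. A sketch (result shape and exit codes are illustrative, not the actual CLI contract):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical result shape; the real verify result type may differ.
sealed record VerifyResult(IReadOnlyList<string> Failures, IReadOnlyList<string> Warnings);

static class ExitCodes
{
    public static int From(VerifyResult result, bool failOnFailures, bool failOnWarnings)
    {
        if (failOnFailures && result.Failures.Count > 0) return 1; // hard failures
        if (failOnWarnings && result.Warnings.Count > 0) return 2; // warnings promoted
        return 0;
    }
}
```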

verify cacheability is unconfirmed

Only prepare is explicitly cacheable; scan is excluded. But verify produces no output files tracked by the cache. If verify is run in a cached pipeline and the cache key is warm, it may be skipped incorrectly. A test parallel to AgentReadyScanStepsAreNotCacheable for the verify operation would make the intent explicit.

Repeated null-coalescing for output paths

There are ~8 identical patterns like:
string.IsNullOrWhiteSpace(spec.OutputPath) ? ".well-known/api-catalog" : spec.OutputPath!
across the implementation. Small helper properties on the spec types would reduce duplication and make the null-suppression operators easier to audit.
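
A sketch of the helper-property idea on one spec type (property name and default are illustrative):

```csharp
#nullable enable
public sealed class ApiCatalogSpec
{
    public bool Enabled { get; set; } = true;
    public string? OutputPath { get; set; }

    // One place owns the default, so call sites drop the ternary and the
    // null-suppression operator.
    public string EffectiveOutputPath =>
        string.IsNullOrWhiteSpace(OutputPath) ? ".well-known/api-catalog" : OutputPath!;
}
```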


Dependency Bump

The 8.0.0 -> 10.0.6 bump for Cryptography.Xml/Pkcs (+ Microsoft.Bcl.Cryptography, System.Formats.Asn1) to clear NU1903 is correct. The minimum-floor pin [10.0.6, ) is the right syntax for a security override, and lock files are updated consistently across all four projects.

One note: PowerForge.PowerShell.csproj gates the PackageReference on net8.0 Or net10.0. The project also targets net472 — confirm the net472 TFM does not resolve the old transitive version of Cryptography.Xml through a separate path.


Test Coverage

Good breadth: prepare, verify, disabled-path, no-mutate, pipeline integration, scan cacheability, optional-output honor, and path-traversal rejection are all covered. Gaps worth noting:

  • No test for the scan operation itself (requires HTTP mocking or a test server).
  • No test for --fail-on-warnings exit-code behavior.
  • Verify_HonorsDisabledOptionalAgentReadinessChecks asserts no fail checks but does not verify that warn checks for disabled features are also suppressed.

Minor Nits

  • Prepare_WritesDiscoveryFilesAndHeaders fixture has alt="" on the test image. If verify ever promotes the missing-alt warning to a failure, result.Success could break unexpectedly. A real alt value is safer.
  • All fixtures hardcode "dateModified":"2026-04-17". A clearly sentinel value like "2000-01-01" or a dynamic date would age better.

Summary

The implementation is well-structured, the path-traversal guard is correctly implemented, the documentation is thorough, and the test suite covers the main paths. The two items most worth addressing before merge are the SourcePath read boundary gap and the undocumented --fail-on-warnings alias. The reflection-in-tests concern is lower priority but worth tracking.

@PrzemyslawKlys PrzemyslawKlys merged commit 10757f1 into main Apr 17, 2026
6 checks passed
@PrzemyslawKlys PrzemyslawKlys deleted the feature/agent-readiness branch April 17, 2026 20:13