Refine PowerForge Web social cards and asset workflows #325

Merged
PrzemyslawKlys merged 7 commits into main from wip/pspublishmodule-agent-readiness-20260422
Apr 24, 2026

PrzemyslawKlys commented Apr 22, 2026

Summary

  • adds reusable generated social-card themes with site, collection, and page-level overrides
  • moves card behavior into the PowerForge.Web engine: token merging, theme precedence, project metadata inference, and semantic metric widgets
  • improves visual rendering with measured text alignment, centered badges/monograms, cleaner spacing, long-title handling, SVG metric icons, bounded/capped fetched remote card media, and renderer cache invalidation
  • addresses review feedback by sharing CSS strategy and social metric normalization, avoiding sync-over-async asset reads, preserving entity-encoded URLs when rewrites do not match, keeping exact query-string redirect paths in nginx, sanitizing remote download UserAgent, guarding SourceUrl downloads with HTTPS + explicit host allow-lists, sanitizing SVG media before raster rendering, trimming local sitemap BOMs, adding sitemap fetch timeouts/depth guard, and fixing localization fallback selector output
  • documents social-card design guidance, theme tokens, data widgets, asset rewrite trust boundaries, and recommended presets for Evotec-owned sites such as blogs, APIs, examples, projects, RSS, QR/code pages, releases, and status/benchmark pages

Validation

  • git diff --check
  • PowerShell parse check for Build/Compare-WebSitemaps.ps1
  • dotnet test .\PowerForge.Tests\PowerForge.Tests.csproj -c Release --filter "FullyQualifiedName~WebSocialCardGeneratorTests|FullyQualifiedName~WebSiteSocialCardsTests|FullyQualifiedName~WebSiteAuditOptimizeBuildTests.AssetRewriteDownloadHelpers_SanitizeUserAgentAndRejectUnsafeHosts" - 56 passed
  • dotnet test .\PowerForge.Tests\PowerForge.Tests.csproj -c Release --filter "FullyQualifiedName~WebSiteAuditOptimizeBuildTests" - 67 passed
  • dotnet test .\PowerForge.Tests\PowerForge.Tests.csproj -c Release --filter "FullyQualifiedName~WebSocialCardGeneratorTests|FullyQualifiedName~WebSiteAuditOptimizeBuildTests.AssetRewriteDownloadHelpers_SanitizeUserAgentAndRejectUnsafeHosts" - 37 passed earlier in review cycle
  • dotnet test .\PowerForge.Tests\PowerForge.Tests.csproj -c Release --filter "FullyQualifiedName~WebSocialCardGeneratorTests|FullyQualifiedName~WebSiteSocialCardsTests|FullyQualifiedName~WebSiteAuditOptimizeBuildTests.Build_GeneratesApacheAndIisRedirectArtifacts_ForWildcardRedirects|FullyQualifiedName~WebSiteAuditOptimizeBuildTests.OptimizeDetailed_AssetRewrites_PreservesEncodedUrlWhenNoRuleMatches|FullyQualifiedName~WebSiteAuditOptimizeBuildTests.AssetRewriteDownloadHelpers_SanitizeUserAgentAndRejectUnsafeHosts" - 55 passed earlier in review cycle

Review notes

  • The localization fallback test was confirmed to fail on origin/main (8ea8cab) during triage, then fixed in this branch so the broader affected test class is now green.
  • Existing file-size concerns remain best handled as a follow-up split/refactor; this PR now addresses the functional/security blockers raised by review.

claude Bot commented Apr 22, 2026

Code Review — PR #325: Refine PowerForge.Web social cards and asset workflows

Overview

This PR covers four distinct areas:

  1. Social card rendering — visual layout overhaul (vertical centering, brand panels, optical title offset, badge sizing) plus extending editorial body-image fallback beyond blog to any collection using the editorial preset.
  2. Asset optimizer — new SourceUrl/DownloadDependencies support for remote asset self-hosting, plus a new CssStrategy option (blocking / preload / async).
  3. Redirect generation — query-string-aware redirect rules for Apache/IIS/nginx, per-language host mapping for multi-domain deployments, and a 404.html auto-wire in .htaccess.
  4. Sitemap comparison helper — a new standalone PowerShell script (Build/Compare-WebSitemaps.ps1) for migration and redirect review workflows.

The scope is large but well-organized. Tests cover the behavioural changes meaningfully. Below are specific observations.


Positive highlights

  • NormalizeCssStrategy is defined in two separate files (WebAssetOptimizer.Images.cs and WebSiteBuilder.RenderAssetsAndRouting.cs) with identical switch expressions — good consistency, and both files share the same strategy vocabulary. Consider extracting to a shared helper if the two files ever diverge.
  • AppendHtmlAttribute is a clean refactor from the previous string-concatenation approach; HTML-encoding is applied uniformly.
  • FilterItemsForLanguages gracefully handles all three states (no filter, single-language, multi-language allow-list) without breaking backward compatibility.
  • TryGetRedirectSourceQuery / NormalizeRedirectSourcePath correctly split query strings from paths before generating rewrite rules — a real correctness fix for query-parameterised redirects.
  • BuildDownloadedAssetFileName hashes the full URI to avoid collisions — good defensive practice.

Issues / Suggestions

1. RewriteDownloadClient — static HttpClient with no DNS refresh

// WebAssetOptimizer.cs
private static readonly HttpClient RewriteDownloadClient = CreateRewriteDownloadClient();

A static HttpClient backed by a plain HttpClientHandler will hold DNS entries indefinitely. For a CLI/build-time tool this is low risk, but it is worth a brief comment explaining the lifetime expectation. If DNS could change during a long pipeline run, prefer SocketsHttpHandler with PooledConnectionLifetime.
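
A minimal sketch of the SocketsHttpHandler suggestion, offered as an illustration rather than the PR's actual CreateRewriteDownloadClient body (which is not shown here). The class name, the two-minute lifetime, and the 30-second timeout are assumptions chosen for the example.

```csharp
using System;
using System.Net;
using System.Net.Http;

public static class RewriteDownloadClientSketch
{
    // Hypothetical factory: pooled connections are recycled after two minutes,
    // so a DNS change is picked up during a long pipeline run.
    public static SocketsHttpHandler CreateHandler() => new SocketsHttpHandler
    {
        PooledConnectionLifetime = TimeSpan.FromMinutes(2), // assumed value
        AutomaticDecompression = DecompressionMethods.All
    };

    // The client itself can still be static and shared for the process lifetime.
    public static HttpClient CreateClient() =>
        new HttpClient(CreateHandler()) { Timeout = TimeSpan.FromSeconds(30) }; // assumed timeout
}
```

The static-client pattern stays intact; only the handler changes, which keeps the fix local to the factory method.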

2. Sync-over-async in DownloadText / DownloadBytes

return response.Content.ReadAsStringAsync().GetAwaiter().GetResult();
return response.Content.ReadAsByteArrayAsync().GetAwaiter().GetResult();

These are .GetAwaiter().GetResult() calls on async methods from a synchronous context. They can deadlock in environments with a custom synchronization context (e.g. running inside a test host or IDE plug-in). HttpContent has no synchronous ReadAsString or ReadAsByteArray; use the synchronous HttpContent.ReadAsStream() (available in .NET 5+) and read from the returned stream, or make the whole call chain async.
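
One synchronous option, sketched under assumptions: HttpClient.Send and HttpContent.ReadAsStream() are the blocking counterparts available since .NET 5. The class and method names below are hypothetical, and the sketch assumes UTF-8 text rather than honouring the response charset.

```csharp
using System.IO;
using System.Net.Http;
using System.Text;

public static class SyncDownloadSketch
{
    // Reads the body synchronously; no GetAwaiter().GetResult() is involved.
    // Assumes UTF-8 content (the charset header is not inspected).
    public static string ReadText(HttpContent content)
    {
        using var stream = content.ReadAsStream();
        using var reader = new StreamReader(stream, Encoding.UTF8);
        return reader.ReadToEnd();
    }

    // Hypothetical wrapper around the synchronous HttpClient.Send overload.
    public static string DownloadText(HttpClient client, string url)
    {
        using var request = new HttpRequestMessage(HttpMethod.Get, url);
        using var response = client.Send(request, HttpCompletionOption.ResponseHeadersRead);
        response.EnsureSuccessStatusCode();
        return ReadText(response.Content);
    }
}
```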

3. MeasureRenderedTitleGlyphInset — bare catch swallowing all exceptions

catch
{
    return EstimateTitleGlyphInset(fontSize, glyph);
}

This silently swallows OutOfMemoryException, ThreadAbortException, and other non-recoverable failures. Prefer catch (Exception) (or, better, catch the specific MagickException / InvalidOperationException you expect from ImageMagick).

4. Get-AmpHtmlAlternateUrl in Compare-WebSitemaps.ps1 makes unbounded HTTP probes

$ampProbeSources = $legacyUnique | Where-Object { Test-AmpProbeCandidate -Url $_ }
foreach ($legacyUrl in $ampProbeSources) {
    $ampUrl = Get-AmpHtmlAlternateUrl -Url $legacyUrl

With -DiscoverAmpHtml, every URL that passes Test-AmpProbeCandidate triggers a live HTTP request. For large sitemaps this can produce hundreds or thousands of requests with no rate-limiting or timeout beyond Invoke-WebRequest's default. Consider adding a $ThrottleMs parameter or -MaxAmpProbes guard, and document the expected network cost in the parameter help.

5. WriteProjectExternalRedirectCsv — manual CSV serialization

rows.Add(string.Join(",",
    CsvEscape(legacyUrl),
    CsvEscape(targetUrl),
    "301",
    "project-catalog",
    CsvEscape(slug),
    CsvEscape(notes)));

The rest of the codebase uses Export-Csv (PowerShell side) for writing and SplitApacheCsvLine for reading. Writing CSV manually risks subtle edge cases. The trailing \r\n on the last row, combined with string.Join("\r\n", rows) for the middle rows, should be fine. The status column "301" and the "project-catalog" literal are emitted without escaping; that is safe for fixed values that contain no commas or quotes, but it is worth a comment. Passing them through CsvEscape as well keeps every field consistent and guards against a later change to either value.

// Suggested: wrap all fields unconditionally
rows.Add(string.Join(",",
    CsvEscape(legacyUrl),
    CsvEscape(targetUrl),
    CsvEscape("301"),          // harmless, consistent
    CsvEscape("project-catalog"),
    CsvEscape(slug),
    CsvEscape(notes)));

6. BuildNginxRequestUriMatcher — double-escaping concern

var literal = exactPath + "?" + sourceQuery;
return "\"^" + Regex.Escape(literal) + "$\"";

Regex.Escape will escape the literal ? in sourceQuery to \?. In an nginx if ($request_uri ~ …) regex, \? matches a literal question mark — which is correct. However, the query string itself is not URL-decoded before comparison, so if the incoming query contains encoded characters (%20, etc.) the rule may not fire. This is probably acceptable for the current use case (WordPress legacy IDs are numeric), but worth a note in a comment.

7. Get-SlugVariants — hardcoded language codes

if ($current -match '^(.*?)-(pl|fr|de|es)$' …)

The supported locale suffixes (pl, fr, de, es) are baked in. If another locale suffix is needed downstream, this regex must be updated. A parameter like $LocaleSuffixes would make the script more reusable without changing its default behavior.

8. Minor: OptimizeNetworkHints visibility change

// Before: private static
// After:  internal static
internal static string OptimizeNetworkHints(string html)

This is intentionally widened so WebAssetOptimizer can call it directly. Fine, but the change is undocumented. Because internal only grants access within the same assembly, a short inline comment noting why the visibility was raised (a call from the optimizer class, not a cross-assembly call) would help future readers.

9. data-cfasync="false" on all deferred scripts

var jsHtml = string.Join(Environment.NewLine, jsLinks.Select(j =>
    $"<script src=\"{j}\" defer data-cfasync=\"false\"></script>"));

data-cfasync="false" is a Cloudflare Rocket Loader opt-out attribute. This hardcodes a CDN-specific behaviour for all users, including those not using Cloudflare. Consider making it conditional on a spec option (e.g. Head.DisableRocketLoader: true) rather than always emitting it.
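
A small sketch of the opt-in variant. The disableRocketLoader flag is hypothetical (the review only suggests something like Head.DisableRocketLoader), and the sketch assumes each link is already attribute-safe, as in the snippet above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ScriptTagSketch
{
    // disableRocketLoader is a hypothetical option; when false, no
    // Cloudflare-specific attribute is emitted at all.
    public static string BuildDeferredScripts(IEnumerable<string> jsLinks, bool disableRocketLoader)
    {
        var optOut = disableRocketLoader ? " data-cfasync=\"false\"" : string.Empty;
        return string.Join(Environment.NewLine, jsLinks.Select(j =>
            $"<script src=\"{j}\" defer{optOut}></script>"));
    }
}
```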


Test coverage

  • The three new WebSiteSocialCardsTests tests cover: editorial body-image fallback, explicit-over-body-image precedence, and the renamed blog test — good.
  • The new WebSiteLocalizationFeaturesTests test validates the languages allow-list correctly filters out non-listed languages while keeping their hreflang tags — solid.
  • The WebSocialCardGeneratorTests additions cover font fallback, optical-offset range, and token-override path.
  • There are no tests for DownloadRewriteAsset, RewriteDownloadedCssDependencies, or NormalizeApacheRedirectRow — the new async/network download path and the language-host mapping logic are not tested. Given that these involve network I/O or complex transformation logic, even minimal unit tests with a mock/stub would strengthen confidence.

Summary

The PR is well-structured and the changes are generally clean. The two most actionable issues before merge are:

  1. Sync-over-async (#2) — real potential for deadlock in hosted environments.
  2. Bare catch in MeasureRenderedTitleGlyphInset (#3) — masks non-recoverable failures.

Everything else is advisory / quality-of-life. Overall this is a solid set of improvements.

chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b6b9529240

Comment on lines 526 to 528
var decodedUrl = System.Web.HttpUtility.HtmlDecode(url);
var replaced = ApplyRewriteRules(decodedUrl, rewrites);
return replaced == url ? match.Value : $"{match.Groups["attr"].Value}=\"{replaced}\"";

P2: Preserve entity-encoded URLs when no rewrite applies

RewriteHtmlAssets now decodes href/src values before rewrite matching, but it still compares the rewritten value against the original encoded attribute text. For URLs containing entities (for example &amp; in query strings), this causes a write even when no rewrite rule matched, mutating the HTML output and potentially changing URL parsing/escaping semantics whenever asset rewrites are enabled.
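
The fix can be sketched as follows, assuming the comparison is made against the decoded value instead of the raw attribute text. The regex and the prefix-based ApplyRewriteRules stand-in are hypothetical simplifications, not the PR's implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class RewriteHtmlSketch
{
    private static readonly Regex AttrRegex = new Regex(
        "(?<attr>href|src)=\"(?<url>[^\"]*)\"", RegexOptions.IgnoreCase);

    // Hypothetical stand-in for ApplyRewriteRules: first matching prefix wins.
    private static string ApplyRewriteRules(string url, IReadOnlyDictionary<string, string> rewrites)
    {
        foreach (var rule in rewrites)
        {
            if (url.StartsWith(rule.Key, StringComparison.Ordinal))
                return rule.Value + url.Substring(rule.Key.Length);
        }
        return url;
    }

    public static string RewriteHtmlAssets(string html, IReadOnlyDictionary<string, string> rewrites) =>
        AttrRegex.Replace(html, match =>
        {
            var decodedUrl = WebUtility.HtmlDecode(match.Groups["url"].Value);
            var replaced = ApplyRewriteRules(decodedUrl, rewrites);
            // Compare against the decoded value: when no rule matched,
            // return the original markup unchanged, entities included.
            if (string.Equals(replaced, decodedUrl, StringComparison.Ordinal))
                return match.Value;
            // Re-encode so the rewritten attribute stays valid HTML.
            return $"{match.Groups["attr"].Value}=\"{WebUtility.HtmlEncode(replaced)}\"";
        });
}
```

With a single rule mapping https://cdn.example.com/ to /assets/, an attribute such as href="/page?x=1&amp;y=2" is left untouched, while a matching src is rewritten and its &amp; is preserved by the re-encoding step.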

Comment on lines +651 to +652
var exactPath = path == "/" ? "/" : path.TrimEnd('/');
var literal = exactPath + "?" + sourceQuery;

P2: Match trailing-slash paths in query-based Nginx redirects

The Nginx matcher for query-specific redirects removes the trailing slash from the source path before building the request_uri regex. As a result, a source like /legacy/?id=1 generates a matcher for /legacy?id=1, so requests that keep the slash (/legacy/?id=1) do not redirect. This introduces missed redirects for a valid source form and makes behavior inconsistent with the Apache/IIS generation in the same file.
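
A sketch of a matcher that keeps the configured path exactly as written, so /legacy/?id=1 and /legacy?id=1 each produce their own pattern. The class and method names are hypothetical; only Regex.Escape and the quoting style come from the snippet above.

```csharp
using System.Text.RegularExpressions;

public static class NginxMatcherSketch
{
    // Builds the quoted regex for an nginx `if ($request_uri ~ ...)` test.
    // The path is used verbatim: no TrimEnd('/') and no appended slash.
    public static string BuildRequestUriMatcher(string path, string sourceQuery)
    {
        var literal = path + "?" + sourceQuery;
        // Regex.Escape turns the literal '?' into '\?', which nginx reads
        // as a literal question mark.
        return "\"^" + Regex.Escape(literal) + "$\"";
    }
}
```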

Add reusable generated-card theme resolution, token merging, metric widgets, and project metadata inference.

Improve renderer alignment, spacing, semantic metric icons, and document the design guidance for owned websites.

claude Bot commented Apr 24, 2026

Code Review — PR #325: Refine PowerForge Web social cards and asset workflows

Overview

This PR delivers three major areas of work:

  1. Named social-card themes — reusable presets with token deep-merging and collection/page-level overrides
  2. Metric data widgets — semantic metric chips inferred from project metadata or configured explicitly
  3. Visual rendering improvements — optical text alignment, centered badge/monogram, brand panels, cleaner vertical rhythm, and long-title handling

The change is large (~4100 additions / 264 deletions) but well-scoped and comes with 51 passing tests. The architectural approach is sound. Below are specific observations.


Issues

NormalizeCssStrategy is duplicated

WebAssetOptimizer.Images.cs and WebSiteBuilder.RenderAssetsAndRouting.cs each contain an identical NormalizeCssStrategy private static method with the same switch table. If the strategy list ever needs to grow (e.g. add "defer") it will need to be changed in two places. This should be extracted to a shared utility.
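
A shared helper could look roughly like this. The class name is hypothetical, the vocabulary (blocking / preload / async) is taken from the PR description, and the fallback to "blocking" for unknown input is an assumption about the default.

```csharp
#nullable enable

public static class CssStrategySketch
{
    // Single source of truth for the strategy vocabulary; both call sites
    // would delegate here instead of carrying their own switch.
    public static string Normalize(string? value) =>
        (value ?? string.Empty).Trim().ToLowerInvariant() switch
        {
            "preload" => "preload",
            "async" => "async",
            _ => "blocking" // assumed default for empty or unknown input
        };
}
```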

EstimateTitleGlyphInset discards both parameters

private static int EstimateTitleGlyphInset(int fontSize, char glyph)
{
    _ = fontSize;
    _ = glyph;
    return 0;
}

Both parameters are explicitly discarded and the method always returns 0. This looks like a placeholder that was left behind when the real implementation (MeasureRenderedTitleGlyphInset) was written. The method is only called as a fallback when ImageMagick measurement throws or returns no pixels, so the behaviour (fall back to zero offset) is correct — but the discards suggest either the body was not implemented, or the method signature should be () with no parameters to make the intention clear.

NormalizeSocialCardMetrics exists in two classes with different helpers

WebSiteBuilder.SocialCards.cs has a NormalizeSocialCardMetrics that calls TrimSocialCardMetricText; WebSocialCardGenerator.Renderer.cs has a version that calls TrimSingleLine. They do the same job through different helpers. If the truncation logic ever diverges, rendered output will differ from the fingerprinted input, potentially causing cache misses or stale cards. One of these should delegate to the other, or both should call the same helper.


Minor / style observations

Import-LocalSitemapUrls doesn't handle sitemap indexes (Compare-WebSitemaps.ps1): the remote Import-SitemapUrls recursively follows <sitemapindex> nodes, but the local version only handles <urlset>. Passing a local sitemap index file will throw. A comment or guard would clarify the intended limitation.

No timeout on Invoke-WebRequest (Compare-WebSitemaps.ps1, line ~84): the script uses $ErrorActionPreference = 'Stop' (good), but remote sitemap fetches have no -TimeoutSec parameter. A slow host will hang the script indefinitely.

trimmed.Skip(1).All(Uri.IsHexDigit) in IsSafeCssColor: LINQ on a string allocates an enumerator for each validation call. Since this runs per-metric per-card, a simple for loop would be more direct and avoids the allocation.
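
For illustration, an allocation-free version of the hex check. The accepted lengths (#rgb, #rgba, #rrggbb, #rrggbbaa) are an assumption; the PR's IsSafeCssColor may accept other forms.

```csharp
#nullable enable
using System;

public static class CssColorSketch
{
    // Assumption: only #rgb, #rgba, #rrggbb and #rrggbbaa count as safe.
    public static bool IsSafeHexColor(string? value)
    {
        if (string.IsNullOrWhiteSpace(value)) return false;
        var trimmed = value.Trim();
        if (trimmed[0] != '#') return false;
        var digits = trimmed.Length - 1;
        if (digits != 3 && digits != 4 && digits != 6 && digits != 8) return false;
        // Plain index loop: no LINQ enumerator is allocated per call.
        for (var i = 1; i < trimmed.Length; i++)
        {
            if (!Uri.IsHexDigit(trimmed[i])) return false;
        }
        return true;
    }
}
```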

Redundant case-sensitive lookup in ResolveSocialCardTheme: the method calls TryGetValue (case-sensitive) and then, on failure, falls back to a linear foreach with OrdinalIgnoreCase. In practice the keys come from JSON config where casing is fixed, so the two-phase lookup rarely differs. A single OrdinalIgnoreCase dictionary (or a case-insensitive lookup to start) would be simpler. Not a bug, just extra work.

Formatting nit — dangling comma in testing helper: WebSocialCardGenerator.Renderer.cs, GetTitleOpticalOffsetForTesting:

CtaLabel = "Learn More"
,
FrameInset = 0,

The comma placement on its own line stands out from the rest of the initializer style in this file.


Positive highlights

  • Renderer version bump to v8 and inclusion in the hash key ensures stale cached cards are invalidated automatically — good defensive practice.
  • ConcurrentDictionary<string, int> for glyph/centering caches avoids repeated ImageMagick renders for the same font/size/glyph combination across pages.
  • MergeSocialCardThemeTokens deep-copies before merging so site-level token maps are never mutated.
  • Static RewriteDownloadClient with AutomaticDecompression follows .NET best practices for HttpClient lifetime.
  • Pixel-level ink tests (MeasureInkLeft, MeasureInkBox) are an unusually thorough way to verify visual alignment regressions and give confidence that optical-offset logic is working correctly at the rendered level.
  • 51 tests passing with explicit filter coverage of both WebSocialCardGeneratorTests and WebSiteSocialCardsTests.

Summary

The core architecture (theme resolution precedence, token merging, metric inference, cache fingerprinting) is solid and the test coverage is strong. The three concrete issues above (duplicated NormalizeCssStrategy, placeholder EstimateTitleGlyphInset, dual NormalizeSocialCardMetrics) are worth addressing before merge to prevent silent drift. The minor points are low risk but worth a quick pass.

Share CSS strategy and social metric normalization, preserve encoded asset URLs when rewrites do not match, and keep trailing slashes in query-based nginx redirect matchers.

Replace sync-over-async asset downloads with synchronous reads and cover the Codex review cases with regression tests.

claude Bot commented Apr 24, 2026

Code Review — PR #325: Refine PowerForge Web social cards and asset workflows

Overview

This PR adds reusable social-card themes with site/collection/page-level overrides, moves card rendering logic into the PowerForge.Web engine, improves SVG/PNG visual output, fixes a URL encoding preservation bug in asset rewrites, and ships a new Compare-WebSitemaps.ps1 build utility. The scope is broad but the changes are generally well-engineered. Key concerns are below.


File Size Violations

AGENTS.md mandates an 800-line file limit (node .\Build\linecount.js . 800). Several files in this PR exceed that limit significantly:

| File | Lines | Over limit by |
|---|---|---|
| WebSocialCardGenerator.Renderer.cs | 1,696 | +112% |
| WebSocialCardGenerator.cs | 1,438 | +80% |
| WebSiteBuilder.RenderAssetsAndRouting.cs | 1,376 | +72% |
| WebSiteBuilder.SocialCards.cs | 1,022 | +28% |
| Build/Compare-WebSitemaps.ps1 | 898 | +12% |

The two WebSocialCardGenerator files together are the most actionable split. Renderer.cs contains seven independent layout engines (AppendSpotlightLayout, AppendShelfLayout, AppendReferenceLayout, etc.) that could each live in their own partial class file, and the remote-media plumbing (TryLoadImageSource, GetRemoteImageBytes, FetchRemoteImageBytes) is a natural extraction candidate.


Known Failing Test

A broader FullyQualifiedName~WebSiteAuditOptimizeBuildTests run currently fails on Build_LocalizationSwitcher_FallsBackToDefaultLanguage_WhenTranslationMissing; this appears unrelated to the review changes and was left untouched.

Leaving a pre-existing test failure in place is risky — CI baselines erode when failing tests are normalised as "known issues." Before merge, please either:

  • Confirm the failure is truly pre-existing (i.e. fails on main before this branch) and file a tracking issue, or
  • Fix it if it turns out to be a latent regression exposed by this PR's changes.

Security / Robustness

No response size limit on remote image fetching

FetchRemoteImageBytes() (Renderer.cs ~L1539–1549) reads the full HTTP response into a MemoryStream with no size guard:

using var stream = response.Content.ReadAsStream();
using var memory = new MemoryStream();
stream.CopyTo(memory);
return memory.ToArray();

A slow or large remote image URL (e.g. a multi-hundred-MB file, or a Content-Length-less chunked response) will buffer into process memory without bound during a site build. Add a cap — e.g. read at most 10 MB and bail:

const int MaxBytes = 10 * 1024 * 1024;
var buffer = new byte[MaxBytes + 1];
int read = stream.ReadAtLeast(buffer, buffer.Length, throwOnEndOfStream: false);
if (read > MaxBytes)
    return null; // image too large
return buffer[..read];
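
An alternative sketch that reads in chunks, so a small image does not cost a 10 MB buffer up front. The helper name is hypothetical; the cap is passed in by the caller.

```csharp
#nullable enable
using System.IO;

public static class BoundedReadSketch
{
    // Returns null when the stream holds more than maxBytes.
    // The MemoryStream never grows past maxBytes.
    public static byte[]? ReadAtMost(Stream stream, int maxBytes)
    {
        using var memory = new MemoryStream();
        var chunk = new byte[81920];
        int read;
        while ((read = stream.Read(chunk, 0, chunk.Length)) > 0)
        {
            if (memory.Length + read > maxBytes)
                return null; // too large, caller falls back
            memory.Write(chunk, 0, read);
        }
        return memory.ToArray();
    }
}
```

Because the check runs on bytes actually read, it holds even when Content-Length is absent or wrong.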

Positive: EscapeXml() is applied consistently to all user-supplied strings before they are injected into SVG output. Path-traversal protection for asset resolution is in place. HTTP timeout is set to 10 s. No sync-over-async (the HttpClient.Send() + Lazy<> pattern is synchronous-correct here, not a deadlock risk).


PR Scope

AGENTS.md: "Commit frequently. Avoid 'big bang' diffs that mix unrelated changes."

At 4,179 additions this PR bundles:

  1. Social-card theme engine (the main feature)
  2. CSS strategy normalization (WebAssetCssStrategy.cs)
  3. URL encoding preservation in asset rewrites
  4. Trailing-slash fix in nginx redirects
  5. A new 898-line sitemap comparison build script
  6. Multiple doc files

Items 2–5 are independent bug/quality fixes. Even if splitting is impractical now, future work in this area should prefer narrower PRs to keep reviewability and bisectability intact.


Architecture

  • SocialCardMetricNormalizer is a clean, focused extraction — exactly the right approach per AGENTS.md's "prefer reusable services" guidance.
  • WebSiteBuilder.SocialCards.cs at 1,022 lines is doing too much in one place: orchestration, style/variant/color-scheme resolution, metric inference, and asset-URI conversion. InferSocialCardMetrics() and the asset-URI pipeline look like natural service extractions for a follow-up.
  • WebAssetCssStrategy.Normalize() is a small, single-responsibility static helper — well-scoped.

Test Coverage

53 tests passing including pixel-level PNG validation via ImageMagick is a strong signal. Coverage of the main scenarios looks solid:

  • Theme precedence / deep-merge
  • Per-collection overrides
  • Long title adaptation
  • Remote media caching and fallback
  • Path-traversal rejection
  • Encoding preservation in asset rewrites

One gap: there is no test that exercises the FetchRemoteImageBytes size-limit path (because it does not exist yet). Adding one alongside the size guard would complete the story.


Summary

| Area | Status |
|---|---|
| Core social-card theme engine | ✅ Well-implemented |
| SVG escaping / XSS safety | ✅ Consistent |
| Test coverage | ✅ Strong |
| HTTP timeout | ✅ Present (10 s) |
| Remote image size limit | ❌ Missing |
| File size guideline (800 lines) | ❌ Multiple violations |
| Known failing test | ⚠️ Needs triage before merge |
| PR scope / commit discipline | ⚠️ Broad but acceptable given feature size |

The core work is solid. The two blockers to address before merge are the missing response size cap and the failing test triage. The file-size violations are worth a follow-up split rather than a blocking issue.

Cap remote image downloads used by generated social cards and add a regression test for oversized streams.

Fix the localization fallback selector regression surfaced during review triage so the broader website build tests pass.

claude Bot commented Apr 24, 2026

Code Review: PR #325 — Refine PowerForge Web social cards and asset workflows

This is a substantial PR (4268 additions, 277 deletions) touching social card rendering, asset optimization, redirect generation, CSS strategy, and a new sitemap-comparison helper. The test coverage and validation evidence in the PR description are appreciated. Below are findings ordered by severity.


🔴 Medium Severity

Bug: Variable shadowing in Compare-WebSitemaps.ps1 (~line 744)

Inside the if ($DiscoverAmpHtml) block, a loop assigns:

$legacyNormalized = Get-NormalizedUrl -Url $legacyUrl

This shadows the outer script-scope $legacyNormalized array (built on line ~691). PowerShell has no block scope, so after the if block, $legacyNormalized holds a single string (the last iterated value) rather than the full array. The result object built afterward — @($legacyNormalized).Count and $_ -in $newNormalized comparisons — will silently produce wrong counts whenever $DiscoverAmpHtml is set.

Fix: Rename the inner loop variable (e.g. $innerLegacyNormalized).


Security: SourceUrl downloads without SSRF guard (WebAssetOptimizer.cs)

DownloadRewriteAsset validates that the scheme is http/https but does not restrict the target host. A site.json entry pointing to http://169.254.169.254/latest/meta-data/ would be fetched in cloud environments. Even if site configs are fully trusted today, the trust boundary should be explicitly documented, and an optional host allowlist parameter would be a good defence-in-depth addition.
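
A defence-in-depth check might be sketched like this. All names are hypothetical, and the sketch only inspects the URL text: it does not resolve DNS, so a host name that resolves to a private address would need a separate check at connect time.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Sockets;

public static class SourceUrlGuardSketch
{
    // Hypothetical guard: HTTPS only, IP literals in private or link-local
    // ranges rejected, and the host must be on an explicit allow-list.
    public static bool IsAllowed(string url, ISet<string> allowedHosts)
    {
        if (!Uri.TryCreate(url, UriKind.Absolute, out var uri)) return false;
        if (uri.Scheme != Uri.UriSchemeHttps) return false;
        if (IPAddress.TryParse(uri.DnsSafeHost, out var ip) && IsPrivateOrLocal(ip)) return false;
        return allowedHosts.Contains(uri.Host);
    }

    private static bool IsPrivateOrLocal(IPAddress ip)
    {
        if (IPAddress.IsLoopback(ip)) return true;
        if (ip.AddressFamily == AddressFamily.InterNetworkV6)
            return ip.IsIPv6LinkLocal || ip.IsIPv6SiteLocal;
        var b = ip.GetAddressBytes();
        return b[0] == 10
            || (b[0] == 172 && b[1] >= 16 && b[1] <= 31)
            || (b[0] == 192 && b[1] == 168)
            || (b[0] == 169 && b[1] == 254); // link-local, covers 169.254.169.254
    }
}
```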


Security: CRLF injection via UserAgent field (WebAssetOptimizer.cs)

request.Headers.TryAddWithoutValidation("User-Agent", userAgent);

TryAddWithoutValidation bypasses header-value validation. A UserAgent value containing \r\nX-Injected: evil would cause CRLF header injection. Most .NET SocketsHttpHandler versions absorb this silently. Trim the string to a single line before use:

request.Headers.TryAddWithoutValidation("User-Agent", userAgent.Split('\n', '\r')[0].Trim());

🟡 Low Severity

Bug: Nginx query-string redirect path gets a spurious trailing slash (WebSiteBuilder.Redirects.cs)

NormalizeRedirectSourcePath unconditionally appends / to the path. For a redirect configured as /legacy?id=1 (no trailing slash), the nginx pattern becomes ^/legacy/\?id=1$, which never matches a real request for /legacy?id=1. Query-string redirects should skip the trailing-slash normalisation, or the behaviour should be explicitly documented.

Bug: Dead code in ResolveSocialCardTheme (WebSiteBuilder.SocialCardThemes.cs)

if (social.GeneratedCardThemes.TryGetValue(themeKey.Trim(), out var direct))
    return direct;

// This linear scan is unreachable — TryGetValue already uses OrdinalIgnoreCase
foreach (var pair in social.GeneratedCardThemes)
{
    if (string.Equals(pair.Key, themeKey.Trim(), StringComparison.OrdinalIgnoreCase))
        return pair.Value;
}

The fallback linear scan can never be reached and should be removed.
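
A simplified sketch of the single-lookup form, assuming the theme map is constructed with StringComparer.OrdinalIgnoreCase. Theme values are plain strings here for brevity; the real theme type is not shown in this review.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public static class ThemeLookupSketch
{
    // Hypothetical factory: the comparer makes every lookup case-insensitive.
    public static Dictionary<string, string> CreateThemeMap() =>
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

    public static string? ResolveTheme(IReadOnlyDictionary<string, string> themes, string? themeKey)
    {
        if (string.IsNullOrWhiteSpace(themeKey)) return null;
        var key = themeKey.Trim(); // trimmed once, reused below
        return themes.TryGetValue(key, out var theme) ? theme : null;
    }
}
```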

Bug: Missing null guard on Path.GetDirectoryName (WebAssetOptimizer.cs)

Directory.CreateDirectory(Path.GetDirectoryName(destinationPath)!);

GetDirectoryName returns null for root-level paths. The ! operator suppresses the warning but Directory.CreateDirectory(null!) throws ArgumentNullException. Add a null check consistent with the Source-based branch below it.

Bug: ParseSocialCardMetricsString — comma is both a metric separator and a potential value character

Splitting on [';', ','] means a locale-formatted number like "1,200" inside a metric value segment would be split incorrectly. Semicolon-only separation (or explicit documentation that commas are treated as delimiters) would prevent silent truncation.
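
A semicolon-only parser could be sketched as below. The "Label: value; Label: value" input shape is an assumption made for the example; the real ParseSocialCardMetricsString format may differ.

```csharp
using System;
using System.Collections.Generic;

public static class MetricParseSketch
{
    // Assumed input shape: "Label: value; Label: value".
    // Commas are left alone so values such as "1,200" survive intact.
    public static List<(string Label, string Value)> Parse(string text)
    {
        var result = new List<(string Label, string Value)>();
        var segments = text.Split(';', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries);
        foreach (var segment in segments)
        {
            var colon = segment.IndexOf(':');
            if (colon <= 0) continue; // skip segments without a label
            result.Add((segment[..colon].Trim(), segment[(colon + 1)..].Trim()));
        }
        return result;
    }
}
```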


🔵 Code Quality

catch without exception type (WebSocialCardGenerator.Renderer.cs)

IsFontFamilyAvailable and CreateAvailableMagickFontFamilies still use bare catch {} blocks. The rest of the PR uses catch (Exception). Bare catch can swallow non-Exception-derived objects on older runtimes. Unify to catch (Exception).

SocialCardMetricNormalizer.Trim accessibility mismatch

The method is public on an internal static class. This has no external effect but is misleading. Prefer internal static.

"inline" CSS strategy silently maps to "blocking" (WebAssetCssStrategy.cs)

A user setting CssStrategy: "inline" expecting CSS inlining will get blocking behaviour with no error. Either document "inline" as an alias for "blocking" or reject the value with a clear message.

Double Trim() on themeKey (WebSiteBuilder.SocialCardThemes.cs)

themeKey.Trim() is called twice per lookup branch. Assign to a local variable once at the top of the method.


📋 PowerShell (Compare-WebSitemaps.ps1)

  • No timeout on Invoke-WebRequest (lines ~84, ~382): A slow sitemap server will hang the script indefinitely. Recommend adding -TimeoutSec 30 or a $FetchTimeoutSec parameter.
  • Recursive Import-SitemapUrls without depth limit: A malformed sitemap index pointing to itself causes unbounded recursion. Add a $depth guard (max 3 recommended).
  • [System.Uri] $Url cast without try/catch: An invalid URL in the sitemap halts the entire script when $ErrorActionPreference = 'Stop'. Wrap in a try/catch or pre-validate.
  • No Pester tests: The variable-shadowing bug above would be caught immediately by a Pester test that runs with $DiscoverAmpHtml set. Consider adding basic coverage.

ℹ️ Informational

  • data-cfasync="false" added unconditionally (WebSiteBuilder.OutputRendering.cs): This is Cloudflare Rocket Loader-specific. Adding it for all users may cause confusion for non-Cloudflare deployments. Consider making it opt-in via a site config flag.
  • ErrorDocument 404 /404.html auto-injection (WebSiteBuilder.Redirects.cs): This depends on the 404 page existing at build time. A comment noting the heuristic nature of the check would help future maintainers.
  • ContentLength vs stream cap in ReadRemoteImageBytes (WebSocialCardGenerator.Renderer.cs): The Content-Length header check is a fast-path optimisation, but since servers can lie, a comment clarifying that the stream-based 10 MB cap is the authoritative guard would improve readability.

Test Coverage Gaps

| Area | Gap |
|---|---|
| DownloadRewriteAsset | No test exercises SourceUrl + DownloadDependencies + UserAgent HTTP path |
| CSS strategy cascade priority | --css-strategy CLI → spec → registry → default fallback not covered |
| NormalizeApacheRedirectRow edge cases | Language prefix stripping on uppercase/mixed-case paths |
| Compare-WebSitemaps.ps1 | No Pester tests at all; the variable-shadowing bug is invisible without them |

Summary

The core social card rendering improvements and the entity-encoded URL fix are well-implemented and properly tested. The main actionable items are:

  1. Fix the variable shadowing bug in Compare-WebSitemaps.ps1 — it silently breaks result counts.
  2. Document or restrict the SSRF surface for SourceUrl downloads.
  3. Sanitize UserAgent before passing to TryAddWithoutValidation.
  4. Consider whether query-string redirects should bypass the trailing-slash normalisation.

Everything else is low-severity and could be addressed in follow-up PRs.

Fix sitemap AMP discovery variable shadowing, sanitize remote asset download headers, add SourceUrl host allow-listing and private IP rejection, and preserve exact query redirect paths for nginx.

Document the remote asset download trust boundary and add regression coverage for the review cases.

claude Bot commented Apr 24, 2026

Code Review: PR #325 — Refine PowerForge Web Social Cards and Asset Workflows

Overview

This is a substantial PR that adds reusable social card themes, metric chip widgets, smarter asset rewriting with remote downloads, optical text alignment improvements, language filtering, and Apache/Nginx query-string redirect handling. Test coverage for the new features is solid (55+ passing tests cited in the PR description). The implementation is generally well-structured, with good separation of concerns across new files (WebSiteBuilder.SocialCardThemes.cs, SocialCardMetricNormalizer, etc.).


🔴 Critical

1. SVG Injection in Social Card Rendering

Location: WebSocialCardGenerator.Renderer.cs → CreateMagickImage() path for .svg files

SVG assets are loaded via File.ReadAllBytes() and passed directly to ImageMagick without sanitization. SVG files can embed <script> tags, onload handlers, and external entity references (XXE). If user-configurable logo paths ever point to an untrusted or attacker-controlled SVG, this becomes a code execution vector.

if (Path.GetExtension(localPath).Equals(".svg", StringComparison.OrdinalIgnoreCase))
    return CreateMagickImage(File.ReadAllBytes(localPath), localPath, widthHint, heightHint);

Recommendation: Strip <script>, event attributes, and external entity refs before passing to ImageMagick, or restrict SVG sources to a pre-approved allow-list (e.g., only from the project's own asset directory).
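A minimal sketch of the stripping approach recommended above, assuming the SVG is well-formed XML. `SvgSanitizerSketch` and its policy are hypothetical and are not the PR's implementation: the sketch rejects any DOCTYPE, removes `script` elements regardless of namespace prefix, drops `on*` attributes, and keeps `href`/`src` only when they point inside the same document.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper for illustration; names and policy are assumptions.
public static class SvgSanitizerSketch
{
    public static string Sanitize(string svgText)
    {
        var settings = new XmlReaderSettings
        {
            // A DOCTYPE makes the reader throw XmlException, which blocks XXE payloads.
            DtdProcessing = DtdProcessing.Prohibit,
            XmlResolver = null
        };

        XDocument doc;
        using (var reader = XmlReader.Create(new StringReader(svgText), settings))
            doc = XDocument.Load(reader);

        // Match on the local name so <svg:script> is removed as well as <script>.
        var scripts = doc.Descendants()
            .Where(e => e.Name.LocalName.Equals("script", StringComparison.OrdinalIgnoreCase))
            .ToList();
        foreach (var script in scripts)
            script.Remove();

        foreach (var element in doc.Descendants())
        {
            var unsafeAttributes = element.Attributes().Where(IsUnsafe).ToList();
            foreach (var attribute in unsafeAttributes)
                attribute.Remove();
        }

        return doc.ToString(SaveOptions.DisableFormatting);
    }

    private static bool IsUnsafe(XAttribute attribute)
    {
        if (attribute.IsNamespaceDeclaration) return false;

        var name = attribute.Name.LocalName;
        if (name.StartsWith("on", StringComparison.OrdinalIgnoreCase))
            return true; // event handlers such as onload or onclick

        if (name.Equals("href", StringComparison.OrdinalIgnoreCase) ||
            name.Equals("src", StringComparison.OrdinalIgnoreCase))
        {
            // Keep only same-document references such as "#gradient1".
            return !attribute.Value.TrimStart().StartsWith("#", StringComparison.Ordinal);
        }

        return false;
    }
}
```

This policy is deliberately strict: it also drops embedded `data:` images, which a real implementation may want to allow. It does not inspect CSS `url()` values inside `style` attributes or `<style>` elements.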


🟠 Major

2. User-Agent Header Validation is Incomplete

Location: WebAssetOptimizer.cs → NormalizeHeaderSingleLine()

CRLF injection is correctly blocked by taking only the first line, but there are no guards on:

  • Max length — an oversized User-Agent could hit server header limits or downstream issues
  • Non-ASCII / remaining control characters (e.g., \t, \x00-\x1F)

private static string NormalizeHeaderSingleLine(string? value)
{
    var firstLine = value.Split(new[] { '\r', '\n' }, StringSplitOptions.None)[0];
    return firstLine.Trim();
}

Recommendation: Add if (firstLine.Length > 512) firstLine = firstLine[..512]; and strip remaining control characters (Regex.Replace(firstLine, @"[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]", "") or similar).
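One way the recommendation could look, as a sketch rather than the code that was merged. `HeaderSanitizerSketch` is a hypothetical name, and the 512-character cap is the value suggested in this review, not a protocol limit. The sketch keeps only printable ASCII (0x20 to 0x7E), which is stricter than the regex above because it also removes tabs and non-ASCII characters.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration; not the PR's NormalizeHeaderSingleLine.
public static class HeaderSanitizerSketch
{
    private const int MaxLength = 512; // cap suggested in the review

    public static string NormalizeHeaderSingleLine(string? value)
    {
        if (string.IsNullOrWhiteSpace(value)) return string.Empty;

        // Keep only the first line so CR/LF cannot start a new header.
        var firstLine = value.Split(new[] { '\r', '\n' }, StringSplitOptions.None)[0];

        var builder = new StringBuilder(firstLine.Length);
        foreach (var ch in firstLine)
        {
            // Printable ASCII only: drops NUL, tab, DEL and non-ASCII characters.
            if (ch >= 0x20 && ch <= 0x7E) builder.Append(ch);
        }

        var cleaned = builder.ToString().Trim();
        return cleaned.Length > MaxLength ? cleaned[..MaxLength] : cleaned;
    }
}
```

Filtering with a character loop instead of a regex keeps the accepted range explicit and avoids compiling a pattern for a one-line helper.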

3. Remote Image Cache is Unbounded

Location: WebSocialCardGenerator.Renderer.cs → RemoteImageByteCache

Each remote image is individually capped at 10 MB (good), but the ConcurrentDictionary that holds them has no total size limit or eviction policy. A build that processes hundreds of pages with distinct remote images could exhaust process memory.

private static readonly ConcurrentDictionary<string, Lazy<byte[]>> RemoteImageByteCache = new(StringComparer.Ordinal);

Recommendation: Add a total cache size limit (e.g., 100 MB) with LRU eviction, or at minimum bound the cache entry count.
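The byte-budget idea can be sketched as a small LRU cache. `BoundedByteCacheSketch` is a hypothetical type, and the lock-based design is an assumption chosen for brevity rather than a drop-in replacement for the `ConcurrentDictionary` shown above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical LRU cache bounded by total payload bytes; illustration only.
public sealed class BoundedByteCacheSketch
{
    private readonly long _maxBytes;
    private readonly object _gate = new();
    private readonly Dictionary<string, LinkedListNode<(string Key, byte[] Bytes)>> _map =
        new(StringComparer.Ordinal);
    private readonly LinkedList<(string Key, byte[] Bytes)> _order = new(); // most recently used first
    private long _totalBytes;

    public BoundedByteCacheSketch(long maxBytes) => _maxBytes = maxBytes;

    public long TotalBytes { get { lock (_gate) return _totalBytes; } }

    public byte[] GetOrAdd(string key, Func<string, byte[]> load)
    {
        lock (_gate)
        {
            if (_map.TryGetValue(key, out var hit))
            {
                // Move the entry to the front to mark it as recently used.
                _order.Remove(hit);
                _order.AddFirst(hit);
                return hit.Value.Bytes;
            }
        }

        // Load outside the lock; two threads may load the same key concurrently.
        var bytes = load(key);

        lock (_gate)
        {
            if (_map.TryGetValue(key, out var existing)) return existing.Value.Bytes;
            if (bytes.LongLength > _maxBytes) return bytes; // too large to cache at all

            _map[key] = _order.AddFirst((key, bytes));
            _totalBytes += bytes.LongLength;

            // Evict least recently used entries until the budget is met.
            while (_totalBytes > _maxBytes && _order.Last is { } oldest)
            {
                _order.RemoveLast();
                _map.Remove(oldest.Value.Key);
                _totalBytes -= oldest.Value.Bytes.LongLength;
            }

            return bytes;
        }
    }
}
```

Unlike the `Lazy<byte[]>` pattern, this sketch does not guarantee a single load per key under contention; it trades that for a simple, enforceable memory bound.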

4. BOM Handling Inconsistency in Compare-WebSitemaps.ps1

Location: Lines 84–86 (remote fetch) vs. line 116 (local file read)

Remote sitemaps correctly trim UTF-8 BOM:

$content = $content.TrimStart([char] 0xFEFF)

But local XML loaded via Get-Content -Raw does not trim BOM before casting to [xml]. A UTF-8-with-BOM local sitemap will fail to parse.

Recommendation: Apply the same TrimStart([char] 0xFEFF) before [xml] $xml = ... for the local path.

5. Documentation Contradicts Code on Theme Precedence

Location: PowerForge.Web.ContentSpec.md — theme selection section

The docs state that "direct page overrides and collection-specific style/variant/color maps still win over named theme defaults," which reads as if the comparison is between page overrides and the named theme. But the code actually applies: page meta → named theme → global site defaults. The phrase "still win over" implies the theme takes priority over something, but the direction isn't clear and could mislead consumers of this API.

Recommendation: Replace with an explicit precedence list:

Page-level meta.social_card_* overrides → Named theme tokens → Global social.generated_card_* site defaults
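Expressed as code, the precedence list is a first-non-empty lookup. The sketch below assumes a `FirstNonEmpty` helper with a `params` signature; the real helper's signature is not shown in this PR, and `ResolveStyle` is a hypothetical name.

```csharp
using System;

// Hypothetical illustration of page → theme → site precedence.
public static class ThemePrecedenceSketch
{
    // Assumed shape of a FirstNonEmpty-style helper; the real signature may differ.
    public static string? FirstNonEmpty(params string?[] candidates)
    {
        foreach (var candidate in candidates)
        {
            if (!string.IsNullOrWhiteSpace(candidate)) return candidate;
        }
        return null;
    }

    // Earlier arguments win: page override, then named theme token, then site default.
    public static string? ResolveStyle(string? pageOverride, string? themeToken, string? siteDefault)
        => FirstNonEmpty(pageOverride, themeToken, siteDefault);
}
```

For example, `ResolveStyle(null, "dark", "light")` picks the theme token, while `ResolveStyle("hero", "dark", "light")` picks the page override.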


🟡 Minor

6. Title Glyph Inset Cache is Unbounded

Location: WebSocialCardGenerator.Renderer.cs → TitleGlyphInsetCache

The cache key includes font family, size, and glyph, so in practice the upper bound is small (few hundred entries). Still worth noting for long-running processes or highly parameterized builds.

Recommendation: Document the expected max size, or add an entry limit with a comment explaining why overflow is unlikely.

7. Hardcoded Renderer Version Could Cause Stale Cache Hits

Location: WebSocialCardGenerator.Renderer.cs → RendererVersion = "social-card-renderer-v8"

If the rendering algorithm changes in a future commit but this string is not bumped, previously cached social card PNGs will not be regenerated even though they should be.

Recommendation: Consider deriving this from assembly file version or a hash of renderer configuration, so changes automatically invalidate the cache.

8. Missing Test Coverage Areas

The new tests are thorough for the happy paths. A few gaps worth noting for follow-up:

  • SVG asset rewriting via DownloadRewriteAsset() — no test exercises the SVG download path
  • RewriteDownloadedCssDependencies() — complex regex substitution for CSS url() references has limited coverage
  • Apache language redirect host map (new feature, lines ~2432–2503) — no test
  • Nginx query-string redirect matching via BuildNginxRequestUriMatcher() — no test

✅ Positive Observations

  • Host validation for remote downloads is solid: blocks localhost, RFC-1918 ranges (10/8, 172.16/12, 192.168/16), link-local (169.254/16), IPv6 link-local/site-local, and supports wildcard allow-lists.
  • HTML attribute encoding after URL rewrite (HtmlAttributeEncode) correctly prevents XSS in rewritten markup.
  • FirstNonEmpty() helper, WebAssetCssStrategy.Normalize(), and SocialCardMetricNormalizer are clean abstractions that improve readability.
  • Nginx query-string redirect correctly uses $request_uri matching instead of a plain location block, avoiding the need for RewriteCond equivalents in nginx config.
  • Apache QSD flag usage for query-string redirects is correct — matches source query then discards it from the target URL.
  • Theme metric precedence (page → theme → site) is implemented correctly even if the docs description is ambiguous.
  • Set-StrictMode -Version Latest and $ErrorActionPreference = 'Stop' in Compare-WebSitemaps.ps1 are good defensive PowerShell practices.

Summary

| Severity | Count | Items |
| --- | --- | --- |
| 🔴 Critical | 1 | SVG injection |
| 🟠 Major | 4 | User-Agent validation, unbounded remote cache, BOM inconsistency, doc mismatch |
| 🟡 Minor | 3 | Glyph cache, hardcoded renderer version, test gaps |

The SVG sanitization issue and the unbounded remote image cache are the most important items to address before merging, as they have correctness/security implications beyond this PR's scope. The other items are lower-risk but worth tracking.


Review generated with Claude Code


claude Bot commented Apr 24, 2026

Code Review — PR #325: Refine PowerForge Web social cards and asset workflows

Overall this is a well-structured PR that adds meaningful capability (reusable social-card themes, measured text alignment, SVG sanitization, SSRF protection, localization allowlist, nginx query-exact redirects). The test coverage is solid and the validation story in the PR description is thorough. A few things worth addressing before merge:


Security

IsAllowedRewriteSourceUri: empty allowlist = open by default
WebAssetOptimizer.cs — when SourceUrlAllowedHosts is not configured, configuredHosts.Length == 0 returns true, allowing downloads from any non-SSRF remote host. This is a footgun: omitting the allowlist silently grants broad network access. Either:

  • Document this prominently near the property declaration, or
  • Default to deny-all when no allowlist is provided and require explicit ["*"] to mean "any trusted external host".

HTTP scheme allowed for SourceUrl downloads
IsAllowedRewriteSourceUri permits http:// alongside https://. Downloading over plain HTTP during a build can expose assets to MITM-substitution. Unless there's a specific case that requires HTTP, consider restricting to HTTPS only (or at minimum emitting a warning).

Wildcard allowlist boundary check
The wildcard match:

sourceHost.EndsWith(allowedHost[1..], ...) && sourceHost.Length > allowedHost.Length - 1

allowedHost[1..] strips * but keeps the leading ., so *.example.com becomes .example.com (12 characters). evil.example.com → suffix match + length 16 > 12, so it passes. example.com (the exact parent, 11 characters) cannot end with the 12-character suffix .example.com, so EndsWith alone rejects it; the length check additionally rejects a bare .example.com. This looks correct, but a test case covering example.com vs *.example.com would document the intended boundary and remove ambiguity.
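The boundary behaviour can be checked with a small standalone sketch of the same expression. `WildcardHostSketch` is hypothetical, and `OrdinalIgnoreCase` is an assumption because the comparison argument is elided in the quoted line.

```csharp
using System;

// Hypothetical reproduction of the wildcard check quoted above; illustration only.
public static class WildcardHostSketch
{
    public static bool Matches(string sourceHost, string allowedHost)
    {
        if (allowedHost.StartsWith("*.", StringComparison.Ordinal))
        {
            // "*.example.com" -> ".example.com" (the leading dot is kept).
            var suffix = allowedHost[1..];

            // suffix.Length equals allowedHost.Length - 1, as in the quoted expression.
            return sourceHost.EndsWith(suffix, StringComparison.OrdinalIgnoreCase)
                && sourceHost.Length > suffix.Length;
        }

        return string.Equals(sourceHost, allowedHost, StringComparison.OrdinalIgnoreCase);
    }
}
```

Under these assumptions `Matches("cdn.example.com", "*.example.com")` is true, while `Matches("example.com", "*.example.com")` and `Matches("notexample.com", "*.example.com")` are both false because neither ends with `.example.com`. If the parent domain should also be allowed, it needs its own exact entry in the allow-list.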


Code quality

Bare catch (Exception) blocks — silent failure
Multiple sites in WebSocialCardGenerator.Renderer.cs swallow every exception:

catch (Exception)
{
    return DefaultTitleGlyphInset();  // or return 0
}

Swallowing OutOfMemoryException, StackOverflowException, or unexpected InvalidOperationException silently degrades quality without any signal. At minimum log the exception type/message at Debug/Trace level or restrict to the expected exception type (e.g. MagickException, IOException) so structural bugs are surfaced.
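A sketch of the narrower catch, using an exception filter plus a trace message. `GuardedMeasureSketch` is a hypothetical wrapper, and `IOException` and `InvalidOperationException` stand in for whatever the renderer is actually expected to throw (for example `MagickException`, which is left out here to keep the sketch dependency-free).

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical wrapper for illustration; the exception types are assumptions.
public static class GuardedMeasureSketch
{
    public static int MeasureOrDefault(Func<int> measure, int fallback)
    {
        try
        {
            return measure();
        }
        catch (Exception ex) when (ex is IOException or InvalidOperationException)
        {
            // Surface the failure instead of degrading silently.
            Trace.TraceWarning($"Glyph measurement failed ({ex.GetType().Name}): {ex.Message}");
            return fallback;
        }
    }
}
```

Anything outside the filter, such as an `ArgumentException` caused by a programming error, propagates to the caller instead of being converted into the fallback value.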

Unbounded static caches
TitleGlyphInsetCache and CenteredTextOffsetCache in WebSocialCardGenerator.Renderer.cs are process-lifetime ConcurrentDictionary instances that grow without eviction. For a CLI build tool this is acceptable if the process is short-lived; for a long-running service it would be a memory concern. A simple capacity cap or ClearCachesForTesting() call in the existing ClearRemoteImageCache() path would make the design consistent.

Two separate static HttpClient instances
RewriteDownloadClient and SocialImageHttpClient are both static singletons but configured separately. This is fine functionally, but a brief comment on why they're kept separate (different timeout/handler needs) would help future maintainers avoid inadvertent merging.


PowerShell (Build/Compare-WebSitemaps.ps1)

No timeout on Invoke-WebRequest calls
Import-SitemapUrls and Get-AmpHtmlAlternateUrl call Invoke-WebRequest without -TimeoutSec. A slow or unresponsive server will hang the script indefinitely. Recommend -TimeoutSec 30 (or a configurable parameter).

Recursive sitemap index loading without depth cap
Import-SitemapUrls calls itself for nested sitemaps without limiting recursion depth. A malformed or adversarial sitemap index that points to itself would cause a stack overflow. A simple $MaxDepth guard would be sufficient.

Get-SlugVariants hardcodes language codes
The language-suffix strip list (pl|fr|de|es) is hardcoded in the regex. If the script is ever used against sites with additional locales, silent non-matches will produce confusing results. Consider exposing this as a parameter or at least naming the constant.


Tests

Renamed test reflects behavior change — needs a comment
Build_UsesGeneratedCard_ForBlogSocialPreview...Build_UsesFirstMarkdownImage_ForBlogSocialPreview... signals a reversal: body images now take precedence over auto-generated cards when no explicit social image is set. The test is clear, but a short comment in the test (or a changelog entry) explaining the intentional precedence order (explicit meta > first-body-image > auto-generated) would prevent future contributors from reverting it by accident.

Missing: *.example.com vs example.com wildcard boundary test
See Security note above — the existing IsAllowedRewriteSourceUriForTesting tests cover the happy path and the SSRF block but don't explicitly test the parent-domain edge case.


Minor

  • WebSiteBuilder.MergeSocialCardThemeTokens is now internal (called from tests via InternalsVisibleTo?) — if it is genuinely part of the public API surface, consider making it public. If it's test-only, a [TestOnly] comment would help.
  • The Compare-WebSitemaps.ps1 script's 900-line size is acknowledged but could benefit from being split into a module (CompareWebSitemaps.psm1) to ease unit testing of individual helper functions.

Summary

| Area | Status |
| --- | --- |
| Security (SVG sanitization, SSRF) | ✅ Good, test-covered |
| Security (empty allowlist = open) | ⚠️ Worth tightening or documenting |
| Security (HTTP scheme allowed) | ⚠️ Consider HTTPS-only |
| Error handling (bare catches) | ⚠️ Add minimal logging |
| Test coverage | ✅ Good breadth |
| Cache growth (static dicts) | ℹ️ Fine for CLI, note for future |
| PS script robustness (timeouts) | ⚠️ Add -TimeoutSec |
| Behavior change (social image precedence) | ✅ Test covers it, consider a changelog note |

Most items are non-blocking improvements. The security notes (empty allowlist + HTTP scheme) are the ones I'd most like to see addressed or explicitly documented before merge.


claude Bot commented Apr 24, 2026

Code Review: Refine PowerForge Web social cards and asset workflows

This is a large, well-structured PR (4 627 additions across 39 files). The summary and validation section are thorough, which is appreciated. Here is a focused review across the key areas.


Security

Host Allowlist / SSRF — ✅ Strong
IsAllowedRewriteSourceUri() enforces HTTPS-only and explicit allowlist matching (including wildcard subdomains). Private IP ranges (RFC 1918, link-local, loopback, IPv6 site-local/multicast) are blocked. Tests in WebSiteAuditOptimizeBuildTests.Part2.cs cover these restrictions. No SSRF concern.

SVG Sanitization — ✅ Excellent
WebSocialCardGenerator.Renderer.cs removes <script> elements, strips all on* event handlers, blocks href/src with http:, https:, file:, javascript:, and data: schemes, blocks CSS url(), and prohibits DTD (XXE prevention). This covers the main SVG attack surface well.

User-Agent Sanitization — ✅ Good
NormalizeHeaderSingleLine() strips CR/LF and non-printable ASCII, capping at 512 characters. Tested.

URL Encoding Preservation — ✅ Good
HtmlDecode() + HtmlAttributeEncode() round-trip in rewrite logic prevents injection while preserving encoded query parameters. Tested.

Minor: PowerShell Compare-WebSitemaps.ps1 — The script accepts URLs as [string[]] without validating they use HTTPS before fetching. For internal migration tooling the risk is low, but consider a scheme guard at the top of the fetch loop (if (-not $url.StartsWith('https://')) { throw ... }).


Code Quality

Async / Disposal — ✅ Good
SocketsHttpHandler with PooledConnectionLifetime is used consistently in both WebAssetOptimizer.cs and WebSocialCardGenerator.Renderer.cs, and clients are disposed correctly.

Silent download failures (Medium concern)
In WebAssetOptimizer.cs, remote CSS/font download failures are logged only via Trace.TraceWarning(). For a build tool this is defensible (fail-open is safer than a hard abort), but a consuming CI pipeline may not surface these warnings. Consider promoting critical failures (e.g., a referenced font that completely failed to download) to a structured warning or returning a bool flag that the pipeline runner surfaces to stdout/stderr.

Optical text alignment cache key
In WebSocialCardGenerator.Renderer.cs, the glyph-width cache key is $"{fontFamily}|{fontSize}|{firstGlyph}". Caching per first character only is a pragmatic choice (the test validates ≤1 px tolerance), but it is worth a short comment explaining this trade-off so future maintainers understand it is intentional.

CSS dependency download loop is sequential
The regex-replace loop in WebAssetOptimizer.cs that rewrites and downloads referenced CSS imports runs sequentially. For most real-world stylesheets this is fine. If a stylesheet ever references many fonts (50+), this becomes a build bottleneck. A Parallel.ForEachAsync or batched Task.WhenAll with a concurrency cap would keep the door open for faster builds.
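A capped-concurrency download could be sketched as follows, assuming .NET 6 or later for `Parallel.ForEachAsync`. `BoundedDownloadSketch`, the `download` delegate, and the default cap of 4 are all hypothetical; the delegate stands in for the real HTTP call so the sketch stays self-contained.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper for illustration; not part of WebAssetOptimizer.
public static class BoundedDownloadSketch
{
    public static async Task<IReadOnlyDictionary<string, byte[]>> DownloadAllAsync(
        IEnumerable<string> urls,
        Func<string, CancellationToken, Task<byte[]>> download,
        int maxConcurrency = 4,
        CancellationToken cancellationToken = default)
    {
        var results = new ConcurrentDictionary<string, byte[]>(StringComparer.Ordinal);
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = maxConcurrency, // at most this many downloads in flight
            CancellationToken = cancellationToken
        };

        await Parallel.ForEachAsync(urls, options, async (url, token) =>
        {
            results[url] = await download(url, token);
        });

        return results;
    }
}
```

The rewrite step itself can stay sequential: collect the distinct URLs first, download them with the cap, then apply the regex substitution using the completed results.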


Test Coverage

The security-critical paths have solid coverage:

  • SVG sanitization (scripts, event handlers, external references, DOCTYPE)
  • Host allowlist (private IPs, non-HTTPS, wildcard matching)
  • User-Agent injection
  • URL encoding round-trip

Gaps worth addressing (not blockers, but worth a follow-up issue):

  1. Polyglot / namespace-prefixed SVG: svg:script or <script xmlns="..."> variants are not explicitly tested. Most parsers handle this, but an adversarial polyglot (valid PNG bytes inside a valid SVG wrapper) has not been exercised.
  2. Wildcard redirects with query strings — New query-string preservation logic (TryGetRedirectSourceQuery()) has a basic test but no test for wildcards + multi-param query strings with special characters.
  3. Concurrent cache eviction: MaxRemoteImageCacheEntries = 128 / MaxRemoteImageCacheBytes = 100 MB bounds look correct, but there is no test for concurrent eviction under contention. The ConcurrentDictionary pattern is generally safe, but a quick multithreaded test would increase confidence.

Performance

Lazy font enumeration — ✅ Good
Lazy<T> for AvailableMagickFontFamilies avoids repeated system scans.

StringBuilder usage — ✅ Good
SVG generation uses StringBuilder throughout; no hot-path string concatenation issues.

Renderer cache invalidation — ✅ Good
The PR description notes cache invalidation was added; tests confirm this.


Documentation

The new docs (PowerForge.Web.Assets.md, PowerForge.Web.ContentSpec.md, PowerForge.Web.SeoParityPlan.md, PowerForge.Web.WebsiteStarter.md) are thorough and clearly explain the trust boundary for remote asset rewrites, the theme token system, and preset guidance. This is a genuine improvement to the project's maintainability.


Summary

| Area | Status |
| --- | --- |
| SSRF / Host allowlist | ✅ Secure |
| SVG / XSS sanitization | ✅ Excellent |
| HTTPS enforcement | ✅ Secure |
| Header injection | ✅ Secure |
| URL encoding | ✅ Secure |
| Resource / disposal | ✅ Good |
| Silent failure transparency | ⚠️ Medium (logged only) |
| Glyph cache key documentation | ⚠️ Minor |
| Sequential CSS download loop | ⚠️ Minor (scalability) |
| Test gaps (polyglot SVG, concurrent eviction) | ⚠️ Low (follow-up) |
| Documentation | ✅ Excellent |

Verdict: Approve with minor notes. The security posture is strong and the validation section shows the key test suites pass. The items flagged above are not blockers — the most actionable one is adding a short comment on the glyph-cache key and promoting critical download failures to a surfaced warning rather than a silent trace. The polyglot-SVG and concurrent-eviction test gaps are worth filing as follow-up issues.


Review generated with Claude Code

@PrzemyslawKlys PrzemyslawKlys merged commit cc68fcf into main Apr 24, 2026
6 checks passed
@PrzemyslawKlys PrzemyslawKlys deleted the wip/pspublishmodule-agent-readiness-20260422 branch April 24, 2026 21:32