Add website contribution import support#333
Conversation
6afaa76 to
19a06cd
Compare
Code Review — PR #333: Add website contribution import supportOverviewThis PR adds a Bugs / Correctness1.
Fix: extract the front matter block, apply the regex only to it, then reconstruct. 2. The function appends Fix: strip (or replace) the existing fields before inserting the canonical ones. 3. Import leaves the target in a partial/dirty state on failure Directory.CreateDirectory(targetContentRoot);
Directory.CreateDirectory(targetAssetRoot);
foreach (var asset in post.Assets)
{
if (CopyFile(sourceAsset, targetAsset, options.Force, errors))
result.CopiedAssetCount++;
}
if (errors.Count > 0)
continue; // directories already created, some assets may be copiedWhen a mid-import error occurs (e.g. an asset copy fails) the destination directories have already been created and some assets have been written. There is no rollback, and a subsequent run without 4. normalized = Regex.Replace(normalized, @"[^a-z0-9]+", "-", RegexOptions.CultureInvariant, RegexTimeout).Trim('-');This creates a new 5.
Security6. if (!candidate.StartsWith(rootPrefix, StringComparison.OrdinalIgnoreCase))
return false;On Linux, 7. YAML deserialization is vulnerable to a billion-laughs / alias explosion new DeserializerBuilder().Build().Deserialize<Dictionary<string, object?>>(File.ReadAllText(path));YamlDotNet's default deserializer resolves YAML anchors and aliases, which means a malicious Performance8.
9. var map = new DeserializerBuilder().Build().Deserialize<...>(File.ReadAllText(path));
Code Quality10. Redundant action check after early defaulting var action = subArgs.Length > 0 && !subArgs[0].StartsWith("--", StringComparison.Ordinal)
? subArgs[0].Trim().ToLowerInvariant()
: "validate";
var effectiveArgs = action == "validate" || action == "import" ? subArgs.Skip(1).ToArray() : subArgs;
if (action is not ("validate" or "import"))
return Fail(...);
11. No test coverage for the new processor
12. Minor: the default Summary
The highest-priority items to address before merge are #2 (duplicate front matter fields on import), #3 (partial import on failure), #1 (body rewrite for
|
Code Review — PR #333: Add website contribution import supportOverviewThis PR introduces a The implementation is generally well-structured and defensive (path traversal guards, regex timeouts, asset size limits, asset type allowlists). The following issues range from bugs to performance nits. 🐛 Bugs1. In var rewritten = FrontMatterImageRegex.Replace(markdown, match => { ... });The regex ( Fix: Extract and rewrite only the front matter section, then concatenate with the body unchanged. A simple split on the closing 2. foreach (var asset in post.Assets)
{
if (CopyFile(...)) result.CopiedAssetCount++;
}
if (errors.Count > 0)
continue; // skip writing the markdown
Fix: Track per-post errors separately, or save the pre-loop error count and compare against it after the asset copies for the current post. ⚡ Performance3. foreach (var path in Directory.GetFiles(...))
{
var map = new DeserializerBuilder().Build().Deserialize<...>(File.ReadAllText(path));
...
}
4. Non-compiled inline normalized = Regex.Replace(normalized, @"[^a-z0-9]+", "-", RegexOptions.CultureInvariant, RegexTimeout).Trim('-');Every call allocates a new 🔍 Minor / Suggestions5. In catch (Exception ex)
{
return Fail(ex.Message, outputJson, logger, "web.contributions");
}
6. Duplicate author metadata on re-import with
When 7. No test coverage 1,107 lines of new validation/import logic — slug normalization, path traversal guards, front matter rewriting, YAML deserialization, asset size enforcement — are added with no corresponding tests. The verification section lists manual dotnet CLI commands, which is a reasonable smoke-test, but the edge cases (malformed front matter, missing authors, path traversal attempts in asset names, mixed line endings) are the kind that unit tests catch cheaply and reliably. 8. if (current is IReadOnlyDictionary<string, object?> ro && ro.TryGetValue(part, out var next))YamlDotNet deserializes nested maps to Summary
The path-traversal prevention in |
|
Addressed the contribution importer review feedback in the latest push:
Local checks passed:
|
Code Review — PR #333: Add website contribution import supportOverviewThis PR adds a new The feature is well-structured and security-conscious overall. Below are findings organized by severity. Bugs / Correctness Issues1. X/Twitter handle validation rejects underscores (false negatives)
// WebContributionProcessor.cs — IsEmptyOrValidSocialValue
if (!trimmed.Contains("://", StringComparison.Ordinal))
return SlugRegex.IsMatch(trimmed.ToLowerInvariant()); // rejects "john_doe"Consider a separate handle-specific regex: 2. Empty asset directory always created for posts with no assets In Directory.CreateDirectory(targetContentRoot);
Directory.CreateDirectory(targetAssetRoot); // created even when assetCopies is emptyGuard it: Behavioral / Regression Risk3. In // Before:
var imageOverride = ReadMetaString(item.Meta, "article.image", "news.image", "schema.image", "social_image");
// After:
var imageOverride = ReadMetaString(item.Meta, "article.image", "news.image", "schema.image", "social_image", "image");Articles that already have an 4. Silent year fallback in When a post's date is missing, null, or outside 2000–2100, the asset year silently defaults to private static int ResolveYear(DateTime? date)
{
if (date is { Year: >= 2000 and <= 2100 } value)
return value.Year;
return DateTime.UtcNow.Year; // silent fallback
}Emit a warning to the Test Coverage5. Very thin test suite for a ~900-line processor Only two tests exist (
The path-traversal case is worth a targeted test given the explicit mitigation in Minor / Style6. The method's 7. Missing namespace in test file
8. A new YamlDotNet deserializer is built inside 9. Repeated
What's Done Well
|
|
Followed up on the second review pass too:
I left the Local checks passed again:
|
Code Review: Add website contribution import supportOverall: This is a solid, well-structured addition. The validation pipeline, path-traversal guards, and regex-timeout discipline are all good. A few items worth addressing before merge. OverviewAdds a
Potential Bugs1. In if (string.IsNullOrWhiteSpace(line) || char.IsWhiteSpace(line[0]) || line.TrimStart().StartsWith("-", ...))
continue;In practice YAML front matter rarely has blank lines between keys, but it is technically valid YAML. At minimum this deserves a comment explaining the assumption. 2. private static int ResolveYear(DateTime? date)
{
if (date is { Year: >= 2000 and <= 2100 } value)
return value.Year;
return DateTime.UtcNow.Year; // non-deterministic
}A post with 3. Duplicate error messages possible in The Code Quality4. Per
5. object? current = meta;
foreach (var part in key.Split('.', ...)) // yields zero parts for ""
{ ... }
value = current; // = meta itself
return true;Only called with hardcoded string constants today, but worth guarding with an early return or a check that parts is non-empty. 6. The Security7. Path-traversal protection is correct and tested — good
8. YamlDotNet default deserializer
Test CoverageThe four existing tests are well-written. Missing coverage worth adding:
Minor Nits
SummaryThe core logic is solid and the security-sensitive pieces (path traversal, URL validation, regex timeouts) are handled correctly. The main asks before merge:
Everything else is low-priority polish. Nice work overall. |
|
Final small cleanup pass from the latest review:
I am intentionally not splitting Local checks passed again:
|
Code Review — PR #333: Add website contribution import supportOverviewThis PR adds a full contribution pipeline for Strengths
Issues and Suggestions1.
|
| Scenario | Covered |
|---|---|
| Front matter + body image rewriting | ✅ |
| Fenced code block preservation | ✅ |
Partial import (existing target, --force) |
✅ |
Draft preservation (Publish=false) |
✅ |
| X handle with underscore | ✅ |
| Path traversal in markdown image | ✅ |
| Asset size limits | ❌ |
| Invalid / out-of-range date | ❌ |
| Author profile validation failures | ❌ |
Validate-only (no Import) with success path |
❌ |
Summary
The core design is solid. The two issues most worth addressing before merge are #1 (the "image" key broadening the fallback chain in structured data) and #4 (silent fallback for out-of-range dates). The rest are improvements rather than blockers.
|
Addressed the two concrete items from the newest review:
Local checks passed:
|
Code Review: PR #333 — Add website contribution import supportOverviewThis PR adds a full validate/import pipeline for external contributor post bundles (
The overall design is solid and security-conscious. A few things worth addressing before merge: Line-Count Limit
Potential Bugs
Dead branch in if (raw is string[] stringArray) // YamlDotNet never produces string[] hereYamlDotNet's deserializer returns RSS namespace declared unconditionally Security
Missing CLI Exposure
Test Coverage GapsThe existing tests are good. The following scenarios are not covered:
Minor Observations
SummaryThe feature is well-designed, security-aware, and meaningfully tested. The main actionable items before merge are:
Good work overall — the traversal guards and regex timeout discipline in particular are exactly the right instinct for a code path that processes externally-submitted content. |
|
One last small cleanup from the latest review pass:
I checked the line-count script too. It reports many existing files over 800 lines, so I am leaving the Local checks passed:
|
Code Review — PR #333: Add website contribution import supportOverviewThis PR introduces a complete contribution pipeline for the PowerForge website: a Security — Solid Foundations
Potential Issues1. The catalog write uses reflection-based serialization: var json = JsonSerializer.Serialize(catalog, new JsonSerializerOptions { ... });The rest of the CLI layer uses the source-generated 2. private static bool IsFrontMatterKeyLine(string line, ISet<string> keys)
{
var trimmed = line.TrimStart(); // ← strips leading whitespace first
var colon = trimmed.IndexOf(':');
return colon > 0 && keys.Contains(trimmed[..colon].Trim());
}Because the method calls 3. When 4. The method casts nested objects to Minor Observations
FencedCodeBlockRegex.Replace(markdown, match => new string('\n', match.Value.Count(static ch => ch == '\n')));For documents with large fenced blocks this is O(n) per block. Not a real concern for blog posts, but
Dual alias registration "contributions" => HandleContributions(...),
"contribution" => HandleContributions(...),Intentional and appreciated — the singular alias is a common mistake and this is a clean way to handle it. Test Coverage
The missing tests are for the feed/meta output changes and author validation rules — they are lower-risk since the structured-data profile test does cover the safe-image filtering path, but adding them would improve confidence in the author URL resolution and feed author injection. SummaryThe core contribution processor is well-designed, security-conscious, and comes with a solid set of behavioral tests. The three items above worth addressing before merge are:
Everything else is clean and consistent with the existing codebase conventions. 🤖 Reviewed with Claude Code |
Code Review — PR #333: Add website contribution import supportOverviewThis PR introduces a complete contribution pipeline for the PowerForge website: contributors can submit post bundles (Markdown + assets + author YAML) to a separate repo, and this tooling validates the structure and optionally imports it into the main website repo. The feature is well-scoped, security-conscious, and ships with a solid test suite. A few issues are worth addressing before merge. Positive Highlights
Issues1. Test helper hardcodes the author slug (bug)
File.WriteAllText(Path.Combine(authorsRoot, slug + ".yml"),
$$"""
name: {{name}}
slug: jane-doe // ← always "jane-doe" regardless of the `slug` parameter
linkedin: {{linkedin}}
x: {{x}}
""");The 2.
|
|
Code Review: PR 333 - Website Contribution Import Support OVERVIEW This PR adds a contributions validate/import pipeline for public Website.Contributions bundles. New components: WebContributionProcessor (validates post bundles, imports into website repo), CLI routing for contributions/contribution sub-commands, and author metadata emission into JSON-LD, OpenGraph, RSS, Atom, and JSON Feed. The design is sound and test coverage is solid. A few items need attention before merge. ISSUES == 1. FILE TOO LONG: WebContributionProcessor.cs (938 lines) == AGENTS.md documents a hard 800-line limit (node Build/linecount.js . 800). At 938 lines this file already fails that gate. Suggested split:
== 2. DraftRegex matches indented nested draft: true == Pattern is (?m)^\sdraft\s:\strue\s$. The leading \s* matches an indented 'draft: true' inside a nested YAML block (e.g. inside a metadata: sub-object). Should use (?m)^draft\s*:\strue\s$. The validation code itself uses ^ without \s* for other key patterns. == 3. EscapeYamlScalar is incomplete == Only backslash and double-quote are escaped. If an author Name or Title contains a tab, newline, or other YAML control character the generated front matter will be malformed. At minimum escape newline and carriage-return. == 4. Import: catalog File.WriteAllText can throw unhandled exception == The catalog write is not wrapped in a try/catch. On permission error or disk full, the exception bypasses the errors list and the try/catch in HandleContributions. Wrap in try/catch that appends to errors. OBSERVATIONS
WHAT IS WORKING WELL
SUMMARY - before merging:
|
Code Review — PR #333: Add website contribution import supportOverviewThis PR introduces a full contribution pipeline for the PowerForge website engine: contributors submit post bundles (Markdown + assets) to a sibling repo, and Correctness Concerns1. Front matter
The import path should either fail-fast when 2. Atom feed entries without The Atom spec (RFC 4287 §4.1.2) requires that either the feed has a 3. The line-by-line removal logic skips lines that start with whitespace or 4. YamlDotNet deserializer permits YAML anchors/aliases
Minor Issues5. File line budget is nearly reached Per AGENTS.md, the project enforces an 800-line limit ( 6. Error deduplication with result.Errors = errors.Distinct(StringComparer.OrdinalIgnoreCase).ToArray();Two different posts with the same validation error message (e.g., two posts both missing 7. In 8. private static int ResolveYear(DateTime? date)
{
if (date is { Year: >= 2000 and <= 2100 } value)
return value.Year;
// Missing or invalid dates are validation errors; keep result paths deterministic until import is blocked.
return 2000;
}The comment is correct that validation will catch this upstream. However, if Missing Test CoverageThe following scenarios have no corresponding unit tests:
The existing What's Good
SummaryThe core logic is sound and the security-critical path traversal checks are well-done. The main items worth addressing before merge are the Atom feed validity gap (#2), the |
Summary
Verification