Skip to content

Implement private gallery indexing engine#392

Merged
PrzemyslawKlys merged 9 commits into
mainfrom
codex/private-gallery-engine
May 23, 2026
Merged

Implement private gallery indexing engine#392
PrzemyslawKlys merged 9 commits into
mainfrom
codex/private-gallery-engine

Conversation

@PrzemyslawKlys
Copy link
Copy Markdown
Member

Summary

  • add core private gallery models plus Azure Artifacts inventory, metrics, NuGet V3 download, and static nupkg inspection services
  • add PowerForge.Web JSON generation for feed, per-module, per-version, and search data without creating website pages
  • wire a private-gallery-index pipeline task and schema/docs coverage

Validation

  • dotnet test .\PowerForge.Tests\PowerForge.Tests.csproj -c Release --no-restore --filter "FullyQualifiedName~PrivateGalleryEngineTests"
  • dotnet build .\PowerForge\PowerForge.csproj -c Release -f net472 --no-restore
  • dotnet build .\PowerForge.Web.Cli\PowerForge.Web.Cli.csproj -c Release -f net10.0 --no-restore
  • git diff --cached --check

Stacking

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dd00e7dc96

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

if (options is null) throw new ArgumentNullException(nameof(options));

using var http = PrivateGalleryHttp.CreateClient(options.RequestTimeoutSeconds, _httpHandler);
var packageBase = await ResolvePackageBaseAddressAsync(http, serviceIndexUrl, options, cancellationToken).ConfigureAwait(false);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Cache package base address lookups across downloads

DownloadPackageAsync resolves the NuGet V3 service index on every package download, but PrivateGalleryIndexer calls this method once per package version; for feeds with many versions this creates a repeated extra HTTP request per download (service index + package), which significantly slows indexing and increases throttling/timeout risk in CI. Resolve the package base URL once per feed (or cache by serviceIndexUrl) before iterating versions.

Useful? React with 👍 / 👎.

Comment on lines +252 to +255
else if (lowerPath.Contains("/docs/", StringComparison.Ordinal) && IsTextDocument(lowerName))
kind = "docs";
else if (lowerPath.Contains("/examples/", StringComparison.Ordinal) && IsExampleDocument(lowerName))
kind = "example";
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Detect docs/examples at package root paths

The document classifier only matches "/docs/" and "/examples/" substrings, so common root-level paths like docs/GettingStarted.md or examples/Run.ps1 are never tagged as docs/example. In those packages the generated metadata and search.json silently miss documentation/example entries, reducing discoverability and producing incomplete quality counts.

Useful? React with 👍 / 👎.

Comment on lines +196 to +200
"protocolType=NuGet",
"includeUrls=true",
"includeAllVersions=" + includeAllVersions.ToString(CultureInfo.InvariantCulture).ToLowerInvariant(),
"includeDescription=" + includeDescription.ToString(CultureInfo.InvariantCulture).ToLowerInvariant(),
"$top=" + top.ToString(CultureInfo.InvariantCulture),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Include prerelease packages in inventory query

The Azure Artifacts packages request omits the isRelease query parameter; Microsoft’s REST API defaults isRelease to true, which filters results to non-prerelease versions when includeAllVersions is used. This means feeds that publish prerelease-only modules (or rely on prerelease history) are indexed incompletely even though the task advertises full private-gallery inventory.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0761dd8461

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +111 to +112
if (packages.Count >= maxPackages)
break;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Run prerelease pass even when package cap is reached

When IncludeAllVersions is enabled, the client correctly plans two queries (isRelease=true then isRelease=false), but this early break exits before the prerelease pass whenever the first pass reaches MaxPackages (500 by default). In feeds with many packages, that means each selected package only keeps release versions, so the output silently violates the "all versions" contract and misses prerelease history. Azure’s isRelease default is true (release-only), so the second pass is required even after the package cap is met; only adding new package IDs should be capped, not merging additional versions for already selected packages.

Useful? React with 👍 / 👎.

};
using var stream = entry.Open();
using var reader = XmlReader.Create(stream, settings);
var document = XDocument.Load(reader, LoadOptions.None);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Enforce size limit before loading help XML

The inspector caps manifest text reads to MaxTextBytes, but ApplyHelpXml parses every -help.xml entry with XDocument.Load without any entry-length check. A package can include a very large help XML file (still within the 250 MB package cap), which can drive high memory/CPU usage and stall or crash indexing when includePackageContent is enabled. Add a size guard (or streaming parse limits) before loading help XML content.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f11bc6d8a9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +55 to +56
var fetchedForFilter = 0;
while (fetchedForFilter < maxPackages)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Enforce MaxPackages across all release-filter passes

The package cap is applied per isRelease pass because fetchedForFilter is reset for each filter, so with IncludeAllVersions=true this can return more than MaxPackages package IDs (e.g., release packages in the first pass plus prerelease-only packages in the second). That violates the MaxPackages contract and can materially increase runtime/output size on large feeds.

Useful? React with 👍 / 👎.

Comment on lines +105 to +106
if (fetched < take)
break;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Paginate using API page size, not parsed item count

Pagination currently stops when fetched < take, but fetched only counts successfully parsed packages. If Azure returns any malformed item (already handled by ParsePackage returning null), fetched drops below take and the loop breaks early, truncating the remaining pages. This means one bad row can silently prevent indexing many valid later packages.

Useful? React with 👍 / 👎.

if (fetched < take)
break;

skip += fetched;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Increment $skip by returned rows, not parsed rows

The next page offset is advanced by fetched, which excludes rows that failed parsing. Azure's $skip is based on rows in the response, so after any malformed entry the client requests an overlapping page ($skip too small), causing duplicates and potentially preventing progress to later packages under MaxPackages limits. Use the response row count (value length) to advance pagination even when some rows are discarded.

Useful? React with 👍 / 👎.

Comment on lines +3335 to +3340
"anyOf": [
{ "required": ["out"] },
{ "required": ["output"] },
{ "required": ["outputDirectory"] },
{ "required": ["output-directory"] }
]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep schema output optional to match runtime default

The runner intentionally supports omitting out/output* by defaulting private gallery output to ./data/private-gallery, but the pipeline schema now requires one of those properties. This creates false validation failures in schema-aware tooling/CI for configurations that the runtime accepts, making documented default behavior unusable in strict schema validation flows.

Useful? React with 👍 / 👎.

Base automatically changed from codex/private-gallery-web-engine-plan to main May 23, 2026 15:48
…-engine-merge

# Conflicts:
#	Docs/PowerForge.Web.PrivateGallery.md
@PrzemyslawKlys PrzemyslawKlys merged commit feaee91 into main May 23, 2026
6 checks passed
@PrzemyslawKlys PrzemyslawKlys deleted the codex/private-gallery-engine branch May 23, 2026 15:58
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7f16d40109

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".


skip += returnedRows;

if (packages.Count >= maxPackages && fetched == 0)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Keep paging prerelease results after hitting MaxPackages

With includeAllVersions=true, the second (isRelease=false) pass can still contain prerelease versions for already-selected packages, but the loop currently stops at packages.Count >= maxPackages && fetched == 0. Because fetched only increments when a row merges into an existing package or adds a new one, any prerelease page that contains only out-of-cap package IDs causes an early break, even though later pages may still contain prerelease rows for selected packages. Fresh evidence: the new guard at this line still short-circuits pagination based on a single page’s composition, so capped runs can silently miss prerelease history.

Useful? React with 👍 / 👎.

Comment on lines +55 to +56
Directory.CreateDirectory(moduleDirectory);
foreach (var package in result.Document.Packages)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Prune obsolete module JSON when regenerating gallery output

The generator writes current package/version files under modules/ but never removes files from packages or versions that disappeared since the previous run. Re-running against a changed feed therefore leaves stale JSON artifacts on disk that no longer match feed.json/search.json, which can leak outdated package data to consumers that read or serve the per-module/per-version files directly. Cleaning or reconciling the modules subtree before writing would keep the output contract consistent.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant