Replace SemaphoreSlim with Parallel.ForEachAsync, rename MaxConcurrentPageFetches #47

Merged
donaldgray merged 1 commit into main from feature/concurrency on May 26, 2026
@donaldgray
Summary

  • Replaces the manual SemaphoreSlim + Task.WhenAll concurrency pattern in TextBuildJob.ProcessPagesAsync with Parallel.ForEachAsync
  • Consolidates three per-format fetch methods (FetchXmlWithSemaphoreAsync, FetchVttWithSemaphoreAsync, FetchAnnotationPageWithSemaphoreAsync) into a single FetchPageAsync
  • Renames MaxConcurrentAltoFetches → MaxConcurrentPageFetches since the limit applies to all text formats (ALTO, VTT, AnnotationPage), not just ALTO
  • Adds a Builder API configuration reference table to README.md
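
A rough sketch of what consolidating the per-format methods might look like once the semaphore no longer has to be threaded through each one. The `TextFormat` enum, the `download` delegate, and the placeholder parsers are all hypothetical names introduced for illustration; the PR's actual `FetchPageAsync` signature is not shown on this page.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical enum covering the three text formats named in the summary.
public enum TextFormat { Alto, Vtt, AnnotationPage }

public static class FetchPageSketch
{
    // One fetch method for every format: download once, then pick the parser.
    // `download` is an assumed stand-in for whatever HTTP client the job uses.
    public static async Task<string> FetchPageAsync(
        Uri source,
        TextFormat format,
        Func<Uri, CancellationToken, Task<string>> download,
        CancellationToken ct)
    {
        var body = await download(source, ct);
        return format switch
        {
            TextFormat.Alto => ParseAlto(body),
            TextFormat.Vtt => ParseVtt(body),
            TextFormat.AnnotationPage => ParseAnnotationPage(body),
            _ => throw new ArgumentOutOfRangeException(
                nameof(format), format, "Unsupported text format"),
        };
    }

    // Placeholder parsers: they only tag the trimmed body so the dispatch is visible.
    private static string ParseAlto(string xml) => $"[ALTO] {xml.Trim()}";
    private static string ParseVtt(string vtt) => $"[VTT] {vtt.Trim()}";
    private static string ParseAnnotationPage(string json) => $"[AnnotationPage] {json.Trim()}";
}
```

Because the concurrency limit is enforced by the caller, the method needs no knowledge of how many fetches are in flight.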

Why Parallel.ForEachAsync over SemaphoreSlim

Less boilerplate: the concurrency limit is a single ParallelOptions property rather than a SemaphoreSlim that must be created, passed into every method, and released in a finally block.

Lazy task scheduling: Parallel.ForEachAsync only starts a new task when a slot is free, so it never has more than MaxDegreeOfParallelism tasks in flight at once. The previous approach created a Task object for every page upfront (all queued on the semaphore), which wastes memory on large manifests.

Clearer intent: "process this collection with bounded parallelism" is exactly what Parallel.ForEachAsync was designed to express. A semaphore is a lower-level primitive that requires the reader to mentally reconstruct that intent.
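
To make the comparison concrete, here is a minimal side-by-side sketch of the two patterns. It is not the code from TextBuildJob: `FetchAsync`, the integer page list, and the class name are assumptions made for illustration, and `Parallel.ForEachAsync` requires .NET 6 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedFetchSketch
{
    // Hypothetical stand-in for a real page fetch.
    private static async Task<string> FetchAsync(int page, CancellationToken ct)
    {
        await Task.Delay(10, ct);
        return $"page-{page}";
    }

    // Before: a Task is created for every page immediately; the semaphore
    // only gates how many of them run the fetch at the same time.
    public static async Task<string[]> WithSemaphoreAsync(
        IReadOnlyList<int> pages, int maxConcurrent, CancellationToken ct)
    {
        using var gate = new SemaphoreSlim(maxConcurrent);
        var tasks = pages.Select(async page =>
        {
            await gate.WaitAsync(ct);
            try { return await FetchAsync(page, ct); }
            finally { gate.Release(); }
        });
        return await Task.WhenAll(tasks);
    }

    // After: the limit lives in ParallelOptions and work items are pulled
    // from the source lazily, as slots become free.
    public static async Task<string[]> WithParallelForEachAsync(
        IReadOnlyList<int> pages, int maxConcurrent, CancellationToken ct)
    {
        var results = new string[pages.Count];
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = maxConcurrent,
            CancellationToken = ct,
        };
        await Parallel.ForEachAsync(
            Enumerable.Range(0, pages.Count), options,
            async (i, token) => { results[i] = await FetchAsync(pages[i], token); });
        return results;
    }
}
```

Both methods return results in input order: Task.WhenAll preserves task order, and the second version writes each result to its own index.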

What is unchanged

  • MaxConcurrentPageFetches default remains 8; behaviour is identical
  • Per-page errors are still caught inside the delegate and accumulated without aborting the job
  • Cancellation still propagates via ShutdownToken
  • Results are written into a pre-allocated ordered array so TextBuilder still receives canvases in original canvas sequence
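
The guarantees listed above can be illustrated with a short sketch, assuming a hypothetical `fetch` delegate and string canvas identifiers; the real job's types and error model are not shown here. Each iteration writes to its own array slot, per-page failures are collected in a thread-safe queue, and cancellation is left to propagate.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class OrderedResultsSketch
{
    public static async Task<(string[] Results, IReadOnlyCollection<string> Errors)> ProcessAsync(
        IReadOnlyList<string> canvasIds,
        Func<string, CancellationToken, Task<string>> fetch, // hypothetical fetch delegate
        int maxConcurrentPageFetches,
        CancellationToken shutdownToken)
    {
        // Pre-allocated array: slot i always belongs to canvas i, so completion
        // order does not affect output order. Failed slots stay null.
        var results = new string[canvasIds.Count];
        var errors = new ConcurrentQueue<string>();
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = maxConcurrentPageFetches,
            CancellationToken = shutdownToken,
        };

        await Parallel.ForEachAsync(
            Enumerable.Range(0, canvasIds.Count), options, async (i, ct) =>
            {
                try
                {
                    results[i] = await fetch(canvasIds[i], ct);
                }
                catch (Exception ex) when (ex is not OperationCanceledException)
                {
                    // A failed page is recorded; the remaining pages still run.
                    errors.Enqueue($"{canvasIds[i]}: {ex.Message}");
                }
            });

        return (results, errors);
    }
}
```

Catching inside the delegate matters because an exception that escapes a Parallel.ForEachAsync body stops further iterations from being scheduled and faults the returned task.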

🤖 Generated with Claude Code

Base automatically changed from feature/fulfilled_capabilities to main May 26, 2026 16:00
Replaces the manual SemaphoreSlim + Task.WhenAll fan-out with
Parallel.ForEachAsync, consolidating three per-format fetch methods
into a single FetchPageAsync. Results are written to a pre-allocated
array to preserve canvas order for TextBuilder.

Renames MaxConcurrentAltoFetches → MaxConcurrentPageFetches since the
limit applies to all text formats (ALTO, VTT, AnnotationPage), not
just ALTO. Updates docs, decisions log, and adds a Builder API
configuration reference table to the README.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@donaldgray donaldgray force-pushed the feature/concurrency branch from 9ef8d8b to 9c0c4dc May 26, 2026 16:00
@donaldgray donaldgray merged commit f57bb74 into main May 26, 2026
4 checks passed
@donaldgray donaldgray deleted the feature/concurrency branch May 26, 2026 16:04