Skip to content

Auto-chunking conveniences for batch operations (#307)#389

Merged
leogdion merged 3 commits into
v1.0.0-beta.2from
307-auto-paginating-extension
May 26, 2026
Merged

Auto-chunking conveniences for batch operations (#307)#389
leogdion merged 3 commits into
v1.0.0-beta.2from
307-auto-paginating-extension

Conversation

@leogdion
Copy link
Copy Markdown
Member

Closes #307 (partial — see scope note).

Summary

Applies the page-primitive + auto-paginating-extension pattern (established by queryRecords/queryAllRecords) to the remaining non-deprecated, 200-item-capped batch operations. Most of #307 was already done on this milestone (fetchAllRecordChanges, fetchAllZoneChanges); this PR closes the gap for the batch-chunking candidates.

New public API

Primitive (single request) Auto-chunking convenience
lookupRecords(recordNames:desiredKeys:database:) lookupAllRecords(recordNames:desiredKeys:database:batchSize:)
discoverUserIdentities(lookupInfos:) discoverAllUserIdentities(lookupInfos:batchSize:)

Both delegate to a shared internal chunkedBatches engine: splits the input into ≤batchSize batches (clamped to 1...maxRecordsPerRequest), calls the primitive per batch, concatenates results in input order. Chunk count is ceil(n/batchSize) — deterministic and finite — so there is no maxPages-style throwing ceiling; batchSize (default maxRecordsPerRequest) is the only knob. CloudKitService.maxRecordsPerRequest is now public.

Scope decisions

  • users/lookup/email and users/lookup/id get no chunking convenience. Both are deprecated by Apple in favor of POST users/discover (verified against Apple's archived CloudKit Web Services reference). Adding new public API on deprecated endpoints would be wrong; callers needing >200 should use discoverAllUserIdentities(lookupInfos:). The existing primitives are left as-is (no deprecation annotations) pending public feedback.
  • listZones is not a pagination candidatezones/list (GET) returns every zone in one response, no continuation marker.
  • modifyRecords/sync<T> already chunk by 200; fetchAllRecordChanges/fetchAllZoneChanges already implement the pattern.

Tests

ResponseProvider now records request count + bodies, so tests assert exact batch boundaries, input order, accumulation, batchSize clamping (low/high), and empty input. 9 new chunking tests; full suite (529) passes. ./Scripts/lint.sh clean (swiftlint, headers, periphery).

MistDemo

Adds lookup-all and discover-all subcommands with a --batch-size flag. Verified end-to-end against live CloudKit: --batch-size 1 issues N separate requests and the results are correctly accumulated in order (identical to a single --batch-size 200 request).

🤖 Generated with Claude Code

Apply the page-primitive + auto-paginating-extension pattern (from
queryRecords/queryAllRecords) to the non-deprecated 200-item-capped
batch operations:

- lookupAllRecords(recordNames:desiredKeys:database:batchSize:) over
  the records/lookup primitive
- discoverAllUserIdentities(lookupInfos:batchSize:) over the
  users/discover (POST) primitive

Both delegate to a shared chunkedBatches engine that splits input into
<=batchSize batches (clamped to 1...maxRecordsPerRequest), calls the
primitive per batch, and concatenates results in input order. Chunk
count is deterministic, so there is no maxPages-style throwing ceiling;
batchSize is the only knob. maxRecordsPerRequest is now public.

The users/lookup/email and users/lookup/id primitives are deprecated by
Apple in favor of users/discover (verified against Apple's archived
CloudKit Web Services reference), so they intentionally get no chunking
convenience. listZones (zones/list) has no continuation marker and is
not a pagination candidate. fetchAllRecordChanges / fetchAllZoneChanges
already implement the pattern.

Tests: ResponseProvider now records request count/bodies so tests can
assert exact batch boundaries, order, accumulation, batchSize clamping,
and empty input.

MistDemo: adds `lookup-all` and `discover-all` subcommands with a
`--batch-size` flag for exercising chunking end-to-end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 26, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5028f5e7-3bee-4bc1-bb78-496c07203caa

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch 307-auto-paginating-extension

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@claude
Copy link
Copy Markdown

claude Bot commented May 26, 2026

Code Review — PR #389: Auto-chunking conveniences for batch operations

Overview

Solid addition that follows the established queryRecords/queryAllRecords primitive+convenience pattern. The shared chunkedBatches engine is clean, the typed-throws signature is correct, and the test infrastructure upgrades to ResponseProvider (capturing request log, call counts, bodies) are genuinely useful.


Issues

Bug: Division-by-zero in command batch-count preview

Both LookupAllRecordsCommand and DiscoverAllUserIdentitiesCommand compute the batch count for a stderr note before chunkedBatches gets a chance to clamp:

// LookupAllRecordsCommand.swift:219
let batches = (config.recordNames.count + config.batchSize - 1) / config.batchSize

If a caller passes --batch-size 0 (or batch.size=0 in the env), this panics with a division-by-zero before the request is ever sent. chunkedBatches would have safely clamped to 1. The fix is either to clamp locally before the division or to derive the note after the call returns.

Hardcoded 200 in config parsing instead of CloudKitService.maxRecordsPerRequest

DiscoverConfig.swift:275 and LookupConfig.swift:317:

let batchSize = configuration.int(forKey: , default: 200) ?? 200

Using the literal ties the default to a magic number rather than the constant. If the cap ever changes, this silently diverges. Prefer CloudKitService.maxRecordsPerRequest (it's now public).


Code-quality observations

makeLookupRecordsService is a transparent alias

CloudKitServiceTests.BatchChunking+Helpers.swift:700-703 defines makeLookupRecordsService as a one-liner that calls makeUserService. It adds a call-site indirection without adding anything (both use the same credentials). Either give it a distinct implementation or call makeUserService directly in the LookupAllRecords tests.

LookupAllRecordsCommand wraps errors unnecessarily

DiscoverAllUserIdentitiesCommand propagates CloudKitError directly. LookupAllRecordsCommand wraps every non-LookupError in LookupError.operationFailed(error.localizedDescription), which loses the underlying CloudKitError type and its structured fields. Either both commands should wrap (consistently) or neither should; as-is the discrepancy is surprising and the wrapping is lossy.

No test for error propagation

There's a test for empty input and for clamping, but no test asserting that a failing batch throws CloudKitError and stops the loop. The behavior is correct by construction, but a test would prevent a regression if chunkedBatches is ever refactored.


Documentation nit

CLAUDE.md still lists discoverAllUserIdentities() (unavailable — pending #28) in the CloudKitService+UserOperations.swift row, while the new table just below it adds discoverAllUserIdentities(lookupInfos:batchSize:). The stale unavailable annotation for the no-arg address-book form is confusing now that the name is overloaded — worth a clarifying pass so readers understand that this overload is new and the no-arg form remains unavailable.


Minor

  • singleBatch test asserts results.count == Self.recordsPerCall (2), not names.count (5). The assertion is correct — the mock returns 2 records regardless of input — but the comment says "passes results through", which reads as if results should mirror input size. Worth either renaming the constant or adding a comment explaining that mock results ≠ input length by design.

Summary

The core chunking engine and public API are well-designed and follow project conventions. The division-by-zero in the batch-count preview is the only crash-path bug worth fixing before merge. The hardcoded 200 and the makeLookupRecordsService alias are minor cleanups. Everything else is optional polish.

🤖 Generated with Claude Code

@codecov
Copy link
Copy Markdown

codecov Bot commented May 26, 2026

Codecov Report

❌ Patch coverage is 96.96970% with 1 line in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (v1.0.0-beta.2@cd083b6). Learn more about missing BASE report.

Files with missing lines Patch % Lines
...loudKitService/CloudKitService+BatchChunking.swift 93.33% 1 Missing ⚠️
Additional details and impacted files
@@               Coverage Diff                @@
##             v1.0.0-beta.2     #389   +/-   ##
================================================
  Coverage                 ?   70.06%           
================================================
  Files                    ?      165           
  Lines                    ?     3758           
  Branches                 ?        0           
================================================
  Hits                     ?     2633           
  Misses                   ?     1125           
  Partials                 ?        0           
Flag Coverage Δ
mistdemo-spm-macos 10.85% <0.00%> (?)
mistdemo-swift-6.2-jammy 10.88% <0.00%> (?)
mistdemo-swift-6.2-noble 10.85% <0.00%> (?)
mistdemo-swift-6.3-jammy 10.96% <0.00%> (?)
mistdemo-swift-6.3-noble 10.85% <0.00%> (?)
spm 70.45% <96.96%> (?)
swift-6.1-jammy 70.72% <96.96%> (?)
swift-6.1-noble 70.53% <96.96%> (?)
swift-6.2-jammy 70.78% <96.96%> (?)
swift-6.2-noble 70.58% <96.96%> (?)
swift-6.3-jammy 70.58% <96.96%> (?)
swift-6.3-noble 70.67% <96.96%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@leogdion leogdion marked this pull request as ready for review May 26, 2026 13:57
@claude
Copy link
Copy Markdown

claude Bot commented May 26, 2026

Code Review — Auto-chunking conveniences for batch operations (#307)

Overview

Clean, well-scoped PR. The chunkedBatches generic engine is a good abstraction; the decision not to add chunking conveniences on deprecated endpoints is correct; and the test suite covers the happy path thoroughly. A few issues below worth addressing before merge.


Bugs / Correctness

1. Division-by-zero crash when --batch-size 0 is passed in MistDemo

Both command implementations compute the expected batch count for the progress log before passing batchSize to chunkedBatches (which clamps it internally):

// LookupAllRecordsCommand.swift:219, DiscoverAllUserIdentitiesCommand.swift:123
let batches = (config.recordNames.count + config.batchSize - 1) / config.batchSize

If a user passes --batch-size 0, this divides by zero and crashes before the service even runs. The internal engine never gets a chance to clamp. The fix is either to derive the log message after the call (use results.count / actual batch observations), or clamp batchSize before the calculation:

let effectiveBatchSize = max(1, min(config.batchSize, CloudKitService.maxRecordsPerRequest))
let batches = (config.recordNames.count + effectiveBatchSize - 1) / effectiveBatchSize

Note also that because clamping happens only inside chunkedBatches, when batchSize is very large (e.g. 9999) the log will say "1 request" but the engine will correctly issue ceil(n/200) requests — log and reality diverge. Deriving the log count from effectiveBatchSize would fix both.


Design / Style

2. makeLookupRecordsService is a transparent alias for makeUserService

CloudKitServiceTests.BatchChunking+Helpers.swift:

internal static func makeLookupRecordsService(provider: ResponseProvider) throws -> CloudKitService {
  try makeUserService(provider: provider)
}

This wrapping function adds no behaviour and will confuse the next reader who wonders whether the two differ. Tests in LookupAllRecords call it, but they could call makeUserService directly. Worth removing.

3. batchSize defaults are hardcoded 200 instead of CloudKitService.maxRecordsPerRequest

DiscoverConfig.swift and LookupConfig.swift both hard-code 200:

batchSize: Int = 200
configuration.int(forKey: MistDemoConstants.ConfigKeys.batchSize, default: 200) ?? 200

maxRecordsPerRequest is now public specifically to be the single source of truth. These should reference it so that if the constant is ever updated the demo config tracks automatically.

4. Inconsistent error handling between the two new demo commands

LookupAllRecordsCommand.execute() wraps every non-LookupError into LookupError.operationFailed(error.localizedDescription), discarding the original CloudKitError details. DiscoverAllUserIdentitiesCommand.execute() propagates errors directly. The inconsistency is jarring and the wrapping pattern specifically loses structured error info (e.g. HTTP status, CloudKit error code). Aligning both to propagate directly would be cleaner for a demo that's meant to show what the library returns.


Test Coverage Gaps

5. No test for error propagation from a mid-run failing batch

The chunking loop:

results.append(contentsOf: try await perBatch(Array(items[index..<end])))

...should short-circuit on the first failure and not call subsequent batches. This is the correct behaviour, but there is no test asserting it. A test that queues a success response followed by an error response and verifies (a) the call throws the error and (b) only the first batch was issued would close this gap.

6. No test for the Task.checkCancellation() path

The cancellation check is the only non-trivial control-flow in chunkedBatches that is not currently covered. A test that cancels the task between batches (via TaskGroup or Task { … }.cancel()) would be straightforward given the current ResponseProvider infrastructure.


Minor / Performance

7. results array is not pre-sized

var results: [Output] = []

For large inputs the array will reallocate multiple times. Since the total item count is known upfront from items.count, adding results.reserveCapacity(items.count) (or a proportional estimate) would avoid unnecessary copies. Low priority but worth noting.


What's Good

  • The chunkedBatches generic implementation is clean, minimal, and easy to audit.
  • Typed throws (throws(CloudKitError)) used consistently throughout.
  • Cooperative cancellation (Task.checkCancellation()) is wired in correctly.
  • Test suite asserts exact batch boundaries, sizes, and input-order preservation — the ResponseProvider request-log additions are a solid investment.
  • Scope decisions (no chunking for deprecated endpoints, listZones correctly excluded) are well-reasoned and documented.
  • maxRecordsPerRequest going public is the right call for callers who want to size their own batches.

🤖 Generated with Claude Code

@claude
Copy link
Copy Markdown

claude Bot commented May 26, 2026

Code Review: Auto-chunking conveniences for batch operations (#307)

Overview

This PR adds auto-chunking convenience wrappers for two CloudKit batch endpoints capped at 200 items per request: lookupAllRecords (over lookupRecords) and discoverAllUserIdentities(lookupInfos:) (over discoverUserIdentities). The design mirrors the existing queryRecords/queryAllRecords pagination pattern, sharing a single internal chunkedBatches engine. Overall the architecture is sound and the scope decisions (no chunking for deprecated lookup/email/lookup/id, listZones excluded) are well-reasoned.


Issues

Bug: Division by zero in MistDemo batch-count display

Both new commands compute and print the expected request count before the library can clamp batchSize, so a user-supplied zero crashes:

// LookupAllRecordsCommand.swift:219 / DiscoverAllUserIdentitiesCommand.swift:123
let batches = (config.recordNames.count + config.batchSize - 1) / config.batchSize

If --batch-size 0 is passed, this divides by zero. The library clamps after the fact (min(max(batchSize, 1), maxRecordsPerRequest)), but the display runs first. Fix: use max(config.batchSize, 1) in the denominator, or apply the clamp in the config parser before the field is stored.

Minor: Inconsistent error handling between the two commands

LookupAllRecordsCommand wraps all errors in a catch-all LookupError.operationFailed; DiscoverAllUserIdentitiesCommand propagates errors directly. Both commands should follow the same pattern.


Design Observations

Sequential batch execution

chunkedBatches issues one request at a time in order. For lookupAllRecords with 450 names this is 3 sequential round trips where 3 parallel ones would be equivalent (the sub-requests are independent). Sequential is a reasonable v1 call — it's simpler and implicitly rate-limit-friendly — but a note in the engine doc-string would help callers reason about latency for large inputs.

Test helper indirection

makeLookupRecordsService is a one-liner alias for makeUserService with the same credentials and no added semantics. One factory is sufficient; the indirection adds noise.


Positives

  • Correct clampingmin(max(batchSize, 1), maxRecordsPerRequest) handles both bounds cleanly in the engine.
  • Empty-input fast path — zero items returns immediately without hitting the network; tested.
  • Typed throwsasync throws(CloudKitError) throughout is consistent with the rest of the library.
  • Cancellation checkTask.checkCancellation() at the top of each iteration is a good-citizen pattern for long batch runs.
  • Test coverage — 9 new tests asserting exact batch boundaries, clamping (low/high), order preservation, empty input, and accumulation. ResponseProvider body-recording is a clean, reusable addition.
  • Doc-comment cross-linkslookupRecords and discoverUserIdentities now point to their chunking counterparts; chunkedBatches explains the no-ceiling design rationale.
  • maxRecordsPerRequest promoted to public — callers can now size their own pre-checks or UI displays without a magic constant.
  • Scope decisions documented — the PR body and CLAUDE.md both explain why lookup/email, lookup/id, and listZones are excluded. Saves future readers from re-investigating.
  • Import conventions — all new files correctly use explicit internal import / public import per project rules.

claude added 2 commits May 26, 2026 10:45
- Clamp the stderr batch-count preview in lookup-all/discover-all to
  1...maxRecordsPerRequest, fixing a division-by-zero when --batch-size 0
- Use CloudKitService.maxRecordsPerRequest instead of the literal 200 for
  the batchSize defaults in LookupConfig/DiscoverConfig
- Stop wrapping CloudKitError in LookupError in LookupAllRecordsCommand so
  it propagates structured errors like DiscoverAllUserIdentitiesCommand
- Remove the transparent makeLookupRecordsService alias; call makeUserService
  directly and add a test asserting a failing batch throws and stops the loop
- Clarify the singleBatch comment and the CLAUDE.md no-arg discover annotation

https://claude.ai/code/session_01CiP2JokL5xB62ctP1iFasJ
The batchSize default arguments on the public LookupConfig/DiscoverConfig
initializers reference CloudKitService.maxRecordsPerRequest, which requires
the MistKit import to be public rather than internal.

https://claude.ai/code/session_01CiP2JokL5xB62ctP1iFasJ
@claude
Copy link
Copy Markdown

claude Bot commented May 26, 2026

Code Review: Auto-chunking conveniences for batch operations (#307)

Overview

This PR adds a chunkedBatches engine and two new public auto-chunking conveniences — lookupAllRecords and discoverAllUserIdentities(lookupInfos:batchSize:) — following the established queryRecords/queryAllRecords pattern. The scope decisions (skipping deprecated endpoints, no chunking for listZones) are well-reasoned and clearly documented. Overall this is a clean, well-tested addition.


Code Quality

Strengths:

  • chunkedBatches is properly generic and reusable — the two call sites are thin wrappers.
  • Typed async throws(CloudKitError) is used consistently throughout.
  • Task.checkCancellation() between batches is the right cooperative cancellation hook.
  • batchSize clamping (min(max(batchSize, 1), maxRecordsPerRequest)) happens once in chunkedBatches, so the public API naturally enforces the invariant regardless of what callers pass.
  • Empty-input fast path (returns [] without issuing any requests) is explicit and documented.
  • Sequential batching (rather than parallel) is the correct call — preserves input order, avoids per-session rate-limit risk, and keeps error handling simple.

Issues

Minor: stderr note shows raw config.batchSize instead of the effective value

In both DiscoverAllUserIdentitiesCommand (line 131) and LookupAllRecordsCommand (line 231):

let effectiveBatchSize = min(max(config.batchSize, 1), CloudKitService.maxRecordsPerRequest)
let batches = (config.emails.count + effectiveBatchSize - 1) / effectiveBatchSize
let note = "discover-all: \(config.emails.count) lookup(s), batchSize \(config.batchSize)\(batches) request(s)"
//                                                                         ^^^^^^^^^^^^ should be effectiveBatchSize

If the caller passes --batch-size 0, the note says "batchSize 0 → 5 requests" but the actual behavior is batchSize=1. The note should use effectiveBatchSize in the label so the output reflects what actually happened. Low risk since the actual API call still gets the unclamped value (clamped again internally), but the log is misleading.

Minor: batchSize field leaks into base config structs used by non--all commands

DiscoverConfig and LookupConfig now carry batchSize even though the plain discover and lookup commands ignore it. This is documented with a comment but it means CLOUDKIT_BATCH_SIZE=1 in a .env file would silently be picked up without effect by those commands. Not a bug, just a potential footgun worth mentioning in the --help text or config docs.


Test Coverage

The test suite is comprehensive:

  • Empty input, single batch, multi-batch accumulation, input-order preservation.
  • batchSize clamped low (0 → 1) and high (9999 → 200).
  • Error propagation stops the loop at the failed batch (verifies call count == 2 out of 3).

Gap: no error-propagation test for discoverAllUserIdentities

LookupAllRecords has a failingBatchPropagates test. The equivalent test is absent for UserIdentityChunking. Since both go through the same engine it's low-risk, but the symmetry would be nice for future readers and for catching any regression if the delegation ever changes.


Performance

No concerns. Sequential batching is correct here. The Array(items[index..<end]) copy for each batch is fine for ≤200-item slices.


Security

No concerns. batchSize clamping prevents an unbounded number of requests from being issued against CloudKit's API via a user-supplied value.


Summary

This is a well-scoped, well-tested PR. The two minor issues (misleading stderr label and config struct leakage) are cosmetic rather than correctness problems. The missing error-propagation test for discoverAllUserIdentities is the only substantive gap worth addressing before merge.

🤖 Generated with Claude Code

@leogdion leogdion merged commit cf3bb66 into v1.0.0-beta.2 May 26, 2026
72 checks passed
@leogdion leogdion deleted the 307-auto-paginating-extension branch May 26, 2026 15:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants