Auto-chunking conveniences for batch operations (#307) by leogdion · Pull Request #389 · brightdigit/MistKit

leogdion · 2026-05-26T12:40:22Z

Closes #307 (partial — see scope note).

Summary

Applies the page-primitive + auto-paginating-extension pattern (established by queryRecords/queryAllRecords) to the remaining non-deprecated, 200-item-capped batch operations. Most of #307 was already done on this milestone (fetchAllRecordChanges, fetchAllZoneChanges); this PR closes the gap for the batch-chunking candidates.

New public API

Primitive (single request)	Auto-chunking convenience
`lookupRecords(recordNames:desiredKeys:database:)`	`lookupAllRecords(recordNames:desiredKeys:database:batchSize:)`
`discoverUserIdentities(lookupInfos:)`	`discoverAllUserIdentities(lookupInfos:batchSize:)`

Both delegate to a shared internal chunkedBatches engine: splits the input into ≤batchSize batches (clamped to 1...maxRecordsPerRequest), calls the primitive per batch, concatenates results in input order. Chunk count is ceil(n/batchSize) — deterministic and finite — so there is no maxPages-style throwing ceiling; batchSize (default maxRecordsPerRequest) is the only knob. CloudKitService.maxRecordsPerRequest is now public.

Scope decisions

users/lookup/email and users/lookup/id get no chunking convenience. Both are deprecated by Apple in favor of POST users/discover (verified against Apple's archived CloudKit Web Services reference). Adding new public API on deprecated endpoints would be wrong; callers needing >200 should use discoverAllUserIdentities(lookupInfos:). The existing primitives are left as-is (no deprecation annotations) pending public feedback.
listZones is not a pagination candidate — zones/list (GET) returns every zone in one response, no continuation marker.
modifyRecords/sync<T> already chunk by 200; fetchAllRecordChanges/fetchAllZoneChanges already implement the pattern.

Tests

ResponseProvider now records request count + bodies, so tests assert exact batch boundaries, input order, accumulation, batchSize clamping (low/high), and empty input. 9 new chunking tests; full suite (529) passes. ./Scripts/lint.sh clean (swiftlint, headers, periphery).

MistDemo

Adds lookup-all and discover-all subcommands with a --batch-size flag. Verified end-to-end against live CloudKit: --batch-size 1 issues N separate requests and the results are correctly accumulated in order (identical to a single --batch-size 200 request).

🤖 Generated with Claude Code

Apply the page-primitive + auto-paginating-extension pattern (from queryRecords/queryAllRecords) to the non-deprecated 200-item-capped batch operations: - lookupAllRecords(recordNames:desiredKeys:database:batchSize:) over the records/lookup primitive - discoverAllUserIdentities(lookupInfos:batchSize:) over the users/discover (POST) primitive Both delegate to a shared chunkedBatches engine that splits input into <=batchSize batches (clamped to 1...maxRecordsPerRequest), calls the primitive per batch, and concatenates results in input order. Chunk count is deterministic, so there is no maxPages-style throwing ceiling; batchSize is the only knob. maxRecordsPerRequest is now public. The users/lookup/email and users/lookup/id primitives are deprecated by Apple in favor of users/discover (verified against Apple's archived CloudKit Web Services reference), so they intentionally get no chunking convenience. listZones (zones/list) has no continuation marker and is not a pagination candidate. fetchAllRecordChanges / fetchAllZoneChanges already implement the pattern. Tests: ResponseProvider now records request count/bodies so tests can assert exact batch boundaries, order, accumulation, batchSize clamping, and empty input. MistDemo: adds `lookup-all` and `discover-all` subcommands with a `--batch-size` flag for exercising chunking end-to-end. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-26T12:40:31Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5028f5e7-3bee-4bc1-bb78-496c07203caa

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch 307-auto-paginating-extension

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

claude · 2026-05-26T13:19:26Z

Code Review — PR #389: Auto-chunking conveniences for batch operations

Overview

Solid addition that follows the established queryRecords/queryAllRecords primitive+convenience pattern. The shared chunkedBatches engine is clean, the typed-throws signature is correct, and the test infrastructure upgrades to ResponseProvider (capturing request log, call counts, bodies) are genuinely useful.

Issues

Bug: Division-by-zero in command batch-count preview

Both LookupAllRecordsCommand and DiscoverAllUserIdentitiesCommand compute the batch count for a stderr note before chunkedBatches gets a chance to clamp:

// LookupAllRecordsCommand.swift:219
let batches = (config.recordNames.count + config.batchSize - 1) / config.batchSize

If a caller passes --batch-size 0 (or batch.size=0 in the env), this panics with a division-by-zero before the request is ever sent. chunkedBatches would have safely clamped to 1. The fix is either to clamp locally before the division or to derive the note after the call returns.

Hardcoded `200` in config parsing instead of `CloudKitService.maxRecordsPerRequest`

DiscoverConfig.swift:275 and LookupConfig.swift:317:

let batchSize = configuration.int(forKey: …, default: 200) ?? 200

Using the literal ties the default to a magic number rather than the constant. If the cap ever changes, this silently diverges. Prefer CloudKitService.maxRecordsPerRequest (it's now public).

Code-quality observations

`makeLookupRecordsService` is a transparent alias

CloudKitServiceTests.BatchChunking+Helpers.swift:700-703 defines makeLookupRecordsService as a one-liner that calls makeUserService. It adds a call-site indirection without adding anything (both use the same credentials). Either give it a distinct implementation or call makeUserService directly in the LookupAllRecords tests.

`LookupAllRecordsCommand` wraps errors unnecessarily

DiscoverAllUserIdentitiesCommand propagates CloudKitError directly. LookupAllRecordsCommand wraps every non-LookupError in LookupError.operationFailed(error.localizedDescription), which loses the underlying CloudKitError type and its structured fields. Either both commands should wrap (consistently) or neither should; as-is the discrepancy is surprising and the wrapping is lossy.

No test for error propagation

There's a test for empty input and for clamping, but no test asserting that a failing batch throws CloudKitError and stops the loop. The behavior is correct by construction, but a test would prevent a regression if chunkedBatches is ever refactored.

Documentation nit

CLAUDE.md still lists discoverAllUserIdentities() (unavailable — pending #28) in the CloudKitService+UserOperations.swift row, while the new table just below it adds discoverAllUserIdentities(lookupInfos:batchSize:). The stale unavailable annotation for the no-arg address-book form is confusing now that the name is overloaded — worth a clarifying pass so readers understand that this overload is new and the no-arg form remains unavailable.

Minor

singleBatch test asserts results.count == Self.recordsPerCall (2), not names.count (5). The assertion is correct — the mock returns 2 records regardless of input — but the comment says "passes results through", which reads as if results should mirror input size. Worth either renaming the constant or adding a comment explaining that mock results ≠ input length by design.

Summary

The core chunking engine and public API are well-designed and follow project conventions. The division-by-zero in the batch-count preview is the only crash-path bug worth fixing before merge. The hardcoded 200 and the makeLookupRecordsService alias are minor cleanups. Everything else is optional polish.

🤖 Generated with Claude Code

codecov · 2026-05-26T13:22:45Z

Codecov Report

❌ Patch coverage is 96.96970% with 1 line in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (v1.0.0-beta.2@cd083b6). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
...loudKitService/CloudKitService+BatchChunking.swift	93.33%	1 Missing ⚠️

Additional details and impacted files

@@               Coverage Diff                @@
##             v1.0.0-beta.2     #389   +/-   ##
================================================
  Coverage                 ?   70.06%           
================================================
  Files                    ?      165           
  Lines                    ?     3758           
  Branches                 ?        0           
================================================
  Hits                     ?     2633           
  Misses                   ?     1125           
  Partials                 ?        0

Flag	Coverage Δ
mistdemo-spm-macos	`10.85% <0.00%> (?)`
mistdemo-swift-6.2-jammy	`10.88% <0.00%> (?)`
mistdemo-swift-6.2-noble	`10.85% <0.00%> (?)`
mistdemo-swift-6.3-jammy	`10.96% <0.00%> (?)`
mistdemo-swift-6.3-noble	`10.85% <0.00%> (?)`
spm	`70.45% <96.96%> (?)`
swift-6.1-jammy	`70.72% <96.96%> (?)`
swift-6.1-noble	`70.53% <96.96%> (?)`
swift-6.2-jammy	`70.78% <96.96%> (?)`
swift-6.2-noble	`70.58% <96.96%> (?)`
swift-6.3-jammy	`70.58% <96.96%> (?)`
swift-6.3-noble	`70.67% <96.96%> (?)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

claude · 2026-05-26T13:59:36Z

Code Review — Auto-chunking conveniences for batch operations (#307)

Overview

Clean, well-scoped PR. The chunkedBatches generic engine is a good abstraction; the decision not to add chunking conveniences on deprecated endpoints is correct; and the test suite covers the happy path thoroughly. A few issues below worth addressing before merge.

Bugs / Correctness

1. Division-by-zero crash when --batch-size 0 is passed in MistDemo

Both command implementations compute the expected batch count for the progress log before passing batchSize to chunkedBatches (which clamps it internally):

// LookupAllRecordsCommand.swift:219, DiscoverAllUserIdentitiesCommand.swift:123
let batches = (config.recordNames.count + config.batchSize - 1) / config.batchSize

If a user passes --batch-size 0, this divides by zero and crashes before the service even runs. The internal engine never gets a chance to clamp. The fix is either to derive the log message after the call (use results.count / actual batch observations), or clamp batchSize before the calculation:

let effectiveBatchSize = max(1, min(config.batchSize, CloudKitService.maxRecordsPerRequest))
let batches = (config.recordNames.count + effectiveBatchSize - 1) / effectiveBatchSize

Note also that because clamping happens only inside chunkedBatches, when batchSize is very large (e.g. 9999) the log will say "1 request" but the engine will correctly issue ceil(n/200) requests — log and reality diverge. Deriving the log count from effectiveBatchSize would fix both.

Design / Style

2. makeLookupRecordsService is a transparent alias for makeUserService

CloudKitServiceTests.BatchChunking+Helpers.swift:

internal static func makeLookupRecordsService(provider: ResponseProvider) throws -> CloudKitService {
  try makeUserService(provider: provider)
}

This wrapping function adds no behaviour and will confuse the next reader who wonders whether the two differ. Tests in LookupAllRecords call it, but they could call makeUserService directly. Worth removing.

3. batchSize defaults are hardcoded 200 instead of CloudKitService.maxRecordsPerRequest

DiscoverConfig.swift and LookupConfig.swift both hard-code 200:

batchSize: Int = 200
configuration.int(forKey: MistDemoConstants.ConfigKeys.batchSize, default: 200) ?? 200

maxRecordsPerRequest is now public specifically to be the single source of truth. These should reference it so that if the constant is ever updated the demo config tracks automatically.

4. Inconsistent error handling between the two new demo commands

LookupAllRecordsCommand.execute() wraps every non-LookupError into LookupError.operationFailed(error.localizedDescription), discarding the original CloudKitError details. DiscoverAllUserIdentitiesCommand.execute() propagates errors directly. The inconsistency is jarring and the wrapping pattern specifically loses structured error info (e.g. HTTP status, CloudKit error code). Aligning both to propagate directly would be cleaner for a demo that's meant to show what the library returns.

Test Coverage Gaps

5. No test for error propagation from a mid-run failing batch

The chunking loop:

results.append(contentsOf: try await perBatch(Array(items[index..<end])))

...should short-circuit on the first failure and not call subsequent batches. This is the correct behaviour, but there is no test asserting it. A test that queues a success response followed by an error response and verifies (a) the call throws the error and (b) only the first batch was issued would close this gap.

6. No test for the Task.checkCancellation() path

The cancellation check is the only non-trivial control-flow in chunkedBatches that is not currently covered. A test that cancels the task between batches (via TaskGroup or Task { … }.cancel()) would be straightforward given the current ResponseProvider infrastructure.

Minor / Performance

7. results array is not pre-sized

var results: [Output] = []

For large inputs the array will reallocate multiple times. Since the total item count is known upfront from items.count, adding results.reserveCapacity(items.count) (or a proportional estimate) would avoid unnecessary copies. Low priority but worth noting.

What's Good

The chunkedBatches generic implementation is clean, minimal, and easy to audit.
Typed throws (throws(CloudKitError)) used consistently throughout.
Cooperative cancellation (Task.checkCancellation()) is wired in correctly.
Test suite asserts exact batch boundaries, sizes, and input-order preservation — the ResponseProvider request-log additions are a solid investment.
Scope decisions (no chunking for deprecated endpoints, listZones correctly excluded) are well-reasoned and documented.
maxRecordsPerRequest going public is the right call for callers who want to size their own batches.

🤖 Generated with Claude Code

claude · 2026-05-26T14:30:48Z

Code Review: Auto-chunking conveniences for batch operations (#307)

Overview

This PR adds auto-chunking convenience wrappers for two CloudKit batch endpoints capped at 200 items per request: lookupAllRecords (over lookupRecords) and discoverAllUserIdentities(lookupInfos:) (over discoverUserIdentities). The design mirrors the existing queryRecords/queryAllRecords pagination pattern, sharing a single internal chunkedBatches engine. Overall the architecture is sound and the scope decisions (no chunking for deprecated lookup/email/lookup/id, listZones excluded) are well-reasoned.

Issues

Bug: Division by zero in MistDemo batch-count display

Both new commands compute and print the expected request count before the library can clamp batchSize, so a user-supplied zero crashes:

// LookupAllRecordsCommand.swift:219 / DiscoverAllUserIdentitiesCommand.swift:123
let batches = (config.recordNames.count + config.batchSize - 1) / config.batchSize

If --batch-size 0 is passed, this divides by zero. The library clamps after the fact (min(max(batchSize, 1), maxRecordsPerRequest)), but the display runs first. Fix: use max(config.batchSize, 1) in the denominator, or apply the clamp in the config parser before the field is stored.

Minor: Inconsistent error handling between the two commands

LookupAllRecordsCommand wraps all errors in a catch-all LookupError.operationFailed; DiscoverAllUserIdentitiesCommand propagates errors directly. Both commands should follow the same pattern.

Design Observations

Sequential batch execution

chunkedBatches issues one request at a time in order. For lookupAllRecords with 450 names this is 3 sequential round trips where 3 parallel ones would be equivalent (the sub-requests are independent). Sequential is a reasonable v1 call — it's simpler and implicitly rate-limit-friendly — but a note in the engine doc-string would help callers reason about latency for large inputs.

Test helper indirection

makeLookupRecordsService is a one-liner alias for makeUserService with the same credentials and no added semantics. One factory is sufficient; the indirection adds noise.

Positives

Correct clamping — min(max(batchSize, 1), maxRecordsPerRequest) handles both bounds cleanly in the engine.
Empty-input fast path — zero items returns immediately without hitting the network; tested.
Typed throws — async throws(CloudKitError) throughout is consistent with the rest of the library.
Cancellation check — Task.checkCancellation() at the top of each iteration is a good-citizen pattern for long batch runs.
Test coverage — 9 new tests asserting exact batch boundaries, clamping (low/high), order preservation, empty input, and accumulation. ResponseProvider body-recording is a clean, reusable addition.
Doc-comment cross-links — lookupRecords and discoverUserIdentities now point to their chunking counterparts; chunkedBatches explains the no-ceiling design rationale.
maxRecordsPerRequest promoted to public — callers can now size their own pre-checks or UI displays without a magic constant.
Scope decisions documented — the PR body and CLAUDE.md both explain why lookup/email, lookup/id, and listZones are excluded. Saves future readers from re-investigating.
Import conventions — all new files correctly use explicit internal import / public import per project rules.

- Clamp the stderr batch-count preview in lookup-all/discover-all to 1...maxRecordsPerRequest, fixing a division-by-zero when --batch-size 0 - Use CloudKitService.maxRecordsPerRequest instead of the literal 200 for the batchSize defaults in LookupConfig/DiscoverConfig - Stop wrapping CloudKitError in LookupError in LookupAllRecordsCommand so it propagates structured errors like DiscoverAllUserIdentitiesCommand - Remove the transparent makeLookupRecordsService alias; call makeUserService directly and add a test asserting a failing batch throws and stops the loop - Clarify the singleBatch comment and the CLAUDE.md no-arg discover annotation https://claude.ai/code/session_01CiP2JokL5xB62ctP1iFasJ

The batchSize default arguments on the public LookupConfig/DiscoverConfig initializers reference CloudKitService.maxRecordsPerRequest, which requires the MistKit import to be public rather than internal. https://claude.ai/code/session_01CiP2JokL5xB62ctP1iFasJ

claude · 2026-05-26T14:47:03Z

Code Review: Auto-chunking conveniences for batch operations (#307)

Overview

This PR adds a chunkedBatches engine and two new public auto-chunking conveniences — lookupAllRecords and discoverAllUserIdentities(lookupInfos:batchSize:) — following the established queryRecords/queryAllRecords pattern. The scope decisions (skipping deprecated endpoints, no chunking for listZones) are well-reasoned and clearly documented. Overall this is a clean, well-tested addition.

Code Quality

Strengths:

chunkedBatches is properly generic and reusable — the two call sites are thin wrappers.
Typed async throws(CloudKitError) is used consistently throughout.
Task.checkCancellation() between batches is the right cooperative cancellation hook.
batchSize clamping (min(max(batchSize, 1), maxRecordsPerRequest)) happens once in chunkedBatches, so the public API naturally enforces the invariant regardless of what callers pass.
Empty-input fast path (returns [] without issuing any requests) is explicit and documented.
Sequential batching (rather than parallel) is the correct call — preserves input order, avoids per-session rate-limit risk, and keeps error handling simple.

Issues

Minor: stderr note shows raw config.batchSize instead of the effective value

In both DiscoverAllUserIdentitiesCommand (line 131) and LookupAllRecordsCommand (line 231):

let effectiveBatchSize = min(max(config.batchSize, 1), CloudKitService.maxRecordsPerRequest)
let batches = (config.emails.count + effectiveBatchSize - 1) / effectiveBatchSize
let note = "discover-all: \(config.emails.count) lookup(s), batchSize \(config.batchSize) → \(batches) request(s)"
//                                                                         ^^^^^^^^^^^^ should be effectiveBatchSize

If the caller passes --batch-size 0, the note says "batchSize 0 → 5 requests" but the actual behavior is batchSize=1. The note should use effectiveBatchSize in the label so the output reflects what actually happened. Low risk since the actual API call still gets the unclamped value (clamped again internally), but the log is misleading.

Minor: batchSize field leaks into base config structs used by non--all commands

DiscoverConfig and LookupConfig now carry batchSize even though the plain discover and lookup commands ignore it. This is documented with a comment but it means CLOUDKIT_BATCH_SIZE=1 in a .env file would silently be picked up without effect by those commands. Not a bug, just a potential footgun worth mentioning in the --help text or config docs.

Test Coverage

The test suite is comprehensive:

Empty input, single batch, multi-batch accumulation, input-order preservation.
batchSize clamped low (0 → 1) and high (9999 → 200).
Error propagation stops the loop at the failed batch (verifies call count == 2 out of 3).

Gap: no error-propagation test for discoverAllUserIdentities

LookupAllRecords has a failingBatchPropagates test. The equivalent test is absent for UserIdentityChunking. Since both go through the same engine it's low-risk, but the symmetry would be nice for future readers and for catching any regression if the delegation ever changes.

Performance

No concerns. Sequential batching is correct here. The Array(items[index..<end]) copy for each batch is fine for ≤200-item slices.

Security

No concerns. batchSize clamping prevents an unbounded number of requests from being issued against CloudKit's API via a user-supplied value.

Summary

This is a well-scoped, well-tested PR. The two minor issues (misleading stderr label and config struct leakage) are cosmetic rather than correctness problems. The missing error-propagation test for discoverAllUserIdentities is the only substantive gap worth addressing before merge.

🤖 Generated with Claude Code

leogdion marked this pull request as ready for review May 26, 2026 13:57

claude Bot mentioned this pull request May 26, 2026

Squashed commit of the following: #391

Merged

claude added 2 commits May 26, 2026 10:45

leogdion merged commit cf3bb66 into v1.0.0-beta.2 May 26, 2026
72 checks passed

leogdion deleted the 307-auto-paginating-extension branch May 26, 2026 15:35

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Auto-chunking conveniences for batch operations (#307)#389

Auto-chunking conveniences for batch operations (#307)#389
leogdion merged 3 commits into
v1.0.0-beta.2from
307-auto-paginating-extension

leogdion commented May 26, 2026

Uh oh!

coderabbitai Bot commented May 26, 2026 •

edited

Loading

Review skipped

Uh oh!

claude Bot commented May 26, 2026

Uh oh!

codecov Bot commented May 26, 2026 •

edited

Loading

Uh oh!

claude Bot commented May 26, 2026

Uh oh!

claude Bot commented May 26, 2026

Uh oh!

claude Bot commented May 26, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

leogdion commented May 26, 2026

Summary

New public API

Scope decisions

Tests

MistDemo

Uh oh!

coderabbitai Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Uh oh!

claude Bot commented May 26, 2026

Code Review — PR #389: Auto-chunking conveniences for batch operations

Overview

Issues

Bug: Division-by-zero in command batch-count preview

Hardcoded 200 in config parsing instead of CloudKitService.maxRecordsPerRequest

Code-quality observations

makeLookupRecordsService is a transparent alias

LookupAllRecordsCommand wraps errors unnecessarily

No test for error propagation

Documentation nit

Minor

Summary

Uh oh!

codecov Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

claude Bot commented May 26, 2026

Code Review — Auto-chunking conveniences for batch operations (#307)

Overview

Bugs / Correctness

Design / Style

Test Coverage Gaps

Minor / Performance

What's Good

Uh oh!

claude Bot commented May 26, 2026

Code Review: Auto-chunking conveniences for batch operations (#307)

Overview

Issues

Design Observations

Positives

Uh oh!

claude Bot commented May 26, 2026

Code Review: Auto-chunking conveniences for batch operations (#307)

Overview

Code Quality

Issues

Test Coverage

Performance

Security

Summary

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

coderabbitai Bot commented May 26, 2026 •

edited

Loading

Hardcoded `200` in config parsing instead of `CloudKitService.maxRecordsPerRequest`

`makeLookupRecordsService` is a transparent alias

`LookupAllRecordsCommand` wraps errors unnecessarily

codecov Bot commented May 26, 2026 •

edited

Loading