feat(sync): add --minimize-reads with type-scoped storage loading#544
Merged
michael-richey merged 3 commits intomainfrom Apr 27, 2026
Merged
feat(sync): add --minimize-reads with type-scoped storage loading#544michael-richey merged 3 commits intomainfrom
michael-richey merged 3 commits intomainfrom
Conversation
2 tasks
5d25cd9 to
dc8b99d
Compare
michael-richey
commented
Apr 24, 2026
Collaborator
Author
michael-richey
left a comment
There was a problem hiding this comment.
Staff review of --minimize-reads (type-scoped strategy). The design is clean: scoping storage reads to requested resource types is the right approach, and the fallback semantics (None = full load) preserve backward compatibility perfectly. The _check_id_collisions fix from log-only to active skip is a correct behavior change. Test coverage is solid for the happy path.
Critical
--minimize-reads+--cleanupcombination is undocumented in code and unenforceable by help text alone. The code only validatesresource_per_fileandresources; the cleanup incompatibility is absent. This is a data-loss risk: cleanup compares ALL destination resources against scoped source state and would delete everything not in scope.
Significant
- "MODIFIED" comment in
state.py'sload_stateis a dev note, not a permanent code comment. - "ID-targeted loading added in PR 3" in
configuration.pyleaks PR numbering into production code.
Minor
LocalFile.get()withresource_typesscoping still scans the full directory (O(N)os.listdir), just skipping file opens. The performance win is real but "reduces reads" in the PR description overstates the LocalFile gain — for S3/GCS/Azure it's a genuine listing reduction, for local it's a file-open reduction.- No test for
--minimize-reads --cleanupbeing rejected (since it isn't).
Inline comments on specific lines below.
michael-richey
commented
Apr 24, 2026
michael-richey
commented
Apr 24, 2026
michael-richey
commented
Apr 24, 2026
michael-richey
commented
Apr 24, 2026
michael-richey
commented
Apr 24, 2026
michael-richey
added a commit
that referenced
this pull request
Apr 24, 2026
- Add UsageError when --minimize-reads combined with --cleanup (data-loss guard) - Clean up dev-note comments: 'MODIFIED' and 'PR 3' references removed - Add test_minimize_reads_cannot_be_combined_with_cleanup Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
dc8b99d to
7385592
Compare
riyazsh
previously approved these changes
Apr 27, 2026
For large orgs (10,000+ resources), sync-cli loads all resource files from cloud storage even when syncing a small subset. This causes Phase 2 (roles) and Phase 3 (users) to take ~25 minutes each, leaving no time for Phase 6. Adds --minimize-reads flag (sync command only) that scopes storage reads to only the requested resource type(s). For --resources=roles with 10,000 total resources, this reduces reads from ~20,000 to ~100 (10x per backend per origin). Constraints: requires --resource-per-file and --resources; not compatible with --cleanup; sync command only. ID-targeted loading (for exact filter matches) added in the next PR. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
- Add UsageError when --minimize-reads combined with --cleanup (data-loss guard) - Clean up dev-note comments: 'MODIFIED' and 'PR 3' references removed - Add test_minimize_reads_cannot_be_combined_with_cleanup Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
7385592 to
2c24451
Compare
riyazsh
approved these changes
Apr 27, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Stacked on #543.
For large orgs (10,000+ resources), sync-cli loads all resource files from cloud storage even when syncing a small subset (e.g.
--resources=roles). This causes managed-sync Phase 2/3 to take ~25 minutes each, leaving no time for Phase 6.Adds
--minimize-readsflag (sync command only) that scopes storage reads to only the requested resource type(s):--resources=roles, only list/fetchroles.*files instead of all 10,000 files--resources=roleswith 10,000 total resources: reduces reads from ~20,000 to ~100Constraints (enforced at CLI parse time):
--resource-per-file--resourcessynccommand only--cleanupID-targeted loading (exact filter matches → direct key fetch, no listing) added in the next PR.
Test plan
pytest tests/unit/test_minimize_reads_type_scoped.py— 10 tests passpytest tests/unit/— full regression, 360 tests passdiffs/importcommands reject--minimize-readswith "no such option"🤖 Generated with Claude Code