feat(sync): add --minimize-reads with type-scoped storage loading #544

Merged
michael-richey merged 3 commits into main from hamr/minimize-reads-2-type-scoped on Apr 27, 2026

@michael-richey michael-richey commented Apr 23, 2026

Summary

Stacked on #543.

For large orgs (10,000+ resources), sync-cli loads all resource files from cloud storage even when syncing a small subset (e.g. --resources=roles). This causes managed-sync Phase 2/3 to take ~25 minutes each, leaving no time for Phase 6.

Adds --minimize-reads flag (sync command only) that scopes storage reads to only the requested resource type(s):

  • Type-scoped strategy: when --resources=roles, only list/fetch roles.* files instead of all 10,000 files
  • For --resources=roles with 10,000 total resources: reduces reads from ~20,000 to ~100
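The type-scoped strategy above can be sketched as a key filter. This is a minimal illustration, not the project's implementation: the function name `select_keys` and the `<type>.<id>.json` key layout are assumptions made for the example. Passing `None` means no scoping, so every key is returned.

```python
from typing import Iterable, List, Optional


def select_keys(
    all_keys: Iterable[str],
    resource_types: Optional[List[str]] = None,
) -> List[str]:
    """Return the storage keys that would be fetched.

    Hypothetical helper. resource_types=None means "no scoping":
    every key is returned, which mirrors a full load.
    """
    keys = list(all_keys)
    if resource_types is None:
        return keys
    # The trailing dot keeps "roles" from matching e.g. "roles_v2.<id>.json".
    prefixes = tuple(f"{rt}." for rt in resource_types)
    return [key for key in keys if key.startswith(prefixes)]


keys = ["roles.a.json", "roles.b.json", "users.c.json", "dashboards.d.json"]
scoped = select_keys(keys, ["roles"])  # only the two roles.* keys
full = select_keys(keys)               # all four keys
```

On object stores the same idea would normally be pushed into the listing call itself as a key prefix, so that out-of-scope keys are never listed at all.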

Constraints (enforced at CLI parse time):

  • Requires --resource-per-file
  • Requires --resources
  • Restricted to sync command only
  • Must not be combined with --cleanup
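A rough sketch of how the option constraints listed above could be checked at parse time. It uses only the standard library: `OptionError` is a hypothetical stand-in for the CLI framework's usage error, and `validate_minimize_reads` with its boolean `cleanup` parameter is an assumption for illustration, not the project's actual signature. The sync-only restriction is not modelled here, since it comes from the option simply not being registered on other commands.

```python
from typing import Optional


class OptionError(ValueError):
    """Hypothetical stand-in for a CLI usage error."""


def validate_minimize_reads(
    minimize_reads: bool,
    resource_per_file: bool,
    resources: Optional[str],
    cleanup: bool,
) -> None:
    """Raise OptionError if --minimize-reads is used with an invalid combination."""
    if not minimize_reads:
        return  # nothing to validate when the flag is off
    if not resource_per_file:
        raise OptionError("--minimize-reads requires --resource-per-file")
    if not resources:
        raise OptionError("--minimize-reads requires --resources")
    if cleanup:
        # Data-loss guard: cleanup would diff against a scoped state.
        raise OptionError("--minimize-reads cannot be combined with --cleanup")


# A valid combination passes silently.
validate_minimize_reads(True, True, "roles", False)
```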

ID-targeted loading (exact filter matches → direct key fetch, no listing) will be added in the next PR.

Test plan

  • pytest tests/unit/test_minimize_reads_type_scoped.py — 10 tests pass
  • pytest tests/unit/ — full regression, 360 tests pass
  • Verify diffs/import commands reject --minimize-reads with "no such option"

🤖 Generated with Claude Code

@michael-richey michael-richey left a comment

Staff review of --minimize-reads (type-scoped strategy). The design is clean: scoping storage reads to requested resource types is the right approach, and the fallback semantics (None = full load) preserve backward compatibility perfectly. The _check_id_collisions fix from log-only to active skip is a correct behavior change. Test coverage is solid for the happy path.

Critical

  • The --minimize-reads + --cleanup incompatibility is not enforced in code, and help text alone cannot enforce it. The code validates only resource_per_file and resources; there is no check for cleanup. This is a data-loss risk: cleanup compares ALL destination resources against the scoped source state and would delete everything not in scope.
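To make the data-loss risk concrete, here is a deliberately simplified model of the comparison. It assumes cleanup deletes every destination resource whose ID is missing from the loaded source state; `cleanup_candidates` and the resource IDs are hypothetical and do not reflect the real cleanup code.

```python
from typing import Iterable, Set


def cleanup_candidates(source_ids: Iterable[str], destination_ids: Iterable[str]) -> Set[str]:
    """Destination resources with no counterpart in the loaded source state."""
    return set(destination_ids) - set(source_ids)


destination = {"roles.1", "users.2", "dashboards.3"}

# Full load: source state covers every type, so nothing is flagged.
full_source = {"roles.1", "users.2", "dashboards.3"}
safe = cleanup_candidates(full_source, destination)

# Scoped load (--resources=roles): users and dashboards were never read,
# so they look like orphans and would be flagged for deletion.
scoped_source = {"roles.1"}
at_risk = cleanup_candidates(scoped_source, destination)
```

Under this model `safe` is empty while `at_risk` contains the user and dashboard entries, which is why the combination needs to be rejected outright rather than documented.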

Significant

  • "MODIFIED" comment in state.py's load_state is a dev note, not a permanent code comment.
  • "ID-targeted loading added in PR 3" in configuration.py leaks PR numbering into production code.

Minor

  • LocalFile.get() with resource_types scoping still scans the full directory (O(N) os.listdir), just skipping file opens. The performance win is real but "reduces reads" in the PR description overstates the LocalFile gain — for S3/GCS/Azure it's a genuine listing reduction, for local it's a file-open reduction.
  • No test for --minimize-reads --cleanup being rejected (since it isn't).
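The LocalFile point can be illustrated with a small sketch. It assumes a flat directory of `<type>.<id>.json` files; `load_local` is a hypothetical function, not the project's `LocalFile.get()`. The directory listing still touches every entry, and scoping only reduces how many files are opened and parsed.

```python
import json
import os
import tempfile
from typing import Dict, List, Optional, Tuple


def load_local(directory: str, resource_types: Optional[List[str]] = None) -> Tuple[int, Dict[str, dict]]:
    """Return (entries scanned, files actually loaded) for a flat directory."""
    names = os.listdir(directory)  # O(N) scan regardless of scoping
    prefixes = None if resource_types is None else tuple(f"{rt}." for rt in resource_types)
    loaded: Dict[str, dict] = {}
    for name in names:
        if prefixes is not None and not name.startswith(prefixes):
            continue  # skipped without opening the file
        with open(os.path.join(directory, name)) as fh:
            loaded[name] = json.load(fh)
    return len(names), loaded


with tempfile.TemporaryDirectory() as tmp:
    for name in ("roles.a.json", "users.b.json", "dashboards.c.json"):
        with open(os.path.join(tmp, name), "w") as fh:
            json.dump({"name": name}, fh)
    scanned, loaded = load_local(tmp, ["roles"])
```

Here `scanned` counts all three entries while `loaded` holds only the roles file, which matches the distinction between a listing reduction and a file-open reduction.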

Inline comments on specific lines below.

Comment thread datadog_sync/utils/configuration.py
Comment thread datadog_sync/utils/state.py Outdated
Comment thread datadog_sync/utils/configuration.py Outdated
Comment thread tests/unit/test_minimize_reads_type_scoped.py
Comment thread datadog_sync/utils/storage/_base_storage.py
michael-richey added a commit that referenced this pull request Apr 24, 2026
- Add UsageError when --minimize-reads combined with --cleanup (data-loss guard)
- Clean up dev-note comments: 'MODIFIED' and 'PR 3' references removed
- Add test_minimize_reads_cannot_be_combined_with_cleanup

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@michael-richey michael-richey force-pushed the hamr/minimize-reads-2-type-scoped branch from dc8b99d to 7385592 on April 24, 2026 18:03
riyazsh previously approved these changes Apr 27, 2026
Base automatically changed from hamr/minimize-reads-1-sanitize to main April 27, 2026 15:41
@michael-richey michael-richey dismissed riyazsh’s stale review April 27, 2026 15:41

The base branch was changed.

michael-richey and others added 3 commits April 27, 2026 11:43
For large orgs (10,000+ resources), sync-cli loads all resource files from
cloud storage even when syncing a small subset. This causes Phase 2 (roles)
and Phase 3 (users) to take ~25 minutes each, leaving no time for Phase 6.

Adds --minimize-reads flag (sync command only) that scopes storage reads to
only the requested resource type(s). For --resources=roles with 10,000 total
resources, this reduces reads from ~20,000 to ~100 (10x per backend per origin).

Constraints: requires --resource-per-file and --resources; not compatible with
--cleanup; sync command only. ID-targeted loading (for exact filter matches)
added in the next PR.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
- Add UsageError when --minimize-reads combined with --cleanup (data-loss guard)
- Clean up dev-note comments: 'MODIFIED' and 'PR 3' references removed
- Add test_minimize_reads_cannot_be_combined_with_cleanup

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@michael-richey michael-richey force-pushed the hamr/minimize-reads-2-type-scoped branch from 7385592 to 2c24451 on April 27, 2026 15:43
@michael-richey michael-richey merged commit aa2cbb9 into main Apr 27, 2026
11 checks passed
@michael-richey michael-richey deleted the hamr/minimize-reads-2-type-scoped branch April 27, 2026 17:31