feat(sync): add ID-targeted loading and per-dependency lazy loading #545
michael-richey merged 4 commits into main
michael-richey left a comment
Staff review of the ID-targeted loading + lazy dependency loading feature. The overall design is clean and the performance wins (40k → ~204 reads for the managed-sync use case) are compelling. Strategy selection in build_config is well-structured and the fallback chain is easy to follow. A few issues worth addressing before merge.
Critical
- `ensure_resource_loaded` has a repeated-miss performance problem when a dependency file doesn't exist at all: `get_single` is called on every invocation for that ID forever.

Significant
- `get_by_ids` is copy-pasted identically across S3, GCS, and Azure. It belongs in the base class.
- `get_single` ignores the `origin` parameter and always reads both source and destination, wasting I/O when only one side is needed.
- The `ValueError` catch in `extract_exact_id_filters` is unreachable given the surrounding `f.operator == EXACT_MATCH_OPERATOR` guard.

Minor
- No debug/info log when ID-targeted loading falls back to type-scoped, which makes troubleshooting harder.
- The known `connect_id()` override limitation (monitors, restriction_policies) should surface a `log.warning` rather than silently skip, since silent ID-remapping failure is hard to detect in production.

Inline comments on the specific locations below.
    # (source+destination) so the source check below is accurate,
    # and so connect_resources() in _apply_resource_cb() can
    # successfully remap the ID in the destination.
    self.config.state.ensure_resource_loaded(resource_to_connect, f_id)
The PR description calls out a known limitation: monitors.py and restriction_policies.py have connect_id() overrides that access hardcoded destination types beyond what appears in resource_connections. Those types won't be in the scoped load and ensure_resource_loaded won't be called for them, so their IDs will silently not be remapped.
Silent ID-remapping failure is very hard to detect in production — the sync appears to succeed but references in the destination resource are stale. Even if fixing the root cause is deferred, the failure should be surfaced:
    # After ensure_resource_loaded + source check:
    if f_id not in self.config.state.source[resource_to_connect]:
        # The dependency did not load; warn when running with --minimize-reads.
        if self.config.state._minimize_reads:
            log.warning(
                "minimize-reads: dependency %s.%s not found in storage; "
                "ID remapping may be incomplete",
                resource_to_connect,
                f_id,
            )
        missing_resources.add((resource_to_connect, f_id))

At minimum, the test plan should include a case that syncs a monitor widget in a dashboard (which exercises the monitors override path) to confirm it doesn't silently break.
Fixed. Added a self.config.logger.warning call after the source check in _resource_connections. Noted in the comment that the monitors.py override (which accesses synthetics_tests directly in connect_id) won't be caught here since that path bypasses _resource_connections — fixing the root cause is a separate refactor.
Extends --minimize-reads with a faster strategy: when all filters for the requested type use Name=id+ExactMatch+OR, construct storage keys directly and fetch only those files, with no listing needed. For managed-sync's typical invocation (100 dashboards with exact ID filters), this reduces reads from ~20,000 to ~204 (100 dashboards + up to 100 dependency files, source+destination). Phase 6 per-user-per-type invocations become viable.

Cross-type dependencies (e.g. a dashboard referencing a monitor) are loaded lazily via State.ensure_resource_loaded(), called from _resource_connections() before the cross-type source state check. This loads both source and destination state for the specific dependency ID, ensuring connect_id() in _apply_resource_cb() can correctly remap IDs in the destination.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
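The fast-path check described in this commit message can be sketched roughly as below. This is only an illustration under stated assumptions, not the PR's code: the `Filter` dataclass, the helper names `exact_ids_or_none` and `storage_keys`, the string value of `EXACT_MATCH_OPERATOR`, and the `<type>.<id>.json` key layout are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Constant name appears in the review; its value here is an assumption.
EXACT_MATCH_OPERATOR = "ExactMatch"


@dataclass
class Filter:
    """Hypothetical stand-in for one parsed --filter entry."""
    resource_type: str
    attr_name: str
    value: str
    operator: str


def exact_ids_or_none(
    filters: List[Filter], resource_type: str, filter_operator: str = "OR"
) -> Optional[List[str]]:
    """Return the requested IDs when the fast path applies, else None.

    The fast path applies only if filters are OR-combined and every filter
    for the type is Name=id with an exact-match operator. A None result
    means the caller should fall back to type-scoped loading.
    """
    relevant = [f for f in filters if f.resource_type == resource_type]
    if not relevant or filter_operator.upper() != "OR":
        return None
    for f in relevant:
        if f.attr_name != "id" or f.operator != EXACT_MATCH_OPERATOR:
            return None
    # Deduplicate while preserving the order the IDs were given in.
    return list(dict.fromkeys(f.value for f in relevant))


def storage_keys(resource_type: str, ids: List[str]) -> List[str]:
    """Build one key per ID (assumed layout: '<type>.<id>.json')."""
    return [f"{resource_type}.{resource_id}.json" for resource_id in ids]
```

Returning `None` rather than an empty list keeps "fast path not applicable" distinct from "no IDs requested", so the fallback decision stays explicit at the call site.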
- Consolidate get_by_ids into BaseStorage (was identically copied across all 4 backends)
- Enforce --minimize-reads + --cleanup incompatibility with a UsageError
- Fix ensure_resource_loaded repeated-miss bug using _ensure_attempted set
- Use self.config.logger for minimize-reads warning in _resource_connections
- Remove unused List imports from storage backends
- Add debug log when ID-targeted loading falls back to type-scoped
- Add defensive comment on ValueError catch in extract_exact_id_filters
- Add tests: cleanup rejection, repeated-miss idempotency, process_filters integration

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
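The repeated-miss fix mentioned in this commit (an `_ensure_attempted` set) might look something like the following sketch. The `LazyState` class, the injected `get_single` callable, and its return shape are assumptions made for the example; only the names `ensure_resource_loaded` and `_ensure_attempted` come from the conversation.

```python
from collections import defaultdict
from typing import Callable, Dict, Optional, Set, Tuple

# Hypothetical fetcher: (resource_type, resource_id) -> (source, destination),
# where either side is None when the file does not exist in storage.
Fetcher = Callable[[str, str], Tuple[Optional[dict], Optional[dict]]]


class LazyState:
    """Simplified stand-in for State, showing only the lazy-load path."""

    def __init__(self, get_single: Fetcher) -> None:
        self._get_single = get_single
        self.source: Dict[str, Dict[str, dict]] = defaultdict(dict)
        self.destination: Dict[str, Dict[str, dict]] = defaultdict(dict)
        # Every (type, id) pair we have tried to load, hit or miss.
        self._ensure_attempted: Set[Tuple[str, str]] = set()

    def ensure_resource_loaded(self, resource_type: str, resource_id: str) -> None:
        key = (resource_type, resource_id)
        # Skip if already loaded, or if a previous attempt found nothing.
        if key in self._ensure_attempted or resource_id in self.source[resource_type]:
            return
        # Record the attempt before fetching so a miss is never retried.
        self._ensure_attempted.add(key)
        src, dst = self._get_single(resource_type, resource_id)
        if src is not None:
            self.source[resource_type][resource_id] = src
        if dst is not None:
            self.destination[resource_type][resource_id] = dst
```

Recording the attempt regardless of outcome is what makes a missing dependency cost one read instead of one read per reference.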
- Add `from collections import defaultdict` to gcs_bucket and azure_blob_container (remote moved inline import to top-level in ee396f8)
- Add `resource_per_file=True` guard to BaseStorage.get_by_ids (remote added this guard in backends; consolidate it in the base class)
- Add `resource_per_file=True` to all State/backend constructions in tests that use exact_ids mode (guard now enforced at base class level)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
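As a rough picture of the consolidation described here, a base-class `get_by_ids` with the `resource_per_file` guard could be shaped like this. The method signatures, the `ids_by_type` argument, and the `InMemoryStorage` backend are hypothetical; the real `BaseStorage` interface may differ.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


class BaseStorage(ABC):
    """Sketch of a storage base class; signatures are assumed, not the project's."""

    def __init__(self, resource_per_file: bool = False) -> None:
        self.resource_per_file = resource_per_file

    @abstractmethod
    def get_single(
        self, resource_type: str, resource_id: str
    ) -> Tuple[Optional[dict], Optional[dict]]:
        """Return (source, destination) for one ID; None for a missing side."""

    def get_by_ids(self, ids_by_type: Dict[str, Iterable[str]]):
        # Direct key fetch only makes sense with one file per resource.
        if not self.resource_per_file:
            raise ValueError("get_by_ids() requires resource_per_file=True")
        source = defaultdict(dict)
        destination = defaultdict(dict)
        for resource_type, ids in ids_by_type.items():
            for resource_id in ids:
                src, dst = self.get_single(resource_type, resource_id)
                if src is not None:
                    source[resource_type][resource_id] = src
                if dst is not None:
                    destination[resource_type][resource_id] = dst
        return dict(source), dict(destination)


class InMemoryStorage(BaseStorage):
    """Toy backend used only to exercise the shared get_by_ids logic."""

    def __init__(self, source: dict, destination: dict, resource_per_file: bool = False) -> None:
        super().__init__(resource_per_file)
        self._source = source
        self._destination = destination
        self.reads = 0  # counts get_single calls

    def get_single(self, resource_type, resource_id):
        self.reads += 1
        src = self._source.get(resource_type, {}).get(resource_id)
        dst = self._destination.get(resource_type, {}).get(resource_id)
        return src, dst
```

With the loop and the guard in the base class, each backend only has to implement the single-object read.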
The merge-base changed after approval.
Summary

Stacked on #544.

Extends `--minimize-reads` with a faster strategy and lazy dependency loading:

- ID-targeted loading (fast path): When all filters use `Name=id`+`ExactMatch`+`OR`, constructs storage keys directly and fetches only those files, with no listing needed. For managed-sync's typical invocation (100 resources with exact ID filters), reduces reads from ~20,000 to ~204 (100 files + up to 100 dependency files, source+destination).
- Lazy dependency loading: `State.ensure_resource_loaded(type, id)` lazily fetches cross-type dependencies when encountered in `_resource_connections()`. Loads both source and destination state so `connect_id()` can remap IDs correctly.

Strategy selection (automatic, in `build_config()`):

- `Name=id`+`ExactMatch`+`OR` → direct key fetch

Note: `get_by_ids()` requires `--resource-per-file` and raises `ValueError` if called without it. All `State` instantiations that pass `exact_ids` must also pass `resource_per_file=True`.

Performance (10k dashboards + 10k monitors, 1-to-1 deps):

- `--resources=dashboards --filter=...Name=id...` (100 IDs)
- `--resources=roles` (500 roles, type-scoped)

Known limitation: `monitors.py` and `restriction_policies.py` have `connect_id()` overrides that access hardcoded destination types beyond `resource_to_connect`. With `--minimize-reads`, those types return empty dicts (no crash, just silent ID-remapping skip). Testing should confirm the specific types managed-sync syncs don't hit this.

Test plan

- `pytest tests/unit/test_minimize_reads_id_targeted.py`: 19 tests pass
- `pytest tests/unit/`: full regression, 379 tests pass

🤖 Generated with Claude Code