fix(sync): bulk-load synthetics_tests for full-scan lookups in minimize-reads mode#547
Conversation
michael-richey
left a comment
There was a problem hiding this comment.
Review
Overall: Approve. The root cause is correctly identified, the fix is targeted, and the test coverage is thorough. A few nits below.
state.py — ensure_resource_type_loaded
Solid implementation. Specific things I noticed that are worth calling out as intentional:
- Marking the type in
_bulk_loaded_typesbefore theget()call correctly prevents an infinite retry on persistent storage failure. Good. - The early return when
not src_loaded and not dst_loadeddoes not re-raise or retry on the next call — by design (type is already in the set). Worth confirming this is the desired behavior when a resource type has no data at all (e.g.synthetics_testsgenuinely not exported), since a second caller would assume it's been loaded. - The insert-if-absent merge and
_ensure_attemptedbackfill are exactly right — no clobbering mid-sync state, and per-resourceensure_resource_loadedcalls for already-bulk-loaded keys become no-ops.
Nit — docstring asyncio safety claim:
"Python GIL protects set/dict operations"
This is technically true for individual ops but slightly misleading. The actual safety guarantee here is that ensure_resource_type_loaded has no await points, so asyncio cooperative scheduling can't interleave another coroutine mid-execution. The GIL note is correct but may confuse readers into thinking this is threadsafe in the general multi-thread sense. Could read: "Fully synchronous with no await points — safe to call from concurrent asyncio workers without locking."
monitors.py
Moving synthetics_tests and slos lookups out of the function preamble and into the branches that need them is the right call. It eliminates the eager defaultdict access that would silently return {} in minimize-reads mode and cause the scan to find nothing.
The "synthetics_tests": [] entry in resource_connections is confirmed correct: the inner loop in _resource_connections / connect_resources iterates over attribute paths, so an empty list produces no find_attr calls, no topological edges, and no ensure_resource_loaded triggers. It's purely documentary, which is exactly what the comment says.
synthetics_global_variables.py
Nit: The call is ensure_resource_type_loaded(resource_to_connect) rather than ensure_resource_type_loaded("synthetics_tests"). Today resource_to_connect is always synthetics_tests for this model (per resource_connections in line 19), so it's equivalent. But it means if a new resource_to_connect is ever added to this model's config, it would get bulk-loaded unconditionally on every connect_id call. Compare with synthetics_test_suites.py which guards inside if resource_to_connect == "synthetics_tests":. Not a correctness issue now, just slightly over-broad.
Tests
- Green/green regression tests are present and cover the right cases (composite remap, SLO alert, non-composite no-op, principals).
- Red/green tests use real
LocalFile+Staterather than mocked storage — this is the right call for validating the lazy-load path end-to-end. test_handles_empty_storage_gracefullyassertingstate._data.source.get("synthetics_tests") is Noneis a good guard against accidentally polluting the defaultdict with empty entries when storage returns nothing.test_non_composite_does_not_trigger_bulk_loadandtest_slo_regular_monitor_does_not_trigger_bulk_loadcorrectly assert that the happy/fast path doesn't pay the bulk-load cost — important perf property to lock in.
All CI checks green. The two nits above are minor and don't block merging.
e808545 to
4ddaf6e
Compare
…n minimize-reads mode In --minimize-reads mode, connect_id overrides in monitors, SLOs, synthetics_test_suites, and synthetics_global_variables perform endswith/startswith scans over all synthetics_tests entries. These scans require the full type to be present, but minimize-reads only loads the requested resource types up front. Root cause: two variants — 1. monitors.py and service_level_objectives.py accessed state.destination["synthetics_tests"] eagerly at the top of connect_id, getting an empty defaultdict when synthetics_tests was not in scope. 2. synthetics_test_suites.py and synthetics_global_variables.py went through resource_connections but ensure_resource_loaded() cannot find compound keys (public_id#monitor_id) from a bare public_id. Fix: add State.ensure_resource_type_loaded(resource_type) — a generic lazy bulk-load method that loads an entire resource type from storage on demand. Uses insert-if-absent merge to protect mid-sync destination entries. Idempotent via _bulk_loaded_types set. Call sites updated to invoke ensure_resource_type_loaded just before the full-scan loop, keeping non-affected code paths untouched. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ans in minimize-reads mode
SyntheticsTests.connect_id() for the synthetics_tests -> synthetics_tests
path performs a startswith scan over compound keys ({public_id}#{monitor_id})
to remap subtestPublicId references. In minimize-reads mode only explicitly
requested test IDs are loaded, so referenced subtests are absent and the scan
finds nothing.
The framework-level ensure_resource_loaded() cannot compensate because it does
exact-key lookup and the state key is compound while the subtestPublicId is a
bare public_id.
Fix: add ensure_resource_type_loaded("synthetics_tests") before the scan,
mirroring the identical pattern already in synthetics_test_suites.py:101,
monitors.py:145, service_level_objectives.py:89, and
synthetics_global_variables.py:117.
RED/GREEN: TestSyntheticsTestsSelfRefMinimizeReads (3 tests) + GREEN/GREEN
regression: TestSyntheticsTestsConnectIdRegression (3 tests). Full suite: 409
passed.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2100e48 to
cd59981
Compare
Summary
State.ensure_resource_type_loaded(resource_type)— a generic lazy bulk-load method that loads an entire resource type from storage on first call, using insert-if-absent merge to protect mid-sync destination entriesmonitors.pyandservice_level_objectives.py: removes eager top-level access tostate.destination["synthetics_tests"]; moves access into the composite/fallback branches with aensure_resource_type_loadedcall before each scansynthetics_test_suites.pyandsynthetics_global_variables.py: addsensure_resource_type_loadedbefore thestartswithscan that requires all synthetics_tests compound keys to be present"synthetics_tests": []to monitorsresource_connections(documents implicit dependency; empty attr list creates no topological edges)Root cause:
endswith/startswithscans inconnect_idoverrides require ALLsynthetics_testsentries to be loaded. In--minimize-readsmode only the requested resource types are loaded up front. Per-resource lazy loading can't resolve compound keys (public_id#monitor_id) from a bare monitor ID.Stacked on:
hamr/minimize-reads-3-id-targetedTest plan
TestEnsureResourceTypeLoaded,TestMonitorsCompositeMinimizeReads,TestSLOConnectIdMinimizeReads,TestSyntheticsTestSuitesMinimizeReads,TestSyntheticsGlobalVariablesMinimizeReads)tox -e ruff,black: clean🤖 Generated with Claude Code