feat(workflow_engine): Add in hook for producing occurrences from the stateful detector #82
camcalaquian wants to merge 1 commit into
Conversation
… stateful detector (getsentry#80168) This adds a hook that can be implemented to produce an occurrence specific to the detector subclassing the StatefulDetector. It also changes the signature of `evaluate` to return a dict keyed by group key instead of a list, which avoids the chance of duplicate results for the same group key.
Tenki Code Review - Complete
Files Reviewed: 4

PR #82 refactors the detector evaluation API to return a dict-based result set instead of a list, eliminating manual duplicate-detection logic. The production code changes are well-structured and maintainable. Two findings: one test correctness bug (hardcoded occurrence IDs mask value/fingerprint assertions) and one minor caching opportunity (the `group_type` property).
Overview
This PR refactors the detector handler evaluation API from returning `list[DetectorEvaluationResult]` to `dict[DetectorGroupKey, DetectorEvaluationResult]`. The key changes include:
- **API Contract Change:** `DetectorHandler.evaluate()` now returns a dict keyed by group key, eliminating the need for caller-side duplicate detection
- **New Abstract Method:** `build_occurrence_and_event_data()` replaces a TODO comment, allowing handlers to generate occurrence data
- **Simplified Logic:** Removed the manual duplicate-key detection loop from `process_detectors()`
- **Test Refactoring:** Consolidated `StatefulDetectorHandlerTestMixin` and `TestProcessDetectors` into a common `BaseDetectorHandlerTest` base class
Findings
🔴 Test Coverage Bug (Medium)
The new test helper `build_mock_occurrence_and_event()` creates all `IssueOccurrence` objects with a hardcoded `id`, causing all occurrences to compare equal regardless of actual differences in fingerprints or priority. This masks test assertion failures:
- `test_state_results_multi_group` constructs `occurrence_2` with `value=6` when the packet has `group_2: 10`, yet the assertion passes anyway
- Similar issues exist in `test_results_on_change` and `test_dedupe`
Impact: Tests cannot catch bugs where build_occurrence_and_event_data() produces wrong fingerprints or evidence per group key.
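The masking effect can be reproduced with a minimal stand-in class whose `__eq__` compares only `id`, mirroring the behavior described above (`FakeOccurrence` is illustrative, not Sentry code):

```python
class FakeOccurrence:
    """Minimal stand-in whose __eq__ compares only `id`."""

    def __init__(self, id: str, value: int):
        self.id = id
        self.value = value

    def __eq__(self, other):
        # Mirrors the id-only equality described in the finding.
        return isinstance(other, FakeOccurrence) and self.id == other.id

# Both objects share the hardcoded id, so they compare equal even
# though their values differ: this is how the wrong value=6 slips
# past the whole-object equality assertions.
expected = FakeOccurrence(id="eb4b0acffadb4d098d48cb14165ab578", value=6)
actual = FakeOccurrence(id="eb4b0acffadb4d098d48cb14165ab578", value=10)
assert expected == actual
```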
🟡 Performance: Uncached Property (Low)
The new Detector.group_type property performs a dict lookup on every access with no memoization. Since detector types are immutable during a request, using @functools.cached_property would prevent redundant lookups when both group_type and detector_handler are accessed.
Code Quality Assessment
✅ Strengths:
- Clean API contract: dict-keyed results prevent duplicates by design
- Good test coverage breadth (multi-group, dedupe, state transitions, edge cases)
- Clear separation of concerns with the new `build_occurrence_and_event_data` hook
- Safe refactoring of test base classes (consolidation improves maintainability)

⚠️ Action items:
- The test assertion flaw on occurrence values needs a fix before merge
- Minor optimization opportunity (caching) in production code
```python
def get_group_key_values(self, data_packet: DataPacket[dict]) -> dict[str, int]:
    return data_packet.packet.get("group_vals", {})

def build_mock_occurrence_and_event(
```
🟠 Test occurrences share hardcoded ID, masking incorrect value/fingerprint assertions (bug)
`build_mock_occurrence_and_event` always produces `IssueOccurrence` with the same hardcoded `id` ('eb4b0acffadb4d098d48cb14165ab578') and `event_id`. Since `IssueOccurrence.__eq__` only compares `id`, every occurrence created by this helper compares equal to every other. In `test_state_results_multi_group` (lines 190-191), `occurrence_2` is constructed with `value=6` instead of the actual packet value 10. The assertion on lines 200-204 passes despite the wrong value because equality is id-only. Similarly, `test_results_on_change` (line 489) and `test_dedupe` (line 509) call `build_mock_occurrence_and_event` with `value=6` when the real packet values are 100 and 8 respectively. The tests will never fail even if `build_occurrence_and_event_data` produces occurrences with wrong fingerprints or wrong evidence.
💡 Suggestion: Use unique IDs per occurrence (e.g. `uuid.uuid4().hex`) inside `build_mock_occurrence_and_event`, and pass the actual packet value when constructing expected occurrences (e.g. `value=10` for `group_2`). This forces `occurrence_2` to be constructed identically to the one the handler produces, making the test meaningful. Alternatively, assert individual fields (`fingerprint`, `initial_issue_priority`) rather than relying on whole-object equality.
```python
def build_mock_occurrence_and_event(
    occurrence = IssueOccurrence(
        id=hashlib.md5(f'{group_key}:{value}'.encode()).hexdigest(),
        project_id=handler.detector.project_id,
        event_id=hashlib.md5(f'{group_key}:{value}:event'.encode()).hexdigest(),
```
📋 Prompt for AI Agents
In tests/sentry/workflow_engine/processors/test_detector.py, the build_mock_occurrence_and_event helper (around lines 244-275) uses a hardcoded id='eb4b0acffadb4d098d48cb14165ab578' and event_id='43878ab4419f4ab181f6379ac376d5aa' for every IssueOccurrence it creates. Because IssueOccurrence.__eq__ only compares id, all occurrences compare equal to each other, so tests like test_state_results_multi_group pass even when the wrong value or fingerprint is used. Fix by: (1) replacing the hardcoded id and event_id with deterministic but unique values (e.g. hashlib.md5(f'{group_key}:{value}'.encode()).hexdigest()), (2) updating all call sites (test_state_results_multi_group line 191, test_results_on_change line 490, test_dedupe line 510) to pass the correct actual value that matches the data packet (e.g. value=10 for group_2 in test_state_results_multi_group). This ensures the assertions actually verify the correct occurrence is produced per group key.
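A quick check (plain Python, independent of the test suite) confirms that the hash-based IDs suggested above are deterministic yet unique per `(group_key, value)` pair; `make_id` is an illustrative helper name:

```python
import hashlib

def make_id(group_key: str, value: int) -> str:
    # Same scheme as the suggestion: deterministic, so expected
    # occurrences can be rebuilt, but unique per (group_key, value).
    return hashlib.md5(f"{group_key}:{value}".encode()).hexdigest()

# Deterministic: identical inputs always yield the same id.
assert make_id("group_2", 10) == make_id("group_2", 10)

# Unique: an expected occurrence built with the wrong value=6 no
# longer matches an actual occurrence built from the packet value 10,
# so id-only equality now catches the mismatch.
assert make_id("group_2", 6) != make_id("group_2", 10)
```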
```python
@property
def group_type(self) -> builtins.type[GroupType] | None:
    return grouptype.registry.get_by_slug(self.type)
```
🟡 New group_type property performs uncached registry lookup on every access (bug)
The newly introduced Detector.group_type property in src/sentry/workflow_engine/models/detector.py (line 58-60) calls grouptype.registry.get_by_slug(self.type) on every property access without any caching. If anything calls this property multiple times per request (e.g. user code calling both detector.group_type and detector.detector_handler which itself calls self.group_type), the registry is looked up twice. Since the detector type doesn't change within a request lifetime, this could reasonably be memoized.
💡 Suggestion: Use functools.cached_property instead of @property for group_type so the registry lookup is performed at most once per Detector instance. This is safe because a Detector's type field is immutable during a request's lifetime.
```python
# Before
@property
def group_type(self) -> builtins.type[GroupType] | None:
    return grouptype.registry.get_by_slug(self.type)

# After
@functools.cached_property
def group_type(self) -> builtins.type[GroupType] | None:
    return grouptype.registry.get_by_slug(self.type)
```
📋 Prompt for AI Agents
In src/sentry/workflow_engine/models/detector.py around lines 58-60, the group_type property calls grouptype.registry.get_by_slug(self.type) on every access with no caching. Change @property to @functools.cached_property (adding import functools at the top of the file) so that the registry lookup is performed at most once per Detector instance. This avoids redundant lookups when both detector.group_type and detector.detector_handler (which already calls self.group_type) are accessed.