Skip to content

feat(workflow_engine): Add in hook for producing occurrences from the stateful detector#82

Open
camcalaquian wants to merge 1 commit into
tenki/base-10from
tenki/head-10
Open

feat(workflow_engine): Add in hook for producing occurrences from the stateful detector#82
camcalaquian wants to merge 1 commit into
tenki/base-10from
tenki/head-10

Conversation

@camcalaquian
Copy link
Copy Markdown

@camcalaquian camcalaquian commented Apr 28, 2026

… stateful detector (getsentry#80168)

This adds a hook that can be implemented to produce an occurrence
specific to the detector that is subclassing the StatefulDetector.

Also change the signature of evaluate to return a dict keyed by groupkey
instead of a list. This helps avoid the chance of duplicate results for
the same group key.

<!-- Describe your PR here. -->
@camcalaquian camcalaquian marked this pull request as draft April 30, 2026 16:25
@camcalaquian camcalaquian marked this pull request as ready for review April 30, 2026 16:25
Copy link
Copy Markdown

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.

Open in Devin Review

@camcalaquian
Copy link
Copy Markdown
Author

@tenki-reviewer

@tenki-reviewer
Copy link
Copy Markdown

tenki-reviewer Bot commented May 5, 2026

Tenki Code Review - Complete

Files Reviewed: 4
Findings: 2

By Severity:

  • 🟠 Medium: 1
  • 🟡 Low: 1

PR #82 refactors the detector evaluation API to return a dict-based result set instead of a list, eliminating manual duplicate-detection logic. The production code changes are well-structured and maintainable. Two findings: one test correctness bug (hardcoded occurrence IDs mask value/fingerprint assertions) and one minor caching opportunity (group_type property).

Files Reviewed (4 files)
src/sentry/incidents/grouptype.py
src/sentry/workflow_engine/models/detector.py
src/sentry/workflow_engine/processors/detector.py
tests/sentry/workflow_engine/processors/test_detector.py

Copy link
Copy Markdown

@tenki-reviewer tenki-reviewer Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overview

This PR refactors the detector handler evaluation API from returning list[DetectorEvaluationResult] to dict[DetectorGroupKey, DetectorEvaluationResult]. The key changes include:

  • API Contract Change: DetectorHandler.evaluate() now returns a dict-keyed by group key, eliminating the need for caller-side duplicate detection
  • New Abstract Method: build_occurrence_and_event_data() replaces a TODO comment, allowing handlers to generate occurrence data
  • Simplified Logic: Removed manual duplicate-key detection loop from process_detectors()
  • Test Refactoring: Consolidated StatefulDetectorHandlerTestMixin and TestProcessDetectors into a common BaseDetectorHandlerTest base class

Findings

🔴 Test Coverage Bug (Medium)

The new test helper build_mock_occurrence_and_event() creates all IssueOccurrence objects with a hardcoded id, causing all occurrences to compare equal regardless of actual differences in fingerprints or priority. This masks test assertion failures:

  • test_state_results_multi_group constructs occurrence_2 with value=6 when the packet has group_2: 10 — the assertion passes anyway
  • Similar issues in test_results_on_change and test_dedupe

Impact: Tests cannot catch bugs where build_occurrence_and_event_data() produces wrong fingerprints or evidence per group key.

🟡 Performance: Uncached Property (Low)

The new Detector.group_type property performs a dict lookup on every access with no memoization. Since detector types are immutable during a request, using @functools.cached_property would prevent redundant lookups when both group_type and detector_handler are accessed.

Code Quality Assessment

Strengths:

  • Clean API contract: dict-keyed results prevent duplicates by design
  • Good test coverage breadth (multi-group, dedupe, state transitions, edge cases)
  • Clear separation of concerns with new build_occurrence_and_event_data hook
  • Safe refactoring of test base classes (consolidation improves maintainability)

⚠️ Areas for Attention:

  • Test assertion flaw on occurrence values needs immediate fix before merge
  • Minor optimization opportunity (caching) for production code


def get_group_key_values(self, data_packet: DataPacket[dict]) -> dict[str, int]:
return data_packet.packet.get("group_vals", {})
def build_mock_occurrence_and_event(
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 Test occurrences share hardcoded ID, masking incorrect value/fingerprint assertions (bug)

build_mock_occurrence_and_event always produces IssueOccurrence with the same hardcoded id ('eb4b0acffadb4d098d48cb14165ab578') and event_id. Since IssueOccurrence.__eq__ only compares id, every occurrence created by this helper compares equal to every other. In test_state_results_multi_group (line 190-191), occurrence_2 is constructed with value=6 instead of the actual packet value 10. The assertion on line 200-204 passes despite the wrong value because equality is id-only. Similarly test_results_on_change (line 489) and test_dedupe (line 509) call build_mock_occurrence_and_event with value=6 when the real packet values are 100 and 8 respectively. The tests will never fail even if build_occurrence_and_event_data produces occurrences with wrong fingerprints or wrong evidence.

💡 Suggestion: Use unique IDs per occurrence (e.g. uuid.uuid4().hex) inside build_mock_occurrence_and_event, and pass the actual packet value when constructing expected occurrences — e.g. value=10 for group_2. This forces occurrence_2 to be constructed identically to the one the handler produces, making the test meaningful. Alternatively, assert individual fields (fingerprint, initial_issue_priority) rather than relying on whole-object equality.

Suggested change
def build_mock_occurrence_and_event(
occurrence = IssueOccurrence(
id=hashlib.md5(f'{group_key}:{value}'.encode()).hexdigest(),
project_id=handler.detector.project_id,
event_id=hashlib.md5(f'{group_key}:{value}:event'.encode()).hexdigest(),
📋 Prompt for AI Agents

In tests/sentry/workflow_engine/processors/test_detector.py, the build_mock_occurrence_and_event helper (around lines 244-275) uses a hardcoded id='eb4b0acffadb4d098d48cb14165ab578' and event_id='43878ab4419f4ab181f6379ac376d5aa' for every IssueOccurrence it creates. Because IssueOccurrence.__eq__ only compares id, all occurrences compare equal to each other, so tests like test_state_results_multi_group pass even when the wrong value or fingerprint is used. Fix by: (1) replacing the hardcoded id and event_id with deterministic but unique values (e.g. hashlib.md5(f'{group_key}:{value}'.encode()).hexdigest()), (2) updating all call sites (test_state_results_multi_group line 191, test_results_on_change line 490, test_dedupe line 510) to pass the correct actual value that matches the data packet (e.g. value=10 for group_2 in test_state_results_multi_group). This ensures the assertions actually verify the correct occurrence is produced per group key.

Comment on lines +58 to +60
@property
def group_type(self) -> builtins.type[GroupType] | None:
return grouptype.registry.get_by_slug(self.type)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 New group_type property performs uncached registry lookup on every access (bug)

The newly introduced Detector.group_type property in src/sentry/workflow_engine/models/detector.py (line 58-60) calls grouptype.registry.get_by_slug(self.type) on every property access without any caching. If anything calls this property multiple times per request (e.g. user code calling both detector.group_type and detector.detector_handler which itself calls self.group_type), the registry is looked up twice. Since the detector type doesn't change within a request lifetime, this could reasonably be memoized.

💡 Suggestion: Use functools.cached_property instead of @property for group_type so the registry lookup is performed at most once per Detector instance. This is safe because a Detector's type field is immutable during a request's lifetime.

Suggested change
@property
def group_type(self) -> builtins.type[GroupType] | None:
return grouptype.registry.get_by_slug(self.type)
@functools.cached_property
def group_type(self) -> builtins.type[GroupType] | None:
return grouptype.registry.get_by_slug(self.type)
📋 Prompt for AI Agents

In src/sentry/workflow_engine/models/detector.py around lines 58-60, the group_type property calls grouptype.registry.get_by_slug(self.type) on every access with no caching. Change @property to @functools.cached_property (adding import functools at the top of the file) so that the registry lookup is performed at most once per Detector instance. This avoids redundant lookups when both detector.group_type and detector.detector_handler (which already calls self.group_type) are accessed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants