ref(crons): Reorganize incident creation / issue occurrence logic#80
camcalaquian wants to merge 1 commit
…tsentry#80528) Since we'll be doing more with issue occurrences, split out the concept of incidents into its own logic module, as well as incident_occurrence into its own module. Part of getsentry GH-80527.
Tenki Code Review - Complete
Files reviewed: 4
This PR refactors monitor incident logic with 5 significant code quality issues: 1 config dict bug that silences human-readable display strings, 2 data validation bugs (unbounded DB query, unchecked array access), 1 output rendering bug, and 1 N+1 query performance issue.
Review Summary
This PR refactors monitor failure and incident detection logic into separate modules (incident_occurrence.py, incidents.py) and extracts a reusable SimpleCheckIn type into types.py. The refactoring improves code organization but introduces 5 significant issues across 3 severity/category levels.
High-Severity Bugs (3)
1. Config dict not mutated in response (cq-001)
- File: incident_occurrence.py:168
- Impact: get_monitor_environment_context() translates schedule_type to a human-readable display string (lines 160–162) but returns the original unmodified config instead of the modified copy. This silently discards the translation work.
- Affects: Issue occurrence context data sent to Kafka; incident details will show raw enum values instead of display strings.
2. Unbounded DB query before Python trim (cq-002)
- File: incidents.py:40–44
- Impact: When failure_issue_threshold > 1, the ORM query has no .values()[:N] LIMIT. The Python slice on line 48 evaluates the full queryset first, fetching ALL check-ins for an environment into memory before trimming.
- Affects: For active monitors with thousands of check-ins, this wastes memory and DB I/O.
3. Unguarded array access after DB query (cq-003)
- File: incidents.py:59
- Impact: After querying previous check-ins, starting_checkin = previous_checkins[0] is accessed without checking for an empty list. A race condition or data inconsistency could cause an IndexError.
- Affects: Monitor incident tracking reliability; could crash the check-in processor.
Medium-Severity Bugs (2)
4. Garbled user-facing output (cq-004)
- File: incident_occurrence.py:145–146
- Impact: get_failure_reason() has no guard for the empty status_counts case. When all check-ins are unrecognized statuses (or the input is empty), the output becomes " check-ins detected".
- Affects: User-visible incident failure reason text in the Sentry UI.
5. N+1 DB queries in loop (cq-005)
- File: incidents.py:93–100
- Impact: Inside the check-in loop, create_incident_occurrence() issues invariant DB queries (get_last_successful_checkin(), get_environment()) on each iteration. For failure_issue_threshold = N, this multiplies DB round-trips by N.
- Affects: Performance; large thresholds create unnecessary database load.
Recommendation
Request changes — All 5 issues should be resolved before merging, with priority on the three high-severity bugs (config loss, unbounded query, IndexError).
| "id": str(monitor_environment.monitor.guid), | ||
| "slug": str(monitor_environment.monitor.slug), | ||
| "name": monitor_environment.monitor.name, | ||
| "config": monitor_environment.monitor.config, |
🔴 get_monitor_environment_context uses original config instead of modified copy (bug)
In get_monitor_environment_context, lines 160-162 copy the monitor's config dict and translate the schedule_type field using get_schedule_type_display(). However, line 168 returns monitor_environment.monitor.config (the original dict) instead of the local config variable. This means the schedule_type translation is silently dropped — callers always receive the raw integer/enum value rather than the human-readable display string.
💡 Suggestion: Change line 168 to use the local config variable that was prepared on lines 160-162, not the original monitor_environment.monitor.config.
| "config": monitor_environment.monitor.config, | |
| "config": config, |
📋 Prompt for AI Agents
In src/sentry/monitors/logic/incident_occurrence.py at line 168, the get_monitor_environment_context function builds a mutated config copy on lines 160-162 (translating schedule_type to a human-readable string) but then returns monitor_environment.monitor.config (the unmodified original) instead of the local config variable. Change line 168 from "config": monitor_environment.monitor.config, to "config": config, so that the translated schedule_type is actually included in the returned context.
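The shape of the fix can be sketched in isolation. This is a minimal stand-alone sketch, not the Sentry code itself: `build_monitor_context` and `SCHEDULE_TYPE_DISPLAY` are hypothetical stand-ins for `get_monitor_environment_context` and `get_schedule_type_display()`.

```python
from typing import Any

# Hypothetical stand-in for Monitor.get_schedule_type_display()
SCHEDULE_TYPE_DISPLAY = {0: "crontab", 1: "interval"}

def build_monitor_context(monitor_config: dict[str, Any]) -> dict[str, Any]:
    # Copy first so the stored config is never mutated in place
    config = dict(monitor_config)
    if "schedule_type" in config:
        config["schedule_type"] = SCHEDULE_TYPE_DISPLAY.get(
            config["schedule_type"], config["schedule_type"]
        )
    # The bug in the PR: returning the original monitor_config here would
    # silently drop the translation. Return the local copy instead.
    return {"config": config}
```

The key point is that the translated copy, not the original attribute, must appear in the returned dict.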
```python
MonitorCheckIn.objects.filter(
    monitor_environment=monitor_env, date_added__lte=failed_checkin.date_added
)
.order_by("-date_added")
.values("id", "date_added", "status"),
```
🔴 Unbounded DB query: LIMIT not pushed to database before Python slice (bug)
In try_incident_threshold (incidents.py), the else branch for failure_issue_threshold > 1 assigns the full .values() queryset to previous_checkins on lines 37-45 without a .values_list()[:N] or .values()[:N] limit. The Python slice on line 48 (previous_checkins[:failure_issue_threshold]) triggers evaluation of the queryset and fetches ALL matching check-ins for the monitor environment before trimming. For environments with many check-ins this can pull thousands of rows when only failure_issue_threshold rows are needed. The comment on line 39 says "Using .values for performance reasons" but the actual DB LIMIT is missing.
💡 Suggestion: Apply the slice limit at the ORM query level by chaining [:failure_issue_threshold] onto the queryset before assigning, then remove the Python-level slice on line 48 (or keep it as a safety guard).
📋 Prompt for AI Agents
In src/sentry/monitors/logic/incidents.py at lines 40-44, the queryset MonitorCheckIn.objects.filter(...).order_by('-date_added').values('id','date_added','status') is assigned without a database LIMIT. The Python slice on line 48 (previous_checkins[:failure_issue_threshold]) only applies after all rows are fetched. Fix by appending [:failure_issue_threshold] to the ORM queryset chain (before the closing parenthesis of the cast call), converting it to a database LIMIT. Then wrap it in list(...) at assignment time and remove the redundant Python slice on line 48 (the list(reversed(...)) should operate directly on the already-limited list).
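The general principle (apply the limit before evaluation, not after) can be illustrated without Django. Here `fetch_rows` is a hypothetical stand-in for a lazily evaluated queryset; in real Django, slicing a queryset with `[:N]` before evaluation translates to a SQL LIMIT.

```python
from itertools import islice

fetched = []

def fetch_rows(n=10_000):
    # Stand-in for a lazy queryset: yields one row at a time and records
    # how many rows were actually pulled from the "database"
    for i in range(n):
        fetched.append(i)
        yield {"id": i, "status": "error"}

failure_issue_threshold = 3

# Bad: list(fetch_rows())[:failure_issue_threshold] materializes all
# 10,000 rows before trimming. Good: push the limit into the iteration,
# analogous to queryset[:failure_issue_threshold] becoming a SQL LIMIT.
previous_checkins = list(islice(fetch_rows(), failure_issue_threshold))
```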
```python
monitor_env.status = MonitorStatus.ERROR
monitor_env.save(update_fields=("status",))

starting_checkin = previous_checkins[0]
```
🔴 Potential IndexError: previous_checkins[0] accessed without empty-list guard (bug)
In try_incident_threshold (incidents.py), line 59 unconditionally accesses previous_checkins[0] to retrieve the starting_checkin. In the failure_issue_threshold > 1 branch, if the DB query on lines 40-44 returns an empty queryset (e.g. due to a race where the check-in hasn't been committed yet), previous_checkins after line 48 would be an empty list, causing an unhandled IndexError. The failure threshold branch checks any([...OK...]) and returns False only when there is an OK checkin; it does not guard against the empty-list case.
💡 Suggestion: Add an explicit guard before line 59: if not previous_checkins: return False to safely handle the empty list case.
📋 Prompt for AI Agents
In src/sentry/monitors/logic/incidents.py, add an empty-list guard immediately before line 59 (starting_checkin = previous_checkins[0]). Insert if not previous_checkins: return False (with appropriate indentation inside the if monitor_env.status in [MonitorStatus.OK, MonitorStatus.ACTIVE]: block, after the any(...) check on line 52). This prevents an IndexError if the queryset returns no results due to a race condition or data inconsistency.
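A minimal sketch of the guard, with a plain list standing in for the evaluated queryset (the helper name is illustrative, not from the PR):

```python
def pick_starting_checkin(previous_checkins: list[dict]):
    # Guard first: an empty result (e.g. from a race where the check-in
    # hasn't been committed yet) must not raise IndexError
    if not previous_checkins:
        return None
    # The query is ordered by -date_added, so index 0 is the most recent
    return previous_checkins[0]
```

In the actual function the empty case would return False rather than None, but the ordering of guard-then-index is the same.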
```python
if sum(status_counts.values()) == 1:
    return SINGULAR_HUMAN_FAILURE_MAP[list(status_counts.keys())[0]]
```
🟠 get_failure_reason produces garbled output when no recognized failure statuses are present (bug)
In get_failure_reason (incident_occurrence.py), the generator on lines 139-143 filters check-ins to only those whose status is in HUMAN_FAILURE_STATUS_MAP. If failed_checkins is empty or contains only statuses not in the map (e.g. CheckInStatus.IN_PROGRESS), status_counts will be empty. sum(status_counts.values()) == 0, which is not 1, so execution falls through to line 148. get_text_list([]) returns '', and the final _('%(problem_checkins)s check-ins detected') becomes ' check-ins detected' — a confusing, user-visible string.
💡 Suggestion: Add a guard for the empty/zero case before the sum == 1 check. Return a sensible fallback string such as _('A check-in failure was detected') when status_counts is empty.
📋 Prompt for AI Agents
In src/sentry/monitors/logic/incident_occurrence.py at lines 145-146, add an early-return guard for the case where status_counts is empty (i.e., no recognized failure statuses). Before the existing if sum(status_counts.values()) == 1: check, insert:

```python
if not status_counts:
    return _("A check-in failure was detected")
```

This prevents the function from producing a garbled string like " check-ins detected" when all check-ins have unrecognized statuses or when the input list is empty.
```python
checkins = MonitorCheckIn.objects.filter(id__in=[c["id"] for c in previous_checkins])
for checkin in checkins:
    create_incident_occurrence(
        previous_checkins,
        checkin,
        incident,
        received=received,
    )
```
🟠 N+1 DB queries: per-checkin calls inside occurrence creation loop (performance)
In try_incident_threshold (incidents.py, lines 93-100), a MonitorCheckIn.objects.filter(...) result set is iterated and create_incident_occurrence is called for each item. Inside create_incident_occurrence (incident_occurrence.py), monitor_env.get_last_successful_checkin() (line 46, a DB query) and monitor_env.get_environment() (lines 67, 90, DB lookups called twice per occurrence) are issued on each iteration. For a failure_issue_threshold of N, this results in N extra DB round-trips for get_last_successful_checkin and up to 2N for get_environment. These values are identical across all loop iterations since monitor_env doesn't change.
💡 Suggestion: Hoist the invariant queries (get_last_successful_checkin, get_environment) out of the loop and pass their results as arguments to create_incident_occurrence. Alternatively, cache the results before the loop. Also consider whether firing one occurrence per checkin (rather than a single aggregate occurrence) is intentional — if a single occurrence suffices, eliminate the loop entirely.
📋 Prompt for AI Agents
In src/sentry/monitors/logic/incidents.py at lines 93-100, the loop calls create_incident_occurrence once per checkin. Inside that function (incident_occurrence.py lines 46, 67, 90), monitor_env.get_last_successful_checkin() and monitor_env.get_environment() are called on every iteration — the same queries repeated N times for identical results. Fix by caching these results before the loop: call monitor_env.get_last_successful_checkin() once before the for loop, and fetch monitor_env.get_environment() once and store it. Pass the cached values into create_incident_occurrence as additional parameters (updating its signature accordingly), so that DB queries are not duplicated per checkin.
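The hoisting pattern can be sketched with call-counting stubs standing in for the real DB lookups (the function bodies and signature below are illustrative, not the PR's code):

```python
calls = {"last_successful": 0, "environment": 0}

def get_last_successful_checkin():
    calls["last_successful"] += 1  # stand-in for a DB query
    return {"id": 1}

def get_environment():
    calls["environment"] += 1  # stand-in for a DB query
    return "production"

def create_incident_occurrence(checkin, *, last_successful, environment):
    # Invariant values arrive as parameters instead of being re-queried
    return {"checkin": checkin, "last": last_successful, "env": environment}

checkins = [{"id": i} for i in range(5)]

# Hoist the invariant lookups out of the loop: one query each, not N
last_successful = get_last_successful_checkin()
environment = get_environment()
occurrences = [
    create_incident_occurrence(
        c, last_successful=last_successful, environment=environment
    )
    for c in checkins
]
```

Because the cached values are computed once, the per-iteration cost drops to the occurrence construction itself.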
| "id": str(monitor_environment.monitor.guid), | ||
| "slug": str(monitor_environment.monitor.slug), | ||
| "name": monitor_environment.monitor.name, | ||
| "config": monitor_environment.monitor.config, |
🟡 get_monitor_environment_context returns original config instead of the modified copy
In get_monitor_environment_context, a local config copy is created and schedule_type is overwritten with a human-readable display value (line 160-162). However, the return dict on line 168 uses monitor_environment.monitor.config (the original, unmodified config) instead of the local config variable. This means the schedule_type transformation is silently discarded and the monitor context always contains the raw schedule_type value. This is a pre-existing bug that was carried over during this refactor.
| "config": monitor_environment.monitor.config, | |
| "config": config, |