feat(upsampling): Support upsampled error count with performance optimizations #75
camcalaquian wants to merge 2 commits into
Conversation
…etsentry#94376) Part of the Error Upsampling project: https://www.notion.so/sentry/Tech-Spec-Error-Up-Sampling-1e58b10e4b5d80af855cf3b992f75894?source=copy_link The events-stats API will now check whether all projects in the query are allowlisted for upsampling, and if so convert the count query to a sum over `sample_weight` in Snuba; this is done by defining a new SnQL function `upsampled_count()`. I noticed there are also eps() and epm() functions in use in this endpoint. I considered (and even started work on) swapping eps() and epm() as well, since for correctness they should probably also use `sample_weight` rather than counting naively, but this caused some complications. Since they are only used by specific dashboard widgets and are not available in Discover, I decided to defer changing them until the need arises.
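For reference, a minimal sketch of how such a weighted count can be expressed with snuba_sdk; this is illustrative only and assumes a `sample_weight` column, not the PR's exact definition:

```python
from snuba_sdk import Column, Function

# Weight each event by its sample_weight instead of counting rows one-by-one.
# Illustrative sketch; the PR registers upsampled_count() inside Sentry's
# query builder rather than constructing it inline like this.
upsampled_count = Function(
    "sum",
    [Column("sample_weight")],
    alias="upsampled_count",
)
```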
- Add a 60-second cache for upsampling eligibility checks to improve performance
- Separate the upsampling eligibility check from the query transformation for better optimization
- Remove unnecessary null checks in upsampled_count() per schema requirements
- Add cache invalidation utilities for configuration management

This improves performance during high-traffic periods by avoiding repeated expensive allowlist lookups while maintaining data consistency.
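A minimal sketch of the 60-second cache pattern described above, assuming Django's shared cache backend; `_is_allowlisted` is a hypothetical stand-in for the real allowlist lookup, and the key shown here is deterministic, sidestepping the hash() issue flagged in the review below:

```python
from django.core.cache import cache

CACHE_TTL_SECONDS = 60  # eventual consistency is acceptable for allowlist changes

def is_upsampling_eligible(organization_id: int, project_ids: list[int]) -> bool:
    # Deterministic key so every worker process agrees on where the entry lives
    ids = ",".join(str(pid) for pid in sorted(project_ids))
    key = f"error_upsampling_eligible:{organization_id}:{ids}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    eligible = all(_is_allowlisted(pid) for pid in project_ids)  # hypothetical helper
    cache.set(key, eligible, CACHE_TTL_SECONDS)
    return eligible
```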
Tenki Code Review - Complete (8 files reviewed)
This PR implements error event upsampling to improve query accuracy, but it has one critical caching bug that breaks the feature in multi-worker deployments, plus two code quality issues. The non-deterministic hash-based cache key prevents cache hits across worker processes, reducing the optimization to a no-op.
Overview
The PR adds error upsampling support to Sentry's event statistics API, allowing weighted aggregations for queries on allowlisted projects. The changes include:
- New helper module (`error_upsampling.py`) with eligibility checks and query transformations
- Updated endpoint to conditionally apply upsampling transformations based on the project allowlist
- New SnQL aggregate function for weighted counting (`upsampled_count`)
- Test factory improvements to handle `client_sample_rate` extraction
Critical Issues
🔴 Non-deterministic Cache Key (cq-001)
The cache key uses Python's hash(tuple(...)) which is randomized per process due to PYTHONHASHSEED. In a multi-worker deployment (production), worker A stores cache entries under hash value X, but worker B computes hash value Y for the same project IDs and never finds the entry. The entire 60-second cache optimization is a no-op in production. Additionally, invalidate_upsampling_cache() fails silently because it can't delete entries keyed with other processes' hash values.
Code Quality Concerns
🟡 Dead Intermediate Variable (cq-003)
The variable should_upsample is assigned and immediately copied to upsampling_enabled with no other use. Pure dead code that adds noise.
🟡 Redundant Transformation Calls (cq-004)
The same transform_query_columns_for_error_upsampling(query_columns) is called three times in separate branches with identical inputs, creating maintenance risk and misleading comments suggesting they're different transformations.
Recommendation
Request Changes: The caching bug must be fixed before merge. The non-deterministic hash key means the performance optimization doesn't work in production. Code quality issues should also be cleaned up before merging.
```python
    expensive repeated option lookups during high-traffic periods. This is safe
    because allowlist changes are infrequent and eventual consistency is acceptable.
    """
    cache_key = f"error_upsampling_eligible:{organization.id}:{hash(tuple(sorted(snuba_params.project_ids)))}"
```
🔴 Non-deterministic hash() in shared cache key breaks cross-worker caching (bug)
In src/sentry/api/helpers/error_upsampling.py at lines 27 and 73, the cache key is built using Python's hash(tuple(sorted(...))). Python randomizes the hash seed per process (PYTHONHASHSEED), so the same list of project IDs will produce a different integer hash value in each worker process. Because the cache is shared (Redis), entries written by process A are keyed with hash X, but process B computes hash Y for the same input and will never find them. The caching optimization is effectively a no-op in any multi-worker deployment. The invalidate_upsampling_cache function (line 73) has the same defect: it deletes a key under hash Y, but the entry was stored under hash X by another process, so invalidation silently fails.
💡 Suggestion: Replace hash(tuple(sorted(...))) with a deterministic string representation, e.g. ','.join(str(pid) for pid in sorted(snuba_params.project_ids)). Apply the same fix to invalidate_upsampling_cache at line 73.
```diff
-cache_key = f"error_upsampling_eligible:{organization.id}:{hash(tuple(sorted(snuba_params.project_ids)))}"
+cache_key = f"error_upsampling_eligible:{organization.id}:{','.join(str(pid) for pid in sorted(snuba_params.project_ids))}"
```
📋 Prompt for AI Agents
In src/sentry/api/helpers/error_upsampling.py, the cache keys at lines 27 and 73 use hash(tuple(sorted(...))) which is non-deterministic across Python processes due to PYTHONHASHSEED randomization. Replace both occurrences with a deterministic string representation. At line 27: change f"error_upsampling_eligible:{organization.id}:{hash(tuple(sorted(snuba_params.project_ids)))}" to f"error_upsampling_eligible:{organization.id}:{','.join(str(pid) for pid in sorted(snuba_params.project_ids))}". At line 73: change f"error_upsampling_eligible:{organization_id}:{hash(tuple(sorted(project_ids)))}" to f"error_upsampling_eligible:{organization_id}:{','.join(str(pid) for pid in sorted(project_ids))}".
```python
    should_upsample = is_errors_query_for_error_upsampled_projects(
        snuba_params, organization, dataset, request
    )

    # Store the upsampling decision to apply later during query building
    # This separation allows for better query optimization and caching
    upsampling_enabled = should_upsample
```
🟡 Dead intermediate variable should_upsample in _get_event_stats (maintainability)
In src/sentry/api/endpoints/organization_events_stats.py at lines 220-226, the result of is_errors_query_for_error_upsampled_projects(...) is first assigned to should_upsample, then immediately copied verbatim to upsampling_enabled. The variable should_upsample is never read again. This is a pure no-op that adds noise and suggests the comment "Store the upsampling decision to apply later" was written for a different architecture that didn't survive.
💡 Suggestion: Remove should_upsample and assign directly: upsampling_enabled = is_errors_query_for_error_upsampled_projects(snuba_params, organization, dataset, request).
📋 Prompt for AI Agents
In src/sentry/api/endpoints/organization_events_stats.py around lines 218-226 in the _get_event_stats closure, remove the intermediate should_upsample variable. Replace the two-step should_upsample = is_errors_query_for_error_upsampled_projects(...); upsampling_enabled = should_upsample with a single assignment: upsampling_enabled = is_errors_query_for_error_upsampled_projects(snuba_params, organization, dataset, request). Also remove the comments that were explaining this unnecessary separation.
```python
    # Store the upsampling decision to apply later during query building
    # This separation allows for better query optimization and caching
    upsampling_enabled = should_upsample
    final_columns = query_columns
```
🟡 transform_query_columns_for_error_upsampling called redundantly in three branches (maintainability)
In src/sentry/api/endpoints/organization_events_stats.py the transformation transform_query_columns_for_error_upsampling(query_columns) is called separately in three execution branches (lines 233, 277, 296). All three calls pass the same unchanged query_columns input. The result could be computed once after the upsampling_enabled check and reused in all branches. As written, a future change to the transformation arguments or logic would need to be made in three places, and the three calls have misleading copy-paste comments suggesting they serve different purposes ('just before query execution', 'just before RPC query execution', 'just before standard query execution').
💡 Suggestion: Compute final_columns = transform_query_columns_for_error_upsampling(query_columns) if upsampling_enabled else query_columns once after the upsampling_enabled assignment, and remove the three repeated conditional blocks inside the branches.
📋 Prompt for AI Agents
In src/sentry/api/endpoints/organization_events_stats.py in the _get_event_stats closure, after computing upsampling_enabled and final_columns = query_columns (around lines 226-227), add: if upsampling_enabled: final_columns = transform_query_columns_for_error_upsampling(query_columns). Then remove the three identical if upsampling_enabled: final_columns = transform_query_columns_for_error_upsampling(query_columns) blocks inside the if top_events > 0: branch (line 232-233), the if use_rpc: branch (lines 276-277), and the final standard query branch (lines 295-296). This centralizes the transformation logic and eliminates the risk of the three copies diverging.
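A sketch of the suggested consolidation, using the variable names already quoted above:

```python
# Decide once, right after the eligibility check, so all three execution
# branches (top_events, RPC, standard) share a single transformation site.
final_columns = (
    transform_query_columns_for_error_upsampling(query_columns)
    if upsampling_enabled
    else query_columns
)
```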
```python
    should_upsample = is_errors_query_for_error_upsampled_projects(
        snuba_params, organization, dataset, request
    )
```
🔴 Upsampling check uses outer closure dataset instead of the scoped_dataset parameter
In _get_event_stats, the upsampling eligibility check on line 221 passes the outer closure variable dataset to is_errors_query_for_error_upsampled_projects, but the function's actual dataset is scoped_dataset (the first parameter). When _get_event_stats is called from the widget-splitting logic in get_event_stats_factory (e.g., src/sentry/api/endpoints/organization_events_stats.py:372, :386, :427, :444), a different dataset like discover can be passed as scoped_dataset while the outer dataset remains metrics_enhanced_performance. This causes _should_apply_sample_weight_transform to evaluate against the wrong dataset module, so upsampling will never be applied in widget-splitting paths even when querying error events on the discover dataset.
```diff
-should_upsample = is_errors_query_for_error_upsampled_projects(
-    snuba_params, organization, dataset, request
-)
+should_upsample = is_errors_query_for_error_upsampled_projects(
+    snuba_params, organization, scoped_dataset, request
+)
```
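The bug class is easy to reproduce in isolation; a distilled illustration with hypothetical names:

```python
def get_event_stats_factory(dataset):
    def _get_event_stats(scoped_dataset, query_columns):
        # Bug: reads `dataset` from the enclosing scope instead of the
        # `scoped_dataset` parameter, so per-call dataset overrides are ignored.
        return check_eligibility(dataset)  # should be scoped_dataset

    return _get_event_stats
```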
```python
    query = request.GET.get("query", "").lower()

    if "event.type:error" in query:
        return True

    return False
```
🔴 _is_error_focused_query matches negated queries like !event.type:error
The substring check "event.type:error" in query at line 137 also matches negated queries such as !event.type:error or NOT event.type:error. A user query like !event.type:error means "everything except errors" (i.e., transactions), but this function would incorrectly return True, causing count() to be transformed to upsampled_count() for non-error events — producing incorrect results by summing sample_weight on events that may not have meaningful sample weights.
```diff
 query = request.GET.get("query", "").lower()
+if "!event.type:error" in query or "not event.type:error" in query:
+    return False
 if "event.type:error" in query:
     return True
 return False
```
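Substring checks on the raw query string remain fragile (the negated form still contains the positive token). A slightly more robust illustrative alternative using regular expressions, not the PR's actual parser:

```python
import re

# Negated forms: "!event.type:error" or "not event.type:error"
_NEGATED = re.compile(r"(?:!|\bnot\s+)event\.type:error\b")
_POSITIVE = re.compile(r"(?<!!)\bevent\.type:error\b")

def _is_error_focused_query(query: str) -> bool:
    q = query.lower()
    if _NEGATED.search(q):  # any negation disqualifies the whole query
        return False
    return bool(_POSITIVE.search(q))
```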
```python
    expensive repeated option lookups during high-traffic periods. This is safe
    because allowlist changes are infrequent and eventual consistency is acceptable.
    """
    cache_key = f"error_upsampling_eligible:{organization.id}:{hash(tuple(sorted(snuba_params.project_ids)))}"
```
🔴 Non-deterministic hash() in cache key makes cross-process caching and invalidation broken
Python's hash() on tuples is randomized per-process (via PYTHONHASHSEED since Python 3.3). The cache key at line 27 uses hash(tuple(sorted(...))), so different worker processes produce different cache keys for the same org/project combination. Since sentry.utils.cache.cache is Django's default cache (typically Redis, shared across processes), this means: (1) cache entries from one process are invisible to other processes, and (2) invalidate_upsampling_cache (src/sentry/api/helpers/error_upsampling.py:73) will compute a different hash and fail to delete entries created by other processes, making explicit invalidation ineffective.
Prompt for agents
The cache key on line 27 and the matching key in invalidate_upsampling_cache on line 73 both use Python's hash() on a tuple, which is randomized per process. Replace hash(tuple(sorted(snuba_params.project_ids))) with a deterministic representation, for example a hashlib-based hash or simply use the sorted project IDs directly as a string in the cache key: e.g., ','.join(str(pid) for pid in sorted(snuba_params.project_ids)). Both locations (line 27 in is_errors_query_for_error_upsampled_projects and line 73 in invalidate_upsampling_cache) must use the same deterministic key generation.
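If a fixed-length key is preferred over embedding the raw ID list, a hashlib digest gives the same cross-process determinism; an illustrative helper, not code from the PR:

```python
import hashlib

def upsampling_cache_key(organization_id: int, project_ids: list[int]) -> str:
    # sha1 here is stable across processes and Python versions, unlike hash()
    ids = ",".join(str(pid) for pid in sorted(project_ids))
    digest = hashlib.sha1(ids.encode("utf-8")).hexdigest()
    return f"error_upsampling_eligible:{organization_id}:{digest}"
```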