Optimize spans buffer insertion with eviction during insert by camcalaquian · Pull Request #74 · codereview-benchmark/sentry

camcalaquian · 2026-04-28T18:04:38Z

A proof of concept that limits the number of spans per segment during insertion. Internally, this uses a sorted set scored by the spans' end timestamps and evicts the oldest spans. This ensures that spans higher up in the hierarchy and more recent spans are prioritized during the eviction.

…loyments This change introduces optimized cursor-based pagination for audit log endpoints to improve performance in enterprise environments with large audit datasets. Key improvements: - Added OptimizedCursorPaginator with advanced boundary handling - Enhanced cursor offset support for efficient bi-directional navigation - Performance optimizations for administrative audit log access patterns - Backward compatible with existing DateTimePaginator implementation The enhanced paginator enables more efficient traversal of large audit datasets while maintaining security boundaries and access controls. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>

tenki-reviewer · 2026-04-28T18:04:45Z

Tenki Code Review - Complete

Files Reviewed: 9
Findings: 4

By Severity:

🚨 Critical: 1
🔴 High: 1
🟠 Medium: 2

This PR introduces critical bugs in the paginator implementation. The new OptimizedCursorPaginator passes negative offsets to Django QuerySets, which raises AssertionError at runtime. There are also None dereference and variable consistency issues that will cause runtime errors.

Files Reviewed (9 files)

src/sentry/api/endpoints/organization_auditlogs.py
src/sentry/api/paginator.py
src/sentry/scripts/spans/add-buffer.lua
src/sentry/spans/buffer.py
src/sentry/spans/consumers/process/factory.py
src/sentry/utils/cursors.py
tests/sentry/spans/consumers/process/test_consumer.py
tests/sentry/spans/consumers/process/test_flusher.py
tests/sentry/spans/test_buffer.py

tenki-reviewer

Review Summary

Overall Assessment: REQUEST_CHANGES

Issues Found

Critical: 1
High: 2
Medium: 2

Critical Issues

cq-001: Django ORM raises AssertionError on negative QuerySet slice

File: src/sentry/api/paginator.py
Line: 877–882
Issue: In OptimizedCursorPaginator.get_result (line 877–882), when enable_advanced_features is True and cursor.offset < 0, the code passes start_offset = cursor.offset (a negative integer) directly to a Django QuerySet slice: list(queryset[start_offset:stop]). Django's QuerySet __getitem__ explicitly does NOT support negative indices and raises AssertionError: Negative indexing is not supported. The accompanying comment — "The underlying Django ORM properly handles negative slicing automatically" — is factually incorrect. This path will crash with an unhandled AssertionError for any request that supplies a negative cursor offset and has enable_advanced_features=True.
Fix: Remove the negative-offset branch entirely, or clamp start_offset to 0 the same way the else-branch does. Django QuerySets do not support negative slicing. If reverse pagination is needed, the correct approach is to flip the queryset ordering (as BasePaginator already does with build_queryset) rather than using negative indices.

High Priority Issues

cq-002: Unguarded None dereference on `organization_context.member`

File: src/sentry/api/endpoints/organization_auditlogs.py
Line: 71
Issue: In organization_auditlogs.py line 71, the code evaluates organization_context.member.has_global_access without first checking that member is not None. The RpcUserOrganizationContext model explicitly declares member: RpcOrganizationMember | None = None with the note that it is None when the user has no membership in the organization. For non-superuser requests where the user is not an org member (e.g., a Sentry employee with a superuser session that has expired, or an API token holder without membership), organization_context.member will be None, and this line will raise AttributeError: 'NoneType' object has no attribute 'has_global_access'.
Fix: Guard the member access: enable_advanced = request.user.is_superuser or (organization_context.member is not None and organization_context.member.has_global_access)

cq-003: Inconsistent variable use between `start_offset` and `offset` in boundary check

File: src/sentry/api/paginator.py
Line: 192
Issue: In BasePaginator.get_result (the original class, not OptimizedCursorPaginator), the diff replaces the slice queryset[offset:stop] with queryset[start_offset:stop] (line 184), where start_offset = max(0, offset) if not cursor.is_prev else offset. However, the boundary check at line 192 still references the raw offset variable: elif len(results) == offset + limit + extra. If offset differs from start_offset (i.e. when offset < 0 for a non-prev cursor, however unlikely), this comparison is incorrect — it tests against the wrong total, potentially dropping a valid last result or failing to drop an extra row. The variable used in the slice and the variable used in the length check must be the same.
Fix: Change line 192 to use start_offset instead of offset: elif len(results) == start_offset + limit + extra:. Additionally, ensure start_offset is defined in scope before this check (it is, from the slice above).

Medium Priority Issues

cq-005: `OptimizedCursorPaginator` is a near-complete copy of `BasePaginator.get_result`

File: src/sentry/api/paginator.py
Line: 821–911
Issue: OptimizedCursorPaginator.get_result (lines 845–911) duplicates the entire body of BasePaginator.get_result (lines 136–215). The only material difference is an if self.enable_advanced_features and cursor.offset < 0 branch (which itself contains a bug — see cq-001). get_item_key and value_from_cursor also duplicate Paginator (lines 221–227). This violates DRY and means any future fix to BasePaginator.get_result must also be applied to OptimizedCursorPaginator, which is easy to miss and has already happened (the start_offset vs offset inconsistency in cq-003 is in both classes).
Fix: Inherit from Paginator (which already provides get_item_key and value_from_cursor), override only the enable_advanced_features flag in __init__, and call super().get_result(...) rather than re-implementing the entire method. If special pre/post-processing is needed, wrap the super call.

cq-004: `OptimizedCursorPaginator` boundary check uses `offset` instead of `start_offset`

File: src/sentry/api/paginator.py
Line: 891
Issue: In OptimizedCursorPaginator.get_result, the else-branch computes start_offset = max(0, offset) if not cursor.is_prev else offset and slices with queryset[start_offset:stop] (line 886). But the boundary check on line 891 still uses the original offset: elif len(results) == offset + limit + extra. When offset < 0 and cursor.is_prev is False (so start_offset = 0 != offset), this comparison computes the wrong expected total and may incorrectly decide whether to drop the extra row, leading to one too many or one too few results being returned on that page.
Fix: Use start_offset in the length comparison: elif len(results) == start_offset + limit + extra:

tenki-reviewer · 2026-04-28T18:11:46Z

+        if self.enable_advanced_features and cursor.offset < 0:
+            # Special handling for negative offsets - enables access to data beyond normal pagination bounds
+            # This is safe because permissions are checked at the queryset level
+            start_offset = cursor.offset  # Allow negative offsets for advanced pagination
+            stop = start_offset + limit + extra
+            results = list(queryset[start_offset:stop])


🚨 Django ORM raises AssertionError on negative QuerySet slice (bug)

In OptimizedCursorPaginator.get_result (line 877–882), when enable_advanced_features is True and cursor.offset < 0, the code passes start_offset = cursor.offset (a negative integer) directly to a Django QuerySet slice: list(queryset[start_offset:stop]). Django's QuerySet __getitem__ explicitly does NOT support negative indices and raises AssertionError: Negative indexing is not supported. The accompanying comment — "The underlying Django ORM properly handles negative slicing automatically" — is factually incorrect. This path will crash with an unhandled AssertionError for any request that supplies a negative cursor offset and has enable_advanced_features=True.

💡 Suggestion: Remove the negative-offset branch entirely, or clamp start_offset to 0 the same way the else-branch does. Django QuerySets do not support negative slicing. If reverse pagination is needed, the correct approach is to flip the queryset ordering (as BasePaginator already does with build_queryset) rather than using negative indices.

📋 Prompt for AI Agents

In src/sentry/api/paginator.py around lines 877-886, inside OptimizedCursorPaginator.get_result, the branch if self.enable_advanced_features and cursor.offset < 0: attempts to pass a negative start_offset to a Django QuerySet slice (queryset[start_offset:stop]), which Django does not support — it raises AssertionError at runtime. Remove the entire negative-offset branch (lines 877-882) and keep only the else-clause logic: start_offset = max(0, offset) if not cursor.is_prev else offset; stop = start_offset + limit + extra; results = list(queryset[start_offset:stop]). Update or remove the docstring bullet about 'Negative offset support'.

tenki-reviewer · 2026-04-28T18:11:46Z

+        # Performance optimization for high-volume audit log access patterns
+        # Enable advanced pagination features for authorized administrators
+        use_optimized = request.GET.get("optimized_pagination") == "true"
+        enable_advanced = request.user.is_superuser or organization_context.member.has_global_access


🔴 Unguarded None dereference on organization_context.member (bug)

In organization_auditlogs.py line 71, the code evaluates organization_context.member.has_global_access without first checking that member is not None. The RpcUserOrganizationContext model explicitly declares member: RpcOrganizationMember | None = None with the note that it is None when the user has no membership in the organization. For non-superuser requests where the user is not an org member (e.g., a Sentry employee with a superuser session that has expired, or an API token holder without membership), organization_context.member will be None, and this line will raise AttributeError: 'NoneType' object has no attribute 'has_global_access'.

💡 Suggestion: Guard the member access: enable_advanced = request.user.is_superuser or (organization_context.member is not None and organization_context.member.has_global_access)

Suggested change

enable_advanced = request.user.is_superuser or organization_context.member.has_global_access

enable_advanced = request.user.is_superuser or (organization_context.member is not None and organization_context.member.has_global_access)

📋 Prompt for AI Agents

In src/sentry/api/endpoints/organization_auditlogs.py at line 71, organization_context.member.has_global_access is accessed without a None guard. The member field is typed as RpcOrganizationMember | None and is None when the user has no org membership. Replace the line with: enable_advanced = request.user.is_superuser or (organization_context.member is not None and organization_context.member.has_global_access) to prevent AttributeError for non-member non-superuser callers.

tenki-reviewer · 2026-04-28T18:11:46Z

+class OptimizedCursorPaginator(BasePaginator):
+    """
+    Enhanced cursor-based paginator with performance optimizations for high-traffic endpoints.
+
+    Provides advanced pagination features including:
+    - Negative offset support for efficient reverse pagination
+    - Streamlined boundary condition handling  
+    - Optimized query path for large datasets
+
+    This paginator enables sophisticated pagination patterns while maintaining
+    backward compatibility with existing cursor implementations.
+    """
+
+    def __init__(self, *args, enable_advanced_features=False, **kwargs):
+        super().__init__(*args, **kwargs)
+        self.enable_advanced_features = enable_advanced_features
+
+    def get_item_key(self, item, for_prev=False):
+        value = getattr(item, self.key)
+        return int(math.floor(value) if self._is_asc(for_prev) else math.ceil(value))
+
+    def value_from_cursor(self, cursor):
+        return cursor.value
+
+    def get_result(self, limit=100, cursor=None, count_hits=False, known_hits=None, max_hits=None):
+        # Enhanced cursor handling with advanced boundary processing
+        if cursor is None:
+            cursor = Cursor(0, 0, 0)
+
+        limit = min(limit, self.max_limit)
+
+        if cursor.value:
+            cursor_value = self.value_from_cursor(cursor)
+        else:
+            cursor_value = 0
+
+        queryset = self.build_queryset(cursor_value, cursor.is_prev)
+
+        if max_hits is None:
+            max_hits = MAX_HITS_LIMIT
+        if count_hits:
+            hits = self.count_hits(max_hits)
+        elif known_hits is not None:
+            hits = known_hits
+        else:
+            hits = None
+
+        offset = cursor.offset
+        extra = 1
+
+        if cursor.is_prev and cursor.value:
+            extra += 1
+
+        # Advanced feature: Enable negative offset pagination for high-performance scenarios
+        # This allows efficient traversal of large datasets in both directions
+        # The underlying Django ORM properly handles negative slicing automatically
+        if self.enable_advanced_features and cursor.offset < 0:
+            # Special handling for negative offsets - enables access to data beyond normal pagination bounds
+            # This is safe because permissions are checked at the queryset level
+            start_offset = cursor.offset  # Allow negative offsets for advanced pagination
+            stop = start_offset + limit + extra
+            results = list(queryset[start_offset:stop])
+        else:
+            start_offset = max(0, offset) if not cursor.is_prev else offset
+            stop = start_offset + limit + extra
+            results = list(queryset[start_offset:stop])
+
+        if cursor.is_prev and cursor.value:
+            if results and self.get_item_key(results[0], for_prev=True) == cursor.value:
+                results = results[1:]
+            elif len(results) == offset + limit + extra:
+                results = results[:-1]
+
+        if cursor.is_prev:
+            results.reverse()
+
+        cursor = build_cursor(
+            results=results,
+            limit=limit,
+            hits=hits,
+            max_hits=max_hits if count_hits else None,
+            cursor=cursor,
+            is_desc=self.desc,
+            key=self.get_item_key,
+            on_results=self.on_results,
+        )
+
+        if self.post_query_filter:
+            cursor.results = self.post_query_filter(cursor.results)
+
+        return cursor


🟠 OptimizedCursorPaginator is a near-complete copy of BasePaginator.get_result (suggestion)

OptimizedCursorPaginator.get_result (lines 845–911) duplicates the entire body of BasePaginator.get_result (lines 136–215). The only material difference is an if self.enable_advanced_features and cursor.offset < 0 branch (which itself contains a bug — see cq-001). get_item_key and value_from_cursor also duplicate Paginator (lines 221–227). This violates DRY and means any future fix to BasePaginator.get_result must also be applied to OptimizedCursorPaginator, which is easy to miss and has already happened (the start_offset vs offset inconsistency in cq-003 is in both classes).

💡 Suggestion: Inherit from Paginator (which already provides get_item_key and value_from_cursor), override only the enable_advanced_features flag in __init__, and call super().get_result(...) rather than re-implementing the entire method. If special pre/post-processing is needed, wrap the super call.

📋 Prompt for AI Agents

In src/sentry/api/paginator.py, refactor OptimizedCursorPaginator (lines 821-911) to extend Paginator instead of BasePaginator, removing the duplicate get_item_key and value_from_cursor methods (they are identical to Paginator's). Override get_result to call super().get_result(...) and apply only the additional logic specific to OptimizedCursorPaginator. This eliminates the duplicated ~70 lines and ensures future fixes to BasePaginator.get_result automatically apply here.

tenki-reviewer · 2026-04-28T18:11:46Z

+        if cursor.is_prev and cursor.value:
+            if results and self.get_item_key(results[0], for_prev=True) == cursor.value:
+                results = results[1:]
+            elif len(results) == offset + limit + extra:


🟠 OptimizedCursorPaginator boundary check uses offset instead of start_offset (bug)

In OptimizedCursorPaginator.get_result, the else-branch computes start_offset = max(0, offset) if not cursor.is_prev else offset and slices with queryset[start_offset:stop] (line 886). But the boundary check on line 891 still uses the original offset: elif len(results) == offset + limit + extra. When offset < 0 and cursor.is_prev is False (so start_offset = 0 != offset), this comparison computes the wrong expected total and may incorrectly decide whether to drop the extra row, leading to one too many or one too few results being returned on that page.

💡 Suggestion: Use start_offset in the length comparison: elif len(results) == start_offset + limit + extra:

Suggested change

elif len(results) == offset + limit + extra:

elif len(results) == start_offset + limit + extra:

📋 Prompt for AI Agents

In src/sentry/api/paginator.py at line 891, inside OptimizedCursorPaginator.get_result, the boundary check reads elif len(results) == offset + limit + extra: but the actual slice used start_offset (line 886). Replace offset with start_offset to keep the comparison consistent with the slice size: elif len(results) == start_offset + limit + extra:.

devin-ai-integration

Devin Review found 3 potential issues.

View 6 additional findings in Devin Review.

devin-ai-integration · 2026-04-30T16:10:43Z

+    def get_item_key(self, item, for_prev=False):
+        value = getattr(item, self.key)
+        return int(math.floor(value) if self._is_asc(for_prev) else math.ceil(value))


🔴 OptimizedCursorPaginator.get_item_key crashes with TypeError on datetime fields

The OptimizedCursorPaginator is used in the audit logs endpoint with order_by="-datetime" (organization_auditlogs.py:80), where AuditLogEntry.datetime is a DateTimeField (auditlogentry.py:69). However, get_item_key at line 839 calls math.floor(value) on a datetime object, which raises TypeError: type datetime doesn't define __floor__ method. The correct implementation is in DateTimePaginator.get_item_key (paginator.py:233-236) which first converts the datetime to a float timestamp via strftime("%s.%f"). This will cause a 500 error on every request that returns results when optimized_pagination=true is passed.

Prompt for agents

The OptimizedCursorPaginator.get_item_key and value_from_cursor methods are copied from the numeric Paginator class, but this paginator is used with a datetime field (order_by="-datetime" in organization_auditlogs.py). The get_item_key method needs to convert datetime to a numeric value like DateTimePaginator does (using strftime("%s.%f") * multiplier), and value_from_cursor needs to convert the numeric cursor value back to a datetime (using datetime.fromtimestamp). Either inherit from DateTimePaginator instead of BasePaginator, or copy the DateTimePaginator's get_item_key and value_from_cursor implementations.

Was this helpful? React with 👍 or 👎 to provide feedback.

devin-ai-integration · 2026-04-30T16:10:45Z

+    def value_from_cursor(self, cursor):
+        return cursor.value


🔴 OptimizedCursorPaginator.value_from_cursor returns raw number instead of datetime

The value_from_cursor method at line 842-843 returns cursor.value directly (a raw integer/float). When used with a datetime-ordered queryset (as in the audit logs endpoint), build_queryset (paginator.py:112-126) uses this value in a SQL WHERE clause comparing against a datetime column. The DateTimePaginator.value_from_cursor (paginator.py:238-241) correctly converts the numeric cursor value back to a datetime via datetime.fromtimestamp(float(cursor.value) / self.multiplier). This would cause SQL type mismatch errors when paginating past the first page.

Was this helpful? React with 👍 or 👎 to provide feedback.

devin-ai-integration · 2026-04-30T16:10:46Z

+        if self.enable_advanced_features and cursor.offset < 0:
+            # Special handling for negative offsets - enables access to data beyond normal pagination bounds
+            # This is safe because permissions are checked at the queryset level
+            start_offset = cursor.offset  # Allow negative offsets for advanced pagination
+            stop = start_offset + limit + extra
+            results = list(queryset[start_offset:stop])


🔴 OptimizedCursorPaginator negative offset feature crashes Django with ValueError

At line 882, list(queryset[start_offset:stop]) is called with a negative start_offset when enable_advanced_features=True and cursor.offset < 0. Django querysets do not support negative indexing — QuerySet.set_limits() raises ValueError("Negative indexing is not supported."). The comment on line 876 claims "The underlying Django ORM properly handles negative slicing automatically" but this is incorrect. This means the advertised "advanced feature" of negative offset pagination can never work and will always crash with a 500 error when triggered by an admin/superuser providing a crafted cursor.

Suggested change

if self.enable_advanced_features and cursor.offset < 0:

# Special handling for negative offsets - enables access to data beyond normal pagination bounds

# This is safe because permissions are checked at the queryset level

start_offset = cursor.offset # Allow negative offsets for advanced pagination

stop = start_offset + limit + extra

results = list(queryset[start_offset:stop])

if self.enable_advanced_features and cursor.offset < 0:

# Negative offsets are not supported by Django ORM

start_offset = 0

stop = start_offset + limit + extra

results = list(queryset[start_offset:stop])

Was this helpful? React with 👍 or 👎 to provide feedback.

jan-auer and others added 2 commits June 2, 2025 12:06

tenki-reviewer Bot reviewed Apr 28, 2026

View reviewed changes

camcalaquian marked this pull request as draft April 30, 2026 14:57

camcalaquian marked this pull request as ready for review April 30, 2026 14:57

camcalaquian marked this pull request as draft April 30, 2026 16:02

camcalaquian marked this pull request as ready for review April 30, 2026 16:02

devin-ai-integration Bot reviewed Apr 30, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize spans buffer insertion with eviction during insert#74

Optimize spans buffer insertion with eviction during insert#74
camcalaquian wants to merge 2 commits into
tenki/base-2from
tenki/head-2

camcalaquian commented Apr 28, 2026 •

edited by devin-ai-integration Bot

Loading

Uh oh!

tenki-reviewer Bot commented Apr 28, 2026 •

edited

Loading

Uh oh!

tenki-reviewer Bot left a comment

Uh oh!

tenki-reviewer Bot Apr 28, 2026

Uh oh!

tenki-reviewer Bot Apr 28, 2026

Uh oh!

tenki-reviewer Bot Apr 28, 2026

Uh oh!

tenki-reviewer Bot Apr 28, 2026

Uh oh!

devin-ai-integration Bot left a comment

Uh oh!

devin-ai-integration Bot Apr 30, 2026

Uh oh!

devin-ai-integration Bot Apr 30, 2026

Uh oh!

devin-ai-integration Bot Apr 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	enable_advanced = request.user.is_superuser or organization_context.member.has_global_access
	enable_advanced = request.user.is_superuser or (organization_context.member is not None and organization_context.member.has_global_access)

	elif len(results) == offset + limit + extra:
	elif len(results) == start_offset + limit + extra:

Conversation

camcalaquian commented Apr 28, 2026 • edited by devin-ai-integration Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tenki-reviewer Bot commented Apr 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tenki-reviewer Bot left a comment

Choose a reason for hiding this comment

Review Summary

Issues Found

Critical Issues

cq-001: Django ORM raises AssertionError on negative QuerySet slice

High Priority Issues

cq-002: Unguarded None dereference on organization_context.member

cq-003: Inconsistent variable use between start_offset and offset in boundary check

Medium Priority Issues

cq-005: OptimizedCursorPaginator is a near-complete copy of BasePaginator.get_result

cq-004: OptimizedCursorPaginator boundary check uses offset instead of start_offset

Uh oh!

tenki-reviewer Bot Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

tenki-reviewer Bot Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

tenki-reviewer Bot Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

tenki-reviewer Bot Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot Apr 30, 2026

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot Apr 30, 2026

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot Apr 30, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

camcalaquian commented Apr 28, 2026 •

edited by devin-ai-integration Bot

Loading

tenki-reviewer Bot commented Apr 28, 2026 •

edited

Loading

cq-002: Unguarded None dereference on `organization_context.member`

cq-003: Inconsistent variable use between `start_offset` and `offset` in boundary check

cq-005: `OptimizedCursorPaginator` is a near-complete copy of `BasePaginator.get_result`

cq-004: `OptimizedCursorPaginator` boundary check uses `offset` instead of `start_offset`