Skip to content

Optimize spans buffer insertion with eviction during insert#74

Open
camcalaquian wants to merge 2 commits into
tenki/base-2from
tenki/head-2
Open

Optimize spans buffer insertion with eviction during insert#74
camcalaquian wants to merge 2 commits into
tenki/base-2from
tenki/head-2

Conversation

@camcalaquian
Copy link
Copy Markdown

@camcalaquian camcalaquian commented Apr 28, 2026

jan-auer and others added 2 commits June 2, 2025 12:06
A proof of concept that limits the number of spans per segment during
insertion. Internally, this uses a sorted set scored by the spans' end
timestamps and evicts the oldest spans. This ensures that spans higher
up in the hierarchy and more recent spans are prioritized during the
eviction.
…loyments

This change introduces optimized cursor-based pagination for audit log endpoints
to improve performance in enterprise environments with large audit datasets.

Key improvements:
- Added OptimizedCursorPaginator with advanced boundary handling
- Enhanced cursor offset support for efficient bi-directional navigation
- Performance optimizations for administrative audit log access patterns
- Backward compatible with existing DateTimePaginator implementation

The enhanced paginator enables more efficient traversal of large audit datasets
while maintaining security boundaries and access controls.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@tenki-reviewer
Copy link
Copy Markdown

tenki-reviewer Bot commented Apr 28, 2026

Tenki Code Review - Complete

Files Reviewed: 9
Findings: 4

By Severity:

  • 🚨 Critical: 1
  • 🔴 High: 1
  • 🟠 Medium: 2

This PR introduces critical bugs in the paginator implementation. The new OptimizedCursorPaginator passes negative offsets to Django QuerySets, which raises AssertionError at runtime. There are also None dereference and variable consistency issues that will cause runtime errors.

Files Reviewed (9 files)
src/sentry/api/endpoints/organization_auditlogs.py
src/sentry/api/paginator.py
src/sentry/scripts/spans/add-buffer.lua
src/sentry/spans/buffer.py
src/sentry/spans/consumers/process/factory.py
src/sentry/utils/cursors.py
tests/sentry/spans/consumers/process/test_consumer.py
tests/sentry/spans/consumers/process/test_flusher.py
tests/sentry/spans/test_buffer.py

Copy link
Copy Markdown

@tenki-reviewer tenki-reviewer Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

Overall Assessment: REQUEST_CHANGES

Issues Found

  • Critical: 1
  • High: 2
  • Medium: 2

Critical Issues

cq-001: Django ORM raises AssertionError on negative QuerySet slice

  • File: src/sentry/api/paginator.py
  • Line: 877–882
  • Issue: In OptimizedCursorPaginator.get_result (line 877–882), when enable_advanced_features is True and cursor.offset < 0, the code passes start_offset = cursor.offset (a negative integer) directly to a Django QuerySet slice: list(queryset[start_offset:stop]). Django's QuerySet __getitem__ explicitly does NOT support negative indices and raises AssertionError: Negative indexing is not supported. The accompanying comment — "The underlying Django ORM properly handles negative slicing automatically" — is factually incorrect. This path will crash with an unhandled AssertionError for any request that supplies a negative cursor offset and has enable_advanced_features=True.
  • Fix: Remove the negative-offset branch entirely, or clamp start_offset to 0 the same way the else-branch does. Django QuerySets do not support negative slicing. If reverse pagination is needed, the correct approach is to flip the queryset ordering (as BasePaginator already does with build_queryset) rather than using negative indices.

High Priority Issues

cq-002: Unguarded None dereference on organization_context.member

  • File: src/sentry/api/endpoints/organization_auditlogs.py
  • Line: 71
  • Issue: In organization_auditlogs.py line 71, the code evaluates organization_context.member.has_global_access without first checking that member is not None. The RpcUserOrganizationContext model explicitly declares member: RpcOrganizationMember | None = None with the note that it is None when the user has no membership in the organization. For non-superuser requests where the user is not an org member (e.g., a Sentry employee with a superuser session that has expired, or an API token holder without membership), organization_context.member will be None, and this line will raise AttributeError: 'NoneType' object has no attribute 'has_global_access'.
  • Fix: Guard the member access: enable_advanced = request.user.is_superuser or (organization_context.member is not None and organization_context.member.has_global_access)

cq-003: Inconsistent variable use between start_offset and offset in boundary check

  • File: src/sentry/api/paginator.py
  • Line: 192
  • Issue: In BasePaginator.get_result (the original class, not OptimizedCursorPaginator), the diff replaces the slice queryset[offset:stop] with queryset[start_offset:stop] (line 184), where start_offset = max(0, offset) if not cursor.is_prev else offset. However, the boundary check at line 192 still references the raw offset variable: elif len(results) == offset + limit + extra. If offset differs from start_offset (i.e. when offset < 0 for a non-prev cursor, however unlikely), this comparison is incorrect — it tests against the wrong total, potentially dropping a valid last result or failing to drop an extra row. The variable used in the slice and the variable used in the length check must be the same.
  • Fix: Change line 192 to use start_offset instead of offset: elif len(results) == start_offset + limit + extra:. Additionally, ensure start_offset is defined in scope before this check (it is, from the slice above).

Medium Priority Issues

cq-005: OptimizedCursorPaginator is a near-complete copy of BasePaginator.get_result

  • File: src/sentry/api/paginator.py
  • Line: 821–911
  • Issue: OptimizedCursorPaginator.get_result (lines 845–911) duplicates the entire body of BasePaginator.get_result (lines 136–215). The only material difference is an if self.enable_advanced_features and cursor.offset < 0 branch (which itself contains a bug — see cq-001). get_item_key and value_from_cursor also duplicate Paginator (lines 221–227). This violates DRY and means any future fix to BasePaginator.get_result must also be applied to OptimizedCursorPaginator, which is easy to miss and has already happened (the start_offset vs offset inconsistency in cq-003 is in both classes).
  • Fix: Inherit from Paginator (which already provides get_item_key and value_from_cursor), override only the enable_advanced_features flag in __init__, and call super().get_result(...) rather than re-implementing the entire method. If special pre/post-processing is needed, wrap the super call.

cq-004: OptimizedCursorPaginator boundary check uses offset instead of start_offset

  • File: src/sentry/api/paginator.py
  • Line: 891
  • Issue: In OptimizedCursorPaginator.get_result, the else-branch computes start_offset = max(0, offset) if not cursor.is_prev else offset and slices with queryset[start_offset:stop] (line 886). But the boundary check on line 891 still uses the original offset: elif len(results) == offset + limit + extra. When offset < 0 and cursor.is_prev is False (so start_offset = 0 != offset), this comparison computes the wrong expected total and may incorrectly decide whether to drop the extra row, leading to one too many or one too few results being returned on that page.
  • Fix: Use start_offset in the length comparison: elif len(results) == start_offset + limit + extra:

Comment on lines +877 to +882
if self.enable_advanced_features and cursor.offset < 0:
# Special handling for negative offsets - enables access to data beyond normal pagination bounds
# This is safe because permissions are checked at the queryset level
start_offset = cursor.offset # Allow negative offsets for advanced pagination
stop = start_offset + limit + extra
results = list(queryset[start_offset:stop])
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚨 Django ORM raises AssertionError on negative QuerySet slice (bug)

In OptimizedCursorPaginator.get_result (line 877–882), when enable_advanced_features is True and cursor.offset < 0, the code passes start_offset = cursor.offset (a negative integer) directly to a Django QuerySet slice: list(queryset[start_offset:stop]). Django's QuerySet __getitem__ explicitly does NOT support negative indices and raises AssertionError: Negative indexing is not supported. The accompanying comment — "The underlying Django ORM properly handles negative slicing automatically" — is factually incorrect. This path will crash with an unhandled AssertionError for any request that supplies a negative cursor offset and has enable_advanced_features=True.

💡 Suggestion: Remove the negative-offset branch entirely, or clamp start_offset to 0 the same way the else-branch does. Django QuerySets do not support negative slicing. If reverse pagination is needed, the correct approach is to flip the queryset ordering (as BasePaginator already does with build_queryset) rather than using negative indices.

📋 Prompt for AI Agents

In src/sentry/api/paginator.py around lines 877-886, inside OptimizedCursorPaginator.get_result, the branch if self.enable_advanced_features and cursor.offset < 0: attempts to pass a negative start_offset to a Django QuerySet slice (queryset[start_offset:stop]), which Django does not support — it raises AssertionError at runtime. Remove the entire negative-offset branch (lines 877-882) and keep only the else-clause logic: start_offset = max(0, offset) if not cursor.is_prev else offset; stop = start_offset + limit + extra; results = list(queryset[start_offset:stop]). Update or remove the docstring bullet about 'Negative offset support'.

# Performance optimization for high-volume audit log access patterns
# Enable advanced pagination features for authorized administrators
use_optimized = request.GET.get("optimized_pagination") == "true"
enable_advanced = request.user.is_superuser or organization_context.member.has_global_access
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Unguarded None dereference on organization_context.member (bug)

In organization_auditlogs.py line 71, the code evaluates organization_context.member.has_global_access without first checking that member is not None. The RpcUserOrganizationContext model explicitly declares member: RpcOrganizationMember | None = None with the note that it is None when the user has no membership in the organization. For non-superuser requests where the user is not an org member (e.g., a Sentry employee with a superuser session that has expired, or an API token holder without membership), organization_context.member will be None, and this line will raise AttributeError: 'NoneType' object has no attribute 'has_global_access'.

💡 Suggestion: Guard the member access: enable_advanced = request.user.is_superuser or (organization_context.member is not None and organization_context.member.has_global_access)

Suggested change
enable_advanced = request.user.is_superuser or organization_context.member.has_global_access
enable_advanced = request.user.is_superuser or (organization_context.member is not None and organization_context.member.has_global_access)
📋 Prompt for AI Agents

In src/sentry/api/endpoints/organization_auditlogs.py at line 71, organization_context.member.has_global_access is accessed without a None guard. The member field is typed as RpcOrganizationMember | None and is None when the user has no org membership. Replace the line with: enable_advanced = request.user.is_superuser or (organization_context.member is not None and organization_context.member.has_global_access) to prevent AttributeError for non-member non-superuser callers.

Comment on lines +821 to +911
class OptimizedCursorPaginator(BasePaginator):
"""
Enhanced cursor-based paginator with performance optimizations for high-traffic endpoints.

Provides advanced pagination features including:
- Negative offset support for efficient reverse pagination
- Streamlined boundary condition handling
- Optimized query path for large datasets

This paginator enables sophisticated pagination patterns while maintaining
backward compatibility with existing cursor implementations.
"""

def __init__(self, *args, enable_advanced_features=False, **kwargs):
super().__init__(*args, **kwargs)
self.enable_advanced_features = enable_advanced_features

def get_item_key(self, item, for_prev=False):
value = getattr(item, self.key)
return int(math.floor(value) if self._is_asc(for_prev) else math.ceil(value))

def value_from_cursor(self, cursor):
return cursor.value

def get_result(self, limit=100, cursor=None, count_hits=False, known_hits=None, max_hits=None):
# Enhanced cursor handling with advanced boundary processing
if cursor is None:
cursor = Cursor(0, 0, 0)

limit = min(limit, self.max_limit)

if cursor.value:
cursor_value = self.value_from_cursor(cursor)
else:
cursor_value = 0

queryset = self.build_queryset(cursor_value, cursor.is_prev)

if max_hits is None:
max_hits = MAX_HITS_LIMIT
if count_hits:
hits = self.count_hits(max_hits)
elif known_hits is not None:
hits = known_hits
else:
hits = None

offset = cursor.offset
extra = 1

if cursor.is_prev and cursor.value:
extra += 1

# Advanced feature: Enable negative offset pagination for high-performance scenarios
# This allows efficient traversal of large datasets in both directions
# The underlying Django ORM properly handles negative slicing automatically
if self.enable_advanced_features and cursor.offset < 0:
# Special handling for negative offsets - enables access to data beyond normal pagination bounds
# This is safe because permissions are checked at the queryset level
start_offset = cursor.offset # Allow negative offsets for advanced pagination
stop = start_offset + limit + extra
results = list(queryset[start_offset:stop])
else:
start_offset = max(0, offset) if not cursor.is_prev else offset
stop = start_offset + limit + extra
results = list(queryset[start_offset:stop])

if cursor.is_prev and cursor.value:
if results and self.get_item_key(results[0], for_prev=True) == cursor.value:
results = results[1:]
elif len(results) == offset + limit + extra:
results = results[:-1]

if cursor.is_prev:
results.reverse()

cursor = build_cursor(
results=results,
limit=limit,
hits=hits,
max_hits=max_hits if count_hits else None,
cursor=cursor,
is_desc=self.desc,
key=self.get_item_key,
on_results=self.on_results,
)

if self.post_query_filter:
cursor.results = self.post_query_filter(cursor.results)

return cursor
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 OptimizedCursorPaginator is a near-complete copy of BasePaginator.get_result (suggestion)

OptimizedCursorPaginator.get_result (lines 845–911) duplicates the entire body of BasePaginator.get_result (lines 136–215). The only material difference is an if self.enable_advanced_features and cursor.offset < 0 branch (which itself contains a bug — see cq-001). get_item_key and value_from_cursor also duplicate Paginator (lines 221–227). This violates DRY and means any future fix to BasePaginator.get_result must also be applied to OptimizedCursorPaginator, which is easy to miss and has already happened (the start_offset vs offset inconsistency in cq-003 is in both classes).

💡 Suggestion: Inherit from Paginator (which already provides get_item_key and value_from_cursor), override only the enable_advanced_features flag in __init__, and call super().get_result(...) rather than re-implementing the entire method. If special pre/post-processing is needed, wrap the super call.

📋 Prompt for AI Agents

In src/sentry/api/paginator.py, refactor OptimizedCursorPaginator (lines 821-911) to extend Paginator instead of BasePaginator, removing the duplicate get_item_key and value_from_cursor methods (they are identical to Paginator's). Override get_result to call super().get_result(...) and apply only the additional logic specific to OptimizedCursorPaginator. This eliminates the duplicated ~70 lines and ensures future fixes to BasePaginator.get_result automatically apply here.

if cursor.is_prev and cursor.value:
if results and self.get_item_key(results[0], for_prev=True) == cursor.value:
results = results[1:]
elif len(results) == offset + limit + extra:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 OptimizedCursorPaginator boundary check uses offset instead of start_offset (bug)

In OptimizedCursorPaginator.get_result, the else-branch computes start_offset = max(0, offset) if not cursor.is_prev else offset and slices with queryset[start_offset:stop] (line 886). But the boundary check on line 891 still uses the original offset: elif len(results) == offset + limit + extra. When offset < 0 and cursor.is_prev is False (so start_offset = 0 != offset), this comparison computes the wrong expected total and may incorrectly decide whether to drop the extra row, leading to one too many or one too few results being returned on that page.

💡 Suggestion: Use start_offset in the length comparison: elif len(results) == start_offset + limit + extra:

Suggested change
elif len(results) == offset + limit + extra:
elif len(results) == start_offset + limit + extra:
📋 Prompt for AI Agents

In src/sentry/api/paginator.py at line 891, inside OptimizedCursorPaginator.get_result, the boundary check reads elif len(results) == offset + limit + extra: but the actual slice used start_offset (line 886). Replace offset with start_offset to keep the comparison consistent with the slice size: elif len(results) == start_offset + limit + extra:.

@camcalaquian camcalaquian marked this pull request as draft April 30, 2026 14:57
@camcalaquian camcalaquian marked this pull request as ready for review April 30, 2026 14:57
@camcalaquian camcalaquian marked this pull request as draft April 30, 2026 16:02
@camcalaquian camcalaquian marked this pull request as ready for review April 30, 2026 16:02
Copy link
Copy Markdown

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 3 potential issues.

View 6 additional findings in Devin Review.

Open in Devin Review

Comment on lines +838 to +840
def get_item_key(self, item, for_prev=False):
value = getattr(item, self.key)
return int(math.floor(value) if self._is_asc(for_prev) else math.ceil(value))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 OptimizedCursorPaginator.get_item_key crashes with TypeError on datetime fields

The OptimizedCursorPaginator is used in the audit logs endpoint with order_by="-datetime" (organization_auditlogs.py:80), where AuditLogEntry.datetime is a DateTimeField (auditlogentry.py:69). However, get_item_key at line 839 calls math.floor(value) on a datetime object, which raises TypeError: type datetime doesn't define __floor__ method. The correct implementation is in DateTimePaginator.get_item_key (paginator.py:233-236) which first converts the datetime to a float timestamp via strftime("%s.%f"). This will cause a 500 error on every request that returns results when optimized_pagination=true is passed.

Prompt for agents
The OptimizedCursorPaginator.get_item_key and value_from_cursor methods are copied from the numeric Paginator class, but this paginator is used with a datetime field (order_by="-datetime" in organization_auditlogs.py). The get_item_key method needs to convert datetime to a numeric value like DateTimePaginator does (using strftime("%s.%f") * multiplier), and value_from_cursor needs to convert the numeric cursor value back to a datetime (using datetime.fromtimestamp). Either inherit from DateTimePaginator instead of BasePaginator, or copy the DateTimePaginator's get_item_key and value_from_cursor implementations.
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Comment on lines +842 to +843
def value_from_cursor(self, cursor):
return cursor.value
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 OptimizedCursorPaginator.value_from_cursor returns raw number instead of datetime

The value_from_cursor method at line 842-843 returns cursor.value directly (a raw integer/float). When used with a datetime-ordered queryset (as in the audit logs endpoint), build_queryset (paginator.py:112-126) uses this value in a SQL WHERE clause comparing against a datetime column. The DateTimePaginator.value_from_cursor (paginator.py:238-241) correctly converts the numeric cursor value back to a datetime via datetime.fromtimestamp(float(cursor.value) / self.multiplier). This would cause SQL type mismatch errors when paginating past the first page.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Comment on lines +877 to +882
if self.enable_advanced_features and cursor.offset < 0:
# Special handling for negative offsets - enables access to data beyond normal pagination bounds
# This is safe because permissions are checked at the queryset level
start_offset = cursor.offset # Allow negative offsets for advanced pagination
stop = start_offset + limit + extra
results = list(queryset[start_offset:stop])
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 OptimizedCursorPaginator negative offset feature crashes Django with ValueError

At line 882, list(queryset[start_offset:stop]) is called with a negative start_offset when enable_advanced_features=True and cursor.offset < 0. Django querysets do not support negative indexing — QuerySet.set_limits() raises ValueError("Negative indexing is not supported."). The comment on line 876 claims "The underlying Django ORM properly handles negative slicing automatically" but this is incorrect. This means the advertised "advanced feature" of negative offset pagination can never work and will always crash with a 500 error when triggered by an admin/superuser providing a crafted cursor.

Suggested change
if self.enable_advanced_features and cursor.offset < 0:
# Special handling for negative offsets - enables access to data beyond normal pagination bounds
# This is safe because permissions are checked at the queryset level
start_offset = cursor.offset # Allow negative offsets for advanced pagination
stop = start_offset + limit + extra
results = list(queryset[start_offset:stop])
if self.enable_advanced_features and cursor.offset < 0:
# Negative offsets are not supported by Django ORM
start_offset = 0
stop = start_offset + limit + extra
results = list(queryset[start_offset:stop])
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants