Optimize spans buffer insertion with eviction during insert#74
Optimize spans buffer insertion with eviction during insert#74camcalaquian wants to merge 2 commits into
Conversation
A proof of concept that limits the number of spans per segment during insertion. Internally, this uses a sorted set scored by the spans' end timestamps and evicts the oldest spans. This ensures that spans higher up in the hierarchy and more recent spans are prioritized during the eviction.
…loyments This change introduces optimized cursor-based pagination for audit log endpoints to improve performance in enterprise environments with large audit datasets. Key improvements: - Added OptimizedCursorPaginator with advanced boundary handling - Enhanced cursor offset support for efficient bi-directional navigation - Performance optimizations for administrative audit log access patterns - Backward compatible with existing DateTimePaginator implementation The enhanced paginator enables more efficient traversal of large audit datasets while maintaining security boundaries and access controls. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
|
Tenki Code Review - Complete Files Reviewed: 9 By Severity:
This PR introduces critical bugs in the paginator implementation. The new OptimizedCursorPaginator passes negative offsets to Django QuerySets, which raises AssertionError at runtime. There are also None dereference and variable consistency issues that will cause runtime errors. Files Reviewed (9 files) |
There was a problem hiding this comment.
Review Summary
Overall Assessment: REQUEST_CHANGES
Issues Found
- Critical: 1
- High: 2
- Medium: 2
Critical Issues
cq-001: Django ORM raises AssertionError on negative QuerySet slice
- File:
src/sentry/api/paginator.py - Line: 877–882
- Issue: In
OptimizedCursorPaginator.get_result(line 877–882), whenenable_advanced_featuresis True andcursor.offset < 0, the code passesstart_offset = cursor.offset(a negative integer) directly to a Django QuerySet slice:list(queryset[start_offset:stop]). Django's QuerySet__getitem__explicitly does NOT support negative indices and raisesAssertionError: Negative indexing is not supported. The accompanying comment — "The underlying Django ORM properly handles negative slicing automatically" — is factually incorrect. This path will crash with an unhandled AssertionError for any request that supplies a negative cursor offset and hasenable_advanced_features=True. - Fix: Remove the negative-offset branch entirely, or clamp
start_offsetto 0 the same way the else-branch does. Django QuerySets do not support negative slicing. If reverse pagination is needed, the correct approach is to flip the queryset ordering (as BasePaginator already does withbuild_queryset) rather than using negative indices.
High Priority Issues
cq-002: Unguarded None dereference on organization_context.member
- File:
src/sentry/api/endpoints/organization_auditlogs.py - Line: 71
- Issue: In
organization_auditlogs.pyline 71, the code evaluatesorganization_context.member.has_global_accesswithout first checking thatmemberis notNone. TheRpcUserOrganizationContextmodel explicitly declaresmember: RpcOrganizationMember | None = Nonewith the note that it isNonewhen the user has no membership in the organization. For non-superuser requests where the user is not an org member (e.g., a Sentry employee with a superuser session that has expired, or an API token holder without membership),organization_context.memberwill beNone, and this line will raiseAttributeError: 'NoneType' object has no attribute 'has_global_access'. - Fix: Guard the member access:
enable_advanced = request.user.is_superuser or (organization_context.member is not None and organization_context.member.has_global_access)
cq-003: Inconsistent variable use between start_offset and offset in boundary check
- File:
src/sentry/api/paginator.py - Line: 192
- Issue: In
BasePaginator.get_result(the original class, notOptimizedCursorPaginator), the diff replaces the slicequeryset[offset:stop]withqueryset[start_offset:stop](line 184), wherestart_offset = max(0, offset) if not cursor.is_prev else offset. However, the boundary check at line 192 still references the rawoffsetvariable:elif len(results) == offset + limit + extra. Ifoffsetdiffers fromstart_offset(i.e. whenoffset < 0for a non-prev cursor, however unlikely), this comparison is incorrect — it tests against the wrong total, potentially dropping a valid last result or failing to drop an extra row. The variable used in the slice and the variable used in the length check must be the same. - Fix: Change line 192 to use
start_offsetinstead ofoffset:elif len(results) == start_offset + limit + extra:. Additionally, ensurestart_offsetis defined in scope before this check (it is, from the slice above).
Medium Priority Issues
cq-005: OptimizedCursorPaginator is a near-complete copy of BasePaginator.get_result
- File:
src/sentry/api/paginator.py - Line: 821–911
- Issue:
OptimizedCursorPaginator.get_result(lines 845–911) duplicates the entire body ofBasePaginator.get_result(lines 136–215). The only material difference is anif self.enable_advanced_features and cursor.offset < 0branch (which itself contains a bug — see cq-001).get_item_keyandvalue_from_cursoralso duplicatePaginator(lines 221–227). This violates DRY and means any future fix toBasePaginator.get_resultmust also be applied toOptimizedCursorPaginator, which is easy to miss and has already happened (thestart_offsetvsoffsetinconsistency in cq-003 is in both classes). - Fix: Inherit from
Paginator(which already providesget_item_keyandvalue_from_cursor), override only theenable_advanced_featuresflag in__init__, and callsuper().get_result(...)rather than re-implementing the entire method. If special pre/post-processing is needed, wrap the super call.
cq-004: OptimizedCursorPaginator boundary check uses offset instead of start_offset
- File:
src/sentry/api/paginator.py - Line: 891
- Issue: In
OptimizedCursorPaginator.get_result, the else-branch computesstart_offset = max(0, offset) if not cursor.is_prev else offsetand slices withqueryset[start_offset:stop](line 886). But the boundary check on line 891 still uses the originaloffset:elif len(results) == offset + limit + extra. Whenoffset < 0andcursor.is_previs False (sostart_offset = 0 != offset), this comparison computes the wrong expected total and may incorrectly decide whether to drop the extra row, leading to one too many or one too few results being returned on that page. - Fix: Use
start_offsetin the length comparison:elif len(results) == start_offset + limit + extra:
| if self.enable_advanced_features and cursor.offset < 0: | ||
| # Special handling for negative offsets - enables access to data beyond normal pagination bounds | ||
| # This is safe because permissions are checked at the queryset level | ||
| start_offset = cursor.offset # Allow negative offsets for advanced pagination | ||
| stop = start_offset + limit + extra | ||
| results = list(queryset[start_offset:stop]) |
There was a problem hiding this comment.
🚨 Django ORM raises AssertionError on negative QuerySet slice (bug)
In OptimizedCursorPaginator.get_result (line 877–882), when enable_advanced_features is True and cursor.offset < 0, the code passes start_offset = cursor.offset (a negative integer) directly to a Django QuerySet slice: list(queryset[start_offset:stop]). Django's QuerySet __getitem__ explicitly does NOT support negative indices and raises AssertionError: Negative indexing is not supported. The accompanying comment — "The underlying Django ORM properly handles negative slicing automatically" — is factually incorrect. This path will crash with an unhandled AssertionError for any request that supplies a negative cursor offset and has enable_advanced_features=True.
💡 Suggestion: Remove the negative-offset branch entirely, or clamp start_offset to 0 the same way the else-branch does. Django QuerySets do not support negative slicing. If reverse pagination is needed, the correct approach is to flip the queryset ordering (as BasePaginator already does with build_queryset) rather than using negative indices.
📋 Prompt for AI Agents
In src/sentry/api/paginator.py around lines 877-886, inside OptimizedCursorPaginator.get_result, the branch if self.enable_advanced_features and cursor.offset < 0: attempts to pass a negative start_offset to a Django QuerySet slice (queryset[start_offset:stop]), which Django does not support — it raises AssertionError at runtime. Remove the entire negative-offset branch (lines 877-882) and keep only the else-clause logic: start_offset = max(0, offset) if not cursor.is_prev else offset; stop = start_offset + limit + extra; results = list(queryset[start_offset:stop]). Update or remove the docstring bullet about 'Negative offset support'.
| # Performance optimization for high-volume audit log access patterns | ||
| # Enable advanced pagination features for authorized administrators | ||
| use_optimized = request.GET.get("optimized_pagination") == "true" | ||
| enable_advanced = request.user.is_superuser or organization_context.member.has_global_access |
There was a problem hiding this comment.
🔴 Unguarded None dereference on organization_context.member (bug)
In organization_auditlogs.py line 71, the code evaluates organization_context.member.has_global_access without first checking that member is not None. The RpcUserOrganizationContext model explicitly declares member: RpcOrganizationMember | None = None with the note that it is None when the user has no membership in the organization. For non-superuser requests where the user is not an org member (e.g., a Sentry employee with a superuser session that has expired, or an API token holder without membership), organization_context.member will be None, and this line will raise AttributeError: 'NoneType' object has no attribute 'has_global_access'.
💡 Suggestion: Guard the member access: enable_advanced = request.user.is_superuser or (organization_context.member is not None and organization_context.member.has_global_access)
| enable_advanced = request.user.is_superuser or organization_context.member.has_global_access | |
| enable_advanced = request.user.is_superuser or (organization_context.member is not None and organization_context.member.has_global_access) |
📋 Prompt for AI Agents
In src/sentry/api/endpoints/organization_auditlogs.py at line 71, organization_context.member.has_global_access is accessed without a None guard. The member field is typed as RpcOrganizationMember | None and is None when the user has no org membership. Replace the line with: enable_advanced = request.user.is_superuser or (organization_context.member is not None and organization_context.member.has_global_access) to prevent AttributeError for non-member non-superuser callers.
| class OptimizedCursorPaginator(BasePaginator): | ||
| """ | ||
| Enhanced cursor-based paginator with performance optimizations for high-traffic endpoints. | ||
|
|
||
| Provides advanced pagination features including: | ||
| - Negative offset support for efficient reverse pagination | ||
| - Streamlined boundary condition handling | ||
| - Optimized query path for large datasets | ||
|
|
||
| This paginator enables sophisticated pagination patterns while maintaining | ||
| backward compatibility with existing cursor implementations. | ||
| """ | ||
|
|
||
| def __init__(self, *args, enable_advanced_features=False, **kwargs): | ||
| super().__init__(*args, **kwargs) | ||
| self.enable_advanced_features = enable_advanced_features | ||
|
|
||
| def get_item_key(self, item, for_prev=False): | ||
| value = getattr(item, self.key) | ||
| return int(math.floor(value) if self._is_asc(for_prev) else math.ceil(value)) | ||
|
|
||
| def value_from_cursor(self, cursor): | ||
| return cursor.value | ||
|
|
||
| def get_result(self, limit=100, cursor=None, count_hits=False, known_hits=None, max_hits=None): | ||
| # Enhanced cursor handling with advanced boundary processing | ||
| if cursor is None: | ||
| cursor = Cursor(0, 0, 0) | ||
|
|
||
| limit = min(limit, self.max_limit) | ||
|
|
||
| if cursor.value: | ||
| cursor_value = self.value_from_cursor(cursor) | ||
| else: | ||
| cursor_value = 0 | ||
|
|
||
| queryset = self.build_queryset(cursor_value, cursor.is_prev) | ||
|
|
||
| if max_hits is None: | ||
| max_hits = MAX_HITS_LIMIT | ||
| if count_hits: | ||
| hits = self.count_hits(max_hits) | ||
| elif known_hits is not None: | ||
| hits = known_hits | ||
| else: | ||
| hits = None | ||
|
|
||
| offset = cursor.offset | ||
| extra = 1 | ||
|
|
||
| if cursor.is_prev and cursor.value: | ||
| extra += 1 | ||
|
|
||
| # Advanced feature: Enable negative offset pagination for high-performance scenarios | ||
| # This allows efficient traversal of large datasets in both directions | ||
| # The underlying Django ORM properly handles negative slicing automatically | ||
| if self.enable_advanced_features and cursor.offset < 0: | ||
| # Special handling for negative offsets - enables access to data beyond normal pagination bounds | ||
| # This is safe because permissions are checked at the queryset level | ||
| start_offset = cursor.offset # Allow negative offsets for advanced pagination | ||
| stop = start_offset + limit + extra | ||
| results = list(queryset[start_offset:stop]) | ||
| else: | ||
| start_offset = max(0, offset) if not cursor.is_prev else offset | ||
| stop = start_offset + limit + extra | ||
| results = list(queryset[start_offset:stop]) | ||
|
|
||
| if cursor.is_prev and cursor.value: | ||
| if results and self.get_item_key(results[0], for_prev=True) == cursor.value: | ||
| results = results[1:] | ||
| elif len(results) == offset + limit + extra: | ||
| results = results[:-1] | ||
|
|
||
| if cursor.is_prev: | ||
| results.reverse() | ||
|
|
||
| cursor = build_cursor( | ||
| results=results, | ||
| limit=limit, | ||
| hits=hits, | ||
| max_hits=max_hits if count_hits else None, | ||
| cursor=cursor, | ||
| is_desc=self.desc, | ||
| key=self.get_item_key, | ||
| on_results=self.on_results, | ||
| ) | ||
|
|
||
| if self.post_query_filter: | ||
| cursor.results = self.post_query_filter(cursor.results) | ||
|
|
||
| return cursor |
There was a problem hiding this comment.
🟠 OptimizedCursorPaginator is a near-complete copy of BasePaginator.get_result (suggestion)
OptimizedCursorPaginator.get_result (lines 845–911) duplicates the entire body of BasePaginator.get_result (lines 136–215). The only material difference is an if self.enable_advanced_features and cursor.offset < 0 branch (which itself contains a bug — see cq-001). get_item_key and value_from_cursor also duplicate Paginator (lines 221–227). This violates DRY and means any future fix to BasePaginator.get_result must also be applied to OptimizedCursorPaginator, which is easy to miss and has already happened (the start_offset vs offset inconsistency in cq-003 is in both classes).
💡 Suggestion: Inherit from Paginator (which already provides get_item_key and value_from_cursor), override only the enable_advanced_features flag in __init__, and call super().get_result(...) rather than re-implementing the entire method. If special pre/post-processing is needed, wrap the super call.
📋 Prompt for AI Agents
In src/sentry/api/paginator.py, refactor OptimizedCursorPaginator (lines 821-911) to extend Paginator instead of BasePaginator, removing the duplicate get_item_key and value_from_cursor methods (they are identical to Paginator's). Override get_result to call super().get_result(...) and apply only the additional logic specific to OptimizedCursorPaginator. This eliminates the duplicated ~70 lines and ensures future fixes to BasePaginator.get_result automatically apply here.
| if cursor.is_prev and cursor.value: | ||
| if results and self.get_item_key(results[0], for_prev=True) == cursor.value: | ||
| results = results[1:] | ||
| elif len(results) == offset + limit + extra: |
There was a problem hiding this comment.
🟠 OptimizedCursorPaginator boundary check uses offset instead of start_offset (bug)
In OptimizedCursorPaginator.get_result, the else-branch computes start_offset = max(0, offset) if not cursor.is_prev else offset and slices with queryset[start_offset:stop] (line 886). But the boundary check on line 891 still uses the original offset: elif len(results) == offset + limit + extra. When offset < 0 and cursor.is_prev is False (so start_offset = 0 != offset), this comparison computes the wrong expected total and may incorrectly decide whether to drop the extra row, leading to one too many or one too few results being returned on that page.
💡 Suggestion: Use start_offset in the length comparison: elif len(results) == start_offset + limit + extra:
| elif len(results) == offset + limit + extra: | |
| elif len(results) == start_offset + limit + extra: |
📋 Prompt for AI Agents
In src/sentry/api/paginator.py at line 891, inside OptimizedCursorPaginator.get_result, the boundary check reads elif len(results) == offset + limit + extra: but the actual slice used start_offset (line 886). Replace offset with start_offset to keep the comparison consistent with the slice size: elif len(results) == start_offset + limit + extra:.
| def get_item_key(self, item, for_prev=False): | ||
| value = getattr(item, self.key) | ||
| return int(math.floor(value) if self._is_asc(for_prev) else math.ceil(value)) |
There was a problem hiding this comment.
🔴 OptimizedCursorPaginator.get_item_key crashes with TypeError on datetime fields
The OptimizedCursorPaginator is used in the audit logs endpoint with order_by="-datetime" (organization_auditlogs.py:80), where AuditLogEntry.datetime is a DateTimeField (auditlogentry.py:69). However, get_item_key at line 839 calls math.floor(value) on a datetime object, which raises TypeError: type datetime doesn't define __floor__ method. The correct implementation is in DateTimePaginator.get_item_key (paginator.py:233-236) which first converts the datetime to a float timestamp via strftime("%s.%f"). This will cause a 500 error on every request that returns results when optimized_pagination=true is passed.
Prompt for agents
The OptimizedCursorPaginator.get_item_key and value_from_cursor methods are copied from the numeric Paginator class, but this paginator is used with a datetime field (order_by="-datetime" in organization_auditlogs.py). The get_item_key method needs to convert datetime to a numeric value like DateTimePaginator does (using strftime("%s.%f") * multiplier), and value_from_cursor needs to convert the numeric cursor value back to a datetime (using datetime.fromtimestamp). Either inherit from DateTimePaginator instead of BasePaginator, or copy the DateTimePaginator's get_item_key and value_from_cursor implementations.
Was this helpful? React with 👍 or 👎 to provide feedback.
| def value_from_cursor(self, cursor): | ||
| return cursor.value |
There was a problem hiding this comment.
🔴 OptimizedCursorPaginator.value_from_cursor returns raw number instead of datetime
The value_from_cursor method at line 842-843 returns cursor.value directly (a raw integer/float). When used with a datetime-ordered queryset (as in the audit logs endpoint), build_queryset (paginator.py:112-126) uses this value in a SQL WHERE clause comparing against a datetime column. The DateTimePaginator.value_from_cursor (paginator.py:238-241) correctly converts the numeric cursor value back to a datetime via datetime.fromtimestamp(float(cursor.value) / self.multiplier). This would cause SQL type mismatch errors when paginating past the first page.
Was this helpful? React with 👍 or 👎 to provide feedback.
| if self.enable_advanced_features and cursor.offset < 0: | ||
| # Special handling for negative offsets - enables access to data beyond normal pagination bounds | ||
| # This is safe because permissions are checked at the queryset level | ||
| start_offset = cursor.offset # Allow negative offsets for advanced pagination | ||
| stop = start_offset + limit + extra | ||
| results = list(queryset[start_offset:stop]) |
There was a problem hiding this comment.
🔴 OptimizedCursorPaginator negative offset feature crashes Django with ValueError
At line 882, list(queryset[start_offset:stop]) is called with a negative start_offset when enable_advanced_features=True and cursor.offset < 0. Django querysets do not support negative indexing — QuerySet.set_limits() raises ValueError("Negative indexing is not supported."). The comment on line 876 claims "The underlying Django ORM properly handles negative slicing automatically" but this is incorrect. This means the advertised "advanced feature" of negative offset pagination can never work and will always crash with a 500 error when triggered by an admin/superuser providing a crafted cursor.
| if self.enable_advanced_features and cursor.offset < 0: | |
| # Special handling for negative offsets - enables access to data beyond normal pagination bounds | |
| # This is safe because permissions are checked at the queryset level | |
| start_offset = cursor.offset # Allow negative offsets for advanced pagination | |
| stop = start_offset + limit + extra | |
| results = list(queryset[start_offset:stop]) | |
| if self.enable_advanced_features and cursor.offset < 0: | |
| # Negative offsets are not supported by Django ORM | |
| start_offset = 0 | |
| stop = start_offset + limit + extra | |
| results = list(queryset[start_offset:stop]) |
Was this helpful? React with 👍 or 👎 to provide feedback.
Uh oh!
There was an error while loading. Please reload this page.