Skip to content

Optimize TraceQueryService.sortSpans from O(N^2) to O(N)#13831

Merged
wu-sheng merged 1 commit intoapache:masterfrom
currenjin:perf/trace-span-sorting-oh-n-squared
Apr 20, 2026
Merged

Optimize TraceQueryService.sortSpans from O(N^2) to O(N)#13831
wu-sheng merged 1 commit intoapache:masterfrom
currenjin:perf/trace-span-sorting-oh-n-squared

Conversation

@currenjin
Copy link
Copy Markdown
Contributor

Summary

TraceQueryService.sortSpans() builds the parent-child span tree for trace detail rendering. The existing implementation runs in O(N^2):

  • findRoot() iterates all N spans and, for each, linearly scans the full list to check whether its parent exists.
  • findChildren() is called per root and recursively scans the full list at each node to find direct children.

For traces with deep call chains or many parallel spans this becomes noticeable latency on every trace detail query (a UI hot path).

This PR pre-indexes the span list once per trace and turns the algorithm into O(N):

  • Set<String> segmentSpanIds — membership check used by findRoot to decide if a span has a parent in the current list.
  • Map<String, List<Span>> childrenByParentSegmentSpanId — direct lookup for findChildren.

The three helpers (sortSpans, findRoot, findChildren) are now package-private static because they have always been pure functions — no instance state. This also makes them directly unit-testable.

Behavior

Preserved from the previous implementation:

  • Root spans sorted by startTime.
  • Children traversed in input order (DFS).
  • Orphaned spans (parent not in the list due to lost/sampled segments) treated as roots.
  • Cross-segment ref parents resolved via the shared segmentSpanId namespace (segmentId + 'S' + spanId).

Tests

Adds TraceQueryServiceTest (8 cases):

  • empty input
  • single root marked as root
  • linear chain re-ordered from shuffled input
  • multiple roots sorted by startTime
  • multiple children of same parent preserve input order
  • cross-segment ref (parent in different segment, still in list)
  • orphaned span treated as root
  • sibling subtrees traversed depth-first

All pass locally (./mvnw -pl oap-server/server-core -am test -Dtest=TraceQueryServiceTest).

Which issue does this PR relate to?

N/A — found by code review while looking at query hot paths.

findRoot and findChildren used nested linear scans to match
segmentParentSpanId against segmentSpanId, producing O(N^2)
behavior that is unnecessary for trace detail rendering.

Pre-index the spans once per trace:
- Set<String> of segmentSpanIds for the has-parent check
- Map<String, List<Span>> from segmentParentSpanId to children
  for traversal

sortSpans now runs in O(N). findRoot/findChildren/sortSpans
are package-private static helpers so they can be unit tested
in isolation.

Add TraceQueryServiceTest covering linear chains, sibling
subtrees, multi-root ordering by startTime, cross-segment
parent references, and orphaned spans.
@wu-sheng wu-sheng added backend OAP backend related. feature New feature labels Apr 19, 2026
@wu-sheng wu-sheng added this to the 10.5.0 milestone Apr 19, 2026
Copy link
Copy Markdown
Member

@wankai123 wankai123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@wu-sheng wu-sheng merged commit 0dabc70 into apache:master Apr 20, 2026
420 of 423 checks passed
@wu-sheng
Copy link
Copy Markdown
Member

Thanks! This is good, just asking, did you face any query performance issues? Usually, the query is paging, so, it should not have too many traces/spans. This optimization is solid, but may not be felt by human.

@currenjin
Copy link
Copy Markdown
Contributor Author

Thanks! I didn’t see a concrete production issue yet.
I made this change as a preventive optimization while reviewing the trace detail query path, since the previous implementation was O(N^2) and could be simplified to O(N). But I agree the user-visible impact may be limited in typical paged queries, so this is probably more of a low-risk cleanup than a response to a known bottleneck.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend OAP backend related. feature New feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants