Skip to content

Conversation

@clintropolis
Copy link
Member

@clintropolis clintropolis commented Nov 7, 2025

Description

changes:

  • added canLoadSegments() method to SegmentManager and SegmentCacheManager
  • added 'query/load/batch/time' metric for 'wall time' spent waiting on all segments to load for a query
  • added 'query/load/time/avg' metric for average per segment load time for a query
  • added 'query/load/time/max' metric for max per segment load time for a query
  • added 'query/load/wait/avg' metric for average per segment wait time to begin loading for a query
  • added 'query/load/wait/max' metric for max per segment wait time to begin loading for a query
  • added 'query/load/count' metric for number of segments loaded on demand for a query
  • added 'query/load/bytes/total' metric for amount of data loaded on demand for a query

changes:
* added `canLoadSegments()` method to `SegmentManager` and `SegmentCacheManager`
* added 'query/load/time' metric for 'wall time' spent waiting on all segments to load for a query
* added 'query/load/time/avg' metric for average per segment load time for a query
* added 'query/load/time/max' metric for max per segment load time for a query
* added 'query/load/wait/avg' metric for average per segment wait time to begin loading for a query
* added 'query/load/wait/max' metric for max per segment wait time to begin loading for a query
* added 'query/load/count' metric for number of segments loaded on demand for a query
* added 'query/load/bytes' metric for amount of data loaded on demand for a query
}
boolean canLoadSegmentsOnDemand();

boolean canLoadSegmentOnDemand(DataSegment segment);

Check notice

Code scanning / CodeQL

Useless parameter Note

The parameter 'segment' is never used.
@jtuglu1
Copy link
Contributor

jtuglu1 commented Nov 7, 2025

While we're at it, do you think we could add an explicit segment scan time metric? That is, the time it takes to fully load a segment from disk into memory (the Segment object)?

query/wait/time is useful, but IMO not granular enough as it takes into account other things like:

  • Other task runner creation/iteration overhead (Java/heap alloc overhead)
  • Processing pool queueing time (this can dominate the latency)

This would help a LOT with understanding various bottlenecks of the system (and help break that kitchen sink query/wait/time metric into more granular parts).

Copy link
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should change the name of the metric query/load/time. It sounds very "clean" but it's actually the least clean metric in terms of future interpretability. Currently it means something like the amount of time the query was blocked waiting for segments to load. But in the future, we should be chunking and pipelining loading and execution. This will make the meaning of query/load/time increasingly murky.

How about calling it query/load/batch/time? That way, it's up to the execution model to define what a "batch" means and the interpretation of what it means to spend time "loading a batch".

The rest of the metrics generally LGTM.

@clintropolis
Copy link
Member Author

clintropolis commented Nov 13, 2025

While we're at it, do you think we could add an explicit segment scan time metric? That is, the time it takes to fully load a segment from disk into memory (the Segment object)?

We discussed this a bit offline, but for the sake of visibility for everyone else, query/segment/time is this basically, though it does not separate time spent reading from disk from time spent doing stuff since it sort of happens incrementally/interleaved as the query engine reads stuff and the mmap file is paged into memory from disk. I'm not sure the best way to collect that measurement, but it would sure be useful.

Also to clarify, query/wait/time is the time a segment spent waiting for a slot on the processing pool, before it gets to the point of doing any work or even reading anything from the segment, while query/segment/time starts where wait/time stops and ends when the internal sequence is fully consumed (so in some sense is dependent on how fast the caller consumes the sequence too). query/segmentAndCache/time includes time to check results cache + query/segment/time if there is no cache results.

@clintropolis clintropolis merged commit bbc62dc into apache:master Nov 13, 2025
103 of 104 checks passed
@clintropolis clintropolis deleted the vsf-query-metrics branch November 13, 2025 05:32
@kgyrtkirk kgyrtkirk added this to the 36.0.0 milestone Jan 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants