specify how many segments were pruned by each server segment pruner#8884
walterddr merged 15 commits into apache:master
Initially there is only one warning, which is shown when segments were pruned because they were invalid
Codecov Report
@@ Coverage Diff @@
## master #8884 +/- ##
============================================
- Coverage 70.09% 61.48% -8.62%
- Complexity 4732 4769 +37
============================================
Files 1826 1815 -11
Lines 95980 95711 -269
Branches 14352 14318 -34
============================================
- Hits 67281 58850 -8431
- Misses 24057 32531 +8474
+ Partials 4642 4330 -312
walterddr left a comment
looks good to me. thanks for adding this support and making it available for users to diagnose invalid segments.
is it possible to add a test for SegmentPruner for this behavior by mocking an invalid segment?
Jackie-Jiang left a comment
It looks good in general. My only concern is around empty segments, which IMO shouldn't be counted as invalid. Empty segments are normal for certain streaming types, e.g. Kinesis. If we emit a warning for empty segments, every query will get the warning, which is undesirable.
Co-authored-by: Rong Rong <walterddr.walterddr@gmail.com>
Changed
I've added some tests inspired by the other SegmentPruner tests.
    SYSTEM_ACTIVITIES_CPU_TIME_NS("systemActivitiesCpuTimeNs", MetadataValueType.LONG),
    RESPONSE_SER_CPU_TIME_NS("responseSerializationCpuTimeNs", MetadataValueType.LONG),
    NUM_SEGMENTS_PRUNED_BY_SERVER("numSegmentsPrunedByServer", MetadataValueType.INT),
    NUM_SEGMENTS_PRUNED_INVALID("numSegmentsPrunedByInvalid", MetadataValueType.INT),
(MAJOR) Please append the new keys to the end. See the javadoc for this enum.
We should associate an id with each key instead of relying on the ordinal of the enum. That is out of the scope of this PR, and it needs a newer data table version so that the change is backward compatible.
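To illustrate why the position of a new key matters when keys are identified by their enum ordinal, here is a minimal sketch. The three enums are hypothetical, shortened stand-ins (not the real metadata key enum), and `keepsOrdinals` is a helper invented for this example. It only shows that inserting a constant in the middle shifts the ordinals of the constants after it, while appending leaves the existing ones unchanged.

```java
public class OrdinalKeyDemo {
  // Hypothetical key set as an older server or broker would know it.
  enum KeysV1 { SYSTEM_ACTIVITIES_CPU_TIME_NS, RESPONSE_SER_CPU_TIME_NS, NUM_SEGMENTS_PRUNED_BY_SERVER }

  // New key inserted in the middle: every later key gets a different ordinal.
  enum KeysInserted {
    SYSTEM_ACTIVITIES_CPU_TIME_NS, NUM_SEGMENTS_PRUNED_INVALID,
    RESPONSE_SER_CPU_TIME_NS, NUM_SEGMENTS_PRUNED_BY_SERVER
  }

  // New key appended at the end: existing ordinals are preserved.
  enum KeysAppended {
    SYSTEM_ACTIVITIES_CPU_TIME_NS, RESPONSE_SER_CPU_TIME_NS,
    NUM_SEGMENTS_PRUNED_BY_SERVER, NUM_SEGMENTS_PRUNED_INVALID
  }

  // Returns true if every constant of the old enum sits at the same ordinal in the new enum.
  static <O extends Enum<O>, N extends Enum<N>> boolean keepsOrdinals(Class<O> oldKeys, Class<N> newKeys) {
    N[] updated = newKeys.getEnumConstants();
    for (O key : oldKeys.getEnumConstants()) {
      int i = key.ordinal();
      if (i >= updated.length || !updated[i].name().equals(key.name())) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(keepsOrdinals(KeysV1.class, KeysInserted.class)); // false
    System.out.println(keepsOrdinals(KeysV1.class, KeysAppended.class)); // true
  }
}
```

With ordinal-based encoding, a reader built against the old enum would interpret ordinal 1 from a writer using `KeysInserted` as the wrong key, which is the incompatibility the comment above warns about.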
    int invalid = 0;
    for (IndexSegment segment : segments) {
      if (!isInvalidSegment(segment, query)) {
      boolean isInvalid = isInvalidSegment(segment, query);
(minor) We can skip the valid check if a segment is empty
    - boolean isInvalid = isInvalidSegment(segment, query);
    + if (!isEmptySegment(segment)) {
    +   if (isInvalidSegment(segment, query)) {
    +     invalid++;
    +   } else {
    +     segments.set(selected++, segment);
    +   }
    + }
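The suggested loop can be tried in isolation with the sketch below. It is not the actual SegmentPrunerService code: `Segment`, `Result`, and `prune` are hypothetical stand-ins, "empty" is assumed to mean zero documents, and "invalid" is assumed to mean that the segment lacks a column the query needs. Empty segments are dropped without being counted, so they cannot trigger the invalid-segment warning.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PruneSketch {
  // Hypothetical stand-in for IndexSegment: a name, a doc count, and the columns physically present.
  static final class Segment {
    final String name;
    final int totalDocs;
    final Set<String> columns;

    Segment(String name, int totalDocs, String... columns) {
      this.name = name;
      this.totalDocs = totalDocs;
      this.columns = new HashSet<>(Arrays.asList(columns));
    }
  }

  // Hypothetical result holder: the surviving segments plus the invalid count.
  static final class Result {
    final List<Segment> selected;
    final int invalid;

    Result(List<Segment> selected, int invalid) {
      this.selected = selected;
      this.invalid = invalid;
    }
  }

  // Assumption: a segment with zero documents is "empty".
  static boolean isEmptySegment(Segment segment) {
    return segment.totalDocs == 0;
  }

  // Assumption: a segment is "invalid" when it lacks a column required by the query.
  static boolean isInvalidSegment(Segment segment, Set<String> queryColumns) {
    return !segment.columns.containsAll(queryColumns);
  }

  // Compacts the valid segments to the front of the (mutable) list, as in the suggestion.
  static Result prune(List<Segment> segments, Set<String> queryColumns) {
    int selected = 0;
    int invalid = 0;
    for (Segment segment : segments) {
      if (!isEmptySegment(segment)) {          // empty segments are skipped, not counted
        if (isInvalidSegment(segment, queryColumns)) {
          invalid++;
        } else {
          segments.set(selected++, segment);   // set() is not a structural modification
        }
      }
    }
    return new Result(new ArrayList<>(segments.subList(0, selected)), invalid);
  }

  public static void main(String[] args) {
    List<Segment> segments = new ArrayList<>(Arrays.asList(
        new Segment("a", 10, "x", "y"),
        new Segment("b", 0, "x"),
        new Segment("c", 5, "x"),
        new Segment("d", 3, "x", "y")));
    Result result = prune(segments, new HashSet<>(Arrays.asList("x", "y")));
    System.out.println(result.selected.size() + " selected, " + result.invalid + " invalid");
  }
}
```

Checking emptiness first also avoids the column lookup for segments that would be dropped anyway, which is the point of the (minor) comment above.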
…gmentPrunerService.java Co-authored-by: Xiaotian (Jackie) Jiang <17555551+Jackie-Jiang@users.noreply.github.com>
This PR adds three new metrics that break down the older (and still supported) numSegmentsPrunedByServer. The three metrics are:
- numSegmentsPrunedInvalid: segments pruned because they are actually invalid (i.e. having no docs) or because their physical schema doesn't contain all the required columns (quite common when a new column is added but segments were not reloaded).
- numSegmentsPrunedByLimit: segments pruned by SelectionQuerySegmentPruner, for example: select * from Table limit 10
- numSegmentsPrunedByValue: segments pruned by ColumnValueSegmentPruner, for example when segments can be pruned by using bloom filters or the max/min of each column.
These metrics are shown when the controller is used. It may also be interesting to add a warning in the controller when numSegmentsPrunedInvalid is different from 0.