Skip to content

specify how many segments were pruned by each server segment pruner#8884

Merged
walterddr merged 15 commits intoapache:masterfrom
gortiz:report-invalid-segments
Jul 5, 2022
Merged

specify how many segments were pruned by each server segment pruner#8884
walterddr merged 15 commits intoapache:masterfrom
gortiz:report-invalid-segments

Conversation

@gortiz
Copy link
Contributor

@gortiz gortiz commented Jun 13, 2022

This PR adds three new metrics that break down the older (and still supported) numSegmentsPrunedByServer.

The three metrics are:

  • numSegmentsPrunedInvalid: Which include both segments that have been pruned because they are actually invalid (aka having docs) or their physical schema doesn't contain all the required columns (quite common when a new column is added but segments were not reloaded)
  • numSegmentsPrunedByLimit: The ones pruned by SelectionQuerySegmentPruner, for example: select * from Table limit 10
  • numSegmentsPrunedByValue: The ones pruned by ColumnValueSegmentPruner, for example when segments can be pruned by using bloom filters or max/min of each column.

These metrics are shown when the controller is used. It may also be interesting to add a warning in the controller when numSegmentsPrunedInvalid is different than 0.

@codecov-commenter
Copy link

codecov-commenter commented Jun 14, 2022

Codecov Report

Merging #8884 (c376da9) into master (3ff4d13) will decrease coverage by 8.61%.
The diff coverage is 89.10%.

❗ Current head c376da9 differs from pull request most recent head 5c6ef86. Consider uploading reports for the commit 5c6ef86 to get more accurate results

@@             Coverage Diff              @@
##             master    #8884      +/-   ##
============================================
- Coverage     70.09%   61.48%   -8.62%     
- Complexity     4732     4769      +37     
============================================
  Files          1826     1815      -11     
  Lines         95980    95711     -269     
  Branches      14352    14318      -34     
============================================
- Hits          67281    58850    -8431     
- Misses        24057    32531    +8474     
+ Partials       4642     4330     -312     
Flag Coverage Δ
integration1 26.43% <87.12%> (+0.01%) ⬆️
integration2 ?
unittests1 66.87% <89.10%> (+0.05%) ⬆️
unittests2 ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...e/pinot/core/query/config/SegmentPrunerConfig.java 100.00% <ø> (ø)
.../pinot/core/query/pruner/SegmentPrunerService.java 83.33% <69.23%> (-16.67%) ⬇️
...che/pinot/core/query/scheduler/QueryScheduler.java 84.02% <83.33%> (-1.25%) ⬇️
...a/org/apache/pinot/common/metrics/ServerMeter.java 100.00% <100.00%> (ø)
...t/common/response/broker/BrokerResponseNative.java 96.27% <100.00%> (+0.30%) ⬆️
.../java/org/apache/pinot/common/utils/DataTable.java 95.45% <100.00%> (+0.33%) ⬆️
...core/query/executor/ServerQueryExecutorV1Impl.java 82.72% <100.00%> (-0.22%) ⬇️
...pinot/core/query/pruner/SegmentPrunerProvider.java 84.61% <100.00%> (ø)
...not/core/query/pruner/SegmentPrunerStatistics.java 100.00% <100.00%> (ø)
...che/pinot/core/query/reduce/BaseReduceService.java 94.47% <100.00%> (+0.39%) ⬆️
... and 348 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3ff4d13...5c6ef86. Read the comment docs.

Copy link
Contributor

@xiangfu0 xiangfu0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall looks good

Copy link
Contributor

@walterddr walterddr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me. thanks for adding this support and making it available for users to diagnose invalid segments.
is it possible to add a test for SegmentPruner for this behavior by mocking an invalid segment?

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good in general. My only concern is around the empty segment, which IMO shouldn't be count as invalid. Empty segment is normal for certain streaming types e.g. Kinesis. If we emit warning for empty segment, all the query will get the warning, which is undesired

@gortiz
Copy link
Contributor Author

gortiz commented Jun 22, 2022

My only concern is around the empty segment, which IMO shouldn't be count as invalid.

Changed

is it possible to add a test for SegmentPruner for this behavior by mocking an invalid segment?

I've added some tests inspired by the other SegmentPruner tests.

SYSTEM_ACTIVITIES_CPU_TIME_NS("systemActivitiesCpuTimeNs", MetadataValueType.LONG),
RESPONSE_SER_CPU_TIME_NS("responseSerializationCpuTimeNs", MetadataValueType.LONG),
NUM_SEGMENTS_PRUNED_BY_SERVER("numSegmentsPrunedByServer", MetadataValueType.INT),
NUM_SEGMENTS_PRUNED_INVALID("numSegmentsPrunedByInvalid", MetadataValueType.INT),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(MAJOR) Please append the new keys to the end. See the javadoc for this enum.
We should associate an id to each key instead of relying on the ordinal of the enum. That is out of the scope of this PR, and we need a newer version data table so that the change is backward compatible

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch

int invalid = 0;
for (IndexSegment segment : segments) {
if (!isInvalidSegment(segment, query)) {
boolean isInvalid = isInvalidSegment(segment, query);
Copy link
Contributor

@Jackie-Jiang Jackie-Jiang Jun 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(minor) We can skip the valid check if a segment is empty

Suggested change
boolean isInvalid = isInvalidSegment(segment, query);
if (!isEmptySegment(segment)) {
if (isInvalidSegment(segment, query)) {
invalid++
} else {
segments.set(selected++, segment);
}
}

…gmentPrunerService.java

Co-authored-by: Xiaotian (Jackie) Jiang <17555551+Jackie-Jiang@users.noreply.github.com>
Copy link
Contributor Author

@gortiz gortiz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved new literals to the end of the enum list

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants