
Update consuming freshness field in query resp to be backed by the server reported ingestion delay timestamp#13207

Merged
Jackie-Jiang merged 3 commits into apache:master from
priyen:github-fork/consuming-freshness-server-ingestion-lag
May 24, 2024

Conversation

@priyen
Contributor

@priyen priyen commented May 22, 2024

This updates the minConsumingFreshnessTimestampMs query response metadata field to be backed by the timestamp behind the REALTIME_INGESTION_DELAY_MS metric (provided by the IngestionDelayTracker class).
The current implementation can cause false positives: low-volume partitions, even on a high-volume table, can make it look like there is lag when there is none in reality. This makes it hard to reliably trust/use this metric to make any decisions.

Previous behaviour:

  • uses the last ingestion timestamp, so low-volume partitions cause this metric to suggest there is lag when there may not be

New behaviour:

  • uses the last ingestion timestamp, but partitions with no messages left to consume are reported as near-realtime (preventing low-volume partitions from inflating the lag). If there is real lag, the metric will of course spike once an incoming message's ingestion timestamp is indexed. This is how the IngestionDelayTracker works

Unchanged:

  • will report as 0 if no consuming segments are queried

This is prone to the issue described in #11448, which we should prioritize fixing, but I do think the benefits of this PR still outweigh that downside.
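To make the behavior change concrete, here is a minimal sketch of the idea (class, method, and field names are illustrative, not Pinot's actual API): the tracker records the ingestion timestamp of the last indexed message per partition, and a partition that has caught up to the stream is reported as near-realtime instead of letting its stale last-message timestamp inflate the lag.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the new freshness semantics; not Pinot's actual code.
class FreshnessTrackerSketch {
  private final Map<Integer, Long> _lastIngestionTimeMs = new ConcurrentHashMap<>();
  private final Map<Integer, Boolean> _caughtUp = new ConcurrentHashMap<>();

  // Record the ingestion timestamp of the latest indexed message, and whether
  // the partition has caught up (no messages left to consume).
  void onMessageIndexed(int partitionId, long ingestionTimeMs, boolean caughtUp) {
    _lastIngestionTimeMs.put(partitionId, ingestionTimeMs);
    _caughtUp.put(partitionId, caughtUp);
  }

  // Freshness timestamp backing minConsumingFreshnessTimestampMs.
  long getPartitionIngestionTimeMs(int partitionId, long nowMs) {
    Long last = _lastIngestionTimeMs.get(partitionId);
    if (last == null) {
      return Long.MIN_VALUE; // uninitialized: aligned with segment metadata defaults
    }
    // A caught-up (possibly low-volume) partition is reported as near-realtime,
    // so it cannot drag the table-level minimum down and fake a lag.
    return Boolean.TRUE.equals(_caughtUp.get(partitionId)) ? nowMs : last;
  }
}
```

Under the previous behavior, the idle partition's old timestamp would have been reported directly; here it only surfaces while the partition genuinely has messages pending.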

Testing:

  • updated tests
  • ran a load test internally at Stripe to compare before/after, in case getPartitionIngestionTimeMs introduced a performance difference; there was no measurable diff
  • the CI test failures look unrelated (likely flaky) and should pass on re-run

@codecov-commenter

codecov-commenter commented May 23, 2024

Codecov Report

Attention: Patch coverage is 44.44444%, with 5 lines in your changes missing coverage. Please review.

Project coverage is 35.18%. Comparing base (59551e4) to head (8a9cdf0).
Report is 487 commits behind head on master.

Files Patch % Lines
...ata/manager/realtime/RealtimeTableDataManager.java 0.00% 3 Missing ⚠️
...core/query/executor/ServerQueryExecutorV1Impl.java 0.00% 2 Missing ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##             master   #13207       +/-   ##
=============================================
- Coverage     61.75%   35.18%   -26.58%     
+ Complexity      207        6      -201     
=============================================
  Files          2436     2458       +22     
  Lines        133233   135136     +1903     
  Branches      20636    20943      +307     
=============================================
- Hits          82274    47541    -34733     
- Misses        44911    84091    +39180     
+ Partials       6048     3504     -2544     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 ?
integration2 ?
java-11 35.18% <44.44%> (-26.53%) ⬇️
java-21 ?
skip-bytebuffers-false 46.58% <44.44%> (-15.17%) ⬇️
skip-bytebuffers-true ?
temurin 35.18% <44.44%> (-26.58%) ⬇️
unittests 46.58% <44.44%> (-15.17%) ⬇️
unittests1 46.58% <44.44%> (-0.32%) ⬇️
unittests2 ?

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@priyen priyen changed the title [wip] Update consuming freshness field in query resp to be backed by the server reported ingestion delay timestamp Update consuming freshness field in query resp to be backed by the server reported ingestion delay timestamp May 23, 2024
@Jackie-Jiang Jackie-Jiang added feature release-notes Referenced by PRs that need attention when compiling the next release notes observability Related to observability (logging, tracing, metrics) labels May 23, 2024
Contributor

@Jackie-Jiang Jackie-Jiang left a comment


The solution looks good in general.
One behavior I want to discuss: when the consuming segment is not queried (or pruned), we won't return freshness time. Is this behavior desired?

// Not protected as this will only be invoked when metric is installed which happens after server ready
IngestionTimestamps currentMeasure = _partitionToIngestionTimestampsMap.get(partitionGroupId);
if (currentMeasure == null) { // Guard just in case we read the metric without initializing it
  return 0;
}
Contributor


Suggest returning Long.MIN_VALUE by default to be aligned with the segment metadata

Contributor Author


change applied
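The applied change can be sketched as follows; IngestionTimestamps and the surrounding class are hypothetical stubs standing in for the types named in the quoted snippet, not Pinot's actual implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative stand-in for the guard after applying the reviewer's suggestion.
class GuardSketch {
  // Stub for Pinot's per-partition measurement holder (hypothetical).
  static class IngestionTimestamps {
    final long _ingestionTimeMs;
    IngestionTimestamps(long ingestionTimeMs) { _ingestionTimeMs = ingestionTimeMs; }
  }

  private final Map<Integer, IngestionTimestamps> _partitionToIngestionTimestampsMap =
      new ConcurrentHashMap<>();

  void track(int partitionGroupId, long ingestionTimeMs) {
    _partitionToIngestionTimestampsMap.put(partitionGroupId, new IngestionTimestamps(ingestionTimeMs));
  }

  long getPartitionIngestionTimeMs(int partitionGroupId) {
    IngestionTimestamps currentMeasure = _partitionToIngestionTimestampsMap.get(partitionGroupId);
    if (currentMeasure == null) { // Guard just in case we read the metric without initializing it
      return Long.MIN_VALUE; // was 0; Long.MIN_VALUE aligns with the segment metadata default
    }
    return currentMeasure._ingestionTimeMs;
  }
}
```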

@priyen
Contributor Author

priyen commented May 23, 2024

The solution looks good in general. One behavior I want to discuss: when the consuming segment is not queried (or pruned), we won't return freshness time. Is this behavior desired?

I don't think it's ideal, but I'm not sure I have a better way to introduce it at the moment, since it requires consuming segments to be queried. If you have a suggestion regarding this, let me know.

I'm thinking of a follow-up: a broker API that fills the gap (possibly with brokers periodically polling servers), using that info to augment queries that didn't end up querying consuming segments.

@Jackie-Jiang Jackie-Jiang merged commit 6c803e2 into apache:master May 24, 2024
gortiz pushed a commit to gortiz/pinot that referenced this pull request Jun 14, 2024
