
Update consuming freshness field in query resp to be backed by the server reported ingestion delay timestamp#13207

Merged
Jackie-Jiang merged 3 commits into apache:master from
priyen:github-fork/consuming-freshness-server-ingestion-lag
May 24, 2024

Conversation

@priyen
Contributor

@priyen priyen commented May 22, 2024

This updates the minConsumingFreshnessTimestampMs query response metadata field to be backed by the timestamp behind the REALTIME_INGESTION_DELAY_MS metric (provided by the IngestionDelayTracker class).
The current implementation can cause false positives: low-volume partitions, even on a high-volume table, can make it look like there is lag when there is none in reality. This makes it hard to reliably trust/use this metric to make any decisions.

Previous behaviour:

  • uses the last ingestion timestamp, so low-volume partitions cause this metric to suggest there is lag when there may not be

New behaviour:

  • uses the last ingestion timestamp, but partitions with no messages left to consume are reported as near-realtime (preventing low-volume partitions from inflating the lag). If there is real lag, the metric will of course spike once an incoming message's ingestion timestamp is indexed. This is how the IngestionDelayTracker works

Unchanged:

  • will report as 0 if no consuming segments are queried

This is prone to the issue described in #11448, which we should prioritize fixing, but I do think the benefits of this PR still outweigh that downside.
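To make the behavior change concrete, here is a minimal sketch of the idea (class, method, and field names are illustrative, not Pinot's actual API): the tracker records the ingestion timestamp of the last indexed message per partition, and a partition that has caught up to the stream is reported as near-realtime instead of letting its stale last-message timestamp inflate the lag.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the new freshness semantics; not Pinot's actual code.
class FreshnessTrackerSketch {
  private final Map<Integer, Long> _lastIngestionTimeMs = new ConcurrentHashMap<>();
  private final Map<Integer, Boolean> _caughtUp = new ConcurrentHashMap<>();

  // Record the ingestion timestamp of the latest indexed message, and whether
  // the partition has caught up (no messages left to consume).
  void onMessageIndexed(int partitionId, long ingestionTimeMs, boolean caughtUp) {
    _lastIngestionTimeMs.put(partitionId, ingestionTimeMs);
    _caughtUp.put(partitionId, caughtUp);
  }

  // Freshness timestamp backing minConsumingFreshnessTimestampMs.
  long getPartitionIngestionTimeMs(int partitionId, long nowMs) {
    Long last = _lastIngestionTimeMs.get(partitionId);
    if (last == null) {
      return Long.MIN_VALUE; // uninitialized: aligned with segment metadata defaults
    }
    // A caught-up (possibly low-volume) partition is reported as near-realtime,
    // so it cannot drag the table-level minimum down and fake a lag.
    return Boolean.TRUE.equals(_caughtUp.get(partitionId)) ? nowMs : last;
  }
}
```

Under the previous behavior, the idle partition's old timestamp would have been reported directly; here it only surfaces while the partition genuinely has messages pending.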

Testing:

  • updated tests
  • ran a load test internally at Stripe to compare before/after, in case getPartitionIngestionTimeMs introduced a performance difference; there was no measurable diff
  • the CI test failures look unrelated (likely flaky) and should pass on re-run

@codecov-commenter

codecov-commenter commented May 23, 2024

Codecov Report

Attention: Patch coverage is 44.44444%, with 5 lines in your changes missing coverage. Please review.

Project coverage is 35.18%. Comparing base (59551e4) to head (8a9cdf0).
Report is 487 commits behind head on master.

Files Patch % Lines
...ata/manager/realtime/RealtimeTableDataManager.java 0.00% 3 Missing ⚠️
...core/query/executor/ServerQueryExecutorV1Impl.java 0.00% 2 Missing ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##             master   #13207       +/-   ##
=============================================
- Coverage     61.75%   35.18%   -26.58%     
+ Complexity      207        6      -201     
=============================================
  Files          2436     2458       +22     
  Lines        133233   135136     +1903     
  Branches      20636    20943      +307     
=============================================
- Hits          82274    47541    -34733     
- Misses        44911    84091    +39180     
+ Partials       6048     3504     -2544     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 ?
integration2 ?
java-11 35.18% <44.44%> (-26.53%) ⬇️
java-21 ?
skip-bytebuffers-false 46.58% <44.44%> (-15.17%) ⬇️
skip-bytebuffers-true ?
temurin 35.18% <44.44%> (-26.58%) ⬇️
unittests 46.58% <44.44%> (-15.17%) ⬇️
unittests1 46.58% <44.44%> (-0.32%) ⬇️
unittests2 ?

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@priyen priyen changed the title [wip] Update consuming freshness field in query resp to be backed by the server reported ingestion delay timestamp Update consuming freshness field in query resp to be backed by the server reported ingestion delay timestamp May 23, 2024
@Jackie-Jiang Jackie-Jiang added feature release-notes Referenced by PRs that need attention when compiling the next release notes observability Related to observability (logging, tracing, metrics) labels May 23, 2024
Contributor

@Jackie-Jiang Jackie-Jiang left a comment


The solution looks good in general.
One behavior I want to discuss: when the consuming segment is not queried (or pruned), we won't return freshness time. Is this behavior desired?

// Not protected as this will only be invoked when metric is installed which happens after server ready
IngestionTimestamps currentMeasure = _partitionToIngestionTimestampsMap.get(partitionGroupId);
if (currentMeasure == null) { // Guard just in case we read the metric without initializing it
  return 0;
}
Contributor


Suggest returning Long.MIN_VALUE by default to be aligned with the segment metadata

Contributor Author


change applied
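The applied change can be sketched as follows; IngestionTimestamps and the surrounding class are hypothetical stubs standing in for the types named in the quoted snippet, not Pinot's actual implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative stand-in for the guard after applying the reviewer's suggestion.
class GuardSketch {
  // Stub for Pinot's per-partition measurement holder (hypothetical).
  static class IngestionTimestamps {
    final long _ingestionTimeMs;
    IngestionTimestamps(long ingestionTimeMs) { _ingestionTimeMs = ingestionTimeMs; }
  }

  private final Map<Integer, IngestionTimestamps> _partitionToIngestionTimestampsMap =
      new ConcurrentHashMap<>();

  void track(int partitionGroupId, long ingestionTimeMs) {
    _partitionToIngestionTimestampsMap.put(partitionGroupId, new IngestionTimestamps(ingestionTimeMs));
  }

  long getPartitionIngestionTimeMs(int partitionGroupId) {
    IngestionTimestamps currentMeasure = _partitionToIngestionTimestampsMap.get(partitionGroupId);
    if (currentMeasure == null) { // Guard just in case we read the metric without initializing it
      return Long.MIN_VALUE; // was 0; Long.MIN_VALUE aligns with the segment metadata default
    }
    return currentMeasure._ingestionTimeMs;
  }
}
```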

@priyen
Contributor Author

priyen commented May 23, 2024

The solution looks good in general. One behavior I want to discuss: when the consuming segment is not queried (or pruned), we won't return freshness time. Is this behavior desired?

I don't think it's ideal, but I'm not sure I have a better way to introduce it at the moment, since it requires consuming segments to be queried. If you have a suggestion regarding this, let me know.

I'm thinking of a follow-up: a broker API that fills the gap (possibly with brokers periodically polling servers), using that info to augment queries that didn't end up querying consuming segments.

@Jackie-Jiang Jackie-Jiang merged commit 6c803e2 into apache:master May 24, 2024
gortiz pushed a commit to gortiz/pinot that referenced this pull request Jun 14, 2024
