Skip to content

Conversation

@Abhishek01911
Copy link

@Abhishek01911 Abhishek01911 commented Oct 27, 2025

Summary

Resolves Issue 17045

This PR fixes an issue where an exception in fetchPartitionCount, often caused by a deleted topic, would halt ingestion for an entire multi-topic table. By catching this exception, the failure is now isolated. Partitions for the deleted topic will get stuck in a "consuming" state, but ingestion from all other valid topics will continue unaffected.

Testing

Tested on a pinot table. Validated single incorrect topic doesn't stop table ingestion completely. This table contains 2 topics - adaptive-authn-gateway & adaptive-authn-gatewy. adaptive-authn-gatewy is a invalid kafka topic.

Screenshot 2025-10-27 at 10 03 23 AM Screenshot 2025-10-27 at 10 03 37 AM

Validated logs are still getting ingested for adaptive-authn-gateway.
Screenshot 2025-10-28 at 4 23 06 PM

Warning message is logged for topic - adaptive-authn-gatewy
Screenshot 2025-10-28 at 4 28 00 PM

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR addresses a critical availability issue where a failure to fetch partition counts (commonly triggered by deleted Kafka topics) would halt ingestion for an entire multi-topic table. The fix isolates failures by catching exceptions in fetchPartitionCount, allowing ingestion to continue on all valid topics while partitions for the failed topic remain in a consuming state.

Key changes:

  • Added exception handling in StreamMetadataProvider.computePartitionGroupMetadata() to catch failures when fetching partition counts
  • When partition count cannot be fetched, the method now returns existing partition metadata instead of propagating the exception
  • Added comprehensive test coverage for the new exception handling behavior

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
pinot-spi/src/main/java/org/apache/pinot/spi/stream/StreamMetadataProvider.java Added try-catch block around fetchPartitionCount() call with fallback to existing partitions and warning log
pinot-spi/src/test/java/org/apache/pinot/spi/stream/StreamMetadataProviderTest.java New test file covering exception scenarios including empty statuses and different exception types

Comment on lines 95 to 96
LOGGER.warn("Failed to fetch partition count for stream config: {}. Skipping stream and using"
+ "existing partitions only. Error: {}", streamConfig.getTopicName(), e.getMessage(), e);
Copy link

Copilot AI Oct 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The log message is split across two lines with a string concatenation using +. This should be written as a single continuous string literal for better readability and to avoid the concatenation operator.

Suggested change
LOGGER.warn("Failed to fetch partition count for stream config: {}. Skipping stream and using"
+ "existing partitions only. Error: {}", streamConfig.getTopicName(), e.getMessage(), e);
LOGGER.warn("Failed to fetch partition count for stream config: {}. Skipping stream and using existing partitions only. Error: {}", streamConfig.getTopicName(), e.getMessage(), e);

Copilot uses AI. Check for mistakes.
for (PartitionGroupConsumptionStatus currentPartitionGroupConsumptionStatus : partitionGroupConsumptionStatuses) {
existingPartitionGroupMetadataList.add(
new PartitionGroupMetadata(currentPartitionGroupConsumptionStatus.getStreamPartitionGroupId(),
currentPartitionGroupConsumptionStatus.getEndOffset()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this partitionGroupConsumptionStatuses updated?

Is it possible this endOffset is lower than the previous commit segment end offset?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a comment for it. endOffset is only set as start offset if the last consuming segment commited.
It will be equal to previous commit segment end offset


int partitionCount;
try {
partitionCount = fetchPartitionCount(timeoutMillis);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, this part is processing a single stream. If we want to handle the case of multiple topics with one topic deleted, we should handle it on the caller side instead.

@codecov-commenter
Copy link

codecov-commenter commented Nov 17, 2025

Codecov Report

❌ Patch coverage is 23.07692% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 33.58%. Comparing base (51632a5) to head (9999d5e).
⚠️ Report is 98 commits behind head on master.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
...pache/pinot/spi/stream/StreamMetadataProvider.java 23.07% 10 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (51632a5) and HEAD (9999d5e). Click for more details.

HEAD has 6 uploads less than BASE
Flag BASE (51632a5) HEAD (9999d5e)
unittests1 1 0
unittests 3 1
java-11 4 3
temurin 8 7
unittests2 2 1
Additional details and impacted files
@@              Coverage Diff              @@
##             master   #17088       +/-   ##
=============================================
- Coverage     63.17%   33.58%   -29.60%     
+ Complexity     1421      726      -695     
=============================================
  Files          3104     3104               
  Lines        183247   183366      +119     
  Branches      28088    28098       +10     
=============================================
- Hits         115774    61590    -54184     
- Misses        58502   116692    +58190     
+ Partials       8971     5084     -3887     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 100.00% <ø> (+36.82%) ⬆️
java-21 33.58% <23.07%> (+0.09%) ⬆️
temurin 33.58% <23.07%> (-29.60%) ⬇️
unittests 33.58% <23.07%> (-29.60%) ⬇️
unittests1 ?
unittests2 33.58% <23.07%> (+0.09%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants