Fix: Handle kafka topic not found exception in fetchPartitionCount function gracefully #17088
base: master
Conversation
Pull Request Overview
This PR addresses a critical availability issue where a failure to fetch partition counts (commonly triggered by deleted Kafka topics) would halt ingestion for an entire multi-topic table. The fix isolates failures by catching exceptions in fetchPartitionCount, allowing ingestion to continue on all valid topics while partitions for the failed topic remain in a consuming state.
Key changes:
- Added exception handling in StreamMetadataProvider.computePartitionGroupMetadata() to catch failures when fetching partition counts (sketched after this list)
- When the partition count cannot be fetched, the method now returns the existing partition metadata instead of propagating the exception
- Added comprehensive test coverage for the new exception handling behavior
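A minimal sketch of that fallback pattern, using simplified stand-in types; the class names, records, and signatures below are illustrative assumptions, not Pinot's actual StreamMetadataProvider API:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for Pinot's partition metadata types (illustrative only).
record PartitionGroupMetadata(int partitionGroupId, long startOffset) { }
record PartitionGroupConsumptionStatus(int partitionGroupId, long endOffset) { }

abstract class SimplifiedMetadataProvider {
  // In the real provider this call can fail, e.g. when the Kafka topic has been deleted.
  abstract int fetchPartitionCount(long timeoutMillis);

  List<PartitionGroupMetadata> computePartitionGroupMetadata(
      List<PartitionGroupConsumptionStatus> statuses, long timeoutMillis) {
    int partitionCount;
    try {
      partitionCount = fetchPartitionCount(timeoutMillis);
    } catch (Exception e) {
      // Fallback: keep the partitions we already know about instead of failing the whole table.
      List<PartitionGroupMetadata> existing = new ArrayList<>();
      for (PartitionGroupConsumptionStatus status : statuses) {
        existing.add(new PartitionGroupMetadata(status.partitionGroupId(), status.endOffset()));
      }
      return existing;
    }
    // Normal path: build metadata for every partition reported by the stream
    // (start-offset resolution is omitted in this sketch).
    List<PartitionGroupMetadata> result = new ArrayList<>();
    for (int partitionId = 0; partitionId < partitionCount; partitionId++) {
      result.add(new PartitionGroupMetadata(partitionId, 0L));
    }
    return result;
  }
}
```

The actual change also logs a warning naming the affected topic, as shown in the review snippet further down.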
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| pinot-spi/src/main/java/org/apache/pinot/spi/stream/StreamMetadataProvider.java | Added try-catch block around fetchPartitionCount() call with fallback to existing partitions and warning log |
| pinot-spi/src/test/java/org/apache/pinot/spi/stream/StreamMetadataProviderTest.java | New test file covering exception scenarios including empty statuses and different exception types |
```java
LOGGER.warn("Failed to fetch partition count for stream config: {}. Skipping stream and using"
    + "existing partitions only. Error: {}", streamConfig.getTopicName(), e.getMessage(), e);
```
Copilot AI · Oct 28, 2025
The log message is split across two lines with a string concatenation using +. This should be written as a single continuous string literal for better readability and to avoid the concatenation operator.
Suggested change:

```diff
-LOGGER.warn("Failed to fetch partition count for stream config: {}. Skipping stream and using"
-    + "existing partitions only. Error: {}", streamConfig.getTopicName(), e.getMessage(), e);
+LOGGER.warn("Failed to fetch partition count for stream config: {}. Skipping stream and using existing partitions only. Error: {}", streamConfig.getTopicName(), e.getMessage(), e);
```
```java
for (PartitionGroupConsumptionStatus currentPartitionGroupConsumptionStatus : partitionGroupConsumptionStatuses) {
  existingPartitionGroupMetadataList.add(
      new PartitionGroupMetadata(currentPartitionGroupConsumptionStatus.getStreamPartitionGroupId(),
          currentPartitionGroupConsumptionStatus.getEndOffset()));
}
```
How is this partitionGroupConsumptionStatuses updated?
Is it possible this endOffset is lower than the end offset of the previously committed segment?
Added a comment for it. endOffset is only used as the start offset if the last consuming segment committed, so it will be equal to the end offset of the previously committed segment.
```java
int partitionCount;
try {
  partitionCount = fetchPartitionCount(timeoutMillis);
```
IIUC, this part is processing a single stream. If we want to handle the case of multiple topics with one topic deleted, we should handle it on the caller side instead.
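A rough sketch of that caller-side alternative; MultiTopicMetadataFetcher, TopicMetadataProvider, and the surrounding loop are hypothetical illustrations, not Pinot's actual caller code:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical caller-side handling; names and types are illustrative, not Pinot's real classes.
class MultiTopicMetadataFetcher {

  record PartitionGroupMetadata(String topic, int partitionGroupId) { }

  interface TopicMetadataProvider {
    // May throw if the underlying topic no longer exists.
    List<PartitionGroupMetadata> computePartitionGroupMetadata(long timeoutMillis) throws Exception;
  }

  // Iterate over every topic of the multi-topic table; a failure for one topic is
  // logged and skipped so the remaining topics keep ingesting.
  static List<PartitionGroupMetadata> fetchAllTopics(
      List<TopicMetadataProvider> providersPerTopic, long timeoutMillis) {
    List<PartitionGroupMetadata> allPartitionGroups = new ArrayList<>();
    for (TopicMetadataProvider provider : providersPerTopic) {
      try {
        allPartitionGroups.addAll(provider.computePartitionGroupMetadata(timeoutMillis));
      } catch (Exception e) {
        System.err.println("Skipping topic whose metadata could not be fetched: " + e.getMessage());
      }
    }
    return allPartitionGroups;
  }
}
```

The trade-off is where multi-topic awareness lives: this PR instead keeps the fallback inside the default computePartitionGroupMetadata implementation, so each provider degrades independently.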
Codecov Report
❌ Patch coverage is …

Additional details and impacted files

```diff
@@              Coverage Diff              @@
##             master    #17088       +/-   ##
==============================================
- Coverage      63.17%    33.58%    -29.60%
+ Complexity      1421       726       -695
==============================================
  Files           3104      3104
  Lines         183247    183366       +119
  Branches       28088     28098        +10
==============================================
- Hits          115774     61590     -54184
- Misses         58502    116692     +58190
+ Partials        8971      5084      -3887
```
Summary
Resolves Issue 17045
This PR fixes an issue where an exception in fetchPartitionCount, often caused by a deleted topic, would halt ingestion for an entire multi-topic table. By catching this exception, the failure is now isolated. Partitions for the deleted topic will get stuck in a "consuming" state, but ingestion from all other valid topics will continue unaffected.
Testing
Tested on a Pinot table and validated that a single incorrect topic doesn't stop table ingestion completely. The table contains two topics, adaptive-authn-gateway and adaptive-authn-gatewy; adaptive-authn-gatewy is an invalid Kafka topic.
Validated that logs are still being ingested for adaptive-authn-gateway.

A warning message is logged for the topic adaptive-authn-gatewy.
