Fix EmbeddedKafkaCluster startup/teardown ordering in integration tests #17855

Merged
xiangfu0 merged 2 commits into apache:master from xiangfu0:fix-kafka-startup-teardown-ordering
Mar 12, 2026

@xiangfu0 xiangfu0 commented Mar 11, 2026

Summary

  • Refactor EmbeddedKafkaCluster to allow extra props for Kafka test setup.
  • Ensure Kafka starts after Zookeeper but before Controller across all integration tests (startup: ZK → Kafka → Controller → Broker → Server)
  • Ensure Kafka stops after Controller but before Zookeeper in teardown (teardown: Server → Broker → Controller → Kafka → ZK)
  • Add missing @AfterClass tearDown methods with stopKafka() to 5 tests that were starting Kafka but never stopping it
  • Add getKafkaExtraProperties() hook in BaseClusterIntegrationTest for subclasses to pass custom Kafka broker config to EmbeddedKafkaCluster (used by ExactlyOnceKafkaRealtimeClusterIntegrationTest to set log.flush.interval.messages=1 for transactional test stability)
  • Wrap all @AfterClass tearDown methods in try/finally with FileUtils.deleteQuietly for reliable temp directory cleanup
  • Replace System.err.println with SLF4J logging in ExactlyOnceKafkaRealtimeClusterIntegrationTest
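
The ordering rule in the summary can be sketched as follows. This is a minimal illustration, not the PR's code: `LifecycleOrderSketch` and its recording methods are hypothetical stand-ins for the real start/stop calls. It only shows that teardown mirrors startup, so Kafka is always up while the Controller is running.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an integration test class. The "start"/"stop"
// steps only record their names so the resulting order can be inspected.
final class LifecycleOrderSketch {
  // Startup order described in the PR summary.
  private static final List<String> COMPONENTS =
      List.of("Zookeeper", "Kafka", "Controller", "Broker", "Server");

  private final List<String> _events = new ArrayList<>();

  // Startup: ZK -> Kafka -> Controller -> Broker -> Server.
  void setUp() {
    for (String component : COMPONENTS) {
      _events.add("start " + component);
    }
  }

  // Teardown walks the same list backwards:
  // Server -> Broker -> Controller -> Kafka -> ZK.
  void tearDown() {
    for (int i = COMPONENTS.size() - 1; i >= 0; i--) {
      _events.add("stop " + COMPONENTS.get(i));
    }
  }

  List<String> events() {
    return List.copyOf(_events);
  }

  public static void main(String[] args) {
    LifecycleOrderSketch test = new LifecycleOrderSketch();
    test.setUp();
    test.tearDown();
    test.events().forEach(System.out::println);
  }
}
```

Deriving teardown from the same list as startup is one way to keep the two orders from drifting apart; the actual tests call the individual start/stop methods explicitly.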

Files changed

Startup ordering fixes (10 files — moved startKafka() before startController()):

  • BaseRealtimeClusterIntegrationTest.java
  • CLPEncodingRealtimeIntegrationTest.java
  • QueryWorkloadIntegrationTest.java
  • RetentionManagerIntegrationTest.java
  • StaleSegmentCheckIntegrationTest.java
  • PinotSinkUpsertTableIntegrationTest.java
  • PartialUpsertTableRebalanceIntegrationTest.java
  • NullHandlingIntegrationTest.java
  • PauselessRealtimeIngestionSegmentCommitFailureTest.java
  • BasePauselessRealtimeIngestionTest.java

Teardown ordering fixes (3 files — moved stopKafka() after stopController()):

  • BrokerQueryLimitTest.java
  • CustomDataQueryClusterIntegrationTest.java
  • BaseLogicalTableIntegrationTest.java

Added missing tearDown with stopKafka() (5 files):

  • CLPEncodingRealtimeIntegrationTest.java — no tearDown at all
  • QueryWorkloadIntegrationTest.java — no tearDown at all
  • RetentionManagerIntegrationTest.java — no tearDown at all
  • PinotSinkUpsertTableIntegrationTest.java — no tearDown at all
  • StaleSegmentCheckIntegrationTest.java — tearDown existed but was missing stopKafka()

Kafka extra properties support (3 files):

  • EmbeddedKafkaCluster.java — forward extra config props to KafkaClusterTestKit builder, clear on re-init
  • BaseClusterIntegrationTest.java — add getKafkaExtraProperties() hook, merge into Kafka startup
  • ExactlyOnceKafkaRealtimeClusterIntegrationTest.java — override hook to set log.flush.interval.messages=1
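
A rough sketch of the hook pattern described above, under stated assumptions: only the method name `getKafkaExtraProperties()` and the `log.flush.interval.messages=1` setting come from this PR. The class names, `buildKafkaBrokerConfig()`, and the `num.partitions` default are hypothetical and exist only to show how an overridable hook can be merged into a base config.

```java
import java.util.Properties;

final class KafkaExtraPropsSketch {
  // Hypothetical base test class; only the hook name is taken from the PR.
  static class BaseTest {
    // Hook: subclasses override this to supply extra Kafka broker config.
    protected Properties getKafkaExtraProperties() {
      return new Properties();
    }

    // Hypothetical helper: merges the hook's output over some defaults,
    // so extra properties win when a key is set in both places.
    final Properties buildKafkaBrokerConfig() {
      Properties config = new Properties();
      config.setProperty("num.partitions", "2"); // placeholder default
      config.putAll(getKafkaExtraProperties());
      return config;
    }
  }

  // Mirrors the override described for the exactly-once test.
  static class ExactlyOnceTest extends BaseTest {
    @Override
    protected Properties getKafkaExtraProperties() {
      Properties props = new Properties();
      props.setProperty("log.flush.interval.messages", "1");
      return props;
    }
  }

  public static void main(String[] args) {
    System.out.println(new BaseTest().buildKafkaBrokerConfig());
    System.out.println(new ExactlyOnceTest().buildKafkaBrokerConfig());
  }
}
```

Returning an empty `Properties` from the base hook means existing subclasses keep their current behavior unless they opt in.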

Teardown robustness (6 files — wrap in try/finally with deleteQuietly):

  • CLPEncodingRealtimeIntegrationTest.java
  • QueryWorkloadIntegrationTest.java
  • RetentionManagerIntegrationTest.java
  • PinotSinkUpsertTableIntegrationTest.java
  • BrokerQueryLimitTest.java (also fixed duplicate deleteDirectory and ordering)
  • ExactlyOnceKafkaRealtimeClusterIntegrationTest.java (replaced System.err with SLF4J)
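
The try/finally shape can be sketched like this. To stay dependency-free, the sketch uses a small standard-library helper as a stand-in for Commons IO's `FileUtils.deleteQuietly`; the class name, the helper, and the `stopCluster` parameter are assumptions for illustration, not code from the PR.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

final class TearDownCleanupSketch {
  // Stdlib stand-in for FileUtils.deleteQuietly: recursive delete that
  // reports failure through its return value instead of throwing.
  static boolean deleteQuietly(Path root) {
    if (root == null || !Files.exists(root)) {
      return false;
    }
    try (Stream<Path> paths = Files.walk(root)) {
      // Reverse order so children are deleted before their parent directory.
      paths.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
    } catch (IOException | RuntimeException e) {
      return false;
    }
    return !Files.exists(root);
  }

  // Teardown shape: stop components inside try, clean up in finally so the
  // temp directory is removed even when a stop call throws.
  static void tearDown(Runnable stopCluster, Path tempDir) {
    try {
      stopCluster.run();
    } finally {
      deleteQuietly(tempDir);
    }
  }

  public static void main(String[] args) throws IOException {
    Path tempDir = Files.createTempDirectory("teardown-sketch");
    Files.createFile(tempDir.resolve("segment.tmp"));
    try {
      // Simulate a component that fails to stop.
      tearDown(() -> {
        throw new IllegalStateException("stopKafka failed");
      }, tempDir);
    } catch (IllegalStateException e) {
      System.out.println("teardown failed: " + e.getMessage());
    }
    System.out.println("temp dir still exists: " + Files.exists(tempDir));
  }
}
```

The original exception still propagates to the test framework; the finally block only guarantees that cleanup is attempted first.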

Test plan

  • Verify ExactlyOnceKafkaRealtimeClusterIntegrationTest passes (9 tests, 0 failures)
  • Verify existing integration tests still pass with the reordered startup/teardown
  • Confirm no Kafka-related resource leaks in CI

🤖 Generated with Claude Code

codecov-commenter commented Mar 11, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 63.25%. Comparing base (e05a8a6) to head (9ae3959).
⚠️ Report is 1 commit behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #17855      +/-   ##
============================================
- Coverage     63.26%   63.25%   -0.01%     
- Complexity     1460     1466       +6     
============================================
  Files          3190     3190              
  Lines        192011   192011              
  Branches      29412    29412              
============================================
- Hits         121469   121453      -16     
- Misses        61026    61044      +18     
+ Partials       9516     9514       -2     
Flag Coverage Δ
custom-integration1 100.00% <ø> (?)
integration 100.00% <ø> (+100.00%) ⬆️
integration1 100.00% <ø> (?)
integration2 0.00% <ø> (ø)
java-11 63.21% <ø> (+<0.01%) ⬆️
java-21 63.21% <ø> (-0.02%) ⬇️
temurin 63.25% <ø> (-0.01%) ⬇️
unittests 63.25% <ø> (-0.02%) ⬇️
unittests1 55.57% <ø> (+<0.01%) ⬆️
unittests2 34.25% <ø> (-0.02%) ⬇️


@xiangfu0 xiangfu0 force-pushed the fix-kafka-startup-teardown-ordering branch 3 times, most recently from 38a7ac1 to 25a5c3e on March 11, 2026 10:09
- Add getKafkaExtraProperties() hook in BaseClusterIntegrationTest for
  subclasses to pass custom Kafka broker config
- Update EmbeddedKafkaCluster to forward extra config properties to
  KafkaClusterTestKit builder
- Set log.flush.interval.messages=1 in ExactlyOnceKafka test to ensure
  transactional data is flushed to disk immediately
- Fix timeout message mismatch (was "60s", actual deadline is 120s)
- Add retry logic for realtime table creation when Kafka topic metadata
  is not yet available

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
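
The commit message mentions retry logic for realtime table creation but the page does not show it. A generic bounded-retry helper of the kind it describes might look like the sketch below; `RetrySketch`, `retry`, the attempt count, and the fixed backoff are all assumptions, not the PR's implementation.

```java
import java.util.concurrent.Callable;

final class RetrySketch {
  // Runs the action up to maxAttempts times, sleeping backoffMs between
  // failed attempts, and rethrows the last failure if none succeeds.
  static <T> T retry(int maxAttempts, long backoffMs, Callable<T> action) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    Exception last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return action.call();
      } catch (Exception e) {
        last = e;
        if (attempt < maxAttempts) {
          Thread.sleep(backoffMs);
        }
      }
    }
    throw last;
  }

  public static void main(String[] args) throws Exception {
    int[] calls = {0};
    // Simulate topic metadata becoming available on the third attempt.
    String result = retry(5, 10L, () -> {
      if (++calls[0] < 3) {
        throw new IllegalStateException("topic metadata not available yet");
      }
      return "table created";
    });
    System.out.println(result + " after " + calls[0] + " attempts");
  }
}
```

Bounding the attempts keeps a genuinely broken setup from hanging the test run.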

Copilot AI left a comment

Pull request overview

This PR aims to make integration tests more reliable by enforcing a consistent embedded ZooKeeper/Kafka/Pinot startup and teardown order, reducing Kafka-related leaks and cross-test interference.

Changes:

  • Reorders Kafka startup to occur after ZooKeeper but before Controller in multiple integration tests.
  • Reorders Kafka shutdown to occur after Controller but before ZooKeeper in suite/test teardowns.
  • Adds support for passing extra Kafka broker config into the embedded Kafka cluster and uses it in the ExactlyOnce Kafka integration test.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
pinot-plugins/pinot-stream-ingestion/pinot-kafka-3.0/src/test/java/org/apache/pinot/plugin/stream/kafka30/server/EmbeddedKafkaCluster.java Adds forwarding of extra broker config properties into the Kafka testkit builder.
pinot-integration-test-base/src/test/java/org/apache/pinot/integration/tests/BaseClusterIntegrationTest.java Introduces getKafkaExtraProperties() and passes it into embedded Kafka startup.
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/ExactlyOnceKafkaRealtimeClusterIntegrationTest.java Overrides extra Kafka broker properties; increases verification timeout messaging.
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/BaseRealtimeClusterIntegrationTest.java Moves startKafka() earlier in startup ordering.
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/BasePauselessRealtimeIngestionTest.java Moves startKafka() earlier in startup ordering and removes later duplicate start.
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/PauselessRealtimeIngestionSegmentCommitFailureTest.java Moves startKafka() earlier in startup ordering and removes later duplicate start.
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/PartialUpsertTableRebalanceIntegrationTest.java Moves startKafka() earlier in startup ordering.
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/NullHandlingIntegrationTest.java Moves startKafka() earlier in startup ordering.
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/TableRebalancePauselessIntegrationTest.java Moves startKafka() earlier in startup ordering.
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/StaleSegmentCheckIntegrationTest.java Moves Kafka start earlier and adds missing stopKafka() in teardown.
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/RetentionManagerIntegrationTest.java Moves Kafka start earlier and adds a new @AfterClass teardown that stops Kafka.
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/QueryWorkloadIntegrationTest.java Moves Kafka start earlier and adds a new @AfterClass teardown that stops Kafka.
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/CLPEncodingRealtimeIntegrationTest.java Moves Kafka start earlier and adds a new @AfterClass teardown that stops Kafka.
pinot-connectors/pinot-flink-connector/src/test/java/org/apache/pinot/connector/flink/sink/PinotSinkUpsertTableIntegrationTest.java Moves Kafka start earlier and adds a new @AfterClass teardown that stops Kafka.
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/BrokerQueryLimitTest.java Moves Kafka stop later in teardown ordering.
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/custom/CustomDataQueryClusterIntegrationTest.java Moves Kafka stop later in suite teardown ordering.
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/logicaltable/BaseLogicalTableIntegrationTest.java Moves Kafka stop later in suite teardown ordering.

@xiangfu0 xiangfu0 changed the title from "Fix Kafka startup/teardown ordering in integration tests" to "Fix EmbeddedKafkaCluster startup/teardown ordering in integration tests" Mar 11, 2026
…xtra props

- Wrap all @AfterClass tearDown methods in try/finally with
  FileUtils.deleteQuietly for reliable temp directory cleanup
- Fix BrokerQueryLimitTest duplicate deleteDirectory and wrong ordering
- Replace System.err.println with SLF4J LOGGER in ExactlyOnce test
- Clear _extraConfigProps at start of init() in EmbeddedKafkaCluster

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 9 comments.


  }
  } catch (Exception e) {
- System.err.println("[ExactlyOnce] Error counting records with " + isolationLevel + ": " + e.getMessage());
+ LOGGER.error("Error counting records with {}: {}", isolationLevel, e.getMessage());
Copilot AI Mar 11, 2026

The error log drops the exception stack trace by only logging e.getMessage(). Including the Throwable as the last parameter will preserve the full stack trace, which is especially useful for diagnosing flaky integration test failures.

Suggested change
- LOGGER.error("Error counting records with {}: {}", isolationLevel, e.getMessage());
+ LOGGER.error("Error counting records with {}", isolationLevel, e);
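
To illustrate the reviewer's point without pulling in SLF4J, the sketch below uses `java.util.logging` as a stand-in; `LogThrowableSketch` and `CapturingHandler` are hypothetical names. It is an analogy rather than the SLF4J call itself: logging only `e.getMessage()` leaves the record without a throwable, while passing the exception keeps it (and its stack trace) attached.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

final class LogThrowableSketch {
  // Collects records in memory so the two styles can be compared.
  static final class CapturingHandler extends Handler {
    final List<LogRecord> _records = new ArrayList<>();

    @Override
    public void publish(LogRecord record) {
      _records.add(record);
    }

    @Override
    public void flush() {
    }

    @Override
    public void close() {
    }
  }

  static List<LogRecord> logBothWays(Exception e) {
    Logger logger = Logger.getAnonymousLogger();
    logger.setUseParentHandlers(false); // keep the console quiet
    CapturingHandler handler = new CapturingHandler();
    logger.addHandler(handler);

    // Message only: the stack trace is lost.
    logger.log(Level.SEVERE, "Error counting records: " + e.getMessage());
    // Throwable passed through: the record keeps the full exception.
    logger.log(Level.SEVERE, "Error counting records", e);
    return handler._records;
  }

  public static void main(String[] args) {
    List<LogRecord> records = logBothWays(new IllegalStateException("boom"));
    for (LogRecord record : records) {
      System.out.println(record.getMessage() + " | thrown=" + record.getThrown());
    }
  }
}
```

In SLF4J the equivalent is the suggested form: a trailing `Throwable` argument with no matching `{}` placeholder is treated as the exception to log.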

Copilot AI left a comment

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.


@9aman 9aman left a comment

LGTM

@xiangfu0 xiangfu0 merged commit f641a9c into apache:master Mar 12, 2026
24 checks passed
@xiangfu0 xiangfu0 deleted the fix-kafka-startup-teardown-ordering branch March 12, 2026 05:00