Skip to content

[fix][test] Fix flaky ModularLoadManagerImplTest and IsolatedBookieEnsemblePlacementPolicyTest#25496

Merged
merlimat merged 3 commits intoapache:masterfrom
lhotari:fix-flaky-loadbalance-and-bookie-tests
Apr 8, 2026
Merged

[fix][test] Fix flaky ModularLoadManagerImplTest and IsolatedBookieEnsemblePlacementPolicyTest#25496
merlimat merged 3 commits intoapache:masterfrom
lhotari:fix-flaky-loadbalance-and-bookie-tests

Conversation

@lhotari
Copy link
Copy Markdown
Member

@lhotari lhotari commented Apr 8, 2026

Motivation

Fix two flaky tests that fail intermittently on master.

  • ModularLoadManagerImplTest.testRemoveNonExistBundleData: The test creates a namespace with 8 bundles and asserts that all 8 bundles have data in the metadata store after writeResourceQuotasToZooKeeper(). The flakiness occurs because writeResourceQuotasToZooKeeper() internally calls updateBundleData() which reads bundle stats from loadData.getBrokerData(). If the ZooKeeper notification from the second broker's writeBrokerDataOnZooKeeper() hasn't been processed yet, loadData only contains bundles from one broker, so fewer than 8 bundles get written. The fix ensures lm1.updateAll() is called before writing bundle data so that lm1 has fetched both brokers' data from the metadata store.

  • IsolatedBookieEnsemblePlacementPolicyTest.testSecondaryIsolationGroupsBookies: Bookies were configured with conflicting rack assignments across groups (e.g., BOOKIE1 was rack0 in default but rack1 in primaryGroup). Since BookieRackAffinityMapping processes groups via HashMap iteration (non-deterministic order), the final rack for each bookie varied between runs, causing rack diversity constraint failures.

Modifications

  • testRemoveNonExistBundleData: Added lm1.updateAll() before writeResourceQuotasToZooKeeper() to ensure lm1 has loaded both brokers' data from the metadata store. This fixes the root cause (missing broker data in loadData) and preserves the original strict assertEquals(bundleNumbers, bundles.size()) assertion that verifies all bundles are present.
  • testSecondaryIsolationGroupsBookies: Made rack assignments consistent across all groups for each bookie (BOOKIE1 always rack0, BOOKIE2/BOOKIE4 always rack1).

Verifying this change

  • Make sure that the change passes the CI checks.

This change is already covered by existing tests, such as ModularLoadManagerImplTest.testRemoveNonExistBundleData and IsolatedBookieEnsemblePlacementPolicyTest.testSecondaryIsolationGroupsBookies.

Does this pull request potentially affect one of the following parts:

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

…semblePlacementPolicyTest

- ModularLoadManagerImplTest.testRemoveNonExistBundleData: Replace exact
  bundle count assertion with upper-bound check since not all bundles may
  have data written in the metadata store at the time of the check.

- IsolatedBookieEnsemblePlacementPolicyTest.testSecondaryIsolationGroupsBookies:
  Use consistent rack assignments across bookie groups to avoid
  non-deterministic rack resolution when the same bookie appears in
  multiple groups with different rack values. HashMap iteration order
  determined which rack assignment won, causing flaky failures in the
  rack-aware ensemble placement.
lhotari added 2 commits April 8, 2026 23:02
…ause

The previous fix weakened assertEquals(bundleNumbers, bundles.size()) to
assertTrue(bundles.size() <= bundleNumbers), which hides the real problem
and changes the test's original intention.

Root cause: writeResourceQuotasToZooKeeper() calls updateBundleData() which
reads from loadData.getBrokerData(). If the ZK notification from lm2's
writeBrokerDataOnZooKeeper hasn't been processed yet by lm1's executor,
lm1's loadData only contains bundles from one broker, so fewer than 8
bundles get written to the metadata store.

Fix: call lm1.updateAll() before writeResourceQuotasToZooKeeper() to
ensure lm1 has fetched both brokers' data from the metadata store. This
preserves the original strict assertion that all 8 bundles are present.
@merlimat merlimat merged commit d96da97 into apache:master Apr 8, 2026
43 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants