[Enhancement] Improve RocketMQ test suite stability and quality #10373

## Before Creating the Enhancement Request

- [x] I have confirmed that this should be classified as an enhancement rather than a bug or a new feature.

## Summary

A three-phase initiative to systematically improve the RocketMQ test suite: (1) detect and quarantine flaky tests through large-scale repeated execution; (2) root-cause and fix the quarantined tests; (3) rework tests with performance and design problems, such as excessive sleeps, overly long output, and slow execution.

## Motivation

Flaky tests erode developer confidence in CI signals. When tests fail non-deterministically, developers start ignoring red builds, which masks real regressions. Beyond flakiness, some tests are poorly designed: hard-coded `Thread.sleep()` calls make them fragile and slow, excessive log output makes failures hard to diagnose, and unnecessary resource initialization inflates total CI time. All three layers need to be addressed to reach a fast, reliable, and maintainable test suite.

## Describe the Solution You'd Like

---

### Phase 1: Detection & Quarantine

**Approach:** Run the full RocketMQ test suite 100× across 10 ECS nodes, using a three-layer funnel (module → class → method) to narrow the scope step by step and statistically identify non-deterministic failures. Quarantine every method with a failure rate of ≥1% by annotating it with `@Ignore`.

**Reference:** This follows the "deflake + quarantine" methodology that Google described in [Flaky Tests at Google and How We Mitigate Them](https://testing.googleblog.com/2016/05/flaky-tests-at-google-and-how-we.html) (2016).
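The method-level quarantine rule from the approach above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: `QuarantineFilter`, its method names, and the `Class#method` key format are hypothetical and not part of RocketMQ or its CI tooling, and the sketch covers only the final (method) layer of the funnel, taking per-method failure counts as given.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of RocketMQ or its CI tooling.
final class QuarantineFilter {
    private QuarantineFilter() {}

    /**
     * True when failures / totalRuns >= 1%, the threshold named in the proposal.
     * Integer arithmetic avoids floating-point rounding at the exact boundary.
     */
    static boolean shouldQuarantine(int failures, int totalRuns) {
        if (totalRuns <= 0 || failures < 0 || failures > totalRuns) {
            throw new IllegalArgumentException("invalid counts: " + failures + "/" + totalRuns);
        }
        return failures * 100L >= totalRuns;
    }

    /** Returns the methods that meet the threshold, sorted by name for stable output. */
    static List<String> methodsToQuarantine(Map<String, Integer> failuresByMethod, int totalRuns) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : new TreeMap<>(failuresByMethod).entrySet()) {
            if (shouldQuarantine(entry.getValue(), totalRuns)) {
                result.add(entry.getKey()); // candidate for @Ignore
            }
        }
        return result;
    }
}
```

With 1,000 runs per method, a method that failed 10 times is flagged, while one that failed 9 times is not.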

---

### Phase 2: Root-Cause & Fix

**Approach:** For each quarantined test, analyze the root cause (race conditions, resource conflicts, time-dependent assertions, shared mutable state, etc.) and apply a targeted fix. Priority: tests with a high failure rate (≥10%) first, then moderate ones (1%-10%). Exit criteria: remove `@Ignore` and re-run 100× with zero failures.
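The prioritization and exit criteria above could be expressed roughly as follows. This is a minimal sketch, not existing tooling: `FixQueue`, `FlakyTest`, and all method names are hypothetical, and treating exactly 10% as high priority is an assumption based on the "≥10%" wording.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of RocketMQ or its CI tooling.
final class FixQueue {
    private FixQueue() {}

    enum Priority { HIGH, MODERATE, BELOW_THRESHOLD }

    /** Failure statistics for one quarantined test method. */
    static final class FlakyTest {
        final String name;
        final int failures;
        final int runs;

        FlakyTest(String name, int failures, int runs) {
            if (runs <= 0 || failures < 0 || failures > runs) {
                throw new IllegalArgumentException("invalid counts for " + name);
            }
            this.name = name;
            this.failures = failures;
            this.runs = runs;
        }

        double failureRate() {
            return (double) failures / runs;
        }
    }

    /** HIGH at a failure rate of 10% or more, MODERATE from 1% up to 10%. */
    static Priority priorityOf(FlakyTest test) {
        if (test.failures * 10L >= test.runs) {
            return Priority.HIGH;
        }
        if (test.failures * 100L >= test.runs) {
            return Priority.MODERATE;
        }
        return Priority.BELOW_THRESHOLD;
    }

    /** Returns a copy sorted by failure rate, highest first, so HIGH tests come before MODERATE. */
    static List<FlakyTest> inFixOrder(List<FlakyTest> tests) {
        List<FlakyTest> sorted = new ArrayList<>(tests);
        sorted.sort((a, b) -> Double.compare(b.failureRate(), a.failureRate()));
        return sorted;
    }

    /** Exit criteria from the proposal: at least 100 re-runs with zero failures. */
    static boolean meetsExitCriteria(int failuresAfterFix, int rerunCount) {
        return rerunCount >= 100 && failuresAfterFix == 0;
    }
}
```

Sorting by the raw failure rate, rather than by bucket alone, also orders tests within the same priority band.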

---

### Phase 3: Deep Quality Improvement

**Approach:** Beyond flakiness, identify and improve tests with structural quality issues: excessive `Thread.sleep` (replace with condition-based waiting, for example `Awaitility`), overly verbose output (tune test log levels), slow execution caused by unnecessary full-component startup (use targeted mocks), and resource leaks (enforce proper cleanup). The goal is a test suite that runs fast, fails clearly, and does not develop new flakiness over time.
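To illustrate why waiting on a condition beats a fixed `Thread.sleep`, here is a standard-library-only sketch of the idea. It is not Awaitility's API and not RocketMQ code: `ConditionWait` and `awaitUntil` are hypothetical names, and a real change would more likely use `Awaitility` itself, as the proposal suggests.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for a condition-waiting library; for illustration only.
final class ConditionWait {
    private ConditionWait() {}

    /**
     * Polls the condition until it holds or the timeout elapses.
     * Returns true as soon as the condition holds, so a passing test does not
     * pay the full timeout the way a fixed Thread.sleep would.
     */
    static boolean awaitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        if (timeoutMillis < 0 || pollMillis <= 0) {
            throw new IllegalArgumentException("timeout must be >= 0 and poll interval > 0");
        }
        final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        final long pollNanos = TimeUnit.MILLISECONDS.toNanos(pollMillis);
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
                return false; // timed out; the caller decides how to fail the test
            }
            try {
                TimeUnit.NANOSECONDS.sleep(Math.min(remaining, pollNanos));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
    }
}
```

A test that previously slept for a fixed five seconds before asserting could instead assert on `awaitUntil(condition, 5000, 50)`. The timeout then serves only as an upper bound, and the test finishes as soon as the condition holds.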

---

## Additional Context

- **Methodology:** 100 iterations × 10 nodes ≈ 1,000 effective runs per test method.
- **Industry reference:** Google (deflake + quarantine, 2016), Meta (aggressive retry, 2018), Spotify (three-stage flaky test governance, 2019).
- **Phased delivery:** Phase 1 is complete. Phases 2 and 3 are tracked as follow-up work items, with an individual sub-issue per test.
- **Success metric:** CI green rate on the main branch improves from ~85% to >99%.
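As a rough check on why about 1,000 runs per method is a useful sample size: if each run of a test fails independently with probability p, the chance of seeing at least one failure in n runs is 1 − (1 − p)^n. Independence is an assumption made here for illustration and will not hold exactly on shared CI nodes. The sketch below computes this value; `DetectionOdds` is a hypothetical name.

```java
// Hypothetical helper for illustration; assumes runs fail independently.
final class DetectionOdds {
    private DetectionOdds() {}

    /** Probability of observing at least one failure in `runs` independent runs. */
    static double atLeastOneFailure(double failureProbability, int runs) {
        if (failureProbability < 0.0 || failureProbability > 1.0 || runs < 0) {
            throw new IllegalArgumentException("probability must be in [0, 1] and runs >= 0");
        }
        // Complement of "every run passes".
        return 1.0 - Math.pow(1.0 - failureProbability, runs);
    }
}
```

Under that assumption, a test with a true 1% failure rate shows at least one failure in roughly 63% of 100-run campaigns, and in more than 99.99% of 1,000-run campaigns.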
