Shard subscription-tree-regression-consumer to speed up Multi-Cluster IT by JackieTien97 · Pull Request #17694 · apache/iotdb

JackieTien97 · 2026-05-17T01:51:59Z

Summary

Splits the subscription-tree-regression-consumer job in pipe-it.yml into 3 parallel matrix shards. Expected: ~30–45 min → ~12–18 min per shard, removing it as the Multi-Cluster IT pipeline's long pole.
Reuses the $RUNNER_TEMP/it-shard.txt + -Dfailsafe.includesFile pattern from PR #17692, "Shard Windows IT jobs to speed up 1C1D and Table 1C1D CI" (the 1C1D Windows sharding work).
Emits paths relative to src/test/java/ (e.g. org/apache/iotdb/.../IoTDBFooIT.java) rather than bare class names — needed because this suite has 6 pairs of duplicate simple names across pushconsumer/multi/ and pullconsumer/multi/ that would otherwise run twice across shards.
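
A minimal sketch of how one shard's include list could be produced, based on the commands in this description rather than on the actual workflow step. The helper name `make_shard_list` and the `SHARD` variable are assumptions made for illustration; the round-robin split over the sorted list mirrors the `NR%t==s` filter used under Local verification.

```shell
# make_shard_list ROOT CATEGORY SHARD TOTAL   (hypothetical helper, not from pipe-it.yml)
# Prints paths relative to src/test/java/ for every *IT.java file under ROOT
# that mentions CATEGORY and lands in shard SHARD of TOTAL.
make_shard_list() {
  grep -rlE --include='*IT.java' "\\b$2\\b" "$1" \
    | sed 's|.*/src/test/java/||' \
    | LC_ALL=C sort \
    | awk -v s="$3" -v t="$4" 'NR % t == s'
}

# Possible use inside a CI step, with SHARD supplied by the job matrix (assumed name):
#   make_shard_list integration-test/src/test/java \
#     MultiClusterIT2SubscriptionTreeRegressionConsumer "$SHARD" 3 > "$RUNNER_TEMP/it-shard.txt"
# and then add -Dfailsafe.includesFile="$RUNNER_TEMP/it-shard.txt" to the job's existing Maven command.
```

Sorting under `LC_ALL=C` keeps the assignment of classes to shards identical across runners, so every shard sees the same ordering and no class is dropped or duplicated by a locale difference.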

Why this job specifically

The Multi-Cluster IT pipeline currently runs 11 parallel jobs. All others finish in 10–20 min; subscription-tree-regression-consumer runs 72 IT classes serially in a single forkCount=1 JVM, each restarting a 2-cluster environment. That single job dictates the whole pipeline's wall clock on every PR.

Other subscription/dual-cluster jobs in pipe-it.yml are not sharded:

subscription-tree-regression-misc (13 classes) — borderline; defer to follow-up
arch-verification jobs (1–4 classes each) — sharding overhead would exceed savings
dual-tree/dual-table jobs (9–13 classes) — already under the new shard wall clock

Local verification

$ grep -rlE --include='*IT.java' '\bMultiClusterIT2SubscriptionTreeRegressionConsumer\b' integration-test/src/test/java | wc -l
72
$ for s in 0 1 2; do
    grep -rlE --include='*IT.java' '\bMultiClusterIT2SubscriptionTreeRegressionConsumer\b' integration-test/src/test/java \
      | sed 's|.*/src/test/java/||' | sort | awk -v s="$s" -v t=3 'NR%t==s' | wc -l
  done
24
24
24
$ # And unique-paths == total (i.e. no collisions after disambiguation):
$ grep -rlE --include='*IT.java' '\bMultiClusterIT2SubscriptionTreeRegressionConsumer\b' integration-test/src/test/java | sed 's|.*/src/test/java/||' | sort -u | wc -l
72
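
The duplicate simple names that motivate the relative-path format can be listed with a small filter. This is a sketch only; `dup_simple_names` is a hypothetical helper name, and it assumes one relative path per line on stdin.

```shell
# dup_simple_names: read relative paths on stdin and print every simple class
# name (file name without directory and .java suffix) that occurs more than once.
dup_simple_names() {
  sed -e 's|.*/||' -e 's|\.java$||' | LC_ALL=C sort | uniq -d
}

# Possible use against the same file list as above:
#   grep -rlE --include='*IT.java' '\bMultiClusterIT2SubscriptionTreeRegressionConsumer\b' \
#     integration-test/src/test/java | sed 's|.*/src/test/java/||' | dup_simple_names
```

If the description above is accurate, that pipeline should list the six names shared between pushconsumer/multi/ and pullconsumer/multi/.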

Test plan

CI: 3 parallel subscription-tree-regression-consumer (…, 0/1/2) jobs appear under Multi-Cluster IT and all go green.
Each shard finishes in ~12–18 min (down from ~30–45 min for the single un-sharded job).
No `Files with unapproved licenses` warning referencing `it-shard.txt` in any shard's log.
Union of executed test classes across the 3 shards == 72 (no class missing, no class run twice — verify via surefire-reports artifacts).
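
The last check could be scripted roughly as follows. This is a sketch under assumptions: `check_shards` is a hypothetical helper, and it expects plain text files with one class or path per line (the full list first, then one file per shard), which would have to be extracted from the surefire-reports artifacts by other means.

```shell
# check_shards FULL SHARD_FILE...   (hypothetical helper)
# Exits 0 only if every entry of FULL appears in exactly one shard file.
check_shards() (
  full=$1; shift
  tmp=$(mktemp -d) || exit 2
  LC_ALL=C sort -u "$full" > "$tmp/expected"
  cat "$@" | LC_ALL=C sort > "$tmp/ran"
  LC_ALL=C sort -u "$tmp/ran" > "$tmp/ran_unique"
  status=0
  # Entries that appear more than once across the shard files.
  dups=$(uniq -d "$tmp/ran")
  if [ -n "$dups" ]; then
    echo "run more than once:"; echo "$dups"; status=1
  fi
  # Entries of the full list that no shard picked up.
  missing=$(LC_ALL=C comm -23 "$tmp/expected" "$tmp/ran_unique")
  if [ -n "$missing" ]; then
    echo "missing from all shards:"; echo "$missing"; status=1
  fi
  rm -rf "$tmp"
  exit "$status"
)

# Possible use: check_shards all.txt shard0.txt shard1.txt shard2.txt
```

The function body runs in a subshell, so its temporary variables and its `exit` do not leak into the calling script.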

Tracker

This is item #2 from "Remaining bottlenecks" in the CI optimization status doc. The other two (AINode cold build and broader Subscription/daily-it sharding) will be addressed separately.

The Multi-Cluster IT pipeline (pipe-it.yml) runs 11 parallel jobs on every PR. Of those, subscription-tree-regression-consumer is the longest pole: 72 IT classes annotated with @Category(MultiClusterIT2SubscriptionTreeRegressionConsumer.class), each restarting two ScalableSingleNodeMode clusters in setUp(), executed serially in a single forkCount=1 JVM. Estimated wall clock ~30-45 min, while every other job in the workflow finishes in ~10-20 min.

Split this job into 3 parallel matrix shards using the same hash-mod pattern that cluster-it-1c1d.yml introduced (commits 89748f1, a343cf5, 02ef20a). Each shard runs ~24 of the 72 classes and is expected to finish in ~12-18 min, removing this job as the workflow's bottleneck. The shard list is written to $RUNNER_TEMP/it-shard.txt for the same RAT-avoidance reason as 1C1D.

Two deviations from the 1C1D pattern:

1. The shard list emits paths relative to src/test/java/ (e.g., org/apache/iotdb/.../IoTDBFooIT.java) instead of bare class names. This suite has 6 pairs of duplicate simple names across pushconsumer/multi/ and pullconsumer/multi/ (e.g., IoTDBOneConsumerMultiTopicsTsfileIT exists in both). Bare names would cause failsafe to match both files for each entry, running those 6 classes twice across shards.
2. The other subscription / dual-cluster jobs in this workflow are not sharded. subscription-tree-regression-misc (13 classes) is borderline; arch-verification jobs (1-4 classes each) and dual-tree/dual-table jobs (9-13 classes) are well under the new shard wall clock and would not benefit. Revisit if any of them becomes the new long pole.

Local counts on macOS:

- Total classes matching the annotation: 72
- Per-shard distribution after hash-mod: 24/24/24
- Unique paths after sed normalization: 72 (no collisions)

sonarqubecloud · 2026-05-17T01:58:56Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

JackieTien97 · 2026-05-17T02:11:33Z

Closing — this PR optimized the wrong target.

After looking at actual durations from 3 recent successful Multi-Cluster IT runs, subscription-tree-regression-consumer was already finishing in ~5 minutes unsharded; my pre-PR estimate of 30-45 min was wrong. The real long poles are the dual-* jobs:

| Job | Duration | Classes |
| --- | --- | --- |
| dual-table-manual-basic | ~63 min | 13 |
| dual-table-manual-enhanced | ~62 min | 11 |
| dual-tree-auto-enhanced | ~51 min | 9 |
| dual-tree-auto-basic | ~42 min | 12 |
| dual-tree-manual | ~27 min | 11 |
| subscription-tree-regression-consumer | ~5 min | 72 |

Sharding subscription added ~10 runner-min per CI run for zero wall-clock benefit. A follow-up PR will shard the dual-* jobs above instead — that should cut Multi-Cluster IT wall clock from ~63 min to ~22 min.
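
To make the reasoning concrete: the jobs run in parallel, so the pipeline's wall clock is roughly the duration of the slowest job, not the sum. The sketch below picks that job from the six durations measured above; `long_pole` is a hypothetical helper name, and the input format of one "job minutes" pair per line is an assumption.

```shell
# long_pole: read "<job> <minutes>" lines on stdin, print the slowest job.
long_pole() {
  LC_ALL=C sort -k2,2nr | head -n 1
}

printf '%s\n' \
  'dual-table-manual-basic 63' \
  'dual-table-manual-enhanced 62' \
  'dual-tree-auto-enhanced 51' \
  'dual-tree-auto-basic 42' \
  'dual-tree-manual 27' \
  'subscription-tree-regression-consumer 5' \
  | long_pole
# prints: dual-table-manual-basic 63
```

Splitting the 5-minute job leaves this maximum unchanged, which is why sharding it costs runner minutes without shortening the pipeline.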

codecov · 2026-05-17T02:59:41Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 40.39%. Comparing base (2f57fd6) to head (b996d09).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files

@@             Coverage Diff              @@
##             master   #17694      +/-   ##
============================================
- Coverage     40.39%   40.39%   -0.01%     
  Complexity     2574     2574              
============================================
  Files          5179     5179              
  Lines        349628   349628              
  Branches      44683    44683              
============================================
- Hits         141243   141215      -28     
- Misses       208385   208413      +28

JackieTien97 closed this May 17, 2026

JackieTien97 deleted the shard-subscription-consumer-it branch May 17, 2026 02:11

JackieTien97 mentioned this pull request May 17, 2026

Shard 5 dual-cluster jobs to speed up Multi-Cluster IT #17695

Merged

4 tasks

JackieTien97 wants to merge 1 commit into master from shard-subscription-consumer-it
