[MINOR][CI] Make federated client and monitoring threads daemon #2508

Merged
Baunsgaard merged 1 commit into apache:main from Baunsgaard:fix/fed-netty-daemon-threads
Jun 24, 2026

@Baunsgaard

Contributor

Fixes intermittent hangs in the data.misc/lineage Java test forks, which stalled until the 30-minute GitHub Actions job timeout. All test classes complete, but the fork JVM cannot exit because non-daemon threads leak from the federated stack. This is the same class of issue as #2488/#2502; this PR applies the fix to the remaining sources.

Several federated Netty event-loop groups and a monitoring stats pool used
non-daemon threads, which can keep a surefire test fork JVM alive after its
tests complete and stall the job until the GitHub Actions timeout. This is
the same class of leak addressed for the server-side worker and common
thread pools, applied to the remaining sources:

- FederatedData (client/coordinator): the lazily created NioEventLoopGroup
  used non-daemon threads and is only torn down via clearWorkGroup(). The
  group is shared across concurrently running coordinator tests, and its
  shutdownGracefully() is asynchronous. A missed cleanup or in-flight shutdown
  left a non-daemon event loop running indefinitely (no idle expiry), hanging
  the data.misc/lineage fork. Create it with a daemon thread factory like the
  worker side.
- FederatedMonitoringServer: boss/worker event-loop groups now use a daemon
  thread factory, preserving the default thread count.
- WorkerService: the scheduled stats-collection pool is never shut down and
  runs a fixed-rate task forever; make its threads daemon so it cannot block
  JVM exit.
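
As a rough illustration of the daemon thread factory idea in the list above, here is a minimal sketch that uses only java.util.concurrent. The class name DaemonThreads, the factory method, and the "stats" prefix are hypothetical names chosen for this example; this is not the SystemDS code, and the actual change may well use Netty's own thread factory classes instead.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper for illustration only, not the SystemDS implementation.
final class DaemonThreads {
    private DaemonThreads() {}

    /** Returns a factory whose threads are daemon and named "<prefix>-<n>". */
    static ThreadFactory factory(String prefix) {
        AtomicInteger counter = new AtomicInteger();
        return runnable -> {
            Thread t = new Thread(runnable, prefix + "-" + counter.incrementAndGet());
            t.setDaemon(true); // daemon threads do not keep the JVM alive
            return t;
        };
    }

    public static void main(String[] args) throws InterruptedException {
        // A fixed-rate task that is never cancelled, similar in spirit to a
        // stats-collection pool that is never shut down.
        ScheduledExecutorService stats =
            Executors.newSingleThreadScheduledExecutor(factory("stats"));
        stats.scheduleAtFixedRate(
            () -> System.out.println("collecting stats"), 0, 100, TimeUnit.MILLISECONDS);

        Thread.sleep(250);
        // main returns without calling shutdown(). The JVM can still exit
        // because the only remaining pool thread is a daemon.
    }
}
```

With the default (non-daemon) factory, the same main method would never return control to the shell, which is the behaviour that stalled the test fork. Netty's event-loop group constructors also accept a ThreadFactory, so the same pattern should carry over there.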

Graceful shutdown remains the normal path; daemon threads ensure a leaked or
in-flight group can never block fork/JVM exit.
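
To make the "graceful shutdown first, daemon threads as a safety net" point concrete, the sketch below pairs a bounded orderly shutdown with a daemon-backed pool. It uses a plain ExecutorService rather than a Netty event-loop group, and ShutdownSketch, newDaemonPool, and shutdownQuietly are hypothetical names for this example only.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: not the SystemDS or Netty shutdown code.
final class ShutdownSketch {
    private ShutdownSketch() {}

    /** Creates a fixed pool whose threads are daemon (the safety net). */
    static ExecutorService newDaemonPool(int threads) {
        return Executors.newFixedThreadPool(threads, r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);
            return t;
        });
    }

    /**
     * Normal path: orderly shutdown with a bounded wait.
     * Returns true if the pool terminated within the allowed time.
     */
    static boolean shutdownQuietly(ExecutorService pool, long timeoutMs) {
        pool.shutdown(); // stop accepting work, let queued tasks finish
        try {
            if (pool.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS))
                return true;
            pool.shutdownNow(); // interrupt tasks that are still running
            return pool.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt(); // preserve interrupt status
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = newDaemonPool(2);
        pool.submit(() -> System.out.println("work"));
        boolean clean = shutdownQuietly(pool, 1000);
        System.out.println("terminated=" + clean);
        // Even if shutdownQuietly were skipped or returned false, the daemon
        // threads would not prevent the JVM from exiting.
    }
}
```

The design choice mirrors the description above: cleanup code still runs and is still the expected path, but correctness of JVM exit no longer depends on every caller reaching it.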
@codecov

codecov Bot commented Jun 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 71.54%. Comparing base (e295b40) to head (167f977).

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2508      +/-   ##
============================================
- Coverage     71.56%   71.54%   -0.03%     
+ Complexity    49052    49034      -18     
============================================
  Files          1574     1574              
  Lines        189565   189570       +5     
  Branches      37188    37188              
============================================
- Hits         135658   135622      -36     
- Misses        43422    43466      +44     
+ Partials      10485    10482       -3     

@Baunsgaard Baunsgaard merged commit 77da773 into apache:main Jun 24, 2026
50 checks passed
@github-project-automation github-project-automation Bot moved this from In Progress to Done in SystemDS PR Queue Jun 24, 2026