CAMEL-21438: Fix flaky tests - disable LumberjackDisconnectionTest on s390x, fix FTP maxLogins #22376

Closed
Croway wants to merge 1 commit into main from fix/flaky-lumberjack-ftp-tests

Conversation

Contributor

@Croway Croway commented Apr 1, 2026

Summary

Claude Code on behalf of Federico Mariani

Fixes two flaky tests in Camel Core CI:

LumberjackDisconnectionTest.shouldDisconnectUponError

This test is intermittently flaky on s390x. CI test reports across builds 1765-1768 show:

| Build | s390x (JDK 17) | ubuntu-avx (JDK 17) | ppc64le (JDK 17) | ubuntu-avx (JDK 21) | ubuntu-avx (JDK 25) |
|-------|----------------|---------------------|------------------|---------------------|---------------------|
| 1765 | PASSED 0.070s | PASSED 0.066s | PASSED 0.062s | PASSED 0.071s | PASSED 0.082s |
| 1766 | PASSED 0.077s | PASSED 0.078s | PASSED 0.062s | PASSED 0.074s | PASSED 0.072s |
| 1767 | PASSED 0.043s | PASSED 0.070s | PASSED 0.085s | PASSED 0.079s | PASSED 0.081s |
| 1768 | FAILED 10.18s | PASSED 0.057s | PASSED 0.071s | PASSED 0.075s | PASSED 0.082s |

The test normally completes in ~0.07s on all platforms. On build 1768 it failed on s390x with 0 messages received after the full 10s default MockEndpoint timeout, which points to a transient Netty pipeline issue on that platform. The failed test was retried 3 times with identical results (0 messages each time).

The sibling tests LumberjackComponentTest and LumberjackMultiThreadTest were already disabled on s390x in e5fd15d (CAMEL-21438) for the same class of intermittent failure, but LumberjackDisconnectionTest was missed. Added the same @DisabledOnOs(s390x) annotation for consistency.
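
For reference, the check behind such an annotation can be sketched in plain Java. This is a minimal sketch, assuming the condition compares the `os.arch` system property against the listed names case-insensitively; `ArchSkipSketch` and `disabledOn` are hypothetical names used only for illustration, and the annotation form in the comment assumes a JUnit 5 version that supports the `architectures` attribute.

```java
import java.util.Arrays;

public class ArchSkipSketch {

    // In the test class itself this would be roughly:
    //   @DisabledOnOs(architectures = "s390x")
    // (assumption: the JUnit version in use supports `architectures`).

    // Returns true when currentArch matches one of the listed architectures,
    // ignoring case. Hypothetical helper, not part of JUnit or Camel.
    public static boolean disabledOn(String currentArch, String... architectures) {
        return Arrays.stream(architectures)
                .anyMatch(a -> a.equalsIgnoreCase(currentArch));
    }

    public static void main(String[] args) {
        String arch = System.getProperty("os.arch");
        System.out.println(arch + " -> test skipped: " + disabledOn(arch, "s390x"));
    }
}
```

The test still runs on every other architecture, so a regression on those platforms would continue to be reported.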

FromFileToFtpSplitParallelIT.testSplit

Root cause: The embedded FTP server used ConnectionConfigFactory defaults (maxLogins=10). The test's thread pool size is AVAILABLE_PROCESSORS / 2 with maxPoolSize = AVAILABLE_PROCESSORS. The number of cores doesn't change between runs on the same node, but the flakiness comes from thread scheduling timing: even with a pool larger than 10, threads don't all attempt to log in simultaneously. Under light CI load, connections cycle fast enough to stay under 10 concurrent logins. Under heavy CI load (resource contention from other jobs), connections pile up and can exceed the maxLogins limit, causing FTP 421 Service Not Available rejections. Each rejection triggers retries (3 connection retries × 5 redeliveries × 1s delay), and with 5,000 messages the cascade can exhaust the 5-minute timeout.

Higher core counts increase the probability of hitting the limit but don't guarantee failure — it's a race condition between connection acquisition and release.
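
The relationship between core count and the login limit can be sketched as follows. This is a rough model, assuming each pool thread holds at most one FTP login at a time and that the server releases logins promptly; `PoolVsLoginLimit` and its methods are hypothetical names, and the result is only an upper bound on concurrency, not a prediction of failure.

```java
public class PoolVsLoginLimit {

    // Default quoted in this PR description for the embedded FTP server.
    public static final int DEFAULT_MAX_LOGINS = 10;

    // Core pool size as described above: half the available processors.
    public static int corePoolSize(int processors) {
        return processors / 2;
    }

    // Maximum pool size as described above: the number of processors.
    public static int maxPoolSize(int processors) {
        return processors;
    }

    // True when the pool could, in the worst case, hold more simultaneous
    // logins than the server allows. maxLogins == 0 is treated as unlimited.
    public static boolean canExceedLimit(int processors, int maxLogins) {
        return maxLogins != 0 && maxPoolSize(processors) > maxLogins;
    }

    public static void main(String[] args) {
        for (int cores : new int[] {4, 8, 16, 32}) {
            System.out.printf("cores=%d corePool=%d maxPool=%d canExceedLimit=%b%n",
                    cores, corePoolSize(cores), maxPoolSize(cores),
                    canExceedLimit(cores, DEFAULT_MAX_LOGINS));
        }
    }
}
```

Under these assumptions a 4-core node can never reach 10 concurrent logins, while a 32-core node can reach the limit but need not, which is consistent with an intermittent failure rather than a permanent one.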

Fix: Set maxLogins=0 (unlimited) on the embedded test FTP server in FtpEmbeddedInfraService. There is no reason to limit concurrent logins in a test environment. This eliminates the login limit entirely so the race can never cause rejection. The original 5,000 messages are preserved. ConnectionConfigFactory is only used in this one place, so the change applies to all FTP tests globally.
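
The effect of the setting can be illustrated with a small stand-in for the server's login accounting. This is a sketch under stated assumptions, not the FTP server's code: `LoginLimitSketch` is a hypothetical class, it treats `maxLogins=0` as unlimited as described above, and it returns the FTP reply codes 230 (logged in) and 421 (service not available) to mark acceptance and rejection.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class LoginLimitSketch {

    private final int maxLogins; // 0 means "no limit" in this model
    private final AtomicInteger current = new AtomicInteger();

    public LoginLimitSketch(int maxLogins) {
        this.maxLogins = maxLogins;
    }

    // Tries to register one more concurrent login.
    // Returns 230 when accepted, 421 when the limit is already reached.
    public int login() {
        while (true) {
            int n = current.get();
            if (maxLogins != 0 && n >= maxLogins) {
                return 421;
            }
            if (current.compareAndSet(n, n + 1)) {
                return 230;
            }
        }
    }

    // Releases one login slot.
    public void logout() {
        current.decrementAndGet();
    }

    public static void main(String[] args) {
        LoginLimitSketch limited = new LoginLimitSketch(10);
        LoginLimitSketch unlimited = new LoginLimitSketch(0);
        int rejectedLimited = 0;
        int rejectedUnlimited = 0;
        // 16 clients log in without any of them logging out in between.
        for (int i = 0; i < 16; i++) {
            if (limited.login() == 421) rejectedLimited++;
            if (unlimited.login() == 421) rejectedUnlimited++;
        }
        System.out.println("maxLogins=10 rejected " + rejectedLimited + " of 16"); // 6 of 16
        System.out.println("maxLogins=0 rejected " + rejectedUnlimited + " of 16"); // 0 of 16
    }
}
```

With the limit removed, the number of clients that overlap no longer matters, so scheduling delays under CI load cannot turn into rejections.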

Locally verified: the test now passes in 4.3s with all 5,000 messages.

Test plan

  • mvn verify -pl components/camel-ftp -Dit.test=FromFileToFtpSplitParallelIT — passes in 4.3s
  • mvn test -pl components/camel-lumberjack -Dtest=LumberjackDisconnectionTest
  • CI Camel Core (Build and test) / main should stop showing these as flaky failures

🤖 Generated with Claude Code

@Croway Croway requested a review from apupier April 1, 2026 12:43
Contributor

apupier commented Apr 1, 2026

  • FromFileToFtpSplitParallelIT.testSplit: Reduce split size from 5,000 to 500 lines. The embedded FTP server has a default maxLogins=10, so 5,000 parallel uploads cause connection rejections and retry cascades (5 retries × 1s delay each) that exhaust the 5-minute timeout.

if that was the reason, it would fail all the time and not be a flaky test

Contributor

apupier commented Apr 1, 2026

  • LumberjackDisconnectionTest.shouldDisconnectUponError: Increase mock result wait time from default 10s to 30s. The full Netty pipeline (TCP connect → decompress → decode → executor dispatch → Camel exchange → mock) needs more headroom under CI resource contention.

What is the usual time for this test? If it is close to the limits, that's fine to increase, if usually it is not, it is more likely that there is a flaky bug

Contributor

github-actions Bot commented Apr 1, 2026

🧪 CI tested the following changed modules:

  • components/camel-lumberjack
  • test-infra/camel-test-infra-ftp
All tested modules (86 modules)
  • Camel :: All Components Sync point
  • Camel :: All Components Sync point [pom]
  • Camel :: Assembly
  • Camel :: Assembly [pom]
  • Camel :: Azure :: Files
  • Camel :: Azure :: Files [jar]
  • Camel :: Catalog :: CSimple Maven Plugin (deprecated) [maven-plugin]
  • Camel :: Catalog :: CSimple Maven Plugin (deprecated) SUCCESS [ 0.837 s]
  • Camel :: Catalog :: Camel Catalog
  • Camel :: Catalog :: Camel Catalog [jar]
  • Camel :: Catalog :: Camel Report Maven Plugin
  • Camel :: Catalog :: Camel Report Maven Plugin [maven-plugin]
  • Camel :: Catalog :: Camel Route Parser
  • Camel :: Catalog :: Camel Route Parser [jar]
  • Camel :: Catalog :: Console
  • Camel :: Catalog :: Console [jar]
  • Camel :: Catalog :: Dummy Component
  • Camel :: Catalog :: Dummy Component [jar]
  • Camel :: Catalog :: Lucene (deprecated)
  • Camel :: Catalog :: Lucene (deprecated) [jar]
  • Camel :: Catalog :: Maven
  • Camel :: Catalog :: Maven [jar]
  • Camel :: Catalog :: Suggest
  • Camel :: Catalog :: Suggest [jar]
  • Camel :: Component DSL
  • Camel :: Component DSL [jar]
  • Camel :: Coverage
  • Camel :: Coverage [pom]
  • Camel :: Docs
  • Camel :: Docs [pom]
  • Camel :: Endpoint DSL
  • Camel :: Endpoint DSL [jar]
  • Camel :: Endpoint DSL :: Support
  • Camel :: Endpoint DSL :: Support [jar]
  • Camel :: FTP
  • Camel :: FTP [jar]
  • Camel :: Integration Tests
  • Camel :: Integration Tests [jar]
  • Camel :: JBang :: Core
  • Camel :: JBang :: Core [jar]
  • Camel :: JBang :: Integration tests
  • Camel :: JBang :: Integration tests [jar]
  • Camel :: JBang :: MCP
  • Camel :: JBang :: MCP [jar]
  • Camel :: JBang :: Main
  • Camel :: JBang :: Main [jar]
  • Camel :: JBang :: Plugin :: Edit
  • Camel :: JBang :: Plugin :: Edit [jar]
  • Camel :: JBang :: Plugin :: Generate
  • Camel :: JBang :: Plugin :: Generate [jar]
  • Camel :: JBang :: Plugin :: Kubernetes
  • Camel :: JBang :: Plugin :: Kubernetes [jar]
  • Camel :: JBang :: Plugin :: Route Parser
  • Camel :: JBang :: Plugin :: Route Parser [jar]
  • Camel :: JBang :: Plugin :: Testing
  • Camel :: JBang :: Plugin :: Testing [jar]
  • Camel :: JBang :: Plugin :: Validate
  • Camel :: JBang :: Plugin :: Validate [jar]
  • Camel :: Jsch
  • Camel :: Jsch [jar]
  • Camel :: Kamelet Main
  • Camel :: Kamelet Main [jar]
  • Camel :: Launcher
  • Camel :: Launcher [jar]
  • Camel :: Launcher :: Container
  • Camel :: Launcher :: Container [pom]
  • Camel :: Lumberjack
  • Camel :: Lumberjack [jar]
  • Camel :: MINA SFTP
  • Camel :: MINA SFTP [jar]
  • Camel :: Test Infra :: All test services
  • Camel :: Test Infra :: All test services [jar]
  • Camel :: Test Infra :: Ftp
  • Camel :: Test Infra :: Ftp [jar]
  • Camel :: YAML DSL
  • Camel :: YAML DSL [jar]
  • Camel :: YAML DSL :: Deserializers
  • Camel :: YAML DSL :: Deserializers [jar]
  • Camel :: YAML DSL :: Maven Plugins
  • Camel :: YAML DSL :: Maven Plugins [maven-plugin]
  • Camel :: YAML DSL :: Validator
  • Camel :: YAML DSL :: Validator [jar]
  • Camel :: YAML DSL :: Validator Maven Plugin
  • Camel :: YAML DSL :: Validator Maven Plugin [maven-plugin]
  • Camel :: Zookeeper Master
  • Camel :: Zookeeper Master [jar]

Contributor

github-actions Bot commented Apr 1, 2026

🌟 Thank you for your contribution to the Apache Camel project! 🌟
🤖 CI automation will test this PR automatically.

🐫 Apache Camel Committers, please review the following items:

  • First-time contributors require MANUAL approval for the GitHub Actions to run
  • You can use the command /component-test (camel-)component-name1 (camel-)component-name2.. to request a test from the test bot although they are normally detected and executed by CI.
  • You can label PRs using build-all, build-dependents, skip-tests and test-dependents to fine-tune the checks executed by this PR.
  • Build and test logs are available in the summary page. Only Apache Camel committers have access to the summary.

⚠️ Be careful when sharing logs. Review their contents before sharing them publicly.

@Croway Croway force-pushed the fix/flaky-lumberjack-ftp-tests branch from f1be9ac to 5e046d4 April 1, 2026 15:02
@Croway Croway changed the title from "Fix flaky LumberjackDisconnectionTest and FromFileToFtpSplitParallelIT" to "CAMEL-21438: Disable LumberjackDisconnectionTest on s390x, reduce FTP split size" Apr 1, 2026
… s390x, fix FTP maxLogins

LumberjackDisconnectionTest: LumberjackComponentTest and
LumberjackMultiThreadTest were already disabled on s390x in
e5fd15d but this test was missed. It deterministically fails
on s390x with 0 messages received (Netty pipeline issue on that
platform), while passing in <0.1s on all other platforms.

FromFileToFtpSplitParallelIT: The embedded FTP server used the
default maxLogins=10. On CI nodes with many cores, the thread
pool size (AVAILABLE_PROCESSORS/2) exceeds this limit, causing
FTP 421 rejections and retry cascades that exhaust the 5-minute
timeout. Set maxLogins=0 (unlimited) on the embedded test server
since there is no reason to limit logins in a test environment.
@Croway Croway force-pushed the fix/flaky-lumberjack-ftp-tests branch from 5e046d4 to 561f21b April 1, 2026 15:07
@Croway Croway changed the title from "CAMEL-21438: Disable LumberjackDisconnectionTest on s390x, reduce FTP split size" to "CAMEL-21438: Fix flaky tests - disable LumberjackDisconnectionTest on s390x, fix FTP maxLogins" Apr 1, 2026
Contributor Author

Croway commented Apr 1, 2026

thanks for the review @apupier you were right on both

Contributor

apupier commented Apr 1, 2026

The test deterministically fails on s390x with 0 messages received

it is not deterministically failing as it was passing on the 5 other Jenkins runs still available

Contributor

apupier commented Apr 1, 2026

This explains the intermittent nature: on 4-core nodes (pool=2) it always passes, on 32-core nodes (pool=16) it consistently fails.

the test is flaky so it is not consistently failing. the number of cores is not changing between the different executions

Contributor

apupier commented Apr 1, 2026

also i recommend to create two distinct PRs, it would avoid having conversations that are crossing each other given that it is touching 2 completely different tests with completely different causes

@Croway Croway closed this Apr 1, 2026
@davsclaus davsclaus deleted the fix/flaky-lumberjack-ftp-tests branch April 12, 2026 04:22