Skip to content

Add generative contract tests for connection pools and async IT coverage for cancel/timeout scenarios#644

Open
arturobernalg wants to merge 1 commit intoapache:masterfrom
arturobernalg:testing/pool-contract-fuzz
Open

Add generative contract tests for connection pools and async IT coverage for cancel/timeout scenarios#644
arturobernalg wants to merge 1 commit intoapache:masterfrom
arturobernalg:testing/pool-contract-fuzz

Conversation

@arturobernalg
Copy link
Member

This change adds seeded generative contract tests for the connection pool implementations (STRICT, LAX, OFFLOCK) that run a shared operation trace and assert core invariants (no negative stats, no double-lease, no stuck pending, no leaked leased entries, and hard bounds where applicable) with seed+trace replay for reproducibility. It also adds a small httpcore5-testing integration test based on HttpAsyncRequester to validate real socket behavior for pending connect cancellation, pending connect timeout, and close-mid-flight without hangs across all three pool policies.

@arturobernalg arturobernalg force-pushed the testing/pool-contract-fuzz branch from 554f85d to 6e51548 Compare February 28, 2026 12:11
@rschmitt
Copy link
Contributor

Some early thoughts:

  1. These tests add about 35 seconds of execution time and 10 seconds of wall-clock time on my machine.
  2. Several of these test classes do not run by default in any Maven profile.
  3. They did not uncover any new bugs.
  4. They did not fail when I tried running them with my various connection pool fixes reverted. assertCoreInvariants doesn't assert that getMaxTotal() returns a non-negative value; the result of lease and the pool stats are apparently not compared to expected values; etc.
  5. Running them in IntelliJ Profiler, it's clear from the timeline view that the tests spend most of their time sleeping.
  6. The fuzzer tests are single-threaded, so they won't catch concurrency bugs.
  7. There is no test coverage for H2SharingConnPool.

A key issue here appears to be the 200ms connection request timeout in ConnPoolContractFuzzer::nudge. About 99% of the total test runtime is spent waiting for this timeout to fire. This has to be fixed in order to make progress on these tests. I'm pretty sure the lease method always returns a completed Future any time it's not at the max conns limit, so for single-threaded testing you can literally use a timeout of 0. (I just tried it locally and it dropped the single-threaded test runtime from 30.4 seconds to 92 ms.) For multi-threaded testing things are more complicated, but we'll cross that bridge when we come to it.

Another issue is the dependency on the system clock via the static JDK methods Thread.sleep() and System.currentTimeMillis(). Probably the very first thing I would do here is refactor the ConnPool classes for testability by replacing the calls to System::currentTimeMillis with calls to java.time.Clock::millis. We can then inject a test double and take direct and deterministic control over the clock for testing purposes. This will let us fix the sleeps in the existing unit tests, before we add any new test coverage.

If necessary, we can also add internal ConnPool test methods to support things like synchronization barriers, counters, callbacks, and whatever else we need in order to write fast, deterministic, exhaustive tests. But let's take this one step at a time; this should be done as a whole series of smaller PRs, rather than 1,200 lines all at once.

@arturobernalg arturobernalg force-pushed the testing/pool-contract-fuzz branch from 6e402ee to 807c891 Compare March 1, 2026 12:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants