Detailed Description of the Problem
In master-worker mode (-W), the master CLI proxy (mworker_proxy) has a hardcoded maxconn of 10. When a client connects to the master CLI socket and issues a command that gets forwarded to a worker (e.g. @1 show sess all), and the target worker is unresponsive (stuck, slow, or overloaded), the connection hangs waiting for the worker's response. If the client then disconnects (due to a timeout, Ctrl-C, or a closed socket), the connection slot is never freed.
After 10 such leaked connections, the master CLI socket becomes completely unreachable. New connection attempts fail with Resource temporarily unavailable. In Kubernetes deployments this causes readiness/liveness probes that use the master socket to fail, leading to pod restarts.
Expected Behavior
When a client disconnects from the master CLI while a forwarded command is pending on an unresponsive worker, HAProxy should detect the client-side disconnect and clean up the connection, releasing the master CLI slot. The master CLI should remain reachable regardless of how many prior clients have disconnected.
Steps to Reproduce the Behavior
- Start HAProxy in master-worker mode with a master CLI socket:
haproxy -W -S /tmp/master.sock
- Freeze the worker process:
kill -STOP <worker_pid>
- Open 10 connections to the master socket, each sending a command forwarded to the frozen worker:
for i in $(seq 1 10); do
(printf "@1 show sess all\n" | socat -t2 UNIX:/tmp/master.sock - 2>/dev/null) &
done
wait
- After the socat processes time out and disconnect, try a new connection:
echo "show version" | socat UNIX:/tmp/master.sock -
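The final check above can also be done programmatically, which makes the failure mode easier to tell apart from other errors. The following is a minimal sketch, not part of HAProxy: probe_unix_socket() is a hypothetical helper that attempts a non-blocking connect to a UNIX stream socket and returns the resulting errno. It assumes Linux, where a non-blocking connect that cannot complete immediately fails with EAGAIN, whose message text is the "Resource temporarily unavailable" reported above.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper (not part of HAProxy): try a non-blocking connect
 * to a UNIX stream socket.
 * Returns 0 on success, otherwise the errno of the failing call.
 * On Linux, EAGAIN here means the connection could not be completed
 * immediately. */
static int probe_unix_socket(const char *path)
{
    struct sockaddr_un addr;
    int fd, err = 0;

    assert(path != NULL);
    if (strlen(path) >= sizeof(addr.sun_path))
        return ENAMETOOLONG;

    /* SOCK_NONBLOCK (Linux) keeps the probe from hanging on connect(). */
    fd = socket(AF_UNIX, SOCK_STREAM | SOCK_NONBLOCK, 0);
    if (fd < 0)
        return errno;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    memcpy(addr.sun_path, path, strlen(path) + 1);

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        err = errno;   /* save before close() can overwrite errno */

    close(fd);
    return err;
}
```

With the path from the steps above, probe_unix_socket("/tmp/master.sock") would be expected to return 0 on a healthy master socket and EAGAIN once the slots are exhausted. A missing socket file yields ENOENT instead, so the two cases are distinguishable.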
Do you have any idea what may have caused this?
When the response analyzer (AN_RES_WAIT_CLI) is active and the client disconnects, pcli_wait_for_request() simply returns 0 without propagating the disconnect to the backend side. The response analyzer in pcli_wait_for_response() never sees an error condition on the backend stream connector, so it keeps waiting indefinitely. The stream is never torn down and the connection slot is never released.
Do you have an idea how to solve the issue?
The fix has two coordinated parts:
- In pcli_wait_for_request(), when AN_RES_WAIT_CLI is set and the frontend stream connector shows a client disconnect (SC_FL_EOS or SC_FL_ABRT_DONE on s->scf), explicitly call sc_abort(s->scb) to propagate the disconnect to the backend.
- In pcli_wait_for_response(), extend the existing error check to also detect SC_FL_ABRT_DONE on s->scb. This flag is only set by the explicit sc_abort() above, so it does not interfere with normal one-shot CLI tools that close their TCP connection after receiving a response.
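To illustrate why both parts are needed, here is a self-contained toy model of the two analyzers. It is not the actual patch and does not use HAProxy's headers. The struct layouts, the numeric flag values, the model_* function names, and the use of SC_FL_ERROR as a stand-in for the existing error check are all assumptions made for this sketch. Only the identifiers SC_FL_EOS, SC_FL_ABRT_DONE, AN_RES_WAIT_CLI, scf and scb come from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy flag values for this model only; these are NOT HAProxy's values. */
#define SC_FL_ERROR      0x1u  /* stand-in for the existing error check (assumption) */
#define SC_FL_EOS        0x2u
#define SC_FL_ABRT_DONE  0x4u
#define AN_RES_WAIT_CLI  0x1u

struct stconn { unsigned int flags; };

struct stream {
    struct stconn scf;           /* frontend side: the CLI client */
    struct stconn scb;           /* backend side: the worker */
    unsigned int res_analysers;  /* active response analyzers */
};

/* Model of sc_abort(): record that an abort was done on this side. */
static void model_sc_abort(struct stconn *sc)
{
    assert(sc != NULL);
    sc->flags |= SC_FL_ABRT_DONE;
}

/* Model of the request analyzer. Returns 0 ("keep waiting") while a
 * response is still pending, as described in the report. */
static int model_wait_for_request(struct stream *s, bool with_fix)
{
    if (s->res_analysers & AN_RES_WAIT_CLI) {
        /* Part 1: propagate the client disconnect to the backend side. */
        if (with_fix && (s->scf.flags & (SC_FL_EOS | SC_FL_ABRT_DONE)))
            model_sc_abort(&s->scb);
        return 0;
    }
    return 1;
}

/* Model of the response analyzer's error check. Returns true when the
 * stream would be torn down and its slot released. */
static bool model_response_sees_error(const struct stream *s, bool with_fix)
{
    unsigned int mask = SC_FL_ERROR;

    if (with_fix)
        mask |= SC_FL_ABRT_DONE;  /* Part 2: also react to the abort. */
    return (s->scb.flags & mask) != 0;
}

/* One client disconnects while the worker is frozen. */
static bool slot_released_after_disconnect(bool with_fix)
{
    struct stream s = { .res_analysers = AN_RES_WAIT_CLI };

    s.scf.flags |= SC_FL_EOS;  /* client went away; worker never answers */
    (void)model_wait_for_request(&s, with_fix);
    return model_response_sees_error(&s, with_fix);
}

/* Count how many of the given disconnects leave a slot occupied. */
static int leaked_slots(int disconnects, bool with_fix)
{
    int leaked = 0;

    for (int i = 0; i < disconnects; i++)
        if (!slot_released_after_disconnect(with_fix))
            leaked++;
    return leaked;
}
```

In this model, leaked_slots(10, false) returns 10, which matches the hardcoded maxconn and the point where the master socket becomes unreachable, while leaked_slots(10, true) returns 0. Applying only one of the two parts would still leak: without part 1 the backend flag is never set, and without part 2 nothing checks it.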
What is your configuration?
global
    stats socket /tmp/master.sock mode 660 level admin

defaults
    mode http
    timeout connect 5s
    timeout client 30s
    timeout server 30s

frontend fe
    bind :8080
    default_backend be

backend be
    server s1 127.0.0.1:8081
Output of haproxy -vv
HAProxy version 3.4-dev9 2026/04/15 - https://haproxy.org/
Status: development branch - not safe for use in production.
Known bugs: https://github.com/haproxy/haproxy/issues?q=is:issue+is:open
Running on: Linux 5.14.21-150500.55.144-default #1 SMP PREEMPT_DYNAMIC Tue Apr 7 16:14:47 UTC 2026 (fd769ac) x86_64
Build options :
TARGET = linux-glibc
CC = cc
CFLAGS = -O2 -g -fwrapv
OPTIONS = USE_OPENSSL=1 USE_PCRE2=1
DEBUG =
Feature list : -51DEGREES +ACCEPT4 +ACME +BACKTRACE -CLOSEFROM +CPU_AFFINITY +CRYPT_H -DEVICEATLAS +DL -ECH -ENGINE +EPOLL -EVPORTS +GETADDRINFO +HAVE_TCP_MD5SIG -KQUEUE +KTLS -LIBATOMIC +LIBCRYPT +LINUX_CAP +LINUX_SPLICE +LINUX_TPROXY -LUA -MATH -MEMORY_PROFILING +NETFILTER +NS -OBSOLETE_LINKER +OPENSSL -OPENSSL_AWSLC -OPENSSL_WOLFSSL -OT -OTEL -PCRE +PCRE2 -PCRE2_JIT -PCRE_JIT +POLL +PRCTL -PROCCTL -PROMEX -PTHREAD_EMULATION -QUIC -QUIC_OPENSSL_COMPAT +RT +SHM_OPEN +SLZ +SSL -STATIC_PCRE -STATIC_PCRE2 +TFO +THREAD +THREAD_DUMP +TPROXY -WURFL -ZLIB
Detected feature list : +HAVE_WORKING_TCP_MD5SIG
Default settings :
bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
Built with multi-threading support (MAX_TGROUPS=32, MAX_THREADS=1024, default=40).
Built with SSL library version : OpenSSL 3.0.8 7 Feb 2023
Running on SSL library version : OpenSSL 3.0.8 7 Feb 2023
SSL library supports TLS extensions : yes
SSL library supports SNI : yes
SSL library default verify directory : /var/lib/ca-certificates/openssl
SSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
OpenSSL providers loaded : default
Built with network namespace support.
Built with libslz for stateless compression.
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE2 version : 10.39 2021-10-29
PCRE2 library supports JIT : no (USE_PCRE2_JIT not set)
Encrypted password support via crypt(3): yes
Built with gcc compiler version 7.5.0
Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.
Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
h2 : mode=HTTP side=FE|BE mux=H2 flags=HTX|HOL_RISK|NO_UPG
<default> : mode=HTTP side=FE|BE mux=H1 flags=HTX
h1 : mode=HTTP side=FE|BE mux=H1 flags=HTX|NO_UPG
fcgi : mode=HTTP side=BE mux=FCGI flags=HTX|HOL_RISK|NO_UPG
<default> : mode=SPOP side=BE mux=SPOP flags=HOL_RISK|NO_UPG
spop : mode=SPOP side=BE mux=SPOP flags=HOL_RISK|NO_UPG
<default> : mode=TCP side=FE|BE mux=PASS flags=
none : mode=TCP side=FE|BE mux=PASS flags=NO_UPG
Available services : none
Available filters :
[BWLIM] bwlim-in
[BWLIM] bwlim-out
[CACHE] cache
[COMP] comp-req
[COMP] comp-res
[COMP] compression
[FCGI] fcgi-app
[SPOE] spoe
[TRACE] trace
Last Outputs and Backtraces
No crash. The symptom is silent: the master socket simply stops accepting connections. No log message is emitted when slots leak.
Additional Information
- Observed in containerized (Kubernetes) deployments where readiness probes query the master CLI socket. When workers become temporarily unresponsive under load, probe connections time out and leak slots, eventually making the master socket unreachable and triggering pod restarts.
- The mworker_proxy->maxconn is hardcoded to 10, making this easy to hit.
- I suspect this affects all HAProxy versions that have master-worker mode and the pcli proxy (master CLI socket). I could also reproduce the issue on 3.0.18.