Skip to content

Update vtest.yml#12

Merged
chipitsine merged 1 commit intochipitsine:masterfrom
shipitsin-ilia:patch-1
Jul 19, 2022
Merged

Update vtest.yml#12
chipitsine merged 1 commit intochipitsine:masterfrom
shipitsin-ilia:patch-1

Conversation

@shipitsin-ilia
Copy link

No description provided.

@chipitsine chipitsine merged commit 49b74dc into chipitsine:master Jul 19, 2022
chipitsine pushed a commit that referenced this pull request Apr 1, 2023
The MUX instance is released before its quic-conn counterpart. On
termination, a H3 GOAWAY is emitted to prevent the client to open new
streams for this connection.

The quic-conn instance will stay alive until all opened streams data are
acknowledged. If the client tries to open a new stream during this
interval despite the GOAWAY, quic-conn is responsible to request its
immediate closure with a STOP_SENDING + RESET_STREAM.

This behavior was already implemented but the received packet with the
new STREAM was never acknowledged. This was fixed with the following
commit :
  commit 156a89a
  BUG/MINOR: quic: acknowledge STREAM frame even if MUX is released

However, this patch introduces a regression as it did not skip the call
to qc_handle_strm_frm() despite the MUX instance being released. This
can cause a segfault when using qcc_get_qcs() on a released MUX
instance. To fix this, add a missing break statement which will skip
qc_handle_strm_frm() when the MUX instance is not initialized.

This commit was reproduced using a short timeout client and sending
several requests with delay between them by using a modified aioquic. It
produces a crash with the following backtrace :
 #0  0x000055555594d261 in __eb64_lookup (x=4, root=0x7ffff4091f60) at include/import/eb64tree.h:132
 #1  eb64_lookup (root=0x7ffff4091f60, x=4) at src/eb64tree.c:37
 #2  0x000055555563fc66 in qcc_get_qcs (qcc=0x7ffff4091dc0, id=4, receive_only=1, send_only=0, out=0x7ffff780ca70) at src/mux_quic.c:668
 #3  0x0000555555641e1a in qcc_recv (qcc=0x7ffff4091dc0, id=4, len=40, offset=0, fin=1 '\001', data=0x7ffff40c4fef "\001&") at src/mux_quic.c:974
 #4  0x0000555555619d28 in qc_handle_strm_frm (pkt=0x7ffff4088e60, strm_frm=0x7ffff780cf50, qc=0x7ffff7cef000, fin=1 '\001') at src/quic_conn.c:2515
 #5  0x000055555561d677 in qc_parse_pkt_frms (qc=0x7ffff7cef000, pkt=0x7ffff4088e60, qel=0x7ffff7cef6c0) at src/quic_conn.c:3050
 #6  0x00005555556230aa in qc_treat_rx_pkts (qc=0x7ffff7cef000, cur_el=0x7ffff7cef6c0, next_el=0x0) at src/quic_conn.c:4214
 #7  0x0000555555625fee in quic_conn_app_io_cb (t=0x7ffff40c1fa0, context=0x7ffff7cef000, state=32848) at src/quic_conn.c:4640
 #8  0x00005555558a676d in run_tasks_from_lists (budgets=0x7ffff780d470) at src/task.c:596
 #9  0x00005555558a725b in process_runnable_tasks () at src/task.c:876
 #10 0x00005555558522ba in run_poll_loop () at src/haproxy.c:2945
 #11 0x00005555558529ac in run_thread_poll_loop (data=0x555555d14440 <ha_thread_info+64>) at src/haproxy.c:3141
 #12 0x00007ffff789ebb5 in ?? () from /usr/lib/libc.so.6
 #13 0x00007ffff7920d90 in ?? () from /usr/lib/libc.so.6

This should fix github issue haproxy#2067.

This must be backported up to 2.6.
chipitsine pushed a commit that referenced this pull request Sep 16, 2023
When using the listener socket as file descriptor, qc->fd value is -1.
In this case one must not access fdtab[qc->fd] element to change its value.

This bug could have been detected by asan with such a backtrace:

=================================================================
==402222==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7fa8ecf417ex7fa8e915cf90 sp 0x7fa8e915cf88
WRITE of size 8 at 0x7fa8ecf417e8 thread T6
    #0 0x55707a0bf18a in qc_new_cc_conn src/quic_conn.c:838
    #1 0x55707a0c6dc0 in quic_conn_release src/quic_conn.c:1408
    #2 0x55707a10916f in quic_close src/xprt_quic.c:35
    #3 0x55707a0cec77 in conn_xprt_close include/haproxy/connection.h:153
    #4 0x55707a0ceed0 in conn_full_close include/haproxy/connection.h:197
    #5 0x55707a0ec253 in qcc_release src/mux_quic.c:2412
    #6 0x55707a0ec7d0 in qcc_io_cb src/mux_quic.c:2443
    #7 0x55707a63ff2a in run_tasks_from_lists src/task.c:596
    #8 0x55707a641cc9 in process_runnable_tasks src/task.c:876
    #9 0x55707a56f7b2 in run_poll_loop src/haproxy.c:2954
    #10 0x55707a5705fd in run_thread_poll_loop src/haproxy.c:3153
    #11 0x7fa8f9450ea6 in start_thread nptl/pthread_create.c:477
    #12 0x7fa8f936ea2e in __clone (/lib/x86_64-linux-gnu/libc.so.6+0xfba2e)

0x7fa8ecf417e8 is located 24 bytes to the left of 134217728-byte region [0x7fa8e
allocated by thread T0 here:
    #0 0x7fa8f9a37037 in __interceptor_calloc ../../../../src/libsanitizer/asan/
    #1 0x55707a71a61d in init_pollers src/fd.c:1161
    #2 0x55707a56cdf1 in init src/haproxy.c:2672
    #3 0x55707a5714c2 in main src/haproxy.c:3298
    #4 0x7fa8f9296d09 in __libc_start_main ../csu/libc-start.c:308

Thread T6 created by T0 here:
    #0 0x7fa8f99e22a2 in __interceptor_pthread_create ../../../../src/libsanitizpp:214
    #1 0x55707a748a21 in setup_extra_threads src/thread.c:252
    #2 0x55707a5735c9 in main src/haproxy.c:3844
    #3 0x7fa8f9296d09 in __libc_start_main ../csu/libc-start.c:308

SUMMARY: AddressSanitizer: heap-buffer-overflow src/quic_conn.c:838 in qc_new_cc
Shadow bytes around the buggy address:
  0x0ff59d9e02a0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0ff59d9e02b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0ff59d9e02c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0ff59d9e02d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0ff59d9e02e0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x0ff59d9e02f0: fa fa fa fa fa fa fa fa fa fa fa fa fa[fa]fa fa
  0x0ff59d9e0300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0ff59d9e0310: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0ff59d9e0320: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0ff59d9e0330: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0ff59d9e0340: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==402222==ABORTING
Aborted

Thank you to @Tristan971 for having reported this bug in GH haproxy#2247.

No need to backport.
chipitsine pushed a commit that referenced this pull request Oct 17, 2023
Since the following patch :
  commit 33c49ce
  MINOR: quic: Make qc_dgrams_retransmit() return a status.
retransmission process is interrupted as soon as a fatal send error has
been encounted. However, this may leave frames in local list. This cause
several issues : a memory leak and a potential crash.

The crash happens because leaked frames are duplicated of an origin
frame via qc_dup_pkt_frms(). If an ACK arrives later for the origin
frame, all duplicated frames are also freed. During qc_frm_free(),
LIST_DEL_INIT() operation is invalid as it still references the local
list used inside qc_dgrams_retransmit().

This bug was reproduced using the following injection from another
machine :
  $ h2load --npn-list h3 -t 8 -c 10000 -m 1 -n 2000000000 \
      https://<host>:<port>/?s=4m

Haproxy was compiled using ASAN. The crash resulted in the following
trace :
==332748==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7fff82bf9d78 at pc 0x556facd3b95a bp 0x7fff82bf8b20 sp 0x7fff82bf8b10
WRITE of size 8 at 0x7fff82bf9d78 thread T0
    #0 0x556facd3b959 in qc_frm_free include/haproxy/quic_frame.h:273
    #1 0x556facd59501 in qc_release_frm src/quic_conn.c:1724
    #2 0x556facd5a07f in quic_stream_try_to_consume src/quic_conn.c:1803
    #3 0x556facd5abe9 in qc_treat_acked_tx_frm src/quic_conn.c:1866
    #4 0x556facd5b3d8 in qc_ackrng_pkts src/quic_conn.c:1928
    #5 0x556facd60187 in qc_parse_ack_frm src/quic_conn.c:2354
    #6 0x556facd693a1 in qc_parse_pkt_frms src/quic_conn.c:3203
    #7 0x556facd7531a in qc_treat_rx_pkts src/quic_conn.c:4606
    #8 0x556facd7a528 in quic_conn_app_io_cb src/quic_conn.c:5059
    #9 0x556fad3284be in run_tasks_from_lists src/task.c:596
    #10 0x556fad32a3fa in process_runnable_tasks src/task.c:876
    #11 0x556fad24a676 in run_poll_loop src/haproxy.c:2968
    #12 0x556fad24b510 in run_thread_poll_loop src/haproxy.c:3167
    #13 0x556fad24e7ff in main src/haproxy.c:3857
    #14 0x7fae30ddd0b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x240b2)
    #15 0x556facc9375d in _start (/opt/haproxy-quic-2.8/haproxy+0x1ea75d)

Address 0x7fff82bf9d78 is located in stack of thread T0 at offset 40 in frame
    #0 0x556facd74ede in qc_treat_rx_pkts src/quic_conn.c:4580

This must be backported up to 2.7.
chipitsine pushed a commit that referenced this pull request Nov 21, 2023
The connections are flagged as "to be killed" asap when the peer has left
(detected by sendto() "Connection refused" errno) by qc_kill_conn(). This
function has to wakeup the idle timer task to release the connection (and the idle
timer  and the idle timer task itself). Then if in the meantime the connection
was flagged as having to process some retransmissions, some packet could lead
to sendto() errors again with a call to qc_kill_conn(), this time with a released
idle timer task.

This bug could be detected by libasan as follows:

.AddressSanitizer:DEADLYSIGNAL
=================================================================
==21018==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x    560b5d898717 bp 0x7f9aaac30000 sp 0x7f9aaac2ff80 T3)
==21018==The signal is caused by a READ memory access.
==21018==Hint: address points to the zero page.
.    #0 0x560b5d898717 in _task_wakeup include/haproxy/task.h:209
    #1 0x560b5d8a563c in qc_kill_conn src/quic_conn.c:171
    #2 0x560b5d97f832 in qc_send_ppkts src/quic_tx.c:636
    #3 0x560b5d981b53 in qc_send_app_pkts src/quic_tx.c:876
    #4 0x560b5d987122 in qc_send_app_probing src/quic_tx.c:910
    #5 0x560b5d987122 in qc_dgrams_retransmit src/quic_tx.c:1397
    #6 0x560b5d8ab250 in quic_conn_app_io_cb src/quic_conn.c:712
    #7 0x560b5de41593 in run_tasks_from_lists src/task.c:596
    #8 0x560b5de4333c in process_runnable_tasks src/task.c:876
    #9 0x560b5dd6f2b0 in run_poll_loop src/haproxy.c:2966
    #10 0x560b5dd700fb in run_thread_poll_loop src/haproxy.c:3165
    #11 0x7f9ab9188ea6 in start_thread nptl/pthread_create.c:477
    #12 0x7f9ab90a8a2e in __clone (/lib/x86_64-linux-gnu/libc.so.6+0xfba2e)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV include/haproxy/task.h:209 in _task_wakeup
Thread T3 created by T0 here:
    #0 0x7f9ab97ac2a2 in __interceptor_pthread_create ../../../../src/libsaniti    zer/asan/asan_interceptors.cpp:214
    #1 0x560b5df4f3ef in setup_extra_threads src/thread.c:252                  o
    #2 0x560b5dd730c7 in main src/haproxy.c:3856
    #3 0x7f9ab8fd0d09 in __libc_start_main ../csu/libc-start.c:308             i

==21018==ABORTING
AddressSanitizer:DEADLYSIGNAL
Aborted (core dumped)

To fix, simply reset the connection flag QUIC_FL_CONN_RETRANS_NEEDED to cancel
the retransmission when qc_kill_conn is called.

Note that this new bug arrived with this fix which is correct and flagged as to be
backported as far as 2.6.
      BUG/MINOR: quic: idle timer task requeued in the past

Must be backported as far as 2.6.
chipitsine pushed a commit that referenced this pull request Nov 23, 2023
This bug could be reproduced with -dMfail and dectected by libasan as follows:

$ ASAN_OPTIONS=disable_coredump=0:unmap_shadow_on_exit=1:abort_on_error=f quic-freeze.cfg -dMfail -dMno-cache -dM0x55
=================================================================
==82989==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7ffc 0x560790cc4749 bp 0x7fff8e0e8e30 sp 0x7fff8e0e8e28
WRITE of size 8 at 0x7fff8e0ea338 thread T0
    #0 0x560790cc4748 in qc_frm_free src/quic_frame.c:1222
    #1 0x560790cc5260 in qc_release_frm src/quic_frame.c:1261
    #2 0x560790d1de99 in qc_treat_acked_tx_frm src/quic_rx.c:312
    #3 0x560790d1e708 in qc_ackrng_pkts src/quic_rx.c:370
    #4 0x560790d22a1d in qc_parse_ack_frm src/quic_rx.c:694
    #5 0x560790d25daa in qc_parse_pkt_frms src/quic_rx.c:988
    #6 0x560790d2a509 in qc_treat_rx_pkts src/quic_rx.c:1373
    #7 0x560790c72d45 in quic_conn_io_cb src/quic_conn.c:906
    #8 0x560791207847 in run_tasks_from_lists src/task.c:596
    #9 0x5607912095f0 in process_runnable_tasks src/task.c:876
    #10 0x560791135564 in run_poll_loop src/haproxy.c:2966
    #11 0x5607911363af in run_thread_poll_loop src/haproxy.c:3165
    #12 0x56079113938c in main src/haproxy.c:3862
    #13 0x7f92606edd09 in __libc_start_main ../csu/libc-start.c:308
    #14 0x560790bcd529 in _start (/home/flecaille/src/haproxy/haproxy+0x

Address 0x7fff8e0ea338 is located in stack of thread T0 at offset 1032 i
    #0 0x560790d29b52 in qc_treat_rx_pkts src/quic_rx.c:1341

  This frame has 2 object(s):
    [32, 48) 'ar' (line 1380)
    [64, 1088) '_msg' (line 1368) <== Memory access at offset 1032 is inable
HINT: this may be a false positive if your program uses some custom stacnism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-use-after-scope src/quic_frame.c:1222 i
Shadow bytes around the buggy address:
  0x100071c15410: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
  0x100071c15420: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
  0x100071c15430: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
  0x100071c15440: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
  0x100071c15450: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
=>0x100071c15460: f8 f8 f8 f8 f8 f8 f8[f8]f8 f8 f8 f8 f8 f8 f3 f3
  0x100071c15470: f3 f3 f3 f3 f3 f3 f3 f3 f3 f3 f3 f3 f3 f3 00 00
  0x100071c15480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100071c15490: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100071c154a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100071c154b0: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 04 f3 f3 f3
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==82989==ABORTING
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
Aborted (core dumped)

Note that a coredump could not always be produced with all compilers. This was
always the case with clang 11.

When allocating frames to be retransmitted from qc_dgrams_retransmit(), if they
could not be sent for any reason, they could remain attached to a local list to
qc_dgrams_retransmit() and trigger a crash with libasan when releasing the
original frames they were duplicated from.

To fix this, always release the frames which could not be sent during
retransmissions calling qc_free_frm_list() where needed.

Must be backported as far as 2.6.
chipitsine pushed a commit that referenced this pull request Jan 7, 2024
This bug could be reproduced loading several certificated from "bind" line:
with "server_ocsp.pem" as argument to "crt" setting and updating
the CDSA certificate with the RSA as follows:

echo -e "set ssl cert reg-tests/ssl/ocsp_update/multicert/server_ocsp.pem.ecdsa \
	     <<\n$(cat reg-tests/ssl/ocsp_update/multicert/server_ocsp.pem.rsa)\n" | socat - /tmp/stats
followed by an "commit ssl cert reg-tests/ssl/ocsp_update/multicert/server_ocsp.pem.ecdsa"
command. This could be detected by libasan as follows:

=================================================================
==507223==ERROR: AddressSanitizer: attempting double-free on 0x60200007afb0 in thread T3:
    #0 0x7fabc6fb5527 in __interceptor_free (/usr/lib/x86_64-linux-gnu/libasan.so.1+0x54527)
    #1 0x7fabc6ae8f8c in ossl_asn1_string_embed_free (/opt/quictls/lib/libcrypto.so.81.3+0xd4f8c)
    #2 0x7fabc6af54e9 in ossl_asn1_primitive_free (/opt/quictls/lib/libcrypto.so.81.3+0xe14e9)
    #3 0x7fabc6af5960 in ossl_asn1_template_free (/opt/quictls/lib/libcrypto.so.81.3+0xe1960)
    #4 0x7fabc6af569f in ossl_asn1_item_embed_free (/opt/quictls/lib/libcrypto.so.81.3+0xe169f)
    #5 0x7fabc6af58a4 in ASN1_item_free (/opt/quictls/lib/libcrypto.so.81.3+0xe18a4)
    #6 0x46a159 in ssl_sock_free_cert_key_and_chain_contents src/ssl_ckch.c:723
    #7 0x46aa92 in ckch_store_free src/ssl_ckch.c:869
    #8 0x4704ad in cli_release_commit_cert src/ssl_ckch.c:1981
    #9 0x962e83 in cli_io_handler src/cli.c:1140
    #10 0xc1edff in task_run_applet src/applet.c:454
    #11 0xaf8be9 in run_tasks_from_lists src/task.c:634
    #12 0xafa2ed in process_runnable_tasks src/task.c:876
    #13 0xa23c72 in run_poll_loop src/haproxy.c:3024
    #14 0xa24aa3 in run_thread_poll_loop src/haproxy.c:3226
    #15 0x7fabc69e7ea6 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x7ea6)
    #16 0x7fabc6907a2e in __clone (/lib/x86_64-linux-gnu/libc.so.6+0xfba2e)

0x60200007afb0 is located 0 bytes inside of 3-byte region [0x60200007afb0,0x60200007afb3)
freed by thread T3 here:
    #0 0x7fabc6fb5527 in __interceptor_free (/usr/lib/x86_64-linux-gnu/libasan.so.1+0x54527)
    #1 0x7fabc6ae8f8c in ossl_asn1_string_embed_free (/opt/quictls/lib/libcrypto.so.81.3+0xd4f8c)

previously allocated by thread T2 here:
    #0 0x7fabc6fb573f in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.1+0x5473f)
    #1 0x7fabc6ae8d77 in ASN1_STRING_set (/opt/quictls/lib/libcrypto.so.81.3+0xd4d77)

Thread T3 created by T0 here:
    #0 0x7fabc6f84bba in pthread_create (/usr/lib/x86_64-linux-gnu/libasan.so.1+0x23bba)
    #1 0xc04f36 in setup_extra_threads src/thread.c:252
    #2 0xa2761f in main src/haproxy.c:3917
    #3 0x7fabc682fd09 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x23d09)

Thread T2 created by T0 here:
    #0 0x7fabc6f84bba in pthread_create (/usr/lib/x86_64-linux-gnu/libasan.so.1+0x23bba)
    #1 0xc04f36 in setup_extra_threads src/thread.c:252
    #2 0xa2761f in main src/haproxy.c:3917
    #3 0x7fabc682fd09 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x23d09)

SUMMARY: AddressSanitizer: double-free ??:0 __interceptor_free
==507223==ABORTING
Aborted

The OCSP CID stored in the impacted ckch data were freed but not reset to NULL,
leading to a subsequent double free.

Must be backported to 2.8.
chipitsine pushed a commit that referenced this pull request Mar 27, 2025
A BUG_ON() is present in qcc_send_stream() to ensure that emission is
never performed with a stream already closed locally. However, this
function is also used for RESET_STREAM/STOP_SENDING emission. No
protection exists to ensure that RS/SS is not scheduled after stream
local closure, which would result in this BUG_ON() crash.

This crash can be triggered with the following QUIC client sequence :
1. SS is emitted to open a new stream. QUIC-MUX schedules a RS emission
   by and the stream is locally closed.
2. An invalid HTTP/3 request is sent on the same stream, for example
   with duplicated pseudo-headers. The objective is to ensure
   qcc_abort_stream_read() is called after stream closure, which results
   in the following backtrace.

 0x000055555566a620 in qcc_send_stream (qcs=0x7ffff0061420, urg=1, count=0) at src/mux_quic.c:1633
 1633            BUG_ON(qcs_is_close_local(qcs));
 [ ## gdb ## ] bt
 #0  0x000055555566a620 in qcc_send_stream (qcs=0x7ffff0061420, urg=1, count=0) at src/mux_quic.c:1633
 #1  0x000055555566a921 in qcc_abort_stream_read (qcs=0x7ffff0061420) at src/mux_quic.c:1658
 #2  0x0000555555685426 in h3_rcv_buf (qcs=0x7ffff0061420, b=0x7ffff748d3f0, fin=0) at src/h3.c:1454
 #3  0x0000555555668a67 in qcc_decode_qcs (qcc=0x7ffff0049eb0, qcs=0x7ffff0061420) at src/mux_quic.c:1315
 #4  0x000055555566c76e in qcc_recv (qcc=0x7ffff0049eb0, id=12, len=0, offset=23, fin=0 '\000',
     data=0x7fffe0049c1c "\366\r,\230\205\354\234\301;\2563\335\037k\306\334\037\260", <incomplete sequence \323>) at src/mux_quic.c:1901
 #5  0x0000555555692551 in qc_handle_strm_frm (pkt=0x7fffe00484b0, strm_frm=0x7ffff00539e0, qc=0x7fffe0049220, fin=0 '\000') at src/quic_rx.c:635
 #6  0x0000555555694530 in qc_parse_pkt_frms (qc=0x7fffe0049220, pkt=0x7fffe00484b0, qel=0x7fffe0075fc0) at src/quic_rx.c:980
 #7  0x0000555555696c7a in qc_treat_rx_pkts (qc=0x7fffe0049220) at src/quic_rx.c:1324
 #8  0x00005555556b781b in quic_conn_app_io_cb (t=0x7fffe0037f20, context=0x7fffe0049220, state=49232) at src/quic_conn.c:601
 #9  0x0000555555d53788 in run_tasks_from_lists (budgets=0x7ffff748e2b0) at src/task.c:603
 #10 0x0000555555d541ae in process_runnable_tasks () at src/task.c:886
 #11 0x00005555559c39e9 in run_poll_loop () at src/haproxy.c:2858
 #12 0x00005555559c41ea in run_thread_poll_loop (data=0x55555629fb40 <ha_thread_info+64>) at src/haproxy.c:3075

The proper solution is to not execute this BUG_ON() for RS/SS emission.
Indeed, it is valid and can be useful to emit these frames, even after
stream local closure.

To implement this, qcc_send_stream() has been rewritten as a mere
wrapper function around the new internal _qcc_send_stream(). The latter
is used only by QMUX for STREAM, RS and SS emission. Application layer
continue to use the original function for STREAM emission, with the
BUG_ON() still in place there.

This must be backported up to 2.8.
chipitsine pushed a commit that referenced this pull request Apr 17, 2025
…ll()

In the following trace trying to abuse the watchdog from the CLI's
"debug dev loop" command running in parallel to "show threads" loops,
it's clear that some re-entrance may happen in ha_thread_dump_fill().

A first minimal fix consists in using a test-and-set on the flag
indicating that the function is currently dumping threads, so that
the one from the signal just returns. However the caller should be
made more reliable to serialize all of this, that's for future
work.

Here's an example capture of 7 threads stuck waiting for each other:
  (gdb) bt
  #0  0x00007fe78d78e147 in sched_yield () from /lib64/libc.so.6
  #1  0x0000000000674a05 in ha_thread_relax () at src/thread.c:356
  #2  0x00000000005ba4f5 in ha_thread_dump_fill (thr=2, buf=0x7ffdd8e08ab0) at src/debug.c:402
  #3  ha_thread_dump_fill (buf=0x7ffdd8e08ab0, thr=<optimized out>) at src/debug.c:384
  #4  0x00000000005baac4 in ha_stuck_warning (thr=thr@entry=2) at src/debug.c:840
  #5  0x00000000006a360d in wdt_handler (sig=<optimized out>, si=<optimized out>, arg=<optimized out>) at src/wdt.c:156
  #6  <signal handler called>
  #7  0x00007fe78d78e147 in sched_yield () from /lib64/libc.so.6
  #8  0x0000000000674a05 in ha_thread_relax () at src/thread.c:356
  #9  0x00000000005ba4c2 in ha_thread_dump_fill (thr=2, buf=0x7fe78f2d6420) at src/debug.c:426
  #10 ha_thread_dump_fill (buf=0x7fe78f2d6420, thr=2) at src/debug.c:384
  #11 0x00000000005ba7c6 in cli_io_handler_show_threads (appctx=0x2a89ab0) at src/debug.c:548
  #12 0x000000000057ea43 in cli_io_handler (appctx=0x2a89ab0) at src/cli.c:1176
  #13 0x00000000005d7885 in task_process_applet (t=0x2a82730, context=0x2a89ab0, state=<optimized out>) at src/applet.c:920
  #14 0x0000000000659002 in run_tasks_from_lists (budgets=budgets@entry=0x7ffdd8e0a5c0) at src/task.c:644
  #15 0x0000000000659bd7 in process_runnable_tasks () at src/task.c:886
  #16 0x00000000005cdcc9 in run_poll_loop () at src/haproxy.c:2858
  #17 0x00000000005ce457 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3075
  #18 0x0000000000430628 in main (argc=<optimized out>, argv=<optimized out>) at src/haproxy.c:3665
chipitsine pushed a commit that referenced this pull request Sep 7, 2025
When an SNI is set on a QUIC server line, ssl_sock_set_servername() is called
from connect_server() (backend.c). This leads some BUG_ON() to be triggered
because the CO_FL_WAIT_L6_CONN | CO_FL_SSL_WAIT_HS were not set. This must
be done into the ->init() xprt callback. This patch move the flags settings
from ->start() to ->init() callback.

Indeed, connect_server() calls these functions in this order:

   ->init(),
   ssl_sock_set_servername() # => crash if CO_FL_WAIT_L6_CONN | CO_FL_SSL_WAIT_HS not set
   ->start()

Furthermore ssl_sock_set_servername() has a side effect to reset the SSL_SESSION
object (attached to SSL object) calling SSL_set_session(), leading to crashes as follows:

 [Thread debugging using libthread_db enabled]
 Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
 Core was generated by `./haproxy -f quic_srv.cfg'.
 Program terminated with signal SIGSEGV, Segmentation fault.
 #0  tls_process_server_hello (s=0x560c259733b0, pkt=0x7fffac239f20)
     at ssl/statem/statem_clnt.c:1624
 1624            if (s->session->session_id_length > 0) {
 [Current thread is 1 (Thread 0x7fc364e53dc0 (LWP 35514))]
 (gdb) bt
 #0  tls_process_server_hello (s=0x560c259733b0, pkt=0x7fffac239f20)
     at ssl/statem/statem_clnt.c:1624
 #1  0x00007fc36540fba4 in ossl_statem_client_process_message (s=0x560c259733b0,
     pkt=0x7fffac239f20) at ssl/statem/statem_clnt.c:1042
 #2  0x00007fc36540d028 in read_state_machine (s=0x560c259733b0) at ssl/statem/statem.c:646
 #3  0x00007fc36540ca70 in state_machine (s=0x560c259733b0, server=0)
     at ssl/statem/statem.c:439
 #4  0x00007fc36540c576 in ossl_statem_connect (s=0x560c259733b0) at ssl/statem/statem.c:250
 #5  0x00007fc3653f1698 in SSL_do_handshake (s=0x560c259733b0) at ssl/ssl_lib.c:3835
 #6  0x0000560c22620327 in qc_ssl_do_hanshake (qc=qc@entry=0x560c25961f60,
     ctx=ctx@entry=0x560c25963020) at src/quic_ssl.c:863
 #7  0x0000560c226210be in qc_ssl_provide_quic_data (len=90, data=<optimized out>,
     ctx=0x560c25963020, level=ssl_encryption_initial, ncbuf=0x560c2588bb18)
     at src/quic_ssl.c:1071
 #8  qc_ssl_provide_all_quic_data (qc=qc@entry=0x560c25961f60, ctx=0x560c25963020)
     at src/quic_ssl.c:1123
 #9  0x0000560c2260ca5f in quic_conn_io_cb (t=0x560c25962f80, context=0x560c25961f60,
     state=<optimized out>) at src/quic_conn.c:791
 #10 0x0000560c228255ed in run_tasks_from_lists (budgets=<optimized out>) at src/task.c:648
 #11 0x0000560c22825f7a in process_runnable_tasks () at src/task.c:889
 #12 0x0000560c22793dc7 in run_poll_loop () at src/haproxy.c:2836
 #13 0x0000560c22794481 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3056
 #14 0x0000560c2259082d in main (argc=<optimized out>, argv=<optimized out>)
     at src/haproxy.c:3667

<s> is the SSL object, and <s->session> is the SSL_SESSION object.

For the client, this is the first call do SSL_do_handshake() which initializes this
SSL_SESSION object from ->init() xpt callback. Then it is reset by
ssl_sock_set_servername(), then tls_process_server_hello() TLS stack is called with
NULL value for s->session when receiving the ServerHello TLS message.

To fix this, simply move the first call to SSL_do_handshake to ->start xprt call
back (qc_xprt_start()).

No need to backport.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants