[bug]: res_pjsip: Crash when looking up transport state in use #509

Closed
1 task done
learbia opened this issue Dec 21, 2023 · 23 comments · Fixed by #523
Labels: bug, support-level-core (Functionality with core support level)

Comments

@learbia

learbia commented Dec 21, 2023

Severity

Major

Versions

18.20.2

Components/Modules

pjsip

Operating Environment

Asterisk 18.20.2 crashes:

Thread 1 (Thread 0x7f8a9919c700 (LWP 11551)):
#0 0x00007f8c64be46fe in __memcmp_avx2_movbe () at ../sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S:248
#1 0x00007f8c65808082 in pj_memcmp (size=15, buf2=<optimized out>, buf1=<optimized out>) at ../include/pj/string.h:825

Frequency of Occurrence

Constant

Issue Description

After upgrading Asterisk to the latest version, we have many crashes a day.

Relevant log output

No response

Asterisk Issue Guidelines

  • Yes, I have read the Asterisk Issue Guidelines
@jcolp
Member

jcolp commented Dec 21, 2023

This is not a full backtrace. The instructions for getting a backtrace[1] need to be followed to ensure all information is provided.

[1] https://docs.asterisk.org/Development/Debugging/Getting-a-Backtrace/?h=backtrace

jcolp added the support-level-core (Functionality with core support level) label Dec 21, 2023
@learbia
Author

learbia commented Dec 21, 2023

@learbia
Author

learbia commented Dec 21, 2023

> This is not a full backtrace. The instructions for getting a backtrace[1] need to be followed to ensure all information is provided.
>
> [1] https://docs.asterisk.org/Development/Debugging/Getting-a-Backtrace/?h=backtrace

Done

@jcolp
Member

jcolp commented Dec 21, 2023

What version was previously in use? The crash is nowhere near any code that was touched for the security releases. It also doesn't appear as though Asterisk was built with DONT_OPTIMIZE so parts of it are optimized out. If it is crashing multiple times then a second backtrace with DONT_OPTIMIZE would confirm whether it is crashing in the same spot.

@learbia
Author

learbia commented Dec 21, 2023

The previous version was 18.19.0.
Yesterday I upgraded to 18.20.1 but experienced many crashes, so today we installed version 18.20.2.

The crash is the same: I captured two backtraces, and Thread 1 shows the same reference in both.

@jcolp
Member

jcolp commented Dec 21, 2023

Okay, so you skipped the normal 18.20.0 release where this code was moved around and other changes occurred, so unrelated to the security release.

jcolp changed the title from "[bug]: Crash Asterisk 18.20.2 - pj_strcmp" to "[bug]: res_pjsip: Crash when looking up transport state in use" Dec 21, 2023
@jcolp
Member

jcolp commented Dec 21, 2023

Making a note that this is potentially related to the change from #71.

@jcolp
Member

jcolp commented Dec 27, 2023

Can you provide an Asterisk log as well leading up to the crash to see the state and progression of things?

@learbia
Author

learbia commented Jan 3, 2024

> Can you provide an Asterisk log as well leading up to the crash to see the state and progression of things?

How can I get it?

@jcolp
Member

jcolp commented Jan 3, 2024

@learbia
Author

learbia commented Jan 3, 2024

I will try to do it. My problem is that I have 6000 endpoints and the traffic is very high.

@jcolp
Member

jcolp commented Jan 3, 2024

Okay, that itself is an important data point as well. What kind of transport are they all using?

@learbia
Author

learbia commented Jan 3, 2024

> Okay, that itself is an important data point as well. What kind of transport are they all using?

Transport:  <TransportId........>  <Type>  <cos>  <tos>  <BindAddress....................>

Transport:  transport-tcp             tcp      0      0  0.0.0.0:7048
Transport:  transport-udp             udp      0      0  0.0.0.0:7048
Transport:  transport-ws               ws      0      0  0.0.0.0:7048
Transport:  transport-wss             wss      0      0  0.0.0.0:7048

@learbia
Author

learbia commented Jan 3, 2024

We have many endpoints using wss.

@GotoThor

GotoThor commented Jan 5, 2024

So I expect that Asterisk 20 is also affected, right?

@jcolp
Member

jcolp commented Jan 5, 2024

The same change I linked went into 20 as well, yes, and the backtrace you provided on your issue matches this once it reaches the sending part.

@GotoThor

GotoThor commented Jan 5, 2024

For now, let me add the following: we have no wss, only udp/tcp on port 5060. Logging is enabled now (as commented above) and I will provide the log as soon as possible (when the system crashes again).

@jcolp
Member

jcolp commented Jan 5, 2024

Is the TCP in use? How many concurrent connections? Do they go up/down?

@GotoThor

GotoThor commented Jan 5, 2024

No, usually we use UDP. It's a very low-traffic machine with about 20 calls per day. The crash happens every 1-5 days. I set up sipp on an external host and fired over 1,000 calls within 15 minutes, but everything was fine. It is hard to reproduce.

@mikepultz

I'm having a similar issue in 21.0.2; backtrace from the core dump file:

#0  0x000014d2f2babe1e in __memcmp_avx2_movbe () from /lib64/libc.so.6
[Current thread is 1 (Thread 0x14d2a8bbb700 (LWP 7350))]
Missing separate debuginfos, use: debuginfo-install fonolo_asterisk-21.0.2-1.amzn2.x86_64
(gdb) bt
#0  0x000014d2f2babe1e in __memcmp_avx2_movbe () from /lib64/libc.so.6
#1  0x000014d2f53b3fd2 in pj_memcmp (size=9, buf2=<optimized out>, buf1=<optimized out>) at ../include/pj/string.h:825
#2  pj_strcmp (str1=<optimized out>, str2=str2@entry=0x14d2a8bba7d8) at ../include/pj/string_i.h:172
#3  0x000014d2addfc981 in find_transport_state_in_use (obj=0x37597d0, arg=0x14d2a8bba7c0, flags=<optimized out>) at res_pjsip.c:589
#4  0x000000000045fe4d in internal_ao2_traverse.constprop ()
#5  0x000014d2addfe721 in ast_sip_find_transport_state_in_use (details=details@entry=0x14d2a8bba7c0) at res_pjsip.c:604
#6  0x000014d2aab058d4 in process_nat (tdata=0x14d2b635c3f8) at res_pjsip_nat.c:335
#7  nat_on_tx_message (tdata=0x14d2b635c3f8) at res_pjsip_nat.c:400
#8  0x000014d2f5310f90 in endpt_on_tx_msg (endpt=<optimized out>, tdata=0x14d2b635c3f8) at ../src/pjsip/sip_endpoint.c:1115
#9  0x000014d2f53186d4 in pjsip_transport_send (tr=0x14d2e82020d8, tdata=tdata@entry=0x14d2b635c3f8, addr=addr@entry=0x14d2b635c5e8, addr_len=addr_len@entry=16, token=token@entry=0x14d2b62846b8, 
    cb=cb@entry=0x14d2f5312600 <stateless_send_transport_cb>) at ../src/pjsip/sip_transport.c:935
#10 0x000014d2f53127a9 in stateless_send_transport_cb (token=0x14d2b62846b8, tdata=0x14d2b635c3f8, sent=<optimized out>) at ../src/pjsip/sip_util.c:1276
#11 0x000014d2f5314995 in stateless_send_resolver_callback (addr=0x14d2f532f2d0 <dlg_update_routeset+176>, token=<optimized out>, status=<optimized out>) at ../src/pjsip/sip_util.c:1377
#12 pjsip_endpt_send_request_stateless (endpt=<optimized out>, tdata=tdata@entry=0x14d2b635c3f8, token=token@entry=0x0, cb=cb@entry=0x14d2f532f0f0 <send_ack_callback>) at ../src/pjsip/sip_util.c:1430
#13 0x000014d2f533076b in pjsip_dlg_send_request (dlg=0x14d2b4ee70a8, tdata=0x14d2b635c3f8, mod_data_id=mod_data_id@entry=-1, mod_data=mod_data@entry=0x0) at ../src/pjsip/sip_dialog.c:1369
#14 0x000014d2f52f311d in inv_send_ack (inv=0x14d2b5c5f448, e=<optimized out>, e=<optimized out>) at ../src/pjsip-ua/sip_inv.c:502
#15 0x000014d2f52f3236 in mod_inv_on_rx_response (rdata=0x14d2ed37d888) at ../src/pjsip-ua/sip_inv.c:715
#16 0x000014d2f5332053 in pjsip_dlg_on_rx_response (dlg=dlg@entry=0x14d2b4ee70a8, rdata=rdata@entry=0x14d2ed37d888) at ../src/pjsip/sip_dialog.c:2066
#17 0x000014d2f5332cd1 in mod_ua_on_rx_response (rdata=0x14d2ed37d888) at ../src/pjsip/sip_ua_layer.c:954
#18 0x000014d2f5311cf0 in pjsip_endpt_process_rx_data (endpt=0x34379b8, rdata=rdata@entry=0x14d2ed37d888, p=p@entry=0x14d2ae04f400 <param>, p_handled=p_handled@entry=0x14d2a8bbac1c)
    at ../src/pjsip/sip_endpoint.c:937
#19 0x000014d2ade1c912 in distribute (data=0x14d2ed37d888) at res_pjsip/pjsip_distributor.c:955
#20 0x00000000005a591e in ast_taskprocessor_execute ()
#21 0x00000000005ac34b in execute_tasks ()
#22 0x00000000005a591e in ast_taskprocessor_execute ()
#23 0x00000000005acca1 in worker_start ()
#24 0x00000000005b48b9 in dummy_start ()
#25 0x000014d2f4a6444b in start_thread () from /lib64/libpthread.so.0
#26 0x000014d2f2b5152f in clone () from /lib64/libc.so.6

I have a mix of UDP, TCP, and TLS endpoints. They're fairly busy servers (200+ concurrent), but it has only crashed twice in the last two days.

I'll see if I can get more logs (it's production, so I have to avoid customer details).

Mike

@zhangyoufu

zhangyoufu commented Jan 6, 2024

I am using Asterisk release 21.0.2. I ran into this issue when I tried to call another softphone on the same server. The server is not under load and I can trigger the SIGSEGV reliably.

In find_transport_state_in_use, the second argument passed to pj_strcmp (&details->local_address) contains an invalid ptr and slen. This bad struct ast_sip_request_transport_details was initialized by ast_sip_set_request_transport_details in process_nat. The logic inside ast_sip_set_request_transport_details does not guarantee that the local_address and local_port members are initialized (which is bad, from my point of view). In my crash, tdata->tp_sel.type == PJSIP_TPSELECTOR_LISTENER, which triggered the bug.

Before #72, we had struct request_transport_details details = { 0, };.
After #72, we have struct ast_sip_request_transport_details details;, which unfortunately is not zero-initialized.
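
To illustrate the failure mode, here is a minimal standalone sketch. It is not the Asterisk or PJSIP source: struct counted_str, struct transport_details, enum selector_type, set_details and local_address_matches are all hypothetical stand-ins for pj_str_t, struct ast_sip_request_transport_details, the tp_sel type, ast_sip_set_request_transport_details and the lookup. The assumption is only what is described above: the setter leaves local_address and local_port untouched on some paths, so the later length-counted compare is safe only when the caller zero-initialized the struct.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for pj_str_t: pointer plus length, not NUL-terminated. */
struct counted_str {
    const char *ptr;
    ptrdiff_t slen;
};

/* Hypothetical stand-in for struct ast_sip_request_transport_details.
 * Only the members mentioned in this thread are modelled. */
struct transport_details {
    int type;
    struct counted_str local_address;
    int local_port;
};

/* Hypothetical selector kinds, loosely mirroring tdata->tp_sel.type. */
enum selector_type { SELECTOR_NONE, SELECTOR_TRANSPORT, SELECTOR_LISTENER };

/* Mimics a setter that fills local_address and local_port on some paths only. */
static int set_details(struct transport_details *details, enum selector_type sel)
{
    assert(details != NULL);
    if (sel == SELECTOR_TRANSPORT) {
        details->type = 1;
        details->local_address.ptr = "192.0.2.10";
        details->local_address.slen = (ptrdiff_t) strlen("192.0.2.10");
        details->local_port = 5060;
        return 0;
    }
    /* Listener path in this sketch: local_address and local_port are NOT written. */
    details->type = 2;
    return 0;
}

/* Length-counted compare: reads slen bytes through ptr, so a garbage
 * ptr/slen pair from an uninitialized struct would be dereferenced here. */
static int counted_str_equal(const struct counted_str *a, const struct counted_str *b)
{
    if (a->slen != b->slen) {
        return 0;
    }
    if (a->slen <= 0) {
        return 1; /* both empty, nothing to read */
    }
    return memcmp(a->ptr, b->ptr, (size_t) a->slen) == 0;
}

/* Returns 1 if the details resolved for this selector carry the wanted address. */
int local_address_matches(enum selector_type sel, const char *wanted)
{
    /* Zero-initialize so members the setter skips are a known empty value.
     * A plain "struct transport_details details;" would leave them indeterminate. */
    struct transport_details details = { 0, };
    struct counted_str want = { wanted, (ptrdiff_t) strlen(wanted) };

    set_details(&details, sel);
    return counted_str_equal(&details.local_address, &want);
}
```

With the zero initializer, calling local_address_matches(SELECTOR_LISTENER, "192.0.2.10") compares an empty string against the wanted address and simply reports no match. Without it, the compare would read whatever ptr and slen happened to be on the stack, which is consistent with the memcmp frame at the top of the backtraces in this issue.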

@jcolp
Member

jcolp commented Jan 6, 2024

Nice catch! I had reviewed the difference but didn't catch that.

maximilianfridrich added a commit to maximilianfridrich/asterisk that referenced this issue Jan 8, 2024
The ast_sip_request_transport_details must be zero initialized,
otherwise this could lead to a SEGV.

Resolves: asterisk#509
@maximilianfridrich
Contributor

maximilianfridrich commented Jan 8, 2024

Thank you for the detailed analysis! Sorry for the oversight and the late reaction, I was out of office.

Should be fixed in #523.

asterisk-org-access-app bot pushed a commit that referenced this issue Jan 8, 2024
The ast_sip_request_transport_details must be zero initialized,
otherwise this could lead to a SEGV.

Resolves: #509
asterisk-org-access-app bot pushed a commit that referenced this issue Jan 12, 2024
The ast_sip_request_transport_details must be zero initialized,
otherwise this could lead to a SEGV.

Resolves: #509
(cherry picked from commit 85dd7ce)
asterisk-org-access-app bot pushed a commit that referenced this issue Jan 12, 2024
The ast_sip_request_transport_details must be zero initialized,
otherwise this could lead to a SEGV.

Resolves: #509
(cherry picked from commit 3e069f3)
asterisk-org-access-app bot pushed a commit that referenced this issue Jan 12, 2024
The ast_sip_request_transport_details must be zero initialized,
otherwise this could lead to a SEGV.

Resolves: #509
(cherry picked from commit 81188ad)
7 participants