Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ICE and dtls_srtp fails with dtls already accepted #925

Closed
sreimers opened this issue Feb 23, 2020 · 21 comments
Closed

ICE and dtls_srtp fails with dtls already accepted #925

sreimers opened this issue Feb 23, 2020 · 21 comments

Comments

@sreimers
Copy link
Member

sreimers commented Feb 23, 2020

Hi Alfred,

i just tested ICE with the current master (9180775) branch.

Do you have any idea why connection fails with:

connchk: send_req: CONNCHECK already Pending!
stunsrv: triggered check failed
icecomp: {audio.1} set_selected: invalid state Failed 
dtls_srtp: incoming DTLS connect from 95.179.254.217:53038
dtls_srtp: dtls already accepted (peer = 144.76.152.248:44068)

Full logs (both sides): https://gist.github.com/sreimers/10c367e146b6f026359713cc9d0db501

Account config looks like this:

<sip:XXX@studio.link;transport=tcp>;auth_pass=xxx;stunserver="turn:stun.studio.link";medianat=ice;mediaenc=dtls_srtp;stunuser=xxx;stunpass=xxx
@sreimers
Copy link
Member Author

sreimers commented Feb 24, 2020

Maybe related: creytiv/re#238
Must be retested from my side.

Update: Sadly this was not enough to fix the error. Maybe there is still more todo.

@alfredh
Copy link
Collaborator

alfredh commented Feb 26, 2020

thanks for the report, I will have a look ..

@alfredh
Copy link
Collaborator

alfredh commented Feb 27, 2020

do you have a test account on your TURN server, that I can use for testing?
you can send me the credentials on email.

@sreimers
Copy link
Member Author

Sure I sent you a description to get login data. For testing you can use also:
echo@studio.link -> medianat ICE
echo1@studio.link -> medianat TURN

@alfredh
Copy link
Collaborator

alfredh commented Feb 28, 2020

I noticed this:

 Valid list:  (3)
     {comp=1}     Failed { VN} eth0:srflx:144.76.152.248:44068 <---> relay:95.179.254.217:61984
     {comp=1}     Failed { V } eth0:host:10.10.10.112:44068 <---> relay:95.179.254.217:61984
  *  {comp=1}  Succeeded {DV } eth0:relay:95.179.254.217:53038 <---> relay:95.179.254.217:61984

the first 2 pairs have inconsistent state; "Failed" and on the Valid list.

@sreimers
Copy link
Member Author

Hm. Looks like a race condition? Any idea how I can debug? One client is behind a normal NAT and the other behind an LTE carrier-grade NAT. Would it be possible to define, like in WEBRTC, that only relay candidates are used? At first look I did not find an easy way to disable other candidates.

@alfredh
Copy link
Collaborator

alfredh commented Feb 29, 2020

I made a test-call from local baresip (behind NAT) to echo@studio.link

>                             echo
ua: using best effort AF: af=AF_INET
call: alloc with params laddr=10.0.1.10, af=AF_INET, use_rtp=1
ice: new session with TURN-server at stun.studio.link (username=turn200300)
stream: audio: starting mediaenc 'dtls_srtp' (wait_secure=1)
call: use_video=0
call: connecting to 'sip:echo@studio.link;transport=tcp'..
ice: resolved TURN-server to address 95.179.254.217:3478
audio: added interface: en0:10.0.1.10 (local prio 10)
ice: relay gathered for comp 1 (0 OK)
ice: all components gathered.
audio: comp1 setting local: 95.179.254.217:53942
ice: audio: Default local candidates: 95.179.254.217:53942 / ?
call: media-nat 'ice' established/gathered
call: SIP Progress: 100 trying -- your call is important to us (/)
call: SIP Progress: 180 Ringing (/)
call: got SDP answer (937 bytes)
call: update media
stream: update 'audio'
audio: RTP/RTCP multiplexing enabled
{      audio.1} Set default remote candidate: relay:95.179.254.217:63361
{      audio.1} Add TURN Channel to peer 95.179.254.217:63361
chklist: audio: pruned candidate pairs: 3
{      audio. } starting connectivity checks with 6 candidate pairs
{      audio.1}  host <--->  host  FSM:      Frozen ===> Waiting   
{      audio.1}  host <---> srflx  FSM:      Frozen ===> Waiting   
{      audio.1}  host <---> relay  FSM:      Frozen ===> Waiting   
{      audio.1} relay <--->  host  FSM:      Frozen ===> Waiting   
{      audio.1} relay <---> srflx  FSM:      Frozen ===> Waiting   
{      audio.1} relay <---> relay  FSM:      Frozen ===> Waiting   
call: stream start (active=1)
6fc65bce92@studio.link: Call established: sip:echo@studio.link;transport=tcp
{      audio.1}  host <--->  host  FSM:     Waiting ===> InProgress
{      audio.1} Tx en0:host:10.0.1.10:42268 ---> host:10.10.10.112:9884 (InProgress)  
{      audio.1} Rx en0:host:10.0.1.10:42268 <--- host:10.10.10.112:9884 '0 ' (Operation timed out)
{      audio.1}  host <--->  host  FSM:  InProgress ===> Failed    
{      audio.1}  host <---> srflx  FSM:     Waiting ===> InProgress
{      audio.1} Tx en0:host:10.0.1.10:42268 ---> srflx:144.76.152.248:9884 (InProgress)  
{      audio.1} Rx Binding Request from 144.76.152.248:9884 via Socket (candpair=InProgress) 
{      audio.1} Rx en0:host:10.0.1.10:42268 <--- srflx:144.76.152.248:9884 '0 OK'
{      audio.1} New local candidate for mapped 84.208.61.36:39825
{      audio.1}  host <---> srflx  FSM:  InProgress ===> Succeeded 
{      audio.1}  host <---> relay  FSM:     Waiting ===> InProgress
{      audio.1} Tx en0:host:10.0.1.10:42268 ---> relay:95.179.254.217:63361 (InProgress)  
{      audio.1} Rx en0:host:10.0.1.10:42268 <--- relay:95.179.254.217:63361 '0 ' (Operation timed out)
{      audio.1}  host <---> relay  FSM:  InProgress ===> Failed    
{      audio.1} relay <--->  host  FSM:     Waiting ===> InProgress
{      audio.1} Tx en0:relay:95.179.254.217:53942 ---> host:10.10.10.112:9884 (InProgress)  
{      audio.1} Rx en0:relay:95.179.254.217:53942 <--- host:10.10.10.112:9884 '0 ' (Operation timed out)
{      audio.1} relay <--->  host  FSM:  InProgress ===> Failed    
{      audio.1} relay <---> srflx  FSM:     Waiting ===> InProgress
{      audio.1} Tx en0:relay:95.179.254.217:53942 ---> srflx:144.76.152.248:9884 (InProgress)  
{      audio.1} Tx en0:host:10.0.1.10:42268 ---> relay:95.179.254.217:63361 (Failed)  [Trigged]
{      audio.1} Rx Binding Request from 95.179.254.217:63361 via Socket (candpair=Failed) 
{      audio.1} Rx Binding Request from 144.76.152.248:9884 via Tunnel (candpair=InProgress) 
{      audio.1} Rx en0:host:10.0.1.10:42268 <--- relay:95.179.254.217:63361 '0 OK'
{      audio.1} New local candidate for mapped 84.208.61.36:39825
icecomp: {audio.1} set_selected: invalid state Failed
{      audio.1} relay <---> relay  FSM:     Waiting ===> InProgress
{      audio.1} Tx en0:relay:95.179.254.217:53942 ---> relay:95.179.254.217:63361 (InProgress)  
{      audio.1} Rx en0:relay:95.179.254.217:53942 <--- srflx:144.76.152.248:9884 '0 OK'
{      audio.1} relay <---> srflx  FSM:  InProgress ===> Succeeded 
{      audio.1} Rx Binding Request from 95.179.254.217:63361 via Tunnel (candpair=InProgress) 
{      audio.1} Rx en0:relay:95.179.254.217:53942 <--- relay:95.179.254.217:63361 '0 OK'
{      audio.1} relay <---> relay  FSM:  InProgress ===> Succeeded 
{      audio.1} Tx en0:srflx:84.208.61.36:39825 ---> srflx:144.76.152.248:9884 (Succeeded) [USE] [Trigged]
ice: audio: connectivity check is complete (update=1)
audio: Dumping media state: ----- ICE Media <audio> -----
 local_mode=Full, remote_mode=Full, local_role=Controlling
 local_ufrag="fWYleGq" local_pwd="9fKCFGoxQSrw3hfuH1rdc8PXWm2EedL"
 Components: (1)
  id=1 ldef=95.179.254.217:53942 rdef=95.179.254.217:63361 concluded=1
 Local Candidates:  (3)
  {1} fnd=0a00010a prio=7e000aff en0:host:10.0.1.10:42268
  {1} fnd=5fb3feda prio=000000ff en0:relay:95.179.254.217:53942 (rel-addr=10.0.1.10:42268)
  {1} fnd=54d03d25 prio=640000ff en0:srflx:84.208.61.36:39825 (rel-addr=10.0.1.10:42268)
 Remote Candidates:  (3)
  {1} fnd=0a0a0a70 prio=7e000aff host:10.10.10.112:9884
  {1} fnd=5fb3feda prio=000000ff relay:95.179.254.217:63361 (rel-addr=10.10.10.112:9884)
  {1} fnd=904c98f9 prio=640000ff srflx:144.76.152.248:9884 (rel-addr=10.10.10.112:9884)
 Check list: [state=Completed] (2)
     {comp=1}     Failed {   } en0:host:10.0.1.10:42268 <---> host:10.10.10.112:9884 (Operation timed out)
     {comp=1}     Failed {   } en0:relay:95.179.254.217:53942 <---> host:10.10.10.112:9884 (Operation timed out)
 Valid list:  (6)
  *  {comp=1}  Succeeded { VN} en0:srflx:84.208.61.36:39825 <---> srflx:144.76.152.248:9884
     {comp=1}  Succeeded { V } en0:host:10.0.1.10:42268 <---> srflx:144.76.152.248:9884
     {comp=1}     Failed { VN} en0:srflx:84.208.61.36:39825 <---> relay:95.179.254.217:63361
     {comp=1}     Failed { V } en0:host:10.0.1.10:42268 <---> relay:95.179.254.217:63361
     {comp=1}  Succeeded { VN} en0:relay:95.179.254.217:53942 <---> srflx:144.76.152.248:9884
     {comp=1}  Succeeded {DVN} en0:relay:95.179.254.217:53942 <---> relay:95.179.254.217:63361
STUN debug:
STUN client transactions: (1)
 method=Binding tid=9cb10b95c6c5fe62f57db2f4 rto=100ms tmr=100 n=1 interval=100

audio: comp1 setting local: 84.208.61.36:39825
stream: mnat 'ice' connected: raddr 144.76.152.248:9884 
stream: audio: starting mediaenc 'dtls_srtp' (wait_secure=1)
dtls_srtp: media=audio -- start DTLS server
dtls_srtp: component start: RTP [raddr=144.76.152.248:9884]
ice: audio: sending Re-INVITE with updated default candidates
call: media-nat 'ice' established/gathered
call: medianat established -- sending Re-INVITE
call: modify
{      audio.1} Rx Binding Request from 144.76.152.248:9884 via Socket (candpair=Succeeded) [USE]
dtls_srtp: incoming DTLS connect from 95.179.254.217:63361
call: got SDP answer (1055 bytes)
call: update media
stream: update 'audio'
stream: audio: starting mediaenc 'dtls_srtp' (wait_secure=1)
{      audio.1} Set default remote candidate: srflx:144.76.152.248:9884
{      audio.1} Add TURN Channel to peer 144.76.152.248:9884

we should first find this error:

icecomp: {audio.1} set_selected: invalid state Failed

it is a matter of checking all pairs, and decide if a pair should be valid or not.

in order to reduce the complexity, you can try to use STUN instead of TURN.
this would give 2 candidates instead of 3.

@sreimers
Copy link
Member Author

sreimers commented Mar 1, 2020

Ok i had to switch the echo account to production "turn" only. But we can use now:

echo1@studio.link - ICE with turn
echo2@studio.link - ICE with stun only

@alfredh
Copy link
Collaborator

alfredh commented Mar 6, 2020

I just want to add that the ICE stack in libre has many problems.
Some of them are fixed in librew trice, but not merged back.

One of the problems is that the embedded STUN servers
changes the state of the Candidate Pairs:

https://github.com/creytiv/re/blob/master/src/ice/stunsrv.c#L161

I think this is wrong.

You can try to disable this line and see if it makes a difference.

Note: we should fix all these ICE errors, but it requires quite a lot of testing ... :)

@sreimers
Copy link
Member Author

sreimers commented Mar 7, 2020

Is it useful to optimize/repair the ICE in re, or should I take a look at Librew and help/test to merge this work?

Is this roadmap https://github.com/creytiv/re/wiki/ICE-Stacks-merge still up-to-date?

@alfredh
Copy link
Collaborator

alfredh commented Mar 7, 2020

it is difficult to say .. I think we have to do it step by step, and make alot
of testing for each step ..

@alfredh
Copy link
Collaborator

alfredh commented Apr 30, 2020

@sreimers I sent you a patch for testing ..

@alfredh
Copy link
Collaborator

alfredh commented May 5, 2020

I did some more testing of ICE between Chrome and Baresip-Webrtc.
here are the results:

ICE Testing
-----------

client:   Chrome 81.0.4044.138
server:   baresip-webrtc
baresip:  bd01d73d2bab3c021845b5a0ef69cd1f47d1780b
libre:    ea0509d6fc9a8beb366dde731885ff7d9b4d2d8b+patch
network:  LAN



Client:    Server:    Selected:     Status:
--------------------------------------------
host       host       host->host    OK

--------------------------------------------
host       host       host->host    OK
srflx      srflx

--------------------------------------------
relay      host       relay->host   OK
           srflx

--------------------------------------------

@GGGO
Copy link
Contributor

GGGO commented May 6, 2020

@alfredh, could you share the patch ? I would like to help on this.

@alfredh
Copy link
Collaborator

alfredh commented May 6, 2020

here is the patch:

https://gist.github.com/alfredh/78b4adb09ed796e0a2b972dc51b2e0e1

thanks for helping :)

@GGGO
Copy link
Contributor

GGGO commented May 8, 2020

Love it, the patch is simple and make ICE connection working for me.

Before the patch I couldn't use baresip-webrtc between local machines without hacking icem_conncheck_start in src/ice/connchk.c like this:

/* add some delay, to wait for call to be 'established' */
- tmr_start(&icem->tmr_pace, 10, pace_timeout, icem);
+ tmr_start(&icem->tmr_pace, 500, pace_timeout, icem);

This patch fixes the timing issue and I don't need the hack anymore.

@alfredh
Copy link
Collaborator

alfredh commented May 8, 2020

great, thanks for testing.

@sreimers please let me know if you are planning to test this patch.

@sreimers
Copy link
Member Author

sreimers commented May 8, 2020

Sure (maybe this weekend).

@sreimers
Copy link
Member Author

First tests looks great! Many thanks, i will test a bit more. Only if I call a turn only peer:

ice: ICE not supported by peer, fallback to TURN

So the recognition works, but no audio connection is established. But that could be a problem on my end.

@alfredh
Copy link
Collaborator

alfredh commented May 10, 2020

great, thanks for reporting and testing.

The ICE saga is not over yet, the code still needs improvements.
It is just that the protocol and code is very sensitive, so any change
has to be tested quite well.

I will close this for now. If you find more ICE bugs, please create a new issue.

@alfredh alfredh closed this as completed May 10, 2020
@GGGO
Copy link
Contributor

GGGO commented May 10, 2020

@alfredh, how is librew compared to libre ? I mean what is wrong with librew ?

wip-sync pushed a commit to NetBSD/pkgsrc-wip that referenced this issue Nov 28, 2020
= libre Changelog

== [v1.1.0] - 2020-10-04

=== Added

- tls: functions to get the certificate issuer and subject [#18]
- uri: Added path field to struct uri and its decode to uri_decode [#22]
- tcp: add tcp_connect_bind [#24]
- http: support bind to laddr in http_request [#24]
- sipreg: support Cisco REGISTER keep-alives [#19]
- sip: websocket support [#26]

=== Fixed

- tls/openssl: fix X509_NAME win32/wincrypt.h conflict
- dns: listen on IPv4 and IPv6 socket [#27]
- main: fix/optimize windows file descriptors [#25]

=== Contributors (many thanks)

- Alfred E. Heggestad
- Christian Spielberger
- Christoph Huber
- Franz Auernigg
- Juha Heinanen
- Sebastian Reimers

== [v1.0.0] - 2020-09-08

=== Added

- sip: add trace
- sdp: sdp_media_disabled API function [#2]
- tls: add tls_set_selfsigned_rsa [#6]
- tls: add functions to verify server cert, purpose and hostname [#10]
- http: client should set SNI [#10]
- http: client should use tls functions to verify server certs, purpose
  and hostname [#10]
- sipreg: add proxy expires field and get function [#13]
- sipreg: make re-register interval configurable [#13]

=== Changed

- debian: Automatic cleanup after building debian package

=== Fixed

- Set SDK path (SYSROOT) using xcrun (fix building on macOS 10.14)
- tcp: close socket on windows if connection is aborted or reset [#1]
- rtmp: Fix URL path parsing (creytiv#245)
- ice: various fixes [baresip/baresip#925]
- openssl/tls: replace deprecated openssl 1.1.0 functions [#5]

=== Contributors (many thanks)

- Alfred E. Heggestad
- Christian Spielberger
- Christoph Huber
- Franz Auernigg
- juha-h
- Juha Heinanen
- Richard Aas
- Sebastian Reimers

[#25]: baresip/re#25
[#27]: baresip/re#27
[#26]: baresip/re#26
[#19]: baresip/re#19
[#24]: baresip/re#24
[#22]: baresip/re#22
[#18]: baresip/re#18
[#13]: baresip/re#13
[#10]: baresip/re#10
[#6]: baresip/re#6
[#5]: baresip/re#5
[#2]: baresip/re#2
[#1]: baresip/re#1

[v1.0.0]: baresip/re@v0.6.1...v1.0.0
[v1.1.0]: baresip/re@v1.0.0...v1.1.0
[Unreleased]: baresip/re@v1.1.0...HEAD
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants