SC FMS is stuck on infinity connection retries #4234

dincho · 2023-12-15T10:52:00Z

2023-12-15 10:49:34.131 [warning] <0.9386.2130>@aesc_fsm:noise_accept:4221 Noise accept failed with {exception,{eacces,443}}
2023-12-15 10:49:34.131 [error] <0.1767.0>@aesc_listeners:new_listener_:132 Cannot open State Channel listener on 443: eacces

When the remote FSM is unavailable, the local one keeps retrying forever: no backoff, no stop condition.

uwiger · 2023-12-15T14:05:52Z

What is the application?
It's the responder doing the accept, and if it fails (there is a default of 3 retries), it may propagate out to something that keeps trying to spawn a server-side responder.

Not saying that this is what happens, only that it would help with more context.

dincho · 2023-12-18T09:41:17Z

Unfortunately I don't have many details, other than that the FSM was initialized with the wrong port (443) and I saw this message looping forever. After a restart it was gone.

davidyuk · 2024-02-14T17:56:58Z

It seems I reported the same issue in #4235.
I can't use the FSM on testnet and mainnet because of {"event":"died"}

mitchelli · 2024-03-28T17:06:01Z

I reproduced this issue with @davidyuk's snippet by changing the port to 443 (a privileged port on Linux; the Mac doesn't have this restriction) and changing the role to responder.

17:52:34.621 [debug] No listener on port 443 yet - create one
17:52:34.621 [error] Cannot open State Channel listener on 443: eacces
17:52:34.621 [warning] Noise accept failed with {exception,{eacces,443}}
17:52:34.621 [debug] No listener on port 443 yet - create one
17:52:34.621 [error] Cannot open State Channel listener on 443: eacces
17:52:34.621 [warning] Noise accept failed with {exception,{eacces,443}}
17:52:34.621 [debug] No listener on port 443 yet - create one
17:52:34.622 [error] Cannot open State Channel listener on 443: eacces
17:52:34.622 [warning] Noise accept failed with {exception,{eacces,443}}
17:52:34.630 [debug] No listener on port 443 yet - create one
17:52:34.631 [error] Cannot open State Channel listener on 443: eacces
17:52:34.631 [warning] Noise accept failed with {exception,{eacces,443}}
17:52:34.631 [debug] No listener on port 443 yet - create one
17:52:34.631 [error] Cannot open State Channel listener on 443: eacces
17:52:34.631 [warning] Noise accept failed with {exception,{eacces,443}}
17:52:34.631 [debug] No listener on port 443 yet - create one
17:52:34.631 [error] Cannot open State Channel listener on 443: eacces
17:52:34.631 [warning] Noise accept failed with {exception,{eacces,443}}
17:52:34.631 [debug] No listener on port 443 yet - create one
17:52:34.631 [error] Cannot open State Channel listener on 443: eacces
17:52:34.631 [warning] Noise accept failed with {exception,{eacces,443}}
17:52:34.631 [debug] No listener on port 443 yet - create one
17:52:34.631 [error] Cannot open State Channel listener on 443: eacces
17:52:34.631 [warning] Noise accept failed with {exception,{eacces,443}}
17:52:34.631 [debug] No listener on port 443 yet - create one
17:52:34.631 [error] Cannot open State Channel listener on 443: eacces
17:52:34.631 [warning] Noise accept failed with {exception,{eacces,443}}
17:52:34.631 [debug] No listener on port 443 yet - create one
17:52:34.631 [error] Cannot open State Channel listener on 443: eacces
17:52:34.632 [warning] Noise accept failed with {exception,{eacces,443}}
17:52:34.632 [debug] No listener on port 443 yet - create one
17:52:34.632 [error] Cannot open State Channel listener on 443: eacces
17:52:34.632 [warning] Noise accept failed with {exception,{eacces,443}}
17:52:34.632 [debug] No listener on port 443 yet - create one
17:52:34.632 [error] Cannot open State Channel listener on 443: eacces
17:52:34.632 [warning] Noise accept failed with {exception,{eacces,443}}
17:52:34.657 [debug] No listener on port 443 yet - create one
17:52:34.657 [error] Cannot open State Channel listener on 443: eacces
17:52:34.657 [warning] Noise accept failed with {exception,{eacces,443}}
17:52:34.657 [debug] No listener on port 443 yet - create one
17:52:34.657 [error] Cannot open State Channel listener on 443: eacces
17:52:34.657 [warning] Noise accept failed with {exception,{eacces,443}}

I'll keep on digging.

Can reproduce on the Mac by using nc to listen on the requested port:

18:23:17.911 [error] Cannot open State Channel listener on 443: eaddrinuse
18:23:17.937 [warning] Noise accept failed with {exception,{eaddrinuse,443}}
18:23:17.937 [error] Cannot open State Channel listener on 443: eaddrinuse
18:23:17.937 [warning] Noise accept failed with {exception,{eaddrinuse,443}}
18:23:17.937 [error] Cannot open State Channel listener on 443: eaddrinuse
18:23:17.937 [warning] Noise accept failed with {exception,{eaddrinuse,443}}
18:23:17.937 [error] Cannot open State Channel listener on 443: eaddrinuse
18:23:17.937 [warning] Noise accept failed with {exception,{eaddrinuse,443}}
18:23:17.937 [error] Cannot open State Channel listener on 443: eaddrinuse
18:23:17.937 [warning] Noise accept failed with {exception,{eaddrinuse,443}}
18:23:17.938 [error] Cannot open State Channel listener on 443: eaddrinuse
18:23:17.938 [warning] Noise accept failed with {exception,{eaddrinuse,443}}
18:23:17.938 [error] Cannot open State Channel listener on 443: eaddrinuse
18:23:17.938 [warning] Noise accept failed with {exception,{eaddrinuse,443}}
18:23:17.938 [error] Cannot open State Channel listener on 443: eaddrinuse
18:23:17.938 [warning] Noise accept failed with {exception,{eaddrinuse,443}}
18:23:17.938 [error] Cannot open State Channel listener on 443: eaddrinuse
18:23:17.938 [warning] Noise accept failed with {exception,{eaddrinuse,443}}
18:23:17.938 [error] Cannot open State Channel listener on 443: eaddrinuse
18:23:17.938 [warning] Noise accept failed with {exception,{eaddrinuse,443}}

davidyuk · 2024-03-29T04:05:42Z

My snippet is expected to work fine on localhost. I have issues on the public endpoints (wss://testnet.aeternity.io/channel and wss://mainnet.aeternity.io/channel).

mitchelli · 2024-03-31T09:19:13Z

I was reproducing the issue @dincho observed above. I am not sure if they are related; I will have a look at that next.

dincho · 2024-04-01T07:22:34Z

@mitchelli could you please double check if the socket configuration includes SO_REUSEADDR ?

dincho · 2024-04-01T07:24:30Z

In general the clients/server are expected to have long-running open connections, which should work fine. But if they need to reconnect (fast) for some reason, the reconnect is probably too fast and the socket state is still BUSY (i.e. not properly closed, or the close timeout not yet reached).
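For reference, a minimal sketch of what setting SO_REUSEADDR on a listening socket looks like. It is written in Python purely for illustration and is not the node's code; the helper name `open_listener` and its defaults are assumptions. In Erlang's gen_tcp the corresponding listen option is `{reuseaddr, true}`.

```python
import socket


def open_listener(port, host="127.0.0.1", reuse_addr=True):
    """Open a TCP listener, optionally with SO_REUSEADDR (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        if reuse_addr:
            # The option has to be set before bind() to have any effect.
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind((host, port))
        sock.listen()
    except OSError:
        sock.close()
        raise
    return sock


if __name__ == "__main__":
    listener = open_listener(0)  # port 0 lets the OS pick a free port
    print("listening on", listener.getsockname()[1])
    listener.close()
```

Note that this only helps when the port is held by a lingering, already closed connection. On Linux at least, it does not allow binding while another socket is actively listening on the same address and port, and it has no effect on permission errors such as eacces.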

mitchelli · 2024-04-01T08:48:57Z

I think the issue is that the server is trying to open a port for listening that it is not allowed to. On Linux, the user running the node doesn't have permission to listen on privileged ports. On the Mac I reproduced it by running another process that was already listening on the port the node wanted to listen on. It could be that the API is being used incorrectly. I don't know enough about state channels. I'm not sure why the client can tell the node to listen on a certain port.
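The two failure modes seen in the logs (eacces and eaddrinuse) can be told apart with a small probe. This is a sketch in Python, not the node's code; the function name `probe_listen` is an assumption made for illustration.

```python
import errno
import socket


def probe_listen(port, host="127.0.0.1"):
    """Try to listen on (host, port).

    Returns (socket, None) on success, or (None, reason) where reason is a
    lowercase errno name such as 'eacces' or 'eaddrinuse'.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen()
        return sock, None
    except OSError as exc:
        sock.close()
        # errno.errorcode maps numeric codes to names like 'EADDRINUSE'.
        return None, errno.errorcode.get(exc.errno, "unknown").lower()


if __name__ == "__main__":
    # Reproduce the "port already taken" case without needing root or nc.
    first, _ = probe_listen(0)
    taken_port = first.getsockname()[1]
    _, reason = probe_listen(taken_port)
    print(reason)
    first.close()
```

Probing port 443 as an unprivileged user on a typical Linux setup would be expected to give eacces instead, but that depends on the system configuration.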

dincho · 2024-04-02T09:35:37Z

I'm not sure why the client can tell the node to listen on a certain port.

Ah, now I remember!

This is how it works currently, as the SC FSMs need a communication channel. So the initiator FSM tries to bind to the corresponding port sent via the API (WS) call.

First off, the OP error (trying to bind/listen on a given port) should have a backoff time and a maximum number of tries, then die.
So the fix for this should be pretty clear.
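A rough sketch of the "backoff, max tries, then die" behaviour described above. It is in Python for illustration only and is not how the node implements it; `listen_with_backoff`, its defaults (3 attempts, 0.5 s base delay) and the injectable `sleep` parameter are all assumptions.

```python
import socket
import time


def listen_with_backoff(port, host="127.0.0.1", max_attempts=3,
                        base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Try to open a listener, backing off exponentially between attempts.

    Raises RuntimeError once max_attempts have failed, instead of looping.
    """
    last_error = None
    for attempt in range(max_attempts):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            sock.listen()
            return sock
        except OSError as exc:
            sock.close()
            last_error = exc
            # Wait only if another attempt follows: 0.5 s, 1 s, 2 s, ...
            if attempt + 1 < max_attempts:
                sleep(min(max_delay, base_delay * 2 ** attempt))
    raise RuntimeError(
        f"giving up on port {port} after {max_attempts} attempts"
    ) from last_error
```

Passing `sleep` as a parameter makes the delays easy to observe in a test without actually waiting. Whether a permanent error such as eacces should be retried at all is a separate design question; this sketch treats every bind error the same way.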

However, the general issue (new ticket?) with that approach is that while this concept might be acceptable for local/testing/playground setups, it does not work for production systems. No sane administrator would allow an app to bind to arbitrary ports, let alone ports controlled by a user/external API. Furthermore, this means opening a port range in the network firewall, which is also a no-go.

The only way this could work in production is to remove the host/port parameters from the WS API and make them configurable server-side. It must also work on a single port (multiplexing), that is, a single FSM/node should be able to accept any number of remote FSM Noise connections, regardless of the responders etc.

Currently there is the controlling channel WS API (/channel) to the FSMs, and port 3114 (FSM Noise). Knowing the actual FSM host is another issue I should try to solve somehow.

uwiger · 2024-04-03T04:50:18Z

I'm not sure why the client can tell the node to listen on a certain port.

It can't. But I think there is a bug in there.

The casino SC demo revealed a few weaknesses in the SC connection handling, and there was a PR (#4011) to address this. The PR wasn't well tested (the demo project was put on ice, I think), but it was eventually merged anyway. I later noticed that the SC Market demo also broke, but haven't had the time to debug it.

In the SC Market case, it may have something to do with using a timeout on the listener side and constantly restarting them. I don't really think there is any reason to use a timeout on the listeners, but if one does, it needs to play nice with supervision/restarts and of course log reporting.

And yes, an SC responder needs to be able to multiplex acceptors on a single listen socket, which is harder than it sounds, since it has to match a session with the appropriate responder and ensure that reconnecting clients find the same responder as before, including potentially matching Noise crypto keys.
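To make the matching problem concrete, here is a toy sketch of the bookkeeping only, in Python. It is not the node's design: `ResponderRegistry`, the notion of a `session_key`, and the callback-style responders are all hypothetical. Extracting a trustworthy key from the Noise handshake is the hard part and is not modelled here.

```python
class ResponderRegistry:
    """Maps a session key to the responder that should handle it (hypothetical)."""

    def __init__(self):
        self._responders = {}

    def register(self, session_key, responder):
        """Register a responder callback; refuse to overwrite an existing one."""
        if session_key in self._responders:
            raise KeyError(f"responder already registered for {session_key!r}")
        self._responders[session_key] = responder

    def unregister(self, session_key):
        """Remove a responder, e.g. when its channel is closed."""
        self._responders.pop(session_key, None)

    def dispatch(self, session_key, connection):
        """Hand an accepted connection to its responder.

        Returns False when no responder matches, so the caller can drop the
        connection instead of spawning anything new.
        """
        responder = self._responders.get(session_key)
        if responder is None:
            return False
        responder(connection)
        return True


if __name__ == "__main__":
    registry = ResponderRegistry()
    seen = []
    registry.register("channel-1", seen.append)
    registry.dispatch("channel-1", "first connection")
    # A reconnecting client with the same key reaches the same responder.
    registry.dispatch("channel-1", "reconnect")
    print(seen)
```

With something like this, a single accept loop on one listen socket would look up the responder after the handshake, rather than each responder opening its own port.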

mitchelli · 2024-04-08T14:22:56Z

First off, the OP error (trying to bind/listen on a given port) should have a backoff time and a maximum number of tries, then die. So the fix for this should be pretty clear.

There was a bug in the function clause, which I fixed. Now it tries three times and then fails, but the attempts are very close together. I am not sure if this behaviour is correct:


16:08:27.779 [error] Cannot open State Channel listener on 443: eaddrinuse
16:08:27.779 [warning] Noise accept failed with {exception,{eaddrinuse,443}}
16:08:27.779 [error] Cannot open State Channel listener on 443: eaddrinuse
16:08:27.779 [warning] Noise accept failed with {exception,{eaddrinuse,443}}
16:08:27.779 [error] Cannot open State Channel listener on 443: eaddrinuse
16:08:27.779 [warning] Noise accept failed with {exception,{eaddrinuse,443}}
16:08:27.779 [error] Failed with failed_spawning_noise, Initiator: <<154,122,127,162,221,11,106,64,140,166,149,255,64,126,209,240,176,59,34,163,60,214,74,1,14,146,205,98,247,71,45,16>>, Responder: <<206,167,173,228,112,201,249,157,157,78,64,8,128,168,111,29,73,187,68,75,98,241,26,158,187,100,187,207,235,115,254,243>>
16:08:27.780 [error] Failed to start noise session: Error = failed_spawning_noise
16:08:27.780 [error] CRASH REPORT Process <0.1953.0> with 0 neighbours exited with reason: failed_noise_session_start in gen_statem:init_result/8 line 1023
16:08:27.780 [info] Handler critical error: failed_noise_session_start

dincho added the kind/bug and area/statechannels labels Dec 15, 2023

dincho changed the title from "SC FMS is stuff on infinity connection retries" to "SC FMS is stuck on infinity connection retries" Dec 15, 2023

davidyuk mentioned this issue Feb 15, 2024

test: run state channel tests on testnet aeternity/aepp-sdk-js#1924

Draft

mitchelli self-assigned this Mar 28, 2024

mitchelli linked a pull request Apr 8, 2024 that will close this issue

Fix function clause so attempts are checked #4323

Merged

mitchelli closed this as completed in #4323 Apr 11, 2024

mitchelli mentioned this issue Apr 11, 2024

Allow of range of ports to be specified for use by SC FSMs #4327

Closed
