Subflows are not recreated in the WAN after an internet outage. #3122

Closed
vempire-ghost opened this issue Jan 12, 2024 · 16 comments

Comments

@vempire-ghost

Expected Behavior

After the internet is restored, the subflows of that WAN will all be recreated, and traffic resumes flowing.

Current Behavior

After the internet is restored, the subflows of currently active connections are never restored again.

Specifications

  • OpenMPTCProuter version: openmptcprouter v0.60beta1-5.4 r0+16862-170d9 snap from 05/01
  • OpenMPTCProuter VPS version: 5.4.207-mptcp 0.1029-test
  • OpenMPTCProuter VPS provider: GCP
  • OpenMPTCProuter platform: x86_64

I have been monitoring this issue for some time; whenever a WAN loses internet connectivity, upon its restoration, the active connections do not recreate the subflow as they should.
However, upon conducting some tests, I observed that when stopping the interface and then restarting it after the internet has returned, the subflows of all the WANs that were previously inactive are recreated, including those from other WANs.

Here are some screenshots.
image
image
image
image

As you can see, in this test I disconnected the internet from WAN1 and WAN3, waited for some time, and then reconnected it. No data started flowing through them again.
But as soon as I went to Interfaces, stopped WAN1, and restarted it, traffic immediately resumed on both WAN1 and WAN3.
Apparently, stopping and restarting an interface triggers some mechanism that recreates its subflows, or creates new ones if the connection was established during the internet outage.
In OMR, there is an option to restart the interface if it is down. Is there a way to stop and restart an interface a certain time after it has become active again?
It could even be a script triggered by an event such as 'Jan 12 22:53:42 OpenMPTCProuter user.notice post-tracking-001-post-tracking: wan3 (eth3) switched up', which waits a few seconds and then runs the commands to stop and restart that interface.

I would like to test if it's possible to automate this solution, but I lack the knowledge to create a script tied to an event like this. That's why I seek your assistance.
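The event-driven restart asked about here could be sketched roughly as follows. This is only an illustration, not something OMR ships: it assumes the log line format quoted above, that `logread -f` is available to follow the system log (as on stock OpenWrt), and that `ifdown`/`ifup` accept the logical interface name. The names `parse_switched_up`, `watch_and_restart` and `SETTLE_DELAY` are hypothetical.

```shell
#!/bin/sh
# Sketch only: restart a WAN a few seconds after the tracker logs "switched up".
# SETTLE_DELAY is a hypothetical knob; 15 s is an arbitrary starting value.
SETTLE_DELAY="${SETTLE_DELAY:-15}"

# Print "<wan> <ifname>" for a "... post-tracking: wan3 (eth3) switched up" line,
# and nothing for any other line.
parse_switched_up() {
    printf '%s\n' "$1" |
        sed -n 's/.*post-tracking: \([^ ]*\) (\([^)]*\)) switched up.*/\1 \2/p'
}

# Follow the system log and bounce each WAN that comes back up.
# Assumes logread, ifdown and ifup exist (true on stock OpenWrt).
watch_and_restart() {
    logread -f | while IFS= read -r line; do
        match=$(parse_switched_up "$line")
        [ -n "$match" ] || continue
        wan=${match%% *}
        # Run the delayed restart in the background so the log keeps being read.
        (sleep "$SETTLE_DELAY"; ifdown "$wan"; ifup "$wan") &
    done
}
```

Calling `watch_and_restart` starts the loop. Restarting the interface may itself produce another "switched up" event, so a real version would need a guard against restarting the same WAN repeatedly.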

@vempire-ghost
Author

I think I found the issue; this event is occurring too early.
Jan 13 01:42:17 OpenMPTCProuter user.notice post-tracking-001-post-tracking: wan3 (eth3) switched up
Jan 13 01:42:17 OpenMPTCProuter user.notice post-tracking-020-status: Reload MPTCP for eth3
Jan 13 01:42:17 OpenMPTCProuter user.notice MPTCP: Set eth3 to on

In the same second that the interface is detected as UP, MPTCP is already reloaded; perhaps the internet connection is not yet fully active at that point, so the subflows are not recreated.
However, if I run the command 'multipath eth3 off' about 15 seconds after the interface is detected as UP, the subflows are recreated. I repeated the test 5 times, and each time the subflows were recreated only after the manual command.

The system reports these events when I enter the command above.
Jan 13 01:44:21 OpenMPTCProuter user.notice post-tracking-001-post-tracking: Reload MPTCP config for eth3
Jan 13 01:44:22 OpenMPTCProuter user.notice MPTCP: Set eth3 to on
Jan 13 01:44:22 OpenMPTCProuter user.notice post-tracking-001-post-tracking: Multipath eth3 (wan3) switched to on (from off)

How can I increase the time it takes for the tracker to trigger the 'multipath on' command in this case?
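Instead of a fixed delay, one option is to wait until the WAN can actually reach a host before touching multipath. The following is a minimal sketch under assumptions: `probe`, `wait_until_reachable`, `PROBE_HOST` and `RETRY_DELAY` are hypothetical names, 8.8.8.8 is just an example target, and `ping` is assumed to support `-c`, `-W` and `-I` (BusyBox and iputils both do).

```shell
#!/bin/sh
# Sketch only: wait until a WAN can reach a host before toggling multipath.

# One reachability probe through the given interface.
# PROBE_HOST is an example target; any host that answers ping would do.
probe() {
    ping -c 1 -W 2 -I "$1" "${PROBE_HOST:-8.8.8.8}" >/dev/null 2>&1
}

# Retry probe up to max_tries times, RETRY_DELAY seconds apart.
# Returns 0 as soon as one probe succeeds, 1 if all of them fail.
wait_until_reachable() {
    ifname=$1
    max_tries=${2:-30}
    tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        probe "$ifname" && return 0
        tries=$((tries + 1))
        sleep "${RETRY_DELAY:-1}"
    done
    return 1
}
```

The manual multipath command described above could then be run only after `wait_until_reachable eth3` returns success, rather than after a guessed number of seconds.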

@vempire-ghost
Author

I solved the problem by adding this configuration.
uci set omr-tracker.wan3=interface && uci set omr-tracker.wan3.script_alert_up='sleep 15 ; multipath eth3 on ; multipath eth3 off'

Now, whenever WAN3 loses and regains internet connectivity, the log displays this.
Jan 13 03:03:43 OpenMPTCProuter user.notice post-tracking-001-post-tracking: wan3 (eth3) switched up
Jan 13 03:04:49 OpenMPTCProuter user.notice post-tracking-001-post-tracking: Reload MPTCP config for eth3
Jan 13 03:04:49 OpenMPTCProuter user.notice MPTCP: Set eth3 to on
Jan 13 03:04:49 OpenMPTCProuter user.notice post-tracking-001-post-tracking: Multipath eth3 (wan3) switched to on (from off)

Even though I set a sleep of 15s, the log always shows that it takes about 60s to execute the command. However, now all subflows are consistently recreated correctly.
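The gap between the "switched up" event and the multipath reload can be read off the log timestamps. A small sketch of that arithmetic follows; `to_seconds` and `elapsed` are hypothetical helper names, and both timestamps are assumed to fall on the same day.

```shell
#!/bin/sh
# Convert HH:MM:SS to seconds since midnight.
to_seconds() {
    old_ifs=$IFS
    IFS=:
    set -- $1          # split on ":" into hours, minutes, seconds
    IFS=$old_ifs
    # Strip one leading zero so "08" is not parsed as invalid octal.
    echo $(( ${1#0} * 3600 + ${2#0} * 60 + ${3#0} ))
}

# Seconds elapsed between two same-day timestamps.
elapsed() {
    echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}

elapsed 03:03:43 03:04:49   # prints 66
```

So in the log above the command ran about 66 s after the event, roughly 51 s more than the 15 s sleep. The log does not show where the extra delay comes from.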

@vempire-ghost
Author

vempire-ghost commented Jan 13, 2024

If this IP-change event occurs right after the WAN comes up, the script does not execute the 'multipath eth3 on; multipath eth3 off' command. Could you tell me why?
Jan 13 03:58:39 OpenMPTCProuter user.notice post-tracking-020-status: New public ip detected for wan3 (eth3): xx.239.xxx.xx(previous: xx.239.xx.xx)
Jan 13 03:58:39 OpenMPTCProuter user.notice post-tracking-020-status: Reload MPTCP for eth3
Jan 13 03:58:39 OpenMPTCProuter user.notice MPTCP: Set eth3 to on

Is there anything I need to change in the script?

@vempire-ghost
Author

vempire-ghost commented Jan 15, 2024

After more tests over the weekend, I came up with a script that, if executed by the OMR, always recreates broken connections when an interface comes back online. In case someone is facing the same issue or wants to ensure that established connections with broken subflows are recreated, this command for each WAN solved my problem.

uci set omr-tracker.wan1=interface
uci set omr-tracker.wan1.script_alert_up='sleep 15 ; multipath eth1 off ; sleep 1 ; multipath eth1 on'

For each WAN, a script needs to be created according to the name of your WAN in the OMR network interface.
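Writing the two commands by hand for every WAN is error-prone, so they could be generated from a list of name pairs. This is a minimal sketch: `print_alert_up_config` is a hypothetical helper, and the `wan:ifname` pairs must match the names in your own OMR network configuration. It only prints the commands and applies nothing.

```shell
#!/bin/sh
# Sketch only: print the per-WAN uci commands for a list of "wan:ifname" pairs.
# Nothing is applied; review the output before running it on the router.
print_alert_up_config() {
    for pair in "$@"; do
        wan=${pair%%:*}      # logical name, e.g. wan1
        ifname=${pair#*:}    # device name, e.g. eth1
        printf "uci set omr-tracker.%s=interface\n" "$wan"
        printf "uci set omr-tracker.%s.script_alert_up='sleep 15 ; multipath %s off ; sleep 1 ; multipath %s on'\n" \
            "$wan" "$ifname" "$ifname"
    done
}

print_alert_up_config wan1:eth1 wan2:eth2
```

Changes made with `uci set` are staged until `uci commit` is run; the thread does not show that step.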

What I noticed is that multipath only recreates broken subflows when changed from off to on. So, if it's already on, it won't recreate them, or if it's changed from off to on too early, before there is connectivity on the WAN, it fails to recreate and doesn't attempt again.
Another issue is that if it is turned off and on too quickly, it also does not recreate; hence, the 1s delay between off and on ensures that it will recreate the subflows.
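The off, pause, on sequence described above can be wrapped in one function so the delays are explicit. This is a sketch, not part of OMR: `recreate_subflows` and `MULTIPATH_CMD` are hypothetical names, and the default delays are simply the values reported in this comment.

```shell
#!/bin/sh
# Sketch only: the off -> pause -> on sequence as one function.
# MULTIPATH_CMD allows a dry run (e.g. MULTIPATH_CMD=echo);
# it defaults to the multipath command on the router.
recreate_subflows() {
    ifname=$1
    settle=${2:-15}   # wait for the WAN to have working connectivity
    pause=${3:-1}     # gap between off and on; reported as needed above
    sleep "$settle"
    ${MULTIPATH_CMD:-multipath} "$ifname" off
    sleep "$pause"
    ${MULTIPATH_CMD:-multipath} "$ifname" on
}
```

For example, `MULTIPATH_CMD=echo recreate_subflows eth1 0 0` prints the two commands' arguments without changing anything, while `recreate_subflows eth1` performs the real toggle with the 15 s and 1 s delays.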

And if you are using OMR with the redundant scheduler for resilience instead of aggregation, and want to ensure that your subflows are always recreated and healthy, adding this entry to cron ensures that.

*/1 * * * * multipath eth1 off ; sleep 1 ; multipath eth1 on ; multipath eth2 off ; sleep 1 ; multipath eth2 on ; multipath eth3 off ; sleep 1 ; multipath eth3 on ; multipath eth4 off ; sleep 1 ; multipath eth4 on

The loop time is at your discretion for how often you want the subflows to be recreated if necessary. I set it to 1 minute, but you can use any value you find suitable. I tested it all weekend gaming and monitoring my latency, and it didn't cause any issues. On the contrary, it made my connection even more stable.
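For a different set of interfaces or a different interval, the cron entry can be built rather than typed. A minimal sketch, where `build_cron_line` is a hypothetical helper that only prints the line and does not install it:

```shell
#!/bin/sh
# Sketch only: build the crontab entry for any list of interfaces.
# The schedule is passed as the first argument, the interfaces after it.
build_cron_line() {
    schedule=$1
    shift
    line=$schedule
    sep=" "
    for ifname in "$@"; do
        line="${line}${sep}multipath ${ifname} off ; sleep 1 ; multipath ${ifname} on"
        sep=" ; "
    done
    printf '%s\n' "$line"
}

build_cron_line '*/1 * * * *' eth1 eth2 eth3 eth4
```

Each interface adds about one second of run time, so with many interfaces or a longer pause the total should stay below the cron interval to keep runs from overlapping.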

Unfortunately, I couldn't test the new snapshots with the new OMR tracker because all the ones I tried couldn't even connect to the VPS. I will test again as new ones are compiled.

@Ysurac
Owner

Ysurac commented Jan 15, 2024

Latest release should fix Shadowsocks-libev.
I have modified some parts of OMR-Tracker and the mptcp script; maybe this is enough to fix the issue.

@vempire-ghost
Author

Latest release should fix Shadowsocks-libev. I have modified some parts of OMR-Tracker and the mptcp script; maybe this is enough to fix the issue.

I tried today's latest snapshot, but it's not connecting. It gives this error in the log.
Jan 15 22:59:00 OpenMPTCProuter daemon.err omr-tracker-server[7147]: curl: no URL specified!
Jan 15 22:59:00 OpenMPTCProuter daemon.err omr-tracker-server[7147]: curl: try 'curl --help' for more information
Jan 15 22:59:00 OpenMPTCProuter daemon.err omr-tracker-server[7147]: /bin/omr-tracker-server: eval: line 51: https://X.X.X.X:65500/: not found
Jan 15 22:59:01 OpenMPTCProuter daemon.err omr-tracker-server[7147]: curl: no URL specified!
Jan 15 22:59:01 OpenMPTCProuter daemon.err omr-tracker-server[7147]: curl: try 'curl --help' for more information
Jan 15 22:59:01 OpenMPTCProuter daemon.err omr-tracker-server[7147]: /bin/omr-tracker-server: eval: line 51: https://X.X.X.X:65500/: not found
Jan 15 22:59:02 OpenMPTCProuter daemon.err omr-tracker-server[7147]: curl: no URL specified!
Jan 15 22:59:02 OpenMPTCProuter daemon.err omr-tracker-server[7147]: curl: try 'curl --help' for more information
Jan 15 22:59:02 OpenMPTCProuter daemon.err omr-tracker-server[7147]: /bin/omr-tracker-server: eval: line 51: https://X.X.X.X:65500/: not found
Jan 15 22:59:03 OpenMPTCProuter daemon.err omr-tracker-server[7147]: curl: no URL specified!
Jan 15 22:59:03 OpenMPTCProuter daemon.err omr-tracker-server[7147]: curl: try 'curl --help' for more information
Jan 15 22:59:03 OpenMPTCProuter daemon.err omr-tracker-server[7147]: /bin/omr-tracker-server: eval: line 51: https://X.X.X.X:65500/: not found

The status page only shows this.
image

@Ysurac
Owner

Ysurac commented Jan 16, 2024

I didn't test on 5.4, only on 6.1, where there are major changes. I will try on 5.4; maybe I have a wrong dependency...

@vempire-ghost
Author

Latest release should fix Shadowsocks-libev. I have modified some parts of OMR-Tracker and the mptcp script; maybe this is enough to fix the issue.

Tried the snapshot from today 17/01.
image
I tried DNS and ping with the settings from the screenshot. It took 33 s in the first test and 1 min 10 s in the second for OMR to detect the outage and turn off the interface, and it only detected the interface with a static IP; interfaces using DHCP were never detected or turned off.
I tried to configure it to test the link quality, but that doesn't seem to work because the setting isn't saved when I select this option.
Is there a way to minimize the time it takes for OMR to detect and turn off the interface?

Also, even with the interface that was detected and deactivated, upon reactivation, the subflows were not restored as they should have been.
I can also no longer add the scripts I was using to work around this problem to OMR.
uci set omr-tracker.wan1=interface
uci set omr-tracker.wan1.script_alert_up='sleep 15 ; multipath eth1 off ; sleep 1 ; multipath eth1 on'
uci set omr-tracker.wan2=interface
uci set omr-tracker.wan2.script_alert_up='sleep 15 ; multipath eth2 off ; sleep 1 ; multipath eth2 on'
uci set omr-tracker.wan3=interface
uci set omr-tracker.wan3.script_alert_up='sleep 15 ; multipath eth3 off ; sleep 1 ; multipath eth3 on'
uci set omr-tracker.wan4=interface
uci set omr-tracker.wan4.script_alert_up='sleep 15 ; multipath eth4 off ; sleep 1 ; multipath eth4 on'
uci set omr-tracker.wan5=interface
uci set omr-tracker.wan5.script_alert_up='sleep 15 ; multipath eth5 off ; sleep 1 ; multipath eth5 on'

Xray is not working either.
Jan 18 01:21:04 OpenMPTCProuter daemon.err xray: Invalid V2Ray file.

@Ysurac
Owner

Ysurac commented Jan 18, 2024

The WAN detection is now fixed, new image is compiling.

@vempire-ghost
Author

The WAN detection is now fixed, new image is compiling.

I tested the latest build with these configurations.
image
Using ping, it took approximately 30 seconds to detect and remove the route, but it only detected 2 out of the 3 WANs. Using DNS, it took the same 30 seconds, but it detected all 3 WANs.
Using the link quality test with default values, it couldn't detect any offline interface at any time.

Would it be possible to detect and turn off the WAN route within a 1-second interval when it's offline? Or would this not be feasible and require more time?

And Xray still presents the error: OpenMPTCProuter daemon.err xray: Invalid V2Ray file.

@Ysurac
Owner

Ysurac commented Jan 19, 2024

I fixed the OMR-Tracker delay and XRay should be back (I will test).
New snapshot is compiling.

@vempire-ghost
Author

I fixed the OMR-Tracker delay and XRay should be back (I will test). New snapshot is compiling.

Xray is still not working.
Jan 23 00:59:00 OpenMPTCProuter user.notice omr-schedule-010-services: Can't find XRay, restart it...
Jan 23 00:59:00 OpenMPTCProuter daemon.err xray: Invalid XRay file.

@vempire-ghost
Author

I fixed the OMR-Tracker delay and XRay should be back (I will test). New snapshot is compiling.

Using today's snapshot from 27/01/2024 with the OMR tracker in ping mode, I don't know why, but it didn't detect any internet outage in any WAN at any time. However, in DNS mode, it perfectly detected the lack of internet and removed the route; it worked very well.

As for Xray, it continues to give an invalid file error and cannot be used.

@Ysurac
Owner

Ysurac commented Jan 27, 2024

I'm not able to reproduce the issue with OMR-Tracker. Can you give me the result of uci show omr-tracker via SSH on the router? Maybe we don't have the same configuration.
For XRay, a new image is compiling.

@vempire-ghost
Author

I'm not able to reproduce the issue with OMR-Tracker. Can you give me the result of uci show omr-tracker via SSH on the router? Maybe we don't have the same configuration. For XRay, a new image is compiling.

root@OpenMPTCProuter:~# uci show omr-tracker
omr-tracker.defaults=defaults
omr-tracker.defaults.enabled='1'
omr-tracker.defaults.hosts='4.2.2.1' '8.8.8.8' '80.67.169.12' '8.8.4.4' '9.9.9.9' '1.0.0.1' '114.114.115.115' '1.2.4.8' '80.67.169.40' '114.114.114.114' '1.1.1.1'
omr-tracker.defaults.hosts6='2606:4700:4700::1111' '2606:4700:4700::1001' '2620:fe::fe' '2620:fe::9' '2001:4860:4860::8888' '2001:4860:4860::8844'
omr-tracker.defaults.timeout='2'
omr-tracker.defaults.count='2'
omr-tracker.defaults.tries='3'
omr-tracker.defaults.interval='2'
omr-tracker.defaults.interval_tries='1'
omr-tracker.defaults.type='ping'
omr-tracker.defaults.wait_test='0'
omr-tracker.defaults.server_http_test='0'
omr-tracker.defaults.restart_down='0'
omr-tracker.defaults.mail_alert='0'
omr-tracker.defaults.initial_state='online'
omr-tracker.defaults.family='ipv4'
omr-tracker.defaults.reliability='1'
omr-tracker.defaults.failure_interval='5'
omr-tracker.defaults.tries_up='5'
omr-tracker.proxy=proxy
omr-tracker.proxy.enabled='1'
omr-tracker.proxy.hosts='212.27.48.10' '198.27.92.1' '151.101.129.164' '77.88.55.77' '1.1.1.1' '74.82.42.42' '198.41.212.162'
omr-tracker.proxy.timeout='10'
omr-tracker.proxy.tries='3'
omr-tracker.proxy.wait_test='0'
omr-tracker.proxy.interval_tries='1'
omr-tracker.proxy.interval='10'
omr-tracker.proxy.mail_alert='0'
omr-tracker.proxy.initial_state='online'
omr-tracker.proxy.family='ipv4ipv6'
omr-tracker.server=server
omr-tracker.server.enabled='1'
omr-tracker.server.tries='3'
omr-tracker.server.timeout='10'
omr-tracker.server.wait_test='0'
omr-tracker.server.interval='5'
omr-tracker.server.mail_alert='0'
omr-tracker.server.initial_state='online'
omr-tracker.omrvpn=interface
omr-tracker.omrvpn.type='none'
omr-tracker.omrvpn.timeout='10'
omr-tracker.omrvpn.tries='3'
omr-tracker.omrvpn.interval='5'
omr-tracker.omrvpn.mail_alert='0'
omr-tracker.omrvpn.enabled='1'
omr-tracker.omrvpn.server_http_test='1'
omr-tracker.omrvpn.restart_down='0'
omr-tracker.omrvpn.hosts='4.2.2.1' '8.8.8.8'
omr-tracker.omrvpn.initial_state='online'
omr-tracker.omrvpn.family='ipv4'
omr-tracker.omrvpn.reliability='1'
omr-tracker.omrvpn.count='1'
omr-tracker.omrvpn.failure_interval='5'
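When comparing two such dumps, it can help to pull out single options rather than read the whole list. A minimal sketch, assuming the dump is fed on standard input; `uci_value` is a hypothetical helper and is meant for single-valued options only, not lists such as `hosts`. On the router itself, `uci get` returns one option directly.

```shell
#!/bin/sh
# Sketch only: read "uci show" output on stdin and print the value of one key,
# with the surrounding single quotes removed.
# Returns 1 if the key is not present.
uci_value() {
    key=$1
    while IFS= read -r line; do
        case $line in
            "$key="*)
                value=${line#*=}     # drop "key="
                value=${value#\'}    # drop leading quote
                value=${value%\'}    # drop trailing quote
                printf '%s\n' "$value"
                return 0
                ;;
        esac
    done
    return 1
}
```

For the dump above, `uci show omr-tracker | uci_value omr-tracker.defaults.type` would print `ping`.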

I downloaded the new snapshot from 01/27/24 and left the default configuration it comes with. After 1 minute, the omr-vps reported the server as offline, but at no time did the tracker detect the lack of connection and deactivate any route.
Jan 27 15:06:22 OpenMPTCProuter daemon.info omr-tracker-xray: xray is down (can't contact via http 151.101.129.164, 77.88.55.77, 1.1.1.1)
Jan 27 15:06:30 OpenMPTCProuter user.notice OMR-VPS: Can't get vps token, try later
Jan 27 15:06:38 OpenMPTCProuter user.notice OMR-VPS: Can't get vps_us token, try later
Jan 27 15:06:38 OpenMPTCProuter user.notice xray: Rules DOWN

Switching to DNS this time, the omr-tracker also did not detect the internet outage, and after 1 minute, the omr-vps warned that the server was offline. I don't understand why it's not detecting correctly.
Jan 27 15:12:42 OpenMPTCProuter daemon.info omr-tracker-xray: xray is down (can't contact via http 74.82.42.42, 198.41.212.162, 212.27.48.10)
Jan 27 15:12:52 OpenMPTCProuter user.notice OMR-VPS: Can't get vps token, try later
Jan 27 15:13:02 OpenMPTCProuter user.notice OMR-VPS: Can't get vps_us token, try later
Jan 27 15:13:02 OpenMPTCProuter user.notice xray: Rules DOWN


This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days
