Subflows are not recreated in the WAN after an internet outage. #3122
Comments
I think I found the issue: this event is occurring too early. In the same second that the interface is detected as UP, MPTCP is already reloaded, and perhaps the internet connection has not had enough time to become fully active, so the subflows are not recreated. The system reports these events when I enter the command above. How can I increase the time the tracker waits before triggering the 'multipath on' command in this case?
I solved the problem by adding this configuration. Now, whenever WAN3 loses and regains internet connectivity, the log displays this. Even though I set a sleep of 15s, the log always shows that it takes about 60s to execute the command. However, all subflows are now consistently recreated correctly.
If this IP-change event occurs right after the WAN comes up, the script does not execute the 'multipath eth3 on; multipath eth3 off' command. Could you tell me why? Is there anything I need to change in the script?
After more tests over the weekend, I came up with a script that, when executed by OMR, always recreates broken connections when an interface comes back online. In case someone is facing the same issue or wants to ensure that established connections with broken subflows are recreated, this command for each WAN solved my problem:
`uci set omr-tracker.wan1=interface`
A script needs to be created for each WAN, named after that WAN in the OMR network interfaces.
What I noticed is that multipath only recreates broken subflows when it is switched from off to on. If it is already on, it won't recreate them, and if it is switched from off to on too early, before the WAN has connectivity, it fails to recreate them and doesn't try again.
If you are using OMR with the redundant scheduler for resilience instead of aggregation and want to ensure that your subflows are always recreated and healthy, adding this entry to cron ensures that:
`*/1 * * * * multipath eth1 off ; sleep 1 ; multipath eth1 on ; multipath eth2 off ; sleep 1 ; multipath eth2 on ; multipath eth3 off ; sleep 1 ; multipath eth3 on ; multipath eth4 off ; sleep 1 ; multipath eth4 on`
The interval is at your discretion, depending on how often you want the subflows to be recreated if necessary. I set it to 1 minute, but you can use any value you find suitable.
I tested it all weekend while gaming and monitoring my latency, and it didn't cause any issues. On the contrary, it made my connection even more stable.
Unfortunately, I couldn't test the new snapshots with the new OMR tracker because none of the ones I tried could even connect to the VPS. I will test again as new ones are compiled.
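The observation in this comment (subflows are only recreated on an off-to-on switch, and only if the WAN already has connectivity) can be sketched as a small POSIX shell helper. This is an illustration, not the script used in the thread: the function names, `PROBE_IP`, and `MAX_TRIES` are assumptions, and the probe relies on BusyBox-style `ping -c/-W/-I` options. Only the `multipath <device> off|on` calls come from the comment itself.

```shell
#!/bin/sh
# Sketch only: wait until a WAN device has connectivity, then switch
# multipath off and back on so that broken subflows get recreated.
# PROBE_IP, MAX_TRIES and both function names are hypothetical.

PROBE_IP="${PROBE_IP:-1.1.1.1}"   # assumed probe target
MAX_TRIES="${MAX_TRIES:-30}"      # stop probing after this many attempts

# Return 0 as soon as one ping through the given device succeeds,
# or 1 if MAX_TRIES probes all fail.
wait_for_connectivity() {
    dev="$1"
    tries=0
    while [ "$tries" -lt "$MAX_TRIES" ]; do
        if ping -c 1 -W 2 -I "$dev" "$PROBE_IP" >/dev/null 2>&1; then
            return 0
        fi
        tries=$((tries + 1))
        sleep 1
    done
    return 1
}

# Toggle multipath off -> on for one device, but only once it is reachable.
recreate_subflows() {
    dev="$1"
    wait_for_connectivity "$dev" || return 1
    multipath "$dev" off
    sleep 1
    multipath "$dev" on
}
```

Calling `recreate_subflows eth3` would then replace the unconditional off/on pair for that device; a loop such as `for d in eth1 eth2 eth3 eth4; do recreate_subflows "$d"; done` covers the same devices as the cron entry, skipping any WAN that is still offline.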
The latest release should fix Shadowsocks-libev.
I tried today's latest snapshot, but it's not connecting. It gives this error in the log.
I didn't test on 5.4, only on 6.1, where there are major changes. I will try on 5.4; maybe I have a wrong dependency...
The WAN detection is now fixed; a new image is compiling.
I fixed the OMR-Tracker delay, and XRay should be back (I will test).
Xray is still not working.
Using today's snapshot from 27/01/2024 with the OMR tracker in ping mode, I don't know why, but it didn't detect any internet outage on any WAN at any time. However, in DNS mode, it perfectly detected the lack of internet and removed the route; it worked very well. As for Xray, it continues to give an invalid file error and cannot be used.
I'm not able to reproduce the issue with OMR-Tracker. Can you give me the result of
root@OpenMPTCProuter:~# uci show omr-tracker
I downloaded the new snapshot from 01/27/24 and left the default configuration it comes with. After 1 minute, omr-vps reported the server as offline, but at no time did the tracker detect the lack of connection and deactivate any route. After switching to DNS mode, omr-tracker again did not detect the internet outage, and after 1 minute, omr-vps warned that the server was offline. I don't understand why it's not detecting correctly.
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days |
Expected Behavior
After the internet is restored, the subflows of that WAN will all be recreated, and traffic resumes flowing.
Current Behavior
After the internet is restored, the subflows of currently active connections are never restored again.
Specifications
I have been monitoring this issue for some time; whenever a WAN loses internet connectivity, upon its restoration, the active connections do not recreate the subflow as they should.
However, upon conducting some tests, I observed that when stopping the interface and then restarting it after the internet has returned, the subflows of all the WANs that were previously inactive are recreated, including those from other WANs.
Here are some screenshots.
![image](https://private-user-images.githubusercontent.com/132581535/296388876-1980a369-2548-46c2-bbc2-07c1fe437979.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjEzNTQxMzksIm5iZiI6MTcyMTM1MzgzOSwicGF0aCI6Ii8xMzI1ODE1MzUvMjk2Mzg4ODc2LTE5ODBhMzY5LTI1NDgtNDZjMi1iYmMyLTA3YzFmZTQzNzk3OS5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzE5JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcxOVQwMTUwMzlaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1mNzVmOWQ2Y2U3NTYzYTdmOGM3NjE5MzcyN2JhNDY5ZTA3OTMwZmY3YzI2ZjU1YThkNWY1OTM2N2I2ZTA3NGJkJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.ka3iyhiCUl_GPoaauyIEJPTE8bfPhi-p6wOf8LitrsI)
![image](https://private-user-images.githubusercontent.com/132581535/296388916-2fe2e419-c9b8-4725-9f18-6c1b6b527527.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjEzNTQxMzksIm5iZiI6MTcyMTM1MzgzOSwicGF0aCI6Ii8xMzI1ODE1MzUvMjk2Mzg4OTE2LTJmZTJlNDE5LWM5YjgtNDcyNS05ZjE4LTZjMWI2YjUyNzUyNy5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzE5JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcxOVQwMTUwMzlaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT04NmJhYWJjMmFiYWI0YWQzNTQ2ZWJjN2E5ZDE2YjdiMWJiZGQ3YTQyY2Q4NTBiYzc2MGNlNzhlZTgzNjVlMGU2JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.RLNMiqxQHYMo9PVMvrcj7V4-6F42GQmieI2-b0-0LZ8)
![image](https://private-user-images.githubusercontent.com/132581535/296388941-8ecb04f0-09d8-4ff5-b56e-47580bd78be1.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjEzNTQxMzksIm5iZiI6MTcyMTM1MzgzOSwicGF0aCI6Ii8xMzI1ODE1MzUvMjk2Mzg4OTQxLThlY2IwNGYwLTA5ZDgtNGZmNS1iNTZlLTQ3NTgwYmQ3OGJlMS5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzE5JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcxOVQwMTUwMzlaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT0xOTc2NGRmMzk0NjUwODAwNWE2ZDZlMmZiMmY3YjM0ZjE3M2RiNWQ5ZjNmNGE5OGE2YTc3Zjk1MmZhOWZkMjFhJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.VucuV71-98M4GAdeac22bF0lZZYbShKW7a0Az8fZIHI)
![image](https://private-user-images.githubusercontent.com/132581535/296388981-7fae9306-6b16-4872-8511-62c80d168ada.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjEzNTQxMzksIm5iZiI6MTcyMTM1MzgzOSwicGF0aCI6Ii8xMzI1ODE1MzUvMjk2Mzg4OTgxLTdmYWU5MzA2LTZiMTYtNDg3Mi04NTExLTYyYzgwZDE2OGFkYS5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzE5JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcxOVQwMTUwMzlaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT03ZjcxNjU2ODczNDUxYmI3MjU2Y2QyNDYwZjBiNTRkODA1NDRmY2FiZmY1YTNmNzAzNjBlZTMyZjcyNzIwNGYxJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.vHUHNNGsToN5Dr8M4FClRaqHcR7fWH-0Wy5OPm35L1A)
As you can see, in this test, I disconnected the internet from WAN1 and WAN3, waited for some time, and then reconnected the internet. However, no data started flowing through them again.
But as soon as I went to the Interfaces page, stopped WAN1, and restarted it, the flow immediately resumed for both WAN1 and WAN3.
Apparently, stopping and restarting an interface triggers some mechanism that recreates its subflows, or creates new ones if the connection was established during the internet outage.
In OMR, there is an option to restart the interface if it is down. Is there a way to stop and restart an interface after a certain period of time once it has become active again?
This could even be a script triggered by an event like 'Jan 12 22:53:42 OpenMPTCProuter user.notice post-tracking-001-post-tracking: wan3 (eth3) switched up', which waits a few seconds and then executes commands to stop and restart that interface.
I would like to test if it's possible to automate this solution, but I lack the knowledge to create a script tied to an event like this. That's why I seek your assistance.
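As a starting point for the automation asked about above, here is a minimal sketch that recognises the quoted "switched up" log line and extracts the interface and device names from it. The function name `switched_up` and the 15-second delay in the commented usage are assumptions for illustration. The log format is taken only from the single line quoted in this issue, so it may not match other OMR versions.

```shell
#!/bin/sh
# Sketch only: parse a tracker log line of the form
#   "... post-tracking-001-post-tracking: wan3 (eth3) switched up"
# and print "<interface> <device>" (e.g. "wan3 eth3").
# Prints nothing for any other line. The function name is hypothetical.
switched_up() {
    printf '%s\n' "$1" |
        sed -n 's/.*post-tracking: \([^ ]*\) (\([^)]*\)) switched up.*/\1 \2/p'
}

# Possible usage on the router (left commented out; the delay is an assumption):
# logread -f | while read -r line; do
#     set -- $(switched_up "$line")
#     [ $# -eq 2 ] || continue
#     sleep 15                   # give the WAN time to become fully active
#     ifdown "$1" && ifup "$1"   # stop and restart the logical interface
# done
```

Following the log is only one way to hook into the event. It was chosen here because the log line is the one piece of the event that the issue shows verbatim.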