Possible vulnerability in OMR: bad packet attack from Internet IP crashes glorytun, brings down wan interface, omr-tracker fails to restart interface. #2956
Comments
This is a bigger problem than I thought. Right before wan2 goes down, I see this in the VPS log:
The timing matches up with the OMR event: Analysis:
This is the second time I have seen this event since I started watching the OMR logs very closely this week. I posted about it in this thread: Same sequence of events:
The VPS Glorytun TCP alerts with external IPs are not related to the router wan1 problem. I don't see a Glorytun TCP crash in your VPS log. As for OMR-Tracker, after a restart of the interface it should work, and it doesn't try to restart it again.
I think omr-tracker tries to restart the interface, something happens, and it doesn't restart completely. Maybe it didn't get an IP address or gateway, so the restart didn't work. Here is where I think the problem is:
but I think the wan did not have an IP at this time. It stayed this way for 7 hours, with no IP and no gateway. When I logged into the router 7 hours later I saw two things:
Does omr-tracker check that there is an IP and gateway after a restart, or does it just restart and go away? This is what I think happened: the restart didn't finish and left the interface in a bad state, but omr-tracker thinks it is up and doesn't check again for 7 hours. I log in, which triggers omr-tracker, and then omr-tracker sees that this wan does not have an IP and restarts it. This time it worked. Is there any way to check after a restart whether it was successful?
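The post-restart check asked about above could look roughly like the following. This is only a sketch, not omr-tracker's actual logic: it assumes the interface status is available as a dictionary shaped like the JSON that `ubus call network.interface.wan1 status` prints on OpenWrt (keys `up`, `ipv4-address`, `route`), and the function name `interface_is_healthy` is made up for illustration.

```python
def interface_is_healthy(status: dict) -> bool:
    """Return True only if the interface is up, has an IPv4 address
    and has a default route with a next hop (a gateway).

    `status` is assumed to mirror the JSON printed by
    `ubus call network.interface.<name> status` (an assumption).
    """
    if not status.get("up"):
        return False

    addresses = status.get("ipv4-address") or []
    routes = status.get("route") or []

    has_address = any(bool(a.get("address")) for a in addresses)
    # A default route is target 0.0.0.0 with mask 0 and a non-empty nexthop.
    has_default_route = any(
        r.get("target") == "0.0.0.0" and r.get("mask") == 0 and bool(r.get("nexthop"))
        for r in routes
    )
    return has_address and has_default_route
```

An interface that reports `up` but has no address or no gateway, which is the state described above, would be reported as unhealthy by this check.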
After the interface restart, omr-tracker still works but does nothing. After the down/up, the interface should get an IP via DHCP, and for an unknown reason that is not the case here. I can add a timeout after which the interface is restarted again, but I don't want a restart loop...
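One way to get a second restart after a timeout without an endless loop is to cap the number of attempts and lengthen the wait each time. The sketch below only illustrates that idea and is not how omr-tracker is implemented; `restart_until_healthy`, its parameters and the injected `restart` / `is_healthy` callables are all hypothetical names.

```python
import time
from typing import Callable


def restart_until_healthy(
    restart: Callable[[], None],
    is_healthy: Callable[[], bool],
    max_attempts: int = 3,
    settle_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Restart an interface at most `max_attempts` times.

    After each restart, wait for the interface to settle (the wait doubles
    on every attempt) and then check it. Returns True as soon as the check
    passes, or False once the attempts are used up, so it cannot loop forever.
    """
    for attempt in range(max_attempts):
        restart()
        sleep(settle_seconds * (2 ** attempt))
        if is_healthy():
            return True
    return False
```

With the defaults this gives up after three restarts (waiting 30, 60 and 120 seconds), leaving it to the caller to log the failure or escalate.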
Yes, a loop does not work well with OpenWrt. I already tried it with my aggregate script and it made the router very laggy. I found that adding a cronjob is a simple way to run a script on a timer in this environment, since cron is already working. Don't make any changes yet: I found the root cause of the issue because it just happened again tonight. Here is what happens:
Tonight it happened again. omr-tracker restarted Starlink at 19:39 and it didn't recover until 20:44, so eth1 was down for 1 hour. Here is the problem:
Now every 2 minutes it will
If omr-tracker is triggered, it sees the local IP and no gateway and restarts the interface, but udhcpc always grabs the 192.168 IP again:
Finally it is fixed here 1 hour later:
Solutions
It seems that it's fixed in BusyBox git. I will check whether it's fixed in the versions used or whether I need to update them.
I looked for this yesterday but I couldn't find it. Do you have the git link?
It could be Starlink updating. I thought about that too, but it happened two days in a row with different behavior from OMR. On the first day omr-tracker only ran once and then not again for 7 hours, so it thought everything was good. On the second day it detected there was no gateway and tried many times to restart the interface.
Do you mean doing this to test different settings? It might be hard because I don't know how to simulate this error.
I am using the router but I have it in bypass mode so I can use it with OMR. When you say you use the provided router, do you mean you are using the router's wifi capabilities? If you connect OMR to Starlink that way, won't you have double NAT? It seems bypass mode is the best configuration for using Starlink with OMR. Do you mean you will test with the router in bypass mode? @Network-Traditions I thought I saw you mention this problem in another issue around 1 year ago but I can't find it now. Did you find a workaround?
@ioogithub we had and still have an issue, #2793, where our T-Mobile Business Cellular Internet WAN controlled by MODEMMANAGER regularly (once per day on average) stops receiving packets. Executing a restart from Luci generally fails to bring the interface back online, which is also the case with OMR-Tracker. We developed a bash script to reset the USB interface, which succeeds in bringing the WAN connection back online 9 out of 10 times. When it doesn't, a full power cycle of the modem or a reboot of OMR generally resolves the issue. We are hoping this behavior will resolve with a Debian 12 VPS running on an Oracle Ampere ARM64 instance using OMR v6.1 with upstream MPTCP. Finally, our current modem has the flawed chipset, which prohibits operating in "SA" mode on T-Mobile's network. I expect significant performance improvements, potentially including the elimination of this issue, when we upgrade (soon) to a T-Mobile certified modem that supports "SA" mode.
Sounds like a different issue. This one involves Starlink giving a temporary 192.168.x.x address after omr-tracker restarts it, and udhcpc holding on to the temporary lease instead of renewing the proper lease. I thought for sure I saw you link to this issue before, but it was over a year ago, sorry.
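A watchdog script (for example the cron-driven one mentioned earlier in the thread) could flag this state by testing whether the WAN address falls inside 192.168.0.0/16. The following is a minimal sketch under the assumption that the proper lease on this WAN is never in that range; `looks_like_temporary_lease` is a hypothetical helper name, and the sample addresses in the usage line are invented.

```python
import ipaddress

# Range the temporary lease was observed in (192.168.x.x). Treating the whole
# /16 as "temporary" is an assumption that only holds if the real lease is
# never handed out from this range.
TEMPORARY_RANGE = ipaddress.ip_network("192.168.0.0/16")


def looks_like_temporary_lease(address: str) -> bool:
    """Return True if `address` is inside the 192.168.0.0/16 range."""
    return ipaddress.ip_address(address) in TEMPORARY_RANGE
```

For example, `looks_like_temporary_lease("192.168.1.20")` is true while `looks_like_temporary_lease("100.64.1.2")` is false, so the script could force a DHCP release and renew only in the first case.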
For busybox, it's here: https://git.busybox.net/busybox/tree/networking/udhcp/dhcpc.c#n1740
Thanks, I will read it and try to understand. Busybox is a single binary, isn't it? If we wanted to update OMR, is it as easy as installing the new binary on the router and testing it? If you make a decision, I am available to test it.
I am looking at the OMR web interface. What type of setting could be used to mitigate this issue, for example? The same issue happened again 1 hour ago: Sep 14 10:16:31 vps glorytun-tcp-run[683]: read: Connection timed out and on vps:
I know you said this is just a scanner, but every time after the scanner sends the bad packet, eth2 goes down.
For OMR-Tracker, as I said, you can increase the timeout and the number of tries. For Glorytun-TCP, you see the problem on the VPS because wan2 has already stopped answering: that is the "connection timed out". Since there is no active connection, glorytun-tcp answers every incoming connection, which here is the bot with the 35.203.* IP. Then wan2 reconnects but seems to still lose many packets, so omr-tracker detects it as down and restarts the interface.
Okay, I understand this: glorytun is timing out before the bad packet, and that is the start of the event. Glorytun is just reacting to the bot scan afterwards.
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days.
OMR should be able to automatically recover and restart an interface after it brings it down.
Current Behavior
omr-tracker brings down an interface and tries to restart it, but it doesn't restart fully. omr-tracker is fooled into thinking the interface is online and doesn't try again to bring it online until the user manually intervenes.
Possible Solution
I have identified an edge case where omr-tracker is not triggered to bring the interface back online. The script will need to be modified to address this edge case.
Steps to Reproduce the Problem
Context (Environment)
It's a problem because it makes OMR less reliable. In this case the router was not able to recover automatically.
Specifications
Here is the log:
Analysis:
omr-tracker brings the interface down:
Mon Sep 11 03:31:43 2023 user.notice post-tracking-post-tracking: wan1 (eth1) switched off because check error and ping from wan1ip error (9.9.9.9,1.0.0.1,114.114.115.115)
This command is failing. How can I find out which command it is and how to fix it?
Mon Sep 11 03:31:45 2023 daemon.notice netifd: wan1 (6224): Command failed: Permission denied
The interface starts to come up but never finishes, so it is actually still down:
It somehow fools omr-tracker, or omr-tracker doesn't get triggered again for 8 hours, and the interface stays down:
Questions:
Is the `Command failed: Permission denied` netifd error contributing to this? Is it causing the problem?
wan1 never finished coming back up, but OMR thinks it's back online and doesn't try to restart it. Why doesn't omr-tracker run again at any time between 4am and 1pm to try to bring wan1 back online? What can be done to fix this?
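To line up timestamps like the ones quoted in the analysis, the "switched off" events can be pulled out of a saved system log with a small script. This is only a sketch: the regular expression is written against the single `post-tracking` line quoted above and is an assumption about the general log format, and `find_switch_off_events` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Matches lines such as:
# "Mon Sep 11 03:31:43 2023 user.notice post-tracking-post-tracking: wan1 (eth1) switched off because ..."
SWITCHED_OFF = re.compile(
    r"^(?P<ts>\w{3} \w{3} +\d+ \d{2}:\d{2}:\d{2} \d{4}) "  # timestamp
    r"\S+ \S+: "                                            # facility.level and tag
    r"(?P<iface>\S+) \((?P<dev>\S+)\) switched off"         # logical interface and device
)


def find_switch_off_events(lines):
    """Return (timestamp, interface, device) for every 'switched off' line."""
    events = []
    for line in lines:
        match = SWITCHED_OFF.match(line)
        if match:
            ts = datetime.strptime(match.group("ts"), "%a %b %d %H:%M:%S %Y")
            events.append((ts, match.group("iface"), match.group("dev")))
    return events
```

Comparing the returned timestamps with the time the interface finally got an address and gateway again shows how long each outage lasted.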