Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible vunerability in OMR: bad packet attack from Internet IP crashes glorytun, brings down wan interface, omr-tracker fails to restart interface. #2956

Closed
ioogithub opened this issue Sep 11, 2023 · 15 comments

Comments

@ioogithub
Copy link

OMR will be able to automatically recover and restart after an interface after it brings it down.

Current Behavior

omr-tracker brings down an interface, tries to restart it, it doesn't restart fully, omr tracker is fooled into thinking it is online and doesn't try any more to bring it online until the user manually intervenes.

Possible Solution

I have identified an edge case where omr-tracker is not triggered to bring the interface back online. The script will need to be modified to address this edge case.

Steps to Reproduce the Problem

  1. let omr-tracker bring an interface down
  2. the interface doesn't completely come back up but starts to come back up enough to fool omr-tracker into thinking it is up.

Context (Environment)

Its a problem because it makes omr less reliable. In this case the router was not able to recover automatically.

Specifications

- OpenMPTCProuter version: v0.59.1-5.4 r0+16594-ce92de8c8c
- OpenMPTCProuter VPS version: VPS 0.1028
- OpenMPTCProuter VPS provider:
- OpenMPTCProuter platform: x86_64)

Here is the log:

Mon Sep 11 03:31:43 2023 user.notice post-tracking-post-tracking: wan1 (eth1) switched off because check error and ping from wan1ip error (9.9.9.9,1.0.0.1,114.114.115.115)
Mon Sep 11 03:31:43 2023 user.notice post-tracking-post-tracking: Delete default route to vpsip via wan1gateway dev eth1
Mon Sep 11 03:31:45 2023 user.notice post-tracking-post-tracking: Restart wan1
Mon Sep 11 03:31:45 2023 daemon.notice netifd: wan1 (6224): udhcpc: received SIGTERM
Mon Sep 11 03:31:45 2023 daemon.notice netifd: wan1 (6224): udhcpc: unicasting a release of wan1ip to wan1gateway
Mon Sep 11 03:31:45 2023 daemon.notice netifd: wan1 (6224): udhcpc: sending release
Mon Sep 11 03:31:45 2023 daemon.notice netifd: wan1 (6224): udhcpc: entering released state
Mon Sep 11 03:31:45 2023 daemon.notice netifd: wan1 (6224): Command failed: Permission denied
Mon Sep 11 03:31:45 2023 daemon.notice netifd: Interface 'wan1' is now down
Mon Sep 11 03:31:45 2023 daemon.err glorytun[25454]: vpsip.65001: connected
Mon Sep 11 03:31:45 2023 daemon.notice netifd: Interface 'wan1' is disabled
Mon Sep 11 03:31:45 2023 daemon.notice netifd: Network device 'eth1' link is down
Mon Sep 11 03:31:45 2023 daemon.notice netifd: Interface 'wan1' has link connectivity loss
Mon Sep 11 03:31:45 2023 daemon.info glorytun[25454]: STARTED tun0
Mon Sep 11 03:31:45 2023 kern.err kernel: [116385.003584] __mptcp_init4_subsockets: token 0xc793d456 bind() to wan1ip index 11 failed, error -99
Mon Sep 11 03:31:45 2023 kern.err kernel: [116385.004476] __mptcp_init4_subsockets: token 0xc793d456 bind() to wan1ip index 11 failed, error -99
Mon Sep 11 03:31:45 2023 kern.err kernel: [116385.129391] __mptcp_init4_subsockets: token 0x985a511 bind() to wan1ip index 11 failed, error -99
Mon Sep 11 03:31:45 2023 kern.err kernel: [116385.129427] __mptcp_init4_subsockets: token 0xa5cf704 bind() to wan1ip index 11 failed, error -99
Mon Sep 11 03:31:45 2023 kern.err kernel: [116385.129507] __mptcp_init4_subsockets: token 0xde9d1ca1 bind() to wan1ip index 11 failed, error -99
Mon Sep 11 03:31:45 2023 kern.err kernel: [116385.129511] __mptcp_init4_subsockets: token 0xde9d1ca1 bind() to wan1ip index 11 failed, error -99
Mon Sep 11 03:31:45 2023 kern.err kernel: [116385.129671] __mptcp_init4_subsockets: token 0x2d072377 bind() to wan1ip index 11 failed, error -99
Mon Sep 11 03:31:45 2023 kern.err kernel: [116385.129674] __mptcp_init4_subsockets: token 0x2d072377 bind() to wan1ip index 11 failed, error -99
Mon Sep 11 03:31:45 2023 kern.err kernel: [116385.130408] __mptcp_init4_subsockets: token 0x985a511 bind() to wan1ip index 11 failed, error -99
Mon Sep 11 03:31:45 2023 kern.err kernel: [116385.131366] __mptcp_init4_subsockets: token 0xa5cf704 bind() to wan1ip index 11 failed, error -99
Mon Sep 11 03:31:45 2023 user.notice MPTCP: Add route wan2subnet/24 via wan2gateway dev eth2
Mon Sep 11 03:31:45 2023 user.notice MPTCP: Flush route cache
Mon Sep 11 03:31:46 2023 user.notice MPTCP: Add route wan2subnet/24 via wan2gateway dev eth2
Mon Sep 11 03:31:46 2023 user.notice MPTCP: Flush route cache
Mon Sep 11 03:31:48 2023 daemon.info omr-tracker-v2ray: V2Ray is up (can contact via http 198.41.212.162)
Mon Sep 11 03:31:49 2023 user.notice post-tracking-post-tracking: Tunnel up : Replace default route by 10.255.255.1 dev tun0
Mon Sep 11 03:31:50 2023 daemon.notice netifd: Interface 'wan1' is enabled
Mon Sep 11 03:31:50 2023 daemon.notice netifd: Interface 'wan1' is setting up now
Mon Sep 11 03:31:50 2023 daemon.notice netifd: wan1 (28456): udhcpc: started, v1.33.2
Mon Sep 11 03:31:50 2023 daemon.notice netifd: wan1 (28456): udhcpc: sending discover
Mon Sep 11 03:31:51 2023 user.notice MPTCP: Add route wan2subnet/24 via wan2gateway dev eth2
Mon Sep 11 03:31:51 2023 user.notice OMR-VPS: Can't get vps token, try later
Mon Sep 11 03:31:51 2023 user.notice v2ray: Rules UP
Mon Sep 11 03:31:51 2023 user.notice v2ray: v2ray-rules -l 1897 -L 1897 -s vpsip --rule-name def --src-default forward --dst-default forward --local-default forward
Mon Sep 11 03:31:51 2023 user.notice v2ray: Reload omr-bypass rules
Mon Sep 11 03:31:51 2023 user.notice omr-bypass: Starting OMR-ByPass...
Mon Sep 11 03:31:52 2023 user.notice omr-bypass: OMR-ByPass is running
Mon Sep 11 03:31:53 2023 daemon.notice netifd: wan1 (28456): udhcpc: sending discover
Mon Sep 11 03:31:56 2023 daemon.notice netifd: wan1 (28456): udhcpc: sending discover
Mon Sep 11 03:45:08 2023 daemon.info v2ray: DNS disabled: main_dns
Mon Sep 11 03:45:08 2023 daemon.info v2ray: Setting transparent proxy on port: 1897
Mon Sep 11 03:45:08 2023 daemon.info v2ray: Transparent proxy mode: default
Mon Sep 11 03:45:08 2023 user.notice v2ray: add rules
Mon Sep 11 03:45:08 2023 user.notice v2ray: v2ray-rules -l 1897 -L 1897 -s vpsip --rule-name def --src-default forward --dst-default forward --local-default forward
Mon Sep 11 03:45:08 2023 user.notice v2ray: Reload omr-bypass rules
Mon Sep 11 03:45:08 2023 user.notice omr-bypass: Starting OMR-ByPass...
Mon Sep 11 03:45:08 2023 user.notice omr-bypass: OMR-ByPass is running
Mon Sep 11 03:45:08 2023 daemon.info v2ray[23775]: V2Ray 4.43.0 (V2Fly, a community-driven edition of V2Ray.) OpenWrt R1 (go1.17.3 linux/amd64)
Mon Sep 11 03:45:08 2023 daemon.info v2ray[23775]: A unified platform for anti-censorship.
Mon Sep 11 03:45:08 2023 daemon.info v2ray[23775]: 2023/09/11 07:45:08 [Info] main/jsonem: Reading config: /var/etc/v2ray/v2ray.main.json
Mon Sep 11 04:00:08 2023 daemon.info v2ray: DNS disabled: main_dns
Mon Sep 11 04:00:08 2023 daemon.info v2ray: Setting transparent proxy on port: 1897
Mon Sep 11 04:00:08 2023 daemon.info v2ray: Transparent proxy mode: default
Mon Sep 11 04:00:08 2023 user.notice v2ray: add rules
Mon Sep 11 04:00:08 2023 user.notice v2ray: v2ray-rules -l 1897 -L 1897 -s vpsip --rule-name def --src-default forward --dst-default forward --local-default forward
Mon Sep 11 04:00:08 2023 user.notice v2ray: Reload omr-bypass rules
Mon Sep 11 04:00:08 2023 user.notice omr-bypass: Starting OMR-ByPass...
Mon Sep 11 04:00:08 2023 user.notice omr-bypass: OMR-ByPass is running

.... interface is down for 7 hours: eth1: 0 kB/s


Mon Sep 11 13:01:11 2023 authpriv.notice dropbear[25125]: Pubkey auth succeeded for 'root' with key sha1!! aaasdasdfasfasdfasdfas from lanip:55914
Mon Sep 11 13:02:12 2023 user.info : luci: accepted login on / for root from lanip
Mon Sep 11 13:02:40 2023 user.notice omr-bypass: Starting OMR-ByPass...
Mon Sep 11 13:02:40 2023 user.notice omr-bypass: OMR-ByPass is running
Mon Sep 11 13:02:40 2023 user.notice MPTCP: Flush route cache
Mon Sep 11 13:02:50 2023 user.notice MPTCP: Add route wan2subnet/24 via wan2gateway dev eth2
Mon Sep 11 13:02:50 2023 daemon.info glorytun: starting glorytun vpn instance vpn
Mon Sep 11 13:02:50 2023 user.notice omr-tracker: Launching...
Mon Sep 11 13:02:57 2023 user.notice omr-tracker: Launched
Mon Sep 11 13:02:57 2023 user.notice post-tracking-post-tracking: wan1 (eth1) switched off because wan1 may have ip issues, interface have no IPv4, interface have no IPv4 gateway
Mon Sep 11 13:02:57 2023 user.notice post-tracking-post-tracking: Delete default route to vpsip via  dev eth1
Mon Sep 11 13:02:57 2023 user.notice omr-bypass: Starting OMR-ByPass...
Mon Sep 11 13:02:58 2023 user.notice omr-bypass: OMR-ByPass is running
Mon Sep 11 13:02:58 2023 daemon.info omr-tracker-v2ray: V2Ray is up (can contact via http 1.0.0.1)
Mon Sep 11 13:02:59 2023 user.notice post-tracking-post-tracking: Restart wan1
Mon Sep 11 13:02:59 2023 daemon.notice netifd: wan1 (28456): udhcpc: received SIGTERM
Mon Sep 11 13:02:59 2023 daemon.notice netifd: wan1 (28456): udhcpc: entering released state
Mon Sep 11 13:02:59 2023 daemon.notice netifd: wan1 (28456): Command failed: Permission denied
Mon Sep 11 13:02:59 2023 daemon.notice netifd: Interface 'wan1' is now down
Mon Sep 11 13:02:59 2023 daemon.notice netifd: Interface 'wan1' is disabled
Mon Sep 11 13:03:04 2023 daemon.notice netifd: Interface 'wan1' is enabled
Mon Sep 11 13:03:04 2023 daemon.notice netifd: Interface 'wan1' is setting up now
Mon Sep 11 13:03:04 2023 daemon.notice netifd: wan1 (11295): udhcpc: started, v1.33.2
Mon Sep 11 13:03:04 2023 daemon.notice netifd: wan1 (11295): udhcpc: sending discover
Mon Sep 11 13:03:05 2023 user.notice MPTCP: Flush route cache
Mon Sep 11 13:03:06 2023 kern.info kernel: [150665.207647] igc 0000:03:00.0 eth1: igc: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Mon Sep 11 13:03:07 2023 daemon.notice netifd: Network device 'eth1' link is up
Mon Sep 11 13:03:07 2023 daemon.notice netifd: Interface 'wan1' has link connectivity
Mon Sep 11 13:03:07 2023 daemon.notice netifd: wan1 (11295): udhcpc: sending discover
Mon Sep 11 13:03:07 2023 daemon.notice netifd: wan1 (11295): udhcpc: sending select for wan1ip
Mon Sep 11 13:03:07 2023 daemon.notice netifd: wan1 (11295): udhcpc: lease of wan1ip obtained, lease time 300
Mon Sep 11 13:03:07 2023 daemon.notice netifd: Interface 'wan1' is now up
Mon Sep 11 13:03:07 2023 user.notice firewall: Reloading firewall due to ifup of wan1 (eth1)
Mon Sep 11 13:03:07 2023 user.notice firewall.omr-server: Firewall reload, set server part firewall reloading
Mon Sep 11 13:03:08 2023 daemon.notice netifd: Network device 'eth1' link is down
Mon Sep 11 13:03:08 2023 daemon.notice netifd: Interface 'wan1' has link connectivity loss
Mon Sep 11 13:03:08 2023 user.notice post-tracking-post-tracking: Set firewall on server vpsLuna
Mon Sep 11 13:03:14 2023 daemon.notice netifd: Network device 'eth1' link is up
Mon Sep 11 13:03:14 2023 daemon.notice netifd: Interface 'wan1' has link connectivity
Mon Sep 11 13:03:14 2023 kern.info kernel: [150672.707356] igc 0000:03:00.0 eth1: igc: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Mon Sep 11 13:03:16 2023 user.notice MPTCP: Add route wan2subnet/24 via wan2gateway dev eth2
Mon Sep 11 13:03:17 2023 user.notice mptcp: Reloading mptcp config due to ifup of wan1 (eth1)
Mon Sep 11 13:03:17 2023 daemon.info glorytun: starting glorytun vpn instance vpn
Mon Sep 11 13:03:17 2023 user.notice omr-tracker: Launching...
Mon Sep 11 13:03:17 2023 user.notice MPTCP: Add route wan2subnet/24 via wan2gateway dev eth2
Mon Sep 11 13:03:24 2023 user.notice omr-tracker: Launched
Mon Sep 11 13:03:24 2023 user.notice MPTCP: Flush route cache
Mon Sep 11 13:03:24 2023 user.notice MPTCP: Add route wan2subnet/24 via wan2gateway dev eth2
Mon Sep 11 13:03:24 2023 daemon.info omr-tracker-v2ray: V2Ray is up (can contact via http 1.0.0.1)
Mon Sep 11 13:05:37 2023 daemon.notice netifd: wan1 (11295): udhcpc: sending renew to wan1gateway
Mon Sep 11 13:05:37 2023 daemon.notice netifd: wan1 (11295): udhcpc: lease of wan1ip obtained, lease time 300

Analysis:

  • omr-tracker beings the interface down:
    Mon Sep 11 03:31:43 2023 user.notice post-tracking-post-tracking: wan1 (eth1) switched off because check error and ping from wan1ip error (9.9.9.9,1.0.0.1,114.114.115.115)

  • this command is failing, how to find out what command it is and how to fix it?
    Mon Sep 11 03:31:45 2023 daemon.notice netifd: wan1 (6224): Command failed: Permission denied

  • interface starts to come up but never finished, it is actually still down:

  • it somehow fools omr-tracker or omr-tracker doesn't get triggered again for 8 hours and the interface stays down:

Mon Sep 11 03:31:50 2023 daemon.notice netifd: Interface 'wan1' is enabled
Mon Sep 11 03:31:50 2023 daemon.notice netifd: Interface 'wan1' is setting up now
Mon Sep 11 03:31:50 2023 daemon.notice netifd: wan1 (28456): udhcpc: started, v1.33.2
Mon Sep 11 03:31:50 2023 daemon.notice netifd: wan1 (28456): udhcpc: sending discover
Mon Sep 11 03:31:51 2023 user.notice MPTCP: Add route wan2subnet/24 via wan2gateway dev eth2
  • Finally omr-tracker comes back online and recognizes there is a problem but only after i ssh into the router and only after 8 hours of wan1 being down:
Mon Sep 11 13:02:57 2023 user.notice post-tracking-post-tracking: wan1 (eth1) switched off because wan1 may have ip issues, interface have no IPv4, interface have no IPv4 gateway
Mon Sep 11 13:02:57 2023 user.notice post-tracking-post-tracking: Delete default route to vpsip via  dev eth1

Questions:

  • is the `Command failed: Permission denied' netif error contributing to this, is it causing the problem?

  • wan1 never finished coming back up but omr thinks its back online and doesn't try to restart it. why doesn't omr-tracker run again anytime between 4am and 1pm to try again to bring wan1 back online? What can be done to fix this?

@ioogithub ioogithub added the bug label Sep 11, 2023
@ioogithub
Copy link
Author

ioogithub commented Sep 11, 2023

This is a bigger problem that I thought. Right before wan2 goes down, I see this in the VPS log:

Sep 11 03:31:14 vps glorytun-tcp-run[21201]: read: Connection timed out
Sep 11 03:31:15 vps glorytun-tcp-run[21201]: ::ffff:35.203.211.161.64038: connected
Sep 11 03:31:15 vps glorytun-tcp-run[21201]: bad packet [280B0E1C280B0E1D280B0E1E] !
Sep 11 03:31:15 vps glorytun-tcp-run[21201]: ::ffff:35.203.211.161.64038: key exchange failed
Sep 11 03:31:15 vps glorytun-tcp-run[21201]: getpeername: Transport endpoint is not connected
Sep 11 03:31:16 vps glorytun-tcp-run[21201]: ::ffff:87.236.176.10.44157: connected
Sep 11 03:31:16 vps glorytun-tcp-run[21201]: bad packet [280B0E1C280B0E1D280B0E1E] !
Sep 11 03:31:16 vps glorytun-tcp-run[21201]: ::ffff:87.236.176.10.44157: key exchange failed
Sep 11 03:31:16 vps glorytun-tcp-run[21201]: ::ffff:87.236.176.9.59165: connected
Sep 11 03:31:16 vps glorytun-tcp-run[21201]: ::ffff:87.236.176.9.59165: key exchange failed
Sep 11 03:31:17 vps glorytun-tcp-run[21201]: ::ffff:87.236.176.36.58253: connected
Sep 11 03:31:17 vps glorytun-tcp-run[21201]: ::ffff:87.236.176.36.58253: key exchange failed
Sep 11 03:31:17 vps glorytun-tcp-run[21201]: ::ffff:87.236.176.24.58505: connected
Sep 11 03:31:17 vps glorytun-tcp-run[21201]: bad packet [280B0E1C280B0E1D280B0E1E] !
Sep 11 03:31:17 vps glorytun-tcp-run[21201]: ::ffff:87.236.176.24.58505: key exchange failed
Sep 11 03:31:18 vps glorytun-tcp-run[21201]: ::ffff:87.236.176.6.60863: connected
Sep 11 03:31:18 vps glorytun-tcp-run[21201]: ::ffff:87.236.176.6.60863: key exchange failed
Sep 11 03:31:18 vps glorytun-tcp-run[21201]: ::ffff:87.236.176.5.56273: connected
Sep 11 03:31:18 vps glorytun-tcp-run[21201]: ::ffff:87.236.176.5.56273: key exchange failed
Sep 11 03:31:19 vps glorytun-tcp-run[21201]: ::ffff:87.236.176.14.53449: connected
Sep 11 03:31:19 vps glorytun-tcp-run[21201]: ::ffff:87.236.176.14.53449: key exchange failed
Sep 11 03:31:45 vps glorytun-tcp-run[21201]: ::ffff:wan2ip.58185: connected

The timing matches up omr event:
Mon Sep 11 03:31:43 2023 user.notice post-tracking-post-tracking: wan1 (eth1) switched off because check error and ping from wan1ip error (9.9.9.9,1.0.0.1,114.114.115.115)

Analysis:

  • 35 ip is: 161.211.203.35.bc.googleusercontent.com so could be a script.

  • 87 ip is: fascinating.monitoring.internet-measurement.com looks like a scanning bot

  • So either this scanning bot is crashing OMR or it is an attack.

  • This is the second time I have seen this problem in 1 week. Is there a vulnerability in glorytun which in turn is able to crash OMR?

  • The problem is that the wan does not come back online automatically and OMR tracker can't recover from this attack because it doesn't realize the wan is offline.

@ioogithub ioogithub changed the title omr-tracker brings interfaces down, they do not come back up and omr-tracker does not get triggered to fix the issue. Possible security vunerability in OMR: bad packet attack from Internet IP crashes glorytun, brings down wan interface, omr-tracker fails to restart interface. Sep 11, 2023
@ioogithub
Copy link
Author

This is the second time I have seen this event since I have been watching OMR logs very closely this week. I posed about it in this thread:
Originally posted by @ioogithub in #2936 (comment)

Same sequence of events:

  1. bad packet from Internet IP crashes glorytun
  2. one wan interface goes down
  3. OMR can't automatically restart interface
  4. Interface stays down until manually brought back up.

@ioogithub ioogithub changed the title Possible security vunerability in OMR: bad packet attack from Internet IP crashes glorytun, brings down wan interface, omr-tracker fails to restart interface. Possible vunerability in OMR: bad packet attack from Internet IP crashes glorytun, brings down wan interface, omr-tracker fails to restart interface. Sep 11, 2023
@Ysurac
Copy link
Owner

Ysurac commented Sep 12, 2023

VPS Glorytun TCP alerts with external IPs are not related with router wan1 problem. I don't see a Glorytun TCP crash in your VPS log.
It's only some scanning, nothing to worry about. If you don't have fixed IP, maybe possible to use GeoIP blocking with shorewall on the VPS, or to configure fail2ban to read glorytun tcp log and ban IP, or use Crowdsec.

For OMR-Tracker, after a restart of the interface it should work and doesn't try to restart it again.
So the interface restart, get an IP via DHCP, but doesn't work ? Another ifdown wan1 && ifup wan1 solve the issue or something else must be done ?

@ioogithub
Copy link
Author

For OMR-Tracker, after a restart of the interface it should work and doesn't try to restart it again.
So the interface restart, get an IP via DHCP, but doesn't work ? Another ifdown wan1 && ifup wan1 solve the issue or something else must be done ?

I think omr-tracker tries to restart the interface, something happened and it restart completely. Maybe it didn't get an IP address or gateway, restart didn't work.

Here is where I think the problem is:

Mon Sep 11 03:31:50 2023 daemon.notice netifd: Interface 'wan1' is enabled
Mon Sep 11 03:31:50 2023 daemon.notice netifd: Interface 'wan1' is setting up now
Mon Sep 11 03:31:50 2023 daemon.notice netifd: wan1 (28456): udhcpc: started, v1.33.2
Mon Sep 11 03:31:50 2023 daemon.notice netifd: wan1 (28456): udhcpc: sending discover
Mon Sep 11 03:31:51 2023 user.notice MPTCP: Add route wan2subnet/24 via wan2gateway dev eth2

but I think the wan did not have an IP at this time.

It stayed this way for 7 hours, no IP and no gateway.

When logged into router 7 hours later I see two things:

  1. status page has "no IP for this wan"
  2. Logging in triggered omr-tracker and then it sees no IP and restarts it:
    Mon Sep 11 13:02:57 2023 user.notice post-tracking-post-tracking: wan1 (eth1) switched off because wan1 may have ip issues, interface have no IPv4, interface have no IPv4 gateway

Does omr-tracker check there is an ip and gateway after a retart or does it just restart and go away?

This is what I think happened, the restart didn't finish and left the interface in a bad state but omr-tracker thinks it is up and doesn't check again for 7 hours. I log in and it triggers omr-tracker and then omr-tracker sees- this wan does not have an ip and it restarts it. This time it worked.

Is there anyway to check after a restart if it was successful?

@Ysurac
Copy link
Owner

Ysurac commented Sep 12, 2023

After interface restart, omr-tracker still work but do nothing. After up/down, interface should get an IP via DHCP, this is not the case for unknown reason. I can add a timeout after which interface is restarted again but I don't want a restart loop...
I will make some change in the script to add this.

@ioogithub
Copy link
Author

ioogithub commented Sep 13, 2023

After interface restart, omr-tracker still work but do nothing. After up/down, interface should get an IP via DHCP, this is not the case for unknown reason. I can add a timeout after which interface is restarted again but I don't want a restart loop... I will make some change in the script to add this.

Yes a loop does not work well with openwrt I tried it already with my aggregate script, it made the router very laggy. I found adding a cronjob is a simple way to run a script on a timer in this environment since cron it is already working.

Don't make any changes yet, I found the root cause of the issue because it just happened again tonight.
It is a known issue with starlink and udhcpc described here: https://forum.openwrt.org/t/starlink-dhcp-vs-openwrt-is-this-a-bug/93375/6. One use said that it was reported to buxybox years ago but still not fixed.

Here is what happens:

  1. omr-tracker can't ping wan1 (starlink) for various reasons, it restarts the interface.
  2. When starlink restarts it first gives a temporary 192.168.100.100 address for admin reasons before it fully boots up.
  3. udhcpc grabs this address and gives it to OMR, it has a very short lease time of 122s but it should not.
  4. Starlink eventually comes up fully but udhcpc hangs on to this 192.168 address.

Tonight it happened again. omr-tracker restarted starlink at 19:39 and I didn't recover until 20:44 so eth1 was down for 1 hour. Here is the problem:

Tue Sep 12 19:39:08 2023 daemon.info omr-tracker-v2ray: V2Ray is down (can't contact via http 198.27.92.1, 151.101.129.164, 77.88.55.77)
Tue Sep 12 19:39:08 2023 user.notice v2ray: Rules DOWN
Tue Sep 12 19:39:16 2023 user.notice post-tracking-post-tracking: wan1 (eth1) switched off because check error and ping from goodstarlinkip error (8.8.8.8,80.67.169.12,8.8.4.4)
Tue Sep 12 19:39:16 2023 user.notice post-tracking-post-tracking: Delete default route to vpsip via goodstarlinkgateway dev eth1
Tue Sep 12 19:39:17 2023 daemon.info omr-tracker-v2ray: Server (vpsip) seems down, no answer to ping
Tue Sep 12 19:39:18 2023 user.notice post-tracking-post-tracking: Restart wan1
Tue Sep 12 19:39:18 2023 daemon.notice netifd: wan1 (12879): udhcpc: received SIGTERM
Tue Sep 12 19:39:18 2023 daemon.notice netifd: wan1 (12879): udhcpc: unicasting a release of goodstarlinkip to goodstarlinkgateway
Tue Sep 12 19:39:18 2023 daemon.notice netifd: wan1 (12879): udhcpc: sending release
Tue Sep 12 19:39:18 2023 daemon.notice netifd: wan1 (12879): udhcpc: entering released state
Tue Sep 12 19:39:18 2023 daemon.notice netifd: wan1 (12879): Command failed: Permission denied
Tue Sep 12 19:39:18 2023 daemon.notice netifd: Interface 'wan1' is now down
Tue Sep 12 19:39:18 2023 daemon.notice netifd: Interface 'wan1' is disabled
Tue Sep 12 19:39:18 2023 daemon.notice netifd: Network device 'eth1' link is down
Tue Sep 12 19:39:18 2023 daemon.notice netifd: Interface 'wan1' has link connectivity loss
Tue Sep 12 19:39:19 2023 user.notice MPTCP: Flush route cache
Tue Sep 12 19:39:22 2023 daemon.info omr-tracker-v2ray: V2Ray is up (can contact via http 1.1.1.1)
Tue Sep 12 19:39:23 2023 daemon.notice netifd: Interface 'wan1' is enabled
Tue Sep 12 19:39:23 2023 daemon.notice netifd: Interface 'wan1' is setting up now
Tue Sep 12 19:39:23 2023 daemon.notice netifd: wan1 (12600): udhcpc: started, v1.33.2
Tue Sep 12 19:39:23 2023 daemon.notice netifd: wan1 (12600): udhcpc: sending discover
Tue Sep 12 19:39:26 2023 daemon.notice netifd: Network device 'eth1' link is up
Tue Sep 12 19:39:26 2023 daemon.notice netifd: Interface 'wan1' has link connectivity
Tue Sep 12 19:39:26 2023 kern.info kernel: [260840.647430] igc 0000:03:00.0 eth1: igc: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Tue Sep 12 19:39:26 2023 daemon.notice netifd: wan1 (12600): udhcpc: sending discover
Tue Sep 12 19:39:29 2023 daemon.notice netifd: wan1 (12600): udhcpc: sending discover
Tue Sep 12 19:39:32 2023 daemon.notice netifd: wan1 (12600): udhcpc: sending select for 192.168.100.100
Tue Sep 12 19:39:32 2023 daemon.notice netifd: wan1 (12600): udhcpc: lease of 192.168.100.100 obtained, lease time 122
Tue Sep 12 19:39:32 2023 daemon.notice netifd: Interface 'wan1' is now up
  • When udhcpc grabs a new lease it gets the temporary starlink admin page lease:
    Tue Sep 12 19:39:32 2023 daemon.notice netifd: wan1 (12600): udhcpc: lease of 192.168.100.100 obtained, lease time 122

Now every 2 minutes it will

If omr-tracker is triggered it sees the local IP and no gateway and restarts the interface but udhcpc always grabs the 192.168 ip again:
Tue Sep 12 19:41:51 2023 user.notice post-tracking-post-tracking: wan1 (eth1) switched off because wan1 may have ip issues, interface have no IPv4 gateway

  • here are some logs sequences:
Tue Sep 12 19:41:52 2023 user.notice post-tracking-post-tracking: Restart wan1
Tue Sep 12 19:41:52 2023 daemon.notice netifd: wan1 (12600): udhcpc: received SIGTERM
Tue Sep 12 19:41:52 2023 daemon.notice netifd: wan1 (12600): udhcpc: unicasting a release of 192.168.100.100 to 192.168.100.1
Tue Sep 12 19:41:52 2023 daemon.notice netifd: wan1 (12600): udhcpc: sending release
Tue Sep 12 19:41:52 2023 daemon.notice netifd: wan1 (12600): udhcpc: entering released state
Tue Sep 12 19:41:52 2023 daemon.notice netifd: wan1 (12600): Command failed: Permission denied
Tue Sep 12 19:41:52 2023 daemon.notice netifd: Interface 'wan1' is now down
Tue Sep 12 19:41:52 2023 daemon.notice netifd: Interface 'wan1' is disabled
Tue Sep 12 19:41:52 2023 daemon.notice netifd: Network device 'eth1' link is down
Tue Sep 12 19:41:52 2023 daemon.notice netifd: Interface 'wan1' has link connectivity loss
Tue Sep 12 19:41:52 2023 user.notice omr-tracker: Launching...
Tue Sep 12 19:41:57 2023 daemon.notice netifd: Interface 'wan1' is enabled
Tue Sep 12 19:41:57 2023 daemon.notice netifd: Interface 'wan1' is setting up now
Tue Sep 12 19:41:57 2023 daemon.notice netifd: wan1 (4421): udhcpc: started, v1.33.2
Tue Sep 12 19:41:57 2023 daemon.notice netifd: wan1 (4421): udhcpc: sending discover
Tue Sep 12 19:41:59 2023 user.notice omr-tracker: Launched
Tue Sep 12 19:41:59 2023 user.notice omr-bypass: Starting OMR-ByPass...
Tue Sep 12 19:41:59 2023 user.notice post-tracking-post-tracking: wan1 (eth1) switched off because wan1 may have ip issues, interface have no IPv4, interface have no IPv4 gateway
Tue Sep 12 19:41:59 2023 user.notice post-tracking-post-tracking: Delete default route to vpsip via  dev eth1
Tue Sep 12 19:42:00 2023 user.notice omr-bypass: OMR-ByPass is running
Tue Sep 12 19:42:00 2023 user.notice MPTCP: Flush route cache
Tue Sep 12 19:42:00 2023 daemon.info omr-tracker-v2ray: V2Ray is up (can contact via http 1.0.0.1)
Tue Sep 12 19:42:00 2023 daemon.notice netifd: wan1 (4421): udhcpc: sending discover
Tue Sep 12 19:42:01 2023 user.notice post-tracking-post-tracking: Restart wan1
Tue Sep 12 19:42:01 2023 daemon.notice netifd: wan1 (4421): udhcpc: received SIGTERM
Tue Sep 12 19:42:01 2023 daemon.notice netifd: wan1 (4421): udhcpc: entering released state
Tue Sep 12 19:42:01 2023 daemon.notice netifd: wan1 (4421): Command failed: Permission denied
Tue Sep 12 19:42:01 2023 daemon.notice netifd: Interface 'wan1' is now down
Tue Sep 12 19:42:01 2023 daemon.notice netifd: Interface 'wan1' is disabled
Tue Sep 12 19:42:06 2023 daemon.notice netifd: Interface 'wan1' is enabled
Tue Sep 12 19:42:06 2023 daemon.notice netifd: Interface 'wan1' is setting up now
Tue Sep 12 19:42:06 2023 daemon.notice netifd: wan1 (7666): udhcpc: started, v1.33.2
Tue Sep 12 19:42:06 2023 daemon.notice netifd: wan1 (7666): udhcpc: sending discover
Tue Sep 12 19:42:09 2023 daemon.notice netifd: wan1 (7666): udhcpc: sending discover
Tue Sep 12 19:42:10 2023 daemon.notice netifd: Network device 'eth1' link is up
Tue Sep 12 19:42:10 2023 daemon.notice netifd: Interface 'wan1' has link connectivity
Tue Sep 12 19:42:10 2023 kern.info kernel: [261004.611066] igc 0000:03:00.0 eth1: igc: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Tue Sep 12 19:42:12 2023 daemon.notice netifd: wan1 (7666): udhcpc: sending discover
Tue Sep 12 19:42:12 2023 daemon.notice netifd: wan1 (7666): udhcpc: sending select for 192.168.100.100
Tue Sep 12 19:42:12 2023 daemon.notice netifd: wan1 (7666): udhcpc: lease of 192.168.100.100 obtained, lease time 122
Tue Sep 12 19:42:12 2023 daemon.notice netifd: Interface 'wan1' is now up
Tue Sep 12 19:42:13 2023 daemon.notice netifd: Network device 'eth1' link is down
Tue Sep 12 19:42:13 2023 daemon.notice netifd: Interface 'wan1' has link connectivity loss
Tue Sep 12 19:42:13 2023 user.notice firewall: Reloading firewall due to ifup of wan1 (eth1)
Tue Sep 12 19:42:13 2023 user.notice firewall.omr-server: Firewall reload, set server part firewall reloading
Tue Sep 12 19:42:13 2023 user.notice mptcp: Reloading mptcp config due to ifup of wan1 (eth1)
Tue Sep 12 19:42:13 2023 user.notice MPTCP: Flush route cache
Tue Sep 12 19:42:15 2023 kern.info kernel: [261009.760780] igc 0000:03:00.0 eth1: igc: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Tue Sep 12 19:42:15 2023 daemon.notice netifd: Network device 'eth1' link is up
Tue Sep 12 19:42:15 2023 daemon.notice netifd: Interface 'wan1' has link connectivity
Tue Sep 12 19:42:16 2023 user.notice post-tracking-post-tracking: Set firewall on server vpsLuna
Tue Sep 12 19:42:23 2023 user.notice MPTCP: Flush route cache
Tue Sep 12 19:43:14 2023 daemon.notice netifd: wan1 (7666): udhcpc: sending renew to 192.168.100.1
Tue Sep 12 19:43:44 2023 daemon.notice netifd: wan1 (7666): udhcpc: sending renew to 0.0.0.0
Tue Sep 12 19:43:44 2023 daemon.notice netifd: wan1 (7666): udhcpc: lease of 192.168.100.100 obtained, lease time 122
Tue Sep 12 19:44:45 2023 daemon.notice netifd: wan1 (7666): udhcpc: sending renew to 192.168.100.1
Tue Sep 12 19:45:12 2023 daemon.info v2ray: DNS disabled: main_dns
Tue Sep 12 19:45:12 2023 daemon.info v2ray: Setting transparent proxy on port: 1897
Tue Sep 12 19:45:12 2023 daemon.info v2ray: Transparent proxy mode: default
Tue Sep 12 19:45:12 2023 user.notice v2ray: add rules
Tue Sep 12 19:45:12 2023 user.notice v2ray: v2ray-rules -l 1897 -L 1897 -s vpsip --rule-name def --src-default forward --dst-default forward --local-default forward
Tue Sep 12 19:45:12 2023 user.notice v2ray: Reload omr-bypass rules
Tue Sep 12 19:45:12 2023 user.notice omr-bypass: Starting OMR-ByPass...
Tue Sep 12 19:45:13 2023 user.notice omr-bypass: OMR-ByPass is running
Tue Sep 12 19:45:13 2023 daemon.info v2ray[31936]: V2Ray 4.43.0 (V2Fly, a community-driven edition of V2Ray.) OpenWrt R1 (go1.17.3 linux/amd64)
Tue Sep 12 19:45:13 2023 daemon.info v2ray[31936]: A unified platform for anti-censorship.
Tue Sep 12 19:45:13 2023 daemon.info v2ray[31936]: 2023/09/12 23:45:13 [Info] main/jsonem: Reading config: /var/etc/v2ray/v2ray.main.json
Tue Sep 12 19:45:13 2023 user.notice aggregate-tracker: [curl] PID 28854: 2023-09-12 19:45:00 5.66 Mbps [speed] PID 28854: eth1: 0 kB/s eth2: 308 kB/s,  Not operating in aggregate mode, restarting v2ray in [5s]
Tue Sep 12 19:45:15 2023 daemon.notice netifd: wan1 (7666): udhcpc: sending renew to 0.0.0.0
Tue Sep 12 19:45:15 2023 daemon.notice netifd: wan1 (7666): udhcpc: lease of 192.168.100.100 obtained, lease time 122
Tue Sep 12 19:46:16 2023 daemon.notice netifd: wan1 (7666): udhcpc: sending renew to 192.168.100.1
Tue Sep 12 19:46:46 2023 daemon.notice netifd: wan1 (7666): udhcpc: sending renew to 0.0.0.0
Tue Sep 12 19:46:46 2023 daemon.notice netifd: wan1 (7666): udhcpc: lease of 192.168.100.100 obtained, lease time 122
Tue Sep 12 19:47:48 2023 daemon.notice netifd: wan1 (7666): udhcpc: sending renew to 192.168.100.1
Tue Sep 12 19:48:18 2023 daemon.notice netifd: wan1 (7666): udhcpc: sending renew to 0.0.0.0
Tue Sep 12 19:48:18 2023 daemon.notice netifd: wan1 (7666): udhcpc: lease of 192.168.100.100 obtained, lease time 122
Tue Sep 12 19:49:19 2023 daemon.notice netifd: wan1 (7666): udhcpc: sending renew to 192.168.100.1
Tue Sep 12 19:49:49 2023 daemon.notice netifd: wan1 (7666): udhcpc: sending renew to 0.0.0.0
Tue Sep 12 19:49:49 2023 daemon.notice netifd: wan1 (7666): udhcpc: lease of 192.168.100.100 obtained, lease time 122
Tue Sep 12 19:50:50 2023 daemon.notice netifd: wan1 (7666): udhcpc: sending renew to 192.168.100.1
Tue Sep 12 19:51:20 2023 daemon.notice netifd: wan1 (7666): udhcpc: sending renew to 0.0.0.0
Tue Sep 12 19:51:20 2023 daemon.notice netifd: wan1 (7666): udhcpc: lease of 192.168.100.100 obtained, lease time 122
Tue Sep 12 19:52:12 2023 user.notice omr-bypass: Starting OMR-ByPass...
Tue Sep 12 19:52:12 2023 user.notice omr-bypass: OMR-ByPass is running
Tue Sep 12 19:52:12 2023 user.notice MPTCP: Flush route cache
Tue Sep 12 19:52:22 2023 daemon.notice netifd: wan1 (7666): udhcpc: sending renew to 192.168.100.1
Tue Sep 12 19:52:22 2023 user.notice omr-tracker: Launching...
Tue Sep 12 19:52:29 2023 user.notice omr-tracker: Launched
Tue Sep 12 19:52:29 2023 user.notice post-tracking-post-tracking: wan1 (eth1) switched off because wan1 may have ip issues, interface have no IPv4 gateway
Tue Sep 12 19:52:29 2023 user.notice post-tracking-post-tracking: Delete default route to vpsip via  dev eth1
Tue Sep 12 19:52:29 2023 daemon.info glorytun: starting glorytun vpn instance vpn
Tue Sep 12 19:52:30 2023 daemon.info omr-tracker-v2ray: V2Ray is up (can contact via http 1.0.0.1)
Tue Sep 12 19:52:31 2023 user.notice post-tracking-post-tracking: Restart wan1
Tue Sep 12 19:52:31 2023 daemon.notice netifd: wan1 (7666): udhcpc: received SIGTERM
Tue Sep 12 19:52:31 2023 daemon.notice netifd: wan1 (7666): udhcpc: unicasting a release of 192.168.100.100 to 192.168.100.1
Tue Sep 12 19:52:31 2023 daemon.notice netifd: wan1 (7666): udhcpc: sending release
Tue Sep 12 19:52:31 2023 daemon.notice netifd: wan1 (7666): udhcpc: entering released state
Tue Sep 12 19:52:31 2023 daemon.notice netifd: wan1 (7666): Command failed: Permission denied
Tue Sep 12 19:52:31 2023 daemon.notice netifd: Interface 'wan1' is now down
Tue Sep 12 19:52:31 2023 daemon.notice netifd: Interface 'wan1' is disabled
Tue Sep 12 19:52:31 2023 daemon.notice netifd: Network device 'eth1' link is down
Tue Sep 12 19:52:31 2023 daemon.notice netifd: Interface 'wan1' has link connectivity loss
Tue Sep 12 19:52:32 2023 user.notice MPTCP: Flush route cache
Tue Sep 12 19:52:36 2023 daemon.notice netifd: Interface 'wan1' is enabled
Tue Sep 12 19:52:36 2023 daemon.notice netifd: Interface 'wan1' is setting up now
Tue Sep 12 19:52:36 2023 daemon.notice netifd: wan1 (25077): udhcpc: started, v1.33.2
Tue Sep 12 19:52:36 2023 daemon.notice netifd: wan1 (25077): udhcpc: sending discover
Tue Sep 12 19:52:39 2023 daemon.notice netifd: wan1 (25077): udhcpc: sending discover
Tue Sep 12 19:52:42 2023 daemon.notice netifd: wan1 (25077): udhcpc: sending discover
Tue Sep 12 19:53:26 2023 user.notice MPTCP: Flush route cache
Tue Sep 12 19:53:36 2023 daemon.info glorytun: starting glorytun vpn instance vpn
Tue Sep 12 19:53:36 2023 user.notice omr-tracker: Launching...
Tue Sep 12 19:53:43 2023 user.notice omr-tracker: Launched
Tue Sep 12 19:53:43 2023 user.notice omr-bypass: Starting OMR-ByPass...
Tue Sep 12 19:53:43 2023 user.notice post-tracking-post-tracking: wan1 (eth1) switched off because wan1 may have ip issues, interface have no IPv4, interface have no IPv4 gateway
Tue Sep 12 19:53:43 2023 user.notice post-tracking-post-tracking: Delete default route to vpsip via  dev eth1
Tue Sep 12 19:53:44 2023 user.notice omr-bypass: OMR-ByPass is running
Tue Sep 12 19:53:44 2023 daemon.info omr-tracker-v2ray: V2Ray is up (can contact via http 1.0.0.1)
Tue Sep 12 19:53:45 2023 user.notice post-tracking-post-tracking: Restart wan1
Tue Sep 12 19:53:45 2023 daemon.notice netifd: wan1 (25077): udhcpc: received SIGTERM
Tue Sep 12 19:53:45 2023 daemon.notice netifd: wan1 (25077): udhcpc: entering released state
Tue Sep 12 19:53:45 2023 daemon.notice netifd: wan1 (25077): Command failed: Permission denied
Tue Sep 12 19:53:45 2023 daemon.notice netifd: Interface 'wan1' is now down
Tue Sep 12 19:53:45 2023 daemon.notice netifd: Interface 'wan1' is disabled
Tue Sep 12 19:53:50 2023 daemon.notice netifd: Interface 'wan1' is enabled
Tue Sep 12 19:53:50 2023 daemon.notice netifd: Interface 'wan1' is setting up now
Tue Sep 12 19:53:50 2023 daemon.notice netifd: wan1 (5102): udhcpc: started, v1.33.2
Tue Sep 12 19:53:50 2023 daemon.notice netifd: wan1 (5102): udhcpc: sending discover
Tue Sep 12 19:53:52 2023 user.notice omr-bypass: Starting OMR-ByPass...
Tue Sep 12 19:53:52 2023 user.notice omr-bypass: OMR-ByPass is running

Finally it is fixed here 1 hour later:

Tue Sep 12 20:38:25 2023 kern.info kernel: [264379.661140] igc 0000:03:00.0 eth1: igc: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Tue Sep 12 20:38:25 2023 daemon.notice netifd: Network device 'eth1' link is up
Tue Sep 12 20:38:25 2023 daemon.notice netifd: Interface 'wan1' has link connectivity
Tue Sep 12 20:38:26 2023 user.notice MPTCP: Flush route cache
Tue Sep 12 20:38:28 2023 user.notice post-tracking-post-tracking: wan1 (eth1) switched up
Tue Sep 12 20:38:38 2023 user.notice post-tracking-post-tracking: New public ip detected for wan1 (eth1): goodstarlinkip
Tue Sep 12 20:38:59 2023 user.notice MPTCP: Flush route cache
Tue Sep 12 20:38:59 2023 user.notice omr-tracker: Launching...
Tue Sep 12 20:39:07 2023 user.notice omr-tracker: Launched
Tue Sep 12 20:39:07 2023 user.notice omr-bypass: Starting OMR-ByPass...
Tue Sep 12 20:39:07 2023 daemon.info omr-tracker-v2ray: V2Ray is up (can contact via http 1.0.0.1)
- tonight omr-tracker is active, it can see the problem and tries to restart the interface over and over but it still fails.
- last night, omr-tracker was not triggered, after the restart, it ran once then did not run again for 7 hours. What is the difference.

Solutions

  • This is a upstream bug but it affects OMR. Do you know the busybox project page? One user claimed that the bug was reported a long time ago but I could not find the bug report. Maybe there is already fix in a new version?
  • Tonight omr-tracker was active and running but it still did not fix the issue for over 1 hour.
  • Last night it only ran 1 time after restarting the interface then did not run again for 7 hours. When it did run it restarted the interface and grabbed the correct IP. What is the difference?
  • since omr-tracker is actively restarting the starlink interface, this issue might happen more often with OMR than a router that is not doing this.
  • We should look at adjusting the omr-tracker settings for starlink, maybe it is too aggressive. What do you suggest changing first? I can try to test.
  • Is it possible to set a static address for starlink with omr, will this work or cause more problem since dishy is supplying the dhcp addresses.
  • I think something like this ifdown wan1; sleep 10s; ifup wan1 is better than this ifdown wan1 && ifup wan1 for starlink. I have added this to my aggregate script
  • I am not sure how to test it because I had read about this issue 1 year ago in a few different places, I think it is well understood that starlink boots with a temporary admin dhcp address and that can cause issues with 3rd party routers like omr but I did not see it until today. I think it happened a few times times but I only saw it because I am actively looking at logs.
  • what actually triggers omr-tracker in the last log that fixed the problem is it listening for the kernel message of the if coming back up?
  • starlink will be a popular option for OMR users because its a global ISP and OMR is a solution for users with a lack of good ISP options so OMR should have some solution for this starlink edge case.

@Ysurac
Copy link
Owner

Ysurac commented Sep 13, 2023

Seems that it's fixed in busybox git, I will check if it's fixed in the versions used or if I need to update them.
OMR-Tracker do something new if the error is new, so maybe a starlink reboot after update.
You can add a new interface (wan1) and set specific timeout and tries.
For static IP with starlink dish I have no idea, but I suppose you can and this would fix DHCP problem.
OMR-Tracker to a ifdown wan1 && sleep 5s && ifup wan1 already.
I have Starlink but I use the provided router, so no issues. I will test without the router asap.

@ioogithub
Copy link
Author

ioogithub commented Sep 13, 2023

Seems that it's fixed in busybox git, I will check if it's fixed in the versions used or if I need to update them.

I looked for this yesterday but I couldn't find it. Do you have the git link?

OMR-Tracker do something new if the error is new, so maybe a starlink reboot after update.

It could be startlink updating I thought about that too but it happened two days in a row but different behavior from OMR. First day it only ran OMR-tracker once time and not again for 7 hours so it thought everything was good. Second day it detected there was no gateway and tried many times to restart the interface.

You can add a new interface (wan1) and set specific timeout and tries.

Do you mean do this to test did different settings? It might be hard because I don't know how to simulate this error.

I have Starlink but I use the provided router, so no issues. I will test without the router asap.

I am using the router but I have it in bypass mode so I can use it with OMR. When you say you use provided router do you mean you are using the router wifi capabilities? If you connect OMR to Starlink then you will have double NAT? Seems bypass mode is the best configuration for using startlink with OMR. Do you mean you will test with the router in bypass mode?

@Network-Traditions I thought I saw you mentioned this problem in another issue around 1 year ago but I can't find it now. Did you find a work around?

@Network-Traditions
Copy link

@ioogithub we had and still have an issue #2793 where our T-Mobile Business Cellular Internet WAN controlled by MODEMMANAGER regularly (once per day on average) stops receiving packets. Executing a restart from Luci generally fails to successfully bring the interface back online, which is also the case with OMR-Tracker. We developed a bash script to reset the USB interface, which succeeds in bringing the WAN connection back online 9 out of 10 times. When it doesn't a full power cycle of the MODEM or reboot of OMR generally resolves the issue. We are hoping this behavior will resolve with a Debian 12 VPS running an an Oracle Ampere ARM64 instance using OMR v6.1 with upstream MPTCP. Finally, our current modem has the flawed chipset, which prohibits operating in "SA" mode on T-Mobile's network. I expect significant performance improvements to include potentially eliminating this issue when we upgrade (soon) to a T-Mobile certified modem, which will successfully support "SA" mode.

@ioogithub
Copy link
Author

@ioogithub we had and still have an issue #2793 where our T-Mobile Business Cellular Internet WAN controlled by MODEMMANAGER regularly (once per day on average) stops receiving packets.

Sounds like a different issue. This one involves starlink giving a temporary 192.168.x.x address after omr-tracker restarts it and holding on to the temporary lease instead of renewing the proper lease. I thought for sure I saw you linked to this issue before but it was over a year ago, sorry.

@Ysurac
Copy link
Owner

Ysurac commented Sep 14, 2023

For busybox, it's here: https://git.busybox.net/busybox/tree/networking/udhcp/dhcpc.c#n1740
For OMR-Tracker, you can add, in the web interface, a network interface with specific settings. You should use that and set maybe a higher timeout and tries.
For Starlink, I use old provided router that have an ethernet interface. I will check without this router.

@ioogithub
Copy link
Author

For busybox, it's here: https://git.busybox.net/busybox/tree/networking/udhcp/dhcpc.c#n1740

Thanks I will read and try to understand. Busybox is a single binary isn't it. If we wanted to update OMR is it as easy as installing the new binary on the router and testing it? If you make a decision I am available to test it.

For OMR-Tracker, you can add, in the web interface, a network interface with specific settings. You should use that and set maybe a higher timeout and tries.

I am looking at the OME web interface, what type of setting could be used to mitigate this issue for example?

So same issue happened again 1 hour ago:

Sep 14 10:16:31 vps glorytun-tcp-run[683]: read: Connection timed out
Sep 14 10:16:32 vps glorytun-tcp-run[683]: ::ffff:35.203.210.53.61748: connected
Sep 14 10:16:32 vps glorytun-tcp-run[683]: bad packet [49010F1C49010F1D49010F1E] !
Sep 14 10:16:32 vps glorytun-tcp-run[683]: ::ffff:35.203.210.53.61748: key exchange failed
Sep 14 10:16:38 vps glorytun-tcp-run[683]: ::ffff:wan2ip.56040: connected

and on vps:

Thu Sep 14 10:16:37 2023 daemon.err glorytun[31814]: read: Operation timed out
Thu Sep 14 10:16:37 2023 daemon.info glorytun[31814]: STOPPED tun0
Thu Sep 14 10:16:38 2023 daemon.err glorytun[31814]: vpsip.65001: connected
Thu Sep 14 10:16:38 2023 daemon.info glorytun[31814]: STARTED tun0
Thu Sep 14 10:16:54 2023 user.notice post-tracking-post-tracking: wan2 (eth2) switched off because check error and ping from wan1ip error (1.2.4.8,80.67.169.40,114.114.114.114)
Thu Sep 14 10:16:54 2023 user.notice post-tracking-post-tracking: Delete default route to vpsip via wan2gateway dev eth2
Thu Sep 14 10:16:56 2023 user.notice post-tracking-post-tracking: Restart wan2
Thu Sep 14 10:16:56 2023 daemon.notice netifd: wan2 (6225): udhcpc: received SIGTERM
Thu Sep 14 10:16:56 2023 daemon.notice netifd: wan2 (6225): udhcpc: unicasting a release of wan1ip to wan1gateway
Thu Sep 14 10:16:56 2023 daemon.notice netifd: wan2 (6225): udhcpc: sending release
Thu Sep 14 10:16:56 2023 daemon.notice netifd: wan2 (6225): udhcpc: entering released state
Thu Sep 14 10:16:56 2023 daemon.notice netifd: wan2 (6225): Command failed: Permission denied
Thu Sep 14 10:16:56 2023 daemon.notice netifd: Interface 'wan2' is now down
Thu Sep 14 10:16:56 2023 kern.info kernel: [399884.971093] igc 0000:04:00.0: changing MTU from 1460 to 1500
Thu Sep 14 10:16:56 2023 daemon.notice netifd: Interface 'wan2' is disabled
Thu Sep 14 10:16:56 2023 daemon.notice netifd: Network device 'eth2' link is down
Thu Sep 14 10:16:56 2023 daemon.notice netifd: Interface 'wan2' has link connectivity loss

I know you said this is just a scanner but every time after the scanner sends the bad packet, eth2 goes down.
It can't just be a coincidence that 3 days a scanner sends a bad packet and right after tun0 is restarted and wan2 interface goes down. Am I reading the log entries right? Isn't this at attack if it can bring down an interface? Also it is always wan2 that comes down after the glorytun restart.

@Ysurac
Copy link
Owner

Ysurac commented Sep 14, 2023

For OMR-Tracker, as I said you can increase timeout and tries.

For Glorytun-TCP, you see the problem on VPS because wan2 already doesn't answer, it's the "connection timed out", as there is no active connection glorytun-tcp answer to all connection, it's the bot with 35.203.* IP, then wan2 reconnect but seems to still loose many packets so omr-tracker detect it as down and restart interface.
In you log, you put "wan1ip" for udhcpc, it's really wan1 IP released for wan2 ? This seems strange.

@ioogithub
Copy link
Author

For Glorytun-TCP, you see the problem on VPS because wan2 already doesn't answer, it's the "connection timed out", as there is no active connection glorytun-tcp answer to all connection, it's the bot with 35.203.* IP, then wan2 reconnect but seems to still loose many packets so omr-tracker detect it as down and restart interface.

Okay I understand this, glorytun is timing out before the bad packet this is the start of the event. Glorytun is just reacting to the bot scan afterwards.

Copy link

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants