OMR Script Triggers for VPS and OMR-WAN Fail Events #2793

Closed
Network-Traditions opened this issue Mar 6, 2023 · 6 comments

@Network-Traditions

Is your feature request related to a problem? Please describe.
When a stable production VPS or OMR WAN connection develops an issue, rebooting the VPS or WAN interface often resolves the problem.

Describe the solution you'd like
Upon a VPS or OMR WAN failure event, provide an option to trigger a bash script on OMR that performs a corrective action, such as a REBOOT/RESTART, programmatically.

Describe alternatives you've considered
We considered digging into OMR to capture failover events in a manner similar to the following script, which we use to obtain the OMR WAN public IP addresses for the dynamic DNS service:

  • pip=$(uci get openmptcprouter.wan1.publicip 2> /dev/null)
  • if [ -z "$pip" ]; then echo "0.0.0.0"; else echo "$pip"; fi
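
The lookup above can be sketched as a small function so it can be reused per WAN. The function name `wan_public_ip` is hypothetical. The `openmptcprouter.<iface>.publicip` key comes from the script above, and it is an assumption that interfaces other than `wan1` expose the same key.

```shell
#!/bin/sh
# wan_public_ip IFACE
# Print the public IP that OMR recorded for IFACE, or 0.0.0.0 when the
# UCI key is missing or empty. The function name is illustrative only.
wan_public_ip() {
    # uci fails if the key does not exist; treat that as "no address"
    pip=$(uci get "openmptcprouter.$1.publicip" 2>/dev/null) || pip=""
    echo "${pip:-0.0.0.0}"
}
```

Calling `wan_public_ip wan1` should behave like the two-line script above.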

Additional context
We have a USA T-Mobile Business connection as one of our OMR WAN services. This service encounters connection problems (every 1 to 2 days) in such a way that the OMR USB Modem interface requires a restart to begin receiving packets again. We use the following manually executed bash script to remotely achieve the restart:

  • echo "usb2"> /sys/bus/usb/drivers/usb/unbind
  • sleep 5
  • echo "usb2"> /sys/bus/usb/drivers/usb/bind

Clicking OMR's WAN "Restart" button does not resolve this connection problem; only unplugging the USB modem or running this script allows the modem to begin receiving packets again.
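
A minimal sketch of the unbind/bind sequence above as a reusable function with basic error checks. The function name `usb_rebind` and the `USB_DRIVER_DIR` override are hypothetical additions (the override only exists so the sequence can be dry-run against a scratch directory). The default path and the `usb2` port name are the ones from the script above.

```shell
#!/bin/sh
# usb_rebind PORT [DELAY]
# Unbind PORT from the USB driver, wait DELAY seconds (default 5),
# then bind it again. Needs root when run against the real sysfs path.
usb_rebind() {
    port="$1"
    delay="${2:-5}"
    # USB_DRIVER_DIR is an illustrative override, read at call time
    dir="${USB_DRIVER_DIR:-/sys/bus/usb/drivers/usb}"

    if [ -z "$port" ]; then
        echo "usage: usb_rebind PORT [DELAY]" >&2
        return 2
    fi
    if [ ! -w "$dir/unbind" ] || [ ! -w "$dir/bind" ]; then
        echo "usb_rebind: cannot write to $dir/unbind or $dir/bind" >&2
        return 1
    fi

    echo "$port" > "$dir/unbind" || return 1
    sleep "$delay"
    echo "$port" > "$dir/bind"
}
```

With the defaults, `usb_rebind usb2` performs the same three steps as the manual script.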

A similar issue exists with the VPS as well. When OMR is experiencing abnormal issues like VPN down, unable to contact the Admin Script, VPS disconnects, etc., often a VPS reboot resolves the issue.

We've deployed scripts on OMR to dynamically register the WAN public IP addresses and the currently connected VPS public IP addresses with our DNS provider. Doing so allows OMR to programmatically update our public DNS records when it switches VPS providers. Within about 5 minutes, our email, VoIP calls, and web services all come back online automatically, without admin intervention. During such a fail event, we would like to execute a REBOOT/RESTART on the failing VPS in an attempt to clear the problem and restore it to nominal operating status.

@Ysurac
Owner

Ysurac commented Mar 7, 2023

To solve that, I would need to know why the VPS is failing. So I would need /var/log/daemon.log, the output of ip r, ip a and iptables-save, and a status page screenshot on the router side, both when it's failing and when it's working.

@Network-Traditions
Author

I'll try to capture that information the next time it occurs. What I was wondering, though, is whether there is an opportunity to hook into OMR's monitoring events to trigger a bash script. For example, OMR determines the master VPS is offline, switches to an alternate, and upon completion has an option to trigger a bash script that executes a custom action, such as rebooting the master VPS. Something similar for the WAN interfaces would be useful as well. When our T-Mobile service starts giving us problems, the "RX" packet count under Network-Interfaces stops incrementing, or increments by only a few packets per display update. Eventually the OMR status page indicates that the respective WAN service is down with a red X. Hooking OMR's connection monitoring of the WAN interfaces to allow custom bash script execution would present the same opportunity to programmatically implement corrective action that otherwise requires admin intervention.
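
Until such a hook exists, the stalled RX counter described above could be detected from a script. The following is a sketch under stated assumptions, not part of OMR: it samples the kernel counter at `/sys/class/net/<iface>/statistics/rx_packets` twice and runs a caller-supplied command if fewer than a threshold number of packets arrived. The function names, the `NET_SYSFS_DIR` override, and the interface name and thresholds in the usage note are all hypothetical.

```shell
#!/bin/sh
# rx_packets IFACE: print the kernel RX packet counter for IFACE.
# NET_SYSFS_DIR is an illustrative override for dry runs.
rx_packets() {
    cat "${NET_SYSFS_DIR:-/sys/class/net}/$1/statistics/rx_packets" 2>/dev/null
}

# rx_stalled IFACE INTERVAL [MIN_DELTA]
# Returns 0 if fewer than MIN_DELTA (default 1) packets were received
# during INTERVAL seconds, 1 otherwise, 2 if the counter is unreadable.
rx_stalled() {
    iface="$1"; interval="$2"; min_delta="${3:-1}"
    before=$(rx_packets "$iface") || return 2
    sleep "$interval"
    after=$(rx_packets "$iface") || return 2
    [ "$((after - before))" -lt "$min_delta" ]
}

# rx_watch_once IFACE INTERVAL MIN_DELTA COMMAND [ARGS...]
# Runs COMMAND only when the interface looks stalled.
rx_watch_once() {
    iface="$1"; interval="$2"; min_delta="$3"
    shift 3
    if rx_stalled "$iface" "$interval" "$min_delta"; then
        "$@"
    fi
}
```

For example, `rx_watch_once wwan0 60 5 /root/usb-reset.sh` (both names assumed) would run the reset script when fewer than 5 packets arrive in 60 seconds. A low threshold risks false positives on an idle link, so the values would need tuning against real traffic.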

@Network-Traditions
Author

Data for a T-Mobile loss of service event at about 19:45 log time. When this type of service loss occurs, eventually OMR will loop attempting to recover the interface (WAN1). Observing the Network-Interfaces tab as illustrated below next to the red arrow, the RX: packets will not change from the number displayed:
TmobileFail
The System-OpenMPTCProuter-Status tab will indicate a problem, as shown for the "cell" WAN1 here:
TmobileFail0
The T-Mobile loss of service event eventually disrupts all service, at a minimum temporarily, but sometimes permanently, requiring VPS and/or OMR reboots to regain internet connectivity with "star" WAN2, the multipath master:
TmobileFail1
Additionally, the OMR console will begin logging the following events:
OMR-dmesg.txt
To correct the problem, I SSH'd into OMR and executed the following script about 19:58 log time:

echo "usb2"> /sys/bus/usb/drivers/usb/unbind
sleep 5
echo "usb2"> /sys/bus/usb/drivers/usb/bind

Subsequently, ModemManager most often can fully recover the interface and bring WAN1 service back online for OMR:
InterfacesAfterUSBreset
StatusAfterUSBreset
Here are the requested logs:
daemon.txt
OMR-Systemlog.txt
ipa.txt
ipr.txt
iptables.txt

@Ysurac
Owner

Ysurac commented Mar 8, 2023

This seems to be a bug in the USB driver. I will add a kernel patch that may solve the issue.

@Network-Traditions
Author

OK, let me know when the patch is ready to test and how to apply it properly, and I will let you know the result. Our system experiences the T-Mobile failure every 1 to 2 days on average.

@github-actions

github-actions bot commented Jun 6, 2023

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days
