fails to add tunnel as it already exists - Smartgateway fails #83

Closed
SvenRoederer opened this issue Jun 28, 2020 · 6 comments

@SvenRoederer
Contributor

On my node I was not able to access the internet via SmartGateway.
ip route returned no default route, even though a SmartGateway was available in the network (direct neighbour 10.36.217.96). I found the logfile full of the following messages:

Sun Jun 28 16:29:25 2020 daemon.err olsrd[6735]: Cannot add tunnel tnl_0a24d960 to 10.36.217.96: File exists (17)
Sun Jun 28 16:29:25 2020 daemon.err olsrd[6735]: Cannot create tunnel tnl_0a24d960
Sun Jun 28 16:29:43 2020 daemon.err olsrd[6735]: Cannot add tunnel tnl_0a24d960 to 10.36.217.96: File exists (17)
Sun Jun 28 16:29:43 2020 daemon.err olsrd[6735]: Cannot create tunnel tnl_0a24d960
Sun Jun 28 16:29:50 2020 daemon.info odhcpd[848]: Using a RA lifetime of 0 seconds on br-dhcp
Sun Jun 28 16:29:51 2020 daemon.err olsrd[6735]: Cannot add tunnel tnl_0a24d960 to 10.36.217.96: File exists (17)
Sun Jun 28 16:29:51 2020 daemon.err olsrd[6735]: Cannot create tunnel tnl_0a24d960
Sun Jun 28 16:29:59 2020 daemon.err olsrd[6735]: Cannot add tunnel tnl_0a24d960 to 10.36.217.96: File exists (17)
Sun Jun 28 16:29:59 2020 daemon.err olsrd[6735]: Cannot create tunnel tnl_0a24d960

This was seen running OpenWrt (Freifunk "Firmware Berlin (Hedy 1.0.6)")
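
The tunnel name in these messages appears to encode the gateway address: the eight hex digits after tnl_ in the log above match the IPv4 address the tunnel points to (0a24d960 is 10.36.217.96). A minimal Python sketch of that mapping follows. It is inferred only from the log lines in this issue, not from the olsrd source, and the helper names are made up for illustration.

```python
import ipaddress

PREFIX = "tnl_"  # prefix seen in the log lines of this issue


def tunnel_name_to_ip(name: str) -> str:
    """Decode a name like 'tnl_0a24d960' into a dotted-quad IPv4 address."""
    if not name.startswith(PREFIX) or len(name) != len(PREFIX) + 8:
        raise ValueError(f"unexpected tunnel name: {name!r}")
    # The suffix is read as one 32-bit big-endian hex number.
    return str(ipaddress.IPv4Address(int(name[len(PREFIX):], 16)))


def ip_to_tunnel_name(ip: str) -> str:
    """Inverse mapping (hypothetical helper): IPv4 address to tunnel name."""
    return f"{PREFIX}{int(ipaddress.IPv4Address(ip)):08x}"
```

This makes it easy to check by hand which gateway a leftover interface belongs to.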

@HRogge
Contributor

HRogge commented Jun 28, 2020 via email

@SvenRoederer
Contributor Author

Today I had some luck and found something in the logfile:

Thu Aug 20 13:55:24 2020 daemon.info olsrd[6019]: Tunnel tnl_0a1f0b01 added, to 10.31.11.1
Thu Aug 20 13:56:38 2020 daemon.info olsrd[6019]: Tunnel tnl_0a1f0b01 removed, to -
Thu Aug 20 13:56:38 2020 daemon.info olsrd[6019]: Tunnel tnl_0a24d960 added, to 10.36.217.96
Thu Aug 20 14:22:55 2020 kern.info kernel: [1140925.794673] 
Thu Aug 20 14:22:55 2020 kern.info kernel: [1140925.794673] do_page_fault(): sending SIGSEGV to olsrd for invalid read access from 00000004
Thu Aug 20 14:22:55 2020 kern.info kernel: [1140925.803387] epc = 00408779 in olsrd[400000+38000]
Thu Aug 20 14:22:55 2020 kern.info kernel: [1140925.808385] ra  = 004117a5 in olsrd[400000+38000]
Thu Aug 20 14:22:55 2020 kern.info kernel: [1140925.813384] 
Thu Aug 20 14:25:01 2020 user.notice OLSR watchdog: Process died - restarting!
Thu Aug 20 14:25:01 2020 daemon.info olsrd: /etc/init.d/olsrd: olsrd_write_interface() Warning: Interface 'wireless1' not found, skipped
Thu Aug 20 14:25:01 2020 daemon.info olsrd[8790]: Writing '1' (was 1) to /proc/sys/net/ipv4/ip_forward
Thu Aug 20 14:25:01 2020 daemon.info olsrd[8790]: Writing '0' (was 0) to /proc/sys/net/ipv4/conf/tunl0/rp_filter
Thu Aug 20 14:25:01 2020 daemon.info olsrd[8790]: Writing '0' (was 0) to /proc/sys/net/ipv4/conf/all/send_redirects
Thu Aug 20 14:25:01 2020 daemon.info olsrd[8790]: Writing '0' (was 0) to /proc/sys/net/ipv4/conf/all/rp_filter
Thu Aug 20 14:25:01 2020 daemon.info olsrd[8790]: Writing '0' (was 0) to /proc/sys/net/ipv4/conf/wlan0-adhoc-2/send_redirects
Thu Aug 20 14:25:01 2020 daemon.info olsrd[8790]: Writing '0' (was 0) to /proc/sys/net/ipv4/conf/wlan0-adhoc-2/rp_filter
Thu Aug 20 14:25:01 2020 daemon.info olsrd[8790]: Adding interface wlan0-adhoc-2
Thu Aug 20 14:25:01 2020 daemon.info olsrd[8790]: New main address: 10.31.18.82
Thu Aug 20 14:25:02 2020 daemon.info olsrd[8790]: olsr.org - 0.9.0.3-git_beaa2ecc10-hash_c30b62bc0274c1fe3eb84848cb5f881d successfully started
Thu Aug 20 14:25:03 2020 daemon.info olsrd: /etc/init.d/olsrd: olsrd_setup_smartgw_rules() Notice: Inserting firewall rules for SmartGateway
Thu Aug 20 14:25:32 2020 daemon.info olsrd[8790]: Tunnel tnl_0ae6d26f added, to 10.230.210.111
Thu Aug 20 14:27:04 2020 daemon.info olsrd[8790]: Tunnel tnl_0ae6d26f removed, to -
Thu Aug 20 14:27:04 2020 daemon.err olsrd[8790]: Cannot add tunnel tnl_0a24d960 to 10.36.217.96: File exists (17)
Thu Aug 20 14:27:04 2020 daemon.err olsrd[8790]: Cannot create tunnel tnl_0a24d960
Thu Aug 20 14:27:04 2020 daemon.err olsrd[8790]: Cannot add tunnel tnl_0a24d960 to 10.36.217.96: File exists (17)
Thu Aug 20 14:27:04 2020 daemon.err olsrd[8790]: Cannot create tunnel tnl_0a24d960
Thu Aug 20 14:27:06 2020 daemon.err olsrd[8790]: Cannot add tunnel tnl_0a24d960 to 10.36.217.96: File exists (17)
Thu Aug 20 14:27:06 2020 daemon.err olsrd[8790]: Cannot create tunnel tnl_0a24d960
Thu Aug 20 14:27:06 2020 daemon.err olsrd[8790]: Cannot add tunnel tnl_0a24d960 to 10.36.217.96: File exists (17)
Thu Aug 20 14:27:06 2020 daemon.err olsrd[8790]: Cannot create tunnel tnl_0a24d960
Thu Aug 20 14:27:07 2020 daemon.err olsrd[8790]: Cannot add tunnel tnl_0a24d960 to 10.36.217.96: File exists (17)
Thu Aug 20 14:27:07 2020 daemon.err olsrd[8790]: Cannot create tunnel tnl_0a24d960
Thu Aug 20 14:27:08 2020 daemon.err olsrd[8790]: Cannot add tunnel tnl_0a24d960 to 10.36.217.96: File exists (17)
Thu Aug 20 14:27:08 2020 daemon.err olsrd[8790]: Cannot create tunnel tnl_0a24d960
Thu Aug 20 14:27:09 2020 daemon.err olsrd[8790]: Cannot add tunnel tnl_0a24d960 to 10.36.217.96: File exists (17)
Thu Aug 20 14:27:09 2020 daemon.err olsrd[8790]: Cannot create tunnel tnl_0a24d960

Up to some point the system works normally and changes the SmartGateway endpoint from time to time. Then, for an unknown reason, olsrd suddenly crashes and therefore does not remove the established tunnel.
The Freifunk OLSR watchdog detects this crash and restarts the daemon. After some time, olsrd tries to set up the tunnel to the same endpoint as before the crash. This fails because the old tunnel still exists. Although the old tunnel is still there, olsrd does not use it, and the node has no uplink route.
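
One possible workaround for the sequence described above is to delete leftover tnl_* interfaces before olsrd is started again. The following is only a sketch under assumptions: it assumes every interface whose name starts with tnl_ was created by olsrd, that `ip -o link show` and `ip tunnel del` are available on the node, and the function names are hypothetical. It is not taken from the Freifunk watchdog or from olsrd, and Python is used for illustration only; on a router the same steps would more likely be a few lines of shell.

```python
import re
import subprocess
from typing import List

# `ip -o link show` prints one interface per line, for example:
# "9: tnl_0a24d960@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1480 ..."
_IFACE_RE = re.compile(r"^\d+:\s+([^:@\s]+)")


def stale_tunnels(ip_link_output: str, prefix: str = "tnl_") -> List[str]:
    """Return interface names from `ip -o link show` output that start with prefix."""
    names = []
    for line in ip_link_output.splitlines():
        match = _IFACE_RE.match(line)
        if match and match.group(1).startswith(prefix):
            names.append(match.group(1))
    return names


def remove_stale_tunnels() -> List[str]:
    """Delete leftover tunnels; meant to run while olsrd is stopped."""
    out = subprocess.run(
        ["ip", "-o", "link", "show"],
        capture_output=True, text=True, check=True,
    ).stdout
    removed = []
    for name in stale_tunnels(out):
        # A non-zero exit code is tolerated so one failure does not stop the loop.
        if subprocess.run(["ip", "tunnel", "del", name]).returncode == 0:
            removed.append(name)
    return removed
```

Keeping the parsing separate from the `ip` calls means the name matching can be checked without touching real interfaces.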

@pmelange
Contributor

First, the version of OLSRd used in Hedy-1-0-X is way old and I have not seen this happen with the latest stable release.

Second, this is not a new problem, which was fixed in openwrt/luci@5d0b720 and openwrt/luci@667f73a way back in July 2018

So, for me the question is: Why aren't these changes in the Hedy-1.0.6 release?

The original freifunk issue is freifunk-berlin/firmware#522

@SvenRoederer
Contributor Author

First, the version of OLSRd used in Hedy-1-0-X is way old and I have not seen this happen with the latest stable release.

I have not checked the code, but I assume the initial problem is still there. When the tunnel cannot be set up because it already exists, the daemon floods the logfile with messages about this. So the handling of such situations should probably be rethought.
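
One way such handling could look is to treat "File exists" (errno 17, EEXIST) as a signal to remove the leftover tunnel and retry once, instead of logging the same error on every attempt. A minimal Python sketch of that control flow follows, with the actual tunnel operations passed in as callables. `add_tunnel` and `del_tunnel` are hypothetical stand-ins, not olsrd functions, and nothing in this thread indicates that olsrd behaves this way.

```python
import errno
from typing import Callable


def ensure_tunnel(
    name: str,
    add_tunnel: Callable[[str], None],
    del_tunnel: Callable[[str], None],
) -> bool:
    """Create the tunnel; return True if a leftover one had to be replaced.

    add_tunnel is expected to raise OSError with errno.EEXIST when an
    interface of that name already exists (assumption of this sketch).
    """
    try:
        add_tunnel(name)
        return False
    except OSError as exc:
        if exc.errno != errno.EEXIST:
            raise  # any other failure is not handled here
    # A tunnel with this name survived an earlier run: drop it and retry once.
    del_tunnel(name)
    add_tunnel(name)  # a second failure propagates to the caller
    return True
```

The single retry keeps the failure visible: if the second attempt also fails, the caller gets one exception rather than an endless stream of identical log lines.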

Second, this is not a new problem, which was fixed in openwrt/luci@5d0b720 and openwrt/luci@667f73a way back in July 2018

So, for me the question is: Why aren't these changes in the Hedy-1.0.6 release?

Obviously it was never cherry-picked / backported to the 17.01 branch.

@fhuberts
Contributor

fhuberts commented Sep 24, 2020

I've gone to great lengths in the code to always clean up the tunnels.
IIRC there is an exit handler that cleans up the tunnels on exit and on a crash, and before smartgateway is started a cleanup is performed (though I'm not sure about that last part, it's been a long time).

Only on rare occasions (e.g. very hard crashes in which the exit handler is not called) can this still occur.
I remember distinctly putting a lot of effort into fixing this because it was causing problems for us too at the time.

I strongly suggest you upgrade olsrd to the latest master.

Edit: I also suggest using the sgw script that is provided by olsrd (files/sgw_policy_routing_setup.sh) instead of your own script. That script is an integral part of sgw.

@SvenRoederer
Contributor Author

As the very old olsr nodes that prevented us from using a more recent version (#20) have gone offline, an update can now be rolled out. If the olsr code has also been improved here, the issue might no longer be such a show stopper and resource sink.

Just had a short look into the olsrd-package (https://github.com/openwrt-routing/packages/tree/master/olsrd) and it seems that they also use the original sgw-script.
