
Services not publishing Tunnel Only Nodes - Nightly 1112 (LUA) #324

Closed
k1ky opened this issue Apr 1, 2022 · 19 comments

@k1ky

k1ky commented Apr 1, 2022

Services are not publishing on Tunnel Only connected nodes when they are connected via Wi-Fi as WAN. Once they connect to another node via RF or DtD, the services reappear. Disconnect the "buddy" and the services disappear. This has been happening for several nightlies, ever since the full LUA migration. (Tim has the support data; see the comments below from @aanon4 for more on this problem.)

@ab7pa
Contributor

ab7pa commented Apr 1, 2022

Interesting... I've not seen this. Could you attach a support bundle taken after you see the issue? THX

@aanon4
Contributor

aanon4 commented Apr 1, 2022

Some background on this. This only happens on hardware that has at least two Ethernet devices (eth0 and eth1). When Mesh Wi-Fi is disabled, we create a "dummy" mesh device, called either eth0.3975 or eth1.3975, and assign the mesh IP address to it. We do this for a number of reasons, one of which is to keep OLSRD happy: it needs to see its primary IP address attached to a real network device, and since we've disabled the Mesh Wi-Fi we can't use that.

OLSRD will only publish services associated with IP addresses attached to active network devices. OLSRD considers a device to be active if it is up and has a carrier. For an Ethernet device (and any VLAN associated with it), this means a cable must be plugged in, with the other end connected to a switch or similar. Generally this is fine because, if Wi-Fi is disabled, the node is probably connected via DtD or LAN to something, and the VLAN is created on that physical network device. However, if only the WAN is connected, no carrier will be detected for the VLAN, so OLSRD will not publish any services associated with that IP address.
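To make the rule concrete, here is a small Python model of it. It is not OLSRD's actual code (OLSRD is written in C); the `Interface` record, the `publishable_services` helper, and the `webcam` service are hypothetical, while the addresses are the ones from the support data posted later in this thread.

```python
from dataclasses import dataclass


@dataclass
class Interface:
    name: str
    ip: str
    up: bool       # administratively up (the UP flag)
    carrier: bool  # link detected (i.e. no NO-CARRIER flag)


def is_active(iface: Interface) -> bool:
    """Model of the rule described above: a device must be up AND have carrier."""
    return iface.up and iface.carrier


def publishable_services(services, interfaces):
    """Keep only services whose IP sits on an active interface.

    `services` maps service name -> IP address (a hypothetical shape).
    """
    active_ips = {i.ip for i in interfaces if is_active(i)}
    return {name: ip for name, ip in services.items() if ip in active_ips}


# Wi-Fi-WAN-only tunnel node: the mesh VLAN is up but has no carrier.
interfaces = [
    Interface("eth1.3975", "10.20.81.182", up=True, carrier=False),
    Interface("tun60", "172.31.111.77", up=True, carrier=True),
]
services = {"webcam": "10.20.81.182"}  # hypothetical service on the mesh IP

print(publishable_services(services, interfaces))  # {}
```

Setting `carrier=True` on eth1.3975, which is effectively what plugging a cable into the LAN/DtD port does, makes the service publishable again.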

I think this is what's happening here.

@ab7pa
Contributor

ab7pa commented Apr 2, 2022

That explanation makes sense, Tim (@aanon4). I've just never seen this issue on any of my non-RF tunnel nodes. They all display the services list even though they only have the "dummy" eth0.3975 interface, and I haven't been able to reproduce the issue on any of my Wi-Fi WAN tunnel nodes. I was hoping there'd be a log entry if an error was occurring.

@k1ky
Author

k1ky commented Apr 2, 2022

This occurs on both GLINET AR750 Creta and Mikrotik hAP AC so far. @ab7pa I can post a support file if needed.

@ab7pa
Contributor

ab7pa commented Apr 2, 2022

@k1ky So the issue is only occurring on dual-radio nodes?

@k1ky
Author

k1ky commented Apr 2, 2022

I have not tried it on single-radio nodes, but can check. I'm connecting to the hAP/AR750 via Wi-Fi from my laptop, and the node connects to my home Wi-Fi network using the Wi-Fi WAN. The tunnel is enabled and connected to an offsite station.

@k1ky
Author

k1ky commented Apr 2, 2022

Why doesn't GitHub accept .tgz attachments? Here is the .tgz support file, zipped:
supportdata-K1KY-AR750-750-P1-202204020946.zip

@aanon4
Contributor

aanon4 commented Apr 2, 2022

So the important information is in data.txt; specifically:

eth1.3975 Link encap:Ethernet  HWaddr 94:83:C4:15:51:B6  
          inet addr:10.20.81.182  Bcast:10.255.255.255  Mask:255.0.0.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

and

Table: Interfaces
Name	State	MTU	WLAN	Src-Adress	Mask	Dst-Adress
eth1.3975	DOWN
eth1.2	DOWN
tun60	UP	1422	No	172.31.111.77	255.255.255.252	255.255.255.255

The first is how Linux views the state of the network devices, while the second is how OLSR views them. If we did an 'ip link' dump (which we don't; maybe we should add it?), you'd see that eth1.3975 and eth1.2 both have NO-CARRIER set.
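A sketch of how a diagnostic script might put the two views side by side, assuming Python is available wherever it runs (stock OpenWrt images don't ship it, so treat this as illustrative). `parse_olsr_interfaces` reads the OLSR table quoted above; `read_carrier` reads the kernel's per-device `carrier` file under `/sys/class/net`, which reflects the same state as the NO-CARRIER flag in `ip link` output for a device that is up. Both function names are hypothetical.

```python
import os

# The OLSR "Table: Interfaces" output from the support data above.
OLSR_INTERFACES = """\
Table: Interfaces
Name\tState\tMTU\tWLAN\tSrc-Adress\tMask\tDst-Adress
eth1.3975\tDOWN
eth1.2\tDOWN
tun60\tUP\t1422\tNo\t172.31.111.77\t255.255.255.252\t255.255.255.255
"""


def parse_olsr_interfaces(text):
    """Map interface name -> OLSR state from a 'Table: Interfaces' dump."""
    states = {}
    lines = text.strip().splitlines()
    # Skip the "Table: Interfaces" title and the column-header row.
    for line in lines[2:]:
        fields = line.split("\t")
        if len(fields) >= 2:
            states[fields[0]] = fields[1]
    return states


def read_carrier(dev, sysfs_root="/sys/class/net"):
    """Return True/False for carrier, or None if it can't be read.

    The kernel refuses to report carrier (EINVAL) while a device is
    administratively down, and a missing device has no file at all;
    both cases come back as None.
    """
    path = os.path.join(sysfs_root, dev, "carrier")
    try:
        with open(path) as f:
            return f.read().strip() == "1"
    except OSError:
        return None


print(parse_olsr_interfaces(OLSR_INTERFACES))
# {'eth1.3975': 'DOWN', 'eth1.2': 'DOWN', 'tun60': 'UP'}
```

On the node in question, `read_carrier("eth1.3975")` would be expected to return `False`, matching OLSR's DOWN.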

@aanon4
Contributor

aanon4 commented Apr 2, 2022

As for solutions, unless there's a flag somewhere to tell Linux to fake the carrier on a device (I can't find one), there are two options I can think of. The first is to switch from using the LAN/DtD device for the VLAN to something that is always up. There are probably candidates for this, although the obvious one, the loopback device, won't be accepted by OLSRD. The second is for OLSR to ignore (selectively?) the NO-CARRIER state of a device. It explicitly filters the information it publishes based on the interface state, so we could modify the code to not care under some circumstances.
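The second option could look something like the following Python model: keep the up-and-carrier test, but let a configured set of interfaces (here only the dummy mesh VLAN) count as active whenever they are up. `ignore_carrier` is a hypothetical knob, not an existing OLSRD setting, and this is not necessarily how the eventual fix was implemented.

```python
from dataclasses import dataclass


@dataclass
class Interface:
    name: str
    up: bool
    carrier: bool


def is_active(iface, ignore_carrier=frozenset()):
    """Up-and-carrier test, with an opt-out for named interfaces.

    Interfaces listed in `ignore_carrier` (hypothetical configuration)
    count as active whenever they are up, carrier or not.
    """
    if not iface.up:
        return False
    return iface.carrier or iface.name in ignore_carrier


mesh_vlan = Interface("eth1.3975", up=True, carrier=False)
lan_vlan = Interface("eth1.2", up=True, carrier=False)

print(is_active(mesh_vlan))                                # False: current behavior
print(is_active(mesh_vlan, ignore_carrier={"eth1.3975"}))  # True: override applies
print(is_active(lan_vlan, ignore_carrier={"eth1.3975"}))   # False: not overridden
```

Scoping the override to named interfaces, rather than dropping the carrier check everywhere, keeps the behavior unchanged for real LAN and DtD links, where NO-CARRIER genuinely means nothing is attached.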

@k1ky
Author

k1ky commented Apr 3, 2022

Just for the fun of it, I turned off Wi-Fi on my computer (which was connecting to the mesh node, the AR750) and connected the computer directly via Ethernet to the LAN port on the node. It made no difference: the services still weren't listed. The node was still connected to my home Wi-Fi and tunneled to the outside world.

@aanon4
Contributor

aanon4 commented Apr 3, 2022

Could you attach updated system data with this changed configuration?

@k1ky
Author

k1ky commented Apr 3, 2022

Here ya go. Laptop Wi-Fi off, connected directly to the AR750 via Ethernet.
supportdata-K1KY-AR750-750-P1-202204022221.zip

@aanon4
Contributor

aanon4 commented Apr 3, 2022

Thanks. Could I have you reboot the node in this configuration (so the ethernet is connected during the reboot rather than being plugged in after the fact)?

@k1ky
Author

k1ky commented Apr 3, 2022

Interesting to note that after sitting overnight, the services are being published. Here is a support file taken after a reboot, with the laptop connected to the LAN port via Ethernet and no DtD or mesh RF connections. Services are listed in this configuration. N-1112
supportdata-K1KY-AR750-750-P1-202204020946.zip

@aanon4
Contributor

aanon4 commented Apr 3, 2022

For whatever reason, your node had stopped updating the host and service files (something OLSRD does) for quite a while before you plugged in your Ethernet cable. It's not entirely obvious to me why, as OLSRD was still working fine, so perhaps there were no incoming changes from the network (which seems very unlikely on a mesh network, but maybe more likely if your only connection is a tunnel). I suspect plugging in a cable wasn't sufficient reason for OLSRD to update these files. Perhaps it should be, but at this point we're in corner-cases-of-corner-cases territory.

There are options to fix this a few comments back, so I'll let people provide feedback on those.
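If you want to spot the stale host/service file condition described above, checking file modification times is one option. This Python sketch assumes the OLSR name-service output is written to `/var/run/hosts_olsr` and `/var/run/services_olsr` (verify the paths on your node); `stale_files` is a hypothetical helper and the one-hour threshold is an arbitrary choice.

```python
import os
import time

# Assumption: these are where the OLSR name-service files live on the node.
OLSR_FILES = ("/var/run/hosts_olsr", "/var/run/services_olsr")


def stale_files(paths=OLSR_FILES, max_age_s=3600, now=None):
    """Return (path, age_in_seconds) for files not modified within max_age_s.

    Missing or unreadable files are reported with an age of None.
    """
    now = time.time() if now is None else now
    stale = []
    for path in paths:
        try:
            age = now - os.path.getmtime(path)
        except OSError:
            stale.append((path, None))
            continue
        if age > max_age_s:
            stale.append((path, age))
    return stale


for path, age in stale_files():
    print(f"{path}: " + ("missing" if age is None else f"not updated for {age:.0f}s"))
```

A quiet tunnel-only node may legitimately see few changes, so an old timestamp is a hint to look closer rather than proof of a fault.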

@k1ky
Author

k1ky commented Apr 4, 2022

I'm wondering: what's different from before the LUA migration, such that this function doesn't work under these conditions on these units?

@aanon4
Contributor

aanon4 commented Apr 4, 2022

You can verify whether this is LUA-related by trying this on the current release build (which is non-LUA), but I think you'll see the same thing.

@aanon4
Contributor

aanon4 commented Apr 6, 2022

Proposed fix: #327

@k1ky
Author

k1ky commented Apr 27, 2022

This issue appears to have been fixed in the nightly releases from a week or so ago. It looks good with Nightly 1191 (4/26/22).

k1ky closed this as completed Apr 27, 2022

3 participants