Ubiquiti AC devices dont return any devices during a scan #888

Closed
aanon4 opened this issue Jul 2, 2023 · 25 comments
@aanon4
Contributor

aanon4 commented Jul 2, 2023

If you run a scan on a Ubiquiti AC device, you don't see any devices. In fact, the underlying command for this, "iw wlan0 scan passive", doesn't return any devices except yourself (an active scan does the same). This works fine on non-AC devices and on AC devices from Mikrotik, so something low-level is amiss here.

@Orv

Orv commented Jul 24, 2023

Can we flag this as a bug? That would be helpful when looking at the list of issues.

@aanon4 added the "bug" label Jul 24, 2023
@aanon4
Contributor Author

aanon4 commented Oct 19, 2023

For my notes (and for anyone interested), here's a comparison of the WMI events (basically the wifi chipset and firmware talking to the kernel) for a Mikrotik AC device and a UBNT AC device:

Mikrotik AC:

WMI_UPDATE_STATS_EVENTID
Thu Oct 19 23:40:00 2023 kern.debug kernel: [ 3265.460368] ath10k_ahb a800000.wifi: scan event foreign channel exit type 256 reason 6 freq 5640 req_id 40961 scan_id 40960 vdev_id 0 state running (2)
Thu Oct 19 23:40:00 2023 kern.debug kernel: [ 3265.461357] ath10k_ahb a800000.wifi: chan info err_code 0 freq 5640 cmd_flags 2 noise_floor -105 rx_clear_count -660641922 cycle_count -777560427
Thu Oct 19 23:40:00 2023 kern.debug kernel: [ 3265.461788] ath10k_ahb a800000.wifi: scan event bss channel type 4 reason 6 freq 5640 req_id 40961 scan_id 40960 vdev_id 0 state running (2)
Thu Oct 19 23:40:00 2023 kern.debug kernel: [ 3265.462185] ath10k_ahb a800000.wifi: chan info err_code 0 freq 5640 cmd_flags 1 noise_floor -105 rx_clear_count -660641922 cycle_count -777487445
Thu Oct 19 23:40:00 2023 kern.debug kernel: [ 3265.578259] ath10k_ahb a800000.wifi: scan event foreign channel type 8 reason 6 freq 5655 req_id 40961 scan_id 40960 vdev_id 0 state running (2)
Thu Oct 19 23:40:00 2023 kern.debug kernel: [ 3265.578820] ath10k_ahb a800000.wifi: chan info err_code 0 freq 5655 cmd_flags 0 noise_floor 0 rx_clear_count -658916258 cycle_count -762746815
Thu Oct 19 23:40:00 2023 kern.debug kernel: [ 3265.718138] ath10k_ahb a800000.wifi: scan event foreign channel exit type 256 reason 6 freq 5655 req_id 40961 scan_id 40960 vdev_id 0 state running (2)
Thu Oct 19 23:40:00 2023 kern.debug kernel: [ 3265.718677] ath10k_ahb a800000.wifi: chan info err_code 0 freq 5655 cmd_flags 2 noise_floor -105 rx_clear_count -658852809 cycle_count -745254147
Thu Oct 19 23:40:00 2023 kern.debug kernel: [ 3265.718785] ath10k_ahb a800000.wifi: scan event bss channel type 4 reason 6 freq 5655 req_id 40961 scan_id 40960 vdev_id 0 state running (2)
Thu Oct 19 23:40:00 2023 kern.debug kernel: [ 3265.719244] ath10k_ahb a800000.wifi: chan info err_code 0 freq 5655 cmd_flags 1 noise_floor -105 rx_clear_count -658852809 cycle_count -745181172
Thu Oct 19 23:40:00 2023 kern.debug kernel: [ 3265.836737] ath10k_ahb a800000.wifi: scan event foreign channel type 8 reason 6 freq 5660 req_id 40961 scan_id 40960 vdev_id 0 state running (2)
Thu Oct 19 23:40:00 2023 kern.debug kernel: [ 3265.837589] ath10k_ahb a800000.wifi: chan info err_code 0 freq 5660 cmd_flags 0 noise_floor 0 rx_clear_count -658217886 cycle_count -730451421

UBNT AC:

Thu Oct 19 22:29:11 2023 kern.debug kernel: [ 1852.947426] ath10k_pci 0000:00:00.0: scan event bss channel type 4 reason 3 freq 5745 req_id 40961 scan_id 40960 vdev_id 0 state running (2)
Thu Oct 19 22:29:11 2023 kern.debug kernel: [ 1852.947462] ath10k_pci 0000:00:00.0: chan info err_code 0 freq 5745 cmd_flags 1 noise_floor -108 rx_clear_count 169826664 cycle_count -189096302
Thu Oct 19 22:29:11 2023 kern.debug kernel: [ 1853.052603] ath10k_pci 0000:00:00.0: scan event foreign channel type 8 reason 3 freq 5875 req_id 40961 scan_id 40960 vdev_id 0 state running (2)
Thu Oct 19 22:29:11 2023 kern.debug kernel: [ 1853.052678] ath10k_pci 0000:00:00.0: chan info err_code 0 freq 5875 cmd_flags 0 noise_floor 0 rx_clear_count 170286491 cycle_count -179724281
Thu Oct 19 22:29:11 2023 kern.debug kernel: [ 1853.052745] ath10k_pci 0000:00:00.0: wmi event debug mesg len 1476
Thu Oct 19 22:29:12 2023 kern.debug kernel: [ 1853.975500] ath10k_pci 0000:00:00.0: scan event foreign channel type 8 reason 3 freq 5905 req_id 40961 scan_id 40960 vdev_id 0 state running (2)
Thu Oct 19 22:29:12 2023 kern.debug kernel: [ 1853.975547] ath10k_pci 0000:00:00.0: chan info err_code 0 freq 5905 cmd_flags 0 noise_floor 0 rx_clear_count 174864117 cycle_count -98622809
Thu Oct 19 22:29:12 2023 kern.debug kernel: [ 1853.997974] ath10k_pci 0000:00:00.0: scan event bss channel type 4 reason 3 freq 5745 req_id 40961 scan_id 40960 vdev_id 0 state running (2)
Thu Oct 19 22:29:12 2023 kern.debug kernel: [ 1853.998037] ath10k_pci 0000:00:00.0: chan info err_code 0 freq 5745 cmd_flags 1 noise_floor -108 rx_clear_count 175098680 cycle_count -96532478
Thu Oct 19 22:29:12 2023 kern.debug kernel: [ 1854.120964] ath10k_pci 0000:00:00.0: WMI_UPDATE_STATS_EVENTID
Thu Oct 19 22:29:12 2023 kern.debug kernel: [ 1854.124816] ath10k_pci 0000:00:00.0: wmi event debug mesg len 1476

The distinction in the scan data is that while both show an ever-changing frequency for the "foreign" events, the UBNT always shows the same frequency for the "bss" events, whereas the Mikrotik shows the same frequency as the previous foreign event. That is why we get sensible scans from the Mikrotik devices but not the UBNT ones.
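
The pattern described above can be checked mechanically against a saved log. This is only a minimal sketch, assuming the lines keep the "scan event ... freq N" layout seen in the two excerpts; `bss_vs_foreign` is a hypothetical helper name, not part of ath10k or AREDN. It pairs each "bss channel" event with the most recent "foreign channel" event and reports whether the two frequencies agree.

```python
import re

# Matches the ath10k "scan event" debug lines shown in the excerpts above.
SCAN_EVENT = re.compile(
    r"scan event (?P<kind>foreign channel exit|foreign channel|bss channel) "
    r"type \d+ reason \d+ freq (?P<freq>\d+)"
)

def bss_vs_foreign(lines):
    """Return (bss_freq, last_foreign_freq, same) for every bss event
    that was preceded by at least one foreign event."""
    results = []
    last_foreign = None
    for line in lines:
        m = SCAN_EVENT.search(line)
        if not m:
            continue  # "chan info", "wmi event debug" and other lines are skipped
        freq = int(m.group("freq"))
        if m.group("kind").startswith("foreign"):
            last_foreign = freq
        elif last_foreign is not None:
            results.append((freq, last_foreign, freq == last_foreign))
    return results

# Two lines from each excerpt above (timestamps trimmed).
mikrotik_log = [
    "ath10k_ahb a800000.wifi: scan event foreign channel exit type 256 reason 6 freq 5640 req_id 40961",
    "ath10k_ahb a800000.wifi: scan event bss channel type 4 reason 6 freq 5640 req_id 40961",
]
ubnt_log = [
    "ath10k_pci 0000:00:00.0: scan event foreign channel type 8 reason 3 freq 5905 req_id 40961",
    "ath10k_pci 0000:00:00.0: scan event bss channel type 4 reason 3 freq 5745 req_id 40961",
]

print(bss_vs_foreign(mikrotik_log))  # [(5640, 5640, True)]
print(bss_vs_foreign(ubnt_log))      # [(5745, 5905, False)]
```

Feeding it the full dmesg output from a scan should show whether a given firmware behaves like the Mikrotik or the UBNT case.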

@slightlyunconventional

I have a LiteBeam 5AC running nightly build 20231212-ee0dd54 which seems to be showing the same thing. The symptom is that the web UI's "WiFi Scan" almost never lists anything other than nodes on the channel I've already set.

Since scan doesn't work at all, and you need a working Internet/meshmap to compensate, that is a significant limit on usefulness in an emergency situation...

My node is currently using channel 160, frequency 5800. It's pointed at a nest of other AREDN nodes, on channel/freqs 137/5685, 142/5710, 149/5745, 152/5760, 160/5800, 162/5810, 182/5910, all with 10 MHz bandwidth. I know I can connect to the 149 and 160 nodes, but the "foreign" one doesn't appear in the scans.
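
The channel/frequency pairs listed above are all consistent with the usual 5 GHz numbering rule, where the center frequency is 5000 MHz plus 5 MHz per channel number. A small sketch of that arithmetic, with `channel_to_mhz` as a hypothetical helper name:

```python
def channel_to_mhz(channel: int) -> int:
    """Center frequency in MHz for a 5 GHz-band channel number."""
    return 5000 + 5 * channel

# Channel/frequency pairs quoted in the comment above.
quoted = {137: 5685, 142: 5710, 149: 5745, 152: 5760,
          160: 5800, 162: 5810, 182: 5910}

for channel, mhz in quoted.items():
    status = "ok" if channel_to_mhz(channel) == mhz else "mismatch"
    print(channel, channel_to_mhz(channel), status)
```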

I got a list of WMI events by logging in as root with ssh (port 2222), running echo 0x10002002 > /sys/module/ath10k_core/parameters/debug_mask followed by iw wlan0 scan passive, and then dumping them with dmesg. I'm attaching two logs here: one when I was configured to use channel 160, and one when I was using channel 149.

litebeam5ac-149.txt
litebeam5ac-160.txt

Since WMI is a firmware/driver interface, does that mean that something is wrong with the firmware (supplied by Ubiquiti)? Or could there be a driver problem, in which case the Linux ath10k developers should be made aware of it?

@Orv

Orv commented Feb 14, 2024 via email

@slightlyunconventional

slightlyunconventional commented Mar 18, 2024

What is the next step with this?

  • Try an alternate firmware?
  • Talk to someone at Ubiquiti?
  • Consult with other users of the Ubiquiti firmware (e.g. Linux kernel developers)?
  • Develop a scan workaround for the old firmware?
  • Wait for Ubiquiti to release another firmware version and try that?

@ae6xe
Contributor

ae6xe commented Mar 19, 2024

Try testing with the current openwrt image. You may also need to configure an ad-hoc connection in openwrt. Does an iw passive scan also return nothing there? If so, this could be submitted as a defect back to openwrt or to the package owner of iw. (Maybe there is already a defect upstream?)

@slightlyunconventional

Yes, the problem occurs with passive scans (I included logs above).

The latest OpenWRT build for LiteBeam AC Gen2 is 23.05.02, using ath10k-firmware-qca988x-ct. How can I figure out which firmware vendor and version the OpenWRT and AREDN builds are using, in order to compare them?

@Orv is #1055 the "bad performance" issue that forced the switch away from firmware with working scan? I don't see mention of switching firmware there (from what to what?) so I'm not sure...

@ae6xe
Contributor

ae6xe commented Mar 19, 2024

@slightlyunconventional I see logs above from AREDN images on the devices. After installing the openwrt image on the same device(s), is the problem reproducible? If the issue occurs in openwrt, then it is a problem upstream and a defect can be submitted. If the issue is not reproducible in openwrt, then the issue is introduced locally by AREDN -- some incompatibility introduced in using 3 different 'things': a) chip firmware (loaded into the chipset); b) ath10k-firmware* (linux driver); c) iw (user app). These 3 things are different in AREDN than in openwrt -- we've modified them to extend channels, and we are reusing DD-WRT firmware loaded into the chip (the source code is proprietary and we don't have access) -- it is a black box.

@slightlyunconventional

I flashed the OpenWRT image using the web interface without any trouble. A passive scan with the 802.11AC interface lists 41 networks, all in the 5GHz range, most with channel width 1 (80 MHz), some with width 0 ("20 or 40 MHz"). (The AREDN SSID isn't listed, but the SSID is blank for about half the networks. AREDN was using 10MHz channels...)

Running echo 0x10002002 > /sys/module/ath10k_core/parameters/debug_mask didn't produce a WMI log; dmesg just says:

[  643.123269] ath10k_pci 0000:00:00.0: 10.1 wmi init: vdevs: 16  peers: 127  tid: 256
[  643.141183] ath10k_pci 0000:00:00.0: wmi print 'P 128 V 8 T 410'
[  643.147447] ath10k_pci 0000:00:00.0: wmi print 'msdu-desc: 1424  sw-crypt: 0 ct-sta: 0'
[  643.155638] ath10k_pci 0000:00:00.0: wmi print 'alloc rem: 24984 iram: 38672'
[  643.213031] ath10k_pci 0000:00:00.0: pdev param 0 not supported by firmware
[  643.236481] ath10k_pci 0000:00:00.0: rts threshold -1

There's some stuff in debugfs, but I don't know what would help:

root@OpenWrt:~# ls /sys/kernel/debug/ieee80211/phy0/ath10k/
ani_enable              fw_crash_dump           peer_stats              rx_reorder_stats
cal_data                fw_dbglog               peers                   set_rate_override
chip_id                 fw_regs                 pktlog_filter           set_rates
ct_special              fw_reset_stats          powerctl_table          simulate_fw_crash
debug_level             fw_stats                ps_state_enable         sta_tid_stats_mask
dfs_block_radar_events  htt_max_amsdu_ampdu     quiet_period            thresh62_ext
dfs_simulate_radar      htt_stats_mask          ratepwr_table           tpc_stats
dfs_stats               mem_value               reg_addr                wmi_services
enable_extd_tx_stats    misc                    reg_value
firmware_info           nf_cal_period           reset_htt_stats
fw_checksums            pdev_ext_stats          restart_failed

I'm pretty familiar with Linux (kernel and userspace), but not familiar with these devices or the wireless network stack. @ae6xe can you suggest ways to get more debug data that would be helpful? Thanks!

@ae6xe
Contributor

ae6xe commented Mar 22, 2024

See this ref to debug ath10k: https://wireless.wiki.kernel.org/en/users/drivers/ath10k/debug

[update]
it would also be beneficial to soak in the 802.11ac-2013 IEEE specification, but one needs to purchase or have an IEEE subscription to appropriately obtain.

@slightlyunconventional

Turns out OpenWRT has CONFIG_PACKAGE_ATH_DEBUG disabled, which explains why the command I showed you from that page didn't produce anything.

I am now building OpenWRT for LiteBeam 5AC Gen2, with CONFIG_PACKAGE_ATH_DEBUG enabled, and am hoping that will boot and give us what we need.

If we see the same broken WMI log, that suggests OpenWRT has a workaround higher up the stack that AREDN should grab. If the WMI log looks better, that suggests the firmware OpenWRT is using is better, and AREDN should adopt that. (I think that already matches @Orv 's comment though: maybe it's better at scanning, but worse in performance.)

@ae6xe
Contributor

ae6xe commented Mar 26, 2024

openwrt's implementation doesn't have the code/changes for extended channel support and 5/10 MHz channels. All 3 levels (chip firmware, linux driver, user commands) have changes to support these add-on features. A test on openwrt will confirm which side of the fence the issue is on -- a pre-existing openwrt issue, or an issue introduced by AREDN-specific features. While I'd speculate it is the AREDN features introducing this issue, it is better to know than to speculate. We are adding features with a blindfold on, given we have no visibility into the proprietary source code of the Qualcomm chip firmware.

@slightlyunconventional

I think I just did that test, right? On AREDN, iw passive scan returns 1 network (the one I'm already connected to). On OpenWRT, it returns 41 devices (neighborhood home Wi-Fi networks).

It sounded like you wanted more detailed information from the WMI log from OpenWRT's firmware, so that's what I'm trying to obtain now. If you don't need that, I'll stop.

@aanon4
Contributor Author

aanon4 commented Mar 26, 2024

I don't know if you've tried this, but have you run the AREDN scanning code while the device is set to 20 MHz bandwidth?

@ae6xe
Contributor

ae6xe commented Mar 27, 2024

Got it. In the openwrt test, was the device configured for ad-hoc mode rather than AP or station/client mode? Ad-hoc mode has been known to be lower quality, given it is not widely used. That said, I don't know for certain whether this scan failure depends on the 802.11 mode in use; maybe it doesn't.

Similar to @aanon4's comment, we should compare apples to apples in openwrt versus AREDN (20 MHz channel, ad-hoc mode, stations connected/non-connected, same part 15 channel). A failure in AREDN and a success in openwrt would help rule out places to look for the root cause.

@slightlyunconventional

  1. AREDN channel 149 bandwidth 20MHz: 15-65 home WiFi SSIDs
  2. AREDN channel 149 bandwidth 10MHz: 1 SSID (AREDN-10-v3, one mesh peer using 149)
  3. AREDN channel 160 bandwidth 10MHz: 1 SSID (AREDN-10-v3, one mesh peer using 160)
  4. AREDN channel 160 bandwidth 20MHz: 1 SSID (AREDN-20-v3, no mesh peers)

Each time I changed the two settings in the web UI, rebooted, and ran this command: iw wlan0 scan passive | grep SSID | wc -l. I did not change the physical position of the node.
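
A per-frequency breakdown can say more than a bare SSID count when comparing the four cases above. The following is a rough sketch rather than a definitive parser: it assumes scan output where each entry has an indented "freq:" line and an indented "SSID:" line, and iw itself warns that its output format is not stable. `summarize_scan` and the sample text are made up for illustration. The count it returns is slightly stricter than `grep SSID | wc -l`, since it only matches lines that begin with "SSID:".

```python
import re
from collections import Counter

FREQ_LINE = re.compile(r"^\s+freq: (\d+)")
SSID_LINE = re.compile(r"^\s+SSID:")

def summarize_scan(text):
    """Return (number of SSID lines, Counter of entries per frequency in MHz)."""
    ssid_lines = 0
    per_freq = Counter()
    for line in text.splitlines():
        if SSID_LINE.match(line):
            ssid_lines += 1
            continue
        m = FREQ_LINE.match(line)
        if m:
            per_freq[int(m.group(1))] += 1
    return ssid_lines, per_freq

# Illustrative sample only; not captured from a real device.
sample = "\n".join([
    "BSS 02:00:00:00:00:01(on wlan0)",
    "\tfreq: 5745",
    "\tsignal: -70.00 dBm",
    "\tSSID: AREDN-10-v3",
    "BSS 02:00:00:00:00:02(on wlan0)",
    "\tfreq: 5800",
    "\tSSID: ",
])

count, per_freq = summarize_scan(sample)
print(count, dict(per_freq))  # 2 {5745: 1, 5800: 1}
```

The text to summarize would be whatever `iw wlan0 scan passive` prints on the node, saved to a file and read back in.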

(The WMI logs I attached earlier were from the 10MHz cases, #2 and #3.)

Please let me know if an OpenWRT comparison is still useful. If I can do it from LuCI that's fine, but if from the cmdline I'll need further assistance in getting the right set of iw commands. (in AREDN I was getting "Resource busy" errors on iw wlan0 set channel 149 HT20, maybe because there were AREDN userspace processes actually using the interface...?)

@slightlyunconventional

Attaching WMI log for channel 149 bandwidth 20MHz: channel149-width20.log

@aanon4
Contributor Author

aanon4 commented Mar 31, 2024

FYI. If I use the DD-WRT firmware blob for the Ubiquiti AC devices (we use the CT firmware) then you do get a full scan @ 10 MHz but at the cost of approximately half the bandwidth during normal operation ... so not an option.

@aanon4
Contributor Author

aanon4 commented Apr 1, 2024

Here's an interesting thing to try. Put the AREDN device in 20 MHz mode. Then switch it back to 10 MHz mode. Do a scan. The first fails. Do another scan .. and you get back a list of devices.

Well .. I tried this a couple of times and it worked both times. Not quite sure what this is telling me, but something ...

@aanon4
Contributor Author

aanon4 commented Apr 1, 2024

Okay ... the trick appears to be that the Ubiquiti driver cannot scan when it has an active IBSS. If you disassociate the wifi using "iw wlan0 ibss leave" and then scan you get results. There may be a moment during re-association when scanning also works, but it's brief (and may not be really there .. difficult to tell in my testing).
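
The leave-then-scan sequence described above could be scripted roughly as follows. This is a sketch under stated assumptions, not the change that went into AREDN: `scan_without_ibss` is a hypothetical helper, the rejoin step uses the standard `iw dev <iface> ibss join` command with arguments the caller must supply (SSID, frequency, and any width options), and the `run` parameter exists only so the sequence can be exercised without a radio.

```python
import subprocess

def scan_without_ibss(iface, rejoin_args, run=subprocess.run):
    """Leave the IBSS, run a passive scan, then rejoin.

    rejoin_args: arguments for `iw dev <iface> ibss join`, for example
    [ssid, freq_mhz]; these are caller-supplied assumptions.
    Returns the raw scan output as text.
    """
    run(["iw", "dev", iface, "ibss", "leave"], check=True)
    try:
        result = run(["iw", "dev", iface, "scan", "passive"],
                     check=True, capture_output=True, text=True)
        return result.stdout
    finally:
        # Always attempt to rejoin, even if the scan itself failed.
        run(["iw", "dev", iface, "ibss", "join", *rejoin_args], check=True)
```

Leaving the IBSS drops the mesh link for the duration of the scan, so anything driving this should be reachable some other way, such as over the LAN port.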

@ae6xe
Contributor

ae6xe commented Apr 1, 2024 via email

@aanon4
Contributor Author

aanon4 commented Apr 1, 2024

There are various workarounds in the AREDN code to handle firmware blobs which don't behave as one might like ... but not much we can really do about it.

@slightlyunconventional

This is what I see with the 20240403 daily build:
[Screenshot: AREDN LiteBeam AC Gen2 scan]

In addition to the node I'm connected to, there are two other nearby nodes on two other channels; they are combined on the last line. It wasn't quite what I was expecting, but looks fixed! Thanks!

@aanon4 closed this as completed Apr 9, 2024
@slightlyunconventional

Comment from another user:

I discovered however that the node needs an Ethernet LAN cable connection to view the results. Apparently the scan disconnects the RF and WiFi radio while scanning. So make sure to connect your PC to the LAN port or switch when scanning.

The Gen 2 models have a WiFi access point which is very useful because it can be mounted and powered through its PoE port, but nearby users can access it via Part 15 2.4G WiFi. The WiFi scan disconnects both the mesh RF and WiFi, so the web page will not load.
It does seem the scan "times out", and you can get back to the homepage eventually.

@aanon4
Contributor Author

aanon4 commented Apr 10, 2024

Back in the day, most wifi devices would disconnect while they did the scan. These days most chipsets don't do that in a visible way, although it still happens down in the low-level firmware because there is still only one radio. Unfortunately, with the Ubiquiti AC firmware we're back to the old way of doing things, so there is a momentary disconnect which must happen before the wifi scan can be started.
