Runtime Operation
How to bring the NSS data plane up, run ECM and SQM on top of it, and — at least as important — how to not lock yourself out of a remote device. The rules here exist because each was learned the hard way.
Loading qca-nss-drv boots the NSS firmware. The firmware takes
over all wired RX. Any port not armed in the glue beforehand is
RX-dead until reboot.
Therefore, in this stack:
- No NSS kernel module is autoloaded; AUTOLOAD is stripped from every package.
- No init script starts the stack at boot. (qca-nss-drv's init script only sets IRQ affinity/RPS when invoked; it does not load the module.)
- The ECM init script's start action and the SQM scripts refuse to load their modules unless qca_nss_drv is already loaded.
- With ath11k NSS offload built in, qca-nss-drv IS loaded at every boot (ath11k.ko and mac80211.ko carry hard symbol references to it), but its platform probe returns -EPROBE_DEFER until the glue is armed (nss_dp_probe_gate()), so the firmware does not boot. Loading the module stack is inert; arming is the trigger. The glue records the deferred NSS core devices and re-attaches them when fw_mask first goes non-zero, so the firmware boots synchronously inside the arming write.
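The "refuse unless qca_nss_drv is already loaded" rule can be sketched as a small shell guard. This is an illustration, not the shipped init script: it assumes a loaded module shows up as a directory under /sys/module, and the names `guarded_modprobe`, `SYSFS_ROOT` and `MODPROBE` are hypothetical (the last two exist only so the check can be exercised off-device).

```shell
#!/bin/sh
# Sketch of a load guard; not the actual ECM/SQM script code.
# SYSFS_ROOT and MODPROBE are hypothetical overrides for dry runs.

nss_drv_loaded() {
    # A loaded kernel module appears as a directory under /sys/module.
    [ -d "${SYSFS_ROOT:-/sys}/module/qca_nss_drv" ]
}

guarded_modprobe() {
    # $1 = module to load (e.g. ecm); further args are module parameters.
    if ! nss_drv_loaded; then
        echo "refusing to load $1: qca_nss_drv is not loaded" >&2
        return 1
    fi
    ${MODPROBE:-modprobe} "$@"
}
```

Called as `guarded_modprobe ecm front_end_selection=1`, it would load ECM only on a system where the driver is already present, so ECM's module dependencies can never be the thing that pulls qca-nss-drv in.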
The guard exists because of a real incident: an image shipped with
the ECM init script enabled at boot; ECM's module dependencies pulled
in qca-nss-drv, the firmware booted unarmed, and all wired RX was
dead from boot. (Recovery was over Wi-Fi, which the firmware does
not touch.) Modern OpenWrt enables init scripts by their START
line, including qosmio's historical "extra space in the shebang"
trick — do not rely on that trick; the ECM init here has no boot
start at all.
A plain reboot always returns the device to the stock host-only stack. That is the universal recovery path.
Order matters; deviations cost RX.
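One input to the sequence that follows is the port mask written in the arming step. Since it is described as a bitmask of PPE port indexes, it can be computed rather than typed by hand. A minimal sketch, assuming bit n corresponds to port index n (consistent with ports 2..5 giving 0x3c); the helper name `ports_to_mask` is hypothetical.

```shell
#!/bin/sh
# ports_to_mask PORT...  prints the bitmask with one bit set per port index.
# Note: POSIX sh has no local variables, so mask and port are globals.
ports_to_mask() {
    mask=0
    for port in "$@"; do
        mask=$(( mask | (1 << port) ))
    done
    printf '0x%x\n' "$mask"
}
```

For the typical 4-port board, `ports_to_mask 2 3 4 5` prints `0x3c`, which is the value written to fw_mask below.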
# 1. Glue first (it is also a dependency of qca-nss-drv, so it may
# already be loaded — but it must be ARMED before drv loads).
modprobe qca-ppe-nss
# 2. Arm the physical ports (bitmask of PPE port indexes; on a
# typical 4-port IPQ807x board ports 2..5 -> 0x3c).
echo 0x3c > /sys/kernel/debug/qca-ppe-nss/fw_mask
# 3. Boot the firmware. Cores print their version; armed ports are
# attached during the driver's one-shot registration.
# (With Wi-Fi offload images, drv is already loaded and
# probe-deferred; the arming write in step 2 boots the firmware
# by itself and this modprobe is a no-op.)
modprobe qca-nss-drv
# 4. Optional: ECM connection offload.
sysctl -w net.netfilter.nf_conntrack_events=1
modprobe ecm front_end_selection=1
echo 1 > /sys/kernel/debug/ecm/front_end_ipv4/accel_delay_pkts
echo 1 > /sys/kernel/debug/ecm/front_end_ipv6/accel_delay_pkts
# 5. Optional: PPPoE offload manager. It only catches sessions
# created AFTER it loads — bounce the WAN afterwards.
modprobe qca-nss-pppoe
ifup wan
# 6. Optional: flush stale state so new flows take the fast path.
echo 1 > /sys/kernel/debug/ecm/state/defunct_all
echo f > /proc/net/nf_conntrack # flush conntrack
# 7. Optional: SQM (see the SQM page). The sqm-scripts hotplug will
# have skipped its interface while NSS was down; restart it.
/etc/init.d/sqm restart

Images built with CONFIG_ATH11K_NSS_SUPPORT boot with host-mode
Wi-Fi: ath11k autoloads with frame_mode=2 only and
nss_offload=0. Moving the radios onto the NSS data path happens at
runtime, after the firmware is up (ath11k's NSS setup hard-fails if
the firmware is not booted), by flipping the parameter and re-probing
the radio:
# after the firmware is booted and ports are attached (steps 1-3):
wifi down # avoid a pending-ack WARN at unbind
echo 1 > /sys/module/ath11k/parameters/nss_offload
echo c000000.wifi > /sys/bus/platform/drivers/ath11k/unbind
echo c000000.wifi > /sys/bus/platform/drivers/ath11k/bind
/etc/init.d/qca-nss-pbuf start   # n2h pool tuning + `wifi up`

The rebind cycles the WCSS remoteproc (verified clean); phy indexes
change but OpenWrt wireless config matches radios by DT path, so the
APs come back unattended. qca-nss-pbuf (shipped with kmod-ath11k)
applies the per-memory-profile n2h buffer pool sysctls — on a 512 MB
board expect a one-time ~35 MB pool growth after Wi-Fi offload comes
up; memory is flat afterwards.
If the radios fail to register with offload enabled, set
nss_offload=0 and rebind again — Wi-Fi returns in host mode (this
fallback keeps Wi-Fi as the recovery/escape path at all times).
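The offload attempt and its host-mode fallback can be combined into one helper. The sketch below reuses the sysfs paths and the c000000.wifi device name from the commands above; everything else is an assumption: radios are treated as "registered" when /sys/class/ieee80211 contains at least one phy entry after a fixed wait, and `SYSFS_ROOT` and `SETTLE_SECS` are hypothetical knobs for dry runs. It leaves `wifi down` before and `/etc/init.d/qca-nss-pbuf start` after to the caller.

```shell
#!/bin/sh
# Sketch: try NSS offload, fall back to host mode if no radio registers.
WIFI_DEV="c000000.wifi"

rebind_radio() {
    # $1 = nss_offload value (1 = NSS data path, 0 = host mode)
    sys="${SYSFS_ROOT:-/sys}"
    echo "$1" > "$sys/module/ath11k/parameters/nss_offload"
    # Assumption: unbind may fail if a failed probe left the device unbound.
    echo "$WIFI_DEV" > "$sys/bus/platform/drivers/ath11k/unbind" 2>/dev/null || true
    echo "$WIFI_DEV" > "$sys/bus/platform/drivers/ath11k/bind"
}

radios_registered() {
    # Assumption: a fixed settle time is enough for the phys to appear.
    sleep "${SETTLE_SECS:-10}"
    ls "${SYSFS_ROOT:-/sys}/class/ieee80211"/phy* >/dev/null 2>&1
}

offload_or_fallback() {
    rebind_radio 1
    if radios_registered; then
        echo offload
        return 0
    fi
    rebind_radio 0
    echo host
}
```

The function prints which mode it ended in, so a wrapper script can log the outcome before bringing Wi-Fi back up.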
Health checks: /sys/kernel/debug/qca-nss-drv/stats/wifili
(tx_sent_count, rx_deliverd climb with Wi-Fi traffic);
dmesg | grep "nss init soc" shows the wifili interface number.
Health checks:
- /sys/kernel/debug/qca-ppe-nss/status: per-port attach state and counters. tx_busy and rx_unexpected should stay 0.
- grep n2h_rx /sys/kernel/debug/qca-nss-drv/stats/n2h (or the nss-stats helper): N2H RX counters climb when the firmware path carries traffic.
- dmesg shows the firmware version on boot (e.g. NSS.FW.12.5-210-HK.R).
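"Counters climb" can be checked without knowing the exact layout of the stats file by comparing two snapshots of the matching lines. A minimal sketch; the helper name `counters_climbing` is hypothetical, and it only detects that the matching lines changed, not that every value increased.

```shell
#!/bin/sh
# counters_climbing FILE PATTERN [SECONDS]
# Returns 0 if the lines matching PATTERN differ between two snapshots,
# 1 if they are identical, 2 if nothing in FILE matches PATTERN.
counters_climbing() {
    before=$(grep "$2" "$1") || return 2
    sleep "${3:-5}"
    after=$(grep "$2" "$1") || return 2
    [ "$before" != "$after" ]
}
```

With traffic flowing over a wired port, `counters_climbing /sys/kernel/debug/qca-nss-drv/stats/n2h n2h_rx` should succeed when the firmware path is carrying it.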
- front_end_selection=1 selects the NSS front end explicitly.
- accel_delay_pkts=1 accelerates flows after the first packet; the default waits longer and skews short-flow benchmarks.
- ECM acceleration is visible in /sys/kernel/debug/ecm/.../connection counts and, decisively, in CPU load: an accelerated bulk flow leaves the CPU ~99 % idle.
- Stopping ECM (ecm_state stop / rmmod) cleanly returns flows to the software path; this is repeatable at runtime.
- PPPoE sessions established before qca-nss-pppoe loaded are not managed; always bounce the WAN after loading the manager.
- PPPoE-over-VLAN offloads without any vlan manager (the VLAN tag is embedded in ECM's unicast rule; verified with a tagged PPPoE WAN at line rate).
- rmmod qca-nss-drv with attached ports works (a stock-driver double-unregister panic is fixed by feed patch 0101), and the glue restores host-side state, but wired RX does not come back: the firmware's QID2RID queue takeover persists. Reboot to restore. (A cosmetic stock-driver regulator_put refcount WARN appears in devres teardown at rmmod; harmless, not ours.)
- There is deliberately no "stop the firmware" path; the supported way down is a reboot.
Distilled from operating a device with no serial console:
- Keep an out-of-band path that the NSS stack cannot touch — Wi-Fi on the host stack is ideal. Test it before experimenting.
- Never enable any NSS-related init at boot. A reboot must always produce a clean host-only system.
- For risky experiments, run them detached with a dead-man's switch: a script that reboots the device unless a "keep" file appears within N minutes (sleep 1500; [ -f /tmp/keep ] || reboot -f). reboot -f works even when networking is gone.
- Remember ssh liveness is not a hang detector here: a firmware boot with unarmed ports kills networking while the SoC is perfectly healthy. Synced logs to flash + pstore are the truth (see Development and Testing).
- sysupgrade -n regenerates host keys; expect the ssh fingerprint to change.
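The dead-man's switch from the list above can be wrapped in a parameterised function. This is a sketch built on the same one-liner; the function name `deadman` and the `REBOOT_CMD` override are hypothetical, the latter added only so the logic can be dry-run without rebooting anything.

```shell
#!/bin/sh
# deadman MINUTES [KEEP_FILE]
# Sleeps for MINUTES, then reboots unless KEEP_FILE exists.
# REBOOT_CMD is a hypothetical override; the default is the forced
# reboot from the text, which works even when networking is gone.
deadman() {
    sleep $(( $1 * 60 ))
    [ -f "${2:-/tmp/keep}" ] || ${REBOOT_CMD:-reboot -f}
}
```

Start it in the background before the experiment, for example `deadman 25 </dev/null >/dev/null 2>&1 &` (25 minutes matches sleep 1500), and `touch /tmp/keep` once the device is confirmed reachable. How best to detach it from the ssh session depends on which tools the image ships, so test the timer once with a short timeout and a harmless `REBOOT_CMD` first.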