
Runtime Operation

Julius Bairaktaris edited this page Jun 18, 2026 · 3 revisions


How to bring the NSS data plane up, run ECM and SQM on top of it, and, just as important, how to avoid locking yourself out of a remote device. Every rule here was learned the hard way.

The cardinal rule: nothing autoloads

Loading qca-nss-drv boots the NSS firmware. The firmware takes over all wired RX. Any port not armed in the glue beforehand is RX-dead until reboot.

Therefore, in this stack:

  • No NSS kernel module is autoloaded; AUTOLOAD is stripped from every package.
  • No init script starts the stack at boot. (qca-nss-drv's init script only sets IRQ affinity/RPS when invoked — it does not load the module.)
  • The ECM init script's start action and the SQM scripts refuse to load their modules unless qca_nss_drv is already loaded.
  • With ath11k NSS offload built in, qca-nss-drv IS loaded at every boot (ath11k.ko and mac80211.ko carry hard symbol references to it) — but its platform probe returns -EPROBE_DEFER until the glue is armed (nss_dp_probe_gate()), so the firmware does not boot. Loading the module stack is inert; arming is the trigger. The glue records the deferred NSS core devices and re-attaches them when fw_mask first goes non-zero, so the firmware boots synchronously inside the arming write.
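A minimal sketch of the guard described in the third bullet. It assumes only that a loaded module appears under /sys/module, which is standard kernel behaviour. It is not the actual ECM or SQM script code. nss_drv_loaded, guarded_modprobe and the SYS_MODULE override are hypothetical names used for illustration.

```shell
#!/bin/sh
# Hypothetical guard in the spirit of the ECM/SQM checks (not their code).
# SYS_MODULE defaults to /sys/module and is overridable only for testing.

# Succeeds only if the qca_nss_drv module is present in the running kernel.
nss_drv_loaded() {
	[ -d "${SYS_MODULE:-/sys/module}/qca_nss_drv" ]
}

# Load a module only when qca-nss-drv is already loaded, so a dependency
# chain can never be the thing that pulls the driver in.
guarded_modprobe() {
	if ! nss_drv_loaded; then
		echo "refusing to load $1: qca_nss_drv is not loaded" >&2
		return 1
	fi
	modprobe "$@"
}
```

For example, guarded_modprobe ecm front_end_selection=1 refuses to run until qca_nss_drv is present. On Wi-Fi offload images qca_nss_drv is loaded at every boot. On those images, passing this check does not by itself mean the firmware has booted.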

The guard exists because of a real incident. An image shipped with the ECM init script enabled at boot. ECM's module dependencies pulled in qca-nss-drv, the firmware booted with no ports armed, and all wired RX was dead from boot. (Recovery was over Wi-Fi, which the firmware does not touch.) Modern OpenWrt enables init scripts based on their START line. That now includes scripts using qosmio's historical "extra space in the shebang" trick, so do not rely on that trick. The ECM init script here has no boot start at all.

A plain reboot always returns the device to the stock host-only stack. That is the universal recovery path.
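A reboot is only a clean recovery path if nothing NSS-related starts at boot, so it is worth auditing that before you rely on it. OpenWrt marks boot-enabled init scripts as S<priority><name> links in /etc/rc.d. The sketch below lists the links whose names match a pattern. The function names and the default ecm|nss pattern are assumptions. The pattern is deliberately broad, so review each hit. Per the notes above, qca-nss-drv's own init script only sets IRQ affinity/RPS.

```shell
#!/bin/sh
# Hypothetical audit: list boot-enabled init scripts (S<NN><name> links in
# /etc/rc.d) whose names match PATTERN. RC_DIR and PATTERN are overridable;
# the default pattern is an assumption, adjust it to your image.
nss_boot_links() {
	for link in "${RC_DIR:-/etc/rc.d}"/S*; do
		# Skip the literal glob when nothing matches; keep dangling links.
		[ -e "$link" ] || [ -L "$link" ] || continue
		name=$(basename "$link")
		script=${name#S??}          # drop "S" and the two-digit priority
		if echo "$script" | grep -Eq "${PATTERN:-ecm|nss}"; then
			echo "$script"
		fi
	done
}

# Returns non-zero (and prints the offenders) if anything matching would
# start at boot.
audit_boot() {
	found=$(nss_boot_links)
	if [ -n "$found" ]; then
		printf 'NSS-related init enabled at boot:\n%s\n' "$found" >&2
		return 1
	fi
}
```

Run audit_boot after every sysupgrade or package change. If it returns 0, a reboot should land on the host-only stack.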

Bring-up sequence

Order matters; deviations cost RX.

# 1. Glue first (it is also a dependency of qca-nss-drv, so it may
#    already be loaded — but it must be ARMED before drv loads).
modprobe qca-ppe-nss

# 2. Arm the physical ports (bitmask of PPE port indexes; on a
#    typical 4-port IPQ807x board ports 2..5 -> 0x3c).
echo 0x3c > /sys/kernel/debug/qca-ppe-nss/fw_mask

# 3. Boot the firmware. Cores print their version; armed ports are
#    attached during the driver's one-shot registration.
#    (With Wi-Fi offload images, drv is already loaded and
#    probe-deferred; the arming write in step 2 boots the firmware
#    by itself and this modprobe is a no-op.)
modprobe qca-nss-drv

# 4. Optional: ECM connection offload.
sysctl -w net.netfilter.nf_conntrack_events=1
modprobe ecm front_end_selection=1
echo 1 > /sys/kernel/debug/ecm/front_end_ipv4/accel_delay_pkts
echo 1 > /sys/kernel/debug/ecm/front_end_ipv6/accel_delay_pkts

# 5. Optional: PPPoE offload manager. It only catches sessions
#    created AFTER it loads — bounce the WAN afterwards.
modprobe qca-nss-pppoe
ifup wan

# 6. Optional: flush stale state so new flows take the fast path.
echo 1 > /sys/kernel/debug/ecm/state/defunct_all
echo f > /proc/net/nf_conntrack   # flush conntrack

# 7. Optional: SQM (see the SQM page). The sqm-scripts hotplug will
#    have skipped its interface while NSS was down; restart it.
/etc/init.d/sqm restart
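The fw_mask value in step 2 is a bitmask of PPE port indexes, with bit N set to arm port N. The helper below is hypothetical and not part of the glue; it just makes the arithmetic explicit. Which indexes map to which physical ports is board-specific.

```shell
#!/bin/sh
# Hypothetical helper: build a fw_mask value from PPE port indexes.
# Each index N sets bit N; the result is printed in hex like step 2 uses.
ports_to_mask() {
	mask=0
	for port in "$@"; do
		mask=$(( mask | (1 << port) ))
	done
	printf '0x%x\n' "$mask"
}

ports_to_mask 2 3 4 5   # prints 0x3c (typical 4-port IPQ807x)
```

On a device, write the output exactly as in step 2, e.g. echo "$(ports_to_mask 2 3 4 5)" > /sys/kernel/debug/qca-ppe-nss/fw_mask.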

ath11k Wi-Fi offload bring-up

Images built with CONFIG_ATH11K_NSS_SUPPORT boot with host-mode Wi-Fi: ath11k autoloads with frame_mode=2 and nss_offload=0. Moving the radios onto the NSS data path happens at runtime, after the firmware is up, because ath11k's NSS setup hard-fails if the firmware has not booted. It is done by flipping the parameter and re-probing the radio:

# after the firmware is booted and ports are attached (steps 1-3):
wifi down                      # avoid a pending-ack WARN at unbind
echo 1 > /sys/module/ath11k/parameters/nss_offload
echo c000000.wifi > /sys/bus/platform/drivers/ath11k/unbind
echo c000000.wifi > /sys/bus/platform/drivers/ath11k/bind
/etc/init.d/qca-nss-pbuf start # n2h pool tuning + `wifi up`

The rebind cycles the WCSS remoteproc (verified clean). Phy indexes change, but OpenWrt's wireless config matches radios by DT path, so the APs come back unattended. qca-nss-pbuf (shipped with kmod-ath11k) applies the n2h buffer pool sysctls for the board's memory profile. On a 512 MB board, expect a one-time pool growth of about 35 MB after Wi-Fi offload comes up; memory stays flat afterwards.

If the radios fail to register with offload enabled, set nss_offload=0 and rebind again — Wi-Fi returns in host mode (this fallback keeps Wi-Fi as the recovery/escape path at all times).
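The rebind-with-fallback procedure above can be wrapped in a function. This is a sketch under several assumptions:

- SYSROOT, DEV and SETTLE are hypothetical overrides. DEV defaults to the c000000.wifi device from the example.
- The settle delay is a guess.
- Treating any entry under /sys/class/ieee80211 as "radios registered" is a heuristic, not a check the driver documents.

Run it only after steps 1-3 have booted the firmware.

```shell
#!/bin/sh
# Hypothetical helper: rebind ath11k with nss_offload set to $1 (0 or 1).
# SYSROOT (default empty) only exists so the paths can be redirected for a
# dry run; DEV defaults to the c000000.wifi device used above.
rebind_ath11k() {
	root="${SYSROOT:-}"
	dev="${DEV:-c000000.wifi}"
	echo "$1" > "$root/sys/module/ath11k/parameters/nss_offload"
	# The unbind write may report an error if the previous bind never
	# completed; the bind is attempted regardless.
	echo "$dev" > "$root/sys/bus/platform/drivers/ath11k/unbind"
	echo "$dev" > "$root/sys/bus/platform/drivers/ath11k/bind"
}

# Heuristic (assumption): a registered radio shows up under /sys/class/ieee80211.
radios_registered() {
	[ -n "$(ls "${SYSROOT:-}/sys/class/ieee80211" 2>/dev/null)" ]
}

# Try NSS offload; on failure fall back to host mode so Wi-Fi stays
# available as the recovery path. Run only after the firmware has booted.
enable_wifi_offload() {
	wifi down                          # avoid the pending-ack WARN at unbind
	rebind_ath11k 1
	sleep "${SETTLE:-10}"              # settle time is a guess
	if radios_registered; then
		/etc/init.d/qca-nss-pbuf start   # n2h pool tuning + wifi up
		return 0
	fi
	echo "offload rebind failed; falling back to host mode" >&2
	rebind_ath11k 0
	sleep "${SETTLE:-10}"
	wifi up
	return 1
}
```

The fallback deliberately ends with wifi up in host mode. A failed offload attempt then leaves the recovery path intact rather than leaving the radios down.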

Wi-Fi offload health checks: the tx_sent_count and rx_deliverd counters in /sys/kernel/debug/qca-nss-drv/stats/wifili climb with Wi-Fi traffic, and dmesg | grep "nss init soc" shows the wifili interface number.

General data-plane health checks:

  • /sys/kernel/debug/qca-ppe-nss/status — per-port attach state and counters. tx_busy and rx_unexpected should stay 0.
  • grep n2h_rx /sys/kernel/debug/qca-nss-drv/stats/n2h (or the nss-stats helper) — N2H RX counters climb when the firmware path carries traffic.
  • dmesg shows the firmware version when the firmware boots (e.g. NSS.FW.12.5-210-HK.R).
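Checking these counters by hand gets repetitive. The sketch below assumes each value follows its counter name as a separate whitespace-delimited field, e.g. tx_busy: 0. The real layout of the status and stats files is not documented here, so verify it first. counter_values, all_zero and counter_climbs are hypothetical helpers. Names are matched by prefix, so a name also matches any longer counter that starts with it.

```shell
#!/bin/sh
# Hypothetical counter helpers. Assumed layout: each value is the first
# all-digit field after a field that starts with the counter name,
# e.g. "tx_busy: 0". Verify against the real files before trusting them.

# Print every value of counter $2 found in file $1, one per line.
counter_values() {
	awk -v name="$2" '{
		seen = 0
		for (i = 1; i <= NF; i++) {
			if (!seen) {
				if (index($i, name) == 1) seen = 1
			} else if ($i ~ /^[0-9]+$/) {
				print $i
				seen = 0
			}
		}
	}' "$1"
}

# Succeeds if counter $2 is present and zero everywhere (e.g. on every port).
all_zero() {
	vals=$(counter_values "$1" "$2")
	[ -n "$vals" ] || return 1
	! echo "$vals" | grep -qv '^0$'
}

# Succeeds if the first value of counter $2 increases within $3 seconds.
counter_climbs() {
	before=$(counter_values "$1" "$2" | head -n 1)
	sleep "${3:-5}"
	after=$(counter_values "$1" "$2" | head -n 1)
	[ -n "$before" ] && [ -n "$after" ] && [ "$after" -gt "$before" ]
}
```

For example, all_zero /sys/kernel/debug/qca-ppe-nss/status tx_busy checks the status file. While traffic is flowing, counter_climbs /sys/kernel/debug/qca-nss-drv/stats/n2h n2h_rx 5 checks the N2H counters. The same counter_climbs call can watch tx_sent_count in the wifili stats.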

ECM notes

  • front_end_selection=1 selects the NSS front end explicitly.
  • accel_delay_pkts=1 accelerates flows after the first packet — the default waits longer and skews short-flow benchmarks.
  • ECM acceleration is visible in /sys/kernel/debug/ecm/.../connection counts and, decisively, in CPU load: an accelerated bulk flow leaves the CPU ~99 % idle.
  • Stopping ECM (ecm_state stop / rmmod) cleanly returns flows to the software path; this is repeatable at runtime.
  • PPPoE sessions established before qca-nss-pppoe loaded are not managed; always bounce the WAN after loading the manager.
  • PPPoE-over-VLAN offloads without any vlan manager (the VLAN tag is embedded in ECM's unicast rule; verified with a tagged PPPoE WAN at line rate).
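The CPU-load signal in the third bullet can be measured rather than eyeballed. This sketch reads the aggregate cpu line of /proc/stat, using the fields user, nice, system, idle, iowait, irq, softirq and steal. It counts idle plus iowait as idle. It leaves out the guest columns because the kernel already includes them in user and nice. cpu_sample, idle_pct_between and cpu_idle_pct are hypothetical names. STAT exists only to point the sampler at a test file.

```shell
#!/bin/sh
# Hypothetical CPU idle meter built on the aggregate "cpu" line of
# /proc/stat. STAT is overridable only for testing.

# Print "idle total" jiffies. idle = idle + iowait; total = user..steal
# (guest/guest_nice are left out: the kernel already counts them in user/nice).
cpu_sample() {
	awk '$1 == "cpu" {
		total = 0
		for (i = 2; i <= 9 && i <= NF; i++) total += $i
		print $5 + $6, total
		exit
	}' "${STAT:-/proc/stat}"
}

# Integer idle percentage between two "idle total" samples, or n/a.
idle_pct_between() {
	echo "$1 $2" | awk '{
		dt = $4 - $2
		if (dt <= 0) { print "n/a"; exit }
		printf "%d\n", 100 * ($3 - $1) / dt
	}'
}

# Idle percentage over an interval of $1 seconds (default 5).
cpu_idle_pct() {
	s1=$(cpu_sample)
	sleep "${1:-5}"
	s2=$(cpu_sample)
	idle_pct_between "$s1" "$s2"
}
```

Run cpu_idle_pct 10 during a bulk transfer. With the flow accelerated, expect a value near the ~99 % noted above. A markedly lower figure suggests the flow is still on the software path.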

Teardown facts

  • rmmod qca-nss-drv with attached ports works, and the glue restores host-side state. (A stock-driver double-unregister panic is fixed by feed patch 0101.) But wired RX does not come back: the firmware's QID2RID queue takeover persists, and only a reboot restores it. At rmmod, devres teardown also emits a cosmetic regulator_put refcount WARN. It comes from the stock driver, not from this stack, and is harmless.
  • There is deliberately no "stop the firmware" path; the supported way down is a reboot.

Remote-device safety checklist

Distilled from operating a device with no serial console:

  1. Keep an out-of-band path that the NSS stack cannot touch — Wi-Fi on the host stack is ideal. Test it before experimenting.
  2. Never enable any NSS-related init at boot. A reboot must always produce a clean host-only system.
  3. For risky experiments, run them detached with a dead-man's switch: a script that reboots the device unless a "keep" file appears within N minutes (sleep 1500; [ -f /tmp/keep ] || reboot -f). reboot -f works even when networking is gone.
  4. Remember ssh liveness is not a hang detector here: a firmware boot with unarmed ports kills networking while the SoC is perfectly healthy. Synced logs to flash + pstore are the truth (see Development and Testing).
  5. sysupgrade -n regenerates host keys; expect the ssh fingerprint to change.
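Item 3 can be packaged as a pair of functions. This is a sketch: KEEP_FILE, REBOOT_CMD and the function names are illustrative. REBOOT_CMD exists only so the logic can be exercised without rebooting. trap '' HUP is one POSIX way to keep the timer alive when the ssh session that started it drops, because sleep inherits the ignored signal.

```shell
#!/bin/sh
# Dead-man's switch sketch. KEEP_FILE and REBOOT_CMD are illustrative;
# REBOOT_CMD is a variable only so the logic can be tested safely.

# Reboot unless the keep file exists (REBOOT_CMD is split into words on purpose).
deadman_check() {
	[ -f "${KEEP_FILE:-/tmp/keep}" ] || ${REBOOT_CMD:-reboot -f}
}

# Start a detached timer: after $1 seconds (default 1500 = 25 minutes),
# reboot unless the keep file has appeared in the meantime.
arm_deadman() {
	rm -f "${KEEP_FILE:-/tmp/keep}"    # a stale keep file must not disarm it
	(
		trap '' HUP                    # survive the ssh session dropping
		sleep "${1:-1500}"
		deadman_check
	) </dev/null >/dev/null 2>&1 &
}
```

Typical use: arm_deadman 1500, run the experiment, then touch /tmp/keep once you have confirmed you still have access. If networking dies, the keep file never appears, and the device reboots into the clean host-only stack.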
