Development and Testing
Build specifics, the no-serial-console test methodology this project was developed with, and the landmines that cost real debugging time. Future developers: read the landmines before touching anything.
The easiest path is the Qualcommax_NSS_Builder: its edma-nss variant already wires in the nss-packages feed by default, so there is nothing to add by hand; just run the build.
To build by hand instead, add the feed to `feeds.conf`:

```
# feeds.conf
src-git nss https://github.com/JuliusBairaktaris/nss-packages.git;edma-nss
# local checkout instead of a remote clone:
# src-link nss /path/to/nss-packages # branch edma-nss
```
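If you script the by-hand setup, append the feed line only when it is missing so that repeated runs stay idempotent. The following is a minimal POSIX sh sketch meant to be sourced from the top of the OpenWrt tree. The function name `add_nss_feed` is hypothetical. `scripts/feeds` reads `feeds.conf.default` only when `feeds.conf` does not exist, so the helper seeds a new `feeds.conf` from the default list; otherwise the stock feeds would be lost.

```sh
# Hypothetical helper: add the nss feed to feeds.conf exactly once.
# Usage: add_nss_feed [path/to/feeds.conf]   (default: ./feeds.conf)
NSS_FEED_LINE='src-git nss https://github.com/JuliusBairaktaris/nss-packages.git;edma-nss'

add_nss_feed() {
	conf=${1:-feeds.conf}
	default="$(dirname "$conf")/feeds.conf.default"
	# scripts/feeds falls back to feeds.conf.default only when feeds.conf is
	# missing, so start a new feeds.conf from the default list.
	if [ ! -f "$conf" ] && [ -f "$default" ]; then
		cp "$default" "$conf" || return 1
	fi
	# -x: whole-line match; -F: fixed string, so ';' and '.' are literal.
	if ! grep -qxF "$NSS_FEED_LINE" "$conf" 2>/dev/null; then
		printf '%s\n' "$NSS_FEED_LINE" >> "$conf"
	fi
}
```

For example: `add_nss_feed feeds.conf && ./scripts/feeds update -a && ./scripts/feeds install -a`.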
Minimum NSS config on top of a normal ipq807x build:

```
CONFIG_PACKAGE_kmod-qca-nss-drv=y
CONFIG_PACKAGE_kmod-qca-nss-ecm=y
# PPPoE offload
CONFIG_PACKAGE_kmod-qca-nss-drv-pppoe=y
# NSS qdiscs for SQM
CONFIG_PACKAGE_kmod-qca-nss-drv-qdisc=y
CONFIG_PACKAGE_kmod-qca-nss-drv-igs=y
CONFIG_PACKAGE_sqm-scripts-nss=y
CONFIG_NSS_FIRMWARE_VERSION_12_5=y
# memory profile for 512 MB boards!
CONFIG_NSS_MEM_PROFILE_MEDIUM=y
```
`nss-firmware` is pulled in automatically by `kmod-qca-nss-drv`. The vendor code is not warning-clean. OpenWrt main sets `CONFIG_KERNEL_WERROR`, which leaks `-Werror` into external modules, so the NSS packages build with `-Wno-error` (documented in each Makefile).
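`make defconfig` drops symbols it does not know without failing the build. This happens, for example, when the nss feed was not installed. Checking the resulting `.config` against the list above is therefore worthwhile. Below is a minimal sketch under that assumption. `check_nss_config` and `NSS_REQUIRED_SYMBOLS` are hypothetical names. Swap the memory-profile symbol to match your board's RAM.

```sh
# Hypothetical check: report which minimum NSS symbols are not "=y".
# Returns 0 only if every symbol is present as a whole "SYMBOL=y" line.
NSS_REQUIRED_SYMBOLS='
CONFIG_PACKAGE_kmod-qca-nss-drv
CONFIG_PACKAGE_kmod-qca-nss-ecm
CONFIG_PACKAGE_kmod-qca-nss-drv-pppoe
CONFIG_PACKAGE_kmod-qca-nss-drv-qdisc
CONFIG_PACKAGE_kmod-qca-nss-drv-igs
CONFIG_PACKAGE_sqm-scripts-nss
CONFIG_NSS_FIRMWARE_VERSION_12_5
CONFIG_NSS_MEM_PROFILE_MEDIUM
'

check_nss_config() {
	cfg=${1:-.config}
	missing=0
	for sym in $NSS_REQUIRED_SYMBOLS; do
		# Whole-line match: "=m" or "# ... is not set" counts as missing.
		if ! grep -qxF "${sym}=y" "$cfg"; then
			echo "missing: ${sym}=y"
			missing=1
		fi
	done
	return "$missing"
}
```

For example: `make defconfig && check_nss_config .config`.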
- `sk_buff` layout changes force a full kmod rebuild. Kernel patch 0969 adds a bitfield to `struct sk_buff`. If that patch (or anything else touching the skbuff layout) changes, wipe the target `build_dir` and `staging_dir`. Incremental builds will happily link kmods against stale layouts, and the result misbehaves at runtime.
- Hand-written patches are forbidden. Generate patches with git/quilt and verify them by actually applying them (`make package/X/prepare` / `quilt push`). A hand-typed hunk with a malformed offset cost a build day.
- WSL2 only: building with the Windows PATH leaked into the environment breaks `find -execdir` inside the kernel build. Sanitize PATH for every build (`export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin`). Build on a real ext4 filesystem, never on `/mnt/c`.
- `DISTRIB_REVISION` is the git HEAD at build time. Building with uncommitted changes produces an image that reports the old revision. When in doubt, identify an image by symbols (`/proc/kallsyms` markers), not by the revision string.
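Three of these landmines can be guarded mechanically. The following is a minimal POSIX sh sketch meant to be sourced, and all function names are hypothetical. The skbuff guard tracks a single patch file whose path you must supply. Other layout-touching changes still need a manual wipe.

```sh
# Hypothetical build-hygiene helpers (POSIX sh).

# skbuff layout guard. A removed patch hashes as "absent", so deleting the
# patch also counts as a layout change.
skb_hash() {
	if [ -f "$1" ]; then sha256sum "$1" | cut -d' ' -f1; else echo absent; fi
}
# Returns 0 ("wipe needed") when the hash differs from the stamp or no stamp exists.
skb_patch_changed() { [ "$(skb_hash "$1")" != "$(cat "$2" 2>/dev/null)" ]; }
# Record the current hash once the wipe has actually been done.
skb_patch_record() { skb_hash "$1" > "$2"; }

# WSL2 guard: reject a PATH with Windows (/mnt/...) entries or empty/"."
# entries (GNU find refuses -execdir when PATH holds relative entries), and
# a build directory under /mnt.
build_env_ok() {
	case ":$1:" in
	*:/mnt/*) echo "PATH has /mnt entries"; return 1 ;;
	*::* | *:.:*) echo "PATH has empty or '.' entries"; return 1 ;;
	esac
	case "$2" in
	/mnt/*) echo "build dir $2 is under /mnt"; return 1 ;;
	esac
	return 0
}

# Identify a running image by a marker symbol instead of DISTRIB_REVISION.
# kallsyms lines look like "<addr> <type> <name>" plus an optional "[module]".
image_has_symbol() {
	awk -v s="$1" '$3 == s { found = 1 } END { exit !found }' "${2:-/proc/kallsyms}"
}
```

On the build host, a wrapper might `export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin` and call `build_env_ok "$PATH" "$PWD"`. If `skb_patch_changed "$SKB_PATCH" .skb.sha256` succeeds, it can run `rm -rf build_dir/target-* staging_dir/target-*` (this assumes the standard OpenWrt layout) and then `skb_patch_record`. On the router, `image_has_symbol <marker>` identifies the running image.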
The entire bring-up was done sysupgrade-only: every image must boot with working EDMA networking by itself, and NSS is strictly runtime opt-in. The techniques that made that survivable:
- pstore/ramoops is the crash console. The DT reserves a ramoops region (with `console-size`, so the last kernel console survives a hard crash). The `pstore-archive` init script copies `/sys/fs/pstore` to flash on every boot. After any suspicious reboot, read the archive first.
- Dead-man's switch for risky experiments. Run experiments detached, with logs synced to flash continuously, ending in `reboot -f` unless a barrier file is present: `(experiment; sleep 1500; [ -f /tmp/keep ] || reboot -f) &`. `reboot -f` works even with networking gone, which turns "drive out and pull the plug" into "wait 25 minutes".
- ssh liveness is not a hang detector. A firmware boot with unarmed ports kills all wired RX while the SoC is healthy. Only synced logs plus pstore distinguish "network dead" from "SoC dead".
- Reachability without ICMP. Client firewalls often drop echo requests. Test reachability via ARP (`ip neigh flush dev X; ping -c1 <ip>; ip neigh show <ip>` → REACHABLE/STALE state), not via ping success.
- `tcpdump` on `pppoe-wan` silently matches nothing (unsupported linktype for BPF filters). Capture on the physical port or the ifb instead.
- Flash quirk: sysupgrade's final reboot can hang after `remoteproc: stopped q6v5_wcss`. The flash itself succeeded, and a power cycle boots the new image. Do not misread this as a failed upgrade.
- `/root` is not preserved by sysupgrade unless listed in `/etc/sysupgrade.conf`. List the bring-up scripts there, or re-push them after every flash.
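Several of these techniques fit in small shell functions kept on the router. The following is a minimal sketch that assumes BusyBox `sh`, `ip` and `ping`. Every function name and the `DEADMAN_ACTION` hook are hypothetical, and `archive_pstore` is not the project's actual `pstore-archive` script. Unlike the one-liner above, this `deadman` variant starts the timer alongside the experiment rather than after it, so an experiment that hangs still ends in `reboot -f`.

```sh
# Hypothetical bring-up helpers for a router without a serial console.

# Copy all pstore records into a per-boot directory under DST_ROOT.
archive_pstore() {
	src=$1 dst_root=$2
	# Nothing to archive if the previous boot left no records.
	[ -n "$(ls -A "$src" 2>/dev/null)" ] || return 0
	dst="$dst_root/boot-$(date +%Y%m%d-%H%M%S)"
	mkdir -p "$dst" && cp "$src"/* "$dst"/ && sync
}

# deadman SECS BARRIER CMD...: run CMD with a timer running beside it.
# After SECS seconds, reboot unless BARRIER exists. DEADMAN_ACTION
# (hypothetical) overrides the action for dry runs; unquoted on purpose so
# "reboot -f" splits into command and flag.
deadman() {
	secs=$1 barrier=$2
	shift 2
	( sleep "$secs"; [ -f "$barrier" ] || ${DEADMAN_ACTION:-reboot -f} ) &
	"$@"
	wait
}

# ARP-based reachability. ping only triggers ARP resolution, and its own
# result is ignored because client firewalls may drop ICMP echo.
neigh_state_ok() {
	case " $1 " in
	*" REACHABLE "* | *" STALE "*) return 0 ;;
	esac
	return 1
}
arp_reachable() {
	dev=$1 addr=$2
	ip neigh flush dev "$dev"
	ping -c1 -W1 "$addr" >/dev/null 2>&1
	neigh_state_ok "$(ip neigh show "$addr" dev "$dev")"
}

# Add a path to the sysupgrade keep-list once.
ensure_preserved() {
	conf=${2:-/etc/sysupgrade.conf}
	grep -qxF "$1" "$conf" 2>/dev/null || printf '%s\n' "$1" >> "$conf"
}
```

For example, run `deadman 1500 /tmp/keep sh /root/experiment.sh > /root/exp.log 2>&1 &` (add periodic `sync` if the log must survive a hard hang). Then `touch /tmp/keep` over ssh to cancel the reboot while the router is still reachable. Call `ensure_preserved /root/` once so the scripts survive the next flash.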
| Surface | What it tells you |
|---|---|
| `/sys/kernel/debug/qca-ppe-nss/status` | per-port attach state, `tx_redirect_pkts`, `rx_fw_pkts`, `tx_busy`, `rx_unexpected` |
| `/sys/kernel/debug/qca-nss-drv/stats/n2h` | firmware↔host queue counters (`n2h_rx_pkts` climbing = fw path alive) |
| `/sys/kernel/debug/ecm/` | per-flow acceleration state, `defunct_all`, front-end stats |
| `tc -s qdisc show dev <wan>` / ifb | `nsstbl` overlimits prove the fw shaper has authority |
| `/sys/fs/pstore`, archived by init script | crash console of the previous boot |
| `dmesg` at drv load | firmware version banner (e.g. `NSS.FW.12.5-210-HK.R`) |
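Reading these surfaces by hand gets tedious during delta measurements. Below is a minimal sketch of two hypothetical helpers.

- `read_counter` assumes a stats file prints one counter per line as `name = value`, `name: value` or `name value`. Verify this against the real debugfs output. Per-port files repeat names, and the helper returns only the first match.
- `nsstbl_overlimits` sums the `overlimits` field that `tc -s qdisc show` prints on the `Sent` line under each `nsstbl` qdisc.

```sh
# Hypothetical helpers for reading the observability surfaces.

# read_counter FILE NAME: print the first numeric value of NAME.
# ASSUMPTION: "name = 123", "name: 123" or "name 123" per line.
read_counter() {
	awk -v n="$2" '
		{ line = $0; gsub(/[=:]/, " ", line); split(line, f, " ") }
		f[1] == n && f[2] ~ /^[0-9]+$/ { print f[2]; found = 1; exit }
		END { if (!found) exit 1 }
	' "$1"
}

# Sum "overlimits" over every nsstbl qdisc in `tc -s qdisc show` output on
# stdin. tc prints " Sent ... (dropped D, overlimits O requeues R)" under
# each "qdisc <kind> ..." header line.
nsstbl_overlimits() {
	awk '
		/^qdisc / { in_tbl = ($2 == "nsstbl") }
		in_tbl && /overlimits/ {
			for (i = 1; i < NF; i++) if ($i == "overlimits") total += $(i + 1)
			in_tbl = 0
		}
		END { print total + 0 }
	'
}
```

For example, run `read_counter /sys/kernel/debug/qca-nss-drv/stats/n2h n2h_rx_pkts` twice a few seconds apart. For the shaper, run `tc -s qdisc show dev <wan> | nsstbl_overlimits` before and after a transfer that exceeds the shaped rate.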
To confirm that forwarded traffic genuinely rides the firmware (rather than merely that counters exist), measure a delta under a load you have confirmed traverses the router, not a static snapshot:
- Pin a flow through the router. On a split-routing test host only some paths traverse the device. Drive a sustained download over a path you know hits the WAN (IPv6, or a DNAT/explicit route) and confirm the WAN counters move with it.
- Sample a window straddling the load. Read counters, hold ~25 s under load, read again. The offload signature:
  - `pppoe_rx_bytes` (or the WAN port `rx_fw_pkts`) climbs by the transferred volume: the bytes entered via the firmware;
  - `n2h_n2h_data_byts` (firmware→host delivery) climbs by a tiny fraction of that (the host sees <0.1 % of the bytes): the rest was forwarded inside the firmware;
  - the per-port glue counters `tx_redirect_pkts`/`rx_fw_pkts` stay nearly flat for the accelerated flow (it bypasses the host redirect path entirely);
  - host CPU stays ~idle: the `/proc/stat` idle delta is ≈ 100 % of the window × nproc, the softirq delta is near zero, and `ksoftirqd` sits at 0 %;
  - `ecm_nss_ipv{4,6}/accelerated_count` is non-zero, with `pending_accel`/`pending_decel` at 0.
- Read the exception stats as by-design, not as leaks. The big numbers in `ecm/stats/ecm_v{4,6}_exception_stats` (`local_packets_ignored`, `bcast/mcast_feature_disabled`, `not_ip_pppoe_packet`, `*_tcp_not_estab`/`not_confirm`, fragments) are traffic a flow engine cannot accelerate (router-local, broadcast/multicast, ARP/ND/L2, flow-setup first packets). They are the expected residual, not a fault.
- Rule out a competing native engine. Low host CPU only proves NSS offload if nothing else is doing the forwarding. Confirm that there is no nft flowtable (`nft list ruleset | grep -i flowtable`), that `firewall.@defaults[0].flow_offloading{,_hw}` is unset, and that `ethtool -k <conduit>` shows `hw-tc-offload: off`. Otherwise a host fastpath, not NSS, may be moving the bytes.
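The arithmetic behind the offload signature is simple enough to script. Below is a minimal sketch with hypothetical function names.

- `host_share_ppm` takes a WAN byte counter (for example `pppoe_rx_bytes`) and the firmware→host byte counter, before and after the window. It prints the host's share in parts per million, so offload shows up as well under 1000 ppm (<0.1 %). Keep both inputs in bytes rather than mixing in packet counters such as `rx_fw_pkts`.
- `cpu_window` compares two saved copies of `/proc/stat`.
- `competing_engine` runs the three checks from the last item. It assumes `nft`, `uci` and `ethtool` are installed, and a missing tool simply reports nothing.

```sh
# Hypothetical helpers for the delta method.

# host_share_ppm WAN0 WAN1 N2H0 N2H1 (all bytes):
#   (N2H1 - N2H0) / (WAN1 - WAN0) * 1e6. Fails if the WAN did not move.
host_share_ppm() {
	awk -v a="$1" -v b="$2" -v c="$3" -v d="$4" 'BEGIN {
		wan = b - a; host = d - c
		if (wan <= 0) exit 1   # no WAN traffic in the window: resample
		printf "%d\n", host * 1000000 / wan
	}'
}

# cpu_window BEFORE AFTER: idle and softirq share of all CPU time between two
# /proc/stat snapshots. Aggregate "cpu" line fields: user nice system idle
# iowait irq softirq steal guest guest_nice. Only user..steal are summed,
# because guest time is already included in user/nice.
cpu_window() {
	awk '
		FNR == 1 { file++ }
		$1 == "cpu" {
			t = 0
			for (i = 2; i <= 9 && i <= NF; i++) t += $i
			tot[file] = t; idl[file] = $5; sirq[file] = $8
		}
		END {
			dt = tot[2] - tot[1]
			if (dt <= 0) exit 1
			printf "idle=%d%% softirq=%d%%\n", (idl[2] - idl[1]) * 100 / dt, (sirq[2] - sirq[1]) * 100 / dt
		}
	' "$1" "$2"
}

# competing_engine CONDUIT: print each finding; return 1 if something other
# than NSS could be forwarding.
competing_engine() {
	conduit=$1 found=0
	if nft list ruleset 2>/dev/null | grep -qi flowtable; then
		echo "nft flowtable present"; found=1
	fi
	for opt in flow_offloading flow_offloading_hw; do
		if [ -n "$(uci -q get "firewall.@defaults[0].$opt")" ]; then
			echo "firewall $opt is set"; found=1
		fi
	done
	if ethtool -k "$conduit" 2>/dev/null | grep -q '^hw-tc-offload: on'; then
		echo "hw-tc-offload is on for $conduit"; found=1
	fi
	return "$found"
}
```

For example, save `cat /proc/stat > /tmp/s1` and the two byte counters, hold the load for ~25 s, and save them again. Then run `cpu_window /tmp/s1 /tmp/s2` and `host_share_ppm`.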
The integration was accepted through staged hardware gates; re-run the relevant ones after any significant change:
- Firmware boot gate: fw boots with armed ports, `n2h_rx_pkts` climbing, two consecutive clean boots, pstore clean.
- Data plane gate: 100 attach/detach cycles per port under traffic with zero stuck cycles; rmmod with attached ports; no duplicate delivery (host rings must stay silent: zero duplicate ICMP sequence numbers).
- All-ports gate: every physical port through the fw path simultaneously, including a PPPoE(+VLAN) uplink, with per-port RX verified by cable hops; 15-minute soak; memory flat.
- ECM gate: accelerated bulk flow at line rate with CPU ~idle; stopping ECM returns flows to the software path; Wi-Fi clients unaffected.
- SQM gate: shaper authority at a tight rate; RTT under load at the production rate; `sqm stop` clean; reboot with SQM enabled boots safely (guard refuses, wired RX alive).
- Soak: multi-hour production traffic, then counters sane (`tx_busy=0`, `rx_unexpected=0`), memory flat, pstore empty.
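Parts of the data plane and soak gates reduce to mechanical checks. Below is a minimal sketch with hypothetical helpers.

- `dup_seq_count` counts ICMP sequence numbers that appear more than once in captured `ping` output. It handles both the iputils `icmp_seq=` form and the BusyBox `seq=` form.
- `soak_counters_clean` assumes the status file prints `tx_busy` and `rx_unexpected` as `name=value` or `name: value` pairs. Verify this against the real `qca-ppe-nss/status` layout.
- `pstore_empty` checks that the previous boot left no pstore records.

```sh
# Hypothetical gate checks.

# Data plane gate: number of distinct ICMP sequence numbers seen more than
# once in ping output on stdin ("icmp_seq=N" or "seq=N").
dup_seq_count() {
	awk '
		match($0, /seq=[0-9]+/) {
			n = substr($0, RSTART + 4, RLENGTH - 4)
			if (seen[n]++ == 1) dups++   # count each repeated seq once
		}
		END { print dups + 0 }
	'
}

# Soak gate: fail (and print the offending entries) if any tx_busy or
# rx_unexpected value in FILE is non-zero.
soak_counters_clean() {
	awk '
		{
			line = $0; gsub(/[=:,]/, " ", line); n = split(line, f, " ")
			for (i = 1; i < n; i++)
				if ((f[i] == "tx_busy" || f[i] == "rx_unexpected") && f[i + 1] + 0 != 0) {
					print FNR ": " f[i] "=" f[i + 1]; bad = 1
				}
		}
		END { exit bad + 0 }
	' "$1"
}

# True if the pstore directory exists and holds no records.
pstore_empty() {
	dir=${1:-/sys/fs/pstore}
	[ -d "$dir" ] && [ -z "$(ls -A "$dir")" ]
}
```

For example: `ping -c 1000 -i 0.2 <host> | dup_seq_count` during attach/detach cycling should print `0`. After a soak, `soak_counters_clean /sys/kernel/debug/qca-ppe-nss/status && pstore_empty` should succeed.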