Development and Testing
Build specifics, the no-serial-console test methodology this project was developed with, and the landmines that cost real debugging time. Future developers: read the landmines before touching anything.
# feeds.conf
src-link nss /path/to/nss-packages # branch edma-nss
Minimum NSS config on top of a normal ipq807x build:
CONFIG_PACKAGE_kmod-qca-nss-drv=y
CONFIG_PACKAGE_kmod-qca-nss-ecm=y
CONFIG_PACKAGE_kmod-qca-nss-drv-pppoe=y # PPPoE offload
CONFIG_PACKAGE_kmod-qca-nss-drv-qdisc=y # NSS qdiscs for SQM
CONFIG_PACKAGE_kmod-qca-nss-drv-igs=y
CONFIG_PACKAGE_sqm-scripts-nss=y
CONFIG_NSS_FIRMWARE_VERSION_12_5=y
CONFIG_NSS_MEM_PROFILE_MEDIUM=y # 512 MB boards!
nss-firmware follows kmod-qca-nss-drv automatically. The vendor
code is not warning-clean: OpenWrt main sets CONFIG_KERNEL_WERROR,
which leaks -Werror into external modules, so the NSS packages
build with -Wno-error (documented in each Makefile).
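Before a long build it can help to confirm that the symbols listed above actually survived `make defconfig`. The following is a minimal sketch, not part of the build system: `check_nss_config` is a hypothetical helper name, and it assumes the `.config` carries each symbol as a bare `SYMBOL=y` line with no trailing comment.

```sh
#!/bin/sh
# Sketch: verify that a .config carries the minimum NSS symbols.
# check_nss_config is a hypothetical helper, not an OpenWrt tool.
check_nss_config() {
    config_file="$1"
    missing=0
    for sym in \
        CONFIG_PACKAGE_kmod-qca-nss-drv \
        CONFIG_PACKAGE_kmod-qca-nss-ecm \
        CONFIG_PACKAGE_kmod-qca-nss-drv-pppoe \
        CONFIG_PACKAGE_kmod-qca-nss-drv-qdisc \
        CONFIG_PACKAGE_kmod-qca-nss-drv-igs \
        CONFIG_PACKAGE_sqm-scripts-nss \
        CONFIG_NSS_FIRMWARE_VERSION_12_5 \
        CONFIG_NSS_MEM_PROFILE_MEDIUM
    do
        # -x matches the whole line and -F treats the pattern as a fixed
        # string, so "...-drv=y" cannot be satisfied by "...-drv-pppoe=y".
        if ! grep -qxF "${sym}=y" "$config_file"; then
            echo "missing: ${sym}=y"
            missing=1
        fi
    done
    return "$missing"
}
```

Run from the top of the build tree as `check_nss_config .config`; it prints one line per missing symbol and returns non-zero if any is absent.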
- `sk_buff` layout changes force a full kmod rebuild. Kernel patch 0969 adds a bitfield to `struct sk_buff`. If that patch (or anything else touching skbuff layout) changes, wipe the target `build_dir` and `staging_dir`: incremental builds will happily link kmods against stale layouts and the result misbehaves at runtime.
- Hand-written patches are forbidden. Generate patches with git/quilt and verify them by application (`make package/X/prepare` / `quilt push`). A hand-typed hunk with a malformed offset cost a build day.
- WSL2 only: building with the Windows PATH leaked into the environment breaks `find -execdir` inside the kernel build. Sanitize PATH for every build (`export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin`). Build on a real ext4 filesystem, never on `/mnt/c`.
- `DISTRIB_REVISION` is the git HEAD at build time. Building with uncommitted changes produces an image that reports the old revision. When in doubt, identify an image by symbols (`/proc/kallsyms` markers), not by the revision string.
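The WSL2 landmine is easy to trip by habit, so a guard in front of `make` may be worth having. This is a rough sketch under one assumption: Windows drives are mounted under `/mnt/` (the WSL2 default, and the location the warning above names). `path_is_clean` and `dir_is_safe` are hypothetical helper names. The check is deliberately conservative and would also reject a genuine Linux filesystem mounted under `/mnt/`.

```sh
#!/bin/sh
# Sketch: refuse to build from a leaked Windows environment.
# path_is_clean and dir_is_safe are hypothetical helper names.

# Succeeds only if no PATH entry starts with /mnt/.
path_is_clean() {
    # Wrapping the value in colons lets one pattern catch an offending
    # entry at the start, middle, or end of the list.
    case ":$1:" in
        *:/mnt/*) return 1 ;;
        *) return 0 ;;
    esac
}

# Succeeds only if the given build directory is not under /mnt/.
dir_is_safe() {
    case "$1" in
        /mnt/*) return 1 ;;
        *) return 0 ;;
    esac
}

# Possible use before a build (not executed here):
#   export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
#   path_is_clean "$PATH" && dir_is_safe "$PWD" && make
```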
The entire bring-up was done sysupgrade-only: every image must boot with working EDMA networking by itself, and NSS is strictly runtime opt-in. The techniques that made that survivable:
- pstore/ramoops is the crash console. The DT reserves a ramoops region (with `console-size`, so the last kernel console survives a hard crash); the `pstore-archive` init script copies `/sys/fs/pstore` to flash on every boot. After any suspicious reboot, read the archive first.
- Dead-man's switch for risky experiments. Run experiments detached, with logs synced to flash continuously, ending in `reboot -f` unless a barrier file is present: `(experiment; sleep 1500; [ -f /tmp/keep ] || reboot -f) &`. `reboot -f` works even with networking gone, which turns "drive out and pull the plug" into "wait 25 minutes".
- ssh liveness is not a hang detector. A firmware boot with unarmed ports kills all wired RX while the SoC is healthy. Only synced logs + pstore distinguish "network dead" from "SoC dead".
- Reachability without ICMP. Client firewalls often drop echo: test reachability via ARP (`ip neigh flush dev X; ping -c1 <ip>; ip neigh show <ip>` → REACHABLE/STALE state), not ping success.
- `tcpdump` on `pppoe-wan` silently matches nothing (unsupported linktype for BPF filters). Capture on the physical port or the ifb instead.
- Flash quirk: sysupgrade's final reboot can hang after `remoteproc: stopped q6v5_wcss`. The flash itself succeeded, and a power cycle boots the new image. Do not misread this as a failed upgrade.
- `/root` is not preserved by sysupgrade unless listed in `/etc/sysupgrade.conf`. Keep the bring-up scripts listed there, or re-push them after every flash.
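The dead-man's switch one-liner from the list above can be wrapped in a function so the delay and barrier file are explicit. This is a sketch only: `deadman`, `KEEP_FILE` and `REBOOT_CMD` are hypothetical names introduced here, while the 1500 s delay, `/tmp/keep` and `reboot -f` are the values from the one-liner. `REBOOT_CMD` is overridable so the function can be dry-run off the router.

```sh
#!/bin/sh
# Sketch of the dead-man's switch as a reusable function.
# deadman, KEEP_FILE and REBOOT_CMD are hypothetical names.
KEEP_FILE="${KEEP_FILE:-/tmp/keep}"
REBOOT_CMD="${REBOOT_CMD:-reboot -f}"

# deadman <delay-seconds> <command> [args...]
# Runs the experiment, waits, then reboots unless the barrier file exists.
deadman() {
    delay="$1"
    shift
    "$@" || true        # the experiment; its exit status is deliberately ignored
    sleep "$delay"
    if [ -f "$KEEP_FILE" ]; then
        echo "barrier file present, not rebooting"
    else
        sync            # flush logs to flash before the forced reboot
        $REBOOT_CMD     # intentionally unquoted so "reboot -f" splits into words
    fi
}

# Possible detached use on the router (paths are placeholders, not executed here):
#   deadman 1500 sh /root/experiment.sh >>/root/experiment.log 2>&1 &
```

If the experiment leaves the router reachable and healthy, `touch /tmp/keep` within the delay cancels the reboot.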
| Surface | What it tells you |
|---|---|
| `/sys/kernel/debug/qca-ppe-nss/status` | per-port attach state, `tx_redirect_pkts`, `rx_fw_pkts`, `tx_busy`, `rx_unexpected` |
| `/sys/kernel/debug/qca-nss-drv/stats/n2h` | firmware↔host queue counters (`n2h_rx_pkts` climbing = fw path alive) |
| `/sys/kernel/debug/ecm/` | per-flow acceleration state, `defunct_all`, front-end stats |
| `tc -s qdisc show dev <wan>` / ifb | `nsstbl` overlimits prove the fw shaper has authority |
| `/sys/fs/pstore`, archived by init script | crash console of the previous boot |
| `dmesg` at drv load | firmware version banner (e.g. `NSS.FW.12.5-210-HK.R`) |
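When comparing runs, it can be handy to snapshot the two single-file debugfs surfaces from the table into a directory on flash. This is a sketch that makes no assumption about the format of those files and simply copies them verbatim. `collect_diag` and `DIAG_ROOT` are hypothetical names; `DIAG_ROOT` is an optional path prefix that exists only so the function can be pointed at a fake tree off the router.

```sh
#!/bin/sh
# Sketch: copy the single-file debugfs surfaces into one snapshot directory.
# collect_diag and DIAG_ROOT are hypothetical names.
DIAG_ROOT="${DIAG_ROOT:-}"

# collect_diag <output-directory>
collect_diag() {
    out_dir="$1"
    mkdir -p "$out_dir" || return 1
    for f in \
        /sys/kernel/debug/qca-ppe-nss/status \
        /sys/kernel/debug/qca-nss-drv/stats/n2h
    do
        # Flatten the path into a file name: /a/b/c becomes _a_b_c
        name=$(echo "$f" | tr '/' '_')
        if [ -r "$DIAG_ROOT$f" ]; then
            cat "$DIAG_ROOT$f" > "$out_dir/$name"
        else
            # Record the gap instead of failing, so a partial snapshot survives.
            echo "unreadable: $f" > "$out_dir/$name"
        fi
    done
}
```

Taking one snapshot before and one after a test run, then diffing the two directories, shows which counters moved.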
The integration was accepted through staged hardware gates; re-run the relevant ones after any significant change:
- Firmware boot gate: fw boots with armed ports, `n2h_rx_pkts` climbing, two consecutive clean boots, pstore clean.
- Data plane gate: 100 attach/detach cycles per port under traffic with zero stuck cycles; rmmod with attached ports; no duplicate-delivery (host rings must stay silent: zero duplicate ICMP sequence numbers).
- All-ports gate: every physical port through the fw path simultaneously, including a PPPoE(+VLAN) uplink, with per-port RX verified by cable hops; 15-minute soak; memory flat.
- ECM gate: accelerated bulk flow at line rate with CPU ~idle; ECM stop returns flows to software path; Wi-Fi clients unaffected.
- SQM gate: shaper authority at a tight rate; RTT-under-load at the production rate; `sqm stop` clean; reboot-with-sqm-enabled boots safely (guard refuses, wired RX alive).
- Soak: multi-hour production traffic, followed by sane counters (`tx_busy=0`, `rx_unexpected=0`), flat memory, and an empty pstore.
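Cycle-count gates such as the 100 attach/detach cycles are tedious to drive by hand. The sketch below is a generic repeat-and-count loop, not the harness used for the original acceptance runs. `run_cycles` is a hypothetical name, and the per-cycle command is left to the caller because this page does not name the attach/detach interface. It counts non-zero exit statuses only, so a cycle that hangs would block the loop; detecting that would need a separate timeout.

```sh
#!/bin/sh
# Sketch: run one gate step repeatedly and count failing cycles.
# run_cycles is a hypothetical helper name.

# run_cycles <count> <command> [args...]
# Prints a summary line and returns non-zero if any cycle failed.
run_cycles() {
    count="$1"
    shift
    failed=0
    i=0
    while [ "$i" -lt "$count" ]; do
        "$@" || failed=$((failed + 1))
        i=$((i + 1))
    done
    echo "cycles=$count failed=$failed"
    [ "$failed" -eq 0 ]
}

# Possible use (the script name is a placeholder, not executed here):
#   run_cycles 100 sh /root/cycle_port.sh
```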