RTL-base — Technical Wiki

A complete design reference for the RTL-base Wi-Fi monitor + injection library, including every mathematical model the firmware uses. Notation: $f$ is the CPU frequency in Hz (measured $333{,}999{,}633$ Hz), and $c_{\mu s} = f/10^6 \approx 334$ is the integer cycles-per-microsecond constant.

Platform. Realtek RTL8721Dx (AmebaDplus), KM4 core. The silicon's CPUID reports Cortex-M33 r1p10 (ARMv8-M Mainline), Realtek's "Real-M300" implementation, running at 334 MHz under FreeRTOS, Ameba-RTOS SDK (tracks master; validated on 1.3.0). The part boots in TrustZone secure state, which gates the DWT cycle counter (see §8.1).

1. Overview

RTL-base is a Wi-Fi packet-injection and promiscuous-monitor component for the Realtek RTL8721Dx (AmebaDplus), KM4 core, built against Ameba-RTOS SDK 1.3.0 under FreeRTOS. A single facade, wifi_radio, coordinates three subsystems behind one interface: a nanosecond-precision composite clock (timer), a 32-injector ATCS scheduler (injector), and a lock-free promiscuous-capture pipeline (monitor).

Four operating modes cover every use case:

| Mode | Timer | Capture | Injection |
|---|---|---|---|
| `RADIO_IDLE` | up | — | — |
| `RADIO_MONITOR_ONLY` | up | yes | — |
| `RADIO_INJECT_ONLY` | up | — | yes |
| `RADIO_DUAL` | up | yes | yes, channel arbitrated with capture |

Part I (this User Guide, §1–7) walks through configuring, building, and using the component end to end. Part II is the complete design reference: every mathematical model the firmware implements, for when you need to know why a number is what it is.

The firmware (built with app_example.c) also exposes an interactive research console over the SDK monitor, for live control and telemetry — see §7.

2. Prerequisites and SDK setup

Target hardware: an RTL8721Dx (AmebaDplus) module with SiP PSRAM (e.g. BW20-12F), Ameba-RTOS SDK 1.3.0, FreeRTOS.

If you are starting from a clean SDK checkout rather than an existing project tree, set up the environment and target chip per the official SDK workflow:

```sh
source env.sh           # Linux  (env.bat on Windows)
ameba.py soc
```

RTL-base is a normal internal SDK component once its files are in place — see §3 for exactly where.

3. Applying the supplied configuration

Three configuration artifacts ship with this component and should be applied before building. Part II §12 documents the rationale behind every individual setting; this section is just the "what to do."

Menuconfig overlay (menuconfig.conf). A verified set of Kconfig symbols tuned for injection/capture throughput and ns-precision timing (tickless idle off, heap-integrity checks off, the optional Wi-Fi feature surface off, SSL off, memory placement pinned; SHELL stays on — the SDK monitor hosts the interactive console, §7). Apply it with:

```sh
ameba.py menuconfig
```

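then, inside the TUI, press [O] Load, point it at menuconfig.conf, [S] Save, [Q] Quit. Kconfig assigns every symbol not present in the loaded file its normal default, so this partial overlay still yields a complete, valid .config rather than blanking out unrelated settings. It does not merge with your previous values, though: any symbol outside the overlay that you had changed from its default reverts to that default, so re-apply such customizations after loading.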

Boot clock (ameba_bootcfg.c). Not a menuconfig option. The supplied file sets Boot_SocClk_Info_Idx = 5 (PLL_334M / 1.0 V / Flash or single die, no PSRAM; the relevant index rows are in §12.1).

Wi-Fi driver (ameba_wificfg.c). Power-save, aggregation, RF-calibration, and SKB-pool settings — see §12.2 for each one and why.

Component sources. Drop timer.{c,h}, injector.{c,h}, monitor.{c,h}, wifi_radio.{c,h}, app_example.c, and this component's CMakeLists.txt into your project tree the way ameba_add_internal_library already expects; the supplied CMakeLists.txt also adds the USB CDC-ACM include paths the monitor's default sink needs.

Note: this repo's usrcfg/ folder replaces the SDK's stock one — copy its contents over the official SDK's usrcfg/ folder before building.

4. Building, flashing and verifying

```sh
ameba.py clean
ameba.py build
ameba.py flash -p <PORT> --chip-erase
ameba.py monitor -p <PORT> -b 1500000
```

Two independent checks confirm that the configuration from §3 actually took effect. Both are worth doing, because platform_autoconf.h is regenerated from .config silently on every build:

Boot log: [BOOT-I] KM4 CPU CLK: 334000000 Hz — confirms the bootcfg index from §3 selected the 334 MHz / 1.0 V slot (full detail in §12.1).
Regenerated platform_autoconf.h: the CONFIG_MBEDTLS_SSL_IN_CONTENT_LEN / _OUT_CONTENT_LEN SSL buffers (16 KB + 4 KB) should be absent — confirms the overlay's SSL symbols took effect. (CONFIG_SHELL stays defined: the interactive console in §7 depends on it.)

5. Quick start: using the facade

All radio access goes through wifi_radio.h. Three minimal, complete examples:

Inject only — schedule one named injector transmitting a fixed frame on a fixed channel:

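```c
#include <stddef.h>
#include <stdint.h>
#include "wifi_radio.h"

/* Illustrative broadcast probe request (wildcard SSID): 802.11 header + body,
 * assumed to be passed without FCS. The SA is a placeholder locally
 * administered address; substitute your own. */
static const uint8_t probe_req[] = {
    0x40, 0x00,                         /* frame control: mgmt, probe request */
    0x00, 0x00,                         /* duration */
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* addr1 (DA): broadcast */
    0x02, 0x00, 0x00, 0x00, 0x00, 0x01, /* addr2 (SA): placeholder */
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* addr3 (BSSID): wildcard */
    0x00, 0x00,                         /* sequence control */
    0x00, 0x00,                         /* SSID element, length 0 = wildcard */
    0x01, 0x04, 0x02, 0x04, 0x0b, 0x16, /* Supported Rates: 1, 2, 5.5, 11 Mb/s */
};

void start_injection(void)
{
    radio_config_t cfg = RADIO_CONFIG_DEFAULT;
    cfg.mode           = RADIO_INJECT_ONLY;
    cfg.fixed_channel  = 6;

    if (radio_init(&cfg) != RADIO_OK) return;

    injectorManager *mgr = radio_get_injector();
    if (mgr == NULL) return;

    if (injectorManager_setInjector(mgr, "probe", probe_req, sizeof(probe_req),
                                    /* channel     */ 6,
                                    /* interval_ns */ 1000000ULL,  /* 1 ms */
                                    /* maxPackets  */ 0u,          /* 0 = unlimited */
                                    /* hwRetries   */ 3u) != INJ_OK)
        return;

    /* Activation can still be refused by admission control (§9.2). */
    (void)injectorManager_activateInjector(mgr, "probe");
}
```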

Monitor only — promiscuous capture streamed out as pcapng over the default USB-CDC sink:

```c
#include "wifi_radio.h"

void start_capture(void)
{
    radio_config_t cfg = RADIO_CONFIG_DEFAULT;
    cfg.mode           = RADIO_MONITOR_ONLY;
    cfg.fixed_channel  = 6;
    cfg.initial_filter = FILTER_ALL;
    cfg.promisc_mode   = MONITOR_PROMISC_ALL;
    cfg.output         = MONITOR_OUT_USB_CDC;

    radio_init(&cfg);
}
```

Dual — capture runs continuously; the injector briefly takes the channel under an explicit grant:

```c
#include "wifi_radio.h"

void inject_briefly_during_capture(void)
{
    radio_config_t cfg = RADIO_CONFIG_DEFAULT;   /* .mode is already RADIO_DUAL */

    if (radio_init(&cfg) != RADIO_OK) return;   /* nothing to arbitrate if init failed */
    radio_yield_channel_to_injector();    /* injector may now switch channel */
    /* ... configure + activate an injector via radio_get_injector() ...    */
    radio_reclaim_channel();              /* facade takes the channel back  */
}
```

radio_deinit() tears any mode down cleanly. See §11.1 for the full lifecycle state machine and §11.2 for the channel-grant protocol these calls implement.

6. Common tasks

Configuring an injector beyond the defaults:

```c
injectorManager_setRate(mgr, "probe", INJ_RATE_54M);
injectorManager_setChannel(mgr, "probe", 11);
injectorManager_setTxPower(mgr, "probe", 18);        /* dBm; clamped by regulatory power tables */
injectorManager_setIntervalNs(mgr, "probe", 500000ULL);
injectorManager_setBurstCount(mgr, "probe", 4u);
```

Enabling channel-hopping capture:

```c
int rc = radio_enable_hopping();   /* RADIO_ERR_BUSY if any injector is still active */
```

Reading per-operation cost telemetry (Welford statistics — §9.7):

```c
inj_cost_stats_t stats;
if (radio_get_cost_stats(&stats) == RADIO_OK) {
    /* stats.chan_switch_same / .chan_switch_cross / .tx_power / .tx_inject,
       each an inj_cost_metric_t: count, min_ns, max_ns, mean_ns, stddev_ns, last_ns */
}
```

One-time absolute-time calibration (disciplined holdover — §8.9):

```c
timer_set_cal_offset_ppb(12345);   /* measured ONCE against a traceable source (GPS/NTP) */
```

This is a crystal-only holdover clock, not an atomic clock — it minimizes drift after a one-time calibration and reports a growing uncertainty bound; it cannot track UTC without an external reference. §8.9 has the full accuracy budget.

7. Self-test and troubleshooting

Building with app_example.c included runs an exhaustive on-target self-test automatically at boot — pre-init error-path guards, all four facade modes, dual-band 2.4/5 GHz, 32-injector ultra load, ns-precision-under-load, the four cost-characterization regimes, radio-TSF correlation, and a multi-minute endurance soak — ending with:

```text
========== RESULTS ==========
  PASS = 222
  FAIL = 0
  OVERALL: *** ALL PASS ***
```

Any FAIL line names the specific check; §13.4 lists what the suite as a whole covers, and §13 generally has the expected baseline numbers (timing, cost, soak) to compare a new run against.

Interactive console

After the self-test the firmware drops into an interactive research console hosted by the SDK monitor over the LOGUART debug port (the same port the boot log uses; on the dev board it bridges to a host USB serial COM port). This is the control plane; the pcapng capture stream stays on its own USB-CDC endpoint (the data plane), so binary capture never interleaves with the console. It requires CONFIG_SHELL=y — the monitor is the host — and the research commands are registered into the monitor's command table (.cmd.table.data), so they appear alongside the built-in DW/EW/… diagnostics.

Commands are case-insensitive (the monitor upper-cases them); type RHELP for the list:

| Command | Action |
|---|---|
| `STATUS` | radio + injector + monitor + timer health |
| `MODE <idle\|monitor\|inject\|dual> [ch]` | (re)initialise the radio in a mode |
| `DOWN` | deinitialise the radio |
| `CHAN [n]` | get/set the active channel |
| `HOP [off]` | enable channel hopping (or pin/off) |
| `YIELD` / `RECLAIM` | DUAL: hand the channel to the injector / take it back |
| `FILTER <type>` | capture filter: all/data/mgmt/ctrl/beacon/probereq/proberesp |
| `FCS [on\|off]` | append FCS to captured frames |
| `CHSTATS <ch>` | per-channel capture statistics |
| `TSF` | read the 802.11 radio TSF |
| `INJ <list\|add\|rm\|on\|off\|rate\|power> …` | manage injectors |
| `COST [reset]` | per-operation ns cost statistics (§9.7) |
| `CLOCK` | disciplined-clock status (§8.9) |
| `CAL [ppb]` | get/set the one-time calibration offset (§8.9) |
| `QUALITY` | Allan-deviation clock quality (§8.6) |
| `SELFTEST` | re-run the full boot self-test |

Each command maps onto a verified wifi_radio / injector / monitor / timer public API. The injector stays a benign, bounded, CCA-on broadcast-probe transmitter — the console exposes configuration and telemetry, not attack primitives.

Boot-log lines worth recognizing:

| Line | Meaning |
|---|---|
| `OTP BOOT VOL choose 0.9V but usrcfg choose 1.0V!` | `Boot_SocClk_Info_Idx` doesn't match the OTP-fused voltage for this die — works, but over-volts the core. Fix per §12.1. |
| `[WLAN-I] LWIP consume heap 816` | Normal — the LWIP layer's fixed ~816 B allocation, not a leak. |
| `[CLK-I] [CAL4M] / [CAL131K] ... PPM: ...` | Factory calibration of the on-die RC oscillators (4 MHz / 131 kHz) — informational, unrelated to the crystal-derived SysTick/coarse clock the timer module actually uses. |

For anything not covered here, Part II is organized by subsystem (§8 Timer, §9 Injector, §10 Monitor, §11 Facade, §12 SDK configuration, §13 Measured performance) — start with the module that owns the symptom.

Part II — Module Reference

8. Timer

The timer provides nanosecond-precision monotonic timestamps from a composite clock chosen at runtime by probing the hardware.

8.1 Clock-source hierarchy

| Source | Use | Resolution |
|---|---|---|
| `dwt` | DWT CYCCNT, servo-disciplined — needs secure-debug (SPNIDEN); locked on this part | ~3 ns |
| `systick` | SysTick cycle accumulator, phase-disciplined against the coarse timer — active source | 3 ns |
| `coarse` | Free-running 1 MHz RTIM timer (TIMER10/11), discipline reference | 1000 ns |

The reported resolution is

$$ r = \left\lceil \frac{1000}{c_{\mu s}} \right\rceil = \left\lceil \frac{1000}{334} \right\rceil = 3 \text{ ns}. $$

8.2 Cycle ↔ nanosecond conversion

To avoid 64-bit overflow, conversions split whole microseconds from the sub-microsecond remainder. For ns → cycles, with $n_{\mu s} = \lfloor n/1000 \rfloor$ and $n_{sub} = n \bmod 1000$:

$$ \text{cyc}(n) = \frac{n_{\mu s}\, f}{10^6} + \left\lfloor \frac{n_{sub}\, f + 5\cdot10^8}{10^9} \right\rfloor. $$

For cycles → ns, with $s = \lfloor \text{cyc}/f \rfloor$ and $\rho = \text{cyc} \bmod f$:

$$ \text{ns}(\text{cyc}) = s\cdot10^9 + \left\lfloor \frac{\rho\cdot10^9 + f/2}{f} \right\rfloor. $$

The $+5\cdot10^8$ and $+f/2$ terms implement round-to-nearest.
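A minimal C sketch of the two conversions, written directly from the formulas above with $f$ fixed at the measured 333 999 633 Hz. The names `F_HZ`, `cyc_from_ns` and `ns_from_cyc` are illustrative, not the timer module's API. The first term of cyc(n) truncates, exactly as the formula is written.

```c
#include <stdint.h>

#define F_HZ 333999633ULL   /* measured CPU frequency f (illustrative constant) */

/* ns -> cycles: whole microseconds scaled by f/10^6, remainder rounded. */
static uint64_t cyc_from_ns(uint64_t n)
{
    uint64_t n_us  = n / 1000u;   /* whole microseconds */
    uint64_t n_sub = n % 1000u;   /* sub-microsecond remainder, ns */
    return (n_us * F_HZ) / 1000000u                    /* n_us * f / 10^6 */
         + (n_sub * F_HZ + 500000000u) / 1000000000u;  /* +5e8: round to nearest */
}

/* cycles -> ns: whole seconds exact, remainder rounded to the nearest ns. */
static uint64_t ns_from_cyc(uint64_t cyc)
{
    uint64_t s   = cyc / F_HZ;    /* whole seconds */
    uint64_t rho = cyc % F_HZ;    /* leftover cycles */
    return s * 1000000000u + (rho * 1000000000u + F_HZ / 2u) / F_HZ;  /* +f/2: round */
}
```

In this sketch the product $n_{\mu s} f$ stays below $2^{64}$ for $n$ up to about $5.5\cdot10^{13}$ ns (roughly 15 hours), and $\rho\cdot10^9$ is always below $3.4\cdot10^{17}$.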

8.3 Sub-microsecond interpolation

The fine clock composes the coarse 1 MHz count (microseconds) with a SysTick-derived sub-microsecond residual. With $w$ the SysTick cycles elapsed inside the current microsecond, $\varphi$ the learned phase offset (cycles), and the servo's frequency word $\nu$ (where $\nu = \text{round}(\text{ratio}\cdot D)$, $D = 10{,}000$):

$$ \text{sub}_{cyc} = \big((w \bmod c_{\mu s}) - \varphi + c_{\mu s}\big) \bmod c_{\mu s}, $$

$$ \text{sub}_{ns} = \min\!\left(999,\; \frac{\text{sub}_{cyc}\cdot 1000\cdot D}{c_{\mu s}\cdot \nu}\right) \approx \frac{\text{sub}_{cyc}\cdot 1000}{c_{\mu s}\cdot \text{ratio}}. $$

A shared monotonic clamp guarantees $t_{k+1} \ge t_k$ across all task and ISR readers.
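The sketch below evaluates the two §8.3 expressions in integer arithmetic, assuming $c_{\mu s} = 334$, $D = 10\,000$ and a phase offset already reduced to $0 \le \varphi < c_{\mu s}$. `sub_ns` and `clamp_monotonic` are hypothetical names. The clamp here is single-threaded, whereas the firmware's shared clamp must also be safe against ISR readers (for example by running inside a critical section).

```c
#include <stdint.h>

#define C_US_CYC        334u     /* integer cycles per microsecond, c_us */
#define FREQ_WORD_SCALE 10000u   /* D: nu = round(ratio * D) */

/* Sub-microsecond residual in ns.
 * w   : SysTick cycles elapsed inside the current microsecond
 * phi : learned phase offset in cycles, assumed 0 <= phi < C_US_CYC
 * nu  : servo frequency word                                         */
static uint32_t sub_ns(uint32_t w, uint32_t phi, uint32_t nu)
{
    uint32_t sub_cyc = ((w % C_US_CYC) + C_US_CYC - phi) % C_US_CYC;
    uint64_t ns = ((uint64_t)sub_cyc * 1000u * FREQ_WORD_SCALE)
                / ((uint64_t)C_US_CYC * nu);
    return ns > 999u ? 999u : (uint32_t)ns;   /* min(999, ...) */
}

/* Monotonic clamp: never return a value below the last one handed out. */
static uint64_t last_ns;
static uint64_t clamp_monotonic(uint64_t t)
{
    if (t < last_ns)
        t = last_ns;
    last_ns = t;
    return t;
}
```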

8.4 Kalman clock servo

A 500 ms-period task disciplines the SysTick-vs-coarse phase. State $\mathbf{x} = [\varphi, \rho]^\top$ (phase, phase-rate); constant-velocity model $F = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$, measurement $H = [1, 0]$.

Predict:

$$ \varphi^- = \varphi + \rho, \qquad P^- = F P F^\top + Q, $$

which expands (with process noise $Q = \mathrm{diag}(q_\varphi, q_\rho)$) to

$$ P^-_{00} = P_{00} + 2P_{01} + P_{11} + q_\varphi,\quad P^-_{01} = P_{01} + P_{11},\quad P^-_{11} = P_{11} + q_\rho. $$

Update with measured phase $z$ (modular innovation over modulus $c_{\mu s}$):

$$ \tilde{y} = \mathrm{wrap}(z - \varphi^-), \qquad S = P^-_{00} + R. $$

Outlier gate — reject and widen $Q$ if the normalized innovation exceeds $5\sigma$:

$$ \frac{\tilde{y}^{\,2}}{S} > \sigma_{\text{gate}}^2 = 25. $$

Otherwise apply the gain $K = [K_0, K_1]^\top = [P^-_{00}/S,\; P^-_{01}/S]^\top$:

$$ \varphi = \varphi^- + K_0\,\tilde{y}, \qquad \rho = \rho + K_1\,\tilde{y}, $$

$$ P_{00} = (1-K_0)P^-_{00},\quad P_{01} = (1-K_0)P^-_{01},\quad P_{11} = P^-_{11} - K_1 P^-_{01}. $$

After each accepted update the process noise anneals back toward its initial value, $q \leftarrow 0.99\,q + 0.01\,q_{\text{init}}$, so the filter recovers from a widened state.
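A minimal double-precision sketch of one servo period, following the predict, gate and update equations above with $c_{\mu s} = 334$ as the innovation modulus. The struct layout, the wrap helper, and in particular the factor-of-two widening of $Q$ on a rejected sample are assumptions; the text only says that $Q$ is widened. The firmware may well use fixed point instead.

```c
#include <stdbool.h>

#define PHASE_MOD 334.0   /* innovation modulus c_us, in cycles */
#define GATE2     25.0    /* (5 sigma)^2 */

typedef struct {
    double phi, rho;          /* phase (cycles), phase rate (cycles per period) */
    double p00, p01, p11;     /* symmetric 2x2 covariance */
    double q_phi, q_rho;      /* current process noise */
    double q_phi0, q_rho0;    /* initial process noise, the annealing target */
    double r;                 /* measurement noise R */
} servo_t;

/* Wrap v into [lo, lo + PHASE_MOD). */
static double wrap_from(double v, double lo)
{
    while (v >= lo + PHASE_MOD) v -= PHASE_MOD;
    while (v < lo) v += PHASE_MOD;
    return v;
}

/* One servo period: predict, gate, update. Returns false on an outlier. */
static bool servo_step(servo_t *s, double z)
{
    /* Predict with F = [[1,1],[0,1]] and Q = diag(q_phi, q_rho). */
    double phi_m = s->phi + s->rho;
    double p00 = s->p00 + 2.0 * s->p01 + s->p11 + s->q_phi;
    double p01 = s->p01 + s->p11;
    double p11 = s->p11 + s->q_rho;

    double y = wrap_from(z - phi_m, -PHASE_MOD / 2.0);  /* modular innovation */
    double S = p00 + s->r;

    if (y * y / S > GATE2) {
        /* Outlier: keep the prediction and widen Q (factor 2 is assumed). */
        s->phi = wrap_from(phi_m, 0.0);
        s->p00 = p00; s->p01 = p01; s->p11 = p11;
        s->q_phi *= 2.0; s->q_rho *= 2.0;
        return false;
    }

    double k0 = p00 / S, k1 = p01 / S;
    s->phi  = wrap_from(phi_m + k0 * y, 0.0);
    s->rho += k1 * y;
    s->p00  = (1.0 - k0) * p00;
    s->p01  = (1.0 - k0) * p01;
    s->p11  = p11 - k1 * p01;

    /* Anneal Q back toward its initial value. */
    s->q_phi = 0.99 * s->q_phi + 0.01 * s->q_phi0;
    s->q_rho = 0.99 * s->q_rho + 0.01 * s->q_rho0;
    return true;
}
```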

8.5 Frequency track and drift

Over a growing window the servo measures the true cycles-per-microsecond ratio against the coarse reference. With $\Delta\text{cyc}$ measured cycles over $\Delta_{\mu s}$ coarse microseconds and ideal count $\Delta\text{cyc}_{\text{ideal}} = \Delta_{\mu s}\cdot c_{\mu s}$:

$$ \text{ratio} = \frac{\Delta\text{cyc}}{\Delta\text{cyc}_{\text{ideal}}}, \qquad \text{drift}_{\text{ppm}} = (\text{ratio} - 1)\cdot 10^6, $$

clamped to $\pm10^5$ ppm. Measured steady drift is $-1$ to $-2$ ppm.
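A sketch of the ratio and drift computation with the $\pm10^5$ ppm clamp; `drift_ppm` is an illustrative name. For example, one second at the measured $f$ counted against $c_{\mu s} = 334$ evaluates to about $-1.1$ ppm.

```c
#include <stdint.h>

/* drift_ppm = (ratio - 1) * 1e6, ratio = d_cyc / (d_us * c_us), clamped. */
static double drift_ppm(uint64_t d_cyc, uint64_t d_us, double c_us)
{
    double ratio = (double)d_cyc / ((double)d_us * c_us);
    double ppm   = (ratio - 1.0) * 1e6;
    if (ppm >  1e5) ppm =  1e5;
    if (ppm < -1e5) ppm = -1e5;
    return ppm;
}
```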

8.6 Allan deviation

Clock quality is reported as the overlapping Allan deviation of the fractional-frequency samples $y_i$ at the window $\tau$:

$$ \sigma_y(\tau) = \sqrt{\frac{1}{2(M-1)} \sum_{i=1}^{M-1} \left(y_{i+1} - y_i\right)^2}, $$

where $M$ is the number of samples (so $M-1$ first-differences).

8.7 External 1PPS discipline

Given two consecutive PPS edge timestamps, the edge-to-edge residual is

$$ \varepsilon = (t_k - t_{k-1}) - 10^9 \text{ ns}. $$

The edge is accepted if $|\varepsilon| \le$ TIMER_PPS_TOLERANCE_NS ($10^6$ ns); lock asserts after TIMER_PPS_LOCK_EDGES $= 3$ good edges. The TAI mapping is offset-only (never re-steers the frequency loop):

$$ t_{\text{TAI}} = t_{\text{mono}} + \theta, \qquad \theta = (k\cdot 10^9) - t_k. $$
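A sketch of the edge-acceptance and lock logic, reusing the two constant names above with their documented values. Resetting the good-edge count on a rejected edge, and never accepting the very first edge (it has no predecessor), are assumptions about the firmware's behaviour; the TAI offset update is left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define TIMER_PPS_TOLERANCE_NS 1000000LL
#define TIMER_PPS_LOCK_EDGES   3u

typedef struct {
    int64_t  last_edge_ns;   /* previous edge, monotonic ns; -1 = none yet */
    uint32_t good_edges;     /* consecutive accepted edges */
    bool     locked;
} pps_state_t;

/* Feed one edge timestamp t_k; returns true if the edge was accepted. */
static bool pps_edge(pps_state_t *p, int64_t t_k)
{
    bool ok = false;
    if (p->last_edge_ns >= 0) {
        int64_t eps = (t_k - p->last_edge_ns) - 1000000000LL;   /* residual */
        ok = eps <= TIMER_PPS_TOLERANCE_NS && eps >= -TIMER_PPS_TOLERANCE_NS;
    }
    p->good_edges   = ok ? p->good_edges + 1u : 0u;
    p->locked       = p->good_edges >= TIMER_PPS_LOCK_EDGES;
    p->last_edge_ns = t_k;
    return ok;
}
```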

8.8 Spin-wait

Busy-waits are bounded to $[\text{MIN}, \text{MAX}] = [100\text{ ns}, 10\text{ ms}]$. With DWT they use wrap-safe signed cycle deadlines; without it (this part) they spin on the coarse base, so spin granularity is ~1 µs even though timestamps resolve to ~3 ns. Every loop carries a hard iteration cap and cannot hang.
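The wrap-safe deadline comparison is the usual signed-difference idiom. This sketch assumes a free-running 32-bit cycle counter; `spin_until` and its caller-supplied reader are hypothetical. Converting an out-of-range `uint32_t` to `int32_t` is implementation-defined in C, and GCC documents it as reduction modulo $2^{32}$.

```c
#include <stdbool.h>
#include <stdint.h>

#define SPIN_MIN_NS 100u
#define SPIN_MAX_NS 10000000u   /* 10 ms */

/* True once now has reached deadline, even across a counter wrap,
 * provided the wait is under 2^31 cycles (~6.4 s at 334 MHz). */
static bool deadline_reached(uint32_t now, uint32_t deadline)
{
    return (int32_t)(now - deadline) >= 0;
}

/* Bound a requested spin to [MIN, MAX]. */
static uint32_t clamp_spin_ns(uint32_t ns)
{
    if (ns < SPIN_MIN_NS) return SPIN_MIN_NS;
    if (ns > SPIN_MAX_NS) return SPIN_MAX_NS;
    return ns;
}

/* Spin with a hard iteration cap; false means the cap was hit. */
static bool spin_until(uint32_t (*read_cycles)(void), uint32_t deadline,
                       uint32_t max_iters)
{
    for (uint32_t i = 0; i < max_iters; i++)
        if (deadline_reached(read_cycles(), deadline))
            return true;
    return false;
}
```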

8.9 Disciplined holdover (crystal-only absolute time)

This is not an atomic clock. An atomic clock disciplines to a quantum transition; this board has only a crystal. With no external reference at runtime there is no information path to UTC, so absolute time cannot be tracked — only held. This layer squeezes the crystal to its physical limit and reports honest uncertainty. To genuinely track UTC, feed a GPS/CSAC 1PPS edge into timer_pps_edge() (§8.7).

The monotonic clock (§8.1–8.8) is unchanged and remains the injection-scheduling base. A separate accessor, timer_get_time_disciplined_ns(), applies an absolute correction:

$$ t_{\text{disc}} = t_{\text{epoch}} + (t_{\text{mono}} - t_{\text{epoch}})\,(1 - y_{\text{tot}}), \qquad y_{\text{tot}} = (\text{cal}_{\text{ppb}} + \text{therm}_{\text{ppb}})\times 10^{-9}. $$

Note $1\text{ ppb} = 1\text{ ns/s}$, so the correction is exact integer arithmetic. The epoch is re-snapshotted whenever $y_{\text{tot}}$ changes, keeping disciplined time continuous across rate updates.
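A sketch of the disciplined accessor in integer nanoseconds. It keeps separate monotonic and disciplined epoch values, which are equal at the first snapshot (the case the formula writes with a single $t_{\text{epoch}}$), so that re-snapshotting on a rate change stays continuous. `disc_clock_t`, `disc_now` and `disc_set_rate` are hypothetical names, not the timer API. The correction is split into whole seconds and a remainder to stay clear of 64-bit overflow, and the remainder part truncates.

```c
#include <stdint.h>

typedef struct {
    int64_t epoch_mono;   /* monotonic ns at the last snapshot */
    int64_t epoch_disc;   /* disciplined ns at that same instant */
    int64_t y_ppb;        /* cal_ppb + therm_ppb */
} disc_clock_t;

/* d ns at y ppb drifts by d * y / 1e9 ns (1 ppb = 1 ns/s). */
static int64_t ppb_correction_ns(int64_t d, int64_t y_ppb)
{
    return (d / 1000000000LL) * y_ppb
         + ((d % 1000000000LL) * y_ppb) / 1000000000LL;
}

static int64_t disc_now(const disc_clock_t *c, int64_t t_mono)
{
    int64_t d = t_mono - c->epoch_mono;
    return c->epoch_disc + d - ppb_correction_ns(d, c->y_ppb);  /* d * (1 - y) */
}

/* On a rate change, snapshot the current disciplined time first. */
static void disc_set_rate(disc_clock_t *c, int64_t t_mono,
                          int64_t cal_ppb, int64_t therm_ppb)
{
    int64_t y = cal_ppb + therm_ppb;
    if (y == c->y_ppb)
        return;
    c->epoch_disc = disc_now(c, t_mono);
    c->epoch_mono = t_mono;
    c->y_ppb      = y;
}
```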

One-time calibration constant. Measured once against any traceable source (GPS/NTP at provision time) and stored, applied via timer_set_cal_offset_ppb(). This nulls the bulk fixed offset; it is not a runtime reference.

Temperature feed-forward. The factory crystal S-curve (ameba_xtal_trackingcfg.c) gives frequency error vs a thermal abscissa $x$:

$$ \text{therm}_{\text{ppb}} = \frac{c_3 x^3 + c_2 x^2 + c_1 x + c_0}{\text{scale}}. $$

Evaluated each servo period from a registered temperature hook (timer_set_temp_source). It is off by default: AmebaDplus has no thermal driver (temperature comes from an ADC internal channel that must be confirmed per board), so the timer supplies the exact, configurable math and a hook rather than assuming a die-temp API.
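A sketch of the feed-forward term using Horner's rule in 64-bit integers. The coefficient values, the abscissa units and the `temp_source_fn` hook type are placeholders: the real coefficients live in ameba_xtal_trackingcfg.c, and the real hook signature is whatever timer_set_temp_source() accepts. With no hook registered the term is zero, matching the off-by-default behaviour.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder S-curve: therm_ppb = (c3 x^3 + c2 x^2 + c1 x + c0) / scale. */
typedef struct { int64_t c3, c2, c1, c0, scale; } xtal_curve_t;

/* Hypothetical hook: writes the thermal abscissa x, false if unavailable. */
typedef bool (*temp_source_fn)(int64_t *x_out);

static int64_t therm_ppb(const xtal_curve_t *k, int64_t x)
{
    /* Horner's rule: ((c3 x + c2) x + c1) x + c0 */
    int64_t num = ((k->c3 * x + k->c2) * x + k->c1) * x + k->c0;
    return num / k->scale;
}

/* Feed-forward term for one servo period; 0 when no hook is registered. */
static int64_t therm_term(const xtal_curve_t *k, temp_source_fn src)
{
    int64_t x;
    if (src == NULL || !src(&x))
        return 0;
    return therm_ppb(k, x);
}
```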

Honest uncertainty. timer_get_clock_status() reports a $\pm$ bound that grows with time since calibration, $u(t) = t_{\text{since cal}} \cdot \max(\sigma_y(\tau)\cdot10^9,\, 50)\text{ ns}$ with $t_{\text{since cal}}$ in seconds (Allan deviation floored at 50 ppb, i.e. 0.05 ppm), so the clock always knows how wrong it might be.
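The bound is a one-liner; this sketch (hypothetical name `uncertainty_ns`) takes the Allan deviation as a dimensionless $\sigma_y$ and the elapsed time in seconds. At the 50 ppb floor, one day since calibration gives $86\,400 \times 50 = 4.32\times10^{6}$ ns, i.e. ±4.32 ms.

```c
/* u(t) in ns: seconds since calibration times max(adev in ppb, 50 ppb). */
static double uncertainty_ns(double t_since_cal_s, double adev)
{
    double ppb = adev * 1e9;   /* 1 ppb = 1 ns/s */
    if (ppb < 50.0)
        ppb = 50.0;            /* 0.05 ppm floor */
    return t_since_cal_s * ppb;
}
```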

Accuracy budget (crystal-only): raw ±10–25 ppm (~1–2 s/day) → + one-time cal ±1–3 ppm → + thermal feed-forward ±0.5–2 ppm (~40–170 ms/day). The drift floor is physical; only an external 1PPS removes it.
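The per-day figures follow from multiplying the fractional error by the 86 400 s in a day:

$$ \Delta t_{\text{day}} = y \cdot 86\,400\ \text{s}: \qquad 10\ \text{ppm} \to 0.864\ \text{s}, \quad 25\ \text{ppm} \to 2.16\ \text{s}, \quad 0.5\ \text{ppm} \to 43.2\ \text{ms}, \quad 2\ \text{ppm} \to 172.8\ \text{ms}. $$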

9. Injector

A FreeRTOS task implementing the ATCS (Apparent Tardiness Cost with Setups) policy over up to 32 named injectors.

9.1 ATCS priority key

Each cycle, every ready injector $j$ is scored and the maximum is selected (an $O(N)$ argmax, since keys change every cycle):

$$ K(j) = \frac{w_j}{p_j}\,\exp\!\left(-\frac{\text{slack}_j}{k_1\,\bar{p}}\right)\exp\!\left(-\frac{s_{ij}}{k_2\,\bar{s}}\right), $$

where $w_j = \text{priority}_j + 1$, $p_j$ is estimated airtime, $\text{slack}_j$ is time-to-deadline, $s_{ij}$ is the predicted switch cost from current channel $i$, and $\bar{p}, \bar{s}$ are EMA averages. The tunables default to $k_1 = k_2 = 2.0$.

9.2 Admission control (Liu & Layland)

Activation is admitted only if total airtime utilization stays within the bound $U_b = 0.85$:

$$ U = \sum_{i \in \text{active}} \frac{p_i}{T_i} + \frac{p_{\text{cand}}}{T_{\text{cand}}} \;\le\; U_b. $$

With setup-aware admission enabled, the predicted per-period switch cost is folded in:

$$ U = \sum_i \frac{p_i + s_i}{T_i} \;\le\; U_b, $$

which is why dense multi-channel workloads (expensive cross-band switches) admit fewer than 32 injectors.
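A sketch of the admission test with $U_b = 0.85$, covering both the plain and the setup-aware sums; all times only need to share one unit, and `load_t` and `admit` are illustrative names. For example, eight injectors each using 10% of their period plus a 4% candidate give $U = 0.84$ and are admitted, but adding a per-period switch cost equal to each injector's airtime pushes $U$ above 1.

```c
#include <stdbool.h>
#include <stddef.h>

#define U_BOUND 0.85

typedef struct { double p, s, T; } load_t;   /* airtime, setup cost, period */

static bool admit(const load_t *active, size_t n, const load_t *cand, bool setup_aware)
{
    double u = 0.0;
    for (size_t i = 0; i < n; i++)
        u += (active[i].p + (setup_aware ? active[i].s : 0.0)) / active[i].T;
    u += (cand->p + (setup_aware ? cand->s : 0.0)) / cand->T;
    return u <= U_BOUND;
}
```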

9.3 Token-bucket fairness

Each injector holds a token bucket replenished 1:1 with elapsed airtime, capped at one interval. When tokens are exhausted the ATCS key is halved,

$$ K(j) \leftarrow \tfrac{1}{2}K(j), $$

deprioritizing a runaway injector so it cannot starve the others.
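A sketch of the per-injector bucket in nanoseconds of airtime credit. Saturating at zero when a frame costs more than the remaining credit is an assumption, and the names are illustrative.

```c
#include <stdint.h>

typedef struct {
    uint64_t tokens_ns;   /* current airtime credit */
    uint64_t cap_ns;      /* one interval */
} bucket_t;

/* Replenish 1:1 with elapsed airtime, capped at one interval. */
static void bucket_refill(bucket_t *b, uint64_t elapsed_ns)
{
    uint64_t t = b->tokens_ns + elapsed_ns;
    b->tokens_ns = t > b->cap_ns ? b->cap_ns : t;
}

/* Spend one frame's airtime, saturating at zero. */
static void bucket_spend(bucket_t *b, uint64_t airtime_ns)
{
    b->tokens_ns = airtime_ns > b->tokens_ns ? 0 : b->tokens_ns - airtime_ns;
}

/* Halve the ATCS key while the bucket is empty. */
static double bucket_adjust_key(const bucket_t *b, double key)
{
    return b->tokens_ns == 0 ? key * 0.5 : key;
}
```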

9.4 Fairness yield

If the selected deadline is already past, the loop transmits without sleeping. To prevent a backlog of perpetually-late injectors from pinning the CPU at scheduler priority, a one-tick vTaskDelay is forced every INJECTOR_FAIRNESS_YIELD_ITERS $= 8$ sleepless iterations.
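The forced yield reduces to a counter of consecutive sleepless iterations. In this sketch (hypothetical `fairness_should_yield`) the scheduler loop would call `vTaskDelay(1)` whenever it returns true; resetting the counter whenever the loop does sleep is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

#define INJECTOR_FAIRNESS_YIELD_ITERS 8u

/* Returns true when the caller should force a one-tick delay. */
static bool fairness_should_yield(uint32_t *sleepless, bool deadline_past)
{
    if (!deadline_past) {        /* loop sleeps normally: streak ends */
        *sleepless = 0;
        return false;
    }
    if (++*sleepless >= INJECTOR_FAIRNESS_YIELD_ITERS) {
        *sleepless = 0;          /* yield now, start a new streak */
        return true;
    }
    return false;
}
```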

9.5 Airtime model

Estimated airtime for a frame of $L$ bytes at PHY rate $R$ (Mb/s) follows the standard 802.11 form

$$ T_{\text{air}} \approx T_{\text{pre+hdr}} + \frac{8L}{R}, $$

with the rate code first mapped to 100-kb/s units (covering CCK, OFDM, HT, VHT 1SS, and HE 1SS). This $p_j = T_{\text{air}}$ feeds both the ATCS key and admission control.
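A sketch of the airtime estimate in integer nanoseconds with the rate already expressed in 100 kb/s units ($R_{100} = 10R$), so $8L/R$ µs becomes $8L\cdot10^4/R_{100}$ ns. The legacy-rate table and the preamble argument are illustrative: 20 µs is the 802.11a/g OFDM preamble plus SIGNAL field and 192 µs is the DSSS long preamble plus PLCP header, while HT/VHT/HE preambles are longer and format-dependent.

```c
#include <stdint.h>

/* Legacy rates in 100 kb/s units: CCK 1, 2, 5.5, 11 Mb/s; OFDM 6..54 Mb/s. */
static const uint16_t legacy_rate_100k[] = {
    10, 20, 55, 110, 60, 90, 120, 180, 240, 360, 480, 540
};

/* T_air ~ T_pre+hdr + 8L/R, in ns, with R in 100 kb/s units. */
static uint64_t airtime_ns(uint32_t len_bytes, uint32_t rate_100k, uint32_t pre_hdr_ns)
{
    return pre_hdr_ns + (8ull * len_bytes * 10000ull) / rate_100k;
}
```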

9.6 Jitter EMA and robust averages

Timing averages use a robust exponential moving average with $\alpha = 1/8$; samples above a hard ceiling (INJECTOR_SWITCH_COST_MAX_NS $= 100$ ms) are rejected as outliers:

$$ e_{k+1} = \frac{7 e_k + x_k}{8}. $$
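A sketch of the robust EMA with $\alpha = 1/8$ in integer nanoseconds; seeding the average with the first accepted sample is an assumption, and the names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

#define INJECTOR_SWITCH_COST_MAX_NS 100000000ull   /* 100 ms */

/* e <- (7e + x) / 8; returns false if x was rejected as an outlier. */
static bool ema_update(uint64_t *e, bool *seeded, uint64_t x)
{
    if (x > INJECTOR_SWITCH_COST_MAX_NS)
        return false;
    if (!*seeded) {
        *e = x;
        *seeded = true;
        return true;
    }
    *e = (7u * *e + x) / 8u;
    return true;
}
```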

9.7 Per-operation cost statistics (Welford)

Independently of the predictor EMAs, the scheduler accumulates exact statistics for the ns cost of each operation — same-band switch, cross-band switch, TX-power set, raw-frame inject — using Welford's online algorithm. For each accepted sample $x$ (with running count $n$, mean $\mu$, sum-of-squares $M_2$):

$$ \delta = x - \mu_{n-1}, \quad \mu_n = \mu_{n-1} + \frac{\delta}{n}, \quad M_2 \mathrel{+}= \delta\,(x - \mu_n), $$

$$ \sigma^2 = \frac{M_2}{n-1}, \qquad \sigma = \sqrt{\sigma^2}. $$

Read via injectorManager_getCostStats(); it reports $n$, min, max, $\mu$, $\sigma$, and last per operation.

When each cost is actually incurred. These are per-operation costs, not per-frame overheads, and the scheduler only pays them when the operation genuinely happens:

TX-power set is short-circuited in apply_tx_power(): if the requested power equals the current power it returns immediately with no wifi_set_tx_power() call. So in any steady workload where injectors hold a fixed power, the tx_power cost is zero — it is effectively pinned at runtime with no action needed. The ~0.75 ms tx_power figures in the self-test (Phase 9.3/9.4) appear only because that phase deliberately assigns a different power per injector to characterize the switch cost, then asserts it is nonzero. It is a probe, not a workload cost.
Channel switch (wifi_set_channel(), ~1.7 ms warm) is paid only on an actual channel change for the winning injector, gated by the grant predicate. It is the dominant real cost and is hardware-bound (PLL relock); set_channel_api_do_rfk = 0 already removes the per-switch RFK that would otherwise make it ~8 ms.
Raw-frame inject (~17–34 µs) is the only cost paid every frame.

Consequently the 25-of-32 admission result in the ultra phase is driven by the airtime-utilization bound plus channel-switch setup cost under a deliberately multi-channel/multi-power stress configuration — not by power-set cost. With injectors sharing a channel and power, setup_cost_raw_ns() contributes nothing and far more injectors admit.

9.8 Exponential backoff and reliable TX

On TX failure with EXP_BACKOFF, the wait doubles per step up to a cap:

$$ B(k) = B_0 \cdot 2^{\min(k,\,6)}, \qquad B_0 = 1 \text{ ms}. $$

In reliable (lossless) TX mode a frame refused with back-pressure is retried with a fixed $60\ \mu s$ backoff up to INJECTOR_RELIABLE_MAX_ATTEMPTS $= 256$ attempts (yielding one tick periodically) rather than dropped — trading deadline precision for zero TX drops.
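A sketch of both policies. `tx_fn` and `wait_fn` are hypothetical hooks (0 = sent, positive = back-pressure, negative = hard error), and the periodic one-tick yield of the real reliable loop is omitted.

```c
#include <stdint.h>

#define B0_NS 1000000ull                    /* B_0 = 1 ms */
#define INJECTOR_RELIABLE_MAX_ATTEMPTS 256u

/* B(k) = B0 * 2^min(k, 6): 1, 2, 4, ... capped at 64 ms. */
static uint64_t backoff_ns(uint32_t k)
{
    return B0_NS << (k < 6u ? k : 6u);
}

typedef int  (*tx_fn)(void *ctx);      /* hypothetical transmit hook */
typedef void (*wait_fn)(uint32_t us);  /* hypothetical delay hook */

/* Reliable mode: retry a back-pressured frame with a fixed 60 us backoff. */
static int reliable_tx(tx_fn tx, void *ctx, wait_fn wait_us)
{
    for (uint32_t a = 0; a < INJECTOR_RELIABLE_MAX_ATTEMPTS; a++) {
        int rc = tx(ctx);
        if (rc <= 0)
            return rc;    /* sent, or a hard error */
        wait_us(60u);
    }
    return 1;             /* still back-pressured after the attempt cap */
}
```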

9.9 Error codes

INJ_OK (0) and INJ_ERR (−1), plus the INJ_ERR_-prefixed codes: NOT_FOUND (−2), INVALID_ARG (−3), NO_SPACE (−4), TIMER (−5), CHANNEL (−6), BUSY (−7), RATE (−8), POWER (−9), STATE (−10), ADMISSION (−11), UNSUPPORTED (−12).

10. Monitor

A lock-free promiscuous-capture pipeline emitting pcapng.

10.1 Pipeline

Four stages: (1) the driver promisc callback copies each frame into a fixed pool slot and pushes it to a lock-free SPSC ring; (2) monitor_task filters, updates statistics, and assembles pcapng Enhanced Packet Blocks; (3) writer_task drains blocks to the sink (USB-CDC or SPI); (4) hopper_task selects channels.

The SPSC ring of size $N$ (a power of two) uses bit-masked indices, $\text{idx} = i \mathbin{\&} (N-1)$, so head/tail wrap with no modulo and no lock between the single producer (ISR) and single consumer.
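A sketch of the ring using C11 atomics. It assumes free-running 32-bit head/tail counters (so `head - tail` is the fill level even across wrap, because $N$ divides $2^{32}$), that only the producer pushes, and that only the consumer pops. Release/acquire ordering publishes each slot write before the index that exposes it. The names and the slot type are illustrative.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_N 64u   /* must be a power of two */

typedef struct {
    void *slot[RING_N];
    _Atomic uint32_t head;   /* written only by the producer (ISR) */
    _Atomic uint32_t tail;   /* written only by the consumer */
} spsc_ring_t;

static bool ring_push(spsc_ring_t *r, void *p)
{
    uint32_t h = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t t = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (h - t == RING_N)
        return false;                          /* full */
    r->slot[h & (RING_N - 1u)] = p;            /* masked index, no modulo */
    atomic_store_explicit(&r->head, h + 1u, memory_order_release);
    return true;
}

static bool ring_pop(spsc_ring_t *r, void **out)
{
    uint32_t t = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t h = atomic_load_explicit(&r->head, memory_order_acquire);
    if (h == t)
        return false;                          /* empty */
    *out = r->slot[t & (RING_N - 1u)];
    atomic_store_explicit(&r->tail, t + 1u, memory_order_release);
    return true;
}
```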

10.2 RCU filter double-buffer

Filter reconfiguration copies the active set to a standby, mutates it, atomically swaps the active pointer, then waits for readers (tracked by refcount) holding the old set to drain — keeping the read path lock-free.
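A sketch of the double buffer with C11 atomics, assuming a single (serialised) writer. The reader re-checks the active pointer after taking its reference, which closes the race with a concurrent swap. The writer spins while readers drain; on FreeRTOS the firmware would more plausibly yield or block there. All names, and the bitmask standing in for the filter set, are illustrative.

```c
#include <stdatomic.h>
#include <stdint.h>

typedef struct {
    uint32_t type_mask;          /* illustrative filter contents */
    _Atomic uint32_t readers;    /* readers currently holding this set */
} filter_set_t;

static filter_set_t sets[2];
static _Atomic(filter_set_t *) active = &sets[0];

/* Reader: take a reference on the active set, lock-free. */
static filter_set_t *filter_acquire(void)
{
    for (;;) {
        filter_set_t *f = atomic_load(&active);
        atomic_fetch_add(&f->readers, 1);
        if (f == atomic_load(&active))
            return f;                        /* still current: safe to read */
        atomic_fetch_sub(&f->readers, 1);    /* raced with a swap: retry */
    }
}

static void filter_release(filter_set_t *f)
{
    atomic_fetch_sub(&f->readers, 1);
}

/* Writer: copy active -> standby, mutate, swap, wait for old readers. */
static void filter_update(uint32_t set_bits, uint32_t clear_bits)
{
    filter_set_t *old     = atomic_load(&active);
    filter_set_t *standby = (old == &sets[0]) ? &sets[1] : &sets[0];

    while (atomic_load(&standby->readers) != 0) { }   /* stale readers backing off */
    standby->type_mask = (old->type_mask & ~clear_bits) | set_bits;
    atomic_store(&active, standby);                   /* publish */
    while (atomic_load(&old->readers) != 0) { }       /* drain the old set */
}
```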

10.3 UCB1 channel hopping

The hopper treats channels as arms of a multi-armed bandit. With total observations $T = \sum_i n_i$, mean reward $\bar{r}_i$ (normalized traffic) and visit count $n_i$, it selects

$$ i^\star = \arg\max_i \left( \bar{r}_i + c\sqrt{\frac{\ln T}{n_i}} \right), \qquad c = 1.5, $$

balancing exploitation ($\bar{r}_i$) against exploration of under-sampled channels. Dwell time adapts from a base $D_0 = 80$ ms, extended for uncertain channels and multiplied by $1.5\times$ for bursty ones, clamped to $[30, 500]$ ms:

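$$ D_i = \mathrm{clamp}\big(D_0 \cdot u_i \cdot \beta_i,\; 30,\; 500\big)\ \text{ms}, \qquad \beta_i = \begin{cases} 1.5 & \text{if channel } i \text{ is bursty } (b_i), \\ 1 & \text{otherwise,} \end{cases} $$

where $u_i$ is the uncertainty-extension factor for channel $i$.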

10.4 Per-channel statistics

Per-channel RSSI mean and variance use the same Welford recurrence as §9.7; an EMA traffic metric drives the hopper's reward term.

10.5 pcapng output

Frames are wrapped with a radiotap header (TSFT, rate, channel frequency, RSSI) as Enhanced Packet Blocks, preceded by a Section Header Block and an Interface Description Block (nanosecond timestamps), with periodic Interface Statistics Blocks.
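A sketch of packing one Enhanced Packet Block following the pcapng layout: block type 6, total length, interface ID, 64-bit timestamp split into upper and lower words, captured and original lengths, packet data padded to 32 bits, and the repeated total length. Writing little-endian explicitly is an assumption (the SHB's byte-order magic tells readers which order was used). Nanosecond timestamps also require the IDB to carry if_tsresol = 9, and radiotap-wrapped 802.11 uses link type 127 (LINKTYPE_IEEE802_11_RADIOTAP). No EPB options are emitted, and `pcapng_epb` is an illustrative name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static void put32le(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;         p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16); p[3] = (uint8_t)(v >> 24);
}

/* Build an EPB for len bytes of radiotap + 802.11 data; 0 if cap is too small. */
static size_t pcapng_epb(uint8_t *out, size_t cap, uint32_t if_id,
                         uint64_t ts_ns, const uint8_t *data, uint32_t len)
{
    uint32_t padded = (len + 3u) & ~3u;   /* pad data to 32 bits */
    size_t   total  = 32u + padded;       /* 28-byte header + data + trailer */
    if (cap < total)
        return 0;
    put32le(out + 0,  0x00000006u);               /* block type: EPB */
    put32le(out + 4,  (uint32_t)total);           /* block total length */
    put32le(out + 8,  if_id);                     /* interface ID */
    put32le(out + 12, (uint32_t)(ts_ns >> 32));   /* timestamp, upper */
    put32le(out + 16, (uint32_t)ts_ns);           /* timestamp, lower */
    put32le(out + 20, len);                       /* captured length */
    put32le(out + 24, len);                       /* original length */
    memcpy(out + 28, data, len);
    memset(out + 28 + len, 0, padded - len);
    put32le(out + 28 + padded, (uint32_t)total);  /* trailing total length */
    return total;
}
```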

10.6 Radio TSF

monitor_get_radio_tsf() returns the 64-bit 802.11 MAC time in microseconds — the clock sniffers print — so captures, injections, and CSI align on one timeline.

10.7 CSI (opt-in)

Gated behind MONITOR_ENABLE_CSI, CSI capture maps to the SDK wifi_csi_config() + RTW_EVENT_CSI_DONE path, decoding the per-subcarrier I/Q payload. For subcarrier $k$ with components $I_k, Q_k$, amplitude and phase are

$$ a_k = \sqrt{I_k^2 + Q_k^2}, \qquad \theta_k = \operatorname{atan2}(Q_k, I_k). $$

Important: on this silicon CSI is link-sounded — it requires an associated peer (this device joined to an AP, or SoftAP with a client). Armed in pure promiscuous mode with no association it yields no reports.

11. Wi-Fi Radio Facade

Coordinates timer, injector, and monitor behind one interface.

11.1 Lifecycle state machine

Mutex-guarded transitions validated against a strict table; invalid transitions return RADIO_ERR_STATE, and a ROLLBACK state unwinds a partial init in reverse:

$$ \text{UNINIT} \to \text{STARTING} \to \text{RUNNING} \to \text{STOPPING} \to \text{STOPPED} \to \text{UNINIT}, $$

with STARTING → ROLLBACK on any subsystem failure.
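A table-driven sketch of the validation. The `ST_*` names are illustrative, the mutex is omitted, and ROLLBACK → UNINIT is an assumption (the text only says ROLLBACK unwinds a partial init). An illegal transition returns −2, the value of RADIO_ERR_STATE in §11.5.

```c
#include <stdbool.h>

typedef enum {
    ST_UNINIT, ST_STARTING, ST_RUNNING, ST_STOPPING, ST_STOPPED, ST_ROLLBACK, ST_COUNT
} radio_state_t;

/* allowed[from][to] */
static const bool allowed[ST_COUNT][ST_COUNT] = {
    [ST_UNINIT]   = { [ST_STARTING] = true },
    [ST_STARTING] = { [ST_RUNNING] = true, [ST_ROLLBACK] = true },
    [ST_RUNNING]  = { [ST_STOPPING] = true },
    [ST_STOPPING] = { [ST_STOPPED] = true },
    [ST_STOPPED]  = { [ST_UNINIT] = true },
    [ST_ROLLBACK] = { [ST_UNINIT] = true },   /* assumed */
};

/* 0 on success, -2 (RADIO_ERR_STATE) if the transition is not in the table. */
static int transition(radio_state_t *s, radio_state_t to)
{
    if (!allowed[*s][to])
        return -2;
    *s = to;
    return 0;
}
```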

11.2 Channel-grant protocol

The channel is a revocable, single-owner resource:

$$ \text{NONE} \;\Rightarrow\; \text{FACADE} \;\rightleftarrows\; \text{INJECTOR}, \qquad \text{FACADE} \to \text{HOPPER}. $$

Only the grant holder may switch channels. In DUAL the initial grant is FACADE; the app may yield to the injector and reclaim. Enabling the hopper requires no active injectors (enforced interlock → RADIO_ERR_BUSY).
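A sketch of the grant transitions listed above together with the hopper interlock. Returning RADIO_ERR_GRANT for an illegal move is an assumption; the code values match §11.5. The enum names are illustrative, and transitions out of HOPPER are omitted because the text does not list them.

```c
#include <stdbool.h>

typedef enum { GRANT_NONE, GRANT_FACADE, GRANT_INJECTOR, GRANT_HOPPER } grant_t;

#define RADIO_OK          0
#define RADIO_ERR_BUSY  (-8)
#define RADIO_ERR_GRANT (-9)

static int grant_move(grant_t *g, grant_t to, unsigned active_injectors)
{
    bool ok = (*g == GRANT_NONE     && to == GRANT_FACADE)   ||
              (*g == GRANT_FACADE   && to == GRANT_INJECTOR) ||
              (*g == GRANT_INJECTOR && to == GRANT_FACADE)   ||
              (*g == GRANT_FACADE   && to == GRANT_HOPPER);
    if (!ok)
        return RADIO_ERR_GRANT;
    if (to == GRANT_HOPPER && active_injectors > 0)
        return RADIO_ERR_BUSY;                /* hopper interlock */
    *g = to;
    return RADIO_OK;
}

/* Only the current holder may switch channels. */
static bool may_switch_channel(grant_t g, grant_t who)
{
    return g == who;
}
```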

11.3 Telemetry

radio_get_health_ex() (mode, channel, grant, timer/injector/monitor stats, servo diagnostics), radio_get_cost_stats() / radio_reset_cost_stats() (the §9.7 cost metrics), and radio_get_tsf() (§10.6).

11.4 Heap budget

DUAL requires ≈31 KB (manager + scheduler stack + three monitor stacks + queue/semaphores), comfortable against ≈44 KB free internal SRAM, or far more with PSRAM.

11.5 Error codes

RADIO_OK (0) and RADIO_ERR (−1), plus the RADIO_ERR_-prefixed codes: STATE (−2), ARG (−3), TIMER (−4), INJECT (−5), MONITOR (−6), NXIO (−7), BUSY (−8), GRANT (−9).

12. SDK configuration (usrcfg)

The Ameba-RTOS SDK exposes board-level tuning through usrcfg/ source files compiled into the firmware. The settings below are optimized for uninterrupted injection and capture on the BW20-12F.

12.1 Boot clock — `ameba_bootcfg.c`

The SoC PLL and core voltage are selected by Boot_SocClk_Info_Idx into the SocClk_Info table. The supplied ameba_bootcfg.c sets the index to 5 (for both the CONFIG_USB_DEVICE_EN and non-USB branches). The relevant rows of the shipped table are:

| Index | PLL | KM4 div | Voltage | Target |
|---|---|---|---|---|
| 2 | PLL_334M | CLKDIV(1) → 334 MHz | 1.0 V | SiP PSRAM |
| 5 | PLL_334M | CLKDIV(1) → 334 MHz | 1.0 V | Flash / single die (no PSRAM) — selected |

Valid_Boot_Idx_for_SiP_Psram[] = {0, 1, 2, 6, 7, 8} and Valid_Boot_Idx_for_No_Psram[] = {3, 4, 5, 6, 7, 8} enumerate the indices each die type may use; index 5 is a valid no-PSRAM slot. Both index 2 and index 5 yield the same 334 MHz KM4 clock at CORE_VOL_1P0 (1.0 V). On a board whose OTP fuses are programmed for 0.9 V, selecting a 1.0 V index produces the boot line OTP BOOT VOL choose 0.9V but usrcfg choose 1.0V!; it runs but over-volts the core. If your die is fused for 0.9 V, choose a 0.9 V index from the valid set for your die type (e.g. index 0 or 1 for SiP PSRAM).
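The two validity lists translate into a simple membership check; this sketch copies them verbatim, and `boot_idx_valid` is an illustrative name rather than an SDK function.

```c
#include <stdbool.h>
#include <stddef.h>

static const int valid_sip_psram[] = {0, 1, 2, 6, 7, 8};
static const int valid_no_psram[]  = {3, 4, 5, 6, 7, 8};

/* True if idx is a permitted Boot_SocClk_Info_Idx for this die type. */
static bool boot_idx_valid(int idx, bool has_sip_psram)
{
    const int *set = has_sip_psram ? valid_sip_psram : valid_no_psram;
    size_t n = has_sip_psram
             ? sizeof valid_sip_psram / sizeof *valid_sip_psram
             : sizeof valid_no_psram / sizeof *valid_no_psram;
    for (size_t i = 0; i < n; i++)
        if (set[i] == idx)
            return true;
    return false;
}
```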

12.2 Wi-Fi driver — `ameba_wificfg.c`

These settings directly affect injection throughput, channel-switch latency, and capture reliability.

NP SKB pool — controls in-flight TX capacity:

$$\text{SKBs} \times \text{buf\_size} \le \text{NP pool} - \text{headroom}.$$

With skb_num_np = 32 and MAX_SKB_BUF_SIZE ≈ 1.9 KB, the NP pool is the tightest resource on this build. The supplied ameba_wificfg.c enforces a floor at runtime: if skb_num_np < rx_ampdu_num + skb_num_np_rsvd it is raised to that sum (logged via RTK_LOGW). With A-MPDU disabled (see below) the default-path reservation is skb_num_np_rsvd = 6 with rx_ampdu_num = 0, so the floor is 6 and the configured 32 sits well above it. Enabling WTN (wtn_en) overrides this to skb_num_np = 20 with skb_num_np_rsvd = 16. Measured free NP heap after wifi_on() is ~5.7 KB (boot log NP heap: 5736, wifi used: 61376); this figure is fixed at Wi-Fi init for dual-band operation and is not materially changed by the SKB count or A-MPDU settings — see §12.4.
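The runtime floor described above amounts to a single comparison; `skb_np_floor` is an illustrative name, not the function in ameba_wificfg.c.

```c
#include <stdint.h>

/* Raise skb_num_np to rx_ampdu_num + skb_num_np_rsvd if it is below that sum. */
static uint32_t skb_np_floor(uint32_t skb_num_np, uint32_t rx_ampdu_num,
                             uint32_t skb_num_np_rsvd)
{
    uint32_t min_np = rx_ampdu_num + skb_num_np_rsvd;
    return skb_num_np < min_np ? min_np : skb_num_np;
}
```

With the supplied values (32, 0, 6) the configured 32 passes through unchanged.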

Channel-switch acceleration:

| Setting | Value | Effect |
|---|---|---|
| `set_channel_api_do_rfk` | `0` | Skip RF calibration on every `wifi_set_channel` — drops same-band switch cost from ~8 ms to ~1.7 ms (measured). The PLL is already locked; a per-switch RFK is unnecessary for injection. |
| `rf_calibration_disable` | `0` | Full RF calibration runs during `wifi_on()`. The supplied file leaves this enabled (`0`); set it to `1` only if you have verified the factory calibration in OTP/EFUSE is sufficient for your board and want to trim ~200 ms off boot. |

Power save — hard off (mandatory):

| Setting | Value | Reason |
|---|---|---|
| `ips_enable` | `0` | Inactive Power Save disabled — the radio must never gate itself between injection bursts. |
| `lps_enable` | `0` | Legacy Power Save disabled — no DTIM-based sleep. |
| `ips_ctrl_by_usr` | `0` | Prevent user-level IPS re-enable. |

If IPS or LPS are left on, the radio silently sleeps between bursts and injection timing collapses.

Aggregation (A-MPDU): off

Both ampdu_rx_enable and ampdu_tx_enable are 0 in the supplied file, with rx_ampdu_num = 0 and tx_ampdu_num = 0 on the default path. Raw-frame injection bypasses the aggregation path entirely, and promiscuous capture works on individual MPDUs, so the A-MPDU reorder buffers are unused here. Disabling A-MPDU trims a small amount of RX-path work and relaxes the SKB floor (rx_ampdu_num no longer contributes), and was measured to reduce the per-frame inject cost (Phase 9.1 tx-inject mean fell from ~26 µs to ~17 µs across runs). Trade-off: this build can no longer reassemble aggregated frames as a managed STA. Re-enable both flags (with a nonzero rx_ampdu_num) only if the device must act as a high-throughput station.

EDCCA (energy-detect CCA):

rtw_edcca_mode = RTW_EDCCA_CS selects the Carrier-Sense (fixed-threshold) energy-detect mode (see enum rtw_edcca_mode in wifi_api_types.h). For sustained injection on a busy 2.4 GHz band this is a better choice than the SDK default RTW_EDCCA_NORM: the dynamic NORM threshold adapts to real-time RSSI and will intermittently hold off TX when it senses energy, which was observed to produce sparse TX drops and reduced injection throughput under the 8-injector 2.4 GHz stress test. CS is more permissive and deterministic. The injector additionally bypasses TX carrier-sense at runtime via injectorManager_setCcaBypass(mgr, true); the EDCCA floor is a separate, regulatory-level gate. Other selectable values are RTW_EDCCA_ADAPT (fixed ETSI-adaptivity threshold) and RTW_EDCCA_DISABLE (no energy-detect gate; closed-lab only).

RX sensitivity:

rx_cca_thresh = 0 selects the driver's dynamic minimum-RX-power mechanism — the most sensitive setting, correct for a capture build so weak frames are not rejected. The documented override range is [-100, 0) dBm for a fixed floor; a positive value is out of range.
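The documented meaning of `rx_cca_thresh` reduces to three cases. The sketch below classifies a candidate value. The enum and the function are hypothetical helpers, not SDK API.

```c
/* Three documented cases for rx_cca_thresh (values in dBm). */
typedef enum { CCA_DYNAMIC, CCA_FIXED_FLOOR, CCA_OUT_OF_RANGE } cca_kind_t;

cca_kind_t classify_rx_cca_thresh(int dbm) {
    if (dbm == 0)
        return CCA_DYNAMIC;      /* driver's dynamic min-RX-power: most sensitive */
    if (dbm >= -100 && dbm < 0)
        return CCA_FIXED_FLOOR;  /* fixed floor within [-100, 0) dBm */
    return CCA_OUT_OF_RANGE;     /* positive or below -100 dBm */
}
```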

Band selection:

freq_band_support = RTW_SUPPORT_BAND_MAX enables dual-band (2.4 + 5 GHz). If your workload is 2.4 GHz only, change to RTW_SUPPORT_BAND_2_4G — then every channel switch is a same-band relock (~1.7 ms) and you never pay the ~15 ms cold cross-band penalty.
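A rough switch-cost estimator makes the band trade-off concrete. The sketch below makes four assumptions:

- Channels follow the usual numbering, with 1 to 14 on 2.4 GHz and everything higher on 5 GHz.
- Costs use the §13.2 means plus the ~15 ms cold cross-band figure.
- No retune happens when the channel is unchanged.
- The function name is hypothetical.

With `RTW_SUPPORT_BAND_2_4G`, only the same-band branch can occur.

```c
#include <stdbool.h>

/* Assumed numbering: 1-14 is 2.4 GHz, anything higher is treated as 5 GHz. */
static bool is_24g(int ch) { return ch >= 1 && ch <= 14; }

/* Approximate switch cost in microseconds. */
int est_switch_cost_us(int from_ch, int to_ch, bool cross_band_warm) {
    if (from_ch == to_ch)
        return 0;                           /* assumed: no retune needed */
    if (is_24g(from_ch) == is_24g(to_ch))
        return 1690;                        /* same-band relock mean */
    return cross_band_warm ? 1750 : 15000;  /* warm mean vs ~15 ms cold */
}
```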

Antenna diversity:

antdiv_mode = RTW_ANTDIV_DISABLE — the BW20-12F has a single on-board PCB antenna; enabling diversity adds a per-frame decision loop with no second antenna to switch to. The injector's setAntenna/getAntenna correctly return INJ_ERR_UNSUPPORTED when this is disabled.

12.3 Other config files

File	Status	Notes
`ameba_sleepcfg.c`	Stock	XTAL off during sleep is fine — the radio is fully awake during injection. Low-power UART RX disabled.
`ameba_pinmapcfg.c`	Stock	BW20-12F default. SPI output pins (PB17–PB21) pulled down; change PB17 to `GPIO_PuPd_UP` (CS idle-high) if using the SPI capture sink.
`ameba_flashcfg.c`	Stock	Quad-IO at `CLKDIV(2)` (~167 MHz SPIC). No change needed — XIP performance is not on the injection hot path.
`ameba_xtal_trackingcfg.c`	Stock	Crystal S-curve and cap-sensitivity coefficients. These feed the factory calibration the timer's frequency track runs on top of; do not modify.
Power tables (`ameba_wifi_power_table_usrcfg.c`)	Modified	Regulatory TX power limits. The supplied file is not stock: the FCC tables (`tx_pwr_limit_2g_fcc` / `tx_pwr_limit_5g_fcc`) are overridden to `127` on every channel/rate (a "no software limit" shielded-lab override, marked in-file). The other enabled domains (MKK, ETSI, IC, KCC, WW, CN) retain their original values; ACMA/CHILE/MEXICO/UKRAINE/QATAR/UK/NCC/EXT are stubbed to a single `{0}` entry. `regu_en[16]` enables FCC, MKK, ETSI, IC, KCC, WW, and CN. The injector's `setTxPower` is clamped by these tables via `wifi_set_tx_power`. Values are in 0.25 dBm units; `127` means "no software limit." See §12.5 for the table layout and lookup function.

12.4 Heap budget summary

The heap footprint is dominated by a fixed allocation made at wifi_on() for dual-band operation, before any injectors run. The measured boot figures are stable across builds:

$$\text{NP free} \approx 5.7\ \text{KB}\;(\texttt{NP heap: 5736}), \qquad \text{AP free} \approx 63\ \text{KB}\;(\texttt{AP heap: 63000}).$$

In the 2-minute DUAL soak the general free heap holds flat at ~27.9 KB (heap=27856, worst dip 0 B). Two findings from measurement are worth recording, because they bound what ameba_wificfg.c can do:

The init allocation does not move with the SKB count or A-MPDU settings. Disabling A-MPDU and dropping rx_ampdu_num to 0 did not recover general heap (the floor stayed at 27856 across runs); the gain from that change is per-frame inject cost, not memory. The fixed dual-band allocation is the floor.
The only ameba_wificfg.c lever that materially reduces this allocation is going single-band (freq_band_support = RTW_SUPPORT_BAND_2_4G or _5G), which frees the unused band's calibration/table memory. That directly conflicts with the dual-band requirement, because the memory it frees is exactly the dual-band allocation. Choose one.

Because the heap held flat with zero leak across the full soak, the configuration is stable; the margin is simply smaller than a single-band build would give.
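A "worst dip" figure like the one above can be derived from periodic free-heap samples. The sketch below assumes the term means the largest drop below the first (baseline) sample. The helper is hypothetical, and the caller supplies the sampling source.

```c
#include <stddef.h>
#include <stdint.h>

/* Largest drop of a free-heap trace below its first (baseline) sample,
 * in bytes; 0 means the heap never fell below where the soak started. */
uint32_t heap_worst_dip(const uint32_t *free_bytes, size_t n) {
    if (n == 0)
        return 0;
    uint32_t base = free_bytes[0], min = base;
    for (size_t i = 1; i < n; i++)
        if (free_bytes[i] < min)
            min = free_bytes[i];
    return base - min;
}
```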

12.5 TX power limits — `ameba_wifi_power_table_usrcfg.c`

This file supplies the per-domain regulatory TX-power-limit tables and the lookup function the PHY layer calls. It is compiled from the SDK's usrcfg/ directory in place of the stock copy.

Enabled domains. regu_en[16] is ordered {FCC, MKK, ETSI, IC, KCC, ACMA, CHILE, MEXICO, WW, GL, UKRAINE, CN, QATAR, UK, NCC, EXT}. The supplied file enables FCC, MKK, ETSI, IC, KCC, WW, and CN; the remaining domains are 0. regu_en_array_len is derived with sizeof. pwrlmt_regu_remapping[] (with array_len_of_pwrlmt_regu_remapping) lets a customer remap a domain code onto another domain's limits; it ships as a single zeroed entry.
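The `regu_en[16]` ordering is easy to get wrong by one position. The sketch below is a hypothetical mirror (`regu_en_sketch`, not the file's own array). It pairs each flag with its domain name and derives the length with `sizeof`. The `unsigned char` element type is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Domain order of regu_en[16], as listed above. */
static const char *const regu_names[16] = {
    "FCC", "MKK", "ETSI", "IC", "KCC", "ACMA", "CHILE", "MEXICO",
    "WW", "GL", "UKRAINE", "CN", "QATAR", "UK", "NCC", "EXT"};

/* Flags matching the supplied file: FCC, MKK, ETSI, IC, KCC, WW, CN. */
static const unsigned char regu_en_sketch[16] = {1, 1, 1, 1, 1, 0, 0, 0,
                                                 1, 0, 0, 1, 0, 0, 0, 0};

/* Length derived with sizeof, in the spirit of regu_en_array_len. */
static const size_t regu_en_sketch_len =
    sizeof(regu_en_sketch) / sizeof(regu_en_sketch[0]);

/* 1 if the named domain is enabled, 0 if disabled, -1 if not listed. */
int regu_domain_enabled(const char *name) {
    for (size_t i = 0; i < regu_en_sketch_len; i++)
        if (strcmp(regu_names[i], name) == 0)
            return regu_en_sketch[i] ? 1 : 0;
    return -1;
}

/* Number of enabled domains. */
size_t regu_enabled_count(void) {
    size_t n = 0;
    for (size_t i = 0; i < regu_en_sketch_len; i++)
        n += regu_en_sketch[i] != 0;
    return n;
}
```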

Table layout. Each enabled domain has a 2.4 GHz table tx_pwr_limit_2g_<domain>[4][14] (rows: CCK, OFDM, HT_B20, HT_B40; columns: 14 channels), a 5 GHz table tx_pwr_limit_5g_<domain>[3][28] (rows: OFDM, HT_B20, HT_B40; the HT_B40 row uses 14 valid entries), and a spectral-shaping table tx_shap_<domain>[2][4]. Limit values are 0.25 dBm units (e.g. 68 → 17 dBm); 127 is the sentinel for "no software limit." Disabled domains are stubbed as [][CH_NULL] arrays of {0}.

FCC shielded-lab override. The supplied file sets every entry of tx_pwr_limit_2g_fcc[4][14] and tx_pwr_limit_5g_fcc[3][28] to 127, removing the software limit for FCC channels. This is an explicit lab override (so labelled in the file) and must be reverted to compliant values before any over-the-air use outside a shielded enclosure. tx_shap_fcc is left at its default {{0,1,1,1},{0,0,0}}.
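Because the override must be reverted before over-the-air use, a pre-flight check can refuse to proceed while any FCC entry is still at 127. The sketch below uses a hypothetical `count_no_limit` helper and a C99 variable-length array parameter. It assumes the real tables use a `signed char` element type.

```c
#include <stddef.h>

#define PWR_LMT_NO_LIMIT 127 /* "no software limit" sentinel */

/* Counts entries of a rows x cols limit table still at the sentinel.
 * A nonzero count on the FCC tables means the shielded-lab override
 * has not been reverted. */
size_t count_no_limit(size_t rows, size_t cols, signed char table[rows][cols]) {
    size_t n = 0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            if (table[r][c] == PWR_LMT_NO_LIMIT)
                n++;
    return n;
}
```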

byRate baseline. array_mp_txpwr_byrate_2g[] and array_mp_txpwr_byrate_5g[] give the per-rate baseline TX power (also 0.25 dBm units), with their *_array_len companions computed via sizeof. These are the default-rate power points the limit tables then cap.
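The sketch below shows the capping relationship in simplified form. It assumes the cap is a plain minimum in the shared 0.25 dBm units, and that 127 passes the byRate point through unchanged. The real PHY path may apply further offsets. Both helper names are hypothetical.

```c
#define PWR_LMT_NO_LIMIT 127 /* "no software limit" sentinel */

/* Effective per-rate power in 0.25 dBm units: the byRate baseline,
 * capped by the regulatory limit unless the limit is the sentinel. */
int effective_qdbm(int byrate_qdbm, int limit_qdbm) {
    if (limit_qdbm == PWR_LMT_NO_LIMIT)
        return byrate_qdbm;
    return byrate_qdbm < limit_qdbm ? byrate_qdbm : limit_qdbm;
}

/* 0.25 dBm units to dBm, e.g. 68 -> 17.0. */
double qdbm_to_dbm(int qdbm) { return qdbm / 4.0; }
```

As a raw number, 127 would decode to 31.75 dBm. In the limit tables it is only ever a sentinel, which is why it is tested before the minimum is taken.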

Lookup function. wifi_hal_phy_get_power_limit_value(u8 regulation, u8 band, u8 limit_rate, u8 chnl_idx, bool is_shape) is the single entry point the PHY uses. It switches on the TXPWR_LMT_* regulation code, indexes the matching 2.4 GHz table when band == RTW_BAND_ON_24G and the 5 GHz table otherwise, and returns either the spectral-shaping index (is_shape == true) or the power limit (is_shape == false). Unmatched regulations and TXPWR_LMT_WW shaping return the defaults (127 limit, -1 shape).
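The sketch below is a reduced model of the lookup's control flow. It is not the SDK function:

- It covers one domain only: FCC, under the lab override.
- Hypothetical constants stand in for the SDK's `TXPWR_LMT_*` and band codes.
- The `[band][rate]` indexing of the shaping table is an assumption.
- So is the out-of-range fallback.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; real codes come from the SDK headers. */
enum { SK_LMT_FCC = 0 };
enum { SK_BAND_24G = 0, SK_BAND_5G = 1 };

#define SK_NO_LIMIT 127 /* "no software limit" sentinel, 0.25 dBm units */

static int8_t sk_fcc_2g[4][14]; /* rows: CCK, OFDM, HT_B20, HT_B40 */
static int8_t sk_fcc_5g[3][28]; /* rows: OFDM, HT_B20, HT_B40 */
/* Default FCC shaping values; [band][rate] indexing is assumed. */
static const int8_t sk_shap_fcc[2][4] = {{0, 1, 1, 1}, {0, 0, 0}};

/* Apply the shielded-lab override: every FCC entry becomes 127. */
void sk_init_fcc_override(void) {
    for (int r = 0; r < 4; r++)
        for (int c = 0; c < 14; c++)
            sk_fcc_2g[r][c] = SK_NO_LIMIT;
    for (int r = 0; r < 3; r++)
        for (int c = 0; c < 28; c++)
            sk_fcc_5g[r][c] = SK_NO_LIMIT;
}

/* Switch on regulation, pick the 2.4 or 5 GHz table by band, and return
 * the shaping index (is_shape) or the limit. Unmatched regulations and
 * out-of-range indices keep the defaults (127 limit, -1 shape). */
int8_t sk_get_power_limit_value(uint8_t regulation, uint8_t band,
                                uint8_t limit_rate, uint8_t chnl_idx,
                                bool is_shape) {
    int8_t limit = SK_NO_LIMIT;
    int8_t shape = -1;
    switch (regulation) {
    case SK_LMT_FCC:
        if (band == SK_BAND_24G) {
            if (limit_rate < 4 && chnl_idx < 14)
                limit = sk_fcc_2g[limit_rate][chnl_idx];
            if (limit_rate < 4)
                shape = sk_shap_fcc[0][limit_rate];
        } else {
            if (limit_rate < 3 && chnl_idx < 28)
                limit = sk_fcc_5g[limit_rate][chnl_idx];
            if (limit_rate < 4)
                shape = sk_shap_fcc[1][limit_rate];
        }
        break;
    default:
        break;
    }
    return is_shape ? shape : limit;
}
```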

13. Measured performance

All figures measured on hardware (RTL8721Dx @ 334 MHz, SysTick clock source).

13.1 Timer

Metric	Value
Resolution	3 ns (SysTick)
Steady drift	−1 to −2 ppm, servo converged ~4 s
Monotonicity	0 backsteps over 4000 reads under 32-injector load
Spin granularity	~1 µs (coarse-bounded on secure part)
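The resolution row is one SysTick cycle at the measured $f$: $1/333{,}999{,}633\ \text{Hz} \approx 2.99$ ns. The sketch below shows an overflow-safe cycle-to-nanosecond conversion and the ppm unit used in the drift row. Both helpers are hypothetical, and the firmware's own conversion may differ, for example by using the integer $c_{\mu s}$.

```c
#include <stdint.h>

/* Cycle count at f_hz to nanoseconds, truncating. Whole seconds are
 * split off first; the remainder is < f_hz < 2^32, so rem * 1e9 stays
 * below 2^64. */
uint64_t cycles_to_ns(uint64_t cycles, uint32_t f_hz) {
    uint64_t sec = cycles / f_hz;
    uint64_t rem = cycles % f_hz;
    return sec * 1000000000ULL + rem * 1000000000ULL / f_hz;
}

/* Offset of a measured frequency from a nominal one, in ppm. */
double freq_offset_ppm(double measured_hz, double nominal_hz) {
    return (measured_hz - nominal_hz) / nominal_hz * 1e6;
}
```

A single cycle truncates to 2 ns here, and 334 cycles come out at exactly 1000 ns. Applied to the head's figures, `freq_offset_ppm(333999633.0, 334e6)` puts the measured clock about −1.1 ppm from a nominal 334 MHz.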

13.2 Injector — per-operation cost (steady state)

Operation	n	min	mean	max	σ
Raw-frame inject (fixed ch/pwr)	19082	4.4 µs	16.6 µs	260 µs	10.4 µs
Channel switch, same band	994	0.83 ms	1.69 ms	2.72 ms	0.24 ms
Channel switch, cross band	452	1.43 ms	1.75 ms	1.96 ms	0.05 ms
TX-power set (probe only)	2642	0.60 ms	0.75 ms	1.84 ms	0.10 ms

Raw-frame inject improved to a ~16.6 µs mean after disabling A-MPDU (§12.2). The TX-power-set row is measured only in the synthetic multi-power phase; it is not paid in a fixed-power workload (§9.7). Cross-band switching is ~1.7 ms warm (steady state) versus ~15 ms cold. Jitter: 1-injector mean ~45 µs (σ ~8 µs); 8-injector mean ~18 µs (σ ~2 µs).
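One way to produce the n/min/mean/max/σ columns above from per-operation timings in a single pass is Welford's running update. The struct and function names in the sketch below are hypothetical. The document does not say which σ estimator it reports, so population variance is assumed here; σ is its square root.

```c
#include <stdint.h>

/* Running statistics for one table row (values in microseconds). */
typedef struct {
    uint64_t n;
    double min, max, mean, m2; /* m2: sum of squared deviations */
} cost_stats_t;

void cost_stats_init(cost_stats_t *s) {
    s->n = 0;
    s->min = s->max = s->mean = s->m2 = 0.0;
}

/* Welford's update: one pass, O(1) memory, numerically stable. */
void cost_stats_add(cost_stats_t *s, double x_us) {
    s->n++;
    if (s->n == 1 || x_us < s->min) s->min = x_us;
    if (s->n == 1 || x_us > s->max) s->max = x_us;
    double delta = x_us - s->mean;
    s->mean += delta / (double)s->n;
    s->m2 += delta * (x_us - s->mean);
}

/* Population variance; returns 0 for an empty set. */
double cost_stats_variance(const cost_stats_t *s) {
    return s->n ? s->m2 / (double)s->n : 0.0;
}
```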

13.3 Endurance soak (2 min, DUAL, reliable TX)

144,432 frames injected, 0 TX drops, 0 monitor drops, heap flat (0 B dip), worst drift 1 ppm, scheduler never stuck.
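The following gives a rough sense of the soak's load. It assumes the frames were spread evenly over the full 120 s, and that the §13.2 steady-state inject mean of ~16.6 µs applies to each frame:

$$\frac{144{,}432\ \text{frames}}{120\ \text{s}} \approx 1{,}204\ \text{frames/s}, \qquad 1{,}204\ \text{s}^{-1} \times 16.6\ \mu\text{s} \approx 20\ \text{ms per second} \approx 2\%.$$

On those assumptions, time spent inside the inject call is on the order of 2% of wall time at this rate.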

13.4 Self-test

222 checks, 0 failures across the full suite: pre-init guards, all four facade modes, dual-band (2.4 + 5 GHz), 32-injector ultra load, ns-precision-under-load, the four cost-characterization regimes, radio-TSF correlation, and the endurance soak.