
RaptorQ (RFC 6330) FEC layer for the stream link #86

Merged
josephnef merged 2 commits into master from raptorq-fec
Jun 7, 2026

Conversation

@josephnef
Collaborator

Summary

The corruption survey in #85 showed real-range OFDM frames on this link will see 30–70% loss. tun_p2p.py's blind --repeat N is a fixed-cost workaround that can't compose to handle the tail; this PR ships a real erasure code on top of the existing stream framing.

Library

raptorq from cberner (Rust+PyO3 binding to the RFC 6330 reference port). MIT, manylinux abi3 wheels on PyPI, ~26 Gbps enc / ~7 Gbps dec at K=1000 on commodity x86. uv add raptorq is the only install step.

Wire format

The existing stream.py framing stays untouched. FEC is an inner envelope living inside StreamFrame.payload:

   FEC_MAGIC      (2)  = 0xF52E
   VERSION/FLAGS  (1)  = 0
   K              (1)  = source symbols per block
   KREAL          (1)  = real source symbols in this block (≤ K). Trailing
                        (K - KREAL) decoded symbols are zero-pad to discard.
   SYMBOL_SIZE    (2)  = LE u16
   BLOCK_ID       (2)  = LE u16 wraps
   RAPTORQ_PKT    (var) = lib-managed SBN+ESI+symbol
   inner overhead   = 9 B + raptorq's 4 B SBN/ESI = 13 B
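
A minimal sketch of how the 9-byte inner header above could be packed and parsed with Python's `struct` module. The helper names `pack_envelope` and `unpack_envelope` are hypothetical and are not taken from `stream_fec.py`. The PR states little-endian only for SYMBOL_SIZE and BLOCK_ID; treating the magic as little-endian too, and requiring at least one real symbol per block, are assumptions.

```python
import struct

FEC_MAGIC = 0xF52E
# Assumption: all multi-byte fields are little-endian; the PR only says so
# for SYMBOL_SIZE and BLOCK_ID.
# Fields: magic, version/flags, K, KREAL, symbol_size, block_id.
HEADER = struct.Struct("<HBBBHH")


def pack_envelope(k, kreal, symbol_size, block_id, raptorq_pkt, version=0):
    """Prefix a library-serialised RaptorQ packet with the 9-byte header."""
    # Assumption: a block always carries at least one real source symbol.
    if not 1 <= kreal <= k <= 0xFF:
        raise ValueError("need 1 <= KREAL <= K <= 255")
    # BLOCK_ID is a u16 that wraps, so mask it before packing.
    return HEADER.pack(FEC_MAGIC, version, k, kreal, symbol_size,
                       block_id & 0xFFFF) + raptorq_pkt


def unpack_envelope(payload):
    """Return (k, kreal, symbol_size, block_id, raptorq_pkt) or None if malformed."""
    if len(payload) < HEADER.size:
        return None
    magic, version, k, kreal, symbol_size, block_id = HEADER.unpack_from(payload)
    if magic != FEC_MAGIC or version != 0 or not 1 <= kreal <= k:
        return None
    return k, kreal, symbol_size, block_id, payload[HEADER.size:]
```

Returning `None` for anything that fails the checks mirrors the "garbage envelopes" case in the unit-test list; the real decoder may well count or log such frames differently.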

Source symbols are themselves concatenations of length-prefixed IP packets:

[u16 len_a][packet_a]…[u16 len_b][packet_b]…[zero pad to SYMBOL_SIZE]

So small packets (ACK floods) share symbols instead of each burning a whole symbol's worth of airtime.
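
The packing scheme can be sketched as follows. This is an illustration under stated assumptions, not the code in `stream_fec.py`: the length prefix is taken to be little-endian, a packet is assumed never to span two symbols, and a zero length prefix is assumed to mark the start of the zero padding. The function names are hypothetical.

```python
import struct

LEN = struct.Struct("<H")  # assumption: little-endian u16 length prefix


def pack_symbols(packets, symbol_size):
    """Greedily pack length-prefixed packets into zero-padded symbols."""
    symbols, current = [], bytearray()
    for pkt in packets:
        # Assumption: a packet (plus its prefix) must fit in a single symbol.
        if not pkt or LEN.size + len(pkt) > symbol_size:
            raise ValueError("packet must be non-empty and fit in one symbol")
        record = LEN.pack(len(pkt)) + pkt
        if len(current) + len(record) > symbol_size:
            # Current symbol is full: zero-pad it and start a new one.
            symbols.append(bytes(current.ljust(symbol_size, b"\0")))
            current = bytearray()
        current += record
    if current:
        symbols.append(bytes(current.ljust(symbol_size, b"\0")))
    return symbols


def unpack_symbol(symbol):
    """Yield the packets in one symbol; a zero length prefix marks the padding."""
    offset = 0
    while offset + LEN.size <= len(symbol):
        (length,) = LEN.unpack_from(symbol, offset)
        if length == 0 or offset + LEN.size + length > len(symbol):
            break
        offset += LEN.size
        yield symbol[offset:offset + length]
        offset += length
```

With a 128-byte symbol, two 40-byte packets share one symbol (84 bytes used) while a following 100-byte packet starts the next one.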

Files

  • tools/precoder/pyproject.toml — add raptorq>=2.
  • tools/precoder/stream_fec.py: FecConfig, FecEncoder (concatenation packing + block encoding), FecDecoder (block-incremental decode + late-symbol drop + block expiry).
  • tools/precoder/test_stream_fec.py — 19 unit tests: round-trip, loss tolerance 0/20/40% at R/K=1, 50% at R/K=2, unrecoverable-block bookkeeping at 70%, concatenation, partial flush, block-id wrap, MTU enforcement, garbage envelopes.
  • tools/precoder/tun_p2p.py — new --fec-k/--fec-overhead/--fec-symbol-size/--fec-flush-ms/--fec-block-expire-ms flags. tx_thread feeds packets through the encoder; a parallel fec_flush_thread force-encodes partial blocks every flush-ms (sparse traffic doesn't stall). rx_thread feeds payloads through the decoder; decoded IP packets go to TUN. Outer SeqWindow dedup is forced OFF when FEC is on (RaptorQ symbols self-dedup via SBN+ESI). New fec=[...] segment in the periodic stderr report. Docstring extended.
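
The periodic-flush idea in the tun_p2p.py bullet can be sketched like this. `StubEncoder` is a hypothetical stand-in that models only the `add_packet` / `flush` calls named in this PR; the real `FecEncoder` presumably returns envelopes to transmit, which the stub does not attempt to reproduce. The shared lock is likewise an assumption about how the TX and flush threads would coordinate.

```python
import threading


class StubEncoder:
    """Hypothetical stand-in for FecEncoder: only add_packet/flush are modelled."""

    def __init__(self):
        self.pending = []
        self.flushed_blocks = []

    def add_packet(self, pkt):
        self.pending.append(pkt)

    def flush(self):
        # Seal whatever is buffered as one (partial) block; no-op when empty.
        if self.pending:
            self.flushed_blocks.append(self.pending)
            self.pending = []


def run_flusher(encoder, lock, stop, flush_ms):
    """Force-encode any partial block every flush_ms until `stop` is set."""
    # Event.wait returns False on timeout, True once stop.set() is called.
    while not stop.wait(flush_ms / 1000.0):
        with lock:  # the TX thread is assumed to hold this lock around add_packet
            encoder.flush()
```

Using `Event.wait` as the sleep means the thread exits promptly on shutdown instead of finishing a full flush interval first.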

Hardware verification

Two-netns single-host bench (RTL8812AU 0x8812 + TP-Link Archer T2U Plus / RTL8821AU 2357:0120, ch 6, no --repeat, ping -c 30 -i 1):

| Config | RTT min/avg/max | Loss | DUP | Blocks ok/lost |
|---|---|---|---|---|
| --fec-k 16 --fec-overhead 1.0 --fec-flush-ms 50 | 121 / 160 / 207 ms | 0% | 0 | 30 / 1 (startup) |
| --fec-k 8 --fec-overhead 1.0 --fec-flush-ms 20 | 73 / 95 / 145 ms | 0% | 0 | 30 / 1 (startup) |

The K=8 config trades a bit of recovery margin for a 65 ms drop in average RTT (160 ms to 95 ms). Both decode 100% of source packets on a healthy link; the survey's noisier regimes are what motivate --fec-overhead > 1.

For comparison, PR #82's earlier numbers (same bench, byte mode):

| Mode | Loss | Avg RTT |
|---|---|---|
| Byte mode --repeat 1 | 10% | 7 ms |
| Byte mode --repeat 4 + dedup | 0% | 10 ms (with up to 25 DUPs per ping eaten by dedup) |
| FEC K=8 R/K=1 flush=20 | 0% | 95 ms |

FEC moves us from "blind redundancy + dedup" to "real erasure code". The latency cost is the K-source-symbol encode buffer; the win is that the codec scales gracefully to higher loss rates by raising --fec-overhead instead of running out at --repeat=∞.
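
To make the scaling argument concrete, here is a rough back-of-envelope model. It assumes independent per-symbol losses (real links are bursty) and an ideal decoder that succeeds once any K of the K + R symbols arrive; the `extra` parameter lets you require a few more. Both function names are hypothetical, and the model is not a measurement from this bench.

```python
from math import comb


def block_recovery_prob(k, repair, loss, extra=0):
    """P(at least k + extra of the k + repair symbols survive i.i.d. loss)."""
    n, need = k + repair, k + extra
    return sum(comb(n, i) * (1 - loss) ** i * loss ** (n - i)
               for i in range(need, n + 1))


def repeat_delivery_prob(copies, loss):
    """P(at least one of `copies` blind repeats of a frame survives)."""
    return 1 - loss ** copies


if __name__ == "__main__":
    for loss in (0.3, 0.5, 0.7):
        print(f"loss={loss:.0%}",
              f"K=8,R=8: {block_recovery_prob(8, 8, loss):.3f}",
              f"K=8,R=16: {block_recovery_prob(8, 16, loss):.3f}",
              f"repeat 4: {repeat_delivery_prob(4, loss):.3f}")
```

Under this model, raising R at fixed K raises the recovery probability at any given loss rate, which is the knob `--fec-overhead` exposes.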

Test plan

  • cd tools/precoder && uv run pytest → 87 passed (31 pipeline + 37 stream + 19 fec)
  • python -m pytest tests/precoder_smoke.py tests/precoder_stream_smoke.py → 8 passed
  • tun_p2p.py --help parses cleanly (incl. all FEC flags)
  • Bench: K=16/R=1 and K=8/R=1, both 30/30 ping with 0% loss and 0 DUPs

Open caveats (documented in script)

  • Strict block boundaries — no cross-block FEC, no Raptor carousel. Good enough at K=8–16 + 20–50 ms flush; revisit if the latency budget tightens further.
  • No rateless dynamic overhead — R/K is fixed at construction. A future PR could let RX hint TX to send more repair symbols via a reverse-channel feedback envelope.
  • Patent note: the Qualcomm patents covering RFC 6330 have largely expired in primary jurisdictions by 2026; cberner's MIT lib explicitly notes this.

Builds on #82 (TUN bridge, merged), #83 (corrupted-frame surfacing, merged), #84 (phy soft metrics, open), #85 (corruption survey, open).

🤖 Generated with Claude Code

josephnef and others added 2 commits June 7, 2026 17:15
The corruption survey in PR #85 showed that real-range OFDM frames on
this link will see 30-70% loss. tun_p2p.py's blind --repeat N is a
fixed-cost workaround; this PR ships a real erasure code on top of
the existing stream framing.

* tools/precoder/pyproject.toml: add `raptorq>=2` (cberner's PyO3
  binding to the Rust RFC 6330 implementation; abi3 manylinux wheels
  on PyPI, MIT, ~26 Gbps enc / ~7 Gbps dec at K=1000).
* tools/precoder/stream_fec.py: FecConfig (K, symbol_size, overhead)
  + FecEncoder.add_packet / .flush + FecDecoder.add_symbol /
  .expire_blocks_older_than. IP packets are concatenation-packed into
  K-symbol blocks (u16 length prefix + payload per packet, zero pad
  to symbol_size). RaptorQ produces K + ceil(K*overhead) output
  symbols per block; receiver decodes from any K+ε of them.
* tools/precoder/test_stream_fec.py: 19 unit tests — round-trip with
  no loss; loss tolerance @ 0/20/40% (R/K=1 reliably recovers there);
  50% loss recovers only at R/K=2; 70% loss is unrecoverable
  (asserts the expire-blocks bookkeeping fires); concatenation
  packing; partial-block flush; block-id wrap; MTU enforcement;
  garbage-envelope handling.
* tools/precoder/tun_p2p.py: new --fec-k / --fec-overhead /
  --fec-symbol-size / --fec-flush-ms / --fec-block-expire-ms flags.
  tx_thread feeds packets through FecEncoder; a parallel
  fec_flush_thread force-encodes partial blocks every flush-ms so
  sparse traffic doesn't stall. rx_thread feeds payloads through
  FecDecoder, writing decoded IP packets to TUN. RX-side seq dedup is
  forced off when FEC is on (RaptorQ symbols are dedup-friendly via
  SBN+ESI). New `fec=[...]` segment in the periodic stderr report.
* tun_p2p.py docstring extended with an FEC section.

Inner envelope (lives inside StreamFrame.payload):
  MAGIC(2)=0xF52E  VER(1)=0  K(1)  KREAL(1)  SYMBOL_SIZE(2)
  BLOCK_ID(2)  RAPTORQ_PKT(var)   = 9 B header + raptorq-managed
  SBN+ESI+symbol.

Source symbols are themselves concatenations of length-prefixed IP
packets, so small packets (ACK flood) share symbols instead of each
burning a whole symbol's worth of airtime.

Hardware verification (two-netns single-host bench, RTL8812AU 0x8812
+ TP-Link Archer T2U Plus / RTL8821AU 0x0120, ch 6, no --repeat,
ping -c 30 -i 1):

  --fec-k 16 --fec-overhead 1.0 --fec-flush-ms 50
    30/30 received, 0% loss, 0 DUP, RTT 121/160/207 ms
    blk-ok=30 blk-lost=1 (startup), sym-tx=1023 sym-rx=973

  --fec-k 8 --fec-overhead 1.0 --fec-flush-ms 20
    30/30 received, 0% loss, 0 DUP, RTT 73/95/145 ms
    blk-ok=30 blk-lost=1, sym-tx=544 sym-rx=511

The smaller K trades a bit of recovery margin for a 65 ms drop in
average RTT. Both decode 100% on a healthy link; the survey's noisier
regimes are what motivates --fec-overhead > 1.

Test results

  cd tools/precoder && uv run pytest
  → 87 passed (31 pipeline + 37 stream + 19 fec)

  python -m pytest tests/precoder_smoke.py tests/precoder_stream_smoke.py
  → 8 passed

Open caveats (documented in stream_fec.py / tun_p2p.py docstrings)

* Strict block boundaries — no cross-block FEC, no Raptor carousel.
  Good enough at K=8-16 + 20-50 ms flush; revisit if the latency
  budget tightens further.
* No rateless dynamic overhead — R/K is fixed at construction.
* The Qualcomm patents covering RFC 6330 have largely expired in
  primary jurisdictions by 2026; cberner's MIT lib explicitly notes this.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@josephnef josephnef merged commit a487237 into master Jun 7, 2026
5 checks passed
@josephnef josephnef deleted the raptorq-fec branch June 7, 2026 14:26
josephnef added a commit that referenced this pull request Jun 7, 2026
## Summary

Companion to PR #86. RaptorQ is a block code; each IP packet has to wait
for K source symbols to accumulate before the encoder emits anything,
giving an unavoidable ~95 ms per-packet latency floor on this link even
at K=8/flush=20 ms. **RFC 8681 RLC** is a sliding-window code: every
source symbol is shipped **systematically** (zero encoder buffer),
repair symbols are linear combinations over the last `window` source
symbols, and the decoder can emit each source the instant it arrives —
better for interactive traffic.

Per the user's direction the implementation wraps **Inria's reference
`swif-codec`** C library (irtf-nwcrg/swif-codec, co-authored by Vincent
Roca, the RFC 8681 author) via cffi rather than reimplementing the codec
in pure Python.

## Wire format

A new RLC envelope MAGIC `0xF534` lives alongside RaptorQ's frozen
`0xF52E`. The receive dispatcher peeks the first two bytes of the stream
payload and routes per-frame — a mixed-scheme deployment is silently
rejected (the foreign decoder counts the symbol as `malformed`) instead
of corrupting IP traffic.

```
MAGIC(2)  VER(1)  TYPE(1)  SYMBOL_SIZE(2)  ESI(4)  WIN(1)  KEY(2)  DT(1)  PAYLOAD(symbol_size)
```

Source symbols ride the same length-prefix concat-packing scheme as
RaptorQ.
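
A minimal sketch of the magic-based routing described above. The field order follows the envelope layout shown, but the byte order is an assumption (little-endian, matching the RaptorQ envelope's stated fields), and `route`, `handlers` and `stats` are hypothetical names. For simplicity the sketch counts `malformed` in the dispatcher, whereas the PR says the foreign decoder does the counting.

```python
import struct

RAPTORQ_MAGIC = 0xF52E
RLC_MAGIC = 0xF534
# Assumption: fixed-width fields are little-endian.
# Fields: magic, ver, type, symbol_size, esi, win, key, dt.
RLC_HEADER = struct.Struct("<HBBHIBHB")


def route(payload, handlers, stats):
    """Peek the 2-byte magic and call the matching handler, else count malformed."""
    if len(payload) >= 2:
        (magic,) = struct.unpack_from("<H", payload)
        handler = handlers.get(magic)
        if handler is not None:
            return handler(payload)
    # Unknown or foreign magic: drop the frame rather than guess at its layout.
    stats["malformed"] = stats.get("malformed", 0) + 1
    return None
```

A receiver configured for RLC would register only `RLC_MAGIC` in `handlers`, so a stray RaptorQ envelope is counted and dropped instead of being fed to the wrong codec.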

## Files

| Path | Purpose |
|---|---|
| `vendor/swif-codec/` | Pinned snapshot of upstream commit `de8cd8e` (CeCILL-B) + four small patches in `PATCHES.md` (C99 callback-signature fix, drop `#define DEBUG` and a handful of unconditional `printf`/`full_symbol_dump` calls). |
| `_swif_build.py` | cffi extension builder. Drives gcc directly because modern setuptools' distutils path-handling broke the standard `ffi.compile()` path. |
| `stream_fec.py` (modified) | Reshaped into a thin dispatcher: `FecConfig` grows `scheme`/`window`/`density_threshold`; `make_encoder`/`make_decoder` route to the right module. `FecConfig(k=…) / FecEncoder(cfg) / FecDecoder(cfg)` callers still work via backward-compat aliases (`scheme` defaults to `raptorq` for them); `tun_p2p.py`'s `--fec-scheme` defaults to `rlc`. |
| `stream_fec_raptorq.py` | Moved-out RaptorQ code, renamed `RaptorQEncoder` / `RaptorQDecoder`. Otherwise unchanged. |
| `stream_fec_rlc.py` | New `RlcEncoder` / `RlcDecoder` over the cffi binding. Encoder emits 1 source envelope + `ceil(overhead)` repair envelopes per sealed source symbol; decoder feeds source symbols straight through (systematic) and rebuilds the encoder's coding window on each repair to re-derive the same TinyMT32 coefficients. |
| `test_stream_fec_rlc.py` | 13 RLC tests: round-trip, loss tolerance at 0/10/20 %, overhead bumping for 30 % loss, concatenation packing, oversized rejection, dispatcher MAGIC routing, garbage envelope drop, partial-symbol flush, distinct-MAGIC, config validation. |
| `tun_p2p.py` (modified) | New `--fec-scheme {rlc,raptorq}`, `--fec-window`, `--fec-density-threshold`. `make_encoder`/`make_decoder` factories. TUN-write boundary drops malformed FEC-recovered packets as `mal`-counted instead of taking the bridge down. |

## Verification

**Offline** — `cd tools/precoder && uv sync && uv run pytest`:
- 100 tests pass (31 pipeline + 37 stream + 19 raptorq + 13 rlc).

**Hardware (two-netns single-host bench, RTL8812AU `0x8812` ↔ TP-Link
Archer T2U Plus / RTL8821AU `2357:0120`, ch 6), 10-min soak (600 pings
at 1 Hz, no `--repeat`):**

| Scheme | Config | Loss | RTT min/avg/max | Airtime/pkt | blk-lost |
|---------|---------------------|-------:|-----------------:|------------:|---------:|
| **RLC** | W=16, R/K=1, 20 ms | **6.0 %** | 13 / **42** / 79 ms | ~4 ms (2 envelopes) | 10 / 586 |
| **RaptorQ** | K=8, R/K=1, 20 ms | **0.17 %** | 59 / **95** / 146 ms | ~34 ms (17 envelopes) | 1 / 606 |

The 6 % RLC loss is end-to-end after recovery: ~1.7 % of RLC blocks are
unrecoverable (window expires before enough repairs land), and each
unrecoverable block can carry multiple concatenation-packed ICMP
packets. Raising `--fec-overhead` closes the gap at the cost of airtime.

### Bandwidth-vs-recovery trade-off (this is the bigger story than RTT)

Per-IP-packet envelope count from the same soak: RLC ships **2**
envelopes per packet, RaptorQ ships **17** — an 8.5× airtime gap at this
traffic shape. The reason is structural: RaptorQ's block code emits K
source + R repair symbols **per block** regardless of how full the block
is, so a 1 Hz ping triggers a flush of 1 real packet plus K-1
zero-padded sources + K repairs. RLC's sliding-window code emits 1
source + R repair **per source symbol**, so sparse traffic stays sparse
on the wire.
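
The structural difference can be written down as a small arithmetic sketch. It uses the per-block and per-symbol formulas stated in these PRs (K + ceil(K × overhead) for RaptorQ, 1 + ceil(overhead) for RLC); the function names are hypothetical. Note that the soak reports about 17 envelopes per packet for RaptorQ, one more than this simple model's 16, and the source does not explain the difference, so treat the model as approximate.

```python
from math import ceil


def raptorq_envelopes_per_block(k, overhead):
    """K source symbols (zero-padded if needed) plus ceil(K * overhead) repairs."""
    return k + ceil(k * overhead)


def rlc_envelopes_per_symbol(overhead):
    """One systematic source envelope plus ceil(overhead) repair envelopes."""
    return 1 + ceil(overhead)


if __name__ == "__main__":
    # Sparse traffic: one packet per flush, so one block (or one symbol) per packet.
    rq = raptorq_envelopes_per_block(8, 1.0)
    rlc = rlc_envelopes_per_symbol(1.0)
    print(f"RaptorQ K=8: {rq} envelopes/pkt, RLC: {rlc} envelopes/pkt, "
          f"ratio {rq / rlc:.1f}x")
```

When blocks fill naturally (bulk traffic), the RaptorQ per-packet cost is the per-block count divided by the number of packets in the block, which is why the gap narrows under load.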

This makes `--fec-scheme` a **bandwidth knob as much as a latency one**:
- **RLC** wins on **interactive / sparse** traffic — lower per-packet
airtime *and* lower per-packet latency, at the cost of higher residual
loss.
- **RaptorQ** wins on **bulk / saturated** traffic — its asymptotic ~50
% loss tolerance at large K beats RLC's ~30 % at small W, and the
K-symbol amortisation cost only stings when blocks aren't naturally
full.

## Test plan

- [x] `uv run pytest` → 100 passed
- [x] `python tun_p2p.py --help | grep '^\s*--fec'` parses scheme +
window + density-threshold + the existing five flags
- [x] 10-min RLC soak: 564/600 (6 %), RTT avg 42 ms, ~2 envelopes/pkt
- [x] 10-min RaptorQ regression: 599/600 (0.17 %), RTT avg 95 ms, ~17
envelopes/pkt — matches PR #86 baseline
- [x] Mixed-scheme negative: `test_dispatcher_routes_by_magic` proves
RLC envelopes go nowhere through a RaptorQ decoder and vice-versa
- [ ] Reviewer to re-soak on their bench with their own pair of
8812/8821 adapters

Builds on master (#86 merged). cffi extension is built lazily on first
import; `pyproject.toml` adds `cffi>=1.16`.

## Open caveats (documented in script)

- `swif-codec` upstream's C99 / verbose-codec quirks are patched in the
vendored copy; reapply on next upstream pull.
- The fixed encoder repair cadence (no rateless overhead bumping) is
shared with the RaptorQ path; a future PR could add reverse-channel
hints.
- Windows MSVC build of the cffi extension is untested — Linux/macOS
only on CI initially.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
josephnef added a commit that referenced this pull request Jun 7, 2026
Both fields are already on the RX descriptor: `seq_num` is parsed at
FrameParser.cpp:98, `tsfl` was one commented-out line at line 129. The
FEC layer (#86 / #87) and any latency-measurement consumer want both
visible; this is the data the chip already gives us.

* src/FrameParser.h — add `uint32_t tsfl` to rx_pkt_attrib alongside
  the existing seq_num.
* src/FrameParser.cpp — uncomment the TSFL parser:
  -   /* pattrib.tsfl=(byte)GET_RX_STATUS_DESC_TSFL_8812(pdesc); */
  +   pattrib.tsfl = GET_RX_STATUS_DESC_TSFL_8812(pdesc);
  Drop the bogus `(byte)` cast — the macro reads all 32 bits of
  pdesc+20 as a u32, not a byte (verified against rtl8812a_recv.h).
* demo/main.cpp — extend the <devourer-stream> printf with
  `seq=%u tsfl=%u`. Optional fields; PR #84's regex pattern in
  stream_rx.py / tun_p2p.py / corruption_analysis.py already tolerates
  the new fields via the same pass-through approach used for
  rssi/evm/snr (no Python-side change required to keep working).

What this enables (out of scope for this PR — just data surfacing)

* FEC RX side can dedup by chip-side seq before feeding the codec, so
  air-level retransmissions stop double-counting at the codec.
* One-way latency measurement by diffing TSF against the host clock
  at TX time — a building block for the F5 TX-RPT goodput numbers and
  for any adaptive `--fec-overhead` loop.
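
A possible shape for the chip-side dedup mentioned in the first bullet, as a sketch only. `SeqDedup` is a hypothetical helper, the window depth of 64 is arbitrary, and masking to 12 bits assumes `seq_num` carries the standard 802.11 sequence number.

```python
from collections import deque


class SeqDedup:
    """Drop frames whose sequence number was seen among the last `depth` frames."""

    def __init__(self, depth=64):
        self._order = deque()
        self._seen = set()
        self._depth = depth

    def is_new(self, seq):
        seq &= 0x0FFF  # assumption: 12-bit 802.11 sequence number
        if seq in self._seen:
            return False
        self._seen.add(seq)
        self._order.append(seq)
        # Forget the oldest entry so a wrapped sequence number is accepted again.
        if len(self._order) > self._depth:
            self._seen.discard(self._order.popleft())
        return True
```

The bounded window matters because the 12-bit counter wraps every 4096 frames; an unbounded set would eventually reject every frame.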

Verification

* `cmake --build build -j` clean.
* Default behaviour: <devourer-stream> lines now carry seq + tsfl
  fields; existing Python consumers (regexes are tolerant) keep
  working. tests/regress.py 4-cell matrix byte-identical.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
josephnef added a commit that referenced this pull request Jun 7, 2026
## Summary

Both fields are already on the RX descriptor: `seq_num` is parsed at
`FrameParser.cpp:98`, `tsfl` was one commented-out line at line 129. The
FEC layer (#86 / #87) and any latency-measurement consumer want both
visible; this PR surfaces what the chip already gives us.

## Changes

- **`src/FrameParser.h`** — add `uint32_t tsfl` to `rx_pkt_attrib`
alongside the existing `seq_num`.
- **`src/FrameParser.cpp`** — uncomment the TSFL parser and drop the
bogus `(byte)` cast (the macro reads all 32 bits of `pdesc+20` as a u32,
not a byte — verified against `rtl8812a_recv.h`):
  ```diff
  - /* pattrib.tsfl=(byte)GET_RX_STATUS_DESC_TSFL_8812(pdesc); */
  + pattrib.tsfl = GET_RX_STATUS_DESC_TSFL_8812(pdesc);
  ```
- **`demo/main.cpp`** — extend the `<devourer-stream>` printf with
`seq=%u tsfl=%u`. Optional fields; PR #84's regex pattern in
`stream_rx.py` / `tun_p2p.py` / `corruption_analysis.py` already
tolerates them via the same pass-through approach used for rssi/evm/snr.

## What this enables (out of scope for this PR — just data surfacing)

- FEC RX side can dedup by chip-side seq before feeding the codec, so
air-level retransmissions stop double-counting at the codec.
- One-way latency measurement by diffing TSF against the host clock at
TX time — a building block for the F5 TX-RPT goodput numbers and any
adaptive `--fec-overhead` loop.
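
Because `tsfl` is only the low 32 bits of the TSF, any consumer that diffs two readings has to tolerate wrap-around. A minimal sketch, assuming the field counts microseconds as the 802.11 TSF does; `tsfl_delta_us` is a hypothetical helper name, and it can only resolve gaps shorter than one wrap period (2^32 µs, roughly 71.6 minutes).

```python
TSF_MODULUS = 1 << 32  # tsfl carries only the low 32 bits of the TSF


def tsfl_delta_us(earlier, later):
    """Microseconds elapsed from `earlier` to `later`, tolerating one 32-bit wrap."""
    return (later - earlier) % TSF_MODULUS


if __name__ == "__main__":
    # Two readings that straddle the wrap point.
    print(tsfl_delta_us(0xFFFFFF00, 0x00000100))
```

Python's `%` on integers always returns a non-negative result for a positive modulus, so no explicit branch for the wrapped case is needed.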

## Test plan

- [x] `cmake --build build -j` clean
- [x] `<devourer-stream>` lines on master now carry `seq` + `tsfl`
fields; existing Python consumers tolerate the additions via their
existing regex pass-through (no Python-side change required).
- [ ] Reviewer to run an existing tun_p2p bench and confirm the new
fields appear without disturbing throughput / loss numbers.

Second in the five-feature C++ series. Followed by:
- F3 — selectable stream-carrier rate/BW (uses F1's HT-MCS unlock + this
PR's seq/tsfl plumbing for dup detection)
- F5 — C2H TX-RPT parser + REG_FIFOPAGE_INFO queue-depth poll
- F2 — BB-dbgport per-subcarrier IQ spike (research)

Predecessor: F1 (#88).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>