[BUG] Failed init leaves hugepage files behind in /dev/hugepages, blocking subsequent runs #56

DPDK init failures (e.g. when the configured memory regions exceed `HugePages_Free`) leave `<file-prefix>map_*` files in `/dev/hugepages/` that pin every page. `HugePages_Free` stays at 0, and every subsequent run fails the same way until the files are removed by hand. Even clean exit-0 runs leak a few pages.

Hit with `examples/daqiri_bench_raw_tx_rx_spark.yaml`: `num_bufs: 51200` × `buf_size: 8064` overruns the kernel-default 1024 × 2 MiB hugepages. 2048 × 2 MiB works.

**Steps/Code to reproduce bug**

1. Boot with kernel-default `nr_hugepages=1024`.
2. `daqiri_bench_raw_gpudirect examples/daqiri_bench_raw_tx_rx_spark.yaml --seconds 5` → fails with `Failed to allocate TX message pool!`.
3. `ls /dev/hugepages/` shows leftover `*map_*` files.
4. Re-run fails identically until `sudo rm -f /dev/hugepages/*map_*`.

**Expected behavior**

- Preflight check on memory-region footprint vs `HugePages_Free` with an actionable error before EAL allocates.
- `rte_eal_cleanup()` on every error path out of `daqiri_init`.
- Surface the manual cleanup recipe in the run instructions (today it is only in a collapsed FAQ entry).
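
The preflight check in the first bullet could look roughly like the sketch below. It is not DAQIRI code: the helper names (`pages_needed`, `meminfo_field`, `preflight`) are hypothetical, and the arithmetic covers only the packet buffers of a single memory region, so DPDK's own mempools and rings would add to the total.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helpers; names and structure are assumptions.

# Hugepages needed for one region: ceil(num_bufs * buf_size / page size).
pages_needed() {
  local num_bufs=$1 buf_size=$2 page_kb=$3
  local bytes=$(( num_bufs * buf_size ))
  local page_bytes=$(( page_kb * 1024 ))
  echo $(( (bytes + page_bytes - 1) / page_bytes ))
}

# Print a numeric field from a meminfo-style file (default /proc/meminfo).
meminfo_field() {
  local field=$1 file=${2:-/proc/meminfo}
  awk -v f="${field}:" '$1 == f { print $2 }' "$file"
}

# Return 0 if at least $1 hugepages are free, else 1 with a message on stderr.
preflight() {
  local need=$1 file=${2:-/proc/meminfo}
  local free
  free=$(meminfo_field HugePages_Free "$file")
  free=${free:-0}   # treat a missing field as "no pages free"
  if (( need > free )); then
    echo "need $need hugepages, only $free free; raise nr_hugepages or remove stale map files" >&2
    return 1
  fi
}

# Usage with the Spark YAML values and 2 MiB (2048 kB) pages:
#   need=$(pages_needed 51200 8064 2048)
#   preflight "$need" || exit 1
```

Passing the meminfo path as an optional argument keeps the check testable against a captured copy of `/proc/meminfo`.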

**Environment overview**

- Bare-metal host, bench inside Docker.
- Source build via `scripts/build-container.sh` (`BASE_TARGET=dpdk DAQIRI_MGR="dpdk socket rdma"`).

**Environment details**

- DAQIRI `main` @ 2532bb6 (post PR #45).
- Reproduces with any `kind:` (`huge`, `host_pinned`, `device`) — DPDK uses hugepages for its own mempools/rings regardless.

**Proposed fix (docs-only follow-up to PR #49; in-code work tracked separately)**

- `docs/tutorials/benchmarking_examples.md`, before the first `docker run`: "Configure hugepages first" callout — `grep Huge /proc/meminfo`, size against the YAML's `num_bufs` × `buf_size`, runtime knobs for both 2 MiB and 1 GiB pools, and a link to the persistent grub recipe in `system_configuration.md`.

    ```bash
    echo 2048 | sudo tee /proc/sys/vm/nr_hugepages                                  # 2 MiB pool, 4 GiB
    echo 4    | sudo tee /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages  # 1 GiB pool, 4 GiB
    ```

- `docs/tutorials/system_configuration.md`, Spark tab: state that the shipped Spark YAML needs ~2048 × 2 MiB; with the default 1024 it runs out of hugepages and fails with `Failed to allocate TX message pool!`.
- Same file, new "Troubleshooting" sub-section: orphan-hugepage symptom, `<file-prefix>map_*` root cause, cleanup in `sudo` + Docker forms, `pgrep -af daqiri_bench` safety check, leak-on-clean-exit caveat.

    ```bash
    sudo rm -f /dev/hugepages/*map_*
    ```
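
For the Troubleshooting sub-section, the cleanup and the `pgrep -af daqiri_bench` safety check could be combined as in the sketch below. The function names are hypothetical, and it assumes the caller already has permission to delete files in the hugetlbfs mount (for example by running it as root).

```bash
#!/usr/bin/env bash
# Hypothetical cleanup helper; not shipped with DAQIRI.

# True if any process whose command line mentions daqiri_bench is running.
bench_running() {
  pgrep -af daqiri_bench >/dev/null
}

# Remove leftover *map_* files from a hugetlbfs mount (default /dev/hugepages),
# but refuse while a bench process may still be using them.
cleanup_hugepage_files() {
  local dir=${1:-/dev/hugepages}
  if bench_running; then
    echo "daqiri_bench appears to be running; not touching $dir" >&2
    return 1
  fi
  rm -f -- "$dir"/*map_*
}

# Usage (as root):
#   cleanup_hugepage_files
#   grep HugePages_Free /proc/meminfo
```

The check is deliberately conservative: `pgrep -f` matches the full command line, so any process that mentions `daqiri_bench` blocks the cleanup.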
