ferrflow_parallel (the "all-cores" stat shown next to the single-threaded numbers) reports a near-constant floor of ~165–280 ms regardless of fixture size, which makes ferrflow's default (multi-job) mode look dramatically slower than --jobs 1. It isn't: a direct ferrflow check --timing on ubuntu-latest (nproc=4) takes 4–6 ms in both --jobs 1 and default mode, so the parallel path has no such overhead.
The root cause is in this harness, not in ferrflow. In scripts/run.sh the parallel measurement is written to results/raw/parallel/${fixture}-${cmd}.json, a path that is not scoped by install method. The harness runs ferrflow through three methods in order (binary → npm → docker), and each one sets par_cmd and re-runs the parallel hyperfine into that same file. The docker method runs last and overwrites the binary measurement, so ferrflow_parallel ends up measuring docker run --rm … ferrflow <cmd>, i.e. roughly constant container startup overhead rather than all-cores latency. (npx would do the same via node startup.)
Fix: only produce the parallel stat for the binary method, so it compares the real binary all-cores time against the binary --jobs 1 single-threaded time.
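The overwrite and the guard can be illustrated with a minimal, self-contained sketch. It is not the code in scripts/run.sh: the function names (measure_parallel, run_unscoped, run_binary_only), the OUT_DIR temp directory, and the "small" fixture are hypothetical, and a printf stands in for the real hyperfine run so the script works without hyperfine, npm, or docker installed.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the overwrite and the fix. None of these names are
# taken from scripts/run.sh; printf stands in for the parallel hyperfine run.
set -euo pipefail

OUT_DIR="$(mktemp -d)"

# Stand-in for the parallel measurement: records which method wrote the file.
# Note the output path depends only on fixture and cmd, not on the method.
measure_parallel() {
  local method="$1" fixture="$2" cmd="$3"
  local out="${OUT_DIR}/${fixture}-${cmd}.json"
  printf '{"method": "%s"}\n' "$method" > "$out"
}

# Unscoped flow: every method writes the same file, so the last one wins.
run_unscoped() {
  local method
  for method in binary npm docker; do
    measure_parallel "$method" "$1" "$2"
  done
}

# Guarded flow: only the binary method produces the parallel stat.
run_binary_only() {
  local method
  for method in binary npm docker; do
    if [ "$method" = "binary" ]; then
      measure_parallel "$method" "$1" "$2"
    fi
  done
}

run_unscoped small check
unscoped_result="$(cat "${OUT_DIR}/small-check.json")"

run_binary_only small check
fixed_result="$(cat "${OUT_DIR}/small-check.json")"

echo "unscoped: ${unscoped_result}"   # unscoped: {"method": "docker"}
echo "fixed:    ${fixed_result}"      # fixed:    {"method": "binary"}
```

An alternative would be to scope the output path by method, but that would still leave three parallel numbers to choose from; restricting the measurement to the binary method keeps a single stat that is directly comparable to the binary --jobs 1 time.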