Skip to content

[V4] Benchmark suite 1/N#105

Merged
ConorWilliams merged 35 commits into
modulesfrom
v4-bench-serial
May 11, 2026
Merged

[V4] Benchmark suite 1/N#105
ConorWilliams merged 35 commits into
modulesfrom
v4-bench-serial

Conversation

@ConorWilliams
Copy link
Copy Markdown
Owner

@ConorWilliams ConorWilliams commented May 2, 2026

New format for benchmarks + early serial sketches of benchmarks and very very rough docs

Summary by CodeRabbit

  • New Features

    • Added twelve new benchmark algorithms: heat diffusion, numerical integration, knapsack, Mandelbrot set, matrix multiplication, N-queens, prime counting, quicksort, prefix scan, Skynet, Strassen, and enhanced reference implementations.
  • Documentation

    • Added comprehensive benchmark documentation with individual pages for each benchmark, improved navigation, and detailed guidance on benchmark structure and scaling characteristics.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 2, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 085e8a0d-d0d3-4880-a17e-fa0f2ae07211

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch v4-bench-serial

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@ConorWilliams ConorWilliams changed the base branch from main to modules May 2, 2026 14:07
@ConorWilliams ConorWilliams changed the title [V4] Benchmarks (serial) [V4] Benchmarks suite May 11, 2026
@ConorWilliams ConorWilliams changed the title [V4] Benchmarks suite [V4] Benchmarks suite 1/N May 11, 2026
@ConorWilliams ConorWilliams changed the title [V4] Benchmarks suite 1/N [V4] Benchmark suite 1/N May 11, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (5)
benchmark/lib/bench.hpp (1)

28-42: 💤 Low value

Consider documenting the formattable requirement.

The std::format call at line 36 requires both Expected and the result type to be formattable. While the current benchmark types (e.g., std::int64_t, custom result with formatter) satisfy this, future users may encounter cryptic compilation errors if they use non-formattable types.

📝 Suggested documentation

Add a comment above the function template:

+// Requires: Expected must be formattable with std::format and equality-comparable with result type
 template <typename Expected, typename Check, typename Fn>
 void bench(benchmark::State &state, std::int64_t threads, const Expected &expected, Check check, Fn fn) {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@benchmark/lib/bench.hpp` around lines 28 - 42, Add a short comment above the
bench template documenting that the code uses std::format on the expected and
the function result, so both the Expected template parameter and the Fn return
type (std::invoke_result_t<Fn>) must be formattable (i.e., have a std::formatter
specialization for char) to avoid cryptic compile errors; mention the specific
symbols involved (bench, Expected, Fn, and the std::format call that formats
result and expected) so future users know the requirement.
benchmark/src/serial/primes.cpp (1)

12-18: 💤 Low value

Consider starting the loop at 2 for efficiency.

The loop currently starts at i=1, but since is_prime(1) always returns false, this wastes one primality check per benchmark iteration. Starting at i=2 would be slightly more efficient without changing correctness.

⚡ Proposed optimization
 run_primes(state, [](std::int64_t lim) {
   std::int64_t count = 0;
-  for (std::int64_t i = 1; i < lim; ++i) {
+  for (std::int64_t i = 2; i < lim; ++i) {
     count += is_prime(i) ? 1 : 0;
   }
   return count;
 });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@benchmark/src/serial/primes.cpp` around lines 12 - 18, The loop passed into
run_primes currently iterates i from 1 to lim and calls is_prime(1)
unnecessarily; update the lambda's loop to start at i = 2 (while keeping lim and
count semantics and returning count) so it skips the trivial non-prime 1 and
avoids the wasted primality check in the function passed to run_primes.
benchmark/lib/matmul.hpp (1)

67-69: 💤 Low value

Prefer std::abs or std::fabs for clarity.

The manual absolute value computation can be replaced with the standard library function.

Suggested simplification
-    float diff = (A[i] - B[i]) / A[i];
-    if (diff < 0) {
-      diff = -diff;
-    }
+    float diff = std::fabs((A[i] - B[i]) / A[i]);

Note: This suggestion can be combined with the previous comment's fix for a more robust solution.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@benchmark/lib/matmul.hpp` around lines 67 - 69, Replace the manual sign-flip
on variable diff with the standard library absolute function: use std::abs (or
std::fabs for floating types) instead of if (diff < 0) diff = -diff; and ensure
the appropriate header (<cmath> or <cstdlib>) is included so the compiler
resolves std::abs/std::fabs for the type of diff.
docs/benchmarks/index.md (1)

37-46: ⚡ Quick win

Prefer relative links for in-repo paths.

These links point to github.com/.../tree/main/..., which can drift from the docs version/branch and breaks fork portability. Use repo-relative links so docs stay branch-correct and self-contained.

Suggested edit
-- [`benchmark/lib/`](https://github.com/conorwilliams/libfork/tree/main/benchmark/lib)
+- [`benchmark/lib/`](../../../benchmark/lib/)
   contains shared kernels, input sizes, and correctness helpers.
-- [`benchmark/src/libfork/`](https://github.com/conorwilliams/libfork/tree/main/benchmark/src/libfork)
+- [`benchmark/src/libfork/`](../../../benchmark/src/libfork/)
   contains libfork coroutine and scheduler benchmarks.
-- [`benchmark/src/serial/`](https://github.com/conorwilliams/libfork/tree/main/benchmark/src/serial)
+- [`benchmark/src/serial/`](../../../benchmark/src/serial/)
   contains single-threaded baselines.
-- [`benchmark/src/openmp/`](https://github.com/conorwilliams/libfork/tree/main/benchmark/src/openmp)
+- [`benchmark/src/openmp/`](../../../benchmark/src/openmp/)
   contains OpenMP tasking comparisons where present.
-- [`benchmark/src/baremetal/`](https://github.com/conorwilliams/libfork/tree/main/benchmark/src/baremetal)
+- [`benchmark/src/baremetal/`](../../../benchmark/src/baremetal/)
   contains low-level coroutine or data-structure baselines.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/benchmarks/index.md` around lines 37 - 46, The GitHub absolute links in
docs/benchmarks/index.md (the list items referencing [`benchmark/lib/`],
[`benchmark/src/libfork/`], [`benchmark/src/serial/`],
[`benchmark/src/openmp/`], and [`benchmark/src/baremetal/`]) should be replaced
with repo-relative links (e.g. ./benchmark/lib/ and ./benchmark/src/libfork/
etc.) so the docs remain branch-aware and portable; update each markdown link
target to the corresponding relative path while keeping the link text unchanged.
benchmark/src/serial/strassen.cpp (1)

42-49: 💤 Low value

Consider validating that n is even or a power of 2.

The recursive division at Line 49 (unsigned m = n / 2) assumes n is even. If n is odd, integer division will truncate and the sub-blocks won't cover the full matrix, leading to incorrect results.

If the benchmark framework guarantees power-of-2 sizes, document this assumption. Otherwise, add validation:

🛡️ Proposed validation
 void strassen(float const *A, unsigned sa, float const *B, unsigned sb, float *C, unsigned sc, unsigned n) {
 
+  // Strassen requires power-of-2 dimensions or even n at minimum
+  if (n % 2 != 0 && n > strassen_cutoff) {
+    // Fall back to naive for odd sizes
+    naive_multiply(A, sa, B, sb, C, sc, n);
+    return;
+  }
+
   if (n <= strassen_cutoff) {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@benchmark/src/serial/strassen.cpp` around lines 42 - 49, The strassen
function assumes n is even when computing m = n / 2; add validation at the start
of strassen (before the strassen_cutoff check or immediately after) to ensure n
is valid (either assert/document that n is a power-of-two or explicitly check n
% 2 == 0), and if odd either pad inputs to the next even/power-of-two size or
return an error/convert to naive_multiply; reference the strassen function, the
n parameter, strassen_cutoff, and naive_multiply when implementing the chosen
validation/handling approach.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@benchmark/lib/matmul.hpp`:
- Around line 63-75: The function matmul_max_relative_error divides by A[i]
without guarding against zero/near-zero; change the computation to use a safe
denominator by comparing std::abs(A[i]) with a small epsilon (e.g., 1e-8f) and
divide by std::max(std::abs(A[i]), epsilon), or when |A[i]| < epsilon use
absolute error (std::abs(A[i]-B[i])) instead; update the diff calculation and
comparisons in matmul_max_relative_error to compute diff = std::abs(A[i]-B[i]) /
std::max(std::abs(A[i]), epsilon) (or the absolute fallback) so you avoid
huge/undefined relative errors for tiny reference values.

In `@benchmark/lib/nqueens.hpp`:
- Around line 17-21: The lookup table nqueens_answers is declared with 28
entries but only 21 values are provided, causing indices 21–27 to be
zero-initialized and yield incorrect results when accessed via
.at(state.range(0)). Fix by making the table length match the provided data
(change the constexpr std::array size to 21) or by populating the remaining
seven entries with the correct N-queens counts; alternatively add runtime
validation around the .at(state.range(0)) call to ensure state.range(0) is
within the supported index range before accessing nqueens_answers.

In `@benchmark/src/serial/matmul.cpp`:
- Around line 12-17: The recursive split truncates odd n via m = n / 2 causing
incorrect results; before splitting in the recursive matmul code, guard
odd-sized matrices by either (a) falling back to the scalar/basecase
routine—call matmul_basecase_multiply<Add>(A, B, R, n, s) when (n % 2 != 0) or
(b) validate/enforce power-of-two inputs with an explicit check/assert and
error, instead of proceeding with m = n / 2; update the block that currently
computes unsigned m = n / 2 to perform this guard using the existing
matmul_basecase and matmul_basecase_multiply symbols and the A, B, R, n, s
parameters.

In `@benchmark/src/serial/quicksort.cpp`:
- Around line 38-43: The quicksort implementation always recurses on the right
partition (quicksort(pivot + 1, last)) which can cause O(n) stack depth for
degenerate pivots; change the control flow to always recurse on the smaller
partition and loop on the larger one: after partition(first,last) returns pivot,
compare sizes of left (pivot - first) and right (last - (pivot + 1)) and call
quicksort on the smaller side, then update first/last to the bounds of the
larger side and continue the while loop (use quicksort_basecase as the base case
threshold). Ensure you still call partition(first,last) each iteration and only
recurse once per loop.

---

Nitpick comments:
In `@benchmark/lib/bench.hpp`:
- Around line 28-42: Add a short comment above the bench template documenting
that the code uses std::format on the expected and the function result, so both
the Expected template parameter and the Fn return type
(std::invoke_result_t<Fn>) must be formattable (i.e., have a std::formatter
specialization for char) to avoid cryptic compile errors; mention the specific
symbols involved (bench, Expected, Fn, and the std::format call that formats
result and expected) so future users know the requirement.

In `@benchmark/lib/matmul.hpp`:
- Around line 67-69: Replace the manual sign-flip on variable diff with the
standard library absolute function: use std::abs (or std::fabs for floating
types) instead of if (diff < 0) diff = -diff; and ensure the appropriate header
(<cmath> or <cstdlib>) is included so the compiler resolves std::abs/std::fabs
for the type of diff.

In `@benchmark/src/serial/primes.cpp`:
- Around line 12-18: The loop passed into run_primes currently iterates i from 1
to lim and calls is_prime(1) unnecessarily; update the lambda's loop to start at
i = 2 (while keeping lim and count semantics and returning count) so it skips
the trivial non-prime 1 and avoids the wasted primality check in the function
passed to run_primes.

In `@benchmark/src/serial/strassen.cpp`:
- Around line 42-49: The strassen function assumes n is even when computing m =
n / 2; add validation at the start of strassen (before the strassen_cutoff check
or immediately after) to ensure n is valid (either assert/document that n is a
power-of-two or explicitly check n % 2 == 0), and if odd either pad inputs to
the next even/power-of-two size or return an error/convert to naive_multiply;
reference the strassen function, the n parameter, strassen_cutoff, and
naive_multiply when implementing the chosen validation/handling approach.

In `@docs/benchmarks/index.md`:
- Around line 37-46: The GitHub absolute links in docs/benchmarks/index.md (the
list items referencing [`benchmark/lib/`], [`benchmark/src/libfork/`],
[`benchmark/src/serial/`], [`benchmark/src/openmp/`], and
[`benchmark/src/baremetal/`]) should be replaced with repo-relative links (e.g.
./benchmark/lib/ and ./benchmark/src/libfork/ etc.) so the docs remain
branch-aware and portable; update each markdown link target to the corresponding
relative path while keeping the link text unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1a982437-d99b-4984-8243-f36c29870e86

📥 Commits

Reviewing files that changed from the base of the PR and between 3d896e9 and f3d8367.

📒 Files selected for processing (63)
  • benchmark/lib/CMakeLists.txt
  • benchmark/lib/bench.hpp
  • benchmark/lib/fib.hpp
  • benchmark/lib/fold.hpp
  • benchmark/lib/heat.hpp
  • benchmark/lib/integrate.hpp
  • benchmark/lib/knapsack.hpp
  • benchmark/lib/macros.hpp
  • benchmark/lib/mandelbrot.hpp
  • benchmark/lib/matmul.hpp
  • benchmark/lib/nqueens.hpp
  • benchmark/lib/primes.hpp
  • benchmark/lib/quicksort.hpp
  • benchmark/lib/scan.hpp
  • benchmark/lib/skynet.hpp
  • benchmark/lib/uts.hpp
  • benchmark/src/baremetal/fib.cpp
  • benchmark/src/libfork/fib.cpp
  • benchmark/src/libfork/fold.cpp
  • benchmark/src/libfork/switch_io_pool.cpp
  • benchmark/src/libfork/switch_random.cpp
  • benchmark/src/libfork/uts.cpp
  • benchmark/src/openmp/fib.cpp
  • benchmark/src/openmp/uts.cpp
  • benchmark/src/serial/CMakeLists.txt
  • benchmark/src/serial/fib.cpp
  • benchmark/src/serial/heat.cpp
  • benchmark/src/serial/integrate.cpp
  • benchmark/src/serial/knapsack.cpp
  • benchmark/src/serial/mandelbrot.cpp
  • benchmark/src/serial/matmul.cpp
  • benchmark/src/serial/nqueens.cpp
  • benchmark/src/serial/primes.cpp
  • benchmark/src/serial/quicksort.cpp
  • benchmark/src/serial/scan.cpp
  • benchmark/src/serial/skynet.cpp
  • benchmark/src/serial/strassen.cpp
  • benchmark/src/serial/uts.cpp
  • docs/ChangeLog.md
  • docs/benchmarks.md
  • docs/benchmarks/benchmarks/fib.md
  • docs/benchmarks/benchmarks/fold.md
  • docs/benchmarks/benchmarks/heat.md
  • docs/benchmarks/benchmarks/index.md
  • docs/benchmarks/benchmarks/integrate.md
  • docs/benchmarks/benchmarks/knapsack.md
  • docs/benchmarks/benchmarks/mandelbrot.md
  • docs/benchmarks/benchmarks/matmul.md
  • docs/benchmarks/benchmarks/mergesort.md
  • docs/benchmarks/benchmarks/nqueens.md
  • docs/benchmarks/benchmarks/primes.md
  • docs/benchmarks/benchmarks/quicksort.md
  • docs/benchmarks/benchmarks/scan.md
  • docs/benchmarks/benchmarks/skynet.md
  • docs/benchmarks/benchmarks/strassen.md
  • docs/benchmarks/benchmarks/switch-io-pool.md
  • docs/benchmarks/benchmarks/switch-random.md
  • docs/benchmarks/benchmarks/uts.md
  • docs/benchmarks/index.md
  • docs/index.md
  • docs/installation.md
  • docs/quickstart.md
  • zensical.toml
💤 Files with no reviewable changes (1)
  • docs/benchmarks.md

Comment thread benchmark/lib/matmul.hpp
Comment thread benchmark/lib/nqueens.hpp Outdated
Comment thread benchmark/src/serial/matmul.cpp
Comment thread benchmark/src/serial/quicksort.cpp
@ConorWilliams ConorWilliams merged commit b2a4543 into modules May 11, 2026
7 of 10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant