[V4] Small object optimizations#102

Merged
ConorWilliams merged 7 commits into modules from v4-optimizations
Apr 25, 2026

Conversation

@ConorWilliams
Owner

@ConorWilliams ConorWilliams commented Apr 25, 2026

Summary by CodeRabbit

  • Refactor

    • Consolidated benchmark utility components for improved code organization and maintainability.
    • Optimized parameter storage for async operations to reduce memory overhead for small, copyable data types.
  • Documentation

    • Added documentation for thread-local context management in worker threads.

@coderabbitai

coderabbitai Bot commented Apr 25, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a9120171-f211-49b0-b26f-a8d349abc862

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

@ConorWilliams ConorWilliams changed the base branch from main to modules April 25, 2026 12:53

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (3)
benchmark/lib/macros.hpp (1)

24-32: Optional: take the callable by forwarding reference.

auto make_args copies the callable on each call. It's harmless here (registration-time, called once per benchmark), but auto &&make_args + std::forward is the more idiomatic shape for a generic callback helper and avoids surprising copies if someone later passes a stateful functor.

♻️ Proposed refactor
-inline void bench_thread_args(benchmark::Benchmark *bench, auto make_args) {
-  unsigned hw = std::max(1U, std::thread::hardware_concurrency());
-  for (unsigned t : {1U, 2U, 4U, 6U, 8U, 12U, 16U, 24U, 32U, 48U, 64U, 96U}) {
-    if (t > hw) {
-      return;
-    }
-    make_args(bench, t);
-  }
-}
+inline void bench_thread_args(benchmark::Benchmark *bench, auto &&make_args) {
+  unsigned hw = std::max(1U, std::thread::hardware_concurrency());
+  for (unsigned t : {1U, 2U, 4U, 6U, 8U, 12U, 16U, 24U, 32U, 48U, 64U, 96U}) {
+    if (t > hw) {
+      return;
+    }
+    std::forward<decltype(make_args)>(make_args)(bench, t);
+  }
+}
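
To illustrate the copy the comment refers to, here is a minimal sketch. The names `counting_fn`, `call_by_value` and `call_by_forward` are hypothetical and do not appear in the PR; they only mirror the two parameter shapes being compared.

```cpp
#include <utility>

// Hypothetical stateful functor that records how often it is copied.
struct counting_fn {
  int *copies;
  explicit counting_fn(int *c) : copies(c) {}
  counting_fn(counting_fn const &other) : copies(other.copies) { ++*copies; }
  void operator()(unsigned) const {}
};

// By-value parameter (the current shape): an lvalue argument is copied into `fn`.
inline void call_by_value(auto fn) { fn(1U); }

// Forwarding reference (the proposed shape): binds to the caller's object, no copy.
inline void call_by_forward(auto &&fn) { std::forward<decltype(fn)>(fn)(1U); }

// Number of copies made when an lvalue functor is passed by value.
inline int copies_by_value() {
  int n = 0;
  counting_fn f{&n};
  call_by_value(f);
  return n;
}

// Number of copies made when the same functor is passed by forwarding reference.
inline int copies_by_forward() {
  int n = 0;
  counting_fn f{&n};
  call_by_forward(f);
  return n;
}
```

One caveat with the proposed shape: because the `std::forward` call sits inside a loop, a callable passed as an rvalue is invoked as an rvalue on every iteration. That is fine for ordinary functors and lambdas, but a `&&`-qualified `operator()` that consumes its state would be called repeatedly.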

Side note (pre-existing, carried over from common.hpp): on machines whose hardware_concurrency() falls between candidates or above the largest one (e.g. 10, 20, or 128 cores), the benchmark never runs at the actual hardware maximum. Probably worth a follow-up to clamp/append hw to the schedule, but out of scope for this PR.
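
One possible shape for that follow-up, as a sketch only: `thread_schedule` and `default_thread_schedule` are hypothetical helpers, not part of this PR. They keep the candidate counts strictly below the hardware maximum and then append the maximum itself.

```cpp
#include <algorithm>
#include <thread>
#include <vector>

// Hypothetical helper: candidate thread counts below `hw`, followed by `hw` itself.
inline std::vector<unsigned> thread_schedule(unsigned hw) {
  hw = std::max(1U, hw);  // hardware_concurrency() may report 0 when unknown
  std::vector<unsigned> out;
  for (unsigned t : {1U, 2U, 4U, 6U, 8U, 12U, 16U, 24U, 32U, 48U, 64U, 96U}) {
    if (t >= hw) {
      break;  // stop before reaching hw; it is appended once below
    }
    out.push_back(t);
  }
  out.push_back(hw);  // always benchmark at the actual hardware maximum
  return out;
}

// Convenience wrapper that uses the machine's reported concurrency.
inline std::vector<unsigned> default_thread_schedule() {
  return thread_schedule(std::thread::hardware_concurrency());
}
```

With this shape a 10-core machine would get 1, 2, 4, 6, 8, 10 and a 128-core machine would get every candidate up to 96 followed by 128.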

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmark/lib/macros.hpp` around lines 24 - 32, The bench_thread_args helper
should accept the callable as a forwarding reference to avoid accidental copies:
change the parameter from auto make_args to a templated forwarding reference
(e.g., MakeArgs&& make_args) and invoke it with
std::forward<MakeArgs>(make_args)(bench, t) inside the loop; update the function
template signature to template<class MakeArgs> inline void
bench_thread_args(benchmark::Benchmark *bench, MakeArgs&& make_args) so stateful
functors are preserved and not copied.
src/core/promise.cxx (1)

507-512: Reuse the small_trivially_copyable concept from :ops to avoid drift.

This static_assert open-codes exactly the predicate already defined as small_trivially_copyable in src/core/ops.cxx (lines 25-27). If the storage policy ever changes (e.g., the 2 * sizeof(void*) threshold or an added alignment check), this site will silently disagree with the actual storage rules in store_as_t/fwd_fn, breaking the noexcept invariant this assert is meant to guard.

Consider exposing the concept from the :ops partition and using it directly:

♻️ Proposed refactor
-    // Required for noexcept specifier to be correct: each stored type must be either a
-    // reference (original behavior) or a small trivially-copyable value (value-storage opt).
-    static_assert(
-        (std::is_reference_v<Fn> || (std::is_trivially_copyable_v<Fn> && sizeof(Fn) <= 2 * sizeof(void *))) &&
-        (... && (std::is_reference_v<Args> ||
-                 (std::is_trivially_copyable_v<Args> && sizeof(Args) <= 2 * sizeof(void *)))));
+    // Required for noexcept specifier to be correct: each stored type must be either a
+    // reference (original behavior) or a small trivially-copyable value (value-storage opt).
+    static_assert((std::is_reference_v<Fn> || small_trivially_copyable<Fn>) &&
+                  (... && (std::is_reference_v<Args> || small_trivially_copyable<Args>)));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/core/promise.cxx` around lines 507 - 512, The static_assert in
promise.cxx currently duplicates the predicate used for storage sizing; replace
the open-coded predicate with the shared concept small_trivially_copyable (as
defined in ops) so the noexcept check stays in sync with the actual storage
policy. Keep the reference alternative: update the static_assert to use
(std::is_reference_v<Fn> || small_trivially_copyable<Fn>) && (... &&
(std::is_reference_v<Args> || small_trivially_copyable<Args>)) and ensure the
:ops partition that defines small_trivially_copyable exposes it and is imported
here so this site and the implementations used by store_as_t and fwd_fn share
the same definition.
src/core/ops.cxx (1)

25-34: Consider also constraining alignment.

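small_trivially_copyable only checks size. Because sizeof(T) is always a multiple of alignof(T), the size bound already rejects heavily over-aligned types: struct alignas(64) tag{}; is trivially copyable but has sizeof == 64, so it does not pass. What the check still admits is a type whose alignment equals the threshold itself, e.g. a 16-byte alignas(16) type on a 64-bit target (2 * sizeof(void*) == 16) or an 8-byte alignas(8) type on a 32-bit target (2 * sizeof(void*) == 8), and storing one by value forces pkg's overall alignment above alignof(void*). Adding alignof(T) <= alignof(void*) would keep pkg's layout tight; alignof(T) <= alignof(std::max_align_t) is a looser bound that may not exclude anything beyond what the size check already does. Optional; may not matter for the actual call sites in this PR.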

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/core/ops.cxx` around lines 25 - 34, small_trivially_copyable currently
only checks size and trivial copyability, which still admits types aligned to
the size threshold itself (e.g., a 16-byte alignas(16) type on 64-bit targets)
and can inadvertently increase enclosing object alignment; update the concept
used by store_as_t (small_trivially_copyable and thus store_as_t) to also
require alignof(std::remove_cvref_t<T>) <= alignof(void*) (or, more loosely, <=
alignof(std::max_align_t)) so only small, low-alignment trivially copyable types
are treated as stored-by-value; adjust the concept definition to apply the
alignment check to std::remove_cvref_t<T> so store_as_t behavior is correct for
reference and cv-qualified types.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2b116f4b-f2c6-440f-84f7-2147533d0da9

📥 Commits

Reviewing files that changed from the base of the PR and between de98dc4 and ba4412d.

📒 Files selected for processing (12)
  • benchmark/lib/CMakeLists.txt
  • benchmark/lib/common.hpp
  • benchmark/lib/macros.hpp
  • benchmark/src/baremetal/fib.cpp
  • benchmark/src/libfork/fib.cpp
  • benchmark/src/libfork/uts.cpp
  • benchmark/src/serial/fib.cpp
  • benchmark/src/serial/uts.cpp
  • src/core/execute.cxx
  • src/core/ops.cxx
  • src/core/promise.cxx
  • src/core/thread_locals.cxx
💤 Files with no reviewable changes (7)
  • benchmark/src/libfork/uts.cpp
  • benchmark/src/libfork/fib.cpp
  • benchmark/src/baremetal/fib.cpp
  • benchmark/src/serial/fib.cpp
  • benchmark/lib/CMakeLists.txt
  • benchmark/src/serial/uts.cpp
  • benchmark/lib/common.hpp

@ConorWilliams ConorWilliams merged commit 34c14b5 into modules Apr 25, 2026
7 of 10 checks passed