feat(cpu): implement RadixAttnSwaSink with sliding window attention support #516
chenghuaWang merged 2 commits into UbiquitousLearning:v2 from
Conversation
…upport
- Add new CPU operator RadixAttnSwaSink for radix attention with sink tokens
- Implement forward kernel `fwd_bshd` for BSHD tensor layout with architecture-specific optimizations
- Support both ARM64 and x86 architectures with SIMD instructions
- Integrate with prefix cache system for efficient KV cache management
- Add comprehensive test cases covering prefill, decode, and append modes
- Fix tensor rank validation and output shape calculation
- Update library loading logic to support Linux platform correctly
Walkthrough
Introduces a sliding-window radix attention implementation (RadixAttnSwaSink) with architecture-specific forward kernels (fwd_bshd), a CPU-based prefix cache system for KV management, test infrastructure, and updates to kernel function naming conventions and platform-specific extension loading for Linux.
Changes
Sequence Diagram
sequenceDiagram
participant Test as Test Suite
participant RadixOp as RadixAttnSwaSink Op
participant Kernel as fwd_bshd Kernel
participant Cache as CPU Prefix Cache
Test->>Cache: Initialize (full-attention or sliding-window mode)
activate Cache
Cache->>Cache: Setup per-layer allocators<br/>(HiCPUAllocator)
deactivate Cache
Test->>Test: Generate Q, K_idx, V_idx, s_aux tensors
Test->>Cache: Allocate K/V storage per layer
Test->>Cache: Store K/V via physical addresses
Test->>RadixOp: Submit RadixAttnSwaSink operation<br/>(Q, K_idx, V_idx, s_aux, options)
activate RadixOp
RadixOp->>RadixOp: Validate input tensor ranks<br/>(Q: rank-4, K/V: rank-1)
RadixOp->>Kernel: fwd_bshd(B, H_Q, H_KV, S_Q, S_KV,<br/>D_QK, D_V, window, seq_len,<br/>Q, K_ptrs, V_ptrs, s_aux, output)
activate Kernel
Kernel->>Kernel: Per Q-head parallelization
Kernel->>Kernel: Map KV/V tokens via indices
Kernel->>Kernel: Compute Q @ K with sliding window
Kernel->>Kernel: Apply per-head softmax scaling
Kernel->>Kernel: Integrate sink token from s_aux
Kernel->>Kernel: Accumulate to output
deactivate Kernel
RadixOp->>RadixOp: Allocate output with<br/>shape [B, H_Q, D_V]
deactivate RadixOp
Test->>Test: Compare output vs. reference<br/>(allClose validation)
alt Match
Test->>Test: Pass
else Mismatch
Test->>Test: Print diagnostics
end
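The kernel steps in the diagram can be condensed into a scalar reference. The sketch below is a minimal, hypothetical restatement (the function name and signature are mine, not the repo's API): attention over the last `window` cached tokens, with the per-head sink logit `s_aux` competing in the softmax but contributing no value vector.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Scalar reference for one (batch, head) pair. Hypothetical helper, not the
// repo's API: logits are q.k / sqrt(D) over the last `window` tokens, and the
// per-head sink logit s_aux joins the softmax denominator but has no value row.
std::vector<float> swa_sink_attend(const std::vector<float>& q,
                                   const std::vector<std::vector<float>>& k,
                                   const std::vector<std::vector<float>>& v,
                                   float s_aux, int window) {
  const int D = static_cast<int>(q.size());
  const int S = static_cast<int>(k.size());
  const float scale = 1.0f / std::sqrt(static_cast<float>(D));
  const int start = std::max(0, S - window);  // left sliding window
  float max_logit = s_aux;                    // the sink participates in the max
  std::vector<float> logits(S, 0.0f);
  for (int t = start; t < S; ++t) {
    float dot = 0.0f;
    for (int d = 0; d < D; ++d) dot += q[d] * k[t][d];
    logits[t] = dot * scale;
    max_logit = std::max(max_logit, logits[t]);
  }
  float denom = std::exp(s_aux - max_logit);  // sink adds probability mass only
  std::vector<float> out(v[0].size(), 0.0f);
  for (int t = start; t < S; ++t) {
    const float w = std::exp(logits[t] - max_logit);
    denom += w;
    for (std::size_t d = 0; d < out.size(); ++d) out[d] += w * v[t][d];
  }
  for (float& o : out) o /= denom;
  return out;
}
```

Driving `s_aux` toward negative infinity recovers plain sliding-window attention; raising it siphons probability mass away from all tokens, which is the damping effect the sink provides.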
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
@coderabbitai review
✅ Actions performed: Review triggered.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
mllm-ext-opset/cpu/radix_swa_sink/RadixAttnSwaSink.cpp (2)
23-60: Guard optional S_AUX before indexing inputs[3].

`forward` unconditionally reads `inputs[3]`, but `reshape` explicitly allows configurations with only three tensors whenever `options_.s_aux_enable` is false. In that scenario we dereference past the end of the inputs vector, leading to UB/crash before the kernel launches. Please either require the op to run with `s_aux_enable == true` (and assert that a fourth tensor is present) or handle the disabled case explicitly before dereferencing. Example fix to enforce the invariant:

+ MLLM_RT_ASSERT(options_.s_aux_enable);
+ MLLM_RT_ASSERT_GE(inputs.size(), 4);
  auto& S_AUX = inputs[3];
39-87: Fix K/V shape usage after switching them to rank-1.

`reshape` now enforces `k.rank() == 1` and `v.rank() == 1`, yet `forward` still indexes `K.shape()[1]` / `V.shape()[1]` and even reads `V.shape()[3]`. That pulls garbage (or faults) for rank-1 tensors, breaking the sliding-window kernel immediately. Use the 0th dimension for the sequence length and drop the nonexistent head/dimension lookups; `options_` already carries `D_QK` / `D_V`. For instance:

- auto D_V = V.shape()[3];
- MLLM_RT_ASSERT_EQ(H_Q, options_.q_head);
- auto S_KV = K.shape()[1];
- MLLM_RT_ASSERT_EQ(S_KV, V.shape()[1]);
+ MLLM_RT_ASSERT_EQ(H_Q, options_.q_head);
+ MLLM_RT_ASSERT_EQ(K.rank(), 1);
+ MLLM_RT_ASSERT_EQ(V.rank(), 1);
+ auto S_KV = K.shape()[0];
+ MLLM_RT_ASSERT_EQ(S_KV, V.shape()[0]);

Without this fix the op miscomputes `S_KV` (often reading random memory) and the kernel walks the wrong number of cache slots.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- mllm-ext-opset/cpu/radix_swa_sink/RadixAttnSwaSink.cpp (3 hunks)
- mllm-ext-opset/cpu/radix_swa_sink/radix_swa_sink_fwd_bshd.hpp (1 hunk)
- mllm-ext-opset/cpu/radix_swa_sink/tests/main.cpp (1 hunk)
- mllm/backends/cpu/kernels/common/radix_attn/fwd_bshd.hpp (1 hunk)
- mllm/backends/cpu/ops/RadixAttnOp.cpp (1 hunk)
- mllm/mllm.cpp (1 hunk)
- mllm/nn/lmcache/PrefixCache.cpp (1 hunk)
- mllm/nn/lmcache/PrefixCache.hpp (1 hunk)
🧰 Additional context used
🪛 Clang (14.0.6)
mllm-ext-opset/cpu/radix_swa_sink/radix_swa_sink_fwd_bshd.hpp
[error] 6-6: 'cmath' file not found
(clang-diagnostic-error)
[error] 29-29: declaration uses identifier '__ArchTag', which is a reserved identifier
(bugprone-reserved-identifier,-warnings-as-errors)
[error] 29-29: declaration uses identifier '__QDType', which is a reserved identifier
(bugprone-reserved-identifier,-warnings-as-errors)
[error] 29-29: declaration uses identifier '__KDType', which is a reserved identifier
(bugprone-reserved-identifier,-warnings-as-errors)
[error] 29-29: declaration uses identifier '__VDType', which is a reserved identifier
(bugprone-reserved-identifier,-warnings-as-errors)
[error] 29-29: declaration uses identifier '__ODType', which is a reserved identifier
(bugprone-reserved-identifier,-warnings-as-errors)
[error] 29-29: declaration uses identifier '__AccDType', which is a reserved identifier
(bugprone-reserved-identifier,-warnings-as-errors)
[error] 31-31: 9 adjacent parameters of 'fwd_bshd' of similar type ('int') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 31-31: parameter name 'B' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 32-32: declaration uses identifier '__q', which is a reserved identifier
(bugprone-reserved-identifier,-warnings-as-errors)
[error] 32-32: declaration uses identifier '__k', which is a reserved identifier
(bugprone-reserved-identifier,-warnings-as-errors)
[error] 32-32: declaration uses identifier '__v', which is a reserved identifier
(bugprone-reserved-identifier,-warnings-as-errors)
[error] 33-33: declaration uses identifier '__s_aux', which is a reserved identifier
(bugprone-reserved-identifier,-warnings-as-errors)
[error] 33-33: declaration uses identifier '__out', which is a reserved identifier
(bugprone-reserved-identifier,-warnings-as-errors)
[error] 34-34: variable 'head_repeat_times' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
mllm/backends/cpu/kernels/common/radix_attn/fwd_bshd.hpp
[error] 33-33: 6 adjacent parameters of 'fwd_bshd' of similar type ('int') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 33-33: parameter name 'B' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 33-33: parameter name 'D' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 33-33: declaration uses identifier '__q', which is a reserved identifier
(bugprone-reserved-identifier,-warnings-as-errors)
mllm-ext-opset/cpu/radix_swa_sink/tests/main.cpp
[error] 1-1: 'cmath' file not found
(clang-diagnostic-error)
[error] 8-8: 2 adjacent parameters of 'radixAttnSWAwSink' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 8-8: 9 adjacent parameters of 'radixAttnSWAwSink' of similar type are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 12-12: variable name 'id' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 27-27: 5 adjacent parameters of 'eagerSWAwSink' of similar type are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 30-30: variable name 'B' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 30-30: invalid case style for variable 'B'
(readability-identifier-naming,-warnings-as-errors)
[error] 31-31: invalid case style for variable 'S_Q'
(readability-identifier-naming,-warnings-as-errors)
[error] 32-32: invalid case style for variable 'H_Q'
(readability-identifier-naming,-warnings-as-errors)
[error] 33-33: invalid case style for variable 'S_KV'
(readability-identifier-naming,-warnings-as-errors)
[error] 34-34: invalid case style for variable 'H_KV'
(readability-identifier-naming,-warnings-as-errors)
[error] 35-35: invalid case style for variable 'D_QK'
(readability-identifier-naming,-warnings-as-errors)
[error] 47-47: variable 'scale' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 66-66: 7 adjacent parameters of 'testCase' of similar type ('int') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 66-66: 2 adjacent parameters of 'testCase' of similar type ('int') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 73-73: variable name 'Q' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 73-73: invalid case style for variable 'Q'
(readability-identifier-naming,-warnings-as-errors)
[error] 74-74: variable name 'K' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 74-74: invalid case style for variable 'K'
(readability-identifier-naming,-warnings-as-errors)
[error] 75-75: variable name 'V' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 75-75: invalid case style for variable 'V'
(readability-identifier-naming,-warnings-as-errors)
[error] 104-104: invalid case style for variable 'O_ref'
(readability-identifier-naming,-warnings-as-errors)
[error] 134-134: variable name 'O' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 134-134: invalid case style for variable 'O'
(readability-identifier-naming,-warnings-as-errors)
[error] 140-140: variable 'ret' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 141-141: variable 'O_ref' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 141-141: invalid case style for variable 'O_ref'
(readability-identifier-naming,-warnings-as-errors)
[error] 142-142: variable 'O' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 142-142: variable name 'O' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 142-142: invalid case style for variable 'O'
(readability-identifier-naming,-warnings-as-errors)
[error] 148-148: variable 'MLLM_MAIN' is non-const and globally accessible, consider making it const
(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)
[error] 148-148: invalid case style for variable 'MLLM_MAIN'
(readability-identifier-naming,-warnings-as-errors)
mllm/nn/lmcache/PrefixCache.hpp
[error] 21-21: 'memory' file not found
(clang-diagnostic-error)
[error] 59-59: class 'PrefixCache' defines a non-default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator
(cppcoreguidelines-special-member-functions,-warnings-as-errors)
[error] 68-68: parameter 2 is const-qualified in the function declaration; const-qualification of parameters only has an effect in function definitions
(readability-avoid-const-params-in-decls,-warnings-as-errors)
[error] 99-99: parameter name 'fp' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 102-102: member variable 'options_' has protected visibility
(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)
[error] 103-103: member variable 'tree_' has protected visibility
(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)
[error] 111-111: parameter 2 is const-qualified in the function declaration; const-qualification of parameters only has an effect in function definitions
(readability-avoid-const-params-in-decls,-warnings-as-errors)
mllm/nn/lmcache/PrefixCache.cpp
[error] 4-4: 'cmath' file not found
(clang-diagnostic-error)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build-macos
- GitHub Check: build-android
🔇 Additional comments (1)
mllm/mllm.cpp (1)
109-109: LGTM! Correctly extends `.so` extension support to Linux.

The change properly adds Linux and UnknownPlatform to the `.so` extension path, which is correct since Linux uses `.so` for shared libraries. The UnknownPlatform fallback is also reasonable.
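As a sanity sketch of the described behavior (toy enum and function, not the actual mllm.cpp code), the suffix selection reduces to:

```cpp
#include <string>

// Hypothetical mirror of the extension-selection logic described above:
// macOS keeps .dylib, while Linux, Android, and the UnknownPlatform
// fallback all resolve to .so.
enum class Platform { MacOS, Linux, Android, Unknown };

std::string sharedLibExt(Platform p) {
  switch (p) {
    case Platform::MacOS: return ".dylib";
    default: return ".so";  // Linux, Android, Unknown
  }
}
```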
// Compute should be done in [full_attention_start, full_attention_end]
auto full_attention_start = std::max(0, s_q_idx + cur_kv_seq_len - S_Q - left_sliding_window);
auto full_attention_end = std::min(cur_kv_seq_len, s_q_idx + cur_kv_seq_len - S_Q);

// We then map full attention to local attention position;
auto local_attention_start = 0;
auto local_attention_end = full_attention_end - full_attention_start;
if (local_attention_end < left_sliding_window) local_attention_end++;

for (int s_kv_idx = local_attention_start; s_kv_idx < local_attention_end; ++s_kv_idx) {
  // k_token and v_token shape is [B, 1, H, D]
  __KDType* k_token = __k[s_kv_idx];
  __VDType* v_token = __v[s_kv_idx];

  // Offset to one head.
  // k_token and v_token shape is [D]
  k_token = k_token + b_idx * H_KV * D_QK + h_kv_id * D_QK;
  v_token = v_token + b_idx * H_KV * D_V + h_kv_id * D_V;

  // 1. MMA0. Q @ K -> A_i
  __AccDType acc_s;
  mllm::cpu::radix_attn::details::VectorDotProduct<__ArchTag, __QDType, __KDType, __AccDType>::run(q_token, k_token, &acc_s, D_QK);

  // 2. Do softmax stuff.
  scores_max_prev = scores_max;
  scores_max = std::max(scores_max_prev, acc_s);
  scores_scale = std::exp2(scores_max_prev * scale - scores_max * scale);
  acc_s = std::exp2(acc_s * scale - scores_max * scale);
  scores_sum = acc_s;
  logsum = logsum * scores_scale + scores_sum;

  // 3. Scale
  mllm::cpu::radix_attn::details::MulFromConst<__ArchTag, __AccDType, __AccDType>::run(acc_o, scores_scale, D_V);

  // 4. MMA1.
  mllm::cpu::radix_attn::details::FMAConstArray<__ArchTag, __AccDType, __AccDType, __AccDType>::run(acc_o, acc_s, v_token, D_V);
}
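Steps 2 through 4 in this excerpt implement the streaming (online) softmax: keep a running maximum, rescale the already-accumulated output whenever the maximum grows, and maintain a running denominator (logsum). A standalone scalar sketch of the same recurrence (hypothetical name; it assumes, as the kernel appears to, that `scale` folds the log2(e) factor into the exp2 calls):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Streaming softmax-weighted sum in one pass with O(1) state: a running max m,
// a running denominator logsum, and a rescaled accumulator acc. Mirrors steps
// 2-4 of the kernel excerpt, using exp2 exactly as the kernel does.
float online_softmax_sum(const std::vector<float>& logits,
                         const std::vector<float>& values, float scale) {
  float m = -INFINITY;
  float logsum = 0.0f;
  float acc = 0.0f;
  for (std::size_t i = 0; i < logits.size(); ++i) {
    const float m_prev = m;
    m = std::max(m, logits[i]);
    // Shrink previously accumulated mass if the running max grew.
    const float rescale = std::exp2(m_prev * scale - m * scale);
    const float p = std::exp2(logits[i] * scale - m * scale);
    logsum = logsum * rescale + p;
    acc = acc * rescale + p * values[i];
  }
  return acc / logsum;
}
```

One pass over the logits reproduces the two-pass softmax-weighted sum without ever materializing the full score row, which is what lets the kernel fuse the Q@K, softmax, and @V stages.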
Apply the computed sliding-window offset when reading KV tokens.
full_attention_start is calculated for each query, but the inner loop still uses __k[s_kv_idx] / __v[s_kv_idx]. As soon as the window slides (i.e., full_attention_start > 0), we start from slot 0 instead of the correct global index, so the kernel attends to stale tokens. Offset the lookup by full_attention_start (and reuse cur_kv_seq_len bounds) before the dot-product:
- for (int s_kv_idx = local_attention_start; s_kv_idx < local_attention_end; ++s_kv_idx) {
- __KDType* k_token = __k[s_kv_idx];
- __VDType* v_token = __v[s_kv_idx];
+ for (int s_kv_idx = local_attention_start; s_kv_idx < local_attention_end; ++s_kv_idx) {
+ const int kv_global_idx = full_attention_start + s_kv_idx;
+ __KDType* k_token = __k[kv_global_idx];
+   __VDType* v_token = __v[kv_global_idx];

Without this adjustment the sink operator returns incorrect outputs for any nontrivial sliding-window step.
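The failure mode is easy to see with concrete numbers. Extracting the bound arithmetic from the excerpt into a standalone helper (hypothetical name; decode-style inputs: one new query against 100 cached tokens, left window 16):

```cpp
#include <algorithm>

// Same arithmetic as the kernel excerpt: the half-open global KV range
// [start, end) that query s_q_idx may attend to.
struct KvWindow {
  int start;
  int end;
};

KvWindow swaBounds(int s_q_idx, int cur_kv_seq_len, int S_Q, int left_sliding_window) {
  const int start = std::max(0, s_q_idx + cur_kv_seq_len - S_Q - left_sliding_window);
  const int end = std::min(cur_kv_seq_len, s_q_idx + cur_kv_seq_len - S_Q);
  return {start, end};
}
```

swaBounds(0, 100, 1, 16) yields start = 83 and end = 99, so local slot 0 must dereference global slot 83; reading __k[0] instead touches the oldest token in the cache.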
auto O =
    radixAttnSWAwSink(Q, k_cache_indices, v_cache_indices, s_aux, 1, H_Q, H_KV, D_QK, D_V, left_sliding_window, cur_seq_len);
Use the requested batch size when invoking the op.
testCase accepts B, but the radixAttnSWAwSink call always hardcodes batch = 1. Any scenario that exercises B > 1 will configure the kernel incorrectly and drop work for the extra batch entries. Please pass B through to the options.
- auto O =
- radixAttnSWAwSink(Q, k_cache_indices, v_cache_indices, s_aux, 1, H_Q, H_KV, D_QK, D_V, left_sliding_window, cur_seq_len);
+ auto O = radixAttnSWAwSink(Q, k_cache_indices, v_cache_indices, s_aux, B, H_Q, H_KV, D_QK, D_V, left_sliding_window,
+                            cur_seq_len);

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
auto O = radixAttnSWAwSink(Q, k_cache_indices, v_cache_indices, s_aux, B, H_Q, H_KV, D_QK, D_V, left_sliding_window,
                           cur_seq_len);
🧰 Tools
🪛 Clang (14.0.6)
[error] 134-134: variable name 'O' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 134-134: invalid case style for variable 'O'
(readability-identifier-naming,-warnings-as-errors)
🤖 Prompt for AI Agents
In mllm-ext-opset/cpu/radix_swa_sink/tests/main.cpp around lines 134 to 136, the
call to radixAttnSWAwSink hardcodes batch=1 which ignores the testCase parameter
B; update the call to pass the actual batch size B (or variable representing B)
into the radixAttnSWAwSink options so the kernel is configured with the
requested batch size and will process all batch entries correctly.
#include <memory>
#include <vector>
#include <optional>
#include "mllm/core/Tensor.hpp"
#include "mllm/engine/prefix_cache/TLB.hpp"
#include "mllm/engine/prefix_cache/Cache.hpp"  // IWYU pragma: export
Add the missing <string> include.
PrefixCacheOptions exposes a std::string field, but this header never includes <string>. Any translation unit that includes PrefixCache.hpp without first pulling in another header that happens to provide <string> will fail to compile. Relying on transitive includes is explicitly discouraged: always include the headers you depend on. (stackoverflow.com)
#include <memory>
#include <vector>
+#include <string>
 #include <optional>

📝 Committable suggestion
#include <memory>
#include <vector>
#include <string>
#include <optional>
#include "mllm/core/Tensor.hpp"
#include "mllm/engine/prefix_cache/TLB.hpp"
#include "mllm/engine/prefix_cache/Cache.hpp"  // IWYU pragma: export
🧰 Tools
🪛 Clang (14.0.6)
[error] 21-21: 'memory' file not found
(clang-diagnostic-error)
🤖 Prompt for AI Agents
In mllm/nn/lmcache/PrefixCache.hpp around lines 21 to 27, the header declares
PrefixCacheOptions with a std::string field but never includes <string>; add
#include <string> (place it with the other standard includes near the top of the
file) so the header directly satisfies its dependency and no longer relies on
transitive includes.
class CpuPrefixCache final : protected PrefixCache {
 public:
  explicit CpuPrefixCache(const PrefixCacheOptions& options);

  void promote(const std::vector<int64_t>& token_ids,
               const std::vector<std::vector<prefix_cache::vp_addr_t>>& key_cache_addresses,
               const std::vector<std::vector<prefix_cache::vp_addr_t>>& value_cache_addresses, int64_t extra_key) override;

  prefix_cache::vp_addr_t allocKey(int layer_idx) override;
  prefix_cache::vp_addr_t allocValue(int layer_idx) override;

  void freeKey(int layer_idx, prefix_cache::vp_addr_t addr) override;
  void freeValue(int layer_idx, prefix_cache::vp_addr_t addr) override;

  char* physicalAddrKey(int layer_idx, prefix_cache::vp_addr_t addr) override;
  char* physicalAddrValue(int layer_idx, prefix_cache::vp_addr_t addr) override;

  void prefetchKey(int layer_idx, prefix_cache::vp_addr_t addr) override;
  void prefetchValue(int layer_idx, prefix_cache::vp_addr_t addr) override;

  void purgeKey(int layer_idx, prefix_cache::vp_addr_t addr) override;
  void purgeValue(int layer_idx, prefix_cache::vp_addr_t addr) override;

  void _initFullAttention() override;
  void _initSlidingWindowAttention() override;

  void _validateKeyTokenShape(Tensor& key) override;
  void _validateValueTokenShape(Tensor& value) override;

 private:
  // One allocator pair per layer so each layer's memory stays contiguous on the CPU platform.
  std::vector<std::pair<prefix_cache::_AllocatorImpl::ptr_t, prefix_cache::_AllocatorImpl::ptr_t>> caches_;
Switch to public inheritance for CpuPrefixCache.
CpuPrefixCache currently inherits from PrefixCache with protected inheritance, which blocks implicit upcasts to the base type. That defeats the polymorphic interface you just introduced: callers that work with PrefixCache& or PrefixCache* cannot use CpuPrefixCache at all. Make the inheritance public so the concrete implementation can substitute for the base.
-class CpuPrefixCache final : protected PrefixCache {
+class CpuPrefixCache final : public PrefixCache {

🧰 Tools
🪛 Clang (14.0.6)
[error] 111-111: parameter 2 is const-qualified in the function declaration; const-qualification of parameters only has an effect in function definitions
(readability-avoid-const-params-in-decls,-warnings-as-errors)
🤖 Prompt for AI Agents
In mllm/nn/lmcache/PrefixCache.hpp around lines 106 to 144, CpuPrefixCache
currently inherits from PrefixCache using protected inheritance which prevents
implicit upcasts to PrefixCache and breaks polymorphism; change the inheritance
specifier from protected to public (i.e., make CpuPrefixCache publicly inherit
PrefixCache) so callers can use CpuPrefixCache where a PrefixCache* or
PrefixCache& is expected, then rebuild and run tests to ensure no access
violations or visibility issues occur.
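The substitutability argument can be shown in isolation with toy types (these are illustrative stand-ins, not the real PrefixCache API). With public inheritance the derived cache converts implicitly to its base; with protected inheritance the call in useCache below would be ill-formed outside the class.

```cpp
#include <cstdint>

// Toy stand-ins for the interface and its CPU implementation.
struct PrefixCacheIface {
  virtual ~PrefixCacheIface() = default;
  virtual std::int64_t allocKey(int layer_idx) = 0;
};

// `: public` permits the implicit Derived& -> Base& conversion that
// polymorphic callers rely on; `: protected` would reject it here.
struct ToyCpuPrefixCache final : public PrefixCacheIface {
  std::int64_t allocKey(int layer_idx) override { return layer_idx * 2; }  // stand-in allocation
};

std::int64_t useCache(PrefixCacheIface& cache) { return cache.allocKey(3); }
```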
`fwd_bshd` for BSHD tensor layout with architecture-specific optimizations

Summary by CodeRabbit
New Features
Tests
Improvements