feat(cpu-kernels): add SIMD-based vector operations for flash attention#484
feat(cpu-kernels): add SIMD-based vector operations for flash attention#484chenghuaWang merged 4 commits intoUbiquitousLearning:v2from
Conversation
Introduce Highway-based SIMD implementations for common vector operations including dot product, scalar multiplication, fused multiply-add, and constant filling. These are integrated into the flash attention v2 CPU backend for performance optimization on x86 architectures. The changes include: - New SIMD kernels in `impl-any-simd-inl.hpp` - Dynamic dispatch setup in `impl-any-simd-inst.cpp` - Public interface definitions in `impl-any-simd-inst.hpp` - Template specializations in `impl-any-simd.hpp Also update MLIR Python bindings flag in install script and fix build dependency version in pyproject.toml. Add initial documentation for MLLM IR and MLIR extensions.
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughThis PR introduces SIMD-accelerated FP32 vector operations for Flash Attention 2 on x86 architecture using Highway library, adds dynamic dispatch wrappers bridging template specializations to HWY implementations, updates build configuration to enable MLIR Python bindings, and pins scikit-build-core to a specific version. Changes
Sequence DiagramsequenceDiagram
participant User
participant Template as Template Specialization<br/>(__X86ArchTag)
participant Wrapper as Dynamic Dispatch Wrapper<br/>(call_*)
participant HWY as Highway SIMD Implementation<br/>(vector_dot_product_*)
User->>Template: Invoke VectorDotProduct::run()
Template->>Wrapper: call_vector_dot_product_fp32_fp32_fp32()
Wrapper->>HWY: HWY_DYNAMIC_DISPATCH → best SIMD target
HWY->>HWY: SIMD loop (Highway vectors)
HWY->>HWY: Scalar tail handling
HWY-->>Wrapper: Result
Wrapper-->>Template: Return
Template-->>User: Computed result
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Rationale: The diff spans multiple files with heterogeneous changes—SIMD implementations with logic density (FP32 vector operations), new dynamic dispatch infrastructure (HWY_DYNAMIC_DISPATCH patterns), template specializations bridging layers, and build configuration changes. While individual cohorts follow consistent patterns, each requires separate reasoning: SIMD correctness and Highway API usage, dispatch mechanism validation, specialization mapping accuracy, and configuration impact assessment. Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Comment |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (10)
scripts/install_mlir.sh (1)
5-5: Enable Python bindings: OK. Consider script hardening and flag normalization.CMake treats On/ON equally; use ON for consistency and add strict bash flags.
Apply:
+set -euo pipefail @@ -cmake -G Ninja ../llvm -DMLIR_ENABLE_BINDINGS_PYTHON=On -DLLVM_ENABLE_PROJECTS=mlir -DLLVM_BUILD_EXAMPLES=ON -DLLVM_TARGETS_TO_BUILD="Native;NVPTX" -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=ON -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DLLVM_ENABLE_LLD=ON -DLLVM_CCACHE_BUILD=ON -DLLVM_USE_SANITIZER="Address;Undefined" -DMLIR_INCLUDE_INTEGRATION_TESTS=ON +cmake -G Ninja ../llvm -DMLIR_ENABLE_BINDINGS_PYTHON=ON -DLLVM_ENABLE_PROJECTS=mlir -DLLVM_BUILD_EXAMPLES=ON -DLLVM_TARGETS_TO_BUILD="Native;NVPTX" -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=ON -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DLLVM_ENABLE_LLD=ON -DLLVM_CCACHE_BUILD=ON -DLLVM_USE_SANITIZER="Address;Undefined" -DMLIR_INCLUDE_INTEGRATION_TESTS=ONOptional (if CI lacks auto‑detect): add
-DPython3_EXECUTABLE="$(command -v python3)".mllm/backends/cpu/kernels/common/fa2_1/impl-any-simd-inl.hpp (4)
11-24: Dot product SIMD path looks correct; consider adding restrict and HWY_INLINE for consistency.Small nits: help optimizer/aliasing and follow Highway style.
Apply:
-void vector_dot_product_fp32_fp32_fp32(const float* lhs, const float* rhs, float* out, size_t len) { +HWY_INLINE void vector_dot_product_fp32_fp32_fp32(const float* HWY_RESTRICT lhs, + const float* HWY_RESTRICT rhs, + float* HWY_RESTRICT out, size_t len) {Ensure the corresponding
HWY_EXPORT(vector_dot_product_fp32_fp32_fp32)exists in the inst TU.
26-36: Mul-from-const: add restrict and inline; otherwise LGTM.-void mul_from_const_fp32(float* from, float c, size_t len) { +HWY_INLINE void mul_from_const_fp32(float* HWY_RESTRICT from, float c, size_t len) {
37-47: FMA path: add restrict and inline.-void fma_const_array_fp32(float* acc_o, float acc_s, const float* v_token, size_t len) { +HWY_INLINE void fma_const_array_fp32(float* HWY_RESTRICT acc_o, float acc_s, + const float* HWY_RESTRICT v_token, size_t len) {
49-55: Fill-const: add restrict and inline.-void filled_with_const_fp32(float* a, float v, size_t len) { +HWY_INLINE void filled_with_const_fp32(float* HWY_RESTRICT a, float v, size_t len) {mllm/backends/cpu/kernels/common/fa2_1/impl-any-simd-inst.cpp (1)
30-39: Wrappers look correct; consider noexcept for all.Signatures and HWY_DYNAMIC_DISPATCH usage are good. Consider adding noexcept to the dot and FMA wrappers for consistency and to prevent exceptions crossing ABI.
-HWY_DLLEXPORT void call_vector_dot_product_fp32_fp32_fp32(..., size_t len) { +HWY_DLLEXPORT void call_vector_dot_product_fp32_fp32_fp32(..., size_t len) noexcept { HWY_DYNAMIC_DISPATCH(vector_dot_product_fp32_fp32_fp32)(lhs, rhs, out, len); } - -HWY_DLLEXPORT void call_fma_const_array_fp32(..., size_t len) { +HWY_DLLEXPORT void call_fma_const_array_fp32(..., size_t len) noexcept { HWY_DYNAMIC_DISPATCH(fma_const_array_fp32)(acc_o, acc_s, v_token, len); }mllm/backends/cpu/kernels/common/fa2_1/impl-any-simd.hpp (4)
6-6: Guard or decouple the Highway include.If this header is included in builds without Highway in the include path, <hwy/highway.h> will fail. Either gate it or provide a fallback for HWY_RESTRICT.
Option A (guard):
-#include <hwy/highway.h> +#if defined(MLLM_WITH_HWY) +# include <hwy/highway.h> +#endifOption B (fallback define placed after includes):
+#ifndef HWY_RESTRICT +# if defined(_MSC_VER) +# define HWY_RESTRICT __restrict +# else +# define HWY_RESTRICT __restrict__ +# endif +#endif
14-20: Make run() wrappers consistent: restrict, inlining, noexcept, and clearer names.Unify qualifiers and reduce “easily-swappable parameters” risk; keep names descriptive.
template<> struct VectorDotProduct<__X86ArchTag, mllm_fp32_t, mllm_fp32_t, mllm_fp32_t> { - static MLLM_FORCE_INLINE void run(const mllm_fp32_t* __restrict__ __lhs, const mllm_fp32_t* __restrict__ __rhs, - mllm_fp32_t* __out, size_t len) { - call_vector_dot_product_fp32_fp32_fp32(__lhs, __rhs, __out, len); + static MLLM_FORCE_INLINE void run(const mllm_fp32_t* HWY_RESTRICT lhs, + const mllm_fp32_t* HWY_RESTRICT rhs, + mllm_fp32_t* HWY_RESTRICT out, + size_t len) noexcept { + call_vector_dot_product_fp32_fp32_fp32(lhs, rhs, out, len); } }; template<> struct MulFromConst<__X86ArchTag, mllm_fp32_t, mllm_fp32_t> { - static void run(mllm_fp32_t* from, mllm_fp32_t c, size_t len) { call_mul_from_const_fp32(from, c, len); } + static MLLM_FORCE_INLINE void run(mllm_fp32_t* HWY_RESTRICT dst, + mllm_fp32_t scale, + size_t len) noexcept { + call_mul_from_const_fp32(dst, scale, len); + } }; template<> struct FMAConstArray<__X86ArchTag, mllm_fp32_t, mllm_fp32_t, mllm_fp32_t> { - static void run(mllm_fp32_t* acc_o, mllm_fp32_t acc_s, const mllm_fp32_t* v_token, size_t len) { - call_fma_const_array_fp32(acc_o, acc_s, v_token, len); + static MLLM_FORCE_INLINE void run(mllm_fp32_t* HWY_RESTRICT acc_out, + mllm_fp32_t acc_scale, + const mllm_fp32_t* HWY_RESTRICT vec, + size_t len) noexcept { + call_fma_const_array_fp32(acc_out, acc_scale, vec, len); } }; template<> struct FilledWithConst<__X86ArchTag, mllm_fp32_t> { - static void run(mllm_fp32_t* a, mllm_fp32_t v, size_t len) { call_filled_with_const_fp32(a, v, len); } + static MLLM_FORCE_INLINE void run(mllm_fp32_t* HWY_RESTRICT dst, + mllm_fp32_t value, + size_t len) noexcept { + call_filled_with_const_fp32(dst, value, len); + } };Also applies to: 22-25, 27-32, 34-37
14-20: Consider std::span to reduce swappable-parameter risk.If C++20 is available, change pointer+len pairs to std::span to improve safety and clarity.
Example:
- static MLLM_FORCE_INLINE void run(const mllm_fp32_t* HWY_RESTRICT lhs, const mllm_fp32_t* HWY_RESTRICT rhs, - mllm_fp32_t* HWY_RESTRICT out, size_t len) noexcept { - call_vector_dot_product_fp32_fp32_fp32(lhs, rhs, out, len); - } + static MLLM_FORCE_INLINE void run(std::span<const mllm_fp32_t> lhs, + std::span<const mllm_fp32_t> rhs, + std::span<mllm_fp32_t> out) noexcept { + call_vector_dot_product_fp32_fp32_fp32(lhs.data(), rhs.data(), out.data(), out.size()); + }
8-11: Move impl-any-simd-inst.hpp inclusion to implementation file to avoid forcing HWY as transitive public dependency.The include chain Kernels.hpp → fwd_bshd.hpp → impl-any-simd.hpp → impl-any-simd-inst.hpp exposes HWY (
#include <hwy/base.h>) to all downstream consumers via the public re-export marked in Kernels.hpp line 44-45. Moving this include to a .cpp file or private header would prevent downstream code from requiring HWY.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
mllm/backends/cpu/kernels/common/fa2_1/impl-any-simd-inl.hpp(1 hunks)mllm/backends/cpu/kernels/common/fa2_1/impl-any-simd-inst.cpp(1 hunks)mllm/backends/cpu/kernels/common/fa2_1/impl-any-simd-inst.hpp(1 hunks)mllm/backends/cpu/kernels/common/fa2_1/impl-any-simd.hpp(1 hunks)mllm/compile/README.md(1 hunks)pyproject.toml(1 hunks)scripts/install_mlir.sh(1 hunks)
🧰 Additional context used
🪛 Clang (14.0.6)
mllm/backends/cpu/kernels/common/fa2_1/impl-any-simd-inst.hpp
[error] 8-8: 'mllm/core/DataTypes.hpp' file not found
(clang-diagnostic-error)
[error] 20-20: variable 'call_fma_const_array_fp32' is non-const and globally accessible, consider making it const
(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)
[error] 23-23: variable 'call_filled_with_const_fp32' is non-const and globally accessible, consider making it const
(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)
[error] 25-25: variable 'call_mul_from_const_fp32' is non-const and globally accessible, consider making it const
(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)
mllm/backends/cpu/kernels/common/fa2_1/impl-any-simd-inl.hpp
[error] 4-4: 'hwy/highway.h' file not found
(clang-diagnostic-error)
[error] 13-13: variable 'd' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 13-13: variable name 'd' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 15-15: variable 'i' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 15-15: variable name 'i' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 21-21: variable 'result' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 26-26: 2 adjacent parameters of 'mul_from_const_fp32' of convertible types are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 26-26: parameter name 'c' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 27-27: variable 'd' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 27-27: variable name 'd' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 28-28: variable name 'vc' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 29-29: variable 'i' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 29-29: variable name 'i' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 38-38: variable 'd' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 38-38: variable name 'd' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 40-40: variable 'i' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 40-40: variable name 'i' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 49-49: parameter name 'a' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 49-49: 2 adjacent parameters of 'filled_with_const_fp32' of convertible types are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 49-49: parameter name 'v' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 50-50: variable 'd' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 50-50: variable name 'd' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 51-51: variable name 'vv' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 52-52: variable 'i' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 52-52: variable name 'i' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
mllm/backends/cpu/kernels/common/fa2_1/impl-any-simd-inst.cpp
[error] 4-4: 'mllm/backends/cpu/kernels/common/fa2_1/impl-any-simd-inst.hpp' file not found
(clang-diagnostic-error)
[error] 11-11: macro 'HWY_TARGET_INCLUDE' used to declare a constant; consider using a 'constexpr' constant
(cppcoreguidelines-macro-usage,-warnings-as-errors)
mllm/backends/cpu/kernels/common/fa2_1/impl-any-simd.hpp
[error] 6-6: 'hwy/highway.h' file not found
(clang-diagnostic-error)
[error] 16-16: 2 adjacent parameters of 'run' of similar type ('const int *__restrict') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 24-24: 2 adjacent parameters of 'run' of similar type ('int') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 24-24: parameter name 'c' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 36-36: parameter name 'a' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 36-36: 2 adjacent parameters of 'run' of similar type ('int') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 36-36: parameter name 'v' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build-android
- GitHub Check: build-macos
🔇 Additional comments (6)
mllm/backends/cpu/kernels/common/fa2_1/impl-any-simd-inst.hpp (1)
14-27: Dynamic-dispatch exports verified—no issues found.The verification confirms all four functions (
vector_dot_product_fp32_fp32_fp32,fma_const_array_fp32,filled_with_const_fp32,mul_from_const_fp32) have bothHWY_EXPORTdeclarations and correspondingHWY_DYNAMIC_DISPATCHcalls in their wrapper functions. The linkage is properly configured and will not produce link errors.mllm/backends/cpu/kernels/common/fa2_1/impl-any-simd-inl.hpp (1)
4-4: No action needed—Highway dependency is already properly configured.Highway is already fetched via CPMAddPackage (mllm/backends/cpu/CMakeLists.txt line 81-85) and explicitly linked to MllmCPUBackend with
target_link_libraries(MllmCPUBackend PUBLIC hwy)(line 92). CMake automatically propagates include directories through public linking, so#include <hwy/highway.h>will resolve correctly at compile time.Likely an incorrect or invalid review comment.
pyproject.toml (1)
3-3: Consider version pinning strategy; manual CI verification needed as no CI configuration files exist in repository.The current exact pin (
==0.10.0) is valid and improves reproducibility. However, scikit-build-core v0.11.6 is the latest release on PyPI (Aug 2025), so v0.10.0 is outdated. If you're intentionally using an older version for compatibility, keep the exact pin; if not, consider upgrading or using a bounded range like>=0.10,<0.11.The CI image verification suggested in the original comment cannot be completed: no CI configuration files (
.github/workflows, etc.) were found in the repository. Verify manually that your CI environment has scikit-build-core 0.10.0 available.mllm/backends/cpu/kernels/common/fa2_1/impl-any-simd-inst.cpp (3)
25-28: HWY_EXPORT placement looks right.Exporting the per-target functions before defining the dynamic-dispatch shims is correct.
4-4: No action required—header file exists and include path is correct.The verification confirms that
impl-any-simd-inst.hppexists at the expected path and is correctly included. The static analysis warning is a false positive, likely a tool configuration issue rather than an actual problem with the code or build system.
6-14: Dynamic-dispatch scaffold verified as correct.The inl file (impl-any-simd-inl.hpp) properly wraps all four functions (vector_dot_product_fp32_fp32_fp32, mul_from_const_fp32, fma_const_array_fp32, filled_with_const_fp32) with HWY_BEFORE_NAMESPACE/HWY_AFTER_NAMESPACE macros and defines them inside the HWY_NAMESPACE namespace. This follows standard Highway conventions—explicit HWY_EXPORT macros are not needed when using foreach_target.h, as that mechanism handles per-target re-inclusion and visibility.
| HWY_DLLEXPORT void call_filled_with_const_fp32(mllm_fp32_t* HWY_DLLEXPORT a, float v, size_t len) { | ||
| HWY_DYNAMIC_DISPATCH(filled_with_const_fp32)(a, v, len); | ||
| } |
There was a problem hiding this comment.
Fix invalid HWY_DLLEXPORT on parameters + align scalar types.
HWY_DLLEXPORT must annotate the function, not parameters. Also, prefer mllm_fp32_t over float for consistency with the rest of the API.
Apply:
-HWY_DLLEXPORT void call_filled_with_const_fp32(mllm_fp32_t* HWY_DLLEXPORT a, float v, size_t len) {
- HWY_DYNAMIC_DISPATCH(filled_with_const_fp32)(a, v, len);
-}
+HWY_DLLEXPORT void call_filled_with_const_fp32(mllm_fp32_t* HWY_RESTRICT a, mllm_fp32_t v, size_t len) noexcept {
+ HWY_DYNAMIC_DISPATCH(filled_with_const_fp32)(a, v, len);
+}
-HWY_DLLEXPORT void call_mul_from_const_fp32(mllm_fp32_t* HWY_DLLEXPORT from, float c, size_t len) {
- HWY_DYNAMIC_DISPATCH(mul_from_const_fp32)(from, c, len);
-}
+HWY_DLLEXPORT void call_mul_from_const_fp32(mllm_fp32_t* HWY_RESTRICT from, mllm_fp32_t c, size_t len) noexcept {
+ HWY_DYNAMIC_DISPATCH(mul_from_const_fp32)(from, c, len);
+}Also applies to: 45-47
🤖 Prompt for AI Agents
In mllm/backends/cpu/kernels/common/fa2_1/impl-any-simd-inst.cpp around lines
41-43 (and also apply the same change to lines 45-47), remove the misplaced
HWY_DLLEXPORT placed on the parameter declaration(s) so the export attribute
only decorates the function itself, and change the scalar parameter type from
float to mllm_fp32_t to match the API; also update any matching
prototypes/forward declarations accordingly.
| #ifndef HIGHWAY_HWY_FA2_IMPL_INL_H_ | ||
| #define HIGHWAY_HWY_FA2_IMPL_INL_H_ | ||
|
|
There was a problem hiding this comment.
Header guard name is incorrect/mismatched.
Guard references “INL” instead of “INST” and non‑project style. Fix to match filename.
Apply:
-#ifndef HIGHWAY_HWY_FA2_IMPL_INL_H_
-#define HIGHWAY_HWY_FA2_IMPL_INL_H_
+// impl-any-simd-inst.hpp
+#ifndef MLLM_BACKENDS_CPU_KERNELS_COMMON_FA2_1_IMPL_ANY_SIMD_INST_HPP_
+#define MLLM_BACKENDS_CPU_KERNELS_COMMON_FA2_1_IMPL_ANY_SIMD_INST_HPP_
@@
-#endif // HIGHWAY_HWY_FA2_IMPL_INL_H_
+#endif // MLLM_BACKENDS_CPU_KERNELS_COMMON_FA2_1_IMPL_ANY_SIMD_INST_HPP_Also applies to: 29-29
🤖 Prompt for AI Agents
In mllm/backends/cpu/kernels/common/fa2_1/impl-any-simd-inst.hpp (around lines
5–7 and also line 29) the header guard uses the wrong name/format
("HIGHWAY_HWY_FA2_IMPL_INL_H_" and "INL") which doesn't match the filename or
project style; replace it with a guard that matches the file path and “INST”
suffix (e.g. MLLM_BACKENDS_CPU_KERNELS_COMMON_FA2_1_IMPL_ANY_SIMD_INST_HPP_) and
update both the #ifndef/#define at the top and the corresponding #endif comment
at the bottom to use the new identifier.
| HWY_DLLEXPORT void call_vector_dot_product_fp32_fp32_fp32(const mllm_fp32_t* HWY_RESTRICT lhs, | ||
| const mllm_fp32_t* HWY_RESTRICT rhs, mllm_fp32_t* HWY_RESTRICT out, | ||
| size_t len); | ||
|
|
||
| HWY_DLLEXPORT void call_fma_const_array_fp32(mllm_fp32_t* HWY_RESTRICT acc_o, mllm_fp32_t acc_s, | ||
| const mllm_fp32_t* HWY_RESTRICT v_token, size_t len); | ||
|
|
||
| HWY_DLLEXPORT void call_filled_with_const_fp32(mllm_fp32_t* HWY_DLLEXPORT a, float v, size_t len); | ||
|
|
||
| HWY_DLLEXPORT void call_mul_from_const_fp32(mllm_fp32_t* HWY_DLLEXPORT from, float c, size_t len); | ||
|
|
There was a problem hiding this comment.
🧩 Analysis chain
Misuse of HWY_DLLEXPORT on parameter declarations; use HWY_RESTRICT for pointers.
Putting HWY_DLLEXPORT on parameters is invalid and may break compilation/ABI. Keep HWY_DLLEXPORT on the function, use HWY_RESTRICT on pointer params, and include for size_t.
Apply:
#include "mllm/core/DataTypes.hpp"
+#include <cstddef>
@@
-HWY_DLLEXPORT void call_vector_dot_product_fp32_fp32_fp32(const mllm_fp32_t* HWY_RESTRICT lhs,
- const mllm_fp32_t* HWY_RESTRICT rhs, mllm_fp32_t* HWY_RESTRICT out,
- size_t len);
+HWY_DLLEXPORT void call_vector_dot_product_fp32_fp32_fp32(const mllm_fp32_t* HWY_RESTRICT lhs,
+ const mllm_fp32_t* HWY_RESTRICT rhs,
+ mllm_fp32_t* HWY_RESTRICT out, size_t len);
@@
-HWY_DLLEXPORT void call_filled_with_const_fp32(mllm_fp32_t* HWY_DLLEXPORT a, float v, size_t len);
+HWY_DLLEXPORT void call_filled_with_const_fp32(mllm_fp32_t* HWY_RESTRICT a, float v, size_t len);
@@
-HWY_DLLEXPORT void call_mul_from_const_fp32(mllm_fp32_t* HWY_DLLEXPORT from, float c, size_t len);
+HWY_DLLEXPORT void call_mul_from_const_fp32(mllm_fp32_t* HWY_RESTRICT from, float c, size_t len);Please confirm mllm/core/DataTypes.hpp resolves on all targets; otherwise adjust the include path.
I'll verify the review comment by examining the file and understanding the Highway library macros. Let me start by investigating.
Now let me examine the actual file in the repository:
The file doesn't exist in search results without cloning. Let me try with cloning:
🏁 Script executed:
#!/bin/bash
# Locate and examine the target file
fd "impl-any-simd-inst.hpp" --type fLength of output: 65
🏁 Script executed:
#!/bin/bash
# Find the file in the repository
find . -name "impl-any-simd-inst.hpp" 2>/dev/nullLength of output: 67
---
🏁 Script executed:
```shell
#!/bin/bash
# Examine the actual file content
cat "./git/mllm/backends/cpu/kernels/common/fa2_1/impl-any-simd-inst.hpp"
Length of output: 99
---
🏁 Script executed:
```shell
#!/bin/bash
# List directory structure and try to read with fd's path
ls -la mllm/backends/cpu/kernels/common/fa2_1/ 2>/dev/null || ls -la git/mllm/backends/cpu/kernels/common/fa2_1/ 2>/dev/null || pwd && find . -name "impl-any-simd-inst.hpp" -type f -exec cat {} \;
Length of output: 1834
Now let me verify the Highway library macro semantics:
---
🌐 Web query:
```web
Highway library HWY_DLLEXPORT HWY_RESTRICT macro documentation
💡 Result:
Short answer:
-
HWY_RESTRICT — Macro you put after a pointer (e.g. T* HWY_RESTRICT p) to promise the pointer does not alias other pointers; used to improve code generation. (See Highway quick-reference / FAQ and the library headers.) [1][2]
-
HWY_DLLEXPORT — Macro used on functions/variables to mark them as part of Highway’s public/ABI surface (exported symbols). In the codebase it prefixes public API declarations (e.g. SupportedTargets(), AllocateAlignedBytes(), per-target queries) so they are exported when building the library; the implementation of HWY_DLLEXPORT lives in highway_export.h and on common platforms expands to the platform-appropriate export attribute (e.g. __declspec(dllexport)/__declspec(dllimport) on MSVC or attribute((visibility("default"))) / equivalent on GCC/Clang). See usages in the headers. [2][3]
References:
- Highway quick reference / macros (describes HWY_RESTRICT). [1]
- Highway source headers (base.h shows HWY_RESTRICT mapping for MSVC; targets.h / per_target.h / aligned_allocator.h show HWY_DLLEXPORT usage). [2][3]
If you want, I can fetch and show the exact highway_export.h contents (the precise HWY_DLLEXPORT definition) and paste the lines here.
---
🏁 Script executed:
```shell
#!/bin/bash
# Check if size_t is available from existing includes or needs explicit cstddef
# Look at what hwy/base.h and DataTypes.hpp provide
grep -r "include.*cstddef\|include.*cstdlib\|include.*cstring" mllm/core/DataTypes.hpp mllm/backends/cpu/kernels/common/fa2_1/impl-any-simd-inst.hpp 2>/dev/null || echo "No cstddef/cstdlib/cstring found in direct includes"
# Check if size_t is used elsewhere in the codebase without explicit cstddef
rg -n "size_t" mllm/backends/cpu/kernels/common/fa2_1/ -A1 -B1 --max-count=5
Length of output: 10230
Fix HWY_DLLEXPORT misuse on parameters in both header and implementation files; replace with HWY_RESTRICT and add explicit #include <cstddef>.
The review is correct: HWY_DLLEXPORT (function visibility macro) is incorrectly applied to pointer parameters in two functions. These should use HWY_RESTRICT instead (aliasing guarantee). The same issues exist in both impl-any-simd-inst.hpp and impl-any-simd-inst.cpp.
Required fixes:
impl-any-simd-inst.hpp (lines 23, 25):
- Line 23: Replace
HWY_DLLEXPORTwithHWY_RESTRICTon parametera - Line 25: Replace
HWY_DLLEXPORTwithHWY_RESTRICTon parameterfrom
impl-any-simd-inst.cpp (lines 41, 45):
- Line 41: Replace
HWY_DLLEXPORTwithHWY_RESTRICTon parametera - Line 45: Replace
HWY_DLLEXPORTwithHWY_RESTRICTon parameterfrom
Also add #include <cstddef> to both files for explicit size_t definition.
🧰 Tools
🪛 Clang (14.0.6)
[error] 20-20: variable 'call_fma_const_array_fp32' is non-const and globally accessible, consider making it const
(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)
[error] 23-23: variable 'call_filled_with_const_fp32' is non-const and globally accessible, consider making it const
(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)
[error] 25-25: variable 'call_mul_from_const_fp32' is non-const and globally accessible, consider making it const
(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)
🤖 Prompt for AI Agents
In mllm/backends/cpu/kernels/common/fa2_1/impl-any-simd-inst.hpp around lines
16–26 and mllm/backends/cpu/kernels/common/fa2_1/impl-any-simd-inst.cpp around
lines 41–45, HWY_DLLEXPORT was mistakenly used on pointer parameters — replace
those HWY_DLLEXPORT occurrences on parameters `a` (hpp line ~23 / cpp line ~41)
and `from` (hpp line ~25 / cpp line ~45) with HWY_RESTRICT to provide the
correct aliasing guarantee, and add #include <cstddef> to both files to ensure
size_t is defined.
| @@ -0,0 +1 @@ | |||
| # MLLM IR and It's MLIR Extensions | |||
There was a problem hiding this comment.
Typo: “It’s” → “Its”.
Use the possessive “Its” in the heading.
Apply this diff:
-# MLLM IR and It's MLIR Extensions
+# MLLM IR and Its MLIR Extensions📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # MLLM IR and It's MLIR Extensions | |
| # MLLM IR and Its MLIR Extensions |
🤖 Prompt for AI Agents
In mllm/compile/README.md around line 1, the heading contains a typo using the
contraction "It's" instead of the possessive "Its"; update the first-line
heading from "MLLM IR and It's MLIR Extensions" to "MLLM IR and Its MLIR
Extensions" and save the file.
- Update macro checks to properly include ARM architecture
- Add missing `HWY_RESTRICT` qualifiers to function parameters
- Wrap implementation includes with proper architecture guards
- Fix typo in README.md ("It's" → "Its")
The logical condition for excluding ARM architectures has been updated to use proper parentheses grouping. This ensures that the code is only compiled for non-ARM architectures as intended, improving build compatibility across different CPU targets.
Include CPUArchHelper.hpp to enable architecture-specific checks in the FA2 SIMD implementation files. This change ensures proper compilation and runtime behavior for different CPU architectures by providing necessary architecture detection utilities.
Introduce Highway-based SIMD implementations for common vector operations including dot product, scalar multiplication, fused multiply-add, and constant filling. These are integrated into the flash attention v2 CPU backend for performance optimization on x86 architectures.
The changes include:
impl-any-simd-inl.hppimpl-any-simd-inst.cppimpl-any-simd-inst.hppAlso update MLIR Python bindings flag in install script and fix build dependency version in pyproject.toml. Add initial documentation for MLLM IR and MLIR extensions.
Summary by CodeRabbit
New Features
Build
Documentation