Add ARM64/NEON SIMD paths to all compute kernels #14

Draft
Copilot wants to merge 2 commits into main from
copilot/migrate-repo-to-arm-one-more-time

Copilot AI commented Mar 10, 2026

All five compute kernels were gated behind #ifdef __x86_64__ with only scalar fallbacks on other architectures — ARM64 got zero SIMD acceleration.

Changes

NEON intrinsic paths added to each kernel

  • matrix_operations.cpp: float64x2_t + vmlaq_f64 for multiply-accumulate on doubles, two lanes at a time (vmlaq_f64 is not guaranteed to fuse; vfmaq_f64 is the fused form)
  • polynomial_eval.cpp: parallel coefficient evaluation using vmlaq_f64
  • string_search.cpp: vceqq_u8 compares 16 bytes per instruction
  • memory_operations.cpp: vld1q_u8/vst1q_u8 16-byte loads/stores
  • hash_operations.cpp: 16-byte batched loads via vld1q_u8

Consistent arch-detection pattern across all files

#ifdef __aarch64__
#include <arm_neon.h>
#define USE_ARM_NEON 1
#endif

#if USE_X86_SIMD
    // SSE2 path (unchanged)
#elif USE_ARM_NEON
    // NEON path
#else
    // scalar fallback
#endif
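As an illustration of how this chain might look inside a single kernel, the sketch below implements a dot product with an SSE2 path, a NEON path, and a scalar loop. The function dot is hypothetical, and the definition of USE_X86_SIMD under __x86_64__ is an assumption, since the description does not show where that macro is set. The scalar loop doubles as the tail handler for the SIMD paths rather than living in a separate #else branch.

```cpp
#include <cstddef>

// Assumed detection: the PR description only shows the ARM half.
#if defined(__x86_64__)
#include <emmintrin.h>  // SSE2 intrinsics
#define USE_X86_SIMD 1
#elif defined(__aarch64__)
#include <arm_neon.h>
#define USE_ARM_NEON 1
#endif

// Hypothetical kernel (not from the PR): dot product of two double arrays.
double dot(const double* a, const double* b, std::size_t n) {
    double sum = 0.0;
    std::size_t i = 0;
#if USE_X86_SIMD
    __m128d acc = _mm_setzero_pd();
    for (; i + 2 <= n; i += 2) {
        __m128d prod = _mm_mul_pd(_mm_loadu_pd(a + i), _mm_loadu_pd(b + i));
        acc = _mm_add_pd(acc, prod);
    }
    double lanes[2];
    _mm_storeu_pd(lanes, acc);
    sum = lanes[0] + lanes[1];
#elif USE_ARM_NEON
    float64x2_t acc = vdupq_n_f64(0.0);
    for (; i + 2 <= n; i += 2) {
        // acc[k] += a[k] * b[k] for both lanes
        acc = vmlaq_f64(acc, vld1q_f64(a + i), vld1q_f64(b + i));
    }
    sum = vaddvq_f64(acc);  // horizontal add of the two lanes
#endif
    // Scalar tail; the whole loop when neither SIMD path is available.
    for (; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

Because an undefined macro evaluates to 0 inside #if, the chain falls through to the scalar loop on any architecture where neither macro is defined.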

Dockerfile — build flags updated: -march=armv8-a on AArch64, -O3 -std=c++14 on both targets; ubuntu:22.04 base already multi-arch.

main.cpp — runtime banner reports "AArch64 with NEON Optimizations" on ARM64.

Original prompt

This section details the original issue you should resolve

<issue_title>migrate this repo to arm</issue_title>
<issue_description>use whatever arm mcp server tools you need.</issue_description>

Comments on the Issue (you are @copilot in this section)

Custom agent used: arm-migration-agent
Scan a project and migrate to ARM architecture



- Add ARM NEON SIMD implementations for all 5 compute kernels:
  * matrix_operations: float64x2_t with vmlaq_f64 (multiply-accumulate) for 2x double matmul
  * polynomial_eval: float64x2_t with vmlaq_f64 for 2-coeff parallel Horner
  * string_search: uint8x16_t with vceqq_u8 for 16-byte parallel char search
  * memory_operations: uint8x16_t with vld1q_u8/vst1q_u8 for 16-byte copy
  * hash_operations: uint8x16_t with vld1q_u8/vst1q_u8 for 16-byte batch load
- Rename polynomial_eval_sse -> polynomial_eval_simd (arch-neutral name)
- Update Dockerfile: arch-detect build flags (-march=armv8-a on AArch64,
  -march=native on x86), -O3 (was -O2), -std=c++14 (was -std=c++11)
- Add USE_ARM_NEON macro detection in all source files via #ifdef __aarch64__
- All SIMD guards use #if/#elif/#else chains (x86 -> ARM -> scalar fallback)
- ubuntu:22.04 base image confirmed multi-arch (supports linux/arm64)
- Fix scalar hash fallback: cast to unsigned char to prevent sign-extension
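The sign-extension fix in the last item matters for portability: plain char is typically signed on x86-64 Linux and unsigned on AArch64 Linux, so hashing bytes at or above 0x80 without a cast can give different results on the two targets. The sketch below shows the cast in a hypothetical FNV-1a loop. The PR does not say which hash function hash_operations.cpp uses, so both the algorithm and the name hash_bytes are assumptions for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical scalar hash (32-bit FNV-1a), used only to illustrate the cast;
// the hash in hash_operations.cpp is not shown in the PR description.
std::uint32_t hash_bytes(const char* data, std::size_t len) {
    std::uint32_t h = 2166136261u;  // FNV offset basis
    for (std::size_t i = 0; i < len; ++i) {
        // Without the cast, a signed char holding 0x80 would be widened
        // to 0xFFFFFF80 before the XOR, changing the upper 24 bits of h.
        h ^= static_cast<unsigned char>(data[i]);
        h *= 16777619u;             // FNV prime
    }
    return h;
}
```

With the cast in place, each input byte contributes only its low 8 bits regardless of the platform's char signedness.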
Copilot AI changed the title from "[WIP] Migrate this repository to ARM architecture" to "Add ARM64/NEON SIMD paths to all compute kernels" on Mar 10, 2026