Copilot AI commented Oct 28, 2025

Migrates compute benchmark suite from x86-only to multi-architecture support (x86-64 + ARM64).

Changes

Architecture detection

  • Added compile-time architecture detection (__aarch64__, __ARM_NEON)
  • main() now prints "ARM64 with NEON" or "x86-64 with SSE2" depending on the build target
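A minimal sketch of what this kind of compile-time detection can look like. The macro names USE_X86_SIMD and USE_ARM_NEON are taken from the example further down in this description; the exact preprocessor conditions, the "scalar fallback" label, and the helper name simd_target_name are assumptions for illustration, not the code in this PR.

```cpp
#include <string>

// Select exactly one SIMD path at compile time.
// The conditions below are an assumption about how the macros are set.
#if defined(__aarch64__) && defined(__ARM_NEON)
  #define USE_ARM_NEON 1
  #define USE_X86_SIMD 0
#elif defined(__x86_64__) || defined(_M_X64)
  #define USE_ARM_NEON 0
  #define USE_X86_SIMD 1   // SSE2 is part of the x86-64 baseline
#else
  #define USE_ARM_NEON 0
  #define USE_X86_SIMD 0   // neither path available: scalar code only
#endif

// Hypothetical helper: returns the label that main() would print.
inline std::string simd_target_name() {
#if USE_ARM_NEON
    return "ARM64 with NEON";
#elif USE_X86_SIMD
    return "x86-64 with SSE2";
#else
    return "scalar fallback";
#endif
}
```

Defining both macros as 0 or 1 (rather than leaving one undefined) keeps `#if USE_X86_SIMD ... #elif USE_ARM_NEON` checks well defined on every target.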

SIMD intrinsics

  • Matrix operations: _mm_loadu_pd → vld1q_f64, _mm_mul_pd → vmulq_f64, multiply-accumulate with vmlaq_f64 (not guaranteed to be fused; vfmaq_f64 is the fused multiply-add intrinsic)
  • Hash operations: _mm_loadu_si128 → vld1q_u8, array-based byte extraction (NEON lane indices must be compile-time constants)
  • String search: _mm_cmpeq_epi8 → vceqq_u8, _mm_movemask_epi8 → array storage + iteration (NEON has no direct movemask equivalent)
  • Memory copy: _mm_loadu_si128/_mm_storeu_si128 → vld1q_u8/vst1q_u8
  • Polynomial eval: _mm_mul_pd/_mm_add_pd → vmulq_f64/vaddq_f64, horizontal add via lane extraction
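To illustrate the string-search mapping, here is a hedged sketch of a single-byte search with all three paths. On x86 the comparison result is collapsed into a bitmask with _mm_movemask_epi8; on NEON the comparison result is stored to a 16-byte array and scanned, which is the "array storage + iteration" approach named in the list. The function name find_byte and its signature are hypothetical, not taken from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__aarch64__) && defined(__ARM_NEON)
  #include <arm_neon.h>
  #define USE_ARM_NEON 1
  #define USE_X86_SIMD 0
#elif defined(__x86_64__) || defined(_M_X64)
  #include <emmintrin.h>   // SSE2
  #define USE_ARM_NEON 0
  #define USE_X86_SIMD 1
#else
  #define USE_ARM_NEON 0
  #define USE_X86_SIMD 0
#endif

// Hypothetical helper: index of the first `needle` in buf[0..len), or len if absent.
inline std::size_t find_byte(const std::uint8_t* buf, std::size_t len,
                             std::uint8_t needle) {
    assert(buf != nullptr || len == 0);
    std::size_t i = 0;
#if USE_X86_SIMD
    const __m128i pattern = _mm_set1_epi8(static_cast<char>(needle));
    for (; i + 16 <= len; i += 16) {
        __m128i chunk = _mm_loadu_si128(reinterpret_cast<const __m128i*>(buf + i));
        // One bit per byte lane: bit n is set when lane n matched.
        int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(chunk, pattern));
        for (std::size_t lane = 0; mask != 0 && lane < 16; ++lane) {
            if (mask & (1 << lane)) return i + lane;
        }
    }
#elif USE_ARM_NEON
    const uint8x16_t pattern = vdupq_n_u8(needle);
    for (; i + 16 <= len; i += 16) {
        uint8x16_t eq = vceqq_u8(vld1q_u8(buf + i), pattern);
        // No movemask on NEON: spill the 0x00/0xFF lanes and scan them.
        std::uint8_t lanes[16];
        vst1q_u8(lanes, eq);
        for (std::size_t lane = 0; lane < 16; ++lane) {
            if (lanes[lane] != 0) return i + lane;
        }
    }
#endif
    // Scalar tail, and the whole search on non-SIMD platforms.
    for (; i < len; ++i) {
        if (buf[i] == needle) return i;
    }
    return len;
}
```

Storing to an array costs a round trip through memory for every 16-byte chunk. A reduction such as vmaxvq_u8 could first check whether any lane matched before spilling, but that is a design option rather than something this PR states it does.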

Build system

  • Dockerfile: Added -march=native for automatic SIMD enablement
  • .gitignore: Exclude build artifacts

Example transformation in matrix multiplication:

#if USE_X86_SIMD
    __m128d sum_vec = _mm_setzero_pd();
    __m128d a_vec = _mm_loadu_pd(&data[i][k]);
    __m128d b_vec = _mm_set_pd(other.data[k+1][j], other.data[k][j]);
    sum_vec = _mm_add_pd(sum_vec, _mm_mul_pd(a_vec, b_vec));
#elif USE_ARM_NEON
    float64x2_t sum_vec = vdupq_n_f64(0.0);
    float64x2_t a_vec = vld1q_f64(&data[i][k]);
    double b_arr[2] = {other.data[k][j], other.data[k+1][j]};
    float64x2_t b_vec = vld1q_f64(b_arr);
    sum_vec = vmlaq_f64(sum_vec, a_vec, b_vec);  // multiply-accumulate; vfmaq_f64 is the fused variant
#endif

All modules retain a scalar fallback for non-SIMD platforms.
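The excerpt above shows only the accumulation step. As a rough sketch of how the three paths (SSE2, NEON, scalar) can sit together, including the horizontal add that reduces the two-lane accumulator to one double, consider the dot product below. It assumes both inputs are contiguous arrays, which differs from the column access in the excerpt, and the function name dot is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__aarch64__) && defined(__ARM_NEON)
  #include <arm_neon.h>
  #define USE_ARM_NEON 1
  #define USE_X86_SIMD 0
#elif defined(__x86_64__) || defined(_M_X64)
  #include <emmintrin.h>   // SSE2
  #define USE_ARM_NEON 0
  #define USE_X86_SIMD 1
#else
  #define USE_ARM_NEON 0
  #define USE_X86_SIMD 0
#endif

// Hypothetical helper: dot product of two contiguous arrays of n doubles.
inline double dot(const double* a, const double* b, std::size_t n) {
    assert((a != nullptr && b != nullptr) || n == 0);
    std::size_t k = 0;
    double sum = 0.0;
#if USE_X86_SIMD
    __m128d acc = _mm_setzero_pd();
    for (; k + 2 <= n; k += 2) {
        acc = _mm_add_pd(acc, _mm_mul_pd(_mm_loadu_pd(a + k), _mm_loadu_pd(b + k)));
    }
    // Horizontal add: store both lanes and sum them.
    double lanes[2];
    _mm_storeu_pd(lanes, acc);
    sum = lanes[0] + lanes[1];
#elif USE_ARM_NEON
    float64x2_t acc = vdupq_n_f64(0.0);
    for (; k + 2 <= n; k += 2) {
        acc = vmlaq_f64(acc, vld1q_f64(a + k), vld1q_f64(b + k));
    }
    // Horizontal add via lane extraction (lane indices are constants).
    sum = vgetq_lane_f64(acc, 0) + vgetq_lane_f64(acc, 1);
#endif
    // Scalar tail for odd n, and the full loop on non-SIMD platforms.
    for (; k < n; ++k) {
        sum += a[k] * b[k];
    }
    return sum;
}
```

Because the SIMD paths sum even and odd elements in separate lanes, results can differ from a strictly sequential scalar loop in the last bits for general floating-point inputs; benchmarks that compare outputs across architectures may want a tolerance.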

Original prompt

This section details the original issue you should resolve

<issue_title>Migrate this repo to Arm</issue_title>
<issue_description>Please use any MCP tools that you need.</issue_description>

Comments on the Issue (you are @copilot in this section)

Note

Custom agent used: arm-migration-agent
Scan a project and migrate to ARM architecture



Copilot AI and others added 5 commits October 28, 2025 20:30
Co-authored-by: JoeStech <4088382+JoeStech@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Migrate this repository to Arm architecture" to "Add ARM64 NEON support to x86 SSE2 compute benchmarks" Oct 28, 2025
Copilot AI requested a review from JoeStech October 28, 2025 20:43
Development

Successfully merging this pull request may close these issues.

Migrate this repo to Arm
