3 changes: 2 additions & 1 deletion assets/contributors.csv
@@ -100,4 +100,5 @@ Ann Cheng,Arm,anncheng-arm,hello-ann,,
Fidel Makatia Omusilibwa,,,,,
Ker Liu,,,,,
Rui Chang,,,,,

Alejandro Martinez Vicente,Arm,,,,
Mohamad Najem,Arm,,,,
79 changes: 79 additions & 0 deletions content/learning-paths/cross-platform/simd-loops/1-about.md
@@ -0,0 +1,79 @@
---
title: About SIMD Loops
weight: 3

### FIXED, DO NOT MODIFY
layout: learningpathall
---

Writing high-performance software for Arm processors often involves delving into
their SIMD technologies. For many developers, that journey started with Neon --- a
familiar, fixed-width vector extension that has been around for years. But as
Arm architectures continue to evolve, so do their SIMD extensions.

Enter the world of SVE and SME: two powerful, scalable vector extensions designed for modern
workloads. Unlike Neon, they aren’t just wider --- they’re different. These
extensions introduce new instructions, more flexible programming models, and
support for concepts like predication, scalable vectors, and streaming modes.
However, they also come with a learning curve.

That’s where [SIMD Loops](https://gitlab.arm.com/architecture/simd-loops) comes
in.

[SIMD Loops](https://gitlab.arm.com/architecture/simd-loops) is designed to help
you learn how to write SVE and SME code. It is a collection of self-contained,
real-world loop kernels --- written in a mix of C, ACLE intrinsics, and inline
assembly --- that target everything from simple arithmetic to matrix
multiplication, sorting, and string processing. You can compile them, run them,
step through them, and use them as a foundation for your own SIMD work.

If you’re familiar with Neon intrinsics and would like to explore what SVE and
SME have to offer, the [SIMD
Loops](https://gitlab.arm.com/architecture/simd-loops) project is for you!

## What is SIMD Loops?

[SIMD Loops](https://gitlab.arm.com/architecture/simd-loops) is an open-source
project built to help you learn how to write SIMD code for modern Arm
architectures --- specifically using SVE (Scalable Vector Extension) and SME
(Scalable Matrix Extension). It is designed for programmers who already know
their way around Neon intrinsics but are now facing the more powerful --- and
more complex --- world of SVE and SME.

The goal of SIMD Loops is to provide working, readable examples that demonstrate
how to use the full range of features available in SVE, SVE2, and SME2. Each
example is a self-contained loop kernel --- a small piece of code that performs
a specific task like matrix multiplication, vector reduction, histogram or
memory copy --- and shows how that task can be implemented across different
vector instruction sets.
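
As a flavor of what such a kernel looks like, here is a minimal reference C
implementation of a vector reduction, written in the spirit of the project's
loop kernels. The structure and function names below follow the `loop_<NNN>`
naming convention described later in this Learning Path, but they are
illustrative placeholders rather than code taken from the repository.

```C
#include <stddef.h>
#include <stdint.h>

// Hypothetical data structure for a reduction kernel (illustrative only).
struct loop_example_data {
    size_t n;              // number of elements
    const int32_t *input;  // input buffer
    int64_t result;        // reduction result
};

// Reference C implementation: sum all elements of the input buffer.
// In SIMD Loops, the same kernel is then rewritten explicitly with
// Neon, SVE, and SME instructions or intrinsics.
void inner_loop_example(struct loop_example_data *data) {
    int64_t acc = 0;
    for (size_t i = 0; i < data->n; i++) {
        acc += data->input[i];
    }
    data->result = acc;
}
```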

Unlike a cookbook that tries to provide a recipe for every problem, SIMD Loops
takes the opposite approach: it aims to showcase the architecture, not the
problem. The loop kernels are chosen to be realistic and meaningful, but the
main goal is to demonstrate how specific features and instructions work in
practice. If you’re trying to understand scalability, predication,
gather/scatter, streaming mode, ZA storage, compact instructions, or the
mechanics of matrix tiles --- this is where you’ll see them in action.
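
To make one of those concepts concrete, the sketch below shows how the
reduction kernel above could be written with SVE ACLE intrinsics. It is a
hypothetical example, not code from the project, but it illustrates two of the
ideas mentioned: the loop is vector-length agnostic (nothing in it assumes a
particular vector width), and the final partial vector is handled by a
predicate rather than a scalar tail loop.

```C
#include <arm_sve.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical SVE version of a 32-bit integer reduction.
// Build with SVE enabled, for example -march=armv8-a+sve.
int64_t reduce_i32_sve(const int32_t *input, size_t n) {
    int64_t acc = 0;
    // svcntw() returns the number of 32-bit lanes per vector, so the loop
    // adapts to whatever vector length the hardware implements (scalability).
    for (size_t i = 0; i < n; i += svcntw()) {
        // The while-less-than predicate disables the lanes that fall past
        // the end of the buffer, so no scalar epilogue is needed (predication).
        svbool_t pg = svwhilelt_b32((uint64_t)i, (uint64_t)n);
        svint32_t va = svld1_s32(pg, &input[i]);
        acc += svaddv_s32(pg, va); // horizontal add of the active lanes only
    }
    return acc;
}
```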

The project includes:
- Dozens of numbered loop kernels, each focused on a specific feature or pattern
- Reference C implementations to establish expected behavior
- Inline assembly and/or intrinsics for scalar, Neon, SVE, SVE2, SVE2.1, SME2, and SME2.1
- Build support for different instruction sets, with runtime validation
- A simple command-line runner to execute any loop interactively
- Optional standalone binaries for bare-metal and simulator use

You don’t need to worry about auto-vectorization, compiler flags, or tooling
quirks. Each loop is hand-written and annotated to make the use of SIMD features
clear. The intent is that you can study, modify, and run each loop as a learning
exercise --- and use the project as a foundation for your own exploration of
Arm’s vector extensions.

## Where to get it?

[SIMD Loops](https://gitlab.arm.com/architecture/simd-loops) is available as
open-source code under the BSD 3-Clause license. You can access the source code
from the following GitLab project:
https://gitlab.arm.com/architecture/simd-loops

78 changes: 78 additions & 0 deletions content/learning-paths/cross-platform/simd-loops/2-using.md
@@ -0,0 +1,78 @@
---
title: Using SIMD Loops
weight: 4

### FIXED, DO NOT MODIFY
layout: learningpathall
---

First, clone [SIMD Loops](https://gitlab.arm.com/architecture/simd-loops) and
change into the cloned directory:

```BASH
git clone https://gitlab.arm.com/architecture/simd-loops simd-loops.git
cd simd-loops.git
```

## SIMD Loops structure

In the [SIMD Loops](https://gitlab.arm.com/architecture/simd-loops) project, the
source code for the loops is organized under the `loops` directory. The complete
list of loops is documented in the `loops.inc` file, which includes a brief
description and the purpose of each loop. Every loop is associated with a
uniquely named source file following the naming pattern `loop_<NNN>.c`, where
`<NNN>` represents the loop number.

A loop is structured as follows:

```C
// Includes and loop_<NNN>_data structure definition

#if defined(HAVE_NATIVE) || defined(HAVE_AUTOVEC)

// C code
void inner_loop_<NNN>(struct loop_<NNN>_data *data) { ... }

#elif defined(HAVE_xxx_INTRINSICS)

// Intrinsics version: xxx = SME, SVE, or SIMD (Neon)
void inner_loop_<NNN>(struct loop_<NNN>_data *data) { ... }

#elif defined(<ASM_COND>)

// Hand-written inline assembly:
// <ASM_COND> = __ARM_FEATURE_SME2p1, __ARM_FEATURE_SME2, __ARM_FEATURE_SVE2p1,
// __ARM_FEATURE_SVE2, __ARM_FEATURE_SVE, or __ARM_NEON
void inner_loop_<NNN>(struct loop_<NNN>_data *data) { ... }

#else

#error "No implementations available for this target."

#endif

// Main of the loop: buffer allocation, loop function call, functional checking of results
```

Each loop is implemented in several SIMD extension variants, and conditional
compilation is used to select one of the optimisations for the
`inner_loop_<NNN>` function. The native C implementation comes first, and it is
used either when building the plain native variant (`HAVE_NATIVE`) or when
relying on compiler auto-vectorization (`HAVE_AUTOVEC`). When the SIMD ACLE is
supported (for example, SME, SVE, or Neon), the code is compiled using
high-level intrinsics. If ACLE support is not available, the build process
falls back to handwritten inline assembly targeting one of the available SIMD
extensions, such as SME2.1, SME2, SVE2.1, SVE2, and others. The overall code
structure also includes setup and cleanup code in the main function, where
memory buffers are allocated, the selected loop kernel is executed, and results
are verified for correctness.
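
As a rough sketch of that harness structure (with placeholder names and a
trivially checkable kernel, not the project's actual code), the main function
of a loop looks conceptually like this:

```C
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Placeholder data structure standing in for a loop_<NNN>_data definition.
struct loop_example_data {
    size_t n;
    int32_t *src;
    int32_t *dst;
};

// One of the conditionally compiled implementations would be selected here;
// a plain memory copy keeps this sketch simple.
void inner_loop_example(struct loop_example_data *data) {
    memcpy(data->dst, data->src, data->n * sizeof(int32_t));
}

int main(void) {
    struct loop_example_data data = { .n = 1024, .src = NULL, .dst = NULL };

    // Buffer allocation and initialization.
    data.src = malloc(data.n * sizeof(int32_t));
    data.dst = calloc(data.n, sizeof(int32_t));
    for (size_t i = 0; i < data.n; i++) {
        data.src[i] = (int32_t)i;
    }

    // Run the selected loop kernel.
    inner_loop_example(&data);

    // Functional check of the results against the expected behavior.
    int ok = memcmp(data.src, data.dst, data.n * sizeof(int32_t)) == 0;
    printf("loop example: %s\n", ok ? "PASS" : "FAIL");

    free(data.src);
    free(data.dst);
    return ok ? EXIT_SUCCESS : EXIT_FAILURE;
}
```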

At compile time, you can select which variant of each loop to build, whether it
is based on SME or SVE intrinsics or on one of the available inline assembly
implementations, by passing the corresponding targets to make
(`make scalar neon sve2 sme2 sve2p1 sme2p1 sve_intrinsics sme_intrinsics`, and so on).

The build produces two types of binaries. The first is a single executable
named `simd_loops`, which includes all the loop implementations. A specific
loop can be selected by passing parameters to the program (for example,
`simd_loops -k <NNN> -n <iterations>`). The second type consists of individual
standalone binaries, each corresponding to a specific loop.