Add a new learning path on SIMD Loops. #2281
Merged
jasonrandrews
merged 2 commits into
ArmDeveloperEcosystem:main
from
Arnaud-de-Grandmaison-ARM:simd-loops
Sep 10, 2025
79 changes: 79 additions & 0 deletions
content/learning-paths/cross-platform/simd-loops/1-about.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,79 @@ | ||
| --- | ||
| title: About SIMD Loops | ||
| weight: 3 | ||
|
|
||
| ### FIXED, DO NOT MODIFY | ||
| layout: learningpathall | ||
| --- | ||
|
|
||
| Writing high-performance software for Arm processors often involves delving into | ||
| their SIMD technologies. For many developers, that journey started with Neon --- a | ||
| familiar, fixed-width vector extension that has been around for years. But as | ||
| Arm architectures continue to evolve, so do their SIMD technologies. | ||
|
|
||
| Enter the world of SVE and SME: two powerful, scalable vector extensions designed for modern | ||
| workloads. Unlike Neon, they aren’t just wider --- they’re different. These | ||
| extensions introduce new instructions, more flexible programming models, and | ||
| support for concepts like predication, scalable vectors, and streaming modes. | ||
| However, they also come with a learning curve. | ||
|
|
||
| That’s where [SIMD Loops](https://gitlab.arm.com/architecture/simd-loops) comes | ||
| in. | ||
|
|
||
| [SIMD Loops](https://gitlab.arm.com/architecture/simd-loops) is designed to help | ||
| you learn how to write SVE and SME code. It is a collection | ||
| of self-contained, real-world loop kernels --- written in a mix of C, ACLE | ||
| intrinsics, and inline assembly --- that target everything from simple arithmetic | ||
| to matrix multiplication, sorting, and string processing. You can compile them, | ||
| run them, step through them, and use them as a foundation for your own SIMD | ||
| work. | ||
|
|
||
| If you’re familiar with Neon intrinsics and would like to explore what SVE and | ||
| SME have to offer, the [SIMD | ||
| Loops](https://gitlab.arm.com/architecture/simd-loops) project is for you! | ||
|
|
||
| ## What is SIMD Loops? | ||
|
|
||
| [SIMD Loops](https://gitlab.arm.com/architecture/simd-loops) is an open-source | ||
| project built to help you learn how to write SIMD code for modern Arm | ||
| architectures --- specifically using SVE (Scalable Vector Extension) and SME | ||
| (Scalable Matrix Extension). It is designed for programmers who already know | ||
| their way around Neon intrinsics but are now facing the more powerful --- and | ||
| more complex --- world of SVE and SME. | ||
|
|
||
| The goal of SIMD Loops is to provide working, readable examples that demonstrate | ||
| how to use the full range of features available in SVE, SVE2, and SME2. Each | ||
| example is a self-contained loop kernel --- a small piece of code that performs | ||
| a specific task like matrix multiplication, vector reduction, histogram, or | ||
| memory copy --- and shows how that task can be implemented across different | ||
| vector instruction sets. | ||
|
|
||
| Unlike a cookbook that tries to provide a recipe for every problem, SIMD Loops | ||
| takes the opposite approach: it aims to showcase the architecture, not the | ||
| problem. The loop kernels are chosen to be realistic and meaningful, but the | ||
| main goal is to demonstrate how specific features and instructions work in | ||
| practice. If you’re trying to understand scalability, predication, | ||
| gather/scatter, streaming mode, ZA storage, compact instructions, or the | ||
| mechanics of matrix tiles --- this is where you’ll see them in action. | ||
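To make two of these terms concrete, here is a minimal sketch of a predicated, vector-length-agnostic loop. It is not taken from SIMD Loops: the function names `add_f32` and `check_add_f32` are hypothetical, and the SVE path is only compiled when the compiler defines `__ARM_FEATURE_SVE`. On any other target the plain C fallback is used, so the sketch builds everywhere.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

// Hypothetical kernel (not from SIMD Loops): c[i] = a[i] + b[i] for i < n.
void add_f32(const float *a, const float *b, float *c, uint64_t n) {
#if defined(__ARM_FEATURE_SVE)
    // Scalability: svcntw() is the number of 32-bit lanes per vector,
    // which is only known at run time. The loop never hard-codes a width.
    for (uint64_t i = 0; i < n; i += svcntw()) {
        // Predication: only lanes whose element index is below n are active,
        // so the final partial vector needs no scalar tail loop.
        svbool_t pg = svwhilelt_b32(i, n);
        svfloat32_t va = svld1_f32(pg, a + i);
        svfloat32_t vb = svld1_f32(pg, b + i);
        svst1_f32(pg, c + i, svadd_f32_x(pg, va, vb));
    }
#else
    // Plain C fallback for targets without SVE.
    for (uint64_t i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
#endif
}

// Self-check with 7 elements. The lane count of an SVE vector is always a
// multiple of 4 for 32-bit data, so 7 forces a partially active predicate.
int check_add_f32(void) {
    float a[7], b[7], c[7];
    for (int i = 0; i < 7; i++) {
        a[i] = (float)i;
        b[i] = 10.0f;
        c[i] = 0.0f;
    }
    add_f32(a, b, c, 7);
    for (int i = 0; i < 7; i++) {
        assert(c[i] == (float)i + 10.0f);
    }
    return 0;
}
```

The same source runs unchanged on 128-bit and wider SVE implementations, which is the property the loops in the project are meant to illustrate with more realistic kernels.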
|
|
||
| The project includes: | ||
| - Dozens of numbered loop kernels, each focused on a specific feature or pattern | ||
| - Reference C implementations to establish expected behavior | ||
| - Inline assembly and/or intrinsics for scalar, Neon, SVE, SVE2, SVE2.1, SME2 and SME2.1 | ||
| - Build support for different instruction sets, with runtime validation | ||
| - A simple command-line runner to execute any loop interactively | ||
| - Optional standalone binaries for bare-metal and simulator use | ||
|
|
||
| You don’t need to worry about auto-vectorization, compiler flags, or tooling | ||
| quirks. Each loop is hand-written and annotated to make the use of SIMD features | ||
| clear. The intent is that you can study, modify, and run each loop as a learning | ||
| exercise --- and use the project as a foundation for your own exploration of | ||
| Arm’s vector extensions. | ||
|
|
||
| ## Where to get it? | ||
|
|
||
| [SIMD Loops](https://gitlab.arm.com/architecture/simd-loops) is available as | ||
| open-source code licensed under the BSD 3-Clause license. You can access the source code | ||
| from the following GitLab project: | ||
| https://gitlab.arm.com/architecture/simd-loops | ||
|
|
78 changes: 78 additions & 0 deletions
content/learning-paths/cross-platform/simd-loops/2-using.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,78 @@ | ||
| --- | ||
| title: Using SIMD Loops | ||
| weight: 4 | ||
|
|
||
| ### FIXED, DO NOT MODIFY | ||
| layout: learningpathall | ||
| --- | ||
|
|
||
| First, clone [SIMD Loops](https://gitlab.arm.com/architecture/simd-loops) and | ||
| change into its directory: | ||
|
|
||
| ```BASH | ||
| git clone https://gitlab.arm.com/architecture/simd-loops simd-loops.git | ||
| cd simd-loops.git | ||
| ``` | ||
|
|
||
| ## SIMD Loops structure | ||
|
|
||
| In the [SIMD Loops](https://gitlab.arm.com/architecture/simd-loops) project, the | ||
| source code for the loops is organized under the `loops` directory. The complete | ||
| list of loops is documented in the `loops.inc` file, which includes a brief | ||
| description and the purpose of each loop. Every loop is associated with a | ||
| uniquely named source file following the naming pattern `loop_<NNN>.c`, where | ||
| `<NNN>` represents the loop number. | ||
|
|
||
| A loop is structured as follows: | ||
|
|
||
| ```C | ||
| // Includes and loop_<NNN>_data structure definition | ||
|
|
||
| #if defined(HAVE_NATIVE) || defined(HAVE_AUTOVEC) | ||
|
|
||
| // C code | ||
| void inner_loop_<NNN>(struct loop_<NNN>_data *data) { ... } | ||
|
|
||
| #elif defined(HAVE_xxx_INTRINSICS) | ||
|
|
||
| // Intrinsics versions: xxx = SME, SVE, or SIMD (Neon) versions | ||
| void inner_loop_<NNN>(struct loop_<NNN>_data *data) { ... } | ||
|
|
||
| #elif defined(<ASM_COND>) | ||
|
|
||
| // Hand-written inline assembly: | ||
| // <ASM_COND> = __ARM_FEATURE_SME2p1, __ARM_FEATURE_SME2, __ARM_FEATURE_SVE2p1, | ||
| // __ARM_FEATURE_SVE2, __ARM_FEATURE_SVE, or __ARM_NEON | ||
| void inner_loop_<NNN>(struct loop_<NNN>_data *data) { ... } | ||
|
|
||
| #else | ||
|
|
||
| #error "No implementations available for this target." | ||
|
|
||
| #endif | ||
|
|
||
| // Main of loop: buffer allocation, loop function call, functional checking of results | ||
| ``` | ||
|
|
||
| Each loop is implemented in several SIMD extension variants, and conditional | ||
| compilation selects one of these implementations for the | ||
| `inner_loop_<NNN>` function. The plain C implementation comes first; it is | ||
| compiled either for a native build (`HAVE_NATIVE`) or with | ||
| compiler auto-vectorization (`HAVE_AUTOVEC`). When ACLE intrinsics are selected for | ||
| an extension (SME, SVE, or Neon), the intrinsics version is compiled. If ACLE | ||
| support is not available, the build process falls back to handwritten inline | ||
| assembly targeting one of the available SIMD extensions, such as SME2.1, SME2, | ||
| SVE2.1, SVE2, and others. Each source file also includes setup and | ||
| cleanup code in its main function, where memory buffers are allocated, the | ||
| selected loop kernel is executed, and results are verified for correctness. | ||
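The selection scheme described above can be sketched as a small, compilable example. Everything here is illustrative: `loop_000`, its data structure, and `run_loop_000` are hypothetical and not part of SIMD Loops. The sketch also assumes that the `HAVE_xxx_INTRINSICS` pattern expands to `HAVE_SVE_INTRINSICS` and `HAVE_SIMD_INTRINSICS`, and it defaults to the plain C variant when no macro is set, which the real build system may not do.

```c
#include <assert.h>
#include <stdint.h>

// Assumption for this sketch only: fall back to plain C when the build
// did not select any variant, so the file compiles without extra flags.
#if !defined(HAVE_NATIVE) && !defined(HAVE_AUTOVEC) && \
    !defined(HAVE_SVE_INTRINSICS) && !defined(HAVE_SIMD_INTRINSICS)
#define HAVE_NATIVE 1
#endif

// Hypothetical data structure: sum n unsigned 32-bit values.
struct loop_000_data {
    const uint32_t *in;
    uint64_t n;
    uint64_t sum;
};

#if defined(HAVE_NATIVE) || defined(HAVE_AUTOVEC)

// Plain C version.
void inner_loop_000(struct loop_000_data *data) {
    uint64_t sum = 0;
    for (uint64_t i = 0; i < data->n; i++) {
        sum += data->in[i];
    }
    data->sum = sum;
}

#elif defined(HAVE_SVE_INTRINSICS)

#include <arm_sve.h>

// SVE intrinsics version: predicated loads and a horizontal add.
void inner_loop_000(struct loop_000_data *data) {
    uint64_t sum = 0;
    for (uint64_t i = 0; i < data->n; i += svcntw()) {
        svbool_t pg = svwhilelt_b32(i, data->n);
        sum += svaddv_u32(pg, svld1_u32(pg, data->in + i));
    }
    data->sum = sum;
}

#elif defined(HAVE_SIMD_INTRINSICS)

#include <arm_neon.h>

// Neon intrinsics version (AArch64): fixed 4-lane vectors plus a scalar tail.
void inner_loop_000(struct loop_000_data *data) {
    uint64_t sum = 0;
    uint64_t i = 0;
    for (; i + 4 <= data->n; i += 4) {
        sum += vaddlvq_u32(vld1q_u32(data->in + i));
    }
    for (; i < data->n; i++) {
        sum += data->in[i];
    }
    data->sum = sum;
}

#else
#error "No implementations available for this target."
#endif

// Stand-in for the "main of loop" part: set up a buffer, call the kernel,
// and check the result. Returns 0 on success.
int run_loop_000(void) {
    uint32_t buf[10];
    for (uint32_t i = 0; i < 10; i++) {
        buf[i] = i + 1; // values 1..10
    }
    struct loop_000_data data = { buf, 10, 0 };
    inner_loop_000(&data);
    assert(data.sum == 55);
    return data.sum == 55 ? 0 : 1;
}
```

Because every variant defines the same `inner_loop_000` symbol, the driver code does not change when a different variant is selected; only the preprocessor macros passed at build time differ.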
|
|
||
| At compile time, you can select which loop variant to build: one | ||
| based on SME or SVE intrinsics, or one of the available inline assembly | ||
| variants (`make scalar neon sve2 sme2 sve2p1 sme2p1 sve_intrinsics | ||
| sme_intrinsics` ...). | ||
|
|
||
| The build generates two types of binaries. The first is a | ||
| single executable named `simd_loops`, which includes all the loop | ||
| implementations. A specific loop can be selected by passing parameters to the | ||
| program (e.g., `simd_loops -k <NNN> -n <iterations>`). The second type consists | ||
| of individual standalone binaries, each corresponding to a specific loop. |
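How a single executable can dispatch to one loop by number can be sketched as follows. This is not the actual `simd_loops` runner, whose source is not shown here: the option parsing, the `run_options` and `loop_entry` types, and the two stand-in kernels are all hypothetical. Only the `-k <NNN>` and `-n <iterations>` flags come from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical options matching `simd_loops -k <NNN> -n <iterations>`.
struct run_options {
    long kernel;     // loop number given with -k
    long iterations; // repeat count given with -n (defaults to 1)
};

// Parse "-k <NNN>" and "-n <iterations>". Returns 0 on success, -1 on
// malformed input or when no loop number was given.
int parse_options(int argc, char **argv, struct run_options *opts) {
    assert(opts != NULL);
    opts->kernel = -1;
    opts->iterations = 1;
    for (int i = 1; i < argc; i++) {
        if (i + 1 >= argc) {
            return -1; // every flag takes a value
        }
        if (strcmp(argv[i], "-k") == 0) {
            opts->kernel = strtol(argv[++i], NULL, 10);
        } else if (strcmp(argv[i], "-n") == 0) {
            opts->iterations = strtol(argv[++i], NULL, 10);
        } else {
            return -1; // unknown flag
        }
    }
    return opts->kernel >= 0 ? 0 : -1;
}

// Stand-in kernels: each returns a value derived from the iteration count
// so the dispatch can be checked. Real kernels would run the loop instead.
static long loop_001(long iterations) { return iterations; }
static long loop_002(long iterations) { return 2 * iterations; }

// Table mapping loop numbers to entry points.
struct loop_entry {
    long number;
    long (*run)(long iterations);
};

static const struct loop_entry loops[] = {
    { 1, loop_001 },
    { 2, loop_002 },
};

// Run the selected loop. Returns its result, or -1 for an unknown number.
long run_selected(const struct run_options *opts) {
    for (size_t i = 0; i < sizeof(loops) / sizeof(loops[0]); i++) {
        if (loops[i].number == opts->kernel) {
            return loops[i].run(opts->iterations);
        }
    }
    return -1;
}
```

A `main` function would call `parse_options(argc, argv, &opts)` and then `run_selected(&opts)`. A table keyed by loop number keeps the runner independent of how many loops are compiled in, which suits a project where each loop lives in its own `loop_<NNN>.c` file.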