Merged

59 commits
ffd00d9
Deploy SqueezeNet 1.0 INT8 model with ONNX Runtime on Azure Cobalt 100
odidev Jul 10, 2025
185bcc8
Deploy MySQL on the Microsoft Azure Cobalt 100 processors
odidev Aug 27, 2025
8ac8a26
Content dev
madeline-underwood Sep 28, 2025
e5e3083
Edited
madeline-underwood Sep 28, 2025
ea97167
Updates
madeline-underwood Sep 29, 2025
d20c7fb
Update kafka_cluster.md
madeline-underwood Sep 29, 2025
1a7124a
Content dev
madeline-underwood Sep 29, 2025
699e5d7
Removed draft status
madeline-underwood Sep 29, 2025
4b70400
Tweaks
madeline-underwood Sep 29, 2025
097c381
Tweaks
madeline-underwood Sep 29, 2025
fd81a7c
Tweaks
madeline-underwood Sep 29, 2025
2d34a28
Corrected ms blog title ref
madeline-underwood Sep 29, 2025
ad2c1bd
Tweaks
madeline-underwood Sep 29, 2025
80f1287
Merge pull request #2362 from madeline-underwood/patch-38
jasonrandrews Sep 29, 2025
2dca83c
Update _index.md
pareenaverma Sep 29, 2025
564c300
Update container CLI for macOS install guide
jasonrandrews Sep 29, 2025
03ed0e5
Merge pull request #2364 from jasonrandrews/review
jasonrandrews Sep 29, 2025
d580eaa
Merge pull request #2363 from madeline-underwood/nginx
jasonrandrews Sep 29, 2025
ff943be
Removed bold in bulleted list
madeline-underwood Sep 29, 2025
e50de2d
Updates
madeline-underwood Sep 29, 2025
4923367
Removing floating sentence.
madeline-underwood Sep 29, 2025
db9c11b
Added missing subheading
madeline-underwood Sep 29, 2025
24f960d
Updates
madeline-underwood Sep 29, 2025
538698f
Updates
madeline-underwood Sep 29, 2025
c7d8b94
Updates
madeline-underwood Sep 29, 2025
f80dcd6
Updates
madeline-underwood Sep 29, 2025
3175751
Merge pull request #2139 from odidev/onnx_LP
pareenaverma Sep 29, 2025
0182bea
Merge branch 'ArmDeveloperEcosystem:main' into training_inference
madeline-underwood Sep 29, 2025
3b6fc26
Added trailing slash
madeline-underwood Sep 29, 2025
4755031
Merge branch 'training_inference' of https://github.com/madeline-unde…
madeline-underwood Sep 29, 2025
07381b7
Adding image label and referencing credits
madeline-underwood Sep 29, 2025
e05343a
Removed image for copyright reasons
madeline-underwood Sep 30, 2025
10b5697
Corrected heading size
madeline-underwood Sep 30, 2025
9c56ef3
Starting content dev
madeline-underwood Sep 30, 2025
de76857
Updates
madeline-underwood Sep 30, 2025
71bf055
Fixing note formatting
madeline-underwood Sep 30, 2025
e143764
Refactor Go installation instructions for clarity and completeness
madeline-underwood Sep 30, 2025
9d25c74
Refactor section headings and improve clarity in Golang installation …
madeline-underwood Sep 30, 2025
fae46f9
Standardize title casing and improve section headings for clarity in …
madeline-underwood Sep 30, 2025
eabbd30
Enhance documentation clarity by adding descriptions and improving se…
madeline-underwood Sep 30, 2025
c9f2a5e
Add description and enhance formatting in background.md for Azure Cob…
madeline-underwood Sep 30, 2025
51b4722
Enhance documentation for Golang baseline testing and deployment on A…
madeline-underwood Sep 30, 2025
28d411a
Update _index.md
pareenaverma Sep 30, 2025
d6767a0
Merge pull request #2260 from odidev/mysql
pareenaverma Sep 30, 2025
265ef29
Enhance documentation for Golang on Azure Cobalt 100. Update titles, …
madeline-underwood Sep 30, 2025
8fb9164
Update floating-point Learning Path based on feedback.
jasonrandrews Sep 30, 2025
dc04c2c
Merge pull request #2370 from jasonrandrews/review
jasonrandrews Sep 30, 2025
17b7571
Add description to _index.md for clarity on deploying Golang on Azure…
madeline-underwood Sep 30, 2025
6b84c89
Merge pull request #2366 from madeline-underwood/tiny_ml_updates
jasonrandrews Sep 30, 2025
1cdc34f
Merge pull request #2367 from madeline-underwood/training_inference
jasonrandrews Sep 30, 2025
5d9d341
Enhance documentation for Golang on Azure Cobalt 100 by adding descri…
madeline-underwood Sep 30, 2025
44f59f6
Enhance benchmarking documentation for Golang by updating titles and …
madeline-underwood Sep 30, 2025
2e8ab13
Merge pull request #2371 from madeline-underwood/golang
jasonrandrews Sep 30, 2025
b11ba7c
Remove draft status from the SIMD migration learning path
jasonrandrews Sep 30, 2025
3099c23
Merge pull request #2372 from jasonrandrews/review2
jasonrandrews Sep 30, 2025
7aa2428
First review of llama.cpp with Streamline Learning Path
jasonrandrews Sep 30, 2025
41ba375
Merge pull request #2373 from jasonrandrews/review
jasonrandrews Sep 30, 2025
defcb1c
Update spelling word list and update tags
jasonrandrews Sep 30, 2025
fe17602
Merge pull request #2374 from jasonrandrews/review
jasonrandrews Sep 30, 2025
13 changes: 12 additions & 1 deletion .wordlist.txt
@@ -4949,4 +4949,15 @@ uop
walkthrough
warmups
xo
yi
yi
AMX
AlexNet
FMAC
MySql
MyStrongPassword
RDBMS
SqueezeNet
TIdentify
goroutines
mysqlslap
squeezenet
8 changes: 4 additions & 4 deletions content/install-guides/container.md
@@ -46,7 +46,7 @@ sw_vers -productVersion
Example output:

```output
15.5
15.6.1
```

You must be running macOS 15.0 or later to use the Container CLI.
@@ -60,13 +60,13 @@ Go to the [GitHub Releases page](https://github.com/apple/container/releases) an
For example:

```bash
wget https://github.com/apple/container/releases/download/0.2.0/container-0.2.0-installer-signed.pkg
wget https://github.com/apple/container/releases/download/0.4.1/container-0.4.1-installer-signed.pkg
```

Install the package:

```bash
sudo installer -pkg container-0.2.0-installer-signed.pkg -target /
sudo installer -pkg container-0.4.1-installer-signed.pkg -target /
```

This installs the Container binary at `/usr/local/bin/container`.
@@ -90,7 +90,7 @@ container --version
Example output:

```output
container CLI version 0.2.0
container CLI version 0.4.1 (build: release, commit: 4ac18b5)
```

## Build and run a container
@@ -3,22 +3,23 @@ title: Understand floating-point behavior across x86 and Arm architectures

minutes_to_complete: 30

who_is_this_for: This is an introductory topic for developers who are porting applications from x86 to Arm and want to understand floating-point behavior across these architectures. Both architectures provide reliable and consistent floating-point computation following the IEEE 754 standard.
who_is_this_for: This is a topic for developers who are porting applications from x86 to Arm and want to understand floating-point behavior across these architectures. Both architectures provide reliable and consistent floating-point computation following the IEEE 754 standard.

learning_objectives:
- Understand that Arm and x86 produce identical results for all well-defined floating-point operations.
- Recognize that differences only occur in special undefined cases permitted by IEEE 754.
- Learn best practices for writing portable floating-point code across architectures.
- Apply appropriate precision levels for portable results.
- Learn to recognize floating-point differences and make your code portable across architectures.

prerequisites:
- Access to an x86 and an Arm Linux machine.
- Familiarity with floating-point numbers.

author: Kieran Hejmadi
author:
- Kieran Hejmadi
- Jason Andrews

### Tags
skilllevels: Introductory
skilllevels: Advanced
subjects: Performance and Architecture
armips:
- Cortex-A
@@ -1,26 +1,26 @@
---
title: Single and double precision considerations
title: Precision and floating-point instruction considerations
weight: 4

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Understanding numerical precision differences in single vs double precision
When moving from x86 to Arm you may see differences in floating-point behavior. Understanding these differences may require digging into the details, including the precision used and the floating-point instructions the compiler generates.

This section explores how different levels of floating-point precision can affect numerical results. The differences shown here are not architecture-specific issues, but demonstrate the importance of choosing appropriate precision levels for numerical computations.
This section explores an example that produces minor differences in floating-point results, focusing on Fused Multiply-Add (FMAC) operations. You can run the example to see how the same C code can produce different results on different platforms.

### Single precision limitations
## Single precision and FMAC differences

Consider two mathematically equivalent functions, `f1()` and `f2()`. While they should theoretically produce the same result, small differences can arise due to the limited precision of floating-point arithmetic.
Consider two mathematically equivalent functions, `f1()` and `f2()`. While they should theoretically produce the same result, small differences can arise due to the limited precision of floating-point arithmetic and the instructions used.

The differences shown in this example are due to using single precision (float) arithmetic, not due to architectural differences between Arm and x86. Both architectures handle single precision arithmetic according to IEEE 754.
When these small differences are amplified, you can observe how the Arm and x86 architectures handle floating-point operations differently, particularly with respect to Fused Multiply-Add (FMAC) operations. In this example, the Clang compiler on Arm uses FMAC instructions by default, which can lead to slightly different results than on x86, where FMAC instructions are not used by default.
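
Before looking at the full example, here is a minimal sketch of why a single rounding step matters. It is illustrative only and separate from the Learning Path example that follows: it uses the standard C `fmaf()` function to perform a fused multiply-add and compares it with separate multiply and add statements. Compiling with `-ffp-contract=off` keeps the compiler from fusing the separate statements, so the difference is visible on any platform.

```c
#include <stdio.h>
#include <math.h>

int main(void) {
    /* Values chosen so the exact product 1 - 2^-46 cannot survive rounding
       to a 24-bit float significand */
    float a = 1.0f + 0x1p-23f;
    float b = 1.0f - 0x1p-23f;
    float c = -1.0f;

    /* Two roundings: the product is rounded to float, then the add is rounded */
    float product = a * b;
    float separate = product + c;

    /* One rounding: fmaf() keeps the product exact internally and rounds
       only the final sum */
    float fused = fmaf(a, b, c);

    printf("separate multiply then add: %.10e\n", separate);
    printf("fused multiply-add (fmaf):  %.10e\n", fused);
    return 0;
}
```

With these inputs the separate version prints zero, because the intermediate product rounds back to 1.0, while `fmaf()` preserves a tiny residual of about -1.4e-14. This single-rounding behavior is what FMAC instructions provide in hardware.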

Functions `f1()` and `f2()` are mathematically equivalent. You would expect them to return the same value given the same input.

Use an editor to copy and paste the C++ code below into a file named `single-precision.cpp`
Use an editor to copy and paste the C code below into a file named `example.c`.

```cpp
```c
#include <stdio.h>
#include <math.h>

@@ -42,74 +42,109 @@ int main() {

// Theoretically, result1 and result2 should be the same
float difference = result1 - result2;
// Multiply by a large number to amplify the error

// Multiply by a large number to amplify the error - using single precision (float)
// This is where architecture differences occur due to FMAC instructions
float final_result = 100000000.0f * difference + 0.0001f;

// Using double precision for the calculation makes results consistent across platforms
double final_result_double = 100000000.0 * difference + 0.0001;

// Print the results
printf("f1(%e) = %.10f\n", x, result1);
printf("f2(%e) = %.10f\n", x, result2);
printf("Difference (f1 - f2) = %.10e\n", difference);
printf("Final result after magnification: %.10f\n", final_result);
printf("Final result after magnification (float): %.10f\n", final_result);
printf("Final result after magnification (double): %.10f\n", final_result_double);

return 0;
}
```

You need access to an Arm and an x86 Linux computer to compare the results. The output below is from Ubuntu 24.04 using Clang version 18.1.3.

Compile and run the code on both x86 and Arm with the following command:

```bash
g++ -g single-precision.cpp -o single-precision
./single-precision
clang -g example.c -o example -lm
./example
```

Output running on x86:
The output running on x86:

```output
f1(1.000000e-08) = 0.0000000000
f2(1.000000e-08) = 0.0000000050
Difference (f1 - f2) = -4.9999999696e-09
Final result after magnification: -0.4999000132
Final result after magnification (float): -0.4999000132
Final result after magnification (double): -0.4998999970
```

Output running on Arm:
The output running on Arm:

```output
f1(1.000000e-08) = 0.0000000000
f2(1.000000e-08) = 0.0000000050
Difference (f1 - f2) = -4.9999999696e-09
Final result after magnification: -0.4998999834
Final result after magnification (float): -0.4998999834
Final result after magnification (double): -0.4998999970
```

Depending on your compiler and library versions, you may get the same output on both systems. You can also use the `clang` compiler and see if the output matches.
Notice that the double precision results are identical across platforms, while the single precision results differ.

You can disable fused multiply-add contraction on Arm with a compiler flag:

```bash
clang -g -ffp-contract=off example.c -o example2 -lm
./example2
```

Now the output of `example2` on Arm matches the x86 output.

You can use `objdump` to look at the assembly instructions to confirm the use of FMAC instructions.

Page through the `objdump` output to find the difference shown below in the `main()` function.

```bash
clang -g single-precision.cpp -o single-precision -lm
./single-precision
llvm-objdump -d ./example | more
```

In some cases the GNU compiler output differs from the Clang output.
The Arm output includes `fmadd`:

```output
8c8: 1f010800 fmadd s0, s0, s1, s2
```

Here's what's happening:
The x86 uses separate multiply and add instructions:

```output
125c: f2 0f 59 c1 mulsd %xmm1, %xmm0
1260: f2 0f 10 0d b8 0d 00 00 movsd 0xdb8(%rip), %xmm1 # 0x2020 <_IO_stdin_used+0x20>
1268: f2 0f 58 c1 addsd %xmm1, %xmm0
```

1. Different square root algorithms: x86 and Arm use different hardware and library implementations for `sqrtf(1 + 1e-8)`
{{% notice Note %}}
On Ubuntu 24.04 the GNU Compiler, `gcc`, produces the same result as x86 and does not use the `fmadd` instruction. Be aware that corner case examples like this may change in future compiler versions.
{{% /notice %}}

2. Tiny implementation differences get amplified. The difference between the two `sqrtf()` results is only about 3e-10, but this gets multiplied by 100,000,000, making it visible in the final result.
## Techniques for consistent results

3. Both `f1()` and `f2()` use `sqrtf()`. Even though `f2()` is more numerically stable, both functions call `sqrtf()` with the same input, so they both inherit the same architecture-specific square root result.
You can make the results consistent across platforms in several ways; the compiler-flag options are shown in the sketch after this list:

4. Compiler and library versions may produce different output due to different implementations of library functions such as `sqrtf()`.
- Use double precision for critical calculations by changing `100000000.0f` to `100000000.0`.

The final result is that x86 and Arm libraries compute `sqrtf(1.00000001)` with tiny differences in the least significant bits. This is normal and expected behavior and IEEE 754 allows for implementation variations in transcendental functions like square root, as long as they stay within specified error bounds.
- Disable fused multiply-add operations using the `-ffp-contract=off` compiler flag.

The very small difference you see is within acceptable floating-point precision limits.
- Use the compiler flag `-ffp-contract=fast` to enable fused multiply-add on x86.
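
As a quick reference, the two flag-based approaches might look like this; the output binary names are illustrative:

```bash
# Keep multiply and add as separate instructions (matches the x86 results above)
clang -g -ffp-contract=off example.c -o example-separate -lm

# Allow fused multiply-add contraction everywhere (matches the Arm default)
# On x86 the target CPU also needs FMA support, for example -march=native
clang -g -ffp-contract=fast example.c -o example-fused -lm
```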

### Key takeaways
## Key takeaways

- The small differences shown are due to library implementations in single-precision mode, not fundamental architectural differences.
- Single-precision arithmetic has inherent limitations that can cause small numerical differences.
- Using numerically stable algorithms, like `f2()`, can minimize error propagation.
- Understanding [numerical stability](https://en.wikipedia.org/wiki/Numerical_stability) is important for writing portable code.
- Different floating-point behavior between architectures can often be traced to specific hardware features or instructions such as Fused Multiply-Add (FMAC) operations.
- FMAC performs multiplication and addition with a single rounding step, which can lead to different results compared to separate multiply and add operations.
- Compilers may use FMAC instructions on Arm by default, but not on x86.
- To ensure consistent results across platforms, consider using double precision for critical calculations and controlling compiler optimizations with flags like `-ffp-contract=off` and `-ffp-contract=fast`.
- Understanding [numerical stability](https://en.wikipedia.org/wiki/Numerical_stability) remains important for writing portable code.

By adopting best practices and appropriate precision levels, developers can ensure consistent results across platforms.
If you see differences in floating-point results, it typically means you need to look a little deeper to find the causes.

Continue to the next section to see how precision impacts the results.
These situations are not common, but it is good to be aware of them as a software developer migrating to the Arm architecture. You can be confident that floating-point on Arm behaves predictably and that you can get consistent results across multiple architectures.

This file was deleted.

@@ -1,10 +1,6 @@
---
title: "Migrate x86-64 SIMD to Arm64"

draft: true
cascade:
draft: true

minutes_to_complete: 30

who_is_this_for: This is an advanced topic for developers migrating vectorized (SIMD) code from x86-64 to Arm64.
@@ -6,32 +6,34 @@ weight: 2
layout: learningpathall
---

## TinyML
## Overview

This Learning Path is about TinyML. It is a starting point for learning how innovative AI technologies can be used on even the smallest of devices, making Edge AI more accessible and efficient. You will learn how to set up your host machine to facilitate compilation and ensure smooth integration across devices.

This section provides an overview of the domain with real-life use cases and available devices.
## What is TinyML?


TinyML represents a significant shift in Machine Learning deployment. Unlike traditional Machine Learning, which typically depends on cloud-based servers or high-performance hardware, TinyML is tailored to function on devices with limited resources, constrained memory, low power, and fewer processing capabilities.

TinyML has gained popularity because it enables AI applications to operate in real-time, directly on the device, with minimal latency, enhanced privacy, and the ability to work offline. This shift opens up new possibilities for creating smarter and more efficient embedded systems.

### Benefits and applications
## Benefits and applications

The benefits of TinyML align well with the Arm architecture, which is widely used in IoT, mobile devices, and edge AI deployments.

Here are some of the key benefits of TinyML on Arm:


- **Power Efficiency**: TinyML models are designed to be extremely power-efficient, making them ideal for battery-operated devices like sensors, wearables, and drones.
- Power efficiency: TinyML models are designed to be extremely power-efficient, making them ideal for battery-operated devices like sensors, wearables, and drones.

- **Low Latency**: AI processing happens on-device, so there is no need to send data to the cloud, which reduces latency and enables real-time decision-making.
- Low latency: AI processing happens on-device, so there is no need to send data to the cloud, which reduces latency and enables real-time decision-making.

- **Data Privacy**: With on-device computation, sensitive data remains local, providing enhanced privacy and security. This is a priority in healthcare and personal devices.
- Data privacy: with on-device computation, sensitive data remains local, providing enhanced privacy and security. This is a priority in healthcare and personal devices.

- **Cost-Effective**: Arm devices, which are cost-effective and scalable, can now handle sophisticated Machine Learning tasks, reducing the need for expensive hardware or cloud services.
- Cost-effective: Arm devices are affordable and scalable, and can now handle sophisticated machine learning tasks, reducing the need for expensive hardware or cloud services.

- **Scalability**: With billions of Arm devices in the market, TinyML is well-suited for scaling across industries, enabling widespread adoption of AI at the edge.
- Scalability: with billions of Arm devices in the market, TinyML is well-suited for scaling across industries, enabling widespread adoption of AI at the edge.

TinyML is being deployed across multiple industries, enhancing everyday experiences and enabling groundbreaking solutions. The table below shows some examples of TinyML applications.
