From 9b5797c0e63dc4af60d3be79ab1ab4fbd69da66c Mon Sep 17 00:00:00 2001 From: Maddy Underwood Date: Tue, 1 Jul 2025 10:11:09 +0000 Subject: [PATCH 01/14] Starting content development --- .../multi-accuracy-libamath/_index.md | 11 ++++------- .../multi-accuracy-libamath/floating-point-rep.md | 6 +++--- 2 files changed, 7 insertions(+), 10 deletions(-) diff --git a/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/_index.md b/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/_index.md index b64edf6edb..e19f56eace 100644 --- a/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/_index.md +++ b/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/_index.md @@ -1,18 +1,15 @@ --- -title: Understanding Libamath's vector accuracy modes - -draft: true -cascade: - draft: true +title: Choosing accuracy modes in Libamath (Arm Performance Libraries) minutes_to_complete: 20 author: Joana Cruz -who_is_this_for: This is an introductory topic for software developers who want to learn how to use the different accuracy modes present in Libamath, a component of Arm Performance Libraries. +who_is_this_for: This is an introductory topic for software developers who want to learn how to use the different accuracy modes for vectorized math functions in Libamath, a component of Arm Performance Libraries. learning_objectives: - Understand how accuracy is defined in Libamath. - Pick an appropriate accuracy mode for your application. + - Use Libamath with different vector accuracy modes in practice. prerequisites: - An Arm computer running Linux with [Arm Performance Libraries](https://learn.arm.com/install-guides/armpl/) version 25.04 or newer installed. 
@@ -25,7 +22,7 @@ armips: tools_software_languages: - Arm Performance Libraries - GCC -- Libmath +- Libamath operatingsystems: - Linux diff --git a/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/floating-point-rep.md b/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/floating-point-rep.md index 1dff7d8364..87c3c19f0b 100644 --- a/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/floating-point-rep.md +++ b/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/floating-point-rep.md @@ -19,7 +19,7 @@ A Floating Point number is typically expressed as: where: * B is the base; * e is the exponent; -* d.dddd...d is the mantissa (or significand). It is p-bit word, where p represents the precision; +* d.dddd...d is the mantissa (or significand). It is a p-bit word, where *p* represents the precision; * +/- sign which is usually stored separately. If the leading digit is non-zero then it is a normalized representation/normal number. @@ -33,7 +33,7 @@ Fixing `B=2, p=24` {{% /notice %}} -Usually a Floating Point number has multiple non-normalized representations, but only 1 normalized representation (assuming leading digit is strictly smaller than base), when fixing a base and a precision. +A floating-point number can have multiple non-normalized representations, but only one normalized form - assuming a fixed base and precision, and that the leading digit is strictly less than the base. ### Building a Floating-Point Ruler @@ -56,7 +56,7 @@ Note that, for any given integer n, numbers are evenly spaced between 2ⁿ and 2 ### The Floating-Point bitwise representation -Since there are `B^p` possible mantissas, and `emax-emin+1` possible exponents, then `log2(B^p) + log2(emax-emin+1) + 1` (sign) bits are needed to represent a given Floating Point number in a system. 
+Since there are \( B^p \) possible mantissas, and `emax-emin+1` possible exponents, then `log2(B^p) + log2(emax-emin+1) + 1` (sign) bits are needed to represent a given Floating Point number in a system. In Example 2, 3+2+1=6 bits are needed. From c5daf3eec54dfd401068c89e6c082c35f357a15e Mon Sep 17 00:00:00 2001 From: Maddy Underwood Date: Tue, 1 Jul 2025 13:50:26 +0000 Subject: [PATCH 02/14] Updates --- .../floating-point-rep.md | 31 +++++++++++-------- 1 file changed, 18 insertions(+), 13 deletions(-) diff --git a/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/floating-point-rep.md b/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/floating-point-rep.md index 87c3c19f0b..8a2fa967a7 100644 --- a/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/floating-point-rep.md +++ b/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/floating-point-rep.md @@ -1,16 +1,18 @@ --- -title: Floating Point Representation +title: Understanding floating-point representation weight: 2 ### FIXED, DO NOT MODIFY layout: learningpathall --- -## Floating-Point Representation Basics +Floating-point numbers are essential for representing real numbers in computing, but they come with limits on precision and range. This Learning Path explains how floating-point values are structured, how bitwise representation works, and what the IEEE-754 standard defines, including special values like NaN and subnormals. -Floating Point numbers are a finite and discrete approximation of the real numbers, allowing us to implement and compute functions in the continuous domain with an adequate (but limited) resolution. +## What is a floating-point number? -A Floating Point number is typically expressed as: +Floating-point numbers are a finite and discrete approximation of the real numbers, allowing one to implement and compute functions in the continuous domain with an adequate (but limited) resolution. 
+ +A floating-point number is typically expressed as: ```output +/-d.dddd...d x B^e @@ -22,6 +24,9 @@ where: * d.dddd...d is the mantissa (or significand). It is a p-bit word, where *p* represents the precision; * +/- sign which is usually stored separately. +The precision of a floating-point format refers to the number of binary digits used to represent the mantissa. This is denoted by p, and a system with p bits of precision can distinguish between +\( 2^p \) different fractional values. + If the leading digit is non-zero then it is a normalized representation/normal number. {{% notice Example 1 %}} @@ -35,9 +40,9 @@ Fixing `B=2, p=24` A floating-point number can have multiple non-normalized representations, but only one normalized form - assuming a fixed base and precision, and that the leading digit is strictly less than the base. -### Building a Floating-Point Ruler +## How precision and exponents define floating-point values -Given a base `B`, a precision `p`, a maximum exponent `emax` and a minimum exponent `emin`, we can create the set of all the normalized values in this system. +Given a base `B`, a precision `p`, a maximum exponent `emax` and a minimum exponent `emin`, one can create the set of all the normalized values in this system. {{% notice Example 2 %}} `B=2, p=3, emax=2, emin=-1` @@ -52,15 +57,15 @@ Given a base `B`, a precision `p`, a maximum exponent `emax` and a minimum expon {{% /notice %}} -Note that, for any given integer n, numbers are evenly spaced between 2ⁿ and 2ⁿ⁺¹. But the gap between them (also called [ULP](/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp/), which is explained in the more detail in the next section) grows as the exponent increases. So the spacing between floating point numbers gets larger as numbers get bigger. +Note that, for any given integer n, numbers are evenly spaced between 2ⁿ and 2ⁿ⁺¹. 
But the gap between them (also called [ULP](/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp/), which is explained in the more detail in the next section) grows as the exponent increases. So the spacing between floating-point numbers gets larger as numbers get bigger. -### The Floating-Point bitwise representation +## Bitwise representation of floating-point numbers Since there are \( B^p \) possible mantissas, and `emax-emin+1` possible exponents, then `log2(B^p) + log2(emax-emin+1) + 1` (sign) bits are needed to represent a given Floating Point number in a system. In Example 2, 3+2+1=6 bits are needed. -Based on this, the floating point's bitwise representation is defined to be: +Based on this, the floating-point's bitwise representation is defined as: ``` b0 b1 b2 b3 b4 b5 @@ -83,13 +88,13 @@ For this reason, E is called the biased exponent, and in order to retrieve the v x = (-1)^S x M x 2^(E-1) ``` -## IEEE-754 Single Precision +## IEEE-754 single precision format -Single precision (also called float) is a 32-bit format defined by the [IEEE-754 Floating Point Standard](https://ieeexplore.ieee.org/document/8766229) +Single precision (also called float) is a 32-bit format defined by the [IEEE-754 Floating-Point Standard](https://ieeexplore.ieee.org/document/8766229) In this standard the sign is represented using 1 bit, the exponent uses 8 bits and the mantissa uses 23 bits. -The value of a (normalized) Floating Point in IEEE-754 can be represented as: +The value of a (normalized) Floating-Point in IEEE-754 can be represented as: ```output x=(−1)^S x 1.M x 2^E−127 @@ -111,7 +116,7 @@ They allow the representation of numbers very close to zero, smaller than what i Subnormal numbers do not have the a leading 1 in their representation. They also assume exponent is 0. 
-The interpretation of denormal Floating Point in IEEE-754 can be represented as: +The interpretation of denormal Floating-Point in IEEE-754 can be represented as: ``` x=(−1)^S x 0.M x 2^−126 From e673a7d7767f0c5b6688225d3c946fe0ea10cd57 Mon Sep 17 00:00:00 2001 From: Maddy Underwood Date: Tue, 1 Jul 2025 14:23:39 +0000 Subject: [PATCH 03/14] updates --- .../multi-accuracy-libamath/floating-point-rep.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/floating-point-rep.md b/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/floating-point-rep.md index 8a2fa967a7..4c8d7a7a89 100644 --- a/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/floating-point-rep.md +++ b/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/floating-point-rep.md @@ -57,7 +57,7 @@ Given a base `B`, a precision `p`, a maximum exponent `emax` and a minimum expon {{% /notice %}} -Note that, for any given integer n, numbers are evenly spaced between 2ⁿ and 2ⁿ⁺¹. But the gap between them (also called [ULP](/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp/), which is explained in the more detail in the next section) grows as the exponent increases. So the spacing between floating-point numbers gets larger as numbers get bigger. +Note that, for any given integer n, numbers are evenly spaced between 2ⁿ and 2ⁿ⁺¹. But the gap between them (also called [ULP](/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp/), which is explained in more detail in the next section) grows as the exponent increases. So the spacing between floating-point numbers gets larger as numbers get bigger. 
## Bitwise representation of floating-point numbers From b743a472eca8894bb18a3dd784e3b8fe806767af Mon Sep 17 00:00:00 2001 From: Maddy Underwood Date: Tue, 1 Jul 2025 21:10:00 +0000 Subject: [PATCH 04/14] Updates --- .../multi-accuracy-libamath/_index.md | 2 +- .../floating-point-rep.md | 51 ++++++++++--------- 2 files changed, 29 insertions(+), 24 deletions(-) diff --git a/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/_index.md b/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/_index.md index e19f56eace..e14a24e13a 100644 --- a/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/_index.md +++ b/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/_index.md @@ -1,5 +1,5 @@ --- -title: Choosing accuracy modes in Libamath (Arm Performance Libraries) +title: Select accuracy modes in Libamath (Arm Performance Libraries) minutes_to_complete: 20 author: Joana Cruz diff --git a/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/floating-point-rep.md b/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/floating-point-rep.md index 4c8d7a7a89..7829850ba8 100644 --- a/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/floating-point-rep.md +++ b/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/floating-point-rep.md @@ -21,20 +21,19 @@ A floating-point number is typically expressed as: where: * B is the base; * e is the exponent; -* d.dddd...d is the mantissa (or significand). It is a p-bit word, where *p* represents the precision; -* +/- sign which is usually stored separately. +* d.dddd...d is the mantissa (or significand). It is a *p*-bit word, where *p* represents the precision; +* and the +/- sign is usually stored separately. -The precision of a floating-point format refers to the number of binary digits used to represent the mantissa. 
This is denoted by p, and a system with p bits of precision can distinguish between
-\( 2^p \) different fractional values.
+The precision of a floating-point format refers to the number of binary digits used to represent the mantissa. This is denoted by *p*, and a system with *p* bits of precision can distinguish between \( 2^p \) different fractional values.

-If the leading digit is non-zero then it is a normalized representation/normal number.
+If the leading digit is non-zero, then it is a normalized representation (also called a *normal number*).

{{% notice Example 1 %}}
-Fixing `B=2, p=24`
+Fixing `B = 2, p = 24`

`0.1 = 1.10011001100110011001101 × 2^-4` is a normalized representation of 0.1

-`0.1 = 0.000110011001100110011001 × 2^0` is a non normalized representation of 0.1
+`0.1 = 0.000110011001100110011001 × 2^0` is a non-normalized representation of 0.1

{{% /notice %}}

@@ -42,10 +41,10 @@ A floating-point number can have multiple non-normalized representations, but on

## How precision and exponents define floating-point values

-Given a base `B`, a precision `p`, a maximum exponent `emax` and a minimum exponent `emin`, one can create the set of all the normalized values in this system.
+Given a base `B`, a precision `p`, a maximum exponent `emax`, and a minimum exponent `emin`, one can create the set of all the normalized values in this system.

{{% notice Example 2 %}}
-`B=2, p=3, emax=2, emin=-1`
+`B = 2, p = 3, emax = 2, emin = -1`

| Significand | × 2⁻¹ | × 2⁰ | × 2¹ | × 2² |
|-------------|-------|------|------|------|
@@ -56,15 +55,15 @@ Given a base `B`, a precision `p`, a maximum exponent `emax` and a minimum expon

{{% /notice %}}

-Note that, for any given integer n, numbers are evenly spaced between 2ⁿ and 2ⁿ⁺¹. But the gap between them (also called [ULP](/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp/), which is explained in more detail in the next section) grows as the exponent increases.
So the spacing between floating-point numbers gets larger as numbers get bigger. +For any given integer n, numbers are evenly spaced between 2ⁿ and 2ⁿ⁺¹. However, the gap between them (also called a [ULP](/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp/), which is explained in more detail in the next section) grows as the exponent increases. So the spacing between floating-point numbers becomes larger as numbers themselves increase in magnitude. ## Bitwise representation of floating-point numbers -Since there are \( B^p \) possible mantissas, and `emax-emin+1` possible exponents, then `log2(B^p) + log2(emax-emin+1) + 1` (sign) bits are needed to represent a given Floating Point number in a system. +Since there are \( B^p \) possible mantissas and `emax-emin+1` possible exponents, then `log2(B^p) + log2(emax-emin+1) + 1` (sign) bits are needed to represent a given Floating Point number in a system. In Example 2, 3+2+1=6 bits are needed. @@ -82,7 +81,7 @@ b3, b4, b5 -> mantissa (M) However, this is not enough. In this bitwise definition, the possible values of E are 0, 1, 2, 3. But in the system being defined, only the integer values in the range [-1, 2] are of interest. -For this reason, E is called the biased exponent, and in order to retrieve the value it is trying to represent (i.e. the unbiased exponent) an offset must be added or subtracted (in this case, subtract 1): +For this reason, E is called the biased exponent, and in order to retrieve the value it is trying to represent (that is the unbiased exponent) an offset must be added or subtracted (in this case, subtract 1): ```output x = (-1)^S x M x 2^(E-1) @@ -92,31 +91,37 @@ x = (-1)^S x M x 2^(E-1) Single precision (also called float) is a 32-bit format defined by the [IEEE-754 Floating-Point Standard](https://ieeexplore.ieee.org/document/8766229) -In this standard the sign is represented using 1 bit, the exponent uses 8 bits and the mantissa uses 23 bits. 
+In this format: -The value of a (normalized) Floating-Point in IEEE-754 can be represented as: +* The sign is represented using 1 bit +* The exponent uses 8 bits +* The mantissa uses 23 bits + +The value of a normalized floating-point number in IEEE-754 can be represented as: ```output x=(−1)^S x 1.M x 2^E−127 ``` -The exponent bias of 127 allows storage of exponents from -126 to +127. The leading digit is implicit - that is we have 24 bits of precision. In normalized numbers the leading digit is implicitly 1. +The exponent bias of 127 allows storage of exponents from -126 to +127. The leading digit is implicit in normalized numbers, giving a total of 24 bits of precision. -{{% notice Special Cases in IEEE-754 Single Precision %}} -Since we have 8 bits of storage, meaning E ranges between 0 and 2^8-1=255. However not all these 256 values are going to be used for normal numbers. +{{% notice Special cases in IEEE-754 single precision %}} +Since the exponent field uses 8 bits, E ranges between 0 and 2^8-1=255. However not all these 256 values are used for normal numbers. If the exponent E is: * 0, then we are either in the presence of a denormalized number or a 0 (if M is 0 as well); -* 1 to 254 then we are in the normalized range; -* 255 then we are in the presence of Inf (if M==0), or Nan (if M!=0). +* 1 to 254 then this is in the normalized range; +* 255: infinity (if M==0), or NaN (if M!=0). -Subnormal numbers (also called denormal numbers) are special floating-point values defined by the IEEE-754 standard. +## Subnormal numbers + +Subnormal numbers (also called denormal numbers) allow representation of values closer to zero than is possible with normalized exponents. They are special floating-point values defined by the IEEE-754 standard. They allow the representation of numbers very close to zero, smaller than what is normally possible with the standard exponent range. -Subnormal numbers do not have the a leading 1 in their representation. 
They also assume exponent is 0. +Subnormal numbers do not have a leading 1 in their representation. They also assume an exponent of –126. -The interpretation of denormal Floating-Point in IEEE-754 can be represented as: +The interpretation of denormal floating-point in IEEE-754 can be represented as: ``` x=(−1)^S x 0.M x 2^−126 @@ -140,5 +145,5 @@ x=(−1)^s x 0.M x 2^−126 | 11 (1.75) | 0.375 | 0.875 | 1.75 | 3.5 | 7.0 | --> {{% /notice %}} -If you're interested in diving deeper in this subject, [What Every Computer Scientist Should Know About Floating-Point Arithmetic](https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html) by David Goldberg is a good place to start. +If you're interested in diving deeper into this subject, [What Every Computer Scientist Should Know About Floating-Point Arithmetic](https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html) by David Goldberg is a great place to start. From 12523c109bb6f56b98db6e2c8459223959bdabb1 Mon Sep 17 00:00:00 2001 From: Maddy Underwood Date: Tue, 1 Jul 2025 21:23:54 +0000 Subject: [PATCH 05/14] Updates --- .../floating-point-rep.md | 5 ++-- .../multi-accuracy-libamath/ulp.md | 28 +++++++++---------- 2 files changed, 17 insertions(+), 16 deletions(-) diff --git a/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/floating-point-rep.md b/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/floating-point-rep.md index 7829850ba8..8fb6f1ae86 100644 --- a/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/floating-point-rep.md +++ b/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/floating-point-rep.md @@ -113,7 +113,7 @@ If the exponent E is: * 1 to 254 then this is in the normalized range; * 255: infinity (if M==0), or NaN (if M!=0). 
-## Subnormal numbers +##### Subnormal numbers Subnormal numbers (also called denormal numbers) allow representation of values closer to zero than is possible with normalized exponents. They are special floating-point values defined by the IEEE-754 standard. @@ -145,5 +145,6 @@ x=(−1)^s x 0.M x 2^−126 | 11 (1.75) | 0.375 | 0.875 | 1.75 | 3.5 | 7.0 | --> {{% /notice %}} -If you're interested in diving deeper into this subject, [What Every Computer Scientist Should Know About Floating-Point Arithmetic](https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html) by David Goldberg is a great place to start. +## Further information +If you're interested in diving deeper into this subject, [What Every Computer Scientist Should Know About Floating-Point Arithmetic](https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html) by David Goldberg is a great place to start. \ No newline at end of file diff --git a/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp.md b/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp.md index d37302d6bd..2a3d26b109 100644 --- a/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp.md +++ b/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp.md @@ -6,7 +6,7 @@ weight: 3 layout: learningpathall --- -# ULP +## ULP Units in the Last Place (ULP) is the distance between two adjacent floating-point numbers at a given value, representing the smallest possible change in that number's representation. 
@@ -16,9 +16,9 @@ It is a property of a number and can be calculated with the following expression ULP(x) = nextafter(x, +inf) - x ``` -Building on the example shown in the previous section: +Building on the example from the previous section: -Fixed `B=2, p=3, e^max=2, e^min=-1` +Fixed `B = 2, p = 3, e^max = 2, e^min = -1` | Significand | × 2⁻¹ | × 2⁰ | × 2¹ | × 2² | |-------------|-------|------|------|------| @@ -27,7 +27,7 @@ Fixed `B=2, p=3, e^max=2, e^min=-1` | 1.10 (1.5) | 0.75 | 1.5 | 3.0 | 6.0 | | 1.11 (1.75) | 0.875 | 1.75 | 3.5 | 7.0 | -Based on the above definition, the ULP value for the numbers in this set can be computed as follows: +Based on the above definition, you can compute the ULP value for the numbers in this set as follows: ``` ULP(0.625) = nextafter(0.625, +inf) - 0.625 = 0.75-0.625 = 0.125 @@ -36,21 +36,23 @@ ULP(0.625) = nextafter(0.625, +inf) - 0.625 = 0.75-0.625 = 0.125 ULP(4.0) = 1.0 ``` -As the exponent of `x` grows, `ULP(x)` also increases exponentially; that is, the spacing between floating points becomes larger. +As the exponent of `x` increases, `ULP(x)` increases exponentially. That is, the spacing between floating-point values grows with the magnitude of x. Numbers with the same exponent have the same ULP. -For normalized IEEE-754 floats, a similar behavior is observed: the distance between two adjacent representable values — i.e., ULP(x) — is a power of two that depends only on the exponent of x. +## ULP in IEEE-754 -Hence, another expression used to calculate the ULP of normalized Floating Point numbers is: +For normalized IEEE-754 floating-point numbers, a similar behavior is observed: the distance between two adjacent representable values — that is, ULP(x) — is a power of two that depends only on the exponent of x. 
+A faster, commonly used expression for ULP is:

```
ULP(x) = 2^(e-p+1)
```

-where:
-* `e` is the exponent (in the IEEE-754 definition of single precision this is `E-127`)
-* `p` is the precision
+Where:
+* `e` is the unbiased exponent (in the IEEE-754 definition of single precision this is `E-127`)
+* `p` is the precision (24 for IEEE-754 single-precision, counting the implicit leading bit)

When computing the ULP of IEEE-754 floats, this expression becomes:
```
@@ -72,14 +74,12 @@ The second smallest is:
```
second_min_pos_denormal = 2 ^ -22 x 2 ^ -126 = 2^-148 = 2*2^-149
```
-and so on...
-
-The denormal numbers are evenly spaced by `2^-149`.
+Thus, all denormal numbers are evenly spaced by `2^-149`.

{{% /notice %}}

-## ULP implementation
+## ULP implementation in C

Below is an example of an implementation of the ULP function of a number.

From d79978acf5ad12ba937df44b2ca60d3fe6bd47ea Mon Sep 17 00:00:00 2001
From: Maddy Underwood
Date: Tue, 1 Jul 2025 21:30:21 +0000
Subject: [PATCH 06/14] Updates

---
 .../multi-accuracy-libamath/multi-accuracy.md | 9 +++++----
 .../multi-accuracy-libamath/ulp-error.md      | 6 +++---
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/multi-accuracy.md b/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/multi-accuracy.md
index 807b338b49..e281f4a758 100644
--- a/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/multi-accuracy.md
+++ b/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/multi-accuracy.md
@@ -9,8 +9,9 @@ layout: learningpathall

---

## The 3 accuracy modes of Libamath

-Libamath vector functions can come in various accuracy modes for the same mathematical function.
-This means, some of our functions allow users and compilers to choose between:
+Libamath provides multiple accuracy modes for the same vectorized mathematical function, enabling developers to choose between speed and precision.
+ +This means, some of the functions allow users and compilers to choose between: - **High accuracy** (≤ 1 ULP) - **Default accuracy** (≤ 3.5 ULP) - **Low accuracy / max performance** (approx. ≤ 4096 ULP) @@ -59,7 +60,7 @@ The default mode strikes a **practical balance** between performance and numeric - **Analytics workloads** [3] such as log or sqrt during feature extraction - **Inference pipelines** [4] - especially on edge devices where latency matters + especially on edge devices where latency is critical Also suitable for many **scientific workloads** that can tolerate modest error in exchange for **faster throughput**. @@ -74,7 +75,7 @@ This mode trades precision for speed — aggressively. It's designed for: where statistical convergence outweighs per-sample accuracy [6] - **Genetic algorithms, audio processing, and embedded DSP** -Avoid in control-flow-critical code or where **errors amplify**. +Avoid in control-flow-critical code or where errors might compound or affect control flow. ## Summary diff --git a/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp-error.md b/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp-error.md index 8f253905f9..8ea05d0a1b 100644 --- a/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp-error.md +++ b/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp-error.md @@ -1,12 +1,12 @@ --- -title: ULP Error and Accuracy +title: ULP error and accuracy weight: 4 ### FIXED, DO NOT MODIFY layout: learningpathall --- -# ULP Error and Accuracy +# ULP error and accuracy In the development of Libamath, a metric called ULP error is used to assess the accuracy of functions. This metric measures the distance between two numbers, a reference (`want`) and an approximation (`got`), relative to how many floating-point “steps” (ULPs) these two numbers are apart. 
@@ -20,7 +20,7 @@ ulp_err = | want - got | / ULP(want) Because this is a relative measure in terms of floating-point spacing (ULPs)—that is, this metric is scale-aware—it is ideal for comparing accuracy across magnitudes. Otherwise, error measures would be very biased by the uneven distribution of the floats. -# ULP Error Implementation +# ULP error implementation In practice, however, the above expression may take different forms to account for sources of error that may occur during the computation of the error itself. From cbf4be77c1e2aa2a0e13196a9b016b35b6932de0 Mon Sep 17 00:00:00 2001 From: Maddy Underwood Date: Wed, 2 Jul 2025 10:27:11 +0000 Subject: [PATCH 07/14] Updates --- .../floating-point-rep.md | 3 ++- .../multi-accuracy-libamath/multi-accuracy.md | 22 ++++++++--------- .../multi-accuracy-libamath/ulp-error.md | 24 +++++++++++-------- 3 files changed, 27 insertions(+), 22 deletions(-) diff --git a/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/floating-point-rep.md b/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/floating-point-rep.md index 8fb6f1ae86..d75c4839ce 100644 --- a/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/floating-point-rep.md +++ b/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/floating-point-rep.md @@ -21,7 +21,8 @@ A floating-point number is typically expressed as: where: * B is the base; * e is the exponent; -* d.dddd...d is the mantissa (or significand). It is a *p*-bit word, where *p* represents the precision; +* d.dddd...d is the mantissa (or significand); +* It is a *p*-bit word, where *p* represents the precision; * and the +/- sign is usually stored separately. The precision of a floating-point format refers to the number of binary digits used to represent the mantissa. This is denoted by *p*, and a system with *p* bits of precision can distinguish between \( 2^p \) different fractional values. 
diff --git a/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/multi-accuracy.md b/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/multi-accuracy.md index e281f4a758..68a6468698 100644 --- a/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/multi-accuracy.md +++ b/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/multi-accuracy.md @@ -7,11 +7,11 @@ layout: learningpathall --- -## The 3 accuracy modes of Libamath +## The three accuracy modes of Libamath -Libamath provides multiple accuracy modes for the same vectorized mathematical function, enabling developers to choose between speed and precision. +Libamath provides multiple accuracy modes for the same vectorized mathematical function, allowing developers to choose between speed and precision depending on workload requirements. -This means, some of the functions allow users and compilers to choose between: +Some functions offer selectable modes with tradeoffs between: - **High accuracy** (≤ 1 ULP) - **Default accuracy** (≤ 3.5 ULP) - **Low accuracy / max performance** (approx. ≤ 4096 ULP) @@ -19,22 +19,22 @@ This means, some of the functions allow users and compilers to choose between: ## How accuracy modes are encoded in Libamath -You can recognize the accuracy mode of a function by inspecting the **suffix** in its symbol: +You can recognize the accuracy mode of a function by the **suffix** in the function symbol: - **`_u10`** → High accuracy - For instance, `armpl_vcosq_f32_u10` - Ensures results stay within **1 Unit in the Last Place (ULP)**. + Example: `armpl_vcosq_f32_u10` + Ensures results within **1 Unit in the Last Place (ULP)**. - *(no suffix)* → Default accuracy - For instance, `armpl_vcosq_f32` - Keeps errors within **3.5 ULP** — a sweet spot for many workloads. + Example: `armpl_vcosq_f32` + Keeps errors within **3.5 ULP** - balancing precision and performance. 
-- **`_umax`** → Low accuracy - For instance, `armpl_vcosq_f32_umax` +- **`_umax`** → Low accuracy/max spend + Example: `armpl_vcosq_f32_umax` Prioritizes speed, tolerating errors up to **4096 ULP**, or roughly **11 correct bits** in single-precision. -## Applications +## When to use each mode Selecting an appropriate accuracy level helps avoid unnecessary compute cost while preserving output quality where it matters. diff --git a/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp-error.md b/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp-error.md index 8ea05d0a1b..626d65fcb7 100644 --- a/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp-error.md +++ b/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp-error.md @@ -6,23 +6,23 @@ weight: 4 layout: learningpathall --- -# ULP error and accuracy +## ULP error and accuracy -In the development of Libamath, a metric called ULP error is used to assess the accuracy of functions. -This metric measures the distance between two numbers, a reference (`want`) and an approximation (`got`), relative to how many floating-point “steps” (ULPs) these two numbers are apart. +In the development of Libamath, a metric called ULP error is used to assess the accuracy of floating-point functions. +ULP (Unit in the Last Place) measures the distance between two numbers, a reference (`want`) and an approximation (`got`), relative to how many floating-point steps (ULPs) separate them. -It can be calculated by: +The formula is: ``` ulp_err = | want - got | / ULP(want) ``` -Because this is a relative measure in terms of floating-point spacing (ULPs)—that is, this metric is scale-aware—it is ideal for comparing accuracy across magnitudes. Otherwise, error measures would be very biased by the uneven distribution of the floats. +Because ULP error is defined in terms of floating-point spacing, it is inherently scale-aware. 
In contrast to absolute error, ULP error avoids bias due to the uneven distribution of floating-point numbers across different magnitudes. -# ULP error implementation +## ULP error implementation -In practice, however, the above expression may take different forms to account for sources of error that may occur during the computation of the error itself. +In practice, the basic expression is modified to account for additional sources of error introduced during the computation itself. In the implementation used here, this quantity is held by a term called `tail`: @@ -36,9 +36,11 @@ This term takes into account the error introduced by casting `want` from a highe tail = | (want_l - want) / ULP(want) | ``` -Here is a simplified version of the ULP Error. Use the same `ulp.h` from the previous section. +## A simplified version -Use a text editor to opy the code below into a new file `ulp_error.h`. +Here is a simplified version of the ULP error. Use the same `ulp.h` header from the previous section. + +Use a text editor to copy the code below into a new file called `ulp_error.h`: ```C // Defines ulpscale(x) @@ -68,7 +70,9 @@ double ulp_error(float got, double want_l) { return fabs(scalbn(diff, -ulp_exp) - tail); } ``` -Note that the final scaling is done with respect to the rounded reference. +{{% notice Note %}} +The final scaling is done with respect to the rounded reference. 
+{{% /notice %}} In this implementation, it is possible to get exactly 0.0 ULP error if and only if: From 8bc75e2a3a2b292b10ed8c827c4356056464f588 Mon Sep 17 00:00:00 2001 From: Maddy Underwood Date: Wed, 2 Jul 2025 10:52:33 +0000 Subject: [PATCH 08/14] Updates --- .../multi-accuracy-libamath/multi-accuracy.md | 29 +++++++++---------- .../multi-accuracy-libamath/ulp.md | 14 +++++---- 2 files changed, 23 insertions(+), 20 deletions(-) diff --git a/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/multi-accuracy.md b/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/multi-accuracy.md index 68a6468698..a8c32f47c4 100644 --- a/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/multi-accuracy.md +++ b/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/multi-accuracy.md @@ -7,7 +7,7 @@ layout: learningpathall --- -## The three accuracy modes of Libamath +## Accuracy modes Libamath provides multiple accuracy modes for the same vectorized mathematical function, allowing developers to choose between speed and precision depending on workload requirements. @@ -17,7 +17,7 @@ Some functions offer selectable modes with tradeoffs between: - **Low accuracy / max performance** (approx. ≤ 4096 ULP) -## How accuracy modes are encoded in Libamath +### How accuracy modes are encoded You can recognize the accuracy mode of a function by the **suffix** in the function symbol: @@ -29,7 +29,7 @@ You can recognize the accuracy mode of a function by the **suffix** in the funct Example: `armpl_vcosq_f32` Keeps errors within **3.5 ULP** - balancing precision and performance. -- **`_umax`** → Low accuracy/max spend +- **`_umax`** → Low accuracy/max performance Example: `armpl_vcosq_f32_umax` Prioritizes speed, tolerating errors up to **4096 ULP**, or roughly **11 correct bits** in single-precision. 
@@ -39,20 +39,19 @@ You can recognize the accuracy mode of a function by the **suffix** in the funct Selecting an appropriate accuracy level helps avoid unnecessary compute cost while preserving output quality where it matters. -### High Accuracy (≤ 1 ULP) +### High accuracy (≤ 1 ULP) -Use when **numerical (almost) correctness** is a priority. These routines involve precise algorithms (such as high-degree polynomials, careful range reduction, or FMA usage) and are ideal for: +Use when bit-level correctness matters. These routines employ advanced algorithms (such as high-degree polynomials, tight range reduction, or FMA usage) and are ideal for: - **Scientific computing** such as simulations or finite element analysis - **Signal processing pipelines** [1,2] particularly recursive filters or transform -- **Validation & reference implementations** +- **Validation and reference implementations** -While slower, these functions provide **near-bitwise reproducibility** — critical in sensitive domains. +While slower, these functions provide **near-bitwise reproducibility** — critical for validation and scientific fidelity. - -### Default Accuracy (≤ 3.5 ULP) +### Default accuracy (≤ 3.5 ULP) The default mode strikes a **practical balance** between performance and numerical fidelity. It’s optimized for: @@ -65,7 +64,7 @@ The default mode strikes a **practical balance** between performance and numeric Also suitable for many **scientific workloads** that can tolerate modest error in exchange for **faster throughput**. -### Low Accuracy / Max Performance (≤ 4096 ULP) +### Low accuracy / max performance (≤ 4096 ULP) This mode trades precision for speed — aggressively. It's designed for: @@ -93,21 +92,21 @@ If your workload has mixed precision needs, you can *selectively call different {{% /notice %}} -#### References +## References 1. Higham, N. J. (2002). *Accuracy and Stability of Numerical Algorithms* (2nd ed.). SIAM. -2. Texas Instruments. 
Overflow Avoidance Techniques in Cascaded IIR Filter Implementations on the TMS320 DSPs. Application Report SPRA509, 1999. +2. Texas Instruments. *Overflow Avoidance Techniques in Cascaded IIR Filter Implementations on the TMS320 DSPs*. Application Report SPRA509, 1999. https://www.ti.com/lit/pdf/spra509 -3. Ma, S., & Huai, J. (2019). Approximate Computation for Big Data Analytics. arXiv:1901.00232. +3. Ma, S., & Huai, J. (2019). *Approximate Computation for Big Data Analytics*. arXiv:1901.00232. https://arxiv.org/pdf/1901.00232 -4. Gupta, S., Agrawal, A., Gopalakrishnan, K., & Narayanan, P. (2015). Deep Learning with Limited Numerical Precision. In Proceedings of the 32nd International Conference on Machine Learning (ICML), PMLR 37. +4. Gupta, S., Agrawal, A., Gopalakrishnan, K., & Narayanan, P. (2015). *Deep Learning with Limited Numerical Precision*. In Proceedings of the 32nd International Conference on Machine Learning (ICML), PMLR 37. https://proceedings.mlr.press/v37/gupta15.html 5. Unity Technologies. *Precision Modes*. Unity Shader Graph Documentation. [https://docs.unity3d.com/Packages/com.unity.shadergraph@17.1/manual/Precision-Modes.html](https://docs.unity3d.com/Packages/com.unity.shadergraph@17.1/manual/Precision-Modes.html) -6. Croci, M., Gorman, G. J., & Giles, M. B. (2021). Rounding Error using Low Precision Approximate Random Variables. arXiv:2012.09739. +6. Croci, M., Gorman, G. J., & Giles, M. B. (2021). *Rounding Error using Low Precision Approximate Random Variables*. arXiv:2012.09739. 
https://arxiv.org/abs/2012.09739 diff --git a/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp.md b/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp.md index 2a3d26b109..c9db64e9b8 100644 --- a/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp.md +++ b/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp.md @@ -6,16 +6,18 @@ weight: 3 layout: learningpathall --- -## ULP +## What is ULP? -Units in the Last Place (ULP) is the distance between two adjacent floating-point numbers at a given value, representing the smallest possible change in that number's representation. +Units in the Last Place (ULP) is the distance between two adjacent floating-point numbers at a given value. It represents the smallest possible change in a number's representation at that magnitude. -It is a property of a number and can be calculated with the following expression: +ULP is a function of the number's exponent and can be calculated with the following expression: ```output ULP(x) = nextafter(x, +inf) - x ``` +## ULP Example: Binary Model + Building on the example from the previous section: Fixed `B = 2, p = 3, e^max = 2, e^min = -1` @@ -27,7 +29,7 @@ Fixed `B = 2, p = 3, e^max = 2, e^min = -1` | 1.10 (1.5) | 0.75 | 1.5 | 3.0 | 6.0 | | 1.11 (1.75) | 0.875 | 1.75 | 3.5 | 7.0 | -Based on the above definition, you can compute the ULP value for the numbers in this set as follows: +Based on the above definition, you can compute the ULP values for the numbers in this set as follows: ``` ULP(0.625) = nextafter(0.625, +inf) - 0.625 = 0.75-0.625 = 0.125 @@ -36,7 +38,7 @@ ULP(0.625) = nextafter(0.625, +inf) - 0.625 = 0.75-0.625 = 0.125 ULP(4.0) = 1.0 ``` -As the exponent of `x` increases, `ULP(x)` increases exponentially. That is, the spacing between floating-point values grows with the magnitude of x. +As the exponent of `x` increases, `ULP(x)` increases exponentially. 
That is, the spacing between floating-point values grows with the magnitude of x. Numbers with the same exponent have the same ULP. @@ -44,6 +46,8 @@ Numbers with the same exponent have the same ULP. For normalized IEEE-754 floating-point numbers, a similar behavior is observed: the distance between two adjacent representable values — that is, ULP(x) — is a power of two that depends only on the exponent of x. +### Optimized Expression + A faster, commonly used expression for ULP is: ``` From 21b5be3c0b2eb01734c5cbe82ac2b871fdd3dbec Mon Sep 17 00:00:00 2001 From: Maddy Underwood Date: Wed, 2 Jul 2025 17:59:25 +0000 Subject: [PATCH 09/14] Updates --- .../multi-accuracy-libamath/_index.md | 12 ++--- .../floating-point-rep.md | 51 ++++++++++++------- .../multi-accuracy-libamath/ulp-error.md | 3 +- 3 files changed, 39 insertions(+), 27 deletions(-) diff --git a/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/_index.md b/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/_index.md index e14a24e13a..0474eaaf90 100644 --- a/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/_index.md +++ b/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/_index.md @@ -4,12 +4,12 @@ title: Select accuracy modes in Libamath (Arm Performance Libraries) minutes_to_complete: 20 author: Joana Cruz -who_is_this_for: This is an introductory topic for software developers who want to learn how to use the different accuracy modes for vectorized math functions in Libamath, a component of Arm Performance Libraries. +who_is_this_for: This is an introductory topic for developers who want to use the different accuracy modes for vectorized math functions in Libamath, a component of Arm Performance Libraries. learning_objectives: - - Understand how accuracy is defined in Libamath. - - Pick an appropriate accuracy mode for your application. - - Use Libamath with different vector accuracy modes in practice. 
+ - Understand how accuracy is defined in Libamath + - Select an appropriate accuracy mode for your application + - Use Libamath with different vector accuracy modes in practice prerequisites: - An Arm computer running Linux with [Arm Performance Libraries](https://learn.arm.com/install-guides/armpl/) version 25.04 or newer installed. @@ -31,10 +31,6 @@ further_reading: title: ArmPL Libamath Documentation link: https://developer.arm.com/documentation/101004/2410/General-information/Arm-Performance-Libraries-math-functions type: documentation -# - resource: -# title: PLACEHOLDER BLOG -# link: PLACEHOLDER BLOG LINK -# type: blog - resource: title: ArmPL Installation Guide link: https://learn.arm.com/install-guides/armpl/ diff --git a/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/floating-point-rep.md b/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/floating-point-rep.md index d75c4839ce..57ba90d805 100644 --- a/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/floating-point-rep.md +++ b/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/floating-point-rep.md @@ -1,16 +1,22 @@ --- -title: Understanding floating-point representation +title: Floating-point representation weight: 2 ### FIXED, DO NOT MODIFY layout: learningpathall --- -Floating-point numbers are essential for representing real numbers in computing, but they come with limits on precision and range. This Learning Path explains how floating-point values are structured, how bitwise representation works, and what the IEEE-754 standard defines, including special values like NaN and subnormals. +Floating-point numbers are essential for representing real numbers in computing, but they come with limits on precision and range. 
+ +This Learning Path covers the following: + +* How floating-point values are structured +* How bitwise representation works +* The IEEE-754 standard definition, including special values such as NaN and subnormals ## What is a floating-point number? -Floating-point numbers are a finite and discrete approximation of the real numbers, allowing one to implement and compute functions in the continuous domain with an adequate (but limited) resolution. +Floating-point numbers are a finite, discrete approximation of real numbers. They allow functions in the continuous domain to be computed with adequate, but limited, resolution. A floating-point number is typically expressed as: @@ -19,17 +25,17 @@ A floating-point number is typically expressed as: ``` where: -* B is the base; -* e is the exponent; -* d.dddd...d is the mantissa (or significand); -* It is a *p*-bit word, where *p* represents the precision; -* and the +/- sign is usually stored separately. +* B is the base +* e is the exponent +* d.dddd...d is the mantissa (or significand) +* *p* is the number of bits used for precision +* the +/- sign is stored separately The precision of a floating-point format refers to the number of binary digits used to represent the mantissa. This is denoted by *p*, and a system with *p* bits of precision can distinguish between \( 2^p \) different fractional values. -If the leading digit is non-zero, then it is a normalized representation (also called a *normal number*). +If the leading digit is non-zero, the number is said to be normalized (also called a *normal number*). -{{% notice Example 1 %}} +{{% notice Example 1%}} Fixing `B = 2, p = 24` `0.1 = 1.10011001100110011001101 × 2^4` is a normalized representation of 0.1 @@ -38,11 +44,18 @@ Fixing `B = 2, p = 24` {{% /notice %}} -A floating-point number can have multiple non-normalized representations, but only one normalized form - assuming a fixed base and precision, and that the leading digit is strictly less than the base. 
+A floating-point number can have multiple non-normalized forms, but only one normalized representation for a given value - assuming a fixed base and precision, and that the leading digit is strictly less than the base. ## How precision and exponents define floating-point values -Given a base `B`, a precision `p`, a maximum exponent `emax`, and a minimum exponent `emin`, one can create the set of all the normalized values in this system. +Given: + +* a base `B` +* a precision `p` +* a maximum exponent `emax` +* a minimum exponent `emin` + +You can create the full set of representable normalized values. {{% notice Example 2 %}} `B = 2, p = 3, emax = 2, emin = -1` @@ -57,11 +70,11 @@ Given a base `B`, a precision `p`, a maximum exponent `emax`, and a minimum expo {{% /notice %}} -For any given integer n, numbers are evenly spaced between 2ⁿ and 2ⁿ⁺¹. However, the gap between them (also called a [ULP](/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp/), which is explained in more detail in the next section) grows as the exponent increases. So the spacing between floating-point numbers becomes larger as numbers themselves increase in magnitude. +For any exponent, n, numbers are evenly spaced between 2ⁿ and 2ⁿ⁺¹. However, the gap between them (also called a [ULP](/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp/), which is explained in more detail in the next section) increases with the magnitude of the exponent. ## Bitwise representation of floating-point numbers -Since there are \( B^p \) possible mantissas and `emax-emin+1` possible exponents, then `log2(B^p) + log2(emax-emin+1) + 1` (sign) bits are needed to represent a given Floating Point number in a system. +Since there are \( B^p \) possible mantissas and `emax-emin+1` possible exponents, then `log2(B^p) + log2(emax-emin+1) + 1` (sign) bits are needed to represent a given floating-point number in a system. In Example 2, 3+2+1=6 bits are needed. 
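This bit budget maps directly onto the IEEE-754 single-precision layout described below: 1 sign bit, 8 exponent bits, and 23 mantissa bits. A sketch that unpacks the three fields, using `memcpy` for the bit reinterpretation (the helper names are illustrative):

```c
#include <stdint.h>
#include <string.h>

// Reinterpret the bits of a float as a 32-bit unsigned integer.
static inline uint32_t float_bits(float x) {
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

// IEEE-754 single precision: 1 sign bit, 8 biased-exponent bits, 23 mantissa bits.
static inline uint32_t sign_bit(float x)   { return float_bits(x) >> 31; }
static inline uint32_t biased_exp(float x) { return (float_bits(x) >> 23) & 0xff; }
static inline uint32_t mantissa(float x)   { return float_bits(x) & 0x7fffffu; }
```

For example, 1.0f unpacks to sign 0, biased exponent 127, and mantissa 0, since 1.0 = (-1)^0 x 1.0 x 2^(127-127).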
@@ -90,7 +103,7 @@ x = (-1)^S x M x 2^(E-1) ## IEEE-754 single precision format -Single precision (also called float) is a 32-bit format defined by the [IEEE-754 Floating-Point Standard](https://ieeexplore.ieee.org/document/8766229) +Single precision (also called float) is a 32-bit format defined by the [IEEE-754 Floating-Point Standard](https://ieeexplore.ieee.org/document/8766229). In this format: @@ -131,11 +144,15 @@ x=(−1)^S x 0.M x 2^−126 From 95aba49732d7589def4f0b1f642391ab3b3b59cf Mon Sep 17 00:00:00 2001 From: Maddy Underwood Date: Wed, 2 Jul 2025 19:10:35 +0000 Subject: [PATCH 11/14] Updates --- .../multi-accuracy-libamath/multi-accuracy.md | 2 +- .../multi-accuracy-libamath/ulp-error.md | 10 +++++----- .../multi-accuracy-libamath/ulp.md | 17 ++++++++--------- 3 files changed, 14 insertions(+), 15 deletions(-) diff --git a/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/multi-accuracy.md b/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/multi-accuracy.md index a8c32f47c4..dc15bdf4c4 100644 --- a/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/multi-accuracy.md +++ b/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/multi-accuracy.md @@ -88,7 +88,7 @@ Avoid in control-flow-critical code or where errors might compound or affect con {{% notice Tip %}} -If your workload has mixed precision needs, you can *selectively call different accuracy modes* for different parts of your pipeline. Libamath lets you tailor precision where it matters — and boost performance where it doesn’t. +If your workload has mixed precision needs, you can *selectively call different accuracy modes* for different parts of your pipeline. Choose conservatively where correctness matters, and push for speed elsewhere. 
{{% /notice %}} diff --git a/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp-error.md b/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp-error.md index 1489d807b2..c2bdc6e22b 100644 --- a/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp-error.md +++ b/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp-error.md @@ -29,7 +29,7 @@ In the implementation used here, this quantity is held by a term called `tail`: ulp_err = | (got - want) / ULP(want) - tail | ``` -This term takes into account the error introduced by casting `want` from a higher precision to working precision. This contribution is given in terms of ULP distance: +This term compensates for the rounding error that occurs when the high-precision reference (`want_l`, a `double`) is cast down to a `float`. This contribution is given in terms of ULP distance: ``` tail = | (want_l - want) / ULP(want) | @@ -37,7 +37,7 @@ tail = | (want_l - want) / ULP(want) | ## A simplified version -Here is a simplified version of the ULP error. Use the same `ulp.h` header from the previous section. + Below is a practical implementation of the ULP error calculation based on the model above. Use the same `ulp.h` header from the previous section. 
Use a text editor to copy the code below into a new file called `ulp_error.h`: @@ -53,11 +53,11 @@ double ulp_error(float got, double want_l) { float want = (float) want_l; // Early exit for exact match - if (want_l == (double)want && got == want) { + if (want_l == (double) want && got == want) { return 0.0; } - int ulp_exp = ulpscale(want); + int ulp_exp = ulpscale(want); // Base-2 exponent for scaling ULP(want) // Fractional tail from float rounding double tail = scalbn(want_l - (double)want, -ulp_exp); @@ -113,4 +113,4 @@ The output should be: ULP error: 1.0 ``` -If you are interested in diving into the full implementation of the ulp error, you can consult the [tester](https://github.com/ARM-software/optimized-routines/tree/master/math/test) tool in [AOR](https://github.com/ARM-software/optimized-routines/tree/master), with particular focus to the [ulp.h](https://github.com/ARM-software/optimized-routines/blob/master/math/test/ulp.h) file. Note this tool also handles special cases and considers the effect of different rounding modes in the ULP error. \ No newline at end of file +If you are interested in diving into the full implementation of the ULP error, you can consult the [tester](https://github.com/ARM-software/optimized-routines/tree/master/math/test) tool in [AOR](https://github.com/ARM-software/optimized-routines/tree/master), with particular focus on the [ulp.h](https://github.com/ARM-software/optimized-routines/blob/master/math/test/ulp.h) file. Note this tool also handles special cases and considers the effect of different rounding modes in the ULP error.
\ No newline at end of file diff --git a/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp.md b/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp.md index c9db64e9b8..77f93c36d7 100644 --- a/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp.md +++ b/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp.md @@ -1,5 +1,5 @@ --- -title: Units in the Last Place (ULP) +title: Units in the last place (ULP) weight: 3 ### FIXED, DO NOT MODIFY @@ -8,7 +8,7 @@ layout: learningpathall ## What is ULP? -Units in the Last Place (ULP) is the distance between two adjacent floating-point numbers at a given value. It represents the smallest possible change in a number's representation at that magnitude. +Units in the last place (ULP) is the distance between two adjacent floating-point numbers at a given value. It represents the smallest possible change in a number's representation at that magnitude. ULP is a function of the number's exponent and can be calculated with the following expression: @@ -71,12 +71,12 @@ Note that for denormal numbers, the latter expression does not apply. In single precision as defined in IEEE-754, the smallest positive subnormal is: ``` -min_pos_denormal = 2 ^ -23 x 2 ^ -126 = 2^-149 +min_pos_denormal = 2⁻²³ × 2⁻¹²⁶ = 2⁻¹⁴⁹ ``` The second smallest is: ``` -second_min_pos_denormal = 2 ^ -22 x 2 ^ -126 = 2^-148 = 2*2^-149 +second_min_pos_denormal = 2⁻²² × 2⁻¹²⁶ = 2⁻¹⁴⁸ = 2 × 2⁻¹⁴⁹ ``` Thus, all denormal numbers are evenly spaced by `2^-149`. 
@@ -103,13 +103,12 @@ static inline uint32_t asuint(float x) { // Compute exponent of ULP spacing at x static inline int ulpscale(float x) { - //recover the biased exponent E + // Recover the biased exponent E int e = asuint(x) >> 23 & 0xff; if (e == 0) e++; // handle subnormals - // get exponent of the ULP - // e-p = E - 127 -23 + // Compute the ULP exponent: e - p = E - 127 - 23 return e - 127 - 23; } @@ -122,10 +121,10 @@ static float ulp(float x) { There are three key functions in this implementation: * the `asuint(x)` function reinterprets the bit pattern of a float as a 32-bit unsigned integer, allowing the extraction of specific bit fields such as the exponent. * the `ulpscale(x)` function returns the base-2 exponent of the ULP spacing at a given float value x, which is the result of `log2(ULP(x))`. The `e` variable in this function corresponds to the quantity E previously mentioned (the bitwise value of the exponent). -* the `scalbnf(m, n)` function (a standard function declared in math.h) efficiently evaluates `m x 2^n`. +* the `scalbnf(m, n)` function (a standard function declared in math.h) efficiently evaluates `m × 2^n`. -Below is an example which uses the `ulp()` function. +Here's an example program that calls `ulp()` to compute the spacing near a float value. Use a text editor to save the code below in a file named `ulp.c`. 
From e3e89d4216200bbf266ff6f73473a2332a7a6308 Mon Sep 17 00:00:00 2001 From: Maddy Underwood Date: Wed, 2 Jul 2025 19:28:44 +0000 Subject: [PATCH 12/14] Updates --- .../multi-accuracy-libamath/examples.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/examples.md b/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/examples.md index b622d0edae..38e82af0da 100644 --- a/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/examples.md +++ b/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/examples.md @@ -12,7 +12,7 @@ Here is an example invoking all accuracy modes of the Neon single precision exp Make sure you have [Arm Performance Libraries](https://learn.arm.com/install-guides/armpl/) installed. -Use a text editor save the code below in a file named `example.c`. +Use a text editor to save the code below in a file named `example.c`. ```C { line_numbers = "true" } #include @@ -40,7 +40,7 @@ int main(void) { printf("Libamath example:\n"); printf("-----------------------------------------------\n"); printf(" // Display worst-case ULP error in expf for each\n"); - printf(" // accuracy mode, along with approximate (`got`) and exact results (`want`)\n\n"); + printf(" // accuracy mode, along with approximate (\"got\") and exact results (\"want\")\n\n"); check_accuracy (armpl_vexpq_f32_u10, 0x1.ab312p+4, "armpl_vexpq_f32_u10(%a) delivers error under 1.0 ULP"); check_accuracy (armpl_vexpq_f32, 0x1.8163ccp+5, "armpl_vexpq_f32(%a) delivers error under 3.5 ULP"); @@ -89,5 +89,5 @@ armpl_vexpq_f32_umax(-0x1.5b7322p+6) delivers result with half correct bits ULP error = 1745.2120 ``` -The inputs used for each variant correspond to the worst case scenario known to date (ULP Error argmax). +The inputs used for each variant correspond to the worst-case scenario known to date (ULP error argmax).
This means that the ULP error should not be higher than the one demonstrated here, ensuring the results remain below the defined thresholds for each accuracy. \ No newline at end of file From fcf5e16eb7736e6bef330ede54c6d303916fe46d Mon Sep 17 00:00:00 2001 From: Maddy Underwood Date: Wed, 2 Jul 2025 19:36:12 +0000 Subject: [PATCH 13/14] Updates --- .../multi-accuracy-libamath/multi-accuracy.md | 2 +- .../multi-accuracy-libamath/ulp-error.md | 2 +- .../servers-and-cloud-computing/multi-accuracy-libamath/ulp.md | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/multi-accuracy.md b/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/multi-accuracy.md index dc15bdf4c4..33432d07df 100644 --- a/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/multi-accuracy.md +++ b/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/multi-accuracy.md @@ -93,7 +93,7 @@ If your workload has mixed precision needs, you can *selectively call different ## References -1. Higham, N. J. (2002). *Accuracy and Stability of Numerical Algorithms* (2nd ed.). SIAM. +1. Higham, N. J. (2002). *Accuracy and Stability of Numerical Algorithms* (2nd ed.), SIAM. 2. Texas Instruments. *Overflow Avoidance Techniques in Cascaded IIR Filter Implementations on the TMS320 DSPs*. Application Report SPRA509, 1999. 
https://www.ti.com/lit/pdf/spra509 diff --git a/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp-error.md b/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp-error.md index c2bdc6e22b..2230e99273 100644 --- a/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp-error.md +++ b/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp-error.md @@ -6,7 +6,7 @@ weight: 4 layout: learningpathall --- -## ULP error and accuracy +## Overview In the development of Libamath, a metric called ULP error is used to assess the accuracy of floating-point functions. ULP (Unit in the Last Place) measures the distance between two numbers, a reference (`want`) and an approximation (`got`), relative to how many floating-point steps (ULPs) separate them. diff --git a/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp.md b/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp.md index 77f93c36d7..28d22bdb64 100644 --- a/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp.md +++ b/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp.md @@ -16,7 +16,7 @@ ULP is a function of the number's exponent and can be calculated with the follow ULP(x) = nextafter(x, +inf) - x ``` -## ULP Example: Binary Model +## ULP example: binary model Building on the example from the previous section: From 30755bf6f94605933d70d703973aadb74ecb4ab1 Mon Sep 17 00:00:00 2001 From: Maddy Underwood Date: Wed, 2 Jul 2025 19:56:12 +0000 Subject: [PATCH 14/14] Tweaks --- .../multi-accuracy-libamath/floating-point-rep.md | 2 +- .../servers-and-cloud-computing/multi-accuracy-libamath/ulp.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/floating-point-rep.md 
b/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/floating-point-rep.md index 8613abd6bd..610aa1be31 100644 --- a/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/floating-point-rep.md +++ b/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/floating-point-rep.md @@ -6,7 +6,7 @@ weight: 2 layout: learningpathall --- -## Understanding the Floating-Point Number System and IEEE-754 Format +## Understanding the floating-point number system and IEEE-754 format Floating-point numbers are essential for representing real numbers in computing, but they come with limits on precision and range. diff --git a/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp.md b/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp.md index 28d22bdb64..1f0e77fde5 100644 --- a/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp.md +++ b/content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp.md @@ -46,7 +46,7 @@ Numbers with the same exponent have the same ULP. For normalized IEEE-754 floating-point numbers, a similar behavior is observed: the distance between two adjacent representable values — that is, ULP(x) — is a power of two that depends only on the exponent of x. -### Optimized Expression +### Optimized expression A faster, commonly used expression for ULP is: