ArmDeveloperEcosystem · jasonrandrews · Mar 5, 2025 · Mar 5, 2025
diff --git a/content/learning-paths/cross-platform/matrix/3-code-1.md b/content/learning-paths/cross-platform/matrix/3-code-1.md
@@ -23,12 +23,13 @@ In the Matrix processing library, you implement two types of checks:
 
 The idea here is to make the program fail in a noticeable way. Of course, in a real world application, the error should be caught and dealt with by the application, if it can. Error handling, and especially recovering from errors, can be a complex topic.
 
-At the top of file `include/Matrix/Matrix.h`, include `<cassert>` to get the C-style assertions declarations for checks in `Debug` mode only:
+At the top of file `include/Matrix/Matrix.h`, include `<cassert>` to get the C-style assertion declarations for checks in `Debug` mode only, as well as `<cstddef>`, which provides standard C declarations such as `size_t`:
 
 ```CPP
 #pragma once
 
 #include <cassert>
+#include <cstddef>
 
 namespace MatComp {
 ```
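A note on the `size_t` spelling: `<cstddef>` is guaranteed to declare `std::size_t`, while the C++ standard leaves it unspecified whether the unqualified `size_t` is also declared in the global namespace (mainstream toolchains do provide it). The following is a minimal sketch of the qualified spelling, where `numElements` is a hypothetical helper used for illustration only and is not part of the Matrix library:

```CPP
#include <cassert>
#include <cstddef>
#include <type_traits>

// std::size_t is, by definition, the type yielded by the sizeof operator.
static_assert(std::is_same<std::size_t, decltype(sizeof(int))>::value,
              "std::size_t is the result type of sizeof");

// Hypothetical helper (not from the Matrix library): the number of elements
// in a rows x cols matrix, using the spelling <cstddef> always provides.
inline std::size_t numElements(std::size_t rows, std::size_t cols) {
    return rows * cols;
}
```

Either spelling works on common compilers; the qualified one only removes the dependency on an implementation detail.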
@@ -44,7 +45,7 @@ const Version &getVersion();
 /// and the EXIT_FAILURE error code. It will also print the file name (\p
 /// fileName) and line number (\p lineNumber) that caused that application to
 /// exit.
-[[noreturn]] void die(const char *fileName, std::size_t lineNumber,
+[[noreturn]] void die(const char *fileName, size_t lineNumber,
                       const char *reason);
 
 ```
@@ -109,10 +110,11 @@ The Matrix data structure has the following private data members:
 Modern C++ offers constructs in the language to deal safely with memory; you will use `std::unique_ptr`, which guarantees that the Matrix class will be safe from a whole range of memory management errors.
 
 Add the following includes at the top of `include/Matrix/Matrix.h`, right under
-the '<cassert>' include:
+the `<cstddef>` include:
 
 ```CPP
 #include <cassert>
+#include <cstddef>
 #include <cstring>
 #include <initializer_list>
 #include <iostream>
@@ -229,12 +231,6 @@ TEST(Matrix, defaultConstruct) {
 TEST(Matrix, booleanConversion) {
     EXPECT_FALSE(Matrix<int8_t>());
     EXPECT_FALSE(Matrix<double>());
-
-    EXPECT_TRUE(Matrix<int8_t>(1, 1));
-    EXPECT_TRUE(Matrix<double>(1, 1));
-
-    EXPECT_TRUE(Matrix<int8_t>(1, 1, 1));
-    EXPECT_TRUE(Matrix<double>(1, 1, 2.0));
 }
 ```
 
@@ -320,9 +316,9 @@ The tests should still pass, check for yourself.
 The next step is to be able to construct valid matrices, so add this constructor to the public section of class `Matrix` in `include/Matrix/Matrix.h`:
 
 ```CPP
-    /// Construct a \p numRows x \p numColumns uninitialized Matrix
-    Matrix(size_t numRows, size_t numColumns)
-        : numRows(numRows), numColumns(numColumns), data() {
+    /// Construct a \p numRows x \p numCols uninitialized Matrix
+    Matrix(size_t numRows, size_t numCols)
+        : numRows(numRows), numColumns(numCols), data() {
         allocate(getNumElements());
     }
 ```
@@ -348,6 +344,17 @@ TEST(Matrix, uninitializedConstruct) {
 ```
 
 This constructs a valid `Matrix` (if it contains elements), and the `uninitializedConstruct` test checks that two valid matrices of different types and dimensions can be constructed.
+You should also update the `booleanConversion` test in `tests/Matrix.cpp` to check boolean conversion for valid matrices, so that it now looks like:
+
+```CPP
+TEST(Matrix, booleanConversion) {
+    EXPECT_FALSE(Matrix<int8_t>());
+    EXPECT_FALSE(Matrix<double>());
+
+    EXPECT_TRUE(Matrix<int8_t>(1, 1));
+    EXPECT_TRUE(Matrix<double>(1, 1));
+}
+```
 
 Compile and test again, all should pass:
 
@@ -374,6 +381,35 @@ ninja check
 [  PASSED  ] 4 tests.
 ```
 
+Another constructor that is missing is one that will create and initialize matrices to a known value. Let's add it to `Matrix` in `include/Matrix/Matrix.h`:
+
+```CPP
+    /// Construct a \p numRows x \p numCols Matrix with all elements
+    /// initialized to value \p val.
+    Matrix(size_t numRows, size_t numCols, Ty val) : Matrix(numRows, numCols) {
+        allocate(getNumElements());
+        for (size_t i = 0; i < getNumElements(); i++)
+            data[i] = val;
+    }
+```
+
+Add boolean conversion tests for this new constructor by modifying `booleanConversion` in `tests/Matrix.cpp` so it looks like:
+
+```CPP
+TEST(Matrix, booleanConversion) {
+    EXPECT_FALSE(Matrix<int8_t>());
+    EXPECT_FALSE(Matrix<double>());
+
+    EXPECT_TRUE(Matrix<int8_t>(1, 1));
+    EXPECT_TRUE(Matrix<double>(1, 1));
+
+    EXPECT_TRUE(Matrix<int8_t>(1, 1, 1));
+    EXPECT_TRUE(Matrix<double>(1, 1, 2.0));
+}
+```
+
+You should be getting the pattern now: each new feature or method comes with tests.
+
 The `Matrix` class is missing two important methods:
 - A *getter*, to read the matrix element at (row, col).
 - A *setter*, to modify the matrix element at (row, col).
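The constructors, the boolean conversion and a getter can be tried in isolation with a minimal standalone sketch. `MiniMatrix` is a hypothetical stand-in, not the learning path's `Matrix` class: the `explicit operator bool`, the `get` name and the direct use of `new` are assumptions made for illustration. Because the value constructor delegates to the two-argument constructor, the storage already exists when its body runs, so this sketch does not allocate a second time.

```CPP
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical stand-in for the Matrix class (illustration only).
template <typename Ty> class MiniMatrix {
  public:
    /// Construct an empty (invalid) matrix.
    MiniMatrix() : numRows(0), numColumns(0), data() {}

    /// Construct a numRows x numCols matrix with uninitialized elements.
    MiniMatrix(std::size_t numRows, std::size_t numCols)
        : numRows(numRows), numColumns(numCols),
          data(new Ty[numRows * numCols]) {}

    /// Construct a numRows x numCols matrix filled with val. The delegated
    /// constructor has already allocated the storage.
    MiniMatrix(std::size_t numRows, std::size_t numCols, Ty val)
        : MiniMatrix(numRows, numCols) {
        for (std::size_t i = 0; i < getNumElements(); i++)
            data[i] = val;
    }

    std::size_t getNumElements() const { return numRows * numColumns; }

    /// A matrix is considered valid when it contains elements (assumption).
    explicit operator bool() const { return getNumElements() != 0; }

    /// Read the element at (row, col); storage is row-major.
    const Ty &get(std::size_t row, std::size_t col) const {
        assert(row < numRows && col < numColumns && "index out of bounds");
        return data[row * numColumns + col];
    }

  private:
    std::size_t numRows;
    std::size_t numColumns;
    std::unique_ptr<Ty[]> data;
};
```

With this sketch, `MiniMatrix<int>()` converts to `false`, while `MiniMatrix<int> m(2, 3, 7)` converts to `true` and `m.get(1, 2)` reads back `7`, which mirrors what the `booleanConversion` test checks.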
@@ -397,20 +433,6 @@ Add them now in the public section of `Matrix` in `include/Matrix/Matrix.h`:
     }
 ```
 
-Another constructor that is missing is one that will create and initialize matrices to a known value. Let's add it to `Matrix` in `include/Matrix/Matrix.h`:
-
-```CPP
-    /// Construct a \p numRows x \p numColumns Matrix with all elements
-    /// initialized to value \p val.
-    Matrix(size_t numRows, size_t numCols, Ty val) : Matrix(numRows, numCols) {
-        allocate(getNumElements());
-        for (size_t i = 0; i < getNumElements(); i++)
-            data[i] = val;
-    }
-```
-
-You should be getting the pattern now.
-
 Add tests for those 3 methods in `tests/Matrix.cpp`:
 
 ```CPP
@@ -503,7 +525,7 @@ The C++ `std::initializer_list`  enables users to provide a list of literal
 values (in row major order) to use to initialize the matrix with:
 
 ```CPP
-    /// Construct a \p numRows x \p numColumns Matrix with elements
+    /// Construct a \p numRows x \p numCols Matrix with elements
     /// initialized from the values from \p il in row-major order.
     Matrix(size_t numRows, size_t numCols, std::initializer_list<Ty> il)
         : Matrix(numRows, numCols) {
@@ -862,7 +884,7 @@ ninja check
 [----------] 16 tests from Matrix (0 ms total)
 
 [----------] Global test environment tear-down
-[==========] 16 tests from 3 test suites ran. (0 ms total)
+[==========] 16 tests from 1 test suite ran. (0 ms total)
 [  PASSED  ] 16 tests.
 ```
 
@@ -941,6 +963,7 @@ Add these to the public section of `Matrix` in `include/Matrix/Matrix.h`:
                 return false;
         return true;
     }
+
     /// Returns true iff matrices do not compare equal.
     bool operator!=(const Matrix &rhs) const { return !(*this == rhs); }
 ```
@@ -1069,4 +1092,4 @@ The compiler also catch a large number of type or misuse errors. With this core
 
 You can refer to this chapter source code in
 `code-examples/learning-paths/cross-platform/matrix/chapter-3` in the archive that
-you have downloaded earlier.
+you have downloaded earlier.
diff --git a/content/learning-paths/cross-platform/matrix/4-code-2.md b/content/learning-paths/cross-platform/matrix/4-code-2.md
@@ -92,7 +92,7 @@ makes them suitable for using in bigger algorithms and is a common pattern used
 
 One point worth mentioning is related to the `Abs` class: depending on the type
 used at instantiation, the compiler selects an optimized implementation for
-unsigned types, and there is no need to compute the absolute value of an always
+unsigned types, as there is no need to compute the absolute value of an always
 positive value. This optimization is transparent to users.
 
 Those operators are marked as `constexpr` so that the compiler can optimize the
@@ -203,7 +203,7 @@ type traits (from `<numeric_limit>`) such as `max` to get the maximum value
 representable for a given type.
 
 As those tests have been added to a new source file, it needs to be known to the
-build system, so add it now to the matrix-test target in `CMakeLists.txt`:
+build system, so add it now to the `matrix-test` target in `CMakeLists.txt`:
 
 ```TXT
 add_executable(matrix-test tests/main.cpp
@@ -292,7 +292,7 @@ First, create a `applyEltWiseUnaryOp` helper routine in the public section of
  operation as follows:
 
 ```CPP
-    /// Apply element wise unary scalar operator \p uOp to each element.
+    /// Apply element wise unary scalar operator \p op to each element.
     template <template <typename> class uOp>
     Matrix &applyEltWiseUnaryOp(const uOp<Ty> &op) {
         static_assert(std::is_base_of<unaryOperation, uOp<Ty>>::value,
@@ -505,7 +505,7 @@ ninja check
 [----------] 4 tests from unaryOperator (0 ms total)
 
 [----------] Global test environment tear-down
-[==========] 27 tests from 3 test suites ran. (0 ms total)
+[==========] 27 tests from 2 test suites ran. (0 ms total)
 [  PASSED  ] 27 tests.
 ```
 
@@ -692,7 +692,7 @@ Add this `applyEltWiseBinaryOp` helper routine to the public section of `Matrix`
 in `include/Matrix/Matrix.h`:
 
 ```CPP
-    /// Apply element wise binary scalar operator \p bOp to each element.
+    /// Apply element wise binary scalar operator \p op to each element.
     template <template <typename> class bOp>
     Matrix &applyEltWiseBinaryOp(const bOp<Ty> &op, const Matrix &rhs) {
         static_assert(std::is_base_of<binaryOperation, bOp<Ty>>::value,
@@ -1116,6 +1116,27 @@ content.
 - Resize: to be able to dynamically change a matrix dimensions.
 - Extract: to be able to extract part of a matrix.
 
+### Optimization
+
+The code written so far is relatively high level and allows the compiler to
+perform a large number of optimizations, from propagating constants to
+unrolling loops, to name but a few of the most basic ones.
+
+The `applyEltWiseUnaryOp` and `applyEltWiseBinaryOp` helper routines
+from the Matrix library process one element at a time. The compiler
+may make use of Arm-specific SIMD (Single Instruction, Multiple Data)
+instructions to process several elements at a time with one instruction. This
+optimization is called vectorization. It can either be done automatically
+by the compiler (this is called *autovectorization*) or manually by the developer with the use of *intrinsic functions*.
+You can learn more about the compiler's autovectorization capabilities with the
+[Learn about Autovectorization](/learning-paths/cross-platform/loop-reflowing/)
+learning path and about other vectorization tricks with the
+[Optimize SIMD code with vectorization-friendly data layout](/learning-paths/cross-platform/vectorization-friendly-data-layout/)
+learning path.
+
+You can also learn how to
+[Accelerate Matrix Multiplication Performance with SME2](/learning-paths/cross-platform/multiplying-matrices-with-sme2/).
+
 ## What have you achieved so far?
 
 At this stage, the code structure looks like:
@@ -1153,4 +1174,4 @@ You can continue to add more functions, and more tests.
 
 You can refer to this chapter source code in
 `code-examples/learning-paths/cross-platform/matrix/chapter-4` in the archive that
-you have downloaded earlier.
+you have downloaded earlier.
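To illustrate the Optimization section added to `4-code-2.md`, here is a minimal sketch of the kind of element-wise loop a compiler can often autovectorize. `addInPlace` is a hypothetical free function, not one of the Matrix library's helper routines, and whether the loop is actually vectorized depends on the compiler, its version and the optimization level.

```CPP
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (illustration only): adds src[i] to dst[i] for n
// elements. A unit-stride loop with independent iterations and no early
// exit is a typical autovectorization candidate.
template <typename Ty>
void addInPlace(Ty *dst, const Ty *src, std::size_t n) {
    for (std::size_t i = 0; i < n; i++)
        dst[i] += src[i];
}
```

For example, after `std::vector<int> a{1, 2, 3}, b{10, 20, 30};` and `addInPlace(a.data(), b.data(), a.size());`, `a` holds `{11, 22, 33}`. To see whether the loop was vectorized, GCC can report it with `-fopt-info-vec` and Clang with `-Rpass=loop-vectorize`, with optimizations enabled in both cases.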