diff --git a/content/learning-paths/servers-and-cloud-computing/cplusplus_compilers_flags/1.md b/content/learning-paths/servers-and-cloud-computing/cplusplus_compilers_flags/1.md index 3c3d1a3a45..c35d95d6ad 100644 --- a/content/learning-paths/servers-and-cloud-computing/cplusplus_compilers_flags/1.md +++ b/content/learning-paths/servers-and-cloud-computing/cplusplus_compilers_flags/1.md @@ -1,20 +1,20 @@ --- -title: Basics of Compilers +title: Compiler basics weight: 2 ### FIXED, DO NOT MODIFY layout: learningpathall --- -## Introduction to C++ and Compilers +## Introduction to C++ and compilers -The C++ language gives the programmer the freedom to be expressive in the way they write code - allowing low-level manipulation of memory and data structures. Compared to managed languages, such as Java, C++ source code is generally less portable, requiring recompilation to the target Arm architecture. In the context of optimizing C++ workloads on Arm, significant performance improvements can be achieved without modifying the source code, simply by using the compiler correctly. +The C++ language gives you the freedom to be expressive in the way you write code - allowing low-level manipulation of memory and data structures. Compared to managed languages, such as Java, C++ source code is generally less portable, requiring recompilation to the target Arm architecture. In the context of optimizing C++ workloads on Arm, significant performance improvements can be achieved without modifying the source code, simply by using the compiler correctly. -Writing performant C++ code is a topic in itself and out of scope for this learning path. Instead we will focus on how to effectively use the compiler to target Arm instances in a cloud environment. +Writing performant C++ code is a complex topic, but you can learn how to effectively use the compiler to target the Arm architecture for a Linux application. -## Purpose of a Compiler +## What is the purpose of a compiler? 
-The g++ compiler is part of the GNU Compiler Collection (GCC), which is a set of compilers for various programming languages, including C++. The primary objective of the g++ compiler is to translate C++ source code into machine code that can be executed by a computer. This process involves several high-level stages: +The G++ compiler is part of the GNU Compiler Collection (GCC), which is a set of compilers for various programming languages, including C++. The primary objective of the g++ compiler is to translate C++ source code into machine code that can be executed by a computer. This process involves several high-level stages: - Preprocessing: In this initial stage, the preprocessor handles directives that start with a # symbol, such as `#include`, `#define`, and `#if`. It expands included header files, replaces macros, and processes conditional compilation statements. @@ -24,19 +24,25 @@ The g++ compiler is part of the GNU Compiler Collection (GCC), which is a set of - Linking: The final stage involves linking the object code with necessary libraries and other object files. The linker merges multiple object files and libraries, resolves external references, allocates memory addresses for functions and variables, and generates an executable file that can be run on the target platform. -An interesting fact about the g++ compiler is that it is designed to optimize both the performance and the size of the generated code. The compiler performs various optimizations based on the knowledge it has of the program, and it can be configured to prioritize reducing the size of the generated executable. +An interesting fact about the GNU compiler is that it is designed to optimize both the performance and the size of the generated code. The compiler performs various optimizations based on the knowledge it has of the program, and it can be configured to prioritize reducing the size of the generated executable. 
+### Compiler versions -### Compiler Versioning +Two popular C++ compilers are the GNU Compiler Collection (GCC) and LLVM - both of which are open-source compilers and have contributions from Arm engineers to support the latest architectures. Proprietary or vendor-specific compilers, such as `nvcc` for compiling for NVIDIA GPUs, are often based on these open-source compilers. Alternative proprietary compilers are often designed for specific use cases. For example, safety-critical applications may need to comply with various ISO standards, which also include the compiler. The functional safety [Arm Compiler for Embedded](https://developer.arm.com/Tools%20and%20Software/Arm%20Compiler%20for%20Embedded%20FuSa) is an example of such a compiler. -Two popular compilers of C++ are the GNU Compiler Collection (GCC) and LLVM - both of which are open-source compilers and have contributions from Arm engineers to support the latest architectures. Proprietary or vendor-specific compilers, such as `nvcc` for compiling for NVIDIA GPUs, are often based on these open-source compilers. Alternative proprietary compilers are often designed for specific use cases. For example, safety-critical applications may need to comply with various ISO standards, which also include the compiler. The functional safety [Arm Compiler for Embedded](https://developer.arm.com/Tools%20and%20Software/Arm%20Compiler%20for%20Embedded%20FuSa) is such an example of a C/C++ compiler. +If you are an application developer who is not working in the safety qualification domain, you can use the open-source GCC/G++ compiler. -Most application developers are not in this safety qualification domain so we will be using the open-source GCC/G++ compiler for this learning path. +There are multiple Linux distributions available to choose from. Each Linux distribution has a default compiler. -There are multiple Linux distribtions available to choose from. 
Each Linux distribution and operating system has a default compiler. For example after installing the default g++ on an `r8g` AWS instance, the default g++ compiler as of January 2025 is below. +Print the version information for your compiler: -``` output +```bash g++ --version +``` + +For example, after installing `g++` on Ubuntu 24.04, the default compiler as of January 2025 is shown below. + +```output g++ (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 Copyright (C) 2023 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO @@ -59,9 +65,9 @@ Red Hat EL8 | 8*, 9, 10 | 10 SUSE Linux ES15 | 7*, 9, 10 | 7 -The biggest and most simple performance gain can be achieved by using the most recent compiler available. The most recent optimisations and support will be available through the latest compiler. +The easiest way to achieve a performance gain is to use the most recent compiler available, as it includes the latest optimizations and hardware support. -Looking at the g++ documentation as an example, the most recent version of GCC available at the time of writing, version 14.2, has the following support and optimisations listed on their website [change note](https://gcc.gnu.org/gcc-14/changes.html). +Looking at the G++ documentation as an example, the most recent version of GCC, version 14.2, has the following support and optimizations listed in the [release notes](https://gcc.gnu.org/gcc-14/changes.html). ```output A number of new CPUs are supported through the -mcpu and -mtune options (GCC identifiers in parentheses). @@ -70,14 +76,13 @@ A number of new CPUs are supported through the -mcpu and -mtune options (GCC ide - Arm Cortex-A720 (cortex-a720). - Arm Cortex-X4 (cortex-x4). - Microsoft Cobalt-100 (cobalt-100). -... ``` -Sufficient due diligence should be taken when updating your C++ compiler because the process may reveal bugs in your source code. 
These bugs are often undefined behaviour caused by not adhering to the C++ standard. It is rare that the compiler itself will introduced a bug. However, in such events known bugs are made publicly available in the compiler documentation. +Sufficient due diligence should be taken when updating your C++ compiler because the process may reveal bugs in your source code. These bugs are often undefined behavior caused by not adhering to the C++ standard. It is rare that the compiler itself will introduce a bug. However, in such events, known bugs are made publicly available in the compiler documentation. -## Basic g++ Optimisation Levels +## Basic G++ optimization levels -Using the g++ compiler as an example, the most course-grained dial you can adjust is the optimisation level, denoted with `-O`. This adjusts a variety of lower-level optimsation flags at the expense of increased computation time, memory use and debuggability. When aggresive optimisation is used, the optimised binary may not show expected behaviour when hooked up to a debugger such as `gdb`. This is because the generated code may not match the original source code or program order, for example from loop unrolling and vectorisation. +Using the G++ compiler as an example, the most coarse-grained dial you can adjust is the optimization level, denoted with `-O`. This adjusts a variety of lower-level optimization flags at the expense of increased computation time, memory use, and debuggability. When aggressive optimization is used, the optimized binary may not show expected behavior when hooked up to a debugger such as `gdb`. This is because the generated code may not match the original source code or program order, for example from loop unrolling and vectorization. A few of the most common optimization levels are in the table below. @@ -90,4 +95,4 @@ A few of the most common optimization levels are in the table below. | `-Os` | Optimizes code size, reducing the overall binary size. 
| | `-Ofast` | Enables optimizations that may not strictly adhere to standard compliance. | - Please refer to your compiler documentation for full details on the optimisation level, for example [GCC](https://gcc.gnu.org/onlinedocs/gcc-14.2.0/gcc/Optimize-Options.html). \ No newline at end of file + Please refer to your compiler documentation for full details on the optimization level, for example you can review the G++ [Options That Control Optimization](https://gcc.gnu.org/onlinedocs/gcc-14.2.0/gcc/Optimize-Options.html). \ No newline at end of file diff --git a/content/learning-paths/servers-and-cloud-computing/cplusplus_compilers_flags/2.md b/content/learning-paths/servers-and-cloud-computing/cplusplus_compilers_flags/2.md index 8705b65f79..1eb8eca389 100644 --- a/content/learning-paths/servers-and-cloud-computing/cplusplus_compilers_flags/2.md +++ b/content/learning-paths/servers-and-cloud-computing/cplusplus_compilers_flags/2.md @@ -1,22 +1,26 @@ --- -title: Setup Your Environment +title: Set up your environment weight: 3 ### FIXED, DO NOT MODIFY layout: learningpathall --- -If you are new to cloud computing, please refer to our learning path on [Getting started with Servers and Cloud Computing](https://learn.arm.com/learning-paths/servers-and-cloud-computing/intro/). +If you are new to cloud computing, please refer to [Getting started with Servers and Cloud Computing](https://learn.arm.com/learning-paths/servers-and-cloud-computing/intro/). It provides an introduction to the Arm servers available from various cloud service providers. -## Connect to an AWS Arm-based Instance +## Connect to an AWS Arm-based instance -In this example we will be building and running our C++ application on an AWS Graviton 4 (`r8g.xlarge`) instance running Ubuntu 24.04 LTS. Once connected run the following commands to confirm the operating system and archiecture version. 
+In this example you will build and run a C++ application on an AWS Graviton 4 (`r8g.xlarge`) instance running Ubuntu 24.04 LTS. + +Create the AWS instance using your AWS account. Connect to the instance using SSH or AWS Session Manager so you can enter shell commands. + +Once connected, run the following commands to confirm the operating system and architecture. ```bash cat /etc/*lsb* ``` -You will see an output such as the following: +You see output similar to: ```output DISTRIB_ID=Ubuntu @@ -25,58 +29,65 @@ DISTRIB_CODENAME=noble DISTRIB_DESCRIPTION="Ubuntu 24.04.1 LTS" ``` -Next, we will confirm we are using a 64-bit Arm-based system using the following command +Next, confirm you are using a 64-bit Arm-based system with the following command: ```bash uname -m ``` -You will see the following output. +You see the following output: ```output aarch64 ``` -## Enable Environment modules +## Enable environment modules + +Environment modules is a tool to quickly modify your shell configuration and environment variables. For this activity, it allows you to quickly switch between different compiler versions to demonstrate potential improvements. -Environment modules are a tool to quickly modify your shell configuration and environment variables. For this learning path, it allows us to quickly switch between different compiler versions to demonstrate potential improvements. +First, you need to install the environment modules package. -Install Environment Modules +In your terminal, run the following command: - First, you need to install the environment modules package. Open your terminal and run the following command: - ```bash - sudo apt update - sudo apt install environment-modules - ``` +```bash +sudo apt update +sudo apt install environment-modules +``` -Load environment modules after the package is installed. 
+Load environment modules after the package is installed: ```bash sudo chmod 755 /usr/share/modules/init/bash source /usr/share/modules/init/bash ``` + -Reload your shell configuration. +Reload your shell configuration: ```bash source ~/.bashrc ``` -Install various compiler version on your Ubuntu system. For this example we will install version 9 of the gcc/g++ compiler to demonstrate potential improvements your application could achieve. +Install multiple compiler versions on your Ubuntu system. For this example you can install GCC version 9 to demonstrate potential improvements your application could achieve. + +Install GCC version 9: ```bash sudo add-apt-repository ppa:ubuntu-toolchain-r/test sudo apt update -sudo apt install gcc-9 g++-9 +sudo apt install gcc-9 g++-9 -y ``` Create a module file for each compiler installed. ```bash mkdir -p ~/modules/gcc -nano ~/modules/gcc/9 ``` -Copy and paste the text below into the nano text editor and save the file -```ouput + +Use a text editor to create the file `~/modules/gcc/9`. + +Copy and paste the text below into the file and save it. + +```console #%Module1.0 prepend-path PATH /usr/bin/gcc-9 prepend-path PATH /usr/bin/g++-9 diff --git a/content/learning-paths/servers-and-cloud-computing/cplusplus_compilers_flags/3.md b/content/learning-paths/servers-and-cloud-computing/cplusplus_compilers_flags/3.md index 5920fc7d50..2c28d2f616 100644 --- a/content/learning-paths/servers-and-cloud-computing/cplusplus_compilers_flags/3.md +++ b/content/learning-paths/servers-and-cloud-computing/cplusplus_compilers_flags/3.md @@ -1,32 +1,47 @@ --- -title: Finding Supported Neoverse Features -weight: 3 +title: Find specific Neoverse features +weight: 4 ### FIXED, DO NOT MODIFY layout: learningpathall --- -## Identify the Neoverse Version +You may want to find out which Neoverse processor a cloud instance uses. -To understand which Neoverse version a cloud instance uses check the [Arm partner webpage](https://www.arm.com/partners/aws). 
+You can learn the history of each cloud service provider, but as time progresses it becomes more complex to summarize. -Alternatively, if you already have access to the instance, run the `lscpu` command and observe the underlying Neoverse Architecture under the `Model Name` row. +For example, in 2019 AWS announced Graviton2 processors. The Graviton2 instance types include M6g, C6g, R6g, and T4g. AWS advertises 40% better price performance over the same generation of x86 instances. Graviton2 instances include up to 64 vCPUs. Graviton2 uses Arm Neoverse N1 cores. -```output +Graviton3 was announced in 2021 and instance types include M7g, C7g, R7g. Graviton3 offers up to 2x better floating-point performance, up to 2x faster crypto performance, and up to 3x better ML performance compared to Graviton2. Graviton3 uses Arm Neoverse V1 cores. + +In 2023, AWS announced Graviton4, based on Neoverse V2 cores. Graviton4 increases core count to 96 and will be first available in the R8g instance type. Graviton4 provides 30% better compute performance, 50% more cores, and 75% more memory bandwidth than Graviton3. + +There are more than 150 instance types with Graviton processors. + +Alternatively, if you have access to the instance, you can run the `lscpu` command and observe the underlying Neoverse processor under the `Model name` row. + +For example, on the `r8g.xlarge` instance run: + +```bash lscpu | grep -i model +``` + +The output is: + +```output Model name: Neoverse-V2 Model: 1 ``` -Here you can confirm the AWS`r8g.xlarge` instance, is based on the Neoverse-V2 Arm IP. We will use this instance for the remainder of this learning path. -## Understand Supported CPU Features +You can confirm the AWS `r8g.xlarge` instance is based on the Neoverse V2 processor. Use this instance for the remainder of the Learning Path. -Next, to identify the CPU extensions supported by this architecture at runtime we can observe the Linux hardware capabilities (HWCAP) vector. 
The C++ source code below that reads a specific vector that contains the information. -Copy and paste the c program into a file named, `hw_cap.c`. +## Understand supported CPU features -```c +To identify Arm architecture features at runtime in a C program, you can use the Linux hardware capabilities (HWCAP) vector. The source code below reads a specific vector that contains the information. +Use a text editor to copy and paste the C program below into a file named `hw_cap.c`. +```c #include <stdio.h> #include <sys/auxv.h> #include <asm/hwcap.h> @@ -71,14 +86,14 @@ int main() ``` -Compile and run with the command below. +Compile and run the program with the commands: ```bash gcc hw_cap.c -o hw_cap ./hw_cap ``` -On Graviton 4, I the output below confirms the scalable vector extensions (SVE) are available. +The output below confirms scalable vector extensions (SVE) are available. ```output AES instructions are available @@ -91,9 +106,11 @@ Scalable Vector Extension (SVE) instructions are available For the latest list of all hardware capabilities available for a specific linux kernel version, refer to the `arch/arm/include/uapi/asm/hwcap.h` header file in the Linux Kernel source code. -Further, knowing the width of SVE (Scalable Vector Extension) can be useful for optimizing software performance, as it allows developers to tailor their code to fully utilize the available vector processing capabilities of the hardware. Copy the following C code into a file named `sve_width.c`. +Additionally, knowing the SVE (Scalable Vector Extension) vector width is useful for optimizing software performance. + +Use a text editor to copy and paste the following C code into a file named `sve_width.c`. -```c +```c #include <stdio.h> #include <arm_sve.h> @@ -104,36 +121,43 @@ int main() { } ``` -Compile with the following command. 
+Compile and run the program with the following commands: ```bash g++ sve_width.c -o sve_width -mcpu=neoverse-v2 +./sve_width ``` -This shows that the Neoverse-V2 based Graviton 4 instance has a SVE width of 8 bytes (128 bits). +The output shows that the Neoverse V2 based Graviton 4 instance has an SVE width of 16 bytes (128 bits). ```output SVE vector length: 16 bytes ``` -## Supported Compiler Features +## Supported compiler features -Fortunately, the g++ compiler will automatically identify the host systems capability. The `-###` argument can be used to show the full options used when compiling. +Fortunately, the G++ compiler automatically identifies the host system's capabilities. The `-###` argument can be used to show the full options used when compiling. -If the host is the same platform you are compiling for, you can observe which CPUs are potential targets for your command with the following g++ command. +You can observe which processors are potential targets for compiling your code using the following G++ command: -```output +```bash g++ -E -mcpu=help -xc /dev/null -cc1: note: valid arguments are: cortex-a34 cortex-a35 cortex-a53 cortex-a57 cortex-a72 cortex-a73 thunderx thunderxt88p1 thunderxt88 octeontx octeontx81 octeontx83 thunderxt81 thunderxt83 ampere1 ampere1a emag xgene1 falkor qdf24xx exynos-m1 phecda thunderx2t99p1 vulcan thunderx2t99 cortex-a55 cortex-a75 cortex-a76 cortex-a76ae cortex-a77 cortex-a78 cortex-a78ae cortex-a78c cortex-a65 cortex-a65ae cortex-x1 cortex-x1c **neoverse-n1** ares neoverse-e1 octeontx2 octeontx2t98 octeontx2t96 octeontx2t93 octeontx2f95 octeontx2f95n octeontx2f95mm a64fx tsv110 thunderx3t110 neoverse-v1 zeus neoverse-512tvb saphira cortex-a57.cortex-a53 cortex-a72.cortex-a53 cortex-a73.cortex-a35 cortex-a73.cortex-a53 cortex-a75.cortex-a55 cortex-a76.cortex-a55 cortex-r82 cortex-a510 cortex-a710 cortex-a715 cortex-x2 cortex-x3 neoverse-n2 cobalt-100 neoverse-v2 grace demeter generic ``` -Comparing to when using `g++9` we
can see there are fewer CPU targets to optimise for as recently released CPUs are omitted, for example the Neoverse V2. +The output is: -``` -g++-9 -E -mcpu=help -xc /dev/null -cc1: note: valid arguments are: cortex-a35 cortex-a53 cortex-a57 cortex-a72 cortex-a73 thunderx thunderxt88p1 thunderxt88 octeontx octeontx81 octeontx83 thunderxt81 thunderxt83 emag xgene1 falkor qdf24xx exynos-m1 phecda thunderx2t99p1 vulcan thunderx2t99 cortex-a55 cortex-a75 cortex-a76 ares neoverse-n1 neoverse-e1 a64fx tsv110 zeus neoverse-v1 neoverse-512tvb saphira neoverse-n2 cortex-a57.cortex-a53 cortex-a72.cortex-a53 cortex-a73.cortex-a35 cortex-a73.cortex-a53 cortex-a75.cortex-a55 cortex-a76.cortex-a55 generic +```output +cc1: note: valid arguments are: cortex-a34 cortex-a35 cortex-a53 cortex-a57 cortex-a72 cortex-a73 thunderx thunderxt88p1 thunderxt88 octeontx octeontx81 octeontx83 thunderxt81 thunderxt83 ampere1 ampere1a emag xgene1 falkor qdf24xx exynos-m1 phecda thunderx2t99p1 vulcan thunderx2t99 cortex-a55 cortex-a75 cortex-a76 cortex-a76ae cortex-a77 cortex-a78 cortex-a78ae cortex-a78c cortex-a65 cortex-a65ae cortex-x1 cortex-x1c neoverse-n1 ares neoverse-e1 octeontx2 octeontx2t98 octeontx2t96 octeontx2t93 octeontx2f95 octeontx2f95n octeontx2f95mm a64fx tsv110 thunderx3t110 neoverse-v1 zeus neoverse-512tvb saphira cortex-a57.cortex-a53 cortex-a72.cortex-a53 cortex-a73.cortex-a35 cortex-a73.cortex-a53 cortex-a75.cortex-a55 cortex-a76.cortex-a55 cortex-r82 cortex-a510 cortex-a710 cortex-a715 cortex-x2 cortex-x3 neoverse-n2 cobalt-100 neoverse-v2 grace demeter generic ``` +Compare the same command using `g++-9` to see that there are fewer CPU targets to optimize for because recent CPUs are not yet included, for example the Neoverse V2. 
```bash g++-9 -E -mcpu=help -xc /dev/null ``` The output from version 9 is: ```output cc1: note: valid arguments are: cortex-a35 cortex-a53 cortex-a57 cortex-a72 cortex-a73 thunderx thunderxt88p1 thunderxt88 octeontx octeontx81 octeontx83 thunderxt81 thunderxt83 emag xgene1 falkor qdf24xx exynos-m1 phecda thunderx2t99p1 vulcan thunderx2t99 cortex-a55 cortex-a75 cortex-a76 ares neoverse-n1 neoverse-e1 a64fx tsv110 zeus neoverse-v1 neoverse-512tvb saphira neoverse-n2 cortex-a57.cortex-a53 cortex-a72.cortex-a53 cortex-a73.cortex-a35 cortex-a73.cortex-a53 cortex-a75.cortex-a55 cortex-a76.cortex-a55 generic ``` diff --git a/content/learning-paths/servers-and-cloud-computing/cplusplus_compilers_flags/4.md b/content/learning-paths/servers-and-cloud-computing/cplusplus_compilers_flags/4.md index 5054dbdc52..3bd01f103c 100644 --- a/content/learning-paths/servers-and-cloud-computing/cplusplus_compilers_flags/4.md +++ b/content/learning-paths/servers-and-cloud-computing/cplusplus_compilers_flags/4.md @@ -1,20 +1,26 @@ --- -title: Source Code Example +title: Try an example application weight: 5 ### FIXED, DO NOT MODIFY layout: learningpathall --- -## Defining the goal +## Understand your goal -If you intend your application to be portable across a variety of Arm architecture versions, selecting a target architecture with, `-march=` with a the value mapped to the lowest Arm architecture in your deployment fleet. The is enabled by the backwards compatibility of the Arm architecture. If running your C++ application in a memory constrained environment, for example in a containerised environment, you may wish to consider optimising for size. +If you intend for your application to be portable across a variety of Arm servers, you should select a target architecture using `-march=` with a value matching the lowest Arm architecture among the set of systems you plan to use. This is enabled by the backwards compatibility of the Arm architecture. 
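A minimal sketch of the two approaches is shown below. The trivial `main.cpp` is a stand-in for a real application, the flag values are illustrative, and the commands assume an AArch64 build of G++:

```bash
# Trivial stand-in for a real application
cat > main.cpp << 'EOF'
int main() { return 0; }
EOF

# These Arm-specific flags are only accepted by an AArch64 build of G++
if [ "$(uname -m)" = "aarch64" ]; then
    # Portable baseline: runs on any Armv8-A server in the fleet
    g++ -O2 -march=armv8-a main.cpp -o app_portable

    # Tuned for one specific processor, here Neoverse V2 (for example, Graviton 4)
    g++ -O2 -mcpu=neoverse-v2 main.cpp -o app_native
fi
```

Both binaries are built from identical source; only the target baseline differs, which is the trade-off between portability and per-processor tuning.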
-If you're building to be performant on a specific CPU, as in our case we are building to run natively on an AWS Graviton 4 instance (Arm Neoverse V2), we recommend specifying the system using the `-mcpu` flag. +If you are running your C++ application in a memory-constrained environment, such as a container, you should optimize for size. -## Vectorizable Loop +If you're building an application to get the highest performance on a specific processor, such as AWS Graviton 4 (Arm Neoverse V2), you should consider specifying the system using the `-mcpu` flag. -Copy and paste the following C++ snippet into a file called `vectorizable_loop.cpp`. The naive snippet below initialiases a vector of 1 million elements and doubles each element, storing the result in the same vector. This is repeated 5 times to caculate the average runtime. This naive loop with be autovectorized by the compiler. +While these are general guidelines, you should experiment with the various optimization settings. + +## What is a vectorizable loop? + +Use an editor to copy and paste the C++ code below into a file named `vectorizable_loop.cpp`. + +The code initializes a vector of 1 million elements, doubles each element, and stores the result in the same vector. This is repeated 5 times to calculate the average runtime. ```c++ #include @@ -52,12 +58,13 @@ int main() { return 0; } - ``` -## Using different versions of the g++ compiler +## Use different versions of the G++ compiler -As a trivial example we will compile `vectorizable_loop.cpp` with the same arguments. +Compare compiler versions by building `vectorizable_loop.cpp` with the same arguments but different compiler versions. 
+Run the commands below to use version 13 and then version 9 on the same code: ```bash g++ vectorizable_loop.cpp -o vectorizable_loop_gcc_13 @@ -65,7 +72,10 @@ g++-9 vectorizable_loop.cpp -o vectorizable_loop_gcc_9 ./vectorizable_loop_gcc_13 ./vectorizable_loop_gcc_9 ``` -In this naive and trivial example we observe ~19% speed improvement moving from version 9 to version 13. In reality, large code bases with multiple files are more likely to observe runtime and memory improvements. + +In this simple example you observe about 20% speed improvement moving from version 9 to version 13. + +The output is shown below: ```output // gcc v.13 @@ -87,7 +97,13 @@ Elapsed time for iteration 5: 0.362911 seconds Average elapsed time: 0.362615 seconds ``` -## Targeting Performance +You see that newer compiler versions can impact performance. + +## Target performance + +Another way to observe compiler impact is to change the optimization level. + +Compile the application with three different optimization levels: ```bash g++ -O1 vectorizable_loop.cpp -o level_1 @@ -97,36 +113,50 @@ g++ -O3 vectorizable_loop.cpp -o level_3 Running the 3 output binaries we observe a significant elapsed time improvement with minimal change in file size and compile time. Please note that larger code bases may see larger output binary sizes and compilation time. -```output +```bash ./level_1 -Average elapsed time: 0.0526484 seconds ./level_2 -Average elapsed time: 0.0420332 seconds ./level_3 +``` + +The output from each executable is below: + +```output +Average elapsed time: 0.0526484 seconds +Average elapsed time: 0.0420332 seconds Average elapsed time: 0.0155661 seconds ``` -## Understanding what optimisations were used +You see that optimization level impacts performance. -Naturally, the next question is to understand which part of your source code was optimised. Full optimization reports generated by compilers like GCC provide a detailed tree of reports through various stages of the optimization process. 
These reports can be overwhelming due to the sheer volume of information they contain, covering every aspect of the code's transformation and optimization. +## Understanding optimizations -For a more manageable overview, you can enable basic optimization reports using specific arguments such as -fopt-info-vec, which focuses on vectorization optimizations. The -fopt-info flag can be customized by changing the info bit to target different types of optimizations, making it easier to pinpoint specific areas of interest without being inundated with data. +Naturally, the next question is to understand which part of your source code was optimized. Full optimization reports generated by compilers like GCC provide a detailed tree of reports through various stages of the optimization process. These reports can be overwhelming due to the sheer volume of information they contain, covering every aspect of the code's transformation and optimization. +For a more manageable overview, you can enable basic optimization reports using specific arguments such as `-fopt-info-vec`, which focuses on vectorization optimizations. The `-fopt-info` flag can be customized by changing the info bit to target different types of optimizations, making it easier to pinpoint specific areas of interest without being inundated with data. -Applying this to the following command we can see that there is no vector optimisation with the `-O1` optimisation level. -``` +Applying this to the following command, you can see that there is no vector optimization at the `-O1` optimization level. 
+Run the compiler with the arguments shown: + +```bash g++ -O1 vectorizable_loop.cpp -o level_1 -fopt-info-vec ``` -``` -g++ -O2 vectorizable_loop.cpp -o level_1 -fopt-info-vec +There is no vectorization report at `-O1`. Run the same command at the `-O2` optimization level: + +```bash +g++ -O2 vectorizable_loop.cpp -o level_2 -fopt-info-vec +``` + +The output is: + +```output vectorizable_loop.cpp:13:30: optimized: loop vectorized using 16 byte vectors /usr/include/c++/13/bits/stl_algobase.h:930:22: optimized: loop vectorized using 16 byte vectors ``` -However the same command with the `-O2` optimistaion level we observe line 13, column 30 of our source code was optimised. -## Targeting Balanced Performance +The report shows that line 13, column 30 of the source code was vectorized at `-O2`. + +## Target balanced performance + +For balanced performance, AWS recommends using the `-mcpu=neoverse-512tvb` option. The value `neoverse-512tvb` instructs GCC to optimize for Neoverse cores that support SVE and have a vector bandwidth of 512 bits per cycle. -On AWS, for balanced performance they recommend using the `-mcpu=neoverse-512tvb` option. The value ‘neoverse-512tvb’ instructs GCC to optimize for Neoverse cores that support SVE and have a vector bandwidth of 512 bits per cycle. Essentially, this option directs GCC to target Neoverse cores capable of executing four 128-bit Advanced SIMD arithmetic instructions per cycle, as well as an equivalent number of SVE arithmetic instructions per cycle (two for 256-bit SVE, four for 128-bit SVE). This tuning is more general than optimizing for a specific core like Neoverse V1, but more specific than the default tuning options. +This option directs GCC to target Neoverse cores capable of executing four 128-bit Advanced SIMD arithmetic instructions per cycle, as well as an equivalent number of SVE arithmetic instructions per cycle (two for 256-bit SVE, four for 128-bit SVE). This tuning is more general than optimizing for a specific core like Neoverse V1, but more specific than the default tuning options. 
+There are numerous options available for G++ that impact code performance and size. A basic understanding of these helps you to pick the best options for your application. diff --git a/content/learning-paths/servers-and-cloud-computing/cplusplus_compilers_flags/_index.md b/content/learning-paths/servers-and-cloud-computing/cplusplus_compilers_flags/_index.md index e8f2e2ac0b..40b211f10e 100644 --- a/content/learning-paths/servers-and-cloud-computing/cplusplus_compilers_flags/_index.md +++ b/content/learning-paths/servers-and-cloud-computing/cplusplus_compilers_flags/_index.md @@ -1,30 +1,29 @@ --- -title: Learn Basic C++ Optimisation Techniques using the G++ Compiler +title: Learn about optimization techniques using the G++ compiler draft: true cascade: draft: true minutes_to_complete: 60 -who_is_this_for: Beginner C++ developers who are looking to optimise their workload on Arm-based cloud instances with no source code modifications. +who_is_this_for: Beginner C++ developers who are looking to optimize applications on Arm-based cloud instances with no source code modifications. learning_objectives: - - Compile a C++ program for a specific Arm target - - Use compiler flags to control the optimisation + - Compile a C++ program for a specific Arm target. + - Use compiler flags to control the optimization. prerequisites: - - Basic understanding of C++ - - Basic understanding of compilers + - Basic understanding of C++. + - Basic understanding of compilers. author: Kieran Hejmadi ### Tags skilllevels: Introductory -subjects: C++ +subjects: Performance and Architecture armips: - Neoverse tools_software_languages: - - G++ - C++ operatingsystems: - Linux