diff --git a/content/install-guides/armpl.md b/content/install-guides/armpl.md index 39e88138f1..9f22e4d3ca 100644 --- a/content/install-guides/armpl.md +++ b/content/install-guides/armpl.md @@ -32,8 +32,8 @@ layout: installtoolsall # DO NOT MODIFY. Always true for tool install ar [Arm Performance Libraries](https://developer.arm.com/downloads/-/arm-performance-libraries#documentation) provides developers with optimized math libraries for high performance computing applications on Arm Neoverse based hardware. -These libraries include highly optimized functions for BLAS, LAPACK, FFT, sparse linear algebra, libamath and libastring. -These libraries are free to use and do not require a license. They can be installed either standalone or with your installation of [Arm Compiler for Linux](/install-guides/acfl). This install guide covers the standalone installation. +These libraries include highly optimized functions for BLAS, LAPACK, FFT, sparse linear algebra, random number generation, libamath and libastring. +These libraries are free to use and do not require a license. Arm Performance Libraries are available for use on [Windows 11 on Arm](#windows), [macOS](#macos) (Apple Silicon), and [Linux](#linux) (AArch64) hosts. @@ -63,6 +63,16 @@ Click 'Install' and then 'Finish' to complete the installation. ![win_wizard04 #left](/install-guides/_images/armpl_wizard04.png) +To install Arm Performance Libraries from a command prompt and automatically accept the End User License Agreement use: +```console +msiexec.exe /i arm-performance-libraries__Windows.msi /quiet ACCEPT_EULA=1 +``` + +To install Arm Performance Libraries using the `winget` package manager and automatically accept the End User License Agreement use: +```console +winget install --accept-package-agreements Arm.ArmPerformanceLibraries +``` + You can now start linking your application to the Arm Performance libraries on your Windows on Arm device. Follow the examples in the included `RELEASE_NOTES` file of your extracted installation directory to get started. For more information refer to [Get started with Arm Performance Libraries](https://developer.arm.com/documentation/109361). @@ -74,28 +84,28 @@ For more information refer to [Get started with Arm Performance Libraries](https In a terminal, run the command shown below to download the macOS package: ```console -wget https://developer.arm.com/-/cdn-downloads/permalink/Arm-Performance-Libraries/Version_25.04/arm-performance-libraries_25.04_macOS.tgz +wget https://developer.arm.com/-/cdn-downloads/permalink/Arm-Performance-Libraries/Version_25.07/arm-performance-libraries_25.07_macOS.tgz ``` Use tar to extract the file: ```console -tar zxvf arm-performance-libraries_25.04_macOS.tgz +tar zxvf arm-performance-libraries_25.07_macOS.tgz ``` Output of above command: ```console -armpl_25.04_flang-new_clang_19.dmg +armpl_25.07_flang-20.dmg ``` Mount the disk image by running from a terminal: ```console -hdiutil attach armpl_25.04_flang-new_clang_19.dmg +hdiutil attach armpl_25.07_flang-20.dmg ``` Now run the installation script as a superuser: ```console -/Volumes/armpl_25.04_flang-new_clang_19_installer/armpl_25.04_flang-new_clang_19_install.sh -y +/Volumes/armpl_25.07_flang-20_installer/armpl_25.07_flang-20_install.sh -y ``` Using this command you automatically accept the End User License Agreement and the packages are installed to the `/opt/arm` directory. 
If you want to change the installation directory location use the `--install_dir=` option with the script and provide the desired directory location. @@ -107,7 +117,7 @@ For more information refer to [Get started with Arm Performance Libraries](https ## How do I install Arm Performance Libraries on Linux? {#linux} -Arm Performance Libraries are supported on most Linux distributions like Ubuntu, RHEL, SLES and Amazon Linux on an `AArch64` host and compatible with various versions of GCC, LLVM, and NVHPC. The GCC compatible releases are built with GCC 14 and tested with GCC versions 7 to 14. The LLVM compatible releases are tested with LLVM 19.1. The NVHPC compatible releases are tested with NVHPC 24.7. +Arm Performance Libraries are supported on most Linux distributions like Ubuntu, RHEL, SLES and Amazon Linux on an `AArch64` host and compatible with various versions of GCC, LLVM, and NVHPC. The GCC compatible releases are built with GCC 14 and tested with GCC versions 7 to 14. The LLVM compatible releases are tested with LLVM 20.1. The NVHPC compatible releases are tested with NVHPC 25.5. ### How do I manually download and install Arm Performance Libraries on Linux? @@ -122,26 +132,26 @@ The instructions shown below are for deb based installers for GCC users. In a terminal, run the command shown below to download the Debian package: ```bash -wget https://developer.arm.com/-/cdn-downloads/permalink/Arm-Performance-Libraries/Version_25.04.1/arm-performance-libraries_25.04.1_deb_gcc.tar +wget https://developer.arm.com/-/cdn-downloads/permalink/Arm-Performance-Libraries/Version_25.07/arm-performance-libraries_25.07_deb_gcc.tar ``` Use `tar` to extract the file and then change directory: ```bash -tar xf arm-performance-libraries_25.04.1_deb_gcc.tar +tar xf arm-performance-libraries_25.07_deb_gcc.tar ``` Run the installation script as a super user: ```bash -sudo ./arm-performance-libraries_25.04.1_deb/arm-performance-libraries_25.04.1_deb.sh --accept +sudo ./arm-performance-libraries_25.07_deb/arm-performance-libraries_25.07_deb.sh --accept ``` Using the `--accept` switch you automatically accept the End User License Agreement and the packages are installed to the `/opt/arm` directory. If you want to change the installation directory location use the `--install-to` option with the script and provide the desired directory location. -## How do I download and install Arm Performance Libraries using system packages on Linux? +### How do I download and install Arm Performance Libraries using system packages on Linux? Arm Performance Libraries are available to install using Linux system package managers. The instructions shown below are for the Ubuntu system package manager `apt` command. @@ -190,13 +200,13 @@ module avail The output should be similar to: ```output -armpl/25.04.1_gcc +armpl/25.07_gcc ``` Load the appropriate module: ```console -module load armpl/25.04.1_gcc +module load armpl/25.07_gcc ``` You can now compile and test the examples included in the `/opt/arm//examples/`, or `//examples/` directory, if you have installed to a different location than the default. 
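For instance, to link your own program against the libraries after loading the module, you can use a minimal sketch like the one below. It assumes the loaded module exports the `ARMPL_DIR` environment variable and provides the LP64 GCC variant of the library as `-larmpl_lp64`; `my_blas_test.c` is a hypothetical source file that calls a BLAS routine. Check the `RELEASE_NOTES` and the examples directory of your installation for the exact library names in your release:

```bash
# Minimal sketch: compile and link a C file against Arm Performance Libraries.
# Assumes the armpl module is loaded and exports ARMPL_DIR, and that the
# LP64 variant is available as -larmpl_lp64; my_blas_test.c is a
# hypothetical source file calling a BLAS routine such as dgemm_.
gcc -O2 -I"${ARMPL_DIR}/include" my_blas_test.c \
    -L"${ARMPL_DIR}/lib" -larmpl_lp64 -lm \
    -o my_blas_test
./my_blas_test
```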
diff --git a/content/install-guides/container.md b/content/install-guides/container.md new file mode 100644 index 0000000000..09d5e1fda2 --- /dev/null +++ b/content/install-guides/container.md @@ -0,0 +1,215 @@ +--- +title: Container CLI for macOS + +draft: true + +author: Rani Chowdary Mandepudi + +minutes_to_complete: 10 + +official_docs: https://github.com/apple/container + +additional_search_terms: +- container +- virtualization + +layout: installtoolsall +multi_install: false +multitool_install_part: false +test_maintenance: false +weight: 1 +--- + +Container CLI is an open-source command-line tool from Apple for building and running Arm Linux containers directly on macOS using lightweight virtual machines, without the need for Docker Desktop or Linux VMs. + +It supports the full OCI (Open Container Initiative) workflow: building, running, tagging, and pushing container images. + +## What should I do before installing the Container CLI? + +This article provides a step-by-step guide to install and use the `container` command-line tool for building and running Arm Linux containers natively on macOS systems with Apple silicon. + +Confirm you are using an Apple silicon Mac by running: + +```bash +uname -m +``` + +The output on macOS is: + +```output +arm64 +``` + +Container CLI only works on Macs with Apple silicon, including M1, M2, M3, and M4. + +Use the following command to verify your macOS version: + +```bash +sw_vers -productVersion +``` + +Example output: + +```output +15.5 +``` + +Your computer must be running macOS 15.0 or later to use the Container CLI. + +## How do I install Container CLI? + +To install Container CLI on macOS, follow the steps below: + +From the [official GitHub Releases page](https://github.com/apple/container/releases), download the latest signed `.pkg` installer. + +For example: + +```bash +wget https://github.com/apple/container/releases/download/0.2.0/container-0.2.0-installer-signed.pkg +``` + +Install the downloaded package using: + +```bash +sudo installer -pkg container-0.2.0-installer-signed.pkg -target / +``` + +This installs the `container` binary at `/usr/local/bin/container`. + +After installation, start the container system service by running the following command: + +```bash +container system start +``` + +{{% notice Note %}} +The system service must be running to use container operations such as build, run, or push. It may also need to be started again after a reboot, depending on system settings. +{{% /notice %}} + +The background server process is now running. + +Verify the CLI version: + +```bash +container --version +``` + +Example output: + +```output +container CLI version 0.2.0 +``` + +This confirms that the Container CLI is successfully installed and ready to use. + +## How do I build, run, and push a container using the Container CLI? + +### Create a Dockerfile + +You can define a simple image that prints the system architecture. + +Use an editor to create a file named `Dockerfile` with the following contents: + +```bash +FROM ubuntu:latest +CMD echo -n "Architecture is " && uname -m +``` + +### Build the container image + +Build the image from the `Dockerfile`. + +This will pull the Ubuntu base image and tag the result as `uname`. + +```bash +container build -t uname . +``` + +The output will be similar to: + +```output +Successfully built uname:latest +``` + +### Run the container + +Execute the container to verify it runs successfully and prints the system architecture.
+ +```bash +container run --rm uname +``` + +The output is: + +```output +Architecture is aarch64 +``` + +The `--rm` flag removes the container after it finishes. + +### Tag and push the image + +Once the image is built and tested locally, it can be pushed to a container registry such as Docker Hub. This allows the image to be reused across machines or shared with others. + +Use the `tag` command to apply a registry-compatible name to the image: + +```bash +container images tag uname docker.io/<username>/uname:latest +``` + +Replace `<username>` with your Docker Hub username. + +Before pushing the image, log in to Docker Hub: + +```bash +container registry login docker.io +``` + +Enter your Docker Hub username and password. + +{{% notice Note %}} +The same command works with other registries such as GitHub Container Registry (ghcr.io) or any OCI-compliant registry. Replace `docker.io` with the appropriate registry hostname. +{{% /notice %}} + +Next, upload the tagged image to Docker Hub: + +```bash +container images push docker.io/<username>/uname:latest +``` + +Once the push completes successfully, the image will be available in the Docker Hub repository. It can be pulled and run on other systems that support the Arm architecture. + +## How can I list images and containers? + +You can view locally built or pulled images using: + +```bash +container images ls +``` + +To see running or previously executed containers: + +```bash +container ls +``` + +## How do I uninstall the Container CLI? + +The Container CLI includes an uninstall script that allows you to remove the tool from your system. You can choose to remove the CLI with or without user data. + +Uninstall and keep user data (images, containers): + +```bash +uninstall-container.sh -k +``` + +Use this if you plan to reinstall later and want to preserve your local container data. + +Uninstall and delete all user data: + +```bash +uninstall-container.sh -d +``` +This will permanently remove the CLI and all container images, logs, and metadata. + +You can now build and run Arm Linux containers on macOS. \ No newline at end of file diff --git a/content/learning-paths/servers-and-cloud-computing/bolt-merge/_index.md b/content/learning-paths/servers-and-cloud-computing/bolt-merge/_index.md index 57ce4e6537..7fdba8f583 100644 --- a/content/learning-paths/servers-and-cloud-computing/bolt-merge/_index.md +++ b/content/learning-paths/servers-and-cloud-computing/bolt-merge/_index.md @@ -1,23 +1,19 @@ --- title: Optimize Arm applications and shared libraries with BOLT -draft: true -cascade: - draft: true - minutes_to_complete: 30 -who_is_this_for: Performance engineers and software developers working on Arm platforms who want to optimize both application binaries and shared libraries using BOLT. +who_is_this_for: This is an advanced topic for performance engineers and software developers targeting Arm platforms who want to optimize application binaries and shared libraries using BOLT. learning_objectives: - - Instrument and optimize application binaries for individual workload features using BOLT. - - Collect separate BOLT profiles and merge them for comprehensive code coverage. - - Optimize shared libraries independently. - - Integrate optimized shared libraries into applications. - - Evaluate and compare application and library performance across baseline, isolated, and merged optimization scenarios.
+ - Instrument and optimize application binaries for individual workload features using BOLT + - Collect and merge separate BOLT profiles to improve code coverage + - Optimize shared libraries independently of application binaries + - Integrate optimized shared libraries into applications + - Evaluate and compare performance across baseline, isolated, and merged optimization scenarios prerequisites: - - An Arm based system running Linux with [BOLT](/install-guides/bolt/) and [Linux Perf](/install-guides/perf/) installed. + - An Arm-based Linux system with [BOLT](/install-guides/bolt/) and [Linux Perf](/install-guides/perf/) installed author: Gayathri Narayana Yegna Narayanan diff --git a/content/learning-paths/servers-and-cloud-computing/bolt-merge/example-picture.png b/content/learning-paths/servers-and-cloud-computing/bolt-merge/example-picture.png deleted file mode 100644 index c69844bed4..0000000000 Binary files a/content/learning-paths/servers-and-cloud-computing/bolt-merge/example-picture.png and /dev/null differ diff --git a/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-1.md b/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-1.md index 9914183789..2b9da132aa 100644 --- a/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-1.md +++ b/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-1.md @@ -1,63 +1,61 @@ --- -title: BOLT overview +title: Overview weight: 2 ### FIXED, DO NOT MODIFY layout: learningpathall --- -[BOLT](https://github.com/llvm/llvm-project/blob/main/bolt/README.md) is a post-link binary optimizer that uses Linux Perf data to re-order the executable code layout to reduce memory overhead and improve performance. +## What is BOLT? -Make sure you have [BOLT](/install-guides/bolt/) and [Linux Perf](/install-guides/perf/) installed. +[BOLT](https://github.com/llvm/llvm-project/blob/main/bolt/README.md) is a post-link binary optimizer that uses profiling data from [Linux Perf](/install-guides/perf/) to identify frequently executed functions and basic blocks. Based on this data, BOLT reorders code to improve instruction cache locality, reduce branch mispredictions, and shorten critical execution paths. -You should use an Arm Linux system with at least 8 CPUs and 16 Gb of RAM. Ubuntu 24.04 is used for testing, but other Linux distributions are possible. +This often results in faster startup times, lower CPU cycles per instruction (CPI), and improved throughput - especially for large, performance-sensitive applications like databases, web servers, or system daemons. -## What will I do in this Learning Path? +{{% notice Note %}} +BOLT complements compile-time optimizations like LTO (Link-Time Optimization) and PGO (Profile-Guided Optimization). It applies after linking, giving it visibility into the final binary layout, which traditional compiler optimizations do not. +{{% /notice %}} -In this Learning Path you learn how to use BOLT to optimize applications and shared libraries. MySQL is used as the application and two share libraries which are used by MySQL are also optimized using BOLT. +Before you begin, ensure that you have the following installed: -Here is an outline of the steps: +- [BOLT](/install-guides/bolt/) +- [Linux Perf](/install-guides/perf/) -1. Collect and merge BOLT profiles from multiple workloads, such as read-only and write-only +You should use an Arm-based Linux system with at least 8 CPUs and 16 GB of RAM.
This Learning Path was tested on Ubuntu 24.04, but other Linux distributions are also supported. - A read-only workload typically involves operations that only retrieve or query data, such as running SELECT statements in a database without modifying any records. In contrast, a write-only workload focuses on operations that modify data, such as INSERT, UPDATE, or DELETE statements. Profiling both types ensures that the optimized binary performs well under different usage patterns. +## What will I do in this Learning Path? -2. Independently optimize application binaries and external user-space libraries, such as `libssl.so` and `libcrypto.so` +In this Learning Path, you'll learn how to use BOLT to optimize both applications and shared libraries. You'll walk through a real-world example using MySQL and two of its dependent libraries: - This means you can apply BOLT optimizations not just to your main application, but also to shared libraries it depends on, resulting in a more comprehensive performance improvement across your entire stack. +- `libssl.so` +- `libcrypto.so` -3. Merge profile data for broader code coverage +You will: - By combining the profile data collected from different workloads and libraries, you create a single, comprehensive profile that represents a wide range of application behaviors. This merged profile allows BOLT to optimize code paths that are exercised under different scenarios, leading to better overall performance and coverage than optimizing for a single workload. +- **Collect and merge BOLT profiles from multiple workloads, such as read-only and write-only** - a read-only workload typically involves operations that only retrieve or query data, such as running SELECT statements in a database without modifying any records. In contrast, a write-only workload focuses on operations that modify data, such as INSERT, UPDATE, or DELETE statements. Profiling both types ensures that the optimized binary performs well under different usage patterns. -4. Run BOLT on each binary application and library +- **Independently optimize application binaries and external user-space libraries, such as `libssl.so` and `libcrypto.so`** - this means that you can apply BOLT optimizations to not just your main application, but also to shared libraries it depends on, resulting in a more comprehensive performance improvement across your entire stack. - With the merged profile, you apply BOLT optimizations separately to each binary and shared library. This step ensures that both your main application and its dependencies are optimized based on real-world usage patterns, resulting in a more efficient and responsive software stack. +- **Merge profile data for broader code coverage** - by combining the profile data collected from different workloads and libraries, you create a single, comprehensive profile that represents a wide range of application behaviors. This merged profile allows BOLT to optimize code paths that are exercised under different scenarios, leading to better overall performance and coverage than optimizing for a single workload. -5. Link the final optimized binary with the separately optimized libraries to deploy a fully optimized runtime stack +- **Run BOLT on each binary application and library** - with the merged profile, you apply BOLT optimizations separately to each binary and shared library. 
This step ensures that both your main application and its dependencies are optimized based on real-world usage patterns, resulting in a more efficient and responsive software stack. - After optimizing each component, you combine them to create a deployment where both the application and its libraries benefit from BOLT's enhancements. +- **Link the final optimized binary with the separately optimized libraries to deploy a fully optimized runtime stack** - after optimizing each component, you combine them to create a deployment where both the application and its libraries benefit from BOLT's enhancements. ## What is BOLT profile merging? -BOLT profile merging is the process of combining profiling from multiple runs into a single profile. This merged profile enables BOLT to optimize binaries for a broader set of real-world behaviors, ensuring that the final optimized application or library performs well across diverse workloads, not just a single use case. By merging profiles, you capture a wider range of code paths and execution patterns, leading to more robust and effective optimizations. - -![Why BOLT Profile Merging?](Bolt-merge.png) - -## What are good applications for BOLT? - -MySQL and Sysbench are used as example applications, but you can use this method for any feature-rich application that: +BOLT profile merging combines profiling data from multiple runs into one unified profile. This merged profile enables BOLT to optimize binaries for a broader set of real-world behaviors, ensuring that the final optimized application or library performs well across diverse workloads, not just a single use case. By merging profiles, you capture a wider range of code paths and execution patterns, leading to more robust and effective optimizations. -1. Exhibits multiple runtime paths +![Diagram showing how BOLT profile merging combines multiple runtime profiles into a single optimized view#center](bolt-merge.png "Why BOLT profile merging improves optimization coverage") - Applications often have different code paths depending on the workload or user actions. Optimizing for just one path can leave performance gains untapped in others. By profiling and merging data from various workloads, you ensure broader optimization coverage. +## What types of applications benefit from BOLT? -2. Uses dynamic libraries +Although this Learning Path uses MySQL and Sysbench as examples, you can apply the same method to any feature-rich application that: - Most modern applications rely on shared libraries for functionality. Optimizing these libraries alongside the main binary ensures consistent performance improvements throughout the application. +- **Exhibits multiple runtime paths** - applications often have different code paths depending on the workload or user actions. Optimizing for just one path can leave performance gains untapped in others. By profiling and merging data from various workloads, you ensure broader optimization coverage. -3. Requires full-stack binary optimization for performance-critical deployment +- **Uses dynamic libraries** - most modern applications rely on shared libraries for functionality. + Optimizing shared libraries alongside the main binary ensures consistent performance across your stack. - In scenarios where every bit of performance matters, such as high-throughput servers or latency-sensitive applications, optimizing the entire binary stack can yield significant benefits. 
+- **Requires full-stack binary optimization for performance-critical deployment** - in scenarios where every bit of performance matters, such as high-throughput servers or latency-sensitive applications, optimizing the entire binary stack can yield significant benefits. diff --git a/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-2.md b/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-2.md index 69a7b05852..df7b0da2fb 100644 --- a/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-2.md +++ b/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-2.md @@ -6,23 +6,23 @@ weight: 3 layout: learningpathall --- -In this step, you will use BOLT to instrument the MySQL application binary and to collect profile data for specific workloads. +## Overview -The collected profiles will be merged with others and used to optimize the application's code layout. +In this step, you'll use BOLT to instrument the MySQL application binary and to collect profile data for specific workloads. The collected profiles will later be merged with others and used to optimize the application's code layout. -## Build mysqld from source +## Build mysqld from source -Follow these steps to build the MySQL server (`mysqld`) from source: +Build the MySQL server (`mysqld`) binary from source. -Install the required dependencies: +Start by installing the required dependencies: ```bash sudo apt update sudo apt install -y build-essential cmake libncurses5-dev libssl-dev libboost-all-dev \ - bison pkg-config libaio-dev libtirpc-dev git ninja-build liblz4-dev + bison pkg-config libaio-dev libtirpc-dev git ninja-build liblz4-dev ``` -Download the MySQL source code. You can change to another version in the `checkout` command below if needed. +Clone the MySQL source code. You can change to another version in the `checkout` command below if needed: ```bash git clone https://github.com/mysql/mysql-server.git @@ -30,7 +30,7 @@ cd mysql-server git checkout mysql-8.0.37 ``` -Configure the build for debug: +Next, configure the build for debug: ```bash mkdir build && cd build @@ -40,16 +40,18 @@ cmake .. -DCMAKE_C_FLAGS="-O3 -march=native -Wno-enum-constexpr-conversion -fno- -DWITH_BOOST=$HOME/boost -DDOWNLOAD_BOOST=On -DWITH_ZLIB=bundled -DWITH_LZ4=system -DWITH_SSL=system ``` -Build MySQL: +Then build MySQL: ```bash ninja ``` -After the build completes, the `mysqld` binary is located at `$HOME/mysql-server/build/runtime_output_directory/mysqld` +After the build completes, the `mysqld` binary is located in `$HOME/mysql-server/build/runtime_output_directory/mysqld` {{% notice Note %}} -You can run `mysqld` directly from the build directory as shown, or run `make install` to install it system-wide. For testing and instrumentation, running from the build directory is usually preferred. +- Replace `runtime_output_directory` with your actual path (`runtime_output_directory/` is a placeholder — the real directory might differ based on your build system or configuration). + +- You can run `mysqld` directly from the build directory or install it system-wide using `make install`. For testing and instrumentation, running it locally from the build directory is recommended. {{% /notice %}} After building mysqld, install MySQL server and client utilities system-wide: @@ -58,7 +60,7 @@ After building mysqld, install MySQL server and client utilities system-wide: sudo ninja install ``` -This will make the `mysql` client and other utilities available in your PATH. 
+This makes the `mysql` client and other utilities available in your PATH. ```bash echo 'export PATH="$PATH:/usr/local/mysql/bin"' >> ~/.bashrc @@ -67,13 +69,15 @@ source ~/.bashrc Ensure the binary is unstripped and includes debug symbols for BOLT instrumentation. -To work with BOLT, your application binary should be: +Make sure your application binary: + +- Is built from source +- Includes symbol information (unstripped) -- Built from source -- Unstripped, with symbol information available -- Compiled with frame pointers enabled (`-fno-omit-frame-pointer`) +- Is compiled with frame pointers (`-fno-omit-frame-pointer`) -You can verify this with: +You can verify symbol presence with: ```bash readelf -s $HOME/mysql-server/build/runtime_output_directory/mysqld | grep main @@ -89,12 +93,12 @@ The partial output is: 61046: 0000000005ffd5c0 40 FUNC GLOBAL DEFAULT 13 _Z21record_main_[...] ``` -If the symbols are missing, rebuild the binary with debug info and no stripping. +If the symbols are missing, rebuild the binary with debug flags and disable stripping. -## Prepare MySQL server before running workloads +## Prepare MySQL server for profiling -Before running the workload, you may need to initialize a new data directory if this is your first run: +Before running the workload, you might need to initialize a new data directory if this is your first run: ```bash # Initialize a new data directory @@ -102,7 +106,9 @@ Before running the workload, you may need to initialize a new data directory if bin/mysqld --initialize-insecure --datadir=data ``` -Start the instrumented server. On an 8-core system, use available cores (e.g., 2 for mysqld, 7 for sysbench). Run the command from build directory. +Start the instrumented server. On an 8-core system, use core 2 for mysqld and core 7 for Sysbench to avoid contention. + +Run the command from the build directory: ```bash taskset -c 2 ./bin/mysqld \ @@ -139,16 +145,15 @@ taskset -c 2 ./bin/mysqld \ Adjust `--datadir`, `--socket`, and `--port` as needed for your environment. Make sure the server is running and accessible before proceeding. -With the database running, open a second terminal to create a benchmark User and third terminal to run the client commands. +## Create the benchmark user and database + +With the database running, open a second terminal to create a benchmark user and a third terminal to run the client commands. In the new terminal, navigate to the build directory: ```bash cd $HOME/mysql-server/build ``` - -## Create Benchmark User and Database - Run once after initializing MySQL for the first time: ```bash bin/mysql -u root <<< " @@ -158,19 +163,23 @@ GRANT ALL PRIVILEGES ON *.* TO 'bench'@'localhost' WITH GRANT OPTION; FLUSH PRIVILEGES;" ``` -This sets up the bench user and the bench database with full privileges. Do not repeat this before every test — it is only required once. +This sets up the bench user and the bench database with full privileges. + +{{% notice Note %}} +You only need to do this once. Don’t repeat it before each test. +{{% /notice %}} -## Reset Benchmark Database Between Runs +## Reset the database between runs -This clears all existing tables and data from the bench database, giving you a clean slate for sysbench prepare without needing to recreate the user or reinitialize the datadir. +This clears all existing tables and data from the bench database, giving you a clean slate for Sysbench prepare without needing to recreate the user or reinitialize the datadir.
```bash bin/mysql -u root <<< "DROP DATABASE bench; CREATE DATABASE bench;" ``` -## Install and build sysbench +## Install and build Sysbench -In a third terminal, run the commands below if you have not run sysbench yet. +In a third terminal, run the commands below if you have not run Sysbench yet: ```bash git clone https://github.com/akopytov/sysbench.git @@ -183,7 +192,7 @@ export LD_LIBRARY_PATH=/usr/local/mysql/lib/ Use `./src/sysbench` for running benchmarks unless installed globally. -## Create a dataset with sysbench +## Prepare the dataset with Sysbench Run `sysbench` with the `prepare` option: @@ -201,7 +210,7 @@ Run `sysbench` with the `prepare` option: src/lua/oltp_read_write.lua prepare ``` -## Shutdown MySQL and snapshot dataset for fast reuse +## Shut down MySQL and snapshot dataset for fast reuse Do these steps once at the start from MySQL source directory @@ -240,17 +249,21 @@ llvm-bolt $HOME/mysql-server/build/bin/mysqld \ 2>&1 | tee $HOME/mysql-server/bolt-instrumentation-readonly.log ``` -### Explanation of key options +## Explanation of key options -- `-instrument`: Enables profile generation instrumentation -- `--instrumentation-file`: Path where the profile output will be saved -- `--instrumentation-wait-forks`: Ensures the instrumentation continues through forks (important for daemon processes) +These flags control how BOLT collects runtime data from the instrumented binary. Understanding them helps ensure accurate and comprehensive profile generation: + +- `-instrument`: enables instrumentation mode. BOLT inserts profiling instructions into the binary to record execution behavior at runtime. +- `--instrumentation-file=&1 | tee $HOME/mysql-server/bolt-instrumentation-writeonly.log ``` -Run sysbench again with the write-only workload: +## Run the write-only workload + +Run Sysbench using the write-only Lua script to generate a workload profile: ```bash -# On an 8-core system, use available cores (e.g., 7 for sysbench) +# On an 8-core system, use available cores (e.g., 7 for Sysbench) taskset -c 7 ./src/sysbench \ --db-driver=mysql \ --mysql-host=127.0.0.1 \ @@ -50,24 +55,31 @@ taskset -c 7 ./src/sysbench \ src/lua/oltp_write_only.lua run ``` -Make sure that the `--instrumentation-file` is set appropriately to save `profile-writeonly.fdata`. +Confirm that `--instrumentation-file` is set to `profile-writeonly.fdata`. + +## Reset the dataset after profiling + +After running each benchmark, cleanly shut down the MySQL server and reset the in-memory dataset to ensure the next run starts in a consistent state: -After completing each benchmark run (e.g. after sysbench run), you must cleanly shut down the MySQL server and reset the dataset to ensure the next test starts from a consistent state. ```bash ./bin/mysqladmin -u root shutdown ; rm -rf /dev/shm/dataset ; cp -R data/ /dev/shm/dataset ``` -### Verify the Second Profile Was Generated +## Verify that both profiles exist + +Verify that the following `.fdata` files have been generated: ```bash +ls -lh $HOME/mysql-server/build/profile-readonly.fdata ls -lh $HOME/mysql-server/build/profile-writeonly.fdata ``` Both `.fdata` files should now exist and contain valid data: - `profile-readonly.fdata` - `profile-writeonly.fdata` -### Merge the Feature Profiles +## Merge the feature profiles Use `merge-fdata` to combine the feature-specific profiles into one comprehensive `.fdata` file: @@ -85,15 +97,15 @@ Profile from 2 files merged.
This creates a single merged profile (`profile-merged.fdata`) covering both read-only and write-only workload behaviors. -### Verify the Merged Profile +## Verify the merged profile -Check the merged `.fdata` file: +Confirm the merged profile file exists and is non-empty: ```bash ls -lh $HOME/mysql-server/build/profile-merged.fdata ``` -### Generate the Final Binary with the Merged Profile +## Optimize the binary with the merged profile Use LLVM-BOLT to generate the final optimized binary using the merged `.fdata` file: @@ -111,6 +123,13 @@ llvm-bolt $HOME/mysql-server/build/bin/mysqld \ 2>&1 | tee $HOME/mysql-server/build/bolt-readwritemerged-opt.log ``` +{{% notice Note %}} +Key flags explained: +- `-reorder-blocks=ext-tsp`: Reorders code blocks to improve cache locality +- `-split-functions`: Separates hot and cold regions for better performance +- `-dyno-stats`: Prints dynamic profile-based statistics during optimization +{{% /notice %}} + This command optimizes the binary layout based on the merged workload profile, creating a single binary (`mysqldreadwrite_merged.bolt_instrumentation`) that is optimized across both features. diff --git a/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-4.md b/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-4.md index 65da71ece2..eebc49ee7c 100644 --- a/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-4.md +++ b/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-4.md @@ -5,7 +5,12 @@ weight: 5 ### FIXED, DO NOT MODIFY layout: learningpathall --- -### Instrument Shared Libraries (e.g., libcrypto, libssl) + +## Overview + +In this section, you'll learn how to instrument and optimize shared libraries, specifically `libssl.so` and `libcrypto.so`, using BOLT. These libraries are used by MySQL, and optimizing them can improve overall performance. You'll rebuild OpenSSL from source to include symbol information, then collect profiles and apply BOLT optimizations. + +## Prepare instrumentable versions If system libraries like `/usr/lib/libssl.so` are stripped, rebuild OpenSSL from source with relocations: @@ -18,7 +23,7 @@ make -j$(nproc) make install ``` -### Instrument libssl +## Instrument libssl.so Use `llvm-bolt` to instrument `libssl.so.3`: @@ -33,13 +38,13 @@ llvm-bolt $HOME/bolt-libs/openssl/lib/libssl.so.3 \ 2>&1 | tee $HOME/mysql-server/bolt-instrumentation-libssl.log ``` -Then launch MySQL using the **instrumented shared library** and run a **read+write** sysbench test to populate the profile +Launch MySQL with the instrumented `libssl.so` and run a read+write Sysbench test to populate the profile. -### Optimize libssl using the profile +## Optimize libssl using the profile After running the read+write test, ensure `libssl-readwrite.fdata` is populated. -Run BOLT on the uninstrumented `libssl.so` with the collected read-write profile: +Run BOLT on the uninstrumented `libssl.so` using the collected read+write profile: ```bash llvm-bolt $HOME/bolt-libs/openssl/lib/libssl.so.3 \ @@ -55,13 +60,13 @@ llvm-bolt $HOME/bolt-libs/openssl/lib/libssl.so.3 \ 2>&1 | tee $HOME/mysql-server/build/bolt-libssl.log ``` -### Replace the library at runtime +## Replace the library at runtime Copy the optimized version over the original and export the path: ```bash # Set LD_LIBRARY_PATH in the terminal before launching mysqld in order for mysqld to pick the optimized library.
-cp $HOME/bolt-libs/openssl/libssl.so.optimized $HOME/bolt-libs/openssl/libssl.so.3 +cp $HOME/bolt-libs/openssl/libssl.so.optimized $HOME/bolt-libs/openssl/lib/libssl.so.3 export LD_LIBRARY_PATH=$HOME/bolt-libs/openssl/lib # You can confirm that mysqld is loading your optimized library with: @@ -69,6 +74,10 @@ LD_LIBRARY_PATH=$HOME/bolt-libs/openssl/ ldd build/bin/mysqld | grep libssl ``` +{{% notice tip %}} +Setting `LD_LIBRARY_PATH` ensures that MySQL dynamically links to the optimized shared library at runtime. This does not permanently override system libraries. +{{% /notice %}} + It should show: ```output @@ -77,7 +86,7 @@ libssl.so.3 => /home/ubuntu/bolt-libs/openssl/libssl.so.3 This ensures MySQL will dynamically load the optimized `libssl.so`. -### Run the final workload and validate performance +## Run the final workload and validate performance Start the BOLT-optimized MySQL binary and link it against the optimized `libssl.so`. Run the combined workload: @@ -108,11 +117,11 @@ taskset -c 7 ./src/sysbench \ In the next step, you'll optimize an additional critical external library (`libcrypto.so`) using BOLT, following a similar process as `libssl.so`. Afterward, you'll interpret performance results to validate and compare optimizations across baseline and merged scenarios. -### BOLT optimization for libcrypto +## Optimize libcrypto.so with BOLT Follow these steps to instrument and optimize `libcrypto.so`: -### Instrument libcrypto +## Instrument libcrypto.so ```bash llvm-bolt $HOME/bolt-libs/openssl/libcrypto.so.3 \ @@ -124,11 +133,12 @@ llvm-bolt $HOME/bolt-libs/openssl/libcrypto.so.3 \ --instrumentation-wait-forks \ 2>&1 | tee $HOME/mysql-server/bolt-instrumentation-libcrypto.log ``` -Then launch MySQL using the instrumented shared library and run a read+write sysbench test to populate the profile. -### Optimize libcrypto using the profile +Launch MySQL using the instrumented shared library and run a read+write Sysbench test to populate the profile. + +## Optimize libcrypto using the profile After running the read+write test, ensure `libcrypto-readwrite.fdata` is populated. -Run BOLT on the uninstrumented libcrypto.so with the collected read-write profile: +Run BOLT on the uninstrumented libcrypto.so using the collected read+write profile to generate an optimized library: ```bash llvm-bolt $HOME/bolt-libs/openssl/lib/libcrypto.so.3 \ -o $HOME/bolt-libs/openssl/lib/libcrypto.so.optimized \ @@ -147,7 +157,7 @@ Replace the original at runtime: ```bash # Set LD_LIBRARY_PATH in the terminal before launching mysqld in order for mysqld to pick the optimized library. -cp $HOME/bolt-libs/openssl/libcrypto.so.optimized $HOME/bolt-libs/openssl/libcrypto.so.3 +cp $HOME/bolt-libs/openssl/libcrypto.so.optimized $HOME/bolt-libs/openssl/lib/libcrypto.so.3 export LD_LIBRARY_PATH=$HOME/bolt-libs/openssl/ # You can confirm that mysqld is loading your optimized library with: diff --git a/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-5.md b/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-5.md index 2c96c74b9f..ff35989ccb 100644 --- a/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-5.md +++ b/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-5.md @@ -6,14 +6,15 @@ weight: 6 layout: learningpathall --- -This step presents the performance comparisons across various BOLT optimization scenarios. You'll see how baseline performance compares with BOLT-optimized binaries using merged profiles and bolted external libraries. 
+## Overview -For all test cases shown in the table below, sysbench was configured with --time=0 --events=10000. -This means each test ran until exactly 10,000 requests were completed per thread, rather than running for a fixed duration. +This section compares the performance of baseline binaries with BOLT-optimized versions. It highlights the impact of merged profile optimizations and shared library enhancements on overall system throughput and latency. -### 1. Baseline Performance (No BOLT) +All tests used Sysbench with the flags `--time=0 --events=10000`. This configuration ensures that each test completes exactly 10,000 requests per thread, delivering consistent workload runtimes across runs. -| Metric | Read-Only (Baseline) | Write-Only (Baseline) | Read+Write (Baseline) | +## Baseline performance (without BOLT) + +| Metric |Read-only | Write-only | Read + write | |---------------------------|----------------------|------------------------|------------------------| | Transactions/sec (TPS) | 1006.33 | 2113.03 | 649.15 | | Queries/sec (QPS) | 16,101.24 | 12,678.18 | 12,983.09 | @@ -21,9 +22,9 @@ This means each test ran until exactly 10,000 requests were completed per thread | Latency 95th % (ms) | 1.04 | 0.83 | 1.79 | | Total time (s) | 9.93 | 4.73 | 15.40 | -### 2. Performance Comparison: Merged vs Non-Merged Instrumentation +## Performance comparison: merged and non-merged instrumentation -| Metric | Regular BOLT R+W (No Merge, system libssl) | Merged BOLT (BOLTed Read+Write + BOLTed libssl) | +| Metric | Regular BOLT (read + write, system libssl) | Merged BOLT (read + write + libssl) | |---------------------------|---------------------------------------------|-------------------------------------------------| | Transactions/sec (TPS) | 850.32 | 879.18 | | Queries/sec (QPS) | 17,006.35 | 17,583.60 | @@ -31,9 +32,9 @@ This means each test ran until exactly 10,000 requests were completed per thread | Latency 95th % (ms) | 1.52 | 1.39 | | Total time (s) | 11.76 | 11.37 | -Second run: +Second test run: -| Metric | Regular BOLT R+W (No Merge, system libssl) | Merged BOLT (BOLTed Read+Write + BOLTed libssl) | +| Metric | Regular BOLT (read + write, system libssl) | Merged BOLT (read + write + libssl) | |---------------------------|---------------------------------------------|-------------------------------------------------| | Transactions/sec (TPS) | 853.16 | 887.14 | | Queries/sec (QPS) | 17,063.22 | 17,742.89 | @@ -41,9 +42,9 @@ Second run: | Latency 95th % (ms) | 1.39 | 1.37 | | Total time (s) | 239.9 | 239.9 | -### 3. 
BOLTed READ, BOLTed WRITE, MERGED BOLT (Read+Write+BOLTed Libraries) +## Performance across BOLT optimizations -| Metric | Bolted Read-Only | Bolted Write-Only | Merged BOLT (Read+Write+libssl) | Merged BOLT (Read+Write+libcrypto) | Merged BOLT (Read+Write+libssl+libcrypto) | +| Metric | BOLT read-only | BOLT write-only | Merged BOLT (read + write + libssl) | Merged BOLT (read + write + libcrypto) | Merged BOLT (read + write + libcrypto + libssl) | |---------------------------|---------------------|-------------------|----------------------------------|------------------------------------|-------------------------------------------| | Transactions/sec (TPS) | 1348.47 | 3170.92 | 887.14 | 896.58 | 902.98 | | Queries/sec (QPS) | 21575.45 | 19025.52 | 17742.89 | 17931.57 | 18059.52 | @@ -52,17 +53,17 @@ Second run: | Total time (s) | 239.8 | 239.72 | 239.9 | 239.9 | 239.9 | {{% notice Note %}} -All sysbench and .fdata file paths, as well as taskset usage, should match the conventions in previous steps: use sysbench from PATH (no src/), use /usr/share/sysbench/ for Lua scripts, and use $HOME-based paths for all .fdata and library files. On an 8-core system, use taskset -c 7 for sysbench and avoid contention with mysqld. +All Sysbench and .fdata file paths, as well as taskset usage, should match the conventions in previous steps: use Sysbench from PATH (no src/), use /usr/share/sysbench/ for Lua scripts, and use $HOME-based paths for all .fdata and library files. On an 8-core system, use taskset -c 7 for Sysbench and avoid contention with mysqld. {{% /notice %}} -### Key metrics to analyze +## Key metrics to analyze -- **TPS (Transactions Per Second)**: Higher is better. -- **QPS (Queries Per Second)**: Higher is better. -- **Latency (Average and 95th Percentile)**: Lower is better. +- **TPS (transactions per second)** – higher is better +- **QPS (queries per second)** – higher is better +- **Latency (average and 95th percentile)** – lower is better -### Conclusion +## Conclusion -- BOLT substantially improves performance over non-optimized binaries due to better instruction cache utilization and reduced execution path latency. -- Merging feature-specific profiles does not negatively affect performance; instead, it captures a broader set of runtime behaviors, making the binary better tuned for varied real-world workloads. -- Separately optimizing external user-space libraries, even though providing smaller incremental gains, further complements the overall application optimization, delivering a fully optimized execution environment. +- BOLT-optimized binaries clearly outperform baseline versions by improving instruction cache usage and shortening execution paths. +- Merging feature-specific profiles does not negatively affect performance. Instead, it allows better tuning for varied real-world workloads by capturing a broader set of runtime behaviors. +- External library optimizations (for example, `libssl` and `libcrypto`) provide smaller incremental gains that complement the application-level optimization, delivering a fully optimized execution environment. diff --git a/data/stats_current_test_info.yml b/data/stats_current_test_info.yml index 27aae38555..98c60af541 100644 --- a/data/stats_current_test_info.yml +++ b/data/stats_current_test_info.yml @@ -54,7 +54,8 @@ sw_categories: tests_and_status: [] armpl: readable_title: Arm Performance Libraries - tests_and_status: [] + tests_and_status: + - ubuntu:latest: passed aws-cli: readable_title: AWS CLI tests_and_status: []