diff --git a/.wordlist.txt b/.wordlist.txt index 0b5f05a1cd..6620632450 100644 --- a/.wordlist.txt +++ b/.wordlist.txt @@ -1657,7 +1657,6 @@ Kconfig buildroot RMM Bolt -Optimisation PGO llvmorg latencies @@ -4235,4 +4234,93 @@ libssl misclassification retransmission subquery -uninstrumented \ No newline at end of file +uninstrumented +ASIL +AdvSIMD +AnyCPU +BIST +BMS +Benchstat +Bleve +CMS +CPUx +CockroachDB +CycloneDDS +DCPS +DCPerf +DataReaders +DataWriters +Dn +EV +Gi +GopherLua +HARA +HHVM +HIL +HipHop +JIRA +Jayat +Julien +MISRA +MarkdownRenderXHTML +MediaWiki +NET's +NSG +OrchardCMS +OrchardCore +PATHNAME +Polarion +QoS +RSS +Req +SELinux +STS +ThreadPool +VM's +VM’s +autorun +azureuser +bb +benchstat +biogo +bitness +bleve +br +brstack +cockroachdb +cycloneDDS +differentiators +esbuild +etcd +facebookresearch +gRPC +geomean +geomeans +geospatial +hardcoding +igor +interop +ipfrag +ipv +krishna +metaprogramming +minifies +misprediction +multicast +multicore +odinlmshen +optimise +orchardcore +ov +pathname +psci +retuned +rexec +rmem +roadmap +runnable +taskset +unicast +wrk's +yy +zenoh \ No newline at end of file diff --git a/assets/contributors.csv b/assets/contributors.csv index 1bd3df8895..8de0eb0454 100644 --- a/assets/contributors.csv +++ b/assets/contributors.csv @@ -82,7 +82,7 @@ Odin Shen,Arm,odincodeshen,odin-shen-lmshen,, Avin Zarlez,Arm,AvinZarlez,avinzarlez,,https://www.avinzarlez.com/ Shuheng Deng,Arm,,,, Yiyang Fan,Arm,,,, -Julien Jayat,Arm,,,, +Julien Jayat,Arm,JulienJayat-Arm,julien-jayat-a980a397,, Geremy Cohen,Arm,geremyCohen,geremyinanutshell,, Barbara Corriero,Arm,,,, Nina Drozd,Arm,NinaARM,ninadrozd,, diff --git a/content/install-guides/aws-q-cli.md b/content/install-guides/aws-q-cli.md index e9b223296d..e7cfaa568d 100644 --- a/content/install-guides/aws-q-cli.md +++ b/content/install-guides/aws-q-cli.md @@ -235,7 +235,7 @@ You can ask Amazon Q to set the default model for future sessions. 
## Install an MCP server -As an example of using MCP with Amazon Q, you can configure the Github MCP server. +As an example of using MCP with Amazon Q, you can configure a local GitHub MCP server. Go to your GitHub account developer settings and create a personal access token with the following permissions: diff --git a/content/install-guides/bolt.md b/content/install-guides/bolt.md index ea58b1a190..1878f388ca 100644 --- a/content/install-guides/bolt.md +++ b/content/install-guides/bolt.md @@ -145,19 +145,19 @@ You are now ready to [verify BOLT is installed](#verify). For Arm Linux use the file with `aarch64` in the name: ```bash -wget https://github.com/llvm/llvm-project/releases/download/llvmorg-17.0.5/clang+llvm-17.0.5-aarch64-linux-gnu.tar.xz +wget https://github.com/llvm/llvm-project/releases/download/llvmorg-19.1.7/clang+llvm-19.1.7-aarch64-linux-gnu.tar.xz ``` 2. Extract the downloaded file ```bash -tar -xvf clang+llvm-17.0.5-aarch64-linux-gnu.tar.xz +tar -xvf clang+llvm-19.1.7-aarch64-linux-gnu.tar.xz ``` 3. Add the path to BOLT in your `.bashrc` file ```bash -echo 'export PATH="$PATH:$HOME/clang+llvm-17.0.5-aarch64-linux-gnu/bin"' >> ~/.bashrc +echo 'export PATH="$PATH:$HOME/clang+llvm-19.1.7-aarch64-linux-gnu/bin"' >> ~/.bashrc source ~/.bashrc ``` @@ -201,9 +201,8 @@ The output is similar to: ```output LLVM (http://llvm.org/): - LLVM version 18.0.0git + LLVM version 19.1.7 Optimized build with assertions. -BOLT revision 99c15eb49ba0b607314b3bd221f0760049130d97 Registered Targets: aarch64 - AArch64 (little endian) diff --git a/content/install-guides/browsers/brave.md b/content/install-guides/browsers/brave.md index ef5d444f0a..e98528d170 100644 --- a/content/install-guides/browsers/brave.md +++ b/content/install-guides/browsers/brave.md @@ -29,7 +29,15 @@ The Brave browser runs on Windows on Arm as a native ARM64 application, and is a ### Linux -To install Brave on Linux: +There are two options to install Brave on Linux.
+ +To install Brave with a single command: + +```bash +curl -fsS https://dl.brave.com/install.sh | sh +``` + +To install Brave using multiple commands: {{< tabpane code=true >}} {{< tab header="Ubuntu/Debian" language="bash">}} diff --git a/content/install-guides/dcperf.md b/content/install-guides/dcperf.md new file mode 100644 index 0000000000..42f21f6f3e --- /dev/null +++ b/content/install-guides/dcperf.md @@ -0,0 +1,192 @@ +--- +title: DCPerf +author: Kieran Hejmadi +minutes_to_complete: 20 +official_docs: https://github.com/facebookresearch/DCPerf?tab=readme-ov-file#install-and-run-benchmarks + +additional_search_terms: +- linux +- Neoverse + +test_images: +- ubuntu:22.04 +test_maintenance: false + +layout: installtoolsall +multi_install: false +multitool_install_part: false +tool_install: true +weight: 1 +--- + +## Introduction + +DCPerf is an open-source benchmarking and microbenchmarking suite originally developed by Meta. It faithfully replicates the characteristics of general-purpose data center workloads, with particular attention to microarchitectural fidelity. DCPerf stands out for accurate simulation of behaviors such as cache misses and branch mispredictions, which are details that many other benchmarking tools overlook. + +You can use DCPerf to generate performance data to inform procurement decisions, and for regression testing to detect changes in the environment, such as kernel and compiler changes. + +DCPerf runs on Arm-based servers. The examples below have been tested on an AWS `c7g.metal` instance running Ubuntu 22.04 LTS. + +{{% notice Note %}} +When running on a server provided by a cloud service, you have limited access to some parameters, such as UEFI settings, which can affect performance. 
+{{% /notice %}} + +## Install prerequisites + +To get started, install the required software: + +```bash +sudo apt update +sudo apt install -y python-is-python3 python3-pip python3-venv git +``` + +It is recommended that you install Python packages in a Python virtual environment. + +Set up your virtual environment: + +```bash +python3 -m venv venv +source venv/bin/activate +``` +If requested, restart the recommended services. + +Install the required packages: + +```bash +pip3 install click pyyaml tabulate pandas +``` + +Clone the repository: + +```bash +git clone https://github.com/facebookresearch/DCPerf.git +cd DCPerf +``` + +## Running the MediaWiki benchmark + +DCPerf offers many benchmarks. See the official documentation for the benchmark of your choice. + +One example is the MediaWiki benchmark, designed to faithfully reproduce the workload of the Facebook social networking site. + +Install HipHop Virtual Machine (HHVM), a virtual machine used to execute the web application code: + +```bash +wget https://github.com/facebookresearch/DCPerf/releases/download/hhvm/hhvm-3.30-multplatform-binary-ubuntu.tar.xz +tar -Jxf hhvm-3.30-multplatform-binary-ubuntu.tar.xz +cd hhvm +sudo ./pour-hhvm.sh +export LD_LIBRARY_PATH="/opt/local/hhvm-3.30/lib:$LD_LIBRARY_PATH" +``` + +Confirm `hhvm` is available. The `hhvm` binary is located in the `DCPerf/hhvm/aarch64-ubuntu22.04/hhvm-3.30/bin` directory: + +```bash +hhvm --version +# Return to the DCPerf root directory +cd .. 
+``` + +You should see output similar to: + +```output +HipHop VM 3.30.12 (rel) +Compiler: 1704922878_080332982 +Repo schema: 4239d11395efb06bee3ab2923797fedfee64738e +``` + +Confirm security-enhanced Linux (SELinux) is disabled with the following commands: + +```bash +sudo apt install selinux-utils +getenforce +``` + +You should see the following response: + +```output +Disabled +``` + +If you do not see the `Disabled` output, see your Linux distribution documentation for information about how to disable SELinux. + +You can automatically install all dependencies for each benchmark using the `install` argument with the `benchpress_cli.py` command-line script: + +```console +sudo ./benchpress_cli.py install oss_performance_mediawiki_mlp +``` + +This step might take several minutes to complete, depending on your system's download and setup speed. + +## Run the MediaWiki benchmark + +For the sake of brevity, you can provide the duration and timeout arguments using a `JSON` dictionary with the `-i` argument: + +```console +sudo ./benchpress_cli.py run oss_performance_mediawiki_mlp -i '{ + "duration": "30s", + "timeout": "1m" +}' +``` + +While the benchmark is running, you can monitor CPU activity and observe benchmark-related processes using the `top` command. + +When the benchmark is complete, a `benchmark_metrics_*` directory is created within the `DCPerf` directory, containing a `JSON` file for the system specs and another for the metrics. 
+ +For example, the metrics file lists the following: + +```output + "metrics": { + "Combined": { + "Nginx 200": 1817810, + "Nginx 404": 79019, + "Nginx 499": 3, + "Nginx P50 time": 0.036, + "Nginx P90 time": 0.056, + "Nginx P95 time": 0.066, + "Nginx P99 time": 0.081, + "Nginx avg bytes": 158903.93039183, + "Nginx avg time": 0.038826036781319, + "Nginx hits": 1896832, + "Wrk RPS": 3160.65, + "Wrk failed requests": 79019, + "Wrk requests": 1896703, + "Wrk successful requests": 1817684, + "Wrk wall sec": 600.1, + "canonical": 0 + }, + "score": 2.4692578125 +``` + +## Understanding the benchmark results + +The metrics file contains several key performance indicators from the benchmark run: + + +- **Nginx 200, 404, 499**: The number of HTTP responses with status codes 200 (success), 404 (not found), and 499 (client closed request) returned by the Nginx web server during the test. +- **Nginx P50/P90/P95/P99 time**: The response time percentiles (in seconds) for requests handled by Nginx. For example, P50 is the median response time, P99 is the time under which 99% of requests completed. +- **Nginx avg bytes**: The average number of bytes sent per response. +- **Nginx avg time**: The average response time for all requests. +- **Nginx hits**: The total number of requests handled by Nginx. +- **Wrk RPS**: The average number of requests per second (RPS) generated by the `wrk` load testing tool. +- **Wrk failed requests**: The number of requests that failed during the test. +- **Wrk requests**: The total number of requests sent by `wrk`. +- **Wrk successful requests**: The number of requests that completed successfully. +- **Wrk wall sec**: The total wall-clock time (in seconds) for the benchmark run. +- **score**: An overall performance score calculated by DCPerf, which can be used to compare different systems or configurations. + +{{% notice Note %}} + `wrk` is a modern HTTP benchmarking tool used to generate load and measure web server performance. 
It is widely used for benchmarking because it can produce significant load and provides detailed statistics. For more information, see [wrk's GitHub page](https://github.com/wg/wrk). +{{% /notice %}} + +These metrics help you evaluate the performance and reliability of the system under test. Higher values for successful requests and RPS, and lower response times, generally indicate better performance. The score provides a single value for easy comparison across runs or systems. + +## Next steps + +These are some activities you might like to try next: + +* Use the results to compare performance across different systems, hardware configurations, or after making system changes, such as kernel, compiler, or driver updates. + +* Consider tuning system parameters or trying alternative DCPerf benchmarks to further evaluate your environment. + +* Explore additional DCPerf workloads, including those that simulate key-value stores, in-memory caching, or machine learning inference. diff --git a/content/install-guides/fm_fvp/fvp.md b/content/install-guides/fm_fvp/fvp.md index 956ecfbb4a..d5d3a3c283 100644 --- a/content/install-guides/fm_fvp/fvp.md +++ b/content/install-guides/fm_fvp/fvp.md @@ -3,7 +3,7 @@ title: Fixed Virtual Platforms (FVP) minutes_to_complete: 15 official_docs: https://developer.arm.com/documentation/100966/ author: Ronan Synnott -weight: 3 +weight: 3 ### FIXED, DO NOT MODIFY tool_install: false # Set to true to be listed in main selection page, else false @@ -69,3 +69,17 @@ telnetterminal2: Listening for serial connection on port 5002 A visualization of the FVP will also be displayed. Terminate the FVP with `Ctrl+C`. + +{{% notice %}} +You might run into an enablement issue related to the stack: +``` +cannot enable executable stack as shared object requires: Invalid argument +``` +This stems from the status of the exec flag, a security feature which helps prevent certain types of buffer overflow attacks.
FVPs use just-in-time compilation and require an executable stack to function properly. + +You can work around this error using `execstack` on each of the runtime binaries in the error trace. +``` +execstack -c <path-to-binary> +``` +{{% /notice %}} + diff --git a/content/learning-paths/automotive/_index.md b/content/learning-paths/automotive/_index.md index a2d86544b2..f25f1f9cce 100644 --- a/content/learning-paths/automotive/_index.md +++ b/content/learning-paths/automotive/_index.md @@ -11,15 +11,16 @@ subtitle: Build secure, connected, smart IoT devices title: Automotive weight: 4 subjects_filter: -- Containers and Virtualization: 2 +- Containers and Virtualization: 3 - Performance and Architecture: 1 operatingsystems_filter: - Baremetal: 1 -- Linux: 2 +- Linux: 3 - RTOS: 1 tools_software_languages_filter: - Automotive: 1 -- Docker: 1 -- Python: 1 +- Docker: 2 +- Python: 2 - ROS 2: 1 +- ROS2: 1 --- diff --git a/content/learning-paths/automotive/openadkit2_safetyisolation/1_functional_safety.md b/content/learning-paths/automotive/openadkit2_safetyisolation/1_functional_safety.md new file mode 100644 index 0000000000..026b729ba8 --- /dev/null +++ b/content/learning-paths/automotive/openadkit2_safetyisolation/1_functional_safety.md @@ -0,0 +1,143 @@ +--- +title: Functional Safety for automotive software development +weight: 2 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- + +## Why Functional Safety Matters in Automotive Software + +[Functional Safety](https://en.wikipedia.org/wiki/Functional_safety) refers to a system's ability to detect potential faults and respond appropriately to ensure that the system remains in a safe state, preventing harm to individuals or damage to equipment. + +This is particularly important in **automotive, autonomous driving, medical devices, industrial control, robotics, and aerospace** applications, where system failures can lead to severe consequences.
+In software development, Functional Safety focuses on minimizing risks through **software design, testing, and validation** to ensure that critical systems operate in a predictable, reliable, and verifiable manner. This means developers must consider: +- **Error detection mechanisms** +- **Exception handling** +- **Redundancy design** +- **Development processes compliant with safety standards** + +### Definition and Importance of Functional Safety + +The core of Functional Safety lies in **risk management**, which aims to reduce the impact of system failures. + +In autonomous vehicles, Functional Safety ensures that if sensor data is incorrect, the system can enter a **safe state**, preventing incorrect driving decisions. + +The three core objectives of Functional Safety are: +1. **Prevention** + - Reducing the likelihood of errors through rigorous software development processes and testing. For example, in an electric vehicle, the battery system monitors temperature to prevent overheating. +2. **Detection** + - Quickly identifying errors using built-in diagnostic mechanisms (e.g., Built-in Self-Test, BIST). +3. **Mitigation** + - Controlling the impact of failures to ensure the overall safety of the system. + +This approach is critical in applications such as **autonomous driving, flight control, and medical implants**, where failures can result in **severe consequences**. + +### ISO 26262: Automotive Functional Safety Standard + +[ISO 26262](https://www.iso.org/standard/68383.html) is a functional safety standard specifically for **automotive electronics and software systems**. It defines a comprehensive [V-model](https://en.wikipedia.org/wiki/V-model) aligned safety lifecycle, covering all phases from **requirement analysis, design, development, and testing to maintenance**.
+ +Key Concepts of ISO 26262: +- **ASIL (Automotive Safety Integrity Level)** + - Evaluates the risk level of different system components (A, B, C, D, where **D represents the highest safety requirement**). + - For example, ASIL A might apply to a dashboard light failure (low risk), while ASIL D applies to a brake system failure (high risk); see [Automotive Safety Integrity Level](https://en.wikipedia.org/wiki/Automotive_Safety_Integrity_Level) for details. +- **HARA (Hazard Analysis and Risk Assessment)** + - Analyzes hazards and assesses risks to determine necessary safety measures. +- **Safety Mechanisms** + - Includes real-time error detection, system-level fault tolerance, and defined fail-safe or fail-operational fallback states. + +Typical Application Scenarios: +- **Autonomous Driving Systems**: + - Ensures that even if sensors (e.g., LiDAR, radar, cameras) provide faulty data, the vehicle will not make dangerous decisions. +- **Powertrain Control**: + - Prevents braking system failures that could lead to loss of control. +- **Battery Management System (BMS)**: + - Prevents battery overheating or excessive discharge in electric vehicles. + +For more details, you can watch this video: [What is Functional Safety?](https://www.youtube.com/watch?v=R0CPzfYHdpQ) + + +### Common Use Cases of Functional Safety in Automotive +- **Autonomous Driving**: + - Ensures the vehicle can operate safely or enter a fail-safe state when sensors like LiDAR, radar, or cameras malfunction. + - Functional Safety enables real-time fault detection and fallback logic to prevent unsafe driving decisions. + +- **Powertrain Control**: + - Monitors throttle and brake signals to prevent unintended acceleration or braking loss. + - Includes redundancy, plausibility checks, and emergency overrides to maintain control under failure conditions. + +- **Battery Management Systems (BMS)**: + - Protects EV batteries from overheating, overcharging, or deep discharge.
+ - Safety functions include temperature monitoring, voltage balancing, and relay cut-off mechanisms to prevent thermal runaway. + +These use cases highlight the need for a dedicated architectural layer that can enforce Functional Safety principles with real-time guarantees. +A widely adopted approach in modern automotive platforms is the Safety Island—an isolated compute domain designed to execute critical control logic independently of the main system. + +### Safety Island: Enabling Functional Safety in Autonomous Systems + +In automotive systems, a **General ECU (Electronic Control Unit)** typically runs non-critical tasks such as infotainment or navigation, whereas a **Safety Island** is dedicated to executing safety-critical control logic (e.g., braking, steering) with strong isolation, redundancy, and determinism. + +The table below compares the characteristics of a General ECU and a Safety Island in terms of their role in supporting Functional Safety. + +| Feature | General ECU | Safety Island | +|------------------------|----------------------------|--------------------------------------| +| Purpose | Comfort / non-safety logic | Safety-critical decision making | +| OS/Runtime | Linux, Android | RTOS, Hypervisor, or bare-metal | +| Isolation | Soft partitioning | Hard isolation (hardware-enforced) | +| Functional Safety Req | None to moderate | ISO 26262 ASIL-B to ASIL-D compliant | +| Fault Handling | Best-effort recovery | Deterministic safe-state response | + +This contrast highlights why safety-focused software needs a dedicated hardware domain with certified execution behavior. + +**Safety Island** is an independent safety subsystem separate from the main processor. It is responsible for monitoring and managing system safety. If the main processor fails or becomes inoperable, Safety Island can take over critical safety functions such as **deceleration, stopping, and fault handling** to prevent catastrophic system failures. 
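The monitoring and takeover role described above can be sketched as a minimal heartbeat watchdog. This is an illustrative sketch only (the class name, states, and timeout are hypothetical, and a real Safety Island runs this logic on independent, certified hardware rather than in Python):

```python
import time

class SafetyIslandWatchdog:
    """Illustrative sketch: watch heartbeats from the main processor
    and fall back to a safe state when they stop arriving."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self.last_heartbeat = time.monotonic()
        self.state = "NORMAL"

    def heartbeat(self) -> None:
        # Called by the main ECU on every healthy control cycle.
        self.last_heartbeat = time.monotonic()

    def check(self) -> str:
        # Called periodically by the Safety Island's own scheduler.
        # Once in FAIL_SAFE, stay there until a supervised recovery.
        if time.monotonic() - self.last_heartbeat > self.timeout_s:
            self.state = "FAIL_SAFE"  # e.g. decelerate and stop
        return self.state

watchdog = SafetyIslandWatchdog(timeout_s=0.1)
watchdog.heartbeat()
print(watchdog.check())  # NORMAL while heartbeats are fresh
time.sleep(0.2)          # simulate an unresponsive main processor
print(watchdog.check())  # FAIL_SAFE after the timeout expires
```

In a production system, the check loop, the fail-safe action, and the heartbeat source would all fall under the ASIL requirements discussed above.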
+ +Key Capabilities of Safety Island +- **System Health Monitoring** + - Continuously monitors the operational status of the main processor (e.g., ADAS control unit, ECU) and detects potential errors or anomalies. +- **Fault Detection and Isolation** + - Independently evaluates and initiates emergency handling if the main processing unit encounters errors, overheating, computational failures, or unresponsiveness. +- **Providing Essential Safety Functions** + - Even if the main system crashes, Safety Island can still execute minimal safety operations, such as: + - Autonomous Vehicles → Safe stopping (Fail-Safe Mode) + - Industrial Equipment → Emergency power cutoff or speed reduction + + +### Why Safety Island Matters for Functional Safety + +Safety Island plays a critical role in Functional Safety by ensuring that the system can handle high-risk scenarios and minimize catastrophic failures. + +How Safety Island Enhances Functional Safety +1. **Acts as an Independent Redundant Safety Layer** + - Even if the main system fails, it can still operate independently. +2. **Supports ASIL-D Safety Level** + - Monitors ECU health status and executes emergency safety strategies (e.g., emergency braking). +3. **Provides Independent Fault Detection and Recovery Mechanisms** + - **Fail-Safe**: Activates a **safe mode**, such as limiting vehicle speed or switching to manual control. + - **Fail-Operational**: Ensures that high-safety applications (e.g., aerospace systems) can continue operating under certain conditions. + +For more insights on **Arm's Functional Safety solutions**, you can refer to: [Arm Functional Safety Compute Blog](https://community.arm.com/arm-community-blogs/b/automotive-blog/posts/functional-safety-compute) + + +### Functional Safety in the Software Development Lifecycle + +Functional Safety impacts **both hardware and software development**, particularly in areas such as requirement changes, version management, and testing validation. 
+For example, in ASIL-D level applications, every code modification requires a complete impact analysis and regression testing to ensure that new changes do not introduce additional risks. + +### Functional Safety Requirements in Software Development +These practices ensure the software development process meets industry safety standards and can withstand system-level failures: +- **Requirement Specification** + - Clearly defining **safety-critical requirements** and conducting risk assessments. +- **Safety-Oriented Programming** + - Following **MISRA C, CERT C/C++ standards** and using static analysis tools to detect errors. +- **Fault Handling Mechanisms** + - Implementing **redundancy design and health monitoring** to handle anomalies. +- **Testing and Verification** + - Using **Hardware-in-the-Loop (HIL)** testing to ensure software safety in real hardware environments. +- **Version Management and Change Control** + - Using **Git, JIRA, Polarion** to track changes for safety audits. + +This learning path builds upon the previous containerized [learning path](https://learn.arm.com/learning-paths/automotive/openadkit1_container) guide and introduces Functional Safety design practices from the earliest development stages. + +By establishing an ASIL Partitioning software development environment and leveraging [**SOAFEE**](https://www.soafee.io/) technologies, developers can enhance software consistency and maintainability in Functional Safety applications. 
diff --git a/content/learning-paths/automotive/openadkit2_safetyisolation/2_data_distribution_service.md b/content/learning-paths/automotive/openadkit2_safetyisolation/2_data_distribution_service.md new file mode 100644 index 0000000000..64aab3cab5 --- /dev/null +++ b/content/learning-paths/automotive/openadkit2_safetyisolation/2_data_distribution_service.md @@ -0,0 +1,89 @@ +--- +title: How to Use Data Distribution Service (DDS) +weight: 3 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- + +### Introduction to DDS +Data Distribution Service (DDS) is a real-time, high-performance middleware designed for distributed systems. +It is particularly valuable in automotive software development, including applications such as **autonomous driving (AD)** and **advanced driver assistance systems (ADAS)**. + +DDS offers a decentralized architecture that enables scalable, low-latency, and reliable data exchange—making it ideal for managing high-frequency sensor streams. + +In modern vehicles, multiple sensors (LiDAR, radar, cameras) must continuously communicate with compute modules. + +DDS ensures these components share data seamlessly and in real time, both within the vehicle and across infrastructure (e.g., V2X systems like traffic lights and road sensors). + + +### Why Automotive Software Needs DDS + +Next-generation automotive software architectures, such as [SOAFEE](https://www.soafee.io/), depend on deterministic, distributed communication. Traditional client-server models introduce latency and single points of failure, while DDS’s publish-subscribe model enables direct, peer-to-peer communication across system components. + +For example, a LiDAR sensor broadcasting obstacle data can simultaneously deliver updates to perception, SLAM, and motion planning modules—without redundant network traffic or central coordination.
+ +Additionally, DDS provides a flexible Quality of Service (QoS) configuration, allowing engineers to fine-tune communication parameters based on system requirements. Low-latency modes are ideal for real-time decision-making in vehicle control, while high-reliability configurations ensure data integrity in safety-critical applications like V2X communication. + +These capabilities make DDS an essential backbone for autonomous vehicle stacks, where real-time sensor fusion and control coordination are critical for safety and performance. + +### DDS Architecture and Operation + +DDS uses a **data-centric publish-subscribe (DCPS)** model, allowing producers and consumers of data to communicate without direct dependencies. This modular approach enhances system flexibility and maintainability, making it well-suited for complex automotive environments. + +DDS organizes communication within **domains**, which act as isolated scopes. Inside each domain: +- ***Topics*** represent named data streams (e.g., /vehicle/speed, /perception/objects) +- ***DataWriters*** (publishers) send data to topics +- ***DataReaders*** (subscribers) receive data from topics +This structure enables concurrent, decoupled communication between multiple modules without hardcoding communication links. + +Each domain contains multiple **topics**, representing specific data types such as vehicle speed, obstacle detection, or sensor fusion results. **Publishers** use **DataWriters** to send data to these topics, while **subscribers** use **DataReaders** to receive the data. This architecture supports concurrent data processing, ensuring that multiple modules can work with the same data stream simultaneously. + +For example, in an autonomous vehicle, LiDAR, radar, and cameras continuously generate large amounts of sensor data. 
The perception module subscribes to these sensor topics, processes the data, and then publishes detected objects and road conditions to other components like path planning and motion control. Since DDS automatically handles participant discovery and message distribution, engineers do not need to manually configure communication paths, reducing development complexity. + + +### Real-World Use in Autonomous Driving +DDS is widely used in autonomous driving systems, where real-time data exchange is crucial. A typical use case involves high-frequency sensor data transmission and decision-making coordination between vehicle subsystems. + +For instance, a LiDAR sensor generates millions of data points per second, which need to be shared with multiple modules. DDS allows this data to be published once and received by multiple subscribers, including perception, localization, and mapping components. After processing, the detected objects and road features are forwarded to the path planning module, which calculates the vehicle's next movement. Finally, control commands are sent to the vehicle actuators, ensuring precise execution. + +This real-time data flow must occur within milliseconds to enable safe autonomous driving. DDS ensures minimal transmission delay, enabling rapid response to dynamic road conditions. In emergency scenarios, such as detecting a pedestrian or sudden braking by a nearby vehicle, DDS facilitates instant data propagation, allowing the system to take immediate corrective action. + +For example, [Autoware](https://www.autoware.org/), an open-source autonomous driving software stack, uses DDS to handle high-throughput communication across its modules. + +The **Perception** stack publishes detected objects from LiDAR and camera sensors to a shared topic, which is then consumed by the **Planning** module in real time. Using DDS allows each subsystem to scale independently while preserving low-latency, deterministic communication.
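The decoupled, topic-based flow described above can be sketched with a minimal in-process model. This is illustrative only (it is not a real DDS API, and the topic name and sample format are hypothetical), but it shows the one-to-many fan-out that DDS performs for you:

```python
from collections import defaultdict
from typing import Any, Callable

class Domain:
    """Illustrative in-process stand-in for a DDS domain:
    named topics with decoupled writers and readers."""

    def __init__(self) -> None:
        self._readers: dict[str, list[Callable[[Any], None]]] = defaultdict(list)

    def create_reader(self, topic: str, on_data: Callable[[Any], None]) -> None:
        # A reader subscribes to a topic by name only; it never
        # learns who the writers are.
        self._readers[topic].append(on_data)

    def write(self, topic: str, sample: Any) -> None:
        # A writer publishes once; every reader of the topic receives
        # the sample (one-to-many fan-out, no central broker).
        for on_data in self._readers[topic]:
            on_data(sample)

domain = Domain()
received = []
domain.create_reader("/perception/objects", lambda s: received.append(("planning", s)))
domain.create_reader("/perception/objects", lambda s: received.append(("control", s)))
domain.write("/perception/objects", {"object": "pedestrian", "range_m": 12.4})
print(received)  # both readers got the same sample from a single write
```

A real DDS implementation adds everything this sketch omits: network transport, automatic participant discovery, and per-topic QoS policies.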
+ +### Publish-Subscribe Model and Data Transmission +Let’s explore how DDS’s publish-subscribe model fundamentally differs from traditional communication methods in terms of scalability, latency, and reliability. + +Traditional client-server communication requires a centralized server to manage data exchange. This architecture introduces several drawbacks, including increased latency and network congestion, which can be problematic in real-time automotive applications. + +DDS adopts a publish-subscribe model, enabling direct communication between system components. Instead of relying on a central entity to relay messages, DDS allows each participant to subscribe to relevant topics and receive updates as soon as new data becomes available. This approach reduces dependency on centralized infrastructure and improves overall system performance. + +For example, in an automotive perception system, LiDAR, radar, and cameras continuously publish sensor data. Multiple subscribers, including object detection, lane recognition, and obstacle avoidance modules, can access this data simultaneously without additional network overhead. DDS automatically manages message distribution, ensuring efficient resource utilization. + +DDS supports multiple transport mechanisms to optimize communication efficiency: +- **Shared memory transport**: Ideal for ultra-low-latency communication within an ECU, minimizing processing overhead. +- **UDP or TCP/IP**: Used for inter-device communication, such as V2X applications where vehicles exchange safety-critical messages. +- **Automatic participant discovery**: Eliminates the need for manual configuration, allowing DDS nodes to detect and establish connections dynamically. 
+ +#### Comparison of DDS and Traditional Communication Methods + +The following table highlights how DDS improves upon traditional client-server communication patterns in the context of real-time automotive applications: + +| **Feature** | **Traditional Client-Server Architecture** | **DDS Publish-Subscribe Model** | +|-----------------------|--------------------------------------------|--------------------------- | +| **Data Transmission** | Relies on a central server | Direct peer-to-peer communication | +| **Latency** | Higher latency | Low latency | +| **Scalability** | Limited by server capacity | Suitable for large-scale systems | +| **Reliability** | Server failure affects the whole system | No single point of failure | +| **Use Cases** | Small-scale applications | V2X, autonomous driving | + +These features make DDS a highly adaptable solution for automotive software engineers seeking to develop scalable, real-time communication frameworks. + +In this section, you learned how DDS enables low-latency, scalable, and fault-tolerant communication for autonomous vehicle systems. + +Its data-centric publish-subscribe architecture eliminates the limitations of traditional client-server models and forms the backbone of modern automotive software frameworks such as ROS 2 and SOAFEE. + +To get started with open-source DDS on Arm platforms, see the [CycloneDDS install guide](https://learn.arm.com/install-guides/cyclonedds) to install open-source DDS on an Arm platform.
+
diff --git a/content/learning-paths/automotive/openadkit2_safetyisolation/3_container_spliting.md b/content/learning-paths/automotive/openadkit2_safetyisolation/3_container_spliting.md
new file mode 100644
index 0000000000..52a0b56131
--- /dev/null
+++ b/content/learning-paths/automotive/openadkit2_safetyisolation/3_container_spliting.md
@@ -0,0 +1,372 @@
+---
+title: Split into multiple cloud container instances
+weight: 4
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+### System Architecture and Component Design
+
+Now that you’ve explored the concept of a Safety Island -- a dedicated subsystem responsible for executing safety-critical control logic -- and learned how DDS (Data Distribution Service) enables real-time, distributed communication, you’ll refactor the original OpenAD Kit architecture into a multi-instance deployment.
+
+In the [previous learning path](http://learn.arm.com/learning-paths/automotive/openadkit1_container/), you deployed three OpenAD Kit container components on a single Arm-based instance:
+- ***Simulation environment***
+- ***Visualization***
+- ***Planning-Control***
+
+In this section, you will split the simulation and visualization stack from the planning-control logic and deploy them across two independent Arm-based instances.
+
+These nodes communicate using ROS 2 with DDS as the middleware layer, ensuring low-latency, fault-tolerant data exchange between components.
+
+### Architectural Benefits
+This architecture brings several practical benefits:
+
+- ***Enhanced System Stability:***
+Decoupling components prevents resource contention and ensures that safety-critical functions remain deterministic and responsive.
+
+- ***Real-Time, Scalable Communication:***
+DDS enables built-in peer discovery and configurable QoS, removing the need for a central broker or manual network setup.
+
+- ***Improved Scalability and Performance Tuning:***
+Each instance can be tuned based on its workload—e.g., simulation tasks can use GPU-heavy hardware, while planning logic may benefit from CPU-optimized setups.
+
+- ***Support for Modular CI/CD Workflows:***
+With containerized separation, you can build, test, and deploy each module independently—enabling agile development and faster iteration cycles.
+
+![img1 alt-text#center](aws_example.jpg "Figure 1: Split instance example in AWS")
+
+
+### Network Settings
+
+To begin, launch two Arm-based instances—either as cloud VMs (e.g., AWS EC2) or on-premise Arm servers.
+These instances will independently host your simulation and control workloads.
+
+{{% notice Note %}}
+The specifications of the two Arm instances don’t need to be identical. In testing, 16 CPUs and 32 GB of RAM provided good performance.
+{{% /notice %}}
+
+After provisioning the machines, decide which instance will run the `Planning-Control` container.
+The other instance will host the `Simulation Environment` and `Visualization` components.
+
+To enable ROS 2 and DDS communication between the two nodes, configure network access accordingly.
+If you are using AWS EC2, assign both instances to the same ***Security Group***.
+
+Within the EC2 Security Group settings:
+- Add an Inbound Rule that allows all traffic from the same Security Group (i.e., set the source to the group itself).
+- Outbound traffic is typically allowed by default and usually does not require changes.
+
+![img2 alt-text#center](security_group.jpg "Figure 2: AWS Security Group Setting")
+
+This configuration allows automatic discovery and peer-to-peer communication between DDS participants across the two instances.
+
+Once both systems are operational, record the private IP addresses of each instance. You will need them when configuring CycloneDDS peer discovery in the next step.
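As a quick check, you can print each instance's primary private IPv4 address from a shell on that instance. This is a convenience sketch; adjust it if your instance has multiple network interfaces.

```bash
# Print this instance's primary private IPv4 address.
# Record it for the CycloneDDS peer configuration in the next step.
hostname -I | awk '{print $1}'
```

You can also run `ip -br a` to note the network interface name, which you will need when editing the CycloneDDS configuration file.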
+
+### New Docker Compose Configuration
+
+Before you begin, ensure that Docker is installed on both of your development instances.
+You will also need to clone the demo repository used in the previous learning path.
+
+#### Step 1: Clone the repository and prepare configuration files
+
+First, clone the demo repository and create an empty XML file called `cycloneDDS.xml`:
+
+```bash
+git clone https://github.com/odincodeshen/openadkit_demo.autoware.git
+
+cd openadkit_demo.autoware
+cp docker/docker-compose.yml docker/docker-compose-2ins.yml
+touch docker/cycloneDDS.xml
+```
+
+This creates a duplicate Compose configuration (`docker-compose-2ins.yml`) and an empty CycloneDDS configuration file to be shared across containers.
+
+To ensure a smooth experience during testing and simulation, it’s a good idea to pull all required container images before moving on.
+
+This avoids interruptions in later steps when you run `docker compose up` or `docker compose run`, especially during the cross-instance DDS validation or full scenario launch.
+
+Run the following command in your project directory:
+
+```bash
+docker compose -f docker/docker-compose-2ins.yml pull
+```
+
+{{% notice info %}}
+Each image is around 4–6 GB, so pull times vary depending on your network speed.
+{{% /notice %}}
+
+This command downloads all images defined in the `docker-compose-2ins.yml` file, including:
+- ***odinlmshen/autoware-simulator:v1.0***
+- ***odinlmshen/autoware-planning-control:v1.0***
+- ***odinlmshen/autoware-visualizer:v1.0***
+
+#### Step 2: Configure CycloneDDS for Peer-to-Peer Communication
+
+The `cycloneDDS.xml` file customizes how CycloneDDS (the DDS middleware used by ROS 2) discovers and communicates with distributed nodes.
+
+Copy the following configuration into `docker/cycloneDDS.xml` on both machines, and replace the IP addresses with the private IPs of each EC2 instance (e.g., 192.168.xx.yy and 192.168.aa.bb):
+
+```xml
+<?xml version="1.0" encoding="UTF-8" ?>
+<CycloneDDS xmlns="https://cdds.io/config">
+  <Domain Id="any">
+    <General>
+      <Interfaces>
+        <NetworkInterface name="ens5"/>
+      </Interfaces>
+      <AllowMulticast>false</AllowMulticast>
+    </General>
+    <Discovery>
+      <Peers>
+        <Peer address="192.168.xx.yy"/>
+        <Peer address="192.168.aa.bb"/>
+      </Peers>
+      <MaxAutoParticipantIndex>1000</MaxAutoParticipantIndex>
+      <ParticipantIndex>auto</ParticipantIndex>
+    </Discovery>
+    <Tracing>
+      <OutputFile>/root/workspace/cyclonelog.log</OutputFile>
+      <Verbosity>config</Verbosity>
+    </Tracing>
+  </Domain>
+</CycloneDDS>
+```
+
+{{% notice Note %}}
+1. Make sure the network interface name (`ens5`) matches the one on your EC2 instances. You can verify this using `ip -br a`.
+2. This configuration disables multicast and enables static peer discovery between the two machines using unicast.
+3. You can find more detail about CycloneDDS settings in the [configuration file reference](https://cyclonedds.io/docs/cyclonedds/latest/config/config_file_reference.html#cyclonedds-domain-internal-socketreceivebuffersize).
+{{% /notice %}}
+
+
+#### Step 3: Update the Docker Compose Configuration for Multi-Host Deployment
+
+To support running containers across two separate hosts, you’ll need to modify the `docker/docker-compose-2ins.yml` file.
+This includes removing inter-container dependencies and updating the network and environment configuration.
+
+##### Remove Cross-Container Dependency
+
+Since the planning-control and simulator containers will now run on different machines, you must remove any `depends_on` references between them to prevent Docker from attempting to start them on the same host.
+
+```YAML
+  planning-control:
+    # Remove this block
+    # depends_on:
+    #   - simulator
+```
+
+##### Enable Host Networking
+All three containers (visualizer, simulator, planning-control) need access to the host’s network interfaces for DDS-based peer discovery.
+
+Replace Docker's default bridge network with host networking:
+
+```YAML
+  visualizer:
+    network_mode: host
+```
+
+##### Use CycloneDDS Configuration via Environment Variable
+
+To ensure that each container uses your custom DDS configuration, mount the current working directory and set the `CYCLONEDDS_URI` environment variable:
+
+```YAML
+    volumes:
+      - .:/root/workspace
+    environment:
+      - CYCLONEDDS_URI=/root/workspace/cycloneDDS.xml
+```
+
+Add this to every container definition to ensure consistent behavior across the deployment.
+
+Here is the complete `docker-compose-2ins.yml` file:
+```YAML
+services:
+  simulator:
+    image: odinlmshen/autoware-simulator:v1.0
+    container_name: simulator
+    network_mode: host
+    volumes:
+      - ./etc/simulation:/autoware/scenario-sim
+      - .:/root/workspace
+    environment:
+      - ROS_DOMAIN_ID=88
+      - RMW_IMPLEMENTATION=rmw_cyclonedds_cpp
+      - CYCLONEDDS_URI=/root/workspace/cycloneDDS.xml
+    command: >
+      ros2 launch scenario_test_runner scenario_test_runner.launch.py
+      record:=false
+      scenario:=/autoware/scenario-sim/scenario/yield_maneuver_demo.yaml
+      sensor_model:=sample_sensor_kit
+      vehicle_model:=sample_vehicle
+      initialize_duration:=90
+      global_timeout:=$TIMEOUT
+      global_frame_rate:=20
+      launch_autoware:=false
+      launch_rviz:=false
+
+  planning-control:
+    image: odinlmshen/autoware-planning-control:v1.0
+    container_name: planning-control
+    network_mode: host
+    volumes:
+      - ./etc/simulation:/autoware/scenario-sim
+      - $CONF_FILE:/opt/autoware/share/autoware_launch/config/planning/scenario_planning/lane_driving/behavior_planning/behavior_path_planner/autoware_behavior_path_static_obstacle_avoidance_module/static_obstacle_avoidance.param.yaml
+      - $COMMON_FILE:/opt/autoware/share/autoware_launch/config/planning/scenario_planning/common/common.param.yaml
+      - .:/root/workspace
+    environment:
+      - ROS_DOMAIN_ID=88
+      - RMW_IMPLEMENTATION=rmw_cyclonedds_cpp
+      - CYCLONEDDS_URI=/root/workspace/cycloneDDS.xml
+    command: >
+      ros2 launch autoware_launch 
planning_simulator.launch.xml
+      map_path:=/autoware/scenario-sim/map
+      vehicle_model:=sample_vehicle
+      sensor_model:=sample_sensor_kit
+      scenario_simulation:=true
+      rviz:=false
+      perception/enable_traffic_light:=false
+
+  visualizer:
+    image: odinlmshen/autoware-visualizer:v1.0
+    network_mode: host
+    container_name: visualizer
+    volumes:
+      - ./etc/simulation:/autoware/scenario-sim
+      - .:/root/workspace
+    ports:
+      - 6080:6080
+      - 5999:5999
+    environment:
+      - ROS_DOMAIN_ID=88
+      - VNC_ENABLED=true
+      - RVIZ_CONFIG=/autoware/scenario-sim/rviz/scenario_simulator.rviz
+      - NGROK_AUTHTOKEN=${NGROK_AUTHTOKEN}
+      - NGROK_URL=${NGROK_URL}
+      - CYCLONEDDS_URI=/root/workspace/cycloneDDS.xml
+    command: >-
+      sleep infinity
+```
+
+Before moving to the next step, make sure that `docker-compose-2ins.yml` and `cycloneDDS.xml` are already present on both instances.
+
+#### Step 4: Optimize Network Settings for DDS Communication
+
+In a distributed DDS setup, high-frequency UDP traffic between nodes can lead to IP packet fragmentation or buffer overflows, especially under load.
+These issues can degrade performance or cause unexpected system behavior.
+
+To mitigate them, apply the following kernel settings on both instances:
+
+```bash
+sudo sysctl -w net.ipv4.ipfrag_time=3
+sudo sysctl -w net.ipv4.ipfrag_high_thresh=134217728
+sudo sysctl -w net.core.rmem_max=2147483647
+```
+
+Explanation of the parameters:
+- ***net.ipv4.ipfrag_time=3***: Reduces the timeout for holding incomplete IP fragments, helping free up memory more quickly.
+- ***net.ipv4.ipfrag_high_thresh=134217728***: Increases the memory threshold for IP fragment buffers to 128 MB, preventing early drops under high load.
+- ***net.core.rmem_max=2147483647***: Expands the maximum socket receive buffer size to support high-throughput DDS traffic.
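After applying the settings, you can read the values back to confirm they are active. These are standard Linux sysctl keys; the values printed should match what you set above.

```bash
# Read back the current values to verify the tuning took effect.
sysctl net.ipv4.ipfrag_time net.ipv4.ipfrag_high_thresh net.core.rmem_max
```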
+
+To ensure these settings persist after reboot, create a configuration file under `/etc/sysctl.d/`:
+
+```bash
+sudo bash -c 'cat > /etc/sysctl.d/10-cyclone-max.conf <<EOF
+net.ipv4.ipfrag_time=3
+net.ipv4.ipfrag_high_thresh=134217728
+net.core.rmem_max=2147483647
+EOF'
+```
+
+#### Step 5: Create the Launch Scripts
+
+Each instance runs a different set of containers, so each needs its own launch script:
+
+{{< tabpane code=true >}}
+  {{< tab header="Planning-Control" language="bash">}}
+  #!/bin/bash
+  # Configure the environment variables
+  export SCRIPT_DIR=/home/ubuntu/openadkit_demo.autoware/docker
+  CONF_FILE_PASS=$SCRIPT_DIR/etc/simulation/config/pass_static_obstacle_avoidance.param.yaml
+  CONF_FILE_FAIL=$SCRIPT_DIR/etc/simulation/config/fail_static_obstacle_avoidance.param.yaml
+
+  export CONF_FILE=$CONF_FILE_FAIL
+  export COMMON_FILE=$SCRIPT_DIR/etc/simulation/config/common.param.yaml
+  export NGROK_AUTHTOKEN=$NGROK_AUTHTOKEN
+  export NGROK_URL=$NGROK_URL
+
+  # Start planning-control
+  echo "Running planning v1.."
+  TIMEOUT=120 CONF_FILE=$CONF_FILE_PASS docker compose -f "$SCRIPT_DIR/docker-compose-2ins.yml" up planning-control -d
+  {{< /tab >}}
+
+  {{< tab header="Visualizer & Simulator" language="bash">}}
+  #!/bin/bash
+  SCRIPT_DIR=/home/ubuntu/openadkit_demo.autoware/docker
+
+  export CONF_FILE_FAIL=$SCRIPT_DIR/etc/simulation/config/fail_static_obstacle_avoidance.param.yaml
+  export CONF_FILE=$CONF_FILE_FAIL
+  export COMMON_FILE=$SCRIPT_DIR/etc/simulation/config/common.param.yaml
+  export NGROK_AUTHTOKEN=$NGROK_AUTHTOKEN
+  export NGROK_URL=$NGROK_URL
+  export TIMEOUT=300
+
+  # Start visualizer once
+  docker compose -f "$SCRIPT_DIR/docker-compose-2ins.yml" up visualizer -d
+  echo "Waiting 10 seconds for visualizer to start..."
+  sleep 10
+
+  # Run simulator scenario 3 times
+  for i in {1..3}; do
+    echo "Running simulator demo round $i..."
+    docker compose -f "$SCRIPT_DIR/docker-compose-2ins.yml" run --rm simulator
+    echo "Round $i complete. Waiting 5 seconds before next run..."
+    sleep 5
+  done
+  echo "All simulator runs complete."
+  {{< /tab >}}
+{{< /tabpane >}}
+
+You can also find the prepared launch scripts, `opad_planning.sh` and `opad_sim_vis.sh`, inside the `openadkit_demo.autoware/docker` directory on both instances.
+
+These scripts encapsulate the required environment variables and container commands for each role.
+
+#### Running the Distributed OpenAD Kit Demo
+
+On the Planning-Control node, execute:
+
+```bash
+./opad_planning.sh
+```
+
+On the Simulation and Visualization node, execute:
+
+```bash
+./opad_sim_vis.sh
+```
+
+Once both machines are running their respective launch scripts, the Visualizer will generate a web-accessible interface using the machine’s public IP address.
+You can open this link in a browser to observe the demo behavior, which will closely resemble the output from the [previous learning path](http://learn.arm.com/learning-paths/automotive/openadkit1_container/4_run_openadkit/).
+
+![img3 alt-text#center](split_aws_run.gif "Figure 4: Simulation")
+
+Unlike the previous setup, the containers are now distributed across two separate instances, enabling real-time, cross-node communication.
+Behind the scenes, this architecture demonstrates how DDS manages low-latency, peer-to-peer data exchange in a distributed ROS 2 environment.
+
+To support demonstration and validation, the simulator is configured to run three times sequentially, giving you multiple opportunities to observe how data flows between nodes and verify that communication remains stable across each cycle.
+
+Now that you’ve seen the distributed system in action, consider exploring different QoS settings, network conditions, or even adding a third node to expand the architecture further.
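As a starting point for experimenting with QoS, the sketch below contrasts best-effort and reliable delivery over a lossy link. It is a conceptual simulation, not DDS code: the `deliver` function and the 20% drop rate are hypothetical, while real middleware such as CycloneDDS implements RELIABLE QoS with acknowledgements and retransmission over the network.

```python
import random

def deliver(samples, drop_rate, reliable, rng):
    """Simulate delivery of samples over a link that randomly drops packets."""
    received = []
    for sample in samples:
        delivered = rng.random() >= drop_rate
        if not delivered and reliable:
            # A RELIABLE writer retransmits until the reader acknowledges,
            # so the sample eventually arrives (at the cost of latency).
            delivered = True
        if delivered:
            received.append(sample)
    return received

samples = list(range(100))
best_effort = deliver(samples, 0.2, reliable=False, rng=random.Random(42))
reliable = deliver(samples, 0.2, reliable=True, rng=random.Random(42))

# Best-effort loses some samples; reliable delivers all of them.
print(len(best_effort), len(reliable))
```

The trade-off this illustrates is the one you tune with DDS QoS: reliable delivery guarantees completeness, while best-effort minimizes latency for high-rate sensor streams where stale samples are better dropped.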
diff --git a/content/learning-paths/automotive/openadkit2_safetyisolation/_index.md b/content/learning-paths/automotive/openadkit2_safetyisolation/_index.md
new file mode 100644
index 0000000000..073dc79f51
--- /dev/null
+++ b/content/learning-paths/automotive/openadkit2_safetyisolation/_index.md
@@ -0,0 +1,54 @@
+---
+title: Prototyping Safety-Critical Isolation for Autonomous Applications on Neoverse
+
+draft: true
+cascade:
+  draft: true
+
+minutes_to_complete: 60
+
+who_is_this_for: This Learning Path targets advanced automotive software engineers developing safety-critical systems. It demonstrates how to use Arm Neoverse cloud infrastructure to accelerate ISO-26262-compliant software prototyping and testing workflows.
+
+learning_objectives:
+    - Learn the functional safety principles, including risk prevention, fault detection, and ASIL compliance, required to design robust and certifiable automotive software systems.
+    - Understand how DDS enables low-latency, scalable, and fault-tolerant data communication for autonomous driving systems using a publish-subscribe architecture.
+    - Learn how to split the simulation platform into two independent units and use a distributed development architecture to support functional safety.
+
+prerequisites:
+    - Two Arm-based Neoverse cloud instances or a local Arm Neoverse Linux computer with at least 16 CPUs and 32 GB of RAM.
+    - Completion of the [previous learning path](http://learn.arm.com/learning-paths/automotive/openadkit1_container/).
+    - Basic knowledge of Docker operations.
+
+author:
+    - Odin Shen
+    - Julien Jayat
+
+### Tags
+skilllevels: Advanced
+subjects: Containers and Virtualization
+armips:
+    - Neoverse
+tools_software_languages:
+    - Python
+    - Docker
+    - ROS2
+operatingsystems:
+    - Linux
+
+further_reading:
+    - resource:
+        title: Eclipse zenoh GitHub repository
+        link: https://github.com/eclipse-zenoh/zenoh
+        type: documentation
+    - resource:
+        title: Eclipse Cyclone DDS
+        link: https://github.com/eclipse-cyclonedds/cyclonedds
+        type: documentation
+
+
+### FIXED, DO NOT MODIFY
+# ================================================================================
+weight: 1 # _index.md always has weight of 1 to order correctly
+layout: "learningpathall" # All files under learning paths have this same wrapper
+learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
+---
diff --git a/content/learning-paths/embedded-and-microcontrollers/Egde/_next-steps.md b/content/learning-paths/automotive/openadkit2_safetyisolation/_next-steps.md
similarity index 100%
rename from content/learning-paths/embedded-and-microcontrollers/Egde/_next-steps.md
rename to content/learning-paths/automotive/openadkit2_safetyisolation/_next-steps.md
diff --git a/content/learning-paths/automotive/openadkit2_safetyisolation/aws_example.jpg b/content/learning-paths/automotive/openadkit2_safetyisolation/aws_example.jpg
new file mode 100644
index 0000000000..c255da513f
Binary files /dev/null and b/content/learning-paths/automotive/openadkit2_safetyisolation/aws_example.jpg differ
diff --git a/content/learning-paths/automotive/openadkit2_safetyisolation/security_group.jpg b/content/learning-paths/automotive/openadkit2_safetyisolation/security_group.jpg
new file mode 100644
index 0000000000..6dcf545c48
Binary files /dev/null and b/content/learning-paths/automotive/openadkit2_safetyisolation/security_group.jpg differ
diff --git
a/content/learning-paths/automotive/openadkit2_safetyisolation/split_aws_run.gif b/content/learning-paths/automotive/openadkit2_safetyisolation/split_aws_run.gif new file mode 100644 index 0000000000..fd36faa13b Binary files /dev/null and b/content/learning-paths/automotive/openadkit2_safetyisolation/split_aws_run.gif differ diff --git a/content/learning-paths/embedded-and-microcontrollers/_index.md b/content/learning-paths/embedded-and-microcontrollers/_index.md index eeb8fd0e51..89d08161fc 100644 --- a/content/learning-paths/embedded-and-microcontrollers/_index.md +++ b/content/learning-paths/embedded-and-microcontrollers/_index.md @@ -20,7 +20,7 @@ subjects_filter: - Containers and Virtualization: 6 - Embedded Linux: 4 - Libraries: 3 -- ML: 13 +- ML: 14 - Performance and Architecture: 21 - RTOS Fundamentals: 4 - Security: 2 diff --git a/content/learning-paths/embedded-and-microcontrollers/Egde/_index.md b/content/learning-paths/embedded-and-microcontrollers/edge/_index.md similarity index 99% rename from content/learning-paths/embedded-and-microcontrollers/Egde/_index.md rename to content/learning-paths/embedded-and-microcontrollers/edge/_index.md index 72e4f00957..46cc03fa70 100644 --- a/content/learning-paths/embedded-and-microcontrollers/Egde/_index.md +++ b/content/learning-paths/embedded-and-microcontrollers/edge/_index.md @@ -24,7 +24,7 @@ prerequisites: author: Bright Edudzi Gershon Kordorwu ### Tags skilllevels: Introductory -subjects: tinyML +subjects: ML armips: - Cortex-M diff --git a/content/learning-paths/embedded-and-microcontrollers/edge/_next-steps.md b/content/learning-paths/embedded-and-microcontrollers/edge/_next-steps.md new file mode 100644 index 0000000000..c3db0de5a2 --- /dev/null +++ b/content/learning-paths/embedded-and-microcontrollers/edge/_next-steps.md @@ -0,0 +1,8 @@ +--- +# ================================================================================ +# FIXED, DO NOT MODIFY THIS FILE +# 
================================================================================ +weight: 21 # Set to always be larger than the content in this path to be at the end of the navigation. +title: "Next Steps" # Always the same, html page title. +layout: "learningpathall" # All files under learning paths have this same wrapper for Hugo processing. +--- diff --git a/content/learning-paths/embedded-and-microcontrollers/Egde/Connect and set up arduino.md b/content/learning-paths/embedded-and-microcontrollers/edge/connect-and-set-up-arduino.md similarity index 93% rename from content/learning-paths/embedded-and-microcontrollers/Egde/Connect and set up arduino.md rename to content/learning-paths/embedded-and-microcontrollers/edge/connect-and-set-up-arduino.md index 0983867b0c..bab4d7638e 100644 --- a/content/learning-paths/embedded-and-microcontrollers/Egde/Connect and set up arduino.md +++ b/content/learning-paths/embedded-and-microcontrollers/edge/connect-and-set-up-arduino.md @@ -10,7 +10,7 @@ layout: learningpathall To get started with your first **TinyML project**, a great option is the **Arduino Nano RP2040 Connect**. Built by Arduino, it uses the powerful **RP2040 microcontroller** and is fully supported by the Arduino core package. The board comes with built-in Wi-Fi, Bluetooth, and an onboard IMU—features that make it ideal for deploying machine learning models at the edge. -![example image alt-text#center](Images/nano.png "Arduino Nano RP2040") +![example image alt-text#center](images/nano.png "Arduino Nano RP2040") Its compatibility with popular tools like Edge Impulse and the Arduino IDE makes it a beginner-friendly yet powerful choice for TinyML applications. You can learn more about the Arduino Nano RP2040 Connect on the [official Arduino website](https://store.arduino.cc/products/arduino-nano-rp2040-connect-with-headers?_gl=1*1laabar*_up*MQ..*_ga*MTk1Nzk5OTUwMS4xNzQ2NTc2NTI4*_ga_NEXN8H46L5*czE3NDY1NzY1MjUkbzEkZzEkdDE3NDY1NzY5NTkkajAkbDAkaDE1MDk0MDg0ODc.). 
@@ -32,9 +32,9 @@ To visualize the output of the voice command model, we will use a simple LED cir - **Anode (long leg) of the LED** → Connect to **GPIO pin D2** via the 220Ω resistor - **Cathode (short leg)** → Connect to **GND** -![example image alt-text#center](Images/LED_Connection.png "Figure 14. Circuit Connection") +![example image alt-text#center](images/led_connection.png "Figure 14. Circuit Connection") -![example image alt-text#center](Images/LED_Connection_Schematic.png "Figure 15. Circuit Schematic Connection") +![example image alt-text#center](images/led_connection_schematic.png "Figure 15. Circuit Schematic Connection") ### Step 2: Set Up the Arduino IDEs diff --git a/content/learning-paths/embedded-and-microcontrollers/Egde/Images/1.png b/content/learning-paths/embedded-and-microcontrollers/edge/images/1.png similarity index 100% rename from content/learning-paths/embedded-and-microcontrollers/Egde/Images/1.png rename to content/learning-paths/embedded-and-microcontrollers/edge/images/1.png diff --git a/content/learning-paths/embedded-and-microcontrollers/Egde/Images/10.png b/content/learning-paths/embedded-and-microcontrollers/edge/images/10.png similarity index 100% rename from content/learning-paths/embedded-and-microcontrollers/Egde/Images/10.png rename to content/learning-paths/embedded-and-microcontrollers/edge/images/10.png diff --git a/content/learning-paths/embedded-and-microcontrollers/Egde/Images/11.png b/content/learning-paths/embedded-and-microcontrollers/edge/images/11.png similarity index 100% rename from content/learning-paths/embedded-and-microcontrollers/Egde/Images/11.png rename to content/learning-paths/embedded-and-microcontrollers/edge/images/11.png diff --git a/content/learning-paths/embedded-and-microcontrollers/Egde/Images/12.png b/content/learning-paths/embedded-and-microcontrollers/edge/images/12.png similarity index 100% rename from content/learning-paths/embedded-and-microcontrollers/Egde/Images/12.png rename to 
content/learning-paths/embedded-and-microcontrollers/edge/images/12.png diff --git a/content/learning-paths/embedded-and-microcontrollers/Egde/Images/13.png b/content/learning-paths/embedded-and-microcontrollers/edge/images/13.png similarity index 100% rename from content/learning-paths/embedded-and-microcontrollers/Egde/Images/13.png rename to content/learning-paths/embedded-and-microcontrollers/edge/images/13.png diff --git a/content/learning-paths/embedded-and-microcontrollers/Egde/Images/14.png b/content/learning-paths/embedded-and-microcontrollers/edge/images/14.png similarity index 100% rename from content/learning-paths/embedded-and-microcontrollers/Egde/Images/14.png rename to content/learning-paths/embedded-and-microcontrollers/edge/images/14.png diff --git a/content/learning-paths/embedded-and-microcontrollers/Egde/Images/15.png b/content/learning-paths/embedded-and-microcontrollers/edge/images/15.png similarity index 100% rename from content/learning-paths/embedded-and-microcontrollers/Egde/Images/15.png rename to content/learning-paths/embedded-and-microcontrollers/edge/images/15.png diff --git a/content/learning-paths/embedded-and-microcontrollers/Egde/Images/16.png b/content/learning-paths/embedded-and-microcontrollers/edge/images/16.png similarity index 100% rename from content/learning-paths/embedded-and-microcontrollers/Egde/Images/16.png rename to content/learning-paths/embedded-and-microcontrollers/edge/images/16.png diff --git a/content/learning-paths/embedded-and-microcontrollers/Egde/Images/17.png b/content/learning-paths/embedded-and-microcontrollers/edge/images/17.png similarity index 100% rename from content/learning-paths/embedded-and-microcontrollers/Egde/Images/17.png rename to content/learning-paths/embedded-and-microcontrollers/edge/images/17.png diff --git a/content/learning-paths/embedded-and-microcontrollers/Egde/Images/2.png b/content/learning-paths/embedded-and-microcontrollers/edge/images/2.png similarity index 100% rename from 
content/learning-paths/embedded-and-microcontrollers/Egde/Images/2.png rename to content/learning-paths/embedded-and-microcontrollers/edge/images/2.png diff --git a/content/learning-paths/embedded-and-microcontrollers/Egde/Images/3.png b/content/learning-paths/embedded-and-microcontrollers/edge/images/3.png similarity index 100% rename from content/learning-paths/embedded-and-microcontrollers/Egde/Images/3.png rename to content/learning-paths/embedded-and-microcontrollers/edge/images/3.png diff --git a/content/learning-paths/embedded-and-microcontrollers/Egde/Images/3b.png b/content/learning-paths/embedded-and-microcontrollers/edge/images/3b.png similarity index 100% rename from content/learning-paths/embedded-and-microcontrollers/Egde/Images/3b.png rename to content/learning-paths/embedded-and-microcontrollers/edge/images/3b.png diff --git a/content/learning-paths/embedded-and-microcontrollers/Egde/Images/4.png b/content/learning-paths/embedded-and-microcontrollers/edge/images/4.png similarity index 100% rename from content/learning-paths/embedded-and-microcontrollers/Egde/Images/4.png rename to content/learning-paths/embedded-and-microcontrollers/edge/images/4.png diff --git a/content/learning-paths/embedded-and-microcontrollers/Egde/Images/5.png b/content/learning-paths/embedded-and-microcontrollers/edge/images/5.png similarity index 100% rename from content/learning-paths/embedded-and-microcontrollers/Egde/Images/5.png rename to content/learning-paths/embedded-and-microcontrollers/edge/images/5.png diff --git a/content/learning-paths/embedded-and-microcontrollers/Egde/Images/6.png b/content/learning-paths/embedded-and-microcontrollers/edge/images/6.png similarity index 100% rename from content/learning-paths/embedded-and-microcontrollers/Egde/Images/6.png rename to content/learning-paths/embedded-and-microcontrollers/edge/images/6.png diff --git a/content/learning-paths/embedded-and-microcontrollers/Egde/Images/7.png 
b/content/learning-paths/embedded-and-microcontrollers/edge/images/7.png similarity index 100% rename from content/learning-paths/embedded-and-microcontrollers/Egde/Images/7.png rename to content/learning-paths/embedded-and-microcontrollers/edge/images/7.png diff --git a/content/learning-paths/embedded-and-microcontrollers/Egde/Images/8.png b/content/learning-paths/embedded-and-microcontrollers/edge/images/8.png similarity index 100% rename from content/learning-paths/embedded-and-microcontrollers/Egde/Images/8.png rename to content/learning-paths/embedded-and-microcontrollers/edge/images/8.png diff --git a/content/learning-paths/embedded-and-microcontrollers/Egde/Images/9.png b/content/learning-paths/embedded-and-microcontrollers/edge/images/9.png similarity index 100% rename from content/learning-paths/embedded-and-microcontrollers/Egde/Images/9.png rename to content/learning-paths/embedded-and-microcontrollers/edge/images/9.png diff --git a/content/learning-paths/embedded-and-microcontrollers/Egde/Images/LED_Connection.png b/content/learning-paths/embedded-and-microcontrollers/edge/images/led_connection.png similarity index 100% rename from content/learning-paths/embedded-and-microcontrollers/Egde/Images/LED_Connection.png rename to content/learning-paths/embedded-and-microcontrollers/edge/images/led_connection.png diff --git a/content/learning-paths/embedded-and-microcontrollers/Egde/Images/LED_Connection_Schematic.png b/content/learning-paths/embedded-and-microcontrollers/edge/images/led_connection_schematic.png similarity index 100% rename from content/learning-paths/embedded-and-microcontrollers/Egde/Images/LED_Connection_Schematic.png rename to content/learning-paths/embedded-and-microcontrollers/edge/images/led_connection_schematic.png diff --git a/content/learning-paths/embedded-and-microcontrollers/Egde/Images/nano.png b/content/learning-paths/embedded-and-microcontrollers/edge/images/nano.png similarity index 100% rename from 
content/learning-paths/embedded-and-microcontrollers/Egde/Images/nano.png rename to content/learning-paths/embedded-and-microcontrollers/edge/images/nano.png diff --git a/content/learning-paths/embedded-and-microcontrollers/Egde/Images/Serial_monitor.png b/content/learning-paths/embedded-and-microcontrollers/edge/images/serial_monitor.png similarity index 100% rename from content/learning-paths/embedded-and-microcontrollers/Egde/Images/Serial_monitor.png rename to content/learning-paths/embedded-and-microcontrollers/edge/images/serial_monitor.png diff --git a/content/learning-paths/embedded-and-microcontrollers/Egde/Overview.md b/content/learning-paths/embedded-and-microcontrollers/edge/overview.md similarity index 100% rename from content/learning-paths/embedded-and-microcontrollers/Egde/Overview.md rename to content/learning-paths/embedded-and-microcontrollers/edge/overview.md diff --git a/content/learning-paths/embedded-and-microcontrollers/Egde/Program and deployment.md b/content/learning-paths/embedded-and-microcontrollers/edge/program-and-deployment.md similarity index 99% rename from content/learning-paths/embedded-and-microcontrollers/Egde/Program and deployment.md rename to content/learning-paths/embedded-and-microcontrollers/edge/program-and-deployment.md index c44a096124..4993fcfe29 100644 --- a/content/learning-paths/embedded-and-microcontrollers/Egde/Program and deployment.md +++ b/content/learning-paths/embedded-and-microcontrollers/edge/program-and-deployment.md @@ -325,7 +325,7 @@ These messages indicate that your model is working and processing voice input as Your Serial Monitor should look like the image below. -![example image alt-text#center](Images/Serial_monitor.png "Figure 16. Circuit Connection") +![example image alt-text#center](images/serial_monitor.png "Figure 16. Circuit Connection") {{% notice Congratulations %}} You’ve successfully programmed your first TinyML microcontroller! 
You've also built a functional, smart system to control an LED with your voice. diff --git a/content/learning-paths/embedded-and-microcontrollers/Egde/Software_Edge_Impulse.md b/content/learning-paths/embedded-and-microcontrollers/edge/software-edge-impulse.md similarity index 92% rename from content/learning-paths/embedded-and-microcontrollers/Egde/Software_Edge_Impulse.md rename to content/learning-paths/embedded-and-microcontrollers/edge/software-edge-impulse.md index 62fe07e8ae..5d790d22ec 100644 --- a/content/learning-paths/embedded-and-microcontrollers/Egde/Software_Edge_Impulse.md +++ b/content/learning-paths/embedded-and-microcontrollers/edge/software-edge-impulse.md @@ -37,13 +37,13 @@ Now that the foundational concepts of TinyML and Edge AI are clear, it's time to To begin working with TinyML models, visit the **[Edge Impulse](https://edgeimpulse.com)**. You’ll need to create a free account to access the full platform. In the following sections, you will walk through each key page on the Edge Impulse platform using the attached snapshots as guide. These will help you understand what actions to take and how each part of the interface contributes to building and deploying your machine learning model. -![example image alt-text#center](Images/1.png "Figure 1. Home Page of Edge Impulse") +![example image alt-text#center](images/1.png "Figure 1. Home Page of Edge Impulse") ### Step 1: Create a New Project Once you’ve created your account and logged in, the first step is to **create a new project**. Give your project a name that clearly reflects its purpose—this helps with easy identification, especially if you plan to build multiple models later on. For example, if you're building a keyword spotting model, you might name it "Wake Word Detection". You’ll also need to select the appropriate **project type** and **project setting**, as shown in the snapshot below. -![example image alt-text#center](Images/3.png "Figure 2. 
New Project Setup") +![example image alt-text#center](images/3.png "Figure 2. New Project Setup") ### Step 2: Configure the Target Device @@ -53,7 +53,7 @@ The specifications of the Arduino Nano RP2040 Connect board can be found on [Ard Follow the exact settings in the attached snapshot to complete the configuration. -![example image alt-text#center](Images/4.png "Figure 3. Configure Arduino Nano RP2040") +![example image alt-text#center](images/4.png "Figure 3. Configure Arduino Nano RP2040") ### Step 3: Add the Dataset @@ -61,13 +61,13 @@ With your device configured, the next step is to **add your dataset** to the pro The dataset for this project can be downloaded from the following link: [Download Dataset](https://github.com/e-dudzi/Learning-Path.git). The Dataset has already been split into **training** and **testing**. -![example image alt-text#center](Images/6.png "Figure 4. Add Existing Data") +![example image alt-text#center](images/6.png "Figure 4. Add Existing Data") {{% notice Note %}} Do **not** check the **Green** highlighted area during upload. The dataset already includes metadata. Enabling that option may result in **much slower upload times** and is unnecessary for this project. {{% /notice %}} -![example image alt-text#center](Images/7.png "Figure 5. Dataset Overview") +![example image alt-text#center](images/7.png "Figure 5. Dataset Overview") ### Dataset Uploaded Successfully @@ -77,7 +77,7 @@ This is what you should see after the dataset has been successfully uploaded. Th This dataset is made up of **four labels**: `on`, `off`, `noise`, and `unknown`. {{% /notice %}} -![example image alt-text#center](Images/8.png "Figure 6. Dataset Overview") +![example image alt-text#center](images/8.png "Figure 6. Dataset Overview") ### Step 4: Create the Impulse @@ -85,7 +85,7 @@ Now that your data is ready, it's time to create the **impulse**, which defines After configuring everything, **don’t forget to save your impulse**. 
-![example image alt-text#center](Images/9.png "Figure 7. Create Impulse") +![example image alt-text#center](images/9.png "Figure 7. Create Impulse") ### Step 5: Configure the MFCC Block @@ -93,7 +93,7 @@ Next, you'll configure the **MFCC (Mel Frequency Cepstral Coefficients)** proces Set the parameters exactly as shown in the snapshot below. These settings determine how the audio input is broken down and analyzed. Once you're done, be sure to **save the parameters**. These parameters are chosen for this path. Modifications can be made once you are familiar with Edge Impulse. -![example image alt-text#center](Images/10.png "Figure 8. MFCC Block Configuration") +![example image alt-text#center](images/10.png "Figure 8. MFCC Block Configuration") {{% notice Note %}} The **green highlighted section** on the MFCC configuration page gives an estimate of how the model will perform **on the target device**. This includes information like memory usage (RAM/Flash) and latency, helping you ensure the model fits within the constraints of your hardware. @@ -105,7 +105,7 @@ After saving the MFCC parameters, the next step is to generate features from you Once the feature generation is complete, you'll see a **2D visualization plot** that shows how the dataset is distributed across the four labels: `on`, `off`, `noise`, and `unknown`. This helps to visually confirm whether the different classes are well-separated and learnable by the model. -![example image alt-text#center](Images/12.png "Figure 9. Feature Explorer") +![example image alt-text#center](images/12.png "Figure 9. Feature Explorer") ### Step 7: Setting Up the Classifier @@ -117,13 +117,13 @@ For this learning path, a learning rate of `0.002` was chosen, although the snap Once all the parameters are set, click on **"Save and train"** to start training your model. -![example image alt-text#center](Images/13.png "Figure 10. Classifier Settings") +![example image alt-text#center](images/13.png "Figure 10. 
Classifier Settings") ### Step 8: Reviewing Model Performance After the training process is complete, Edge Impulse will display the **model's performance**, including its overall **accuracy**, **loss**, and a **confusion matrix**. -![example image alt-text#center](Images/14.png "Figure 11. Model Performance") +![example image alt-text#center](images/14.png "Figure 11. Model Performance") - **Accuracy** reflects how often the model predicts the correct label. - **Loss** indicates how far the model’s predictions are from the actual labels during training — a lower loss generally means better performance. @@ -139,7 +139,7 @@ Review these metrics to determine if the model is learning effectively. If neede | Peak RAM Usage | 12.5 KB | | Flash Usage | 49.7 KB | -![example image alt-text#center](Images/15.png "Figure 12. Model Performance") +![example image alt-text#center](images/15.png "Figure 12. Model Performance") You can also [download](https://github.com/e-dudzi/Learning-Path.git) a pre-trained model and continue from here. @@ -152,7 +152,7 @@ To use the trained model on your Arduino Nano RP2040, follow the steps below to 3. Select **Arduino library** from the list. 4. The export process will start automatically, and the model will be downloaded as a `.zip` file. -![example image alt-text#center](Images/16.png "Figure 13. Model Deployment") +![example image alt-text#center](images/16.png "Figure 13. 
Model Deployment") ## Next Steps diff --git a/content/learning-paths/embedded-and-microcontrollers/linux-on-fvp/_index.md b/content/learning-paths/embedded-and-microcontrollers/linux-on-fvp/_index.md index 103dc30bdf..59fa4fd0e8 100644 --- a/content/learning-paths/embedded-and-microcontrollers/linux-on-fvp/_index.md +++ b/content/learning-paths/embedded-and-microcontrollers/linux-on-fvp/_index.md @@ -3,13 +3,9 @@ title: Run and Debug a Linux Software Stack on Arm Virtual Platforms minutes_to_complete: 180 -draft: true -cascade: - draft: true - who_is_this_for: This introductory topic is designed for developers interested in running Linux on Arm Fixed Virtual Platforms (FVPs) and debugging Trusted Firmware-A and the Linux Kernel using Arm Development Studio. -learning_objectives: +learning_objectives: - Run a Linux software stack using Arm Fixed Virtual Platforms. - Debug the firmware and Linux kernel using Arm Development Studio. @@ -19,6 +15,10 @@ prerequisites: author: Qixiang Xu +draft: true +cascade: + draft: true + ### Tags skilllevels: Introductory subjects: Embedded Linux @@ -33,7 +33,7 @@ tools_software_languages: further_reading: - resource: - title: Fast Models Fixed Virtual Platforms Reference Guide + title: Fast Models Fixed Virtual Platforms Reference Guide link: https://developer.arm.com/documentation/100966/ type: documentation - resource: diff --git a/content/learning-paths/embedded-and-microcontrollers/linux-on-fvp/debug.md b/content/learning-paths/embedded-and-microcontrollers/linux-on-fvp/debug.md index b01f537bb9..cfde22743a 100644 --- a/content/learning-paths/embedded-and-microcontrollers/linux-on-fvp/debug.md +++ b/content/learning-paths/embedded-and-microcontrollers/linux-on-fvp/debug.md @@ -6,90 +6,32 @@ weight: 6 layout: learningpathall --- -## Debug using Arm Development Studio - -To debug TF-A and Linux kernel, you can use the Arm Development Studio (Arm DS) debugger. 
You can get it from the [Arm Development Studio download page](https://developer.arm.com/downloads/view/DS000B). - -DWARF 5 is the default option in GCC 11 and Arm Development Studio version v2022.2 includes initial support for DWARF 5 debug information. - -If your GCC version is later than GCC 11, download the latest Arm Development Studio to get support for DWARF 5. - -Arm Development Studio includes the following Base FVP models: - -* FVP_Base_Cortex-A32x1 -* FVP_Base_Cortex-A35x1 -* FVP_Base_Cortex-A510x2 -* FVP_Base_Cortex-A53x1 -* FVP_Base_Cortex-A55x1 -* FVP_Base_Cortex-A55x4+Cortex-A75x2 -* FVP_Base_Cortex-A55x4+Cortex-A76x2 -* FVP_Base_Cortex-A57x1 -* FVP_Base_Cortex-A57x2-A53x4 -* FVP_Base_Cortex-A65AEx2 -* FVP_Base_Cortex-A65x2 -* FVP_Base_Cortex-A710x2 -* FVP_Base_Cortex-A72x1 -* FVP_Base_Cortex-A72x2-A53x4 -* FVP_Base_Cortex-A73x1 -* FVP_Base_Cortex-A73x2-A53x4 -* FVP_Base_Cortex-A75x1 -* FVP_Base_Cortex-A76AEx2 -* FVP_Base_Cortex-A76x1 -* FVP_Base_Cortex-A77x2 -* FVP_Base_Cortex-A78AEx2 -* FVP_Base_Cortex-A78Cx2 -* FVP_Base_Cortex-A78x2 -* FVP_Base_Cortex-X1Cx2 -* FVP_Base_Cortex-X1x2 -* FVP_Base_Cortex-X2x2 - -Arm FVPs that are not provided by Arm DS installation must be defined in the PATH environment variable to be available for Arm Development Studio. Otherwise, you might get the following error when starting the debug connection. - -![Connection Failed Screen #center](failed.png") - -For Linux, set up the PATH in the appropriate shell configuration file. - -For example, add the following line in your `~/.bashrc` file: - -```console -export PATH=/bin:$PATH -``` - -After changing the search PATH, you need to start Arm Development Studio from a terminal with the new PATH. 
- -Start Development Studio by running the `armds_ide` command: - -```console -/opt/arm/developmentstudio-2022.2/bin/armds_ide -``` - -![Arm DS IDE #center](armds_ide.png) - -## FVP Debug Connection - -Before debugging, the FVP model you want to use must be available in the Arm DS configuration database so that you can select it in the Model Connection dialog box. - -If the FVP model is not available, you must import it and create a new model configuration. For details, see [create a new model](https://developer.arm.com/documentation/101470/2022-2/Platform-Configuration/Model-targets/Create-a-new-model-configuration). +## Debug the Software Stack with Arm Development Studio -Most CPU FVP models are available for your edition of Arm DS and the FVPs are listed under the Arm FVP (Installed with Arm DS) and Arm FVP as shown in the following figure: +Once your software stack is running on the FVP, you can debug Trusted Firmware-A and the Linux kernel using Arm Development Studio (Arm DS). -![Debug Configurations screen #center](debug_config.png) +### Step 1: Install Arm Development Studio -To use Arm DS to connect to an FVP model for bare-metal debugging, perform the following steps: +Download and install the latest version from the [Arm Development Studio download page](https://developer.arm.com/downloads/view/DS000B). -1. From the Arm Development Studio main menu, select Run > Debug Configurations. +DWARF 5 is enabled by default in GCC 11 and later. Arm DS v2022.2 or newer is recommended to support DWARF 5 debug information. -2. Select the Generic Arm C/C++ Application from the left panel and click the new launch configuration button to create a new debug configuration. -3. In the Connection tab, select the target and connection settings: -In the Select target panel confirm the target selected. 
For example, select Arm FVP (Installed with Arm DS) Base_A55 x4 > Bare Metal Debug: +Launch Arm DS: +``` +/opt/arm/developmentstudio-2022.2/bin/armds_ide +``` -![Select target #center](Select_target.png) -Specify the Model parameters under the Connections. The model parameters are similar to those listed below. These parameters are described at the section Run software stack on FVP. Different CPU FVPs might have different parameters. +### Step 2: Create a Debug Configuration +1. Open Arm DS, go to Run > Debug Configurations. +2. Select Generic Arm C/C++ Application and create a new configuration. +3. In the Connection tab: + - Choose your FVP model (e.g., Base_A55x4). + - Enter model parameters: -```console +```output -C pctl.startup=0.0.0.0 \ -C bp.secure_memory=0 \ -C cache_state_modelled=0 \ @@ -108,27 +50,40 @@ Specify the Model parameters under the Connections. The model parameters are sim --data cluster0.cpu0=/output/aemfvp-a/aemfvp-a/fvp-base-revc.dtb@0x83000000 ``` -4. Configure debugger settings in the Debugger +### Step 3: Load Debug Symbols -In Run control, choose Connect only to the target. +In the Debugger tab: +- Select “Connect only to the target.” +- Enable Execute debugger commands and add: +```output +add-symbol-file "~/arm/sw/cpufvp-a/arm-tf/build/fvp/debug/bl1/bl1.elf" EL3:0 +add-symbol-file "~/arm/sw/cpufvp-a/arm-tf/build/fvp/debug/bl2/bl2.elf" EL1S:0 +add-symbol-file "~/arm/sw/cpufvp-a/arm-tf/build/fvp/debug/bl31/bl31.elf" EL3:0 +add-symbol-file "~/arm/sw/cpufvp-a/linux/out/aemfvp-a/defconfig/vmlinux" EL2N:0 +``` -5. Select the Execute debugger commands option and add load symbols commands if you want to debug your application at source level. For example, add TF-A and Linux kernel debug symbols as follows: +Click Apply and then Close. 
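The list of `add-symbol-file` commands grows quickly as you add more images. As a sketch (the `gen_symbol_cmds` helper name is illustrative; the ELF paths mirror the reference build paths above and should be adjusted to your tree), you can generate the commands from a table of image and exception-level pairs:

```bash
# Sketch: emit one add-symbol-file debugger command per "elf exception-level"
# pair read from stdin. Adjust the paths to match your own build output.
gen_symbol_cmds() {
  while read -r elf el; do
    [ -n "$elf" ] && printf 'add-symbol-file "%s" %s:0\n' "$elf" "$el"
  done
}

gen_symbol_cmds <<'EOF'
~/arm/sw/cpufvp-a/arm-tf/build/fvp/debug/bl1/bl1.elf EL3
~/arm/sw/cpufvp-a/arm-tf/build/fvp/debug/bl2/bl2.elf EL1S
~/arm/sw/cpufvp-a/arm-tf/build/fvp/debug/bl31/bl31.elf EL3
~/arm/sw/cpufvp-a/linux/out/aemfvp-a/defconfig/vmlinux EL2N
EOF
```

Paste the generated lines into the Execute debugger commands box.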
-
-```
-add-symbol-file “~/arm/sw/cpufvp-a/arm-tf/build/fvp/debug/bl1/bl1.elf” EL3:0
-add-symbol-file “~/arm/sw/cpufvp-a/arm-tf/build/fvp/debug/bl2/bl2.elf” EL1S:0
-add-symbol-file “~/arm/sw/cpufvp-a/arm-tf/build/fvp/debug/bl31/bl31.elf” EL3:0
-add-symbol-file “~/arm/sw/cpufvp-a/linux/out/aemfvp-a/defconfig/vmlinux” EL2N:0
-```
+### Step 4: Start Debugging
+
+1. In the Debug Control view, double-click your new configuration.
+2. Wait for the target to connect and symbols to load.
+3. Set breakpoints, step through code, and inspect registers or memory.
+
+You might get the following error when starting the debug connection.
-6. Click Apply and then Close to save the configuration settings and close the Debug Configurations dialog box.
![Connection Failed Screen #center](failed.png)
-7. In the Debug Control view, double-click the debug configuration that you create.
+This means the FVP model you selected is not provided by default in the Arm DS installation. In this case, add the directory containing your FVP binaries to `PATH`:
-This step starts the debug connection, loads the application on the model, and loads the debug information into the debugger.
+```bash
+export PATH=/bin:$PATH
+```
-8. Set breakpoints and click Continue running application to continue running your FVP.
+{{% notice tip %}}
+Ensure your FVP instance is running and matches the model and parameters selected in Arm DS.
+{{% /notice %}}
-9.
After these steps, you can debug the software stack as shown in the following figure: +After these steps, you can debug the software stack as shown in the following figure: ![FVP running #center](Select_target.png) diff --git a/content/learning-paths/embedded-and-microcontrollers/linux-on-fvp/example-picture.png b/content/learning-paths/embedded-and-microcontrollers/linux-on-fvp/example-picture.png deleted file mode 100644 index c69844bed4..0000000000 Binary files a/content/learning-paths/embedded-and-microcontrollers/linux-on-fvp/example-picture.png and /dev/null differ diff --git a/content/learning-paths/embedded-and-microcontrollers/linux-on-fvp/intro.md b/content/learning-paths/embedded-and-microcontrollers/linux-on-fvp/intro.md index 06e9d130f5..5845609060 100644 --- a/content/learning-paths/embedded-and-microcontrollers/linux-on-fvp/intro.md +++ b/content/learning-paths/embedded-and-microcontrollers/linux-on-fvp/intro.md @@ -1,76 +1,46 @@ --- -title: Introduction to Arm Ecosystem Fixed Virtual Platforms +title: Introduction to Arm Fixed Virtual Platforms (FVPs) weight: 2 ### FIXED, DO NOT MODIFY layout: learningpathall --- -Arm Ecosystem Fixed Virtual Platforms (FVPs) model hardware subsystems and target different market segments and applications. +Arm Fixed Virtual Platforms (FVPs) are simulation models that let you run and test full software stacks on Arm systems before physical hardware is available. They replicate the behavior of Arm CPUs, memory, and peripherals using fast binary translation. -FVPs use binary translation technology to deliver fast, functional simulations of Arm-based systems, including processor, memory, and peripherals. They implement a programmer's view suitable for software development and enable execution of full software stacks, providing an available platform to run software before silicon is available. +### Why Use FVPs? 
+FVPs are useful for developers who want to: +- Prototype software before silicon availability +- Debug firmware and kernel issues +- Simulate multicore systems -Arm provides two different types of FVPs. +FVPs provide a programmer's view of the hardware, making them ideal for system bring-up, kernel porting, and low-level debugging. -## Arm Ecosystem FVPs +### Freely Available Arm Ecosystem FVPs +Several pre-built Armv8-A FVPs can be downloaded for free from the [Arm Ecosystem Models](https://developer.arm.com/Tools%20and%20Software/Fixed%20Virtual%20Platforms#Downloads) page. Categories include: +- Architecture +- Automotive +- Infrastructure +- IoT -There are several freely available, pre-built Armv8‑A FVPs for download from [Arm Ecosystem Models](https://developer.arm.com/Tools%20and%20Software/Fixed%20Virtual%20Platforms#Downloads) on the Arm Developer website. You can use these FVPs without a license. +A popular model is the **AEMv8-A Base Platform RevC**, which supports Armv8.7 and Armv9-A. The [Arm reference software stack](https://gitlab.arm.com/arm-reference-solutions/arm-reference-solutions-docs/-/blob/master/docs/aemfvp-a/user-guide.rst) is designed for this model. -There are multiple categories of Ecosystem FVPs such as: -- Architecture FVPs -- Automotive FVPs -- Infrastructure FVPs -- IoT FVPs +### CPU-Specific Arm Base FVPs +Other FVPs target specific CPU types and come pre-configured with a fixed number of cores. These are often called **CPU FVPs**. -For example, in the architecture category, the AEMv8-A Base Platform RevC FVP is freely available, and it supports the latest Armv8‑A architecture versions up to v8.7 and Armv9-A. +Here are some examples: +- FVP_Base_Cortex-A55x4 +- FVP_Base_Cortex-A72x4 +- FVP_Base_Cortex-A78x4 +- FVP_Base_Cortex-A510x4+Cortex-A710x4 -The [Arm reference software stack](https://gitlab.arm.com/arm-reference-solutions/arm-reference-solutions-docs/-/blob/master/docs/aemfvp-a/user-guide.rst) is based on the above RevC model. 
+To use these, request access via [support@arm.com](mailto:support@arm.com). -## Arm Base FVPs specific CPU types +### Setting Up Your Environment +This Learning Path uses the [Arm reference software stack](https://gitlab.arm.com/arm-reference-solutions/arm-reference-solutions-docs/-/blob/master/docs/aemfvp-a/user-guide.rst). -Arm Base Armv8-A FVPs with specific CPU types are configured with a fixed number of cores. These are also called CPU FVPs because they specify the CPU types instead of the architecture version. - -The FVP_Base_Cortex-\ FVP is available for you to build and run on Linux computers. Contact Arm Support [support@arm.com](mailto:support@arm.com) to request access. - -You can use any of the FVPs listed below to run the reference software stack: - -* FVP_Base_Cortex-A510x4 -* FVP_Base_Cortex-A510x4+Cortex-A710x4 -* FVP_Base_Cortex-A53x4 -* FVP_Base_Cortex-A55x4 -* FVP_Base_Cortex-A55x4+Cortex-A75x4 -* FVP_Base_Cortex-A55x4+Cortex-A78x4 -* FVP_Base_Cortex-A57x2-A35x4 -* FVP_Base_Cortex-A57x2-A53x4 -* FVP_Base_Cortex-A57x4 -* FVP_Base_Cortex-A57x4-A35x4 -* FVP_Base_Cortex-A57x4-A53x4 -* FVP_Base_Cortex-A65AEx4 -* FVP_Base_Cortex-A65AEx4+Cortex-A76AEx4 -* FVP_Base_Cortex-A65x4 -* FVP_Base_Cortex-A710x4 -* FVP_Base_Cortex-A72x2-A53x4 -* FVP_Base_Cortex-A72x4 -* FVP_Base_Cortex-A72x4-A53x4 -* FVP_Base_Cortex-A73x2-A53x4 -* FVP_Base_Cortex-A73x4 -* FVP_Base_Cortex-A73x4-A53x4 -* FVP_Base_Cortex-A75x4 -* FVP_Base_Cortex-A76AEx4 -* FVP_Base_Cortex-A76x4 -* FVP_Base_Cortex-A77x4 -* FVP_Base_Cortex-A78AEx4 -* FVP_Base_Cortex-A78Cx4 -* FVP_Base_Cortex-A78x4 -* FVP_Base_Cortex-X1Cx4 -* FVP_Base_Cortex-X1x4 -* FVP_Base_Cortex-X2x4 -* FVP_Base_Neoverse-E1x4 -* FVP_Base_Neoverse-N1x4 - -### Set up the environment - -This Learning Path uses the [Arm reference software](https://gitlab.arm.com/arm-reference-solutions/arm-reference-solutions-docs/-/blob/master/docs/aemfvp-a/user-guide.rst) stack. 
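Before launching anything, it is worth confirming that the FVP executable you plan to use is actually on your `PATH`. A minimal sketch (the `check_fvp` function name is illustrative):

```bash
# Sketch: report whether a given FVP model binary can be found on PATH.
check_fvp() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
  fi
}

check_fvp FVP_Base_Cortex-A55x4
```

If the binary is reported missing, add its installation directory to `PATH` before continuing.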
- -Follow the [Armv-A Base AEM FVP Platform Software User Guide](https://gitlab.arm.com/arm-reference-solutions/arm-reference-solutions-docs/-/blob/master/docs/aemfvp-a/user-guide.rst) to set up the environment, download the software stack, and get the toolchain. +To get started: +1. Follow the software user guide to download the stack. +2. Set up the required toolchain and environment variables. +Once configured, you’ll be ready to run and debug Linux on your selected Arm FVP model. \ No newline at end of file diff --git a/content/learning-paths/embedded-and-microcontrollers/linux-on-fvp/modify.md b/content/learning-paths/embedded-and-microcontrollers/linux-on-fvp/modify.md index eca21d7c95..7a4f4ea097 100644 --- a/content/learning-paths/embedded-and-microcontrollers/linux-on-fvp/modify.md +++ b/content/learning-paths/embedded-and-microcontrollers/linux-on-fvp/modify.md @@ -6,244 +6,63 @@ weight: 4 layout: learningpathall --- -## Remove PCI and SMMU nodes - -CPU FVPs do not include PCI and SMMU support. To run the reference software stack on CPU FVPs remove the PCI and SMMU nodes from the device tree. 
- -If you build the software without modifying the device tree, you get the following error message after running the software stack: - -```console -[ 0.563774] pci-host-generic 40000000.pci: host bridge /pci@40000000 ranges: -[ 0.563972] pci-host-generic 40000000.pci: MEM 0x0050000000..0x005fffffff -> 0x0050000000 -[ 0.564233] pci-host-generic 40000000.pci: ECAM at [mem 0x40000000-0x4fffffff] for [bus 00-01] -[ 0.564878] pci-host-generic 40000000.pci: PCI host bridge to bus 0000:00 -[ 0.565014] pci_bus 0000:00: root bus resource [bus 00-01] -[ 0.565138] pci_bus 0000:00: root bus resource [mem 0x50000000-0x5fffffff] -[ 0.565266] Internal error: synchronous external abort: 96000010 [#1] PREEMPT SMP -[ 0.565366] Modules linked in: -[ 0.565448] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.13.13 #1 -[ 0.565534] Hardware name: FVP Base RevC (DT) -[ 0.565621] pstate: 204000c9 (nzCv daIF +PAN -UAO -TCO BTYPE=--) -[ 0.565746] pc : pci_generic_config_read+0x3c/0xe4 -[ 0.565832] lr : pci_generic_config_read+0x28/0xe4 -[ 0.565966] sp : ffff80001205b960 -[ 0.566006] x29: ffff80001205b960 x28: 0000000000000000 x27: ffff8000116904b8 -[ 0.566179] x26: ffff800011741060 x25: 0000000000000000 x24: ffff800011fca8b8 -[ 0.566366] x23: 0000000000000000 x22: ffff80001205ba64 x21: 0000000000000087 -[ 0.566566] x20: 0000000000000004 x19: ffff80001205b9c4 x18: 0000000000000030 -[ 0.566723] x17: 000000000000003f x16: 000000000000000b x15: ffffffffffffffff -[ 0.566923] x14: ffff80009205b697 x13: ffff800011ca2a30 x12: 0000000000000267 -[ 0.567096] x11: 00000000000000cd x10: ffff800011cfaa30 x9 : 00000000fffff000 -[ 0.567270] x8 : ffff800011ca2a30 x7 : ffff800011cfaa30 x6 : 0000000000000001 -[ 0.567443] x5 : 0000000000000000 x4 : ffff800012800000 x3 : 0000000000000000 -[ 0.567624] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff800012800000 -[ 0.567790] Call trace: -[ 0.567866] pci_generic_config_read+0x3c/0xe4 -[ 0.567966] pci_bus_read_config_dword+0x7c/0xd0 -[ 0.568066] 
pci_bus_generic_read_dev_vendor_id+0x38/0x1b4
-[ 0.568198] pci_scan_single_device+0xa0/0x150
-[ 0.568311] pci_scan_slot+0x44/0x130
-[ 0.568435] pci_scan_child_bus_extend+0x54/0x2a0
-[ 0.568525] pci_scan_root_bus_bridge+0x68/0xdc
-[ 0.568666] pci_host_probe+0x1c/0xc4
-[ 0.568782] pci_host_common_probe+0x11c/0x19c
-[ 0.568918] platform_probe+0x6c/0xdc
-[ 0.569017] really_probe+0xe4/0x510
-[ 0.569129] driver_probe_device+0x64/0xc4
-[ 0.569263] device_driver_attach+0xc4/0xd0
-[ 0.569366] __driver_attach+0x94/0x134
-[ 0.569508] bus_for_each_dev+0x70/0xd0
-[ 0.569613] driver_attach+0x28/0x34
-[ 0.569737] bus_add_driver+0x108/0x1f0
-[ 0.569836] driver_register+0x7c/0x130
-[ 0.569966] __platform_driver_register+0x2c/0x40
-[ 0.570084] gen_pci_driver_init+0x20/0x2c
-[ 0.570171] do_one_initcall+0x50/0x1b0
-[ 0.570266] kernel_init_freeable+0x220/0x2a4
-[ 0.570409] kernel_init+0x18/0x124
-[ 0.570491] ret_from_fork+0x10/0x30
-[ 0.570566] Code: 7100069f 540001c0 71000a9f 54000300 (b9400001)
-[ 0.570778] ---[ end trace 6db9afd6e7186a9a ]---
-[ 0.570827] note: swapper/0[1] exited with preempt_count 1
-[ 0.570952] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
-[ 0.571066] SMP: stopping secondary CPUs
-[ 0.571147] Kernel Offset: disabled
-[ 0.571174] CPU features: 0x10000081,a3300e46
-[ 0.571266] Memory Limit: none
-[ 0.571366] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
-```
+## Modify the Device Tree for CPU FVPs
-The [Arm reference software stack](https://gitlab.arm.com/arm-reference-solutions/arm-reference-solutions-docs/-/blob/master/docs/aemfvp-a/user-guide.rst) uses the following device tree:
+To run Linux on Arm CPU FVPs, you need to adjust the device tree to match the hardware features of these platforms. This involves removing unsupported nodes, such as the SMMU (System Memory Management Unit) and PCI (Peripheral Component Interconnect) nodes, and ensuring the CPU affinity values are set correctly.
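Deleting the `pci@40000000` and `iommu@2b400000` nodes by hand works, but for repeatability you can script the removal. The sketch below (the `strip_node` helper is illustrative, not part of the reference build scripts) removes a named node by counting braces:

```bash
# Sketch: print a .dts file with one named node (including its closing brace)
# removed. Usage: strip_node <file> <node-name>
strip_node() {
  awk -v node="$2" '
    # Start skipping at the line that opens the named node
    skip == 0 && index($0, node) && /{/ { skip = 1 }
    skip {
      # Track nesting; gsub() returns the number of braces on the line
      depth += gsub(/{/, "{") - gsub(/}/, "}")
      if (depth == 0) skip = 0   # closing brace of the node reached
      next
    }
    { print }
  ' "$1"
}
```

For example, `strip_node fvp-base-revc.dts pci@40000000` prints the device tree without the PCI node; redirect the output to a new file and repeat for the SMMU node.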
-```output -linux/arch/arm64/boot/dts/arm/fvp-base-revc.dts -``` - -Use a text editor to delete the following lines from the device tree: - -```console -155 pci: pci@40000000 { -156 #address-cells = <0x3>; -157 #size-cells = <0x2>; -158 #interrupt-cells = <0x1>; -159 compatible = "pci-host-ecam-generic"; -160 device_type = "pci"; -161 bus-range = <0x0 0x1>; -162 reg = <0x0 0x40000000 0x0 0x10000000>; -163 ranges = <0x2000000 0x0 0x50000000 0x0 0x50000000 0x0 0x10000000>; -164 interrupt-map = <0 0 0 1 &gic 0 0 GIC_SPI 168 IRQ_TYPE_LEVEL_HIGH>, -165 <0 0 0 2 &gic 0 0 GIC_SPI 169 IRQ_TYPE_LEVEL_HIGH>, -166 <0 0 0 3 &gic 0 0 GIC_SPI 170 IRQ_TYPE_LEVEL_HIGH>, -167 <0 0 0 4 &gic 0 0 GIC_SPI 171 IRQ_TYPE_LEVEL_HIGH>; -168 interrupt-map-mask = <0x0 0x0 0x0 0x7>; -169 msi-map = <0x0 &its 0x0 0x10000>; -170 iommu-map = <0x0 &smmu 0x0 0x10000>; -171 -172 dma-coherent; -173 }; -174 -175 smmu: iommu@2b400000 { -176 compatible = "arm,smmu-v3"; -177 reg = <0x0 0x2b400000 0x0 0x100000>; -178 interrupts = , -179 , -180 , -181 ; -182 interrupt-names = "eventq", "gerror", "priq", "cmdq-sync"; -183 dma-coherent; -184 #iommu-cells = <1>; -185 msi-parent = <&its 0x10000>; -186 }; -``` +### Step 1: Remove PCI and SMMU Nodes -Rebuild Linux using the following commands: +CPU FVPs don't support PCI and SMMU. If you don't remove these nodes, Linux will crash at boot with a kernel panic. -```console -./build-scripts/build-linux.sh -p aemfvp-a -f busybox clean -./build-scripts/build-linux.sh -p aemfvp-a -f busybox build +1. Open the device tree file in a text editor: +```bash +vim linux/arch/arm64/boot/dts/arm/fvp-base-revc.dts ``` +2. Delete the following two blocks: +- `pci@40000000` +- `iommu@2b400000` -Package the built Linux to the BusyBox disk image by using the following command: - -```console -./build-scripts/aemfvp-a/build-test-busybox.sh -p aemfvp-a package -``` - -### Modify CPU nodes - -On Arm64 SMP systems, cores are identified by the MPIDR_EL1 register. 
The MPIDR_EL1 in the [Arm Architecture Reference Manual](https://developer.arm.com/documentation/ddi0487/latest) does not provide strict enforcement of MPIDR_EL1 layout. - -For example, Cortex-A55 has a layout different from the layout of Cortex-A53. - -The following two tables compare the MPIDR_EL1 Layout between Cortex-A55 and Cortex-A53. - -![Cortex-A55 MPIDR_EL1 Layout #center](cortex-a55_MPIDR_EL1.png) - -![Cortex-A53 MPIDR_EL1 Layout #center](Cortex-a53_MPIDR_EL1.png) - -Linux Kernel boots the CPU cores according to the affinity values in the device tree. However, different CPU FVP platforms use different affinity values. If the affinity value in the device tree is not consistent with the CPU FVP platform, you might get the following error: +{{% notice warning %}} +If you skip this, you’ll get an error like: ```output -[ 0.023728] psci: failed to boot CPU2 (-22) -[ 0.023948] CPU2: failed to boot: -22 -[ 0.025770] psci: failed to boot CPU3 (-22) -[ 0.025867] CPU3: failed to boot: -22 -[ 0.027567] psci: failed to boot CPU4 (-22) -[ 0.027679] CPU4: failed to boot: -22 -[ 0.029541] psci: failed to boot CPU5 (-22) -[ 0.029615] CPU5: failed to boot: -22 -[ 0.031410] psci: failed to boot CPU6 (-22) -[ 0.031467] CPU6: failed to boot: -22 -[ 0.033319] psci: failed to boot CPU7 (-22) -[ 0.033383] CPU7: failed to boot: -22 +Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ``` +{{% /notice %}} -Affinity values in the device tree must be consistent with the affinity values in the FVP platform. 
Perform the following steps to modify affinity values: +### Step 2: Set CPU Affinity Values -### Get correct affinity value for the FVP platforms - -The parameter pctl.CPU-affinities in the FVP platform shows the correct affinity values, for example: - -```console -$FVP_Base_Cortex-A55x4 -l |grep pctl.CPU-affinities -pctl.CPU-affinities=0.0.0.0, 0.0.1.0, 0.0.2.0, 0.0.3.0 -$FVP_Base_Cortex-A57x2-A53x4 -l |grep pctl.CPU-affinities -pctl.CPU-affinities=0.0.0.0, 0.0.0.1, 0.0.1.0, 0.0.1.1, 0.0.1.2, 0.0.1.3 -``` - -The CPU affinity values on the FVP_Base_Cortex-A55x4 are: - -```console -0x0,0x0100,0x0200,0x300 +Each FVP model uses specific CPU affinity values. If these don’t match what’s in the device tree, some CPU cores won’t boot. +1. Find the correct affinities: +```bash +FVP_Base_Cortex-A55x4 -l | grep pctl.CPU-affinities ``` -The CPU affinity values on the FVP_Base_Cortex-A57x2-A53x4 are: +Example output: -```console -0x0,0x1,0x0100,0x0101,0x0102,0x0103 +```output +pctl.CPU-affinities=0.0.0.0, 0.0.1.0, 0.0.2.0, 0.0.3.0 ``` -### Modify the affinity values in the device tree to be consistent with the FVP platform +2. 
Convert each to hex for the reg field: -You must change the affinity value in the device tree: - -```console -linux/arch/arm64/boot/dts/arm/fvp-base-revc.dts +```output +0x0, 0x0100, 0x0200, 0x0300 ``` -Consider FVP_Base_Cortex-A57x2-A53x4 as an example: - -```console -cpu0: cpu@0 { - device_type = "cpu"; - compatible = "arm,armv8"; - reg = <0x0 0x000>; - enable-method = "psci"; -}; -cpu1: cpu@1 { - device_type = "cpu"; - compatible = "arm,armv8"; - reg = <0x0 0x01>; - enable-method = "psci"; -}; -cpu2: cpu@100 { - device_type = "cpu"; - compatible = "arm,armv8"; - reg = <0x0 0x100>; - enable-method = "psci"; -}; -cpu3: cpu@101 { - device_type = "cpu"; - compatible = "arm,armv8"; - reg = <0x00 0x101>; - enable-method = "psci"; -}; -cpu4: cpu@102 { - device_type = "cpu"; - compatible = "arm,armv8"; - reg = <0x00 0x102>; - enable-method = "psci"; -}; -cpu5: cpu@103 { - device_type = "cpu"; - compatible = "arm,armv8"; - reg = <0x00 0x103>; - enable-method = "psci"; -}; -``` +3. Update the CPU nodes in your device tree file to use these reg values. -### Remove unnecessary CPU nodes +{{% notice tip %}} +To avoid boot errors like psci: failed to boot CPUx (-22), make sure every cpu@xxx entry matches the FVP layout. +{{% /notice %}} -Remove unnecessary CPU nodes from the device tree, according to the CPU numbers of the FVP platform. +### Step 3: Rebuild Linux -Rebuild the Linux by using the following commands: +After editing the device tree, rebuild Linux: -``` +```bash ./build-scripts/build-linux.sh -p aemfvp-a -f busybox clean ./build-scripts/build-linux.sh -p aemfvp-a -f busybox build +./build-scripts/aemfvp-a/build-test-busybox.sh -p aemfvp-a package ``` -Package the built Linux to the BusyBox disk image by using the following command: - -``` -./build-scripts/aemfvp-a/build-test-busybox.sh -p aemfvp-a package -``` \ No newline at end of file +This regenerates the image with the updated device tree, ready for use with your FVP. 
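The dotted affinity values reported by `pctl.CPU-affinities` map directly onto the hex values in the device tree: the four fields are Aff3.Aff2.Aff1.Aff0, eight bits each. A small sketch of that conversion (the `aff_to_reg` helper name is illustrative):

```bash
# Sketch: convert an FVP dotted affinity (Aff3.Aff2.Aff1.Aff0) into the hex
# value used in a device tree cpu node.
aff_to_reg() {
  old_ifs=$IFS
  IFS=.
  set -- $1            # split on the dots: $1=Aff3 $2=Aff2 $3=Aff1 $4=Aff0
  IFS=$old_ifs
  printf '0x%x\n' $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# FVP_Base_Cortex-A55x4 reports: 0.0.0.0, 0.0.1.0, 0.0.2.0, 0.0.3.0
for aff in 0.0.0.0 0.0.1.0 0.0.2.0 0.0.3.0; do
  aff_to_reg "$aff"
done
```

For the Cortex-A55x4 values this prints `0x0`, `0x100`, `0x200`, and `0x300`, matching the `reg` values shown above (padding aside).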
\ No newline at end of file diff --git a/content/learning-paths/embedded-and-microcontrollers/linux-on-fvp/run.md b/content/learning-paths/embedded-and-microcontrollers/linux-on-fvp/run.md index c6f2c042e5..4c5f4ac836 100644 --- a/content/learning-paths/embedded-and-microcontrollers/linux-on-fvp/run.md +++ b/content/learning-paths/embedded-and-microcontrollers/linux-on-fvp/run.md @@ -6,17 +6,19 @@ weight: 5 layout: learningpathall --- -### Firmware and Kernel Images -After adding extra TF-A build options and removing the PCI and MMU nodes from the device tree, you can follow the [Arm Reference Solution Guide](https://gitlab.arm.com/arm-reference-solutions/arm-reference-solutions-docs/-/blob/master/docs/aemfvp-a/user-guide.rst) to build the software stack. +## Run the Linux Software Stack on an FVP -When the build is complete, you can use the `tree` command to look at the directory structure. +Once you've built the Linux stack with the correct configuration, you're ready to run it on an Arm CPU Fixed Virtual Platform (FVP). -```console +### Step 1: Verify the Build Output + +After building, check the output directory to make sure the expected files were generated: + +```bash tree output/aemfvp-a/aemfvp-a/ ``` - -The directory structure is similar to: +Expected output: ```output output/aemfvp-a/aemfvp-a/ @@ -33,15 +35,10 @@ output/aemfvp-a/aemfvp-a/ └── uefi.bin -> ../components/aemfvp-a/uefi.bin ``` -### Run software stack on FVP - -After the build of the software stack, you can follow the [guide](https://gitlab.arm.com/arm-reference-solutions/arm-reference-solutions-docs/-/tree/master) to run the software stack on the Armv-A Base AEM FVP platform. - -CPU FVP platforms have a fixed number of cores, so you cannot use the model-scripts/aemfvp-a/boot.sh to run the CPU FVP platforms. 
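Beyond eyeballing the `tree` output, a tiny check can confirm that the artifacts the launch command loads are actually present (the `check_artifacts` helper name is illustrative):

```bash
# Sketch: verify the kernel image and device tree blob exist in the build
# output directory before launching the FVP.
check_artifacts() {
  dir=$1
  for f in Image fvp-base-revc.dtb; do
    if [ -e "$dir/$f" ]; then
      echo "ok: $f"
    else
      echo "missing: $f"
    fi
  done
}

check_artifacts output/aemfvp-a/aemfvp-a
```

A `missing` line usually means a build or packaging step was skipped.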
- -To run the software stack in the CPU FVP platform, use a command similar to: +### Step 2: Run the Software Stack -```console +To launch the software stack on the FVP, use a command like the following: +```bash FVP_Base_Cortex-A55x4 \ -C pctl.startup=0.0.0.0 \ -C bp.secure_memory=0 \ @@ -60,63 +57,41 @@ FVP_Base_Cortex-A55x4 \ --data cluster0.cpu0=/output/aemfvp-a/aemfvp-a/Image@0x80080000 \ --data cluster0.cpu0=/output/aemfvp-a/aemfvp-a/fvp-base-revc.dtb@0x83000000 ``` +This will boot Trusted Firmware-A, UEFI/U-Boot, Linux, and BusyBox in sequence. -After the previous command is run, the CPU FVP starts booting Trusted Firmware-A, followed by UEFI/U-Boot, Linux, and BusyBox. +### Step 3: Troubleshoot FVP Launch Issues -If you use the GUI, you can run it as shown in the following figure: - -![GUI #center](FVP.png) - -### Alternative running option - -On some CPU FVP platforms, you might encounter the following error and the system cannot boot successfully. +Different FVP models use different CPU instance names. If you see an error like: ```output -Warning: target instance not found: 'FVP_Base_Cortex_A65AEx4_Cortex_A76AEx4.cluster0.cpu0' (data: 'output/aemfvp-a/aemfvp-a/Image') -In file: /tmp/plgbuild/abs_build/1153836_60931/trunk/work/fastsim/Framework/scx/SCXExportedVirtualSubsystem.cpp:358 Warning: target instance not found: 'FVP_Base_Cortex_A65AEx4_Cortex_A76AEx4.cluster0.cpu0' (data: 'output/aemfvp-a/aemfvp-a/fvp-base-revc.dtb') -In file: /tmp/plgbuild/abs_build/1153836_60931/trunk/work/fastsim/Framework/scx/SCXExportedVirtualSubsystem.cpp:358 -``` - -The FVP platform contains multiple CPU instances, and the CPU instance names are different on different CPU FVP platforms. - -To load raw data into right CPU instances, use the `––dataoption` to specify correct CPU instance names, according to the CPU FVP platform. 
For example, on FVP_Base_Cortex-A55x4, the CPU0 instance is cluster0.cpu0:
-
-```console
-data cluster0.cpu0=/output/aemfvp-a/aemfvp-a/Image@0x80080000 \
---data cluster0.cpu0=/output/aemfvp-a/aemfvp-a/fvp-base-revc.dtb@0x83000000
-```

-Run the FVP to dump the parameters and use `grep` to get the correct CPU0 instance name in the CPU FVP platform.
+You need to identify the correct instance name for your platform. Run:

-```console
+```bash
FVP_Base_Cortex-A65AEx4+Cortex-A76AEx4 -l | grep RVBARADDR | grep cpu0
```

-The output is similar to:
+Example output:

```output
cluster0.subcluster0.cpu0.thread0.RVBARADDR=0 # (int , init-time) default = '0x0' : Value of RVBAR_ELx register.
-cluster0.subcluster0.cpu0.thread1.RVBARADDR=0 # (int , init-time) default = '0x0' : Value of RVBAR_ELx register.
cluster0.subcluster1.cpu0.RVBARADDR=0 # (int , init-time) default = '0x0' : Value of RVBAR_ELx register.
```

-For another FVP:
+Update your `--data` parameters accordingly:

-```console
-FVP_Base_Cortex-A55x4+Cortex-A78x4 -l | grep RVBARADDR | grep cpu0
+```output
+--data cluster0.subcluster0.cpu0.thread0=/output/aemfvp-a/aemfvp-a/Image@0x80080000 \
+--data cluster0.subcluster0.cpu0.thread0=/output/aemfvp-a/aemfvp-a/fvp-base-revc.dtb@0x83000000
```

-The output is similar to:
+{{% notice tip %}}
+Always confirm the CPU instance name when switching between different FVP models.
+{{% /notice %}}

-```output
-cluster0.subcluster0.cpu0.RVBARADDR=0 # (int , init-time) default = '0x0' : Value of RVBAR_ELx register.
-cluster0.subcluster1.cpu0.RVBARADDR=0 # (int , init-time) default = '0x0' : Value of RVBAR_ELx register.
-``` +### Optional: Use the GUI -For FVP_Base_Cortex-A65AEx4+Cortex-A76AEx4, use the –data option like the following to run the FVP: +You can also run the FVP using its graphical user interface: -```console ---data cluster0.subcluster0.cpu0.thread0=/output/aemfvp-a/aemfvp-a/Image@0x80080000 \ ---data cluster0.subcluster0.cpu0.thread0=/output/aemfvp-a/aemfvp-a/fvp-base-revc.dtb@0x83000000 -``` +![GUI #center](FVP.png) diff --git a/content/learning-paths/embedded-and-microcontrollers/linux-on-fvp/steps.md b/content/learning-paths/embedded-and-microcontrollers/linux-on-fvp/steps.md index 4f51c9148b..3ab3891c1d 100644 --- a/content/learning-paths/embedded-and-microcontrollers/linux-on-fvp/steps.md +++ b/content/learning-paths/embedded-and-microcontrollers/linux-on-fvp/steps.md @@ -6,145 +6,58 @@ weight: 3 layout: learningpathall --- -## What are cpu_ops? +## Build TF-A with CPU Operations Support -In the context of Arm Trusted Firmware-A (TF-A), cpu_ops refers to a framework that defines CPU-specific operations essential for managing power states and initialization sequences. This framework is particularly crucial for implementing the Power State Coordination Interface (PSCI), which standardizes power management in Arm systems.   +Some Arm FVPs require CPU-specific initialization routines to boot properly. These routines are part of the TF-A `cpu_ops` framework. -## What is the purpose of cpu_ops? +### What are cpu_ops? +The `cpu_ops` framework in Trusted Firmware-A contains functions to: +- Handle CPU resets +- Manage power states +- Apply errata workarounds -The cpu_ops framework provides a set of function pointers tailored to specific CPU architectures. These functions handle operations such as: - • Reset Handling: Executing CPU-specific reset sequences. - • Power Management: Managing CPU power-down and power-up sequences. - • Errata Workarounds: Applying necessary workarounds for known CPU errata during initialization. 
- -By abstracting these operations, TF-A can support multiple CPU types seamlessly, allowing for a modular and maintainable codebase. - -## Why build cpu_ops into images? - -If you build the software without any modification, you might get the following error message after running the software stack: - -```console -ASSERT: File lib/cpus/aarch64/cpu_helpers.S Line 00035 -``` - -The previous error message occurs because the TF-A does not build the cpu_ops into the images: - -```console -31 /* Get the matching cpu_ops pointer */ - -32 bl get_cpu_ops_ptr - -33 #if ENABLE_ASSERTIONS - -34 cmp x0, #0 - -35 ASM_ASSERT(ne) - -36 #endif -``` - - -The cpu_ops are defined in the source file as follows: - -```console -lib/cpus/aarch64/cortex_a510.S -lib/cpus/aarch64/cortex_a53.S +Each CPU type has its own implementation, defined in files like: +```output lib/cpus/aarch64/cortex_a55.S -lib/cpus/aarch64/cortex_a57.S -lib/cpus/aarch64/cortex_a65.S -lib/cpus/aarch64/cortex_a65ae.S -lib/cpus/aarch64/cortex_a710.S -lib/cpus/aarch64/cortex_a715.S -lib/cpus/aarch64/cortex_a72.S -lib/cpus/aarch64/cortex_a73.S -``` - -Check the Makefile (plat/arm/board/fvp/platform.mk) of the FVP platform. 
- -Find the following code: - -```console -ifeq (${HW_ASSISTED_COHERENCY}, 0) -# Cores used without DSU - FVP_CPU_LIBS += lib/cpus/aarch64/cortex_a35.S \ - lib/cpus/aarch64/cortex_a53.S \ - lib/cpus/aarch64/cortex_a57.S \ - lib/cpus/aarch64/cortex_a72.S \ - lib/cpus/aarch64/cortex_a73.S -else -# Cores used with DSU only - ifeq (${CTX_INCLUDE_AARCH32_REGS}, 0) - # AArch64-only cores - FVP_CPU_LIBS += lib/cpus/aarch64/cortex_a76.S \ - lib/cpus/aarch64/cortex_a76ae.S \ - lib/cpus/aarch64/cortex_a77.S \ - lib/cpus/aarch64/cortex_a78.S \ - lib/cpus/aarch64/neoverse_n_common.S \ - lib/cpus/aarch64/neoverse_n1.S \ - lib/cpus/aarch64/neoverse_n2.S \ - lib/cpus/aarch64/neoverse_e1.S \ - lib/cpus/aarch64/neoverse_v1.S \ - lib/cpus/aarch64/neoverse_v2.S \ - lib/cpus/aarch64/cortex_a78_ae.S \ - lib/cpus/aarch64/cortex_a510.S \ - lib/cpus/aarch64/cortex_a710.S \ - lib/cpus/aarch64/cortex_a715.S \ - lib/cpus/aarch64/cortex_x3.S \ - lib/cpus/aarch64/cortex_a65.S \ - lib/cpus/aarch64/cortex_a65ae.S \ - lib/cpus/aarch64/cortex_a78c.S \ - lib/cpus/aarch64/cortex_hayes.S \ - lib/cpus/aarch64/cortex_hunter.S \ - lib/cpus/aarch64/cortex_x2.S \ - lib/cpus/aarch64/neoverse_poseidon.S - endif - # AArch64/AArch32 cores - FVP_CPU_LIBS += lib/cpus/aarch64/cortex_a55.S \ - lib/cpus/aarch64/cortex_a75.S -endif +lib/cpus/aarch64/cortex_a53.S +... etc. ``` -Default build options are `HW_ASSISTED_COHERENCY = 0` and `CTX_INCLUDE_AARCH32_REGS = 1`. - -### What are the required build options to fix the problem? +## Why you need this -Building the cpu_ops into the TF-A image requires different build options, depending on the CPU type. For example, different platforms require different build options when building the TF-A: +If the firmware is built without proper cpu_ops, you’ll hit an assertion failure like: -* For the A55 CPU FVP, add the HW_ASSISTED_COHERENCY=1 and USE_COHERENT_MEM=0 build options. 
-* For the A78 CPU FVP, add the HW_ASSISTED_COHERENCY=1, USE_COHERENT_MEM=0, and CTX_INCLUDE_AARCH32_REGS=0 build options. -* For the A53 CPU FVP, you do not need extra build options. - -Note: The build option USE_COHERENT_MEM cannot be enabled with HW_ASSISTED_COHERENCY=1. +```output +ASSERT: File lib/cpus/aarch64/cpu_helpers.S Line 00035 +``` -### What are the steps to build cpu_ops into the TF-A image? +This means the required CPU operation routines are missing from the build. -Perform the following steps to build cpu_ops into the TF-A image: +## Step-by-Step: Add TF-A Build Flags -Modify the following build script to add build options. The [Arm reference software stack](https://gitlab.arm.com/arm-reference-solutions/arm-reference-solutions-docs/-/blob/master/docs/aemfvp-a/user-guide.rst) uses the [build-scripts](https://gitlab.arm.com/arm-reference-solutions/build-scripts) to build the TF-A. +To include the correct `cpu_ops`, you need to set TF-A build options depending on the CPU. -Add TF-A build options, depending on the CPU type. For example: +### Example: A55 CPU FVP -* For A55 CPU FVP, add the following line: +Add the following line to your TF-A build script: -```console -ARM_TF_BUILD_FLAGS="$ARM_TF_BUILD_FLAGS HW_ASSISTED_COHERENCY=1 USE_COHERENT_MEM=0 " +```output +ARM_TF_BUILD_FLAGS="$ARM_TF_BUILD_FLAGS HW_ASSISTED_COHERENCY=1 USE_COHERENT_MEM=0" ``` -* For A78 CPU FVP, add the following line: - -```console +### Example: A78 CPU FVP +```output ARM_TF_BUILD_FLAGS="$ARM_TF_BUILD_FLAGS HW_ASSISTED_COHERENCY=1 USE_COHERENT_MEM=0 CTX_INCLUDE_AARCH32_REGS=0" ``` +{{% notice Note %}} +USE_COHERENT_MEM=1 cannot be used with HW_ASSISTED_COHERENCY=1. 
+{{% /notice %}} -Rebuild the TF-A by using the following commands: +## Rebuild and Package -```console +Run the following commands to rebuild TF-A and integrate it into the BusyBox image: +```bash ./build-scripts/build-arm-tf.sh -p aemfvp-a -f busybox clean ./build-scripts/build-arm-tf.sh -p aemfvp-a -f busybox build -``` - -Package the built TF-A into the BusyBox disk image by using the following command: - -```console ./build-scripts/aemfvp-a/build-test-busybox.sh -p aemfvp-a package ``` diff --git a/content/learning-paths/laptops-and-desktops/_index.md b/content/learning-paths/laptops-and-desktops/_index.md index b3aa2da78f..045d90596b 100644 --- a/content/learning-paths/laptops-and-desktops/_index.md +++ b/content/learning-paths/laptops-and-desktops/_index.md @@ -34,8 +34,8 @@ tools_software_languages_filter: - C/C++: 4 - CCA: 1 - Clang: 11 -- cmake: 1 - CMake: 2 +- cmake: 1 - Coding: 16 - CSS: 1 - Daytona: 1 diff --git a/content/learning-paths/servers-and-cloud-computing/_index.md b/content/learning-paths/servers-and-cloud-computing/_index.md index dcef07a3e8..94addb9e43 100644 --- a/content/learning-paths/servers-and-cloud-computing/_index.md +++ b/content/learning-paths/servers-and-cloud-computing/_index.md @@ -8,7 +8,7 @@ key_ip: maintopic: true operatingsystems_filter: - Android: 2 -- Linux: 147 +- Linux: 150 - macOS: 10 - Windows: 14 pinned_modules: @@ -19,17 +19,17 @@ pinned_modules: - migration subjects_filter: - CI-CD: 5 -- Containers and Virtualization: 27 +- Containers and Virtualization: 28 - Databases: 15 - Libraries: 9 - ML: 27 -- Performance and Architecture: 57 +- Performance and Architecture: 59 - Storage: 1 - Web: 10 subtitle: Optimize cloud native apps on Arm for performance and cost title: Servers and Cloud Computing tools_software_languages_filter: -- .NET: 2 +- .NET: 3 - .NET SDK: 1 - 5G: 1 - ACL: 1 @@ -51,12 +51,14 @@ tools_software_languages_filter: - AWS Elastic Container Service (ECS): 1 - AWS Elastic Kubernetes Service (EKS): 3 - AWS 
Graviton: 1 -- Bash: 1 +- Azure CLI: 1 +- Azure Portal: 1 - bash: 2 +- Bash: 1 - Bastion: 3 - BOLT: 2 - bpftool: 1 -- C: 4 +- C: 5 - C#: 2 - C++: 8 - C/C++: 2 @@ -88,7 +90,7 @@ tools_software_languages_filter: - GitHub: 6 - GitLab: 1 - Glibc: 1 -- Go: 3 +- Go: 4 - Google Axion: 3 - Google Benchmark: 1 - Google Cloud: 1 @@ -126,6 +128,7 @@ tools_software_languages_filter: - Ollama: 1 - ONNX Runtime: 1 - OpenBLAS: 1 +- OrchardCore: 1 - PAPI: 1 - perf: 5 - Perf: 1 @@ -169,7 +172,7 @@ tools_software_languages_filter: weight: 1 cloud_service_providers_filter: - AWS: 17 -- Google Cloud: 12 +- Google Cloud: 13 - Microsoft Azure: 9 - Oracle: 2 --- diff --git a/content/learning-paths/servers-and-cloud-computing/bolt-merge/_index.md b/content/learning-paths/servers-and-cloud-computing/bolt-merge/_index.md index 616f37d088..57ce4e6537 100644 --- a/content/learning-paths/servers-and-cloud-computing/bolt-merge/_index.md +++ b/content/learning-paths/servers-and-cloud-computing/bolt-merge/_index.md @@ -1,5 +1,5 @@ --- -title: Optimizing Arm binaries and libraries with LLVM-BOLT and profile merging +title: Optimize Arm applications and shared libraries with BOLT draft: true cascade: @@ -7,23 +7,22 @@ cascade: minutes_to_complete: 30 -who_is_this_for: Performance engineers, software developers working on Arm platforms who want to optimize both application binaries and shared libraries using LLVM-BOLT. +who_is_this_for: Performance engineers and software developers working on Arm platforms who want to optimize both application binaries and shared libraries using BOLT. learning_objectives: - - Instrument and optimize binaries for individual workload features using LLVM-BOLT. + - Instrument and optimize application binaries for individual workload features using BOLT. - Collect separate BOLT profiles and merge them for comprehensive code coverage. - Optimize shared libraries independently. - Integrate optimized shared libraries into applications. 
- Evaluate and compare application and library performance across baseline, isolated, and merged optimization scenarios. prerequisites: - - An Arm based system running Linux with BOLT and Linux Perf installed. The Linux kernel should be version 5.15 or later. - - (Optional) A second, more powerful Linux system to build the software executable and run BOLT. + - An Arm based system running Linux with [BOLT](/install-guides/bolt/) and [Linux Perf](/install-guides/perf/) installed. author: Gayathri Narayana Yegna Narayanan ### Tags -skilllevels: Introductory +skilllevels: Advanced subjects: Performance and Architecture armips: - Neoverse diff --git a/content/learning-paths/servers-and-cloud-computing/bolt-merge/bolt-merge.png b/content/learning-paths/servers-and-cloud-computing/bolt-merge/bolt-merge.png new file mode 100644 index 0000000000..8bbee946af Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/bolt-merge/bolt-merge.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-1.md b/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-1.md index 1d80f6e6e7..9914183789 100644 --- a/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-1.md +++ b/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-1.md @@ -1,5 +1,5 @@ --- -title: Overview of BOLT Merge +title: BOLT overview weight: 2 ### FIXED, DO NOT MODIFY @@ -8,20 +8,56 @@ layout: learningpathall [BOLT](https://github.com/llvm/llvm-project/blob/main/bolt/README.md) is a post-link binary optimizer that uses Linux Perf data to re-order the executable code layout to reduce memory overhead and improve performance. 
-In this Learning Path, you'll learn how to:
-- Collect and merge BOLT profiles from multiple workload features (e.g., read-only and write-only)
-- Independently optimize application binaries and external user-space libraries (e.g., `libssl.so`, `libcrypto.so`)
-- Link the final optimized binary with the separately bolted libraries to deploy a fully optimized runtime stack
-
-While MySQL and sysbench are used as examples, this method applies to **any feature-rich application** that:
-- Exhibits multiple runtime paths
-- Uses dynamic libraries
-- Requires full-stack binary optimization for performance-critical deployment
-
-The workflow includes:
-1. Profiling each workload feature separately
-2. Profiling external libraries independently
-3. Merging profiles for broader code coverage
-4. Applying BOLT to each binary and library
-5. Linking bolted libraries with the merged-profile binary
+Make sure you have [BOLT](/install-guides/bolt/) and [Linux Perf](/install-guides/perf/) installed.
+
+You should use an Arm Linux system with at least 8 CPUs and 16 GB of RAM. Ubuntu 24.04 is used for testing, but other Linux distributions are possible.
+
+## What will I do in this Learning Path?
+
+In this Learning Path, you learn how to use BOLT to optimize applications and shared libraries. MySQL is used as the application, and two shared libraries used by MySQL are also optimized using BOLT.
+
+Here is an outline of the steps:
+
+1. Collect and merge BOLT profiles from multiple workloads, such as read-only and write-only
+
+   A read-only workload typically involves operations that only retrieve or query data, such as running SELECT statements in a database without modifying any records. In contrast, a write-only workload focuses on operations that modify data, such as INSERT, UPDATE, or DELETE statements. Profiling both types ensures that the optimized binary performs well under different usage patterns.
+
+2.
Independently optimize application binaries and external user-space libraries, such as `libssl.so` and `libcrypto.so`
+
+   This means you can apply BOLT optimizations not just to your main application, but also to shared libraries it depends on, resulting in a more comprehensive performance improvement across your entire stack.
+
+3. Merge profile data for broader code coverage
+
+   By combining the profile data collected from different workloads and libraries, you create a single, comprehensive profile that represents a wide range of application behaviors. This merged profile allows BOLT to optimize code paths that are exercised under different scenarios, leading to better overall performance and coverage than optimizing for a single workload.
+
+4. Run BOLT on each application binary and library
+
+   With the merged profile, you apply BOLT optimizations separately to each binary and shared library. This step ensures that both your main application and its dependencies are optimized based on real-world usage patterns, resulting in a more efficient and responsive software stack.
+
+5. Link the final optimized binary with the separately optimized libraries to deploy a fully optimized runtime stack
+
+   After optimizing each component, you combine them to create a deployment where both the application and its libraries benefit from BOLT's enhancements.
+
+## What is BOLT profile merging?
+
+BOLT profile merging is the process of combining profiling data from multiple runs into a single profile. This merged profile enables BOLT to optimize binaries for a broader set of real-world behaviors, ensuring that the final optimized application or library performs well across diverse workloads, not just a single use case. By merging profiles, you capture a wider range of code paths and execution patterns, leading to more robust and effective optimizations.
+
+![Why BOLT Profile Merging?](bolt-merge.png)
+
+## What are good applications for BOLT?
+ +MySQL and Sysbench are used as example applications, but you can use this method for any feature-rich application that: + +1. Exhibits multiple runtime paths + + Applications often have different code paths depending on the workload or user actions. Optimizing for just one path can leave performance gains untapped in others. By profiling and merging data from various workloads, you ensure broader optimization coverage. + +2. Uses dynamic libraries + + Most modern applications rely on shared libraries for functionality. Optimizing these libraries alongside the main binary ensures consistent performance improvements throughout the application. + +3. Requires full-stack binary optimization for performance-critical deployment + + In scenarios where every bit of performance matters, such as high-throughput servers or latency-sensitive applications, optimizing the entire binary stack can yield significant benefits. + diff --git a/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-2.md b/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-2.md index c67ed17850..8bb5274402 100644 --- a/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-2.md +++ b/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-2.md @@ -1,44 +1,107 @@ --- -title: BOLT Optimization - First feature +title: Instrument MySQL with BOLT weight: 3 ### FIXED, DO NOT MODIFY layout: learningpathall --- -In this step, you will instrument an application binary (such as `mysqld`) with BOLT to collect runtime profile data for a specific feature — for example, a **read-only workload**. +In this step, you will use BOLT to instrument the MySQL application binary and to collect profile data for specific workloads. -The collected profile will later be merged with others and used to optimize the application's code layout. +The collected profiles will be merged with others and used to optimize the application's code layout. 
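Before starting the build, you can optionally confirm that the BOLT tools used in the rest of this Learning Path are available on your `PATH`. This is a convenience sketch; it only reports what is installed:

```bash
# Optional sanity check: report which BOLT-related tools are on PATH
for tool in llvm-bolt merge-fdata perf; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: missing"
  fi
done
```

If any tool is reported as missing, revisit the BOLT and Linux Perf install guides before continuing.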
-
-### Step 1: Build or obtain the uninstrumented binary
+## Build mysqld from source

-Make sure your application binary is:
+Follow these steps to build the MySQL server (`mysqld`) from source:

-- Built from source (e.g., `mysqld`)
+Install the required dependencies:
+
+```bash
+sudo apt update
+sudo apt install -y build-essential cmake libncurses5-dev libssl-dev libboost-all-dev \
+  bison pkg-config libaio-dev libtirpc-dev git ninja-build liblz4-dev
+```
+
+Download the MySQL source code. You can change to another version in the `checkout` command below if needed.
+
+```bash
+git clone https://github.com/mysql/mysql-server.git
+cd mysql-server
+git checkout mysql-8.0.37
+```
+
+Configure the build with optimization flags and the `--emit-relocs` linker flag, which BOLT needs to rewrite the binary:
+
+```bash
+mkdir build && cd build
+cmake .. -DCMAKE_C_FLAGS="-O3 -mcpu=neoverse-n2 -Wno-enum-constexpr-conversion -fno-reorder-blocks-and-partition" \
+  -DCMAKE_CXX_FLAGS="-O3 -mcpu=neoverse-n2 -Wno-enum-constexpr-conversion -fno-reorder-blocks-and-partition" \
+  -DCMAKE_CXX_LINK_FLAGS="-Wl,--emit-relocs" -DCMAKE_C_LINK_FLAGS="-Wl,--emit-relocs" -G Ninja \
+  -DWITH_BOOST=$HOME/boost -DDOWNLOAD_BOOST=On -DWITH_ZLIB=bundled -DWITH_LZ4=system -DWITH_SSL=system
+```
+
+Build MySQL:
+
+```bash
+ninja
+```
+
+After the build completes, the `mysqld` binary is located at `$HOME/mysql-server/build/runtime_output_directory/mysqld`.
+
+{{% notice Note %}}
+You can run `mysqld` directly from the build directory as shown, or run `ninja install` to install it system-wide. For testing and instrumentation, running from the build directory is usually preferred.
+{{% /notice %}}
+
+After building mysqld, install MySQL server and client utilities system-wide:
+
+```bash
+sudo ninja install
+```
+
+This will make the `mysql` client and other utilities available in your PATH.
+
+```bash
+echo 'export PATH="$PATH:/usr/local/mysql/bin"' >> ~/.bashrc
+source ~/.bashrc
+```
+
+Ensure the binary is unstripped and includes debug symbols for BOLT instrumentation.
+ +To work with BOLT, your application binary should be: + +- Built from source - Unstripped, with symbol information available - Compiled with frame pointers enabled (`-fno-omit-frame-pointer`) You can verify this with: ```bash -readelf -s /path/to/mysqld | grep main +readelf -s $HOME/mysql-server/build/runtime_output_directory/mysqld | grep main ``` -If the symbols are missing, rebuild the binary with debug info and no stripping. +The partial output is: ---- +```output + 23837: 000000000950dfe8 8 OBJECT GLOBAL DEFAULT 27 mysql_main + 34522: 000000000915bfd0 8 OBJECT GLOBAL DEFAULT 26 server_main_callback + 42773: 00000000051730e4 80 FUNC GLOBAL DEFAULT 13 _Z18my_main_thre[...] + 44882: 000000000357dc98 40 FUNC GLOBAL DEFAULT 13 main + 61046: 0000000005ffd5c0 40 FUNC GLOBAL DEFAULT 13 _Z21record_main_[...] +``` + +If the symbols are missing, rebuild the binary with debug info and no stripping. -### Step 2: Instrument the binary with BOLT +## Instrument the binary with BOLT Use `llvm-bolt` to create an instrumented version of the binary: ```bash -llvm-bolt /path/to/mysqld \\ - -instrument \\ - -o /path/to/mysqld.instrumented \\ - --instrumentation-file=/path/to/profile-readonly.fdata \\ - --instrumentation-sleep-time=5 \\ - --instrumentation-no-counters-clear \\ +llvm-bolt $HOME/mysql-server/build/runtime_output_directory/mysqld \ + -instrument \ + -o $HOME/mysql-server/build/runtime_output_directory/mysqld.instrumented \ + --instrumentation-file=$HOME/mysql-server/build/profile-readonly.fdata \ + --instrumentation-sleep-time=5 \ + --instrumentation-no-counters-clear \ --instrumentation-wait-forks ``` @@ -48,42 +111,104 @@ llvm-bolt /path/to/mysqld \\ - `--instrumentation-file`: Path where the profile output will be saved - `--instrumentation-wait-forks`: Ensures the instrumentation continues through forks (important for daemon processes) ---- -### Step 3: Run the instrumented binary under a feature-specific workload +## Start the instrumented MySQL server + 
+Before running the workload, start the instrumented MySQL server in a separate terminal. You may need to initialize a new data directory if this is your first run: + +```bash +# Initialize a new data directory (if needed) +$HOME/mysql-server/build/runtime_output_directory/mysqld.instrumented --initialize-insecure --datadir=$HOME/mysql-bolt-data + +# Start the instrumented server +# On an 8-core system, use available cores (e.g., 6 for mysqld, 7 for sysbench) +taskset -c 6 $HOME/mysql-server/build/runtime_output_directory/mysqld.instrumented \ + --datadir=$HOME/mysql-bolt-data \ + --socket=$HOME/mysql-bolt.sock \ + --port=3306 \ + --user=$(whoami) & +``` + +Adjust `--datadir`, `--socket`, and `--port` as needed for your environment. Make sure the server is running and accessible before proceeding. + +With the database running, open a second terminal to run the client commands. + +## Install sysbench + +You will need sysbench to generate workloads for MySQL. On most Arm Linux distributions, you can install it using your package manager: + +```bash +sudo apt update +sudo apt install -y sysbench +``` + +Alternatively, see the [sysbench GitHub page](https://github.com/akopytov/sysbench) for build-from-source instructions if a package is not available for your platform. + +## Create a test database and user + +For sysbench to work, you need a test database and user. 
Connect to the MySQL server as the root user (or another admin user) and run: + +```bash +mysql -u root --socket=$HOME/mysql-bolt.sock +``` + +Then, in the MySQL shell: + +```sql +CREATE DATABASE IF NOT EXISTS bench; +CREATE USER IF NOT EXISTS 'bench'@'localhost' IDENTIFIED BY 'bench'; +GRANT ALL PRIVILEGES ON bench.* TO 'bench'@'localhost'; +FLUSH PRIVILEGES; +EXIT; +``` + +## Run the instrumented binary under a feature-specific workload + +Run `sysbench` with the `prepare` option: + +```bash +sysbench \ + --db-driver=mysql \ + --mysql-host=127.0.0.1 \ + --mysql-db=bench \ + --mysql-user=bench \ + --mysql-password=bench \ + --mysql-port=3306 \ + --tables=8 \ + --table-size=10000 \ + --threads=1 \ + /usr/share/sysbench/oltp_read_only.lua prepare +``` Use a workload generator to stress the binary in a feature-specific way. For example, to simulate **read-only traffic** with sysbench: ```bash -taskset -c 9 ./src/sysbench \\ - --db-driver=mysql \\ - --mysql-host=127.0.0.1 \\ - --mysql-db=bench \\ - --mysql-user=bench \\ - --mysql-password=bench \\ - --mysql-port=3306 \\ - --tables=8 \\ - --table-size=10000 \\ - --threads=1 \\ - src/lua/oltp_read_only.lua run +taskset -c 7 sysbench \ + --db-driver=mysql \ + --mysql-host=127.0.0.1 \ + --mysql-db=bench \ + --mysql-user=bench \ + --mysql-password=bench \ + --mysql-port=3306 \ + --tables=8 \ + --table-size=10000 \ + --threads=1 \ + /usr/share/sysbench/oltp_read_only.lua run ``` -> Adjust this command as needed for your workload and CPU/core binding. +{{% notice Note %}} +On an 8-core system, cores are numbered 0-7. Adjust the `taskset -c` values as needed for your system. Avoid using the same core for both mysqld and sysbench to reduce contention. +{{% /notice %}} The `.fdata` file defined in `--instrumentation-file` will be populated with runtime execution data. 
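If you are curious about what was collected, the `.fdata` file is plain text. The sketch below is a toy reader over an invented sample, assuming BOLT's simple branch-record layout of `<sym> <name> <offset> <sym> <name> <offset> <mispreds> <count>` per line; real files also contain other record types, so treat this as illustration rather than a complete parser:

```bash
# Toy example: sum branch counts per source function with awk.
# The sample records and function names below are invented for illustration.
cat > /tmp/sample.fdata <<'EOF'
1 main 10 1 do_command 0 0 1200
1 do_command 40 1 dispatch_sql 0 3 900
1 dispatch_sql 8 1 do_command 60 0 900
EOF

awk 'NF == 8 { total[$2] += $8 } END { for (f in total) print f, total[f] }' /tmp/sample.fdata
```

For the sample above, `main` accumulates 1200 and `do_command` 900, giving a rough sense of which functions were hottest during the workload.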
---
-
-### Step 4: Verify the profile was created
+## Verify the profile was created

After running the workload:

```bash
-ls -lh /path/to/profile-readonly.fdata
+ls -lh $HOME/mysql-server/build/profile-readonly.fdata
```

You should see a non-empty file. This file will later be merged with other profiles (e.g., for write-only traffic) to generate a complete merged profile.

----
-
-
diff --git a/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-3.md b/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-3.md
index f1ea41f09c..4c4f141d14 100644
--- a/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-3.md
+++ b/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-3.md
@@ -1,38 +1,39 @@
 ---
-title: BOLT Optimization - Second Feature & BOLT Merge to combine
+title: Run a new workload using BOLT and merge the results
 weight: 4

### FIXED, DO NOT MODIFY
layout: learningpathall
---

-In this step, you'll collect profile data for a **write-heavy** workload and also **instrument external libraries** such as `libcrypto.so` and `libssl.so` used by the application (e.g., MySQL).
+Next, you will collect profile data for a **write-heavy** workload and merge the results with the **read-only** workload from the previous section.
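Conceptually, merging profiles with `merge-fdata` sums the execution counts of matching records across the input profiles. Here is a minimal sketch of that idea in shell; the record names and counts are invented for illustration and are not real `.fdata` contents:

```bash
# Conceptual sketch of profile merging: counts for the same record are summed.
cat > /tmp/readonly.txt <<'EOF'
main->select_exec 1000
main->net_read 400
EOF
cat > /tmp/writeonly.txt <<'EOF'
main->insert_exec 800
main->net_read 350
EOF

awk '{ total[$1] += $2 } END { for (r in total) print r, total[r] }' /tmp/readonly.txt /tmp/writeonly.txt
```

Records seen in only one profile keep their count, while `main->net_read`, present in both, ends up with the combined count of 750. The real tool does the same kind of aggregation on BOLT's record format.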
-
-### Step 1: Run Write-Only Workload for Application Binary
+## Run Write-Only Workload for Application Binary

Use the same BOLT-instrumented MySQL binary and drive it with a write-only workload to capture `profile-writeonly.fdata`:

```bash
-taskset -c 9 ./src/sysbench \\
-  --db-driver=mysql \\
-  --mysql-host=127.0.0.1 \\
-  --mysql-db=bench \\
-  --mysql-user=bench \\
-  --mysql-password=bench \\
-  --mysql-port=3306 \\
-  --tables=8 \\
-  --table-size=10000 \\
-  --threads=1 \\
-  src/lua/oltp_write_only.lua run
+# On an 8-core system, use available cores (e.g., 7 for sysbench)
+taskset -c 7 sysbench \
+  --db-driver=mysql \
+  --mysql-host=127.0.0.1 \
+  --mysql-db=bench \
+  --mysql-user=bench \
+  --mysql-password=bench \
+  --mysql-port=3306 \
+  --tables=8 \
+  --table-size=10000 \
+  --threads=1 \
+  /usr/share/sysbench/oltp_write_only.lua run
```

The profile output path is fixed when the binary is instrumented, so re-instrument the binary with `--instrumentation-file` pointing at `profile-writeonly.fdata` before running this workload (or move the previous `.fdata` file aside first), so the write-only data is saved to its own file.

----
-### Step 2: Verify the Second Profile Was Generated
+
+
+### Verify the Second Profile Was Generated

```bash
-ls -lh /path/to/profile-writeonly.fdata
+ls -lh $HOME/mysql-server/build/profile-writeonly.fdata
```

Both `.fdata` files should now exist and contain valid data:

@@ -40,22 +41,13 @@ Both `.fdata` files should now exist and contain valid data:

- `profile-readonly.fdata`
- `profile-writeonly.fdata`

----
-
-### Step 3: Merge the Feature Profiles
+### Merge the Feature Profiles

Use `merge-fdata` to combine the feature-specific profiles into one comprehensive `.fdata` file:

```bash
-merge-fdata /path/to/profile-readonly.fdata /path/to/profile-writeonly.fdata \\
-  -o /path/to/profile-merged.fdata
-```
-
-**Example command from an actual setup:**
-
-```bash
-/home/ubuntu/llvm-latest/build/bin/merge-fdata prof-instrumentation-readonly.fdata prof-instrumentation-writeonly.fdata \\
-  -o prof-instrumentation-readwritemerged.fdata
+merge-fdata $HOME/mysql-server/build/profile-readonly.fdata
$HOME/mysql-server/build/profile-writeonly.fdata \ + -o $HOME/mysql-server/build/profile-merged.fdata ``` Output: @@ -67,31 +59,36 @@ Profile from 2 files merged. This creates a single merged profile (`profile-merged.fdata`) covering both read-only and write-only workload behaviors. ---- - -### Step 4: Verify the Merged Profile +### Verify the Merged Profile Check the merged `.fdata` file: ```bash -ls -lh /path/to/profile-merged.fdata +ls -lh $HOME/mysql-server/build/profile-merged.fdata ``` ---- -### Step 5: Generate the Final Binary with the Merged Profile +### Generate the Final Binary with the Merged Profile Use LLVM-BOLT to generate the final optimized binary using the merged `.fdata` file: ```bash -llvm-bolt build/bin/mysqld \\ - -o build/bin/mysqldreadwrite_merged.bolt_instrumentation \\ - -data=/home/ubuntu/mysql-server-8.0.33/sysbench/prof-instrumentation-readwritemerged.fdata \\ - -reorder-blocks=ext-tsp \\ - -reorder-functions=hfsort \\ - -split-functions \\ - -split-all-cold \\ - -split-eh \\ - -dyno-stats \\ +llvm-bolt $HOME/mysql-server/build/runtime_output_directory/mysqld \ + -o $HOME/mysql-server/build/mysqldreadwrite_merged.bolt_instrumentation \ + -data=$HOME/mysql-server/build/profile-merged.fdata \ + -reorder-blocks=ext-tsp \ + -reorder-functions=hfsort \ + -split-functions \ + -split-all-cold \ + -split-eh \ + -dyno-stats \ --print-profile-stats 2>&1 | tee bolt_orig.log ``` diff --git a/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-4.md b/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-4.md index 376c249164..a237c7d4cc --- 
a/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-4.md +++ b/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-4.md @@ -1,15 +1,16 @@ --- -title: BOLT the Libraries separately +title: Instrument shared libraries with BOLT weight: 5 ### FIXED, DO NOT MODIFY layout: learningpathall --- -### Step 1: Instrument Shared Libraries (e.g., libcrypto, libssl) +### Instrument Shared Libraries (e.g., libcrypto, libssl) If system libraries like `/usr/lib/libssl.so` are stripped, rebuild OpenSSL from source with relocations: ```bash +cd $HOME git clone https://github.com/openssl/openssl.git cd openssl ./config -O2 -Wl,--emit-relocs --prefix=$HOME/bolt-libs/openssl @@ -17,97 +18,87 @@ make -j$(nproc) make install ``` ---- - -### Step 2: BOLT-Instrument libssl.so.3 +### Instrument libssl Use `llvm-bolt` to instrument `libssl.so.3`: ```bash -llvm-bolt $HOME/bolt-libs/openssl/lib/libssl.so.3 \\ - -instrument \\ - -o $HOME/bolt-libs/openssl/lib/libssl.so.3.instrumented \\ - --instrumentation-file=libssl-readwrite.fdata \\ - --instrumentation-sleep-time=5 \\ - --instrumentation-no-counters-clear \\ +llvm-bolt $HOME/bolt-libs/openssl/lib/libssl.so.3 \ + -instrument \ + -o $HOME/bolt-libs/openssl/lib/libssl.so.3.instrumented \ + --instrumentation-file=$HOME/bolt-libs/openssl/lib/libssl-readwrite.fdata \ + --instrumentation-sleep-time=5 \ + --instrumentation-no-counters-clear \ --instrumentation-wait-forks ``` Then launch MySQL using the **instrumented shared library** and run a **read+write** sysbench test to populate the profile: ---- - -### Step 3: Optimize 'libssl.so' Using Its Profile +### Optimize libssl using the profile After running the read+write test, ensure `libssl-readwrite.fdata` is populated. 
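For reference, launching MySQL against the instrumented library can be sketched as below. This is a minimal outline, not the exact commands from the original setup: the `stage_lib` helper and the `$HOME/bolt-libs/instrumented` staging directory are assumptions, chosen so the dynamic loader resolves the instrumented copy under its canonical soname.

```bash
# stage_lib SRC DEST_DIR SONAME: copy an instrumented shared library into
# DEST_DIR under its canonical soname so the dynamic loader will pick it up.
stage_lib() {
    mkdir -p "$2"
    cp "$1" "$2/$3"
    echo "staged $3 in $2"
}

# Stage the instrumented libssl (guarded so the copy is skipped if the
# OpenSSL build from the previous step is not present):
if [ -f "$HOME/bolt-libs/openssl/lib/libssl.so.3.instrumented" ]; then
    stage_lib "$HOME/bolt-libs/openssl/lib/libssl.so.3.instrumented" \
              "$HOME/bolt-libs/instrumented" "libssl.so.3"
    export LD_LIBRARY_PATH=$HOME/bolt-libs/instrumented
fi
# Now start mysqld and run the read+write sysbench workload as usual.
```

Staging a copy (rather than overwriting the original `libssl.so.3`) keeps the uninstrumented library intact for the optimization step that follows.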
- Run BOLT on the uninstrumented `libssl.so` with the collected read-write profile: ```bash -llvm-bolt /path/to/libssl.so.3 \\ - -o /path/to/libssl.so.optimized \\ - -data=/path/to/prof-instrumentation-libssl-readwrite.fdata \\ - -reorder-blocks=ext-tsp \\ - -reorder-functions=hfsort \\ - -split-functions \\ - -split-all-cold \\ - -split-eh \\ - -dyno-stats \\ +llvm-bolt $HOME/bolt-libs/openssl/lib/libssl.so.3 \ + -o $HOME/bolt-libs/openssl/lib/libssl.so.optimized \ + -data=$HOME/bolt-libs/openssl/lib/libssl-readwrite.fdata \ + -reorder-blocks=ext-tsp \ + -reorder-functions=hfsort \ + -split-functions \ + -split-all-cold \ + -split-eh \ + -dyno-stats \ --print-profile-stats ``` ---- - -### Step 3: Replace the Library at Runtime +### Replace the library at runtime Copy the optimized version over the original and export the path: ```bash -cp /path/to/libssl.so.optimized /path/to/libssl.so.3 -export LD_LIBRARY_PATH=/path/to/ +cp $HOME/bolt-libs/openssl/lib/libssl.so.optimized $HOME/bolt-libs/openssl/lib/libssl.so.3 +export LD_LIBRARY_PATH=$HOME/bolt-libs/openssl/lib ``` This ensures MySQL will dynamically load the optimized `libssl.so`. ---- - -### Step 4: Run Final Workload and Validate Performance +### Run the final workload and validate performance Start the BOLT-optimized MySQL binary and link it against the optimized `libssl.so`. 
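Before starting the workload, it is worth confirming that the loader will actually resolve the optimized copy rather than the system library. A small wrapper around `ldd` makes this easy to check; the helper name and the `mysqld` path in the example are illustrative:

```bash
# check_resolved_lib BINARY PATTERN: show how the dynamic loader resolves
# libraries matching PATTERN for the given binary.
check_resolved_lib() {
    ldd "$1" | grep -E "$2"
}

# Example: verify mysqld picks up the BOLT-optimized OpenSSL libraries
# (binary path assumed from the earlier MySQL build):
# check_resolved_lib "$HOME/mysql-server/build/runtime_output_directory/mysqld" 'libssl|libcrypto'
```

If the output still points at `/usr/lib`, re-check that `LD_LIBRARY_PATH` is exported in the same shell that starts `mysqld`.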
Run the combined workload: ```bash -taskset -c 9 ./src/sysbench \\ - --db-driver=mysql \\ - --mysql-host=127.0.0.1 \\ - --mysql-db=bench \\ - --mysql-user=bench \\ - --mysql-password=bench \\ - --mysql-port=3306 \\ - --tables=8 \\ - --table-size=10000 \\ - --threads=1 \\ - src/lua/oltp_read_write.lua run +# On an 8-core system, use available cores (e.g., 7 for sysbench) +taskset -c 7 sysbench \ + --db-driver=mysql \ + --mysql-host=127.0.0.1 \ + --mysql-db=bench \ + --mysql-user=bench \ + --mysql-password=bench \ + --mysql-port=3306 \ + --tables=8 \ + --table-size=10000 \ + --threads=1 \ + /usr/share/sysbench/oltp_read_write.lua run ``` ---- -In the next step, you'll optimize an additional critical external library (`libcrypto.so`) using BOLT, following a similar process as `libssl.so`. Afterward, you'll interpret performance results to validate and compare optimizations across baseline and merged - scenarios. +In the next step, you'll optimize an additional critical external library (`libcrypto.so`) using BOLT, following a similar process as `libssl.so`. Afterward, you'll interpret performance results to validate and compare optimizations across baseline and merged scenarios. 
-### Step 1: BOLT optimization for 'libcrypto.so' +### BOLT optimization for libcrypto Follow these steps to instrument and optimize `libcrypto.so`: #### Instrument `libcrypto.so`: ```bash -llvm-bolt /path/to/libcrypto.so.3 \\ - -instrument \\ - -o /path/to/libcrypto.so.3.instrumented \\ - --instrumentation-file=libcrypto-readwrite.fdata \\ - --instrumentation-sleep-time=5 \\ - --instrumentation-no-counters-clear \\ +llvm-bolt $HOME/bolt-libs/openssl/lib/libcrypto.so.3 \ + -instrument \ + -o $HOME/bolt-libs/openssl/lib/libcrypto.so.3.instrumented \ + --instrumentation-file=$HOME/bolt-libs/openssl/lib/libcrypto-readwrite.fdata \ + --instrumentation-sleep-time=5 \ + --instrumentation-no-counters-clear \ --instrumentation-wait-forks ``` @@ -115,39 +106,39 @@ Run MySQL under the read-write workload to populate `libcrypto-readwrite.fdata`: ```bash export LD_LIBRARY_PATH=/path/to/libcrypto-instrumented -taskset -c 9 ./src/sysbench \\ - --db-driver=mysql \\ - --mysql-host=127.0.0.1 \\ - --mysql-db=bench \\ - --mysql-user=bench \\ - --mysql-password=bench \\ - --mysql-port=3306 \\ - --tables=8 \\ - --table-size=10000 \\ - --threads=1 \\ - src/lua/oltp_read_write.lua run +taskset -c 7 sysbench \ + --db-driver=mysql \ + --mysql-host=127.0.0.1 \ + --mysql-db=bench \ + --mysql-user=bench \ + --mysql-password=bench \ + --mysql-port=3306 \ + --tables=8 \ + --table-size=10000 \ + --threads=1 \ + /usr/share/sysbench/oltp_read_write.lua run ``` -#### Optimize the `libcrypto.so` library: +#### Optimize the crypto library ```bash -llvm-bolt /path/to/original/libcrypto.so.3 \\ - -o /path/to/libcrypto.so.optimized \\ - -data=libcrypto-readwrite.fdata \\ - -reorder-blocks=ext-tsp \\ - -reorder-functions=hfsort \\ - -split-functions \\ - -split-all-cold \\ - -split-eh \\ - -dyno-stats \\ +llvm-bolt $HOME/bolt-libs/openssl/lib/libcrypto.so.3 \ + -o $HOME/bolt-libs/openssl/lib/libcrypto.so.optimized \ + -data=libcrypto-readwrite.fdata \ + -reorder-blocks=ext-tsp \ + 
-reorder-functions=hfsort \ + -split-functions \ + -split-all-cold \ + -split-eh \ + -dyno-stats \ --print-profile-stats ``` Replace the original at runtime: ```bash -cp /path/to/libcrypto.so.optimized /path/to/libcrypto.so.3 -export LD_LIBRARY_PATH=/path/to/ +cp $HOME/bolt-libs/openssl/lib/libcrypto.so.optimized $HOME/bolt-libs/openssl/lib/libcrypto.so.3 +export LD_LIBRARY_PATH=$HOME/bolt-libs/openssl/lib ``` Run a final validation workload to ensure functionality and measure performance improvements. diff --git a/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-5.md b/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-5.md index 07cd298c5f..8c2b963995 100644 --- a/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-5.md +++ b/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-5.md @@ -1,5 +1,5 @@ --- -title: Performance Results - Baseline, BOLT Merge, and Full Optimization +title: Review the performance results weight: 6 ### FIXED, DO NOT MODIFY @@ -18,8 +18,6 @@ This step presents the performance comparisons across various BOLT optimization | Latency 95th % (ms) | 1.04 | 0.83 | 1.79 | | Total time (s) | 9.93 | 4.73 | 15.40 | ---- - ### 2. Performance Comparison: Merged vs Non-Merged Instrumentation | Metric | Regular BOLT R+W (No Merge, system libssl) | Merged BOLT (BOLTed Read+Write + BOLTed libssl) | @@ -40,8 +38,6 @@ Second run: | Latency 95th % (ms) | 1.39 | 1.37 | | Total time (s) | 239.9 | 239.9 | ---- - ### 3. 
BOLTed READ, BOLTed WRITE, MERGED BOLT (Read+Write+BOLTed Libraries) | Metric | Bolted Read-Only | Bolted Write-Only | Merged BOLT (Read+Write+libssl) | Merged BOLT (Read+Write+libcrypto) | Merged BOLT (Read+Write+libssl+libcrypto) | @@ -52,17 +48,18 @@ Second run: | Latency 95th % (ms) | 0.77 | 0.55 | 1.37 | 1.34 | 1.34 | | Total time (s) | 239.8 | 239.72 | 239.9 | 239.9 | 239.9 | ---- +{{% notice Note %}} +All sysbench and .fdata file paths, as well as taskset usage, should match the conventions in previous steps: use sysbench from PATH (no src/), use /usr/share/sysbench/ for Lua scripts, and use $HOME-based paths for all .fdata and library files. On an 8-core system, use taskset -c 7 for sysbench and avoid contention with mysqld. +{{% /notice %}} -### Key Metrics to Analyze +### Key metrics to analyze - **TPS (Transactions Per Second)**: Higher is better. - **QPS (Queries Per Second)**: Higher is better. - **Latency (Average and 95th Percentile)**: Lower is better. ---- - ### Conclusion + - BOLT substantially improves performance over non-optimized binaries due to better instruction cache utilization and reduced execution path latency. - Merging feature-specific profiles does not negatively affect performance; instead, it captures a broader set of runtime behaviors, making the binary better tuned for varied real-world workloads. - Separately optimizing external user-space libraries, even though providing smaller incremental gains, further complements the overall application optimization, delivering a fully optimized execution environment. 
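When comparing these tables across runs, it helps to express differences as percentage change rather than raw deltas. The helper below is an illustrative sketch; the sample values are taken from the 95th-percentile latency row of the first table:

```bash
# pct_change BASELINE OPTIMIZED: percentage change relative to baseline.
# Negative is an improvement for latency; positive is an improvement for TPS/QPS.
pct_change() {
    awk -v b="$1" -v o="$2" 'BEGIN { printf "%+.1f", (o - b) / b * 100 }'
}

# 95th-percentile latency (ms): non-optimized vs. BOLT-optimized
echo "latency change: $(pct_change 1.04 0.83)%"
```

Applying the same helper to every metric keeps the baseline/merged comparisons consistent across both runs.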
diff --git a/content/learning-paths/servers-and-cloud-computing/bolt/_index.md b/content/learning-paths/servers-and-cloud-computing/bolt/_index.md index f22cef3cdb..34247174bd 100644 --- a/content/learning-paths/servers-and-cloud-computing/bolt/_index.md +++ b/content/learning-paths/servers-and-cloud-computing/bolt/_index.md @@ -5,13 +5,13 @@ minutes_to_complete: 30 who_is_this_for: This is an introductory topic for software developers who want to learn how to use BOLT on an Arm executable. -learning_objectives: - - Build an application which is ready to be optimized by BOLT +learning_objectives: + - Build an application which is ready to be optimized by BOLT - Profile an application and collect performance information - - Run BOLT to create an optimized executable + - Run BOLT to create an optimized executable prerequisites: - - An Arm based system running Linux with [BOLT](/install-guides/bolt/) and [Linux Perf](/install-guides/perf/) installed. The Linux kernel should be version 5.15 or later. Earlier kernel versions can be used, but some Linux Perf features may be limited or not available. + - An Arm based system running Linux with [BOLT](/install-guides/bolt/) and [Linux Perf](/install-guides/perf/) installed. The Linux kernel should be version 5.15 or later. Earlier kernel versions can be used, but some Linux Perf features may be limited or not available. For [SPE](./bolt-spe) the version should be 6.14 or later. - (Optional) A second, more powerful Linux system to build the software executable and run BOLT. 
author: Jonathan Davies diff --git a/content/learning-paths/servers-and-cloud-computing/bolt/bolt-spe.md b/content/learning-paths/servers-and-cloud-computing/bolt/bolt-spe.md index 8156e2d7aa..db6742c824 100644 --- a/content/learning-paths/servers-and-cloud-computing/bolt/bolt-spe.md +++ b/content/learning-paths/servers-and-cloud-computing/bolt/bolt-spe.md @@ -7,59 +7,60 @@ layout: learningpathall --- ## BOLT with SPE +The steps to use BOLT with Perf SPE are listed below. -{{% notice Important Note %}} -Currently, BOLT may not generate a faster binary when using Perf SPE due to limitations within `perf` and BOLT itself. -For more information and the latest updates see: [[AArch64] BOLT does not support SPE branch data](https://github.com/llvm/llvm-project/issues/115333). -{{% /notice %}} +### Collect Perf data with SPE -The steps to use BOLT with Perf SPE are listed below. +First, make sure you are using Linux Perf version v6.14 or later, which supports the 'brstack' field and captures all branch information. -### Collect Perf data with SPE +```bash { output_lines = "2" } +perf --version +perf version 6.14 +``` -Run your executable in the normal use case and collect a SPE performance profile. This will output a `perf.data` file containing the profile and will be used to optimize the executable. +Next, run your executable in the normal use case to collect an SPE performance profile. This generates a `perf.data` file containing the profile, which will be used to optimize the executable. -Record samples while running your application. 
Substitute the actual name of your application for `executable`: +Record samples while running your application, replacing `executable` below: ```bash { target="ubuntu:latest" } -perf record -e arm_spe/branch_filter=1/u -o perf.data-- ./executable +perf record -e 'arm_spe/branch_filter=1/u' -o perf.data -- ./executable ``` +Once the execution is complete, perf will print a summary that includes the size of the `perf.data` file: -Perf prints the size of the `perf.data` file: - ```output [ perf record: Woken up 79 times to write data ] [ perf record: Captured and wrote 4.910 MB perf.data ] ``` -### Convert the Profile into BOLT format +The `jitter=1` event parameter (for example, `arm_spe/branch_filter=1,jitter=1/u`) can help avoid sampling resonance, while the `-c`/`--count` option controls the sampling period. + +### Convert the Profile to BOLT format -`perf2bolt` converts the profile into a BOLT data format. For the given sample data, `perf2bolt` finds all instruction pointers in the profile, maps them back to the assembly instructions, and outputs a count of how many times each assembly instruction was sampled. +`perf2bolt` converts the profile into BOLT's data format. It maps branch events from the profile to assembly instructions and outputs branch execution traces with sample counts. -If you application is named `executable`, run the commend below to convert the profile data: +If your application is named `executable`, run the command below to convert the profile data: ```bash { target="ubuntu:latest" } -perf2bolt -p perf.data -o perf.fdata -nl ./executable +perf2bolt -p perf.data -o perf.fdata --spe ./executable ``` -Below is example output from `perf2bolt`, it has read all samples and created the file `perf.fdata`. +Below is example output from `perf2bolt`; it has read all samples from `perf.data` and created the converted profile `perf.fdata`. 
```output BOLT-INFO: shared object or position-independent executable detected PERF2BOLT: Starting data aggregation job for perf.data -PERF2BOLT: spawning perf job to read events without LBR +PERF2BOLT: spawning perf job to read SPE brstack events PERF2BOLT: spawning perf job to read mem events PERF2BOLT: spawning perf job to read process events PERF2BOLT: spawning perf job to read task events BOLT-INFO: Target architecture: aarch64 -BOLT-INFO: BOLT version: c66c15a76dc7b021c29479a54aa1785928e9d1bf +BOLT-INFO: BOLT version: b1516a9d688fed835dce5efc614302649c3baf0e BOLT-INFO: first alloc address is 0x0 -BOLT-INFO: creating new program header table at address 0x200000, offset 0x200000 +BOLT-INFO: creating new program header table at address 0x4600000, offset 0x4600000 BOLT-INFO: enabling relocation mode -BOLT-INFO: disabling -align-macro-fusion on non-x86 platform BOLT-INFO: enabling strict relocation mode for aggregation purposes BOLT-INFO: pre-processing profile using perf data aggregator -BOLT-INFO: binary build-id is: 21dbca691155f1e57825e6381d727842f3d43039 +BOLT-INFO: binary build-id is: 8bb7beda9bae10bc546eace62775dd2958a9c940 PERF2BOLT: spawning perf job to read buildid list PERF2BOLT: matched build-id and file name PERF2BOLT: waiting for perf mmap events collection to finish... @@ -68,13 +69,18 @@ PERF2BOLT: waiting for perf task events collection to finish... PERF2BOLT: parsing perf-script task events output PERF2BOLT: input binary is associated with 1 PID(s) PERF2BOLT: waiting for perf events collection to finish... -PERF2BOLT: parsing basic events (without LBR)... +PERF2BOLT: SPE branch events in LBR-format... +PERF2BOLT: read 3592267 samples and 3046129 LBR entries +PERF2BOLT: ignored samples: 0 (0.0%) PERF2BOLT: waiting for perf mem events collection to finish... PERF2BOLT: parsing memory events... -PERF2BOLT: processing basic events (without LBR)... 
-PERF2BOLT: read 79 samples -PERF2BOLT: out of range samples recorded in unknown regions: 5 (6.3%) -PERF2BOLT: wrote 14 objects and 0 memory objects to perf.fdata +PERF2BOLT: processing branch events... +PERF2BOLT: traces mismatching disassembled function contents: 0 +PERF2BOLT: out of range traces involving unknown regions: 0 +PERF2BOLT: wrote 21027 objects and 0 memory objects to perf.fdata +BOLT-INFO: 2178 out of 72028 functions in the binary (3.0%) have non-empty execution profile +BOLT-INFO: 12 functions with profile could not be optimized +BOLT-INFO: Functions with density >= 0.0 account for 99.00% total sample counts ``` ### Run BOLT to generate the optimized executable @@ -155,4 +161,4 @@ BOLT-INFO: setting __hot_end to 0x4002b0 BOLT-INFO: patched build-id (flipped last bit) ``` -The optimized executable is now available as `new_executable`. +The optimized executable is now available as `new_executable`. diff --git a/content/learning-paths/servers-and-cloud-computing/cobalt/1-create-cobalt-vm.md b/content/learning-paths/servers-and-cloud-computing/cobalt/1-create-cobalt-vm.md new file mode 100644 index 0000000000..1a10514dbb --- /dev/null +++ b/content/learning-paths/servers-and-cloud-computing/cobalt/1-create-cobalt-vm.md @@ -0,0 +1,36 @@ +--- +title: Create the Cobalt 100 virtual machine +weight: 2 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- + +## Use the Azure Portal to deploy a Cobalt 100 VM + +Cobalt 100 is Microsoft’s first Arm-based server processor, built on the Armv9 Neoverse-N2 CPU architecture. It is optimized for the performance and efficiency of scale-out, cloud-based applications. + +Azure offers Cobalt 100–powered virtual machines in two series: + +- **Dpsv6** and **Dplsv6** (general-purpose) +- **Epsv6** (memory-optimized) + + +To create a Cobalt 100 VM, follow these steps: + +1. Sign in to the [Azure Portal](https://portal.azure.com/). +2. Select **Create a resource → Compute → Virtual machine**. +3. 
Complete the fields in the **Basics** tab using the values shown in the figure below: + + ![Azure Portal – Basics tab for the VM wizard#center](images/create-cobalt-vm.png "Configuring the Basics tab") + + Cobalt 100 powers the Dpsv6 and Dplsv6 series. Selecting **Standard_D4ps_v6** creates a Cobalt VM with four physical cores. + You can choose a different size if you need more or fewer cores. +4. Upload your public SSH key or generate a new one in the wizard. +5. For the **Public inbound ports** field, select **None**. +6. On the **Disks** tab, accept the default options. +7. On the **Networking** tab, ensure that a **Public IP** is selected. You will need it to connect to the VM later. Leave the NSG settings as **Basic**. + +8. Select **Review + create**, then **Create**. Azure deploys the VM and the automatically-generated Network Security Group (NSG). Provisioning takes ~2 minutes. + +9. Navigate to the **Deployment in progress** pane or open the **Notifications** panel to track progress. When the deployment completes, proceed to the next step to expose an inbound port. diff --git a/content/learning-paths/servers-and-cloud-computing/cobalt/2-open-port.md b/content/learning-paths/servers-and-cloud-computing/cobalt/2-open-port.md new file mode 100644 index 0000000000..7f9b44d24a --- /dev/null +++ b/content/learning-paths/servers-and-cloud-computing/cobalt/2-open-port.md @@ -0,0 +1,27 @@ +--- +title: Open inbound ports in the Network Security Group +weight: 3 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- + +## Allow external traffic to TCP ports 22 (SSH) and 8080 + +Every new virtual machine created through the Azure wizard is associated with a **Network Security Group (NSG)**. An NSG acts as a stateful firewall – if no rule explicitly allows traffic, Azure blocks it by default. + +In this step, you'll open port 22 for SSH and port 8080 so that a web application running on the VM is reachable from your IP for testing. 
Substitute a different port if required by your workload, or a different IP range if you'd like broader accessibility. + +1. In the Azure Portal, open the newly-created VM resource and select **Networking → Network settings** from the left-hand menu. +2. Select the **Network security group**. +3. Select **Create Port Rule**, then choose **Inbound port rule** from the drop-down menu. + +4. Fill in the form with **My IP address** as the source and 22 as the destination port: + + ![Add inbound security rule with source of my IP and destination port 22#center](images/create-nsg-rule.png "Create Port Rule form") + +5. Select **Add**. + +6. To open port 8080, repeat steps 3-5 and enter 8080 as the destination port. + +You have now opened ports 22 and 8080 to your IP. In the next step, you'll verify connectivity from your local machine. diff --git a/content/learning-paths/servers-and-cloud-computing/cobalt/3-verify-connectivity.md b/content/learning-paths/servers-and-cloud-computing/cobalt/3-verify-connectivity.md new file mode 100644 index 0000000000..35de9b698e --- /dev/null +++ b/content/learning-paths/servers-and-cloud-computing/cobalt/3-verify-connectivity.md @@ -0,0 +1,48 @@ +--- +title: Verify connectivity to the Cobalt 100 VM +weight: 4 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- + +## Connect over SSH and test the open port + +On the **Overview** page of the VM, copy the **Public IP address**. Open a terminal on your local machine, and SSH to the VM (replace *azureuser* if you chose a different admin username): + +```bash +ssh -i [path to your pem file] azureuser@[public IP] +``` + +Replace `[public IP]` with your VM's public IP address, and `[path to your pem file]` with the path to your SSH private key file. + +When prompted, confirm the connection to add the VM to your *known_hosts* file. 
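If the SSH connection hangs, a quick TCP reachability check helps distinguish an NSG problem from a credentials problem. The sketch below uses bash's built-in `/dev/tcp` redirection, so it needs no extra tools; the IP address shown is a placeholder:

```bash
# check_port HOST PORT: report whether a TCP port is reachable.
check_port() {
    if timeout 3 bash -c "echo > /dev/tcp/$1/$2" 2>/dev/null; then
        echo "open"
    else
        echo "closed"
    fi
}

# Example (replace with your VM's public IP):
# check_port 203.0.113.10 22
```

If the port reports `closed`, revisit the NSG inbound rule before debugging SSH keys or usernames.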
+ +### Start a temporary HTTP server + +If you don't already have an application listening on TCP 8080, you can start one temporarily: + +```bash +sudo apt update -y && sudo apt install -y python3 +python3 -m http.server 8080 +``` + +Leave this terminal open – the server runs in the foreground. + +### Test from your local machine + +In a second local terminal run `curl` to confirm that you can reach the server through the NSG rule you created: + +```bash +curl http://[public IP]:8080 +``` + +Replace `[public IP]` with your VM's public IP address. + +You should see an HTML directory listing (or your application response). A successful response confirms that TCP port 8080 is open and the VM is reachable from the public internet. + +To stop the server, press `Ctrl + C`. + +You now have an Arm-based Cobalt 100 VM with port 8080 open and ready to receive external traffic. You can use it to run any test server or deploy your own application. + +To learn about optimizing .NET workloads on Cobalt, check out [Migrating a .NET application to Azure Cobalt](../../dotnet-migration/). \ No newline at end of file diff --git a/content/learning-paths/servers-and-cloud-computing/cobalt/_index.md b/content/learning-paths/servers-and-cloud-computing/cobalt/_index.md new file mode 100644 index 0000000000..4cb83a8d31 --- /dev/null +++ b/content/learning-paths/servers-and-cloud-computing/cobalt/_index.md @@ -0,0 +1,54 @@ +--- +title: Deploy a Cobalt 100 Virtual Machine on Azure + +minutes_to_complete: 10 + +who_is_this_for: This is an introductory topic for developers and DevOps engineers who want to deploy an Arm-based virtual machine on Azure and expose an application port to the internet. 
+ +learning_objectives: + - Deploy an Arm-based Cobalt 100 virtual machine (VM) on Microsoft Azure + - Connect to the Cobalt 100 VM using SSH + - Configure an inbound TCP port in the associated Network Security Group (NSG) + - Verify external connectivity to the newly-opened port + +prerequisites: + - A Microsoft Azure subscription with permissions to create virtual machines and networking resources + - Basic familiarity with SSH + +author: Joe Stech + +### Tags +# Tagging metadata, see the Learning Path guide for the allowed values +skilllevels: Introductory +subjects: Containers and Virtualization +arm_ips: + - Neoverse +tools_software_languages: + - Azure Portal + - Azure CLI +operatingsystems: + - Linux + + +further_reading: + - resource: + title: Azure Cobalt 100 VM documentation + link: https://learn.microsoft.com/azure/virtual-machines/cobalt-100 + type: Documentation + - resource: + title: Azure Virtual Machines overview + link: https://learn.microsoft.com/azure/virtual-machines/ + type: Documentation + - resource: + title: Configure Azure network security group rules + link: https://learn.microsoft.com/azure/virtual-network/security-overview + type: Documentation + + + +### FIXED, DO NOT MODIFY +# ================================================================================ +weight: 1 # _index.md always has weight of 1 to order correctly +layout: "learningpathall" # All files under learning paths have this same wrapper +learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content. 
+--- diff --git a/content/learning-paths/servers-and-cloud-computing/cobalt/_next-steps.md b/content/learning-paths/servers-and-cloud-computing/cobalt/_next-steps.md new file mode 100644 index 0000000000..c3db0de5a2 --- /dev/null +++ b/content/learning-paths/servers-and-cloud-computing/cobalt/_next-steps.md @@ -0,0 +1,8 @@ +--- +# ================================================================================ +# FIXED, DO NOT MODIFY THIS FILE +# ================================================================================ +weight: 21 # Set to always be larger than the content in this path to be at the end of the navigation. +title: "Next Steps" # Always the same, html page title. +layout: "learningpathall" # All files under learning paths have this same wrapper for Hugo processing. +--- diff --git a/content/learning-paths/servers-and-cloud-computing/cobalt/images/create-cobalt-vm.png b/content/learning-paths/servers-and-cloud-computing/cobalt/images/create-cobalt-vm.png new file mode 100644 index 0000000000..ccbbb5d78d Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/cobalt/images/create-cobalt-vm.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/cobalt/images/create-nsg-rule.png b/content/learning-paths/servers-and-cloud-computing/cobalt/images/create-nsg-rule.png new file mode 100644 index 0000000000..7e97b89bad Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/cobalt/images/create-nsg-rule.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/dotnet-migration/1-create-orchardcore-app.md b/content/learning-paths/servers-and-cloud-computing/dotnet-migration/1-create-orchardcore-app.md new file mode 100644 index 0000000000..a25f47775f --- /dev/null +++ b/content/learning-paths/servers-and-cloud-computing/dotnet-migration/1-create-orchardcore-app.md @@ -0,0 +1,126 @@ +--- +title: Build and run an OrchardCore CMS app on Azure Cobalt (Arm64) +weight: 2 + +### 
FIXED, DO NOT MODIFY +layout: learningpathall +--- + +## Getting started with the OrchardCore app + +In this section, you'll build and run a basic [OrchardCore](https://github.com/OrchardCMS/OrchardCore) CMS application, which is a popular Linux-based .NET workload. OrchardCore is a modular and multi-tenant application framework built with ASP.NET Core, that's commonly used to create content-driven websites. + +### Set up your development environment + +First, launch an Azure Cobalt 100 instance (Arm-based VM) running Ubuntu 24.04, and open port 8080 to the internet. + +For setup instructions, see the [Create an Azure Cobalt 100 VM](../../cobalt) Learning Path. + +Next, install .NET SDK: + +```bash +wget https://packages.microsoft.com/config/ubuntu/24.04/packages-microsoft-prod.deb +sudo dpkg -i packages-microsoft-prod.deb +sudo apt-get update +sudo apt-get install -y dotnet-sdk-8.0 +``` + +Verify the installation: + +```bash +dotnet --version +``` + +You should then see output similar to: + +```output +8.0.117 +``` + +Install gcc for compiling your application: + +```bash +sudo apt install gcc g++ build-essential -y +``` + +### Install the OrchardCore templates + +Install the OrchardCore templates: + +```bash +dotnet new install OrchardCore.ProjectTemplates::2.1.7 +``` + +This command installs the OrchardCore project templates you'll use to create a new OrchardCore application. 
+ +Expected output: + +```output +Success: OrchardCore.ProjectTemplates::2.1.7 installed the following templates: +Template Name Short Name Language Tags +------------------------ ----------- -------- -------------------- +Orchard Core Cms Module ocmodulecms [C#] Web/Orchard Core/CMS +Orchard Core Cms Web App occms [C#] Web/Orchard Core/CMS +Orchard Core Mvc Module ocmodulemvc [C#] Web/Orchard Core/Mvc +Orchard Core Mvc Web App ocmvc [C#] Web/Orchard Core/Mvc +Orchard Core Theme octheme [C#] Web/Orchard Core/CMS +``` + +### Create a new OrchardCore application + +First, create a new project using the `dotnet` CLI to create a new OrchardCore application: + +```bash +dotnet new occms -n MyOrchardCoreApp +``` + +This command creates a new OrchardCore CMS application in a directory named `MyOrchardCoreApp`. + +Now navigate to the project directory: + +```bash +cd MyOrchardCoreApp +``` + +### Run the OrchardCore application + +Build the application: + +```bash +dotnet build +``` + +The output should look like: + +```output +MSBuild version 17.8.27+3ab07f0cf for .NET + Determining projects to restore... + Restored /home/azureuser/MyOrchardCoreApp/MyOrchardCoreApp.csproj (in 28.95 sec). + MyOrchardCoreApp -> /home/azureuser/MyOrchardCoreApp/bin/Debug/net8.0/MyOrchardCoreApp.dll + Copying translation files: MyOrchardCoreApp + +Build succeeded. + 0 Warning(s) + 0 Error(s) + +Time Elapsed 00:00:38.05 +``` +Run the application: + +```bash +dotnet run --urls http://0.0.0.0:8080 +``` + +Access the application: + +* In your browser, navigate to `http://[instance IP]:8080` to see your OrchardCore application in action. + +* Replace `[instance IP]` with your VM’s public IP address. You can find it in the Azure portal under the **Networking** tab of your virtual machine. + +Configure the application: + +* On the setup screen, choose the Blog recipe and complete the admin credentials and database configuration to finish setup. 
+
+### Summary and next steps
+
+You have successfully created and run a basic OrchardCore CMS application. In the next sections, you will learn how to integrate a C shared library into your .NET application and explore performance optimizations for the Arm architecture.
diff --git a/content/learning-paths/servers-and-cloud-computing/dotnet-migration/2-add-shared-c-library.md b/content/learning-paths/servers-and-cloud-computing/dotnet-migration/2-add-shared-c-library.md
new file mode 100644
index 0000000000..4aca66a4a3
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/dotnet-migration/2-add-shared-c-library.md
@@ -0,0 +1,121 @@
+---
+title: Integrate a C shared library into your .NET OrchardCore app
+weight: 3
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+## Create a C shared library
+
+In this section, you’ll integrate a simple C shared library into your .NET OrchardCore application by doing the following:
+
+- Write a C function
+- Compile it into a shared object (`.so`)
+- Call it from C# using `DllImport`
+
+This allows you to reuse existing C code and improve performance by accessing native functionality.
+
+Create a file named `mylib.c` with the following content:
+
+```c
+#include <stdio.h>
+
+void greet() {
+    printf("Hello from the C library!\n");
+}
+```
+
+Compile the C file into a shared library:
+
+```bash
+gcc -shared -o libmylib.so -fPIC mylib.c
+```
+
+This creates a shared object file named `libmylib.so`, which your .NET application can call at runtime.
+
+## Use the C library in your .NET application
+
+Now that you have a shared library, you can use it in your .NET application.
+
+In your OrchardCore application, create a new class file named `NativeMethods.cs`:
+
+```csharp
+using System;
+using System.Runtime.InteropServices;
+
+public static class NativeMethods
+{
+    [DllImport("mylib", EntryPoint = "greet")]
+    public static extern void Greet();
+}
+```
+{{% notice Note %}}
+On Linux, the `DllImport("mylib")` attribute resolves to `libmylib.so`. On Windows, the runtime would look for `mylib.dll`, and on macOS, `libmylib.dylib`.
+{{% /notice %}}
+
+Call the `Greet` method from your application. For example, you can add the following code to your main program `Program.cs` as shown:
+
+```csharp
+using OrchardCore.Logging;
+
+var builder = WebApplication.CreateBuilder(args);
+
+builder.Host.UseNLogHost();
+
+builder.Services
+    .AddOrchardCms()
+    // // Orchard Specific Pipeline
+    // .ConfigureServices( services => {
+    // })
+    // .Configure( (app, routes, services) => {
+    // })
+;
+
+var app = builder.Build();
+
+Console.WriteLine("Calling native greet..."); // NEW INTEROP LINE
+NativeMethods.Greet(); // NEW INTEROP LINE
+
+if (!app.Environment.IsDevelopment())
+{
+    app.UseExceptionHandler("/Error");
+    // The default HSTS value is 30 days. You may want to change this for production scenarios, see https://aka.ms/aspnetcore-hsts.
+    app.UseHsts();
+}
+
+app.UseHttpsRedirection();
+app.UseStaticFiles();
+
+app.UseOrchardCore();
+
+app.Run();
+```
+
+Ensure that the .NET runtime can find your shared library:
+
+```bash
+export LD_LIBRARY_PATH=$(pwd):$LD_LIBRARY_PATH
+```
+
+## Run your application
+
+When you run `dotnet run`, you’ll see the following output:
+
+```output
+Calling native greet...
+Hello from the C library!
+```
+
+## Compiling for Arm
+
+If you are compiling for Arm directly on Azure Cobalt, the compiler selects appropriate optimizations for the host processor by default, and you can compile in the same way as above.
+
+However, if you are cross-compiling in your build pipeline, you should specify `-mcpu=neoverse-n2 -O3` when running the cross-compiler:
+
+```bash
+aarch64-linux-gnu-gcc -mcpu=neoverse-n2 -O3 -shared -o libmylib.so -fPIC mylib.c
+```
+
+The `-mcpu=neoverse-n2` flag targets the Neoverse N2 core used by Azure Cobalt 100, and `-O3` enables the compiler's highest optimization level, including auto-vectorization that generates SIMD instructions.
+
+In the next section, you’ll make your native interop cross-platform by using the AnyCPU feature and runtime dispatch strategies.
diff --git a/content/learning-paths/servers-and-cloud-computing/dotnet-migration/3-any-cpu.md b/content/learning-paths/servers-and-cloud-computing/dotnet-migration/3-any-cpu.md
new file mode 100644
index 0000000000..f192c56fed
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/dotnet-migration/3-any-cpu.md
@@ -0,0 +1,65 @@
+---
+title: Configure architecture-agnostic builds with AnyCPU
+weight: 4
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+## Run a .NET OrchardCore application on Arm and x86
+
+In this section, you will configure and run your OrchardCore application on both Arm and x86 architectures using .NET's AnyCPU configuration. This architecture-agnostic approach simplifies deployment and ensures that your application runs smoothly on diverse hardware, including cloud VMs, local development boxes, and edge devices.
+
+The AnyCPU feature has existed since .NET Framework 2.0, but its current behavior (particularly how it handles 32-bit vs. 64-bit execution) was defined in .NET Framework 4.5.
+
+{{% notice Note %}}
+In .NET Core and .NET 5+, AnyCPU still lets the runtime decide which architecture to use, but keep in mind that your build must match the runtime environment's bitness (32-bit vs. 64-bit). Since .NET 8 targets 64-bit by default, this Learning Path assumes 64-bit runtime environments on both Arm64 and x86_64.
+{{% /notice %}}
+
+## Configure the project for AnyCPU
+
+To make your OrchardCore application architecture-agnostic, configure the project to use the AnyCPU platform target. This allows the .NET runtime to select the appropriate architecture at runtime.
+
+1. Open your OrchardCore project `MyOrchardCoreApp.csproj` in your IDE.
+
+2. Add the `<PlatformTarget>` element to your existing `<PropertyGroup>`:
+
+```xml
+<PropertyGroup>
+  <PlatformTarget>AnyCPU</PlatformTarget>
+</PropertyGroup>
+```
+
+3. Save the `.csproj` file.
+
+## Build and run on any platform
+
+You can now build your application once and run it on either an x86_64 or Arm64 system.
+
+Build the application:
+
+```bash
+dotnet build -c Release
+```
+
+Run the application:
+
+```bash
+dotnet run --urls http://0.0.0.0:8080
+```
+
+Your application should now be runnable on any architecture. All you have to do is copy the `MyOrchardCoreApp` directory to any computer with the .NET 8 runtime installed and run the command shown from within the `MyOrchardCoreApp` directory:
+
+```bash
+dotnet ./bin/Release/net8.0/MyOrchardCoreApp.dll --urls http://0.0.0.0:8080
+```
+
+## Benefits of architecture-agnostic applications
+
+Using the AnyCPU configuration offers several advantages:
+
+- **Flexibility**: Deploy your application on a wide range of devices without modifying your code.
+- **Efficiency**: Eliminate the need to maintain separate builds for different architectures.
+- **Scalability**: Easily scale your application across different hardware platforms.
+
+This approach ensures that your OrchardCore application runs consistently on both Arm and x86 architectures.
diff --git a/content/learning-paths/servers-and-cloud-computing/dotnet-migration/4-dotnet-versions.md b/content/learning-paths/servers-and-cloud-computing/dotnet-migration/4-dotnet-versions.md new file mode 100644 index 0000000000..b188296812 --- /dev/null +++ b/content/learning-paths/servers-and-cloud-computing/dotnet-migration/4-dotnet-versions.md @@ -0,0 +1,90 @@ +--- +title: Evaluate .NET performance across versions on Arm +weight: 5 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- + +## Version-by-version feature and throughput improvements + +Understanding which versions perform best and the features they offer can help you make informed decisions when developing applications for Arm-based systems. + +.NET has evolved significantly over the years, with each version introducing new features and performance improvements. Here, you will learn about key versions that have notable performance implications for Arm architecture. + +{{% notice Support status summary %}} + +- .NET 8 – Current LTS (support until Nov 2026) +- .NET 9 – STS (preview; GA Q4 2025) +- .NET 10 – Next LTS (preview; expected 2025 Q4–Q1 2026) +- .NET 3.1, 5, 6, 7 – End of life +{{% /notice %}} + + +## .NET Core 3.1 (end-of-life 2022) + +.NET Core 3.1 was the first LTS with meaningful Arm64 support. + +Highlights were: + +- Initial JIT (Just-In-Time) optimizations for Arm64 (but the bulk of Arm throughput work arrived in .NET 5). +- Faster garbage collection thanks to refinements to the background GC mode. +- Initial set of Arm64 hardware intrinsics (AdvSIMD, AES, CRC32) exposed in `System.Runtime.Intrinsics`. + +## .NET 5 (end-of-life 2022) + +With .NET 5 Microsoft started the “one .NET” unification. Even though it had only 18 months of support, it delivered notable Arm gains: + +- Cross-gen2 shipped, delivering better Arm64 code quality (but only became the default in .NET 6). +- Single-file application publishing (with optional IL-trimming) simplified deployment to Arm edge devices. 
+- Major ASP.NET Core throughput wins on Arm64 (Kestrel & gRPC) compared with .NET Core 3.1. + +## .NET 6 (end-of-life 2024) + +.NET 6 laid the foundation for the modern performance story on Arm64: + +- Tiered PGO entered preview, combining tiered compilation with profile-guided optimization. +- Better scalability on many-core Arm servers thanks to the new ThreadPool implementation. +- First-class support for Apple M1, enabling full .NET development on Arm-based macOS, as well as for Windows Arm64. + + +## .NET 7 (end-of-life 2024) + +.NET 7 was an STS (Standard-Term Support) release which is now out of support, but it pushed the performance envelope and is therefore interesting from a historical perspective. + +Key highlights were: + +- General-availability of Native AOT publishing for console applications, producing self-contained, very small binaries with fast start-up on Arm64. +- Dynamic PGO (Profile-Guided Optimization) and On-Stack Replacement became the default, letting the JIT optimise the hottest code paths based on real run-time data. +- New Arm64 hardware intrinsics (e.g. SHA-1/SHA-256, AES, CRC-32) exposed through System.Runtime.Intrinsics, enabling high-performance crypto workloads. + +## .NET 8 (current LTS – support until November 2026) + +.NET 8 is the current Long-Term Support release and should be your baseline for new production workloads. + +Important Arm-related improvements include: + +- Native AOT support for ASP.NET Core, trimming enhancements and even smaller self-contained binaries, translating into faster cold-start for containerized Arm services. +- Further JIT tuning for Arm64 delivering single-digit to low double-digit throughput gains in real-world benchmarks. +- Smaller base container images (`mcr.microsoft.com/dotnet/aspnet:8.0` and `…/runtime:8.0`) thanks to a redesigned layering strategy, particularly beneficial on Arm where network bandwidth is often at a premium. 
+- Garbage-collector refinements that reduce pause times on highly-threaded, many-core servers. + +## .NET 9 + +.NET 9 is still in preview, so features may change, but public builds already show promising Arm-centric updates: + +- PGO is now enabled for release builds by default and its heuristics have been retuned for Arm workloads, yielding notable throughput improvements with zero developer effort. +- The JIT has started to exploit Arm v9 instructions such as SVE2 where hardware is available, opening the door to even wider SIMD operations. +- C# 13 and F# 8 previews ship with the SDK, bringing useful productivity improvements fully supported on Arm devices. + +Although .NET 9 will receive only 18 months of support, it is an excellent choice when you need the very latest performance improvements or want to trial new language/runtime capabilities ahead of the next LTS. + +## .NET 10 (preview – next LTS) + +.NET 10 is still in preview and will likely evolve prior to its GA release, but it will be the next LTS version of .NET, with the following benefits: + +- [Extended SVE2 intrinsics](https://github.com/dotnet/runtime/issues/109652) to unlock efficient implementation of large-scale numerical algorithms and on-device AI inference on Arm v9. +- C# 14 is expected to ship alongside .NET 10, bringing additional compile-time metaprogramming features that can reduce boilerplate on resource-constrained Arm edge devices. + +Developers targeting Arm-based systems should track preview builds and roadmap updates closely to validate feature availability and compatibility with their target platforms. 
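The comparisons above are easiest to reproduce if you first confirm which SDKs and runtimes are actually present on your VM. Both commands below are standard `dotnet` CLI options:

```bash
# Show every SDK and runtime installed on this machine
dotnet --list-sdks
dotnet --list-runtimes
```

Running the same application under two installed runtimes (for example, 8.0 and a 9.0 preview) is often the quickest way to measure version-to-version gains on your own Arm hardware.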
+ diff --git a/content/learning-paths/servers-and-cloud-computing/dotnet-migration/_index.md b/content/learning-paths/servers-and-cloud-computing/dotnet-migration/_index.md new file mode 100644 index 0000000000..24f910e5ec --- /dev/null +++ b/content/learning-paths/servers-and-cloud-computing/dotnet-migration/_index.md @@ -0,0 +1,58 @@ +--- +title: Migrate a .NET application to Azure Cobalt 100 + + +minutes_to_complete: 25 + +who_is_this_for: This is an advanced topic for .NET developers who want to take advantage of the performance and cost benefits of Azure Cobalt processors. + +learning_objectives: + - Build and run a basic OrchardCore CMS application + - Integrate a simple C shared library into a .NET application + - Configure architecture-agnostic builds using AnyCPU + - Evaluate the performance of different .NET versions + +prerequisites: + - A Microsoft Azure account with permissions to deploy virtual machines + - .NET SDK 8.0 or later + - Basic knowledge of C and C# + - GCC installed (Linux) or access to a cross-compiler + - OrchardCore application created using the .NET CLI or Visual Studio + +author: Joe Stech + +### Tags +skilllevels: Advanced +subjects: Performance and Architecture +armips: + - Neoverse +tools_software_languages: + - .NET + - OrchardCore + - C +operatingsystems: + - Linux + + +further_reading: + - resource: + title: OrchardCore documentation + link: https://docs.orchardcore.net/ + type: documentation + - resource: + title: OrchardCore GitHub Repository + link: https://github.com/OrchardCMS/OrchardCore + type: documentation + - resource: + title: .NET documentation + link: https://learn.microsoft.com/en-us/dotnet/ + type: documentation + + + +### FIXED, DO NOT MODIFY +# ================================================================================ +weight: 1 # _index.md always has weight of 1 to order correctly +layout: "learningpathall" # All files under learning paths have this same wrapper +learning_path_main_page: "yes" # This 
should be surfaced when looking for related content. Only set for _index.md of learning path content. +--- diff --git a/content/learning-paths/servers-and-cloud-computing/dotnet-migration/_next-steps.md b/content/learning-paths/servers-and-cloud-computing/dotnet-migration/_next-steps.md new file mode 100644 index 0000000000..c3db0de5a2 --- /dev/null +++ b/content/learning-paths/servers-and-cloud-computing/dotnet-migration/_next-steps.md @@ -0,0 +1,8 @@ +--- +# ================================================================================ +# FIXED, DO NOT MODIFY THIS FILE +# ================================================================================ +weight: 21 # Set to always be larger than the content in this path to be at the end of the navigation. +title: "Next Steps" # Always the same, html page title. +layout: "learningpathall" # All files under learning paths have this same wrapper for Hugo processing. +--- diff --git a/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/_index.md b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/_index.md new file mode 100644 index 0000000000..820701deea --- /dev/null +++ b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/_index.md @@ -0,0 +1,41 @@ +--- +title: Go Benchmarks with Sweet and Benchstat + +draft: true +cascade: + draft: true + +minutes_to_complete: 60 + +who_is_this_for: This is an introductory topic for developers who are interested in measuring the performance of Go-based applications on Arm-based servers. + +learning_objectives: + - Learn how to start up Arm64 and x64 instances of GCP VMs + - Install Go, benchmarks, benchstat, and sweet on the two VMs + - Use sweet and benchstat to compare the performance of Go applications on the two VMs + +prerequisites: + - A [Google Cloud account](https://console.cloud.google.com/). 
This learning path can be run on-prem or on any cloud provider instance, but it specifically documents the process for running on Google Axion.
+  - A local machine with [Google Cloud CLI](/install-guides/gcloud/) installed.
+
+author: Geremy Cohen
+
+### Tags
+skilllevels: Introductory
+subjects: Performance and Architecture
+armips:
+  - Neoverse
+cloud_service_providers: Google Cloud
+tools_software_languages:
+  - Go
+operatingsystems:
+  - Linux
+
+
+
+### FIXED, DO NOT MODIFY
+# ================================================================================
+weight: 1 # _index.md always has weight of 1 to order correctly
+layout: "learningpathall" # All files under learning paths have this same wrapper
+learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
+---
diff --git a/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/_next-steps.md b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/_next-steps.md
new file mode 100644
index 0000000000..3b237f0d1c
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/_next-steps.md
@@ -0,0 +1,8 @@
+---
+# ================================================================================
+# FIXED, DO NOT MODIFY THIS FILE
+# ================================================================================
+weight: 999 # Set to always be larger than the content in this path to be at the end of the navigation.
+title: "Next Steps" # Always the same, html page title.
+layout: "learningpathall" # All files under learning paths have this same wrapper for Hugo processing.
+---
diff --git a/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/add_c4_vm.md b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/add_c4_vm.md
new file mode 100644
index 0000000000..2885faa9ca
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/add_c4_vm.md
@@ -0,0 +1,30 @@
+---
+title: Launching an Intel Emerald Rapids Instance
+weight: 30
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+## Section Overview
+In this section, you will set up the second benchmarking system, an Intel Emerald Rapids `c4-standard-8` instance.
+
+## Creating the Instance
+
+To create the second system, follow the previous lesson's c4a install instructions, but make the following changes:
+
+1. **Name your instance:** For the `Name` field, enter "c4".
+
+2. **Select machine series:** Scroll down to the Machine series section, and select the C4 radio button.
+
+![](images/launch_c4/3.png)
+
+3. **View machine types:** Scroll down to the Machine type dropdown, and click it to show all available options.
+
+![](images/launch_c4/4.png)
+
+4. **Choose machine size:** Select "c4-standard-8" under the Standard tab.
+
+![](images/launch_c4/5.png)
+
+After the c4 instance starts up, you are ready to continue to the next section, where you'll install the benchmarking software.
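If you prefer the command line to the console UI, an equivalent instance can be created with the gcloud CLI. The sketch below is illustrative only: the zone, image family, and boot-disk size are assumptions you should adjust for your own project, and C4 machine types are not available in every zone:

```bash
# Hypothetical gcloud equivalent of the console steps above;
# adjust --zone and the image settings to match your project.
gcloud compute instances create c4 \
  --machine-type=c4-standard-8 \
  --zone=us-central1-a \
  --image-family=ubuntu-2404-lts-amd64 \
  --image-project=ubuntu-os-cloud \
  --boot-disk-size=1000GB
```

Either route produces the same VM; the console steps above remain the reference for this learning path.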
diff --git a/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/add_c4a_vm.md b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/add_c4a_vm.md new file mode 100644 index 0000000000..106352dc7c --- /dev/null +++ b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/add_c4a_vm.md @@ -0,0 +1,67 @@ +--- +title: Launching a Google Axion Instance +weight: 20 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- + +## Overview +In this section, you'll learn how to spin up the first of two different VMs used for benchmarking Go tests, an Arm-based Google Axion c4a-standard-4 (c4a for short). + +## Creating the c4a-standard-4 Instance + +1. **Access Google Cloud Console:** Navigate to [https://console.cloud.google.com/welcome](https://console.cloud.google.com/welcome) + +2. **Search for VM instances:** Click into the Search field. + +3. **Find VM Instances:** Start typing `vm` until the UI auto-completes `VM Instances`, then click it. + +![](images/launch_c4a/3.png) + +The VM Instances page appears. + +4. **Create a new instance:** Click `Create instance` + +![](images/launch_c4a/4.png) + +The Machine configuration page appears. + +5. **Name your instance:** Click the `Name` field, and enter "c4a" for the `Name`. + +![](images/launch_c4a/5.png) + +6. **Select machine series:** Scroll down to the Machine series section, and select the C4A radio button. + +![](images/launch_c4a/7.png) + +7. **View machine types:** Scroll down to the Machine type dropdown, and click it to show all available options. + +![](images/launch_c4a/8.png) + +8. **Choose machine size:** Select "c4a-standard-4" under the Standard tab. + +![](images/launch_c4a/9.png) + +9. **Configure storage:** Click the "OS and Storage" tab. + +![](images/launch_c4a/10.png) + +10. **Modify storage settings:** Click "Change" + +![](images/launch_c4a/11.png) + +11. 
**Set disk size:** Double-click the "Size (GB)" field, then enter "1000" for the value. + +![](images/launch_c4a/16.png) + +12. **Confirm storage settings:** Click "Select" to continue. + +![](images/launch_c4a/18.png) + +13. **Launch the instance:** Click "Create" to bring up the instance. + +![](images/launch_c4a/19.png) + +After a few seconds, your c4a instance starts up, and you are ready to continue to the next section. In the next step, you will launch the second VM, an Intel-based Emerald Rapids c4-standard-8 (c4 for short), which will serve as the comparison system for our benchmarking tests. + diff --git a/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4/11.png b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4/11.png new file mode 100644 index 0000000000..daf435ca54 Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4/11.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4/12.png b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4/12.png new file mode 100644 index 0000000000..2272290d64 Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4/12.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4/13.png b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4/13.png new file mode 100644 index 0000000000..9011f185f3 Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4/13.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4/15.png 
b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4/15.png new file mode 100644 index 0000000000..c4acc35c47 Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4/15.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4/16.png b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4/16.png new file mode 100644 index 0000000000..a8b317beae Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4/16.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4/3.png b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4/3.png new file mode 100644 index 0000000000..108c89e021 Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4/3.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4/4.png b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4/4.png new file mode 100644 index 0000000000..502d70c81e Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4/4.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4/5.png b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4/5.png new file mode 100644 index 0000000000..0cc7da7c9c Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4/5.png differ diff --git 
a/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4a/0.png b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4a/0.png new file mode 100644 index 0000000000..854c12ef57 Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4a/0.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4a/1.png b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4a/1.png new file mode 100644 index 0000000000..9a854d354f Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4a/1.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4a/10.png b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4a/10.png new file mode 100644 index 0000000000..d3db445885 Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4a/10.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4a/11.png b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4a/11.png new file mode 100644 index 0000000000..fd9649cdaa Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4a/11.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4a/16.png b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4a/16.png new file mode 100644 index 0000000000..121dc7b71a Binary files /dev/null and 
b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4a/16.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4a/18.png b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4a/18.png new file mode 100644 index 0000000000..2dc9951856 Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4a/18.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4a/19.png b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4a/19.png new file mode 100644 index 0000000000..d46999f111 Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4a/19.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4a/3.png b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4a/3.png new file mode 100644 index 0000000000..91df042524 Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4a/3.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4a/4.png b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4a/4.png new file mode 100644 index 0000000000..cadc235255 Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4a/4.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4a/5.png b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4a/5.png new file mode 100644 index 
0000000000..709c0bd493 Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4a/5.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4a/7.png b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4a/7.png new file mode 100644 index 0000000000..8879a456ec Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4a/7.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4a/8.png b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4a/8.png new file mode 100644 index 0000000000..58c5ae19e9 Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4a/8.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4a/9.png b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4a/9.png new file mode 100644 index 0000000000..fdd0e1fa25 Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/launch_c4a/9.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/run_auto/1.png b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/run_auto/1.png new file mode 100644 index 0000000000..272bd23f2b Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/run_auto/1.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/run_auto/2.png b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/run_auto/2.png 
new file mode 100644 index 0000000000..a67077541d Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/run_auto/2.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/run_manually/11.png b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/run_manually/11.png new file mode 100644 index 0000000000..03b8028518 Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/run_manually/11.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/run_manually/16.png b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/run_manually/16.png new file mode 100644 index 0000000000..0e8ec05f54 Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/run_manually/16.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/run_manually/17.png b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/run_manually/17.png new file mode 100644 index 0000000000..4863740bd2 Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/run_manually/17.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/run_manually/2.png b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/run_manually/2.png new file mode 100644 index 0000000000..56d9f59e13 Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/run_manually/2.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/run_manually/3.png 
b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/run_manually/3.png new file mode 100644 index 0000000000..503f9b341c Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/run_manually/3.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/run_manually/4.png b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/run_manually/4.png new file mode 100644 index 0000000000..f1888b0a57 Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/run_manually/4.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/run_manually/5.png b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/run_manually/5.png new file mode 100644 index 0000000000..3794c19753 Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/run_manually/5.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/run_manually/6.png b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/run_manually/6.png new file mode 100644 index 0000000000..084c5444e7 Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/run_manually/6.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/run_manually/7.png b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/run_manually/7.png new file mode 100644 index 0000000000..57cbd90d53 Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/images/run_manually/7.png differ diff --git 
a/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/installing_go_and_sweet.md b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/installing_go_and_sweet.md
new file mode 100644
index 0000000000..2cb26b45ae
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/installing_go_and_sweet.md
@@ -0,0 +1,139 @@
+---
+title: Installing Go and Sweet
+weight: 40
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+In this section, you'll install Go, Sweet, and the Benchstat comparison tool on both VMs.
+
+## Installation Script
+
+Sweet is a Go benchmarking tool that provides a standardized way to run performance tests across different systems. Benchstat is a companion tool that analyzes and compares benchmark results, helping you understand performance differences between systems. Together, these tools allow you to accurately measure and compare Go performance on Arm and x86 architectures.
+
+
+{{% notice Note %}}
+Subsequent steps in this learning path assume you run this script from your home directory (`$HOME`), resulting in a `$HOME/benchmarks/sweet` final install path. If you install elsewhere, adjust the path accordingly when prompted to run the benchmark logic later in the learning path.
+{{% /notice %}}
+
+
+Start by copying and pasting the script below on **both** of your GCP VMs. The script detects the architecture of the running VM and installs the matching Go package. It then installs Sweet, the benchmarks repository, and the Benchstat tool.
+
+**You don't need to run anything after pasting.** The snippet writes the install script to disk, makes it executable, and then runs it, so pasting it into a terminal in your home directory and pressing Enter installs all needed dependencies:
+
+```bash
+#!/usr/bin/env bash
+
+# Write the install script to the filesystem using a heredoc
+cat <<'EOF' > install_go_and_sweet.sh
+#!/usr/bin/env bash
+
+sudo apt-get -y update
+sudo apt-get -y install git build-essential
+
+# Detect architecture - this allows the same script to work on both
+# our Arm (c4a) and x86 (c4) VMs without modification
+ARCH=$(uname -m)
+case "$ARCH" in
+  arm64|aarch64)
+    GO_PKG="go1.24.2.linux-arm64.tar.gz"
+    ;;
+  x86_64|amd64)
+    GO_PKG="go1.24.2.linux-amd64.tar.gz"
+    ;;
+  *)
+    echo "Unsupported architecture: $ARCH"
+    exit 1
+    ;;
+esac
+
+# Download and install the architecture-specific Go release
+
+URL="https://go.dev/dl/${GO_PKG}"
+echo "Downloading $URL..."
+wget -q --show-progress "$URL"
+
+echo "Extracting $GO_PKG to /usr/local..."
+sudo tar -C /usr/local -xzf "$GO_PKG"
+
+
+echo "Go 1.24.2 installed successfully for $ARCH."
+
+export GOPATH=$HOME/go
+export GOBIN=$GOPATH/bin
+export PATH=$PATH:$GOBIN:/usr/local/go/bin
+
+# Install Sweet, benchmarks, and benchstat tools
+go install golang.org/x/benchmarks/sweet/cmd/sweet@latest
+go install golang.org/x/perf/cmd/benchstat@latest
+
+git clone https://github.com/golang/benchmarks
+cd benchmarks/sweet
+sweet get -force # to get assets
+
+# Create a configuration file
+
+cat <<'CONFFILE' > config.toml
+[[config]]
+  name = "arm-benchmarks"
+  goroot = "/usr/local/go"
+
+CONFFILE
+
+EOF
+
+# Make the script executable
+chmod 755 install_go_and_sweet.sh
+
+# Run the script
+./install_go_and_sweet.sh
+
+```
+
+The end of the output should look like:
+
+```output
+Sweet v0.3.0: Go Benchmarking Suite
+
+Retrieves assets for benchmarks from GCS.
+
+Usage: sweet get [flags]
+  -cache string
+        cache location for assets (default "/home/pareena_verma_arm_com/.cache/go-sweet")
+  -clean
+        delete all cached assets before installing new ones
+  -copy string
+        location to extract assets into, useful for development
+  -version string
+        the version to download assets for (default "v0.3.0")
+```
+
+
+## Verify Installation
+
+To test that everything is installed correctly, set the environment variables shown below on each VM:
+
+```bash
+export GOPATH=$HOME/go
+export GOBIN=$GOPATH/bin
+export PATH=$PATH:$GOBIN:/usr/local/go/bin
+```
+Now run the `markdown` benchmark with `sweet` on both VMs as shown:
+
+```bash
+cd benchmarks/sweet
+sweet get
+sweet run -count 10 -run="markdown" config.toml # run the markdown benchmark, 10 iterations
+```
+
+You should see output similar to the following:
+
+```output
+[sweet] Work directory: /tmp/gosweet3444550660
+[sweet] Benchmarks: markdown (10 runs)
+[sweet] Setting up benchmark: markdown
+[sweet] Running benchmark markdown for arm-benchmarks: run 1
+...
+[sweet] Running benchmark markdown for arm-benchmarks: run 10
+```
diff --git a/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/manual_run_benchmark.md b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/manual_run_benchmark.md
new file mode 100644
index 0000000000..281d6bc3a8
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/manual_run_benchmark.md
@@ -0,0 +1,34 @@
+---
+title: Manually running benchmarks
+weight: 51
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+In this section, you'll download, from each VM, the results of the benchmark you ran manually in the previous section. You will use these results to understand how `sweet` and `benchstat` work together.
+
+## Download Benchmark Results from Each VM
+Let's walk through the steps to manually download the sweet benchmark results from your initial run on each VM.
+
+
+1.
**Locate results:** Change directory to the `results/markdown` directory and list the files to see the `arm-benchmarks.results` file:
+
+    ```bash
+    cd results/markdown
+    ls -d $PWD/*
+    ```
+
+2. **Copy result path:** Copy the absolute pathname of `arm-benchmarks.results`.
+
+3. **Download results:** Click `DOWNLOAD FILE`, paste the **absolute pathname** you just copied as the filename, and then click `Download`. This downloads the benchmark results to your local machine.
+
+    ![](images/run_manually/6.png)
+
+4. **Rename the file:** Once downloaded, on your local machine, rename this file to `c4a.results` so you can distinguish it from the x86 results you'll download later. This naming convention helps you clearly identify which results came from which architecture. You'll know the file downloaded successfully if you see it in your Downloads directory with the name `c4a.results`, along with the confirmation dialog in your browser:
+
+    ![](images/run_manually/7.png)
+
+5. **Repeat for c4 instance:** Repeat steps 1-4 with your `c4` (x86) instance. Do everything the same, except after downloading the c4's `arm-benchmarks.results` file, rename it to `c4.results`.
+
+Now that you have the results from both VMs, in the next section, you'll learn how to use benchstat to analyze these results and understand the performance differences between the two architectures.
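As an optional alternative to the browser-based download above, you can copy the result file with the gcloud CLI instead. This is only a sketch: the instance name (`c4a`), the `ZONE` placeholder, and the remote path are assumptions based on the defaults used in this learning path, so adjust them to match your setup.

```bash
# Hypothetical command-line alternative to the browser download.
# Replace ZONE with your instance's zone; assumes gcloud is authenticated
# and the default ~/benchmarks/sweet install path from earlier sections.
if command -v gcloud >/dev/null 2>&1; then
  gcloud compute scp --zone=ZONE \
    c4a:~/benchmarks/sweet/results/markdown/arm-benchmarks.results ./c4a.results
else
  echo "gcloud CLI not found: install the Google Cloud SDK to use this approach"
fi
```

Either way, you end up with the same `c4a.results` file on your local machine.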
diff --git a/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/manual_run_benchstat.md b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/manual_run_benchstat.md
new file mode 100755
index 0000000000..66ff075f26
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/manual_run_benchstat.md
@@ -0,0 +1,121 @@
+---
+title: Manually running benchstat
+weight: 52
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+You've successfully run and downloaded the benchmark results from both your Arm-based and x86-based VMs. In this section, you'll compare them to each other using the benchstat tool.
+
+
+## Inspecting the Results Files
+
+With the results files downloaded to your local machine, you can inspect them to better understand what `benchstat` analyzes.
+
+1. **View raw results:** Open the `c4a.results` file in a text editor, and you'll see something like this:
+
+    ![](images/run_manually/11.png)
+
+    The file contains the results of the `markdown` benchmark run on the Arm-based c4a VM, showing time and memory stats for each iteration. If you open the `c4.results` file, you'll see similar results for the x86-based c4 VM.
+
+2. **Close the editor:** Close the text editor when done.
+
+## Running Benchstat to Compare Results
+
+To compare the results, you'll use `benchstat` to analyze the two result files you downloaded. Since all the prerequisites are already installed on the `c4` and `c4a` instances, you'll run benchstat from one of those instances.
+
+
+1. **Create working directory:** Make a temporary benchstat directory to hold the results files on either the c4a or c4 instance, and change directory into it:
+
+    ```bash
+    mkdir benchstat_results
+    cd benchstat_results
+    ```
+
+2.
**Upload result files:** Click the `UPLOAD FILE` button in the GCP console, and upload the `c4a.results` AND `c4.results` files you downloaded earlier. (This uploads them to your home directory, not to the current directory.) + + ![](images/run_manually/16.png) + +3. **Verify upload:** You'll know it worked correctly via the confirmation dialog in your terminal: + + ![](images/run_manually/17.png) + +4. **Move files to working directory:** Move the results files to the `benchstat_results` directory, and confirm their presence: + + ```bash + mv ~/c4a.results ~/c4.results . + ls -al + ``` + + You should see both files in the `benchstat_results` directory: + + ```bash + c4.results c4a.results + ``` + +5. **Run benchstat:** Now you can run `benchstat` to compare the two results files: + + ```bash + export GOPATH=$HOME/go + export GOBIN=$GOPATH/bin + export PATH=$PATH:$GOBIN:/usr/local/go/bin + benchstat c4a.results c4.results > c4a_vs_c4.txt + ``` + +6. **View comparison results:** Run the `cat` command to view the results: + + ```bash + cat c4a_vs_c4.txt + ``` + + You should see output similar to the following: + + ```output + │ c4a.results │ c4.results │ + │ sec/op │ sec/op vs base │ + MarkdownRenderXHTML-48 143.9m ± 1% + MarkdownRenderXHTML-96 158.3m ± 0% + geomean 143.9m 158.3m ? ¹ ² + ¹ benchmark set differs from baseline; geomeans may not be comparable + ² ratios must be >0 to compute geomean + + │ c4a.results │ c4.results │ + │ average-RSS-bytes │ average-RSS-bytes vs base │ + MarkdownRenderXHTML-48 22.49Mi ± 6% + MarkdownRenderXHTML-96 24.78Mi ± 2% + geomean 22.49Mi 24.78Mi ? ¹ ² + ¹ benchmark set differs from baseline; geomeans may not be comparable + ² ratios must be >0 to compute geomean + + │ c4a.results │ c4.results │ + │ peak-RSS-bytes │ peak-RSS-bytes vs base │ + MarkdownRenderXHTML-48 23.67Mi ± 4% + MarkdownRenderXHTML-96 25.11Mi ± 7% + geomean 23.67Mi 25.11Mi ? 
¹ ²
+    ¹ benchmark set differs from baseline; geomeans may not be comparable
+    ² ratios must be >0 to compute geomean
+
+    │ c4a.results │ c4.results │
+    │ peak-VM-bytes │ peak-VM-bytes vs base │
+    MarkdownRenderXHTML-48 1.176Gi ± 0%
+    MarkdownRenderXHTML-96 1.176Gi ± 0%
+    geomean 1.176Gi 1.176Gi ? ¹ ²
+    ¹ benchmark set differs from baseline; geomeans may not be comparable
+    ² ratios must be >0 to compute geomean
+    ```
+
+    This output shows the performance differences between the two VMs for the `markdown` benchmark in text format. The key metrics to observe are:
+
+    - **sec/op**: Shows the execution time per operation (lower is better)
+    - **average-RSS-bytes**: Shows the average resident set size memory usage (lower is better)
+    - **peak-RSS-bytes**: Shows the maximum resident set size memory usage (lower is better)
+    - **peak-VM-bytes**: Shows the maximum virtual memory usage (lower is better)
+
+    In this example, the c4a (Arm) instance completed the markdown benchmark in 143.9 milliseconds per operation (shown as `143.9m`), while the c4 (x86) instance took 158.3 milliseconds per operation, indicating better performance on the Arm system for this particular workload.
+
+    If you wanted the results in CSV format, you could run the `benchstat` command with the `-format csv` option instead.
+
+At this point, you can download the `c4a_vs_c4.txt` file for further analysis or reporting. You can also run the same or different benchmarks with the same or different combinations of VMs, and continue comparing results using `benchstat`.
+
+In the next section, you will learn how to automate these steps and generate richer visualizations with sweet and benchstat.
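To make the CSV option mentioned above concrete, here is a minimal sketch. It assumes `benchstat` is on your `PATH` (as set earlier) and that `c4a.results` and `c4.results` are in the current directory; the guard simply reports if benchstat is missing.

```bash
# CSV variant of the comparison above; the -format flag is supported by
# current releases of benchstat (golang.org/x/perf/cmd/benchstat)
if command -v benchstat >/dev/null 2>&1; then
  benchstat -format csv c4a.results c4.results > c4a_vs_c4.csv
  head -3 c4a_vs_c4.csv
else
  echo "benchstat not found: install it with 'go install golang.org/x/perf/cmd/benchstat@latest'"
fi
```

The resulting `c4a_vs_c4.csv` can be opened in a spreadsheet or processed with other tools.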
diff --git a/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/overview.md b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/overview.md new file mode 100644 index 0000000000..4a5995608f --- /dev/null +++ b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/overview.md @@ -0,0 +1,34 @@ +--- +title: Overview +weight: 10 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- + +# Go Benchmarking Overview + +In this section, you will learn how to measure, collect, and compare Go performance data across different CPU architectures. This knowledge is essential for developers and system architects who need to make informed decisions about infrastructure choices for their Go applications. + +You'll gain hands-on experience with: + +- **Go Benchmarks**, a collection of pre-written benchmark definitions that standardizes performance tests for popular Go applications, leveraging Go's built-in benchmark support. + +- **Sweet**, a benchmark runner that automates running Go benchmarks across multiple environments, collecting and formatting results for comparison. + +- **Benchstat**, a statistical comparison tool that analyzes benchmark results to identify meaningful performance differences between systems. + +Benchmarking is critical for modern software development because it allows you to: +- Quantify the performance impact of code changes +- Compare different hardware platforms objectively +- Make data-driven decisions about infrastructure investments +- Identify optimization opportunities in your applications + +You'll use Intel c4-standard-8 and Arm-based c4a-standard-4 (both four-core) instances running on GCP to run and compare benchmarks using these tools. + +{{% notice Note %}} +Arm-based c4a-standard-4 instances and Intel-based c4-standard-8 instances both utilize four cores. 
GCP categorizes both instances as members of the **consistently high performing** series; the main difference between the two is that the c4a has 16 GB of RAM, while the c4 has 30 GB. Keeping the CPU core counts equivalent across the two instances keeps the comparison as close as possible.
+{{% /notice %}}
+
+
+
+
diff --git a/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/requirements.txt b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/requirements.txt
new file mode 100644
index 0000000000..62f9d150da
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/requirements.txt
@@ -0,0 +1,3 @@
+pandas==2.1.0
+plotly==5.13.1
+typer==0.9.0
\ No newline at end of file
diff --git a/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/rexec_sweet_install.md b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/rexec_sweet_install.md
new file mode 100644
index 0000000000..86882fcfd8
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/rexec_sweet_install.md
@@ -0,0 +1,59 @@
+---
+title: Installing the Automated Benchmark and Benchstat Runner
+weight: 53
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+In the last section, you learned how to run benchmarks and benchstat manually. Now you'll learn how to run them automatically, with enhanced visualization of the results.
+
+## Introducing rexec_sweet.py
+
+The `rexec_sweet.py` script automates the benchmarking workflow: it connects to your GCP instances, runs the benchmarks, collects the results, and generates comprehensive reports in a single operation.
It provides several key benefits:
+
+- **Automation**: Runs benchmarks on multiple VMs without manual SSH connections
+- **Consistency**: Ensures benchmarks are executed with identical parameters
+- **Visualization**: Generates HTML reports with interactive charts for easier analysis
+
+The only prerequisite you need to satisfy yourself before running the script is completing the "Installing Go and Sweet" section of this learning path; the install script loads all other dependencies at install time.
+
+## Setting up rexec_sweet
+
+1. **Create a working directory:** On your local machine, open a terminal, then create and change into a directory to store the `rexec_sweet.py` script and related files:
+
+    ```bash
+    mkdir rexec_sweet
+    cd rexec_sweet
+    ```
+
+2. **Clone the repository inside the directory:** Get the `rexec_sweet.py` script from the GitHub repository:
+
+    ```bash
+    git clone https://github.com/geremyCohen/go_benchmarks.git
+    cd go_benchmarks
+    ```
+
+3. **Run the installer:** Copy and paste this command into your terminal to run the installer:
+
+    ```bash
+    ./install.sh
+    ```
+
+    If the `install.sh` script detects that you already have dependencies installed, it may ask whether you wish to reinstall them:
+
+    ```output
+    pyenv: /Users/gercoh01/.pyenv/versions/3.9.22 already exists
+    continue with installation? (y/N)
+    ```
+
+    If you see this prompt, enter `N` (not `Y`!) to continue the installation without modifying the existing dependencies.
+
+4. **Verify VM status:** Make sure the GCP VM instances you created in the previous section are running. If not, start them now and give them a few minutes to come up.
+
+{{% notice Note %}}
+The install script will prompt you to authenticate with Google Cloud Platform (GCP) using the gcloud command-line tool at the end of installation.
If you have issues running the script after installation, or get GCP authentication errors, you can manually authenticate with GCP by running the following command: `gcloud auth login`
+{{% /notice %}}
+
+
+Continue on to the next section to run the script and see how it simplifies the benchmarking process.
diff --git a/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/rexec_sweet_run.md b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/rexec_sweet_run.md
new file mode 100644
index 0000000000..b2cfbf4ba5
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/rexec_sweet_run.md
@@ -0,0 +1,102 @@
+---
+title: Running the Automated Benchmark and Benchstat Runner
+weight: 54
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+With `rexec_sweet` installed, your benchmarking instances running, and your local machine authenticated with GCP, you'll now see how to run benchmarks in an automated fashion.
+
+## Run an Automated Benchmark and Analysis
+
+1. **Run the script:** Execute the `rexec_sweet` script from your local terminal:
+
+```bash
+rexec_sweet
+```
+
+2. **Select a benchmark:** The script prompts you for the name of the benchmark you want to run. Press Enter to run the default benchmark, `markdown` (the recommended benchmark for your first run).
+
+```output
+Available benchmarks:
+1. biogo-igor
+2. biogo-krishna
+3. bleve-index
+4. cockroachdb
+5. esbuild
+6. etcd
+7. go-build
+8. gopher-lua
+9. markdown (default)
+10. tile38
+Enter number (1-10) [default: markdown]:
+```
+
+3. **Select instances:** The script then calls into GCP to detect all running VMs. You should see the script output:
+
+```output
+Available instances:
+1. c4 (will be used as first instance)
+2. c4a (will be used as second instance)
+
+Do you want to run the first two instances found with default install directories? [Y/n]:
+```
+
+4.
**Choose your configuration:** You have two options:
+
+    - **Use default settings:** If you want to run benchmarks on the instances labeled "will be used as nth instance", and you installed Go and Sweet into the default directories as noted earlier in this learning path, press Enter to accept the defaults.
+
+    - **Custom configuration:** If you are running more than two instances and the script doesn't suggest the correct two to autorun, or you installed Go and Sweet to non-default folders, select "n" and press Enter. The script then prompts you to select the instances and runtime paths.
+
+In this example, you'll manually select the instances and paths as shown below:
+
+```output
+Available instances:
+1. c4 (will be used as first instance)
+2. c4a (will be used as second instance)
+
+Do you want to run the first two instances found with default install directories? [Y/n]: n
+
+Select 1st instance:
+1. c4
+2. c4a
+Enter number (1-2): 1
+Enter remote path for c4 [default: ~/benchmarks/sweet]:
+
+Select 2nd instance:
+1. c4
+2. c4a
+Enter number (1-2): 2
+Enter remote path for c4a [default: ~/benchmarks/sweet]:
+Output directory: /private/tmp/a/go_benchmarks/results/c4-c4a-markdown-20250610T190407
+...
+```
+
+Upon entering instance names and paths for the VMs, the script will automatically:
+  - Run the benchmark on both VMs
+  - Run `benchstat` to compare the results
+  - Push the results to your local machine
+
+```output
+Running benchmarks on the selected instances...
+[c4a] [sweet] Work directory: /tmp/gosweet3216239593
+[c4] [sweet] Work directory: /tmp/gosweet2073316306...
+[c4a] ✅ benchmark completed
+[c4] ✅ benchmark completed
+...
+Report generated in results/c4-c4a-markdown-20250610T190407
+```
+
+5. **View the report:** `rexec_sweet` generates an HTML report on your local machine and opens it automatically in your web browser.
+
+    If you close the tab or browser, you can reopen the report by navigating to the `results` subdirectory of the `rexec_sweet.py` script's working directory and opening `report.html`.
+
+![](images/run_auto/2.png)
+
+
+{{% notice Note %}}
+If you see output messages from `rexec_sweet.py` similar to "geomeans may not be comparable" or "Dn: ratios must be >0 to compute geomean", this is expected and can be ignored. These messages indicate that the benchmark sets differ between the two VMs, which is common when running benchmarks on different hardware or configurations.
+{{% /notice %}}
+
+6. **Analyze results:** Review the charts and benchstat comparisons in the generated report to evaluate how the two instances performed.
diff --git a/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/running_benchmarks.md b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/running_benchmarks.md
new file mode 100644
index 0000000000..8ddf05bec3
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/running_benchmarks.md
@@ -0,0 +1,111 @@
+---
+title: Benchmark Types and Metrics
+weight: 50
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+With setup complete, you can now run and analyze the benchmarks. Before you do, it's worth understanding the different pieces in more detail.
+
+## Choosing a Benchmark to Run
+
+Whether running manually or automatically, the benchmarking process consists of two main steps:
+
+1. **Running benchmarks with Sweet**: `sweet` executes the benchmarks on each VM, generating raw performance data
+
+2. **Analyzing results with Benchstat**: `benchstat` compares the results from different VMs to identify performance differences. Benchstat can output results in text format (default) or CSV format.
The text format provides a human-readable tabular view, while CSV allows for further processing with other tools. + +Sweet comes ready to run with the following benchmarks: + +| Benchmark | Description | Command | +|-----------------|-------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------| +| **biogo-igor** | Processes pairwise alignment data using the biogo library, grouping repeat feature families and outputting results in JSON format. | `sweet run -count 10 -run="biogo-igor" config.toml` | +| **biogo-krishna** | Pure-Go implementation of the PALS algorithm for pairwise sequence alignment, measuring alignment runtime performance. | `sweet run -count 10 -run="biogo-krishna" config.toml` | +| **bleve-index** | Indexes a subset of Wikipedia articles into a Bleve full-text search index to assess indexing throughput and resource usage. | `sweet run -count 10 -run="bleve-index" config.toml` | +| **cockroachdb** | Executes CockroachDB KV workloads with varying read percentages (0%, 50%, 95%) and node counts (1 & 3) to evaluate database performance. | `sweet run -count 10 -run="cockroachdb" config.toml` | +| **esbuild** | Bundles and minifies JavaScript/TypeScript code using esbuild on a representative codebase to measure build speed and efficiency. | `sweet run -count 10 -run="esbuild" config.toml` | +| **etcd** | Uses the official etcd benchmarking tool to stress-test an etcd cluster, measuring request latency and throughput for key-value operations. | `sweet run -count 10 -run="etcd" config.toml` | +| **go-build** | Compiles a representative Go module (or the Go toolchain) to measure compilation time and memory (RSS) usage on supported platforms. | `sweet run -count 10 -run="go-build" config.toml` | +| **gopher-lua** | Executes Lua scripts using the GopherLua VM to benchmark the performance of a pure-Go Lua interpreter. 
| `sweet run -count 10 -run="gopher-lua" config.toml` | +| **markdown** | Parses and renders Markdown documents to HTML using a Go-based markdown library to evaluate parsing and rendering throughput. | `sweet run -count 10 -run="markdown" config.toml` | +| **tile38** | Stress-tests a Tile38 geospatial database with WITHIN, INTERSECTS, and NEARBY queries to measure spatial query performance. | `sweet run -count 10 -run="tile38" config.toml` | + +## Metrics Summary + +When running benchmarks, several key metrics are collected to evaluate performance. The following summarizes the most common metrics and their significance: + +### Seconds per Operation - Lower is better + +This metric measures the time taken to complete a single operation, indicating the raw speed of execution. It directly reflects the performance efficiency of a system for a specific task, making it one of the most fundamental benchmarking metrics. + +A system with lower seconds per operation completes tasks faster. + This metric primarily reflects CPU performance but can also be influenced by memory access speeds and I/O operations. If seconds per operation is the only metric showing significant difference while memory metrics are similar, the performance difference is likely CPU-bound. + +### Operations per Second - Higher is better +This metric provides a clear measure of system performance capacity, making it essential for understanding raw processing power and scalability potential. A system performing more operations per second has greater processing capacity. + +This metric reflects overall system performance including CPU speed, memory access efficiency, and I/O capabilities. + +If operations per second is substantially higher while memory usage remains proportional, the system likely has superior CPU performance. High operations per second with disproportionately high memory usage may indicate performance gains through memory-intensive optimizations. 
A system showing higher operations per second but also higher resource consumption may be trading efficiency for raw speed. + +This metric is essentially the inverse of "seconds per operation" and provides a more intuitive way to understand throughput capacity. + +### Average RSS Bytes - Lower is better + +Resident Set Size (RSS) represents the portion of memory occupied by a process that is held in RAM (not swapped out). It shows the typical memory footprint during operation, indicating memory efficiency and potential for scalability. + +Lower average RSS indicates more efficient memory usage. A system with lower average RSS can typically handle more concurrent processes before memory becomes a bottleneck. This metric reflects both algorithm efficiency and memory management capabilities. + +If one VM has significantly higher seconds per operation but lower RSS, it may be trading speed for memory efficiency. Systems with similar CPU performance but different RSS values indicate different memory optimization approaches; lower RSS with similar CPU performance suggests better memory management, which is a critical indicator of performance in memory-constrained environments. + +### Peak RSS Bytes - Lower is better + +Peak RSS bytes is the maximum Resident Set Size reached during execution, representing the worst-case memory usage scenario. The peak RSS metric helps to understand memory requirements and potential for memory-related bottlenecks during intensive operations. + +Lower peak RSS indicates better handling of memory-intensive operations. High peak RSS values can indicate memory spikes that might cause performance degradation through swapping or out-of-memory conditions. + +Large differences between average and peak RSS suggest memory usage volatility. A system with lower peak RSS but similar performance is better suited for memory-constrained environments. 
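As a quick illustration of the average-to-peak gap, here is a small sketch that computes the headroom using the sample c4a numbers from the earlier benchstat output (22.49 MiB average, 23.67 MiB peak); these are only the example figures from this learning path, not values you should expect on your own VMs.

```bash
# Percentage by which peak RSS exceeds average RSS, using the sample
# c4a values (in MiB) from the benchstat output earlier
avg=22.49
peak=23.67
awk -v a="$avg" -v p="$peak" \
  'BEGIN { printf "peak RSS is %.1f%% above average RSS\n", (p - a) / a * 100 }'
# prints: peak RSS is 5.2% above average RSS
```

A small gap like this (around 5%) suggests stable memory usage during the run; a much larger gap would point to memory spikes worth investigating.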
+ +### Peak VM Bytes - Lower is better + +Peak VM Bytes is the maximum Virtual Memory size used, including both RAM and swap space allocated to the process. + +Lower peak VM indicates more efficient use of the total memory address space. High VM usage with low RSS suggests the system is effectively using virtual memory management. Extremely high VM with proportionally low RSS might indicate memory fragmentation issues. + +If peak VM is much higher than peak RSS, the system is relying heavily on virtual memory management. Systems with similar performance but different VM usage patterns may have different memory allocation strategies. High VM with performance degradation suggests potential memory-bound operations due to excessive paging. + +## Summary of Efficiency Indicators + +When comparing metrics across two systems, keep the following in mind: + +### CPU-bound vs Memory-bound +A system is likely CPU-bound if seconds per operation differs significantly while memory metrics remain similar. + +A system is likely memory-bound if performance degrades as memory metrics increase, especially when peak RSS approaches available physical memory. + +### Efficiency Indicators +The ideal system shows lower values across all metrics - faster execution with smaller memory footprint. Systems with similar seconds per operation but significantly different memory metrics indicate different optimization priorities. + +### Scalability Potential +Lower memory metrics (especially peak values) suggest better scalability for concurrent workloads. Systems with lower seconds per operation but higher memory usage may perform well for single tasks but scale poorly. + +### Optimization Targets +Large gaps between average and peak memory usage suggest opportunities for memory optimization. High seconds per operation with low memory usage suggests CPU optimization potential. 
+ +## Best Practices when benchmarking across different instance types + +Here are some general tips to keep in mind as you explore benchmarking across different apps and instance types: + +- Unlike Intel and AMD processors that use hyper-threading, Arm processors provide single-threaded cores without hyper-threading. A four-core Arm processor has four independent cores running four threads, while a four-core Intel processor provides eight logical cores through hyper-threading. This means each Arm vCPU represents a full physical core, while each Intel/AMD vCPU represents half a physical core. For fair comparison, this Learning Path uses a 4-vCPU Arm instance against an 8-vCPU Intel instance. When scaling up instance sizes during benchmarking, maintain a 2:1 Intel/AMD-to-Arm vCPU ratio if you wish to keep parity on CPU resources. + +- Run each benchmark at least 10 times (specified via the `count` parameter) to smooth out outlier runs and ensure statistical significance. + +- Results may be bound by CPU, memory, or I/O performance. If you see significant differences in one metric but not others, it may indicate a bottleneck in that area; running the same benchmark with different configurations (for example, more CPU cores or more memory) can help identify the bottleneck. + + + + + + + diff --git a/content/learning-paths/servers-and-cloud-computing/pytorch-llama/pytorch-llama.md b/content/learning-paths/servers-and-cloud-computing/pytorch-llama/pytorch-llama.md index 3762aebb73..0740412780 100644 --- a/content/learning-paths/servers-and-cloud-computing/pytorch-llama/pytorch-llama.md +++ b/content/learning-paths/servers-and-cloud-computing/pytorch-llama/pytorch-llama.md @@ -7,7 +7,7 @@ layout: learningpathall --- ## Before you begin -The instructions in this Learning Path are for any Arm server running Ubuntu 22.04 LTS. You need an Arm server instance with at least 16 cores and 64GB of RAM to run this example.
Configure disk storage up to at least 50 GB. The instructions have been tested on an AWS Graviton4 r8g.4xlarge instance. +The instructions in this Learning Path are for any Arm server running Ubuntu 24.04 LTS. You need an Arm server instance with at least 16 cores and 64GB of RAM to run this example. Configure at least 50 GB of disk storage. The instructions have been tested on an AWS Graviton4 r8g.4xlarge instance. ## Overview Arm CPUs are widely used in traditional ML and AI use cases. In this Learning Path, you learn how to run generative AI inference-based use cases like a LLM chatbot using PyTorch on Arm-based CPUs. PyTorch is a popular deep learning framework for AI applications. @@ -31,16 +31,17 @@ source torch_env/bin/activate ``` ### Install PyTorch and optimized libraries -Torchchat is a library developed by the PyTorch team that facilitates running large language models (LLMs) seamlessly on a variety of devices. TorchAO (Torch Architecture Optimization) is a PyTorch library designed for enhancing the performance of ML models through different quantization and sparsity methods. +Torchchat is a library developed by the PyTorch team that facilitates running large language models (LLMs) seamlessly on a variety of devices. TorchAO (Torch Architecture Optimization) is a PyTorch library designed for enhancing the performance of ML models through different quantization and sparsity methods.
-Start by cloning the torchao and torchchat repositories and then applying the Arm specific patches: +Start by installing PyTorch and cloning the torchao and torchchat repositories: ```sh +pip install torch git clone --recursive https://github.com/pytorch/ao.git cd ao -git checkout 174e630af2be8cd18bc47c5e530765a82e97f45b -wget https://raw.githubusercontent.com/ArmDeveloperEcosystem/PyTorch-arm-patches/main/0001-Feat-Add-support-for-kleidiai-quantization-schemes.patch -git apply --whitespace=nowarn 0001-Feat-Add-support-for-kleidiai-quantization-schemes.patch +git checkout e1cb44ab84eee0a3573bb161d65c18661dc4a307 +curl -L https://github.com/pytorch/ao/commit/738d7f2c5a48367822f2bf9d538160d19f02341e.patch | git apply +python3 setup.py install cd ../ git clone --recursive https://github.com/pytorch/torchchat.git @@ -50,18 +51,9 @@ wget https://raw.githubusercontent.com/ArmDeveloperEcosystem/PyTorch-arm-patches wget https://raw.githubusercontent.com/ArmDeveloperEcosystem/PyTorch-arm-patches/main/0001-Feat-Enable-int4-quantized-models-to-work-with-pytor.patch git apply 0001-Feat-Enable-int4-quantized-models-to-work-with-pytor.patch git apply --whitespace=nowarn 0001-modified-generate.py-for-cli-and-browser.patch +sed -i 's/"groupsize": 0/"groupsize": 32/' config/data/aarch64_cpu_channelwise.json pip install -r requirements.txt ``` -{{% notice Note %}} You will need Python version 3.10 to apply these patches. This is the default version of Python installed on an Ubuntu 22.04 Linux machine. {{% /notice %}} - -You will now override the installed PyTorch version with a specific version of PyTorch required to take advantage of Arm KleidiAI optimizations: - -``` -wget https://github.com/ArmDeveloperEcosystem/PyTorch-arm-patches/raw/main/torch-2.5.0.dev20240828+cpu-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl -pip install --force-reinstall torch-2.5.0.dev20240828+cpu-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl -cd ..
-pip uninstall torchao && cd ao/ && rm -rf build && python setup.py install -``` ### Login to Hugging Face You can now download the LLM. @@ -73,7 +65,7 @@ pip install -U "huggingface_hub[cli]" [Generate an Access Token](https://huggingface.co/settings/tokens) to authenticate your identity with Hugging Face Hub. A token with read-only access is sufficient. -Log in to the Hugging Face repository and enter your Access Token key from Hugging face. +Log in to the Hugging Face repository and enter your Access Token key from Hugging Face. ```sh huggingface-cli login @@ -86,8 +78,7 @@ In this step, you will download the [Meta Llama3.1 8B Instruct model](https://hu ```sh -cd ../torchchat -python torchchat.py export llama3.1 --output-dso-path exportedModels/llama3.1.so --quantize config/data/aarch64_cpu_channelwise.json --device cpu --max-seq-length 1024 +python3 torchchat.py export llama3.1 --output-dso-path exportedModels/llama3.1.so --quantize config/data/aarch64_cpu_channelwise.json --device cpu --max-seq-length 1024 ``` The output from this command should look like: @@ -108,7 +99,7 @@ You can now run the LLM on the Arm CPU on your server. To run the model inference: ```sh -LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libtcmalloc.so.4 TORCHINDUCTOR_CPP_WRAPPER=1 TORCHINDUCTOR_FREEZING=1 OMP_NUM_THREADS=16 python torchchat.py generate llama3.1 --dso-path exportedModels/llama3.1.so --device cpu --max-new-tokens 32 --chat +LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libtcmalloc.so.4 TORCHINDUCTOR_CPP_WRAPPER=1 TORCHINDUCTOR_FREEZING=1 OMP_NUM_THREADS=16 python3 torchchat.py generate llama3.1 --dso-path exportedModels/llama3.1.so --device cpu --max-new-tokens 32 --chat ``` The output from running the inference will look like: @@ -140,4 +131,4 @@ Bandwidth achieved: 254.17 GB/s *** This first iteration will include cold start effects for dynamic import, hardware caches. *** ``` -You have successfully run the Llama3.1 8B Instruct Model on your Arm-based server.
In the next section, you will walk through the steps to run the same chatbot in your browser. +You have successfully run the Llama3.1 8B Instruct Model on your Arm-based server. In the next section, you will walk through the steps to run the same chatbot in your browser. diff --git a/data/stats_current_test_info.yml b/data/stats_current_test_info.yml index b5f27fbeb1..8caea4bc62 100644 --- a/data/stats_current_test_info.yml +++ b/data/stats_current_test_info.yml @@ -1,5 +1,5 @@ summary: - content_total: 373 + content_total: 379 content_with_all_tests_passing: 0 content_with_tests_enabled: 61 sw_categories: @@ -63,7 +63,8 @@ sw_categories: tests_and_status: [] aws-q-cli: readable_title: Amazon Q Developer CLI - tests_and_status: [] + tests_and_status: + - ubuntu:latest: passed azure-cli: readable_title: Azure CLI tests_and_status: [] diff --git a/data/stats_weekly_data.yml b/data/stats_weekly_data.yml index c891aa921e..16a2337ba6 100644 --- a/data/stats_weekly_data.yml +++ b/data/stats_weekly_data.yml @@ -6227,3 +6227,113 @@ avg_close_time_hrs: 0 num_issues: 14 percent_closed_vs_total: 0.0 +- a_date: '2025-06-16' + content: + automotive: 2 + cross-platform: 33 + embedded-and-microcontrollers: 41 + install-guides: 101 + iot: 6 + laptops-and-desktops: 38 + mobile-graphics-and-gaming: 34 + servers-and-cloud-computing: 124 + total: 379 + contributions: + external: 96 + internal: 505 + github_engagement: + num_forks: 30 + num_prs: 9 + individual_authors: + adnan-alsinan: 2 + alaaeddine-chakroun: 2 + albin-bernhardsson: 1 + alex-su: 1 + alexandros-lamprineas: 1 + andrew-choi: 2 + andrew-kilroy: 1 + annie-tallund: 4 + arm: 3 + arnaud-de-grandmaison: 4 + arnaud-de-grandmaison.: 1 + aude-vuilliomenet: 1 + avin-zarlez: 1 + barbara-corriero: 1 + basma-el-gaabouri: 1 + ben-clark: 1 + bolt-liu: 2 + brenda-strech: 1 + chaodong-gong: 1 + chen-zhang: 1 + christophe-favergeon: 1 + christopher-seidl: 7 + cyril-rohr: 1 + daniel-gubay: 1 + daniel-nguyen: 2 + david-spickett: 2 + 
dawid-borycki: 33 + diego-russo: 2 + dominica-abena-o.-amanfo: 1 + elham-harirpoush: 2 + florent-lebeau: 5 + "fr\xE9d\xE9ric--lefred--descamps": 2 + gabriel-peterson: 5 + gayathri-narayana-yegna-narayanan: 1 + georgios-mermigkis: 1 + geremy-cohen: 2 + gian-marco-iodice: 1 + graham-woodward: 1 + han-yin: 1 + iago-calvo-lista: 1 + james-whitaker: 1 + jason-andrews: 103 + joe-stech: 6 + johanna-skinnider: 2 + jonathan-davies: 2 + jose-emilio-munoz-lopez: 1 + julie-gaskin: 5 + julio-suarez: 6 + jun-he: 1 + kasper-mecklenburg: 1 + kieran-hejmadi: 11 + koki-mitsunami: 2 + konstantinos-margaritis: 8 + kristof-beyls: 1 + leandro-nunes: 1 + liliya-wu: 1 + mark-thurman: 1 + masoud-koleini: 1 + mathias-brossard: 1 + michael-hall: 5 + na-li: 1 + nader-zouaoui: 2 + nikhil-gupta: 1 + nina-drozd: 1 + nobel-chowdary-mandepudi: 6 + odin-shen: 7 + owen-wu: 2 + pareena-verma: 46 + paul-howard: 3 + peter-harris: 1 + pranay-bakre: 5 + preema-merlin-dsouza: 1 + przemyslaw-wirkus: 2 + rin-dobrescu: 1 + roberto-lopez-mendez: 2 + ronan-synnott: 45 + shuheng-deng: 1 + thirdai: 1 + tianyu-li: 2 + tom-pilar: 1 + uma-ramalingam: 1 + varun-chari: 2 + visualsilicon: 1 + willen-yang: 1 + ying-yu: 2 + yiyang-fan: 1 + zach-lasiuk: 2 + zhengjun-xing: 2 + issues: + avg_close_time_hrs: 0 + num_issues: 17 + percent_closed_vs_total: 0.0 diff --git a/themes/arm-design-system-hugo-theme/layouts/_default/index.coveo.xml b/themes/arm-design-system-hugo-theme/layouts/_default/index.coveo.xml index be2a6be300..48cf8c2e17 100644 --- a/themes/arm-design-system-hugo-theme/layouts/_default/index.coveo.xml +++ b/themes/arm-design-system-hugo-theme/layouts/_default/index.coveo.xml @@ -173,7 +173,7 @@ {{- end -}} {{- if and (.File) (in .File.Path "learning-paths") -}} - + {{- if .IsSection -}} {{- with .Title -}} {{ . }} @@ -187,7 +187,7 @@ Unknown Parent Title {{- end -}} {{- end -}} - + 1 {{.Params.weight -}} {{- end -}}