diff --git a/content/learning-paths/automotive/openadkit2_safetyisolation/1_functional_safety.md b/content/learning-paths/automotive/openadkit2_safetyisolation/1_functional_safety.md deleted file mode 100644 index d08f796580..0000000000 --- a/content/learning-paths/automotive/openadkit2_safetyisolation/1_functional_safety.md +++ /dev/null @@ -1,134 +0,0 @@ ---- -title: Functional Safety for automotive software development -weight: 2 - -### FIXED, DO NOT MODIFY -layout: learningpathall ---- - -## Why Functional Safety Matters in Automotive Software - -Functional Safety refers to a system's ability to detect potential faults and respond appropriately to ensure that the system remains in a safe state, preventing harm to individuals or damage to equipment. - -This is particularly important in automotive, autonomous driving, medical devices, industrial control, robotics and aerospace applications, where system failures can lead to severe consequences. - -In software development, Functional Safety focuses on minimizing risks through software design, testing, and validation to ensure that critical systems operate in a predictable, reliable, and verifiable manner. This means developers must consider: -- Error detection mechanisms -- Exception handling -- Redundancy design -- Development processes compliant with safety standards - -### Definition and Importance of Functional Safety - -The core of Functional Safety lies in risk management, which aims to reduce the impact of system failures. - -In autonomous vehicles, Functional Safety ensures that if sensor data is incorrect, the system can enter a safe state, preventing incorrect driving decisions. - -The three core objectives of Functional Safety are: -1. Prevention: Reducing the likelihood of errors through rigorous software development processes and testing. In the electric vehicle, the battery systems monitor temperature to prevent overheating. -2. 
Detection: Quickly identifying errors using built-in diagnostic mechanisms, such as built-in self-test. -3. Mitigation: Controlling the impact of failures to ensure the overall safety of the system. - -This approach is critical in applications such as autonomous driving, flight control, and medical implants, where failures can result in severe consequences. - -### ISO 26262: Automotive Functional Safety Standard - -ISO 26262 is a functional safety standard specifically for automotive electronics and software systems. It defines a comprehensive V-model aligned safety lifecycle, covering all phases from requirement analysis, design, development, testing, to maintenance. - -Key Concepts of ISO 26262: -- ASIL (Automotive Safety Integrity Level) - - Evaluates the risk level of different system components (A, B, C, D, where D represents the highest safety requirement). - - For example: ASIL A can be Dashboard light failure (low risk) and ASIL D is Brake system failure (high risk). -- HARA (Hazard Analysis and Risk Assessment) - - Analyzes hazards and assesses risks to determine necessary safety measures. -- Safety Mechanisms - - Includes real-time error detection, system-level fault tolerance, and defined fail-safe or fail-operational fallback states. - -Typical Application Scenarios: -- Autonomous Driving Systems: - - Ensures that even if sensors (e.g., LiDAR, radar, cameras) provide faulty data, the vehicle will not make dangerous decisions. -- Powertrain Control: - - Prevents braking system failures that could lead to loss of control. -- Battery Management System (BMS): - - Prevents battery overheating or excessive discharge in electric vehicles. - -### Common Use Cases of Functional Safety in Automotive - -- Autonomous Driving: - - Ensures the vehicle can operate safely or enter a fail-safe state when sensors like LiDAR, radar, or cameras malfunction. - - Functional Safety enables real-time fault detection and fallback logic to prevent unsafe driving decisions. 
- -- Powertrain Control: - - Monitors throttle and brake signals to prevent unintended acceleration or braking loss. - - Includes redundancy, plausibility checks, and emergency overrides to maintain control under failure conditions. - -- Battery Management Systems (BMS): - - Protects EV batteries from overheating, overcharging, or deep discharge. - - Safety functions include temperature monitoring, voltage balancing, and relay cut-off mechanisms to prevent thermal runaway. - -These use cases highlight the need for a dedicated architectural layer that can enforce Functional Safety principles with real-time guarantees. -A widely adopted approach in modern automotive platforms is the Safety Island—an isolated compute domain designed to execute critical control logic independently of the main system. - -### Safety Island: Enabling Functional Safety in Autonomous Systems - -In automotive systems, a General ECU (Electronic Control Unit) typically runs non-critical tasks such as infotainment or navigation, whereas a Safety Island is dedicated to executing safety-critical control logic (e.g., braking, steering) with strong isolation, redundancy, and determinism. - -The table below compares the characteristics of a General ECU and a Safety Island in terms of their role in supporting Functional Safety. - -| Feature | General ECU | Safety Island | -|------------------------|----------------------------|--------------------------------------| -| Purpose | Comfort / non-safety logic | Safety-critical decision making | -| OS/Runtime | Linux, Android | RTOS, Hypervisor, or bare-metal | -| Isolation | Soft partitioning | Hard isolation (hardware-enforced) | -| Functional Safety Req | None to moderate | ISO 26262 ASIL-B to ASIL-D compliant | -| Fault Handling | Best-effort recovery | Deterministic safe-state response | - -This contrast highlights why safety-focused software needs a dedicated hardware domain with certified execution behavior. 
- -Safety Island is an independent safety subsystem separate from the main processor. It is responsible for monitoring and managing system safety. If the main processor fails or becomes inoperable, Safety Island can take over critical safety functions such as deceleration, stopping, and fault handling to prevent catastrophic system failures. - -Key Capabilities of Safety Island -- System Health Monitoring - - Continuously monitors the operational status of the main processor (e.g., ADAS control unit, ECU) and detects potential errors or anomalies. -- Fault Detection and Isolation - - Independently evaluates and initiates emergency handling if the main processing unit encounters errors, overheating, computational failures, or unresponsiveness. -- Providing Essential Safety Functions - - Even if the main system crashes, Safety Island can still execute minimal safety operations, such as: - - Autonomous Vehicles → Safe stopping (Fail-Safe Mode) - - Industrial Equipment → Emergency power cutoff or speed reduction - -### Why Safety Island Matters for Functional Safety - -Safety Island plays a critical role in Functional Safety by ensuring that the system can handle high-risk scenarios and minimize catastrophic failures. - -How Safety Island Enhances Functional Safety -1. Acts as an Independent Redundant Safety Layer - - Even if the main system fails, it can still operate independently. -2. Supports ASIL-D Safety Level - - Monitors ECU health status and executes emergency safety strategies, such as emergency braking. -3. Provides Independent Fault Detection and Recovery Mechanisms - - Fail-Safe: Activates a safe mode, such as limiting vehicle speed or switching to manual control. - - Fail-Operational: Ensures that high-safety applications, such as aerospace systems, can continue operating under certain conditions. 
- -### Functional Safety in the Software Development Lifecycle - -Functional Safety impacts both hardware and software development, particularly in areas such as requirement changes, version management, and testing validation. -For example, in ASIL-D level applications, every code modification requires a complete impact analysis and regression testing to ensure that new changes do not introduce additional risks. - -### Functional Safety Requirements in Software Development - -These practices ensure the software development process meets industry safety standards and can withstand system-level failures: -- Requirement Specification - - Clearly defining safety-critical requirements and conducting risk assessments. -- Safety-Oriented Programming - - Following MISRA C, CERT C/C++ standards and using static analysis tools to detect errors. -- Fault Handling Mechanisms - - Implementing redundancy design and health monitoring to handle anomalies. -- Testing and Verification - - Using Hardware-in-the-Loop (HIL) testing to ensure software safety in real hardware environments. -- Version Management and Change Control - - Using Git, JIRA, Polarion to track changes for safety audits. - -By establishing an ASIL Partitioning software development environment and leveraging SOAFEE technologies, you can enhance software consistency and maintainability in Functional Safety applications. - -This Learning Path follows [Deploy Open AD Kit containerized autonomous driving simulation on Arm Neoverse](/learning-paths/automotive/openadkit1_container/) and introduces Functional Safety design practices from the earliest development stages. 
\ No newline at end of file diff --git a/content/learning-paths/automotive/openadkit2_safetyisolation/1a_functional_safety.md b/content/learning-paths/automotive/openadkit2_safetyisolation/1a_functional_safety.md new file mode 100644 index 0000000000..9848838044 --- /dev/null +++ b/content/learning-paths/automotive/openadkit2_safetyisolation/1a_functional_safety.md @@ -0,0 +1,25 @@ +--- +title: Why functional safety matters in software systems +weight: 2 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- + +## What functional safety means for developers + +Functional safety helps systems detect faults and respond in ways that keep people and equipment safe. It ensures that even when errors occur, the system transitions into a known, safe state to prevent harm. + +This concept is foundational in domains like automotive, autonomous driving, medical devices, industrial control, robotics, and aerospace. In these systems, failures can have severe real-world consequences. + +In software development, functional safety focuses on minimizing risks through careful design, rigorous testing, and thorough validation. The goal is to make sure that critical systems behave predictably, reliably, and verifiably, even under fault conditions. + +To design for functional safety, developers must consider: + +- Error detection mechanisms +- Exception handling strategies +- Redundant system design +- Development processes that align with safety standards + + +In the following sections, you'll learn how to apply these principles throughout the software lifecycle, from early risk assessment and architectural design to runtime isolation and ISO 26262 compliance. 
diff --git a/content/learning-paths/automotive/openadkit2_safetyisolation/1b_purpose.md b/content/learning-paths/automotive/openadkit2_safetyisolation/1b_purpose.md new file mode 100644 index 0000000000..7363e87f80 --- /dev/null +++ b/content/learning-paths/automotive/openadkit2_safetyisolation/1b_purpose.md @@ -0,0 +1,39 @@ +--- +title: Understand functional safety risks +weight: 3 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- + +## Manage risk with functional safety principles + +At its core, functional safety is about managing risk and reducing the impact of system failures. + +In autonomous vehicles, for example, functional safety ensures that if sensors produce unreliable or conflicting input, the vehicle can fall back to a known-safe state and maintain control. + +The three core objectives of functional safety are: + +- **Prevention** reduces the likelihood of errors through rigorous software development processes and testing. For example, electric vehicles monitor battery temperature to prevent overheating. +- **Detection** quickly identifies errors using built-in diagnostic mechanisms, such as self-test routines. +- **Mitigation** controls the impact of failures to ensure the system stays safe, even when something goes wrong. + +In practice, these principles might be implemented through: + +- Redundant sensor fusion code paths +- Timeout mechanisms for control loops +- Watchdog timers that reset components on fault detection +- Safe-state logic embedded in actuator control routines + +Together, prevention, detection, and mitigation form the foundation for building safer, more reliable software systems. + +In the next step, you’ll explore how functional safety principles are formalized through safety standards like ISO 26262 and applied to real-world systems. 
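The watchdog, control-loop timeout, and safe-state items in the list above can be illustrated with a small shell sketch. It is only an illustration of the pattern. Production safety mechanisms typically rely on hardware watchdog timers and an RTOS rather than a script. Every name here (`HEARTBEAT_FILE`, `TIMEOUT_S`, `watchdog_check`, `enter_safe_state`) is hypothetical. The sketch assumes a Linux host with GNU coreutils and a monitored control loop that runs `touch "$HEARTBEAT_FILE"` on every iteration.

```bash
#!/usr/bin/env bash
# Hypothetical software watchdog sketch (Linux, GNU coreutils assumed).
# The monitored control loop is assumed to touch "$HEARTBEAT_FILE" on every
# iteration; this script never talks to a real actuator.
set -euo pipefail

HEARTBEAT_FILE="${HEARTBEAT_FILE:-/tmp/control_loop.heartbeat}"
TIMEOUT_S="${TIMEOUT_S:-2}"   # maximum tolerated heartbeat age, in seconds

# Detection: seconds since the monitored loop last refreshed its heartbeat.
# A missing file is treated as "very old" so it always trips the watchdog.
heartbeat_age() {
  local now last
  now=$(date +%s)
  last=$(stat -c %Y "$HEARTBEAT_FILE" 2>/dev/null || echo 0)
  echo $(( now - last ))
}

# Mitigation: placeholder for commanding a known-safe state
# (for example, a controlled stop). Here it only reports the event.
enter_safe_state() {
  echo "SAFE_STATE: heartbeat stale for $1 s"
}

# One watchdog check: prints OK and returns 0 while the heartbeat is fresh;
# triggers the safe state and returns 1 once the timeout is exceeded.
watchdog_check() {
  local age
  age=$(heartbeat_age)
  if (( age > TIMEOUT_S )); then
    enter_safe_state "$age"
    return 1
  fi
  echo "OK: heartbeat age ${age} s"
}
```

A supervisor could call the check periodically, for example `while watchdog_check; do sleep 0.5; done`, so that the first stale heartbeat ends the loop in the safe state. Treating a missing heartbeat as a fault is the fail-safe choice: the watchdog never assumes health it cannot observe.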
+ + + + + + + + + diff --git a/content/learning-paths/automotive/openadkit2_safetyisolation/1c_iso26262.md b/content/learning-paths/automotive/openadkit2_safetyisolation/1c_iso26262.md new file mode 100644 index 0000000000..511c708777 --- /dev/null +++ b/content/learning-paths/automotive/openadkit2_safetyisolation/1c_iso26262.md @@ -0,0 +1,39 @@ +--- +title: Apply ISO 26262 and ASIL levels +weight: 4 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- +## What ISO 26262 and ASIL levels mean for developers + +ISO 26262 is a functional safety standard for automotive electronics and software systems. It defines a structured safety lifecycle aligned with the V-model, spanning everything from initial requirements to final validation and maintenance. + +The V-model is a development framework where each design phase is paired with a corresponding test phase. This structure makes it easier to trace safety requirements from early specifications through to system verification. + +## Key concepts in ISO 26262 + +- **Automotive Safety Integrity Level (ASIL)** ranks the safety criticality of components on a scale from A (lowest) to D (highest); hazards that need no safety-specific measures are classified as QM (quality management). For example, ASIL A might apply to a dashboard indicator failure, while ASIL D applies to a brake system malfunction. + +- **Hazard Analysis and Risk Assessment (HARA)** identifies potential hazards and evaluates their risks to define the required safety goals and ASIL levels. + +- **Safety mechanisms** include techniques such as real-time fault detection, redundancy, and fallback modes like fail-safe and fail-operational behavior. + +{{% notice Note %}}ASIL is determined from the severity, exposure, and controllability of a hazard, not from how likely a component is to fail. In practice, many OEMs assign ASIL D to systems with any potential for severe passenger harm, even when exposure to the hazardous situation is low.{{% /notice %}} + +## Apply ISO 26262 to real-world systems + +ISO 26262 applies to many safety-critical vehicle systems: + +- **Autonomous driving systems** must respond safely to sensor errors (such as LiDAR, radar, or camera faults).
Functional safety ensures the vehicle can enter a safe state and avoid unsafe decisions. + +- **Powertrain control** systems monitor throttle and braking inputs. Safety mechanisms such as redundancy, plausibility checks, and overrides prevent unintended acceleration or loss of braking function. + +- **Battery management systems (BMS)** protect electric vehicle batteries from overheating, overcharging, or deep discharge. Built-in safety functions monitor temperature, balance voltage, and isolate faulty circuits to prevent thermal runaway. + +These systems require dedicated hardware and software architectures that enforce functional safety guarantees. One common solution is the *safety island*, which is an isolated compute domain used to run safety-critical control logic independently from the main system. + + + + + diff --git a/content/learning-paths/automotive/openadkit2_safetyisolation/1d_safety_island_arch.md b/content/learning-paths/automotive/openadkit2_safetyisolation/1d_safety_island_arch.md new file mode 100644 index 0000000000..5f2f5798d6 --- /dev/null +++ b/content/learning-paths/automotive/openadkit2_safetyisolation/1d_safety_island_arch.md @@ -0,0 +1,51 @@ +--- +title: Implement safety-critical isolation using safety island architecture +weight: 5 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- +## How safety islands support functional safety + +In automotive systems, a non-safety ECU (Electronic Control Unit) typically runs non-critical tasks such as infotainment or navigation. A safety island, by contrast, is dedicated to executing safety-critical control logic (for example, braking and steering) with strong isolation, redundancy, and determinism. + +The table below compares the characteristics of an ECU and a safety island in terms of their role in supporting functional safety. 
+ +| Feature | ECU | Safety island | +|------------------------|----------------------------|--------------------------------------| +| Purpose | Comfort/non-safety logic | Safety-critical decision making | +| OS/runtime | Linux, Android | RTOS, hypervisor, or bare-metal | +| Isolation | Soft partitioning | Hardware-enforced isolation | +| Functional safety requirement | None to moderate | ISO 26262 ASIL-B to ASIL-D compliant | +| Fault handling | Best-effort recovery | Deterministic safe-state response | + +This comparison shows why safety-critical software depends on dedicated hardware domains to meet functional safety goals. + +If the main processor fails or becomes inoperable, a safety island can take over critical safety functions such as deceleration, stopping, and fault handling to prevent catastrophic system failures. + +{{% notice Tip %}} +Safety islands are often implemented as lockstep cores or separate MCUs that run real-time operating systems (RTOS), providing deterministic timing behavior even under fault conditions. +{{% /notice %}} + +## Key capabilities of a safety island +- **System health monitoring** continuously checks the operational status of the main processor (for example, the ADAS control unit) and detects potential errors or anomalies +- **Fault detection and isolation** independently detects failures and initiates emergency handling for overheating, execution faults, or unresponsiveness +- **Essential safety functions continue to operate**, even if the main system crashes. A safety island can execute fallback operations, such as: + - Autonomous vehicles → safe stopping (fail-safe mode) + - Industrial equipment → emergency power cutoff or speed reduction + +## Why a safety island matters for functional safety + +A safety island provides a dedicated environment for executing critical safety functions.
Its key characteristics include: + +- **Acting as an independent redundant safety layer** + - Operates safety logic independently of the main processor + +- **Supporting the ASIL-D safety level** + - Enables the system to meet the highest ISO 26262 requirements for critical operations + +- **Providing independent fault detection and recovery mechanisms:** + - *Fail-safe*: activating a minimal-risk mode, such as limiting vehicle speed or switching to manual control + - *Fail-operational*: allowing high-integrity systems, such as those in aerospace or autonomous driving, to continue functioning under fault conditions + +Safety islands play a key role in enabling ISO 26262 compliance by isolating safety-critical logic from general-purpose processing. They're a proven solution for improving system determinism, fault tolerance, and fallback behavior. \ No newline at end of file diff --git a/content/learning-paths/automotive/openadkit2_safetyisolation/1e_implement_functional_safety.md b/content/learning-paths/automotive/openadkit2_safetyisolation/1e_implement_functional_safety.md new file mode 100644 index 0000000000..a83767f462 --- /dev/null +++ b/content/learning-paths/automotive/openadkit2_safetyisolation/1e_implement_functional_safety.md @@ -0,0 +1,35 @@ +--- +title: Functional safety for automotive software development +weight: 6 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- + +## The software development lifecycle + +Functional safety affects both hardware and software development, particularly in areas such as requirement changes, version control, and test validation. For example, in ASIL-D level applications, every code change must go through a full impact analysis and regression testing to ensure it doesn't introduce new risks. 
+ +## Software development practices for functional safety + +These practices help ensure that software meets industry standards and can withstand system-level failures: +- **Defining requirements clearly** + - Specifying safety-critical requirements and conducting formal risk assessments. + +- **Following safety-oriented programming standards** + - Using MISRA C or CERT C/C++ and static analysis tools to detect unsafe behavior. + +- **Implementing fault-handling mechanisms** + - Using redundancy, health monitoring, and fail-safe logic to manage faults gracefully. + +- **Testing and verifying rigorously** + - Using Hardware-in-the-Loop (HIL) testing to validate behavior under realistic conditions. + +- **Tracking changes with version control and audits** + - Using tools like Git, JIRA, or Polarion to manage revisions and maintain traceability for audits. + +- **Building an ASIL-partitioned development environment and adopting SOAFEE technologies** to improve software maintainability and support consistent compliance with functional safety standards. + +{{% notice Note %}} +This Learning Path builds on [Deploy Open AD Kit containerized autonomous driving simulation on Arm Neoverse](/learning-paths/automotive/openadkit1_container/). It introduces functional safety practices from the earliest stages of software development.
+{{% /notice %}} diff --git a/content/learning-paths/automotive/openadkit2_safetyisolation/2_data_distribution_service.md b/content/learning-paths/automotive/openadkit2_safetyisolation/2_data_distribution_service.md index 19a36ba606..eabd2f6d9b 100644 --- a/content/learning-paths/automotive/openadkit2_safetyisolation/2_data_distribution_service.md +++ b/content/learning-paths/automotive/openadkit2_safetyisolation/2_data_distribution_service.md @@ -1,22 +1,27 @@ --- -title: How to Use Data Distribution Service (DDS) -weight: 3 +title: How to use Data Distribution Service (DDS) +weight: 7 ### FIXED, DO NOT MODIFY layout: learningpathall --- -### Introduction to DDS +## Introduction to DDS Data Distribution Service (DDS) is a real-time, high-performance middleware designed for distributed systems. It is particularly valuable in automotive software development, including applications such as autonomous driving (AD) and advanced driver assistance systems (ADAS). -DDS offers a decentralized architecture that enables scalable, low-latency, and reliable data exchange, making it ideal for managing high-frequency sensor streams. +DDS uses a decentralized architecture that supports scalable, low-latency, and reliable data exchange, which is ideal for managing high-frequency sensor streams. In modern vehicles, multiple sensors such as LiDAR, radar, and cameras must continuously communicate with compute modules. DDS ensures these components share data seamlessly and in real time, both within the vehicle and across infrastructure such as V2X systems, including traffic lights and road sensors. -### Why Automotive Software Needs DDS +{{% notice Tip %}} +To get started with open-source DDS on Arm platforms, see the [Installation Guide for CycloneDDS](https://learn.arm.com/install-guides/cyclonedds). +{{% /notice %}} + + +## Why automotive software needs DDS Next-generation automotive software architectures, such as SOAFEE, depend on deterministic, distributed communication. 
Traditional client-server models introduce latency and create single points of failure. In contrast, DDS uses a publish-subscribe model that enables direct, peer-to-peer communication across system components. @@ -26,7 +31,7 @@ Additionally, DDS provides a flexible Quality of Service (QoS) configuration, al These capabilities make DDS an essential backbone for autonomous vehicle stacks, where real-time sensor fusion and control coordination are critical for safety and performance. -### DDS Architecture and Operation +## DDS architecture and operation DDS uses a data-centric publish-subscribe (DCPS) model, allowing producers and consumers of data to communicate without direct dependencies. This modular approach enhances system flexibility and maintainability, making it well suited for complex automotive environments. @@ -41,7 +46,7 @@ Each domain contains multiple topics, representing specific data types such as v For example, in an autonomous vehicle, LiDAR, radar, and cameras continuously generate large amounts of sensor data. The perception module subscribes to these sensor topics, processes the data, and then publishes detected objects and road conditions to other components like path planning and motion control. Since DDS automatically handles participant discovery and message distribution, engineers do not need to manually configure communication paths, reducing development complexity. -### Real-World Use in Autonomous Driving +## Real-world use in autonomous driving DDS is widely used in autonomous driving systems, where real-time data exchange is crucial. A typical use case involves high-frequency sensor data transmission and decision-making coordination between vehicle subsystems. @@ -53,7 +58,7 @@ For example, Autoware, an open-source autonomous driving software stack, uses DD The Perception stack publishes detected objects from LiDAR and camera sensors to a shared topic, which is then consumed by the Planning module in real-time. 
Using DDS allows each subsystem to scale independently while preserving low-latency and deterministic communication. -### Publish-Subscribe Model and Data Transmission +## Publish-subscribe model and data transmission Let’s explore how DDS’s publish-subscribe model fundamentally differs from traditional communication methods in terms of scalability, latency, and reliability. @@ -68,11 +73,11 @@ DDS supports multiple transport mechanisms to optimize communication efficiency: * UDP or TCP/IP is used for inter-device communication, such as V2X applications where vehicles exchange safety-critical messages. * Automatic participant discovery eliminates the need for manual configuration, allowing DDS nodes to detect and establish connections dynamically. -#### Comparison of DDS and Traditional Communication Methods +## Compare DDS with traditional communication models The following table highlights how DDS improves upon traditional client-server communication patterns in the context of real-time automotive applications: -| Feature | Traditional Client-Server Architecture | DDS Publish-Subscribe Model | +| Feature | Traditional client-server architecture | DDS publish-subscribe model | |----------------------|--------------------------------------------|--------------------------- | | Data Transmission | Relies on a central server | Direct peer-to-peer communication | | Latency | Higher latency | Low latency | @@ -82,9 +87,9 @@ The following table highlights how DDS improves upon traditional client-server c These features make DDS a highly adaptable solution for automotive software engineers seeking to develop scalable, real-time communication frameworks. -In this section, you learned how DDS enables low-latency, scalable, and fault-tolerant communication for autonomous vehicle systems. +DDS is a critical building block in distributed automotive systems. 
By enabling scalable, low-latency communication and fault tolerance, it powers real-time coordination in modern architectures like ROS 2, SOAFEE, and Autoware. + +Its data-centric publish-subscribe architecture removes the central server, and with it the single point of failure and added latency of traditional client-server models. -Its data-centric publish-subscribe architecture eliminates the limitations of traditional client-server models and forms the backbone of modern automotive software frameworks such as ROS 2 and SOAFEE. -To get started with open-source DDS on Arm platforms, refer to this [installation guide for Cyclonedds](https://learn.arm.com/install-guides/cyclonedds) on how to install open-source DDS on an Arm platform. diff --git a/content/learning-paths/automotive/openadkit2_safetyisolation/3_container_spliting.md b/content/learning-paths/automotive/openadkit2_safetyisolation/3_container_spliting.md index 18e44dc451..b4db0659d8 100644 --- a/content/learning-paths/automotive/openadkit2_safetyisolation/3_container_spliting.md +++ b/content/learning-paths/automotive/openadkit2_safetyisolation/3_container_spliting.md @@ -1,45 +1,41 @@ --- -title: Split into multiple cloud container instances -weight: 4 +title: Deploy OpenAD Kit across multiple cloud instances +weight: 8 ### FIXED, DO NOT MODIFY layout: learningpathall --- -### System Architecture and Component Design +## Refactor OpenAD Kit for distributed deployment -Now that you’ve explored the concept of a Safety Island, a dedicated subsystem responsible for executing safety-critical control logic, and learned how DDS (Data Distribution Service) enables real-time, distributed communication, you’ll refactor the original OpenAD Kit architecture into a multi-instance deployment.
+Now that you’ve explored the concept of a safety island, a dedicated subsystem responsible for executing safety-critical control logic, and learned how DDS (Data Distribution Service) enables real-time, distributed communication, you’ll refactor the original OpenAD Kit architecture into a multi-instance deployment. -In [Deploy Open AD Kit containerized autonomous driving simulation on Arm Neoverse](http://learn.arm.com/learning-paths/automotive/openadkit1_container/), you deployed three container components on a single Arm-based instance, handling: -- Simulation environment +The predecessor Learning Path, [Deploy Open AD Kit containerized autonomous driving simulation on Arm Neoverse](http://learn.arm.com/learning-paths/automotive/openadkit1_container/), showed how to deploy three container components on a single Arm-based instance to handle: +- The simulation environment - Visualization -- Planning-Control +- Planning and control -In this session, you will split the simulation and visualization stack from the planning-control logic and deploy them across two independent Arm-based instances. +In this Learning Path, you will split the simulation and visualization stack from the planning-control logic and deploy them across two independent Arm-based instances. These nodes communicate using ROS 2 with DDS as the middleware layer, ensuring low-latency and fault-tolerant data exchange between components. -### Architectural Benefits +## Architectural benefits This architecture brings several practical benefits: -- Enhanced System Stability: -Decoupling components prevents resource contention and ensures that safety-critical functions remain deterministic and responsive.
-- Real-Time, Scalable Communication: -DDS enables built-in peer discovery and configurable QoS, removing the need for a central broker or manual network setup. +- **Real-Time, Scalable Communication**: DDS enables built-in peer discovery and configurable QoS, removing the need for a central broker or manual network setup. -- Improved Scalability and Performance Tuning: -Each instance can be tuned based on its workload—for example, simulation tasks can use GPU-heavy hardware, while planning logic may benefit from CPU-optimized setups. +- **Improved Scalability and Performance Tuning**: each instance can be tuned based on its workload,for example, simulation tasks can use GPU-heavy hardware, while planning logic might benefit from CPU-optimized setups. -- Support for Modular CI/CD Workflows: -With containerized separation, you can build, test, and deploy each module independently—enabling agile development and faster iteration cycles. +- **Support for Modular CI/CD Workflows**: with containerized separation, you can build, test, and deploy each module independently, which enables agile development and faster iteration cycles. -![img1 alt-text#center](aws_example.jpg "Figure 1: Split instance example in AWS") +![img1 alt-text#center](aws_example.jpg "Split instance example in AWS") -### Networking Setting +## Configure networking for DDS communication -To begin, launch two Arm-based VM instances. AWS EC2 is used, but you can use any Arm instances. +To begin, launch two Arm-based VM instances. AWS EC2 is used, but you can use any Arm-based instances. These instances will independently host your simulation and control workloads. @@ -58,19 +54,19 @@ Within the EC2 Security Group settings: - Add an inbound rule that allows all traffic from the same Security Group by setting the source to the security group itself. - Outbound traffic is typically allowed by default and usually does not require changes. 
-![img2 alt-text#center](security_group.jpg "Figure 2: AWS Security Group Setting") +![img2 alt-text#center](security_group.jpg "AWS Security Group Setting") This configuration allows automatic discovery and peer-to-peer communication between DDS participants across the two instances. Once both systems are operational, record the private IP addresses of each instance. You will need them when configuring CycloneDDS peer discovery in the next step. -### New Docker YAML Configuration Setting +## Update Docker and DDS configuration -Before you begin, ensure that Docker is installed on both of your development instances. Review the [Docker install guide](/install-guides/docker/docker-engine/) if needed. +Before you begin, ensure that Docker is installed on both of your development instances. Review the [Docker Install Guide](/install-guides/docker/docker-engine/) if needed. First, clone the demo repo and create xml file called `cycloneDDS.xml` -#### Step 1: Clone the repository and prepare configuration files +## Clone the repository and prepare configuration files ```bash git clone https://github.com/odincodeshen/openadkit_demo.autoware.git @@ -95,20 +91,20 @@ export COMMON_FILE=/home/ubuntu/openadkit_demo.autoware/docker/etc/simulation/co docker compose -f docker-compose.yml pull ``` -{{% notice info %}} -Each image is around 4–6 GB, so pulling them may vary depending on your network speed. +{{% notice Note %}} +Each image is around 4–6 GB, so the time to download them can vary depending on your network speed. 
{{% /notice %}} -This command will download all images defined in the docker-compose-2ins.yml file, including: +This command downloads all images defined in the docker-compose-2ins.yml file, including: - odinlmshen/autoware-simulator:v1.0 - odinlmshen/autoware-planning-control:v1.0 - odinlmshen/autoware-visualizer:v1.0 -#### Step 2: Configure CycloneDDS for Peer-to-Peer Communication +## Configure CycloneDDS for peer-to-peer communication The cycloneDDS.xml file is used to customize how CycloneDDS (the middleware used by ROS 2) discovers and communicates between distributed nodes. -Please copy the following configuration into docker/cycloneDDS.xml on both machines, and replace the IP addresses with the private IPs of each EC2 instance. +Copy the following configuration into docker/cycloneDDS.xml on both machines, and replace the IP addresses with the private IPs of each EC2 instance: ```xml @@ -128,7 +124,8 @@ Please copy the following configuration into docker/cycloneDDS.xml on both machi 1000 auto - + + @@ -141,19 +138,19 @@ Please copy the following configuration into docker/cycloneDDS.xml on both machi ``` {{% notice Note %}} -1. Make sure the network interface name (ens5) matches the one on your EC2 instances. You can verify this using `ip -br a`. -2. This configuration disables multicast and enables static peer discovery between the two machines using unicast. -3. You can find the more detail about CycloneDDS setting [Configuration](https://cyclonedds.io/docs/cyclonedds/latest/config/config_file_reference.html#cyclonedds-domain-internal-socketreceivebuffersize) +- Make sure the network interface name (for example, `ens5`) matches the one used by your EC2 instances. You can verify the interface name by running `ip -br a`. +- This configuration disables multicast and enables static peer discovery between the two machines using unicast.
+- For more information on CycloneDDS settings, see the [Cyclone DDS Configuration Guide](https://cyclonedds.io/docs/cyclonedds/latest/config/config_file_reference.html#cyclonedds-domain-internal-socketreceivebuffersize). {{% /notice %}} -#### Step 3: Update the Docker Compose Configuration for Multi-Host Deployment +## Update the Docker Compose configuration for multi-host deployment To support running containers across two separate hosts, you’ll need to modify the docker/docker-compose-2ins.yml file. This includes removing inter-container dependencies and updating the network and environment configuration. -##### Remove Cross-Container Dependency +## Remove cross-container dependency -Since the planning-control and simulator containers will now run on different machines, you must remove any depends_on references between them to prevent Docker from attempting to start them on the same host. +Since the planning-control and simulator containers will run on separate machines, remove any `depends_on` references between them. This prevents Docker from treating them as interdependent services on a single host. ```YAML planning-control: @@ -162,9 +159,10 @@ Since the planning-control and simulator containers will now run on different ma # - simulator ``` -##### Enable Host Networking + +## Enable host networking + +All three containers (visualizer, simulator, and planning-control) need access to the host’s network interfaces for DDS-based peer discovery. -All three containers (visualizer, simulator, planning-control) need access to the host’s network interfaces for DDS-based peer discovery.
Replace Docker's default bridge network with host networking: ```YAML @@ -172,7 +170,7 @@ Replace Docker's default bridge network with host networking: network_mode: host ``` -##### Use CycloneDDS Configuration via Environment Variable +## Apply the CycloneDDS configuration using an environment variable To ensure that each container uses your custom DDS configuration, mount the current working directory and set the CYCLONEDDS_URI environment variable: @@ -258,7 +256,7 @@ services: Before moving to the next step, make sure that `docker-compose-2ins.yml` and `cycloneDDS.xml` are already present on both instances. -#### Step 4: Optimize Network Settings for DDS Communication +## Optimize network settings for DDS communication In a distributed DDS setup, `high-frequency UDP traffic` between nodes may lead to `IP packet fragmentation` or `buffer overflows`, especially under load. These issues can degrade performance or cause unexpected system behavior. @@ -270,10 +268,10 @@ sudo sysctl net.ipv4.ipfrag_high_thresh=134217728 sudo sysctl -w net.core.rmem_max=2147483647 ``` -Explanation of Parameters -- `net.ipv4.ipfrag_time=3`: Reduces the timeout for holding incomplete IP fragments, helping free up memory more quickly. -- `net.ipv4.ipfrag_high_thresh=134217728`: Increases the memory threshold for IP fragment buffers to 128 MB, preventing early drops under high load. -- `net.core.rmem_max=2147483647`: Expands the maximum socket receive buffer size to support high-throughput DDS traffic. +These parameters have the following effects: +- `net.ipv4.ipfrag_time=3`: reduces the timeout for holding incomplete IP fragments, helping free up memory more quickly. +- `net.ipv4.ipfrag_high_thresh=134217728`: increases the memory threshold for IP fragment buffers to 128 MB, preventing early drops under high load. +- `net.core.rmem_max=2147483647`: expands the maximum socket receive buffer size to support high-throughput DDS traffic.
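To check that the values took effect, you can read each parameter back with `sysctl -n`. The helper below is a minimal, report-only sketch: the function name `report_dds_sysctl` is hypothetical, and the target values simply mirror the `sysctl` commands in this step. It prints `unavailable` if a key cannot be read, which can happen inside some containers.

```bash
#!/usr/bin/env bash
# Hypothetical, report-only helper: prints each tuned kernel parameter next to
# the target value used in this step. It does not change any settings.
report_dds_sysctl() {
  local pair key target current
  for pair in \
    "net.ipv4.ipfrag_time=3" \
    "net.ipv4.ipfrag_high_thresh=134217728" \
    "net.core.rmem_max=2147483647"; do
    key=${pair%%=*}     # text before the first '='
    target=${pair#*=}   # text after the first '='
    # sysctl -n prints only the value; fall back if the key cannot be read
    current=$(sysctl -n "$key" 2>/dev/null) || current="unavailable"
    printf '%s current=%s target=%s\n' "$key" "$current" "$target"
  done
}

report_dds_sysctl
```

Keeping the check report-only avoids encoding per-key comparison rules: `ipfrag_time` is meant to be low, while the two buffer limits are meant to be high.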
To ensure these settings persist after reboot, create a configuration file under /etc/sysctl.d/: @@ -295,18 +293,19 @@ Links to documentation: - [ROS2 documentation](https://docs.ros.org/en/humble/How-To-Guides/DDS-tuning.html#cyclone-dds-tuning) -#### Step 5: Verifying Cross-Instance DDS Communication with ROS 2 +## Verify DDS communication between instances using ROS 2 -To confirm that ROS 2 nodes can exchange messages across two separate EC2 instances using DDS, this test will walk you through a minimal publisher–subscriber setup using a custom topic. +To confirm that ROS 2 nodes can exchange messages across two separate EC2 instances using DDS, set up a minimal publisher-subscriber test with a custom topic. -##### On Planning-Control Node (Publisher) +## On the planning-control node (publisher) On the first EC2 instance, you will publish a custom message to the /hello topic using ROS 2. This will simulate outbound DDS traffic from the planning-control container. -Set the required environment variables and launch the planning-control container.
+Set the required environment variables and launch the planning-control container: ```bash +# Replace this path with the location where you cloned the repository, if different export SCRIPT_DIR=/home/ubuntu/openadkit_demo.autoware/docker export CONF_FILE=$SCRIPT_DIR/etc/simulation/config/fail_static_obstacle_avoidance.param.yaml export COMMON_FILE=$SCRIPT_DIR/etc/simulation/config/common.param.yaml @@ -317,7 +316,7 @@ export TIMEOUT=300 docker compose -f docker-compose-2ins.yml run --rm planning-control bash ``` -Once inside the container shell, activate the ROS 2 environment and start publishing to the /hello topic: +Once inside the container shell, activate the ROS 2 environment and start publishing to the /hello topic: ```bash # Inside the container: @@ -326,7 +325,7 @@ ros2 topic list ros2 topic pub /hello std_msgs/String "data: Hello From Planning" --rate 1 ``` -##### On Simulator Node (Subscriber) side +## On the simulator node (subscriber) On the second EC2 instance, you will listen for the /hello topic using ros2 topic echo. This confirms that DDS communication from the planning node is received on the simulation node. @@ -363,6 +362,11 @@ ros2 topic echo /hello ``` In the simulator container, you should see repeated outputs like: + +{{% notice tip %}} +If the subscriber shows a continuous stream of `/hello` messages, DDS discovery and ROS 2 communication between the nodes are working as expected.
+{{% /notice %}} + ``` root@ip-172-31-19-5:/autoware# ros2 topic echo /hello data: Hello From Planning diff --git a/content/learning-paths/automotive/openadkit2_safetyisolation/4_multiinstance_executing.md b/content/learning-paths/automotive/openadkit2_safetyisolation/4_multiinstance_executing.md index def332f4e9..142aa3e88f 100644 --- a/content/learning-paths/automotive/openadkit2_safetyisolation/4_multiinstance_executing.md +++ b/content/learning-paths/automotive/openadkit2_safetyisolation/4_multiinstance_executing.md @@ -1,30 +1,29 @@ --- -title: Executing OpenAD Kit in a Distributed ROS 2 Instances +title: Run OpenAD Kit across distributed ROS 2 instances -weight: 5 +weight: 9 ### FIXED, DO NOT MODIFY layout: learningpathall --- -### Demonstrating the Distributed OpenAD Kit in Action +## What you'll learn in this section -In this section, you’ll bring all the previous setup together and execute the full OpenAD Kit demo across two Arm-based instances. +The OpenAD Kit is an open-source reference design for autonomous driving workloads on Arm. It demonstrates how Autoware modules can run on scalable infrastructure, whether on a single machine or distributed across multiple compute nodes using ROS 2 and DDS. -OpenAD Kit is an open-source reference design for autonomous driving workloads on Arm. -It demonstrates how Autoware modules can be deployed on scalable infrastructure, whether on a single machine or split across multiple compute nodes. +In this section, you'll run the full OpenAD Kit demo across two Arm-based cloud instances using the setup from previous steps. -#### Preparing the Execution Scripts +## Set up launch scripts on both instances -This setup separates the simulation/visualization environment from the planning-control logic, allowing you to explore how ROS 2 nodes communicate over a distributed system using DDS (Data Distribution Service). 
+This setup separates the simulation and visualization environment from the planning and control logic, allowing you to explore how ROS 2 nodes communicate over a distributed system using DDS (Data Distribution Service). -To start the system, you need to configure and run separate launch commands on each machine. +To start the system, run separate launch scripts on each machine. On each instance, copy the appropriate launch script into the `openadkit_demo.autoware/docker` directory. {{< tabpane code=true >}} {{< tab header="Planning-Control" language="bash">}} - !/bin/bash + #!/bin/bash # Configure the environment variables export SCRIPT_DIR=/home/ubuntu/openadkit_demo.autoware/docker export CONF_FILE_PASS=$SCRIPT_DIR/etc/simulation/config/pass_static_obstacle_avoidance.param.yaml @@ -40,7 +39,7 @@ On each instance, copy the appropriate launch script into the `openadkit_demo.au TIMEOUT=120 CONF_FILE=$CONF_FILE_PASS docker compose -f "$SCRIPT_DIR/docker-compose-2ins.yml" up planning-control -d {{< /tab >}} - {{< tab header="Visualizer & Simulator" language="bash">}} + {{< tab header="Visualizer and simulator" language="bash">}} #!/bin/bash export SCRIPT_DIR=/home/ubuntu/openadkit_demo.autoware/docker @@ -67,11 +66,11 @@ On each instance, copy the appropriate launch script into the `openadkit_demo.au {{< /tab >}} {{< /tabpane >}} -You can also find the prepared launch scripts `opad_planning.sh` and `opad_sim_vis.sh` inside the `openadkit_demo.autoware/docker` directory on both instances. +You can also find these scripts, `opad_planning.sh` and `opad_sim_vis.sh`, in the `openadkit_demo.autoware/docker` directory on both instances. These scripts encapsulate the required environment variables and container commands for each role.
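Before running either launch script, you might want to confirm that the files it depends on are in place. The launch scripts pass `$SCRIPT_DIR/docker-compose-2ins.yml` to Docker Compose, and the DDS setup expects `cycloneDDS.xml` in the same directory. The helper below is a hypothetical sketch, not part of the demo repository, and its default path assumes the repository was cloned to `/home/ubuntu`.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helper (not part of the demo repository).
# Returns 0 if both required files exist in the given directory, 1 otherwise.
# The default directory assumes the repository was cloned to /home/ubuntu.
check_opad_files() {
  local dir=${1:-/home/ubuntu/openadkit_demo.autoware/docker}
  local missing=0 f
  for f in docker-compose-2ins.yml cycloneDDS.xml; do
    if [[ ! -f "$dir/$f" ]]; then
      echo "missing: $dir/$f" >&2   # report each absent file on stderr
      missing=1
    fi
  done
  return "$missing"
}
```

For example, from the `docker` directory you could run `check_opad_files "$PWD" && ./opad_planning.sh` so that the launch script only starts when both files are present.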
-#### Running the Distributed OpenAD Kit Demo +## Run the distributed OpenAD Kit demo On the Planning-Control node, execute: @@ -85,18 +84,16 @@ On the Simulation and Visualization node, execute: ./opad_sim_vis.sh ``` +Once both machines are running their launch scripts, the Visualizer container exposes a web-accessible interface at `http://<visualizer-public-ip>:6080/vnc.html`, where `<visualizer-public-ip>` is the public IP address of the simulation and visualization instance. -Once both machines are running their respective launch scripts, the Visualizer will generate a web-accessible interface on: - -http://[Visualizer public IP address]:6080/vnc.html +Open this link in your browser to observe the simulation in real time. The demo closely resembles the output in the [previous Learning Path, Deploy Open AD Kit containerized autonomous driving simulation on Arm Neoverse](http://learn.arm.com/learning-paths/automotive/openadkit1_container/4_run_openadkit/). -You can open this link in a browser to observe the demo behavior, which will closely resemble the output from the [previous learning path](http://learn.arm.com/learning-paths/automotive/openadkit1_container/4_run_openadkit/). +![Distributed OpenAD Kit simulation running on two Arm-based instances with visualizer and simulator coordination over DDS alt-text#center](split_aws_run.gif "Visualizer output from a distributed OpenAD Kit simulation showing ROS 2 modules running across two cloud instances using DDS communication.") -![img3 alt-text#center](split_aws_run.gif "Figure 4: Simulation") +You’ve now run the OpenAD Kit across two nodes with separated control and visualization roles. DDS enabled real-time, peer-to-peer communication between the ROS 2 nodes, supporting synchronized behavior across the planning and simulation components deployed on two separate instances.
+The containers are now distributed across two separate instances, enabling real-time, cross-node communication. Behind the scenes, this architecture demonstrates how DDS manages low-latency, peer-to-peer data exchange in a distributed ROS 2 environment. -To support demonstration and validation, the simulator is configured to run three times sequentially, giving you multiple opportunities to observe how data flows between nodes and verify that communication remains stable across each cycle. +The simulator runs three times by default, giving you multiple chances to observe data flow and verify stable communication between nodes. -Now that you’ve seen the distributed system in action, consider exploring different QoS settings, network conditions, or even adding a third node to expand the architecture further. +Now that you’ve seen the distributed system in action, try modifying QoS settings, simulating network conditions, or scaling to a third node to explore more complex configurations. diff --git a/content/learning-paths/automotive/openadkit2_safetyisolation/_index.md b/content/learning-paths/automotive/openadkit2_safetyisolation/_index.md index 4319e4824a..55245d22ff 100644 --- a/content/learning-paths/automotive/openadkit2_safetyisolation/_index.md +++ b/content/learning-paths/automotive/openadkit2_safetyisolation/_index.md @@ -1,23 +1,19 @@ --- -title: Prototyping Safety-Critical Isolation for Autonomous Application on Neoverse - -draft: true -cascade: - draft: true +title: Prototype safety-critical isolation for autonomous driving systems on Neoverse minutes_to_complete: 60 -who_is_this_for: This Learning Path targets advanced automotive software engineers developing safety-critical systems. It demonstrates how to use Arm Neoverse cloud infrastructure to accelerate ISO-26262 compliant software prototyping and testing workflows. +who_is_this_for: This Learning Path is for automotive engineers developing safety-critical systems. 
You'll learn how to accelerate ISO 26262-compliant development workflows using Arm-based cloud compute, containerized simulation, and DDS-based communication. learning_objectives: - - Learn the Functional Safety principles—including risk prevention, fault detection, and ASIL compliance—to design robust and certifiable automotive software systems. - - Understand how DDS enables low-latency, scalable, and fault-tolerant data communication for autonomous driving systems using a publish-subscribe architecture. - - Distributed Development for Functional Safety. Learn how to split the simulation platform into two independent units and leverage distributed development architecture to ensure functional safety. + - Apply functional safety principles, including risk prevention, fault detection, and ASIL compliance, to build robust, certifiable automotive systems + - Use DDS and a publish-subscribe architecture for low-latency, scalable, and fault-tolerant communication in autonomous driving systems + - Implement distributed development by separating the simulation platform into independent, safety-isolated components prerequisites: - - Two Arm-based Neoverse cloud instances or a local Arm Neoverse Linux computer with at least 16 CPUs and 32GB of RAM. - - To have completed [Deploy Open AD Kit containerized autonomous driving simulation on Arm Neoverse](/learning-paths/automotive/openadkit1_container/). - - Basic knowledge of using Docker. + - Access to two Arm-based Neoverse cloud instances, or a local Arm Neoverse Linux system with at least 16 CPUs and 32 GB of RAM + - Completion of the [Deploy Open AD Kit containerized autonomous driving simulation on Arm Neoverse](/learning-paths/automotive/openadkit1_container/) Learning Path + - Basic familiarity with Docker author: - Odin Shen @@ -32,6 +28,7 @@ tools_software_languages: - Python - Docker - ROS2 + - DDS operatingsystems: - Linux