# Safety-critical Control in Mixed Criticality Embedded Systems

Master Thesis Royal Institute of Technology Stockholm, Sweden

 $\begin{array}{c} {\rm Emil\ Hjelm} \\ emilhje@kth.se \end{array}$ 

February 14, 2017



#### Master Thesis MMK2017:Z MDAZZZ

#### Safety-critical Control in Mixed Criticality Embedded Systems

#### Emil Hjelm

| Approved: | Examiner:       | Supervisor:    |
|-----------|-----------------|----------------|
| (datum)   | Martin Törngren | Bengt Eriksson |
|           | Uppdragsgivare: | Kontaktperson: |
|           | Alten           | Detlef Scholle |

#### Abstract

Modern automotive systems contain a large number of Electronic Control Units, each controlling a specific system of a specific criticality level. To increase computational efficiency it is desired to combine multiple applications into fewer ECUs, this leads to mixed criticality embedded systems. The assurance of safety critical applications not being affected by non-critical applications on the same system is crucial.



#### Examensarbete MMK2017:Z MDAZZZ

# Säkerhetskritisk kontroll i blandkritiska inbyggda system

#### Emil Hjelm

| Godkänt: | Examinator:     | Handledare:    |
|----------|-----------------|----------------|
| (datum)  | Martin Törngren | Bengt Eriksson |
|          | Uppdragsgivare: | Kontaktperson: |
|          | Alten           | Detlef Scholle |

### Sammanfattning

Denna del kommer att inehålla en sammanfattning av arbetet på svenska.

## Preface

Credit where credit is due.

Emil Hjelm Stockholm

## Contents

| P            | refac | e iii                                                                                                                                                        |
|--------------|-------|--------------------------------------------------------------------------------------------------------------------------------------------------------------|
| $\mathbf{A}$ | bbre  | viations                                                                                                                                                     |
| 1            | Intr  | roduction 1                                                                                                                                                  |
|              | 1.1   | Background                                                                                                                                                   |
|              |       | 1.1.1 Definition of safety-critical systems                                                                                                                  |
|              |       | 1.1.2 Different levels of criticality                                                                                                                        |
|              |       | 1.1.3 EMC <sup>2</sup> development board $\dots \dots \dots$ |
|              |       | 1.1.4 Platooning                                                                                                                                             |
|              | 1.2   | Problem statement                                                                                                                                            |
|              | 1.3   | Purpose                                                                                                                                                      |
|              | 1.4   | Goals                                                                                                                                                        |
|              |       | 1.4.1 Team goal                                                                                                                                              |
|              |       | 1.4.2 Individual goal                                                                                                                                        |
|              |       | 1.4.3 Scope                                                                                                                                                  |
|              | 1.5   | Research design                                                                                                                                              |
|              | 1.6   | Ethical considerations                                                                                                                                       |
| 2 St         | Sta   | te of the art                                                                                                                                                |
|              | 2.1   | Mixed criticality systems                                                                                                                                    |
|              |       | 2.1.1 Economical benefits of MCS                                                                                                                             |
|              |       | 2.1.2 Sharing processor                                                                                                                                      |
|              |       | 2.1.3 Different criticality on different processors 8                                                                                                        |
|              |       | 2.1.4 Sharing memory                                                                                                                                         |
|              | 2.2   | Current system                                                                                                                                               |
|              |       | 2.2.1 TrustZone                                                                                                                                              |
|              | 2.3   | Standards                                                                                                                                                    |
|              |       | 2.3.1 AUTOSAR                                                                                                                                                |
|              |       | 2.3.2 ASIL 10                                                                                                                                                |

| 3 | System design                        | 11 |
|---|--------------------------------------|----|
| 4 | Implementation                       | 12 |
| 5 | Results                              | 13 |
| 6 | Discussion                           | 14 |
| 7 | Future work 7.1 Using virtualization |    |

# List of Figures

| 2.1 | Flowchart of the boot sequence of the CPU. [16] | 8 |
|-----|-------------------------------------------------|---|
| 2.2 | System overview of the MCS in place. [16]       | 9 |

## List of Tables

### Abbreviations

#### Abbreviation Description ECU Electronic Control Unit MCS Mixed Criticality System $EMC^2$ Embedded Multi-Core systems for Mixed Criticality applications in dynamic and changeable real-time environments RTOS Real-Time Operating System GPOS General Purpose Operating System FPGA Field Programmable Gate Array SILSafety Integrity Level ASIL Automotive Safety Integrity Level Development Assurance Level DALVMMVirtual Machine Monitor $\mathrm{EMC^2DP}$ EMC<sup>2</sup> Development Platform Rate Monotonic RM

### Introduction

This chapter will introduce the subject of mixed criticality embedded systems and the EU project "EMC<sup>2</sup>" to the reader.

### 1.1 Background

Today, modern automotive systems contain around 70-100 Electric Control Units (ECU)s, around 100 electric motors and 2 km of wiring [10]. Each ECU controls a subsystem of a specific criticality level such as safety-critical anti-lock brake system, or non-critical entertainment systems [13]. Having the ECUs isolated ensures that the numerous critical and non-critical applications do not interfere with each other, thus it is a simple task to certify an individual ECU. However, this approach leads to an inefficient use of system resources and expensive system implementation [3]. In order to lower the cost of the collective system and increase system efficiency (utilization), applications of different criticality levels can be integrated into a single multicore platform, leading to a Mixed Criticality System (MCS). However, this approach increases system complexity, and hinders the certification of safety-critical systems [16]. In order to facilitate the design, test, and certification of such systems, spatial and temporal partitioning can be used in the architecture of the system as described by [16].

Protecting the integrity of a component from the faults of another is desired in all systems hosting multiple applications. However, it is of higher significance if the different applications have different criticality levels. Without such protection all components on the same system would need to be engineered to the standards of the highest criticality level, potentially massively increasing development costs [3].

The EU project "Embedded Multi-Core systems for Mixed Criticality applications in dynamic and changeable real-time environments" (EMC<sup>2</sup>) was founded in order to "find solutions for dynamic adaptability in open systems, provide handling of mixed criticality applications under real-time conditions, scalability and utmost flexibility, full scale deployment and management of integrated tool chains, through the entire lifecycle" [13].

#### 1.1.1 Definition of safety-critical systems

The term "safety-critical system" has many definitions, most quite similar. Most definitions relate to systems with the potential to harm humans if the system malfunctions. According to [7] it is defined as "A system in which any failure or design error has the potential to lead to loss of life." Further, [5] defines safety-critical systems as "A computer, electronic or electromechanical system whose failure may cause injury or death to human beings." A Wikipedia article, [15], defines a safety-critical system") as a system whose failure or malfunction may result in one (or more) of the following outcomes:

- death or serious injury to people
- loss or severe damage to equipment/property
- environmental harm

In this thesis, a safety-critical system will be defined as "a system whose failure may cause injury or death to human beings."

#### 1.1.2 Different levels of criticality

Different names of levels of criticality are typically Safety Integrity Level (SIL), Automotive Safety Integrity Level (ASIL) and Development Assurance Level (DAL). The IEC 61508 standard [8] defines four different levels and the ISO 26262 standard [9] and the DO-178C standard [6] define five different levels each. These levels range from low to no hazard up to life-threatening or fatal in the event of a malfunction requiring the highest level of assurance that the dependent safety goals are sufficient and have been achieved.

In this report the number of criticality levels will be restricted to two: "safety-critical" and "non-critical". This is due to the constraints presented in section 1.1.3 and 1.4.3.

#### 1.1.3 EMC<sup>2</sup> development board

As a part of the EMC<sup>2</sup> project, Alten has developed a system for handling applications of mixed criticality on the same piece of hardware. The MCS developed at Alten is implemented on a Xilinx Zynq-7000. The development board is called EMC<sup>2</sup> Development Board, or EMC<sup>2</sup>DB. It employs two operating systems to handle applications of different criticality. A General Purpose Operating System (GPOS) for non-critical applications and a Real-Time Operating System (RTOS) for safety-critical applications. A Virtual Machine Monitor (VMM) is used to alternate between the two.

Peripherals connected to the board are separated between safety-critical and non-critical via ARM TrustZone [1].

The board also has a Field Programmable Gate Array (FPGA).

For more detailed information, see the report by [16].

#### 1.1.4 Platooning

"The platooning concept can be defined as a collection of vehicles that travel together, actively coordinated in formation. Some expected advantages of platooning include increased fuel and traffic efficiency, safety and driver comfort" [2].

#### 1.2 Problem statement

An ideal MCS ensures partitioning between different criticality levels while still sharing resources efficiently.

The MCS developed at Alten (EMC<sup>2</sup>DP) 1.1.3 switches Operative System (OS) to enable partitioning between safety-critical and non-critical applications, which takes about  $2\mu s$ . This mode switch introduces additional deadlines which makes processor scheduling more difficult.

To evaluate the performance of the system, a distance keeping control algorithm for platooning will be implemented on it. A demonstrator will be constructed in the form of a RC car capable of following a vehicle in front of it at a specified distance. If the lead car exceeds a predefined maximum speed or deviates from the road, the following car should not exceed the maximum speed. It should be verified that no matter the computational load and eventual crashes of the Linux based non-critical system, the distance keeping algorithm on the RTOS never crashes. It should also be investigated at how high frequencies the control algorithm can operate while still maintaining

functionality on the GPOS.

This problem leads to the research question:

• How well can a safety-critical control system perform when implemented on a mixed criticality system using virtualization?

#### alternatively:

• "How, in a disciplined way, to reconcile the conflicting requirements of partitioning for safety assurance and sharing for efficient resource usage?" [3]

#### alternatively:

• Is virtualization an efficient approach when trying to reconcile the conflicting requirements of partitioning for safety assurance and sharing for efficient resource usage when implementing a safety-critical control system?

### 1.3 Purpose

Reducing the amount of computers in automotive systems would have many effects. Manufacturing costs would decrease and with fewer physical components maintenance costs would also decrease. However, the system complexity would increase and thereby increasing time and cost to design the system. To combat this the EMC<sup>2</sup> project aims at creating platforms for easier development of MCS.

The EMC<sup>2</sup> project lists several goals [14]:

- Reduce the cost of the system design by 15%
- Reduce the effort and time required for re-validation and re-certification of systems after making changes by 15%
- Manage a complexity increase of 25% with 10% effort reduction
- Achieve cross-sectorial reusability of Embedded Systems devices and architecture platforms that will be developed using the ARTEMIS JU results.

#### 1.4 Goals

In this project there are both team goals and individual goals that do not always necessarily align with each other.

#### 1.4.1 Team goal

The team consists of five master thesis students. The students areas of work are: control theory and system modeling, data aggregation, safety-critical communication in MCS, lane detection and finally safety-critical control in MCS. Together the team will build a vehicle capable of following a vehicle ahead of it while keeping inside road markers.

#### 1.4.2 Individual goal

Verify quantitatively the performance of safety-critical distance keeping controller, see 1.5 Solve the problems described in 1.2.

#### 1.4.3 Scope

The work of this thesis and the implementation on the demonstrator will build upon the work of [16].

The embedded computer is constrained to the Xilinx Zynq-7000 SoC <sup>1</sup>.

The thesis is produced at Alten AB.

Scheduling on the processor will be restricted to Rate Monotonic (RM).

### 1.5 Research design

The plan is to make an confirmatory investigation using qualitative data/operations. This is a mixed methods approach where qualitative data is gathered during experiments and from simulations of the environment [4]. Gather data from simulations and from real tests, iterated many times.

The position of the demonstrator will be read by a separate sensor of the same type as the one on the demonstrator. The performance of the control system and the embedded controller will be measured and compared with the same system without any non-critical computational load. This will also be done for a simulation of the system. The measures regarding control system performance will consist of

- Response time
- Overshoot
- Settling time

<sup>1</sup>https://www.xilinx.com/products/silicon-devices/soc/zynq-7000.html

Datapoints for the performance of the embedded controller will be extracted from the RTOS, and the measures will consist of

- Missed deadlines
- CPU utilization

#### 1.6 Ethical considerations

When designing a MCS it is crucial to ensure that errors made by a lower criticality application cannot propagate to higher criticality applications. This could have catastrophic consequences. Because of this the requirement of partitioning must have higher priority than the need of sharing.

### State of the art

This chapter will go through relevant articles and already known knowledge on the subject of mixed criticality systems. It will also explain the EMC2DP.

### 2.1 Mixed criticality systems

A MCS is achieved by letting applications of different criticality share resources. These resources could be computational power in the CPU, memory, peripherals, input/output ports. As of July 2016, the most explored area is sharing the CPU between multiple criticality levels [3].

Potential benefits with pursuing MCS as opposed to distributed systems are reduced physical space required, reduced weight, reduced heat generation, reduced power consumption and reduced production costs [3].

#### 2.1.1 Economical benefits of MCS

#### 2.1.2 Sharing processor

Vestal, Burns.

1. (audsley) 2. EDF-VDL

For a more complete review of work done on MCS with a shared processor, see the paper by [3].

#### 2.1.3 Different criticality on different processors

#### 2.1.4 Sharing memory

### 2.2 Current system

The EMC2DP uses two operating systems, a RTOS for safety-critical applications and a GPOS for non-critical applications. A Virtual Machine Monitor (VMM) or "Hypervisor" is used to alternate between safety-critical RTOS and non-critical GPOS. The RTOS is TOPPERS FMP kernel [11], and the GPOS is a custom modified Linux distribution. The VMM used is SafeG [12]. It switches processor state via a hardware switch. See figure 2.1.



Figure 2.1: Flowchart of the boot sequence of the CPU. [16]

A Field Programmable Gate Array (FPGA) as interface between processor and board. Peripherals are separated as secure and non-secure using ARM TrustZone [1], see section 2.2.1.

An overview of the system can be seen in Figure 2.2.



Figure 2.2: System overview of the MCS in place. [16]

#### 2.2.1 TrustZone

"TrustZone is a security extension available in modern ARM processors that creates a security infrastructure designers can use to protect critical system assets. This infrastructure is achieved by enabling the partition of system components, both hardware and software, into either a Secure and a Normal

world (or zone). Resources that are marked as normal are not permitted to access Secure zone components. This mechanism is enforced by the AMBA3 (Advanced Microcontroller Bus Architecture) AXI (Advanced eXtensible Interface) bus system. It contains an extra control signal for each of the read and write channels (Non-secure or NS bits) that dictate the access rights Non-secure bus masters to the Secure slaves. Each processor with an enabled TrustZone security extension can be partitioned into a Normal and a Secure virtual CPU. The virtual processors execute in a time-multiplexed fashion, and use the "Monitor Mode" state to create an efficiently switching mechanism between Normal and Secure zones. The NS-bit, bit[0] of the Secure Configuration Register (SCR) in the System Control Coprocessor (CP15), controls the activation of secure state of the processor. Whenever the NS-bit set high, the processor state immediately switches to the Normal world. However, if the processor is in Monitor Mode, it remains in the Secure world regardless of the state of the SCR NS-bit. The processor state can nter monitor mode by either issuing a special instruction, SMC (Secure Monitor Call), in software or by hardware exception mechanisms such as IRQ, FIQ, external Data Abort, or external Prefetch Abort. In general, the software running in monitor mode serves the purpose of saving the state of the current world, and loading the state the other." [1]

- 2.3 Standards
- 2.3.1 AUTOSAR
- 2.3.2 ASIL

# System design

This chapter will derive the design of the controller to be implemented.

## Implementation

This chapter will describe the implementation of the control system in the demonstrator.

## Results

This chapter will present results from the demonstrator to the reader.

## Discussion

Discussion about the results produced by the thesis.

### Future work

This chapter will contain thoughts and ideas for future work building on this thesis.

### 7.1 Using virtualization

Facilitate for more than two different criticality levels. Examine different scheduling methods.

### 7.2 Using other means of partitioning

Examine limitations for other configurations of MCS, for example different CPUs for different criticality levels.

## **Bibliography**

- [1] ARM Ltd. ARM TrustZone, February 2017. https://www.arm.com/products/security-on-arm/trustzone.
- [2] Carl Bergenhem, Henrik Pettersson, Erik Coelingh, Cristofer Englund, Steven Shladover, and Sadayuki Tsugawa. Overview of platooning systems. In *Proceedings of the 19th ITS World Congress*, Vienna, Austria, October 2012.
- [3] Alan Burns and Robert I. Davis. Mixed criticality systems a review. Available online: https://www-users.cs.york.ac.uk/burns/review.pdf, July 2016.
- [4] John W. Creswell. Research design: Qualitative, quantitative, and mixed methods approaches. SAGE Publications, Inc., 2009.
- [5] Dictionary.com. safety-critical system, January 2017. http://www.dictionary.com/browse/safety-critical-system.
- [6] Software considerations in airborne systems and equipment certification. Standard, Radio Technical Commission for Aeronautics, Washington, DC, USA, December 2011.
- [7] Encyclopedia.com. safety-critical system, January 2017. http://www.encyclopedia.com/computing/dictionaries-thesauruses-pictures-and-press-releases/safety-critical-system.
- [8] Functional safety of electrical/electronic/programmable electronic safety-related systems parts 1 to 7. Standard, International Electrotechnical Commission, Geneva, CH, April 2010.
- [9] Road vehicles functional safety part 9: Automotive safety integrity level (asil)-oriented and safety-oriented analyses. Standard, International Organization for Standardization, Geneva, CH, November 2011.

- [10] H. Thompson. Mixed criticality systems, February 2012.
- [11] TOPPERS Project, Inc. Introduction to the TOPPERS/FMP kernel, February 2017. https://www.toppers.jp/en/fmp-kernel.html.
- [12] TOPPERS Project, Inc. TOPPERS SafeG, February 2017. https://www.toppers.jp/en/safeg.html.
- [13] W. Weber, A. Hoess, F. Oppenheimer, B. Koppenhöefer, B. Vissers, and B. Nordmoen. EMC2 a Platform Project on Embedded Microcontrollers in Applications of Mobility, Industry and the Internet of Things. In 2015 Euromicro Conference on Digital System Design, pages 125–130, August 2015.
- [14] Werner Weber. A Platform Project on Embedded Microcontrollers in Applications of Mobility, Industry and the Internet of Things, Mars 2015. https://artemis-ia.eu/publication/download/1131.pdf.
- [15] Wikipedia.com. Life-critical system, January 2017. https://en.wikipedia.org/wiki/Life-critical\_system.
- [16] Youssef Zaki. An embedded multi-core platform for mixed-criticality systems: Study and analysis of virtualization techniques. Master's thesis, KTH, School of Information and Communication Technology (ICT), 2016.