

### Universidade Estadual de Campinas Instituto de Computação



## Augusto Fernandes Ribas Queiroz

Secure code execution using PUF authentication

Execução segura de códigos utilizando PUF para autenticação


Thesis presented to the Institute of Computing of the University of Campinas in partial fulfillment of the requirements for the degree of Master in Computer Science.

Supervisor: Prof. Dr. Guido Costa Souza de Araújo

Co-supervisor: Prof. Dr. Mário Lúcio Côrtes

This copy corresponds to the final version of the Dissertation defended by Augusto Fernandes Ribas Queiroz and supervised by Prof. Dr. Guido Costa Souza de Araújo. In the final version, this page will be replaced by the catalog record.

According to the CCPG standard: "For Theses and Dissertations funded by funding agencies, the beneficiaries must acknowledge the support received and include this information in the catalog record, together with the name of the agency and the number of the grant process under which the support was received."

and

"if the doctoral thesis is carried out under joint supervision (Cotutela), this fact must be stated in the catalog record, together with the partner University, the country, and the name of the supervisor."



#### Examining Committee:

- Prof. Dr. Someone, Universidade Estadual de Campinas
- Prof. Dr. Someone, Universidade Federal de Santa Catarina
- Profa. Dra. Someone, Universidade Estadual de Campinas

The defense minutes, signed by the committee members, are on file in the student's academic record.

Campinas, March 28, 2019

## Dedication

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus vitae iaculis erat. Aliquam tristique consectetur ante, quis commodo lacus egestas in. Nullam semper elit nec eros pretium

## Resumo (Portuguese Abstract)

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus vitae iaculis erat. Aliquam tristique consectetur ante, quis commodo lacus egestas in. Nullam semper elit nec eros pretium dapibus. Ut eget porta metus. Mauris rhoncus vel magna non faucibus. Ut a ornare elit. Morbi sagittis quam nec risus laoreet, tincidunt volutpat ex venenatis. Sed ultrices felis quis felis scelerisque gravida a non neque. Etiam sed nisi neque. Ut lobortis pulvinar facilisis.

## Abstract

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus vitae iaculis erat. Aliquam tristique consectetur ante, quis commodo lacus egestas in. Nullam semper elit nec eros pretium dapibus. Ut eget porta metus. Mauris rhoncus vel magna non faucibus. Ut a ornare elit. Morbi sagittis quam nec risus laoreet, tincidunt volutpat ex venenatis. Sed ultrices felis quis felis scelerisque gravida a non neque. Etiam sed nisi neque. Ut lobortis pulvinar facilisis.

## List of Figures

| Figure | Caption |
|--------|---------|
| 4.1 | A system overview of the CSHIA system |
| 4.2 | The PTAG-GEN during PTAG Generation (write) and PTAG Verification (read) operations |
| 5.1 | The prototype architecture |
| 6.1 | GRMON Interface |
| 6.2 | C code for the sha initialization function |
| 6.3 | SPARC assembly code where the constant $0x67452301$ is assigned to a global register |

## List of Tables

| Table | Caption |
|-------|---------|
| 3.1 | Summary of Related Works in comparison with CSHIA |

## Contents

| Chapter | Section |
|---------|---------|
| 1 | Introduction |
|   | 1.1 Contributions |
|   | 1.2 Publications |
|   | 1.3 Organization of the dissertation |
| 2 | Fundamental concepts |
|   | 2.1 Physical Unclonable Functions - PUFs |
|   | 2.2 Security Properties |
|   | 2.2.1 Authenticity |
|   | 2.2.2 Integrity |
|   | 2.2.3 Secrecy |
| 3 | Related Work |
| 4 | CSHIA Architecture |
|   | 4.0.1 PTAG-GEN Operation |
| 5 | CSHIA Prototype |
|   | 5.0.1 Prototype Configuration |
| 6 | CSHIA Evaluation |
|   | 6.0.1 GRMON Debug Monitor |
|   | 6.0.2 Performance Analysis |
|   | 6.0.3 Fault Injection Attack |
| 7 | Conclusion and Future Work |
|   | 7.1 Conclusion |
|   | 7.2 Future Work |

## Introduction

The demand for code and data integrity and authenticity has steadily increased. The wide spectrum of known attacks threatens a variety of embedded systems that need constant protection against tampering. One particular class of embedded systems that must resist many forms of tampering comprises systems equipped with a large external non-volatile memory for software and data, such as voting machines, smart metering devices, and employee attendance control systems. These systems must provide integrity and authenticity guarantees, but usually not secrecy or confidentiality, so that they can be easily audited by government authorities and independent experts.

Because embedded systems have stringent resource constraints, software solutions for code and data integrity are not a good fit. In addition, software-based authenticity would require a third-party certification authority. Therefore, hardware solutions are desirable for such systems. A myriad of hardware solutions for code and data authenticity and integrity have been proposed in the literature ([?,?,?,?]). However, some of them target high-end embedded systems or more powerful configurations, requiring at least a two-level cache hierarchy in the processor to keep their performance overhead acceptable. Other approaches require modifications to the Instruction Set Architecture (ISA) or the processor datapath, which leads to a complete redesign of code, compilers, and operating systems, among others. Moreover, not all solutions provide both integrity and authenticity.

Recently, an architecture aiming at code/data authenticity and integrity was proposed in [?]. Computer Security by Hardware-Intrinsic Authentication (CSHIA) provides authenticity by authenticating every memory block of the external memory with a unique key extracted from a Physical Unclonable Function (PUF) implemented in each instance. The authentication tags (called PTAGs) are computed during an enrollment procedure and later verified or updated at runtime for each memory block brought to the processor. The main advantages of CSHIA over previous hardware solutions are that it does not require changes to the ISA or datapath, which makes it adaptable to most embedded system architectures while preserving complete software compatibility, and that it uses a separate bus for the tag memory, which gives designers the freedom to meet timing requirements and hide the verification overhead.

Based on Gaisler's Leon3 [?] FPGA implementation, this work presents a proof-of-concept of CSHIA. The main goal of our implementation was to improve the original version of the architecture and add more flexible design choices. This work also presents an in-depth description of the integration between the architecture and a real processor.

### 1.1 Contributions

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus vitae iaculis erat. Aliquam tristique consectetur ante, quis commodo lacus egestas in. Nullam semper elit nec eros pretium dapibus. Ut eget porta metus. Mauris rhoncus vel magna non faucibus. Ut a ornare elit. Morbi sagittis quam nec risus laoreet, tincidunt volutpat ex venenatis. Sed ultrices felis quis felis scelerisque gravida a non neque. Etiam sed nisi neque. Ut lobortis pulvinar facilisis.

### 1.2 Publications

The contributions of this work were published in the following conferences:

- Pub0 Publication 1 description 1
- Pub1 Publication 2 description 2

### 1.3 Organization of the dissertation

This dissertation is organized as follows. Chapter 2 introduces the concepts needed for this work. Chapter 3 reviews the related work. Chapter 4 describes the CSHIA architecture. Chapter 5 details the prototype and its implementation requirements. Chapter 6 presents the evaluation of the prototype, and Chapter 7 concludes this work.

## Fundamental concepts

### 2.1 Physical Unclonable Functions - PUFs

Recently, PUFs have been employed to generate secret keys. PUFs are physical functions designed to mimic random functions. Their inputs, called challenges, and outputs, called responses, are designed to have a unique relation for every PUF instance. This is achieved by leveraging imperfections that result from the fabrication of electronic devices. Regarding authenticity, the main advantages of using PUFs as key generators are that they can produce keys at runtime, that no on-chip memory is needed for key storage, and that they are unclonable: even the manufacturer cannot produce two PUF instances with the same set of Challenge-Response Pairs (CRPs) [?].

### 2.2 Security Properties

To counter the attacks discussed above, a system designer can employ mechanisms that implement three security properties: authenticity, integrity, and secrecy. Although these properties can be implemented in software, the stringent constraints of embedded systems demand solutions that take few clock cycles and consume little power. In the following, we discuss hardware implementations of these security properties.

#### 2.2.1 Authenticity

Suppose that an attacker wants to add his/her own code for execution in the embedded system, or intends to move data from one system instance to another. These attacks can be prevented by authentication mechanisms. In this solution, a key (or a unique set of keys) is determined for each instance. Code and/or data are tagged with these keys during manufacturing (an enrollment phase). At runtime, the key (or set of keys) is used to regenerate the tags, and only the correct key can reproduce the tags created during manufacturing. Therefore, an instance will not accept code or data that was not tagged with its own keys.
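The enrollment and runtime check described above can be sketched in C. This is a minimal illustration under loud assumptions: the per-instance key is a plain 64-bit value (in CSHIA it would be derived from the PUF), and `toy_prf` is a hypothetical, non-cryptographic stand-in for a real keyed PRF. The names `enroll_block` and `verify_block` are also hypothetical. The sketch only shows why a block tagged with one instance's key fails verification on another instance.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* SplitMix64 finalizer: a bijective 64-bit mixer. NOT cryptographic;
 * used only as a stand-in for a real keyed PRF. */
static uint64_t mix64(uint64_t z) {
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

/* Hypothetical keyed tag: starts from the instance key and chains
 * every word of the block through the mixer. */
static uint64_t toy_prf(uint64_t key, const uint32_t *words, size_t n) {
    uint64_t h = key;
    for (size_t i = 0; i < n; i++)
        h = mix64(h ^ words[i]);
    return h;
}

/* Enrollment (manufacturing): tag a code block with the instance key. */
uint64_t enroll_block(uint64_t instance_key, const uint32_t *words, size_t n) {
    return toy_prf(instance_key, words, n);
}

/* Runtime: recompute the tag with this instance's key and compare. */
bool verify_block(uint64_t instance_key, const uint32_t *words, size_t n,
                  uint64_t stored_tag) {
    return toy_prf(instance_key, words, n) == stored_tag;
}
```

Because the stand-in mixer is invertible, it provides no security; a real design would use a cryptographic PRF in its place.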

Before the introduction of electronic Physical Unclonable Functions (PUFs) [?], these keys had to be inserted into systems before they were delivered to customers. To do so, keys were stored on-chip in non-volatile memories, and the manufacturer/vendor ensured the uniqueness of the keys in each instance. The main downsides of storing keys permanently are that it facilitates physical attacks [?] and that it may increase production costs, since it can require integrating different technologies on the same chip.

#### 2.2.2 Integrity

Similarly to authentication, integrity is ensured by tagging code and data with additional information, typically the memory address and/or timestamps. This prevents an attacker from tampering with a system by, for instance, moving instructions to a different memory location or setting different initial values for variables. Integrity can be enforced at the granularity of an entire program, memory pages, or memory blocks, depending on the designer's choice.
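Binding the address and a timestamp into the tag can be sketched as follows. All details are hypothetical and chosen for illustration: a tiny external memory of four 256-bit blocks, a per-block write counter acting as the timestamp and kept in trusted on-chip storage, and the same non-cryptographic stand-in mixer in place of a real PRF. Replaying an old block/tag pair fails because the on-chip counter has advanced, and copying a block to another address fails because the address is part of the tag.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NBLOCKS 4  /* hypothetical: external memory of 4 blocks */
#define WORDS   8  /* 8 x 32-bit words = one 256-bit block */

/* SplitMix64 finalizer: bijective, NOT cryptographic (stand-in only). */
static uint64_t mix64(uint64_t z) {
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

/* Tag over (data || address || timestamp) under the device key. */
static uint64_t tag_of(uint64_t key, const uint32_t data[WORDS],
                       uint32_t addr, uint32_t ts) {
    uint64_t h = key;
    for (size_t i = 0; i < WORDS; i++)
        h = mix64(h ^ data[i]);
    h = mix64(h ^ addr);
    return mix64(h ^ ts);
}

/* Untrusted off-chip memory: data and tags may be rewritten by an attacker. */
typedef struct {
    uint32_t data[NBLOCKS][WORDS];
    uint64_t tag[NBLOCKS];
} ExternalMem;

/* Trusted on-chip state: device key and one timestamp per block. */
typedef struct {
    uint64_t key;
    uint32_t ts[NBLOCKS];
} SecureCore;

/* Byte address of a block, assuming the region starts at address 0. */
static uint32_t block_addr(size_t blk) { return (uint32_t)(blk * WORDS * 4); }

void secure_write(SecureCore *c, ExternalMem *m, size_t blk,
                  const uint32_t data[WORDS]) {
    c->ts[blk]++;  /* every write-back gets a fresh timestamp */
    for (size_t i = 0; i < WORDS; i++)
        m->data[blk][i] = data[i];
    m->tag[blk] = tag_of(c->key, data, block_addr(blk), c->ts[blk]);
}

bool secure_read_ok(const SecureCore *c, const ExternalMem *m, size_t blk) {
    return tag_of(c->key, m->data[blk], block_addr(blk), c->ts[blk]) == m->tag[blk];
}
```

The timestamp table grows with the number of protected blocks, which is the on-chip memory cost that timestamp-based schemes trade for replay protection.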

Integrity can also be considered at the instruction-sequence level, which we refer to as Control-Flow Integrity (CFI). Hardware solutions for CFI usually require deep integration between hardware and software [?], which can entail changing not only the Instruction Set Architecture (ISA) and/or the toolchain, but also the processor's datapath, as proposed in [?,?]. Even though CFI protection is welcome, due to the focused nature of embedded systems, many applications cannot afford the performance penalty and storage overhead inherent to this solution. For instance, in applications where user input is limited and I/O involves fixed amounts of data, an attacker has very little room to mount a buffer overflow or similar attacks that CFI prevents. However, integrity verification of code and data blocks (as described above) can prevent a variety of attacks that go beyond runtime attacks. For example, if an embedded system is left unattended, an attacker can upload malicious code or modify data in the external memory even while the system is not running. Integrity verification can detect and signal these violations before they reach the processor.

#### 2.2.3 Secrecy

An embedded system can also use encryption to prevent exposure of code and/or data stored in the external memory. In this case, the processor can use these instructions and data only after decryption. Therefore, the major drawback of encryption is its performance overhead, which depends heavily on the cryptographic primitive employed. In addition, secrecy only prevents an attacker from obtaining the information: if it is not combined with a unique key or integrity verification, the system remains vulnerable to executing code from other system instances and to relocation and replay attacks.

## Related Work

Qualitative analyses of PUFs have already been carried out in the literature [?], motivated by applications such as cryptographic key generation [?,?] and true random number generation [?,?]. Unlike those works, which evaluate the quality of a standalone PUF-inspired mechanism, this work focuses on proposing and analyzing a PUF-based microarchitectural mechanism that enables secure code execution.

Most early work on secure code execution aimed at keeping instructions and data secure from scrutiny by means of mechanisms such as bus encryption. In [?], Elbaz et al. presented a comprehensive survey of bus encryption, describing many ways of using cryptographic algorithms in SoC architectures to ensure that no malicious instruction/data is executed by the CPU. The major shortcoming of these solutions is their reliance on secret keys stored on-chip in non-volatile memories, which enables offline key-recovery attacks [?].

AEGIS, the secure processor proposed by Suh et al. in [?], employs PUFs as a cryptographic primitive to uniquely authenticate code and data in order to prevent both software and physical attacks. The authors present a toolchain for developing secure software for their architecture, which includes a secure operating system that manages different levels of memory protection. Although the toolchain does not require modifications to the processor architecture, it demands extensive changes to the SoC architecture, in addition to changes to the compiler and operating system. Moreover, AEGIS does not ensure full-time security from power-on to power-off; i.e., the system runs unprotected until the security kernel is loaded. In addition, physical attacks were neither evaluated nor simulated. Several circuits used in AEGIS, such as PUFs and post-processing schemes for key extraction like Fuzzy Extractors, have been successfully attacked with side-channel [?,?] and semi-invasive attacks [?]. While semi-invasive attacks are hard to repel, side-channel attacks have few known countermeasures [?] that can be easily adopted.

In 2009, Vaslin *et al.* proposed a security approach for off-chip memory in embedded microprocessors [?]. Vaslin *et al.* used the One-Time-Pad (OTP) scheme to provide integrity and secrecy. Their architecture encrypts a timestamp, the memory address, and a padding value using AES, and the encrypted result is then combined with the cache line. Because the memory address and timestamp are included, relocation and replay attacks are thwarted. However, to prevent spoofing attacks, memory blocks also need tags, for which Vaslin et al. proposed using CRC32. One critical point is that their architecture requires not only an internal timestamp memory but also a CRC32 memory, which leads to an internal memory of at least 18.8% of the size of the main memory. Nonetheless, Vaslin et al.'s architecture achieved a worst-case performance impact of 10% on the tested benchmarks. However, the area of the tested FPGA implementation almost tripled.

Table 3.1: Summary of Related Works in comparison with CSHIA.

| Work | Target Architecture | Most Positive Feature | Downside |
|------|---------------------|-----------------------|----------|
| AEGIS | High-end embedded systems and above | A complete solution. | Integration with standard products can be difficult due to the modifications imposed on the whole toolchain. |
| Vaslin et al. [?] | Embedded systems | Uses AES in OTP mode combined with CRC32 to provide integrity with low on-chip memory overhead. | High area overhead in an FPGA implementation. |
| Bobade and Mankar [?] | Embedded systems | Security is based on public-key cryptography. | No performance evaluation. |
| Sepulveda et al. [?] | MPSoC | First PUF-based secure architecture for multiple cores. | Does not estimate the area and power increase relative to the baseline system. |
| CSHIA | Embedded systems | Design flexibility. | Does not provide concrete estimates of area and power. |

Bobade and Mankar presented in [?] a secure architecture for embedded systems. Their architecture provides integrity and secrecy through an Elliptic Curve Cryptography engine. The main difference from the other architectures presented here is that they use timestamps as private keys. Cache lines are encapsulated with their address and timestamp (for integrity verification) and then encrypted with the public key to be stored in external memory. Since the timestamps are stored in an internal memory, decryption can be performed by regenerating the private/public key pair, and integrity is ensured by the correct decryption of the encapsulated triad: data, address, and timestamp. Although Bobade and Mankar synthesized their architecture for an FPGA, they only simulated it and did not use any benchmark. Nonetheless, they computed the slice and LUT overhead over their baseline processor, which exceeded 76%, and a memory overhead of 25%. In addition, they estimated the power increase over the baseline: although dynamic power more than doubled at every simulated processor frequency, static power remained stable.

Recently, Sepulveda, Wilgerodt, and Pehl [?] proposed a Multi-Processor System-on-Chip (MPSoC) that provides memory integrity and authenticity through PUFs. The proposed architecture innovates by targeting multiprocessors. They also use SipHash to generate integrity tags for memory blocks, protecting against the three major threats discussed before. One key difference in their replay-attack countermeasure is that they use session tokens instead of timestamps. While innovative, this approach may not be sufficient to protect against replay attacks, since tokens are updated only during idle periods and at boot time. Thus, during a long execution in which a specific memory block is written back to memory multiple times, an attacker might mount a replay attack. Interestingly, Sepulveda et al. argue that CSHIA needs deep modifications to the SoC and CPU. However, we believe this work demonstrates that only minor modifications are needed, all of which are transparent to the core and do not affect how it works. It is also important to note that the authors used a Code-offset Fuzzy Extractor similar to the one CSHIA originally employed, which, as we discussed in the previous section, is less secure than the one we currently propose in terms of the entropy reduction of the key. Finally, they estimated the area and power of the components of their architecture and performed a performance evaluation, which showed an average degradation of 5.6% on the tested benchmarks.

Table 3.1 summarizes the most positive feature and the main downside of CSHIA and the related works. A fair performance comparison among these works is hard to perform, due to the variety of benchmarks, baseline cores, platforms, etc. However, a qualitative analysis of the design choices is still possible. For instance, PUFs have been consistently claimed to be a better solution for key generation than on-chip key storage. In that regard, our implementation is more advantageous than those that do not use them. Moreover, we carefully analyzed the major threats presented in the literature in order to propose a secure use of a PUF-based key. Because embedded system applications can be highly specific, our concern from the beginning was to propose a flexible architecture, characterized by its additional bus for the PTAG Memory and by the choice between timestamps or a Merkle Tree as the replay-attack countermeasure. Thus, although we were not able to precisely estimate power and area, we believe that we have presented a solid solution for the security of embedded systems.

## CSHIA Architecture

CSHIA, illustrated in Figure 4.1, is a processor architecture that aims to provide secure code execution through PUF-based authentication of cache lines. The central idea behind CSHIA is a PUF-Tag (PTAG) Memory, which runs in parallel with the system main memory. Each entry in the PTAG Memory stores an authentication code for a cache line, generated by an on-chip PUF-based device.



Figure 4.1: A system overview of the CSHIA system.

Compared to traditional architectures, CSHIA includes two main modifications: the Secure Engine (SEC-ENG), which includes the PTAG Generator (PTAG-GEN, Figure 4.2), and the Security-Cache (SEC-CACHE), which controls bus traffic between the processor and the Memory Controller (MCTRL). Two other new components are also required to complete the CSHIA design: the PTAG Memory and the PTAG Bus. In short, when the processor requests data/instructions from, or sends data to, the MCTRL, the SEC-CACHE sends the corresponding cache line to the SEC-ENG, which computes or validates its PTAG. Notice from Figure 4.1 that the PTAG Bus runs in parallel to the system buses; thus, no program can directly read the PTAG Memory, since neither the processor nor the MCTRL is aware of the SEC-CACHE.

### 4.0.1 PTAG-GEN Operation

The SEC-ENG controls the PTAG-GEN based on the information delivered by the SEC-CACHE. This information is generated from the bus transactions (Memory READ, Memory WRITE, and I/O) between the processor and the memory controller, which the SEC-CACHE controls. The PTAG-GEN operation for each type of bus transaction is explained next.



Figure 4.2: The PTAG-GEN during PTAG Generation (write) and PTAG Verification (read) operations.

#### PTAG Generation (memory write)

During a write operation, the SEC-CACHE passes data/instruction cache lines to the SEC-ENG, and the PTAG-GEN computes their PTAGs and stores them in the PTAG Memory. A Pseudorandom Function (PRF) [?] module generates the PTAG, taking as input the concatenation (||) of the cache line bits and the base address of the cache line provided by the core (see Figure 4.2), i.e., $PTAG = PRF_K(L \,\|\, A)$, where $K$ is the device key, $L$ the cache line, and $A$ its base address. To ensure uniqueness, the PRF is configured with a unique per-device key, produced from the intrinsic hardware features of a PUF. Such an authentication tag is specific to the processor that computes it, since PUF outputs depend on the statistical variations of the manufacturing process and are unique to each processor [?]. Hence, identical cache lines on different processors will produce different PTAG values for the same inputs. Notice that only code in the cache, whose integrity has already been verified, is able to write to memory.
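The PTAG computation can be sketched in C with SipHash-2-4, the PRF planned for the prototype (Chapter 5). Assumptions: a 256-bit (32-byte) cache line, matching the evaluation setup; the base address appended as 4 big-endian bytes after the line; and a 128-bit key passed in directly, whereas in CSHIA it would come from the PUF through the fuzzy extractor. The name `ptag_compute` and the byte layout of the concatenation are hypothetical, not the prototype's specified format.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ROTL64(x, b) (uint64_t)(((x) << (b)) | ((x) >> (64 - (b))))

/* Little-endian 64-bit load, as required by the SipHash specification. */
static uint64_t load_le64(const uint8_t *p) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

#define SIPROUND                                                    \
    do {                                                            \
        v0 += v1; v1 = ROTL64(v1, 13); v1 ^= v0; v0 = ROTL64(v0, 32); \
        v2 += v3; v3 = ROTL64(v3, 16); v3 ^= v2;                    \
        v0 += v3; v3 = ROTL64(v3, 21); v3 ^= v0;                    \
        v2 += v1; v1 = ROTL64(v1, 17); v1 ^= v2; v2 = ROTL64(v2, 32); \
    } while (0)

/* SipHash-2-4: 2 compression rounds per block, 4 finalization rounds. */
uint64_t siphash24(const uint8_t key[16], const uint8_t *msg, size_t len) {
    uint64_t k0 = load_le64(key), k1 = load_le64(key + 8);
    uint64_t v0 = 0x736f6d6570736575ULL ^ k0;
    uint64_t v1 = 0x646f72616e646f6dULL ^ k1;
    uint64_t v2 = 0x6c7967656e657261ULL ^ k0;
    uint64_t v3 = 0x7465646279746573ULL ^ k1;
    size_t full = len - (len % 8);
    for (size_t i = 0; i < full; i += 8) {
        uint64_t m = load_le64(msg + i);
        v3 ^= m;
        SIPROUND; SIPROUND;
        v0 ^= m;
    }
    /* Last block: remaining bytes plus the message length in the top byte. */
    uint64_t b = (uint64_t)len << 56;
    for (size_t i = 0; i < len % 8; i++)
        b |= (uint64_t)msg[full + i] << (8 * i);
    v3 ^= b;
    SIPROUND; SIPROUND;
    v0 ^= b;
    v2 ^= 0xff;
    SIPROUND; SIPROUND; SIPROUND; SIPROUND;
    return v0 ^ v1 ^ v2 ^ v3;
}

/* Hypothetical PTAG: SipHash-2-4 over (32-byte line || 4-byte base address). */
uint64_t ptag_compute(const uint8_t key[16], const uint8_t line[32],
                      uint32_t base_addr) {
    uint8_t buf[36];
    memcpy(buf, line, 32);
    buf[32] = (uint8_t)(base_addr >> 24);  /* big-endian, as on SPARC */
    buf[33] = (uint8_t)(base_addr >> 16);
    buf[34] = (uint8_t)(base_addr >> 8);
    buf[35] = (uint8_t)base_addr;
    return siphash24(key, buf, sizeof buf);
}
```

The `siphash24` routine can be checked against the reference test vectors (key `00 01 ... 0f`); including the base address in the input is what makes an identical line produce a different PTAG at a different address.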

#### PTAG Verification (memory read)

During a read operation, the SEC-CACHE passes data/instruction cache lines to the SEC-ENG, and the PTAG-GEN computes their PTAGs for verification. As shown in Figure 4.2, the cache line base address produced by the core is appended to the cache line contents read from memory, and the result is fed to the PRF module. The resulting PTAG is compared for equality with the PTAG read from the PTAG Memory. If the stored PTAG and the recomputed value do not match, a *Non-Maskable Interrupt* (NMI), called PTAG-NMI, is raised to the core, since code/data integrity may have been violated. As shown in Figure 4.1, to hide the PUF latency, the data/instruction is sent to the respective cache (I\$ or D\$) while the PTAG-GEN computes the PTAG for that cache line and compares it with the PTAG previously stored in the PTAG Memory.
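The read path can be modeled in C as below. This sequential code only models the hardware behavior: the line is copied to the cache result first, and the PTAG check then sets a flag standing in for the PTAG-NMI. `ReadResult`, `sec_cache_read`, and `toy_ptag` are hypothetical names, and `toy_ptag` is a non-cryptographic stand-in for the real PRF (SipHash-2-4 in the prototype).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LINE_WORDS 8  /* 256-bit cache line as 8 x 32-bit words */

/* SplitMix64 finalizer: bijective, NOT cryptographic (stand-in only). */
static uint64_t mix64(uint64_t z) {
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

/* Stand-in PTAG over (line || base address). */
static uint64_t toy_ptag(uint64_t key, const uint32_t line[LINE_WORDS],
                         uint32_t base_addr) {
    uint64_t h = key;
    for (size_t i = 0; i < LINE_WORDS; i++)
        h = mix64(h ^ line[i]);
    return mix64(h ^ base_addr);
}

typedef struct {
    uint32_t line[LINE_WORDS];  /* data forwarded to I$ or D$ */
    bool nmi_pending;           /* models the PTAG-NMI raised to the core */
} ReadResult;

/* Models one read miss: forward the line, then verify its PTAG. */
ReadResult sec_cache_read(uint64_t key, const uint32_t mem_line[LINE_WORDS],
                          uint32_t base_addr, uint64_t stored_ptag) {
    ReadResult r;
    /* 1. Forward the line to the cache without waiting for the check. */
    memcpy(r.line, mem_line, sizeof r.line);
    /* 2. Recompute the PTAG from (line || base address) and compare. */
    r.nmi_pending = (toy_ptag(key, mem_line, base_addr) != stored_ptag);
    return r;
}
```

Forwarding before verifying is what hides the tag latency; the cost is that a tampered line can briefly reach the cache before the NMI is taken.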

#### Handling I/O

In modern computer systems, I/O operations store data directly into specific memory regions through DMA. Since these regions cannot be trusted, CSHIA does not ensure their integrity and authenticity. Software should first authenticate I/O data at a higher abstraction layer and then copy it to secure areas, where CSHIA can ensure integrity and authenticity.

## CSHIA Prototype

Figure 5.1: The prototype architecture.

The prototype will be implemented on a Leon 3 SPARC V8 platform from Aeroflex Gaisler [?] on an Altera DE2-115 Development Kit. To implement CSHIA's preliminary design (Figure 4.1), two blocks are planned: the Security Engine (SEC-ENG) and the Security Cache (SEC-CACHE), to be inserted between the processor and the memory controller as shown in Figure 5.1. The SEC-CACHE will control bus transactions between the processor and the MCTRL and provide data to the SEC-ENG. The SEC-ENG, in turn, will control the fuzzy extractor and the PTAG-GEN. CSHIA's fuzzy extractor will use a (127, 64, 10) BCH code for error correction, and an internal memory will emulate a SPUF. The PTAG-GEN will use SipHash-2-4 for PTAG generation and verification.
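The role of the fuzzy extractor can be illustrated with the code-offset construction. The sketch below is a deliberate simplification: it replaces the (127, 64, 10) BCH code with a 5-fold repetition code (majority decoding corrects up to 2 flipped bits per group), stores one bit per byte for readability, and omits the final step that would turn the recovered secret into the PRF key. The names `fe_enroll` and `fe_reproduce` and the sizes are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define KEY_BITS 8                 /* hypothetical secret length */
#define REP      5                 /* repetition factor (odd) */
#define N_BITS   (KEY_BITS * REP)  /* PUF response bits consumed */

/* Enrollment: helper = w XOR Encode(s), where Encode repeats each bit
 * of s REP times. The helper data is public; for an [n, k] code it can
 * leak up to n - k bits about w. */
void fe_enroll(const uint8_t w[N_BITS], const uint8_t s[KEY_BITS],
               uint8_t helper[N_BITS]) {
    for (size_t i = 0; i < N_BITS; i++)
        helper[i] = (uint8_t)(w[i] ^ s[i / REP]);
}

/* Reproduction: w_noisy XOR helper = Encode(s) XOR (w XOR w_noisy);
 * majority-decode each group of REP bits to recover s. */
void fe_reproduce(const uint8_t w_noisy[N_BITS], const uint8_t helper[N_BITS],
                  uint8_t s_out[KEY_BITS]) {
    for (size_t k = 0; k < KEY_BITS; k++) {
        unsigned ones = 0;
        for (size_t j = 0; j < REP; j++)
            ones += (unsigned)(w_noisy[k * REP + j] ^ helper[k * REP + j]);
        s_out[k] = (uint8_t)(ones > REP / 2);
    }
}
```

A noisy PUF re-read is tolerated as long as no group has more than 2 flipped bits; a BCH code achieves a much better trade-off between correction capability and leakage, which is why the prototype uses one.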

### 5.0.1 Prototype Configuration

Since the processor may request an arbitrary number of words from memory, and the architecture needs to check the integrity of a full memory block on every cache miss, a buffer in the SEC-CACHE will hold the individual memory words requested by the processor. When the processor requests a memory word, the buffer controller will fetch from main memory all the other words of the corresponding PTAG block. This allows CSHIA to authenticate entire memory blocks and, with a suitable buffer size, can also speed up sequential requests from the processor. The buffer will be configurable to accommodate different PTAG block sizes.
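The address arithmetic behind the buffer controller can be sketched as follows, assuming 32-bit words, a power-of-two PTAG block size, and a protected region whose base is block-aligned. It also sizes the PTAG Memory, assuming one 64-bit tag (the SipHash-2-4 output width) per block. All function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define WORD_BYTES 4u  /* 32-bit SPARC V8 words */

/* Base address of the PTAG block containing addr (block_bytes: power of two). */
uint32_t ptag_block_base(uint32_t addr, uint32_t block_bytes) {
    return addr & ~(block_bytes - 1u);
}

/* Fills `out` with the addresses of the other words of the block that the
 * buffer controller must fetch, and returns how many there are. */
size_t words_to_prefetch(uint32_t addr, uint32_t block_bytes,
                         uint32_t out[], size_t cap) {
    uint32_t base = ptag_block_base(addr, block_bytes);
    uint32_t req = addr & ~(WORD_BYTES - 1u);  /* word-align the request */
    size_t n = 0;
    for (uint32_t off = 0; off < block_bytes && n < cap; off += WORD_BYTES) {
        uint32_t a = base + off;
        if (a != req)
            out[n++] = a;
    }
    return n;
}

/* Index of the block's entry in the PTAG Memory. */
uint32_t ptag_index(uint32_t addr, uint32_t region_base, uint32_t block_bytes) {
    return (ptag_block_base(addr, block_bytes) - region_base) / block_bytes;
}

/* PTAG Memory size needed to cover mem_bytes of main memory. */
uint64_t ptag_memory_bytes(uint64_t mem_bytes, uint32_t block_bytes,
                           uint32_t tag_bytes) {
    return mem_bytes / block_bytes * tag_bytes;
}
```

For example, a request to 0x40001234 with 32-byte blocks makes the controller fetch the seven other words of the block starting at 0x40001220. With 32-byte blocks and 8-byte tags, the PTAG Memory is a quarter of the protected memory, so larger PTAG blocks trade fetch traffic for tag storage.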

## CSHIA Evaluation

For fault injection and performance evaluation, we will use GRMON, Gaisler's debugging interface, which is a general debug monitor for the LEON processor and for SoC designs based on the GRLIB IP library. GRMON provides the following functions:

- Read/write access to all system registers and memory
- Built-in disassembler and trace buffer management
- Downloading and execution of LEON applications
- Breakpoint and watchpoint management
- Remote connection to GNU debugger (GDB)
- Support for USB, JTAG, RS232, PCI, Ethernet and SpaceWire debug links

### 6.0.1 GRMON Debug Monitor

The GRMON debug monitor is intended for debugging system-on-chip (SoC) designs based on the LEON processor. The monitor connects to a dedicated debug interface on the target hardware, through which it can perform read and write cycles on the on-chip bus (AHB). The debug interface can be of various types: the LEON2 processor supports debugging over a serial UART and 32-bit PCI, while LEON3 also supports JTAG, Ethernet, and SpaceWire debug interfaces. On the target system, all debug interfaces are realized as AHB masters with the debug protocol implemented in hardware. Thus, no software support is necessary to debug a LEON system, and the target system does not even need to contain a processor.

GRMON can operate in two modes: command-line mode and GDB mode. In command-line mode, GRMON commands are entered manually through a terminal window. In GDB mode, GRMON acts as a GDB gateway and translates the GDB extended-remote protocol into debug commands on the target system. GRMON is implemented in three functional layers: the command layer, the debug driver layer, and the debug interface layer. The command layer consists of a general command parser that implements commands independent of the debug interface or target system used. These commands include program downloading and flash programming. The debug driver layer implements custom commands related to the configuration of the target system. GRMON scans the target system at start-up and detects which IP cores are present and how they are configured. For each supported IP core, a debug driver is enabled that implements additional debug commands for that specific core. Such commands can include memory detection routines for memory controllers or program debug commands for the LEON processors. The debug interface layer implements the debug link protocol for each supported debug interface. The protocol depends on the interface used, but provides a uniform read/write interface to the upper layers. The interface to use for a debug session is specified through command-line options when GRMON is started.



Figure 6.1: GRMON Interface

### 6.0.2 Performance Analysis

To verify correctness, the CSHIA prototype will be evaluated using 8 benchmarks from the MiBench suite: basicmath, bitcount, crc32, dijkstra, fft, qsort, sha, and stringsearch. Benchmark outputs will be compared to the outputs of a reference execution of MiBench and to those produced by the Leon baseline implementation. In addition, program performance will be compared among three architectures: (1) unmodified Leon 3 (baseline); (2) CSHIA (unsecured), whose authentication and integrity verification are disabled by making bus traffic bypass the SEC-CACHE; and (3) CSHIA (secure), the architecture proposed herein. All architectures will use a 50-MHz clock and 16-KB L1 data and instruction caches with 256-bit cache lines.

As Leon 3 (baseline), and consequently both CSHIA (unsecured) and CSHIA (secure), do not support system calls, benchmarks that read files will be modified to obtain their data from hard-coded integer or string vectors. Large MiBench inputs will be used for all benchmarks.

### 6.0.3 Fault Injection Attack

Many attack scenarios proposed in [?] can be used to evaluate the CSHIA prototype; in most of them, the attack is performed by inserting a modified memory block into the bus or main memory. Thus, verifying how such insertions can happen and how they affect the behavior of CSHIA is crucial. This section presents a fault injection attack planned to be executed on the real prototype.

[Listing: sha.c, lines 126-135]

Figure 6.2: C code for the sha initialization function.

[Listing: sha.objdump, lines 2055-2056 (SPARC assembly)]

Figure 6.3: SPARC assembly code where the constant 0x67452301 is assigned to a global register.

The attack scenario is the following. An attacker wants to tamper with the message integrity scheme of a node in a sensor network. The Secure Hash Algorithm (SHA) is used in this network to create a Message Authentication Code (MAC) for incoming and outgoing messages. The attacker's goal is to make a particular node reject all incoming messages with valid MACs and generate invalid MACs for outgoing messages, which all other nodes will then reject. This attack may be hard to detect, since all nodes remain active and respond promptly while the network is malfunctioning.

For fault injection, we will use Gaisler's debugging interface, GRMON, to insert memory words directly into the AMBA bus. Faults will be inserted as bus memory write operations that can be executed at runtime, after a breakpoint. An attacker could easily achieve the same effect by connecting a physical adapter to the external pins of the processor. Since implementations of SHA are open, we assume the attacker knows the full source code and decides to tamper with the initialization process, in which constant values are assigned, as shown in Figure 6.2. The attacker then learns that line 3 of Figure 6.2 is compiled to the two instructions in Figure 6.3. Given that, he/she replaces the memory word 05 19 d1 48, a sethi instruction, with 05 11 11 11, which changes the constant 0x67452301 to 0x44444701. After the attack, all digest values will differ from those generated by an authentic implementation of SHA. This initial scenario will be used to evaluate the CSHIA prototype.
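The arithmetic behind the injected fault can be checked by decoding the instruction words. SPARC V8 is big-endian, so the bytes 05 19 d1 48 form the word 0x0519d148. The sketch below decodes the format-2 SETHI fields and rebuilds the constant. It assumes that the second instruction in Figure 6.3 supplies the low 10 bits (0x301) with an `or`, the usual `sethi %hi` / `or %lo` idiom, since that instruction's encoding is not given in the text. The function names are hypothetical.

```c
#include <stdint.h>

/* SPARC V8 format-2 fields: op[31:30], rd[29:25], op2[24:22], imm22[21:0]. */
typedef struct {
    unsigned op, rd, op2;
    uint32_t imm22;
} Format2;

Format2 decode_format2(uint32_t insn) {
    Format2 f;
    f.op = insn >> 30;
    f.rd = (insn >> 25) & 0x1fu;
    f.op2 = (insn >> 22) & 0x7u;
    f.imm22 = insn & 0x3fffffu;
    return f;
}

/* SETHI has op = 0 and op2 = 4 (`sethi 0, %g0` is the NOP encoding). */
int is_sethi(uint32_t insn) {
    Format2 f = decode_format2(insn);
    return f.op == 0u && f.op2 == 4u;
}

/* Value SETHI writes into rd: imm22 in bits 31..10, low 10 bits cleared. */
uint32_t sethi_value(uint32_t insn) {
    return decode_format2(insn).imm22 << 10;
}

/* Constant produced by `sethi %hi(c), rd` followed by `or rd, %lo(c), rd`. */
uint32_t sethi_or_constant(uint32_t sethi_insn, uint32_t lo10) {
    return sethi_value(sethi_insn) | (lo10 & 0x3ffu);
}
```

With these helpers, 0x0519d148 decodes to `sethi` into rd = 2 (%g2) with imm22 = 0x19d148, giving 0x67452000. The injected word 0x05111111 keeps the opcode and destination register but changes imm22 to 0x111111, so the pair now yields 0x44444400 | 0x301 = 0x44444701, exactly the corrupted constant described above.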

## Conclusion and Future Work

### 7.1 Conclusion

### 7.2 Future Work