Aalto University, School of Science ITMO University, Faculty of Information Security and Computer Technologies

Master's Programme in Computer, Communication and Information Sciences International double degree programme

Artem YUSHKOVSKIY

## Automated Analysis of Weak Memory Models

Master's Thesis Espoo, Finland & Saint Petersburg, Russia, ?.?.2018 **TODO** 

Supervisors: Assoc. Prof. Keijo Heljanko

Docent Igor I. Komarov



Aalto University, School of Science ITMO University, Faculty of Information Security and Computer Technologies

Master's Programme in Computer, Communication and Information Sciences
International double degree programme
ABSTRACT

|                                          | 0 1 0                                    |                 |  |
|------------------------------------------|------------------------------------------|-----------------|--|
| Author:                                  | Artem Yushkovskiy                        |                 |  |
| Title:                                   | :le:                                     |                 |  |
| Automated Analysis of Weak Memory Models |                                          |                 |  |
| Date:                                    | ?.?.2018 <b>TODO</b>                     | Pages: $v + 52$ |  |
| Supervisors:                             | Supervisors: Assoc. Prof. Keijo Heljanko |                 |  |
|                                          | Docent Igor I. Komarov                   |                 |  |
|                                          |                                          |                 |  |

In id fringilla velit. Maecenas sed ante sit amet nisi iaculis bibendum sed vel elit. Quisque eleifend lacus nec ipsum lobortis ornare. Nam lectus diam, facilisis eget porttitor ac, fringilla quis massa. Phasellus ac dolor sem, eget varius lacus. Sed sit amet ipsum eget arcu tristique aliquam. Integer aliquam velit sit amet odio tempus commodo. Quisque commodo lacus in leo sagittis vel dignissim quam vestibulum. Cras fringilla velit et diam dictum faucibus. Pellentesque at eros non mauris auctor euismod. Nullam convallis arcu vel lectus sollicitudin rutrum. Praesent consequat, nisl at pretium posuere, neque arcu dapibus lacus, ut sollicitudin elit velit ultricies libero. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia Curae;

Nulla semper hendrerit molestie. Pellentesque blandit velit sit amet est vestibulum faucibus. Nullam massa turpis, venenatis non mollis fringilla, mattis et diam. Fusce molestie convallis elementum. Morbi nec lacus dapibus arcu mollis gravida. Aliquam erat volutpat. Nam vitae magna nunc. Nunc ut ipsum at massa porttitor vestibulum. Praesent diam lorem, ultrices nec vestibulum id, volutpat nec lacus.

| <b>Keywords:</b> | TODO: Thesis template, master's thesis |  |
|------------------|----------------------------------------|--|
| Language:        | English                                |  |

## Acknowledgements

In id fringilla velit. Maecenas sed ante sit amet nisi iaculis bibendum sed vel elit. Quisque eleifend lacus nec ipsum lobortis ornare. Nam lectus diam, facilisis eget porttitor ac, fringilla quis massa. Phasellus ac dolor sem, eget varius lacus. Sed sit amet ipsum eget arcu tristique aliquam. Integer aliquam velit sit amet odio tempus commodo. Quisque commodo lacus in leo sagittis vel dignissim quam vestibulum. Cras fringilla velit et diam dictum faucibus. Pellentesque at eros non mauris auctor euismod. Nullam convallis arcu vel lectus sollicitudin rutrum. Praesent consequat, nisl at pretium posuere, neque arcu dapibus lacus, ut sollicitudin elit velit ultricies libero. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia Curae;

Saint Petersburg & Espoo ?.?.2018 **TODO** 

Artem Yushkovskiy

## **Abbreviations**

AST Abstract Syntax Tree

BDD Binary Decision Diagrams
BMC Bounded Model Checking

CF Control-Flow

CPU Central Processor Unit
CTL Computational Tree Logic

DF Data-Flow

DSA Dynamic Single-Assignment

DTO Data-Transfer Object LTL Linear Temporal Logic

OOP Object-Oriented Programming

SB Store Buffering

SC Sequential Consistency
SAT Satisfiability problem

SMT Satisfiability Modulo Theories problem

UMC Unbounded Model Checking UML Unified Modeling Language

WMM Weak Memory Model

NP?

## **Contents**

| Al | Abbreviations |                                                        |    |  |
|----|---------------|--------------------------------------------------------|----|--|
| 1  | Intr          | oduction                                               | 1  |  |
|    | 1.1           | Problem statement                                      | 1  |  |
|    | 1.2           | Related work                                           | 3  |  |
|    | 1.3           | Task specification                                     | 5  |  |
|    | 1.4           |                                                        | 6  |  |
| 2  | Mer           | nory model-aware analysis                              | 7  |  |
|    | 2.1           | The event-based program representation                 | 7  |  |
|    |               | 2.1.1 Events                                           | 8  |  |
|    |               | 2.1.2 Relations                                        | 9  |  |
|    |               | 2.1.3 Executions                                       | 10 |  |
|    | 2.2           | CAT language                                           | 11 |  |
| 3  | Port          | ability analysis as an SMT problem                     | 13 |  |
|    | 3.1           | Model checking and reachability analysis               | 13 |  |
|    | 3.2           | Portability analysis as a bounded reachability problem | 14 |  |
|    |               | 3.2.1 Encoding for the control-flow                    | 15 |  |
|    |               | 3.2.2 Encoding for the data-flow                       | 17 |  |
|    |               | 3.2.3 Encoding for the memory model                    | 19 |  |
| 4  | The           | PorthosC: implementation                               | 21 |  |
|    | 4.1           | General principles                                     | 21 |  |
|    | 4.2           | Architecture                                           | 23 |  |
|    |               | 4.2.1 Program input                                    | 25 |  |
|    |               | 4.2.2 The internal representations                     | 28 |  |
|    |               | 4.2.2.1 Y-tree                                         | 28 |  |
|    |               | 4.2.2.2 X-graph                                        | 30 |  |

|    |       |        | 4.2.2.3   | W-model                                | 36 |
|----|-------|--------|-----------|----------------------------------------|----|
|    |       |        | 4.2.2.4   | Z-formula                              | 36 |
|    |       | 4.2.3  | The prod  | cessing units                          | 36 |
|    |       |        | 4.2.3.1   | Input parsers (1,2)                    | 38 |
|    |       |        | 4.2.3.2   | W-model constructor (3)                | 39 |
|    |       |        | 4.2.3.3   | Y-tree constructor (4)                 | 39 |
|    |       |        | 4.2.3.4   | X-graph pre-compiler (5)               | 39 |
|    |       |        | 4.2.3.5   | X-graph compiler (6)                   | 41 |
|    |       |        | 4.2.3.6   | X-graph unroller (7)                   | 44 |
|    |       |        | 4.2.3.7   | X-graph data-flow constructor (8)      | 45 |
|    |       |        | 4.2.3.8   | Z-formula encoder (9)                  | 45 |
|    |       |        | 4.2.3.9   | SMT-formula converter (10)             | 45 |
|    |       |        | 4.2.3.10  | SMT-formula translator (11)            | 45 |
|    |       | 4.2.4  | Program   | output                                 | 45 |
|    |       | 4.2.5  |           | y components                           | 45 |
|    |       |        | 4.2.5.1   | Watchdog timer                         | 45 |
|    |       |        | 4.2.5.2   | Logger                                 | 45 |
| 5  | Eval  | uation |           |                                        | 46 |
|    | 5.1   | Comp   | arison wi | th Porthos                             | 46 |
|    |       | 5.1.1  |           | Features                               | 46 |
|    |       | 5.1.2  |           | ance                                   | 46 |
|    | 5.2   | Comp   | arison wi | th HERD                                | 46 |
|    |       | 5.2.1  | Unique l  | Features                               | 46 |
|    |       | 5.2.2  | Performa  | ance                                   | 46 |
| 6  | Sum   | ımary  |           |                                        | 47 |
| D: | blica | raphy  |           |                                        | 48 |
| DΙ | unog  | rapny  |           |                                        | 70 |
| Aį | pend  |        |           |                                        | 48 |
|    | A.1   |        | 0         | imar for the porthos v1 input language | 49 |
|    |       |        |           | ree and X-graph representations        | 50 |
|    |       |        |           | emory model defined in CAT language    | 51 |
|    | A.4   | Public | interface | methods of the X-interpreter           | 52 |

## Chapter 1

## Introduction

#### 1.1 Problem statement

Most modern computer systems contain large parts that operate concurrently. Although the parallelisation of a system can drastically improve its performance, it opens numerous of problems connected to correctness, robustness and reliability, which makes the concurrent program design one of the most difficult problems of programming [McK].

Traditionally, studies related to concurrent programming focus on more fundamental theoretical questions of designing race-free and lock-free parallel algorithms, asynchronous data structures and synchronisation primitives of a programming language [Ben06]. Unfortunately, when it comes to real-world concurrent programs, the algorithmic level of abstraction is not enough for guaranteeing their correctness. The reasons of this fact lie in the code optimisations that both compiler and hardware perform in order to increase performance of the system [AG96].

As an example, consider a shared memory concurrent system that executes two parallel processes as it is described in Figure 1.1. The process P0 writes the value 1 to the shared variable x and reads a value of the shared variable y, and the process P1 writes the y and reads the x. Considering all interleavings of the processes, one could expect the final state to be one of the following states:

- (0:EAX=0, 1:EBX=1)
- (0:EAX=1, 1:EBX=0)
- (0:EAX=1, 1:EBX=1)

However, on the x86 architecture the state '(0:EAX=0, 1:EAX=0)' is also reachable as the processors may cache the write to shared memory into

| { x=0; y=0; } |               |  |  |
|---------------|---------------|--|--|
| P0            | P1            |  |  |
| MOV [x],1     | MOV [y],1     |  |  |
| MOV EAX,[y]   | MOV EBX,[x]   |  |  |
| exists (0:EA  | X=0, 1:EBX=0) |  |  |
| x86: allow    |               |  |  |

**Figure 1.1:** Store buffering: A litmus test on write-read reordering allowed under the x86-TSO and forbidden under the SC memory model

their local write buffers, so that they do not immediately become visible by processes running on other cores. This behaviour is known as *store buffering* (SB).

Such little examples that illustrate specific behaviour of a concurrent execution environment are called *litmus tests*. The formal way to define the semantics of memory operations and synchronisation primitives of a parallel execution environment is to define its *memory model*. There are two types of formal memory models. Models of the first type characterise the behaviour of the system (its *operational semantics*) in terms of the abstract machine executing the code, as it was done for the SB example above. Models of the second type define the *axiomatic semantics* of the system by specifying the set of assertions over states of the program. Although the former type of memory models may be easier to describe and interpret, existing numerous formal verification tools and methods direct the research towards axiomatic memory models.

The first memory model for concurrent systems was formulated by Leslie Lamport back in 1979 [Lam79]. This memory model, called the *sequential consistency* (*SC*), allows only those executions that produce the same result as if the operations had been executed in an interleaved fashion in a single process<sup>1</sup>. This means that the order of operations executed by a process is strictly defined by the program (the code) it executes. The SC model does require the write to a shared variable performed in one process to become visible by all other processes *instantly*, each process writes directly to the shared memory, without local buffering. Another important requirement of the SC memory model is that it forbids memory

<sup>&</sup>lt;sup>1</sup>In order to prescind from the implementation details while discussing the memory models theory, we avoid the use of the software-specific term *thread* and the hardware-specific term *processor*. Instead, we adhere the terminology of the theory of concurrency employed by Ben-Ari [Ben06] by naming a concurrent piece of code the *process*.

operations reordering within a single process (the order is strictly defined by the program). Originally, the operational semantics was defined for the SC model, however there exist axiomatic specifications for it [MGZ15].

The SC model is considered to be a *strong memory model* in the sense that it provides firm guarantees regarding the ordering and caused effect of memory operations. Different relaxations of this model lead to the class of *weak memory models (WMMs)* that specify how processes interact through shared memory, when a write becomes visible to other processes and what value a read can return. Thus, WMMs serve as set of guarantees made by designers of execution environment (hardware, programming language, compiler, database, operation system, etc.) to programmers on which behaviours of their concurrent code they may rely on.

### 1.2 Related work

Research on weak memory models firstly aims to *formalise* a formal approach of understanding programs with respect to weak memory models which is *systematic*, *sound* and *complete*. One of the most well-known frameworks for weak memory model-aware analysis was formalised by J. Alglave in 2010 [Alg10],. In addition to developing rather theoretical basis, researchers work on extracting the WMMs for hardware architectures from existing implementations of from their specifications, which are written in natural language and thus suffer from ambiguities and incompleteness. Over the last decade the memory models have been defined for most mainstream multiprocessor architectures, such as x86-TSO and Sparc-TSO (for *Total Store Order*) model for x86 and Sparc architectures [OSS09], much more relaxed memory model for Power and ARM architectures [AFI+09; SSA+11; AMT14], etc. There are projects for validating hardware architectures wrt. a memory model as well, e.g. [LPM14; LSM+16].

Most modern high-level programming languages rely on relaxed memory model as well. Thus, the memory model for Java is based on the *happens-before* principle [Lam78], it was introduced in J2SE 5.0 in 2004 [MPA05]; the C++11 standard [ISO12] has introduced the set of hardware-independent synchronisation fences and atomic operations. The C++17 memory model [BOS+11] is based on the relation *strongly happens-before*. Weak memory are being formalised for even more abstract software environments, the notable project in this area is the project on formalising

the Linux kernel memory model, which is being actively developing these days [AMM<sup>+</sup>18; MAM<sup>+</sup>17].

Furthermore, there exists a wide range of tools that perform memory model-aware analysis.

- A state-of-the-art tool is diy (*do it yourself*), developed by researchers from INRIA institute, France and University of Cambridge, UK. The diy<sup>2</sup> is a software suite for designing and testing weak memory models. It is firstly released back in 2010, and since that time it remained to be the only tool for testing weak memory models. The diy consists of several modules: the litmus tests generators diy, diycross and diyone, the litmus test concrete executor litmus that runs tests on a physical machine and collects its behaviours, and the weak memory model simulator herd that implements reachability analysis for exploring states reachable under the specified WMM.
- There exist tools that perform the weak memory model-aware program verification and model checking. The notable examples are the stateless model checkers RCMC [KLS<sup>+</sup>17], CHESS [MQ08] and Nidhugg [AAA<sup>+</sup>17], the tool Trencher for checking programs against the TSO memory model [BDM13], the tool Porthos for portability analysis [PFH<sup>+</sup>17a],
- Some tools tackle the problem of automated synthesis of the synchronisation primitives, such as the automatic fence insertion tool musketeer [AKN<sup>+</sup>14], and the automatic verification and fence inference tool blender [KVY11].
- Some other tools perform static instrumentation of concurrent C programs and encode the WMM into the program representation so that it can be model-checked by standard tools. The examples are the instrumenting compiler goto-cc which is a part of CBMC model checker [KT14], the tool that performs the sequentialisation of concurrent programs [AKN+13], the tool Weak2SC for producing program descriptions which can be fed into standard model checking tools (such as SPIN [Hol97] or NuSMV [CCG+00]) for performing memory model-aware analysis [TW16].

All the tools listed above consider only a single memory model, however, in real life we face serious engineering problems involving necessity to model more than one execution environment. One of these problems is the *portability* of the program from one hardware architecture to another. A

<sup>&</sup>lt;sup>2</sup>The div project web site: http://div.inria.fr/

program written in a high-level language is then compiled for different hardware. Even if all the compiler optimisations were disabled (which is rare case nowadays), the behaviour of two compiled versions of the same program may differ due to differences between hardware memory models. As the result, a program compiled under the platforms  $\mathcal{T}$  can reach states that are unreachable on the platform  $\mathcal{S}$ , which is a *portability bug* from the source platform  $\mathcal{S}$  to the target platform  $\mathcal{T}$  [PFH+17a].

The very first tool that performs the WMM-aware portability analysis is Porthos<sup>3</sup> introduced in April 2017 [PFH+17b]. This tool reduces described problem to a bounded reachability problem, which can be solved via an SMT-solver. This approach allows to capture symbolically the semantics of analysing program and both weak memory models into a single SMT-formula, augmented by the reachability assertion. As most modern SMT-solvers are efficient enough to be able to operate the state space of size millions of variables bounded by millions of constraints ([MZ09]), the used method can be applicable in solving the real-world problems.

## 1.3 Task specification

The current work aims to rework the proof-of-concept tool *Porthos* by extending the input language, which currently represents the minimum subset of C, and revising the general architecture of the tool in order to enhance performance, extensibility, reliability and maintainability. As the general architecture and almost all components of Porthos have been redesigned, the tool received a new name – *PorthosC*<sup>4</sup>. Considering the enhancements of the architectural design, PorthosC represents a generalised framework for SMT-based memory model-aware analysis, which can not only perform the portability analysis, but can serve as a basis for other kinds of static code analysis.

## 1.4 Thesis structure

The thesis is organised as following. Chapter 2 gives a general view on the weak memory model-aware analysis. Chapter 3 examines the portability

<sup>&</sup>lt;sup>3</sup>The Porthos project web site: http://github.com/hernanponcedeleon/PORTHOS

<sup>&</sup>lt;sup>4</sup>Hereinafter with the name 'Porthos' we refer to the tool Porthos of version 1, whereas the new implementation of Porthos is called PorthosC.

analysis as an bounded reachability problem that can be encoded into an SMT-formula in order to be solved automatically by an SMT solver. Chapter 4 delves into the description of architectural solutions and implementation details of the PorthosC framework. **TODO: other chapters!!!** 

## Chapter 2

## Memory model-aware analysis

The main idea behind the memory model-aware program analysis is that the set of all possible interleavings of the concurrent program (anarchic semantics) can be specified by the axiomatic constraints of the memory model that filter out executions inconsistent in particular architecture (analytic semantics) [ACM16]. The anarchic semantics of the program is a truly parallel semantics with no global time that describes all possible computations with all possible communications. However, the analytic semantics captures the program behaviours on a certain execution environment more precisely.

## 2.1 The event-based program representation

The classical approach for analysing concurrent programs is to model it as the set of sequentially consistent programs, obtained by enumerating all possible interleavings. These models are deterministic as they include the notion of the *global time*. Although these models are easy to build and analyse, the number of all possible interleavings grows exponentially (known as the *combinatorial explosion*), which affects the completeness of an analysis method in general case.

One way to fight the combinatorial explosion is to exclude the global time from the model and treat executions from one equivalence class together in a non-deterministic fashion. For instance, such an equivalence class can be the set of computations performed by a processor locally that do not affect the global state. This idea is used in the *event-based* model, that represents the program as a directed graph of events (the *event-flow graph*) [Alg10]. The vertices of such a graph represent *events* (see Section 2.1.1), and edges represent basic relations (see Section 2.1.2). The graph represents the

set of executions (sequences of events; see Section 2.1.3) defined by the non-deterministic guesses of certain relations on some states.

There are three main types of sources of non-determinism in concurrent programs [MQ08]:

- 1. *input non-determinism*, which is a standard undecidable problem for all static analysis methods: to resolve the user input, system call from the environment, unresolved function calls, etc.;
- 2. *scheduling non-determinism*, caused by the interleavings, which in turn are caused by the scheduler activity;
- 3. *memory-model non-determinism*, caused by hardware and compiler relaxations.

The event-based program model is able to emulate effectively the second and the third types of sources of non-determinism, while the first one can be coped by standard static analysis methods [Lan92; BCD<sup>+</sup>18].

### **2.1.1** Events

An event is a fact of executing the low-level primitive operation such as memory access, threads synchronisation, computation, etc.

A memory event  $e_m \in \mathbb{E}$  represents the fact of access to the memory. Only memory events change the state of an abstract machine executing the code, since it is completely determined by values stored in its memory. Since memory is the crucial low-level resource shared by multiple processes, most relations are defined over memory events. The processes can access a shared memory location (denoted by  $l_i$ , for location), or a local one (denoted by  $r_i$ , for register). A memory event can access at most one shared memory location, high-level instructions that address more than one shared variable must be transformed into a sequence of events. A memory event is specified by its direction with respect to the shared variable, its location  $loc(e_m)$ , its processor label  $proc(e_m)$ , and a unique event label  $id(e_m)$  [Alg10].

The set of memory events  $\mathbb{M}$  is divided into write events  $\mathbb{W}$  (that write values to shared-memory locations) and read events  $\mathbb{R}$  (that read values stored in shared-memory locations). We add a restriction that each memory event uses at most one shared location, so that the write instruction  $i = write(l_1, l_2)$ , that encodes the write from the shared location  $l_2$  to the shared location  $l_1$ , is represented as two consequent events  $e_1 = load(r_1 \leftarrow l_2)$ ;  $e_2 = store(l_1 \leftarrow r_1)$ . Also, it is important to separate the set of initial write events  $\mathbb{IW} \subseteq \mathbb{W}$  that perform initialisation of program variables.

A computation event  $e_c \in \mathbb{C} \subseteq \mathbb{E}$ , represents a low-level assembly computation operation performed solely on local-memory arguments. An example of computation event may be the event  $e_c = r_1 \leftarrow add(r_2, 1)$  that writes the sum of values stored in register  $r_2$  and constant 1 (which is modelled as a register as well) to the register  $r_1$ . For modelling branching statements, we distinguish the set  $\mathbb{C}_{\eth} \subseteq \mathbb{C}$  of *guard* computation events (also called as *branching events*), that are evaluated as a boolean value.

The synchronisation instructions (fences) cause the *barrier events*, that do not perform any computation or memory value transfer, instead, they add new relations to the program model that restrict the set of allowed behaviours. Functionally, a fence may be a synchronisation barrier or a instruction of flushing the local memory caches, etc.

#### 2.1.2 Relations

The relation  $r \subseteq \mathbb{E} \times \mathbb{E}$  is a set of pairs of events (a subset of Cartesian product of two sets of events). There are two kinds of relations between events: *basic relations* that capture the semantics of the program, and *derived relations* that are defined from the basic relations and events in the weak memory model specification. Constraints over relations that are specified by weak memory models are defined as requirements of acyclicity, irreflexivity or emptiness of specific relations [ACM16].

The basic relations are the following [Alg10]:

- The *control-flow* of a program is defined by the *program-order* relation po  $\subseteq \mathbb{E} \times \mathbb{E}$ , which represents the total order of events of same process. For instance, if the instruction  $i_1$  generates the event  $e_1$  and the instruction  $i_2$  follows  $i_1$  and generates the event  $e_2$ , then  $e_1 \stackrel{\text{po}}{\rightarrow} e_2$ .
- The data-flow of a program is defined by communication relations:
  - the *read-from* relation  $rf \subseteq W \times \mathbb{R}$  that maps each write event to the read event that reads the value written by write event;
  - o the *coherence order* relation  $co \subseteq W \times W$  that defines the total order on writes to the same location across all processes (also called the *write serialisation*, ws-relation);
- Events from the same process are related by the *scope relation*  $sr \subseteq \mathbb{E} \times \mathbb{E}$ . In contrast to the herd tool, the PorthosC does not

use hierarchy of scopes (depicted as the scope tree); instead, it uses simple labels that indicate which process has produced certain event.

Below we enumerate some derived relations [Alg10]:

• the from-read relation  $fr \subseteq \mathbb{R} \times \mathbb{W}$  that maps a read event to all write events succeeding the write event from which the read event gets its value:

$$r \xrightarrow{\text{fr}} w \triangleq (\exists w'. w' \xrightarrow{\text{rf}} r \land w' \xrightarrow{\text{co}} w)$$

• the *communication* relation po over memory events, that fully describes the data-flow of a program:

$$m_1 \stackrel{\text{com}}{\to} m_2 \triangleq ((m_1 \stackrel{\text{rf}}{\to} m_2) \vee (m_1 \stackrel{\text{co}}{\to} m_2) \vee (m_1 \stackrel{\text{fr}}{\to} m_2))$$

• the *external* (and *internal*) *from-read* relations that restrict the fr-relation to the different (respectively, same) processes:

$$w \xrightarrow{\text{fre}} r \triangleq (w \xrightarrow{\text{fr}} r \land \text{proc}(w) = \text{proc}(w))$$

$$w \xrightarrow{\text{fri}} r \triangleq (w \xrightarrow{\text{fr}} r \land \text{proc}(w) \neq \text{proc}(w))$$

• the po-loc relation that is the po-relation over events that access to the same shared variable:

$$m_1 \stackrel{\text{po-loc}}{\rightarrow} m_2 \triangleq (m_1 \stackrel{\text{po}}{\rightarrow} m_2 \wedge \text{loc}(m_1) = \text{loc}(m_2))$$

• the semantics of *fences* (memory barriers) specific for different architectures may be defined as derived relations.

#### 2.1.3 Executions

The semantics of a concurrent program is represented as the set of allowed executions. The *execution* is a path in the event-flow graph defined by poand rf-relations and set of final writes to a given memory location that is valid under certain memory model [AMT14]. It can be interpreted as a sequence of guesses which event is to be executed next. The *candidate execution* is an execution that is not yet constrained by a memory model.

Figure 2.1 illustrates four possible candidate executions for the litmus test Example 1.1 (the pictures are generated by the herd7 tool, version 7.47). Since there are no conditional jumps, the po-relation is defined and we do not need to guess it. Since each thread performs single write followed by a single read, the co-relation is also defined (it relates the initial write event with the write event to the same location).



**Figure 2.1:** Possible candidate executions for the litmus test Example 1.1

Thus, there are only four possible executions defined by the choice of rf-relation. The candidate executions pictured in Figures 2.1a–2.1c are consistent both under strong memory model SC and under relaxed memory models x86-TSO, Power, ARM, and some others. However, the execution shown in Figure 2.1c is still consistent under relaxed-memory architectures, but it becomes inconsistent under SC architecture as it forbids cycles over fr  $\cup$  po.

## 2.2 CAT language

Weak memory models are defined via CAT language [ACM16]. This is a domain specific language for describing consistency properties of concurrent programs. The language combines expressive power of a functional language (it is inspired by OCaml and adopts its types, first-class functions, pattern matching and other features) with types, operations and assertions that are specific for operating with relations and executions. In CAT, new relations can be defined via the keyword let and the following operators over relations [ACM16].

Below we enumerate pre-defined operators over relations and sets of events:

#### 1. Unary operations:

the complement of a relation r is ~r,

- the transitive closure of a relation r is r+,
- the reflexive closure of a relation r is r?,
- the reflexive-transitive closure of a relation r is r\*, and
- the inverse of a relation r is  $r^-1$ .

#### 2. Binary operations:

- the union of two relations r1 and r2 is r1 | r2,
- the intersection of two relations r1 and r2 is r1 & r2,
- *the difference* of two relations r1 and r2 is r1 $\times$ r2, and
- the sequence of two relations r1 and r2 is r1; r2, which is defined as the set of pairs (x,y) such that there exists an intervening z, such that  $(x,z) \in r1$  and  $(z,y) \in r2$ .

For instance, the fr-relation is defined as a sequence of inverted rf-relation and co-relation:  $fr = (rf^{-1}; co)$ . As an example memory model definition in CAT language, the the x86-TSO can be found in Appendix A.3. This memory model asserts acyclicity of communication relation, po-loc relation, mfence relation and some other derived relations [OSS09].

## Chapter 3

# Portability analysis as an SMT problem

As it has been discussed in Chapter 1, the program may behave differently when compiled for different parallel hardware architectures. This may cause the portability bugs, the behaviour that is allowed under one architecture and forbidden under another. In this Chapter, we describe the general task of analysing the concurrent software portability as a *bounded reachability* problem, which in turn can be reduced to a SMT problem [PFH<sup>+</sup>17a].

## 3.1 Model checking and reachability analysis

The model checking is the problem of verifying the system (the model) against a set of constraints (the specification) [DKW08]. As the state machine model is the most widespread mathematical model of computation, most classical model checking algorithms explore the state space of a system in order to find states that violate the specification.

The general scheme of model checking is the following. The analysing system (the *model*) is represented as a transition system, a directed graph with labelled nodes representing states of the system. Each state corresponds to the unique subset of atomic propositions that characterise its behavioural properties. Once the model has been constructed, it can be checked for compliance to the *specification*.

Usually, the specification defines temporal constraints over the properties of the system. For instance, the specification assert may state that the property *always* holds (the *safety*) or the property will *eventually* hold (the *liveness*). Commonly, the *Linear Temporal Logic (LTL)* or *Computational Tree* 

*Logic (CTL)* (along with their extensions) is used as a specification language due to the expressiveness and verifiability of their statements.

In the described scheme, the model checking problem is reducible to the reachability analysis, an iterative process of a systematic exhaustive search in the state space. This approach is called *Unbounded Model Checking (UMC)*. However, all model checking techniques are exposed to the *state explosion problem* as the size of the state space grows exponentially with respect to the number of state variables used by the system (its size). In case of modelling concurrent systems, this problem becomes much more considerable due to exponential number of possible interleavings of states. Therefore, the research in model checking over past 40 years was fighting the state explosion problem mostly by optimising search space, search strategy or basic data structures of existing algorithms.

One of the first techniques that optimises the search space considerably is the symbolic model checking with *Binary Decision Diagrams (BDDs)*. Instead of processing each state individually, in this approach the set of states is represented by the BDD, a data structure that allow to perform operations on large boolean formulas efficiently [CKN+12]. The BDD representation can be linear of size of variables it encodes if the ordering of variables is optimal, otherwise the size of BDD is exponential. The problem of finding such an optimal ordering is known as NP-complete problem, which makes this approach inapplicable in some cases.

The other idea is to use satisfiability solvers for symbolic exploration of state space [CBR $^+$ 01]. In this approach, the state space exploration consists of the sequence of queries to the SAT-solver, represented as boolean formulas that encode the constraints of the model and the finite path to a state in the corresponding transition system. This technique is called *bounded model checking (BMC)* as the search process is being repeated up to the user-defined bound k, which may result to incomplete analysis in general case. However, there exist numerous techniques for making BMC complete for finite-state systems (e.g., [Sht00]).

## 3.2 Portability analysis as a bounded reachability problem

In general, a BMC problem aims to examine the non-reachability of the "undesirable" states of a finite-state system. Let  $\vec{x} = (x_1, x_2, ..., x_n)$  be a vector of n variables that uniquely distinguishes states of the system; let  $Init(\vec{x})$  be

an *initial-state predicate* that defines the set of initial states of the system; let  $Trans(\vec{x}, \vec{x}')$  be a *transition predicate* that signifies whether there the transition from state  $\vec{x}$  to state  $\vec{x}'$  is valid; let  $Bad(\vec{x})$  be a bad-state predicate that defines the set of undesirable states. Then, the BMC problem, stated as the reachability of the undesirable state withing k steps, is formulated as following:  $SAT(Init(\vec{x_0}) \land Trans(\vec{x_0}, \vec{x_1}) \land ... \land Trans(\vec{x_{k-1}}, \vec{x_k}) \land Bad(\vec{x_k}))$ .

The portability analysis problem may also be stated as a reachability problem, where the undesirable state is one reachable under the target  $\mathcal{M}_{\mathcal{T}}$  memory model and unreachable under the source memory model  $\mathcal{M}_{\mathcal{S}}$ . Consider the function  $cons_{\mathcal{M}}(P)$  calculates the set of executions of program P consistent under the memory model  $\mathcal{M}$ . Then, the program P is called portable from the source architecture (memory model)  $\mathcal{M}_{\mathcal{S}}$  to the target architecture  $\mathcal{M}_{\mathcal{T}}$  if all executions consistent under  $\mathcal{M}_{\mathcal{T}}$  are consistent under  $\mathcal{M}_{\mathcal{S}}$  [PFH<sup>+</sup>17a]:

**Definition 3.2.1** (Portability). Let  $\mathcal{M}_{\mathcal{S}}$ ,  $\mathcal{M}_{\mathcal{T}}$  be two weak memory models. A program P is portable from  $\mathcal{M}_{\mathcal{S}}$  to  $\mathcal{M}_{\mathcal{T}}$  if  $cons_{\mathcal{M}_{\mathcal{T}}}(P) \subseteq cons_{\mathcal{M}_{\mathcal{S}}}(P)$ 

Note that the definition of portability requirements against *executions* is strong enough, as it implies the portability against *states* (the *state-portability*) [PFH<sup>+</sup>17b]. The result SMT-formula  $\phi$  that encodes the portability problem should contain both encodings of control-flow  $\phi_{CF}$  and data-flow  $\phi_{DF}$  of the program, and assertions of both memory models:  $\phi := \phi_{CF} \wedge \phi_{DF} \wedge \phi_{\mathcal{M}_T} \wedge \phi_{\neg \mathcal{M}_S}$ . If the formula is satisfiable, there exist a portability bug.

## 3.2.1 Encoding for the control-flow

The control-flow of a program is represented by the *control-flow graph*, a directed acyclic connected graph with single source and multiple sink nodes. In a control-flow graph, there are two types of edges (transitions): *primary transitions* that denote unconditional jumps or if-true-transitions (pictured with solid lines), and *alternative transitions* that denote if-false-transitions (pictured with dotted lines). Each node on graph can have either one successor (primary) or two successors (both primary and alternative); only computation events can serve as a branching point). However, each merge node can have any positive number of predecessors, where each edge may be either primary or alternative.

While working on the PorthosC, we applied some modifications of the encoding scheme for the control-flow. The changes are conditioned by the need to be able to process an arbitrary control-flow produced by conditional and unconditional jumps of C language. For that, we compile the recursive abstract syntax tree (AST) of the parsed C-code to the plain (non-recursive) event-flow graph. We show that the new encoding is smaller than the old one used in Porthos since it does not produces new variables for each high-level statement of the input language. For instance, Porthos uses the encoding scheme where the control-flow of the sequential instruction  $i_1 = i_2; i_3$  was encoded as  $\phi_{CF}(i_2; i_3) := (cf_{i_1} = (cf_{i_2} \wedge cf_{i_3})) \wedge \phi_{CF}(i_2) \wedge \phi_{CF}(i_3)$ , and control-flow of the branching instruction  $i_1 = (c?i_2:i_3)$  was encoded as  $\phi_{CF}(c?i_2:i_3) := (cf_{i_1} \Leftrightarrow (cf_{i_2} \vee cf_{i_3})) \wedge \phi_{CF}(i_2) \wedge \phi_{CF}(i_3)$  (here we used the notation of C-like ternary operator 'x?y:z' for defining the conditional expression 'if x then y else z'). In contrast, the new scheme implemented in PorthosC firstly compiles the recursive high-level code into the linear low-level event-based representation, that is then encoded into an SMTformula. The encoding of branching nodes depends on the guards, the value of conditional variable on the branching state, which in turn is encoded as data-flow constraint (see Section 3.2.2).

Let  $\mathbf{x}: \mathbb{E} \to \{0,1\}$  be the predicate that signifies the fact that the event has been executed (and, consequently, has changed the state of the system). Let  $\mathbf{v}: \mathbb{C} \to \mathbb{R}$  be the function that returns the value of the computation event (evaluates it) that will be computed once the event is executed (strictly speaking, it returns the *set* of values determined by the rf-relation; see Section 2.1.2 for details on relations). We distinguish the function  $\mathbf{v}_g: \mathbb{C}_g \to \{0,1\}$  that evaluates the guard computation event. In the result formula, all symbols  $\mathbf{x}(e_i)$  and  $\mathbf{v}(e_i)$  are encoded as boolean variables.

Consider the possible mutual arrangements of nodes in a control-flow graph presented in Figure 3.1. For these cases, we propose the encoding scheme that uniquely encodes each node of graph and at the same time allows to encode partially executed program. Equation 3.1 encodes the sequential control-flow represented in Figure 3.1a and reflects the fact that



**Figure 3.1:** Linear and non-linear cases of a control-flow graph



**Figure 3.2:** Transformation of the empty-branch nonlinear control-flow

the event  $e_2$  can be executed iff the event  $e_1$  has been executed. Equation 3.2 encodes the branching control-flow depicted in Figure 3.1b by allowing only following executions:  $\{\emptyset, (e_1), (e_1 \rightarrow e_2), (e_1 \rightarrow e_3)\}$ . In Equation 3.3, which encodes the control-flow of a merge-point represented in Figure 3.1c, the event  $e_k$  is executed if either of its predecessors has been executed, regardless the type of the transition.

$$\phi_{CF_{seq}} := \mathbf{x}(e_2) \to \mathbf{x}(e_1) \tag{3.1}$$

$$\phi_{CF_{br}} := [\mathbf{x}(e_2) \to \mathbf{x}(e_1)] \wedge [\mathbf{x}(e_3) \to \mathbf{x}(e_1)] \wedge$$

$$[\mathbf{x}(e_2) \to \mathbf{v}(e_1)] \wedge [\mathbf{x}(e_3) \to \neg \mathbf{v}(e_1)] \wedge$$
  
$$\neg [\mathbf{x}(e_2) \wedge \mathbf{x}(e_3)]$$

(3.2)

$$\phi_{CF_{mer}} := \mathbf{x}(e_k) \to (\bigvee_{e_p \in \text{pred}(e_k)} \mathbf{x}(e_p))$$
(3.3)

For sake of correctness of the encoding, we require all branches to have at least one event. Thus, for branching statements that do not have any events in one of the branches (such a branch represents a conditional jump forward), we add the synthetic nop-event as it is shown in Figure 3.2.

## 3.2.2 Encoding for the data-flow

To encode the data-flow constraints, we use the *Static Single-Assignment (SSA) form* in order to be able to capture an arbitrary data-flow into a single SMT-formula. The SSA form requires each variable to be assigned only once within entire program. In contrast, Porthos used the *Dynamic Single-Assignment (DSA)* form, that requires indices to be unique within a branch. Although the number of variable references (each of which

is encoded as unique SMT-variable) on average is logarithmically less in case of the DSA form than the SSA form, the result SMT-formula still needs to be complemented by same number of equality assertions when encoding the data-flow of merge points [PFH<sup>+</sup>17a].

Following [PFH+17b], the indexed references of variables are computed in accordance with the following rules: (1) any access to a shared variable (both read and write) increments its SSA-index; (2) only writes to a local variable increment its SSA-index (reads preserve indices); (3) no access to a constant variable or computed (evaluated) expression changes their These rules determine the following encoding of load, store and computation events within single thread:

$$\phi_{DF_{e=1\text{oad}(r\leftarrow l)}} := \mathbf{x}(e) \to (r_{i+1} = l_{i+1})$$
 (3.4)

$$\phi_{DF_{e=\text{store}(l\leftarrow r)}} := \mathbf{x}(e) \to (l_{i+1} = r_i)$$
(3.5)

$$\phi_{DF_{e=\text{eval}(\cdot)}} := \mathbf{x}(e) \to \mathbf{v}(e) \tag{3.6}$$

To convert the program into SSA form, for each event each variable that is declared so far (either local or shared) is mapped to its indexed reference; this information is stored in the SSA-map "event *to* variable *to* SSA-index". The SSA-map is computed iteratively while traversing the event-flow graph in topological order as it is described in Algorithm 1.

#### Algorithm 1 Algorithm for computing the SSA-indices

**Input:** The event-flow graph  $G = \langle N, E \rangle$  where V is the set of nodes (events), E is the set of control-flow transitions,  $e_0$  is the entry node

**Output:** The SSA-map of the form "{ event : { variable : index }}"

```
1: function Compute-SSA-Map(G)
       S \leftarrow \text{empty map}; S[e_0] \leftarrow \text{empty map}
       for each event e_i \in G.N in topological order do
3:
            for each predecessor e_i \in pred(e_i) do
4:
                S[e_i] \leftarrow \text{copy}(S[e_i])
5:
                for each variable v_k \in \text{set of variables accessed by } e_i do
6:
                     S[e_i][v_k] \leftarrow \max(S[e_i][v_k], S[e_i][v_k])
7:
                     if need to update the index of v_k then
                                                                            ▶ cases (1)-(2)
8:
                         S[e_i][v_k] \leftarrow S[e_i][v_k] + 1
9:
```

The time of described algorithm is linear of the size of event-flow graph since it performs only single traverse of the graph.

As it has been described before, the rf-relation links data-flow between events of data-flow stored in equivalence assertions over the SSA-variables. The encoding of this linkage left untouched as it is implemented in Porthos: for each pair of events  $e_1$  and  $e_2$  linked by the rf-relation, we add the following constraint:

$$\phi_{DF_{mem}}(e_1, e_2) := \text{rf}(e_1, e_2) \to (l_i = l_j)$$
(3.7)

where the variable of location l is mapped to the SSA-variable  $l_i$  for event  $e_1$ , and to the SSA-variable  $l_j$  for event  $e_2$ ; and the predicate  $rf(e_1, e_2)$  is encoded as a boolean variable, which itself equals true if  $e_2$  reads the shared variable that was written in  $e_1$ .

## 3.2.3 Encoding for the memory model

The basic scheme for encoding the memory model is proposed in [PFH<sup>+</sup>17a]. The encoding consists of two parts: encoding the *derived relations* and encoding the memory model *assertions*.

In an SMT-formula, the relation  $x \stackrel{\Gamma}{\to} y$  is represented by a boolean variable r(x,y) that indicates whether the relation holds. The derived relations are encoded by fresh boolean variables according to the following rules [PFH<sup>+</sup>17b]:

- $r_1 \cup r_2(e_1, e_2) := r_1(e_1, e_2) \vee r_2(e_1, e_2)$
- $r_1 \cap r_2(e_1, e_2) := r_1(e_1, e_2) \wedge r_2(e_1, e_2)$
- $r_1 \setminus r_2(e_1, e_2) := r_1(e_1, e_2) \land \neg r_2(e_1, e_2)$
- $r^{-1}(e_1, e_2) := r(e_2, e_1)$
- $r^*(e_1, e_2) := r^+(e_1, e_2) \lor (e_1 = e_2)$
- $r_1; r_2(e_1, e_2) := \bigvee_{e_k \in (E)} r_1(e_1, e_k) \wedge r_2(e_k, e_2)$
- $r^+(e_1, e_2) := \operatorname{tc}_{\lceil log \mid \mathbb{E} \mid \rceil}(e_1, e_2), where$   $\operatorname{tc}_0(e_1, e_2) := r(e_1, e_2), and$  $\operatorname{tc}_{i+1}(e_1, e_2) := r(e_1, e_2) \vee (\operatorname{tc}_i(e_1, e_2); \operatorname{tc}_i(e_1, e_2))$

Note that CAT language allows mutually-recursive definitions of relations (for example, ' $\Gamma_1 = \Gamma_2 \cup (r_1; r_1)$ '). The basic idea of using the Kleene fixpoint iteration for encoding such relations was also proposed in [PFH<sup>+</sup>17a]: for any pair of events  $e_1, e_2 \in \mathbb{E}$  and relation  $r \subseteq \mathbb{E} \times \mathbb{E}$ , we encode a new integer variable  $\Phi^r_{e_1,e_2}$  that represents the round of Kleene iteration on which the variable  $r(e_1,e_2)$  has been set.

The memory model can assert acyclicity, irreflexivity of emptiness of a relation or a set of events. As it has been proposed in [PFH<sup>+</sup>17a], encoding

the acyclicity assertion uses numerical variable  $\Psi_e \in \mathbb{N}$  for each event e in the relation to be asserted:  $acyclic(r) := (r(e_1, e_2) \Rightarrow (\Psi_{e_1} < \Psi_{e_2}))$ . The irreflexivity assertion as  $irreflexive(r) := \bigvee_{e_k \in (E)} \neg r(e_k, e_k)$ .

## Chapter 4

## The PorthosC: implementation

The main call for commencing the work on PorthosC was the need for processing real-world C programs, which, at first, requires the input language to be extended. This implies the support not only for new syntactic structures of C language (such as the switch statement or the postfix increment operator i++), but also for its fundamental concepts and features (such as types, pointer arithmetic or first-order functions), which requires revision of the whole architecture of the tool. Yet far from all of C language has been supported (that, considering its complexity and numerous pitfalls, goes far beyond current thesis<sup>1</sup>), we consider the accomplished work as a step towards it.

## 4.1 General principles

The existing implementation of Porthos does not distinguish the event-based program model from the high-level AST, they both are encoded into single SMT-formula (see classes of package 'dartagnan.program' of Porthostool). Moreover, the syntax tree was implemented as a mutable data structure, which is being modified at all stages of the program (for instance, see the methods 'dartagnan.program.Program.compile(...)' of Porthos that recursively compute some properties of the AST and change its state). We are inclined to consider this architecture as one that is fast to develop, but hard to maintain (since it is difficult to guarantee the correctness of the program) and extend (since adding the support for a new high-level

<sup>&</sup>lt;sup>1</sup>For ensuring this, one merely has to look at existing C compilers, for instance, the open-source gcc compiler, which uses the C parser written in more than 18.5 thousand lines of code (see https://github.com/gcc-mirror/gcc/blob/master/gcc/c/c-parser.c)

instruction requires changing multiple components of the program, from parser to encoder).

Therefore, while working on the new design of PorthosC, we decided to clearly separate the high-level intermediate code representation (implemented as a recursive AST structure) from low-level event-based representation (implemented as an event-flow graph). Such a modular architecture allows to support multiple input languages by parsing them and converting parsed syntax trees to a simplified AST. which, along with all other data-transfer objects (DTO), must be immutable, so that it is possible to guarantee the correctness of the program by controlling its invariants. The immutability in PorthosC is implemented via final fields that are assigned by the immutable-object values (either a primitive type, or another immutable object, or an immutable collection provided by the library Guava by Google<sup>2</sup>).

During the development of PorthosC we mainly followed the KISS principle, which can be exhaustively described in 17 Unix Rules of Eric Raymond [Ray03]. The following list summarises the main rules we followed during the development of PorthosC:

#### 1. Robustness:

- 1.1. prohibit under-approximating abstractions;
- 1.2. preservation the completeness of analysis;
- 1.3. modular architecture: each module can be tested independently;
- 1.4. usage of software design patterns where necessary;
- 1.5. usage of immutable data structures for all DTOs;
- 2. Transparency:
  - 2.1. following the principles of simplicity and readability;
  - 2.2. clear and informative program output;
- 3. Efficiency:
  - 3.1. keeping the trade-off between execution time and memory usage;
- 4. Extensibility:
  - 4.1. clear modular architecture.

<sup>&</sup>lt;sup>2</sup>Guava project repository: https://github.com/google/guava/

Robustness of the analysis is the main criterion of PorthosC as a verification tool. This implies modelling the system with usage of only overapproximating abstractions However it makes the analysis more sensitive to the combinatorial explosion, it preserves the completeness of analysis, which is necessary for a model-checking tool. As a robust and transparent tool, PorthosC must adhere to the strategy of aborting its work on any unexpected state (for instance, if a parser failed to parse the string and recovery algorithm is not described).

As its predecessor, PorthosC uses the open-source SMT-solver Z3 <sup>3</sup>by Microsoft Research [DB08]. However, unlikely its predecessor, the PorthosC has an additional abstraction level Z-formula (see Section 4.2.2.4) that allow to use any other SMT-solver.

The programming language choice for PorthosC was also made in favour of *java*, firstly, in order to be able to reuse some parts and concepts of Porthos that is written in java, and secondly, because the authors find the object-oriented (OOP) concepts of java suitable for modelling languages. Although java does not show best results in performance benchmarks (for example, comparing to C++ [Hun11; Oak14]), the performance cornerstone of PorthosC (as well as any other SMT-based code analyser) is the phase of solving the SMT-formula, which is left to the third-party SMT-solverinvoked from PorthosC via java API. However, considering the perspective of using PorthosC as a static analyser for real-world programs, the memory optimisation problem must also be taken into account during both encoding and solving stages. It is worth noting that, for the reasons of simplicity, the PorthosC is not a concurrent program, however, we believe that, due to its modular architecture, it can be easily parallelised on the level of program modules.

## 4.2 Architecture

The general architecture scheme of PorthosC is presented in Figure 4.1. The rectangles in the picture denote processing units (marked with gear sign with a unique number of the component), and labels of the arrows describe the internal representation carried out during the transformation.

The program takes as input the program to be analysed and one (the reachability analysis mode) or two (the portability analysis mode) memory models. The parsed program syntax tree is then converted (Section 4.2.3.3)

<sup>&</sup>lt;sup>3</sup>The Z3 project repository: https://github.com/Z3Prover/z3



**Figure 4.1:** The general architecture of PorthosC

to a program AST called Y-tree<sup>4</sup>(Section 4.2.2.1), which then is being preprocessed at the pre-compilation stage (Section 4.2.3.4) in order to collect information necessary for the compilation. The Y-tree then is being compiled (Section 4.2.3.5) to an X-graph representation (Section 4.2.2.2). The compiled X-graph then is being converted to an acyclic form (Section 4.2.3.6) in order to be encoded into a Z-formula (Section 4.2.2.4). Apart from that, the memory-model constructor (Section 4.2.3.2) constructs the abstract syntax tree of derived relations of the weak memory model W-model (Section 4.2.3.3). Thereafter, W-model and acyclic X-graph are encoded (Section 4.2.3.8) to a Z-formula representation (a wrapper over an SMT-formula), which then is translated to an SMT-formula (Section 4.2.3.9), which then is solved by the SMT-solver (Section 4.2.3.10).

## 4.2.1 Program input

Both Porthos and PorthosC use the ANTLR parser generator<sup>5</sup> [Par13], a powerful language processing tool. The ANTLR takes as input the user-defined grammar of the target language in a BNF-like form and produces the LL(\*)-parser and optionally some auxiliary classes (such as listeners and visitors for the syntax tree). Although this parser may be not as efficient as the hand-written language-optimised parser, it reduces the overhead of implementing the parser significantly. Among other advantages ANTLR, it is worth of noting that it has rather large collection of officially supported grammars. Nonetheless, the intuitive syntax for defining grammars and numerous of tools for debugging grammars make the ANTLR an attractive instrument for solving the parsing problem.

Figure 4.1 represents the simplified grammar of of input language used by previous version of Porthos (the full ANTLR grammar is available at Appendix A.1. The input language parser used by Porthos suffered from several disadvantages. Firstly, it contained the parser code inlined into directly the grammar, so that the grammar would serve as a template for the parser code (which is called semantic actions). Such a combining of two expressive languages makes the code hardly understandable and,

<sup>&</sup>lt;sup>4</sup>In order to avoid confusion between different internal representations, we prefix the names of elements of each internal representation with a letter. For instance, we picked the letter 'Y' to denote the AST code representation as drawing of this letter resembles the tree branching; with letter 'X' we prefix elements of the event-flow graph as the events are to be executed; and with letter 'W' we prefix elements of the weak memory model AST.

<sup>&</sup>lt;sup>5</sup>The ANTLR project repository: https://github.com/antlr/antlr4

**Figure 4.1:** The sketch of the input language grammar used by Porthos v1

therefore, poorly maintainable. In PorthosC, we clearly separated the parser (generated from the grammar file '<*grammar*>.*g4*') from the converting the ANTLR syntax tree to the AST, that is one for all languages of an input program.

Secondly, in Porthos, the semantics of operations was defined syntactically (in ANTLR grammar), whereas this should be performed by a separate module operating on the AST level<sup>6</sup>, so that it does not require to change grammar for encoding the semantics of a new function.

As the reader can notice from the grammar sketch in Figure 4.1, the memory operations of different kinds vary syntactically. For example, the assignment of local computation to a register uses the symbol '<-', the atomic non-relaxed load operation denoted as '<:-', non-relaxed store operation denoted as ':=', and the semantics of relaxed load and store are resolved syntactically by matching the function name. In PorthosC, the semantics of the data-flow operation is determined according to the types of operands, that are determined during the pre-compilation stage (see Section 4.2.3.4). The semantics of the functions also is being resolved during the pre-compilation stage via the *invocation hooking* mechanism (see Section 4.2.3.5.2).

 $<sup>^6</sup>$ Most modern languages (including C++) allow the function overloading, thus requiring the type information to be available for the function resolution algorithm (see Section 4.2.3.4.2).

Thirdly, the grammar used by Porthos had restricted set of allowed operations. For example, it allowed only computations over the local variables, which might lead to inconsistency of the result SMT-formula if the same variable name was used both as a register and as a location, for instance, the in code snippet 'x := reg1; reg2 <- (x + 1);', the first statement interprets the variable x as a location, while the grammar of second statement requires it to be a register. Also, only integers were processed by the input language parser of Porthos. In PorthosC, we extended support for primitive types supported by the Z3 solver (this apply to 32-bit integers encoded as Ints of Z3, floats encoded as Reals, enumerations encoded as Scalars). Although Z3 supports the array theory (characterised by the select-store axioms [MB11]), the complexity of static analysis of pointers and non-constant sized arrays moves the full support of arrays and pointers out of the scope of current thesis.

The minor drawbacks of grammar used by Porthos include lack of operator associativity (expressions of the form '1 + 2 \* 3' could not be parsed), incorrectly (in terms of C) implemented grammar rule for the statement (the semicolon punctuator ';' was implemented as the separator between two statements, whereas in C it serves as a statement terminator). Also, Porthos supports only the litmus-specific syntax for the variables initialisation, however, it allows only ini tialisation of the shared variable and only by default value 0. PorthosC supports arbitrary declaration to be performed in initialisation statement.

PorthosC uses the C language grammar of proposed in the C11 standard [ISO11], that was extended by litmus test-specific syntax such as initialisation and final-state assertion statements (the original ANTLR grammar can be found in the official repository containing the collection of ANTLR v4 grammars<sup>7</sup>). Current version of PorthosC does not recognise C processor directives (it ignores them), however, in future it can be extended to support them.

Currently, PorthosC can operate only in the intra-procedural analysis mode, assuming that each function defined in the input file is being executed in a separate thread. However, the redesigned architecture of PorthosC allow to easily support the inter-procedural (cross-procedure) analysis by inlining function calls and binding variable contexts. Thus, instead of analysing a single code file, the user can specify the whole code project. The functions to be executed in parallel can be specified by the user, however,

<sup>&</sup>lt;sup>7</sup>Repository path: https://github.com/antlr/grammars-v4

the some concurrent parts of code can be detected automatically at the compilation stage while the semantics of function invocations is being resolved (for instance, the C threads operations from thread.h or pthread.h may be supported).

## 4.2.2 The internal representations

For keeping the architecture transparent, we build all abstraction levels with interfaces, even if some of them does not add any new functionality.

#### 4.2.2.1 Y-tree

The first internal representation used by PorthosC is the *Y-tree*, which represents a rather high-level AST. The Apendix A.2 represents the file tree of main classes that constitute the Y-tree hierarchy (as the inheritance tree might be obvious for the C-like AST, we confine ourselves to presenting the classes file tree only, which we tend to retain clearly structured).

The abstract syntax tree Y-tree is an abstraction level suitable for compiling the program to a low-level representation (in case of processing low-level assembly code, it may be directly converted to the X-graph representation). As the ANTLR syntax tree follows the same structure as the grammar, which is a superset of the real (meaningful) grammar of C language, it lacks multiple concepts of the language (for example, the syntax of indexer access corresponds the grammar rule "postfixExpression '[' expression ']'", that is converted to the YIndexerExpression). However some details of the syntax might have been abstracted away (for instance, array operations may be emulated by functions invocations, see [Gri12, Chapter 5]), we found this level of abstraction suitable enough for our tasks.

Each Y-tree element implements the interface YEntity and carries the OriginLocation instance that contains information about the coordinates of the input text that generated the Y-tree element.

Following the C11 standard [ISO12], we distinguish a statement ("an action to be performed") from an expression ("a sequence of operators and operands that specifies computation of a value, or that designates an object or a function, or that generates side effects, or that performs a combination thereof").

All Y-tree expressions implement the YExpression interface. On the Y-tree level, the pointer arithmetic is modelled by the integer number *pointer level* of an expression (although in fact this is the property of a type not of an expression, the y-tree is an untyped syntax tree, therefore the

elements Y-tree should carry this property). We distinguish the subset of expressions that imply no side-effects, they implement the interface YAtom and can be global or local (which is defined also syntactically). Each Y-tree expression carries the original code coordinates, that can be resolved to text by the service LocationService.

The Y-tree expressions are the following:

- YBinaryExpression that model the C binary operator (*relative* operator that compares two expressions of any type, *logical* that processes two boolean expressions, and *numerical* that processes two numerical expressions);
- YUnaryExpression that model the C unary expression (logical negation, numeric prefix and postfix increment and decrement, bitwise complement);
- YMemberAccessExpression that has an arbitrary expression of type YExpression as its base expression (it will be resolved during the compilation stage);
- YIndexerExpression and YInvocationExpression that as arbitrary expression as its base or arguments (strictly speaking, the indexer expression is an unary-function invocation, but as the SMT-solver we use supports the constant-array theory, we can maintain the array type);
- YAssignmentExpression that assigns an YExpression to an YAtom;
- YVariableRef that stores the untyped "reference" to a variable (viz., the name only);
- YLabeledVariableRef that represents the litmus-specific local variable reference for a certain the process (e.g., 'P0:x' which means the local variable x of the process P0);
- YParameter that represents a typed variable (the type was declared, similarly to the variable definition);
- YConstant that represents an untyped non-named constant.

Similarly to expressions, all Y-tree statements implement the YStatement interface. The statements are the following:

- YBranchingStatement representing the if-then-else statement;
- YLoopStatement representing both while- and for- loops;
- YJumpStatement representing unconditional jump (goto-jump to a label and loop-jumps break and continue);
- YCompoundStatement (block statement) representing sequence of N statements grouped into one syntactic unit;
- YLinearStatement representing a single expression;
- YVariableDeclarationStatement containing the information about the variable type during the variable declaration.

On the Y-level of abstraction, we define the YType as a *reference* for the type (since the Y-tree is not typed, all expressions do not have type, however, the YType is used for storing the information on declaration, including the type itself, type modifiers and qualifiers).

According to the C standard, "any statement may be preceded by a prefix that declares an identifier as a label name". The Y-tree statements of follow this rule, however they these labels are symbolic, and they need to be resolved at the pre-compilation stage. Apart from the set of statements listed before, we define the YFunctionDefinition and its inheritor a litmus-specific declaration YProcessDefinition used in intra-procedural analysis mode. The function definition contains the YCompoundStatement body and the YMethodSignature signature, which is used in the function resolution during the compilation stage. The other litmus-specific statements are YPreludeDefinition that carries the list of YStatement initial writes, and YPostludeDefinition that carries the YExpression binary expression to be asserted by the litmus test.

The syntax tree that contains set of definitions (e.g., litmus-initialisations, function definitions, litmus-asserts) is modelled by the class YSyntaxTree.

#### 4.2.2.2 X-graph

The Y-tree is compiled into the low-level event-based program representation called *X-graph*. The mathematical structure of event-flow graph was discussed in Section 2.1. The nodes of the graph are events, and the edges are basic relations: the control-flow relation po and the data-flow relations co and rf. Hereinafter, we denote the X-graph with only control-flow



**Figure 4.2:** The inheritance tree of interfaces of X-graph



**Figure 4.3:** The inheritance tree of X-graph memory units

edges as X-graph<sub>CF</sub>, the X-graph with only data-flow edges as X-graph<sub>CF</sub>. The complete X-graph is X-graph<sub>CF+DF</sub> = X-graph<sub>CF</sub>  $\cup$  X-graph<sub>DF</sub>. The UML diagram on Figure 4.2 represents the hierarchy of main interfaces of the X-abstraction level.

All elements of X-graph implement the interface XEntity. There are two main kinds of X-entity: *events* that implement the XEvent interface, and *memory-units* that implement the XMemoryUnit interface.

#### **4.2.2.2.1** Memory units

The *memory unit* is a memory cell of an abstract machine executing the code. This machine has an infinite number of arbitrary-sized *registers* (local memory units) and *locations* (shared memory units). Local memory units inherit the XProcessLocalElement interface, that stores the ID of the owning process. Figure 4.3 represents inheritance hierarchy of memory units.

Following the terminology of the C standard, we distinguish the *r-value* and *l-value* memory units (unlike r-values, the l-values may be assigned a new value). As r-values cannot change their value, they can be seen as the value itself (therefore the XComputationEvent is modelled as an local r-value memory unit, see more detailed discussion further in current Section).

Each memory unit has an XType associated with it. The X-type is a symbolic representation of the C primitive type<sup>8</sup> that is easily convertible to an SMT-type (modelled as ZType).

The memory units are created and stored by the XMemoryManager, which provides interface for accessing memory units during compilation stage. For more detailed description of memory management see Section 4.2.3.5.1.

Following the litmus tests format, we distinguish three types of X-graph: one for a process, one for a litmus-initialisation block and one for the assertion statement. However, all three types of X-graph are modelled by the same graph structure XProcess with certain restrictions complied by the corresponding type of X-interpreter that constructs the graph. This simplifies drastically the processing of different types of code blocks as they all are modelled by the same data structure. The examples of restrictions on X-graphs are the following: the initialisation block can not have branchings, fence events; the process cannot have assertion events; the assertion block can not have shared-memory events or fence events. We discuss the interpretation of different types of statements in more detail in Section 4.2.3.5.3.

#### 4.2.2.2.2 Events

An event (XEvent) represents the fact of executing the primitive operation, which is independent from other events. Each event is specified by the process generated them and a unique event label. This information is stored by events in the immutable structure XEventInfo. Also, each event carries the reference to the Y-instruction that has generated it.

The following interfaces model basic kinds of events (see Figure 4.2):

• XMemoryEvent
The memory event defines the transfer of the value from one memory

<sup>&</sup>lt;sup>8</sup>Here we should note that PorthosC can eventually evolve to be able to analyse programs written in an OOP language (for instance, in C++). In this case, the XType will have more complex structure than a simple enumeration, which it has when we need to emulate only primitive types of C language. See more detailed discussion on input language type system in Section 4.2.3.4.2.

unit to another. There are four types of memory events (the arrow denotes the direction of the data-flow):

- XRegisterMemoryEvent: (XLocalLvalueMemoryUnit) ← (XLocalMemoryUnit),
- XLoadMemoryEvent: (XLocalLvalueMemoryUnit) ← (XSharedMemoryUnit),
- XStoreMemoryEvent: (XSharedLvalueMemoryUnit) ← (XLocalMemoryUnit),
- XInitialWriteEvent: (XLvalueMemoryUnit) ← (XRvalueMemoryUnit).

#### • XComputationEvent

We distinguish two types of computation events:

- XUnaryComputationEvent that encodes bit negation and nooperation,
- o XBinaryComputationEvent that encodes numeric operations (such as addition, multiplication, etc.), bit vector operations (such as bit-and, bit-xor, etc.), relative operations (such as greater-then comparison, equality comparison, etc.), and logical operations (such as conjunction and disjunction).

The computation event class implements both XEvent and XMemoryUnit. This is a model-level optimisation, which is possible because a computation event performs computation over local-only memory and does not change value of any memory unit. Thus, the *computation* abstraction (as the CPU time spent for the computation itself) can be safely removed from the model, and the computation event can be seen as a zero-time operation that produces the *value*. For analysing large expressions this optimisation is sensible because it considers the whole expression as a single computation event, encoded therefore as a single SMT-variable.

Note, for the purpose of simplifying the X abstraction level, computation events may have been modelled as invocations of bodiless functions (for instance, the operation 'x + y' may be modelled as the invocation of the function '+' with the arguments 'x' and 'y'). However, current version of PorthosC maintains

the XComputationEvent as the operators are supported by the SMT-solver.

#### • XControlFlowEvent

The control-flow event indicates a non-linear jump in the code. We distinguish two kinds of control-flow events:

- XJumpEvent that performs no computation and no data operation, it can be safely removed from the model as an optimisation,
- o XMethodCallEvent that models the function call with the fastcall calling convention (passing arguments in registers). The function call also implements the XLocalMemoryUnit since it represents the computed value returned by the function call. If the function has been resolved, the actual arguments are bind to the formal parameters (treated as temporary local memory units), the called XMethodCallEvent is pushed onto the call stack, and the execution jumps to the function body. The call stack is bounded by the user-defined parameter; once the stack is full, the interpretation continues without jumping to the body of the invoked function (such cases are properly logged). Each return statement creates the assignment of the function call event on the top of call stack. Note, this approach works for any kind of recursion, which is unrolled up the user-defined boundary as well as explicit loops.

According to the Rice's theorem [Ric53], in general case there always will be functions unresolved within an analysis launch. If the function has not been resolved, it can be safely assumed to be a no-operation function. In other words, we suppose that the knowledge base of PorthosC is complete and the tool can resolve all memory-operation functions and fence instructions. Although this can affect the completeness of the analysis, it is safe to make such an assumption as the user can check the log file and manually analyse the semantics of an unresolved call and it to the knowledge base if necessary.

#### • XFenceEvent

The fences are implemented as an enumeration XBarrierEvent. Current implementation of PorthosC supports all fences supported by Porthos: mfence, sync, optsync, lwsync, optlwsync, ish, isb, and isync.

#### • XFakeEvent

The fake events are the auxiliary elements of X-graph.

- XEntryEvent, the per-process unique source event in the event-flow graph,
- XExitEvent, the process sink event. There are two kinds of sink events: complete and incomplete (see description in Section 4.2.3.6),
- XNopEvent, the no-operation event (a jump to the next event), used for correct encoding in case when the control-flow branch does not have any event (see Figure 3.2).

#### 4.2.2.2.3 Edges

Edges are stored as an immutable hash-map from source to target event. The type of edge is stored in the enumeration XEdgeKind:

- the *control-flow edges*: a primary edge (epsilon or if-then transition) and an alternative edge (if-else transition), see more in Section 3.2.1;
- the data-flow edges: the co-relation edge and the rf-relation edge.

#### 4.2.2.2.4 Graph invariants

Once being constructed, the graph must conform the following requirements (see also Section 3.2.1):

- 1. the graph must have a single source with no ingoing edges, and two sinks of different kinds without outgoing edges,
- 2. the graph must be connected,
- 3. each node of the graph can have either one or two direct control-flow successors,
- 4. only nodes of type XComputationEvent can have two direct control-flow successors,
- 5. a co-edge connects two writes, an rf-relation edge connects a write and a read,
- 6. all write-event nodes except initial write-nodes must have exactly one co-predecessor
- 7. all write-event nodes except final write-nodes must have exactly one co-successor.

#### 4.2.2.2.5 Graph example

consider the following code in C:

picture of the CF-graph:

<TODO: finish it later after fixing the code>

#### 4.2.2.3 W-model

A W-model represents a recursive AST of *computations over relations* and assertion expressions defined by the memory model. The atomic elements of W-model are the basic relations (po, rf and co; see Section 2.1.2) and sets of events ( $\mathbb{R}$ ,  $\mathbb{W}$ ,  $\mathbb{I}\mathbb{W}$ , etc.; see Section 2.1.1). The expressions of W-model are unary (such as complement, transitive closure, etc.) and binary operations (such as union, intersection, etc.) over relations or sets of events; see Section 2.2. For the sake of transparency, each element of a W-model contains the origin location as the coordinates of the string in the model file.

<todo: enumerate elements – after refactoring>

#### 4.2.2.4 Z-formula

The Z-formula representation is a wrapper for an SMT-formula used as an additional abstraction level used to increase the transparency of the architecture, simplify the debugging process and ease the support of different SMT-solvers. As all other internal representations, the Z-formula is implemented as an immutable data structure.

The Z-abstraction level models the the logical formulas (definitions and assertions) that are put on the assertion stack of the SMT-solver. Generally, a Z-formula represents the S-expression-based syntax of the SMT-LIB language [BFT17]. Currently, the Z-formula supports definitions and assertions over *variables* and binary and unary *expressions*. However, in future it may be extended to support *function symbols* and *binders* (existential and universal quantification, pattern matching and functional type construction).

All expressions of a Z-formula are *typed* (or *sorted*, in terms of SMT-LIB standard). Basically, we distinguish *boolean* and *numeric* (integer, bitvector, real) expressions. The typisation of a Z-formula is necessary for checking the basic validity of its expressions and converting it to a typed SMT-LIB formula. For the sake of transparency, each element of a Z-formula contains the origin location as the reference to the X-element that produced current Z-element.

<todo: enumerate elements – after completing>

#### 4.2.3 The processing units

This section describes program units that construct, transform and analyse internal representations described before. Further in this Section, the subsection titles have a number in parentheses, which correspond to a unique number of the processing unit being discussed. These numbers are pictured within the gear in the top-left corner of each processing unit in Figure 4.1.

The construction of the X-graph is performed in three stages. First, the Y-tree is compiled to a *cyclic* control-flow event-based graph X-graph<sub>CF</sub>. Then, this graph is unrolled to an *acyclic* control-flow event-based graph X-graph<sup>U</sup><sub>CF</sub>. After that, the compiler is able to perform the data-flow analysis and produce the full event-based graph X-graph<sup>U</sup><sub>CF+DF</sub>, which remains to be *CF-acyclic* (no cycles among control-flow edges).

All recursive data structures are processed by units that implement the *visitor* pattern [PJ98]. This is a behavioural pattern that separates the program logic from the object implementation by specifying handling methods for each element of the object. The visitor pattern performs the double-dispatching mechanism: the visiting class defines the method for accepting a visitor, and the visitor itself defines the <TODO!!!!>

To visit an instance x by a visitor v, the program should call the acceptmethod of x and pass v as an argument. In our implementation, the visitor must carry out the continuation of traversing the recursive data structure itself. The general structure of the visitor pattern is illustrated by the pseudo-java code in Figure 4.4.

```
interface Element {
                                        class Visitor<T> {
  <T> T accept(Visitor<T> visitor);
                                           T visit(AnElement e) {
                                             // visiting logic
}
                                             // continue recursively
class AnElement implements Element {
                                             e.getChild().accept(this);
  @Override
                                           T visit(AnOtherElement e) {
  <T> T accept(Visitor<T> visitor) {
    return visitor.visit(this);
  }
                                           }
                                           . . .
}
                                        }
```

**Figure 4.4:** *Illustration of the visitor pattern* 

We consider the visitor pattern as the most natural way for operating the hierarchical data structures such as AST. However, we use its double-dispatching capabilities to reduce cost of multiple type casting performed while traversing the elements of a non-recursive data structure. The operator instance, instead of having a single method that handles and element of the instance, extends the visitor interface and splits the handler method into multiple methods, one for each type of element.

For traversing plain (non-recursive) complex data structures, we mostly follow the iterator pattern: the special object Iterator, that has access to elements of the data structure, iterates over them and ... . Since in Porthos we usually do not need to browse an object elements in different order, the iterator creation is performed by the data structure itself. However, in contrast to the visitor of a recursive data structure, the visitor of a plain data structure should be used with an *iterator* that has access to the elements of the object and passes them to the visitor.

Construction of immutable data structures is performed by units that follow the *builder* pattern. A builder is a mutable object that contains methods "filling" it by the elements that will constitute the result complex object. The methods contain basic validity checks of the elements. Once a builder has been built, it cannot be modified.

#### 4.2.3.1 Input parsers (1,2)

Both input-language and input-model parsers are implemented via ANTLR parser generator.

Details on the input program language grammar are discussed in Section 4.2.1. Currently, PorthosC does not consider preprocessor instructions (it ignores them). For implementing the inter-procedural mode of Porthos, it is crucial to support inclusion of header files (the #include preprocessor directive). Next step would be implementing the support of macros and conditional compilation directives, that are used often in C code. As preprocessor statements may appear in arbitrary place of a program, the preprocessor must be a stateful processing unit that reads the token stream and dynamically instrument the program by interpreting directives and expanding macros.

The input memory model language CAT is discussed in Section 2.2. The ANTLR grammar for CAT language was extracted from the parser used by herd tool<sup>9</sup> written in OCaml.

<sup>&</sup>lt;sup>9</sup>herd project repository: https://github.com/herd/herdtools7

#### 4.2.3.2 W-model constructor (3)

The W-model representation is constructed by the stateless visitor Cat2WmodelConverterVisitor from the ANTLR syntax tree CatParser.MainContext of the memory-model defined in CAT language. Currently, the visitor supports the small subset of CAT language, which constitutes only non-functional declarative expressions of the language as the support for functional-style expressions requires implementing the full OCaml interpreter. However, some most commonly used functional-style expressions may be supported by mapping them syntactically to corresponding W-elements by the W-model constructor.

Following the principle of transparency, the W-model constructor aborts its work with the NotSupportedException if met the unrecognised syntax construction (in contrast to the Porthos v1 approach in which the null value was produced in all exceptional states).

<todo: file tree of w-model to appendix – after code fixing>

#### 4.2.3.3 Y-tree constructor (4)

The Y-tree is constructed by the stateless visitor C2YtreeConverterVisitor from the ANTLR syntax tree C11Parser.MainContext of C language. As the W-model constructor, the Y-tree constructor aborts its work with the NotSupportedException once it meets an unsupported syntax construction. A syntax exception YConverterException is thrown if the converted syntax tree contains semantic errors that prevent it to be converted to the Y-tree.

As a Y-tree constitutes a generic AST, the Y-tree constructor performs code desugaring (for example, a switch-statement is converted to the equivalent if-statement).

#### 4.2.3.4 X-graph pre-compiler (5)

The precompiler traverses the Y-tree and collects information necessary for its compilation into an X-graph.

#### 4.2.3.4.1 The label resolution

The label resolution is necessary for establishing links to labelled statements. In C, labelled statements are declared via the colon-syntax '<label> : <statement>', and the labels are referenced by the jump-statement 'goto <label>'. The label resolution algorithm traverses the Y-tree and collects all declared labels into a map that points a label to the

labeled statement. This information is used during compilation to set up unconditional jumps.

#### 4.2.3.4.2 Type analysis

C language has a static (resolved at compile-time) manifest (all types are declared explicitly) type system. Comparing to languages that use type inference, the type analysis of a C program constitutes a simple propagating the type information (obtained from variables declarations) to all expressions. Being carried at Y- and X- representation levels, the type is converted to a Z-type at the stage of the Z-formula encoding (see Section 4.2.3.8).

Currently, PorthosC handles only the primitive C types (such as int, char, float, etc.), which is modelled as an enumeration XType. The array dimension is stored at the Y-level as an integer number (currently this information is not used as PorthosC does not support arrays, although this information can be used at the stage of pre-compilation for converting array elements to separate variables). In addition, PorthosC has a built-in extensible database storing the semantics of non-primitive C types stored by the X2ZTypeConverter, which can be requested for converting an X-type to a Z-type.

The type analysis algorithm should consider the type aliases supported by C language (defined by the typedef instruction). For resolving the type aliases, the precompiler should make an extra traverse of the Y-tree before the pre-compilation stage and build up a symbol map.

The type analysis also includes the process of resolving semantics of function invocations. The result is stored in a map of invocation expression to the resolved function signature, so that the compiler can decide either to jump to the function body and interpret it, or to hook the invocation if the semantics of the function is resolved. However, due to lack of polymorphism and function overloading mechanism inherent to OOP languages, the function resolution algorithm may set up the mapping only for the function name and thus may neglect analysis of types of the arguments, we build the mapping for the full function signature so that the type analysis preprocessing unit may be used for analysing the C++ code if needed.

#### 4.2.3.4.3 Variable kind analysis

On the compilation stage, once the compiler meets the reference to a variable, it should know whether it refers to a local or global variable. The kinds of variables have to be determined on the pre-compilation stage.

The following types of variables are detected as *global variables*:

- a variable was declared as a pointer;
- a variable whose address was accessed by any process;
- a variable that was declared as a parameter of the process function;
- a variable that was exported by the extern keyword (in the kernelanalysis mode, the functions EXPORT\_SYMBOL and EXPORT\_SYMBOL\_GPL also export symbols for dynamic linking);

### 4.2.3.5 X-graph compiler (6)

The X-compiler is the main component that transforms the recursive Y-tree data structure to the plain X-graph representation. It is a complex processing unit; Figure 4.5 illustrates the relationship between the principal components of the X-compiler in the UML language.

The main class representing the X-compiler is Y2XConverter. It receives as input the Y-tree, the memory model kind and user settings (for instance, the interpreter mode defining the set of invocation hooks enabled during the analysis run). The Y2XConverter creates an instance of the stateless visitor Y2XConverterVisitor that traverses the Y-tree while invoking the stateful interpreter XInterpreter.

We distinguish three types of processes (each implement the XInterpreter interface):

- the prelude process XPreludeInterpreter (to be executed before all other processes) that allows only declaration of local or shared variables, computations and memory operations;
- the postlude process XPostludeInterpreter (to be executed after all other processes) that allows only declaration of local variables, memory operations, computations and assertions;
- the process XProcessInterpreter (to be executed in parallel) that allows all interface operations except declaration of shared variables (that should be declared during pre-compilation stage or during compilation of the prelude process) and assertions (declaration of local variables, computations,

prelude must start first (for declaring shared vars), post - last , others - i any order



Figure 4.5: Main components of the X-compilation processing unit

all start with entry and end with exit

The X-interpreter is a stateful processing unit that carries the X-graph-builder and provides *action* methods for changing its state and thus the state of the builder.

X compilation error

<scheme of who calls whom>

#### 4.2.3.5.1 Memory manager

The X-graph abstract machine model disposes infinitely many memory units, both local and shared (see Section 4.2.2.2.1). The X-compiler accesses all memory units via the XMemoryManager, which by the end of precompilation stage is already initialised (has registered all shared memory units). However, the memory manager is a stateful component of the compiler as it offers the methods for declaring and removing local memory units dynamically at the compilation time. At each time of the compilation process, the XMemoryManager can resolve the memory unit by its name. Following the C standard, local memory units have higher priority over global ones. Since C language allows usage of variables that have the same name to be declared in nested contexts, the XMemoryManager carries the stack of block-contexts for local variables.

<some picture?>

#### 4.2.3.5.2 Invocation hook manager

lambda-hook: knowledge base: InterceptionAction(BiFunction<XMemoryUnit, XMemoryUnit[], ? extends XEntity>action)

todo: kernel hooks

#### **4.2.3.5.3 Interpreter**

stateful

stacks: context, readyContext, call, BlockContext (rename): what stores, enum of states

almostReadyContexts: for computation events. Flushed by invocation . When are filled.

setting up edges depending of type of ready-context

- hierarchy of Compilers: pic on inheritance (XCompiler is an stateful abstract machine). Restrictions of lude-compilers postlude: assertion: a new event XAssertionEvent

interfaces that offer compilers: listing of interface algorithm for interpreting

#### 4.2.3.6 X-graph unroller (7)

up to bound *k* 

recursion – may be, if calls are supported

two kinds of exits

- unrolling: why we cannot encode cyclic structures. reference to the paper (see arXiv version)

simple algorithm

setting up backward edges

how we attempted, why it didn't work in general case:

why we changed the notion of unrolling bound

sth about hashcodes and collections

- After we acquired the event-based representation, we can perform some modifications/simplifications/optimisations on it (separately, allowing user to manage them)

#### 4.2.3.7 X-graph data-flow constructor (8)

- set up co, rf edges => new graph
- computing SSA maps (now: during the encoding. should be: during the post-compilation) (as one of necessary steps before encoding)

#### 4.2.3.8 Z-formula encoder (9)

Z-type: SMT-specific (say about smt-lib)

#### 4.2.3.9 SMT-formula converter (10)

#### 4.2.3.10 SMT-formula translator (11)

### 4.2.4 Program output

structure: verdict



**Figure 4.6:** Example of the flow graph from Figure  $\ref{eq:condition}$ , unwinded up to the bound k = 6

## 4.2.5 Auxiliary components

## 4.2.5.1 Watchdog timer

todo

## 4.2.5.2 Logger

todo

## Chapter 5

## **Evaluation**

tests

- 5.1 Comparison with Porthos
- 5.1.1 Unique Features
- 5.1.2 Performance
- 5.2 Comparison with HERD
- 5.2.1 Unique Features
- 5.2.2 Performance

## Chapter 6

# **Summary**

enumerate goals in past tense + justification contribution arch – framework new encoding: more laconic (compare!) + actually show size kernel

## **Bibliography**

- [AAA<sup>+</sup>17] Parosh Aziz Abdulla, Stavros Aronis, Mohamed Faouzi Atig, Bengt Jonsson, Carl Leonardsson, et al. "Stateless model checking for TSO and PSO". In: *Acta Informatica* 54.8 (2017), pp. 789–818.
- [ACM16] Jade Alglave, Patrick Cousot, and Luc Maranget. "Syntax and semantics of the weak consistency model specification language cat". In: *arXiv preprint arXiv:1608.07531* (2016).
- [AFI<sup>+</sup>09] Jade Alglave, Anthony Fox, Samin Ishtiaq, Magnus O Myreen, Susmit Sarkar, et al. "The semantics of Power and ARM multiprocessor machine code". In: *Proceedings of the 4th workshop on Declarative aspects of multicore programming*. ACM. 2009, pp. 13–24.
- [AG96] Sarita V Adve and Kourosh Gharachorloo. "Shared memory consistency models: A tutorial". In: *computer* 29.12 (1996), pp. 66–76.
- [AKN<sup>+</sup>13] Jade Alglave, Daniel Kroening, Vincent Nimal, and Michael Tautschnig. "Software verification for weak memory via program transformation". In: *European Symposium on Programming*. Springer. 2013, pp. 512–532.
- [AKN<sup>+</sup>14] Jade Alglave, Daniel Kroening, Vincent Nimal, and Daniel Poetzl. "Don't sit on the fence". In: *International Conference on Computer Aided Verification*. Springer. 2014, pp. 508–524.
- [Alg10] Jade Alglave. "A shared memory poetics". In: La Thèse de doctorat, L'université Paris Denis Diderot (2010).

- [AMM+18] Jade Alglave, Luc Maranget, Paul E McKenney, Andrea Parri, and Alan Stern. "Frightening small children and disconcerting grown-ups: Concurrency in the Linux kernel". In: *Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems*. ACM. 2018, pp. 405–418.
- [AMT14] Jade Alglave, Luc Maranget, and Michael Tautschnig. "Herding cats: Modelling, simulation, testing, and data mining for weak memory". In: *ACM Transactions on Programming Languages and Systems (TOPLAS)* 36.2 (2014), p. 7.
- [BCD<sup>+</sup>18] Roberto Baldoni, Emilio Coppa, Daniele Cono D'Elia, Camil Demetrescu, and Irene Finocchi. "A Survey of Symbolic Execution Techniques". In: *ACM Comput. Surv.* 51.3 (2018).
- [BDM13] Ahmed Bouajjani, Egor Derevenetc, and Roland Meyer. "Checking and enforcing robustness against TSO". In: *European Symposium on Programming*. Springer. 2013, pp. 533–553.
- [Ben06] Mordechai Ben-Ari. *Principles of concurrent and distributed programming*. Pearson Education, 2006.
- [BFT17] Clark Barrett, Pascal Fontaine, and Cesare Tinelli. *The SMT-LIB Standard: Version 2.6.* Tech. rep. Available at www.SMT-LIB.org. Department of Computer Science, The University of Iowa, 2017.
- [BOS<sup>+</sup>11] Mark Batty, Scott Owens, Susmit Sarkar, Peter Sewell, and Tjark Weber. "Mathematizing C++ concurrency". In: *ACM SIGPLAN Notices* 46.1 (2011), pp. 55–66.
- [CBR<sup>+</sup>01] Edmund Clarke, Armin Biere, Richard Raimi, and Yunshan Zhu. "Bounded model checking using satisfiability solving". In: Formal methods in system design 19.1 (2001), pp. 7–34.
- [CCG+00] Alessandro Cimatti, Edmund Clarke, Fausto Giunchiglia, and Marco Roveri. "NuSMV: a new symbolic model checker". In: *International Journal on Software Tools for Technology Transfer* 2.4 (2000), pp. 410–425.

- [CKN+12] Edmund M Clarke, William Klieber, Miloš Nováček, and Paolo Zuliani. "Model checking and the state explosion problem".
   In: Tools for Practical Software Verification. Springer, 2012, pp. 1–30.
- [DB08] Leonardo De Moura and Nikolaj Bjørner. "Z3: An efficient SMT solver". In: *International conference on Tools and Algorithms for the Construction and Analysis of Systems*. Springer. 2008, pp. 337–340.
- [DKW08] Vijay D'Silva, Daniel Kroening, and Georg Weissenbacher. "A Survey of Automated Techniques for Formal Software Verification". In: *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD)* 27.7 (July 2008), pp. 1165–1178.
- [Gri12] David Gries. *The science of programming*. Springer Science & Business Media, 2012.
- [Hol97] Gerard J. Holzmann. "The model checker SPIN". In: *IEEE Transactions on software engineering* 23.5 (1997), pp. 279–295.
- [Hun11] Robert Hundt. "Loop recognition in C++/Java/Go/Scala". In: *Proceedings of Scala Days* 2011 (2011), p. 38.
- [ISO11] ISO/IEC. SC22/WG14. ISO/IEC 9899: 2011. Tech. rep. Geneva, Switzerland, 2011.
- [ISO12] ISO ISO. *IEC 14882: 2011 Information technology Programming languages C++*. Tech. rep. 2012, p. 59.
- [KLS+17] Michalis Kokologiannakis, Ori Lahav, Konstantinos Sagonas, and Viktor Vafeiadis. "Effective stateless model checking for C/C++ concurrency". In: *Proceedings of the ACM on Programming Languages* 2.POPL (2017), p. 17.
- [KT14] Daniel Kroening and Michael Tautschnig. "CBMC C bounded model checker". In: *International Conference on Tools and Algorithms for the Construction and Analysis of Systems*. Springer. 2014, pp. 389–391.
- [KVY11] Michael Kuperstein, Martin Vechev, and Eran Yahav. "Partial-coherence abstractions for relaxed memory models". In: *ACM SIGPLAN Notices*. Vol. 46. 6. ACM. 2011, pp. 187–198.

- [Lam78] Leslie Lamport. "Time, clocks, and the ordering of events in a distributed system". In: *Communications of the ACM* 21.7 (1978), pp. 558–565.
- [Lam79] Leslie Lamport. "How to make a multiprocessor computer that correctly executes multiprocess program". In: *IEEE transactions on computers* 9 (1979), pp. 690–691.
- [Lan92] William Landi. "Undecidability of static analysis". In: ACM Letters on Programming Languages and Systems (LOPLAS) 1.4 (1992), pp. 323–337.
- [LPM14] Daniel Lustig, Michael Pellauer, and Margaret Martonosi. "PipeCheck: Specifying and verifying microarchitectural enforcement of memory consistency models". In: *Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture*. IEEE Computer Society. 2014, pp. 635–646.
- [LSM<sup>+</sup>16] Daniel Lustig, Geet Sethi, Margaret Martonosi, and Abhishek Bhattacharjee. "Coatcheck: Verifying memory ordering at the hardware-OS interface". In: *ACM SIGOPS Operating Systems Review* 50.2 (2016), pp. 233–247.
- [MAM+17] Paul E. McKenney, Jade Alglave, Luc Maranget, Andrea Parri, and Alan Stern. *A formal kernel memory-ordering model (part 1)*. 2017. URL: https://lwn.net/Articles/718628/.
- [MB11] Leonardo de Moura and Nikolaj Bjørner. Z3 a Tutorial. 2011.
- [McK] Paul E McKenney. Is parallel programming hard, and, if so, what can you do about it? (v2017. 01.02 a).
- [MGZ15] William Mansky, Dmitri Garbuzov, and Steve Zdancewic. "An axiomatic specification for sequential memory models". In: *International Conference on Computer Aided Verification*. Springer. 2015, pp. 413–428.
- [MPA05] Jeremy Manson, William Pugh, and Sarita V. Adve. *The Java memory model*. Vol. 40. 1. ACM, 2005.
- [MQ08] Madanlal Musuvathi and Shaz Qadeer. "Fair stateless model checking". In: *ACM SIGPLAN Notices*. Vol. 43. 6. ACM. 2008, pp. 362–371.
- [MZ09] Sharad Malik and Lintao Zhang. "Boolean satisfiability from theoretical hardness to practical success". In: *Communications of the ACM* 52.8 (2009), pp. 76–82.

- [Oak14] Scott Oaks. Java Performance: The Definitive Guide: Getting the Most Out of Your Code. "O'Reilly Media, Inc.", 2014.
- [OSS09] Scott Owens, Susmit Sarkar, and Peter Sewell. "A better x86 memory model: x86-TSO". In: *International Conference on Theorem Proving in Higher Order Logics*. Springer. 2009, pp. 391–407.
- [Par13] Terence Parr. *The definitive ANTLR 4 reference*. Pragmatic Bookshelf, 2013.
- [PFH+17a] Hernán Ponce de León, Florian Furbach, Keijo Heljanko, and Roland Meyer. "Portability Analysis for Weak Memory Models. PORTHOS: One Tool for all Models". In: *Static Analysis 24th International Symposium, SAS 2017, New York, NY, USA, August 30 September 1, 2017, Proceedings.* 2017, pp. 299–320. DOI: 10.1007/978-3-319-66706-5\_15. URL: https://doi.org/10.1007/978-3-319-66706-5\_15.
- [PFH<sup>+</sup>17b] Hernán Ponce de León, Florian Furbach, Keijo Heljanko, and Roland Meyer. "Portability Analysis for Axiomatic Memory Models. PORTHOS: One Tool for all Models". In: *CoRR* abs/1702.06704 (2017). arXiv: 1702.06704. URL: http://arxiv.org/abs/1702.06704.
- [PJ98] Jens Palsberg and C Barry Jay. "The essence of the visitor pattern". In: Computer Software and Applications Conference, 1998. COMPSAC'98. Proceedings. The Twenty-Second Annual International. IEEE. 1998, pp. 9–15.
- [Ray03] Eric S Raymond. *The art of Unix programming*. Addison-Wesley Professional, 2003.
- [Ric53] Henry Gordon Rice. "Classes of recursively enumerable sets and their decision problems". In: *Transactions of the American Mathematical Society* 74.2 (1953), pp. 358–366.
- [Sht00] Ofer Shtrichman. "Tuning SAT checkers for bounded model checking". In: *International Conference on Computer Aided Verification*. Springer. 2000, pp. 480–494.
- [SSA+11] Susmit Sarkar, Peter Sewell, Jade Alglave, Luc Maranget, and Derek Williams. "Understanding POWER multiprocessors". In: *ACM SIGPLAN Notices* 46.6 (2011), pp. 175–186.

[TW16] Oleg Travkin and Heike Wehrheim. "Verification of concurrent programs on weak memory models". In: *International Colloquium on Theoretical Aspects of Computing*. Springer. 2016, pp. 3–24.

# **Appendices**

## A.1 The ANTLR grammar for the porthos v1 input language

```
grammar Porthos;
main
: program
bool_expression
 : bool_atom
 | bool_atom BOOL_OP bool_atom

    TRUE

 | FALSE
 '('arith_expr COMP_OP arith_expr')'
'('bool_expression')'
arith_expr
: arith_atom ARITH_OP arith_atom
| arith_atom
arith_atom
 register
 '('arith_expr')'
register
: WORD
location
local
: register '<-' arith_expr
: register '<:-' location
store
: location ':=' register
: register '=' location'.''load''('
  ATOMIC ')'
 : location'.''store''('ATOMIC','
  register')'
instruction
 I sequence
   while_
 if
atom
 | load
 store
 | read
 | write
```

```
sequence
: atom ';' instruction | while_';' instruction | if ';' instruction
: 'if' bool_expression'then''{' instruction'}'
('else''{' instruction'}')?
while_
: 'while' bool_expression '{' instruction '}'
;
program
: '{' location(',' location)*'}'
    // Lexer rules:
: '_na'|'_sc'|'_rx'|'_acq'|'_rel'|'_con'
: 'mfence'|'sync'|'lwsync'|'isync';
COMP_OP : '==' | '!=' | '<=' | '<' | '>=' | '>';
ARITH_OP: '+' | '-' | '*' | '/' | '%';
BOOL_OP : 'and' | 'or';
DIGIT :[0-9];
LETTER:'a'..'z'|'A'..'Z';
TRUE : 'true' | 'True';
FALSE : 'false' | 'False';
WORD : (LETTER | DIGIT)+;
```

### A.2 File trees of Y-tree and X-graph representations



## A.3 The x86-TSO memory model defined in CAT language

```
"X86 TS0"
include "x86fences.cat"
include "filters.cat"
include "cos.cat"
(* Uniproc check *)
let com = rf | fr | co
acyclic po-loc | com
(* Atomic *)
empty rmw & (fre;coe)
(* Global happens-before *)
let po_ghb = WW(po) | RM(po)
#mfence
include "x86fences.cat"
#implied barriers
let poWR = WR(po)
let i1 = MA(poWR)
let i2 = AM(poWR)
let implied = i1 | i2
let com = rfe | fr | co
let ghb = mfence | implied | po_ghb | com
show implied
acyclic ghb as tso
```

## A.4 Public interface methods of the X-interpreter

```
public interface XInterpreter {
     XProcessId getProcessId();
     XCyclicProcess getResult();
void finishInterpretation();
     // memory manager interface methods:
XLocation declareLocation(String name, XType type);
     XRegister declareRegister(String name, XType type);
     XRegister declareTempRegister(XType type);
XLvalueMemoryUnit declareUnresolvedUnit(String name, boolean isGlobal);
     XLvalueMemoryUnit getDeclaredUnitOrNull(String name);
XRegister getDeclaredRegister(String name, XProcessId processId);
     XLocalMemoryUnit tryConvertToLocalOrNull(XEntity expression);
     \label{lem:computation} XComputationEvent tryEvaluateComputation(XEntity entity); \\ XRegister copyToLocalMemory(XSharedMemoryUnit shared); \\
     // linear interpretation methods:
     XEntryEvent emitEntryEvent();
     XExitEvent emitExitEvent();
     XBarrierEvent emitBarrierEvent(XBarrierEvent.Kind kind);
     XJumpEvent emitJumpEvent();
     XNopEvent emitNopEvent():
     XLocalMemoryEvent emitMemoryEvent(XLocalLvalueMemoryUnit destination, XLocalMemoryUnit source);
     XSharedMemoryEvent emitMemoryEvent(XLocalLvalueMemoryUnit destination, XSharedMemoryUnit source); XSharedMemoryEvent emitMemoryEvent(XSharedLvalueMemoryUnit destination, XLocalMemoryUnit source);
     {\tt XAssertionEvent\ emit Assertion Event\ (XBinary Computation Event\ assertion);}
     // non-linear interpretation methods:
     void startBlockDefinition(BlockKind blockKind);
void startBlockConditionDefinition();
     {\tt void finishBlockConditionDefinition(XComputationEvent conditionEvent);}\\
     void startBlockBranchDefinition(BranchKind branchKind);
     void finishBlockBranchDefinition();
     void finishNonlinearBlockDefinition();
void processJumpStatement(JumpKind kind);
     XEntity processMethodCall(String methodName, @Nullable XMemoryUnit receiver, XMemoryUnit... arguments);
     // create computation event without processing it:
     XComputationEvent createComputationEvent(XBinaryOperator operator,
                                                         XLocalMemoryUnit firstOperand,
XLocalMemoryUnit secondOperand);
     enum BlockKind {
          Sequential,
          Branching,
          Loop,
     enum BranchKind {
     }
     enum JumpKind {
          Break,
          Continue,
    }
```