Aalto University School of Science Master's Programme in Computer, Communication and Information Sciences

Artem YUSHKOVSKIY

# **Automated Analysis of Weak Memory Models**

Master's Thesis Espoo, ?.?.2018

Supervisor: Assoc. Prof. Keijo Heljanko

Instructor:



Aalto University School of Science

Master's Programme in Computer, Communication and Information Sciences

ABSTRACT OF
MASTER'S THESIS

**Author:** Artem YUSHKOVSKIY

Title:

Automated Analysis of Weak Memory Models

 Date:
 ?.?.2018
 Pages:
 vi + 30

 Professorship:
 Code:
 AS-116

**Supervisor:** Assoc. Prof. Keijo Heljanko

**Instructor:** 

In id fringilla velit. Maecenas sed ante sit amet nisi iaculis bibendum sed vel elit. Quisque eleifend lacus nec ipsum lobortis ornare. Nam lectus diam, facilisis eget porttitor ac, fringilla quis massa. Phasellus ac dolor sem, eget varius lacus. Sed sit amet ipsum eget arcu tristique aliquam. Integer aliquam velit sit amet odio tempus commodo. Quisque commodo lacus in leo sagittis vel dignissim quam vestibulum. Cras fringilla velit et diam dictum faucibus. Pellentesque at eros non mauris auctor euismod. Nullam convallis arcu vel lectus sollicitudin rutrum. Praesent consequat, nisl at pretium posuere, neque arcu dapibus lacus, ut sollicitudin elit velit ultricies libero. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia Curae;

Nulla semper hendrerit molestie. Pellentesque blandit velit sit amet est vestibulum faucibus. Nullam massa turpis, venenatis non mollis fringilla, mattis et diam. Fusce molestie convallis elementum. Morbi nec lacus dapibus arcu mollis gravida. Aliquam erat volutpat. Nam vitae magna nunc. Nunc ut ipsum at massa porttitor vestibulum. Praesent diam lorem, ultrices nec vestibulum id, volutpat nec lacus.

**Keywords:** Thesis template, master's thesis

**Language:** English



# Aalto University School of Science

???

| Author:         | Artem YUSHKOVSKIY           |        |         |
|-----------------|-----------------------------|--------|---------|
| Title:          |                             |        |         |
| Date:           | ?.?.2018                    | Pages: | vi + 30 |
| Professorship:  | ?                           | Code:  | AS-116  |
| Supervisor:     | Assoc. Prof. Keijo Heljanko |        |         |
| Instructor:     |                             |        |         |

Cras tincidunt bibendum erat, vel tincidunt diam porttitor aliquam. Donec sit amet urna non felis placerat pharetra. Aenean ultrices facilisis nulla vitae semper. Nullam non libero quis dui fermentum aliquam id vel eros. Praesent elementum tortor quis sem congue iaculis sit amet eget nisl. Quisque erat tortor, condimentum eu volutpat et, blandit et augue. Phasellus erat turpis, pretium non feugiat id, posuere id velit. Vestibulum ut sapien felis, quis convallis dui.

In elementum est eu nulla hendrerit feugiat. In sodales diam vel lacus cursus tincidunt. Morbi nibh dui, imperdiet non vestibulum non, dignissim id risus. Sed sollicitudin neque lectus, porttitor sollicitudin elit. Nulla facilisi. Nullam in ante eu mi suscipit sollicitudin. Sed est velit, gravida facilisis varius eget, tempus sed urna. Aliquam erat volutpat. Nam semper condimentum nisi. Nullam scelerisque, metus nec sodales vulputate, purus augue venenatis urna, sit amet mattis turpis nisl ac metus. Mauris nec odio ut neque condimentum vulputate vel in turpis. Nulla facilisi. Nulla id tellus sapien, vitae blandit lorem.

| Keywords:  | Master's thesis template |
|------------|--------------------------|
| Language:  | English                  |

# Acknowledgements

In id fringilla velit. Maecenas sed ante sit amet nisi iaculis bibendum sed vel elit. Quisque eleifend lacus nec ipsum lobortis ornare. Nam lectus diam, facilisis eget porttitor ac, fringilla quis massa. Phasellus ac dolor sem, eget varius lacus. Sed sit amet ipsum eget arcu tristique aliquam. Integer aliquam velit sit amet odio tempus commodo. Quisque commodo lacus in leo sagittis vel dignissim quam vestibulum. Cras fringilla velit et diam dictum faucibus. Pellentesque at eros non mauris auctor euismod. Nullam convallis arcu vel lectus sollicitudin rutrum. Praesent consequat, nisl at pretium posuere, neque arcu dapibus lacus, ut sollicitudin elit velit ultricies libero. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia Curae;

Espoo, ?.?.2018

Artem YUSHKOVSKIY

# **Abbreviations**

LI Lorem Ipsum

ABC Quisque et mi lacus, nec porta ante.
DEF Proin pellentesque accumsan laoreet

# **Contents**

- Abbreviations
- 1 Introduction
  - 1.1 Thesis structure
- 2 Memory model-aware analysis
  - 2.1 The event-based program representation
    - 2.1.1 Events
    - 2.1.2 Relations
    - 2.1.3 Executions
  - 2.2 The cat language
- 3 Portability analysis as an SMT problem
  - 3.1 Model checking and reachability analysis
  - 3.2 Portability analysis as a bounded reachability problem
    - 3.2.1 Encoding for the control-flow
    - 3.2.2 Encoding for the data-flow
    - 3.2.3 Encoding for the memory model
- 4 The porthos2: implementation
  - 4.1 Requirements
  - 4.2 Program Components
    - 4.2.1
  - 4.3 The Y-tree: an AST
  - 4.4 Compiling the Y-tree to the X-graph
    - 4.4.1 Pre-compilation
    - 4.4.2
    - 4.4.3 Post-compilation transformations
    - 4.4.4 Input language parser
  - 4.5 XGraph to ZFormula (SMT) encoder
  - 4.6 Optimisations
- 5 Evaluation
  - 5.1 Comparison with PORTHOS
    - 5.1.1 Unique Features
    - 5.1.2 Performance
  - 5.2 Comparison with HERD
    - 5.2.1 Unique Features
    - 5.2.2 Performance
- 6 Summary
- Bibliography
- Appendices
  - A The ANTLR grammar of the input language of porthos v1

# Chapter 1

# Introduction

Most modern computer systems contain large parts that operate concurrently. Although parallelising a system can improve its performance drastically, it introduces numerous problems of correctness, robustness and reliability, which makes concurrent program design one of the most difficult problems in programming [McK17].

Traditionally, studies of concurrent programming have concentrated on fundamental theoretical questions: the design of race-free and lock-free parallel algorithms, asynchronous data structures and the synchronisation primitives of a programming language. Unfortunately, for real-world concurrent programs the algorithmic level of abstraction is not enough to guarantee correctness and reliability. The reason lies in the code optimisations that both the compiler and the hardware perform in order to increase performance as much as possible. For instance, Figure 1.1 gives a simple example in which the state '(0:EAX=0 /\ 1:EAX=0)' is reachable on x86 machines (such small examples that illustrate a specific behaviour of a WMM are called *litmus tests*). This state is allowed because in the x86 architecture each processor may hold a write to a shared memory variable in its local write buffer, so that the write does not become visible to other processes immediately. In the example, the write 'MOV [x], 1' performed by process P0 stores the value 1 for the shared variable [x] in the write buffer of P0. Meanwhile, neither the write buffer of process P1 nor the main memory may hold the updated value of [x], so the read 'MOV EAX, [x]' performed by process P1 may return the initial value 0 even though the variable has already been updated in another thread. These problems have led to the need to formalise the semantics of memory operations in different concurrent architectures, which are defined by *weak memory models (WMM)*.

| { x=0; y=0; }               |             |
|-----------------------------|-------------|
| P0                          | P1          |
| MOV [x],1                   | MOV [y],1   |
| MOV EAX,[y]                 | MOV EAX,[x] |
| exists (0:EAX=0 /\ 1:EAX=0) |             |
| x86-TSO: allow              |             |

**Figure 1.1:** Store buffering (SB): a litmus test on write-read reordering, allowed under the x86-TSO memory model and forbidden under the SC memory model
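The store-buffering behaviour can be reproduced with a small state-space exploration. The following is a minimal sketch, not the semantics used by any of the tools discussed here: it assumes a simplified operational machine in which each thread owns a FIFO write buffer that may be flushed to memory at any time, and the names `PROGRAM` and `final_registers` are introduced only for this illustration.

```python
# Two-thread store-buffering (SB) litmus test.
# Each instruction is ("W", location, value) or ("R", location).
PROGRAM = (
    (("W", "x", 1), ("R", "y")),  # P0: MOV [x],1 ; MOV EAX,[y]
    (("W", "y", 1), ("R", "x")),  # P1: MOV [y],1 ; MOV EAX,[x]
)


def final_registers(use_store_buffers):
    """Return the set of reachable final (P0:EAX, P1:EAX) pairs."""
    init = ((0, 0), (("x", 0), ("y", 0)), ((), ()), (None, None))
    seen, stack, results = {init}, [init], set()
    while stack:
        pcs, mem, bufs, regs = stack.pop()
        memory = dict(mem)
        successors = []
        for t in (0, 1):
            if bufs[t]:  # flush the oldest buffered write to memory
                loc, val = bufs[t][0]
                new_mem = dict(memory)
                new_mem[loc] = val
                new_bufs = list(bufs)
                new_bufs[t] = bufs[t][1:]
                successors.append(
                    (pcs, tuple(sorted(new_mem.items())), tuple(new_bufs), regs))
            if pcs[t] < len(PROGRAM[t]):  # execute the next instruction
                instr = PROGRAM[t][pcs[t]]
                new_pcs = list(pcs)
                new_pcs[t] += 1
                new_pcs = tuple(new_pcs)
                if instr[0] == "W" and use_store_buffers:
                    new_bufs = list(bufs)
                    new_bufs[t] = bufs[t] + ((instr[1], instr[2]),)
                    successors.append((new_pcs, mem, tuple(new_bufs), regs))
                elif instr[0] == "W":
                    new_mem = dict(memory)
                    new_mem[instr[1]] = instr[2]
                    successors.append(
                        (new_pcs, tuple(sorted(new_mem.items())), bufs, regs))
                else:
                    # a read sees the newest own buffered write, else memory
                    val = memory[instr[1]]
                    for loc, buffered in bufs[t]:
                        if loc == instr[1]:
                            val = buffered
                    new_regs = list(regs)
                    new_regs[t] = val
                    successors.append((new_pcs, mem, bufs, tuple(new_regs)))
        if not successors:  # both threads finished and all buffers drained
            results.add(regs)
        for s in successors:
            if s not in seen:
                seen.add(s)
                stack.append(s)
    return results


sc = final_registers(use_store_buffers=False)
tso = final_registers(use_store_buffers=True)
print(sorted(sc))         # outcomes without buffering (SC-like)
print(sorted(tso - sc))   # outcomes added by the write buffers
```

Under these assumptions, the only outcome that buffering adds to the sequentially consistent ones is the pair (0, 0), which is the state queried by the `exists` clause of the litmus test.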

Research on weak memory models aims, first of all, to develop a formal approach to understanding programs with respect to weak memory models that is systematic, sound and complete. The first (and so far the only) such framework was presented in 2010 [Alg10]. In addition to developing this theoretical basis, researchers work on extracting the WMMs of hardware architectures from existing implementations or from their specifications, which are written in natural language and thus suffer from ambiguities and incompleteness. Over the last decade, memory models have been defined for most mainstream multiprocessor architectures: the x86-TSO and Sparc-TSO (for *Total Store Order*) models for the x86 and Sparc architectures, formalised in 2009 [OSS09], the much more relaxed memory models for the Power and ARM architectures [AMT14] [SSA+11] [AFI+09], and others. There are also projects on validating hardware architectures with respect to a memory model, e.g. [LSM+16] [LPM14].

Most modern high-level programming languages rely on relaxed memory models as well. The memory model for Java, introduced in J2SE 5.0 in 2004 [MPA05], is based on the *happens-before* principle [Lam78]; the C++11 standard [ISO12] introduced a set of hardware-independent synchronisation fences and atomic operations, whereas the C++17 memory model [BOS+11] is based on the *strongly happens-before* relation. Weak memory models are being formalised for even more abstract software environments; a notable project in this area is the formalisation of the Linux kernel memory model, which is under active development [MAM+17]. Furthermore, there is a wide range of tools that perform program verification with respect to memory models (see [AKN+13], [LFH+17]).

The first memory model for concurrent systems was formulated by Leslie Lamport in 1979 [Lam79]. This memory model, called *sequential consistency (SC)*, allows only those executions (interleavings) that produce the same result as if the operations had been executed by a single process. The SC model requires a write to a shared variable performed in one process to become visible to all other processes simultaneously, although not necessarily instantly. This means that each process communicates with the shared memory directly, without local buffering. Another important requirement of the SC memory model is that it forbids reordering of memory operations within a single process: the order of operations executed by a process is strictly defined by the program it executes.

The SC model is considered a strong memory model in the sense that it provides strong guarantees regarding the ordering and the effects of memory operations. Different relaxations of this model lead to the class of *weak memory models (WMM)*. They specify how threads interact through shared memory, when a write becomes visible to other threads and what value a read can return. WMMs therefore serve as a set of guarantees that the designers of an execution environment (hardware, programming language, compiler, database, operating system, etc.) give to programmers about which behaviours of their concurrent code they may expect.

Although the study of weak memory is a rather young research area, there exist frameworks and tools for exploring WMMs and examining simple programs with respect to them. The state-of-the-art tool is diy (for *do it yourself*)<sup>1</sup>, a software suite for designing and testing weak memory models developed by researchers from INRIA, France, and the University of Cambridge, UK. It was first released in 2010, and since then it has remained the only tool for testing weak memory models. The diy suite consists of several modules: the litmus test generators diy, diycross and diyone; the concrete litmus test executor litmus, which runs tests on a physical machine and collects the observed behaviours; and the weak memory model simulator herd, which implements reachability analysis for exploring the states reachable under a specified WMM.

All the diy tools work with a single memory model at a time. In practice, however, we face serious engineering problems that require modelling more than one execution environment. One of these problems is the *portability* of a program from one hardware architecture to another. A program written in a high-level language is compiled for different hardware. Even if all compiler optimisations were disabled (which is rare nowadays), the behaviour of two compiled versions of the same program may differ due to differences between the hardware memory models. As a result, a program compiled for the target platform T may reach states that are unreachable on the source platform S, which is a *portability bug* from S to T [LFH+17].

<sup>1</sup> Project web site: http://diy.inria.fr/

The first tool that performs WMM-aware portability analysis is porthos<sup>2</sup>, introduced in April 2017 [LFH+17]. This tool reduces the portability problem to a bounded reachability problem, which can be solved with the help of an SMT solver. The approach captures symbolically the semantics of the analysed program and of both weak memory models in a single SMT formula, augmented by the reachability assertion. As most modern SMT solvers are efficient enough to handle state spaces with millions of variables bounded by millions of constraints [MZ09], the method is applicable to real-world problems.

The current work aims to rework the proof-of-concept tool porthos by extending its input language, which currently covers only a minimal subset of C, and by revising the general architecture of the tool in order to enhance performance, reliability and maintainability. As the general architecture and almost all components of porthos have been redesigned, the tool has received a new name: porthos2<sup>3</sup>. With these architectural enhancements, porthos2 is a generalised framework for SMT-based memory model-aware analysis, which can not only perform portability analysis but also serve as a basis for other kinds of static code analysis.

#### 1.1 Thesis structure

The thesis is organised as follows. Chapter 2 gives a general view of weak memory model-aware analysis. Chapter ...

<sup>2</sup> Project web site: http://github.com/hernanponcedeleon/PORTHOS

<sup>3</sup> Hereinafter, the name 'porthos' refers to version 1 of the tool (also called porthos v1), whereas the new version is called porthos2.

# Chapter 2

# Memory model-aware analysis

In general, the analysis of concurrent programs with respect to axiomatic memory models is performed in several stages. First, the control-flow and data-flow of a program are encoded as the set of possible *candidate executions*. The resulting model of the program is called an *anarchic semantics*: a truly parallel semantics with no global time that describes all possible computations with all possible communications [ACM16]. Thereafter, the anarchic semantics is constrained by the *weak memory model* specification, a set of axiomatic constraints that filter out the executions inconsistent with a particular architecture.

# 2.1 The event-based program representation

The classical approach to modelling concurrent programs is to use *global time*, a single order of interleavings among all events that happen in different threads. Although such models are easy to understand, it may be infeasible to process *all* possible states, whose number grows exponentially. However, there exist equivalence classes of interleavings such that executing any interleaving from a single class yields the same result (for instance, computations performed locally by a processor do not affect the global state). One model that exploits this is the *event-based* representation, which models a program as a directed graph of events (the *event-flow graph*). The vertices of such a graph represent *events* (independent low-level instructions; see Section 2.1.1), and the edges represent *relations* over the events (see Section 2.1.2).

#### **2.1.1** Events

A memory event $e_m \in \mathbb{E}$ represents an access to memory. Since memory is the crucial low-level resource shared by multiple processes, most relations are defined over memory events. A process can access a shared memory location (denoted by $l_i$, for location) or a local one (denoted by $r_i$, for register). A memory event can access at most one shared memory location; high-level instructions that address more than one shared variable must be transformed into a sequence of events. A memory event is specified by its direction with respect to the shared variable (read or write), its location $loc(e_m)$, its processor label $proc(e_m)$, and a unique event label $id(e_m)$ [Alg10].

The set of memory events $\mathbb{M}$ is divided into write events $\mathbb{W}$ (which write values to shared-memory locations) and read events $\mathbb{R}$ (which read values stored in shared-memory locations). Because each memory event uses at most one shared location, the write instruction $i = \operatorname{write}(l_1, l_2)$, which copies the value of the shared location $l_2$ to the shared location $l_1$, is represented as two consecutive events $e_1 = \operatorname{load}(r_1 \leftarrow l_2)$; $e_2 = \operatorname{store}(l_1 \leftarrow r_1)$. It is also important to single out the set of initial write events $\mathbb{IW} \subset \mathbb{W}$, which initialise program variables.

A computation event $e_c \in \mathbb{C} \subseteq \mathbb{E}$ represents a low-level assembly computation performed solely on local-memory arguments. An example is the event $e_c = r_1 \leftarrow \operatorname{add}(r_2, 1)$, which writes the sum of the value stored in register $r_2$ and the constant 1 (which is modelled as a register as well) to the register $r_1$. To model branching statements, we distinguish the set $\mathbb{C}_1 \subseteq \mathbb{C}$ of *predicative* computation events (also called *branching events*), which evaluate to a boolean value.

Synchronisation instructions (fences) give rise to *barrier events*, which do not perform any computation or memory value transfer; instead, they add new relations to the program model that restrict the set of allowed behaviours. Functionally, a fence may be a synchronisation barrier, an instruction that flushes the local memory caches, etc.
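The attributes of a memory event and the lowering of a two-location write can be sketched as a small data structure. This is an illustrative assumption, not the representation used in porthos2; the names `MemoryEvent` and `lower_write` and the temporary register `r1` are hypothetical.

```python
from dataclasses import dataclass
from itertools import count

_next_id = count()  # source of unique event labels id(e)


@dataclass(frozen=True)
class MemoryEvent:
    direction: str  # "R" (read) or "W" (write)
    loc: str        # the single shared location accessed, loc(e)
    reg: str        # the local register involved
    proc: int       # processor label, proc(e)
    id: int         # unique event label, id(e)


def lower_write(proc, dst, src, tmp="r1"):
    """Lower write(dst, src) on two shared locations into load; store."""
    load = MemoryEvent("R", src, tmp, proc, next(_next_id))
    store = MemoryEvent("W", dst, tmp, proc, next(_next_id))
    return [load, store]


events = lower_write(proc=0, dst="l1", src="l2")
for e in events:
    print(e)
```

Each resulting event touches exactly one shared location, which is the restriction that the event-based representation relies on.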

#### 2.1.2 Relations

A relation $\stackrel{r}{\to} \subseteq \mathbb{E} \times \mathbb{E}$ is a binary relation over events (a set of pairs of events). There are two kinds of relations between events: *basic relations*, which capture the semantics of the program, and *derived relations*, which are defined in the weak memory model specification from the basic relations and events.

Constraints over relations that are specified by weak memory models are defined as requirements of acyclicity, irreflexivity or emptiness of specific relations [ACM16].

The basic relations are the following [Alg10]:

- The *control-flow* of a program is defined by the *program-order* relation $\text{po} \subset \mathbb{E} \times \mathbb{E}$, which totally orders the events of the same process. For instance, if the instruction $i_1$ generates the event $e_1$, and the instruction $i_2$ follows $i_1$ and generates the event $e_2$, then $e_1 \stackrel{\text{po}}{\rightarrow} e_2$.
- The *data-flow* of a program is defined by the communication relations:
  - the *read-from* relation $\text{rf} \subset \mathbb{W} \times \mathbb{R}$, which relates each write event to the read events that read its value;
  - the *coherence order* relation $\text{co} \subset \mathbb{W} \times \mathbb{W}$, which defines a total order on the writes to the same location across all processes (also called the *write serialisation*, or ws-relation).
- Events from the same process are related by the *scope relation* $\text{sr} \subset \mathbb{E} \times \mathbb{E}$. In contrast to the herd tool, porthos2 does not use a hierarchy of scopes (depicted as a scope tree); instead, it uses simple labels that indicate which process produced a given event.

Below we enumerate some derived relations [Alg10]:

- the *from-read* relation $\text{fr} \subset \mathbb{R} \times \mathbb{W}$, which relates a read to the writes that follow, in coherence order, the write from which the read takes its value: $r \stackrel{\text{fr}}{\to} w = (\exists w'.\ w' \stackrel{\text{rf}}{\to} r \land w' \stackrel{\text{co}}{\to} w)$;
- the *data dependency* relation dp, a subset of the po-relation that always has a read at its source (it connects a read to a subsequent write that depends on it);
- the *external* and *internal read-from* relations, which restrict the rf-relation to pairs of events from different and, respectively, the same processes;
- the po-loc relation, which is the po-relation restricted to events that access the same shared variable: $m_1 \stackrel{\text{po-loc}}{\to} m_2 = (m_1 \stackrel{\text{po}}{\to} m_2 \wedge \text{loc}(m_1) = \text{loc}(m_2))$;
- the semantics of fences and barriers specific to different architectures, which may also be defined as derived relations.

The work [Alg10, Chapter 2] defines the basic properties of relations, such as *reflexivity* and *irreflexivity*, *transitivity* and *transitive closure*, and *acyclicity*. Weak memory models then make assertions over these properties, thus restricting the set of allowed behaviours of the system.
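To make the derived relations concrete, the sketch below computes fr, po-loc and the external and internal read-from relations from explicitly given basic relations. The execution and the event names (`iw`, `w`, `r0`, `wy`, `r1`) are hypothetical, relations are modelled as plain sets of pairs, and treating a read from an initial write as external is a convention assumed here.

```python
# Hypothetical execution: iw initialises x; P0 issues w (W x=1), r0 (R x)
# and wy (W y=1); P1 issues r1 (R x). r0 reads from w, r1 reads from iw.
proc = {"iw": None, "w": 0, "r0": 0, "wy": 0, "r1": 1}  # None: initial write
loc = {"iw": "x", "w": "x", "r0": "x", "wy": "y", "r1": "x"}
po = {("w", "r0"), ("r0", "wy"), ("w", "wy")}
rf = {("w", "r0"), ("iw", "r1")}
co = {("iw", "w")}


def from_read(rf, co):
    """r fr w  iff  some w' satisfies w' rf r and w' co w."""
    return {(r, w) for (w1, r) in rf for (w2, w) in co if w1 == w2}


def po_loc(po, loc):
    """Program order restricted to events on the same location."""
    return {(a, b) for (a, b) in po if loc[a] == loc[b]}


def split_rf(rf, proc):
    """Split rf into its external and internal parts (rfe, rfi)."""
    rfe = {(w, r) for (w, r) in rf if proc[w] != proc[r]}
    return rfe, rf - rfe


fr = from_read(rf, co)
rfe, rfi = split_rf(rf, proc)
print(fr, po_loc(po, loc), rfe, rfi)
```

In this execution the only fr edge goes from `r1` to `w`: the read `r1` takes its value from the initial write, which is overwritten by `w` in coherence order.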

#### 2.1.3 Executions

A *candidate execution* is a path in the event-flow graph, defined by the po- and rf-relations and the set of final writes to a given memory location, that is valid under a certain memory model [AMT14]. Figure 2.1 illustrates the four possible candidate executions of the litmus test in Figure 1.1 (the pictures were generated by the herd7 tool, version 7.47). Since there are no conditional jumps, the po-relation is fixed and does not need to be guessed. Since each thread performs a single write followed by a single read, the co-relation is also fixed (it relates the initial write event to the write event to the same location). Thus, there are only four possible executions, determined by the choice of the rf-relation. The candidate executions pictured in Figures 2.1a–2.1c are consistent both under the strong memory model SC and under relaxed memory models such as x86-TSO, Power and ARM. The execution shown in Figure 2.1d, however, is consistent under the relaxed-memory architectures but inconsistent under SC, which forbids cycles over fr $\cup$ po.



**Figure 2.1:** Possible candidate executions for the litmus test in Figure 1.1
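The enumeration of candidate executions for the store-buffering test can be sketched in a few lines. The sketch assumes that SC consistency is checked as acyclicity of the union of po, rf, co and fr; the event names (`ix`, `iy`, `a`, `b`, `c`, `d`) and helper functions are introduced only for this illustration.

```python
from itertools import product

# Events of the SB test: initial writes ix, iy; P0 issues a (W x=1) then
# b (R y); P1 issues c (W y=1) then d (R x). Event names are illustrative.
PO = {("a", "b"), ("c", "d")}
CO = {("ix", "a"), ("iy", "c")}
RF_CHOICES = {"b": ("iy", "c"), "d": ("ix", "a")}  # possible sources per read


def is_acyclic(edges):
    """Kahn-style check: repeatedly remove nodes without incoming edges."""
    nodes = {n for edge in edges for n in edge}
    remaining = set(edges)
    while nodes:
        sources = {n for n in nodes if all(dst != n for _, dst in remaining)}
        if not sources:
            return False  # every remaining node has a predecessor: a cycle
        nodes -= sources
        remaining = {(s, d) for s, d in remaining if s not in sources}
    return True


def candidate_executions():
    """Yield (rf, fr) for every choice of a source write per read."""
    reads = sorted(RF_CHOICES)
    for writes in product(*(RF_CHOICES[r] for r in reads)):
        rf = set(zip(writes, reads))
        fr = {(r, w2) for w1, r in rf for w, w2 in CO if w == w1}
        yield rf, fr


executions = list(candidate_executions())
sc_consistent = [rf for rf, fr in executions if is_acyclic(PO | rf | CO | fr)]
print(len(executions), len(sc_consistent))
```

Of the four candidates, the one in which both reads take their values from the initial writes contains the cycle a → b → c → d → a over po and fr and is therefore rejected by the SC check.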

# 2.2 The cat language

Weak memory models are defined in the cat language [ACM16], a domain-specific language for describing consistency properties of concurrent programs. The cat language combines the expressive power of a functional language (it is inspired by OCaml and adopts its types, first-class functions, pattern matching and other features) with types, expressions and assertions that are specific to operating on relations and executions.

The derived relations can be defined via the keyword let and the following operations over relations [ACM16]:

- the union of two relations r1 and r2 is r1 | r2
- the intersection of two relations r1 and r2 is r1 & r2
- the difference of two relations r1 and r2 is r1 \ r2
- the sequence of two relations r1 and r2 is r1;r2<sup>1</sup>

For instance, the fr-relation is defined as $fr = (rf^{-1}; co)$. Figure 2.2 shows the part of the x86-TSO model [OSS09] that asserts the acyclicity of the union of the po-loc relation and the communication relation:

```
let com = rf | fr | co
let po-loc = po & loc
acyclic po-loc | com
```

**Figure 2.2:** Excerpt from the x86-TSO memory model in cat language
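The relational operators of cat map directly onto set operations, which makes the excerpt easy to evaluate by hand or in a few lines of code. The sketch below is an assumption-laden illustration rather than the way herd or porthos2 evaluate cat models: relations are Python sets of pairs, `loc` is taken to be the same-location relation, and the event names are hypothetical.

```python
def seq(r1, r2):
    """Sequence r1;r2: pairs (x, y) linked through an intervening z."""
    return {(x, y) for (x, z1) in r1 for (z2, y) in r2 if z1 == z2}


def inverse(r):
    return {(y, x) for (x, y) in r}


def acyclic(r):
    """A relation is acyclic iff its transitive closure is irreflexive."""
    closure = set(r)
    while True:
        extended = closure | seq(closure, r)
        if extended == closure:
            break
        closure = extended
    return all(x != y for (x, y) in closure)


def check_excerpt(po, loc, rf, co):
    fr = seq(inverse(rf), co)     # fr = rf^-1 ; co
    com = rf | fr | co            # let com = rf | fr | co
    po_loc = po & loc             # let po-loc = po & loc
    return acyclic(po_loc | com)  # acyclic po-loc | com


# One location x: initial write iw, then P0 issues w (W x=1) and r (R x).
events = ("iw", "w", "r")
loc = {(a, b) for a in events for b in events}  # all events access x
po = {("w", "r")}
co = {("iw", "w")}

fresh = check_excerpt(po, loc, rf={("w", "r")}, co=co)   # r reads from w
stale = check_excerpt(po, loc, rf={("iw", "r")}, co=co)  # r reads from iw
print(fresh, stale)
```

The execution in which the read returns the stale initial value, although the same thread has already written to the location, produces the cycle w → r → w over po-loc and fr and is rejected by the assertion.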

<sup>1</sup> The sequence of two relations r1 and r2 is the set of pairs (x, y) for which there exists an intervening z such that $(x,z) \in r1$ and $(z,y) \in r2$.

# Chapter 3

# Portability analysis as an SMT problem

As discussed in Chapter 1, a program may behave differently when compiled for different parallel hardware architectures. This may cause portability bugs: behaviours that are forbidden under one architecture but allowed under another. In this chapter, we describe the general task of analysing the portability of concurrent software as a *bounded reachability* problem, which in turn can be reduced to a SAT problem (more precisely, to an SMT problem) [LFH+17].

### 3.1 Model checking and reachability analysis

Model checking is the problem of verifying a system (the model) against a set of constraints (the specification). As the state machine is the most widespread mathematical model of computation, most classical model checking algorithms explore the state space of a system in order to find states that violate the specification. The general scheme of model checking is as follows. First, the analysed system is represented as a transition system: a finite directed graph whose labelled nodes represent the states of the system, where each state corresponds to a unique subset of atomic propositions that characterise its behavioural properties. Then, the system constraints are defined in a modal temporal logic over the atomic propositions. Commonly, Linear Temporal Logic (LTL) or Computation Tree Logic (CTL), along with their extensions, are used as specification languages owing to the expressiveness and verifiability of their statements. In this scheme, the model checking problem is reducible to reachability analysis, an iterative process of systematic exhaustive search in the state space. This approach is called *unbounded model checking (UMC)*.

However, all model checking techniques are subject to the *state explosion problem*: the size of the state space grows exponentially with the number of state variables of the system. When modelling concurrent systems, the problem becomes even more severe because of the exponential number of possible interleavings. Therefore, research in model checking over the past 40 years has been aimed at tackling the state explosion problem, mostly by optimising the search space, the search strategy or the basic data structures of existing algorithms.

One of the first techniques that reduced the search space considerably was symbolic model checking with binary decision diagrams (BDDs). Instead of processing each state individually, this approach represents a set of states by a BDD, an efficient data structure for performing operations on large boolean formulas [CKN+12]. The size of a BDD can be linear in the number of variables it encodes if the variable ordering is optimal; otherwise it can be exponential. Finding such an optimal ordering is known to be an NP-complete problem, which makes this approach inapplicable in some cases.

Another idea is to use satisfiability solvers for the symbolic exploration of the state space [CBR+01]. In this approach, the state space exploration consists of a sequence of queries to a SAT solver, represented as boolean formulas that encode the constraints of the model and a finite path to a state in the corresponding transition system. This technique is called *bounded model checking (BMC)*, because the search is repeated up to a user-defined bound k, which in general may result in an incomplete analysis. However, there exist numerous techniques for making BMC complete for finite-state systems (e.g., [Sht00]).

# 3.2 Portability analysis as a bounded reachability problem

In general, a BMC problem examines the reachability of "undesirable" states of a finite-state system. Let $\vec{x} = (x_1, x_2, ..., x_n)$ be a vector of n variables that uniquely distinguishes the states of the system; let $Init(\vec{x})$ be an initial-state predicate that defines the set of initial states of the system; let $Trans(\vec{x}, \vec{x}')$ be a transition predicate that signifies whether the transition from state $\vec{x}$ to state $\vec{x}'$ is valid; let $Bad(\vec{x})$ be a bad-state predicate that defines the set of undesirable states. Then the BMC problem, stated as the reachability of an undesirable state within k steps, is formulated as follows: $SAT(Init(\vec{x_0}) \land Trans(\vec{x_0}, \vec{x_1}) \land \cdots \land Trans(\vec{x_{k-1}}, \vec{x_k}) \land Bad(\vec{x_k}))$.
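The shape of the BMC formula can be illustrated on a toy system. In the sketch below, a brute-force search over all paths of length k stands in for the SAT solver; the system (a counter over four states that may either stay in place or advance by one) and the predicate definitions are hypothetical and chosen only to keep the example small.

```python
from itertools import product

STATES = range(4)  # a single state variable x with four possible values


def init(x):
    return x == 0


def trans(x, y):
    # the counter may either stay in place or advance by one (mod 4)
    return y == x or y == (x + 1) % 4


def bad(x):
    return x == 3


def bmc(k):
    """Return a path x_0..x_k satisfying Init, k Trans steps and Bad, or None."""
    for path in product(STATES, repeat=k + 1):
        if (init(path[0])
                and all(trans(a, b) for a, b in zip(path, path[1:]))
                and bad(path[k])):
            return path
    return None


for k in range(5):
    print(k, bmc(k))
```

Because the counter is allowed to stay in place, a bad state that needs three steps is also found for every larger bound, which matches the reading of the formula as reachability within k steps. For bounds below three the search returns no witness.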

The portability analysis problem may also be stated as a reachability problem, where an undesirable state is a state that is reachable under the target memory model $\mathcal{M}_{\mathcal{T}}$ and unreachable under the source memory model $\mathcal{M}_{\mathcal{S}}$. However, unlike the BMC problem, portability analysis does not require calling the SMT solver repeatedly, since (imperative) programs may be converted to an acyclic state graph (by reducing the loops, see Section ??) and the Trans predicate may be stated only for the final state of a program.

Let the function  $cons_{\mathcal{M}}(P)$  compute the set of executions of program P that are consistent under the memory model  $\mathcal{M}$ . Then, the program P is called portable from the source architecture (memory model)  $\mathcal{M}_{\mathcal{S}}$  to the target architecture  $\mathcal{M}_{\mathcal{T}}$  if all executions consistent under  $\mathcal{M}_{\mathcal{T}}$  are consistent under  $\mathcal{M}_{\mathcal{S}}$  [LFH<sup>+</sup>17]:

**Definition 3.2.1** (Portability). Let  $\mathcal{M}_{\mathcal{S}}$ ,  $\mathcal{M}_{\mathcal{T}}$  be two weak memory models. A program P is portable from  $\mathcal{M}_{\mathcal{S}}$  to  $\mathcal{M}_{\mathcal{T}}$  if  $cons_{\mathcal{M}_{\mathcal{T}}}(P) \subseteq cons_{\mathcal{M}_{\mathcal{S}}}(P)$ .

Note that defining portability over *executions* is strong enough, as it implies portability over *states* (the *state-portability*) [LFH<sup>+</sup>17]. The resulting SMT formula  $\phi$  of the portability problem should contain the encodings of both the control-flow  $\phi_{CF}$  and the data-flow  $\phi_{DF}$  of the program, together with the assertions for both memory models:  $\phi = \phi_{CF} \wedge \phi_{DF} \wedge \phi_{\mathcal{M}_T} \wedge \phi_{\neg\mathcal{M}_S}$ , where  $\phi_{\mathcal{M}_T}$  asserts that the execution is consistent under the target model and  $\phi_{\neg\mathcal{M}_S}$  asserts that it is inconsistent under the source model. If the formula is satisfiable, there exists a portability bug.
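Definition 3.2.1 can be read operationally as a set inclusion check. The sketch below assumes that the sets of consistent executions are already available as finite Python sets, which the SMT-based approach avoids computing explicitly. The function names and the sample data (executions abstracted by a pair of final register values, loosely modelled on a store-buffering litmus test) are illustrative assumptions.

```python
def portability_bugs(cons_target, cons_source):
    """Executions consistent under the target model but not under the source model."""
    return set(cons_target) - set(cons_source)

def is_portable(cons_target, cons_source):
    # Definition 3.2.1: cons_T(P) must be a subset of cons_S(P)
    return not portability_bugs(cons_target, cons_source)

# Illustrative data only: executions are abstracted by final register values (r1, r2).
cons_source = {(0, 1), (1, 0), (1, 1)}          # assumed stronger source model
cons_target = {(0, 0), (0, 1), (1, 0), (1, 1)}  # assumed weaker target model
```

With this data, `portability_bugs(cons_target, cons_source)` contains the single execution `(0, 0)`, which plays the role of a satisfying assignment of the formula $\phi$.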

### 3.2.1 Encoding for the control-flow

The control-flow of a program is represented by the *control-flow graph*, a directed acyclic connected graph with a single source and multiple sink nodes, obtained by *loop unrolling* (see Section ??). In the control-flow graph, there are two types of transitions (edges): *primary transitions*, which denote unconditional jumps or if-true-transitions (pictured with solid lines), and *alternative transitions*, which denote if-false-transitions (pictured with dotted lines). Each node in the graph can have either one successor (primary) or two successors (both primary and alternative); only computation events can serve as branching points. However, each merge node can have any positive number of predecessors, where each edge may be either primary or alternative.

While working on porthos2, we modified the encoding scheme for the control-flow. The changes are motivated by the need to process an arbitrary control-flow produced by the conditional and unconditional jumps of the C language. For that, we compile the recursive abstract syntax tree (AST) of the parsed C code into a plain (non-recursive) event-flow graph. We show that the new encoding is smaller than the old one used in porthos, since it does not produce new variables for each high-level statement of the input language. For instance, porthos uses an encoding scheme where the control-flow of the sequential instruction  $i_1 = i_2; i_3$  is encoded as  $\phi_{CF}(i_2; i_3) = (cf_{i_1} \Leftrightarrow (cf_{i_2} \land cf_{i_3})) \land \phi_{CF}(i_2) \land \phi_{CF}(i_3)$ , and the control-flow of the branching instruction  $i_1 = (c?i_2:i_3)$  is encoded as  $\phi_{CF}(c?i_2:i_3)=(cf_{i_1}\Leftrightarrow (cf_{i_2}\vee cf_{i_3}))\wedge\phi_{CF}(i_2)\wedge\phi_{CF}(i_3)$  (here we use the notation of the C-like ternary operator x?y:z to denote the conditional expression if x then y else z). In contrast, the new scheme implemented in porthos2 first compiles the recursive high-level code into the linear low-level event-based representation, which is then encoded into an SMT-formula. The encoding of branching nodes depends on the guards, i.e. the values of the conditional variable in the branching state, which in turn are encoded as data-flow constraints (see Section 3.2.2).

Let  $\mathbf{x}: \mathbb{E} \to \{0,1\}$  be the predicate that signifies that the event has been executed (and, consequently, has changed the state of the system). Let  $\mathbf{v}: \mathbb{C} \to \mathbb{R}$  be the function that returns the value of the computation event (evaluates it) that will be computed once the event is executed (strictly speaking, it returns the *set* of values determined by the  $\stackrel{\mathrm{rf}}{\to}$ -relation; see Chapter **?TODO?** for the encoding of relations). We distinguish the function  $\mathbf{v}_p: \mathbb{C}_1 \to \{0,1\}$  that evaluates a predicative computation event. In the resulting formula, all symbols  $\mathbf{x}(e_i)$  and  $\mathbf{v}(e_i)$  are encoded as boolean variables.

Consider the following possible mutual arrangement of nodes in a control-flow graph:



Figure 3.1: Linear and non-linear cases of control-flow graph

For the listed cases, we propose below an encoding scheme that uniquely encodes each node of the graph and allows partially executed programs to be encoded. Equation 3.1 encodes the sequential control-flow represented in Figure 3.1a and reflects the fact that the event  $e_2$  can be executed only if the event  $e_1$  has been executed. Equation 3.2 encodes the branching control-flow depicted in Figure 3.1b by allowing only the following executions:  $\{\emptyset, (e_1), (e_1 \rightarrow e_2), (e_1 \rightarrow e_3)\}$ . In the encoding 3.3 of the merge point represented in Figure 3.1c, the event  $e_k$  can be executed only if at least one of its predecessors was executed, regardless of the type of the transition.

$$\phi_{CF_{seq}} = \mathbf{x}(e_2) \to \mathbf{x}(e_1)$$
(3.1)

$$\phi_{CF_{br}} = [\mathbf{x}(e_2) \to \mathbf{x}(e_1)] \wedge [\mathbf{x}(e_3) \to \mathbf{x}(e_1)] \wedge [\mathbf{x}(e_2) \to \mathbf{v}(e_1)] \wedge [\mathbf{x}(e_3) \to \neg \mathbf{v}(e_1)] \wedge \neg [\mathbf{x}(e_2) \wedge \mathbf{x}(e_3)]$$
(3.2)

$$\phi_{CF_{mer}} = \mathbf{x}(e_k) \to (\bigvee_{e_p \in \text{pred}(e_k)} \mathbf{x}(e_p))$$
(3.3)
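The set of executions admitted by the branching encoding can be checked by enumerating all models of its non-trivial conjuncts. The Python sketch below does this by brute force over the four boolean symbols; the function names are hypothetical, and `v1` stands for the guard value $\mathbf{v}(e_1)$.

```python
from itertools import product

def implies(a, b):
    return (not a) or b

def phi_cf_branch(x1, x2, x3, v1):
    # Non-trivial conjuncts of the branching encoding (3.2)
    return (implies(x2, x1) and implies(x3, x1)
            and implies(x2, v1) and implies(x3, not v1)
            and not (x2 and x3))

def allowed_executions():
    """Enumerate all models and project them onto the sets of executed events."""
    result = set()
    for x1, x2, x3, v1 in product([False, True], repeat=4):
        if phi_cf_branch(x1, x2, x3, v1):
            executed = tuple(name
                             for name, x in (("e1", x1), ("e2", x2), ("e3", x3))
                             if x)
            result.add(executed)
    return result
```

The result contains exactly the empty execution, `("e1",)`, `("e1", "e2")` and `("e1", "e3")`, which matches the set of executions listed for Figure 3.1b.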

For the sake of encoding correctness, we require all branches to have at least one event. Thus, for branching statements that do not have any events in one of the branches (such a branch represents a conditional jump forward), we add a synthetic nop-event, as shown in Figure 3.2:



**Figure 3.2:** *Transformation of the empty-branch nonlinear control-flow* 

#### 3.2.2 Encoding for the data-flow

To encode the data-flow constraints, we use the *static single-assignment* (SSA) form, which makes it possible to capture an arbitrary data-flow in a single SMT-formula. The SSA form requires each variable to be assigned only once within the entire program. In contrast, porthos used the dynamic single-assignment (DSA) form, which requires indices to be unique only within a branch. Although the number of variable references (each of which is encoded as a unique SMT-variable) is on average logarithmically smaller for the DSA form than for the SSA form, the resulting SMT-formula still needs to be complemented by the same number of equality assertions when encoding the data-flow in merge points [LFH+17].

Following [LFH+17], the indexed references of variables are computed in accordance with the following rules: (1) any access to a shared variable (both read and write) increments its SSA-index; (2) only writes to a local variable increment its SSA-index (reads preserve indices); (3) no access to a constant or to a computed (evaluated) expression changes its SSA-index. These rules determine the following encoding of load, store and computation events within a single thread:

$$\phi_{DF_{e=\text{load}(r\leftarrow l)}} = \mathbf{x}(e) \to (r_{i+1} = l_{i+1})$$
(3.4)

$$\phi_{DF_{e=\text{store}(l\leftarrow r)}} = \mathbf{x}(e) \to (l_{i+1} = r_i)$$
(3.5)

$$\phi_{DF_{e=\text{eval}(\dots)}} = \mathbf{x}(e) \to \mathbf{v}(e)$$
 (3.6)

To convert the program into SSA form, each variable declared so far (either local or shared) is mapped, for each event, to its indexed reference; this information is stored in the SSA-map "event *to* variable *to* SSA-index". The SSA-map is computed iteratively while traversing the event-flow graph in topological order, as described in Algorithm 1.

#### **Algorithm 1** Algorithm for computing the SSA-indices

**Input:** The event-flow graph  $G = \langle N, E \rangle$  where N is the set of nodes (events), E is the set of control-flow transitions,  $e_0$  is the entry node

**Output:** The SSA-map of the form "{ event : { variable : index }}"

```
function Compute-SSA-Map(G)
    S ← empty map; S[e_0] ← empty map
    for each event e_i ∈ G.N in topological order do
        for each predecessor e_j ∈ pred(e_i) do
            S[e_i] ← copy(S[e_j])
            for each variable v_k ∈ set of variables accessed by e_i do
                S[e_i][v_k] ← max(S[e_i][v_k], S[e_j][v_k])
                if need to update the index of v_k then    ▷ cases (1)-(2)
                    S[e_i][v_k] ← S[e_i][v_k] + 1
    return S
```

The running time of the described algorithm is linear in the size of the event-flow graph, since it traverses the graph only once.
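The following Python sketch shows one possible reading of the SSA-index computation; it is not the porthos2 implementation. It assumes that the indices inherited from all predecessors are first merged by taking the maximum per variable, and that the index is then incremented once per access according to rules (1) and (2). The input format (`preds`, `accesses`) and the access kinds are hypothetical, and rule (3) is covered implicitly because constants never appear in `accesses`.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def compute_ssa_map(preds, accesses):
    """preds:    {event: [predecessor events]}
    accesses: {event: [(variable, kind)]}, where kind is one of
              'shared_read', 'shared_write', 'local_read', 'local_write'."""
    ssa = {}
    # static_order() yields every event after all of its predecessors
    for e in TopologicalSorter(preds).static_order():
        indices = {}
        # Merge step: maximal index of every variable over all predecessors
        for p in preds.get(e, ()):
            for var, idx in ssa[p].items():
                indices[var] = max(indices.get(var, 0), idx)
        # Rules (1)-(2): shared accesses and local writes increment the index
        for var, kind in accesses.get(e, ()):
            indices.setdefault(var, 0)
            if kind != "local_read":
                indices[var] += 1
        ssa[e] = indices
    return ssa

# A diamond-shaped graph: e0 branches to e1 and e2, which merge in e3
preds = {"e0": [], "e1": ["e0"], "e2": ["e0"], "e3": ["e1", "e2"]}
accesses = {
    "e0": [("x", "shared_write")],
    "e1": [("x", "shared_read"), ("r", "local_write")],
    "e3": [("r", "local_read"), ("x", "shared_write")],
}
ssa_map = compute_ssa_map(preds, accesses)
```

Each event and each predecessor edge is visited once, which agrees with the linear running time stated above.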

As described before, the rf-relation links the data-flow between events, and this linkage is expressed by equality assertions over the SSA-variables. The encoding of this linkage is left unchanged from porthos: for each pair of events  $e_1$  and  $e_2$  linked by the rf-relation, we add the following constraint:

$$\phi_{DF_{mem}}(e_1, e_2) = \text{rf}(e_1, e_2) \to (l_i = l_j)$$
 (3.7)

where the variable of location l is mapped to the SSA-variable  $l_i$  for event  $e_1$ , and to the SSA-variable  $l_j$  for event  $e_2$ ; and the predicate  $rf(e_1, e_2)$  is encoded as a boolean variable, which itself equals true if  $e_2$  reads the shared variable that was written in  $e_1$ .
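To make the shape of constraint (3.7) concrete, the sketch below generates it in textual form for every write/read pair that accesses the same location. The input dictionaries and the function name are hypothetical, and a real encoder would emit solver terms rather than strings.

```python
def rf_constraints(writes, reads, ssa):
    """writes, reads: {event: location}; ssa: {event: {location: SSA-index}}.
    Returns one implication 'rf(w,r) -> (l_i = l_j)' per candidate pair."""
    constraints = []
    for w, w_loc in writes.items():
        for r, r_loc in reads.items():
            if w_loc == r_loc:  # only events on the same location can be rf-related
                constraints.append(
                    f"rf({w},{r}) -> ({w_loc}_{ssa[w][w_loc]} = {r_loc}_{ssa[r][r_loc]})")
    return constraints

example = rf_constraints(
    writes={"e1": "x"},
    reads={"e2": "x", "e3": "y"},
    ssa={"e1": {"x": 1}, "e2": {"x": 2}, "e3": {"y": 1}},
)
```

Here `example` holds the single constraint `rf(e1,e2) -> (x_1 = x_2)`, since `e3` reads a different location.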

#### 3.2.3 Encoding for the memory model

todo

# Chapter 4

# The porthos2: implementation

This chapter describes the architecture of the porthos2 framework.

Java was chosen as the programming language for porthos2 in order to be able to reuse some parts of its predecessor porthos, which is also written in Java. Although this language does not show the best results in performance benchmarks (compared to C++, for example) **TODO**, the performance bottleneck of porthos2 (as of any other SMT-based code analyser) is the phase of solving the SMT-formula, which is delegated to a third-party SMT-solver called from porthos2 via its Java API. However, considering the prospect of using porthos2 as a static analyser for real-world programs, memory optimisation must also be taken into account during both the encoding and the solving stages.

## 4.1 Requirements

The main reason for re-implementing the porthos tool was the need to extend the input language and to optimise the tool so that it will eventually be able to process real C programs. In the existing porthos architecture, the high-level recursive instructions (statements of C) are processed together with the low-level non-recursive event abstractions (see the classes of the package 'dartagnan.program' of porthos) as a single AST structure. This AST was implemented as a mutable data structure, which is modified during the SMT-encoding stage. We consider this architecture hard to manage (since it is hard to guarantee that the invariants are preserved during the program execution) and poorly extensible (since adding support for a new high-level control-flow instruction requires changing multiple components of the program, from the parser to the encoder).

Therefore, while designing the architecture of porthos2, we decided to clearly separate the high-level intermediate code representation (implemented as a recursive AST structure) from the low-level event-based representation (implemented as an event-flow graph). Such a modular architecture makes it possible to use multiple input language parsers and to convert the parsed syntax trees to our AST, thus supporting multiple languages (for instance, the original input language of the porthos tool, the two syntax variants of litmus tests used by herd, or an assembly language for any supported architecture).
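The separation of the two representations can be sketched as follows. The example is written in Python for brevity (porthos2 itself is written in Java), and all class and function names are hypothetical; they do not reflect the actual porthos2 classes. The high-level tree is built from immutable nodes, and a single recursive pass flattens it into a list of events connected by primary and alternative edges.

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class Assign:              # leaf statement, e.g. "r <- 1"
    text: str

@dataclass(frozen=True)
class Seq:                 # non-empty sequence of statements
    statements: Tuple["Stmt", ...]

@dataclass(frozen=True)
class IfElse:              # both branches are assumed to be non-empty
    condition: str
    then_branch: "Stmt"
    else_branch: "Stmt"

Stmt = Union[Assign, Seq, IfElse]

def compile_stmt(stmt, nodes, edges):
    """Append the events of stmt to nodes/edges.
    Returns (entry node id, list of (exit node id, kind of the outgoing edge))."""
    if isinstance(stmt, Assign):
        nodes.append(stmt.text)
        nid = len(nodes) - 1
        return nid, [(nid, "primary")]
    if isinstance(stmt, Seq):
        entry, exits = None, []
        for s in stmt.statements:
            s_entry, s_exits = compile_stmt(s, nodes, edges)
            if entry is None:
                entry = s_entry
            for src, kind in exits:      # connect the previous statement to this one
                edges.append((src, s_entry, kind))
            exits = s_exits
        return entry, exits
    if isinstance(stmt, IfElse):
        nodes.append(stmt.condition)
        branch = len(nodes) - 1
        then_entry, then_exits = compile_stmt(stmt.then_branch, nodes, edges)
        else_entry, else_exits = compile_stmt(stmt.else_branch, nodes, edges)
        edges.append((branch, then_entry, "primary"))      # if-true transition
        edges.append((branch, else_entry, "alternative"))  # if-false transition
        return branch, then_exits + else_exits
    raise TypeError(f"unsupported statement: {stmt!r}")

program = Seq((Assign("a"), IfElse("c", Assign("t"), Assign("f")), Assign("z")))
nodes, edges = [], []
compile_stmt(program, nodes, edges)
```

Because the tree nodes are frozen, the compilation pass cannot modify the high-level representation; all state produced by the pass lives in the separate `nodes` and `edges` lists.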

The following list enumerates the target *requirements* for the new tool:

- 1. stability and transparency
  - following the principles of simplicity and readability;
  - usage of software design patterns if necessary;
  - usage of immutable data structures for all data transfer objects (DTO);
  - high code coverage by unit and functional tests;
- 2. efficiency
  - keeping the trade-off between execution time and memory usage;
- 3. extensibility
  - clear modular architecture

# 4.2 Program Components

The general architecture of porthos2 is presented in Figure 4.1. The program receives as input the program to be analysed and one (in the reachability analysis mode) or two (in the portability analysis mode) memory models (see Section 4.2.1). Then, the parsed input program (called Y-tree<sup>1</sup>) is preprocessed in order to collect the information necessary for the compilation (see Section 4.4.1), and compiled into the X-graph representation (see Section 4.4.2). If the original program has a loop, the X-graph will have cycles, which need to be unrolled before being encoded into an SMT-formula.

<sup>1</sup>In order to avoid confusing the different internal representations, we prefix the names of the elements of each internal representation with a letter. For instance, we picked the letter 'Y' to denote the AST code representation, as the shape of this letter resembles tree branching; with the letter 'X' we prefix elements of the event-flow graph, as the events are to be executed; and with the letter 'W' we prefix elements of the weak memory model AST.

The unrolling and some other transformations are made by the X-graph transformer (see Section ??).

#### 4.2.1 Parsing the input

Both porthos and porthos2 use the parser generator ANTLR [Par13], a powerful language processing tool. The full ANTLR grammar of the input language used by the previous version of porthos is given in Appendix A.

- Figure 4.1 represents the informal grammar of the input language processed by porthos v1 ... short characteristics of the old grammar, enumerate its features (wrt the Figure)
- DRAWBACKS:
  - the semantics of memory operations and method invocations is encoded directly into the grammar. Taking into account the complexity of developing an accurate grammar for a Turing-complete language,
  - in the old implementation, the logic of the parser was written directly in the grammar. Although fast to implement (one internal representation fewer), this approach has numerous drawbacks, such as:
    - hard-to-read grammar (mixing 2 languages in a single file)
    - hard-to-maintain grammar (no debugging with standard utils, no code analysis in the grammar file)
    - non-extensible (there is no option to just plug in an existing grammar for any language and write a converter from this language's syntax tree to our AST)
  - syntactically only the integers were supported. we:arrays???pointers??enums?structs?
  - no declarations (all shared variables are declared in the init section, local variables are never declared)
  - no arbitrary function calls (need support for the knowledge base after implementing the typisation; cannot be supported only syntactically)
- Minor drawbacks:
  - restricted syntax for expressions:
    - no operator associativity allowed ('1 + 2 \* 3' was invalid syntax)
    - no unary increment/decrement (technically it is hard to implement post-increment/decrement without interpretation: we do not know where the expression ends). Example: 'int\* x; if (x++>0) '
  - incorrectly implemented statement in old code: a sequence of stmts is separated by ';', but the syntax of C requires ';' as the operator that ends each operation (how to say this? see C standard)
  - litmus-initialisation stmt did not allow non-default values. The new syntax allows any type of declaration statement with initialisation
    - this means the full compiler engine that resolves the semantics



Figure 4.1: Main components of porthos2

```
<prog> : <init> <thrd>* <assert>
<thrd> : thread <tid> <inst>
<inst> : <atom>
      | <inst> ; <inst>
      | while <pred> <inst>
      | if <pred> { <inst> } <inst>
<atom> : <reg> <-- <expr>
      | <reg> <-- <loc>
      | <loc> := <reg>
      | 'mfence'
      | 'sync'
      | 'lwsync'
      | 'isync'
<pred> : true
      | <expr> (and | or) <expr>
      | <expr> ('==' | '!=' | '>' | '>=' | '<' | '<=') <expr>
<expr> : [0-9]
       | <reg>
       | <expr> ('*' | '+' | '-' | '/' | '%') <expr>
```

**Figure 4.1:** Syntax of an input language of porthos version 1

- NOW: we're using the C grammar from the standard
- for now, we just ignore C macro directives, in future it's planned to support partially the preprocessing before parsing

The porthos2 tool uses the C language grammar proposed in the C11 standard [ISO11], extended by litmus test-specific syntax such as initialisation and final-state assertion statements (the original ANTLR grammar can be found in the official ANTLR repository on GitHub <sup>2</sup>). The current version of porthos2 can operate only in the inter-procedural mode, assuming that each function defined in the input file is executed in a separate thread. However, the redesigned architecture of porthos2 makes it easy to support intra-procedural analysis by inlining function calls.

- if we don't support, our parser still parses it, and the error is thrown at the moment of converting syntax tree to the AST (Y-tree).
- So, The language-dependent syntax tree is converted to the AST by the stateless Visitor (e.g., for C11->Ytree conversion is made by 'C2YtreeConverterVisitor') + short structure of this visitor (how?.. need ly?)

#### 4.3 The Y-tree: an AST

- picture of the Y-hierarchy. everything inherits interface YEntity
  - immutable
  - AST is untyped (YVariableRef).
- short characteristics with citations from the code (this AST contains very basic language elements according to the C execution model (statements and expressions))
- minor changes are performed by converting to ytree representation: desugaring the target code, etc. (what else?)

### 4.4 Compiling the Y-tree to the X-graph

- hierarchy of Compilers (XCompiler is a stateful abstract machine)
  - interface that it provides
  - dependencies on other modules (memory-manager, etc.)

<sup>2</sup>The repository containing the collection of ANTLR v4 grammars: https://github.com/antlr/grammars-v4

#### 4.4.1 Pre-compilation

- collect goto labels (not done yet.)
- determine kind of variables (cannot be done during parsing? don't say this)
  - basic typisation

#### 4.4.2 Compilation

- more low-level code representation (or high-level assembly); abstract assembly language. refer to the
  - X-hierarchy

#### 4.4.3 Post-compilation transformations

- After we acquired the event-based representation, we can perform some modifications/simplifications/optimisations on it (separately, allowing user to manage them)
- converting to SSA form (now: during the encoding. should be: during the post-compilation) (as one of necessary steps before encoding)
  - setting up backward edges
  - more?
- unrolling: why we cannot encode cyclic structures. reference to the paper (see arXiv version)

The original program encoded into the XGraph represents a *flow graph*, a connected cyclic directed graph with a single source node (ENTRY) (for convenience, all leaves are usually connected to the sink node (EXIT)). The cycles are caused by low-level jump instructions obtained from non-linear high-level control-flow statements (such as while, do-while, for, etc.). However, a cyclic flow graph cannot be encoded into an SMT formula since ... //TODO:REFERENCE.
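One simple way to obtain an acyclic graph from a cyclic flow graph is a depth-bounded unwinding, sketched below in Python. This is an illustrative scheme and not necessarily the transformation implemented in porthos2: every node of the result is a pair (node, depth), and only paths of at most k edges starting in the entry node are kept. The function name and the example graph are assumptions.

```python
def unroll(edges, entry, k):
    """Unwind a (possibly cyclic) flow graph given as a list of (src, dst) edges.
    Returns the set of edges between (node, depth) pairs with depth <= k."""
    succ = {}
    for src, dst in edges:
        succ.setdefault(src, []).append(dst)
    unrolled, frontier = set(), {entry}
    for depth in range(k):
        next_frontier = set()
        for node in frontier:
            for dst in succ.get(node, ()):
                # Every edge leads to a strictly greater depth, so no cycle can arise
                unrolled.add(((node, depth), (dst, depth + 1)))
                next_frontier.add(dst)
        frontier = next_frontier
    return unrolled

# A single while-loop: ENTRY -> head -> body -> head ..., head -> EXIT
loop = [("ENTRY", "head"), ("head", "body"), ("body", "head"), ("head", "EXIT")]
unwound = unroll(loop, "ENTRY", 3)
```

For the bound k = 3 the loop body is entered once; paths that would need more than k edges are cut off, which is the source of incompleteness of the bounded analysis.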

#### 4.4.4 Input language parser



**Figure 4.2:** Example of the flow graph from Figure **??**, unwound up to the bound k = 6

# 4.5 XGraph to ZFormula (SMT) encoder

- Then, this modified event-representation is being encoded to SMT formula and sent to the solver.

# 4.6 Optimisations

... performed on each stage

# **Chapter 5**

# **Evaluation**

- 5.1 Comparison with PORTHOS
- 5.1.1 Unique Features
- 5.1.2 Performance
- 5.2 Comparison with HERD
- 5.2.1 Unique Features
- 5.2.2 Performance

# **Chapter 6**

# **Summary**

# **Bibliography**

- [LFH<sup>+</sup>17] H Ponce de León, Florian Furbach, Keijo Heljanko, and Roland Meyer. "Portability Analysis for Axiomatic Memory Models. PORTHOS: One Tool for all Models". In: *CoRR* abs/1702.06704 (2017). arXiv: 1702.06704. URL: http://arxiv.org/abs/1702.06704.
- [McK17] Paul E McKenney. *Is parallel programming hard, and, if so, what can you do about it?* (v2017.01.02a). 2017.
- [MAM<sup>+</sup>17] Paul E. McKenney, Jade Alglave, Luc Maranget, Andrea Parri, and Alan Stern. *A formal kernel memory-ordering model (part 1)*. 2017. URL: https://lwn.net/Articles/718628/.
- [ACM16] Jade Alglave, Patrick Cousot, and Luc Maranget. "Syntax and semantics of the weak consistency model specification language cat". In: *arXiv* preprint arXiv:1608.07531 (2016).
- [LSM<sup>+</sup>16] Daniel Lustig, Geet Sethi, Margaret Martonosi, and Abhishek Bhattacharjee. "Coatcheck: Verifying memory ordering at the hardware-OS interface". In: *ACM SIGOPS Operating Systems Review* 50.2 (2016), pp. 233–247.
- [AMT14] Jade Alglave, Luc Maranget, and Michael Tautschnig. "Herding cats: Modelling, simulation, testing, and data mining for weak memory". In: *ACM Transactions on Programming Languages and Systems (TOPLAS)* 36.2 (2014), p. 7.
- [LPM14] Daniel Lustig, Michael Pellauer, and Margaret Martonosi. "PipeCheck: Specifying and verifying microarchitectural enforcement of memory consistency models". In: *Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture*. IEEE Computer Society. 2014, pp. 635–646.

- [AKN<sup>+</sup>13] Jade Alglave, Daniel Kroening, Vincent Nimal, and Michael Tautschnig. "Software verification for weak memory via program transformation". In: *European Symposium on Programming*. Springer. 2013, pp. 512–532.
- [Par13] Terence Parr. *The definitive ANTLR 4 reference*. Pragmatic Bookshelf, 2013.
- [CKN<sup>+</sup>12] Edmund M Clarke, William Klieber, Miloš Nováček, and Paolo Zuliani. "Model checking and the state explosion problem". In: *Tools for Practical Software Verification*. Springer, 2012, pp. 1–30.
- [ISO12] ISO. "IEC 14882: 2011 Information technology – Programming languages – C++". In: *International Organization for Standardization, Geneva, Switzerland* 27 (2012), p. 59.
- [BOS<sup>+</sup>11] Mark Batty, Scott Owens, Susmit Sarkar, Peter Sewell, and Tjark Weber. "Mathematizing C++ concurrency". In: *ACM SIGPLAN Notices* 46.1 (2011), pp. 55–66.
- [ISO11] ISO/IEC. "SC22/WG14. ISO/IEC 9899: 2011". In: *Information technology – Programming languages – C*. http://www.iso.org/iso/iso\_catalogue/catalogue\_tc/catalogue\_detail.htm (2011).
- [SSA<sup>+</sup>11] Susmit Sarkar, Peter Sewell, Jade Alglave, Luc Maranget, and Derek Williams. "Understanding POWER multiprocessors". In: *ACM SIGPLAN Notices* 46.6 (2011), pp. 175–186.
- [Alg10] Jade Alglave. "A shared memory poetics". In: La Thèse de doctorat, L'université Paris Denis Diderot (2010).
- [AFI+09] Jade Alglave et al. "The semantics of Power and ARM multi-processor machine code". In: *Proceedings of the 4th workshop on Declarative aspects of multicore programming*. ACM. 2009, pp. 13–24.
- [MZ09] Sharad Malik and Lintao Zhang. "Boolean satisfiability from theoretical hardness to practical success". In: *Communications of the ACM* 52.8 (2009), pp. 76–82.
- [OSS09] Scott Owens, Susmit Sarkar, and Peter Sewell. "A better x86 memory model: x86-TSO". In: *International Conference on Theorem Proving in Higher Order Logics*. Springer. 2009, pp. 391–407.

- [MPA05] Jeremy Manson, William Pugh, and Sarita V Adve. *The Java memory model*. Vol. 40. 1. ACM, 2005.
- [CBR<sup>+</sup>01] Edmund Clarke, Armin Biere, Richard Raimi, and Yunshan Zhu. "Bounded model checking using satisfiability solving". In: *Formal methods in system design* 19.1 (2001), pp. 7–34.
- [Sht00] Ofer Shtrichman. "Tuning SAT checkers for bounded model checking". In: *International Conference on Computer Aided Verification*. Springer. 2000, pp. 480–494.
- [Lam79] Leslie Lamport. "How to make a multiprocessor computer that correctly executes multiprocess programs". In: *IEEE transactions on computers* 9 (1979), pp. 690–691.
- [Lam78] Leslie Lamport. "Time, clocks, and the ordering of events in a distributed system". In: *Communications of the ACM* 21.7 (1978), pp. 558–565.

# Appendices

#### Appendix A

### The ANTLR grammar of the input language of porthos v1

```
grammar Porthos;

main
    : program
    ;

bool_expression
    : bool_atom
    | bool_atom BOOL_OP bool_atom
    ;

bool_atom
    : TRUE
    | FALSE
    | '(' arith_expr COMP_OP arith_expr ')'
    | '(' bool_expression ')'
    ;

arith_expr
    : arith_atom ARITH_OP arith_atom
    | arith_atom
    ;

arith_atom
    : DIGIT
    | register
    | '(' arith_expr ')'
    ;

register
    : WORD
    ;

location
    : WORD
    ;

local
    : register '<-' arith_expr
    ;

load
    : register '<:-' location
    ;

store
    : location ':=' register
    ;

read
    : register '=' location '.' 'load' '(' ATOMIC ')'
    ;

write
    : location '.' 'store' '(' ATOMIC ',' register ')'
    ;

instruction
    : atom
    | sequence
    | while_
    | if
    ;

atom
    : local
    | load
    | store
    | FENCE
    | read
    | write
    ;

sequence
    : atom ';' instruction
    | while_ ';' instruction
    | if ';' instruction
    ;

if
    : 'if' bool_expression 'then' '{' instruction '}'
      ('else' '{' instruction '}')?
    ;

while_
    : 'while' bool_expression '{' instruction '}'
    ;

program
    : '{' location (',' location)* '}'
      ('thread t' DIGIT '{' instruction '}')*
      ('exists' (location '=' DIGIT ','
                | DIGIT ':' register '=' DIGIT ','))*
    ;

// Lexer rules:

ATOMIC : '_na' | '_sc' | '_rx' | '_acq' | '_rel' | '_con';

FENCE : 'mfence' | 'sync' | 'lwsync' | 'isync';
```

```
COMP_OP : '==' | '!=' | '<=' | '<' | '>=' | '>';

ARITH_OP : '+' | '-' | '*' | '/' | '%';

BOOL_OP : 'and' | 'or';

DIGIT : [0-9];

LETTER : 'a'..'z' | 'A'..'Z';

TRUE : 'true' | 'True';

FALSE : 'false' | 'False';

WORD : (LETTER | DIGIT)+;
```