# A Theoretical Framework for Symbolic Quick Error Detection

Florian Lonsing, Subhasish Mitra, and Clark Barrett
Computer Science Department, Stanford University, Stanford, CA 94305, USA
E-mail: {lonsing, subh, barrett}@stanford.edu

Abstract: Symbolic quick error detection (SQED) is a formal pre-silicon verification technique targeted at processor designs. It leverages bounded model checking (BMC) to check a design for counterexamples to a self-consistency property: given the instruction set architecture (ISA) of the design, executing an instruction sequence twice on the same inputs must always produce the same outputs. Self-consistency is a universal, implementation-independent property. Consequently, in contrast to traditional verification approaches that use implementation-specific assertions (often generated manually), SQED does not require a full formal design specification or manually-written properties. Case studies have shown that SQED is effective for commercial designs and that SQED substantially improves design productivity. However, until now there has been no formal characterization of its bug-finding capabilities. We aim to close this gap by laying a formal foundation for SQED. We use a transition-system processor model and define the notion of a bug using an abstract specification relation. We prove the soundness of SQED, i.e., that any bug reported by SQED is in fact a real bug in the processor. Importantly, this result holds regardless of what the actual specification relation is. We next describe conditions under which SQED is complete, that is, what kinds of bugs it is guaranteed to find. We show that for a large class of bugs, SQED can always find a trace exhibiting the bug. Ultimately, we prove full completeness of a variant of SQED that uses specialized state reset instructions. Our results enable a rigorous understanding of SQED and its bug-finding capabilities and give insights on how to optimize implementations of SQED in practice.

# I. INTRODUCTION

Pre-silicon verification of HW designs given as models in a HW description language (e.g., Verilog) is a critical step in HW design. Due to the steadily increasing complexity of designs, it is crucial to detect logic design bugs before fabrication to avoid more difficult and costly debugging in post-silicon validation.

Formal techniques such as bounded model checking (BMC) [1] have an advantage over traditional pre-silicon verification techniques such as simulation in that they are exhaustive up to the BMC bound. Hence, formal techniques provide valuable guarantees about the correctness of a design under verification (DUV) with respect to the checked properties. However, in traditional assertion-based formal verification techniques, these properties are implementation-specific and must be written manually based on expert knowledge about the DUV. Moreover, it is a well-known, long-standing challenge that sets of manually-written, implementation-specific properties might be insufficient to detect all bugs present in a DUV [2]–[6].

This work was supported by the Defense Advanced Research Projects Agency, grant FA8650-18-2-7854.

Symbolic quick error detection (SQED) [7]–[10] is a formal pre-silicon verification technique targeted at processor designs. In sharp contrast to traditional formal approaches, SQED does not require manually-written properties or a formal specification of the DUV. Instead, it checks whether a self-consistency [11] property holds in the DUV. The self-consistency property employed by SQED is universal and implementation-independent. Each instruction in the instruction set architecture (ISA) of the DUV is interpreted as a function in a mathematical sense. The self-consistency check then amounts to checking whether the outputs produced by executing a particular instruction sequence match if the sequence is executed twice, assuming the inputs to the two sequences also match.

SQED leverages BMC to exhaustively explore all possible instruction sequences up to a certain length starting from a set of initial states. Several case studies have demonstrated that SQED is highly effective at producing short bug traces by finding counterexamples to self-consistency in a variety of processor designs, including industrial designs [9]. Moreover, SQED substantially increases verification productivity.

However, until now there has been no rigorous theoretical understanding of (A) whether counterexamples to self-consistency found by SQED always correspond to actual bugs in the DUV—the *soundness* of SQED—and (B) whether for each bug in the DUV there exists a counterexample to self-consistency that SQED can find—the *completeness* of SQED. This paper makes significant progress towards closing this gap.

We model a processor as a transition system. This model abstracts away implementation-level details, yet is sufficiently precise to formalize the workings of SQED. To prove soundness and (conditional) completeness of SQED, we need to establish a correspondence between counterexamples to self-consistency and bugs in a DUV. In our formal model we achieve this correspondence by first defining the correctness of instruction executions by means of a general, abstract specification. A bug is then a violation of this specification. The abstract specification expresses the following general and natural property we expect to hold for actual DUVs: an instruction writes a correct output value into a destination location and does not modify any other locations.

As **our main results**, we prove soundness and conditional completeness of SQED. For soundness, we prove that if SQED reports a counterexample to the universal self-consistency property, then the processor has a bug. This result shows that SQED does not produce spurious counterexamples. Importantly, this result holds regardless of the actual specification, confirming that SQED does not depend on such implementation-specific details. For completeness, we prove that if the processor has a bug then, under modest assumptions, there exists a counterexample to self-consistency that can be found by SQED. We also show that SQED can be made fully (unconditionally) complete with additional HW support in the form of specialized state reset instructions. Our results enable a rigorous understanding of SQED and its bug-finding capabilities in actual DUVs and provide insight on how to optimize implementations of SQED.

In the following, we first present an overview of SQED from a theoretical perspective (Section II). Then we define our transition system model of processors (Section III) and formalize the correctness of instruction executions in terms of an abstract specification relation (Section IV). After establishing a correspondence between the abstract specification and the self-consistency property employed by SQED (Section V), we prove soundness and (conditional) completeness of SQED (Section VI). We conclude with a discussion of related work and future research directions (Sections VII and VIII).

# II. OVERVIEW OF SQED

We first informally introduce the basic concepts and terminology related to SQED. Fig. 1a shows an overview of the high-level workflow. Given a processor design  $\mathcal{P}$ , i.e., the DUV, SQED is based on symbolic execution of instruction sequences using BMC. We assume that an *instruction* i = (op, l, (l', l'')) consists of an opcode op, an output location l, and a pair (l', l'') of input locations. Locations are an abstraction used to represent registers and memory locations.

The self-consistency check is based on executing two instructions that should always produce the same result. The two instructions are called an *original* and a *duplicate instruction*, respectively. The duplicate instruction has the *same opcode* as the original one, i.e., it implements the same functionality, but it operates on different input and output locations. The locations on which the duplicate instruction operates are determined by an *arbitrary but fixed bijective function*  $L_D: \mathcal{L}_O \to \mathcal{L}_D$  between two subsets  $\mathcal{L}_O$ , the *original locations*, and  $\mathcal{L}_D$ , the *duplicate locations*, that form a partition of the set  $\mathcal{L}$  of all locations in  $\mathcal{P}$ . An original instruction can only use locations in  $\mathcal{L}_O$ . An *instruction duplication function* Dup then maps any original instruction  $i_O$  to its duplicate  $i_D$  by copying the opcode and then applying  $L_D$  to its locations.

**Example 1.** Let  $\mathcal{L} = \{0, \dots, 31\}$  be the identifiers of 32 registers of a processor  $\mathcal{P}$ , and consider the partition  $\mathcal{L}_O = \{0, 1, \dots, 15\}$  and  $\mathcal{L}_D = \{16, 17, \dots, 31\}$ . Let  $i_O = (\mathsf{ADD}, l_{12}, (l_4, l_8))$  be an original register-type ADD instruction operating on registers 4, 8, and 12. Using  $L_D(k) = k + 16$ , we obtain  $Dup(i_O) = i_D = (\mathsf{ADD}, l_{28}, (l_{20}, l_{24}))$ .

Consider a different partition  $\mathcal{L}'_O = \{0, 2, 4, \dots, 30\}$  and  $\mathcal{L}'_D = \{1, 3, 5, \dots, 31\}$  and function  $L'_D(k) = k + 1$ . For this function,  $Dup(i_O) = (\mathsf{ADD}, l_{13}, (l_5, l_9))$ .
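The duplication in Example 1 can be sketched in a few lines of Python. This is only an illustration under assumptions made here: the tuple encoding of instructions and the names `make_dup`, `dup_block`, and `dup_interleaved` are hypothetical and not taken from any SQED implementation.

```python
from typing import Callable, Tuple

# Hypothetical encoding: (opcode, output location, (input location 1, input location 2)).
Instr = Tuple[str, int, Tuple[int, int]]

def make_dup(loc_map: Callable[[int], int]) -> Callable[[Instr], Instr]:
    """Build Dup from a location mapping L_D: copy the opcode, remap every location."""
    def dup(instr: Instr) -> Instr:
        op, out, (in1, in2) = instr
        return (op, loc_map(out), (loc_map(in1), loc_map(in2)))
    return dup

# Partition {0..15} / {16..31} with L_D(k) = k + 16.
dup_block = make_dup(lambda k: k + 16)
# Partition even / odd with L'_D(k) = k + 1.
dup_interleaved = make_dup(lambda k: k + 1)

i_o: Instr = ("ADD", 12, (4, 8))
print(dup_block(i_o))        # ('ADD', 28, (20, 24))
print(dup_interleaved(i_o))  # ('ADD', 13, (5, 9))
```

Both results agree with the duplicates computed by hand in Example 1; only the location mapping changes, never the opcode.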

Self-consistency checking is implemented using *QED tests*. A QED test is an instruction sequence  $i = i_O :: i_D$  consisting of a sequence  $i_O$  of n original instructions followed by a corresponding sequence  $i_D = Dup(i_O)$  of n duplicate instructions (where operator "::" denotes concatenation). A QED test i is symbolically executed from a QED-consistent state, that is, a state where the value stored in each original location l is the same as the value stored in its corresponding duplicate location  $L_D(l)$ . The resulting final state after executing *i* should then also be QED-consistent. Fig. 1a illustrates the workflow. A QED test i succeeds if the final state that results from executing i is QED-consistent; otherwise it fails. Starting the execution in a QED-consistent state guarantees that original and duplicate instructions receive the same input values. Thus, if the final state is not QED-consistent, then this indicates that some pair of original and duplicate instructions behaved differently.

**Example 2.** Consider Fig. 1b and the QED test  $i = i_O :: i_D$  consisting of one original instruction  $i_O$  and its duplicate  $Dup(i_O) = i_D$  for some function  $L_D$ . Suppose that i is executed in a QED-consistent state  $s_0$  (denoted by QEDcons $(s_0)$  and  $s_0(\mathcal{L}_O) = s_0(\mathcal{L}_D)$ ) and both  $i_O$  and  $i_D$  execute correctly. Instruction  $i_O$  produces state  $s_1$ , where the values at duplicate locations remain unchanged, i.e.,  $s_0(\mathcal{L}_D) = s_1(\mathcal{L}_D)$ , because  $i_O$  operates on original locations only. When instruction  $i_D$  is executed in state  $s_1$ , it modifies only duplicate locations. The final state  $s_2$  is QED-consistent (denoted by QEDcons $(s_2)$  and  $s_2(\mathcal{L}_O) = s_2(\mathcal{L}_D)$ ), and thus QED test i succeeds.

**Example 3** (Bug Detection). Consider processor  $\mathcal{P}$  and  $\mathcal{L}_O$  and  $\mathcal{L}_D$  from Example 1. Let  $i_{O,1} = (ADD, l_{12}, (l_4, l_{15}))$  and  $i_{O,2} = (MUL, l_{15}, (l_{12}, l_{12}))$  be original register-type addition and multiplication instructions. Using  $L_D(k) = k+16$ , we obtain  $Dup(i_{O,1}) = i_{D,1} = (ADD, l_{28}, (l_{20}, l_{31}))$  and  $Dup(i_{O,2}) = i_{D,2} = (MUL, l_{31}, (l_{28}, l_{28}))$ . Assume that  $\mathcal{P}$  has a bug that is triggered when two MUL instructions are executed in subsequent clock cycles, resulting in the corruption of the output location of the second MUL instruction.<sup>2</sup> Note that executing the QED test  $i = i_{O,1}, i_{O,2} :: i_{D,1}, i_{D,2}$  in a QED-consistent initial state produces a QED-consistent final state: the bug is not triggered by i because  $i_{D,1}$  is executed between  $i_{O,2}$  and  $i_{D,2}$ . A slightly longer test  $i = i_{O,2}, i_{O,1}, i_{O,2} :: i_{D,2}, i_{D,1}, i_{D,2}$  does trigger the bug, however, because the subsequence  $i_{O,2}, i_{D,2}$  of two back-to-back MULs causes the first duplicate instruction  $i_{D,2}$  in i to produce an incorrect result at  $l_{31}$ . This incorrect result then propagates through the next two instructions, resulting in a QED-inconsistent final state since the values at  $l_{15}$  and  $l_{31}$ , i.e., the output locations of  $i_{O,2}$  and  $i_{D,2}$ , differ.
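Example 3 can be replayed on a toy simulator. The sketch below is an illustration only: it assumes that the bug corrupts the second MUL result by adding 1 (the example does not say how the output is corrupted), it tracks the previous opcode as the sole non-architectural state, and all function names are hypothetical.

```python
from typing import List, Optional, Tuple

Instr = Tuple[str, int, Tuple[int, int]]  # (opcode, output, (input1, input2))

def dup(instr: Instr) -> Instr:
    """Dup for L_D(k) = k + 16."""
    op, out, (a, b) = instr
    return (op, out + 16, (a + 16, b + 16))

def run(regs: List[int], test: List[Instr]) -> List[int]:
    """Execute `test` on a toy processor with a back-to-back MUL bug."""
    regs = list(regs)
    prev_op: Optional[str] = None  # non-architectural state: previous opcode
    for op, out, (a, b) in test:
        result = regs[a] + regs[b] if op == "ADD" else regs[a] * regs[b]
        if op == "MUL" and prev_op == "MUL":
            result += 1  # assumed form of the corruption of the second MUL's output
        regs[out] = result
        prev_op = op
    return regs

def qed_consistent(regs: List[int]) -> bool:
    return all(regs[k] == regs[k + 16] for k in range(16))

def qed_test(originals: List[Instr]) -> List[Instr]:
    return originals + [dup(i) for i in originals]

i_o1: Instr = ("ADD", 12, (4, 15))
i_o2: Instr = ("MUL", 15, (12, 12))

# QED-consistent initial state: register k and register k + 16 hold the same value.
s0 = [k % 16 + 1 for k in range(32)]

short = run(s0, qed_test([i_o1, i_o2]))
longer = run(s0, qed_test([i_o2, i_o1, i_o2]))
print(qed_consistent(short))   # True
print(qed_consistent(longer))  # False
print(longer[15], longer[31])  # 30276 30625
```

The short test never places two MULs next to each other, so the final state stays QED-consistent. In the longer test the last original MUL is immediately followed by the first duplicate MUL, and the corrupted value propagates to registers 28 and 31.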

<sup>1</sup>This model is used for simplicity, but it could easily be extended to allow instructions with additional inputs or outputs.

<sup>2</sup>This scenario corresponds to a real bug in an out-of-order RISC-V design detected by SQED: https://github.com/ridecore/ridecore/issues/4.

Fig. 1. SQED workflow from a theoretical perspective (a) and illustration of executing the QED test  $i = i_O :: i_D$  in Example 2 (b).

QED-consistency is the universal, implementation-independent property that is checked in SQED. In practice, the property must refer to some basic information about the design such as, e.g., symbolic register names, but this can be generated automatically from a high-level ISA description [10]. BMC is used to symbolically and exhaustively generate all possible QED tests up to a certain length 2n (the BMC bound). BMC ensures that SQED will find the shortest possible failing QED test first. The high-level workflow shown in Fig. 1a allows for flexibility in choosing the partition and mapping between original and duplicate locations. We rely on this flexibility for the results in this paper (Theorems 1 and 2). Current SQED implementations use a predefined partition and mapping, based on which BMC enumerates all possible QED tests. Extending implementations to have the BMC tool also choose a partition and mapping could be explored in future work.
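The idea of searching for QED tests in order of increasing length can be mimicked by explicit enumeration. The sketch below is a deliberately naive stand-in for BMC, not how a BMC tool works: it enumerates concrete instruction sequences and a handful of concrete initial states instead of reasoning symbolically. The four-location toy processor, its assumed back-to-back MUL bug, and all names are hypothetical.

```python
from itertools import product
from typing import List, Optional, Tuple

Instr = Tuple[str, int, Tuple[int, int]]  # (opcode, output, (input1, input2))

ORIG = [0, 1]   # original locations; duplicates are 2 and 3
OFFSET = 2      # L_D(k) = k + 2
OPS = ["ADD", "MUL"]

def dup(instr: Instr) -> Instr:
    op, out, (a, b) = instr
    return (op, out + OFFSET, (a + OFFSET, b + OFFSET))

def run(regs: List[int], test: List[Instr]) -> List[int]:
    """Toy processor with an assumed bug: a MUL right after a MUL is off by one."""
    regs, prev_op = list(regs), None
    for op, out, (a, b) in test:
        result = regs[a] + regs[b] if op == "ADD" else regs[a] * regs[b]
        if op == "MUL" and prev_op == "MUL":
            result += 1
        regs[out], prev_op = result, op
    return regs

def qed_consistent(regs: List[int]) -> bool:
    return all(regs[k] == regs[k + OFFSET] for k in ORIG)

ORIGINAL_INSTRS: List[Instr] = [
    (op, out, (a, b)) for op in OPS for out in ORIG for a in ORIG for b in ORIG
]
# A few concrete QED-consistent initial states (BMC would treat these symbolically).
INITIAL_STATES = [[x, y, x, y] for x in range(3) for y in range(3)]

def shortest_failing_test(max_n: int) -> Optional[Tuple[List[Instr], List[int]]]:
    """Try all QED tests with n = 1, 2, ..., max_n original instructions."""
    for n in range(1, max_n + 1):
        for originals in product(ORIGINAL_INSTRS, repeat=n):
            test = list(originals) + [dup(i) for i in originals]
            for s0 in INITIAL_STATES:
                if not qed_consistent(run(s0, test)):
                    return test, s0
    return None

result = shortest_failing_test(max_n=2)
print(result)
```

In this toy model the search already stops at n = 1: a single MUL followed by its duplicate executes back to back and triggers the assumed bug. Because lengths are tried in increasing order, the first failing test returned is also a shortest one.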

We refer to related work [7], [9], [12] for case studies that demonstrate the effectiveness of BMC-based SQED on a variety of processor designs. The scalability of SQED in practice is determined by the scalability of the BMC tool being used. Thus, approaches for improving scalability of BMC can also be applied to SQED, e.g., abstraction, decomposition, and partial instantiation techniques [7].

# III. INSTRUCTION AND PROCESSOR MODEL

We model a processor as a transition system containing an abstract set of locations. The set of locations includes registers and memory locations. A state of a processor consists of an *architectural* and a *non-architectural* part. In a state transition that results from executing an instruction, the architectural part of a state is modified explicitly by updating the value at the output location of the executed instruction. The architectural part of a state is also called the *software-visible* state of the processor. It comprises those parts of the state that can be updated by executing instructions of the user-level ISA of the processor, such as memory locations and general-purpose registers. The non-architectural part of a state comprises the remaining parts that are updated only implicitly by executing an instruction, such as pipeline or status registers.

Instructions are functions that take inputs from locations and write an output to a location. We assume that every instruction produces its result in one transition. In our model, we abstract away implementation details of complex processor designs (e.g., pipelined, out-of-order, multi-processor systems). This is for ease of presentation and reasoning. However, many of these complexities can be viewed as refinements of our abstraction, meaning that our formal results still hold on complex models (i.e., our results can be lowered to more detailed models such as those described in [7], [8]). Working out the details of such refinements is one important avenue for future work.

**Definition 1** (Transition System). A processor is a transition system [13], [14]  $\mathcal{P} = (\mathcal{V}, \mathcal{L}, S_{\overline{a}}, s_{\overline{a},I}, Op, I, T)$ , where

- $\mathcal{V}$  is a set of abstract data values,
- $\mathcal{L}$  is a set of memory locations (from which we define the set  $S_a$  of architectural states as the set of total functions from locations to values, i.e.,  $S_a = \{s_a \mid s_a : \mathcal{L} \to \mathcal{V}\}$ ),
- $S_{\overline{a}}$  is a set of non-architectural states (from which we further define the set of all states as  $S = S_a \times S_{\overline{a}}$ ),
- $s_{\overline{a},I} \in S_{\overline{a}}$  is a unique initial non-architectural state (from which we define the set of initial states as  $S_I = S_a \times \{s_{\overline{a},I}\}$ ),
- Op is a set of operation codes (opcodes),
- $I = Op \times \mathcal{L} \times \mathcal{L}^2$  is the set of instructions, and
- $T: S \times I \rightarrow S$  is the transition function, which is total.

A state  $s \in S$  with  $s = (s_a, s_{\overline{a}})$  consists of an architectural part  $s_a \in S_a$  and a non-architectural part  $s_{\overline{a}} \in S_{\overline{a}}$ . In the architectural part  $s_a : \mathcal{L} \to \mathcal{V}$ ,  $\mathcal{L}$  represents all possible registers and memory locations, i.e., in practical terms,  $\mathcal{L}$  is the address space of  $\mathcal{P}$ . An initial state  $s_I \in S_I$  with  $s_I = (s_a, s_{\overline{a},I})$  is defined by a unique non-architectural part  $s_{\overline{a},I} \in S_{\overline{a}}$  and an arbitrary architectural part  $s_a \in S_a$ . We assume that  $s_{\overline{a},I} \in S_{\overline{a}}$  is unique to make the exposition simpler. Our model could easily be extended to a set of initial non-architectural states. The number  $|\mathcal{L}|$  of memory locations is arbitrary but fixed. We write v = s(l) to denote the value  $v = s_a(l)$  at location  $l \in \mathcal{L}$  in state  $s = (s_a, s_{\overline{a}})$ . We also write (v, v') = s(l, l') as shorthand for v = s(l) and v' = s(l').

To formally define instruction duplication, we need to reason about *original* and *duplicate* memory locations. To this end, we partition the set  $\mathcal{L}$  of memory locations into two sets of equal size, the *original* and *duplicate locations*  $\mathcal{L}_O$  and  $\mathcal{L}_D$ , respectively, i.e.,  $\mathcal{L}_O \cap \mathcal{L}_D = \emptyset$ ,  $\mathcal{L}_O \cup \mathcal{L}_D = \mathcal{L}$ , and  $|\mathcal{L}_O| = |\mathcal{L}_D|$ . Given  $\mathcal{L}_O$  and  $\mathcal{L}_D$ , we define an **arbitrary but fixed** *bijective function*  $L_D: \mathcal{L}_O \to \mathcal{L}_D$  that maps an original location  $l_O \in \mathcal{L}_O$  to its corresponding duplicate location  $l_D = L_D(l_O)$ . The inverse of  $L_D$  is denoted by  $L_D^{-1}$  and is uniquely defined. We write  $(l_D, l_D') = L_D(l_O, l_O')$  as shorthand for  $l_D = L_D(l_O)$  and  $l_D' = L_D(l_O')$ . Function  $L_D$  implements a correspondence between original and duplicate locations, which we need to define QED-consistency (Definition 11 below).

An instruction  $i \in I$  with i = (op, l, (l', l'')) is defined by an opcode  $op \in Op$ , an output location  $l \in \mathcal{L}$ , and a pair of input locations  $(l', l'') \in \mathcal{L}^2$ . Function  $op : I \to Op$  maps an instruction to its opcode op(i). Functions  $L_{out}: I \to \mathcal{L}$  and  $L_{in}:I\to\mathcal{L}^2$  map an instruction i to its output and input locations  $L_{out}(i) = l$  and  $L_{in}(i) = (l', l'')$ , respectively. Given a state  $s = (s_a, s_{\overline{a}})$ , instruction i reads values in s from its input locations  $L_{in}(i)$  and writes a value to its output location  $L_{out}(i)$ , resulting in a transition to a new state  $s' = (s'_a, s'_{\overline{a}})$ , written as s' = T(s, i). The transition function T is total, i.e., for every instruction i and state s, there exists a successor state s' = T(s,i). As mentioned above, we have kept the model simple in order to make the presentation more accessible, but our results can be lifted to many extensions, including, e.g., more complicated kinds of instructions or instructions with enabledness conditions cf. [15].

We write  $i \in I^n$  and  $s \in S^n$  to denote sequences  $i = \langle i_1, \ldots, i_n \rangle$  and  $s = \langle s_1, \ldots, s_n \rangle$  of n instructions and n states, respectively. We will use :: for sequence concatenation and extend the transition function T to sequences as follows.

**Definition 2** (Path). Given sequences  $i = \langle i_1, \ldots, i_n \rangle$  and  $s = \langle s_1, \ldots, s_n \rangle$  of n instructions and states, s is a path from state  $s_0 \in S$  to  $s_n$  via i, written  $s = T(s_0, i)$ , iff  $\bigwedge_{k=0}^{n-1} s_{k+1} = T(s_k, i_{k+1})$ .

If  $s = T(s_0, i)$ , then for convenience we also write  $s_n = T(s_0, i)$  to denote the final state  $s_n$ .
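The transition function and its extension to paths can be mirrored in a small executable model. This is a sketch under assumptions made here: values are Python integers, the non-architectural state is reduced to the previously executed opcode, and the two opcodes ADD and MUL with their semantics are chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Instr = Tuple[str, int, Tuple[int, int]]  # (opcode, output, (input1, input2))

@dataclass(frozen=True)
class State:
    arch: Tuple[int, ...]    # architectural part: the value at each location
    nonarch: Optional[str]   # non-architectural part: here just the previous opcode

OPS = {"ADD": lambda x, y: x + y, "MUL": lambda x, y: x * y}

def T(s: State, instr: Instr) -> State:
    """Total transition function: defined for every state and instruction."""
    op, out, (a, b) = instr
    arch = list(s.arch)
    arch[out] = OPS[op](s.arch[a], s.arch[b])
    return State(tuple(arch), op)

def path(s0: State, instrs: List[Instr]) -> List[State]:
    """Return <s_1, ..., s_n> with s_{k+1} = T(s_k, i_{k+1})."""
    states = []
    s = s0
    for instr in instrs:
        s = T(s, instr)
        states.append(s)
    return states

s0 = State(arch=(2, 3, 0, 0), nonarch=None)
states = path(s0, [("ADD", 2, (0, 1)), ("MUL", 3, (2, 2))])
print(states[-1].arch)  # (2, 3, 5, 25)
```

The last element of the returned list plays the role of the final state $s_n$, and every state produced this way from an initial state is reachable in the sense of Definition 3.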

**Definition 3** (Reachable State). A state s is reachable, written reach(s), iff  $s = T(s_0, i)$  for some  $s_0 \in S_I$  and instruction sequence i.

The set I of instructions contains as proper subsets the sets of original and duplicate instructions,  $I_O$  and  $I_D$ , respectively. Original (duplicate) instructions operate only on original (duplicate) locations, i.e.,  $\forall i_O \in I_O$ .  $L_{in}(i_O) \in \mathcal{L}_O^2 \land L_{out}(i_O) \in \mathcal{L}_O$  and  $\forall i_D \in I_D$ .  $L_{in}(i_D) \in \mathcal{L}_D^2 \land L_{out}(i_D) \in \mathcal{L}_D$ . Given these definitions, we formalize instruction duplication as follows.

**Definition 4** (Instruction Duplication). Let  $Dup: I_O \to I_D$  be an instruction duplication function that maps an original instruction  $i_O = (op, l_O, (l_O', l_O''))$  to a duplicate instruction  $i_D = Dup(i_O) = (op, L_D(l_O), L_D(l_O', l_O''))$  with respect to the bijective function  $L_D$ .

An original instruction and its duplicate have the same opcode. We write  $i_O \in I_O^n$  and  $i_D \in I_D^n$  to denote sequences  $i_O = \langle i_{O,1}, \ldots, i_{O,n} \rangle$  and  $i_D = \langle i_{D,1}, \ldots, i_{D,n} \rangle$  of n original and duplicate instructions, respectively. We lift Dup in the natural way also to sequences of instructions as follows.

**Definition 5** (Instruction Sequence Duplication). Let  $i_{O} = \langle i_{O,1}, \dots, i_{O,n} \rangle$  be a sequence of original instructions. Then  $Dup(i_{O}) = \langle Dup(i_{O,1}), \dots, Dup(i_{O,n}) \rangle$ .

# IV. FORMALIZING CORRECTNESS

We formalize the correctness of instruction executions in a processor  $\mathcal{P}$  using an abstract specification relation. We then link this abstract specification to QED-consistency, the self-consistency property employed by SQED (Section V below).

For our formalization, we assume that every opcode  $op \in Op$  has a  $specification\ function\ Spec_{op}: \mathcal{V}^2 \to \mathcal{V}$  that specifies how the opcode computes an output value from input values. Using this family of functions, we define an overall abstract  $specification\ relation\ Spec \subseteq S \times I \times S$ , which expresses when an instruction  $i \in I$  can transition to a state  $s' \in S$  from a state  $s \in S$  while respecting the opcode specification.

**Definition 6** (Abstract Specification).  $\forall s, s' \in S, i \in I$ .

$$
\begin{aligned}
Spec(s, i, s') \leftrightarrow \forall l \in \mathcal{L}.\ & (l \neq L_{out}(i) \rightarrow s(l) = s'(l))\ \land \\
& (l = L_{out}(i) \rightarrow s'(l) = Spec_{op(i)}(s(L_{in}(i))))
\end{aligned} \tag{1}
$$
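Read operationally, (1) is a check on a pair of architectural states. The sketch below illustrates that reading under assumptions made here: states are given by their architectural parts as integer tuples, and the opcode specification functions for ADD and MUL are chosen only as examples.

```python
from typing import Callable, Dict, Sequence, Tuple

Instr = Tuple[str, int, Tuple[int, int]]  # (opcode, output, (input1, input2))

# Assumed opcode specification functions Spec_op; any family of functions would do.
SPEC_OP: Dict[str, Callable[[int, int], int]] = {
    "ADD": lambda x, y: x + y,
    "MUL": lambda x, y: x * y,
}

def spec(s: Sequence[int], instr: Instr, s_next: Sequence[int]) -> bool:
    """Check (1) on the architectural parts s and s_next of two states."""
    op, out, (in1, in2) = instr
    expected = SPEC_OP[op](s[in1], s[in2])
    for loc in range(len(s)):
        if loc == out:
            if s_next[loc] != expected:
                return False  # wrong value at the output location
        elif s_next[loc] != s[loc]:
            return False      # a non-output location was modified
    return True

add = ("ADD", 2, (0, 1))
print(spec((2, 3, 0, 0), add, (2, 3, 5, 0)))  # True
print(spec((2, 3, 0, 0), add, (2, 3, 6, 0)))  # False
print(spec((2, 3, 0, 0), add, (2, 3, 5, 7)))  # False
```

The two failing calls correspond to the two conjuncts of (1): a wrong value at the output location, and a modified location other than the output.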

Equation (1) states general and natural properties that we expect to hold for a processor  $\mathcal{P}$ . If an instruction i executes according to its specification, then the values at locations that are not output locations of i are unchanged. Additionally, the value produced at the output location of the instruction must agree with the value specified by function  $Spec_{op(i)}$ . Note that the specification relation Spec specifies only how the architectural part of a state is updated by a transition (not the non-architectural part). Consequently, there might exist multiple states whose non-architectural parts satisfy the right-hand side of (1). This is why Spec is a relation rather than a function. As special cases of (1), original and duplicate instructions have the following properties:

$$
\begin{aligned}
& \forall s, s' \in S, i_O \in I_O, l_O \in \mathcal{L}_O, i_D \in I_D, l_D \in \mathcal{L}_D. \\
& (Spec(s, i_O, s') \to s(l_D) = s'(l_D))\ \land && (2) \\
& (Spec(s, i_D, s') \to s(l_O) = s'(l_O)) && (3)
\end{aligned}
$$

Equations (2) and (3) express that the execution of an original (duplicate) instruction does not change the values at duplicate (original) locations if the instruction executes according to its specification. The following *functional congruence* property of instructions also follows from (1):

$$
\begin{aligned}
& \forall s_0, s_1, s', s'' \in S, i, i' \in I. \\
& [op(i) = op(i') \land Spec(s_0, i, s') \land Spec(s_1, i', s'')\ \land \\
& \ s_0(L_{in}(i)) = s_1(L_{in}(i'))] \rightarrow s'(L_{out}(i)) = s''(L_{out}(i'))
\end{aligned} \tag{4}
$$
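One way to see that (4) follows from (1), spelled out here as a short sketch, is to instantiate (1) at the output location of each instruction and chain the resulting equalities:

$$
\begin{aligned}
s'(L_{out}(i)) &= Spec_{op(i)}(s_0(L_{in}(i))) && \text{by (1) and } Spec(s_0, i, s') \\
&= Spec_{op(i')}(s_1(L_{in}(i'))) && \text{since } op(i) = op(i') \text{ and } s_0(L_{in}(i)) = s_1(L_{in}(i')) \\
&= s''(L_{out}(i')) && \text{by (1) and } Spec(s_1, i', s'')
\end{aligned}
$$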

By functional congruence, if two instructions with the same opcode are executed on inputs with the same values, then the output values are the same. We next define the correctness of a processor  $\mathcal{P}$  based on the abstract specification Spec.

**Definition 7** (Correctness). A processor  $\mathcal{P}$  is correct with respect to specification Spec iff  $\forall i \in I, s \in S. reach(s) \rightarrow Spec(s, i, T(s, i)).$ 

Correctness requires every instruction to execute according to the abstract specification Spec in every reachable state of  $\mathcal{P}$ .

A *bug* in  $\mathcal{P}$  is a counterexample to correctness, i.e., an instruction that fails in at least one (not necessarily initial) reachable state and may or may not fail in other states.

**Definition 8** (Bug). A bug with respect to specification Spec in a processor  $\mathcal{P}$  is defined by a pair  $\mathcal{B} = \langle i_b, S_b \rangle$  consisting of an instruction  $i_b \in I$  and a non-empty set  $S_b \subseteq S$  of states such that  $S_b = \{s \in S \mid reach(s) \land \neg Spec(s, i_b, T(s, i_b))\}.$ 

The above definitions rely on the notion of an abstract specification relation. Having *some* abstract specification is a *theoretical* construct that is necessary to formally characterize instruction failure and establish formal proofs about SQED. However, it is important to note that to apply SQED in *practice*, we do not need to know what the abstract specification relation is.

A bug  $\langle i_b, S_b \rangle$  is precisely characterized by the set  $S_b$  of all reachable states in which  $i_b$  fails. The following proposition follows from Definitions 7 and 8.

**Proposition 1.** A processor  $\mathcal{P}$  has a bug with respect to specification Spec iff it is not correct with respect to Spec.

As special cases of processor correctness and bugs, respectively, we define correctness and bugs with respect to instructions that are executed in an initial state only.

**Definition 9** (Single-Instruction Correctness). *Processor*  $\mathcal{P}$  *is* single-instruction correct *iff*:

$$\forall i \in I, s_0 \in S_I. Spec(s_0, i, T(s_0, i)).$$

Single-instruction correctness implies that all instructions, i.e., all opcodes and all combinations of input and output locations, execute correctly in all initial states. A *single-instruction bug* is a counterexample to single-instruction correctness.

**Definition 10** (Single-Instruction Bug). Processor  $\mathcal{P}$  has a single-instruction bug with respect to specification Spec iff  $\exists i \in I, s_0 \in S_I. \neg Spec(s_0, i, T(s_0, i)).$ 

Several approaches exist for single-instruction checking of a processor, which is complementary to SQED (cf. Section VII).

# V. SELF-CONSISTENCY AS QED-CONSISTENCY

We now define QED-consistency (cf. Section II) as a property of states of a processor  $\mathcal{P}$  based on function  $L_D$ . Then we formally define the notion of QED test and show that for correct processors, QED tests preserve QED-consistency. This result is key to the soundness proof in Section VI below.

**Definition 11** (QED-Consistency). A state s is QED-consistent, written QEDcons(s), iff  $\forall l_O \in \mathcal{L}_O$ .  $s(l_O) = s(L_D(l_O))$ .

QED-consistency is based on checking the architectural part of a state. An equivalent condition can be formulated in terms of duplicate locations:  $\forall l_D \in \mathcal{L}_D$ .  $s(l_D) = s(L_D^{-1}(l_D))$ .

**Definition 12** (QED test). An instruction sequence i is a QED test if  $i = i_O :: Dup(i_O)$  for some sequence  $i_O$  of original instructions.
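Definitions 11 and 12 translate directly into two small predicates. The sketch below is illustrative: it represents the bijection $L_D$ as a Python dictionary from original to duplicate locations, and the names `qed_cons`, `dup`, and `is_qed_test` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Instr = Tuple[str, int, Tuple[int, int]]  # (opcode, output, (input1, input2))

def qed_cons(arch: Sequence[int], loc_map: Dict[int, int]) -> bool:
    """Definition 11: every original location agrees with its duplicate."""
    return all(arch[lo] == arch[ld] for lo, ld in loc_map.items())

def dup(instr: Instr, loc_map: Dict[int, int]) -> Instr:
    op, out, (a, b) = instr
    return (op, loc_map[out], (loc_map[a], loc_map[b]))

def is_qed_test(instrs: List[Instr], loc_map: Dict[int, int]) -> bool:
    """Definition 12: the sequence is i_O :: Dup(i_O) for original instructions i_O."""
    if len(instrs) % 2 != 0:
        return False
    n = len(instrs) // 2
    originals, duplicates = instrs[:n], instrs[n:]

    def is_original(instr: Instr) -> bool:
        _, out, (a, b) = instr
        return all(loc in loc_map for loc in (out, a, b))

    if not all(is_original(i) for i in originals):
        return False
    return duplicates == [dup(i, loc_map) for i in originals]

L_D = {0: 2, 1: 3}  # original locations {0, 1}, duplicate locations {2, 3}
good = [("ADD", 0, (0, 1)), ("ADD", 2, (2, 3))]
bad = [("ADD", 0, (0, 1)), ("MUL", 2, (2, 3))]
print(is_qed_test(good, L_D), is_qed_test(bad, L_D))  # True False
print(qed_cons((7, 9, 7, 9), L_D), qed_cons((7, 9, 7, 8), L_D))  # True False
```

Because the check iterates over the pairs of the dictionary, it works for any partition and bijection, not only for a fixed offset.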

We link the abstract specification *Spec* to the semantics of original and duplicate instructions. This way, we obtain a notion of functional congruence that readily follows as a special case from (4).

**Corollary 1** (Functional Congruence: Duplicate Instructions). Given  $i_O \in I_O$  and  $i_D \in I_D$  with  $i_D = Dup(i_O)$ , the following holds for all states  $s_0$ ,  $s_1$ , s', and s'':

$$[Spec(s_0, i_O, s') \land Spec(s_1, i_D, s'') \land s_0(L_{in}(i_O)) = s_1(L_D(L_{in}(i_O)))] \rightarrow s'(L_{out}(i_O)) = s''(L_D(L_{out}(i_O)))$$

Corollary 1 states that an original instruction  $i_O$  produces the same value at its output location as its duplicate instruction  $i_D = Dup(i_O)$ , provided that these instructions execute in states where the values at the respective input locations match.

We generalize Corollary 1 to show that after executing a pair of original and duplicate instructions, the values at *all* original locations match the values at the corresponding duplicate locations, assuming those values also matched before executing the instructions.

**Lemma 1** (cf. Corollary 1). Given  $i_O \in I_O$  and  $i_D \in I_D$  with  $i_D = Dup(i_O)$ , the following holds for all states  $s_0$ ,  $s_1$ , s', and s'':

$$[Spec(s_0, i_O, s') \land Spec(s_1, i_D, s'') \land \forall l_O \in \mathcal{L}_O. \ s_0(l_O) = s_1(L_D(l_O))] \rightarrow \forall l_O \in \mathcal{L}_O. \ s'(l_O) = s''(L_D(l_O))$$

*Proof.* See online appendix [16].

Lemma 1 leads to an important result that we need to prove soundness of SQED (Lemma 3 below): executing a QED test i starting in a QED-consistent state results in a QED-consistent final state if all instructions in i execute according to the abstract specification Spec (cf. Fig. 1b).

**Lemma 2** (QED-Consistency and QED tests). Let  $i = \langle i_1, \ldots, i_{2n} \rangle$  be a QED test, let  $\langle s_0, \ldots, s_{2n} \rangle$  be a sequence of 2n+1 states, and let Spec be some abstract specification relation. Then,

$$QEDcons(s_0) \land \big(\bigwedge_{j=0}^{2n-1} Spec(s_j, i_{j+1}, s_{j+1})\big) \rightarrow QEDcons(s_{2n})$$

*Proof.* Assuming the antecedent, let  $l_O \in \mathcal{L}_O$  be arbitrary but fixed with  $l_D = L_D(l_O)$ . By repeated application of (2), we derive  $s_0(l_D) = s_1(l_D) = \ldots = s_n(l_D)$ , and hence:

$$s_0(l_D) = s_n(l_D) \tag{5}$$

by transitivity. By repeated application of (3), we derive:

$$s_n(l_O) = s_{2n}(l_O) \tag{6}$$

Now,  $QEDcons(s_0)$  implies  $s_0(l_O) = s_0(L_D(l_O))$ , from which it follows by (5) that  $s_0(l_O) = s_n(L_D(l_O))$ . By repeated application of Lemma 1, we can next derive  $s_j(l_O) = s_{n+j}(L_D(l_O))$  for  $1 \leq j \leq n$ , and in particular,  $s_n(l_O) = s_{2n}(L_D(l_O))$ . Finally, by applying (6), we get  $s_{2n}(l_O) = s_{2n}(L_D(l_O))$ . Since  $l_O$  was chosen arbitrarily,  $QEDcons(s_{2n})$  holds.
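Lemma 2 can be sanity-checked empirically on a processor model that is correct by construction. The sketch below is not a proof and is not part of the formal development: it assumes a toy processor with eight locations, $L_D(k) = k + 4$, and two opcodes, then executes random QED tests from random QED-consistent states. All names are hypothetical.

```python
import random
from typing import List, Tuple

Instr = Tuple[str, int, Tuple[int, int]]  # (opcode, output, (input1, input2))

N_ORIG = 4  # original locations 0..3, duplicates 4..7, L_D(k) = k + N_ORIG
OPS = {"ADD": lambda x, y: x + y, "MUL": lambda x, y: x * y}

def dup(instr: Instr) -> Instr:
    op, out, (a, b) = instr
    return (op, out + N_ORIG, (a + N_ORIG, b + N_ORIG))

def step(regs: List[int], instr: Instr) -> List[int]:
    """Correct execution: write the specified value, leave all other locations alone."""
    op, out, (a, b) = instr
    new_regs = list(regs)
    new_regs[out] = OPS[op](regs[a], regs[b])
    return new_regs

def qed_cons(regs: List[int]) -> bool:
    return all(regs[k] == regs[k + N_ORIG] for k in range(N_ORIG))

def random_qed_test(rng: random.Random, n: int) -> List[Instr]:
    originals = [
        (rng.choice(sorted(OPS)), rng.randrange(N_ORIG),
         (rng.randrange(N_ORIG), rng.randrange(N_ORIG)))
        for _ in range(n)
    ]
    return originals + [dup(i) for i in originals]

def check_lemma2(trials: int = 1000, seed: int = 0) -> bool:
    rng = random.Random(seed)
    for _ in range(trials):
        half = [rng.randrange(10) for _ in range(N_ORIG)]
        regs = half + half  # QED-consistent start state
        for instr in random_qed_test(rng, rng.randint(1, 6)):
            regs = step(regs, instr)
        if not qed_cons(regs):
            return False
    return True

print(check_lemma2())  # True
```

Since `step` satisfies the specification in every state, Lemma 2 predicts that no trial can end in a QED-inconsistent state, whatever the random seed.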

# VI. SOUNDNESS AND CONDITIONAL COMPLETENESS

SQED checks a processor  $\mathcal{P}$  for self-consistency by executing QED tests and checking QED-consistency (cf. Fig. 1a). We now define the correctness of  $\mathcal{P}$  in terms of QED tests that, when executed, always result in QED-consistent states. This way, we establish a correspondence between counterexamples to QED-consistency and bugs in  $\mathcal{P}$ . We then prove our main results (Theorem 1) related to the bug-finding capabilities of SQED, i.e., soundness and conditional completeness.

**Definition 13** (Failing and Succeeding QED Tests). Let i be a QED test,  $s_0 \in S_I$  an initial state such that  $QEDcons(s_0)$ holds, and let  $s = T(s_0, i)$ . We say that:

- QED test i fails if $\neg QEDcons(s)$.
- QED test i succeeds if QEDcons(s).

**Definition 14** (Processor QED-Consistency). A processor  $\mathcal{P}$ is QED-consistent if all possible QED tests succeed.

**Definition 15** (Processor QED-Inconsistency). A processor  $\mathcal{P}$ is QED-inconsistent if some QED test fails.

**Lemma 3.** Let $\mathcal{P}$ be a processor. If $\mathcal{P}$ is QED-inconsistent, then $\mathcal{P}$ is not correct with respect to any abstract specification relation.

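*Proof.* Let i be a failing QED test for $\mathcal{P}$ with initial state $s_0$ and final state $s_{2n}$, and assume that $\mathcal{P}$ is correct with respect to some abstract specification relation Spec. Then every instruction of i executes according to Spec, and since $QEDcons(s_0)$ holds by Definition 13, Lemma 2 yields $QEDcons(s_{2n})$, which contradicts the assumption that i is a failing QED test.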

Importantly, Lemma 3 holds regardless of what the actual specification relation Spec is, i.e., it is independent of Spec and the opcode specification function  $Spec_{op}$  (Definition 6).

Lemma 3 shows that SQED is a sound technique: any error reported by a failing QED test is in fact a real bug in the system. It is more challenging to determine the degree to which SQED is complete, that is, for which bugs do there exist failing QED tests? We address this question next.

Suppose that $\mathcal{B} = \langle i_b, S_b \rangle$ is a bug with respect to a specification Spec in a processor $\mathcal{P}$, where $i_b = (op_b, l_{out}^b, (l_{in1}^b, l_{in2}^b))$. A bug-specific QED test for $\mathcal{B}$ is a QED test that sets up the conditions for and includes the activation of the bug. By Definition 8, if $i_b$ is executed in $\mathcal{P}$ starting from any state in $S_b$, the specification is violated. That is, for each $s_b \in S_b$, $\neg Spec(s_b, i_b, T(s_b, i_b))$. Let $s = T(s_b, i_b)$. According to (1), there are two ways the specification can be violated. Either: (A) the value in the output location of $i_b$ is different from that required by Spec, i.e., $s(l_{out}^b) \neq Spec_{op_b}(s_b(l_{in1}^b), s_b(l_{in2}^b))$, which we call a type-A bug; or (B) the value in some other, non-output location $l_{bad}$ is not preserved, i.e., $s(l_{bad}) \neq s_b(l_{bad})$ for some $l_{bad} \neq l_{out}^b$, which we call a type-B bug. We now define a bug-specific QED test formally.

**Definition 16** (Bug-Specific QED Test). Let $\mathcal{B} = \langle i_b, S_b \rangle$ be a bug in $\mathcal{P}$ with respect to Spec, where $i_b = (op_b, l_{out}^b, (l_{in1}^b, l_{in2}^b))$. The instruction sequence $i = \langle i_1,\ldots,i_n,i_{n+1},\ldots,i_{2n}\rangle$ is a bug-specific QED test for $\mathcal B$ if the following conditions hold:

- 1) $i_{n+1} = i_b$.
- 2) *i* is a QED test for some $L_D$, i.e., for $1 \le k \le n$, $i_{n+k} = Dup(i_k)$. In particular, $i_1 = (op_b, l_{out}, (l_{in1}, l_{in2}))$, with $(l_{in1}, l_{in2}, l_{out}) = L_D^{-1}((l_{in1}^b, l_{in2}^b, l_{out}^b))$.
- 3) There exists a path $s \in S^{2n}$ from $s_0 \in S_I$ with $QEDcons(s_0)$, such that $s = T(s_0, i) = \langle s_1,\ldots,s_n,s_{n+1},\ldots,s_{2n}\rangle$, where $s_n\in S_b$.
- 4) $Spec(s_0, i_1, s_1)$.
- 5) Additionally, we need three more conditions that depend on the bug types:

Case A: If $i_b$ is a type-A bug with respect to $s_n$, i.e., $s_{n+1}(l_{out}^b) \neq Spec_{op_b}(s_n(l_{in1}^b), s_n(l_{in2}^b))$, then let $l_{orig} = l_{out}$ and $l_{dup} = l_{out}^b$. We then require:

- $s_{n+1}(l_{dup}) = s_{2n}(l_{dup})$,
- $s_1(l_{orig}) = s_{2n}(l_{orig})$,
- $s_0(L_{in}(i_b)) = s_n(L_{in}(i_b))$.

Case B: If $i_b$ is a type-B bug with respect to $s_n$, i.e., $s_n(l_{bad}) \neq s_{n+1}(l_{bad})$ for some $l_{bad} \neq l_{out}^b$, then let $l_{orig} = L_D^{-1}(l_{bad})$ with $l_{orig} \neq l_{out}$ and $l_{dup} = l_{bad}$. We then require:

- $s_{n+1}(l_{dup}) = s_{2n}(l_{dup})$,
- $s_1(l_{orig}) = s_{2n}(l_{orig})$,
- $s_1(l_{dup}) = s_n(l_{dup})$.

Clearly, it is always possible to satisfy the first two conditions by declaring the buggy instruction $i_b$ to be the duplicate of $i_1$ with respect to some function $L_D$. Moreover, if we restrict our attention to single-instruction correct processors, then the fourth condition always holds as well. This fits the intended role of SQED, which is to find sequence-dependent bugs rather than single-instruction bugs.

Understanding when the remaining conditions 3 and 5 hold is more complicated. We must find some instruction sequence  $i^* = \langle i_2 \dots i_n \rangle$  that can transition  $\mathcal{P}$  from the state  $s_1$  following the execution of  $i_1$  to one of the bug-triggering states in  $S_b$ , i.e.,  $s_n$ . Often it is reasonable to assume that  $\mathcal{P}$  is strongly connected, i.e., that there always exists an instruction sequence that can transition from one reachable state to another. This is almost enough to ensure the existence of  $i^*$ . However, there are a few other restrictions on  $i^*$  to satisfy Definition 16.

First, $i^*$ must consist of only original instructions to satisfy the definition of a QED test. We are free to choose $L_D$ to be anything that works, so the main restriction is that $i^*$ cannot use any instructions referencing locations that are used by $i_b$, i.e., $l_{in1}^b$, $l_{in2}^b$, or $l_{out}^b$. Note that we defined $i_{n+1} = i_b$ to be the first duplicate instruction. This ends up being the most severe restriction on $i^*$ because it means that instructions in $i^*$ cannot write to the locations used as inputs by $i_b$. We discuss some mitigations to this restriction in Section VI-A.

Somewhat surprisingly, the three requirements in condition 5 are not very severe, as we now explain. For both type-A and type-B bugs, locations  $l_{orig}$  and  $l_{dup}$  are an original location and its duplicate, respectively, that will hold inconsistent values when the QED test i fails. For type-A bugs,  $l_{orig}$  holds the correct output value of  $i_1$  and  $l_{dup}$  holds the incorrect output value of  $i_b$ . For type-B bugs,  $l_{dup}$  holds the value of location  $l_{bad}$  that is incorrectly modified when  $i_b$  is executed in state  $s_n$ , and  $l_{orig}$  is the original location that corresponds to  $l_{dup} = l_{bad}$ .

The first requirement  $s_{n+1}(l_{dup}) = s_{2n}(l_{dup})$  means that the duplicate sequence  $Dup(i^*)$  of  $i^*$  in the QED test has to preserve the value of  $l_{dup}$  in  $s_{n+1}$  also in the final state  $s_{2n}$ . Further, since  $l_{orig} = L_D^{-1}(l_{dup})$ , this also imposes restrictions on the modifications that  $i^*$  can make to  $l_{orig}$ . However, as this is just one original location, it is unlikely that every possible  $i^*$  would need to modify it to get to some bug-triggering state  $s_n$ .

The second requirement is  $s_1(l_{orig}) = s_{2n}(l_{orig})$ . For similar reasons, it is unlikely that  ${\boldsymbol i}^*$  would need to modify  $l_{orig}$ , and the duplicate sequence  $Dup({\boldsymbol i}^*)$  of  ${\boldsymbol i}^*$  should not modify it either, since it is an original location and original locations should be left alone by duplicate instructions. Although the buggy instruction  $i_b$  might modify  $l_{orig}$  if it has more than one bug effect, we may be able to choose the locations of  $i_1$  and  $L_D$  differently to avoid this.

Finally, the last requirement of condition 5 depends on the two cases A and B. In both cases, we require that  $i^*$  does not modify certain duplicate locations: the input locations  $L_{in}(i_b)$  of  $i_b$  (A) and location  $l_{dup}$  that is incorrectly modified by  $i_b$  (B). Sequence  $i^*$  should not modify any duplicate locations as it is composed of original instructions. Note that we do not have to make the strong assumption that  $i^*$  executes according to its specification, only that it avoids corrupting a few key locations. Given that we have a lot of freedom in choosing  $L_D$  and hence the locations of  $i_1$ , these requirements are likely to be satisfiable if there are some degrees of freedom in choosing a path to one of the bug-triggering states.
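The type-B case of Definition 16 can be illustrated with a small Python sketch. The processor, its bug (an ADD issued directly after a MUL also zeroes location 7), the register layout, and all names are hypothetical and chosen only so that the three requirements of Case B are easy to check by hand.

```python
# Hypothetical toy model of a type-B bug; all names are illustrative assumptions.
N_ORIG = 4          # locations 0..3 original, 4..7 duplicate
L_BAD = 7           # location corrupted by the bug

OPS = {"ADD": lambda a, b: (a + b) % 256,
       "MUL": lambda a, b: (a * b) % 256}

def L_D(l):
    return l + N_ORIG

def dup(instr):
    op, out, (in1, in2) = instr
    return (op, L_D(out), (L_D(in1), L_D(in2)))

def step(state, instr):
    """state = (registers, last_opcode). The output is always written correctly,
    but an ADD directly after a MUL also overwrites L_BAD with 0."""
    regs, last_op = state
    op, out, (in1, in2) = instr
    nxt = list(regs)
    nxt[out] = OPS[op](regs[in1], regs[in2])
    if op == "ADD" and last_op == "MUL" and out != L_BAD:
        nxt[L_BAD] = 0
    return (tuple(nxt), op)

def trace(s0, instrs):
    states = [s0]
    for instr in instrs:
        states.append(step(states[-1], instr))
    return states

def qed_cons(state):
    regs, _ = state
    return all(regs[l] == regs[L_D(l)] for l in range(N_ORIG))

i_1 = ("ADD", 2, (0, 1))                 # Dup(i_1) is the buggy instruction i_b
i_star = [("MUL", 0, (0, 1))]            # leads to a bug-triggering state
originals = [i_1] + i_star
qed_test = originals + [dup(i) for i in originals]
n = len(originals)

s = trace(((5, 7, 0, 9) * 2, None), qed_test)
regs = [st[0] for st in s]
l_dup, l_orig = L_BAD, L_BAD - N_ORIG    # l_orig = L_D^{-1}(l_bad)

requirements = (
    regs[n + 1][l_dup] == regs[2 * n][l_dup],
    regs[1][l_orig] == regs[2 * n][l_orig],
    regs[1][l_dup] == regs[n][l_dup],
)
print(requirements, qed_cons(s[-1]))
```

In this run all three requirements hold and the final state is QED-inconsistent, because `l_orig` still holds its initial value while `l_dup` was overwritten by the bug.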

We now prove our conditional completeness property, namely that if a bug-specific QED test i exists, then i fails.

**Lemma 4.** Let  $\mathcal{P}$  be a processor with a bug  $\mathcal{B} = \langle i_b, S_b \rangle$  with respect to specification Spec, for which there exists a bug-specific QED test i. Then i fails.

*Proof.* Let  $\mathcal{B}=\langle i_b,S_b\rangle$  be a bug and i be a bug-specific QED test for  $\mathcal{B}$ . By Definition 16 we have  $i=\langle i_1,\ldots,i_n,i_{n+1},\ldots,i_{2n}\rangle$  and  $s=T(s_0,i)=\langle s_0,s_1,\ldots,s_n,s_{n+1},\ldots,s_{2n}\rangle$ , where  $s_n\in S_b$  and  $i_b=i_{n+1}$ , and  $QEDcons(s_0)$  holds. We show that  $\neg QEDcons(s_{2n})$  holds by showing that  $s_{2n}(l_{orig})\neq s_{2n}(l_{dup})$ . We distinguish the two cases A and B in Definition 16.

**Case A.** Since  $QEDcons(s_0)$  and  $Dup(i_1) = i_b$ , we have

$$s_0(L_{in}(i_1)) = s_0(L_{in}(i_b)) \tag{7}$$

From the third requirement of Case A in Definition 16, we have  $s_0(L_{in}(i_b)) = s_n(L_{in}(i_b))$ , so it follows that,

$$s_0(L_{in}(i_1)) = s_n(L_{in}(i_b)) \tag{8}$$

By (8) and since $op(i_1) = op(i_b)$, also

$$Spec_{op(i_1)}(s_0(L_{in}(i_1))) = Spec_{op(i_b)}(s_n(L_{in}(i_b))) \tag{9}$$

Since $Spec(s_0, i_1, s_1)$ by Definition 16, we have

$$s_1(L_{out}(i_1)) = Spec_{op(i_1)}(s_0(L_{in}(i_1))) \tag{10}$$

Since we are in Case A, we have from Definition 16 that $l_{orig} = L_{out}(i_1)$, and from the second requirement of Case A, we have $s_1(l_{orig}) = s_{2n}(l_{orig})$, so it follows that,

$$s_{2n}(l_{orig}) = Spec_{op(i_1)}(s_0(L_{in}(i_1))) \tag{11}$$

Since $i_b$ fails in state $s_n$, we have that,

$$s_{n+1}(L_{out}(i_b)) \neq Spec_{op(i_b)}(s_n(L_{in}(i_b))) \tag{12}$$

Again, from Case A in Definition 16, we have $l_{dup} = L_{out}(i_b)$, and from the first requirement of Case A, we have $s_{n+1}(l_{dup}) = s_{2n}(l_{dup})$, so it follows that,

$$s_{2n}(l_{dup}) \neq Spec_{op(i_b)}(s_n(L_{in}(i_b))) \tag{13}$$

Finally, (9) and (11) give us,

$$s_{2n}(l_{orig}) = Spec_{op(i_b)}(s_n(L_{in}(i_b))) \tag{14}$$

But then (13) and (14) imply  $s_{2n}(l_{orig}) \neq s_{2n}(l_{dup})$ , and hence  $\neg QEDcons(s_{2n})$ .
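As a compact restatement of Case A, the steps above combine into a single chain. The two outer equalities are the second and first requirements of Case A, the inner equalities are (10) and (9), and the disequality is (12):

$$\begin{aligned}
s_{2n}(l_{orig}) &= s_1(l_{orig}) = Spec_{op(i_1)}(s_0(L_{in}(i_1))) \\
&= Spec_{op(i_b)}(s_n(L_{in}(i_b))) \neq s_{n+1}(l_{dup}) = s_{2n}(l_{dup})
\end{aligned}$$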

**Case B.** See online appendix [16].
$$\Box$$
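Lemma 4 can be checked on a small example for a type-A bug. The sketch below is a hypothetical toy processor, not a model of any real design: an ADD issued directly after a MUL returns a result that is off by one. The register layout, `L_D(l) = l + 4`, and all names are assumptions made for illustration.

```python
# Hypothetical toy model of a type-A bug; all names are illustrative assumptions.
N_ORIG = 4  # locations 0..3 original, 4..7 duplicate

OPS = {"ADD": lambda a, b: (a + b) % 256,
       "MUL": lambda a, b: (a * b) % 256}

def L_D(l):
    return l + N_ORIG

def dup(instr):
    op, out, (in1, in2) = instr
    return (op, L_D(out), (L_D(in1), L_D(in2)))

def step(state, instr):
    """state = (registers, last_opcode); last_opcode plays the role of the
    non-architectural state. ADD directly after MUL is off by one."""
    regs, last_op = state
    op, out, (in1, in2) = instr
    value = OPS[op](regs[in1], regs[in2])
    if op == "ADD" and last_op == "MUL":
        value = (value + 1) % 256
    nxt = list(regs)
    nxt[out] = value
    return (tuple(nxt), op)

def trace(s0, instrs):
    states = [s0]
    for instr in instrs:
        states.append(step(states[-1], instr))
    return states

def qed_cons(state):
    regs, _ = state
    return all(regs[l] == regs[L_D(l)] for l in range(N_ORIG))

i_b = ("ADD", 6, (4, 5))           # buggy instruction on duplicate locations
i_1 = ("ADD", 2, (0, 1))           # original counterpart: dup(i_1) == i_b
i_star = [("MUL", 3, (0, 1))]      # reaches a state in which i_b misbehaves
originals = [i_1] + i_star
qed_test = originals + [dup(i) for i in originals]
n = len(originals)

s = trace(((5, 7, 0, 0) * 2, None), qed_test)
regs = [st[0] for st in s]
l_orig, l_dup = 2, 6               # output locations of i_1 and i_b

requirements = (
    regs[n + 1][l_dup] == regs[2 * n][l_dup],
    regs[1][l_orig] == regs[2 * n][l_orig],
    all(regs[0][l] == regs[n][l] for l in i_b[2]),
)
print(requirements, regs[-1][l_orig], regs[-1][l_dup])
```

Here `i_1` executes from the initial state and is correct, while its duplicate executes right after the MUL and is not, so `l_orig` and `l_dup` disagree in the final state and the test fails.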

**Theorem 1.**

- SQED is sound (Lemma 3).
- SQED is complete for bugs for which a bug-specific QED test exists (Lemma 4).

Theorem 1 is relevant for practical applications of SQED. Referring to the high-level workflow shown in Fig. 1a, BMC symbolically explores all possible QED tests up to bound n for a particular fixed mapping  $L_D$ . If a failing QED test i is found, then by the soundness of SQED, i corresponds to a bug in the processor. By completeness, if there exists a bug for which a bug-specific QED test i exists, then with a sufficiently large bound n, BMC will find a sequence i that will fail.
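The search described above can be mimicked on a toy model by explicit enumeration. The sketch below is only a stand-in for BMC, which explores the same space symbolically rather than test by test; the toy processor, its bug, the register layout, and the function `find_failing_qed_test` are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical toy model; explicit enumeration stands in for symbolic BMC.
N_ORIG = 4  # locations 0..3 original, 4..7 duplicate (fixed mapping L_D)
OPS = {"ADD": lambda a, b: (a + b) % 256,
       "MUL": lambda a, b: (a * b) % 256}

def L_D(l):
    return l + N_ORIG

def dup(instr):
    op, out, (in1, in2) = instr
    return (op, L_D(out), (L_D(in1), L_D(in2)))

def make_step(buggy):
    """Build a transition function; state = (registers, last_opcode)."""
    def step(state, instr):
        regs, last_op = state
        op, out, (in1, in2) = instr
        value = OPS[op](regs[in1], regs[in2])
        if buggy and op == "ADD" and last_op == "MUL":
            value = (value + 1) % 256   # sequence-dependent bug
        nxt = list(regs)
        nxt[out] = value
        return (tuple(nxt), op)
    return step

def qed_cons(state):
    regs, _ = state
    return all(regs[l] == regs[L_D(l)] for l in range(N_ORIG))

def find_failing_qed_test(step, s0, bound):
    """Return the first failing QED test with at most `bound` original
    instructions, or None if every such test succeeds."""
    locs = range(N_ORIG)
    instrs = [(op, out, (a, b))
              for op in OPS for out in locs for a in locs for b in locs]
    for n in range(1, bound + 1):
        for originals in product(instrs, repeat=n):
            test = list(originals) + [dup(i) for i in originals]
            state = s0
            for instr in test:
                state = step(state, instr)
            if not qed_cons(state):
                return test
    return None

s0 = ((5, 7, 0, 0) * 2, None)
cex = find_failing_qed_test(make_step(buggy=True), s0, bound=2)
none_found = find_failing_qed_test(make_step(buggy=False), s0, bound=2)
print(cex, none_found)
```

On the buggy variant the search returns a failing test with two original instructions, and on the bug-free variant it returns `None`, mirroring the completeness and soundness readings of Theorem 1 for this toy model.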

### A. Extensions

We now consider variants of QED tests that cover a larger class of bugs (i.e. bugs that cannot be detected by a bug-specific QED test). Ultimately, with hardware support we obtain a family of QED tests which, together with single-instruction correctness, results in a complete variant of SQED (Theorem 2).

The main limitation of bug-specific QED tests arises from the fact that QED tests consist of a sequence of original instructions followed by duplicate ones. This makes it impossible to set up a bug-specific QED test for an important class of forwarding-logic bugs (a simple refinement of our model can be used for the important case of pipelined systems). To see why, consider that a bug-triggering state $s_n \in S_b$ must be reached by executing a sequence of original instructions. The buggy instruction, which is a *duplicate*, is executed in state $s_n$ and would have to read a value from some *original* location written previously.

To resolve this limitation, first note that there is another way that SQED can find bugs, namely by finding QED tests for which the bug occurs during the original sequence, but not during the duplicate one. This kind of QED test is much more effective with a simple extension to allow no-operation instructions (a trick also employed in [11]). To formalize this, we first define a set  $\mathcal N$  of no-operation instructions (NOPs).

**Definition 17.** Let  $\mathcal{N}$  be the set of instructions such that, for every state  $(s_a, s_{\overline{a}})$ , if  $i_{nop} \in \mathcal{N}$ , then  $T((s_a, s_{\overline{a}}), i_{nop}) = (s_a, s'_{\overline{a}})$  for some  $s'_{\overline{a}} \in S_{\overline{a}}$ .

An instruction in  $\mathcal N$  may change the non-architectural part of a state, but not the architectural part.

**Definition 18.** An extended QED test is any sequence of instructions obtained from a standard QED test by inserting zero or more instructions from $\mathcal{N}$ anywhere in the sequence.

Extended QED tests enjoy the same properties as standard QED tests. In particular, an appropriately lifted version of Lemma 2 holds and the notions of failing and succeeding QED tests can be lifted to extended QED tests in the obvious way.

**Definition 19** (Bug-Hunting Extended QED Test). Let  $\mathcal{P}$  be a single-instruction correct processor with at least one bug. The instruction sequence i is a bug-hunting extended QED test with a bug-prefix of size k and initial state  $s_0$  for  $\mathcal{P}$  if the following conditions hold:

- 1) There is some bug  $\mathcal{B} = \langle i_b, S_b \rangle$  in  $\mathcal{P}$  such that  $T(s_0, \langle i_1, \dots, i_{k-1} \rangle) \in S_b$  and  $i_k = i_b$
- 2) i is an extended QED test
- 3)  $i_k$  is an original instruction, and  $i_{k+1} = Dup(i_1)$

Unlike a bug-specific QED test, a bug-hunting extended QED test is not guaranteed to fail. It starts with a bug-triggering sequence of length k, and then finishes with a modified duplicate sequence which may add (or subtract) NOPs from  $\mathcal{N}$ . The NOPs can be used to change the timing between any interdependent instructions, making it more likely that the duplicate sequence will produce a correct result, especially if the bug depends on forwarding-logic. One can show (omitted for lack of space) that for a general class of forwarding-logic bugs, there does always exist an extended QED test that fails.
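The effect of NOPs on timing can be seen in a toy forwarding bug. In the hypothetical processor sketched below, an instruction that reads a location written by the immediately preceding instruction sees the stale value; a NOP only clears this non-architectural bookkeeping. The processor, the `pending` field, and all names are illustrative assumptions.

```python
# Hypothetical toy model of a forwarding bug; all names are assumptions.
N_ORIG = 4  # locations 0..3 original, 4..7 duplicate
NOP = ("NOP",)

def L_D(l):
    return l + N_ORIG

def dup(instr):
    op, out, (in1, in2) = instr
    return (op, L_D(out), (L_D(in1), L_D(in2)))

def step(state, instr):
    """state = (registers, pending). pending = (location, old value) of the
    previous instruction's write, or None; it is non-architectural."""
    regs, pending = state
    if instr == NOP:
        return (regs, None)            # architectural part unchanged
    op, out, (in1, in2) = instr        # only ADD is modelled here

    def read(l):
        if pending is not None and pending[0] == l:
            return pending[1]          # bug: the fresh value is not forwarded
        return regs[l]

    nxt = list(regs)
    nxt[out] = (read(in1) + read(in2)) % 256
    return (tuple(nxt), (out, regs[out]))

def run(state, instrs):
    for instr in instrs:
        state = step(state, instr)
    return state

def qed_cons(state):
    regs, _ = state
    return all(regs[l] == regs[L_D(l)] for l in range(N_ORIG))

i_1 = ("ADD", 2, (0, 1))
i_2 = ("ADD", 3, (2, 1))               # reads location 2 right after it is written
standard = [i_1, i_2, dup(i_1), dup(i_2)]
extended = [i_1, i_2, dup(i_1), NOP, dup(i_2)]

s0 = ((5, 7, 0, 0) * 2, None)
print(qed_cons(run(s0, standard)), qed_cons(run(s0, extended)))
```

In the standard test the original and the duplicate are wrong in the same way, so the final state is QED-consistent and the bug is missed. With the NOP inserted, only the original sequence hits the bug and the extended test fails.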

Another QED test extension is to allow original and duplicate instructions to be *interleaved* [10], rather than requiring that all original instructions precede all duplicate instructions [8].<sup>3</sup> Again, it is straightforward to show that this extension preserves Lemma 2. Clearly, the set of bugs that can be found by adding interleaving is a strict superset of those that can be found without. In practice, implementations of SQED search for all possible extended QED tests with interleaving. Empirically, case studies have not turned up any (non-single-instruction) bugs that cannot be found with this combination. However, one can construct pathological systems with bugs that cannot be found by such QED tests. We address these cases next.

### B. Hardware Extensions

With hardware support, stronger guarantees can be achieved that lead to our final completeness result (Theorem 2). We first introduce a *soft-reset* instruction, which transitions the non-architectural part of a state to the initial non-architectural state  $s_{\overline{a},I}$  without changing the architectural part. Then we define a variant of bug-hunting extended QED tests where we insert soft-reset instructions in the sequence of duplicate instructions. This way, all duplicate instructions execute in an initial state and hence execute according to the specification for single-instruction correct processors. The resulting QED test always fails, in contrast to a bug-hunting extended QED test.

**Definition 20.**  $i_r$  is a soft-reset instruction for  $\mathcal{P}$  if for every state  $(s_a, s_{\overline{a}})$ ,  $T((s_a, s_{\overline{a}}), i_r) = (s_a, s_{\overline{a},I})$ .

It is easy to see that  $i_r \in \mathcal{N}$ .

**Definition 21** (Bug-Specific Soft-Reset QED Test). Let  $\mathcal{P}$  be single-instruction correct with at least one bug  $\mathcal{B} = \langle i_b, S_b \rangle$ . The instruction sequence  $i = \langle i_1, \ldots i_n \rangle$  is a bug-specific soft-reset QED test for  $\mathcal{P}$  if the following conditions hold:

- 1) i is a bug-hunting extended QED test for  $\mathcal{P}$  with a minimal bug-prefix of size  $k \geq 2$  and initial state  $s_0$
- 2) Let  $s = T(s_0, i)$ . Then,  $\forall l \in \mathcal{L}_D$ .  $s_{k-1}(l) = s_k(l)$ , i.e.,  $i_b = i_k$  does not corrupt any duplicate location
- 3) n = 3k
- 4) For each  $1 \le j \le k$ ,  $i_{k+2j-1} = i_r$

**Lemma 5.** If $\mathcal{P}$ is single-instruction correct and has a bug-specific soft-reset QED test i, then i fails.

*Proof.* See online appendix [16]. 
$$\Box$$
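The layout required by conditions 3 and 4 of Definition 21 can be sketched in Python. The sketch assumes one particular reading of the definition, namely that each soft reset is immediately followed by the duplicate of the corresponding prefix instruction; the toy forwarding-bug processor and all names are hypothetical.

```python
# Hypothetical toy model; the sequence layout is one reading of Definition 21.
N_ORIG = 4  # locations 0..3 original, 4..7 duplicate
SOFT_RESET = ("SOFT_RESET",)

def L_D(l):
    return l + N_ORIG

def dup(instr):
    op, out, (in1, in2) = instr
    return (op, L_D(out), (L_D(in1), L_D(in2)))

def step(state, instr):
    """state = (registers, pending); pending is the non-architectural part
    and None is its initial value."""
    regs, pending = state
    if instr == SOFT_RESET:
        return (regs, None)            # reset only the non-architectural part
    op, out, (in1, in2) = instr        # only ADD is modelled here

    def read(l):
        if pending is not None and pending[0] == l:
            return pending[1]          # forwarding bug: stale value
        return regs[l]

    nxt = list(regs)
    nxt[out] = (read(in1) + read(in2)) % 256
    return (tuple(nxt), (out, regs[out]))

def soft_reset_test(prefix):
    """k originals, then k pairs (soft reset, duplicate): n = 3k in total."""
    test = list(prefix)
    for instr in prefix:
        test += [SOFT_RESET, dup(instr)]
    return test

def trace(s0, instrs):
    states = [s0]
    for instr in instrs:
        states.append(step(states[-1], instr))
    return states

def qed_cons(state):
    regs, _ = state
    return all(regs[l] == regs[L_D(l)] for l in range(N_ORIG))

prefix = [("ADD", 2, (0, 1)), ("ADD", 3, (2, 1))]   # second ADD triggers the bug
k = len(prefix)
test = soft_reset_test(prefix)
s = trace(((5, 7, 0, 0) * 2, None), test)

# Condition 4, shifted to 0-based indexing: i_{k+2j-1} = i_r for 1 <= j <= k.
resets_ok = all(test[k + 2 * j - 2] == SOFT_RESET for j in range(1, k + 1))
# Condition 2: the buggy instruction i_k leaves all duplicate locations intact.
dups_intact = s[k - 1][0][N_ORIG:] == s[k][0][N_ORIG:]
print(len(test) == 3 * k, resets_ok, dups_intact, qed_cons(s[-1]))
```

Because every duplicate instruction starts from the initial non-architectural state, the duplicate sequence computes the specified values, and the final state is QED-inconsistent in this run.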

There are still a few (pathological) ways in which a bug may be missed by searching for all possible soft-reset QED tests. First, there may be no triggering sequence starting from any QED-consistent state. Second, it could be that the triggering sequence for a bug requires using more than half of all the locations, making it impossible to divide the locations among original and duplicate instructions. Finally, it could be that the bug always corrupts duplicate locations for every possible candidate sequence. These can all be remedied by adding *hard reset* instructions, which reset  $\mathcal P$  to a specific initial state.

**Definition 22.** The set  $\{i_{R,s_I}|s_I \in S_I\}$  is a family of hard reset instructions for  $\mathcal{P}$  if for every state s,  $T(s,i_{R,s_I}) = s_I$ .

**Definition 23.** Let  $\mathcal{P}$  be a processor. Then  $i = \langle i_1 \dots i_{2k+2} \rangle$  is a bug-specific hard-reset QED test with bug-prefix size k and initial state  $s_I$  for  $\mathcal{P}$  if the following conditions hold:

- 1) $k \ge 2$
- 2) $\langle i_1 \dots i_k \rangle$ reach and trigger a bug $\mathcal{B} = \langle i_b, S_b \rangle$ in $\mathcal{P}$ starting from $s_I$, where $i_k = i_b$
- 3) $i_{k+1}=i_{R,s_I}$
- 4) $\langle i_{k+2}\dots i_{2k}\rangle=\langle i_1\dots i_{k-1}\rangle$
- 5) $i_{2k+1} = i_r$
- 6) $i_{2k+2} = i_k$

<sup>3</sup> The bug in Example 3 can be detected by executing the QED test $\boldsymbol{i}=i_{O,1},i_{D,1}::i_{O,2},i_{D,2}$, which interleaves original and duplicate instructions. The subsequence $i_{O,2},i_{D,2}$ of two back-to-back MULs causes $i_{D,2}$ to produce an incorrect result at its output location $l_{31}$. The final state is QED-inconsistent since the output location $l_{15}$ of $i_{O,2}$ holds the correct value, while $l_{31}$ holds an incorrect one.

Notice that there is no notion of duplication for a hard-reset QED test. Instead, the exact same sequence is executed twice except that there is a hard reset in between and a soft reset right before the last instruction. Hard-reset QED tests also use a slightly different notion of success and failure.

**Definition 24.** Let i be a bug-specific hard-reset QED test with bug-prefix size k and initial state  $s_I$ , and let  $s = T(s_I, i)$ .

- *i* succeeds if  $s_k(l) = s_{2k+2}(l)$  for every location  $l \in \mathcal{L}$ .
- i fails if  $s_k(l) \neq s_{2k+2}(l)$  for some location  $l \in \mathcal{L}$ .

The combination of single-instruction correctness checking and exhaustive search for hard-reset QED tests is complete.

**Theorem 2.** If $\mathcal{P}$ is single-instruction correct and has no failing bug-specific hard-reset QED tests, then it is correct.
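The shape of a hard-reset QED test (Definition 23) and its success criterion (Definition 24) can be sketched as follows. The toy forwarding-bug processor, the encoding of reset instructions as tuples, and all function names are hypothetical; note that no duplicate locations are needed here.

```python
# Hypothetical toy model; all names and encodings are illustrative assumptions.
SOFT_RESET = ("SOFT_RESET",)

def hard_reset(s_init):
    """Encode the hard-reset instruction i_{R,s_I} for the initial state s_init."""
    return ("HARD_RESET", s_init)

def step(state, instr):
    """state = (registers, pending); pending is non-architectural."""
    regs, pending = state
    if instr[0] == "HARD_RESET":
        return instr[1]                # jump to the given initial state
    if instr == SOFT_RESET:
        return (regs, None)            # reset only the non-architectural part
    op, out, (in1, in2) = instr        # only ADD is modelled here

    def read(l):
        if pending is not None and pending[0] == l:
            return pending[1]          # forwarding bug: stale value
        return regs[l]

    nxt = list(regs)
    nxt[out] = (read(in1) + read(in2)) % 256
    return (tuple(nxt), (out, regs[out]))

def hard_reset_test(prefix, s_init):
    """i_1..i_k, hard reset, i_1..i_{k-1}, soft reset, i_k (length 2k+2)."""
    *head, last = prefix
    return list(prefix) + [hard_reset(s_init)] + head + [SOFT_RESET, last]

def trace(s0, instrs):
    states = [s0]
    for instr in instrs:
        states.append(step(states[-1], instr))
    return states

def fails(prefix, s_init):
    """Definition 24: compare s_k and s_{2k+2} on all (architectural) locations."""
    k = len(prefix)
    s = trace(s_init, hard_reset_test(prefix, s_init))
    return s[k][0] != s[2 * k + 2][0]

s_I = ((5, 7, 0, 0), None)
buggy_prefix = [("ADD", 2, (0, 1)), ("ADD", 3, (2, 1))]    # triggers the bug
benign_prefix = [("ADD", 2, (0, 1)), ("ADD", 3, (0, 1))]   # no dependency
print(fails(buggy_prefix, s_I), fails(benign_prefix, s_I))
```

The soft reset before the final instruction makes that instruction execute from an initial non-architectural state, so in the second pass it produces the specified result and the comparison exposes the bug. The benign prefix does not trigger the bug and is included only to show that the same comparison succeeds when both passes agree.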

### VII. RELATED WORK

Assertion-based formal verification techniques using theorem proving or (bounded) model checking, e.g., [1], [17]–[19], require implementation-specific, manually-written properties. In contrast to that, symbolic quick error detection (SQED) [7]–[10] is based on a universal self-consistency property.

In an early application of self-consistency checking for processor verification without a specification [11], given instruction sequences are transformed by, e.g., inserting NOPs. The original and the modified instruction sequence are expected to produce the same result. As a formal foundation, this approach relies on formulating and explicitly computing an equivalence relation over states, which is not needed with SQED.

SQED originates from quick error detection (QED), a post-silicon validation technique [20]–[22]. QED is highly effective in reducing the length of existing bug traces (i.e., instruction sequences) in post-silicon debugging of processor cores. To this end, existing bug traces are systematically transformed into QED tests by techniques that (among others) include instruction duplication [23]. SQED exhaustively searches for minimal-length QED tests using BMC for pre-silicon verification. It is also applicable to post-silicon validation. SQED was extended to operate with symbolic initial states [12], [24] to overcome the potential limitations of BMC when unrolling the transition relation of a design starting in a concrete initial state.

SQED employs the principle of self-consistency based on a mathematical interpretation of instructions as functions. That principle is also applied by accelerator quick error detection (A-QED) [25], a formal pre-silicon verification technique for HW accelerator designs. A-QED checks the functions implemented by an accelerator for functional consistency and, like SQED, does not require a formal specification.

Unique program execution checking [26] relies on a particular variant of self-consistency to check security vulnerabilities of processor designs for covert channel attacks. In the context of security, self-consistency is also applied to verify secure information flow by self-composition of programs [27]–[30].

Several approaches, including both formal and simulation-based approaches, exist for checking single-instruction (SI) correctness, cf. [9], [24], [31]. Checking SI correctness is complementary to checking self-consistency using SQED and is also much more tractable. In a formal approach, a property corresponding to $Spec_{op}$ (based on the ISA) is written for each opcode $op \in Op$, and the model checker is used to ensure that the property holds when starting from any initial state. Because the approach is restricted to initial states and only a single instruction execution, it is much simpler to specify and check than would be a property specifying the full correctness of $\mathcal{P}$. Efficient specialized approaches exist for checking multiplier units [32]–[35], which is computationally hard.

### VIII. CONCLUSION AND FUTURE WORK

We laid a formal foundation for symbolic quick error detection (SQED) and presented a theoretical framework to reason about its bug-finding capabilities. In our framework, we proved soundness as well as (conditional) completeness, thereby closing a gap in the theoretical understanding of SQED. Soundness implies that SQED does not produce spurious counterexamples, i.e., any counterexample to QED-consistency reported by SQED corresponds to an actual bug in the design. For completeness, we characterized a large class of bugs that can be detected by failing QED tests under modest assumptions about these bugs. We also identified several QED test extensions based on executing no-operation and reset instructions. For these extensions, we proved even stronger completeness guarantees, ultimately leading to a variant of SQED that, together with single-instruction correctness, is complete.

As future work, it would be valuable to extend our framework to consider variants of SQED that operate with more fully symbolic initial states [12], [24]. The challenge will be to identify how this can be done while guaranteeing no spurious counterexamples. For practical applications, our theoretical results provide valuable insights. For example, in present implementations of SQED [9], [10], the flexibility to partition register/memory locations into sets of original and duplicate locations and to select the bijective mapping between these two sets has not yet been explored. Similarly, it is promising to combine standard QED tests and the specialized extensions we presented in a uniform practical tool framework. Features like soft/hard reset instructions could either be implemented in HW in a design-for-verification approach or in software inside a model checker. In another research direction, we plan to extend our framework to model the detection of deadlocks using SQED, cf. [7], and prove related theoretical guarantees.

Acknowledgments. We thank Karthik Ganesan and John Tigar Humphries for helpful initial discussions and the anonymous reviewers for their feedback.

### REFERENCES

- [1] A. Biere, A. Cimatti, E. M. Clarke, and Y. Zhu, "Symbolic Model Checking without BDDs," in *Proc. TACAS*, ser. LNCS, vol. 1579. Springer, 1999, pp. 193–207.
- [2] S. Katz, O. Grumberg, and D. Geist, "'Have I written enough Properties?' A Method of Comparison between Specification and Implementation," in *Proc. CHARME*, ser. LNCS, vol. 1703. Springer, 1999, pp. 280–297.
- [3] H. Chockler, O. Kupferman, and M. Y. Vardi, "Coverage Metrics for Temporal Logic Model Checking," in *Proc. TACAS*, ser. LNCS, vol. 2031. Springer, 2001, pp. 528–542.
- [4] K. Claessen, "A Coverage Analysis for Safety Property Lists," in Proc. FMCAD. IEEE, 2007, pp. 139–145.
- [5] D. Große, U. Kühne, and R. Drechsler, "Estimating functional coverage in bounded model checking," in *Proc. DATE*. EDA Consortium, San Jose, CA, USA, 2007, pp. 1176–1181.
- [6] H. Chockler, D. Kroening, and M. Purandare, "Coverage in interpolation-based model checking," in *Proc. DAC*. ACM, 2010, pp. 182–187.
- [7] D. Lin, E. Singh, C. Barrett, and S. Mitra, "A structured approach to post-silicon validation and debug using symbolic quick error detection," in *Proc. ITC*. IEEE, 2015, pp. 1–10.
- [8] E. Singh, D. Lin, C. Barrett, and S. Mitra, "Logic bug detection and localization using symbolic quick error detection," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, pp. 1–1, 2018.
- [9] E. Singh, K. Devarajegowda, S. Simon, R. Schnieder, K. Ganesan, M. R. Fadiheh, D. Stoffel, W. Kunz, C. W. Barrett, W. Ecker, and S. Mitra, "Symbolic QED Pre-Silicon Verification for Automotive Microcontroller Cores: Industrial Case Study," in *Proc. DATE*. IEEE, 2019, pp. 1000–1005.
- [10] F. Lonsing, K. Ganesan, M. Mann, S. S. Nuthakki, E. Singh, M. Srouji, Y. Yang, S. Mitra, and C. W. Barrett, "Unlocking the Power of Formal Hardware Verification with CoSA and Symbolic QED: Invited Paper," in *Proc. ICCAD*. ACM, 2019, pp. 1–8.
- [11] R. B. Jones, C. H. Seger, and D. L. Dill, "Self-Consistency Checking," in *Proc. FMCAD*, ser. LNCS, vol. 1166. Springer, 1996, pp. 159–171.
- [12] M. R. Fadiheh, J. Urdahl, S. S. Nuthakki, S. Mitra, C. Barrett, D. Stoffel, and W. Kunz, "Symbolic quick error detection using symbolic initial state for pre-silicon verification," in *Proc. DATE*. IEEE, 2018, pp. 55–60.
- [13] R. M. Keller, "A Fundamental Theorem of Asynchronous Parallel Computation," in *Parallel Processing, Proc. Sagamore Computer Conference*, ser. LNCS, vol. 24. Springer, 1974, pp. 102–112.
- [14] R. M. Keller, "Formal Verification of Parallel Programs," *Commun. ACM*, vol. 19, no. 7, pp. 371–384, 1976.
- [15] B. Huang, H. Zhang, P. Subramanyan, Y. Vizel, A. Gupta, and S. Malik, "Instruction-Level Abstraction (ILA): A Uniform Specification for System-on-Chip (SoC) Verification," *ACM Trans. Design Autom. Electr. Syst.*, vol. 24, no. 1, pp. 10:1–10:24, 2019.
- [16] F. Lonsing, S. Mitra, and C. W. Barrett, "A Theoretical Framework for Symbolic Quick Error Detection," *CoRR*, vol. abs/2006.05449, 2020, FMCAD 2020 proceedings version with appendix. [Online]. Available: https://arxiv.org/abs/2006.05449
- [17] W. A. Hunt Jr., "Microprocessor design verification," J. Autom. Reasoning, vol. 5, no. 4, pp. 429–460, 1989.
- [18] J. R. Burch and D. L. Dill, "Automatic Verification of Pipelined Microprocessor Control," in *Proc. CAV*, ser. LNCS, vol. 818. Springer, 1994, pp. 68–80.

- [19] A. Biere, E. M. Clarke, R. Raimi, and Y. Zhu, "Verifying Safety Properties of a Power PC Microprocessor Using Symbolic Model Checking without BDDs," in *Proc. CAV*, ser. LNCS, vol. 1633. Springer, 1999, pp. 60–71.
- [20] T. Hong, Y. Li, S. Park, D. Mui, D. Lin, Z. A. Kaleq, N. Hakim, H. Naeimi, D. S. Gardner, and S. Mitra, "QED: Quick Error Detection tests for effective post-silicon validation," in *Proc. ITC*. IEEE, 2010, pp. 154–163.
- [21] D. Lin, T. Hong, Y. Li, F. Fallah, D. S. Gardner, N. Hakim, and S. Mitra, "Overcoming post-silicon validation challenges through quick error detection (QED)," in *Proc. DATE*. EDA Consortium San Jose, CA, USA / ACM DL, 2013, pp. 320–325.
- [22] D. Lin, T. Hong, Y. Li, E. S, S. Kumar, F. Fallah, N. Hakim, D. S. Gardner, and S. Mitra, "Effective Post-Silicon Validation of System-on-Chips Using Quick Error Detection," *IEEE Trans. on CAD of Integrated Circuits and Systems*, vol. 33, no. 10, pp. 1573–1590, 2014.
- [23] N. Oh, P. P. Shirvani, and E. J. McCluskey, "Error detection by duplicated instructions in super-scalar processors," *IEEE Trans. Reliability*, vol. 51, no. 1, pp. 63–75, 2002.
- [24] K. Devarajegowda, M. R. Fadiheh, E. Singh, C. Barrett, S. Mitra, W. Ecker, D. Stoffel, and W. Kunz, "Gap-free Processor Verification by S<sup>2</sup>QED and Property Generation," in *Proc. DATE*. IEEE, 2020.
- [25] E. Singh, F. Lonsing, S. Chattopadhyay, M. Strange, P. Wei, X. Zhang, Y. Zhou, D. Chen, J. Cong, P. Raina, Z. Zhang, C. Barrett, and S. Mitra, "A-QED Verification of Hardware Accelerators," in *Proc. DAC*, to appear. ACM, 2020.
- [26] M. R. Fadiheh, D. Stoffel, C. W. Barrett, S. Mitra, and W. Kunz, "Processor Hardware Security Vulnerabilities and their Detection by Unique Program Execution Checking," in *Proc. DATE*. IEEE, 2019, pp. 994–999.
- [27] G. Barthe, P. R. D'Argenio, and T. Rezk, "Secure Information Flow by Self-Composition," in *Proc. CSFW-17*. IEEE, 2004, pp. 100–114.
- [28] G. Barthe, J. M. Crespo, and C. Kunz, "Relational Verification Using Product Programs," in *Proc. FM*, ser. LNCS, vol. 6664. Springer, 2011, pp. 200–214.
- [29] J. B. Almeida, M. Barbosa, G. Barthe, F. Dupressoir, and M. Emmi, "Verifying Constant-Time Implementations," in *Proc. USENIX*. USENIX Association, 2016, pp. 53–70.
- [30] W. Yang, Y. Vizel, P. Subramanyan, A. Gupta, and S. Malik, "Lazy Self-composition for Security Verification," in *Proc. CAV*, ser. LNCS, vol. 10982. Springer, 2018, pp. 136–156.
- [31] A. Reid, R. Chen, A. Deligiannis, D. Gilday, D. Hoyes, W. Keen, A. Pathirane, O. Shepherd, P. Vrabel, and A. Zaidi, "End-to-End Verification of Processors with ISA-Formal," in *Proc. CAV*, ser. LNCS, vol. 9780. Springer, 2016, pp. 42–58.
- [32] U. Krautz, M. Wedler, W. Kunz, K. Weber, C. Jacobi, and M. Pflanz, "Verifying full-custom multipliers by Boolean equivalence checking and an arithmetic bit level proof," in ASP-DAC. IEEE, 2008, pp. 398–403.
- [33] A. A. R. Sayed-Ahmed, D. Große, U. Kühne, M. Soeken, and R. Drechsler, "Formal verification of integer multipliers by combining Gröbner basis with logic reduction," in *Proc. DATE*, 2016, pp. 1048–1053.
- [34] D. Ritirc, A. Biere, and M. Kauers, "Column-wise verification of multipliers using computer algebra," in *Proc. FMCAD*, 2017, pp. 23–30.
- [35] D. Kaufmann, A. Biere, and M. Kauers, "Verifying Large Multipliers by Combining SAT and Computer Algebra," in *Proc. FMCAD*. IEEE, 2019, pp. 28–36.