# Modular Verification of SPARCv8 Code\*

Junpeng Zha<sup>1</sup>, Xinyu Feng<sup>2, ⋈</sup>, and Lei Qiao<sup>3</sup>

 $^1$  Nanjing University  $^2$  State Key Laboratory for Novel Software Technology, Nanjing University  $\verb|xyfeng@nju.edu.cn||^3 Beijing Institute of Control Engineering$ 

Abstract. Inline assembly code is common in system software to interact with the underlying hardware platforms. Safety and correctness of the assembly code is crucial to guarantee the safety of the whole system. In this paper we propose a practical Hoare-style program logic for verifying SPARC assembly code. The logic supports modular reasoning about the main features of SPARCv8 ISA, including delayed control transfers, delayed writes to special registers, and register windows. We have applied it to verify the main body of a context switch routine in a realistic embedded OS kernel, and extended it to support refinement verification. All of the formalization and proofs have been mechanized in Coq.

### 1 Introduction

Operating system kernels are at the most foundational layer of computer software systems. To interact directly with hardware, many important components in OS kernels are implemented in assembly, such as the context switch code or the code that manages interrupts. And some other codes that are not required to be written in assembly are also implemented in assembly (e.g.memcpy in linux v2.6.17.10 [2]), in order to achieve high performance. Their correctness is crucial to ensure the safety and security of the whole system. However, assembly code verification remains a challenging task in existing work on OS kernel verification (e.g. [29, 15, 12]), where the assembly code is either unverified or verified based on operational semantics without a general program logic.

SPARC (Scalable Processor ARChitecture) is a CPU instruction set architecture (ISA) with high-performance and great flexibility [4]. It has been widely used in various processors for workstations and embedded systems. The SPARCv8 ISA has some interesting features, which make it a non-trivial task to design a Hoare-style program logic for assembly code.

— Delayed control transfers. SPARCv8 has two program counters pc and npc. The npc register points to the next instruction to run. Control-transfer instructions in SPARCv8 change npc instead of pc to the target program point, while pc takes the original value of npc. This makes the control transfer to happen one cycle later than the execution of the control transfer instructions.

<sup>\*</sup> This work is supported in part by grants from National Natural Science Foundation of China (NSFC) under Grant Nos. 61632005, 61502442 and 61502031.

```
CALLER:
                                  ChangeY:
                                        rd Y, %10
          1, \% o_0
                                  6
                                            \%i<sub>0</sub>, 0, Y
1
    mov
2
    call ChangeY
                                  7
3
           \%sp, -64, \%sp
                                  8
    save
                                       nop
          %o_0, %1_0
                                  9
    mov
                                       nop
                                  10
                                       ret
                                  11
                                       restore
                                                  %1_0, 0, %0_0
```

Fig. 1. An Example for SPARC Code

- Delayed writes. The wr instruction that writes a special class of registers does not take effect immediately. Instead the write operation is buffered and then executed X cycles later, where X is a predefined system parameter which usually ranges from 0 to 3.
- Register windows. SPARCv8 uses register windows and the window rotation mechanism to avoid saving contexts in the stack directly and achieves high performance in context management.

We use a simple example in Fig. 1 to show these three features. The function CALLER calls ChangeY, which updates the special register Y and returns its original value.

ChangeY requires an input parameter as the new value for the special register Y. CALLER calls ChangeY at line 2, and pc and npc point to line 2 and 3 respectively at this moment. The call instruction changes the value of pc to npc and let npc points to ChangeY at line 5, which means the control-flow will not transfer to ChangeY in the next cycle, but in the cycle after the execution of the save instruction following the call. Similarly, when ChangeY returns (at line 10), the control is transferred back to the caller after executing the restore instruction at line 11. We call this feature "delayed control transfers".

SPARCy8 uses the save instruction (at line 3 in the example) to save the current context and restore (at line 10) to restore it. Its 32 general registers are split into four logic groups as global  $(r_0 \sim r_7)$ , out  $(r_8 \sim r_{15})$ , local  $(r_{16} \sim r_{23})$ and in  $(r_{24} \sim r_{31})$  registers. Correspondingly, we give aliases " $\%g_0 \sim \%g_7$ ", "% $o_0 \sim \%o_7$ ", "% $l_0 \sim \%l_7$ " and "% $i_0 \sim \%i_7$ " for these groups respectively. The out, local and in registers form the current register window. The local registers are for private use in the current context. The in and out registers are shared with adjacent register windows for parameters passing. The save instruction rotates the register window from the current one to the next. Then the local and in registers in the original window are no longer accessible, and the original out registers becomes the in registers in the current window. The restore instruction does the inverse. The arguments taken by the save in this example tells us that, in addition to operate the register window, the execution of this instruction will also assign the value of (%sp -64), where the value of the register %sp is taken from the original window, to the %sp register in the new register window. The %sp register, which is an alias of  $r_{14}$ , acts as the stack pointer.

So, "save %sp, -64, %sp" in this example means that we allocate a new stack frame, whose size is 64 byte, to the callee (ChangY). The arguments taken by the restore are irrelevant here and can be ignored here.

At line 6, the wr instruction tries to update the special register Y with the value of  $\%i_0 \oplus 0$  (bitwise exclusive OR). However, the write is delayed for X cycles, where X is some predefined system parameter that ranges from 0 to 3. For portability, programmers usually do not rely on the exact value of X and assume it takes the maximum value 3. Therefore three nop instructions are inserted. Reading of Y earlier than line 9 may give us the old value. This feature is called "delayed writes".

These features make the semantics of the SPARCv8 code context-dependent. For instance, a read of a special register (e.g. the register Y in the above example) needs to make sure there are enough instructions executed since the most recent delayed write. As another example, the instruction following the call can be any instruction in general, but it is not supposed to update the register  $\mathbf{r}_{15}$ , which contains the return address saved by the call instruction. In addition, the delayed control transfer and the register windows also allow highly flexible calling conventions. Together, they make it a challenging task to have a Hoarestyle program logic for local and modular reasoning of SPARCv8 assembly code.

Working towards a fully certified OS kernel for aerospace crafts whose inline assembly is written in SPARCv8, we try to address these challenges and propose a practical program logic for realistically modelled SPARCv8 code. We have applied our logic to verify the main body of the task context switch routine in the kernel. Then, considering some works, e.g.[29, 10], define a set of abstract assembly primitives to substitute their concrete implementations for some convenience in program verification. We extend our program logic to support refinement verification to bridge the gap. The extended program logic ensures the contextual refinement relation [18, 29] between the implementation of SPARCv8 function and its corresponding abstract assembly primitive. The contextual refinement can be represented as the following form, which means that calling a SPARCv8 function  $C_{\rm as}$  will not produce more observable behaviors than calling its corresponding abstract assembly primitive  $\Upsilon$  under any context C:

$$\forall C. C[C_{as}] \subseteq C[\Upsilon]$$

Our work is based on earlier work on assembly code verification but makes the following contributions:

- Our logic supports all the above features of SPARCv8. We redefine basic blocks to include the instruction following the jump or return as the tail of a block, which models the delayed control transfer. To reason about delayed writes, we introduce a modal assertion  $\triangleright_t \mathtt{sr} \mapsto w$ , saying that the special register  $\mathtt{sr}$  will hold the value w in up to t cycles. We also give logic rules for  $\mathtt{save}$  and  $\mathtt{restore}$  instructions that do register window rotation.
- Following SCAP [11], our logic supports modular reasoning of function calls in a direct-style. We use the standard pre- and post-conditions as function

- specifications, instead of the binary assertion g used in SCAP. This allows us to reuse existing techniques (e.g. Coq tactics) to simplify the program verification process. The logic rules for function call and return is general and independent of any specific calling convention.
- We give direct-style semantic interpretation for the logic judgments, based on which we establish the soundness. This is different from previous work, which either does syntactic-based soundness proof (e.g. SCAP [11]) or treats return code pointers as first-class code pointers and gives CPS-style (continuation-passing style) semantics. Those approaches for soundness make it difficult to verify the interaction between the inline assembly and the C code in the kernel, the latter being verified following a direct-style program logic.
- Context switch of concurrent tasks is an important component in OS kernels. It is usually implemented as inline assembly because of the need to access registers and the stack. We verify the main body of the context switch routine in a realistic embedded OS kernel for aerospace crafts, which consists of around 250 lines of SPARCv8 code.
- In order to support refinement verification, we define a Pseudo-SPARCv8 program as our high-level program, which is multithread and can call abstract assembly primitives. It also abstracts or omits some sophisticated mechanism, like register window and delayed writes, in SPARCv8, and makes the program state of Pseudo-SPARCv8 program more simpler than the physical SPARCv8 program defined in Sec. 2. So, it provides us more convenience to write the abstract assembly primitive in high-level program.
- We extend the program logic defined before to support refinement verification. The extended logic can ensure the contextual refinement between of SPARCv8 functions and their corresponding abstract assembly primitives. Thanks for our direct-style semantics interpretation for the logic judgments proposed before, which allows us to establish the correctness of each assembly function separatly, the extension does not meet much challenges.

The program logic, its soundness proof, the verification of the context switch module, and the extended program logic for refinement verification have been mechanized in Coq[3].

In the rest of paper, we present the program model and operational semantics of SPARCv8 in Sec. 2. Then we propose the program logic in Sec. 3, including the inference rules and the soundness proof, and show how our logic supprts the three main features of SPARCv8. We show the verification of the main body of the context switch routine in Sec. 4. The extended program logic to support refinement verification is presented in Sec. 5, and the high-level Pseudo-SPARCv8 program is introduced in Sec. 5.1. Finally we discuss more on related work and conclude in Sec. 6.

### 2 The SPARCv8 Assembly Language

We introduce the key SPARCv8 instructions, the model of machine states, and the operational semantics in this section.

```
(Block)
                                                                           \in \mathbb{Z}
   (Word) w, f
                    \in Int32
   (Addr)
                     \in
                        Block \times Word
                                                         (Val)
                                                                          := w \mid l
                    ::= (C, S, pc, npc)
                                                                    C
                                                                           ∈ Word → Comm
    (Prog)
                                                 (CodeHeap)
   (State)
              S
                    ::= (M, Q, D)
                                                     (RState)
                                                                    Q
                                                                          ::= (R, F)
(Memory)
              M
                    \in Addr 
ightharpoonup Val
                                                (ProgCount) pc, npc ∈ Word
 (OpExp)
                    ::= r \mid w
                                                  (AddrExp)
                                                                         := o | r + o
                                                                    а
                    ::= i \mid call f \mid jmp a \mid retl \mid be f
 (Comm)
                    ::= ld a r_d | st r_s a | nop | save r_s o r_d | restore r_s o r_d
(SimpIns)
              i
                      \mid add r_s o r_d \mid rd sr r_d \mid wr r_s o sr \mid ...
(InstrSeq)
              \mathbb{I}
                   := i; I \mid jmp \ a; i \mid call \ f; i; I \mid retl; i \mid be \ f; i; I
```

Fig. 2. Machine States and Language for SPARCv8 Code

### 2.1 Language syntax and states

The machine model and syntax of SPARCv8 assembly language are defined in Fig. 2. Here, we follow the block-based memory [17] used in CompCert, so that each procedure can have a block of its own stack frame. The memory address lis defined as a pair of its block id and the offset. The type of block is the integer in mathematics represent as  $\mathbb{Z}$ , and the type of offset is a 32-bit integer, which we called word in our work. So, the value here is either a word w or address l. The whole program configuration P consists of the code heap C, the machine state S, and the program counters pc and npc. The code heap C is a partial function from labels f to commands c. Labels are 32-bit integers (called words), which can be viewed as addresses or locations where the commands are saved in code heap. The operand expression o, which is either a general register r or an immediate value w, and address expression a, which is either a operand expression or a sum of the value of register r and an operation expression, are auxiliary definitions used as paramters of commands. Note, the immediate value is a word, and not allowed to be an address in our work. Commands in SPARCv8 can be classified into two categories, the simple instructions i and the controltransfer instructions like call and jmp.

The machine state S consists of three parts: the memory M, the register state Q which is a pair of register file R and frame list F, and the delay buffer D. As defined in Fig. 3, R is a partial mapping from register names to words. Registers include the general registers  $\mathbf{r}$ , the processor state register  $\mathbf{psr}$  and the special registers  $\mathbf{sr}$ . The processor state register  $\mathbf{psr}$  contains the integer condition code fields  $\mathbf{n}$ ,  $\mathbf{z}$ ,  $\mathbf{v}$  and  $\mathbf{c}$ , which can be modified by the arithmetic and logical instructions and used for conditional control-transfer, and  $\mathbf{cwp}$  recording the id of the current register window. We explain the frame list F and the delay buffer D below.

**Register windows and frame List.** SPARCv8 provides 32 general registers, which are split into four groups as global  $(\mathbf{r}_0 \sim \mathbf{r}_7)$ , out  $(\mathbf{r}_8 \sim \mathbf{r}_{15})$ , local $(\mathbf{r}_{16} \sim \mathbf{r}_{23})$  and in  $(\mathbf{r}_{24} \sim \mathbf{r}_{31})$  registers. The latter three groups (out, local and in) form the current register window.

Fig. 3. Register File, Frame List and DelayBuffer



Fig. 4. Register Windows (figure taken from [4])

At the entry and exit of functions and traps, one may need to save and restore some of the general registers as execution contexts. Instead of saving them into stacks in memory, SPARCv8 uses multiple register windows to form a circular stack, and does window rotation for efficient context save and restore. As shown in Fig. 4, there are N register windows (N=8 here) consisting of  $2\times N$  groups of registers (each group containing 8 registers). The cwp register (part of psr) records the id number of the current window (cwp = 0 in this example).

The in and out registers of each window are shared with its adjacent windows for parameter passing. For example, the in registers of the  $w_0$  is the out registers of the  $w_1$ , and the out registers of the  $w_0$  is the in registers of the  $w_7$ . This explains why we need only  $2 \times N$  groups of registers for N windows, while each window consisting of three groups (out, local and in).

To save the context, the save instruction rotates the window by decrements the cwp pointer (modulo N). So  $w_7$  becomes the current window. The out registers of  $w_0$  becomes the in registers of  $w_0$  become

```
\begin{aligned} & \text{out} \ ::= [\mathbf{r}_{8}, \dots, \mathbf{r}_{15}] & \text{local} \ ::= [\mathbf{r}_{16}, \dots, \mathbf{r}_{23}] & \text{in} \ ::= [\mathbf{r}_{24}, \dots, \mathbf{r}_{31}] \\ & R([\mathbf{r}_{i}, \dots, \mathbf{r}_{i+k}]) \ ::= [R(\mathbf{r}_{i}), \dots, R(\mathbf{r}_{i+k})] \\ & R\{[\mathbf{r}_{i}, \dots, \mathbf{r}_{i+7}] \leadsto \mathbf{fm}\} \ ::= R\{\mathbf{r}_{i} \leadsto v_{0}\} \dots \{\mathbf{r}_{i+7} \leadsto v_{7}\} \\ & \text{where} \quad \mathbf{fm} = [v_{0}, \dots, v_{7}] \end{aligned}
& \mathbf{win\_valid}(w_{id}, R) ::= 2^{w_{id}} \& R(\mathbf{wim}) = 0 \\ & \text{where } \& \text{ is the bitwise AND operation.}
& \mathbf{next\_cwp}(w_{id}) \quad ::= (w_{id} + N - 1)\% N \qquad \mathbf{prev\_cwp}(R(\mathbf{cwp})), \mathbf{win\_valid}(w'_{id}, R), \\ & F = F'' \cdot \mathbf{fm}_{1} \cdot \mathbf{fm}_{2}, F' = R(\mathbf{local}) :: R(\mathbf{in}) :: F'', \\ & R'' = R\{\mathbf{in} \leadsto R(\mathbf{out}), \mathbf{local} \leadsto \mathbf{fm}_{2}, \mathbf{out} \leadsto \mathbf{fm}_{1}\}, \\ & R' = R''\{\mathbf{cwp} \leadsto w'_{id}\}, \\ & \bot \qquad if \neg \mathbf{win\_valid}(\mathbf{next\_cwp}(R(\mathbf{cwp})), \mathbf{win\_valid}(w'_{id}, R), \\ & F = \mathbf{fm}_{1} :: \mathbf{fm}_{2} :: F'', \ F' = F'' \cdot R(\mathbf{out}) \cdot R(\mathbf{local}), \\ & R'' = R\{\mathbf{in} \leadsto \mathbf{fm}_{2}, \mathbf{local} \leadsto \mathbf{fm}_{1}, \mathbf{out} \leadsto R(\mathbf{in})\}, \\ & R'' = R'[\mathbf{cwp} \leadsto w'_{id}\}, \\ & \bot \qquad if \neg \mathbf{win\_valid}(\mathbf{prev\_cwp}(R(\mathbf{cwp})), R) \end{aligned}
```

Fig. 5. Auxiliary Definitions for Instruction save and restore

inaccessible. This is like pushing them onto the circular stack. The restore instruction does the inverse, which is like a stack pop.

The wim register is used as a bit vector to record the end of the stack. Each bit in wim corresponds to a register window. The bit corresponding to the last available window is set to 1, which means *invalid*. All other bits are 0 (*i.e. valid*). When executing save (and restore), we need to ensure the next window is valid, in order to avoid the overflow of register window because of the limitation of the number of windows. We use the assertion  $\mathbf{win}_{-}\mathbf{valid}(w_{id}, R)$  defined in Fig. 5 to say the window pointed to by  $w_{id}$  is valid, given the value of wim in R.

We use the frame list F to model the circular stack consisting of register windows. As defined in Fig. 3, a frame is an array of 8 words, modeling a group of 8 registers. F consists of a sequence of frames corresponding to all the register windows except the out, local and in registers in the current window. Then save saves the local and in registers onto the head of F and loads the two groups of register at the tail of F to the local and out registers (and the original out registers becomes the in group). The restore instruction does the inverse. The operations are defined formally in Fig. 5.

**The delay buffer.** The delay buffer D is a sequence of delayed writes. Because the wr instruction does not update the target register immediately, we put the write operation onto the delay buffer. A delayed write is recorded as a triple

consisting of the remaining cycles t to be delayed, the target special register sr and the value w to be written. Note that we restrict that the value of a special register can only be a word, because the special registers are used to record the state of processor, and there is impossible to store memory addresses in them.

Instruction sequences. We use an instruction sequence  $\mathbb{I}$  to model a basic block, *i.e.* a sequence of commands ending with a control transfer. As defined in Fig. 2, we require that a delayed control-transfer instruction must be followed by a simple instruction  $\mathbf{i}$ , because the actual control-transfer occurs after the execution of  $\mathbf{i}$ . The end of each instruction sequence can only be  $\mathtt{jmp}$  or  $\mathtt{ret1}$  followed by a simple instruction  $\mathbf{i}$ . Note that we do not view the  $\mathtt{cal1}$  instruction as the end of a basic block, since the callee is expected to return, following our direct-style semantics for function calls. We define  $C[\mathtt{f}]$  to extract an instruction sequence starting from  $\mathtt{f}$  in C below.

$$C[\mathbf{f}] = \begin{cases} \mathbf{i}; \mathbb{I} & C(\mathbf{f}) = \mathbf{i} \text{ and } C[\mathbf{f}+4] = \mathbb{I} \\ c; \mathbf{i} & c = C(\mathbf{f}) \text{ and } c = \mathbf{jmp} \text{ a or retl} \\ \text{and } C(\mathbf{f}+4) = \mathbf{i} \end{cases}$$

$$c = C(\mathbf{f}) \text{ and } c = \text{call } \mathbf{f} \text{ or be } \mathbf{f}$$

$$c = C(\mathbf{f}) \text{ and } c = \text{call } \mathbf{f} \text{ or be } \mathbf{f}$$

$$c = C(\mathbf{f}) \text{ and } c = \text{call } \mathbf{f} \text{ or be } \mathbf{f}$$

$$c = C(\mathbf{f}) \text{ and } c = \text{call } \mathbf{f} \text{ or be } \mathbf{f}$$

$$c = C(\mathbf{f}) \text{ and } c = \text{call } \mathbf{f} \text{ or be } \mathbf{f}$$

$$c = C(\mathbf{f}) \text{ and } c = \text{call } \mathbf{f} \text{ or be } \mathbf{f}$$

$$c = C(\mathbf{f}) \text{ and } c = \text{call } \mathbf{f} \text{ or be } \mathbf{f}$$

$$c = C(\mathbf{f}) \text{ and } c = \text{call } \mathbf{f} \text{ or be } \mathbf{f}$$

$$c = C(\mathbf{f}) \text{ and } c = \text{call } \mathbf{f} \text{ or be } \mathbf{f}$$

$$c = C(\mathbf{f}) \text{ and } c = \text{call } \mathbf{f} \text{ or be } \mathbf{f}$$

$$c = C(\mathbf{f}) \text{ and } c = \text{call } \mathbf{f} \text{ or be } \mathbf{f}$$

$$c = C(\mathbf{f}) \text{ and } c = \text{call } \mathbf{f} \text{ or be } \mathbf{f}$$

$$c = C(\mathbf{f}) \text{ and } c = \text{call } \mathbf{f} \text{ or be } \mathbf{f}$$

$$c = C(\mathbf{f}) \text{ and } c = \text{call } \mathbf{f} \text{ or be } \mathbf{f}$$

$$c = C(\mathbf{f}) \text{ and } c = \text{call } \mathbf{f} \text{ or be } \mathbf{f}$$

$$c = C(\mathbf{f}) \text{ and } c = \text{call } \mathbf{f} \text{ or be } \mathbf{f}$$

$$c = C(\mathbf{f}) \text{ and } c = \text{call } \mathbf{f} \text{ or be } \mathbf{f}$$

$$c = C(\mathbf{f}) \text{ and } c = \text{call } \mathbf{f} \text{ or be } \mathbf{f}$$

$$c = C(\mathbf{f}) \text{ and } c = \text{call } \mathbf{f} \text{ or be } \mathbf{f}$$

$$c = C(\mathbf{f}) \text{ and } c = \text{call } \mathbf{f} \text{ or be } \mathbf{f}$$

$$c = C(\mathbf{f}) \text{ and } c = \text{call } \mathbf{f} \text{ or be } \mathbf{f}$$

$$c = C(\mathbf{f}) \text{ and } c = \text{call } \mathbf{f} \text{ or be } \mathbf{f}$$

#### 2.2 Operational Semantics

The operational semantics is taken from Wang et al. [28], but we omit features like interrupts and traps. We show the selected rules in Fig. 6. The program transition relation  $C \vdash (S, \mathtt{pc}, \mathtt{npc}) \longmapsto (S', \mathtt{pc}', \mathtt{npc}')$  is defined in Fig. 6 (a). Before the execution of the instruction pointed by  $\mathtt{pc}$ , the delayed writes in D with 0 delay cycles are executed first. The execution of the delayed writes are defined in the form of  $(R, D) \rightrightarrows (R', D')$ , as shown below:

$$\frac{(R,D) \rightrightarrows (R',D')}{(R,\operatorname{nil}) \rightrightarrows (R,\operatorname{nil})} \frac{(R,D) \rightrightarrows (R',D')}{(R,(t+1,\operatorname{sr},w)::D) \rightrightarrows (R',(t,\operatorname{sr},w)::D')}$$
 
$$\frac{(R,D) \rightrightarrows (R',D') \quad \operatorname{sr} \in \operatorname{dom}(R)}{(R,(0,\operatorname{sr},w)::D) \rightrightarrows (R'\{\operatorname{sr} \leadsto w\},D')} \frac{(R,D) \rightrightarrows (R',D') \quad \operatorname{sr} \not\in \operatorname{dom}(R)}{(R,(0,\operatorname{sr},w)::D) \rightrightarrows (R',D')}$$

Note that the write of sr has no effect if sr is not in the domain of R. Since R is defined as a partial map, we can prove the following lemma.

**Lemma 2.1.** 
$$(R,D) \rightrightarrows (R',D')$$
 and  $R = R_1 \uplus R_2$ , if and only if there exists  $R'_1$  and  $R'_2$ , such that  $(R_1,D) \rightrightarrows (R'_1,D')$ ,  $(R_2,D) \rightrightarrows (R'_2,D')$ , and  $R' = R'_1 \uplus R'_2$ .

Here the disjoint union  $R_1 \uplus R_2$  represents the union of  $R_1$  and  $R_2$  if they have disjoint domains, and undefined otherwise. This lemma is important to give sound semantics to delay buffer related assertions, as discussed in Sec. 3.

The transition steps for individual instructions are classified into three categories: the control transfer steps ( $\_\vdash\_\circ\longrightarrow\_$ ), the steps for save, restore and

$$\begin{split} &(R,D) \rightrightarrows (R',D') \\ & C \vdash ((M,(R',F),D'),\operatorname{pc},\operatorname{npc}) & \circ \longrightarrow ((M',(R'',F'),D''),\operatorname{pc}',\operatorname{npc}') \\ & \overline{C \vdash ((M,(R,F),D),\operatorname{pc},\operatorname{npc}) \longmapsto ((M',(R'',F'),D''),\operatorname{pc}',\operatorname{npc}')} \end{split}$$

## (a) Program Transistion

$$\begin{split} &C(\mathsf{pc}) = \mathbf{i} \qquad (M,(R,F),D) \bullet \stackrel{\mathbf{i}}{\longrightarrow} (M',(R',F'),D') \\ &\overline{C \vdash ((M,(R,F),D),\mathsf{pc},\mathsf{npc})} \quad \circ \longrightarrow ((M',(R',F'),D'),\mathsf{npc},\mathsf{npc} + 4) \\ & \qquad \qquad C(\mathsf{pc}) = \mathsf{jmp} \, \mathbf{a} \quad \llbracket \mathbf{a} \rrbracket_R = \mathbf{f} \\ & \overline{C \vdash ((M,(R,F),D),\mathsf{pc},\mathsf{npc})} \quad \circ \longrightarrow ((M,(R,F),D),\mathsf{npc},\mathbf{f}) \\ & \qquad \qquad C(\mathsf{pc}) = \mathsf{call} \, \mathbf{f} \quad \mathbf{r}_{15} \in \mathsf{dom}(R) \\ & \overline{C \vdash ((M,(R,F),D),\mathsf{pc},\mathsf{npc})} \quad \circ \longrightarrow ((M,(R\{\mathbf{r}_{15} \leadsto \mathsf{pc}\},F),D),\mathsf{npc},\mathbf{f}) \\ & \qquad \qquad C(\mathsf{pc}) = \mathsf{retl} \quad R(\mathbf{r}_{15}) = \mathbf{f} \\ & \qquad \overline{C \vdash ((M,(R,F),D),\mathsf{pc},\mathsf{npc})} \quad \circ \longrightarrow ((M,(R,F),D),\mathsf{npc},\mathbf{f} + 8) \end{split}$$

# (b) Control Transfer Instruction Transition

$$\frac{(M,R) \xrightarrow{i} (M',R')}{(M,(R,F),D) \bullet \xrightarrow{i} (M',(R',F),D)} \qquad \frac{R(\mathbf{r}_s) = w_1}{\sup_{\mathbf{r} \in \mathrm{dom}(R)} \mathbb{O}_f^l = \mathrm{set\_delay}(\mathbf{r}_t,w,D)}{D' = \mathrm{set\_delay}(\mathbf{r}_t,w,D)} \\ \frac{\mathrm{save}(R,F) = (R',F')}{(M,(R,F),D) \bullet \xrightarrow{\mathrm{save}\,\mathbf{r}_s \circ \mathrm{sr}} (M,(R,F),D')} \\ \frac{(M,(R,F),D) \bullet \xrightarrow{\mathrm{save}\,\mathbf{r}_s \circ \mathrm{r}_d}}{(M,(R,F),D) \bullet \xrightarrow{\mathrm{save}\,\mathbf{r}_s \circ \mathrm{r}_d} (M,(R'',F'),D)} \\ \frac{\mathrm{restore}(R,F) = (R',F')}{(M,(R,F),D) \bullet \xrightarrow{\mathrm{restore}\,\mathbf{r}_s \circ \mathrm{r}_d} (M,(R'',F'),D)}$$

(c) Save, Restore and Wr instruction Transition

(d) Simple Instruction Transition

$$\llbracket \mathtt{o} \rrbracket_R ::= \begin{cases} R(r) & \text{if } \mathtt{o} = r \\ w & \text{if } \mathtt{o} = w, \\ -4096 \leq w \leq 4095 \\ \bot & \text{otherwise} \end{cases} \quad \llbracket \mathtt{a} \rrbracket_R ::= \begin{cases} \llbracket \mathtt{o} \rrbracket_R & \text{if } \mathtt{a} = \mathtt{o} \\ v_1 + v_2 & \text{if } \mathtt{a} = \mathtt{r} + \mathtt{o}, \ R(\mathtt{r}) = v_1 \\ & \text{and } \llbracket \mathtt{o} \rrbracket_R = v_2 \\ \bot & \text{otherwise} \end{cases}$$

(e) Expression Semantics

Fig. 6. Selected operational semantics rules

wr instructions ( $\_ \bullet \longrightarrow \_$ ), and the steps for other simple instructions ( $\_ \longrightarrow \_$ ). The corresponding step transition relations are defined inductively in Fig. 6 (b), (c) and (d) respectively.

Note that, after the control-transfer instructions, pc is set to npc and npc contains the target code pointer. This explains the one cycle delay for the control transfer. The call instruction saves pc into the register  $\mathbf{r}_{15}$ , while retl uses  $\mathbf{r}_{15}+8$  as the return address (which is the address for the second instruction following the call). Evaluation of expressions a and o is defined as  $[a]_R$  and  $[o]_R$  in Fig. 6 (e). Here, we define the sum of two values  $v_1$  and  $v_2$  below. The result of  $v_1+v_2$  is legel, if both of the  $v_1$  and  $v_2$  are words (Int32), or  $v_1$  is an address and  $v_2$  is an offset. This is explain why the immediate value in our work is a word, because immediate value usually acts as an offset, which is a word, in the calculation of address.

$$v_1 + v_2 ::= \begin{cases} w_1 + w_2 & \text{if } v_1 = w_1, \text{ and } v_2 = w_2 \\ (b, w_1 + w_2) & \text{if } v_1 = (b, w_1), \text{ and } v_2 = w_2 \\ \bot & \text{otherwise} \end{cases}$$

The wr wants to save the bitwise exclusive OR of the operands into the special register sr, but it puts the write into the delay buffer D instead of updating R immediately. The operation  $set\_delay(sr, w, D)$  is defined below:

set 
$$\operatorname{delay}(\operatorname{sr}, w, D) ::= (X, \operatorname{sr}, w) :: D$$

where X ( $0 \le X \le 3$ ) is a predefined system parameter for the delay cycle. Note that as we have explained before, special register is used to record the state of processors, so we do not permit saving memory address in it.

The save and restore instruction rotate the register windows and update the register file. Their operations over F and R are defined in Fig. 5.

### 3 Program Logic

In this section, we introduce the assertion language and program logic designed for SPARCv8 program.

## 3.1 Assertions

$$(Asrt) \ p, q ::= \operatorname{emp} \mid l \mapsto v \mid \operatorname{rn} \mapsto v \mid \triangleright_t \operatorname{sr} \mapsto w \mid p \downarrow \mid \operatorname{cwp} \mapsto (\!\!\mid w_{id}, F \!\!\mid)$$
 
$$\mid p \wedge q \mid p \vee q \mid p * q \mid \operatorname{a}_{=_q} v \mid \operatorname{o} = v \mid \forall x. \ p \mid \exists x. \ q \mid \dots$$

Fig. 7. Syntax of Assertions

We define syntax of assertions in Fig. 7, and their semantics in Fig. 8. We extend separation logic assertions with specifications of delay buffers and register windows. Registers are like variables in separation logic, but are treated as

```
\begin{split} S &\models \mathsf{emp} & ::= S.M = \emptyset \land S.Q.R = \emptyset \\ S &\models l \mapsto v & ::= S.M = \{l \leadsto v\} \land S.Q.R = \emptyset \\ S &\models \mathsf{rn} \mapsto v & ::= S.Q.R = \{\mathsf{rn} \leadsto v\} \land \mathsf{rn} \notin \mathsf{dom}(S.D) \land S.M = \emptyset \\ S &\models \mathsf{rn} \mapsto v & ::= \exists k, R', D'. 0 \leq k \leq t+1 \land (R,D) \rightrightarrows^k (R',D') \land \\ & ((M,(R',F),D') \models \mathsf{sr} \mapsto w) \land \mathsf{noDup}(D,\mathsf{sr}) \\ & \mathsf{where} \ S = (M,(R,F),D) \\ S &\models \mathsf{p} \downarrow & ::= \exists R', D'. ((M,(R',F),D') \models \mathsf{p}) \land (R',D') \rightrightarrows (R,D) \\ & \mathsf{where} \ S = (M,(R,F),D) \\ S &\models \mathsf{cwp} \mapsto \emptyset w_{id}, F \} ::= (S \models \mathsf{cwp} \mapsto w_{id}) \land \exists F'.F \cdot F' = S.Q.F \\ S &\models \mathsf{a} = a v & ::= \llbracket \mathsf{a} \rrbracket_{S.Q.R} = v \land \mathsf{word\_align}(v) \\ S &\models \mathsf{o} = v & ::= \llbracket \mathsf{o} \rrbracket_{S.Q.R} = v \\ S &\models p_1 \ast p_2 & ::= \exists S_1, S_2. S_1 \models p_1 \land S_2 \models p_2 \land S = S_1 \uplus S_2 \\ S_1 \uplus S_2 ::= \begin{cases} (M_1 \cup M_2, (R_1 \cup R_2, F), D) & \text{if } M_1 \bot M_2 \land R_1 \bot R_2 \land \\ S_1 = (M_1, (R_1, F), D) \land S_2 = (M_2, (R_2, F), D) \\ \text{undefined} & \text{otherwise} \end{cases} \\ \mathsf{dom}(D) ::= \begin{cases} \{\mathsf{sr}\} \cup \mathsf{dom}(D') & \text{if } D = (t, \mathsf{sr}, w) :: D' \\ \emptyset & \text{if } D = \mathsf{nil} \end{cases} \\ \mathsf{noDup}(D, \mathsf{sr}) ::= \begin{cases} \mathsf{sr} \not\in \mathsf{dom}(D') & \text{if } D = (t, \mathsf{sr}, w) :: D' \\ \mathsf{rr} \neq \mathsf{sr}' \land \mathsf{noDup}(D', \mathsf{sr}) & \text{if } D = (t, \mathsf{sr}', w) :: D' \\ \mathsf{True} & \text{if } D = \mathsf{nil} \end{cases} \end{split}
```

Fig. 8. Semantics of Assertions

resources. The assertion emp says that the memory and the register file are both empty.  $l \mapsto v$  specifies a singleton memory cell with value v stored in the address l.  $\mathtt{rn} \mapsto v$  says that  $\mathtt{rn}$  is the only register in the register file and it contains the value v. Also  $\mathtt{rn}$  is not in the delay buffer. Separating conjunction p\*q has the standard semantics as in separation logic [26].

The assertion  $\triangleright_t \mathbf{sr} \mapsto w$  describes a delayed write in the delay buffer D. It describes the uncertainty of  $\mathbf{sr}$ 's value in R, which is unknown for now but will become w in up to t+1 cycles. We use  $\_ \rightrightarrows^k \_$  to represent k-step execution of the delayed writes in D. It also requires that there be at most one delayed write for a specific special register  $\mathbf{sr}$  in D (*i.e.*  $\mathbf{noDup}(\mathbf{sr}, D)$ ). This prevents more than one delayed writes to the same register within 4 instruction cycles, which practically have no restrictions on programming. By the semantics we have

$$\mathtt{sr} \!\mapsto\! w \implies \rhd_t \!\mathtt{sr} \!\mapsto\! w \qquad \qquad \rhd_t \!\mathtt{sr} \!\mapsto\! w \implies \rhd_{t\!+\!k} \!\mathtt{sr} \!\mapsto\! w$$

The assertion  $p\downarrow$  allows us to reduce the uncertainty by executing one step of the delayed writes. It specifies states reachable after executing one step of delayed writes from those states satisfying p. Therefore we know:

$$(\rhd_0 \mathtt{sr} \mapsto w) \!\downarrow \implies \mathtt{sr} \mapsto w \qquad \qquad (\rhd_{t\!+\!1} \mathtt{sr} \mapsto w) \!\downarrow \implies \rhd_t \mathtt{sr} \mapsto w$$

Also it's easy to see that if p syntactically does not contain sub-terms in the form of  $\triangleright_t \mathtt{sr} \mapsto w$ , then  $(p\downarrow) \iff p$ .

```
 \begin{array}{ll} - \; \{ (\mathrm{fp}, \mathrm{fq}) \} & \quad \text{ fp} \; ::= \; \lambda \, lv. \, (\% \mathtt{i}_0 \mapsto lv[0]) * \, (\% \mathtt{i}_1 \mapsto lv[1]) * \, (\% \mathtt{i}_2 \mapsto lv[2]) \\ \text{add} \; \; \% \mathtt{i}_7, \; \% \mathtt{i}_2, \; \% \mathtt{i}_7 & \quad *\% \mathtt{i}_7 \mapsto \_ * \, (\mathtt{r}_{15} \mapsto lv[3]) \\ \text{retl} & \quad \text{nop} \end{array}
```

Fig. 9. Example for Function Specification

The following lemma shows  $(\_)\downarrow$  is distributive over separating conjunction.

```
Lemma 3.1. (p*q)\downarrow \iff (p\downarrow)*(q\downarrow).
```

The lemma can be proved following Lemma 2.1.

We use  $\mathsf{cwp} \mapsto (w_{id}, F)$  to describe the pointer  $\mathsf{cwp}$  of the current register window and the frame list as a circular stack. Note that F is just a prefix of the frame list, since usually we do not need to know contents of the full list. Here we use  $F \cdot F'$  to represent the concatenation of lists F and F'. Therefore we have  $\mathsf{cwp} \mapsto (w_{id}, F \cdot F') \implies \mathsf{cwp} \mapsto (w_{id}, F)$ .

The assertions  $\mathbf{a} =_a v$  and  $\mathbf{o} = v$  describe the value of  $\mathbf{a}$  and  $\mathbf{o}$  respectively. They are intuitionistic assertions. Since  $\mathbf{a}$  is used as an address, we also require it to be properly aligned on a 4-byte boundary. We define **word\_align** to represent this restriction below. The result of the address expression  $\mathbf{a}$  may be a word, if it's a pointer in code heap, or an memory address, if it's a location of memory.

**word align**(
$$v$$
) ::=  $\exists w. (v = w \lor (\exists b. v = (b, w))) \land w\%4 = 0$ 

#### 3.2 Inference Rules

The code specification  $\theta$  and code heap specification  $\Psi$  are defined below:

(valList) 
$$\iota \in \text{list value}$$
 (pAsrt) fp, fq  $\in \text{valList} \to Asrt$  (CdSpec)  $\theta ::= \{\text{f} \mapsto \theta\}^*$ 

The code heap specification  $\Psi$  maps the code labels for basic blocks to their specifications  $\theta$ , which is a pair of pre- and post-conditions. Instead of using normal assertions, the pre- and post-conditions are assertions parameterized over a list of values lgvl. They play the role of auxiliary variables — Feeding the pre- and the post-conditions with the same lgvl allows us to establish relationship of states specified in the pre- and post-conditions.

Although we assign a  $\theta$  to each basic block, the post-condition does not specify the states reached at the end of the block. Instead, it specifies the condition that needs to be specified in the future when the *current function* returns. This follows the idea developed in SCAP [11], but we use the standard unary state assertion instead of the binary state assertions used in SCAP, so that existing proof techniques (such as Coq tactics) for standard Hoare-triples can be applied to simplify the verification process.

We give a simple example in Fig. 9 to show a specification for a function, which simply sums the values of the registers  $\%i_0$ ,  $\%i_1$  and  $\%i_2$  and writes the result into the register  $\%l_7$ . The specification (fp, fq) says that, when provided with the same lv as argument, the function preserves the value of  $\%i_0$ ,  $\%i_1$  and  $\%i_2$ ,  $\%l_7$  at the end contains the sum of  $\%i_0$ ,  $\%i_1$  and  $\%i_2$ , and the function also preserves the value of  $\mathbf{r}_{15}$ , which it uses as the return address. To verify the function, we need to prove that it satisfies (fp lv, fq lv) for all lv.

Figure 10 shows selected inference rules in our logic. The top rule **CDHP** verifies the code heap C. It requires that every basic block specified in  $\Psi$  can be verified with respect to the specification, with any argument  $\iota$  used to instantiate the pre- and post-conditions.

The **SEQ** rule is applied when meeting an instruction sequence starting with a simple instruction  $\mathbf i$ . The instruction  $\mathbf i$  is verified by the corresponding well-formed instruction rules, with the precondition  $p\downarrow$  and some post-condition p'. We use  $p\downarrow$  because there is an implicit step executing delayed writes before executing every instruction. The post-condition p' for  $\mathbf i$  is then used as the precondition to verify the remaining part of the instruction sequence.

**Delayed control transfers.** We distinguish the jmp and call instructions — The former makes an *intra-function* control transfer, while the latter makes function calls. The **JMP** rule requires that the target address is a valid one specified in  $\Psi$ . Starting from the precondition p, after executing the instruction i following **JMP** and the corresponding delayed writes, the post-condition p' of i should satisfy the precondition of the target instruction sequence, with some instantiation  $\iota$  of the logical variables and a frame assertion  $p_r$ . Since the target instruction sequence of jmp is in the same function as the jmp instruction itself, the post-condition p specified at the target address (with the same instantiation p of the logical variables and the frame assertion p should meet the post-condition p of the current function. As we explained before, the post-condition p does not specify the states reached at the end of the instruction sequence (which are specified by p' instead).

The **CALL** rule is similar to the **JMP** rule in that it also requires the post-condition  $p_2$  of the instruction  $\mathbf{i}$  following the **call** satisfy the precondition of the target instruction sequence, with some instantiation  $\iota$  of the logical variables and a frame assertion  $p_r$ . Here we need to record that the code label  $\mathbf{f}$  is saved in  $\mathbf{r}_{15}$  by the **call** instruction. When the callee returns, its post-condition fq (with the same instantiation of auxiliary variables  $\iota$ ) needs to ensure  $\mathbf{r}_{15}$  still contains  $\mathbf{f}$ , so that the callee returns to the correct address. Also the fq with the frame  $p_r$  needs to satisfy the precondition p' for the remaining instruction sequences of the caller.

The **RETL** rule simply requires that the post-condition q holds at the end of the instruction i following ret1. Also i cannot touch the register  $r_{15}$ , therefore  $r_{15}$  specified in p must be the same as in q. Since at the calling point we already required that the post-condition of the callee guarantees  $r_{15}$  contains the correct return address, we know  $r_{15}$  contains the correct value before ret1.

Fig. 10. Seleted Inference Rules

**Delayed writes and register windows.** The bottom layer of our logic is for well-formed instructions. The **WR** rule requires the ownership of the target register sr in the precondition  $(sr \mapsto_{-})$ . Also it implies there is no delayed writes to sr in the delay buffer (see the semantics defined in Fig. 8). At the end of the delayed write, we use  $\triangleright_3 sr \mapsto w_1 \oplus w_2$  to indicate the new value will be ready in up to 3 cycles. Since the maximum delay cycle X cannot be bigger than 3 and the value of X may vary in different systems, programmers usually take a conservative approach to assume X = 3 for portability of code. Our rule reflects this conservative view. The **RD** rule says the special register can be read only if it is not in the delay buffer. The **SAVE** and **RESTORE** rules reflect the save and recovery of the execution contexts, which is consistent with the operational semantics of the save and restore instructions given in Figs. 5 and 6.

#### 3.3 Semantics and Soundness

We first define the safety of instruction sequences, safe\_insSeq $(C, S, pc, npc, q, \Psi)$ . It says C can execute safely from S, pc and npc until reaching the end of the current instruction sequence (C[pc]), and q holds if C[pc] ends with the return instruction retl. It is formally defined in Def. 3.2. Here we use " $\_ \mapsto^n \_$ " to represent n-step execution. The defintion can ensure the progress and preservations of the execution of the instruction sequence. Progress property means the program can execute a step, when meeting simple instructions, or two steps, when meeting the delayed control transfer instructions, e.g.call and jmp. Preservations property means that if the program can execute one or two steps, the remaining part of instruction sequence can still execute safely.

### Definition 3.2 (Safety of Instruction Sequences).

safe\_insSeq $(C, S, pc, npc, q, \Psi)$  holds if and only if the following are true (we omit the case for be here, which is similar to jmp):

```
- if C(pc) = i then:
     • there exist S', pc, npc', such that C \vdash (S, pc, npc) \longmapsto (S', pc', npc'),
     • for any S', pc', npc', if C \vdash (S, pc, npc) \longmapsto (S', pc', npc'), then
        safe_insSeq(C, S', pc', npc', q, \Psi).
- if C(pc) = jmp a then:
     • there exist S', pc', npc', such that C \vdash (S, pc, npc) \longmapsto^2 (S', pc', npc'),
     • for any S', pc', npc', if C \vdash (S, pc, npc) \longmapsto^2 (S', pc', npc'), then there exist
        fp, fq, \iota and p_r, such that the following hold:
         (1) pc' = pc' + 4, \Psi(pc') = (fp, fq),
         (2) S' \models (\text{fp } \iota) * p_r, (\text{fq } \iota) * p_r \Rightarrow q.
- if C(pc) = be f then ...
- if C(pc) = call f then:
     • there exist S', pc', npc', such that C \vdash (S, pc, npc) \longmapsto^2 (S', pc', npc'),
     • for any S', pc' and npc', if C \vdash (S, pc, npc) \longmapsto^2 (S', pc', npc'), then there exist
        fp, fq, \iota and p_r, such that the following hold:
        (1) npc' = pc' + 4, \Psi(pc') = (fp, fq),
         (2) S' \models (\text{fp } \iota) * p_r,
         (3) for any S', if S' \models (\text{fq } \iota) * p_r, then safe_insSeq (C, S', pc + 8, pc + 12, q, \Psi),
```

```
(4) for any S', if S' |= (fq ι), then S'.Q.R(r<sub>15</sub>) = pc.
if C(pc) = ret1 then:
there exist S', pc', npc', such that C ⊢ (S, pc, npc) → (S', pc', npc'),
for any S', pc' and npc', if C ⊢ (S, pc, npc) → (S', pc', npc'), then S' |= q, pc' = S'.Q.R(r<sub>15</sub>)+8, and npc' = S'.Q.R(r<sub>15</sub>)+12.
```

Then we can define the semantics for well-formed instruction sequences and well-formed code heap. The semantics of well-formed instruction sequences tells us that for any C, if the instruction sequence starting from label  $\mathbf{f}$  is  $\mathbb{I}$ , then the instruction sequence  $\mathbb{I}$  can execute safely if the initial state S satisfies the precondition p. And the semantics of well-formed code heap says that all the instruction sequences in code heap C given specifications can execute safely.

## Definition 3.3 (Judgment Semantics).

- $-\Psi \models \{(p,q)\}\ \mathtt{f} : \mathbb{I} \text{ if and only if, for all } C \text{ and } S \text{ such that } C[\mathtt{f}] = \mathbb{I} \text{ and } S \models p, \text{ we have } \mathsf{safe\_insSeq}(C,S,\mathtt{f},\mathtt{f}+4,q,\Psi).$
- $\models C : \Psi$  if and only if, for all f, fp and fq such that  $\Psi(f) = (fp, fq)$ , we have  $\Psi \models \{(fp \ \iota, fq \ \iota)\} \ f : C[f]$  for all  $\iota$ .

Next we define the safety  $\mathsf{safe}^n(C, S, \mathsf{pc}, \mathsf{npc}, q, k)$  of whole program execution in Def. 3.4. It says that, starting with  $\mathsf{pc}$ ,  $\mathsf{npc}$  and the state S, and with the depth k of function calls, the code C either halts in less than n steps, with the final state satisfies q, or it executes at least n steps safely. Here we say C halts if it reaches the return point of the topmost function (when the depth k of the function call is 0). In the definition below, the depth k increases by the call instruction and decreases by  $\mathsf{retl}$  (unless k=0).

**Definition 3.4 (Program Safety).** safe<sup>0</sup>(C, S, pc, npc, q, k) always holds. safe<sup>n+1</sup>(C, S, pc, npc, q, k) holds if and only if the following are true:

```
    if C(pc) ∈ {i, jmp a, be f}, then:

            there exist S', pc', npc', such that C ⊢ (S, pc, npc) → (S', pc', npc');
            for any S', pc', npc', if
            C ⊢ (S, pc, npc) → (S', pc', npc'), then safe<sup>n</sup> (C, S', pc', npc', q, k);

    if C(pc) = call f, then:

            there exist S', pc', npc' such that C ⊢ (S, pc, npc) → <sup>2</sup> (S', pc', npc');
            for any S', pc', npc', if C ⊢ (S, pc, npc) → <sup>2</sup> (S', pc', npc'),
            then safe<sup>n</sup>(C, S', pc', npc', q, k + 1);

    if C(pc) = retl, then:

            there exist S', pc', npc' such that C ⊢ (S, pc, npc) → <sup>2</sup> (S', pc', npc');
            for any S', pc', npc', if C ⊢ (S, pc, npc) → <sup>2</sup> (S', pc', npc'), then if k = 0 then
            S' ⊨ q
            else

    safe<sup>n</sup>(C, S', pc', npc', npc', q, k-1).
```

Then the following theorem and corollary show the soundness of our logic. The correctness of corollary 3.6 guarantees that we can get the correctness of the whole program, if each basic code block or internal function composing it is verified by our program logic.



Fig. 11. The Structure of Context Switch Module

Theorem 3.5 (Soundness).  $\vdash C : \Psi \Longrightarrow \models C : \Psi$ 

Corollary 3.6 (Function Safety). If  $\Psi \models \{(p,q)\}$  pc : C[pc],  $S \models p$ , and  $\models C: \Psi$ , then  $\forall n$ . safe  $C: \Psi$ , safe  $C: \Psi$ , then  $\forall n$  safe  $C: \Psi$  can be defined as  $C: \Psi$ .

## 4 Verifying a Realistic Context Switch Module

We apply our program logic to verify the main body of a context switch routine implemented in SPARCv8, which is used to save the current task's context and restore the new task's context. Figure 11 shows the structure of the code.

- SwitchEntry is the entry of the module. It checks SwitchFlag to see if a context switch is needed. If yes, it enters the Window OK block.
- Window\_OK checks if the current task is null (which may happen if the switch follows the delete of the current task). If yes, it jumps to Adjust\_CWP, which resets the pointer cwp of the current register window so that it points to the last valid window. It essentially pops all the frames to empty the circular stack of register windows. If the current task is not null, it calls reg\_save to save the general registers into the TCB, and then enter the code block Save\_UsedWindows to save other register windows (F in our state model).
- Save\_UsedWindows saves the register windows (except the current one) into the current task's stack in memory. It checks whether the previous window is valid. If it's valid, use the instruction restore to set the previous window as the current one, and save its contents into stack (in memory), then check the previous one continuously.
- Switch\_NewContext restores the general registers and other register windows from the new task's TCB and its stack in memory, respectively. Then it sets the new task as the current one.

The main complexity of the verification lies in the code manages the register windows. To save all the register windows, Save\_UsedWindows repetitively restores the next window into general registers (as the current window) and then saves them into memory, until all the windows are saved.

**Specification.** Below we give the pre- and post-conditions ( $a_{pre}$  and  $a_{post}$ ) of the verified module. Each of them takes 5 arguments, the id of the current task  $t_c$ , the id of the new task  $t_n$ , the value flag of the SwitchFlag, the values env of general registers and all other register windows, and the new task's context nst that needs to be restored.

```
\begin{aligned} \mathsf{a}_{pre}(t_c,t_n,flag,\mathit{env},\mathit{nst}) \; &:= \; \mathsf{Env}(\mathit{env}) * (\mathsf{SwitchFlag} \mapsto \mathit{flag}) * (\mathsf{TaskNew} \mapsto t_n) * \\ & (\mathit{flag} = \mathsf{false} \vee \mathsf{CurT}(t_c,\_,\mathit{env}) * \mathsf{NoCurT}(t_n,\mathit{nst})) \\ \mathsf{a}_{post}(t_c,t_n,\mathit{flag},\mathit{env},\mathit{nst}) \; &:= \; \exists \mathit{env}'. \; \mathsf{Env}(\mathit{env}') * (\mathsf{SwitchFlag} \mapsto \mathsf{false}) * (\mathsf{TaskNew} \mapsto t_n) * \\ & (\mathit{flag} = \mathsf{false} \wedge \mathsf{p\_env}(\mathit{env}) = \mathsf{p\_env}(\mathit{env}') \\ & \vee (\mathsf{CurT}(t_n,\mathit{nst},\mathit{env}') \wedge \mathsf{p\_env}(\mathit{env}') = \mathit{nst}) * \\ & \mathsf{NoCurT}(t_c,\mathsf{p\_env}(\mathit{env}))) \end{aligned}
```

In the specification, we use  $\mathsf{Env}(\mathit{env})$  to specify the values of general registers and the register windows. The variable TaskNew records the identifier of the new task. If SwitchFlag is false, we do not need any knowledge about the current and the new tasks since there is no context switch. Otherwise we describe the state of the current task (its TCB and stack in memory) using  $\mathsf{CurT}(t_c, \_, \mathit{env})$ , and the saved context of the new task using  $\mathsf{NoCurT}(t_n, \mathit{nst})$ . Due to space limitation we omit the detailed definitions here.

If we compare  $\mathsf{a}_{pre}$  and  $\mathsf{a}_{post}$ , we can see that  $t_n$  becomes the current task  $(\mathsf{CurT}(t_n, nst, env'))$ , and its general registers and stack, specified by  $\mathsf{Env}(env')$ , are loaded from the saved context nst (i.e.  $\mathsf{p\_env}(env') = nst$ ). Here  $\mathsf{p\_env}(env')$  refers to the part of the environment that we want to save or restore as context. Correspondingly,  $t_c$  becomes non-current-thread, and part of its environment env at the entry of the context switch is saved, as specified by  $\mathsf{NoCurT}(t_c, \mathsf{p\_env}(env))$ .

Tactics for Automated Reasoning. Cao et al.[8] propose a set of practical tactics for verifying C program in Coq.

$$\frac{p_1 \Longrightarrow p_2}{p_1 * q' \Longrightarrow p_2 * q'} L_1 \qquad \frac{w = w' \qquad p_1 \Longrightarrow p_2}{(l \mapsto w) * p_1 \Longrightarrow (l \mapsto w') * p_2} L_2$$

One of a important tacitc in their work is  $sep\_cancel$ . It has a library, including some rules (e.g.  $L_1$  and  $L_2$  shown above), and uses the rules in library repeatedly to solve " $p \Longrightarrow q$ ". The rules  $L_1$  and  $L_2$  let we can use  $sep\_cancel$  to eliminate the subterms, which describe the same resources, in p and q.

In our work, we add some additional rules  $L_3 \sim L_9$  (defined in Fig. 12) to the library of sep\_cancel. The soundness of rules  $L_3 \sim L_9$  can be achieved from the properties of assertions introduced in Sec. 3.1 directly. Aftering extending the original sep\_cancel tactic, we can find that the correctness proof of the following implication can be accomplished by using the extended sep\_cancel directly.

$$(\triangleright_3 Y \mapsto w_1 * \triangleright_0 wim \mapsto w_2 * r_5 \mapsto w_3) \downarrow \Longrightarrow \triangleright_2 Y \mapsto w_1 * wim \mapsto w_2 * r_5 \mapsto w_3$$

Using the extended sep\_cancel can greatly simplify our verification work, and improve the efficiency of the code proving.

$$\frac{w=w' \quad p_1 \Longrightarrow p_2}{(\mathtt{rn} \mapsto w) * p_1 \Longrightarrow (\mathtt{rn} \mapsto w') * p_2} \ L_3 \qquad \frac{w=w' \quad p_1 \Longrightarrow p_2}{(\rhd_t \mathtt{st} \mapsto w) * p_1 \Longrightarrow (\rhd_t \mathtt{st} \mapsto w) * p_2} \ L_4$$

$$\frac{w=w' \quad p_1 \Longrightarrow p_2}{(\rhd_0 \mathtt{sr} \mapsto w) \downarrow * p_2 \Longrightarrow (\mathtt{sr} \mapsto w) * p_2} \ L_5$$

$$\frac{w=w' \quad p_1 \Longrightarrow p_2}{(\rhd_{t+1} \mathtt{sr} \mapsto w) \downarrow * p_1 \Longrightarrow (\rhd_t \mathtt{sr} \mapsto w') \downarrow * p_2} \ L_6$$

$$\frac{w=w' \quad p_1 \Longrightarrow p_2}{(\mathtt{rn} \mapsto w) \downarrow * p_1 \Longrightarrow (\rhd_t \mathtt{sr} \mapsto w') \downarrow * p_2} \ L_6$$

$$\frac{w=w' \quad p_1 \Longrightarrow p_2}{(\mathtt{rn} \mapsto w) \downarrow * p_1 \Longrightarrow \mathtt{rn} \mapsto w' * p_2} \ L_7 \qquad \frac{w=w' \quad p_1 \Longrightarrow p_2}{(l\mapsto w) \downarrow * p \Longrightarrow l\mapsto w' * (l\mapsto w') * p'} \ L_8$$

$$\frac{p_1 \downarrow \Longrightarrow p'_1 \quad p_2 \downarrow \Longrightarrow p'_2}{(p_1 * p_2) \downarrow \Longrightarrow p'_1 * p'_2} \ L_9$$

Fig. 12. Extended rules for Tactic sep\_cancel

Fig. 13. Example of Context Switch Routine Proof in Coq

**Proof Efforts in Coq.** We omit the code that manages interrupt and float registers in the original system, which are not supported in our logic. The segment we verify has around 250 lines of assembly code, and we verify it by 6690 lines of Coq proof scripts.

We give a simple introduction to show how we verify this context switch routine in Coq using our program logic. Fig. 13 (a) is the final theorem of the correctness of context routine in Coq. The "CtxSwitch\_Correct" is the name of this theorem. And "wf\_cdhp spec C" is the Coq implementation of well-formed code heap " $\vdash C : \Psi$ ". Here, the code heap "C" is the code heap storing the context switch code, and "spec" is the specifications of each code block. The correctness of this theorem can be proved by the definition of well-formed code heap, which is defined as "wf\_cdhp" in Coq. According to the definition of " $\vdash C : \Psi$ " in Fig. 10, we need to proof the correctness of each code block given specification in code heap. We use SwitchEntry as an example of proving the correctness of code block (instruction sequence) in Coq. Fig. 13 (b) shows the lemma describing the correctness of SwitchEntry in Coq. This lemma can

be viewed as the a Coq implementation of the following properties.

for all 
$$\iota$$
.  $\Psi \vdash \{(\mathsf{a}_{pre}\ \iota, \mathsf{a}_{post}\ \iota)\}\ \mathsf{f}_0: \mathsf{SwitchEntry}$ 

As shown in Fig. 11, the SwitchEntry is the entry of the context switch routine. So, it's specification is the  $a_{pre}$  and  $a_{post}$  defined before, according to the meaning of the specification of a code block. The name of this lemma is SwitchEntryProof in Coq. The "switch\_entry\_pre" and "switch\_entry\_post" are Coq implementations of pre- and postcondition  $a_{pre}$  and  $a_{post}$ . The "vl" and "f0" are logical variables and the starting label of code block SwitchEntry in Coq respectively. And the code block SwitchEntry is represented as "switch\_entry\_code" in Coq, which is an instruction sequence extracted from code heap "C".

Readers may find that the **CALL** rule in our logic, defined in Fig. 10, doesn't support the calling of the context switch routine, because the switching of return pointer isn't permitted. We don't consider to solve this problem by modifying our **CALL** rule, but address it in another way, proving the context refinement between context switch routine and its abstract assembly primitive switch. The **CALL** rule is just used for verifying the internal functions.

## 5 Refinement Verification for SPARCv8

Inspired by the *relational* program logic for refinement verification, *e.g.*[18, 29], we extend our program logic for SPARCv8 to support refinement verification in this section. We define the high- and low-level program in Sec. 5.1 and Sec. 5.2, and their state relation in Sec. 5.3. The refinement relation is presented in Sec. 5.4, and program logic is shown in Sec. 5.5 and Sec. 5.6. Finally, we show the logic soundness in Sec. 5.7.

### 5.1 High-level Pseudo-SPARCv8 Language

We choose the SPARCv8 program defined in Sec. 2 as the *low-level* program. And let's think about what the *high-level* program should look like. *First*, the high-level program should be multithreaded, because we hope to verify the switch primitive, whose execution will set a ready thread as the current one. *Second*, considering programmers in this level may define various abstract assembly primitives, we should just give the type of the abstract assembly primitive, which is like an interface and its implementation is remained for users to complete. *Third*, delay buffer D is not needed in high-level program state. The special registers are usually used to record the state of the processors. So, modifying them should be carefully and hidden in the implementations of the abstract assembly primitives. *Fourth*, the programmers in the high-level do not need to understand the full complexities of *register window* mechanism. They just need to know that we can use instruction save to save the contents of the current window, and use restore to restore the contexts of the previous procedure. According to

```
 \begin{array}{lll} (\operatorname{HCode}) & \varPi & ::= (C, \varOmega) & (\operatorname{CodeHeap}) & C & \in \operatorname{Word} \rightharpoonup \operatorname{Comm} \\ (\operatorname{PrimSet}) & \varOmega & ::= \{ \mathbf{f}_1 \leadsto \varUpsilon_1, \ldots, \mathbf{f}_n \leadsto \varUpsilon_n \} \\ & (\operatorname{Prim}) & \varUpsilon & \in \operatorname{List} \operatorname{Val} \to \operatorname{HPstate} \to \operatorname{HPstate} \to \operatorname{Prop} \\ & (\operatorname{Comm}) & c & ::= \mathbf{i} \mid \operatorname{call} \mathbf{f} \mid \operatorname{jmp} \mathbf{a} \mid \operatorname{retl} \mid \operatorname{be} \mathbf{f} \\ & (\operatorname{SimpIns}) & \mathbf{i} & ::= \operatorname{Psave} w \mid \operatorname{Prestore} \mid \operatorname{print} \mid \operatorname{ld} \mathbf{a} \ \mathbf{r}_d \mid \operatorname{st} \ \mathbf{r}_s \ \mathbf{a} \mid \operatorname{add} \ \mathbf{r}_s \ \mathbf{o} \ \mathbf{r}_d \\ & \mid \operatorname{rd} \ \operatorname{sr} \ \mathbf{r}_d \mid \operatorname{wr} \ \mathbf{r}_s \ \mathbf{o} \ \mathbf{r}_d \mid \ldots \\ & (\operatorname{HMsg}) & \alpha & ::= \tau \mid \operatorname{out}(v) \mid \operatorname{call}(\mathbf{f}, \overline{v}) \end{array}
```

Fig. 14. Syntax of Concurrent Pseudo-SPARCv8 Code

Fig. 15. Machine States for Pseudo-SPARCv8 Code

these considerations, we define the Pseudo-SPARCv8 program as our high-level program in this subsection.

We define the syntax of the high-level concurrent SPARCv8 language in Fig. 14. The code  $\Pi$  in high-level has two parts: the code heap C, whose definition is similar with the one defined in Fig. 2; and a set of abstract primitives  $\Omega$ , which is a partial mapping from labels to abstract assembly primitive. The code heap C in  $\Pi$  can be viewed as the client code to call abstract assembly primitive. The high-level abstract assembly primitive  $\Upsilon$  is defined as a relation that takes a list of value as arguments and maps a high-level program state (defined in Fig. 15) to another. Comparing the simple instruction with the one shown in Fig. 2, we add three pesudo instructions. The "Psave w" is used to allocate a new stack frame of size w words for procedure. The Prestore is responsible for freeing the current stack frame and restore the contexts of the previous procedure. We also introduce a pseudo instruction print, whose execution will occur an output out(v), to replace the system call of function print. The high-level message v0 can be either an empty message v1, or an output out(v2), or a call(v3, v4) meaning to call primitive labelled v6 with arguments v7.

The machine states of high-level Pseudo-SPARCv8 program is defined in Fig. 15. The high-level program  $\mathbb{P}$  is a a pair of high-level code H and high-level state  $\mathbb{S}$ . High-level program state is a tuple including: a thread pool T, current thread id t, the thread local state K of the current thread, and the memory M.

Thread Local State. The thread local state  $\mathcal{K}$  is a triple of high-level register state  $\mathbb{Q}$ , and program conunters pc and npc. The high-level register state  $\mathbb{Q}$  consists: the high-level register file  $\mathbb{R}$ , the block of the current stack frame b, and the high-level frame list  $\mathbb{F}$ . We can find that the high-level register names  $\hat{rn}$  do not include the cwp and special registers sr, because we require that only the implementation of the abstract assembly primitives will operate them. So, we also do not give the semantics for instructions, like vr, rd, save, and

$$\begin{split} \frac{H = (C,\Omega) \quad C \Vdash (\mathcal{K},M) \circ \overset{-}{\longrightarrow} (\mathcal{K}',M')}{(H,(T,t,\mathcal{K},M)) : \overset{-}{\longrightarrow} (H,(T,t,\mathcal{K}',M'))} & \frac{H = (C,\Omega) \quad C \Vdash (\mathcal{K},M) \circ \overset{\text{out}(w)}{\longrightarrow} (\mathcal{K}',M)}{(H,(T,t,\mathcal{K},M)) : \overset{-}{\Longrightarrow} (H,(T,t,\mathcal{K}',M))} \\ & \frac{H = (C,\Omega) \quad C \Vdash (\mathcal{K},M) \circ \overset{\text{call}(t,\overline{v})}{\longrightarrow} (\mathcal{K}',M) \quad \Omega(\mathfrak{f}) = \Upsilon}{\Upsilon(\overline{v})(T,t,\mathcal{K}',M)(T',t',\mathcal{K}'',M')} \\ & \frac{H = (C,\Omega) \quad C \Vdash (\mathcal{K},M) \circ \overset{\text{call}(t,\overline{v})}{\longrightarrow} (\mathcal{K}',M) \quad \Omega(\mathfrak{f}) = \Upsilon}{(H,(T,t,\mathcal{K}',M)) : \overset{-}{\longrightarrow} (H,(T',t',\mathcal{K}'',M'))} \\ & (a) \quad \text{High-level Program Transition} \\ & \frac{C(\mathfrak{pc}) = \mathbf{i} \quad \text{execi}(\mathbf{i},(\mathbb{Q},M)) =_{\mathbb{H}} (\mathbb{Q}',M')}{C \Vdash ((\mathbb{Q},\mathfrak{pc},\mathfrak{npc}),M) \circ \overset{-}{\longrightarrow} ((\mathbb{Q}',\mathfrak{npc},\mathfrak{npc}+4),M')} \\ & \frac{C(\mathfrak{pc}) = \operatorname{call} \quad \mathbf{f} \quad \mathbf{r}_{15} \in \operatorname{dom}(\mathbb{R})}{C \Vdash (((\mathbb{R},b,\mathbb{F}),\mathfrak{pc},\mathfrak{npc}),M) \circ \overset{-}{\longrightarrow} (((\mathbb{R}\{\mathfrak{r}_{15} \leadsto \mathfrak{pc}\},b,\mathbb{F}),\mathfrak{npc},\mathfrak{f}),M)} \\ & \frac{C(\mathfrak{pc}) = \operatorname{retl} \quad \mathbb{R}(\mathfrak{r}_{15}) = \mathbf{f}}{C \Vdash (((\mathbb{R},b,\mathbb{F}),\mathfrak{pc},\mathfrak{npc}),M) \circ \overset{-}{\longrightarrow} (((\mathbb{R},b,\mathbb{F}),\mathfrak{npc},\mathfrak{gc},\mathfrak{f}+8),M)} \\ & \frac{C(\mathfrak{pc}) = \operatorname{print} \quad \mathbb{R}(\%\mathfrak{o}_0) = v}{C \Vdash (((\mathbb{R},b,\mathbb{F}),\mathfrak{pc},\mathfrak{npc}),M) \circ \overset{-}{\longrightarrow} (((\mathbb{R},b,\mathbb{F}),\mathfrak{npc},\mathfrak{npc},\mathfrak{gc},\mathfrak{gc}),M)} \\ & \frac{\mathfrak{pc} \notin \operatorname{dom}(C) \quad \mathfrak{npc} = \mathfrak{pc} + 4 \quad \operatorname{args}(\mathbb{Q},M,\overline{v})}{C \Vdash ((\mathbb{Q},\mathfrak{pc},\mathfrak{npc}),M) \circ \overset{-}{\longrightarrow} ((\mathbb{Q},\mathfrak{pc},\mathfrak{npc}),M)} \\ & (b) \quad \operatorname{High-level Control Transfer Instruction Transition} \\ & \mathbb{Q} = (\mathbb{R},b_0,\mathbb{F}) \quad \mathbb{R}' = \mathbb{R}\{\operatorname{out} \leadsto_{-},\operatorname{local} \leadsto_{-},\operatorname{in} \leadsto_{-} \mathbb{R}([\operatorname{lout}])\} \{\%\mathfrak{p} \leadsto_{-} (b,0)\} \\ & = \operatorname{alloc}(M,b,64,w) = M' \quad \mathbb{Q}' = (\mathbb{R}',b,(b_0,\mathbb{R}([\operatorname{local}]),\mathbb{R}([\operatorname{lin}])) ::\mathbb{F})} \\ & = \operatorname{execi}(\operatorname{Psave} w, (\mathbb{Q},M)) =_{\mathbb{H}} (\mathbb{Q}',M') \\ & = \operatorname{execi}(\operatorname{Prestore}, (\mathbb{Q},M)) =_{\mathbb{H}} (\mathbb{Q}',M) \\ & = \operatorname{execi}(\operatorname{Prestore}, (\mathbb{Q},M)) =_{\mathbb{H}} (\mathbb$$

Fig. 16. Seletcted operational semantics rules for high-level program

restore, whose execution will operate these registers, in high-level program. The high-level frame list  $\mathbb{F}$  is also different with the F. It's a list of the triple  $(b, \mathrm{fm}_1, \mathrm{fm}_2)$ . Here, b is the block id of the stack frame of the previous procedure, and  $\mathrm{fm}_1$  and  $\mathrm{fm}_2$  are contexts (local and in registers) of this procedure saved. After introducing the state of high-level program, We can define switch primitive switch as an instantiation of  $\Upsilon$  following:

$$\begin{split} \text{switch} &::= \lambda \, \overline{v}, \mathbb{S}, \mathbb{S}'. \, \exists \, t'. \, M(\mathsf{TaskNew}) = (t',0) \, \wedge \, T(t') = (\mathbb{Q}',\mathsf{pc}',\mathsf{npc}') \\ & \wedge \, T' = T\{t \leadsto (\mathbb{Q},\mathsf{pc},\mathsf{npc})\} \, \wedge \, t \neq t' \, \wedge \, \overline{v} = \mathsf{nil} \\ \text{where} \, \, \mathbb{S} &= (T,t,(\mathbb{Q},\mathsf{pc},\mathsf{npc}),M), \, \mathbb{S}' = (T',t',(\mathbb{Q}',\mathsf{f}+8,\mathsf{f}+12),M), \mathsf{f} = \mathbb{Q}'.\mathbb{R}(\mathsf{r}_{15}). \end{split}$$

The execution of switch primitive takes no arguments ( $\overline{v} = \text{nil}$ ), and change the id of current thread according to the pointer saved in location TaskNew.

Operational Semantics in High-level. The operational semantics for high-level Pseudo-SPARCv8 program is defined in Fig. 16. The high-level program transition relation  $(\Pi, \mathbb{S}) : \stackrel{\alpha}{\Longrightarrow} (\Pi, \mathbb{S}')$  is defined in Fig. 16 (a). In each step, the program may either execute the instruction pointed by pc, and occur empty message  $\tau$  or an output  $\operatorname{out}(v)$ , or call an abtract assembly primitive in primitive set. When calling an abstract assembly primitive, the execution of current thread (defined as  $\_ \Vdash \_ \circ \stackrel{-}{\longrightarrow} \_$  in Fig. 16 (b)) will occur a message  $\operatorname{call}(f, \overline{v})$ , which means that it hopes to call the abstract assembly primitive  $\Upsilon$  labelled f, which is not in the domain of code heap C, with arguments  $\overline{v}$  (we use " $\operatorname{args}(\mathbb{Q}, M, \overline{v})$ " to get arguments  $\overline{v}$  from high-level state, and its definition is omitted here).

The control transfer step is defined in Fig. 16 (b). Here, the step for simple instruction i is represented as " $execi(i, _) =_{H} _$ ". We show the state transition relation for pseudo instructions Psave w and Prestore in Fig. 16 (c). Supposing the current register state  $\mathbb{Q}$  is  $(\mathbb{R}, b_0, \mathbb{F})$ , executing instruction Psave w will save the local and in registers and the block id  $b_0$  of the current stack frame into highlevel frame list  $\mathbb{F}$ . It also allocates a new block b as a new stack frame in memory (represented as **alloc**(M, b, 64, w) = M'). The size of the block b is from 64 byte to w byte. The reason why it starts from 64 byte is that the 0 to 64 bytes (16 words) in a stack frame are usually reserved to save the contents of the register window in convention [4] when needed, because the number of register windows is limited. However, we don't need to save the register windows into memory in high-level, because they have already been saved in high-level stack frame  $\mathbb{F}$ , whose upper bound is *unlimited*. The instruction Prestore does the reverse, freeing the block of current stack frame (represented as free(b, M) = M'), and restoring the contexts of the previous procedure saved in F. More details of the high-level Pseudo SPARCv8 program can be seen in Appendix A.

#### 5.2 Low-level SPARCv8 Program

The low-level SPARCv8 program are very closed to the SPARCv8 program defined in Fig. 2. The only difference here is that we use simple instructions and commands defined in Fig. 14. So, the global program transition of the low-level SPARCv8 program is defined as the following form:

$$\begin{split} &(R,D) \rightrightarrows (R',D') \\ & \xrightarrow{C \vdash ((M,(R',F),D'),\operatorname{pc},\operatorname{npc})} \circ \xrightarrow{\tau/\operatorname{out}(v)} (M',(R'',F'),D'') \\ & \xrightarrow{(C,(M,(R,F),D),\operatorname{pc},\operatorname{npc})} :: \xrightarrow{\tau/\operatorname{out}(v)} (C,(M,(R'',F'),D'')) \end{split}$$

Each step of the program produces either an empty message  $\tau$ , or an output out(v), which is produced by the instruction print and acts as an observable behavior. More details of the low-level program can be found in Appendix B.

## 5.3 State Relation between Low- and High-level Program

In order to achieve refinement verification, we need to establish the states relation as an *invariant* between low- and high-level program. We illustrate this relation, defined as " $S \sim \mathbb{S}$ " formally in Fig. 18, with Fig. 17.



Fig. 17. State Relation between Low- and High-level Program State

$$\frac{M = M_c \uplus M_T \uplus \{\mathsf{TaskCur} \leadsto (t,0)\} \uplus M'}{(M_c,Q) \Downarrow_{\mathsf{c}} (t,\mathcal{K}) \qquad M_T \Downarrow_{\mathsf{r}} T \backslash \{t\} \qquad D = \mathrm{nil}}{(M,Q,D) \sim (T,t,\mathcal{K},M')}$$

Fig. 18. Relation for Whole Program State

The low-level memory M is splitted into four parts :  $M_c$  used to save the context of the current thread t;  $M_T$  saving the contexts of the ready threads; a singleton memory cell located TaskCur saving the current thread id; and shared memory M'. Note, ready threads are threads in thread pool, except the current thread t. The delayed buffer D is nil, because the execution of high-level program will never operate the special registers. The memory  $M_T$  used to save the contexts of the ready threads is abstracted as a thread pool in high-level program. Their relation is represented as " $M_T \Downarrow_{\mathsf{T}} T \backslash \{t\}$ ". We use " $(M_c, Q) \Downarrow_{\mathsf{C}} (t, \mathcal{K})$ " to represent the state relation of current thread t in low- and high-level program. Full definitions of the state relation can be found in Appendix C.

## 5.4 Correctness of Abstract Assembly Primitive

The correctness of abstract assembly primitive can be defined in terms of *contextual refinement*. Below we will give the formal definition in Def. 5.1. And we use *event trace refinement* proposed by Liang *et al.* [19] here.

**Definition 5.1 (Primitive Correctness).** 
$$C_{as} \sqsubseteq \Omega$$
 iff for any  $C$ ,  $S$ ,  $\mathbb{S}$ , pc and npc, if  $S \sim \mathbb{S}$ , and  $ProgSafe(\mathbb{P})$ , then  $P \sqsubseteq \mathbb{P}$  holds. where  $P = (C \uplus C_{as}, S, pc, npc)$ ,  $\mathbb{P} = ((C, \Omega), \mathbb{S})$ , and  $\mathbb{S}.\mathcal{K} = (\_, pc, npc)$ .

Here, we use  $C_{\rm as}$ , which is a code heap, to represent the implementations of abstract assembly primitives and  $\Omega$  to represent the set of corresponding abstract assembly primitives. The contextual refinement between  $C_{\rm as}$  and  $\Omega$ , denoted as  $C_{\rm as} \sqsubseteq \Omega$ , says that if and only if for any client code C, low-level program state S, high-level program state S, program counters pc and npc, if the low- and high-level program states satisfy the state relation  $S \sim S$  and the high-level program will never get stuck (shown as ProgSafe(P)), then there is an event trace refinement relation between low- and high-level program. The property ProgSafe(P), is defined below formally:

$$\mathsf{ProgSafe}(\mathbb{P}) ::= \forall \, \mathbb{P}'. \, (\mathbb{P} : \Longrightarrow^* \mathbb{P}') \Longrightarrow (\exists \, \mathbb{P}'. \, \mathbb{P}' : \Longrightarrow \mathbb{P}'')$$

$$(\text{EvtTrace}) \quad \mathcal{B} ::= \mathsf{out}(v) :: \mathcal{B} \mid \mathsf{nil} \mid \mathbf{abort} \quad (\text{co-induction})$$

$$\frac{\cancel{\underline{\beta}} \, \mathsf{P}' \cdot \mathsf{P} :: \Longrightarrow^{+} \, \mathsf{P}'}{Etr(\mathsf{P}, \mathbf{abort})} \qquad \frac{\mathsf{P} :: \stackrel{\tau}{\Longrightarrow}^{+} \, \mathsf{P}' \quad Etr(\mathsf{P}', \mathsf{nil})}{Etr(\mathsf{P}, \mathsf{nil})} \qquad \frac{\mathsf{P} :: \stackrel{\mathsf{out}(v)}{\Longrightarrow}^{+} \, \mathsf{P}' \quad Etr(\mathsf{P}', \mathcal{B})}{Etr(\mathsf{P}, \mathsf{out}(v) :: \mathcal{B})}$$

$$\frac{\cancel{\underline{\beta}} \, \mathbb{P}' \cdot \mathbb{P} : \Longrightarrow^{+} \, \mathbb{P}'}{Etr(\mathbb{P}, \mathbf{abort})} \qquad \frac{\mathbb{P} :: \stackrel{\tau}{\Longrightarrow}^{+} \, \mathbb{P}' \quad Etr(\mathbb{P}', \mathsf{nil})}{Etr(\mathbb{P}, \mathsf{nil})} \qquad \frac{\mathbb{P} :: \stackrel{\mathsf{out}(v)}{\Longrightarrow}^{+} \, \mathbb{P}' \quad Etr(\mathbb{P}', \mathcal{B})}{Etr(\mathbb{P}, \mathsf{out}(v) :: \mathcal{B})}$$

$$\mathsf{P} \sqsubseteq \mathbb{P} ::= \forall \mathcal{B} . Etr(\mathsf{P}, \mathcal{B}) \Longrightarrow Etr(\mathbb{P}, \mathcal{B})$$

Fig. 19. Event Trace Refinement

**Event Trace Refinement.** We define the event trace refinement co-inductively in Fig. 19. "nil" means an empty trace. A trace is a sequence of output  $\operatorname{out}(v)$ , and may end with a abort marker **abort**. P  $\sqsubseteq \mathbb{P}$  means that all the event trace generated by the low-level program P can also generated by the high-level program  $\mathbb{P}$ . We introduce our extended program logic to ensure the event trace refinement between low- and high-level program in the following subsections.

### 5.5 Relational Assertion Language

$$\begin{array}{ll} (RelAsrt) & \mathsf{p}, \mathsf{q} \; ::= \; \hat{\mathsf{rn}} \rightarrowtail v \mid l \rightarrowtail v \mid \mathsf{Emp} \mid t \leadsto_\mathsf{c} \mathcal{K} \mid t \leadsto_\mathsf{r} \mathcal{K} \mid p \mid (\!\!\mid A\!\!\mid) \mid \blacklozenge(w) \\ & \mid \; \mathsf{p} \downarrow \mid \; \mathsf{p} \land \mathsf{q} \mid \; \mathsf{p} \lor \mathsf{q} \mid \; \mathsf{p} \ast \mathsf{q} \mid \; \ldots \end{array}$$

Fig. 20. Syntax of Relational Assertion

Fig. 20 gives the relational assertion language, and its semantics is given in Fig. 21. The relational assertions are interpreted over relational states  $(S, \mathbb{S}, A, w)$ , which contains the low-level state S, the high-level state  $\mathbb{S}$ , the abstract assembly primitive command A defined in Fig. 21, and a word w recording the number of the tokens. The high-level primitive command A is either an abstract assembly primitive  $\Upsilon$  parametered with its arguments  $\overline{v}$ , or a  $\bot$  meaning that the primitive has already been executed. Compared with the assertion defined before in Fig. 7, relational assertion p reserves original assertion p describing the low-level state S, and adds some assertions describing the high-level state.

 $\hat{\text{rn}} \rightarrow v$  says that value in register  $\hat{\text{rn}}$  in high-level register file is  $v.\ l \rightarrow v$  specifies a singleton memory cell with value v in the address of the location l in high-level memory. The assertion Emp says that the high-level memory and thread pool are both empty, and the low-level state satisfies emp defined in Fig. 8. The assertion  $t \rightsquigarrow_{\mathsf{c}} \mathcal{K}$  and  $t \rightsquigarrow_{\mathsf{r}} \mathcal{K}$  represent the thread local state of current thread and ready thread respectively. Note the threads in thread pool are viewed resources and can be separated by separation conjunction.

$$(S,\mathbb{S},A,w)\models p \qquad ::= S\models p \land \mathbb{S}.M = \emptyset \land \mathbb{S}.T = \emptyset$$
 
$$(S,\mathbb{S},A,w)\models r\mathbf{\hat{n}} \rightarrow v ::= \exists t,\mathcal{K}. (S,\mathbb{S},A,w)\models t \rightsquigarrow_{\mathsf{c}} \mathcal{K} \land \mathcal{K}.\mathbb{Q}.\mathbb{R}(r\mathbf{\hat{n}}) = v$$
 
$$(S,\mathbb{S},A,w)\models l \rightarrow v \qquad ::= \mathbb{S}.M = \{l \rightsquigarrow_{\mathsf{c}} v\} \land \mathbb{S}.T = \emptyset \land S\models \mathsf{emp}$$
 
$$(S,\mathbb{S},A,w)\models \mathsf{Emp} \qquad ::= \mathbb{S}.M = \emptyset \land \mathbb{S}.T = \emptyset \land S\models \mathsf{emp}$$
 
$$(S,\mathbb{S},A,w)\models t \rightsquigarrow_{\mathsf{c}} \mathcal{K} \qquad ::= \mathbb{S}.t = t \land \mathbb{S}.\mathcal{K} = \mathcal{K} \land (S,\mathbb{S},A,w)\models \mathsf{Emp}$$
 
$$(S,\mathbb{S},A,w)\models t \rightsquigarrow_{\mathsf{r}} \mathcal{K} \qquad ::= \mathbb{S}.T = \{t \rightsquigarrow_{\mathsf{c}} \mathcal{K}\} \land \mathbb{S}.M = \emptyset \land S\models \mathsf{emp}$$
 
$$(S,\mathbb{S},A,w)\models (A') \qquad ::= A = A' \land (S,\mathbb{S},A,a)\models \mathsf{Emp}$$
 
$$(S,\mathbb{S},A,w)\models \Phi(w') \qquad ::= w' \leq w \land (S,\mathbb{S},A,w)\models \mathsf{Emp}$$
 
$$(S,\mathbb{S},A,w)\models \mathbb{p} \downarrow \qquad ::= \exists S'. ((S',\mathbb{S},A,w)\models \mathbb{p}) \land (R',D') \rightrightarrows (R,D) \qquad \qquad \text{where } S = (M,(R,F),D), S' = (M,(R',F),D')$$
 
$$(S,\mathbb{S},A,w)\models \mathbb{p} * \mathbb{q} \qquad ::= \exists S_1,S_2,\mathbb{S}_1,\mathbb{S}_2,w_1,w_2.S = S_1 \uplus S_2 \land \mathbb{S} = \mathbb{S}_1 \uplus \mathbb{S}_2 \land \qquad w = w_1 + w_2 \land (S_1,\mathbb{S}_1,A,w_1)\models \mathbb{p} \land (S_2,\mathbb{S}_2,A,w_2)\models \mathbb{q}$$
 
$$\mathbb{S}_1 \uplus \mathbb{S}_2 \ ::= \begin{cases} (T_1 \cup T_2,t,\mathcal{K},M_1 \cup M_2) & \text{if } T_1 \perp T_2 \land M_1 \perp M_2 \land \mathcal{K} = \mathcal{K}_1 \uplus \mathcal{K}_2 \\ S_1 = (T_1,t,\mathcal{K}_1,M_1) \land S_2 = (T_2,t,\mathcal{K}_2,M_2) \\ \text{undefined} & \text{otherwise} \end{cases}$$
 
$$(\mathsf{HPrimCom}) \ A \ ::= \Upsilon(\overline{v}) \mid \perp \qquad \qquad \Upsilon(\overline{v})(\mathbb{S})(\mathbb{S}')$$

Fig. 21. Semantics of Relation Assertion

The assertion (A) means the current high-level primitive command is A. And the assertion (w) takes a word w to record the number of the tokens. In the introduce of the inference rules following, we use tokens to avoid infinite loops and recursive calls to make sure the termination preserving refinement.

### 5.6 Inference Rules for Refinement Verification

The code specification  $\hat{\theta}$  and code heap specification  $\Psi$  for refinement verification are defined below :

$$\begin{array}{lll} \mbox{(valList)} & \iota \in \mbox{ list value} & \mbox{(pAsrt)} & \mbox{\it fp}, \mbox{\it fq} \in \mbox{valList} \rightarrow RelAsrt \\ \mbox{(CdSpec)} & \hat{\theta} ::= (\mbox{\it fp}, \mbox{\it fq}) & \mbox{(CdHpSpec)} & \mbox{\it $\Psi$} & ::= \{\mbox{\it f} \leadsto \hat{\theta}\}^* \\ \end{array}$$

Here,  $\mathfrak{fp}$  and  $\mathfrak{fq}$  are relational assertions paramterized over a list of values  $\iota$ . We give a simple example in Fig. 22 to show a specification for a function, which is the same function shown in Fig. 9. We require that, in precondition  $\mathfrak{fp}$  and postcondition  $\mathfrak{fq}$ , the values of the general registers  $\mathfrak{hi_0}$ ,  $\mathfrak{hi_1}$ ,  $\mathfrak{hi_2}$ ,  $\mathfrak{hl_7}$  and  $\mathfrak{r}_{15}$  are equal with the values of the same registers in high-level register file, which is restricted by state relation defined previously between low- and high-level program. The corresponding abstract assembly primitive ADD is described

```
 \begin{array}{lll} - \left\{ ( \mathrm{fp}, \mathrm{fq}) \right\} & \mathrm{fp} ::= \lambda \, lv. \, ( \% \mathbf{i}_0 = \% \mathbf{i}_0') * ( \% \mathbf{i}_1 = \% \mathbf{i}_1') * ( \% \mathbf{i}_2 = \% \mathbf{i}_2') \\ \mathrm{add} & \% \mathbf{i}_0, \, \% \mathbf{i}_1, \, \% \mathbf{1}_7 & * ( \% \mathbf{1}_7 = \% \mathbf{1}_7') * ( \mathbf{r}_{15} = \mathbf{r}_{15}') * \left( \mathrm{ADD}(\mathrm{nil}) \right) \\ \mathrm{add} & \% \mathbf{1}_7, \, \% \mathbf{i}_2, \, \% \mathbf{1}_7 & \mathrm{fq} ::= \lambda \, lv. \, ( \% \mathbf{i}_0 = \% \mathbf{i}_0') * ( \% \mathbf{i}_1 = \% \mathbf{i}_1') * ( \% \mathbf{i}_2 = \% \mathbf{i}_2') \\ \mathrm{nop} & * ( \% \mathbf{1}_7 = \% \mathbf{1}_7') * ( \mathbf{r}_{15} = \mathbf{r}_{15}') * \left( \bot \right) \\ \mathrm{ADD} ::= \lambda \, \overline{v}, \, \mathbb{S}, \, \mathbb{S}'. \, \mathbb{S}' = (T, t, ((\mathbb{R}', b, \mathbb{F}), \mathbb{R}(\mathbf{r}_{15}) + 8, \mathbb{R}(\mathbf{r}_{15}) + 12), M) \\ \mathrm{where} & \, \mathbb{S} = (T, t, ((\mathbb{R}, b, \mathbb{F}), \mathbf{pc}, \mathbf{npc}), M), \, \mathbb{R}' = \mathbb{R} \{ \% \mathbf{1}_7 \, \rightsquigarrow \, (\mathbb{R}(\% \mathbf{i}_0) + \mathbb{R}(\% \mathbf{i}_1) + \mathbb{R}(\% \mathbf{i}_2)) \} \\ \mathrm{rn} = \, \hat{\mathbf{rn}}' ::= \exists \, v. \, \mathrm{rn} \mapsto v * \hat{\mathbf{rn}} \mapsto v \end{array}
```

Fig. 22. Example for Function Specification for Refinement Verification

in the precondition fp. In the postcondition fq, we require that the abstract assembly primitive has been done (shown as  $(\bot)$ ).

Fig. 23 shows selected inference rules for refinement verification in our logic. The top rule **WfPrim** verifies the contextual refinement between the code heap  $C_{\rm as}$  and the corresponding abstract assembly primitive set  $\Omega$ . It requires that there exisits a code heap specification  $\Psi_i$ , such that the internal functions in code heap  $C_{\mathrm{as}}$  are well-formed ( $\vdash C_{\mathrm{as}} : \Psi_{\mathsf{i}}, \mathbf{WfInt}$ ), and each implementation of abstract assembly primitive  $(C_{as}[f])$  is correct with respect to its corresponding primitive  $\Upsilon$ . The wdSpec( $\mathfrak{fp},\mathfrak{fq},\Upsilon$ ) is used to restrict the form of the function specification, and we will discuss it in more details following. Most of the inference rules for verifying the instruction sequence  $(\Psi_i \vdash \{(p,q)\} f : \mathbb{I})$  are similar with the rules shown in Fig. 10. Here, we require that verifying the instruction jmp and call will consume a token, shown as  $\phi(1)$ , in order to avoid infinite loop and recursive function call. There is an omitted side condition in presentation that: the state of current thread, such as  $t \sim_{\mathsf{c}} \mathcal{K}$ , and high-level primitive comm and do not described in frame  $p_r$ , because they are not separated by separation conjunction \*. The ABSCSQ rule allows us to execute the high-level primitive command described in precondition. The implication  $p \Rightarrow p'$  is defined below.

```
\forall S, \mathbb{S}, A, w. ((S, \mathbb{S}, A, w) \models \mathbb{p}) \Longrightarrow (\exists \mathbb{S}', A', w'. ((A, \mathbb{S}) \dashrightarrow^* (A', \mathbb{S}')) \land ((S, \mathbb{S}', A', w') \models \mathbb{p}')) \lor (S, \mathbb{S}, A, w) \models \mathbb{p}'
```

The inference rules for instructions are omitted here, because they are no difference with the well-formed instruction rules shown in Fig. 10.

Well-defined Specification. We define  $\mathsf{wdSpec}(\mathfrak{fp},\mathfrak{fq},\varUpsilon)$  formally in Def. 5.2. It contains two properities that the specifications need to satisfy, and we explain them in turn in the following.

**Definition 5.2 (Well-defined Specification).**  $\mathsf{wdSpec}(\mathfrak{fp},\mathfrak{fq},\varUpsilon)$  holds, iff

```
1. for any \overline{v}, \mathbb{S}, \mathbb{S}', \mathbb{S}_r. if \Upsilon(\overline{v})(\mathbb{S})(\mathbb{S}'), and \mathbb{S} \perp \mathbb{S}_r, then the following holds:

-\mathbb{S}'.\mathcal{K}.\mathsf{pc} = \mathbf{f} + 8, \mathbb{S}'.\mathcal{K}.\mathsf{npc} = \mathbf{f} + 12 (where \mathbb{S}'.\mathcal{K}.\mathbb{Q}.\mathbb{R}(\mathbf{r}_{15}) = \mathbf{f});

-\text{ there exists } \mathbb{S}'', \mathbb{S}'_r, \Upsilon(\overline{v})(\mathbb{S} \uplus \mathbb{S}_r)(\mathbb{S}''), \mathbb{S}'' = \mathbb{S}' \uplus \mathbb{S}'_r, and \mathbb{S}_r.T = \mathbb{S}'_r.T, \mathbb{S}_r.M = \mathbb{S}'_r.M;
```

 ${\bf Fig.\,23.}\ {\bf Selected}\ {\bf Inference}\ {\bf Rules}\ {\bf for}\ {\bf Refinement}\ {\bf Verification}$ 

- 2. for any  $\iota$ , there exists  $\overline{\nu}$ ,  $\mathfrak{fp} \ \iota \Longrightarrow (\Upsilon(\overline{\nu})) * \text{true}$ , and  $\mathfrak{fq} \ \iota \Longrightarrow (\bot) * \text{true}$ ;
- 3. for any  $\overline{v}$ , there exists  $\iota$ ,  $p_r$  and w, and the following holds:  $\mathsf{INV}(\Upsilon(\overline{v}), w, \overline{v}) \Longrightarrow \mathfrak{fp} \ \iota * \mathfrak{p}_r, \mathfrak{fq} \ \iota * \mathfrak{p}_r \Longrightarrow \mathsf{INV}(\bot, \_, \_), \text{ and } \mathsf{Sta}(\Upsilon(\overline{v}), \mathfrak{p}_r).$

**First**, we should give some restrictions for abstract assembly primitive. The return code pointers should be equal to f + 8 and f + 12, where f is contained in  $r_{15}$  register after the execution of abstract assembly primitive. This restriction ensures that the low-level function and high-level abstract assembly primitive will return to the same code pointers after executions, because the **RETL** rule in our logic also restricts that the execution of low-level program will return to code pointers according to  $r_{15}$  register in final state of function. We also restrict that the execution of abstract assembly primitive should satisfy *locality*, which means that if an abstract assembly primitive can execute safely on a subset of program state, it can also execute safely on the whole program state, and additional

program state keeps unchanged. **Second**, the abtract assembly primitive should be specifid in the precondition, and it's execution should be done in the final state. **Third**, there is an *invariant* between low- and high-level program, holding at the entry of the function, and our logic needs to ensure that such invariant can be reestablished at the exit of function. We define INV to represent this invariant below formally:

```
\begin{split} \mathsf{INV}(\bot, w, \overline{v}) &::= \{ (S, \mathbb{S}, A, w) \mid S \sim \mathbb{S} \land \mathbf{args}(\mathbb{S}.\mathcal{K}.\mathbb{Q}, \mathbb{S}.M, \overline{v}) \} \\ \mathsf{INV}(A, w, \overline{v}) &::= \{ (S, \mathbb{S}, A, w) \mid S \sim \mathbb{S} \land \exists \mathbb{S}'. (A, \mathbb{S}) \dashrightarrow (\bot, \mathbb{S}') \land \mathbf{args}(\mathbb{S}.\mathcal{K}.\mathbb{Q}, \mathbb{S}.M, \overline{v}) \} \end{split}
```

The invariant consists of the state relation between low- and high-level program state (define as  $S \sim \mathbb{S}$  in Fig. 17), and the safe execution of the primitive command. Including the safe execution of the primitive command is important because we can get some knowledges of high-level program state from safe execution of specific primitive command A. For example, if  $\mathsf{INV}(\mathsf{switch}(\mathsf{nil}), w, \mathsf{nil})$  holds, we can know that the location TaskNew must save a pointer pointing to a ready thread. Otherwise, the execution of primitive switch will not be safe.

$$\mathsf{INV}(\mathsf{switch}(\mathsf{nil}), w, \mathsf{nil}) \Longrightarrow \exists \, t, \mathcal{K}. \, \mathsf{TaskNew} \rightarrowtail (t, 0) * (t \leadsto_{\mathsf{r}} \mathcal{K}) * \mathsf{true}$$

We introduce frame  $p_r$  here for local reasoning, and it should be stable under the execution of the abstract assembly primitive, and we define this property  $(\mathsf{Sta}(\Upsilon(\overline{v}), p_r))$  formally below:

```
\forall \, \mathbb{S}_{c}, \, \mathbb{S}'_{r}, w, \, S, \, \mathbb{p}_{r}.
(\Upsilon(\overline{v}), \mathbb{S}_{c}) \xrightarrow{--+} (\bot, \mathbb{S}'_{c}) \, \wedge \, \mathbb{S}_{c} \, \bot \, \mathbb{S}_{r} \, \wedge \, (S, \mathbb{S}_{r}, \Upsilon(\overline{v}), w) \models \mathbb{p}_{r} \Longrightarrow
\exists \, \mathbb{S}'_{r}. \, (\Upsilon(\overline{v}), \mathbb{S}_{c} \, \uplus \, \mathbb{S}_{r}) \xrightarrow{--+} (\bot, \mathbb{S}'_{c} \, \uplus \, \mathbb{S}'_{r}) \, \wedge \, \mathbb{S}'_{c} \, \bot \, \mathbb{S}'_{r} \, \wedge \, (S, \mathbb{S}'_{r}, \bot, w) \models \mathbb{p}_{r}
```

#### 5.7 Logic Ensuring Contextual Refinement

We give the semantics of **WfPrim** rule in Def. 5.3. It says that any high-level abstract assemly primitive can establish a simulation relation with its low-level implementation in code heap  $C_{as}$ . We define this simulation relation in Def. 5.4, which means that if there exists a relational state  $(S, \mathbb{S}, A, w)$  satisfies the precondition  $\mathfrak{p}$ , then we have the simulation  $\mathfrak{q} \models (C_{as}, S, \mathfrak{f}, \mathfrak{f} + 4) \preccurlyeq_i^0 (A, \mathbb{S})$  defined in Def. 5.5.

## Definition 5.3 (Well-defined Primitive Set Semantics).

```
\Psi \models C_{\mathrm{as}} : \Omega ::= \forall \, \mathbf{f} \in \mathrm{dom}(\Omega), \, \iota. \, \exists \, \Upsilon, \, \overline{v}, \, \mathsf{fp}, \mathsf{fq}.
\mathsf{wdSpec}(\mathsf{fp}, \mathsf{fq}, \Upsilon) \, \wedge \, (\mathsf{fp} \, \iota \Longrightarrow ( \Upsilon(\overline{v}) ) * \mathsf{true}) \, \wedge \, (C_{\mathrm{as}}, \mathbf{f}) \preccurlyeq^{\mathsf{fp}} {}^{\iota, \, \mathsf{fq}} {}^{\iota} \, \Upsilon(\overline{v})
\mathsf{where} \, \, \Omega(\mathbf{f}) = \Upsilon, \Psi(\mathbf{f}) = (\mathsf{fp}, \mathsf{fq}), \, \mathsf{and} \, (C_{\mathrm{as}}, \mathbf{f}) \preccurlyeq^{\mathsf{p}, \mathsf{q}} A \, \mathsf{is} \, \mathsf{defined} \, \mathsf{Def.} \, 5.4.
```

## Definition 5.4 (Simulation for Implementation and Primitive).

$$(C_{as}, \mathbf{f}) \preceq^{\mathbf{p}, \mathbf{q}} A ::= \forall S, \mathbb{S}, w. (S, \mathbb{S}, A, w) \models \mathbf{p} \Longrightarrow \exists i \in Index. \ \mathbf{q} \models (C_{as}, S, \mathbf{f}, \mathbf{f} + 4) \preceq^{0}_{i} (A, \mathbb{S})$$
  
where  $\mathbf{q} \models (C_{as}, S, \mathbf{pc}, \mathbf{ppc}) \preceq^{k}_{i} (A, \mathbb{S})$  is defined in Def. 5.5.

**Definition 5.5.** Whenever  $q \models (C_{as}, S, pc, npc) \leq_i^k (A, \mathbb{S})$  holds, we have the following holds:

- 1. if  $C_{as}(pc) = i$ , then:

  - there exists S', pc', npc', such that  $(C_{as}, S, pc, npc) :: \xrightarrow{\tau} (C_{as}, S', pc', npc');$  for any S', pc', npc', if  $(C_{as}, S, pc, npc) :: \xrightarrow{\tau} (C_{as}, S', pc', npc')$ , then one of the following holds:
    - (a)  $\exists j < i$ .  $q \models (C_{as}, S', pc', npc') \preccurlyeq_i^k (A, S);$
    - (b) there exists S',  $j \in Index$ , such that

$$(A, \mathbb{S}) \longrightarrow (\perp, \mathbb{S}')$$
 and  $\mathfrak{q} \models (C_{as}, S', pc', npc') \preccurlyeq_j^k (\perp, \mathbb{S}')$  holds;

- 2. if  $C_{as}(pc) = call f$ , then:

  - there exists S', pc', npc', such that  $(C_{as}, S, pc, npc) :: \stackrel{\tau}{\Longrightarrow}^2 (C_{as}, S', pc', npc');$  for any S', pc', npc', if  $(C_{as}, S, pc, npc) :: \stackrel{\tau}{\Longrightarrow}^2 (C_{as}, S', pc', npc')$ , then one of the following holds:

(a) 
$$\exists j < i. \ q \models (C_{as}, S', pc', npc') \preccurlyeq_j^{k+1} (A, \mathbb{S});$$
  
(b) there exists  $\mathbb{S}', j \in Index$ , such that  $(A, \mathbb{S}) \dashrightarrow (\bot, \mathbb{S}')$  and  $q \models (C_{as}, S', pc', npc') \preccurlyeq_j^{k+1} (\bot, \mathbb{S}')$  holds;

- 3. if  $C_{as}(pc) = ret1$ , then:

  - there exists S', pc', npc', such that  $(C_{as}, S, pc, npc) :: \stackrel{\tau}{\Longrightarrow}^2 (C_{as}, S', pc', npc');$  for any S', pc', npc', if  $(C_{as}, S, pc, npc) :: \stackrel{\tau}{\Longrightarrow}^2 (C_{as}, S', pc', npc')$  then there exists  $j \in Index$ , S' and A', such that the following holds:
    - (a) either j < i,  $\mathbb{S}' = \mathbb{S}$  and A' = A; or  $(A, \mathbb{S}) \dashrightarrow (A', \mathbb{S}')$ ;
    - (b) if k = 0, then there exists w': (where  $S'.Q.R(\mathbf{r}_{15}) = \mathbf{f}$ )  $(S', \mathbb{S}', A', w') \models q, A' = \perp, pc' = f' + 8, and pc' = f' + 12;$

$$q \models (C_{\mathrm{as}}, S', \mathsf{pc}', \mathsf{npc}') \preccurlyeq_{j}^{k-1} (A', \mathbb{S}')$$

The definition of simulation  $q \models (C_{as}, S, pc, npc) \preccurlyeq_i^k (A, \mathbb{S})$  carries an index i, which is used to ensure the termination preserving, and the depth k of function call. The simulation relation can not only make sure the safe execution of lowlevel SPARCv8 function, which is similar with the safe defined in Def. 3.4, but also ensure the safe execution of the corresponding high-level abstract assembly primitive. The following theorem, whose correctness can be derived from Lemmas 5.6 and 5.7, shows the soundness of our logic, which means that our logic can imply the contextual refinement between implementation  $C_{as}$  and the set of abstract assembly primitives  $\Omega$ .

Lemma 5.6 (Logic Ensures Simulation).

$$\Psi \vdash C_{\mathrm{as}} : \Omega \Longrightarrow \Psi \models C_{\mathrm{as}} : \Omega$$

Lemma 5.7 (Simulation Implies Primitive Correctness).

$$\Psi \models C_{\mathrm{as}} : \Omega \Longrightarrow C_{\mathrm{as}} \sqsubseteq \Omega$$

Theorem 5.8 (Logic Soundness).

$$\Psi \vdash C_{\mathsf{as}} : \Omega \Longrightarrow C_{\mathsf{as}} \sqsubseteq \Omega$$

### 6 Related Work and Conclusion

There has been much work on assembly or machine code verification. Most of them do not support function calls or simply treat function calls in the continuation-passing style where return addresses are viewed as first class code pointers [23, 5, 20, 21, 31, 24, 27]. SCAP [11] supports assembly code verification with various stack-based control abstractions, including function call and return. We follow the same idea here. However, SCAP gives a syntactic-based soundness proof by establishing the preservation of the syntactic judgment, which makes it difficult to interact with other modules verified in different logic. Since our goal is to verify inline assembly and link the verified code with the verified C programs, we give a direct-style semantic model of the logic judgments. Also SCAP is based on a simplified subset of assembly instructions, while our work is focused on a realistically modeled subset of SPARCv8 instructions.

In terms of the support of realistic instruction sets, previous work on proofcarrying code (PCC) and typed assembly language (TAL) mostly supports subsets of x86. Myreen's work [22] presents a framework for ARM verification based on a realistic model (but it doesn't support function call and return).

As part of the Foundational Proof-Carrying Code (FPCC) project [5], Tan and Appel present a program logic  $\mathcal{L}_c$  for reasoning about control flow in assembly code [27]. Although  $\mathcal{L}_c$  is implemented on top of SPARC machine language, the underlying logic is a type system instead of a full-blown program logic for functional correctness. It reasons about functions in the continuation-passing style. Also handling SPARC features such as delayed writes or delayed control transfers is not the focus of  $\mathcal{L}_c$ . There has been work on mechanized semantics of the SPARCv8 ISA. Hou et al. [32] model the SPARCv8 ISA in Isabelle/HOL, and test their formal model against LENON3 simulation board [1], which is a synthesisable VHDL model of a 32-bit processor compliant with the SPARC V8 architecture, through more than 100,000 instruction instances. Wang et al. [28] formalize its semantics in Coq. Our operational semantics of SPARCv8 follows Wang et al. [28]. But Wang et al. do not validate their formalization against actual hardware, we remain it as a future work.

Ni et al. [25] verify a context switch module of 19 lines in x86 code to show case the support of embedded code pointers (ECP) in XCAP [24]. The context switch module we verify comes from a practical OS kernel, which is more realistic and consists of more than 250 lines of assembly code, but our logic (CALL rule) does not really support the switch of return pointers, which requires further extension like OCAP [9]. Our focus is to verify the code manages the register windows, and the function calls made internally. We address this problem of calling context switch routine in another way in our work, such that we can use the extended program logic to verify the contextual refinement between context switch routine and switch primitive. The method of verifying context switch primitive in our work is inspired by the approach used by Guo et al. [13] that protects the TCBs through abstraction, but the state relation between low- and high-level program state in our work is more sophisticated than theirs, because of some specific mechanism, like register windows, in SPARCv8. They also do

not develop a general program logic for refinement verification of the assembly. Our extended program logic is based on the relational program logic, which has already been well used successfully in the refinement verification of C language [29, 18, 19].

Yang and Hawblitzel [30] verify Verve, an x86 implementation of an experimental operating system. Verve has two levels, the high-level TAL code and the low-level "Nucleus" that provides primitive access to hardware and memory. The Nucleus code is verified automatically using the Z3 SMT solver, while the goal of our work is to generate machine checkable proofs. Another key difference is the use of different ISAs. Here we give details to verify specific features of SPARCv8 programs.

There have been many techniques and tools proposed for automated program verification (e.g. [7, 6]). It is possible to adapt them to verify SPARCv8 code. We propose a new program logic and do the verification in Coq mainly because the work is part of a big project for a fully certified OS kernel for aerospace crafts whose inline assembly is written in SPARCv8. We already have a program logic implemented in Coq for C programs, which allows us to verify C code with Coq proofs. Therefore we want to have a program logic for SPARCv8 so that it can be linked with the logic for C and can generate machine-checkable Coq proofs too. That said, many of the automated verification techniques can be applied to reduce the manual efforts to write Coq proofs, which we would like to study in the future work. We have already shown that it's possible to apply some Coq tactics based on separation logic [8] in our work.

Conclusion. We present a program logic for SPARCv8. Our logic is based on a realistic semantics model and supports main features of SPARCv8, including delayed control transfer, delayed writes, and register windows. We have applied the program logic to verify the main body of the context switch routine in a realistic embedded OS kernel. And we also extend the program logic to support refinement verification. The extended program logic in our current work is designed for verifying the contextual refinement between the implementation of context switch routine and switch primitive mainly. It will be interesting to improve and use our extended program logic to verify the correctness of more abstract assembly primitives, like thread creating primitive spawn, and thread joining primitive join, whose execution will add or delete some threads. Our current work can only handle sequential SPARCv8 program verification, and do not consider interrupts in our machine model. Finally, the client code in our contextual refinement verification is SPARCv8 code. However, many works, like [29], use C program as the client code of calling abstract assembly primitive. we would like to link the verified inline assembly with verified C code for whole system verification in the future. Maybe it's possible to follow some complier verification works, e.g. [16, 14], to make sure that the behaviors of the client C code is equivalent with the complied SPARCv8 code, which acts as a client code in our work.

### References

- [1] Leon3. https://www.gaisler.com/index.php/products/processors/leon3.
- [2] Linux v2.6.17.10. https://elixir.bootlin.com/linux/v2.6.17.10/source/arch.
- [3] Program logic for SPARCv8 implementation in Coq (project code). https://github.com/jpzha/VeriSparc.
- [4] SPARC. https://gaisler.com/doc/sparcv8.pdf.
- [5] A. W. Appel. Foundational proof-carrying code. In *Proc. 16th Annual IEEE Symposium on Logic in Computer Science*, pages 85–97, Jan 1998.
- [6] J. Berdine, C. Calcagno, and P. O'Hearn. Smallfoot: Modular automatic assertion checking with separation logic. In FMCO, 2005.
- [7] J. Berdine, C. Calcagno, and P. O'Hearn. Symbolic execution with separation logic. In APLAS, 2005.
- [8] J. Cao, M. Fu, and X. Feng. Practical tactics for verifying c programs in coq. In CPP, pages 97–108, January 2015.
- [9] X. Feng, Z. Ni, Z. Shao, and Y. Guo. An open framework for foundational proofcarrying code. In *TLDI*, pages 67–78, 2007.
- [10] X. Feng, Z. Shao, Y. Guo, and Y. Dong. Certifying low-level programs with hardware interrupts and preemptive threads. In PLDI, pages 170–182, 2008.
- [11] X. Feng, Z. Shao, A. Vaynberg, S. Xiang, and Z. Ni. Modular Verification of Assembly Code with Stack-Based Control Abstractions. In PLDI, June 2006.
- [12] R. Gu, J. Koenig, T. Ramananandro, Z. Shao, X. N. Wu, S.-C. Weng, H. Zhang, and Y. Guo. Deep specifications and certified abstraction layers. In *POPL*, pages 595–608, Jan 2015.
- [13] Y. Guo, X. Feng, Z. Shao, and P. Shi. Modular Verification of Concurrent Thread Management. In APLAS, page 315–331, December 2012.
- [14] H. Jiang, H. Liang, S. Xiao, J. Zha, and X. Feng. Towards Certified Separate Compilation for Concurrent Programs. In PLDI, pages 111–125, June 2019.
- [15] G. Klein, K. Elphinstone, G. Heiser, J. Andronick, D. Cock, P. Derrin, D. Elkaduwe, K. Engelhardt, R. Kolanski, M. Norrish, T. Sewell, H. Tuch, and S. Winwood. seL4: Formal Verification of an OS Kernel. In SOSP, pages 207–220, Oct 2009.
- [16] X. Leroy. A formally verified compiler back-end. *Journal of Automated Reasoning*, 43(363), December 2009.
- [17] X. Leroy and S. Blazy. Formal verification of a C-like memory model and its uses for verifying program transformations. *Journal of Automated Reasoning*, 41(1):1– 31, July 2008.
- [18] H. Liang and X. Feng. Modular verification of linearizability with non-fixed linearization points. In PLDI, pages 459–470, June 2013.
- [19] H. Liang, X. Feng, and Z. Shao. Compositional verification of terminationpreserving refinement of concurrent programs. In *LICS*, July 2014.
- [20] G. Morrisett, K. Crary, N. Glew, D. Grossman, R. Samuels, F. Smith, D. Walker, S. Weirich, and S. Zdancewic. Talx86:a realistic typed assembly language. In 1999 ACM SIGPLAN Workshop on Compiler Support for System Software, pages 25–35, May 1996.
- [21] G. Morrisett, D. Walker, K. Crary, and N. Glew. From System F to typed assembly language. In POPL, pages 85–97, Jan 1998.
- [22] M. O. Myreen and M. J. Gordon. Hoare logic for realistically modelled machine code. In Proc. 13th International Conference on Tools and Algorithms for Construction and Analysis of Systems, 2007.

- [23] G. C. Necula and P. Lee. Safe Kernel Extensions Without Run-Time Checking. In Proc.2nd USENIX Symp. on Operating System Design and Impl, pages 229–243, 1996
- [24] Z. Ni and Z. Shao. Certified Assembly Programming with Embedded Code Pointers. In POPL, pages 320–333, 2006.
- [25] Z. Ni, D. Yu, and Z. Shao. Using XCAP to Certify Realistic Systems code: Machine context management. In TPHOLs, Sept 2007.
- [26] J. Reynolds. Separation logic: a logic for shared mutable data structures. July 2002.
- [27] G. Tan and A. W. Appel. A compositional logic for control flow. In  $\mathit{VMCAI}$ , Jan 2006.
- [28] J. Wang, M. Fu, L. Qiao, and X. Feng. Formalizing SPARCv8 Instruction Set Architecture in Coq. In SETTA, Oct 2017.
- [29] F. Xu, M. Fu, X. Feng, X. Zhang, H. Zhang, and Z. Li. A practical verification framework for preemptive os kernels. In CAV, pages 59–79, July 2016.
- [30] J. Yang and C. Hawblitzel. Safe to the last instruction: automated verification of a type-safe operating system. In *PLDI*, pages 99–110, 2010.
- [31] D. Yu, A. H. Nadeem, and Z. Shao. Building certified libraries for PCC: Dynamic storage allocation. Science of Computer Programming, 50(1-3):101-127, Mar 2004.
- [32] H. Zhe, D. Sanan, A. Tiu, Y. Liu, and K. C. Hoa. An Executable Formalisation of the SPARCv8 Instruction Set Architecture: A Case Study for the LEON3 Processor. In FM, 2016.

# A More about High-level Instructions Execution

We give some supplements about the execution of high-level instructions. As we have explained in Sec. 5.1, the register windows and delayed buffer in pyhsical SPARCv8 program state are omitted in high-level Pseudo-SPARCv8 program state. So, we do not define state transition rules for instructions save, restore, rd, and wr. The instruction transition rules for the rest of instructions, like 1d and add, have no much difference with the rules in pyhical SPARCv8 program.

$$\begin{split} & \underbrace{ \begin{bmatrix} \mathbf{a} \end{bmatrix}_{\mathbb{R}} = l \quad M(l) = v \quad \mathbb{R}' = \mathbb{R} \{ \mid \mathbf{r}_d \leadsto v \mid \} }_{\mathbf{execi}(\mathsf{ld} \ \mathbf{a} \ \mathbf{r}_d, ((\mathbb{R}, b, \mathbb{F}), M)) =_{\mathsf{H}} ((\mathbb{R}', b, \mathbb{F}), M)} \\ & \underbrace{ \mathbb{R}(\mathbf{r}_s) = v_1 \quad \llbracket \mathbf{o} \rrbracket_{\mathbb{R}} = v_2 \quad \mathbf{r}_d = \mathrm{dom}(\mathbb{R}) \quad \mathbb{R}' = \mathbb{R} \{ \mid \mathbf{r}_d \leadsto v \mid \} }_{\mathbf{execi}(\mathsf{add} \ \mathbf{r}_s, \mathbf{o}, \mathbf{r}_d, ((\mathbb{R}, b, \mathbb{F}), M)) =_{\mathsf{H}} ((\mathbb{R}', b, \mathbb{F}), M)} \end{split}$$

Fig. 24. Transition rules for instructions 1d and add in high-level

We show the state transition rules for instructions 1d and add in high-level in Fig. 24. The register file updating operation is defined formally below:

$$\mathbb{R}\{\|\hat{\mathbf{rn}} \leadsto v\|\} ::= \mathbb{R}\{\hat{\mathbf{rn}} \leadsto v\} \quad \text{where } \hat{\mathbf{rn}} \notin \{\%\mathsf{sp}, \%\mathsf{fp}\}$$

According to the definition, we can find that updating the register %sp (alias of  $r_{14}$ ), which is used to point to the top of the current stack frame, and %fp (alias of  $r_{14}$ ), which is used to point to the top of the previous stack frame is not allowed. Only the execution of instructions Psave, which is used to allocate a new stack frame, and Prestore, which is used to free the current stack frame, can modify them. The evaluation of the opand and address expression in high-level is defined formally below:

$$\llbracket \mathtt{o} \rrbracket_{\mathbb{R}} ::= \begin{cases} R(r) & \text{if } \mathtt{o} = r \\ w & \text{if } \mathtt{o} = w, \\ -4096 \leq w \leq 4095 & & \llbracket \mathtt{a} \rrbracket_{\mathbb{R}} ::= \begin{cases} \llbracket \mathtt{o} \rrbracket_{\mathbb{R}} & \text{if } \mathtt{a} = \mathtt{o} \\ v_1 + v_2 & \text{if } \mathtt{a} = \mathtt{r} + \mathtt{o}, \ \mathbb{R}(\mathtt{r}) = v_1 \\ & \text{and } \llbracket \mathtt{o} \rrbracket_{\mathbb{R}} = v_2 \\ \bot & \text{otherwise} \end{cases}$$

The "Psave w " can be viewed as a macro of "save %sp, -w, %sp ", and "Prestore" can be viewed as a marco of "restore %g<sub>0</sub>, %g<sub>0</sub>, %g<sub>0</sub>".

Fig. 25. Realistic SPARCv8 Code and Pseudo-SPARCv8 Code

Fig. 25 gives a simple comparision with the realistic SPARCv8 code and our Pseudo SPARCv8 code in high-level. Fig. 25 (a) is the realistic SPARCv8 code. It uses instruction "save %sp, -64, %sp" to store the caller's context and allocate a new stack frame size 64 bytes for the current procedure, and use instruction "restore %g<sub>0</sub>, %g<sub>0</sub>, %g<sub>0</sub>" to restore the caller's context at the exitance of the current procedure. Fig. 25 (b) is the same function in Pseudo-SPARCv8 code, and we can find that the instructions that is responsible for saving and restoring the context of caller is replaced by "Psave w" and "Prestore".

### B More about Low-level Language

The machine states and syntax low-level SPARCv8 language (defined in Fig. 26) are taken from the model of SPARCv8 defined in Fig. 2. So, we omit some definitions, like RegName and DelayCycle, which are same as ones defined in Fig. 2 here.

The low-level program P is a tuple including the code heap C, low-level program state S, program counter  $\operatorname{pc}$  and  $\operatorname{npc}$ . The code heap C is defined in Fig. 14. The low-level program state S uses the block-based memory model M, which is the same as the high-level program. The low-level message does not need  $\operatorname{call}(\mathbf{f}, \overline{v})$ , because the low-level program does not call abstract assembly primitive, but call its corresponding implementation, which is a function.

Fig. 26. Machine States and Syntax for Low-level SPARCv8 Language

**Operational Semantics for Low-level Code.** The operational semantics for low-level program is defined in Fig. 27. Most of the state transition rules are taken from Fig. 6. Here, we use " $\mathbf{execi}(\mathbf{i}, \underline{\ }) =_{\mathsf{L}} \underline{\ }$ " to represent the step for simple instruction  $\mathbf{i}$ .

The execution of instruction Psave is discussed in divided into two cases: (1) if we can successfully set the next register window as the current one (represented as  $\mathbf{save}(R,F)=(R',F')$ ), a new stack frame in memory will be allocated (shown as  $\mathbf{alloc}(M,b,0,w)=M'$ ); (2) if we can't set the next register window as the current one (represented as  $\mathbf{save}(R,F)=\mathbf{undefined}$ ), a windows overflow trap will be triggered, and we redo the instruction Psave. The execution of instruction Prestore does the reverse. Here, we use " $_{-} \uparrow _{-}$ " to represent the state transition caused by window overflow trap and use " $_{-} \downarrow _{-}$ " to represent the state transition caused by window underflow trap. Their formal definitions are shown in Fig. 29.

# C More about State Relation Between Low- and High-level Program

After introducing the definitions of low- and high-level program, we establish the state relation between low- and high-level program in this section. Establishing their state relation is not a trivial task, because there are two major differences low- and high-level program states. **First**, all the procedures' contexts of a specific thread are saved in high-level frame list  $\mathbb{F}$ . However, for low-level program, part of the contexts are saved in register windows (modeled as low-level frame list F), the other part of the contexts are saved in corresponding stack frame in memory, because the number of register windows is limited; **Second**, the high-level concurrent Pseudo-SPARCv8 program is multithreaded, but the low-level SPARCv8 program does not have the concept of thread pool.

Relation for low- and high-level FrameList. The relation between lowand high-level frame list is defined in Fig. 30. We represent this relation as form " $(b, F, M_K) \Downarrow \mathbb{F}$ ", The tuple of b, F and  $M_K$  is the state of stack in lowlevel program, because, in the low-level program, part of the produces' contexts are saved in frame list F, which can also be understand as a prefix the whole frame list describe in assertion  $\text{cwp} \mapsto (\! \lfloor , F \! \rfloor)$ , the other part of the contexts are saved in corresponding frame list represent as  $M_K$ . The high-level frame list  $\mathbb{F}$ represents the state of stack in high-level program. Fig. 31 gives a more intuition understanding of this relation. Here, some part of the contexts F (the pink part in the left side of the Fig. 31) are saved in register windows, and the other part of contexts  $M_K$  (the green part in the left side of the Fig. 31) are saved in stack

$$(R,F) \rightrightarrows (R',F') \\ C \vdash ((M,(R',F),D'),\mathsf{pc},\mathsf{npc}) \circ \xrightarrow{\beta} (M',(R'',F'),D'') \\ C \vdash ((M,(R,F),D),\mathsf{pc},\mathsf{npc}) ::: \xrightarrow{\beta} (C,(M,(R'',F'),D'')) \\ (a) \text{ Low-level Program Transition} \\ \hline C(\mathsf{pc}) = \mathbf{i} \quad \expci(\mathbf{i},(M,Q,D)) = \mathbf{i},(M',Q',D') \\ \hline C \vdash ((M,Q,D),\mathsf{pc},\mathsf{npc}) \circ \xrightarrow{\tau} ((M',Q',D'),\mathsf{npc},\mathsf{npc} + 4) \\ \hline C(\mathsf{pc}) = \mathrm{jmp} \ \mathbf{a} \quad [\mathbf{a}]_R = \mathbf{f} \\ \hline C \vdash ((M,(R,F),D),\mathsf{pc},\mathsf{npc}) \circ \xrightarrow{\tau} ((M,(R,F),D),\mathsf{npc},\mathbf{f}) \\ \hline C(\mathsf{pc}) = \mathrm{call} \ \mathbf{f} \quad \mathbf{r}_{15} \in \mathrm{dom}(R) \\ \hline C \vdash ((M,(R,F),D),\mathsf{pc},\mathsf{npc}) \circ \xrightarrow{\tau} ((M,(R,F),D),\mathsf{npc},\mathbf{f}) \\ \hline C(\mathsf{pc}) = \mathrm{ret} \mathbf{1} \quad R(\mathsf{r}_{15}) = \mathbf{f} \\ \hline C \vdash ((M,(R,F),D),\mathsf{pc},\mathsf{npc}) \circ \xrightarrow{\tau} ((M,(R,F),D),\mathsf{npc},\mathbf{f} + 8) \\ \hline C(\mathsf{pc}) = \mathrm{print} \quad R(\%_{00}) = v \\ \hline C \vdash ((M,(R,F),D),\mathsf{pc},\mathsf{npc}) \circ \xrightarrow{\tau} ((M',(R,F),D),\mathsf{npc},\mathsf{npc} + 4) \\ \hline C(\mathsf{pc}) = \mathrm{Psave} \ w \quad \mathrm{save}(R,F) = \mathrm{undefined} \quad (M,R,F) \ \text{if} \ (M',R',F') \\ \hline C \vdash ((M,(R,F),D),\mathsf{pc},\mathsf{npc}) \circ \xrightarrow{\tau} ((M',(R',F'),D),\mathsf{npc},\mathsf{npc}) \\ \hline C(\mathsf{pc}) = \mathrm{Prestore} \quad \mathrm{restore}(R,F) = \mathrm{undefined} \quad (M,R,F) \ \text{if} \ (M',R',F') \\ \hline C \vdash ((M,(R,F),D),\mathsf{pc},\mathsf{npc}) \circ \xrightarrow{\tau} ((M',(R',F'),D),\mathsf{pc},\mathsf{npc}) \\ \hline (b) \ \mathrm{Low-level} \ \mathrm{Control} \ \mathrm{Transfer} \ \mathrm{Instruction} \ \mathrm{Transition} \\ \hline alloc(M,b,0,w) = M' \quad \mathrm{save}(R,F) = (R',F') \quad R'' = R'\{\%\mathsf{sp} \leadsto (b,0)\} \\ \hline \expci(\mathsf{Psave} \ w, (M,(R,F),D)) = \mathbf{l} \ (M',(R',F',D)) \\ \hline \expci(\mathsf{Prestore} \ (M,(R,F),D)) = \mathbf{l} \ (M',(R',F',D)) \\ \hline \expci(\mathsf{Prestore} \ (M,(R,F),D)) = \mathbf{l} \ (M',(R',F',D)) \\ \hline \expci(\mathsf{Prestore} \ (M,(R,F),D)) = \mathbf{l} \ (M',(R',F',D)) \\ \hline \expci(\mathsf{restore} \ (R,(M,(R,F),D)) = \mathbf{l} \ (M',(R',F',D)) \\ \hline \expci(\mathsf{restore} \ (R,(M,(R,F),D))) = \mathbf{l} \ (M',(R',F',D')) \\ \hline \expci(\mathsf{restore} \ (R,(M,(R,F),D))) = \mathbf{l} \ (M',(R',F',D')) \\ \hline \expci(\mathsf{restore} \ (R,(M,(R,F),D))) = \mathbf{l} \ (M',(R',F',D')) \\ \hline \expci(\mathsf{restore} \ (R,(M,(R,F),D))) = \mathbf{l} \ (M',(R',F',D')) \\ \hline \expci(\mathsf{restore} \ (R,(M,(R,F),D))) = \mathbf{l} \ (M',(R',F',D')) \\ \hline \expci(\mathsf{restore} \ (R,(M,(R,F),D))) = \mathbf{l} \ (M',(R,(R,F),D)) = \mathbf{l} \ (M',(R,(R,F),D)) \\ \hline \ \mathrm{execi}(\mathsf{restore} \ (R,(M,(R,F),D))) = \mathbf{l} \ (M',(R,(R,F),D)) = \mathbf{l} \$$

Fig. 27. Selected operational semantics rules for low-level program

(d) Low-level Expression Semantics

frame in memory. However, in high-level state, they are abstracted as list named high-level frame list  $\mathbb{F}$ .

$$\begin{aligned} \mathbf{fresh}(b,M) &::= \forall \ w. \ (b,w) \notin \mathrm{dom}(M) \\ \mathbf{alloc}(M,b,w_l,w_h) &= M' \ ::= (M' = M \land w_l = w_h) \lor \\ & (M' = M\{(b,w_l) \leadsto \_, \dots, (b,w_h-1) \leadsto \_\} \land \mathbf{fresh}(b,M) \land w_l < w_h) \\ \mathbf{free}(b,M) &= M' \ ::= \forall \ b' \neq b, w'. \ M'(b',w') = M(b',w') \land \nexists w. \ (b,w) \in \mathrm{dom}(M) \end{aligned}$$

Fig. 28. Auxiliary Definitions for Memory Operation

$$F = F_1 \cdot \operatorname{fm}_1 \cdot \operatorname{fm}_2 \cdot \operatorname{fm}_3 \cdot \operatorname{fm}_4 \quad \operatorname{fm}_1[6] = (b, 0)$$

$$R(\operatorname{wim}) = 2^n \quad \{(b_0, 0), \dots, (b_0, 15)\} \subseteq \operatorname{dom}(M) \quad R' = R''\{\operatorname{wim} \leadsto 2^{\operatorname{next\_cwp}(n)}\}$$

$$M' = M\{[(b_0, 0), \dots, (b_0, 7)] \leadsto R_0([\operatorname{local}])\}\{[(b_0, 8), \dots, (b_0, 15)] \leadsto R_0([\operatorname{in}])\}$$

$$(M, (R, F)) \uparrow (M', (R', F'))$$

$$F = \operatorname{fm}_1 :: \operatorname{fm}_2 :: F'' \quad R(\mathbf{r}_{30}) = (b, 0) \quad R(\operatorname{wim}) = 2^n$$

$$\{[(b_0, 0), \dots, (b_0, 7)] \leadsto \operatorname{fm}'_1, [(b_0, 8), \dots, (b_0, 15)] \leadsto \operatorname{fm}'_2\} \subseteq M'$$

$$R' = R''\{\operatorname{wim} \leadsto 2^{\operatorname{prev\_cwp}(n)}\} \quad F' = \operatorname{fm}'_1 :: \operatorname{fm}'_2 :: F''$$

$$(M, (R, F)) \downarrow (M, (R', F'))$$

Fig. 29. Windows Over- and UnderFlow

$$\frac{M_K = \{(b, \operatorname{fm}_1, \operatorname{fm}_2)\} \uplus M_K' \quad \operatorname{fm}_2[6] = (b', 0) \quad (b', \operatorname{nil}, M_K') \Downarrow \mathbb{F}}{(b, \operatorname{nil}, M_K) \Downarrow (b, \operatorname{fm}_1, \operatorname{fm}_2) :: \mathbb{F}}$$
 
$$\frac{\operatorname{fm}_1 = \operatorname{fm}_1' \quad \operatorname{fm}_2 = \operatorname{fm}_2' \quad M_K = \{(b, \_, \_)\} \uplus M_K' \quad \operatorname{fm}_2[6] = (b', 0) \quad (b', F, M_K') \Downarrow \mathbb{F}}{(b, \operatorname{fm}_1 :: \operatorname{fm}_2 :: F, M_K) \Downarrow (b, \operatorname{fm}_1', \operatorname{fm}_2') :: \mathbb{F}}$$

Fig. 30. Relation for low- and high-level FrameList



Fig. 31. Abstraction of Register Windows and Memory

As shown in Fig. 30, if the low-level frame list F is nil and the memory is  $\emptyset$ , and the high-level frame list  $\mathbb{F}$  is nil, it means there is no context stored. If the frame list is nil but the high-level frame list is  $(b, \text{fm}_1, \text{fm}_2) :: \mathbb{F}$ , it means

$$\begin{split} \mathsf{ctxfm}(R,F) &:= \begin{cases} F_1 & \text{ if } R(\mathsf{cwp}) = w_{id}, \, R(\mathsf{wim}) = 2^n, \, \mathsf{cwp} \neq n, \\ & F = F_1 \cdot F_2, \, 0 \leq w_{id}, \, n \leq N, \, |F_1| = 2 \times (N + n - w_{id} - 1) \% N \\ \bot & \text{ otherwise} \end{cases} \\ R &\hookrightarrow \mathbb{R} & ::= (\forall \, i \in \{0,\dots,31\}. \, R(\mathbf{r}_i) = \mathbb{R}(\mathbf{r}_i)) \, \wedge \, (\forall \, \mathsf{sr}. \, \exists \, w. \, R(\mathsf{sr}) = w) \\ & \wedge R(\mathbf{n}) = \mathbb{R}(\mathbf{n}) \, \wedge \, R(\mathbf{z}) = \mathbb{R}(\mathbf{z}) \, \wedge \, R(\mathbf{c}) = \mathbb{R}(\mathbf{c}) \, \wedge \, R(\mathbf{v}) = \mathbb{R}(\mathbf{v}) \end{cases} \\ & \frac{M_c = M_{\mathrm{ctx}} \uplus M_K \quad \mathrm{dom}(M_{\mathrm{ctx}}) = \mathrm{Dom} \mathrm{Ctx} \mathsf{M}(t,b)}{\mathrm{ctxfm}(R,F) = F' \quad R(\mathbf{r}_{30}) = (b',0) \quad (b',F',M_K) \Downarrow \mathbb{F} \quad R \hookrightarrow \mathbb{R}}{(M_c,(R,F)) \Downarrow_c (t,((\mathbb{R},b,\mathbb{F}),\mathsf{pc},\mathsf{npc}))} \\ & \frac{M_1 \Downarrow_r T_1 \quad M_2 \Downarrow_r T_2}{M_1 \uplus M_2 \Downarrow_r T_1 \uplus T_2} \quad \frac{M \blacktriangleright_t Q \quad (M,Q) \Downarrow_c (t,\mathcal{K})}{M \Downarrow_r \{t \leadsto \mathcal{K}\}} \end{split}$$

Fig. 32. Relation for Thread Pool and low-level Memory

that the contexts  $fm_1$  and  $fm_2$  are saved in stack frame in memory, whose block identifier is b. Here, we use " $\{(b, fm_1, fm_2)\}$ " defined below to represent the part of memory saving  $fm_1$  and  $fm_2$ . This memory contains only one block b.

If the frame list is  $\operatorname{fm}_1::\operatorname{fm}_2::F$  and the high-level frame list is  $(b',\operatorname{fm}_1',\operatorname{fm}_2')::\mathbb{F}$ , it means that the contexts  $\operatorname{fm}_1$  and  $\operatorname{fm}_2$  have not been saved in block b'. So, we require the contexts  $\operatorname{fm}_1$  and  $\operatorname{fm}_2$  saved in low-level frame list and the  $\operatorname{fm}_1'$  and  $\operatorname{fm}_2'$  saved in high-level frame list are equal. The block b' used to save  $\operatorname{fm}_1$  and  $\operatorname{fm}_2$  has not been used yet, so we don't care about its contents.

Relation for ThreadPool and low-level Memory. In high-level program, the thread local state of each thread is saved in a thread pool T. However, in low-level program, the local state of each thread is saved in memory (TCB and stack). For example, in Sec. 4, we introduce that the execution of the context switch module will save the register state of current thread into its TCB and stack in memory. So, the thread pool in high-level program can be viewed as an abstraction of low-level memory used to store the contexts of threads.

We define the relation between high-level thread pool and the memory used to save context in Fig. 32 formally. We use " $(M_c,(R,F)) \Downarrow_c (t,((\mathbb{R},b,\mathbb{F}),\mathsf{pc},\mathsf{npc}))$ " to represent the relation between the thread local states of current thread of low- and high-level program. The memory  $M_c$  owned the current thread t can be splitted into two parts  $M_{\mathsf{ctx}}$  and  $M_{\mathsf{K}}$ . The  $M_{\mathsf{ctx}}$  are use the register file, whose domain is represented as  $\mathsf{DomCtxM}(t,b)$ . It takes two arguments: the identifier t of the current thread and the block b of the stack frame at the top of the stack. Because the context switch module may save the register file in TCB and the

stack frame of the current procedure. The other part of the memory  $M_K$  is used to save the contexts of the previous procedures, which is abstracted as  $\mathbb{F}$  in high-level program. We define  $R \hookrightarrow \mathbb{R}$  to represent the relation between the register file R in low-level and  $\mathbb{R}$  in high-level program. The operation  $\mathsf{ctxfm}(R,F)$  is used to exact the prefix  $F_1$  of the frame list F, which saves the contexts of the previous procedures. Suppoing the value of the  $\mathsf{cwp}$  is  $w_{id}$ , meaning that the id of the current window is  $w_{id}$ , and the value of the  $\mathsf{wim}$  is  $2^n$ , meaning the id n register window is invalid. According to the introduction in Fig. 2.1, we usually set a window invalid to avoid over- and underflow of the register windows. So, we known that register windows id from  $(w_{id}+1)\%N$  to (n-1+N)%N save the contexts of the previous procedures. So, we extract the contents  $F_1$  of them from the whole frame list F.

We define " $M \downarrow_r \{t \leadsto \mathcal{K}\}$ " to represent the relation between the thread local states of ready thread of low- and high-level program. The operation " $M \blacktriangleright_t Q$ " means that we can restore the register state Q from memory M. When the context of the ready thread has been restored, we can establish a relation " $(M,Q) \downarrow_c (t,\mathcal{K})$ " between low- and high-level thread local states of thread t. Here, we don't represent the definitions of  $\mathsf{DomCtxM}(t,b)$  and  $M \blacktriangleright_t Q$  here, because their definitions are based on the implementation of the context switch routine in OS kernel. And the soundness of our extended program logic does not rely on their concrete definition.

**Relation for Whole Program State.** Finally, we introduce the state relation for whole program states between low- and high-level program below:

$$\frac{M = M_c \uplus M_T \uplus \{\mathsf{TaskCur} \leadsto (t,0)\} \uplus M'}{(M_c,Q) \Downarrow_{\mathsf{c}} (t,\mathcal{K}) \qquad M_T \Downarrow_{\mathsf{r}} T \backslash \{t\} \qquad D = \mathrm{nil}}{(M,Q,D) \sim (T,t,\mathcal{K},M')}$$