# **SMT-based Test Program Generation for Cache-memory Testing**

Evgeni Kornikhin Institute for System Programming of RAS Solzhenitsyn St., 25, Moscow, Russia kornevgen@ispras.ru

## **Abstract**

Core-level verification of microprocessors is performed using many assembly programs (test programs). Instructions (and their behavior) for test program can be selected combinatorially. But special initial values for registers are required to satisfy this test program. This article proposes algorithm for generation these values. The article considers instructions performed memory access through cache. Proposing algorithm describes test program behavior as SMT-assertions and uses SMT-solvers to get initial values of registers for given initial state of cache-memory.

#### 1 Introduction

System functional testing of microprocessors uses many assembly programs (test programs). Such programs are loaded to the memory, executed, execution process is logged and analyzed. But modern processors testing requires a lot of test programs. Technical way of test program generation was proposed in [5]. This way based on the microprocessor's model. Its first stage is systematic generation abstract test programs (test templates). This abstract form contains a sequence of instructions with arguments (registers) and test situations (behavior of this instruction; these can be overflow, cache hits, cache misses). The second stage is generation of initial microprocessor state for given test template, i.e. initial values of registers. Technical way from [5] is useful for aimed testing when aim is expressed by instruction sequence with specific behavior. Based on registers values the third, final, stage is generation the sequence of instructions to reach initial microprocessor state. This sequence of instructions with test template get ready assembly program. This paper devoted to the second stage, i.e. initial state generation.

Known researches about test data generation problem contain the following methods of its solving:

## 1. combinatorial methods;

- 2. ATPG-based methods;
- 3. constraint-based methods.

Combinatorial methods are useful for simple test templates (each variable has explicit directive of its domain, each value in domain is possess) [4]. ATPG-based methods are useful for structural but not functional testing [7]. Constraint-based methods are the most promising methods. Test template is translated to the set of constraints (predicates) with variables which represented test data. Then special solver generates values for variables to satisfy all constraints. This paper contains constraint-based method also. IBM uses constraint-based method in Genesys-Pro [8]. But it works inefficiently on test templates from [5]. Authors of another constraint-based methods restrict on registers only and don't consider cache-memory.

## 2 Preliminaries

## 2.1 Test templates description

Test template defines requirements to the future test program. Test template contains sequence of instructions. Each element of this sequence has instruction name, arguments (registers, addresses, values) and *test situation* (constraint on values of arguments and microprocessor state before execution of instruction). Example of test template description for model instruction set:

REGISTER reg1 : 32; REGISTER reg2 : 32; LOAD reg1, reg2 @ 11Miss STORE reg2, reg1 @ 11Hit

This template has 2 instructions – LOAD and STORE. Template begins from variable definitions (consist of name and bit-length). Test situation is specified after "@": test situation of the first instruction if "11Hit" (which means hit in first-level cache) and test situation of the second instruction is "11Miss" (which means miss in first-level cache).

• "LOAD reg1, reg2" loads value from memory by *virtual* address from register "reg2" to the register "reg1";

• "STORE reg1, reg2" stores value from register "reg1" to the memory by *virtual* address in "reg2".

Behavior of "LOAD/STORE x, y" is the following: the first step is physical address generation (*address translation*) and the second step is memory access through cache using generated physical address (see pic. 2.1).



Figure 1. Model of LOAD/STORE instructions.

## 2.2 Tagsets

Tagset is bit-field of physical address (see pic. 2.1). Each LOAD/STORE instruction has a tagset. Tagset is used by memory management unit for cache access. *Displacing tagset* is tagset of instruction with 11Miss test situation. *Hit tagset* is tagset of instruction with 11Hit test situation.

The following syntax will be used for operations on tagsets and virtual addresses:

- 1. "bit-extraction"  $x_{<5>}$  is the 5th bit of x;
- 2. "bit-range-extraction"  $x_{<5...3>}$  is bits from 5th to 3th of x:
- 3. "bit-concatenation" x||y.

#### 2.3 Satisfiability Modulo Theories

SMT (Satisfiability Modulo Theories) problem is "the satisfiability problem of the quantifier-free fragment of various first-order theories" [6]. SMT-solvers concern SMT problem as SAT problem (satisfiability) with special terms. Conjunctions of these terms can be satisfied by special algorithms and heuristics. Examples of SMT-solvers are Yices [2], Z3 [3], CVC3 [1]. This article uses bit-vector theory for constraints described tagsets and registers values relations.

## 2.4 Limitations of the approach

There are some requirements to the testing process which are took into account and efficiently used in proposing approach.

- 1. known initial state of the cache and TLB; this is a key requirement of the approach; initial state is used for initial values of registers generation;
- hits in Joint TLB are considered only; misses (i.e., absence a TLB line for virtual address) requires hardware or software refilling programs, so this test situations are very hard for constraints definition;
- 'Cached Mapped' segments of virtual memory only; another segments may require more complex constraints because of number of cases growth;
- L1 cache only; L2 cache can contain data and instructions; so cache with only type of contents is considered.

## 3 Test case generation

Test case is a test program. Each test program consists of initialization instructions, body instructions and finalization instructions. Initialization instruction prepare microprocessor to the appropriate state. Then body instructions are executed (they are defined in test template). And finalization instructions check state of microprocessor after body instructions execution (for example, check values for registers).

The aim of proposing algorithm is generation initial values for registers using solution of SMT-constraints. Generated initial values are used for initialization instructions generation. SMT-constraints describe relations on registers values. Each instruction can change values of registers and cache state. Additional variables should be introduced to describe changes of cache state. These variables are tagsets. They don't present cache as it is (by cells). Instead of this constraints on tagsets describe relations between virtual addresses of instructions. Registers are used to get virtual addresses (as second arguments of LOAD and STORE) and values loaded or stored to memory (as first arguments of LOAD and STORE).

The following constraints present cache behavior for simple test templates only, i.e. templates with no more than w+1 misses (and unlimited count of hits). w is associativity of cache.

## 3.1 Tagsets constraints

Lets L is set of all tagsets from initial cache state and PFN is set of all physical frame number fields of TLB

lines from initial TLB state. Lets  $LP = \{x | x \in L \land x_{< pfn\ bits>} \in PFN\}$ . Then walk through given test template and collect constraints from test situations by the following way:

- 1. for "l1Hit(x)":  $x \in LP \cup \{x_1,...,x_n\}$ , where  $x_1,...,x_n$  are all previous displacing tagsets; algorithm returns "no solution", if  $LP \cup \{x_1,...,x_n\} = \emptyset$ ;
- 2. for "l1Miss(x)":  $x \notin \{x_1,...,x_n\} \land (x \notin LP \lor \bigvee_{\lambda \in LP} (x = \lambda \land LRU(\lambda,x_1,...,x_n))$ , where  $x_1,...,x_n$  are all previous displacing tagsets.

Constraint  $LRU(\lambda, x_1, ..., x_n) = \bigvee_{\{y_1, ..., y_m\} \subseteq \{x_1, ..., x_n\}} \bigwedge_{t \in \{x_1, ..., y_m\}} TS(t, \lambda, x_1, ..., x_n, y_1, ..., y_m)$ , where  $\lambda = \lambda_m$  (m is a number of  $\lambda$  in its set,  $\lambda_1$  is the youngest element of set by LRU order). Constraint TS is the following:

$$TS(t,\lambda,x_1,...,x_n,y_1,...,y_m) = \begin{cases} R(t) \neq R(\lambda), & \text{if } t: 11 \text{Miss} \land t \notin \{y_1,...,y_m\} \\ R(t) = R(\lambda), & \text{if } t: 11 \text{Miss} \land t \in \{y_1,...,y_m\} \\ t \notin \{\lambda_{m+1},...,\lambda_w\}, & \text{if } t: 11 \text{Hit} \land t = x_i \land y_1 = x_j \land i < j \\ t \in \{\lambda_{m+1},...,\lambda_w\} & \text{if } t: 11 \text{Hit} \land t = y_p \\ \setminus \{y_1,...,y_p\}, \\ t \notin \{\lambda_{m+1},...,\lambda_w\} & \text{if } t: 11 \text{Hit} \land t = x_i \land x_i \notin \{y_1,...,y_m\} \\ \setminus \{y_1,...,y_p\}, & \land y_p = x_j \land y_{p+1} = x_{j'} \land y_m = x_q \\ & \land j < i < j' \land i < q \end{cases}$$

 $t: 11 \mathrm{Hit}$  (or  $11 \mathrm{Miss}$ ) means corresponded test situation of instruction with tagset t.

Constraint  $LRU(\lambda, x_1, ..., x_n)$  has other representation using usefulness predicates. Tagset is useful, if it helps to displace tagset  $\lambda$  (otherwords, if it does this tagset more older by means of LRU). Constraint  $LRU(\lambda, x_1, ..., x_n)$ is satisfied, if tagset  $\lambda$  is displaced before current instruction (and current instruction uses  $\lambda$  as displacing tagset). So constraint LRU requires all tagsets before the closest cache miss before current instruction, hit tagsets and displacing tagsets. So, this constraint can be represented as follows:  $LRU(\lambda, x_1, ..., x_n) = (\sum u(x_i))$ w-m), where  $x_i$  is a tagset (not only displacing tagset) before the closest cache miss before the current instruction, m is a number of  $\lambda$  in its set, w is set size (i.e., cache associativity), and  $u(x_i)$  is a usefulness predicate for  $x_i$ . Usefulness predicates can be represented as follows:  $u(x_i) = (x_i \in \{\lambda_{m+1}, \lambda_{m+2}, ..., \lambda_w\} \land x_i \notin$ hit tagsets from  $\{x_1,...,x_{i-1}\}\)$ , if  $x_i$  is hit tagset, and  $u(x_i) = (R(x_i) = R(\lambda)), \text{ if } x_i \text{ is displacing tagset.}$ This representation is simpler but requires linear inequalities support in solver.

These constraints describe cache behavior without exponential selection all values of any variable. And often LP is not large. Proposed algorithm is correct, full and defines fast way to get tagsets.

#### 3.2 Register constraints

Additional constraints should be extracted from test template (TsLen is bit-length of tagsets, PfnLen is bit-length of physical frame number field of TLB line, OfsLen is offset bit-length of virtual address):

- 1. low bits of tagset equal to the high bits of virtual address's offset for each instruction "LOAD/STORE x,y" with tagset ts:  $ts_{< TsLen-PfnLen-1..0>} = y_{< OfsLen-1..OfsLen-TsLen+PfnLen+1>};$
- 2. high bits of virtual address correspond to high bits of tagset (physical frame number bits) for each instruction "LOAD/STORE x, y" with tagset ts; these constraints depend on TLB structure; for example,  $y_{< range\ bits>} = line.r \land y_{< vpn\ bits>} = line.vpn$ , where line is TLB line with  $line.pfn = ts_{< TsLen-1} ... TsLen-PfnLen>$ ;
- 3. high bits of virtual address correspond to the segment of virtual memory;
- 4. for each pair "STORE  $x_1, y_1$ " and "LOAD  $x_2, y_2$ " (LOAD follows from STORE)  $phys_1 = phys_2 \wedge phys_1 \notin P \rightarrow x_1 = x_2$ , where  $phys_i = ts_i||y_{i < OfsLen-TsLen+PfnLen..0}\rangle$  (physical addresses),  $i \in \{1,2\}$ , P is set of physical addresses of STORE instructions between these "STORE" and "LOAD" instructions;
- 5. for each pair "LOAD  $x_1, y_1$ " and "LOAD  $x_2, y_2$ " (the second LOAD follows from the first LOAD)  $phys_1 = phys_2 \wedge phys_1 \notin P \rightarrow x_1 = x_2$ , where  $phys_i = ts_i||y_{i < OfsLen-TsLen+PfnLen ... 0>}$  (physical addresses),  $i \in \{1,2\}$ , P is set of physical addresses of STORE instructions between these "LOAD" instructions.

## 3.3 SMT representation of constraints

All generated constraints can be (and should be) represented by constraints on bit-vectors (using bit-extraction and equality relation) and efficiently solved by SMT-solver (we use Yices). *LRU* constraint was represented using usefulness predicates.

## 4 Conclusions

The article has presented the algorithm for generation test programs used in cache-memory testing. The algorithm works well for simple test templates, i.e. test templates with no more than w+1 instructions with cache misses (w is associativity of cache). Further it is planned to increase the

algorithm for test situations with uncached or unmapped access and add possibility to offer changes of cache-memory before test program execution to cover more test templates.

#### References

- C. Barrett and C. Tinelli. CVC3. In W. Damm and H. Hermanns, editors, Proceedings of the 19<sup>th</sup> International Conference on Computer Aided Verification (CAV '07), volume 4590 of Lecture Notes in Computer Science, pages 298–302. Springer-Verlag, July 2007. Berlin, Germany.
- [2] B.Dutertre and L. de Moura. The yices smt solver. *Tool paper: http://yices.csl.sri.com/tool-paper.pdf*, 2006.
- [3] L. de Moura and N.Bjørner. Z3: An efficient smt solver. Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS), 2008.
- [4] F. Fallah and K. Takayama. A new functional test program generation methodology. *Proceedings 2001 IEEE International Conference on Computer Design: VLSI in Computers and Processors*, pages 76–81, 2001.
- [5] A. Kamkin. Combinatorial model-based test program generation for microprocessors. *Preprint 21 of Institute for System Programming of the Russian Academy Sciences*, pages 19–34, 2008.
- [6] D. Kroening and O. Strichman. Decision Procedures an algorithmic point of view. Springer, 2008.
- [7] M.Beardo, F.Bruschi, F.Ferrandi, and D.Sciuto. An approach to functional testing of vliw architectures. *Proceedings of the IEEE International High-Level Validation and Test Workshop* (HLDVT'00), pages 29–33, 2000.
- [8] M.Behm, J.Ludden, Y.Lichtenstein, M.Rimon, and M.Vinov. Industrial experience with test generation languages for processor verification. *Proceedings of the 41st Design Automa*tion Conference (DAC04), 2004.