# Multi-Core Model Checking and Maximum Satisfiability Applied to Hardware-Software Partitioning

# Alessandro Trindade, Hussama Ismail

Federal University of Amazonas

E-mail: {alessandro.b.trindade, hussamaismail}@gmail.com

# Edilson Galvão, Renato Degelo

Federal University of Amazonas

E-mail: {rdegelo, esj.galvao}@gmail.com

# Helder Silva, and Lucas Cordeiro

Federal University of Amazonas

E-mail: prhsilva2012@gmail.com, lucascordeiro@ufam.edu.br

Abstract: We present an alternative approach to solve the hardware and software partitioning problem, which uses Bounded Model Checking (BMC) based on Satisfiability Modulo Theories (SMT) in conjunction with a multi-core support using Open Multi-Processing. In a nutshell, the multi-core SMT-based BMC approach allows initializing many verification instances based on the number of available processing cores. Each instance checks for a different optimum value until the optimization problem is satisfied. We implement our algorithms on top of the Efficient SMT-Based Context-Bounded Model Checker (ESBMC) and also integrate the maximum satisfiability solver  $\nu Z$  tool into ESBMC. We compare all our approaches to another state-of-the-art optimization tool (Matlab). Experimental results show that there is no single optimization tool to solve all hardware-software partitioning benchmarks; however, Matlab and ESBMC- $\nu Z$  are the most efficient ones to solve hardware-software partitioning problems, although multi-core ESBMC had a significant performance improvement in particular cases.

**Keywords:** hardware-software co-design, hardware-software partitioning, optimization, model checking, multi-core, OpenMP.

**Reference** to this paper should be made as follows: Rodríguez Bolívar, M.P. and Senés García, B. (xxxx) 'The corporate environmental disclosures on the internet: the case of IBEX 35 Spanish companies', *International Journal of Metadata, Semantics and Ontologies*, Vol. x, No. x, pp.xxx–xxx.

Biographical notes: Manuel Pedro Rodríguez Bolívar received his PhD in Accounting at the University of Granada. He is a Lecturer at the Department of Accounting and Finance, University of Granada. His research interests include issues related to conceptual frameworks of accounting, diffusion of financial information on Internet, Balanced Scorecard applications and environmental accounting. He is author of a great deal of research studies published at national and international journals, conference proceedings as well as book chapters, one of which has been edited by Kluwer Academic Publishers.

Belén Senés García received her PhD in Accounting at the University of Granada. She is a Lecturer at the Department of Accounting and Finance, University of Granada. Her research interests are related to cultural, institutional and historic accounting and in environmental accounting. She has published research papers at national and international journals, conference proceedings as well as chapters of books.

Both authors have published a book about environmental accounting edited by the Institute of Accounting and Auditing, Ministry of Economic Affairs, in Spain in October 2003.

#### 1 Introduction

Nowadays, with the strong development of embedded systems, the design phase plays an important role. At early stages, the design is split into separated flows: hardware and software. The partitioning decision process, which deals with decisions upon which parts of the application have to be designed in hardware (HW) and which one in software (SW), must be supported by any well-structured methodology. If there is no methodology support, a number of issues, e.g., design flow interruptions, redesigns, and undesired iterations may affect the overall development process, the quality, and the life-cycle of the final system.

Starting at the 1990s, intensive research was performed in HW-SW partitioning, and several approaches proposed, as shown in ? and ?. In ?? was shown that it is possible to use Bounded Model Checking (BMC) based on Satisfiability Modulo Theories (SMT), implemented in a tool called Efficient SMT-Based Context-Bounded Model Checker (ESBMC), in order to perform HW-SW partitioning in embedded systems. The present work extends those studies since there is a substantial improvement in terms of the SMT-based verification method, which has been extended with a multi-core parallel- and binary-search approaches, as well as, the integration of the Maximum SMT (MaxSMT) solver  $\nu Z$  into ESBMC, which is a state-of-art optimization tool based on SMT ?.

Here, we exploit the availability of multi-core processors; in particular, SMT-based verification methods are applied to the HW-SW partition problem in three different ways using a multi-core ESBMC approach with OpenMP ?: ESBMC-SS using a sequential-search (SS), ESBMC-PS using a parallel-search (PS), and ESBMC-PB using a binary-search (BS). Experimental results are compared to ILP (integral linear programming), GA (generic algorithms) in a multi-core version, and also to  $\nu Z$ , which supports only a single-core approach ?. The ILP and GA algorithms are implemented with the optimization toolbox of Matlab ?, while  $\nu Z$  is a built-in tool to the SMT solver Z3. All multi-core ESBMC approaches, together with  $\nu Z$ , are implemented with the ESBMC tool ?.

#### 1.1 Contributions

The main contribution of the present study is to describe and evaluate a comprehensive SMT-based BMC approach in a multi-core architecture applied to solve HW-SW optimization problems. Additionally, we integrate the MaxSMT solver  $\nu Z$  into an off-the-shelf BMC tool, which is typically used for software verification, in order to formulate and solve optimization problems within the logical context of constraints. Experimental results show that multi-core model-checking techniques can be effective, in particular cases, to find the optimal solution of the HW-SW partitioning problem using an SMT-based BMC approach. Although

there is no single tool for efficiently solving all HW-SW partitioning problems, we show that the MaxSMT solver  $\nu Z$  is faster than other state-of-the-art optimization tools for small- and medium-size optimization problems. To the best of our knowledge, this is the first work to use a multi-core SMT-based verification and a MaxSMT solver to check for HW-SW partitioning problems in embedded systems.

#### 1.2 Availability of Data and Tools

Our experiments are based on a set of publicly available benchmarks. All benchmarks, tools, and results of our evaluation are available on a supplementary web page<sup>1</sup>.

This article is organized as follows: Section 2 gives a background on optimization techniques,  $\nu Z$ , ESBMC, and OpenMP tools. Section 3 describes the informal and formal mathematical modeling. The SMT-based BMC method is presented in Section 4, and in particular, Section 4.6 presents the partitioning model using  $\nu Z$ . In Section 5, we show the experimental results using several embedded systems applications. In Section 6, we discuss the related work and we conclude and describe future work in Section 7.

## 2 Background

The HW-SW partitioning problem is typically represented as a set of constraints and an objective function in linear programming. We describe the linear programming problem and present related tools that are used to model and solve the HW-SW partitioning problem.

#### 2.1 Optimization

Optimization is the act of obtaining the best result (i.e., the optimal solution) under given circumstances ?. There is no single method available for efficiently solving all optimization problems ?. The most well-known technique is linear programming, which is an method applicable for the solution of problems in which the objective function and the constraints appear as linear functions of the decision variables. A particular case of linear programming is ILP, in which the variables can assume just integer values. Eq. 1 shows a typical linear programming problem, where A and b are vectors or matrixes that describe the constraints

$$minf^{t}x \ such \ that = \begin{cases} A.x \le b, \\ Aeq.x = beq, \\ x \ge 0. \end{cases}$$
 (1)

In some cases, the time to find a solution using ILP is impractical. Even with the use of powerful computers, a problem can take hours before an optimal solution is reached. If the optimization problem is complex, some

heuristics can be used to solve the same problem faster, e.g., those used in the GA?. The only drawback is that the found solution may not be the global minimum or maximum. Alternatively, tools such as ESBMC and  $\nu Z$  can be used to solve optimization problems so that the global minimum or maximum solution is found. The following sections describe the main features of ESBMC and  $\nu Z$  tools.

## 2.2 Bounded Model Checking with ESBMC

Among the recent model checking techniques, there is one that combines model checking with satisability solving. This technique, known as bounded model checking (BMC), does a very fast exploration of the state space, and for some types of problems, it offers large performance improvements over previous approaches, as shown in ?. In particular, BMC based on Boolean Satisfiability (SAT) has been introduced as a complementary technique to binary decision diagrams for alleviating the state explosion problem ?.

The basic idea of BMC is to check the negation of a given property at a given depth: given a transition system M, a property  $\phi$ , and a bound k, BMC unrolls the system k times and translates it into a verification condition (VC)  $\psi$  such that  $\psi$  is satisfiable if and only if  $\phi$  has a counterexample of depth k or less?. To cope with increasing software complexity, SMT solvers can be used as back-ends for solving the generated VCs, as shown in ?, ?, ?.

In this study, ESBMC has been used as a BMC tool to solve HW-SW partitioning problems ?. In particular, there are two directives in ESBMC that can be used to guide it to solve an optimization problem: ASSUME and ASSERT. The directive ASSUME is responsible for ensuring the compliance of constraints (software costs), and the directive ASSERT controls the halt condition (minimum hardware cost). Then, with some C/C++ code, it is possible to guide ESBMC to solve optimization problems.

#### 2.2.1 ESBMC Architecture

Fig. 1 shows the current ESBMC architecture, which consists of the C/C++ parser, GOTO Program, GOTO Symex, and SMT solver ?. In particular, ESBMC compiles C/C++ code into equivalent GOTO-programs (i.e., control-flow graphs) using a gcc-compliant style. GOTO-programs can then be processed by the symbolic execution engine, called GOTO Symex, where two recursive functions compute the constraints (C) and properties (P); finally, it generates two sets of equations (i.e.,  $C \land \neg P$ ), which are checked for satisfiability by an SMT solver.

The main factor for ESBMC to use only a single-core relies on its back-end (*i.e.*, SMT Solver). Currently, the SMT solvers supported by ESBMC are: Z3?, Boolector?, MathSAT?, CVC4?, and Yices?. Most of them do provide neither multi-threaded support nor a parallel version to solve the generated SMT equations.



Figure 1: ESBMC architecture.

### 2.3 OpenMP

The OpenMP is a set of directives for parallel programming that augments C/C++ and Fortran languages ?. OpenMP supports most processor architectures and operating systems, e.g., Solaris, AIX, HP-UX, Linux, Mac OS X, and Windows. OpenMP uses a portable and very robust model to facilitate the development of parallel applications for a variety of platforms.

In particular, OpenMP uses the *fork-join* model of parallel execution ?. The main thread executes the sequential parts of the program; if a parallel region is encountered, then it forks a team of worker threads. After the parallel region finishes (*i.e.*, the API waits until all threads terminate), then the main procedure returns to the single-threaded execution mode ?.

The most basic directive of OpenMP is the "#pragma omp parallel for", which parallelizes the enclosing loop; a basic OpenMP example is shown below:

```
int k;
2 #pragma omp parallel for
3 for (k = 0; k < 10; k++)
4 a[k] = 2*k;</pre>
```

Algorithm 1: OpenMP basic Example.

In the above example, the *for* loop is executed in parallel. Each iteration of the loop is executed in a separated thread; and each thread may use an idle processor. There is also a way to specify critical regions, which is a code block that is guaranteed to be executed by a single thread at a time. To create a critical region, the "#pragma omp critical" directive is routinely used.

## 2.4 Solving Optimization Problems with $\nu Z$

In this study, the SMT solver Z3 is used to check for the satisfiability of formulas generated from the HW-SW partitioning problem ?. In particular, we exploit the use of MaxSMT solver  $\nu Z$ , which is implemented on top of the SMT solver Z3, in order to solve optimization problems;  $\nu Z$  base function is to optimize objective functions, which formulate optimized criteria, within the logical context of constraints.  $\nu Z$  also includes an incremental version of the Maximum Resiliency (MaxRes) ?, in order to achieve Maximum Satisfiability (MaxSAT) ? and a Simplex to solve numbers without defined patterns.

In  $\nu Z$ , MaxSAT is responsible for the restrictions, while OptSMT optimizes linear arithmetic objectives ?. In summary,  $\nu Z$  provides three main functions that extend Z3 for solving optimization problems, which are: maximize, minimize, and assert-soft.

- maximize(T) this function informs to the solver that a given variable T should be maximized, which includes real, integer, or bit-vector variables.
- minimize(T) this function informs the solver that a given variable T should be minimized, the accepted types are the same as maximize function.
- Assert-Soft F: weight n the function assertsoft adds a restriction to F, which can also add a weight n; the default value is 1.

As an example, one can optimize (K+W), which is subject to restrictions in (K<2) and (W-K<1). The expected result of this optimization problem described in the code below is 2. In fact, the model generated by  $\nu Z$  shows that K=1 and W=1.

```
1 (Declare-Const K Int)
2 (Declare-Const W Int)
3 (assert (< K 2))
4 (assert (< (- W K) 1))
5 (maximize (+ K W))
6 (check-sat)
```

Algorithm 2: OpenMP basic Example.

Fig. 2 shows the  $\nu Z$  architecture. Initially, the SMT formula with objectives is converted to 0-1 constraints, which leads to a Pseudo-Boolean Optimization (PBO) ??. If there are many objective functions,  $\nu Z$  invokes OptSAT for arithmetic or MaxSAT for soft constraints. For constraints using real values,  $\nu Z$  combines linear arithmetic objectives and uses only one instance of OptSMT. When "soft constrains" is used in the mode "lexicographic",  $\nu Z$  invokes MaxSAT using multiple calls for its engine.



Figure 2:  $\nu Z$  architecture extracted from ?.

Z3 is available for platforms in C, C++, Java, .NET, and Python; it is possible to download Z3 with  $\nu$ Z from its github repository ?. In this work, the python API is used to formulate HW-SW partitioning problems using the  $\nu$ Z tool.

### 3 Mathematical modeling

The mathematical modeling of the HW-SW partitioning problem was taken from ??.

## 3.1 Informal Model (or Assumptions)

The informal model can be described by five characteristics. First, there is only one software context, *i.e.*, there is just one general-purpose processor, and there is only one hardware context. The components of the system must be mapped to either one of these two contexts. Second, the software implementation  $\,$ of a component is associated with a software cost, which is the running time of the component. Third, the hardware implementation of a component has a hardware cost, which can be area, heat dissipation, and energy consumption. Fourth, based on the premise that hardware is significantly faster than software, the running time of the components in hardware is considered as zero. Finally, if two components are mapped to the same context, then there is no overhead of communication between them; otherwise, there is an overhead. The consequence of these assumptions is that scheduling does not need to be addressed in this work. Hardware components do not need scheduling, because the running time is assumed to be zero. Because there is only one processor, software components do not need to be scheduled as well. Therefore, the focus is only on the partitioning problem. That configuration describes a first-generation co-design, where the focus is on bipartitioning ?.

### 3.2 Formal Model

The inputs of the problem are: a directed simple graph G = (V, E), called the task graph of the system, is necessary. The vertices  $V = \{V_1, V_2, \ldots, V_n\}$  represent the nodes that are the components of the system that will be partitioned. The edges E represent communication between components. Additionally, each node  $V_i$  has a cost  $h(V_i)$  (or  $h_i$ ) of hardware if implemented in hardware and a cost  $s(V_i)$  (or  $s_i$ ) of software if implemented in software. Finally,  $c(V_i, V_j)$  represents the communication cost between  $V_i$  and  $V_j$  if they are implemented in different contexts (hardware or software).

Based on ?, is called a hardware-software partition if it is a bipartition of  $V: P = (V_h, V_s)$ , where  $V_h \cup V_s = V$  and  $V_h \cap V_s = 0$ . The crossing edges are  $E_p = \{(V_i, V_j): V_i \in V_s, V_j \in V_h \text{ or } V_i \in V_h, V_j \in V_s.$  The hardware cost of P is given by Eq. 2

$$H_p = \sum_{V_i \in V_h} h_i, \tag{2}$$

and the software cost of P (i.e., software cost of the nodes and the communication cost) is given by Eq. 3

$$S_p = \sum_{V_i \in V_s} s_i + \sum_{(V_i, V_i) \in E_p} c(V_i, V_j). \tag{3}$$

Three different optimization and decision problems can be defined. In this paper, the focus is on the case that  $S_0$  is given, *i.e.*, to find a P HW-SW partitioning so that  $S_p \leq S_0$  and  $H_p$  is minimal, which is thus related to system with hard real-time constraints. Based on Eq. 1 and 3, the constraints can be reformulated as

$$s(1-x) + c|Ex| \le S_0, \tag{4}$$

where x represents the decision variable. Concerning the complexity of this problem, ? demonstrate that it is NP-Hard ?.

#### 4 Analysis of the partitioning problem

As computer hardware architecture moves from single-to multi-cores, parallel programming environments should be exploited to take advantage of the ability to run several threads on different processing cores. This section describes the verification algorithm using sequential ESBMC, followed by three multi-core model checking algorithms and the integration of the MaxSMT solver  $\nu$ Z into ESBMC, in order to speed up the HW-SW partitioning verification. HW-SW partitioning using ILP-based and Genetic Algorithms are also explained.

# 4.1 Partitioning problem using ILP-based, Genetic Algorithms

The ILP and GA were taken from our previous studies  $\ref{Mather Mather Mather$ 

# 4.2 Verification Algorithm using Sequential ESBMC

Algorithm 3 shows ESBMC pseudocode with the same constraints and conditions placed on ILP and GA. Two values must be controlled to obtain the results and to perform the optimization. One is the initial software cost, as defined in Section 3.2. The other is the halting condition (code violation) that stops the algorithm.

The ESBMC algorithm starts with the declarations of hardware, software, and communication costs.  $S_0$  must also be defined, as the transposed incidence matrix and

the identity matrix, as typically done in MATLAB. Here, matrices A and b are generated. At that point, the ESBMC algorithm starts to differ from the ILP and GA presented in ?.

It is possible to inform to ESBMC with which type of values the variables must be tested. Therefore, there is a declaration to populate all decision variables x with non-deterministic Boolean values. Those values that change for each test will generate a possible solution and obey the constraints. If this is achieved, then a feasible solution is found and the ASSUME directive is responsible for ensuring the compliance of those constrains (i.e.,  $A.x \leq b$ ).

A loop controls the cost of hardware hint, starting with zero and reaching the maximum value considering the case, where all nodes are partitioned to hardware. To every test performed, the hardware hint is compared to the feasible solution. This is accomplished by an ASSERT statement at the end of the algorithm, a predicate that controls the halt condition (a true-false statement). If the predicate is FALSE, then the optimization is finished, i.e., the solution is found.

The ASSERT statement tests the objective function, i.e., the hardware cost, and will stop if the hardware cost found is lower than or equal to the optimal solution. However, if ASSERT returns a true condition, i.e., the hardware cost is higher than the optimal solution, then the model-checking algorithm restarts and a new possible solution is generated and tested until the ASSERT generates a false condition. When the false condition happens at verification-time, the execution code is aborted and ESBMC presents the counterexample that caused the condition to be broken. That is the point in which the solution is presented (minimum HW cost).

In the ESBMC algorithm, which is shown below, it is not necessary to add slack variables, because the modulus operation is kept, which reduces the number of variables to be solved.

```
Initialize variables
  Declare number of nodes and edges
  Declare hardware cost of each node as array
  Declare software cost of each node as array
  Declare communication cost of each edge (c)
  Declare the initial software cost of (S_0)
  Declare transposed incidence matrix graph G(E)
  Define the solutions variable (X_i) as Boolean
  main {
   For TipH = 0 to Hmax Do  {
10
    Populate X_i with nondeterministic / test values
11
    Calculate s(1-x)+c|Ex| and store at variable
    Requirement is used by Assume (Variable \leq S_0)
    Calculate Hp cost Based on value tested of X_i
     Violation check with Assert(Hp > TipH)
   }
```

Algorithm 3: Pseudocode describing sequential ESBMC.

# 4.3 Multi-core ESBMC with OpenMP (ESBMC-SS)

Typically, ESBMC verification runs are performed only in a single-core. If the processor provides 8 processing cores, only one is used for the verification and the others remain idle. Thus, there is a significant unused hardware resource during this process.

To optimize the CPU resources utilization without modifying the underlying SMT solver, the Open Multi-Processing (OpenMP) library? is used in this present work as a front-end for ESBMC. Fig. 3 shows our first approach called ESBMC sequential-search "ESBMC-SS".



Figure 3: ESBMC-SS approach.

ESBMC-SS obtains the problem specification represented by a ANSI-C program. The HW-SW partitioning is violated, when the correct optimum value (TipH) parameter is reached; ESBMC-SS starts a parallel region with different instances of ESBMC, based on the number of available processing cores. All these ESBMC instances run independently of each other, as shown in Fig. 3. Note that there is no shared-memory (or message-passing) mechanism among threads. In particular, different threads are managed by the OpenMP API, which is responsible for the thread life-cycle: start, running, and dead states, using different values as condition. After executing N instances, if there is no code violation, then ESBMC-SS starts new instances again; this represents a sequential-search on a multi-core environment. During the parallel region execution, if a violation is found in any running thread, then it presents a counterexample with the violation condition and the verification time. If all threads of the batch processing are terminated, then ESBMC-SS finishes its execution.

# 4.4 Multi-core ESBMC with OpenMP using Workers (ESBMC-PS)

The previous parallelization is implemented by continuously forking ESBMC instances in a sequential manner until the first violation is found. However, since OpenMP only returns from a parallelized loop, when every forked thread finishes, some processing cores could remain idle for some period of time.



Figure 4: ESBMC-PS approach.

Consequently, the second approach aims to remove the idle time from the parallel loops, by creating workers inside threads so that the next step is immediately executed if there is a processing core available, as shown in Fig. 4. This approach could potentially lead to great performance improvements, but as ESBMC checks for each step almost at the same rate, the processor does not remain idle for a longer period and thus there is almost no optimization.

## 4.5 Multi-core ESBMC with OpenMP using Binary Search (ESBMC-PB)

The most optimized approach applies a parallelized binary-search to reduce the amount of steps to be executed in order to find the optimal solution. A controller is designed to return the step to be executed so that the number of verification runs are substantially reduced. The parallelized binary search accomplishes this by splitting the domain of possible values into intervals and then by returning the middle of the largest interval so that two new intervals are created.

As an example, given a problem of domain from 1 to 20 (see Fig. 6), we firstly create an initial interval from 1 to 20. When the next available core requests a step to be executed, the controller obtains the largest interval, *i.e.*, [1, 20], divides it by two, which creates two new intervals (*i.e.*, [1, 9] and [11, 20]), and returns the middle of the original interval (*i.e.*, 10). The controller also checks



Figure 5: ESBMC-PB approach.

| 01 | 02 | 03 | 04 | 05  | 06 | 07 | 08 | 09 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
|----|----|----|----|-----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| 01 |    |    |    |     |    |    |    | 09 | 10 | 11 |    |    |    |    |    |    |    |    | 20 |
|    |    |    |    | 0.5 |    |    |    |    | 1  |    |    |    |    | 1  |    |    |    |    |    |
| 01 |    | 03 | 04 | 05  |    |    |    | 09 | J  | 11 |    |    | 14 | 15 | 16 |    |    |    | 20 |
| 01 | 02 | 03 |    | 05  | 06 | 07 |    | 09 | 1  | 11 | 12 | 13 | 14 | ]  | 16 | 17 | 18 |    | 20 |

Figure 6: Binary step calculation.

whether an interval has less than two elements to avoid creating empty or invalid intervals.

Note that there might gaps between steps, which are produced by the customized binary-search. For instance, in the example shown in Fig. 6, if step 10 returns false, then one can conclude that all steps after 10 is false as well. However, if the same step 10 returns true, we can assume that all steps before 10 is true as well. As a result, an auxiliary method to remove unnecessary steps is implemented in the controller by removing or shrinking existing intervals. This approach leads to a high impact in the verification time. However, if a step is running and is not needed anymore, the worker kills the forked process and starts a new one.

Algorithm 4 describes how the customized binary search calculates and returns the step to be executed.

```
GetNextStep(){
     int largestChunk = -1;
     chunk largest;
     for each (chunk in chunks) {
         largestChunk = chunk.right - chunk.left;
         largest = chunk;
      }
     }
9
     chunks.remove(largest);
10
     int median = largest.left + floor((largest.righommunication.dosts,) where the result should be less
11
     if (median > 0)
12
       if (largest.right - largest.left > 1)
13
14
       if(largest.right != largest.left)
15
         chunks.add(new chunk(median + 1, largest.right);
16
```

```
return median;
18
  }
19
```

Algorithm 4: Steps calculation using intervals.

Note that Algorithm 4 is called from each worker in order to get the next step to execute if it exists; otherwise, either zero or a negative number is returned. From lines 4 to 9, the algorithm finds the largest interval. Then, from line 10 the largest interval is removed and the median is calculated in line 11. After that, two new intervals are created, the left side (in line 14) and the right side (in line 16). At the end, the median is returned.

```
step = controller.GetNextStep();
int pid = ExecuteStep(step);
while (isRunning (pid)) {
  if (!controller.isNeeded(step))
    kill (pid);
}
```

Algorithm 5: Worker sample.

Algorithm 5 describes how the worker starts and monitors ESBMC instances. The algorithm starts by retrieving the step to be executed from the controller (line 1), then initiates the ESBMC instance and obtains the process id from the forked process (line 2). While the step is being executed, the controller checks whether this step is still needed (line 4). If not, then the ESBMC instance is killed (line 5) and the worker is free to initiate another step.

## 4.6 Analysis of the partitioning problem using $\nu Z$ $(ESBMC-\nu Z)$

Algorithm 6 encodes the objective function and constraints related to the HW-SW partitioning problem using  $\nu Z$  functions ?. A  $\nu Z$  logical context must firstly be created (line 2), in order to add constraints and to check whether a given model exists to the set of constraints. Note that the number of nodes and edges, software, hardware, and communications costs as well as the incidence matrix E must also be declared.

The arithmetic expressions from lines 10 to 12 represent the constraints described in Eq. 4. Here, variable SC refers to the software cost, while CC denotes the communication cost. In line 12, the Fobj (objective function) is declared, which denotes the product between if (chunk.right - chunk.left > largestChunk) { the hardware cost and the decision variables vector, which contains only Boolean values. Fobj should be minimized to obtain the optimal hardware solution. To achieve this, two constraints are imposed to ESBMC- $\nu$ Z: the first one refers to the sum of the software and than  $S_0$ ; and the second one informs to ESBMC- $\nu$ Z that Fobj should be minimized. Finally, the model is checked chunks.add(new chunk(largest.left, median by  $ESBMC-\nu Z$  and if there is a solution that meets the constraints, then the *Fobj* value is provided.

```
Initialize Variables
     Create vZ context
     Create binary vector (x)
     Declare number of nodes, edges and S_0
     Declare hardware cost of each node as array
     Declare software cost of each node as array
     Declare communication cost of each edge (c)
     Declare transposed incidence matrix graph G(E) ones?.
     Arithmetic Expressions
9
    SC = s(1-x)
10
    CMC = c * |EX|
11
     Fobj = x[i] * h[i]
12
     Assert Constraints
13
     Add constraints (SF + CMC \leq S<sub>0</sub>)
14
     Add constraints to minimize Fobj
15
     Check Model
16
     Print Result
17
```

Algorithm 6: Pseudocode describing ESBMC- $\nu$ Z.

### 5 Experimental Evaluation

This section is split into three parts. The setup is described in Section 5.1, while Section ?? describes all benchmarks that were used for performing the experimental evaluation. Section ?? reports a comparison among Matlab ?, ESBMC-SS, ESBMC-PS, ESBMC-PB, and ESBMC- $\nu$ Z using a set of standard HW-SW partitioning benchmarks ?.

#### 5.1 Experimental Setup

ESBMC v2.0 running on a 64-bit Ubuntu 14.04.1 LTS operating system was used. A parallel approach of the ESBMC-SS, ESBMC-PS, ESBMC-PB were implemented in C++11. Version 2.0.1 of Boolector SMT-solver ? (freely available) was used as the default solver for ESBMC. ESBMC- $\nu$ Z as a built-in tool to Z3 was also used ?. For ILP and GA formulations, MATLAB R2013a from MathWorks with Parallel Computing Toolbox was used ?. MATLAB is a dynamically typed high-level language, known as the state-of-the-art mathematical software ? and is widely used by the engineering community ?.

All experiments were conducted on an otherwise idle Intel Core i7-2600 (8-cores), with 3.4 GHz and 24 GB of RAM, running Ubuntu 64-bits. Each time was measured 3 times (average taken). Based on standard deviation and tolerance interval to each set of time sample, it was obtained a statistical confidence of 91.7% to ESBMC (sequential, SS, PB and  $\nu$ Z), 95.9% to ESBMC-PS, and 92.0% to ILP and GA. A timeout condition (TO) is reached when the verification time is longer than 3600 seconds. A memory-out (MO) occurs when the tool reaches 15 GB of memory.

#### 6 Related Work

Since the second half of the first decade of 2000s, three main paths have been tracked to improve or to present (halternative solutions to the optimization of HW-SW (spartitioning, *i.e.*, to find the exact solution?, to use heuristics to speed up performance time?, and hybrid E) ones?

In the first group, the exact solution to the HW-SW partitioning problem is found. The use of SMT-based verification presented in this paper can be grouped into this category, because the exact solution is found with the given algorithms. The difference is based only on the technique chosen to solve the problem. Another path followed in past initiatives and which has had more studies is the creation of heuristics to speed up the running time of the solution. The difference of this kind of solution to SMT-based verification and maximum satisfiability is based on two facts: all ESBMC implementations guarantee to find the exact solution, but heuristics are faster, when the complexity is greater.

Finally, there are approaches that mixes heuristics with exact solution tools. The idea is to use a heuristic to speed up some phase of an exact solution tool. It worth mentioning that the final solution is not necessarily an optimal global solution. Only the SMT-based verification is guaranteed to find the exact solution, but hybrid algorithms are faster when complexity rises.

In terms of SMT-based verification, most related studies are restricted to present the model, its modification to programming languages (e.g., C/C++and Java), and the application to multi-thread algorithms or to embedded systems to check for program correctness. In? it presents a bounded model checker for C++ programs, which is an evolution of dealing with C programs and ? uses ESBMC for embedded ANSI-C software. In? and? it was proven that it is possible to use ESBMC to solve HW-SW partitioning in a singleand multi-core way. There are related studies focused on decreasing the verification time of model checkers by applying Swarm Verification ?, and modifications of internal search engines to add support for parallelism? but there is still the need for initiatives related to parallel SMT solvers ?.

Recently, the SMT solver Z3 has been extended to pose and solve optimization problems modulo theories ?. In particular,  $\nu Z$  tool offers substantial performance improvement in optimization problems ??. As an application example, ? propose an approach which considers all possible compiler error sources for statically typed functional programming languages and reports the most useful one subject to some usefulness criterion. The authors formulate this approach as an optimization problem related to SMT and use  $\nu Z$  to compute an optimal error source in a given ill-typed program. The approach described by ?, which uses MaxSMT solver  $\nu Z$ , shows a significant performance improvement if compared to previous SMT encodings and localization algorithms.

The problem addressed in this present paper uses a single objective function for minimization. In ?, OptiMathSAT and  $\nu$ Z are compared for software optimization problems. OptiMathSAT, using multiple objective functions, works better than  $\nu$ Z; however, for problems with a single objective function, the performance of  $\nu$ Z is better than OptiMathSAT.

#### 7 Conclusions

We presented five approaches to solve the HW-SW partitioning problem and compared them to other state-of-the-art techniques. Experimental results showed that for a number of nodes larger than 300, the best solution for the HW-SW partitioning problem is ILP. Below that, the best solution turns out to be ESBMC- $\nu$ Z since its execution time is faster and notorious. ESBMC-PB is a viable alternative for a number of nodes lower than 150. GA had an intermediate result in terms of performance, but the error presented from exact solution made it not acceptable to that kind of application.

If considering off-the-shelf tools, as MATLAB to ILP and GA, the coding is simpler. However, ESBMC and  $\nu$ Z have BSD-Style and MIT licenses, respectively and can be downloaded and used for free. Experimental results also pointed to an improvement of ESBMC, when using a parallel approach. In particular, all three parallel approaches described in this paper produced expressive results. The fastest ESBMC approaches is ESBMC-PB, which produces good results for an intermediate amount of edges and nodes. Thus, considering that nowadays processors have more and more cores, when modeling the problem, it is possible to consider multicore model checking as an alternative to solve the HW-SW partitioning problem.

Finally, there is an issue about 150 nodes problem, since it seems to be the limit of multi-core ESBMC. However, it really depends on the modeling granularity of the problem. Some researchers propose fine-grained models, in which each instruction can be mapped to either HW or SW. This may lead to thousands of nodes or even more. Others defend coarse-grained models, where decisions are made for larger components, thus even complex systems may consist of just some dozens of nodes to partition. In principle, a fine-grained approach may allow to obtain better partitions, but at the cost of an exponential increase of the search space size. In future work, we will address improvements in ESBMC to remove the parallel layer on top of ESBMC and implement it during symbolic execution so that we can optimize the overall verification time.

### References

Myneni, S. and Patel, V.L. (2010) 'Organization of biomedical data for collaborative scientific research: a research information management system',

- International Journal of Information Management, Vol. 30, No. 3, pp.256–264
- Bekelman, J.E., Li, Y. and Gross, C.P. (2003) 'Scope and impact of financial conflicts of interest in biomedical research: a systematic review', *JAMA*, Vol. 289, No. 19, pp.454–65
- Stein, L.D. 2008 'Towards a cyberinfrastructure for the biological sciences: progress, visions and challenges', *Nature Reviews Genetics*, Vol. 9, No. 9, pp.678–688.
- NIH Statement on Sharing Scientific Research Data, http://grants.nih.gov/grants/policy/data\_sharing/
- Policy for sharing of data obtained in NIH supported or conducted genome-wide association studies, http://grants.nih.gov/grants/guide/notice-files/not-od-07-088.html
- Network for Translational Research (NTR): Optical Imaging in Multimodality Platforms, http://imaging.cancer.gov/programsandresources/specializedir
- Piwowar, H., Becich, M., Bilofsky, H. and Crowley, R. (2008) 'PLoS medicine', Sept, No. 9, Towards a Data Sharing Culture: Recommendations for Leadership from Academic Health Centers, Vol. 5.
- Birnholtz, J.P. and Bietz, M.J. (2003) 'Data at work: supporting sharing in science and engineering', *GROUP*, pp.339–348.
- Data Sharing & Intellectual Capital (DSIC) Workspace, https://cabig.nci.nih.gov/working\_groups/DSIC\_SLWG
- Getting Connected with caBIG, https://cabig.nci.nih.gov/ getting\_connected/
- Digital Imaging and Communications in Medicine (DICOM), http://medical.nema.org/
- The Cancer Genome Atlas (TCGA) Data Portal, http://cancergenome.nih.gov/dataportal
- National Biomedical Imaging Archive, https://cabig.nci.nih.gov/tools/NCIA
- caBIG: cancer Biomedical Informatics Grid, http://caBIG.nci.nih.gov/
- Biomedical Informatics Research Network, http://www.nbirn.net/
- Cyberinfrastructure for the Biological Sciences: Plant Science Cyberinfrastructure Collaborative (PSCIC), http://www.nsf.gov/pubs/2006/nsf06594/nsf06594.htm
- MIRC, http://mirc.rsna.org
- Foster, I. and Iamnitchi, A. (2003) 'On death, taxes, and the convergence of peer-to-peer and grid computing', *IPTPS'03*.

- Keidl, M., Kreutz, A., Kemper, A. and Kossmann, D. (2002) 'A publish & subscribe architecture for distributed metadata management', *ICDE*.
- Taylor, N.E. and Ives, Z.G. (2006) 'Reconciling while tolerating disagreement in collaborative data sharing', *SIGMOD*.
- Rader, E.J. and Wash, R. (2008) *CSCW*, pp.239-248, Influences on tag choices in del.icio.us.
- Halevy, A., Rajaraman, A. and Ordille, J. (2006) VLDB, Data integration: the teenage years, http://portal.acm.org/citation.cfm?id=1182635.1164130
- Doan, A. and Halevy, A.Y. (2005) Semantic Integration Research in the Database Community: A Brief Survey, *AI Magazine*, Vol. 26, No. 1, pp.83–94.
- Beynon-Davies, P., Bonde, L., McPhee, D. and Jones, C.B. (1997) 'A collaborative schema integration system', *Comput. Supported Coop. Work*, Vol. 6, No. 1, Norwell, MA, USA, pp.1–18, issn = 0925-9724.
- Wang, F. and Vergara-Niedermayr, C. (2008) 'Collaboratively Sharing Scientific Data', CollaborateCom, pp.805–823.
- Chin Jr., G. and Lansing, C.S. (2008) 'Capturing and Supporting Contexts for Scientific Data Sharing via the Biological Sciences Collaboratory', *CSCW*, ISBN 1-58113-810-5.
- W3C XML Query (XQuery), http://www.w3.org/XML/Query
- Wang, F., Hussels, P. and Liu, P. (2009) 'Securely and flexibly sharing a biomedical data management system', *SPIE*.