# Quantum Circuit Transformation Based on Subgraph Isomorphism and Tabu Search\*

First Author  $^{1[0000-1111-2222-3333]},$  Second Author  $^{2,3[1111-2222-3333-4444]},$  and Third Author  $^{3[2222-3333-4444-5555]}$ 

 Princeton University, Princeton NJ 08544, USA
 Springer Heidelberg, Tiergartenstr. 17, 69121 Heidelberg, Germany lncs@springer.com

http://www.springer.com/gp/computer-science/lncs

ABC Institute, Rupert-Karls-University Heidelberg, Heidelberg, Germany
{abc,lncs}@uni-heidelberg.de

Abstract. Circuit transformation aims to find an automatic method that maps any logical quantum circuit to a physical circuit effectively, in an acceptable time, while adding as few auxiliary gates as possible. We propose an initial mapping algorithm based on a combined subgraph isomorphism algorithm (CSI) and a circuit transformation algorithm based on Tabu Search (QCTS). Our experimental results show that the algorithms are effective. Compared with the initial mapping based on the VF2 algorithm, our initial mapping reduces the number of added auxiliary gates by 22.26% and the depth of the output circuit by 11.17%. Compared with other state-of-the-art algorithms, QCTS scales to large circuits.

**Keywords:** Quantum circuit transformation  $\cdot$  Subgraph isomorphism  $\cdot$  Initial mapping  $\cdot$  Tabu Search

#### 1 Introduction

Quantum technology has been applied in practice, but large quantum computers have not yet been built. Most of the contributions of quantum information to computer science are still in the theoretical stage. In March 2017, IBM developed the first 5-qubit backend, called IBM QX2. In June, it launched a 16-qubit backend called IBM QX3. The revised versions of the 5-qubit and 16-qubit backends are called IBM QX4 and IBM QX5, respectively. IBM Q Experience provides the public with free quantum computer resources on the cloud, and IBM has open-sourced the quantum computing software framework Qiskit$^4$.

The biggest problem facing quantum information is quantum decoherence, which is caused by the entanglement of a quantum system with its surrounding environment. Because of decoherence, quantum gates must complete within the coherence period, and the time that qubits remain in a coherent state is short. The longest coherence time of a superconducting quantum chip is still within 10 µs to 100 µs, while a single-qubit gate takes about 20 ns, a 2-qubit gate about 40 ns, and a measurement operation about 300 ns to 1 µs. It is unrealistic to use quantum error correction in the circuit mapping process, since there are only dozens of qubits in the NISQ era [16]. Quantum algorithms do not consider any hardware connectivity constraints, so circuits must be transformed by adding auxiliary gates to satisfy logical and physical constraints. Quantum circuit transformation is therefore an important part of quantum circuit compilation, and we require a set of highly efficient and automatic mapping procedures to handle it. We call this adjustment of the circuit mapping circuit transformation. The process may introduce many errors, which brings a huge challenge to circuit compilation because noise has a greater impact on the final circuit and may make the result meaningless.

<sup>\*</sup> Supported by organization x.

<sup>4</sup> https://www.qiskit.org/.

Paler proved that the initial mapping has an important influence on quantum circuit transformation [15]. Paler used a heuristic method to find the initial mapping and IBM's compiler as a benchmark. Preliminary results show that just by placing qubits in positions different from the default (trivial placement) for actual circuit instances on actual NISQ devices, the cost can be reduced by up to 10%. In 2018, Li proposed a novel reverse traversal technique, which determines the initial mapping by considering the entire circuit [9]. Zhou proposed an annealing algorithm to find an initial mapping, but it is unstable [23]. In 2020, Li used the VF2 subgraph isomorphism algorithm to generate an initial mapping [10].

The goal of a circuit transformation algorithm is to find a transformation that inserts a minimum number of SWAPs. There are currently five main methods for solving the quantum circuit transformation problem.

*Unitary matrix factorization algorithm*. The first method uses the unitary matrix factorization algorithm to rearrange the quantum circuit from the beginning while retaining the input circuit [8, 14].

*Converting into some existing problems*. The second method converts the quantum circuit transformation problem into an existing problem, such as AI planning [22, 3], Integer Linear Programming (ILP) [1], or Satisfiability Modulo Theories (SMT) [12]. These approaches use off-the-shelf tools to find acceptable results, so they cannot take advantage of certain properties of quantum mapping. Furthermore, they may run for a long time and are applicable only to small-scale quantum circuits.

*Exact methods*. The exact method is only suitable for simple quantum architecture and cannot be extended to complex quantum architecture [19].

*Graph theory*. In [17], Shafaei used the minimum linear permutation problem in graph theory to model the problem of reducing the interaction distance. The method divides a given circuit into several subcircuits and applies the minimum linear permutation problem to each of them. Then it turns non-adjacent gates in the subcircuits into adjacent gates by adding auxiliary gates. Finally, it uses the minimum linear permutation problem to find an appropriate permutation and bubble sort to calculate the number of SWAP gates needed. Guerreschi and Matsuo proposed two-step methods that reduce quantum circuit transformation to graph problems in order to minimize the number of auxiliary gates, based on the graph coloring problem and the largest subgraph isomorphism problem [7,11].

*Heuristic search*. Heuristic search uses an evaluation function to obtain an acceptable solution in exponential time. Zulehner layered the circuits, grouped the gates that could be executed in parallel into the same layer, and then determined compatible mappings for each of these layers so as to add as few auxiliary gates as possible. Zhou designed a heuristic search algorithm with a novel selection mechanism [23]. Instead of applying the lowest-cost operation, the algorithm looks ahead one step and then chooses the best sequence of operations. In this way, it can effectively avoid local minima. Moreover, a pruning mechanism is introduced to reduce the size of the search space and ensure that the program terminates in a reasonable amount of time. The time complexity of this algorithm is  $O(|V|^4)$ .

Li proposed a SWAP-based search algorithm, SABRE [9]. Compared with previous search algorithms based on exhaustive mapping, SABRE achieves an exponential reduction in search complexity, which makes it scalable to the large quantum devices of the NISQ era. The routing algorithm implemented in  $t|ket\rangle$  can ensure that any quantum circuit is compiled to any architecture [4]. The algorithm is divided into four stages: decomposing the input circuit into time steps, determining the initial mapping, routing across time steps, and finally cleaning up. The heuristics in  $t|ket\rangle$  give the same or better results than other circuit transformation systems in terms of depth and total number of gates in the compiled circuit, with much shorter running times, and can handle larger circuits. Tannu proposed a variation-aware qubit movement strategy, which exploits the variation in error rates, together with a variation-aware quantum circuit transformation strategy that tries to select the route with the lowest probability of failure [21]. This strategy uses the error rate of SWAP operations to allocate logical qubits to physical qubits, thus avoiding paths with high error rates as much as possible.

The main contributions of this paper are as follows.

- 1. We propose a combined subgraph isomorphism algorithm (CSI) to generate the initial mapping, a problem that can be reduced to subgraph isomorphism. We use a suitable subgraph isomorphism algorithm to generate part of the initial mapping and then complete the mapping based on the connectivity between qubits.
- 2. We propose a heuristic circuit transformation algorithm based on Tabu Search (QCTS) [6], which can handle large circuits in a short time at a low cost. Compared with previous exact search and heuristic algorithms, it completes the circuit transformation in a shorter time. QCTS can complete the search for the 159 circuits [25] in only a few minutes, whereas other heuristic search algorithms cannot process them in a few minutes, especially the large circuits.

The rest of this paper is organised as follows. In Section 2 we recall some background of quantum computing and quantum information. We propose the problems of the transformation of quantum circuits in Section 3. Section 4 describes and analyses our algorithm in detail. The experimental results are reported in Section 5. The last section concludes the paper and discusses future research.

### 2 Background

This section introduces some notions and notations of quantum computing and quantum information.

#### 2.1 Qubits

Classical information is stored in bits, while quantum information is stored in qubits. Besides the two basic states  $|0\rangle$  and  $|1\rangle$ , a qubit can be in any linear superposition  $|\phi\rangle=a\,|0\rangle+b\,|1\rangle$ , where  $a,b\in\mathbb{C}$  satisfy  $|a|^2+|b|^2=1$ . When measured,  $|\phi\rangle$  is found in the state  $|0\rangle$  with probability  $|a|^2$  or in the state  $|1\rangle$  with probability  $|b|^2$ .

#### 2.2 Quantum Gate

Commonly used quantum gate symbols and their matrices are shown in Fig. 1. A physical qubit and a logical qubit are denoted by $\mathbf{q}$ and $q$, respectively.

| Gate                 | Symbol | Matrix                                                                                             |
|----------------------|--------|----------------------------------------------------------------------------------------------------|
| Hadamard gate        | H      | $\frac{1}{\sqrt{2}} \left[ \begin{array}{cc} 1 & 1 \\ 1 & -1 \end{array} \right]$                  |
| Pauli-X gate         | X      | $\left[\begin{array}{cc} 0 & 1 \\ 1 & 0 \end{array}\right]$                                        |
| Pauli-Y gate         | Y      | $\left[\begin{array}{cc} 0 & -i \\ i & 0 \end{array}\right]$                                       |
| Pauli-Z gate         | Z      | $\left[\begin{array}{cc} 1 & 0 \\ 0 & -1 \end{array}\right]$                                       |
| Phase gate           | S      | $\left[\begin{array}{cc} 1 & 0 \\ 0 & i \end{array}\right]$                                        |
| $\frac{\pi}{8}$ gate | T      | $\left[\begin{array}{cc} 1 & 0 \\ 0 & e^{i\pi/4} \end{array}\right]$                               |
| CNOT gate            | $\bullet$ (control), $\oplus$ (target) | $\left[\begin{array}{cccc} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{array}\right]$ |

Fig. 1. The symbols of common quantum gates and their matrices

#### 2.3 Quantum Circuit

A quantum logical circuit LC (see Fig. 2) consists of quantum gates interconnected by quantum wires [5]. A quantum wire is a mechanism for moving quantum data from one location to another. Each line represents a qubit, and a gate on a line acts on the corresponding qubit. A quantum logical circuit is executed from left to right. The width w of a circuit is the number of qubits in the circuit. The depth d of a circuit is the number of layers, where the gates within one layer are executed in parallel. The directed acyclic graph (see Fig. 3) of a circuit is obtained by parallelizing and layering the circuit by topological sorting. For example, the depth of the circuit in Fig. 2 is 6, and its width is 5. In this paper, circuits with a depth less than 100 are called small-sized circuits, circuits with a depth greater than 1000 are called large-sized circuits, and the rest are medium-sized circuits. It is unnecessary to consider single-qubit gates in circuit transformation, since they act locally on one qubit [18]. The logical architecture graph (LAG)  $\mathcal{AG}_L$  is generated by regarding the qubits in LC as nodes V and the 2-qubit gates as edges E.



Fig. 2. Original circuit



Fig. 3. The directed acyclic graph (DAG) of original circuit in Fig. 2



Fig. 4. (a) The architecture graph of original circuit in Fig. 2. (b) The partial architecture graph of IBM Q20.



**Fig. 5.** The above circuit changes the direction of the *CNOT* gate by adding four H gates, and below is the circuit of the SWAP gate.

#### 2.4 Architectures

We mainly discuss the physical circuits of the IBM Q series. Let  $\mathcal{AG}_{\mathcal{P}} = (V_P, E_P)$  denote the architecture graph of the physical circuit (PAG), where  $V_P$  denotes the set of physical qubits and  $E_P$  represents the set of directed edges on which CNOT gates can be executed. Fig. 6 (a) and (b) are the PAGs of the 5-qubit IBM QX2, (c) and (d) are the PAGs of the 16-qubit IBM QX3, and (e) is the PAG of IBM Q20. The arrow direction in the figure indicates the control direction of the gate, and 2-qubit gate operations can only be performed between qubits connected by an edge. IBM physical circuits only support single-qubit gates and CNOT gates between two adjacent qubits. Fig. 4(a) is the logical architecture graph of the original circuit in Fig. 2, and Fig. 4(b) is the partial architecture graph of IBM Q20.

Given a logical circuit LC, a physical architecture graph  $\mathcal{AG}_P$ , an initial mapping  $\tau$ , and a CNOT gate  $g = \langle q_i, q_j \rangle$ , where  $q_i$  is the control qubit and  $q_j$  is the target qubit, gate g is executable if  $\langle \tau(q_i), \tau(q_j) \rangle$  is a directed edge of  $\mathcal{AG}_P$ .

Example 1. Fig. 4 (a) is the logical architecture graph of the circuit in Fig. 2, and Fig. 4 (b) is the partial architecture graph of IBM Q20. Suppose the initial mapping is  $\tau = \{q_0 \to q_{10}, q_1 \to q_0, q_2 \to q_6, q_3 \to q_5, q_4 \to q_{11}\}$ . Then  $g_0 = \langle q_2, q_1 \rangle$  is not executable, since  $\langle \tau(q_2), \tau(q_1) \rangle = \langle q_6, q_0 \rangle$  does not exist in  $\mathcal{AG}_P$ . But  $g_3 = \langle q_1, q_3 \rangle$  is executable, since  $\langle \tau(q_1), \tau(q_3) \rangle = \langle q_0, q_5 \rangle$  exists in  $\mathcal{AG}_P$ .
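The executability test of Example 1 can be illustrated with a minimal Python sketch. The function name `is_executable`, the integer encoding of qubits, and the edge list are assumptions made for illustration; the edge list is only a small fragment of IBM Q20 containing the couplings among physical qubits 0, 1, 5 and 6 that the examples in this paper rely on.

```python
def is_executable(gate, tau, edges):
    """Return True if the CNOT <control, target> lands on a directed edge."""
    control, target = gate
    return (tau[control], tau[target]) in edges


# Assumed fragment of the IBM Q20 graph (not the full device).
couplings = [(0, 1), (0, 5), (1, 6), (5, 6)]
# Q20 edges are bidirectional, so both directions are stored.
edges = set(couplings) | {(b, a) for a, b in couplings}

# Initial mapping from Example 1: logical index -> physical index.
tau = {0: 10, 1: 0, 2: 6, 3: 5, 4: 11}

g0 = (2, 1)  # maps to <6, 0>, which is not an edge
g3 = (1, 3)  # maps to <0, 5>, which is an edge
print(is_executable(g0, tau, edges), is_executable(g3, tau, edges))
```

For a directed architecture such as QX2, only the listed direction of each coupling would be stored, so a reversed CNOT would fail this test even though its qubits are adjacent.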



Fig. 6. IBM QX architectures

## 3 Problem Analysis

Problem in qubit mapping. Single-qubit gates and CNOT gates are used as basic gates, since together they can implement any quantum circuit and are supported by the IBM QX architecture. Before circuit transformation, the circuit should be simplified to a circuit with only single-qubit gates and CNOT gates [13, 2]. We insert auxiliary gates (see Fig. 5) to move two non-adjacent qubits to adjacent positions or to change the direction of a CNOT gate, but this process may introduce errors. We hope to find a circuit transformation algorithm that produces an output circuit with the minimum number of auxiliary gates and the minimum circuit depth in an acceptable amount of time. Both subgraph isomorphism and circuit transformation are NP-complete [19]. A quantum circuit transformation problem mainly includes the following four steps.

- 1. Preprocess the logical quantum circuit. This includes extracting the LAG of the circuit, adjusting the life cycle of qubits (following the work of Zhang [24]), and calculating the shortest paths of the physical circuit.
- 2. Compute isomorphic substructures. We use the subgraph isomorphism algorithm of Sun [20] to find part of the initial mapping.
- 3. Generate a high-quality initial mapping. We perform mapping completion because the remaining nodes cannot satisfy all isomorphism requirements. According to the connectivity between the unmapped nodes and the mapped nodes, unmapped nodes are mapped to the neighborhood of mapped nodes, which satisfies the connectivity of part of the structure and reduces the length of the shortest paths.
- 4. Transform logical circuits to meet physical constraints. Circuit transformation problems need to be solved before implementing quantum circuits, since the design of quantum algorithms does not refer to the connectivity constraints of any hardware. Therefore, circuit transformation forms a necessary stage of any quantum compiler.

Fig. 7. Circuit transformation process

#### 4 Solution

The solution proposed in this paper mainly includes preprocessing, initial mapping, and circuit transformation algorithm based on Tabu Search. In this section we will introduce them in detail.

#### 4.1 Preprocessing

Before transforming the circuit with Tabu Search, we preprocess it to obtain data that shorten the search time and reduce the search space. In the preprocessing stage, we adjust the circuit of the input OpenQASM program to shorten the life cycle of qubits. Then we calculate the shortest distance between every pair of nodes on the architecture graph.

Circuit Adjustment. We use a layered method to analyze the life cycle of qubits [24] and pack the gates that can be executed in parallel into a bundle, forming a layered bundle format. A conversion method is designed to use the layered bundle format to determine which gates can be moved, which reduces the life cycle of these qubits. The algorithm reduces the error rate of quantum programs by 11%. In most quantum workloads, the longest qubit lifetime and the average qubit lifetime can be reduced by more than 20%, and the execution time of some quantum programs can also be reduced.

Shortest Distance. Given the PAG with every edge of length 1, we use the Floyd-Warshall algorithm to calculate the shortest distance matrix dist[i][j], which represents the shortest distance from  $\mathbf{q}_i$  to  $\mathbf{q}_j$ .

For IBM QX2, QX3, QX4, and QX5, a SWAP operation needs 7 gates (3 CNOT gates and 4 H gates), and only 4 H gates are needed to change the direction of a CNOT gate between adjacent qubits. For a CNOT gate  $g = \langle q_i, q_j \rangle$ , suppose the two logical qubits are mapped to the physical qubits  $\mathbf{q}_m$  and  $\mathbf{q}_n$ , i.e.,  $\tau(q_i) = \mathbf{q}_m$  and  $\tau(q_j) = \mathbf{q}_n$ . Then the cost of executing g along a shortest path is  $cost_{cnot}(q_i, q_j) = 7 \times (dist[m][n] - 1)$ . For IBM Q20, in which all edges are bidirectional, a SWAP operation requires 3 CNOT gates, so the cost is  $cost_{cnot}(q_i, q_j) = 3 \times (dist[m][n] - 1)$ . Computing the distance matrix takes  $O(N^3)$  time, where N is the number of physical qubits.
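The distance matrix and the cost formula can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names are ours, edge directions are ignored when measuring distance, and the 4-qubit line graph is a toy architecture rather than an IBM device.

```python
import math


def floyd_warshall(n, couplings):
    """All-pairs shortest distances on an unweighted, undirected graph."""
    dist = [[0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for a, b in couplings:
        dist[a][b] = dist[b][a] = 1
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def cost_cnot(dist, m, n, swap_cost):
    """Gate cost of bringing physical qubits m and n next to each other."""
    return swap_cost * (dist[m][n] - 1)


# Toy line architecture 0 - 1 - 2 - 3 (an assumption for illustration).
dist = floyd_warshall(4, [(0, 1), (1, 2), (2, 3)])
print(cost_cnot(dist, 0, 3, swap_cost=7))  # QX-style SWAP of 7 gates
print(cost_cnot(dist, 0, 3, swap_cost=3))  # Q20-style SWAP of 3 gates
```

The triple loop is what gives the cubic running time; since the architecture graph is fixed, the matrix only needs to be computed once per device.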

Example 2. Take the QX5 architecture as an example. Suppose there is a CNOT gate  $g = \langle q_i, q_j \rangle$ , where  $q_i$  is mapped to  $q_1$ ,  $q_j$  is mapped to  $q_{14}$ , and the shortest distance between them is dist[1][14] = 3. There are 3 shortest paths along which  $q_1$  can be moved to a position adjacent to  $q_{14}$ :  $\Pi = \{\pi_0, \pi_1, \pi_2\}$ , where  $\pi_0 = q_1 \rightarrow q_2 \rightarrow q_3 \rightarrow q_{14}$ ,  $\pi_1 = q_1 \rightarrow q_2 \rightarrow q_{15} \rightarrow q_{14}$ , and  $\pi_2 = q_1 \rightarrow q_0 \rightarrow q_{15} \rightarrow q_{14}$ . Their costs are  $cost_{\pi_0} = 18$ ,  $cost_{\pi_1} = 14$ , and  $cost_{\pi_2} = 14$ , respectively.

Circuit Layering. Quantum gates acting on different qubits can execute in parallel. Thus, we layer the adjusted circuit by traversing the entire program sequentially: if a gate shares no qubit with the gates already in the current layer, it is added to that layer. Otherwise, a new layer is added.  $L(LC) = \{\mathcal{L}_0, \mathcal{L}_1, ..., \mathcal{L}_n\}$  represents the layered circuit, where  $\mathcal{L}_i$   $(0 \le i \le n)$  represents a set of quantum gates that can be executed in parallel. The quantum gate sets separated by the dotted lines in Fig. 2 are  $\mathcal{L}_0 = \{g_0, g_1\}, \mathcal{L}_1 = \{g_2\}, \mathcal{L}_2 = \{g_3, g_4\}, \mathcal{L}_3 = \{g_5, g_6\}, \mathcal{L}_4 = \{g_7\}, \mathcal{L}_5 = \{g_8\}.$ 
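One possible reading of this layering step is the greedy pass sketched below in Python. The function name `layer_circuit` and the gate list are assumptions for illustration: only the first and fourth gates correspond to $g_0 = \langle q_2, q_1 \rangle$ and $g_3 = \langle q_1, q_3 \rangle$ from Example 1, and the remaining gates are made up, since the full circuit of Fig. 2 is not reproduced here.

```python
def layer_circuit(gates):
    """Greedily group consecutive CNOTs that touch disjoint qubits."""
    layers = []
    busy = set()  # qubits already used by the current layer
    for gate in gates:
        qubits = set(gate)
        if layers and not (qubits & busy):
            layers[-1].append(gate)   # fits into the current layer
        else:
            layers.append([gate])     # conflict: open a new layer
            busy = set()
        busy |= qubits
    return layers


# Hypothetical gate sequence, each gate given as (control, target).
gates = [(2, 1), (0, 4), (1, 0), (1, 3), (2, 4)]
for index, layer in enumerate(layer_circuit(gates)):
    print(index, layer)
```

This sketch only compares a gate with the current layer; a scheduler that places each gate in the earliest possible layer may produce a different grouping.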

At the same time, we generate the logical circuit architecture graph  $\mathcal{AG}_{\mathcal{L}} = (V_L, E_L)$ , which is an undirected graph.  $V_L$  contains the vertices and the degree of each vertex, and  $E_L$  represents the set of undirected edges, one for each pair of qubits on which a CNOT gate acts.
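As a small illustration, the logical architecture graph can be extracted from the list of CNOT gates as follows. The function name `build_lag`, the use of `frozenset` for undirected edges, and the sample gates are assumptions, not part of the described implementation.

```python
from collections import defaultdict


def build_lag(gates):
    """Return the undirected edge set and vertex degrees of the LAG."""
    # A frozenset ignores the control/target direction of each CNOT.
    edges = {frozenset(gate) for gate in gates}
    degree = defaultdict(int)
    for edge in edges:
        for qubit in edge:
            degree[qubit] += 1
    return edges, dict(degree)


# Hypothetical CNOT list; (2, 1) and (1, 2) collapse into one edge.
lag_edges, lag_degree = build_lag([(2, 1), (1, 2), (1, 3)])
print(len(lag_edges), lag_degree)
```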

#### 4.2 Initial Mapping

It has been proved that the initial mapping has an important influence on quantum circuit transformation, and the initial mapping problem can be reduced to subgraph isomorphism. We therefore use a subgraph isomorphism algorithm to find an initial mapping that helps to minimize the number of auxiliary gates added to the output circuit.

It is almost impossible to find a subgraph of the PAG that exactly matches the LAG, so we look for a partial mapping that maximizes the number of matched nodes. SubgraphCompare [20] compares combinations of the components of several state-of-the-art subgraph isomorphism algorithms. It shows that the best combination uses the filtering and ordering ideas of the GraphQL algorithm to process candidate nodes, and the set-intersection-based local candidate computation of LFTJ to enumerate the results. Since SubgraphCompare cannot handle disconnected graphs, we artificially connect each isolated qubit to the qubit with the largest degree in the logical architecture graph. We choose the node with the largest degree in order to minimize the impact on the logical dependency graph.

The input of Algorithm 1 is a target graph  $(\mathcal{AG}_P)$ , a query graph  $(\mathcal{AG}_L)$ , and the set of partial mappings T. First, we initialize an empty queue Q, which stores the unmatched nodes in a mapping  $\tau \in T$ . Then we traverse  $\tau$  and add the unmatched nodes to the queue. We could match the remaining unmatched nodes randomly, but this may map a node far away from the other nodes. Instead, we try to map them to unmatched nodes in the more concentrated area of  $\mathcal{AG}_P$ : if an unmatched node has an edge to a matched node in the query graph, it is first matched to one of the nodes adjacent to the image of that matched node. In this way, a dense graph is generated, which can reduce subsequent SWAP operations. Finally, the algorithm obtains all initial candidate mappings and outputs them to a file.

In Algorithm 1, line 2 calculates the maximum number l of matched qubits over the mappings in T between logical qubits and physical qubits obtained by SubgraphCompare. Lines 3-41 complete the unmapped logical qubits in every mapping whose number of matched nodes equals l. In line 6, we initialize an empty queue Q, which stores unmapped logical qubits. In lines 8-13, we traverse the mapping and add the unmapped qubits to Q. We loop until Q is empty and all logical qubits are mapped to physical qubits. In line 15, we take the first element of Q as  $q\_id$ . Lines 16 and 17 get the adjacency matrices of  $\mathcal{AG}_P$  and  $\mathcal{AG}_L$ , respectively. Line 18 initializes an empty candidate list cans, sorted by connectivity in descending order. Lines 19-22 traverse the nodes m connected to  $q\_id$  in the adjacency matrix and store them in cans. Lines 23-38 traverse cans and select the node in cans with the largest number of connections to  $q\_id$ ; this node (cans.first) has already been mapped to a node on the PAG. The  $t\_id$  in line 24 is the node on the PAG that corresponds to this node. Line 26 removes the selected candidate from cans. Lines 27-34 select a node adjacent to  $t\_id$  in the adjacency matrix of the PAG and map  $q\_id$  to that node.

#### **Algorithm 1:** initial mapping algorithm CSI

```
Input: \mathcal{AG}_{\mathcal{L}}: The architecture of logical circuit
       \mathcal{AG}_{\mathcal{P}}: The architecture of physical circuit
       T: A partial mapping set obtained by SubgraphCompare
Output: result: A collection of mapping relations between \mathcal{AG}_{\mathcal{L}} and \mathcal{AG}_{\mathcal{P}}
 1 Initialize result = \emptyset;
 2 l \leftarrow max_{\tau \in T} \tau.length;
 3 for \tau \in T do
 4     if l = \tau.length then
 5         result.add(\tau);
 6         Q \leftarrow initializing an empty unmapped node queue
 7         i \leftarrow 1;
 8         while i \leq \tau.length do
 9             if \tau[i] = -1 then
10                 Q \leftarrow i;
11             end
12             i \leftarrow i + 1;
13         end
14         while Q is not empty do
15             q\_id \leftarrow Q.poll();
16             targetAdj \leftarrow \mathcal{AG}_{\mathcal{P}}.adjacencyMatrix();
17             queryAdj \leftarrow \mathcal{AG}_{\mathcal{L}}.adjacencyMatrix();
18             cans \leftarrow initializing an empty candidate node list;  // sorted by the connectivity of nodes
19             m \leftarrow 1;
20             while m \leq queryAdj[q\_id].length do
21                 cans \leftarrow cans \cup \{m\}; m \leftarrow m + 1;
22             end
23             while cans is not empty do
24                 t\_id \leftarrow \tau[cans.first];
25                 k \leftarrow 0;
26                 cans \leftarrow cans \backslash cans.first;
27                 while k < targetAdj[t\_id].length do
28                     if (targetAdj[t\_id][k] \neq -1 or targetAdj[k][t\_id] \neq -1)
29                        and not \tau.contains(k) then
30                         \tau[q\_id] \leftarrow k;
31                         break;
32                     end
33                     k \leftarrow k + 1;
34                 end
35                 if k \neq targetAdj[t\_id].length then
36                     break;
37                 end
38             end
39         end
40     end
41 end
```

Example 3. Following the previous example, we first apply the CSI algorithm to the LAG (see Fig. 4 (a)) and the PAG (see Fig. 6 (e)) to obtain the partial mapping set  $T = \{\tau_0, \tau_1, ..., \tau_n\}$ . We take one of the partial mappings as an example,  $\tau_0 = \{q_0 \to q_{10}, q_1 \to -1, q_2 \to q_6, q_3 \to q_5, q_4 \to q_{11}\}$ . Here  $q_1 \to -1$  means that  $q_1$  is not mapped to the physical structure in the subgraph isomorphism stage, so we need to perform mapping completion. Algorithm 1 completes the partial mappings with the maximum number of mapped nodes in T and uses them as initial mappings. In the example, the maximum number of mapped nodes is 4. Next, we demonstrate how  $\tau_0$  is completed. We add all unmapped nodes to the queue Q, so  $Q = \{q_1\}$ , and the loop runs until Q is empty. We put the first element of Q into  $q\_id$  and delete it from Q. Then we get the adjacency matrices of the query graph and the target graph, and traverse the nodes  $q_m$  in the adjacency matrix. We put  $q_m$  into the candidate node list cans, which is sorted by the connectivity of  $q_m$  and  $q\_id$ . Thus we get  $cans = \{q_3, q_2, q_4, q_0\}$ . Thereafter, we traverse cans, take out its first element  $value = q_3$ , and calculate the corresponding physical node  $t\_id = q_5$ , since  $\tau_0(q_3) = q_5$ . Finally, we map  $q\_id$  to a node that is connected to  $t\_id$  and not yet mapped. If all the nodes connected to  $t\_id$  have been mapped, the loop continues with the next candidate. In this example,  $q\_id$  can be directly mapped to  $q_0$ . In the end, we get  $\tau_0 = \{q_0 \to q_{10}, q_1 \to q_0, q_2 \to q_6, q_3 \to q_5, q_4 \to q_{11}\}$ .
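The mapping completion of Example 3 can be sketched in Python as follows. This is a simplified illustration rather than the implementation behind Algorithm 1: the function name, the weighted `query_adj` matrix (whose weights are made up to stand for the connectivity between logical qubits), the 12-node fragment of the target graph, and the fallback used when no neighbouring physical qubit is free are all assumptions.

```python
def complete_mapping(tau, query_adj, target_adj):
    """Fill the -1 entries of tau (logical index -> physical index)."""
    tau = list(tau)
    queue = [q for q, p in enumerate(tau) if p == -1]
    while queue:
        q_id = queue.pop(0)
        # Mapped logical neighbours of q_id, most strongly connected first.
        cans = [m for m in range(len(tau))
                if tau[m] != -1 and query_adj[q_id][m] > 0]
        cans.sort(key=lambda m: query_adj[q_id][m], reverse=True)
        for m in cans:
            t_id = tau[m]
            free = [k for k in range(len(target_adj))
                    if target_adj[t_id][k] and k not in tau]
            if free:
                tau[q_id] = free[0]
                break
        else:
            # Assumed fallback: take any free physical qubit.
            tau[q_id] = next(k for k in range(len(target_adj)) if k not in tau)
    return tau


# Assumed fragment of the target graph among physical qubits 0, 1, 5, 6.
n_phys = 12
target_adj = [[0] * n_phys for _ in range(n_phys)]
for a, b in [(0, 1), (0, 5), (1, 6), (5, 6)]:
    target_adj[a][b] = target_adj[b][a] = 1

# Hypothetical connectivity weights of the query graph (q1 is unmapped).
query_adj = [
    [0, 1, 0, 0, 0],
    [1, 0, 2, 3, 1],
    [0, 2, 0, 0, 0],
    [0, 3, 0, 0, 0],
    [0, 1, 0, 0, 0],
]
print(complete_mapping([10, -1, 6, 5, 11], query_adj, target_adj))
```

With these inputs the first candidate is $q_3$, whose image is physical qubit 5, and the only free neighbour of qubit 5 in the fragment is qubit 0, which reproduces the completed mapping of Example 3.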

#### 4.3 Swap Minimization

Tabu Search. The Tabu Search algorithm is a type of heuristic algorithm [6]. Tabu Search uses a tabu list to avoid searching repeated regions of the search space, thereby avoiding deadlock. The algorithm uses amnesty rules to escape from local optima and to ensure the diversity of the transformed results. Our circuit transformation mainly relies on the Tabu Search algorithm, aiming to handle large-scale circuits that current algorithms have difficulty handling and to produce an output circuit close to the optimal solution in a short time.

Tabu Search mainly involves the following objects: neighborhood field, neighborhood action, tabu list, candidate set, tabu object, evaluation function, and amnesty rule. All the edges that can be swapped under the current mapping form the neighborhood field. The tabu list avoids local minima and fits the parallelism requirements of qubits. A tabu object is an object in the tabu list: recently swapped qubits are added to the tabu list, and we avoid swapping them again as far as possible. The candidate set selects some neighborhood objects from the neighborhood field. We prune the search space, since only swaps on edges that share at least one node with a gate are meaningful. Specifically, we select as the candidate set the edges on the shortest paths that intersect the qubits contained in a gate. The evaluation function scores each SWAP in the candidate set; generally the objective function is taken as the evaluation function. A good candidate makes some gates executable while keeping both the number of added SWAP gates and the depth of the entire circuit small. The amnesty rule applies when all objects in the candidate set are tabu, or when lifting the ban on an object would greatly reduce the target value. In these cases, the tabu object can be added back to the candidate set in order to approach the global optimum.

The calculation of the neighborhood fields is shown in Algorithm 2. The input consists of the current circuit mapping  $\tau_p$ , given by two arrays, and the list cl of all gates in the current layer. The array qubits represents the mapping from physical qubits to logical qubits, where j=qubits[i] means that the i-th physical qubit has been mapped to the j-th logical qubit. The array locations represents the mapping from logical qubits to physical qubits, where j=locations[i] means that the i-th logical qubit has been mapped to the j-th physical qubit. The output is a candidate set for the current mapping. E is the set of edges on all the shortest paths in the physical architecture graph between the qubits of all gates in the current layer. Lines 17-35 swap all the edges of these paths, add them to the candidate set, and calculate the cost of each candidate.

Example 4. Under the mapping  $\tau_0 = \{q_0 \to q_{10}, q_1 \to q_0, q_2 \to q_6, q_3 \to q_5, q_4 \to q_{11}\}$ , for  $L_0 = \{g_0, g_1\}$ ,  $dist_{cnot}(g_0) = 3$ ,  $dist_{cnot}(g_1) = 3$ . Gate  $g_1$  can be executed directly under the mapping  $\tau_0$ , so we delete it from  $L_0$ , but  $g_0$  cannot be executed under  $\tau_0$ . Thus circuit transformation is required. The physical qubits of gates that are not executable join the set  $swap\_nodes = \{q_0, q_6\}$ . The shortest paths are  $paths = \{\{q_6 \to q_1 \to q_0\}, \{q_6 \to q_5 \to q_0\}\}$ , and we traverse them to calculate the candidate set. An edge on a shortest path joins the candidate set if at least one of its two endpoints is in the swap set. So the current candidate set is  $\{SWAP(q_6, q_1), SWAP(q_1, q_0), SWAP(q_6, q_5), SWAP(q_5, q_0)\}$ .
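The candidate set of Example 4 can be reproduced with the following Python sketch. The function names and the adjacency-list encoding are assumptions, and `adj` contains only the four physical qubits that occur in the example, not the whole IBM Q20 graph.

```python
from collections import deque


def shortest_paths(adj, src, dst):
    """Enumerate all shortest paths from src to dst with a BFS."""
    dist = {src: 0}
    preds = {src: []}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                preds[v] = [u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:
                preds[v].append(u)  # another equally short way to reach v
    if dst not in dist:
        return []

    def unwind(v):
        if v == src:
            return [[src]]
        return [path + [v] for u in preds[v] for path in unwind(u)]

    return unwind(dst)


def candidate_swaps(adj, blocked_gates, swap_nodes):
    """Edges on shortest paths with at least one endpoint in swap_nodes."""
    cands = []
    for a, b in blocked_gates:
        for path in shortest_paths(adj, a, b):
            for edge in zip(path, path[1:]):
                touches = edge[0] in swap_nodes or edge[1] in swap_nodes
                if touches and edge not in cands:
                    cands.append(edge)
    return cands


# Fragment of the physical graph used in Example 4 (assumed encoding).
adj = {0: [1, 5], 1: [0, 6], 5: [0, 6], 6: [1, 5]}
# g0 acts on physical qubits 6 and 0 under tau_0 and is not executable.
print(candidate_swaps(adj, [(6, 0)], {0, 6}))
```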

The circuit mapping algorithm based on Tabu Search takes a layered circuit and an initial mapping as input and outputs a circuit that can be executed on the specified architecture graph (see Algorithm 3). The transformed mapping of each layer is used as the initial mapping of the next layer. Lines 2 to 3 take the initial mapping  $\tau_{ini}$  as both the best mapping  $\tau_{best}$  and the current mapping  $\tau_{curr}$ . Lines 4 to 17 repeatedly check whether all gates of the current layer can execute under the mapping  $\tau_{curr}$ . If the mapping does not yet allow all gates to execute and the number of iterations has not reached the given maximum, the search continues. Otherwise, the search terminates. Line 5 computes the candidates of the current mapping, and line 6 finds the best mapping in the candidate set. To do so, the algorithm first removes the elements that the candidate set shares with the tabu list. Then, from the remaining candidates, it chooses a mapping with the lowest cost. Lines 7 to 12 are the amnesty rules. If no best candidate is found, all elements of the candidate set are in the tabu list. In that case, the amnesty rule selects the lowest-cost mapping in the candidate set as the best candidate mapping. Lines 13-16 update the best mapping  $\tau_{best}$  and the current mapping  $\tau_{curr}$ , and add the SWAP operation performed by the best mapping to the tabu list tl, indicating that this SWAP has just been performed and that the algorithm should try to avoid swapping the same qubits again. Then the algorithm checks whether the stopping condition is satisfied, namely whether the number of iterations has reached the maximum or the current mapping allows all gates in the current layer to execute. If the stopping condition is not satisfied, the loop continues.
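The loop described above can be illustrated by the following simplified Python sketch for a single layer. It is not Algorithm 3 itself: the evaluation function (the total remaining distance of the layer), the candidate set (all edges incident to a qubit of a blocked gate), the tabu length, and all names are assumptions chosen to keep the example short, and the graph is assumed to be connected.

```python
from collections import deque


def bfs_dist(adj, src):
    """Shortest distances from src on an unweighted graph."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def tabu_search_layer(layer, tau, adj, max_iter=50, tabu_len=3):
    """Insert SWAPs until every CNOT in the layer is on adjacent qubits."""
    dist = {u: bfs_dist(adj, u) for u in adj}
    tau = dict(tau)                    # logical -> physical
    tabu = deque(maxlen=tabu_len)      # recently applied SWAPs
    swaps = []

    def total(mapping):                # assumed evaluation function
        return sum(dist[mapping[c]][mapping[t]] - 1 for c, t in layer)

    def apply_swap(mapping, swap):
        a, b = swap
        return {q: (b if p == a else a if p == b else p)
                for q, p in mapping.items()}

    for _ in range(max_iter):
        if total(tau) == 0:            # all gates executable: stop
            break
        blocked = {tau[q] for c, t in layer
                   if dist[tau[c]][tau[t]] > 1 for q in (c, t)}
        cands = [(u, v) for u in sorted(blocked) for v in adj[u]]
        allowed = [s for s in cands if frozenset(s) not in tabu]
        pool = allowed or cands        # amnesty when everything is tabu
        best = min(pool, key=lambda s: total(apply_swap(tau, s)))
        tau = apply_swap(tau, best)
        tabu.append(frozenset(best))
        swaps.append(best)
    return swaps, tau


# Fragment of the physical graph and a partial mapping (assumptions).
adj = {0: [1, 5], 1: [0, 6], 5: [0, 6], 6: [1, 5]}
swaps, new_tau = tabu_search_layer([(2, 1)], {1: 0, 2: 6, 3: 5}, adj)
print(swaps, new_tau)
```

The `deque` with a fixed `maxlen` makes old tabu entries expire automatically, which is one simple way to bound how long a SWAP stays forbidden.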

#### Algorithm 2: Calculate the candidate sets

```
Input:  dist: The shortest paths of physical architecture
        qubits: The mapping from physical qubits to logical qubits
        locations: The mapping from logical qubits to physical qubits
        cl: Gates included in the current layer of circuits
Output: results: The set of candidate solutions
 1 Initialize results \leftarrow \emptyset;
 2 E_w \leftarrow Calculate the weight of each edge;
 3 swap\_nodes \leftarrow An empty set of candidate swap nodes;
 4 foreach g \in cl do
 5     q_1 \leftarrow locations[g.control];
 6     q_2 \leftarrow locations[g.target];
 7     if g is executable then
 8         cl \leftarrow cl \setminus \{g\};
 9         continue;
10     end
11     swap\_nodes.add(q_1);
12     swap\_nodes.add(q_2);
13 end
14 foreach g \in cl do
15     q_1 \leftarrow locations[g.control];
16     q_2 \leftarrow locations[g.target];
17     foreach path \in paths[q_1][q_2] do
18         foreach e \in path do
19             if swap\_nodes.contains(sour\_node) or swap\_nodes.contains(tar\_node) then
20                 new\_qubits \leftarrow qubits;
21                 new\_locations \leftarrow locations;
22                 q_1 \leftarrow new\_qubits[e.source];
23                 q_2 \leftarrow new\_qubits[e.target];
24                 new\_qubits[e.source] \leftarrow q_2;
25                 new\_qubits[e.target] \leftarrow q_1;
26                 if q_1 \neq -1 then
27                     new\_locations[q_1] \leftarrow q_2;
28                 end
29                 if q_2 \neq -1 then
30                     new\_locations[q_2] \leftarrow q_1;
31                 end
32                 s \leftarrow \emptyset;
33                 s.value \leftarrow compute\_evaluate\_value(dist, new\_locations, cl);
34                 results \leftarrow results \cup s;
35             end
36         end
37     end
38 end
39 return results;
```

Example 5. We continue the previous example. Tabu Search requires an initial solution and then searches from it; we use the initial mapping as the initial solution. We then generate a set of candidate SWAPs and select the one with the lowest evaluation score. For  $L_0 = \{g_0, g_1\}$ , the initial candidate set is  $\{SWAP(q_6, q_1), SWAP(q_1, q_0), SWAP(q_6, q_5), SWAP(q_5, q_0)\}$ , and the costs are  $cost(SWAP(q_6, q_1)) = 3.0$ ,  $cost(SWAP(q_1, q_0)) = 3.0$ ,  $cost(SWAP(q_6, q_5)) = 3.0$ ,  $cost(SWAP(q_5, q_0)) = 3.0$ , respectively. Since all costs are equal, the algorithm chooses the first SWAP operation, and the mapping becomes  $\tau_0 = \{q_0 \rightarrow q_{10}, q_1 \rightarrow q_0, q_2 \rightarrow q_1, q_3 \rightarrow q_5, q_4 \rightarrow q_{11}\}$ . The Tabu Search loop then checks whether the stop condition has been reached. The current mapping now allows  $g_0$  to be executed, so the search of the current layer is over, and the Tabu Search continues with the next layer.
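The mapping update in Example 5 can be reproduced with a short Python sketch. It assumes the mapping is stored as a dictionary from logical to physical qubit indices; the helper name `apply_swap` is hypothetical and only serves the illustration.

```python
def apply_swap(mapping, p, q):
    """Return a new logical->physical mapping after swapping physical qubits p and q."""
    swapped = {}
    for logical, physical in mapping.items():
        if physical == p:
            swapped[logical] = q
        elif physical == q:
            swapped[logical] = p
        else:
            swapped[logical] = physical
    return swapped


# Initial mapping of Example 4: logical index -> physical index.
tau0 = {0: 10, 1: 0, 2: 6, 3: 5, 4: 11}
tau1 = apply_swap(tau0, 6, 1)  # SWAP(q_6, q_1)
print(tau1)  # {0: 10, 1: 0, 2: 1, 3: 5, 4: 11}
```

Only logical qubit $q_2$ moves, from physical qubit $q_6$ to $q_1$, because no logical qubit was placed on $q_1$ before the SWAP.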

#### Algorithm 3: Tabu Search

```
Input:  \tau_{ini}: The initial mapping
        tl: Tabu list
Output: \tau_{best}: The final state and SWAPs
 1 Initialize \tau_{best} \leftarrow \tau_{ini};
 2 \tau_{curr} \leftarrow \tau_{ini};
 3 iter \leftarrow 1;                                        // Number of iterations
 4 while not mustStop(iter, \tau_{best}) do
 5     C \leftarrow \tau_{curr}.candidates();                 // candidate set
 6     C_{best} \leftarrow find\_best\_candidates(C, tl);
 7     if C_{best} is empty then
 8         if C = NULL then
 9             break;
10         end
11         C_{best} \leftarrow find\_amnesty\_candidates(C, tl);
12     end
13     \tau_{best} \leftarrow C_{best};
14     \tau_{curr} \leftarrow C_{best};
15     tl \leftarrow tl \cup \{C_{best}.swap\};
16     iter \leftarrow iter + 1;
17 end
18 return \tau_{best}
```
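The loop of Algorithm 3 can be illustrated with a minimal Python sketch. Everything beyond the structure of the listing is an assumption made for the example: the 4-qubit line architecture, the single gate, the bounded tabu list (`tabu_size`), and all function names are hypothetical and do not come from the authors' code.

```python
def tabu_search(initial, candidates, cost, is_done, max_iter=100, tabu_size=5):
    """Generic tabu loop: pick the cheapest non-tabu move, with amnesty."""
    current = initial
    tabu = []      # recently applied moves
    applied = []   # SWAPs inserted so far
    for _ in range(max_iter):
        if is_done(current):
            break
        moves = candidates(current)          # list of (move, next_mapping)
        if not moves:
            break
        allowed = [m for m in moves if m[0] not in tabu]
        # Amnesty rule: if every candidate is tabu, consider all of them.
        pool = allowed or moves
        move, current = min(pool, key=lambda m: cost(m[1]))
        tabu.append(move)
        if len(tabu) > tabu_size:
            tabu.pop(0)
        applied.append(move)
    return current, applied


EDGES = [(0, 1), (1, 2), (2, 3)]  # assumed line architecture 0-1-2-3
GATE = (0, 1)                     # one CNOT on logical qubits 0 and 1


def swap(mapping, edge):
    p, q = edge
    return {l: (q if ph == p else p if ph == q else ph) for l, ph in mapping.items()}


def dist(mapping):
    # On a line, the distance is the difference of the physical indices.
    return abs(mapping[GATE[0]] - mapping[GATE[1]])


def candidates(mapping):
    used = {mapping[GATE[0]], mapping[GATE[1]]}
    return [(e, swap(mapping, e)) for e in EDGES if e[0] in used or e[1] in used]


final, swaps = tabu_search({0: 0, 1: 3}, candidates, cost=dist,
                           is_done=lambda m: dist(m) == 1)
print(final, swaps)  # {0: 2, 1: 3} [(0, 1), (1, 2)]
```

In the second iteration the move `(0, 1)` is tabu, so the search cannot undo the previous SWAP and instead picks `(1, 2)`, which makes the gate executable.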

**Evaluation function design** Our goal is to add as few auxiliary gates as possible to the circuit or to keep the depth of the generated circuit small.

We test two evaluation functions: one uses the depth of the generated circuit as the evaluation criterion (Eq. (4)), and the other uses the number of auxiliary gates in the generated circuit as the evaluation criterion (Eq. (3)).

$$cost(SWAP(q_m, q_n)) = \sum_{g \in L_i} dist[g.control][g.target]$$
 (1)

$$cost(SWAP(q_m, q_n)) = Depth(L_i)$$
 (2)

Here  $cost(SWAP(q_m, q_n))$  represents the cost of executing all gates of the current layer  $L_i$  after swapping  $q_m$  and  $q_n$ . After the SWAP operation, we consider only the gates that still cannot be executed, and compute either the total distance between their qubits as in Eq. (1) or their depth as in Eq. (2).

**Look ahead** We observed that the number of gates in each layer after layering is small. If we consider only the gates of the current layer when choosing a SWAP, the SWAP satisfies only the requirements of the i-th layer. However, the output mapping of the i-th (i < n) layer is the input mapping of the (i + 1)-th layer, so the SWAPs chosen for the i-th layer affect the mapping of the (i+1)-th layer. We therefore also take the gates of the (i+x)-th (i+x < n) layer into consideration. Because the gates of the i-th layer must still be given priority, we introduce an attenuation factor  $\delta$  that controls the influence of the (i+x)-th layer gate set on the SWAP selection of the i-th layer. Experiments show that  $x=2$  and  $\delta=0.9$  give the best results. Our evaluation functions can be rewritten as

$$cost(SWAP(q_m, q_n)) = \sum_{g \in L_i} dist[g.control][g.target] + \delta \times \sum_{j=i}^{i+x} \sum_{g \in L_j} dist[g.control][g.target]$$
 (3)

$$cost(SWAP(q_m, q_n)) = Depth(L_i) + \delta \times Depth\Big(\sum_{j=i}^{i+x} L_j\Big)$$
 (4)
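As a rough illustration of the distance-based look-ahead cost in Eq. (3), the following Python sketch evaluates it for a given mapping. The data layout (gates as `(control, target)` pairs, `dist` as a nested list, a 4-qubit line architecture) and the function names are assumptions made for the example. The window is simply truncated at the last layer, which is also an assumption.

```python
def layer_distance(dist, locations, layer):
    """Sum of physical distances between control and target over one layer."""
    return sum(dist[locations[c]][locations[t]] for c, t in layer)


def lookahead_cost(dist, locations, layers, i, x=2, delta=0.9):
    """Eq. (3): distance of layer i plus the discounted distance of layers i..i+x."""
    current = layer_distance(dist, locations, layers[i])
    window = layers[i:i + x + 1]  # layers i..i+x, truncated at the end of the circuit
    return current + delta * sum(layer_distance(dist, locations, l) for l in window)


# Assumed line architecture with 4 physical qubits: dist[a][b] = |a - b|.
dist = [[abs(a - b) for b in range(4)] for a in range(4)]
locations = {0: 0, 1: 1, 2: 2, 3: 3}          # logical -> physical
layers = [[(0, 2)], [(1, 3)], [(0, 1)]]       # one CNOT per layer

cost = lookahead_cost(dist, locations, layers, i=0)
print(cost)
```

For this input the current layer contributes 2 and the window contributes 2 + 2 + 1 = 5, so the cost is 2 + 0.9 × 5. In a search, this value would be computed for the mapping obtained after each candidate SWAP, and the candidate with the smallest value would be preferred.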

**Complexity** Let the logical circuit architecture graph be  $\mathcal{AG}_{\mathcal{L}} = (V_L, E_L)$ , the physical circuit architecture graph be  $\mathcal{AG}_{\mathcal{P}} = (V_P, E_P)$ , the initial mapping be  $\tau$ , the depth of the circuit be  $d$ , and the number of qubits be  $|V_L|$ . Tabu Search deals with one layer at a time and therefore performs at most  $d$  searches. Starting from the initial mapping, we first delete the gates of the first layer that are executable under the initial mapping. Then, for every gate of the current layer that cannot be executed, the edges on all of its shortest paths that have at least one endpoint among the mapped qubits of such gates are added to the candidate set. In the worst case, the shortest path length is  $(|E_P|-1)$ , and the candidate set size is  $(|E_P|-1)$ . Each SWAP makes the total distance between the gates smaller. In the worst case, the number of SWAPs is  $(|E_P|-1)^{|E_P|-2}$ , but our selection strategy significantly reduces this number. The time complexity is  $O(d \cdot (|E_P|-1)^{|E_P|-2})$ , and the space complexity is the size of the candidate set,  $O(|E_P|-1)$ .

#### 5 Experiment

The experiments in this paper were performed on a 2.3 GHz Linux machine with 64 GB of memory. We compare the CSI algorithm and the Tabu Search based circuit transformation algorithm QCTS with wghtgraph [10] and the heuristic  $A^*$  algorithm [25].

First, we compared the quality of the initial mappings  $\tau_{optm}$  [25],  $\tau_{CSI}$ , and  $\tau_{wghtgraph}$  [10]. To compare these initial mapping algorithms directly, we used the same  $A^*$  circuit transformation algorithm [25] for all of them.

Among the 159 circuits, within five minutes  $\tau_{optm}$  can handle 121 circuits,  $\tau_{wghtgraph}$  can handle 106 circuits, and  $\tau_{CSI}$  can handle 131 circuits. There are 104 circuits that all three can handle. Comparing  $\tau_{wghtgraph}$  with  $\tau_{CSI}$  on these circuits,  $\tau_{wghtgraph}$  yields fewer SWAPs on 21 circuits and a smaller depth on 19 circuits,  $\tau_{CSI}$  yields fewer SWAPs on 54 circuits and a smaller depth on 60 circuits, and the two are equal in depth on 25 circuits and in SWAPs on 29 circuits. Relative to  $\tau_{wghtgraph}$ ,  $\tau_{CSI}$  reduces the number of SWAPs by 22.4418% and the depth by 11.2482%.

Comparing  $\tau_{optm}$  with  $\tau_{CSI}$ ,  $\tau_{optm}$  yields fewer SWAPs on one circuit and a smaller depth on two circuits,  $\tau_{CSI}$  yields fewer SWAPs on 99 circuits and a smaller depth on 98 circuits, and the two are equal in depth on 4 circuits and in SWAPs on 4 circuits. Relative to  $\tau_{optm}$ ,  $\tau_{CSI}$  reduces the number of SWAPs by 27.0219% and the depth by 14.1242%. Table 1 summarizes the results on the 104 circuits: the three initial mapping algorithms are compared by the depth of the circuits generated under the  $A^*$  algorithm and by the number of SWAP gates added. The column  $\tau_{CSI}/\tau_{optm}$  gives the improvement of the former over the latter, computed as  $(n_{optm}-n_{CSI})/n_{optm}$ .

|       | $\tau_{optm}$ | $\tau_{wghtgraph}$ | $\tau_{CSI}$ | $\tau_{CSI}/\tau_{optm}$ | $\tau_{CSI}/\tau_{wghtgraph}$ |
|-------|---------------|--------------------|--------------|--------------------------|-------------------------------|
| depth | 168895        | 163422             | 145040       | 14.1242%                 | 11.2482%                      |
| added | 20439         | 19232              | 14916        | 27.0219%                 | 22.4418%                      |

**Table 1.** Comparison of  $\tau_{optm}$ ,  $\tau_{wghtgraph}$ , and  $\tau_{CSI}$ 
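The relative improvements reported in Table 1 follow directly from the totals and the formula $(n_{base}-n_{CSI})/n_{base}$. A small Python check, with the dictionary layout chosen only for this illustration:

```python
def improvement(baseline, ours):
    """Relative reduction of `ours` with respect to `baseline`."""
    return (baseline - ours) / baseline


# Totals taken from Table 1.
table = {
    "depth": {"optm": 168895, "wghtgraph": 163422, "CSI": 145040},
    "added": {"optm": 20439, "wghtgraph": 19232, "CSI": 14916},
}

for metric, row in table.items():
    for base in ("optm", "wghtgraph"):
        pct = 100 * improvement(row[base], row["CSI"])
        print(f"{metric:5s} CSI vs {base:9s}: {pct:.2f}%")
# depth CSI vs optm     : 14.12%
# depth CSI vs wghtgraph: 11.25%
# added CSI vs optm     : 27.02%
# added CSI vs wghtgraph: 22.44%
```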

We compared two objective functions,  $QCTS_{dep}$  and  $QCTS_{num}$ , which prioritize a smaller depth and fewer auxiliary gates, respectively. Both were tested on the 159 circuits. On average, the depth of the final circuit obtained by  $QCTS_{num}$  is 1.93% smaller than that obtained by  $QCTS_{dep}$ , and the number of auxiliary gates added is 4.53% smaller. Each inserted SWAP gate adds 3 CNOT gates to the circuit and increases the depth by 3, so when fewer SWAP gates are added, the circuit depth decreases accordingly. We therefore prioritize the number of SWAP gates, which gives better results.
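The statement that one SWAP costs three CNOT gates rests on the standard decomposition SWAP(a, b) = CNOT(a, b) · CNOT(b, a) · CNOT(a, b). The following Python sketch checks this identity on the four computational basis states of two qubits; since the gates are linear, agreement on the basis states is sufficient. The function names are chosen for the illustration only.

```python
def cnot(state, control, target):
    """Apply a CNOT to a basis state given as a tuple of classical bits."""
    bits = list(state)
    if bits[control]:
        bits[target] ^= 1
    return tuple(bits)


def swap_via_cnots(state, a, b):
    """Three alternating CNOTs, which together act as SWAP(a, b)."""
    for control, target in ((a, b), (b, a), (a, b)):
        state = cnot(state, control, target)
    return state


basis = [(0, 0), (0, 1), (1, 0), (1, 1)]
results = {s: swap_via_cnots(s, 0, 1) for s in basis}
print(results)
# {(0, 0): (0, 0), (0, 1): (1, 0), (1, 0): (0, 1), (1, 1): (1, 1)}
```

Because the three CNOTs act on the same pair of qubits, they cannot be parallelized, which is why each SWAP on the critical path adds three layers of depth.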

Finally, we compared QCTS and wgtgraph. Since the wgtgraph algorithm only handles 2-qubit gates, the depths of the generated circuits cannot be compared, so we compared the number of SWAP gates added and the running time. A large circuit that cannot be processed within a reasonable time is of little practical value, so we set a five-minute timeout and tested the 159 circuits.  $QCTS_{num}$  takes only 461 seconds and  $QCTS_{dep}$  takes 485 seconds, whereas wgtgraph runs the 159 circuits in 1908 seconds and produces results for only 98 of them: 64 of the 66 small circuits, 35 of the 49 medium circuits, and none of the 44 large circuits. Although Tabu Search quickly produces results on large circuits, it adds more auxiliary gates. On the 98 small and medium-sized circuits for which wgtgraph produces results, wgtgraph adds 26.87% fewer SWAP gates than  $QCTS_{num}$  on average and 24.89% fewer than  $QCTS_{dep}$  on average. Tabu Search can quickly output transformed circuits for large circuits, whereas wgtgraph cannot produce results in a short time. The detailed results of the circuit comparisons are given in the appendix.

| benchmarks | #circ. | $QCTS_{num}$ #succ. | $QCTS_{num}$ time (s) | $QCTS_{dep}$ #succ. | $QCTS_{dep}$ time (s) | wgtgraph #succ. | wgtgraph time (s) | SABRE #succ. | SABRE time (s) |
|------------|--------|---------------------|-----------------------|---------------------|-----------------------|-----------------|-------------------|--------------|----------------|
| small      | 66     | 66                  | 32                    | 66                  | 29                    | 64              | 587               | 66           |                |
| medium     | 49     | 49                  | 45                    | 49                  | 40                    | 35              | 1183              | 46           |                |
| large      | 44     | 44                  | 407                   | 44                  | 432                   | 0               | -                 |              |                |
| total      | 159    | 159                 | 461                   | 159                 | 501                   | 98              | -                 |              |                |

**Table 2.** Comparison of  $QCTS_{num}$ ,  $QCTS_{dep}$ , wgtgraph, and SABRE

#### 6 Conclusion

We propose the CSI algorithm to generate high-quality initial mappings and QCTS, a heuristic SWAP insertion method based on Tabu Search, to overcome the shortcomings of previous works. Experimental results show that the initial mappings generated by CSI reduce the number of SWAP gates inserted and that results can be obtained in a short time. Most small and medium-sized circuits are transformed within a few seconds, and even large circuits are transformed within a few minutes, although the number of inserted gates may be equal to or greater than that of wgtgraph. We introduced a look-ahead scheme so that each selected SWAP better matches the constraints of the gates in subsequent layers. In future work, we will study how to reduce the number of auxiliary gates inserted as much as possible while improving speed, and apply the proposed method to more NISQ devices to obtain useful experimental data. Since our simulated circuits ignore the noise generated by the circuit, we will also introduce quantum noise into the circuits.

#### References

- 1. Almeida, A., Dueck, G., Silva, A.: Finding optimal qubit permutations for IBM's quantum computer architectures pp. 1–6 (08 2019). https://doi.org/10.1145/3338852.3339829
- 2. Barenco, A., Bennett, C., Cleve, R., DiVincenzo, D., Margolus, N., Shor, P., Sleator, T., Smolin, J., Weinfurter, H.: Elementary gates for quantum computation. Physical Review A **52** (03 1995). https://doi.org/10.1103/PhysRevA.52.3457
- 3. Bernal, D., Booth, K., Dridi, R., Alghassi, H., Tayur, S., Venturelli, D.: Integer programming techniques for minor-embedding in quantum annealers (12 2019)
- 4. Cowtan, A., Dilkes, S., Duncan, R., Krajenbrink, A., Simmons, W., Sivarajah, S.: On the qubit routing problem (02 2019)
- 5. Daei, O., Navi, K., Zomorodi, M.: Optimized quantum circuit partitioning (05 2020)
- 6. Glover, F.: Tabu search—part II. ORSA Journal on Computing **2**, 4–32 (02 1990). https://doi.org/10.1287/ijoc.2.1.4
- 7. Guerreschi, G.G., Park, J.: Two-step approach to scheduling quantum circuits. Quantum Science and Technology **3** (06 2018). https://doi.org/10.1088/2058-9565/aacf0b
- 8. Kissinger, A., Meijer, A.: CNOT circuit extraction for topologically-constrained quantum memories (04 2019)
- 9. Li, G., Ding, Y., Xie, Y.: Tackling the qubit mapping problem for NISQ-era quantum devices (09 2018)
- 10. Li, S., Zhou, X., Feng, Y.: Qubit mapping based on subgraph isomorphism and filtered depth-limited search (2020)
- 11. Matsuo, A., Yamashita, S.: An efficient method for quantum circuit placement problem on a 2-D grid pp. 162–168 (05 2019). https://doi.org/10.1007/978-3-030-21500-2_10
- 12. Murali, P., Linke, N., Martonosi, M., Abhari, A., Nguyen, N., Huerta Alderete, C.: Full-stack, real-system quantum computer studies: architectural comparisons and design insights pp. 527–540 (06 2019). https://doi.org/10.1145/3307650.3322273
- 13. Möttönen, M., Vartiainen, J.: Decompositions of general quantum gates. Frontiers in Artificial Intelligence and Applications (05 2005)
- 14. Nash, B., Gheorghiu, V., Mosca, M.: Quantum circuit optimizations for NISQ architectures. Quantum Science and Technology **5** (02 2020). https://doi.org/10.1088/2058-9565/ab79b1
- 15. Paler, A.: On the influence of initial qubit placement during NISQ circuit compilation (11 2018)
- 16. Preskill, J.: Quantum computing in the NISQ era and beyond. Quantum **2** (2018)
- 17. Shafaei, A., Saeedi, M., Pedram, M.: Optimization of quantum circuits for interaction distance in linear nearest neighbor architectures. Proceedings Design Automation Conference pp. 1–6 (05 2013). https://doi.org/10.1145/2463209.2488785
- 18. Shafaei, A., Saeedi, M., Pedram, M.: Optimization of quantum circuits for interaction distance in linear nearest neighbor architectures (2013)
- 19. Siraichi, M.Y., dos Santos, V.F., Collange, S., Pereira, F.M.Q.: Qubit allocation (2018)
- 20. Sun, S., Luo, Q.: In-memory subgraph matching: An in-depth study pp. 1083–1098 (06 2020). https://doi.org/10.1145/3318464.3380581
- 21. Tannu, S., Qureshi, M.: Not all qubits are created equal: A case for variability-aware policies for NISQ-era quantum computers pp. 987–999 (04 2019). https://doi.org/10.1145/3297858.3304007
- 22. Venturelli, D., Do, M., Rieffel, E., Frank, J.: Temporal planning for compilation of quantum approximate optimization circuits pp. 4440–4446 (08 2017). https://doi.org/10.24963/ijcai.2017/620
- 23. Xiangzhen, Z., Li, S., Feng, Y.: Quantum circuit transformation based on simulated annealing and heuristic search. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems **PP**, 1–1 (01 2020). https://doi.org/10.1109/TCAD.2020.2969647
- 24. Zhang, Y., Deng, H., Li, Q., Haoze, S., Nie, L.: Optimizing quantum programs against decoherence: Delaying qubits into quantum superposition pp. 184–191 (07 2019). https://doi.org/10.1109/TASE.2019.000-2
- 25. Zulehner, A., Paler, A., Wille, R.: Efficient mapping of quantum circuits to the IBM QX architectures. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (12 2017). https://doi.org/10.1109/TCAD.2018.2846658

#### A Experimental details of the SWAP gates added by the output circuit

| Circuit name                      | qubit no. | CNOT no. | $QCTS_{num}$ added | $QCTS_{dep}$ added | optm added | wghtgraph added |
|-----------------------------------|-----------|----------|--------------------|--------------------|------------|-----------------|
| decod24-enable_126                | 6     | 149  | 28                   | 42         | 60    | 16                                                 |
| 4mod5-v0_19                       | 5     | 16   | 0                    | 0          | 0     | 0                                                  |
| 4mod5-v0_18                       | 5     | 31   | 2                    | 5          | 4     | 4                                                  |
| mod5d2_64                         | 5     | 25   | 5                    | 6          | 8     | 3                                                  |
| 4gt4-v0_72                        | 6     | 113  | 14                   | 10         | 33    | 14                                                 |
| alu-v3_35                         | 5     | 18   | 2                    | 4          | 8     | 2                                                  |
| 4gt4-v0_73                        | 6     | 179  | 27                   | 34         | 76    | 12                                                 |
| alu-v3_34                         | 5     | 24   | 2                    | 3          | 7     | 2                                                  |
| 3_17_13                           | 3     | 17   | 0                    | 0          | 6     | 0                                                  |
| 4gt4-v0_78                        | 6     | 109  | 12                   | 8          | 48    | 4                                                  |
| 4gt4-v0_79                        | 6     | 105  | 17                   | 17         | 48    | 3                                                  |
| 4mod7-v1_96                       | 5     | 72   | 16                   | 19         | 27    | 7                                                  |
| mod10_171                         | 5     | 108  | 17                   | 20         | 39    | 9                                                  |
| ex2_227                           | 7     | 275  | 48                   | 59         | 121   | 33                                                 |
| mod10_176                         | 5     | 78   | 14                   | 14         | 38    | 8                                                  |
| 0410184_169                       | 5     | 9    | 2                    | 2          | 49    | 3                                                  |
| 4mod5-v0_20                       | 5     | 10   | 0                    | 0          | 4     |                                                    |
| aj-e11_165                        | 5     | 69   | 8                    | 8          | 33    | 7                                                  |
| alu-v1_28                         | 5     | 18   | 2                    | 4          | 11    | 2                                                  |
| 4gt12-v0_86                       | 6     | 116  | 28                   | 33         | 48    | 3                                                  |
| 4gt12-v0_87                       | 6     | 112  | 27                   | 32         | 45    | 2                                                  |
| 4gt12-v0_88                       | 6     | 86   | 5                    | 5          | 25    | 4                                                  |
| alu-v1_29                         | 5     | 17   | 4                    | 4          | 11    | 2                                                  |
| ham7_104                          | 7     | 149  | 28                   | 34         | 68    | 12                                                 |
| C17_204                           | 7     | 205  | 26                   | 53         | 99    | 22                                                 |
| xor5_254                          | 6     | 5    | 0                    | 0          | 1     | 0                                                  |
| hwb4_49                           | 5     | 107  | 14                   | 15         | 38    | 11                                                 |
| rd73_140                          | 10    | 104  | 23                   | 26         | 35    | 20                                                 |
| decod24-v0_38                     | 4     | 23   | 0                    | 0          | 6     | 0                                                  |
| rd53_131                          | 7     | 200  | 39                   | 39         | 98    | 24                                                 |
| rd53_133                          | 7     | 256  | 37                   | 47         | 102   | 27                                                 |
| rd53_135                          | 7     | 134  | 28                   | 29         | 38    | 23                                                 |
| decod24-v2_43                     | 4     | 22   | 0                    | 0          | 9     | 0                                                  |
| rd53_138                          | 8     | 60   | 14                   | 16         | 23    | 9                                                  |
| rd32-v0_66                        | 4     | 16   | 0                    | 0          | 6     | 0                                                  |
| 4gt13-v1_93                       | 5     | 30   | 0                    | 0          | 13    | 0                                                  |
| graycode6_47                      | 6     | 5    | 0                    | 0          | 0     | 0                                                  |
| 4mod5-bdd_287                     | 7     | 31   | 3                    | 6          | 8     | 6                                                  |
| ham3_102                          | 3     | 11   | 0                    | 0          | 3     | 0                                                  |
| 4gt4-v0_80                        | 6     | 79   | 5                    | 5          | 22    | 5                                                  |
| ex-1_166                          | 3     | 9    | 0                    | 0          | 3     | 0                                                  |
| mod5mils_65                       | 5     | 16   | 0                    | 0          | 6     | 0                                                  |
| 0example                          | 5     | 9    | 1                    | 2          | 3     | 3                                                  |
| alu-v4_36                         | 5     | 51   | 12                   | 8          | 22    | 4                                                  |
| alu-v4_37                         | 5     | 18   | 2                    | 4          | 8     | 2                                                  |
| ex1_226                           | 6     | 5    | 0                    | 0          | 1     | 0                                                  |
| one-two-three-v0_98               | 5     | 65   | 11                   | 13         | 32    | 10                                                 |
| one-two-three-v0_97               | 5     | 128  | 23                   | 23         | 64    | 16                                                 |
| one-two-three-v3_101              | 5     | 32   | 3                    | 4          | 14    | 3                                                  |
| rd32_270                          | 5     | 36   | 3                    | 3          | 6     | 6                                                  |

**Table 3.** Comparison of the number of SWAP gates added by the output circuit on the IBM Q20

| Circuit name         | qubit no. | CNOT no. | $QCTS_{num}$ added | $QCTS_{dep}$ added | optm added | wghtgraph added |
|----------------------|-----------|----------|--------------------|--------------------|------------|-----------------|
| rd53_130             | 7    | 448  | 89           | 100   | 190    | 49                                                 |
| rd53_251             | 8    | 564  | 104          | 131   | 230    | 45                                                 |
| 4mod5-v1_24          | 5    | 16   | 0            | 0     | 3      | 0                                                  |
| mod5adder_127        | 6    | 239  | 21           | 56    | 111    | 20                                                 |
| 4_49_16              | 5    | 99   | 20           | 17    | 40     | 10                                                 |
| hwb5_53              | 6    | 598  | 141          | 168   | 173    | 59                                                 |
| ex3_229              | 6    | 175  | 10           | 9     | 50     | 11                                                 |
| 4gt10-v1_81          | 5    | 66   | 14           | 15    | 28     | 6                                                  |
| alu-v2_32            | 5    | 72   | 15           | 17    | 27     | 7                                                  |
| alu-v2_31            | 5    | 198  | 42           | 54    | 85     | 13                                                 |
| alu-v2_30            | 6    | 223  | 41           | 45    | 96     | 20                                                 |
| sf_276               | 6    | 336  | 12           | 52    | 138    | 12                                                 |
| decod24-v1_41        | 5    | 38   | 4            | 4     | 14     | 3                                                  |
| sf_274               | 6    | 336  | 34           | 21    | 82     | 12                                                 |
| 4gt4-v1_74           | 6    | 119  | 17           | 24    | 37     | 9                                                  |
| alu-v2_33            | 5    | 17   | 4            | 4     | 8      | 2                                                  |
| cnt3-5_179           | 16   | 85   | 6            | 6     | 35     | 4                                                  |
| 4mod5-v1_22          | 5    | 11   | 0            | 0     | 5      | 0                                                  |
| 4mod5-v1_23          | 5    | 32   | 5            | 5     | 4      | 3                                                  |
| mini_alu_305         | 10   | 77   | 10           | 20    | 28     | 8                                                  |
| alu-v0_26            | 5    | 38   | 7            | 10    | 13     | 3                                                  |
| alu-bdd_288          | 7    | 38   | 4            | 12    | 16     | 6                                                  |
| alu-v0_27            | 5    | 17   | 2            | 4     | 11     | 2                                                  |
| 4gt13_91             | 5    | 49   | 7            | 7     | 10     | 2                                                  |
| 4gt5_77              | 5    | 58   | 12           | 12    | 20     | 6                                                  |
| 4gt13_92             | 5    | 30   | 0            | 0     | 14     | 0                                                  |
| 4gt5_76              | 5    | 46   | 7            | 10    | 24     | 5                                                  |
| 4gt5_75              | 5    | 38   | 5            | 12    | 16     | 4                                                  |
| 4gt12-v1_89          | 6    | 100  | 11           | 21    | 38     | 4                                                  |
| one-two-three-v1_99  | 5    | 59   | 12           | 10    | 26     | 7                                                  |
| 4gt13_90             | 5    | 53   | 7            | 7     | 13     | 3                                                  |
| ising_model_10       | 10   | 90   | 0            | 0     | 5      | 0                                                  |
| 4gt11_84             | 5    | 9    | 0            | 0     | 3      | 0                                                  |
| 4gt11_83             | 5    | 14   | 0            | 0     | 0      | 0                                                  |
| mod5d1_63            | 5    | 13   | 0            | 0     | 1      | 0                                                  |
| 4gt11_82             | 5    | 18   | 1            | 1     | 1      | 1                                                  |
| decod24-v3_45        | 5    | 64   | 15           | 15    | 32     | 8                                                  |
| rd32-v1_68           | 4    | 16   | 0            | 0     | 6      | 0                                                  |
| mini-alu_167         | 5    | 126  | 27           | 27    | 49     | 11                                                 |
| one-two-three-v2_100 | 5    | 32   | 3            | 4     | 8      | 3                                                  |
| 4mod7-v0_94          | 5    | 72   | 8            | 13    | 36     | 9                                                  |
| cm82a_208            | 8    | 283  | 41           | 69    | 84     | 33                                                 |
| mod8-10_178          | 6    | 152  | 5            | 20    | 13     | 7                                                  |
| mod8-10_177          | 6    | 196  | 14           | 33    | 58     | 13                                                 |
| majority_239         | 7    | 267  | 39           | 43    | 105    | 33                                                 |
| miller_11            | 3    | 23   | 0            | 0     | 9      | 0                                                  |
| decod24-bdd_294      | 6    | 32   | 4            | 4     | 9      | $\begin{vmatrix} & 0 & 1 \\ 4 & & 1 \end{vmatrix}$ |
| total                | 551  | 9244 | 1372         | 1738  | 3481   | 800                                                |

**Table 4.** Comparison of the number of SWAP gates added by the output circuit on the IBM  $\mathrm{Q}20$ 

| Circuit                      | qubit | CNOT   | $QCTS_{num}$       | $QCTS_{dep}$       | optm  | wghtgr |
|------------------------------|-------|--------|--------------------|--------------------|-------|--------|
| name                         | no.   | no.    | added              | added              | added | added  |
| max46_240                    | 10    | 11844  | 3473               | 4545               | -     | -      |
| rd73_252                     | 10    | 2319   | 586                | 761                | -     | -      |
| cycle10_2_110                | 12    | 2648   | 919                | 1216               | 961   | -      |
| sqrt8_260                    | 12    | 1314   | 379                | 492                | 457   | -      |
| urf4_187                     | 11    | 224028 | 54785              | 60140              | -     | -      |
| sqn_258                      | 10    | 4459   | 1199               | 1420               | -     | -      |
| f2_232                       | 8     | 525    | 87                 | 124                | 218   | -      |
| radd_250                     | 13    | 1405   | 386                | 489                | 511   | -      |
| ham15_107                    | 15    | 3858   | 1326               | 1689               | 911   | -      |
| sao2_257                     | 14    | 16864  | 5346               | 7178               | -     | -      |
| sym9_148                     | 10    | 9408   | 1865               | 2432               | -     | -      |
| urf5_280                     | 9     | 23764  | 6989               | 8730               | -     | -      |
| square_root_7                | 15    | 3089   | 812                | 2150               | -     | -      |
| sys6-v0_111                  | 10    | 98     | 23                 | 26                 | 38    | -      |
| hwb7_59                      | 8     | 10681  | 2687               | 3551               | 3722  | -      |
| sym9_146                     | 12    | 148    | 38                 | 55                 | 54    | -      |
| wim_266                      | 11    | 427    | 93                 | 120                | 147   | -      |
| urf2_152                     | 8     | 35210  | 9181               | 11921              | 10577 | -      |
| urf5_159                     | 9     | 71932  | 20258              | 25505              | 2700  | -      |
| urf2_277                     | 8     | 10066  | 2807               | 3798               | 3782  | -      |
| life_238                     | 11    | 9800   | 2762               | 3576               | -     | -      |
| root_255                     | 13    | 7493   | 2128               | 3035               | -     | -      |
| 9symml_195                   | 11    | 15232  | 4553               | 5986               | -     | -      |
| sym10_262                    | 12    | 28084  | 8534               | 11033              | -     | -      |
| dc1_220                      | 11    | 833    | 226                | 207                | 371   | -      |
| cm42a_207                    | 14    | 771    | 182                | 229                | 294   | -      |
| rd53_311                     | 13    | 124    | 26                 | 48                 | 47    | -      |
| dc2_222                      | 15    | 4131   | 1383               | 1773               | -     | -      |
| rd84_142                     | 15    | 154    | 49                 | 58                 | 50    | -      |
| sym6_145                     | 7     | 1701   | 317                | 449                | 750   | -      |
| co14_215                     | 15    | 7840   | 3078               | 3819               | -     | -      |
| cnt3-5_180                   | 16    | 215    | 59                 | 74                 | 79    | -      |
| cm152a_212                   | 12    | 532    | 103                | 129                | 168   | -      |
| sym6_316                     | 14    | 123    | 30                 | 39                 | 56    | -      |
| mlp4_245                     | 16    | 8232   | 2780               | 3490               | -     | -      |
| hwb8_113                     | 9     | 30372  | 10749              | 16489              | -     | -      |
| qft_16                       | 16    | 240    | 90                 | 147                | -     | -      |
| plus63mod4096_163            | 13    | 56329  | 19759              | 24273              | -     | -      |
| urf1_149                     | 9     | 80878  | 22551              | 28516              | -     | -      |
| urf3_155                     | 10    | 185276 | 50842              | 62903              | -     | -      |
| urf3_279                     | 10    | 60380  | 17999              | 23318              | -     | -      |
| hwb9_119                     | 10    | 90955  | 22946              | 30031              | -     | -      |
| plus63mod8192_164            | 14    | 81865  | 28022              | 36207              | -     | -      |
| pm1_249                      | 14    | 771    | 182                | 229                | 294   | -      |
| sym9_193                     | 11    | 15232  | 4382               | 5518               | -     | -      |
| misex1_241                   | 15    | 2100   | 480                | 754                | 600   | -      |
| urf1_278                     | 9     | 26692  | 8010               | 10217              | -     | -      |
| squar5_261                   | 13    | 869    | 219                | 313                | 290   | -      |
| ground_state_estimation_10   | 13    | 154209 | 11671              | 22886              | -     | -      |
| adr4_197                     | 13    | 1498   | 516                | 670                | -     | -      |

**Table 5.** Comparison of the number of SWAP gates added by the output circuit on the IBM  $\mathrm{Q}20$ 

| Circuit      | qubit | CNOT  | $QCTS_{num}$ | $QCTS_{dep}$ | optm  | wghtgr |
|--------------|-------|-------|--------------|--------------|-------|--------|
| name         | no.   | no.   | added        | added        | added | added  |
| hwb6_56      | 7     | 2952  | 698          | 933          | 909   | -      |
| clip_206     | 14    | 14772 | 5430         | 6865         | -     | -      |
| cm85a_209    | 14    | 4986  | 2088         | 2225         | -     | -      |
| rd84_253     | 12    | 5960  | 1849         | 2333         | -     | -      |
| dist_223     | 13    | 16624 | 5623         | 7431         | -     | -      |
| inc_237      | 16    | 4636  | 1193         | 1667         | -     | -      |
| qft_10       | 10    | 90    | 23           | 34           | 30    | -      |
| urf6_160     | 15    | 75180 | 27524        | 32452        | -     | -      |
| con1_216     | 9     | 415   | 86           | 118          | 177   | -      |

**Table 6.** Comparison of the number of SWAP gates added by the output circuit on the IBM  $\mathrm{Q}20$ 

# B Experimental details of the depth of the output circuit

| Circuit            | qubit | CNOT              | depths | $QCTS_{num}$ | $QCTS_{dep}$ | optm   |
|--------------------|-------|-------------------|--------|--------------|--------------|--------|
| name               | no.   | no.               | no.    | depths       | depths       | depths |
| decod24-enable_126 | 6     | 149               | 190    | 233          | 275          | 470    |
| 4mod5-v0_19        | 5     | 16                | 21     | 16           | 16           | 21     |
| 4mod5-v0_18        | 5     | 31                | 40     | 37           | 46           | 54     |
| mod5d2_64          | 5     | 25                | 32     | 40           | 43           | 67     |
| 4gt4-v0_72         | 6     | 113               | 137    | 155          | 143          | 297    |
| alu-v3_35          | 5     | 18                | 22     | 24           | 30           | 60     |
| 4gt4-v0_73         | 6     | 179               | 227    | 260          | 281          | 586    |
| alu-v3_34          | 5     | 24                | 30     | 30           | 33           | 63     |
| 3_17_13            | 3     | 17                | 22     | 17           | 17           | 52     |
| 4gt4-v0_78         | 6     | 109               | 137    | 145          | 133          | 352    |
| 4gt4-v0_79         | 6     | 105               | 132    | 156          | 156          | 345    |
| 4mod7-v1_96        | 5     | 72                | 94     | 120          | 129          | 218    |
| mod10_171          | 5     | 108               | 139    | 159          | 168          | 335    |
| ex2_227            | 7     | 275               | 355    | 419          | 452          | 899    |
| mod10_176          | 5     | 78                | 101    | 120          | 120          | 274    |
| cycle10_2_110      | 12    | 2648              | 3386   | 5405         | 6296         | 7467   |
| 0410184_169        | 5     | 9                 | 6      | 15           | 15           | 253    |
| 4mod5-v0_20        | 5     | 10                | 12     | 10           | 10           | 32     |
| sqrt8_260          | 12    | 1314              | 1661   | 2451         | 2790         | 3561   |
| aj-e11_165         | 5     | 69                | 86     | 93           | 93           | 250    |
| alu-v1_28          | 5     | 18                | 22     | 24           | 30           | 70     |
| f2_232             | 8     | 525               | 668    | 786          | 897          | 1672   |
| radd_250           | 13    | 1405              | 1781   | 2563         | 2872         | 3985   |
| 4gt12-v0_86        | 6     | 116               | 135    | 200          | 215          | 334    |
| 4gt12-v0_87        | 6     | 112               | 131    | 193          | 208          | 324    |
| 4gt12-v0_88        | 6     | 86                | 108    | 101          | 101          | 222    |
| alu-v1_29          | 5     | 17                | 22     | 29           | 29           | 64     |
| ham7_104           | 7     | 149               | 185    | 233          | 251          | 491    |
| C17_204            | 7     | 205               | 253    | 283          | 364          | 688    |
| xor5_254           | 6     | 5                 | 5      | 5            | 5            | 10     |
| hwb4_49            | 5     | 107               | 134    | 149          | 152          | 308    |
| rd73_140           | 10    | 104               | 92     | 173          | 182          | 185    |
| decod24-v0_38      | 4     | 23                | 30     | 23           | 23           | 61     |
| rd53_131           | 7     | 200               | 261    | 317          | 317          | 677    |
| rd53_133           | 7     | 256               | 327    | 367          | 397          | 777    |
| rd53_135           | 7     | 134               | 159    | 218          | 221          | 331    |
| sys6-v0_111        | 10    | 98                | 75     | 167          | 176          | 188    |
| decod24-v2_43      | 4     | 22                | 30     | 22           | 22           | 75     |
| hwb7_59            | 8     | 10681             | 13437  | 18742        | 21334        | 29601  |
| rd53_138           | 8     | 60                | 56     | 102          | 108          | 114    |
| rd32-v0_66         | 4     | 16                | 20     | 16           | 16           | 51     |
| sym9_146           | 12    | 148               | 127    | 262          | 313          | 309    |
| 4gt13-v1_93        | 5     | 30                | 39     | 30           | 30           | 102    |
| graycode6_47       | 6     | 5                 | 5      | 5            | 5            | 5      |
| wim_266            | 11    | 427               | 514    | 706          | 787          | 1180   |
| urf2_152           | 8     | 35210             | 44100  | 62753        | 70973        | 90299  |
| urf2_277           | 8     | 10066             | 11390  | 18487        | 21460        | 26548  |
| 4mod5-bdd_287      | 7     | 31                | 41     | 40           | 49           | 71     |
| ham3_102           | 3     | 11                | 13     | 11           | 11           | 28     |
| 4gt4-v0_80         | 6     | 79                | 101    | 94           | 94           | 206    |

**Table 7.** Comparison of the depth of the output circuit on the IBM  $\mathrm{Q}20$ 

| Circuit              | qubit | CNOT | depths | $QCTS_{num}$            | $QCTS_{dep}$                       | optm   |
|----------------------|-------|------|--------|-------------------------|------------------------------------|--------|
| name                 | no.   | no.  | no.    | depths                  | depths                             | depths |
| ex-1_166             | 3     | 9    | 12     | 9                       | 9                                  | 28     |
| mod5mils_65          | 5     | 16   | 21     | 16                      | 16                                 | 52     |
| 0example             | 5     | 9    | 6      | 12                      | 15                                 | 15     |
| alu-v4_36            | 5     | 51   | 66     | 87                      | 75                                 | 170    |
| alu-v4_37            | 5     | 18   | 22     | 24                      | 30                                 | 60     |
| ex1_226              | 6     | 5    | 5      | 5                       | 5                                  | 10     |
| one-two-three-v0_98  | 5     | 65   | 82     | 98                      | 104                                | 234    |
| one-two-three-v0_97  | 5     | 128  | 163    | 197                     | 197                                | 443    |
| one-two-three-v3_101 | 5     | 32   | 40     | 41                      | 44                                 | 95     |
| rd32_270             | 5     | 36   | 47     | 45                      | 45                                 | 76     |
| dc1_220              | 11    | 833  | 1041   | 1511                    | 1454                               | 2711   |
| rd53_130             | 7     | 448  | 569    | 715                     | 748                                | 1417   |
| rd53_251             | 8     | 564  | 712    | 876                     | 957                                | 1767   |
| cm42a_207            | 14    | 771  | 940    | 1317                    | 1458                               | 2279   |
| rd53_311             | 13    | 124  | 130    | 202                     | 268                                | 300    |
| 4mod5-v1_24          | 5     | 16   | 21     | 16                      | 16                                 | 36     |
| mod5adder_127        | 6     | 239  | 302    | 302                     | 407                                | 817    |
| 4_49_16              | 5     | 99   | 125    | 159                     | 150                                | 320    |
| hwb5_53              | 6     | 598  | 758    | 1021                    | 1102                               | 1560   |
| ex3_229              | 6     | 175  | 226    | 205                     | 202                                | 462    |
| rd84_142             | 15    | 154  | 110    | 301                     | 328                                | 253    |
| 4gt10-v1_81          | 5     | 66   | 84     | 108                     | 111                                | 210    |
| alu-v2_32            | 5     | 72   | 92     | 117                     | 123                                | 215    |
| alu-v2_31            | 5     | 198  | 255    | 324                     | 360                                | 650    |
| alu-v2_30            | 6     | 223  | 285    | 346                     | 358                                | 734    |
| sym6_145             | 7     | 1701 | 2187   | 2652                    | 3048                               | 5716   |
| sf_276               | 6     | 336  | 435    | 372                     | 492                                | 1096   |
| decod24-v1_41        | 5     | 38   | 50     | 50                      | 50                                 | 120    |
| sf_274               | 6     | 336  | 436    | 438                     | 399                                | 822    |
| 4gt4-v1_74           | 6     | 119  | 154    | 170                     | 191                                | 329    |
| alu-v2_33            | 5     | 17   | 22     | 29                      | 29                                 | 59     |
| cnt3-5_180           | 16    | 215  | 209    | 392                     | 437                                | 482    |
| cm152a_212           | 12    | 532  | 684    | 841                     | 919                                | 1423   |
| cnt3-5_179           | 16    | 85   | 61     | 103                     | 103                                | 166    |
| sym6_316             | 14    | 123  | 135    | 213                     | 240                                | 378    |
| 4mod5-v1_22          | 5     | 11   | 12     | 11                      | 11                                 | 37     |
| 4mod5-v1_23          | 5     | 32   | 41     | 47                      | 47                                 | 55     |
| mini_alu_305         | 10    | 77   | 71     | 107                     | 137                                | 187    |
| alu-v0_26            | 5     | 38   | 49     | 59                      | 68                                 | 108    |
| alu-bdd_288          | 7     | 38   | 48     | 50                      | 74                                 | 112    |
| alu-v0_27            | 5     | 17   | 21     | 23                      | 29                                 | 63     |
| 4gt13_91             | 5     | 49   | 61     | 70                      | 70                                 | 108    |
| 4gt5_77              | 5     | 58   | 74     | 94                      | 94                                 | 170    |
| 4gt13_92             | 5     | 30   | 38     | 30                      | 30                                 | 103    |
| 4gt5_76              | 5     | 46   | 56     | 67                      | 76                                 | 171    |
| 4gt5_75              | 5     | 38   | 47     | 53                      | 74                                 | 127    |
| 4gt12-v1_89          | 6     | 100  | 130    | 133                     | 163                                | 313    |
| one-two-three-v1_99  | 5     | 59   | 76     | 95                      | 89                                 | 194    |
| 4gt13_90             | 5     | 53   | 65     | 74                      | 74                                 | 124    |
| pm1_249              | 14    | 771  | 940    | 1317                    | 1458                               | 2279   |

**Table 8.** Comparison of the depth of the output circuit on the IBM  $\mathrm{Q}20$ 

| Circuit                    | qubit | CNOT  | depths | $QCTS_{num}$ | $QCTS_{dep}$ | optm   |
|----------------------------|-------|-------|--------|--------------|--------------|--------|
| name                       | no.   | no.   | no.    | depths       | depths       | depths |
| ising_model_10             | 10    | 90    | 52     | 90           | 90           | 107    |
| misex1_241                 | 15    | 2100  | 2676   | 3540         | 4362         | 5326   |
| 4gt11_84                   | 5     | 9     | 11     | 9            | 9            | 25     |
| 4gt11_83                   | 5     | 14    | 16     | 14           | 14           | 16     |
| mod5d1_63                  | 5     | 13    | 13     | 13           | 13           | 17     |
| 4gt11_82                   | 5     | 18    | 20     | 21           | 21           | 25     |
| squar5_261                 | 13    | 869   | 1051   | 1526         | 1808         | 2309   |
| decod24-v3_45              | 5     | 64    | 84     | 109          | 109          | 244    |
| rd32-v1_68                 | 4     | 16    | 21     | 16           | 16           | 52     |
| hwb6_56                    | 7     | 2952  | 3736   | 5046         | 5751         | 7773   |
| mini-alu_167               | 5     | 126   | 162    | 207          | 207          | 400    |
| one-two-three-v2_100       | 5     | 32    | 40     | 41           | 44           | 80     |
| 4mod7-v0_94                | 5     | 72    | 92     | 96           | 111          | 270    |
| cm82a_208                  | 8     | 283   | 340    | 406          | 490          | 699    |
| mod8-10_178                | 6     | 152   | 193    | 167          | 212          | 243    |
| mod8-10_177                | 6     | 196   | 251    | 238          | 295          | 525    |
| majority_239               | 7     | 267   | 344    | 384          | 396          | 839    |
| qft_10                     | 10    | 90    | 37     | 159          | 192          | 135    |
| miller_11                  | 3     | 23    | 29     | 23           | 23           | 75     |
| decod24-bdd_294            | 6     | 32    | 40     | 44           | 44           | 86     |
| con1_216                   | 9     | 415   | 508    | 673          | 769          | 1197   |
| total                      | 823   | 83416 | 103023 | 145372       | 164848       | 224731 |

**Table 9.** Comparison of the depth of the output circuit on the IBM  $\mathrm{Q}20$ 

| Circuit                                      | qubit | CNOT                | depths | $QCTS_{num}$ | $QCTS_{dep}$ | optm   |
|----------------------------------------------|-------|---------------------|--------|--------------|--------------|--------|
| name                                         | no.   | no.                 | no.    | depths       | depths       | depths |
| max46_240                                    | 10    | 11844               | 14257  | 22263        | 25479        | _      |
| rd73_252                                     | 10    | 2319                | 2867   | 4077         | 4602         | _      |
| urf4_187                                     | 11    | 224028              | 264330 | 388383       | 404448       | _      |
| sqn_258                                      | 10    | 4459                | 5458   | 8056         | 8719         | _      |
| ham15_107                                    | 15    | 3858                | 4819   | 7836         | 8925         | _      |
| sao2_257                                     | 14    | 16864               | 19563  | 32902        | 38398        | _      |
| sym9_148                                     | 10    | 9408                | 12087  | 15003        | 16704        | _      |
| urf5_280                                     | 9     | 23764               | 27822  | 44731        | 49954        | _      |
| square_root_7                                | 15    | 3089                | 3847   | 5525         | 9539         | _      |
| urf5_159                                     | 9     | 71932               | 89148  | 132706       | 148447       | _      |
| life_238                                     | 11    | 9800                | 12511  | 18086        | 20528        | _      |
| root_255                                     | 13    | 7493                | 8839   | 13877        | 16598        | _      |
| 9symml_195                                   | 11    | 15232               | 19235  | 28891        | 33190        | _      |
| sym10_262                                    | 12    | 28084               | 35572  | 53686        | 61183        | _      |
| dc2_222                                      | 15    | 4131                | 5242   | 8280         | 9450         | _      |
| co14_215                                     | 15    | 7840                | 8570   | 17074        | 19297        | _      |
| mlp4_245                                     | 16    | 8232                | 10328  | 16572        | 18702        | _      |
| hwb8_113                                     | 9     | 30372               | 38717  | 62619        | 79839        | _      |
| qft_16                                       | 16    | 240                 | 61     | 510          | 681          | _      |
| plus63mod4096_163                            | 13    | 56329               | 72246  | 115606       | 129148       | _      |
| urf1_149                                     | 9     | 80878               | 99586  | 148531       | 166426       | _      |
| urf3_155                                     | 10    | 185276              | 229365 | 337802       | 373985       | _      |
| urf3_279                                     | 10    | 60380               | 70702  | 114377       | 130334       | _      |
| hwb9_119                                     | 10    | 90955               | 116199 | 159793       | 181048       | _      |
| plus63mod8192_164                            | 14    | 81865               | 105142 | 165931       | 190486       | _      |
| sym9_193                                     | 11    | 15232               | 19235  | 28378        | 31786        | _      |
| ising_model_13                               | 13    | 120                 | 46     | 120          | 120          | _      |
| urf1_278                                     | 9     | 26692               | 30955  | 50722        | 57343        | _      |
| ising_model_16                               | 16    | 150                 | 57     | 150          | 150          | _      |
| ground_state_estimation_10                   | 13    | 154209              | 217236 | 189222       | 222867       | _      |
| adr4_197                                     | 13    | 1498                | 1839   | 3046         | 3508         | _      |
| clip_206                                     | 14    | 14772               | 17879  | 31062        | 35367        | _      |
| cm85a_209                                    | 14    | 4986                | 6374   | 11250        | 11661        | _      |
| rd84_253                                     | 12    | 5960                | 7261   | 11507        | 12959        | _      |
| dist_223                                     | 13    | 16624               | 19694  | 33493        | 38917        | _      |
| inc_237                                      | 16    | 4636                | 5864   | 8215         | 9637         | _      |
| urf6_160                                     | 15    | 75180               | 93645  | 157752       | 172536       | _      |

**Table 10.** Comparison of the depth of the output circuit on the IBM  $\mathrm{Q}20$