# Quantum Circuit Transformation Based on Subgraph Isomorphism and Tabu Search\*

First Aaaaaaauthor  $^{1[0000-ereer1111-2222-3333]},$  Second Author  $^{2,3[1111-2222-3333-4444]},$  and Third Author  $^{3[2222-3333-4444-5555]}$ 

 $^1$  Princeton University, Princeton NJ 08544, USA  $^2$  Springer Heidelberg, Tiergartenstr. 17, 69121 Heidelberg, Germany  ${\tt lncs@springer.com}$ 

http://www.springer.com/gp/computer-science/lncs

ABC Institute, Rupert-Karls-University Heidelberg, Heidelberg, Germany
{abc,lncs}@uni-heidelberg.de

Abstract. How to find an automatic method to map any logic quantum circuits to physical circuits effectively in acceptable time, and make the output circuits add as few additional auxiliary gates as possible. This is crucial in the development of quantum computing. This paper mainly proposes the initial mapping algorithm based on combined subgraph isomorphism algorithm (CSI) and a circuit transformation algorithm based on tabu search (QCTS), and summarizes the advantages and disadvantages of the state-of-the-art circuit transformation algorithm. The experimental results show that our algorithm is feasible. Compared with the initial mapping based on the VF2 algorithm, adding auxiliary gates to our initial mapping reduced by 22.26%, and the depth was reduced by 11.17%. QCTS is scalable on large-scale circuits with less overhead, but other state-of-the-art algorithms are difficult to handle large-scale circuits.

Keywords: Quantum circuit transformation · Subgraph isomorphism · Initial mapping · Tabu search

## 1 Introduction

Since the discovery of quantum mechanics in the early 20th century, quantum technology has been applied in practice, but large quantum computers have not yet been builted. Most of the contributions of quantum information to computer science are still in the theoretical stage. In March 2017, IBM developed the first 5-qubit backend called IBM QX2. In June, it launched the 16-qubit backend called IBM QX3. The revised versions of 5-qubit and 16-qubit are called IBM QX4 and IBM QX5, respectively. IBM Q experience provides the public with free quantum computer resources on the cloud, and opens source the quantum computing software framework  $Qiskit^1$ . If we want to use these quantum computer resources, we must map the logical quantum circuit to a given physical

<sup>\*</sup> Supported by organization x.

circuit and satisfy physical constraints. This requires a set of highly efficient and automatic mapping procedures. Quantum circuit transformation is an important part of quantum circuit compilation. The main idea convert the input logical circuit (LC) into a physical circuit (PC) and satisfy the constraints of the physical constraints.

There are currently five main methods for solving qubit allocation.

- 1. Unitary matrix factorization algorithm. The first mathod used the unitary matrix factorization algorithm to rearrange the quantum circuit from the beginning while retaining the function of the input circuit [8, 15].
- 2. Converting into some existing problems. The second method converted the quantum circuit transformation problem into some existing problems, such as AI planning [23, 3], Integer Linear Programming (ILP) [1], Satisfiability Modulo Theory (SMT) [13], and used the ready-made tools for these problems to find acceptable results. But these methods may run for a long time and can only be applied to a small amount of qubit. And these tools cannot take advantage of some of the properties of quantum mapping.
- 3. Precise methods. The precise method is only suitable for simple quantum structures and cannot be extended to complex quantum structures [20].
- 4. Graph theory. For example, in [18] Shafaei used the minimum linear arrangement problem in graph theory to model the problem of reducing the interaction distance. It divided a given circuit into several sub-circuits, and applied the minimum linear arrangement problem, respectively. Then turned non-adjacent gates in the sub-circuits into adjacent circuits by adding auxiliary gates. Finally, it used the minimum linear permutation problem to find an appropriate permutation, and bubble sort to calculate the number of SWAP gates needed. In [7] and [12], they proposed a two-step approach to reformulate the subtasks of gate scheduling as a graph problem. According to the graph coloring problem and the maximum subgraph isomorphism, the SWAP operations were added to minimize its overhead. Both of them moved a qubit from the initial position to the target position in the best possible path with minimal cost. The former defined a priority to get the initial mapping, and the latter purely solved the problem of position movement. They all divided the swapping of qubits into three categories. The first is a movement that is beneficial to both qubits; the second considers one advantageous, but the other is not mapped; the third is that one is advantageous and the other is harmful. Then they calculate the scores from the initial position to the target position according to the types, and move.
- 5. Heuristic search [26, 4, 9, 24, 20]. The circuit transformation algorithm hopes to find a minimum number of SWAPs. Heuristic search use evaluation function to obtain an acceptable solution in exponential time. Zulehner divided a given circuit into multiple layers, which can be implemented in a CNOT constraint compatible manner [26]. Then, for each of these layers, a respectively compatible mapping is determined, which requires as few additional gates as possible. The main idea determine the cheapest path from the root node to the target node (the path with the lowest cost). Since the search space is

usually exponential. Complex mechanisms are used to keep the paths considered as few as possible. Zhou designed a heuristic search algorithm with a novel selection mechanism, in which in each step of the search process [24]. He did not choose the lowest cost operation to apply, but looked forward one step, and then chooses the best continuous operation. In this way, the algorithm can effectively avoid local minimum. And a pruning mechanism is introduced to reduce the size of the search space and ensure that the program terminates in a reasonable amount of time. The time complexity of this algorithm is  $O(|V|^4)$ . Li proposed a SWAP-based search scheme SABRE [9]. Compared with previous search algorithms based on exhaustive mapping, SABRE achieves an exponential acceleration of search complexity, and ensures the scalability of SABRE to adapt to the large quantum equipment in the NISQ era. By introducing the attenuation effect in the heuristic cost function, different hardware compatible circuits are generated by switching the number of gates in the circuit according to the circuit depth. This makes SABRE suitable for NISQ devices with different characteristics and optimization goals. The routing algorithm implemented in  $t | ket \rangle$  can ensure that any quantum circuit is compiled into any architecture [4]. The algorithm is divided into four stages: decomposing the input circuit into time steps, determining the initial position, routing across time steps, and finally cleaning up. The heuristic method in  $t | ket \rangle$  matches or is better than the results of other circuit transformation systems in terms of depth and total number of gates of the compiled circuit, and the running time is greatly reduced, allowing larger circuits to be routed. Tannu proposed a Variation-aware qubit Movement strategy, which takes advantage of the change in error rate and a change-aware qubit allocation strategy by trying to select the route with the lowest probability of failure [22]. This strategy allocates program qubits to physical qubits to take advantage of SWAPs in the error rate, thereby minimizing the use of links with high error rates.

In general, An initial algorithm can be used to generate an initial mapping. Paler used a heuristic method to find the initial mapping, and uses IBM's compiler to benchmark [16]. The preliminary results show that the cost can be reduced by up to 10% only by placing qubits that are different from the default position (trivial placement) only in the actual circuit instance on the actual NISQ device. In 2018, a novel reverse traversal technique is proposed in [9], which selects the initial mapping method by considering the whole circuit. In [24], an annealing algorithm was proposed to find a favorable initial mapping. The heuristic initial mapping generated by the scheme is unstable and cannot be used in practice. In [10], VF2 subgraph isomorphism algorithm was used to generate an initial mapping. Compared with VF2 mapping, our algorithm based on CSI reduces the number of SWAP gates by 22.29% and the depth by 11.17%.

The biggest problem facing quantum information processing is the problem of quantum decoherence. The entanglement of the quantum system with the surrounding environment and quantum measurement will cause the disappearance of quantum coherence. Since it is now in NISQ era, there are only dozens

#### 4 F. Author et al.

of qubits, and it is unrealistic to realize quantum error correction [17]. Quantum gate operations are limited in physical circuits. They can only be performed between adjacent qubits. Thus it is necessary to convert circuits by adding auxiliary gates to satisfy logical and physical constraints. This process may introduce a lot of errors, which brings a huge challenge to circuit compilation, because noise has a greater impact on the final circuit and may make the result meaningless. The quantum coherence time is very short. The longest coherence time of a superconducting q uantum chip is still within 10us-100us, the time of a single quantum gate is about 20ns, the time of a 2-qubit gate is about 40ns, and the time of a measurement operation is about 300ns-1us. The main contributions of this paper are as follows.

- 1. We summary the current status, problems, and breakthrough directions of quantum mapping work, and point out their advantages and disadvantages.
- 2. We propose an algorithm (CSI) to obtain an initial mapping. The initial mapping of quantum circuits can be reduced to subgraph isomorphism, thus we use a better subgraph matching algorithm to generate part of the initial mapping, and then complete the mapping based on the connectivity between qubits.
- 3. We propose a heuristic SWAP search algorithm based on Tabu Search, which can handle large circuits in a short time at a low cost. Compared with the previous precise search and heuristic algorithms, this algorithm can complete the circuit transformation in a shorter time. QCTS can complete the search of the 159 circuits only with a few minutes, but other heuristic search cannot deal with them in a few minutes. Even some heuristic methods may not handle large circuits.

The rest of this paper is organised as follows. In Section 2 we recall some background of quantum computing and quantum information, and analyze the problems that need to be dealt with for the transformation of quantum circuits in Section 3. Section 4 describes and analyses our algorithm in detail. The experiment and results are reported in Section 5, and last section concludes the paper and discusses future research.

#### 2 Background

This section introduces some notions and notations of quantum computing and quantum information.

## $2.1 \quad Qubits.$

Classical information is stored in bits, while quantum information is stored in qubits. A qubit has two basic states, marked as  $|0\rangle$  or  $|1\rangle$ . A qubit also can be in any linear superposition state  $|\phi\rangle = a\,|0\rangle + b\,|1\rangle$ , where a and b are complex numbers and  $a^2 + b^2 = 1$ . Then  $|\phi\rangle$  is in the state  $|0\rangle$  with the probability  $|a|^2$  or in the state  $|1\rangle$  with the probability of  $|b|^2$ .

#### 2.2 Quantum Gate.

Suppose U is a unitary operator on a single qubit, then there are real numbers  $\alpha, \beta, \delta, \gamma$ . such that U satisfies the following equation,

$$U = e^{i\alpha} R_z(\beta) R_y(\gamma) R_z(\delta) \tag{1}$$

where  $R_z(\beta)$  represents a rotation by  $\beta$  about the  $\hat{z}$  axis, and  $R_y(\gamma)$  represents a rotation by  $\gamma$  about the  $\hat{y}$  axis.

Commonly used quantum gate symbols and their matrices are shown in the Fig. 1. Physical qubit and logical qubit are represented by Q, q, respectively.



Fig. 1. The symbols of common quantum gates and their matrices

#### 2.3 Quantum Circuit.

A quantum logical circuit LC (see Fig. 2) consists of quantum gates interconnected by quantum wires [5]. A quantum wire is a mechanism for moving quantum data from one location to another. Each line represents a qubit, and the gate operation on the line acts on the corresponding qubit. The width w of a circuit refers to the number of qubits in the circuit. The depth d of a circuit refers to the number of layers that can be executed in parallel. The directed acyclic graph (see Fig. 3) of a circuit is obtained by parallelizing and layering the circuit by topological sorting. In this paper, circuits with a depth less than

100 are called small-scale circuits, circuits with a depth greater than 1000 are called large-scale circuits, and the rest are medium-scale circuits. The execution order of a quantum logical circuit graph is from left to right. The depth of the circuit (see Fig. 2) is 6 and the width is 5. It is not necessary to consider single quantum gates in circuit transformation, since the qubit is local [19]. Architecture graph  $\mathcal{AG}_L$  is generated by regarding qubits in LC as nodes V and 2-qubit gates as edges E. Our initial mapping try to finding subgraphs isomorphic to the logical architecture graph (LAG) on the physical architecture graph (PAG).



Fig. 2. Original circuit



Fig. 3. The directed acyclic graph (DAG) of original circuit in Fig. 2

#### 2.4 Architectures

We mainly discusses the physical circuit of IBM. Let  $\mathcal{AG}_{\mathcal{P}} = (V_P, E_P)$  denote the architecture graph of the physical circuit, where  $V_P$  denotes the physical qubit set, and  $E_P$  represents the directed edge that the CNOT gates can execute. Fig. 7 (a) and (b) are PAG of the 5-qubit of IBM QX2, (c) and (d) are PAG of 16-qubit of IBM QX3, and (e) are the PAG of IBM Q20. The arrow in the figure indicates that the qubit at the beginning of the arrow can control the qubit at the end of the arrow, and the 2-qubit gate operations can only be performed



Fig. 4. (a) The architecture graph of original circuit in Fig. 2. (b) The partial architecture graph of IBM Q20



Fig. 5. Transformation of gate direction



Fig. 6. Decomposition of a SWAP gate

between qubits with edges connected. Fig. 4(a) is the logical architecture of the original circuit of Fig. 2, and Fig. 4(b) is the partial architecture graph of IBM Q20.

Lloyd proved that almost all gates of two qubits can be represented by general quantum gates [11]. Before circuit transformation, the circuit is simplified to a circuit with only single quantum gates and CNOT gates [14,2]. LC does not consider physical constraints, any gate operation can be performed between two non-adjacent logical qubits, but IBM physical circuit only supports single quantum gates and CNOT gates between two adjacent qubits.

Generally, mapping logical qubit directly to physical qubit cannot satisfy the limit of logical circuit, and circuit transformation is needed to make quantum gate executable on physical devices. This process is called circuit transformation in this paper. In this paper, we insert the auxiliary gates (SWAP) as shown in Fig. 6, so that two non-adjacent qubits can be logically swap to adjacent positions, or change direction between two adjacent qubits (see Fig. 5). The introduction of auxiliary gates may lead to errors, which may lead to large deviation between the final results and the actual situation. The quantum system is easy to interact with the surrounding environment, resulting in errors. In the NISQ era, quantum error correction is difficult to achieve. Due to the decoher-

ence problem of quantum, the quantum operation needs to be completed in the coherent period, and the time of quantum in the coherent state is very short. Therefore, it is necessary to improve the parallelism of qubits as much as possible to minimize the depth of quantum circuit. This is the focus of this paper. We hope to find a circuit mapping scheme with the minimum number of auxiliary gates and the circuit depth in an acceptable amount of time.



Fig. 7. IBM QX architectures

#### 3 Problem Analysis

**Problem in qubit Mapping** A quantum circuit transformation problem mainly includes the following four steps, among which the third step of *qubit assignment* and the fourth step of qubit routing problem are both NPC [20]

- 1. Convert a quantum program into a logical quantum circuit.
- 2. Decompose the circuit into basic gates. Single qubit gates and CNOT gates are used as basic gates, because they are commonly used to implement any quantum circuit and are supported by the IBM QX architecture.
- 3. Generate a high-quality initial mapping. We use the subgraph isomorphism algorithm to get a logical architecture graph LAG that satisfies the constraints of physical architecture graph (PAG) as much as possible. We try to satisfy the requirements of as many nodes as possible, since most LAG cannot find a perfect mapping on the PAG. It will change the original mapping relations  $\tau$  by adding SWAP gates,  $\tau:q\to Q$ . q, Q are the sets of logical qubits and physical qubits, respectively.  $|Q|\geq |q|$ , so it is not a bijective function. Two mappings are equal  $\tau(q_i)=\tau(q_i)$ , if and only if i=j.

4. The task of modifying the circuit to match the memory layout of a particular quantum computer is called the qubit routing problem [4]. Since quantum algorithms are usually designed without referring to the connectivity constraints of any specific hardware, routing problems need to be solved before the implementation of quantum circuits. Therefore, qubit routing forms a necessary stage of any compiler for quantum software.

Given the logical circuit LC, physical structure  $\mathcal{AG}_P$ , and an initial mapping  $\tau$ , CNOT gate  $g = \langle q_i, q_j \rangle$ , if gate G is executable, then  $\langle \tau(q_i), \tau(q_j) \rangle$  is a directed edge on  $\mathcal{AG}_P$ .

Example 1. Fig. 4 (a) is the logical structure of Fig. 2, Fig. 4 (b) is the partial architecture graph of IBM Q20, an initial mapping is  $\tau = \{q_0 \to Q_{10}, q_1 \to Q_0, q_2 \to Q_6, q_3 \to Q_5, q_4 \to Q_{11}\}$ .  $g_0 = \langle q_2, q_1 \rangle$  is executable, since  $\langle \tau(q_2), \tau(q_2) \rangle = \langle Q_5, Q_0 \rangle$  exists in  $\mathcal{AG}_P$ . But  $g_3 = \langle q_1, q_3 \rangle$  is not executable, since  $\langle \tau(q_1), \tau(q_3) \rangle = \langle Q_0, Q_6 \rangle$  does not exist in  $\mathcal{AG}_P$ .

#### 4 Solution

The solution proposed in this paper mainly includes preprocessing, initial mapping, and minimum SWAP algorithm based on Tabu Search and  $A^*$  algorithm.

#### 4.1 Preprocessing

Before the transformation of the SWAP circuit based on Tabu Search, we need to preprocess it to get more convenient data to shorten our search time and space. In the preprocessing stage, we first adjust the circuit of the input openQASM program to shorten the life cycle of qubits, convert the openQASM code into a layered form, and generate DAG and LAG. Then we read PAG into the memory in the specified format, then use BFS to calculate the shortest distance between each node on the architecture graph.

Circuit Adjustment In order to shorten the life cycle of qubits and improve the parallelism of qubits, we use a layered method [25] to analyze the life cycle of qubits, and pack the operations that can be executed in parallel into a bundle, forming a layered bundle format. A conversion method is designed to use the layered bundle format to determine which operations can be moved, which reduces the life cycle of these qubits. The algorithm reduces the error rate of quantum programs by 11%. On most quantum workloads, the longest qubit lifetime and the average qubit lifetime can be reduced by more than 20%, and the execution time of some quantum programs can also be reduced.

**Shortest Distance** As long as PAG is determined, the shortest distance between two qubits can be calculated. In this paper, the shortest distance matrix dist[i][j] is calculated by Floyd - Warshall algorithm, which represents the

shortest distance from  $Q_i$  to  $Q_j$ , and the distance of each edge is 1. For IBM QX2, QX3, QX4, QX5, the SWAP operation needs 7 gates (3 CNOT gates and 4 H gates). Only 4 H gates are needed to change directions between two adjacent qubits. For a CNOT gate  $\langle q_i, q_j \rangle$ , Two qubits are mapped to  $Q_i$  and  $Q_j$  respectively, with  $\tau(q_i) = Q_i$ ,  $\tau(q_j) = Q_j$ , then the cost of executing g under the shortest distance path is  $cost_{cnot}(q_i, q_j) = 7 \times (dist[i][j] - 1)$ . If they move to adjacent positions, but there is no edge from  $Q_i$  to  $Q_j$ , they need to add 4 H gates to adjust their directions. Then the cost between them is  $cost_{cnot}(q_i, q_j) = 3 \times (dist[i][j] - 1)$ . In the IBM Q20 structure, all the edges are bidirectional. The SWAP operation requires 3 gates (3 CNOT gates), and there is no need to change the direction. Then the calculation formula for IBM Q20 structure is  $cost_{cnot}(q_i, q_j) = 3 \times (dist[i][j] - 1)$  The time complexity of this step is  $O(N^3)$ .

Example 2. Take the QX5 structure as an example. Suppose there is a CNOT gate  $g = \langle q_i, q_j \rangle$ ,  $q_i$  is mapped to  $Q_1$ ,  $q_j$  is mapped to  $Q_{14}$ , and the shortest distance between them is dist[1][14] = 3. There are three shortest paths to move  $Q_1$  to the adjacent position of  $Q_{14}$ :  $\Pi = \{\pi_0, \pi_1, \pi_2\}$   $\pi_0 = Q_1 \rightarrow Q_2 \rightarrow Q_3 \rightarrow Q_{14}$ ,  $\pi_1 = Q_1 \rightarrow Q_2 \rightarrow Q_{15} \rightarrow Q_{14}$ ,  $\pi_2 = Q_1 \rightarrow Q_0 \rightarrow Q_{15} \rightarrow Q_{14}$ . Their costs are  $cost_{\pi_0} = 18$ ,  $cost_{\pi_1} = 14$ ,  $cost_{\pi_2} = 14$ , respectively.

Circuit Layering The SWAP minimization algorithm based on Tabu Search or the  $A^*$  algorithm traverses each level of quantum gates to search those can be executed in parallel. Thus, we layer the adjusted circuit, traverse the entire program sequentially, and add gates that can be executed in parallel to one layer, otherwise a new layer is added. The CNOT gate is represented by  $\langle q_i, q_j \rangle$ ,  $q_i$  is the control qubit, and  $q_j$  is the target qubit.  $L(LC) = \{\mathcal{L}_0, \mathcal{L}_1, ..., \mathcal{L}_n\}$  represents the layered circuit, and  $\mathcal{L}_i$   $(0 \le i \le n)$  represents a quantum gate set that can be executed in parallel. The quantum gate set separated by the dotted line in Fig. 2 are the following  $\mathcal{L}_0 = \{g_0, g_1\}, \mathcal{L}_1 = \{g_2\}, \mathcal{L}_2 = \{g_3, g_4\}, \mathcal{L}_3 = \{g_5, g_6\}, \mathcal{L}_4 = \{g_7\}, \mathcal{L}_5 = \{g_8\}.$ 

At the same time, we generate circuit architecture graph  $\mathcal{AG}_{\mathcal{L}} = (V_L, E_L)$ , which is an undirected graph.  $V_L$  contains the vertices and the degree of each vertex, and  $E_L$  represents the set of undirected edges that the CNOT gates can execute.

#### 4.2 Initial Mapping

It has been proved that the initial mapping has an important influence on qubit assignment, and the subgraph isomorphism can be reduced to qubit assignment, so we want to use the subgraph isomorphism algorithm to find an initial mapping which is closer to the optimum. In PAG, it is almost impossible to find a subgraph that exactly matches LAG, so we hope to find a partial mapping that can maximize the match. SubgraphCompare [21] compares several state-of-the-art subgraph isomorphism algorithm composition. It shows that using the filtering and sorting ideas of GraphQL algorithm to process candidate nodes, and the

local candidates calculation method LFTJ based on set-intersection to enumerate the results is the best. Since SubgraphCompare used in this paper is only suitable for fully connected subgraph isomorphism, there may be no 2-qubit gate operation between one qubit and other qubits in our circuit. The architecture graph formed in this way cannot use SubgraphCompare to generate part of the initial mapping, because the subgraph matching will first match the node with the largest degree, and we hope to minimize the impact on the logical dependency graph. Therefore, we artificially connect the qubit with degree 0 to the qubit with the largest degree in the logical architecture diagram.

We use SubgraphCompare generate partial mappings denoted by T. LAG  $\mathcal{AG}_{\mathcal{L}}$  and PAG  $\mathcal{AG}_{\mathcal{P}}$  generated by the preprocessing process are regarded as an undirected graph as input, and then the CSI algorithm SubgraphCompare is executed. The output of CSI algorithm is a file containing all the isomorphism processes. Since only a small number of nodes may be matched during the isomorphism process, we finally select only the case with the most isomorphic nodes as the result of CSI. Then we complete the unmapped nodes in the partial mapping based on the connectivity of the nodes or the degree of the nodes. The mapping completion algorithm based on node connectivity is shown in Algorithm 1.

The input of Algorithm 1 is a target graph  $(\mathcal{AG}_P)$ , query graph  $(\mathcal{AG}_L)$ , and the partial mappings T. First, we initialize an empty queue Q, which stores unmatched nodes in the map  $\tau \in T$ . Then it traverses  $\tau$  and adds the unmatched nodes to the queue. For the remaining unmatched points, we try to map them with the nodes that are not matched in the more concentrated area of  $\mathcal{AG}_P$ . Finally, a dense isomorphic subgraph is generated, which can reduce subsequent SWAP operations. We would tried to randomly match the remaining unmatched nodes, but this may lead to mapping to a position far away from other nodes, thus needs to add auxiliary gate operations. In the query graph, if the unmatched point has an edge adjacent to the matched point, it will be matched to its adjacent position first, and if the adjacent position has been matched, it will be matched to the adjacent unmatched node. Finally it gets all the processed candidate initial mappings and outputs them to a file.

Lines 2-7 calculate the maximum number of qubits ml that can be matched in the mapping relations between logical qubits and physical qubits obtained by the SubgraphCompare algorithm. Lines 8-49 complete the logical qubit unmapped nodes in the mapping scheme with the number of matches equal to ml in the mapping relations, and we use the greedy strategy to allocate. In line 11, we initialized an empty queue Q, which stores unmapped logical qubits. In lines 12-18, we traverse the map and add the unmapped qubits to Q. We loop until Q becomes empty, and all logical qubits are mapped to physical qubits. We take out the first element in Q. Lines 21 and 22 are respectively used to get the adjacency matrix of  $\mathcal{AG}_P$  and  $\mathcal{AG}_L$ . Line 23 initializes an empty map tm, and the keys are sorted in descending order. The key consists of the number of nodes that are connected to qId in the adjacency matrix and have been mapped in the current mapping scheme and the nodes constitute a unique key. Lines 25-31

traverse the point m connected to qId in the adjacency matrix. If the node m has not been mapped, the node is stored in tm. Lines 32-47 traverse the tm, select the node with the largest number of connections to qId in the tm, and it has been mapped to the node (tm.firstValue) on PAG. The tId in line 33 is the node with the largest number of qId connections corresponding to the node on PAG. Line 35 remove the object to be matched in the tm from the tm. Lines 36-43 select the node adjacent to the tId in the adjacency matrix of the tId, and map the qId to the node.

Example 3. Following the previous example, we first use CSI algorithm for LAG(see Fig. 4 (a)) and PAG (see Fig. 7 (e)) to obtain the partial mapping result set  $T = \{\tau_0, \tau_1, ..., \tau_n\}$ , and then we use one of the partial mapping as an example  $\tau_i = \{q_0 \to Q_{10}, q_1 \to -1, q_2 \to Q_6, q_3 \to Q_5, q_4 \to Q_{11}\}, \ 0 \le i < n. \ q_1 \to 0$ -1 means that  $q_1$  is not mapped to the physical structure in the subgraph isomorphism stage, so we need to perform mapping completion. The algorithm 1 completion strategy find the part of the mapping with the maximum mapped nodes in T and perform the mapping completion as the initial mapping. In this example, the maximum number of mapped nodes is four. Next, we demonstrate that  $\tau_i$  is mapped and completed, and the unmapped nodes in  $\tau_i$  are added to the queue  $Q, Q = \{q_1\}$ , and the loop ends until Q is empty. We Take the first element of Q and put it in qId, then get the adjacency matrix of the query graph and the target graph, traverse the node  $q_m$  connected to qId in the adjacency matrix, if the node  $q_m$  is matched, then we put  $q_m$  into the map, The key of the map is the number of connections between  $q_m$  and  $qId + "-" + \tau_i(q_m)$ , the value is  $q_m$ , and the keys are sorted in a descending order. Then we get  $tm = [\{'0 - 5', q_3\}, \{'-1 - 6', q_2\}, \{'-1 - 11', q_4\}, \{'-1 - 10', q_0\}],$  Then traverse tm and take out the  $value = q_3$  of the elements in the tm, calculate the value  $tId = Q_5$  of  $q_m$  in the current mapping  $\tau_i(q_3)$ , and map qId to the node connected to tId and not yet mapped. If the nodes connected to tId have been mapped, the loop continues. In this example, it can be directly mapped to  $Q_0$ . In the end, we get  $\tau_i = \{q_0 \to Q_{10}, q_1 \to 0, q_2 \to Q_6, q_3 \to Q_5, q_4 \to Q_{11}\}.$ 

#### 4.3 Swap Minimization

Tabu Search Tabu Search algorithm [6] is a type of heuristic algorithm. The disadvantage of heuristic algorithm is that it considers that the local optimum as the global optimum. There is a Tabu table in the Tabu Search to avoid falling into the local optimum. The circuit transformation operation in this paper mainly relies on the idea of Tabu Search algorithm, aiming to deal with the large-scale circuits that the current algorithm is difficult to handle, and hope to get a result closer to the optimum solution in a short time. There are mainly the following objects in Tabu Search: neighborhood fields, neighborhood action, Tabu table, candidate set sum, Tabu object, evaluation function, and amnesty rules. All the edges that can be swapped in the current map are the neighborhood fields in Tabu Search. The recently reached state is added to the Tabu table, and objects in the Tabu table will not be searched as much as possible. The Tabu table in this

#### **Algorithm 1:** initial mapping algorithm based on CSI

```
Input: \mathcal{AG}_{\mathcal{L}}: The architecture of logical circuit
    \mathcal{AG}_{\mathcal{P}}: The architecture of physical circuit
    T: A partial mapping set obtained by SubgraphCompare
    Output: result: A collection of mapping relations between \mathcal{AG}_{\mathcal{L}} and \mathcal{AG}_{\mathcal{P}}
 1 Initialize result= \emptyset;
 2 ml \leftarrow 0; The most mapped length in mappings
 з for \tau \in T do
         if ml < \tau.length then
 4
              ml \leftarrow \tau.length;
 5
        end
 6
 7 end
 s for \tau \in T do
        if ml=\tau.length then
              result.add(\tau);
10
              Q \leftarrow \text{initial an empty unmapped node Queue}
11
              i \leftarrow 1;
12
              while i \le \tau.length do
13
                  if \tau/i=-1 then
                       Q \leftarrow i;
15
                   end
16
                  i \leftarrow i + 1;
17
              end
18
              while Q is not empty do
19
                  int qId \leftarrow Q.poll();
20
                   targetAdj \leftarrow \mathcal{AG}_{\mathcal{P}}.adjacencyMatrix();
21
                   queryAdj \leftarrow \mathcal{AG}_{\mathcal{L}}.adjacencyMatrix();
22
                   tm \leftarrow \text{initial an empty map}
23
                   m \leftarrow 1;
24
                   while m \leq queryAdj/qId].length do
25
                        if \tau/m\neq -1 then
26
                             tm \leftarrow tm \cup
27
                             {queryAdj[qId][m] + " - " + \tau[m], m};
28
                        \mathbf{end}
29
30
                        m \leftarrow m + 1;
                   end
31
                   while tm is not empty do
32
                        tId \leftarrow \tau[\text{tm.firstValue}];
33
34
                        k \leftarrow 0;
                        tm \leftarrow tm \backslash tm.first();
35
                        while k < targetAdj/tId/.length do
36
                             if (targetAdj[tId][k] \neq -1 \text{ or } targetAdj[k][tId] \neq -1)
37
                             and not \tau.contains(k) then
38
                                 \tau[qId] \leftarrow k;
39
40
                                 break;
                             end
41
                             k \leftarrow k + 1;
42
43
                        if k \neq targetAdj/tId.length then
44
                            break;
45
                        end
46
                   end
47
              end
48
         end
49
50 end
```

paper is just in line with our parallel thinking. In order to increase the parallelism of quantum circuits, we will give priority to inserting auxiliary operations that can be executed in parallel. We try not to use the recently operated qubits as much as possible, which are added to the Tabu table, in the same time. The candidate set selects several neighborhood objects with the best target value or evaluation value from the neighborhood fields to join the candidate set. It can be obtained by observation that many SWAP operations are meaningless, so in order to save search space, we should perform pruning. Only the swap of edges adjacent to the gate node with at least one edge is meaningful, so our neighborhood fields is the shortest path on PAG of the gates. Their qubits are not adjacent to the current layer, and the edges on these shortest paths are all part of the neighborhood fields. The Tabu object is the object in the Tabu table. The evaluation function selects a SWAP evaluation formula from the candidate set, generally taking the objective function as the evaluation function. The evaluation function satisfy some gate mapping operations, and the number of SWAP gates added should be small, and the depth of the entire circuit should be small. The amnesty rule is that when all objects in the candidate set are Tabu, or after one object is Tabu, the target value will be greatly reduced. In order to achieve the global optimum, the Tabu object can be added to the candidate set. as shown in Algorithm 3.

The calculation of the neighborhood fields is shown in Algorithm 2. The input are the current circuit mapping parent  $\tau_p$ , physical qubits to logical qubits mapping qubits, logical qubits to physical qubits mapping locations, the current layer list of all gates currentLayers cl, all the gates nextLayer of the next layer nl, and the output is a candidate set of the current mapping, The mapping generated by a transformation, where the physical mapping and the logical mapping are changed synchronously.  $E_w$  is the edge of all the shortest paths in the physical architecture graph of all gates in the current layer. The weight of the edge is the number of times each edge appears in the path. If there are multiple edges reachable in the same shortest distance, we will choose the path with the largest total weight in all the shortest paths, lines 8-18. Lines 22-37 swap all the edges of this path and add them to the candidate set, and calculate the cost of each candidate.

Example 4. Under the mapping  $\tau_i = \{q_0 \to Q_{10}, q_1 \to 0, q_2 \to Q_6, q_3 \to Q_5, q_4 \to Q_{11}\}$ , for  $L_0 = \{g_0, g_1\}$ ,  $dist_{cnot}(g_0) = 7$ ,  $dist_{cnot}(g_1) = 7$ . Gate  $g_1$  can be executed directly under the  $\tau_i$  mapping, so it is directly deleted from  $L_0$ . But  $g_0$  cannot be executed under the mapping  $\tau_i$ , so circuit transformation is required. First, we need to calculate the field, and add the edge of the shortest path of the gate that cannot be executed in  $L_0$  to the number of occurrences of the same edge of  $E_w$  as the weight of the edge. Now the gate that cannot be executed in  $L_0$  is  $g_0$ . Its shortest path is  $T = \{\{Q_6 \to Q_1 \to Q_0\}, \{Q_6 \to Q_5 \to Q_0\}\}$   $E_w = \{\langle Q_6, Q_1 \rangle, \langle Q_1, Q_0 \rangle, \langle Q_6, Q_5 \rangle, \langle Q_5, Q_0 \rangle, \}$ , and then we traverse the shortest path to calculate the path with the highest weight,  $w_{\pi_0} = 2$ ,  $w_{\pi_0} = 2$ . The two weights are equal, so we take the former.

Then the edges on the swap path are added to the candidate set, so the current candidate set is  $\{SWAP(Q_6, Q_1), SWAP(Q_1, Q_0)\}$ .

The circuit mapping algorithm based on Tabu Search takes a layered circuit and an initial mapping as input, and outputs a circuit that can be executed on the specified architecture graph. Algorithm 3 performs a Tabu Search on the gates of each layer of the layered circuit, and obtains the transformed circuit of each layer. The transformed circuit mapping of each layer is used as the initial mapping of the circuit of the next layer. Lines 2 and 3 regard the initial mapping as the best mapping and the current mapping. Lines 5 to 17 loop to check whether the current mapping satisfies the execution of all gates of the current layer. If it is not satisfied or the number of iterations has not reached the set maximum number, the loop will continue, otherwise the program will terminate and the circuit is not found in the physical device on the way of execution. Line 6 get the neighborhood fields of the current mapping, and line 7 find the best mapping in the candidate set. The mapping will first remove the overlapping elements of the candidate set and the Tabu table, and then from the remaining candidates choose the mapping with the lowest cost. Lines 8 to 13 are the amnesty rules. When the best candidate is not found, the candidate set elements are all the same as the Tabu table elements, then the amnesty rule needs to be executed to find the candidate elements. An optimum mapping, as the next initial mapping, select the candidate with the lowest cost in the candidate set. Lines 14-16 update the best mapping and the current mapping with the selected mapping, and add the SWAPs operation performed by the best mapping to the Tabu table, indicating that this SWAPs has just been performed, and the algorithm should try to avoid re-swap the just swapped qubits. Then it will judge whether the program stop condition is reached. The program stopping condition determine whether the number of iterations has reached the maximum number, or the current mapping satisfies the execution of all gates in the current layer. If the stop condition not satisfied, continue to loop.

Example 5. We continue the previous example. Tabu Search requires an initial solution, and then searches based on this solution. Before searching, we need to get a series of initial candidate SWAP sets and select the one with the lower evaluation score as the initial solution. For  $L_0 = \{g_0, g_1\}$ , the initial candidate set is  $\{SWAP(Q_6, Q_1), SWAP(Q_1, Q_0)\}$ .  $cost(SWAP(Q_6, Q_1)) = 3.0$ ,  $cost(SWAP(Q_1, Q_0)) = 3.0$ . The algorithm will choose to swap  $Q_6$  and  $Q_1$ , at this time the mapping becomes  $\tau_i = \{q_0 \to Q_{10}, q_1 \to Q_0, q_2 \to Q_1, q_3 \to Q_5, q_4 \to Q_{11}\}$ . The Tabu Search loops to determine whether it has reached the stop condition, when the current iteration number reaches the limits, or the current mapping satisfies the execution of all gates in  $L_0$ . It can be seen that the current mapping has satisfied the execution of  $g_0$ , thus the search of the current layer is over, and the Tabu Search of the next layer is continued.

**Evaluation function design** The goal of this paper is mainly to consider the depth or the size of the generated circuit, because the current quantum error

## Algorithm 2: Calculate the neighborhood fields

```
Input: p: The current circuit's solution
   dist: The shortest paths of physical architecture
   qubits: The mapping from physical qubits to logical qubits
   locations: The mapping from logical qubits to physical qubits
   cl: Gates included in the current layer of circuits
   nl: Gates included in the next layer of circuits
   Output: results: The set of candidate solution
 1 Initialize results \leftarrow \emptyset;
 2 E_w \leftarrow \text{Calculate the weight of each edge}
 з for l \in cl do
        l_1 \leftarrow locations[l.control];
 4
        l_2 \leftarrow locations[l.target];
 5
        distance \leftarrow dist[l_1][l_2];
 6
        ml \leftarrow 0; sum \leftarrow 0 ; path\_index \leftarrow -1;
 7
        for path \in distance.paths do
 8
             for edge \in path do
                 if E_w.contains(edge) then
10
                     e \leftarrow E_w.findEqualWithEdge(edge);
11
                      sum \leftarrow sum + e.weight;
12
                 end
13
14
             end
             if ml < sum then
15
                 ml \leftarrow sum;
16
                 path\_index \leftarrow l.index;
17
             end
18
19
        if path\_index>=0 then
20
            j \leftarrow 1;
21
             while j < distance.paths/path\_index/.length do
22
                 newQubits \leftarrow qubits;
23
24
                 newLocations \leftarrow locations;
25
                 q_1 \leftarrow newQubits[distance.paths[path\_index]][j].source;
                 q_2 \leftarrow newQubits[distance.paths[path\_index]][j].target;
26
                 if q_1 \neq -1 then
27
                    newLocations[q_1] \leftarrow q_2;
28
                 end
29
                 if q_2 \neq -1 then
30
                     newLocations[q_2] \leftarrow q_1;
31
                 end
32
33
                 s \leftarrow \emptyset;
                 s.swaps \leftarrow p.swaps \cup \{distance.paths[path\_index][j]\};
34
                 s.value \leftarrow computeEvaluateValue(dist, newLocations, cl, p);
35
                 results \leftarrow results \cup s;
36
37
                 j \leftarrow j + 1;
             end
38
        end
39
        return results;
40
41 end
```

#### Algorithm 3: Tabu Search

```
Input: iniS: The initial state
   tl: Tabu list
   Output: bestS: The final state and SWAPs
 1 Initialize bestS \leftarrow iniS;
 2 \ currS \leftarrow iniS;
 siter \leftarrow 1;
   while not \ mustStop(++iter, \ bestS) do
 4
                                        /* candidate set */
        C \leftarrow currS.neighbors;
 5
        bestN \leftarrow findBestNeighbor(C, tl);
 6
        if bestN is empty then
            if C = NULL then
 8
                break:
 9
            end
10
            bestN \leftarrow findAmnestyNeighbor(C, tl);
11
        end
12
        bestS \leftarrow bestN:
13
        currS \leftarrow bestN:
14
        tl \leftarrow tl \cup \{currS\}
16 end
17 return best S:
```

correction cannot be applied in practice. The longer the quantum execution time, the greater the error introduced, so we want to shorten the life cycle of qubits as much as possible. And in the algorithm based on Tabu Search, there is a Tabu list naturally, which just satisfies our needs. This paper tested two evaluation functions, one using the depth of the generated circuit as the evaluation criterion 5, and the other using the number of gates in the generated circuit as the evaluation criterion 4.

$$cost_{L_i}(SWAP(Q_i, Q_j)) = \sum_{g \in L} (dist[g.control][g.target])$$
 (2)

$$cost_{L_i}(SWAP(Q_i, Q_j)) = Depth(L_i)$$
(3)

 $cost_h(SWAP(Q_i,Q_j))$  represents the cost of executing all gates of the current layer after swapping  $Q_i$ ,  $Q_j$ . We only calculate the depth of the unmapped gates of the after the SWAP operation as in the equation (5) or the distance between the unmapped gates as in the equation (4).

**Look ahead** Since the number of qubits used in current quantum circuits is small, the number of gates in each layer after layering is small. If we only consider the gates of one layer when choosing the swap scheme, the swap scheme selected by the algorithm only satisfies the requirement of the gate execution of the i-th layer, but the output of the i-th layer is used as the input of the (i+1)-th (less than the total number of layers of the circuit) layer. Note that

the swap scheme of the i-th layer will affect the mapping of the (i+1)-th layer, so we add the circuit of the (i+d)-th layer into the consideration. However, in terms of priority, it is necessary to give priority to the execution of the gate set of the i-th layer, so we introduce an attenuation factor  $\delta$ , which controls the influence of the (i+d)-th layer gate set on the circuit swap of the i-th layer. Each time the algorithm is swapped, the gates of the latter layer or several layers are considered together. Experiments show that for d=2,  $\delta=0.9$ , the final effect is the best. Our evaluation function can be rewritten as

$$cost_h(SWAP(Q_i, Q_j)) = \sum_{g \in L_i} (dist[g.control][g.target]) + \delta \times \sum_{j=i}^{i+d} \sum_{g \in L_j} (dist[g.control][g.target])$$

$$(4)$$

$$cost_h(SWAP(Q_i, Q_j)) = Depth(L_i) + \delta \times Depth(\sum_{j=i}^{i+d} L_j).$$
 (5)

Complexity Suppose the given logical circuit structure diagram is  $\mathcal{AG}_{\mathcal{L}} = (V_L, E_L)$ , the physical circuit structure diagram is  $\mathcal{AG}_{\mathcal{P}} = (V_P, E_P)$ , the initial mapping is  $\tau$ , the depth of the circuit is d, the number of qubits is  $V_L$ , TS searches each time the 2-qubit gates on the layer, and performs TS searches at most d times. Starting from the initial mapping, first delete the executable gates of the first layer under the initial mapping, and then add all the shortest paths of the remaining k gates to the candidate set. The shortest path length is  $(|E_P|-1)$ , the candidate set size is  $(|E_P|-1)*k$ . Each SWAP will make the total distance between the gates smaller. In the worst case, the number of SWAPs is  $((|E_P|-1)*k)^{(|E_P|-2)*k}$ , but our selection strategy will make the number of SWAPs significantly reduced. Our time complexity is  $d*((|E_P|-1)*k)^{(|E_P|-2)*k}$ , and the space complexity is the size of our candidate set  $(E_P-1)*k$ .

#### 5 Experiment

The experiment in this paper is performed on a 2.3GHz Linux machine with a memory of 64G. This paper compares CSI algorithm and circuit transformation algorithm based on Tabu Search with the wghtgr in [10] and the heuristic algorithm  $A^*$  in [26].

First, we compared the efficiency of initial mapping on  $\tau_{optm}$  [26],  $\tau_{wghtgraph}$  [10] and  $\tau_{GQLTS}$ . In order to observe the results of these two initial mapping algorithms intuitively, we used the same circuit transformation  $A^*$  algorithm to compare the initial mapping algorithms [26]. We tested 159 circuits. Experiments show that within five minutes  $\tau_{optm}$  can deal with 121 circuits,  $\tau_{wghtgraph}$  can deal with 106 circuits,  $\tau_{GQLTS}$  can deal with 131 circuits. Among them, there can deal with 103 circuits. Comparing  $\tau_{wghtgraph}$  algorithm and  $\tau_{GQLTS}$ 

algorithm, the  $\tau_{wghtgraph}$  algorithm has 21 circuits with fewer SWAPs and 19 circuits with a small depth, and the  $\tau_{GQLTS}$  algorithm has 54 circuits with fewer SWAPs and 60 circuits with a small depth, and they have 25 circuits with equal depth and 29 circuits with equal SWAPs. The SWAPs of the  $\tau_{GQLTS}$  algorithm is relatively reduced by 22.4418%, and the depth is reduced by 11.2482%. Comparing  $\tau_{optm}$  algorithm and  $\tau_{GQLTS}$  algorithm, the  $\tau_{optm}$  algorithm has 1 circuit with fewer SWAPs and 2 circuits with a small depth, and the  $\tau_{GQLTS}$  algorithm has 99 circuits with fewer SWAPs and 98 circuits with a small depth, and they have 4 circuits with equal depth and 4 circuits with equal SWAPs. The SWAPs of the  $\tau_{GQLTS}$  algorithm is relatively reduced by 27.0219%, and the depth is reduced by 14.1242%. As shown in Table 1, there are 104 circuits. Three initial mappings are compared with the depth of the generated circuits under the  $A^*$  algorithm and the number of SWAP gates added  $\tau_{GQLTS}/\tau_{optm}$  calculate the efficiency improvement of the former upon the latter, the formula is  $(n_{optm} - n_{GQLTS})/n_{optm}$ .

|       | $	au_{optm}$ | $	au_{wghtgraph}$ | $	au_{GQLTS}$ | $	au_{GQLTS}/	au_{optm}$ | $\tau_{GQLTS}/\tau_{wghtgraph}$ |
|-------|--------------|-------------------|---------------|--------------------------|---------------------------------|
| depth | 168895       | 163422            | 145040        | 14.1241%                 | 11.2482%                        |
| added | 20439        | 19232             | 14916         | 27.0219%                 | 22.4418%                        |

**Table 1.** Compare  $\tau_{optm}$ ,  $\tau_{wqhtqraph}$ , and  $\tau_{GQLTS}$ 

We compared the use of two indicators (GQLTS<sub>dep</sub> and GQLTS<sub>num</sub>) that prioritize smaller depth and fewer auxiliary gates. The two indicators were used as objective functions, and 159 circuits were tested. The depth of the final circuit obtained by GQLTS<sub>num</sub> is 1.93% smaller than GQLTS<sub>dep</sub> on average, and the number of auxiliary gates added is 4.53% smaller on average. Adding a SWAP gate, the circuit needs to add three CNOT gates, and the depth will be increased by 3. While the number of SWAP gates added is small, the circuit depth is also reduced accordingly. Thus we use SWAP quantity first to give better results.

Finally, we compared the use of GQLTS Search and wgtgraph algorithm. Since the wgtgraph algorithm only uses 2-qubit gates, it is impossible to compare the depth of the generated circuit, So we compared the number of SWAP gates added and compared the time. Since large circuits may not be successfunlly handled for a long time, we consider it meaningless. This paper sets a five-minute timeout period and tested 159 circuits. GQLTS<sub>num</sub> Search only takes 461 seconds, GQLTS<sub>dep</sub> takes 485 seconds, and wgtgraph run 159 circuits in 1908 seconds, but only 98 files get results, 64 of them there are 66 circuits for small circuits to get results, 49 medium circuits only have 35 circuits for results, and no circuit output in 44 large circuits. Although Tabu Search can quickly produce results on large circuits, in contrast, more auxiliary gates are added. In 98 small and medium-sized circuits with the results obtained by wgtgraph, the number of SWAP gates added by wgtgraph is 26.87% less than GQLTS<sub>num</sub> on average, and the number of SWAP gates added by wgtgraph is 24.89% less

#### F. Author et al.

than  $\mathrm{GQLTS}_{dep}$  on average. Tabu Search can quickly output converted circuits on large circuits, but wgtgraph cannot get results in a short time. The detailed results of the circuit comparisons are in the appendix.

| benchmarks    | #circ. | $GQLTS_{num}$ |      | $GQLTS_{dep}$ |      | wgtgraph |      | SABRE  |      |
|---------------|--------|---------------|------|---------------|------|----------|------|--------|------|
| Deliciillarks |        | #succ.        | time | #succ.        | time | #succ.   | time | #succ. | time |
| small         | 66     | 66            | 461  | 66            | 485  | 64       | 1908 |        |      |
| medium        | 49     | 49            | 461  | 49            | 485  | 35       | 1908 |        |      |
| large         | 44     | 44            | 461  | 44            | 485  | 0        | 1908 |        |      |
| total         | 159    | 159           | 461  | 159           | 485  | 98       | 1908 |        |      |

**Table 2.** Compare  $\tau_{optm}$ ,  $\tau_{wghtgraph}$ , and  $\tau_{GQLTS}$ 

#### 6 Conclusion

This paper proposes a heuristic SWAP method GQLTS based on Tabu Search to overcome the shortcomings of previous works, and proposes CSI algorithm to generate high-quality initial mapping. Experimental results show that the initial mapping generated by GQLTS greatly reduces the number of SWAP gates inserted, and achieves multiple optimization goals in the search phase, and results can be obtained in a short time. Most small and medium-sized circuits can be obtained in a few seconds. The result can be obtained within a few minutes even for a large circuit, but the cost of insertion may be equal to or more than wghtgr. We introduce a lookahead plan to make each selected SWAP more in line with the constraints of the back gates. In future work, we will focus on reducing the number of auxiliary gates inserted as much as possible on the basis of increasing speed. In theory, our method is applicable to all NISQ devices, but further experiments are needed. Since our analog circuit ignores the noise generated by the circuit, we may introduce quantum noise to the circuits.

# A Experimental details of the SWAP gates added by the output circuit

| Circuit                               | aubit  | CNOT              | $GQLTS_{num}$                          | COLTS | optm  | wghtgr                                     |
|---------------------------------------|--------|-------------------|----------------------------------------|-------|-------|--------------------------------------------|
| name                                  | no.    | no.               | added                                  | added | added | added                                      |
| decod24-enable 126                    | 6      | 149               | 28                                     | 42    | 60    | 16                                         |
| 4mod5-v0_19                           | 5      | 16                | 0                                      | 0     | 0     | 0                                          |
| 4mod5-v0_18                           | 5      | 31                | $\frac{0}{2}$                          | 5     | 4     | $\begin{bmatrix} 0 \\ 4 \end{bmatrix}$     |
| mod5d2 64                             | 5      | 25                | 5                                      | 6     | 8     | 3                                          |
| 4gt4-v0_72                            | 6      | $\frac{20}{113}$  | 14                                     | 10    | 33    | $\begin{vmatrix} 3 \\ 14 \end{vmatrix}$    |
| alu-v3_35                             | 5      | 18                | 2                                      | 4     | 8     | 2                                          |
| 4gt4-v0_73                            | 6      | 179               | 27                                     | 34    | 76    | $\begin{bmatrix} 2 \\ 12 \end{bmatrix}$    |
| alu-v3_34                             | 5      | 24                | 2                                      | 3     | 70    | $\begin{bmatrix} 12 \\ 2 \end{bmatrix}$    |
| 3_17_13                               | 3      | $\frac{24}{17}$   | $\begin{bmatrix} 2 \\ 0 \end{bmatrix}$ | 0     | 6     | $\begin{bmatrix} 2 \\ 0 \end{bmatrix}$     |
| 4gt4-v0 78                            | 6      | 109               | 12                                     | 8     | 48    | $\begin{bmatrix} 0 \\ 4 \end{bmatrix}$     |
| 4gt4-v0_78<br>4gt4-v0_79              | 6      | 109               | 17                                     | 17    | 48    | $\begin{vmatrix} 4 \\ 3 \end{vmatrix}$     |
| 4gt4-v0_79<br>4mod7-v1_96             | 5      | 72                | 16                                     | 19    | 27    | $\begin{bmatrix} 3 \\ 7 \end{bmatrix}$     |
| mod10_171                             | 5      | 108               | 17                                     | 20    | 39    | $\begin{bmatrix} & ' \\ 9 & \end{bmatrix}$ |
| $\frac{10010_{171}}{\text{ex}2_{27}}$ | 7      | $\frac{108}{275}$ | 48                                     | 59    | 121   |                                            |
|                                       | 5      |                   |                                        |       |       | 33                                         |
| mod10_176<br>0410184_169              | 5<br>5 | 78                | 14                                     | 14    | 38    | 8                                          |
|                                       | _      | 9                 | 2                                      | 2     | 49    | $\begin{bmatrix} 3 \\ 0 \end{bmatrix}$     |
| 4mod5-v0_20                           | 5      | 10                | 0                                      | 0     | 4     | $\begin{bmatrix} 0 \\ 7 \end{bmatrix}$     |
| aj-e11_165                            | 5      | 69                | 8                                      | 8     | 33    | 7                                          |
| alu-v1_28                             | 5      | 18                | 2                                      | 4     | 11    | $\begin{vmatrix} 2 \\ 2 \end{vmatrix}$     |
| 4gt12-v0_86                           | 6      | 116               | 28                                     | 33    | 48    | 3                                          |
| 4gt12-v0_87                           | 6      | 112               | 27                                     | 32    | 45    | $\begin{vmatrix} 2 \\ 4 \end{vmatrix}$     |
| 4gt12-v0_88                           | 6      | 86                | 5                                      | 5     | 25    | 4                                          |
| alu-v1_29                             | 5      | 17                | 4                                      | 4     | 11    | 2                                          |
| ham7_104                              | 7      | 149               | 28                                     | 34    | 68    | 12                                         |
| C17_204                               | 7      | 205               | 26                                     | 53    | 99    | 22                                         |
| xor5_254                              | 6      | 5                 | 0                                      | 0     | 1     | 0                                          |
| hwb4_49                               | 5      | 107               | 14                                     | 15    | 38    | 11                                         |
| rd73_140                              | 10     | 104               | 23                                     | 26    | 35    | 20                                         |
| decod24-v0_38                         | 4      | 23                | 0                                      | 0     | 6     | 0                                          |
| rd53_131                              | 7      | 200               | 39                                     | 39    | 98    | 24                                         |
| rd53_133                              | 7      | 256               | 37                                     | 47    | 102   | 27                                         |
| rd53_135                              | 7      | 134               | 28                                     | 29    | 38    | 23                                         |
| decod24-v2_43                         | 4      | 22                | 0                                      | 0     | 9     | 0                                          |
| rd53_138                              | 8      | 60                | 14                                     | 16    | 23    | 9                                          |
| rd32-v0_66                            | 4      | 16                | 0                                      | 0     | 6     | 0                                          |
| 4gt13-v1_93                           | 5      | 30                | 0                                      | 0     | 13    | 0                                          |
| graycode6_47                          | 6      | 5                 | 0                                      | 0     | 0     | 0                                          |
| 4mod5-bdd_287                         | 7      | 31                | 3                                      | 6     | 8     | 6                                          |
| ham3_102                              | 3      | 11                | 0                                      | 0     | 3     | 0                                          |
| 4gt4-v0_80                            | 6      | 79                | 5                                      | 5     | 22    | 5                                          |
| ex-1_166                              | 3      | 9                 | 0                                      | 0     | 3     | 0                                          |
| mod5mils_65                           | 5      | 16                | 0                                      | 0     | 6     | 0                                          |
| 0example                              | 5      | 9                 | 1                                      | 2     | 3     | 3                                          |
| alu-v4_36                             | 5      | 51                | 12                                     | 8     | 22    | 4                                          |
| alu-v4_37                             | 5      | 18                | 2                                      | 4     | 8     | 2                                          |
| ex1_226                               | 6      | 5                 | 0                                      | 0     | 1     | 0                                          |
| one-two-three-v0_98                   | 5      | 65                | 11                                     | 13    | 32    | 10                                         |
| one-two-three-v0_97                   | 5      | 128               | 23                                     | 23    | 64    | 16                                         |
| one-two-three-v3_101                  | 5      | 32                | 3                                      | 4     | 14    | 3                                          |
| rd32_270                              | 5      | 36                | 3                                      | 3     | 6     | 6                                          |

 ${\bf Table~3.}$  Comparison of the number of SWAP gates added by the output circuit on the IBM Q20

|                      |       |      |               |               | •     |        |
|----------------------|-------|------|---------------|---------------|-------|--------|
| Circuit              | qubit | CNOT | $GQLTS_{num}$ | $GQLTS_{dep}$ | _     | wghtgr |
| name                 | no.   | no.  | added         | added         | added |        |
| rd53_130             | 7     | 448  | 89            | 100           | 190   | 49     |
| rd53_251             | 8     | 564  | 104           | 131           | 230   | 45     |
| 4mod5-v1_24          | 5     | 16   | 0             | 0             | 3     | 0      |
| $mod5adder\_127$     | 6     | 239  | 21            | 56            | 111   | 20     |
| 4_49_16              | 5     | 99   | 20            | 17            | 40    | 10     |
| hwb5_53              | 6     | 598  | 141           | 168           | 173   | 59     |
| ex3_229              | 6     | 175  | 10            | 9             | 50    | 11     |
| 4gt10-v1_81          | 5     | 66   | 14            | 15            | 28    | 6      |
| alu-v2_32            | 5     | 72   | 15            | 17            | 27    | 7      |
| alu-v2_31            | 5     | 198  | 42            | 54            | 85    | 13     |
| alu-v2_30            | 6     | 223  | 41            | 45            | 96    | 20     |
| sf_276               | 6     | 336  | 12            | 52            | 138   | 12     |
| decod24-v1_41        | 5     | 38   | 4             | 4             | 14    | 3      |
| sf_274               | 6     | 336  | 34            | 21            | 82    | 12     |
| 4gt4-v1_74           | 6     | 119  | 17            | 24            | 37    | 9      |
| alu-v2_33            | 5     | 17   | 4             | 4             | 8     | 2      |
| cnt3-5_179           | 16    | 85   | 6             | 6             | 35    | 4      |
| 4mod5-v1_22          | 5     | 11   | 0             | 0             | 5     | 0      |
| 4mod5-v1_23          | 5     | 32   | 5             | 5             | 4     | 3      |
| mini_alu_305         | 10    | 77   | 10            | 20            | 28    | 8      |
| alu-v0_26            | 5     | 38   | 7             | 10            | 13    | 3      |
| alu-bdd_288          | 7     | 38   | 4             | 12            | 16    | 6      |
| alu-v0_27            | 5     | 17   | 2             | 4             | 11    | 2      |
| 4gt13_91             | 5     | 49   | 7             | 7             | 10    | 2      |
| 4gt5_77              | 5     | 58   | 12            | 12            | 20    | 6      |
| 4gt13_92             | 5     | 30   | 0             | 0             | 14    | 0      |
| 4gt5_76              | 5     | 46   | 7             | 10            | 24    | 5      |
| 4gt5_75              | 5     | 38   | 5             | 12            | 16    | 4      |
| 4gt12-v1_89          | 6     | 100  | 11            | 21            | 38    | 4      |
| one-two-three-v1_99  | 5     | 59   | 12            | 10            | 26    | 7      |
| 4gt13_90             | 5     | 53   | 7             | 7             | 13    | 3      |
| ising_model_10       | 10    | 90   | 0             | 0             | 5     | 0      |
| 4gt11_84             | 5     | 9    | 0             | 0             | 3     | 0      |
| 4gt11_83             | 5     | 14   | 0             | 0             | 0     | 0      |
| $mod5d1\_63$         | 5     | 13   | 0             | 0             | 1     | 0      |
| 4gt11_82             | 5     | 18   | 1             | 1             | 1     | 1      |
| $decod24-v3\_45$     | 5     | 64   | 15            | 15            | 32    | 8      |
| rd32-v1_68           | 4     | 16   | 0             | 0             | 6     | 0      |
| mini-alu_167         | 5     | 126  | 27            | 27            | 49    | 11     |
| one-two-three-v2_100 | 5     | 32   | 3             | 4             | 8     | 3      |
| 4mod7-v0_94          | 5     | 72   | 8             | 13            | 36    | 9      |
| $cm82a\_208$         | 8     | 283  | 41            | 69            | 84    | 33     |
| mod8-10_178          | 6     | 152  | 5             | 20            | 13    | 7      |
| mod8-10_177          | 6     | 196  | 14            | 33            | 58    | 13     |
| majority_239         | 7     | 267  | 39            | 43            | 105   | 33     |
| miller_11            | 3     | 23   | 0             | 0             | 9     | 0      |
| $decod24-bdd\_294$   | 6     | 32   | 4             | 4             | 9     | 4      |
| total                | 551   | 9244 | 1372          | 1738          | 3481  | 800    |

**Table 4.** Comparison of the number of SWAP gates added by the output circuit on the IBM  $\mathrm{Q}20$ 

| Circuit                    | auhit | CNOT   | $\overline{\mathrm{GQLTS}_{num}}$ | COLTS.        | ontm  | wghtgr |
|----------------------------|-------|--------|-----------------------------------|---------------|-------|--------|
|                            |       |        |                                   |               | added |        |
| name<br>max46_240          | no.   | no.    | added<br>3473                     | added<br>4545 | added | added  |
|                            | 10    | !      |                                   |               | -     | -      |
| rd73_252                   | 10    | 2319   | 586<br>919                        | 761<br>1216   | 061   | -      |
| cycle10_2_110              | 12    | 2648   |                                   |               | 961   | -      |
| sqrt8_260                  | 12    | 1314   | 379                               | 492           | 457   | -      |
| urf4_187                   | 11    | 224028 | 54785                             | 60140         | -     | -      |
| sqn_258                    | 10    | 4459   | 1199                              | 1420          | - 010 | -      |
| f2_232                     | 8     | 525    | 87                                | 124           | 218   | -      |
| radd_250                   | 13    | 1405   | 386                               | 489           | 511   | -      |
| ham15_107                  | 15    | 3858   | 1326                              | 1689          | -     | -      |
| sao2_257                   | 14    | 16864  | 5346                              | 7178          | -     | -      |
| sym9_148                   | 10    | 9408   | 1865                              | 2432          | -     | -      |
| urf5_280                   | 9     | 23764  | 6989                              | 8730          | -     | -      |
| square_root_7              | 15    | 3089   | 812                               | 2150          | -     | -      |
| sys6-v0_111                | 10    | 98     | 23                                | 26            | 38    | -      |
| hwb7_59                    | 8     | 10681  | 2687                              | 3551          | 3722  | -      |
| sym9_146                   | 12    | 148    | 38                                | 55            | 54    | -      |
| wim_266                    | 11    | 427    | 93                                | 120           | 147   | -      |
| urf2_152                   | 8     | 35210  | 9181                              | 11921         | 10577 | -      |
| urf5_159                   | 9     | 71932  | 20258                             | 25505         | -     | -      |
| urf2_277                   | 8     | 10066  | 2807                              | 3798          | 3782  | -      |
| life_238                   | 11    | 9800   | 2762                              | 3576          | -     | -      |
| root_255                   | 13    | 7493   | 2128                              | 3035          | -     | -      |
| 9symml_195                 | 11    | 15232  | 4553                              | 5986          | -     | -      |
| sym10_262                  | 12    | 28084  | 8534                              | 11033         | -     | -      |
| dc1_220                    | 11    | 833    | 226                               | 207           | 371   | -      |
| cm42a_207                  | 14    | 771    | 182                               | 229           | 294   | -      |
| rd53_311                   | 13    | 124    | 26                                | 48            | 47    | -      |
| dc2_222                    | 15    | 4131   | 1383                              | 1773          | -     | -      |
| rd84_142                   | 15    | 154    | 49                                | 58            | 50    | -      |
| sym6_145                   | 7     | 1701   | 317                               | 449           | 750   | -      |
| co14_215                   | 15    | 7840   | 3078                              | 3819          | -     | -      |
| cnt3-5_180                 | 16    | 215    | 59                                | 74            | 79    | -      |
| cm152a_212                 | 12    | 532    | 103                               | 129           | 168   | -      |
| sym6_316                   | 14    | 123    | 30                                | 39            | 56    | -      |
| mlp4_245                   | 16    | 8232   | 2780                              | 3490          | -     | -      |
| hwb8_113                   | 9     | 30372  | 10749                             | 16489         | -     | -      |
| qft_16                     | 16    | 240    | 90                                | 147           | -     | -      |
| plus63mod4096_163          | 13    | 56329  | 19759                             | 24273         | -     | -      |
| urf1_149                   | 9     | 80878  | 22551                             | 28516         | -     | -      |
| urf3_155                   | 10    | 185276 | 50842                             | 62903         | -     | -      |
| urf3_279                   | 10    | 60380  | 17999                             | 23318         | -     | -      |
| hwb9_119                   | 10    | 90955  | 22946                             | 30031         | -     | -      |
| plus63mod8192_164          | 14    | 81865  | 28022                             | 36207         | -     | -      |
| pm1_249                    | 14    | 771    | 182                               | 229           | 294   | -      |
| sym9_193                   | 11    | 15232  | 4382                              | 5518          | -     | -      |
| misex1_241                 | 15    | 2100   | 480                               | 754           | 600   | -      |
| urf1_278                   | 9     | 26692  | 8010                              | 10217         | -     | -      |
| $squar5\_261$              | 13    | 869    | 219                               | 313           | 290   | -      |
| ground_state_estimation_10 | 13    | 154209 | 11671                             | 22886         | -     | -      |
| adr4197                    | 13    | 1498   | 516                               | 670           |       | -      |

 ${\bf Table~5.}$  Comparison of the number of SWAP gates added by the output circuit on the IBM Q20

## F. Author et al.

| Circuit     | qubit | CNOT  | $GQLTS_{num}$ | $\overline{\mathrm{GQLTS}_{dep}}$ | optm  | wghtgr |
|-------------|-------|-------|---------------|-----------------------------------|-------|--------|
| name        | no.   | no.   | added         | added                             | added | added  |
| hwb6_56     | 7     | 2952  | 698           | 933                               | 909   | -      |
| clip_206    | 14    | 14772 | 5430          | 6865                              | -     | -      |
| $cm85a_209$ | 14    | 4986  | 2088          | 2225                              | -     | -      |
| $rd84\_253$ | 12    | 5960  | 1849          | 2333                              | -     | -      |
| dist_223    | 13    | 16624 | 5623          | 7431                              | _     | -      |
| $inc_237$   | 16    | 4636  | 1193          | 1667                              | _     | -      |
| qft_10      | 10    | 90    | 23            | 34                                | 30    | -      |
| urf6_160    | 15    | 75180 | 27524         | 32452                             | -     | -      |
| con1_216    | 9     | 415   | 86            | 118                               | 177   | -      |

**Table 6.** Comparison of the number of SWAP gates added by the output circuit on the IBM  $\mathrm{Q}20$ 

## B Experimental details of the depth of the output circuit

| Circuit                  | qubit | CNOT  | denths | $\overline{\mathrm{GQLTS}_{num}}$ | $\overline{\text{GQLTS}_{dep}}$ | optm   |
|--------------------------|-------|-------|--------|-----------------------------------|---------------------------------|--------|
| name                     | no.   | no.   | no.    | depths                            | depths                          | depths |
| decod24-enable_126       | 6     | 149   | 190    | 233                               | 275                             | 470    |
| 4mod5-v0_19              | 5     | 16    | 21     | 16                                | 16                              | 21     |
| 4mod5-v0_18              | 5     | 31    | 40     | 37                                | 46                              | 54     |
| mod5d2_64                | 5     | 25    | 32     | 40                                | 43                              | 67     |
| 4gt4-v0_72               | 6     | 113   | 137    | 155                               | 143                             | 297    |
| alu-v3_35                | 5     | 18    | 22     | 24                                | 30                              | 60     |
| 4gt4-v0_73               | 6     | 179   | 227    | 260                               | 281                             | 586    |
| alu-v3 34                | 5     | 24    | 30     | 30                                | 33                              | 63     |
| 3_17_13                  | 3     | 17    | 22     | 17                                | 17                              | 52     |
| 4gt4-v0_78               | 6     | 109   | 137    | 145                               | 133                             | 352    |
| 4gt4-v0_79               | 6     | 105   | 132    | 156                               | 156                             | 345    |
| 4mod7-v1_96              | 5     | 72    | 94     | 120                               | 129                             | 218    |
| mod10_171                | 5     | 108   | 139    | 159                               | 168                             | 335    |
| ex2 227                  | 7     | 275   | 355    | 419                               | 452                             | 899    |
| mod10_176                | 5     | 78    | 101    | 120                               | 120                             | 274    |
| cycle10_2_110            | 12    | 2648  | 3386   | 5405                              | 6296                            | 7467   |
| 0410184_169              | 5     | 9     | 6      | 15                                | 15                              | 253    |
| $4 \mod 5 - v0 _20$      | 5     | 10    | 12     | 10                                | 10                              | 32     |
| sqrt8 260                | 12    | 1314  | 1661   | 2451                              | 2790                            | 3561   |
| aj-e11_165               | 5     | 69    | 86     | 93                                | 93                              | 250    |
| alu-v1_28                | 5     | 18    | 22     | 24                                | 30                              | 70     |
| f2_232                   | 8     | 525   | 668    | 786                               | 897                             | 1672   |
| $radd_250$               | 13    | 1405  | 1781   | 2563                              | 2872                            | 3985   |
| 4gt12-v0_86              | 6     | 116   | 135    | 200                               | 215                             | 334    |
| 4gt12-v0_87              | 6     | 112   | 131    | 193                               | 208                             | 324    |
| 4gt12-v0_88              | 6     | 86    | 108    | 101                               | 101                             | 222    |
| alu-v1_29                | 5     | 17    | 22     | 29                                | 29                              | 64     |
| $\mathrm{ham}7\_104$     | 7     | 149   | 185    | 233                               | 251                             | 491    |
| C17_204                  | 7     | 205   | 253    | 283                               | 364                             | 688    |
| $xor5\_254$              | 6     | 5     | 5      | 5                                 | 5                               | 10     |
| hwb4_49                  | 5     | 107   | 134    | 149                               | 152                             | 308    |
| rd73_140                 | 10    | 104   | 92     | 173                               | 182                             | 185    |
| $decod24-v0\_38$         | 4     | 23    | 30     | 23                                | 23                              | 61     |
| rd53_131                 | 7     | 200   | 261    | 317                               | 317                             | 677    |
| rd53_133                 | 7     | 256   | 327    | 367                               | 397                             | 777    |
| $rd53\_135$              | 7     | 134   | 159    | 218                               | 221                             | 331    |
| sys6-v0_111              | 10    | 98    | 75     | 167                               | 176                             | 188    |
| $decod24-v2\_43$         | 4     | 22    | 30     | 22                                | 22                              | 75     |
| hwb7_59                  | 8     | 10681 | 13437  | 18742                             | 21334                           | 29601  |
| rd53_138                 | 8     | 60    | 56     | 102                               | 108                             | 114    |
| rd32-v0_66               | 4     | 16    | 20     | 16                                | 16                              | 51     |
| sym9_146                 | 12    | 148   | 127    | 262                               | 313                             | 309    |
| 4gt13-v1_93              | 5     | 30    | 39     | 30                                | 30                              | 102    |
| $graycode6\_47$          | 6     | 5     | 5      | 5                                 | 5                               | 5      |
| wim_266                  | 11    | 427   | 514    | 706                               | 787                             | 1180   |
| urf2_152                 | 8     | 35210 | 44100  | 62753                             | 70973                           | 90299  |
| urf2_277                 | 8     | 10066 | 11390  | 18487                             | 21460                           | 26548  |
| $4 \mod 5$ - $ \mod 287$ | 7     | 31    | 41     | 40                                | 49                              | 71     |
| ham3_102                 | 3     | 11    | 13     | 11                                | 11                              | 28     |
| 4gt4-v0_80               | 6     | 79    | 101    | 94                                | 94                              | 206    |

Table 7. Comparison of the depth of the output circuit on the IBM  $\mathrm{Q}20$ 

| G: :                 | 1.4 | CNOT | 1 (1 | COLTEC        | COLTE  |        |
|----------------------|-----|------|------|---------------|--------|--------|
| Circuit              | _   |      | _    | $GQLTS_{num}$ |        | optm   |
| name                 | no. | no.  | no.  | depths        | depths | depths |
| ex-1_166             | 3   | 9    | 12   | 9             | 9      | 28     |
| mod5mils_65          | 5   | 16   | 21   | 16            | 16     | 52     |
| 0example             | 5   | 9    | 6    | 12            | 15     | 15     |
| alu-v4_36            | 5   | 51   | 66   | 87            | 75     | 170    |
| alu-v4_37            | 5   | 18   | 22   | 24            | 30     | 60     |
| ex1_226              | 6   | 5    | 5    | 5             | 5      | 10     |
| one-two-three-v0_98  | 5   | 65   | 82   | 98            | 104    | 234    |
| one-two-three-v0_97  | 5   | 128  | 163  | 197           | 197    | 443    |
| one-two-three-v3_101 | 5   | 32   | 40   | 41            | 44     | 95     |
| rd32_270             | 5   | 36   | 47   | 45            | 45     | 76     |
| dc1_220              | 11  | 833  | 1041 | 1511          | 1454   | 2711   |
| rd53_130             | 7   | 448  | 569  | 715           | 748    | 1417   |
| rd53_251             | 8   | 564  | 712  | 876           | 957    | 1767   |
| cm42a_207            | 14  | 771  | 940  | 1317          | 1458   | 2279   |
| rd53_311             | 13  | 124  | 130  | 202           | 268    | 300    |
| 4mod5-v1_24          | 5   | 16   | 21   | 16            | 16     | 36     |
| mod5adder_127        | 6   | 239  | 302  | 302           | 407    | 817    |
| 4_49_16              | 5   | 99   | 125  | 159           | 150    | 320    |
| hwb5_53              | 6   | 598  | 758  | 1021          | 1102   | 1560   |
| ex3_229              | 6   | 175  | 226  | 205           | 202    | 462    |
| rd84_142             | 15  | 154  | 110  | 301           | 328    | 253    |
| 4gt10-v1_81          | 5   | 66   | 84   | 108           | 111    | 210    |
| alu-v2_32            | 5   | 72   | 92   | 117           | 123    | 215    |
| alu-v2_31            | 5   | 198  | 255  | 324           | 360    | 650    |
| alu-v2_30            | 6   | 223  | 285  | 346           | 358    | 734    |
| sym6_145             | 7   | 1701 | 2187 | 2652          | 3048   | 5716   |
| sf_276               | 6   | 336  | 435  | 372           | 492    | 1096   |
| decod24-v1_41        | 5   | 38   | 50   | 50            | 50     | 120    |
| sf_274               | 6   | 336  | 436  | 438           | 399    | 822    |
| 4gt4-v1_74           | 6   | 119  | 154  | 170           | 191    | 329    |
| alu-v2_33            | 5   | 17   | 22   | 29            | 29     | 59     |
| cnt3-5_180           | 16  | 215  | 209  | 392           | 437    | 482    |
| cm152a_212           | 12  | 532  | 684  | 841           | 919    | 1423   |
| cnt3-5_179           | 16  | 85   | 61   | 103           | 103    | 166    |
| $sym6\_316$          | 14  | 123  | 135  | 213           | 240    | 378    |
| 4mod5-v1_22          | 5   | 11   | 12   | 11            | 11     | 37     |
| 4mod5-v1_23          | 5   | 32   | 41   | 47            | 47     | 55     |
| mini_alu_305         | 10  | 77   | 71   | 107           | 137    | 187    |
| alu-v0_26            | 5   | 38   | 49   | 59            | 68     | 108    |
| alu-bdd_288          | 7   | 38   | 48   | 50            | 74     | 112    |
| alu-v0_27            | 5   | 17   | 21   | 23            | 29     | 63     |
| 4gt13_91             | 5   | 49   | 61   | 70            | 70     | 108    |
| 4gt5_77              | 5   | 58   | 74   | 94            | 94     | 170    |
| 4gt13_92             | 5   | 30   | 38   | 30            | 30     | 103    |
| 4gt5_76              | 5   | 46   | 56   | 67            | 76     | 171    |
| 4gt5_75              | 5   | 38   | 47   | 53            | 74     | 127    |
| 4gt12-v1_89          | 6   | 100  | 130  | 133           | 163    | 313    |
| one-two-three-v1_99  | 5   | 59   | 76   | 95            | 89     | 194    |
| 4gt13_90             | 5   | 53   | 65   | 74            | 74     | 124    |
| pm1_249              | 14  | 771  | 940  | 1317          | 1458   | 2279   |

**Table 8.** Comparison of the depth of the output circuit on the IBM Q20

| Circuit              | qubit | CNOT  | depths | $\overline{\mathrm{GQLTS}_{num}}$ | $GQLTS_{dep}$ | optm   |
|----------------------|-------|-------|--------|-----------------------------------|---------------|--------|
| name                 | no.   | no.   | no.    | depths                            | depths        | depths |
| ising_model_10       | 10    | 90    | 52     | 90                                | 90            | 107    |
| $misex1\_241$        | 15    | 2100  | 2676   | 3540                              | 4362          | 5326   |
| 4gt11_84             | 5     | 9     | 11     | 9                                 | 9             | 25     |
| 4gt11_83             | 5     | 14    | 16     | 14                                | 14            | 16     |
| $mod5d1\_63$         | 5     | 13    | 13     | 13                                | 13            | 17     |
| 4gt11_82             | 5     | 18    | 20     | 21                                | 21            | 25     |
| $squar5\_261$        | 13    | 869   | 1051   | 1526                              | 1808          | 2309   |
| $decod24-v3\_45$     | 5     | 64    | 84     | 109                               | 109           | 244    |
| rd32-v1_68           | 4     | 16    | 21     | 16                                | 16            | 52     |
| $hwb6\_56$           | 7     | 2952  | 3736   | 5046                              | 5751          | 7773   |
| mini-alu_167         | 5     | 126   | 162    | 207                               | 207           | 400    |
| one-two-three-v2_100 | 5     | 32    | 40     | 41                                | 44            | 80     |
| 4mod7-v0_94          | 5     | 72    | 92     | 96                                | 111           | 270    |
| $cm82a\_208$         | 8     | 283   | 340    | 406                               | 490           | 699    |
| mod8-10_178          | 6     | 152   | 193    | 167                               | 212           | 243    |
| mod8-10_177          | 6     | 196   | 251    | 238                               | 295           | 525    |
| majority_239         | 7     | 267   | 344    | 384                               | 396           | 839    |
| qft_10               | 10    | 90    | 37     | 159                               | 192           | 135    |
| miller_11            | 3     | 23    | 29     | 23                                | 23            | 75     |
| $decod24-bdd\_294$   | 6     | 32    | 40     | 44                                | 44            | 86     |
| con1_216             | 9     | 415   | 508    | 673                               | 769           | 1197   |
| total                | 823   | 83416 | 103023 | 145372                            | 164848        | 224731 |

Table 9. Comparison of the depth of the output circuit on the IBM  $\mathrm{Q}20$ 

| Circuit                    | gubit | CNOT   | depths | $\overline{\mathrm{GQLTS}_{num}}$ | $\overline{\mathrm{GQLTS}_{den}}$ | optm   |
|----------------------------|-------|--------|--------|-----------------------------------|-----------------------------------|--------|
| name                       | no.   | no.    | no.    | depths                            | depths                            | depths |
| max46 240                  | 10    | 11844  | 14257  | 22263                             | 25479                             | -      |
| $rd73 \overline{252}$      | 10    | 2319   | 2867   | 4077                              | 4602                              | _      |
| urf4_187                   | 11    | 224028 | 264330 | 388383                            | 404448                            | _      |
| sqn_258                    | 10    | 4459   | 5458   | 8056                              | 8719                              | _      |
| ham15_107                  | 15    | 3858   | 4819   | 7836                              | 8925                              | _      |
| $sao2\_257$                | 14    | 16864  | 19563  | 32902                             | 38398                             | -      |
| sym9_148                   | 10    | 9408   | 12087  | 15003                             | 16704                             | -      |
| urf5_280                   | 9     | 23764  | 27822  | 44731                             | 49954                             | _      |
| square_root_7              | 15    | 3089   | 3847   | 5525                              | 9539                              | -      |
| urf5_159                   | 9     | 71932  | 89148  | 132706                            | 148447                            | -      |
| life_238                   | 11    | 9800   | 12511  | 18086                             | 20528                             | -      |
| $root\_255$                | 13    | 7493   | 8839   | 13877                             | 16598                             | -      |
| 9symml_195                 | 11    | 15232  | 19235  | 28891                             | 33190                             | -      |
| $sym10\_262$               | 12    | 28084  | 35572  | 53686                             | 61183                             | -      |
| $dc2\_222$                 | 15    | 4131   | 5242   | 8280                              | 9450                              | -      |
| co14_215                   | 15    | 7840   | 8570   | 17074                             | 19297                             | -      |
| $mlp4\_245$                | 16    | 8232   | 10328  | 16572                             | 18702                             | -      |
| hwb8_113                   | 9     | 30372  | 38717  | 62619                             | 79839                             | -      |
| qft_16                     | 16    | 240    | 61     | 510                               | 681                               | -      |
| plus63mod4096_163          | 13    | 56329  | 72246  | 115606                            | 129148                            | -      |
| urf1_149                   | 9     | 80878  | 99586  | 148531                            | 166426                            | -      |
| urf3_155                   | 10    | 185276 | 229365 | 337802                            | 373985                            | -      |
| urf3_279                   | 10    | 60380  | 70702  | 114377                            | 130334                            | -      |
| hwb9_119                   | 10    | 90955  | 116199 | 159793                            | 181048                            | -      |
| plus63mod8192_164          | 14    | 81865  | 105142 | 165931                            | 190486                            | -      |
| sym9_193                   | 11    | 15232  | 19235  | 28378                             | 31786                             | -      |
| ising_model_13             | 13    | 120    | 46     | 120                               | 120                               | -      |
| urf1_278                   | 9     | 26692  | 30955  | 50722                             | 57343                             | -      |
| ising_model_16             | 16    | 150    | 57     | 150                               | 150                               | -      |
| ground_state_estimation_10 | 13    | 154209 | 217236 | 189222                            | 222867                            | -      |
| adr4_197                   | 13    | 1498   | 1839   | 3046                              | 3508                              | -      |
| clip_206                   | 14    | 14772  | 17879  | 31062                             | 35367                             | -      |
| $cm85a\_209$               | 14    | 4986   | 6374   | 11250                             | 11661                             | -      |
| $rd84\_253$                | 12    | 5960   | 7261   | 11507                             | 12959                             | -      |
| $dist\_223$                | 13    | 16624  | 19694  | 33493                             | 38917                             | -      |
| $inc\_237$                 | 16    | 4636   | 5864   | 8215                              | 9637                              | -      |
| urf6160                    | 15    | 75180  | 93645  | 157752                            | 172536                            | -      |

Table 10. Comparison of the depth of the output circuit on the IBM  $\mathrm{Q}20$ 

## References

1. Almeida, A., Dueck, G., Silva, A.: Finding optimal qubit permutations for ibm's quantum computer architectures pp. 1–6 (08 2019). https://doi.org/10.1145/3338852.3339829

- Barenco, A., Bennett, C., Cleve, R., DiVincenzo, D., Margolus, N., Shor, P., Sleator, T., Smolin, J., Weinfurter, H.: Elementary gates for quantum computation. Physical Review A 52 (03 1995). https://doi.org/10.1103/PhysRevA.52.3457
- 3. Bernal, D., Booth, K., Dridi, R., Alghassi, H., Tayur, S., Venturelli, D.: Integer programming techniques for minor-embedding in quantum annealers (12 2019)
- 4. Cowtan, A., Dilkes, S., Duncan, R., Krajenbrink, A., Simmons, W., Sivarajah, S.: On the qubit routing problem (02 2019)
- 5. Daei, O., Navi, K., Zomorodi, M.: Optimized quantum circuit partitioning (05 2020)
- Glover, F.: Tabu search—part ii. ORSA Journal on Computing 2, 4–32 (02 1990). https://doi.org/10.1287/ijoc.2.1.4
- 7. Guerreschi, G.G., Park, J.: Two-step approach to scheduling quantum circuits. Quantum Science and Technology  $\bf 3$  (06 2018). https://doi.org/10.1088/2058-9565/aacf0b
- 8. Kissinger, A., Meijer, A.: Cnot circuit extraction for topologically-constrained quantum memories (04 2019)
- 9. Li, G., Ding, Y., Xie, Y.: Tackling the qubit mapping problem for nisq-era quantum devices (09 2018)
- 10. Li, S., Zhou, X., Feng, Y.: Qubit mapping based on subgraph isomorphism and filtered depth-limited search (2020)
- 11. Lloyd, Seth: Almost any quantum logic gate is universal. Physical Review Letters **75**(2), 346 (1995)
- Matsuo, A., Yamashita, S.: An efficient method for quantum circuit placement problem on a 2-d grid pp. 162–168 (05 2019). https://doi.org/10.1007/978-3-030-21500-2 10
- Murali, P., Linke, N., Martonosi, M., Abhari, A., Nguyen, N., Huerta Alderete, C.: Full-stack, real-system quantum computer studies: architectural comparisons and design insights pp. 527–540 (06 2019). https://doi.org/10.1145/3307650.3322273
- 14. Möttönen, M., Vartiainen, J.: Decompositions of general quantum gates. Frontiers in Artificial Intelligence and Applications (05 2005)
- 15. Nash, B., Gheorghiu, V., Mosca, M.: Quantum circuit optimizations for nisq architectures. Quantum Science and Technology **5** (02 2020). https://doi.org/10.1088/2058-9565/ab79b1
- 16. Paler, A.: On the influence of initial qubit placement during nisq circuit compilation  $(11\ 2018)$
- 17. Preskill, J.: Quantum computing in the nisq era and beyond. Quantum 2 (2018)
- Shafaei, A., Saeedi, M., Pedram, M.: Optimization of quantum circuits for interaction distance in linear nearest neighbor architectures. Proceedings - Design Automation Conference pp. 1–6 (05 2013). https://doi.org/10.1145/2463209.2488785
- 19. Shafaei, A., Saeedi, M., Pedram, M.: Optimization of quantum circuits for interaction distance in linear nearest neighbor architectures (2013)
- Siraichi, M.Y., dos Santos, V.F., Collange, S., Pereira, F.M.Q.: Qubit allocation (2018)
- 21. Sun, S., Luo, Q.: In-memory subgraph matching: An in-depth study pp. 1083–1098 (06 2020). https://doi.org/10.1145/3318464.3380581
- 22. Tannu, S., Qureshi, M.: Not all qubits are created equal: A case for variability-aware policies for nisq-era quantum computers pp. 987–999 (04 2019). https://doi.org/10.1145/3297858.3304007
- 23. Venturelli, D., do, M., Rieffel, E., Frank, J.: Temporal planning for compilation of quantum approximate optimization circuits pp. 4440–4446 (08 2017). https://doi.org/10.24963/ijcai.2017/620

- 24. Xiangzhen, Z., Li, S., Feng, Y.: Quantum circuit transformation based on simulated annealing and heuristic search. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems **PP**, 1–1 (01 2020). https://doi.org/10.1109/TCAD.2020.2969647
- 25. Zhang, Y., Deng, H., Li, Q., Haoze, S., Nie, L.: Optimizing quantum programs against decoherence: Delaying qubits into quantum superposition pp. 184–191 (07 2019). https://doi.org/10.1109/TASE.2019.000-2
- 26. Zulehner, A., Paler, A., Wille, R.: Efficient mapping of quantum circuits to the ibm qx architectures. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (12 2017). https://doi.org/10.1109/TCAD.2018.2846658