# BQSKit Tutorial at IEEE Quantum Week

`$ pip install bqskit`

This tutorial will cover synthesis and circuit optimization techniques with BQSKit.

With bottom-up synthesis, I have discovered an important philosophy that will repeatedly come up in this tutorial:

**Synthesis tools that quickly explore deeper circuits will produce longer circuits faster**.

## Search-Based Synthesis

BQSKit implements both the QSearch and LEAP algorithms, and both are well integrated into the BQSKit framework. These implementations are fully portable and topology-aware.

1) Let's start with a simple example using QSearch off-the-shelf to synthesize a Toffoli circuit:

In [None]:
from bqskit import Circuit
from bqskit import Compiler
from bqskit import UnitaryMatrix
from bqskit import CompilationTask
from bqskit.passes import QSearchSynthesisPass

# Encode the toffoli unitary
toffoli_unitary = UnitaryMatrix([
    [1, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 1, 0],
])

# Inputs to the BQSKit compiler must be a circuit
# So first we place the toffoli unitary into a circuit
# Note: NumPy arrays are interchangeable with UnitaryMatrix objects
circuit = Circuit.from_unitary(toffoli_unitary)

# We must now tell the compiler that we would like to execute QSearch
# This is done by creating a CompilationTask with the circuit as the input
# and only a QSearchSynthesisPass object in the workflow.
task = CompilationTask(circuit, [QSearchSynthesisPass()])

# Finally, we construct a compiler and submit the task
with Compiler() as compiler:
    synthesized_circuit = compiler.compile(task)
    
print(synthesized_circuit)

In [None]:
# We can print some information about the circuit

# Display the length of the critical path in the circuit
print("Critical Path Length:", synthesized_circuit.depth)

# Display the parallelism in the circuit
print("Parallelism:", synthesized_circuit.parallelism)

# Display the total number of gates
print("Total Number of Operations:", synthesized_circuit.num_operations)

# As well as specific numbers
from bqskit.ir.gates import CNOTGate, U3Gate
print("CNOT Count:", synthesized_circuit.count(CNOTGate()))
print("U3 Count:", synthesized_circuit.count(U3Gate()))

# We can even see the logical connectivity of the circuit
print("Interacting Qubit Pairs:", synthesized_circuit.coupling_graph)

### Accuracy

BQSKit's algorithms are all approximate, meaning that the results might be slightly different than the inputs. This is controllable and most of the time is at the floating-point level. However, we should always check our results when possible:

In [None]:
out_utry = synthesized_circuit.get_unitary()
out_utry.get_distance_from(toffoli_unitary)

This function uses a metric based on the Hilbert-Schmidt inner product:

$$\| U_T - U_C \| = \sqrt{1 - \left(\frac{|\mathrm{Tr}(U_T^\dagger U_C)|}{2^n}\right)^2}$$

This distance is agnostic to global phase, meaning that $U_T$ and $U_C$ are considered equal if they differ only by a global phase.
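
As a rough, library-independent sketch of this metric, the function below evaluates the formula on plain nested lists. The name `hs_distance` is hypothetical and not part of BQSKit; it only mirrors the formula above, using the identity $Tr(A^\dagger B) = \sum_{ij} \overline{A_{ij}} B_{ij}$.

```python
import cmath
import math


def hs_distance(u_t, u_c):
    """Evaluate sqrt(1 - (|Tr(U_T^dagger U_C)| / N)^2) for N x N nested lists.

    Hypothetical helper for illustration; not a BQSKit function.
    """
    dim = len(u_t)
    # Tr(A^dagger B) equals the sum over all entries of conj(A[i][j]) * B[i][j]
    trace = sum(
        complex(u_t[i][j]).conjugate() * u_c[i][j]
        for i in range(dim)
        for j in range(dim)
    )
    overlap = abs(trace) / dim
    # Clamp at zero so rounding error cannot make the radicand negative
    return math.sqrt(max(0.0, 1.0 - overlap ** 2))


x_gate = [[0, 1], [1, 0]]
identity = [[1, 0], [0, 1]]
# The same X gate multiplied by a global phase of exp(0.7i)
phased_x = [[cmath.exp(0.7j) * entry for entry in row] for row in x_gate]

print(hs_distance(x_gate, phased_x))  # approximately 0: global phase is ignored
print(hs_distance(x_gate, identity))  # 1.0: Tr(X) = 0, so the overlap vanishes
```

In the notebook itself, `get_distance_from` remains the call to use; this sketch is only meant to show what the number measures.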

**Change Target Accuracy**
If the degree of approximation is too large you can adjust the `success_threshold` parameter in QSearch. This parameter is used to determine convergence of the algorithm. If the unitary implemented by the circuit differs from the target unitary by less than this number, then the algorithm claims success. You may also want to adjust the parameter to allow for more error in the hopes of finding a shorter circuit. This will all depend on your environment and goals.


**Exercise:** Try changing the success threshold in the following example. What do you expect will happen?

In [None]:
# We can increase the success threshold allowing for greater error in the result
# Note 1e-1 is a very high error
configured_qsearch_pass = QSearchSynthesisPass(success_threshold=1e-1)

# Create and execute a compilation task
with Compiler() as compiler:
    task = CompilationTask(circuit, [configured_qsearch_pass])
    synthesized_circuit = compiler.compile(task)

print("New Circuit CNOT Count:", synthesized_circuit.count(CNOTGate()))
out_utry = synthesized_circuit.get_unitary()
print("New Circuit Error:", out_utry.get_distance_from(toffoli_unitary))

### Gatesets and Layer Generation

By default, the QSearch algorithm uses `U3Gate`s and `CNOTGate`s:

In [None]:
synthesized_circuit.gate_set

This can be changed by configuring the way QSearch generates layers in the search tree. These options are entirely modular and fully customizable. QSearch uses a `LayerGenerator` object to generate the candidate circuits that make up its search space. By default, a layer generator that places CNOT and U3 gate blocks is used. We can easily swap out the single- and two-qubit gates used in the same layer generation algorithm, or we can write our own algorithm entirely by subclassing `LayerGenerator` and implementing its API.

**Exercise:** Modify the below code to produce a Toffoli circuit using the gates of your choice. You can find the gates currently implemented in BQSKit [here](https://bqskit.readthedocs.io/en/latest/source/ir.html#bqskit-gates-bqskit-ir-gates) and you can learn how to implement your own [here](https://bqskit.readthedocs.io/en/latest/tutorials/Introduction%20to%20BQSKit%20IR.html). Which gate set produces the best result?

In [None]:
from bqskit.passes.search import SimpleLayerGenerator
from bqskit.ir.gates import ISwapGate, PauliGate

# We can use the same layer generation algorithm
# and just change the gates used
layer_gen = SimpleLayerGenerator(two_qudit_gate=ISwapGate(), single_qudit_gate_1=U3Gate())

configured_qsearch_pass = QSearchSynthesisPass(layer_generator=layer_gen)

# Create and execute a compilation task
with Compiler() as compiler:
    task = CompilationTask(circuit, [configured_qsearch_pass])
    synthesized_circuit = compiler.compile(task)

for gate in synthesized_circuit.gate_set:
    print(f"{gate} Count:", synthesized_circuit.count(gate))

#### Custom Layer Generation (Optional)

We can customize the generation of layers even further. The `SimpleLayerGenerator` places building blocks composed of a two-qudit gate followed by single-qudit gates, a structure with a long history of use in practice. However, this entire flow is modular. The `LayerGenerator` API requires us to implement two methods: `gen_initial_layer` and `gen_successors`. In the below example, we do something a bit more unusual.

For more information, see the API Documentation [here](https://bqskit.readthedocs.io/en/latest/source/autogen/bqskit.passes.LayerGenerator.html).

**Exercise:** Alter the below example to generate layers however you would like. Consider customizing the gate set, how successors are generated, and how you would implement topology-awareness here.

In [None]:
from bqskit.ir.gates import RXXGate, RYYGate, RZZGate, U3Gate
from bqskit.passes.search import LayerGenerator

class CustomLayerGenerator(LayerGenerator):
    
    def gen_initial_layer(self, target, data):
        """
        Here we will generate the first circuit that seeds the search space.
        
        By default, the SimpleLayerGenerator places single-qudit gates
        on each qudit. Here let's do something a little more crazy
        to demonstrate the potential.
        """
        
        init_circuit = Circuit(target.num_qudits, target.radixes)
        
        # Place RXX gates on consecutive pairs of qudits
        for i in range(init_circuit.num_qudits - 1):
            init_circuit.append_gate(RXXGate(), (i, i+1))
        
        return init_circuit
    
    def gen_successors(self, circuit, data):
        """
        During the search, this will be called when expanding a node.
        
        By default, the SimpleLayerGenerator produces new circuits with
        one more block of gates on each valid edge. Again, let's be
        a little crazy here too.
        """
        
        base_successor = circuit.copy()
        
        # Apply a column of U3Gates
        for i in range(base_successor.num_qudits):
            base_successor.append_gate(U3Gate(), i)
        
        successors = []
        
        # Create 3 successors
        # Each one has a line of a specific type of gate.
        for gate in [RXXGate(), RYYGate(), RZZGate()]:
            successor = base_successor.copy()
            for i in range(base_successor.num_qudits - 1):
                successor.append_gate(gate, (i, i+1))
            successors.append(successor)

        return successors

    
# We can now use this layer generator just like the simple one
configured_qsearch_pass = QSearchSynthesisPass(layer_generator=CustomLayerGenerator())

# Create and execute a compilation task
with Compiler() as compiler:
    task = CompilationTask(circuit, [configured_qsearch_pass])
    synthesized_circuit = compiler.compile(task)
    
for gate in synthesized_circuit.gate_set:
    print(f"{gate} Count:", synthesized_circuit.count(gate))

Note that what we do here might not produce the shortest circuits, since we are adding more gates at each step. One advantage, however, is that the circuit grows more quickly and will most likely converge sooner.

This layer generation technique will most likely be useful for synthesizing larger unitaries more quickly.

### Search Heuristics

The QSearch algorithm was originally designed for use with the A* search heuristic, but this is also configurable in BQSKit's implementation. The search heuristic determines the order in which candidate solutions are evaluated and expanded. Three heuristics are implemented for you: `AStarHeuristic`, `DijkstraHeuristic`, and `GreedyHeuristic`. There is also an API that you can implement if you would like to customize the heuristic further.
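
To make the ordering idea concrete, here is a toy sketch in plain Python, independent of BQSKit. It assumes the usual textbook forms: Dijkstra ranks by cost so far, greedy ranks by estimated distance to the target, and A* ranks by a weighted sum of both. The candidate tuples, the `weight` value, and the function names are made up for illustration and are not BQSKit's internal definitions.

```python
import heapq

# Toy candidates: (name, cost so far in gate blocks, distance to target)
candidates = [
    ("shallow-far", 1, 0.90),
    ("medium", 3, 0.40),
    ("deep-close", 8, 0.05),
]


def dijkstra_score(cost, distance):
    # Cost only: prefers the shallowest circuit
    return cost


def greedy_score(cost, distance):
    # Distance only: prefers whatever looks closest to the target
    return distance


def astar_score(cost, distance, weight=10.0):
    # Cost plus weighted distance; `weight` is an arbitrary illustrative value
    return cost + weight * distance


def expansion_order(score):
    """Return candidate names in the order a priority queue would pop them."""
    frontier = [(score(cost, dist), name) for name, cost, dist in candidates]
    heapq.heapify(frontier)
    return [heapq.heappop(frontier)[1] for _ in range(len(frontier))]


for score in (dijkstra_score, greedy_score, astar_score):
    print(score.__name__, expansion_order(score))
```

With these toy numbers the three scores pick three different first candidates, which is the kind of difference to look for when swapping heuristics in the search.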

**Exercise:** Modify the below code to use the different heuristics. How do the results change?

In [None]:
from bqskit.passes.search import AStarHeuristic, DijkstraHeuristic, GreedyHeuristic

configured_qsearch_pass = QSearchSynthesisPass(heuristic_function=GreedyHeuristic())

# Create and execute a compilation task
with Compiler() as compiler:
    task = CompilationTask(circuit, [configured_qsearch_pass])
    synthesized_circuit = compiler.compile(task)
    
for gate in synthesized_circuit.gate_set:
    print(f"{gate} Count:", synthesized_circuit.count(gate))

## Topology-Aware Synthesis

Everything that we have done so far has been purely virtual. There has been no concept of a physical quantum processor and the restrictions that come with using a real one. We can program the BQSKit workflow with information about a physical machine; our synthesis tools will take advantage of this information and produce circuits that are already mapped to the machine's topology.

**Note:** For search-based synthesizers this is programmed in the layer generator. If you would like to continue using a custom one from a previous exercise, you can implement topology awareness in there. Take a look at the source code for the `SimpleLayerGenerator` to see how it is done there.
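
As a rough illustration of the idea, and not the actual `SimpleLayerGenerator` code, the toy sketch below models a circuit as a list of qudit pairs that carry a two-qudit block. Successors are generated only on edges of the coupling graph. The function name and the list-based circuit model are assumptions made for this example.

```python
def gen_successors_on_edges(block_locations, coupling_graph):
    """Return one successor per allowed edge, each with one extra block.

    `block_locations` is a toy stand-in for a circuit: the ordered list of
    qudit pairs on which two-qudit blocks have been placed so far.
    """
    return [block_locations + [edge] for edge in coupling_graph]


# Linear three-qubit topology: 0 - 1 - 2
linear_topology = [(0, 1), (1, 2)]

root = []  # no two-qudit blocks placed yet
first_level = gen_successors_on_edges(root, linear_topology)
second_level = [
    successor
    for node in first_level
    for successor in gen_successors_on_edges(node, linear_topology)
]

print(first_level)        # [[(0, 1)], [(1, 2)]]
print(len(second_level))  # 4
```

Because the pair `(0, 2)` is never offered as a location, no candidate in this toy search tree needs routing afterwards.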

In [None]:
from bqskit.compiler import MachineModel
from bqskit.passes import SimpleLayoutPass

# A MachineModel object models a physical machine
# Here model is a three-qubit model with a linear topology
model = MachineModel(3, [(0, 1), (1, 2)])

# We use a layout pass to:
#    1) associate a MachineModel with the compiler flow
#    2) assign logical to physical qudits
task = CompilationTask(circuit, [
    SimpleLayoutPass(model),
    QSearchSynthesisPass(),
])

with Compiler() as compiler:
    synthesized_circuit = compiler.compile(task)
    
print("Circuit Coupling Graph:", synthesized_circuit.coupling_graph)
for gate in synthesized_circuit.gate_set:
    print(f"{gate} Count:", synthesized_circuit.count(gate))

## The LEAP Algorithm

The LEAP algorithm implemented in BQSKit supports the exact same features as QSearch. The search algorithm is just slightly different to encourage deeper searches rather than wider ones.

**Exercise:** Run the below example. How does the synthesized Toffoli circuit differ when compiled with LEAP vs QSearch? Modify the example, making use of all the options you learned for QSearch.

In [None]:
from bqskit.passes import LEAPSynthesisPass

task = CompilationTask(circuit, [LEAPSynthesisPass()])

with Compiler() as compiler:
    synthesized_circuit = compiler.compile(task)
    
for gate in synthesized_circuit.gate_set:
    print(f"{gate} Count:", synthesized_circuit.count(gate))

## Practice Putting It Together

**Exercise:** Three unitaries of different sizes are hardcoded below. Try experimenting with different configurations of search-based synthesis. Some general tips:

- LEAP will generally be faster with a slight hit to circuit depth
- Use of more expressive gates will usually converge quicker but produce longer circuits
- Search strategies that explore deeper into the circuit space more quickly will often converge sooner
- Using off-the-shelf QSearch on a 5-qubit unitary likely will not converge
- If you want the best results, it will take experimentation

In [None]:
# Three qubit fredkin or Controlled-Swap gate
fredkin_unitary = UnitaryMatrix([
    [1, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1],
])

# X-Gate with three controls (CCCX)
cccx_unitary = UnitaryMatrix([
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
])

# An XXXXX(np.pi/4) rotation
import math
xxxxx_unitary = PauliGate(5).get_unitary([math.pi/4 if i == 341 else 0 for i in range(4**5)])
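
The index 341 in the list comprehension above is not arbitrary. Assuming the parameters of `PauliGate(5)` are ordered like base-4 numbers with digits I=0, X=1, Y=2, Z=3 (an assumption inferred from the "XXXXX" comment, not checked against the BQSKit source), the string XXXXX has digits 11111 in base 4, which is 341 in decimal. The helper below is hypothetical and only reproduces that arithmetic.

```python
PAULI_DIGITS = {"I": 0, "X": 1, "Y": 2, "Z": 3}


def pauli_string_index(label):
    """Read a Pauli string such as 'XXXXX' as a base-4 number."""
    index = 0
    for letter in label:
        index = 4 * index + PAULI_DIGITS[letter]
    return index


print(pauli_string_index("XXXXX"))  # 341
print(4 ** 5)  # 1024 five-qubit Pauli strings in total
```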

In [None]:
# Change this to other unitaries
circuit = Circuit.from_unitary(fredkin_unitary)

task = CompilationTask(circuit, [
    # Build your own compiler workflow here
])

with Compiler() as compiler:
    synthesized_circuit = compiler.compile(task)
    
for gate in synthesized_circuit.gate_set:
    print(f"{gate} Count:", synthesized_circuit.count(gate))

## Search-less Synthesis

BQSKit implements the QFAST algorithm, which is portable and topology-aware as well. Let's look at a simple example:

In [None]:
from bqskit.passes import QFASTDecompositionPass

# Encode the toffoli unitary
toffoli_unitary = UnitaryMatrix([
    [1, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 1, 0],
])

# Inputs to the BQSKit compiler must be a circuit
# So first we place the toffoli unitary into a circuit
# Note: NumPy arrays are interchangeable with UnitaryMatrix objects
circuit = Circuit.from_unitary(toffoli_unitary)

# We must now tell the compiler that we would like to execute QFAST
# This is done by creating a CompilationTask with the circuit as the input
# and only a QFASTDecompositionPass object in the workflow.
task = CompilationTask(circuit, [QFASTDecompositionPass()])

# Finally, we construct a compiler and submit the task
with Compiler() as compiler:
    synthesized_circuit = compiler.compile(task)
    
for gate in synthesized_circuit.gate_set:
    print(f"{gate} Count:", synthesized_circuit.count(gate))

Notice the resulting circuit is composed of `PauliGate`s, an encoding of a general unitary gate.
To go all the way to native gates, QFAST needs to be followed by another synthesis tool.
In the below example after QFAST is called, each block is synthesized with QSearch:

In [None]:
from bqskit.passes import ForEachBlockPass
from bqskit.passes import QSearchSynthesisPass
from bqskit.passes import UnfoldPass

task = CompilationTask(circuit, [
    QFASTDecompositionPass(),
    ForEachBlockPass([QSearchSynthesisPass()]),
    UnfoldPass(),
])
# ForEachBlockPass will run its argument passes on each block
# The UnfoldPass will then unroll the synthesized blocks
# Without the unfold pass, the synthesized blocks will
# still be grouped

# Finally, we construct a compiler and submit the task
with Compiler() as compiler:
    synthesized_circuit = compiler.compile(task)
    
for gate in synthesized_circuit.gate_set:
    print(f"{gate} Count:", synthesized_circuit.count(gate))

One easy optimization that can lead to shorter circuits with little cost is the `ScanningGateRemovalPass`. This pass will attempt to remove gates one-by-one, and is often successful with QFAST.

**Exercise:** Try to place the `ScanningGateRemovalPass` in different parts of the compilation flow. In how many places were you able to successfully place a `ScanningGateRemovalPass`?

In [None]:
from bqskit.passes import ScanningGateRemovalPass

task = CompilationTask(circuit, [
    QFASTDecompositionPass(),
    ForEachBlockPass([QSearchSynthesisPass()]),
    UnfoldPass(),
    ScanningGateRemovalPass(),
])

# Construct a compiler and submit the task
with Compiler() as compiler:
    synthesized_circuit = compiler.compile(task)
    
for gate in synthesized_circuit.gate_set:
    print(f"{gate} Count:", synthesized_circuit.count(gate))

### Accuracy

Similar to QSearch and LEAP, the `success_threshold` parameter can be adjusted to allow for more or less approximation.
It is important to note that QFAST is more numerically unstable than QSearch and LEAP, and as a result, optimization cannot always produce results below a certain threshold.

**Exercise:** Try changing the success threshold in the following example. What do you expect will happen? Note that both `QSearchSynthesisPass` and `ScanningGateRemovalPass` also have `success_threshold` parameters. Try adjusting all of them.

In [None]:
# We can increase the success threshold allowing for greater error in the result
# Note 1e-1 is a very high error
configured_qfast_pass = QFASTDecompositionPass(success_threshold=1e-1)

task = CompilationTask(circuit, [
    configured_qfast_pass,
    ForEachBlockPass([QSearchSynthesisPass()]),
    UnfoldPass(),
    ScanningGateRemovalPass(),
])

# Create and execute a compilation task
with Compiler() as compiler:
    synthesized_circuit = compiler.compile(task)

for gate in synthesized_circuit.gate_set:
    print(f"{gate} Count:", synthesized_circuit.count(gate))
    
out_utry = synthesized_circuit.get_unitary()
print("New Circuit Error:", out_utry.get_distance_from(toffoli_unitary))

### Model Restrictions

The QFAST algorithm ensures that each additional gate decreases the distance to the target by at least a small amount. This is done to ensure that progress is made; if adding a gate does not reduce the distance sufficiently, then the gate is removed and the valid locations are temporarily restricted. Doing this ensures better quality results when using QFAST, but it can slow down the algorithm by preventing it from reaching deep into the circuit space quickly. With circuits that are expected to be deep, it might be useful to lower the `progress_threshold` or even to disable it by setting it to a negative value.

**Exercise:** Try adjusting the progress_threshold.

In [None]:
configured_qfast_pass = QFASTDecompositionPass(progress_threshold=-1)

task = CompilationTask(circuit, [
    configured_qfast_pass,
    ForEachBlockPass([QSearchSynthesisPass()]),
    UnfoldPass(),
])

# Create and execute a compilation task
with Compiler() as compiler:
    synthesized_circuit = compiler.compile(task)

for gate in synthesized_circuit.gate_set:
    print(f"{gate} Count:", synthesized_circuit.count(gate))

### Function Encoding

While the key idea in the QFAST algorithm is using a variable location encoding of a gate,
the gate's function still needs to be encoded as well.
The original algorithm uses an expressive general unitary gate encoded in a `PauliGate`,
but this can be changed entirely. By default, a 2-qubit PauliGate is used. The size of the
`PauliGate` can be changed, or the gate can be swapped out for something else. Keep in mind
that the gate will need to be universal.

**Exercise:** Try adjusting the gate QFAST uses to encode function in the below example:

In [None]:
from bqskit.ir.gates import CYGate
from bqskit.ir.gates import U3Gate
from bqskit.ir.gates import CircuitGate

# We will group together a few gates inside a CircuitGate
# We can then pass the group into QFAST
gate_group = Circuit(2)
gate_group.append_gate(CYGate(), (0, 1))
gate_group.append_gate(U3Gate(), 0)
gate_group.append_gate(U3Gate(), 1)
grouped_gate = CircuitGate(gate_group)

configured_qfast_pass = QFASTDecompositionPass(gate=grouped_gate)

task = CompilationTask(circuit, [
    configured_qfast_pass,
    UnfoldPass(),
])

# Create and execute a compilation task
with Compiler() as compiler:
    synthesized_circuit = compiler.compile(task)

for gate in synthesized_circuit.gate_set:
    print(f"{gate} Count:", synthesized_circuit.count(gate))

Note that when we use native gates directly, we do not need QSearch inside of a `ForEachBlockPass`.

### Putting It Together

**Exercise:** Try synthesizing the previous three hardcoded unitaries with QFAST. Try experimenting with different configurations of QFAST. Some general tips:

- Larger blocks, e.g. `PauliGate(3)`, will usually converge quicker but produce longer circuits
- Less expressive gates will usually converge slower but produce better circuits
- Lowering `progress_threshold` and raising `success_threshold` will speed up synthesis
- If you want the best results, it will take experimentation

In [None]:
# Change this to other unitaries
circuit = Circuit.from_unitary(fredkin_unitary)

task = CompilationTask(circuit, [
    # Build your own compiler workflow here
])

with Compiler() as compiler:
    synthesized_circuit = compiler.compile(task)
    
for gate in synthesized_circuit.gate_set:
    print(f"{gate} Count:", synthesized_circuit.count(gate))

## Using Partitioning to Optimize a Circuit

Synthesis is a very powerful circuit optimization technique. However, the input size that even QFAST can handle doesn't scale well to larger circuits. In fact, to synthesize a circuit today, we need to be able to simulate it, which ultimately caps the scaling of synthesis algorithms. However, we can still use a synthesis tool together with a partitioner to optimize small blocks of a circuit at a time. BQSKit was designed for this exact use case.
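
A quick back-of-the-envelope calculation shows why simulation caps the input size. An n-qubit unitary has $4^n$ complex entries; the sketch assumes 16 bytes per entry (double-precision complex), which is a common choice but not something this tutorial specifies.

```python
BYTES_PER_ENTRY = 16  # assumed: one complex number stored as two 64-bit floats


def unitary_size_bytes(num_qubits):
    """Memory needed to store a dense 2^n x 2^n unitary matrix."""
    dim = 2 ** num_qubits
    return dim * dim * BYTES_PER_ENTRY


for n in (3, 5, 10, 16):
    print(f"{n:2d} qubits: {unitary_size_bytes(n):,} bytes")
```

Under this assumption a 3-qubit block needs about 1 KiB, while a dense unitary for a full 16-qubit circuit would need 64 GiB, which is why the circuit is split into small blocks before synthesis.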

In [None]:
# Load a 16-qubit time evolution circuit generated from the ArQTIC circuit generator.
circuit = Circuit.from_file('heisenberg-16-20.qasm')

for gate in circuit.gate_set:
    print(f"{gate} Count:", circuit.count(gate))

We will partition the circuit and then use the `ForEachBlockPass` to perform operations on the individual blocks. Note that the `ForEachBlockPass` will run the subtasks in parallel using Dask.

In [None]:
from bqskit.compiler import CompilationTask
from bqskit.compiler import Compiler
from bqskit.passes import QuickPartitioner
from bqskit.passes import ForEachBlockPass
from bqskit.passes import QSearchSynthesisPass
from bqskit.passes import ScanningGateRemovalPass
from bqskit.passes import UnfoldPass

task = CompilationTask(circuit, [
    QuickPartitioner(3),
    ForEachBlockPass([QSearchSynthesisPass(), ScanningGateRemovalPass()]),
    UnfoldPass(),
])

with Compiler() as compiler:
    synthesized_circuit = compiler.compile(task)
    
for gate in synthesized_circuit.gate_set:
    print(f"{gate} Count:", synthesized_circuit.count(gate))

### Replace Filters

The `ForEachBlockPass` takes an optional parameter `replace_filter` that determines if the circuit resulting from running the input passes on the original block should replace the original block. In the below example, we alter the above flow to only replace a block if it has fewer two-qubit gates as a result of running `QSearchSynthesisPass` and `ScanningGateRemovalPass`.

**Exercise:** Try changing the replace filter to suit your needs. You might want to select circuits with greater parallelism (`circuit.parallelism`) or choose based on depth (`circuit.depth`).

In [None]:
from bqskit.ir.gates import CXGate

def less_2q_gates(result_circuit, initial_block_as_op):
    begin_cx_count = initial_block_as_op.gate._circuit.count(CXGate())
    end_cx_count = result_circuit.count(CXGate())
    return end_cx_count < begin_cx_count

task = CompilationTask(circuit, [
    QuickPartitioner(3),
    ForEachBlockPass(
        [QSearchSynthesisPass(), ScanningGateRemovalPass()],
        replace_filter=less_2q_gates
    ),
    UnfoldPass(),
])

# Finally, we construct a compiler and submit the task
with Compiler() as compiler:
    synthesized_circuit = compiler.compile(task)
    
for gate in synthesized_circuit.gate_set:
    print(f"{gate} Count:", synthesized_circuit.count(gate))
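
Along the lines of the exercise, a depth-based filter could look like the sketch below. It follows the same two-argument signature and the same `gate._circuit` access as `less_2q_gates` above. The stand-in objects built with `SimpleNamespace` are purely for demonstration, so that the snippet runs without a compiler; they are not BQSKit types.

```python
from types import SimpleNamespace


def shallower_circuit(result_circuit, initial_block_as_op):
    """Accept the new block only if its depth is strictly smaller."""
    begin_depth = initial_block_as_op.gate._circuit.depth
    end_depth = result_circuit.depth
    return end_depth < begin_depth


# Stand-ins that expose only the attributes the filter reads
original_block = SimpleNamespace(
    gate=SimpleNamespace(_circuit=SimpleNamespace(depth=12)),
)
better_result = SimpleNamespace(depth=9)
worse_result = SimpleNamespace(depth=15)

print(shallower_circuit(better_result, original_block))  # True
print(shallower_circuit(worse_result, original_block))   # False
```

Such a function would be passed to `ForEachBlockPass` as `replace_filter=shallower_circuit`, in the same way as `less_2q_gates`.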

### Gatesets

Just like we changed the gates used by QSearch in the Search-Based Synthesis section, we can change the gates for the entire circuit using the same method.

**Exercise:** Change the gates used in the below example to change the gate set for the circuit.

In [None]:
from bqskit.ir.gates import ISwapGate, U3Gate
from bqskit.passes.search import SimpleLayerGenerator

layer_gen = SimpleLayerGenerator(two_qudit_gate=ISwapGate(), single_qudit_gate_1=U3Gate())

configured_qsearch_pass = QSearchSynthesisPass(layer_generator=layer_gen)

task = CompilationTask(circuit, [
    QuickPartitioner(3),
    ForEachBlockPass([configured_qsearch_pass, ScanningGateRemovalPass()]),
    UnfoldPass(),
])

with Compiler() as compiler:
    synthesized_circuit = compiler.compile(task)
    
for gate in synthesized_circuit.gate_set:
    print(f"{gate} Count:", synthesized_circuit.count(gate))

### Block Size

Increasing the partitioner's block size will likely lead to better results at a runtime cost. If you have the computing resources, you can launch a Dask cluster and connect to it via `Compiler()`. The ForEachBlockPass will efficiently distribute the work. See the [Dask documentation](https://docs.dask.org/en/stable/setup.html) for how to launch a cluster.

In [None]:
from bqskit.passes import OptimizedLEAPPass


task = CompilationTask(circuit, [
    QuickPartitioner(4),
    ForEachBlockPass([OptimizedLEAPPass()]),  # pass an instance, not the class
    UnfoldPass(),
])

with Compiler() as compiler:
    synthesized_circuit = compiler.compile(task)
    
for gate in synthesized_circuit.gate_set:
    print(f"{gate} Count:", synthesized_circuit.count(gate))

### Iterative Optimization

We have provided support for passes that manage control flow. This enables us to conditionally apply passes or to apply them in a loop. In the below example we will run the partitioning and synthesis sequence in a loop until the circuit stops decreasing in 2-qubit gate count.

In [None]:
from bqskit.compiler import BasePass
from bqskit.passes import WhileLoopPass, GateCountPredicate
from bqskit.ir.gates import CXGate

# Defining a new pass is as easy as implementing a `run` function.
# In this pass, we just print some information about the circuit
class PrintCNOTsPass(BasePass):
    def run(self, circuit, data) -> None:
        print("Current CNOT count:", circuit.count(CXGate()), end='\r')

task = CompilationTask(circuit, [
    PrintCNOTsPass(),
    WhileLoopPass(
        GateCountPredicate(CXGate()),
        [
            QuickPartitioner(3),
            ForEachBlockPass(
                [
                    QSearchSynthesisPass(),
                    ScanningGateRemovalPass()
                ],
                replace_filter=less_2q_gates
            ),
            UnfoldPass(),
            PrintCNOTsPass(),
        ]
    )
])

with Compiler() as compiler:
    synthesized_circuit = compiler.compile(task)
    
for gate in synthesized_circuit.gate_set:
    print(f"{gate} Count:", synthesized_circuit.count(gate))

There's a lot that is new in the above example. First, we defined a new pass by subclassing `BasePass` and implementing a `run` method. This pass just prints the number of CNOT gates in the circuit when executed. We use it before and inside a `WhileLoopPass` to see the progress of execution. Second, we use a `WhileLoopPass`, which takes a predicate and a sequence of passes and applies the passes until the predicate evaluates to False. We supplied a `GateCountPredicate`, which evaluates to False when the specified gate count stops changing.
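
The control flow can be mimicked in plain Python. The sketch below is a simplified model, not the BQSKit implementation: it assumes the predicate is checked before each iteration, is true the first time it is evaluated, and afterwards is true only while the tracked count keeps changing. The class name and the `halve_until_four` stand-in for the pass sequence are hypothetical.

```python
class CountChangedPredicate:
    """True on the first check, then True only if the count changed."""

    def __init__(self):
        self.last_count = None

    def __call__(self, count):
        changed = self.last_count is None or count != self.last_count
        self.last_count = count
        return changed


def halve_until_four(count):
    # Stand-in for one round of partitioning and synthesis:
    # reduces the gate count until it bottoms out at 4
    return max(4, count // 2)


predicate = CountChangedPredicate()
cnot_count = 40
history = [cnot_count]

while predicate(cnot_count):
    cnot_count = halve_until_four(cnot_count)
    history.append(cnot_count)

print(history)  # [40, 20, 10, 5, 4, 4]
```

Note that, in this model, the loop body runs one extra time after the last improvement, since the predicate only sees that the count has stopped changing on the following check.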