
Support control flow in SabreSwap and SabreLayout. #10366

Merged (47 commits) Jul 19, 2023

Conversation

@kevinhartman kevinhartman commented Jun 30, 2023

Summary

Adds support for circuits with control flow operations to the Sabre layout and routing passes.

Details and comments

SabreSwap

The approach taken in this PR aims to add control flow support with minimal complexity. The most important decision driving this was the choice to unconditionally route all control flow blocks back to their starting layout. As described in #9419, only "non-exhaustive" control flow operations technically require this. However, by always doing it, we get the following properties:

  1. Sabre can in effect "look through" control flow op nodes when populating its extended set. This works because applying a control flow node is guaranteed not to change the layout, and so nodes in the extended set downstream of a control flow node are still valid.
  2. The Sabre Rust code doesn't need an enumeration of what is and is not a non-exhaustive control flow operation, which keeps the SabreDAG data structure relatively simple.
  3. There's no need to recurse into and reverse blocks when running SabreLayout, since placement of control flow ops doesn't permute the layout.
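The strategy behind these properties can be sketched in a few lines of Python. This is a hedged illustration only: the helpers `route_dag` and `permutation_to_swaps` are illustrative stand-ins for the PR's internals, not Qiskit's actual API.

```python
# Sketch of the "always route blocks back to their starting layout" strategy.
# `route_dag` and `permutation_to_swaps` are illustrative stand-ins.

def route_control_flow_op(blocks, layout, route_dag, permutation_to_swaps):
    """Route each block starting from `layout`, then compute a swap epilogue
    so that every block finishes on `layout` again."""
    results = []
    for block in blocks:
        routed, final_layout = route_dag(block, dict(layout))
        # Swaps that map final_layout back onto the parent's layout.
        epilogue = permutation_to_swaps(final_layout, layout)
        results.append((routed, epilogue))
    # Every block ends on `layout`, so the op as a whole doesn't permute it,
    # which is what lets the extended set "look through" the op.
    return results, layout
```

Because the returned layout is unchanged, the caller never needs to special-case exhaustive vs. non-exhaustive control flow.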

As a consequence, unnecessary swaps may be inserted in the case of exhaustive control flow (namely if_else and exhaustive switch operations). In the future, we may want to investigate alternative approaches that take advantage of exhaustive control flow. These would likely require more research, since they'd require modifications to the core Sabre algorithm. For example, we might consider mutably sharing the extended set of the outer DAG while routing child blocks, with some sort of weighting (e.g. based on branch prediction) so that routing works in the best interest of the overall circuit.

The core logic of this implementation is in Rust. To make this work, SabreDAG was (recursively) extended with a new field, node_blocks, which maps node IDs of the original DAG to a vector of SabreDAGs representing the blocks of the corresponding control flow node. If node_blocks contains an entry for a given node, the Sabre swap Rust code treats the node as a control flow operation, i.e. it recursively routes its blocks and places the node in the gate order immediately.
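As an illustration of the shape of this data structure, here is a toy Python model. Field names follow the description above, but this is a sketch only; the real SabreDAG is defined in Rust and differs in detail.

```python
# Toy Python model of the node_blocks idea: control-flow nodes are just node
# IDs that have an entry mapping to their (nested) blocks.
from dataclasses import dataclass, field

@dataclass
class SabreDAG:
    nodes: list                                       # node IDs, topological order
    node_blocks: dict = field(default_factory=dict)   # node ID -> [SabreDAG]

def control_flow_nodes(dag):
    """The nodes the router would treat as control flow operations."""
    return [n for n in dag.nodes if n in dag.node_blocks]
```

Note that the nesting is uniform: each block is itself a SabreDAG, so arbitrarily deep control flow falls out of the same recursion.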

The Rust code now returns a SabreResult struct containing: the preexisting swap map (which, as a refresher, maps node IDs from the original DAG to a list of logical swaps that must be performed prior to the corresponding node's placement); the gate order, as before; and a new field node_block_results, which maps node IDs to a list of BlockResult structs for the corresponding control flow node's blocks. Each BlockResult has two fields: result, a SabreResult encapsulating the now-routed block, and swap_epilogue, a list of logical swaps the caller must apply to the block to bring it back to the proper layout (its starting layout, for now).
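The result structures can likewise be modeled as toy Python dataclasses. Field names follow the PR's description; the real structs are Rust, and the walker below is an illustrative consumer, not part of the PR.

```python
# Toy shapes for the nested result structures described above.
from dataclasses import dataclass, field

@dataclass
class SabreResult:
    swap_map: dict                  # node ID -> swaps inserted before that node
    gate_order: list                # node IDs in placement order
    node_block_results: dict = field(default_factory=dict)  # node ID -> [BlockResult]

@dataclass
class BlockResult:
    result: SabreResult             # the now-routed inner block
    swap_epilogue: list             # swaps restoring the block's starting layout

def total_swaps(res):
    """Walk a result tree, counting every swap (a toy consumer of the API)."""
    count = sum(len(swaps) for swaps in res.swap_map.values())
    for block_results in res.node_block_results.values():
        for blk in block_results:
            count += total_swaps(blk.result) + len(blk.swap_epilogue)
    return count
```

A real consumer (the Python-side pass) walks the same tree, but rebuilds DAG circuits instead of counting.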

SabreLayout

In theory, the routing_pass argument for SabreLayout should support any routing pass that supports control flow, since the reverse circuit constructed by the Python-side code uses QuantumCircuit.reverse_ops, which will recursively call reverse_ops on each instruction. When unspecified, the Rust-side Sabre swap is used, which does not permute the layout when placing control flow operations, and thus does not need to recursively reverse nested circuits.
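To see why recursive reversal is all that is needed, here is a toy model of what reverse_ops does, over nested (name, blocks) tuples. This is purely illustrative; it is not Qiskit's QuantumCircuit implementation.

```python
# Toy model of recursive instruction reversal: reverse the top-level order
# and recurse into each instruction's blocks, mimicking reverse_ops.
def reverse_ops(ops):
    return [(name, [reverse_ops(b) for b in blocks]) for name, blocks in reversed(ops)]
```

Since placing a control flow op does not permute the layout, this structural reversal is sufficient for SabreLayout's forward/backward passes.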

Notes

  • The version of hashbrown we depend on in Rust had to be explicitly downgraded in the Cargo.lock. This is because Rustworkx uses its HashMap type in the method rustworkx_core::token_swapper::token_swapper, and Rustworkx currently pins hashbrown at 0.12.3. See Consider replacing hashbrown use in core API rustworkx#911 for more details.

Testing

Relevant control flow tests from StochasticSwap are being ported. I've also locally enabled Sabre layout and routing for circuits with control flow in level1.py and run through the test_transpiler.py tests, which are passing.

To do

  • Address inline TODOs.
  • Add additional testing (porting from what we've got currently for StochasticSwap's CF handling).
  • Add release note.
  • Run formatting.

Resolves #9419
Resolves #9421

@jakelishman jakelishman (Member) left a comment
This looks great, thank you!!

In testing: the current tests have a lot of surface area, but they mostly give the inner blocks the same qubits as the outer blocks. It might be good to include a couple of unit tests that explicitly attempt to route circuits with control-flow ops that appear in the same layer and are defined over qubits that aren't in the outer circuit (and maybe some that are in the outer circuit, but bound in different orders). I think all the pieces are in place in your code for this to work out of the box, and the random and disjoint tests touch on it a little, but it'd be good to have one or two more targeted tests, since this is our premier routing and layout algorithm.

Also, ought we to have some tests of SabreLayout as well?

Comment on lines +129 to +136
fn swap_epilogue(&self, py: Python) -> PyObject {
self.swap_epilogue
.iter()
.map(|x| x.into_py(py))
.collect::<Vec<_>>()
.into_pyarray(py)
.into()
}
Member

It feels weird that we can't have a direct impl for IntoPyArray to ToPyArray, since Vec<[usize; N]> should point to the same memory layout as a 2D Numpy array of usize. I guess that's something we can take up with the numpy crate, though - we couldn't write the impl locally because of the coherence rules.

Comment on lines +567 to 586
match dag.node_blocks.get(py_node) {
Some(blocks) => {
// Control flow op. Route all blocks for current layout.
let mut block_results: Vec<BlockResult> = Vec::with_capacity(blocks.len());
for inner_dag in blocks {
let (inner_dag_routed, inner_final_layout) =
route_block_dag(inner_dag, layout.copy());

// For now, we always append a swap circuit that gets the inner block
// back to the parent's layout.
let swap_epilogue =
gen_swap_epilogue(coupling, inner_final_layout, layout, seed);
let block_result = BlockResult {
result: inner_dag_routed,
swap_epilogue,
};
block_results.push(block_result);
}
node_block_results.insert_unique_unchecked(*py_node, block_results);
}
Member

The ordering of the iteration means that control-flow ops are routed as they happen to be visited: it's not quite ASAP and definitely not ALAP, but somewhere in between. I think that's totally fine for now; I just wanted to flag that there's an implicit decision point in the routing here. It's the same "decision" we make in StochasticSwap, though.

In the future, we might want to investigate if it'd be better to shift these to be precisely ASAP or ALAP routing.

Contributor Author


Yes absolutely, good point, there's certainly an implicit decision here. I thought about this and figured that it shouldn't matter for this implementation, since placement of a control flow op doesn't permute the layout. If it did, I imagine it'd be best to go for ALAP, i.e. place all gate sequences that do not have a data dependency on the control flow op in question, since the lookahead would have been driving towards making these gates as close to placeable as possible, and only then place the control flow op.

I've got some notes set aside from when I was planning the work in this PR that might come in handy when we revisit this to look for optimizations.

crates/accelerate/src/sabre_swap/mod.rs (resolved)
crates/accelerate/src/sabre_swap/mod.rs (resolved)
crates/accelerate/src/sabre_swap/mod.rs (resolved)
qiskit/transpiler/passes/routing/sabre_swap.py (outdated, resolved)
qiskit/transpiler/passes/routing/sabre_swap.py (outdated, resolved)
Comment on lines 328 to 333
def empty_dag(node):
out = DAGCircuit()
for qreg in mapped_dag.qregs.values():
out.add_qreg(qreg)
out.add_clbits(node.cargs)
return out
Member

I'm concerned that some inner blocks may have things like global phases or classical registers (for example, if a control-flow block itself contains a control-flow block with a ClassicalRegister condition), which this conversion will miss.

Contributor Author


Ah good to know. I was under the impression that inner blocks shouldn't have their own global properties or registers. I suppose we can add the current block as a second arg and copy these things from that.

Member


Blocks need to be able to contain registers, but you can safely assume that any registers they do have will also be in any circuits that contain that block. Blocks can technically have global phases (though that's impossible to achieve with the builder interface), but honestly, I don't think we've ever considered that before, so I wouldn't be surprised if we already have global-phase bugs in other parts of the transpiler.

Contributor Author


Ah alright, so the biggest issue here is that I'm not adding the cregs from the mapped_dag, correct? I can also copy in any global phases from the current block as well.

Member


Yeah, I think cregs is the big one. We should probably include a test that a nested block that contains a classical register all works correctly as well - if you need to compare, you can use the control-flow builder interface, then make sure that the inner blocks use register conditions rather than bit conditions, and it should create control-flow ops whose blocks involve registers as well.

Contributor Author


Fixed in 386197d.

qiskit/transpiler/passes/routing/sabre_swap.py (outdated, resolved)
qiskit/transpiler/passes/routing/sabre_swap.py (outdated, resolved)
- Makes it more explicit that the num_qubits used in a SabreDAG
  should be the number of physical qubits on the device.
- We no longer pass clbit indices because they are always local to
  the block, i.e. there's no current reason for them to be consistent
  across the root DAG and inner blocks, like we must do for qubits.
@kevinhartman (Contributor Author)

Should be ready to go.

jakelishman previously approved these changes Jul 18, 2023
@jakelishman jakelishman (Member) left a comment

This looks good to me, and the new tests with the nested registers look good as well, thanks. I'm super happy with this work, and I'm really pleased to be bringing Sabre back online with control-flow support.

I left one very minor comment, but it's probably me just being paranoid.

We decided in the Qiskit developers' meeting against Kevin spending time right now to add tests of SabreLayout, when we're relatively confident that it all already works because of the integration testing that's enabled by #10372, and we need all the review effort we can get.

test/python/transpiler/test_sabre_swap.py (outdated, resolved)
mtreinish previously approved these changes Jul 18, 2023
@mtreinish mtreinish (Member) left a comment

This LGTM, thanks for all the updates and the attention to detail on this PR. Sabre is a key part of the transpiler so that was very important.

@jakelishman (Member)

(Kevin: feel free to dismiss my comment and just enable automerge if you think I'm being too paranoid)

@kevinhartman kevinhartman dismissed stale reviews from mtreinish and jakelishman via 4313df9 July 18, 2023 20:57
@jakelishman jakelishman (Member) left a comment

lol, good timing

@jakelishman jakelishman added this pull request to the merge queue Jul 18, 2023
Merged via the queue into Qiskit:main with commit 49b383a Jul 19, 2023
14 checks passed
mtreinish added a commit to mtreinish/qiskit-core that referenced this pull request Aug 16, 2023
In Qiskit#10366 the SabreLayout and SabreSwap passes were refactored to
support nested control flow blocks. As part of that refactor a new
struct SabreResult was created to store the nested results for each
block. This new class however resulted in PyO3 cloning the full swap map
on every access. Since we have at least 1 access per gate in the circuit
(and another one for each swap) this extra clone() adds a lot of extra
overhead for deep circuits. In extreme cases this regression could be
quite extreme. To address this the result format is changed to be a
tuple (as it was originally), while this is less ergonomic the advantage
it provides is that for nested objects it moves the rust object to the
pyo3 interface so we avoid a copy as Python owns the object on the
return. However, for control flow blocks we're still using the
SabreResult class as it simplifies the internal implementation (which is
why Qiskit#10366 introduced the struct). The potential impact of this is
mitigated by switching to only a single clone per control flow block,
by only accessing the SabreResult object's attribute on each recursive
call. However, if it is an issue in the future we can work on flattening
the nested structure before returning it to python to avoid any clones.

Fixes Qiskit#10650
github-merge-queue bot pushed a commit that referenced this pull request Aug 17, 2023
* Fix performance of Sabre rust<->Python boundary

In #10366 the SabreLayout and SabreSwap passes were refactored to
support nested control flow blocks. As part of that refactor a new
struct SabreResult was created to store the nested results for each
block. This new class however resulted in PyO3 cloning the full swap map
on every access. Since we have at least 1 access per gate in the circuit
(and another one for each swap) this extra clone() adds a lot of extra
overhead for deep circuits. In extreme cases this regression could be
quite extreme. To address this the result format is changed to be a
tuple (as it was originally), while this is less ergonomic the advantage
it provides is that for nested objects it moves the rust object to the
pyo3 interface so we avoid a copy as Python owns the object on the
return. However, for control flow blocks we're still using the
SabreResult class as it simplifies the internal implementation (which is
why #10366 introduced the struct). The potential impact of this is
mitigated by switching to only a single clone per control flow block,
by only accessing the SabreResult object's attribute on each recursive
call. However, if it is an issue in the future we can work on flattening
the nested structure before returning it to python to avoid any clones.

Fixes #10650

* Simplify recursive call logic in _apply_sabre_result

This commit simplifies the logic in the recursive call logic in
_apply_sabre_result to always use a tuple so we avoid an isinstance
check.

Co-authored-by: Kevin Hartman <kevin@hart.mn>

---------

Co-authored-by: Kevin Hartman <kevin@hart.mn>
Successfully merging this pull request may close these issues:

  • Support control flow in Python components of SabreLayout
  • Support control flow in SabreSwap