
Support control flow in SabreSwap and SabreLayout. #10366

Merged (47 commits) Jul 19, 2023

Conversation

@kevinhartman kevinhartman commented Jun 30, 2023

Summary

Adds support for circuits with control flow operations to the Sabre layout and routing passes.

Details and comments

SabreSwap

The approach taken in this PR aims to add control flow support with minimal complexity. The most important decision driving this was the choice to unconditionally route all control flow blocks back to their starting layout. As described in #9419, only "non-exhaustive" control flow operations technically require this. However, by always doing it, we get the following properties:

  1. Sabre can in effect "look through" control flow op nodes when populating its extended set. This works because applying a control flow node is guaranteed not to change the layout, and so nodes in the extended set downstream of a control flow node are still valid.
  2. The Sabre Rust code doesn't need an enumeration of what is and is not a non-exhaustive control flow operation, which keeps the SabreDAG data structure relatively simple.
  3. There's no need to recurse into and reverse blocks when running SabreLayout, since placement of control flow ops doesn't permute the layout.
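The strategy behind these properties can be sketched in a few lines of Python. This is a hedged illustration only: the helpers `route_dag` and `permutation_to_swaps` are illustrative stand-ins for the PR's internals, not Qiskit's actual API.

```python
# Sketch of the "always route blocks back to their starting layout" strategy.
# `route_dag` and `permutation_to_swaps` are illustrative stand-ins.

def route_control_flow_op(blocks, layout, route_dag, permutation_to_swaps):
    """Route each block starting from `layout`, then compute a swap epilogue
    so that every block finishes on `layout` again."""
    results = []
    for block in blocks:
        routed, final_layout = route_dag(block, dict(layout))
        # Swaps that map final_layout back onto the parent's layout.
        epilogue = permutation_to_swaps(final_layout, layout)
        results.append((routed, epilogue))
    # Every block ends on `layout`, so the op as a whole doesn't permute it,
    # which is what lets the extended set "look through" the op.
    return results, layout
```

Because the returned layout is unchanged, the caller never needs to special-case exhaustive vs. non-exhaustive control flow.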

As a consequence, unnecessary swaps may be inserted in the case of exhaustive control flow (namely if_else and exhaustive switch operations). In the future, we may want to investigate alternative approaches that take advantage of exhaustive control flow. These would likely require more research, since they'd require modifications to the core Sabre algorithm. For example, we might consider mutably sharing the extended set of the outer DAG while routing child blocks, with some sort of weighting (e.g. based on branch prediction) so that routing works in the best interest of the overall circuit.

The core logic of this implementation is in Rust. To make this work, SabreDAG was (recursively) extended with a new field, node_blocks, which maps node IDs of the original DAG to a vector of SabreDAGs representing the blocks of the corresponding control flow node. If node_blocks contains an entry for a given node, the Sabre swap Rust code treats the node as a control flow operation, i.e. it recursively routes its blocks and places the node in the gate order immediately.
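As an illustration of the shape of this data structure, here is a toy Python model. Field names follow the description above, but this is a sketch only; the real SabreDAG is defined in Rust and differs in detail.

```python
# Toy Python model of the node_blocks idea: control-flow nodes are just node
# IDs that have an entry mapping to their (nested) blocks.
from dataclasses import dataclass, field

@dataclass
class SabreDAG:
    nodes: list                                       # node IDs, topological order
    node_blocks: dict = field(default_factory=dict)   # node ID -> [SabreDAG]

def control_flow_nodes(dag):
    """The nodes the router would treat as control flow operations."""
    return [n for n in dag.nodes if n in dag.node_blocks]
```

Note that the nesting is uniform: each block is itself a SabreDAG, so arbitrarily deep control flow falls out of the same recursion.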

The Rust code now returns a SabreResult struct containing: the preexisting swap map (which, as a refresher, maps node IDs from the original DAG to a list of logical swaps that must be performed prior to the corresponding node's placement); the gate order, as before; and a new field node_block_results, which maps node IDs to a list of BlockResult structs for the corresponding control flow node's blocks. Each BlockResult has two fields: result, a SabreResult encapsulating the now-routed block, and swap_epilogue, a list of logical swaps the caller must apply to the block to bring it back to the proper layout (its starting layout, for now).
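The result structures can likewise be modeled as toy Python dataclasses. Field names follow the PR's description; the real structs are Rust, and the walker below is an illustrative consumer, not part of the PR.

```python
# Toy shapes for the nested result structures described above.
from dataclasses import dataclass, field

@dataclass
class SabreResult:
    swap_map: dict                  # node ID -> swaps inserted before that node
    gate_order: list                # node IDs in placement order
    node_block_results: dict = field(default_factory=dict)  # node ID -> [BlockResult]

@dataclass
class BlockResult:
    result: SabreResult             # the now-routed inner block
    swap_epilogue: list             # swaps restoring the block's starting layout

def total_swaps(res):
    """Walk a result tree, counting every swap (a toy consumer of the API)."""
    count = sum(len(swaps) for swaps in res.swap_map.values())
    for block_results in res.node_block_results.values():
        for blk in block_results:
            count += total_swaps(blk.result) + len(blk.swap_epilogue)
    return count
```

A real consumer (the Python-side pass) walks the same tree, but rebuilds DAG circuits instead of counting.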

SabreLayout

In theory, the routing_pass argument for SabreLayout should support any routing pass that supports control flow, since the reverse circuit constructed by the Python-side code uses QuantumCircuit.reverse_ops, which will recursively call reverse_ops on each instruction. When unspecified, the Rust-side Sabre swap is used, which does not permute the layout when placing control flow operations, and thus does not need to recursively reverse nested circuits.
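To see why recursive reversal is all that is needed, here is a toy model of what reverse_ops does, over nested (name, blocks) tuples. This is purely illustrative; it is not Qiskit's QuantumCircuit implementation.

```python
# Toy model of recursive instruction reversal: reverse the top-level order
# and recurse into each instruction's blocks, mimicking reverse_ops.
def reverse_ops(ops):
    return [(name, [reverse_ops(b) for b in blocks]) for name, blocks in reversed(ops)]
```

Since placing a control flow op does not permute the layout, this structural reversal is sufficient for SabreLayout's forward/backward passes.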

Notes

  • The version of hashbrown we depend on in Rust had to be explicitly downgraded in the Cargo.lock. This is because Rustworkx uses its HashMap type in the method rustworkx_core::token_swapper::token_swapper, and Rustworkx currently pins hashbrown at 0.12.3. See Consider replacing hashbrown use in core API rustworkx#911 for more details.

Testing

Relevant control flow tests from StochasticSwap are being ported. I've also locally enabled Sabre layout and routing for circuits with control flow in level1.py and run through the test_transpiler.py tests, which are passing.

To do

  • Address inline TODOs.
  • Add additional testing (porting from what we've got currently for StochasticSwap's CF handling).
  • Add release note.
  • Run formatting.

Resolves #9419
Resolves #9421

@jakelishman jakelishman (Member) left a comment
This looks great, thank you!!

In testing: the current tests have a lot of surface area, but they mostly give the inner blocks the same qubits as the outer blocks. It might be good to include a couple of unit tests that explicitly attempt to route circuits with control-flow ops that appear in the same layer and are defined over qubits that aren't in the outer circuit (and maybe some that are in the outer circuit, but bound in different orders). I think all the pieces are in place in your code for this to work out of the box, and the random and disjoint tests touch on it a little, but it'd be good to have one or two more targeted tests, since this is our premier routing and layout algorithm.

Also, ought we to have some tests of SabreLayout as well?

Comment on lines +129 to +136
fn swap_epilogue(&self, py: Python) -> PyObject {
self.swap_epilogue
.iter()
.map(|x| x.into_py(py))
.collect::<Vec<_>>()
.into_pyarray(py)
.into()
}
Member

It feels weird that we can't have a direct impl for IntoPyArray to ToPyArray, since Vec<[usize; N]> should point to the same memory layout as a 2D Numpy array of usize. I guess that's something we can take up with the numpy crate, though - we couldn't write the impl locally because of the coherence rules.

Comment on lines +567 to 586
match dag.node_blocks.get(py_node) {
Some(blocks) => {
// Control flow op. Route all blocks for current layout.
let mut block_results: Vec<BlockResult> = Vec::with_capacity(blocks.len());
for inner_dag in blocks {
let (inner_dag_routed, inner_final_layout) =
route_block_dag(inner_dag, layout.copy());

// For now, we always append a swap circuit that gets the inner block
// back to the parent's layout.
let swap_epilogue =
gen_swap_epilogue(coupling, inner_final_layout, layout, seed);
let block_result = BlockResult {
result: inner_dag_routed,
swap_epilogue,
};
block_results.push(block_result);
}
node_block_results.insert_unique_unchecked(*py_node, block_results);
}
Member

The ordering of the iteration means that control-flow ops are routed as they happen to be visited: it's not quite ASAP and definitely not ALAP, but somewhere in between. I think that's totally fine for now; I just wanted to flag that there's an implicit decision point in the routing here. It's the same "decision" we make in StochasticSwap, though.

In the future, we might want to investigate if it'd be better to shift these to be precisely ASAP or ALAP routing.

Contributor Author


Yes absolutely, good point, there's certainly an implicit decision here. I thought about this and figured that it shouldn't matter for this implementation, since placement of a control flow op doesn't permute the layout. If it did, I imagine it'd be best to go for ALAP, i.e. place all gate sequences that do not have a data dependency on the control flow op in question, since the lookahead would have been driving towards making these gates as close to placeable as possible, and only then place the control flow op.

I've got some notes set aside from when I was planning the work in this PR that might come in handy when we revisit this to look for optimizations.

crates/accelerate/src/sabre_swap/mod.rs (resolved)
crates/accelerate/src/sabre_swap/mod.rs (resolved)
crates/accelerate/src/sabre_swap/mod.rs (resolved)
qiskit/transpiler/passes/routing/sabre_swap.py (outdated, resolved)
qiskit/transpiler/passes/routing/sabre_swap.py (outdated, resolved)
Comment on lines 328 to 333
def empty_dag(node):
out = DAGCircuit()
for qreg in mapped_dag.qregs.values():
out.add_qreg(qreg)
out.add_clbits(node.cargs)
return out
Member

I'm concerned that some inner blocks may have things like global phases or classical registers (for example, if a control-flow block itself contains a control-flow block with a ClassicalRegister condition), which this conversion will miss.

Contributor Author


Ah good to know. I was under the impression that inner blocks shouldn't have their own global properties or registers. I suppose we can add the current block as a second arg and copy these things from that.

Member


Blocks need to be able to contain registers, but you can safely assume that any registers they do have will also be in any circuits that contain that block. Blocks can technically have global phases (though that's impossible to achieve with the builder interface), but honestly, I don't think we've ever considered that before, so I wouldn't be surprised if we already have global-phase bugs in other parts of the transpiler.

Contributor Author


Ah alright, so the biggest issue here is that I'm not adding the cregs from the mapped_dag, correct? I can also copy in any global phases from the current block as well.

Member


Yeah, I think cregs is the big one. We should probably include a test that a nested block that contains a classical register all works correctly as well - if you need to compare, you can use the control-flow builder interface, then make sure that the inner blocks use register conditions rather than bit conditions, and it should create control-flow ops whose blocks involve registers as well.

Contributor Author


Fixed in 386197d.

qiskit/transpiler/passes/routing/sabre_swap.py (outdated, resolved)
qiskit/transpiler/passes/routing/sabre_swap.py (outdated, resolved)
- Makes it more explicit that the num_qubits used in a SabreDAG
  should be the number of physical qubits on the device.
- We no longer pass clbit indices because they are always local to
  the block, i.e. there's no current reason for them to be consistent
  across the root DAG and inner blocks, like we must do for qubits.
@kevinhartman (Contributor Author)

Should be ready to go.

jakelishman previously approved these changes Jul 18, 2023
@jakelishman jakelishman (Member) left a comment

This looks good to me, and the new tests with the nested registers look good as well, thanks. I'm super happy with this work, and I'm really pleased to be bringing Sabre back online with control-flow support.

I left one very minor comment, but it's probably me just being paranoid.

We decided in the Qiskit developers' meeting against Kevin spending time right now to add tests of SabreLayout, when we're relatively confident that it all already works because of the integration testing that's enabled by #10372, and we need all the review effort we can get.

test/python/transpiler/test_sabre_swap.py (outdated, resolved)
mtreinish previously approved these changes Jul 18, 2023
@mtreinish mtreinish (Member) left a comment

This LGTM, thanks for all the updates and the attention to detail on this PR. Sabre is a key part of the transpiler so that was very important.

@jakelishman (Member)

(Kevin: feel free to dismiss my comment and just enable automerge if you think I'm being too paranoid)

@kevinhartman kevinhartman dismissed stale reviews from mtreinish and jakelishman via 4313df9 July 18, 2023 20:57
@jakelishman jakelishman (Member) left a comment

lol, good timing

@jakelishman jakelishman added this pull request to the merge queue Jul 18, 2023
Merged via the queue into Qiskit:main with commit 49b383a Jul 19, 2023
14 checks passed
mtreinish added a commit to mtreinish/qiskit-core that referenced this pull request Aug 16, 2023
In Qiskit#10366 the SabreLayout and SabreSwap passes were refactored to
support nested control flow blocks. As part of that refactor a new
struct SabreResult was created to store the nested results for each
block. This new class however resulted in PyO3 cloning the full swap map
on every access. Since we have at least 1 access per gate in the circuit
(and another one for each swap) this extra clone() adds a lot of extra
overhead for deep circuits. In extreme cases this regression could be
quite extreme. To address this the result format is changed to be a
tuple (as it was originally), while this is less ergonomic the advantage
it provides is that for nested objects it moves the rust object to the
pyo3 interface so we avoid a copy as Python owns the object on the
return. However, for control flow blocks we're still using the
SabreResult class as it simplifies the internal implementation (which is
why Qiskit#10366 introduced the struct). The potential impact of this is
mitigated by switching to only a single clone per control flow block,
by only accessing the SabreResult object's attribute on each recursive
call. However, if it is an issue in the future we can work on flattening
the nested structure before returning it to python to avoid any clones.

Fixes Qiskit#10650
github-merge-queue bot pushed a commit that referenced this pull request Aug 17, 2023
* Fix performance of Sabre rust<->Python boundary

In #10366 the SabreLayout and SabreSwap passes were refactored to
support nested control flow blocks. As part of that refactor a new
struct SabreResult was created to store the nested results for each
block. This new class however resulted in PyO3 cloning the full swap map
on every access. Since we have at least 1 access per gate in the circuit
(and another one for each swap) this extra clone() adds a lot of extra
overhead for deep circuits. In extreme cases this regression could be
quite extreme. To address this the result format is changed to be a
tuple (as it was originally), while this is less ergonomic the advantage
it provides is that for nested objects it moves the rust object to the
pyo3 interface so we avoid a copy as Python owns the object on the
return. However, for control flow blocks we're still using the
SabreResult class as it simplifies the internal implementation (which is
why #10366 introduced the struct). The potential impact of this is
mitigated by switching to only a single clone per control flow block,
by only accessing the SabreResult object's attribute on each recursive
call. However, if it is an issue in the future we can work on flattening
the nested structure before returning it to python to avoid any clones.

Fixes #10650

* Simplify recursive call logic in _apply_sabre_result

This commit simplifies the logic in the recursive call logic in
_apply_sabre_result to always use a tuple so we avoid an isinstance
check.

Co-authored-by: Kevin Hartman <kevin@hart.mn>

---------

Co-authored-by: Kevin Hartman <kevin@hart.mn>
Successfully merging this pull request may close these issues:

  • Support control flow in Python components of SabreLayout
  • Support control flow in SabreSwap