Approximation degree should be zero by default #8043

Closed
nonhermitian opened this issue May 11, 2022 · 8 comments

@nonhermitian
Contributor

What should we add?

There have been several instances where having the transpiler approximation_degree nonzero by default has caused trouble, e.g. see #7961 and #7341. This has once again become a problem, as it took me a long time to understand why my circuits were not the same across several backends even though all the seeds in the transpiler were set. As a consequence I was getting odd results and gate counts. As this is at least the third time that this issue has popped up, and no user has said this behavior is desired by default, it should be turned off unless explicitly set otherwise.

@nonhermitian added the "type: feature request" (New feature or request) label May 11, 2022
@nonhermitian
Contributor Author

To see why this is a mess, consider the following graph that shows heavy outputs from a single QV32 circuit that has been transpiled at O3 100 times:

image

One sees that the best results (just worry about blue dots here) are at the largest number of CX gates. This is quite counterintuitive, and completely wrong. What is happening is that the O3 preset passmanager is randomly selecting qubits based on Sabre that give a good mapping. However, because of variations across the system, the on-by-default approximation is approximating the circuit to various degrees. Thus the "best" circuit in terms of CX gates has actually been approximated severely because (at least this is my guess) it landed on less than ideal qubits. One can see with the orange dots that by moving the circuits around one can do better, but not really any better than the 40-CX circuits. This is because HOP is measured with respect to the original circuit distribution, to which the heavily approximated circuits are no longer strictly faithful.

Beyond this example, there is general uncertainty as to exactly which circuits the approximation level is affecting the outcomes. And that is the real problem with approximation_degree being nonzero by default; I cannot trust what the transpiler is giving me by default.
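For readers less familiar with the metric, here is a minimal sketch of how a heavy output probability (HOP) can be computed, assuming the usual quantum-volume definition: the heavy outputs are the bitstrings whose ideal probability lies above the median, and HOP is the fraction of measured shots that land on them. The function names and the 2-qubit numbers are hypothetical and only illustrate the point above: the heavy set comes from the original circuit, so a circuit that has been approximated toward a different distribution is still scored against the original one.

```python
from statistics import median


def heavy_outputs(ideal_probs):
    """Return the bitstrings whose ideal probability exceeds the median.

    `ideal_probs` must list every bitstring, including zero-probability ones,
    so that the median is taken over the full output space.
    """
    cutoff = median(ideal_probs.values())
    return {bits for bits, p in ideal_probs.items() if p > cutoff}


def heavy_output_probability(counts, heavy):
    """Fraction of measured shots that landed on a heavy output."""
    shots = sum(counts.values())
    return sum(n for bits, n in counts.items() if bits in heavy) / shots


# Placeholder numbers, invented for illustration only (not from the plots).
ideal = {"00": 0.40, "01": 0.30, "10": 0.20, "11": 0.10}  # original circuit
counts = {"00": 350, "01": 300, "10": 200, "11": 150}      # measured shots

heavy = heavy_outputs(ideal)  # always derived from the ORIGINAL circuit
hop = heavy_output_probability(counts, heavy)
print(sorted(heavy), hop)
```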

@jakelishman
Member

Just to let you know what's happening here: while I personally would vote for the same behaviour of approximation_degree you're suggesting, I think the full question is a philosophical one about the goal of the transpile operation, and neither option is a priori the correct one. Lev and Kevin in #7341 both came down on the other side, that they see the goal as being to produce the best fidelity on the given backend, which allows the output unitary to not always be identical in noiseless conditions. (And there's a bit of a selection bias here, because presumably any users who expect/are fine with this default behaviour don't open issues.)

I think we need an answer on our desired design direction for the transpiler from its technical/research leads, so I've assigned Lev and Kevin as proxies for them, and I'll add it to the agenda in Terra meetings. I can't promise it'll be top of the priority queue, especially since there's a workaround, but I'll add it for discussion.


The question of whether our current approximation strategies actually produce better output in all cases is related, but slightly different. We can merge changes to improve those whether or not they're the default. I also do think it's expected behaviour that changing the backend can change the output circuit including the non-swap gate counts, so I don't fully buy that part of the argument.

@ajavadia
Member

One sees that the best results (just worry about blue dots here) are at the largest number of CX gates. This is quite counterintuitive, and completely wrong.

can you plot this for approximation_degree=0? i'm not sure HOP is perfectly correlated with CX counts, but maybe better. I just want to get a sense.

I tend to generally agree with the issue, but not for quite the same reason you state. To me, when you give a backend, you are not just giving a coupling map -- you are giving backend properties too, which means the compiler should take them into account. If you just want a mapping, then you should do transpile(circuit, coupling_map, basis_gates), which will not approximate since it doesn't know about noisy gates.

But my reason for turning approximation off by default is that we currently do not have a good way of making decisions on what approximation degree is best (this is open research). Our decisions are local, which may not play well in global circuits. If we have better understanding of making approximations, then I'm fine turning this back on.
(e.g. the ApproximateQuantumCompiler does global approximation which is more reliable)

But I would like to add my own feature request: if the user sets approximation to something non-zero, and gate fidelities are known, the approximation should be some function of those gate fidelities (i.e. approximate more on bad gates, less on good gates).
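One way to picture this request is sketched below. It is only an illustration, not how Qiskit behaves: the function `per_gate_degree` and the linear scaling rule are assumptions, and the error rates are invented. It uses the documented convention that 1.0 means no approximation and 0.0 means maximal approximation, treats the user's value as the most aggressive setting allowed, applies it to the worst gate, and moves better gates back toward 1.0.

```python
def per_gate_degree(user_degree, gate_errors):
    """Scale a global approximation degree by each gate's reported error.

    Hypothetical rule: the worst gate gets `user_degree` itself, an error-free
    gate gets 1.0 (no approximation), and everything else is interpolated
    linearly in between.
    """
    if not 0.0 <= user_degree <= 1.0:
        raise ValueError("user_degree must lie in [0, 1]")
    worst = max(gate_errors.values())
    if worst == 0:
        # No reported error anywhere: nothing justifies approximating.
        return {gate: 1.0 for gate in gate_errors}
    return {
        gate: 1.0 - (1.0 - user_degree) * (err / worst)
        for gate, err in gate_errors.items()
    }


# Invented CX error rates keyed by qubit pair, for illustration only.
errors = {(0, 1): 0.005, (1, 2): 0.02}
degrees = per_gate_degree(0.9, errors)
print(degrees)  # the noisier pair (1, 2) receives the lower degree
```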

@nonhermitian
Contributor Author

Here is with approximation_degree=0

image

These are the same circuits and seeded transpiler. Note that the spread in the number of CX gates is greatly reduced because there are no longer approximations being performed. Also the results are more in line with expectations.

@nonhermitian
Contributor Author

nonhermitian commented May 13, 2022

To me, when you give a backend, you are not just giving a coupling map -- you are giving backend properties too, which means the compiler should take them into account. If you just want a mapping, then you should do transpile(circuit, coupling_map, basis_gates), which will not approximate since it doesn't know about noisy gates.

I understand this viewpoint. However, looking at things from the point of view of a user who is not well versed in the code base, I am not sure they would appreciate this. For example, there is no tutorial that explains these nuances and instructs the user on how to proceed in the manner you suggest above (which is the proper way to do things here). Even within the source code it is hard to tell. One would have to know to go to the UnitarySynthesis pass to see:

https://qiskit.org/documentation/stubs/qiskit.transpiler.passes.UnitarySynthesis.html#unitarysynthesis

and there it defaults to the highest level of approximation approximation_degree=1. However looking at the transpiler docs:

approximation_degree (float) – heuristic dial used for circuit approximation (1.0=no approximation, 0.0=maximal approximation)

but the default value is approximation_degree=None. So users not intimately knowledgeable with the code are really in the dark as to what is actually going on. I think this lack of knowledge also ties into this statement from @jakelishman

and there's a bit of a selection bias here, because presumably any users who expect/are fine with this default behaviour don't open issues.

I don't think many users actually would be able to identify this as being an issue. The cases that have been reported are so wrong that they get flagged. But in many cases the issues with this setting are likely to fly under the radar given the vast number of other things that can fluctuate when dealing with HW. In my case, the only way I realized this is because I had seeded the QV circuit and transpiler but noted that the CX gate count in my plot had changed when I moved to a different system in an attempt to understand why more CX gates gave better HOP.

@kdk
Member

kdk commented May 18, 2022

Thanks for raising this, @nonhermitian. We discussed this at yesterday's Terra meeting, and I spent a bit of time thinking about the issues here and in #7961 and #7341. I think there are a couple of factors and more than one bug at play which complicate the issue, but you are spot on that this is an area where it's currently easy for the transpiler to do something other than what the user expects. I've been writing up some thoughts, but wanted to make sure I understand your QV example first.

One sees that the best results (just worry about blue dots here) are at the largest number of CX gates. This is quite counterintuitive, and completely wrong. What is happening is that the O3 preset passmanager is randomly selecting qubits based on Sabre that give a good mapping. However, because of variations across the system, the on-by-default approximation is approximating the circuit to various degrees. Thus the "best" circuit in terms of CX gates has actually been approximated severely because (at least this is my guess) it landed on less than ideal qubits.

I'm not sure I 100% follow the explanation here. This is for a single QV 32 circuit, with 100 different initial layouts and routings based on the randomness in sabre, and it sounds like you're saying that circuits with higher final CX counts are going to amount to more 2q blocks and thus more opportunities for approximation, leading to erroneously high HOPs.

I would naively expect here though, that the outputs with fewer output CX gates would be those which have been heavily approximated (assuming the same seeding as the second example, approximation would be the only way to end up with fewer than 39, 42, or 45 CX gates) and those don't necessarily have higher HOPs.

One interesting (but maybe difficult to answer) question is whether or not, for a fixed layout and routing, approximation leads to a higher or lower HOP in general. Maybe something like, for a fixed set of transpiler seeds, plot the HOP obtained with the default approximation (approximation_degree=None) against the HOP obtained without any approximation (technically approximation_degree=1, but due to a bug, approximation_degree=0 seems to also disable approximation). This might help differentiate whether the approximate synthesis is overall helping or hindering, at least in HOP, but it is a fairly central example as this was added to the transpiler as one of the optimizations in the QV64 paper.
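A small sketch of how such a paired comparison could be summarised once the HOP values are in hand. The helper `compare_hop` is hypothetical, and the per-seed numbers are invented placeholders rather than measurements; the only assumption is that both runs share the same transpiler seeds so each seed gives one matched pair.

```python
from statistics import mean


def compare_hop(hop_default, hop_exact):
    """Pair up HOP values by transpiler seed and summarise the difference.

    Returns the per-seed difference (default minus exact), its mean, and the
    fraction of seeds for which the default approximation gave a higher HOP.
    """
    seeds = sorted(hop_default.keys() & hop_exact.keys())
    if not seeds:
        raise ValueError("no seeds in common between the two runs")
    diffs = {s: hop_default[s] - hop_exact[s] for s in seeds}
    helped = sum(1 for d in diffs.values() if d > 0)
    return diffs, mean(diffs.values()), helped / len(seeds)


# Placeholder values keyed by seed, invented for illustration only.
hop_default = {0: 0.62, 1: 0.70, 2: 0.66}  # approximation_degree=None
hop_exact = {0: 0.68, 1: 0.69, 2: 0.71}    # approximation disabled

diffs, mean_diff, frac_helped = compare_hop(hop_default, hop_exact)
print(mean_diff, frac_helped)
```

A negative mean difference would suggest that, for that set of seeds, the default approximation lowers HOP on average.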

@nonhermitian
Contributor Author

nonhermitian commented May 19, 2022

@kdk thanks for taking a look! The first plot is indeed the exact same initial circuit with the same heavy outputs from which to assign a HOP value. The spread of CX values is indeed from Sabre, but also from the default approximation being turned on. The second plot shows the range of CX with no approximation, and there each "bar" differs by 3 CX; a SWAP.

The issue is not with the high HOP values at high CX count per se; the second graph shows that this is in line with what one would expect. Rather, it is that lower CX counts do not equate to a higher HOP. All else being equal, because each gate has error, the lower the CX count the better your QV circuits should be at getting higher HOP values. Indeed, this is why a lot of effort is spent on minimizing the CX count via integer programming. However, the first plot shows that systematically the lower the number of CX gates the worse the HOP is; the complete opposite of what one would expect. This is bad because what I would naively do is transpile multiple times to get the lowest CX count (thinking it is all due to swaps and mapping). In this case that is a bad idea.
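The "fewer CX gates should be better" expectation can be made concrete with a back-of-the-envelope estimate. This is a rough sketch under stated assumptions: gate errors are treated as independent, every CX is given the same error rate, and that rate (0.01) is a made-up value rather than a property of any backend. The CX counts 39, 42 and 45 are the unapproximated values mentioned earlier in the thread, which differ by one SWAP (3 CX) each.

```python
def estimated_success(n_cx, cx_error):
    """Crude estimate: probability that none of the n_cx CX gates fails."""
    return (1.0 - cx_error) ** n_cx


ASSUMED_CX_ERROR = 0.01  # hypothetical uniform CX error rate

estimates = {n: estimated_success(n, ASSUMED_CX_ERROR) for n in (39, 42, 45)}
for n_cx, p in estimates.items():
    print(f"{n_cx} CX gates -> estimated success {p:.3f}")
```

Under this simple model the estimate only decreases as CX gates are added, which is why a trend of lower HOP at lower CX count points to something other than gate error, here the approximation.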

I would naively expect here though, that the outputs with fewer output CX gates would be those which have been heavily approximated (assuming the same seeding as the second example, approximation would be the only way to end up with fewer than 39, 42, or 45 CX gates) and those don't necessarily have higher HOPs.

This is correct; lower CX is higher approximation. The fact that they do not lead to higher HOP than circuits with up to 8 more CX gates is counterintuitive, and really goes against the whole point of an optimization; my optimization should not make things markedly worse (and here it does so by default!). The high level of CX truncation is likely because it randomly landed on high-error qubits and threw away CX gates per block when it thought it could. Executing on those same qubits could of course lead to bad outcomes. As such, you can move the circuit around and bring the HOP up to the same level as the larger CX count circuits, although this necessarily violates the approximation that was done. The fact that you can't bring it higher than the larger CX count circuits is a hint that one is limited by the truncation, which targets a different distribution than the original full unitary, and thus my HOP values will take a hit as I am aiming for the wrong target distribution.

My personal take on this whole thing is that, by default, the transpiler should preserve the unitary up to known linear transformations that I can keep track of (permutations, similarity transformations, swap reordering, etc). We should have that as the bedrock principle of what the transpiler does, and anything beyond that should be explicit to activate. I do get @ajavadia's point though that if a user passes a backend we should try to do our best (even if that is not what might be going on here). I would say though that doing so should be explicit, as stated above, and is really a matter of getting users to understand the workflow and options. We in fact already do this. For example, O3 gives you the best mappings by far, yet it is not the default. The end user must recognize that O3 is what they should use, and that they might have to run it several times to get a good mapping. With respect to the approximations used here, this is very similar to options like fast-math in classical compilation, where one breaks the usual rules in order to get some optimization (speed in that case), but it can bite you if you do not know what you are doing. That flag must be explicitly activated, and is not included in the usual optimization levels of the compiler. I see the issues with approximation here as being more or less equivalent to that, but we have implicitly enabled it by default.

@mtreinish
Member

Since #8595 has merged, I think this has been implemented, so I'm going to close this. If I'm missing something here or misinterpreting what was needed for this issue, please feel free to reopen it.


6 participants