Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add qpy serialization format #5578

Merged
merged 49 commits into from
May 13, 2021
Merged

Conversation

mtreinish
Copy link
Member

@mtreinish mtreinish commented Jan 4, 2021

Summary

This commit adds a portable lightweight binary serialization format that
is intended to represent Qiskit's circuit object. It is intended to be a
forward compatible format that will work with all future versions of
qiskit. The intent here is to provide a good alternative to pickle
(which qobj and qasm are not) which is commonly used but isn't a good
choice for both security and cross-platform and cross-version support
because pickle isn't really intended to be a general serialization
solution.

Details and comments

TODO:

  • Expand testing
  • Add support for storing multiple circuits in a single file
  • Add custom gate and custom instruction support
  • Add Parameter support
  • Add support for conditions on instructions

@mtreinish mtreinish added the on hold Can not fix yet label Jan 4, 2021
@mtreinish mtreinish requested a review from a team as a code owner January 4, 2021 21:00
@mtreinish
Copy link
Member Author

So I did some quick benchmarks to see how the performance compares. For a 65x1024 random circuit generated using random_circuit(65, 1024, measure=True):

random_circuit (to generate the circuit): 2.217 sec
circuit.qasm(): 0.3116 sec
qpy dump(): 0.105 sec
qobj assemble().to_dict(): 0.2863 sec
qobj dict json serialization: 0.2384 sec
pickle.dump(): 0.0748 sec

circuit.from_qasm_file(): 5.9294 sec
qpy load(): 1.46 sec
pickle.load(): 0.1027 sec

We can probably speed up qpy deserialization a bunch too if we switch append() internally to _append() but I wasn't sure if we'd want to do that or not. I think it should be safe even with arbitrary input because while we allow arbitrary input the way things get constructed nothing would break the guarantees on circuit._data that append() guards against.

The only negative with qpy right now is that it's a bit sparse because of the fixed width strings for names, so the uncompressed file size is almost as large as a pickle (for the example above 2.7M for qpy, 3.2M for pickle, 692K for qasm, and 1.6M for qobj). We could fix this by using variable length strings and just storing the size in the metadata, but the tradeoff there is complexity and because the file is sparse with fixed width padding compression works quite well on it (for example the 2.7M qpy file compresses to 333K with the default gzip settings).

@kdk
Copy link
Member

kdk commented Jan 6, 2021

One comment about the downsides of storing QASM was that each circuit required a separate file, unlike qobj where circuits could be grouped into experiments and stored together. Is storing multiple circuits per file an option here?

@mtreinish
Copy link
Member Author

One comment about the downsides of storing QASM was that each circuit required a separate file, unlike qobj where circuits could be grouped into experiments and stored together. Is storing multiple circuits per file an option here?

Yeah that is definitely something that's doable, and shouldn't be too hard; there are a few ways to add support for multiple circuits.. I added it to the TODO list in the PR summary.

This commit adds a portable lightweight binary serialization format that
is intended to represent Qiskit's circuit object. It is intended to be a
forward compatible format that will work with all future versions of
qiskit. The intent here is to provide a good alternative to pickle
(which qobj and qasm are not) which is commonly used but isn't a good
choice for both security and cross-platform and cross-version support
because pickle isn't really intended to be a general serialization
solution.
@mtreinish mtreinish removed the on hold Can not fix yet label Jan 7, 2021
@mtreinish mtreinish changed the title [WIP] Add qpy serialization format Add qpy serialization format Jan 7, 2021
@mtreinish
Copy link
Member Author

mtreinish commented Jan 7, 2021

Ok, I think this is more or less ready for review. I've added support for the missing features including support for multiple circuits in a single file. I haven't done extensive testing though besides what's in test file added here so it might need additional test cases for some edge conditions. It would also be good to run some manual cross-platform (especially on a big-endian system) tests just to verify it works everywhere. It's constructed to be portable so I'm not expecting any issues with that, but it would be good to verify it at least.

@mtreinish
Copy link
Member Author

I also reran the benchmark script I have that times serialization and deserialization of a random 65x1024 circuit and the updated numbers are:

random_circuit (to generate the circuit): 2.1944 sec
circuit.qasm(): 0.3103 sec
qpy dump(): 0.1230 sec
qobj assemble().to_dict(): 0.3318 sec
qobj dict json serialization: 0.2457 sec
pickle.dump(): 0.0852 sec

circuit.from_qasm_file(): 7.152 sec
qpy load(): 0.6757 sec
pickle.load(): 1.0868 sec

The file size for qpy has grown a bit though as there is more state being stored now. The same tradeoffs as before are there with the fixed size strings which is something we can change.

In keeping with compability with changes to the circuit model made in
0.17.0 registers in qpy are now a subset of contiguous indices in the
bit lists for a circuit. This removes the need to access qubit.index
and qubit.name on arguments to instructions (which raises a deprecation
warning). This also should be backwards compatible for terra < 0.17.0
there just isn't any potential for an index without a register in the
circuit for terra < 0.17.0.
@eggerdj
Copy link
Contributor

eggerdj commented Apr 30, 2021

Does this support stand-alone pulse schedules?

@mtreinish
Copy link
Member Author

Does this support stand-alone pulse schedules?

Not yet, I was planning on adding pulse support (both standalone schedules and embedded in circuits as calibrations, although it's really the same thing) for the v2 format. But, that was based on the assumption this was going to be included in 0.17.0, so if I find time to circle back before the release (either before or after this PR merges) I can work on adding it.

@kdk kdk self-assigned this May 4, 2021
kdk
kdk previously approved these changes May 11, 2021
Copy link
Member

@kdk kdk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes in 7c95f68 look good. One comment below on the order in which we reconstruct the bits/registers on read.

General agreement is that this is good to merge with limited backwards compatibility support. We should follow up before releasing with user documentation and a developer example of handling a breaking change.

qiskit/circuit/qpy_serialization.py Show resolved Hide resolved
qiskit/circuit/qpy_serialization.py Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Changelog: New Feature Include in the "Added" section of the changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants