
Faster QubitDevice.marginal_prob #799

Merged (33 commits) on Sep 28, 2020
Conversation

@antalszava (Contributor) commented Sep 15, 2020

Context:
While looking at improvements for the QubitDevice class, it was found that certain internal functions for computing statistics might not scale well with larger numbers of qubits. One method in particular, marginal_prob, performed poorly as the number of wires increased.

Running a benchmark for 22 qubits and extracting probabilities showed the following results:

61333601 function calls (61261521 primitive calls) in 924.477 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     9570  618.700    0.065  618.700    0.065 {built-in method numpy.core._multiarray_umath.c_einsum}
19410/19380  205.138    0.011  205.138    0.011 {built-in method numpy.array}
       30   61.982    2.066  278.104    9.270 _qubit_device.py:425(marginal_prob)
19470/19410   10.603    0.001  629.722    0.032 {built-in method numpy.core._multiarray_umath.implement_array_function}
  4277871    5.188    0.000   11.272    0.000 {method 'update' of 'dict' objects}
  4277790    2.770    0.000   16.490    0.000 unweighted.py:69(_single_shortest_path_length)
     9570    2.309    0.000  623.943    0.065 default_qubit.py:485(_apply_unitary_einsum)

This and further results for smaller systems showed that considerable time is spent on the conversion to numpy.array:

[image: profiling breakdown]

A numpy.array is created in the following expression inside marginal_prob:

basis_states = np.array(list(itertools.product([0, 1], repeat=len(device_wires))))

Description of the Change:

  1. Adds a faster approach for computing basis_states by using the states_to_binary method.
  2. Further improves the original array creation by using np.fromiter and itertools.chain.
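The second improvement can be sketched as follows; `device_wires` is replaced here by an explicit `num_wires` argument to keep the comparison self-contained:

```python
import itertools

import numpy as np


def basis_states_original(num_wires):
    # Original approach: build a Python list of all 2**num_wires tuples,
    # then convert it to an array in one (slow) np.array call.
    return np.array(list(itertools.product([0, 1], repeat=num_wires)))


def basis_states_fromiter(num_wires):
    # Improved approach: stream the flattened product directly into a
    # 1-D array with np.fromiter and reshape, avoiding the intermediate
    # list of tuples entirely.
    flat = itertools.chain(*itertools.product((0, 1), repeat=num_wires))
    return np.fromiter(flat, dtype=int).reshape(-1, num_wires)


# Both produce the same (2**num_wires, num_wires) array of basis states.
```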

Benefits:

  1. A significant decrease in the contribution of marginal_prob and its sub-methods to the overall time taken for the benchmark on extracting probabilities (>205.138 seconds -> ~35 seconds):
61333691 function calls (61261611 primitive calls) in 627.123 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     9570  566.435    0.059  566.435    0.059 {built-in method numpy.core._multiarray_umath.c_einsum}
       30   19.549    0.652   37.160    1.239 _qubit_device.py:425(marginal_prob)
19470/19410   10.014    0.001  576.823    0.030 {built-in method numpy.core._multiarray_umath.implement_array_function}
       30    4.529    0.151    6.861    0.229 _qubit_device.py:314(states_to_binary)
  4277871    4.285    0.000    9.177    0.000 {method 'update' of 'dict' objects}
       30    2.332    0.078    2.332    0.078 {method 'astype' of 'numpy.ndarray' objects}
  4277790    2.222    0.000   13.308    0.000 unweighted.py:69(_single_shortest_path_length)
     9570    2.174    0.000  571.274    0.060 default_qubit.py:485(_apply_unitary_einsum)


  2. A faster default approach for when the states_to_binary method is not ideal to use (>30 qubits):
%%time
num = len(device_wires)
basis_states = np.fromiter(
    itertools.chain(*itertools.product((0, 1), repeat=num)), dtype=int
).reshape(-1, num)

CPU times: user 4.59 s, sys: 357 ms, total: 4.95 s
Wall time: 4.95 s

Original:

%%time
num = len(device_wires)
basis_states = np.array(list(itertools.product((0, 1), repeat=num)))
CPU times: user 6.74 s, sys: 304 ms, total: 7.05 s
Wall time: 7.05 s

Possible Drawbacks:
The states_to_binary method creates basis states faster (for larger systems at times over 25x faster) than the approach using itertools, at the expense of using slightly more memory with dtype=int64. Hence we constrain the dtype of the array to 32-bit unsigned integers (uint32).

Due to this constraint, an overflow occurs for 32 or more wires:

import numpy as np

arr_32 = np.zeros(1, dtype=np.uint32)

for i in range(40):
    arr_32[0] = 2 ** i
    if arr_32[0] != 2 ** i:
        print('Overflow.', i)
        break
Overflow. 32

Therefore this approach is used only for fewer wires.

For a smaller number of wires, speed is comparable to the improved original approach, hence we resort to that one for testing purposes.
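A minimal sketch of the states_to_binary idea (a vectorized shift-and-mask conversion; not necessarily PennyLane's exact implementation) looks like this:

```python
import numpy as np


def states_to_binary(samples, num_wires):
    # Vectorized base-10 -> binary conversion: right-shift each integer
    # past every bit position (most significant bit first) and mask off
    # the lowest bit. uint32 keeps memory down but, as noted above, caps
    # the number of wires below 32.
    shifts = np.arange(num_wires - 1, -1, -1, dtype=np.uint32)
    return ((samples[:, None] >> shifts) & 1).astype(np.uint32)


states_base_ten = np.arange(2 ** 3, dtype=np.uint32)
basis_states = states_to_binary(states_base_ten, 3)
# basis_states[5] is [1, 0, 1], the binary representation of 5
```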

Related GitHub Issues:
N/A

@antalszava antalszava changed the title Faster marginal_prob Faster QubitDevice.marginal_prob Sep 15, 2020
codecov bot commented Sep 15, 2020

Codecov Report

Merging #799 into master will increase coverage by 0.00%.
The diff coverage is 100.00%.


@@           Coverage Diff           @@
##           master     #799   +/-   ##
=======================================
  Coverage   91.09%   91.10%           
=======================================
  Files         130      130           
  Lines        8701     8709    +8     
=======================================
+ Hits         7926     7934    +8     
  Misses        775      775           
Impacted Files Coverage Δ
pennylane/_qubit_device.py 98.82% <100.00%> (+0.05%) ⬆️


@josh146 (Member) left a comment:
This is a huge improvement @antalszava! Great work 🙂

Comment on lines 499 to 500
states_base_ten = np.arange(2 ** num_wires, dtype=np.int32)
basis_states = self.states_to_binary(states_base_ten, num_wires)
Member:

Can this be made less memory intensive by having states_base_ten be a generator (so that you don't have to store all basis states in memory at once)?

I'm not sure this idea would work though, since then you might not be able to perform the bitwise shift << 🙂

@antalszava (Contributor, Author) commented Sep 25, 2020:
Have looked into this. As we'd like to eventually use basis_states to compute perm and permute the probabilities, we'd need to have all the basis states in a single array at one time.

One thing I found was that we can use a 32-bit integer for casting in states_to_binary, which helped with memory to some extent (from 64 bits down to 32).
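As a rough illustration of that saving (a hypothetical 20-wire system, comparing the footprint of the full basis-state array under the two dtypes):

```python
import numpy as np

num_wires = 20

# Memory footprint of the (2**num_wires, num_wires) basis-state array;
# uint32 halves the footprint relative to int64.
n_bytes_64 = np.zeros((2 ** num_wires, num_wires), dtype=np.int64).nbytes
n_bytes_32 = np.zeros((2 ** num_wires, num_wires), dtype=np.uint32).nbytes

print(n_bytes_64 // 2 ** 20, "MiB vs", n_bytes_32 // 2 ** 20, "MiB")
# prints: 160 MiB vs 80 MiB
```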

# therefore this approach is used only for fewer wires.
# For smaller number of wires speed is comparable to the next
# approach, hence we resort to that one for testing purposes
states_base_ten = np.arange(2 ** num_wires, dtype=np.int32)
Contributor:
If you use np.uint32, do we make it to 32 qubits?

@antalszava (Contributor, Author):
Thanks! Yeah, could increase it so that we can have 31 qubits (overflow at 32)

josh146 and others added 23 commits September 15, 2020 22:05
Co-authored-by: Josh Izaac <josh146@gmail.com>
@antalszava (Contributor, Author) commented:

Thanks @josh146 @trbromley for the comments! Addressed them.

(Also found a cool memory profiler, pip install -U memory_profiler. Can be used akin to timeit: %memit 😄 . In a Jupyter notebook after loading with %load_ext memory_profiler, each cell can be run with the %%memit magic to see the memory consumption of the statement.)
