# GPU Simulator


## Introduction

This notebook shows how to accelerate Qiskit Aer simulators by using GPUs.

To enable GPU support, install the GPU version of Qiskit Aer:

`pip install qiskit-aer-gpu`

In [5]:
from qiskit import *
from qiskit.circuit.library import *
from qiskit.providers.aer import *

## GPU Qiskit Aer Simulator Backends and Methods

The following Qiskit Aer backends currently support GPU acceleration:
* `QasmSimulator`
* `StatevectorSimulator`
* `UnitarySimulator`

To check whether GPU support is available on these backends, call `available_methods()`, which returns the simulation methods the backend supports, including any GPU methods.

In [6]:
qasm_sim = QasmSimulator()
print(qasm_sim.available_methods())

['automatic', 'statevector', 'statevector_gpu', 'density_matrix', 'density_matrix_gpu', 'stabilizer', 'matrix_product_state', 'extended_stabilizer']


If Qiskit Aer with GPU support is installed correctly, the list includes `statevector_gpu` and `density_matrix_gpu`.

In [9]:
st_sim = StatevectorSimulator()
print(st_sim.available_methods())
u_sim = UnitarySimulator()
print(u_sim.available_methods())

['automatic', 'statevector', 'statevector_gpu']
['automatic', 'unitary', 'unitary_gpu']
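
As a minimal sketch of this check, the helper below filters a list of method names for the `_gpu` suffix. The function name `gpu_methods` is hypothetical and not part of Qiskit Aer. The sample list is copied from the `QasmSimulator` output shown above; in a live session you would pass `qasm_sim.available_methods()` instead.

```python
def gpu_methods(methods):
    """Return the method names ending in '_gpu' (hypothetical helper, not an Aer API)."""
    return [name for name in methods if name.endswith("_gpu")]


# Method list copied from the QasmSimulator output above
qasm_methods = ['automatic', 'statevector', 'statevector_gpu', 'density_matrix',
                'density_matrix_gpu', 'stabilizer', 'matrix_product_state',
                'extended_stabilizer']

found = gpu_methods(qasm_methods)
if found:
    print("GPU methods available:", found)
else:
    print("No GPU methods found; check the qiskit-aer-gpu installation")
```

An empty result suggests that the CPU-only package is installed.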


### Simulation with GPU

Here is a simple example that runs a 20-qubit quantum volume circuit on the `QasmSimulator` backend.
To use the GPU for the simulation, set the simulation method to `statevector_gpu` in the `backend_options` parameter passed to the `QasmSimulator.run` method.

In [12]:
shots = 64
qubit = 20
qv20 = QuantumVolume(qubit, 10, seed = 0)
qv20 = transpile(qv20, basis_gates=['u1', 'u2', 'u3', 'cx'],
                 optimization_level=0, seed_transpiler=1)
qv20.measure_all()
qobj = assemble(qv20, shots=shots, memory=True)
result = qasm_sim.run(qobj, backend_options={"method" : "statevector_gpu"}).result()

counts = result.get_counts(qv20)
print(counts)

{'10111010100000101101': 1, '11100110111101100000': 1, '10100101001011000100': 1, '11111110011011101001': 1, '11010011101110000111': 1, '00111010101101110000': 1, '11101011000101101110': 1, '01011000111111001100': 1, '10100011000111101010': 1, '00000011000101001100': 1, '00100110110101010000': 1, '11010010001111100111': 1, '11011110011011001001': 1, '01110100010100100011': 1, '10000001110001110101': 1, '00111111011001100001': 1, '11101100010011100010': 1, '00001010001000010110': 1, '11100111010110100001': 1, '01011110010011010101': 1, '10101000000101111000': 1, '01000101110001110010': 1, '10111100010111100000': 1, '01011111111000111011': 1, '10001010110110000111': 1, '01101001010011111010': 1, '10001001101001010010': 1, '00100000111111110110': 1, '11110011110101110010': 1, '01000011010010101010': 1, '01000111010011011100': 1, '01010011010101001001': 1, '10110001101111010101': 1, '11001000110010001111': 1, '11011010110010001111': 1, '01110101010110101011': 1, '10010000000000001101': 1, 

The following example uses the `density_matrix_gpu` method in `QasmSimulator`.

In [13]:
qubit = 10
qv10 = QuantumVolume(qubit, 10, seed = 0)
qv10 = transpile(qv10, basis_gates=['u1', 'u2', 'u3', 'cx'],
                 optimization_level=0, seed_transpiler=1)
qv10.measure_all()
qobj = assemble(qv10, shots=shots, memory=True)
result = qasm_sim.run(qobj, backend_options={"method" : "density_matrix_gpu"}).result()

counts = result.get_counts(qv10)
print(counts)

{'0010000101': 1, '1110000111': 1, '0010001001': 1, '0100001100': 2, '0011110110': 1, '0011111010': 1, '1101011100': 1, '1010011100': 1, '1010010100': 1, '0101001111': 1, '1001011011': 1, '1011101111': 1, '0001001000': 1, '1000101011': 1, '1110011001': 1, '1101110000': 1, '1111000110': 1, '0000110101': 1, '0101111000': 1, '0011010110': 1, '1100011000': 1, '1111100011': 1, '1111100000': 1, '1100111110': 1, '0011001100': 1, '0100000100': 1, '1110100000': 1, '0100100100': 1, '1101000001': 1, '1110001000': 1, '1011011010': 1, '0100100101': 1, '0010000100': 1, '0111100000': 1, '0110100101': 1, '1111111110': 1, '0101101010': 2, '1001000011': 1, '0010111010': 1, '1001101010': 1, '1101100011': 1, '0111101011': 1, '1001100011': 1, '0011000100': 1, '1101100001': 1, '0100101110': 2, '0101010010': 1, '1000101001': 1, '1111011001': 1, '1100011010': 1, '1010001001': 1, '1010100001': 1, '0010001011': 1, '0111010000': 1, '1000100000': 1, '0011101000': 1, '1010101011': 1, '0110110011': 1, '1001111011':

## Parallelizing Simulation by Using Multiple GPUs

In general, a GPU has less memory than the CPU host, and the largest number of qubits that can be simulated depends on the memory size. For example, in double precision, a GPU with 16 GB of memory lets Qiskit Aer simulate up to 29 qubits with the `statevector_gpu` method in the `QasmSimulator` and `StatevectorSimulator` backends, or up to 14 qubits with the `density_matrix_gpu` method in the `QasmSimulator` backend and the `unitary_gpu` method in the `UnitarySimulator` backend.
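
These limits follow from the size of the simulated state. The sketch below assumes that a statevector stores 2^n complex amplitudes, that a density matrix or unitary stores 4^n, that each amplitude takes 16 bytes in double precision (8 in single), and that the state must be strictly smaller than the GPU memory. The function `max_qubits` is a hypothetical helper, not an Aer API, and it ignores any working memory the simulator needs beyond the state itself.

```python
def max_qubits(memory_bytes, state="statevector", precision="double"):
    """Estimate the largest qubit count whose state fits in memory_bytes.

    Hypothetical helper (not an Aer API). Assumes 16 bytes per complex
    amplitude in double precision and 8 bytes in single precision.
    """
    bytes_per_amplitude = 16 if precision == "double" else 8
    # A statevector holds 2**n amplitudes; a density matrix or unitary holds 4**n = 2**(2n).
    exponent_per_qubit = 1 if state == "statevector" else 2
    n = 0
    # Grow n while the state for n + 1 qubits is still strictly smaller than the memory.
    while bytes_per_amplitude * 2 ** (exponent_per_qubit * (n + 1)) < memory_bytes:
        n += 1
    return n


gpu_memory = 16 * 2**30  # 16 GiB, taken here as the "16 GB" of the text
print(max_qubits(gpu_memory, "statevector"))     # 29
print(max_qubits(gpu_memory, "density_matrix"))  # 14
```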

To simulate a larger number of qubits, multiple GPUs can be used to parallelize the simulation. Parallel simulation can also reduce the simulation time.

Currently, multi-GPU mode is not selected automatically: if one GPU does not have enough memory for the circuit's qubits, the simulation fails. To enter multi-GPU mode, some options must be set in the `backend_options` parameter passed to the `run` method.

The following two options should be passed:
* `blocking_enable`: Set to `True` to enable parallelization.
* `blocking_qubits`: Sets the size of the chunk that is distributed to each parallel memory space. Set this parameter to satisfy `16*(2^(blocking_qubits+4)) < smallest memory on the system (in bytes)` for double precision (use `8*` instead of `16*` for single precision).
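
As a rough sketch, the largest `blocking_qubits` value allowed by the inequality above can be found by direct search. The helper `max_blocking_qubits` is hypothetical (it is not an Aer option or function), and the memory size must be supplied by the user, for example the smallest GPU memory on the system in bytes.

```python
def max_blocking_qubits(smallest_memory_bytes, precision="double"):
    """Largest b with factor * 2**(b + 4) < smallest_memory_bytes.

    Hypothetical helper (not an Aer API). The factor is 16 for double
    precision and 8 for single precision, as in the inequality above.
    """
    factor = 16 if precision == "double" else 8
    if factor * 2 ** 4 >= smallest_memory_bytes:
        raise ValueError("memory too small for any blocking_qubits value")
    b = 0
    # Increase b while the next value still satisfies the strict inequality.
    while factor * 2 ** (b + 1 + 4) < smallest_memory_bytes:
        b += 1
    return b


smallest_memory = 16 * 2**30  # assumed example: 16 GiB on the smallest device
print(max_blocking_qubits(smallest_memory))            # 25
print(max_blocking_qubits(smallest_memory, "single"))  # 26
```

Any smaller value also satisfies the inequality, so this result is an upper bound rather than a recommended setting.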

Here is an example of a 30-qubit Quantum Volume circuit simulated on multiple GPUs by using the `QasmSimulator` backend and the `statevector_gpu` method.

In [16]:
qubit = 30
qv30 = QuantumVolume(qubit, 10, seed = 0)
qv30 = transpile(qv30, basis_gates=['u1', 'u2', 'u3', 'cx'],
                 optimization_level=0, seed_transpiler=1)
qv30.measure_all()
qobj = assemble(qv30, shots=shots, memory=True)
result = qasm_sim.run(qobj, backend_options={"method" : "statevector_gpu", "blocking_enable" : True, "blocking_qubits" : 23 }).result()

counts = result.get_counts(qv30)
print(counts)

{'110010100100001110100101100000': 1, '100100100000001111010010010110': 1, '001001011001001101100111010010': 1, '000100100100101000000101110110': 1, '100010001001001000100110011111': 1, '000110010101001001110110000100': 1, '100001110011001100110001101111': 1, '100000110011011010001111011101': 1, '101011101100000111000100001110': 1, '000100111111000001000101001011': 1, '101101101110111101100011110001': 1, '110001010010101100111001010011': 1, '001111011111011000000111011100': 1, '111110111000010110001100001111': 1, '001100100000010001111000101111': 1, '101010011100000100111100010001': 1, '110110010010110111010111110100': 1, '001000100110001000001001110010': 1, '111111110011101010111111001100': 1, '010001111011100100010001110000': 1, '100011100011011000110101111010': 1, '110000110100110010001100100111': 1, '010001001010110011011010010110': 1, '111101000011000000101001100100': 1, '010101110010010100001110110001': 1, '011110010010101001011100010111': 1, '001101100011010010101001111001': 1, 

### Note

Note that only `QasmSimulator` can be applied to circuits with a large number of qubits, because the `StatevectorSimulator` and `UnitarySimulator` backends currently return snapshots of the state, which require a large amount of memory. If the CPU has enough memory to store the snapshots, these two backends can also be used with GPUs.

## Distribution of Shots by Using Multiple GPUs

GPUs can also be used to accelerate the simulation of multiple shots with noise models. If the system has multiple GPUs, shots are automatically distributed across them, provided that a single GPU has enough memory to simulate one shot.



In [19]:
shots = 1000
qobj = assemble(qv10, shots=shots, memory=True)
result = qasm_sim.run(qobj, backend_options={"method" : "statevector_gpu"}).result()

rdict = result.to_dict()
print("simulation time = {0}".format(rdict['time_taken']))

simulation time = 0.02101421356201172


In [None]:
from qiskit.providers.aer.noise import *
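noise_model = NoiseModel()
# A depolarizing error must act on as many qubits as the gates it is attached to
error_1q = depolarizing_error(0.05, 1)  # for the single-qubit gates
error_2q = depolarizing_error(0.05, 2)  # for the two-qubit cx gate
noise_model.add_all_qubit_quantum_error(error_1q, ['u1', 'u2', 'u3'])
noise_model.add_all_qubit_quantum_error(error_2q, ['cx'])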
shots = 1000
qobj = assemble(qv10, shots=shots, memory=True)
result = qasm_sim.run(qobj, noise_model = noise_model, backend_options={"method" : "statevector_gpu"}).result()

rdict = result.to_dict()
print("simulation time = {0}".format(rdict['time_taken']))

In [14]:
import qiskit.tools.jupyter
%qiskit_version_table
%qiskit_copyright

| Qiskit Software | Version |
| --- | --- |
| Qiskit | 0.23.4 |
| Terra | 0.16.3 |
| Aer | 0.8.0 |
| Ignis | 0.5.1 |
| Aqua | 0.8.1 |
| IBM Q Provider | 0.11.1 |
| **System information** | |
| Python | 3.9.1 (default, Dec 11 2020, 14:41:06) [GCC 7.3.0] |
| OS | Linux |
| CPUs | 40 |
