# Using the experimental v2 local simulator

This tutorial introduces the experimental v2 local simulator for Amazon Braket. It explains how to use the simulator and what performance difference you can expect compared with the default local simulator.

## How to set up and use the new local simulator

The new local simulator is available as a Python package, [`amazon-braket-simulator-v2`](https://github.com/amazon-braket/amazon-braket-simulator-v2-python). You can install it locally with `pip` (`pip install amazon-braket-simulator-v2`). You should also `import braket.simulator_v2`, which installs all the backend dependencies. Then, to use the new local simulator, create a `LocalSimulator` object with the `"braket_sv_v2"` (state vector) or `"braket_dm_v2"` (density matrix) backend name. The new local simulator supports qubit counts up to 32 (state vector) or 16 (density matrix). Keep in mind that larger qubit counts require more memory!
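
To get a feel for how memory grows with qubit count, here is a rough back-of-the-envelope sketch. It assumes each amplitude is stored as a double-precision complex number (16 bytes); that storage format and the helper names are assumptions for illustration, and the simulator's real footprint may be higher because of working buffers.

```python
# Rough memory estimate for simulator state, under the stated assumption.
BYTES_PER_AMPLITUDE = 16  # assumption: complex128 amplitudes


def state_vector_bytes(num_qubits: int) -> int:
    """A state vector on n qubits holds 2**n amplitudes."""
    return BYTES_PER_AMPLITUDE * 2**num_qubits


def density_matrix_bytes(num_qubits: int) -> int:
    """A density matrix on n qubits holds 2**n x 2**n = 4**n entries."""
    return BYTES_PER_AMPLITUDE * 4**num_qubits


for n in (20, 25, 32):
    print(f"state vector, {n} qubits: {state_vector_bytes(n) / 2**30:.3f} GiB")
for n in (8, 12, 16):
    print(f"density matrix, {n} qubits: {density_matrix_bytes(n) / 2**30:.3f} GiB")
```

Under this assumption a 16-qubit density matrix needs as much memory as a 32-qubit state vector, which matches the two qubit limits quoted above.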

In [6]:
# general imports
import numpy as np
import math
import time

# AWS imports: Import Braket SDK modules
from braket.circuits import Circuit, circuit, noises, Gate, Instruction
from braket.devices import LocalSimulator
import braket.simulator_v2

default_simulator = LocalSimulator("braket_sv")
new_sv_simulator  = LocalSimulator("braket_sv_v2")
default_dm_simulator = LocalSimulator("braket_dm")
new_dm_simulator     = LocalSimulator("braket_dm_v2")

n_shots = 100

## Two simple examples: The Quantum Fourier Transform with and without noise

We already presented the Quantum Fourier Transform (QFT) in [the QFT notebook](../advanced_circuits_algorithms/Quantum_Fourier_Transform/Quantum_Fourier_Transform.ipynb). These circuits have a mix of one- and two-qubit gates, so we can compare the efficiency of each simulator's implementation. We will simulate the measurement counts for these circuits on both local simulators. The older local simulator is practical only up to 18 or so qubits for state vectors, but the new one can work with substantially more. In this case we will not run up to 32 qubits on the new simulator, because the memory use can become quite substantial. 20 qubits is enough to see that the new simulator can outperform the existing default.

In [7]:
@circuit.subroutine(register=True)
def qft(qubits):    
    """
    Construct a circuit object corresponding to the Quantum Fourier Transform (QFT)
    algorithm, applied to the argument qubits.  Does not use recursion to generate the QFT.
    
    Args:
        qubits (Sequence[int]): The qubits on which to apply the QFT
    """
    qftcirc = Circuit()

    # get number of qubits
    num_qubits = len(qubits)
    
    for k in range(num_qubits):
        # First add a Hadamard gate
        qftcirc.h(qubits[k])
    
        # Then apply the controlled rotations, with weights (angles) defined by the distance to the control qubit.
        # Start on the qubit after qubit k, and iterate until the end.  When num_qubits==1, this loop does not run.
        for j in range(1,num_qubits - k):
            angle = 2*math.pi/(2**(j+1))
            qftcirc.cphaseshift(qubits[k+j],qubits[k], angle)
            
    # Then add SWAP gates to reverse the order of the qubits:
    for i in range(math.floor(num_qubits/2)):
        qftcirc.swap(qubits[i], qubits[-i-1])
        
    return qftcirc
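
As a quick sanity check on the circuit size, the following sketch mirrors the loop structure of `qft` above and counts the gates it emits, without needing the Braket SDK. The helper names `qft_gate_counts` and `rotation_angle` are hypothetical and exist only for this illustration.

```python
import math


def qft_gate_counts(num_qubits: int) -> dict:
    """Count the gates produced by the non-recursive QFT construction."""
    counts = {"h": 0, "cphaseshift": 0, "swap": 0}
    for k in range(num_qubits):
        counts["h"] += 1  # one Hadamard per qubit
        for j in range(1, num_qubits - k):
            counts["cphaseshift"] += 1  # one controlled rotation per later qubit
    counts["swap"] = num_qubits // 2  # final qubit-order reversal
    return counts


def rotation_angle(j: int) -> float:
    """Angle of the controlled phase shift at distance j from the target qubit."""
    return 2 * math.pi / (2 ** (j + 1))


for n in (5, 10, 20):
    print(n, qft_gate_counts(n))
print("first rotation angle:", rotation_angle(1))
```

The number of controlled rotations grows as n(n-1)/2, so the two-qubit gates dominate the circuit as the qubit count increases.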

In [8]:
qubit_range = range(5, 21, 5)
qft_circs   = {}
old_results = {}
new_results = {}
old_durations = {}
new_durations = {}
for num_qubits in qubit_range:
    # generate QFT circuit
    qft_circ = qft(range(num_qubits))
    old_start = time.time()
    old_results[num_qubits] = default_simulator.run(qft_circ, shots=n_shots).result()
    old_stop  = time.time()
    old_durations[num_qubits] = old_stop - old_start
    new_start = time.time()
    new_results[num_qubits] = new_sv_simulator.run(qft_circ, shots=n_shots).result()
    new_stop  = time.time()
    new_durations[num_qubits] = new_stop - new_start
    qft_circs[num_qubits] = qft_circ

for num_qubits in qubit_range:
    print(f"QFT circuit with {num_qubits} qubits:")
    print(f'Old local simulator runtime: {old_durations[num_qubits]}')
    print(f'New local simulator runtime: {new_durations[num_qubits]}')
    print()

QFT circuit with 5 qubits:
Old local simulator runtime: 0.03917956352233887
New local simulator runtime: 0.0059778690338134766

QFT circuit with 10 qubits:
Old local simulator runtime: 0.0754079818725586
New local simulator runtime: 0.013960123062133789

QFT circuit with 15 qubits:
Old local simulator runtime: 0.2364799976348877
New local simulator runtime: 0.06110024452209473

QFT circuit with 20 qubits:
Old local simulator runtime: 4.88277268409729
New local simulator runtime: 0.5842669010162354
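
The start/stop timing pattern in the loop above can be factored into a small helper. The sketch below is one possible way to do that; the `timed` helper is hypothetical, and it uses `time.perf_counter()`, a monotonic clock intended for measuring short intervals, instead of `time.time()`.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Stand-in workload so the sketch runs without the Braket SDK installed.
total, elapsed = timed(sum, range(1_000_000))
print(f"sum={total}, elapsed={elapsed:.6f} s")
```

In this notebook, the same helper could wrap a simulator call, for example `timed(lambda: new_sv_simulator.run(qft_circ, shots=n_shots).result())`.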



## QFT circuits in the presence of noise

As shown in [Simulating Noise on Amazon Braket](./Simulating_Noise_On_Amazon_Braket.ipynb), we can attach noise operations to Braket circuits and use a density matrix simulator to simulate the evolution of the circuit in the presence of these noise channels. The density matrix simulators can simulate half the qubits of their state vector counterparts, so here we will simulate noisy QFT circuits up to 8 qubits. We'll define two simple noise channels (`BitFlip` and `PhaseFlip`), apply them to the QFT circuit gates, and compare the simulator performance.

In [9]:
bit_flip   = noises.BitFlip(probability=0.1)
phase_flip = noises.PhaseFlip(probability=0.15)

qubit_range = range(2, 9, 2)
qft_circs   = {}
old_results = {}
new_results = {}
old_durations = {}
new_durations = {}
for num_qubits in qubit_range:
    # generate QFT circuit
    qft_circ = qft(range(num_qubits))
    qft_circ.apply_gate_noise(bit_flip)
    qft_circ.apply_gate_noise(phase_flip)
    
    old_start = time.time()
    old_results[num_qubits] = default_dm_simulator.run(qft_circ, shots=n_shots).result()
    old_stop  = time.time()
    old_durations[num_qubits] = old_stop - old_start
    new_start = time.time()
    new_results[num_qubits] = new_dm_simulator.run(qft_circ, shots=n_shots).result()
    new_stop  = time.time()
    new_durations[num_qubits] = new_stop - new_start
    qft_circs[num_qubits] = qft_circ

for num_qubits in qubit_range:
    print(f"Noisy QFT circuit with {num_qubits} qubits:")
    print(f'Old noise local simulator runtime: {old_durations[num_qubits]}')
    print(f'New noise local simulator runtime: {new_durations[num_qubits]}')
    print()

Noisy QFT circuit with 2 qubits:
Old noise local simulator runtime: 0.02772045135498047
New noise local simulator runtime: 0.005122661590576172

Noisy QFT circuit with 4 qubits:
Old noise local simulator runtime: 0.06953191757202148
New noise local simulator runtime: 0.008308172225952148

Noisy QFT circuit with 6 qubits:
Old noise local simulator runtime: 0.16034436225891113
New noise local simulator runtime: 0.03748893737792969

Noisy QFT circuit with 8 qubits:
Old noise local simulator runtime: 0.4437224864959717
New noise local simulator runtime: 0.20065069198608398
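
For intuition about what the two noise channels do to a single qubit, here is a minimal pure-Python sketch using the standard Kraus-operator form of a bit flip (apply X with probability p) and a phase flip (apply Z with probability p). It is an illustration of the textbook channels, not the simulator's implementation, and all helper names are hypothetical.

```python
import math

I2 = [[1.0, 0.0], [0.0, 1.0]]
X = [[0.0, 1.0], [1.0, 0.0]]
Z = [[1.0, 0.0], [0.0, -1.0]]


def matmul(a, b):
    """Multiply two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def dagger(m):
    """Conjugate transpose of a 2x2 matrix."""
    return [[m[j][i].conjugate() for j in range(2)] for i in range(2)]


def scale(c, m):
    return [[c * x for x in row] for row in m]


def add(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]


def flip_kraus(probability, pauli):
    """Kraus operators: sqrt(1-p) * I and sqrt(p) * pauli."""
    return [scale(math.sqrt(1 - probability), I2), scale(math.sqrt(probability), pauli)]


def apply_channel(kraus_ops, rho):
    """Return sum_k K rho K^dagger."""
    out = [[0.0, 0.0], [0.0, 0.0]]
    for k in kraus_ops:
        out = add(out, matmul(matmul(k, rho), dagger(k)))
    return out


rho_zero = [[1.0, 0.0], [0.0, 0.0]]  # |0><0|
rho_plus = [[0.5, 0.5], [0.5, 0.5]]  # |+><+|
print("bit flip on |0>:  ", apply_channel(flip_kraus(0.1, X), rho_zero))
print("phase flip on |+>:", apply_channel(flip_kraus(0.15, Z), rho_plus))
```

The bit flip moves population from |0> to |1>, while the phase flip leaves the populations alone and shrinks the off-diagonal coherences by a factor of 1 - 2p.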



## Running circuit batches

The new local simulator also has improved support for running *batches* of circuits. To see the effectiveness of this new functionality, we'll run a batch of 20 QFT circuits for varying qubit counts:

In [10]:
qubit_range = range(5, 16, 5)
qft_circs   = {}
old_results = {}
new_results = {}
old_durations = {}
new_durations = {}

batch_size = 20

for num_qubits in qubit_range:
    # generate QFT circuit
    qft_circ = qft(range(num_qubits))
    # build the batch before starting the timer so both simulators are timed on the same work
    batch_circs = [qft_circ for c_ix in range(batch_size)]
    old_start = time.time()
    old_results[num_qubits] = default_simulator.run_batch(batch_circs, shots=n_shots).results()
    old_stop  = time.time()
    old_durations[num_qubits] = old_stop - old_start
    new_start = time.time()
    new_results[num_qubits] = new_sv_simulator.run_batch(batch_circs, shots=n_shots, max_parallel=2).results()
    new_stop  = time.time()
    new_durations[num_qubits] = new_stop - new_start
    qft_circs[num_qubits] = qft_circ

for num_qubits in qubit_range:
    print(f"{batch_size} QFT circuits with {num_qubits} qubits:")
    print(f'Old local simulator runtime: {old_durations[num_qubits]}')
    print(f'New local simulator runtime: {new_durations[num_qubits]}')
    print()

20 QFT circuits with 5 qubits:
Old local simulator runtime: 0.7503087520599365
New local simulator runtime: 0.23053908348083496

20 QFT circuits with 10 qubits:
Old local simulator runtime: 1.6170601844787598
New local simulator runtime: 0.5473954677581787

20 QFT circuits with 15 qubits:
Old local simulator runtime: 4.406347274780273
New local simulator runtime: 0.6591923236846924
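
To make the comparison easier to read, the runtimes printed above can be turned into speedup factors. The sketch below copies the batch timings (rounded to four decimal places) from this particular run; the variable and function names are hypothetical, and the numbers will differ on other machines.

```python
# Runtimes in seconds for batches of 20 QFT circuits, rounded from the output above.
old_batch_seconds = {5: 0.7503, 10: 1.6171, 15: 4.4063}
new_batch_seconds = {5: 0.2305, 10: 0.5474, 15: 0.6592}


def speedups(old: dict, new: dict) -> dict:
    """Return old/new runtime ratios keyed by qubit count."""
    return {n: old[n] / new[n] for n in old}


batch_speedups = speedups(old_batch_seconds, new_batch_seconds)
for n, factor in batch_speedups.items():
    print(f"{n} qubits: new simulator is {factor:.1f}x faster")
```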



## Performance comparison on a high-performance `ml.c5.xlarge` instance type

Smaller notebook instances, like the default `ml.t3.medium` type, don't have enough memory to show the performance benefit of the new simulators for larger circuits. Here we include a comparison from a high-performance notebook instance, an `ml.c5.xlarge`. First, we look at how `braket_sv_v2` and `braket_dm_v2` compare to their default counterparts for *single* circuits:

![Performance comparison for single circuits](single_circuits.png)

The data are the same between rows, but the bottom row is plotted with a `log10` y-scale to see the difference across qubit counts.

We can also examine the performance for batches:

![Performance comparison for circuit batches](batch_circuits.png)

Again, the data are the same between rows, but the bottom row is plotted with a `log10` y-scale to see the difference across qubit counts and batches.