# Embedded simulators in Braket Jobs

In this notebook, we introduce embedded simulators in Braket Jobs. An embedded simulator is a local simulator that runs completely within a job instance, i.e., the compute resource that runs your algorithm script. In contrast, [on-demand simulators](https://docs.aws.amazon.com/braket/latest/developerguide/braket-devices.html#braket-simulator-sv1), such as SV1, DM1, or TN1, compute the results of a quantum circuit on dedicated infrastructure that Amazon Braket manages on demand. At a high level, hybrid workloads usually consist of iterations of quantum circuit executions and variational parameter optimizations. By using embedded simulators, we keep all computations in the same environment. This allows the optimization algorithm to access advanced features supported by the embedded simulator. For example, users can leverage advanced gradient computation methods, such as [the adjoint and the backprop methods](https://pennylane.readthedocs.io/en/stable/introduction/interfaces.html#simulation-based-differentiation), for supported simulators via PennyLane. These methods rely on access to the intermediate states of the wave function, which Amazon Braket on-demand simulators and QPUs do not provide. Furthermore, with the [Bring Your Own Container (BYOC)](https://docs.aws.amazon.com/braket/latest/developerguide/braket-jobs-byoc.html) feature of Jobs, users may choose to use open source simulators or their own proprietary simulation tools.

In contrast to running a local simulator in a manually set up [Amazon EC2 instance](https://aws.amazon.com/ec2/) or in a user's local environment, Hybrid Jobs manage the computational resources on your behalf. A job instance is automatically started when the job is created, and is ended when the job is finished, so you only pay for the resources you use. Users can submit multiple jobs at the same time to accelerate experimentation, e.g., during hyperparameter optimization. In addition, users can switch from an embedded simulator to other Amazon Braket devices, including QPUs, by changing the selected device when creating a job.

## Specify a device for embedded simulations

Typically, when creating a job, you set the `device` argument of `AwsQuantumJob.create()` to the ARN (Amazon Resource Name) of an on-demand simulator or a QPU. To select an embedded simulator, you instead set the `device` argument to a string of the form: <br> 
`device = "local:<provider>/<simulator_name>"` <br>
Note that `<provider>` and `<simulator_name>` must consist only of letters, numbers, `_`, `-` and `.`. For example, to use the Amazon Braket local simulator through the [Braket-PennyLane plugin](https://github.com/aws/amazon-braket-pennylane-plugin-python), you would write:

In [1]:
device = "local:braket/braket.local.qubit"
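
The naming rule above can be checked before a job is submitted. The following is a minimal sketch that uses a regular expression; the helper name `is_embedded_device` and the pattern are illustrative assumptions, not part of the Braket SDK, and the service may apply additional checks of its own.

```python
import re

# Pattern for "local:<provider>/<simulator_name>", where both parts may
# contain only letters, digits, "_", "-" and "." (hypothetical helper,
# not part of the Braket SDK).
EMBEDDED_DEVICE_PATTERN = re.compile(
    r"local:([A-Za-z0-9_.\-]+)/([A-Za-z0-9_.\-]+)"
)


def is_embedded_device(device: str) -> bool:
    """Return True if the string has the embedded-simulator form."""
    return EMBEDDED_DEVICE_PATTERN.fullmatch(device) is not None


print(is_embedded_device("local:braket/braket.local.qubit"))
print(is_embedded_device("arn:aws:braket:::device/quantum-simulator/amazon/sv1"))
```

An ARN does not start with `local:`, so it fails the check and would be treated as an on-demand device or QPU.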

The algorithm script can access the string in `device` through the environment variable `"AMZN_BRAKET_DEVICE_ARN"`. The Braket service does not interpret `<provider>` or `<simulator_name>`; you can choose any values and handle them with custom logic in the algorithm script to create any simulator. In this notebook, we use simulators through the Braket-PennyLane plugin. We prepare a helper function `get_device()` as part of the algorithm script to parse the `device` string and set up the PennyLane `qml.device`.

In [2]:
!cat qaoa/utils.py

import pennylane as qml
import os

def get_device(n_wires):
    device_string = os.environ["AMZN_BRAKET_DEVICE_ARN"]
    device_prefix = device_string.split(":")[0]

    if device_prefix=="local":
        prefix, device_name = device_string.split("/")
        device = qml.device(device_name, 
                            wires=n_wires, 
                            custom_decomps={"MultiRZ": qml.MultiRZ.compute_decomposition})
        print("Using local simulator: ", device.name)
    else:
        device = qml.device('braket.aws.qubit', 
                             device_arn=device_string, 
                             s3_destination_folder=None,
                             wires=n_wires,
                             parallel=True,
                             max_parallel=30)
        print("Using AWS on-demand device: ", device.name)
        
    return device
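
The helper above uses only the simulator name and ignores `<provider>`. A script that supports custom simulators might want both parts so it can branch on the provider. The following is a minimal sketch of such parsing in plain Python; `parse_device_string` is a hypothetical helper that is not part of `qaoa/utils.py` or the Braket SDK.

```python
from typing import Optional, Tuple


def parse_device_string(device_string: str) -> Optional[Tuple[str, str]]:
    """Split "local:<provider>/<simulator_name>" into its two parts.

    Returns None for anything that does not start with "local:", for
    example the ARN of an on-demand simulator or a QPU.
    """
    prefix = "local:"
    if not device_string.startswith(prefix):
        return None
    # partition() splits at the first "/" only.
    provider, _, simulator_name = device_string[len(prefix):].partition("/")
    if not provider or not simulator_name:
        raise ValueError(f"Malformed embedded device string: {device_string!r}")
    return provider, simulator_name


print(parse_device_string("local:pennylane/lightning.gpu"))
print(parse_device_string("arn:aws:braket:::device/quantum-simulator/amazon/sv1"))
```

In a job, the input would come from `os.environ["AMZN_BRAKET_DEVICE_ARN"]`, as in `get_device()`.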

## Submit a job with an embedded simulator
In this example, we use Hybrid Jobs with an embedded simulator to run a QAOA algorithm on the Max-Clique problem, which asks for the largest set of fully connected nodes in a graph. You can learn more about this example in the [graph optimization notebook](https://github.com/aws/amazon-braket-examples/blob/main/examples/pennylane/2_Graph_optimization_with_QAOA.ipynb). The QAOA algorithm for our job is defined in `qaoa_algorithm.py`. The algorithm requires hyperparameters that set up the Max-Clique problem, such as `n_nodes` and `n_edges`, as well as hyperparameters for the training itself, such as the number of iterations and the step size.

In [3]:
hyperparameters={
    "n_nodes": "6", 
    "n_edges": "8", 
    "n_layers": "3",
    "iterations": "10",
    "stepsize": "0.1",
    "seed": "42",
    "diff_method": "parameter-shift",
}
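
All hyperparameter values in the cell above are strings, so the algorithm script has to convert them before use. The following is a minimal sketch of such a conversion; `cast_hyperparameters` is a hypothetical helper and is not necessarily how `qaoa_algorithm.py` handles its inputs.

```python
def cast_hyperparameters(raw: dict) -> dict:
    """Convert string-valued hyperparameters into typed Python values."""
    return {
        "n_nodes": int(raw["n_nodes"]),
        "n_edges": int(raw["n_edges"]),
        "n_layers": int(raw["n_layers"]),
        "iterations": int(raw["iterations"]),
        "stepsize": float(raw["stepsize"]),
        "seed": int(raw["seed"]),
        # The differentiation method stays a string.
        "diff_method": raw["diff_method"],
    }


raw_hyperparameters = {
    "n_nodes": "6",
    "n_edges": "8",
    "n_layers": "3",
    "iterations": "10",
    "stepsize": "0.1",
    "seed": "42",
    "diff_method": "parameter-shift",
}
print(cast_hyperparameters(raw_hyperparameters))
```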

Since the algorithm is written in PennyLane, you need to select a corresponding container. There are two pre-configured containers that include PennyLane: the PyTorch and the TensorFlow containers. See the [developer guide](https://docs.aws.amazon.com/braket/latest/developerguide/braket-jobs-script-environment.html) to learn more about pre-configured containers. Let's use the PyTorch container.

In [4]:
from braket.aws import AwsSession
from braket.jobs.image_uris import Framework, retrieve_image

region = AwsSession().region
image_uri = retrieve_image(Framework.PL_PYTORCH, region)
print(image_uri)

292282985366.dkr.ecr.us-east-1.amazonaws.com/amazon-braket-pytorch-jobs:1.8.1-cpu-py37-ubuntu18.04


When using embedded simulators in Hybrid Jobs, the circuits are executed on the job instance. Simulations that require larger computational resources, such as circuits with many qubits, may need a job instance with more cores or more memory. For GPU-based simulators, such as `lightning.gpu`, you need to select a GPU instance. You can use the `instance_config` argument, which takes an `InstanceConfig` object, to configure the instance for your job. The available instance types are listed in the [API reference](https://docs.aws.amazon.com/braket/latest/APIReference/API_InstanceConfig.html). Let's use a general purpose instance type (ml.m5.large) for now.

In [5]:
from braket.jobs.config import InstanceConfig

instance_config = InstanceConfig(instanceType='ml.m5.large')
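
To get a rough feel for how much memory a state vector simulation needs, note that an n-qubit state has 2^n complex amplitudes. The following sketch assumes double-precision complex amplitudes (16 bytes each) and counts only a single copy of the state vector; real simulators need additional working memory, so treat the numbers as a lower bound. The helper name `statevector_bytes` is illustrative.

```python
def statevector_bytes(n_qubits: int, bytes_per_amplitude: int = 16) -> int:
    """Lower bound on memory for one n-qubit state vector.

    Assumes 2**n_qubits amplitudes stored as complex128 (16 bytes each).
    """
    return (2 ** n_qubits) * bytes_per_amplitude


for n in (6, 22, 30):
    mib = statevector_bytes(n) / 2**20
    print(f"{n} qubits: {mib:,.3f} MiB")
```

Under these assumptions, 22 qubits need 64 MiB and 30 qubits need 16 GiB for the state vector alone.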

You can easily switch between different devices using the `device` argument. In the following cell, you can uncomment one of the three lines to use either the Braket local simulator, the PennyLane `default.qubit` simulator, or the on-demand state vector simulator SV1.

In [6]:
device="local:braket/braket.local.qubit" # Using Braket local simulator
# device="local:pennylane/default.qubit" # Using Pennylane default.qubit simulator
# device="arn:aws:braket:::device/quantum-simulator/amazon/sv1" # Using Braket on-demand SV1 

Let's now submit the job! At a minimum, you have to specify the device, source_module, and entry_point arguments. But you can customize your job with other arguments, including the following:
- `device`: The specification of an embedded simulator that follows the syntax `"local:<provider>/<simulator_name>"`, or the ARN of the Braket on-demand simulator or QPU you want to use. It will be stored as an environment variable for the algorithm script.
- `source_module`: The path to a file or a python module that contains your algorithm script. It will be uploaded to the container for Braket Job execution.
- `entry_point`: The path relative to the source_module. It points to the piece of code to be executed when the Braket Job starts.
- `hyperparameters`: The Python dictionary containing the hyperparameter names and values as strings. (optional)
- `job_name`: A unique string to identify the job. It appears in the Braket Job console and in the job ARN. (optional)
- `instance_config`: The configuration of the instances used to execute the job. Defaults to `InstanceConfig(instanceType='ml.m5.large', volumeSizeInGb=30)`. (optional)
- `image_uri`: The path to a Docker container image. (optional)
- `wait_until_complete`: If True, the function call will wait until the job is completed, and will additionally print logs to the local console. Otherwise, it will run asynchronously. Defaults to False. (optional)

In [7]:
import time
from braket.aws import AwsQuantumJob

job = AwsQuantumJob.create(
    device=device,
    source_module="qaoa",
    entry_point="qaoa.qaoa_algorithm",
    job_name="embedded-simulation-" + str(int(time.time())),
    hyperparameters=hyperparameters,
    instance_config=instance_config,
    image_uri=image_uri,
    wait_until_complete=False,
)

In [8]:
# This cell should take about 6 minutes
print(job.result())

{'params': [[0.9170388720505319, 0.7204929371752294, 1.4528933994269753], [1.3895487869726206, 0.9665264386809151, -0.47009700306676794]], 'cost': -0.5536775832039897}


## Custom gradient computation methods
The [parameter-shift rule](https://pennylane.ai/qml/glossary/parameter_shift.html) is a general method for computing gradients of a cost function with respect to the variational parameters of a quantum circuit. With the parameter-shift rule, the gradient is calculated exactly by running the same circuit multiple times with shifted parameters. Running all shifted circuits can take a long time unless you use highly parallel simulators like SV1. Even then, the number of circuits scales linearly with the number of parameters. In contrast, other gradient methods, such as adjoint differentiation, require fewer circuit executions, at the cost of increased memory requirements.
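
As a small worked illustration of the parameter-shift rule, consider a single qubit prepared with `RX(theta)` and measured in the Z basis, so the expectation value is `cos(theta)`. The sketch below evaluates the circuit at `theta + pi/2` and `theta - pi/2` and compares the result with the analytic derivative `-sin(theta)`. It is a pure-Python toy example, not the QAOA circuit of this notebook, and the function names are illustrative.

```python
import math


def expval_z_after_rx(theta: float) -> float:
    """<Z> for the state RX(theta)|0> = cos(theta/2)|0> - i sin(theta/2)|1>."""
    amp0 = complex(math.cos(theta / 2), 0.0)
    amp1 = complex(0.0, -math.sin(theta / 2))
    return abs(amp0) ** 2 - abs(amp1) ** 2


def parameter_shift_gradient(f, theta: float) -> float:
    """Two-term parameter-shift rule with shift pi/2.

    Valid for gates such as RX, RY, RZ whose generator has two
    eigenvalues +1/2 and -1/2.
    """
    shift = math.pi / 2
    return (f(theta + shift) - f(theta - shift)) / 2


theta = 0.3
print("parameter-shift:", parameter_shift_gradient(expval_z_after_rx, theta))
print("analytic       :", -math.sin(theta))
```

Each parameter of this kind costs two circuit evaluations per gradient, which is where the linear scaling with the number of parameters comes from.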

For example, we can use the adjoint method for PennyLane's default.qubit simulator via the `diff_method` variable in the hyperparameters. Note that Amazon Braket on-demand simulators can only use the parameter-shift method in PennyLane.

In [9]:
hyperparameters={
    "n_nodes": "6", 
    "n_edges": "8", 
    "n_layers": "3",
    "iterations": "10",
    "stepsize": "0.1",
    "seed": "42",
    "diff_method": "adjoint",
}

In [10]:
job = AwsQuantumJob.create(
    device="local:pennylane/default.qubit",
    source_module="qaoa",
    entry_point="qaoa.qaoa_algorithm",
    job_name="embedded-simulation-" + str(int(time.time())),
    hyperparameters=hyperparameters,
    instance_config=instance_config,
    image_uri=image_uri,
    wait_until_complete=False,
)

In [11]:
# This cell should take about 6 minutes
print(job.result())

{'params': [[0.9170388720505317, 0.7204929371752299, 1.4528933994269748], [1.3895487869726197, 0.9665264386809147, -0.47009700306676944]], 'cost': -0.5536775832039891}


## Accelerate your simulations with `lightning.gpu` and cuQuantum
The PyTorch and the TensorFlow [job containers](https://docs.aws.amazon.com/braket/latest/developerguide/braket-jobs-script-environment.html) are pre-configured with the [NVIDIA cuQuantum library](https://developer.nvidia.com/cuquantum-sdk) and PennyLane's [GPU simulator](https://github.com/PennyLaneAI/pennylane-lightning-gpu), `lightning.gpu`. The GPU simulator accelerates circuit simulations for bigger circuits, and increases the number of qubits that can be simulated within a given time. To use `lightning.gpu`, we need to choose a GPU instance. Braket Jobs supports the following instance types that are compatible with `lightning.gpu`:
- ml.p3.2xlarge
- ml.p3.8xlarge
- ml.p3.16xlarge

In [12]:
instance_config = InstanceConfig(instanceType='ml.p3.2xlarge')

The GPU simulator also supports the `adjoint` gradient method, which can greatly speed up the gradient evaluation compared to using the parameter-shift rule. In the following, we create a job to solve a 22-node Max-Clique problem using QAOA, which requires simulating circuits with 22 qubits. It takes `lightning.gpu` roughly 1 minute for each optimization step, while it takes `lightning.qubit`, a CPU-based simulator, roughly 12 minutes. Note that the run time depends on the size of the circuit, the problem type, and the computational resource. You may see different performance on a different problem or with a different instance type. In general, CPU-based simulators are suitable for running smaller circuits, while GPU-based simulators perform better for bigger circuits.

In [13]:
hyperparameters={
    "n_nodes": "22", 
    "n_edges": "150", 
    "n_layers": "3",
    "iterations": "1",
    "stepsize": "0.1",
    "seed": "42",
    "diff_method": "adjoint",
}

**Note:** The following cell may be unable to complete with the default resource limits. You may contact [AWS Support](https://support.console.aws.amazon.com/support/home#/case/create?issueType=service-limit-increase) to increase the limits on your account.

In [14]:
job = AwsQuantumJob.create(
    device="local:pennylane/lightning.gpu",
    source_module="qaoa",
    entry_point="qaoa.qaoa_algorithm",
    job_name="embedded-simulation-" + str(int(time.time())),
    hyperparameters=hyperparameters,
    instance_config=instance_config,
    image_uri=image_uri,
    wait_until_complete=False,
)

In [15]:
# This cell should take about 9 minutes
print(job.result())

{'params': [[0.2745401195836843, 1.0507143041933296, 0.831993939751942], [0.6986584818507686, 0.25601863649339063, 0.05599452108606921]], 'cost': 14.277212692531803}


## Debug with local mode
It is often useful to debug your program locally before submitting a job. You can run jobs with embedded simulation locally in your own environment for faster testing and debugging. This feature requires Docker to be installed in your programming environment. Amazon Braket notebooks have Docker pre-installed, but if you want to test your code locally, say, on your laptop, you need to install Docker. For example, you can follow these [instructions](https://docs.docker.com/get-docker/).

In local mode, a container is created in your local environment and the algorithm is run in that container. To run a job in local mode, make sure the Docker daemon is running, which is already the case on Amazon Braket notebook instances. Then, create a `LocalQuantumJob` instead of an `AwsQuantumJob`. Local jobs always run synchronously and display the logs, so there is no `wait_until_complete` argument. Because a job in local mode runs in your own environment, there is no `instance_config` argument. The first time a local job is created, it takes longer because the container has to be built. Subsequent runs are faster. Note that local jobs are not visible in the Amazon Braket Console. In local mode, you can still send quantum tasks to actual devices, but a job in local mode does not get the performance benefits when running against an actual QPU. To learn more about local mode, see the [developer guide](https://docs.aws.amazon.com/braket/latest/developerguide/braket-jobs-local-mode.html).
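
Because local mode depends on a running Docker daemon, it can help to check for one before creating a local job. The following is a minimal sketch that looks for the `docker` command and then runs `docker info`. It assumes that `docker info` exits with a nonzero status when the daemon is unreachable, which may not hold for every Docker version; `docker_daemon_running` is a hypothetical helper, not part of the Braket SDK.

```python
import shutil
import subprocess


def docker_daemon_running(timeout_s: float = 10.0) -> bool:
    """Best-effort check that the Docker CLI exists and the daemon responds."""
    if shutil.which("docker") is None:
        # The Docker command-line client is not on PATH.
        return False
    try:
        completed = subprocess.run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # Assumption: a zero exit status means the daemon answered.
    return completed.returncode == 0


if not docker_daemon_running():
    print("Docker does not appear to be running; local mode will not work.")
```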

In [16]:
hyperparameters={
    "n_nodes": "6", 
    "n_edges": "8", 
    "n_layers": "3",
    "iterations": "10",
    "stepsize": "0.1",
    "seed": "42",
    "diff_method": "adjoint",
}

In [17]:
from braket.jobs.local.local_job import LocalQuantumJob

# This cell should take about 3 min for the first time, and about 30 seconds afterward.
job = LocalQuantumJob.create(
    device="local:pennylane/default.qubit",
    source_module="qaoa",
    entry_point="qaoa.qaoa_algorithm",
    job_name="embedded-simulation-" + str(int(time.time())),
    hyperparameters=hyperparameters,
    image_uri=image_uri,
)

Boto3 Version:  1.20.10
Beginning Setup
Running Code As Subprocess
hyperparams:  {'n_nodes': '6', 'n_edges': '8', 'n_layers': '3', 'iterations': '10', 'stepsize': '0.1', 'seed': '42', 'diff_method': 'adjoint'}
Using local simulator:  Default qubit PennyLane plugin
number of observables:  13
start optimizing...
Metrics - timestamp=1650981956.9732826; Cost=4.570568368761232; iteration_number=0;
Metrics - timestamp=1650981957.2702165; Cost=2.614814968463016; iteration_number=1;
Metrics - timestamp=1650981957.557412; Cost=1.04216221000676; iteration_number=2;
Metrics - timestamp=1650981957.8734257; Cost=0.28694967085390216; iteration_number=3;
Metrics - timestamp=1650981958.1137834; Cost=-0.2527990816917891; iteration_number=4;
Metrics - timestamp=1650981958.350162; Cost=-0.5599800238018444; iteration_number=5;
Metrics - timestamp=1650981958.5563917; Cost=-0.5452012711678225; iteration_number=6;
Metrics - timestamp=1650981958.7647054; Cost=-0.4655467626418747; iteration_number=7;
Metrics -
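
The `Metrics - ...` lines in the log above follow a regular pattern, so the cost per iteration can be recovered from captured log text. The following is a minimal sketch that assumes exactly the line format shown above; `parse_metric_lines` is a hypothetical helper, and incomplete lines are skipped.

```python
import re

# Matches lines such as:
# Metrics - timestamp=1650981956.9732826; Cost=4.570568368761232; iteration_number=0;
METRIC_LINE = re.compile(
    r"Metrics - timestamp=(?P<timestamp>[-+.\deE]+); "
    r"Cost=(?P<cost>[-+.\deE]+); "
    r"iteration_number=(?P<iteration>\d+);"
)


def parse_metric_lines(log_text: str) -> list:
    """Return (iteration_number, cost) pairs found in the log text."""
    records = []
    for line in log_text.splitlines():
        match = METRIC_LINE.search(line)
        if match:
            records.append((int(match["iteration"]), float(match["cost"])))
    return records


sample_log = """start optimizing...
Metrics - timestamp=1650981956.9732826; Cost=4.570568368761232; iteration_number=0;
Metrics - timestamp=1650981957.2702165; Cost=2.614814968463016; iteration_number=1;
Metrics -
"""
for iteration, cost in parse_metric_lines(sample_log):
    print(iteration, cost)
```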

## Summary
In this notebook we showed you how to get started with running simulators embedded in Hybrid Jobs. To learn more, you can read the [documentation](https://docs.aws.amazon.com/braket/latest/developerguide/braket-jobs.html) or follow the other example notebooks in this repo.