# Hello CUDA-Q with Braket Hybrid Jobs
[CUDA-Q](https://nvidia.github.io/cuda-quantum/latest/index.html) offers a unified programming model designed for hybrid workloads that run on CPUs, GPUs and QPUs. In this notebook, you will learn how to run CUDA-Q programs using Amazon Braket Hybrid Jobs. This notebook assumes basic knowledge of Hybrid Jobs. You can learn about Hybrid Jobs from [this page](https://docs.aws.amazon.com/braket/latest/developerguide/braket-what-is-hybrid-job.html) in the Amazon Braket Developer Guide and the notebooks in [Amazon Braket Examples](https://github.com/amazon-braket/amazon-braket-examples/tree/main/examples/hybrid_jobs).

## Running CUDA-Q with Braket
Braket provides a container image with the CUDA-Q environment already set up, so let's run our first CUDA-Q job! First, we start with the necessary imports.

In [None]:
from braket.jobs import hybrid_job
from braket.jobs.environment_variables import get_job_device_arn

For this example, we will use the CUDA-Q hybrid jobs container provided by Braket.

In [None]:
# URI of the CUDA-Q hybrid jobs container image provided by Braket
image_uri = "292282985366.dkr.ecr.us-east-1.amazonaws.com/amazon-braket-cudaq-jobs:latest"

## Test your CUDA-Q job locally
Before submitting a job, it is recommended to test with a local job. A local job runs scripts in a container locally. It is a good way to test your code with a small problem size before scaling up. Note that running a local hybrid job requires Docker to be installed on the local computer. 

Here, let's use a Bell circuit with CUDA-Q for the local job. This example does not require a GPU to run. The `hello_quantum` function in the code snippet below defines an experiment to sample a Bell circuit. The string `qpp-cpu` in the `device` keyword argument of the decorator is the name of a CUDA-Q CPU simulator. You can view the CUDA-Q tutorial and the available backends in the [CUDA-Q documentation](https://nvidia.github.io/cuda-quantum/latest/index.html).

When called, the decorated `hello_quantum` function starts a local job because of the `local=True` keyword in the `hybrid_job` decorator. The code inside the `hello_quantum` function will run locally in the environment defined by the CUDA-Q hybrid jobs container image.
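
The job code in this notebook derives the CUDA-Q target name from the device string with `device.split("/")[-1]`. As a minimal sketch of that step, the parsing can be checked on its own, without Docker or CUDA-Q installed. The helper name `cudaq_target_from_device` is hypothetical and is not part of the Braket SDK or CUDA-Q.

```python
def cudaq_target_from_device(device: str) -> str:
    """Return the CUDA-Q target name from a string such as "local:cudaq/qpp-cpu".

    Hypothetical helper for illustration. For well-formed strings it gives the
    same result as device.split("/")[-1], but it rejects strings without a "/".
    """
    _prefix, separator, target = device.rpartition("/")
    if not separator or not target:
        raise ValueError(f"expected '<prefix>/<target>', got {device!r}")
    return target


# The device string used by the local job in this notebook.
print(cudaq_target_from_device("local:cudaq/qpp-cpu"))
```

Validating the string up front gives a clearer error than passing an unexpected value on to `cudaq.set_target`.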

In [None]:
@hybrid_job(device="local:cudaq/qpp-cpu", image_uri=image_uri, local=True)
def hello_quantum():
    import cudaq

    # define the backend
    device = get_job_device_arn()
    cudaq.set_target(device.split("/")[-1])
    print("CUDA-Q backend: ", cudaq.get_target())

    @cudaq.kernel
    def bell_state():
        qubits = cudaq.qvector(2)
        h(qubits[0])
        cx(qubits[0], qubits[1])

    # sample the Bell circuit
    result = cudaq.sample(bell_state, shots_count=1000)
    measurement_probabilities = dict(result.items())
    print("Samples: ", measurement_probabilities)

    return measurement_probabilities

Let's test your CUDA-Q job!

In [None]:
hello_quantum()
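
The dictionary that `hello_quantum` builds from `result.items()` maps each measured bitstring to the number of shots in which it was observed, so with 1000 shots an ideal Bell state should give roughly 500 counts each for `00` and `11`. A minimal sketch for turning such counts into relative frequencies follows. The helper name `counts_to_probabilities` and the example counts are made up for illustration and are not output from a real run.

```python
def counts_to_probabilities(counts: dict[str, int]) -> dict[str, float]:
    """Normalize bitstring counts so that the values sum to 1."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must contain at least one shot")
    return {bitstring: n / total for bitstring, n in counts.items()}


# Illustrative counts only; a real run will fluctuate around 500/500.
example_counts = {"00": 507, "11": 493}
probabilities = counts_to_probabilities(example_counts)
print(probabilities)
```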

## Run your CUDA-Q job
After testing locally that your code works correctly in the container environment, you can remove the `local=True` keyword argument (or equivalently, set `local=False`) so that the next job will run on an AWS-managed compute instance. This is great for scaling up to larger instances or parallelizing your workload over multiple instances.

In [None]:
@hybrid_job(device="local:cudaq/qpp-cpu", image_uri=image_uri)
def hello_quantum():
    import cudaq

    # define the backend
    device = get_job_device_arn()
    cudaq.set_target(device.split("/")[-1])
    print("CUDA-Q backend: ", cudaq.get_target())

    @cudaq.kernel
    def bell_state():
        qubits = cudaq.qvector(2)
        h(qubits[0])
        cx(qubits[0], qubits[1])

    # sample the Bell circuit
    result = cudaq.sample(bell_state, shots_count=1000)
    measurement_probabilities = dict(result.items())
    print("Samples: ", measurement_probabilities)

    return measurement_probabilities


job = hello_quantum()
print(job.arn)

When called, the decorated `hello_quantum` function will create a Braket Hybrid Job on AWS, running the code defined in the decorated function with the environment specified by the CUDA-Q hybrid jobs container image. You can view the progress of your job with `job.state()` or in the "Hybrid jobs" tab of the Amazon Braket Console.

In [None]:
result = job.result()
print(result)
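
`job.result()` waits for the job to finish before returning. If you would rather poll `job.state()` yourself, a loop along the following lines is one option. This is only a sketch: `wait_for_terminal_state` is a hypothetical helper, the terminal state names are assumed to be `COMPLETED`, `FAILED`, and `CANCELLED`, and `FakeJob` merely stands in for a real job object so that the snippet runs without AWS credentials.

```python
import time

# Assumed names of the states in which a hybrid job stops changing.
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELLED"}


def wait_for_terminal_state(job, poll_seconds=5.0, timeout_seconds=3600.0):
    """Poll job.state() until a terminal state is reached or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = job.state()
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still in state {state!r} after {timeout_seconds} s")
        time.sleep(poll_seconds)


class FakeJob:
    """Stand-in that reports a scripted sequence of states (illustration only)."""

    def __init__(self, states):
        self._states = iter(states)
        self._last = None

    def state(self):
        # Advance through the scripted states, then keep repeating the last one.
        self._last = next(self._states, self._last)
        return self._last


final_state = wait_for_terminal_state(
    FakeJob(["QUEUED", "RUNNING", "COMPLETED"]), poll_seconds=0.01
)
print(final_state)
```

With a real job, you would pass the `job` object returned by `hello_quantum()` instead of `FakeJob`.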

## Run CUDA-Q jobs on Braket devices

Now, let's learn how to run CUDA-Q programs on Braket devices from within Braket Hybrid Jobs. All you need to do is set the CUDA-Q target to `braket` and pass the ARN of the Braket device as the `machine` argument. In the code snippet below, we run a Bell circuit on Braket SV1 using CUDA-Q and Braket Hybrid Jobs. When you finish testing on simulators and are ready to run experiments on QPUs, you can switch the target to Braket QPUs such as IQM, IonQ, and Rigetti devices by changing `device_arn` in the example below. Circuits run within a hybrid job receive higher-priority access to the target Braket QPUs, which not only reduces the runtime of your experiments but also minimizes the impact of hardware drift on your algorithms.

In [None]:
device_arn = "arn:aws:braket:::device/quantum-simulator/amazon/sv1"  # set device to SV1
# device_arn = "arn:aws:braket:eu-north-1::device/qpu/iqm/Garnet" # set device to IQM Garnet


@hybrid_job(device=device_arn, image_uri=image_uri)
def job_with_braket_device():
    import cudaq

    # define the backend
    device = get_job_device_arn()
    cudaq.set_target("braket", machine=device)

    # define the Bell circuit
    @cudaq.kernel
    def bell_state():
        qubits = cudaq.qvector(2)
        h(qubits[0])
        cx(qubits[0], qubits[1])
        mz(qubits)

    # sample the Bell circuit
    result = cudaq.sample(bell_state, shots_count=1000)
    measurement_probabilities = dict(result.items())

    return measurement_probabilities


job = job_with_braket_device()
print(job.arn)
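
Because changing `device_arn` switches the job from a simulator to a QPU, it can help to inspect the ARN before submitting. The sketch below splits a device ARN into its parts. `parse_device_arn` is a hypothetical helper, and the layout `device/<type>/<provider>/<name>` is inferred from the two ARNs in this notebook rather than taken from a formal specification.

```python
def parse_device_arn(device_arn: str) -> dict[str, str]:
    """Split a Braket device ARN into region, type, provider, and name.

    Hypothetical helper; the assumed layout is
    arn:aws:braket:<region>::device/<type>/<provider>/<name>.
    """
    arn_fields = device_arn.split(":", 5)
    if len(arn_fields) != 6 or arn_fields[:3] != ["arn", "aws", "braket"]:
        raise ValueError(f"not a Braket ARN: {device_arn!r}")
    resource = arn_fields[5].split("/")
    if len(resource) != 4 or resource[0] != "device":
        raise ValueError(f"not a device ARN: {device_arn!r}")
    _, device_type, provider, name = resource
    return {
        "region": arn_fields[3],  # empty for the SV1 ARN used in this notebook
        "type": device_type,
        "provider": provider,
        "name": name,
    }


sv1 = parse_device_arn("arn:aws:braket:::device/quantum-simulator/amazon/sv1")
garnet = parse_device_arn("arn:aws:braket:eu-north-1::device/qpu/iqm/Garnet")
print(sv1["type"], garnet["type"])
```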

## Summary
This notebook shows you how to run your first CUDA-Q program with Amazon Braket Hybrid Jobs. With a few lines of code, you can run CUDA-Q programs with Braket Hybrid Jobs and scale your workloads up and out with the range of compute options provided by AWS. In the following tutorials, we will show you how to run CUDA-Q simulations on GPUs ([notebook](1_simulation_with_GPUs.ipynb)), distribute workloads across multiple instances ([notebook](2_parallel_simulations.ipynb)), and distribute a single state vector simulation across multiple GPUs ([notebook](3_distributed_statevector_simulations.ipynb)).

## Appendix: Using a job submission script

The `@hybrid_job` decorator provides a convenient interface for submitting a hybrid job, but it makes source code available only for the inner functions defined in the decorated function. This matters for CUDA-Q because `@cudaq.kernel` relies on the source code of the kernel function being available. For more complex workloads, if you encounter an error related to source code availability, you can submit the job with `AwsQuantumJob.create` instead of the `@hybrid_job` decorator.

To create a hybrid job using `AwsQuantumJob.create`, first write your CUDA-Q program as a separate `.py` file, which we will call the "algorithm script". For demonstration purposes, we have prepared an example algorithm script, `algorithm_script_getting_started.py`. You can then run the following code snippet to create the job. Creating a hybrid job this way avoids the source code availability errors that can occur with the decorator. To learn more about creating hybrid jobs this way, you can read this [documentation page](https://docs.aws.amazon.com/braket/latest/developerguide/braket-jobs-first.html#braket-jobs-first-create) and this [example notebook](https://github.com/amazon-braket/amazon-braket-examples/tree/main/examples/hybrid_jobs/8_Creating_Hybrid_Job_Scripts).

In [None]:
from braket.aws import AwsQuantumJob
from braket.devices import Devices

# create a hybrid job
job = AwsQuantumJob.create(
    device=Devices.Amazon.SV1,
    source_module="algorithm_script_getting_started.py",
    image_uri=image_uri,
)

# view the ARN and the status of the job
print("ARN of the job: ", job.arn)
print("Status of the job: ", job.state())