
# Agenda

### A- Introduction to CUDA-Q platform

### B- Quantum Circuit Basics

B.1- Qubit allocation

B.2- Quantum gates

B.3- Quantum kernel

B.4- Backends & running CUDA-Q programs

B.5- Examples

### C- Quantum algorithmic primitives

C.1- cudaq.sample()

- Mid-circuit measurement & conditional sampling

C.2- cudaq.observe()

- Spin Hamiltonian operator

### D- Parameterized circuit


### A- Introduction to QC and CUDA-Q platform

#### CUDA-Q stack

![img](./CUDA-Q.png)

- Single-source Python and C++ programming model
- High performance compiler for hybrid GPU/CPU/QPU systems
- QPU agnostic - works with any type of QPU, emulated or physical
- Supports both state-vector and tensor-network backends, which are optimized for NVIDIA GPUs and include multi-GPU, multi-node support for HPC.

#### CUDA-Q performance
- NVIDIA CUDA-Q can significantly speed up quantum algorithms compared with other quantum frameworks. Quantum algorithms can achieve a speedup of up to 2500X over CPU, and the number of qubits can be scaled up by using multiple GPUs.

![img](./QML-perfo.png)

#### Installation of CUDA-Q: visit [CUDA-Q installation](https://nvidia.github.io/cuda-quantum/latest/using/install/install.html)

To explore more, visit this [web page](https://developer.nvidia.com/cuda-q), [GitHub](https://github.com/NVIDIA/cuda-quantum), [documentation](https://nvidia.github.io/cuda-quantum/latest/#)

### B- Quantum circuit basics

![img](./basic-circuit.png)


### B.1- Qubit allocation

- cudaq.qubit(): a single quantum bit (2-level) in the discrete quantum memory space. 

```qubit=cudaq.qubit()```

- cudaq.qvector(N): a register of N quantum bits ($2^N$-level) in the discrete quantum memory space.

```qubits=cudaq.qvector(N)```

    
- Each allocated qubit is initialized to the |0> computational basis state.

- A `qubit` or `qvector` owns its quantum memory, so it cannot be copied or moved (no-cloning theorem). It can be passed by reference (i.e., as a reference to a qubit vector).
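
To make the "$2^N$-level" statement concrete, here is a small plain-Python sketch (not CUDA-Q code; the helper name `zero_state` is hypothetical) of the amplitudes a state-vector description needs for a freshly allocated register. It only assumes the standard convention that all amplitude sits on the all-zeros basis state after allocation.

```python
def zero_state(num_qubits: int) -> list[complex]:
    """Return the 2**num_qubits amplitudes of the |00...0> state."""
    amplitudes = [0j] * (2 ** num_qubits)
    amplitudes[0] = 1 + 0j  # all probability on the all-zeros basis state
    return amplitudes


for n in (1, 2, 3):
    state = zero_state(n)
    print(f"{n} qubit(s): {len(state)} amplitudes, first amplitude = {state[0]}")
```

The list doubles in length with every extra qubit, which is why the register is described as a $2^N$-level system.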


### B.2- Quantum gates


- x: Not gate (Pauli-X gate)

```python
q=cudaq.qubit()
x(q)
```
- h: Hadamard gate

```python
q=cudaq.qvector(2)
h(q[0])
```

- x.ctrl(control, target) or x.ctrl([control_1, control_2], target): controlled-NOT (C-NOT) gate with one or more control qubits

```python
q=cudaq.qvector(3)
x.ctrl(q[0],q[1])
```

- rx(angle, qubit): rotation around x-axis
```python
q=cudaq.qubit()
rx(np.pi,q)
```

- adj: adjoint (inverse) transformation of a gate; here rx.adj(angle, qubit) undoes rx(angle, qubit)
```python
q=cudaq.qubit()
rx(np.pi,q)
rx.adj(np.pi,q)
```

- mz: measure qubits in the computational basis

```python
q=cudaq.qvector(2)
h(q[0])
x.ctrl(q[0],q[1])
mz(q)
```


To learn more about the quantum operations available in CUDA-Q, visit [this page](https://nvidia.github.io/cuda-quantum/latest/specification/cudaq/kernels.html)
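
As an illustration of what the `adj` modifier means mathematically, the following plain-Python sketch (not CUDA-Q code; `rx_matrix`, `adjoint`, and `apply` are hypothetical helper names) builds the standard 2x2 matrix for an x-axis rotation, takes its conjugate transpose, and checks that applying the rotation followed by its adjoint returns the qubit to |0>.

```python
import math


def rx_matrix(angle: float) -> list[list[complex]]:
    """Standard matrix of a rotation by `angle` around the x-axis."""
    c, s = math.cos(angle / 2), math.sin(angle / 2)
    return [[c, -1j * s], [-1j * s, c]]


def adjoint(m):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(m[j][i]).conjugate() for j in range(2)] for i in range(2)]


def apply(m, state):
    """Multiply a 2x2 matrix with a 2-component state vector."""
    return [
        m[0][0] * state[0] + m[0][1] * state[1],
        m[1][0] * state[0] + m[1][1] * state[1],
    ]


state = [1 + 0j, 0j]                                      # |0>
after_rx = apply(rx_matrix(math.pi), state)               # like rx(np.pi, q)
restored = apply(adjoint(rx_matrix(math.pi)), after_rx)   # like rx.adj(np.pi, q)

print("after rx:    ", after_rx)
print("after rx.adj:", restored)
```

Up to floating-point rounding, the final state is |0> again, which is the behavior the `rx` / `rx.adj` pair above is meant to show.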

### B.3- Quantum kernel

- To differentiate between host and quantum device code, the CUDA-Q programming model defines the concept of a quantum kernel.

- All quantum kernels must be annotated to indicate they are to be compiled for, and executed on, a specified quantum coprocessor. 

- Other language bindings may opt to use other language features to enable function annotation or decoration (e.g. a `@cudaq.kernel()` function decorator in Python and `__qpu__` in C++).

- A quantum kernel can take classical data as input.


``` python
@cudaq.kernel()
def my_first_entry_point_kernel(x : float):
   ... quantum code ... 

@cudaq.kernel()
def my_second_entry_point_kernel(x : float, params : list[float]):
   ... quantum code ... 

```

- CUDA-Q kernels can serve as input to other quantum kernels and be invoked from within the kernel function body.


```python
@cudaq.kernel()
def MyStatePrep(qubits : cudaq.qview):
    ... apply state prep operations on qubits ... 

@cudaq.kernel()
def MyGenericAlgorithm(statePrep : typing.Callable[[cudaq.qview], None]):
    q = cudaq.qvector(10)
    statePrep(q)
    ...

MyGenericAlgorithm(MyStatePrep)
```

- ```cudaq.qview()```: a non-owning reference to a subset of the discrete quantum memory space. It does not own its elements and can therefore be passed by value or reference. (see [this page](https://nvidia.github.io/cuda-quantum/latest/specification/cudaq/types.html#quantum-containers))

- Inside a quantum kernel, vectors can only be constructed with a specified size.

```python
@cudaq.kernel
def kernel(N : int):

   # Not Allowed
   # i = []
   # i.append(1)

   # Allowed
   i = [0 for k in range(5)]
   j = [0 for _ in range(N)]
   i[2] = 3
   f = [1., 2., 3.]
   k = 0
   pi = 3.1415926

```

- To learn more about the CUDA-Q quantum kernel, visit [this page](https://github.com/NVIDIA/cuda-quantum/blob/main/docs/sphinx/specification/cudaq/kernels.rst)

### B.4- Backends & running CUDA-Q programs

There are two ways to select the target (backend):

1. Define the target when running the program:
``` python3 program.py [...] --target <target_name>```

2. Define the target in the application code:
```cudaq.set_target('target_name')``` . Then run the program without the target flag: 
```python3 program.py [...]```

What is target_name?

1. State vector simulators:
    - Single-GPU (default if an NVIDIA GPU and the CUDA runtime libraries are available): ```python3 program.py [...] --target nvidia``` 
    - Multi-GPU: ```mpirun -np 2 python3 program.py [...] --target nvidia-mgpu``` 
2. Tensor network simulator:
    - Single-GPU: ```python3 program.py [...] --target tensornet``` 
    - Multi-GPU: ```mpirun -np 2 python3 program.py [...] --target tensornet``` 
3. Matrix product state (MPS) simulator:
    - Only supports single-GPU simulation: ```python3 program.py [...] --target tensornet-mps``` 
4. NVIDIA Quantum Cloud
    - Run any of the above backends using NVIDIA-provided cloud GPUs (early access only). To learn more, visit [this page](https://www.nvidia.com/en-us/solutions/quantum-computing/cloud/).
    - E.g. `cudaq.set_target('nvqc', backend='tensornet')`
5. Quantum hardware backend (to learn more, visit [this page](https://nvidia.github.io/cuda-quantum/latest/using/backends/hardware.html)):
    - ```cudaq.set_target('QPU_name')```. QPU_name could be `ionq`, `quantinuum`, `iqm`, `oqc`, etc.


To learn more about CUDA-Q backends, visit [this page](https://nvidia.github.io/cuda-quantum/latest/using/backends/backends.html)
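
One reason multi-GPU state-vector targets exist is memory: a full state vector stores $2^N$ complex amplitudes. The back-of-the-envelope sketch below is plain Python, not a CUDA-Q API; `statevector_bytes` is a hypothetical helper, and the 8 bytes per amplitude (single-precision complex) is an assumption, since the precision used by a given target is not stated here. Pass 16 for double-precision complex.

```python
def statevector_bytes(num_qubits: int, bytes_per_amplitude: int = 8) -> int:
    """Memory needed to hold 2**num_qubits complex amplitudes."""
    return (2 ** num_qubits) * bytes_per_amplitude


for n in (20, 30, 34):
    gib = statevector_bytes(n) / 2**30
    print(f"{n} qubits: {gib:.3f} GiB")
```

Each additional qubit doubles the requirement, so beyond the memory of a single GPU the state has to be distributed across several devices or represented differently (for example as a tensor network).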

### B.5- Examples

In [2]:
# Single qubit example

import cudaq

# Set the backend target
cudaq.set_target('nvidia')

# We begin by defining the `Kernel` that we will construct our
# program with.
@cudaq.kernel()
def first_kernel():
    '''
    This is our first CUDA-Q kernel.
    '''
    # Next, we can allocate a single qubit to the kernel via `qubit()`.
    qubit = cudaq.qubit()

    # Now we can begin adding instructions to apply to this qubit!
    # Here we'll just add several non-parameterized
    # single-qubit gates that are supported by CUDA-Q.
    h(qubit)
    x(qubit)
    y(qubit)
    z(qubit)
    t(qubit)
    s(qubit)

    # Next, we add a measurement to the kernel so that we can sample
    # the measurement results on our simulator!
    mz(qubit)

print(cudaq.draw(first_kernel))


     ╭───╮╭───╮╭───╮╭───╮╭───╮╭───╮
q0 : ┤ h ├┤ x ├┤ y ├┤ z ├┤ t ├┤ s ├
     ╰───╯╰───╯╰───╯╰───╯╰───╯╰───╯



In [3]:
# Multi-qubit example

import cudaq

cudaq.set_target('nvidia')

@cudaq.kernel
def second_kernel(N:int):
    qubits=cudaq.qvector(N)

    h(qubits[0])
    x.ctrl(qubits[0],qubits[1])
    x.ctrl(qubits[0],qubits[2])
    x(qubits)

    mz(qubits)

print(cudaq.draw(second_kernel,3))

     ╭───╮          ╭───╮
q0 : ┤ h ├──●────●──┤ x ├
     ╰───╯╭─┴─╮  │  ├───┤
q1 : ─────┤ x ├──┼──┤ x ├
          ╰───╯╭─┴─╮├───┤
q2 : ──────────┤ x ├┤ x ├
               ╰───╯╰───╯



In [4]:
import cudaq

cudaq.set_target('nvidia')

@cudaq.kernel
def bar(N:int):
    qubits=cudaq.qvector(N)
    # front(n) and back() return direct references to the first n qubits and to the last qubit
    controls = qubits.front(N - 1)
    target = qubits.back()
    x.ctrl(controls, target)


print(cudaq.draw(bar,3))

          
q0 : ──●──
       │  
q1 : ──●──
     ╭─┴─╮
q2 : ┤ x ├
     ╰───╯
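
The effect of a multi-controlled X on computational basis states can be sketched in plain Python (not CUDA-Q code; `multi_controlled_x` is a hypothetical helper operating on bit strings): the target bit is flipped only when every control bit is 1.

```python
def multi_controlled_x(bits: str, controls: list[int], target: int) -> str:
    """Flip bits[target] if and only if all control bits are '1'."""
    if all(bits[c] == "1" for c in controls):
        flipped = "0" if bits[target] == "1" else "1"
        return bits[:target] + flipped + bits[target + 1:]
    return bits


# Same layout as the circuit above: q0 and q1 control, q2 is the target.
for bits in ("110", "100", "111"):
    print(bits, "->", multi_controlled_x(bits, controls=[0, 1], target=2))
```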



### C- Quantum Algorithmic Primitives

### C.1 cudaq.sample():

Sample the state of a given quantum circuit for a specified number of shots (circuit executions).

This function takes as input a quantum kernel instance followed by the concrete arguments at which the kernel should be invoked.

In [5]:
import cudaq

cudaq.set_target('nvidia')

@cudaq.kernel
def bell(N:int):
    qubits=cudaq.qvector(N)

    h(qubits[0])
    x.ctrl(qubits[0], qubits[1])

    mz(qubits)

print(cudaq.draw(bell,2))
# Sample the state generated by bell
# shots_count: the number of kernel executions. Default is 1000
counts = cudaq.sample(bell, 2, shots_count=10000) 

# Print to standard out
print(counts)

# Fine-grained access to the bits and counts 
for bits, count in counts.items():
    print('Observed: {}, {}'.format(bits, count))


     ╭───╮     
q0 : ┤ h ├──●──
     ╰───╯╭─┴─╮
q1 : ─────┤ x ├
          ╰───╯

{ 00:4900 11:5100 }

Observed: 00, 4900
Observed: 11, 5100
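
To see why the counts split roughly evenly between `00` and `11`, the sampling step can be mimicked in plain Python. This is only a sketch under the assumption of an ideal Bell state, where each of the two outcomes has probability 0.5; it does not use CUDA-Q, and `sample_bell` and its `seed` parameter are hypothetical.

```python
import random
from collections import Counter


def sample_bell(shots_count: int = 1000, seed: int = 0) -> Counter:
    """Draw shots_count outcomes from an ideal Bell-state distribution."""
    rng = random.Random(seed)
    outcomes = rng.choices(["00", "11"], weights=[0.5, 0.5], k=shots_count)
    return Counter(outcomes)


counts = sample_bell(shots_count=10000)
for bits, count in sorted(counts.items()):
    print(f"Observed: {bits}, {count}")
```

As with the real result above, the two counts fluctuate around half of the shots each, and the outcomes `01` and `10` never appear.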


In [6]:
import cudaq

cudaq.set_target('nvidia')

@cudaq.kernel
def third_example(N:int, theta:list[float]):
    qubit=cudaq.qvector(N)

    h(qubit)

    for i in range(0,N//2):
        ry(theta[i],qubit[i])
    

    x.ctrl([qubit[0],qubit[1]],qubit[2]) #ccx
    x.ctrl([qubit[0],qubit[1],qubit[2]],qubit[3]) #cccx
    x.ctrl(qubit[0:3],qubit[3]) #cccx using Python slicing syntax

    mz(qubit)

params=[0.15,1.5]

print(cudaq.draw(third_example, 4, params))

result=cudaq.sample(third_example, 4, params, shots_count=5000)

print('Result: ', result)

print('Most probable bit string: ', result.most_probable())   

     ╭───╮╭──────────╮               
q0 : ┤ h ├┤ ry(0.15) ├──●────●────●──
     ├───┤├─────────┬╯  │    │    │  
q1 : ┤ h ├┤ ry(1.5) ├───●────●────●──
     ├───┤╰─────────╯ ╭─┴─╮  │    │  
q2 : ┤ h ├────────────┤ x ├──●────●──
     ├───┤            ╰───╯╭─┴─╮╭─┴─╮
q3 : ┤ h ├─────────────────┤ x ├┤ x ├
     ╰───╯                 ╰───╯╰───╯

Result:  { 1000:1 1101:700 0100:541 1100:761 0110:526 1111:702 1110:714 1001:1 0101:506 0111:548 }

Most probable bit string:  1100


###  Mid-circuit measurement & conditional sampling

In [7]:
import cudaq

cudaq.set_target('nvidia')

@cudaq.kernel
def mid_circuit_m(theta:float):
    qubit=cudaq.qvector(2)
    ancilla=cudaq.qubit()

    ry(theta,ancilla)

    aux=mz(ancilla)
    if aux:
        x(qubit[0])
        x(ancilla)
    else:
        x(qubit[0])
        x(qubit[1])
    
    mz(ancilla)
    mz(qubit)

angle=0.5
result=cudaq.sample(mid_circuit_m, angle)
print(result)

{ 
  __global__ : { 100:66 110:934 }
   aux : { 1:66 0:934 }
}



- Here, we see that the mid-circuit measurement of the ancilla qubit was stored in a register named ```aux```.

- If any measurements appear in the kernel, then only the measured qubits will appear in the ```__global__``` register, and they will be sorted in qubit allocation order.

- To learn more about cudaq.sample(), visit [this page](https://nvidia.github.io/cuda-quantum/latest/specification/cudaq/algorithmic_primitives.html#cudaq-sample)
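
The split in the `aux` register can be checked by hand. Applying `ry(theta)` to |0> gives a probability of $\sin^2(\theta/2)$ of measuring 1, which for `theta = 0.5` is roughly 6%. The short plain-Python check below (not CUDA-Q code) computes that probability and the corresponding expected count for the default 1000 shots.

```python
import math

theta = 0.5
shots = 1000

# Probability that the ancilla is measured as 1 after ry(theta) on |0>.
p_one = math.sin(theta / 2) ** 2

print(f"P(aux = 1) = {p_one:.4f}")
print(f"Expected aux = 1 counts in {shots} shots: {p_one * shots:.1f}")
```

The observed 66 out of 1000 is within the statistical fluctuation expected around this value.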

### C.2 cudaq.observe()

- A common task in variational algorithms is the computation of the expected value of a given observable with respect to a parameterized quantum circuit: $\langle H \rangle(\Theta) = \langle \psi(\Theta) | H | \psi(\Theta) \rangle$.

- The `cudaq.observe()` function computes this expectation value by executing the parameterized quantum circuit.

- In the example below, the observable H is $H= 5.907 \, I - 2.1433 \, X_0X_1 -2.1433\, Y_0 Y_1 + 0.21829 \, Z_0 -6.125\, Z_1$

In [8]:
# The example here shows a simple use case for the `cudaq.observe`
# function in computing expected values of provided spin Hamiltonian operators.

import cudaq
from cudaq import spin

cudaq.set_target('nvidia')

qubit_num=2

@cudaq.kernel
def init_state(qubits:cudaq.qview):
    n=qubits.size()
    for i in range(n):
        x(qubits[i])

@cudaq.kernel
def observe_example(theta: float):
    qvector = cudaq.qvector(qubit_num)

    init_state(qvector)
    ry(theta, qvector[1])
    x.ctrl(qvector[1], qvector[0])


spin_operator = 5.907 - 2.1433 * spin.x(0) * spin.x(1) - 2.1433 * spin.y(
    0) * spin.y(1) + .21829 * spin.z(0) - 6.125 * spin.z(1)

# Angle at which the energy expectation of the `spin_operator` is evaluated.
angle = 0.59

energy = cudaq.observe(observe_example, spin_operator, angle).expectation()
print(f"Energy is {energy}")


Energy is 13.562794135947076
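
As a cross-check of what `cudaq.observe()` computes, the same expectation value can be reproduced with a tiny state-vector calculation in plain Python. This is a sketch, not CUDA-Q code: the helpers `apply_1q`, `apply_cnot`, `pauli_expectation`, and `energy` are hypothetical, the state is stored as a dictionary from bit tuples `(q0, q1)` to amplitudes, and the gate matrices are the standard textbook definitions.

```python
import math


def apply_1q(state, gate, q):
    """Apply a 2x2 gate to qubit q of a {bit tuple: amplitude} state."""
    new_state = {}
    for bits, amp in state.items():
        for out_bit in (0, 1):
            coeff = gate[out_bit][bits[q]]
            if coeff != 0:
                new_bits = bits[:q] + (out_bit,) + bits[q + 1:]
                new_state[new_bits] = new_state.get(new_bits, 0) + coeff * amp
    return new_state


def apply_cnot(state, control, target):
    """Flip the target bit of every basis state whose control bit is 1."""
    new_state = {}
    for bits, amp in state.items():
        new_bits = list(bits)
        if bits[control] == 1:
            new_bits[target] = 1 - bits[target]
        new_state[tuple(new_bits)] = amp
    return new_state


def pauli_expectation(state, paulis):
    """<psi|P|psi> for a Pauli string given as {qubit index: 'X' | 'Y' | 'Z'}."""
    total = 0
    for bits, amp in state.items():
        phase = 1
        new_bits = list(bits)
        for q, p in paulis.items():
            if p == "X":
                new_bits[q] = 1 - bits[q]
            elif p == "Y":
                new_bits[q] = 1 - bits[q]
                phase *= 1j if bits[q] == 0 else -1j
            else:  # "Z"
                phase *= -1 if bits[q] == 1 else 1
        bra = complex(state.get(tuple(new_bits), 0)).conjugate()
        total += bra * phase * amp
    return complex(total).real


def energy(theta):
    """Mirror observe_example: x on both qubits, ry on q1, CNOT q1 -> q0."""
    x_gate = [[0, 1], [1, 0]]
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    ry_gate = [[c, -s], [s, c]]

    state = {(0, 0): 1.0}
    state = apply_1q(state, x_gate, 0)
    state = apply_1q(state, x_gate, 1)
    state = apply_1q(state, ry_gate, 1)
    state = apply_cnot(state, control=1, target=0)

    terms = [
        (-2.1433, {0: "X", 1: "X"}),
        (-2.1433, {0: "Y", 1: "Y"}),
        (0.21829, {0: "Z"}),
        (-6.125, {1: "Z"}),
    ]
    # The identity term contributes its coefficient directly.
    return 5.907 + sum(coeff * pauli_expectation(state, p) for coeff, p in terms)


print(f"Energy is {energy(0.59):.6f}")
```

The result should land close to the value printed by `cudaq.observe()` above; small differences in the last digits are expected from floating-point precision.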


### Spin Hamiltonian operator

CUDA-Q defines convenience functions in the `cudaq.spin` namespace that produce the primitive X, Y, and Z Pauli operators on specified qubit indices. These can then be combined in algebraic expressions to build up more complicated Pauli tensor products and their sums.

$H= 5.907 \, I - 2.1433 \, X_0X_1 -2.1433\, Y_0 Y_1 + 0.21829 \, Z_0 -6.125\, Z_1$

```python
spin_operator = 5.907 - 2.1433 * spin.x(0) * spin.x(1) - 2.1433 * spin.y(
    0) * spin.y(1) + .21829 * spin.z(0) - 6.125 * spin.z(1)
```

In [9]:
from cudaq import spin

hamiltonian = 0.5*spin.z(0) + spin.x(1) + spin.y(0) + spin.y(0) * spin.y(1)+ spin.x(0)*spin.y(1)*spin.z(2)

# add some more terms
for i in range(2):
  hamiltonian += -2.0*spin.z(i)*spin.z(i+1)

print(hamiltonian)

print('Total number of terms in the spin hamiltonian: ',hamiltonian.get_term_count())


[-2+0j] IZZ
[-2+0j] ZZI
[1+0j] XYZ
[0.5+0j] ZII
[1+0j] YII
[1+0j] IXI
[1+0j] YYI

Total number of terms in the spin hamiltonian:  7


### D- Parameterized Circuit

In [10]:
import cudaq
from cudaq import spin

cudaq.set_target("nvidia")

@cudaq.kernel
def param_circuit(theta: list[float]):
    # Allocate a qubit that is initialised to the |0> state.
    qubit = cudaq.qubit()
    # Define gates and the qubits they act upon.
    rx(theta[0], qubit)
    ry(theta[1], qubit)


# Our Hamiltonian is the Pauli-Z operator on our qubit, so `observe` returns its Z expectation value.
hamiltonian = spin.z(0)

# Initial gate parameters which initialize the qubit in the zero state
parameters = [0.0, 0.0]

print(cudaq.draw(param_circuit,parameters))

# Compute the expectation value using the initial parameters.
expectation_value = cudaq.observe(param_circuit, hamiltonian,parameters).expectation()

print('Expectation value of the Hamiltonian: ', expectation_value)



     ╭───────╮╭───────╮
q0 : ┤ rx(0) ├┤ ry(0) ├
     ╰───────╯╰───────╯

Expectation value of the Hamiltonian:  1.0
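
To get a feel for how the expectation value depends on the parameters, the single-qubit circuit can be evaluated by hand. The sketch below is plain Python rather than CUDA-Q; `z_expectation` is a hypothetical helper that applies the standard `rx` and `ry` matrices to |0> and returns the probability of 0 minus the probability of 1, which is the Z expectation value.

```python
import math


def z_expectation(theta):
    """<Z> after rx(theta[0]) followed by ry(theta[1]) on |0>."""
    a, b = theta
    ca, sa = math.cos(a / 2), math.sin(a / 2)
    cb, sb = math.cos(b / 2), math.sin(b / 2)

    # rx(a) applied to |0> gives cos(a/2)|0> - i sin(a/2)|1>.
    amp0, amp1 = complex(ca, 0.0), complex(0.0, -sa)

    # ry(b) = [[cos(b/2), -sin(b/2)], [sin(b/2), cos(b/2)]].
    out0 = cb * amp0 - sb * amp1
    out1 = sb * amp0 + cb * amp1

    return abs(out0) ** 2 - abs(out1) ** 2


for params in ([0.0, 0.0], [math.pi / 2, 0.0], [math.pi, 0.0]):
    print(params, round(z_expectation(params), 6))
```

With both angles at zero the qubit stays in |0> and the value is 1, matching the output above; rotating by pi around the x-axis flips the qubit and the value becomes -1. A variational algorithm would hand such a function to a classical optimizer to search for the parameters that minimize it.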


### Inspecting the MLIR and QIR generated from your code

In [1]:
import cudaq

cudaq.set_target('nvidia')

@cudaq.kernel
def kernel(N : int):
    q = cudaq.qvector(N)
    h(q[0])
    for i in range(N-1):
        x.ctrl(q[i], q[i+1])

# Look at the MLIR 
print(kernel)

# Look at the QIR
print(cudaq.translate(kernel, format="qir"))

module attributes {quake.mangled_name_map = {__nvqpp__mlirgen__kernel = "__nvqpp__mlirgen__kernel_PyKernelEntryPointRewrite"}} {
  func.func @__nvqpp__mlirgen__kernel(%arg0: i64) attributes {"cudaq-entrypoint"} {
    %c1_i64 = arith.constant 1 : i64
    %c0_i64 = arith.constant 0 : i64
    %0 = cc.alloca i64
    cc.store %arg0, %0 : !cc.ptr<i64>
    %1 = cc.load %0 : !cc.ptr<i64>
    %2 = quake.alloca !quake.veq<?>[%1 : i64]
    %3 = quake.extract_ref %2[0] : (!quake.veq<?>) -> !quake.ref
    quake.h %3 : (!quake.ref) -> ()
    %4 = cc.load %0 : !cc.ptr<i64>
    %5 = arith.subi %4, %c1_i64 : i64
    %6 = cc.loop while ((%arg1 = %c0_i64) -> (i64)) {
      %7 = arith.cmpi slt, %arg1, %5 : i64
      cc.condition %7(%arg1 : i64)
    } do {
    ^bb0(%arg1: i64):
      %7 = quake.extract_ref %2[%arg1] : (!quake.veq<?>, i64) -> !quake.ref
      %8 = arith.addi %arg1, %c1_i64 : i64
      %9 = quake.extract_ref %2[%8] : (!quake.veq<?>, i64) -> !quake.ref
      quake.x [%7] %9 : (!quake.ref, !qu