# CUDA-Q Introduction

## Installation of CUDA-Q

- Visit [CUDA-Q Quick Start](https://nvidia.github.io/cuda-quantum/latest/using/quick_start.html)
- To explore more, visit [CUDA-Q installation](https://nvidia.github.io/cuda-quantum/latest/using/install/install.html)

## Quantum Circuit basics

This notebook shows how to create and execute quantum circuits with CUDA-Q.

Example of Quantum Circuit

In [None]:
import numpy as np

In [None]:
from cudaq.qis import *

In [None]:
import cudaq

In [None]:
@cudaq.kernel
def circuit():
    qubits = cudaq.qvector(3)
    h(qubits[0])
    cx(qubits[0], qubits[1])
    cx(qubits[1], qubits[2])

In [1]:
print(cudaq.draw(circuit))

     ╭───╮          
q0 : ┤ h ├──●───────
     ╰───╯╭─┴─╮     
q1 : ─────┤ x ├──●──
          ╰───╯╭─┴─╮
q2 : ──────────┤ x ├
               ╰───╯



### Qubit allocation

- `cudaq.qubit()`: a single quantum bit (2-level) in the discrete quantum memory space.

```python
qubit = cudaq.qubit()
```

- `cudaq.qvector(n)`: a register of `n` quantum bits ($2^n$-dimensional state space) in the discrete quantum memory space.

```python
qubits = cudaq.qvector(n)
```

    
- Qubits are initialized to the |0> computational basis state.

- A `qubit` or `qvector` owns its quantum memory, so it cannot be copied or moved (no-cloning theorem). It can be passed by reference (i.e., as a reference to the qubit vector).
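
To make the $2^n$ scaling concrete, here is a minimal NumPy sketch of the state vector that a freshly allocated register corresponds to. It is independent of CUDA-Q, and the helper name `zero_state` is hypothetical, introduced only for illustration.

```python
import numpy as np


def zero_state(n: int) -> np.ndarray:
    """Return the 2**n-amplitude state vector of n qubits in |00...0>."""
    state = np.zeros(2**n, dtype=complex)
    state[0] = 1.0  # all probability sits on the all-zeros basis state
    return state


single = zero_state(1)    # 2 amplitudes, analogous to cudaq.qubit()
register = zero_state(3)  # 8 amplitudes, analogous to cudaq.qvector(3)
print(single, register.shape)
```

Each additional qubit doubles the number of amplitudes, which is why state-vector simulation becomes expensive quickly.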

### Quantum Operations


- `x`: Not gate (Pauli-X gate)

```python
q = cudaq.qubit()
x(q)
```
- `h`: Hadamard gate

```python
q = cudaq.qvector(2)
h(q[0])
```

- `x.ctrl(control, target)` or `x.ctrl([control_1, control_2], target)`: CNOT (controlled-NOT) gate; with a list of controls it acts as a multi-controlled NOT (a Toffoli gate for two controls)

```python
q = cudaq.qvector(3)
x.ctrl(q[0], q[1])
cx(q[0], q[1])  # alias of x.ctrl
```

- `rx(angle, qubit)`: rotation around x-axis
```python
q=cudaq.qubit()
rx(np.pi, q)
```

- `adj`: adjoint (inverse) of a gate; for example, `rx.adj(angle, qubit)` undoes `rx(angle, qubit)`
```python
q=cudaq.qubit()
rx(np.pi, q)
rx.adj(np.pi, q)
```

- `mz`: measure qubits in the computational basis

```python
q=cudaq.qvector(2)
h(q[0])
x.ctrl(q[0], q[1])
mz(q)
```

To learn more about the quantum operations available in CUDA-Q, visit [this page](https://nvidia.github.io/cuda-quantum/latest/specification/cudaq/kernels.html).
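
The effect of `adj` can be checked with plain linear algebra. The sketch below is independent of CUDA-Q and uses the standard matrix for a rotation around the x-axis; the helper name `rx_matrix` is hypothetical. It shows that applying a rotation followed by its adjoint leaves the qubit unchanged.

```python
import numpy as np


def rx_matrix(angle: float) -> np.ndarray:
    """Standard 2x2 matrix of a rotation by `angle` around the x-axis."""
    c, s = np.cos(angle / 2), np.sin(angle / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])


theta = np.pi
forward = rx_matrix(theta)
adjoint = forward.conj().T  # conjugate transpose of the gate matrix

# Rotation followed by its adjoint is the identity operation.
round_trip = adjoint @ forward
print(np.allclose(round_trip, np.eye(2)))

# For a rotation gate, the adjoint is the rotation by the opposite angle.
print(np.allclose(adjoint, rx_matrix(-theta)))
```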

Gate examples

In [None]:
@cudaq.kernel
def do_nothing():
    q = cudaq.qubit()

In [None]:
@cudaq.kernel
def x_gate():
    q = cudaq.qubit()
    x(q)

In [None]:
@cudaq.kernel
def h_gate():
    q = cudaq.qubit()
    h(q)

In [2]:
@cudaq.kernel
def bell():
    q = cudaq.qvector(2)
    h(q[0])    
    x.ctrl(q[0], q[1])
#    cx(q[0], q[1])

In [3]:
print("initial state:", np.array(cudaq.get_state(do_nothing)))

initial state: [1.+0.j 0.+0.j]


In [4]:
print("apply X:", np.array(cudaq.get_state(x_gate)))

apply X: [0.+0.j 1.+0.j]


In [5]:
print("apply H:", np.array(cudaq.get_state(h_gate)))

apply H: [0.70710677+0.j 0.70710677+0.j]


In [6]:
print("Bell state:", np.array(cudaq.get_state(bell)))

Bell state: [0.70710677+0.j 0.        +0.j 0.        +0.j 0.70710677+0.j]
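
The state vectors printed above can be reproduced with ordinary matrix algebra. The following NumPy sketch is independent of CUDA-Q; the variable names are illustrative, and the Kronecker-product ordering (first qubit as the left factor) is a convention chosen for this sketch.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
I2 = np.eye(2, dtype=complex)

# CNOT with the first qubit as control and the second as target,
# written in the basis order |00>, |01>, |10>, |11>.
CNOT = np.array(
    [[1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0]],
    dtype=complex,
)

zero = np.array([1, 0], dtype=complex)
zero_zero = np.kron(zero, zero)

x_state = X @ zero                             # |1>
h_state = H @ zero                             # (|0> + |1>) / sqrt(2)
bell_state = CNOT @ np.kron(H, I2) @ zero_zero  # (|00> + |11>) / sqrt(2)

print("apply X:", x_state)
print("apply H:", h_state)
print("Bell state:", bell_state)
```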


### Quantum kernel

- To differentiate between host and quantum device code, the CUDA-Q programming model defines the concept of a quantum kernel.

- All quantum kernels must be annotated to indicate they are to be compiled for, and executed on, a specified quantum coprocessor.

- Each language binding uses its own language feature for this annotation (e.g. the `@cudaq.kernel` function decorator in Python and `__qpu__` in C++).

- A quantum kernel can take classical data as input.

``` python
@cudaq.kernel()
def my_first_entry_point_kernel(x : float):
   ... quantum code ...

@cudaq.kernel()
def my_second_entry_point_kernel(x : float, params : list[float]):
   ... quantum code ...

```

- CUDA-Q kernels can be passed as arguments to other quantum kernels and invoked from within the kernel body.


```python
@cudaq.kernel()
def my_state_prep(qubits : cudaq.qview):
    ... apply state prep operations on qubits ...

@cudaq.kernel()
def my_generic_algorithm(state_prep : Callable[[cudaq.qview], None]):
    q = cudaq.qvector(10)
    state_prep(q)
    ...

my_generic_algorithm(my_state_prep)
```

- `cudaq.qview`: a non-owning reference to a subset of the discrete quantum memory space. It does not own its elements and can therefore be passed by value or reference. (see [this page](https://nvidia.github.io/cuda-quantum/latest/specification/cudaq/types.html#quantum-containers))

- Vectors (Python lists) inside a quantum kernel can only be constructed with a specified size; growing an empty list with `append` is not allowed.

```python
@cudaq.kernel
def kernel(n: int):

   # Not Allowed
   # i = []
   # i.append(1)

   # Allowed
   i = [0 for k in range(5)]
   j = [0 for _ in range(n)]
   i[2] = 3
   f = [1., 2., 3.]
   k = 0
   pi = 3.1415926

```

- To learn more about the CUDA-Q quantum kernel, visit [this page](https://github.com/NVIDIA/cuda-quantum/blob/main/docs/sphinx/specification/cudaq/kernels.rst).

### Code Examples

Single qubit example

In [None]:
from cudaq.qis import *

In [None]:
import cudaq

In [None]:
# We begin by defining the `Kernel` that we will construct our
# program with.
@cudaq.kernel()
def first_kernel():
    """
    This is our first CUDA-Q kernel.
    """
    # Next, we can allocate a single qubit to the kernel via `qubit()`.
    qubit = cudaq.qubit()

    # Now we can begin adding instructions to apply to this qubit!
    # Here we apply several non-parameterized
    # single-qubit gates that are supported by CUDA-Q.
    h(qubit)
    x(qubit)
    y(qubit)
    z(qubit)
    t(qubit)
    s(qubit)

    # Next, we add a measurement to the kernel so that we can sample
    # the measurement results on our simulator!
    mz(qubit)

In [7]:
print(cudaq.draw(first_kernel))

     ╭───╮╭───╮╭───╮╭───╮╭───╮╭───╮
q0 : ┤ h ├┤ x ├┤ y ├┤ z ├┤ t ├┤ s ├
     ╰───╯╰───╯╰───╯╰───╯╰───╯╰───╯



Multi-qubit example

In [None]:
import cudaq

In [None]:
@cudaq.kernel
def second_kernel(num_qubits: int):
    qubits = cudaq.qvector(num_qubits)

    h(qubits[0])
    x.ctrl(qubits[0], qubits[1])
    cx(qubits[0], qubits[2])  # cx is also ok
    x(qubits[0:4])

    mz(qubits)

In [8]:
print(cudaq.draw(second_kernel, 5))

     ╭───╮          ╭───╮
q0 : ┤ h ├──●────●──┤ x ├
     ╰───╯╭─┴─╮  │  ├───┤
q1 : ─────┤ x ├──┼──┤ x ├
          ╰───╯╭─┴─╮├───┤
q2 : ──────────┤ x ├┤ x ├
     ╭───╮     ╰───╯╰───╯
q3 : ┤ x ├───────────────
     ╰───╯               



In [None]:
import cudaq

In [None]:
@cudaq.kernel
def bar(num_qubits: int):
    qubits = cudaq.qvector(num_qubits)
    h(qubits[0:3])
    controls = qubits[0:-1]
    target = qubits[-1]

    x.ctrl(controls, target)

In [9]:
print(cudaq.draw(bar, 10))

     ╭───╮     
q0 : ┤ h ├──●──
     ├───┤  │  
q1 : ┤ h ├──●──
     ├───┤  │  
q2 : ┤ h ├──●──
     ╰───╯  │  
q3 : ───────●──
            │  
q4 : ───────●──
            │  
q5 : ───────●──
            │  
q6 : ───────●──
            │  
q7 : ───────●──
            │  
q8 : ───────●──
          ╭─┴─╮
q9 : ─────┤ x ├
          ╰───╯



<div class="alert alert-block alert-success">

### Exercise 1

Now you can write quantum kernels! Write a kernel that creates the GHZ state for $n$ qubits, $\frac{1}{\sqrt{2}}(|00\dots 0\rangle + |11\dots 1\rangle)$.

**Advanced**: Assume that the qubits are connected in a one-dimensional chain. Build the circuit so that the depth of two-qubit gates is as small as possible.
</div>

In [None]:
@cudaq.kernel
def ghz(num_qubits: int):
    q = cudaq.qvector(num_qubits)
    # Write your code here

In [10]:
print(cudaq.draw(ghz, 10))




## Execute quantum kernels

### Function call

A kernel can be executed by calling it like a regular function. To return a result, the kernel must annotate its return type and return the value explicitly.

In [None]:
@cudaq.kernel
def bit_flip(flip: bool = True) -> bool:
    qubit = cudaq.qubit()
    if flip:
        x(qubit)
    result = mz(qubit)
    return result

In [11]:
print(bit_flip(False))

False


### cudaq.sample()

Samples a given quantum kernel for a specified number of shots (circuit executions).

This function takes as input a quantum kernel instance followed by the concrete arguments at which the kernel should be invoked.

In [None]:
import cudaq

In [None]:
@cudaq.kernel
def bell(num_qubits: int):
    qubits = cudaq.qvector(num_qubits)

    h(qubits[0])
    x.ctrl(qubits[0], qubits[1])

    mz(qubits)

In [None]:
print(cudaq.draw(bell, 2))
# Sample the state generated by bell
# shots_count: the number of kernel executions. Default is 1000
counts = cudaq.sample(bell, 2, shots_count=10000)

In [None]:
# Print to standard out
print(counts)

In [12]:
# Fine-grained access to the bits and counts
for bits, count in counts.items():
    print(f"Observed {bits}: {count}")

     ╭───╮     
q0 : ┤ h ├──●──
     ╰───╯╭─┴─╮
q1 : ─────┤ x ├
          ╰───╯

{ 00:4969 11:5031 }

Observed 00: 4969
Observed 11: 5031
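
Conceptually, sampling draws bit strings with probabilities given by the squared magnitudes of the state-vector amplitudes. The NumPy sketch below mimics this for the Bell state. It is an illustration under that assumption, not how `cudaq.sample()` is implemented, and the seed and variable names are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Bell state (|00> + |11>) / sqrt(2) in the basis order 00, 01, 10, 11.
bell_state = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)

# Born rule: outcome probabilities are the squared amplitude magnitudes.
probabilities = np.abs(bell_state) ** 2
probabilities /= probabilities.sum()  # guard against rounding error

shots = 10000
outcomes = rng.choice(len(probabilities), size=shots, p=probabilities)

# Tally the outcomes into a {bit string: count} dictionary.
tallies = np.bincount(outcomes, minlength=len(probabilities))
counts = {format(i, "02b"): int(c) for i, c in enumerate(tallies) if c > 0}
print(counts)
```

Only `00` and `11` appear, each in roughly half of the shots, which matches the counts reported above up to statistical fluctuation.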


In [None]:
import cudaq

In [None]:
@cudaq.kernel
def third_example(num_qubits: int, theta: list[float]):
    qubit = cudaq.qvector(num_qubits)

    h(qubit)

    for i in range(0, num_qubits // 2):
        ry(theta[i], qubit[i])

    x.ctrl([qubit[0], qubit[1]], qubit[2])  # ccx
    x.ctrl([qubit[0], qubit[1], qubit[2]], qubit[3])  # cccx
    x.ctrl(qubit[0:3], qubit[3])  # cccx using Python slicing syntax

    mz(qubit)

In [None]:
params = [0.15, 1.5]

In [None]:
print(cudaq.draw(third_example, 4, params))

In [None]:
result = cudaq.sample(third_example, 4, params, shots_count=5000)

In [None]:
print("Result: ", result)

In [13]:
print("Most probable bit string: ", result.most_probable())  # bit string with the highest count

     ╭───╮╭──────────╮               
q0 : ┤ h ├┤ ry(0.15) ├──●────●────●──
     ├───┤├─────────┬╯  │    │    │  
q1 : ┤ h ├┤ ry(1.5) ├───●────●────●──
     ├───┤╰─────────╯ ╭─┴─╮  │    │  
q2 : ┤ h ├────────────┤ x ├──●────●──
     ├───┤            ╰───╯╭─┴─╮╭─┴─╮
q3 : ┤ h ├─────────────────┤ x ├┤ x ├
     ╰───╯                 ╰───╯╰───╯

Result:  { 0000:1 0100:533 0101:540 0110:556 0111:543 1000:3 1001:2 1100:748 1101:729 1110:720 1111:625 }

Most probable bit string:  1100


In [None]:
from typing import Callable

In [None]:
@cudaq.kernel()
def my_state_prep(qubits: cudaq.qview):
    for i in range(qubits.size // 2):
        x(qubits[i])

In [None]:
@cudaq.kernel()
def my_generic_algorithm(state_prep: Callable[[cudaq.qview], None]):
    q = cudaq.qvector(10)
    state_prep(q)

In [14]:
print(cudaq.sample(my_generic_algorithm, my_state_prep))

{ 1111100000:1000 }



### Mid-circuit measurement & conditional sampling

In [None]:
import cudaq

In [None]:
@cudaq.kernel
def mid_circuit_m(theta: float):
    qubit = cudaq.qvector(2)
    ancilla = cudaq.qubit()

    ry(theta, ancilla)

    aux = mz(ancilla)
    if aux:
        x(qubit[0])
        x(ancilla)
    else:
        x(qubit[0])
        x(qubit[1])

    anc = mz(ancilla)
    sys = mz(qubit)

In [None]:
angle = 0.5

In [15]:
result = cudaq.sample(mid_circuit_m, angle)
print(result)

{ 
  __global__ : { 100:61 110:939 }
   anc : { 0:1000 }
   aux : { 0:939 1:61 }
   sys : { 10:61 11:939 }
}



- Here, we see that we have measured the ancilla qubit to a register named ```aux```.

- If any measurements appear in the kernel, then only the measured qubits will appear in the ```__global__``` register, and they will be sorted in qubit allocation order.

- To learn more about cudaq.sample(), visit [this page](https://nvidia.github.io/cuda-quantum/latest/specification/cudaq/algorithmic_primitives.html#cudaq-sample).
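
The `aux` counts above can be sanity-checked by hand. After `ry(theta)` acts on a qubit in |0>, the probability of measuring 1 is $\sin^2(\theta/2)$. The short sketch below evaluates this for `theta = 0.5`; the variable names are illustrative.

```python
import math

theta = 0.5

# ry(theta)|0> = cos(theta/2)|0> + sin(theta/2)|1>
p_one = math.sin(theta / 2) ** 2
p_zero = math.cos(theta / 2) ** 2

shots = 1000
expected_ones = shots * p_one
print(f"P(aux = 1) = {p_one:.4f}, expected count in {shots} shots: {expected_ones:.1f}")
```

The expected count of about 61 out of 1000 shots is consistent with the `aux : { 0:939 1:61 }` register shown in the output, allowing for shot noise.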

<div class="alert alert-block alert-success">
    
### Exercise 2

Run your `ghz` kernel with `cudaq.sample()`, setting `shots_count=10000`. Do you obtain the expected result?

</div>

Write your code here!

### cudaq.observe()

- A common task in variational algorithms is the computation of the expected value of a given observable with respect to a parameterized quantum circuit ($\langle H\rangle_\theta = \langle \psi(\theta)\mid H \mid\psi(\theta)\rangle$).

- The `cudaq.observe()` function is provided to enable one to quickly compute this expectation value via execution of the parameterized quantum circuit.

- In the example below, the observable is $H= 5.907 \, I - 2.1433 \, X_0X_1 -2.1433\, Y_0 Y_1 + 0.21829 \, Z_0 -6.125\, Z_1$.

The example here shows a simple use case for the `cudaq.observe()` function in computing expected values of provided spin Hamiltonian operators.

In [None]:
import cudaq
from cudaq import spin

In [None]:
num_qubits = 2

In [None]:
@cudaq.kernel
def init_state(qubits: cudaq.qview):
    n = qubits.size()
    for i in range(n):
        x(qubits[i])

In [None]:
@cudaq.kernel
def observe_example(theta: float):
    qvector = cudaq.qvector(num_qubits)

    init_state(qvector)
    ry(theta, qvector[1])
    x.ctrl(qvector[1], qvector[0])

In [None]:
spin_operator = (
    5.907
    - 2.1433 * spin.x(0) * spin.x(1)
    - 2.1433 * spin.y(0) * spin.y(1)
    + 0.21829 * spin.z(0)
    - 6.125 * spin.z(1)
)

In [None]:
# Rotation angle at which the energy expectation of the `spin_operator` is evaluated.
angle = 0.59

In [17]:
energy = cudaq.observe(observe_example, spin_operator, angle).expectation()
print(f"Energy is {energy}")

Energy is 13.562794135947076
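
As a cross-check of the value printed above, the same expectation value can be computed with dense matrices. This NumPy sketch is independent of CUDA-Q. It assumes the standard `ry` matrix and treats qubit 0 as the left factor of each Kronecker product, and the names `ry_matrix`, `CNOT_10`, `H_matrix` and `energy` are hypothetical.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def ry_matrix(theta: float) -> np.ndarray:
    """Standard 2x2 matrix of a rotation by `theta` around the y-axis."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)


# CNOT with qubit 1 as control and qubit 0 as target, basis order |q0 q1>.
CNOT_10 = np.array(
    [[1, 0, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0],
     [0, 1, 0, 0]],
    dtype=complex,
)

# Dense matrix of the observable H from the text.
H_matrix = (
    5.907 * np.eye(4)
    - 2.1433 * np.kron(X, X)
    - 2.1433 * np.kron(Y, Y)
    + 0.21829 * np.kron(Z, I2)
    - 6.125 * np.kron(I2, Z)
)


def energy(theta: float) -> float:
    """<psi(theta)| H |psi(theta)> for the circuit in observe_example."""
    state = np.zeros(4, dtype=complex)
    state[0] = 1.0                                   # |00>
    state = np.kron(X, X) @ state                    # init_state: x on both qubits
    state = np.kron(I2, ry_matrix(theta)) @ state    # ry(theta) on qubit 1
    state = CNOT_10 @ state                          # x.ctrl(qubit 1, qubit 0)
    return float(np.real(np.vdot(state, H_matrix @ state)))


print(f"Energy is {energy(0.59)}")
```

Evaluating `energy` over a range of angles is a quick way to see how the expectation value varies with `theta` for this particular circuit.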


### Spin Hamiltonian operator

CUDA-Q defines convenience functions in the `cudaq.spin` namespace that produce the primitive X, Y, and Z Pauli operators on specified qubit indices. These can then be combined in algebraic expressions to build up more complicated Pauli tensor products and their sums.

$H= 5.907 \, I - 2.1433 \, X_0X_1 -2.1433\, Y_0 Y_1 + 0.21829 \, Z_0 -6.125\, Z_1$

```python
spin_operator = (
    5.907
    - 2.1433 * spin.x(0) * spin.x(1)
    - 2.1433 * spin.y(0) * spin.y(1)
    + 0.21829 * spin.z(0)
    - 6.125 * spin.z(1)
)
```

In [None]:
from cudaq import spin

In [None]:
hamiltonian = (
    0.5 * spin.z(0)
    + spin.x(1)
    + spin.y(0)
    + spin.y(0) * spin.y(1)
    + spin.x(0) * spin.y(1) * spin.z(2)
)

In [None]:
# add some more terms
for i in range(2):
    hamiltonian += -2.0 * spin.z(i) * spin.z(i + 1)

In [None]:
print(hamiltonian)

In [18]:
print("Total number of terms in the spin hamiltonian: ", hamiltonian.get_term_count())

[1+0j] IXI
[-2+0j] IZZ
[1+0j] XYZ
[1+0j] YYI
[0.5+0j] ZII
[-2+0j] ZZI
[1+0j] YII

Total number of terms in the spin hamiltonian:  7


### Parameterized Circuit

In [None]:
import cudaq
from cudaq import spin

In [None]:
@cudaq.kernel
def param_circuit(theta: list[float]):
    # Allocate a qubit that is initialised to the |0> state.
    qubit = cudaq.qubit()
    # Define gates and the qubits they act upon.
    rx(theta[0], qubit)
    ry(theta[1], qubit)

In [None]:
# Our Hamiltonian is the Pauli-Z operator on the qubit, so `observe` returns its Z expectation value.
hamiltonian = spin.z(0)

In [None]:
# Initial gate parameters, which leave the qubit in the |0> state
parameters = [0.0, 0.0]

In [None]:
print(cudaq.draw(param_circuit, parameters))

In [None]:
# Compute the expectation value using the initial parameters.
expectation_value = cudaq.observe(param_circuit, hamiltonian, parameters).expectation()

In [19]:
print("Expectation value of the Hamiltonian: ", expectation_value)

     ╭───────╮╭───────╮
q0 : ┤ rx(0) ├┤ ry(0) ├
     ╰───────╯╰───────╯

Expectation value of the Hamiltonian:  1.0
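
For this one-qubit circuit, the expectation value has a simple closed form, $\langle Z\rangle = \cos\theta_0 \cos\theta_1$, which gives 1.0 at `[0.0, 0.0]` as printed above. The NumPy sketch below checks this numerically. It is independent of CUDA-Q, assumes the standard `rx` and `ry` matrices, and the helper names are hypothetical.

```python
import numpy as np


def rx_matrix(theta: float) -> np.ndarray:
    """Standard 2x2 matrix of a rotation by `theta` around the x-axis."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])


def ry_matrix(theta: float) -> np.ndarray:
    """Standard 2x2 matrix of a rotation by `theta` around the y-axis."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)


Z = np.diag([1.0, -1.0]).astype(complex)


def z_expectation(params: list[float]) -> float:
    """<Z> after applying rx(params[0]) and then ry(params[1]) to |0>."""
    state = np.array([1, 0], dtype=complex)
    state = rx_matrix(params[0]) @ state
    state = ry_matrix(params[1]) @ state
    return float(np.real(np.vdot(state, Z @ state)))


print(z_expectation([0.0, 0.0]))
print(z_expectation([0.3, 0.7]), np.cos(0.3) * np.cos(0.7))
```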


You can also construct a `SpinOperator` from a Pauli word using the `from_word` class method.

In [20]:
op = cudaq.SpinOperator.from_word("XXXX")
print(op)

[1+0j] XXXX
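
A Pauli word such as `XXXX` denotes a tensor product of single-qubit Pauli matrices. The sketch below builds the corresponding dense matrix with NumPy. It is independent of CUDA-Q, the helper `pauli_word_matrix` is hypothetical, and mapping the leftmost character to the left Kronecker factor is a convention chosen for this sketch.

```python
from functools import reduce

import numpy as np

PAULIS = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}


def pauli_word_matrix(word: str) -> np.ndarray:
    """Kronecker product of the Pauli matrices named by each character."""
    return reduce(np.kron, (PAULIS[char] for char in word))


xxxx = pauli_word_matrix("XXXX")
print(xxxx.shape)

# Every Pauli word is Hermitian and squares to the identity.
print(np.allclose(xxxx, xxxx.conj().T))
print(np.allclose(xxxx @ xxxx, np.eye(16)))
```

The matrix size grows as $2^n \times 2^n$ with the word length $n$, so this dense form is only practical for small examples.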



<div class="alert alert-block alert-success">

### Exercise 3

Calculate the expectation values $\langle \mathrm{ghz} | ZZ...Z | \mathrm{ghz}\rangle$ and $\langle \mathrm{ghz} | XX...X | \mathrm{ghz}\rangle$ for 10 qubits and 20 qubits.
</div>

Write your code here!

## Internal Representations
You can inspect the MLIR, QIR, and OpenQASM 2 generated from your kernel.

### MLIR

In [None]:
import cudaq

In [None]:
@cudaq.kernel
def kernel():
    q = cudaq.qvector(2)
    h(q[0])
    cx(q[0], q[1])

In [22]:
# Look at the MLIR
print(kernel)

module attributes {quake.mangled_name_map = {__nvqpp__mlirgen__kernel = "__nvqpp__mlirgen__kernel_PyKernelEntryPointRewrite"}} {
  func.func @__nvqpp__mlirgen__kernel() attributes {"cudaq-entrypoint"} {
    %0 = quake.alloca !quake.veq<2>
    %1 = quake.extract_ref %0[0] : (!quake.veq<2>) -> !quake.ref
    quake.h %1 : (!quake.ref) -> ()
    %2 = quake.extract_ref %0[1] : (!quake.veq<2>) -> !quake.ref
    quake.x [%1] %2 : (!quake.ref, !quake.ref) -> ()
    return
  }
}



### QIR

In [23]:
# Look at the QIR
print(cudaq.translate(kernel, format="qir"))

; ModuleID = 'LLVMDialectModule'
source_filename = "LLVMDialectModule"

%Array = type opaque
%Qubit = type opaque

declare void @__quantum__rt__qubit_release_array(%Array*) local_unnamed_addr

declare void @invokeWithControlQubits(i64, void (%Array*, %Qubit*)*, ...) local_unnamed_addr

declare void @__quantum__qis__x__ctl(%Array*, %Qubit*)

declare void @__quantum__qis__h(%Qubit*) local_unnamed_addr

declare i8* @__quantum__rt__array_get_element_ptr_1d(%Array*, i64) local_unnamed_addr

declare %Array* @__quantum__rt__qubit_allocate_array(i64) local_unnamed_addr

define void @__nvqpp__mlirgen__kernel() local_unnamed_addr {
  %1 = tail call %Array* @__quantum__rt__qubit_allocate_array(i64 2)
  %2 = tail call i8* @__quantum__rt__array_get_element_ptr_1d(%Array* %1, i64 0)
  %3 = bitcast i8* %2 to %Qubit**
  %4 = load %Qubit*, %Qubit** %3, align 8
  tail call void @__quantum__qis__h(%Qubit* %4)
  %5 = tail call i8* @__quantum__rt__array_get_element_ptr_1d(%Array* %1, i64 1)
  %6 = bitcast 

### OpenQASM 2

In [24]:
print(cudaq.translate(kernel, format="openqasm2"))

// Code generated by NVIDIA's nvq++ compiler
OPENQASM 2.0;

include "qelib1.inc";

qreg var0[2];
h var0[0];
cx var0[0], var0[1];



In [25]:
### Version information
print(cudaq.__version__)

CUDA-Q Version cu12-0.9.0 (https://github.com/NVIDIA/cuda-quantum 77a1c80a18896b4c7ff4ece99f06e6a62c8a28ef)
