In [None]:
# Gate Cutting with CUDA-Q

Gate cutting decomposes a large quantum circuit into smaller subcircuits that can be executed on separate quantum devices, at the cost of extra subexperiments and classical postprocessing. This makes it possible to evaluate circuits that would otherwise be too large for the available quantum hardware.

In this tutorial, we demonstrate:
1. **KAK Decomposition**: Breaking down a 2-qubit unitary into canonical form
2. **Quasi-Probability Distribution (QPD)**: Expressing the canonical gate as a linear combination of Pauli operators
3. **Circuit Cutting**: Splitting the circuit and executing subcircuits independently
4. **Expectation Value Reconstruction**: Combining results from subcircuits to reconstruct the original circuit's expectation value

## Theory

Gate cutting replaces a two-qubit gate that connects two parts of a circuit with local operations on each part plus classical postprocessing. The scheme involves:
- Decomposing the gate using KAK decomposition
- Expressing the canonical interaction in terms of Pauli operators
- Creating measurement subcircuits for each Pauli term
- Classically combining the results using quasi-probability coefficients


In [None]:
import cudaq
from cudaq import spin
import numpy as np
from scipy.linalg import expm

# Select the NVIDIA multi-QPU (mqpu) target, which exposes each available GPU as a virtual QPU
cudaq.set_target("nvidia", option="mqpu")
print(f"Number of QPUs available: {cudaq.get_target().num_qpus()}")

# Define the circuit: a chain of 5 CNOT gates on 6 qubits, where gate i acts on qubits (i, i+1)
cnot_gate = np.array([[1, 0, 0, 0],
                      [0, 1, 0, 0],
                      [0, 0, 0, 1],
                      [0, 0, 1, 0]])
circuit = [cnot_gate] * 5
index_to_cut = 2  # Cut at the third gate (index 2)


In [None]:
## Quasi-Probability Distribution (QPD) Implementation

The `QPD` function takes the canonical gate from the KAK decomposition and expresses it as a linear combination of the 16 two-qubit Pauli operators:

$$A = \sum_{P \in \{I,X,Y,Z\}^{\otimes 2}} c_P \cdot P$$

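where the coefficients $c_P = \frac{1}{4} \text{Tr}(A \cdot P)$ follow from the trace orthogonality of the two-qubit Pauli operators, $\text{Tr}(P \cdot Q) = 4\,\delta_{PQ}$. The coefficients are complex in general, and this tutorial uses them as the weights of the quasi-probability distribution.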
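
As a quick sanity check of the expansion formula, the sketch below applies it to the CNOT matrix itself (rather than to a canonical gate) and confirms that the weighted Pauli sum rebuilds the matrix. It uses only NumPy; the helper name `pauli_coefficients` is an illustrative choice and is not part of CUDA-Q.

```python
import itertools
import numpy as np

# Single-qubit Pauli matrices
PAULIS = {
    'I': np.eye(2, dtype=complex),
    'X': np.array([[0, 1], [1, 0]], dtype=complex),
    'Y': np.array([[0, -1j], [1j, 0]], dtype=complex),
    'Z': np.array([[1, 0], [0, -1]], dtype=complex),
}

def pauli_coefficients(matrix):
    """Return {name: c_P} with c_P = Tr(matrix @ P) / 4 (hypothetical helper)."""
    coeffs = {}
    for p1, p2 in itertools.product('IXYZ', repeat=2):
        pauli = np.kron(PAULIS[p1], PAULIS[p2])
        coeffs[p1 + p2] = np.trace(matrix @ pauli) / 4
    return coeffs

cnot = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)
cnot_coeffs = pauli_coefficients(cnot)

# Rebuild the matrix from its Pauli expansion
rebuilt = sum(c * np.kron(PAULIS[name[0]], PAULIS[name[1]])
              for name, c in cnot_coeffs.items())

# Keep only the terms that actually contribute
nonzero = {name: c for name, c in cnot_coeffs.items() if abs(c) > 1e-10}
print(nonzero)
print(np.allclose(rebuilt, cnot))
```

For this matrix only four terms survive, matching the identity $\text{CNOT} = \frac{1}{2}(II + ZI + IX - ZX)$ with the control on the first tensor factor.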

In [4]:
def get_two_qubit_paulis():
    """Generate all 16 two-qubit Pauli combinations and return as dictionary."""
    # Single qubit Pauli matrices
    I = np.array([[1, 0], [0, 1]], complex)
    X = np.array([[0, 1], [1, 0]], complex)
    Y = np.array([[0, -1j], [1j, 0]], complex)
    Z = np.array([[1, 0], [0, -1]], complex)
    
    pauli_dict = {'I': I, 'X': X, 'Y': Y, 'Z': Z}
    pauli_names = ['I', 'X', 'Y', 'Z']
    
    # Generate all 16 two-qubit combinations
    two_qubit_paulis = {}
    for p1 in pauli_names:
        for p2 in pauli_names:
            name = p1 + p2
            matrix = np.kron(pauli_dict[p1], pauli_dict[p2])
            two_qubit_paulis[name] = matrix
    
    return two_qubit_paulis

def QPD(kak_result):
    """Quasi-Probability Distribution - decompose canonical gate into Pauli coefficients.
    
    Args:
        kak_result: KAK decomposition result exposing the canonical
            parameters as the attributes x, y and z
        
    Returns:
        Dictionary mapping two-qubit Pauli strings (e.g. 'XX') to their complex coefficients
    """
    # Extract canonical parameters and compute canonical gate (A matrix)
    x = kak_result.x
    y = kak_result.y
    z = kak_result.z
    
    # Get all 16 two-qubit Pauli matrices, keyed by name ('II', 'IX', ..., 'ZZ')
    two_qubit_paulis = get_two_qubit_paulis()
    
    # Canonical interaction Hamiltonian: x*XX + y*YY + z*ZZ
    H = (x * two_qubit_paulis['XX'] + y * two_qubit_paulis['YY'] +
         z * two_qubit_paulis['ZZ'])
    canonical_gate = expm(1j * H)  # This is the A matrix
    
    # Coefficient of each Pauli P: c_P = 1/4 * Tr(A @ P)
    coefficients = {
        name: 0.25 * np.trace(canonical_gate @ matrix)
        for name, matrix in two_qubit_paulis.items()
    }
    
    return coefficients


In [None]:
## KAK Decomposition and QPD Calculation

The KAK (Cartan) decomposition writes any two-qubit unitary, up to a global phase, in the form:

$$U = (A_1 \otimes A_0) \cdot \exp(i(x X \otimes X + y Y \otimes Y + z Z \otimes Z)) \cdot (B_1 \otimes B_0)$$

where $A_1, A_0, B_1, B_0$ are single-qubit unitaries and $x, y, z$ are the canonical parameters. The global phase does not affect expectation values, so it is printed below but not used afterwards.

We'll use CUDA-Q's built-in KAK decomposition and then apply our `QPD` function to get the Pauli coefficients.
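
Because $X \otimes X$, $Y \otimes Y$ and $Z \otimes Z$ commute and each squares to the identity, the canonical gate factorizes into three terms of the form $\cos\theta \, I + i \sin\theta \, P$. The sketch below builds the gate that way with NumPy only (the function name `canonical_gate` is an illustrative assumption) and checks the case $x = \pi/4$, $y = z = 0$, which under the usual convention represents the local-equivalence class of the CNOT gate.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def canonical_gate(x, y, z):
    """exp(i(x XX + y YY + z ZZ)) as a product of three commuting factors."""
    identity = np.kron(I2, I2)
    gate = identity
    for angle, pauli in ((x, X), (y, Y), (z, Z)):
        pp = np.kron(pauli, pauli)
        # exp(i*angle*PP) = cos(angle)*I + i*sin(angle)*PP because PP @ PP = I
        gate = gate @ (np.cos(angle) * identity + 1j * np.sin(angle) * pp)
    return gate

A = canonical_gate(np.pi / 4, 0.0, 0.0)

# For x = pi/4 the gate reduces to (II + i XX) / sqrt(2)
expected = (np.kron(I2, I2) + 1j * np.kron(X, X)) / np.sqrt(2)
print(np.allclose(A, expected))

# The canonical gate is unitary for any real parameters
print(np.allclose(A @ A.conj().T, np.eye(4)))
```

In this case the Pauli expansion has just two terms, with coefficients $1/\sqrt{2}$ for $II$ and $i/\sqrt{2}$ for $XX$.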

In [None]:
# Perform KAK decomposition on the gate to be cut
kak_result = cudaq.unitary_synthesis.kak_decompose(circuit[index_to_cut])

print("KAK Decomposition Results:")
print(f"Canonical parameters: x={kak_result.x:.6f}, y={kak_result.y:.6f}, z={kak_result.z:.6f}")
print(f"Phase: {kak_result.phase:.6f}")
print("\nLocal unitary matrices:")
print(f"A1: {kak_result.a1}")
print(f"A0: {kak_result.a0}")
print(f"B1: {kak_result.b1}")
print(f"B0: {kak_result.b0}")

# Perform QPD to get Pauli coefficients
qpd_data = QPD(kak_result)

# Filter out near-zero coefficients for cleaner output
qpd_data_filtered = {p: c for p, c in qpd_data.items() if abs(c) > 1e-10}

print(f"\nQPD Results:")
print(f"Total Pauli terms: {len(qpd_data)}")
print(f"Non-zero terms (>1e-10): {len(qpd_data_filtered)}")
print("\nNon-zero QPD coefficients:")
for pauli, coeff in qpd_data_filtered.items():
    print(f"  {pauli}: {coeff:.6f}")

# Extract Pauli strings for subcircuits
pauli_string1 = [key[0] for key in qpd_data_filtered.keys()]  # First Pauli for subcircuit 1
pauli_string2 = [key[1] for key in qpd_data_filtered.keys()]  # Second Pauli for subcircuit 2

print(f"\nPauli strings for subcircuit 1: {pauli_string1}")
print(f"Pauli strings for subcircuit 2: {pauli_string2}")


In [None]:
## Building the Subcircuits

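Now we construct two subcircuits that together replace the original circuit. The cut gate acts on one qubit from each subcircuit, so each subcircuit receives its own qubit's share of the KAK decomposition:

1. **Subcircuit 1**: Contains the gates before the cut, followed by `B1`, the first Pauli of the QPD term, and `A1` on its last qubit
2. **Subcircuit 2**: Contains `B0`, the second Pauli of the QPD term, and `A0` on its first qubit, followed by the gates after the cut

Each subcircuit is run once per non-zero QPD term, with the corresponding Pauli passed in as a kernel argument.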
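
The index arithmetic behind the split can be sketched in plain Python. The helper `split_chain` below is hypothetical and only does bookkeeping: it assumes, as in this tutorial, a chain in which gate `i` acts on qubits `(i, i+1)`, and it reports how many qubits each subcircuit needs and where each remaining gate lands in local qubit indices.

```python
def split_chain(num_gates, index_to_cut):
    """Describe both subcircuits for a chain where gate i acts on qubits (i, i+1).

    Hypothetical helper for illustration; it performs index bookkeeping only.
    """
    sub1 = {
        'num_qubits': index_to_cut + 1,
        # (gate index, local qubit pair) for the gates before the cut
        'gates': [(i, (i, i + 1)) for i in range(index_to_cut)],
        'cut_qubit': index_to_cut,  # last qubit of subcircuit 1
    }
    offset = index_to_cut + 1  # global index of subcircuit 2's first qubit
    sub2 = {
        'num_qubits': num_gates - index_to_cut,
        # gates after the cut, shifted so local indices start at 0
        'gates': [(i, (i - offset, i - offset + 1))
                  for i in range(index_to_cut + 1, num_gates)],
        'cut_qubit': 0,  # first qubit of subcircuit 2
    }
    return sub1, sub2

sub1, sub2 = split_chain(num_gates=5, index_to_cut=2)
print(sub1)
print(sub2)
```

With five gates and a cut at index 2, both subcircuits use three qubits, which together cover the six qubits of the original chain.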

In [None]:
# Register custom unitary operations (all gates except the one being cut)
base_name = 'U'
for i in range(len(circuit)):
    if i != index_to_cut:  # Skip the gate at the cut index
        cudaq.register_operation(f'{base_name}_{i}', circuit[i])

# Register the KAK decomposition matrices
cudaq.register_operation("A0", kak_result.a0)
cudaq.register_operation("A1", kak_result.a1)
cudaq.register_operation("B0", kak_result.b0)
cudaq.register_operation("B1", kak_result.b1)

############     Subcircuit 1     ############
# Construct subcircuit 1 (op1 is the Pauli word passed in for each subexperiment)
kernel1, op1 = cudaq.make_kernel(cudaq.pauli_word)
q1 = kernel1.qalloc(index_to_cut + 1)
kernel1.h(q1[0])  # Apply Hadamard gate to the first qubit to force expectation to be non-zero

# Apply the gates before the cut; gate i acts on qubits (i, i+1)
for i in range(index_to_cut):
    getattr(kernel1, f'{base_name}_{i}')(q1[i], q1[i+1])
# The right K in KAK (B1 x B0) acts first; this subcircuit holds its B1 factor
kernel1.B1(q1[index_to_cut])
# exp_pauli applies exp(i * theta * P); with theta = pi/2 this is the Pauli P up to a global phase
kernel1.exp_pauli(np.pi/2, q1[index_to_cut], op1)
# The left K in KAK (A1 x A0) acts last; this subcircuit holds its A1 factor
kernel1.A1(q1[index_to_cut])

############     Subcircuit 2     ############
# Construct subcircuit 2 (op2 is the Pauli word passed in for each subexperiment)
kernel2, op2 = cudaq.make_kernel(cudaq.pauli_word)
q2 = kernel2.qalloc(len(circuit) - index_to_cut)

# Apply this subcircuit's share of the cut gate on its first qubit: B0, the Pauli, then A0
kernel2.B0(q2[0])
kernel2.exp_pauli(np.pi/2, q2[0], op2)
kernel2.A0(q2[0])

# Apply the gates after the cut
for i in range(index_to_cut+1, len(circuit)):
    j = i - (index_to_cut + 1)  # offset the index to start at 0
    getattr(kernel2, f'{base_name}_{i}')(q2[j], q2[j+1])


# Display circuit diagrams for the first non-zero Pauli term
if pauli_string1:
    sample_pauli1 = cudaq.pauli_word(pauli_string1[0])
    sample_pauli2 = cudaq.pauli_word(pauli_string2[0])
    
    print(f"\nSubcircuit 1 (with Pauli '{pauli_string1[0]}'):")
    print(cudaq.draw(kernel1, sample_pauli1))
    
    print(f"\nSubcircuit 2 (with Pauli '{pauli_string2[0]}'):")
    print(cudaq.draw(kernel2, sample_pauli2))


In [None]:
## Executing Subcircuits and Computing Expectation Values

Now we execute both subcircuits once for each non-zero Pauli term in the QPD decomposition. Each subcircuit measures the expectation value of an all-X observable on its own qubits:

- **Subcircuit 1**: Measures $\langle \prod_{i=0}^{n_1 - 1} X_i \rangle$, where $n_1 = \text{index\_to\_cut} + 1$
- **Subcircuit 2**: Measures $\langle \prod_{i=0}^{n_2 - 1} X_i \rangle$, where $n_2 = \text{len(circuit)} - \text{index\_to\_cut}$

The subexperiments are submitted asynchronously and assigned round-robin to the available QPUs.
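
The round-robin assignment itself is independent of CUDA-Q and can be sketched in a few lines of plain Python. The function `assign_round_robin` and the 2-QPU setup are illustrative assumptions; in the notebook the resulting id is what gets passed as `qpu_id` to `cudaq.observe_async`.

```python
def assign_round_robin(num_jobs, qpu_count, start=0):
    """Return the QPU id for each job, cycling through the available QPUs."""
    if qpu_count < 1:
        raise ValueError("qpu_count must be at least 1")
    return [(start + job) % qpu_count for job in range(num_jobs)]

# Two QPD terms per subcircuit on a hypothetical 2-QPU machine
ids_sub1 = assign_round_robin(2, qpu_count=2)
# Continue the cycle where subcircuit 1 left off
ids_sub2 = assign_round_robin(2, qpu_count=2, start=len(ids_sub1))
print(ids_sub1, ids_sub2)
```

Continuing the cycle across both subcircuits keeps the load balanced when the number of terms is not a multiple of the QPU count.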

In [None]:
# Define observables for each subcircuit (all-X measurements)
ham1 = 1.0
for i in range(index_to_cut + 1):
    ham1 = ham1 * spin.x(i)

ham2 = 1.0  
for i in range(len(circuit) - index_to_cut):
    ham2 = ham2 * spin.x(i)

print(f"Observable for subcircuit 1: {ham1}")
print(f"Observable for subcircuit 2: {ham2}")

# Execute subcircuits asynchronously across available QPUs
qpu_count = cudaq.get_target().num_qpus()
print(f"\nDistributing {len(pauli_string1)} experiments across {qpu_count} QPUs")

results_async1 = []
results_async2 = []
qpu_id = 0

# Submit all subcircuit 1 experiments
for pauli1 in pauli_string1:
    result = cudaq.observe_async(kernel1, ham1, cudaq.pauli_word(pauli1), qpu_id=qpu_id)
    results_async1.append(result)
    qpu_id = (qpu_id + 1) % qpu_count

# Submit all subcircuit 2 experiments
for pauli2 in pauli_string2:
    result = cudaq.observe_async(kernel2, ham2, cudaq.pauli_word(pauli2), qpu_id=qpu_id)
    results_async2.append(result)
    qpu_id = (qpu_id + 1) % qpu_count

# Collect results
print("Collecting results...")
results1 = [result.get().expectation() for result in results_async1]
results2 = [result.get().expectation() for result in results_async2]

print(f"Subcircuit 1 expectation values: {[f'{r:.6f}' for r in results1]}")
print(f"Subcircuit 2 expectation values: {[f'{r:.6f}' for r in results2]}")

# Verify lengths match
assert len(results1) == len(results2) == len(qpd_data_filtered), \
    f"Length mismatch: {len(results1)}, {len(results2)}, {len(qpd_data_filtered)}"

# Extract QPD coefficients in the correct order
coeffs = [qpd_data_filtered[pauli1 + pauli2] 
          for pauli1, pauli2 in zip(pauli_string1, pauli_string2)]

print(f"QPD coefficients: {[f'{c:.6f}' for c in coeffs]}")


Observable for subcircuit 1: (1+0i) * X0X1X2
Observable for subcircuit 2: (1+0i) * X0X1X2

Distributing 2 experiments across 2 QPUs
Collecting results...
Subcircuit 1 expectation values: ['0.000000', '-0.000000']
Subcircuit 2 expectation values: ['0.000000', '0.000000']
QPD coefficients: ['0.707107+0.000000j', '0.000000+0.707107j']


In [None]:
## Reconstructing the Original Expectation Value

The final step combines the results from both subcircuits, weighting each pair of expectation values by its QPD coefficient and summing over the non-zero Pauli terms:

$$\langle O \rangle_{\text{original}} = \sum_{P_1, P_2} c_{P_1 P_2} \cdot \langle O_1 \rangle_{P_1} \cdot \langle O_2 \rangle_{P_2}$$

where:
- $c_{P_1 P_2}$ is the QPD coefficient of the Pauli term $P_1 \otimes P_2$
- $\langle O_1 \rangle_{P_1}$ and $\langle O_2 \rangle_{P_2}$ are the all-X expectation values from subcircuits 1 and 2 when run with the Paulis $P_1$ and $P_2$
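
The weighted sum above can be sketched as a small standalone function. The name `reconstruct` and the input values are made up for illustration and are not results from this notebook; the only assumption is that the three lists are ordered by the same QPD terms.

```python
def reconstruct(coeffs, results1, results2):
    """Sum c * <O1> * <O2> over matching QPD terms."""
    if not (len(coeffs) == len(results1) == len(results2)):
        raise ValueError("all three inputs must have one entry per QPD term")
    return sum(c * r1 * r2 for c, r1, r2 in zip(coeffs, results1, results2))

# Made-up values for illustration only
toy_coeffs = [0.5, 0.5j]
toy_results1 = [1.0, -1.0]
toy_results2 = [1.0, 1.0]

value = reconstruct(toy_coeffs, toy_results1, toy_results2)
print(value)
```

Because the coefficients are complex in general, the returned value is a Python `complex` even though each subcircuit expectation value is real.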

In [None]:
# Calculate the reconstructed expectation value
individual_terms = [r1 * r2 * c for r1, r2, c in zip(results1, results2, coeffs)]
reconstructed_expectation = sum(individual_terms)

print("=== GATE CUTTING RESULTS ===")
print(f"Number of Pauli terms: {len(individual_terms)}")
print(f"Individual contribution terms: {[f'{term:.6f}' for term in individual_terms]}")
print(f"Reconstructed expectation value: {reconstructed_expectation:.6f}")

# Summary of the gate cutting process
print(f"\n=== SUMMARY ===")
print(f"Original circuit: {len(circuit)} gates")
print(f"Cut location: Gate {index_to_cut} (0-indexed)")
print(f"Subcircuit 1: {index_to_cut + 1} qubits, {index_to_cut} gates + B1, Pauli, A1 on the cut qubit")
print(f"Subcircuit 2: {len(circuit) - index_to_cut} qubits, B0, Pauli, A0 on the cut qubit + {len(circuit) - index_to_cut - 1} gates")
print(f"Total subexperiments: {len(pauli_string1) + len(pauli_string2)}")
print(f"QPU parallelization: {qpu_count} QPUs used")

# Show the Pauli decomposition breakdown
print(f"\n=== PAULI DECOMPOSITION ===")
for i, (p1, p2, c, r1, r2, term) in enumerate(zip(pauli_string1, pauli_string2, coeffs, results1, results2, individual_terms)):
    print(f"Term {i+1}: {p1}{p2} | coeff={c:8.6f} | <O1>={r1:8.6f} | <O2>={r2:8.6f} | contribution={term:8.6f}")

print(f"\nFinal result: {reconstructed_expectation:.6f}")
print("\nGate cutting completed successfully!")

In [None]:
## Conclusion

This tutorial demonstrated gate cutting using CUDA-Q, a technique that enables:

- **Circuit Decomposition**: Breaking large circuits into smaller, manageable subcircuits
- **Distributed Execution**: Running subcircuits on separate quantum devices or QPUs
- **Scalability**: Handling circuits larger than available quantum hardware
- **Parallelization**: Distributing subexperiments across multiple QPUs

### Key Takeaways

1. **KAK Decomposition**: Any 2-qubit unitary can be decomposed into local operations and canonical interactions
2. **QPD Method**: The canonical gate can be expressed as a weighted sum of Pauli operators
3. **Circuit Cutting**: Replace the original gate with measurement subcircuits and classical postprocessing
4. **Parallel Execution**: CUDA-Q's multi-QPU support enables efficient parallel execution of subexperiments

### Extensions

- **Multiple Cuts**: Apply cutting to multiple gates for even larger circuit decomposition
- **Error Mitigation**: Combine with error mitigation techniques for better accuracy
- **Optimization**: Optimize cut locations to minimize the number of required subexperiments
- **Different Gates**: Apply to other 2-qubit gates beyond CNOT

### References

- Peng, T., Harrow, A. W., Ozols, M., & Wu, X. (2020). "Simulating large quantum circuits on a small quantum computer." *Physical Review Letters*, 125(15), 150504.
- Tang, W., Tomesh, T., Suchara, M., Larson, J., & Martonosi, M. (2021). "CutQC: Using small quantum computers for large quantum circuit evaluations." *ASPLOS 2021*.
